Contents continued from outside back cover
Journal of Complexity

EDITOR-IN-CHIEF
Joseph F. Traub, Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, MC 0401, New York, NY 10027, USA

EDITORS
Harald Niederreiter, Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543, Republic of Singapore
Henryk Woźniakowski, Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, MC 0401, New York, NY 10027, USA

ASSOCIATE EDITORS
Jin-Yi Cai, Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706, USA
Ronald Cools, Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Heverlee, Belgium
Felipe Cucker, Department of Mathematics, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, PRC
Stephan Dahlke, Fachbereich Mathematik und Informatik, Universität Marburg, Hans-Meerwein-Strasse, D-35032 Marburg, Germany
Jean-Pierre Dedieu, MIP, Université Paul Sabatier, 31062 Toulouse Cedex 4, France
Stefan Heinrich, FB Informatik, Universität Kaiserslautern, Postfach 3049, D-67653 Kaiserslautern, Germany
Peter Hertling, Institut für Theoretische Informatik und Mathematik, Universität der Bundeswehr München, 85577 Neubiberg, Germany
Fred J. Hickernell, Department of Applied Mathematics, Illinois Institute of Technology, Chicago, IL 60616-3793, USA
Arieh Iserles, DAMTP, University of Cambridge, Silver Street, Cambridge CB3 9EW, England
Ker-I Ko, Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
[email protected] Error propagation of general linear methods for ordinary differential equations J.C. Butcher, Z. Jackiewicz, W.M. Wright
560
On the existence of higher order polynomial lattices based on a generalized figure of merit Josef Dick, Peter Kritzer, Friedrich Pillichshammer, Wolfgang Ch. Schmid
581
A note on parallel and alternating time Felipe Cucker, Irénée Briquel
594
Searching for extensible Korobov rules Hardeep S. Gill, Christiane Lemieux
603
Optimal approximation of elliptic problems by linear and nonlinear mappings III: Frames Stephan Dahlke, Erich Novak, Winfried Sickel A note on the existence of sequences with small star discrepancy Josef Dick Optimal recovery of solutions of the generalized heat equation in the unit ball from inaccurate data K.Yu. Osipenko, E.V. Wedenskaya
614 649
653
Discrepancy with respect to convex polygons W.W.L. Chen, G. Travaglini
662
Simple Monte Carlo and the Metropolis algorithm Peter Mathé, Erich Novak
673
Tensor-product approximation to operators and functions in high dimensions Wolfgang Hackbusch, Boris N. Khoromskij
697
BDDC methods for discontinuous Galerkin discretization of elliptic problems Maksymilian Dryja, Juan Galvis, Marcus Sarkis
715
An effective algorithm for generation of factorial designs with generalized minimum aberration Kai-Tai Fang, Aijun Zhang, Runze Li
740
Lattice-Nyström method for Fredholm integral equations of the second kind with convolution type kernels Josef Dick, Peter Kritzer, Frances Y. Kuo, Ian H. Sloan
752
Contents continued on backmatter page
Journal of Complexity

ASSOCIATE EDITORS
Thomas Lickteig, Laboratoire d'Arithmétique, LACO, Faculté des Sciences, Université de Limoges, 123 Avenue Albert Thomas, F-87060 Limoges Cedex, France
Peter Mathé, Institut für Angewandte Analysis und Stochastik, Mohrenstrasse 39, D-10117 Berlin, Germany
Klaus Meer, Lehrstuhl Theoretische Informatik, Brandenburgische Technische Universität Cottbus, Konrad-Wachsmann-Allee 1, D-03046 Cottbus, Germany
Erich Novak, Mathematisches Institut, University of Jena, Ernst-Abbe-Platz 2, 07740 Jena, Germany
Luis M. Pardo, Departamento de Matemáticas, Estadística y Computación, Facultad de Ciencias, Universidad de Cantabria, Avda. Los Castros s/n, E-39071 Santander, Spain
Sergei Pereverzyev, Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Sciences, c/o Johannes Kepler Universität Linz, A-4040 Linz, Austria
Leszek Plaskota, Department of Mathematics, Informatics, and Mechanics, University of Warsaw, Ul. Banacha 2, Warsaw 02-097, Poland
Klaus Ritter, Fachbereich Mathematik, TU Darmstadt, Schlossgartenstr. 7, 64289 Darmstadt, Germany
Ian H. Sloan, School of Mathematics, University of New South Wales, Sydney 2052, Australia
Vladimir N. Temlyakov, Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA
Roberto Tempo, IRITI-CNR, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Shu Tezuka, Faculty of Mathematics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka-shi, Fukuoka-ken 812-8581, Japan
Grzegorz Wasilkowski, Department of Computer Science, University of Kentucky, 773 Anderson Hall, Lexington, KY 40506-0046, USA
Arthur G. Werschulz, Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, MC 0401, New York, NY 10027, USA
Mihalis Yannakakis, Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, MC 0401, New York, NY 10027, USA

Journal of Complexity 23 (2007) 421-422
www.elsevier.com/locate/jco
Guest Editors’ Preface
Issue dedicated to Professor Henryk Woźniakowski

This issue of the Journal of Complexity is dedicated to Professor Henryk Woźniakowski on the occasion of his 60th birthday, which was celebrated during the seminar "Algorithms and Complexity for Continuous Problems" at Schloss Dagstuhl, Germany, in Fall 2006. Henryk has been a dear friend, teacher, mentor, and collaborator for the three of us as well as for many others. This special issue is a small token of our appreciation for all that he has done for each of us individually, for the community at large, and for the whole field of computational mathematics.

For over 30 years, Henryk has been an important player in computational mathematics, as well as a co-founder of information-based complexity. He has co-authored three research monographs and written some 150 papers. He has written papers with 34 different researchers from various countries, including Australia, Austria, China, Germany, Poland, and the USA. He was the advisor to 11 PhD students and many more MS students. Henryk was a co-founder of the Journal of Complexity, which is currently one of the top applied mathematics and computer science journals on the Thomson ISI Impact Factor list. He is on the Editorial Boards of Numerical Algorithms and Matematyka Stosowana and on the Advisory Board of Foundations of Computational Mathematics.

Henryk has received a number of awards for his achievements, including two Prizes of the First Degree from the Ministry of National Education of Poland in 1980 and 1989, Research Awards of the Polish Academy of Sciences in 1975 and 1983, the Stanislaw Mazur Award of the Polish Mathematical Society in 1988, the Wladyslaw Orlicz Medal in 2006, and a Humboldt Research Award for 2006-2007.

Describing all of Henryk's research contributions would require too many pages. This is why we restrict ourselves to some of the results pertaining to scientific computing and computational complexity.
Over 20 of his papers deal with the numerical stability and/or convergence of algorithms for solving linear and nonlinear systems of equations. For instance, in a paper that appeared in Numerische Mathematik in 1977 he gave a stable version of the Chebyshev method for large systems of linear equations. In 1980, he published in Linear Algebra and its Applications a result showing that the commonly used conjugate gradient algorithm is unstable; however, it regains numerical stability when complemented by a few steps of iterative refinement. In a number of papers he introduced the very useful concept of the order of information of iterative methods for solving nonlinear equations and obtained a number of seminal results on the optimal convergence rate of iterations. Although the results on iterative methods always addressed cost and optimality questions, Henryk's research moved into the complexity of general problems with his first research monograph,
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.10.001
A General Theory of Optimal Algorithms, co-authored with J.F. Traub, Academic Press, 1980. Among a number of new results, they extended Bakhvalov's result by showing that adaption does not help for linear problems, and characterized optimal algorithms for linear problems defined on Hilbert spaces. This monograph was devoted solely to the worst case setting. Other settings, including the average, probabilistic, and randomized settings, were considered in the next two monographs, Information, Uncertainty, Complexity, Addison-Wesley, 1983, and Information-Based Complexity, Academic Press, 1988, both co-authored with J.F. Traub and G.W. Wasilkowski. The monographs presented a number of new results in those settings and motivated people to pursue research in information-based complexity. Another seminal result appeared in the Bulletin of the AMS in 1991. Henryk showed that the problem of selecting optimal sampling points for multivariate integration in the average case setting with respect to the Wiener sheet measure is equivalent to the problem of finding low-discrepancy points and to minimizing the worst case errors of quasi-Monte Carlo methods. This observation was instrumental in renewing interest in quasi-Monte Carlo methods and their applications to problems with a huge number of variables. In 1994, Henryk formalized the concept of tractability of continuous problems. Since then, hundreds of papers have been written on this subject by many researchers from all over the world. Henryk plays the major role in this research area and is the author or co-author of many important and deep results. These findings often provide new efficient algorithms for a variety of problems, including multivariate approximation, integration, and path integrals. Moreover, with collaborators, he has proposed such concepts as weighted spaces, finite-order weights, and generalized tractability, to mention just a few. Recently, Henryk has also contributed to the relatively new field of numerical quantum computation.
For example, he showed that path integrals can be computed faster on a quantum computer than on a classical computer.

In summary, Henryk has made major contributions to a number of different areas of applied mathematics and computer science and has been the leading researcher in information-based complexity. Numerous conferences and seminars all over the world that he co-organized resulted in the development and popularization of the topic. He has infected many of us with his enthusiasm and interest in continuous complexity. He has befriended and mentored many more; the number of papers submitted to this Festschrift and their scope are clear proof of that.

We thank all the friends who submitted papers. We adhered to the very strict acceptance criteria of the Journal of Complexity, which resulted in the rejection of some interesting papers. To avoid a scheduling problem (which, in general, is NP-complete), we decided to present the accepted papers in the order in which they were submitted.

Boleslaw Kacewicz
Leszek Plaskota
Grzegorz Wasilkowski
Journal of Complexity 23 (2007) 423 – 435 www.elsevier.com/locate/jco
On the counting function of the lattice profile of periodic sequences

Fang-Wei Fu a,∗,1, Harald Niederreiter b

a Temasek Laboratories, National University of Singapore, 5 Sports Drive 2, Singapore 117508, Republic of Singapore
b Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543,
Republic of Singapore Received 20 February 2006; accepted 11 May 2006 Available online 4 August 2006 Dedicated to Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract The lattice profile analyzes the intrinsic structure of pseudorandom number sequences with applications in Monte Carlo methods and cryptology. In this paper, using the discrete Fourier transform for periodic sequences and the relation between the lattice profile and the linear complexity, we give general formulas for the expected value, variance, and counting function of the lattice profile of periodic sequences with fixed period. Moreover, we determine in a more explicit form the expected value, variance, and counting function of the lattice profile of periodic sequences for special values of the period. © 2006 Elsevier Inc. All rights reserved. Keywords: Periodic sequences; Lattice profile; Linear complexity; Discrete Fourier transform; Expected value; Variance; Counting function
1. Introduction

Let F_q be the finite field with q elements, where q is an arbitrary prime power. Let S = (s_0, s_1, s_2, ...) be a sequence with terms in the finite field F_q, or as we shall say more briefly, a sequence over F_q. For a positive integer N, the sequence S is called N-periodic if s_{i+N} = s_i for all i ≥ 0. The N-periodic sequence S can be completely described by the N-tuple S^(N) = (s_0, s_1, ..., s_{N−1}). ∗ Corresponding author.
E-mail addresses:
[email protected] (F.-W. Fu),
[email protected] (H. Niederreiter). 1 Chern Institute of Mathematics, Nankai University, Tianjin 300071, P.R. China.
0885-064X/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2006.05.006
424
F.-W. Fu, H. Niederreiter / Journal of Complexity 23 (2007) 423 – 435
The polynomial corresponding to the N-periodic sequence S is defined as S(x) = s_0 + s_1 x + s_2 x² + ... + s_{N−1} x^{N−1}.

Definition 1. The linear complexity L(S) of an N-periodic sequence S over the finite field F_q is the smallest nonnegative integer l for which there exist coefficients d_1, d_2, ..., d_l ∈ F_q such that

s_j + d_1 s_{j−1} + ... + d_l s_{j−l} = 0   for all j ≥ l.
The linear complexity of sequences is an important security measure for stream cipher systems (see [1,3,19,22,23]). Note that L(S) = 0 if S is the zero sequence. Obviously, we always have 0 ≤ L(S) ≤ N. Note that if S is not the zero sequence, then L(S) is the length of the shortest linear feedback shift register that can generate S. For a general introduction to the theory of linear feedback shift register sequences, we refer the reader to [11, Chapter 8] and the references therein.

The lattice profile defined below analyzes the lattice structure of pseudorandom number sequences with applications in Monte Carlo methods and stream ciphers in cryptology (see [4–8,12,17,18,20,21]). The lattice profile is used, for instance, in the structural assessment of periodic sequences of pseudorandom numbers by means of Marsaglia's lattice test (see [8,12]). Thus, it is of interest to gain further insight into the lattice profile by studying, e.g., the expected value and the variance of the lattice profile of periodic sequences over F_q with fixed period. This then establishes statistical benchmarks for the assessment of pseudorandom numbers via the lattice profile.

Definition 2. Let S = (s_0, s_1, s_2, ...) be an N-periodic sequence over the finite field F_q. For integers t ≥ 0 and n ≥ 2, we say that S passes the t-dimensional n-lattice test if the vectors s_i − s_0, i = 1, ..., n − t, span F_q^t, where

s_i = (s_i, s_{i+1}, ..., s_{i+t−1}),   0 ≤ i ≤ n − t.

The lattice profile T(S, n) of S at n is defined as the greatest t such that S passes the t-dimensional n-lattice test. The lattice profile T(S) of S is defined as

T(S) = sup_{n ≥ 2} T(S, n).

Note that if the sequence S passes the t-dimensional n-lattice test, then it passes the t′-dimensional n-lattice test for all t′ ≤ t, and if the sequence S fails the t-dimensional n-lattice test, then it fails the t′-dimensional n-lattice test for all t′ ≥ t. We always have 0 ≤ T(S) ≤ N − 1 (see [7]). Dorfer, Niederreiter, and Winterhof (see [7,21]) established the following important relationship between the lattice profile and the linear complexity of a periodic sequence.

Lemma 1. Let S = (s_0, s_1, s_2, ...) be an N-periodic sequence over the finite field F_q. If gcd(N, q) = 1, then

T(S) = L(S)       if Σ_{i=0}^{N−1} s_i = 0,
T(S) = L(S) − 1   if Σ_{i=0}^{N−1} s_i ≠ 0.   (1)

If gcd(N, q) ≠ 1, then T(S) = L(S) − 1.

Linear complexity and lattice profile can also be defined for a finite sequence over F_q (see [4–7]). The important relationship between the lattice profile and the linear complexity for a
finite sequence was also established in [4–7,21]. Motivated by the relationship between the lattice profile and the linear complexity for a finite sequence, Dorfer et al. [5] determined the expected value, variance, and counting function of the lattice profile for finite sequences over F_q. In [2,9,14,15], the expected value, variance, and counting function of the linear complexity of periodic sequences were studied. It is easy to see from Lemma 1 that when gcd(N, q) ≠ 1, the expected value, variance, and counting function of the lattice profile of N-periodic sequences over F_q can be directly obtained from the expected value, variance, and counting function of the linear complexity of N-periodic sequences over F_q. In this paper, we study the expected value, variance, and counting function of the lattice profile of N-periodic sequences over F_q when gcd(N, q) = 1. Using the discrete Fourier transform for periodic sequences and the relationship between the lattice profile and the linear complexity, we give general formulas for the expected value, variance, and counting function of the lattice profile of periodic sequences. Moreover, we determine in a more explicit form the expected value, variance, and counting function of the lattice profile of periodic sequences for special values of the period.

This paper is organized as follows. In Section 2, we briefly review the discrete Fourier transform of sequences and the relationship between the linear complexity of a periodic sequence and the Hamming weight of its discrete Fourier transform. We also review the formulas for computing the expected value and variance of the linear complexity of N-periodic sequences over F_q where gcd(N, q) = 1. In Section 3, we derive general formulas for the expected value, variance, and counting function of the lattice profile of N-periodic sequences over F_q where gcd(N, q) = 1. In Section 4, we determine in a more explicit form the expected value, variance, and counting function of the lattice profile of N-periodic sequences with certain periods N.

2. The discrete Fourier transform of sequences

In this section, we briefly review the discrete Fourier transform of sequences over F_q and the relationship between the linear complexity of a periodic sequence and the Hamming weight of the discrete Fourier transform of the periodic sequence. We also list some basic properties of the discrete Fourier transform for periodic sequences and review the formulas for computing the expected value and variance of the linear complexity of N-periodic sequences over F_q where gcd(N, q) = 1.

Definition 3. The discrete Fourier transform of an N-tuple S^(N) = (s_0, s_1, ..., s_{N−1}) over F_q, where gcd(N, q) = 1, is defined by

DFT(S^(N)) = (S(1), S(α), ..., S(α^{N−1})),   (2)

where α is a primitive Nth root of unity in some extension field of F_q and S(x) = s_0 + s_1 x + s_2 x² + ... + s_{N−1} x^{N−1} is the polynomial corresponding to the N-tuple S^(N).

The Hamming weight of an N-tuple over a finite field is defined as the number of nonzero coordinates in this N-tuple. Blahut (see [13]) established the following relationship between the linear complexity of an N-periodic sequence S = (s_0, s_1, s_2, ...) and the Hamming weight of the discrete Fourier transform of the corresponding N-tuple S^(N) = (s_0, s_1, ..., s_{N−1}).

Lemma 2. The linear complexity L(S) of an N-periodic sequence S over F_q, where gcd(N, q) = 1, is equal to the Hamming weight of the discrete Fourier transform DFT(S^(N)) of the N-tuple S^(N) corresponding to S.
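Lemmas 1 and 2 lend themselves to direct numerical experiments. The following Python sketch (our illustration, not part of the paper) computes the linear complexity of a binary N-periodic sequence via the standard characterization L(S) = N − deg gcd(x^N − 1, S(x)), which is equivalent to Lemma 2, and then evaluates T(S) by Lemma 1; it assumes q = 2 and N odd, so that gcd(N, q) = 1.

```python
def deg(f):
    # degree of a GF(2) polynomial stored as an integer bitmask (deg(1) = 0)
    return f.bit_length() - 1

def gf2_mod(a, b):
    # remainder of a divided by b over GF(2)
    while b and deg(a) >= deg(b):
        a ^= b << (deg(a) - deg(b))
    return a

def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def linear_complexity(bits):
    # L(S) = N - deg gcd(x^N - 1, S(x)) for the N-periodic binary sequence S
    N = len(bits)
    Sx = sum(bit << i for i, bit in enumerate(bits))  # S(x) = s_0 + s_1 x + ... + s_{N-1} x^{N-1}
    if Sx == 0:
        return 0
    return N - deg(gf2_gcd((1 << N) | 1, Sx))         # x^N - 1 = x^N + 1 over GF(2)

def lattice_profile(bits):
    # Lemma 1 (valid when gcd(N, q) = 1, i.e. N odd for q = 2)
    L = linear_complexity(bits)
    return L if sum(bits) % 2 == 0 else L - 1
```

For instance, the period-3 sequence (1, 1, 0) has S(x) = 1 + x, gcd(x³ + 1, 1 + x) = 1 + x, hence L(S) = 2, and since s_0 + s_1 + s_2 = 0 in F_2, Lemma 1 gives T(S) = 2.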
Let N be a positive integer with gcd(N, q) = 1. We put Z_N = {0, 1, ..., N − 1}.

Definition 4. For j ∈ Z_N, the cyclotomic coset C_j of j modulo N relative to powers of q is defined by

C_j = {j, j·q, ..., j·q^{l_j − 1}} (mod N),

where l_j is the least positive integer satisfying j·q^{l_j} ≡ j (mod N). The smallest integer in the cyclotomic coset C_j is called the coset representative of this cyclotomic coset.

The following lemma (see [15, Lemma 1]) shows an algebraic property of the entries of the DFT of an N-tuple.

Lemma 3. For an integer j ∈ Z_N, let the integer k ∈ Z_N be an element of the cyclotomic coset C_j of j modulo N, i.e., k ≡ j·q^b (mod N) for some integer b ≥ 0. Let |C_j| = l_j. Then the DFT(S^(N)) of an N-tuple S^(N) over F_q satisfies

S(α^j) ∈ F_{q^{l_j}},   S(α^k) = S(α^j)^{q^b}.

Let D_1, D_2, ..., D_h be the different cyclotomic cosets modulo N relative to powers of q. Denote

|D_i| = m_i,   i = 1, 2, ..., h.
In this paper we always assume that D_1 = C_0 = {0}. Using Lemma 3, Meidl and Niederreiter [15] showed that the DFT(S^(N)) of an N-tuple S^(N) over F_q has a special form, called the DFT form: DFT(S^(N)) is uniquely determined by the h coordinates corresponding to the h coset representatives of the cyclotomic cosets D_1, D_2, ..., D_h. The entry of DFT(S^(N)) at position j, where j ∈ D_i, is an element of F_{q^{m_i}}. Furthermore, the DFT is a bijection. Using Lemmas 2 and 3, Meidl and Niederreiter [15] showed that the linear complexity of an N-periodic sequence S over F_q can be written in the following form.

Lemma 4. Let N be a positive integer with gcd(N, q) = 1. Let D_1, D_2, ..., D_h be the different cyclotomic cosets modulo N relative to powers of q. Let m_i = |D_i|, 1 ≤ i ≤ h, be the sizes of these cyclotomic cosets, and let j_i, 1 ≤ i ≤ h, be the coset representative of the cyclotomic coset D_i. Then the linear complexity L(S) of an N-periodic sequence S over F_q, where gcd(N, q) = 1, is given by

L(S) = Σ_{i=1}^h ε_i m_i,

where

ε_i = 1 if S(α^{j_i}) ≠ 0,   ε_i = 0 if S(α^{j_i}) = 0.
In the rest of the paper, the underlying stochastic model is that each N-periodic sequence over F_q occurs with the same probability q^{−N}. This means that a random N-periodic sequence S is
uniformly distributed over the set of all N-periodic sequences over F_q. Meidl and Niederreiter [15, Corollary 4] gave a general formula, in terms of the sizes m_i of the cyclotomic cosets D_i, for the expected value of the linear complexity L(S) of a random N-periodic sequence S over F_q. Fu et al. [9, Corollary 3] gave a general formula in terms of the m_i for the variance of the linear complexity L(S) of a random N-periodic sequence S over F_q. Equivalent formulas for the expected value and variance of the linear complexity L(S) of a random N-periodic sequence S over F_q, but in other forms and without proof, were given by Dai and Yang [2]. The following lemma summarizes the relevant formulas from [15] and [9].

Lemma 5. Let N be a positive integer with gcd(N, q) = 1. Let D_1, D_2, ..., D_h be the different cyclotomic cosets modulo N relative to powers of q. Let m_i = |D_i|, 1 ≤ i ≤ h, be the sizes of these cyclotomic cosets. Then the expected value and variance of the linear complexity L(S) of a random N-periodic sequence S over F_q are given by

E(L(S)) = N − Σ_{i=1}^h m_i q^{−m_i},   (3)

Var(L(S)) = Σ_{i=1}^h m_i² q^{−m_i} (1 − q^{−m_i}).   (4)
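Formulas (3) and (4) are straightforward to evaluate once the cyclotomic cosets are enumerated. A small Python sketch (our illustration; the function names are ours, not from the paper):

```python
def coset_sizes(N, q):
    # sizes m_i of the cyclotomic cosets D_1, ..., D_h of q modulo N (assumes gcd(N, q) = 1)
    seen, sizes = set(), []
    for j in range(N):
        if j in seen:
            continue
        coset, k = set(), j
        while k not in coset:
            coset.add(k)
            k = (k * q) % N
        seen |= coset
        sizes.append(len(coset))
    return sizes

def expected_L(N, q):
    # formula (3): E(L(S)) = N - sum_i m_i q^{-m_i}
    return N - sum(m * q**-m for m in coset_sizes(N, q))

def variance_L(N, q):
    # formula (4): Var(L(S)) = sum_i m_i^2 q^{-m_i} (1 - q^{-m_i})
    return sum(m * m * q**-m * (1 - q**-m) for m in coset_sizes(N, q))
```

For N = 3 and q = 2 the cosets are {0} and {1, 2}, giving E(L(S)) = 3 − (1·2^{−1} + 2·2^{−2}) = 2 and Var(L(S)) = 1, which matches direct enumeration of all eight binary sequences of period 3.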
3. The expected value, variance, and counting function of the lattice profile of periodic sequences

In this section, we derive general formulas for the expected value, variance, and counting function of the lattice profile of N-periodic sequences over F_q where gcd(N, q) = 1. Denote by P_N(q) the set of N-periodic sequences over F_q. Let S be an N-periodic sequence over F_q with the corresponding N-tuple S^(N) = (s_0, s_1, ..., s_{N−1}). Then

Σ_{i=0}^{N−1} s_i = 0  if and only if  S(1) = 0,   (5)

where S(x) = s_0 + s_1 x + s_2 x² + ... + s_{N−1} x^{N−1} is the corresponding polynomial. It is easy to see that

|{S ∈ P_N(q) : S(1) = 0}| = q^{N−1},   |{S ∈ P_N(q) : S(1) ≠ 0}| = q^N − q^{N−1}.   (6)
Theorem 1. Let N be a positive integer with gcd(N, q) = 1. Let D_1, D_2, ..., D_h be the different cyclotomic cosets modulo N relative to powers of q. Let m_i = |D_i|, 1 ≤ i ≤ h, be the sizes of these cyclotomic cosets. Then the expected value and variance of the lattice profile T(S) of a random N-periodic sequence S over F_q are given by

E(T(S)) = N − (q − 1)/q − Σ_{i=1}^h m_i q^{−m_i},   (7)

Var(T(S)) = Σ_{i=1}^h m_i² q^{−m_i} (1 − q^{−m_i}) − (q − 1)/q².   (8)
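Theorem 1 can be cross-checked by exhaustive enumeration for small parameters. The following self-contained Python sketch (ours, not part of the paper) compares (7) and (8) with the empirical mean and variance of T(S) over all 2^N binary sequences of odd period N, computing L(S) via the characterization L(S) = N − deg gcd(x^N − 1, S(x)) and T(S) via Lemma 1.

```python
from itertools import product

def coset_sizes(N, q):
    # sizes m_i of the cyclotomic cosets of q modulo N (assumes gcd(N, q) = 1)
    seen, sizes = set(), []
    for j in range(N):
        if j in seen:
            continue
        coset, k = set(), j
        while k not in coset:
            coset.add(k)
            k = (k * q) % N
        seen |= coset
        sizes.append(len(coset))
    return sizes

def expected_T(N, q):
    # formula (7)
    return N - (q - 1) / q - sum(m * q**-m for m in coset_sizes(N, q))

def variance_T(N, q):
    # formula (8)
    return sum(m * m * q**-m * (1 - q**-m) for m in coset_sizes(N, q)) - (q - 1) / q**2

def deg(f):
    return f.bit_length() - 1

def gf2_gcd(a, b):
    while b:
        while b and deg(a) >= deg(b):   # reduce a modulo b over GF(2)
            a ^= b << (deg(a) - deg(b))
        a, b = b, a
    return a

def T_of(bits):
    # T(S) for a binary sequence of odd period N, using Lemma 1 and
    # L(S) = N - deg gcd(x^N - 1, S(x))
    N = len(bits)
    Sx = sum(bit << i for i, bit in enumerate(bits))
    L = 0 if Sx == 0 else N - deg(gf2_gcd((1 << N) | 1, Sx))
    return L if sum(bits) % 2 == 0 else L - 1

def empirical_T(N):
    # mean and variance of T(S) over all 2^N binary sequences of period N
    Ts = [T_of(bits) for bits in product([0, 1], repeat=N)]
    mean = sum(Ts) / len(Ts)
    var = sum(t * t for t in Ts) / len(Ts) - mean**2
    return mean, var
```

For N = 3 both approaches give E(T(S)) = 1.5 and Var(T(S)) = 0.75, and the agreement persists for larger odd N.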
Proof. By Lemma 1 and (6), we have

E(T(S)) = (1/q^N) Σ_{S ∈ P_N(q)} T(S)
   = (1/q^N) [ Σ_{S ∈ P_N(q), S(1)=0} L(S) + Σ_{S ∈ P_N(q), S(1)≠0} (L(S) − 1) ]
   = (1/q^N) [ Σ_{S ∈ P_N(q)} L(S) − (q^N − q^{N−1}) ]
   = E(L(S)) − (q − 1)/q.   (9)

Hence, (7) follows from (3) and (9). Next we recall that

Var(T(S)) = E(T²(S)) − (E(T(S)))².   (10)

By Lemma 1 and (6), we have

E(T²(S)) = (1/q^N) Σ_{S ∈ P_N(q)} T²(S)
   = (1/q^N) [ Σ_{S(1)=0} L²(S) + Σ_{S(1)≠0} (L(S) − 1)² ]
   = (1/q^N) [ Σ_{S ∈ P_N(q)} L²(S) − 2 Σ_{S(1)≠0} L(S) + q^N − q^{N−1} ]
   = E(L²(S)) − (2/q^N) Σ_{S(1)≠0} L(S) + (q − 1)/q.   (11)

By (9)-(11) and noting that Var(L(S)) = E(L²(S)) − (E(L(S)))², we obtain

Var(T(S)) = Var(L(S)) + (2(q − 1)/q) E(L(S)) + (q − 1)/q² − (2/q^N) Σ_{S(1)≠0} L(S).   (12)

By Lemmas 3 and 4, (6), and the specific DFT form of DFT(S^(N)), we have

Σ_{S(1)≠0} L(S) = Σ_{S(1)≠0} Σ_{i=1}^h ε_i m_i
   = Σ_{S(1)≠0} ( 1 + Σ_{i=2}^h ε_i m_i )
   = q^N − q^{N−1} + Σ_{i=2}^h m_i |{S ∈ P_N(q) : S(1) ≠ 0, S(α^{j_i}) ≠ 0}|
   = q^N − q^{N−1} + Σ_{i=2}^h m_i (q − 1)(q^{m_i} − 1) q^{N−1−m_i}
   = q^N − q^{N−1} + Σ_{i=2}^h m_i (q − 1)(q^{N−1} − q^{N−1−m_i})
   = q^N − q^{N−1} + (q − 1) [ q^{N−1} Σ_{i=2}^h m_i − Σ_{i=2}^h m_i q^{N−1−m_i} ]
   = q^N − q^{N−1} + (q − 1) [ q^{N−1} (N − 1) − Σ_{i=2}^h m_i q^{N−1−m_i} ]
   = N(q − 1) q^{N−1} − (q − 1) Σ_{i=2}^h m_i q^{N−1−m_i},   (13)

where the second step uses that S(1) ≠ 0 forces ε_1 = 1 and m_1 = 1, and the last two steps use Σ_{i=2}^h m_i = N − 1. It follows from (3), (12), and (13) that

Var(T(S)) = Var(L(S)) − (2(q − 1)/q) m_1 q^{−m_1} + (q − 1)/q²
   = Var(L(S)) − (q − 1)/q².   (14)

Hence, (8) follows from (4) and (14). This completes the proof.
Corollary 1. Let N be a positive integer with gcd(N, q) = 1. Then the expected value of the lattice profile T(S) of a random N-periodic sequence S over F_q satisfies E(T(S)) > N − O(N^ε) for every ε > 0, where the implied constant depends only on ε.

Proof. By [9, Remark 1] the sizes of the cyclotomic cosets modulo N relative to powers of q are given as follows: for each positive divisor d of N, there are exactly φ(d)/H_q(d) cyclotomic cosets of size H_q(d), where φ is Euler's totient function and H_q(d) is the multiplicative order of q modulo d. It follows then from (7) that

E(T(S)) = N − (q − 1)/q − Σ_{d|N} φ(d) q^{−H_q(d)}.   (15)

Now q^{H_q(d)} > q^{H_q(d)} − 1 ≥ d, since d divides q^{H_q(d)} − 1 by the definition of H_q(d). Therefore from (15),

E(T(S)) > N − (q − 1)/q − Σ_{d|N} φ(d)/d ≥ N − (q − 1)/q − Σ_{d|N} 1 = N − (q − 1)/q − τ(N),

where τ(N) is the number of positive divisors of N. By [10, Theorem 3.15] we have τ(N) = O(N^ε) for every ε > 0, with an implied constant depending only on ε. This yields the desired result.

Corollary 2. Let N be a positive integer with gcd(N, q) = 1. Then the variance of the lattice profile T(S) of a random N-periodic sequence S over F_q satisfies Var(T(S)) = O(N^ε) for every ε > 0, where the implied constant depends only on ε.
Proof. Using the same information on the sizes of the cyclotomic cosets modulo N relative to powers of q as in the proof of Corollary 1, we obtain from (8) that

Var(T(S)) = Σ_{d|N} φ(d) H_q(d) q^{−H_q(d)} (1 − q^{−H_q(d)}) − (q − 1)/q².   (16)

This implies

Var(T(S)) < Σ_{d|N} φ(d) H_q(d) q^{−H_q(d)}.   (17)

Next we note that q^{H_q(d)} ≥ d + 1, and so H_q(d) ≥ log_q(d + 1). Since for fixed q the function g(x) = x q^{−x} is decreasing on the interval [log_q e, ∞), it follows from (17) that Var(T(S))
0, and if one also applies the Baker's transformation, a convergence of N^{−2+δ} for all δ > 0; see [8] and the references therein. For non-periodic functions, higher convergence rates for smoother functions are not known. (On the other hand, there is still the possibility of applying periodizing transformations to make the integrand periodic, so that lattice rules work, at least in theory. However, this procedure can magnify the variation of the integrand. See [8] for more information.) Digital nets, on the other hand, have only been known to achieve a convergence of N^{−1+δ} for all δ > 0 for functions with bounded variation [18]. In [3,4] these results were extended to yield explicit constructions of generalized digital nets which can achieve arbitrarily high convergence rates under suitable conditions on the integrands. The analysis in [3,4] is based on Walsh functions; in particular, the behavior of the Walsh coefficients of the reproducing kernel [3] and, in general, of smooth functions [4] was analyzed and used to obtain explicit constructions of generalized digital nets. We remark that results concerning high-order accuracy for digitally smooth functions have been shown in a series of papers, see, for example, [11–14]. Here, however, we are concerned with ordinary smoothness, which makes a significant difference.

In this paper we use the insights obtained from [3,4] to also generalize polynomial lattice rules. Polynomial lattices, first introduced in [17], are the quadrature point sets used in polynomial lattice rules; they are very similar to lattices and have been shown to achieve the optimal rate of convergence for integration in Sobolev spaces with partial mixed derivatives up to order one square integrable [5]. In this paper we give the appropriate generalization of polynomial lattices, which also achieves the optimal rate of convergence for Sobolev spaces with higher-order mixed partial derivatives. Indeed, we can even show the existence of polynomial lattice rules which automatically adjust themselves to the smoothness of the integrand, in terms of the convergence of the integration error, within a certain (arbitrarily high) range. Note that an analogous result for lattice rules is not known; hence, for the time being, polynomial lattice rules have the upper hand for the integration of non-periodic smooth functions.

Strong tractability roughly means that the worst-case error in a sequence of spaces of increasing dimension goes to zero independently of the dimension. In [6] digital nets, and in [5] polynomial lattice rules, have already been shown to achieve strong tractability results in Sobolev spaces with partial mixed derivatives up to order one square integrable. Here we extend these results to higher-order Sobolev spaces by showing the existence of polynomial lattice rules which also achieve strong tractability results in this case.

In the following section we generalize the classical definitions of digital nets and polynomial lattice rules. In Section 3 we briefly introduce Walsh functions, and in Section 4 we consider numerical integration in Sobolev spaces. Section 5 finally deals with (strong) tractability.
438
J. Dick, F. Pillichshammer / Journal of Complexity 23 (2007) 436 – 453
2. Digital nets and polynomial lattice rules for arbitrarily smooth functions

In this section we introduce digital nets and polynomial lattice rules which can achieve arbitrarily high convergence rates of the integration error for suitably smooth functions, see [3,4]. This is achieved by a slight generalization of the classical definition of digital nets, see [16–18], and [19] for a very recent survey article on digital nets. The following generalization appeared first in [4].

Definition 2.1 (Digital nets). Let b be a prime and let s ≥ 1 and m, n ≥ 1 be integers. Let C_1, ..., C_s be n × m matrices over the finite field Z_b. We construct N = b^m points in [0, 1)^s in the following way: for 0 ≤ h < b^m let h = h_0 + h_1 b + ... + h_{m−1} b^{m−1} be the b-adic expansion of h. Identify h with the vector h⃗ = (h_0, ..., h_{m−1})^⊤ ∈ Z_b^m, where ⊤ means the transpose of the vector. For 1 ≤ j ≤ s multiply the matrix C_j by h⃗, i.e.,

C_j h⃗ =: (y_{j,1}(h), ..., y_{j,n}(h))^⊤ ∈ Z_b^n,

and set

x_{h,j} := y_{j,1}(h)/b + ... + y_{j,n}(h)/b^n.

We call the point set {x_h = (x_{h,1}, ..., x_{h,s}) : 0 ≤ h < b^m} a digital net (over Z_b). The matrices C_1, ..., C_s are called the generating matrices of the digital net.

In [17] (see also [18, Section 4.4]) Niederreiter introduced a special family of digital nets over Z_b. Those nets are obtained from rational functions over finite fields. For a prime b let Z_b((x^{−1})) be the field of formal Laurent series over Z_b. Elements of Z_b((x^{−1})) are formal Laurent series

L = Σ_{l=w}^∞ t_l x^{−l},
where w is an arbitrary integer and all tl ∈ Zb . Note that Zb ((x −1 )) contains the field of rational functions over Zb as a subfield. Further let Zb [x] be the set of all polynomials over Zb . The following definition is a slight generalization of the definition from [17], see also [18]. As we will see later, polynomial lattice rules as defined below can achieve arbitrary high convergence rates and the generalization is based on results in [3,4]. Definition 2.2 (Polynomial lattice rules). Let b be prime and 1 m n. Let n be the map from Zb ((x −1 )) to the interval [0, 1) defined by ∞ n −l n = tl x tl b−l . l=w
l=max(1,w)
For a given dimension s ≥ 1, choose p ∈ Z_b[x] with deg(p) = n ≥ 1 and let q_1, ..., q_s ∈ Z_b[x]. For 0 ≤ h < b^m let h = h_0 + h_1 b + ··· + h_{m−1} b^{m−1} be the b-adic expansion of h. With each
such h we associate the polynomial

h(x) = ∑_{r=0}^{m−1} h_r x^r ∈ Z_b[x].

Then S_{p,m,n}(q) is the point set consisting of the b^m points

x_h = ( ν_n(h(x)q_1(x)/p(x)), ..., ν_n(h(x)q_s(x)/p(x)) ) ∈ [0,1)^s,

for 0 ≤ h < b^m. A quasi-Monte Carlo rule using the point set S_{p,m,n}(q) is called a polynomial lattice rule. We remark here that for our results only the degree of the polynomial p is important and not the specific choice of p itself (we will assume though that p is irreducible, but this assumption could be removed by a more complicated analysis).

Remark 2.3. The point set S_{p,m,n}(q) consists of the first b^m points of S_{p,n,n}(q), i.e., the first b^m points of a classical polynomial lattice. Hence the definition of a polynomial lattice in [17] is covered by choosing n = m in the definition above. Furthermore it is important to note that for dimension s = 1 and m < n the points of S_{p,m,n}(q) are in general not equally spaced (contrary to the case where m = n).

Using similar arguments as for the classical case n = m, see [17,18], it can be shown that the point set S_{p,m,n}(q) is a digital net in the sense of Definition 2.1. The generating matrices C_1, ..., C_s of this digital net can be obtained in the following way: for 1 ≤ j ≤ s, consider the expansions

q_j(x)/p(x) = ∑_{l=w_j}^∞ u_l^{(j)} x^{−l} ∈ Z_b((x^{−1})),

where w_j ∈ Z. Then the elements c_{i,r}^{(j)} of the n × m matrix C_j over Z_b are given by

c_{i,r}^{(j)} = u_{r+i}^{(j)} ∈ Z_b,   (2.1)

for 1 ≤ j ≤ s, 1 ≤ i ≤ n, 0 ≤ r ≤ m − 1.

Let x = ∑_{i=1}^∞ x_i/b^i ∈ [0,1) and let σ = ∑_{i=1}^∞ σ_i/b^i ∈ [0,1), where x_i, σ_i ∈ {0, ..., b − 1}. We define the digitally b-adic shifted point y by

y = x ⊕ σ = ∑_{i=1}^∞ y_i/b^i,

where y_i = x_i + σ_i ∈ Z_b. For points x ∈ [0,1)^s and σ ∈ [0,1)^s the digital b-adic shift x ⊕ σ is defined component-wise.

Definition 2.4 (Shifted digital nets and polynomial lattice rules). A digital net for which all points are digitally shifted by the same σ ∈ [0,1)^s is called a digitally shifted digital net or simply shifted digital net, and a polynomial lattice rule for which the underlying quadrature
points are digitally shifted by the same σ ∈ [0,1)^s is called a digitally shifted polynomial lattice rule or simply a shifted polynomial lattice rule.

Finally we introduce some notation: for arbitrary k = (k_1, ..., k_s) ∈ Z_b[x]^s and q = (q_1, ..., q_s) ∈ Z_b[x]^s, we define the 'inner product'

k · q = ∑_{j=1}^s k_j q_j ∈ Z_b[x],

and we write q ≡ 0 (mod p) if p divides q in Z_b[x]. Further, for b prime we associate a non-negative integer k = κ_0 + κ_1 b + ··· + κ_a b^a with the polynomial k(x) = κ_0 + κ_1 x + ··· + κ_a x^a ∈ Z_b[x] and vice versa.

3. Walsh functions

We recall the definition of Walsh functions. Henceforth let N denote the set of positive integers and N_0 = N ∪ {0}. We have the following definitions.

Definition 3.1 (Walsh functions). Let b ≥ 2 be an integer. For a non-negative integer k with base b representation

k = κ_0 + κ_1 b + ··· + κ_a b^a,

with κ_i ∈ {0, ..., b − 1}, we define the Walsh function _b wal_k : [0,1) → C by

_b wal_k(x) := e^{2πi(x_1 κ_0 + ··· + x_{a+1} κ_a)/b},

for x ∈ [0,1) with base b representation x = x_1/b + x_2/b² + ··· (unique in the sense that infinitely many of the x_i must be different from b − 1). If it is clear which base b is chosen we will simply write wal_k.

Definition 3.2 (Multivariate Walsh functions). Let b ≥ 2 be an integer. For dimension s ≥ 2, x_1, ..., x_s ∈ [0,1) and k_1, ..., k_s ∈ N_0 we define _b wal_{k_1,...,k_s} : [0,1)^s → C by

_b wal_{k_1,...,k_s}(x_1, ..., x_s) := ∏_{j=1}^s _b wal_{k_j}(x_j).

For vectors k = (k_1, ..., k_s) ∈ N_0^s and x = (x_1, ..., x_s) ∈ [0,1)^s we write

_b wal_k(x) := _b wal_{k_1,...,k_s}(x_1, ..., x_s).

Again, if it is clear which base we mean we simply write wal_k(x). It is clear from the definitions that Walsh functions are piecewise constant. It can be shown that for any integers s ≥ 1 and b ≥ 2 the system {_b wal_{k_1,...,k_s} : k_1, ..., k_s ≥ 0} is a complete orthonormal system in L_2([0,1)^s), see for example [2,15]. More information on Walsh functions can be found for example in [2,4,7,23]. We note that if Walsh functions, digital shifts, digital nets or polynomial lattice rules are used in conjunction with each other they are always in the same base b. Therefore we will often omit the b.
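A direct implementation of Definitions 3.1 and 3.2 is straightforward. The sketch below (our own illustration, not from the paper) evaluates _b wal_k for b = 2 and checks the orthonormality claim numerically: since wal_k is constant on intervals of length b^{−m} when k < b^m, averaging over the b^m left endpoints reproduces the integral exactly.

```python
import cmath

def walsh(k, x, b=2):
    """_b wal_k(x) = exp(2*pi*i*(x_1*kappa_0 + x_2*kappa_1 + ...)/b), Definition 3.1."""
    total, pos = 0, 1
    while k:
        kappa = k % b                    # digit kappa_{pos-1} of k
        x_digit = int(x * b ** pos) % b  # digit x_pos of x
        total += kappa * x_digit
        k //= b
        pos += 1
    return cmath.exp(2j * cmath.pi * total / b)

def walsh_s(ks, xs, b=2):
    """Multivariate Walsh function of Definition 3.2 (product over coordinates)."""
    out = 1.0 + 0j
    for k, x in zip(ks, xs):
        out *= walsh(k, x, b)
    return out

# numerical check of orthonormality: (1/N) sum_x wal_k(x) * conj(wal_l(x)) = delta_{kl}
N = 8
inner = lambda k, l: sum(walsh(k, i / N) * walsh(l, i / N).conjugate()
                         for i in range(N)) / N
```

For b = 2 the Walsh functions are real-valued and take only the values ±1, which is visible in the output of `walsh` up to floating-point rounding.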
4. Numerical integration in Sobolev spaces

We consider the Sobolev space H_{s,α,γ}, for which s ≥ 1 and α ≥ 1. For the 1-dimensional case the inner product is given by

⟨f, g⟩_{H_{1,α,(γ)}} = ∫_0^1 f(x) dx ∫_0^1 g(x) dx + ∑_{τ=1}^{α−1} ∫_0^1 f^{(τ)}(x) dx ∫_0^1 g^{(τ)}(x) dx + γ^{−1} ∫_0^1 f^{(α)}(x) g^{(α)}(x) dx,   (4.1)

where f^{(τ)} denotes the τth derivative of f, f^{(0)} = f and γ > 0 denotes the weight (see [22]). The corresponding norm in H_{1,α,(γ)} is given by ‖f‖_{H_{1,α,(γ)}} = ⟨f, f⟩_{H_{1,α,(γ)}}^{1/2}. The reproducing kernel (see [1] for more information about reproducing kernels) for this space is given by

K_{1,α,(1)}(x, y) = ∑_{τ=0}^{α} B_τ(x) B_τ(y) / (τ!)² + (−1)^{α+1} B_{2α}(|x − y|) / (2α)!,
where B_τ denotes the Bernoulli polynomial of degree τ. For example we have B_0(x) = 1, B_1(x) = x − 1/2, B_2(x) = x² − x + 1/6 and so on. The reproducing kernel for the s-dimensional weighted Sobolev space H_{s,α,γ} is now given by

K_{s,α,γ}(x, y) = ∑_{u⊆S} γ_u ∏_{j∈u} ( ∑_{τ=1}^{α} B_τ(x_j) B_τ(y_j) / (τ!)² + (−1)^{α+1} B_{2α}(|x_j − y_j|) / (2α)! ),

where S = {1, ..., s}, the γ_u are positive reals for all u ⊆ S (the 'weights') and γ = {γ_u}_{u⊆S}. For example if γ_u = ∏_{j∈u} γ_j then the space H_{s,α,γ} is a tensor product space of weighted 1-dimensional spaces. The inner product in this space is now the s-fold product of (4.1) and the corresponding norm in H_{s,α,γ} is given by ‖f‖_{H_{s,α,γ}} = ⟨f, f⟩_{H_{s,α,γ}}^{1/2}. Note that numerical integration in the Sobolev space with α = 1 using digital nets and polynomial lattice rules has already been considered in [5,6,20].

As K_{s,α,γ} ∈ L_2([0,1)^{2s}) it follows that K_{s,α,γ} can be represented by a Walsh series, i.e., we have

K_{s,α,γ}(x, y) = ∑_{k,l∈N_0^s} K̂_{s,α,γ}(k, l) wal_k(x) \overline{wal_l(y)},

where

K̂_{s,α,γ}(k, l) = ∫_{[0,1)^{2s}} K_{s,α,γ}(x, y) \overline{wal_k(x)} wal_l(y) dx dy.

Note that if γ_u = 0 for some u, then K̂_{s,α,γ}(k, l) = 0 if k_j = l_j = 0 for j ∉ u and k_j, l_j ≠ 0 for j ∈ u. One of the crucial points to obtain higher convergence in [3,4] is the analysis of the behavior of the Walsh coefficients K̂_{s,α,γ}(k, l). For the periodic reproducing kernels for Korobov spaces
there is a direct connection between the smoothness and the decay of the Fourier coefficients of the reproducing kernel. As shown in [3,4] the relation between the smoothness and the decay of the Walsh coefficients of the kernel is a bit more complicated. Indeed here the decay of the Walsh coefficients depends on the wavenumber through the base b representation of the wavenumber. More precisely, it was shown in [3,4] that for any α ≥ 1 there exists a constant C_{b,α} > 0 independent of the wavenumber k ∈ N_0^s such that

K̂_{s,α,γ}(k, k) ≤ C_{b,α} r_{b,α}(k)²  for all k ∈ N_0^s,   (4.2)

where r_{b,α}(k) = ∏_{j=1}^s r_{b,α}(k_j), r_{b,α}(0) = 1, and for k = κ_1 b^{a_1−1} + ··· + κ_v b^{a_v−1} with v ≥ 1, 0 < a_v < ··· < a_1 and κ_i ∈ {1, ..., b − 1} we set

r_{b,α}(k) = b^{−(a_1 + ··· + a_{min(v,α)})}.

For α = 1 this follows from [6, Section 6] and for α > 1 from [3,4]. In [3,4] it was then shown that this structure in the Walsh coefficients can be exploited to obtain arbitrarily high convergence rates for a suitable generalization of digital nets. In the following we will show that this can also be done using the slightly more general definition of polynomial lattice rules given by Definition 2.2.

In the following we consider the worst-case error for multivariate integration in the Sobolev space H_{s,α,γ} for s ≥ 1 and α ≥ 1, i.e.,

e(Q_{b^m,s}, H_{s,α,γ}) = sup_{f∈H_{s,α,γ}, ‖f‖_{H_{s,α,γ}} ≤ 1} |I_s(f) − Q_{b^m,s}(f)|.

The initial error is given by

e(Q_{0,s}, H_{s,α,γ}) = sup_{f∈H_{s,α,γ}, ‖f‖_{H_{s,α,γ}} ≤ 1} |I_s(f)|.

From [4, Theorem 15] we know that e²(Q_{0,s}, H_{s,α,γ}) = γ_∅ and

e²(Q_{b^m,s}, H_{s,α,γ}) = −γ_∅ + (1/b^{2m}) ∑_{h,h'=0}^{b^m−1} K_{s,α,γ}(x_h, x_{h'}) = (1/b^{2m}) ∑_{h,h'=0}^{b^m−1} ( K_{s,α,γ}(x_h, x_{h'}) − γ_∅ ).
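For s = 1, α = 1 and weight γ = 1 this error formula can be evaluated directly: then γ_∅ = 1 and the kernel specializes to K_{1,1,(1)}(x, y) = 1 + B_1(x)B_1(y) + B_2(|x − y|)/2. A minimal sketch (the equally spaced quadrature points x_h = h/N are our own ad hoc choice, used only to exercise the formula):

```python
def worst_case_error(points):
    """e(Q_N, H_{1,1,(1)}) via e^2 = N^{-2} sum_{h,h'} (K(x_h, x_{h'}) - gamma_empty),
    with gamma_empty = 1 and K(x, y) = 1 + B_1(x)B_1(y) + B_2(|x - y|)/2."""
    B1 = lambda t: t - 0.5
    B2 = lambda t: t * t - t + 1.0 / 6.0
    N = len(points)
    e2 = sum(B1(x) * B1(y) + B2(abs(x - y)) / 2.0
             for x in points for y in points) / N ** 2
    return e2 ** 0.5

errors = {N: worst_case_error([h / N for h in range(N)]) for N in (8, 16, 32)}
# for this grid the error decays like O(1/N)
```

Doubling N halves the error here, i.e., the rate for α = 1 is first order; the point of the paper is that suitably chosen point sets achieve higher rates for α > 1.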
For a digital net with generating matrices C_1, ..., C_s ∈ Z_b^{n×m} let the dual net be

D = D(C_1, ..., C_s) = {k ∈ N_0^s \ {0} : C_1^⊤ k⃗_1 + ··· + C_s^⊤ k⃗_s = 0⃗},

where k = (k_1, ..., k_s) and for k_j = κ_0 + κ_1 b + ··· we set k⃗_j = (κ_0, ..., κ_{n−1})^⊤. Furthermore 0⃗ denotes the zero-vector in Z_b^m. For u ⊆ S let D_u = D_u((C_j)_{j∈u}) be the projection of the vectors in D to the coordinates in u and let D_u^* = D_u^*((C_j)_{j∈u}) = D_u ∩ N^{|u|}. For a vector k ∈ N_0^s and for u ⊆ S let (k_u, 0) denote the vector k with all components whose index is not in u replaced by zero.
Using the same arguments as in [6, Section 6] we can now obtain a formula for the mean square worst-case error ê²(Q_{b^m,s}, H_{s,α,γ}) of randomly shifted digital nets (see Definition 2.4), where the expectation of the squared worst-case error is taken over a random shift σ ∈ [0,1)^s with i.i.d. uniformly distributed components, i.e., ê²(Q_{b^m,s}, H_{s,α,γ}) = E_σ[ e²(Q_{b^m,s}(σ), H_{s,α,γ}) ], where Q_{b^m,s}(σ) denotes the quadrature rule for which all quadrature points are digitally shifted by σ ∈ [0,1)^s. Using [4, Theorem 15] together with the results from [6, Section 6] we obtain that for any α ≥ 1 we have

ê²(Q_{b^m,s}, H_{s,α,γ}) = ∑_{∅≠u⊆S} γ_u ∑_{k_u∈D_u^*} K̂_{s,α,γ}((k_u, 0), (k_u, 0)),

and by applying (4.2) we obtain for any α ≥ 1 that

ê²(Q_{b^m,s}, H_{s,α,γ}) ≤ ∑_{∅≠u⊆S} γ_u C_{b,α}^{|u|} ∑_{k_u∈D_u^*} r_{b,α}(k_u)².   (4.3)

Compare this result with its deterministic version [4, Lemma 9]. Note that the worst-case error essentially depends on the structure of the dual net D with respect to ∑_{k_u∈D_u^*} r_{b,α}(k_u)². Essentially, generalized digital nets for which the largest summand in ∑_{k_u∈D_u^*} r_{b,α}(k_u)² is as small as possible will also yield a small worst-case error. How explicit constructions of such digital nets can be obtained has been explained in [3,4]. In this paper we want to show the existence of polynomial lattice rules which can achieve arbitrarily high convergence. This is done in the following by showing that the generalized polynomial lattice rules introduced above give the same dual space as the one for generalized digital nets (when one views the generalized polynomial lattice as a generalized digital net); an averaging argument (together with Jensen's inequality) will then be enough to yield the result.

In the subsequent lemma we now state a similar result to (4.3) for polynomial lattice rules. From a slight generalization of [18, Lemma 4.40] we obtain that the analogous definition of the dual space for a polynomial lattice is given by

D = D_p(q) = {k ∈ N_0^s \ {0} : q · k̄ ≡ a (mod p) with deg(a) < n − m},

where for k = (k_1, ..., k_s) ∈ N_0^s we associate the vector of polynomials k̄ = (k̄_1, ..., k̄_s), where for k_j = κ_0 + κ_1 b + ··· we define k̄_j(x) = κ_0 + κ_1 x + ··· + κ_{n−1} x^{n−1}, and where we set deg(0) = −1. Hence for m = n we obtain the usual definition of the dual space, see [5,18], and for m < n we obtain a superset. As above, for any u ⊆ S, we also define the projections of the vectors in D to the coordinates in u by D_u = D_{u,p}(q) and further we set D_u^* = D_{u,p}^*(q) = D_u ∩ N^{|u|}. A proof of the following lemma can be obtained by using a slight generalization of [18, Lemma 4.40] and (4.3).

Lemma 4.1. Let b be a prime and α ≥ 1 be an integer. Then there exists a constant C_{b,α} > 0 depending only on b and α (and not on s and m) such that the mean square worst-case error for multivariate integration in the Sobolev space H_{s,α,γ} using a randomly shifted polynomial lattice rule Q_{b^m,s} can be bounded by

ê²(Q_{b^m,s}, H_{s,α,γ}) ≤ ∑_{∅≠u⊆{1,...,s}} γ_u C_{b,α}^{|u|} ∑_{k_u∈D_{u,p}^*(q)} r_{b,α}(k_u)².
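The shifted polynomial lattice rules appearing in Lemma 4.1 are easy to generate explicitly. The sketch below (our own illustration) builds the point set S_{p,m,n}(q) of Definition 2.2 and applies a digital shift as in Definition 2.4, for base b = 2; polynomials over Z_2 are represented as integer bitmasks, and the specific p and q are ad hoc choices, not taken from the paper.

```python
def poly_mulmod(a, q, p, n):
    """Product a(x)q(x) mod p(x) over Z_2; polynomials as int bitmasks, deg p = n."""
    r = 0
    while q:
        if q & 1:
            r ^= a
        q >>= 1
        a <<= 1
        if (a >> n) & 1:      # reduce whenever deg a reaches n
            a ^= p
    return r

def nu_n(a, p, n):
    """nu_n(a(x)/p(x)) for deg a < deg p = n: first n digits of the Laurent
    expansion a/p = sum_{l>=1} u_l x^{-l}, mapped to a point in [0,1)."""
    value, r = 0.0, a
    for l in range(1, n + 1):
        u = (r >> (n - 1)) & 1        # u_l = coefficient of x^{n-1} in r
        value += u / 2.0 ** l
        r = (r << 1) ^ (u * p)        # r <- x*r - u_l*p, keeps deg r < n
    return value

def polynomial_lattice(p, n, m, qs):
    """S_{p,m,n}(q): the 2^m points (nu_n(h q_1/p), ..., nu_n(h q_s/p)).
    Reducing h*q_j mod p first is harmless: the polynomial part of h q_j/p
    only contributes powers x^{-l} with l <= 0, which nu_n discards."""
    return [tuple(nu_n(poly_mulmod(h, q, p, n), p, n) for q in qs)
            for h in range(1 << m)]

def digital_shift(x, sigma, ndigits=30):
    """Digital 2-adic shift x (+) sigma of Definition 2.4 (apply componentwise)."""
    y = 0.0
    for i in range(1, ndigits + 1):
        xi = int(x * 2 ** i) % 2
        si = int(sigma * 2 ** i) % 2
        y += ((xi + si) % 2) / 2.0 ** i
    return y

# classical case m = n with p(x) = x^2 + x + 1 (0b111) and q = (1,)
pts = polynomial_lattice(0b111, 2, 2, [1])
shifted = [tuple(digital_shift(c, 0.25) for c in pt) for pt in pts]
```

For m = n the construction reduces to a classical polynomial lattice, and the one-dimensional points above come out equally spaced, in line with Remark 2.3.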
Further, we need the following lemma.

Lemma 4.2. Let α ≥ 1 be an integer. Then for every λ with 1/(2α) < λ ≤ 1 there exists a constant 0 < C_{b,α,λ} < ∞ such that

∑_{l=1}^∞ r_{b,α}(l)^{2λ} ≤ C_{b,α,λ}.

Proof. Note that it is enough to show the result for λ satisfying 1/(2α) < λ < min(1, 1/(2(α−1))), as ∑_{l=1}^∞ r_{b,α}(l)^{2λ} is a monotonically decreasing function in λ, i.e., we can use the constant C_{b,α,λ} to bound ∑_{l=1}^∞ r_{b,α}(l)^{2λ} for all larger λ ≤ 1. In the following let l = κ_1 b^{a_1−1} + ··· + κ_v b^{a_v−1}, where v ≥ 1, 0 < a_v < ··· < a_1 and κ_i ∈ {1, ..., b − 1}. We divide the sum over all l ∈ N into two parts, namely, firstly where 1 ≤ v ≤ α and secondly where v > α. For the first part we have

∑_{v=1}^{α} (b − 1)^v ∑_{0<a_v<···<a_1} b^{−2λ(a_1+···+a_v)}.
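Lemma 4.2 can also be probed numerically: with r_{b,α} implemented from its definition following (4.2), the partial sums of ∑_l r_{b,α}(l)^{2λ} settle quickly. A sketch (the choices b = 2, α = 2, λ = 1 are ours, purely for illustration):

```python
def r_b_alpha(b, alpha, k):
    """r_{b,alpha}(k) = b^{-(a_1 + ... + a_{min(v,alpha)})}, where a_1 > ... > a_v
    are the positions of the nonzero base-b digits of k; r_{b,alpha}(0) = 1."""
    positions, a = [], 1
    while k:
        if k % b:
            positions.append(a)
        k //= b
        a += 1
    positions.sort(reverse=True)
    return float(b) ** (-sum(positions[:alpha]))

b, alpha, lam = 2, 2, 1.0        # 1/(2*alpha) < lam <= 1, as the lemma requires
partial = [sum(r_b_alpha(b, alpha, l) ** (2 * lam) for l in range(1, 2 ** e))
           for e in (8, 12, 16)]
# the partial sums increase but stay bounded, consistent with Lemma 4.2
```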
for some c > 0 independent of s and N. Therefore the integration problem in the sequence of spaces {H_{s,α,γ}}_{s≥1} is strongly QMC-tractable. From this it is clear that if λ_0 is the supremum over all λ which satisfy (5.1),
then the ε-exponent of strong tractability lies in the interval [1/α, 1/λ_0]. If (5.1) holds for all λ ∈ [1/2, α), then λ_0 = α, which proves the last assertion of item (1).

(2) If B_{1/2,q} < ∞ for some non-negative q, then we have e(Q_{N,s}, H_{s,α,γ}) ≤ c · s^{q/2} · N^{−1/2} for some c > 0 independent of s and N, and it follows that the integration problem in the sequence of spaces {H_{s,α,γ}}_{s≥1} is QMC-tractable. If B_{λ,q} < ∞, then we have e(Q_{N,s}, H_{s,α,γ}) ≤ c · s^{λq} · N^{−λ} and the assertion concerning ε- and s-exponent follows.

As the proof is based on the result from Theorem 4.9 it is clear that the corresponding bounds on the worst-case error can be achieved by digitally shifted polynomial lattice rules.

In the sequel we will consider a special choice of weights, namely so-called product weights. Here we have a sequence γ_1, γ_2, ... of non-negative reals and the weight corresponding to the projection given by u ⊆ {1, ..., s} is given by γ_u = ∏_{j∈u} γ_j for u ≠ ∅ and γ_∅ = 1. In this case for any λ < α it follows from Theorem 4.9 that there exists a digitally shifted polynomial lattice rule such that

e(Q_{N,s}, H_{s,α,γ}) ≤ (√2 / N^λ) ( −1 + ∏_{j=1}^s (1 + C γ_j^{1/(2λ)}) )^λ,   (5.3)

where C = C_{b,α}^{1/(2λ)} C_{b,α,1/(2λ)} is from the bound in Theorem 4.9 and where N = b^m.
Theorem 5.3. Let α ≥ 1. We have:

(1) For some λ ∈ [1/2, α) assume that

∑_{j=1}^∞ γ_j^{1/(2λ)} < ∞.   (5.4)

Then the integration problem in the sequence of spaces {H_{s,α,γ}}_{s≥1} is strongly QMC-tractable. Let λ_0 be the supremum over all λ which satisfy (5.4). Then the ε-exponent of strong tractability lies in the interval [1/α, 1/λ_0]. If (5.4) holds for all λ ∈ [1/2, α), then the ε-exponent of strong tractability has the value 1/α (which is optimal).

(2) Under the assumption

A := lim sup_{s→∞} ( ∑_{j=1}^s γ_j ) / log s < ∞,

we obtain that the integration problem in the sequence of spaces {H_{s,α,γ}}_{s≥1} is QMC-tractable. If

A_λ := lim sup_{s→∞} ( ∑_{j=1}^s γ_j^{1/(2λ)} ) / log s < ∞,

then the ε-exponent of tractability lies in the interval [1/α, 1/λ] and the s-exponent is at most C · A_λ. Moreover the corresponding upper bounds on the worst-case error can be achieved by digitally shifted polynomial lattice rules.
Proof. (1) This part of the theorem follows from Theorem 5.2, part (1), since for product weights we have

B_{λ,0} ≤ exp( C ∑_{j=1}^∞ γ_j^{1/(2λ)} ),

if the sum in the above expression is finite.

(2) For any δ > 0 there exists a positive s_δ such that

∑_{j=1}^s γ_j^{1/(2λ)} ≤ (A_λ + δ) log s  for all s ≥ s_δ.

From (5.3) we obtain

e(Q_{N,s}, H_{s,α,γ}) ≤ N^{−λ} √2 ∏_{j=1}^s (1 + C γ_j^{1/(2λ)})^λ = N^{−λ} √2 s^{λ ∑_{j=1}^s log(1 + C γ_j^{1/(2λ)}) / log s} ≤ N^{−λ} √2 s^{λ C ∑_{j=1}^s γ_j^{1/(2λ)} / log s} ≤ N^{−λ} √2 s^{λ C (A_λ + δ)}

for any δ > 0 and all s ≥ s_δ. As the proof is based on the result from Theorem 4.9 it is clear that the corresponding bounds on the worst-case error can be achieved by digitally shifted polynomial lattice rules. The result follows.

Remark 5.4. Note that the conditions for (strong) tractability in the case of product weights are independent of the smoothness parameter α.

Acknowledgements

The second author would like to thank Prof. Ian H. Sloan and Josef Dick for their hospitality during his visit at the University of New South Wales where the main part of this paper was written.

References

[1] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950) 337–404.
[2] H.E. Chrestenson, A class of generalized Walsh functions, Pac. J. Math. 5 (1955) 17–31.
[3] J. Dick, Explicit constructions of quasi-Monte Carlo rules for the numerical integration of high dimensional periodic functions, submitted for publication.
[4] J. Dick, Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary high order, submitted for publication.
[5] J. Dick, F.Y. Kuo, F. Pillichshammer, I.H. Sloan, Construction algorithms for polynomial lattice rules for multivariate integration, Math. Comput. 74 (2005) 1895–1921.
[6] J. Dick, F. Pillichshammer, Multivariate integration in weighted Hilbert spaces based on Walsh functions and weighted Sobolev spaces, J. Complexity 21 (2005) 149–195.
[7] N.J. Fine, On the Walsh functions, Trans. Amer. Math. Soc. 65 (1949) 372–414.
[8] F.J. Hickernell, Obtaining O(N^{−2+ε}) convergence for lattice quadrature rules, in: K.T. Fang, F.J. Hickernell, H. Niederreiter (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 2000, Springer, Berlin, 2002, pp. 274–289.
[9] E. Hlawka, Zur angenäherten Berechnung mehrfacher Integrale, Monatsh. Math. 66 (1962) 140–151.
[10] N.M. Korobov, The approximate computation of multiple integrals, Dokl. Akad. Nauk SSSR 124 (1959) 1207–1210.
[11] G. Larcher, H. Niederreiter, W.Ch. Schmid, Digital nets and sequences constructed over finite rings and their application to quasi-Monte Carlo integration, Monatsh. Math. 121 (1996) 231–253.
[12] G. Larcher, G. Pirsic, R. Wolf, Quasi-Monte Carlo integration of digitally smooth functions by digital nets, in: H. Niederreiter, P. Hellekalek, G. Larcher, P. Zinterhof (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 1996, Lecture Notes in Statistics, vol. 127, Springer, New York, 1998, pp. 321–329.
[13] G. Larcher, W.Ch. Schmid, Multivariate Walsh series, digital nets and quasi-Monte Carlo integration, in: H. Niederreiter, P.J.-S. Shiue (Eds.), Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Lecture Notes in Statistics, vol. 106, Springer, New York, 1995, pp. 252–262.
[14] G. Larcher, C. Traunfellner, On the numerical integration of Walsh series by number-theoretic methods, Math. Comput. 63 (1994) 277–291.
[15] K. Niederdrenk, Die endliche Fourier- und Walshtransformation mit einer Einführung in die Bildverarbeitung, Vieweg, Braunschweig, 1982.
[16] H. Niederreiter, Point sets and sequences with small discrepancy, Monatsh. Math. 104 (1987) 273–337.
[17] H. Niederreiter, Low-discrepancy point sets obtained by digital constructions over finite fields, Czech. Math. J. 42 (1992) 143–166.
[18] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF Series in Applied Mathematics, vol. 63, SIAM, Philadelphia, 1992.
[19] H. Niederreiter, Constructions of (t, m, s)-nets and (t, s)-sequences, Finite Fields Appl. 11 (2005) 578–600.
[20] G. Pirsic, J. Dick, F. Pillichshammer, Cyclic digital nets, hyperplane nets, and multivariate integration in Sobolev spaces, SIAM J. Numer. Anal. 44 (2006) 385–411.
[21] I.F. Sharygin, A lower estimate for the error of quadrature formulas for certain classes of functions, Zh. Vychisl. Mat. Mat. Fiz. 3 (1963) 370–376.
[22] I.H. Sloan, H. Woźniakowski, When are quasi-Monte Carlo algorithms efficient for high dimensional integrals?, J. Complexity 14 (1998) 1–33.
[23] J.L. Walsh, A closed set of normal orthogonal functions, Amer. J. Math. 45 (1923) 5–24.
Journal of Complexity 23 (2007) 454 – 467 www.elsevier.com/locate/jco
Regularized collocation method for Fredholm integral equations of the first kind

M. Thamban Nair^a, Sergei V. Pereverzev^b,∗

^a Department of Mathematics, Indian Institute of Technology Madras, Chennai 600 036, India
^b Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Science, Altenbergstrasse 69, 4040 Linz, Austria

Received 9 April 2006; accepted 3 September 2006. Available online 1 November 2006.

Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

In this paper we consider a collocation method for solving Fredholm integral equations of the first kind, which is known to be an ill-posed problem. An “unregularized” use of this method can give reliable results in the case when the rate at which the smallest singular values of the collocation matrices decrease is known a priori. In this case the number of collocation points plays the role of a regularization parameter. If the a priori information mentioned above is not available, then a combination of collocation with Tikhonov regularization can be the method of choice. We analyze such regularized collocation in a rather general setting, when solution smoothness is given as a source condition with an operator monotone index function. This setting covers all types of smoothness studied so far in the theory of Tikhonov regularization. One more issue discussed in this paper is an a posteriori choice of the regularization parameter, which allows us to reach an optimal order of accuracy for a deterministic noise model without any knowledge of solution smoothness.
© 2006 Elsevier Inc. All rights reserved.

MSC: 65J20; 47L10

Keywords: Ill-posed problems; Collocation method; Regularization; Order optimal error bounds; General source conditions; Operator monotone functions; A posteriori parameter choice
∗ Corresponding author.
E-mail addresses: [email protected] (M.T. Nair), [email protected] (S.V. Pereverzev).

0885-064X/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2006.09.002
1. Introduction

We discuss collocation for Fredholm integral equations of the first kind

∫_0^1 k(s, t) x(t) dt = y(s),  0 ≤ s ≤ 1,   (1.1)

where the non-degenerate kernel k(·,·) and the right-hand term y are assumed to be continuous functions, i.e. y ∈ C([0,1]) and k(·,·) ∈ C([0,1] × [0,1]). The collocation method for (1.1) is considered to be a special form of discretization that arises when we replace the original problem by one in a finite dimensional space. In case of collocation, this space is just the Euclidean space R^n. Recall that for any positive integer n, a collocation scheme is determined by sets Θ_n = {τ_i^n}_{i=1}^n ⊂ [0,1] of the collocation points satisfying

0 ≤ τ_1^n < τ_2^n < ··· < τ_n^n ≤ 1,

and by operators T_n : C([0,1]) → R^n such that

T_n f = (f(τ_1^n), f(τ_2^n), ..., f(τ_n^n)),  ∀f ∈ C([0,1]).

Then, within a collocation scheme based on Θ_n, the original equation (1.1) is replaced by an operator equation in R^n, which can be written abstractly as

K_n x = T_n y,   (1.2)

where K_n = T_n K and K is the integral operator defined by

(Kx)(s) = ∫_0^1 k(s, t) x(t) dt.

Note that (1.2), where K_n is an operator from L_2(0,1) to R^n, is always solvable at least in the sense of least squares, and can be reduced to a system of n linear algebraic equations. In principle, a least squares solution of (1.2) can be taken as an approximate solution of the original equation (1.1).

We know that (1.1) is an ill-posed equation, since the integral operator K with a non-degenerate and continuous kernel k(·,·) is a compact operator with non-closed range in L_2(0,1), and hence it is not continuously invertible. This ill-posedness is reflected in the ill-conditioning of the system of linear algebraic equations corresponding to (1.2). Therefore, even small perturbations y_1^δ, y_2^δ, ..., y_n^δ of the data y(τ_1^n), y(τ_2^n), ..., y(τ_n^n) may drastically change a least squares solution of (1.2). Thus, even for a finite-dimensional system (1.2) one needs regularization algorithms, which are capable of dealing with ill-conditioning caused by ill-posedness of the original problem.

From [6, Chapter 3.3], it is known that the influence of non-vanishing data noise ξ_i^n = y(τ_i^n) − y_i^δ, i = 1, 2, ..., n, depends on the smallest singular value σ_n of the operator K_n. If the rate at which σ_n decreases is known, then the problem can be regularized by a proper choice of the discretization parameter n in (1.2). This is sometimes called regularization by discretization, or self-regularization, because no additional regularization of the finite-dimensional problem (1.2) is needed. This aspect has been extensively discussed in the literature (see, e.g., [1], and the references therein). For some ill-posed equations, such as elliptic boundary integral equations and pseudo-differential equations, this rate is known a priori. For this type of problems one can employ a self-regularization of collocation schemes, as it has been discussed in [4,17]. But in
general the problem of estimation of σ_n is more difficult than the problem (1.1) itself. Therefore, if information about the rate of decay of σ_n is not available, then other techniques should be used for regularizing (1.2).

There are various ways in which regularization can be applied to the discretized equation (1.2). Tikhonov regularization is the most popular one. A few selected references from the literature on this topic are [5,7,10,16]. But it is worth noting that the previous study of regularized collocation was restricted to the case of so-called moderately ill-posed problems. More precisely, it was assumed that a solution x̂ of (1.1) satisfies a source condition

x̂ = (K*K)^μ v,  v ∈ L_2(0,1),   (1.3)

for some μ > 0, where K* is the adjoint of the operator K : L_2(0,1) → L_2(0,1). In certain cases it is possible to interpret the above source condition as an inclusion of x̂ into a Sobolev space W_2^{2μ}(0,1). But in general, within the setup of the Hilbert spaces a general source condition of the form

x̂ = φ(K*K) v,  v ∈ L_2(0,1),   (1.4)

for an appropriate function φ is much more flexible for describing a solution's smoothness than the scales of Sobolev or Besov spaces. Indeed, within the framework of these scales, the smoothness is described in terms of real numbers, while a representation (1.4) gives the possibility to use a function as a smoothness index. Moreover, an accuracy of order O(log^{−μ}(1/δ)), which is typical for severely ill-posed problems, cannot be expressed in Sobolev or Besov scales, while it can easily be covered by analysis based on a general source condition.

In the present paper we extend the analysis of regularized collocation to the case of solution smoothness given as a source condition (1.4) with an operator monotone function φ. This covers all types of smoothness studied so far in the theory of Tikhonov regularization. In particular, severely ill-posed problems can be well described and analyzed within this framework.

In the previous study of regularized collocation, the number of collocation points was interpreted as an amount of indirect observations. In this paper we treat the number of collocation points n and the number of indirect noisy observations m separately. We will show that for deterministic data noise, the number m can be much smaller than n, but it still allows us to obtain the order of accuracy that cannot be improved in general. One more issue which is discussed in this paper is an a posteriori choice of the regularization parameter which yields the best possible accuracy without the knowledge of the index function describing the smoothness of the true solution.

2. Preliminaries

In this section, we describe the framework and assumptions for our analysis. The following assumption is similar to that used in [7,10].

Assumption 1. There exist a constant κ, a set {ω_i^n}_{i=1}^n of positive quadrature weights associated with a set {τ_i^n}_{i=1}^n of collocation points for n = 1, 2, ..., and a decreasing sequence {ε_n} of positive real numbers such that

(i) ε_n → 0 as n → ∞,
(ii) ∑_{i=1}^n ω_i^n ≤ κ, n = 1, 2, ..., and, for all large enough n,
(iii) ‖∫_0^1 k(σ,·) k(σ,·) dσ − ∑_{i=1}^n ω_i^n k(τ_i^n,·) k(τ_i^n,·)‖_{L_2×L_2} ≤ ε_n,

where ‖g(·,·)‖²_{L_2×L_2} = ∫_0^1 ∫_0^1 |g(s,t)|² ds dt.

Example 1. Let τ_i^n = (i−1)/(n−1), i = 1, 2, ..., n. The trapezoidal quadrature rule associated with these collocation points has the weights ω_1^n = ω_n^n = 1/[2(n−1)], ω_i^n = 1/(n−1), i = 2, 3, ..., n−1. If this rule is used for numerical integration with n > 2, then it is well-known that for any function f having a bounded second derivative, we have

|∫_0^1 f(τ) dτ − ∑_{i=1}^n ω_i^n f(τ_i^n)| ≤ sup_{τ∈[0,1]} |f″(τ)| / (32(n−1)²).

If k(s,t) is twice continuously differentiable with respect to s, and if β > 0 is such that

sup{ |∂^i k(s,t)/∂s^i| : s, t ∈ [0,1], i = 0, 1, 2 } ≤ β,

then it is easy to check, for the above mentioned trapezoidal quadrature rule, that Assumption 1 is satisfied with κ = 1 and ε_n = (n−1)^{−2} β²/8.

Corresponding to the set {ω_i^n} of quadrature weights as in Assumption 1, we define an inner product ⟨·,·⟩_{ω,n} on R^n by

⟨u, v⟩_{ω,n} := ∑_{i=1}^n ω_i^n u_i v_i,  u, v ∈ R^n.

In the sequel we denote by R_ω^n the space R^n endowed with the inner product ⟨·,·⟩_{ω,n} and the corresponding norm ‖·‖_{ω,n}.

Proposition 1. Let K_n = T_n K : L_2(0,1) → R_ω^n be the operator as in equation (1.2). Then, under Assumption 1, we have ‖K*K − K_n*K_n‖ ≤ ε_n, where the adjoint K_n* : R_ω^n → L_2(0,1) of K_n is given by

(K_n* u)(·) = ∑_{i=1}^n ω_i^n k(τ_i^n, ·) u_i,  u ∈ R_ω^n.
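The quadrature rule of Example 1 and its O((n−1)^{−2}) error decay are easy to verify empirically; a sketch (the test integrand cos is our own choice, used only for illustration):

```python
import math

def trapezoid(f, n):
    """Trapezoidal rule of Example 1: nodes tau_i^n = (i-1)/(n-1),
    end weights 1/(2(n-1)), interior weights 1/(n-1)."""
    h = 1.0 / (n - 1)
    nodes = [i * h for i in range(n)]
    weights = [h / 2.0] + [h] * (n - 2) + [h / 2.0]
    return sum(w * f(t) for w, t in zip(weights, nodes))

exact = math.sin(1.0)                      # integral of cos over [0,1]
errs = [abs(trapezoid(math.cos, n) - exact) for n in (11, 21, 41)]
# doubling (n - 1) cuts the error by roughly a factor of 4
```

Note that the weights sum to (n−1) · 1/(n−1) = 1, matching the constant κ = 1 in Example 1.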
Proof. For any x ∈ L_2(0,1), u ∈ R_ω^n we have

⟨K_n x, u⟩_{ω,n} = ∑_{i=1}^n ω_i^n u_i ∫_0^1 k(τ_i^n, t) x(t) dt = ∫_0^1 ( ∑_{i=1}^n ω_i^n k(τ_i^n, s) u_i ) x(s) ds = ⟨x, K_n* u⟩_{L_2(0,1)},

from which the formula for K_n* is obtained.
Next we note that

(K*Kx − K_n*K_n x)(s) = ∫_0^1 ( ∫_0^1 k(σ,s) k(σ,t) dσ − ∑_{i=1}^n ω_i^n k(τ_i^n,s) k(τ_i^n,t) ) x(t) dt.

Hence,

‖(K*K − K_n*K_n) x‖_{L_2} ≤ ‖∫_0^1 k(σ,·) k(σ,·) dσ − ∑_{i=1}^n ω_i^n k(τ_i^n,·) k(τ_i^n,·)‖_{L_2×L_2} ‖x‖_{L_2}.

Now, using Assumption 1, the proof can be completed.
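Proposition 1 thus bounds ‖K*K − K_n*K_n‖ by the L_2×L_2 distance between the kernel ∫_0^1 k(σ,s)k(σ,t) dσ of K*K and its quadrature approximation. The sketch below estimates that distance on a grid for the trapezoidal rule of Example 1 and a smooth test kernel k(s,t) = e^{st} (our own choice, not from the paper), and exhibits the expected O((n−1)^{−2}) decay:

```python
import math

def kernel_gap(k, n, grid=80):
    """Grid estimate of || int_0^1 k(sig,.)k(sig,.) dsig
                           - sum_i w_i^n k(tau_i^n,.)k(tau_i^n,.) ||_{L2xL2},
    using the trapezoidal weights of Example 1; the exact sigma-integral is
    replaced by a fine midpoint rule, whose own error is negligible here."""
    h = 1.0 / (n - 1)
    taus = [i * h for i in range(n)]
    ws = [h / 2.0] + [h] * (n - 2) + [h / 2.0]
    ref = [(j + 0.5) / grid for j in range(grid)]      # fine midpoint grid
    Kref = [[k(sig, x) for x in ref] for sig in ref]
    Ktau = [[k(t, x) for x in ref] for t in taus]
    total = 0.0
    for a in range(grid):
        for c in range(grid):
            exact = sum(Kref[i][a] * Kref[i][c] for i in range(grid)) / grid
            quad = sum(w * Ktau[i][a] * Ktau[i][c] for i, w in enumerate(ws))
            total += (exact - quad) ** 2
    return math.sqrt(total) / grid

k = lambda s, t: math.exp(s * t)
gaps = {n: kernel_gap(k, n) for n in (5, 9)}   # ratio close to ((9-1)/(5-1))^2 = 4
```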
As we have already mentioned, our plan is to consider Tikhonov regularization of (1.2) using perturbed values of the right-hand term $y(s)$ at the collocation points $\{\tau_i^n\}$. We will assume that the measurements of $y(s)$ are made at points $s_j$, $j = 1,2,\dots,m$, which need not coincide with the $\tau_i^n$, $i = 1,2,\dots,n$. Moreover, in practice these measurements are usually made in the presence of some noise, so that the observed measurements are
$$y_j^\delta = y(s_j) + \xi_j, \qquad j = 1,2,\dots,m, \qquad (2.1)$$
where $\xi_j$ denotes the error of the $j$-th measurement. Our subsequent analysis will be done in a deterministic framework in which the errors $\xi_j$ are assumed to be bounded, so that $|\xi_j| \le \delta$ for all $j = 1,2,\dots,m$, for some positive number $\delta$.

To use the measurement data for collocation we should be able to calculate values $y_i^\delta \approx y(\tau_i^n)$ out of (2.1). To this end we assume that there is a system of functions $\{g_j^m\}_{j=1}^m \subset C([0,1])$, $m = 1,2,\dots$, such that
$$\Big| y(s) - \sum_{j=1}^m y(s_j)\, g_j^m(s) \Big| \le \varepsilon_m\, |||y|||, \qquad s \in [0,1], \qquad (2.2)$$
for any $y \in \mathrm{Range}(K)$, where $|||\cdot|||$ is some seminorm defined on $\mathrm{Range}(K)$ and $\{\varepsilon_m\}$ is a sequence of positive real numbers such that $\varepsilon_m \to 0$ as $m \to \infty$. Moreover, we assume that there exists a constant $\beta > 0$ such that
$$\sum_{j=1}^m |g_j^m(s)| \le \beta \qquad (2.3)$$
for any $s \in [0,1]$ and $m = 1,2,\dots$.

Example 2. Consider the integral operator $K$ with the kernel $k(s,t)$ as in Example 1. Then
$$|||y||| = \sup\{|y^{(2)}(s)| : s \in [0,1]\}, \qquad y \in \mathrm{Range}(K),$$
defines a seminorm on $\mathrm{Range}(K)$. For each $m = 2,3,\dots$, let $s_j = (j-1)/(m-1)$, $j = 1,2,\dots,m$, and let $B_m(t)$ be the linear B-spline defined as follows: $B_m(t) \equiv 0$ for $t \notin \big(-\tfrac{1}{m-1}, \tfrac{1}{m-1}\big)$, $B_m(t) = B_m(-t)$ and $B_m(t) = 1 + (m-1)t$ for $t \in \big(-\tfrac{1}{m-1}, 0\big]$. Then it can be seen that
$$\sum_{j=1}^m |B_m(s - s_j)| \equiv 1$$
for all $s \in [0,1]$, and for any function $y$ having a bounded second derivative it is well known that
$$\Big| y(s) - \sum_{j=1}^m y(s_j)\, B_m(s - s_j) \Big| \le \frac{|||y|||}{8(m-1)^2}.$$
Thus, the assumptions (2.2) and (2.3) are satisfied with $\varepsilon_m = (m-1)^{-2}/8$, $\beta = 1$ and $g_j^m(s) = B_m(s - s_j)$ for $j = 1,2,\dots,m$.

Note that a system $\{g_j^m\}$ with properties (2.2) and (2.3) can be used to produce an arbitrary amount of perturbed collocation data from a fixed amount of noisy measurements (2.1). Indeed, one can calculate $\{y_i^\delta\}_{i=1}^n$ from $\{y_j^\delta\}_{j=1}^m$ by
$$y_i^\delta = \sum_{j=1}^m y_j^\delta\, g_j^m(\tau_i^n), \qquad i = 1,2,\dots,n.$$
Then from (2.1)–(2.3) we have
$$|y(\tau_i^n) - y_i^\delta| \le \Big| y(\tau_i^n) - \sum_{j=1}^m y(s_j)\, g_j^m(\tau_i^n) \Big| + \sum_{j=1}^m |y_j^\delta - y(s_j)|\, |g_j^m(\tau_i^n)| \le \varepsilon_m |||y||| + \beta\delta. \qquad (2.4)$$
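A minimal numerical sketch of this B-spline construction (the smooth test function and all variable names are illustrative, not from the paper): evaluating the quasi-interpolant $\sum_j y(s_j) B_m(\cdot - s_j)$ at arbitrary points and checking the partition of unity together with the error bound $\varepsilon_m |||y|||$:

```python
import numpy as np

def quasi_interp(y_vals, s, tau):
    """Evaluate sum_j y_j * B_m(tau - s_j) for the linear B-spline B_m with
    knots s_j = (j-1)/(m-1); this is plain piecewise-linear interpolation."""
    h = 1.0 / (len(s) - 1)
    B = np.clip(1.0 - np.abs(tau[:, None] - s[None, :]) / h, 0.0, None)
    assert np.allclose(B.sum(axis=1), 1.0)     # partition of unity on [0,1]
    return B @ y_vals

m = 101
s = np.linspace(0.0, 1.0, m)
y = lambda t: np.sin(2 * np.pi * t)            # |||y||| = sup|y''| = 4*pi^2
tau = np.random.default_rng(0).uniform(0.0, 1.0, 50)   # arbitrary evaluation points

err = np.max(np.abs(quasi_interp(y(s), s, tau) - y(tau)))
bound = 4 * np.pi**2 / (8 * (m - 1) ** 2)      # eps_m * |||y|||
print(err, bound)                              # the error stays below the bound
```

The observed maximum error obeys the classical piecewise-linear bound $|||y|||/(8(m-1)^2)$, which is exactly the $\varepsilon_m$ used in the text.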
This estimate shows that within a deterministic framework it is reasonable to choose the number of observations $m = m(\delta)$ such that $\varepsilon_{m(\delta)} \asymp \delta$. (Here and in the sequel the expression $a \asymp b$ means that there are two $b$-independent constants $c, C > 0$ such that $ca \le b \le Ca$.) From (2.4) it also follows that in a deterministic framework the level of collocation data noise
$$\max\{|y(\tau_i^n) - y_i^\delta| : i = 1,2,\dots,n\}$$
depends on the interplay between the level $\delta$ of the measurement errors and the number $m$ of measurements, but it does not depend on the number of collocation points. Therefore, the following assumption seems to be appropriate.

Assumption 2. Assume that for any $n = 1,2,\dots$, and for sufficiently small $\delta \in (0,1)$, we are able to receive collocation data $y_1^\delta, y_2^\delta, \dots, y_n^\delta$ such that
$$|y(\tau_i^n) - y_i^\delta| \le \delta, \qquad i = 1,2,\dots,n.$$

In view of Assumption 2 and the relation $\sum_{i=1}^n \gamma_i^n \le \gamma$ from Assumption 1, the deviation of the vector $y_n^\delta := (y_1^\delta, y_2^\delta, \dots, y_n^\delta)$ from $y_n := (y(\tau_1^n), y(\tau_2^n), \dots, y(\tau_n^n))$ with respect to the norm in $\mathbb{R}^n_\gamma$ can be estimated as
$$\|y_n^\delta - y_n\|_{\gamma,n} = \Big( \sum_{i=1}^n \gamma_i^n\, |(K\hat{x})(\tau_i^n) - y_i^\delta|^2 \Big)^{1/2} \le \sqrt{\gamma}\,\delta. \qquad (2.5)$$
So far, the description of problem (1.1) as an ill-posed equation is not complete. Faced with such an equation, what one usually looks for is a stable approximation of the Moore–Penrose generalized solution of (1.1), defined as the function $\hat{x}$ of minimal $L_2$-norm such that $\|K\hat{x} - y\|_{L_2} = \inf\{\|Kx - y\|_{L_2} : x \in L_2(0,1)\}$.
Apart from the quality of the collocation data, the achievable accuracy in recovering $\hat{x}$ is essentially determined by its smoothness. The benchmark for the smoothness of $\hat{x}$ is provided by the Picard criterion, which is based on the singular value decomposition of the integral operator $K$ from Eq. (1.1),
$$(Kx)(s) = \sum_{k=1}^\infty a_k \langle v_k, x\rangle_{L_2}\, u_k(s),$$
where $\{v_k\}$ and $\{u_k\}$ are orthonormal systems of eigenfunctions of the operators $K^*K$ and $KK^*$, respectively, and $a_1^2, a_2^2, \dots$ are the corresponding eigenvalues. The Moore–Penrose generalized solution $\hat{x}$ of (1.1) is then given by
$$\hat{x}(t) = \sum_{k=1}^\infty \frac{\langle y, u_k\rangle_{L_2}}{a_k}\, v_k(t). \qquad (2.6)$$
The Picard criterion asserts that $\hat{x} \in L_2(0,1)$ if and only if $\sum_{k=1}^\infty |\langle y, u_k\rangle_{L_2}|^2 / a_k^2 < \infty$, which implies a minimal decay of the Fourier coefficients $\langle y, u_k\rangle_{L_2}$. Therefore, it seems natural to measure the smoothness of $\hat{x}$ by enforcing some faster decay. More precisely, we require the stronger condition
$$\sum_{k=1}^\infty \frac{\langle y, u_k\rangle_{L_2}^2}{a_k^2\, \varphi^2(a_k^2)} < \infty \qquad (2.7)$$
for some index function $\varphi$; a condition of this type can be rewritten as the source condition (1.4).

Proposition 2. Let $A$, $B$ be bounded non-negative self-adjoint operators and let $b > 0$ be such that $\max\{\|A\|, \|B\|\} \le b$. Suppose $\varphi$ is an operator monotone index function on $[0,a]$, where $a > b$. Then there exists a constant $c_\varphi$ depending on $a - b$ such that
$$\|\varphi(A) - \varphi(B)\| \le c_\varphi\, \varphi(\|A - B\|).$$
Moreover, there exists a constant $d > 0$ such that
$$d\, \frac{\varphi(t)}{t} \le \frac{\varphi(\lambda)}{\lambda} \qquad (2.8)$$
whenever $0 < \lambda \le t < a$.

Thus, an operator monotone index function $\varphi$ allows us to estimate the norm of $\varphi(K^*K) - \varphi(K_n^*K_n)$. Therefore, in our analysis we will rely on the following assumption.

Assumption 3. Assume that the Moore–Penrose generalized solution $\hat{x}$ of Eq. (1.1) meets the source condition (1.4), and that the index function $\varphi$ is operator monotone on an interval $[0,a]$ such that $a > b \ge \max\{\|K^*K\|, \|K_n^*K_n\|\}$ for all $n = 1,2,\dots$, where $K$, $K_n$ are the operators from Eqs. (1.1), (1.2), respectively.

Example 3. Let us again consider the integral operator $K$ with the kernel $k(s,t)$ as in Example 1. Then $\|K^*K\| = \|K\|^2 \le \sup\{|k(s,t)|^2 : s,t \in [0,1]\} \le \kappa^2$. Moreover, from the proof of Proposition 1 it follows that
$$\|K_n^*K_n\| \le \sup_{s,t\in[0,1]} \sum_{i=1}^n \gamma_i^n\, |k(\tau_i^n, s)\, k(\tau_i^n, t)| \le \gamma\kappa^2,$$
where $\gamma$ is as in Assumption 1. Thus, in this case, Assumption 3 is satisfied with any index function $\varphi$ which is operator monotone on an interval $[0,a]$ with $a > \max\{\kappa^2, \gamma\kappa^2\}$.

Recall that in the theory of Tikhonov regularization the index functions $\varphi(t) = t^\nu$, $0 < \nu \le 1$, are traditionally considered (see e.g. [6]). These functions are operator monotone on $[0,\infty)$. Severely ill-posed problems correspond to index functions $\varphi(t) = \log^{-\nu}(1/t)$ or $\varphi(t) = \log^{-\nu}(\log(1/t))$, $0 < \nu \le 1$ (see [3,9,18]). These functions are operator monotone on $[0,1)$ and can be used within the framework of Assumption 3 for sufficiently small $\kappa$. Since the operator $K$ can always be scaled so that the spectrum of $K^*K$ lies in $[0,1)$, one can conclude that Assumption 3 covers all types of smoothness studied so far in the theory of Tikhonov regularization.
3. Regularization

For the sake of simplicity, we will assume in the sequel that Eq. (1.1) has at least one $L_2$-solution $\hat{x}$. Then $y(t) = (K\hat{x})(t)$, but in accordance with Assumption 2, for any $n$ we are able to receive only noisy collocation data $y_n^\delta = (y_1^\delta, y_2^\delta, \dots, y_n^\delta) \in \mathbb{R}^n_\gamma$. Recall from (2.5) that
$$\|T_n K\hat{x} - y_n^\delta\|_{\gamma,n} \le \sqrt{\gamma}\,\delta, \qquad (3.1)$$
where $\gamma$ is the constant from Assumption 1. Our problem now is to recover $\hat{x}$ from the finite-dimensional operator equation $K_n x = y_n^\delta$, which is a perturbed version of (1.2). Regularizing this equation by the Tikhonov method, we obtain a one-parameter family of equations
$$\alpha x + K_n^* K_n x = K_n^* y_n^\delta, \qquad (3.2)$$
where $\alpha > 0$ is called the regularization parameter. For any $\alpha > 0$, the unique solution $x_{\alpha,n}^\delta$ of (3.2) is considered as a regularized collocation approximation for $\hat{x}$.

Before analyzing this procedure, let us derive a representation of $x_{\alpha,n}^\delta$. Clearly, it belongs to $\mathrm{Range}(K_n^*)$. From Proposition 1 we know that $\mathrm{Range}(K_n^*)$ is spanned by $\{\gamma_i^n k(\tau_i^n, \cdot)\}_{i=1}^n$. Hence, $x_{\alpha,n}^\delta$ can be represented as
$$x_{\alpha,n}^\delta = \sum_{j=1}^n c_j\, \gamma_j^n\, k(\tau_j^n, \cdot),$$
where the coefficients $c_j$ can be found from the system of linear equations
$$\alpha c_i + \sum_{j=1}^n a_{ij} c_j = y_i^\delta, \qquad i = 1,2,\dots,n,$$
with
$$a_{ij} = \gamma_j^n \int_0^1 k(\tau_i^n, t)\, k(\tau_j^n, t)\, dt.$$
This system can be written in matrix form as
$$\alpha c + Ac = y_n^\delta, \qquad (3.3)$$
where $A = MW$ with
$$W = \mathrm{diag}(\gamma_1^n, \dots, \gamma_n^n), \qquad M = [m_{ij}], \qquad m_{ij} = \int_0^1 k(\tau_i^n, t)\, k(\tau_j^n, t)\, dt.$$
We observe that $\langle u, v\rangle_{\gamma,n} = \langle Wu, v\rangle_{\mathbb{R}^n}$, and
$$\langle Au, v\rangle_{\gamma,n} = \langle WAu, v\rangle_{\mathbb{R}^n} = \langle WMWu, v\rangle_{\mathbb{R}^n},$$
so that
$$\langle Au, v\rangle_{\gamma,n} = \langle u, Av\rangle_{\gamma,n} \qquad \forall\, u, v \in \mathbb{R}^n_\gamma,$$
i.e., $A$ is self-adjoint with respect to $\langle\cdot,\cdot\rangle_{\gamma,n}$. To see that $A$ is a positive operator, we observe that $\langle Au, u\rangle_{\gamma,n} = \langle MWu, Wu\rangle_{\mathbb{R}^n}$. Now, taking $\Phi(t) = [\Phi_1(t), \dots, \Phi_n(t)]$ with $\Phi_i(t) = k(\tau_i^n, t)$, we conclude that
$$\langle MWu, Wu\rangle = \int_0^1 \sum_{i,j} \Phi_i(t)\Phi_j(t)\, \gamma_j^n u_j\, \gamma_i^n u_i\, dt = \int_0^1 \Big( \sum_{j=1}^n \gamma_j^n u_j \Phi_j(t) \Big)^2 dt \ge 0.$$
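The computations above can be sketched end-to-end: assemble $M$ and $W$, solve $(\alpha I + MW)c = y_n^\delta$, and evaluate $x_{\alpha,n}^\delta = \sum_j c_j \gamma_j^n k(\tau_j^n, \cdot)$. The kernel, the collocation points, the noise and the value of $\alpha$ below are all hypothetical placeholders chosen for illustration, and the integrals in $m_{ij}$ are replaced by a midpoint quadrature:

```python
import numpy as np

def regularized_collocation(k, tau, gam, y_delta, alpha, N=2000):
    """Solve alpha*c + M W c = y_delta (Eq. (3.3)) and return the function
    x(t) = sum_j c_j gamma_j k(tau_j, t); quadrature is a plain midpoint rule."""
    t = (np.arange(N) + 0.5) / N
    Phi = k(tau[:, None], t[None, :])          # row i holds k(tau_i, .)
    M = Phi @ Phi.T / N                        # m_ij = int k(tau_i,t) k(tau_j,t) dt
    c = np.linalg.solve(alpha * np.eye(len(tau)) + M @ np.diag(gam), y_delta)
    return lambda tt: (c * gam) @ k(tau[:, None], tt[None, :])

# Toy problem (invented): kernel min(s,t), exact solution x_hat(t) = sin(pi t).
k = lambda s, t: np.minimum(s, t)
n = 40
tau = (np.arange(n) + 0.5) / n
gam = np.full(n, 1.0 / n)
tt = (np.arange(2000) + 0.5) / 2000
x_hat = np.sin(np.pi * tt)
y = (k(tau[:, None], tt[None, :]) * x_hat).sum(axis=1) / 2000   # exact data at tau
delta = 1e-4
y_delta = y + delta * np.random.default_rng(1).uniform(-1.0, 1.0, n)

x_alpha = regularized_collocation(k, tau, gam, y_delta, alpha=1e-5)
err = np.sqrt(np.mean((x_alpha(tt) - x_hat) ** 2))
print(err)                                     # modest L2 error for this alpha
```

Note that only an $n \times n$ system is solved, regardless of how finely the solution is later evaluated; this is the practical content of the representation of $x_{\alpha,n}^\delta$ in $\mathrm{Range}(K_n^*)$.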
Thus, for any positive value of the regularization parameter $\alpha$ the matrix $\alpha I + A$ of the system (3.3) is a strictly positive and self-adjoint operator on $\mathbb{R}^n_\gamma$. Therefore, this system is uniquely solvable.

The following lemma can be derived from [13, formula (4)]. For the reader's convenience we present its proof below.

Lemma 1. Under Assumptions 1–3, we have
$$\|\hat{x} - x_{\alpha,n}^\delta\|_{L_2} \le \hat{c}\,[\varphi(\alpha) + \varphi(\kappa_n)] + \frac{\sqrt{\gamma}\,\delta}{2\sqrt{\alpha}},$$
where the constant $\hat{c}$ does not depend on $\delta$ and $n$. In particular, for $\alpha > 0$ belonging to the domain of $\varphi$, if $n := n(\alpha)$ is the least positive integer such that $\kappa_{n(\alpha)} \le \alpha$, then
$$\|\hat{x} - x_{\alpha,n}^\delta\|_{L_2} \le 2\hat{c}\,\varphi(\alpha) + \frac{\sqrt{\gamma}\,\delta}{2\sqrt{\alpha}}.$$

Proof. Using the notation $g_\alpha(t) = (\alpha + t)^{-1}$ one can represent $x_{\alpha,n}^\delta$ as
$$x_{\alpha,n}^\delta = (\alpha I + K_n^*K_n)^{-1} K_n^* y_n^\delta = g_\alpha(K_n^*K_n)\, K_n^* y_n^\delta.$$
Moreover, from Proposition 3 of [12] we know that for any function $\varphi$ satisfying (2.8) we have
$$\sup_{t\in[0,a]} |(1 - g_\alpha(t)t)\,\varphi(t)| \le \frac{\varphi(\alpha)}{d}, \qquad (3.4)$$
where $d$ is as in Proposition 2. Observe now that
$$\hat{x} - x_{\alpha,n}^\delta = g_\alpha(K_n^*K_n) K_n^* (T_n K\hat{x} - y_n^\delta) + (I - g_\alpha(K_n^*K_n) K_n^*K_n)\hat{x}. \qquad (3.5)$$
Using (3.1) and spectral theory one can estimate the first summand on the right as
$$\|g_\alpha(K_n^*K_n) K_n^* (T_n K\hat{x} - y_n^\delta)\| \le \|g_\alpha(K_n^*K_n) K_n^*\|_{\mathbb{R}^n_\gamma \to L_2}\, \|T_n K\hat{x} - y_n^\delta\|_{\gamma,n} \le \sup_t |\sqrt{t}\,(\alpha + t)^{-1}|\, \sqrt{\gamma}\,\delta \le \frac{\sqrt{\gamma}\,\delta}{2\sqrt{\alpha}}.$$
We use (1.4) to decompose the second summand in (3.5) as
$$(I - g_\alpha(K_n^*K_n) K_n^*K_n)\hat{x} = (I - g_\alpha(K_n^*K_n) K_n^*K_n)\,\varphi(K_n^*K_n)v + (I - g_\alpha(K_n^*K_n) K_n^*K_n)\big(\varphi(K^*K) - \varphi(K_n^*K_n)\big)v.$$
Now, (3.4) and spectral theory give us
$$\|(I - g_\alpha(K_n^*K_n) K_n^*K_n)\,\varphi(K_n^*K_n)v\| \le \|v\| \sup_t |(1 - g_\alpha(t)t)\,\varphi(t)| \le \frac{\|v\|}{d}\,\varphi(\alpha).$$
Since $\|I - g_\alpha(K_n^*K_n) K_n^*K_n\| \le \sup_\lambda |1 - g_\alpha(\lambda)\lambda| \le 1$, by Propositions 1 and 2 we have
$$\|(I - g_\alpha(K_n^*K_n) K_n^*K_n)\big(\varphi(K^*K) - \varphi(K_n^*K_n)\big)v\| \le \|v\|\,\|\varphi(K^*K) - \varphi(K_n^*K_n)\| \le c_\varphi \|v\|\,\varphi(\|K^*K - K_n^*K_n\|) \le c_\varphi \|v\|\,\varphi(\kappa_n).$$
Summing up the estimates above, we obtain the statement of the lemma. □
The function $\theta(\cdot)$ defined by
$$\theta(t) = \sqrt{t}\,\varphi(t), \qquad t \in [0, \|K\|^2],$$
turns out to be important in the a priori choice of the regularization parameter. Note that for $\delta > 0$, $\alpha = \theta^{-1}(\delta)$ if and only if $\varphi(\alpha) = \delta/\sqrt{\alpha}$. Also, for $\alpha > 0$, we use the notation $n(\alpha)$ for the least positive integer that satisfies $\kappa_{n(\alpha)} \le \alpha$, where $\{\kappa_n\}$ is the sequence introduced in Assumption 1.

Theorem 1. Let Assumptions 1–3 be satisfied and, for $\delta > 0$ in the range of $\theta(\cdot)$, let $\alpha = \theta^{-1}(\delta)$. Let $n = n(\alpha)$ be the least positive integer such that $\kappa_{n(\alpha)} \le \alpha$. Then
$$\|\hat{x} - x_{\alpha,n}^\delta\|_{L_2} \le c\,\varphi(\theta^{-1}(\delta)),$$
where the constant $c$ does not depend on $\delta$.

Proof. From Lemma 1 we have
$$\|\hat{x} - x_{\alpha,n(\alpha)}^\delta\|_{L_2} \le 2\hat{c}\,\varphi(\alpha) + \frac{\sqrt{\gamma}\,\delta}{2\sqrt{\alpha}}.$$
Moreover, by definition $\delta/\sqrt{\theta^{-1}(\delta)} = \varphi(\theta^{-1}(\delta))$. Thus, for $\alpha = \theta^{-1}(\delta)$,
$$\|\hat{x} - x_{\alpha,n(\alpha)}^\delta\|_{L_2} \le 2\hat{c}\,\varphi(\theta^{-1}(\delta)) + \frac{\sqrt{\gamma}}{2}\,\varphi(\theta^{-1}(\delta)) = c\,\varphi(\theta^{-1}(\delta)). \qquad \square$$

Suppose that the source condition (1.4) is given with a known index function $\varphi$. The above theorem shows that in order to attain the best possible order of accuracy, it is sufficient to choose $\alpha = \theta^{-1}(\delta)$ and $n = n(\alpha)$ as in Theorem 1.

Remark 1. It has been shown in [12] that, typically, the order of accuracy $\varphi(\theta^{-1}(\delta))$ cannot be improved as far as the source condition (1.4) is concerned. Therefore, this rate can serve as a benchmark for error estimates. Several authors have investigated schemes where the approximation $K_n^*K_n$ to $K^*K$ depends on $\alpha$. In [7] the condition $\|K^*K - K_n^*K_n\| \le \alpha^2$ is shown to allow the optimal order of accuracy for the source condition given by $\varphi(t) = t^\nu$, $0 < \nu \le 1$. This is improved to
$$\|K^*K - K_n^*K_n\| \le \alpha \qquad (3.6)$$
in [20], but another condition, namely
$$\|(K^*K - K_n^*K_n)\hat{x}\|_{L_2} \le \alpha^{\nu+1},$$
is also used there. Our Theorem 1 can be seen as an improvement of these results. It shows that condition (3.6) alone allows an optimal order of accuracy under source conditions (1.4) with operator monotone index functions $\varphi$, which cover $\varphi(t) = t^\nu$ as a particular case. Note also that the special structure of the finite-dimensional operator $K_n = T_n K$ is not used in the proofs of Theorem 1 and Lemma 1.

4. Adaptive choice of the regularization parameter

An a priori parameter choice $\alpha = \theta^{-1}(\delta)$ can seldom be used in practice, because the smoothness properties of the unknown solution $\hat{x}$, reflected in the index function $\varphi$ from (1.4), are generally unknown. In this section our focus is on the question of how to adapt the regularization parameter $\alpha$ to the unknown $\varphi$ in such a way that the optimal order of accuracy $\varphi(\theta^{-1}(\delta))$ is reached automatically. Such an adaptive strategy has been proposed recently in [12]; its generalization has been studied in [19]. It is the only known strategy that can be applied within the framework of the Tikhonov scheme without the saturation effect, i.e., it allows us to reach the best order of accuracy for all linear problems that can in principle be treated in an optimal way within the Tikhonov method.

For our subsequent analysis we need a fact proved in [14] (see Lemma 3 in [14]). From this result it follows that for any operator monotone index function $\varphi$ and $\rho > 0$, there are positive constants $c_\rho$, $c'_\rho$ depending only on $\varphi$ and $\rho$ such that
$$c_\rho\,\varphi(t) \le \varphi(\rho t) \le c'_\rho\,\varphi(t) \qquad (4.1)$$
for all $t$ such that $t$ and $\rho t$ belong to the domain of $\varphi$.

In practical applications the values of the regularization parameter are often selected from some geometric sequence
$$G_q^M = \{\alpha_i = \alpha_0 q^i,\ i = 0, 1, \dots, M\}$$
with $\alpha_0 = \delta^2$, $q > 1$, where $M$ is determined from $q^{M-1}\alpha_0 \le 1 < q^M \alpha_0$. Lemma 1 above can be used to propose an adaptive strategy for choosing the number $n$ of collocation points along with $\alpha \in G_q^M$ such that the error estimate is not spoiled by the discretization. From Lemma 1 we know that
$$\|\hat{x} - x_{\alpha,n(\alpha)}^\delta\|_{L_2} \le 2\hat{c}\,\varphi(\alpha) + \frac{\sqrt{\gamma}\,\delta}{2\sqrt{\alpha}}. \qquad (4.2)$$
Of course, for any $n > n(\alpha)$ the error $\|\hat{x} - x_{\alpha,n}^\delta\|$ also admits this estimate, but the linear algebraic system that has to be solved for constructing $x_{\alpha,n}^\delta$ is larger.

In the sequel we will assume that $2\hat{c}\,\varphi(\delta^2) < \sqrt{\gamma}/4$. This is not a restriction at all, because in the opposite case the right-hand side of (4.2) is larger than the constant $\sqrt{\gamma}/4$ for any $\alpha \in G_q^M$. Clearly, such an error bound would be too rough.
The estimate (4.2) allows an application of Theorem 1 from [19]. Directly from this theorem it follows that if
$$\alpha_+ = \max\Big\{ \alpha_j \in G_q^M : \|x_{\alpha_j,n(\alpha_j)}^\delta - x_{\alpha_i,n(\alpha_i)}^\delta\| \le \frac{2\sqrt{\gamma}\,\delta}{\sqrt{\alpha_i}},\ i = 0,1,\dots,j \Big\}, \qquad (4.3)$$
then
$$\|\hat{x} - x_{\alpha_+,n(\alpha_+)}^\delta\|_{L_2} \le 12\hat{c}\sqrt{q}\,\varphi(\bar{\alpha}), \qquad (4.4)$$
where $\bar{\alpha}$ is the solution of the equation $2\hat{c}\,\varphi(\alpha) = \frac{\sqrt{\gamma}\,\delta}{2\sqrt{\alpha}}$. Note that the choice $\alpha = \alpha_+$ does not require any knowledge of the index function $\varphi$.

Theorem 2. Let Assumptions 1–3 be satisfied. Then
$$\|\hat{x} - x_{\alpha_+,n(\alpha_+)}^\delta\|_{L_2} \le c\,\varphi(\theta^{-1}(\delta)),$$
where the constant $c$ does not depend on $\delta$.

Proof. Observe that $\bar{\alpha}$ from (4.4) can be represented as $\bar{\alpha} = \theta^{-1}(c_1\delta)$, where $c_1 = \sqrt{\gamma}/(4\hat{c})$. If $c_1 \le 1$ then the statement of the theorem follows directly from (4.4). Assume that $c_1 > 1$, i.e., $\bar{\alpha} > \alpha_{\mathrm{opt}} = \theta^{-1}(\delta)$. Then
$$c_1 = \frac{\theta(\bar{\alpha})}{\theta(\alpha_{\mathrm{opt}})} = \frac{\sqrt{\bar{\alpha}}\,\varphi(\bar{\alpha})}{\sqrt{\alpha_{\mathrm{opt}}}\,\varphi(\alpha_{\mathrm{opt}})} \ge \sqrt{\frac{\bar{\alpha}}{\alpha_{\mathrm{opt}}}} \quad\Longrightarrow\quad \bar{\alpha} \le c_1^2\,\alpha_{\mathrm{opt}}.$$
Using (4.1) and (4.4), we finally obtain
$$\|\hat{x} - x_{\alpha_+,n(\alpha_+)}^\delta\|_{L_2} \le 12\hat{c}\sqrt{q}\,\varphi\big(c_1^2\,\theta^{-1}(\delta)\big) \le c\,\varphi(\theta^{-1}(\delta)). \qquad \square$$

Theorem 2 tells us that the adaptive parameter choice strategy (4.3) leads to accuracy of the optimal order $\varphi(\theta^{-1}(\delta))$ and does not require any knowledge of the solution smoothness.

Remark 2. In the routine (4.3), at any value $\alpha \in G_q^M$ a regularized solution $x_{\alpha,n(\alpha)}^\delta$ requires us to deal with $n(\alpha)$ collocation points. In the context of Example 1, given $\alpha$, the number $n$ of points is chosen such that $\kappa_n = (n-1)^{-2}\kappa^2/8 \asymp \alpha$, i.e. $n(\alpha) \asymp \alpha^{-1/2}$. Applying the strategy (4.3), we start with $\alpha = \alpha_0 = \delta^2$. Thus, the number of collocation points used in the routine (4.3) has the order $n(\alpha_0) \asymp \delta^{-1}$.

In Example 2 we have discussed a situation where the collocation data noise is bounded by $\varepsilon_m |||y||| + \beta\delta$, where $\varepsilon_m = (m-1)^{-2}/8$, $m$ is the number of observations (2.1), and $\delta$ is the level of the measurement error. If the number of observations is such that $\varepsilon_m \asymp \delta$, then $m = m(\delta) \asymp \delta^{-1/2}$, while $n(\alpha_0) \asymp \delta^{-1}$. Thus, in the case considered, the number of observations is essentially smaller than the number of collocation points required in the routine (4.3) to reach the accuracy benchmark $\varphi(\theta^{-1}(\delta))$.
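The adaptive choice (4.3) only compares computed solutions at successive grid values of $\alpha$. It can be illustrated on a synthetic diagonal (eigenbasis) model standing in for $x^\delta_{\alpha,n(\alpha)}$; the eigenvalues, solution coefficients and constants below are invented for the illustration, and the threshold uses $\gamma = 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
k = np.arange(1, 201)
a2 = 1.0 / k**2                    # eigenvalues of K*K (synthetic model)
xhat = 1.0 / k**1.5                # unknown solution coefficients
delta = 1e-3
noise = rng.standard_normal(200)
noise *= delta / np.linalg.norm(noise)        # noise of norm exactly delta
b = np.sqrt(a2) * xhat + noise                # noisy data in the eigenbasis

def x_alpha(alpha):
    # Tikhonov filter (alpha*I + K*K)^{-1} K^* y^delta, written in the eigenbasis
    return np.sqrt(a2) * b / (alpha + a2)

q, alphas = 1.5, [delta**2]                   # geometric grid G_q^M, alpha_0 = delta^2
while alphas[-1] * q <= 1.0:
    alphas.append(alphas[-1] * q)
sols = [x_alpha(al) for al in alphas]

# alpha_+: largest alpha_j consistent with all alpha_i <= alpha_j, cf. (4.3)
j_plus = 0
for j in range(len(alphas)):
    if all(np.linalg.norm(sols[j] - sols[i]) <= 2 * delta / np.sqrt(alphas[i])
           for i in range(j + 1)):
        j_plus = j

errs = [np.linalg.norm(s - xhat) for s in sols]
print(errs[j_plus], min(errs))     # adaptive error vs. best error on the grid
```

No knowledge of the index function enters the loop: only the computed solutions and the noise level $\delta$ are used, in line with Theorem 2.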
Acknowledgments

This study started when the second author visited the Department of Mathematics, Indian Institute of Technology Madras. He wants to express his gratitude for the excellent conditions and hospitality.

References

[1] F. Bauer, S. Pereverzev, Regularization without preliminary knowledge of smoothness and error behaviour, European J. Appl. Math. 16 (3) (2005) 303–317.
[2] A. Böttcher, B. Hofmann, U. Tautenhahn, M. Yamamoto, Convergence rates for Tikhonov regularization from different kinds of smoothness conditions, Appl. Anal. 85 (2006).
[3] G. Bruckner, J. Elschner, M. Yamamoto, An optimization method for grating profile reconstruction, Progress in Analysis, vol. I, II (Berlin, 2001), World Scientific Publishing, River Edge, NJ, 2003, pp. 1391–1404.
[4] G. Bruckner, S. Prössdorf, G. Vainikko, Error bounds of discretization methods for boundary integral equations with noisy data, Appl. Anal. 63 (1–2) (1996) 25–37.
[5] F.R. de Hoog, R.S. Anderssen, Regularization of first kind integral equations with application to Couette viscometry, J. Integral Equations Appl. (2006).
[6] H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems, Kluwer, Dordrecht, 1996.
[7] C.W. Groetsch, Convergence analysis of a regularized degenerate kernel method for Fredholm integral equations of the first kind, Integral Equations Operator Theory 13 (1) (1990) 67–75.
[8] F. Hansen, Operator inequalities associated with Jensen's inequality, Survey on Classical Inequalities, Mathematics and Its Applications, vol. 517, Kluwer Acad. Publ., Dordrecht, 2000, pp. 67–98.
[9] T. Hohage, Regularization of exponentially ill-posed problems, Numer. Funct. Anal. Optim. 21 (2000) 439–464.
[10] M.A. Lukas, Comparisons of parameter choice methods for regularization with discrete noisy data, Inverse Problems 14 (1) (1998) 161–184.
[11] P. Mathé, S.V. Pereverzev, Moduli of continuity for operator valued functions, Numer. Funct. Anal. Optim. 23 (5–6) (2002) 623–631.
[12] P. Mathé, S.V. Pereverzev, Geometry of linear ill-posed problems in variable Hilbert scales, Inverse Problems 19 (3) (2003) 789–803.
[13] P. Mathé, S.V. Pereverzev, Discretization strategy for linear ill-posed problems in variable Hilbert scales, Inverse Problems 19 (6) (2003) 1263–1277.
[14] P. Mathé, S.V. Pereverzev, Regularization of some linear inverse problems with discretized random noisy data, Math. Comput. 75 (2006) 1913–1929.
[15] M.T. Nair, E. Schock, U. Tautenhahn, Morozov's discrepancy principle under general source conditions, J. Anal. Appl. 22 (2003) 199–214.
[16] D.W. Nychka, D.D. Cox, Convergence rates for regularized solutions of integral equations from discrete noisy data, Ann. Statist. 17 (2) (1989) 556–572.
[17] S.V. Pereverzev, S. Prössdorf, On the characterization of self-regularization properties of a fully discrete projection method for Symm's integral equation, J. Integral Equations Appl. 12 (2) (2000) 113–130.
[18] S.V. Pereverzev, E. Schock, Morozov's discrepancy principle for Tikhonov regularization of severely ill-posed problems in finite-dimensional subspaces, Numer. Funct. Anal. Optim. 21 (7–8) (2000) 901–916.
[19] S.V. Pereverzev, E. Schock, On the adaptive selection of the parameter in regularization of ill-posed problems, SIAM J. Numer. Anal. 43 (2005) 2060–2076.
[20] M.P. Rajan, Convergence analysis of a regularized approximation for solving Fredholm integral equations of the first kind, J. Math. Anal. Appl. 279 (2) (2003) 522–530.
[21] G.M. Vainikko, A.Y. Veretennikov, Iteration Procedures in Ill-Posed Problems (in Russian), Nauka, Moscow, 1986.
Journal of Complexity 23 (2007) 468 – 497 www.elsevier.com/locate/jco
Wavelet para-bases and sampling numbers in function spaces on domains Hans Triebel Mathematisches Institut, Friedrich-Schiller-Universität Jena, D-07737 Jena, Germany Received 19 May 2006; accepted 23 August 2006 Available online 13 November 2006
Abstract This paper deals with wavelet frames (para-bases), local polynomial reproducing formulas, and sampling numbers in function spaces on arbitrary and on E-thick domains in Euclidean n-space. In an Appendix we collect some recent instruments for corresponding function spaces on Euclidean n-space. © 2006 Elsevier Inc. All rights reserved. MSC: 46E35; 41A25; 41A46; 42B35; 42C40 Keywords: Wavelets on domains; Function spaces; Sampling numbers; Polynomial reproducing formulas
E-mail address: [email protected].

0885-064X/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2006.08.002

1. Introduction

Unique wavelet representations in the function spaces $B^s_{pq}(\mathbb{R}^n)$ and $F^s_{pq}(\mathbb{R}^n)$ are known for all admitted parameters $s \in \mathbb{R}$, $0 < p \le \infty$ ($p < \infty$ for the $F$-spaces), $0 < q \le \infty$. They are unconditional bases if $p < \infty$, $q < \infty$. The situation for corresponding spaces on domains $\Omega$ in $\mathbb{R}^n$ is less favourable, even if $\Omega$ is an interval or a cube and even if only classical function spaces are considered. But this problem has attracted a lot of attention. The state of the art may be found in [1–4,8,10]. In [21,20, Section 4.2] we offered a new approach for some (sub-)spaces of $B^s_{pq}(\Omega)$ and $F^s_{pq}(\Omega)$ for bounded Lipschitz domains $\Omega$ in $\mathbb{R}^n$. This resulted in what we called para-bases. It is one aim of this paper to extend these considerations to more general (and more natural) domains in $\mathbb{R}^n$. But we shift a comprehensive study of these problems to a later occasion, restricting ourselves here to those assertions needed for the second (and main) purpose of this paper. We wish to demonstrate the symbiotic relationship between the recent theory of function spaces and some questions of numerical analysis such as local polynomial reproducing formulas and the accuracy of reconstructing functions belonging to some function spaces by means of function values, resulting in (linear and non-linear) sampling numbers. This is the direct continuation of [12,20, Sections 4.3, 4.4].

For the reasons just outlined this is not a paper about (general) function spaces. This may justify collecting what we need in Appendix A. In addition to basic definitions we describe there those instruments of the recent theory of function spaces on $\mathbb{R}^n$ which, we believe, complement more classical tools (such as derivatives and differences) in a decisive way. We hope that this Appendix may also serve as a little specific self-contained survey. We give references, but some assertions are formulated here for the first time, at least in the sharp versions presented.

The paper is organised as follows. Section 2 deals with refined localisation spaces $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ on arbitrary domains $\Omega$ in $\mathbb{R}^n$ and characterisations in terms of wavelet para-bases. But first we remind of (classical and fractional) Sobolev spaces, classical Besov spaces and Hölder–Zygmund spaces as special cases of the spaces $B^s_{pq}$ and $F^s_{pq}$. A reader who is not familiar with the theory of the more general spaces may simply identify what follows with these special cases. In Section 3 we introduce E-thick domains (with bounded Lipschitz domains and snowflake domains as distinguished examples) and consider wavelet para-bases for the related spaces $\tilde{B}^s_{pq}(\Omega)$ and $\tilde{F}^s_{pq}(\Omega)$. Section 4 deals with wavelet $J$-para-bases in related $B$-spaces and $F$-spaces and respective local polynomial reproducing formulas. Clipping all together we arrive finally in Section 5 at sampling numbers of compact embeddings of some of these spaces into $L_t(\Omega)$ with $0 < t \le \infty$.

2. Spaces on arbitrary domains

2.1. Distinguished spaces

We use the notation according to Appendix A, including Definition 24 where we introduced the spaces $B^s_{pq}(\mathbb{R}^n)$ and $F^s_{pq}(\mathbb{R}^n)$. But we describe a few distinguished special cases.
A reader who is not familiar with the general spaces $B^s_{pq}(\mathbb{R}^n)$ and $F^s_{pq}(\mathbb{R}^n)$ may identify what follows with these special cases.

(i) Recall the Paley–Littlewood theorem
$$L_p(\mathbb{R}^n) = F^0_{p,2}(\mathbb{R}^n), \qquad 1 < p < \infty. \qquad (2.1)$$

(ii) Furthermore,
$$F^s_{p,2}(\mathbb{R}^n) = W^s_p(\mathbb{R}^n), \qquad s \in \mathbb{N}_0,\ 1 < p < \infty, \qquad (2.2)$$
are the classical Sobolev spaces, usually normed by
$$\|f \mid W^s_p(\mathbb{R}^n)\| = \Big( \sum_{|\alpha| \le s} \|D^\alpha f \mid L_p(\mathbb{R}^n)\|^p \Big)^{1/p}.$$

(iii) Recall that
$$I_\sigma : f \mapsto \big( \langle\xi\rangle^\sigma \hat{f} \big)^\vee, \qquad \sigma \in \mathbb{R},\ \langle\xi\rangle = (1 + |\xi|^2)^{1/2},$$
is a one-to-one map of $S(\mathbb{R}^n)$ onto itself and of $S'(\mathbb{R}^n)$ onto itself. It is a lift in the spaces $B^s_{pq}(\mathbb{R}^n)$ and $F^s_{pq}(\mathbb{R}^n)$,
$$I_\sigma B^s_{pq}(\mathbb{R}^n) = B^{s-\sigma}_{pq}(\mathbb{R}^n), \qquad I_\sigma F^s_{pq}(\mathbb{R}^n) = F^{s-\sigma}_{pq}(\mathbb{R}^n),$$
for all admitted $s$, $p$, $q$. In particular,
$$H^s_p(\mathbb{R}^n) = I_{-s} L_p(\mathbb{R}^n), \qquad s \in \mathbb{R},\ 1 < p < \infty,$$
are the (fractional) Sobolev spaces, with the classical Sobolev spaces
$$H^s_p(\mathbb{R}^n) = W^s_p(\mathbb{R}^n), \qquad s \in \mathbb{N}_0,\ 1 < p < \infty,$$
as special cases.

(iv) Let
$$\Delta^1_h f(x) = f(x + h) - f(x), \qquad \Delta^{l+1}_h f(x) = \Delta^1_h\big(\Delta^l_h f\big)(x), \qquad (2.3)$$
where $x \in \mathbb{R}^n$, $h \in \mathbb{R}^n$, $l \in \mathbb{N}$, be the iterated differences in $\mathbb{R}^n$. Then the Hölder–Zygmund spaces $C^s(\mathbb{R}^n)$, $s > 0$, can be (equivalently) normed by
$$\|f \mid C^s(\mathbb{R}^n)\|_m = \sup_{x\in\mathbb{R}^n} |f(x)| + \sup |h|^{-s}\, |\Delta^m_h f(x)|,$$
where $0 < s < m \in \mathbb{N}$. The second supremum is taken over all $x \in \mathbb{R}^n$ and all $h \in \mathbb{R}^n$ with $0 < |h| \le 1$. One has
$$C^s(\mathbb{R}^n) = B^s_{\infty\infty}(\mathbb{R}^n), \qquad s > 0. \qquad (2.4)$$

(v) The last assertion can be generalised as follows. Let
$$0 < p \le \infty, \qquad 0 < q \le \infty, \qquad \sigma_p < s < m \in \mathbb{N},$$
with $\sigma_p$ as in (A.8). Then $B^s_{pq}(\mathbb{R}^n)$ can be equivalently quasi-normed by
$$\|f \mid B^s_{pq}(\mathbb{R}^n)\| = \|f \mid L_p(\mathbb{R}^n)\| + \Big( \int_{|h|\le 1} |h|^{-sq}\, \|\Delta^m_h f \mid L_p(\mathbb{R}^n)\|^q\, \frac{dh}{|h|^n} \Big)^{1/q}$$
(with the usual modification if $q = \infty$). If $1 \le p < \infty$, $1 \le q \le \infty$, then $B^s_{pq}(\mathbb{R}^n)$ are the classical Besov spaces.
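The iterated differences in (2.3) satisfy the binomial closed form $\Delta^m_h f(x) = \sum_{i=0}^m (-1)^{m-i}\binom{m}{i} f(x + ih)$, and $\Delta^m_h$ annihilates polynomials of degree $< m$ (which is why the quasi-norms above need $m > s$). A small sketch (illustrative, not from the paper) checking both facts in one dimension:

```python
from math import comb

def iterated_diff(f, x, h, m):
    """Delta_h^m f(x), built recursively as in (2.3)."""
    if m == 1:
        return f(x + h) - f(x)
    g = lambda t: f(t + h) - f(t)          # g = Delta_h^1 f
    return iterated_diff(g, x, h, m - 1)

def binomial_diff(f, x, h, m):
    """Equivalent closed form: sum_i (-1)^(m-i) C(m,i) f(x + i h)."""
    return sum((-1) ** (m - i) * comb(m, i) * f(x + i * h) for i in range(m + 1))

f = lambda t: t**3 - 2 * t + 1
for m in (1, 2, 3, 4):
    assert abs(iterated_diff(f, 0.3, 0.1, m) - binomial_diff(f, 0.3, 0.1, m)) < 1e-12
# Delta_h^m annihilates polynomials of degree < m: here deg f = 3, so m = 4 gives 0
assert abs(iterated_diff(f, 0.3, 0.1, 4)) < 1e-12
print("ok")
```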
Remark 1. Similar lists and (historical) references may be found in [15, Section 2.2.2; 20, Section 1.2].

2.2. Refined localisation spaces

Open sets $\Omega$ in $\mathbb{R}^n$ are denoted as domains. The refined localisation we have in mind is based on the well-known Whitney decomposition, applied to arbitrary domains $\Omega$ in $\mathbb{R}^n$ with $\Omega \ne \mathbb{R}^n$, in the version of Stein [13, Theorem 3, p. 16; Theorem 1, p. 167] adapted to our needs. Let
$$Q^0_{lr} \subset Q^1_{lr} \subset Q^2_{lr} \subset Q_{lr}, \qquad l \in \mathbb{N}_0,\ r = 1,2,\dots, \qquad (2.5)$$
be concentric open cubes in $\mathbb{R}^n$ with sides parallel to the axes of coordinates, centred at $2^{-l} m_r$ for some $m_r \in \mathbb{Z}^n$ and with the respective side-lengths $2^{-l}$, $5 \cdot 2^{-l-2}$, $6 \cdot 2^{-l-2}$, $2^{-l+1}$. According to the Whitney decomposition there are pairwise disjoint cubes $Q^0_{lr}$ of this type such that
$$\Omega = \bigcup_{l,r} \overline{Q^0_{lr}} \qquad\text{and}\qquad \mathrm{dist}(Q_{lr}, \partial\Omega) \sim 2^{-l} \qquad (2.6)$$
if $l \in \mathbb{N}$ and $r = 1,2,\dots$, complemented by $\mathrm{dist}(Q_{0r}, \partial\Omega) \ge c$ for some $c > 0$. We may assume $|l - l'| \le 1$ for two adjacent cubes $Q^0_{lr}$, $Q^0_{l'r'}$. Let $\varphi = \{\varphi_{lr}\}$ be a related resolution of unity by non-negative $C^\infty$ functions such that
$$\mathrm{supp}\,\varphi_{lr} \subset Q^1_{lr}, \qquad |D^\alpha \varphi_{lr}(x)| \le c_\alpha\, 2^{l|\alpha|},\ \alpha \in \mathbb{N}^n_0, \qquad (2.7)$$
for some $c_\alpha > 0$, and
$$\sum_{l=0}^\infty \sum_r \varphi_{lr}(x) = 1 \qquad\text{if } x \in \Omega. \qquad (2.8)$$
Let temporarily $F^s_{\infty\infty} = B^s_{\infty\infty}$. As usual, $D'(\Omega)$ is the collection of all distributions on $\Omega$.

Definition 2. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $\Omega \ne \mathbb{R}^n$. Let $0 < p \le \infty$, $0 < q \le \infty$ (with $q = \infty$ if $p = \infty$) and $s > \sigma_{pq}$. Then
$$F^{s,\mathrm{rloc}}_{pq}(\Omega) = \big\{ f \in D'(\Omega) : \|f \mid F^{s,\mathrm{rloc}}_{pq}(\Omega)\| < \infty \big\} \qquad (2.9)$$
with
$$\|f \mid F^{s,\mathrm{rloc}}_{pq}(\Omega)\| = \Big( \sum_{l=0}^\infty \sum_r \|\varphi_{lr} f \mid F^s_{pq}(\mathbb{R}^n)\|^p \Big)^{1/p} \qquad (2.10)$$
(usual modification if $p = q = \infty$).

Remark 3. Of course $\varphi_{lr} f$ with $f \in D'(\Omega)$ is extended by zero outside of $\Omega$. These spaces have a little history. In [17, Theorem 5.14] we proved for bounded $C^\infty$ domains $\Omega$ in $\mathbb{R}^n$ that (2.10) is an equivalent quasi-norm in the closed subspaces
$$\big\{ f \in F^s_{pq}(\mathbb{R}^n) : \mathrm{supp}\, f \subset \overline{\Omega} \big\}$$
of $F^s_{pq}(\mathbb{R}^n)$, denoted as the refined localisation property. We extended this assertion in [21,20, Proposition 4.20] to bounded Lipschitz domains in $\mathbb{R}^n$ under the additional restriction $p > 1$, $q > 1$. We return to this point below, but without this restriction. There is no counterpart for $B^s_{pq}$-spaces if $p \ne q$. Now we take (2.9), (2.10) as a definition and call $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ refined localisation spaces. One has to prove that $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ is independent of $\varphi = \{\varphi_{lr}\}$. Furthermore, we wish to characterise these spaces in terms of the ball means
$$d^M_{t,u} f(x) = \Big( t^{-n} \int_{|h|\le t} |(\Delta^M_h f)(x)|^u\, dh \Big)^{1/u}, \qquad x \in \mathbb{R}^n,\ t > 0, \qquad (2.11)$$
where $0 < u \le \infty$ (usual modification if $u = \infty$) and where $(\Delta^M_h f)(x)$ are the differences according to (2.3). Let
$$\delta(x) = \min\big( 1, \mathrm{dist}(x, \partial\Omega) \big), \qquad x \in \Omega.$$
As usual, $B(x,t)$ denotes a ball in $\mathbb{R}^n$ centred at $x \in \mathbb{R}^n$ and of radius $t > 0$. For $M \in \mathbb{N}$ let $\lambda$ with $0 < \lambda < 1$ and $c > 0$ be numbers such that
$$B(x, \lambda Mt) \subset \Omega, \qquad \mathrm{dist}\big( B(x, \lambda Mt), \partial\Omega \big) \ge c\,\delta(x)$$
for all $x \in \Omega$ and $0 < t \le \delta(x)$. Let $L_p(\Omega)$ with $0 < p \le \infty$ be the usual quasi-Banach space with respect to the Lebesgue measure, quasi-normed by
$$\|f \mid L_p(\Omega)\| = \Big( \int_\Omega |f(x)|^p\, dx \Big)^{1/p}$$
with the obvious modification if $p = \infty$.

Theorem 4. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $\Omega \ne \mathbb{R}^n$. Let
$$0 < p \le \infty, \qquad 0 < q \le \infty\ (\text{with } q = \infty \text{ if } p = \infty), \qquad s > \sigma_{pq}.$$
(i) Then $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ is a quasi-Banach space. It is independent of $\varphi = \{\varphi_{lr}\}$ (equivalent quasi-norms). Let
$$\max(1, p) < w \le \infty, \qquad s - \frac{n}{p} > -\frac{n}{w} \qquad (2.12)$$
(interpreted as $w = \infty$ if $p = \infty$). Then
$$F^{s,\mathrm{rloc}}_{pq}(\Omega) \hookrightarrow L_w(\Omega). \qquad (2.13)$$
(ii) Let $0 < u < \min(1, p, q)$ and $s < M \in \mathbb{N}$ in (2.11). Let $\lambda$ be as above. Then $f \in L_w(\Omega)$ (with $w$ as in (2.12)) belongs to $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ if, and only if,
$$\Big\| \Big( \int_0^{\lambda\delta(\cdot)} t^{-sq}\, d^M_{t,u} f(\cdot)^q\, \frac{dt}{t} \Big)^{1/q} \Big| L_p(\Omega) \Big\| + \|\delta^{-s}(\cdot)\, f \mid L_p(\Omega)\| < \infty \qquad (2.14)$$
(equivalent quasi-norms).

Proof. The independence of $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ of $\varphi$ follows from the pointwise multiplier assertion in Proposition 42(ii). Furthermore, (2.13) follows from a corresponding assertion for $F^s_{pq}(\mathbb{R}^n)$, the obvious refined localisation property for $L_w(\Omega)$ and $p \le w$. Finally, (2.14) is essentially covered by [17, Corollary 5.15, p. 66] and the underlying proof. □

Corollary 5. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $\Omega \ne \mathbb{R}^n$. Then
$$W^{k,\mathrm{rloc}}_p(\Omega) = F^{k,\mathrm{rloc}}_{p,2}(\Omega), \qquad k \in \mathbb{N},\ 1 < p < \infty,$$
is the collection of all $f \in L_p(\Omega)$ such that
$$\sum_{|\alpha| \le k} \|\delta^{-k+|\alpha|} D^\alpha f \mid L_p(\Omega)\| \sim \|\delta^{-k} f \mid L_p(\Omega)\| + \sum_{|\alpha| = k} \|D^\alpha f \mid L_p(\Omega)\| < \infty$$
(equivalent norms).

Proof. This follows from (2.2), (2.10) and well-known equivalent norms for the classical Sobolev spaces. One may also consult [14, Chapter 3]. □
2.3. Para-bases

It is one of the main aims of this paper to extend the wavelet representations for $B$-spaces and $F$-spaces on $\mathbb{R}^n$ according to Theorem 40 to some $F$-spaces on arbitrary domains in this Section 2, and to some $B$-spaces and $F$-spaces on E-thick domains in the next Section 3. To prepare both we first introduce wavelets and sequence spaces. We always assume that $\Omega$ is an arbitrary domain in $\mathbb{R}^n$ with $\Omega \ne \mathbb{R}^n$, furnished with the Whitney decomposition (2.5), (2.6). We rely on the notation introduced in Section A.2.3 and modify (A.22) by
$$\Psi^j_{G,m}(x) = 2^{(j+L)n/2} \prod_{a=1}^n \psi_{G_a}\big( 2^{j+L} x_a - m_a \big), \qquad G \in \{F, M\}^n,\ m \in \mathbb{Z}^n, \qquad (2.15)$$
where $L \in \mathbb{N}_0$ is fixed once and for all such that
$$\mathrm{supp}\,\Psi^j_{G,m} \subset Q_{lr} \qquad\text{if } 2^{-j-L} m \in Q^2_{lr} \text{ for } l \in \mathbb{N}_0 \text{ and } j \ge l, \qquad (2.16)$$
and
$$2^{-L-j} m \in Q^2_{lr} \qquad\text{if } Q^1_{lr} \cap \mathrm{supp}\,\Psi^j_{G,m} \ne \emptyset \text{ for } l \in \mathbb{N}_0 \text{ and } j \ge l \qquad (2.17)$$
for all admitted cubes according to (2.5), (2.6). We use the same notation as in (A.22), since one simply replaces the scaling function $\psi_F$ in (A.18) by the scaling function $2^{L/2}\psi_F(2^L \cdot)$. With $\{F,M\}^n$ and $\{F,M\}^{n*}$ as in (A.20), (A.21) we introduce for $j \in \mathbb{N}_0$ the main index set
$$S^{j,1}_\Omega = \big\{ (G, m) \in \{F,M\}^{n*} \times \mathbb{Z}^n : 2^{-j-L} m \in Q^2_{lr} \text{ for some } l < j, \text{ some } r \big\} \qquad (2.18)$$
and the residual index set
$$S^{j,2}_\Omega = \big\{ (G, m) \in \{F,M\}^n \times \mathbb{Z}^n : 2^{-j-L} m \in Q^2_{jr} \text{ for some } r \big\} \setminus S^{j,1}_\Omega. \qquad (2.19)$$
With
$$S_\Omega = S^{\Omega,1} \cup S^{\Omega,2}, \qquad S^{\Omega,1} = \bigcup_{j=0}^\infty S^{j,1}_\Omega, \qquad S^{\Omega,2} = \bigcup_{j=0}^\infty S^{j,2}_\Omega, \qquad (2.20)$$
let
$$\Phi^{1,\Omega} = \big\{ \Psi^j_{G,m} : (j, G, m) \in S^{\Omega,1} \big\} \qquad (2.21)$$
be the main wavelet system and
$$\Phi^{2,\Omega} = \big\{ \Psi^j_{G,m} : (j, G, m) \in S^{\Omega,2} \big\} \qquad (2.22)$$
be the residual wavelet system, where the $\Psi^j_{G,m}$ are given by (2.15)–(2.17). This is an adapted version of corresponding constructions in [21,20, Section 4.2.4], where one finds further discussions, especially about the orthogonality of the systems $\Phi^{1,\Omega}$ and $\Phi^{2,\Omega}$. Let $\chi_{lr}$ be the characteristic functions of the cubes $Q_{lr}$ in (2.5), (2.6).
Definition 6. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $\Omega \ne \mathbb{R}^n$. Let $S_\Omega$ be as in (2.18)–(2.20) and
$$S^j_\Omega = S^{j,1}_\Omega \cup S^{j,2}_\Omega \qquad\text{with } j \in \mathbb{N}_0.$$
Let $s \in \mathbb{R}$, $0 < p \le \infty$, $0 < q \le \infty$. Then $b^{s,\Omega}_{pq}$ is the collection of all sequences
$$\lambda = \big\{ \lambda^{j,G}_m \in \mathbb{C} : (j, G, m) \in S_\Omega \big\} \qquad (2.23)$$
such that
$$\|\lambda \mid b^{s,\Omega}_{pq}\| = \Bigg( \sum_{j=0}^\infty 2^{j(s - n/p)q} \Big( \sum_{(G,m):\,(j,G,m)\in S_\Omega} |\lambda^{j,G}_m|^p \Big)^{q/p} \Bigg)^{1/q} \qquad (2.24)$$
is finite.

Clipping together the re-transformed expansions one gets
$$f = \sum_{l,r} \varphi_{lr} f = \sum_{(j,G,m)\in S_\Omega} \lambda^{j,G}_m(f)\, 2^{-jn/2}\, \Psi^j_{G,m}$$
with (2.25)–(2.27), hence (2.31) and (2.32). As for some details about these scaling procedures we refer to [20, Section 4.2.2, especially Proposition 4.17]. Then one gets by (2.33) also (2.30).

Remark 9. If $(j, G, m) \in S^{\Omega,1}$ then the coefficients $\lambda^{j,G}_m$ in (2.29) are unique and they coincide (after appropriate normalisation) with $\lambda^{j,G}_m(f)$ in (2.25). The summation over $S^{\Omega,1}$ in (2.29) remains an expansion with respect to an orthonormal basis. The coefficients in the summation over the residual part $S^{\Omega,2}$ might not be unique, but this part is harmless by its construction. In any case (2.31) with (2.25)–(2.27) is a stable frame (where stable refers to the optimality of $\lambda(f)$ according to (2.30)). This may justify calling $\Phi_\Omega$ in (2.28) a para-basis. Further details may be found in [20, Section 4.2.4], including discussions about the convergence of (2.29). We collect the outcome, which can also be obtained directly from (2.29). One has always unconditional convergence in $S'(\mathbb{R}^n)$. If $p < \infty$ and $w < \infty$ in (2.12) then (2.29) converges absolutely (and hence unconditionally) in $L_w(\Omega)$. If $p < \infty$, $q < \infty$ then (2.29) converges unconditionally in
$F^{s,\mathrm{rloc}}_{pq}(\Omega)$. If $p < \infty$, $q = \infty$ then one has unconditional convergence in $F^{\sigma,\mathrm{rloc}}_{pp}(\Omega)$ with $\sigma_{pq} < \sigma < s$. If $\Omega$ is bounded and $p = q = \infty$, then (2.29) converges unconditionally in $C^{\sigma}(\Omega) = B^{\sigma}_{\infty\infty}(\Omega)$ with $0 < \sigma < s$ (using the notation (2.4)). If $\Omega$ is unbounded then one has this convergence at least in any domain $\{x \in \Omega : |x| < R\}$ with $0 < R$ ($\to \infty$).

For our later considerations we need a counterpart of Theorem 8 for $L_r(\Omega)$ with $1 < r < \infty$.

Theorem 10. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $\Omega \neq \mathbb{R}^n$. Let $1 < r < \infty$ and let
$$\Phi = \{\Phi^j_{G,m} : (j,G,m) \in S^{\Omega}\} \qquad (2.34)$$
be the same intrinsic wavelet system as in (2.28), based on $\psi_F$ and $\psi_M$ according to (A.18), (A.19), now with $u \in \mathbb{N}$. Then $L_r(\Omega)$ is the collection of all locally integrable functions $f$ (in $\Omega$) which can be represented by
$$f = \sum_{(j,G,m) \in S^{\Omega}} \lambda^{j,G}_m\, 2^{-jn/2}\, \Phi^j_{G,m}, \qquad \lambda \in f^{0,\Omega}_{r,2}. \qquad (2.35)$$
Furthermore,
$$\|f \mid L_r(\Omega)\| \sim \inf \|\lambda \mid f^{0,\Omega}_{r,2}\|, \qquad (2.36)$$
where the infimum is taken over all representations (2.35). Any $f \in L_r(\Omega)$ can be represented by (2.31) with (2.25)–(2.27) and
$$\|f \mid L_r(\Omega)\| \sim \|\lambda(f) \mid f^{0,\Omega}_{r,2}\| \qquad (2.37)$$
(equivalent norms).

Proof. Obviously, $L_r(\Omega)$ has the refined localisation property according to Definition 2, with $L_r$ in place of $F^s_{pq}$. Furthermore, there is also an immediate counterpart of the homogeneity property (A.31). Using the Littlewood–Paley assertion (2.1) one can carry over Step 2 of the proof of Theorem 8, resulting in the representation (2.31) with (2.37). It remains to prove (2.36) for any representation (2.35). We split $\Phi$ as in (2.28) into its main wavelet system and its residual wavelet system, hence
$$f = \sum_{(j,G,m) \in S^{\Omega,1}} \cdots + \sum_{(j,G,m) \in S^{\Omega,2}} \cdots = f_1 + f_2.$$
According to Theorem 30 one needs first moment conditions for atoms in $L_r(\mathbb{R}^n) = F^0_{r,2}(\mathbb{R}^n)$. By (A.19) with $v = 0$ it follows that the $\Phi^j_{G,m}$ with $(j,G,m) \in S^{\Omega,1}$ are atoms (after normalisation) with respect to $L_r(\mathbb{R}^n)$. We split $\lambda$ in (2.35) into $\lambda^1$ and $\lambda^2$,
$$\lambda^l = \{\lambda^{j,G}_m \in \mathbb{C} : (j,G,m) \in S^{\Omega,l}\}, \qquad l = 1,2.$$
Then it follows from Theorem 30 that
$$\|f_1 \mid L_r(\Omega)\| \sim \|f_1 \mid F^0_{r,2}(\mathbb{R}^n)\| \le c\, \|\lambda^1 \mid f^{0,\Omega}_{r,2}\|. \qquad (2.38)$$
One has for the residual part $f_2$,
$$\|f_2 \mid L_r(\Omega)\|^r \le c \sum_{(j,G,m) \in S^{\Omega,2}} 2^{-jn}\, |\lambda^{j,G}_m|^r \sim \|\lambda^2 \mid f^{0,\Omega}_{rr}\|^r \sim \|\lambda^2 \mid f^{0,\Omega}_{r,2}\|^r, \qquad (2.39)$$
where we used the structure of $f^{0,\Omega}_{rq}$ according to (2.24) and the structure of $S^{\Omega,2}$. By (2.38) and (2.39) one gets the desired estimate
$$\|f \mid L_r(\Omega)\| \le c\, \|\lambda \mid f^{0,\Omega}_{r,2}\|.$$
Together with (2.37) one gets (2.36).
Although not the subject of this paper, we mention a somewhat curious consequence of the last theorem. Let $k \in \mathbb{N}_0$ and $k \le u \in \mathbb{N}$, where $u$ has the same meaning as above. Put $\chi^j_m = 1$ if $(j,G,m) \in S^{\Omega,1}$. Then we modify the absolute values of (2.25), (2.26) by
$$\lambda^{j,G}_m(f)^k = 2^{jn/2} \sum_{|\alpha| \le k} \Big| \int_{\Omega} f(x)\, \chi^j_m(x)\, D^{\alpha} \Phi^j_{G,m}(x)\, dx \Big|.$$
For $1 < r < \infty$ and $k \in \mathbb{N}_0$ let
$$W^k_r(\Omega) = \{f \in L_r(\Omega) : D^{\alpha} f \in L_r(\Omega),\ |\alpha| \le k\}$$
be the obviously normed, intrinsically defined Sobolev spaces.

Corollary 11. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $\Omega \neq \mathbb{R}^n$. Let $1 < r < \infty$, $k \in \mathbb{N}$, and let $\Phi$ be as in (2.34) with $k \le u$. Then $f \in L_r(\Omega)$ is an element of $W^k_r(\Omega)$ if, and only if, it can be represented by
$$f = \sum_{(j,G,m) \in S^{\Omega}} \lambda^{j,G}_m(f)^k\, 2^{-jn/2}\, \Phi^j_{G,m}, \qquad \lambda(f)^k \in f^{0,\Omega}_{r,2},$$
with (2.25)–(2.27). Furthermore,
$$\|f \mid W^k_r(\Omega)\| \sim \|\lambda(f)^k \mid f^{0,\Omega}_{r,2}\|$$
(equivalent norms).

Proof. This follows from Theorem 10, applied to $D^{\alpha} f$ with $|\alpha| \le k$, and integration by parts as far as the coefficients are concerned.

3. Spaces on E-thick domains

3.1. E-thick domains

Recall that domain means open set. Let $l(Q)$ be the side-length of a cube $Q$ in $\mathbb{R}^n$ with sides parallel to the axes of coordinates.
Fig. 1.
Definition 12. A domain $\Omega$ in $\mathbb{R}^n$ is said to be E-thick (exterior thick) if one finds, for any interior cube $Q^i \subset \Omega$ with
$$l(Q^i) \sim 2^{-j}, \qquad \mathrm{dist}(Q^i, \partial\Omega) \sim 2^{-j}, \qquad j \ge j_0 \in \mathbb{N}, \qquad (3.1)$$
a complementing exterior cube $Q^e \subset \Omega^c = \mathbb{R}^n \setminus \overline{\Omega}$ with
$$l(Q^e) \sim 2^{-j}, \qquad \mathrm{dist}(Q^e, \partial\Omega) \sim \mathrm{dist}(Q^i, Q^e) \sim 2^{-j},$$
where all equivalence constants are independent of $j$.

Example 13. Every bounded Lipschitz domain in $\mathbb{R}^n$ is E-thick. If $\Omega$ in $\mathbb{R}^2$ is (locally) above the cusp $x_2 = |x_1|^{\gamma}$ with $0 < \gamma < 1$ then $\Omega$ is (locally) E-thick. If $\Omega$ is (locally) below this cusp then $\Omega$ is not E-thick. As indicated in Fig. 1, the usual snowflake domain in $\mathbb{R}^2$ is E-thick. But there might be rather bizarre E-thick domains.

Proposition 14. (i) For any domain $\Omega$ in $\mathbb{R}^n$ with $\Omega \neq \mathbb{R}^n$ one has $\mathbb{R}^n = \Omega \cup \partial\Omega \cup (\mathbb{R}^n \setminus \overline{\Omega})$ and $\partial(\mathbb{R}^n \setminus \overline{\Omega}) \subset \partial\Omega$. Furthermore,
$$\partial\Omega = \partial(\mathbb{R}^n \setminus \overline{\Omega}) \quad \text{if, and only if,} \quad (\overline{\Omega})^{\circ} = \Omega.$$
(ii) If $\Omega$ is E-thick then $(\overline{\Omega})^{\circ} = \Omega$.
(iii) There are bounded E-thick domains $\Omega$ with $|\partial\Omega| > 0$.
Proof. One checks (i) and (ii) easily. We prove (iii). Let $\{r_l : l \in \mathbb{N}\}$ be the set of all rational numbers with $0 < r_l < 1$ and let $I_l$ be open intervals centred at $r_l$ such that $\overline{I_l} \subset (0,1)$. Let
$$\Omega^0 = \bigcup_{l=1}^{\infty} I_l = \bigcup_{l=1}^{\infty} I^0_l \qquad \text{with} \qquad \sum_{l=1}^{\infty} |I_l| < 1,$$
where the $I^0_l$ are disjoint open intervals. Then
$$\partial\Omega^0 = [0,1] \setminus \bigcup_{l=1}^{\infty} I^0_l \qquad \text{and} \qquad |\partial\Omega^0| > 0.$$
We decompose each interval $I^0_l$ into
$$I^0_l = I^1_l \cup \bigcup_{k=1}^{\infty} \{x^k_l\} \cup I^2_l,$$
where $I^1_l$ is the union of disjoint open intervals $I^1_{l,k}$ of length, say, $\sim 2^{-k}|I^0_l|$, $k \in \mathbb{N}$. Similarly $I^2_l$. This can be done in such a way that $I^1_l$ is E-thick at the expense of $I^2_l$ and vice versa. Then $\Omega^1 = \bigcup_l I^1_l$ is E-thick at the expense of $\Omega^2 = \bigcup_l I^2_l$ and vice versa. Furthermore,
$$0 < |\partial\Omega^0| \le |\partial\Omega^1| = |\partial\Omega^0| + |\Omega^2|,$$
and similarly for $\Omega^2$. This proves (iii).
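The measure count behind this proof can be checked numerically. The sketch below surrounds an explicit enumeration of the rationals in $(0,1)$ by intervals $I_l$ of length $\varepsilon\, 2^{-l}$ (the enumeration, $\varepsilon$, and all numerical values are illustrative assumptions, not taken from the text) and computes the exact Lebesgue measure of their union by interval merging. Since the union $\Omega^0$ is open and dense in $(0,1)$ but has measure at most $\varepsilon$, its boundary $[0,1] \setminus \Omega^0$ has measure at least $1 - \varepsilon > 0$.

```python
from fractions import Fraction

def interval_union_measure(eps: float, n_rationals: int) -> float:
    """Exact Lebesgue measure of a union of intervals around rationals in (0,1)."""
    # Enumerate distinct rationals p/q in lowest terms, ordered by denominator.
    rationals = []
    q = 2
    while len(rationals) < n_rationals:
        for p in range(1, q):
            fr = Fraction(p, q)
            if fr.denominator == q:          # skip non-reduced duplicates like 2/4
                rationals.append(float(fr))
                if len(rationals) == n_rationals:
                    break
        q += 1
    # Interval I_l of length eps * 2**(-l) around the l-th rational, clipped to [0, 1].
    iv = []
    for l, r in enumerate(rationals, start=1):
        h = 0.5 * eps * 2.0 ** (-l)
        iv.append((max(r - h, 0.0), min(r + h, 1.0)))
    # Sort by left endpoint and merge overlaps: the total is the exact union measure.
    iv.sort()
    cur_a, cur_b = iv[0]
    total = 0.0
    for a, b in iv[1:]:
        if a > cur_b:
            total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:
            cur_b = max(cur_b, b)
    total += cur_b - cur_a
    return total

m = interval_union_measure(eps=0.5, n_rationals=2000)
# The union is dense in (0,1), yet its measure is <= eps = 0.5,
# so the (boundary) complement in [0,1] has measure >= 0.5 > 0.
print(1.0 - m)
```

The merging step computes the measure exactly, so the conclusion $|\partial\Omega^0| \ge 1 - \varepsilon$ does not depend on any grid discretisation.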
3.2. Spaces and para-bases

Recall again that domain means open set. As usual, $D'(\Omega)$ is the collection of all distributions on the domain $\Omega$.

Definition 15. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $\Omega \neq \mathbb{R}^n$. Let $s \in \mathbb{R}$, $0 < p \le \infty$ ($p < \infty$ for the $F$-spaces), $0 < q \le \infty$.
(i) Then $B^s_{pq}(\Omega)$ is the collection of all $f \in D'(\Omega)$ such that there is a $g \in B^s_{pq}(\mathbb{R}^n)$ with $g|_{\Omega} = f$. Furthermore,
$$\|f \mid B^s_{pq}(\Omega)\| = \inf \|g \mid B^s_{pq}(\mathbb{R}^n)\|,$$
where the infimum is taken over all $g \in B^s_{pq}(\mathbb{R}^n)$ with $g|_{\Omega} = f$. Similarly for $F^s_{pq}(\Omega)$.
(ii) Let
$$\widetilde{B}^s_{pq}(\overline{\Omega}) = \{f \in B^s_{pq}(\mathbb{R}^n) : \mathrm{supp}\, f \subset \overline{\Omega}\}$$
and
$$\widetilde{F}^s_{pq}(\overline{\Omega}) = \{f \in F^s_{pq}(\mathbb{R}^n) : \mathrm{supp}\, f \subset \overline{\Omega}\}$$
as closed subspaces of the corresponding spaces on $\mathbb{R}^n$.
(iii) Then $\widetilde{B}^s_{pq}(\Omega)$ is the collection of all $f \in D'(\Omega)$ such that there is a $g \in \widetilde{B}^s_{pq}(\overline{\Omega})$ with $g|_{\Omega} = f$. Furthermore,
$$\|f \mid \widetilde{B}^s_{pq}(\Omega)\| = \inf \|g \mid \widetilde{B}^s_{pq}(\overline{\Omega})\|,$$
where the infimum is taken over all $g \in \widetilde{B}^s_{pq}(\overline{\Omega})$ with $g|_{\Omega} = f$. Similarly for $\widetilde{F}^s_{pq}(\Omega)$.
Remark 16. If $|\partial\Omega| = 0$ and $0 < p \le \infty$, $0 < q \le \infty$, $s > \sigma_p$, then one may identify the spaces in (ii) and (iii) (appropriately interpreted). On the other hand, according to Proposition 14(iii) there are bounded E-thick domains with $|\partial\Omega| > 0$. Then even for $s > \sigma_p$ the spaces in (ii) and (iii) might be different.

Next we deal with the counterpart of Theorem 8, using the same notation as there.

Theorem 17. Let $\Omega$ be an E-thick domain in $\mathbb{R}^n$ according to Definition 12.
(i) Let $0 < p \le \infty$, $0 < q \le \infty$, $s > \sigma_p$, and let $\widetilde{B}^s_{pq}(\Omega)$ be as in Definition 15(iii). Let $\Phi$ be the same intrinsic wavelet system as in (2.28), based on $\psi_F$ and $\psi_M$ according to (A.18), (A.19) with $s < u \in \mathbb{N}$. Let $w$ be as in (2.12). Then
$$\widetilde{B}^s_{pq}(\Omega) \hookrightarrow L_w(\Omega). \qquad (3.2)$$
Let $b^{s,\Omega}_{pq}$ be as in Definition 6. Then $f \in L_w(\Omega)$ is an element of $\widetilde{B}^s_{pq}(\Omega)$ if, and only if, it can be represented by
$$f = \sum_{(j,G,m) \in S^{\Omega}} \lambda^{j,G}_m\, 2^{-jn/2}\, \Phi^j_{G,m}, \qquad \lambda \in b^{s,\Omega}_{pq}. \qquad (3.3)$$
Furthermore,
$$\|f \mid \widetilde{B}^s_{pq}(\Omega)\| \sim \inf \|\lambda \mid b^{s,\Omega}_{pq}\|,$$
where the infimum is taken over all representations (3.3). Any $f \in \widetilde{B}^s_{pq}(\Omega)$ can be represented by
$$f = \sum_{(j,G,m) \in S^{\Omega}} \lambda^{j,G}_m(f)\, 2^{-jn/2}\, \Phi^j_{G,m} \qquad (3.4)$$
with (2.25)–(2.27) and
$$\|f \mid \widetilde{B}^s_{pq}(\Omega)\| \sim \|\lambda(f) \mid b^{s,\Omega}_{pq}\|$$
(equivalent quasi-norms).
(ii) Let $0 < p < \infty$, $0 < q \le \infty$, $s > \sigma_{pq}$, and let $\widetilde{F}^s_{pq}(\Omega)$ be as in Definition 15(iii). Let $\Phi$ and $w$ be as in part (i), again with $s < u \in \mathbb{N}$. Then one has (3.2) with $\widetilde{F}$ in place of $\widetilde{B}$. Let $f^{s,\Omega}_{pq}$ be as in Definition 6. Then $f \in L_w(\Omega)$ is an element of $\widetilde{F}^s_{pq}(\Omega)$ if, and only if, it can be represented by
$$f = \sum_{(j,G,m) \in S^{\Omega}} \lambda^{j,G}_m\, 2^{-jn/2}\, \Phi^j_{G,m}, \qquad \lambda \in f^{s,\Omega}_{pq}. \qquad (3.5)$$
Furthermore,
$$\|f \mid \widetilde{F}^s_{pq}(\Omega)\| \sim \inf \|\lambda \mid f^{s,\Omega}_{pq}\|,$$
where the infimum is taken over all representations (3.5). Any $f \in \widetilde{F}^s_{pq}(\Omega)$ can be represented by (3.4) and
$$\|f \mid \widetilde{F}^s_{pq}(\Omega)\| \sim \|\lambda(f) \mid f^{s,\Omega}_{pq}\|$$
(equivalent quasi-norms).
Proof. The embedding (3.2) and its $F$-counterpart follow from a corresponding assertion in $\mathbb{R}^n$. We prove (ii). First we remark that (3.5) can be considered as an atomic decomposition according to Theorem 30(ii) (after correct normalisation); no moment conditions are needed. One gets
$$\|f \mid \widetilde{F}^s_{pq}(\Omega)\| \le \|f \mid F^s_{pq}(\mathbb{R}^n)\| \le c\, \|\lambda \mid f^{s,\Omega}_{pq}\|. \qquad (3.6)$$
Conversely, let $f \in \widetilde{F}^s_{pq}(\Omega)$. Then $f \in L_w(\Omega)$ with $w < \infty$, and it follows from Theorem 10 that $f$ can be represented by (3.4) with (2.25)–(2.27), at least in $L_w(\Omega)$. We wish to apply Theorem 36(ii), identifying (A.15) with (2.25), (2.26). By (A.18)–(A.22) one has the required moment conditions in (A.13) with $B = u > s$ if $G \in \{F,M\}^{n*}$. This applies to all kernels $\Phi^j_{G,m}$ in (2.25), according to (2.18), (2.20). The kernels $\chi^j_m \Phi^j_{G,m}$ in (2.26) may not have the required moment conditions. Of course, one only has to care for terms with $j \ge j_0$. In particular, the kernels $\chi^j_m \Phi^j_{G,m}$ in question have supports in cubes to which Definition 12 applies, hence
$$\mathrm{supp}\, \chi^j_m \Phi^j_{G,m} \subset Q^i$$
with (3.1). Let $Q^e$ be a related complementing exterior cube. Then there is a function $\Psi^j_{G,m} \in C^u(\mathbb{R}^n)$ with $\mathrm{supp}\, \Psi^j_{G,m} \subset Q^e$ such that
$$k^j_{G,m}(x) = \chi^j_m(x)\, \Phi^j_{G,m}(x) + \Psi^j_{G,m}(x), \qquad x \in \mathbb{R}^n,$$
is an admitted kernel satisfying the required moment conditions. The existence of such a complementing function $\Psi^j_{G,m}$ is quite plausible but not obvious. We refer for details to [22, p. 665]. Let $g \in \widetilde{F}^s_{pq}(\overline{\Omega})$ with $g|_{\Omega} = f$ and
$$\|g \mid F^s_{pq}(\mathbb{R}^n)\| = \|g \mid \widetilde{F}^s_{pq}(\overline{\Omega})\| \sim \|f \mid \widetilde{F}^s_{pq}(\Omega)\|.$$
Since $\mathrm{supp}\, g \subset \overline{\Omega}$ one gets
$$\int_{\mathbb{R}^n} k^j_{G,m}(x)\, g(x)\, dx = \int_{\mathbb{R}^n} \chi^j_m(x)\, \Phi^j_{G,m}(x)\, f(x)\, dx.$$
Now one can apply Theorem 36(ii) and obtains
$$\|\lambda(f) \mid f^{s,\Omega}_{pq}\| \le c\, \|g \mid F^s_{pq}(\mathbb{R}^n)\| \sim \|f \mid \widetilde{F}^s_{pq}(\Omega)\|. \qquad (3.7)$$
Then (3.4), (3.7) and (3.6) prove part (ii). The proof of part (i) is the same. We only mention that we now have $w = \infty$ if $p = \infty$. But everything in the representation (3.4) is local and applies also to $L_{\infty}(\Omega)$, since $L_{\infty}(\Omega) \subset L^{\mathrm{loc}}_v(\Omega)$ for $1 < v < \infty$.

Corollary 18. Let $\Omega$ be an E-thick domain in $\mathbb{R}^n$ according to Definition 12 and let $0 < p \le \infty$, $0 < q \le \infty$ (with $q = \infty$ if $p = \infty$), $s > \sigma_{pq}$. Let $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ be as in Definition 2 and $\widetilde{F}^s_{pq}(\Omega)$ as in Definition 15(iii) (with $\widetilde{F}^s_{\infty\infty} = \widetilde{B}^s_{\infty\infty}$). Then
$$F^{s,\mathrm{rloc}}_{pq}(\Omega) = \widetilde{F}^s_{pq}(\Omega).$$

Proof. This is an immediate consequence of Theorems 8 and 17.
4. J-para-bases and polynomial reproducing formulas

4.1. J-para-bases

We modified (A.22) in (2.15) by an additional dilation $2^L$. This was not indicated, since $L$ is assumed to be fixed once and for all such that one has (2.16), (2.17), based on the Whitney decomposition (2.5), (2.6). Now we replace $l \ge 0$ in (2.5)–(2.8), and also in (2.16), (2.17), by $l \ge J \in \mathbb{N}_0$. Then we have to adapt the wavelet system (2.18)–(2.22) appropriately, where we now indicate $J$. This is covered by the multi-resolution philosophy. We fix the outcome. Instead of (2.5), (2.6) we now have
$$Q^0_{lr} \subset Q^1_{lr} \subset Q^2_{lr} \subset Q_{lr}, \qquad l \ge J, \quad r = 1,2,\ldots,$$
and
$$\Omega = \bigcup_{l \ge J,\, r} Q^0_{lr} \qquad \text{and} \qquad \mathrm{dist}(Q_{lr}, \partial\Omega) \sim 2^{-l} \quad \text{if } l > J$$
and $r = 1,2,\ldots$, complemented by $\mathrm{dist}(Q_{Jr}, \partial\Omega) \ge c\, 2^{-J}$ for some $c > 0$. In (2.8) the summation over $l \in \mathbb{N}_0$ is now replaced by $l \ge J$. In (2.16), (2.17) we assume now $l \ge J$ with respect to (2.15), which remains unchanged. Similarly one has now (2.18) with $J \le l < j$ and (2.19) with $J \le j$, and as a consequence
$$(JS)^{\Omega} = (JS)^{\Omega,1} \cup (JS)^{\Omega,2}, \qquad (JS)^{\Omega,1} = \bigcup_{j=J}^{\infty} S^{\Omega,1}_j, \qquad (JS)^{\Omega,2} = \bigcup_{j=J}^{\infty} S^{\Omega,2}_j. \qquad (4.1)$$
Then one gets an obvious modification of Theorem 8. In particular, $f \in F^{s,\mathrm{rloc}}_{pq}(\Omega)$ can be optimally represented as
$$f = \sum_{(j,G,m) \in (JS)^{\Omega}} \lambda^{j,G}_m(f)\, 2^{-jn/2}\, \Phi^j_{G,m} \qquad (4.2)$$
with
$$\lambda^{j,G}_m(f) = 2^{jn/2} \int_{\Omega} f(x)\, \Phi^j_{G,m}(x)\, dx \qquad (4.3)$$
or
$$\lambda^{j,G}_m(f) = 2^{jn/2} \int_{\Omega} f(x)\, \chi^j_m(x)\, \Phi^j_{G,m}(x)\, dx \qquad (4.4)$$
with $\chi^j_m$ as in (2.27). However, instead of the decomposition (4.1) we rely now on the decomposition of $(JS)^{\Omega}$ into three disjoint index sets,
$$(JS)^{\Omega} = \langle JS \rangle^{\Omega} \cup \{JS\}^{\Omega} \cup [JS]^{\Omega},$$
where
• $\langle JS \rangle^{\Omega}$ collects all $(j,G,m) \in (JS)^{\Omega}$ where $j = J$ and $G = (F)^n = (F,\ldots,F)$, with $\lambda^{j,G}_m(f)$ as in (4.3),
• $\{JS\}^{\Omega}$ collects all $(j,G,m) \in (JS)^{\Omega}$ where $j \ge J$ and $G \in \{F,M\}^{n*}$, with $\lambda^{j,G}_m(f)$ as in (4.3),
• $[JS]^{\Omega}$ collects the remaining elements $(j,G,m) \in (JS)^{\Omega}$.

In particular, $\langle JS \rangle^{\Omega}$ refers to those $n$-dimensional father wavelets
$$\Phi^J_m(x) = \Phi^J_{(F)^n,m}(x) = 2^{(J+L)n/2} \prod_{a=1}^{n} \psi_F\big(2^{J+L} x_a - m_a\big),$$
where
$$\lambda^J_m(f) = \lambda^{J,(F)^n}_m(f) = 2^{Jn/2} \int_{\Omega} f(x)\, \Phi^J_m(x)\, dx.$$
By construction this applies to all terms $\Phi^J_m$ with
$$\mathrm{dist}(\mathrm{supp}\, \Phi^J_m, \partial\Omega) \ge c_1\, 2^{-J}, \qquad J \in \mathbb{N}_0,$$
for some $c_1 > 0$ which is independent of $J$. Recall that $\chi^j_m$ in (4.4) is independent of admitted $G \in \{F,M\}^n$ and that
$$\mathrm{dist}(\mathrm{supp}\, \chi^j_m \Phi^j_{G,m}, \partial\Omega) \le c_2\, 2^{-J} \qquad \text{if } (j,G,m) \in [JS]^{\Omega}, \qquad (4.5)$$
for some $c_2 > 0$ which is independent of $J \in \mathbb{N}_0$. Hence (4.2) is now decomposed into three sums,
$$f = \sum_{(j,G,m) \in (JS)^{\Omega}} \lambda^{j,G}_m(f)\, 2^{-jn/2}\, \Phi^j_{G,m} = \sum_{(j,G,m) \in \langle JS \rangle^{\Omega}} \lambda^J_m(f)\, 2^{-Jn/2}\, \Phi^J_m + \sum_{(j,G,m) \in \{JS\}^{\Omega}} \cdots + \sum_{(j,G,m) \in [JS]^{\Omega}} \cdots = \langle f \rangle^J + \{f\}^J + [f]^J, \qquad (4.6)$$
indicating $J \in \mathbb{N}_0$.

4.2. Local polynomial reproducing formulas

Decompositions of type (4.6) are the basis for local reproducing formulas. Let, for $\varepsilon > 0$,
$$\Omega^{\varepsilon} = \{x \in \Omega : \mathrm{dist}(x, \partial\Omega) > \varepsilon\} \qquad (4.7)$$
and let $C(\Omega)$ be the set of all (complex-valued) continuous bounded functions in the (arbitrary) domain $\Omega$ in $\mathbb{R}^n$. As usual, $B(x, \delta)$ stands for a ball in $\mathbb{R}^n$ centred at $x \in \mathbb{R}^n$ and of radius $\delta > 0$. Let $P^M(\Omega)$ be the collection of all complex-valued polynomials of degree less than $M \in \mathbb{N}$ in $\Omega$.

Theorem 19. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $|\Omega| < \infty$. Let $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ be the spaces according to Definition 2 with $0 < p \le \infty$, $0 < q \le \infty$ (with $q = \infty$ if $p = \infty$) and
$$s > \max\Big(\frac{n}{p}, \sigma_{pq}\Big).$$
Let $M \in \mathbb{N}$. Then there are numbers $\varepsilon_0 > 0$, $a > 0$, $b > 0$, $c > 0$, with the following property. For any $\varepsilon$ with $0 < \varepsilon \le \varepsilon_0$ one finds points $x^j \in \Omega^{\varepsilon}$, having pairwise distance of at least $a\varepsilon$, and real functions $h_j \in C(\Omega)$ with
$$\sup_x |h_j(x)| \le c, \qquad \mathrm{supp}\, h_j \subset B(x^j, b\varepsilon) \subset \Omega, \qquad (4.8)$$
such that the mapping $U^{\varepsilon}$,
$$U^{\varepsilon} f = \sum_j f(x^j)\, h_j, \qquad f \in F^{s,\mathrm{rloc}}_{pq}(\Omega), \qquad (4.9)$$
is polynomial reproducing in $\Omega^{\varepsilon}$,
$$(U^{\varepsilon} P)(x) = P(x) \qquad \text{where } P \in P^M(\Omega),\ x \in \Omega^{\varepsilon}. \qquad (4.10)$$

Proof. Step 1: Let $u > \max(M-1, s)$ in (A.18), (A.19). Then we can apply Theorem 8, now based on (4.6). Furthermore, one has for $P \in P^M(\Omega)$,
$$\lambda^{j,G}_m(P) = 2^{jn/2} \int_{\Omega} P(x)\, \Phi^j_{G,m}(x)\, dx = 0 \qquad \text{if } (j,G,m) \in \{JS\}^{\Omega}. \qquad (4.11)$$
Since $|\Omega| < \infty$ it follows that $\Omega^{\varepsilon}$ is a bounded domain. Let $\psi^{\varepsilon} \in D(\Omega^{\varepsilon})$ be a cut-off function with $\psi^{\varepsilon}(x) = 1$ if $x \in \Omega^{2\varepsilon}$. Then (4.6) can be applied to $f = \psi^{\varepsilon} P \in F^{s,\mathrm{rloc}}_{pq}(\Omega)$ for any $P \in P^M(\Omega)$. For $\varepsilon = d\, 2^{-J}$ with a suitable (small) $d > 0$ one gets by (4.6), (4.5) and (4.11) that
$$P(x) = \sum_{(j,G,m) \in \langle JS \rangle^{\Omega}} 2^{Jn/2} \int_{\Omega} P(y)\, \Phi^J_m(y)\, dy \cdot 2^{-Jn/2}\, \Phi^J_m(x), \qquad (4.12)$$
$x \in \Omega^{g 2^{-J}}$, for some $g > 0$ which is independent of $J$. Furthermore, we have by (2.16) that $\mathrm{supp}\, \Phi^J_m \subset Q_{Jr} \subset \Omega^{\kappa 2^{-J}}$ for some $\kappa > 0$, $r = r(m)$ and $(j,G,m) \in \langle JS \rangle^{\Omega}$ (then $j = J$). In the next step we prove that there are points $\{x^{k,J,m}\}_{k=1}^{K} \subset Q^{-1}_{Jr}$ (where the latter is a cube concentric with $Q_{Jr}$ with side-length $2^{-J-1}$), having pairwise distance of at least $c\, 2^{-J}$ for some $c > 0$, and constants $c_k^{J,m}$ with $|c_k^{J,m}| \le C$ for some $C > 0$ and all admitted $J, k, m$, such that
$$2^{Jn/2} \int_{\Omega} P(y)\, \Phi^J_m(y)\, dy = \sum_{k=1}^{K} c_k^{J,m}\, P(x^{k,J,m}), \qquad P \in P^M(\Omega). \qquad (4.13)$$
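The claim (4.13) amounts to computing quadrature weights that represent an integral functional exactly on $P^M(\Omega)$. The following one-dimensional sketch makes this concrete. The bump kernel and the nodes are illustrative assumptions (the actual $\Phi^J_m$ are Daubechies-type wavelets): the weights $c_k$ are obtained from a transposed Vandermonde system, anticipating the determinant argument around (4.16) below.

```python
import numpy as np

M = 5
nodes = np.linspace(0.3, 0.7, M)              # distinct nodes x_0, ..., x_{M-1}

def phi(y):
    """Smooth bump supported in (0.25, 0.75): a placeholder kernel, not Phi^J_m."""
    z = (y - 0.5) / 0.25
    out = np.zeros_like(y)
    inside = np.abs(z) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - z[inside] ** 2))
    return out

y = np.linspace(0.0, 1.0, 200001)
dy = y[1] - y[0]

def integral(values):
    """Trapezoidal rule on the fixed grid."""
    return float((np.sum(values) - 0.5 * (values[0] + values[-1])) * dy)

# Moments ell(y^m) = int y^m phi(y) dy of the functional P -> int P phi.
moments = np.array([integral(y ** m * phi(y)) for m in range(M)])

# Transposed Vandermonde system: sum_k c_k x_k^m = ell(y^m) for m < M.
V = np.vander(nodes, M, increasing=True)      # V[k, m] = nodes[k] ** m
c = np.linalg.solve(V.T, moments)

# Exactness on an arbitrary polynomial of degree < M, as claimed in (4.13).
coef = np.array([0.7, -1.2, 0.5, 2.0, -0.3])
P = np.polynomial.Polynomial(coef)
lhs = integral(P(y) * phi(y))
rhs = float(c @ P(nodes))
print(abs(lhs - rhs))
```

By linearity, exactness on the monomial moments forces exactness on every polynomial of degree less than $M$; the distinctness of the nodes guarantees the Vandermonde system is solvable.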
Taking this for granted, we now put, for $\varepsilon \sim 2^{-J}$,
$$U^{\varepsilon} f = \sum_{(j,G,m) \in \langle JS \rangle^{\Omega}} \sum_{k=1}^{K} c_k^{J,m}\, f(x^{k,J,m})\, 2^{-Jn/2}\, \Phi^J_m(x) = \sum_l f(x^l)\, h_l, \qquad f \in F^{s,\mathrm{rloc}}_{pq}(\Omega). \qquad (4.14)$$
Recall that $F^{s,\mathrm{rloc}}_{pq}(\Omega) \hookrightarrow C(\Omega)$ since $s > n/p$. Hence (4.14) makes sense. By (4.12) one has (4.10) and also (4.8), where the $x^l \leftrightarrow x^{k,J,m}$ have the desired properties.

Step 2: It remains to prove (4.13). First we deal with the one-dimensional case and $\varepsilon \sim 1$. Let
$$P(x) = \sum_{m=0}^{M-1} a_m x^m, \qquad 0 < x < 1. \qquad (4.15)$$
Let $x_l = l/M$ (or nearby) with $l = 0, \ldots, M-1$. Then the determinant of the $M$ linear algebraic equations for the $a_m$,
$$\sum_{m=0}^{M-1} a_m x_l^m = P(x_l), \qquad l = 0, \ldots, M-1, \qquad (4.16)$$
is a Vandermonde determinant and hence different from zero. Consequently the coefficients $a_m$, and with them the left-hand side of (4.13), can be expressed in terms of the point values $P(x_l)$.

5. Sampling numbers

5.1. Definitions

Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $|\Omega| < \infty$. Let $G_1(\Omega)$ be either $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ according to Definition 2 with $0 < p \le \infty$, $0 < q \le \infty$ ($q = \infty$ if $p = \infty$), $s > \max(n/p, \sigma_{pq})$, or $\widetilde{B}^s_{pq}(\Omega)$, $\widetilde{F}^s_{pq}(\Omega)$ according to Definition 15(iii) with $0 < p \le \infty$, $0 < q \le \infty$, $s > n/p$ ($p < \infty$ for the $F$-spaces). Recall that all these spaces are continuously embedded in $C(\Omega)$, where $C(\Omega)$ has the same meaning as at the beginning of Section 4.2. Since $|\Omega| < \infty$ one also has a continuous embedding in $L_t(\Omega)$, $0 < t \le \infty$. Let
$$G_2(\Omega) = C(\Omega) \qquad \text{or} \qquad G_2(\Omega) = L_t(\Omega), \quad 0 < t \le \infty. \qquad (5.1)$$
Then one gets as a by-product of the considerations below that
$$\mathrm{id}_{\Omega} : G_1(\Omega) \to G_2(\Omega) \qquad (5.2)$$
is not only continuous but also compact. As for technicalities connected with these embeddings one may consult [12] and [20, Section 4.3.1], including the explanations and references given there. This
will not be repeated here. In any case, by (5.1), (5.2), pointwise evaluation of $f \in G_1(\Omega)$ makes sense. Let $\{x^k\}_{k=1}^{K} \subset \Omega$. Then the information map $N_K$,
$$N_K : G_1(\Omega) \to \mathbb{C}^K, \qquad K \in \mathbb{N}, \qquad (5.3)$$
given by
$$N_K f = \big(f(x^1), \ldots, f(x^K)\big), \qquad f \in G_1(\Omega), \qquad (5.4)$$
is reasonable. Let $U_K = \Phi_K \circ N_K$, where
$$\Phi_K : \mathbb{C}^K \to G_2(\Omega) \qquad (5.5)$$
is an arbitrary map (also called method or algorithm). Hence
$$U_K f = \Phi_K\big(f(x^1), \ldots, f(x^K)\big) \in G_2(\Omega), \qquad f \in G_1(\Omega).$$

Definition 21. Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $|\Omega| < \infty$. Let $G_1(\Omega)$ and $G_2(\Omega)$ be the above spaces and let $\mathrm{id}_{\Omega}$ be the embedding (5.2).
(i) Then
$$g_K(\mathrm{id}_{\Omega}) = \inf \sup_{\|f \mid G_1(\Omega)\| \le 1} \|f - U_K f \mid G_2(\Omega)\| \qquad (5.6)$$
is the $K$th sampling number, where the infimum is taken over all $K$-tuples $\{x^k\}_{k=1}^{K} \subset \Omega$ and all maps $U_K$ according to (5.3)–(5.5).
(ii) The linear sampling numbers $g^{\mathrm{lin}}_K(\mathrm{id}_{\Omega})$ are given by (5.6), where the infimum is taken over all $K$-tuples $\{x^k\}_{k=1}^{K}$ and all linear maps $U_K$ with
$$U_K f = \sum_{k=1}^{K} f(x^k)\, h_k, \qquad h_k \in G_2(\Omega),\ f \in G_1(\Omega). \qquad (5.7)$$

Remark 22. This is an adapted version of [12, Definition 17] and [20, Definition 4.32]. There we dealt with bounded Lipschitz domains. If one admits in (5.6) not only the specific linear maps in (5.7) but all linear maps from $G_1(\Omega)$ into $G_2(\Omega)$ with rank less than $K+1$, then one gets the well-known approximation numbers $a_{K+1}(\mathrm{id}_{\Omega})$, hence
$$a_{K+1}(\mathrm{id}_{\Omega}) \le g^{\mathrm{lin}}_K(\mathrm{id}_{\Omega}), \qquad K \in \mathbb{N}.$$
According to Theorem 23 below, in all cases considered $g^{\mathrm{lin}}_K(\mathrm{id}_{\Omega})$ tends to zero as $K \to \infty$. Then one has the same assertion for the approximation numbers $a_K(\mathrm{id}_{\Omega})$, with the well-known consequence that $\mathrm{id}_{\Omega}$ is compact.
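A minimal concrete instance of a linear sampling map of type (5.7), with hat functions on the unit interval as an illustrative choice of the $h_k$ (not the construction used in this paper): it reproduces polynomials of degree less than $2$ exactly, as in (4.10) with $M = 2$, and approximates $C^2$ functions at the rate $K^{-2}$.

```python
import numpy as np

K = 9
xk = np.linspace(0.0, 1.0, K)                 # sample points x^1, ..., x^K

def U_K(f, x):
    """U_K f = sum_k f(x^k) h_k with hat functions h_k, evaluated at the points x."""
    return np.interp(x, xk, f(xk))            # piecewise linear interpolation

x = np.linspace(0.0, 1.0, 1001)

affine = lambda t: 3.0 * t - 1.0              # a polynomial of degree < 2
err_affine = float(np.max(np.abs(U_K(affine, x) - affine(x))))

smooth = lambda t: np.sin(2.0 * np.pi * t)    # a C^2 function
err_smooth = float(np.max(np.abs(U_K(smooth, x) - smooth(x))))
print(err_affine, err_smooth)                  # ~ 0 and O(K^{-2}), respectively
```

The exact reproduction of affine functions is the discrete counterpart of the polynomial reproducing property (4.10); only point values $f(x^k)$ enter, which is what distinguishes sampling numbers from approximation numbers.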
5.2. Main assertions

After all these preparations we are now in a position to apply the techniques developed in [12] to determine the behaviour of the sampling numbers of the compact embeddings $\mathrm{id}_{\Omega}$ in (5.2), with the specifications indicated. Recall that $a_+ = \max(a, 0)$ if $a \in \mathbb{R}$. As usual, $\sim$ means that there are positive equivalence constants which are independent of $k \in \mathbb{N}$.
Theorem 23. (i) Let $\Omega$ be an arbitrary domain in $\mathbb{R}^n$ with $|\Omega| < \infty$. Let $F^{s,\mathrm{rloc}}_{pq}(\Omega)$ with
$$0 < p \le \infty, \qquad 0 < q \le \infty, \qquad s > \max\Big(\frac{n}{p}, \sigma_{pq}\Big) \qquad (5.8)$$
($q = \infty$ if $p = \infty$) be the refined localisation spaces according to Definition 2. Then
$$\mathrm{id}_{\Omega} : F^{s,\mathrm{rloc}}_{pq}(\Omega) \to L_t(\Omega), \qquad 0 < t \le \infty$$
(where $L_{\infty}(\Omega)$ can be replaced by $C(\Omega)$) is compact. Furthermore,
$$g_k(\mathrm{id}_{\Omega}) \sim g^{\mathrm{lin}}_k(\mathrm{id}_{\Omega}) \sim k^{-s/n + (1/p - 1/t)_+}, \qquad k \in \mathbb{N}.$$
(ii) Let $\Omega$ be an E-thick domain in $\mathbb{R}^n$ according to Definition 12 with $|\Omega| < \infty$ and let $\widetilde{A}^s_{pq}(\Omega)$ with $A = B$ or $A = F$ be the spaces as introduced in Definition 15(iii) with
$$0 < p \le \infty, \qquad 0 < q \le \infty, \qquad s > \frac{n}{p}$$
($p < \infty$ for the $F$-spaces). Then
$$\widetilde{\mathrm{id}}_{\Omega} : \widetilde{A}^s_{pq}(\Omega) \to L_t(\Omega), \qquad 0 < t \le \infty$$
(where $L_{\infty}(\Omega)$ can be replaced by $C(\Omega)$) is compact. Furthermore,
$$g_k(\widetilde{\mathrm{id}}_{\Omega}) \sim g^{\mathrm{lin}}_k(\widetilde{\mathrm{id}}_{\Omega}) \sim k^{-s/n + (1/p - 1/t)_+}, \qquad k \in \mathbb{N}.$$

Proof. Step 1: As for the compactness we refer to the comments in Remark 22.

Step 2: We prove (i). Let $p < t \le \infty$. Then
$$F^{s,\mathrm{rloc}}_{pq}(\Omega) \hookrightarrow F^{\sigma,\mathrm{rloc}}_{t\infty}(\Omega), \qquad \sigma - \frac{n}{t} = s - \frac{n}{p} > 0. \qquad (5.9)$$
This follows from the well-known embedding
$$F^s_{pq}(\mathbb{R}^n) \hookrightarrow F^{\sigma}_{t\infty}(\mathbb{R}^n),$$
(2.10) and the monotonicity of the $\ell_r$-spaces. We wish to prove that
$$g^{\mathrm{lin}}_k(\mathrm{id}_{\Omega}) \le c\, k^{-s/n + (1/p - 1/t)_+}, \qquad k \in \mathbb{N}, \qquad (5.10)$$
in all cases. Using (5.9) if $p < t$, and Hölder's inequality for $L_t(\Omega)$ if $t < p$ (based on $|\Omega| < \infty$), it follows that we may assume $p = t$, which means
$$g^{\mathrm{lin}}_k(\mathrm{id}_{\Omega}) \le c\, k^{-s/n}, \qquad k \in \mathbb{N}, \qquad (5.11)$$
where
$$\mathrm{id}_{\Omega} : F^{s,\mathrm{rloc}}_{pq}(\Omega) \to L_p(\Omega), \qquad 0 < p \le \infty$$
($q = \infty$ if $p = \infty$). Let $\Omega^{\varepsilon}$ with $0 < \varepsilon < 1$ be as in (4.7) and let $\psi^{\varepsilon}$ be the same cut-off function as after (4.11), with the usual conditions for $D(\Omega^{\varepsilon})$, such that the pointwise multiplier assertion in Proposition 42(ii) can be applied uniformly with respect to $\varepsilon$. In particular, one gets for $f^{\varepsilon} = (1 - \psi^{\varepsilon})f$ with $f \in F^{s,\mathrm{rloc}}_{pq}(\Omega)$ that
$$\|f^{\varepsilon} \mid L_p(\Omega)\| \le c\, \varepsilon^s\, \big\| \mathrm{dist}(\cdot, \partial\Omega)^{-s} f \mid L_p(\Omega) \big\| \le c'\, \varepsilon^s\, \|f \mid F^{s,\mathrm{rloc}}_{pq}(\Omega)\|, \qquad (5.12)$$
where we used (2.14). Next we apply the polynomial reproducing formula (4.9), (4.10) to $f - f^{\varepsilon} = \psi^{\varepsilon} f$. But then one is precisely in the same situation as in [12, Proposition 21 and its proof] and [20, Proposition 4.36], where one now has to use Theorem 4. This will not be repeated here. With $\varepsilon \sim 2^{-J}$ and $k \sim 2^{Jn}$ one gets (5.11) from [12,20], applied (uniformly) to $\psi^{\varepsilon} f$, and from (5.12). This proves (5.10). The rest is now the same as in [12,20]. In particular the estimate from below,
$$c\, k^{-s/n + (1/p - 1/t)_+} \le g_k(\mathrm{id}_{\Omega}) \le g^{\mathrm{lin}}_k(\mathrm{id}_{\Omega}), \qquad k \in \mathbb{N},$$
for some $c > 0$, is local and can be taken over verbally.

Step 3: We prove part (ii). By Corollary 18 and part (i) we have the desired assertion for the spaces $\widetilde{F}^s_{pq}(\Omega)$ with (5.8). Let $0 < p \le \infty$, $0 < q \le \infty$ and
$$s_0 > s > s_1 > n/p, \qquad s = (1 - \theta)s_0 + \theta s_1.$$
The well-known real interpolation formula in $\mathbb{R}^n$,
$$B^s_{pq}(\mathbb{R}^n) = \big(F^{s_0}_{pp}(\mathbb{R}^n), F^{s_1}_{pp}(\mathbb{R}^n)\big)_{\theta,q},$$
has the $\Omega$-counterpart
$$\widetilde{B}^s_{pq}(\Omega) = \big(\widetilde{F}^{s_0}_{pp}(\Omega), \widetilde{F}^{s_1}_{pp}(\Omega)\big)_{\theta,q}. \qquad (5.13)$$
This is not obvious and requires some effort. But we omit the details and take it for granted. For the same linear operator $U^{\varepsilon}$ according to (4.9) we get by the above considerations
$$\|f - U^{\varepsilon} f \mid L_t(\Omega)\| \le c\, \varepsilon^{s_0 - n(1/p - 1/t)_+}\, \|f \mid \widetilde{F}^{s_0}_{pp}(\Omega)\|, \qquad f \in \widetilde{F}^{s_0}_{pp}(\Omega), \qquad (5.14)$$
and
$$\|f - U^{\varepsilon} f \mid L_t(\Omega)\| \le c\, \varepsilon^{s_1 - n(1/p - 1/t)_+}\, \|f \mid \widetilde{F}^{s_1}_{pp}(\Omega)\|, \qquad f \in \widetilde{F}^{s_1}_{pp}(\Omega). \qquad (5.15)$$
One may also consult [20, (4.188)]. Then one gets by (5.13)–(5.15) and the interpolation property that
$$\|f - U^{\varepsilon} f \mid L_t(\Omega)\| \le c\, \varepsilon^{s - n(1/p - 1/t)_+}\, \|f \mid \widetilde{B}^s_{pq}(\Omega)\|$$
and, using elementary embeddings,
$$\|f - U^{\varepsilon} f \mid L_t(\Omega)\| \le c\, \varepsilon^{s - n(1/p - 1/t)_+}\, \|f \mid \widetilde{F}^s_{pq}(\Omega)\|.$$
This proves the counterpart of (5.10),
$$g^{\mathrm{lin}}_k(\widetilde{\mathrm{id}}_{\Omega}) \le c\, k^{-s/n + (1/p - 1/t)_+}, \qquad k \in \mathbb{N}.$$
The rest is now the same as in Step 2 and as in [12,20].
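The rate $k^{-s/n + (1/p - 1/t)_+}$ of Theorem 23 can be made tangible in the simplest case $n = 1$, $p = t = \infty$, $s = 2$: piecewise linear interpolation at $k$ equidistant points (an illustrative linear sampling map, not the operator $U^{\varepsilon}$ from the proof) has sup-norm error of order $k^{-2}$ for $C^2$ functions, so doubling $k$ divides the error by roughly $4$.

```python
import numpy as np

f = lambda t: np.sin(2.0 * np.pi * t)          # a C^2 function on [0, 1]
x = np.linspace(0.0, 1.0, 100001)              # fine grid for the sup-norm

def sup_error(k: int) -> float:
    """Sup-norm error of piecewise linear interpolation at k equidistant points."""
    xk = np.linspace(0.0, 1.0, k)
    return float(np.max(np.abs(np.interp(x, xk, f(xk)) - f(x))))

e1, e2 = sup_error(64), sup_error(128)
print(e1 / e2)                                  # roughly 4 = 2**2, i.e. rate k^{-2}
```

This is only a sanity check of the predicted exponent, not a substitute for the two-sided estimate in the theorem, whose lower bound is a separate local argument.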
Acknowledgement I wish to thank the referees for careful reading and some suggestions, which I inserted. Appendix A. Function spaces on Euclidean n-space A.1. Definitions We use standard notation. Let N be the collection of all natural numbers and N0 = N ∪ {0}. Let Rn be Euclidean n-space, where n ∈ N. Put R = R1 , whereas C is the complex plane. Let S(Rn ) be the usual Schwartz space and S (Rn ) the space of all tempered distributions on Rn . Furthermore, Lp (Rn ) with 0 < p ∞, is the standard quasi-Banach space with respect to the Lebesgue measure, quasi-normed by 1/p f |Lp (Rn ) = |f (x)|p dx Rn
with the obvious modification if p = ∞. As usual, Z is the collection of all integers; and Zn where n ∈ N, denotes the lattice of all points m = (m1 , . . . , mn ) ∈ Rn with mj ∈ Z. Let Nn0 , where n ∈ N, be the set of all multi-indices, = (1 , . . . , n ) with j ∈ N0 and || =
n
j .
j =1
If x = (x1 , . . . , xn ) ∈ Rn and = (1 , . . . , n ) ∈ Nn0 then we put
x = x 1 · · · xn n
(monomials).
If ∈ S(Rn ) then () = (F )() = (2)−n/2
Rn
e−ix (x) dx,
∈ Rn ,
(A.1)
denotes the Fourier transform of . As usual, F −1 or ∨ , stands for the inverse Fourier transform, given by the right-hand side of (A.1) with i in place of −i. Here, x denotes the scalar product in Rn . Both F and F −1 are extended to S (Rn ) in the standard way. Let 0 ∈ S(Rn ) with 0 (x) = 1
if |x| 1 and 0 (y) = 0
if |y|3/2,
and let k (x) = 0 (2−k x) − 0 (2−k+1 x), x ∈ Rn , k ∈ N. $ n Since ∞ j =0 j (x) = 1 for x ∈ R , the j form a dyadic resolution of unity. The entire analytic
∨ functions j f (x) make sense pointwise for any f ∈ S (Rn ). Definition 24. Let = {j }∞ j =0 be the above dyadic resolution of unity. (i) Let 0 < p ∞,
0 < q ∞, s ∈ R.
H. Triebel / Journal of Complexity 23 (2007) 468 – 497 s (Rn ) is the collection of all f ∈ S (Rn ) such that Then Bpq ⎞1/q ⎛ ∞
∨ s (Rn ) = ⎝ f |Bpq 2j sq j f |Lp (Rn )q ⎠ < ∞
491
(A.2)
j =0
(with the usual modification if q = ∞). (ii) Let 0 < p < ∞,
0 < q ∞, s ∈ R.
s (Rn ) is the collection of all f ∈ S (Rn ) such that Then Fpq ⎛ ⎞1/q ∞
∨ ⎝ j sq q n n s f |Fpq (R ) = 2 | j f (·)| ⎠ Lp (R ) < ∞ j =0
(A.3)
(with the usual modification if q = ∞). Remark 25. The theory of these spaces may be found in [15,16,20], including many historical references. We only mention that these spaces are independent of (equivalent quasi-norms for admitted ’s). This justifies our omission of the subscript in (A.2), (A.3) in the sequel. In Section 2.1 we listed some (more or less classical) special cases. A.2. Properties We collect those (and only those) properties needed in the main body of this paper and from which we believe that they complement in a decisive way classical instruments such as derivatives and differences in connection with problems as treated here. A.2.1. Atoms Let Qj m be cubes in Rn with sides parallel to the axes of coordinates, centred at 2−j m with side length 2−j +1 where m ∈ Zn and j ∈ N0 . If Q is a cube in Rn and r > 0 then rQ is the cube in Rn concentric with Q and with side-length r times of the side length of Q. Let j m be the characteristic function of Qj m . Definition 26. Let 0 < p ∞, 0 < q ∞. Then bpq is the collection of all sequences
= j m ∈ C : j ∈ N0 , m ∈ Zn
(A.4)
such that
⎛ ⎛ ⎞q/p ⎞1/q ∞ ⎟ ⎜ ⎝ |bpq = ⎝ | j m |p ⎠ ⎠ < ∞, j =0
m∈Zn
and fpq is the collection of all sequences according to (A.4) such that ⎛ ⎞1/q ⎝ |fpq = 2j nq/p | j m j m (·)|q ⎠ Lp (Rn ) < ∞. j,m
(A.5)
492
H. Triebel / Journal of Complexity 23 (2007) 468 – 497
Remark 27. If p = ∞ and/or q = ∞ then one has to modify in the usual way. Note that the factor 2j nq/p in (A.5) disappears if one relies on the p-normalised characteristic function (p) j m (x) = 2j n/p j m (x). Next we introduce atoms, which may be discontinuous. Definition 28. Let s ∈ R, 0 < p ∞, K ∈ N0 , L ∈ N0 , and c 1. Then L∞ -functions aj m : Rn → C with j ∈ N0 , m ∈ Zn , are called (s, p)-atoms if supp aj m ⊂ c Qj m ,
j ∈ N0 , m ∈ Zn ,
there exists all (classical) derivatives D aj m with || K such that |D aj m (x)|2−j (s−n/p)+j || , and
Rn
x aj m (x) dx = 0
||K, j ∈ N0 , m ∈ Zn ,
for || < L, j ∈ N, m ∈ Zn .
(A.6)
(A.7)
Remark 29. There are no moment conditions (A.7) for a0,m . Furthermore, if L = 0 then (A.7) is empty (no conditions). Of course, the above atoms depend on K, L, and c. But this will not be indicated. We put as usual 1 1 p = n (A.8) and pq = n −1 −1 p min(p, q) + + where b+ = max(b, 0) if b ∈ R. Theorem 30. (i) Let 0 < p ∞, 0 < q ∞, s ∈ R. Let K ∈ N0 , L ∈ N0 with K>s
L > p − s
and
(A.9)
s (Rn ) if, and only if, it can be represented as be fixed. Then f ∈ S (Rn ) belongs to Bpq
f =
∞ j =0 m∈Z
j m a j m ,
(A.10)
n
where for fixed c 1, aj m are (s, p)-atoms according to Definition 28 with (A.9) and ∈ bpq . Furthermore, s (Rn ) ∼ inf |b f |Bpq pq
are equivalent quasi-norms where the infimum is taken over all admissible representations (A.10). (ii) Let 0 < p < ∞, 0 < q ∞, s ∈ R. Let K ∈ N0 , L ∈ N0 with K>s
and
L > pq − s
(A.11)
s (Rn ) if, and only if, it can be represented by (A.10) where be fixed. Then f ∈ S (Rn ) belongs to Fpq now for fixed c 1, aj m are (s, p)-atoms according to Definition 28 with (A.11) and ∈ fpq . Furthermore, s (Rn ) ∼ inf |f f |Fpq pq
are equivalent quasi-norms where the infimum is taken over all admissible representations (A.10).
H. Triebel / Journal of Complexity 23 (2007) 468 – 497
493
Remark 31. These formulations coincide essentially with [20, Section 1.5.1]. There one finds technical comments how the convergence in (A.10) must be understood. Atoms of the above type go back essentially to [6,7]. But more details about the history of atoms may be found in [16, Section 1.9]. A.2.2. Local means Compactly supported kernels of local means are dual to atoms according to Definition 28. The cubes Qj m have the same meaning as above. Definition 32. Let A ∈ N0 , B ∈ N0 and C > 0. Then L∞ -functions kj m : Rn → C with j ∈ N0 , m ∈ Zn , are called kernels if supp kj m ⊂ C Qj m ,
j ∈ N0 , m ∈ Zn ,
there exist all (classical) derivatives D kj m with || A such that |D kj m (x)|C 2j n+j || , and
Rn
x kj m (x) dx = 0
||A, j ∈ N0 , m ∈ Zn ,
(A.12)
for || < B, j ∈ N, m ∈ Zn .
(A.13)
Remark 33. There are no moment conditions (A.13) for k0,m . If B = 0 then (A.13) is empty. Compared with Definition 28 for atoms we have different normalisations in (A.6) and (A.12) (also s (Rn ) due to the history of atoms). Roughly speaking atoms are normalised building blocks in Bpq n s and Fpq (R ), reflected by Theorem 30 based on the sequence spaces bpq and fpq in Definition 26. Now we adapt these sequence spaces to the above kernels. Again j m are the characteristic functions of Qj m . Definition 34. Let s ∈ R, 0 < p ∞, 0 < q ∞. Then bspq is the collection of all sequences
according to (A.4) such that ⎛ ⎛ ⎞q/p ⎞1/q ∞ ⎟ ⎜ |bspq = ⎝ 2j (s−n/p)q ⎝ | j m |p ⎠ ⎠ < ∞, j =0
m∈Zn
and f spq is the collection of all sequences according to (A.4) such that ⎛ ⎞1/q ⎝ s j sq q⎠ n |f pq = 2 | j m j m (·)| Lp (R ) < ∞. j,m Remark 35. In connection with wavelets we introduce the slightly modified sequence spaces s and f s without the above bar. Otherwise we wish to specify the sequence in (A.4) by the bpq pq sequence of local means k(f ) = kj m (f ) : j ∈ N0 , m ∈ Zn , (A.14)
494
H. Triebel / Journal of Complexity 23 (2007) 468 – 497
where
kj m (f ) =
Rn
kj m (y) f (y) dy = f, kj m
(A.15)
s (Rn ) or f ∈ F s (Rn ). This requires that k considered as a dual pairing where f ∈ Bpq j m belongs pq n n s s to the dual spaces of Bpq (R ) or Fpq (R ). This will always be the case in what follows. But we do not discuss this point here.
Theorem 36. (i) Let 0 < p ∞, 0 < q ∞, s ∈ R. Let kj m be kernels according to Definition 32 where A ∈ N0 , B ∈ N0 with A > p − s,
B > s,
(A.16)
s (Rn ), and C > 0 be fixed. Let k(f ) be as in (A.14), (A.15). Then for some c > 0 and all f ∈ Bpq s (Rn ). k(f ) |bspq c f |Bpq
(ii) Let 0 < p < ∞, 0 < q ∞, s ∈ R. Let kj m and k(f ) be again the above kernels where A ∈ N0 , B ∈ N0 , with A > pq − s,
B > s,
(A.17)
s (Rn ), and C > 0 are fixed. Then for some c > 0 and all f ∈ Fpq s (Rn ). k(f ) |f spq c f |Fpq
Remark 37. The duality between atoms and kernels is well reflected by (A.9), (A.11) compared with (A.16), (A.17). Later on we will even choose K=B
and L = A,
changing the roles of the needed smoothness and cancellations. The proof of the above theorem is somewhat complicated and will be shifted to a later occasion. But local means (of continuous, or, as above, of discrete type) are well known and have their own history. This theory started (at least as far as presentations in books are concerned) in [16, Sections 1.8.4, 2.4.6, 2.5.3]. A recent account may be found in [20, Section 1.4]. On the other hand dual pairings of type (A.15) have s (Rn ) and F s (Rn ) at many occasions. been considered constantly in the theory of the spaces Bpq pq Nearest to us and to the above theorem might be [9]. A.2.3. Wavelets We suppose that the reader is familiar with wavelets in Rn of Daubechies type and the related multi-resolution analysis. The standard references are [5,10,11,24]. A short summary of what is needed in our context may also be found in [20, Section 1.7.3]. In [20, Section 3.1] we dealt with s (Rn ) and F s (Rn ). We describe now wavelet bases and wavelet isomorphisms for all spaces Bpq pq an improved version based on the new Theorem 36 which was not known to us when [20, Section 3.1] was written. This improvement is helpful even in Rn , but indispensable when it comes to domains. As usual C u (R) collects all (complex-valued) continuous functions on R having continuous bounded derivatives up to order u ∈ N. Let F ∈ C u (R),
M ∈ C u (R), u ∈ N,
(A.18)
H. Triebel / Journal of Complexity 23 (2007) 468 – 497
495
be real compactly supported Daubechies wavelets with M (x) x v dx = 0 for v ∈ N0 with v < u.
(A.19)
R
Recall that $\psi_F$ is called the scaling function (father wavelet) and $\psi_M$ the associated (mother) wavelet. We extend these wavelets from $\mathbb{R}$ to $\mathbb{R}^n$ by the usual tensor procedure. Let
\[
G = (G_1, \ldots, G_n) \in G^0 = \{F, M\}^n,
\tag{A.20}
\]
which means that each $G_r$ is either $F$ or $M$. Let
\[
G = (G_1, \ldots, G_n) \in G^j = \{F, M\}^{n*}, \quad j \in \mathbb{N},
\tag{A.21}
\]
where each $G_r$ is either $F$ or $M$ and $*$ indicates that at least one of the components of $G$ must be an $M$. Hence $G^0$ has $2^n$ elements, whereas $G^j$ with $j \in \mathbb{N}$ has $2^n - 1$ elements. Let
\[
\Psi^j_{G,m}(x) = 2^{jn/2} \prod_{r=1}^{n} \psi_{G_r}\bigl(2^j x_r - m_r\bigr), \quad G \in G^j, \; m \in \mathbb{Z}^n,
\tag{A.22}
\]
where $j \in \mathbb{N}_0$. We always assume that $\psi_F$ and $\psi_M$ in (A.18) have $L_2$-norm 1. Then
\[
\bigl\{ \Psi^j_{G,m} : \; j \in \mathbb{N}_0, \; G \in G^j, \; m \in \mathbb{Z}^n \bigr\}
\tag{A.23}
\]
is an orthonormal basis in $L_2(\mathbb{R}^n)$ (for any $u \in \mathbb{N}$) and
\[
f = \sum_{j=0}^{\infty} \sum_{G \in G^j} \sum_{m \in \mathbb{Z}^n} \lambda^{j,G}_m \, 2^{-jn/2} \, \Psi^j_{G,m}
\tag{A.24}
\]
with
\[
\lambda^{j,G}_m = \lambda^{j,G}_m(f) = 2^{jn/2} \int_{\mathbb{R}^n} f(x) \, \Psi^j_{G,m}(x) \, dx = 2^{jn/2} \bigl( f, \Psi^j_{G,m} \bigr)
\tag{A.25}
\]
is the corresponding expansion, adapted to our later needs, where the $2^{-jn/2} \Psi^j_{G,m}$ are uniformly bounded functions. One may ask whether (A.23) remains an (unconditional) basis in other spaces. First candidates are $L_p(\mathbb{R}^n)$ with $1 < p < \infty$, but also related (fractional) Sobolev spaces and classical Besov spaces. We refer to the books mentioned at the beginning of this Section A.2.3 and to [20, Remarks 1.63, 1.66] for more details and further references. An extension of this theory to all spaces $B^s_{pq}(\mathbb{R}^n)$ and $F^s_{pq}(\mathbb{R}^n)$ has been given in [9,18] and [20, Section 3.1.3, Theorem 3.5]. We describe an improved version of [18] and [20, Theorem 3.5]. For this purpose we adapt the sequence spaces introduced in Definition 34 (now without the bar).
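As an added numerical illustration (not part of the original text), the simplest case of the above construction — the Haar wavelet, i.e. the Daubechies wavelet with $u = 1$, in dimension $n = 1$ — can be checked directly: on a dyadic grid the system (A.23) is orthonormal and the coefficients of (A.25) reconstruct $f$ via (A.24). The factors $2^{\pm jn/2}$ in (A.24)–(A.25) cancel, so plain $L_2$ coefficients are used below.

```python
import numpy as np

# Haar case of (A.18)-(A.25): psi_F = father, psi_M = mother wavelet (u = 1, n = 1).
# On a grid of N = 2**J cells every wavelet with level j < J is piece-wise constant,
# so all L2(0,1) inner products below are computed exactly.
J = 6
N = 2**J
h = 1.0 / N
x = (np.arange(N) + 0.5) * h              # cell midpoints

def psi_F(t):                              # scaling function: 1 on [0, 1)
    return ((t >= 0) & (t < 1)).astype(float)

def psi_M(t):                              # Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1)
    return ((t >= 0) & (t < 0.5)).astype(float) - ((t >= 0.5) & (t < 1)).astype(float)

def Psi(j, G, m):                          # 2^{j/2} psi_G(2^j x - m), cf. (A.22) with n = 1
    psi = psi_F if G == 'F' else psi_M
    return 2**(j / 2) * psi(2**j * x - m)

# G^0 = {F, M}; for j >= 1 only G = M remains (at least one M component)
basis = [Psi(0, 'F', 0), Psi(0, 'M', 0)]
for j in range(1, J):
    basis += [Psi(j, 'M', m) for m in range(2**j)]
B = np.array(basis)                        # N functions sampled on N cells

gram = h * (B @ B.T)                       # exact L2 inner products
assert np.allclose(gram, np.eye(N))        # orthonormal system, cf. (A.23)

f = np.sin(2 * np.pi * x)                  # an arbitrary grid function
lam = h * (B @ f)                          # plain coefficients (f, Psi), cf. (A.25)
assert np.allclose(B.T @ lam, f)           # the expansion (A.24) reconstructs f
```

The grid size $N = 2^J$ and the truncation at level $J-1$ are assumptions of this sketch; in the text the sums in (A.24) run over all $j \in \mathbb{N}_0$ and $m \in \mathbb{Z}^n$.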
Definition 38. Let $s \in \mathbb{R}$, $0 < p \le \infty$, $0 < q \le \infty$. Then $b^s_{pq}$ is the collection of all sequences
\[
\lambda = \bigl\{ \lambda^{j,G}_m \in \mathbb{C} : \; j \in \mathbb{N}_0, \; G \in G^j, \; m \in \mathbb{Z}^n \bigr\}
\tag{A.26}
\]
such that
\[
\bigl\| \lambda \mid b^s_{pq} \bigr\|
= \Biggl( \sum_{j=0}^{\infty} 2^{j(s - n/p)q} \sum_{G \in G^j} \Bigl( \sum_{m \in \mathbb{Z}^n} \bigl| \lambda^{j,G}_m \bigr|^p \Bigr)^{q/p} \Biggr)^{1/q}
\]
is finite (with the usual modification if $p = \infty$ and/or $q = \infty$); the spaces $f^s_{pq}$ are adapted from Definition 34 in the same way.

Theorem 40. (i) Let $0 < p \le \infty$, $0 < q \le \infty$, $s \in \mathbb{R}$, and
\[
u > \max(s, \sigma_p - s)
\tag{A.29}
\]
for the wavelets in (A.18), (A.19), where, as usual, $\sigma_p = n(1/p - 1)_+$. Then $f \in S'(\mathbb{R}^n)$ is an element of $B^s_{pq}(\mathbb{R}^n)$ if, and only if, it can be represented by (A.24) with $\lambda \in b^s_{pq}$. Furthermore, if $f \in B^s_{pq}(\mathbb{R}^n)$ then the representation (A.24) is unique with $\lambda = \lambda(f)$ according to (A.25), and $f \mapsto \lambda(f)$ is an isomorphic map of $B^s_{pq}(\mathbb{R}^n)$ onto $b^s_{pq}$.
(ii) Let $0 < p < \infty$, $0 < q \le \infty$, $s \in \mathbb{R}$, and
\[
u > \max(s, \sigma_{pq} - s)
\tag{A.30}
\]
for the wavelets in (A.18), (A.19), where $\sigma_{pq} = n(1/\min(p,q) - 1)_+$. Then $f \in S'(\mathbb{R}^n)$ is an element of $F^s_{pq}(\mathbb{R}^n)$ if, and only if, it can be represented by (A.24) with $\lambda \in f^s_{pq}$. Furthermore, if $f \in F^s_{pq}(\mathbb{R}^n)$ then the representation (A.24) is unique with $\lambda = \lambda(f)$ according to (A.25), and $f \mapsto \lambda(f)$ is an isomorphic map of $F^s_{pq}(\mathbb{R}^n)$ onto $f^s_{pq}$.
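For finitely many nonzero coefficients, the quasi-norm of Definition 38 can be evaluated directly. The following sketch is an illustration added here, with a hypothetical per-level array layout (one 2-D array per level $j$, indexed by $(G, m)$); it is not part of the original text.

```python
import numpy as np

def b_norm(lam, s, p, q, n=1):
    """Truncated quasi-norm of Definition 38:
    ( sum_j 2^{j(s - n/p)q} sum_G ( sum_m |lam^{j,G}_m|^p )^{q/p} )^{1/q}.
    lam is a list over levels j = 0, 1, ...; lam[j] is a 2-D array of shape
    (#G, #m) holding the finitely many coefficients lam^{j,G}_m on level j.
    """
    total = 0.0
    for j, lj in enumerate(lam):
        lj = np.asarray(lj, dtype=float)
        inner = np.sum(np.abs(lj)**p, axis=1)**(q / p)   # one term per G
        total += 2**(j * (s - n / p) * q) * inner.sum()
    return total**(1 / q)

# sanity check: a single nonzero coefficient on level j has norm 2^{j(s - n/p)}
lam = [np.zeros((1, 4)), np.zeros((1, 8)), np.zeros((1, 16))]
lam[2][0, 5] = 1.0
assert np.isclose(b_norm(lam, s=1.0, p=2.0, q=2.0), 2**(2 * (1.0 - 0.5)))
```

The finite truncation in $j$, $G$ and $m$ is an assumption of the sketch; the spaces of Definition 38 allow infinitely many coefficients.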
Remark 41. Compared with [18] and [20, Theorem 3.5] we now have the better and more natural conditions (A.29), (A.30) for $u$ in (A.18), (A.19). In [20] we relied on the characterisation of $B^s_{pq}(\mathbb{R}^n)$ and $F^s_{pq}(\mathbb{R}^n)$ according to [20, Proposition 3.3] in terms of maximal functions. This spoils the estimate for $u$. Replacing this proposition by the above Theorem 36, one gets the above theorem without any additional effort. If $p < \infty$, $q < \infty$ then (A.23) is an unconditional Schauder basis in $B^s_{pq}(\mathbb{R}^n)$ and $F^s_{pq}(\mathbb{R}^n)$. Otherwise we refer to [20] for a more careful discussion of convergence.

A.2.4. Homogeneity and pointwise multipliers

We need a few further preparations for the spaces $F^s_{pq}(\mathbb{R}^n)$ with $p < \infty$, complemented now by $F^s_{\infty\infty}(\mathbb{R}^n) = B^s_{\infty\infty}(\mathbb{R}^n)$.

Proposition 42. Let $0 < p \le \infty$, $0 < q \le \infty$, $s > \sigma_{pq}$
(with $q = \infty$ if $p = \infty$) and $0 < \varepsilon \le 1$.
(i) Then
\[
\bigl\| f(\varepsilon \cdot) \mid F^s_{pq}(\mathbb{R}^n) \bigr\| \sim \varepsilon^{\,s - n/p} \, \bigl\| f \mid F^s_{pq}(\mathbb{R}^n) \bigr\|
\tag{A.31}
\]
for all
\[
f \in F^s_{pq}(\mathbb{R}^n) \quad \text{with} \quad \operatorname{supp} f \subset \{ x : |x| < \varepsilon \},
\tag{A.32}
\]
where the equivalence constants in (A.31) are independent of $\varepsilon$ and of $f$ with (A.32).
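As a consistency check (added here), the exponent $s - n/p$ in (A.31) matches the exact dilation identity in $L_p(\mathbb{R}^n)$, which is the "no smoothness" case:

```latex
\bigl\| f(\varepsilon\cdot) \mid L_p(\mathbb{R}^n) \bigr\|^p
 = \int_{\mathbb{R}^n} |f(\varepsilon x)|^p \, dx
 = \varepsilon^{-n} \int_{\mathbb{R}^n} |f(y)|^p \, dy ,
\qquad\text{hence}\qquad
\bigl\| f(\varepsilon\cdot) \mid L_p \bigr\| = \varepsilon^{-n/p} \bigl\| f \mid L_p \bigr\| .
```

Each derivative of order $s$ applied to $f(\varepsilon\cdot)$ contributes a further factor $\varepsilon^{s}$, which is the heuristic behind the exponent $s - n/p$ in (A.31); the proposition makes this rigorous for the full scale $F^s_{pq}(\mathbb{R}^n)$ under the support condition (A.32).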
(ii) Then there is a constant $c$ such that
\[
\bigl\| \varphi f \mid F^s_{pq}(\mathbb{R}^n) \bigr\| \le c \, \bigl\| f \mid F^s_{pq}(\mathbb{R}^n) \bigr\|
\]
for all $0 < \varepsilon \le 1$, all $f$ according to (A.32) and all $\varphi$ having classical derivatives up to order $1 + [s]$ with
\[
\bigl| D^{\alpha} \varphi(x) \bigr| \le \varepsilon^{-|\alpha|}, \qquad |\alpha| \le 1 + [s], \; |x| < 2\varepsilon.
\]
Remark 43. These assertions are covered by [17, Sections 5.16, 5.17].

References

[1] A. Cohen, Numerical Analysis of Wavelet Methods, North-Holland, Elsevier, 2003.
[2] A. Cohen, W. Dahmen, R. DeVore, Multiscale decompositions on bounded domains, Trans. Amer. Math. Soc. 352 (2000) 3651–3685.
[3] A. Cohen, W. Dahmen, R. DeVore, Adaptive wavelet techniques in numerical simulation, in: Encyclopedia of Computational Mechanics, Wiley, Chichester, 2004, pp. 1–64.
[4] W. Dahmen, Wavelet methods for PDEs—some recent developments, J. Comput. Appl. Math. 128 (2001) 133–185.
[5] I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia, 1992.
[6] M. Frazier, B. Jawerth, Decomposition of Besov spaces, Indiana Univ. Math. J. 34 (1985) 777–799.
[7] M. Frazier, B. Jawerth, A discrete transform and decompositions of distribution spaces, J. Funct. Anal. 93 (1990) 34–170.
[8] J.A. Hogan, J.D. Lakey, Time–Frequency and Time-Scale Methods, Birkhäuser, Boston, 2005.
[9] G. Kyriazis, Decomposition systems for function spaces, Studia Math. 157 (2003) 133–169.
[10] S. Mallat, A Wavelet Tour of Signal Processing, second ed., Academic Press, San Diego, 1999.
[11] Y. Meyer, Wavelets and Operators, Cambridge University Press, Cambridge, UK, 1992.
[12] E. Novak, H. Triebel, Function spaces in Lipschitz domains and optimal rates of convergence for sampling, Constr. Approx. 23 (2006) 325–350.
[13] E.M. Stein, Singular Integrals and Differentiability Properties of Functions, Princeton University Press, Princeton, 1970.
[14] H. Triebel, Interpolation Theory, Function Spaces, Differential Operators, North-Holland, Amsterdam, 1978.
[15] H. Triebel, Theory of Function Spaces, Birkhäuser, Basel, 1983.
[16] H. Triebel, Theory of Function Spaces II, Birkhäuser, Basel, 1992.
[17] H. Triebel, The Structure of Functions, Birkhäuser, Basel, 2001.
[18] H. Triebel, A note on wavelet bases in function spaces, in: Orlicz Centenary Volume, Banach Center Publications, vol. 64, Polish Acad. Sci., Warszawa, 2004, pp. 193–206.
[19] H. Triebel, Sampling numbers and embedding constants, Trudy Mat. Inst. Steklov 248 (2005) 275–284 [Proc. Steklov Inst. Math. 248 (2005) 268–277].
[20] H. Triebel, Theory of Function Spaces III, Birkhäuser, Basel, 2006.
[21] H. Triebel, Wavelets in function spaces on Lipschitz domains, Math. Nachr., to appear.
[22] H. Triebel, H. Winkelvoss, Intrinsic atomic characterizations of function spaces on domains, Math. Z. 221 (1996) 647–673.
[23] H. Wendland, Local polynomial reproduction and moving least squares approximation, IMA J. Numer. Anal. 21 (2001) 285–300.
[24] P. Wojtaszczyk, A Mathematical Introduction to Wavelets, Cambridge University Press, Cambridge, UK, 1997.
Journal of Complexity 23 (2007) 498 – 515 www.elsevier.com/locate/jco
CBS constants for multilevel splitting of graph-Laplacian and application to preconditioning of discontinuous Galerkin systems

R.D. Lazarov (a,b,*), S.D. Margenov (c)

a Department of Mathematics, Texas A&M University, College Station, TX 77843, USA
b Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria
c Institute for Parallel Processing, Bulgarian Academy of Sciences, Sofia, Bulgaria
Received 31 May 2006; accepted 3 October 2006
Available online 19 December 2006
Dedicated to Professor Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

The goal of this work is to derive and justify a multilevel preconditioner of optimal arithmetic complexity for symmetric interior penalty discontinuous Galerkin finite element approximations of second order elliptic problems. Our approach is based on the following simple idea, given in [R.D. Lazarov, P.S. Vassilevski, L.T. Zikatanov, Multilevel preconditioning of second order elliptic discontinuous Galerkin problems, Preprint, 2005]. The finite element space V of piece-wise polynomials, discontinuous on the partition T, is projected onto the space of piece-wise constant functions on the same partition, which constitutes the largest space in the multilevel method. The discontinuous Galerkin finite element system on this space is associated with the so-called "graph-Laplacian". In 2-D this is a sparse M-matrix with −1 as off-diagonal entries and nonnegative row sums. Under the assumption that the finest partition is a result of multilevel refinement of a given coarse mesh, we develop the concept of hierarchical splitting of the unknowns. Then, using local analysis, we derive estimates for the constants in the strengthened Cauchy–Bunyakowski–Schwarz (CBS) inequality which are uniform with respect to the levels. This measure of the angle between the spaces of the splitting was used by Axelsson and Vassilevski in [Algebraic multilevel preconditioning methods II, SIAM J. Numer. Anal. 27 (1990) 1569–1590] to construct an algebraic multilevel iteration (AMLI) for finite element systems. The main contribution of this paper is the construction of a splitting that produces new estimates for the CBS
∗ Corresponding author. Department of Mathematics, Texas A&M University, College Station, TX 77843, USA.
Fax: +1 979 8624190. E-mail addresses:
[email protected] (R.D. Lazarov),
[email protected] (S.D. Margenov). 0885-064X/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2006.10.003
constant for the graph-Laplacian. As a result we obtain a preconditioner for the discontinuous Galerkin finite element system of optimal arithmetic complexity.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Discontinuous Galerkin; Second order elliptic equation; Graph-Laplacian; Multilevel preconditioning; CBS constant
1. Introduction

Consider a second order elliptic problem on a polygonal domain $\Omega \subset \mathbb{R}^2$:
\[
-\nabla \cdot (a(x)\nabla u) = f(x) \quad \text{in } \Omega, \qquad
u(x) = g_D \quad \text{on } \Gamma_D, \qquad
a \nabla u \cdot \mathbf{n} = g_N \quad \text{on } \Gamma_N.
\tag{1.1}
\]
Here $\mathbf{n}$ is the exterior unit normal vector to $\partial\Omega \equiv \Gamma$. The boundary $\Gamma$ is assumed to be decomposed into two disjoint parts $\Gamma_D$ and $\Gamma_N$, $\Gamma_D \cap \Gamma_N = \emptyset$, and the boundary data $g_D$, $g_N$ are smooth. For the formulation below we shall need the existence of the traces of $u$ and $a \nabla u \cdot \mathbf{n}$ on certain interfaces in $\Omega$; thus, the solution $u$ is assumed to have the required regularity. To simplify our exposition we assume that the set $\Gamma_D$ is not empty and its 1-D measure is nonzero.

In Section 2 we introduce two discontinuous Galerkin FEMs for the above problem that lead to symmetric and positive definite algebraic problems. These are special cases of a more general class of discontinuous Galerkin schemes for second order elliptic problems (see, e.g. [1,10,13,17]). Below we comment on some of the first works on efficient solution methods for the systems of linear equations arising in discontinuous Galerkin approximations [9,15,16].

In [15] Gopalakrishnan and Kanschat discuss and study a multigrid method (MG) for a symmetric discontinuous Galerkin method. A multigrid with a variable number of smoothing steps is considered under the standard assumptions: (1) there is a sequence of nested triangulations $T_1 \subset T_2 \subset \cdots \subset T_J$ and multilevel spaces $V_1 \subset V_2 \subset \cdots \subset V_J$ of piece-wise polynomials that define a sequence of operators $A_1, A_2, \ldots, A_J$ and corresponding projections $P_{k-1} : V_k \to V_{k-1}$; (2) $P_{k-1}$ has a certain weak approximation property: for all $u \in V_k$, $k = 2, \ldots, J$:
\[
||| u - P_{k-1} u |||_k \le C h_k \| A_k u \|_{-1+\alpha}.
\]
It is shown that if the number $m(k)$ of smoothing steps increases as $k$ decreases, say $m(k) = 2^{J-k}$, then the MG preconditioner has optimal arithmetic complexity, comparable to the complexity of the W-cycle.

Brenner and Zhao [9] considered rectangular partitions and bilinear finite element spaces in the discontinuous Galerkin method for the above problem. Their main result for the V-cycle can be summarized as follows: if the solution satisfies the a priori estimate $\| u \|_{H^{1+\alpha}} \le C \| f \|_{H^{-1+\alpha}}$, $\alpha \in (\tfrac12, 1]$, then there is $m_0$, independent of the number of levels $k$, such that the norm of the multigrid error propagation operator $E_{mg}$ satisfies the estimate $\| E_{mg} \| \le C / m^{\alpha}$, $k \ge 1$, $m \ge m_0$. This shows that higher smoothness of the solution improves the MG convergence. Similar results are obtained for the W-cycle.

Recently, in [11,16], two-level and multilevel iteration methods for solving discontinuous Galerkin systems have been proposed and studied. The approach is based on the following idea: first apply the classical two-grid method involving the original space of discontinuous functions $V$ and an auxiliary space $V^{(0)}$ of piece-wise polynomials defined on the same mesh, and then use
the algebraic multilevel iteration (AMLI) of Axelsson and Vassilevski [3,18]. In [11,16] three different choices for the auxiliary space $V^{(0)}$ have been proposed and studied: (1) continuous linear finite elements, (2) nonconforming Crouzeix–Raviart elements, (3) the space of discontinuous piece-wise constant functions. The convergence of the two-grid method is then considered in the general framework established in [14]. Under certain assumptions it has been proved that the two-grid method converges independently of the mesh size. The numerical results presented in [11] illustrate well the robust scalability of the algorithm. In [16] a multilevel extension based on the AMLI of Axelsson and Vassilevski [3,18] has been studied.

The third choice of spaces leads to a novel and interesting problem from a mathematical viewpoint. The discontinuous Galerkin scheme on the space $V^{(0)}$ of discontinuous piece-wise constant functions produces a symmetric and positive definite matrix, called the "graph-Laplacian". In the past such algebraic problems were generated by cell-centered approximations of elliptic equations on rectangular grids. On rectangular grids the graph-Laplacian approximates the Laplacian, and the analysis could use some of the tools of the general multigrid theory. Multigrid preconditioners of optimal complexity for such problems were analyzed and tested in [8]. On an irregular grid the corresponding linear problem does not approximate an elliptic problem, and the study of optimal preconditioners does not fit into the general framework of multigrid or multilevel methods.

In this paper we consider the AMLI method for preconditioning the graph-Laplacian matrix. The AMLI method is suitable for such situations since it can be analyzed by algebraic means, see e.g. [3,5,6,12,18]. We assume that the matrix is generated from a finite element partition obtained by a regular refinement of a given initial mesh consisting of both triangles and quadrilaterals.
We note that this approach is applicable to more general partitions, including pentagons, etc., and can further be used for constructing and analyzing AMLI preconditioners for discontinuous Galerkin systems in 3-D. One way to construct and justify optimal preconditioners is to introduce a multilevel splitting of the unknowns, as proposed originally by Bank and Dupont in [4] for standard finite element approximations. Our study is based on certain properties of the hierarchical splitting of the spaces of piece-wise constant functions represented by their nodal bases. These include the locality of the new basis, so that the pivot block in the two-level matrix has a uniformly bounded condition number. This block corresponds to the unknowns which are complementary to the coarse grid unknowns. The related second diagonal block can be viewed as a certain aggregation of the current two-level matrix. Within a suitably introduced parametric setting, we require this block to be not only associated with but equal to the coarse grid matrix.

The key role in the derivation of optimal convergence rate estimates is played by the constant $\gamma$ in the strengthened CBS inequality, associated with the angle between the two subspaces of the hierarchical splitting. It turns out that the mere existence of a uniform estimate for this constant is not enough; therefore, accurate quantitative bounds for $\gamma$ have to be found as well. More precisely, the value of the upper bound for $\gamma \in (0, 1)$ is a part of the construction of the multilevel extension of the related two-level method. Thus, the main contribution of our paper is the construction of a splitting of the piece-wise constant spaces, generated by a hierarchy of partitions into triangles and quadrilaterals, that produces new estimates for the Cauchy–Bunyakowski–Schwarz (CBS) constant for the graph-Laplacian. This in turn generates an optimal AMLI preconditioner for the graph-Laplacian and therefore for the discontinuous Galerkin systems as well. The paper is organized as follows.
In Section 2 we describe two symmetric interior penalty discontinuous Galerkin (IPDG) finite element approximations of the second order elliptic problem. The two-grid algorithm, which reduces the problem to a system with the graph-Laplacian, is introduced in Section 3. Sections 4 and 5 contain the needed setting of the AMLI method and the theory of
the CBS constant. The new estimates of the CBS constant for the graph-Laplacian and the related optimal multilevel preconditioner are presented in Section 6.

2. Discontinuous Galerkin FE approximation

Let $T$ be a partitioning of $\Omega$ into a finite number of open subdomains (finite elements) $K$ with boundaries $\partial K$. We assume that the partition is quasi-uniform and regular. For each finite element we denote by $h_K$ its size (say, its diameter), and further $h = \max_{K \in T} h_K$. Let $e = \bar K_1 \cap \bar K_2$ be the interface (or edge) of two adjacent subdomains $K_1$, $K_2$. The set of all such interfaces is denoted by $E_0$; note that these edges are inside $\Omega$. Further, $E_D$ and $E_N$ will be the edges of finite elements on the boundary $\Gamma_D$ and $\Gamma_N$, respectively. Finally, $E$ will be the set of all edges: $E = E_0 \cup E_D \cup E_N$. Here we allow finite elements of polygonal shape, with hanging nodes, etc. The important assumption is that if $e$ is an edge of a finite element $K \in T$ then $|e| \approx h_K$; in other words, we do not allow very small edges.

On the partition $T$ we define the finite element space
\[
V := V(T) := \{ v \in L_2(\Omega) : v|_K \in P_r(K), \; K \in T \},
\]
where $P_r$ is the set of polynomials of degree at most $r$, $r \ge 0$. For each $e \in E$ we define the jump $[\![v]\!]$ of any function $v \in V$ as the vector
\[
[\![v]\!]_e :=
\begin{cases}
v|_K \, \mathbf{n} + v|_{K'} \, \mathbf{n}', & e = \bar K \cap \bar K', \text{ i.e. } e \in E_0, \\
v|_K \, \mathbf{n}, & e = \bar K \cap \Gamma_D, \text{ i.e. } e \in E \setminus E_0.
\end{cases}
\]
Here $\mathbf{n}$ and $\mathbf{n}'$ are the external unit normal vectors to $K$ and $K'$, respectively. We shall also need the following notation for the average value of the traces of the normal component of a vector function $v \in V$ on $e$:
\[
\{v\}|_e :=
\begin{cases}
\tfrac12 \bigl( v|_K \cdot \mathbf{n} - v|_{K'} \cdot \mathbf{n}' \bigr), & e = \bar K \cap \bar K', \text{ i.e. } e \in E_0, \\
v|_K \cdot \mathbf{n}, & e = \bar K \cap \Gamma_D, \text{ i.e. } e \in E \setminus E_0,
\end{cases}
\]
and the piece-wise constant function $h_E$ defined on $E$ as $h_E = h_E(x) = |e|$ for $x \in e \in E$. Further denote
\[
(a \nabla u, \nabla v)_T := \sum_{K \in T} \int_K a \nabla u \cdot \nabla v \, dx,
\qquad
\bigl\langle h_E^{-1} [\![u]\!], [\![v]\!] \bigr\rangle_{E_0 \cup E_D} := \sum_{e \in E_0 \cup E_D} \int_e h_E^{-1} [\![u]\!] \cdot [\![v]\!] \, ds.
\]
Finally, we shall use the following norm on $V$:
\[
|||v|||_h^2 = (a \nabla v, \nabla v)_T + \bigl\langle h_E^{-1} [\![v]\!], [\![v]\!] \bigr\rangle_{E_0 \cup E_D}.
\tag{2.1}
\]
We shall consider the following symmetric IPDG finite element method (see, e.g. [1]): find $u_h \in V$ such that
\[
A(u_h, v) = L(v) \quad \forall \, v \in V,
\tag{2.2}
\]
where, with $\kappa > 0$ denoting the penalty parameter,
\[
A(u_h, v) \equiv (a \nabla u_h, \nabla v)_T
+ \kappa \bigl\langle h_E^{-1} [\![u_h]\!], [\![v]\!] \bigr\rangle_{E_0 \cup E_D}
- \bigl\langle \{a \nabla u_h\}, [\![v]\!] \bigr\rangle_{E_0 \cup E_D}
- \bigl\langle [\![u_h]\!], \{a \nabla v\} \bigr\rangle_{E_0 \cup E_D}
\tag{2.3}
\]
and
\[
L(v) = (f, v) + \langle g_N, v \rangle_{E_N} - \langle g_D, a \nabla v \cdot \mathbf{n} \rangle_{E_D}
+ \kappa \bigl\langle h_E^{-1} g_D, v \bigr\rangle_{E_D}.
\tag{2.4}
\]
It is well known that if $\kappa$ is sufficiently large then the bilinear form (2.3) is coercive and bounded in $V$ equipped with the norm (2.1) (see, e.g. [1]). Another symmetric discontinuous Galerkin scheme can be derived by using an approach developed in the work of Ewing et al. [13]. In this case we get the bilinear form
\[
A(u_h, v) \equiv (a \nabla u_h, \nabla v)_T
+ \kappa \bigl\langle h_E^{-1} [\![u_h]\!], [\![v]\!] \bigr\rangle_{E_0 \cup E_D}
- \bigl\langle \{a \nabla u_h\}, [\![v]\!] \bigr\rangle_{E_0 \cup E_D}
- \bigl\langle [\![u_h]\!], \{a \nabla v\} \bigr\rangle_{E_0 \cup E_D}
- \tfrac14 \kappa^{-1} \bigl\langle h_E [\![a \nabla u_h \cdot \mathbf{n}]\!], [\![a \nabla v \cdot \mathbf{n}]\!] \bigr\rangle_{E_0},
\tag{2.5}
\]
which is coercive for sufficiently large $\kappa$. Note that the corresponding DG scheme is slightly different from (2.2). We summarize the main results regarding the discontinuous Galerkin method (2.2) in the following lemma:

Lemma 2.1. Assume that the finite element partition $T$ is regular and locally quasi-uniform. Then the bilinear form $A(\cdot,\cdot)$ defined by (2.3) or (2.5) is coercive and bounded in $V$ equipped with the norm (2.1) for any sufficiently large $\kappa > 0$, and the discontinuous Galerkin method (2.2) has a unique solution.

3. Two-level method

Now we present the two-level (also called two-grid) iteration method, which for DG systems was introduced in an algebraic setting in [11,16] and studied in the general algebraic framework of [14]. Together with the DG space $V$ it uses an auxiliary, in general smaller, space $V^{(0)}$ and proper restriction and prolongation operators. In [16] three possibilities for $V^{(0)}$ were considered and studied. Here we take one of these, namely $V^{(0)}$ is the space of piece-wise constant functions over the partition $T$.

To describe the two-grid method and the results from [11] we need some matrix notation. We shall first reformulate problem (2.2) in terms of matrices and the vector spaces of degrees of freedom. Let $n$ and $n_0$ be the dimensions of the spaces $V$ and $V^{(0)}$, respectively, and let $\{\phi_j\}_{j=1}^{n}$ and $\{\phi^{(0)}_j\}_{j=1}^{n_0}$ be their nodal bases. In the case we consider here, $\phi^{(0)}_j$ is the characteristic function of the finite element $K_j \in T$, and $n_0$ is the number of finite elements in $T$. We denote by $\mathbf{V}$ and $\mathbf{V}^{(0)}$ the spaces of $n$- and $n_0$-dimensional vectors of the degrees of freedom of $V$ and $V^{(0)}$, respectively; these can be identified with $\mathbb{R}^n$ and $\mathbb{R}^{n_0}$. For linear finite elements
over a triangular mesh the space $\mathbf{V}$ is identified with $\mathbb{R}^{3n_0}$ (i.e. $n = 3n_0$), for bilinear elements over a quadrilateral mesh with $\mathbb{R}^{4n_0}$ (i.e. $n = 4n_0$), while in both cases $\mathbf{V}^{(0)}$ is identified with $\mathbb{R}^{n_0}$. The elements of $\mathbf{V}$ and $\mathbf{V}^{(0)}$ are further denoted in bold face, i.e. $\mathbf{u}$, $\mathbf{v}$, etc. Each coarse grid basis function $\phi^{(0)}_k \in V^{(0)}$ has a unique expansion via the basis of $V$:
\[
\phi^{(0)}_k = \sum_{j=1}^{n} p_{jk} \, \phi_j, \qquad k = 1, \ldots, n_0.
\tag{3.1}
\]
Now we introduce the matrix $P_0 = \{p_{jk}\}$, $j = 1, \ldots, n$, $k = 1, \ldots, n_0$, which can be viewed as an injection (prolongation) operator from $\mathbf{V}^{(0)}$ to $\mathbf{V}$. To avoid a proliferation of indices we shall further leave out the subindex $h$ in our notation, that is, $u_h$ is replaced by $u$. We define the standard $\ell_2$-inner product for elements of $\mathbf{V}$ and $\mathbf{V}^{(0)}$:
\[
(\mathbf{u}, \mathbf{v})_{\ell_2} = \mathbf{v}^T \mathbf{u} \quad \text{for } \mathbf{v}, \mathbf{u} \in \mathbf{V} \; (\text{or } \mathbf{v}, \mathbf{u} \in \mathbf{V}^{(0)}).
\]
Then we introduce the matrix $\tilde A = \tilde A_D + \tilde A_P$, where
\[
(\tilde A_D \mathbf{u}, \mathbf{v})_{\ell_2} = (a \nabla u, \nabla v)_T,
\qquad
(\tilde A_P \mathbf{u}, \mathbf{v})_{\ell_2} = \bigl\langle h_E^{-1} [\![u]\!], [\![v]\!] \bigr\rangle_{E_0 \cup E_D}.
\]
Obviously both matrices are symmetric; $\tilde A_D$ is semidefinite, while $\tilde A_P$ is positive definite. Next we define the "stiffness matrices" $A$ and $A^{(0)}$ by the identities
\[
(A \mathbf{u}, \mathbf{v})_{\ell_2} = A(u, v), \quad v, u \in V,
\qquad
(A^{(0)} \mathbf{u}, \mathbf{v})_{\ell_2} = A(u, v), \quad v, u \in V^{(0)}.
\]
Because of the expansion (3.1), obviously, $A^{(0)} = P_0^T A P_0$. Finally, we introduce the norm
\[
|||\mathbf{v}|||^2 = |||v|||_h^2 = (\tilde A \mathbf{v}, \mathbf{v})_{\ell_2} = (\tilde A_D \mathbf{v}, \mathbf{v})_{\ell_2} + (\tilde A_P \mathbf{v}, \mathbf{v})_{\ell_2},
\]
where $v \in V$ and, by duality, $\mathbf{v} \in \mathbf{V}$. From the symmetry, coercivity and boundedness of the bilinear form $A$ it follows that $A$ is a symmetric and positive definite matrix that is spectrally equivalent to the matrix $\tilde A$. For studying the convergence of the two-grid iteration we shall also need the operator norm $|||A|||$. Let $M$ be a smoothing matrix that satisfies the condition: $M^T + M - A$ is symmetric and positive definite. The following two-level method has been studied and justified in [14]:

Two-level algorithm.
(0) Let $\mathbf{u}_0$ be given. For $\mathbf{u}_i$ "approximating" $\mathbf{u}$, the solution of $A\mathbf{u} = \mathbf{b}$, define $\mathbf{u}_{i+1}$ as follows:
(1) Set $\mathbf{x}_1 = \mathbf{u}_i - M^{-1}(A \mathbf{u}_i - \mathbf{b})$ (presmooth);
(2) $\mathbf{x}_2 = \mathbf{x}_1 - P_0 (A^{(0)})^{-1} P_0^T (A \mathbf{x}_1 - \mathbf{b})$ (correct);
(3) $\mathbf{u}_{i+1} = \mathbf{x}_2 - M^{-T}(A \mathbf{x}_2 - \mathbf{b})$ (postsmooth).

More general two-grid methods with $m$ presmoothing and $m$ postsmoothing steps can also be justified. In [14] the convergence of the two-level method, characterized by the error transfer operator $E_{tg}$, that is, $\mathbf{u} - \mathbf{u}_{i+1} = E_{tg}(\mathbf{u} - \mathbf{u}_i)$, has been established in the following form:
\[
\| E_{tg} \| = 1 - 1/K, \qquad
K \ge \sup_{\mathbf{v} \in \mathbf{V}} \frac{\| (I - Q)\mathbf{v} \|_{\ell_2}^2}{|||\mathbf{v}|||^2},
\]
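The two-level algorithm above can be sketched in a few lines of code. The setting below is an illustrative assumption added here, not the paper's actual DG system: a 1-D Laplacian model matrix, a piece-wise constant coarse space over aggregates of two unknowns, and the forward Gauss–Seidel smoother $M = \mathrm{tril}(A)$, for which $M + M^T - A = \mathrm{diag}(A)$ is SPD.

```python
import numpy as np

def two_level_step(A, P0, M, u, b):
    """One iteration of the two-level method: presmooth, coarse-grid
    correction with the Galerkin matrix A0 = P0^T A P0, postsmooth."""
    x1 = u - np.linalg.solve(M, A @ u - b)                       # (1) presmooth
    A0 = P0.T @ A @ P0                                           # coarse matrix
    x2 = x1 - P0 @ np.linalg.solve(A0, P0.T @ (A @ x1 - b))      # (2) correct
    return x2 - np.linalg.solve(M.T, A @ x2 - b)                 # (3) postsmooth

# toy setting: 1-D Dirichlet Laplacian, coarse space = piece-wise constants
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
P0 = np.kron(np.eye(n // 2), np.ones((2, 1)))    # aggregates of two unknowns
M = np.tril(A)                                   # Gauss-Seidel smoother
b = np.random.default_rng(0).standard_normal(n)
u = np.zeros(n)
for _ in range(200):
    u = two_level_step(A, P0, M, u, b)
assert np.linalg.norm(A @ u - b) < 1e-8 * np.linalg.norm(b)
```

The mesh-independent convergence claimed by Theorem 3.1 below holds because, for this aggregation, $\|(I - Q)\mathbf{v}\|_{\ell_2}^2 \le \tfrac12 \, \mathbf{v}^T A \mathbf{v}$ with a constant independent of $n$.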
(Figure 1 here: a partition of the domain into 11 numbered finite elements, with $\Gamma_D$ along part of the boundary, and the associated planar graph.)

Fig. 1. Partition $T$ and related graph-Laplacian.
where $Q : \mathbf{V} \to \mathbf{V}^{(0)}$ is an $\ell_2$-orthogonal projection operator. A sufficient condition for convergence, independent of the step size, is the existence of an operator $Q : \mathbf{V} \to \mathbf{V}^{(0)}$ such that
\[
\| (I - Q)\mathbf{v} \|_{\ell_2}^2 \le C \, |||\mathbf{v}|||^2 \quad \forall \, \mathbf{v} \in \mathbf{V}.
\]
In [11] the following result has been obtained by using the general theory of [14]:

Theorem 3.1. The two-level method with Gauss–Seidel as a smoother and coarse space $V^{(0)}$ of piece-wise constant functions is uniformly convergent with respect to the number of degrees of freedom.

Further, in [16] this result has been extended to a multilevel method using the general framework of the AMLI of Axelsson and Vassilevski [3] and the basic properties of the two-level projection methods of Falgout et al. [14]. Our goal is to obtain similar results by using a multilevel splitting of the unknowns and establishing a sharp estimate for the angle between the corresponding spaces.

Now consider the bilinear form $A(\cdot,\cdot)$, defined by (2.3) or (2.5), on the space $V^{(0)}$ of piece-wise constant functions, which reduces the formula to the jump part only, $\langle h_E^{-1} [\![u]\!], [\![v]\!] \rangle_{E_0 \cup E_D}$. Then $A^{(0)}$, further called the "graph-Laplacian", is defined by
\[
(A^{(0)} \mathbf{u}, \mathbf{v})_{\ell_2} = \bigl\langle h_E^{-1} [\![u]\!], [\![v]\!] \bigr\rangle_{E_0 \cup E_D} \quad \text{for } u, v \in V^{(0)}.
\]
Now we associate the partition $T$ with a planar graph: the finite elements are the vertices and the interfaces of the finite elements are the edges of the graph. Then, taking as degrees of freedom the values of a function in $V^{(0)}$ over each finite element, we get a matrix that has an entry $-1$ at the $s$ graph vertices connected to a chosen graph vertex and an entry $s$ at the vertex itself. As an illustration, the matrix representing the graph-Laplacian for a particular mesh is given in Fig. 1. For any partition into quadrilaterals, regardless of their shape, we get the standard stencil with entries $4, -1, -1, -1, -1$ — probably the reason for the name.

We note that the case of the piece-wise constant space is simple and natural. It is a generalization of the technique of cell-centered schemes that are still popular and frequently exploited in petroleum reservoir modeling using rectangular (or parallelepiped) meshes. These schemes are produced either by finite difference approximation of the elliptic problem or by mixed finite element
approximations with subsequent elimination of the fluxes. Preconditioners for such systems were developed in [8]. An important ingredient of the analysis in [8] is the fact that on an orthogonal grid the cell-centered scheme has the approximation property on all levels. The matrix of the graph-Laplacian is a symmetric M-matrix. However, this matrix does not have any approximation property on an arbitrary grid. Therefore, the multigrid theory that relies on such a property (see, e.g. [8]) cannot be used for designing a robust preconditioner by using the graph-Laplacian. The matrix $A^{(0)}$ of the graph-Laplacian for the mesh shown in Fig. 1 is the $11 \times 11$ symmetric M-matrix with diagonal entries $(1, 3, 4, 2, 3, 3, 3, 4, 4, 2, 4)$ — the number of connections of each vertex — and off-diagonal entries $-1$ in the positions corresponding to each pair of adjacent finite elements.
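The assembly just described can be sketched as follows. The 4-element strip mesh is a hypothetical example added here for illustration, and the unit edge weights stand in for the $O(1)$ factors $h_E^{-1}|e|$; neither is taken from the paper.

```python
import numpy as np

def graph_laplacian(n, interior_edges, dirichlet_elems=()):
    """Graph-Laplacian of a partition: one row/column per finite element.

    interior_edges: pairs (i, j) of adjacent elements (edges of the graph);
    dirichlet_elems: elements with an edge on Gamma_D, contributing to the
    diagonal only (this is what makes the matrix nonsingular).
    """
    L = np.zeros((n, n))
    for i, j in interior_edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    for i in dirichlet_elems:
        L[i, i] += 1.0
    return L

# hypothetical mesh: 4 elements in a strip, leftmost element touching Gamma_D
L = graph_laplacian(4, [(0, 1), (1, 2), (2, 3)], dirichlet_elems=[0])
assert (L == L.T).all()                        # symmetric
assert (L.sum(axis=1) >= 0).all()              # nonnegative row sums (M-matrix)
assert np.diag(L).tolist() == [2.0, 2.0, 2.0, 1.0]  # diagonal = #connections
assert np.all(np.linalg.eigvalsh(L) > 0)       # SPD thanks to the Dirichlet part
```

Without any Dirichlet contribution the row sums would all be zero and the matrix would be only positive semidefinite, with the constant vector in its kernel.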
4. AMLI preconditioner

We construct hierarchical basis (HB) functions for multilevel preconditioners for the algebraic system involving the graph-Laplacian. To this end we follow the framework for constructing HB two-level preconditioners for conforming FEM, as described e.g. in [4], and their multilevel extensions, known as AMLI, see e.g. [3,18]. The construction of a hierarchical decomposition for the discontinuous Galerkin FE spaces is neither obvious nor unique. To fit the classical HB construction techniques in the nonconforming case, we search for a hierarchical decomposition of the fine grid degrees of freedom such that one part corresponds to the degrees of freedom of the coarse grid problem. Such aggregation-based hierarchical decompositions were recently studied in [5,6] for the case of Crouzeix–Raviart nonconforming FEs.

Let $A^{(0)} \mathbf{u} = \mathbf{b}$ be the algebraic formulation of our problem, where $A^{(0)}$ is a symmetric positive definite graph-Laplacian corresponding to the finest discretization $T_0$. Consider a sequence of nested meshes (triangulations) $T_m \subset T_{m-1} \subset \cdots \subset T_0$ of the domain $\Omega$, the spaces of piece-wise constant functions $V^{(m)} \subset V^{(m-1)} \subset \cdots \subset V^{(0)}$, the spaces of degrees of freedom $\mathbf{V}^{(m)}, \mathbf{V}^{(m-1)}, \ldots, \mathbf{V}^{(0)}$, and the numbers of degrees of freedom $n_m < n_{m-1} < \cdots < n_0$. Further, introduce the graph-Laplacians associated with each triangulation level, $A^{(m)}, A^{(m-1)}, \ldots, A^{(0)}$, with
\[
(A^{(s)} \mathbf{u}, \mathbf{v})_{\ell_2} = \bigl\langle h_E^{-1} [\![u]\!], [\![v]\!] \bigr\rangle_{E_0 \cup E_D} \quad \text{for } u, v \in V^{(s)}, \; s = m, \ldots, 0.
\]
We denote by $\Phi^{(k)} = \{\phi^{(k)}_i\}_{i=1}^{n_k}$ the set of standard piece-wise constant basis functions on level $k$ and by $\tilde\Phi^{(k)} = \{\tilde\phi^{(k)}_i\}_{i=1}^{n_k}$ the set of properly defined hierarchical basis functions. The hierarchical basis $\tilde\Phi^{(k)}$ is determined by a nonsingular transformation matrix $J^{(k)}$, i.e., $\tilde\Phi^{(k)} = J^{(k)} \Phi^{(k)}$. Then the hierarchical basis stiffness matrix $\tilde A^{(k)}$ and the hierarchical spaces of degrees of freedom $\tilde{\mathbf{V}}^{(k)}$ are expressed as follows:
\[
\tilde{\mathbf{V}}^{(k)} = \bigl(J^{(k)}\bigr)^{-T} \mathbf{V}^{(k)}
\qquad \text{and} \qquad
\tilde A^{(k)} = J^{(k)} A^{(k)} \bigl(J^{(k)}\bigr)^{T}.
\tag{4.1}
\]
On each level $k$ the matrix $\tilde A^{(k)}$ is partitioned into a two-by-two block form:
\[
\tilde A^{(k)} =
\begin{bmatrix}
\tilde A^{(k)}_{11} & \tilde A^{(k)}_{12} \\[2pt]
\tilde A^{(k)}_{21} & \tilde A^{(k)}_{22}
\end{bmatrix}
\begin{matrix} \} \; n_k - n_{k+1} \\[2pt] \} \; n_{k+1} \end{matrix}
\tag{4.2}
\]
where $n_{k+1}$ is the dimension of the space $\mathbf{V}^{(k+1)}$. Regarding the splitting (4.2) we make the following assumption:

Assumption 4.1. The hierarchical basis is locally constructed, so that the transformation matrix is sparse. Moreover, the following relations hold:
\[
\tilde A^{(k)}_{22} = A^{(k+1)}, \qquad \mathrm{cond}\bigl(\tilde A^{(k)}_{11}\bigr) = O(1).
\tag{4.3}
\]
Obviously, the splitting (4.2) generates a splitting of the space $\mathbf{V}^{(k)}$ into two subspaces in the following manner: if $\mathbf{v} = \bigl(J^{(k)}\bigr)^{T} \tilde{\mathbf{v}}$, where $\tilde{\mathbf{v}} = (\tilde{\mathbf{v}}_1^T, \tilde{\mathbf{v}}_2^T)^T \in \tilde{\mathbf{V}}^{(k)}$ and $\tilde{\mathbf{v}}_2 \in \mathbf{V}^{(k+1)}$, then this gives the splitting $\mathbf{V}^{(k)} = \mathbf{V}^{(k)}_1 + \mathbf{V}^{(k)}_2$, where
\[
\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 \in \mathbf{V}^{(k)}
\quad \text{with} \quad
\mathbf{v}_1 = \bigl(J^{(k)}\bigr)^{T} \begin{pmatrix} \tilde{\mathbf{v}}_1 \\ 0 \end{pmatrix} \in \mathbf{V}^{(k)}_1,
\qquad
\mathbf{v}_2 = \bigl(J^{(k)}\bigr)^{T} \begin{pmatrix} 0 \\ \tilde{\mathbf{v}}_2 \end{pmatrix} \in \mathbf{V}^{(k)}_2.
\tag{4.4}
\]
Since the matrix $A^{(k)}$ is symmetric and positive definite, it generates an inner product and geometry in $\mathbf{V}^{(k)}$. The ideal case of the splitting (4.4) would be when the vectors $\mathbf{v}_1, \mathbf{v}_2$ are orthogonal in the $A^{(k)}$-inner product. In any case, between the spaces $\mathbf{V}^{(k)}_1$ and $\mathbf{V}^{(k)}_2$ there is an angle in the $A^{(k)}$-inner product. The cosine of this angle is defined by the constant $\gamma^{(k)}$ in the strengthened CBS inequality:
\[
\bigl(A^{(k)} \mathbf{v}_1, \mathbf{v}_2\bigr)
\le \gamma^{(k)} \sqrt{\bigl(A^{(k)} \mathbf{v}_1, \mathbf{v}_1\bigr)\,\bigl(A^{(k)} \mathbf{v}_2, \mathbf{v}_2\bigr)},
\qquad \mathbf{v}_1 \in \mathbf{V}^{(k)}_1, \; \mathbf{v}_2 \in \mathbf{V}^{(k)}_2.
\]
Later, in the next section, this inequality will be given a different but equivalent form which is more convenient for estimating $\gamma^{(k)}$. The following assumption on the constant $\gamma^{(k)}$ plays an important role in the construction of hierarchical preconditioners:

Assumption 4.2. There is an absolute constant $\gamma$ such that $\gamma^{(k)} \le \gamma < 1$ for all $k \ge 0$.

We will analyze the AMLI generalization of the multiplicative two-level method corresponding to the introduced hierarchical setting. AMLI was originally proposed by Axelsson and Vassilevski for the case of conforming linear FEs, see [3,18].

Algorithm 4.3 (AMLI method). $C^{(m)} = A^{(m)}$; for $k = m-1, \ldots, 1, 0$
\[
C^{(k)} = \bigl(J^{(k)}\bigr)^{-1}
\begin{bmatrix}
\tilde C^{(k)}_{11} & 0 \\[2pt]
\tilde A^{(k)}_{21} & \hat A^{(k+1)}
\end{bmatrix}
\begin{bmatrix}
I & \bigl(\tilde C^{(k)}_{11}\bigr)^{-1} \tilde A^{(k)}_{12} \\[2pt]
0 & I
\end{bmatrix}
\bigl(J^{(k)}\bigr)^{-T},
\]
where the blocks $\tilde C^{(k)}_{11}$ are symmetric positive definite approximations of $\tilde A^{(k)}_{11}$, and the Schur complement approximation is stabilized by
\[
\bigl(\hat A^{(k+1)}\bigr)^{-1}
= \Bigl( I - p_{\nu}\bigl( \bigl(C^{(k+1)}\bigr)^{-1} A^{(k+1)} \bigr) \Bigr) \bigl(A^{(k+1)}\bigr)^{-1}.
\]
The acceleration polynomial is explicitly defined by
\[
p_{\nu}(t) = \frac{1 + T_{\nu}\Bigl( \dfrac{1 + \alpha - 2t}{1 - \alpha} \Bigr)}
{1 + T_{\nu}\Bigl( \dfrac{1 + \alpha}{1 - \alpha} \Bigr)},
\]
where $\alpha \in (0, 1)$ is a properly chosen parameter and $T_{\nu}$ stands for the Chebyshev polynomial of degree $\nu$ with $L_{\infty}$-norm 1 on $(-1, 1)$. The following theorem is a straightforward reformulation of the basic result from [3].

Theorem 4.4. Let Assumptions 4.1 and 4.2 hold and let the integer $\nu$ satisfy
\[
\frac{1}{\sqrt{1 - \gamma^2}} < \nu < \varrho,
\]
where $\varrho = \max_k n_k / n_{k+1}$. Then there exists $\alpha \in (0, 1)$ such that the AMLI preconditioner $C^{(0)}$ defined by Algorithm 4.3 has optimal condition number $\mathrm{cond}\bigl( (C^{(0)})^{-1} A^{(0)} \bigr) = O(1)$, and the total computational complexity is $O(n_0)$.

Remark 4.5. Explicit formulas for the AMLI parameter $\alpha$ are given in [3], where the considered acceleration polynomials are of degree $\nu = 2$ and $\nu = 3$.

The constant $\gamma^{(k)}$ in the strengthened CBS inequality (CBS constant) is a quantitative characterization of the HB. The remaining part of the paper is devoted to the construction of hierarchical splittings, for the class of matrices represented by the graph-Laplacian, that satisfy Assumptions 4.1 and 4.2.

5. On the local estimates of the CBS constant

Let $\tilde{\mathbf{V}}^{(k)} = \tilde{\mathbf{V}}^{(k)}_1 \times \tilde{\mathbf{V}}^{(k)}_2$ be the partitioning corresponding to the block two-by-two presentation (4.2) of the hierarchical basis stiffness matrix $\tilde A^{(k)}$. More appropriate for computation of the CBS constant is the following formula:
\[
\bigl(\gamma^{(k)}\bigr)^2
= \sup_{\tilde{\mathbf{v}}_i \in \tilde{\mathbf{V}}^{(k)}_i,\; i = 1, 2}
\frac{\bigl( \tilde{\mathbf{v}}_1^T \tilde A^{(k)}_{12} \tilde{\mathbf{v}}_2 \bigr)^2}
{\bigl( \tilde{\mathbf{v}}_1^T \tilde A^{(k)}_{11} \tilde{\mathbf{v}}_1 \bigr)\bigl( \tilde{\mathbf{v}}_2^T \tilde A^{(k)}_{22} \tilde{\mathbf{v}}_2 \bigr)}
= \sup_{\tilde{\mathbf{v}}_2 \in \tilde{\mathbf{V}}^{(k)}_2}
\frac{\tilde{\mathbf{v}}_2^T \tilde A^{(k)}_{21} \bigl(\tilde A^{(k)}_{11}\bigr)^{-1} \tilde A^{(k)}_{12} \tilde{\mathbf{v}}_2}
{\tilde{\mathbf{v}}_2^T \tilde A^{(k)}_{22} \tilde{\mathbf{v}}_2}.
\tag{5.1}
\]
Now, let us assume that
\[
\tilde A^{(k)} = \sum_{e \in F} \tilde A^{(k)}_e,
\]
$$v=\sum_{e\in F}v_e,$$
where the $\tilde A^{(k)}_e$ are symmetric positive semidefinite local matrices, $F$ is some set of indices, and the summation is understood as assembling. The global hierarchical basis splitting naturally induces the block two-by-two presentation of the local matrix $\tilde A^{(k)}_e$, namely,

$$\tilde A^{(k)}_e=\begin{pmatrix}\tilde A^{(k)}_{e:11}&\tilde A^{(k)}_{e:12}\\ \tilde A^{(k)}_{e:21}&\tilde A^{(k)}_{e:22}\end{pmatrix},\qquad \hat v_e=\begin{pmatrix}\hat v_{e,1}\\ \hat v_{e,2}\end{pmatrix}.\tag{5.2}$$

Let $\hat V^{(k)}_e$ be the restriction of $\hat V^{(k)}$ corresponding to the local matrix $\tilde A^{(k)}_e$, and let $\hat V^{(k)}_e=\hat V^{(k)}_{e:1}\times\hat V^{(k)}_{e:2}$ be the partitioning corresponding to (5.2).

Lemma 5.1. Assume that for all $\hat w=\begin{pmatrix}\hat v_1\\ \hat v_2\end{pmatrix}\in\ker\big(\tilde A^{(k)}_e\big)$, $\hat v_1\in\hat V^{(k)}_{e:1}$, $\hat v_2\in\hat V^{(k)}_{e:2}$, it holds that $\hat v_2\in\ker\big(\tilde A^{(k)}_{e:22}\big)$. Then the local CBS constant $\gamma^{(k)}_e$ is determined by

$$\big(\gamma^{(k)}_e\big)^2=\sup_{\hat v_2\in\hat V^{(k)}_{e:2}\setminus\ker(\tilde A^{(k)}_{e:22})}\frac{\hat v_2^T\tilde A^{(k)}_{e:21}\big(\tilde A^{(k)}_{e:11}\big)^{-1}\tilde A^{(k)}_{e:12}\hat v_2}{\hat v_2^T\tilde A^{(k)}_{e:22}\hat v_2}<1,\tag{5.3}$$

and the following estimate holds:

$$\gamma^{(k)}\le\max_{e\in F}\gamma^{(k)}_e.\tag{5.4}$$
Proof. The assumption of the lemma is a necessary condition for the correctness of (5.3); see e.g. [2,12]. Now, let $\hat v_i\in\hat V^{(k)}_i$, and let $\hat v_{e:i}\in\hat V^{(k)}_{e:i}$ be the restrictions corresponding to the local matrices $\tilde A^{(k)}_e$, $i=1,2$. Then

$$\hat v_1^T\tilde A^{(k)}_{12}\hat v_2=\sum_{e\in F}\hat v_{e:1}^T\tilde A^{(k)}_{e:12}\hat v_{e:2}\le\sum_{e\in F}\gamma^{(k)}_e\big(\hat v_{e:1}^T\tilde A^{(k)}_{e:11}\hat v_{e:1}\big)^{1/2}\big(\hat v_{e:2}^T\tilde A^{(k)}_{e:22}\hat v_{e:2}\big)^{1/2}$$
$$\le\max_{e\in F}\gamma^{(k)}_e\Big(\sum_{e\in F}\hat v_{e:1}^T\tilde A^{(k)}_{e:11}\hat v_{e:1}\Big)^{1/2}\Big(\sum_{e\in F}\hat v_{e:2}^T\tilde A^{(k)}_{e:22}\hat v_{e:2}\Big)^{1/2}=\max_{e\in F}\gamma^{(k)}_e\big(\hat v_1^T\tilde A^{(k)}_{11}\hat v_1\big)^{1/2}\big(\hat v_2^T\tilde A^{(k)}_{22}\hat v_2\big)^{1/2},$$

where the second inequality uses the Cauchy–Schwarz inequality for sums, which completes the proof. □
Remark 5.2. The obtained result is a straightforward generalization of the known estimate for the standard finite element method, where the local matrices are the element stiffness matrices.
6. Estimates of the CBS constant for graph-Laplacians

Let us consider two consecutive discretizations $T_k\subset T_{k-1}$. In what follows we will derive uniform estimates of the CBS constant based on a properly introduced construction of the hierarchical basis and a related decomposition of the graph-Laplacian,

$$A^{(k)}=\sum_{e\in E}A^{(k)}_e,\qquad \tilde A^{(k)}=\sum_{e\in E}\tilde A^{(k)}_e,$$

as a sum of local matrices associated with the set of edges $E$ of the coarser grid $T_k$.

6.1. Mesh of triangles

Let us assume that the coarsest mesh $T_m$ consists of triangles only, and that each refined mesh is obtained by dividing the current triangle into four congruent triangles connecting the midpoints of its sides. Following the numbering from Fig. 2 (the central fine triangles of $T_1$ and $T_2$ are numbered 1 and 5, respectively), we introduce the local matrix $A^{(k)}_e$ in the form

$$A^{(k)}_e=\begin{pmatrix}
1 & -t & \frac{t-1}{2} & \frac{t-1}{2} & & & & \\
-t & t & & & & & & \\
\frac{t-1}{2} & & 1+\frac{1-t}{2} & & & & -1 & \\
\frac{t-1}{2} & & & 1+\frac{1-t}{2} & & & & -1 \\
 & & & & 1 & -t & \frac{t-1}{2} & \frac{t-1}{2} \\
 & & & & -t & t & & \\
 & & -1 & & \frac{t-1}{2} & & 1+\frac{1-t}{2} & \\
 & & & -1 & \frac{t-1}{2} & & & 1+\frac{1-t}{2}
\end{pmatrix}.\tag{6.1}$$

This edge matrix is also associated with the macroelement $E=T_1+T_2$ of the two adjacent triangles from $T_k$ with a common side $e$. The role of the weight parameter $t\in(0,1)$ is to distribute correctly the contribution of the links between the interior nodes. For example, the couple $(1,2)$ has weight $t$ here, but the same link will appear, with weight $\frac{1-t}{2}$, in each of the local matrices associated with the remaining two sides of the current triangle, so that its total contribution has the right weight of one.

It is natural to introduce the hierarchical basis locally with respect to the triangles from $T_k$. Let us consider the macroelement $T_1$ and the set of standard piecewise constant basis functions $\Phi^{(k)}_{T_1}=\{\phi^{(k)}_{T_1:i}\}_{i=1}^4$. We introduce the related hierarchical basis $\tilde\Phi^{(k)}_{T_1}=\{\tilde\phi^{(k)}_{T_1:i}\}_{i=1}^4$ as

$$\tilde\phi^{(k)}_{T_1:1}=\phi^{(k)}_{T_1:1}+p\,\phi^{(k)}_{T_1:2}+q\,\phi^{(k)}_{T_1:3}+q\,\phi^{(k)}_{T_1:4},$$
$$\tilde\phi^{(k)}_{T_1:2}=\phi^{(k)}_{T_1:1}+q\,\phi^{(k)}_{T_1:2}+p\,\phi^{(k)}_{T_1:3}+q\,\phi^{(k)}_{T_1:4},$$
Fig. 2. Macroelement of two adjacent triangles from $T_k$.

$$\tilde\phi^{(k)}_{T_1:3}=\phi^{(k)}_{T_1:1}+q\,\phi^{(k)}_{T_1:2}+q\,\phi^{(k)}_{T_1:3}+p\,\phi^{(k)}_{T_1:4},$$
$$\tilde\phi^{(k)}_{T_1:4}=r\big(\phi^{(k)}_{T_1:1}+\phi^{(k)}_{T_1:2}+\phi^{(k)}_{T_1:3}+\phi^{(k)}_{T_1:4}\big),\tag{6.2}$$

where $p,q$ are parameters to be determined later, and $r$ is the corresponding scaling factor. Then the assembled transformation matrix $J^{(k)}_e$ is as follows:

$$J^{(k)}_e=\begin{pmatrix}
1&p&q&q&&&&\\
1&q&p&q&&&&\\
1&q&q&p&&&&\\
&&&&1&p&q&q\\
&&&&1&q&p&q\\
&&&&1&q&q&p\\
r&r&r&r&&&&\\
&&&&r&r&r&r
\end{pmatrix}\tag{6.3}$$

and

$$\tilde A^{(k)}_e=J^{(k)}_e A^{(k)}_e J^{(k)T}_e=\begin{pmatrix}\tilde A^{(k)}_{e:11}&\tilde A^{(k)}_{e:12}\\ \tilde A^{(k)}_{e:21}&\tilde A^{(k)}_{e:22}\end{pmatrix}.$$
Lemma 6.1. Consider the hierarchical basis (6.2) for nested meshes of triangles. Then $\tilde A^{(k)}_{22}=A^{(k+1)}$ if and only if $r=\dfrac{\sqrt2}{2}$.
Proof. The definition of the last two terms in the local hierarchical basis ensures that $\tilde A^{(k)}_{e:22}$ has row sums/column sums equal to zero. Then, the equivalent statement

$$\tilde A^{(k)}_{e:22}=\begin{pmatrix}1&-1\\-1&1\end{pmatrix}$$

simply follows from the equalities

$$\tilde A^{(k)}_{e:22}(1,1)=r^2\sum_{i,j=1}^{4}A^{(k)}_e(i,j)=2r^2,\qquad \tilde A^{(k)}_{e:22}(2,2)=r^2\sum_{i,j=5}^{8}A^{(k)}_e(i,j)=2r^2.\quad\square$$
R.D. Lazarov, S.D. Margenov / Journal of Complexity 23 (2007) 498 – 515
a
b
6
6 Q2
5
T
8
7
7
e 4
3 Q1
1
511
5
4
3
2
Q
8
e
1
2
Fig. 3. (a) Macroelement of two adjacent quadrilaterals of the mesh Tk . (b) Macroelement of adjacent triangle and quadrilateral of Tk .
Now, it is readily seen from (5.3) that $\big(\gamma^{(k)}_e\big)^2=1-\mu$, where $\mu$ is the eigenvalue (which is unique in this particular case) of the eigenproblem

$$S^{(k)}_e v=\mu\,\tilde A^{(k)}_{e:22}v,\qquad v=\begin{pmatrix}1\\-1\end{pmatrix},$$

where $S^{(k)}_e=\tilde A^{(k)}_{e:22}-\tilde A^{(k)}_{e:21}\big(\tilde A^{(k)}_{e:11}\big)^{-1}\tilde A^{(k)}_{e:12}$.
Lemma 6.2. Consider the hierarchical splitting (6.1), (6.3) with parameters $p=1$, $q=-0.5$, and $t=0.5$. Then the following estimate holds uniformly with respect to the refinement level $k$:

$$\gamma^2_e=\gamma^2_{TT}=\frac{16}{25}.\tag{6.4}$$

Proof. The construction of the hierarchical basis and of all related matrices is independent of the particular edge $e\in E$ and of the current refinement level. The estimate of the local CBS constant then follows straightforwardly by simple computations with fixed numbers. Here, $TT$ indicates that the interface edge is always between two triangles. □
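The fixed-number computation behind Lemma 6.2 can be reproduced numerically. The sketch below is an illustrative reconstruction, not part of the paper: it assumes the node convention that the central fine triangle is listed first within each coarse triangle (nodes 1 and 5) and that the fine triangles adjacent to the coarse edge $e$ are nodes 3, 4 and 7, 8. It assembles the edge matrix (6.1), the transformation (6.3), forms $\tilde A_e=J_e A_e J_e^T$, and evaluates (5.3) with pure-Python linear algebra.

```python
# Numerical check of Lemma 6.2: for p = 1, q = -0.5, t = 0.5 the local CBS
# constant of the triangle-triangle edge matrix equals 16/25.

def edge_matrix_TT(t):
    """8x8 graph-Laplacian edge matrix (6.1) for two adjacent triangles."""
    A = [[0.0] * 8 for _ in range(8)]
    B = [[1.0, -t, (t - 1) / 2, (t - 1) / 2],        # node 1: central fine triangle
         [-t, t, 0.0, 0.0],
         [(t - 1) / 2, 0.0, 1 + (1 - t) / 2, 0.0],   # nodes 3, 4 touch the edge e
         [(t - 1) / 2, 0.0, 0.0, 1 + (1 - t) / 2]]
    for base in (0, 4):                               # T1 block, then T2 block
        for i in range(4):
            for j in range(4):
                A[base + i][base + j] = B[i][j]
    for i, j in ((2, 6), (3, 7)):                     # weight-1 links across e
        A[i][j] = A[j][i] = -1.0
    return A

def J_TT(p, q, r):
    """8x8 transformation (6.3); rows 0-5 span V1-hat, rows 6-7 span V2-hat."""
    J = [[0.0] * 8 for _ in range(8)]
    rows = [[1, p, q, q], [1, q, p, q], [1, q, q, p]]
    for b, base in enumerate((0, 4)):
        for i in range(3):
            for j in range(4):
                J[3 * b + i][base + j] = rows[i][j]
        for j in range(4):
            J[6 + b][base + j] = r
    return J

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve(M, b):
    """Gaussian elimination with partial pivoting."""
    n = len(M)
    M = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r_: abs(M[r_][c]))
        M[c], M[piv] = M[piv], M[c]
        for r_ in range(c + 1, n):
            f = M[r_][c] / M[c][c]
            for j in range(c, n + 1):
                M[r_][j] -= f * M[c][j]
    x = [0.0] * n
    for r_ in range(n - 1, -1, -1):
        x[r_] = (M[r_][n] - sum(M[r_][j] * x[j] for j in range(r_ + 1, n))) / M[r_][r_]
    return x

def cbs_squared(A, J):
    """Evaluate (5.3): here A22 = [[1,-1],[-1,1]], so it suffices to take v = (1,-1)."""
    JT = [list(col) for col in zip(*J)]
    At = matmul(matmul(J, A), JT)                     # hierarchical matrix J A J^T
    A11 = [row[:6] for row in At[:6]]
    b = [At[i][6] - At[i][7] for i in range(6)]       # A12 v with v = (1,-1)
    y = solve(A11, b)
    num = sum(b[i] * y[i] for i in range(6))          # v^T A21 A11^{-1} A12 v
    den = At[6][6] - At[6][7] - At[7][6] + At[7][7]   # v^T A22 v = 4
    return num / den

r = 2 ** 0.5 / 2
gamma2 = cbs_squared(edge_matrix_TT(0.5), J_TT(1.0, -0.5, r))
print(gamma2)   # 0.64 = 16/25
```

With this node convention the computed value agrees with (6.4); any relabeling of the edge-adjacent fine triangles leaves the constant unchanged by symmetry.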
Remark 6.3. Varying the parameters $(p,q,t)$ we can get a family of hierarchical splittings. For example, the parameter set $p=-1$, $q=0$, and $t=1/3$ corresponds to $\gamma^2_e=9/13$, which leads to the condition number estimate of the related multiplicative two-level method, $\varkappa<13/4$. The latter result is derived by different arguments in [16].

6.2. Mesh of quadrilaterals

We assume here that the coarsest mesh $T_m$ consists of quadrilaterals only, and each subsequent refinement is obtained by dividing the current element into four new quadrilaterals, as illustrated in Fig. 3(a). Following the setting of the previous subsection and the node numbering from Fig. 3,
we introduce the new local matrix $A^{(k)}_e$ in the form

$$A^{(k)}_e=\begin{pmatrix}
\frac12&-s&s-\frac12&&&&&\\
-s&\frac12&&s-\frac12&&&&\\
s-\frac12&&\frac32&-s&&&-1&\\
&s-\frac12&-s&\frac32&&&&-1\\
&&&&\frac12&-s&s-\frac12&\\
&&&&-s&\frac12&&s-\frac12\\
&&-1&&s-\frac12&&\frac32&-s\\
&&&-1&&s-\frac12&-s&\frac32
\end{pmatrix}.\tag{6.5}$$
The weight parameter $s\in(0,1)$ is again responsible for the correct distribution of the contribution of the links between the interior nodes of each quadrilateral macroelement $Q_i$, see Fig. 3(a).

The hierarchical basis is now introduced locally with respect to the quadrilaterals from $T_k$. If we consider the macroelement $Q_1$, then the set of standard piecewise constant basis functions is $\Phi^{(k)}_{Q_1}=\{\phi^{(k)}_{Q_1:i}\}_{i=1}^4$, and the related hierarchical basis $\tilde\Phi^{(k)}_{Q_1}=\{\tilde\phi^{(k)}_{Q_1:i}\}_{i=1}^4$ is introduced as

$$\tilde\phi^{(k)}_{Q_1:1}=\big(\phi^{(k)}_{Q_1:1}+\phi^{(k)}_{Q_1:2}\big)-\big(\phi^{(k)}_{Q_1:3}+\phi^{(k)}_{Q_1:4}\big),$$
$$\tilde\phi^{(k)}_{Q_1:2}=\big(\phi^{(k)}_{Q_1:1}+\phi^{(k)}_{Q_1:3}\big)-\big(\phi^{(k)}_{Q_1:2}+\phi^{(k)}_{Q_1:4}\big),$$
$$\tilde\phi^{(k)}_{Q_1:3}=\big(\phi^{(k)}_{Q_1:1}+\phi^{(k)}_{Q_1:4}\big)-\big(\phi^{(k)}_{Q_1:2}+\phi^{(k)}_{Q_1:3}\big),$$
$$\tilde\phi^{(k)}_{Q_1:4}=r\big(\phi^{(k)}_{Q_1:1}+\phi^{(k)}_{Q_1:2}+\phi^{(k)}_{Q_1:3}+\phi^{(k)}_{Q_1:4}\big),\tag{6.6}$$

where $r$ is again the corresponding scaling factor. Then the assembled transformation matrix $J^{(k)}_e$ reads as

$$J^{(k)}_e=\begin{pmatrix}
1&1&-1&-1&&&&\\
1&-1&1&-1&&&&\\
1&-1&-1&1&&&&\\
&&&&1&1&-1&-1\\
&&&&1&-1&1&-1\\
&&&&1&-1&-1&1\\
r&r&r&r&&&&\\
&&&&r&r&r&r
\end{pmatrix}.\tag{6.7}$$

We follow the local analysis scheme from the previous subsection and get the next two lemmas.
Lemma 6.4. Consider the hierarchical basis (6.6) for nested meshes of quadrilaterals. Then $\tilde A^{(k)}_{22}=A^{(k+1)}$ if and only if $r=\dfrac{\sqrt2}{2}$.

Lemma 6.5. The estimate

$$\gamma^2_e=\gamma^2_{QQ}\to\frac12\tag{6.8}$$

holds uniformly with respect to the refinement level $k$ for the hierarchical splitting (6.6) with positive weight parameter $s\to0^+$.

Proof. The straightforward computations lead to the following expression for the Schur complement:

$$S_e=\frac{1-2s}{2(1-s)}\begin{pmatrix}1&-1\\-1&1\end{pmatrix},$$

and therefore

$$\gamma^2_{QQ}=1-\mu=1-\frac{1-2s}{2(1-s)}=\frac{1}{2(1-s)},$$

which completes the proof. □
Here, $QQ$ indicates that the interface edge is always between two quadrilaterals.

Remark 6.6. The result from Lemma 6.5 is asymptotically equivalent to the condition number estimate of the related multiplicative two-level method, $\varkappa<2$. Applying a different technique, the latter estimate is obtained in [16] for quadrilateral meshes of arbitrary space dimension.

6.3. Mesh of quadrilaterals and triangles

We now consider the general case of a coarsest mesh $T_m$ consisting of quadrilaterals and triangles. The refinement procedure is regular and, for each particular element, the same as considered in the previous two subsections. What remains to be analyzed is the situation where macroelements of different kinds are adjacent, as shown in Fig. 3(b). Combining the constructions from the previous two subsections and following the node numbering from Fig. 3(b), we get the local matrix $A^{(k)}_e$ in the form

$$A^{(k)}_e=\begin{pmatrix}
\frac12&-s&s-\frac12&&&&&\\
-s&\frac12&&s-\frac12&&&&\\
s-\frac12&&\frac32&-s&&&-1&\\
&s-\frac12&-s&\frac32&&&&-1\\
&&&&1&-t&\frac{t-1}{2}&\frac{t-1}{2}\\
&&&&-t&t&&\\
&&-1&&\frac{t-1}{2}&&1+\frac{1-t}{2}&\\
&&&-1&\frac{t-1}{2}&&&1+\frac{1-t}{2}
\end{pmatrix}\tag{6.9}$$
with weight parameters $s,t\in(0,1)$. Keeping the already introduced local definitions of the hierarchical bases, we write the combined transformation matrix in the form

$$J^{(k)}_e=\begin{pmatrix}
1&1&-1&-1&&&&\\
1&-1&1&-1&&&&\\
1&-1&-1&1&&&&\\
&&&&1&p&q&q\\
&&&&1&q&p&q\\
&&&&1&q&q&p\\
r&r&r&r&&&&\\
&&&&r&r&r&r
\end{pmatrix}.\tag{6.10}$$

Let us stress that all locally introduced parameters are fixed for each particular triangle/quadrilateral macroelement from $T_k$, independently of what kind of neighbors it has. In this respect it is important that $r=\frac{\sqrt2}{2}$ in both the triangle and the quadrilateral case, see Lemmas 6.1 and 6.4.

Lemma 6.7. Consider the local matrix corresponding to the case of an edge between a quadrilateral and a triangle, indicated below by “QT”, and let $r=\frac{\sqrt2}{2}$, $p=1$, $q=-0.5$, $t=0.5$, and $s\to0^+$. Then

$$\tilde A^{(k)}_{e:22}=\begin{pmatrix}1&-1\\-1&1\end{pmatrix},$$

and the relation

$$\gamma^2_e=\gamma^2_{QT}\to\frac{25}{43}\tag{6.11}$$

holds uniformly with respect to the refinement level $k$.

Proof. Following the scheme from Lemma 6.5 we get

$$S_e=\frac{18-36s}{43-68s}\begin{pmatrix}1&-1\\-1&1\end{pmatrix},$$

and therefore

$$\gamma^2_{QT}=1-\mu=1-\frac{18-36s}{43-68s}=\frac{25-32s}{43-68s},$$

which completes the proof. □
The next two theorems summarize the results of Lemmas 6.2, 6.5, and 6.7.

Theorem 6.8. Consider the hierarchical splitting of the graph-Laplacian, corresponding to the general case of nested meshes, where the coarsest one $T_m$ consists of quadrilaterals and triangles.

(a) Then $\tilde A^{(k)}_{22}=A^{(k+1)}$ if and only if $r=\frac{\sqrt2}{2}$.

(b) If $p=1$, $q=-0.5$, $t=0.5$, and $0<s\le 7/32\approx0.219$, then

$$\gamma^2\le\max\{\gamma^2_{TT},\gamma^2_{QQ},\gamma^2_{QT}\}=\frac{16}{25}\tag{6.12}$$

for all $k$, $0\le k\le m$.
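The interplay of the three local constants can be tabulated from the closed-form Schur complements in Lemmas 6.2, 6.5 and 6.7. The snippet below is an illustrative check (not part of the paper): both $s$-dependent constants increase with $s$ and reach $16/25$ exactly at the threshold $s=7/32$ of Theorem 6.8(b).

```python
# Local CBS constants of Section 6 in closed form.

def gamma2_TT():
    return 16.0 / 25.0                          # Lemma 6.2 (p=1, q=-0.5, t=0.5)

def gamma2_QQ(s):
    return 1.0 - (1 - 2 * s) / (2 * (1 - s))    # Lemma 6.5

def gamma2_QT(s):
    return 1.0 - (18 - 36 * s) / (43 - 68 * s)  # Lemma 6.7 (t = 0.5)

s_max = 7.0 / 32.0                              # threshold of Theorem 6.8(b)
print(gamma2_QQ(s_max), gamma2_QT(s_max))       # both equal 16/25 = 0.64

# for 0 < s <= 7/32 the maximum of the three constants is exactly 16/25
for i in range(1, 101):
    s = s_max * i / 100
    assert max(gamma2_TT(), gamma2_QQ(s), gamma2_QT(s)) <= 16.0 / 25.0 + 1e-12
```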
Theorem 6.9. Let the parameters of the hierarchical splitting of the graph-Laplacian satisfy conditions (a) and (b) of Theorem 6.8. Then the related AMLI algorithm with acceleration polynomial of degree $\nu\in\{2,3\}$ has optimal condition number, and the total computational complexity is $O(n_0)$.

Proof. The statement follows directly from Theorems 4.4 and 6.8, taking into account that $\varrho=4$ and $1/\sqrt{1-16/25}=5/3$, so that $5/3<\nu<4$ for $\nu\in\{2,3\}$. □
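The degree condition of Theorem 4.4, specialized to $\gamma^2=16/25$ and $\varrho=4$, and the defining properties of the acceleration polynomial $p_\nu$ of Algorithm 4.3 can be sanity-checked numerically. In the sketch below, $\alpha=0.3$ is an arbitrary illustrative parameter, not the optimized value from [3].

```python
# Check the AMLI degree window 1/sqrt(1-gamma^2) < nu < rho, and the basic
# properties of p_nu: p_nu(0) = 1 and 0 <= p_nu(t) < 1 on [alpha, 1].

import math

def cheb(nu, x):
    """Chebyshev polynomial T_nu via the three-term recurrence."""
    t0, t1 = 1.0, x
    for _ in range(nu - 1):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1 if nu >= 1 else t0

def p(nu, t, alpha):
    """Acceleration polynomial of Algorithm 4.3."""
    num = 1 + cheb(nu, (1 + alpha - 2 * t) / (1 - alpha))
    den = 1 + cheb(nu, (1 + alpha) / (1 - alpha))
    return num / den

gamma2 = 16.0 / 25.0          # CBS constant from Theorem 6.8
rho = 4.0                     # n_k / n_{k+1} for uniform refinement into four
lower = 1.0 / math.sqrt(1.0 - gamma2)
print(lower)                  # 5/3: both nu = 2 and nu = 3 lie in (5/3, 4)

alpha = 0.3
for nu in (2, 3):
    assert lower < nu < rho
    assert abs(p(nu, 0.0, alpha) - 1.0) < 1e-12       # p_nu(0) = 1
    # on [alpha, 1] the Chebyshev argument stays in [-1, 1], hence 0 <= p < 1
    for i in range(101):
        t = alpha + (1 - alpha) * i / 100
        assert 0.0 <= p(nu, t, alpha) < 1.0
```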
Acknowledgments

This work has been conducted during the Special Radon Semester on Computational Mechanics, October–December 2005, and supported in part by RICAM, Austrian Academy of Sciences, Linz. The partial support of the first author by Texas A&M University through funding of his sabbatical leave is gratefully acknowledged. The second author gratefully acknowledges the support provided via EC INCO Grant BIS-21++ 016639/2005. The authors have also been partially supported by the Bulgarian NSF Grant VU-MI-202/2006.

References

[1] D. Arnold, F. Brezzi, B. Cockburn, L.D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal. 39 (2002) 1749–1779.
[2] O. Axelsson, Iterative Solution Methods, Cambridge University Press, Cambridge, 1994.
[3] O. Axelsson, P.S. Vassilevski, Algebraic multilevel preconditioning methods II, SIAM J. Numer. Anal. 27 (1990) 1569–1590.
[4] R. Bank, T. Dupont, An optimal order process for solving finite element equations, Math. Comp. 36 (1981) 427–458.
[5] R. Blaheta, S. Margenov, M. Neytcheva, Uniform estimate of the constant in the strengthened CBS inequality for anisotropic non-conforming FEM systems, Numer. Linear Algebra Appl. 11 (4) (2004) 309–326.
[6] R. Blaheta, S. Margenov, M. Neytcheva, Robust optimal multilevel preconditioners for non-conforming finite element systems, Numer. Linear Algebra Appl. 12 (2) (2005) 495–514.
[8] J.H. Bramble, R.E. Ewing, J.E. Pasciak, J. Shen, The analysis of multigrid algorithms for cell centered finite difference methods, Adv. Comput. Math. 5 (1) (1996) 15–29.
[9] S. Brenner, J. Zhao, Convergence of multigrid algorithms for interior penalty methods, Appl. Numer. Anal. Comput. Math. 2 (1) (2005) 3–18.
[10] B. Cockburn, Discontinuous Galerkin methods, Z. Angew. Math. Mech. 83 (11) (2003) 731–754.
[11] V.A. Dobrev, R.D. Lazarov, P.S. Vassilevski, L.T. Zikatanov, Two-level preconditioning of discontinuous Galerkin approximations of second order elliptic equations, Numer. Linear Algebra Appl. 13 (6) (2006) 753–770.
[12] V. Eijkhout, P.S. Vassilevski, The role of the strengthened Cauchy–Bunyakowski–Schwarz inequality in multilevel methods, SIAM Rev. 33 (1991) 405–419.
[13] R.E. Ewing, J. Wang, Y. Yang, A stabilized discontinuous finite element method for elliptic problems, Numer. Linear Algebra Appl. 10 (2003) 83–104.
[14] R. Falgout, P.S. Vassilevski, L.T. Zikatanov, On two-grid convergence estimates, Numer. Linear Algebra Appl. 12 (2005) 471–494.
[15] J. Gopalakrishnan, G. Kanschat, A multilevel discontinuous Galerkin method, Numer. Math. 95 (3) (2003) 527–550.
[16] R.D. Lazarov, P.S. Vassilevski, L.T. Zikatanov, Multilevel preconditioning of second order elliptic discontinuous Galerkin problems, Preprint, 2005.
[17] B. Rivière, M.F. Wheeler, V. Girault, A priori error estimates for finite element methods based on discontinuous approximation spaces for elliptic problems. Part II, SIAM J. Numer. Anal. 39 (3) (2001) 902–931.
[18] P.S. Vassilevski, Hybrid V-cycle algebraic multilevel preconditioners, Math. Comp. 58 (198) (1992) 489–512.
Journal of Complexity 23 (2007) 516 – 527 www.elsevier.com/locate/jco
Pseudorandom numbers and entropy conditions

István Berkes^{a,1}, Walter Philipp^{b,†}, Robert F. Tichy^{c,∗,2}

a Department of Statistics, Technical University Graz, Steyrergasse 17/IV, A-8010 Graz, Austria
b Department of Statistics, University of Illinois, 725 S. Wright Street, Champaign, IL 61820, USA
c Institute of Mathematics A, Technical University Graz, Steyrergasse 30, A-8010 Graz, Austria
Received 12 July 2006; accepted 20 December 2006 Available online 26 January 2007
Abstract

We investigate measures of pseudorandomness of finite sequences $(x_n)$ of real numbers. Mauduit and Sárközy introduced the “well-distribution measure”, depending on the behavior of the sequence $(x_n)$ along arithmetic subsequences $(x_{ak+b})$. We extend this definition by replacing the class of arithmetic progressions by an arbitrary class $\mathcal A$ of sequences of positive integers, and show that the so obtained measure is closely related to the metric entropy of the class $\mathcal A$. Using standard probabilistic techniques, this fact enables us to give precise bounds for the pseudorandomness measure of classical constructions. In particular, we will be interested in “truly” random sequences and in sequences of the form $\{n_k\alpha\}$, where $\{\cdot\}$ denotes fractional part, $(n_k)$ is a given sequence of integers and $\alpha\in[0,1)$.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Pseudorandomness; Discrepancy; Well-distribution measure; Metric entropy
† Deceased.
∗ Corresponding author.
E-mail addresses: [email protected] (I. Berkes), [email protected] (R.F. Tichy).
1 The research of István Berkes was supported by OTKA grants T 43037, K 61052 and FWF grant S9603-N13.
2 The research of Robert F. Tichy was supported by FWF grant S9603-N13.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2006.12.002

1. Introduction

Computer generated pseudorandom numbers are used in many algorithms of applied mathematics (Monte Carlo methods, simulation, etc.) and the performance of such algorithms depends in an essential way on the properties of the random numbers used. A simple but important concept in the study of pseudorandomness is the discrepancy, characterizing how close the distribution of a finite
I. Berkes et al. / Journal of Complexity 23 (2007) 516 – 527
sequence is to the uniform distribution. The discrepancy $D_N$ of a finite sequence $(x_1,\ldots,x_N)$ in the unit interval $[0,1)$ is defined by

$$D_N=D_N(x_1,\ldots,x_N):=\sup_{0\le t\le1}\Big|\frac1N\,\mathrm{card}\,(k\le N: x_k\le t)-t\Big|.\tag{1}$$

An infinite sequence $(x_n)$ in $[0,1)$ is called uniformly distributed in the sense of Weyl if $D_N(x_1,\ldots,x_N)\to0$ as $N\to\infty$. Uniform distribution and discrepancy are particularly useful tools in connection with Monte Carlo and quasi-Monte Carlo integration, since by a well-known inequality of Koksma and its multi-dimensional generalizations (see e.g. [13, pp. 143 and 151]), the error term in such procedures depends on the discrepancy of the pseudorandom sequence used. However, uniform distribution catches only one aspect of randomness, and so-called low discrepancy sequences may have rather poor performance with respect to other algorithms, such as simulation. Recall that if $(\eta_n)$ is a sequence of i.i.d. random variables uniformly distributed in $[0,1)$, then by the Chung–Smirnov LIL (see e.g. [22, p. 504]) we have

$$\limsup_{N\to\infty}\frac{ND_N(\eta_1,\ldots,\eta_N)}{\sqrt{N\log\log N}}=\frac{1}{\sqrt2}\quad\text{a.s.}\tag{2}$$
In other words, the discrepancy of “truly” independent sequences has the precise order of magnitude $O(N^{-1/2}(\log\log N)^{1/2})$ with probability 1. On the other hand, if $\xi_n=\{n\alpha\}$, where $\alpha$ is a random variable uniformly distributed in $[0,1)$, then by a result of Kesten [11] we have

$$ND_N(\xi_1,\ldots,\xi_N)\sim\frac{2}{\pi^2}\,\log N\log\log N\quad\text{in probability.}\tag{3}$$
Here $\{t\}$ denotes the fractional part of $t$. Thus the sequence $(\xi_n)$ gives a better remainder term in Monte Carlo integration than the “truly” i.i.d. sequence $(\eta_n)$, but obviously its fluctuations are quite different from those of i.i.d. sequences, and this makes $(\xi_n)$ unsuitable for simulation purposes. A sequence resembling i.i.d. sequences not only has to have small discrepancy, but it must share several other properties with random sequences as well. Such properties can be used as “tests” for pseudorandomness; see Knuth [12] for a detailed discussion. For example, an i.i.d. sequence $(e_1,\ldots,e_n)\in\{-1,1\}^n$ has the normality property, meaning that not too long strings of $\pm1$ occur in it with the “proper” frequency; it must be well-distributed relative to arithmetic progressions in the sense that the sums $\sum_{j=1}^{r}e_{a+bj}$ with $a\in\mathbb Z$, $b\in\mathbb N$ and subject to $1\le a+b\le a+br\le n$ are uniformly small compared with $n$ (in fact, roughly $O(n^{1/2})$); and it must have small multiple correlations, etc. In a series of papers (see e.g. [14–16]), Mauduit and Sárközy give a detailed study of these properties; in particular, they investigate the well-distribution and correlation measure of several concrete constructions of pseudorandom sequences. In the context of sequences in $[0,1)$, they define the well-distribution measure by

$$W_N(x_1,\ldots,x_N):=\sup_{(p_k)\in\mathcal L}\Big|\sum_{p_k\le N}\big(1(x_{p_k}\le1/2)-1/2\big)\Big|,\tag{4}$$

where $1(B)$ denotes the indicator function of the set $B$ and $\mathcal L$ is the class of arithmetic progressions $p_k=a+bk$, $k=1,2,\ldots$, with integers $a\ge0$, $b\ge1$. Both $W_N$ and $ND_N$ are suprema of sums of centered indicator functions $1(x_j\le t)-t$, but they have a completely different behavior. For example, the order of magnitude of $ND_N$ for an infinite sequence $(x_k)$ in $[0,1)$ can be as small as $O(\log N)$, an order of magnitude which is in fact the smallest possible by a classical result of
Schmidt (see e.g. [6,13]). In contrast, by a result of Roth [21], for any sequence $(x_1,\ldots,x_N)$ we have $W_N(x_1,\ldots,x_N)\ge cN^{1/4}$, where $c$ is an absolute constant.

The discrepancy $D_N(x_1,\ldots,x_N)$ can be fairly sharply estimated in terms of the exponential sums $S_N(h)=\sum_{k=1}^{N}e^{2\pi ihx_k}$ by using the Erdős–Turán and Koksma inequalities (see e.g. [6,13]), reducing the study of $D_N$ to an analytic problem for which powerful tools exist. On the other hand, the computation of $W_N$ leads to difficult combinatorial problems which are still unsolved in many important cases.

The purpose of the present paper is to give a detailed analysis of the well-distribution measure $W_N$ in (4); we will be specifically interested in the order of magnitude of $W_N(x_1,\ldots,x_N)$ for i.i.d. sequences $(x_n)$ and sequences of the type $x_k=\{n_k\alpha\}$, where $(n_k)$ is an increasing sequence of positive integers. The sequence $\{n_k\alpha\}$ provides a particularly simple example of a uniformly distributed sequence in the sense of Weyl and has been investigated extensively in the literature. Apart from technical simplifications, using the class $\mathcal L$ of arithmetic progressions in (4) has no particular significance; for example, for “not too large” classes $\mathcal A$ of sequences of positive integers and for i.i.d. sequences $(x_n)$ we will be able to give sharp bounds for the more general quantity

$$W_N^{(\mathcal A)}(x_1,\ldots,x_N):=\sup_{(p_k)\in\mathcal A}\ \sup_{0\le t\le1}\Big|\sum_{p_k\le N}\big(1(x_{p_k}\le t)-t\big)\Big|.\tag{5}$$
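Definitions (1) and (4) are directly computable for short sequences. The brute-force sketch below is illustrative only; the Weyl sequence $\{k\alpha\}$ with $\alpha$ the golden-ratio fractional part serves as hypothetical demo data.

```python
# Direct evaluation of the discrepancy (1) and the well-distribution
# measure (4) for short sequences in [0, 1).

def discrepancy(xs):
    """D_N of (1); for sorted samples the sup over t is attained at the jumps."""
    n = len(xs)
    s = sorted(xs)
    return max(max((i + 1) / n - s[i], s[i] - i / n) for i in range(n))

def well_distribution(xs):
    """W_N of (4): sup over arithmetic progressions p_k = a + b*k
    (a >= 0, b >= 1) of |sum over p_k <= N of (1(x_{p_k} <= 1/2) - 1/2)|."""
    n = len(xs)
    best = 0.0
    for b in range(1, n + 1):
        for a in range(0, n):
            s = sum((0.5 if xs[p - 1] <= 0.5 else -0.5)
                    for p in range(a + b, n + 1, b))
            best = max(best, abs(s))
    return best

alpha = (5 ** 0.5 - 1) / 2
xs = [(k * alpha) % 1.0 for k in range(1, 101)]
print(discrepancy(xs), well_distribution(xs))
```

The O(N^2 log N) loop is fine for demonstration sizes; the combinatorial difficulty discussed above concerns sharp asymptotic bounds, not this finite computation.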
We will see that the order of magnitude of WN is intimately connected with the geometric properties of the class A, namely, its metric entropy function (A; , N ) and related quantities. Metric entropy plays an important role in uniformity problems in the law of large numbers, CLT and LIL for random variables indexed by sets (see e.g. Dudley [7,8], Dudley and Philipp [9], Pollard [20]), but no such connection has been studied when uniformity is meant over subsequences of integers as in (5). In analogy with the existing probabilistic results on uniformity in the CLT, LIL and other limit theorems, it can be expected that metric entropy type quantities provide not only (A) (A) upper, but also lower estimates for WN , thereby reducing the study of WN to the computation of metric entropy numbers. Before formulating our results, it will be useful to review existing results on the ordinary discrepancy and well-distribution measure of the sequence {nk }. By a classical result of Weyl [23], for any increasing sequence (nk ) of integers, {nk } is uniformly distributed for every ∈ [0, 1), except for a set of Lebesgue measure 0. Kesten’s result cited above shows that NDN ({k}) ∼
2 log N log log N 2
in measure. Another case where the order of magnitude of the discrepancy of {nk } is known is when (nk ) grows very rapidly. Philipp [18] proved that if (nk ) satisfies the Hadamard gap condition nk+1 /nk q > 1,
k = 1, 2, . . .
(6)
then we have for almost all ∈ [0, 1) 1 NDN ({nk }) lim sup C(q), 4 N log log N N→∞
(7)
where $C(q)\ll1/(q-1)$. Recalling that the precise order of magnitude of the discrepancy of i.i.d. uniform sequences is $O(N^{-1/2}(\log\log N)^{1/2})$ with probability 1, the result of Philipp shows that, in the sense of discrepancy, the sequence $\{n_k\alpha\}$ behaves exactly like an i.i.d. sequence. For subexponentially growing $(n_k)$ the behavior of $D_N(\{n_k\alpha\})$ is much more complicated and depends sensitively on the number-theoretic properties of the sequence $(n_k)$; see Berkes et al. [3] for a detailed analysis of the arithmetic effect. In [3] it is also shown that, in a certain statistical sense, for “most” subexponential sequences $(n_k)$ the discrepancy $D_N(\{n_k\alpha\})$ still satisfies (7). Passing to general sequences $(n_k)$, Baker [1] proved, improving earlier results of Cassels [5] and Erdős and Koksma [10], that for any increasing sequence $(n_k)$ of positive integers we have

$$ND_N(\{n_k\alpha\})=O\big(N^{1/2}(\log N)^{3/2+\varepsilon}\big)\quad\text{a.e.}\tag{8}$$

for any $\varepsilon>0$. On the other hand, one can construct examples such that

$$ND_N(\{n_k\alpha\})\ge cN^{1/2}(\log N)^{1/2}\quad\text{a.e. for infinitely many }N$$

(see e.g. Berkes and Philipp [2]). This means that there exist sequences $\{n_k\alpha\}$ whose discrepancy $D_N(\{n_k\alpha\})$ exceeds the discrepancy of i.i.d. sequences, but the excess factor can be at most a power of $\log N$.

The previous results give a fairly satisfactory picture of the metric discrepancy of sequences $\{n_k\alpha\}$ in a number of important cases. In contrast, relatively little is known about the well-distribution measure $W_N$ of $\{n_k\alpha\}$. Mauduit and Sárközy [15,16] showed that in the case $n_k=k$ we have $W_N(\{k\alpha\})\ll N^{1/2}(\log N)^{1+\varepsilon}$ for almost every $\alpha\in[0,1)$, and that the exponent of the log can be replaced by $1/2$ if the partial quotients of the continued fraction expansion of $\alpha$ remain bounded. They also proved that $W_N(\{k\alpha\})\gg N^{1/2}$ for every irrational $\alpha$. Thus for almost all $\alpha$ the order of magnitude of $W_N(\{k\alpha\})$ is roughly $O(N^{1/2})$, which, as Theorem 1 in combination with the estimate (21) below will show, is very close to the order of magnitude of the well-distribution measure of “true” i.i.d. sequences. As noted, however, $ND_N(\{k\alpha\})$ is much smaller than $O(N^{1/2})$, indicating a very complicated probabilistic behavior of the sequence $\{k\alpha\}$.

Except for the sequence $\{k\alpha\}$, no precise estimates for the well-distribution measure of $\{n_k\alpha\}$ seem to be known. For the sequence $\{k^r\alpha\}$ ($r=2,3,\ldots$), Mauduit and Sárközy [15,16] proved that for almost every $\alpha$

$$W_N(\{k^r\alpha\})\ll N^{1-\delta_r}$$

with some (explicitly computed) constant $\delta_r>0$. In particular,

$$W_N(\{k^2\alpha\})\ll N^{3/5}(\log N)^{2/5+\varepsilon}\quad\text{a.e.}$$

Philipp and Tichy [19] proved that for any increasing sequence $(n_k)$ of integers we have

$$W_N(\{n_k\alpha\})\ll N^{2/3}(\log N)^{1+\varepsilon}\quad\text{a.e.}\tag{9}$$

It is possible that, in analogy with Baker’s result (8), the factor $N^{2/3}$ here can be replaced by $N^{1/2}$, but this remains open.
2. Results

We are now ready to formulate our main results. Let $(\xi_n)$ be any sequence of random variables with values in $[0,1)$, and let $\mathcal A$ be a class of subsequences of $\mathbb N$. Our purpose is to estimate the quantity

$$W_N^{(\mathcal A)}(\xi_1,\ldots,\xi_N):=\sup_{(p_k)\in\mathcal A}\ \sup_{0\le t\le1}\Big|\sum_{p_k\le N}\big(1(\xi_{p_k}\le t)-t\big)\Big|.\tag{10}$$

Our main interest will be the case when the $\xi_k$ are independent random variables or $\xi_k=\xi_k(\omega)=\{n_k\omega\}$, a sequence of random variables defined on the interval $[0,1)$ endowed with the Lebesgue measure. When the sequence $(\xi_k)$ is understood, we simply write $W_N(\mathcal A)$ instead of $W_N^{(\mathcal A)}(\xi_1,\ldots,\xi_N)$. Clearly, for any $\mathcal A$ and $(\xi_k)$ we have

$$W_N^{(\mathcal A)}(\xi_1,\ldots,\xi_N)\le N,$$

and for “large” $\mathcal A$ this estimate cannot be substantially improved even if the $\xi_k$ are i.i.d. random variables. For example, if the $\xi_k$ are independent r.v.’s taking the values 0 and 2/3 with probability 1/2 each, and $\mathcal A$ is the class of all increasing sequences in $\mathbb N$, then

$$W_N^{(\mathcal A)}(\xi_1,\ldots,\xi_N)\ge N/4.$$

Indeed, if for each $\omega$ we let $p_1(\omega)<p_2(\omega)<\cdots$ denote those indices for which $\xi_{p_k}(\omega)=0$, then either $(p_k)$ or its complement in the segment $[1,2,\ldots,N]$ has cardinality at least $N/2$. Consequently, we have for all $\omega$

$$\sup_{(p_k)\in\mathcal A}\Big|\sum_{p_k\le N}\big(1(\xi_{p_k}\le1/2)-1/2\big)\Big|\ge N/4.$$

In the case when $\mathcal A$ consists of a single sequence and $(\xi_n)$ is an i.i.d. uniform sequence of r.v.’s, we have

$$W_N^{(\mathcal A)}(\xi_1,\ldots,\xi_N)=o(N)\quad\text{a.s.}\tag{11}$$

by the Glivenko–Cantelli theorem of probability theory. (Actually, in this case the right hand side of (11) can be improved to $O\big((N\log\log N)^{1/2}\big)$ by the Chung–Smirnov law of the iterated logarithm.) If relation (11) holds for a larger class $\mathcal A$, this means a certain uniformity in the Glivenko–Cantelli theorem with respect to a class of subsequences of integers. Uniformity in the Glivenko–Cantelli theorem with respect to subsets of the Euclidean space $\mathbb R^d$ has been investigated extensively in the literature. Let $(\zeta_n)$ be a sequence of i.i.d. random variables, uniformly distributed over the unit cube $K_d$ of $\mathbb R^d$, and let $\mathcal C$ be a class of Borel sets $C\subseteq K_d$. Put

$$Z_N(C)=\sum_{k\le N}\big(1(\zeta_k\in C)-\lambda(C)\big),\qquad C\in\mathcal C,$$

where $\lambda$ is the Lebesgue measure. As it turns out, the validity of the uniform strong law and LIL, i.e.

$$\lim_{N\to\infty}\sup_{C\in\mathcal C}\frac1N|Z_N(C)|=0\quad\text{a.s.}\tag{12}$$
and

$$\limsup_{N\to\infty}\,(N\log\log N)^{-1/2}\sup_{C\in\mathcal C}|Z_N(C)|<\infty\quad\text{a.s.}\tag{13}$$

have been connected with metric entropy conditions on the class $\mathcal C$.

Theorem 1. Let $(\xi_k)$ be a sequence of independent random variables with uniform distribution (15) over $[0,1)$. Let $\mathcal A$ be a class of subsequences of $\mathbb N$ with entropy function satisfying

$$H(\mathcal A;N,r)\le B\cdot2^{r}.\tag{16}$$

Then with probability 1

$$\frac14\le\limsup_{N\to\infty}\,(N\log\log N)^{-1/2}\,W_N(\mathcal A)\le C$$

for some constant $C$, depending only on the constant $B$ in (16).
Next, let $(n_k)$ be a sequence of real numbers satisfying the Hadamard gap condition

$$n_{k+1}/n_k\ge q>1,\qquad k=1,2,\ldots.\tag{17}$$

Then the sequence

$$\xi_k(\omega):=\{n_k\omega\},\tag{18}$$

defined on the unit interval $[0,1)$ endowed with the Lebesgue measure, is a sequence of random variables having asymptotically uniform distribution over $[0,1)$.

Theorem 2. Let $(n_k)$ be a sequence of real numbers satisfying the Hadamard gap condition (17) and let $\xi_k=\xi_k(\omega)=\{n_k\omega\}$. Let $\mathcal A$ be a class of subsequences of $\mathbb N$ with entropy function satisfying

$$H(\mathcal A;N,r)\le B\cdot2^{\beta r}\tag{19}$$

for some constants $B>0$ and $\beta>0$. Then with probability 1

$$\frac14\le\limsup_{N\to\infty}\,(N\log\log N)^{-1/2}\,W_N(\mathcal A)\le C$$

for some constant $C>0$, depending only on $B$, $\beta$ and $q$.

The second entropy concept is based on the Hamming distance of sequences of integers. For $N\ge1$ we define the distance of two sequences $A$ and $B$ of positive integers by

$$d(A,B;N)=\frac1N\sum_{n\le N}|1(n\in A)-1(n\in B)|.$$
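The normalized Hamming distance just defined, and the notion of $\delta$-separation on which the entropy function (20) below is based, are easy to illustrate. The three-sequence class in this sketch is a toy example, not one from the paper.

```python
# Normalized Hamming distance d(A, B; N) between sequences of positive
# integers, and a greedy count of a delta-separated subset of a class.

def d(A, B, N):
    """d(A,B;N) = (1/N) * sum over n <= N of |1(n in A) - 1(n in B)|."""
    A, B = set(A), set(B)
    return sum(1 for n in range(1, N + 1) if (n in A) != (n in B)) / N

def packing(classA, delta, N):
    """Greedy delta-separated subset: a lower bound for the packing number."""
    chosen = []
    for A in classA:
        if all(d(A, B, N) > delta for B in chosen):
            chosen.append(A)
    return len(chosen)

N = 12
evens = range(2, N + 1, 2)
odds = range(1, N + 1, 2)
mult3 = range(3, N + 1, 3)
print(d(evens, odds, N))                      # disjoint and covering [1,N] -> 1.0
print(packing([evens, odds, mult3], 0.4, N))  # all three are 0.4-separated -> 3
```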
Given a class $\mathcal A$ of increasing sequences of positive integers, we define the entropy function by

$$M(\mathcal A;\delta,N):=\sup\big\{m:\ \text{there exist }A_1,\ldots,A_m\in\mathcal A\ \text{such that }d(A_i,A_j;N)>\delta\ \text{for all }i\ne j\big\}.\tag{20}$$

Clearly $M$ is a non-increasing function of $\delta\ge0$.

Theorem 3. Let $(\xi_k)$ be a sequence of independent random variables with uniform distribution (15) over $[0,1)$. Let $\mathcal A$ be a class of increasing sequences of positive integers with entropy function $M(\mathcal A;\delta,N)$ growing not faster than a polynomial in $1/\delta$ (depending only on $\mathcal A$). Then with probability 1

$$W_N(\mathcal A)\ll\sqrt{N\big(\log M(\mathcal A;N^{-\vartheta},N)+(\log\log N)^{1/2}\big)}$$

for any $\vartheta>1/2$. The same result holds if $\xi_k=\{n_k\omega\}$, where $(n_k)$ is a sequence of real numbers satisfying the Hadamard gap condition (17).

As an example, consider a Vapnik–Červonenkis (VC) class $\mathcal A$ in the set $\mathbb N$ of positive integers. For any finite set $F\subset\mathbb N$, let $\Delta_{\mathcal A}(F)$ be the number of different subsets $F\cap A$, $A\in\mathcal A$. For $n=1,2,\ldots$ let

$$m_{\mathcal A}(n):=\max\big\{\Delta_{\mathcal A}(F):\ \mathrm{card}\,F=n\big\}.$$
Clearly $m_{\mathcal A}(n)\le2^n$. Let

$$v=V(\mathcal A):=\begin{cases}\inf\{n:\ m_{\mathcal A}(n)<2^n\},\\ +\infty\ \text{ if }m_{\mathcal A}(n)=2^n\text{ for all }n.\end{cases}$$

If $V(\mathcal A)<+\infty$ then $\mathcal A$ is called a VC class in $\mathbb N$. We recall a result of Dudley [7, Lemma 7.13] or Dudley [8, p. 105], measuring the size of VC classes. Let $\Gamma$ be the set of all laws on $\mathbb N$ of the form

$$\mu=n^{-1}\sum_{j=1}^{n}\delta_{x(j)}$$

for unit point masses $\delta_{x(j)}$ at $x(j)\in\mathbb N$, $j=1,2,\ldots,n$; $n=1,2,\ldots$, where the $x(j)$ need not be distinct. For $\delta>0$ and $\mu\in\Gamma$ let

$$M^*(\mathcal A,\mu;\delta):=\sup\big\{m:\ \text{there exist }A_1,\ldots,A_m\in\mathcal A\ \text{such that }\mu(A_i\,\triangle\,A_j)>\delta\ \text{for }i\ne j\big\}$$

and

$$M^*(\mathcal A;\delta):=\sup\{M^*(\mathcal A,\mu;\delta):\ \mu\in\Gamma\}.$$

Lemma 1 (Dudley [7,8]). If $\mathcal A$ is a VC class in $\mathbb N$ with $V(\mathcal A)=v$, then there is a constant $K$ depending only on $v$ such that

$$M^*(\mathcal A;\delta)\le K\delta^{-v}|\log\delta|^{v}\quad\text{for all }\delta>0.$$
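The shatter function $m_{\mathcal A}(n)$ and the VC index $V(\mathcal A)$ can be computed by brute force for small classes. The class below — the initial segments $\{1,\ldots,m\}$ of $\mathbb N$, restricted to a finite window — is a hypothetical example chosen for illustration.

```python
# Brute-force shatter function m_A(n) and VC index V(A) = inf{n: m_A(n) < 2^n}
# for the class of initial segments of N (on the window {1, ..., M}).

from itertools import combinations

M = 10
classA = [set(range(1, m + 1)) for m in range(0, M + 1)]   # {}, {1}, ..., {1..M}

def shatter(n):
    """m_A(n): max over n-point sets F of the number of distinct traces F & A."""
    best = 0
    for F in combinations(range(1, M + 1), n):
        traces = {frozenset(set(F) & A) for A in classA}
        best = max(best, len(traces))
    return best

V = next(n for n in range(1, M + 1) if shatter(n) < 2 ** n)
print([shatter(n) for n in (1, 2, 3)], V)   # [2, 3, 4], V = 2
```

Initial segments shatter single points but never a two-point set (the trace {larger point alone} is impossible), so V = 2, in line with the definition above.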
Hence if A is a VC class in ℕ, the entropy function defined in (20) does not grow faster than a polynomial in 1/ε.

Corollary 1. Let (ξ_k) be a sequence of independent random variables with uniform distribution (15) over [0, 1), or let ξ_k = {n_k ω} with a Hadamard lacunary (n_k). If A is a VC class in ℕ, then with probability 1 we have

$$W_N(\mathcal A) \ll \sqrt{N \log N}.$$

In the following two results we consider the case ξ_k = {n_k ω} with an arbitrary increasing sequence (n_k) of positive integers. If L denotes the collection of all integer-valued arithmetic progressions p_k = a + bk, k = 1, 2, …, a ≥ 0, b ≥ 1, then it is easy to see that the entropy function satisfies

$$\Phi(\mathcal L; N, r) \le 2^{2r}, \qquad r = 1, 2, \dots. \tag{21}$$
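The discrepancy functional behind W_N(A) measures, for each point set {n_k ω} ⊂ [0, 1), the worst deviation of the empirical distribution from the uniform one. For a single finite point set, the star discrepancy can be computed exactly from the order statistics; the lacunary sequence below is an illustrative choice, not one analyzed in the paper:

```python
import math

def star_discrepancy(points):
    # Exact star discrepancy of a finite point set in [0, 1),
    # via the classical order-statistics formula.
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

# A Hadamard lacunary example: n_k = 2^k, omega = sqrt(2).  The range of k
# is kept below 50 so that 2^k * omega stays within double precision.
omega = math.sqrt(2)
pts = [(2 ** k * omega) % 1.0 for k in range(1, 50)]
print(star_discrepancy(pts))  # small: the sequence equidistributes mod 1

# Sanity check: the midpoint grid with 10 points has discrepancy 1/20.
print(star_discrepancy([(i + 0.5) / 10 for i in range(10)]))  # 0.05
```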
Theorem 4. Let (n_k) be an increasing sequence of positive integers and let ξ_k = ξ_k(ω) = {n_k ω}. Let A be a class of subsequences of ℕ with entropy function satisfying (19) for some positive constants β and B. Then with probability 1, for any ε > 0,

$$W_N(\mathcal A) \ll \begin{cases} N^{\beta/(1+\beta)}\,(\log N)^{3/(1+\beta)+\varepsilon} & \text{if } \beta > 1, \\[2pt] N^{1/2}\,(\log N)^{2+\varepsilon} & \text{if } \beta = 1, \\[2pt] N^{1/2}\,(\log N)^{3/2+\varepsilon} & \text{if } \beta < 1. \end{cases}$$
Remark. The case β = 2 and (21) yield Theorem 1 in [19].

Theorem 5. Let (n_k) be an increasing sequence of positive integers and let ξ_k = ξ_k(ω) = {n_k ω}. Let A be a class of increasing sequences of positive integers with entropy function Φ(A; ε, N) ≤ C ε^{−v} for some v ≥ 0, where C depends only on A. Then with probability 1,

$$W_N(\mathcal A) \ll N^{(v+1)/(v+2)}\,(\log N)^{3/(v+2)+\varepsilon}, \qquad \varepsilon > 0.$$
3. Proofs

In what follows, we will prove Theorem 4 and outline the idea of the proof of Theorem 3 in the lacunary case, which is typical for the proof of the remaining results. Complete proofs of all results, and a number of further results, will be given in our forthcoming paper [4].

Assume the conditions of Theorem 4. Fix N ≥ 1, r ≥ 1 and let (p_k) be a fixed sequence in [1, N] such that (p_k) ∈ A_N(r). By the Erdős–Turán inequality (see e.g. [6, p. 15], or [13, p. 114]) we have, for any 1 ≤ Q ≤ N and any H ≥ 1,

$$\sup_{0 \le t \le 1}\Bigl|\sum_{p_k \le Q}\bigl(\mathbf 1(\xi_{p_k}(\omega) \le t) - t\bigr)\Bigr| \le \frac{6R}{H} + 6\sum_{1 \le h \le H}\frac{1}{h}\Bigl|\sum_{p_k \le Q} e(h n_{p_k}\omega)\Bigr|.$$

Here R = #{k : p_k ≤ Q} and e(x) = exp(2πix). Clearly R ≤ N and thus

$$\max_{Q \le N}\,\sup_{0 \le t \le 1}\Bigl|\sum_{p_k \le Q}\bigl(\mathbf 1(\xi_{p_k}(\omega) \le t) - t\bigr)\Bigr|^2 \le \frac{72 N^2}{H^2} + 72\Bigl(\sum_{1 \le h \le H}\frac{1}{h}\,\max_{Q \le N}\Bigl|\sum_{p_k \le Q} e(h n_{p_k}\omega)\Bigr|\Bigr)^2.$$

By Hunt's inequality (see e.g. [17]) we have

$$E\Bigl(\max_{Q \le N}\Bigl|\sum_{p_k \le Q} e(h n_{p_k}\omega)\Bigr|^2\Bigr) \le C\sum_{p_k \le N} 1 \le C\,N 2^{-(r-1)},$$

and thus, choosing H = N and using Minkowski's inequality, we get

$$E\Bigl(\max_{Q \le N}\,\sup_{0 \le t \le 1}\Bigl|\sum_{p_k \le Q}\bigl(\mathbf 1(\xi_{p_k} \le t) - t\bigr)\Bigr|^2\Bigr) \ll N 2^{-r}\log^2 N + 1 \ll N 2^{-r}\log^2 N. \tag{22}$$
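Since the Erdős–Turán inequality holds for an arbitrary finite point set in [0, 1), it can be checked numerically. The sketch below compares the left-hand side (computed exactly from the sorted points) with the right-hand side in the form quoted above, with constant 6; the sequence n_k = k² and the cutoff H = 40 are arbitrary illustrative choices:

```python
import cmath
import math

def discrepancy_lhs(xs):
    # sup over t of |sum_k (1(x_k <= t) - t)|, i.e. N times the star
    # discrepancy, computed exactly from the order statistics.
    s = sorted(xs)
    n = len(s)
    return max(max(abs((i + 1) - n * x), abs(i - n * x)) for i, x in enumerate(s))

def erdos_turan_rhs(xs, H):
    # 6N/H + 6 * sum_{h <= H} |sum_k e(h x_k)| / h, with e(x) = exp(2*pi*i*x).
    n = len(xs)
    total = 6 * n / H
    for h in range(1, H + 1):
        total += 6 * abs(sum(cmath.exp(2j * math.pi * h * x) for x in xs)) / h
    return total

xs = [(k * k * math.sqrt(2)) % 1.0 for k in range(1, 200)]  # {n_k w}, n_k = k^2
assert discrepancy_lhs(xs) <= erdos_turan_rhs(xs, H=40)
```

The inequality is a theorem about arbitrary point sets, so the assertion holds for any choice of sequence and H.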
(To justify the last step, we note that without loss of generality we can assume that N2^{−(r−1)} ≥ 1, since otherwise A_N(r) is empty.) Since the number of sequences (p_k) ∈ A_N(r) is at most B · 2^{rβ} by the assumptions of Theorem 4, we have for any λ > 0, μ > 0 (to be chosen suitably later)

$$P\Bigl(\max_{(p_k)\in\mathcal A_N(r)}\,\max_{Q\le N}\,\sup_{0\le t\le 1}\Bigl|\sum_{p_k\le Q}\bigl(\mathbf 1(\xi_{p_k}\le t)-t\bigr)\Bigr| \ge 2N^{\lambda}(\log N)^{\mu}\Bigr) \ll N^{1-2\lambda}(\log N)^{2-2\mu}\cdot 2^{r(\beta-1)}. \tag{23}$$
Without loss of generality we can assume that N2^{−(r−1)} ≥ N^λ(log N)^μ, i.e.

$$2^{r} \le 2N^{1-\lambda}(\log N)^{-\mu}, \tag{24}$$

since otherwise the absolute value of the sum in (23) would be less than N^λ(log N)^μ. Summing the probability bounds in (23) over all r subject to (24), and choosing λ and μ according to the following table:

    range of β    λ              μ
    β > 1         β/(1+β)        (3+ε)/(1+β)
    β = 1         1/2            2+ε
    β < 1         1/2            3/2+ε

we obtain a bound of the form C* m^{−(1+ε′)} for some C* > 0, ε′ > 0. We apply the convergence part of the Borel–Cantelli lemma and obtain the conclusion of Theorem 4.

For the proof of Theorem 3 in the lacunary case define, for 0 ≤ s < t ≤ 1,

$$x_n(s,t) := \mathbf 1(s \le \xi_n < t) - (t-s).$$

We state the following maximal inequality.

Proposition 1. Let N ≥ 1 be an integer and let R ≥ 1. Suppose that Δ := t − s ≥ N^{−3/2}. Then for some constant A₁ depending only on q and for any γ > 0 we have, as N → ∞,

$$P\Bigl(\max_{Q\le N}\Bigl|\sum_{k=1}^{Q} x_k(s,t)\Bigr| \ge A_1 R\, \Delta^{1/32}\,(N\log\log N)^{1/2}\Bigr) \ll \exp\bigl(-16R\,\Delta^{-1/32}\log\log N\bigr) + R^{-8}N^{-2},$$

where the constant implied by ≪ depends only on q and γ. An exponential bound of this kind is a crucial ingredient of all discrepancy estimates of LIL type. The proof depends on a martingale approximation argument and can be modelled after the proof of [18, Proposition 4.2.1]. The details are, however, long and technical and will be given in [4].

To deduce Theorem 3 from Proposition 1, fix 1/2 < λ < 1 and 0 ≤ s < t ≤ 1. By the hypotheses of the theorem, we can choose γ > 0 such that

$$\Phi(\mathcal A; \delta, N) \ll \delta^{-\gamma/2}, \tag{25}$$

where the constant implied by ≪ depends only on A. For simplicity we set

$$\Phi(\delta) := \Phi(\mathcal A; \delta, [1/\delta]). \tag{26}$$
By Proposition 1 we have, for any sequence (p_k) ⊂ ℕ and R ≥ 1, 0 < ε ≤ 1/32 and t − s ≥ 2^{−3r/2}, as r → ∞,

$$P\Bigl(\max_{Q\le 2^r}\Bigl|\sum_{p_k\le Q} x_{p_k}(s,t)\Bigr| \ge A_1 R\, 2^{r/2}(t-s)^{\varepsilon}\bigl(\log\Phi(2^{-r}) + \log^{1/2} r\bigr)\Bigr)$$
$$\ll \begin{cases} \exp\bigl(-16R(t-s)^{-\varepsilon}\log\Phi(2^{-r})\,\log^{-1/2} r\bigr) + R^{-8}2^{-2r} & \text{if } \log\Phi(2^{-r}) > \log^{1/2} r, \\[2pt] \exp\bigl(-16R(t-s)^{-\varepsilon}\log^{1/2} r\bigr) + R^{-8}2^{-2r} & \text{if } \log\Phi(2^{-r}) \le \log^{1/2} r, \end{cases} \tag{27}$$

for some constant A₁. (In the case of the first line of (27) we apply Proposition 1 with R replaced by R log Φ(2^{−r})(log r)^{−1/2}.) Let

$$\delta := A_1 R\,(t-s)^{\varepsilon}\, 2^{-r/2} \tag{28}$$

and let B = {(p_k^{(1)}), …, (p_k^{(M)})} be a maximal set of sequences in A with pairwise distance > δ with respect to the Hamming distance d(·,·; 2^r). Then

$$M = \Phi(\mathcal A, \delta; 2^r) \le \Phi(\mathcal A; 2^{-\lambda r}, 2^r) = \Phi(2^{-\lambda r}),$$

since (t − s)^ε 2^{−r/2} ≥ 2^{−r(3ε/2+1/2)} ≥ 2^{−λr}, provided we choose ε > 0 so small that 3ε/2 + 1/2 < λ. Clearly, for any (q_k) ∈ A there is a (p_k) ∈ B with d((p_k),(q_k); 2^r) ≤ δ, which implies that for any Q ≤ 2^r the sums Σ_{p_k≤Q} x_{p_k}(s,t) and Σ_{q_k≤Q} x_{q_k}(s,t) differ at most by δ2^r = A₁R(t−s)^ε 2^{r/2}. Hence using (27) we get

$$P\Bigl(\max_{(q_k)\in\mathcal A}\,\max_{Q\le 2^r}\Bigl|\sum_{q_k\le Q} x_{q_k}(s,t)\Bigr| \ge 2A_1 R\,2^{r/2}(t-s)^{\varepsilon}\bigl(\log\Phi(2^{-r}) + \log^{1/2} r\bigr)\Bigr)$$
$$\ll \exp\bigl(-8R(t-s)^{-\varepsilon}\bigl(\log\Phi(2^{-r}) + \log^{1/2} r\bigr)\bigr) + R^{-8}\,\Phi(2^{-\lambda r})\,2^{-2r} \ll \exp\bigl(-4R(t-s)^{-\varepsilon}\log^{1/2} r\bigr) + R^{-8}2^{-3r/2},$$

by distinguishing the cases log Φ(2^{−r}) > log^{1/2} r and log Φ(2^{−r}) ≤ log^{1/2} r, and by using (25) in the estimate of the very last term. The proof of Theorem 3 can now be completed by a chaining argument similar to that in [18].

Note added in proof. With great sadness, we inform the reader that Walter Philipp passed away on July 19, 2006, at the age of 69, near Graz, Austria. — I. Berkes and R.F. Tichy

References

[1] R.C. Baker, Metric number theory and the large sieve, J. London Math. Soc. 24 (2) (1981) 34–40.
[2] I. Berkes, W. Philipp, The size of trigonometric and Walsh series and uniform distribution mod 1, J. London Math. Soc. 50 (2) (1994) 454–464.
[3] I. Berkes, W. Philipp, R.F. Tichy, Empirical processes in probabilistic number theory: the LIL for the discrepancy of (n_k ω) mod 1, Illinois J. Math. 50 (2006) 107–145.
[4] I. Berkes, W. Philipp, R.F. Tichy, Entropy conditions for subsequences of random variables with applications to empirical processes, Monatshefte Math., to appear.
[5] J.W.S. Cassels, Some metrical theorems of Diophantine approximation III, Proc. Cambridge Philos. Soc. 46 (1950) 219–225.
[6] M. Drmota, R.F. Tichy, Sequences, Discrepancies and Applications, Lecture Notes in Mathematics, vol. 1651, Springer, Berlin, 1997.
[7] R.M. Dudley, Central limit theorems for empirical measures, Ann. Probab. 6 (1978) 899–929; Correction: Ann. Probab. 7 (1979) 909–911.
[8] R.M. Dudley, A course on empirical processes, Lecture Notes in Mathematics, vol. 1097, Springer, Berlin, 1984, pp. 1–142.
[9] R.M. Dudley, W. Philipp, Invariance principles for sums of Banach space valued random elements and empirical processes, Z. Wahrscheinlichkeitstheorie verw. Geb. 62 (1983) 509–552.
[10] P. Erdős, J.F. Koksma, On the uniform distribution modulo 1 of sequences (f(n, ϑ)), Proc. Kon. Nederl. Akad. Wetensch. 52 (1949) 851–854.
[11] H. Kesten, The discrepancy of random sequences {kx}, Acta Arith. 10 (1964/65) 183–213.
[12] D.E. Knuth, The Art of Computer Programming, vol. 2, second ed., Addison-Wesley, Reading, MA, 1981.
[13] L. Kuipers, H. Niederreiter, Uniform Distribution of Sequences, Wiley, New York, 1974.
[14] C. Mauduit, A. Sárközy, On finite pseudorandom binary sequences I. Measure of pseudorandomness, the Legendre symbol, Acta Arith. 82 (1997) 365–377.
[15] C. Mauduit, A. Sárközy, On finite pseudorandom binary sequences V. On (nα) and (n²α) sequences, Monatshefte Math. 129 (2000) 197–216.
[16] C. Mauduit, A. Sárközy, On finite pseudorandom binary sequences VI. On (n^k α) sequences, Monatshefte Math. 130 (2000) 281–298.
[17] C.J. Mozzochi, On the Pointwise Convergence of Fourier Series, Lecture Notes in Mathematics, vol. 199, Springer, Berlin, 1971.
[18] W. Philipp, A functional law of the iterated logarithm for empirical distribution functions of weakly dependent random variables, Ann. Probab. 5 (1977) 319–350.
[19] W. Philipp, R.F. Tichy, Metric theorems for distribution measures of pseudorandom sequences, Monatshefte Math. 135 (2002) 321–326.
[20] D. Pollard, Convergence of Stochastic Processes, Springer, Berlin, 1984.
[21] K.F. Roth, Remark concerning integer sequences, Acta Arith. 9 (1964) 257–260.
[22] G.R. Shorack, J. Wellner, Empirical Processes with Applications to Statistics, Wiley, New York, 1986.
[23] H. Weyl, Über die Gleichverteilung von Zahlen mod. Eins, Math. Ann. 77 (1916) 313–352.
Journal of Complexity 23 (2007) 528 – 552 www.elsevier.com/locate/jco
Quadrature in Besov spaces on the Euclidean sphere

K. Hesse^{a,1}, H.N. Mhaskar^{b,*,2}, I.H. Sloan^{a,3}

a School of Mathematics and Statistics, University of New South Wales, Sydney, NSW 2052, Australia
b Department of Mathematics, California State University, Los Angeles, California 90032, USA

Received 18 July 2006; accepted 31 October 2006
Available online 29 December 2006
Dedicated to Professor Dr. Henryk Woźniakowski on the occasion of his sixtieth birthday
Abstract

Let q ≥ 1 be an integer, S^q denote the unit sphere embedded in the Euclidean space R^{q+1}, and μ_q be its Lebesgue surface measure. We establish upper and lower bounds for

$$\sup_{f\in B_{p,\rho,\gamma}}\Bigl|\int_{S^q} f\,d\mu_q - \sum_{k=1}^{M} w_k f(x_k)\Bigr|, \qquad x_k \in S^q,\ w_k \in \mathbb R,\ k = 1,\dots,M,$$

where B_{p,ρ,γ} is the unit ball of a suitable Besov space on the sphere. The upper bounds are obtained for choices of x_k and w_k that admit exact quadrature for spherical polynomials of a given degree, and satisfy a certain continuity condition; the lower bounds are obtained for the infimum of the above quantity over all choices of x_k and w_k. Since the upper and lower bounds agree with respect to order, the complexity of quadrature in Besov spaces on the sphere is thereby established.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Besov spaces on the sphere; Numerical integration; Polynomial frames; Quadrature formulas on the sphere; Sphere
∗ Corresponding author.
E-mail address:
[email protected] (H.N. Mhaskar). 1 The support of the Australian Research Council is gratefully acknowledged. Part of the work was carried out while
the author was a guest of the Center for Constructive Approximation at Vanderbilt University. 2 The research of this author was supported, in part, by Grant W911NF-04-1-0339 from the U.S. Army Research Office, Grant DMS-0204704, and its continuation Grant DMS-0605209 from the National Science Foundation. 3 The support of the Australian Research Council is gratefully acknowledged.
0885-064X/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2006.10.004
1. Introduction

Let q ≥ 1 be an integer, S^q be the unit sphere embedded in the Euclidean space R^{q+1}, and μ_q be its Lebesgue surface measure, so that

$$\omega_q := \int_{S^q} d\mu_q = \frac{2\pi^{(q+1)/2}}{\Gamma((q+1)/2)}. \tag{1.1}$$
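Formula (1.1) is easy to evaluate numerically; the sketch below checks it against the familiar values for the circle and the 2-sphere:

```python
import math

def sphere_surface_area(q):
    # omega_q = 2 * pi^((q+1)/2) / Gamma((q+1)/2): the Lebesgue surface
    # measure of the unit q-sphere in R^(q+1), as in (1.1).
    return 2 * math.pi ** ((q + 1) / 2) / math.gamma((q + 1) / 2)

print(sphere_surface_area(1))  # 2*pi: circumference of the unit circle
print(sphere_surface_area(2))  # 4*pi: area of the unit sphere in R^3
```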
For integer N ≥ 0, let Π_N^q denote the class of all spherical polynomials of degree at most N; i.e., the class of restrictions to S^q of polynomials in q + 1 variables with total degree at most N. Many applications in geophysics and partial differential equations require an approximate evaluation of an integral of the form ∫_{S^q} f dμ_q. Therefore, several mathematicians have recently investigated quadrature formulas for such integrals that are required to be exact for f ∈ Π_N^q and high integer values of N. The examples include the product Gaussian formulas in the book [27, Chapter 2] of Stroud, Driscoll–Healy formulas [6, Theorem 3], and some newer formulas by Brown et al. [2, Theorem 4] and Potts et al. [22]. Numerically stable interpolatory quadrature formulas based on points that maximize a certain determinant have been studied by Sloan and Womersley [25, Sections 4 and 5]. Jetter et al. [12] have established the existence of signed quadrature rules based on "scattered data"; i.e., in the case when no assumptions are made on the location of the nodes. The existence of positive quadrature formulas based on scattered data is proved in [18, Section 4, Theorem 4.1]. Some ideas on the computation of these formulas are also given in [18]. The number of data points for which quadrature formulas, exact for Π_N^q, can be obtained is proportional to the dimension of Π_N^q.

For a quadrature formula of the form Qf = Σ_{k=1}^M w_k f(x_k), where w_k ∈ R and x_k ∈ S^q, that is exact for polynomials in Π_N^q, an obvious error bound can be obtained by observing that for any P ∈ Π_N^q,

$$\Bigl|\int_{S^q} f\,d\mu_q - Qf\Bigr| = \Bigl|\int_{S^q}(f-P)\,d\mu_q - Q(f-P)\Bigr| \le \Bigl(\omega_q + \sum_{k=1}^{M}|w_k|\Bigr)\max_{x\in S^q}|f(x)-P(x)|.$$
Therefore, if f is a continuous function on S^q, and

$$E_{N,\infty}(f) := \inf\Bigl\{\max_{x\in S^q}|f(x)-P(x)| : P \in \Pi_N^q\Bigr\},$$

we have

$$\Bigl|\int_{S^q} f\,d\mu_q - Qf\Bigr| \le \Bigl(\omega_q + \sum_{k=1}^{M}|w_k|\Bigr)\,E_{N,\infty}(f). \tag{1.2}$$

The rate of convergence of E_{N,∞}(f) to zero as N → ∞ is traditionally expressed in terms of moduli of smoothness of derivatives of f (see for example Ragozin [23, Theorem 3.4]). A recent result [8, Proposition 4.3(b)], directly in terms of derivatives, is that if γ > 0 and f has a continuous fractional derivative ∂^γ f of order γ (see (2.7) for a precise definition), then
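The mechanism behind the bound (1.2) — a rule that is exact on a polynomial space inherits the best-approximation rate of that space — can be seen in the simplest case q = 1, where the equal-weight rule at M uniformly spaced angles integrates trigonometric polynomials of degree below M exactly. The test integrand exp(cos t) is an illustrative choice, not from the paper:

```python
import math

def circle_quadrature(f, M):
    # Equal-weight rule at M uniformly spaced angles on the circle S^1:
    # exact for trigonometric polynomials of degree < M.
    return (2 * math.pi / M) * sum(f(2 * math.pi * k / M) for k in range(M))

# Reference value: the integral of exp(cos t) over [0, 2*pi] is 2*pi*I_0(1),
# with the modified Bessel value I_0(1) evaluated here by its power series.
i0 = sum(1 / (4 ** k * math.factorial(k) ** 2) for k in range(25))
exact = 2 * math.pi * i0

approx = circle_quadrature(lambda t: math.exp(math.cos(t)), 16)
print(abs(approx - exact))  # spectrally small for a smooth periodic integrand
```

Because exp(cos t) is analytic and periodic, its best trigonometric approximations converge geometrically, and the 16-point rule is already accurate to machine precision.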
E_{N,∞}(f) = O(N^{−γ}). The same estimate on the quadrature error was proved in the case q = 2 for quadrature formulas satisfying a continuity condition, by Hesse and Sloan [10, Theorem 5], under the weaker assumption that ∂^γ f is square-integrable on S^q. We note that for all the quadrature formulas mentioned above, this rate is O(M^{−γ/q}) in terms of the number of nodes M if M = O(N^q). For q = 2, Hesse and Sloan have proved in [11, Theorem 1] that this estimate is the best possible for any quadrature formula using M nodes, without any further assumptions on the nodes, the weights, or on polynomial exactness, by proving a lower bound for the worst-case quadrature error of the same order. These results have been extended to the case of higher values of q in [1,9] and further to the case of arbitrary L^p spaces and weighted integrals in [16]. The condition in [10,11] that ∂^γ f be square-integrable can be reformulated to state that f is in a certain Besov space (see Proposition 2).

In this paper, we study the error of quadrature formulas for arbitrary L^p-Besov spaces on the sphere, 1 ≤ p ≤ ∞. We obtain both upper and lower bounds for this error. The upper bounds are given in terms of the degree of polynomials for which the quadrature formulas are exact. The lower bounds are given in terms of certain geometric quantities associated with the support of the quadrature measures. Since the upper and lower bounds agree with respect to orders of M, they yield a result on the complexity of quadrature in Besov spaces on the sphere.

In Section 2, we develop the notations necessary for stating and proving the results in the paper, as well as review a few known facts. The main results are stated in Section 3, and the proofs are given in Section 4. Essential tools in our proofs are the concept of polynomial frames and their relationship with the Besov spaces, using material that is scattered in various other papers [14–16,19,8].
The Appendix contains a sketch of the proof of Theorem 3, and of the estimate (4.3) used in the proof of Lemma 8.
2. Notations and preliminaries

2.1. General notations

If 1 ≤ p ≤ ∞ and f : S^q → R is Lebesgue measurable, we write

$$\|f\|_p := \begin{cases} \bigl\{\int_{S^q} |f(x)|^p\,d\mu_q(x)\bigr\}^{1/p} & \text{if } 1 \le p < \infty, \\ \operatorname{ess\,sup}_{x\in S^q}|f(x)| & \text{if } p = \infty. \end{cases}$$

The space of all Lebesgue measurable functions on S^q such that ‖f‖_p < ∞ will be denoted by L^p, with the usual convention that two functions are considered equal as elements of this space if they are equal almost everywhere. The symbol X^p will denote L^p if 1 ≤ p < ∞, and the space of all continuous functions on S^q if p = ∞ (equipped with the norm of L^∞). Strictly speaking, the space L^p consists of equivalence classes. If f ∈ L^p is almost everywhere equal to a continuous function, we will assume that this continuous function is chosen as the representer of its class.

For a fixed integer ℓ ≥ 0, the restriction to S^q of a homogeneous harmonic polynomial of exact degree ℓ is called a spherical harmonic of degree ℓ. Most of the following information is based on [20; 26, Section IV.2; 7, Chapter XI], although we use a different notation. The class of all spherical harmonics of degree ℓ will be denoted by H_ℓ^q. The spaces H_ℓ^q are mutually orthogonal relative to the inner product of L². For any integer N ≥ 0, we have Π_N^q = ⊕_{ℓ=0}^{N} H_ℓ^q. The dimension of H_ℓ^q
is given by

$$d_\ell^q := \dim H_\ell^q = \begin{cases} \dfrac{2\ell+q-1}{\ell+q-1}\dbinom{\ell+q-1}{q-1} & \text{if } \ell \ge 1, \\[4pt] 1 & \text{if } \ell = 0, \end{cases} \tag{2.1}$$
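Formula (2.1) can be evaluated directly; the sketch below also checks the identity dim Π_N^q = Σ_{ℓ≤N} d_ℓ^q = d_N^{q+1} stated in the text:

```python
from math import comb

def dim_harmonics(q, ell):
    # d_ell^q = dim H_ell^q from (2.1); the division is exact because
    # the result is the dimension of a vector space.
    if ell == 0:
        return 1
    return (2 * ell + q - 1) * comb(ell + q - 1, q - 1) // (ell + q - 1)

# On S^2 the familiar dimension count 2*ell + 1 is recovered:
print([dim_harmonics(2, ell) for ell in range(5)])  # [1, 3, 5, 7, 9]

# dim Pi_N^q = sum_{ell <= N} d_ell^q equals d_N^{q+1}:
q, N = 3, 6
assert sum(dim_harmonics(q, ell) for ell in range(N + 1)) == dim_harmonics(q + 1, N)
```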
and that of Π_N^q is Σ_{ℓ=0}^{N} d_ℓ^q = d_N^{q+1}. Furthermore, L² is the L²-closure of ⊕_{ℓ=0}^{∞} H_ℓ^q. Hence, if we choose an orthonormal basis {Y_{ℓ,k} : k = 1, …, d_ℓ^q} for each H_ℓ^q, then the set {Y_{ℓ,k} : ℓ = 0, 1, … and k = 1, …, d_ℓ^q} is a complete orthonormal basis for L². One has the well-known addition formula [20; 7, Chapter XI, Theorem 4]:

$$\sum_{k=1}^{d_\ell^q} Y_{\ell,k}(x)\,Y_{\ell,k}(y) = \frac{d_\ell^q}{\omega_q}\,P_\ell(q+1;\,x\cdot y), \qquad \ell = 0,1,\dots,$$

where P_ℓ(q+1; ◦) is the degree-ℓ Legendre polynomial in q + 1 dimensions. The Legendre polynomials are normalized so that P_ℓ(q+1; 1) = 1, and satisfy the orthogonality relations [20, Lemma 10]

$$\int_{-1}^{1} P_\ell(q+1;t)\,P_k(q+1;t)\,(1-t^2)^{q/2-1}\,dt = \frac{\omega_q}{\omega_{q-1}\,d_\ell^q}\,\delta_{\ell,k}.$$
They are related to the ultraspherical (Gegenbauer) polynomials P_ℓ^{(λ)} (cf. [28, Section 4.7; 20, p. 33]), and the Jacobi polynomials P_ℓ^{(α,β)} [28, Chapter IV] with α = β = q/2 − 1, via

$$\binom{\ell+q-2}{\ell}^{-1} P_\ell^{((q-1)/2)}(t) = P_\ell(q+1;t) \quad (q \ge 2), \qquad \binom{\ell+\tfrac q2-1}{\ell}^{-1} P_\ell^{(\tfrac q2-1,\,\tfrac q2-1)}(t) = P_\ell(q+1;t). \tag{2.2}$$

When q = 1, the Legendre polynomials P_ℓ(2; ◦) coincide with the Chebyshev polynomials T_ℓ; the ultraspherical polynomials are then given by P_ℓ^{(0)}(t) = (2/ℓ)T_ℓ(t) if ℓ ≥ 1, and P_0^{(0)}(t) = 1.

Let S be the space of all infinitely often differentiable functions on S^q, endowed with the locally convex topology induced by the supremum norms of all the derivatives of such functions, and let S* be the dual of this space, that is, the space of distributions on S^q. For x* ∈ S*, we define

$$\widehat{x^*}(\ell,k) := x^*(Y_{\ell,k}), \qquad k = 1,\dots,d_\ell^q,\ \ell = 0,1,\dots. \tag{2.3}$$

A most common example is when x* is defined by g ↦ ∫_{S^q} g(ξ)f(ξ) dμ_q(ξ) for some integrable function f on S^q. In this case, we identify x* with f and use the notation f̂(ℓ,k) to denote the corresponding x̂*(ℓ,k).
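The normalized Legendre polynomials P_ℓ(q+1; ·) can be evaluated with a three-term recurrence. The recurrence used below, (ℓ+q−1)P_{ℓ+1} = (2ℓ+q−1)tP_ℓ − ℓP_{ℓ−1}, is a standard consequence of the Gegenbauer recurrence under the normalization P_ℓ(q+1; 1) = 1, stated here as an assumption rather than quoted from the paper; for q = 2 it reduces to Bonnet's recurrence for the classical Legendre polynomials:

```python
def legendre(q, ell, t):
    # P_ell(q+1; t), normalized so that P_ell(q+1; 1) = 1, computed by the
    # (assumed) recurrence (l+q-1) P_{l+1} = (2l+q-1) t P_l - l P_{l-1}.
    if ell == 0:
        return 1.0
    p_prev, p = 1.0, t  # P_0 = 1 and P_1(q+1; t) = t
    for l in range(1, ell):
        p_prev, p = p, ((2 * l + q - 1) * t * p - l * p_prev) / (l + q - 1)
    return p

print(legendre(2, 2, 0.5))  # classical Legendre (3t^2 - 1)/2 at t = 0.5: -0.125
print(legendre(5, 7, 1.0))  # 1.0, confirming the normalization at t = 1
```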
2.2. Besov spaces

Our definition of Besov spaces is motivated by the equivalence theorem for the characterization of Besov spaces on the sphere [13, Theorem 3.1]. For any integer N ≥ 0, 1 ≤ p ≤ ∞ and f ∈ L^p, we write

$$E_{N,p}(f) := \min_{P\in\Pi_N^q}\|f-P\|_p.$$

We will define the Besov spaces in terms of the sequence {E_{2^j,p}(f)}_{j∈ℕ₀}. Let 0 < ρ ≤ ∞, γ > 0, and let a = {a_j}_{j∈ℕ₀} be a sequence of real numbers. We define

$$\|a\|_{\gamma,\rho} := \begin{cases} \Bigl(\sum_{j=0}^{\infty}\bigl(2^{j\gamma}|a_j|\bigr)^{\rho}\Bigr)^{1/\rho} & \text{if } 0 < \rho < \infty, \\[4pt] \sup_{j\in\mathbb N_0} 2^{j\gamma}|a_j| & \text{if } \rho = \infty. \end{cases}$$

The space of sequences a for which ‖a‖_{γ,ρ} < ∞ will be denoted by b_{γ,ρ}. If 1 ≤ p ≤ ∞, the Besov space B_{p,ρ,γ} consists of all functions f ∈ L^p for which {E_{2^j,p}(f)}_{j∈ℕ₀} ∈ b_{γ,ρ}. The expression

$$\|f\|_{p,\rho,\gamma} := \begin{cases} \|f\|_p + \Bigl(\sum_{j=0}^{\infty}\bigl(2^{j\gamma}E_{2^j,p}(f)\bigr)^{\rho}\Bigr)^{1/\rho} & \text{if } 0 < \rho < \infty, \\[4pt] \|f\|_p + \sup_{j\in\mathbb N_0} 2^{j\gamma}E_{2^j,p}(f) & \text{if } \rho = \infty, \end{cases} \tag{2.4}$$

defines a quasi-norm on the space, and the unit ball B_{p,ρ,γ} := {f : ‖f‖_{p,ρ,γ} ≤ 1} is a compact subset of L^p (cf. [15, Theorem 2.2]).

In the remainder of this paper, we adopt the following convention regarding constants. The letters c, c₁, … will denote positive constants depending only on such fixed parameters in the discussion as the dimension q, the different norms involved in the formula, and any other explicitly mentioned quantities. Their value will be different at different occurrences, even within the same formula. The expression A ∼ B will mean cA ≤ B ≤ c₁A.

For the convenience of the reader, we summarize certain facts about the sequence spaces b_{γ,ρ} in the following lemma.

Lemma 1. Let 0 < ρ ≤ ∞, 0 < β < γ, and let {a_j}_{j∈ℕ₀} be a sequence of real numbers.
(a) We have

$$\bigl\|\{a_j\}_{j\in\mathbb N_0}\bigr\|_{\gamma,\rho} = \bigl\|\{2^{j\beta}a_j\}_{j\in\mathbb N_0}\bigr\|_{\gamma-\beta,\rho}. \tag{2.5}$$

(b) (Discrete Hardy inequality) We have

$$\Bigl\|\Bigl\{\sum_{j=n}^{\infty}|a_j|\Bigr\}_{n\in\mathbb N_0}\Bigr\|_{\gamma,\rho} \le c\,\bigl\|\{a_j\}_{j\in\mathbb N_0}\bigr\|_{\gamma,\rho}. \tag{2.6}$$
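The sequence quasi-norm defining b_{γ,ρ} (written here with smoothness γ and summability ρ) is straightforward to compute for truncated sequences; the decaying sequence below is an illustrative example:

```python
import math

def seq_quasi_norm(a, gamma, rho):
    # ||a||_{gamma,rho}: (sum_j (2^(j*gamma)|a_j|)^rho)^(1/rho) for finite
    # rho, and sup_j 2^(j*gamma)|a_j| for rho = infinity.
    if math.isinf(rho):
        return max(2 ** (j * gamma) * abs(x) for j, x in enumerate(a))
    return sum((2 ** (j * gamma) * abs(x)) ** rho for j, x in enumerate(a)) ** (1 / rho)

# a_j = 2^(-j) lies in b_{gamma,rho} for every gamma < 1; here gamma = 1/2:
a = [2.0 ** (-j) for j in range(40)]
print(seq_quasi_norm(a, 0.5, 2.0))       # close to sqrt(2), the series limit
print(seq_quasi_norm(a, 0.5, math.inf))  # 1.0, attained at j = 0
```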
Proof. Part (a) is clear from the definition. Part (b) is proved, for example, in [5, Lemma 3.4, p. 27].

Next, we illustrate a connection between Besov spaces and the smoothness conditions in [10,11]. For γ > 0 and sufficiently smooth f (or, more generally, for a distribution f associated with the coefficients f̂(ℓ,k)) we may define the fractional differentiation pseudo-differential operator ∂^γ by

$$\widehat{\partial^\gamma f}(\ell,k) := (\ell+1)^{\gamma}\,\hat f(\ell,k), \qquad \ell = 0,1,\dots,\ k = 1,\dots,d_\ell^q. \tag{2.7}$$
Proposition 2. Let γ > 0 and f ∈ L². Then f ∈ B_{2,2,γ} if and only if ∂^γ f ∈ L², and if this holds then ‖f‖₂ + ‖∂^γ f‖₂ ∼ ‖f‖_{2,2,γ}.

Proof. In view of the Parseval identity, we observe that for f ∈ L²,

$$[E_{m,2}(f)]^2 = \sum_{\ell=m+1}^{\infty}\sum_{k=1}^{d_\ell^q}|\hat f(\ell,k)|^2, \qquad m = 0,1,2,\dots.$$

Using the Parseval identity again,

$$\begin{aligned}
\|\partial^\gamma f\|_2^2 &= \sum_{\ell=0}^{\infty}\sum_{k=1}^{d_\ell^q}(\ell+1)^{2\gamma}|\hat f(\ell,k)|^2 \\
&= \sum_{\ell=0}^{1}\sum_{k=1}^{d_\ell^q}(\ell+1)^{2\gamma}|\hat f(\ell,k)|^2 + \sum_{j=0}^{\infty}\;\sum_{\ell=2^j+1}^{2^{j+1}}\sum_{k=1}^{d_\ell^q}(\ell+1)^{2\gamma}|\hat f(\ell,k)|^2 \\
&\sim \sum_{\ell=0}^{1}\sum_{k=1}^{d_\ell^q}|\hat f(\ell,k)|^2 + \sum_{j=0}^{\infty}2^{2j\gamma}\sum_{\ell=2^j+1}^{2^{j+1}}\sum_{k=1}^{d_\ell^q}|\hat f(\ell,k)|^2 \\
&= \sum_{\ell=0}^{1}\sum_{k=1}^{d_\ell^q}|\hat f(\ell,k)|^2 + \sum_{j=0}^{\infty}2^{2j\gamma}\bigl([E_{2^j,2}(f)]^2 - [E_{2^{j+1},2}(f)]^2\bigr) \\
&= \sum_{\ell=0}^{1}\sum_{k=1}^{d_\ell^q}|\hat f(\ell,k)|^2 + \sum_{j=0}^{\infty}2^{2j\gamma}[E_{2^j,2}(f)]^2 - 2^{-2\gamma}\sum_{j=1}^{\infty}2^{2j\gamma}[E_{2^j,2}(f)]^2 \\
&\sim \sum_{\ell=0}^{1}\sum_{k=1}^{d_\ell^q}|\hat f(\ell,k)|^2 + \sum_{j=0}^{\infty}2^{2j\gamma}[E_{2^j,2}(f)]^2.
\end{aligned}$$
The proposition now follows from the definitions.
Finally, we recall an alternative characterization of the Besov spaces using polynomial operators [15,19,4]. In the sequel, h : [0, ∞) → [0, ∞) will denote a fixed function with the following properties: (i) h is infinitely differentiable, (ii) h is nonincreasing, (iii) h(x) = 1 if x ≤ 1/2, and (iv) h(x) = 0 if x ≥ 1. All the generic constants c, c₁, … may depend upon the choice of h. The univariate polynomials Φ_j of degree at most 2^j − 1 are defined by

$$\Phi_j(h,t) := \sum_{\ell=0}^{\infty} h\Bigl(\frac{\ell}{2^j}\Bigr)\,\frac{d_\ell^q}{\omega_q}\,P_\ell(q+1;t), \qquad t \in \mathbb R,\ j = 0,1,\dots. \tag{2.8}$$
We define Φ_j(h,t) = 0 if j < 0. We will need the following operators, defined for f ∈ L¹ and integer j: the summability operator σ_j is defined by

$$\sigma_j(f)(x) := \int_{S^q} f(y)\,\Phi_j(h, x\cdot y)\,d\mu_q(y) = \sum_{\ell=0}^{\infty}\sum_{k=1}^{d_\ell^q} h\Bigl(\frac{\ell}{2^j}\Bigr)\hat f(\ell,k)\,Y_{\ell,k}(x), \qquad x \in S^q, \tag{2.9}$$

and the frame operator τ_j is defined by

$$\tau_j(f) := \sigma_j(f) - \sigma_{j-1}(f) = \sum_{\ell=0}^{\infty}\sum_{k=1}^{d_\ell^q}\Bigl(h\Bigl(\frac{\ell}{2^j}\Bigr) - h\Bigl(\frac{\ell}{2^{j-1}}\Bigr)\Bigr)\hat f(\ell,k)\,Y_{\ell,k}. \tag{2.10}$$
Note that σ_j(f) and τ_j(f) are polynomials of degree 2^j − 1, and that τ_j(f) is L²-orthogonal to all polynomials of degree 2^{j−2}. The proof of the following theorem is sketched in the Appendix.

Theorem 3. Let 1 ≤ p ≤ ∞ and let f ∈ X^p. Then the decomposition

$$f = \sum_{j=0}^{\infty}\tau_j(f) \tag{2.11}$$

holds in the sense of convergence in X^p. Furthermore, for 0 < ρ ≤ ∞ and γ > 0,

$$c_1\|f\|_{p,\rho,\gamma} \le \|f\|_p + \bigl\|\{\|\tau_j(f)\|_p\}_{j\in\mathbb N_0}\bigr\|_{\gamma,\rho} \le c_2\|f\|_{p,\rho,\gamma}. \tag{2.12}$$

In particular, f ∈ B_{p,ρ,γ} if and only if {‖τ_j(f)‖_p}_{j∈ℕ₀} ∈ b_{γ,ρ}. The following corollary of the above theorem will be used tacitly throughout the paper.
Corollary 4. Let 1 ≤ p ≤ ∞, γ > q/p and 0 < ρ ≤ ∞. If f ∈ B_{p,ρ,γ}, then f is almost everywhere equal to a continuous function on S^q, the series in (2.11) converges uniformly to this continuous function, and ‖f‖_∞ ≤ c‖f‖_{p,ρ,γ}.

Proof. For p = ∞ nothing needs to be shown, because X^∞ convergence of (2.11) implies uniform convergence, and (2.4) directly yields ‖f‖_∞ ≤ ‖f‖_{∞,ρ,γ}. For 1 ≤ p < ∞, the Nikolskii inequality gives

$$\|P\|_\infty \le c\,N^{q/p}\,\|P\|_p, \qquad P \in \Pi_N^q$$

(for a proof see [17, Proposition 2.1]). Since τ_j(f) ∈ Π_{2^j}^q, we have ‖τ_j(f)‖_∞ ≤ c 2^{jq/p}‖τ_j(f)‖_p. Because f ∈ B_{p,ρ,γ}, (2.12) in Theorem 3 implies, in particular, that ‖τ_j(f)‖_p ≤ c 2^{−jγ}‖f‖_{p,ρ,γ}. Therefore, for γ > q/p,

$$\sum_{j=0}^{\infty}\|\tau_j(f)\|_\infty \le c\,\|f\|_{p,\rho,\gamma}\sum_{j=0}^{\infty}2^{-j(\gamma-q/p)} = c_1\|f\|_{p,\rho,\gamma}.$$

Thus Σ_{j=0}^∞ τ_j(f) converges uniformly to a continuous function on S^q. In view of the X^p convergence of (2.11), this continuous function is equal to f almost everywhere on S^q. Moreover,

$$\|f\|_\infty \le \sum_{j=0}^{\infty}\|\tau_j(f)\|_\infty \le c_1\|f\|_{p,\rho,\gamma}.$$
3. Main results

Our main purpose in this section is to describe the upper and lower bounds on quadrature formulas on the sphere. Although we are mainly interested at this time in quadrature formulas of the form Σ_{x∈C} w_x f(x), where C is a finite subset of S^q, we prefer to state our theorems for more general approximations of the integral ∫ f dμ_q, for example, allowing approximations which involve averages of f on finitely many caps rather than point evaluations. Towards this goal, we observe that for any point x ∈ S^q one has the Dirac-delta measure δ_x, with the property that for any continuous f : S^q → R, we have

$$\int_{S^q} f\,d\delta_x = f(x).$$

So, one may write

$$\sum_{x\in C} w_x f(x) = \int_{S^q} f\,d\nu,$$

where the measure ν is defined by ν = Σ_{x∈C} w_x δ_x. This notation has the advantage that we need not specify the points x, the weights w_x, or the number of points involved in the sum. We recall that the total variation measure of any signed measure ν is defined for Borel subsets U ⊂ S^q by

$$|\nu|(\mathcal U) := \sup \sum_{i=1}^{\infty}|\nu(\mathcal U_i)|,$$

where the supremum is taken over all countable, ν-measurable partitions {U_i} of U. In the case when ν = Σ_{x∈C} w_x δ_x, one can easily deduce that |ν|(S^q) = Σ_{x∈C}|w_x|. A (closed) spherical cap with center y and (angular) radius α is defined by

$$S_\alpha^q(y) := \bigl\{x \in S^q : \operatorname{dist}(x,y) \le \alpha\bigr\},$$

where dist(x, y) is the geodesic distance on S^q,

$$\operatorname{dist}(x,y) := \cos^{-1}(x\cdot y), \qquad x, y \in S^q.$$

(That the geodesic distance is a metric is very well known for q = 2. That the triangle inequality dist(x, y) ≤ dist(x, z) + dist(y, z) holds for x, y, z ∈ S^q and general q follows from the fact that the intersection of S^q with span(x, y, z) is isomorphic to S², or to S¹ or S⁰, for which the result is already known.) If r > 0, A ≥ 1, and ν is a (possibly signed) measure, we will say that ν is (A, r)-continuous if for every spherical cap C we have

$$|\nu|(C) \le A\bigl(\mu_q(C) + r^q\bigr). \tag{3.1}$$
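The geodesic distance and spherical caps just defined translate directly into code; the clipping of the dot product below is a numerical safeguard, not part of the definition:

```python
import math

def geodesic_distance(x, y):
    # dist(x, y) = arccos(x . y) for unit vectors on S^q; the dot product
    # is clipped to [-1, 1] to guard against floating-point rounding.
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))

def in_cap(x, center, alpha):
    # Membership in the closed spherical cap S_alpha^q(center).
    return geodesic_distance(x, center) <= alpha

north, east = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
print(geodesic_distance(north, east))    # pi/2
print(in_cap(east, north, math.pi / 2))  # True: the cap is closed
print(in_cap(east, north, math.pi / 4))  # False
```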
Note that μ_q is (1, r)-continuous for every r > 0, and so is any measure of the form ν(B) = ∫_B f dμ_q for fixed f ∈ L^∞, ‖f‖_∞ ≤ 1. In general, ν is (A, r)-continuous if and only if |ν|/A is (1, r)-continuous. In view of Lemma 9, (3.1) is equivalent to the statement that for some constant A₁ > 0, |ν|(C) ≤ A₁ μ_q(C) for all caps C of radius at least r. In particular, for fixed A > 0 and arbitrary r with 0 < r ≤ 1, every (A, r)-continuous measure ν satisfies

$$|\nu|(S^q) \le c, \tag{3.2}$$

with c independent of ν (i.e., c may depend on A, but not on r ∈ (0, 1]). We now formulate our upper bound on the quadrature error as follows.

Theorem 5. Let A ≥ 1, 1 ≤ p ≤ ∞, 0 < ρ ≤ ∞, and γ > q/p. For any sequence {ν_N}_{N∈ℕ₀} of (possibly signed) (A, 2^{−N})-continuous measures ν_N on S^q that satisfy

$$\int_{S^q} P\,d\nu_N = \int_{S^q} P\,d\mu_q, \qquad P \in \Pi_{2^N}^q,\ N \in \mathbb N_0, \tag{3.3}$$

the estimate

$$\Bigl\|\Bigl\{\int_{S^q} f\,d\mu_q - \int_{S^q} f\,d\nu_N\Bigr\}_{N\in\mathbb N_0}\Bigr\|_{\gamma,\rho} \le c\,\|f\|_{p,\rho,\gamma}, \qquad f \in B_{p,\rho,\gamma}, \tag{3.4}$$

holds.
We observe that most of the constructions mentioned in the introduction yield positive measures ν_N, supported on O(2^{Nq}) points of S^q, such that (3.3) is satisfied. Reimer [24, Lemma 1] has proved that all such measures satisfy the continuity condition required in the theorem. A simpler proof, stated for arbitrary positive measures satisfying (3.3), is given in [16, Theorem 3.3]. The estimate (3.4) clearly implies that

$$\Bigl|\int_{S^q} f\,d\mu_q - \int_{S^q} f\,d\nu_N\Bigr| \le c\,2^{-N\gamma}\,\|f\|_{p,\rho,\gamma}. \tag{3.5}$$

In view of Proposition 2, this estimate, with p = ρ = 2, implies that of Hesse and Sloan in [10] for S² and also the extension to S^q, q ≥ 2, in [1].

We now turn our attention to the lower bounds. These will be obtained by constructing a "bad function" for each quadrature formula such that the quadrature error for this function is estimated from below by the right-hand side of (3.5), assuming M = O(2^{Nq}). Let Q be any quadrature formula based on M points. In [11, Theorem 1] for S² and in [9, Theorem 1] for S^q, with arbitrary q ≥ 2, it was shown that there exists a constant c, independent of M and of the quadrature points and weights, such that there exist 2M disjoint caps on the sphere, each of radius c/M^{1/q}. Necessarily, there are at least M of these caps that do not contain any of the quadrature points. Then the "bad function" was constructed as a sum of judiciously chosen functions, each supported on one of the caps not containing a quadrature point. In the present paper we prefer a variant of the argument, in which M spherical caps containing quadrature points are supposed given, and the need is to construct M new caps which are disjoint from each other and from the original caps. The advantage of this approach is that it can be extended to a somewhat more general setting.
For a compact set K ⊂ S^q and η > 0, let the η-neighborhood of K be defined by

$$\mathcal N_\eta(K) := \bigl\{x \in S^q : \operatorname{dist}(x, K) \le \eta\bigr\}.$$

If K consists of M points, then N_η(K) is the union of M spherical caps of angular radius η. In general, if K is the support of any signed measure, and if for some fixed β ∈ (0, 1) and sufficiently small η > 0 we have

$$\mu_q(\mathcal N_\eta(K)) \le (1-\beta)\,\omega_q, \tag{3.6}$$

then we will show that there exist at least c η^{−q} mutually disjoint caps of radius η/2 which are also disjoint from K. (The constant c may depend on β.) Our lower bound will be stated in terms of η, allowing us to choose the largest η for which (3.6) is satisfied. In the case in which K is a well-distributed configuration of M points, the largest such η satisfies η ∼ M^{−1/q}.

Theorem 6. Let 1 ≤ p ≤ ∞, 0 < ρ ≤ ∞, γ > q/p, let ν be any signed measure supported on a compact subset K ⊂ S^q, and let β ∈ (0, 1) and η > 0 be such that

$$\mu_q(\mathcal N_\eta(K)) \le (1-\beta)\,\omega_q.$$

Then there exists f* ≢ 0, f* ∈ B_{p,ρ,γ}, such that

$$\Bigl|\int_{S^q} f^*\,d\mu_q - \int_{S^q} f^*\,d\nu\Bigr| \ge c(\beta)\,\eta^{\gamma}\,\|f^*\|_{p,\rho,\gamma}, \tag{3.7}$$

where c(β) is a positive constant depending only on β, q, p, ρ, and γ, but not on ν, f*, η, or K. In particular, if for an integer M ≥ 1, ν is a signed measure supported on at most M points, then

$$\sup_{f\in B_{p,\rho,\gamma}}\Bigl|\int_{S^q} f\,d\mu_q - \int_{S^q} f\,d\nu\Bigr| \ge c_1\,M^{-\gamma/q}, \tag{3.8}$$

with a positive constant c₁ depending on q, p, ρ, and γ, but not on ν.

We conclude this section with an explicit result for the worst-case complexity of quadrature based on finitely many points. This exploits the agreement with respect to order between the upper bound stated in (3.5) with M ∼ 2^{Nq} and the lower bound (3.8). If M ≥ 1 is an integer, w = (w₁, …, w_M), C = {x₁, …, x_M} ⊂ S^q, we write, for γ > q/p,

$$\operatorname{error}_{q,p,\rho,\gamma,M}(w, C) := \sup_{\|f\|_{p,\rho,\gamma}\le 1}\Bigl|\int_{S^q} f\,d\mu_q - \sum_{k=1}^{M} w_k f(x_k)\Bigr|$$

and

$$E_{q,p,\rho,\gamma,M} := \inf\bigl\{\operatorname{error}_{q,p,\rho,\gamma,M}(w, C) : w \in \mathbb R^M,\ C \subset S^q,\ |C| = M\bigr\}.$$

Theorem 7. Let 1 ≤ p ≤ ∞, 0 < ρ ≤ ∞, and γ > q/p. Then for M ≥ 1,

$$E_{q,p,\rho,\gamma,M} \sim M^{-\gamma/q}.$$
4. Proofs

In order to prove Theorem 5, we find it convenient to introduce another kernel. We define

$$\tilde\Phi_j(h,t) := \Phi_{j+1}(h,t) - \Phi_{j-2}(h,t) = \sum_{\ell=0}^{\infty}\tilde h_j(\ell)\,\frac{d_\ell^q}{\omega_q}\,P_\ell(q+1;t), \qquad t \in \mathbb R,\ j = 2,3,\dots, \tag{4.1}$$

where

$$\tilde h_j(\ell) := h\Bigl(\frac{\ell}{2^{j+1}}\Bigr) - h\Bigl(\frac{\ell}{2^{j-2}}\Bigr).$$

We define Φ̃_j(h,t) = Φ_j(h,t), j = 0, 1, and Φ̃_j(h,t) = Φ_j(h,t) = 0 if j < 0. The following Lemma 8 is essentially contained in [16, Proposition 4.1].

Lemma 8. Let 1 ≤ p ≤ ∞, A ≥ 1, let ν be a (possibly signed) (A, r)-continuous measure on S^q, and 1/2^{j+1} ≤ r ≤ 1. Then there exists a constant c independent of A and r such that

$$\Bigl\|\int_{S^q}|\tilde\Phi_j(h,\circ\cdot y)|\,d|\nu|(y)\Bigr\|_p \le c\,(|\nu|(S^q))^{1/p}\bigl[A(2^j r)^q\bigr]^{1/p'}, \qquad j = 0,1,\dots, \tag{4.2}$$

where 1/p + 1/p' = 1, with the usual understanding if p = 1 or ∞.

Proof. For any sequence {a_n}_{n∈ℕ₀}, let

$$\Delta^0 a_n := a_n,\qquad \Delta a_n := \Delta^1 a_n := a_{n+1} - a_n,\qquad \Delta^k a_n := \Delta(\Delta^{k-1}a_n),\quad k = 2,3,\dots.$$

To apply Proposition 4.1 in [16], we define h(ℓ) := h̃_j(ℓ), where h̃_j(ℓ) is defined above, and observe that h̃_j(ℓ) = 0 for ℓ ≥ 2^{j+1} and h̃_j(ℓ) = 0 for ℓ ≤ 2^{j−4}. This allows us to apply Proposition 4.1 in [16] with D = 2^{j+1}, C₁ = 1/32, and C₂ = 1. Choosing also K = q + 1 and α = r, and replacing Φ(h, x·y) by Φ̃_j(h, x·y), the estimate (4.2) in Proposition 4.1 in [16] gives

$$\int_{S^q}|\tilde\Phi_j(h,x\cdot y)|\,d|\nu|(y) \le c\,A(2^{j+1}r)^q\sum_{i=1}^{q+1}\sum_{\ell=0}^{2^{j+1}}(\ell+1)^{i-1}\,|\Delta^i \tilde h_j(\ell)|, \qquad x \in S^q. \tag{4.3}$$

A repeated application of the mean value theorem shows that

$$|\Delta^i \tilde h_j(\ell)| \le c\max_{t\in\mathbb R}|\tilde h_j^{(i)}(t)| \le c\,2^{-ji}\max_{x\in\mathbb R}|h^{(i)}(x)| \le c\,2^{-ji}, \qquad 1 \le i \le q+1,$$

from which it follows that

$$\int_{S^q}|\tilde\Phi_j(h,x\cdot y)|\,d|\nu|(y) \le c\,A(2^j r)^q, \qquad x \in S^q. \tag{4.4}$$

This proves (4.2) for the case p = ∞.
To prove the result for the case $p = 1$, it is useful to note that, as a special case of (4.4), we have

$$\int_{S^q} |\tilde{\Phi}_j(h, x \cdot y)|\, d\mu_q(y) \le c, \quad x \in S^q,$$

since for the Lebesgue surface measure $\mu_q$ the assumption of $(A,r)$-continuity is satisfied with $A = 1$ and every choice of r; thus we may choose $r = 1/2^{j+1}$. Applying the Fubini theorem, we now obtain

$$\left\| \int_{S^q} |\tilde{\Phi}_j(h, \circ \cdot y)|\, d|\mu|(y) \right\|_1 = \int_{S^q} \int_{S^q} |\tilde{\Phi}_j(h, x \cdot y)|\, d\mu_q(x)\, d|\mu|(y) \le c\,|\mu|(S^q),$$

proving (4.2) for the case $p = 1$. The result in (4.2) then follows for all values of p from the interpolation inequality

$$\|g\|_p \le \|g\|_1^{1/p}\,\|g\|_\infty^{1/p'}, \quad g \in L^1,$$

which follows by applying Hölder's inequality to $\|g\|_p^p$.
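The interpolation inequality also holds for the counting measure, which makes it easy to sanity-check; a minimal sketch:

```python
# ||g||_p <= ||g||_1^(1/p) * ||g||_inf^(1/p'), checked for the counting
# measure: |g_k|^p <= |g_k| * ||g||_inf^(p-1), then sum over k.
import random

random.seed(0)
g = [random.uniform(-2.0, 2.0) for _ in range(50)]

norm1 = sum(abs(v) for v in g)
norm_inf = max(abs(v) for v in g)
for p in (1.5, 2.0, 3.0, 7.0):
    norm_p = sum(abs(v) ** p for v in g) ** (1.0 / p)
    bound = norm1 ** (1.0 / p) * norm_inf ** (1.0 - 1.0 / p)
    assert norm_p <= bound + 1e-12
print("interpolation inequality verified on a random vector")
```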
For the convenience of the reader, a self-contained sketch of the proof of the key estimate (4.3), following [14,16], is given in the Appendix.

Proof of Theorem 5. From the definitions (2.9) and (2.10), we see that $\sum_{j=0}^{N} \tau_j(f) = \sigma_N(f)$. Hence, (2.11) can be written in the form

$$f = \sigma_N(f) + \sum_{j=N+1}^{\infty} \tau_j(f), \quad N \in \mathbb{N}_0,$$

where (cf. Corollary 4) the series converges uniformly on $S^q$. Since $\sigma_N(f) \in \Pi_{2^N}^q$, we have, from the assumption (3.3),

$$\int_{S^q} f\, d\nu_N = \int_{S^q} \sigma_N(f)\, d\nu_N + \sum_{j=N+1}^{\infty} \int_{S^q} \tau_j(f)\, d\nu_N = \int_{S^q} \sigma_N(f)\, d\mu_q + \sum_{j=N+1}^{\infty} \int_{S^q} \tau_j(f)\, d\nu_N.$$

Since (2.9) shows that $\int_{S^q} \sigma_N(f)\, d\mu_q = \int_{S^q} f\, d\mu_q$, we obtain

$$\mathcal{E}_N := \left| \int_{S^q} f\, d\mu_q - \int_{S^q} f\, d\nu_N \right| = \left| \sum_{j=N+1}^{\infty} \int_{S^q} \tau_j(f)\, d\nu_N \right| \le \sum_{j=N+1}^{\infty} \left| \int_{S^q} \tau_j(f)\, d\nu_N \right|. \tag{4.5}$$

We will estimate each of the terms on the right-hand side above. In the sequel, we will assume that $N \ge 2$, since for $N = 0$ and $1$ the bound (3.5) is trivial. Now, let $j \ge N+1$. Since, from (2.10),

$$\widehat{\tau_j(f)}(\ell,k) = \left( h\left(\frac{\ell}{2^{j}}\right) - h\left(\frac{\ell}{2^{j-1}}\right) \right) \hat{f}(\ell,k),$$
it follows from the definition of h in Section 2.2 that $\widehat{\tau_j(f)}(\ell,k) = 0$ if $\ell \ge 2^{j}$ or $\ell \le 2^{j-2}$. So $\tau_j(f)$ is orthogonal to $\Pi_{2^{j-2}}^q$. In particular,

$$\int_{S^q} \tau_j(f)(y)\,\Phi_{j-2}(h, x \cdot y)\, d\mu_q(y) = 0, \quad x \in S^q.$$

Similarly, using the fact that $h(\ell/2^{j+1}) = 1$ if $\ell \le 2^{j}$, we deduce that

$$\int_{S^q} \tau_j(f)(y)\,\Phi_{j+1}(h, x \cdot y)\, d\mu_q(y) = \tau_j(f)(x), \quad x \in S^q.$$

Therefore, for $j \ge N+1$ and $x \in S^q$, with (4.1),

$$\tau_j(f)(x) = \int_{S^q} \tau_j(f)(y)\,\Phi_{j+1}(h, x \cdot y)\, d\mu_q(y) - \int_{S^q} \tau_j(f)(y)\,\Phi_{j-2}(h, x \cdot y)\, d\mu_q(y) = \int_{S^q} \tau_j(f)(y)\,\tilde{\Phi}_j(h, x \cdot y)\, d\mu_q(y).$$

In essence, this holds because $\tilde{h}_j(\ell) = 1$ whenever $\widehat{\tau_j(f)}(\ell,k) \ne 0$. Using Fubini's theorem, we then obtain

$$\int_{S^q} \tau_j(f)(x)\, d\nu_N(x) = \int_{S^q} \tau_j(f)(y) \int_{S^q} \tilde{\Phi}_j(h, x \cdot y)\, d\nu_N(x)\, d\mu_q(y).$$

Since each $\nu_N$ is $(A, 2^{-N})$-continuous, an application of Hölder's inequality and (4.2), with $p'$ in place of p, now shows (on noting also (3.2)) that

$$\left| \int_{S^q} \tau_j(f)\, d\nu_N \right| \le \|\tau_j(f)\|_p \left\| \int_{S^q} |\tilde{\Phi}_j(h, x \cdot \circ)|\, d|\nu_N|(x) \right\|_{p'} \le c\,\big(2^{j-N}\big)^{q/p}\,\|\tau_j(f)\|_p. \tag{4.6}$$

From (4.5) and (4.6), we conclude that

$$2^{Nq/p}\,\mathcal{E}_N \le c \sum_{j=N+1}^{\infty} 2^{jq/p}\,\|\tau_j(f)\|_p.$$

Using (2.5), this leads to

$$\big\| \{\mathcal{E}_N\}_{N \in \mathbb{N}_0} \big\|_{\rho,\gamma} = \big\| \{2^{Nq/p}\mathcal{E}_N\}_{N \in \mathbb{N}_0} \big\|_{\rho,\gamma-q/p} \le c \left\| \left\{ \sum_{j=N+1}^{\infty} 2^{jq/p}\,\|\tau_j(f)\|_p \right\}_{N \in \mathbb{N}_0} \right\|_{\rho,\gamma-q/p}. \tag{4.7}$$

Since $f \in B_{p,\rho}^{\gamma}$, we may use (2.6), (2.5), and (2.12) to obtain

$$\left\| \left\{ \sum_{j=N+1}^{\infty} 2^{jq/p}\,\|\tau_j(f)\|_p \right\}_{N \in \mathbb{N}_0} \right\|_{\rho,\gamma-q/p} \le c\,\big\| \{2^{jq/p}\|\tau_j(f)\|_p\}_{j \in \mathbb{N}_0} \big\|_{\rho,\gamma-q/p} = c\,\big\| \{\|\tau_j(f)\|_p\}_{j \in \mathbb{N}_0} \big\|_{\rho,\gamma} \le c_2\,\|f\|_{p,\rho,\gamma}. \tag{4.8}$$

From (4.5), (4.7), and (4.8), we finally obtain the estimate (3.4). □
We find it convenient to organize our proof of Theorem 6 in a number of lemmas. The following simple lemma gives a useful estimate on the volume of spherical caps.

Lemma 9. Let $y \in S^q$ and $\alpha \in [0, \pi]$. Then

$$\mu_q\big(S_\alpha^q(y)\big) \sim \alpha^q. \tag{4.9}$$

Proof. Let n be the point $(0, \ldots, 0, 1) \in S^q$, using the standard Cartesian coordinate system. In view of the rotational invariance of $\mu_q$, $\mu_q(S_\alpha^q(y)) = \mu_q(S_\alpha^q(n))$. Writing $x = (\sin\theta\, x', \cos\theta) \in S^q$, $x' \in S^{q-1}$, $\theta \in [0,\pi]$, we observe that

$$S_\alpha^q(n) = \{x \in S^q : x \cdot n \ge \cos\alpha\} = \{(\sin\theta\, x', \cos\theta) : x' \in S^{q-1},\ 0 \le \theta \le \alpha\}.$$

It is now elementary to check, as in [20], that

$$\mu_q\big(S_\alpha^q(n)\big) = \int_{S_\alpha^q(n)} d\mu_q(x) = \omega_{q-1} \int_0^\alpha \sin^{q-1}\theta\, d\theta.$$

First, let $\alpha \in [0, \pi/2]$. Then the estimates

$$\frac{2\theta}{\pi} \le \sin\theta \le \theta, \quad \theta \in [0, \pi/2],$$

show that $\mu_q(S_\alpha^q(n)) \sim \alpha^q$. If $\alpha \in [\pi/2, \pi]$, then clearly, $\mu_q(S_{\pi/2}^q(n)) \le \mu_q(S_\alpha^q(n)) \le \omega_q$, completing the proof. □
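The two-sided bound (4.9) can be checked from the one-dimensional integral above; the sketch below drops the constant $\omega_{q-1}$ (which does not affect the $\alpha$-dependence) and verifies that the ratio of the cap integral to $\alpha^q$ stays between the bounds derived in the proof:

```python
# Check (4.9): the cap "volume" integral behaves like alpha^q.
# omega_{q-1} is omitted; it only changes the constant.
import math

def cap_integral(q, alpha, n=20000):
    """Midpoint rule for integral_0^alpha sin(theta)^(q-1) d(theta)."""
    step = alpha / n
    return sum(math.sin((k + 0.5) * step) ** (q - 1) for k in range(n)) * step

for q in (2, 3, 5):
    ratios = [cap_integral(q, a) / a ** q for a in (0.05, 0.2, 0.8, 1.5)]
    # sin(theta) <= theta gives ratio <= 1/q;
    # sin(theta) >= (2/pi)*theta on [0, pi/2] gives the lower bound
    assert all((2 / math.pi) ** (q - 1) / q - 1e-9 <= r <= 1 / q + 1e-9
               for r in ratios)
    print(q, [round(r, 4) for r in ratios])
```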
The next lemma will allow us to establish the existence of adequately many mutually disjoint spherical caps in the complement of the support K of the measure in Theorem 6. In our application of the following lemma, G plays the role of the complement of an $\eta$-neighborhood of K, with $\alpha = \eta/2$.

Lemma 10. Let $0 < \alpha \le \pi$, and let $G \subset S^q$ be a compact set with $\mu_q(G) > 0$. There exists a finite subset $Y \subset G$ with cardinality $|Y|$ satisfying

$$c\,\mu_q(G)\,\alpha^{-q} \le |Y| \le c_1\,\alpha^{-q}\,\mu_q\big(N_\alpha(G)\big), \tag{4.10}$$

such that the caps $S_\alpha^q(y)$, $y \in Y$, are mutually disjoint, and the constants c and $c_1$ are independent of $\alpha$ and G.

Proof. In this proof only, we say that a set $Y \subset G$ is $2\alpha$-distinguishable if $\mathrm{dist}(x,y) > 2\alpha$ for all $x, y \in Y$, $x \ne y$. Since G is compact, such a set is necessarily finite. Let Y be a maximal set of $2\alpha$-distinguishable points in G. If $x \in G$ and $x \notin \bigcup_{y \in Y} S_{2\alpha}^q(y)$, then $Y \cup \{x\}$ is a strictly larger set of $2\alpha$-distinguishable points, which contradicts the maximality of Y. Therefore, $G \subset \bigcup_{y \in Y} S_{2\alpha}^q(y)$. In view of (4.9), we deduce that

$$\mu_q(G) \le \sum_{y \in Y} \mu_q\big(S_{2\alpha}^q(y)\big) \le c\,|Y|\,\alpha^q.$$
This proves the first inequality in (4.10). Since Y is a set of $2\alpha$-distinguishable points, the caps $S_\alpha^q(y)$, $y \in Y$, are mutually disjoint. Since $\bigcup_{y \in Y} S_\alpha^q(y) \subset N_\alpha(G)$, it follows from (4.9) that

$$|Y|\,\alpha^q \sim \sum_{y \in Y} \mu_q\big(S_\alpha^q(y)\big) = \mu_q\left( \bigcup_{y \in Y} S_\alpha^q(y) \right) \le \mu_q\big(N_\alpha(G)\big).$$

This proves the second inequality in (4.10). □
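The packing-and-covering argument of Lemma 10 can be visualized in the simplest case q = 1 (the circle with geodesic distance); the greedy construction below is a sketch, with a dense grid standing in for the compact set G:

```python
# Greedy construction of a maximal 2*alpha-separated subset of the circle
# (the case q = 1 of Lemma 10, with geodesic distance).
import math

def geodesic(s, t):
    d = abs(s - t) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def maximal_separated(points, sep):
    chosen = []
    for p in points:
        if all(geodesic(p, c) > sep for c in chosen):
            chosen.append(p)
    return chosen

# dense grid standing in for the compact set G = S^1
grid = [2 * math.pi * k / 20000 for k in range(20000)]
for alpha in (0.2, 0.1, 0.05):
    Y = maximal_separated(grid, 2 * alpha)
    # maximality gives covering by caps of radius 2*alpha;
    # separation gives disjoint caps of radius alpha, so |Y| ~ 1/alpha
    print(alpha, len(Y), round(len(Y) * alpha, 3))
```

The product |Y|·α stays near a fixed constant (roughly π here), mirroring the two-sided bound (4.10).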
The following corollary will be used in our proof of Theorem 6.

Corollary 11. Let $\delta, \eta \in (0,1)$, and let K be a compact subset of $S^q$ satisfying $\mu_q(N_\eta(K)) < (1-\delta)\,\omega_q$. Then the closure of the set $S^q \setminus N_\eta(K)$ contains a finite subset Y with $|Y| \sim \eta^{-q}$, such that the caps $S_{\eta/2}^q(y)$, $y \in Y$, are mutually disjoint and do not intersect K. The implied constants in $|Y| \sim \eta^{-q}$ may depend on $\delta$ but not on $\eta$ or K.

Proof. We use Lemma 10, with G being the closure of $S^q \setminus N_\eta(K)$, and $\alpha = \eta/2$. The condition $\mu_q(N_\eta(K)) < (1-\delta)\,\omega_q$ ensures that $\mu_q(G) > \delta\,\omega_q > 0$, and the fact that $\alpha < \eta$ ensures that the caps $S_\alpha^q(y)$ do not intersect K. □

In particular, if K consists of M points, then there exist $\sim M$ mutually disjoint caps of radius $c/M^{1/q}$ which do not contain any point of K. This follows from the corollary on choosing, say, $\delta = 1/2$ and $\eta = 2c/M^{1/q}$ with a suitable choice of c.

Next, we establish a lemma, following ideas in [9], that will help us to estimate the iterated Laplace–Beltrami operators applied to zonal functions. The Laplace–Beltrami operator $\Delta^*$ is defined in the distributional sense by

$$\widehat{\Delta^* f}(\ell, k) := -\ell(\ell+q-1)\,\hat{f}(\ell, k), \quad \ell = 0, 1, \ldots,\ k = 1, \ldots, d_\ell^q.$$

It is known (cf. [20]) that $\Delta^*$ is the angular part of the Laplacian on $\mathbb{R}^{q+1}$. In particular, it is a surface differential operator, and if f is twice continuously differentiable and $f(x) = 0$ on an open subset of $S^q$, then $\Delta^* f(x) = 0$ on that subset.

A zonal function is a function of the form $x \in S^q \mapsto f(x \cdot z)$, where $z \in S^q$ is fixed and $f : [-1,1] \to \mathbb{R}$ is an integrable function. We will need the following facts (cf. [20]). If f is a zonal function, then with $t = x \cdot z$ we obtain

$$\int_{S^q} f(x \cdot z)\, d\mu_q(x) = \omega_{q-1} \int_{-1}^1 f(t)\,(1-t^2)^{(q-2)/2}\, dt.$$

Further, if $\psi : [-1,1] \to \mathbb{R}$ is twice differentiable, and

$$D_q\psi(t) := -qt\,\psi'(t) + (1-t^2)\,\psi''(t),$$

we obtain

$$\Delta^*\psi(\circ \cdot z) = (D_q\psi)(\circ \cdot z). \tag{4.11}$$

In the remainder of this section, let $C^+$ denote the space of all infinitely differentiable functions $\psi$ on $(-\infty, 1]$ such that $\psi(t) = 0$ for all $t \le 0$.
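For q = 2, the operator $D_q$ in (4.11) reduces to the Legendre differential operator, so $\Delta^*$ acts on the zonal function $P_\ell(x \cdot z)$ with eigenvalue $-\ell(\ell+1)$; a quick check with $P_3$ hardcoded:

```python
# (4.11) for q = 2: D_2 psi(t) = -2 t psi'(t) + (1 - t^2) psi''(t)
# satisfies D_2 P_l = -l(l+1) P_l for the Legendre polynomials.
def P3(t):    # Legendre polynomial P_3
    return 0.5 * (5 * t ** 3 - 3 * t)

def dP3(t):   # P_3'
    return 0.5 * (15 * t ** 2 - 3)

def d2P3(t):  # P_3''
    return 15 * t

def D2(dpsi, d2psi, t):
    return -2 * t * dpsi(t) + (1 - t ** 2) * d2psi(t)

for t in (-0.9, -0.3, 0.0, 0.4, 0.8):
    assert abs(D2(dP3, d2P3, t) - (-12) * P3(t)) < 1e-12
print("D_2 P_3 = -3*(3+1)*P_3 verified")
```

This matches the eigenvalue relation $\widehat{\Delta^* f}(\ell,k) = -\ell(\ell+q-1)\hat f(\ell,k)$ with $q = 2$, $\ell = 3$.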
Lemma 12. For every integer $s \ge 1$, there exist linear differential operators $T_{j,s} : C^+ \to C^+$ of order at most 2s with the following property. Let $0 < \beta < 1$, $\psi \in C^+$, and $g(t) := \psi((t-\beta)/(1-\beta))$. Then

$$D_q^s g(t) = \sum_{j=0}^{s} (1-\beta)^{-j}\,(T_{j,s}\psi)\left(\frac{t-\beta}{1-\beta}\right). \tag{4.12}$$

Proof. Writing $u = (t-\beta)/(1-\beta)$, and hence $t = \beta + (1-\beta)u = 1 - (1-\beta)(1-u)$, we calculate that

$$1 - t^2 = 2(1-t) - (1-t)^2 = (1-\beta)\left[ 2(1-u) - (1-\beta)(1-u)^2 \right],$$

and hence,

$$D_q g(t) = -q\,\frac{t}{1-\beta}\,\psi'(u) + \frac{1-t^2}{(1-\beta)^2}\,\psi''(u) = \frac{1}{1-\beta}\left[ -q\psi'(u) + 2(1-u)\psi''(u) \right] + \left[ q(1-u)\psi'(u) - (1-u)^2\psi''(u) \right] =: \frac{1}{1-\beta}(T_{1,1}\psi)(u) + (T_{0,1}\psi)(u).$$

This proves (4.12) for $s = 1$. Clearly, $T_{1,1}\psi$ and $T_{0,1}\psi$ are both in $C^+$. The general statement follows easily by induction. □

Corollary 13. Let $\psi \in C^+$, $0 < \theta \le \pi/2$, $y \in S^q$, and let $\psi_{y,\theta} : S^q \to \mathbb{R}$ be defined by $\psi_{y,\theta}(x) := \psi((x \cdot y - \cos\theta)/(1 - \cos\theta))$, $x \in S^q$. Then for integer $s \ge 1$,

$$\|(\Delta^*)^s \psi_{y,\theta}\|_\infty \le c\,\theta^{-2s}, \tag{4.13}$$

where c may depend on $\psi$.

Proof. Since $\psi(t) = 0$ for all $t \le 0$, we see that $\psi_{y,\theta}(x) = 0$ if $x \notin S_\theta^q(y)$. Moreover, in view of (4.11) and Lemma 12, we see that for all $x \in S^q$,

$$|(\Delta^*)^s \psi_{y,\theta}(x)| = \left| \sum_{j=0}^{s} (1-\cos\theta)^{-j}\,(T_{j,s}\psi)\left( \frac{x \cdot y - \cos\theta}{1-\cos\theta} \right) \right| \le c(s,\psi)\,(1-\cos\theta)^{-s}.$$

Since $1 - \cos\theta = 2(\sin(\theta/2))^2 \sim \theta^2$, this proves (4.13). □

Next, we construct the function $f^*$ required in Theorem 6. We define

$$\psi(t) := \begin{cases} 0 & \text{if } t \le 0, \\[4pt] \left( \displaystyle\int_0^{1/2} \exp\left( \frac{2}{u(2u-1)} \right) du \right)^{-1} \displaystyle\int_0^{t} \exp\left( \frac{2}{u(2u-1)} \right) du & \text{if } 0 < t < 1/2, \\[4pt] 1 & \text{if } 1/2 \le t \le 1. \end{cases}$$

It is clear that $\psi \in C^+$, and $0 \le \psi(t) \le 1$ for all $t \in (-\infty, 1]$.
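The cutoff $\psi$ just defined can be evaluated with elementary quadrature; a sketch (midpoint rule, with the integrand exp(2/(u(2u−1))) as in the definition above — it vanishes to all orders at u = 0 and u = 1/2, since the exponent tends to −∞ there):

```python
# Numerical evaluation of the C^infinity cutoff psi: 0 for t <= 0,
# 1 for t >= 1/2, a normalized integral of exp(2/(u*(2u-1))) in between.
import math

def integrand(u):
    return math.exp(2.0 / (u * (2.0 * u - 1.0))) if 0.0 < u < 0.5 else 0.0

def integral(a, b, n=4000):
    step = (b - a) / n
    return sum(integrand(a + (k + 0.5) * step) for k in range(n)) * step

NORM = integral(0.0, 0.5)

def psi(t):
    if t <= 0.0:
        return 0.0
    if t >= 0.5:
        return 1.0
    return integral(0.0, t) / NORM

# psi is 0 left of 0, 1 right of 1/2, strictly increasing in between;
# by the symmetry of u*(2u-1) about u = 1/4, psi(1/4) is close to 1/2
print([round(psi(t), 4) for t in (-1.0, 0.1, 0.25, 0.4, 0.75)])
```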
Next, with K, $\delta$, and $\eta$ as in Theorem 6, we take the set Y as in Corollary 11, choose $\theta = \eta/2$, and define

$$\psi_{y,\theta}(x) := \psi\left( \frac{x \cdot y - \cos\theta}{1 - \cos\theta} \right), \quad x \in S^q,$$

and then

$$f^* = f^*(\eta) := \sum_{y \in Y} \psi_{y,\theta}.$$

The support of the function $f^*$ is the union of the $|Y|$ spherical caps $S_\theta^q(y) = S_{\eta/2}^q(y)$, $y \in Y$, and is a subset of $S^q \setminus N_{\eta/2}(K)$. We recall from Corollary 11 that $|Y| \sim \eta^{-q}$. In the next lemma, we estimate the Besov space norm of $f^*$.

Lemma 14. Let $1 \le p \le \infty$, $0 < \rho \le \infty$, $\gamma > 0$, and $\eta > 0$. With $f^* = f^*(\eta)$ constructed as above, we have

$$\|f^*\|_{p,\rho,\gamma} \le c\,\eta^{-\gamma}. \tag{4.14}$$

Proof. In this proof only, let s be the smallest integer such that $2s > \gamma$. We will assume at first that $\rho < \infty$; the case $\rho = \infty$ is simpler. Recalling that each $\psi_{y,\theta}$ with $\theta = \eta/2$ is supported on $S_{\eta/2}^q(y)$, and that these caps are disjoint, we see that

$$f^*(x) = \begin{cases} \psi_{y,\theta}(x) & \text{if } x \in S_{\eta/2}^q(y) \text{ for some } y \in Y, \\ 0 & \text{otherwise.} \end{cases} \tag{4.15}$$

Thus, we have $\|f^*\|_\infty = \|\psi_{y,\theta}\|_\infty \le 1$. If $1 \le p < \infty$, then using (4.9) and $|Y| \sim \eta^{-q}$, we conclude that

$$\|f^*\|_p^p = \int_{S^q} |f^*|^p\, d\mu_q = \sum_{y \in Y} \int_{S_{\eta/2}^q(y)} |\psi_{y,\theta}|^p\, d\mu_q \le |Y|\,\mu_q\big(S_{\eta/2}^q(y)\big) \le c\,\eta^{-q}\,\eta^q = c.$$

Thus, for all p, $1 \le p \le \infty$,

$$E_{2^j,p}(f^*) \le \|f^*\|_p \le c, \quad j = 0, 1, 2, \ldots. \tag{4.16}$$
So,

$$2^{j\gamma}\,E_{2^j,p}(f^*) \le c\,2^{j\gamma}, \quad j = 0, 1, 2, \ldots.$$

We use (A.2) with the choice $H(\ell) := h(\ell/2^j)$, so that $D = 2^j$, with $\mu_q$ in place of $\mu$ and $1/D$ in place of r. With these choices, $\Phi(H,t) = \Phi_j(h,t)$, and we obtain

$$\int_{S^q} |\Phi_j(h, x \cdot y)|\, d\mu_q(y) \le c \sum_{i=1}^{q+1} \sum_{\ell=0}^{\infty} (\ell+1)^{i-1}\,|\Delta^i H(\ell)|, \quad x \in S^q.$$

A repeated application of the mean value theorem yields, as in the proof of (4.4), that the right-hand side of the above inequality is bounded uniformly in j. Thus,

$$\int_{S^q} |\Phi_j(h, x \cdot y)|\, d\mu_q(y) \le c, \quad x \in S^q. \tag{A.16}$$

An application of Fubini's theorem now yields $\|\sigma_j(f)\|_1 \le c\|f\|_1$ for all $f \in L^1$ and $\|\sigma_j(f)\|_\infty \le c\|f\|_\infty$ for all $f \in L^\infty$. In view of the Riesz–Thorin interpolation theorem [5, Theorem 4.3], this yields $\|\sigma_j(f)\|_p \le c\|f\|_p$ for every $f \in L^p$. Since $\sigma_j(f) \in \Pi_{2^j}^q$ and $\sigma_j(P) = P$ for every $P \in \Pi_{2^{j-1}}^q$, we have, for every $P \in \Pi_{2^{j-1}}^q$,

$$E_{2^j,p}(f) \le \|f - \sigma_j(f)\|_p = \|f - P - \sigma_j(f - P)\|_p \le c\,\|f - P\|_p.$$

This completes the proof of (A.15). □
Proof of Theorem 3. This proof is based on the proof of the equivalence of parts (a) and (b) of [15, Theorem 3.3] (cf. also the proof of [19, Theorem 4]). From the definitions, we have, for any integer $n \ge 0$, $\sigma_n(f) = \sum_{j=0}^{n} \tau_j(f)$. Therefore, (A.15) implies that

$$\left\| f - \sum_{j=0}^{n} \tau_j(f) \right\|_p = \|f - \sigma_n(f)\|_p \le c\,E_{2^{n-1},p}(f).$$

If $f \in X^p$, then $E_{2^{n-1},p}(f) \to 0$ as $n \to \infty$. This proves (2.11). The series representation (2.11) and the estimates (A.15) imply that, for $j = 0, 1, \ldots$,

$$E_{2^j,p}(f) \le \|f - \sigma_j(f)\|_p = \left\| f - \sum_{n=0}^{j} \tau_n(f) \right\|_p \le \sum_{n=j+1}^{\infty} \|\tau_n(f)\|_p.$$

Therefore, the discrete Hardy inequality (2.6) leads to the first inequality in (2.12).
Using (A.15) again (and recalling our convention that $\tau_n = 0$ when $n < 0$), we see that, for $n = 0, 1, 2, \ldots$,

$$\|\tau_n(f)\|_p = \big\| f - \sigma_n(f) - \big(f - \sigma_{n-1}(f)\big) \big\|_p \le \|f - \sigma_n(f)\|_p + \|f - \sigma_{n-1}(f)\|_p \le c\,E_{2^{n-2},p}(f).$$

This implies the second inequality in (2.12). □
References
[1] J.S. Brauchart, K. Hesse, Numerical integration over spheres of arbitrary dimension, Constr. Approx. 25 (2007) 41–71.
[2] G. Brown, F. Dai, Y.S. Sun, Kolmogorov widths of classes of smooth functions on the sphere S^{d-1}, J. Complexity 18 (2002) 1001–1023.
[3] F. Dai, Jackson-type inequality for doubling weights on the sphere, Constr. Approx. 26 (1) (2006) 91–112.
[4] F. Dai, Characterizations of function spaces on the sphere using frames, Trans. Amer. Math. Soc. 359 (2) (2007) 567–589.
[5] R.A. DeVore, G.G. Lorentz, Constructive Approximation, Springer, Berlin, 1993.
[6] J.R. Driscoll, D.M. Healy, Computing Fourier transforms and convolutions on the 2-sphere, Adv. Appl. Math. 15 (1994) 202–250.
[7] A. Erdélyi (Ed.), W. Magnus, F. Oberhettinger, F.G. Tricomi (research associates), Higher Transcendental Functions, vol. II, California Institute of Technology, Bateman Manuscript Project, McGraw-Hill, New York, Toronto, London, 1953.
[8] Q.T. Le Gia, H.N. Mhaskar, Polynomial operators and local approximation of solutions of pseudo-differential equations on the sphere, Numer. Math. 103 (2006) 299–322.
[9] K. Hesse, A lower bound for the worst-case cubature error on spheres of arbitrary dimension, Numer. Math. 103 (2006) 413–433.
[10] K. Hesse, I.H. Sloan, Cubature over the sphere S^2 in Sobolev spaces of arbitrary order, J. Approx. Theory 141 (2006) 118–133.
[11] K. Hesse, I.H. Sloan, Optimal lower bounds for cubature error on the sphere S^2, J. Complexity 21 (2005) 790–803.
[12] K. Jetter, J. Stöckler, J.D. Ward, Norming sets and spherical cubature formulas, in: Z. Chen, Y. Li, C. Micchelli, Y. Xu (Eds.), Computational Mathematics, Marcel Dekker, New York, 1998, pp. 237–245.
[13] P.I. Lizorkin, Kh.P. Rustamov, Nikolskii–Besov spaces on the sphere in connection with approximation theory, Tr. Mat. Inst. Steklova 204 (1993) 172–201 (Proc. Steklov Inst. Math. 3 (1994) 149–172).
[14] H.N. Mhaskar, Polynomial operators and local smoothness classes on the unit interval, J. Approx. Theory 131 (2004) 243–267.
[15] H.N. Mhaskar, On the representation of smooth functions on the sphere using finitely many bits, Appl. Comput. Harmon. Anal. 18 (3) (2005) 215–233.
[16] H.N. Mhaskar, Weighted quadrature formulas and approximation by zonal function networks on the sphere, J. Complexity 22 (2006) 348–370.
[17] H.N. Mhaskar, F.J. Narcowich, J.D. Ward, Approximation properties of zonal function networks using scattered data on the sphere, Adv. Comput. Math. 11 (1999) 121–137.
[18] H.N. Mhaskar, F.J. Narcowich, J.D. Ward, Spherical Marcinkiewicz–Zygmund inequalities and positive quadrature, Math. Comput. 70 (235) (2001) 1113–1130; Corrigendum: Math. Comput. 71 (2001) 453–454.
[19] H.N. Mhaskar, J. Prestin, Polynomial frames: a fast tour, in: C.K. Chui, M. Neamtu, L.L. Schumaker (Eds.), Approximation Theory XI, Gatlinburg, 2004, Nashboro Press, Brentwood, 2005, pp. 287–318.
[20] C. Müller, Spherical Harmonics, Lecture Notes in Mathematics, vol. 17, Springer, Berlin, 1966.
[21] S. Pawelke, Über die Approximationsordnung bei Kugelfunktionen und algebraischen Polynomen, Tôhoku Math. J. 24 (1972) 473–486.
[22] D. Potts, G. Steidl, M. Tasche, Fast algorithms for discrete polynomial transforms, Math. Comput. 67 (1998) 1577–1590.
[23] D.L. Ragozin, Constructive polynomial approximation on spheres and projective spaces, Trans. Amer. Math. Soc. 162 (1971) 157–170.
[24] M. Reimer, Hyperinterpolation on the sphere at the minimal projection order, J. Approx. Theory 104 (2) (2000) 272–286.
[25] I.H. Sloan, R.S. Womersley, Extremal systems of points and numerical integration on the sphere, Adv. Comput. Math. 21 (2004) 107–125.
[26] E.M. Stein, G. Weiss, Fourier Analysis on Euclidean Spaces, Princeton University Press, Princeton, NJ, 1971.
[27] A.H. Stroud, Approximate Calculation of Multiple Integrals, Prentice-Hall, Englewood Cliffs, NJ, 1971.
[28] G. Szegö, Orthogonal Polynomials, American Mathematical Society Colloquium Publications, vol. 23, American Mathematical Society, Providence, RI, 1975.
Journal of Complexity 23 (2007) 553 – 559 www.elsevier.com/locate/jco
Note
A note on the complexity and tractability of the heat equation
Arthur G. Werschulz^{a,b,*}
^a Department of Computer and Information Sciences, Fordham University, New York, NY 10023, USA
^b Department of Computer Science, Columbia University, New York, NY 10027, USA
Received 15 August 2006; accepted 31 January 2007 Available online 17 February 2007
Abstract
We wish to solve the heat equation $u_t = \Delta u - qu$ in $I^d \times (0,T)$, where I is the unit interval and T is a maximum time value, subject to homogeneous Dirichlet boundary conditions and to the initial condition $u(\cdot, 0) = f$ over $I^d$. We show that this problem is intractable if f belongs to standard Sobolev spaces, even if we have complete information about q. However, if f and q belong to a reproducing kernel Hilbert space with finite-order weights, we can show that the problem is tractable, and can actually be strongly tractable.
© 2007 Elsevier Inc. All rights reserved.
Keywords: Heat equation; Information-based complexity; Tractability; Weighted reproducing kernel Hilbert spaces
1. Introduction

This issue of the Journal of Complexity is being dedicated as a special Festschrift marking Henryk Woźniakowski's 60th birthday, and I am grateful for the opportunity to contribute to the festivities. Henryk and I have known each other for over one-half of our lifetimes (!), going back to 1974, when I asked him to be on my doctoral committee at Carnegie-Mellon. Over the last 30-odd years, Henryk has been my mentor, colleague, and dear friend. When asked to give a one-word description of Henryk, the word mentsch¹ immediately comes to mind. I am happy to join with Henryk's other colleagues and friends in wishing him all the best as we mark this milestone. ∗ Corresponding address: Department of Computer Science, Columbia University, New York, NY 10027, USA.
Fax: +1 212 666 0140. E-mail address:
[email protected]. 1 A nearly untranslatable Yiddish word, roughly meaning “the epitome of a decent, caring, upright person”.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.01.006
Arthur G. Werschulz / Journal of Complexity 23 (2007) 553 – 559
One of the topics that Henryk and I have been studying is the complexity and tractability of "quasilinear problems". The basic idea here is motivated by the observation that whereas the solution u of a linear operator equation Lu = f depends linearly on f, this often does not tell the whole story. The operator L often depends on one or more coefficient functions. One of the simplest examples is the case $L_q = -\Delta + q$ of the Helmholtz (or Schrödinger) operator on a d-dimensional domain. If we want to admit algorithms using partial information about both f and q, then we no longer have a linear problem. These considerations have led us to consider the approximate solution of d-dimensional problems given by an operator $S_d$, in which the mapping $S_d(\cdot, q)$ is linear for each q. Under mild smoothness conditions, we say that such problems are quasilinear.

Henryk and I first studied quasilinear problems in [11]. That paper, based on [7], developed a general framework for establishing (strong) tractability of quasilinear problems over certain weighted reproducing kernel Hilbert spaces. We then used the tools of [11] to study tractability for the elliptic problem $-\Delta u + qu = f$, subject to homogeneous Dirichlet or Neumann boundary conditions; our results may be found in [12].

This note is similar in spirit to [12]. However, whereas [12] deals with a model elliptic problem, this paper will look at tractability of the heat equation

$$\frac{\partial u}{\partial t}(x,t) = -(L_q u)(x,t) \quad \forall\, x \in I^d,\ t \in (0,T),$$
$$u(x,0) = f(x) \quad \forall\, x \in I^d, \tag{1}$$
$$u(x,t) = 0 \quad \forall\, x \in \partial I^d,\ t > 0.$$

Here, $I = (0,1)$ and q is the heat transfer rate for conductive loss to the ambient environment. As in [12], our main task is to prove an appropriate a priori inequality, which will establish the quasilinearity of our heat equation problem. Having done so, we can immediately apply the tools of [11], which will yield simple conditions guaranteeing (strong) tractability of the heat equation, as well as explicit error and cost bounds. However, since these explicit results are so similar to those of [12], and since they are so lengthy to state, this paper will merely summarize them by giving the exponents of tractability, which are the most important pieces of the puzzle. Those who wish to see a more precise statement of these results may consult [10].

2. The heat equation

We use the standard notation for Sobolev inner products, seminorms, norms, and spaces, found in (e.g.) [4,5,9]. Moreover, we let $\mathbb{Z}^{++}$ denote the set of positive integers, and let $Q_d$ denote the set of non-negative functions in $L^\infty(I^d)$. For $f \in L^2(I^d)$ and $q \in Q_d$, we wish to solve (1). Note that we can write

$$\langle L_q v, w \rangle = B_d(v, w; q) \quad \forall\, v, w \in H_0^1(I^d),$$

where

$$B_d(v, w; q) = \int_{I^d} [\nabla v \cdot \nabla w + qvw] \quad \forall\, v, w \in H_0^1(I^d).$$

From [5, pp. 382–383], we see that there exists a unique solution

$$u = S_d(f,q) \in L^2\big([0,T]; H_0^1(I^d)\big) \cap H^1\big([0,T]; H^{-1}(I^d)\big)$$

to the heat equation (1), and that $u \in C\big([0,T]; L^2(I^d)\big)$.
Note that $S_d(f,q)$ depends continuously on f and q, this bound being sharp in its dependence on f:

Theorem 2.1. Let $(f,q), (\tilde{f},\tilde{q}) \in L^2(I^d) \times Q_d$. Then

$$\|f - \tilde{f}\|_{L^2(I^d)} \le \|S_d(f,q) - S_d(\tilde{f},\tilde{q})\|_{C([0,T]; L^2(I^d))} \le \|f - \tilde{f}\|_{L^2(I^d)} + T\,\|q - \tilde{q}\|_{L^2(I^d)}\,\|\tilde{f}\|_{L^\infty(I^d)}.$$

Proof. Let $u = S_d(f,q)$ and $\tilde{u} = S_d(\tilde{f},\tilde{q})$. Since $u(0) = f$ and $\tilde{u}(0) = \tilde{f}$, we immediately obtain the first inequality. Hence, it only remains to prove the second inequality. Choose $t \in (0,T)$, and let $e(t) = u(t) - \tilde{u}(t)$. Since $L_q$ is self-adjoint in $L^2(I^d)$, we can check that

$$\|e(t)\|_{L^2(I^d)}\,\frac{d}{dt}\|e(t)\|_{L^2(I^d)} = -B_d(e(t), e(t); q) + \langle (\tilde{q} - q)\tilde{u}(t), e(t) \rangle_{L^2(I^d)} \le \|(q - \tilde{q})\tilde{u}(t)\|_{L^2(I^d)}\,\|e(t)\|_{L^2(I^d)},$$

where we have used the fact that $B_d(w,w;q) \ge 0$ for any $w \in H_0^1(I^d)$. Hence,

$$\frac{d}{dt}\|e(t)\|_{L^2(I^d)} \le \|(q - \tilde{q})\tilde{u}(t)\|_{L^2(I^d)}. \tag{2}$$

Recall (see, e.g., [2, Theorem 2.12]) that the strong maximum principle implies that

$$\|\tilde{u}(t)\|_{L^\infty(I^d)} \le \|\tilde{f}\|_{L^\infty(I^d)},$$

so that

$$\|(q - \tilde{q})\tilde{u}(t)\|_{L^2(I^d)} \le \|q - \tilde{q}\|_{L^2(I^d)}\,\|\tilde{u}(t)\|_{L^\infty(I^d)} \le \|q - \tilde{q}\|_{L^2(I^d)}\,\|\tilde{f}\|_{L^\infty(I^d)}.$$

Substituting this inequality into (2), we obtain

$$\frac{d}{dt}\|e(t)\|_{L^2(I^d)} \le \|q - \tilde{q}\|_{L^2(I^d)}\,\|\tilde{f}\|_{L^\infty(I^d)}.$$

Since we have the initial condition $\|e(0)\|_{L^2(I^d)} = \|f - \tilde{f}\|_{L^2(I^d)}$, we find that

$$\|e(t)\|_{L^2(I^d)} \le \|f - \tilde{f}\|_{L^2(I^d)} + t\,\|q - \tilde{q}\|_{L^2(I^d)}\,\|\tilde{f}\|_{L^\infty(I^d)}.$$

Since $t \in (0,T)$ is arbitrary, this establishes the theorem. □

Let $F_d \subset L^2(I^d) \times Q_d$ be the set of problem elements $(f,q)$ for which we wish to solve the heat equation. Let $\Lambda$ denote the class of admissible information functionals; thus, $\Lambda$ is either the set $\Lambda^{\mathrm{all}}$ of all continuous linear functionals or the set $\Lambda^{\mathrm{std}}$ of standard information, consisting of function evaluations. Then $\mathrm{card}(\varepsilon, S_d, F_d, \Lambda)$ denotes the minimal number of $\Lambda$-evaluations needed to compute an $\varepsilon$-approximation in the worst case setting under a given error criterion. The typical choices of error criterion are the absolute error criterion (in which we guarantee that the
worst case error is at most $\varepsilon$) and the normalized error criterion (in which we guarantee that the initial error is reduced by a factor of at most $\varepsilon$). The problem $S = \{S_d\}_{d=1}^{\infty}$ is tractable if there exist $C > 0$, $p_{\mathrm{err}} \ge 0$, and $p_{\dim} \ge 0$ such that

$$\mathrm{card}(\varepsilon, S_d, F_d, \Lambda) \le C \left( \frac{1}{\varepsilon} \right)^{p_{\mathrm{err}}} d^{\,p_{\dim}} \quad \forall\, \varepsilon \in (0,1),\ d \in \mathbb{Z}^{++}.$$

If no such $p_{\mathrm{err}}$ and $p_{\dim}$ exist, then the problem S is said to be intractable. Furthermore, the problem S is said to be strongly tractable if there exist $C > 0$ and $p_{\mathrm{strong}} > 0$ such that

$$\mathrm{card}(\varepsilon, S_d, F_d, \Lambda) \le C \left( \frac{1}{\varepsilon} \right)^{p_{\mathrm{strong}}}.$$

3. Intractability for classical Sobolev spaces

Recall that our set $F_d$ of problem elements is a subset of $L^2(I^d) \times Q_d$, where $Q_d$ denotes the non-negative elements of $L^\infty(I^d)$. We briefly discuss tractability when the first component of $F_d$ is a ball of fixed radius in a standard Sobolev space $H^r(I^d)$. There is no essential loss of generality in assuming that this ball has unit radius.

Theorem 3.1. Let $\Lambda = \Lambda^{\mathrm{all}}$. Regardless of whether the absolute or normalized error criterion is used, the heat equation is intractable if the first component of $F_d$ is the unit ball $BH^r(I^d)$.

Proof. First, suppose that we are using the absolute error criterion. From the lower bound in Theorem 2.1, we see that

$$e(n, S_d, F_d, \Lambda^{\mathrm{all}}) \ge e(n, \mathrm{App}_d, BH^r(I^d), \Lambda^{\mathrm{all}}),$$

where $\mathrm{App}_d : H^r(I^d) \to L^2(I^d)$ is the approximation problem given by $\mathrm{App}_d f = f$ for all $f \in H^r(I^d)$. It is well known (see, e.g., [3]) that there exists $C_d > 0$ such that

$$e(n, \mathrm{App}_d, BH^r(I^d), \Lambda^{\mathrm{all}}) \ge C_d\, n^{-r/d}.$$

Combining these results, we see that

$$\mathrm{card}^{\mathrm{abs}}(\varepsilon, S_d, F_d, \Lambda^{\mathrm{all}}) \ge \left( \frac{C_d}{\varepsilon} \right)^{d/r},$$

and hence our problem is intractable in the absolute error criterion.

We now turn to the normalized error criterion. Fix $(f,q) \in F_d$ and let $u = S_d(f,q)$. For any $t \in [0,T]$, we have the series representation

$$u(t) = \sum_{j=1}^{\infty} e^{-\lambda_j t}\, \langle f, z_j \rangle_{L^2(I^d)}\, z_j,$$

where $z_1, z_2, \ldots \in H_0^1(I^d)$ are the $L^2(I^d)$-orthonormal eigenvectors of $L_q$ corresponding to the positive eigenvalues $\lambda_1 \le \lambda_2 \le \cdots$, from which we see that

$$\|u(t)\|_{L^2(I^d)} \le \|f\|_{L^2(I^d)} \le \|f\|_{H^r(I^d)}.$$
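For d = 1 and q ≡ 0, the spectral data in this series representation are explicit ($z_j(x) = \sqrt{2}\sin(j\pi x)$, $\lambda_j = (j\pi)^2$), so the norm bound $\|u(t)\| \le \|f\|$ can be checked directly via Parseval; a sketch with hypothetical Fourier coefficients:

```python
# 1-D heat equation u_t = u_xx on I = (0,1), q = 0, Dirichlet conditions:
# u(t) = sum_j exp(-lambda_j t) <f, z_j> z_j with z_j = sqrt(2) sin(j pi x),
# lambda_j = (j pi)^2.  By Parseval, ||u(t)||_{L2} is non-increasing in t.
import math

J = 40                                          # retained modes
coeffs = [1.0 / j for j in range(1, J + 1)]     # hypothetical <f, z_j>

def norm_at(t):
    return math.sqrt(sum(math.exp(-2 * (j * math.pi) ** 2 * t) * c * c
                         for j, c in zip(range(1, J + 1), coeffs)))

norms = [norm_at(t) for t in (0.0, 0.01, 0.05, 0.2)]
print([round(n, 6) for n in norms])
assert all(norms[i] >= norms[i + 1] for i in range(len(norms) - 1))
```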
Since $S_d(\cdot, q) \in \mathrm{Lin}[H^r(I^d), L^2(I^d)]$ for any $q \in Q_d$, we may use the results of [6, Section 4.5], along with the previous inequality, to find that

$$e(0, S_d, F_d, \Lambda^{\mathrm{all}}) = \max_{0 \le t \le T}\ \sup_{(f,q) \in F_d} \|S_d(f,q)(t)\|_{L^2(I^d)} \le 1.$$

Hence,

$$\frac{e(n, S_d, F_d, \Lambda^{\mathrm{all}})}{e(0, S_d, F_d, \Lambda^{\mathrm{all}})} \ge C_d\, n^{-r/d},$$

and so we have

$$\mathrm{card}^{\mathrm{nor}}(\varepsilon, S_d, F_d, \Lambda^{\mathrm{all}}) \ge \left( \frac{C_d}{\varepsilon} \right)^{d/r}.$$

Thus our problem is intractable in the normalized error criterion. □
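The lower bound $(C_d/\varepsilon)^{d/r}$ just derived is exponential in d for fixed $\varepsilon$ and r, which is the content of intractability; a small illustration with hypothetical values of $C_d$:

```python
# The bound card >= (C_d/eps)^(d/r) grows exponentially in d for fixed eps:
# doubling d squares the bound, so no polynomial in d can dominate it.
eps, r, C_d = 1e-2, 2.0, 0.5   # hypothetical values; in general C_d depends on d

for d in (1, 2, 4, 8, 16):
    print(d, f"{(C_d / eps) ** (d / r):.3e}")
```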
Note that we are approximating the solution of the heat equation $S_d$ over the full time interval $[0,T]$. One might well ask what would happen if we were only trying to approximate the solution $S_{d,t}$ at a fixed positive time value t. Using the techniques of [8], one can show that, for either the absolute or normalized error criterion, we have

$$\mathrm{card}(\varepsilon, S_{d,t}, \tilde{F}_d, \Lambda^{\mathrm{all}}) \sim \left( \frac{1}{t c_d} \ln \frac{1}{\varepsilon} \right)^{d/2}$$

(for some positive constant $c_d$). Hence this weaker problem is still intractable.

4. Tractability for finite-order weighted spaces

Since the heat equation is intractable for classical Sobolev spaces, we ask what happens when the problem elements are based on balls of finite-order weighted spaces. Let $K_d$ be a reproducing kernel of the form

$$K_d(x,y) = \sum_{u \subseteq \{1,\ldots,d\}} \gamma_{d,u} \prod_{j \in u} K(x_j, y_j),$$

where $K \in L^\infty(I^2)$ is the reproducing kernel of a Hilbert space $H(K)$ of univariate functions, and the $\gamma_{d,u}$ are non-negative numbers (weights). We shall assume that the weights have finite order (see, e.g., [1]), which means that there exists $\omega^* \in \mathbb{Z}^{++}$ such that

$$\gamma_{d,u} = 0 \quad \forall\, u \subseteq \{1,\ldots,d\} \text{ such that } |u| > \omega^*,\ \forall\, d \in \mathbb{Z}^{++}. \tag{3}$$

The order of a set of finite-order weights is the smallest $\omega^* \in \mathbb{Z}^{++}$ such that (3) holds. Now we are ready to describe our problem element set $F_d$. We will choose $F_d = F_{d,F} \times (F_{d,Q} \cap Q_d)$, where both $F_{d,F}$ and $F_{d,Q}$ will be balls of $H(K_d)$ having finite radius.

We are now ready to state tractability and strong tractability results for the heat equation. These results depend on two additional pieces of data. The first is whether

$$\varrho^2 = \int_0^1 \int_0^1 K(x,y)\, dx\, dy$$
is positive or zero. (Note that since K is a reproducing kernel, we know that $\varrho^2$ is finite and non-negative.) The second is whether we are dealing with finite-order weights of order $\omega^*$ having a bounded sum, i.e., whether

$$\sup_{1 \le d < \infty} \sum_{u \subseteq \{1,\ldots,d\}} \gamma_{d,u} < \infty.$$

1. For the absolute error criterion, we have:

                General case, ρ² > 0        ρ² = 0                      Bounded sum, ρ² > 0
   Λ^all       p_err ≤ 2, p_dim ≤ 2        p_err ≤ 2, p_dim ≤ 3        p_strong ≤ 2
   Λ^std       p_err ≤ 4, p_dim ≤ 4        p_err ≤ 2, p_dim ≤ 6        p_strong ≤ 4

2. For the normalized error criterion, we have:

                General case, ρ² > 0        ρ² = 0                      Bounded sum, ρ² > 0
   Λ^all       p_err ≤ 2, p_dim ≤ 1        p_err ≤ 2, p_dim ≤ 2        p_strong ≤ 2
   Λ^std       p_err ≤ 4, p_dim ≤ 2        p_err ≤ 2, p_dim ≤ 4        p_strong ≤ 4
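A finite-order weighted kernel of the form used in this section can be evaluated by summing only over small subsets u; the sketch below uses a hypothetical univariate kernel K(x,y) = 1 + min(x,y) and hypothetical weights of order 2:

```python
# Evaluation of a finite-order weighted kernel: only subsets u with
# |u| <= 2 carry nonzero weight, so the sum has O(d^2) terms, not 2^d.
import math
from itertools import combinations

def K1(x, y):
    # hypothetical univariate reproducing kernel
    return 1.0 + min(x, y)

def gamma(u):
    # hypothetical finite-order weights, order omega* = 2
    return 0.0 if len(u) > 2 else 2.0 ** (-sum(u) - len(u))

def Kd(x, y):
    d = len(x)
    total = gamma(())  # empty-subset (constant) term
    for size in (1, 2):
        for u in combinations(range(d), size):
            total += gamma(u) * math.prod(K1(x[j], y[j]) for j in u)
    return total

x = (0.1, 0.5, 0.9, 0.3)
y = (0.2, 0.4, 0.6, 0.8)
print(round(Kd(x, y), 6))
```

The choice of K1 and gamma here is purely illustrative; any bounded univariate kernel and non-negative weights satisfying (3) fit the same scheme.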
Hence, the heat equation is always tractable for finite-order weighted RKHSs, and it is strongly tractable if the sum of the weights is bounded. As mentioned above, we are giving neither the proofs nor the (lengthy!) exact statements of the results, which may be found in [10]. These can all be obtained by using the tools of [11], together with Theorem 2.1, as was done in [12] for the Helmholtz equation. It is worthwhile to compare the results for the heat equation with those we obtained in [12]: 1. The results for the heat equation under the absolute error criterion are the same as for the Helmholtz equation under both Dirichlet and Neumann boundary conditions. 2. The results for the heat equation under the normalized error criterion are the same as for the Helmholtz equation under Neumann boundary conditions. Note that we studied both Dirichlet and Neumann boundary conditions in [12]. The main reason for introducing Neumann conditions in [12] was that we were unable to establish strong tractability for the Dirichlet problem under the normalized error criterion, and we wanted to exhibit a version of the problem for which the Neumann problem was strongly tractable. Since the Dirichlet problem for the heat equation is strongly tractable under the normalized error criterion if the weights have a bounded sum, we did not feel the need to analyze the Neumann problem for the heat equation. Acknowledgment I am happy to thank A. Papageorgiou and J.F. Traub for their comments.
References
[1] J. Dick, I.H. Sloan, X. Wang, H. Woźniakowski, Good lattice rules in weighted Korobov spaces with general weights, Numer. Math. 103 (1) (2006) 63–97.
[2] G.M. Lieberman, Second Order Parabolic Differential Equations, World Scientific, River Edge, NJ, 1996.
[3] E. Novak, Deterministic and Stochastic Error Bounds in Numerical Analysis, Lecture Notes in Mathematics, vol. 1349, Springer, New York, 1988.
[4] J.T. Oden, J.N. Reddy, An Introduction to the Mathematical Theory of Finite Elements, Wiley-Interscience, New York, 1976.
[5] M. Renardy, R.C. Rogers, An Introduction to Partial Differential Equations, second ed., Texts in Applied Mathematics, vol. 13, Springer, New York, 2004.
[6] J.F. Traub, G.W. Wasilkowski, H. Woźniakowski, Information-Based Complexity, Academic Press, New York, 1988.
[7] G.W. Wasilkowski, H. Woźniakowski, Finite-order weights imply tractability of linear multivariate problems, J. Approx. Theory 130 (2004) 57–77.
[8] A.G. Werschulz, What is the complexity of related elliptic, parabolic, and hyperbolic problems?, Math. Comp. 47 (176) (1986) 461–472.
[9] A.G. Werschulz, The Computational Complexity of Differential and Integral Equations: An Information-Based Approach, Oxford University Press, New York, 1991.
[10] A.G. Werschulz, Complexity and tractability of the heat equation, Technical Report CUCS-031-06, Department of Computer Science, Columbia University, New York, NY, 2006. Available at http://mice.cs.columbia.edu/getTechreport.php?techreportID=414.
[11] A.G. Werschulz, H. Woźniakowski, Tractability of quasilinear problems I: general results, J. Approx. Theory (2007). doi:10.1016/j.jat.2006.09.005.
[12] A.G. Werschulz, H. Woźniakowski, Tractability of quasilinear problems II: elliptic problems, Math. Comp. 76 (258) (2007) 45–76.
Journal of Complexity 23 (2007) 560 – 580 www.elsevier.com/locate/jco
Error propagation of general linear methods for ordinary differential equations
J.C. Butcher^{a,1}, Z. Jackiewicz^{b,2}, W.M. Wright^{c,*,3}
^a Department of Mathematics, The University of Auckland, Private Bag 92019, Auckland, New Zealand
^b Department of Mathematics, Arizona State University, Tempe, Arizona 85287, USA
^c Department of Mathematical and Statistical Sciences, La Trobe University, Melbourne, Vic. 3086, Australia
Received 1 September 2006; accepted 17 January 2007
Available online 27 March 2007
Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract We discuss error propagation for general linear methods for ordinary differential equations up to terms of order p + 2, where p is the order of the method. These results are then applied to the estimation of local discretization errors for methods of order p and for the adjacent order p + 1. The results of numerical experiments confirm the reliability of these estimates. This research has applications in the design of robust stepsize and order changing strategies for algorithms based on general linear methods. © 2007 Published by Elsevier Inc. MSC: 65L05; 65L06 Keywords: General linear methods; Nordsieck representation; Error propagation; Local error estimation for methods of adjacent orders; Adaptive stepsize selection; Stability analysis
∗ Corresponding author.
E-mail addresses:
[email protected] (J.C. Butcher),
[email protected] (Z. Jackiewicz),
[email protected] (W.M. Wright). URLs: http://www.math.auckland.ac.nz/butcher/ (J.C. Butcher), http://math.la.asu.edu/jackiewi/zdzislaw.html (Z. Jackiewicz), http://www.latrobe.edu.au/mathstats/staff/wright/ (W.M. Wright). 1 The research of this author was supported by the New Zealand Marsden Fund. 2 The research of this author was supported by the National Science Foundation under Grant DMS-0509597. 3 The research of this author was supported by a New Zealand Science and Technology Postdoctoral Fellowship.
0885-064X/$ - see front matter © 2007 Published by Elsevier Inc. doi:10.1016/j.jco.2007.01.009
1. Introduction

This paper is concerned with the propagation of errors and the estimation of errors when a general linear method (GLM) is used for the numerical solution of an initial value problem

y′(x) = f(y(x)),
(1.1a)
y(x0 ) = y0 ,
(1.1b)
where f : R^m → R^m is given. It will be assumed throughout the paper that f is differentiable arbitrarily often and that, consequently, the exact solution is also arbitrarily smooth. We will consider GLMs with s stages and with r = p + 1 values passed from step to step, where p is the order of the method. We will also assume that the information output at the end of step n, with stepsize h_n, is supposed to approximate, to within O(h_n^{p+1}), the Nordsieck vector made up from the subvectors y(x_n), h_n y′(x_n), h_n² y″(x_n), …, h_n^p y^{(p)}(x_n). The p + 1 components of this output approximation will be written for convenience as y_n, for the first component, and y^{[n]} for the remaining p components, with the full (p + 1)-subvector output written as ȳ^{[n]}. That is,

ȳ^{[n]} = [ y_n ; y^{[n]} ].
We will assume that a large number of steps, with output values y_1, y_2, …, y_N, are to be computed as approximations to the solution of (1.1) at the points x_1, x_2, …, x_N, respectively. Because we will need to use a local reference solution, we will write y(x), not as the global solution to this initial value problem, but as a function on [x_0, x_N) whose restriction to [x_{n−1}, x_n) is defined as the solution to (1.1a) subject to the initial condition y(x_{n−1}) = y_{n−1}. Thus y is right-continuous and has jumps at the step values equal to the local truncation error in the computed approximations. Because the method will have order p, the jump at x_n will be equal to O(h_n^{p+1}). This notation will have to be extended slightly to allow for the fact that each stage value computed in step n will be related to the reference solution defined by y(x_{n−1}) = y_{n−1}, whether or not the stage abscissa lies in [0, 1). Hence we will sometimes need to write y_{n−1}(x) to denote this reference solution. Thus y_{n−1}(x_n) is the limiting value of y(x) as x tends to x_n from below. Since much of our analysis will be based on a single time step, or will deal with constant stepsize, we will for convenience write h = h_n = x_n − x_{n−1}. Using this simplification we can write φ h^{p+1} y^{(p+1)}(x_n) + O(h^{p+2}) for the local truncation error in step n, so that

y_n(x_n) − y_{n−1}(x_n) = −φ h^{p+1} y^{(p+1)}(x_n) + O(h^{p+2}).    (1.2)

The value of φ will be found in Theorem 2. As a consequence of the discontinuity at x_n of y(x), there will be discontinuities also in the quantities that are used in the Nordsieck vector. For k > 1, the jump in h^k y^{(k)}(x_n) will be O(h^{p+3}), whereas

h y′_n(x_n) − h y′_{n−1}(x_n) = −φ h^{p+2} f′(y_n) y^{(p+1)}(x_n) + O(h^{p+3}),    (1.3)

where f′(y_n) denotes the matrix of partial derivatives ∂f/∂y evaluated at y_n. This linear operator will invariably occur in the context f′(y) y^{(p+1)}, where it operates on the vector y^{(p+1)}.
We can write the partitioned coefficient matrix in the form

⎡ A  U ⎤   ⎡ A    e   U  ⎤
⎣ B  V ⎦ = ⎢ b^T  1   v^T ⎥ ,    (1.4)
           ⎣ B    0   V  ⎦

where A ∈ R^{s×s}, U ∈ R^{s×p}, B ∈ R^{p×s}, V ∈ R^{p×p}, b ∈ R^s and v ∈ R^p. We will limit our investigations to methods in which the eigenvalues of V are in the open unit disc. With this terminology, the process of computing y_n and y^{[n]} from the input approximations y_{n−1} and y^{[n−1]} takes the form

Y = e y_{n−1} + A hF + U y^{[n−1]},
F_i = f(Y_i),   i = 1, 2, …, s,
y_n = y_{n−1} + b^T hF + v^T y^{[n−1]},
y^{[n]} = B hF + V y^{[n−1]}.    (1.5)
We have chosen, for ease of notation, not to include the Kronecker products in the formulation of the method or in the remainder of the paper. For example, given α ∈ R^p, α h^{p+1} y^{(p+1)}(x_n) will denote the vector in R^{pm} consisting of p copies of h^{p+1} y^{(p+1)}(x_n), scaled consecutively by the elements of α. The vector e is defined as e = [1, …, 1]^T ∈ R^s, so that e y_{n−1} denotes a vector made up from s copies of y_{n−1}. In (1.5), the internal stages are denoted by Y_i and the corresponding stage derivatives F_i are approximations of stage order q to y(x_{n−1} + c_i h) and y′(x_{n−1} + c_i h) for i = 1, 2, …, s, where c = [c_1, …, c_s]^T denotes the abscissa vector. Note that Y and F denote the stacked vectors

Y = [Y_1, Y_2, …, Y_s]^T,    F = [F_1, F_2, …, F_s]^T.
It will be assumed that the vectors y^{[n−1]} and y^{[n]} are approximations of order p to the Nordsieck vectors z(x_{n−1}, h) and z(x_n, h), respectively, where z(x, h) is defined by

z(x, h) = [ h y′(x), h² y″(x), …, h^p y^{(p)}(x) ]^T,    (1.6)

and the scaled derivatives at x_{n−1} and x_n are computed from right-derivatives of the discontinuous function y or, what is equivalent, as the limit of z(x, h) as x tends to x_{n−1} or x_n, respectively, from above. To achieve order p we only need to approximate the Nordsieck vector at the end of step number n to within O(h^{p+1}). However, we want to use linear combinations of the stage derivatives F and the input data y^{[n−1]} in each step to estimate h^{p+1} y^{(p+1)}(x_n) and h^{p+2} y^{(p+2)}(x_n), so we need to analyse the quantities passed from step to step to within O(h^{p+3}). Although we will need to carry out this task in a variable stepsize environment, we will consider first the constant stepsize case. Even if we started a sequence of steps with an exact Nordsieck vector, this accuracy would not persist into later steps. In fact, perturbations consisting of combinations of h^{p+1} y^{(p+1)}(x_n),
h^{p+2} y^{(p+2)}(x_n) as well as of h^{p+2} f′(y_n) y^{(p+1)}(x_n) would be introduced into every step. Thus we can assume that

y^{[n]} = z(x_n, h) − β_n h^{p+1} y^{(p+1)}(x_n) − γ_n h^{p+2} y^{(p+2)}(x_n) − δ_n h^{p+2} f′(y_n) y^{(p+1)}(x_n) + O(h^{p+3}).    (1.7)
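For intuition, the Nordsieck vector (1.6) can be written down exactly for the scalar test problem y′ = λy, whose kth derivative is λ^k y. The sketch below (all names are ours, not the paper's) also shows how the components rescale when the stepsize changes:

```python
import numpy as np

def nordsieck(lam, x, h, p):
    """z(x, h) of (1.6) for the scalar test problem y' = lam * y,
    whose solution y(x) = exp(lam * x) has y^(k)(x) = lam**k * y(x)."""
    y = np.exp(lam * x)
    return np.array([(h * lam) ** k * y for k in range(1, p + 1)])

# Halving h scales the k-th component by 2**(-k): exactly the rescaling
# D(r) = diag(r, ..., r^p) with r = 1/2 used by the scale and modify process.
z  = nordsieck(-1.0, 0.0, 0.2, 3)
z2 = nordsieck(-1.0, 0.0, 0.1, 3)
```

This makes concrete why a stepsize change multiplies the kth Nordsieck component by r^k, which is the starting point of Section 3.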
We will discuss in Section 2 the relationship between (β_n, γ_n, δ_n) and the values these quantities have in the previous step. It will be shown that in the constant stepsize case they converge to fixed values and we can write (β, γ, δ) for these limiting values. When the stepsize is allowed to vary from step to step, our aim will be to apply a generalization of a "scale and modify" procedure introduced in [5] to the output from the step, so that in every step (1.7) can be regarded as being true, with the limiting values (β_n, γ_n, δ_n) = (β, γ, δ). Further details of how this can be achieved will be discussed in Sections 2–4. Even though we have limited our scope to methods which pass a Nordsieck vector from step to step, we further refine the class of methods by requiring that the internal stage approximations are of the same order as the solution approximation. In this situation, the stage order and order conditions take the simple form described in the following theorem; compare, for example, [10].

Theorem 1. A GLM in Nordsieck form has order and stage order p if and only if

φ_1(cz) = A exp(cz) + U Z + O(z^p),
φ_1(z) = b^T exp(cz) + v^T Z + O(z^p),    (1.8)
exp(z) Z = B exp(cz) + V Z + O(z^p),

where z is a complex parameter and the basis vector Z = [1, z, …, z^{p−1}]^T. The rational function φ_1(z) = (exp(z) − 1)/z, and both the exp and φ_1 functions are applied component-wise to a vector.

As a direct consequence of the above theorem, the matrices U, v^T and V can be chosen so that the order and stage order are guaranteed to equal p. Eqs. (1.8) are equivalent to

U = D − AC,
v^T = P − b^T C,    (1.9)
V = E − BC,

where the Vandermonde matrices C and D are

C = [ e  c  c²/2!  ⋯  c^{p−1}/(p−1)! ],    D = [ c  c²/2!  c³/3!  ⋯  c^p/p! ],

and the vector P and Toeplitz matrix E are

P = [ 1  1/2!  1/3!  ⋯  1/p! ],    E = exp(K),    K = [ 0  e_1  e_2  ⋯  e_{p−1} ].

This can be seen by noting that φ_1(cz) = DZ + O(z^p), φ_1(z) = PZ + O(z^p), exp(cz) = CZ + O(z^p) and exp(z)Z = EZ + O(z^p). The results of this paper will apply only to methods where the stage order is equal to the order and an approximation of order O(h^{p+1}) to a Nordsieck vector is passed from step to step. Even with these restrictions in place, several well-known classes of methods satisfy these criteria and we now discuss some of them.
As proposed by Nordsieck in [21], an efficient implementation of linear multistep methods reinterprets the information passed from step to step so as to approximate a Nordsieck vector. Since linear multistep methods can be represented as one-stage GLMs with stage order equal to the order, these methods satisfy our criteria. Similarly, the criteria are satisfied by composite linear multistep methods, in which a selection of methods is used over a series of smaller steps and interpreted as one step of a larger method. The methods which make up the individual components will typically have inferior properties to the composite scheme. The predict–evaluate–correct (PEC) and predict–evaluate–correct–evaluate (PECE) schemes, or variants (see [20]), can also be represented in this framework, provided that the data passed from step to step is transformed into Nordsieck form and the method has stage order equal to the order; the reader is referred to [2] for further details. The Nordsieck representation of DIMSIMs, which was introduced in [3] (compare also [4,16]), corresponds to the case when s = p, V = 0 and the stability function has only one nonzero eigenvalue. This representation was inspired by the classical paper [21]. Results concerning the construction and implementation of DIMSIMs are discussed in [1,11,17]. GLMs with inherent Runge–Kutta stability (IRKS), investigated in [10,25], correspond to methods where s = p + 1 and the stability function has only one nonzero eigenvalue. These methods have many attractive properties (compare [6,7]) and their utilization as building blocks of powerful new algorithms for both nonstiff and stiff differential systems is the subject of recent work [8]. Approximating the local truncation error in a step is clearly necessary to achieve any sort of rational stepsize control. However, we will also wish to provide for variable order, and this requires the assessment of the relative efficiencies of several alternative methods.
This will include the method currently in operation and a contending method of one higher order. In the present paper, in Section 3, we show how to estimate both h^{p+1} y^{(p+1)} and h^{p+2} y^{(p+2)}, thus allowing for a reliable assessment of the relative advantages of retaining order p or increasing the order to p + 1. Observe also that the last component of y^{[n]} carries an approximation to h^p y^{(p)}, which allows for the local error estimation of the method of order p − 1. The approach we use in this paper is not the only way of making this comparison dynamically, and we draw attention to a recent paper [9] which provides an alternative approach. In Section 2 we will discuss starting methods and the underlying one-step method, providing the motivation for studying the error propagation of methods. Section 3 investigates error propagation and the scale and modify process which ensures that (1.7) can be regarded as true, even when the stepsize is varied. In Section 4 we estimate the local error h^{p+1} y^{(p+1)}(x_n) and the quantities h^{p+2} y^{(p+2)}(x_n) and h^{p+2} f′(y_n) y^{(p+1)}(x_n) which are needed to compute the corrections in the scale and modify process. In Section 5 a zero-stability analysis is provided. A selection of methods of orders two and three, along with error estimates and regions of zero-stability, is included in Section 6. Several numerical experiments, which validate the aims of this paper, are given in Section 7. The main results obtained in this paper are briefly summarized in Section 8.
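For illustration, here is a standard way in which such adjacent-order error estimates can feed a step-and-order controller. This is our own schematic sketch, not the strategy developed in this paper:

```python
def suggest_step_and_order(h, p, est_p, est_q, tol, safety=0.9):
    """Schematic controller: est_p ~ ||h^{p+1} y^{(p+1)}|| is the local error
    estimate at the current order p; est_q ~ ||h^{p+2} y^{(p+2)}|| that of the
    candidate order p + 1.  Each optimal step solves est * (h_new/h)^(order+1) = tol."""
    h_keep  = safety * h * (tol / est_p) ** (1.0 / (p + 1))
    h_raise = safety * h * (tol / est_q) ** (1.0 / (p + 2))
    # keep whichever order permits the larger next step
    return (h_raise, p + 1) if h_raise > h_keep else (h_keep, p)

h_new, p_new = suggest_step_and_order(0.1, 2, est_p=1e-6, est_q=1e-8, tol=1e-6)
```

With the illustrative numbers above, the higher-order error estimate is much smaller, so the controller raises the order and enlarges the step.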
2. Starting methods and the underlying one-step method

To understand the scale and modify procedure that forms the basis of this paper, we focus attention on the relationship between y_n, the approximation to y(x_n), and y^{[n]}, the vector made up from the remaining components of the full output vector at the end of step number n. We will assume that we are attempting to approximate an idealized quantity which we will write as S y_n. Define F as the mapping which selects from the full output just the first subvector y_n. Thus F ∘ S = id. This
assumes that we have a suitable starting procedure S defined so that S ◦ R = M ◦ S,
(2.1)
or, what is equivalent, R = F ∘ M ∘ S, where M denotes the action of applying the method to the input data available at the end of the step and R is the underlying one-step method. The concept of the underlying one-step method was introduced by Kirchgraber [19] in the context of linear multistep methods and extended to GLMs by Stoffer [24]. We also refer to [2,25] for additional discussion of underlying one-step methods in the context of GLMs. Actually it will be sufficient if (2.1) holds only to within O(h^{p+3}) accuracy, because our aim will be to look for estimates of quantities which behave like h^{p+1} and h^{p+2}. To actually calculate S y_{n−1} at the start of step n we need to have some way of calculating the Nordsieck vector, with appropriate modifications of the h^{p+1} and h^{p+2} terms. Let y(x) denote the solution to (1.1a), restricted to the interval [x_{n−1}, x_n) = [x_{n−1}, x_{n−1} + h), and with initial value given by y(x_{n−1}) = y_{n−1}. To actually evaluate S y_{n−1}, we need to calculate the various high derivatives of y(x) at x = x_{n−1}. In practice we will only want to work to within O(h^{p+3}), so that elementary differentials up to order p + 2 are needed, but not those of higher order. We will now find specific formulae for (β_n, γ_n, δ_n) along with various contributions to the local truncation error. This will make it possible to evaluate the first few terms in the definition of S.

Theorem 2. Suppose the input to step number n consists of

y_{n−1} = y(x_{n−1}),
y^{[n−1]} = z(x_{n−1}, h) − β_{n−1} h^{p+1} y^{(p+1)}(x_{n−1}) − γ_{n−1} h^{p+2} y^{(p+2)}(x_{n−1}) − δ_{n−1} h^{p+2} f′(y_{n−1}) y^{(p+1)}(x_{n−1}) + O(h^{p+3}).

Then the stage values, scaled stage derivatives and output values are given by

Y = y(x_{n−1} + ch) − ξ h^{p+1} y^{(p+1)}(x_{n−1}) + O(h^{p+2}),    (2.2a)
hF = h y′(x_{n−1} + ch) − ξ h^{p+2} f′(y_{n−1}) y^{(p+1)}(x_{n−1}) + O(h^{p+3}),    (2.2b)
y_n = y_{n−1}(x_n) − φ h^{p+1} y^{(p+1)}(x_n) − ψ h^{p+2} y^{(p+2)}(x_n) − χ h^{p+2} f′(y_n) y^{(p+1)}(x_n) + O(h^{p+3}),    (2.2c)
y^{[n]} = z(x_n, h) − β_n h^{p+1} y^{(p+1)}(x_n) − γ_n h^{p+2} y^{(p+2)}(x_n) − δ_n h^{p+2} f′(y_n) y^{(p+1)}(x_n) + O(h^{p+3}),    (2.2d)

where

ξ = c^{p+1}/(p+1)! − A c^p/p! + U β_{n−1},
φ = 1/(p+1)! − b^T c^p/p! + v^T β_{n−1},
ψ = 1/(p+2)! − b^T c^{p+1}/(p+1)! + v^T γ_{n−1} − φ,
χ = b^T ξ + v^T δ_{n−1},
β_n = E_p − B c^p/p! + V β_{n−1},
γ_n = E_{p+1} − B c^{p+1}/(p+1)! + V γ_{n−1} − β_n,
δ_n = B ξ − e_1 φ + V δ_{n−1},

and E_p, E_{p+1} denote the vectors of coefficients of z^p and z^{p+1} in exp(z)Z, that is, with ith components 1/(p+1−i)! and 1/(p+2−i)!, respectively.

Proof. To verify (2.2a), expand the two sides by Taylor's theorem; (2.2b) follows by noting that if two vectors differ by ξ h^{p+1} y^{(p+1)}(x_{n−1}) + O(h^{p+2}), then the results of applying hf to each of these vectors differ by ξ h^{p+2} f′(y_{n−1}) y^{(p+1)}(x_{n−1}) + O(h^{p+3}). Finally, (2.2c) and (2.2d) follow by further applications of Taylor's theorem about x_{n−1}, followed by rewriting using

y^{(p+1)}(x_{n−1}) = y^{(p+1)}(x_n) − h y^{(p+2)}(x_n) + O(h²).  □
Corollary 3. Let S denote the starting method in the definition of the underlying one-step method and write y^{[n−1]} for the final p components of S y_{n−1}. Then

y^{[n−1]} = z(x_{n−1}, h) − β h^{p+1} y^{(p+1)}(x_{n−1}) − γ h^{p+2} y^{(p+2)}(x_{n−1}) − δ h^{p+2} f′(y_{n−1}) y^{(p+1)}(x_{n−1}) + O(h^{p+3}),    (2.3)

where

β = (I − V)^{−1} (E_p − B c^p/p!),
γ = (I − V)^{−1} (E_{p+1} − B c^{p+1}/(p+1)! − β),    (2.4)
δ = (I − V)^{−1} (B ξ − e_1 φ).

Proof. Because ρ(V) < 1, it follows from Theorem 2 that the values of (β_n, γ_n, δ_n) converge to the given values of (β, γ, δ) as n → ∞.  □

These ideas are illustrated in Fig. 1, where R is the underlying one-step method and T is the local truncation error. To summarize this section, we have found a refinement to z(x, h), given by (1.6). In the constant stepsize case, y^{[n]} is an approximation to within O(h^{p+1}) to z(x_n, h). However, if we replace z(x_{n−1}, h) by an adjusted input based on (2.3), we will obtain an output of the same form. Because of the central role played by this adjusted target value for the rest of the paper we will write

z̄(x, h) = z(x, h) − β h^{p+1} y^{(p+1)}(x) − γ h^{p+2} y^{(p+2)}(x) − δ h^{p+2} f′(y(x)) y^{(p+1)}(x).    (2.5)
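The convergence claim behind the limiting values can be checked numerically: iterating a recursion of the form β_n = g + V β_{n−1} from any starting vector converges to the fixed point (I − V)^{−1} g whenever ρ(V) < 1. A small sketch with illustrative (hypothetical) data, not taken from any method in this paper:

```python
import numpy as np

# Hypothetical V with spectral radius 0.3 < 1, and a stand-in forcing term g
# playing the role of E_p - B c^p / p! in Theorem 2.
V = np.array([[0.0, 0.3],
              [0.1, 0.2]])
g = np.array([0.5, -0.25])

beta_fixed = np.linalg.solve(np.eye(2) - V, g)   # (I - V)^{-1} g, as in (2.4)

beta = np.zeros(2)
for _ in range(60):          # beta_n = g + V beta_{n-1}
    beta = g + V @ beta      # converges geometrically at rate rho(V)
```

After 60 iterations the iterate agrees with the closed-form fixed point to machine precision, mirroring the constant-stepsize convergence of (β_n, γ_n, δ_n).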
Fig. 1. A single step of the method M, with starting method S, finishing method F and underlying one-step method R. The local truncation error is represented by the symbol T .
3. Error propagation for GLMs

Our aim is to investigate the error propagation in a variable stepsize setting and to maintain the form of the approximate Nordsieck vector given by (1.7) as the stepsize varies. It is not possible to achieve this aim with the current form of the method (1.5), unless the expression for y^{[n]} is amended. As a first attempt to make this correction, the expression would be multiplied by the diagonal matrix D(r), where r = r_n = h_{n+1}/h_n and D(r) = diag(r, r², …, r^p). This will give the correct output to within O(h^{p+1}), but a further adjustment is needed. The multiplication by D(r) and the additional correction constitute the scale and modify process, and our aim will be to explore what is needed for the modify part of this process. This is described in the following theorem.

Theorem 4. Assume that the input to step n, for which the stepsize will be h_n = x_n − x_{n−1}, consists of y_{n−1} and y^{[n−1]} given by

y^{[n−1]} = z̄(x_{n−1}, h_n)
          = z(x_{n−1}, h_n) − β h_n^{p+1} y^{(p+1)}(x_{n−1}) − γ h_n^{p+2} y^{(p+2)}(x_{n−1}) − δ h_n^{p+2} f′(y_{n−1}) y^{(p+1)}(x_{n−1}) + O(h_n^{p+3}),    (3.1)

compare (2.5). Then the result of applying the scaling D(r_n) to h_n B F + V y^{[n−1]} is

z̄(x_n, h_{n+1}) − δ_1(r) h_n^{p+1} y^{(p+1)}(x_n) − δ_2(r) h_n^{p+2} y^{(p+2)}(x_n) − δ_3(r) h_n^{p+2} f′(y_n) y^{(p+1)}(x_n) + O(h^{p+3}),    (3.2)

where δ_1(r), δ_2(r) and δ_3(r) are defined by

δ_1(r) = (D(r) − r^{p+1} I) β,
δ_2(r) = (D(r) − r^{p+2} I) γ,    (3.3)
δ_3(r) = (D(r) − r^{p+2} I) δ.
Proof. The stage values, stage derivatives and output y_n from the step are given by Theorem 2, with h replaced by the current stepsize h_n and with the scaling by D(r_n) applied to the output. This gives a result

z(x_n, h_{n+1}) − r^{−(p+1)} D(r) β h_{n+1}^{p+1} y^{(p+1)}(x_n) − r^{−(p+2)} D(r) γ h_{n+1}^{p+2} y^{(p+2)}(x_n) − r^{−(p+2)} D(r) δ h_{n+1}^{p+2} f′(y_n) y^{(p+1)}(x_n) + O(h^{p+3}),

which can be written in the form (3.2) with δ_1(r), δ_2(r) and δ_3(r) given by (3.3).  □

If we can find reliable approximations for h_n^{p+1} y^{(p+1)}(x_n), h_n^{p+2} y^{(p+2)}(x_n) and h_n^{p+2} f′(y_n) y^{(p+1)}(x_n), using only quantities computed in step number n, we can then construct an approximation to

δ_1(r) h_n^{p+1} y^{(p+1)}(x_n) + δ_2(r) h_n^{p+2} y^{(p+2)}(x_n) + δ_3(r) h_n^{p+2} f′(y_n) y^{(p+1)}(x_n)

and add this to the scaled output from the step to yield an approximation to z̄(x_n, h_{n+1}). This will be the "scaled and modified" result. The overall numerical algorithm for the numerical solution of (1.1), based on the formula (1.5), the scale and modify process described above, and the estimation of the quantities h_n^{p+1} y^{(p+1)}(x_n), h_n^{p+2} y^{(p+2)}(x_n) and h_n^{p+2} f′(y(x_n)) y^{(p+1)}(x_n), will be described in Section 4.

4. Estimating the corrections

Once the stages have been evaluated in step n we have available the quantities h_n F and y^{[n−1]}. Our aim will therefore be to use approximations of the form

θ_i^T h_n F + η_i^T y^{[n−1]},   i = 1, 2, 3,

to estimate the quantities

h_n^{p+1} y^{(p+1)}(x_n),   h_n^{p+2} y^{(p+2)}(x_n)   and   h_n^{p+2} f′(y_n) y^{(p+1)}(x_n).
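Both ingredients of the scale and modify algorithm can be sketched in a few lines of code: the modify terms of (3.3), and the weight vectors of the estimators characterized in Theorem 5 below. The names modify_terms and estimator_weights are ours, and the numeric inputs at the bottom are illustrative placeholders, not the coefficients of any method in Section 6. With η^T = −θ^T C eliminated, each of the systems (4.2)–(4.4) reduces to three scalar equations for θ, which is square when s = p + 1 = 3:

```python
import numpy as np
from math import factorial

def modify_terms(r, p, beta, gamma, delta):
    """delta_i(r) of (3.3): the deficiency left by plain rescaling with D(r)."""
    Dr = np.diag([r**k for k in range(1, p + 1)])
    I = np.eye(p)
    return ((Dr - r**(p + 1) * I) @ beta,
            (Dr - r**(p + 2) * I) @ gamma,
            (Dr - r**(p + 2) * I) @ delta)

def estimator_weights(c, p, beta, gamma, xi, delta, phi, rhs):
    """Solve one of the systems of Theorem 5 for theta, with eta^T = -theta^T C
    substituted; rhs = (1, 1, 0) gives theta_1, (0, 1, 0) theta_2, (0, 0, -1) theta_3."""
    C = np.column_stack([c**k / factorial(k) for k in range(p)])
    e1 = np.zeros(p); e1[0] = 1.0
    A_sys = np.vstack([c**p / factorial(p) + C @ beta,
                       c**(p + 1) / factorial(p + 1) + C @ gamma,
                       xi - C @ (delta + phi * e1)])
    theta, *_ = np.linalg.lstsq(A_sys, np.asarray(rhs, float), rcond=None)
    return theta, -C.T @ theta

# Illustrative placeholder data with p = 2, s = 3:
c = np.array([0.5, 1.0, 1.0])
beta, gamma, delta = np.array([0.1, -0.2]), np.array([0.05, 0.02]), np.array([0.0, 0.1])
xi, phi = np.array([0.01, 0.02, 0.03]), 0.04

d1, d2, d3 = modify_terms(1.0, 2, beta, gamma, delta)   # all zero at r = 1
theta1, eta1 = estimator_weights(c, 2, beta, gamma, xi, delta, phi, (1.0, 1.0, 0.0))
```

Note that at r = 1 (constant stepsize) all three modify terms vanish, so the modification is only active when the stepsize actually changes. For s > 3 the reduced system is underdetermined and lstsq returns one of the solutions; any remaining freedom corresponds to free parameters of the method.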
Because each of the resulting three estimators will be O(h_n^{p+1}), it will follow that η_i^T = −θ_i^T C, where C is the Vandermonde matrix introduced for use in (1.9). Given this condition, these approximations are given in the following theorem.

Theorem 5. We have the following estimates:

θ_1^T h_n F + η_1^T y^{[n−1]} = h_n^{p+1} y^{(p+1)}(x_n) + O(h_n^{p+3}),
θ_2^T h_n F + η_2^T y^{[n−1]} = h_n^{p+2} y^{(p+2)}(x_n) + O(h_n^{p+3}),    (4.1)
θ_3^T h_n F + η_3^T y^{[n−1]} = h_n^{p+2} f′(y_n) y^{(p+1)}(x_n) + O(h_n^{p+3}),

where θ_i and η_i, i = 1, 2, 3, satisfy the linear systems of equations

θ_1^T C + η_1^T = 0,   θ_1^T c^p/p! − η_1^T β = 1,
θ_1^T c^{p+1}/(p+1)! − η_1^T γ = 1,   θ_1^T ξ + η_1^T (δ + e_1 φ) = 0,    (4.2)

θ_2^T C + η_2^T = 0,   θ_2^T c^p/p! − η_2^T β = 0,
θ_2^T c^{p+1}/(p+1)! − η_2^T γ = 1,   θ_2^T ξ + η_2^T (δ + e_1 φ) = 0,    (4.3)

θ_3^T C + η_3^T = 0,   θ_3^T c^p/p! − η_3^T β = 0,
θ_3^T c^{p+1}/(p+1)! − η_3^T γ = 0,   θ_3^T ξ + η_3^T (δ + e_1 φ) = −1.    (4.4)

Proof. As a step towards finding suitable values of the θ_i and η_i vectors, i = 1, 2, 3, we will approximate θ_i^T h_n F + η_i^T y^{[n−1]}, i = 1, 2, 3, assuming that η_i^T = −θ_i^T C, using Taylor series. This leads to

θ_i^T h_n F + η_i^T y^{[n−1]} = (θ_i^T c^p/p! − η_i^T β) h_n^{p+1} (y^{(p+1)}(x_n) − h_n y^{(p+2)}(x_n)) + (θ_i^T c^{p+1}/(p+1)! − η_i^T γ) h_n^{p+2} y^{(p+2)}(x_n) − (θ_i^T ξ + η_i^T (δ + e_1 φ)) h_n^{p+2} f′(y_n) y^{(p+1)}(x_n) + O(h_n^{p+3}).

Using this result for θ_i and η_i to achieve the approximations to within O(h_n^{p+3}) leads to the conditions that these coefficient vectors must satisfy, i.e. the systems (4.2)–(4.4).  □

To summarize the results obtained in Sections 3 and 4, the overall numerical algorithm for computing approximations to the solution of (1.1), consisting of the method (1.5), the scale and modify process described in Section 3, and the estimations of the quantities h_n^{p+1} y^{(p+1)}(x_n), h_n^{p+2} y^{(p+2)}(x_n) and h_n^{p+2} f′(y(x_n)) y^{(p+1)}(x_n) derived in this section, takes the form

Y = e y_{n−1} + A h_n F + U y^{[n−1]},
F_i = f(Y_i),   i = 1, 2, …, s,
y_n = y_{n−1} + b^T h_n F + v^T y^{[n−1]},
y^{[n]} = (D(r) B + δ_1(r) θ_1^T + δ_2(r) θ_2^T + δ_3(r) θ_3^T) h_n F + (D(r) V + δ_1(r) η_1^T + δ_2(r) η_2^T + δ_3(r) η_3^T) y^{[n−1]},    (4.5)
where δ_1(r), δ_2(r) and δ_3(r) are defined by (3.3) and the vectors θ_i and η_i, i = 1, 2, 3, are defined by the systems (4.2)–(4.4); compare also (4.1). This algorithm is a generalization of the approach presented in [5]. The difference with the formulation presented in [5] is that in that paper δ_2(r) and δ_3(r) were each zero and only δ_1(r) was able to be kept constant.

5. Zero-stability analysis

In this section we will analyze zero-stability properties of the overall scale and modify method given by (4.5). Applying this method to the test equation

y′ = 0,   y(0) = 1,
on the nonuniform grid {x_n} we obtain from (4.5) that

y_n = y_{n−1} + v^T y^{[n−1]},
y^{[n]} = (D(r_n) V + δ_1(r_n) η_1^T + δ_2(r_n) η_2^T + δ_3(r_n) η_3^T) y^{[n−1]}.

Here, r_n = h_{n+1}/h_n, the quantities δ_1(r_n), δ_2(r_n), δ_3(r_n) are defined by (3.3) and η_i, i = 1, 2, 3, are defined by (4.2)–(4.4). To simplify notation we define the amplification matrix M(r) as

M(r) = D(r) V + δ_1(r) η_1^T + δ_2(r) η_2^T + δ_3(r) η_3^T.    (5.1)

Expressing y^{[n]} in terms of the initial starting vector y^{[0]} leads to

y^{[n]} = M(r_n) M(r_{n−1}) ⋯ M(r_1) y^{[0]},

and the zero-stability of the method (4.5) is equivalent to the uniform boundedness of the product of matrices M(r_n) M(r_{n−1}) ⋯ M(r_1). We follow the approach proposed in [12,13] to find the conditions under which this is the case. This approach is based on the theory of the joint spectral radius and the notion of a polytope norm for a family of matrices. According to this theory, zero-stability of (4.5) would follow if we can construct a polytope norm ‖·‖_* in R^p such that the induced matrix norm, denoted by the same symbol, satisfies

‖M(r)‖_* ≤ 1,    (5.2)

for r ∈ [0, r^*]. These polytope norms are defined by their unit balls in R^p. Put

r^* = max { r : ρ(M(r)) ≤ 1 },

where ρ(M(r)) is the spectral radius of the amplification matrix M(r) given in (5.1). As explained in [12], often these polytope norms ‖·‖_* can be found by successively applying the matrix M(r^*) to the set of vectors S = {e_1, e_2, …, e_p}, where e_i are canonical basis vectors in R^p. If the vectors

M^j(r) P,   P ∈ S,   j = 1, 2, …,

are contained in a common convex hull, symmetric with respect to the origin, of some points in R^p for r ∈ [0, r^*], then this convex hull defines the unit ball of the polytope norm ‖·‖_* satisfying (5.2). This process was illustrated in [13] for the variable stepsize three-step backward differentiation method, in [5] for some GLMs of order p = 2, and in [18] for some two-step W-methods of order p = 2. In Section 6 we give the amplification matrix M(r) and the convex hull which defines the unit ball for various methods satisfying (4.5). For the methods presented in the next section the matrices M(r) will have the following block structure:

M(r) = ⎡ 0        0
       ⎣ M_1(r)   M_2(r) ⎦,
with square blocks on the diagonal. It can be verified that

M(r_n) ⋯ M(r_2) M(r_1) = ⎡ 0                               0
                         ⎣ M_2(r_n) ⋯ M_2(r_2) M_1(r_1)    M_2(r_n) ⋯ M_2(r_2) M_2(r_1) ⎦,

and it follows that if ‖M_2(r_n) ⋯ M_2(r_2) M_2(r_1)‖ ≤ C for some constant C, then the nonzero blocks of the matrix M(r_n) ⋯ M(r_2) M(r_1) can be bounded by C‖M_1(r_1)‖ and C, respectively. This means that the zero-stability properties of the underlying numerical methods whose stability matrix is M(r) are governed by the product of the nonzero diagonal blocks M_2(r_n) ⋯ M_2(r_2) M_2(r_1). We can take advantage of this fact to investigate the stability properties of methods for which the matrix M(r) has the structure described above. This will be illustrated in Section 6.

6. Examples of methods

To achieve stage order and order p requires at least p stages; all the methods reported in this section have s = p + 1 stages. This choice is prompted mainly because it is known that for this case IRKS methods can be derived using only linear operations [10]. For the methods introduced to compete with IRKS methods, s = p + 1 in every case; this simplifies a direct comparison of accuracy because the computational costs per step are the same. We have chosen an order two and an order three IRKS method and compared each with a PECE scheme (see [20]) of the same order. The first method, of order two, uses the composition of the order two Adams–Bashforth method over two steps of size h/2, then uses a PECE scheme with the composite Adams–Bashforth method as the predictor and the order two Adams–Moulton method as the corrector. We then reinterpret the data passed from step to step to approximate a Nordsieck vector. The overall method with error estimates is ⎡ ⎤ 0 0 0 1 21 81 ⎥ ⎡ ⎤ ⎢ 3 0 0 1 41 81 ⎥ ⎢ 4 A e U ⎢ ⎥ ⎢ T ⎥ ⎢ 1 1 1 ⎥ 1 0 1 ⎢b 1 vT ⎥ ⎢ 4 4 2 8 ⎥ ⎢ ⎥ ⎢ ⎥ 1 1 ⎢ B 0 V ⎥ ⎢ 1 0 1 2 81 ⎥ ⎢ ⎥ ⎢ 4 ⎥ 4 (6.1) ⎢ T 0 T ⎥ = ⎢ ⎥. 
⎢ 1 ⎢ 0 ⎥ 0 1 0 0 0 1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ T 0 T ⎥ ⎢ −2 0 2 0 0 0⎥ ⎢ ⎥ 2 ⎦ ⎣ 2 ⎢ ⎥ T −20 12 −4 0 12 2 T ⎢ ⎥ 3 0 3 ⎣ −24 16 −8 0 16 4 ⎦ 0 −16 16 0 0 0
For this method the abscissa vector is c = [1/2, 1, 1]^T, the error constant is φ = 1/24, and the vectors β, γ and δ given by formulae (2.4) are

β = [ 0, 1/4 ]^T,   γ = [ 0, −1/24 ]^T,   δ = [ 0, −1/48 ]^T.
The amplification matrix M(r) given by formula (5.1) takes the form

M(r) = ⎡ 0                        0
       ⎣ 7r²/3 − 3r³ + 2r⁴/3      r²/3 − r³/2 + r⁴/6 ⎦.

Applying the procedure described in Section 5, it can be verified that the condition (5.2) is satisfied for r ∈ [0, r^*], r^* ≈ 2.5747, for the polytope norm ‖·‖_* whose unit ball is a polytope with vertices P_1, P_2, P_3 and P_4 given by

P_1 = −P_3 = [1 0]^T,   P_2 = −P_4 = [0 6.4394]^T.
17 1944
and the vectors , and
108
27 1 3
7 − 108
5 − 108
The amplification matrix M( ) takes the form ⎡ 0 0 ⎢ 5 2 3 4 1 5 1 2 2 4 1 5 M( ) = ⎣ 4 − 2 + 4
2 − 3 + 6
47 3 4
−
27 4 2
+ 47 5
29 3 6
− 6 4 + 76 5
0 1 4 − 54
+ 1 3 27
−
1 5 54
1 4 7 5 6 + 54
⎤ ⎥ ⎦.
The condition (5.2) is satisfied for r ∈ [0, r^*], r^* = 1.621033683, for the polytope norm ‖·‖_* whose unit ball in the three-dimensional space (x, y, z) is a diamond-shaped region with vertices [1, 0, 0]^T and [−1, 0, 0]^T connected to the base in the (y, z) plane with vertices P_1, P_2, P_3, P_4, P_5 and P_6. This base is plotted in Fig. 2. The (x, y, z) coordinates of the points P_i are

P_1 = −P_4 = [0 1 0]^T,
P_2 = −P_5 = [0 4.2136 22.4684]^T,
P_3 = −P_6 = [0 4.2741 23.5764]^T.

We can reach the same conclusion about zero-stability by taking into account the special form of the matrix M(r). For the matrix M_2(r) given by the last two rows and columns of M(r), it is found that ρ(M_2(r)) ≤ 1 for r ∈ [0, r^*], where r^* is as above. The eigenvalues of M_2(r^*) are {−1, 0.0339637790} and furthermore the eigenvector matrix T is given by

T = ⎡ 0.0882162446   0.087063408
    ⎣ 0.4709452399   1.598368850 ⎦.

Form the matrix T^{−1} M_2(r) T and evaluate its ‖·‖_∞ norm. Then it is found that

‖T^{−1} M_2(r) T‖_∞ ≤ 1,   r ∈ [0, r^*].
The third example is a second order IRKS method. The free parameters have been chosen in such a way that the method is similar to the order two PECE scheme above. The method coefficients are ⎡ ⎡
A
⎢ T ⎢b ⎢ ⎢ B ⎢ ⎢ T ⎢ 1 ⎢ ⎢ T ⎣ 2 T3
e
U
1 vT 0 V 0 0 0
T1 T2 T3
0
0
0
⎢ 5 ⎢ 3 0 0 ⎢ ⎢ 1 23 ⎥ ⎢ 0 ⎥ ⎢ 24 8 ⎥ ⎢ ⎥ ⎢ 23 1 0 ⎥ ⎢ 24 8 ⎥=⎢ ⎥ ⎢ 0 0 1 ⎥ ⎢ ⎥ ⎢ −2 −2 4 ⎦ ⎢ ⎢ −20 −17 25 ⎢ ⎣ −24 −20 28 0 12 −12 ⎤
1 2 − 23 1 − 12
1 8 − 13 5 − 48
1 1 − 12
5 − 48
1 1 1
0 0
0 0
0 0
0 0 0
12 16 0
2 4 0
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
(6.3)
This method has abscissa vector c = [1/2, 1, 1]^T and error constant φ = −1/24. The vectors β, γ and δ, the amplification matrix M(r) and the polytope are the same as for the order two PECE method (6.1). The last example is a third order IRKS method. The free parameters have been chosen in such a way that the method is similar to the order three PECE scheme above. The method
Fig. 2. Bases in (y, z)-plane of unit balls in the polytope norms for the order three PECE scheme (Pi vertices) and the order three IRKS method (Qi vertices).
coefficients are
⎡
A
⎢ T ⎢b ⎢ ⎢ B ⎢ ⎢ T ⎢ 1 ⎢ ⎢ T ⎣ 2 T3
e
⎡
U
1 vT 0 V 0 T1 0 0
T2 T3
0
⎢ 3 ⎢ 5 ⎢ ⎢ ⎤ ⎢ 73 ⎢ ⎢ 529 ⎥ ⎢ 810 ⎥ ⎢ ⎥ ⎢ 529 ⎥ ⎢ 810 ⎥ ⎢ ⎥=⎢ 0 ⎥ ⎢ ⎥ ⎢ 1 ⎥ ⎢ − ⎦ ⎢ 6 ⎢ ⎢ −21 ⎢ ⎢ ⎢ 9 ⎢ ⎣ 36 0
0
1
1 3 1 15 1 − 14 23 − 270
1 18 1 45 1 − 14 14 − 405
151 14580
7 81
0
23 1 − 270
14 − 405
151 14580
1
0
0
0
0
1
0
0
0
1
9 14 28 81
0
0
1
7 81
28 81
0
0
− 31 6
− 14 3
9
−3
−21
27
− 171 2
−171
−99 0
1 162 13 810
0
0
0
0
1
1 6
1 − 108
0
18
3
− 16
207 0
81 2
18
1 2
−180 216 0 270 −270 0
27 0
18 0
2 0
(6.4)

This method has abscissa vector c = [1/3, 2/3, 1, 1]^T and error constant φ = 1/120. The vectors β, γ and δ are

β = [ 0, 1/27, 1/3 ]^T,   γ = [ 0, −1/108, −7/108 ]^T,   δ = [ 0, −1/324, −1/108 ]^T.
The amplification matrix M takes the form

[3 x 3 matrix whose entries are polynomials in the stepsize ratio]
The condition (5.2) is satisfied for all stepsize ratios in the interval [0, 1.547908766], for the polytope norm ||.||* whose unit ball in the three-dimensional space (x, y, z) is a diamond-shaped region with vertices [1, 0, 0]^T and [-1, 0, 0]^T connected to the base in the (y, z) plane with vertices Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, and Q10. This base is plotted in Fig. 2. The (x, y, z) coordinates of the points Qi are

Q1 = -Q6 = [0, 0.9987, -48.3857]^T,
Q2 = -Q7 = [0, 2.4927, -18.7892]^T,
Q3 = -Q8 = [0, 2.5442, -17.7671]^T,
Q4 = -Q9 = [0, 2.5459, -17.7313]^T,
Q5 = -Q10 = [0, 2.5459, -17.7296]^T.

As before we can reach the same conclusion about zero-stability using the approach described at the end of Section 5. For the matrix M2 given by the last two rows and columns of M, it is found that the spectral radius of M2 is at most 1 for all stepsize ratios in this interval. The eigenvalues of M2 are {-1, -0.0345118943}, and furthermore the eigenvector matrix T is given by

T = [  0.0820634390  0.0455181560 ;
      -0.5714535837  0.9016010973 ].

Again, form the matrix T^{-1} M2 T and evaluate its ||.||_inf norm. Then it is found that ||T^{-1} M2 T||_inf <= 1 for all stepsize ratios in this interval.
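The last step, conjugating by the eigenvector matrix T and checking the infinity norm, can be reproduced numerically. The sketch below is our own illustration rather than the paper's computation: since the entries of M2 are not reproduced here, it builds a stand-in 2 x 2 matrix with the quoted eigenvalues and eigenvector matrix T and then verifies the zero-stability certificate ||T^{-1} M2 T||_inf <= 1.

```python
def matmul(A, B):
    # product of two 2 x 2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    # inverse of a 2 x 2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(A):
    # maximum absolute row sum
    return max(sum(abs(x) for x in row) for row in A)

T = [[0.0820634390, 0.0455181560],
     [-0.5714535837, 0.9016010973]]
D = [[-1.0, 0.0], [0.0, -0.0345118943]]  # quoted eigenvalues of M2

# Stand-in for M2: the matrix with eigenvalue matrix D and eigenvector matrix T.
M2 = matmul(matmul(T, D), inv2(T))

cert = norm_inf(matmul(matmul(inv2(T), M2), T))
print(cert <= 1.0 + 1e-12)  # prints True: the certificate holds
```

Because T^{-1} M2 T is (up to rounding) the diagonal matrix D, the certificate norm equals the largest eigenvalue modulus, here exactly 1.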
We compare these results to those in [7], where it was proved that methods using a slightly modified form of (4.5), with the two quantities indexed 2 and 3 there identically zero, are zero-stable for any choice of stepsize sequence. However, this is at the expense of losing the ability to estimate the higher order terms of the form h_n^{p+2} y^{(p+2)}(x_n) and h_n^{p+2} f'(y(x_n)) y^{(p+1)}(x_n). The construction of highly stable methods (possibly unconditionally stable) which also allow for the estimation of terms of order p + 2 is the subject of ongoing work.

7. Numerical experiments

In this section, we test experimentally the reliability of the error estimates for y^{(p+1)}, y^{(p+2)} and f' y^{(p+1)} using the approach of Section 3. At the same time we wish to assess the accuracy of low order derivative estimates y', y'', ..., found from outgoing Nordsieck vector approximations. We apply the tests to the methods with p = 2 and 3 derived in Section 6, in each case using the
well-known Van der Pol equation (denoted by E2 in the DETEST set [15])

y1' = y2,                    y1(0) = 2,
y2' = (1 - y1^2) y2 - y1,    y2(0) = 0,        (7.1)
with integration interval [0, 8]. For such a simple system as this, it is possible to find formulae for y^{(i)}. Write y_i = y1^{(i-1)} = y2^{(i-2)}, i = 3, 4, ..., so that successive derivatives of the vector valued function y = [y1, y2]^T can be found as

y' = [y2, y3]^T,   y'' = [y3, y4]^T,   y^{(3)} = [y4, y5]^T,   ... .

Formulae for y3, y4, ... are

y3 = (1 - y1^2) y2 - y1,
y4 = (1 - y1^2) y3 - 2 y1 y2^2 - y2,
y5 = (1 - y1^2) y4 - 6 y1 y2 y3 - 2 y2^3 - y3,
y6 = (1 - y1^2) y5 - 8 y1 y2 y4 - 12 y2^2 y3 - 6 y1 y3^2 - y4,
y7 = (1 - y1^2) y6 - 10 y1 y2 y5 - 30 y2 y3^2 - 20 y2^2 y4 - 20 y1 y3 y4 - y5,

and we also have available the Jacobian matrix

f'(y) = df/dy = [ 0, 1 ; -2 y1 y2 - 1, 1 - y1^2 ],

which is needed to evaluate the required values of f' y^{(p+1)} for p = 2, 3. The numerical experiments were performed in a variable stepsize environment based on the formula
h_{n+1} = h_n (0.5 Tol / est(p, x_n))^{1/(p+1)},        (7.2)

where the factor 0.5 is a safety factor. Formula (7.2) corresponds to the standard step changing strategy without any limiters or exceptions; compare for example [14,22,23]. Here, Tol is a given accuracy tolerance, and the estimate est(p, x_n) of the local discretization error, formed as a linear combination of the scaled stage derivatives h_n F and the incoming Nordsieck vector y^{[n-1]},
is used to estimate h_n^{p+1} y^{(p+1)}(x_n); compare with Eqs. (4.1) and (4.2). In Fig. 3, we have plotted the norms of the derivative expressions required to compare order two behaviour with that of contending methods of orders one and three. This is compared with these quantities computed using the order two PECE scheme and the order two IRKS scheme, both of which were given in Section 6. Apart from a slight phase shift in the O(h^{p+2}) approximations, the numerical estimations are found to be quite accurate; in fact they are much more accurate than the estimates obtained without the scale and modify procedure. In Fig. 4 we have repeated the experiment reported in Fig. 3 but now using order three methods. Although a variable order solver would normally permit switching from orders three to one, we
Fig. 3. Each of the derivative expressions (solid line) y'(x), y''(x), y^{(3)}(x), y^{(4)}(x) and f'(y(x)) y^{(3)}(x), from top to bottom, plotted over the integration interval along with approximations to these quantities computed using order two methods: PECE scheme (filled circles) and IRKS scheme (open circles). In each case the tolerance was 10^{-3}.
have not included the values of y'(x) in this figure, because the estimates are exact, as they were for the order two methods in Fig. 3. As for the order two experiments, there is a phase shift in the O(h^{p+2}) approximations, but otherwise the results confirm the ability to estimate the quantities we need in a variable order strategy. Examining the numerical experiments shows that the quality of the estimators for the higher order terms

h_n^{p+1} y^{(p+1)}(x_n),   h_n^{p+2} y^{(p+2)}(x_n)   and   h_n^{p+2} f'(y(x_n)) y^{(p+1)}(x_n)
Fig. 4. Each of the derivative expressions (solid line) y''(x), y^{(3)}(x), y^{(4)}(x), y^{(5)}(x) and f'(y(x)) y^{(4)}(x), from top to bottom, plotted over the integration interval along with approximations to these quantities computed using order three methods: PECE scheme (filled circles) and IRKS scheme (open circles). In each case the tolerance was 10^{-4}.
is reasonably good and sufficient for practical purposes. The exact quality of these estimates depends very much on the method chosen. Future work will focus on determining how we can identify which methods are the most suitable and use these methods in a variable stepsize, variable order environment.

8. Concluding remarks

In this paper we have considered the error propagation of a subclass of GLMs which have stage order equal to the order and pass a Nordsieck vector from step to step. This choice
makes it possible to estimate the elementary differentials h_n^{p+1} y^{(p+1)}(x_n), h_n^{p+2} y^{(p+2)}(x_n) and h_n^{p+2} f'(y(x_n)) y^{(p+1)}(x_n) to within O(h_n^{p+3}). This makes available an asymptotically correct local error estimator and also an asymptotically correct local error estimator of a method of one higher order. This information can be used by an algorithm to effectively choose the most efficient scheme from the methods of order {p - 1, p, p + 1}, allowing an increase or a decrease in the order as is most suitable.

Acknowledgments

One of the authors (Z.J.) acknowledges support from the National Science Foundation under Grant DMS-0509597. Another (W.M.W.) was supported by the Royal Society of New Zealand. Each of the authors (especially J.C.B.) acknowledges assistance and support from the Marsden Fund. The helpful and constructive comments by the referees have led to substantial improvements; the authors gratefully acknowledge this assistance.

References

[1] J.C. Butcher, Diagonally-implicit multi-stage integration methods, Appl. Numer. Math. 11 (1993) 347–363.
[2] J.C. Butcher, The Numerical Solution of Ordinary Differential Equations, Wiley, New York, 2003.
[3] J.C. Butcher, P. Chartier, Z. Jackiewicz, Nordsieck representation of DIMSIMs, Numer. Algorithms 16 (1997) 209–230.
[4] J.C. Butcher, P. Chartier, Z. Jackiewicz, Experiments with a variable-order type 1 DIMSIM code, Numer. Algorithms 22 (1999) 237–261.
[5] J.C. Butcher, Z. Jackiewicz, A new approach to error estimation for general linear methods, Numer. Math. 95 (2003) 487–502.
[6] J.C. Butcher, Z. Jackiewicz, Construction of general linear methods with Runge–Kutta stability properties, Numer. Algorithms 36 (2004) 53–72.
[7] J.C. Butcher, Z. Jackiewicz, Unconditionally stable general linear methods for ordinary differential equations, BIT 44 (2004) 557–570.
[8] J.C. Butcher, Z. Jackiewicz, Towards a code for nonstiff differential systems based on general linear methods with inherent Runge–Kutta stability, in preparation.
[9] J.C. Butcher, H. Podhaisky, On error estimation in general linear methods for stiff ODEs, Appl. Numer. Math. 56 (2006) 345–357.
[10] J.C. Butcher, W.M. Wright, The construction of practical general linear methods, BIT 43 (2003) 695–721.
[11] P. Chartier, The potential of parallel multi-value methods for the simulation of large real-life problems, CWI Quarterly 11 (1998) 7–32.
[12] N. Guglielmi, M. Zennaro, On the asymptotic properties of a family of matrices, Linear Algebra Appl. 322 (2001) 169–192.
[13] N. Guglielmi, M. Zennaro, On the zero-stability of variable stepsize multistep methods: the spectral radius approach, Numer. Math. 88 (2001) 445–458.
[14] E. Hairer, S.P. Nørsett, G. Wanner, Solving Ordinary Differential Equations I. Nonstiff Problems, Springer, Berlin, Heidelberg, New York, 1993.
[15] T.E. Hull, W.H. Enright, B.M. Fellen, A.E. Sedgwick, Comparing numerical methods for ordinary differential equations, SIAM J. Numer. Anal. 9 (1972) 603–637.
[16] Z. Jackiewicz, Implementation of DIMSIMs for stiff differential systems, Appl. Numer. Math. 42 (2002) 251–267.
[17] Z. Jackiewicz, Construction and implementation of general linear methods for ordinary differential equations. A review, J. Sci. Comput. 25 (2005) 29–49.
[18] Z. Jackiewicz, H. Podhaisky, R. Weiner, Construction of highly stable two-step W-methods for ordinary differential equations, J. Comput. Appl. Math. 167 (2004) 389–403.
[19] U. Kirchgraber, Multistep methods are essentially one-step methods, Numer. Math. 48 (1986) 85–90.
[20] J.D. Lambert, Computational Methods in Ordinary Differential Equations, Wiley, Chichester, New York, 1973.
[21] A. Nordsieck, On numerical integration of ordinary differential equations, Math. Comp. 16 (1962) 22–49.
[22] L.F. Shampine, Numerical Solution of Ordinary Differential Equations, Chapman & Hall, New York, London, 1994.
[23] L.F. Shampine, I. Gladwell, S. Thompson, Solving ODEs with MATLAB, Cambridge University Press, Cambridge, 2003.
[24] D. Stoffer, General linear methods: connection to one step methods and invariant curves, Numer. Math. 64 (1993) 395–408.
[25] W.M. Wright, General linear methods with inherent Runge–Kutta stability, Ph.D. Thesis, The University of Auckland, New Zealand, 2003.
Journal of Complexity 23 (2007) 581 – 593 www.elsevier.com/locate/jco
On the existence of higher order polynomial lattices based on a generalized figure of merit

Josef Dick^a, Peter Kritzer^b, Friedrich Pillichshammer^c, Wolfgang Ch. Schmid^{b,*}

a University of New South Wales Asia, 1 Kay Siang Road, Singapore 248922, Singapore
b Fachbereich Mathematik, Universität Salzburg, Hellbrunnerstraße 34, A-5020 Salzburg, Austria
c Institut für Finanzmathematik, Universität Linz, Altenbergerstraße 69, A-4040 Linz, Austria
Received 9 October 2006; accepted 29 December 2006. Available online 30 January 2007.
Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract
Dick and Pillichshammer recently introduced generalized rank-1 polynomial lattices which can be viewed as digital (t, α, β, n × m, s)-nets as introduced by the first author. In this work we generalize the figure of merit of rank-1 polynomial lattices such that the new figure of merit is related to the t-value when one views the rank-1 polynomial lattice as a digital (t, α, β, n × m, s)-net. Then we show the existence of rank-1 polynomial lattices for which the generalized figure of merit satisfies a certain condition. We present some numerical results comparing the corresponding t-value to known explicit constructions.
© 2007 Elsevier Inc. All rights reserved.
MSC: 65D32; 65D30; 11K36
Keywords: Digital net; Polynomial lattice; Figure of merit
1. Introduction

Digital (t, m, s)-nets in base b (see [8–11]) are useful for the numerical integration of functions with bounded variation over the high-dimensional unit cube. Recently, generalized digital nets (so-called digital (t, α, β, n × m, s)-nets in base b) were introduced in [2,3], which are also useful
∗ Corresponding author.
E-mail addresses:
[email protected] (J. Dick),
[email protected] (P. Kritzer),
[email protected] (F. Pillichshammer),
[email protected] (W.Ch. Schmid). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2006.12.003
J. Dick et al. / Journal of Complexity 23 (2007) 581 – 593
for the numerical integration of smooth functions. First constructions of such generalized digital nets were also introduced in [2,3]. There is a subclass of "classical" digital nets called polynomial lattices [9,10], which was generalized in [4] to fit the new framework of (t, α, β, n × m, s)-nets introduced by the first author. Note that in this paper we only consider rank-1 polynomial lattices, but for short we refer to them as polynomial lattices. Various results on the existence of such polynomial lattices are known for the classical case [5,12], and in this paper we show results on the existence of polynomial lattices within the new framework depending on their digital (t, α, β, n × m, s)-net properties (the precise definition of such digital nets will be given below). In particular, a result which relates the t-value of a classical polynomial lattice to its figure of merit [10] is generalized here to a relation between the type of digital nets considered in [2,3] and the polynomial lattices introduced in [4]. More precisely, we generalize the figure of merit to higher orders α > 1 and relate it to the t-value when one considers those polynomial lattices as digital (t, α, β, n × m, s)-nets. The relevance of such constructions for numerical integration will be explained in the following.

Consider the Sobolev space H_{sob,s,α} of functions f : [0, 1]^s -> R whose partial mixed derivatives up to order α in each variable are square integrable. Here s >= 1 and α >= 1 (for α = 1 we obtain the classical case which has been considered in many papers, see for example [10,14] and the references therein). For the one-dimensional case the inner product is given by

<f, g>_{H_{sob,1,α}} = Σ_{τ=0}^{α-1} (∫_0^1 f^{(τ)}(x) dx) (∫_0^1 g^{(τ)}(x) dx) + ∫_0^1 f^{(α)}(x) g^{(α)}(x) dx,

where f^{(τ)} denotes the τth derivative of f and where f^{(0)} = f. The reproducing kernel (see [1] for more information about reproducing kernels) for this space is given by

K_{sob,1,α}(x, y) = Σ_{τ=0}^{α} B_τ(x) B_τ(y) / (τ!)^2 + (-1)^{α+1} B_{2α}(|x - y|) / (2α)!,
where B_τ denotes the Bernoulli polynomial of degree τ; e.g., we have B_0(x) = 1, B_1(x) = x - 1/2, B_2(x) = x^2 - x + 1/6 and so on. For dimension s > 1, H_{sob,s,α} refers to the tensor product of s such one-dimensional spaces.

In [14] the authors introduced the notion of weighted function spaces. To this end they introduced a sequence γ = {γ_u : u ⊆ {1, ..., s}} of positive weights to model the importance of different coordinate directions of the integrand. For an exact definition of the weighted version H_{sob,s,α,γ} of the function space H_{sob,s,α} and for more information on those or related spaces in our context we refer to [3] (see also [2] for a version for periodic functions).

Consider now numerical integration of functions from the weighted Sobolev space H_{sob,s,α,γ} by a quasi-Monte Carlo rule Q_{b^m,s} based on digital nets, i.e., an equal weight quadrature rule with the quadrature points taken from a digital net consisting of b^m points. We measure the quality of the quadrature points using the worst-case error e(Q_{b^m,s}, H_{sob,s,α,γ}), which is the worst performance of our quasi-Monte Carlo rule Q_{b^m,s} over all functions in the unit ball of the Sobolev space H_{sob,s,α,γ}. Using these notations, the following theorem was shown in [3].

Theorem 1. Let α >= 1 be an integer and b >= 2 be a prime number. The worst-case error for multivariate integration in the Sobolev space H_{sob,s,α,γ}, using a digital (t, α, β, n × m, s)-net
over F_b (with 0 < β <= 1) as quadrature points, is bounded by

e(Q_{b^m,s}, H_{sob,s,α,γ}) <= b^{-β(n-t)} ( Σ_{∅ ≠ u ⊆ {1,...,s}} γ_u C_{|u|,b,α}^2 (n - t + α)^{2α|u|} )^{1/2},

where C_{|u|,b,α} is an explicit constant expressible in terms of a quantity C_{b,α},
and C_{b,α} > 0 is a constant depending only on b and α.

Among other things, in [3] it was shown that for each α, m and s there is a digital (t, α, 1, m × m, s)-net (i.e., β = 1 and n = m) and that one can explicitly construct such digital nets with the t-value being independent of m. By using such a digital (t, α, 1, m × m, s)-net one can obtain a convergence rate of order N^{-α} (log N)^{αs} (where N = b^m denotes the number of points), see [3] (and also [2] for similar results). Note that the t-value is a quality parameter of such digital nets (see below for the precise definitions). Smaller values of t imply better bounds on the worst-case error (compare with Theorem 1).

The aim of this paper is to show that computer search methods for finding good digital (t, α, β, n × m, s)-nets (via generalized polynomial lattices) can be useful. More precisely, we show here that for certain values of α, n, m, s, b we can find digital (t, α, β, n × m, s)-nets, using computer search based on generalized rank-1 polynomial lattices, with a lower t-value than the known constructions from [2,3]. As for classical (t, m, s)-nets in base b, there are purely theoretical constructions, but for some parameters m, s, b computer search methods provide nets with smaller t-value, i.e., higher quality (see the web-based database system MINT available at the address http://mint.sbg.ac.at/). As will be shown in this paper, the same happens for the generalized digital nets introduced in [2,3]. Unfortunately, our results here are not explicit, as opposed to the constructions in [2,3]. On the other hand, our results show that there is still room for improvement upon the constructions for digital nets proposed in [2,3] (though the same cannot be inferred from our paper for digital sequences). Just as for classical polynomial lattices, the asymptotic results on the t-value are not as good as the ones from theoretical constructions, see [5] for the classical case, which also appears in our case here.
Hence the improvement upon the theoretical constructions is not in general terms (i.e., asymptotically) but rather for specific instances of α, n, m, s and b. At the end of the paper we give numerical results comparing the t-values obtained in this paper with the ones obtained using the construction in [2,3] based on known explicit constructions. From there one can see that in some cases computer search methods can produce digital nets of higher quality than one can obtain from the explicit constructions proposed in [2,3].

2. Digital nets and polynomial lattices

In this section we introduce digital nets and polynomial lattices which can achieve arbitrarily high convergence rates of the integration error for suitably smooth functions (see [2,3]). This is achieved by a slight generalization of the classical definition of digital nets (i.e., we consider generating matrices of size n × m instead of m × m); see [8–10], and [11] for a recent survey article on digital nets. In the following let b be a prime and let F_b denote the finite field of order b.
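Before the formal definition, the digital construction used throughout (multiply generating matrices by the digit vector of h over F_b, then read the result back as b-adic digits) can be sketched in a few lines. The following is our own illustration, not from the paper; the generating matrices chosen here, the identity and the anti-diagonal, are an assumed toy example and produce the two-dimensional Hammersley-type net in base 2:

```python
def digital_net(Cs, b, m, n):
    # x_h for 0 <= h < b^m: multiply each n x m generating matrix by the
    # b-adic digit vector of h over F_b, then read the result b-adically
    points = []
    for h in range(b ** m):
        digits = [(h // b ** k) % b for k in range(m)]  # b-adic digits of h
        point = []
        for C in Cs:  # one generating matrix per coordinate
            y = [sum(C[i][k] * digits[k] for k in range(m)) % b
                 for i in range(n)]
            point.append(sum(y[i] / b ** (i + 1) for i in range(n)))
        points.append(tuple(point))
    return points

b, m, n = 2, 3, 3
I = [[int(i == k) for k in range(m)] for i in range(n)]           # identity
J = [[int(i + k == m - 1) for k in range(m)] for i in range(n)]   # anti-diagonal
net = digital_net([I, J], b, m, n)
print(net)
```

With these matrices the first coordinate is the base-2 radical inverse of h and the second coordinate is h/b^m, i.e., the classical Hammersley point set.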
Definition 1 (Digital net). Let b be a prime and let s, m, n >= 1 be integers. Let C_1, ..., C_s be n × m matrices over the finite field F_b. We construct b^m points in [0, 1)^s in the following way: for 0 <= h < b^m let h = h_0 + h_1 b + ... + h_{m-1} b^{m-1} be the b-adic expansion of h. Identify h with the vector h = (h_0, ..., h_{m-1})^T ∈ F_b^m, where T means the transpose of the vector. For 1 <= j <= s multiply the matrix C_j by h, i.e.,

C_j h =: (y_{j,1}(h), ..., y_{j,n}(h))^T ∈ F_b^n,

and set

x_{h,j} := y_{j,1}(h)/b + ... + y_{j,n}(h)/b^n.
The point set {x_0, ..., x_{b^m - 1}} with x_h = (x_{h,1}, ..., x_{h,s}) is called a digital net (over F_b) (with generating matrices C_1, ..., C_s).

The following definition was first introduced in [3] (see also [2] for a similar definition). It is fitted to the behavior of the Walsh coefficients of smooth functions via the dual net in order to obtain a fast convergence of the integration error for numerical integration using this type of digital net. For details see [2,3].

Definition 2 (Digital (t, α, β, n × m, s)-net). Let n, m, α >= 1 be natural numbers and let 0 < β <= 1 [...]

Case α > v: We count all k = k_{a_v} x^{a_v - 1} + ... + k_{a_1} x^{a_1 - 1} with 0 < a_v < ... < a_1, k_{a_r} ≠ 0 for r ∈ {1, ..., v} and a_1 + ... + a_v = l. For the k_{a_r}, r ∈ {1, ..., v}, we have exactly (b - 1)^v possible choices. The number of 0 < a_v < ... < a_1 with a_1 + ... + a_v = l is the same as the number of 0 <= b_v <= ... <= b_1 with b_1 + ... + b_v = l - v(v+1)/2; write b_i = a_i - (v + 1 - i) for i = 1, ..., v. This number can be bounded from above by binom(l - v(v+1)/2 + v - 1, v - 1). As v may be chosen from {1, ..., α - 1} we have at most

Σ_{v=1}^{α-1} (b - 1)^v binom(l - v(v+1)/2 + v - 1, v - 1)
polynomials k = k_{a_v} x^{a_v - 1} + ... + k_{a_1} x^{a_1 - 1} with 0 < a_v < ... < a_1, k_{a_r} ≠ 0 for r ∈ {1, ..., v}, α > v and deg_α(k) = l. The result follows by adding the two sums from the above two cases.

Now we can prove our main result, which gives a condition for the existence of a polynomial lattice with a certain figure of merit.

Theorem 3. Let n, m, α >= 1 and s >= 2 be natural numbers, b a prime and p ∈ F_b[x] with deg(p) = n >= m be irreducible. For ρ > 0 define

Δ(s, ρ, α) = Σ_{l=0}^{ρ} Σ_{i=1}^{s} binom(s, i) Σ_{l_1,...,l_i >= 1, l_1+...+l_i = l} Π_{z=1}^{i} C(α, l_z),

where C(α, l) is defined in Lemma 1.
(1) If Δ(s, ρ, α) < b^m, then there exists a q ∈ R_n^s with ρ_α(S_{p,m,n}(q)) >= ρ.
(2) If Δ(s, ρ, α) < b^m/(s - 1), then there exists a polynomial q ∈ R_n such that q ≡ (1, q, q^2, ..., q^{s-1}) (mod p) satisfies ρ_α(S_{p,m,n}(q)) >= ρ.

Proof. (1) There are |R_n^s| = |R_n|^s = b^{ns} vectors q to choose from. We will estimate the number of vectors q for which ρ_α(S_{p,m,n}(q)) < ρ for some chosen ρ >= 0. If this number is smaller than the total number of possible choices then it follows that there is at least one vector with ρ_α(S_{p,m,n}(q)) >= ρ. For each non-zero vector k ∈ F_b[x]^s there are b^{ns-m} vectors q ∈ R_n^s such that k · q ≡ a (mod p) for some a ∈ F_b[x] with deg(a) < n - m.
Let now A(l, s, α) denote the number of non-zero vectors k ∈ F_b[x]^s with Σ_{j=1}^{s} deg_α(k_j) = l. The quantity C(α, l) defined in Lemma 1 is an upper bound on the number of non-zero polynomials k ∈ F_b[x] with deg_α(k) = l. Thus we have

A(l, s, α) <= Σ_{i=1}^{s} binom(s, i) Σ_{l_1,...,l_i >= 1, l_1+...+l_i = l} Π_{z=1}^{i} C(α, l_z).
Now Σ_{l=0}^{ρ} A(l, s, α) is a bound on the number of non-zero vectors k ∈ F_b[x]^s with Σ_{j=1}^{s} deg_α(k_j) <= ρ. Hence the number of vectors q ∈ R_n^s for which ρ_α(S_{p,m,n}(q)) < ρ is bounded by b^{ns-m} Σ_{l=0}^{ρ} A(l, s, α). Hence if this number is smaller than b^{ns}, that is, if

b^{ns-m} Σ_{l=0}^{ρ} A(l, s, α) < b^{ns},
then there exists a vector q ∈ R_n^s with ρ_α(S_{p,m,n}(q)) >= ρ. Hence the result follows.
(2) We proceed as in (1), but we note that there are |R_n| = b^n polynomials q ∈ R_n to choose from and that for each non-zero vector k ∈ F_b[x]^s there are at most (s - 1) b^{n-m} of these polynomials q such that k · (1, q, q^2, ..., q^{s-1}) ≡ a (mod p) for some a with deg(a) < n - m. Hence if

(s - 1) b^{n-m} Σ_{l=0}^{ρ} A(l, s, α) < b^n,
then there exists a q ∈ R_n such that q ≡ (1, q, q^2, ..., q^{s-1}) (mod p) satisfies ρ_α(S_{p,m,n}(q)) >= ρ. Hence the result follows.

Above we have shown the existence of polynomial lattices which are digital (t, α, β, n × m, s)-nets over F_b for which the quality parameter t satisfies a certain condition. This follows from Theorem 2 together with Theorem 3. Note that in the search for a polynomial lattice we have to choose the value α up front. If we do not know the smoothness of the integrand, then it can happen that the relevant order α' differs from α. Hence in order for the bound in Theorem 1 to apply we still need to know the figure of merit of some order α' of a polynomial lattice which was constructed using the parameter α (where possibly α' ≠ α; the bound in Theorem 1 can then be used with n - t = ρ_{α'}(S_{p,m,n}(q))). Hence in the following we will establish a propagation rule for polynomial lattices.

Theorem 4. Let S_{p,m,n}(q) be a polynomial lattice with figure of merit ρ_α(S_{p,m,n}(q)). Then for all α' >= α we have

ρ_{α'}(S_{p,m,n}(q)) >= ρ_α(S_{p,m,n}(q)),

and for 1 <= α' <= α we have

ρ_{α'}(S_{p,m,n}(q)) >= (α'/α) ρ_α(S_{p,m,n}(q)) - 2.

Proof. First let α' >= α. Then deg_{α'}(k) >= deg_α(k) for all k ∈ F_b[x], and hence the definition of the figure of merit implies the result. Let now 1 <= α' <= α. Theorem 2 implies that the polynomial lattice S_{p,m,n}(q) is a digital (t, α, β, n × m, s)-net over F_b with t = n - ρ_α(S_{p,m,n}(q)). From a result
in [3] it follows that S_{p,m,n}(q) is also a digital (t', α', β', n × m, s)-net over F_b with β' = β α'/α and t' = ceil(t α'/α). Using Theorem 2 again it follows that

ρ_{α'}(S_{p,m,n}(q)) = n - t' = n - ceil(t α'/α) >= (α'/α) ρ_α(S_{p,m,n}(q)) - 2,

which is the desired result.
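The quality parameter t compared in the next section can, for small point sets, be checked directly from its definition: a (t, m, s)-net in base b places exactly b^t points in every b-adic elementary interval of volume b^{t-m}. The brute-force check below is our own illustration and is not part of the paper's search; the Hammersley-type point set used as a positive example, and the diagonal point set used as a negative one, are assumed toy inputs:

```python
from itertools import product

def is_tms_net(points, b, m, s, t):
    # every b-adic elementary interval of volume b^(t-m) must contain
    # exactly b^t of the b^m points
    for ds in product(range(m - t + 1), repeat=s):
        if sum(ds) != m - t:
            continue
        for corner in product(*[range(b ** d) for d in ds]):
            count = sum(
                all(a / b ** d <= x < (a + 1) / b ** d
                    for x, d, a in zip(pt, ds, corner))
                for pt in points)
            if count != b ** t:
                return False
    return True

b, m = 2, 3
vdc = lambda h: sum(((h >> k) & 1) / 2 ** (k + 1) for k in range(m))  # radical inverse
hammersley = [(vdc(h), h / 2 ** m) for h in range(2 ** m)]
diagonal = [(h / 2 ** m, h / 2 ** m) for h in range(2 ** m)]

print(is_tms_net(hammersley, b, m, 2, 0))  # True: a (0, 3, 2)-net in base 2
print(is_tms_net(diagonal, b, m, 2, 0))    # False: clustered points fail
```

A computer search in the spirit of Theorem 3 would score candidate generating polynomials by such a quality criterion (via the figure of merit rather than this exhaustive interval count, which grows quickly with m and s).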
4. Discussion

Combining Theorems 2 and 3 yields results on the existence of digital (t, α, β, n × m, s)-nets over F_b with, in certain cases, low t-value. Let us in the following, for fixed b and integer α, consider the case n = m and β = 1, i.e., we study digital (t, α, 1, m × m, s)-nets over F_b, or for short digital (t, α, m × m, s)-nets. Theorem 3 (1) guarantees the existence of a digital (t_1, α, m × m, s)-net over F_b, where

t_1 = m - ρ_1,        (1)
and ρ_1 is the maximal ρ such that Δ(s, ρ, α) as defined in Theorem 3 is less than b^m. Furthermore, Theorem 3 (2) guarantees the existence of a digital (t_2, α, m × m, s)-net S_{p,m,m}(q) over F_b with q ≡ (1, q, q^2, ..., q^{s-1}) (mod p), where

t_2 = m - ρ_2,        (2)
and ρ_2 is the maximal ρ such that Δ(s, ρ, α) < b^m/(s - 1). We compare our existence results to explicit constructions of digital (t, α, m × m, s)-nets over F_b. Given the generating matrices C_1, ..., C_s of a digital (t', m, s)-net over F_b, [3] (see also [2]) gives the construction principle of a digital (t_3, α, m × m, s)-net over F_b with

t_3 = min{ m, α t' + s α(α - 1)/2 }.        (3)

For several values of α, m, s, and b, we computed the values of t_1, t_2, and t_3 given by (1), (2), and (3), respectively. Our numerical results are visualized in Figs. 1–4. The values of t' for existing digital (t', m, s)-nets over F_b with explicitly computable generating matrices were taken from
Fig. 1. t-values depending on m (2 <= m <= 25) for s = 5, α = 2, and b = 2 (left), b = 3 (right).
Fig. 2. t-values depending on m (2 <= m <= 25) for s = 25, α = 2, and b = 2 (left), b = 3 (right).
Fig. 3. t-values depending on m (2 <= m <= 25) for s = 5, α = 3, and b = 2 (left), b = 3 (right).
the web-based database system MINT (available at the address http://mint.sbg.ac.at/) for querying bounds on (t, m, s)-net and (t, s)-sequence parameters (see [13] for a recent outline).

From Figs. 1–4 we see that we frequently have t_2 > t_1, which occurs as the bound on Δ(s, ρ_2, α) is smaller than that on Δ(s, ρ_1, α) (point sets in Theorem 3 (2), with q ≡ (1, q, q^2, ..., q^{s-1}) (mod p), are also special cases of those considered in Theorem 3 (1)). On the other hand, when performing a full search, generating vectors q of the form as in Theorem 3 (2) are more likely to be found than in the general case, since the size of the search space is smaller. Overall, the difference between t_1 and t_2 can be said to be not very large. The main conclusion to be drawn from Figs. 1–4 is that both t_1 and t_2 are lower than t_3 for higher dimensions and/or higher values of α, whereas the opposite is the case for lower dimensions and/or lower values of α. This is certainly caused by the term s α(α - 1)/2 in the formula for t_3 depending on t'. This "error term" becomes large as s and α grow; it becomes so large that for higher dimension t_3 attains the maximal possible value m. Note that the term s α(α - 1)/2 comes from an estimation which in general cannot be improved for the construction proposed in [2,3], unless one uses more information about the underlying digital (t', m, s)-net over F_b (it is possible on the other hand that the real t-value is actually smaller than the upper bound (3)). In [3] there
Fig. 4. t-values depending on m (2 <= m <= 25) for s = 25, α = 3, and b = 2 (left), b = 3 (right).
is also a lower bound (which again relates the t-value of a digital (t, α, m × m, s)-net over F_b to that of a digital (t', m, s)-net over F_b), which is the same as the upper bound except for this additional term. From this it follows that the constructions in [2,3] leave some room for improvement, and we have shown here that indeed there exist polynomial lattices which can in certain cases improve upon the construction in [2,3]. Unfortunately, our results here are not explicit, as opposed to the results in [2,3].

Acknowledgments

This work is supported by the Austrian Research Foundation (FWF), Project S 9609, that is part of the Austrian Research Network "Analytic Combinatorics and Probabilistic Number Theory", and Project P18455. The support of the ARC under its Center of Excellence program is gratefully acknowledged.

References

[1] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950) 337–404.
[2] J. Dick, Explicit constructions of quasi-Monte Carlo rules for the numerical integration of high dimensional periodic functions, submitted for publication.
[3] J. Dick, Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary high order, submitted for publication.
[4] J. Dick, F. Pillichshammer, Strong tractability of multivariate integration of arbitrary high order using digitally shifted polynomial lattice rules, J. Complexity, 2007, to appear.
[5] G. Larcher, A. Lauss, H. Niederreiter, W.Ch. Schmid, Optimal polynomials for (t, m, s)-nets and numerical integration of multivariate Walsh series, SIAM J. Numer. Anal. 33 (1996) 2239–2253.
[6] P. L'Ecuyer, Polynomial integration lattices, in: H. Niederreiter (Ed.), Monte Carlo and Quasi-Monte Carlo Methods 2002, Springer, Berlin, 2004, pp. 73–98.
[7] C. Lemieux, P. L'Ecuyer, Randomized polynomial lattice rules for multivariate integration and simulation, SIAM J. Sci. Comput. 24 (2003) 1768–1789.
[8] H. Niederreiter, Point sets and sequences with small discrepancy, Monatsh. Math. 104 (1987) 273–337.
[9] H. Niederreiter, Low-discrepancy point sets obtained by digital constructions over finite fields, Czechoslovak Math. J. 42 (1992) 143–166.
[10] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF Series in Applied Mathematics, vol. 63, SIAM, Philadelphia, 1992.
[11] H. Niederreiter, Constructions of (t, m, s)-nets and (t, s)-sequences, Finite Fields Appl. 11 (2005) 578–600.
[12] W.Ch. Schmid, Improvements and extensions of the "Salzburg Tables" by using irreducible polynomials, in: H. Niederreiter, J. Spanier (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 1998, Springer, Berlin, 2000, pp. 436–447.
[13] R. Schürer, W.Ch. Schmid, MinT: a database for optimal net parameters, in: H. Niederreiter, D. Talay (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 2004, Springer, Berlin, 2006, pp. 457–469.
[14] I.H. Sloan, H. Woźniakowski, When are quasi-Monte Carlo algorithms efficient for high-dimensional integrals?, J. Complexity 14 (1998) 1–33.
Journal of Complexity 23 (2007) 594 – 602 www.elsevier.com/locate/jco
A note on parallel and alternating time

Felipe Cucker^{a,∗,1}, Irénée Briquel^{b}

^a Department of Mathematics, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
^b Laboratoire d'Informatique et de Parallélisme, ENS Lyon, 46 allée d'Italie, 69364 Lyon Cedex 07, France
Received 16 October 2006; accepted 19 February 2007
Available online 24 March 2007
Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

A long-standing open question in complexity theory over the reals is the relationship between parallel time and quantifier alternation. It is known that alternating digital quantifiers is weaker than parallel time, which in turn is weaker than alternating unrestricted (real) quantifiers. In this note we consider some complexity classes defined through alternation of mixed digital and unrestricted quantifiers in different patterns. We show that the class of sets decided in parallel polynomial time is sandwiched between two such classes for different patterns. © 2007 Elsevier Inc. All rights reserved.

Keywords: Complexity classes; Parallelism; Alternation
1. Introduction

In classical complexity theory (that is, in the theory built upon the Turing machine) it was realized early on [3,7] that the following three resources, namely (1) parallel time, (2) alternating time, and (3) space, are equivalent under polynomial bounds. In other words, the complexity classes defined by parallel polynomial time, alternating polynomial time, and polynomial space are actually the same class.
∗ Corresponding author.
E-mail addresses: [email protected] (F. Cucker), [email protected] (I. Briquel).
1 Partially funded by a City University grant, SRG 7001712.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.02.005
F. Cucker, I. Briquel / Journal of Complexity 23 (2007) 594 – 602
In contrast with the above, it was soon remarked that in the theory of complexity over the reals developed by Blum, Shub, and Smale [2], every decidable set can be decided using constant workspace [8], albeit at the expense of an exponential increase in the running time. Yet, the contrast with the classical situation prevailed. If we denote by PSPACER the class of sets of real vectors decidable in exponential time and using polynomial space, by PATR the class of sets decidable using polynomial alternating time, and by PARR the class of those decidable in parallel polynomial time, one can still show [4] that PARR ⊂ PSPACER ⊆ PATR.
(1)
Note that, in the classical setting, the requirement of exponential time is superfluous once polynomial space is ensured; hence the (somewhat abusive) notation PSPACER. Note also that the first inclusion above is strict. Along with these inclusions, a major achievement in algorithmics over the reals was the inclusion PHR ⊆ PARR,
(2)
where PHR denotes the polynomial hierarchy over the reals (see, e.g., [9]). This is the class of sets which can be decided in polynomial alternating time with a constant (though not universally bounded) number of alternations. The inclusions in (1) and (2) together draw a critical boundary among the sets decidable with alternation: if the number of alternations between existential and universal guesses is constant (i.e., independent of the input size) then the set can be decided within parallel polynomial time. If no such constant bound exists then the problem may need exponential parallel time (and in some cases it actually does). The goal of this paper is to further investigate this boundary by looking at quantifiers with a restricted expressive power and considering the classes they define. The quantifiers we will look at are the digital quantifiers introduced in [5]. These are simply the usual quantifiers ∃ and ∀, but with variables ranging over the set {0, 1}. We will denote them by ∃B and ∀B, respectively. Digital versions of NPR and coNPR are naturally defined, and a number of problems are known to belong to these classes (see [6] for a non-trivial example). Alternation is also naturally defined and, with it, the class DPATR of digital polynomial alternating time. Since any computation in this class makes only a polynomial number of guesses (digital, either universal or existential), one can simulate the computation in parallel polynomial time by independently computing its outcome for the exponential number of possible guesses and then checking whether the set of outcomes satisfies the prefix of quantifiers. Therefore, DPATR ⊆ PARR. Recall from (2) that we also have PHR ⊆ PARR. One of our main results extends both these inclusions by proving that DPATR^PHR ⊆ PARR. Our second main result involves classes defined by alternating digital and ordinary quantifiers.
We define the class MA∃R (mixed alternation with real existentials) containing all sets decidable by alternating digital universal and real existential guesses in polynomial time. Similarly, one defines the class MA∀R (mixed alternation with real universals) containing all sets decidable by alternating digital existential and real universal guesses in polynomial time. Precise definitions are in Section 4. Then, we show that PARR ⊂ MA∃R and PARR ⊂ MA∀R (we will actually show that PSPACER is included in both MA∃R and MA∀R, hence the strict inclusions for PARR). Together with our first result mentioned above, this sharpens the relationship of PARR with alternation. For, on the one hand, we characterize the class DPATR^PHR by a form of alternation where one first alternates a polynomial number of digital quantifiers and then a polynomial number
of real quantifiers (but these with only a bounded number of alternations). And, on the other hand, the classes MA∃R and MA∀R allow real quantifiers to alternate with digital ones provided all the real quantifiers are of the same kind. We can summarize the relationship between complexity classes which emerges from our results in the following diagram (where a line means inclusion of the left-hand side class in the right-hand side one, and EXPR denotes the class of sets decidable in exponential time).
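The diagram itself is a figure that does not survive in this text version. The inclusions it summarizes, as stated in (1), (2) and in the results announced above, include at least the following (a partial reconstruction; the position of EXPR in the original figure is not restated here):

```latex
\mathrm{DPAT}_{\mathbb{R}},\ \mathrm{PH}_{\mathbb{R}}
  \;\subseteq\; \mathrm{DPAT}_{\mathbb{R}}^{\mathrm{PH}_{\mathbb{R}}}
  \;\subseteq\; \mathrm{PAR}_{\mathbb{R}}
  \;\subsetneq\; \mathrm{PSPACE}_{\mathbb{R}}
  \;\subseteq\; \mathrm{MA}\exists_{\mathbb{R}} \cap \mathrm{MA}\forall_{\mathbb{R}}
  \;\subseteq\; \mathrm{PAT}_{\mathbb{R}}
```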
The absence of a characterization of PARR in terms of quantifier alternation was probably an obstacle in the search for complete problems in PARR, of which, to the best of our knowledge, there are no known natural examples. In Section 3 we provide one such problem (whose completeness is used later on in the paper).

2. Preliminaries

We denote by R∞ the disjoint union of the Euclidean spaces Rn, for n ≥ 1. Given x ∈ R∞ we denote by |x| its size, i.e., the unique n ≥ 1 such that x ∈ Rn. Sequential machines over R were introduced in [2]. Roughly speaking, they take inputs from R∞ and compute their output by performing arithmetic operations and comparisons. The class PR of subsets S ⊂ R∞ decidable in polynomial time is then readily defined. Nondeterministic machines over R were also introduced in [2], together with the class NPR of subsets decidable in nondeterministic polynomial time. Alternating machines are defined similarly. See [1] for the latter, as well as for details on the definition of the polynomial hierarchy PHR and its levels. A parallel machine over the reals is defined in [1, Chapter 18]. It is a collection of processors, each with its own memory, and able to read other processors' memories. The class PARR is the class of all subsets of R∞ decidable by parallel machines with a single exponential number of processors and in polynomial time. It is shown there that PARR can also be defined as the class of subsets decidable by PR-uniform families of decisional circuits with polynomial depth (and hence exponential size). See [1, Chapter 18] for details.

3. A PARR-complete problem

While the nature of the class PARR suggests it must have complete problems, to the best of our knowledge, no natural PARR-complete problem has been exhibited in the literature. In this section we provide such a completeness result.
Consider the following decisional problem:

SCER (Succinct circuit evaluation): Given a tuple (M, x, 1^p, 1^t), decide whether
(i) M is a machine "describing" a circuit C in time at most p (i.e., with input i, M returns the encoding of the ith gate of C, or NIL if the size of C is less than i, in time at most p),
(ii) C has depth at most t,
(iii) C has size(x) input gates and one output gate, and
(iv) C(x) = 1.

Proposition 3.1. The problem SCER is PARR-complete.

Proof. Let S ∈ PARR. Then, there exists a PR-uniform family of circuits {C_n}_{n∈N} deciding S in polynomial depth. Let p(n) and t(n) be the polynomials bounding the running time of the machine M describing the circuits, and the depth of C_n, respectively. On input (n, i), the machine M returns the ith gate of C_n. For all n ∈ N, let M_n be the machine computing the circuit C_n. Since M_n computes the function i ↦ M(n, i), the code of the machine M_n can be computed in time polynomial in n (from the code of M). Then the map

x ↦ (M_{|x|}, x, 1^{p(|x|)}, 1^{t(|x|)})
gives a many-one reduction from S to SCER. This proves the hardness. For the membership, simply check that the following algorithm solves SCER in PARR:

input (M, x, 1^p, 1^t)
% check conditions (i), (ii), and (iii) %
for i = 1, . . . , 2^t in parallel do
    compute the output g_i of p steps of M with input i
    if g_i is neither NIL nor a gate encoding then HALT and REJECT
end for
let C = {g_i}_{i ≤ 2^t}
if C is not a circuit with |x| input gates and 1 output gate then HALT and REJECT
% check condition (iv) %
let m := size(C)
for i = 1, . . . , m in parallel do ℓ_i := 0 end for
for j = 1, . . . , t do
    for i = 1, . . . , m in parallel do
        if the ℓ's corresponding to the parents of the ith gate are both 1
        then evaluate g_i, set ℓ_i := 1 and set v_i to be the result of the evaluation
    end for
end for
if ℓ_m = 1 and v_m = 1 then HALT and ACCEPT else REJECT.

4. An upper bound for PARR: MA∃R

In this section we sharpen the inclusion PARR ⊂ PATR by showing that PARR ⊂ MA∃R ⊆ PATR (the second inclusion being trivial).
For a quantifier Q and a variable x, let us use the notation QB x and QR x instead of Qx ∈ {0, 1} and Qx ∈ R, respectively.

Definition 4.1. We define MA∃R to be the class of sets S ⊆ R∞ such that there exist a set B ⊆ R∞ in PR and a polynomial p such that, for x ∈ R∞, x belongs to S if and only if

∀B y_1 ∃R z_1 . . . ∀B y_{p(|x|)} ∃R z_{p(|x|)}
(x, y, z) ∈ B.
We define the class MA∀R to be the class of sets whose complement is in MA∃R. The main result of this section is the following.

Proposition 4.2. The class PSPACER is included in MA∃R and in MA∀R.

Proof. Since PSPACER is closed under complementation, we just need to show that PSPACER is included in MA∃R. To do so we will closely follow the main argument in the proof of the inclusion PSPACER ⊆ PATR given in [4]. We define M to be the set of true formulas of the form

Q_1 X_1 Q_2 X_2 . . . Q_n X_n  ψ(X_1, X_2, . . . , X_n),
where the Q_i are either ∀B or ∃R, and the expression ψ(X_1, X_2, . . . , X_n) denotes a semi-algebraic system. Clearly, M is an MA∃R-complete problem. To prove our statement it is therefore enough to reduce any problem in PSPACER to M. Let S be a language in PSPACER and M a machine over R deciding S in exponential time and polynomial space. Let p be a polynomial bounding the space used by M and q one bounding the logarithm (base 2) of the time bound for M. Fix x ∈ Rn. It is shown in [4] that any configuration of the computation of M with input x may be represented by a real vector of size p(n) + 3 (which, roughly speaking, encodes the current instruction and the current contents of the memory of M). For α, β ∈ R^{p(n)+3} we define the formulas

Next(α, β), Equal(α, β), Initial(α, x), and Accepts(α),

meaning, respectively, "β is the configuration resulting from α after one step of M", "α and β are the same configuration", "α is the initial configuration of M with input x", and "α is an accepting configuration". These formulas may be constructed in time polynomial in n by a real machine (whose code uses that of M). Our next goal is to describe a formula Access_{2^m}(α, β), also constructible in polynomial time, expressing that the configuration β is reached from α after at most 2^m steps of M. If m = 0 we take

Access_{2^0}(α, β) = Equal(α, β) ∨ Next(α, β).
For m > 0 we could define

Access_{2^m}(α, β) := ∃γ Access_{2^{m−1}}(α, γ) ∧ Access_{2^{m−1}}(γ, β),
but the length of this expression doubles at each iteration. To avoid the exponential growth of the expanded formula, we introduce a Boolean universal quantifier meant to describe the two calls to Access_{2^{m−1}} above with only one such call. We define Access_{2^m}(α, β) as follows:

∃R γ ∀B b ∃R δ ∃R δ′ [(Equal(δ, α) ∧ Equal(δ′, γ) ∧ b = 0) ∨ (Equal(δ, γ) ∧ Equal(δ′, β) ∧ b = 1)] ∧ Access_{2^{m−1}}(δ, δ′).

Let us denote by z_m the vector of the variables present in this step of the recursion, that is, z_m = (α, β, γ, b, δ, δ′). We denote by φ the formula

φ(z_m) = [(Equal(δ, α) ∧ Equal(δ′, γ) ∧ b = 0) ∨ (Equal(δ, γ) ∧ Equal(δ′, β) ∧ b = 1)].

With these notations,

Access_{2^m}(α, β) = ∃R γ_m ∀B b_m ∃R δ_m ∃R δ′_m φ(z_m) ∧ Access_{2^{m−1}}(δ_m, δ′_m)
  = ∃R γ_m . . . ∃R δ′_{m−1} φ(z_m) ∧ φ(z_{m−1}) ∧ Access_{2^{m−2}}(δ_{m−1}, δ′_{m−1})
  ⋮
  = ∃R γ_m ∀B b_m ∃R δ_m ∃R δ′_m . . . ∃R γ_1 ∀B b_1 ∃R δ_1 ∃R δ′_1 φ(z_m) ∧ · · · ∧ φ(z_1) ∧ Access_{2^0}(δ_1, δ′_1).

Note that we let the inner quantifiers migrate to the front, since the corresponding variables are not used in the previous part of the formula. Our reduction from S to M can now be simply described. To a point x ∈ R∞ we associate the formula

∃R α ∃R β [Initial(α, x) ∧ Accepts(β) ∧ Access_{2^{q(|x|)}}(α, β)],

which is constructed in time polynomial in |x|, has the form required by M, and belongs to M if and only if x ∈ S.

Corollary 4.3. The class PARR is included in MA∃R and in MA∀R.

Proof. It follows from Proposition 4.2 and the inclusion PARR ⊂ PSPACER shown in [4, Lemma 5.3].

5. A lower bound for PARR: DPATR^PHR
Roughly speaking, oracle machines are theoretical computational devices which, during the computation, may query whether an intermediately computed value, say z ∈ R∞, belongs to a fixed set A ⊆ R∞ (called the oracle). The underlying computational device can be sequential, parallel, nondeterministic, alternating, etc. Formal definitions can be found, e.g., in [1]. Given a complexity class C (defined in terms of a class of resource-bounded machines) and a set A as above, one denotes by C^A the class of sets decidable by machines in C which query the oracle A. Also, given complexity classes C and D, one defines

C^D = ∪_{A∈D} C^A.
Probably the best known example of classes defined this way are the levels of the polynomial hierarchy. Recall that, for k ≥ 1, one defines Σ^k_R to be the class of sets S ⊆ R∞ for which there exist a set B ∈ PR and polynomials p_1, . . . , p_k such that, for all x ∈ R∞, x ∈ S if and only if
(x, y1 , . . . , yk ) ∈ B.
(3)
Here Qk = ∃ if k is odd and Qk = ∀ otherwise. A well-known result (cf. [1, Chapter 18]) shows that k−1
kR = NPRR . R Our next result provides a similar characterization for DPATPH R .
Lemma 5.1. For a set S ⊆ R∞ the following are equivalent:
(i) S ∈ DPATR^PHR,
(ii) there exist B ∈ PR, k ≥ 0, and polynomials q, p_1, . . . , p_k such that, for all x ∈ R∞, x ∈ S if and only if
∃B b_1 ∀B b_2 . . . QB b_{q(|x|)} ∃y_1 ∈ R^{p_1(|x|)} ∀y_2 ∈ R^{p_2(|x|)} . . . Q_k y_k ∈ R^{p_k(|x|)}  (x, b_1, . . . , b_{q(|x|)}, y_1, . . . , y_k) ∈ B.

Here QB = ∃B if q(|x|) is odd and QB = ∀B otherwise. Similarly for Q_k.

Proof. We begin with (i) ⇒ (ii). To do so, let S ∈ DPATR^PHR. Then, there exist a digitally alternating machine M and a set A ∈ Σ^ℓ_R (for some ℓ ≥ 0) such that M decides S with oracle A. The idea of the proof is that we can modify M so that all the oracle queries are performed after the alternation has been done. This is obtained by replacing a query "z ∈ A?" in the program of M by an existential binary guess (which replaces the answer to the query "z ∈ A"). Then, once all the alternation has been performed, the program adds the following instructions (here d_1, . . . , d_r are the binary guesses corresponding to the oracle queried values z_1, . . . , z_r):

if for all j = 1, . . . , r, (z_j ∈ A and d_j = 1) or (z_j ∉ A and d_j = 0)
then continue
else REJECT

Note that, once all the binary values corresponding to the alternation are fixed (this includes d_1, . . . , d_r), the computation in the instructions above is performed in PR^A and can therefore be included in Σ^{ℓ+1}_R. And computations in Σ^{ℓ+1}_R can be described by a quantifier prefix as that described in (3) with k = ℓ + 1. This shows that S can be described as in (ii). For the direction (ii) ⇒ (i) consider a set S as described in (ii). Define a set A ⊆ R∞ consisting of the points z ∈ R∞ satisfying that:
(1) z is of the form (x, b_1, . . . , b_{q(|x|)}) with x ∈ R∞ and b_i ∈ {0, 1}, for i ≤ q(|x|).
(2) ∃y_1 ∈ R^{p_1(|x|)} ∀y_2 ∈ R^{p_2(|x|)} . . . Q_k y_k ∈ R^{p_k(|x|)}  (z, y_1, . . . , y_k) ∈ B.
Since (1) is checked in PR and (2) in Σ^k_R, we have A ∈ Σ^k_R. To show that S ∈ DPATR^PHR one considers the machine that, given x ∈ R∞, first guesses the elements b_1, . . . , b_{q(|x|)} ∈ {0, 1}
(alternating existential and universal guesses) and then queries whether z = (x, b1 , . . . , bq(|x|) ) k
is in A. This machine decides S in DPATR^A ⊆ DPATR^{Σ^k_R}.
Proposition 5.2. We have DPATR^PHR ⊆ PARR.

Proof. Let S ∈ DPATR^PHR. Then, S can be characterized as in Lemma 5.1(ii). Consider now a parallel machine which, with input x ∈ R∞, independently generates the 2^{q(|x|)} elements in {0, 1}^{q(|x|)} and, for each one of them, say b, checks whether
∃y_1 ∈ R^{p_1(|x|)} ∀y_2 ∈ R^{p_2(|x|)} . . . Q_k y_k ∈ R^{p_k(|x|)}  (x, b, y_1, . . . , y_k) ∈ B.

This checking can be done in PARR (we saw in the proof of Lemma 5.1 that it can be done in Σ^k_R, and now we use that Σ^k_R ⊆ PARR). Therefore, we compute in PARR the 2^{q(|x|)} bits corresponding to all the possible guesses b. We now check that these bits satisfy the prefix of quantifiers corresponding to b, which can also be done in PARR.
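The final folding step of this proof (tabulate one bit per digital guess vector, then reduce the table according to the quantifier prefix) can be sketched sequentially in Python. The quantifier prefix and the predicate below are illustrative stand-ins, not objects from the paper:

```python
from itertools import product

def eval_digital_prefix(prefix, pred):
    """Evaluate Q1 b1 Q2 b2 ... Qq bq : pred(b1, ..., bq), with each Qi
    in {'E', 'A'} (digital quantifiers, variables ranging over {0, 1}).

    Mirrors the parallel simulation: first tabulate pred on all 2^q
    guess vectors, then fold the table from the innermost quantifier
    outwards (any for an existential level, all for a universal one)."""
    q = len(prefix)
    table = {bits: bool(pred(*bits)) for bits in product((0, 1), repeat=q)}
    for level in reversed(range(q)):
        combine = any if prefix[level] == 'E' else all
        table = {bits: combine(table[bits + (v,)] for v in (0, 1))
                 for bits in product((0, 1), repeat=level)}
    return table[()]

# exists b1 forall b2 : b1 or (not b2)  -- true, witnessed by b1 = 1
print(eval_digital_prefix("EA", lambda b1, b2: b1 or not b2))  # True
```

The table has exponential size in q, which is why the simulation costs parallel polynomial time rather than sequential polynomial time.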
Proposition 5.3. If DPATR^PHR = PARR then there exists k ≥ 0 such that DPATR^{Σ^k_R} = PARR.

Proof. Assume DPATR^PHR = PARR. Then, SCER ∈ DPATR^PHR and therefore there exists k ≥ 0 such that SCER ∈ DPATR^{Σ^k_R}. Since SCER is complete in PARR (Proposition 3.1), all problems in PARR must be in DPATR^{Σ^k_R}.
Remark 5.4. (i) One can prove Proposition 5.2 differently. First, we claim that the inclusion DPATR ⊆ PARR relativizes (i.e., that for every set A ⊆ R∞, one has DPATR^A ⊆ PARR^A). Indeed, any computation in DPATR^A makes only a polynomial number of guesses (digital, either universal or existential). Therefore, one can simulate the computation in parallel polynomial time by independently computing its outcome for the exponential number of possible guesses (each of these computations being in PR^A) and then checking whether the set of outcomes satisfies the prefix of quantifiers. This shows the claim. Now Proposition 5.2 follows from this claim by taking A ∈ PHR ⊆ PARR and noting that if A ∈ PARR then PARR^A = PARR.
(ii) A natural question arising from Proposition 5.2 is whether PHR^{DPATR} ⊆ PARR. We do not have an answer for it. Actually, we note that we do not have a result similar to Lemma 5.1 (that would characterize PHR^{DPATR} by alternating first real quantifiers, with a bounded number of alternations, and then digital ones), nor can we show that the inclusion PHR ⊆ PARR relativizes (the proofs known for this inclusion being too involved (e.g., [9])).

References

[1] L. Blum, F. Cucker, M. Shub, S. Smale, Complexity and Real Computation, Springer, 1998.
[2] L. Blum, M. Shub, S. Smale, On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines, Bull. Amer. Math. Soc. 21 (1989) 1–46.
[3] A. Borodin, On relating time and space to size and depth, SIAM J. Comput. 6 (1977) 733–744.
[4] F. Cucker, On the complexity of quantifier elimination: the structural approach, Comput. J. 36 (1993) 400–408.
[5] F. Cucker, M. Matamala, On digital nondeterminism, Math. Syst. Theory 29 (1996) 635–647.
[6] K. Meer, On the complexity of quadratic programming in real number models of computation, Theor. Comput. Sci. 133 (1994) 85–94.
[7] A. Meyer, L. Stockmeyer, The equivalence problem for regular expressions with squaring requires exponential time, in: 13th IEEE Symposium on Switching and Automata Theory, 1973, pp. 125–129.
[8] C. Michaux, Une remarque à propos des machines sur R introduites par Blum, Shub et Smale, C. R. Acad. Sci. Paris 309 (Série I) (1989) 435–437.
[9] J. Renegar, On the computational complexity and geometry of the first-order theory of the reals. Part II, J. Symbolic Comput. 13 (1992) 301–327.
Journal of Complexity 23 (2007) 603 – 613 www.elsevier.com/locate/jco
Searching for extensible Korobov rules

Hardeep S. Gill^a, Christiane Lemieux^{b,∗}

^a Department of Mathematics, University of British Columbia, Canada
^b Department of Statistics and Actuarial Science, University of Waterloo, Canada
Received 18 October 2006; accepted 18 January 2007 Available online 7 February 2007 Dedicated to Prof. Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract

Extensible lattice sequences have been proposed and studied in [F.J. Hickernell, H.S. Hong, Computing multivariate normal probabilities using rank-1 lattice sequences, in: G.H. Golub, S.H. Lui, F.T. Luk, R.J. Plemmons (Eds.), Proceedings of the Workshop on Scientific Computing (Hong Kong), Springer, Singapore, 1997, pp. 209–215; F.J. Hickernell, H.S. Hong, P. L'Ecuyer, C. Lemieux, Extensible lattice sequences for quasi-Monte Carlo quadrature, SIAM J. Sci. Comput. 22 (2001) 1117–1138; F.J. Hickernell, H. Niederreiter, The existence of good extensible rank-1 lattices, J. Complexity 19 (2003) 286–300]. For the special case of extensible Korobov sequences, parameters can be found in [F.J. Hickernell, H.S. Hong, P. L'Ecuyer, C. Lemieux, Extensible lattice sequences for quasi-Monte Carlo quadrature, SIAM J. Sci. Comput. 22 (2001) 1117–1138]. The searches made to obtain these parameters were based on quality measures that look at several projections of the lattice. Because it is often the case in practice that low-dimensional projections are very important, it is of interest to find parameters for these sequences based on measures that look more closely at these projections. In this paper, we prove the existence of "good" extensible Korobov rules with respect to a quality measure that considers two-dimensional projections. We also report results of experiments made on different problems where the newly obtained parameters compare favorably with those given in [F.J. Hickernell, H.S. Hong, P. L'Ecuyer, C. Lemieux, Extensible lattice sequences for quasi-Monte Carlo quadrature, SIAM J. Sci. Comput. 22 (2001) 1117–1138]. © 2007 Elsevier Inc. All rights reserved.

MSC: 11D45; 11K36; 65C05; 65D30
Keywords: Lattice sequences; Korobov rules; Highly uniform point sets
∗ Corresponding author.
E-mail addresses: [email protected] (H.S. Gill), [email protected] (C. Lemieux).
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.01.005
H.S. Gill, C. Lemieux / Journal of Complexity 23 (2007) 603 – 613
1. Introduction

Point sets and sequences that are more uniform than random ones are often used within various numerical methods, namely for multidimensional integration. More precisely, for an integral of the form

∫_{[0,1)^s} f(u) du,   (1)

where f is a real-valued function, an approximation for (1) can be formed by using

μ̂ = (1/n) Σ_{i=0}^{n−1} f(u_i),
where P_n = {u_0, . . . , u_{n−1}} is some point set in [0, 1)^s. The Monte Carlo method amounts to choosing P_n randomly and uniformly, while using a highly uniform point set (HUPS) is often referred to as the quasi-Monte Carlo method. Several practical problems can be formulated as (1), with a dimension s that can be quite large or even infinite. Other problems can benefit from the availability of HUPS or sequences, for instance optimization problems [11,17]. There are two main families of constructions used to generate HUPS: digital nets and lattice rules. Korobov rules are a class of lattice rules that is often used in practice. While digital nets often come from digital sequences containing an infinite number of points, lattice rules are generally built for a fixed number of points n. Point sets that come from a sequence are preferred for applications where the user may want to increase the number of points without discarding previous function evaluations. In an effort to make lattice rules useful in that context, Hickernell and Hong [5] proposed a method to construct extensible lattice rules, that is, infinite sequences of points that can be used to provide lattice rules. The construction is investigated further in [6,7], and more recently in [1,3]. In particular, parameters for extensible Korobov rules are given in [6]. These parameters were found by performing computer searches based on quality measures that assess the quality of different projections of the point set, but that do not put a special emphasis on low-dimensional projections. Because it is quite important that these low-dimensional projections be of good quality in practice, using quality measures that put more emphasis on those projections seems like a promising approach to perform parameter searches for extensible lattices. This paper investigates this idea, both from a theoretical and practical point of view.
More precisely, we look at a quality measure that can be used to put more emphasis on low-dimensional projections [4,15], and prove that for a special case of this measure where only two-dimensional projections are considered, there exist extensible Korobov rules that are “good” with respect to that measure. We then investigate empirically the quality of sequences of Korobov rules found using this measure. The rest of this paper is organized as follows. In Section 2, we recall how extensible lattice rules are constructed. In Section 3, we describe the general quality measure considered in this paper, and prove the existence result mentioned above. In Section 4, we give numerical results where we compare the quality of rules obtained by a computer search based on the two-dimensional criterion studied in the previous section against other ones obtained in [6]. The comparison is done by looking at the empirical variance of the resulting estimators on two practical problems. A conclusion with ideas for future research is given in Section 5.
2. Background on lattice rules

Because of its widespread use in practice, the lattice construction we chose to study in this paper is a Korobov rule [9], which for a dimension s and a number of points n is defined by a generator a as follows:

P_n = { (i/n)(1, a mod n, a² mod n, . . . , a^{s−1} mod n) mod 1 : i = 0, . . . , n − 1 }.   (2)

This construction is a special case of a rank-1 lattice rule, which is determined by a generating vector z = (z_1, . . . , z_s) as follows:

P_n = { (i/n)(z_1, . . . , z_s) mod 1 : i = 0, . . . , n − 1 }.   (3)

Here, it is assumed that each component z_j is between 1 and n − 1, and usually we also have gcd(z_j, n) = 1 so that the n coordinates {iz_j/n, i = 0, . . . , n − 1} are distinct. To explain how extensible lattice rules are constructed, we follow [6]. First, we recall the definition of the radical-inverse function φ_b: for b ≥ 2, let n be a non-negative integer and consider its unique digit expansion in base b given by

n = Σ_{i=0}^{∞} a_i b^i,   (4)

where 0 ≤ a_i < b, and a_i = 0 for all sufficiently large i, i.e., the sum in (4) is actually finite. Then we have

φ_b(n) = Σ_{i=0}^{k} a_i b^{−i−1},

where k is the largest index with a_k ≠ 0.
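As a small illustration of φ_b (the base b = 3 and exponent m = 4 below are arbitrary choices; exact rational arithmetic is used so the check is not affected by rounding), the sketch also verifies the standard fact that φ_b maps {0, . . . , b^m − 1} bijectively onto {0, 1/b^m, . . . , (b^m − 1)/b^m}:

```python
from fractions import Fraction

def radical_inverse(n, b):
    """phi_b(n): mirror the base-b digits of n about the radix point,
    i.e. n = sum a_i * b^i  ->  phi_b(n) = sum a_i * b^(-i-1)."""
    result, weight = Fraction(0), Fraction(1, b)
    while n > 0:
        n, digit = divmod(n, b)
        result += digit * weight
        weight /= b
    return result

# phi_2(6) = phi_2(110 in base 2) = 0.011 in base 2 = 3/8
print(radical_inverse(6, 2))  # 3/8

# The first b^m values of phi_b are exactly {j / b^m : j = 0, ..., b^m - 1},
# since reversing m-digit strings permutes {0, ..., b^m - 1}.
b, m = 3, 4
vals = sorted(radical_inverse(i, b) for i in range(b ** m))
print(vals == [Fraction(j, b ** m) for j in range(b ** m)])  # True
```

This bijectivity is what makes the replacement of i/n by φ_b(i), discussed next, produce the same point set in a different order.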
Now, to define extensible rank-1 lattice rules, we need to remove the dependence on n in the definition (3) of a rank-1 lattice. First, as pointed out in [6], if the number of points n in (3) is a power of some integer b ≥ 2, then we can replace i/n in that definition by φ_b(i) and get the same point set, but with the points generated in a different order. Second, the generating vectors used for extensible rules should not depend on n. Therefore, we can assume that each component z_j has a base b expansion of the form z_j = . . . z_{j2} z_{j1}. That is, we write z_j = Σ_{l=1}^{∞} z_{jl} b^{l−1}. We can now define an infinite rank-1 lattice sequence based on the generating vector z as {φ_b(i) z mod 1 : i = 0, 1, 2, . . .}. Hence, such sequences are entirely determined by the integer vector z, which in the case of a Korobov sequence amounts to choosing an integer a, as z = (1, a, a², . . . , a^{s−1}). It is easy to see that for any m ≥ 1, the first b^m points produced by this sequence correspond to a rank-1 lattice with generating vector

z_m = (z_1 mod b^m, . . . , z_s mod b^m) = (z_{1m} . . . z_{12} z_{11}, . . . , z_{sm} . . . z_{s2} z_{s1}).

Hence the first b^m points do not depend on the digits z_{jk} of the generating vector for k > m.

3. An existence result for extensible Korobov rules

As seen in the previous section, to construct extensible Korobov rules, we simply need to select a generator a. In order to do this, we need to choose a search criterion that can be used to assess
the quality of the Korobov point sets of different sizes defined by a given generator. In this section, we introduce the general quality measure that will be used for this purpose, and then show that for a special case of that measure, there exist "good" extensible Korobov rules, i.e., rules for which that quality measure behaves asymptotically better than for a random point set. Before we proceed to the definition of this quality measure, we need to introduce some notation and recall some definitions. First, in what follows we will be working with the b-adic integers as in [7]. Let Z_b be the set of all b-adic integers i = Σ_{l=1}^{∞} i_l b^{l−1}, where i_l ∈ {0, 1, . . . , b − 1} for all l ≥ 1. Then define A_b = {i ∈ Z_b : gcd(i_1, b) = 1}. We will be looking at generating vectors of the form z = (1, a, a², . . .) ∈ A_b^∞ = A_b × A_b × · · · for extensible Korobov rules, i.e., vectors that can be used for an arbitrarily large dimension s. Also, for a given n, we denote by A_{b,n} = {a ∈ A_b : 1 ≤ a < n} the set of admissible generators a. For lattice point sets, a quality measure that is widely used is the weighted P_α [4], which for a point set of size n in dimension s generated by a is defined as

P_{α,n,s}(a) = Σ_{0≠h : h·a ≡ 0 (mod n)} γ_{I_h} h̄^{−α},   (5)
where h · a = h_1 + h_2 a + · · · + h_s a^{s−1}, I_h = {j : h_j ≠ 0, 1 ≤ j ≤ s}, {γ_I, ∅ ≠ I ⊆ {1, . . . , s}} is a set of weights, h̄ = Π_{i=1}^{s} h̄_i, and h̄_i = max(1, |h_i|). In what follows, we will make use of the fact that

P_{α,n,s}(a) = Σ_I γ_I P_{α,n,s,I}(a),
where P_{α,n,s,I}(a) is the value of the measure P_{α,n,|I|}(a) for the projection of the Korobov point set (2) over I when all weights are set to 1 (i.e., this is the unweighted P_α as studied, for example, in [14]). That is, for I = {i_1, . . . , i_t} ⊆ {1, . . . , s},

P_{α,n,s,I}(a) = Σ_{0≠h∈Z^t : h·a_I ≡ 0 (mod n)} h̄^{−α},
where h · a_I = h_1 a^{i_1−1} + · · · + h_t a^{i_t−1}. Of special interest in this paper are versions of the weighted P_α with finite-order weights, which are studied in [15] in the context of tractability of multivariate integration over Korobov spaces. That is, we consider versions of P_{α,n,s}(a) where all weights γ_I are zero when |I| > q for some order q ∈ {1, . . . , s}. In particular, we consider here the finite-order weighted P_α measure with order q = 2. Also, since Korobov point sets are dimension-stationary [10], in our case the order-2 weighted P_α can be written as

M_{α,n,s}(a) := Σ_{k=2}^{s} γ_{{1,k}} P_{α,n,s,{1,k}}(a).   (6)
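For even α the projection quantities entering (6) have a closed form via the classical Bernoulli-polynomial identity for lattice rules: for α = 2, Σ_{h≠0} e^{2πihx}/h² = 2π² B_2({x}) with B_2(x) = x² − x + 1/6, so that P_{2,n,s,{1,k}}(a) = −1 + (1/n) Σ_{i=0}^{n−1} Π_j (1 + 2π² B_2({i z_j/n})) with z = (1, a^{k−1} mod n). A sketch (the parameter values below are arbitrary examples, and the truncated dual-lattice sum is included only as a numerical cross-check):

```python
import math

def b2(x):
    """Bernoulli polynomial B_2 evaluated on [0, 1)."""
    return x * x - x + 1.0 / 6.0

def p2_projection(a, k, n):
    """P_{2,n,s,{1,k}}(a): unweighted P_alpha with alpha = 2 for the
    two-dimensional projection {1, k} of the Korobov rule with
    generator a and n points, via the Bernoulli closed form."""
    z = (1, pow(a, k - 1, n))
    total = 0.0
    for i in range(n):
        prod = 1.0
        for zj in z:
            prod *= 1.0 + 2.0 * math.pi ** 2 * b2((i * zj % n) / n)
        total += prod
    return total / n - 1.0

def p2_direct(a, k, n, H=200):
    """Truncated direct sum over the dual lattice:
    0 != (h1, h2) with h1 + h2 * a^(k-1) = 0 (mod n)."""
    z2 = pow(a, k - 1, n)
    s = 0.0
    for h1 in range(-H, H + 1):
        for h2 in range(-H, H + 1):
            if (h1, h2) != (0, 0) and (h1 + h2 * z2) % n == 0:
                s += 1.0 / (max(1, abs(h1)) * max(1, abs(h2))) ** 2
    return s

# The two agree up to the truncation error of the direct sum.
print(p2_projection(5, 2, 8))  # about 1.08
```

The closed form costs O(n) per projection, whereas the direct dual-lattice sum converges slowly, which is why search criteria of this type are evaluated via the Bernoulli identity in practice.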
Note that by setting γ_{{1,k}} in (6) to

Σ_{l=1}^{s−k+1} γ_{{l,l+k−1}}   (7)
H.S. Gill, C. Lemieux / Journal of Complexity 23 (2007) 603 – 613
the criterion (6) becomes equivalent to the order-2 version of (5), where all two-dimensional projections are included in the sum. Using this equivalence, one can more easily put our results into the framework of [15]. We can now present our first result, which states that for any n = b^m, where b is some prime, any dimension s, and any λ > 1, we can find a generator a for which the criterion M_{γ,λ,n,s}(a) is bounded by a constant over n. At first sight, this may seem like a weak result, but as in [16], Jensen's inequality can be used to improve the behavior of this bound with respect to n. This approach is used in our second result, where we prove the existence of a generator a that can define a "good" sequence of Korobov rules, i.e., an a for which the criterion M_{γ,λ,n,s}(a) is O(n^{−v} log^c n) for some v > 1. This is done using an approach very similar to the one presented in [7, Theorem 2], in combination with Jensen's inequality. Note however that, by contrast with [7], the existence result proved here is for the particular case of Korobov rules rather than the more general rank-1 rules (proofs based on averaging arguments such as those used here are typically easier in the latter case), and is also based on a different quality measure.

Proposition 1. Given n = b^m, b prime, λ > 1, and s ≥ 1, there exists a Korobov lattice rule generator a* such that M_{γ,λ,n,s}(a*) ≤ c(λ, s)n^{−1}, where c(λ, s) is a constant with respect to n.

Proof. The proof proceeds by finding a bound on the average value of M_{γ,λ,n,s}(a) over all generators a. Note that similar bounds for Korobov rules can be found elsewhere (e.g., [11,16]), but the approach used here is somewhat different as n is a power of a prime. Define

M̄_{γ,λ,n,s} = (1/|A|) Σ_{a∈A} M_{γ,λ,n,s}(a),
where we dropped the subscripts in A_{b,n} to ease the notation. Because n = b^m, it holds that gcd(a, b) = 1 for all a ∈ A, and thus |A| = φ(n) = n(1 − 1/b). Next, we use the notation δ_n(l) = 1 if n|l, and 0 otherwise. We then expand the sums in the definition of M_{γ,λ,n,s}(a) and use the fact that for λ > 1, Σ_{0≠h∈Z²} h̄^{−λ} δ_n(h·(1, a^{k−1})) converges absolutely for all a ∈ A_{b,n}, to get

M̄_{γ,λ,n,s} = (1/|A|) Σ_{k=2}^s γ_{1,k} Σ_{0≠h∈Z²} Σ_{a∈A} δ_n(h·(1, a^{k−1})) h̄^{−λ}.
Hence, we must obtain a bound on the number of a ∈ A satisfying δ_n(h·(1, a^{k−1})) = 1 for a given h ∈ Z² and k. Equivalently, we need to find the number of a ∈ A satisfying h₁ + h₂a^{k−1} ≡ 0 mod n. This problem can be solved in two steps. First, we find the number of solutions of h₁ + h₂x ≡ 0 mod n that lie in {0, . . . , n − 1}. Next we use Propositions 4.2.2 and 4.2.3 in [8] to bound the number of solutions to the equivalence a^{k−1} ≡ x₀ mod n for each solution x₀ of the equivalence in the first step. Now, a solution x₀ for the first equivalence exists only if d|(−h₁), where d = gcd(h₂, n), and in that case, there is a total of d solutions. Note that d has to be of the form b^i for some 0 ≤ i ≤ m
because n = b^m. In addition, using the fact that x₀ must be such that a solution exists to the second equivalence, it can be proved that h must satisfy gcd(h₁, n) = gcd(h₂, n). Hence

M̄_{γ,λ,n,s} = (1/|A|) Σ_{k=2}^s γ_{1,k} Σ_{h∈L} Σ_{a∈A} δ_n(h₁ + h₂a^{k−1}) h̄^{−λ},

where L = {h ∈ Z²\{0} : gcd(h₁, n) = gcd(h₂, n)}. Next, we decompose the set L as L = ∪_{0≤q≤m} L_q, where for 0 ≤ q ≤ m, L_q = {h ∈ Z²\{0} : b^q = gcd(h₁, n) = gcd(h₂, n)}. We also define

Δ_k = Σ_{h∈L} Σ_{a∈A} δ_n(h₁ + h₂a^{k−1}) h̄^{−λ} = Σ_{0≤q≤m} Σ_{h∈L_q} Σ_{a∈A} δ_n(h₁ + h₂a^{k−1}) h̄^{−λ}.
So for each h ∈ L_q, there are b^q solutions to h₁ + h₂x ≡ 0 mod n. Next, for each solution x₀ we can find an upper bound—denoted d_b(k)—on the number of solutions a ∈ A to the equivalence a^{k−1} ≡ x₀ mod n, where 1 ≤ k ≤ s. First, if b = 2, then d₂(k) = 2 gcd(k − 1, 2^{m−2}) by Proposition 4.2.2 in [8]. If b > 2, then d_b(k) = gcd(k − 1, n(1 − 1/b)), by Proposition 4.2.3 in [8]. Note that the value d_b(k) is at most 2(k − 1), and this maximum occurs when b = 2 and k = 2^e + 1 for some e < m − 1. We now have that

Δ_k ≤ d_{b,s} Σ_{0≤q≤m} b^q Σ_{h∈L_q} h̄^{−λ},
where d_{b,s} = max_{2≤k≤s} d_b(k). Next, we get the following bound for 0 ≤ q < m:

Σ_{h∈L_q} h̄^{−λ} ≤ (Σ_{l≠0} |lb^q|^{−λ})² = 4ζ²(λ) / b^{2qλ}.

Similarly, we can find a bound of 4b^{−2mλ}(ζ(λ) + 1)² for the case q = m. Hence we get that

Δ_k ≤ 4d_{b,s}(ζ(λ) + 1)² Σ_{0≤q≤m} 1/b^{q(2λ−1)} = 4d_{b,s}(ζ(λ) + 1)² (1 − (nb)^{1−2λ})/(1 − b^{1−2λ}) ≤ β(λ, s),
where β(λ, s) = 4d_{b,s}(ζ(λ) + 1)²(1 − b^{1−2λ})^{−1}. Therefore

M̄_{γ,λ,n,s} = (1/|A|) Σ_{k=2}^s γ_{1,k} Δ_k ≤ (1/(n(1 − 1/b))) Σ_{k=2}^s γ_{1,k} β(λ, s),

and by letting c(λ, s) = β(λ, s)W_s(1 − 1/b)^{−1}, where W_s = Σ_{k=2}^s γ_{1,k}, we get the desired result. □
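The two counting facts used in the proof—that h₁ + h₂x ≡ 0 mod n has either no solution or exactly d = gcd(h₂, n) solutions, and that d is then a power of b when n = b^m—are easy to check numerically. A small sketch (the helper name is ours, not from the paper):

```python
from math import gcd

def linear_congruence_solutions(h1, h2, n):
    # Solutions x in {0,...,n-1} of h1 + h2*x ≡ 0 (mod n).
    # A solution exists iff d = gcd(h2, n) divides -h1,
    # and in that case there are exactly d solutions.
    return [x for x in range(n) if (h1 + h2 * x) % n == 0]

n = 2 ** 5              # n = b^m with b = 2, m = 5
h1, h2 = 8, 12
d = gcd(h2, n)          # here d = 4, a power of b = 2
sols = linear_congruence_solutions(h1, h2, n)
```

For h₁ = 8, h₂ = 12, n = 2⁵ we get d = 4 and the four solutions x ≡ 2 (mod 8); taking h₁ = 1 instead gives no solution, since d = 4 does not divide −1.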
Note that the behavior of c(λ, s) with respect to s depends on the size of the bound W_s on the sum of the weights γ_{1,k}. For instance, if we make the assumption that each γ_{1,k} is bounded by 1, then W_s is O(s) and c(λ, s) is O(s²), since d_{b,s} is bounded by 2(s − 1). If the weights γ_{1,k} arise as sums of weights as in (7) (which are derived from the order-2 version of the criterion (5) that considers all two-dimensional projections), then W_s is O(s²), which still yields a function c(λ, s) that is polynomial in s. We can now present our main result:

Proposition 2. For any prime b, λ > 1, v ∈ [1, λ), and τ > 0, there exists a generator a such that

M_{γ,λ,n,s}(a) ≤ k*(s, τ)[log(log(n) + 1)]^{v(1+τ)} (log(n)n^{−1})^v

for n = b, b², . . . , and s = 1, 2, . . . , where k*(s, τ) is a constant with respect to n.

Proof. The proof follows closely that of Theorem 2 in [7]. As in [7], let μ be a probability measure on the set Z_b for which the set of all i ∈ Z_b with specified first l digits has measure b^{−l}. This probability measure, conditional on the set A_b, is denoted μ_b. From Proposition 1, we have that for any fixed m ≥ 0, s > 0,

M̄_{γ,λ,b^m,s} = ∫_{A_b} M_{γ,λ,b^m,s}(a) dμ_b(a) ≤ c(λ, s)n^{−1} =: M̃(λ, m, s).

We then use this result to define sets of "bad" generating vectors. More precisely, let

G_{b,m,s} = {a ∈ A_b : M_{γ,λ,b^m,s}(a) ≥ c_m c_s M̃(λ, m, s)},

where c_j = c_j(τ) = c₀(τ)j[log(j + 1)]^{1+τ}, j ≥ 1, and

c₀(τ) > Σ_{k=1}^∞ k^{−1}[log(k + 1)]^{−1−τ}.
Then μ_b(G_{b,m,s}) < 1/(c_m c_s) because

c_m c_s M̃(λ, m, s) μ_b(G_{b,m,s}) ≤ ∫_{G_{b,m,s}} M_{γ,λ,b^m,s}(a) dμ_b(a) ≤ ∫_{A_b} M_{γ,λ,b^m,s}(a) dμ_b(a) ≤ M̃(λ, m, s).

Now let G_b = ∪_{m=1}^∞ ∪_{s=1}^∞ G_{b,m,s}. Then

μ_b(G_b) = μ_b(∪_{m=1}^∞ ∪_{s=1}^∞ G_{b,m,s}) ≤ Σ_{m=1}^∞ Σ_{s=1}^∞ μ_b(G_{b,m,s}) < Σ_{m=1}^∞ Σ_{s=1}^∞ 1/(c_m c_s) = c₀^{−2}(τ) (Σ_{k=1}^∞ k^{−1}[log(k + 1)]^{−1−τ})² < 1.

Hence μ_b(A_b \ G_b) > 0, and so there exists at least one a* ∈ A_b such that for all s, m, we have

M_{γ,λ,b^m,s}(a*) < c_m c_s M̃(λ, m, s) = c(λ, s)c₀²(τ)s log(n)[log(s + 1) log(log(n) + 1)]^{1+τ} n^{−1} = k(λ, s, τ) log(n)[log(log(n) + 1)]^{1+τ} n^{−1},

where k(λ, s, τ) = c(λ, s)c₀²(τ)s[log(s + 1)]^{1+τ}. As in [16], we can now apply Jensen's inequality to show that M_{γ,λ,n,s}(a) ≤ (M_{γ^τ,τλ,n,s}(a))^{1/τ} for some τ ∈ (1/λ, 1], where the weights in M_{γ^τ,τλ,n,s}(a) are obtained by raising to the power τ the weights in M_{γ,λ,n,s}(a). Hence for some a* and for all s, m, we have that

M_{γ,λ,b^m,s}(a*) ≤ (k(τλ, s, τ) log(n)[log(log(n) + 1)]^{1+τ} n^{−1})^{1/τ} = k*(s, τ)[log(log(n) + 1)]^{(1+τ)/τ} (log(n)n^{−1})^{1/τ},

where k*(s, τ) = (k(τλ, s, τ))^{1/τ}. Setting v = 1/τ gives the desired result. □
4. Numerical results

Using the quality measure M_{γ,λ,n,s}(a), we can now define a criterion to be used for computer searches of "good" generators a. To do so, we must choose a dimension s and an integer m₁ that will define the range of point set sizes considered. That is, the criterion will measure the quality of each potential generator a by computing M_{γ,λ,n,s}(a) for n = b, b², . . . , b^{m₁}, and divide it by a scaling factor as in [6]. Definition 1 describes the criterion Ĝ_{m₁,s}(a) used in our searches.

Definition 1. Let Ĝ_{m₁,s}(a) = max_{1≤m≤m₁} G_{m,s}(a), where

G_{m,s}(a) = M_{γ,λ,b^m,s}(a) / (b^{−m}(1 + m log b)^{1/2}).

In the following experiments, we restrict our attention to the case b = 2. Also, we choose λ = 2 and use weights of the form γ_{1,k} = ρ^{k−2} for some ρ ∈ (0, 1). In Fig. 1, we show in the left table the generators a obtained with s = 32 and different values of m₁ and ρ, while in the right table, we fix m₁ = 15 and ρ = 0.8, and list the generators obtained for varying dimensions s.
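For λ = 2 the unweighted P_λ of each two-dimensional projection can be evaluated exactly via the standard lattice-rule identity P₂(n, z) = −1 + (1/n) Σ_{j=0}^{n−1} Π_i (1 + 2π²B₂({jz_i/n})), where B₂(x) = x² − x + 1/6 is the Bernoulli polynomial (cf. [14]). The Python sketch below evaluates the order-2 criterion (6) under this identity; the function names are ours, the weight choice γ_{1,k} = ρ^{k−2} follows the experiments described above, and the O(n·s) loop is intended for small n only:

```python
import math

def p2_pair(n, z1, z2):
    # Exact unweighted P_2 of the 2-dim lattice with generating vector
    # (z1, z2) mod n, via P_2 = -1 + (1/n) sum_j prod_i (1 + 2*pi^2*B2({j*z_i/n}))
    # with B2(x) = x^2 - x + 1/6.
    c = 2.0 * math.pi ** 2
    b2 = lambda x: x * x - x + 1.0 / 6.0
    total = 0.0
    for j in range(n):
        x1 = (j * z1 % n) / n
        x2 = (j * z2 % n) / n
        total += (1.0 + c * b2(x1)) * (1.0 + c * b2(x2))
    return total / n - 1.0

def m2(n, s, a, rho=0.8):
    # Order-2 criterion (6) with lambda = 2, summing the projections
    # {1, k} of the Korobov rule with generator a and weights rho**(k-2).
    return sum(rho ** (k - 2) * p2_pair(n, 1, pow(a, k - 1, n))
               for k in range(2, s + 1))
```

Each projection value is symmetric in a and n − a, and every term in (6) is positive, so m2(n, s, a) is at least the k = 2 term p2_pair(n, 1, a).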
Fig. 1. Best choices of a: (left) when s = 32; (right) when m1 = 15.
[Plot: standard errors (from 0 to 0.01) against log₂ n for n = 2⁹, . . . , 2¹⁶, for the generators 17797, 14471, 11335, and for Monte Carlo (MC).]
Fig. 2. Standard errors for Asian option pricing.
To test the adequacy of our quality measure, we pick one of the generators (a = 14471) from Fig. 1, use it to construct estimators for different problems, and compute the standard error obtained on these estimators. More precisely, for a Korobov point set P_n generated by a, we use a random shift Δ uniformly distributed in [0, 1)^s as in [2], and construct the estimator

μ̂ = (1/n) Σ_{u_i∈P_n} f((u_i + Δ) mod 1).

We repeat this procedure m = 100 times with independent shifts, thus obtaining m independent estimators μ̂₁, . . . , μ̂₁₀₀. We then compute the standard error

[ (1/(m(m − 1))) Σ_{i=1}^m (μ̂_i − μ̄)² ]^{1/2},

where μ̄ is the average of the μ̂_i's. In Fig. 2, we compare the standard error obtained for an Asian option pricing problem (see, e.g., [10] for the details) by the generator a = 14471 obtained using Ĝ_{17,32} with ρ = 0.8, against the generators 17797 and 11335 listed in [6], which were based on criteria assessing the quality of a over the same range (s ≤ 32 and m ≤ 17). The parameters for the option are s = 32 prices entering the mean, an initial asset value of 100, a strike price of 100, a risk-free interest rate of 0.05, and a volatility of 0.2. Also shown in that figure is the Monte Carlo standard error obtained for some values of n. Similar experiments were conducted using digital option pricing as in [13], this time with s = 128. Results are shown in Fig. 3. Note that although the generators used in this experiment were found based on a criterion where s = 32, they provide estimators that perform much better than Monte Carlo even if s = 128 for this problem, and the generator a = 14471 still outperforms the other two for values of n > 512.
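The randomized estimator and its standard error are straightforward to implement. The sketch below (our own helper names, standard-library only) builds the Korobov point set, applies m independent Cranley–Patterson shifts [2], and returns the mean estimate with its standard error:

```python
import math
import random

def korobov_points(n, s, a):
    # Korobov point set: (i/n) * (1, a, a^2, ..., a^(s-1)) mod 1, i = 0..n-1.
    z = [pow(a, j, n) for j in range(s)]
    return [[(i * zj % n) / n for zj in z] for i in range(n)]

def randomized_estimate(f, n, s, a, m=100, seed=1):
    # Cranley-Patterson randomization: m independent uniform shifts,
    # returning the mean of the m estimators and their standard error.
    rng = random.Random(seed)
    pts = korobov_points(n, s, a)
    estimates = []
    for _ in range(m):
        shift = [rng.random() for _ in range(s)]
        total = sum(f([(uj + dj) % 1.0 for uj, dj in zip(u, shift)])
                    for u in pts)
        estimates.append(total / n)
    mean = sum(estimates) / m
    se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (m * (m - 1)))
    return mean, se
```

For simple smooth test integrands the shift-averaged estimator typically exhibits much smaller standard errors than plain Monte Carlo at the same n, which is the behavior reported in Figs. 2 and 3.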
[Plot: standard errors (from 0 to 0.025) against log₂ n for n = 2⁹, . . . , 2¹⁶, for the generators 17797, 14471, 11335, and for Monte Carlo (MC).]
Fig. 3. Standard errors for digital option pricing.
5. Conclusion

In this paper we proved the existence of good extensible Korobov rules with respect to an order-2 weighted P_λ criterion. We also provided numerical results suggesting that rules found with this criterion can outperform previously published rules. For future research, an obvious goal to pursue would be to prove the existence of good rules with respect to a criterion of order q > 2. We believe this will be mathematically challenging since bounds on the number of solutions of congruences with more than one term are not readily available. In addition, it would be interesting to compare the Korobov rules obtained in this paper with the extensible rank-1 rules given in [1,3], and to study how the approach used in [3] could be applied in our setting to find generators satisfying our existence result. Finally, we would like to establish results similar to those presented in this paper, but for extensible polynomial Korobov rules. Existence results in that case are given in [12], but to our knowledge, no parameters have been published so far.

Acknowledgments

We thank the co-editor of this special issue and the anonymous referees for their helpful comments and suggestions. This work was funded by NSERC—Canada.

References

[1] R. Cools, F.Y. Kuo, D. Nuyens, Constructing embedded lattice rules for multivariate integration, SIAM J. Sci. Comput. 28 (6) (2006) 2162–2188.
[2] R. Cranley, T.N.L. Patterson, Randomization of number theoretic methods for multiple integration, SIAM J. Numer. Anal. 13 (6) (1976) 904–914.
[3] J. Dick, F. Pillichshammer, B. Waterhouse, The construction of good extensible rank-1 lattices, Math. Comput., to appear.
[4] F.J. Hickernell, Lattice rules: how well do they measure up?, in: P. Hellekalek, G. Larcher (Eds.), Random and Quasi-Random Point Sets, Lecture Notes in Statistics, vol. 138, Springer, New York, 1998, pp. 109–166.
[5] F.J. Hickernell, H.S. Hong, Computing multivariate normal probabilities using rank-1 lattice sequences, in: G.H. Golub, S.H. Lui, F.T. Luk, R.J. Plemmons (Eds.), Proceedings of the Workshop on Scientific Computing (Hong Kong), Springer, Singapore, 1997, pp. 209–215.
[6] F.J. Hickernell, H.S. Hong, P. L'Ecuyer, C. Lemieux, Extensible lattice sequences for quasi-Monte Carlo quadrature, SIAM J. Sci. Comput. 22 (2001) 1117–1138.
[7] F.J. Hickernell, H. Niederreiter, The existence of good extensible rank-1 lattices, J. Complexity 19 (2003) 286–300.
[8] K. Ireland, M. Rosen, A Classical Introduction to Modern Number Theory, second ed., Springer, Berlin, 1998.
[9] N.M. Korobov, The approximate computation of multiple integrals, Dokl. Akad. Nauk SSSR 124 (1959) 1207–1210 (in Russian).
[10] P. L'Ecuyer, C. Lemieux, Variance reduction via lattice rules, Manage. Sci. 46 (9) (2000) 1214–1235.
[11] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 63, SIAM, Philadelphia, PA, 1992.
[12] H. Niederreiter, The existence of good extensible polynomial lattice rules, Monatsh. Math. 139 (2003) 295–307.
[13] A. Papageorgiou, The Brownian bridge does not offer a consistent advantage in quasi-Monte Carlo integration, J. Complexity 18 (1) (2002) 171–186.
[14] I.H. Sloan, S. Joe, Lattice Methods for Multiple Integration, Clarendon Press, Oxford, 1994.
[15] I.H. Sloan, X. Wang, H. Woźniakowski, Finite-order weights imply tractability of multivariate integration, J. Complexity 20 (2004) 46–74.
[16] X. Wang, I.H. Sloan, J. Dick, On Korobov lattice rules in weighted Korobov spaces, SIAM J. Numer. Anal. 42 (2004) 1760–1779.
[17] S. Yakowitz, P. L'Ecuyer, F. Vazquez-Abad, Global stochastic optimization with low-dispersion point sets, Oper. Res. 48 (6) (2000) 939–950.
Journal of Complexity 23 (2007) 614 – 648 www.elsevier.com/locate/jco
Optimal approximation of elliptic problems by linear and nonlinear mappings III: Frames Stephan Dahlke a,1, Erich Novak b,∗, Winfried Sickel b a Philipps-Universität Marburg, FB12 Mathematik und Informatik, Hans-Meerwein-Straße, Lahnberge, 35032 Marburg,
Germany b Friedrich-Schiller-Universität Jena, Mathematisches Institut, Ernst-Abbe-Platz 2, 07743 Jena, Germany
Received 9 November 2006; accepted 1 March 2007 Available online 14 March 2007 Dedicated to our dear colleague and friend Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract

We study the optimal approximation of the solution of an operator equation A(u) = f by certain n-term approximations with respect to specific classes of frames. We consider worst case errors, where f is an element of the unit ball of a Sobolev or Besov space B_q^t(L_p(Ω)) and Ω ⊂ R^d is a bounded Lipschitz domain; the error is always measured in the H^s-norm. We study the order of convergence of the corresponding nonlinear frame widths and compare it with several other approximation schemes. Our main result is that the approximation order is the same as for the nonlinear widths associated with Riesz bases, the Gelfand widths, and the manifold widths. This order is better than the order of the linear widths iff p < 2. The main advantage of frames compared to Riesz bases, which were studied in our earlier papers, is the fact that we can now handle arbitrary bounded Lipschitz domains—also for the upper bounds.
© 2007 Elsevier Inc. All rights reserved.

MSC: 41A25; 41A46; 41A65; 42C40; 65C99

Keywords: Elliptic operator equation; Worst case error; Frames; Nonlinear approximation methods; Best n-term approximation; Manifold width; Besov spaces on Lipschitz domains
∗ Corresponding author.
E-mail addresses:
[email protected] (S. Dahlke),
[email protected] (E. Novak),
[email protected] (W. Sickel) URLs: http://www.mathematik.uni-marburg.de/∼dahlke/ (S. Dahlke), http://www.minet.uni-jena.de/∼novak,sickel (W. Sickel). 1 This author acknowledges support through the European Union’s Human Potential Programme, under contract HPRN-
CT-2002-00285 (HASSIP), and through DFG, Grant Da 360/4-3. He also wants to thank the Friedrich-Schiller-Universität Jena for the hospitality and support. 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.001
S. Dahlke et al. / Journal of Complexity 23 (2007) 614 – 648
615
1. Introduction

We study the optimal approximation of the solution of an operator equation

A(u) = f,   (1)

where A is a linear operator

A : H → G   (2)

from a Hilbert space H to another Hilbert space G. We always assume that A is boundedly invertible, hence (1) has a unique solution for any f ∈ G. We have in mind the more specific situation of an operator equation which is given as follows. Assume that Ω ⊂ R^d is a bounded Lipschitz domain and assume that

A : H₀ˢ(Ω) → H^{−s}(Ω)   (3)

is an isomorphism, where s > 0. For the exact definitions of Lipschitz domains and spaces of distributions defined on such domains we refer to the Appendix, see also [9]. Now we put H = H₀ˢ(Ω) and G = H^{−s}(Ω). Since A is boundedly invertible, the inverse mapping S : G → H is well defined. This mapping is sometimes called the solution operator—in particular if we want to compute the solution u = S(f) from the given right-hand side A(u) = f. We study different mappings S_n for the approximation of the solution u = A^{−1}(f) for f contained in F ⊂ G. We consider the worst case error

e(S_n, F, H) = sup_{‖f‖_F ≤ 1} ‖A^{−1}(f) − S_n(f)‖_H,   (4)
where F is a normed (or quasi-normed) space, F ⊂ G. In our main results, F will be a Sobolev or Besov space.² Hence we use the following commutative diagram: S : G → H, I : F → G, and S_F = S ∘ I : F → H. Here I : F → G denotes the identity and S_F the restriction of S to F. Then one is interested in approximations that have an optimal order of convergence depending on n, where n denotes the degrees of freedom. For our purposes, the following approximation schemes are important. Consider the class L_n of all continuous linear mappings S_n : F → H,

S_n(f) = Σ_{i=1}^n L_i(f) · h̃_i

with arbitrary h̃_i ∈ H. The worst case error of optimal linear mappings is given by the approximation numbers or linear widths

e_n^lin(S, F, H) = inf_{S_n∈L_n} e(S_n, F, H).
² Formally we only deal with Besov spaces. Because of the embeddings B₁^{−s+t}(L_p(Ω)) ⊂ W_p^{−s+t}(Ω) ⊂ B_∞^{−s+t}(L_p(Ω)), which hold for 1 ≤ p ≤ ∞, t ≤ s, see [45], our results are valid also for Sobolev spaces.
We may also use nonlinear approximations with respect to a Riesz basis R of H, i.e. we consider the class N_n(R) of all (linear or nonlinear) mappings of the form

S_n(f) = Σ_{k=1}^n c_k h_{i_k},

where the c_k and the i_k depend in an arbitrary way on f. Then the nonlinear widths e_{n,C}^{non}(S, F, H) are given by

e_{n,C}^{non}(S, F, H) = inf_{R∈R_C} inf_{S_n∈N_n(R)} e(S_n, F, H).

Here R_C denotes a set of Riesz bases for H where C indicates the stability of the basis, i.e. we require B/A ≤ C, where A, B are the Riesz constants of the basis. The investigation of these widths e_{n,C}^{non} and its comparison with the linear widths have been the major part of our analysis in [8,9]. This has continued earlier research on related topics, cf. e.g. [24,38–40]. The next type of widths we are interested in has served as a very useful tool in our analysis of the widths e_{n,C}^{non} in [9]. The manifold widths are related to the class C_n of continuous mappings, given by arbitrary continuous mappings N_n : F → R^n and φ_n : R^n → H. Again we define the worst case error of optimal continuous mappings by

e_n^cont(S, F, H) = inf_{S_n∈C_n} e(S_n, F, H),   (5)

where S_n = φ_n ∘ N_n. These numbers have been studied in [13,27] and later in [9,14,17,16]. As mentioned above we have studied the relationships of these widths in [9]. It has turned out that for problems as in (3) with F = B_q^{−s+t}(L_p(Ω)) (with some extra conditions on Ω) one has the following: if p ≥ 2 and t > 0 then

e_n^lin(S, B_q^{−s+t}(L_p(Ω)), H₀ˢ(Ω)) ≍ e_n^cont(S, B_q^{−s+t}(L_p(Ω)), H₀ˢ(Ω)) ≍ e_{n,C}^{non}(S, B_q^{−s+t}(L_p(Ω)), H₀ˢ(Ω)) ≍ n^{−t/d},   (6)

whereas in the case 0 < p < 2 with t > d(1/p − 1/2)

e_n^lin(S, B_q^{−s+t}(L_p(Ω)), H₀ˢ(Ω)) ≍ n^{−t/d+1/p−1/2}

and

e_n^cont(S, B_q^{−s+t}(L_p(Ω)), H₀ˢ(Ω)) ≍ e_{n,C}^{non}(S, B_q^{−s+t}(L_p(Ω)), H₀ˢ(Ω)) ≍ n^{−t/d}.
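The gap between the linear and nonlinear widths for p < 2 has a simple finite-dimensional analogue: when the large coefficients of a signal sit at unknown positions, keeping the n largest coefficients (an n-term, i.e. nonlinear, method) never does worse—and usually does much better—than projecting onto a fixed set of n coordinates (a linear method). A toy illustration (not from the paper; names and parameters are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
# Slowly decaying coefficients placed at unknown (shuffled) positions.
c = 1.0 / np.arange(1, 1025) ** 0.6
rng.shuffle(c)

def linear_error(c, n):
    # Linear scheme: project onto the span of the first n basis elements.
    return float(np.sqrt(np.sum(c[n:] ** 2)))

def nterm_error(c, n):
    # Nonlinear scheme: keep the n largest coefficients, wherever they sit.
    idx = np.argsort(np.abs(c))[::-1]
    return float(np.sqrt(np.sum(c[idx[n:]] ** 2)))
```

For such shuffled, slowly decaying coefficient sequences the n-term error is noticeably smaller than the linear error, mirroring the n^{−t/d} versus n^{−t/d+1/p−1/2} gap above.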
Hence, if p < 2 then there is an essential difference in the behavior: nonlinear approximations can do better than linear ones. This paper is a continuation of [8,9]. We are again interested in optimal nonlinear approximation schemes, but this time related not to Riesz bases but to classes of frames. The motivation for this is given by the following observations. In [9], we presented upper and lower bounds for e_{n,C}^{non}(S, F, H). The proof of the lower bound was quite general and used the fact that e_{n,C}^{non}(S, F, H) can be estimated from below by the manifold widths e_n^cont(S, F, H) up to some constants. In contrast to this, the proof of the upper bound was based on norm equivalences of Besov norms with weighted sequence norms that are induced by a biorthogonal wavelet basis. However, this restricts the choice of the underlying domain Ω ⊂ R^d, since on a general Lipschitz domain the construction of a suitable wavelet basis might be very complicated or even impossible.
This problem becomes less serious in the frame setting since a suitable wavelet frame always exists, see Section 5.2 for a detailed discussion. Moreover, in recent years the application of frame methods for the numerical resolution of the solution u in (1) has become a field of increasing importance. Especially, it has been possible to derive adaptive wavelet frame schemes that are guaranteed to converge for a wide range of problems [6,7,37]. Therefore it is important to clarify the power that frame schemes can have, in principle. In this paper, we give a first answer. Our main result states that the nonlinear frame widths show the same asymptotic behavior as the e_{n,C}^{non}(S, F, H), where we can now allow arbitrary bounded Lipschitz domains. There is an interesting difference to the Riesz bases case. In the frame setting, we do not work with arbitrary n-term approximations, but only with those induced by a frame pair, see Section 2.2 for details. The reason is that, for practical applications, only these canonical representations are used. Actually we prove that if we allowed arbitrary n-term approximations then the associated frame widths would be zero. Moreover, certain conditions related to stability must be satisfied by the admissible frames. Fortunately, these conditions are always satisfied for the known constructions of wavelet frames on Lipschitz domains. This paper is organized as follows. In Section 2, we describe the basic setting. First of all, we introduce and discuss the frame concept as far as it is needed for our purposes. Then, in Section 2.2, we define the nonlinear frame widths and prove some basic properties that are needed in the sequel. Section 3 contains the main results of this paper. In the next section two examples are discussed: the Poisson equation for Lipschitz domains and a Fredholm integral equation of the first kind (the single layer potential). Proofs of our main results are given in Section 5.
For general Hilbert spaces H and G we show that, similar to the Riesz bases case, the nonlinear frame widths can be estimated from below by the manifold widths. Then, for the more specific case of Besov spaces on Lipschitz domains, we also prove an upper estimate which shows that the asymptotic behavior is the same as for the Riesz basis case—but this time for arbitrary bounded Lipschitz domains.

Notation. We write a ≍ b if there exists a constant c > 0 (independent of the context-dependent relevant parameters) such that c^{−1} a ≤ b ≤ c a. One-sided estimates of this type are denoted by a ≲ b. All unimportant constants will be denoted by c, sometimes with additional indices. Identity operators are always denoted by I, also sometimes with additional indices.

2. Frames

In this paper, we will study certain approximations of u = S(f) based on frames. Therefore, in this section we recall the basic properties of frames as far as they are needed for our purposes and introduce the corresponding nonlinear widths. For further information on frames, we refer the reader e.g. to [2,21]. A sequence F = {h_k}_{k∈N} in a separable Hilbert space H is a frame for H if there exist constants A, B > 0 such that

A² Σ_{k=1}^∞ |(f, h_k)_H|² ≤ ‖f‖²_H ≤ B² Σ_{k=1}^∞ |(f, h_k)_H|²   (7)
for all f ∈ H. As a consequence of (7), the corresponding operators of analysis and synthesis, given by

T : H → ℓ₂(N), f ↦ ((f, h_k)_H)_{k∈N},   (8)

T* : ℓ₂(N) → H, c ↦ Σ_{k=1}^∞ c_k h_k,   (9)

are bounded. The composition T*T is a boundedly invertible (positive and self-adjoint) operator called the frame operator. Furthermore, F̃ := (T*T)^{−1}F is again a frame for H, the canonical dual frame. The following formulas hold:

f = Σ_{k=1}^∞ (f, (T*T)^{−1}h_k)_H h_k = Σ_{k=1}^∞ (f, h_k)_H (T*T)^{−1}h_k   (10)
S. Dahlke et al. / Journal of Complexity 23 (2007) 614 – 648
619
sequence of positive numbers which we call simply a weight in what follows. Then we put ⎧ ⎫ 1/2
∞ ⎨ ⎬ 2,w := a = (ak )k∈N : a 2,w := wk |ak |2 0, all finite subsets ⊂ N and all f ∈ K. (iii) Let K be a subspace of H and let C 1 be a given number. By PC (K) we denote the set of all stable frame pairs (F, G) with respect to K such that the constants A, B and A in (12) and (14) satisfy B/ min(A, A ) C. Remark 2. To avoid any type of confusion we shall use (·, ·) for the scalar product in H and ·, · for duality pairings, in particular for H × H . Some comments are in order. Remark 3. (i) A frame pair in the sense of (11) and (12) is sometimes called an atomic decomposition, cf. e.g. [2, Definition 17.3.1.]. However, the phrase atomic decomposition is used with a different meaning in the theory of function spaces, cf. e.g. [18,25,43,46]. For this reason we do not use it here. (ii) Let (F, G) be a frame pair for (H, w). As above let F = {hk }k∈N ⊂ H and G := {gk }k∈N ⊂ H . By the Riesz representation theorem, for every hk there exists an element hk ∈ H such hk )H . Consequently, that f, hk H ×H = (f, √ (f, wk hk )k∈N 2 = (f, hk H ×H )k∈N 2,w for all f ∈ H.
Hence, there is a one-to-one correspondence between F and the Hilbert frame (√(w_k) h̃_k)_k. However, note that G need not be related to the canonical dual frame of (√(w_k) h̃_k)_k. (iii) The reader might wonder why we use the concept of frame pairs instead of the classical frame setting as introduced in (7) and (10). However, since we are dealing here with Gelfand triples (H₀ˢ(Ω), L₂(Ω), H^{−s}(Ω)), s − 1/2 not an integer, see Remark 10, this approach would be at least problematic, since we are not allowed to identify the space H₀ˢ(Ω) with its dual. (Otherwise, it would not be possible to identify L₂(Ω) with its dual at the same time—a strange construction. We refer to [23] for further details.) (iv) Our concept is closely related to Banach frames in the sense of [20,22]. A Banach frame for a separable and reflexive Banach space B is a sequence F = {h_k}_{k∈N} in B′ with an associated sequence space B_d such that the following properties hold:

(B1) norm equivalence: there exist constants A, B > 0 such that

A‖(⟨f, h_k⟩_{B×B′})_{k∈N}‖_{B_d} ≤ ‖f‖_B ≤ B‖(⟨f, h_k⟩_{B×B′})_{k∈N}‖_{B_d}   (15)

for all f ∈ B;

(B2) there exists a bounded operator S from B_d onto B, a so-called synthesis or reconstruction operator, such that

S((⟨f, h_k⟩_{B×B′})_{k∈N}) = f.   (16)

(It is a remarkable fact that for Banach spaces the existence of the reconstruction operator does not follow from the norm equivalence (15) and has to be explicitly required.) A frame pair in the sense of Definition 1(i) induces a Banach frame F = {h_k}_{k∈N} for the special case B = H, B_d = ℓ_{2,w}(N), where the operator R serves as synthesis operator, cf. [2, Theorem 3.2.3]. Consequently, in our setting, the estimate

‖Σ_{k∈N} c_k g_k‖_H ≤ B‖(c_k)_{k∈N}‖_{ℓ_{2,w}}   (17)

always holds. (v) We comment on the condition (14). Clearly, (14) always holds on all of H for a Riesz basis {g_k}_{k∈N} for H. However, there exist frames which are not Riesz bases and for which (14) holds on H. E.g. take an orthonormal basis {e_k}_{k∈N} and define the frame F := {e₁, 2^{−1/2}e₂, 2^{−1/2}e₂, e₃, e₄, . . .}. This is a tight frame, (12) holds with A = B = 1, so the primal and the canonical dual frame coincide. (We refer again to [2, Chapter 5] for further information.) Since {e_k}_{k∈N} is an orthonormal basis, a direct computation shows that (14) holds for A′ = 2^{−1/2}. Nevertheless, requiring (14) on all of H would be very restrictive, and most frames would not satisfy it. As an example, consider the frame F := {e₁, 2^{−1/2}e₂, 2^{−1/2}e₂, 3^{−1/2}e₃, 3^{−1/2}e₃, 3^{−1/2}e₃, . . .}. This is also a tight frame, but again a direct check shows that (14) does not hold. Therefore we require (14) only on subsets. Fortunately, such a condition is satisfied in the case of the known frame constructions for function spaces on Lipschitz domains. (vi) The example in (v) shows that the two constants A and A′ in Definition 1 need not be related at all. Nevertheless, to avoid unnecessary notational difficulties, we will restrict ourselves to the case A = A′ in the sequel. The modifications to the case A ≠ A′ are straightforward.
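The frame operator, the canonical dual frame, and the reconstruction formula (10) can be checked on a small finite example. The sketch below (a toy computation, not from the paper; note that the matrix names shadow the paper's symbols) uses the three-vector "Mercedes-Benz" frame in R², for which the frame operator T*T is 3/2 times the identity, so the canonical dual is just a rescaling and (10) reconstructs exactly:

```python
import numpy as np

# Three unit vectors in R^2 at 120-degree angles: a tight frame.
angles = [np.pi / 2 + 2 * np.pi * k / 3 for k in range(3)]
H = np.array([[np.cos(t), np.sin(t)] for t in angles])  # rows are h_k

S = H.T @ H                     # frame operator T*T (here 1.5 * identity)
H_dual = H @ np.linalg.inv(S)   # canonical dual frame (T*T)^{-1} h_k

f = np.array([0.3, -1.2])
coeffs = H @ f                  # analysis: ((f, h_k))_k
f_rec = H_dual.T @ coeffs       # reconstruction via the second sum in (10)
```

The first sum in (10), Σ_k (f, (T*T)^{−1}h_k) h_k, gives the same vector, since T*T commutes with its inverse.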
(vii) For simplicity, we have introduced our basic concepts for frame pairs indexed by the set of natural numbers. Later on, we shall also use frame pairs corresponding to more general countable sets, with the obvious modifications. For later use, let us finally state the following simple but useful property: frame pairs are invariant under isomorphic mappings.

Lemma 1. Let G, H be Hilbert spaces and let S : G → H be an isomorphism. Let (F, G) be a frame pair for (G, w) with frame constants A, B. Then the following holds:
(i) (S*^{−1}(F), S(G)) is a frame pair for (H, w) with frame constants Ã = A/‖S^{−1}‖ and B̃ = B‖S‖.
(ii) If (F, G) is contained in P_C(K) then (S*^{−1}(F), S(G)) is contained in P_C̃(S(K)), where C̃ = C‖S‖‖S^{−1}‖.

Proof. Step 1: Proof of (i). We start by showing (11). For f ∈ H, we obtain

f = S(S^{−1}(f)) = S(Σ_{k∈N} ⟨S^{−1}(f), h_k⟩_{G′×G} g_k) = Σ_{k∈N} ⟨f, S*^{−1}(h_k)⟩_{H′×H} S(g_k).

The next step is to show the norm equivalence (12). We obtain

(1/‖S‖)‖f‖_H = (1/‖S‖)‖(S ∘ S^{−1})(f)‖_H ≤ ‖S^{−1}(f)‖_G ≤ B‖(⟨S^{−1}(f), h_k⟩_{G′×G})_k‖_{ℓ_{2,w}} = B‖(⟨f, S*^{−1}(h_k)⟩_{H′×H})_k‖_{ℓ_{2,w}} ≤ (B/A)‖S^{−1}(f)‖_G ≤ (B/A)‖S^{−1}‖‖f‖_H.

Let R be the bounded operator associated with (F, G). Then R̃ = S ∘ R is again a bounded operator with

R̃(δ_k) = S(R(δ_k)) = S(g_k),   ‖R̃‖ ≤ ‖S‖‖R‖ ≤ ‖S‖B,

and (i) is shown. Step 2: Proof of (ii). For f ∈ S(K), we get

‖Σ_{k∈Λ} ⟨f, S*^{−1}(h_k)⟩_{H′×H} S(g_k)‖_H ≥ ‖S^{−1}‖^{−1}‖Σ_{k∈Λ} ⟨S^{−1}(f), h_k⟩_{G′×G} g_k‖_G ≥ ‖S^{−1}‖^{−1} A‖(⟨S^{−1}(f), h_k⟩_{G′×G})_{k∈Λ}‖_{ℓ_{2,w}} = ‖S^{−1}‖^{−1} A‖(⟨f, S*^{−1}h_k⟩_{H′×H})_{k∈Λ}‖_{ℓ_{2,w}},

and (ii) is proved with C̃ = B̃/Ã = C‖S‖‖S^{−1}‖. □
2.2. Nonlinear widths for frame pairs

The aim of this paper is to study the asymptotic behavior of specific nonlinear approximation schemes based on frames and to compare them with other well-known widths. In particular, we
want to prove frame analogues of the results obtained in [8,9] for the nonlinear widths associated with classes of Riesz bases. Let $(F,G)$ be a frame pair for $(H,w)$ in the sense of Definition 1 and consider specific $n$-term approximations of the form
$$\sigma_n\big(u, (F,G)\big) := \inf_{|\Lambda|\le n} \Big\| u - \sum_{k\in\Lambda} \langle u, h_k\rangle_{H\times H}\, g_k \Big\|_H. \quad (18)$$
We do not allow arbitrary expansions in terms of the $g_k$ involving at most $n$ nonvanishing coefficients. The reason is that, for practical applications, only these canonical representations are used. Furthermore, to end up with a reasonable notion of a width we need to restrict ourselves to stable frame pairs. In what follows we shall use the following conventions: if $F$ is a subspace of $G$ and if $S : G \to H$ is an isomorphism, then we equip the subspace $S(F)$ with the quasi-norm $\|S(f)\,|S(F)\| := \|f\,|F\|$. Furthermore, if $K$ is a subspace of $S(F)$ we endow it with the quasi-norm of $S(F)$.

Definition 2. Let $G$ and $H$ be separable Hilbert spaces and let $S : G \to H$ be an isomorphism. Let $F$ be a quasi-normed subspace of $G$. For a given constant $C \ge 1$ we denote by $K_C$ the set of all subspaces $K \subset S(F)$ such that the inequality
$$e_n^{cont}(I, S(F), H) \le C\, e_n^{cont}(I, K, H) \quad (19)$$
holds for all $n$. Then, for $n \in \mathbb{N}$, the nonlinear frame width $e_{n,C}^{frame}(S, F, H)$ of the operator $S$ is defined by
$$e_{n,C}^{frame}(S, F, H) := \inf \Big\{ \sup_{\|f\|_F \le 1} \sigma_n\big(S(f), (F,G)\big) \,:\, (F,G) \in P_C(K),\ K \in K_C \Big\}. \quad (20)$$

Remark 4. We comment on this definition. To get a reasonable lower bound for $e_{n,C}^{frame}(S,F,H)$ we need to restrict ourselves to frame pairs which are stable with respect to subspaces $K$ of $S(F)$ which are not too small. "Not too small" is expressed by the inequality (19).
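The canonical $n$-term approximation (18) uses only the frame coefficients $\langle u, h_k\rangle$ themselves. A minimal sketch (helper names our own; greedy selection, which realizes the infimum in (18) exactly only for an orthonormal basis and is otherwise just an upper bound):

```python
import math

def sigma_n(u, frame_g, frame_h, n):
    """Greedy canonical n-term error: keep the n terms <u, h_k> g_k with the
    largest contribution |<u,h_k>| * ||g_k|| and measure the residual.  For a
    general frame pair this is an upper bound for the infimum in (18)."""
    coeffs = [sum(ui * hi for ui, hi in zip(u, h)) for h in frame_h]

    def gnorm(g):
        return math.sqrt(sum(x * x for x in g))

    order = sorted(range(len(coeffs)),
                   key=lambda k: abs(coeffs[k]) * gnorm(frame_g[k]),
                   reverse=True)
    residual = list(u)
    for k in order[:n]:
        for i in range(len(u)):
            residual[i] -= coeffs[k] * frame_g[k][i]
    return math.sqrt(sum(x * x for x in residual))

# Orthonormal-basis sanity check in R^4 (here the greedy choice IS optimal):
basis = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
u = [4.0, -3.0, 2.0, 1.0]
assert abs(sigma_n(u, basis, basis, 2) - math.sqrt(5.0)) < 1e-12
```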
In the above definition we opted for the manifold widths because they have some convenient properties. These widths $e_n^{cont}$ are particular examples of $s$-numbers in the sense of Pietsch [31], see also [27]. One of the interesting properties is the inequality
$$e_n^{cont}(T_2 \circ T_1 \circ T_0, E_0, F_0) \le \|T_0\|\,\|T_2\|\, e_n^{cont}(T_1, E, F), \quad (21)$$
where $T_0 \in L(E_0, E)$, $T_1 \in L(E, F)$, $T_2 \in L(F, F_0)$ and $E_0, E, F, F_0$ are arbitrary quasi-Banach spaces. As a consequence one obtains that the asymptotic behavior of the manifold widths remains unchanged under isomorphisms. A similar result holds for our nonlinear frame widths. As a consequence we can concentrate on the investigation of identity operators in what follows.

Lemma 2. Let $G$ and $H$ be separable Hilbert spaces and let $S : G \to H$ be an isomorphism. Let $F$ be a quasi-normed subspace of $G$ and let $I : F \to G$ be the identity. For $C \ge 1$ and $\tilde C = C\, (\|S^{-1}\|\,\|S\|)^2$
we obtain
$$e_{n,\tilde C}^{frame}(S, F, H) \le \|S\|\, e_{n,C}^{frame}(I, F, G) \quad (22)$$
and
$$e_{n,\tilde C}^{frame}(I, F, G) \le \|S^{-1}\|\, e_{n,C}^{frame}(S, F, H). \quad (23)$$

Proof. We shall prove (22); the proof of (23) is very similar. From (20) we can conclude that for any $\varepsilon > 0$ we can find a subspace $K \in K_C$ and a frame pair $(F,G) \in P_C(K)$ for $(G,w)$ such that
$$\sup_{\|f\|_F \le 1}\ \inf_{|\Lambda|\le n} \Big\| f - \sum_{k\in\Lambda} \langle f, h_k\rangle_{G\times G}\, g_k \Big\|_G \le e_{n,C}^{frame}(I, F, G) + \varepsilon.$$
Lemma 1 implies that $(S^{*\,-1}(F), S(G))$ is a frame pair for $(H,w)$ which is contained in $P_{C_1}(S(K))$, where $C_1 = C\,\|S^{-1}\|\,\|S\|$. We consider the following commutative diagrams, written here as factorizations of the respective identities:
$$I_1 = S \circ I_2 \circ S^{-1} : S(F) \to H, \qquad I_2 = S^{-1} \circ I_1 \circ S : K \to G,$$
where $I_1$ denotes the identity on $S(F)$ (respectively $S(K)$) viewed as a map into $H$, and $I_2$ the identity on $F$ (respectively $K$) viewed as a map into $G$. By means of (21) we derive from these factorizations
$$e_n^{cont}(I_1, S(F), H) \le \|S^{-1}\|\,\|S\|\, e_n^{cont}(I_2, F, G)$$
and
$$e_n^{cont}(I_2, K, G) \le \|S^{-1}\|\,\|S\|\, e_n^{cont}(I_1, S(K), H).$$
Now our assumption $K \in K_C$ yields
$$e_n^{cont}(I_1, S(F), H) \le \|S^{-1}\|\,\|S\|\, e_n^{cont}(I_2, F, G) \le C\,\|S^{-1}\|\,\|S\|\, e_n^{cont}(I_2, K, G) \le C\,(\|S^{-1}\|\,\|S\|)^2\, e_n^{cont}(I_1, S(K), H).$$
In other words, $S(K)$ belongs to the set $K_{\tilde C}$. From
$$\Big\| S(f) - \sum_{k\in\Lambda} \langle S(f), S^{*\,-1}(h_k)\rangle_{H\times H}\, S(g_k) \Big\|_H \le \|S\|\, \Big\| f - \sum_{k\in\Lambda} \langle f, h_k\rangle_{G\times G}\, g_k \Big\|_G$$
it follows that
$$e_{n,\tilde C}^{frame}(S, F, H) \le \|S\|\, e_{n,C}^{frame}(I, F, G). \qquad \square$$
We finish this section by proving two additional properties of nonlinear frame widths that will be used later on in Section 5.3.

Lemma 3. Let $G_1, G_2, H_1, H_2$ be Hilbert spaces and let $S_i \in L(F_i, H_i)$, $i = 1, 2$, be isomorphisms. Let $F_1, F_2$ be quasi-normed subspaces of $G_1$ and $G_2$, respectively. Furthermore we suppose that $T_1 \in L(F_1, F_2)$ and $T_2 \in L(H_2, H_1)$ are isomorphisms. Finally, we assume that we can decompose $S_1 = T_2 \circ S_2 \circ T_1$. Then
$$e_{n,\tilde C}^{frame}(S_1, F_1, H_1) \le \|T_2\|\,\|T_1\|\, e_{n,C}^{frame}(S_2, F_2, H_2) \quad (24)$$
holds with $\tilde C = C\,\|T_2^{-1}\|\,\|T_2\|$.

Proof. Corresponding to our assumptions we have the commutative factorization $S_1 = T_2 \circ S_2 \circ T_1 : F_1 \to H_1$. By definition, for any $\varepsilon > 0$ we can find a subspace $K \in K_C$ and a frame pair $(F,G) \in P_C(K)$ for $(H_2, w)$ such that
$$\sup_{\|f\|_{F_2}\le 1}\ \inf_{|\Lambda|\le n} \Big\| S_2 f - \sum_{k\in\Lambda} \langle S_2 f, h_k\rangle_{H_2\times H_2}\, g_k \Big\|_{H_2} \le e_{n,C}^{frame}(S_2, F_2, H_2) + \varepsilon.$$
Lemma 1 implies that $(T_2^{*\,-1}(F), T_2(G))$ is a frame pair for $(H_1, w)$ which is contained in $P_{\tilde C}(T_2(K))$, where $\tilde C = C\,\|T_2^{-1}\|\,\|T_2\|$. We put
$$u_k := T_2^{*\,-1}(h_k) \qquad \text{and} \qquad v_k := T_2(g_k).$$
Consequently,
$$\Big\| S_1 g - \sum_{k\in\Lambda} \langle S_1 g, u_k\rangle_{H_1\times H_1}\, v_k \Big\|_{H_1} \le \|T_2\|\, \Big\| S_2(T_1 g) - \sum_{k\in\Lambda} \langle S_2(T_1 g), h_k\rangle_{H_2\times H_2}\, g_k \Big\|_{H_2} \le \|T_2\|\, \big(e_{n,C}^{frame}(S_2, F_2, H_2) + \varepsilon\big),$$
if $\|T_1 g\|_{F_2} \le 1$. A homogeneity argument yields
$$\sup_{\|g\|_{F_1}\le 1}\ \inf_{|\Lambda|\le n} \Big\| S_1(g) - \sum_{k\in\Lambda} \langle S_1 g, u_k\rangle_{H_1\times H_1}\, v_k \Big\|_{H_1} \le \|T_2\|\,\|T_1\|\, e_{n,C}^{frame}(S_2, F_2, H_2),$$
which proves our claim. $\square$
Lemma 4. Let $U$ be a closed subspace of the Hilbert space $H$, equipped with the same norm as $H$. Let $G$ be a Hilbert space and let $S : G \to H$ be an isomorphism. If $F$ is a subset of $S^{-1}(U)$, then
$$e_{n,C}^{frame}(S, F, U) \le e_{n,C}^{frame}(S, F, H)$$
follows.
Proof. The Hilbert space $H$ can be written as the orthogonal sum of $U$ and its orthogonal complement $V$. By $P$ we denote the orthogonal projection onto $U$. Let $(F,G)$ be a frame pair for $(H,w)$. Then the elements $f \in U$ can be written in the form
$$f = \sum_{k=1}^{\infty} \langle f, h_k\rangle\, P g_k.$$
The norm equivalences (12) remain unchanged. Hence, $(F, P(G))$ is a frame pair for $(U,w)$ with constants $\tilde A$, $\tilde B$ satisfying $A \le \tilde A$ and $\tilde B \le B$. Concerning the stability it is enough to notice that only subsets $K$ of $S(F) \subset U$ come into consideration. $\square$

3. Main results

In this section, we state the main results of this paper. The first theorem is a general result for arbitrary Hilbert spaces $H$ and $G$ that clarifies the relationship of the manifold widths $e_n^{cont}(S,F,H)$ with the nonlinear frame widths $e_{n,C}^{frame}(S,F,H)$. The second theorem deals with the more specific situation of function spaces on Lipschitz domains contained in $\mathbb{R}^d$ and provides upper and lower bounds for $e_{n,C}^{frame}(S, B_q^{-s+t}(L_p(\Omega)), H_0^s(\Omega))$.

Theorem 1. Let $H$ and $G$ be separable Hilbert spaces. Let $S : G \to H$ be an isomorphism. Suppose that the embedding $F \hookrightarrow G$ is compact. Then for all $C \ge 1$ and all $n \in \mathbb{N}$, we have
$$e_{4n+1}^{cont}(S, F, H) \le 2C^2\, e_{n,C}^{frame}(S, F, H). \quad (25)$$

Theorem 2. Let $\Omega$ be a bounded Lipschitz domain contained in $\mathbb{R}^d$. Let $0 < p, q \le \infty$, $s > 0$, and $t > d\,(1/p - 1/2)_+$. Let $S : H^{-s}(\Omega) \to H_0^s(\Omega)$ be an isomorphism. Then there exists a number $C^*$ such that for any $C \ge C^*$ we have
$$e_{n,C}^{frame}\big(S, B_q^{-s+t}(L_p(\Omega)), H_0^s(\Omega)\big) \asymp n^{-t/d}.$$

Remark 5. (i) The number $C^*$ depends on $\Omega$. It is known that for any Lipschitz domain there exists an appropriate frame pair as needed here. However, optimal estimates for the stability constants seem not to be known. (ii) For exact definitions of the distribution spaces defined on Lipschitz domains we refer to the Appendix and to [9]. (iii) Theorem 2 is a frame analogue of Theorem 4 in [9]. In [9], it has been shown that if the domain $\Omega$ is chosen in such a way that the spaces $B_q^{-s+t}(L_p(\Omega))$ and $H^{-s}(\Omega)$ allow a discretization by one common wavelet system $\tilde R^*$, then also
$$e_{n,C}^{non}\big(S, B_q^{-s+t}(L_p(\Omega)), H_0^s(\Omega)\big) \asymp n^{-t/d}$$
holds for $C$ sufficiently large. We see that the restrictive condition on the domain that was needed in the Riesz basis case can be dropped in the frame setting. (iv) Our proof of the upper bounds in Theorem 2 is constructive. One may always use the frame pair constructed in Lemma 5.
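The rate $n^{-t/d}$ in Theorem 2 is, at bottom, the decay of $\ell_2$-tails of rearranged coefficient sequences. A toy computation (synthetic power-law coefficients, not the function spaces of the theorem) recovers the expected exponent $\alpha - 1/2$ for coefficients $k^{-\alpha}$:

```python
import math

def nterm_error(alpha, n, N=200000):
    """Best n-term l2 error for the decreasing sequence a_k = k^{-alpha}:
    keeping the first n terms leaves the tail sum_{k>n} k^{-2*alpha},
    truncated here at a large N."""
    return math.sqrt(sum(k ** (-2.0 * alpha) for k in range(n + 1, N)))

alpha = 1.5
e1 = nterm_error(alpha, 100)
e2 = nterm_error(alpha, 400)
rate = math.log(e1 / e2) / math.log(4)  # observed decay exponent
print(rate)  # close to alpha - 1/2 = 1.0
```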
4. Examples

In this section, we apply the analysis presented above to two classical examples, i.e. the Poisson equation in a Lipschitz domain and the single layer potential equation on the unit circle.

4.1. The Poisson equation

We consider the Poisson equation in a bounded Lipschitz domain $\Omega$ contained in $\mathbb{R}^d$,
$$-\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega. \quad (26)$$
As usual, we study (26) in the weak formulation. Then it can be shown that the operator $A = -\Delta : H_0^1(\Omega) \to H^{-1}(\Omega)$ is boundedly invertible, see, e.g., [23] for details. Hence Theorem 2 applies with $s = 1$, so that
$$e_{n,C}^{frame}\big(S, B_q^{-1+t}(L_p(\Omega)), H_0^1(\Omega)\big) \asymp n^{-t/d}$$
if $t > d\,(1/p - 1/2)_+$.

4.2. The single layer potential

As a second example we shall deal with an integral equation. Let $\Gamma$ be the unit circle. Then we consider the Fredholm integral equation of the first kind
$$Af(x) := -\frac{1}{2\pi} \int_\Gamma \log|x - y|\, f(y)\, d\sigma(y) = g(x), \qquad x \in \Gamma.$$
The left-hand side is called the single layer potential. The following is known, cf. e.g. [5]: the operator $A$ belongs to $L(H^{-1/2}(\Gamma), H^{1/2}(\Gamma))$, where $H^{1/2}(\Gamma)$ is the collection of all functions $g \in L_2(\Gamma)$ such that
$$\int_\Gamma \int_\Gamma \frac{|g(x) - g(y)|^2}{|x - y|^2}\, d\sigma(x)\, d\sigma(y) < \infty,$$
and $H^{-1/2}(\Gamma)$ is its dual. Furthermore, $A$ is a bijection of $H$ onto $G$, where
$$G := \Big\{ g \in H^{1/2}(\Gamma) : \int_\Gamma g(y)\, d\sigma(y) = 0 \Big\} \qquad \text{and} \qquad H := \{ g \in H^{-1/2}(\Gamma) : \langle g, 1\rangle = 0 \}.$$
The space $G$ can be interpreted as the quotient space $H^{1/2}(\Gamma)/\mathbb{R}$ of $H^{1/2}(\Gamma)$ modulo the constants, and $H$ can be interpreted as the quotient space $H^{-1/2}(\Gamma)/\mathbb{R}$ of $H^{-1/2}(\Gamma)$ modulo the constants. By $S$ we denote $A^{-1}$, defined on $G$ with values in $H$. Now we investigate $e_{n,C}^{frame}(S, F, H)$, where $F$ is chosen to be the quotient space of the Besov space $B_q^{t+1/2}(L_p(\Gamma))$ modulo the constants; see Section 5.3.2 for a definition of $B_q^{t+1/2}(L_p(\Gamma))$. We put
$$Y_q^s(L_p(\Gamma)) := \{ g \in B_q^s(L_p(\Gamma)) : \langle g, 1\rangle = 0 \}.$$
The same principles as above apply. Again we use a commutative diagram:
$$S_F = S \circ I : \ F := Y_q^{t+1/2}(L_p(\Gamma)) \ \xrightarrow{\;I\;}\ H^{1/2}(\Gamma)/\mathbb{R} \ \xrightarrow{\;S\;}\ H^{-1/2}(\Gamma)/\mathbb{R}. \quad (27)$$
Here $I$ denotes the identity and $S_F$ the restriction of $S$ to $F$. Then the outcome is as follows.

Theorem 3. Let $0 < p, q \le \infty$ and $t > (1/p - 1/2)_+$. Then there exists a number $C^*$ such that for any $C \ge C^*$ we have
$$e_{n,C}^{frame}\big(S, Y_q^{t+1/2}(L_p(\Gamma)), H\big) \asymp n^{-t}.$$
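Behind Theorem 3 lies the classical diagonalization of the single layer potential on the circle: writing $|x-y| = 2|\sin((\theta-\varphi)/2)|$ and using the expansion $-\log(2|\sin(t/2)|) = \sum_{k\ge 1} \cos(kt)/k$, the operator $A$ acts on the Fourier modes $e^{ik\theta}$, $k \ne 0$, as multiplication by $1/(2|k|)$ — a known fact not spelled out in the text. The underlying expansion can be checked numerically:

```python
import math

def log_sin_series(t, K=100000):
    """Partial sum of sum_{k>=1} cos(k*t)/k, which converges to
    -log(2*|sin(t/2)|) for t not a multiple of 2*pi."""
    return sum(math.cos(k * t) / k for k in range(1, K + 1))

t = 1.0
approx = log_sin_series(t)
exact = -math.log(2 * abs(math.sin(t / 2)))
assert abs(approx - exact) < 1e-3
```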
Remark 6. There are far-reaching extensions of the theory of the mapping properties of single layer potentials. In particular, much more general curves and surfaces have been treated. We refer to [44, Section 20] for the discussion of these properties in the framework of $d$-sets.

5. Proofs

5.1. Proof of Theorem 1

First, we deal with Theorem 1. Here we shall work in the framework of Hilbert frame pairs. Hence we consider sequences $(g_k)_k$ and $(h_k)_k$ in a (separable) Hilbert space $H$ such that
$$f = \sum_{k=1}^{\infty} (f, h_k)\, g_k \quad (28)$$
for all $f \in H$, compare with Remark 3(ii). By (17) we may assume that
$$\Big\| \sum_{k=1}^{\infty} c_k\, g_k \Big\|^2 \le B^2 \cdot \sum_{k=1}^{\infty} c_k^2 \quad (29)$$
for arbitrary $(c_k)_{k\in\mathbb{N}} \in \ell_2(\mathbb{N})$. Moreover, we assume that the representation (28) is stable on $K \subset H$ in the sense that
$$A^2 \sum_{k\in\Lambda} |(f, h_k)|^2 \le \Big\| \sum_{k\in\Lambda} (f, h_k)\, g_k \Big\|^2 \quad (30)$$
for arbitrary $f \in K$ and $\Lambda \subset \mathbb{N}$. Moreover, we assume that
$$\frac{B}{A} \le C. \quad (31)$$
We consider particular $n$-term approximations of $f \in K$ by subsums of (28) and their error
$$\sigma_n(f) = \inf_{|\Lambda|\le n} \Big\| f - \sum_{k\in\Lambda} (f, h_k)\, g_k \Big\|. \quad (32)$$
We define
$$e_{n,C}(K, H) = \inf_{(g_k)_k, (h_k)_k}\ \sup_{f\in K}\ \sigma_n(f) \quad (33)$$
with the understanding that (28)–(32) hold true. Moreover, we define
$$e_n^{cont}(K, H) := \inf_{N_n, \varphi_n}\ \sup_{u\in K}\ \|\varphi_n(N_n(u)) - u\|, \quad (34)$$
where the infimum runs over all continuous mappings $\varphi_n : \mathbb{R}^n \to H$ and $N_n : K \to \mathbb{R}^n$. Then the following result is a frame analogue of Proposition 1 from [9].

Proposition 1. Assume that $K \subset H$ is compact and $C \ge 1$. Then
$$e_{4n+1}^{cont}(K, H) \le 2C\, e_{n,C}(K, H). \quad (35)$$
Proof. Assume that $K$, $n$, and $C \ge 1$ are given. Let $\varepsilon > 0$. Then there exist sequences $(g_k)_k$ and $(h_k)_k$ in $H$ such that (28)–(31) as well as
$$\sup_{f\in K}\ \inf_{|\Lambda|\le n} \Big\| f - \sum_{k\in\Lambda} (f, h_k)\, g_k \Big\| \le e_{n,C}(K, H) + \varepsilon \quad (36)$$
hold. Since we only consider $f \in K$, we can always assume that the index set $\Lambda$ is a subset of $\{1, 2, \ldots, N\}$; we only lose another $\varepsilon$. Here $N$ might be large, but it is finite. We write
$$L_N(f) = \sum_{k=1}^{N} (f, h_k)\, g_k \quad (37)$$
and obtain
$$\sup_{f\in K} \|f - L_N(f)\| \le \varepsilon \quad (38)$$
and
$$\sup_{f\in K}\ \inf_{|\Lambda|\le n} \Big\| L_N(f) - \sum_{k\in\Lambda} (f, h_k)\, g_k \Big\| \le e_{n,C}(K, H) + 4\varepsilon. \quad (39)$$
For the $n$-term approximation in (39) we also write
$$f_n^* = \sum_{k\in\Lambda} a_k\, g_k, \quad (40)$$
hence $a_k = (f, h_k)$ and $|\Lambda| = n$ for each $f \in K$, and
$$\sup_{f\in K} \|L_N(f) - f_n^*\| \le e_{n,C}(K, H) + 4\varepsilon. \quad (41)$$
For the proof we may assume that $A = 1$. We consider the modification $L_N^*$ of $L_N$ defined by
$$L_N^*(f) = \sum_{k=1}^{N} a_k^*\, g_k, \quad (42)$$
where $a_k^* = a_k$ if $|a_k| \ge 2\delta$ and $a_k^* = 0$ if $|a_k| \le \delta$. To obtain a continuous dependence of $a_k^*$ on $a_k$ and, hence, a continuous mapping $L_N^* : H \to H$, we define $a_k^* = 2\,\mathrm{sgn}(a_k)\,(|a_k| - \delta)$ if $|a_k| \in (\delta, 2\delta)$. The number $\delta > 0$ will be fixed later. Assume that for $f \in K$ there are $m > n$ of the $a_k$ with $|a_k| \ge \delta$. Then
$$L_N f - f_n^* = \sum_{k\in\tilde\Lambda} a_k\, g_k,$$
where $\tilde\Lambda$ contains at least $m - n$ elements with $|a_k| \ge \delta$. Then we obtain from (30) that $\|L_N f - f_n^*\| \ge (m-n)^{1/2}\,\delta$, and with (41) we get
$$m - n \le \frac{1}{\delta^2}\,\big(e_{n,C}(K, H) + 4\varepsilon\big)^2. \quad (43)$$
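The continuous thresholding rule for $a_k^*$ used in the proof can be written out directly; the sketch below (function name our own) checks that the three branches fit together continuously at $|a| = \delta$ and $|a| = 2\delta$:

```python
def soft_clip(a, delta):
    """Continuous modification a -> a* from the proof: a* = a if |a| >= 2*delta,
    a* = 0 if |a| <= delta, and a* = 2*sign(a)*(|a| - delta) in between."""
    if abs(a) >= 2 * delta:
        return a
    if abs(a) <= delta:
        return 0.0
    s = 1.0 if a > 0 else -1.0
    return 2 * s * (abs(a) - delta)

# Continuity at the break points |a| = delta and |a| = 2*delta:
d = 0.5
assert soft_clip(d, d) == 0.0 and abs(soft_clip(d + 1e-9, d)) < 1e-8
assert abs(soft_clip(2 * d, d) - 2 * d) < 1e-12  # both branches agree here
```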
Now we consider the sum $\sum_k |a_k|$ [...]

Let $t > d\,(1/p - 1/2)_+$. Then
$$e_n^{cont}\big(S, B_q^{-s+t}(L_p(\Omega)), H_0^s(\Omega)\big) \asymp n^{-t/d}.$$

5.2.2. Upper bounds

The proof of the upper bound turns out to be a little more involved. However, let us mention that our proof is constructive. As a first step we reduce the proof of Theorem 2 to the proof of the following theorem.

Theorem 4. Let $\Omega$ be as above. Let $0 < p, q \le \infty$, $s \in \mathbb{R}$ and suppose that
$$t > d\left(\frac{1}{p} - \frac{1}{2}\right)_+$$
holds. Then there exists a number $C^*$ such that for any $C \ge C^*$ we have
$$e_{n,C}^{frame}\big(I, B_q^{s+t}(L_p(\Omega)), B_2^s(L_2(\Omega))\big) \lesssim n^{-t/d}.$$
Proof of Theorem 2. Since $H^{-s}(\Omega) = B_2^{-s}(L_2(\Omega))$, cf. Remark 10, Theorem 4 yields that
$$e_{n,C}^{frame}\big(I, B_q^{-s+t}(L_p(\Omega)), H^{-s}(\Omega)\big) \lesssim n^{-t/d}.$$
Since $S : H^{-s}(\Omega) \to H_0^s(\Omega)$ is an isomorphism, Lemma 2 implies the desired result. $\square$
5.2.3. Widths and discrete Besov spaces

The proof of Theorem 4 requires several preparations. First of all, let us fix some notation. Let $0 < p, q \le \infty$ and let $s \in \mathbb{R}$. Let $\nabla := (\nabla_j)_{j=-1}^{\infty}$ be a sequence of subsets of finite cardinality of the set $\{1, 2, \ldots, 2^d - 1\} \times \mathbb{Z}^d$. We suppose that there exist $0 < C_1 \le C_2$ and $J \in \mathbb{N}$ such that the cardinality $|\nabla_j|$ of $\nabla_j$ satisfies
$$C_1\, 2^{jd} \le |\nabla_j| \le C_2\, 2^{jd} \qquad \text{for all } j \ge J. \quad (45)$$
Then $b_{p,q}^s(\nabla)$, where $0 < q < \infty$, denotes the collection of all sequences $a = (a_{j,\lambda})_{j,\lambda}$ of complex numbers such that
$$\|a\|_{b_{p,q}^s} := \Bigg( \sum_{j=-1}^{\infty} 2^{j(s + d(1/2 - 1/p))q} \Big( \sum_{\lambda\in\nabla_j} |a_{j,\lambda}|^p \Big)^{q/p} \Bigg)^{1/q} < \infty. \quad (46)$$
For $q = \infty$, we use the usual modification
$$\|a\|_{b_{p,\infty}^s} := \sup_{j=-1,0,1,\ldots} 2^{j(s + d(1/2 - 1/p))} \Big( \sum_{\lambda\in\nabla_j} |a_{j,\lambda}|^p \Big)^{1/p} < \infty. \quad (47)$$
In our paper [9] we have dealt with several types of widths of embeddings of those discrete Besov spaces. A few of the results obtained there will be recalled now.

Proposition 3. Let $0 < p, q \le \infty$ and $s \in \mathbb{R}$. Suppose that
$$t > d\left(\frac{1}{p} - \frac{1}{2}\right)_+. \quad (48)$$
Then
$$e_n^{cont}\big(I, b_{p,q}^{s+t}(\nabla), b_{2,2}^s(\nabla)\big) \asymp e_n^{non}\big(I, b_{p,q}^{s+t}(\nabla), b_{2,2}^s(\nabla)\big) \asymp n^{-t/d}.$$
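The discrete quasi-norms (46), (47) are straightforward to evaluate for finite sections of $\nabla$. A small sketch (naming our own; levels stored as a dict, restricted to $0 < p < \infty$):

```python
import math

def b_norm(a, s, p, q, d):
    """Quasi-norm (46)/(47) of a finite section a = {j: [coefficients on
    level j]}, levels j = -1, 0, 1, ...; requires 0 < p < infinity."""
    w = s + d * (0.5 - 1.0 / p)
    level = {j: sum(abs(x) ** p for x in coeffs) ** (1.0 / p)
             for j, coeffs in a.items()}
    if q == math.inf:  # modification (47)
        return max(2.0 ** (j * w) * ln for j, ln in level.items())
    return sum((2.0 ** (j * w) * ln) ** q for j, ln in level.items()) ** (1.0 / q)

# A single coefficient a_{j,lambda} = 1 on level j has norm 2^{j(s+d(1/2-1/p))}:
assert abs(b_norm({3: [1.0]}, s=1.0, p=2.0, q=2.0, d=1) - 2.0 ** 3) < 1e-12
```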
Remark 7. Of course, the constants in the above equivalences depend on $\nabla$ (and therefore on $C_1$, $C_2$ and $J$) as well as on $s$, $t$, $p$ and $q$. But this will play no role in what follows.

5.2.4. Frame pairs for Sobolev spaces on domains

Now we turn to the construction of frame pairs for Sobolev spaces with some additional features.
Let $s \in \mathbb{R}$ be fixed and let
$$\Psi := \big\{\varphi_k, \tilde\varphi_k : k \in \mathbb{Z}^d\big\} \cup \big\{\psi_{i,j,k}, \tilde\psi_{i,j,k} : i = 1, \ldots, 2^d-1,\ j = 0,1,2,\ldots,\ k \in \mathbb{Z}^d\big\} \quad (49)$$
be a biorthogonal wavelet system such that the parameter $r$, controlling the smoothness and the moment conditions, satisfies $r > |s|$, see Proposition 4 in the Appendix. Here, as always in this subsection, we shall use $H^s(\Omega) = B_2^s(L_2(\Omega))$ in the sense of equivalent norms, see the Appendix. We suppose
$$\operatorname{supp}\varphi,\ \operatorname{supp}\tilde\varphi,\ \operatorname{supp}\psi_i,\ \operatorname{supp}\tilde\psi_i \subset [-N,N]^d, \qquad i = 1,\ldots,2^d-1.$$
By $B(x^0, R)$ we denote a ball with radius $R$ and center $x^0$. We may assume $\Omega \subset B(x^0, R)$ for some $R > 0$ and $x^0 \in \Omega$. Rychkov [33] has proved that in case of a bounded Lipschitz domain $\Omega$ there exists a linear and continuous extension operator $E \in L(H^s(\Omega), H^s(\mathbb{R}^d))$. In addition, we may assume that
$$\operatorname{supp} Ef \subset B(x^0, 2R) \quad (50)$$
holds for all $f \in H^s(\Omega)$. Now we turn to the wavelet decomposition of $Ef$. Defining
$$\Lambda_j := \big\{ k \in \mathbb{Z}^d : |2^{-j}k_i - x_i^0| \le 2R + 2^{-j}N,\ i = 1,\ldots,d \big\}, \qquad j = 0,1,\ldots,$$
we obtain for given $f \in H^s(\Omega)$
$$Ef = \sum_{k\in\Lambda_0} \langle Ef, \tilde\varphi_k\rangle\, \varphi_k + \sum_{i=1}^{2^d-1} \sum_{j=0}^{\infty} \sum_{k\in\Lambda_j} \langle Ef, \tilde\psi_{i,j,k}\rangle\, \psi_{i,j,k} \qquad \text{(convergence in } S') \quad (51)$$
and
$$\|Ef \,|\, H^s(\mathbb{R}^d)\| \asymp \Big( \sum_{k\in\Lambda_0} |\langle Ef, \tilde\varphi_k\rangle|^2 \Big)^{1/2} + \Big( \sum_{i=1}^{2^d-1} \sum_{j=0}^{\infty} 2^{2js} \sum_{k\in\Lambda_j} |\langle Ef, \tilde\psi_{i,j,k}\rangle|^2 \Big)^{1/2} < \infty. \quad (52)$$
This can be rewritten by using
$$\nabla_{-1} := \Lambda_0, \quad (53)$$
$$\nabla_j := \big\{ (i,k) : 1 \le i \le 2^d - 1,\ k \in \Lambda_j \big\}, \qquad j = 0,1,\ldots, \quad (54)$$
$\psi_{j,\lambda} := \psi_{i,j,k}$ if $\lambda = (i,k) \in \nabla_j$, $j \in \mathbb{N}_0$, and $\psi_{j,\lambda} := \varphi_k$ if $\lambda = k \in \nabla_{-1}$. Similarly in case of the dual basis. Then (51), (52) read as
$$Ef = \sum_{j=-1}^{\infty} \sum_{\lambda\in\nabla_j} \langle Ef, \tilde\psi_{j,\lambda}\rangle\, \psi_{j,\lambda} \qquad \text{(convergence in } S') \quad (55)$$
and
$$\|f \,|\, H^s(\Omega)\| \le \|Ef \,|\, H^s(\mathbb{R}^d)\| \asymp \big\| (\langle Ef, \tilde\psi_{j,\lambda}\rangle)_{j,\lambda} \,\big|\, b_{2,2}^s(\nabla) \big\|. \quad (56)$$
Let $X_\Omega$ denote the characteristic function of $\Omega$. We put
$$g_{j,\lambda} := X_\Omega\, \psi_{j,\lambda}, \qquad j = -1,0,1,\ldots,\ \lambda \in \nabla_j. \quad (57)$$
For $M \in \mathbb{N}$ we have
$$\sum_{j=-1}^{M} \sum_{\lambda\in\nabla_j} \langle Ef, \tilde\psi_{j,\lambda}\rangle\, g_{j,\lambda} = \Big( \sum_{j=-1}^{M} \sum_{\lambda\in\nabla_j} \langle Ef, \tilde\psi_{j,\lambda}\rangle\, \psi_{j,\lambda} \Big)\Big|_\Omega$$
and consequently
$$\lim_{M\to\infty} \sum_{j=-1}^{M} \sum_{\lambda\in\nabla_j} \langle Ef, \tilde\psi_{j,\lambda}\rangle\, g_{j,\lambda} = (Ef)|_\Omega = f \qquad \text{in } H^s(\Omega).$$
Let $E^*$ denote the adjoint of $E$. Define
$$h_{j,\lambda} := E^*(\tilde\psi_{j,\lambda}), \qquad j = -1,0,1,\ldots,\ \lambda \in \nabla_j. \quad (58)$$
Then, taking into account the norm equivalences (56), it follows that $(F,G)$ satisfies (11) and (12) for $(H^s(\Omega), b_{2,2}^s(\nabla))$, where
$$F = \{ h_{j,\lambda} : j = -1,0,1,\ldots,\ \lambda \in \nabla_j \} \quad (59)$$
and
$$G = \{ g_{j,\lambda} : j = -1,0,1,\ldots,\ \lambda \in \nabla_j \}. \quad (60)$$
Instead of writing $(H, w)$ we used here the notation $(H, \ell_{2,w})$, see Definition 1. To obtain a frame pair, it remains to establish a suitable reconstruction operator. Due to the norm equivalences stated in (52) and Proposition 4, it is clear that such an operator $R : \ell_{2,w} \to H^s(\mathbb{R}^d)$ exists on all of $\mathbb{R}^d$. Therefore
$$\tilde R : b_{2,2}^s(\nabla) \to H^s(\Omega), \qquad a = (a_{j,\lambda})_{(j,\lambda)\in\nabla} \mapsto R(a)\big|_\Omega$$
does the job. We collect our findings in the following lemma.

Lemma 5. Let $\Omega \subset \mathbb{R}^d$ be a bounded Lipschitz domain. Let $\Psi$ be a wavelet system, see (49), such that $r > |s|$, see Proposition 4. Let $F$ and $G$ be defined as in (57)–(60). Then $(F,G)$ is a frame pair for $(H^s(\Omega), b_{2,2}^s(\nabla))$, where $\nabla = \nabla(\Omega)$ is defined in (53), (54).

5.2.5. Stability of frame pairs

Next we need to investigate the stability of the frame pair constructed in the previous subsection. The symbol $\nabla$ will always refer to $\nabla = \nabla(\Omega)$ defined in (53), (54). Let $0 < p, q \le \infty$ and suppose $t > d\,(1/p - 1/2)_+$. Furthermore, we require that the parameter $r$ of the wavelet system satisfies
$$r > \max\Big( s + t,\ d \max\big(0, \tfrac{1}{p} - 1\big) - s,\ d \max\big(0, \tfrac{1}{p} - 1\big) - (s + t) \Big), \quad (61)$$
see Proposition 4. We choose a rectangular subset $\square$ of $\Omega$ such that $\operatorname{dist}(\square, \partial\Omega) > 0$. Then we define
$$\nabla_j^* := \big\{ (i,k) \in \nabla_j : \operatorname{supp}\psi_{j,\lambda} \subset \square \big\}, \qquad j = 0,1,\ldots. \quad (62)$$
Of course, it may happen that $\nabla_j^* = \emptyset$ if $j$ is small. Let $J \in \mathbb{N}$ be a number such that $\nabla_j^* \ne \emptyset$ for all $j \ge J$. Then we put
$$K := \Big\{ f \in D'(\Omega) : \text{there exists } (a_{j,\lambda})_{j,\lambda} \in b_{p,q}^{s+t}(\nabla^*) \text{ s.t. } f = \sum_{j=J}^{\infty} \sum_{\lambda\in\nabla_j^*} a_{j,\lambda}\, \psi_{j,\lambda} \Big\}. \quad (63)$$
Because of $\operatorname{dist}(\square, \partial\Omega) > 0$ we can extend $f$ by zero outside of $\Omega$ and obtain from Proposition 4 that $K \subset B_q^{s+t}(L_p(\Omega))$. Again making use of Proposition 4 we find that
$$\big\| (a_{j,\lambda})_{(j,\lambda)\in\Lambda} \,\big|\, b_{2,2}^s(\nabla^*) \big\| \asymp \Big\| \sum_{(j,\lambda)\in\Lambda} a_{j,\lambda}\, \psi_{j,\lambda} \,\Big|\, H^s(\mathbb{R}^d) \Big\| \asymp \Big\| \sum_{(j,\lambda)\in\Lambda} a_{j,\lambda}\, \psi_{j,\lambda} \,\Big|\, H^s(\Omega) \Big\|$$
if $\Lambda \subset \bigcup_{j=J}^{\infty} \nabla_j^*$. Here the constants do not depend on $\Lambda$. Finally, we have to show that $K$ is sufficiently large, or, more exactly, that $K \in K_C$ for some sufficiently large $C$. By definition of $K$ the mapping
$$T : f \mapsto (\langle f, \tilde\psi_{j,\lambda}\rangle)_{(j,\lambda)\in\nabla^*}$$
belongs to $L(K, b_{p,q}^{s+t}(\nabla^*))$. Moreover, it is invertible and $T^{-1} \in L(b_{p,q}^{s+t}(\nabla^*), K)$. Once again we shall use the extension operator $E$. In addition, we apply the fact that $E$ may be chosen such that $E \in L(B_q^{s+t}(L_p(\Omega)), B_q^{s+t}(L_p(\mathbb{R}^d)))$, cf. Rychkov [33]. Now we extend $T$ by defining
$$T : f \mapsto (\langle Ef, \tilde\psi_{j,\lambda}\rangle)_{(j,\lambda)\in\nabla}.$$
This extension is again bounded, cf. Proposition 4. Let us have a look at the commutative diagram given by the factorization
$$I_1 = T \circ I_2 \circ T^{-1} : \ b_{p,q}^{s+t}(\nabla^*) \to b_{2,2}^s(\nabla), \qquad I_2 : K \to B_2^s(L_2(\Omega)).$$
Because of $\nabla_j^* \subset \nabla_j$, $j \ge J$, there is a natural embedding operator between these sequence spaces, here denoted by $I_1$. Since $T \in L(B_2^s(L_2(\Omega)), b_{2,2}^s(\nabla))$ we can apply (21) and conclude
$$e_n^{cont}\big(I_1, b_{p,q}^{s+t}(\nabla^*), b_{2,2}^s(\nabla)\big) \le \|T^{-1}\|\,\|T\|\, e_n^{cont}\big(I_2, K, B_2^s(L_2(\Omega))\big). \quad (64)$$
Furthermore,
$$e_n^{cont}\big(I_1, b_{p,q}^{s+t}(\nabla^*), b_{2,2}^s(\nabla^*)\big) = e_n^{cont}\big(I_1, b_{p,q}^{s+t}(\nabla^*), b_{2,2}^s(\nabla)\big).$$
To explain this we split $b_{2,2}^s(\nabla)$ into $b_{2,2}^s(\nabla^*)$ and its orthogonal complement $U$. Then the claimed identity follows from the observation that optimal approximations $S_n = \varphi_n \circ N_n$, see (5), of elements of $b_{p,q}^{s+t}(\nabla^*)$ are obtained with $\varphi_n : \mathbb{R}^n \to b_{2,2}^s(\nabla^*)$. The behavior of the left-hand side in (64) is known, see Proposition 3. As a consequence we obtain
$$c_1\, n^{-t/d} \le e_n^{cont}\big(I_1, b_{p,q}^{s+t}(\nabla^*), b_{2,2}^s(\nabla^*)\big) = e_n^{cont}\big(I_1, b_{p,q}^{s+t}(\nabla^*), b_{2,2}^s(\nabla)\big) \le c_2\, e_n^{cont}\big(I_2, K, B_2^s(L_2(\Omega))\big) \quad (65)$$
with some positive $c_1, c_2$. Summarizing, we have proved that the frame pair $(F,G)$ from Lemma 5 is admissible in the sense of Definition 2 for $C$ sufficiently large.

Lemma 6. Let $\Omega \subset \mathbb{R}^d$ be a bounded Lipschitz domain. Let $\square$ be a rectangular subset of $\Omega$ such that $\operatorname{dist}(\square, \partial\Omega) > 0$. Let $s \in \mathbb{R}$, $0 < p, q \le \infty$ and $t > d\,(1/p - 1/2)_+$. Let $\Psi$ be a wavelet system, see (49), such that $r$ satisfies (61), see Proposition 4. Let $F$ and $G$ be defined as in (57)–(60). Then the frame pair $(F,G)$ is stable with respect to the set $K$ defined in (63), i.e. it belongs to $P_C(K)$, and $K$ belongs to $K_C \subset B_q^{s+t}(L_p(\Omega))$ if $C$ is sufficiently large.

5.2.6. Proof of Theorem 4

To prove Theorem 4 we shall use the frame pair from Lemmata 5 and 6. Let $\Lambda \subset \nabla$ be a set of cardinality $n$. Then
$$\sigma_n\big(f, (F,G)\big)_{B_2^s(L_2(\Omega))} \le \Big\| f - \sum_{(j,\lambda)\in\Lambda} \langle f, E^*\tilde\psi_{j,\lambda}\rangle\, g_{j,\lambda} \,\Big|\, B_2^s(L_2(\Omega)) \Big\| \le c_1\, \big\| (\langle f, E^*\tilde\psi_{j,\lambda}\rangle)_{(j,\lambda)\in\nabla\setminus\Lambda} \,\big|\, b_{2,2}^s \big\|,$$
where we have once again used (17). By $O$ we denote the canonical orthonormal basis of $b_{2,2}^0(\nabla)$ and by $e_{j,\lambda}$ its elements. For $a \in b_{2,2}^s(\nabla)$ we put
$$\sigma_n(a, O)_{b_{2,2}^s} := \inf_{|\Lambda|\le n} \Big\| a - \sum_{(j,\lambda)\in\Lambda} a_{j,\lambda}\, e_{j,\lambda} \,\Big|\, b_{2,2}^s(\nabla) \Big\|.$$
If $\Lambda$ contains the $n$ largest terms $2^{js}\,|\langle f, E^*\tilde\psi_{j,\lambda}\rangle|$, then
$$\sigma_n\big(f, (F,G)\big)_{B_2^s(L_2(\Omega))} \le c_1\, \sigma_n\big((\langle f, E^*\tilde\psi_{j,\lambda}\rangle)_{(j,\lambda)\in\nabla}, O\big)_{b_{2,2}^s}$$
follows. Next we shall use the following abbreviations: let $F_1 = B_q^{s+t}(L_p(\Omega))$ and $F_2 = b_{p,q}^{s+t}(\nabla)$. Using Proposition 3 with respect to $\nabla$ and a simple homogeneity argument we find
$$\sup_{\|f\|_{F_1}\le 1} \sigma_n\big(f, (F,G)\big)_{B_2^s(L_2(\Omega))} \le c_2 \sup_{\|a\|_{F_2}\le 1} \sigma_n(a, O)_{b_{2,2}^s} \le c_3\, n^{-t/d},$$
since $\|(\langle f, E^*\tilde\psi_{j,\lambda}\rangle)_{(j,\lambda)\in\nabla} \,|\, b_{p,q}^{s+t}\| \lesssim \|f \,|\, B_q^{s+t}(L_p(\Omega))\|$. This completes the proof of Theorem 4. $\square$
Remark 8. The advantage of our frame construction is that it is universal for all bounded Lipschitz domains. Its disadvantage lies in the use of the operator $E^*$, which limits its value in concrete calculations. There are other frame constructions in the literature; let us mention the constructions given in [4,47,6]. We add a few comments on these frames:
• The frame pairs constructed in [4] allow a discretization of Besov spaces on domains under certain restrictions, both with respect to the domains and with respect to the parameters of the Besov space. In particular, only the case $1 \le p \le \infty$, $0 < q \le \infty$ and $s > 0$ is considered. With $(F,G)$ denoting the frame pairs constructed in the aforementioned paper we obtain
$$\sup_{\|f\|_{F_1}\le 1} \sigma_n\big(f, (F,G)\big)_{H^{-s}(\Omega)} \lesssim n^{-t/d},$$
where
$$F_1 := B_q^{-s+t}(L_p(\Omega)), \qquad t - s > 0, \quad 1 \le p, q \le \infty.$$
Generalizations to the case $0 < q, p < 1$ have been given in [15].
• The frames constructed in [47] allow a discretization of Besov spaces on Lipschitz domains under the restrictions $0 < p, q \le \infty$ and $s < 0$. The frame pairs consist of either wavelets originating from a wavelet basis on $\mathbb{R}^d$ or dilated and shifted versions of the associated scaling function. They all have the property that their support is contained in $\overline\Omega$. Furthermore, these dilated and shifted copies of the scaling functions show up only near the boundary. Inside a box contained in $\Omega$ and with some distance to the boundary the frame pair reduces to a biorthogonal wavelet subsystem. The same construction can be made to discretize the Besov spaces $\tilde B_q^s(L_p(\Omega))$ if $s > d \max(0, 1/p - 1)$, see the Appendix for a definition. Hence, with $(F,G)$ denoting the frame pair of [47] we obtain
$$\sup_{\|f\|_{F_1}\le 1} \sigma_n\big(f, (F,G)\big)_{H^{-s}(\Omega)} \lesssim n^{-t/d},$$
where
$$F_1 := \begin{cases} B_q^{-s+t}(L_p(\Omega)) & \text{if } t - s < 0, \\ \tilde B_q^{-s+t}(L_p(\Omega)) & \text{if } t - s > d \max(0, \tfrac{1}{p} - 1). \end{cases}$$
• The frame pairs constructed in [6] allow a discretization of $H^s(\Omega)$-spaces with $s > 0$. This construction works for domains with piecewise analytic boundary and is based on an overlapping partition of the domain by means of sufficiently smooth parametric images of the unit cube. On the reference cube, a tensor product biorthogonal wavelet basis employing the boundary adapted wavelets on the interval from [10] is constructed. Under certain conditions, the union of all the parametric images of these bases gives rise to a frame pair for $H^s(\Omega)$, $s > 0$.
• Of course, all the examples of biorthogonal wavelet bases on polyhedral domains also fit into our setting. One natural way, as, e.g., outlined in [1,11], is to decompose the domain into a disjoint union of parametric images of reference cubes. Then one constructs wavelet bases on the reference cubes and glues everything together in a judicious fashion. However, due to the glueing procedure, only Sobolev spaces $H^s$ with smoothness $s < \tfrac{3}{2}$ can be characterized. This bottleneck can be circumvented by the approach in [12]. There, a much more sophisticated domain decomposition method involving certain projection and extension operators is used. By proceeding in this way, norm equivalences for all spaces $B_q^t(L_p(\Omega))$ can be derived, at least for the case $p > 1$, see [12, Theorem 3.4.3]. However, the authors also mention that their results can be generalized to the case $p < 1$, see [12, Remark 3.1.2].

5.3. Proof of Theorem 3

Periodic Besov spaces have properties analogous to those of the Besov spaces defined on smooth domains or on $\mathbb{R}^d$. Our general reference for these classes is [34]. A definition of periodic Besov spaces is given in the Appendix.
5.3.1. Widths of periodic Besov spaces

As a preparation for the proof of Theorem 3 we shall investigate the widths of embeddings of periodic Besov spaces, a topic which is also of independent interest. In [9] we reduced the corresponding problem for the nonperiodic Besov spaces on a Lipschitz domain to that for the discrete Besov spaces. It would be of interest to construct an isomorphism between the periodic spaces $B_q^s(L_p(\mathbb{T}))$ and $b_{p,q}^s$ as well, see Section 5.2.3. Periodic wavelet constructions exist in the literature. However, to our knowledge, those characterizations of periodic Besov spaces are established only under additional restrictions on the parameters. So we employ a different strategy here.

Theorem 5. Let $0 < p, q \le \infty$, $s \in \mathbb{R}$ and suppose that
$$t > \left(\frac{1}{p} - \frac{1}{2}\right)_+$$
holds. Then there exists a constant $C^*$ such that for any $C \ge C^*$ we have
$$e_{n,C}^{frame}\big(I, B_q^{s+t}(L_p(\mathbb{T})), B_2^s(L_2(\mathbb{T}))\big) \asymp n^{-t}.$$

Proof. Step 1: Preparations. For the estimate from above we shall use a connection between periodic and weighted spaces. Let $w_\alpha(x) := (1 + |x|^2)^{-\alpha/2}$, $x \in \mathbb{R}$, $\alpha > 0$. We define
$$B_q^s(L_p(\mathbb{R}, \alpha)) := \big\{ f \in S'(\mathbb{R}) : w_\alpha f \in B_q^s(L_p(\mathbb{R})) \big\}, \quad (66)$$
endowed with the natural quasi-norm $\|f \,|\, B_q^s(L_p(\mathbb{R}, \alpha))\| := \|w_\alpha f \,|\, B_q^s(L_p(\mathbb{R}))\|$. Here $S'(\mathbb{R})$ denotes the collection of the tempered distributions on $\mathbb{R}$. As a combination of Franke's characterization of weighted spaces, see Theorem 5.1.3 in [34], and a result of Triebel [41] we find that $f \in B_q^s(L_p(\mathbb{T}))$ if and only if $f$ is a $2\pi$-periodic distribution in $S'(\mathbb{R})$ which belongs to $B_q^s(L_p(\mathbb{R}, \alpha))$ with $\alpha > 1/p$. Moreover, there exist positive constants $c_1, c_2$ such that
$$c_1\, \|f \,|\, B_q^s(L_p(\mathbb{R}, \alpha))\| \le \|f \,|\, B_q^s(L_p(\mathbb{T}))\| \le c_2\, \|f \,|\, B_q^s(L_p(\mathbb{R}, \alpha))\|$$
holds for all such $f$.

Step 2: Let $\psi \in C_0^\infty(\mathbb{R})$ be a smooth cut-off function such that $\psi(x) = 1$ if $|x| \le \pi$ and $\psi(x) = 0$ if $|x| \ge 2\pi$. We shall study the mapping $T : f \mapsto \psi \cdot f$. Let $J = [-3\pi, 3\pi]$. Obviously,
$$\|\psi f \,|\, B_q^s(L_p(J))\| \le \|\psi f \,|\, B_q^s(L_p(\mathbb{R}))\| = \big\| (w_\alpha\, \psi f)\,(1/w_\alpha)\,\psi(\cdot/2) \,\big|\, B_q^s(L_p(\mathbb{R})) \big\| \le c_3\, \big\| (1/w_\alpha)\,\psi(\cdot/2) \,\big|\, C^\varrho(\mathbb{R}) \big\|\, \|w_\alpha\, \psi f \,|\, B_q^s(L_p(\mathbb{R}))\| \le c_4\, \|f \,|\, B_q^s(L_p(\mathbb{R}, \alpha))\|,$$
where $\varrho$ has to be chosen sufficiently large, cf. e.g. [42, 2.8], [32, 4.7]. Since $\psi$ is a pointwise multiplier for these weighted Besov spaces as well, we end up with $T \in L(B_q^s(L_p(\mathbb{T})), B_q^s(L_p(J)))$. Moreover, $T$ is a bijection onto a closed subspace of $B_q^s(L_p(J))$, denoted by $T_q^s(L_p(J))$, simultaneously for all parameters. Now we consider the commutative diagram
$$I_1 : B_q^{s+t}(L_p(\mathbb{T})) \to B_2^s(L_2(\mathbb{T})), \qquad I_2 : T_q^{s+t}(L_p(J)) \to T_2^s(L_2(J)),$$
with $T$ mapping the first row to the second and $T^{-1}$ back. Lemma 3 yields
$$e_{n,\tilde C}^{frame}\big(I_1, B_q^{s+t}(L_p(\mathbb{T})), B_2^s(L_2(\mathbb{T}))\big) \le \|T^{-1}\|\,\|T\|\, e_{n,C}^{frame}\big(I_2, T_q^{s+t}(L_p(J)), T_2^s(L_2(J))\big)$$
with $\tilde C = C\,\|T^{-1}\|\,\|T\|$. Now we employ Lemma 4 and obtain
$$e_{n,C}^{frame}\big(I_2, T_q^{s+t}(L_p(J)), T_2^s(L_2(J))\big) \le e_{n,C}^{frame}\big(I_2, T_q^{s+t}(L_p(J)), B_2^s(L_2(J))\big).$$
This, together with a monotonicity argument, leads to
$$e_{n,\tilde C}^{frame}\big(I_1, B_q^{s+t}(L_p(\mathbb{T})), B_2^s(L_2(\mathbb{T}))\big) \le \|T^{-1}\|\,\|T\|\, e_{n,C}^{frame}\big(I_2, B_q^{s+t}(L_p(J)), B_2^s(L_2(J))\big).$$
The estimate from above is finished by using Theorem 4 with $\Omega = J$ and $d = 1$.

Step 3: Let $J = (-\tfrac{1}{2}, \tfrac{1}{2})$. Then there exists a linear extension operator $E : B_q^s(L_p(J)) \to B_q^s(L_p(\mathbb{R}))$, see [33]. Let $\psi$ be as above. We define
$$Tf(x) := \begin{cases} Ef(x)\, \psi(6x) & \text{if } -\pi \le x \le \pi, \\ \text{$2\pi$-periodic extension} & \text{otherwise.} \end{cases}$$
We claim that $T \in L(B_q^s(L_p(J)), B_q^s(L_p(\mathbb{T})))$ for all parameter constellations. To see this we first construct an appropriate decomposition of unity. We put
$$\varphi(x) := \frac{\psi(x)}{\sum_{k=-\infty}^{\infty} \psi(x - 2\pi k)}, \qquad x \in \mathbb{R}.$$
It follows that
$$1 = \sum_{m=-\infty}^{\infty} \varphi(x - 2\pi m) \qquad \text{for all } x \in \mathbb{R}$$
and $\operatorname{supp}\varphi \subset \{x \in \mathbb{R} : \psi(x/2) = 1\}$. Hence, with $\bar t = \min(1, p, q)$ and $\alpha > 1/\bar t \ge 1/p$, we obtain
$$\begin{aligned}
\|Tf \,|\, B_q^s(L_p(\mathbb{T}))\|^{\bar t} &\le c_2^{\bar t}\, \|w_\alpha\, (Tf) \,|\, B_q^s(L_p(\mathbb{R}))\|^{\bar t} = c_2^{\bar t}\, \Big\| \sum_{m=-\infty}^{\infty} \varphi(\cdot - 2\pi m)\, w_\alpha\, (Tf) \,\Big|\, B_q^s(L_p(\mathbb{R})) \Big\|^{\bar t} \\
&\le c_2^{\bar t} \sum_{m=-\infty}^{\infty} \|\varphi(\cdot - 2\pi m)\, w_\alpha\, (Tf) \,|\, B_q^s(L_p(\mathbb{R}))\|^{\bar t} \\
&= c_2^{\bar t} \sum_{m=-\infty}^{\infty} \Big\| w_\alpha\, \varphi(\cdot - 2\pi m)\, \psi\Big(\frac{\cdot - 2\pi m}{2}\Big)\, (Tf) \,\Big|\, B_q^s(L_p(\mathbb{R})) \Big\|^{\bar t} \\
&\le c_3^{\bar t} \sum_{m=-\infty}^{\infty} \big\| w_\alpha\, \varphi(\cdot - 2\pi m) \,\big|\, C^\varrho(\mathbb{R}) \big\|^{\bar t}\, \Big\| \psi\Big(\frac{\cdot - 2\pi m}{2}\Big)\, (Tf) \,\Big|\, B_q^s(L_p(\mathbb{R})) \Big\|^{\bar t},
\end{aligned}$$
where we used again assertions on pointwise multipliers, see, e.g., [42, 2.8], [32, 4.7]. The shift-invariance of $\|\cdot \,|\, B_q^s(L_p(\mathbb{R}))\|$ and the periodicity of $Tf$ imply
$$\Big\| \psi\Big(\frac{\cdot - 2\pi m}{2}\Big)\, (Tf) \,\Big|\, B_q^s(L_p(\mathbb{R})) \Big\| = \|\psi(\cdot/2)\, (Tf) \,|\, B_q^s(L_p(\mathbb{R}))\|$$
for all $m \in \mathbb{Z}$. Furthermore, elementary calculations yield
$$\big\| w_\alpha\, \varphi(\cdot - 2\pi m) \,\big|\, C^\varrho(\mathbb{R}) \big\| \le c_4\, w_\alpha(2\pi m)$$
with $c_4$ independent of $m$. Altogether this proves
$$\|Tf \,|\, B_q^s(L_p(\mathbb{T}))\| \le c_5\, \|\psi(\cdot/2)\, (Tf) \,|\, B_q^s(L_p(\mathbb{R}))\| \Big( \sum_{m=-\infty}^{\infty} w_\alpha(2\pi m)^{\bar t} \Big)^{1/\bar t} \le c_6\, \|\psi(\cdot/2)\, (Tf) \,|\, B_q^s(L_p(\mathbb{R}))\|.$$
Taking into account the identity
$$\psi(x/2)\, Tf(x) = \psi(x/2) \Big( \sum_{m=-2}^{2} Ef(x - 2\pi m)\, \psi(6(x - 2\pi m)) \Big)$$
we have
$$\begin{aligned}
\|\psi(\cdot/2)\, (Tf) \,|\, B_q^s(L_p(\mathbb{R}))\| &\le c_7 \sum_{m=-2}^{2} \|\psi(\cdot/2)\, Ef(\cdot - 2\pi m)\, \psi(6(\cdot - 2\pi m)) \,|\, B_q^s(L_p(\mathbb{R}))\| \\
&\le c_8 \sum_{m=-2}^{2} \|Ef(\cdot - 2\pi m)\, \psi(6(\cdot - 2\pi m)) \,|\, B_q^s(L_p(\mathbb{R}))\| \\
&\le c_9\, \|Ef\, \psi(6\,\cdot) \,|\, B_q^s(L_p(\mathbb{R}))\| \le c_{10}\, \|Ef \,|\, B_q^s(L_p(\mathbb{R}))\| \le c_{11}\, \|f \,|\, B_q^s(L_p(J))\|,
\end{aligned}$$
which proves the claim. Moreover, $T$ is a bijection onto a closed subspace of $B_q^s(L_p(\mathbb{T}))$. This subspace will be denoted by $T_q^s(L_p(\mathbb{T}))$. Now we can argue as in Step 2. The commutative diagram
$$I_1 : B_q^{s+t}(L_p(J)) \to B_2^s(L_2(J)), \qquad I_2 : T_q^{s+t}(L_p(\mathbb{T})) \to T_2^s(L_2(\mathbb{T})),$$
with $T$ mapping the first row to the second and $T^{-1}$ back, implies
$$e_{n,\tilde C}^{frame}\big(I_1, B_q^{s+t}(L_p(J)), B_2^s(L_2(J))\big) \le \|T^{-1}\|\,\|T\|\, e_{n,C}^{frame}\big(I_2, B_q^{s+t}(L_p(\mathbb{T})), B_2^s(L_2(\mathbb{T}))\big)$$
with $\tilde C = C\,\|T^{-1}\|\,\|T\|$. The estimate from below is finished by using Theorem 4 with $\Omega = J$ and $d = 1$. $\square$
Now we consider some subspaces of $B^s_q(L_p(\mathbb{T}))$. Let
$$
Z^s_q(L_p(\mathbb{T})) := \big\{ f \in B^s_q(L_p(\mathbb{T})) :\ \langle f, 1 \rangle_{\mathbb{T}} = 0 \big\}. \tag{67}
$$
Observe that the function $g(x) \equiv 1$ belongs to $D(\mathbb{T})$, the collection of all complex-valued, $2\pi$-periodic and infinitely differentiable functions. Since $D(\mathbb{T}) \hookrightarrow B^s_q(L_p(\mathbb{T})) \hookrightarrow D'(\mathbb{T})$, the scalar product $\langle f, 1 \rangle_{\mathbb{T}}$ is well-defined for all $f \in B^s_q(L_p(\mathbb{T}))$, cf. [34, 3.5.1].

Corollary 1. Let $0 < p, q \le \infty$, $s \in \mathbb{R}$ and suppose that
$$
t > \Big( \frac{1}{p} - \frac{1}{2} \Big)_{+}
$$
holds. Then there exists a constant $C^*$ such that for any $C \ge C^*$ we have
$$
e^{\mathrm{frame}}_{n,C}\big(I, Z^{s+t}_q(L_p(\mathbb{T})), Z^{s}_2(L_2(\mathbb{T}))\big) \asymp n^{-t}.
$$
Proof. The upper estimate can be established as above. For the estimate from below we start with $f \in B^s_q(L_p(J))$ and $J = [-\frac12, -\frac14]$. The operator $T$ has to be replaced by
$$
\widetilde{T} f(x) := \begin{cases} Ef(x)\, \varphi(14(x + 1/2)) - Ef(-x)\, \varphi(14(-x + 1/2)) & \text{if } -\pi \le x \le \pi, \\ \text{$2\pi$-periodic extension} & \text{otherwise}. \end{cases}
$$
Hence $\langle \widetilde{T} f, 1 \rangle_{\mathbb{T}} = 0$, which is clear for $f \in D(\mathbb{T})$. Since $D(\mathbb{T})$ is dense in $D'(\mathbb{T})$, it follows in general.

5.3.2. Besov spaces on the unit circle

There is a simple transformation of the interval $[0, 2\pi)$ onto the unit circle $\Gamma$ given by
$$
t \mapsto (\cos t, \sin t), \qquad 0 \le t < 2\pi.
$$
For a given distribution $f \in D'(\Gamma)$ we define
$$
h(t) := f(\cos t, \sin t), \qquad t \in \mathbb{R}. \tag{68}
$$
Observe that $\varphi \in D(\Gamma)$ implies $\varphi(\cos t, \sin t) \in D(\mathbb{T})$. Hence, if $f \in D'(\Gamma)$ then $h \in D'(\mathbb{T})$.

Definition 3. Let $s \in \mathbb{R}$ and $0 < p, q \le \infty$. Then $B^s_q(L_p(\Gamma))$ is the collection of all distributions $f \in D'(\Gamma)$ such that the corresponding distribution $h$ is contained in $B^s_q(L_p(\mathbb{T}))$. We put
$$
\|f\,|B^s_q(L_p(\Gamma))\| := \|h\,|B^s_q(L_p(\mathbb{T}))\|.
$$
Lemma 7. In the sense of equivalent norms we have $H^{1/2}(\Gamma) = B^{1/2}_2(L_2(\Gamma))$ as well as $H^{-1/2}(\Gamma) = B^{-1/2}_2(L_2(\Gamma))$.

Proof. It holds
$$
B^{1/2}_2(L_2(\mathbb{T})) = \Big\{ h \in L_2(\mathbb{T}) : \int_0^{2\pi}\!\!\int_0^{2\pi} \frac{|h(x) - h(y)|^2}{|x - y|^2}\, dx\, dy < \infty \Big\},
$$
see e.g. [34, 3.5.4]. Furthermore, the norms $\|h\,|B^{1/2}_2(L_2(\mathbb{T}))\|$ and
$$
\|h\,|L_2(\mathbb{T})\| + \Big( \int_0^{2\pi}\!\!\int_0^{2\pi} \frac{|h(x) - h(y)|^2}{|x - y|^2}\, dx\, dy \Big)^{1/2}
$$
are equivalent. Now it remains to observe that
$$
\|f\,|L_2(\Gamma)\| + \Big( \int_{\Gamma}\!\!\int_{\Gamma} \frac{|f(x) - f(y)|^2}{|x - y|^2}\, dx\, dy \Big)^{1/2} \asymp \|h\,|L_2(\mathbb{T})\| + \Big( \int_0^{2\pi}\!\!\int_0^{2\pi} \frac{|h(x) - h(y)|^2}{|x - y|^2}\, dx\, dy \Big)^{1/2},
$$
since there exist positive constants $c_1, c_2$ such that
$$
c_1 |x - y|^2 \le (\cos x - \cos y)^2 + (\sin x - \sin y)^2 \le c_2 |x - y|^2
$$
for all $x, y \in [0, 2\pi]$. This proves $H^{1/2}(\Gamma) = B^{1/2}_2(L_2(\Gamma))$ in the sense of equivalent norms. The second assertion follows from $(H^{1/2}(\Gamma))' = H^{-1/2}(\Gamma)$ (just by definition) and the duality relation $(B^{1/2}_2(L_2(\mathbb{T})))' = B^{-1/2}_2(L_2(\mathbb{T}))$, see [34, 3.5.6].

5.3.3. Proof of Theorem 3

We consider the commutative diagram
$$
\begin{array}{ccc}
Y^{t+1/2}_q(L_p(\Gamma)) & \xrightarrow{\ I_1\ } & H^{1/2}(\Gamma) \\[2pt]
{\scriptstyle T}\big\downarrow & & \big\uparrow{\scriptstyle T^{-1}} \\[2pt]
Z^{t+1/2}_q(L_p(\mathbb{T})) & \xrightarrow{\ I_2\ } & Z^{1/2}_2(L_2(\mathbb{T}))
\end{array}
$$
Here the operator $T$ is chosen to be the mapping $f \mapsto h$. Since $T$ is a bijection considered as a mapping defined on $D'(\Gamma)$ with values in $D'(\mathbb{T})$, we obtain that $T$ is an isomorphism belonging to $\mathcal{L}(B^{t+1/2}_q(L_p(\Gamma)), B^{t+1/2}_q(L_p(\mathbb{T})))$. Consequently, $T : Y^{t+1/2}_q(L_p(\Gamma)) \to Z^{t+1/2}_q(L_p(\mathbb{T}))$ is an isomorphism as well. Lemma 3 yields
$$
e^{\mathrm{frame}}_{n,\widetilde{C}}\big(I_1, Y^{t+1/2}_q(L_p(\Gamma)), H^{1/2}(\Gamma)\big) \le \|T\|\,\|T^{-1}\|\; e^{\mathrm{frame}}_{n,C}\big(I_2, Z^{t+1/2}_q(L_p(\mathbb{T})), Z^{1/2}_2(L_2(\mathbb{T}))\big) \tag{69}
$$
with $\widetilde{C} = C\,\|T^{-1}\|\,\|T\|$. As a consequence of the commutative diagram
$$
\begin{array}{ccc}
Z^{t+1/2}_q(L_p(\mathbb{T})) & \xrightarrow{\ I_1\ } & Z^{1/2}_2(L_2(\mathbb{T})) \\[2pt]
{\scriptstyle T^{-1}}\big\downarrow & & \big\uparrow{\scriptstyle T} \\[2pt]
Y^{t+1/2}_q(L_p(\Gamma)) & \xrightarrow{\ I_2\ } & H^{1/2}(\Gamma)
\end{array}
$$
Lemma 3, and inequality (69) we conclude
$$
e^{\mathrm{frame}}_{n,\widetilde{C}}\big(I_1, Y^{t+1/2}_q(L_p(\Gamma)), H^{1/2}(\Gamma)\big) \asymp e^{\mathrm{frame}}_{n,C}\big(I_2, Z^{t+1/2}_q(L_p(\mathbb{T})), Z^{1/2}_2(L_2(\mathbb{T}))\big).
$$
From Corollary 1 we derive
$$
e^{\mathrm{frame}}_{n,C}\big(I_1, Y^{t+1/2}_q(L_p(\Gamma)), H^{1/2}(\Gamma)\big) \asymp n^{-t}
$$
for $C$ sufficiently large. Now the assertion follows from the commutative diagram (27) and Lemma 2.

Acknowledgment

We thank Hans Georg Feichtinger, Massimo Fornasier and Hans Triebel for valuable remarks and comments that improved our paper.

Appendix A. Besov spaces

Here we collect some properties of Besov spaces which have been used in the text before. For general information on Besov spaces we refer to the monographs [28-30,32,42,43,46]. A collection of results for Besov as well as Sobolev spaces on domains can be found in [9], where detailed references are given. In most of the references given above, Besov as well as Sobolev spaces are treated as classes of complex-valued functions (distributions). In the framework of information-based complexity it is common to deal with real-valued functions (distributions), cf. e.g. (5). Here we make use of the following point of view: all spaces in the Appendix are spaces of complex-valued distributions; at the end we consider the restrictions to the real-valued subspaces.

A.1. Wavelet characterizations

For the construction of biorthogonal wavelet bases as considered below we refer to the recent monograph of Cohen [3, Chapter 2]. Let $\varphi$ be a compactly supported scaling function of sufficiently high regularity and let $\psi_i$, $i = 1, \ldots, 2^d - 1$, be corresponding wavelets. More exactly, we suppose for some $N > 0$ and $r \in \mathbb{N}$:
$$
\operatorname{supp} \varphi,\ \operatorname{supp} \psi_i \subset [-N, N]^d, \qquad i = 1, \ldots, 2^d - 1,
$$
$$
\varphi, \psi_i \in C^r(\mathbb{R}^d), \qquad i = 1, \ldots, 2^d - 1,
$$
$$
\int x^{\alpha}\, \psi_i(x)\, dx = 0 \quad \text{for all } |\alpha| \le r, \qquad i = 1, \ldots, 2^d - 1,
$$
and
$$
\varphi(x - k), \quad 2^{jd/2}\, \psi_i(2^j x - k), \qquad j \in \mathbb{N}_0,\ k \in \mathbb{Z}^d,
$$
is a Riesz basis in $L_2(\mathbb{R}^d)$. We shall use the standard abbreviations
$$
\psi_{i,j,k}(x) = 2^{jd/2}\, \psi_i(2^j x - k) \quad \text{and} \quad \varphi_k(x) = \varphi(x - k).
$$
Further, the dual Riesz basis should fulfill the same requirements, i.e. there exist functions $\widetilde{\varphi}$ and $\widetilde{\psi}_i$, $i = 1, \ldots, 2^d - 1$, such that
$$
\langle \widetilde{\psi}_{i,j,k}, \varphi_{\ell} \rangle = 0, \qquad \langle \widetilde{\varphi}_k, \psi_{i,j,\ell} \rangle = 0, \qquad \langle \widetilde{\varphi}_k, \varphi_{\ell} \rangle = \delta_{k,\ell} \ \text{(Kronecker symbol)},
$$
$$
\langle \widetilde{\psi}_{i,j,k}, \psi_{u,v,\ell} \rangle = \delta_{i,u}\, \delta_{j,v}\, \delta_{k,\ell},
$$
$$
\operatorname{supp} \widetilde{\varphi},\ \operatorname{supp} \widetilde{\psi}_i \subset [-N, N]^d, \qquad i = 1, \ldots, 2^d - 1,
$$
$$
\widetilde{\varphi}, \widetilde{\psi}_i \in C^r(\mathbb{R}^d), \qquad i = 1, \ldots, 2^d - 1,
$$
$$
\int x^{\alpha}\, \widetilde{\psi}_i(x)\, dx = 0 \quad \text{for all } |\alpha| \le r, \qquad i = 1, \ldots, 2^d - 1.
$$
For $f \in S'(\mathbb{R}^d)$ we put
$$
\langle f, \psi_{i,j,k} \rangle = f(\psi_{i,j,k}) \quad \text{and} \quad \langle f, \varphi_k \rangle = f(\varphi_k), \tag{70}
$$
whenever this makes sense.

Proposition 4. Let $s \in \mathbb{R}$, $0 < p, q \le \infty$ and suppose
$$
r > \max\Big( s,\ d \max\Big(0, \frac{1}{p} - 1\Big) - s \Big). \tag{71}
$$
Then $B^s_q(L_p(\mathbb{R}^d))$ is the collection of all tempered distributions $f$ such that $f$ is representable as
$$
f = \sum_{k \in \mathbb{Z}^d} a_k\, \varphi_k + \sum_{i=1}^{2^d - 1} \sum_{j=0}^{\infty} \sum_{k \in \mathbb{Z}^d} a_{i,j,k}\, \psi_{i,j,k} \qquad (\text{convergence in } S')
$$
with
$$
\|f\,|B^s_q(L_p(\mathbb{R}^d))\|^* := \Big( \sum_{k \in \mathbb{Z}^d} |a_k|^p \Big)^{1/p} + \Bigg( \sum_{i=1}^{2^d - 1} \sum_{j=0}^{\infty} 2^{j(s + d(1/2 - 1/p))q} \Big( \sum_{k \in \mathbb{Z}^d} |a_{i,j,k}|^p \Big)^{q/p} \Bigg)^{1/q} < \infty
$$
if $q < \infty$ and
$$
\|f\,|B^s_{\infty}(L_p(\mathbb{R}^d))\|^* := \Big( \sum_{k \in \mathbb{Z}^d} |a_k|^p \Big)^{1/p} + \sup_{i = 1, \ldots, 2^d - 1}\ \sup_{j = 0, 1, \ldots} 2^{j(s + d(1/2 - 1/p))} \Big( \sum_{k \in \mathbb{Z}^d} |a_{i,j,k}|^p \Big)^{1/p} < \infty
$$
if $q = \infty$. A proof of this proposition, under the assumption $r > \max(s,\ d(1/p - 1)_+ - s)$, is given in [3, Theorem 3.7.7]. However, there are many forerunners with some restrictions on $s$, $p$ and $q$.
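The quasi-norm in Proposition 4 is a purely discrete expression in the coefficients, so it can be evaluated directly for finitely supported sequences. The following sketch is ours, not from the paper; the truncation to finitely many levels and the data layout are assumptions, and only $0 < p < \infty$ is handled:

```python
import math

def besov_quasi_norm(a_coarse, a_detail, s, p, q, d):
    """Evaluate the wavelet quasi-norm of Proposition 4 for finitely
    supported coefficients.

    a_coarse : list of scaling coefficients a_k
    a_detail : a_detail[i][j] is the list of coefficients a_{i,j,k} over k
               (i = 0..2^d-2, j = 0..J-1); absent entries are treated as 0
    """
    coarse = sum(abs(a) ** p for a in a_coarse) ** (1.0 / p)
    if math.isinf(q):
        # q = infinity: supremum over i and j of the weighted level norms
        detail = max(
            2.0 ** (j * (s + d * (0.5 - 1.0 / p)))
            * sum(abs(a) ** p for a in level) ** (1.0 / p)
            for block in a_detail for j, level in enumerate(block)
        )
    else:
        # q < infinity: l_q sum of the weighted level norms
        detail = sum(
            (2.0 ** (j * (s + d * (0.5 - 1.0 / p)))
             * sum(abs(a) ** p for a in level) ** (1.0 / p)) ** q
            for block in a_detail for j, level in enumerate(block)
        ) ** (1.0 / q)
    return coarse + detail
```

For example, a single unit coefficient on level $j$ contributes exactly the weight $2^{j(s + d(1/2 - 1/p))}$, which makes the role of the smoothness parameter $s$ visible.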
A.2. Besov spaces on domains

Let $\Omega \subset \mathbb{R}^d$ be a bounded open nonempty set. Then we define $B^s_q(L_p(\Omega))$ to be the collection of all distributions $f \in D'(\Omega)$ such that there exists a tempered distribution $g \in B^s_q(L_p(\mathbb{R}^d))$ satisfying
$$
f(\varphi) = g(\varphi) \qquad \text{for all } \varphi \in D(\Omega),
$$
i.e. $g|_{\Omega} = f$ in $D'(\Omega)$. We put
$$
\|f\,|B^s_q(L_p(\Omega))\| := \inf \|g\,|B^s_q(L_p(\mathbb{R}^d))\|,
$$
where the infimum is taken with respect to all distributions $g$ as above.

A.3. Sobolev spaces on domains

Let $\Omega$ be a bounded Lipschitz domain and let $m \in \mathbb{N}$. As usual, $H^m(\Omega)$ denotes the collection of all functions $f$ such that the distributional derivatives $D^{\alpha} f$ of order $|\alpha| \le m$ belong to $L_2(\Omega)$. The norm is defined as
$$
\|f\,|H^m(\Omega)\| := \sum_{|\alpha| \le m} \|D^{\alpha} f\,|L_2(\Omega)\|.
$$
It is well-known that $H^m(\mathbb{R}^d) = B^m_2(L_2(\mathbb{R}^d))$ in the sense of equivalent norms, cf. e.g. [42]. As a consequence of the existence of a bounded linear extension operator for Sobolev spaces on bounded Lipschitz domains, cf. [35, p. 181], it follows that $H^m(\Omega) = B^m_2(L_2(\Omega))$ (equivalent norms) for such domains. For fractional $s > 0$ we introduce the classes by complex interpolation. Let $0 < s < m$, $s \notin \mathbb{N}$. Then, following [26, 9.1], we define
$$
H^s(\Omega) := \big[ H^m(\Omega), L_2(\Omega) \big]_{\Theta}, \qquad \Theta = 1 - \frac{s}{m}.
$$
This definition does not depend on $m$ in the sense of equivalent norms, cf. [45]. The outcome $H^s(\Omega)$ coincides with $B^s_2(L_2(\Omega))$, cf. [9] for further details.

A.4. Spaces on domains and boundary conditions

We concentrate on homogeneous boundary conditions. Here it makes sense to introduce two further scales of function spaces (distribution spaces).

Definition 4. Let $\Omega \subset \mathbb{R}^d$ be an open nontrivial set. Let $s \in \mathbb{R}$ and $0 < p, q \le \infty$.
(i) $\mathring{B}^s_q(L_p(\Omega))$ denotes the closure of $D(\Omega)$ in $B^s_q(L_p(\Omega))$, equipped with the quasi-norm of $B^s_q(L_p(\Omega))$.
(ii) Let $s \ge 0$. Then $H^s_0(\Omega)$ denotes the closure of $D(\Omega)$ in $H^s(\Omega)$, equipped with the norm of $H^s(\Omega)$.
(iii) By $\widetilde{B}^s_q(L_p(\Omega))$ we denote the collection of all $f \in D'(\Omega)$ such that there is a $g \in B^s_q(L_p(\mathbb{R}^d))$ with
$$
g|_{\Omega} = f \quad \text{and} \quad \operatorname{supp} g \subset \overline{\Omega}, \tag{72}
$$
equipped with the quasi-norm
$$
\|f\,|\widetilde{B}^s_q(L_p(\Omega))\| = \inf \|g\,|B^s_q(L_p(\mathbb{R}^d))\|,
$$
where the infimum is taken over all distributions $g$ as in (72).

Remark 9. For a bounded Lipschitz domain $\Omega$ it holds $\mathring{B}^s_q(L_p(\Omega)) = \widetilde{B}^s_q(L_p(\Omega)) = B^s_q(L_p(\Omega))$ if
$$
0 < p, q < \infty, \qquad \max\Big( \frac{1}{p} - 1,\ d\Big(\frac{1}{p} - 1\Big) \Big) < s < \frac{1}{p},
$$
cf. [19, Corollary 1.4.4.5] and [45]. Hence,
$$
H^s_0(\Omega) = \mathring{B}^s_2(L_2(\Omega)) = \widetilde{B}^s_2(L_2(\Omega)) = B^s_2(L_2(\Omega)) = H^s(\Omega)
$$
if $0 \le s < \frac12$.

A.5. Sobolev spaces with negative smoothness

In what follows duality has to be understood in the framework of the dual pairing $(D(\Omega), D'(\Omega))$.

Definition 5. Let $\Omega \subset \mathbb{R}^d$ be a bounded Lipschitz domain. For $s > 0$ we define
$$
H^{-s}(\Omega) := \begin{cases} \big( H^s_0(\Omega) \big)' & \text{if } s - \frac12 \notin \mathbb{Z}, \\[2pt] \big( \widetilde{B}^s_2(L_2(\Omega)) \big)' & \text{otherwise}. \end{cases}
$$

Remark 10. If $\Omega \subset \mathbb{R}^d$ is a bounded Lipschitz domain then
$$
H^s_0(\Omega) = \widetilde{B}^s_2(L_2(\Omega)), \qquad s > 0,\ s - \tfrac12 \notin \mathbb{Z},
$$
holds. Furthermore
$$
H^{-s}(\Omega) = B^{-s}_2(L_2(\Omega)), \qquad s > 0, \tag{73}
$$
to be understood in the sense of equivalent norms. Again we refer to [9] for detailed references.

A.6. Besov spaces on the torus

Here our general reference is [34, Chapter 3]. Since we are using also spaces with negative smoothness s

$\sqrt{m\,s/n_m}$ is bounded above by
$$
\sum_{m,s=1}^{\infty} K\, 2^m e^{-2m^2 s^2} \le 2K \sum_{m=1}^{\infty} 2^m e^{-2m^2} \le c_2 \sum_{m=1}^{\infty} (m+1)^{-2c} \log(m+1). \tag{2}
$$
By choosing the constant $c$ large enough the last expression can be made arbitrarily small. Hence the probability that an i.i.d. randomly chosen sequence $P$ satisfies $D^*_{n_m,s}(P) \le \sqrt{m\,s/n_m}$ for all $m, s \in \mathbb{N}$ can be made greater than 0, and hence there exists a sequence $P$ such that $D^*_{n_m,s}(P) \le \sqrt{m\,s/n_m}$ for all $m, s \in \mathbb{N}$. Thus the theorem follows.

We now prove the first corollary. Above we showed already that the probability that an i.i.d. randomly chosen sequence $P$ is such that there is an $m$ and an $s$ in $\mathbb{N}$ with $D^*_{n_m,s}(P) > \sqrt{m\,s/n_m}$ is bounded above by (2). Now by increasing $c$ this probability can be made arbitrarily small, and thus the probability that a constant $C_{P,n}$ as in the corollary exists is 1. A proof of Corollaries 2 and 3 can be obtained using the same arguments as in the proof of Theorem 1.

Acknowledgments

The support of the Australian Research Council under its Center of Excellence Program is gratefully acknowledged. The author would also like to thank the referee who suggested Corollary 1 and an improvement of Theorem 1.
Journal of Complexity 23 (2007) 653 – 661 www.elsevier.com/locate/jco
Optimal recovery of solutions of the generalized heat equation in the unit ball from inaccurate data K.Yu. Osipenko∗ , E.V. Wedenskaya "MATI"—Russian State Technological University, Russia Received 29 October 2006; accepted 8 March 2007 Available online 27 March 2007 Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract We consider the problem of optimal recovery of solutions of the generalized heat equation in the unit ball. Information is given at two time instances, but inaccurate. The solution is to be constructed at some intermediate time. We provide the optimal error and present an algorithm which achieves this error level. © 2007 Elsevier Inc. All rights reserved. Keywords: Optimal recovery; Heat equation; Inaccurate information
The application of optimal recovery theory to problems of partial differential equations was started by Traub and Woźniakowski in [12]. In particular, this monograph considered optimal recovery of solutions of the heat equation from finitely many Fourier coefficients of the initial function. Several recovery problems for partial differential equations from noisy information were recently studied in [2,5,7,9,13,14]. The results considered in these papers were based on a general method for optimal recovery of linear operators developed in [3,4] (see also [8]). This method extended previous research from [6]. Various problems of optimal recovery from noisy information may be found in [10] (see also [15], where the complexity of differential and integral equations is discussed).

The research was carried out with the financial support of the Russian Foundation for Basic Research (Grant nos. 05-01-00275, 06-01-81004, 05-01-00261, and 06-01-00530) and the President Grant for State Support of Leading Scientific Schools in Russian Federation (Grant no. NSH-5813.2006.1).
∗ Corresponding author.
E-mail addresses:
[email protected] (K.Yu. Osipenko),
[email protected] (E.V. Wedenskaya). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.003
654
K.Yu. Osipenko, E.V. Wedenskaya / Journal of Complexity 23 (2007) 653 – 661
Here we consider the optimal recovery problem for solutions of the generalized heat equation in the unit $d$-ball at the time $\tau$ from inaccurate solutions at the times $t_1$ and $t_2$. Set
$$
B^d = \Big\{ x = (x_1, \ldots, x_d) : |x|^2 = \sum_{j=1}^{d} x_j^2 < 1 \Big\}, \qquad S^{d-1} = \{ x \in \mathbb{R}^d : |x| = 1 \}.
$$
Consider the problem of finding the solution of the generalized heat equation in $L_2(B^d)$:
$$
u_t + (-\Delta)^{\alpha/2} u = 0, \qquad \alpha > 0, \qquad u|_{t=0} = f(x), \qquad u|_{x \in S^{d-1}} = 0. \tag{1}
$$
Let $0 \le t_1 < t_2$. Suppose we know approximate solutions $y_1$ and $y_2$ of (1) at times $t_1$ and $t_2$, given with errors $\delta_1$ and $\delta_2$ in the $L_2(B^d)$ norm. We want to recover in the best way the solution of (1) at the time $\tau$, $t_1 < \tau < t_2$. We assume that $y_1, y_2 \in L_2(B^d)$ satisfy
$$
\| u(\cdot, t_j) - y_j(\cdot) \|_{L_2(B^d)} \le \delta_j, \qquad j = 1, 2.
$$
Any map $\varphi : L_2(B^d) \times L_2(B^d) \to L_2(B^d)$ is admitted as a recovery method. The quantity
$$
e(\tau, L_2(B^d), \delta_1, \delta_2, \varphi) = \sup_{\substack{f, y_1, y_2 \in L_2(B^d) \\ \|u(\cdot, t_j) - y_j(\cdot)\|_{L_2(B^d)} \le \delta_j,\ j = 1, 2}} \| u(\cdot, \tau) - \varphi(y_1, y_2)(\cdot) \|_{L_2(B^d)},
$$
where $u$ is the solution of (1), is called the error of the method $\varphi$. The quantity
$$
E(\tau, L_2(B^d), \delta_1, \delta_2) = \inf_{\varphi : L_2(B^d) \times L_2(B^d) \to L_2(B^d)} e(\tau, L_2(B^d), \delta_1, \delta_2, \varphi)
$$
is called the error of optimal recovery and a method delivering the lower bound is called an optimal recovery method.

Note that the initial functions $f$ belong to the whole space $L_2(B^d)$. In other words, the a priori information about initial functions is not a compact set. Therefore we use information with infinite cardinality ([12] dealt with algorithms using information having finite cardinality). For example, it can be shown that knowing (even precisely) any finite number of Fourier coefficients of $u(\cdot, t_j)$, $j = 1, 2$, does not lead to a finite error of optimal recovery.

The analysis of the problem is different for $d = 1$ and $d > 1$, because of the different types of orthogonal eigensystems. We begin with the case $d > 1$. Let $H_k$ denote the set of spherical harmonics of order $k$. It is known (see [11]) that
$$
\dim H_0 = a_0 = 1, \qquad \dim H_k = a_k = \frac{(d + 2k - 2)\,(d + k - 3)!}{(d - 2)!\, k!}, \quad k = 1, 2, \ldots,
$$
and
$$
L_2(S^{d-1}) = \bigoplus_{k=0}^{\infty} H_k.
$$
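The dimension formula is easy to check in small cases: for $d = 3$ it reduces to the familiar $2k + 1$, and for $d = 2$ every $a_k$ with $k \ge 1$ equals 2. A quick stdlib sketch (the helper name `dim_H` is ours, not from the paper):

```python
from math import factorial

def dim_H(d, k):
    """Dimension a_k of the spherical harmonics of order k on S^{d-1}, d >= 2."""
    if k == 0:
        return 1
    # a_k = (d + 2k - 2) * (d + k - 3)! / ((d - 2)! * k!)
    return (d + 2 * k - 2) * factorial(d + k - 3) // (factorial(d - 2) * factorial(k))
```

For $d = 3$, `dim_H(3, k)` returns $2k + 1$, the usual count of spherical harmonics of degree $k$ on the sphere.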
Let $\{Y^{(k)}_j\}_{j=1}^{a_k}$ denote an orthonormal basis in $H_k$. Let $J_p$ be the Bessel function of the first kind of order $p$ and let $\gamma^{(p)}_s$, $s = 1, 2, \ldots$, be the zeros of $J_p$. The functions
$$
Z_{skj}(x) = \frac{J_p(\gamma^{(p)}_s r)}{r^{d/2 - 1}}\, Y^{(k)}_j(x'),
$$
where $r = |x|$, $x' = x/r$, and $p = k + (d - 2)/2$, form an orthogonal basis in $L_2(B^d)$. Moreover,
$$
\Delta Z_{skj} = -\big(\gamma^{(p)}_s\big)^2 Z_{skj}.
$$
We will use the orthonormal basis in $L_2(B^d)$,
$$
Y_{skj} = \frac{Z_{skj}}{\|Z_{skj}\|_{L_2(B^d)}}.
$$
We recall that the operator $(-\Delta)^{\alpha/2}$ is defined as follows:
$$
(-\Delta)^{\alpha/2} f = \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} \sum_{j=1}^{a_k} \big(\gamma^{(p)}_s\big)^{\alpha}\, c_{skj}\, Y_{skj},
$$
where
$$
f = \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} \sum_{j=1}^{a_k} c_{skj}\, Y_{skj}. \tag{2}
$$
The solution of (1) can be easily found by the Fourier method of separation of variables. It has the form
$$
u(x, t) = \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} \sum_{j=1}^{a_k} e^{-(\gamma^{(p)}_s)^{\alpha} t}\, c_{skj}\, Y_{skj}(x),
$$
where $c_{skj}$ are the Fourier coefficients of the initial function. Set
$$
a_{sk} = e^{-2 (\gamma^{(p)}_s)^{\alpha}}
$$
(we recall that $p = k + (d - 2)/2$ and $\alpha$ is from (1)). It is known (see [1]) that for all $s \in \mathbb{N}$,
$$
\gamma^{(p)}_s < \gamma^{(p+1)}_s < \gamma^{(p)}_{s+1}
$$
and $\gamma^{(p)}_s \to \infty$ as $s \to \infty$. So the set of zeros of the Bessel functions $\gamma^{(p)}_s$, $s = 1, 2, \ldots$, $p = k + (d - 2)/2$, $k = 0, 1, \ldots$, can be arranged in ascending order
$$
\gamma^{(p_1)}_{s_1} < \gamma^{(p_2)}_{s_2} < \cdots < \gamma^{(p_n)}_{s_n} < \cdots.
$$
Consequently,
$$
a_{s_1 k_1} > a_{s_2 k_2} > \cdots > a_{s_n k_n} > \cdots.
$$
For the case $d = 1$ the functions
$$
Y_s(x) = \sin \frac{\pi s}{2}(x + 1), \qquad s = 1, 2, \ldots,
$$
form an orthonormal basis in $L_2(B^1) = L_2([-1, 1])$ and
$$
Y_s'' = -\Big(\frac{\pi s}{2}\Big)^2 Y_s.
$$
We define the operator $(-\Delta)^{\alpha/2}$ as follows:
$$
(-\Delta)^{\alpha/2} f = \sum_{s=1}^{\infty} \Big(\frac{\pi s}{2}\Big)^{\alpha} c_s\, Y_s,
$$
where $c_s$ are the Fourier coefficients of $f$. It is easily verified that for $d = 1$ the solution of (1) is given by
$$
u(x, t) = \sum_{s=1}^{\infty} e^{-(\pi s/2)^{\alpha} t} c_s\, Y_s(x),
$$
where $c_s$ are the Fourier coefficients of the initial function.

For an arbitrary decreasing sequence $\nu_1 > \nu_2 > \cdots > 0$ we introduce the following notation:
$$
\Delta_m = \big[ \nu_{m+1}^{t_2 - t_1},\ \nu_m^{t_2 - t_1} \big), \qquad \Delta_0 = \big[ \nu_1^{t_2 - t_1},\ +\infty \big),
$$
$$
\widehat{\lambda}_1 = \begin{cases} \dfrac{\nu_{m+1}^{\tau - t_2} - \nu_m^{\tau - t_2}}{\nu_{m+1}^{t_1 - t_2} - \nu_m^{t_1 - t_2}}, & \delta_2^2/\delta_1^2 \in \Delta_m,\ m \ge 1, \\[2ex] \nu_1^{\tau - t_1}, & \delta_2^2/\delta_1^2 \in \Delta_0, \end{cases}
\qquad
\widehat{\lambda}_2 = \begin{cases} \dfrac{\nu_m^{\tau - t_1} - \nu_{m+1}^{\tau - t_1}}{\nu_m^{t_2 - t_1} - \nu_{m+1}^{t_2 - t_1}}, & \delta_2^2/\delta_1^2 \in \Delta_m,\ m \ge 1, \\[2ex] 0, & \delta_2^2/\delta_1^2 \in \Delta_0. \end{cases}
$$
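On each interval $\Delta_m$, $m \ge 1$, the pair $(\widehat{\lambda}_1, \widehat{\lambda}_2)$ is exactly the solution of the linear system $\widehat{\lambda}_1 \nu^{t_1} + \widehat{\lambda}_2 \nu^{t_2} = \nu^{\tau}$ imposed at $\nu = \nu_m$ and $\nu = \nu_{m+1}$, and this can be confirmed numerically. A sketch for $d = 1$ under assumed parameters ($\alpha = 2$, $t_1 = 0$, $t_2 = 1$, $\tau = 1/2$ are illustrative choices, not from the paper):

```python
import math

# Hypothetical parameter choices: alpha and the time instances.
ALPHA, T1, T2, TAU = 2.0, 0.0, 1.0, 0.5

def nu(m):
    # nu_m = exp(-2 (pi m / 2)^alpha), the d = 1 eigenvalue sequence
    return math.exp(-2.0 * (math.pi * m / 2.0) ** ALPHA)

def multipliers(m):
    """The pair (lambda_1, lambda_2) on the interval Delta_m (m >= 1)."""
    a, b = nu(m), nu(m + 1)
    lam1 = (b ** (TAU - T2) - a ** (TAU - T2)) / (b ** (T1 - T2) - a ** (T1 - T2))
    lam2 = (a ** (TAU - T1) - b ** (TAU - T1)) / (a ** (T2 - T1) - b ** (T2 - T1))
    return lam1, lam2
```

By construction, `multipliers(m)` satisfies the interpolation conditions at both $\nu_m$ and $\nu_{m+1}$, which is the system (5) used in the proof of Theorem 1 below.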
Theorem 1. Set
$$
\nu_m = \begin{cases} a_{s_m, k_m}, & d > 1, \\ e^{-2(\pi m/2)^{\alpha}}, & d = 1. \end{cases}
$$
Then for all $\delta_1, \delta_2 > 0$ the following equality
$$
E(\tau, L_2(B^d), \delta_1, \delta_2) = \sqrt{\widehat{\lambda}_1 \delta_1^2 + \widehat{\lambda}_2 \delta_2^2}
$$
holds. Moreover, the method
$$
\varphi(y_1, y_2)(x) =
\begin{cases}
\displaystyle \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} a_{sk}^{\tau/2} \sum_{j=1}^{a_k} \frac{\widehat{\lambda}_1 a_{sk}^{t_1/2} y_{1skj} + \widehat{\lambda}_2 a_{sk}^{t_2/2} y_{2skj}}{\widehat{\lambda}_1 a_{sk}^{t_1} + \widehat{\lambda}_2 a_{sk}^{t_2}}\, Y_{skj}(x), & d > 1, \\[3ex]
\displaystyle \sum_{s=1}^{\infty} e^{-(\pi s/2)^{\alpha} \tau}\, \frac{\widehat{\lambda}_1 e^{-(\pi s/2)^{\alpha} t_1} y_{1s} + \widehat{\lambda}_2 e^{-(\pi s/2)^{\alpha} t_2} y_{2s}}{\widehat{\lambda}_1 e^{-2(\pi s/2)^{\alpha} t_1} + \widehat{\lambda}_2 e^{-2(\pi s/2)^{\alpha} t_2}}\, Y_s(x), & d = 1,
\end{cases} \tag{3}
$$
where $y_{1skj}, y_{2skj}$ and $y_{1s}, y_{2s}$ are the Fourier coefficients of $y_1(\cdot)$ and $y_2(\cdot)$, is optimal.

To prove Theorem 1 we use a general scheme of construction of optimal recovery methods for linear operators developed in [3,4] (see also [8]). Consider the following extremal problem:
$$
\|u(\cdot, \tau)\|^2_{L_2(B^d)} \to \max, \qquad \|u(\cdot, t_j)\|^2_{L_2(B^d)} \le \delta_j^2, \quad j = 1, 2, \quad f \in L_2(B^d), \tag{4}
$$
where $u$ is the solution of problem (1). Set
$$
L(f, \lambda_1, \lambda_2) = -\|u(\cdot, \tau)\|^2_{L_2(B^d)} + \lambda_1 \|u(\cdot, t_1)\|^2_{L_2(B^d)} + \lambda_2 \|u(\cdot, t_2)\|^2_{L_2(B^d)}.
$$
From [4] (see also [8]) follows:

Theorem 2. Suppose that there exist $\widehat{\lambda}_1 \ge 0$, $\widehat{\lambda}_2 \ge 0$ and an admissible function $\widehat{f}$ in (4) such that

(a) $\displaystyle \min_{f \in L_2(B^d)} L(f, \widehat{\lambda}_1, \widehat{\lambda}_2) = L(\widehat{f}, \widehat{\lambda}_1, \widehat{\lambda}_2)$,

(b) $\widehat{\lambda}_1 \big( \|\widehat{u}(\cdot, t_1)\|^2_{L_2(B^d)} - \delta_1^2 \big) + \widehat{\lambda}_2 \big( \|\widehat{u}(\cdot, t_2)\|^2_{L_2(B^d)} - \delta_2^2 \big) = 0$,

where $\widehat{u}$ is the solution of (1) with the initial function $\widehat{f}$. If for all $y_1, y_2 \in L_2(B^d)$ there exists a solution $f_0$ of the problem
$$
\widehat{\lambda}_1 \|u(\cdot, t_1) - y_1(\cdot)\|^2_{L_2(B^d)} + \widehat{\lambda}_2 \|u(\cdot, t_2) - y_2(\cdot)\|^2_{L_2(B^d)} \to \min, \qquad f \in L_2(B^d),
$$
where $u$ is the solution of (1), then the method $\varphi(y_1, y_2)(x) = u_0(x, \tau)$, where $u_0$ is the solution of (1) with the initial function $f_0$, is optimal and for the error of optimal recovery the following equality
$$
E(\tau, L_2(B^d), \delta_1, \delta_2) = \sqrt{\widehat{\lambda}_1 \delta_1^2 + \widehat{\lambda}_2 \delta_2^2}
$$
holds.
Proof of Theorem 1. Consider the case $d > 1$. We have
$$
L(f, \lambda_1, \lambda_2) = \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} \big( -a_{sk}^{\tau} + \lambda_1 a_{sk}^{t_1} + \lambda_2 a_{sk}^{t_2} \big) \sum_{j=1}^{a_k} c_{skj}^2,
$$
where $c_{skj}$ are the Fourier coefficients of $f$. Putting
$$
b_{sk} = \sum_{j=1}^{a_k} c_{skj}^2,
$$
we rewrite $L(f, \widehat{\lambda}_1, \widehat{\lambda}_2)$ in the form
$$
L(f, \widehat{\lambda}_1, \widehat{\lambda}_2) = \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} a_{sk}^{\tau} \big( -1 + \widehat{\lambda}_1 a_{sk}^{t_1 - \tau} + \widehat{\lambda}_2 a_{sk}^{t_2 - \tau} \big)\, b_{sk}.
$$
Assume that $\delta_2^2/\delta_1^2 \in \Delta_m$, $m \ge 1$. It is easily seen that in this case for $\widehat{\lambda}_1$ and $\widehat{\lambda}_2$ the equalities
$$
\widehat{\lambda}_1 \nu_m^{t_1} + \widehat{\lambda}_2 \nu_m^{t_2} = \nu_m^{\tau}, \qquad \widehat{\lambda}_1 \nu_{m+1}^{t_1} + \widehat{\lambda}_2 \nu_{m+1}^{t_2} = \nu_{m+1}^{\tau} \tag{5}
$$
hold. Consider the function $g(z) = -1 + \widehat{\lambda}_1 e^{-2z(t_1 - \tau)} + \widehat{\lambda}_2 e^{-2z(t_2 - \tau)}$. It is easy to verify that $g$ is a convex function. It follows from (5) that $g$ has two zeros $z_m = \big(\gamma^{(p_m)}_{s_m}\big)^{\alpha}$ and $z_{m+1} = \big(\gamma^{(p_{m+1})}_{s_{m+1}}\big)^{\alpha}$. In view of the convexity of $g$, for all $z \le z_m$ and all $z \ge z_{m+1}$ the inequality $g(z) \ge 0$ holds. Thus for all $f \in L_2(B^d)$ we have $L(f, \widehat{\lambda}_1, \widehat{\lambda}_2) \ge 0$. Define $\widehat{b}_{s_m, k_m}$ and $\widehat{b}_{s_{m+1}, k_{m+1}}$ from the conditions
$$
\widehat{b}_{s_m, k_m}\, \nu_m^{t_j} + \widehat{b}_{s_{m+1}, k_{m+1}}\, \nu_{m+1}^{t_j} = \delta_j^2, \qquad j = 1, 2.
$$
It is easy to verify that
$$
\widehat{b}_{s_m, k_m} = \frac{\delta_1^2}{\nu_m^{t_1}} \cdot \frac{\delta_2^2/\delta_1^2 - \nu_{m+1}^{t_2 - t_1}}{\nu_m^{t_2 - t_1} - \nu_{m+1}^{t_2 - t_1}}, \qquad \widehat{b}_{s_{m+1}, k_{m+1}} = \frac{\delta_1^2}{\nu_{m+1}^{t_1}} \cdot \frac{\nu_m^{t_2 - t_1} - \delta_2^2/\delta_1^2}{\nu_m^{t_2 - t_1} - \nu_{m+1}^{t_2 - t_1}}.
$$
For $j \ne m, m+1$ we set $\widehat{b}_{s_j, k_j} = 0$. Then the function
$$
\widehat{f}(x) = \sum_{j=m}^{m+1} \sqrt{\widehat{b}_{s_j k_j}}\; Y_{s_j k_j 1}(x)
$$
will be admissible and
$$
L(\widehat{f}, \widehat{\lambda}_1, \widehat{\lambda}_2) = 0.
$$
Thus conditions (a) and (b) of Theorem 2 hold.

Now we assume that $\delta_2^2/\delta_1^2 \in \Delta_0$. It means that $\delta_2^2 \ge \delta_1^2 \nu_1^{t_2 - t_1}$. Putting
$$
\widehat{f}(x) = \delta_1\, \nu_1^{-t_1/2}\, Y_{s_1 k_1 1}(x),
$$
for the solution $\widehat{u}$ of (1) with the initial function $\widehat{f}$ we have
$$
\|\widehat{u}(\cdot, t_1)\|^2_{L_2(B^d)} = \delta_1^2, \qquad \|\widehat{u}(\cdot, t_2)\|^2_{L_2(B^d)} = \delta_1^2\, \nu_1^{t_2 - t_1} \le \delta_2^2.
$$
Consequently, condition (b) of Theorem 2 holds. Condition (a) of the same theorem holds since for all functions $f \in L_2(B^d)$,
$$
L(f, \widehat{\lambda}_1, \widehat{\lambda}_2) \ge 0,
$$
and moreover
$$
L(\widehat{f}, \widehat{\lambda}_1, \widehat{\lambda}_2) = 0.
$$
Now let us construct an optimal recovery method. According to Theorem 2 we have to solve the problem
$$
\widehat{\lambda}_1 \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} \sum_{j=1}^{a_k} \big( a_{sk}^{t_1/2} c_{skj} - y_{1skj} \big)^2 + \widehat{\lambda}_2 \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} \sum_{j=1}^{a_k} \big( a_{sk}^{t_2/2} c_{skj} - y_{2skj} \big)^2 \to \min, \qquad f \in L_2(B^d),
$$
where $c_{skj}$ are the Fourier coefficients of $f$ (see (2)). It can be easily verified that the solution of this problem has the form
$$
\widehat{c}_{skj} = \frac{\widehat{\lambda}_1 a_{sk}^{t_1/2} y_{1skj} + \widehat{\lambda}_2 a_{sk}^{t_2/2} y_{2skj}}{\widehat{\lambda}_1 a_{sk}^{t_1} + \widehat{\lambda}_2 a_{sk}^{t_2}}.
$$
The optimality of method (3) now follows from Theorem 2.
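The minimization problem above decouples into independent scalar least-squares problems, one per Fourier coefficient. A sketch with purely illustrative numbers (all values below are assumptions, not taken from the paper) checks the closed-form coefficient against the first-order condition and against nearby perturbations:

```python
# Scalar model of the per-coefficient problem: minimize over c
#   lam1*(a**(t1/2)*c - y1)**2 + lam2*(a**(t2/2)*c - y2)**2
lam1, lam2 = 0.7, 0.3          # assumed multipliers
a, t1, t2 = 0.5, 1.0, 2.0      # assumed a_sk and time instances
y1, y2 = 1.3, -0.4             # assumed noisy Fourier coefficients

def obj(c):
    return lam1 * (a ** (t1 / 2) * c - y1) ** 2 + lam2 * (a ** (t2 / 2) * c - y2) ** 2

# Closed form: the weighted least-squares solution, as in the paper.
c_star = (lam1 * a ** (t1 / 2) * y1 + lam2 * a ** (t2 / 2) * y2) / (
    lam1 * a ** t1 + lam2 * a ** t2
)

# First-order condition: the derivative of obj vanishes at c_star.
deriv = (2 * lam1 * a ** (t1 / 2) * (a ** (t1 / 2) * c_star - y1)
         + 2 * lam2 * a ** (t2 / 2) * (a ** (t2 / 2) * c_star - y2))
```

Since the objective is a strictly convex quadratic in $c$, the vanishing derivative certifies the minimizer.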
The case $d = 1$ may be considered in a similar way.

We give the table (see [1]) of the first 10 ordered numbers $\gamma^{(p_j)}_{s_j}$, for even $d$ (that is, for $p \in \mathbb{Z}_+$) and for odd $d$ (when $p = k + \frac12$, $k = 0, 1, \ldots$):

            even d                          odd d
  j    s_j   p_j   gamma_{s_j}^{(p_j)}   s_j   p_j    gamma_{s_j}^{(p_j)}
  1     1     0        2.4048             1    1/2        3.1416
  2     1     1        3.8317             1    3/2        4.4934
  3     1     2        5.1356             1    5/2        5.7635
  4     2     0        5.5200             2    1/2        6.2832
  5     1     3        6.3802             1    7/2        6.9879
  6     2     1        7.0156             2    3/2        7.7253
  7     1     4        7.5883             1    9/2        8.1826
  8     2     2        8.4172             2    5/2        9.0950
  9     3     0        8.6537             1   11/2        9.3558
 10     1     5        8.7715             3    1/2        9.4248
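The tabulated zeros can be reproduced from the power series of $J_p$ with a simple bisection; a stdlib-only sketch (the series truncation and bracketing intervals are our choices, adequate only for the small arguments in the table):

```python
import math

def bessel_J(p, x, terms=60):
    """Power series of the Bessel function J_p(x); accurate for small x."""
    return sum(
        (-1) ** m / (math.factorial(m) * math.gamma(m + p + 1)) * (x / 2) ** (2 * m + p)
        for m in range(terms)
    )

def first_sign_change(p, lo, hi, iters=80):
    """Bisection for a zero of J_p inside the bracket [lo, hi]."""
    flo = bessel_J(p, lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if flo * bessel_J(p, mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, bessel_J(p, mid)
    return 0.5 * (lo + hi)
```

For instance, bracketing $J_0$ on $[2, 3]$ recovers the first table entry $2.4048$, and for $p = 1/2$ (where $J_{1/2}(x)$ is proportional to $\sin x / \sqrt{x}$) the zero at $\pi \approx 3.1416$ appears, matching the odd-$d$ column.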
The authors are grateful to the referees for their remarks and suggestions, which greatly helped us to improve the paper.
References

[1] M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions, Dover, New York, 1972.
[2] E.A. Balova, On optimal recovery of the Dirichlet problem solution in an annulus, Vladikavkaz Mat. Zh. 8 (2) (2006) 15-23.
[3] G.G. Magaril-Il'yaev, K.Yu. Osipenko, Optimal recovery of functions and their derivatives from Fourier coefficients prescribed with an error, Mat. Sb. 193 (2002) 79-100 (English translation in Sb. Math. 193 (2002)).
[4] G.G. Magaril-Il'yaev, K.Yu. Osipenko, Optimal recovery of functions and their derivatives from inaccurate information about a spectrum and inequalities for derivatives, Funkc. Analiz i ego Prilozh. 37 (2003) 51-64 (English translation in Functional Anal. Appl. 37 (2003)).
[5] G.G. Magaril-Il'yaev, K.Yu. Osipenko, V.M. Tikhomirov, On optimal recovery of heat equation solutions, in: D.K. Dimitrov, G. Nikolov, R. Uluchev (Eds.), Approximation Theory: A Volume Dedicated to B. Bojanov, Marin Drinov Academic Publishing House, Sofia, 2004, pp. 163-175.
[6] A.A. Melkman, C.A. Micchelli, Optimal estimation of linear operators in Hilbert spaces from inaccurate data, SIAM J. Numer. Anal. 16 (1979) 87-105.
[7] K.Yu. Osipenko, On recovery of the Dirichlet problem solution by inaccurate input data, Vladikavkaz Mat. Zh. 6 (4) (2004) 55-62.
[8] K.Yu. Osipenko, The Hardy-Littlewood-Pólya inequality for analytic functions from Hardy-Sobolev spaces, Mat. Sb. 197 (2006) 15-34 (English translation in Sb. Math. 197 (2006) 315-334).
[9] K.Yu. Osipenko, N.D. Vysk, Optimal recovery of wave equation solution by inaccurate input data, Mat. Zametki 81 (6) (2007) 803-815.
[10] L. Plaskota, Noisy Information and Computational Complexity, Cambridge University Press, Cambridge, 1996.
[11] E.M. Stein, G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces, Princeton University Press, Princeton, NJ, 1971.
[12] J.F. Traub, H. Woźniakowski, A General Theory of Optimal Algorithms, Academic Press, New York, 1980.
[13] N.D. Vysk, On a wave equation solution with inaccurate Fourier coefficients of the function defining the initial form of the string, Vladikavkaz Mat. Zh. 8 (4) (2006) 12-17.
[14] E.V. Wedenskaya, On optimal recovery of heat equation solution by inaccurate temperature given at several times, Vladikavkaz Mat. Zh. 8 (1) (2006) 16-21.
[15] A.G. Werschulz, The Computational Complexity of Differential and Integral Equations: An Information-Based Approach, Oxford University Press, New York, 1991.
Journal of Complexity 23 (2007) 662 – 672 www.elsevier.com/locate/jco
Discrepancy with respect to convex polygons W.W.L. Chena,∗ , G. Travaglinib a Department of Mathematics, Macquarie University, Sydney, NSW 2109, Australia b Dipartimento di Statistica, Università di Milano-Bicocca, Edificio U7, Via Bicocca degli Arcimboldi 8, 20126 Milano,
Italy Received 31 October 2006; accepted 20 March 2007 Available online 6 April 2007 Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract We study the problem of discrepancy of finite point sets in the unit square with respect to convex polygons, when the directions of the edges are fixed, when the number of edges is bounded, as well as when no such restrictions are imposed. In all three cases, we obtain estimates for the supremum norm that are very close to best possible. © 2007 Elsevier Inc. All rights reserved. Keywords: Discrepancy; Irregularities of distribution
1. Introduction

Suppose that $\mathcal{P}$ is a distribution of $N > 1$ points, not necessarily distinct, in the unit square $[0, 1]^2$. For every Lebesgue measurable set $A \subseteq [0, 1]^2$, let $Z[\mathcal{P}; A]$ denote the number of points of $\mathcal{P}$ that fall into $A$, and consider the discrepancy function
$$
D[\mathcal{P}; A] = Z[\mathcal{P}; A] - N \mu(A), \tag{1}
$$
where $\mu(A)$ denotes the measure (or area) of $A$. We shall study the discrepancy function (1) when the subsets $A$ are closed convex polygons in $[0, 1]^2$. More precisely, we study the behaviour of the function
$$
\sup_{A \in \mathcal{A}} |D[\mathcal{P}; A]|
$$
with respect to three classes A of convex polygons in [0, 1]2 . ∗ Corresponding author.
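The discrepancy function (1) is straightforward to evaluate for a concrete point set when $A$ is an aligned rectangle; the following sketch (the helper names are ours) makes the definition concrete:

```python
def discrepancy(points, rect):
    """D[P; A] = Z[P; A] - N * mu(A) for an aligned rectangle
    A = [x0, x1] x [y0, y1] inside the unit square."""
    x0, x1, y0, y1 = rect
    n_inside = sum(1 for (x, y) in points if x0 <= x <= x1 and y0 <= y <= y1)
    area = (x1 - x0) * (y1 - y0)
    return n_inside - len(points) * area

# Example: a 10 x 10 grid of points placed at cell centres.
grid = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
```

The cell-centred grid has zero discrepancy against rectangles whose sides align with the grid lines, but a rectangle cutting through a column of cells already picks up a nonzero value.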
E-mail addresses:
[email protected],
[email protected] (W.W.L. Chen),
[email protected] (G. Travaglini). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.006
W.W.L. Chen, G. Travaglini / Journal of Complexity 23 (2007) 662 – 672
663
Notation. We adopt standard Vinogradov notation. For two functions $f$ and $g$, we write $f \ll g$ to denote the existence of a positive constant $c$ such that $|f| \le cg$. For any non-negative functions $f$ and $g$, we write $f \gg g$ to denote the existence of a positive constant $c$ such that $f \ge cg$. The inequality signs $\ll$ and $\gg$ may be used with subscripts involving parameters such as $k$ and $\Theta$, in which case the positive constant $c$ in question may depend on the parameters indicated.

Let $\Theta = (\theta_1, \ldots, \theta_k)$, where $\theta_1, \ldots, \theta_k \in [0, \pi)$ are fixed. We denote by $\mathcal{A}(\Theta)$ the collection of all convex polygons $A$ in $[0, 1]^2$ such that every side of $A$ makes an angle $\theta_i$ for some $i = 1, \ldots, k$ with the positive horizontal axis. Note that if $\Theta = (0, \pi/2)$, then $\mathcal{A}(\Theta)$ is simply the collection of all aligned rectangles in $[0, 1]^2$. Then the famous result of Schmidt [12] shows that for every set $\mathcal{P}$ of $N$ points in $[0, 1]^2$, we have
$$
\sup_{A \in \mathcal{A}(0, \pi/2)} |D[\mathcal{P}; A]| \gg \log N. \tag{2}
$$
This result is best possible, apart from the implicit constant in the inequality, as an old result of Lerch [10] implies that there exists a set $\mathcal{P}$ of $N$ points in $[0, 1]^2$ such that
$$
\sup_{A \in \mathcal{A}(0, \pi/2)} |D[\mathcal{P}; A]| \ll \log N.
$$
For the general case, the ideas in Beck and Chen [4] can be adapted easily to show that for every set $\mathcal{P}$ of $N$ points in $[0, 1]^2$, we have
$$
\sup_{A \in \mathcal{A}(\Theta)} |D[\mathcal{P}; A]| \gg \log N.
$$
Here we establish the following complementary result.

Theorem 1. Suppose that $\Theta = (\theta_1, \ldots, \theta_k)$, where $\theta_1, \ldots, \theta_k \in [0, \pi)$ are fixed. Then for every integer $N > 1$, there exists a set $\mathcal{P}$ of $N$ points in $[0, 1]^2$ such that
$$
\sup_{A \in \mathcal{A}(\Theta)} |D[\mathcal{P}; A]| \ll \log N.
$$
Next, we relax the restriction on the direction of the sides of the convex polygons and replace this with a restriction on the number of sides instead. We denote by $\mathcal{A}_k$ the collection of all convex polygons in $[0, 1]^2$ with at most $k$ sides. Then a result of Beck [1] implies that for every set $\mathcal{P}$ of $N$ points in $[0, 1]^2$, we have
$$
\sup_{A \in \mathcal{A}_k} |D[\mathcal{P}; A]| \gg_k N^{1/4}. \tag{3}
$$
Here we establish the following upper bound.

Theorem 2. For every integer $N > 1$, there exists a set $\mathcal{P}$ of $N$ points in $[0, 1]^2$ such that
$$
\sup_{A \in \mathcal{A}_k} |D[\mathcal{P}; A]| \ll_k N^{1/4} (\log N)^{1/2}. \tag{4}
$$
Finally, we relax all the restrictions on the direction and number of sides of the convex polygons. Accordingly, we denote by $\mathcal{A}^*$ the collection of all convex polygons in $[0, 1]^2$. Our study is
motivated by the wonderfully elegant work of Schmidt [13] and Beck [2] on the collection $\mathcal{C}^*$ of all convex sets in $[0, 1]^2$. Here, for every set $\mathcal{P}$ of $N$ points in $[0, 1]^2$, we have
$$
\sup_{A \in \mathcal{C}^*} |D[\mathcal{P}; A]| \gg N^{1/3}. \tag{5}
$$
This is essentially best possible. For every integer $N > 1$, there exists a set $\mathcal{P}$ of $N$ points in $[0, 1]^2$ such that
$$
\sup_{A \in \mathcal{C}^*} |D[\mathcal{P}; A]| \ll N^{1/3} (\log N)^4.
$$
Here we establish the following lower bound.

Theorem 3. For every integer $N > 1$, for every set $\mathcal{P}$ of $N$ points in $[0, 1]^2$, we have
$$
\sup_{A \in \mathcal{A}^*} |D[\mathcal{P}; A]| \gg N^{1/3}. \tag{6}
$$
We remark that some of the arguments can be extended to polytopes in the $d$-dimensional unit cube $[0, 1]^d$. In particular, inequalities (3) and (4) can be generalized to arbitrary dimensions $d$, with the exponent $\frac14$ replaced by the exponent $\frac12 - \frac{1}{2d}$, while inequalities (5) and (6) can also be generalized to arbitrary dimensions $d$, with the exponent $\frac13$ replaced by the exponent $1 - 2/(d+1)$. On the other hand, the generalization of inequality (2) to arbitrary dimensions is one of the most frustrating unsolved problems in the subject. For example, we do not know whether for every set $\mathcal{P}$ of $N$ points in the cube $[0, 1]^3$, there is an aligned rectangular box $A$ in $[0, 1]^3$ such that $|D[\mathcal{P}; A]| \gg (\log N)^2$.

2. Diophantine approximation

To establish Theorem 1, we shall follow the argument of Beck and Chen [5] and make use of a suitably scaled and rotated copy of the lattice $\mathbb{Z}^2$. The rotation is made possible by the following result on diophantine approximation due to Davenport [7].

Lemma 2.1. Suppose that $f_1, \ldots, f_r$ are real-valued functions of a real variable, with continuous first derivatives in some open interval $I$ containing some point $\omega_0 \in \mathbb{R}$ such that $f_1'(\omega_0), \ldots, f_r'(\omega_0)$ are all non-zero. Then there exists $\omega \in I$ such that $f_1(\omega), \ldots, f_r(\omega)$ are all badly approximable.

Remark. A real number $\xi$, such as $\xi = \sqrt{2}$, is said to be badly approximable if there exists a constant $c > 0$ such that $n \|n\xi\| > c$ for every natural number $n \in \mathbb{N}$. Here $\|\xi\|$ denotes the distance of $\xi$ from the nearest integer.

More precisely, we shall use the following simple consequence.

Lemma 2.2. Suppose that the angles $\theta_1, \ldots, \theta_k \in [0, \pi)$ are fixed. Then there exists $\omega \in [0, 2\pi)$ such that $\tan \omega$, $\tan(\omega - \pi/2)$, $\tan(\omega - \theta_1), \ldots, \tan(\omega - \theta_k)$ are all finite and badly approximable.
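The badly approximable property can be probed numerically: for $\xi = \sqrt{2}$ the products $n\|n\xi\|$ stay bounded away from 0, with the infimum governed by the continued-fraction convergents $3/2, 7/5, 17/12, \ldots$ of $\sqrt{2}$. A sketch:

```python
import math

def dist_to_nearest_int(x):
    """The quantity ||x||: distance from x to the nearest integer."""
    return abs(x - round(x))

xi = math.sqrt(2.0)
products = [n * dist_to_nearest_int(n * xi) for n in range(1, 20001)]
worst = min(products)
# Along the convergent denominators 2, 5, 12, 29, ... the products
# approach 1/(2*sqrt(2)) ~ 0.3536 and never drop much below it.
```

By contrast, for a rational or a Liouville-type $\xi$ the analogous minimum collapses towards 0, which is exactly what Lemma 2.2 rules out for the tangents involved.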
We shall be concerned with the collection $\mathcal{A}(\Theta)$ of convex polygons in $[0, 1]^2$, where $\theta_1, \ldots, \theta_k \in [0, \pi)$ are fixed. Recall that every side of such a polygon $A \in \mathcal{A}(\Theta)$ makes an angle $\theta_i$ for some $i = 1, \ldots, k$ with the positive horizontal axis. Corresponding to the given $\Theta$, we now choose a value of $\omega$ from Lemma 2.2 and keep it fixed throughout. We would like to consider the lattice $\Lambda$ formed by rotating the lattice $(N^{-1/2}\mathbb{Z})^2$ anticlockwise by the angle $\omega$ about the origin. In particular, we are interested in the lattice points of $\Lambda$ that fall into $[0, 1]^2$. Notationally, however, it is far simpler to rescale and rotate the unit square $[0, 1]^2$ and the convex polygons in $\mathcal{A}(\Theta)$. Accordingly, we consider the following rescaled and rotated variant of the original problem. Let $U$ denote the image of the square $[0, N^{1/2}]^2$ rotated clockwise by the angle $\omega$ about the origin, and let $\mathcal{A}_N(\Theta; \omega)$ denote the collection of all convex polygons $B$ in $U$ such that every side of $B$ either is parallel to a side of $U$ or makes an angle $\theta_i - \omega$ for some $i = 1, \ldots, k$ with the positive horizontal axis. For every measurable subset $B \subseteq U$, let $Z(B)$ denote the number of lattice points of $\mathbb{Z}^2$ that fall into $B$, and write $E(B) = Z(B) - \mu(B)$. We need the following intermediate result.

Lemma 2.3. For every $B \in \mathcal{A}_N(\Theta; \omega)$, we have $|E(B)| \ll \log N$.

Deduction of Theorem 1. Unfortunately, the set $\mathbb{Z}^2 \cap U$ does not necessarily have precisely $N$ points. Let $\mathcal{Q}$ denote a set of precisely $N$ points in $U$ obtained by adding to or removing from $\mathbb{Z}^2 \cap U$ precisely $\big| |\mathbb{Z}^2 \cap U| - N \big|$ points. Note that
$$
\big| |\mathbb{Z}^2 \cap U| - N \big| = |E(U)| \ll \log N
$$
in view of Lemma 2.3. For every $B \in \mathcal{A}_N(\Theta; \omega)$, we now let $Z[\mathcal{Q}; B]$ denote the number of points of $\mathcal{Q}$ in $B$. Then
$$
|Z[\mathcal{Q}; B] - \mu(B)| \le |E(B)| + |Z(B) - Z[\mathcal{Q}; B]| \le |E(B)| + |Z(U) - Z[\mathcal{Q}; U]| = |E(B)| + |E(U)| \ll \log N.
$$
Now let $\mathcal{P}$ be obtained by rotating $N^{-1/2}\mathcal{Q}$ anticlockwise by the angle $\omega$. Then $\mathcal{P}$ is a set of precisely $N$ points in $[0, 1]^2$, and the inequality $|D[\mathcal{P}; A]| \ll \log N$ holds for every convex polygon $A \in \mathcal{A}(\Theta)$.
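The construction in the deduction can be carried out directly: rotate the scaled lattice, keep the points landing in $[0,1]^2$, and adjust to exactly $N$ points. A sketch (the rotation angle below is arbitrary; in the paper $\omega$ comes from Lemma 2.2, and the trivial padding rule stands in for "adding or removing points"):

```python
import math

def rotated_lattice_points(N, omega):
    """Points of the lattice obtained by rotating (N**-0.5 * Z)^2 by omega,
    intersected with [0, 1]^2, then adjusted to exactly N points."""
    step = N ** -0.5
    c, s = math.cos(omega), math.sin(omega)
    limit = int(2 * N ** 0.5) + 2  # index range large enough to cover [0,1]^2
    pts = []
    for i in range(-limit, limit + 1):
        for j in range(-limit, limit + 1):
            x, y = step * (c * i - s * j), step * (s * i + c * j)
            if 0 <= x <= 1 and 0 <= y <= 1:
                pts.append((x, y))
    # Trim or pad so that |P| = N exactly (a simplification of the paper's Q).
    pts = pts[:N]
    while len(pts) < N:
        pts.append((0.5, 0.5))
    return pts
```

The lattice has one point per cell of area $1/N$, so before adjustment the count already differs from $N$ only by a boundary term, which is the quantity $|E(U)|$ controlled by Lemma 2.3.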
Proof of Lemma 2.3. We adopt the convention that θ₁, …, θ_k are distinct, but note that no convex polygon can have three parallel sides. For every n = (n₁, n₂) ∈ Z², let S(n) = (n₁ − 1/2, n₁ + 1/2] × (n₂ − 1/2, n₂ + 1/2]. For any convex polygon B ∈ A_N(Θ; α), let N = {n ∈ Z² : S(n) ∩ B ≠ ∅},
so that

E(B) = Σ_{n∈N} E(B ∩ S(n)).
Furthermore, for every i = 1, …, k, let T_i denote the edge(s) of B that make the angle θ_i − α with the positive horizontal axis, let T_i* denote the totality of all the other edges of B, and write

N_i = {n ∈ N : S(n) ∩ T_i ≠ ∅ and S(n) ∩ T_i* = ∅}.

We also write

N⁺ = {n ∈ N : there exist i ≠ i′ with S(n) ∩ T_i ≠ ∅ and S(n) ∩ T_{i′} ≠ ∅}

and

N⁻ = {n ∈ N : S(n) ∩ T_i = ∅ for every i}.

Clearly, N = N₁ ∪ ⋯ ∪ N_k ∪ N⁺ ∪ N⁻, and

E(B) = Σ_{i=1}^{k} Σ_{n∈N_i} E(B ∩ S(n)) + Σ_{n∈N⁺} E(B ∩ S(n)) + Σ_{n∈N⁻} E(B ∩ S(n)).   (7)
It is easy to see that |N⁺| = O(1) and that |E(B ∩ S(n))| ≤ 1 for every n ∈ N, so that

Σ_{n∈N⁺} E(B ∩ S(n)) = O(1).   (8)
It is also easy to see that

Σ_{n∈N⁻} E(B ∩ S(n)) = 0.   (9)
Combining (7)–(9), we conclude that

E(B) = Σ_{i=1}^{k} Σ_{n∈N_i} E(B ∩ S(n)) + O(1).
To prove Lemma 2.3, it remains to prove that for every i = 1, …, k, we have

Σ_{n∈N_i} E(B ∩ S(n)) ≪ log N.   (10)
Write φ_i = θ_i − α. In view of symmetry, we may assume that 0 ≤ φ_i ≤ π/4. There are at most two edges of B that make the angle φ_i with the positive horizontal axis. Let one of these lie on the line

(x₂ − a₂)/(x₁ − a₁) = tan φ_i,

where (x₁, x₂) ∈ R² denotes any point on the line and a₁ and a₂ are real constants. Elementary calculation then shows that the contribution from this edge to the sum in (10) is given by

± Σ_{A_i ≤ m ≤ B_i} ψ(a₂ + (m − a₁) tan φ_i),
where A_i and B_i are integers satisfying 0 ≤ A_i ≤ B_i ≤ √2 N^{1/2}, and ψ(z) = z − [z] − 1/2 for every z ∈ R. Since tan φ_i is badly approximable, giving rise to good distribution of the sequence m tan φ_i modulo 1, the well-known result of Lerch [10] (see also [8,9,6]) shows that

Σ_{A_i ≤ m ≤ B_i} ψ(a₂ + (m − a₁) tan φ_i) ≪ log(B_i − A_i + 2) ≪ log N.
This establishes inequality (10), and completes the proof of Lemma 2.3.
3. An argument of Beck

To study Theorem 2, we use an elaboration of the idea of Beck as discussed in Section 8.1 of [3]. It is convenient to restrict the natural number N to be a perfect square, so that N = M² for some natural number M. This restriction can be lifted easily, in view of Lagrange's theorem that every positive integer is a sum of at most four integer squares, so that we can superimpose up to four point distributions where the number of points in each is a perfect square. We shall consider a rescaled version of the problem, and study sets of N points in the square [0, M]². Let k ∈ N be fixed, with k ≥ 3. We denote by G_k the collection of all convex polygons in [0, M]² which have at most k sides. Suppose that P is a set of N points in [0, M]². For every measurable subset A ⊆ [0, M]², let Z[P; A] denote the number of points of P that fall into A, and let E[P; A] = Z[P; A] − μ(A) denote the corresponding discrepancy. We would like to show that there exists a set P of N points in [0, M]² such that for every convex polygon A ∈ G_k, we have

|E[P; A]| ≪_k N^{1/4} (log N)^{1/2}.

Our first step is to approximate the convex polygons in G_k by a special finite collection of polygons. Let δ = (6kM)^{−1}, and let H_k denote the collection of all convex polygons in [0, M]² with at most 4k sides and with vertices on (δZ)² ∩ [0, M]². It is easy to see that |(δZ)² ∩ [0, M]²| = (6kN + 1)², so that

|H_k| ≤ Σ_{d=3}^{4k} ((6kN + 1)² choose d) ≤ c_k N^{8k},

where the constant c_k depends at most on k.

Lemma 3.1. For every convex polygon A ∈ G_k, there exist two convex polygons B⁺, B⁻ ∈ H_k such that B⁻ ⊆ A ⊆ B⁺ and μ(B⁺ \ B⁻) ≤ 1.

Lemma 3.2. There exists a set P of N points in [0, M]² such that for every convex polygon B ∈ H_k, we have |E[P; B]| ≤ C_k N^{1/4} (log N)^{1/2}, where the constant C_k depends at most on k.

Before we establish these two lemmas, we shall first complete the very short deduction of Theorem 2.
Deduction of Theorem 2. For every convex polygon A ∈ G_k, it is not difficult to show that the convex polygons B⁺, B⁻ ∈ H_k given by Lemma 3.1 satisfy the inequality

|E[P; A]| ≤ max{|E[P; B⁻]|, |E[P; B⁺]|} + μ(B⁺ \ B⁻) ≤ C_k N^{1/4} (log N)^{1/2} + 1.

This gives Theorem 2 immediately.
We shall establish Lemma 3.2 in Section 4, and Lemma 3.1 in Section 5.

4. Large deviation

In this section, we establish Lemma 3.2 using a large deviation-type argument. For every l = (l₁, l₂) ∈ Z² ∩ [0, M)², let q_l ∈ S(l) = [l₁, l₁ + 1) × [l₂, l₂ + 1) be a random point uniformly distributed in S(l) and independent of the points in the other squares, and consider the random point set

P̃ = {q_l : l ∈ Z² ∩ [0, M)²}.

Consider a fixed convex polygon B ∈ H_k, and let L(B) = {l ∈ Z² ∩ [0, M)² : S(l) ∩ ∂B ≠ ∅}. Then it is easy to show that |L(B)| ≤ 4N^{1/2}. For any l ∈ L(B), let η_l = 1 if q_l ∈ B, and η_l = 0 otherwise. Then

E[P̃; B] = Σ_{l∈L(B)} (η_l − E η_l).
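The random point set P̃ — one uniformly distributed point per unit cell, i.e. jittered sampling — and its discrepancy for a fixed convex polygon can be sketched as follows; the triangle and the grid size M = 32 are illustrative choices:

```python
import random

# Sketch: jittered sampling on [0, M)^2 (one uniform point per unit cell)
# and the discrepancy E[P~; B] = Z(B) - area(B) for a convex test polygon.
random.seed(1)
M = 32
N = M * M

points = [(l1 + random.random(), l2 + random.random())
          for l1 in range(M) for l2 in range(M)]

# An illustrative convex polygon, vertices counter-clockwise.
triangle = [(1.0, 1.0), (30.0, 4.0), (10.0, 28.0)]

def inside(p, poly):
    # a point is inside a convex CCW polygon iff it lies left of every edge
    x, y = p
    n = len(poly)
    for k in range(n):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False
    return True

def area(poly):
    # shoelace formula
    n = len(poly)
    return abs(sum(poly[k][0] * poly[(k + 1) % n][1]
                   - poly[(k + 1) % n][0] * poly[k][1]
                   for k in range(n))) / 2.0

Z = sum(inside(p, triangle) for p in points)
E = Z - area(triangle)   # only cells meeting the boundary contribute
print(E)
```

Cells lying entirely inside or outside the polygon contribute exactly zero, so |E| is controlled by the roughly O(M) boundary cells — which is why the typical discrepancy is of order N^{1/4} rather than N^{1/2}.
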
We now use the following large deviation-type inequality due to Hoeffding; see, for example, Appendix B of Pollard [11].

Lemma 4.1. Suppose that η₁, …, η_m are independent random variables such that 0 ≤ η_i ≤ 1 for every i = 1, …, m. Then for every λ > 0,

Prob( |Σ_{i=1}^{m} (η_i − E η_i)| ≥ λ ) ≤ 2e^{−2λ²/m}.

Note that m = |L(B)| ≤ 4N^{1/2}, and choose λ = C_k N^{1/4} (log N)^{1/2} with a sufficiently large constant C_k. Then it is easy to check that

λ²/m ≥ C_k² N^{1/2} log N / (4N^{1/2}) = (C_k²/4) log N,
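Hoeffding's inequality in Lemma 4.1 can be checked numerically; the parameters m, λ and the number of trials below are arbitrary illustrative choices:

```python
import math
import random

# Numerical sanity check of Lemma 4.1:
# Prob(|sum(eta_i - E eta_i)| >= lam) <= 2 exp(-2 lam^2 / m),
# for independent eta_i with values in [0, 1].
random.seed(0)
m, lam, trials = 400, 15.0, 5000

exceed = 0
for _ in range(trials):
    s = sum(random.random() - 0.5 for _ in range(m))  # eta_i uniform on [0,1]
    if abs(s) >= lam:
        exceed += 1

empirical = exceed / trials
bound = 2.0 * math.exp(-2.0 * lam * lam / m)
print(empirical, bound)
```
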
so that

4e^{−2λ²/m} ≤ 4N^{−C_k²/2} ≤ c_k^{−1} N^{−8k},

where the last inequality is valid for all N ≥ 2 provided that C_k is large enough in terms of k and c_k. Since

(1/2)|H_k|^{−1} ≥ (1/2) c_k^{−1} N^{−8k} ≥ 2e^{−2λ²/m},

we have

Prob( |E[P̃; B]| ≥ C_k N^{1/4} (log N)^{1/2} ) ≤ (1/2)|H_k|^{−1}.

If we now consider all convex polygons B ∈ H_k, then the above implies

Prob( |E[P̃; B]| ≥ C_k N^{1/4} (log N)^{1/2} for some B ∈ H_k ) ≤ 1/2,

and so

Prob( |E[P̃; B]| ≤ C_k N^{1/4} (log N)^{1/2} for all B ∈ H_k ) ≥ 1/2.
This completes the proof of Lemma 3.2.

5. Convexity

In this section, we establish Lemma 3.1 using a convexity argument. Recall that G_k denotes the collection of all convex polygons in [0, M]² which have at most k sides, and H_k denotes the collection of all convex polygons in [0, M]² with at most 4k sides and with vertices on (δZ)² ∩ [0, M]², where δ = (6kM)^{−1}. For convenience, we make an ad hoc definition. By a δ-square, we mean a closed square of side δ and with all vertices in (δZ)² ∩ [0, M]².

5.1. The outer convex polygon B⁺

Suppose that a convex polygon A ∈ G_k is given. Corresponding to every vertex v of A, we shall define the set O_v of “outer grid points” corresponding to v. We distinguish two cases:

Case 1: Suppose that v ∈ (δZ)² ∩ [0, M]². Then we take O_v = {v}.

Case 2: Suppose that v ∉ (δZ)² ∩ [0, M]². Then we take O_v to be the collection of the vertices outside A or on the boundary of A of all δ-squares that contain v and whose interior intersects the boundary of A.

To construct the convex polygon B⁺ ∈ H_k given in Lemma 3.1, we simply let

B⁺ = ch{O_v : v is a vertex of A}

denote the convex hull of all the outer grid points of A. Trivially, the convex polygon B⁺ has at most 4k sides, since A has at most k sides. The inclusion A ⊆ B⁺ is immediate from our definition. On the other hand, we have

μ(B⁺ \ A) ≤ 1/2.   (11)
To see this, note that any point of O_v has vertical or horizontal distance at most 2δ from the (extended) edges of A that intersect at v. It follows that the set B⁺ \ A is contained in the union of k sets, each of area at most 2δM. Inequality (11) follows immediately.

5.2. The inner convex polygon B⁻

Suppose that a convex polygon A ∈ G_k is given. Here we run into some technical complications caused by the possibility of A having some vertices that are very close together. To overcome these complications, we introduce an iterative process whereby we can remove some of the vertices of A, one at a time, to obtain a smaller polygon A*. Start with A₀ = A. For each i = 0, 1, 2, …, we remove, if possible, a vertex of the polygon A_i by taking one of the steps below, and denote by A_{i+1} the convex polygon formed with the remaining vertices:

• Option 1: Remove a vertex v of A_i if a δ-square containing v contains another vertex of A_i.
• Option 2: Remove a vertex v of A_i if all four vertices of every δ-square containing v lie outside A_i and at least one of the following two conditions is satisfied:
◦ The horizontal distance from v to an adjacent vertex of A_i is less than the horizontal distance in the same direction from v to any grid point of (δZ)² ∩ [0, M]² lying inside A_i or on the boundary of A_i.
◦ The vertical distance from v to an adjacent vertex of A_i is less than the vertical distance in the same direction from v to any grid point of (δZ)² ∩ [0, M]² lying inside A_i or on the boundary of A_i.

Note that A_{i+1} ⊆ A_i, and μ(A_i \ A_{i+1}) ≤ δM. This iterative process stops when it is no longer possible to remove any vertex of a convex polygon under either option, and we denote by A* the last convex polygon obtained from A by this process. Note that

μ(A \ A*) ≤ jδM,   (12)

where j is the number of vertices of A removed by this process. Note that the convex polygon A* may not be unique, and has at most k − j sides. Corresponding to every vertex v of A*, we shall define the set I_v of “inner grid points” corresponding to v. We distinguish two cases:

Case 1: Suppose that v ∈ (δZ)² ∩ [0, M]². Then we take I_v = {v}.

Case 2: Suppose that v ∉ (δZ)² ∩ [0, M]². Let F_v denote the collection of vertices inside A* or on the boundary of A* of all δ-squares that contain v and whose interior intersects the boundary of A*—there is only one such δ-square, unless v lies on the boundary of two adjacent ones, in which case there are precisely two. There are three possibilities:

• If F_v ≠ ∅, then we take I_v = F_v.
• If F_v = ∅, and no point of the lattice (δZ)² ∩ [0, M]² lies inside A* or on the boundary of A*, then we take I_v = ∅.
• If F_v = ∅, and there are points of the lattice (δZ)² ∩ [0, M]² that lie inside A* or on the boundary of A*, then for every δ-square that contains v and whose interior intersects the boundary of A*, one or more of its four edges must have the following property: the edge intersects A*, and there is a grid line of (δZ)² ∩ [0, M]², parallel to this edge, closest to v but on the other side of this edge from v, that contains points of (δZ)² ∩ [0, M]² that lie inside A* or on the boundary
of A*. We take I_v to include all such grid points of (δZ)² ∩ [0, M]² on these closest grid lines that lie inside A* or on the boundary of A*. The following is easy to prove: if the boundary of A* crosses precisely one edge or three edges of the δ-square, then the elements of I_v arising from this δ-square lie on at most one grid line. If the boundary of A* crosses precisely two edges of the δ-square, then the elements of I_v arising from this δ-square lie on at most two distinct grid lines, only one of which can contain more than one element of I_v. Note that the boundary of A* cannot cross all four edges of the δ-square, as this would imply that no point of the lattice (δZ)² ∩ [0, M]² lies inside A* or on the boundary of A*.

To construct the convex polygon B⁻ ∈ H_k given in Lemma 3.1, we simply let

B⁻ = ch{I_v : v is a vertex of A*}

denote the convex hull of all the inner grid points of A*, with the convention that B⁻ = ∅ if I_v = ∅ for every vertex v of A*. Trivially, the convex polygon B⁻ has fewer than 4k sides, since A* has at most k sides. The inclusions B⁻ ⊆ A* ⊆ A are immediate from our definitions. On the other hand, we have

μ(A \ B⁻) ≤ 1/2.   (13)
To see this, note that each vertex v of A* contributes at most three vertices of B⁻. Moreover, any point of I_v has vertical or horizontal distance at most δ from the edges of A* that intersect at v. It follows that the set A* \ B⁻ is contained in the union of k − j sets “along the edges”, each of area at most δM, and the union of at most 2(k − j) triangles “near the vertices”, each of area at most δM. Inequality (13) then follows at once on noting inequality (12). The case when B⁻ = ∅ is trivial.

6. An elementary geometric argument

In this section, we adapt the wonderfully elegant geometric argument described in Schmidt [13] to give a simple proof of Theorem 3. Consider the circle of radius 1/2 lying within the unit square [0, 1]². Now let k = [N^{1/3}], and let A denote a regular convex polygon of k sides inscribed in this circle. Elementary calculation shows that any triangle whose three vertices are one of the vertices of A and the midpoints of the two adjacent edges has area

(1/4) sin³(π/k) cos(π/k) ≥ (1/8)(2/k)³ = 1/k³ ≥ 1/N.   (14)

Corresponding to each vertex of A, we now consider an isosceles triangle of area 1/(2N) and with its two equal sides lying on the two edges of A adjacent to this vertex. Let B₁, …, B_s denote those isosceles triangles which contain points of P, and let C₁, …, C_t denote those isosceles triangles which do not contain points of P. Clearly,

D[P; B_i] ≥ 1/2 for every i = 1, …, s,

and

D[P; C_j] = −1/2 for every j = 1, …, t.
Furthermore, the triangles B₁, …, B_s, C₁, …, C_t are pairwise disjoint, in view of (14) above, and s + t = k = [N^{1/3}]. It is also easy to see that both

A⁺ = A \ (B₁ ∪ ⋯ ∪ B_s)  and  A⁻ = A \ (C₁ ∪ ⋯ ∪ C_t)

are convex polygons. But now

D[P; A⁻] − D[P; A⁺] = Σ_{i=1}^{s} D[P; B_i] − Σ_{j=1}^{t} D[P; C_j] ≥ s/2 + t/2 = k/2 = (1/2)[N^{1/3}].

It follows that

|D[P; A⁻]| ≥ (1/4)[N^{1/3}]  or  |D[P; A⁺]| ≥ (1/4)[N^{1/3}],
and this completes the proof of Theorem 3.

References

[1] J. Beck, Irregularities of distribution I, Acta Math. 159 (1987) 1–49.
[2] J. Beck, On the discrepancy of convex plane sets, Monatsh. Math. 105 (1988) 91–106.
[3] J. Beck, W.W.L. Chen, Irregularities of Distribution, Cambridge Tracts in Mathematics, vol. 89, Cambridge University Press, Cambridge, 1987.
[4] J. Beck, W.W.L. Chen, Irregularities of point distribution relative to convex polygons, in: G. Halász, V.T. Sós (Eds.), Irregularities of Partitions, Algorithms and Combinatorics, vol. 8, Springer, Berlin, 1989, pp. 1–22.
[5] J. Beck, W.W.L. Chen, Irregularities of point distribution relative to convex polygons III, J. London Math. Soc. 56 (1997) 222–230.
[6] H. Davenport, Note on irregularities of distribution, Mathematika 3 (1956) 131–135.
[7] H. Davenport, A note on diophantine approximation II, Mathematika 11 (1964) 50–58.
[8] G.H. Hardy, J.E. Littlewood, Some problems of diophantine approximation: the lattice points of a right-angled triangle I, Proc. London Math. Soc. 20 (1922) 15–36.
[9] G.H. Hardy, J.E. Littlewood, Some problems of diophantine approximation: the lattice points of a right-angled triangle II, Abh. Math. Sem. Univ. Hamburg 1 (1922) 212–249.
[10] M. Lerch, Question 1547, L'Intermediaire Math. 11 (1904) 145–146.
[11] D. Pollard, Convergence of Stochastic Processes, Springer, Berlin, 1984.
[12] W.M. Schmidt, Irregularities of distribution VII, Acta Arith. 21 (1972) 45–50.
[13] W.M. Schmidt, Irregularities of distribution IX, Acta Arith. 27 (1975) 385–396.
Journal of Complexity 23 (2007) 673 – 696 www.elsevier.com/locate/jco
Simple Monte Carlo and the Metropolis algorithm Peter Mathéa , Erich Novakb,∗ a Weierstrass Institute for Applied Analysis and Stochastics, Mohrenstrasse 39, D-10117 Berlin, Germany b Friedrich Schiller University Jena, Mathem. Institute, Ernst-Abbe-Platz 2, D-07743 Jena, Germany
Received 21 October 2006; accepted 14 May 2007 Dedicated to our dear colleague and friend Henryk Wo´zniakowski on the occasion of his 60th birthday Available online 15 June 2007
Abstract

We study the integration of functions with respect to an unknown density. Information is available as oracle calls to the integrand and to the non-normalized density function. We are interested in analyzing the integration error of optimal algorithms (or the complexity of the problem) with emphasis on the variability of the weight function. For a corresponding large class of problem instances we show that the complexity grows linearly in the variability, and the simple Monte Carlo method provides an almost optimal algorithm. Under additional geometric restrictions (mainly log-concavity) for the density functions, we establish that a suitable adaptive local Metropolis algorithm is almost optimal and outperforms any non-adaptive algorithm.
© 2007 Elsevier Inc. All rights reserved.

MSC: 65C05; secondary: 65Y20; 68Q17; 82B80

Keywords: Monte Carlo methods; Metropolis algorithm; Log-concave density; Rapidly mixing Markov chains; Optimal algorithms; Adaptivity; Complexity
1. Introduction, problem description

In many applications one wants to compute an integral of the form

∫_Ω f(x) · cρ(x) μ(dx)   (1)

with a density cρ(x), x ∈ Ω, where ρ > 0 is unknown and μ is a probability measure. Of course we have 1/c = ∫_Ω ρ(x) μ(dx), but the numerical computation of the latter integral is often as hard as the original problem (1). Therefore it is desirable to have algorithms which are able
∗ Corresponding author.
E-mail addresses:
[email protected] (P. Mathé),
[email protected] (E. Novak). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.05.002
to approximately compute (1) without knowing the normalizing constant, based solely on n function values of f and ρ. In other terms, these functions are given by an oracle, i.e., we assume that we can compute function values of f and ρ.

Solution operator. Assume that we are given any class F(Ω) of input data (f, ρ) defined on a set Ω. We can rewrite the integral in (1) as

S(f, ρ) = ∫_Ω f(x) · ρ(x) μ(dx) / ∫_Ω ρ(x) μ(dx),   (f, ρ) ∈ F(Ω).   (2)

This solution operator is linear in f but not in ρ. We discuss algorithms for the (approximate) computation of S(f, ρ).

Remark 1. This solution operator is closely related to systems in statistical mechanics, which obey a Boltzmann (or Maxwell or Gibbs) distribution, i.e., when there is a countable number j = 1, 2, … of microstates with energies, say E_j, and the overall system is distributed according to the Boltzmann distribution, with inverse temperature β, as

P(j) := e^{−βE_j} / Z,   j = 1, 2, … .

In this case the normalizing constant Z is the partition function, corresponding to 1/c from (1), and ρ(j) = e^{−βE_j} for j ∈ N. In this setup, if A is any global thermodynamic quantity, then its expected value ⟨A⟩ is given by

⟨A⟩ := (1/Z) Σ_j A_j e^{−βE_j},

which can be written as S(A, ρ). Observe, however, that we use here slightly different assumptions since we use the counting measure on N, not a probability measure.

Randomized methods. Monte Carlo methods (randomized methods) are important numerical tools for integration and simulation in science and engineering; we refer to the recent special issue [7]. The Metropolis method, or more accurately, the class of Metropolis–Hastings algorithms, ranges among the most important methods in numerical analysis and scientific computation, see [6,23]. Here we consider randomized methods S_n that use n function evaluations of f and ρ. Hence S_n is of the form as exhibited in Fig. 1. In all steps, random number generators may be used to determine the consecutive node. If the nodes x_i from Step do not depend on previously computed values of f(x₁), …, f(x_{i−1}) and ρ(x₁), …, ρ(x_{i−1}), then the algorithm is called non-adaptive, otherwise it is called adaptive. Specifically we analyze the procedures S_n^{simple} and S_n^{mh}, introduced in (3) and (5) below.

Remark 2. The notion of adaption which is used here differs from the one recently used to introduce adaptive MCMC, see e.g. [1,3]. The Metropolis algorithm which is used in this paper is based on a homogeneous Markov chain; in our notation this is still an adaptive algorithm since the used nodes x_i depend on ρ. Hence we use the concept of adaptivity from numerical analysis and information-based complexity, see [22].
Fig. 1. Generic Monte Carlo algorithm based on n values of f and ρ. The final Compute may use any mapping φ_n : R^{2n} → R.
For details on the model of computation we refer to [20,21,27]. Here we only mention the following: We use the real number model and assume that f and ρ are given by an oracle for function values. Our lower bounds hold under very general assumptions concerning the available random number generator.¹ For the upper bounds we only study two algorithms in this paper, described in (3) and (5) below. Specifically we shall deal with the (non-adaptive) simple Monte Carlo method and a specific (adaptive) Metropolis–Hastings method. The former can only be applied if a random number generator for μ on Ω is available. Thus there are natural situations when this method cannot be used. The latter will be based on a suitable ball walk. Hence we need a random number generator for the uniform distribution on a (Euclidean) ball. Thus the Metropolis–Hastings methods can also be applied when a random number generator for μ on Ω is not available. Instead, we need a “membership oracle” for Ω: On input x ∈ R^d this oracle can decide with cost 1 whether x ∈ Ω or not.

Error criterion. We are interested in error bounds uniformly for classes F(Ω) of input data. If S_n is any method that uses (at most) n values of f and ρ then the (individual) error for the problem instance (f, ρ) ∈ F(Ω) is given by

e(S_n, (f, ρ)) = ( E |S(f, ρ) − S_n(f, ρ)|² )^{1/2},

where E means the expectation. The overall (or worst case) error on the class F(Ω) is

e(S_n, F(Ω)) = sup_{(f,ρ)∈F(Ω)} e(S_n, (f, ρ)).

The complexity of the problem is given by the error of the best algorithm, hence we let

e_n(F(Ω)) := inf_{S_n} e(S_n, F(Ω)).

The classes F(Ω) under consideration will always contain constant densities ρ = c > 0 and all f with ‖f‖_∞ ≤ 1, hence

F₁(Ω) := {(f, ρ), |f(x)| ≤ 1, x ∈ Ω, and ρ = c} ⊆ F(Ω).

¹ Observe, however, that we cannot use a random number generator for the “target distribution” μ_ρ, since ρ is part of the input.
On this class the problem (2) reduces to the classical integration problem for uniformly bounded functions, and it is well known that the error of any Monte Carlo method can decrease at a rate n^{−1/2}, at most. Precisely, it holds true that

e_n(F₁(Ω)) = 1/(1 + √n),

if the probability μ is non-atomic, see [17]. On the other hand we will only consider (f, ρ) with S(f, ρ) ∈ [−1, 1], hence the trivial algorithm S₀ = 0 always has error 1. For the classes F_C(Ω) and F_Λ(Ω), which will be introduced in Section 2, we easily obtain the optimal order e_n(F(Ω)) ≍ n^{−1/2}. We will analyze how e_n(F(Ω)) depends on the parameters C and Λ, in case F(Ω) := F_C(Ω) or F(Ω) := F_Λ(Ω), respectively.

We discuss some of our subsequent results and provide a short outline. In Section 2 we shall specify the methods and classes of input data to be analyzed. The classes F_C(Ω), analyzed first in Section 3, contain all densities with sup ρ / inf ρ ≤ C. In typical applications we may face C = 10²⁰. Then we cannot decrease the error of optimal methods from 1 to 0.7 even with sample size n = 10¹⁵, see Theorem 1 for more details. Hence the classes F_C(Ω) are so large that no algorithm, deterministic or Monte Carlo, adaptive or non-adaptive, can provide an acceptable error. We also prove that the simple (non-adaptive) Monte Carlo method is almost optimal; no sophisticated Markov chain Monte Carlo method can help. Thus we face the question whether adaptive algorithms, such as the Metropolis algorithm, help significantly on “suitable and interesting” subclasses of F_C(Ω). We give a positive answer for the classes F_Λ(Ω), analyzed in Section 4. Here we assume that Ω ⊂ R^d is a convex body, and that μ is the normalized Lebesgue measure on Ω. The class F_Λ(Ω) contains log-concave densities, where Λ is the Lipschitz constant of log ρ. We shall establish in Section 4.1 that all non-adaptive methods (such as the simple Monte Carlo method) suffer from the curse of dimension, i.e., we get similar lower bounds as for the classes F_C(Ω). However, in Section 4.2 we shall design and analyze specific (adaptive) Metropolis algorithms that are based on some underlying ball walks, tuned to the class parameters. Using such algorithms we can break the curse of dimension by adaption.
The main error estimate for this algorithm is given in Theorem 5, and we conclude this study with further discussion in the final Section 5.

2. Specific methods and classes of input

We consider the approximate computation of S(f, ρ) for large classes of input data. Since with deterministic algorithms one cannot improve the trivial zero algorithm (with error 1), we study randomized or Monte Carlo algorithms.

The methods. The Monte Carlo methods under consideration fit the schematic view from Fig. 1.

Simple Monte Carlo. Here the random numbers ξ₁, …, ξ_n are identically and independently distributed according to μ, and the routine Step chooses X_i := ξ_i. The final routine Compute is the quotient of the sample means of the computed function values

S_n^{simple}(f, ρ) := Σ_{j=1}^{n} f(X_j)ρ(X_j) / Σ_{j=1}^{n} ρ(X_j).   (3)
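A minimal sketch of the simple Monte Carlo method (3), with Ω = [0, 1] and μ the uniform distribution; the choices f(x) = x and ρ(x) = e^{−x} are illustrative assumptions:

```python
import math
import random

# Sketch of the simple Monte Carlo method (3) on Omega = [0,1], mu uniform.
# f and rho below are illustrative choices (rho is a positive,
# non-normalized density).
random.seed(0)

def f(x):
    return x

def rho(x):
    return math.exp(-x)

n = 20000
xs = [random.random() for _ in range(n)]              # X_i := xi_i ~ mu
est = sum(f(x) * rho(x) for x in xs) / sum(rho(x) for x in xs)

# For these choices the exact value is S(f, rho) = (1 - 2/e) / (1 - 1/e).
exact = (1 - 2 / math.e) / (1 - 1 / math.e)
print(est, exact)
```

Note that the estimator is a quotient of two sample means, so the unknown normalizing constant cancels — no knowledge of ∫ρ dμ is needed.
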
Metropolis–Hastings method. This describes a class of (adaptive) Monte Carlo methods which are based on the ingenious idea to construct in Step a Markov chain having

μ_ρ := ρ · μ / ∫_Ω ρ(x) μ(dx)   (4)

as invariant distribution without knowing the normalization. Thus, if (X₁, X₂, …, X_n) is a trajectory of such a Markov chain, then we let Compute be given as

S_n^{mh}(f, ρ) := (1/n) Σ_{j=1}^{n} f(X_j).   (5)
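A minimal sketch of a Metropolis–Hastings method of the form (5), using a simple ball walk on Ω = [−1, 1]; the density ρ(x) = e^{x} (log-concave with Lipschitz logarithm), the test function f(x) = x, and the proposal radius are illustrative assumptions, not the tuned choices of Section 4.2:

```python
import math
import random

# Sketch of a Metropolis-Hastings estimator of type (5): a ball walk on
# Omega = [-1, 1] with uniform proposals of radius delta, targeting mu_rho
# without knowing the normalization.
random.seed(0)

def rho(x):
    return math.exp(x)        # log-concave, Lipschitz log-density (assumption)

def f(x):
    return x

delta = 0.5                   # proposal radius: a tuning parameter (assumption)
n = 50000
x = 0.0
total = 0.0
for _ in range(n):
    y = x + random.uniform(-delta, delta)
    # proposals leaving Omega are rejected (the walk stays put);
    # otherwise accept with probability min(1, rho(y)/rho(x))
    if -1.0 <= y <= 1.0 and random.random() < min(1.0, rho(y) / rho(x)):
        x = y
    total += f(x)

est = total / n               # S_n^mh(f, rho)
exact = 2.0 / (math.e ** 2 - 1.0)   # int x e^x dx / int e^x dx on [-1, 1]
print(est, exact)
```

Only ratios ρ(y)/ρ(x) enter the acceptance step, which is exactly why the method applies when the normalizing constant is unknown.
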
Hence we use n steps of the Markov chain; the number of needed (different) function values of ρ and f might be smaller. We will further specify the Metropolis–Hastings algorithm for the problem at hand in Section 4.2, see Figs. 2 and 3 for a schematic presentation and Theorem 5 for the choice of the parameter. Both Monte Carlo methods construct Markov chains, i.e., the point x_i depends on x_{i−1} and ρ(x_{i−1}), only. This trivially holds true for simple Monte Carlo, since x_i does not at all depend on earlier computed function values.

Remark 3. Comparisons of different Monte Carlo methods for problems similar to (2) are frequently met in the literature. We mention [5] with a comparison of Metropolis algorithms and importance sampling, where an error expansion at any instance (f, ρ) is given in terms of certain auto-correlations. The simple Monte Carlo method, as introduced below, is also studied there.

The (point-wise almost sure) convergence of both methods S_n^{simple} and S_n^{mh}, as n → ∞, is ensured by corresponding ergodic theorems, see [14]. But, as outlined above, we are interested in the uniform error on relatively large problem classes.

The classes. Here we formally describe the classes of input under consideration.

The class F_C(Ω). Let μ be an arbitrary probability measure on a set Ω and consider the set

F_C(Ω) = {(f, ρ) : ‖f‖_∞ ≤ 1, ρ > 0, ρ(x)/ρ(y) ≤ C, x, y ∈ Ω}.

Note that necessarily C ≥ 1. If C = 1 then ρ is constant and we almost face the ordinary integration problem, since ρ can be recovered with only one function value. In many applications the constant C is huge and we will establish that the complexity of the problem (the cost of an optimal algorithm) is linear in C. Therefore, for large C, the class is too large. We have to look for smaller classes that contain many interesting pairs (f, ρ) and have smaller complexity.

The class F_Λ(Ω) with log-concave densities. In many applications, we have a weight ρ with additional properties and we assume the following:
• The set Ω ⊂ R^d is a convex body, that is a compact and convex set with non-empty interior. The probability μ is the normalized Lebesgue measure on the set Ω.
• The functions f and ρ are defined on Ω.
• The weight ρ > 0 is log-concave, i.e., ρ(λx + (1 − λ)y) ≥ ρ(x)^λ · ρ(y)^{1−λ}, where x, y ∈ Ω and 0 < λ < 1.
• The logarithm of ρ is Lipschitz, i.e., |log ρ(x) − log ρ(y)| ≤ Λ‖x − y‖₂.
Thus we consider the class of log-concave weights on Ω ⊂ R^d given by

R_Λ(Ω) = {ρ | ρ > 0, log ρ is concave, |log ρ(x) − log ρ(y)| ≤ Λ‖x − y‖₂}.

We study the following class F_Λ(Ω) of problem elements,
F_Λ(Ω) = {(f, ρ) | ρ ∈ R_Λ(Ω), ‖f‖_{2,μρ} ≤ 1},
(6)
(7)
where ‖·‖_{2,μρ} is the L₂-norm with respect to the probability measure μ_ρ, see (4). In some places we restrict our study to the (Euclidean) unit ball, i.e., Ω := B^d ⊂ R^d.

Remark 4. Let R_C(Ω) be the class of weight functions that belong to F_C(Ω). Then R_Λ(Ω) ⊂ R_C(Ω) if C = e^{ΛD}, where D is the diameter of Ω. Thus large Λ correspond to “exponentially large” values of C. However, the densities from the class R_Λ(Ω) have some extra (local) properties: they are log-concave and Lipschitz continuous. These properties can be used for the construction of fast adaptive methods, via rapidly mixing Markov chains.

3. Analysis for F_C(Ω)

We assume that Ω is an arbitrary set and μ is a probability measure on Ω, and that the functions f and ρ are defined on Ω. In the applications, the constant C might be very large, something like C = 10²⁰ is a realistic assumption. Therefore we want to know how the complexity (the cost of optimal algorithms) depends on C. Observe that the problem is correctly normalized or scaled such that S(F_C(Ω)) = [−1, 1], for any C ≥ 1. We will prove that the complexity of the problem is linear in C, and hence there is no way to solve the problem if C is really huge. We start with establishing a lower bound and then show that simple Monte Carlo achieves this error up to a constant.

3.1. Lower bounds

Here we prove lower bounds for all (adaptive or non-adaptive) methods that use n evaluations of f and ρ. We use the technique of Bahvalov, i.e., we study the average error of deterministic algorithms with respect to certain discrete measures on F_C(Ω).

Theorem 1. Assume that we can partition Ω into 2n disjoint sets with equal measure (equal to 1/2n). Then for any Monte Carlo method S_n that uses n values of f and ρ we have the lower bound

e(S_n, F_C(Ω)) ≥ (1/6) √(C/(2n)) if 2n ≥ C − 1, and e(S_n, F_C(Ω)) ≥ (1/6) · 3√2 C/(C + 2n − 1) if 2n < C − 1.   (8)

The lower bound will be obtained in two steps.
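The quantitative message of Theorem 1 — with variability C = 10²⁰ the error cannot be decreased from 1 to about 0.7 even with n = 10¹⁵ samples — can be evaluated numerically. The explicit constants used below are a reconstruction of the two cases of (8) and should be treated as an assumption:

```python
import math

# Illustration of Theorem 1's lower bound (8); the explicit constants here
# are a reconstruction (an assumption), not the paper's verbatim formula.
def lower_bound(n, C):
    if 2 * n >= C - 1:
        return math.sqrt(C / (2 * n)) / 6
    return (3 * math.sqrt(2) * C / (C + 2 * n - 1)) / 6

# Even n = 10^15 samples leave the bound near 0.7 when C = 10^20.
b = lower_bound(10**15, 10**20)
print(b)
```
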
(1) We first reduce the error analysis for Monte Carlo sampling to the average case error analysis with respect to a certain prior probability on the class FC (). This approach is due to Bahvalov, see [4]. (2) For the chosen prior the average case analysis can be carried out explicitly and will thus yield a lower bound.
To construct the prior let m := 2n and 1 , . . . , m the partition into sets of equal probability, and
j the corresponding characteristic functions. Furthermore, let ⎧ m ⎨ , l := C−1 ⎩ 1
m C − 1, else.
Denote Jlm the set of all subsets of {1, . . . , m} of cardinality equal to l, and m,l the equi-distribution on Jlm , while Em,l denotes the expectation with respect to the prior m,l . Let (1 , . . . , m ) be independent and identically distributed with P (j = −1) = P (j = 1) = 21 , j = 1, . . . , m. The overall prior is the product probability on Jlm × {±1}m . For any realization = (I, 1 , . . . , m ) we assign f := j j and := C
j +
j . j ∈I
j ∈I
j ∈I
The following observation is useful. Lemma 1. For any subset N ⊂ {1, . . . , m} of cardinality at most n it holds l Em,l #(I \ N) . 2 Proof. Clearly, for any fixed k ∈ {1, . . . , m} we have m,l (k ∈ I ) = l/m, thus Em,l #(I \ N) =
Em,l I (r) = #(N c )
r∈N c
where we denoted by N c the complement of N.
l l , m 2
Proof of Theorem 1. Given the above prior let us denote

    e_n^{avg}(F_C(Ω)) := inf_q ( E_{m,l} E_ε |S(f_ω, ρ_ω) − q(f_ω, ρ_ω)|² )^{1/2},    (9)

where the inf is taken with respect to any (possibly adaptive) deterministic algorithm q which uses at most n values from f and ρ. For any Monte Carlo method S_n we have, using Bahvalov's argument [4], the relation

    e(S_n, F_C(Ω)) ≥ e_n^{avg}(F_C(Ω)).    (10)

We provide a lower bound for e_n^{avg}(F_C(Ω))². To this end note that for each realization (f_ω, ρ_ω) the integral ∫ ρ_ω dμ is constant. In the first case m ≥ C − 1, we can bound the integral by the choice of l as

    c_{m,l} := ∫ ρ_ω(x) μ(dx) = (1/m)(lC + (m − l) · 1) ≤ 3.    (11)

In the other case m < C − 1, we obtain c_{m,1} = (C − 1 + m)/m. Now, to analyze the average case error, let q_n be any (deterministic) method, and let us assume that it uses the set N of nodes.
We have the decomposition

    S(f_ω, ρ_ω) − q_n(f_ω, ρ_ω)
        = ( C/(m c_{m,l}) Σ_{j∈I\N} ε_j ) − ( q_n(f_ω, ρ_ω) − C/(m c_{m,l}) Σ_{j∈I∩N} ε_j ).

Given I, the random variables in the brackets are conditionally independent, thus uncorrelated. Hence we conclude that

    E_{m,l} E_ε |S(f_ω, ρ_ω) − q_n(f_ω, ρ_ω)|² ≥ (C/(m c_{m,l}))² E_{m,l} E_ε | Σ_{j∈I\N} ε_j |²
        = (C²/(m² c_{m,l}²)) E_{m,l} #(I \ N) ≥ C² l/(2 m² c_{m,l}²),

by Lemma 1.
In the case m ≥ C − 1 we obtain l ≥ m/C and have c_{m,l} ≤ 3, such that

    E_{m,l} E_ε |S(f_ω, ρ_ω) − q_n(f_ω, ρ_ω)|² ≥ C/(36 n),

which in turn yields the first case bound in (8). In the other case m < C − 1 the value of l = 1 yields the second bound in (8). □

3.2. The error of the simple Monte Carlo method

The direct approach to evaluate (1) would be to use the method S_n^{simple} from (3). We will prove an upper bound for the error of this method, and we start with the following:

Lemma 2. If the function ρ obeys the requirements in F_C(Ω), then
(1) 0 < inf_{x∈Ω} ρ(x) ≤ sup_{x∈Ω} ρ(x) < ∞.
(2) For every probability measure μ on Ω we have ‖ρ‖_{2,μ} ≤ √C ‖ρ‖_{1,μ}.

Proof. To prove the first assertion, fix any y_0 ∈ Ω. Then the assumption on ρ yields ρ(x) ≤ C ρ(y_0), and reversing the roles of x and y_0 also the lower bound. Now both the assumption on ρ and the second assertion are invariant with respect to multiplication of ρ by a constant. In the light of the first assertion we may and do assume that 1 ≤ ρ(x) ≤ C, x ∈ Ω, and we derive, using 1 ≤ ∫ ρ(x) μ(dx), that

    ∫ ρ²(x) μ(dx) ≤ C ∫ ρ(x) μ(dx) ≤ C ( ∫ ρ(x) μ(dx) )²,

completing the proof of the second assertion and of the lemma. □
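The method S_n^{simple} from (3) is the self-normalized estimator built from one i.i.d. sample. A minimal sketch; the sampler `sample_mu` for the measure μ is an assumed helper, not part of the text:

```python
import random

def simple_mc(f, rho, sample_mu, n, seed=0):
    """Self-normalized simple Monte Carlo: with X_1, ..., X_n i.i.d. from mu,
    return (sum_j f(X_j) rho(X_j)) / (sum_j rho(X_j)),
    an estimate of S(f, rho) = int f rho dmu / int rho dmu."""
    rng = random.Random(seed)
    xs = [sample_mu(rng) for _ in range(n)]
    num = sum(f(x) * rho(x) for x in xs)
    den = sum(rho(x) for x in xs)
    return num / den
```

Note that the same sample appears in the numerator and the denominator; this is exactly what makes the ratio bounded by ‖f‖_∞ in the proof of Theorem 2 below.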
We turn to the bound for the simple Monte Carlo method.

Theorem 2. For all n ∈ N we have

    e(S_n^{simple}, F_C(Ω)) ≤ 2 min { 1, √(2C/n) }.    (12)
Proof. The upper bound 2 is trivial; it even holds deterministically. Fix any pair (f, ρ) of input. For any sample (X_1, …, X_n) and function g we denote the sample mean by S_n^{mean}(g) := (1/n) Σ_{j=1}^n g(X_j). It is well known that e(S_n^{mean}, g) ≤ ‖g‖_2/√n. With this notation we can bound

    |S(f, ρ) − S_n^{simple}(f, ρ)|
      ≤ |S(f, ρ) − S_n^{mean}(fρ)/∫ρ dμ| + |S_n^{mean}(fρ)/∫ρ dμ − S_n^{mean}(fρ)/S_n^{mean}(ρ)|
      ≤ (1/∫ρ dμ) |∫ fρ dμ − S_n^{mean}(fρ)|
        + |S_n^{mean}(fρ)/S_n^{mean}(ρ)| · (1/∫ρ dμ) |∫ρ dμ − S_n^{mean}(ρ)|
      ≤ (1/∫ρ dμ) |∫ fρ dμ − S_n^{mean}(fρ)| + ‖f‖_∞ (1/∫ρ dμ) |∫ρ dμ − S_n^{mean}(ρ)|,

where we used |S_n^{mean}(fρ)/S_n^{mean}(ρ)| ≤ ‖f‖_∞, which holds true since the numerator and denominator use the same sample. This yields the following error bound:

    e(S_n^{simple}, (f, ρ)) ≤ (√2/∫ρ dμ) ( e(S_n^{mean}, fρ) + ‖f‖_∞ e(S_n^{mean}, ρ) )
      ≤ (√2/(√n ∫ρ dμ)) ( ‖fρ‖_2 + ‖f‖_∞ ‖ρ‖_2 ) ≤ (2√2 ‖ρ‖_2)/(√n ∫ρ dμ) ≤ 2√(2C)/√n,

where we use Lemma 2. Taking the supremum over (f, ρ) ∈ F_C(Ω) allows to complete the proof. □

4. Analysis for F_α(Ω)

In this section we impose restrictions on the input data, in particular on the density, in order to improve the complexity. This class is still large enough to contain many important situations. Monte Carlo methods for problems where the target (invariant) distribution is log-concave proved to be important in many studies; we refer to [10]. One of the main intrinsic features of such classes of distributions is the availability of isoperimetric inequalities, see [2,13], which will also be used here in the form given in [29]. Recall that here we always require that Ω ⊂ R^d is a convex body, as introduced in Section 2. We start with a lower bound for all non-adaptive algorithms to exhibit that simple Monte Carlo cannot take into account the additional structure of the underlying class of input data and that adaptive methods should be used. This bound, together with Theorem 5, will show that adaptive methods can outperform any non-adaptive method, if we consider S on F_α(B^d).
Indeed, we also show that specific Metropolis algorithms, based on local underlying Markov chains, are well suited for this problem class.

4.1. A lower bound for non-adaptive methods

Here we prove a lower bound for all non-adaptive methods (hence in particular for the simple Monte Carlo method) for the problem on the classes F_α(Ω). Again, this lower bound will use Bahvalov's technique.
We start with a result on sphere packings. The Minkowski–Hlawka theorem, see [25], says that the density of the densest sphere packing in R^d is at least ζ(d) · 2^{1−d} ≥ 2^{1−d}. It is also known, see [11], that the density (by definition over the whole R^d) can be replaced by the density within a convex body Ω, as long as the radius r of the spheres tends to zero. Hence we obtain the following result.

Lemma 3. There is n_Ω ∈ N such that for all m ≥ n_Ω there are points y_1, …, y_m ∈ Ω such that with

    r := r(Ω, m) := 2^{−1} m^{−1/d} ( vol(Ω)/vol(B^d) )^{1/d}

the closed balls B_i := B(y_i, r) ⊂ Ω are disjoint.

Our construction will use such points y_1, …, y_m ∈ Ω and the corresponding balls B_1, …, B_m as follows. For i ∈ {1, …, m} we assign

    ρ_i(y) := c_i exp(−α‖y − y_i‖_2)   and   f_i(y) := c̃_i χ_{B_i}(y),   y ∈ Ω,

with constants c_i and c̃_i chosen such that

    1 = ∫_Ω ρ_i(y) dy = c_i ∫_Ω exp(−α‖y − y_i‖) dy

and

    1 = ‖f_i‖²_{2,ρ_i} = c̃_i² c_i ∫_{B_i} exp(−α‖y − y_i‖) dy.

The corresponding values of the mapping S are computed as

    S(f_i, ρ_i) = ∫ f_i ρ_i dy = c̃_i c_i ∫_{B_i} exp(−α‖y − y_i‖) dy
                = ( c_i ∫_{B_i} exp(−α‖y − y_i‖) dy )^{1/2}
                = ( ∫_{B(0,r)} exp(−α‖y‖) dy / ∫_Ω exp(−α‖y − y_i‖) dy )^{1/2}.    (13)
Again we turn to the average case setting, this time with probability measure π_{2n} being the equi-distribution on the set

    F^{2n} := { (ε_i f_i, ρ_i) : i = 1, …, 2n, ε_i = ±1 } ⊂ F_α(Ω).

Similar to (10) we have for any non-adaptive Monte Carlo method S_n the relation

    e(S_n, F_α(Ω)) ≥ min { e^{avg}(q_n, π_{2n}) : q_n is deterministic and non-adaptive },

where e^{avg}(q_n, π_{2n}) denotes the average case error of the deterministic non-adaptive method q_n with respect to the probability π_{2n}. Thus let q_n be any non-adaptive (deterministic) algorithm for S on the class F_α(Ω) that uses at most n values.
The average case error can then be bounded from below as

    E_{2n} |S(f, ρ) − q_n(f, ρ)|² = (1/2n) Σ_{i=1}^{2n} E_ε |S(ε_i f_i, ρ_i) − q_n(ε_i f_i, ρ_i)|²
        ≥ (1/2) min_{i=1,…,2n} E_ε |S(ε_i f_i, ρ_i)|² ≥ (1/2) min_{i=1,…,2n} S(f_i, ρ_i)²

(for at least n of the 2n indices i none of the nodes of q_n lies in B_i, so that the output of q_n cannot depend on ε_i, and then E_ε |ε_i S(f_i, ρ_i) − c|² ≥ S(f_i, ρ_i)² for any constant c). Above, E_ε denotes the expectation with respect to the independent random variables ε_i = ±1. Together with (13) we obtain

    e(S_n, F_α(Ω)) ≥ (1/2)√2 · min_{i=1,…,2n} ( ∫_{B(0,r)} exp(−α‖y‖) dy / ∫_Ω exp(−α‖y − y_i‖) dy )^{1/2}.

We bound the numerator from below and the denominator from above. For rα ≤ log 2 we can bound

    ∫_{B(0,r)} exp(−α‖y‖) dy ≥ (1/2) vol(B(0, r)) = (1/2) r^d vol(B^d).

For the denominator we have

    ∫_Ω exp(−α‖y − y_i‖) dy ≤ ∫_{R^d} exp(−α‖y − y_i‖) dy = α^{−d} ∫_{R^d} exp(−‖y‖) dy = α^{−d} Γ(d) vol(∂B^d),

such that we finally obtain, using the well-known formula vol(∂B^d) = d vol(B^d), that

    e(S_n, F_α(Ω)) ≥ (1/2)√2 ( α^d r^d / (2 · d!) )^{1/2} = (1/2) ( α^d r^d / d! )^{1/2}.

Using the value for r = r(Ω, 2n) from Lemma 3 we end up with

Theorem 3. Assume that S_n is any non-adaptive Monte Carlo method for the class F_α(Ω). Then, with n_Ω from Lemma 3, we have for all

    2n ≥ max { n_Ω, (α/log 4)^d · vol(Ω)/vol(B^d) }

that

    e(S_n, F_α(Ω)) ≥ 2^{−d/2−3/2} · ( vol(Ω)/vol(B^d) )^{1/2} · ( α^{d/2}/√(d!) ) · n^{−1/2}.    (14)
Remark 5. For fixed d this is a lower bound of the form e(S_n) ≥ c α^{d/2} n^{−1/2}. It is interesting only if α is "large"; otherwise the already mentioned lower bound of order (1 + n)^{−1/2} is better.
We stress that in the above reasoning we essentially used the non-adaptivity of the method S_n. Indeed, if S_n were adaptive, then by just one appropriate function value ρ(x) we could identify the index i, since the functions ρ_i are global. Then, knowing i, we could ask for the value of ε_i and would obtain the exact solution to S(f, ρ) for this small class F^{2n}, for all n ≥ 2.
4.2. Metropolis method with local underlying walk

The Metropolis algorithm we consider here has a specific routine Step in Fig. 1, whereas the final step Compute is exactly as given in (5). It is based on a specific ball walk, and this version is sometimes called ball walk with Metropolis filter, see [29]. Two concepts from the theory of Markov chains turn out to be important, reversibility and uniform ergodicity. We recall these notions briefly, see [24] for further details. A Markov chain (K, π) is reversible with respect to π if for all measurable subsets A, B ⊂ Ω the balance

    ∫_A K(x, B) π(dx) = ∫_B K(x, A) π(dx)    (15)

holds true. Notice that in this case π necessarily is an invariant distribution. A Markov chain is uniformly ergodic if there are n_0 ∈ N, a constant c > 0 and a probability measure ν on Ω such that

    K^{n_0}(x, A) ≥ c ν(A)   for all A ⊂ Ω and x ∈ Ω.    (16)

Markov chains which are uniformly ergodic have a unique invariant probability distribution. Our analysis will be based on conductance arguments and we recall the basic notions, see [12,16]. If (K, π) is a Markov chain with transition kernel K and invariant distribution π then we assign
(1) the local conductance at x ∈ Ω by l_K(x) := K(x, Ω \ {x}),
(2) and the conductance as

    φ(K, π) := inf_{0 < π(A) ≤ 1/2} ( ∫_A K(x, A^c) π(dx) / π(A) ).    (17)

We call l > 0 a lower bound for the local conductance if l_K(x) ≥ l for all x ∈ Ω.

The ball walk and some of its properties. Here we gather some properties of the ball walk, see [16,29], which will serve as ingredients for the analysis of Metropolis chains using this as the underlying proposal. In particular we prove that on convex bodies Ω in R^d the ball walk is uniformly ergodic, and we bound its conductance from below in terms of bounds l > 0 for the local conductance. We abbreviate B(0, δ) = δB^d. Let Q_δ be the transition kernel of a local random walk having transitions within δ-balls of its current position, i.e., we let

    Q_δ(x, {x}) := 1 − vol(B(x, δ) ∩ Ω)/vol(δB^d),    (18)

and

    Q_δ(x, A) := { vol(B(x, δ) ∩ A)/vol(δB^d),      A ⊂ Ω and x ∉ A,
                 { Q_δ(x, A \ {x}) + Q_δ(x, {x}),   A ⊂ Ω and x ∈ A.    (19)

Schematically, the transition kernel may be viewed as in Fig. 2.
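A step of the kernel (18)-(19) can be sketched in a few lines: propose a point uniformly in the δ-ball around the current position and stay put whenever the proposal leaves Ω. This is a minimal sketch; `ball_walk_step` and the membership test `in_omega` are assumed names:

```python
import math
import random

def ball_walk_step(x, delta, in_omega, rng):
    """One transition of the delta-ball walk (18)-(19): propose y uniformly
    in B(x, delta); if y leaves Omega the walk stays at x (the "lazy" part
    of the kernel). `in_omega` tests membership in the convex body."""
    d = len(x)
    # uniform point in the d-ball: random direction times radius ~ r^{d-1}
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = delta * rng.random() ** (1.0 / d)
    y = tuple(xi + r * gi / norm for xi, gi in zip(x, g))
    return y if in_omega(y) else x
```

The rejection of proposals outside Ω is exactly what produces the atom Q_δ(x, {x}) in (18).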
Fig. 2. Schematic view of ball walk step.
Clearly we may restrict to δ ≤ D, the diameter of Ω. The following observation is important and explains why we restrict ourselves to convex bodies.

Lemma 4. If Ω ⊂ R^d is a convex body, then the ball walk Q_δ has a (non-trivial) lower bound l > 0 for the local conductance.

Proof. It is well known that convex bodies satisfy the cone condition (see [9, Section 3.2, Lemma 3]). Therefore we obtain that for each δ > 0 there is l > 0 such that for each x ∈ Ω we have l_{Q_δ}(x) ≥ l. □

Remark 6. Observe, however, that l might be very small. For Ω = [0, 1]^d, for example, we get l = 2^{−d}, even if δ is very small. In contrast, we will see that a large l is possible for Ω = B^d and δ ≤ 1/√(d + 1), see Lemma 7.

Notice that l_{Q_δ}(x) = vol(B(x, δ) ∩ Ω)/vol(δB^d), hence in the following we use the inequality

    vol(B(x, δ) ∩ Ω) ≥ l vol(δB^d),    (20)

where l > 0 is a lower bound for the local conductance of the ball walk. The following result is folklore, but for lack of a reference we sketch a proof.

Proposition 1. The ball walk Q_δ is reversible with respect to the uniform distribution π and uniformly ergodic.

The crucial tool for proving this is provided by the notion of small and petite sets, where we refer to [19, Sections 5.2 and 5.5] for details and properties. To this end we introduce a sampled chain, say (Q_δ)_a, where a is some probability a = (a_0, a_1, …) on {0, 1, 2, …} and (Q_δ)_a is defined by (Q_δ)_a(x, C) := Σ_{j=0}^∞ a_j Q_δ^j(x, C). We recall that a (measurable) subset C ⊂ Ω is petite (for Q_δ) if there are a probability a, a constant β > 0 and a probability measure ν on Ω such that

    (Q_δ)_a(y, A) ≥ β ν(A),   A ⊂ Ω, y ∈ C.    (21)

A set C ⊂ Ω is small if the same property holds true for some Dirac probability a := δ_{n_0}, such that obviously small sets are petite. We first show that certain balls are small.

Lemma 5. The sets B(x, δ/2) ∩ Ω, x ∈ Ω, are small for Q_δ.
Proof. First, we note that y ∈ B(x, δ/2) implies B(x, δ/2) ⊂ B(y, δ). Let l > 0 be a lower bound for the local conductance of Q_{δ/2}. Using (20) for Q_{δ/2}, we obtain for any set A ⊂ Ω that

    Q_δ(y, A) ≥ Q_δ(y, A \ {y}) = vol(B(y, δ) ∩ A)/vol(δB^d)
             ≥ vol(B(x, δ/2) ∩ A)/vol(δB^d) = 2^{−d} vol(B(x, δ/2) ∩ A)/vol((δ/2)B^d)
             ≥ l · 2^{−d} · vol(A ∩ B(x, δ/2) ∩ Ω)/vol(B(x, δ/2) ∩ Ω).

Hence estimate (21) holds true with n_0 := 1, β := l · 2^{−d} and

    ν(A) := vol(A ∩ B(x, δ/2) ∩ Ω)/vol(B(x, δ/2) ∩ Ω),   A ⊂ Ω.

This completes the proof. □
Proof of Proposition 1. We first prove reversibility with respect to π. Notice that it is enough to verify (15) for disjoint sets A, B ⊂ Ω. Furthermore we observe that for any pair A, B ⊂ Ω of measurable subsets the characteristic function of the set

    { (x, y) ∈ Ω × Ω : x ∈ A, y ∈ B, ‖x − y‖ ≤ δ }

can equivalently be rewritten as

    χ_B(y) χ_{B(y,δ)∩A}(x)   or   χ_A(x) χ_{B(x,δ)∩B}(y).

Hence, letting temporarily c := vol(Ω) vol(δB^d), we obtain

    ∫_A Q_δ(x, B) π(dx) = (1/c) ∫_A vol(B(x, δ) ∩ B) dx
        = (1/c) ∫∫ χ_A(x) χ_{B(x,δ)∩B}(y) dy dx
        = (1/c) ∫∫ χ_B(y) χ_{B(y,δ)∩A}(x) dx dy = ∫_B Q_δ(y, A) π(dy),

proving reversibility. By Lemma 5 each set B(x, δ/2) ∩ Ω is small, thus also petite. Petiteness is inherited by taking finite unions. Since Ω, being compact, can be covered by finitely many sets B(x, δ/2) ∩ Ω, this implies that Ω is petite. By [19, Theorem 16.2.2] this yields uniform ergodicity of the ball walk (see [19, Theorem 16.0.2(v)]). □

We mention the following conductance bound for the ball walk, which is a slight improvement of [29, Theorem 5.2]. This will be a special case of Theorem 4 below, and we omit the proof.

Proposition 2. Let (Q_δ, π) be the ball walk from above, and let φ(Q_δ, π) be its conductance. Let D be the diameter of Ω and let l be a lower bound for the local conductance. Then

    φ(Q_δ, π) ≥ l² δ / (8√2 D √(d + 1)).    (22)

The local conductance may be arbitrarily small if the domain has sharp corners. For specific sets we can explicitly provide lower bounds for the local conductance, and this will be used in the later convergence analysis. In the following we mainly discuss the case Ω = B^d.
We start with a technical result, related to the Gamma function on R_+. We use the well-known formula

    vol(B^d) = π^{d/2}/Γ(d/2 + 1).    (23)

Lemma 6. For any z > 0 we have

    Γ(z + 1/2)/Γ(z) ≤ √z.    (24)

Consequently,

    vol(B^{d−1})/vol(B^d) ≤ √((d + 1)/(2π)).    (25)

Proof. By [8, Chapter VII, Eq. (11)] we know that the function z ↦ log Γ(z) is convex for z > 0. Thus we conclude

    log Γ(z + 1/2) ≤ (1/2)(log Γ(z + 1) + log Γ(z)) = (1/2)(log z + 2 log Γ(z)) = log(√z Γ(z)),

from which the proof of assertion (24) can be completed. Using the representation for the volume from (23) and applying the above bound with z := (d + 1)/2 we obtain

    vol(B^{d−1})/vol(B^d) = (1/√π) · Γ(d/2 + 1)/Γ((d + 1)/2) ≤ (1/√π) √((d + 1)/2),

and the proof is complete. □
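Both (24) and (25) are easy to confirm numerically. A small sanity-check sketch (the function names are ours, not from the text), using the volume formula (23):

```python
import math

def gamma_ratio_ok(z):
    """Numerically check (24): Gamma(z + 1/2)/Gamma(z) <= sqrt(z)."""
    return math.gamma(z + 0.5) / math.gamma(z) <= math.sqrt(z)

def ball_volume_ratio(d):
    """vol(B^{d-1})/vol(B^d), computed from (23):
    vol(B^d) = pi^{d/2} / Gamma(d/2 + 1)."""
    return math.gamma(d / 2 + 1) / (math.sqrt(math.pi) * math.gamma((d + 1) / 2))
```

For d = 1 the ratio is vol(B^0)/vol(B^1) = 1/2, comfortably below √(2/(2π)) ≈ 0.564.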
Using Lemma 6, we can prove the following lower bound for the local conductance of the ball walk on B^d.

Lemma 7. Let (Q_δ, π) be the local ball walk on B^d ⊂ R^d. If δ ≤ 1/√(d + 1), then its local conductance obeys l ≥ 0.3.

Proof. The proof is based on some geometric reasoning. It is clear that the local conductance l(x) is minimal for points x at the boundary of B^d, and in this case its value equals the portion, say Ṽ, of the volume of B(x, δ) inside B^d. If H is the tangent hyperplane at x to B^d, then this cuts off from B(x, δ) exactly one half of its volume. Thus we let Z(h) be the cylinder with base the (d − 1)-ball of radius δ around x in the hyperplane H. Its height h is the distance of H to the hyperplane determined by the intersection of B^d and ∂B(x, δ). This height h is determined, by similarity, from the quotient h/δ = δ/2, hence h := δ²/2. By construction we have Ṽ ≥ 1/2 − vol(Z(h))/vol(B(x, δ)), and we can lower bound the local conductance l(x) by

    l(x) ≥ 1/2 − vol(Z(h))/vol(B(x, δ)).

We can evaluate vol(Z(h)) as vol(Z(h)) = h δ^{d−1} vol(B^{d−1}), and we obtain

    l(x) ≥ 1/2 − (δ/2) · vol(B^{d−1})/vol(B^d) = (1/2)(1 − δ · vol(B^{d−1})/vol(B^d)).
The bound (25) from Lemma 6 implies

    l(x) ≥ (1/2)(1 − δ √((d + 1)/(2π))).

For δ ≤ 1/√(d + 1) we get l(x) ≥ (1/2)(1 − 1/√(2π)) ≥ 0.3, completing the proof. □
We close this subsection with the following technical lemma, which can be extracted from the unpublished seminar note [28]. For the convenience of the reader we present its proof. In addition we slightly improve the statement.

Lemma 8. Let l > 0 be a lower bound for the local conductance of the ball walk (Q_δ, π). For any 0 < t < l and any set A ⊂ Ω with related sets

    A_1 := { x ∈ A : Q_δ(x, A^c) < (l − t)/2 } ⊂ A,    (26)
    A_2 := { y ∈ A^c : Q_δ(y, A) < (l − t)/2 } ⊂ A^c,    (27)

we have d(A_1, A_2) > t δ √2/√(d + 1).

For its proof we need the following:

Lemma 9. Let δ > 0. If x, y ∈ R^d are two points with distance at most t δ √2/√(d + 1), then

    vol(B(x, δ) ∩ B(y, δ)) ≥ (1 − t) vol(δB^d).    (28)

Proof. Let u := ‖x − y‖_2. If u < δ then the volume of the intersection of B(x, δ) and B(y, δ) is exactly the volume of the ball δB^d minus the volume of the middle slice with thickness u. The volume of this slice is bounded from above by the volume of the cylinder with base δB^{d−1} and thickness u. Thus we obtain

    vol(B(x, δ) ∩ B(y, δ)) ≥ vol(δB^d) − u δ^{d−1} vol(B^{d−1}) = vol(δB^d) ( 1 − (u/δ) · vol(B^{d−1})/vol(B^d) ).

Applying Lemma 6 we obtain

    vol(B^{d−1})/vol(B^d) ≤ (1/√π) √((d + 1)/2),

thus by the choice of u ≤ t δ √2/√(d + 1) we conclude that

    (u/δ) · vol(B^{d−1})/vol(B^d) ≤ (t √2/√(d + 1)) · (1/√π) √((d + 1)/2) = t/√π ≤ t,

and the proof is complete. □
We turn to the proof of Lemma 8. Let x ∈ A_1 and y ∈ A_2 be in Ω, and suppose that their distance is at most t δ √2/√(d + 1). Simple set-theoretic reasoning shows that

    vol(B(x, δ) ∩ B(y, δ) ∩ Ω) ≥ vol(B(x, δ) ∩ Ω) − vol(B(x, δ) \ B(y, δ))
        = vol(B(x, δ) ∩ Ω) − vol(B(x, δ) \ (B(x, δ) ∩ B(y, δ)))
        = vol(B(x, δ) ∩ Ω) − vol(δB^d) + vol(B(x, δ) ∩ B(y, δ)).
Since l is a lower bound for the local conductance l(x) we have that vol(B(x, δ) ∩ Ω) ≥ l vol(B(x, δ)) = l vol(δB^d). Taking this into account and using (28) we end up with

    vol(B(x, δ) ∩ B(y, δ) ∩ Ω) ≥ l vol(δB^d) − vol(δB^d) + (1 − t) vol(δB^d) = (l − t) vol(δB^d).

In probabilistic terms this rewrites as Q_δ(x, B(x, δ) ∩ B(y, δ) ∩ Ω) ≥ l − t, and similarly Q_δ(y, B(x, δ) ∩ B(y, δ) ∩ Ω) ≥ l − t. Now, if A ⊂ Ω is any measurable subset with complement A^c then for x ∈ A and y ∈ A^c we obtain

    B(x, δ) ∩ B(y, δ) ∩ Ω ⊂ (B(x, δ) ∩ A^c ∩ Ω) ∪ (B(y, δ) ∩ A ∩ Ω),

which in turn yields Q_δ(x, A^c) + Q_δ(y, A) ≥ l − t, but this contradicts the definition of the sets A_1 and A_2. Hence any two points from A_1 and A_2, respectively, must have distance larger than t δ √2/√(d + 1), and the proof is complete. □

Properties of the related Metropolis method. We analyze Metropolis Markov chains which are based on the ball walk, introduced above, for some appropriately chosen δ. As it will turn out, the related Metropolis chains are perturbations of the underlying ball walk, and its properties, as established in Propositions 1 and 2, extend in a natural way. For ρ ∈ R_α(Ω) we define the acceptance probabilities as

    α(x, y) := min { 1, ρ(y)/ρ(x) }.    (29)

The corresponding Metropolis kernel is given by

    K_{δ,ρ}(x, dy) := α(x, y) Q_δ(x, dy) + ( 1 − ∫ α(x, y) Q_δ(x, dy) ) δ_x(dy).    (30)

Note that for x ∉ A we obtain

    K_{δ,ρ}(x, A) = ∫_A α(x, y) Q_δ(x, dy) = (1/vol(δB^d)) ∫_{A∩B(x,δ)} α(x, y) dy.
Below we sketch a single Metropolis Step from the present position x ∈ Ω with kernel K_{δ,ρ}(x, ·) (Fig. 3). The procedure Ball-walk-step was described in Fig. 2. We start with the following observation.

Lemma 10. Let α be the Lipschitz constant in R_α(Ω) and β := exp(−αδ). Uniformly for ρ ∈ R_α(Ω) the following bound for the related Metropolis chain holds true:

    K_{δ,ρ}(x, dy) ≥ β Q_δ(x, dy).    (31)

Proof. Let A ⊂ Ω. If dist(x, A) > δ then there is nothing to prove. Otherwise, for y ∈ A ∩ B(x, δ) we find from (6) and (29) that

    α(x, y) ≥ exp(−α‖x − y‖_2) ≥ e^{−αδ} = β.
Fig. 3. Schematic view of the Metropolis step. Note that the Acceptance step results in an acceptance probability of α(x, y) = min {1, ρ(y)/ρ(x)}.
By definition of the transition kernel K_{δ,ρ} from (30) we can use this to bound

    K_{δ,ρ}(x, A) ≥ min { α(x, y) : y ∈ A ∩ B(x, δ) } Q_δ(x, A) ≥ β Q_δ(x, A).

The proof is complete. □
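The single Metropolis step of Fig. 3, i.e., a ball-walk proposal followed by the acceptance rule (29), can be sketched as follows. This is a minimal sketch; `metropolis_step` and the membership test `in_omega` are assumed names:

```python
import math
import random

def metropolis_step(x, rho, delta, in_omega, rng):
    """One ball-walk step with Metropolis filter, kernel (30) (sketch):
    propose y uniform in B(x, delta); stay at x if y leaves Omega;
    otherwise accept y with probability min(1, rho(y)/rho(x))."""
    d = len(x)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = delta * rng.random() ** (1.0 / d)
    y = tuple(xi + r * gi / norm for xi, gi in zip(x, g))
    if not in_omega(y):
        return x                          # lazy ball-walk step
    if rng.random() <= rho(y) / rho(x):   # acceptance step (29)
        return y
    return x
```

Since only the ratio ρ(y)/ρ(x) enters, the normalization of ρ is irrelevant, which is the point of the Metropolis construction.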
The assertion of Proposition 1 extends to the family of Metropolis chains as follows.

Proposition 3 (cf. Mathé [18, Proposition 1]). Let Q_δ be the ball walk from (19) on Ω. For each ρ ∈ R_α(Ω) and δ ≤ D the corresponding Metropolis chains from (30) are uniformly ergodic and reversible with respect to the related π_ρ.

Proof. Reversibility with respect to π_ρ is clear by the choice of the function α. To prove uniform ergodicity, let β be from Lemma 10 and c from (16). As established in Lemma 10 we have K_{δ,ρ}(x, dy) ≥ β Q_δ(x, dy). It is easy to see, and was established in [18, Proof of Theorem 2], that this extends to all iterates as K_{δ,ρ}^n(x, dy) ≥ β^n Q_δ^n(x, dy). Recall that under the assumptions made the ball walk is uniformly ergodic, and from Proposition 1 we obtain n_0 such that for all x ∈ Ω we have

    K_{δ,ρ}^{n_0}(x, A) ≥ β^{n_0} c ν(A),   A ⊂ Ω,    (32)

proving uniform ergodicity. □
Remark 7. Notice that (32) is obtained with a right-hand side uniform for all ρ ∈ R_α(Ω), a fact which will prove useful later.

Finally we prove lower bounds for the conductance of the Metropolis chains.

Theorem 4. Let (K_{δ,ρ}, π_ρ) be the Metropolis chain based on the local ball walk (Q_δ, π) and let φ(K_{δ,ρ}, π_ρ) be its conductance, where ρ ∈ R_α(Ω). Let l be a lower bound for the local conductance of Q_δ. For ρ ∈ R_α(Ω) we have

    φ(K_{δ,ρ}, π_ρ) ≥ (l e^{−αδ}/8) · min { l δ/(√2 D √(d + 1)), 1 },    (33)

where D is the diameter of Ω.
Remark 8. As mentioned above, Proposition 2 is a special case of Theorem 4 for α = 0.

The proof of Theorem 4 will be based on Lemma 8 for the underlying ball walk, specifying t := l/2. This extends to the Metropolis walk as follows.

Lemma 11. Let α be from (6), let l be the local conductance of the ball walk, and let β := exp(−αδ). For A ⊂ Ω we assign

    T_1 := { x ∈ A : K_{δ,ρ}(x, A^c) < βl/4 } ⊂ A,    (34)
    T_2 := { y ∈ A^c : K_{δ,ρ}(y, A) < βl/4 } ⊂ A^c.    (35)

Then d(T_1, T_2) > l δ/√(2d + 2).

Proof. It is enough to prove T_1 ⊂ A_1 and T_2 ⊂ A_2. If x ∈ T_1 then Lemma 10 implies β Q_δ(x, A^c) ≤ K_{δ,ρ}(x, A^c) < βl/4, hence

    Q_δ(x, A^c) ≤ (1/β) K_{δ,ρ}(x, A^c) < l/4.

The other inclusion is proved similarly. □
We turn to the proof of Theorem 4. Let A ⊂ Ω be the set for which the conductance is attained. We assign sets T_1 and T_2 as in Lemma 11 and distinguish two cases. If π_ρ(T_1) < π_ρ(A)/2 or π_ρ(T_2) < π_ρ(A^c)/2, then the estimate (33) follows easily. For instance, if π_ρ(T_1) < π_ρ(A)/2 then

    ∫_A K_{δ,ρ}(x, A^c) π_ρ(dx) ≥ ∫_{A\T_1} K_{δ,ρ}(x, A^c) π_ρ(dx)
        ≥ (βl/4) π_ρ(A \ T_1) ≥ (βl/8) π_ρ(A) ≥ (βl/8) min { π_ρ(A), π_ρ(A^c) },

thus φ(K_{δ,ρ}, π_ρ) ≥ βl/8 in this case, which proves (33). Otherwise we have π_ρ(T_1) ≥ π_ρ(A)/2 and π_ρ(T_2) ≥ π_ρ(A^c)/2. In this case we apply an isoperimetric inequality, see [29, Theorem 4.2], to the triple (T_1, T_2, T_3) with T_3 := Ω \ (T_1 ∪ T_2) to conclude that

    π_ρ(T_3) ≥ (2 d(T_1, T_2)/D) min { π_ρ(T_1), π_ρ(T_2) },    (36)

hence under the size constraints in this case it holds true that

    π_ρ(T_3) ≥ (d(T_1, T_2)/D) min { π_ρ(A), π_ρ(A^c) }.    (37)

Using the reversibility of the Metropolis chain (K_{δ,ρ}, π_ρ) we have

    ∫_A K_{δ,ρ}(x, A^c) π_ρ(dx) = ∫_{A^c} K_{δ,ρ}(y, A) π_ρ(dy),
which implies

    ∫_A K_{δ,ρ}(x, A^c) π_ρ(dx)
      = (1/2) ( ∫_A K_{δ,ρ}(x, A^c) π_ρ(dx) + ∫_{A^c} K_{δ,ρ}(y, A) π_ρ(dy) )
      ≥ (1/2) ( ∫_{A∩T_3} K_{δ,ρ}(x, A^c) π_ρ(dx) + ∫_{A^c∩T_3} K_{δ,ρ}(y, A) π_ρ(dy) )
      ≥ (1/2) ( (βl/4) π_ρ(A ∩ T_3) + (βl/4) π_ρ(A^c ∩ T_3) ) = (βl/8) π_ρ(T_3).

Since by Lemma 11 we can bound d(T_1, T_2) ≥ l δ/√(2d + 2), we use (37) to complete the proof. □

If we restrict ourselves to Metropolis chains on B^d, then Lemma 7 provides a lower bound for the local conductance which is independent of the dimension d. As a simple consequence of Theorem 4 we then obtain the following:

Corollary 1. Assume that ρ ∈ R_α(B^d) and δ ≤ (d + 1)^{−1/2}. Then we obtain

    φ(K_{δ,ρ}, π_ρ) ≥ (9/1600) · e^{−αδ} · δ/√(2(d + 1)).

To maximize we define δ* := min { 1/√(d + 1), 1/α } and obtain

    φ(K_{δ*,ρ}, π_ρ) ≥ 0.0025 · (1/√(d + 1)) · min { 1/√(d + 1), 1/α }.

Error bounds. For the class F_α(Ω) the above lower conductance bound (33) will yield an error estimate for the problem (2). Let S_n be the estimator based on a sample of the local Metropolis Markov chain with transition K_{δ,ρ}, starting at zero. To estimate its error we combine the estimates of the conductance of K_{δ,ρ} with two results, partially known from the literature. To formulate the results we note the following. The Markov kernel K_{δ,ρ} is reversible with respect to π_ρ and hence induces a self-adjoint operator K_{δ,ρ}: L_2(Ω, π_ρ) → L_2(Ω, π_ρ). The spectrum σ(K_{δ,ρ}) is contained in [−1, 1] with 1 ∈ σ(K_{δ,ρ}), and we are interested in the second largest element

    λ_{δ,ρ} := sup { λ ∈ σ(K_{δ,ρ}) : λ ≠ 1 }

of the spectrum of K_{δ,ρ}. This is motivated by the following extension of a result from [18, Corollary 1] about the worst case error of S_n, uniformly for (f, ρ) ∈ F_α(Ω).

Lemma 12.

    lim_{n→∞} sup_{(f,ρ)∈F_α(Ω)} e(S_n, (f, ρ))² · n = sup_{ρ∈R_α(Ω)} (1 + λ_{δ,ρ})/(1 − λ_{δ,ρ}).
The proof is given in the Appendix. For Markov chains which start according to the invariant distribution the bound is similar, but more explicit, and was given in [26] and [16, Theorem 1.9]. The relation of the second largest eigenvalue λ_{δ,ρ} to the conductance is given in

Lemma 13 (Cheeger's inequality, see [12,15,16]).

    γ_{δ,ρ} := 1 − λ_{δ,ρ} ≥ φ²(K_{δ,ρ}, π_ρ)/2.

We are ready to state our main result for the Metropolis algorithm S_n, based on the Markov chain K_{δ,ρ}, for the class F_α(B^d), i.e., when Ω ⊂ R^d is the Euclidean unit ball.

Theorem 5. Let S_n = (1/n) Σ_{j=1}^n f(X_j) be the estimator based on a sample (X_1, …, X_n) of the local Metropolis Markov chain with transition K_{δ,ρ}, where δ ≤ (d + 1)^{−1/2}. Then

    lim_{n→∞} sup_{(f,ρ)∈F_α(B^d)} e(S_n, (f, ρ))² · n ≤ (8 · 1600² e^{2αδ}/81) · (d + 1) · δ^{−2}.    (38)

Again we may choose δ* = min { (d + 1)^{−1/2}, α^{−1} } and obtain

    lim_{n→∞} sup_{(f,ρ)∈F_α(B^d)} e(S_n^{δ*}, (f, ρ))² · n ≤ 594700 · (d + 1) · max { d + 1, α² }.    (39)

Proof. This follows from Corollary 1, and Lemmas 12 and 13. □
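The estimator of Theorem 5 can be sketched end to end: run the ball walk with Metropolis filter and average f along the trajectory. This is a minimal sketch under stated assumptions (the function name, the membership test `in_omega`, and the start point `x0` are ours; burn-in is omitted):

```python
import math
import random

def metropolis_integrate(f, rho, delta, n, x0, in_omega, seed=0):
    """Sketch of S_n = (1/n) sum_{j=1}^n f(X_j) along the ball walk with
    Metropolis filter for the density rho (cf. Theorem 5)."""
    rng = random.Random(seed)
    d = len(x0)
    x, total = x0, 0.0
    for _ in range(n):
        # ball-walk proposal: uniform point in B(x, delta)
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        r = delta * rng.random() ** (1.0 / d)
        y = tuple(xi + r * gi / norm for xi, gi in zip(x, g))
        # Metropolis filter: stay if y leaves Omega or is rejected
        if in_omega(y) and rng.random() <= rho(y) / rho(x):
            x = y
        total += f(x)
    return total / n
```

In line with Theorem 5 one would choose the ball radius as δ* = min(1/√(d+1), 1/α), adjusting it both to the dimension d and the Lipschitz constant α.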
5. Summary

Let us discuss our findings. The results from Section 3 clearly indicate that the superiority of Metropolis algorithms over simpler (non-adaptive) Monte Carlo methods does not hold in general. Specifically, it does not hold for the large classes F_C(Ω) of inputs without additional structure. On the other hand, for the class F_α(B^d), specific Metropolis algorithms that are based on local underlying walks are superior to all non-adaptive methods. Even more, on B^d the cost of the algorithm S_n^{δ*}, roughly given by the number n of evaluations of ρ and f, increases like a polynomial in d and α. More precisely, the asymptotic constant lim_{n→∞} e(S_n^{δ*}, F_α(B^d))² · n from (39) is bounded by a constant times max { d², dα² }, i.e., the complexity grows polynomially in d and α and, for fixed d, increases (at most) as α². If we only allow non-adaptive methods then this asymptotic constant, again for fixed d, increases at least as α^d, see (14). We believe that this problem is tractable in the sense that the number of function values needed to achieve an error ε can be bounded by

    n(ε, F_α(B^d)) ≤ C ε^{−2} d max(d, α²).    (40)

We did not prove (40), however, since Theorem 5 is only a statement for large n. Notice that according to Theorem 5 the size δ* of the underlying ball walk needs to be adjusted both to the spatial dimension d and the Lipschitz constant α.
The analysis of the Metropolis algorithm is based on properties of the underlying ball walk; in particular we establish uniform ergodicity of the ball walk for convex bodies Ω ⊂ R^d. Also, based on conductance arguments, we provide lower bounds for the spectral gap of the ball walk. As a consequence, in the case α = 0 the estimate (38) provides an error bound for the ball walk (Q_δ, π), which is asymptotically of the form e(S_n, L_2(B^d, π)) ≤ C δ^{−1} (d/n)^{1/2}. The results extend in a similar way to any family Ω_d ⊂ R^d for which the underlying local ball walk Q_δ has (for Ω_d) a non-trivial lower bound for the local conductance that is independent of the dimension. Finally, from the results of Section 3 we can conclude that adaption does not help much for the classes F_C(Ω). Hence we have new results concerning the power of adaption, see [22] for a survey of earlier results; in particular, adaption may help to break the curse of dimensionality for the classes F_α(B^d).

Acknowledgment

We thank two anonymous referees and Daniel Rudolf for their comments.

Appendix A. Proof of Lemma 12

Lemma 12 extends the bound from [18, Theorem 1], which deals with a single uniformly ergodic chain. It was obtained from a contraction property, as stated in [18, Proposition 1]. The goal of the present analysis is to establish this asymptotic result uniformly for all Metropolis chains with density ρ from R_α(Ω), by showing that this contractivity holds true uniformly.

Contractivity of the Markov operator. We assign to each transition kernel K on Ω with corresponding invariant distribution π the bounded linear mapping P, given by

    (Pf)(x) := ∫ f(y) K(x, dy).    (41)

Also we let E denote the mapping which assigns to any integrable function its expectation as a constant function, E(f) := ∫ f(x) π(dx). For each K the mapping P − E is bounded in L_∞(Ω, π), with norm less than or equal to one, and we shall strengthen this uniformly for kernels K_{δ,ρ} with ρ ∈ R_α(Ω).
Within this operator context uniform ergodicity is equivalent to a specific form of quasi-compactness, namely there are 0 < θ < 1 and n_0 ∈ N for which

    ‖P^n − E: L_∞(π) → L_∞(π)‖ ≤ θ   for n ≥ n_0.    (42)
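On a finite state space the quasi-compactness (42) is easy to observe directly: for a transition matrix P, the L_∞ operator norm of P^n − E is max_i Σ_j |(P^n)_{ij} − π_j|, and it decays geometrically. A minimal sketch for the doubly stochastic case (where π is uniform); the function names are ours:

```python
def _mat_mul(A, B):
    """Plain dense matrix product for small row-stochastic matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def ergodicity_gap(P, n):
    """For a doubly stochastic transition matrix P (stationary pi uniform),
    return max_i sum_j |(P^n)_{ij} - pi_j|, a finite-state analogue of the
    operator norm of P^n - E appearing in (42)."""
    k = len(P)
    Pn = P
    for _ in range(n - 1):
        Pn = _mat_mul(Pn, P)
    pi = 1.0 / k
    return max(sum(abs(v - pi) for v in row) for row in Pn)
```

For a chain with second eigenvalue λ the gap shrinks like λ^n, which is the finite-dimensional shadow of (42).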
We first show that reversibility allows to transfer this to the spaces L_1(Ω, π).

Lemma 14. Suppose that the transition kernel K with corresponding mapping P is reversible. Then for all n ∈ N we have

    ‖P^n − E: L_1(Ω, π) → L_1(Ω, π)‖ ≤ ‖P^n − E: L_∞(Ω, π) → L_∞(Ω, π)‖.    (43)

Proof. If K is reversible, then so are all iterates K^n. Thus for arbitrary functions f ∈ L_1(Ω, π) and h ∈ L_∞(Ω, π) we have, using the scalar product on L_2(Ω, π), that

    ⟨(P^n − E)f, h⟩ = ⟨f, (P^n − E)h⟩.
Consequently, for any f ∈ L_1(Ω, π) we have

    ‖(P^n − E)f‖_1 = sup_{‖h‖_∞ ≤ 1} ⟨(P^n − E)f, h⟩ = sup_{‖h‖_∞ ≤ 1} ⟨f, (P^n − E)h⟩
        ≤ ‖f‖_1 sup_{‖h‖_∞ ≤ 1} ‖(P^n − E)h‖_∞,

from which the proof can be completed. □
Proposition 4. For any convex body Ω ⊂ R^d there are an integer n_0 and a constant 0 < θ < 1 such that uniformly for ρ ∈ R_α(Ω) we have

    ‖P_{δ,ρ}^{n_0} − E: L_1(Ω, π_ρ) → L_1(Ω, π_ρ)‖ ≤ θ.    (44)

Proof. This is an immediate consequence of the bound (32). As mentioned in Remark 7, uniform ergodicity was established uniformly for ρ ∈ R_α(Ω). It is well known (see [19, Theorem 16.2.4]) that this implies that there is a θ < 1 such that uniformly for ρ ∈ R_α(Ω) we have

    ‖P_{δ,ρ}^{n} − E: L_∞(π_ρ) → L_∞(π_ρ)‖ ≤ θ   for n ≥ n_0.    (45)

In the light of Lemma 14 this yields (44). □
Finally we sketch the proof of Lemma 12. Using Proposition 4 we can extend the proof of [18, Theorem 1]. In particular, the bounds from Eqs. (13)–(15) in [18] tend to zero uniformly for ρ ∈ R_α(Ω). Moreover, starting at zero, after one step according to the underlying ball walk, the (new) initial distribution is uniformly bounded with respect to the uniform distribution on Ω, hence also with respect to π_ρ, such that we establish the asymptotics in Lemma 12. □

References

[1] C. Andrieu, É. Moulines, On the ergodicity properties of some adaptive MCMC algorithms, Ann. Appl. Probab. 16 (3) (2006) 1462–1505.
[2] D. Applegate, R. Kannan, Sampling and integration of near log-concave functions, in: STOC '91: Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, ACM Press, New York, NY, USA, 1991, pp. 156–163.
[3] Y.F. Atchadé, J.S. Rosenthal, On adaptive Markov chain Monte Carlo algorithms, Bernoulli 11 (5) (2005) 815–828.
[4] N.S. Bahvalov, Approximate computation of multiple integrals, Vestnik Moskov. Univ. Ser. Mat. Meh. Astr. Fiz. Him. 1959 (4) (1959) 3–18.
[5] F. Bassetti, P. Diaconis, Examples comparing importance sampling and the Metropolis algorithm, Illinois J. Math. 50 (2006) 67–91.
[6] I. Beichl, F. Sullivan, The Metropolis algorithm, Comput. Sci. Eng. 2 (1) (2000) 65–69.
[7] I. Beichl, F. Sullivan, Guest editors' introduction: Monte Carlo methods, Comput. Sci. Eng. 8 (2) (2006) 7–8.
[8] N. Bourbaki, Functions of a Real Variable, Elements of Mathematics (Berlin), Springer, Berlin, 2004.
[9] V.I. Burenkov, Sobolev Spaces on Domains, Teubner-Texte zur Mathematik, vol. 137, Teubner Verlag, Stuttgart, 1998.
[10] A. Frieze, R. Kannan, N. Polson, Sampling from log-concave distributions, Ann. Appl. Probab. 4 (3) (1994) 812–837.
[11] E. Hlawka, Ausfüllung und Überdeckung konvexer Körper durch konvexe Körper, Monatsh. Math. Phys. 53 (1949) 81–131.
[12] M. Jerrum, A. Sinclair, Approximating the permanent, SIAM J. Comput. 18 (6) (1989) 1149–1178.
[13] R. Kannan, L. Lovász, M. Simonovits, Isoperimetric problems for convex bodies and a localization lemma, Discrete Comput. Geom. 13 (3–4) (1995) 541–559.
696
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
[14] U. Krengel, Ergodic theorems, de Gruyter Studies in Mathematics, vol. 6, Walter de Gruyter & Co., Berlin, 1985. [15] G.F. Lawler, A.D. Sokal, Bounds on the L2 spectrum for Markov chains and Markov processes: a generalization of Cheeger’s inequality, Trans. Amer. Math. Soc. 309 (2) (1988) 557–580. [16] L. Lovász, M. Simonovits, Random walks in a convex body and an improved volume algorithm, Random Structures Algorithms 4 (4) (1993) 359–412. [17] P. Mathé, The optimal error of Monte Carlo integration, J. Complexity 11 (4) (1995) 394–415. [18] P. Mathé, Numerical integration using Markov chains, Monte Carlo Methods Appl. 5 (4) (1999) 325–343. [19] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability, Springer, London, 1993. [20] E. Novak, Deterministic and stochastic error bounds in numerical analysis, Lecture Notes in Mathematics, vol. 1349, Springer, Berlin, 1988. [21] E. Novak, The real number model in numerical analysis, J. Complexity 11 (1) (1995) 57–73. [22] E. Novak, On the power of adaption, J. Complexity 12 (3) (1996) 199–237. [23] D. Randall, Rapidly mixing Markov chains with applications in computer science and physics, Comput. Sci. Eng. 8 (2) (2006) 30–41. [24] G.O. Roberts, R.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 (1) (1996) 95–110. [25] C.A. Rogers, Packing and covering, Cambridge Tracts in Mathematics and Mathematical Physics, No. 54, Cambridge University Press, New York, 1964. [26] A. Sokal, Monte Carlo methods in statistical mechanics: foundations and new algorithms, in: Functional integration (Cargèse, 1996), Plenum, New York, 1997, pp. 131–192 [27] J.F. Traub, G.W. Wasilkowski, H. Wo´zniakowski, Information-based complexity, Academic Press Inc., Boston, MA, 1988 with contributions by A.G. Werschulz and T. Boult. [28] S. 
Vempala, Lecture 17, Random walks and polynomial time algorithms, http://www-math.mit.edu/ ˜vempala/random/course.html, 2002. [29] S. Vempala, Geometric random walks: a survey, Combinatorial and computational geometry, Math. Sci. Res. Inst. Publ., vol. 52, Cambridge University Press, Cambridge, 2005, pp. 577–616.
Journal of Complexity 23 (2007) 697 – 714 www.elsevier.com/locate/jco
Tensor-product approximation to operators and functions in high dimensions

Wolfgang Hackbusch, Boris N. Khoromskij*

Max-Planck-Institute for Mathematics in the Sciences, Inselstr. 22-26, D-04103 Leipzig, Germany

Received 13 December 2006; accepted 14 March 2007
Available online 6 April 2007

Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

In recent papers, tensor-product structured Nyström and Galerkin-type approximations of certain multi-dimensional integral operators have been introduced and analysed. In the present paper, we focus on the analysis of collocation-type schemes with respect to the tensor-product basis in a high spatial dimension $d$. Approximations up to an accuracy $O(N^{-\alpha/d})$ are proven to have the storage complexity $O(dN^{1/d}\log^q N)$ with $q$ independent of $d$, where $N$ is the discrete problem size. In particular, we apply the theory to a collocation discretisation of the Newton potential with the kernel $\frac{1}{|x-y|}$, $x, y \in \mathbb{R}^d$, $d \ge 3$. Numerical illustrations are given in the case of $d = 3$.

© 2007 Published by Elsevier Inc.

MSC: 65F50; 65F30; 46B28; 47A80
1. Introduction

The construction of efficient representations of multi-variate functions and related operators plays a crucial role in the numerical analysis of higher-dimensional problems arising in a wide range of modern applications. As examples we mention multi-dimensional integral equations as well as elliptic and parabolic boundary value problems posed in $\mathbb{R}^d$, $d \ge 2$. In multi-dimensional applications, standard numerical methods usually fail due to the so-called "curse of dimensionality" (Bellman). This effect can be relaxed or completely avoided by a systematic application of Kronecker-type tensor-product representations of the arising high-order
∗ Corresponding author.
E-mail addresses:
[email protected] (W. Hackbusch),
[email protected] (B.N. Khoromskij). 0885-064X/$ - see front matter © 2007 Published by Elsevier Inc. doi:10.1016/j.jco.2007.03.007
tensors. Algebraic methods for tensor-product approximations to high-order tensors have been extensively discussed in the literature (see [25,4,5,16,21,27] and related references). In recent papers, modern methods of structured tensor-product approximation have been applied successfully to some classes of multi-dimensional integral operators and operator-valued functions (see [1,14,10,2,12,13,17,19,22] and references therein). Approximations via the Nyström and Galerkin methods have been considered in [14,13,19]. Applications to nonlocal operators associated with the density matrix ansatz for solving the Hartree–Fock equation [7,2], the computation of molecular density functions by the Ornstein–Zernike equation [6], as well as collision integrals of the deterministic Boltzmann equation [18] have demonstrated the efficiency of low-rank tensor-product decompositions.

In the present paper, we discuss analytic methods for tensor-product approximations to multi-dimensional integral operators. For the case of collocation schemes we focus on the construction of tensor decompositions which are exponentially convergent in the separation rank. It is worth noting that, on the one hand, collocation schemes can be applied to a much more general class of integral operators than the Nyström methods (including kernels with a diagonal singularity); on the other hand, they are much simpler than the Galerkin methods (requiring only a one-fold integration). Approximations up to the accuracy $O(n^{-\alpha})$ are proven to have the storage complexity $O(dn\log^q n)$ with $q$ independent of $d$, where $N = n^d$ is the discrete problem size (compare with the linear complexity $O(n^d)$). For example, such methods can be applied to the classical Newton, Yukawa and Helmholtz kernels
$$\frac{1}{|x-y|}, \qquad \frac{e^{-|x-y|}}{|x-y|} \qquad \text{and} \qquad \frac{\cos(|x-y|)}{|x-y|}, \qquad x, y \in \mathbb{R}^d.$$
The rest of the paper is organised as follows.
In Section 2, analytic methods for the separable approximation of multi-variate functions and related tensors via collocation schemes are presented and analysed. We describe constructive schemes based on sinc-quadrature and sinc-interpolation methods. In Section 3 we apply the results of Section 2 to integral operators in $\mathbb{R}^d$ in the collocation case. We complete the article with some numerical examples illustrating the efficiency of the low tensor-rank approximation of Newton's potential via optimised sinc-quadratures.

2. Separable approximation of functions and tensors

2.1. Approximation of functions with low separation rank

We start the discussion on the level of functions. In many applications we are interested in approximating a multi-variate function $f = f(x_1, \ldots, x_d)$ (from a certain class $\mathcal{H}$) in the set of separable functions
$$\mathcal{M}_1 = \{u : u(x) = \phi_1(x_1) \cdot \ldots \cdot \phi_d(x_d),\ \phi_\ell \in H\}, \tag{2.1}$$
where $H$ is a real, separable Hilbert space of functions defined on $\mathbb{R}$ (say, $H = L^2(\mathbb{R})$). A better approximation can be obtained by allowing a linear combination of separable products in the approximation set,
$$\mathcal{M}_{\mathbf{r}} = \Big\{u : u(x) = \sum_{\mathbf{k}} b_{\mathbf{k}}\, \phi^{(1)}_{k_1}(x_1) \cdot \ldots \cdot \phi^{(d)}_{k_d}(x_d),\ b_{\mathbf{k}} \in \mathbb{R},\ \phi^{(\ell)}_{k_\ell} \in H\Big\}, \tag{2.2}$$
where the sum is taken over multi-indices $\mathbf{k} = (k_1, \ldots, k_d)$ with $1 \le k_\ell \le r_\ell$, $r_\ell \in \mathbb{N}$, and $\mathbf{r} = (r_1, \ldots, r_d)$. We call the coefficients
$$B = \{b_{\mathbf{k}}\} \in \mathbb{R}^{r_1 \times \cdots \times r_d} \tag{2.3}$$
the core tensor. Without loss of generality we can assume that the components $\phi^{(\ell)}_{k_\ell}$ ($\ell = 1, \ldots, d$) are orthonormal, i.e.,
$$\big(\phi^{(\ell)}_{k_\ell}, \phi^{(\ell)}_{m_\ell}\big) = \delta_{k_\ell, m_\ell}, \qquad k_\ell, m_\ell = 1, \ldots, r_\ell,$$
where $\delta_{k_\ell, m_\ell}$ is Kronecker's delta. Approximations in the set
$$\widehat{\mathcal{M}}_r = \Big\{u : u(x) = \sum_{k=1}^{r} b_k\, \phi^{(1)}_{k}(x_1) \cdot \ldots \cdot \phi^{(d)}_{k}(x_d),\ b_k \in \mathbb{R},\ \phi^{(\ell)}_{k} \in H\Big\} \subset \mathcal{M}_{\mathbf{r}} \tag{2.4}$$
with normalised components $\|\phi^{(\ell)}_k\| = 1$ can be considered. This is the special case of the approximation problem in $\mathcal{M}_{\mathbf{r}}$ with $\mathbf{r} = (r, \ldots, r)$, under the constraint that all off-diagonal elements of the coefficient tensor $B = \{b_{\mathbf{k}}\}$ are zero. Since $\mathcal{M}_{\mathbf{r}}$ is not a linear space, we obtain a difficult nonlinear approximation problem when we want to estimate
$$\sigma(f, S) := \inf_{s \in S} \|f - s\| \tag{2.5}$$
for $f \in \mathcal{H}$, where either $S = \mathcal{M}_{\mathbf{r}}$ or $S = \widehat{\mathcal{M}}_r$.

2.1.1. Approximation in $S = \widehat{\mathcal{M}}_r$

For $S = \widehat{\mathcal{M}}_r$, the approximation problem (2.5) can be considered in the framework of best $r$-term approximation with regard to a redundant dictionary (cf. [24]). A system $\mathcal{D}$ of functions from $H$ is called a dictionary if each $g \in \mathcal{D}$ has norm one and its linear span is dense in $H$. We denote by $\Sigma_r(\mathcal{D})$ the collection of all functions in $H$ which can be written in the form
$$s = \sum_{g \in \Lambda} c_g\, g, \qquad \Lambda \subset \mathcal{D},\ \#\Lambda \le r,$$
with $c_g \in \mathbb{R}$ and $r \in \mathbb{N}$. For $f \in H$, the best $r$-term approximation error is defined by
$$\sigma_r(f, \mathcal{D}) := \inf_{s \in \Sigma_r(\mathcal{D})} \|f - s\|.$$
Let $H$ be a real separable Hilbert space. A simple algorithm that inductively computes an estimate of the best $r$-term approximation is known as the Pure Greedy Algorithm (see [24] and the references given there). Let $g = g(f) \in \mathcal{D}$ be an element of $\mathcal{D}$ maximising $|(f, g)|$. We define
$$G(f) := (f, g)\, g, \qquad R(f) := f - G(f).$$
Now the Pure Greedy Algorithm reads as follows: define $R_0(f) := f$ and $G_0(f) := 0$. Then, for all $1 \le m \le r$, define inductively
$$G_m(f) := G_{m-1}(f) + G(R_{m-1}(f)), \qquad R_m(f) := f - G_m(f) = R(R_{m-1}(f)).$$
The output $G_r(f, \mathcal{D})$ of this algorithm is proven to realise the best $r$-term approximation in the particular case when $\mathcal{D}$ is an orthogonal basis of $H$.
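The Pure Greedy Algorithm above is only a few lines of code. As an illustration (our own sketch, not code from the paper), we take the dictionary to be an orthonormal basis of $\mathbb{R}^8$; in this case the greedy selections are exactly the $r$ largest coefficients, so the output realises the best $r$-term approximation, as stated:

```python
import numpy as np

def pure_greedy(f, D, r):
    """Pure Greedy Algorithm: D has unit-norm columns (here: an orthonormal basis)."""
    G = np.zeros_like(f)          # G_0(f) := 0
    R = f.copy()                  # R_0(f) := f
    for _ in range(r):
        ips = D.T @ R             # inner products (R_{m-1}, g) for every g in D
        j = np.argmax(np.abs(ips))   # g maximising |(R_{m-1}, g)|
        G += ips[j] * D[:, j]     # G_m = G_{m-1} + (R_{m-1}, g) g
        R = f - G                 # R_m = f - G_m
    return G

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # orthonormal dictionary columns
c = np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.4, 0.3, 0.2])
f = Q @ c
G3 = pure_greedy(f, Q, 3)
best3 = Q[:, :3] @ c[:3]    # best 3-term approximation keeps the 3 largest coefficients
assert np.allclose(G3, best3)
```

With a genuinely redundant (non-orthogonal) dictionary the same loop runs unchanged, but, as noted in the text, it is then only an estimate of the best $r$-term approximation.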
For the approximation problem on $\widehat{\mathcal{M}}_r$ we set
$$\mathcal{D} := \{g \in H \cap \mathcal{M}_1 : \|g\| = 1\}, \quad \text{and hence} \quad \Sigma_r(\mathcal{D}) = \widehat{\mathcal{M}}_r.$$
The Pure Greedy Algorithm can be applied to functions characterised by the approximation property
$$\sigma_r(f, \mathcal{D}) \le r^{-q}, \qquad r = 1, 2, \ldots,$$
with some $q \in (0, \tfrac{1}{2}]$, and leads to the error bound (cf. [24])
$$\|f - G_r(f, \mathcal{D})\| \le C(q, \mathcal{D})\, r^{-q}, \qquad r = 1, 2, \ldots,$$
which is "too pessimistic" in our applications. More precisely, we are interested in an efficient $r$-term approximation on a class of analytic functions with point singularities. In this case, under certain assumptions, we are able to prove exponential convergence
$$\sigma_r(f, \mathcal{D}) \le C \exp(-r^q), \qquad r = 1, 2, \ldots,$$
with $q = 1$ or $\tfrac{1}{2}$. Since, in general, the Pure Greedy Algorithm fails to recover exponential convergence, we will discuss more specialised numerical methods to estimate $\sigma_r(f, \mathcal{D})$ for this class of analytic functions. Specifically, we consider quadrature- and interpolation-based approaches.

2.1.2. Approximation in $S = \mathcal{M}_{\mathbf{r}}$

Notice that the coefficients $b_{\mathbf{k}}$ and the "single-component" functions $\phi^{(\ell)}_{k_\ell}$ in (2.2) are not uniquely defined (up to orthogonal transforms). However, this does not pose any problems from the computational point of view, since the minimisation problem (2.5) is equivalent to a dual maximisation problem over the component tuples, which does not include $b_{\mathbf{k}}$.

Assume that there exists a minimiser of problem (2.5). Then, for given orthonormal components $\phi^{(\ell)} = (\phi^{(\ell)}_1, \ldots, \phi^{(\ell)}_{r_\ell})$ ($\ell = 1, \ldots, d$), the coefficient tensor $b_{\mathbf{k}}$ minimising (2.5) is represented by
$$b_{\mathbf{k}} = \big(f,\ \phi^{(1)}_{k_1}(\cdot) \cdot \ldots \cdot \phi^{(d)}_{k_d}(\cdot)\big), \qquad \mathbf{k} = (k_1, \ldots, k_d). \tag{2.6}$$
For given $f \in \mathcal{H}$, the minimisation problem (2.5) with $S = \mathcal{M}_{\mathbf{r}}$ is equivalent to the maximisation problem
$$\rho(f; \mathcal{M}_{\mathbf{r}}) := \sup \sum_{\mathbf{k}} \big(f,\ \phi^{(1)}_{k_1}(x_1) \cdot \ldots \cdot \phi^{(d)}_{k_d}(x_d)\big)^2,$$
where each $\phi^{(\ell)}$, $\ell = 1, \ldots, d$, is taken from the set of $r_\ell$-tuples $\phi^{(\ell)} = (\phi^{(\ell)}_1, \ldots, \phi^{(\ell)}_{r_\ell})$ with orthonormal components. In fact, let $f_{(\mathbf{r})} = \sum_{\mathbf{k}} b_{\mathbf{k}}\, \phi^{(1)}_{k_1}(x_1) \cdot \ldots \cdot \phi^{(d)}_{k_d}(x_d)$ be the solution of problem (2.5). Then we obtain the identity
$$\|f_{(\mathbf{r})}\| = \|B\|_F,$$
since orthonormal components do not affect the $L^2$-norm. Now, with fixed components $\phi^{(\ell)}$ ($\ell = 1, \ldots, d$), relation (2.5) is actually a linear least-squares problem with respect to $b_{\mathbf{k}}$,
$$(f, f) - 2\Big(f,\ \sum_{\mathbf{k}} b_{\mathbf{k}}\, \phi^{(1)}_{k_1}(x_1) \cdot \ldots \cdot \phi^{(d)}_{k_d}(x_d)\Big) + (B, B) \to \min.$$
Solving the corresponding Lagrange equation
$$-\Big(f,\ \sum_{\mathbf{k}} \widetilde{b}_{\mathbf{k}}\, \phi^{(1)}_{k_1}(x_1) \cdot \ldots \cdot \phi^{(d)}_{k_d}(x_d)\Big) + (B, \widetilde{B}) = 0 \qquad \text{for all } \widetilde{B} = \{\widetilde{b}_{\mathbf{k}}\} \in \mathbb{R}^{r_1 \times \cdots \times r_d}$$
implies (2.6). Now we obtain $\|f - f_{(\mathbf{r})}\|^2 = \|f\|^2 - \|B\|_F^2$, and substitution of (2.6) proves the assertion.

2.2. Tucker and canonical tensor decompositions

Higher-order tensors (multi-dimensional arrays) appear in numerical computations as the discrete analogue of multi-variate functions. We consider $d$th-order tensors $A = [a_{i_1, \ldots, i_d}]_{(i_1, \ldots, i_d) \in \mathcal{I}} \in \mathbb{R}^{\mathcal{I}}$ defined on the product index set $\mathcal{I} = I_1 \times \cdots \times I_d$. This is a generalisation of vectors (tensors of order 1) and matrices (tensors of order 2). We use the Frobenius norm $\|A\| := \sqrt{\langle A, A \rangle}$ induced by the inner product
$$\langle A, B \rangle := \sum_{(i_1, \ldots, i_d) \in \mathcal{I}} a_{i_1, \ldots, i_d}\, b_{i_1, \ldots, i_d}, \qquad A, B \in \mathbb{R}^{\mathcal{I}}, \tag{2.7}$$
which corresponds to the Euclidean norm of a vector. Below we discuss tensor-product approximations which can be viewed as an analogue of low-rank approximations of matrices, where a large system matrix is replaced by a low-rank matrix (compare the classical approximation of integral operators using degenerate kernels). The class of rank-1 tensors is a discrete analogue of the class of separable functions $\mathcal{M}_1$. In the following, we use the notation $\otimes$ to represent the canonical (rank-1) tensor
$$U \equiv \{u_{\mathbf{i}}\}_{\mathbf{i} \in \mathcal{I}} = b \cdot U^{(1)} \otimes \cdots \otimes U^{(d)} \in \mathbb{R}^{\mathcal{I}},$$
defined by $u_{i_1, \ldots, i_d} = b \cdot u^{(1)}_{i_1} \cdots u^{(d)}_{i_d}$ with $U^{(\ell)} \equiv \{u^{(\ell)}_{i_\ell}\}_{i_\ell \in I_\ell} \in \mathbb{R}^{I_\ell}$ and with a multi-index $\mathbf{i} := (i_1, \ldots, i_d) \in \mathcal{I}$.
defined by ui1 ,...,id = b · ui1 · · · uid with U () ≡ {ui }i ∈I ∈ RI and with a multi-index i := (i1 , . . . , id ) ∈ I. The discrete analogue of the approximation in Mr given by (2.2) is called the Tucker representation which deals with the approximation A(r) =
r1 k1 =1
···
rd kd =1
(1)
(d)
bk1 ,...,kd · Vk1 ⊗ · · · ⊗ Vkd ≈ A, ()
(2.8)
where the Kronecker factors Vk ∈ RI (k = 1, . . . , r , = 1, . . . , d) are real vectors of the respective size n = |I |. Without loss of generality, we assume that for all the vectors () {Vk : k = 1, . . . , r } are orthonormal. In the following, we denote by Tr the set of tensors represented by (2.8). Conventionally, we use the short notations r = (r1 , . . . , rd ) (Tucker rank)
and $B = \{b_{\mathbf{k}}\} \in \mathbb{R}^{r_1 \times \cdots \times r_d}$ (core tensor). Notice that the representation of elements $A \in \mathcal{T}_{\mathbf{r}}$, even with orthogonal $V^{(\ell)}$, is not unique due to the rotational uncertainty in the core tensor $B$. The canonical representation is defined by
$$A_{(r)} = \sum_{k=1}^{r} b_k \cdot V^{(1)}_{k} \otimes \cdots \otimes V^{(d)}_{k}, \qquad b_k \in \mathbb{R}, \tag{2.9}$$
where the Kronecker factors $V^{(\ell)}_{k} \in \mathbb{R}^{I_\ell}$ are normalised vectors (in the chemometrics literature this is often called the CANDECOMP/PARAFAC, or shortly CP, model). The minimal number $r$ in the representation (2.9) is called the Kronecker rank of $A_{(r)}$. We denote by $\mathcal{C}_r$ the set of tensors represented by (2.9).

If we let $r_\ell = r$, $n_\ell = n$ ($\ell = 1, \ldots, d$), then both the CP and Tucker representations require only $drn$ numbers to represent the canonical components, plus $r$ (resp. $r^d$) memory units for the core tensor $B$. The main computational problem is the approximation of a given higher-order tensor $A_0$ in a certain set of structured low-rank tensors $S$. In particular, $S$ may be one of the classes $\mathcal{T}_{\mathbf{r}}$ or $\mathcal{C}_r$. There are algebraic, analytically-based and combined strategies for computing a Kronecker tensor-product decomposition of a higher-order tensor. In this paper we apply analytically-based representation methods, which are efficient for a special class of function-related operators/tensors (see the definitions and examples in §3).

In the context of integral operators, we consider the representation problem for a class of real-valued square matrices related to discrete multi-dimensional operators posed in $\mathbb{R}^d$, such that $A \in \mathbb{R}^{N \times N}$, $N = n^d$. More precisely, let $A \in \mathbb{R}^{\mathcal{I} \times \mathcal{I}}$ with $\#\mathcal{I} = N$ be a real-valued matrix defined on the index set $\mathcal{I} := I_n \times \cdots \times I_n$ ($d$ factors) with $I_n = \{1, \ldots, n\}$. A matrix $A$ (resp. a vector $X$) can also be regarded as a $d$th-order tensor $A \in \mathbb{R}^{I_1^2 \times \cdots \times I_d^2}$ (resp. $X \in \mathbb{R}^{I_1 \times \cdots \times I_d}$). Hence one needs numerically tractable data-sparse representations of the arising high-dimensional tensors.

We recall that the Kronecker product $A \otimes B$ of matrices is defined as the block matrix $[a_{ij}B]$, where $A = [a_{ij}]$. The operation "$\otimes$" can be applied to arbitrary rectangular matrices (in particular, to row or column vectors) and in the multi-factor version as in (2.11). The general rank-$(r_1, \ldots, r_d)$ Tucker-type matrix decomposition uses the tensor-product matrix format
$$A = \sum_{k_1=1}^{r_1} \cdots \sum_{k_d=1}^{r_d} b_{k_1, \ldots, k_d}\, V^{(1)}_{k_1} \otimes \cdots \otimes V^{(d)}_{k_d} \in \mathbb{R}^{I_1^2 \times \cdots \times I_d^2}, \qquad b_{k_1, \ldots, k_d} \in \mathbb{R}, \tag{2.10}$$
where the Kronecker factors $V^{(\ell)}_{k_\ell} \in \mathbb{R}^{I_\ell \times I_\ell}$, $k_\ell = 1, \ldots, r_\ell$, $\ell = 1, \ldots, d$, may be matrices of a certain structure (say, hierarchical matrices, wavelet-based formats, Toeplitz/circulant, low-rank, etc.). Here $\mathbf{r} = (r_1, \ldots, r_d)$ is again called the Kronecker rank. The matrix representation in the format (2.10) is a generalisation of the low-rank approximation of matrices, corresponding to the case $d = 2$. Note that (2.10) is identical to (2.8) except that now the $V^{(\ell)}_{k_\ell}$ are matrices and not vectors. The canonical Kronecker tensor-product format as proposed in [14,12] reads
$$A = \sum_{k=1}^{r} b_k\, V^{(1)}_{k} \otimes \cdots \otimes V^{(d)}_{k}, \qquad b_k \in \mathbb{R}, \tag{2.11}$$
where the Kronecker factors $V^{(\ell)}_{k} \in \mathbb{R}^{n \times n}$ may be matrices of a certain structure (say, hierarchical matrices). Again, (2.11) is identical to (2.9), but with the vectors $V^{(\ell)}_{k}$ replaced by matrices. Approximations of function-related matrices by matrices of the form (2.11) were studied, e.g., in [14,26]. The main results of these papers are estimates of the form $r = O(|\log \varepsilon|^2)$ and $r = O(|\log \varepsilon| \log n)$, where $\varepsilon$ is the prescribed approximation accuracy. If there is no structure in the Kronecker factors, then the storage is $O(drn^2)$, while the matrix-times-matrix complexity is $O(dr^2 n^3)$. Introducing the hierarchical ($\mathcal{H}$-matrix) approximation to the Kronecker factors (HKT approximations) leads to estimates of the form $O(dr^2 n \log^q n)$ (under certain assumptions on the origin of the matrices [14]).

2.3. Collocation-type approximation of function-related tensors

Here we discuss the low Kronecker rank approximation of a special class of higher-order tensors related to certain "discretisations" of multi-variate functions, which will be called function-generated tensors (FGTs). They arise directly from: (a) a separable approximation of multi-variate functions; (b) Nyström/collocation/Galerkin discretisations of integral operators; (c) the tensor-product approximation of some analytic matrix-valued functions. In the following we define FGTs corresponding to a collocation-type discretisation.

2.3.1. General error estimate

Let $\omega^p_\ell$ ($\ell = 1, \ldots, d$) be a uniform tensor-product grid of intervals on a rectangle $\Pi := [a_0, b_0]^p$, $a_0, b_0 > 0$, indexed by $\mathcal{I}_\ell = I_{\ell,1} \times \cdots \times I_{\ell,p}$, with $\mathcal{I}_\ell$ being the product index set such that for $\mathbf{i}_\ell = (i_{\ell,1}, \ldots, i_{\ell,p}) \in \mathcal{I}_\ell$ we have $i_{\ell,m} \in I_n := \{1, \ldots, n\}$ ($m = 1, \ldots, p$). Furthermore, let $\omega^p_d := \omega^p_1 \times \cdots \times \omega^p_d$ be the corresponding tensor-product lattice in the hypercube $\Omega_d := \Pi^d \subset \mathbb{R}^{\widetilde d}$ with $\widetilde d = dp$. We denote by $\{x^{(1)}_{\mathbf{i}_1}, \ldots, x^{(d)}_{\mathbf{i}_d}\}$ with $\mathbf{i}_\ell \in \mathcal{I}_\ell$ ($\ell = 1, \ldots, d$) a set of collocation points living on the tensor-product lattice $\omega_d := \omega_1 \times \cdots \times \omega_d$.
In our applications we have $d \ge 2$ with some fixed $p \in \{1, 2, 3\}$. In particular, matrix decompositions correspond to the choice $p = 2$. In this case we introduce the reordered index set of pairs $\mathcal{M}_\ell := \{\mathbf{m}_\ell : \mathbf{m}_\ell = (i_\ell, j_\ell),\ i_\ell, j_\ell \in I_n\}$ ($\ell = 1, \ldots, d$), so that $\mathcal{I} = \mathcal{M}_1 \times \cdots \times \mathcal{M}_d$ with $\mathcal{M}_\ell = I_n \times I_n$. The Nyström and Galerkin approximations to function-related tensors were discussed in [12,19]. In the following we focus on collocation-type schemes, which are based on tensor-product ansatz functions
$$\chi_{\mathbf{i}}(y_1, \ldots, y_d) = \prod_{\ell=1}^{d} \chi_{i_\ell}(y_\ell), \qquad \mathbf{i} = (i_1, \ldots, i_d) \in \mathcal{I}_1 \times \cdots \times \mathcal{I}_d. \tag{2.12}$$
In the following definition, $g$ is a given function defined on $\Omega \times \Omega$.

Definition 2.1 (Collocation, FGT(C)). Given the tensor-product basis set (2.12), we introduce the variable $\zeta^{(\ell)}_{i_\ell} := (x^{(\ell)}_{i_\ell}, y_\ell)$ with the collocation point $x^{(\ell)}_{i_\ell}$ and $y_\ell \in \Pi$, and the pair $\mathbf{m}_\ell := (i_\ell, j_\ell) \in \mathcal{M}_\ell$,
and define the collocation-type $d$th-order FGT by $A \equiv A(g) := [a_{\mathbf{m}_1, \ldots, \mathbf{m}_d}] \in \mathbb{R}^{\mathcal{M}_1 \times \cdots \times \mathcal{M}_d}$ with
$$a_{\mathbf{m}_1, \ldots, \mathbf{m}_d} := \int_{\Omega} g\big(\zeta^{(1)}_{i_1}, \ldots, \zeta^{(d)}_{i_d}\big)\, \chi_{\mathbf{j}}(y_1, \ldots, y_d)\, dy, \qquad \mathbf{m}_\ell \in \mathcal{M}_\ell. \tag{2.13}$$
In numerical calculations involving integral operators (e.g., arising in classical potential theory or from the Hartree–Fock, Ornstein–Zernike and Boltzmann equations), $n$ may vary from several hundred to several thousand; therefore, for $d \ge 3$, a naive "entry-wise" representation of the fully populated tensor $A$ in (2.13) requires substantial computer resources, at least of the order $O(n^{dp})$.

The key observation is that there is a natural duality between the separable approximation of the multi-variate generating function and the tensor-product decomposition of the related multi-dimensional array. Hence, CP-type decompositions like (2.9) (or (2.11) in the matrix case) can be derived by using a corresponding separable expansion of the generating function $g$ (see [12,14] for more details).

Lemma 2.2. Suppose that a multi-variate function $g : \Omega \to \mathbb{R}$ can be approximated by a separable expansion
$$g_r(\zeta) := \sum_{k=1}^{r} \mu_k\, \phi^{(1)}_{k}(\zeta^{(1)}) \cdots \phi^{(d)}_{k}(\zeta^{(d)}) \approx g(\zeta), \qquad \zeta = (\zeta^{(1)}, \ldots, \zeta^{(d)}) \in \Omega, \tag{2.14}$$
where $\mu_k \in \mathbb{R}$ and $\phi^{(\ell)}_{k} : \Pi \subset \mathbb{R}^2 \to \mathbb{R}$. Define the CP decomposition (2.9) via $A_{(r)} := A(g_r)$ (cf. Definition 2.1) with the choice
$$V^{(\ell)}_{k} = \left[\int_{\Pi} \phi^{(\ell)}_{k}\big(\zeta^{(\ell)}_{i_\ell}\big)\, \chi_{j_\ell}(y_\ell)\, dy_\ell\right]_{(i_\ell, j_\ell) \in \mathcal{M}_\ell} \in \mathbb{R}^{\mathcal{I}_\ell \times \mathcal{J}_\ell}, \qquad \ell = 1, \ldots, d,\ k = 1, \ldots, r, \tag{2.15}$$
and with $\zeta^{(\ell)}_{i_\ell} = (x^{(\ell)}_{i_\ell}, y_\ell)$, $i_\ell \in \mathcal{I}_\ell$. Then the FGT(C) $A_{(r)}$ provides the error estimate
$$\|A(g) - A_{(r)}(g_r)\|_\infty \le C\, \|g - g_r\|_{L^\infty(\Omega)}.$$

Proof. Using (2.13) we readily obtain
$$\big|a_{\mathbf{m}_1, \ldots, \mathbf{m}_d} - a^{(r)}_{\mathbf{m}_1, \ldots, \mathbf{m}_d}\big| \le \max_{x \in \omega_d} \int_{\Omega} \big|g(x, y) - g_r(x, y)\big|\, |\chi_{\mathbf{j}}(y)|\, dy \le \|g - g_r\|_{L^\infty(\Omega)} \int_{\operatorname{supp}\chi_{\mathbf{j}}} |\chi_{\mathbf{j}}(y)|\, dy,$$
and the result follows with $C = \max_{\mathbf{j}} \int_{\operatorname{supp}\chi_{\mathbf{j}}} |\chi_{\mathbf{j}}(y)|\, dy$. □
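A hedged aside on why a low Kronecker rank pays off computationally in the matrix case (2.11) (our own $d = 2$ sketch, with arbitrary random factors): the Kronecker-format matrix never has to be assembled, since $(B \otimes C)\,\mathrm{vec}(X) = \mathrm{vec}(B X C^{\mathsf T})$ for the row-major vectorisation:

```python
import numpy as np

n, r = 16, 3
rng = np.random.default_rng(2)
b = rng.standard_normal(r)
V1 = [rng.standard_normal((n, n)) for _ in range(r)]   # factors V_k^(1)
V2 = [rng.standard_normal((n, n)) for _ in range(r)]   # factors V_k^(2)
x = rng.standard_normal(n * n)

# explicit assembly: A = sum_k b_k (V_k^(1) (x) V_k^(2)), an n^2-by-n^2 matrix
A = sum(b[k] * np.kron(V1[k], V2[k]) for k in range(r))

# factor-wise product: (B (x) C) vec(X) = vec(B X C^T) with X = reshape(x, (n, n))
X = x.reshape(n, n)
y = sum(b[k] * (V1[k] @ X @ V2[k].T) for k in range(r)).reshape(-1)

assert np.allclose(A @ x, y)   # O(r n^3) work instead of O(n^4) for the dense matvec
```

The same identity, applied mode by mode, extends to the $d$-factor format; with structured (e.g. hierarchical or Toeplitz) factors the per-mode cost drops further, as discussed in §2.2.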
Though in general a decomposition (2.14) with small separation rank $r$ is a complicated numerical task, efficient approximation methods are available in many interesting applications. In particular, for a class of multi-variate functions (say, for certain shift-invariant Green's kernels in $\mathbb{R}^d$) it is possible to obtain a dimensionally independent Kronecker rank $r = O(\log n \cdot |\log \varepsilon|)$, e.g., based on sinc-quadrature methods or on an approximation by exponential sums (see the case-study examples in [12,3,18]). The next lemma shows that the error of the Tucker decomposition in the collocation case is directly related to the error of the separable approximation of the generating function.
Lemma 2.3. Let $g : \Omega \to \mathbb{R}$ be approximated by a separable expansion
$$g_{\mathbf{r}}(\zeta) := \sum_{k_1=1}^{r_1} \cdots \sum_{k_d=1}^{r_d} b_{k_1, \ldots, k_d}\, \phi^{(1)}_{k_1}(\zeta^{(1)}) \cdots \phi^{(d)}_{k_d}(\zeta^{(d)}) \approx g, \qquad \zeta^{(\ell)} \in \mathbb{R}^2,\ 1 \le \ell \le d, \tag{2.16}$$
where $b_{k_1, \ldots, k_d} \in \mathbb{R}$. Then the FGT(C) corresponding to the choice
$$V^{(\ell)}_{k_\ell} = \left[\int_{\Pi} \phi^{(\ell)}_{k_\ell}\big(\zeta^{(\ell)}_{i_\ell}\big)\, \chi_{j_\ell}(y_\ell)\, dy_\ell\right]_{(i_\ell, j_\ell) \in \mathcal{M}_\ell} \in \mathbb{R}^{\mathcal{I}_\ell \times \mathcal{J}_\ell}, \qquad \ell = 1, \ldots, d,\ k_\ell = 1, \ldots, r_\ell, \tag{2.17}$$
with $\zeta^{(\ell)}_{i_\ell} = (x^{(\ell)}_{i_\ell}, y_\ell)$, provides the error estimate
$$\|A(g) - A_{(\mathbf{r})}(g_{\mathbf{r}})\|_\infty \le C\, \|g - g_{\mathbf{r}}\|_{L^\infty(\Omega)}.$$

Proof. In the FGT(C) case, by the construction of $A_{(\mathbf{r})}$, we have
$$\|A - A_{(\mathbf{r})}\|_\infty \le \max_{x \in \omega_d} \int_{\Omega} \Big|g(x, y) - \sum_{k_1=1}^{r_1} \cdots \sum_{k_d=1}^{r_d} b_{k_1, \ldots, k_d}\, \phi^{(1)}_{k_1}(\zeta^{(1)}) \cdots \phi^{(d)}_{k_d}(\zeta^{(d)})\Big|\, |\chi_{\mathbf{j}}(y)|\, dy \le \|g - g_{\mathbf{r}}\|_{L^\infty(\Omega)} \max_{\mathbf{j}} \int_{\operatorname{supp}\chi_{\mathbf{j}}} |\chi_{\mathbf{j}}(y)|\, dy,$$
which proves the assertion. □
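Lemma 2.3 concerns the Tucker format. As a small numerical sketch of the representation (2.8) and of the storage count $drn + r^d$ quoted in §2.2 (illustrative only, with random data rather than an actual FGT), a rank-$(4,4,4)$ Tucker tensor in dimension $d = 3$ can be assembled with a single einsum:

```python
import numpy as np

d, n, r = 3, 20, 4
rng = np.random.default_rng(1)
B = rng.standard_normal((r, r, r))                       # core tensor b_k
# Kronecker factors V^(l) with orthonormal columns, as assumed for (2.8)
V = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(d)]

# A[i1,i2,i3] = sum_{k1,k2,k3} B[k1,k2,k3] V1[i1,k1] V2[i2,k2] V3[i3,k3]
A = np.einsum('abc,ia,jb,kc->ijk', B, V[0], V[1], V[2])

tucker_storage = d * r * n + r**d    # drn factor entries plus r^d core entries
assert A.shape == (n, n, n) and tucker_storage < n**d

# orthonormal factors preserve the Frobenius norm: ||A|| = ||B||
assert np.isclose(np.linalg.norm(A), np.linalg.norm(B))
```

Here $304$ stored numbers replace the $8000$ entries of the full tensor; the gap widens rapidly with $n$ and $d$.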
Next we discuss the constructive CP and Tucker decompositions of FGTs applied to a general class of analytic generating functions characterised in terms of their Laplace transform. The construction is based on sinc-approximation methods.

2.3.2. Error bounds for the canonical decomposition of FGTs

We use constructive approximations based on the sinc-quadrature and sinc-interpolation methods. For the reader's convenience we recall the standard approximation results for the sinc-methods (cf. [23,9]). First, we introduce the Hardy space $H^1(D_\delta)$ as the set of all complex-valued functions $f$ which are analytic in the strip $D_\delta := \{z \in \mathbb{C} : |\Im z| < \delta\}$, such that
$$N(f, D_\delta) := \int_{\partial D_\delta} |f(z)|\, |dz| = \int_{\mathbb{R}} \big(|f(x + i\delta)| + |f(x - i\delta)|\big)\, dx < \infty. \tag{2.18}$$
Given $f \in H^1(D_\delta)$, $h > 0$, and $M \in \mathbb{N}_0$, the corresponding sinc-quadrature reads
$$T_M(f, h) := h \sum_{k=-M}^{M} f(kh) \approx \int_{\mathbb{R}} f(\xi)\, d\xi. \tag{2.19}$$
Proposition 2.4. Let $f \in H^1(D_\delta)$, $h > 0$, and $M \in \mathbb{N}_0$ be given. If
$$|f(\xi)| \le C \exp(-b|\xi|) \qquad \text{for all } \xi \in \mathbb{R} \text{ with } b, C > 0, \tag{2.20}$$
then the quadrature error satisfies
$$\Big|\int_{\mathbb{R}} f(\xi)\, d\xi - T_M(f, h)\Big| \le C\, e^{-\sqrt{2\pi\delta b M}} \qquad \text{with } h = \sqrt{2\pi\delta/(bM)}$$
and with a positive constant $C$ depending only on $f$, $\delta$, $b$ (cf. [23]). If $f$ possesses the hyper-exponential decay
$$|f(\xi)| \le C \exp\big(-b\, e^{a|\xi|}\big) \qquad \text{for all } \xi \in \mathbb{R} \text{ with } a, b, C > 0, \tag{2.21}$$
then the choice $h = \log(2\pi\delta a M / b)/(aM)$ leads to (cf. [9])
$$\Big|\int_{\mathbb{R}} f(\xi)\, d\xi - T_M(f, h)\Big| \le C\, N(f, D_\delta)\, e^{-2\pi\delta a M / \log(2\pi\delta a M / b)}.$$

Note that $2M + 1$ is the number of quadrature/interpolation points. If $f$ is an even function, the number of quadrature/interpolation points reduces to $M + 1$.

We consider a class of multi-variate functions $g : \mathbb{R}^{\widetilde d} \to \mathbb{R}$ parametrised by $g(\zeta) = G(\rho(\zeta)) \equiv G(\rho)$ with $\rho \equiv \rho(\zeta) = \rho_1(\zeta^{(1)}) + \cdots + \rho_d(\zeta^{(d)}) > 0$, $\rho_\ell : \mathbb{R}^2 \to \mathbb{R}_+$, where the univariate function $G : \mathbb{R}_+ \to \mathbb{R}$ can be represented via the Laplace transform
$$G(\rho) = \int_{\mathbb{R}_+} \widehat{G}(\tau)\, e^{-\rho\tau}\, d\tau.$$
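Proposition 2.4 is easy to check numerically. In the following sketch (our own example, not from the paper), $f(\xi) = \operatorname{sech}\xi$ is analytic in the strip $|\Im z| < \pi/2$ and satisfies (2.20) with $b = 1$, so the quadrature (2.19) with the step size from the exponential-decay case converges rapidly to $\int_{\mathbb{R}} \operatorname{sech}\xi\, d\xi = \pi$:

```python
import math

def T_M(f, h, M):
    # sinc-quadrature T_M(f, h) = h * sum_{k=-M}^{M} f(kh), Eq. (2.19)
    return h * sum(f(k * h) for k in range(-M, M + 1))

f = lambda x: 1.0 / math.cosh(x)    # analytic in |Im z| < pi/2, decay rate b = 1
delta, b, M = 1.5, 1.0, 30          # strip width just below pi/2
h = math.sqrt(2 * math.pi * delta / (b * M))   # step size from Proposition 2.4

approx = T_M(f, h, M)
assert abs(approx - math.pi) < 1e-5             # exact integral is pi
```

With only $2M + 1 = 61$ function evaluations the error is already far below $10^{-5}$, illustrating the root-exponential rate $e^{-\sqrt{2\pi\delta b M}}$.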
The FGT(C) approximation corresponds to $p = 2$, $\zeta^{(\ell)} = (x_\ell, y_\ell)$ (cf. Definition 2.1). Without loss of generality, we introduce one and the same scaling function
$$\chi_i(\cdot) = \chi(\cdot + (i-1)h), \qquad i \in I_n, \tag{2.22}$$
for all spatial dimensions $\ell = 1, \ldots, d$, where $h > 0$ is the mesh parameter. We simplify further and set $\rho \equiv \rho(\zeta) = \sum_{\ell=1}^{d} \rho_0(\zeta^{(\ell)})$, i.e.,
$$\rho_\ell = \rho_0(x_\ell, y_\ell) \quad (\ell = 1, \ldots, d) \qquad \text{with } \rho_0 : [a, b]^2 \to \mathbb{R}_+. \tag{2.23}$$
For $i \in I_n$, let $\{\bar{x}_i\}$ be the set of cell-centred collocation points on $[a, b]$. For each $i, j \in I_n$, we introduce the parameter-dependent integral
$$\Lambda_{i,j}(\tau) := \int_{\mathbb{R}} e^{-\tau \rho_0(\bar{x}_i, y)}\, \chi(y + (j-1)h)\, dy, \qquad \tau \ge 0. \tag{2.24}$$
Theorem 2.5 (FGT(C) approximation). Assume (a)–(c) below:
(a) $G(\rho)$ has an analytic extension $G(w)$, $w \in \Omega_G$, into a certain domain $\Omega_G \subset \mathbb{C}$ which can be mapped conformally onto the strip $D_\delta$, such that $w = \psi(z)$, $z \in D_\delta$, and $\psi^{-1} : \Omega_G \to D_\delta$;
(b) for all $(\mathbf{i}, \mathbf{j}) \in \mathcal{I} \times \mathcal{J}$ the transformed integrand
$$f(z) := \psi'(z)\, \widehat{G}(\psi(z)) \prod_{\ell=1}^{d} \Lambda_{i_\ell, j_\ell}(\psi(z)) \tag{2.25}$$
belongs to the Hardy space $H^1(D_\delta)$ with $N(f, D_\delta) < \infty$ uniformly in $(\mathbf{i}, \mathbf{j})$;
(c) the function $f(t)$, $t \in \mathbb{R}$, in (2.25) has either exponential (c1) or hyper-exponential (c2) decay as $t \to \pm\infty$.

Under the assumptions (a)–(c), we have that, for each $M \in \mathbb{N}_+$, the FGT(C) $A(g)$ defined on $[a, b]^d$ allows an exponentially convergent super-symmetric¹ CP decomposition $A_{(r)} \in \mathcal{C}_r$ with $V^{(\ell)}_k$ as in (2.15), where the expansion (2.14) is obtained by substituting $f$ from (2.25) into the sinc-quadrature (2.19), such that
$$\|A(g) - A_{(r)}\|_\infty \le C\, e^{-\sigma M^{\nu}} \qquad \text{with } r = 2M + 1, \tag{2.26}$$
where $\nu = \tfrac{1}{2}$, $\sigma = \sqrt{2\pi\delta b}$ in case (c1), and $\nu = 1$, $\sigma = \frac{2\pi\delta a}{\log(2\pi\delta a M/b)}$ in case (c2).

Proof. First, we notice that by definition
$$a_{\mathbf{i}\mathbf{j}} = \int_{\mathbb{R}_+} \widehat{G}(\tau) \prod_{\ell=1}^{d} \Lambda_{i_\ell, j_\ell}(\tau)\, d\tau = \int_{\mathbb{R}} f(t)\, dt \qquad \text{for } (\mathbf{i}, \mathbf{j}) \in \mathcal{I} \times \mathcal{J}. \tag{2.27}$$
We now apply the sinc-quadrature to the transformed integrand $f$ to obtain
$$T_M(f, h) := h \sum_{k=-M}^{M} f(kh) \approx \int_{\mathbb{R}} f(t)\, dt, \qquad (\mathbf{i}, \mathbf{j}) \in \mathcal{I} \times \mathcal{J},$$
with
$$\Big|\int_{\mathbb{R}} f(t)\, dt - T_M(f, h)\Big| \le C\, e^{-\sigma M^{\nu}},$$
and with the respective $\sigma$, $\nu$ (see Proposition 2.4). Combining this estimate with (2.27) and taking into account the separability property of the exponential proves the assertion for all $(\mathbf{i}, \mathbf{j}) \in \mathcal{I} \times \mathcal{J}$. Noticing that our quadrature does not depend on the index $(\mathbf{i}, \mathbf{j})$ completes the proof. □

Theorem 2.5 proves the existence of a CP decomposition of the FGT $A(g)$ with Kronecker rank $r = O(|\log \varepsilon| \log(1/h))$ (in case (c2)) or $r = O(|\log \varepsilon|^2)$ (in case (c1)), which provides an approximation of order $O(\varepsilon)$. In our applications we usually have $1/h = O(n)$, where $n$ is the number of grid points in one spatial direction. Theorem 2.5 typically applies to translation-invariant or spherically symmetric functions (see the examples in §3).

2.3.3. Error bounds for the Tucker decomposition of FGTs

For applications with more general than translation-invariant functions, the analytic separation methods are based on tensor-product interpolation. This leads to a rank-$(r_1, \ldots, r_d)$ Tucker decomposition with small rank parameters $r_\ell$. Again we recall the related results on the sinc-interpolation method. Let
$$S(k, h)(x) = \frac{\sin[\pi(x - kh)/h]}{\pi(x - kh)/h} \equiv \operatorname{sinc}\Big(\frac{x}{h} - k\Big) \qquad (k \in \mathbb{Z},\ h > 0,\ x \in \mathbb{R})$$
be the $k$th sinc-function with step size $h$, evaluated at $x$, where the sinc-function is given by
$$\operatorname{sinc}(z) = \frac{\sin(\pi z)}{\pi z}, \qquad z \in \mathbb{C}.$$
1 A dth order tensor is called super-symmetric if it is invariant under arbitrary permutations of indices in {1, . . . , d}.
The classical sinc-interpolant (cardinal series representation) is given by
$$C_M(f, h) = \sum_{\ell=-M}^{M} S(\ell, h)\, f(\ell h) \approx f. \tag{2.28}$$
If (2.20) holds, then the interpolation error satisfies (cf. [23])
$$\|f - C_M(f, h)\|_\infty \le C\, M^{1/2}\, e^{-\sqrt{\pi\delta b M}} \qquad \text{with } h = \sqrt{\pi\delta/(bM)}, \tag{2.29}$$
where $\delta$ specifies the width of the strip $D_\delta$ in (2.18). Assuming the hyper-exponential decay of $f$ as in (2.21), we obtain (cf. [9])
$$\|f - C_M(f, h)\|_\infty \le C\, N(f, D_\delta)\, e^{-\pi\delta a M / \log(\pi\delta a M / b)} \qquad \text{with } h = \frac{\log(\pi\delta a M / b)}{aM}. \tag{2.30}$$
The sinc-interpolation method can be extended to the multi-dimensional case. For each $\ell = 1, \ldots, d$, let $g_\ell(\cdot)$ be the univariate parameter-dependent function in the variable $\zeta^{(\ell)} \in \Pi = [a_0, b_0]$ obtained by restricting a multi-variate function $g(\zeta^{(1)}, \ldots, \zeta^{(d)})$ to $\Pi$ with the remaining variables $\zeta^{(1)}, \ldots, \zeta^{(\ell-1)}, \zeta^{(\ell+1)}, \ldots, \zeta^{(d)}$ fixed. Suppose that $g_\ell(\cdot)$ satisfies all the regularity and decay conditions above, uniformly in $\ell = 1, \ldots, d$. It is shown in [12] that the tensor-product sinc-interpolation $\mathbf{C}_M g := C^{(1)}_M \cdots C^{(d)}_M g$ with respect to the $d$ variables provides the exponential error estimate
$$|g(\zeta) - \mathbf{C}_M(g, h)(\zeta)| \le C\, d\, \lambda_M \max_{\ell=1,\ldots,d} N(g_\ell(\cdot), D_\delta)\, e^{-\pi\delta M/(2\log M)},$$
with the stability (Lebesgue) constant $\lambda_M = O(\log M)$, and where $C^{(\ell)}_M g = C^{(\ell)}_M(g, h)$ denotes the univariate sinc-interpolation from (2.28) applied to the variable $\zeta^{(\ell)}$. For a class of analytic functions with point singularities, the expansion (2.16) can be derived via tensor-product sinc-interpolation applied with respect to the variables $\zeta^{(1)}, \ldots, \zeta^{(d)}$.

Theorem 2.6. Assume that all conditions of Theorem 2.5 are satisfied. Then the FGT(C) $A(g)$ allows an exponentially convergent rank-$(r, \ldots, r)$ Tucker decomposition $A_{(\mathbf{r})} \in \mathcal{T}_{\mathbf{r}}$ with $V^{(\ell)}_{k_\ell}$ as
in (2.17), where $\phi^{(\ell)}_{k}(\zeta^{(\ell)}) = \operatorname{sinc}\big(\rho_0(\zeta^{(\ell)})/h - k\big)$ with $\rho_0$ from (2.23) ($\ell = 1, \ldots, d$), and where the $b_{\mathbf{k}}$ are explicitly represented via the sinc-interpolation (2.28), such that
$$\|A(g) - A_{(\mathbf{r})}\|_\infty \le C\, (1 + \log M)^d\, e^{-\sigma M^{\nu}} \qquad \text{with } r = 2M + 1, \tag{2.31}$$
where $\nu = \tfrac{1}{2}$, $\sigma = \sqrt{2\pi\delta b}$ in case (c1), and $\nu = 1$, $\sigma = \frac{2\pi\delta a}{\log(2\pi\delta a M/b)}$ in case (c2), as in Theorem 2.5.

Proof. Modifying the proof of Theorem 2.5, we now apply the sinc-interpolation. In particular, the error bounds (2.29) and (2.30) show exponential convergence in $M$ for the tensor-product sinc-interpolant $\mathbf{C}_M g$, which proves the assertion. □
The error estimate (2.31) yields $\max_\ell r_\ell = O(\delta^{-1}|\log \varepsilon|)$. In some cases we get the estimate $\delta^{-1} = O(\log(1/h))$ (cf. [12]).

3. Tensor approximation of integral operators

3.1. Canonical and Tucker decompositions in $\mathbb{R}^d$

The principal ingredient in the structured tensor-product representation of integral operators in many spatial dimensions is a separable approximation of the multi-variate function representing the kernel of the operator. Consider the integral operator $\mathcal{G} : L^2(\Omega) \to L^2(\Omega)$ in $\Omega := [0, 1]^d \subset \mathbb{R}^d$, $d \ge 2$,
$$(\mathcal{G}u)(x) := \int_{\Omega} g(x, y)\, u(y)\, dy, \qquad x, y \in \Omega,$$
with some shift-invariant kernel function $g(x, y) = g(|x - y|)$, which can be represented in the form
$$g(x, y) = g(\zeta_1, \ldots, \zeta_d) \equiv \widehat{g}\Big(\sqrt{\zeta_1^2 + \cdots + \zeta_d^2}\Big),$$
where $\zeta_\ell = |x_\ell - y_\ell| \in [0, 1]$, $\ell = 1, \ldots, d$. To approximate the operator $\mathcal{G}$, we consider a collocation scheme with tensor-product test functions $\chi_{\mathbf{i}}(x_1, \ldots, x_d)$ as in (2.12). If the kernel function $g$ allows a global separable approximation (cf. Lemma 2.2), we approximate the collocation stiffness matrix
$$A = \big\{(\mathcal{G}\chi_{\mathbf{j}})|_{\bar{x}_{\mathbf{i}}}\big\}_{\mathbf{i}, \mathbf{j} \in I_n^d} \in \mathbb{R}^{N \times N}, \qquad N = n^d,\ \bar{x}_{\mathbf{i}} \in \omega_d,$$
by a matrix A(r) of the form (2.11), where the Vk are n × n matrices given by n Vk =
1
0
j
k (|x¯i − y |) (y ) dy
,
= 1, . . . , d,
(3.1)
i,j =1
providing the corresponding error estimate in the $l^\infty$ matrix norm. For standard singular kernels (say, Green's kernels) a direct separable approximation is usually not possible. In this case one can apply Theorem 2.9. In both cases we are able to prove the existence of a low Kronecker rank CP approximation for the class of multi-dimensional integral operators under consideration. Note that $A - A^{(\mathbf r)}$ can easily be estimated in, say, the Frobenius matrix norm. When using the tensor-product sinc-interpolation, the function $\Phi_k(|u - v|)$ can be proved to be asymptotically smooth. For the class of kernel functions approximated by exponential sums, the factor $\Phi_k(|u - v|)$ even appears to be globally smooth (indeed, it is an entire function). Hence, the canonical components $V_k^{(\ell)}$ can be further approximated in the H-matrix format (cf. [13]). In the case of uniform grids, the Toeplitz-type structure can also be used to represent the $n \times n$ matrices $V_k^{(\ell)}$. For the class of translation-invariant kernels (see [12] and the examples below), we obtain the dimensionally independent bound
$$r = O\bigl(\log h^{-1} \cdot \log \varepsilon^{-1} \cdot \log \log \varepsilon^{-1}\bigr).$$
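The low multilinear rank promised by such bounds can be observed directly on a discretized kernel. The following sketch is our own illustration, not the authors' quadrature-based construction: the grid, the Newton-type sample $1/\sqrt{\rho_1^2+\rho_2^2+\rho_3^2}$ at cell centres, and the truncated-HOSVD Tucker approximation are all assumptions made for the demonstration. The relative error decays rapidly in the rank r:

```python
import numpy as np

def tucker_hosvd(T, r):
    """Rank-(r, r, r) Tucker approximation of a 3rd-order tensor via truncated HOSVD."""
    U = []
    for mode in range(3):
        unf = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)  # mode-unfolding
        u, _, _ = np.linalg.svd(unf, full_matrices=False)
        U.append(u[:, :r])                                        # leading r factors
    core = np.einsum('abc,ai,bj,ck->ijk', T, U[0], U[1], U[2])    # Tucker core
    return np.einsum('ijk,ai,bj,ck->abc', core, U[0], U[1], U[2])

n = 20
x = (np.arange(n) + 0.5) / n                  # cell centres keep rho away from 0
R1, R2, R3 = np.meshgrid(x, x, x, indexing='ij')
T = 1.0 / np.sqrt(R1**2 + R2**2 + R3**2)      # Newton-type kernel sample

for r in (2, 4, 8):
    err = np.linalg.norm(T - tucker_hosvd(T, r)) / np.linalg.norm(T)
    print(r, err)
```

HOSVD truncation is within a factor $\sqrt d$ of the best Tucker approximation, so it is a convenient way to probe the effective rank numerically.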
Following Definition 2.1, we introduce the dth-order FGT(C) representing the integral operator $\mathcal G$, $A \equiv A(g) := [a_{m_1, \ldots, m_d}] \in \mathbb R^{M_1 \times \cdots \times M_d}$. Assume that the kernel function $g(x, y) \equiv g(\zeta^{(1)}, \ldots, \zeta^{(d)})$ allows a separable approximation (2.16) via the sinc-interpolation, so that the approximation converges exponentially in $r = \max_\ell r_\ell$ (see Theorem 2.6). Then the associated rank-$(r_1, \ldots, r_d)$ Tucker decomposition (2.10) in $\mathcal T_{\mathbf r}$ (cf. (2.10)) is specified by the Kronecker factors $V_k^{(\ell)} \in \mathbb R^M$, explicitly defined by (2.17). Let $\mathbf r = (r, \ldots, r)$. Theorem 2.6 now yields the error estimate
$$\|A(g) - A^{(\mathbf r)}\|_\infty \le C e^{-\alpha M^\beta} \quad \text{with } r = 2M + 1,$$  (3.2)
and with the constants $\alpha$, $\beta$ from (2.31). As already mentioned, (3.2) yields $\max_\ell r_\ell = O(|\log \varepsilon|\, \delta^{-1})$ with $\delta$ from (2.18). In turn, for a class of shift-invariant kernels we get the estimate $\delta^{-1} = O(\log n)$. In general, given a tolerance $\varepsilon > 0$, we have the bound
$$r = O\bigl(\log^{d-1}(n)\, \log \varepsilon^{-1}\, \log \log \varepsilon^{-1}\bigr).$$
The numerical complexity of the Tucker decomposition is estimated by $d r n^2 + r^d$. The storage cost for the corresponding Tucker approximation combined with hierarchical matrices has the complexity $d r n \log^q n + r^d$. Notice that the Tucker approximation can be applied to more general kernel functions than the canonical representation (as already mentioned, the latter is usually restricted to the class of translation-invariant kernels).

3.2. Application to the Newton potential

Let $x, y \in \mathbb R^d$, $d \ge 2$, and define $\rho = |x - y|^2 = \rho_1^2 + \cdots + \rho_d^2$ with $\rho_\ell = x_\ell - y_\ell$, $(x, y) \in \mathbb R^{2d}$. The family of functions
$$g(x, y) \equiv g(\rho) := 1/\rho^\lambda \quad \text{with } \rho \in \mathbb R_{>0}$$
arises in potential theory, in quantum chemistry and in computational gas dynamics (cf. [18]). The choice $\lambda = \tfrac12$ corresponds to the classical Newton potential, while $\lambda = -\tfrac12$ refers to the Euclidean distance function. Low separation rank decompositions of the multi-variate functions $1/\rho$, $1/\sqrt\rho$ and of the related Galerkin approximations were discussed in [12–14,19], while the kernel function $\rho^\lambda$, $\lambda \in \mathbb R$, was considered in [18]. Let us take a closer look at the collocation-type FGT corresponding to the Newton potential $1/\sqrt\rho$ in the hypercube $[-R, R]^d \subset \mathbb R^d$. As a basic example, we consider piecewise constant finite elements on the uniform grid with step-size $h > 0$, defined by the scaling functions $\phi(x)$ associated with a tensor-product grid. Again, we let $\{\bar x_i\}$ be the set of cell-centred collocation points. In our case, for the function in (2.24) we have $\phi_0(x, y) = (x - y)^2$ ($x, y \in \mathbb R$); hence, making use of the Gaussian transform
$$\frac{1}{\sqrt\rho} = \frac{2}{\sqrt\pi} \int_{\mathbb R_+} e^{-\rho \tau^2}\, d\tau,$$
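The Gaussian transform above is elementary to verify numerically. The sketch below is our own check (the truncation point of the $\tau$-integral and the trapezoidal discretization are ad hoc choices): it compares a discretization of the right-hand side with $1/\sqrt\rho$ for a few sample values:

```python
import numpy as np

def inv_sqrt_via_gauss(rho, T=20.0, m=200001):
    """Approximate 1/sqrt(rho) = (2/sqrt(pi)) * int_0^inf exp(-rho*tau^2) dtau."""
    tau = np.linspace(0.0, T, m)             # truncate the half-line at tau = T
    vals = np.exp(-rho * tau**2)
    h = T / (m - 1)
    trap = h * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])
    return 2.0 / np.sqrt(np.pi) * trap

for rho in (0.5, 1.0, 2.0):
    print(rho, inv_sqrt_via_gauss(rho), 1.0 / np.sqrt(rho))
```

Replacing the $\tau$-integral by an exponentially convergent quadrature is exactly what produces the separable (exponential-sum) approximation of $1/\sqrt\rho$ used below.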
we obtain
$$\Lambda_{i,j}(\tau) = \Lambda_{|i-j|}(\tau) := \int_{\mathbb R} e^{-\tau^2 (\bar x_i - y)^2}\, \phi_j(y)\, dy, \qquad \tau \ge 0,\ i, j \in I_n$$
(see (2.24), (2.22) for the definition of $\Lambda_{i,j}$, $\phi_j$).

Lemma 3.1. The FGT(G) for the Newton potential $1/\sqrt\rho$ allows a CP approximation in the hypercube $[-R, R]^d \subset \mathbb R^d$ with an exponential convergence rate (independent of d) as in (2.31), where $\beta = \tfrac12$.

Proof. We apply Theorem 2.5. To check condition (a), let us choose the analyticity domain as a sector $G := \{w \in \mathbb C : |\arg(w)| < \delta\}$ with apex angle $0 < 2\delta < \pi/2$, and then apply the conformal map $\varphi^{-1} : G \to D_\delta$ with
$$w = \varphi(z) = e^z, \qquad \varphi^{-1}(w) = \log(w)$$
(cf. Theorem 2.5(a)). To check condition (b) of Theorem 2.5, we first notice that the transformed integrand
$$f(z) := \exp(z) \prod_{\ell=1}^{d} \Lambda_{i_\ell j_\ell}(\varphi(z))$$
belongs to the Hardy space $H^1(D_\delta)$. In fact, introducing the error function erf by
$$\operatorname{erf}(t) := \frac{2}{\sqrt\pi} \int_0^t e^{-\tau^2}\, d\tau,$$  (3.3)
we calculate the explicit representation
$$\Lambda_{i_\ell, j_\ell}(\tau) = \Lambda_i(\tau) = \frac{\sqrt{\pi}}{2\tau} \bigl[ \operatorname{erf}(\tau i h) - \operatorname{erf}(\tau (i - 1) h) \bigr],$$  (3.4)
with $x_{i_\ell} = (i_\ell - 1)h$, $n_\ell = n$, $h = b/n$ (uniform grid spacing) for $i = i_\ell - j_\ell + 1 = 1, \ldots, n$, $\ell = 1, \ldots, d$. Since $\operatorname{erf}(z)/z$ is an entire function, this proves the required analyticity of f. Now we estimate the constant $N(f, D_\delta)$ by applying arguments similar to those in [19] (cf. Lemma 4.7). Finally, we check condition (c1). Using the properties of the erf-function as $t \to \pm\infty$, we obtain the required asymptotic behaviour of $f(t)$, $t \to \pm\infty$, for $d \ge 2$. This completes the proof. □

Lemma 3.1 proves the exponential convergence of the canonical decomposition with $\beta = \tfrac12$. However, it is also possible to apply an improved quadrature with hyper-exponential decay of the integrand, which leads to true exponential convergence with $\beta = 1$. Using the variable transformation $t = \sinh(u)$ and taking advantage of the symmetry of the integrand, we obtain the quadrature formula
$$I = \int_{\mathbb R} f(t)\, dt = \int_{\mathbb R_+} 2 \cosh(u)\, f(\sinh(u))\, du \approx \sum_{k=0}^{M} w_k^{(M)} f(t_k^{(M)}) =: I_M,$$  (3.5)
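The erf identity behind (3.3)–(3.4) reduces to the elementary fact that a Gaussian integrated over a cell has the closed form $\int_c^d e^{-\tau^2(\bar x - y)^2}\,dy = \frac{\sqrt\pi}{2\tau}\bigl[\operatorname{erf}(\tau(d-\bar x)) - \operatorname{erf}(\tau(c-\bar x))\bigr]$. The sketch below is our own cross-check of that identity against brute-force quadrature; the cell $[c, d]$, the collocation point $\bar x$ and the value of $\tau$ are arbitrary illustrative choices:

```python
import math

def cell_integral_erf(xbar, tau, c, d):
    """Closed form of int_c^d exp(-tau^2 (xbar - y)^2) dy via the error function."""
    s = math.sqrt(math.pi) / (2.0 * tau)
    return s * (math.erf(tau * (d - xbar)) - math.erf(tau * (c - xbar)))

def cell_integral_quad(xbar, tau, c, d, m=20000):
    """Midpoint-rule reference value for the same integral."""
    h = (d - c) / m
    return h * sum(math.exp(-tau**2 * (xbar - (c + (k + 0.5) * h))**2)
                   for k in range(m))

xbar, tau, c, d = 0.35, 3.0, 0.2, 0.3   # hypothetical cell and quadrature point
print(cell_integral_erf(xbar, tau, c, d), cell_integral_quad(xbar, tau, c, d))
```

Because erf is available in closed form, the entries of the factors $V_k^{(\ell)}$ never need numerical integration in y.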
[Figure: two panels plotting Relative Error ($10^0$ down to $10^{-10}$) against Kronecker rank; left panel: h = 0.01, C0 = 6, R = 0; right panel: h = 0.01, C0 = 2, R = 1.7147.]
Fig. 1. Comparison between the improved and not-improved sinc-quadratures for d = 3, h = 0.01, R = 0 (left) and R = √3 (right).
with
$$t_k^{(M)} := \sinh(k h_M)$$  (3.6)
and
$$w_k^{(M)} := \begin{cases} h_M & \text{for } k = 0, \\ 2 h_M \cosh(k h_M) & \text{for } k > 0, \end{cases}$$  (3.7)
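As a quick illustration of the sinh-substituted quadrature (3.5) with nodes $t_k = \sinh(k h_M)$ and trapezoidal weights, the sketch below is our own toy example, not the paper's computation: we take the even integrand $f(t) = e^{-t^2}$, so that $I = \sqrt\pi$, and use the step $h_M = C_0 \log(M)/M$ with the ad hoc choice $C_0 = 1$:

```python
import math

def sinh_quadrature(f, M, C0=1.0):
    """I_M with t_k = sinh(k*h_M), w_0 = h_M, w_k = 2*h_M*cosh(k*h_M) for k > 0."""
    hM = C0 * math.log(M) / M
    total = hM * f(0.0)                       # k = 0 term
    for k in range(1, M + 1):
        total += 2.0 * hM * math.cosh(k * hM) * f(math.sinh(k * hM))
    return total

f = lambda t: math.exp(-t * t)                # even integrand, exact value sqrt(pi)
for M in (5, 10, 20):
    print(M, abs(sinh_quadrature(f, M) - math.sqrt(math.pi)))
```

After the substitution the integrand decays doubly exponentially, so the truncated trapezoidal rule converges at the "true exponential" rate $\beta = 1$ described above.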
with the choice $h_M = C_0 \log(M)/M$ for some $C_0$ (see Lemma 5.1 in [12]). In the numerical illustrations we consider the case d = 3. Due to the Toeplitz structure of the $n \times n$ matrices $V_k^{(\ell)}$, in the numerical experiments below we control the accuracy of our quadrature-based decompositions only for a fixed index i = 1 and vary the index $j_\ell = 1, \ldots, n$ ($\ell = 1, 2, 3$). Hence, in our notation we distinguish the distance R from the observation point to the origin: for example, R = 0 corresponds to $j_\ell = 1$, while $R = \sqrt 3$ corresponds to $j_\ell = n$ ($\ell = 1, 2, 3$). First we demonstrate the advantage of the improved quadrature (3.5); see Fig. 1. For a fixed number of quadrature terms M, in order to obtain uniform error control over all indices $j = j_\ell = 1, \ldots, n$, we optimise the quadrature with respect to the factor $C_0$ in $h_M = C_0 \ln(M)/M$, such that the quadrature errors are approximately equalised for the two limiting cases R = 0 and $R = \sqrt 3$. The error for all intermediate values of R then lies in the "corridor" between the above-mentioned error bounds. Fig. 2 presents non-optimised (left) and optimised (right) errors for the limiting values of R (top) and other representative data (bottom), for h = 0.01 and $\varepsilon = 10^{-5}$. For our quadrature-based decompositions we observe exponential convergence in the Kronecker rank. Further reduction of the Kronecker rank can be achieved by applying the so-called near-far field decomposition. It is based on the observation that optimising the quadrature for the off-diagonal part of the target matrix (i.e., the part corresponding to $j_\ell \ge 2$) leads to a much smaller Kronecker rank than an approximation of the whole matrix. In this case the low Kronecker rank representation of the complete matrix is obtained by adding a rank-1 term
[Figure: two panels plotting Relative Error against Kronecker rank; left panel: h = 0.01, C0 = 4.5; right panel: h = 0.01, C0 = 3.1, with curves for R = 0, 0.01, 0.02, 0.03, 0.99, 1.7147.]
Fig. 2. Non-optimised (left) and optimised (right) errors for h = 0.01, ε = 10^{-5}.
[Figure: two panels plotting Relative Error against Kronecker rank; panels for h = 0.01 with C0 = 3.1 and C0 = 2.1.]
Fig. 3. Optimal quadratures without (left) and with near-far field decomposition (right) for h = 10^{-2} and ε = 10^{-5}.
representing the diagonal part ($j_\ell = 1$). The numerical results are depicted in Fig. 3 (indicating the rank reduction from 30 to 20).

Acknowledgment
The authors are grateful to C. Bertoglio for his assistance with the numerical experiments.

References
[1] G. Beylkin, M.J. Mohlenkamp, Numerical operator calculus in higher dimensions, Proc. Nat. Acad. Sci. USA 99 (2002) 10246–10251.
[2] G. Beylkin, M.J. Mohlenkamp, Algorithms for numerical analysis in higher dimensions, SIAM J. Sci. Comput. 26 (2005) 2133–2159.
[3] G. Beylkin, L. Monzón, On approximation of functions by exponential sums, Appl. Comput. Harmonic Anal. 19 (2005) 17–48.
[4] L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and rank-(R_1, …, R_N) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl. 21 (2000) 1324–1342.
[5] L. De Lathauwer, B. De Moor, J. Vandewalle, Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition, SIAM J. Matrix Anal. Appl. 26 (2004) 295–327.
[6] M.V. Fedorov, G. Chuev, H.-J. Flad, L. Grasedyck, B.N. Khoromskij, Low-rank wavelet solver for the Ornstein–Zernike integral equation, Computing 80 (2007), to appear.
[7] H.-J. Flad, W. Hackbusch, B.N. Khoromskij, R. Schneider, Concept of data-sparse tensor-product approximation in many-particle models, in preparation.
[9] I.P. Gavrilyuk, W. Hackbusch, B.N. Khoromskij, Data-sparse approximation to a class of operator-valued functions, Math. Comp. 74 (2005) 681–708.
[10] I.P. Gavrilyuk, W. Hackbusch, B.N. Khoromskij, Tensor-product approximation to elliptic and parabolic solution operators in higher dimensions, Computing 74 (2005) 131–157.
[12] W. Hackbusch, B.N. Khoromskij, Low-rank Kronecker product approximation to multi-dimensional nonlocal operators. Part I. Separable approximation of multi-variate functions, Computing 76 (2006) 177–202.
[13] W. Hackbusch, B.N. Khoromskij, Low-rank Kronecker product approximation to multi-dimensional nonlocal operators. Part II. HKT representations of certain operators, Computing 76 (2006) 203–225.
[14] W. Hackbusch, B.N. Khoromskij, E. Tyrtyshnikov, Hierarchical Kronecker tensor-product approximation, J. Numer. Math. 13 (2005) 119–156.
[15] W. Hackbusch, B.N. Khoromskij, E.E. Tyrtyshnikov, Approximate iterations for structured matrices, Preprint 112, Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, 2005.
[16] R. Harshman, Foundations of the PARAFAC procedure: model and conditions for an "explanatory" multi-mode factor analysis, UCLA Working Papers in Phonetics, vol. 16, 1970, pp. 1–84.
[17] B.N. Khoromskij, An Introduction to Structured Tensor-product Representation of Discrete Nonlocal Operators, Lecture Notes, vol. 27, Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, 2005.
[18] B.N. Khoromskij, Structured data-sparse approximation to high order tensors arising from the deterministic Boltzmann equation, Preprint 4, Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, 2005; Math. Comp., to appear.
[19] B.N. Khoromskij, Structured rank-(r_1, …, r_d) decomposition of function-related tensors in R^d, Comput. Meth. Appl. Math. 6 (2) (2006) 194–220.
[21] T. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23 (2001) 243–255.
[22] Ch. Lubich, On variational approximations in quantum molecular dynamics, Math. Comp. 74 (2005) 765–779.
[23] F. Stenger, Numerical Methods Based on Sinc and Analytic Functions, Springer, Berlin, 1993.
[24] V.N. Temlyakov, Greedy algorithms and M-term approximation with regard to redundant dictionaries, J. Approx. Theory 98 (1999) 117–145.
[25] L.R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (1966) 279–311.
[26] E.E. Tyrtyshnikov, Tensor approximations of matrices generated by asymptotically smooth functions, Mat. Sb. 194 (6) (2003) 147–160 (in Russian); translation in Sb. Math. 194 (2003) 941–954.
[27] T. Zhang, G.H. Golub, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl. 23 (2001) 534–550.
Journal of Complexity 23 (2007) 715 – 739 www.elsevier.com/locate/jco
BDDC methods for discontinuous Galerkin discretization of elliptic problems Maksymilian Dryjaa,∗,1 , Juan Galvisb , Marcus Sarkisb, c,2 a Department of Mathematics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland b Instituto Nacional de Matemática Pura e Aplicada, Estrada Dona Castorina 110, CEP 22460-320, Rio de Janeiro,
Brazil
c Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609, USA
Received 27 October 2006; accepted 15 February 2007 Available online 24 March 2007 Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract A discontinuous Galerkin (DG) discretization of the Dirichlet problem for second-order elliptic equations with discontinuous coefficients in 2-D is considered. For this discretization, balancing domain decomposition with constraints (BDDC) algorithms are designed and analyzed as an additive Schwarz method (ASM). The coarse and local problems are defined using special partitions of unity and edge constraints. Under certain assumptions on the coefficients and the mesh sizes across $\partial\Omega_i$, where the $\Omega_i$ are disjoint subregions of the original region $\Omega$, a condition number estimate $C(1 + \max_i \log(H_i/h_i))^2$ is established with C independent of $h_i$, $H_i$ and the jumps of the coefficients. The algorithms are well suited for parallel computations and can be straightforwardly extended to 3-D problems. Results of numerical tests are included which confirm the theoretical results and the necessity of the imposed assumptions. © 2007 Elsevier Inc. All rights reserved. Keywords: Interior penalty discretization; Discontinuous Galerkin method; Elliptic problems with discontinuous coefficients; Finite element method; BDDC algorithms; Schwarz methods; Preconditioners
∗ Corresponding author.
E-mail address:
[email protected] (M. Dryja). 1 This work was supported in part by Polish Sciences Foundation under grant 2P03A00524. 2 This work was supported in part by CNPQ (Brazil) under grant 305539/2003-8.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.02.003
716
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
1. Introduction In this paper, a discontinuous Galerkin approximation of elliptic problems with discontinuous coefficients is considered. The problem is posed on a polygonal region $\Omega$ which is a union of disjoint polygonal subregions $\Omega_i$. The discontinuities of the coefficients occur across $\partial\Omega_i$. The problem is approximated by a conforming finite element method (FEM) on a matching triangulation in each $\Omega_i$ and a nonmatching one across $\partial\Omega_i$. Composite discretizations are motivated first of all by the regularity of the solution of the problem being discussed. Discrete problems are formulated using DG methods, symmetric and with interior penalty terms on the $\partial\Omega_i$; see [4,5,8]. A goal of this paper is to design and analyze balancing domain decomposition with constraints (BDDC) preconditioners for the resulting discrete problem; see [7,17,16] for conforming finite elements. In the first step, the problem is reduced to the Schur complement problem with respect to the unknowns on $\partial\Omega_i$ for $i = 1, \ldots, N$. For that, discrete harmonic functions defined in a special way are used. The preconditioners are designed and analyzed using the general theory of ASMs; see [18]. The local spaces are defined on $\Omega_i$ and the faces of $\partial\Omega_j$ which are common to $\partial\Omega_i$, plus zero-average-value constraints on faces of $\Omega_i$ and/or faces of $\Omega_j$. The coarse basis functions follow from local orthogonality with respect to the local spaces and from average constraints across those faces. A special partition of unity with respect to the substructures $\Omega_i$ is introduced; it is based on master and slave sides of the substructures. A side $F_{ij} = \partial\Omega_i \cap \partial\Omega_j$ is a master when $\rho_i$ is larger than $\rho_j$, otherwise it is a slave; so if $F_{ij} \subset \partial\Omega_i$ is a master side then $F_{ji} \subset \partial\Omega_j$ is a slave side. The $h_i$- and $h_j$-triangulations on $F_{ij}$ and $F_{ji}$, respectively, are built in such a way that $h_i$ is coarser where $\rho_i$ is larger. Here $h_i$ and $h_j$ denote the parameters of these triangulations.
It is proved that the algorithms are almost optimal and that their rate of convergence is independent of $h_i$ and $h_j$, the number of subdomains $\Omega_i$, and the jumps of the coefficients. The algorithms are well suited for parallel computations and they can be straightforwardly extended to 3-D problems. DG methods are becoming more and more popular for the approximation of PDEs since they are well suited to dealing with regions with complex geometries or discontinuous coefficients, and with local or patch refinements; see [5,4] and the literature therein. The class of DG methods we deal with in this paper uses symmetrized interior penalty terms on the boundaries $\partial\Omega_i$. A goal is to design and analyze BDDC algorithms for the resulting discrete problem; see [7] and also [17,16]. There are also several papers devoted to algorithms for solving discrete DG problems. In particular, in connection with domain decomposition methods, we can mention [15,12,14,1–3], where discretizations related to those discussed here are considered. In these papers Neumann–Dirichlet methods and two-level overlapping and nonoverlapping Schwarz methods are proposed and analyzed for DG discretizations of elliptic problems with continuous coefficients. In [8], for the discontinuous coefficient case, a nonoptimal multilevel ASM is designed and analyzed. In [6,13], two-level overlapping and nonoverlapping ASMs are proposed and analyzed for DG discretizations of fourth-order problems. In those works, the coarse problems are based on polynomial coarse basis functions on a coarse triangulation. In addition, ideas of iterative substructuring methods and notions of discrete harmonic extensions are not explored. Condition number estimates of $O(\frac{H}{\delta})$ and $O(\frac{H}{h})$, and of $O(\frac{H^3}{\delta^3})$ and $O(\frac{H^3}{h^3})$, are obtained for second- and fourth-order problems, respectively, where $\delta$ is the overlap parameter. In addition, for the cases where the distribution of the coefficients $\rho_i$ is not quasi-monotone, see [10], these methods, when extended straightforwardly to 3-D problems, have condition number estimates which might deteriorate as the jumps of the coefficients get more severe. To the best of our knowledge, BDDC algorithms for DG discretizations of
elliptic problems with continuous and discontinuous coefficients have not been considered in the literature. We note that part of the analysis presented here has previously appeared in a technical report analyzing several iterative substructuring DG preconditioners of Neumann–Neumann type; see [11]. In [9] we have also successfully extended these preconditioners to the balancing domain decomposition (BDD) method. The paper is organized as follows. In Section 2 the differential problem and its DG discretization are formulated. In Section 3 the Schur complement problem is derived using discrete harmonic functions defined in a special way. Some technical tools are presented in Section 4. Sections 5 and 6 are devoted to designing a BDDC algorithm, while Sections 7 and 8 are devoted to the proof of the main result, Theorem 7.1. In Section 9 we introduce coarse spaces of half the dimension of those defined in Section 6. Finally, in Section 10 some numerical experiments are presented which confirm the theoretical results. The enclosed numerical results show that the introduced assumption on the coefficients and the mesh parameters is necessary and sufficient.

2. Differential and discrete problems

2.1. Differential problem

Consider the following problem: find $u^* \in H_0^1(\Omega)$ such that
$$a(u^*, v) = f(v) \quad \forall v \in H_0^1(\Omega),$$  (1)
where
$$a(u, v) := \sum_{i=1}^N \int_{\Omega_i} \rho_i\, \nabla u \cdot \nabla v\, dx \quad \text{and} \quad f(v) := \int_\Omega f v\, dx.$$
We assume that $\bar\Omega = \bigcup_{i=1}^N \bar\Omega_i$ and that the substructures $\Omega_i$ are disjoint regular polygonal subregions of diameter $O(H_i)$ which form a geometrically conforming partition of $\Omega$, i.e., for all $i \ne j$ the intersection $\partial\Omega_i \cap \partial\Omega_j$ is empty, a common vertex, or a common edge of $\partial\Omega_i$ and $\partial\Omega_j$. We assume that $f \in L^2(\Omega)$ and, for simplicity of presentation, that each $\rho_i$ is a positive constant.

2.2. Discrete problem

Let us introduce a shape-regular triangulation in each $\Omega_i$ with triangular elements and mesh parameter $h_i$. The resulting triangulation on $\Omega$ is in general nonmatching across the $\partial\Omega_i$. Let $X_i(\Omega_i)$ be the regular finite element (FE) space of piecewise linear continuous functions on $\Omega_i$. Note that we do not assume that functions in $X_i(\Omega_i)$ vanish on $\partial\Omega_i \cap \partial\Omega$. Define $X_h(\Omega) := X_1(\Omega_1) \times \cdots \times X_N(\Omega_N)$. The discrete problem obtained by the DG method, see [5,8], is of the form: Find $u_h^* \in X_h(\Omega)$ such that
$$a_h(u_h^*, v_h) = f(v_h) \quad \forall v_h \in X_h(\Omega),$$
(2)
where
$$a_h(u, v) = \sum_{i=1}^N \hat a_i(u, v) \quad \text{and} \quad f(v) = \sum_{i=1}^N \int_{\Omega_i} f v_i\, dx,$$  (3)
$$\hat a_i(u, v) := a_i(u, v) + s_i(u, v) + p_i(u, v),$$  (4)
$$a_i(u, v) := \int_{\Omega_i} \rho_i\, \nabla u_i \cdot \nabla v_i\, dx,$$  (5)
$$s_i(u, v) := \sum_{F_{ij} \subset \partial\Omega_i} \frac{\rho_{ij}}{l_{ij}} \int_{F_{ij}} \Bigl( \frac{\partial u_i}{\partial n}(v_j - v_i) + \frac{\partial v_i}{\partial n}(u_j - u_i) \Bigr) ds,$$
$$p_i(u, v) := \sum_{F_{ij} \subset \partial\Omega_i} \frac{\delta\, \rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} (u_j - u_i)(v_j - v_i)\, ds,$$  (6)
where $u = \{u_i\}_{i=1}^N \in X_h(\Omega)$ and $v = \{v_i\}_{i=1}^N \in X_h(\Omega)$. We set $l_{ij} = 2$ when $F_{ij} = \partial\Omega_i \cap \partial\Omega_j$ is a common face (edge) of $\partial\Omega_i$ and $\partial\Omega_j$, and define $\rho_{ij} := 2\rho_i \rho_j/(\rho_i + \rho_j)$ as the harmonic average of $\rho_i$ and $\rho_j$, and $h_{ij} := 2 h_i h_j/(h_i + h_j)$. In order to simplify the notation we include the index $j = \partial$ and put $l_{i\partial} := 1$ when $F_{i\partial} := \partial\Omega_i \cap \partial\Omega$ has positive measure. We also set $u_\partial = 0$, $v_\partial = 0$ and define $\rho_{i\partial} := \rho_i$ and $h_{i\partial} := h_i$. Here $\frac{\partial}{\partial n}$ denotes the outward normal derivative on $\partial\Omega_i$, and $\delta$ is a positive penalty parameter. We note that when $\rho_{ij}$ is given by the harmonic average, it can be shown that $\min\{\rho_i, \rho_j\} \le \rho_{ij} \le 2 \min\{\rho_i, \rho_j\}$. We also define
$$d_i(u, v) := a_i(u, v) + p_i(u, v)$$  (7)
and
$$d_h(u, v) := \sum_{i=1}^N d_i(u, v).$$  (8)
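The weights $\rho_{ij}$ and $h_{ij}$ entering $p_i$ are harmonic averages, and the bound $\min\{\rho_i, \rho_j\} \le \rho_{ij} \le 2\min\{\rho_i, \rho_j\}$ quoted above is elementary to confirm. A throwaway check (our own, with random coefficient pairs, not taken from the paper):

```python
import random

def harmonic_average(a, b):
    """rho_ij = 2ab/(a+b); always lies between min(a, b) and 2*min(a, b)."""
    return 2.0 * a * b / (a + b)

random.seed(0)
for _ in range(1000):
    a, b = random.uniform(1e-6, 1e6), random.uniform(1e-6, 1e6)
    r = harmonic_average(a, b)
    assert min(a, b) <= r <= 2.0 * min(a, b)
print("harmonic-average bounds hold")
```

This is why the penalty weight automatically adapts to the smaller of the two coefficients across a face, which is what makes the estimates coefficient-robust.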
It is known that there exists a $\delta_0 = O(1) > 0$ such that for $\delta \ge \delta_0$ we obtain $|s_i(u, u)| < c\, d_i(u, u)$ and $\sum_i |s_i(u, u)| < c\, d_h(u, u)$, where $c < 1$; therefore, problem (2) is elliptic and has a unique solution. A priori error estimates for the method are optimal for continuous coefficients, see [4,5], and for discontinuous coefficients if $\rho_i \partial_n u^* - \rho_j \partial_n u^* = 0$ in $L^2(F_{ij})$, see [8]. Note that this condition is satisfied if the solution $u^*$ of (1), restricted to $\Omega_i$ and $\Omega_j$, is in $H^{3/2+\varepsilon}(\Omega_i)$ and $H^{3/2+\varepsilon}(\Omega_j)$ with $\varepsilon > 0$. We use the $d_h$-norm, also called the broken norm, in $X_h(\Omega)$, with weights given by $\rho_i$ and $\frac{\delta \rho_{ij}}{l_{ij} h_{ij}}$. For $u = \{u_i\} \in X_h(\Omega)$ we note that
$$d_h(u, u) = \sum_{i=1}^N \Bigl\{ \rho_i \|\nabla u_i\|_{L^2(\Omega_i)}^2 + \sum_{F_{ij} \subset \partial\Omega_i} \frac{\delta \rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} (u_i - u_j)^2\, ds \Bigr\}.$$  (9)
Lemma 2.1. There exists a $\delta_0 > 0$ such that for $\delta \ge \delta_0$ and all $u \in X_h(\Omega)$ the following inequalities hold:
$$\gamma_0\, d_i(u, u) \le \hat a_i(u, u) \le \gamma_1\, d_i(u, u), \qquad i = 1, \ldots, N,$$  (10)
and
$$\gamma_0\, d_h(u, u) \le a_h(u, u) \le \gamma_1\, d_h(u, u),$$  (11)
where $\gamma_0$ and $\gamma_1$ are positive constants independent of the $\rho_i$, $h_i$ and $H_i$.
The proof essentially follows from (37) below; alternatively, refer to [8].

3. Schur complement problem

In this section we derive a Schur complement version of problem (2). We first introduce some auxiliary notation. Let $u = \{u_i\} \in X_h(\Omega)$ be given. We can represent $u_i$ as
$$u_i = H_i u_i + P_i u_i,$$
(12)
where $H_i u_i$ is the discrete harmonic part of $u_i$ in the sense of $a_i(\cdot,\cdot)$, see (5), i.e.,
$$a_i(H_i u_i, v_i) = 0 \quad \forall v_i \in \mathring X_i(\Omega_i),$$  (13)
$$H_i u_i = u_i \quad \text{on } \partial\Omega_i,$$  (14)
while $P_i u_i$ is the projection of $u_i$ onto $\mathring X_i(\Omega_i)$ in the sense of $a_i(\cdot,\cdot)$, i.e.,
$$a_i(P_i u_i, v_i) = a_i(u_i, v_i) \quad \forall v_i \in \mathring X_i(\Omega_i).$$  (15)
Here $\mathring X_i(\Omega_i)$ is the subspace of $X_i(\Omega_i)$ of functions which vanish on $\partial\Omega_i$, and $H_i u_i$ is the classical discrete harmonic part of $u_i$. Let us denote by $\mathring X_h(\Omega)$ the subspace of $X_h(\Omega)$ defined by $\mathring X_h(\Omega) := \mathring X_1(\Omega_1) \times \cdots \times \mathring X_N(\Omega_N)$, and consider the global projections $Hu := \{H_i u_i\}_{i=1}^N$ and $Pu := \{P_i u_i\}_{i=1}^N : X_h(\Omega) \to \mathring X_h(\Omega)$ in the sense of $\sum_{i=1}^N a_i(\cdot,\cdot)$. Hence, a function $u \in X_h(\Omega)$ can be decomposed as
$$u = Hu + Pu.$$  (16)
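In matrix terms, the splitting u = Hu + Pu is a block elimination with respect to interior (I) and boundary (B) unknowns. The sketch below is a generic 1-D finite-difference illustration under our own assumptions (it is not the paper's 2-D FE setting): it computes the discrete harmonic part by solving $A_{II}(Hu)_I = -A_{IB} u_B$ and then checks the a-orthogonality of Hu and Pu:

```python
import numpy as np

n = 20                                        # interior points of one subdomain
A = 2.0 * np.eye(n + 2) - np.eye(n + 2, k=1) - np.eye(n + 2, k=-1)  # 1-D Laplacian
I = np.arange(1, n + 1)                       # interior indices
B = np.array([0, n + 1])                      # boundary indices

rng = np.random.default_rng(0)
u = rng.standard_normal(n + 2)

Hu = u.copy()                                 # discrete harmonic part: keeps u on B
Hu[I] = np.linalg.solve(A[np.ix_(I, I)], -A[np.ix_(I, B)] @ u[B])
Pu = u - Hu                                   # interior remainder, zero on B

print(np.abs(Pu[B]).max())                    # Pu vanishes on the boundary
print(abs(Hu @ (A @ Pu)))                     # a(Hu, Pu) = 0 up to round-off
```

The orthogonality holds because the interior rows of A annihilate Hu while Pu vanishes on the boundary rows, which is exactly the energy-orthogonal decomposition (16).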
The function $u \in X_h(\Omega)$ can also be represented as
$$u = \hat H u + \hat P u,$$  (17)
where $\hat P u = \{\hat P_i u_i\}_{i=1}^N : X_h(\Omega) \to \mathring X_h(\Omega)$ is the projection in the sense of $a_h(\cdot,\cdot)$, the original bilinear form of (2), see (3). Since $\hat P_i u_i \in \mathring X_i(\Omega_i)$ and $v_i \in \mathring X_i(\Omega_i)$, we have $a_i(\hat P_i u, v_i) = a_h(u, v_i)$. The discrete solution of (2) can be decomposed as $u_h^* = \hat H u_h^* + \hat P u_h^*$. To find $\hat P u_h^*$ we need to solve the following set of standard discrete Dirichlet problems: Find $\hat P_i u_h^* \in \mathring X_i(\Omega_i)$ such that
$$a_i(\hat P_i u_h^*, v_i) = f(v_i) \quad \forall v_i \in \mathring X_i(\Omega_i)$$  (18)
for $i = 1, \ldots, N$. Note that these problems are local and independent, so they can be solved in parallel. This is a precomputational step. We now formulate the problem for $\hat H u_h^*$. Let $\hat H_i u$ be the discrete harmonic part of u in the sense of $\hat a_i(\cdot,\cdot)$, see (4), where $\hat H_i u \in X_i(\Omega_i)$ is the solution of
$$\hat a_i(\hat H_i u, v_i) = 0 \quad \forall v_i \in \mathring X_i(\Omega_i),$$  (19)
$$\hat H_i u = u_i \ \text{on } \partial\Omega_i, \quad \text{and the } u_j \ \text{on } F_{ji} \subset \partial\Omega_j \ \text{are given},$$  (20)
where the $u_j$ are given on $F_{ji} = \partial\Omega_i \cap \partial\Omega_j$. We point out that for $v_i \in X_i(\Omega_i)$ we have
$$\hat a_i(u_i, v_i) = (\rho_i \nabla u_i, \nabla v_i)_{L^2(\Omega_i)} + \sum_{F_{ij} \subset \partial\Omega_i} \frac{\rho_{ij}}{l_{ij}} \Bigl( \frac{\partial v_i}{\partial n},\, u_j - u_i \Bigr)_{L^2(F_{ij})}.$$  (21)
Note that (19)–(20) has a unique solution. To see this, let us rewrite (19) in the form
$$\rho_i (\nabla \hat H_i u, \nabla \varphi_k^i)_{L^2(\Omega_i)} = - \sum_{F_{ij} \subset \partial\Omega_i} \frac{\rho_{ij}}{l_{ij}} \Bigl( \frac{\partial \varphi_k^i}{\partial n},\, u_j - u_i \Bigr)_{L^2(F_{ij})},$$  (22)
where the $\varphi_k^i$ are the nodal basis functions of $\mathring X_i(\Omega_i)$ associated with the interior nodal points $x_k$ of the $h_i$-triangulation of $\Omega_i$. Note that $\frac{\partial \varphi_k^i}{\partial n}$ does not vanish on $\partial\Omega_i$ when $x_k$ is a node of an element touching $\partial\Omega_i$. We see that $\hat H_i u$ is a special extension into $\Omega_i$ where u is given on $\partial\Omega_i$ and on all the $F_{ji}$; therefore, it depends on the values of $u_j$ given on $F_{ji} = \partial\Omega_i \cap \partial\Omega_j$ and on $F_{i\partial}$ (we have already assumed $u_\partial = 0$ for $j = \partial$). Note that $\hat H_i u$ is discrete harmonic except at nodal points close to $\partial\Omega_i$. We will sometimes call $\hat H_i u$ discrete harmonic in a special sense, i.e., in the sense of $\hat a_i(\cdot,\cdot)$ or $\hat H_i$. We let $\hat H u = \{\hat H_i u\}_{i=1}^N \in X_h(\Omega)$. Note that (19) is obtained from
$$a_h(\hat H u, v) = 0$$  (23)
for $u \in X_h(\Omega)$ when taking $v = \{v_i\}_{i=1}^N \in \mathring X_h(\Omega)$. It is easy to see that $\hat H u = \{\hat H_i u\}_{i=1}^N$ and $\hat P u = \{\hat P_i u_i\}_{i=1}^N$ are orthogonal in the sense of $a_h(\cdot,\cdot)$, i.e.,
$$a_h(\hat H u, \hat P v) = 0, \qquad u, v \in X_h(\Omega).$$  (24)
In addition,
$$H \hat H u = H u, \qquad \hat H H u = \hat H u,$$  (25)
since $\hat H u$ and $H u$ do not change the values of u at any of the nodes on the boundaries of the subdomains $\Omega_i$, also denoted by
$$\Gamma := \bigcup_i \partial\Omega_{i, h_i},$$  (26)
where $\partial\Omega_{i, h_i}$ is the set of nodal points of $\partial\Omega_i$. We note that the definition of $\Gamma$ includes the nodes on both sides of $\bigcup_i \partial\Omega_i$. We are now in a position to derive a Schur complement problem for (2). Let us apply the decomposition (17) in (2). We get
$$a_h(\hat H u_h^* + \hat P u_h^*,\, \hat H v_h + \hat P v_h) = f(\hat H v_h + \hat P v_h)$$
or
$$a_h(\hat H u_h^*, \hat H v_h) + a_h(\hat H u_h^*, \hat P v_h) + a_h(\hat P u_h^*, \hat H v_h) + a_h(\hat P u_h^*, \hat P v_h) = f(\hat H v_h) + f(\hat P v_h).$$
Using (18) and (23) we have
$$a_h(\hat H u_h^*, \hat H v_h) = f(\hat H v_h) \quad \forall v_h \in X_h(\Omega).$$  (27)
This is the Schur complement problem for (2). We denote by $V_h(\Gamma)$, or simply V, which we will use later, the set of all functions $v_h \in X_h(\Omega)$ such that $\hat P v_h = 0$, i.e., the space of discrete harmonic functions in the sense of the $\hat H_i$. We rewrite the Schur complement problem as follows: Find $u_h^* \in V_h(\Gamma)$ such that
$$S(u_h^*, v_h) = g(v_h) \quad \forall v_h \in V_h(\Gamma),$$  (28)
where, here and below, $u_h^* \equiv \hat H u_h^*$, and
$$S(u_h, v_h) = a_h(\hat H u_h, \hat H v_h), \qquad g(v_h) = f(\hat H v_h).$$  (29)
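Algebraically, passing from (2) to the interface problem is the classical Schur complement reduction $S = A_{BB} - A_{BI} A_{II}^{-1} A_{IB}$ with condensed right-hand side $g = b_B - A_{BI} A_{II}^{-1} b_I$. The sketch below is generic linear algebra on a small random SPD example of our own (not the DG matrices of this paper); it checks that the condensed solve reproduces the full solve:

```python
import numpy as np

rng = np.random.default_rng(1)
nI, nB = 8, 4
M = rng.standard_normal((nI + nB, nI + nB))
A = M @ M.T + (nI + nB) * np.eye(nI + nB)     # SPD test matrix
b = rng.standard_normal(nI + nB)

AII, AIB = A[:nI, :nI], A[:nI, nI:]
ABI, ABB = A[nI:, :nI], A[nI:, nI:]
bI, bB = b[:nI], b[nI:]

S = ABB - ABI @ np.linalg.solve(AII, AIB)     # Schur complement on the B-block
g = bB - ABI @ np.linalg.solve(AII, bI)       # condensed right-hand side

uB = np.linalg.solve(S, g)                    # interface solve, cf. (28)
uI = np.linalg.solve(AII, bI - AIB @ uB)      # interior back-substitution

u_full = np.linalg.solve(A, b)
print(np.abs(np.concatenate([uI, uB]) - u_full).max())
```

The interior solves are local to the subdomains, which is why steps (18) and the back-substitution parallelize trivially.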
This problem has a unique solution.

4. Technical tools

Our main goal is to design and analyze BDDC methods for solving (28). This will be done in the next section. We now introduce some notation and facts to be used later. Let $u = \{u_i\}_{i=1}^N \in X_h(\Omega)$ and $v = \{v_i\}_{i=1}^N \in X_h(\Omega)$, and let $d_i(\cdot,\cdot)$ and $d_h(\cdot,\cdot)$ be the bilinear forms defined in (7) and (8). Note that, for $u, v \in \mathring X_h(\Omega)$,
$$d_i(u, v) = a_i(u, v) = \rho_i (\nabla u_i, \nabla v_i)_{L^2(\Omega_i)}$$  (30)
and, for $u \in X_h(\Omega)$,
$$\gamma_0\, d_h(u, u) \le a_h(u, u) \le \gamma_1\, d_h(u, u)$$  (31)
in view of Lemma 2.1, where $\gamma_0$ and $\gamma_1$ are positive constants independent of $h_i$, $H_i$ and $\rho_i$. The next lemma shows the equivalence between discrete harmonic functions in the sense of H and in the sense of $\hat H$; therefore, we can take advantage of all the discrete Sobolev norm results known for H discrete harmonic extensions.

Lemma 4.1. For $u \in X_h(\Omega)$ we have
$$d_i(H u, H u) \le d_i(\hat H u, \hat H u) \le C\, d_i(H u, H u), \qquad i = 1, \ldots, N,$$  (32)
and
$$d_h(H u, H u) \le d_h(\hat H u, \hat H u) \le C\, d_h(H u, H u),$$  (33)
(34)
722
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
in view of (25). The first term is estimated as ˆ Hu)εdh (Hu, ˆ Hu) ˆ + 1 dh (Hu, Hu), dh (Hu, 4ε
(35)
with arbitrary ε > 0. To estimate the second term on the right-hand side of (34) note that, for o ˆ ∈ X () and using (22), we get v := P Hu ˆ v) = dh (Hu,
N
i (∇ Hˆ i ui , ∇vi )L2 (i )
i=1
=−
N i=1 Fij ⊂*i
ij
lij
*vi , uj − u i *n
(36)
. L2 (Fij )
The terms on the right-hand side of (36) are estimated as follows: *vi *vi ui − uj L2 (Fij ) , uj − u i ij ij *n 2 *n L2 (Fij ) L (Fij ) C C
ij 1/2
hi
∇vi L2 (i ) ui − uj L2 (Fij )
ij
∇vi L2 (i ) ui − uj L2 (Fij ) 1/2 hij ij 2 2 C εij ∇vi L2 ( ) + ui − uj L2 (F ) i ij 4εhij ij C 2εi ∇vi 2L2 ( ) + ui − uj 2L2 (F ) , i ij 4εhij where we have used that hij 2hi and ij 2i . Substituting this into (36), we get ⎧ ⎫ N ⎨ ⎬ ij ˆ v)C 2εi ∇Pi Hˆ i ui 2L2 ( ) + ui − uj 2L2 (F ) , dh (Hu, i ij ⎭ ⎩ 4hij ε i=1
(37)
(38)
Fij ⊂*i
and using ∇Pi Hˆ i ui L2 (i ) ∇ Hˆ i ui L2 (i ) , we obtain
ˆ v)C εdh (Hu, ˆ Hu) ˆ + 1 dh (Hu, Hu) . dh (Hu, 4ε
(39)
Substituting (39) and (35) into (34) we get 1 ˆ ˆ ˆ ˆ dh (Hu, Hu)C εdh (Hu, Hu) + dh (Hu, Hu) . 4ε Choosing a sufficiently small ε, the right-hand side of (33) follows.
5. Balancing domain decomposition with constraints method

We design and analyze BDDC methods for solving (28); see [7,17,16] for conforming elements. We use the general framework of ASMs as stated below in Lemma 5.1; see [18]. For $i = 0, \ldots, N$, let $V_i$ be auxiliary spaces and $I_i$ prolongation operators from $V_i$ to V, and define the operators $\tilde T_i : V \to V_i$ by
$$b_i(\tilde T_i u, v) = a_h(u, I_i v) \quad \forall v \in V_i,$$
where $b_i(\cdot,\cdot)$ is symmetric and positive definite on $V_i \times V_i$, and set $T_i = I_i \tilde T_i$. Then the ASMs, in particular the BDDC methods, are defined via the operator
$$T = \sum_{i=0}^{N} T_i.$$  (40)
The bilinear form $a_h$ is defined in (3). The bilinear forms $b_i$, the operators $I_i$, and the spaces $V_i$, $i = 0, \dots, N$, are defined in the next subsections.

Lemma 5.1. Suppose the following three assumptions hold:

(i) There exists a constant $C_0$ such that, for all $u \in V$, there is a decomposition $u = \sum_{i=0}^{N} I_i u^{(i)}$ with $u^{(i)} \in V_i$, $i = 0, \dots, N$, and
\[
\sum_{i=0}^{N} b_i(u^{(i)}, u^{(i)}) \le C_0^2\, a_h(u, u).
\]
(ii) There exist constants $\varepsilon_{ij}$, $i, j = 1, \dots, N$, such that for all $u^{(i)} \in V_i$, $u^{(j)} \in V_j$,
\[
a_h(I_i u^{(i)}, I_j u^{(j)}) \le \varepsilon_{ij}\, a_h(I_i u^{(i)}, I_i u^{(i)})^{1/2}\, a_h(I_j u^{(j)}, I_j u^{(j)})^{1/2}.
\]
(iii) There exists a constant $\omega$ such that
\[
a_h(I_i u, I_i u) \le \omega\, b_i(u, u) \quad \forall u \in V_i,\ i = 0, \dots, N.
\]
Then $T$ is invertible and
\[
C_0^{-2}\, a_h(u, u) \le a_h(Tu, u) \le \omega\,(\rho(\mathcal E) + 1)\, a_h(u, u) \quad \forall u \in V.
\]
Here $\rho(\mathcal E)$ is the spectral radius of the matrix $\mathcal E = \{\varepsilon_{ij}\}_{i,j=1}^{N}$.

5.1. Notations and the interface condition

Let us denote by $\Gamma_i$ the set of all nodes on $\partial\Omega_i$ and on the neighboring faces $\bar F_{ji} \subset \partial\Omega_j$. We note that the nodes of $\partial F_{ji}$ (which are vertices of $\Omega_j$) are included in $\Gamma_i$. Define $W_i$ as the vector space associated to the nodal values on $\Gamma_i$ and extended via $\hat{\mathcal H}_i$ inside $\Omega_i$. We say that $u^{(i)} \in W_i$ if $u^{(i)}$ is represented as $u^{(i)} := \{u_l^{(i)}\}_{l \in \#(i)}$, where $\#(i) = \{i\} \cup \{j : F_{ij} \subset \partial\Omega_i\}$. Here $u_i^{(i)}$ and the $u_j^{(i)}$ stand for the nodal values of $u^{(i)}$ on $\partial\Omega_i$ and the $\bar F_{ji}$, respectively. We write $u = \{u_i\} \in V$ to refer to a function defined on all of $\Omega$ with each $u_i$ defined (only) on $\partial\Omega_i$. We point out that $F_{ij}$ and $F_{ji}$ are geometrically the same even though the mesh on $F_{ij}$ is inherited from the $\Omega_i$ mesh while the mesh on $F_{ji}$ corresponds to the $\Omega_j$ mesh.
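Lemma 5.1 is the standard abstract additive Schwarz framework: the operator $T$ of (40) is applied as a sum of subdomain (and coarse) solves inside a preconditioned conjugate gradient iteration. The following toy sketch is not the BDDC preconditioner analyzed in this paper; it only illustrates the framework by applying a generic one-level additive Schwarz preconditioner $M^{-1}v = \sum_i R_i^T A_i^{-1} R_i v$ to a 1-D Laplacian. All function names and the example problem are our own.

```python
import numpy as np

def laplacian_1d(n):
    # Tridiagonal stiffness matrix of -u'' with homogeneous Dirichlet BCs.
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def asm_preconditioner(A, blocks):
    # M^{-1} v = sum_i R_i^T A_i^{-1} R_i v  (one-level additive Schwarz).
    inv_blocks = [np.linalg.inv(A[np.ix_(b, b)]) for b in blocks]
    def apply(v):
        z = np.zeros_like(v)
        for b, Ainv in zip(blocks, inv_blocks):
            z[b] += Ainv @ v[b]
        return z
    return apply

def pcg(A, f, M_apply, tol=1e-10, maxit=200):
    # Standard preconditioned conjugate gradients for SPD A and SPD M.
    x = np.zeros_like(f)
    r = f - A @ x
    z = M_apply(r)
    p = z.copy()
    rz = r @ z
    it = 0
    while np.linalg.norm(r) > tol * np.linalg.norm(f) and it < maxit:
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = M_apply(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
        it += 1
    return x, it

n = 16
A = laplacian_1d(n)
f = np.ones(n)
blocks = [np.arange(0, 9), np.arange(7, 16)]  # two overlapping subdomains
x, it = pcg(A, f, asm_preconditioner(A, blocks))
assert np.allclose(x, np.linalg.solve(A, f), atol=1e-8)
```

The BDDC method of this paper replaces the overlapping blocks by the spaces $V_i$, $V_0$ and the prolongations $I_i$, $I_0$ defined below.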
Denote by $\Theta_i := \{F_{ij} : F_{ij} \subset \partial\Omega_i\} \cup \{F_{ji} : F_{ji} = F_{ij},\ F_{ji} \subset \partial\Omega_j\}$ the set of all faces of $\Omega_i$ together with the faces of the neighboring subdomains $\Omega_j$ which share a face with $\Omega_i$. Given $u^{(i)} \in W_i$ and $F_k \in \Theta_i$ we use the notation
\[
\bar u_k^{(i)} = \frac{1}{|F_k|} \int_{F_k} u^{(i)}\, ds.
\]
Let us define the regular zero-extension operator $\tilde I_i : W_i \to V$ as follows: given $u^{(i)} \in W_i$, let $\tilde I_i u^{(i)}$ be equal to $u^{(i)}$ at the nodes of $\Gamma_i$ and zero on $\Gamma \setminus \Gamma_i$.

A face across $\Omega_i$ and $\Omega_j$ has two sides: the side contained in $\partial\Omega_i$, denoted by $F_{ij}$, and the side contained in $\partial\Omega_j$, denoted by $F_{ji}$. In addition, we assign to each pair $\{F_{ij}, F_{ji}\}$ a master and a slave side. If $F_{ij}$ is a slave side then $F_{ji}$ is a master side, and vice versa. If $F_{ij}$ is a slave side we will write $\delta_{ij}$ (instead of $F_{ij}$) to emphasize this fact, while if $F_{ij}$ is a master side we will write $\mu_{ij}$. The choice of master and slave sides is such that the interface condition, stated next, can be satisfied. In this case Theorem 7.1 below holds with a constant $C$ independent of the $\rho_i$, $h_i$ and $H_i$.

Assumption 1 (The interface condition). We say that the coefficients $\{\rho_i\}$ and the local mesh sizes $\{h_i\}$ satisfy the interface condition if there exist constants $C_0$ and $C_1$, of order $O(1)$, such that for any face $F_{ij}$ the following conditions hold:
\[
h_i \le C_0 h_j \ \text{and}\ \rho_i \le C_1 \rho_j \ \text{if } F_{ij} \text{ is a slave side, or}\quad
h_j \le C_0 h_i \ \text{and}\ \rho_j \le C_1 \rho_i \ \text{if } F_{ij} \text{ is a master side.} \tag{41}
\]
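Assumption 1 can be read as a rule for assigning sides: the slave side of a pair must carry (up to $O(1)$ constants) the finer mesh and the smaller coefficient. A minimal sketch of this decision, with our own function name and illustrative values for the $O(1)$ constants $C_0$, $C_1$ of (41):

```python
def choose_master_slave(h_i, h_j, rho_i, rho_j, C0=2.0, C1=2.0):
    """Assign the slave side of the pair {F_ij, F_ji} so that the interface
    condition (41) holds.  Returns 'ij' if F_ij is the slave side, 'ji' if
    F_ji is.  Raises if neither assignment satisfies the condition."""
    if h_i <= C0 * h_j and rho_i <= C1 * rho_j:
        return 'ij'   # F_ij slave, F_ji master
    if h_j <= C0 * h_i and rho_j <= C1 * rho_i:
        return 'ji'   # F_ji slave, F_ij master
    raise ValueError("interface condition (41) cannot be satisfied")

# Finer mesh and smaller coefficient on Omega_i => F_ij is the slave side.
assert choose_master_slave(0.01, 0.1, 1.0, 100.0) == 'ij'
assert choose_master_slave(0.1, 0.01, 100.0, 1.0) == 'ji'
```

When the fine mesh sits on the subdomain with the large coefficient, no assignment works, which is exactly the failure mode explored numerically in Section 10.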
We associate with each $\Omega_i$, $i = 1, \dots, N$, the weighting diagonal matrices $D^{(i)} = \{D_l^{(i)}\}_{l \in \#(i)}$ on $\Gamma_i$, defined as follows:

• On $\partial\Omega_i$ ($l = i$):
\[
D_i^{(i)}(x) = \begin{cases}
1 & \text{if } x \text{ is a vertex of } \partial\Omega_i,\\
1 & \text{if } x \text{ is an interior node of a master face } F_{ij},\\
0 & \text{if } x \text{ is an interior node of a slave face } F_{ij}.
\end{cases} \tag{42}
\]

• On $F_{ji}$ ($l = j$):
\[
D_j^{(i)}(x) = \begin{cases}
0 & \text{if } x \text{ is an end point of the face } F_{ji},\\
1 & \text{if } x \text{ is an interior node and } F_{ji} \text{ is a slave face},\\
0 & \text{if } x \text{ is an interior node and } F_{ji} \text{ is a master face.}
\end{cases} \tag{43}
\]

• For $x \in F_{i\partial}$ we set $D_i^{(i)}(x) = 1$.

Remark 5.1. We note that two alternative choices of the weighting diagonal matrices $D^{(i)}$ can also be considered while ensuring that Theorem 7.1 below holds: (1) on faces $F_{ij}$ where $h_i$ and $h_j$ are of the same order, the values of (42) and (43) at interior nodes $x$ of the faces $F_{ij}$ and $F_{ji}$ can be replaced by $\sqrt{\rho_i}/(\sqrt{\rho_i} + \sqrt{\rho_j})$; (2) similarly, on faces $F_{ij}$ where $\rho_i$ and $\rho_j$ are of the same order, we can replace (42) and (43) at interior nodes $x$ of the faces $F_{ij}$ and $F_{ji}$ by $h_i/(h_i + h_j)$.
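The weights (42)–(43) are simple nodal flags. The sketch below encodes them and checks the complementarity at matching interior nodes that makes the partition-of-unity property (45) possible; the encoding and function names are our own:

```python
def weight_d_i(node_kind, face_is_master):
    """D_i^{(i)}(x) on the boundary of Omega_i, following (42): 1 at subdomain
    vertices, 1 at interior nodes of master faces, 0 at interior nodes of
    slave faces."""
    if node_kind == 'vertex':
        return 1.0
    return 1.0 if face_is_master else 0.0

def weight_d_j(node_kind, face_ji_is_slave):
    """D_j^{(i)}(x) on F_ji, following (43): 0 at the end points of F_ji,
    1 at interior nodes of a slave face F_ji, 0 at interior nodes of a
    master face F_ji."""
    if node_kind == 'endpoint':
        return 0.0
    return 1.0 if face_ji_is_slave else 0.0

# At a matching interior node the two weights of a face pair sum to 1
# (the paired side is slave exactly when this side is master), so the
# prolongations I_i = Itilde_i D^(i) can form a partition of unity (45).
for master in (True, False):
    assert weight_d_i('interior', master) + weight_d_j('interior', not master) == 1.0
```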
The prolongation operators $I_i : W_i \to V$, $i = 1, \dots, N$, are defined as
\[
I_i = \tilde I_i D^{(i)}, \tag{44}
\]
and they form a partition of unity on $\Gamma$, described as
\[
\sum_{i=1}^{N} I_i \tilde I_i^{T} = I. \tag{45}
\]
6. Local and coarse spaces

The local spaces $V_i = V_i(\Omega_i)$, $i = 1, \dots, N$, are defined as the subspaces of $W_i$ of functions with zero face-average values on all faces $F_{ij}$ and $F_{ji}$ associated to the subdomain $\Omega_i$, i.e., $\bar u_k^{(i)} = 0$ for all $F_k \in \Theta_i$. For $u^{(i)}, v^{(i)} \in V_i(\Omega_i)$ we define the local bilinear form
\[
b_i(u^{(i)}, v^{(i)}) := \hat a_i(u^{(i)}, v^{(i)}), \tag{46}
\]
where the bilinear form $\hat a_i$ was defined in (4).

Now we define a BDDC coarse space. As in BDDC methods, here we define the coarse space using local bases and imposing continuity conditions with respect to the primal variables; see [7,17,16]. Recall that $\Theta_i := \{F_{ij} : F_{ij} \subset \partial\Omega_i\} \cup \{F_{ji} : F_{ji} = F_{ij},\ F_{ji} \subset \partial\Omega_j\}$ is the set of all faces of $\Omega_i$ and all faces of the $\Omega_j$ which share a face with $\Omega_i$. For $F_k \in \Theta_i$ define the local coarse basis function $\Phi_{F_k}^{(i)} \in W_i$ by
\[
b_i(\Phi_{F_k}^{(i)}, v) = 0 \quad \forall v \in V_i(\Omega_i), \tag{47}
\]
with
\[
\frac{1}{|F_k|} \int_{F_k} \Phi_{F_k}^{(i)}\, ds = 1 \quad\text{and}\quad \frac{1}{|\bar F_k|} \int_{\bar F_k} \Phi_{F_k}^{(i)}\, ds = 0 \quad \forall\, \bar F_k \ne F_k \text{ with } \bar F_k \in \Theta_i.
\]
Note that (47) together with these average constraints determines $\Phi_{F_k}^{(i)}$ uniquely in $W_i$.

Define $V_{0i} = V_{0i}(\Omega_i) := \mathrm{Span}\{\Phi_{F_k}^{(i)} : F_k \in \Theta_i\} \subset W_i$. Then (47) implies that $V_i$ is orthogonal, in the sense of $b_i(\cdot,\cdot)$, to $V_{0i}$, and $W_i$ is a direct sum of $V_{0i}$ and $V_i$, i.e., $V_{0i} \oplus V_i = W_i$.

The global coarse space $V_0$ is defined as the set of all $u_0 := \{u_0^{(i)}\} \in \prod_{i=1}^{N} V_{0i}(\Omega_i)$ such that, for $i, j = 1, \dots, N$, we have, using the notation introduced in Subsection 5.1,
\[
\bar u_{0k}^{(i)} = \bar u_{0k}^{(j)} \quad \forall F_k \in \Theta_i \cap \Theta_j. \tag{48}
\]
The coarse prolongation operator $I_0 : V_0 \to V$ is defined as $I_0 u_0 = \sum_{i=1}^{N} I_i u_0^{(i)}$, and the bilinear form $b_0$ is of the form
\[
b_0(u_0, v_0) := \sum_{i=1}^{N} b_i(u_0^{(i)}, v_0^{(i)}). \tag{49}
\]
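Each coarse basis function of (47) is the $b_i$-minimal-energy function with face average 1 on one face and 0 on the others, i.e., a constrained energy minimization. This can be computed from a saddle-point (Lagrange-multiplier) system. A small self-contained sketch, with a random SPD matrix standing in for $b_i$ and hypothetical face-average functionals as constraint rows:

```python
import numpy as np

def coarse_basis(B, C, k):
    """Minimize (1/2) x^T B x subject to C x = e_k via the KKT system
    [B  C^T; C  0] [x; lam] = [0; e_k].  B plays the role of b_i(.,.),
    each row of C is a face-average functional, and e_k selects face k."""
    m = C.shape[0]
    K = np.block([[B, C.T], [C, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(B.shape[0]), np.eye(m)[k]])
    sol = np.linalg.solve(K, rhs)
    return sol[:B.shape[0]]

rng = np.random.default_rng(0)
Q = rng.standard_normal((6, 6))
B = Q @ Q.T + 6 * np.eye(6)             # SPD stand-in for b_i
C = np.array([[1, 1, 1, 0, 0, 0],       # hypothetical "average over face 1"
              [0, 0, 0, 1, 1, 1]], float) / 3.0
phi = coarse_basis(B, C, 0)
assert np.allclose(C @ phi, [1.0, 0.0], atol=1e-10)
```

In the actual method $B$ is the local DG stiffness form $\hat a_i$ and one such basis function is computed per face of $\Theta_i$.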
7. Main result

In this section we state and prove our main result.

Theorem 7.1. Let Assumption 1 be satisfied. Then there exists a positive constant $C$, independent of $h_i$, $H_i$ and the jumps of $\rho_i$, such that
\[
a_h(u, u) \le a_h(Tu, u) \le C\Big(1 + \log\frac{H}{h}\Big)^2 a_h(u, u) \quad \forall u \in V, \tag{50}
\]
where $T$ is defined in (40). Here $\log\frac{H}{h} = \max_i \log\frac{H_i}{h_i}$.
Proof. By the general theory of ASMs we need to check the three key assumptions of Lemma 5.1.

Assumption (i). We prove that for $u = \{u_i\}_{i=1}^{N} \in V$ there exist $u_0 \in V_0$ and $u^{(i)} \in V_i$ such that
\[
I_0 u_0 + \sum_{i=1}^{N} I_i u^{(i)} = u \tag{51}
\]
and
\[
b_0(u_0, u_0) + \sum_{i=1}^{N} b_i(u^{(i)}, u^{(i)}) = a_h(u, u). \tag{52}
\]
Let $u = \{u_i\}_{i=1}^{N} \in V(\Omega)$. Define $u_0^{(i)} \in V_{0i}(\Omega_i)$ as
\[
u_0^{(i)} = \sum_{F_k \in \Theta_i} \Big( \frac{1}{|F_k|} \int_{F_k} u\, ds \Big)\, \Phi_{F_k}^{(i)}, \tag{53}
\]
where the functions $\Phi_{F_k}^{(i)}$ were defined in (47). Note that $u_0^{(i)}$ and $u^{(i)}$ have the same face-average values on all faces $F_k \in \Theta_i$, i.e.,
\[
\frac{1}{|F_k|} \int_{F_k} u_0^{(i)}\, ds = \frac{1}{|F_k|} \int_{F_k} u^{(i)}\, ds = \bar u_{0k}^{(i)}, \qquad
\frac{1}{|F_k|} \int_{F_k} u_0^{(j)}\, ds = \frac{1}{|F_k|} \int_{F_k} u^{(j)}\, ds = \bar u_{0k}^{(j)}, \tag{54}
\]
and therefore, for all the faces $F_k \in \Theta_i \cap \Theta_j$ we have, see (48),
\[
\bar u_{0k}^{(i)} = \bar u_{0k}^{(j)}. \tag{55}
\]
Define $u_0 \in V_0$ by $u_0 = \{u_0^{(i)}\}_{i=1}^{N}$ and set $w = u - I_0 u_0$, where $I_0 u_0 = \sum_{i=1}^{N} I_i u_0^{(i)}$. Then we can write
\[
w = \sum_{i=1}^{N} I_i\big(\tilde I_i^{T} u - u_0^{(i)}\big) = \sum_{i=1}^{N} I_i u^{(i)},
\]
where we have defined $u^{(i)} := \tilde I_i^{T} u - u_0^{(i)} \in V_i$. Since the operators $I_i \tilde I_i^{T}$ form a partition of unity, (51) holds.
To check (52) observe that $u^{(i)}$ has zero face-average values on all faces $F_k \in \Theta_i$, hence it is orthogonal, in the sense of $b_i(\cdot,\cdot)$, to $u_0^{(i)}$; see (47). Then, from the definition of $b_0$ we have
\[
b_0(u_0, u_0) + \sum_{i=1}^{N} b_i(u^{(i)}, u^{(i)})
= \sum_{i=1}^{N} \big[ b_i(u_0^{(i)}, u_0^{(i)}) + b_i(u^{(i)}, u^{(i)}) \big]
= \sum_{i=1}^{N} b_i\big(u_0^{(i)} + u^{(i)},\, u_0^{(i)} + u^{(i)}\big)
= \sum_{i=1}^{N} b_i(\tilde I_i^{T} u, \tilde I_i^{T} u) = a_h(u, u).
\]
This ends the proof of Assumption (i).

Assumption (ii). We need to prove that
\[
a_h(I_i u^{(i)}, I_j u^{(j)}) \le C\,\varepsilon_{ij}\, a_h(I_i u^{(i)}, I_i u^{(i)})^{1/2}\, a_h(I_j u^{(j)}, I_j u^{(j)})^{1/2} \tag{56}
\]
for $u^{(i)} \in V_i$ and $u^{(j)} \in V_j$, $i, j = 1, \dots, N$, and that the spectral radius $\rho(\mathcal E)$ of $\mathcal E = \{\varepsilon_{ij}\}_{i,j=1}^{N}$ is bounded. In our case $\rho(\mathcal E) \le C$ with a constant independent of $h_i$ and $H_i$. This follows from coloring arguments and the fact that $u^{(i)}$ and $u^{(j)}$ are different from zero only on $\Omega_i$ and $\Omega_j$ and their neighboring substructures.

Assumption (iii). We need to prove that, for $i = 1, \dots, N$,
\[
a_h(I_i u^{(i)}, I_i u^{(i)}) \le \omega\, b_i(u^{(i)}, u^{(i)}) \quad \forall u^{(i)} \in V_i \tag{57}
\]
and
\[
a_h(I_0 u_0, I_0 u_0) \le \omega\, b_0(u_0, u_0) \quad \forall u_0 \in V_0, \tag{58}
\]
with $\omega \le C(1 + \log\frac{H}{h})^2$, where $C$ is a positive constant independent of $h_i$, $H_i$ and the jumps of $\rho_i$. For the proof of (57) see Lemma 8.1, and for the proof of (58) see Lemma 8.2 in the next section. $\Box$

8. Auxiliary lemmas

In this section we complete the proof of Theorem 7.1 by proving two auxiliary lemmas associated with (57) and (58).

Lemma 8.1. Assume that Assumption 1 holds. Then for $u^{(i)} \in V_i$, $i = 1, \dots, N$, we have
\[
a_h(I_i u^{(i)}, I_i u^{(i)}) \le C\Big(1 + \log\frac{H}{h}\Big)^2 b_i(u^{(i)}, u^{(i)}), \tag{59}
\]
where $C$ is independent of $h_i$, $H_i$ and the jumps of $\rho_i$.
Proof. In order to prove (59) we can replace $a_h(\hat{\mathcal H}u, \hat{\mathcal H}u)$ by $d_h(\mathcal H u, \mathcal H u)$ on the left-hand side of (59), and on its right-hand side we can put $d_i(\mathcal H \tilde I_i u^{(i)}, \mathcal H \tilde I_i u^{(i)})$ instead of $b_i(u^{(i)}, u^{(i)})$; see Lemmas 2.1 and 4.1.

In order to simplify the notation, all the functions are considered as harmonic extensions in the $\mathcal H$ sense. Hence, we denote $\mathcal H I_i u$ by $I_i u$ and let $u^{(i)} = \{u_l^{(i)}\}_{l \in \#(i)} \in V_i$. Using (7), (8) and (44) we obtain
\[
d_h(I_i u^{(i)}, I_i u^{(i)}) = d_i(\tilde I_i D^{(i)} u^{(i)}, \tilde I_i D^{(i)} u^{(i)}) + \sum_{j} d_j(\tilde I_i D^{(i)} u^{(i)}, \tilde I_i D^{(i)} u^{(i)}), \tag{60}
\]
where the sum is taken over the $\Omega_j$ which share a face with $\Omega_i$. The first term on the right-hand side of (60) can be written as
\[
d_i(\tilde I_i D^{(i)} u^{(i)}, \tilde I_i D^{(i)} u^{(i)})
= \rho_i \int_{\Omega_i} |\nabla D_i^{(i)} u_i^{(i)}|^2\, dx
+ \sum_{F_{ij} \subset \partial\Omega_i} \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \big(D_i^{(i)} u_i^{(i)} - D_j^{(i)} u_j^{(i)}\big)^2\, ds. \tag{61}
\]
To bound the first term of (61) we use
\[
\rho_i\, \|\nabla D_i^{(i)} u_i^{(i)}\|^2_{L^2(\Omega_i)} \le 2\rho_i \big\{ \|\nabla(D_i^{(i)} u_i^{(i)} - u_i^{(i)})\|^2_{L^2(\Omega_i)} + \|\nabla u_i^{(i)}\|^2_{L^2(\Omega_i)} \big\}
\]
and therefore
\[
\rho_i\, \|\nabla(D_i^{(i)} u_i^{(i)} - u_i^{(i)})\|^2_{L^2(\Omega_i)} \le C \sum_{\delta_{ij} \subset \partial\Omega_i} \rho_i\, \|\tilde u_i^{(i)}\|^2_{H^{1/2}_{00}(\delta_{ij})}.
\]
Here $\tilde u_i^{(i)} = u_i^{(i)}$ at the interior nodal points of $\delta_{ij}$ and $\tilde u_i^{(i)} = 0$ on $\partial\delta_{ij}$. Recall that $\delta_{ij}$ denotes $F_{ij}$ when $F_{ij}$ is a slave side. It can be proved, see for example [18], that
\[
\rho_i\, \|\tilde u_i^{(i)}\|^2_{H^{1/2}_{00}(\delta_{ij})} \le C \Big(1 + \log\frac{H_i}{h_i}\Big)^2 \rho_i\, |u_i^{(i)}|^2_{H^1(\Omega_i)}. \tag{62}
\]
Here we have used the fact that $u_i^{(i)}$ has zero face-average values.

We now estimate the second term of (61) and of (67), see below. Note that for $F_{i\partial}$, i.e. for faces on $\partial\Omega$, the estimates of the corresponding terms follow straightforwardly. On a slave face $F_{ij}$ of $\partial\Omega_i$, i.e. where $h_i \le C_0 h_j$ and $\rho_i \le C_1 \rho_j$, we have
\[
\|D_i^{(i)} u_i^{(i)} - D_j^{(i)} u_j^{(i)}\|^2_{L^2(F_{ij})} \le C h_i \max_{F_{ij}} |u_i^{(i)}|^2 \tag{63}
\]
and
\[
\frac{\rho_{ij}}{h_{ij}}\, \|D_i^{(i)} u_i^{(i)} - D_j^{(i)} u_j^{(i)}\|^2_{L^2(F_{ij})}
\le C \rho_i \max_{F_{ij}} |u_i^{(i)}|^2
\le C \Big(1 + \log\frac{H_i}{h_i}\Big) \rho_i\, |u_i^{(i)}|^2_{H^1(\Omega_i)},
\]
where we have used $\rho_{ij} \le 2\rho_i$ and $h_i \le C h_{ij}$ since $h_i < C_0 h_j$. We have also used that $u^{(i)}$ has zero face-average value on any face of $\Omega_i$; therefore the Poincaré inequality can be used to bound the $H^1(\Omega_i)$-norm by the seminorm.

On a master side $F_{ij}$ of $\partial\Omega_i$, i.e. where $h_j \le C_0 h_i$ and $\rho_j \le C_1 \rho_i$, we have
\[
\|D_i^{(i)} u_i^{(i)} - D_j^{(i)} u_j^{(i)}\|_{L^2(F_{ij})}
\le \|u_i^{(i)} - u_j^{(i)}\|_{L^2(F_{ij})} + \Big\| \sum_{x_v^j \in \partial F_{ji}} u_j^{(i)}(x_v^j)\, \varphi_v^j \Big\|_{L^2(F_{ij})}, \tag{64}
\]
and using a triangle inequality we obtain
\[
\Big\| \sum_v u_j^{(i)}(x_v^j)\, \varphi_v^j \Big\|_{L^2(F_{ij})}
\le \Big\| \sum_v u_i^{(i)}(x_v^i)\, \varphi_v^i \Big\|_{L^2(F_{ij})}
+ \Big\| \sum_v \big( u_i^{(i)}(x_v^i)\, \varphi_v^i - u_j^{(i)}(x_v^j)\, \varphi_v^j \big) \Big\|_{L^2(F_{ij})}, \tag{65}
\]
where $\varphi_v^i$ and $\varphi_v^j$ are the nodal basis functions corresponding to $x_v^i$ and $x_v^j$, respectively. The first term of (65) can be estimated as
\[
\|u_i^{(i)}(x_v^i)\, \varphi_v^i\|^2_{L^2(F_{ij})} \le C \max_{F_{ij}} |u_i^{(i)}|^2\, h_i \le C h_i \Big(1 + \log\frac{H_i}{h_i}\Big) |u_i^{(i)}|^2_{H^1(\Omega_i)},
\]
while the second term of (65) can be bounded as in (81), see below. Using these estimates in (61) and Lemma 2.1 we get
\[
d_i(I_i u^{(i)}, I_i u^{(i)}) \le C \Big(1 + \log\frac{H_i}{h_i}\Big)^2 b_i(u^{(i)}, u^{(i)}). \tag{66}
\]
We now estimate the second term of (60) by bounding $d_j(\tilde I_i D^{(i)} u^{(i)}, \tilde I_i D^{(i)} u^{(i)})$ by $b_i(u^{(i)}, u^{(i)})$. For $u^{(i)} = \{u_l^{(i)}\} \in V_i$ we have
\[
d_j(\tilde I_i D^{(i)} u^{(i)}, \tilde I_i D^{(i)} u^{(i)})
= \rho_j\, \|\nabla D_j^{(i)} u_j^{(i)}\|^2_{L^2(\Omega_j)}
+ \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \big(D_i^{(i)} u_i^{(i)} - D_j^{(i)} u_j^{(i)}\big)^2\, ds, \tag{67}
\]
where here and below $D_j^{(i)} u_j^{(i)}$ is extended by zero on $\partial\Omega_j \setminus F_{ji}$. We need only to estimate the first term of (67), since the second term has already been estimated; see (63), (64) and (65). If $F_{ij}$ is a slave side of $\partial\Omega_i$ then $D_j^{(i)}$ vanishes, and so does $\|\nabla D_j^{(i)} u_j^{(i)}\|^2_{L^2(\Omega_j)}$. We now consider the case where $F_{ij}$ is a master side of $\partial\Omega_i$ and is not equal to $F_{i\partial}$. On $F_{ji}$ we decompose $u_j^{(i)} = w_j^{(i)} + \sum_{x_v^j \in \partial F_{ji}} u_j^{(i)}(x_v^j)\, \varphi_v^j$, where $w_j^{(i)} = D_j^{(i)} u_j^{(i)}$. We have
\[
\|\nabla w_j^{(i)}\|^2_{L^2(\Omega_j)} \le C\, \|w_j^{(i)}\|^2_{H^{1/2}_{00}(F_{ji})}
= C \Big\{ |w_j^{(i)}|^2_{H^{1/2}(F_{ji})} + \int_{F_{ji}} \frac{(w_j^{(i)})^2}{\mathrm{dist}(s, \partial F_{ji})}\, ds \Big\}. \tag{68}
\]
We now estimate the first term of (68). Let $Q_j$ be the $L^2$-projection onto the $h_j$-triangulation of $F_{ji}$. Then
\[
|w_j^{(i)}|^2_{H^{1/2}(F_{ji})}
\le 2 \big\{ |w_j^{(i)} - Q_j u_i^{(i)}|^2_{H^{1/2}(F_{ji})} + |Q_j u_i^{(i)}|^2_{H^{1/2}(F_{ji})} \big\}
\le C \Big\{ \frac{1}{h_j}\, \|w_j^{(i)} - u_i^{(i)}\|^2_{L^2(F_{ji})} + \|\nabla u_i^{(i)}\|^2_{L^2(\Omega_i)} \Big\} \tag{69}
\]
and
\[
\|w_j^{(i)} - u_i^{(i)}\|^2_{L^2(F_{ji})}
\le 2\, \|u_j^{(i)} - u_i^{(i)}\|^2_{L^2(F_{ji})}
+ 2 \Big\| \sum_{x_v^j \in \partial F_{ji}} u_j^{(i)}(x_v^j)\, \varphi_v^j \Big\|^2_{L^2(F_{ji})}, \tag{70}
\]
where the second term of (70) can be bounded as before, see (64), (65) and (81), using also that $\rho_j \le C_1 \rho_i$.

It remains to estimate the second term of (68). In order to simplify the notation, we take $F_{ij}$ as the interval $[0, H]$. Note that
\[
\int_{F_{ji}} \frac{(w_j^{(i)})^2}{\mathrm{dist}(s, \partial F_{ji})}\, ds
\le C \Big\{ \int_0^{H/2} \frac{(w_j^{(i)})^2}{s}\, ds + \int_{H/2}^{H} \frac{(w_j^{(i)})^2}{H - s}\, ds \Big\}. \tag{71}
\]
Let us estimate the first term on the right-hand side of (71). We have
\[
\int_0^{H/2} \frac{(w_j^{(i)})^2}{s}\, ds
= \int_0^{h_j} \frac{(w_j^{(i)})^2}{s}\, ds + \int_{h_j}^{H/2} \frac{(u_j^{(i)})^2}{s}\, ds
\le C \Big\{ \big(u_j^{(i)}(h_j)\big)^2 + \int_{h_j}^{H/2} \frac{(u_i^{(i)} - u_j^{(i)})^2}{s}\, ds + \int_{h_j}^{H/2} \frac{(u_i^{(i)})^2}{s}\, ds \Big\}
\]
\[
\le C \Big\{ \big(u_j^{(i)}(h_j)\big)^2 + \frac{1}{h_j}\, \|u_i^{(i)} - u_j^{(i)}\|^2_{L^2(F_{ji})} + \Big(1 + \log\frac{H_j}{h_j}\Big) \max_{F_{ij}} |u_i^{(i)}|^2 \Big\}
\le C \Big\{ \frac{1}{h_j}\, \|u_i^{(i)} - u_j^{(i)}\|^2_{L^2(F_{ij})} + \Big(1 + \log\frac{H_j}{h_j}\Big)\Big(1 + \log\frac{H_i}{h_i}\Big) \|u_i^{(i)}\|^2_{H^1(\Omega_i)} \Big\},
\]
where $(u_j^{(i)}(h_j))^2$ has been estimated as in (81). The second term of (71) is estimated similarly.

Substituting these estimates into (71) and using that $u_i^{(i)}$ has zero face-average values we get
\[
\int_{F_{ji}} \frac{(w_j^{(i)})^2}{\mathrm{dist}(s, \partial F_{ji})}\, ds
\le C \Big\{ \Big(1 + \log\frac{H}{h}\Big)^2 \|\nabla u_i^{(i)}\|^2_{L^2(\Omega_i)} + \frac{1}{h_j}\, \|u_i^{(i)} - u_j^{(i)}\|^2_{L^2(F_{ij})} \Big\}. \tag{72}
\]
In turn, substituting (69) and (72) into (68), multiplying by $\rho_j$ and using $\rho_j \le C_1 \rho_i$, substituting the resulting estimate into (67), and using Lemma 2.1, we get
\[
d_j(\tilde I_i D^{(i)} u^{(i)}, \tilde I_i D^{(i)} u^{(i)}) \le C \Big(1 + \log\frac{H}{h}\Big)^2 b_i(u^{(i)}, u^{(i)}). \tag{73}
\]
Using (66) and (73) in (60), we get
\[
d_h(I_i u^{(i)}, I_i u^{(i)}) \le C \Big(1 + \log\frac{H}{h}\Big)^2 b_i(u^{(i)}, u^{(i)}). \qquad \Box
\]
Lemma 8.2. Suppose that Assumption 1 holds. Then, for $u_0 \in V_0$, with $V_0$ defined by (48), we have the following inequality:
\[
a_h(I_0 u_0, I_0 u_0) \le C \Big(1 + \log\frac{H}{h}\Big)^2 b_0(u_0, u_0), \tag{74}
\]
where $C$ is independent of $h_i$, $H_i$ and the jumps of $\rho_i$.

Proof. By Lemmas 2.1 and 4.1,
\[
a_h(\hat{\mathcal H}u, \hat{\mathcal H}u) \le C\, d_h(\hat{\mathcal H}u, \hat{\mathcal H}u) \le C\, d_h(\mathcal H u, \mathcal H u), \tag{75}
\]
where $d_h(\cdot, \cdot)$ is defined by (8). Hence, to prove the result (74) we can replace $a_h(\hat{\mathcal H}u, \hat{\mathcal H}u)$ by $d_h(\mathcal H u, \mathcal H u)$ on the left-hand side of (74).

In order to simplify the notation we write $u$ instead of $u_0$ and put $I_0 u_0 = I_0 u = \sum_{i=1}^{N} I_i u^{(i)}$; see (48) and thereafter. We have
\[
d_i(I_0 u, I_0 u)
= \rho_i \Big\| \nabla \Big\{ \sum_{F_{ij} \subset \partial\Omega_i} \big[ (I_i u^{(i)})_i + (I_j u^{(j)})_i \big] \Big\} \Big\|^2_{L^2(\Omega_i)}
+ \sum_{F_{ij} \subset \partial\Omega_i} \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \Big( \big\{ (I_i u^{(i)})_i + (I_j u^{(j)})_i \big\} - \big\{ (I_i u^{(i)})_j + (I_j u^{(j)})_j \big\} \Big)^2 ds. \tag{76}
\]
To bound the second term on the right-hand side of (76) let us consider the case where $F_{ij}$ is a master side. The proof for the case where $F_{ij}$ is a slave side is similar; see also the arguments given in (63) and thereafter. Then, using the definition of $I_i$ and $D^{(i)}$, we obtain for a single face term
\[
J = \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \Big( \big\{ (I_i u^{(i)})_i + (I_j u^{(j)})_i \big\} - \big\{ (I_i u^{(i)})_j + (I_j u^{(j)})_j \big\} \Big)^2 ds
= \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \Big( \big\{ D_i^{(i)} u_i^{(i)} - D_j^{(i)} u_j^{(i)} \big\} - \big\{ D_j^{(j)} u_j^{(j)} - D_i^{(j)} u_i^{(j)} \big\} \Big)^2 ds
\]
\[
= \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \Big( \big\{ D_i^{(i)} u_i^{(i)} - D_j^{(i)} u_j^{(i)} \big\} - \big\{ D_j^{(j)} u_j^{(j)} - 0 \big\} \Big)^2 ds
= \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \Big( \big\{ D_i^{(i)} u_i^{(i)} - (D_j^{(i)} + D_j^{(j)}) u_j^{(j)} \big\} + D_j^{(i)} \big\{ u_j^{(j)} - u_j^{(i)} \big\} \Big)^2 ds
\]
\[
= \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \Big( \big\{ u_i^{(i)} - u_j^{(i)} \big\} - \sum_{x_v^j \in \partial F_{ji}} \big\{ u_j^{(j)}(x_v^j) - u_j^{(i)}(x_v^j) \big\}\, \varphi_v^j \Big)^2 ds, \tag{77}
\]
where $\varphi_v^j$ is the nodal basis function corresponding to $x_v^j$. Hence,
\[
J \le C\, \frac{\rho_{ij}}{l_{ij} h_{ij}} \int_{F_{ij}} \big\{ u_i^{(i)} - u_j^{(i)} \big\}^2 ds
+ C h_j\, \frac{\rho_{ij}}{l_{ij} h_{ij}} \max_{x_v^j \in \partial F_{ji}} \big\{ u_j^{(j)}(x_v^j) - u_j^{(i)}(x_v^j) \big\}^2. \tag{78}
\]
It remains to estimate the second term of (78). First note that $\bar u_{ji}^{(i)} = \bar u_{ji}^{(j)}$, since there are primal variables associated to the faces $F_{ji} \in \Theta_i$ and $F_{ji} \in \Theta_j$; see (48). Therefore,
\[
|u_j^{(j)}(x_v^j) - u_j^{(i)}(x_v^j)|
\le |u_j^{(j)}(x_v^j) - \bar u_{ji}^{(j)}| + |u_j^{(i)}(x_v^j) - \bar u_{ji}^{(i)}|
\le C \Big(1 + \log\frac{H_j}{h_j}\Big)^{1/2} \|\nabla u_j^{(j)}\|_{L^2(\Omega_j)} + |u_j^{(i)}(x_v^j) - \bar u_{ji}^{(i)}|. \tag{79}
\]
To deduce the estimate on the first term on the right-hand side of (79) we have used a Poincaré inequality and an $L^\infty$ bound for FEM functions; see [18]. The second term of (79) is estimated as
\[
|u_j^{(i)}(x_v^j) - \bar u_{ji}^{(i)}|
\le |u_j^{(i)}(x_v^j) - u_i^{(i)}(x_v^i)| + |u_i^{(i)}(x_v^i) - \bar u_{ij}^{(i)}| + |\bar u_{ij}^{(i)} - \bar u_{ji}^{(i)}|
\]
\[
\le |u_j^{(i)}(x_v^j) - u_i^{(i)}(x_v^i)|
+ C \Big\{ \Big(1 + \log\frac{H_i}{h_i}\Big)^{1/2} \|\nabla u_i^{(i)}\|_{L^2(\Omega_i)} + h_j^{-1/2}\, \|u_i^{(i)} - u_j^{(i)}\|_{L^2(F_{ij})} \Big\}, \tag{80}
\]
where we have used a Poincaré inequality and an $L^\infty$ bound for FEM functions to obtain the second term on the right-hand side of (80), and a Cauchy–Schwarz inequality to obtain the third term of (80). To estimate the first term of (80), let $Q_j u_i^{(i)}$ be the $L^2$-projection of $u_i^{(i)}$ onto the $h_j$-triangulation of $F_{ji}$. We obtain
\[
|u_j^{(i)}(x_v^j) - u_i^{(i)}(x_v^i)|
\le |u_j^{(i)}(x_v^j) - Q_j u_i^{(i)}(x_v^i)| + |Q_j u_i^{(i)}(x_v^i) - u_i^{(i)}(x_v^i)|
\le C \Big\{ h_j^{-1/2}\, \|u_j^{(i)} - u_i^{(i)}\|_{L^2(F_{ij})} + \Big(1 + \log\frac{H_j}{h_j}\Big)^{1/2} \|\nabla u_i^{(i)}\|_{L^2(\Omega_i)} \Big\}, \tag{81}
\]
where the first estimate was obtained from an inverse inequality, and the second from the approximation properties of the $L^2$-projection and an $L^\infty$ bound for FEM functions. By Lemmas 2.1 and 4.1 we can bound the term $d_i(\mathcal H \tilde I_i u^{(i)}, \mathcal H \tilde I_i u^{(i)})$ by $b_i(\hat{\mathcal H}_i u^{(i)}, \hat{\mathcal H}_i u^{(i)})$. Then we conclude that $J$ of (77) can be estimated as
\[
J \le C \Big(1 + \log\frac{H}{h}\Big) \big\{ b_i(u^{(i)}, u^{(i)}) + b_j(u^{(j)}, u^{(j)}) \big\}, \tag{82}
\]
since $\rho_{ij} \le C \rho_i$ and $h_j \le C h_{ij}$.

It remains to estimate the first term in (76). We have
\[
\Big\| \nabla \Big\{ \sum_{F_{ij} \subset \partial\Omega_i} \big[ (I_i u^{(i)})_i + (I_j u^{(j)})_i \big] \Big\} \Big\|^2_{L^2(\Omega_i)}
= \Big\| \nabla \Big\{ \Big( D_i^{(i)} + \sum_{F_{ij} \subset \partial\Omega_i} D_i^{(j)} \Big) u_i^{(i)} + \sum_{F_{ij} \subset \partial\Omega_i} D_i^{(j)} \big( u_i^{(j)} - u_i^{(i)} \big) \Big\} \Big\|^2_{L^2(\Omega_i)}
\]
\[
\le C \Big\{ \|\nabla u_i^{(i)}\|^2_{L^2(\Omega_i)} + \sum_{\delta_{ij} \subset \partial\Omega_i} \| D_i^{(j)} (u_i^{(j)} - u_i^{(i)}) \|^2_{H^{1/2}_{00}(\delta_{ij})} \Big\}, \tag{83}
\]
where the sum in (83) reduces to the slave sides $F_{ij}$. From (48) we obtain
\[
\| D_i^{(j)} (u_i^{(j)} - u_i^{(i)}) \|^2_{H^{1/2}_{00}(F_{ij})}
\le 2\, \| D_i^{(j)} (u_i^{(j)} - \bar u_{ij}^{(j)}) \|^2_{H^{1/2}_{00}(F_{ij})}
+ 2\, \| D_i^{(j)} (u_i^{(i)} - \bar u_{ij}^{(i)}) \|^2_{H^{1/2}_{00}(F_{ij})}, \tag{84}
\]
and therefore, the first term of (84) is estimated as
\[
\rho_i\, \| D_i^{(j)} (u_i^{(j)} - \bar u_{ij}^{(j)}) \|^2_{H^{1/2}_{00}(F_{ij})}
\le 2\rho_i \Big\{ \| D_i^{(j)} (u_i^{(j)} - Q_i u_j^{(j)}) \|^2_{H^{1/2}_{00}(F_{ij})}
+ \| D_i^{(j)} (Q_i u_j^{(j)} - \bar u_{ji}^{(j)}) \|^2_{H^{1/2}_{00}(F_{ij})}
+ \| D_i^{(j)} (\bar u_{ji}^{(j)} - \bar u_{ij}^{(j)}) \|^2_{H^{1/2}_{00}(F_{ij})} \Big\}
\]
\[
\le C \rho_i \Big\{ \frac{1}{h_i}\, \| u_i^{(j)} - u_j^{(j)} \|^2_{L^2(F_{ji})} + \Big(1 + \log\frac{H_j}{h_j}\Big)^2 \|\nabla u_j^{(j)}\|^2_{L^2(\Omega_j)} \Big\}
\le C \Big(1 + \log\frac{H_j}{h_j}\Big)^2 b_j(u^{(j)}, u^{(j)}), \tag{85}
\]
since $\rho_i \le C_1 \rho_j$ and $h_{ij} \le 2 h_i$ when $F_{ij}$ is a slave side, and in view of Lemma 2.1. The second term on the right-hand side of (84) is bounded by
\[
\rho_i\, \| D_i^{(j)} (u_i^{(i)} - \bar u_{ij}^{(i)}) \|^2_{H^{1/2}_{00}(F_{ij})}
\le C \rho_i \Big(1 + \log\frac{H_i}{h_i}\Big)^2 \|\nabla u_i^{(i)}\|^2_{L^2(\Omega_i)}
\le C \Big(1 + \log\frac{H_i}{h_i}\Big)^2 b_i(u^{(i)}, u^{(i)}). \tag{86}
\]
Using (85) and (86) in (84), and the resulting inequality in (83), we see that
\[
\rho_i \Big\| \nabla \Big\{ \sum_{F_{ij} \subset \partial\Omega_i} \big[ (I_i u^{(i)})_i + (I_j u^{(j)})_i \big] \Big\} \Big\|^2_{L^2(\Omega_i)}
\le C \Big(1 + \log\frac{H}{h}\Big)^2 \big\{ b_i(u^{(i)}, u^{(i)}) + b_j(u^{(j)}, u^{(j)}) \big\}.
\]
This estimate and (82), see (76), imply that
\[
d_i(I_0 u_0, I_0 u_0) \le C \Big(1 + \log\frac{H}{h}\Big)^2 \big\{ b_i(u^{(i)}, u^{(i)}) + b_j(u^{(j)}, u^{(j)}) \big\}.
\]
Summing this over $i$ and using Lemmas 2.1 and 4.1 we get (74). $\Box$
where u(i) ∈ Wi . The global coarse space V0 is now defined as the set of all u0 = {u0 } ∈ N i=1 V0i (i ) such that for i = 1, . . . , N, we have (i)
(j )
u0ij = u0ij (i)
˜ i. ∀ ij ∈
(87)
Recall that u0 is defined locally. Then we have the following possible cases of continuity with respect to the primal variables: (i) Case 1: ij = j i = Fij . This case imposes continuity of the face-average values of u0 and (j ) u0 on Fij ; see (87). Case 2: ij = j i = Fj i . This case imposes continuity of the face-average values on Fj i .
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
735
Example 9.1. Consider the domain = (0, 1)2 and divide it into N = M × M squares subdomains i which are unions of fine elements, with H = 1/M. We note that for floating subdomains ˜ i has only four coarse basis functions. i , i has eight coarse basis functions while The bilinear forms ah , bi and the operators Ii , i = 1, . . . , N, and the operator I0 are defined in Sections 5 and 6. We now show that with these new local and global spaces Theorem 7.1 still holds. The proof is basically the same as the one given in Sections 7 and 8 with some minor modifications depending on which of the above cases is considered and also on a modification of the Poincaré inequality. Theorem 9.1. If the Assumption 1 holds, then there exists a positive constant C independent of hi , Hi and the jumps of i such that H 2 ah (u, u) ∀u ∈ V , ah (u, u)ah (T u, u) C 1 + log h
(88)
where T is defined in (40), the local spaces Vi , i = 1, . . . , N, are defined above in this section i and the global space V0 is defined using (87). Here log Hh = maxi log H hi . Proof. We now mention the main modifications of the proof of the three key assumptions of Lemma 5.1. (i)
Assumption (i). Let u = {ui }N i=1 ∈ V (). Define u0 ∈ V0i (i ) by (i) u0
1 (i) = u ds k | k | k
(89)
˜i k ∈
and proceed as in the proof of Theorem 7.1. Assumption (ii). It is the same argument given to verify Assumption (ii) in the proof of Theorem 7.1. Assumption (iii). We modify the proof of Lemmas 8.2 and 8.1 as follows: For the proof of Lemma 8.2 we consider the following cases to obtain a bound for the left-hand side of (79), Case 1: ij = j i = Fj i . In this case we use the same argument as in the proof of Lemma 8.2 to estimate the left-hand side of (79). Case 2: ij = j i = Fij . In this case we estimate, see (79), (j )
(i)
(i)
(i)
(j )
(j )
(i)
(j )
|uj (xvj ) − uj (xvj )||uj (xvj ) − uj i | + |uj (xvj ) − uj i | + |uj i − uj i |.
(90)
The first and second term of (90) can be bounded as in Case 1. The third term of (90) is bounded (j ) (i) as follows: since ij = j i = Fij we have that uij = uij ; see (87). Then (i)
(j )
(i)
(i)
(j )
(j )
|uj i − uj i | |uj i − uij | + |uij − uj i |
(91)
736
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
and we obtain (i)
(i)
−1
(i)
−1
(i)
(i)
(i)
|uj i − uij | CHj 2 uj − ui L2 (Fij ) Chj 2 uj − ui L2 (Fij ) . An analogous bound holds also for the second term of (91); see (79). For the proof of Lemma 8.1 we can apply Poincaré inequality only in the case which ij = Fij ⊂ *i . If this is not the case, i.e., if ij = Fj i ⊂ j , we can still bound the H 1 (i ) norm by the seminorm using the following argument: if u(i) ∈ Vi and ij = Fj i then u(i) has zero face-average value on Fj i and therefore, (i)
1/2
ui L2 (i ) ui − uij L2 (i ) + Hi
(i)
(i)
(i)
uij − uj i L2 (Fij ) (i)
Hi ∇ui L2 (i ) + ui − uj L2 (Fij ) . Having modified the proof of Lemmas 8.2 and 8.1, then Assumption (iii) follows.
10. Numerical experiments

In this section we present numerical results for the preconditioner introduced in (40) and show that the bounds of Theorems 7.1 and 9.1 are reflected in the numerical tests. In particular, we show that Assumption 1, see (41), is necessary and sufficient. We consider the domain $\Omega = (0, 1)^2$ and divide it into $N = M \times M$ square subdomains $\Omega_i$ which are unions of fine elements, with $H = 1/M$. Inside each subdomain $\Omega_i$ we generate a structured triangulation with $n_i$ subintervals in each coordinate direction, and apply the discretization presented in Section 2 with penalty parameter $\sigma = 4$. This value $\sigma = 4$ was chosen because numerically it was observed that the $L^2$ approximation error seems to stabilize when $\sigma$ becomes larger. The minimum value of $\sigma$ that gives a positive definite system is $\sigma_{\min} = 1.565$. In the numerical experiments we use a red–black checkerboard type subdomain partition. On the black subdomains we let $n_i = 2 \cdot 2^{L_b}$ and on the red subdomains we let $n_i = 3 \cdot 2^{L_r}$, where $L_b$ and $L_r$ are integers denoting the number of refinements inside each subdomain $\Omega_i$. Hence, the mesh sizes are $h_b = \frac{2^{-L_b}}{2M}$ and $h_r = \frac{2^{-L_r}}{3M}$, respectively.

We solve the second-order elliptic problem $-\mathrm{div}(\rho(x)\nabla u^*(x)) = 1$ in $\Omega$ with homogeneous Dirichlet boundary conditions. In the numerical experiments we run PCG until the $l_2$-norm of the initial residual is reduced by a factor of $10^6$.

In the first test we consider the constant coefficient case $\rho = 1$. We consider different values of $M \times M$ coarse partitions and different values of local refinements $L_b = L_r$, therefore keeping the mesh ratio $h_b/h_r = 3/2$ constant. We place the masters on the black subdomains. We note that the interface condition (41) is satisfied. Table 1 lists the number of PCG iterations and, in parentheses, the condition number estimate of the preconditioned system in the case where we choose eight coarse basis functions per subdomain. As expected from the analysis, the condition numbers appear to be independent of the number of subdomains and seem to grow by a logarithmic factor when the size of the local problems increases. Note that in the case of continuous coefficients, Theorems 7.1 and 9.1 are valid without any assumptions on $h_b$ and $h_r$ if the master sides are chosen on the larger meshes.

Table 2 is the same as before; however, now we have chosen $\tilde\Theta_i$ as the set of master faces of $\Omega_i$. In this case we have four coarse basis functions in each subdomain. We note that even though the coarse problems are smaller, the results are very similar to the ones presented in Table 1, where the coarse problems are larger.
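Under the stated refinement rule the mesh sizes and their ratio follow directly; a one-line check (helper name is our own):

```python
def mesh_sizes(M, Lb, Lr):
    # h_b = 2^{-Lb} / (2M) on black, h_r = 2^{-Lr} / (3M) on red subdomains.
    return 2.0 ** (-Lb) / (2 * M), 2.0 ** (-Lr) / (3 * M)

hb, hr = mesh_sizes(M=4, Lb=2, Lr=2)
assert abs(hb / hr - 3.0 / 2.0) < 1e-12   # Lb = Lr keeps the ratio 3/2
```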
Table 1
PCG/BDDC iteration counts (condition numbers in parentheses) for different sizes of the coarse and local problems and constant coefficients $\rho_i$, with eight coarse basis functions per subdomain.

 M \ L_r |    0          1          2           3           4           5
---------+------------------------------------------------------------------
    2    | 12 (5.7)   14 (6.7)   15 (7.5)    18 (10.6)   19 (14.5)   19 (19.0)
    4    | 14 (5.8)   18 (8.5)   21 (11.7)   24 (15.2)   27 (19.2)   29 (23.9)
    8    | 15 (5.9)   20 (9.1)   24 (12.3)   27 (15.8)   31 (19.6)   34 (24.0)
   16    | 15 (6.0)   20 (9.4)   25 (12.8)   28 (16.3)   31 (20.1)   35 (24.5)
   32    | 15 (6.0)   20 (9.3)   25 (12.8)   28 (16.3)   32 (20.2)   35 (24.6)
Table 2
PCG/BDDC iteration counts (condition numbers in parentheses) for different sizes of the coarse and local problems and constant coefficients $\rho_i$, with four coarse basis functions per subdomain associated to its master faces.

 M \ L_r |    0          1          2           3           4           5
---------+------------------------------------------------------------------
    2    | 13 (5.7)   15 (6.7)   16 (7.5)    18 (10.7)   19 (14.5)   19 (18.9)
    4    | 15 (5.8)   19 (8.5)   22 (11.7)   24 (15.1)   27 (19.2)   29 (23.8)
    8    | 17 (6.1)   21 (9.1)   25 (12.3)   28 (15.7)   31 (19.6)   34 (24.0)
   16    | 18 (6.1)   23 (9.4)   27 (12.8)   30 (16.3)   32 (20.1)   35 (24.5)
   32    | 18 (6.1)   24 (9.4)   27 (12.8)   30 (16.3)   32 (20.2)   35 (24.6)
Table 3
PCG/BDDC iteration counts (condition numbers in parentheses) for different values of the coefficient $\rho$ and of the local mesh sizes on the red subdomains only.

 ρ \ L_r |     0            1            2            3            4            5
---------+--------------------------------------------------------------------------
  1000   | 85 (2099)    165 (2822)   263 (3746)   282 (4758)   287 (5922)   310 (7168)
    10   | 28 (24.4)    37 (32.9)    43 (42.3)    47 (52.8)    51 (64.8)    53 (77.7)
   0.1   | 16 (6.6)     17 (6.8)     16 (6.8)     17 (6.8)     17 (6.9)     17 (6.9)
  0.001  | 16 (6.96)    16 (7.12)    16 (7.16)    16 (7.25)    17 (7.38)    18 (7.50)

The coefficients and the local mesh sizes on the black subdomains are kept fixed. The subdomain partition is also kept fixed at 4 × 4, and eight coarse basis functions per subdomain are used.
As in the case of Table 2, the smallest eigenvalue of the preconditioned operator is 1.

We now consider the discontinuous coefficient case where we set $\rho_i = 1$ on the black subdomains and $\rho_i = \rho$ on the red subdomains. The subdomain partition is kept fixed at $4 \times 4$, i.e., 16 subdomains. Table 3 lists the results of computations for different values of $\rho$ and for different levels of refinement on the red subdomains. On the black subdomains $n_i = 2$ is kept fixed. The masters are placed on the black subdomains. It is easy to see that the interface condition (41) holds if, and only if, $\rho$ is not large, which seems to be in agreement with the results in Table 3. We repeat the same experiment as in Table 3, but this time with four coarse local basis functions associated to the master sides of the subdomains. The results are presented in Table 4.
Table 4
PCG/BDDC iteration counts (condition numbers in parentheses) for different values of the coefficient $\rho$ and of the local mesh sizes on the red subdomains only.

 ρ \ L_r |     0            1            2            3            4            5
---------+--------------------------------------------------------------------------
  1000   | 84 (2127)    133 (2905)   188 (3827)   254 (4838)   326 (5980)   384 (7205)
    10   | 32 (24.7)    40 (33.4)    45 (43.0)    49 (53.5)    53 (65.3)    54 (78.0)
   0.1   | 15 (6.9)     16 (6.8)     16 (6.8)     17 (6.8)     17 (6.9)     17 (7.0)
  0.001  | 15 (7.4)     15 (7.3)     16 (7.2)     17 (7.3)     17 (7.42)    18 (7.52)

The coefficients and the local mesh sizes on the black subdomains are kept fixed. The subdomain partition is also kept fixed at 4 × 4, and four coarse basis functions per subdomain, associated to the master faces, are used.
11. Conclusions and extensions

In this paper several BDDC methods with different coarse spaces, for DG discretizations of second-order elliptic equations with discontinuous coefficients, have been designed and analyzed. It has been proved that the methods are almost optimal and very well suited for parallel computations: their rates of convergence are independent of the parameters of the triangulations, the number of substructures, and the jumps of the coefficients. The numerical tests confirm the theoretical results.

In 2-D, the methods are based on choosing $D_i^{(i)}$ equal to one at the vertices of $\partial\Omega_i$. The methods can be extended to 3-D by considering $D_i^{(i)}$ equal to one at the nodal points of the edges and vertices of the $\partial\Omega_i$; in this case Theorems 7.1 and 9.1 hold. The methods can also be generalized to the case where $\mu_i = \max_{x \in \Omega_i} \rho_i(x) / \min_{x \in \Omega_i} \rho_i(x)$ is not large. In this case, define constants $\bar\rho_i$ as the integral averages of the $\rho_i(x)$ over the $\Omega_i$. The $\bar\rho_i$ are used to determine the master and slave sides, and can be used to define the weighting matrices $D^{(i)}$ as well. For the bilinear forms $b_i(\cdot, \cdot)$ we use exact solvers in which the $\rho_i(x)$ are considered rather than the $\bar\rho_i$. In this case, Theorems 7.1 and 9.1 are valid, with lower bound equal to one and upper bound now involving a constant $C$ depending linearly on the $\mu_i$. The case where the $\rho_i(x)$ have large variations inside the $\Omega_i$ will be discussed elsewhere. Finally, we remark that the condition number of the preconditioned systems deteriorates as we increase the penalty parameter $\sigma$ to large values.

Acknowledgment

We would like to express our thanks to the anonymous referee for the suggestions to improve the presentation of the paper.

References

[1] P.F. Antonietti, Domain decomposition, spectral correctness and numerical testing of discontinuous Galerkin methods, Ph.D. Thesis, Dipartimento di Matematica, Università di Pavia, 2006.
[2] P.F. Antonietti, B. Ayuso, Schwarz domain decomposition preconditioners for discontinuous Galerkin approximations of elliptic problems: non-overlapping case, Technical Report 20-VP, IMATI-CNR, June 2005. To appear in M2AN.
[3] P.F. Antonietti, B. Ayuso, Multiplicative Schwarz methods for discontinuous Galerkin approximations of elliptic problems, Technical Report 10-VP, IMATI-CNR, June 2006. Submitted to M2AN.
[4] D.N. Arnold, An interior penalty finite element method with discontinuous elements, SIAM J. Numer. Anal. 19 (4) (1982) 742–760.
[5] D.N. Arnold, F. Brezzi, B. Cockburn, D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal. 39 (5) (2002) 1749–1779.
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
Journal of Complexity 23 (2007) 740 – 751 www.elsevier.com/locate/jco
An effective algorithm for generation of factorial designs with generalized minimum aberration

Kai-Tai Fang a,*, Aijun Zhang b, Runze Li c

a BNU-HKBU United International College, Zhuhai Campus of Beijing Normal University, Jinfeng Road, Zhuhai 519085, China
b Department of Statistics, University of Michigan, Ann Arbor, MI 48105, USA
c Department of Statistics, Penn State University, University Park, PA 16802, USA
Received 28 September 2006; accepted 16 March 2007 Available online 4 May 2007
Abstract

Fractional factorial designs are popular and widely used for industrial experiments. Generalized minimum aberration is an important criterion recently proposed for both regular and non-regular designs. This paper provides a formal optimization treatment of optimal designs with generalized minimum aberration. New lower bounds and optimality results are developed for resolution-III designs. Based on these results, an effective computer search algorithm is provided for sub-design selection, and new optimal designs are reported.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Fractional factorial design; Generalized minimum aberration; Lagrange analysis; Sub-design selection
1. Introduction

Fractional factorial designs (FFDs) are popular choices in the design of industrial experiments. Extensive research has been done on factorial designs in recent decades, with the main focus on optimality theory and design construction. The two most successful optimality criteria are maximum resolution, by Box and Hunter [2], and minimum aberration, by Fries and Hunter [7]. However, these criteria are defined for regular designs only; they cannot be used to assess a factorial design in general. Recently, generalized minimum aberration (GMA) was proposed for both regular and non-regular designs, with the two-level case due to Tang and Deng [14], and the multi-level case due to Ma
∗ Corresponding author.
E-mail addresses:
[email protected] (K.-T. Fang),
[email protected] (A. Zhang),
[email protected] (R. Li). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.010
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
and Fang [10] and Xu and Wu [17]. More background on FFDs and GMA is presented in the next section.

This paper studies the optimality conditions for GMA designs, a non-trivial problem due to the sequentially optimizing nature of the criterion. Unlike conventional combinatorial approaches, we provide a formal treatment of resolution-III designs from the optimization perspective. It will be shown that our new optimality results can be viewed as a natural extension of the weak-equidistance optimality for resolution-II designs by Zhang et al. [18]. Here we restrict ourselves to symmetrical designs, i.e. those in which all factors take the same number of levels; the methodology generalizes readily to mixed-level designs.

There exist several approaches to the construction of GMA designs. Among others, Lin [8] proposed using half fractions of Hadamard matrices to construct two-level supersaturated designs (SSDs), and Fang et al. [5] proposed the RBIBD method for constructing multi-level SSDs. However, these construction methods are restricted to GMA designs of resolution II. For designs of resolution III or higher, it is most natural to consider the sub-design approach based on existing classes of orthogonal arrays. Butler [3] obtained some GMA designs by projecting specific saturated orthogonal arrays. In this paper, we propose a general sub-design selection algorithm which utilizes the newly developed lower bounds and optimality conditions.

The paper is organized as follows. Some background material is presented in Section 2. In Section 3, new lower bounds and optimality results are developed for orthogonal FFDs of resolution III, by Lagrange analysis of the nonlinear programming problem and a strengthening technique that takes the integer-valued condition into account.
These results are applied in Section 4 to sub-design selection, where we provide an effective computer search algorithm and report some new optimal designs with GMA. In the final section, we discuss some possible routes for future work.

Throughout the paper, we write $\lfloor x\rfloor$ for the largest integer not exceeding $x$, $\lceil x\rceil$ for the smallest integer not less than $x$, and $\{x\} = x - \lfloor x\rfloor$ for the fractional part of $x$. The Kronecker delta function is defined as $\delta(x,y) = 1$ if $x = y$ and $0$ otherwise. For non-negative integers $j$ and $k$, $S(j,k)$ denotes the Stirling number of the second kind, i.e. the number of ways of partitioning a set of $j$ elements into $k$ non-empty sets. Furthermore, we extend the definition of the binomial coefficient $\binom{x}{j}$ to any non-negative argument $x \in \mathbb{R}_{+}$: $\binom{x}{j} = 1$ if $j = 0$, $\binom{x}{j} = 0$ if $x < j$, and $\binom{x}{j} = \frac{x(x-1)\cdots(x-j+1)}{j!}$ otherwise.

2. Background

A factorial design of $n$ runs and $s$ factors, each factor taking $q$ levels, is denoted by $D(n, q^s)$. The full factorial design with $n = q^s$ runs comprises all possible level combinations. An FFD with $n < q^s$ runs takes only a fraction of the runs required for the full factorial; see Wu and Hamada [15] for details. A particular FFD is often chosen to satisfy some constraint or to optimize some condition or set of conditions. Two common conditions of interest are balance and orthogonality. Balance means that for each factor each level appears in the same number of runs. Orthogonality means that for each pair of factors all $q^2$ possible level combinations appear equally often. Two designs are said to be isomorphic if one can be obtained from the other by reordering runs, permuting factors or switching levels of one or more factors. For given parameters $(n, s, q)$, we use $\mathcal{D}(n,q^s)$, $\mathcal{U}(n,q^s)$ and $\mathcal{L}(n,q^s)$ to denote the sets of non-isomorphic designs that have no constraint, the balance constraint and the orthogonality constraint, respectively.
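The extended binomial coefficient is the only non-standard piece of notation above; a minimal executable version (the function name is mine, not the paper's):

```python
from math import comb, prod

def binom_ext(x, j):
    """Binomial coefficient extended to any real x >= 0:
    1 if j == 0, 0 if x < j, else x(x-1)...(x-j+1)/j!."""
    if j == 0:
        return 1.0
    if x < j:
        return 0.0
    return prod(x - i for i in range(j)) / prod(range(1, j + 1))
```

On integer arguments this agrees with the ordinary binomial coefficient, e.g. `binom_ext(5, 2)` equals `math.comb(5, 2)`.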
A design in the set of U(n, q s ) is also called a U-type design in the uniform design literature; see
Fig. 1. Nested classes of fractional factorial designs.
the recent monograph by Fang et al. [6]. Note that these sets of designs are nested: $\mathcal{D}(n,q^s) \supset \mathcal{U}(n,q^s) \supset \mathcal{L}(n,q^s)$.

The column-wise study of fractional factorials is tightly connected with the notion of orthogonal arrays. An FFD $D(n,q^s)$ can be viewed as an orthogonal array of strength $t$, often denoted by $OA(n,s,q,t)$, if for each $t$-tuple of factors each level combination appears equally often. Similarly, let us use $\mathcal{OA}(n,s,q,t)$ to denote the set of non-isomorphic orthogonal arrays, where $\mathcal{OA}(n,s,q,1) \equiv \mathcal{U}(n,q^s)$ and $\mathcal{OA}(n,s,q,2) \equiv \mathcal{L}(n,q^s)$. For given $(n,s,q)$, an illustration of the nested structure is given in Fig. 1. Rao [13] presented the following well-known conditions for the existence of an $OA(n,s,q,t)$:

$$n \ge \sum_{i=0}^{u} \binom{s}{i}(q-1)^{i} \ \text{ if } t = 2u, \qquad n \ge \sum_{i=0}^{u} \binom{s}{i}(q-1)^{i} + \binom{s-1}{u}(q-1)^{u+1} \ \text{ if } t = 2u+1. \tag{1}$$

These general lower bounds on $n$ for given $(s,q,t)$ are called Rao's bounds. They have been improved for many specific parameter settings; see e.g. Bose and Bush [1] and Mukerjee and Wu [12].

The row-wise study of factorial designs is tightly connected with the notion of error-correcting codes in MacWilliams and Sloane [11]. It leads to the definition of the generalized minimum aberration (GMA) criterion. Let $\delta_{ik}^{j}$ be the coincidence indicator between the $i$th and $k$th runs at the $j$th factor. For any $1 \le i, k \le n$, the $(i,k)$-coincidence of the design is defined as $\delta_{ik} = \sum_{j=1}^{s} \delta_{ik}^{j}$. For an $OA(n,s,q,t)$, Bose and Bush [1] derived the following necessary conditions on the coincidences:

$$\sum_{k=1}^{n} \binom{\delta_{ik}}{j} = \frac{n}{q^{j}}\binom{s}{j} \quad\text{for } 1 \le i \le n,\ 1 \le j \le t. \tag{2}$$
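Identities (1) and (2) are easy to check numerically. A small sketch (function names and the 0-based level coding are my own, not from the paper), using the saturated two-level array $OA(4,3,2,2)$, the regular half-fraction with defining relation $I = ABC$:

```python
from math import comb

def coincidence_matrix(design):
    """delta[i][k] = number of factors on which runs i and k agree."""
    n = len(design)
    return [[sum(a == b for a, b in zip(design[i], design[k]))
             for k in range(n)] for i in range(n)]

def bose_bush_holds(design, q, t):
    """Check identity (2): sum_k C(delta_ik, j) == (n/q^j) C(s, j), j <= t."""
    n, s = len(design), len(design[0])
    delta = coincidence_matrix(design)
    return all(
        sum(comb(delta[i][k], j) for k in range(n)) * q**j == n * comb(s, j)
        for i in range(n) for j in range(1, t + 1)
    )

def rao_bound(s, q, t):
    """Rao's lower bound (1) on the run size n of an OA(n, s, q, t)."""
    u, odd = divmod(t, 2)
    n = sum(comb(s, i) * (q - 1)**i for i in range(u + 1))
    if odd:
        n += comb(s - 1, u) * (q - 1)**(u + 1)
    return n

oa = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # OA(4, 3, 2, 2)
```

This array attains Rao's bound ($n = 4$ for $(s,q,t) = (3,2,2)$), and every pair of distinct runs coincides in exactly one factor, as expected for a saturated design.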
On the other hand, Zhang et al. [18] proved that for any $D(n,q^s)$,

$$\sum_{i=1}^{n}\sum_{k=1}^{n} \binom{\delta_{ik}}{j} - \frac{n^{2}}{q^{j}}\binom{s}{j} = \sum_{u}\sum_{v}\Big(N_{v}^{(u)} - \frac{n}{q^{j}}\Big)^{2} \quad\text{for } 1 \le j \le s, \tag{3}$$

where $\sum_{u}$ denotes the summation over all $j$-element subsets $u$ of $\{1,\ldots,s\}$, $\sum_{v}$ denotes the summation over the $q^{j}$ $j$-tuple level combinations, and $N_{v}^{(u)}$ is the frequency with which the level combination $v$ appears in the $u$-factor sub-design. In particular, $D$ is an $OA(n,s,q,t)$ if the right-hand side of (3) vanishes for all $j \le t$. Thus, it is clear that the Bose–Bush identities (2) are also sufficient
conditions for $D(n,q^s)$ to have orthogonal strength $t$. Besides, for a design $D(n,q^s)$ that is saturated in the sense that $n = 1 + s(q-1)$, Mukerjee and Wu [12] derived that

$$\delta_{ik} = s - n/q \quad\text{for any } 1 \le i < k \le n, \tag{4}$$

a property useful in many ways, with a typical example in the study of complementary designs.

Maximum resolution and minimum aberration are well-known criteria for regular designs. They are extended to non-regular designs via row-wise coincidences and the MacWilliams identities of coding theory, as noted by Ma and Fang [10] and Xu and Wu [17]. For any FFD $D(n,q^s)$, define

$$A_{j}(D) = \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{w=0}^{j}(-1)^{w}(q-1)^{j-w}\binom{s-\delta_{ik}}{w}\binom{\delta_{ik}}{j-w} \tag{5}$$
for $1 \le j \le s$. The vector $W(D) = (A_{1}(D), \ldots, A_{s}(D))$ is called the generalized word-length pattern (GWP), and the index of the first non-zero element corresponds to the resolution. For two designs, $D_1$ is said to have less generalized aberration than $D_2$ if the first non-zero element of $W(D_1) - W(D_2)$ is negative. A design $D^{*}$ is said to have GMA if no other design has less generalized aberration than it.

3. Lower bounds and optimality results

For given parameters $(n,s,q)$, the GMA criterion sequentially minimizes the GWP from low to high orders, so that the selected designs not only have the maximum resolution (say $r$) but also the smallest $A_r$-value. Furthermore, if there are multiple resolution-$r$ designs with the same smallest $A_r$, the GMA criterion sequentially reduces the set of candidates by minimizing $A_{r+1}(D), A_{r+2}(D), \ldots$, until the remaining candidates all share the same optimal GWP. Lower bounds on $A_j(D)$ for $j = 1, 2, 3, \ldots$ are therefore of crucial importance in the search for GMA designs. Delsarte [4] derived that $A_j(D) \ge 0$ for $1 \le j \le s$, where equality holds for all $j \le t$ if and only if $D(n,q^s)$ is an orthogonal array of strength $t$. Consider the set of candidate designs $D \in S_t \equiv \mathcal{OA}(n,s,q,t) \setminus \mathcal{OA}(n,s,q,t+1)$, for which $A_1(D) = \cdots = A_t(D) = 0$ while $A_{t+1}(D) > 0$. Schematically, in Fig. 1 the sets $S_t$ for $t = 1, 2, \ldots$ represent the resolution-$(t+1)$ rings from the outer to the inner areas. For $D \in S_t$, a tight lower bound for $A_{t+1}(D)$ has been lacking. This section presents lower bounds and optimality results for $S_t$ with $t = 1$ and $2$. We begin with a brief review of the weak-equidistance optimality for $S_1$, obtained by Zhang et al. [18] through a majorization inequality. Then we develop a general treatment for $S_2$ from a formal optimization perspective. It will be shown that the optimality results for $S_2$ can be viewed as a natural extension of those for $S_1$.

3.1. Balanced designs of resolution II

Zhang et al.
[18] provides the optimality results for U-type designs $D \in S_1(n,q^s)$, namely the weak-equidistance lower bound

$$A_{2}(D) \ge \frac{q^{2}}{2n}\Big((n-1)\,\lambda(\lambda + 2\varepsilon - 1) + s(s-1)\big(1 - n/q^{2}\big)\Big), \tag{6}$$
where $\lambda = \lfloor\lambda_0\rfloor$ and $\varepsilon = \{\lambda_0\}$ are based on the average of the pairwise coincidences,

$$\lambda_0 = \frac{1}{\binom{n}{2}}\sum_{1 \le i < k \le n} \delta_{ik}.$$
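Formulas (5) and (6) are straightforward to evaluate numerically, and bound (6) can be cross-checked against a brute-force computation of $A_2$. A small sketch (function names are mine; levels coded $0,\ldots,q-1$): for the regular $OA(4,3,2,2)$ the GWP is $(0,0,1)$, while a balanced $4$-run design whose third column duplicates the first has $A_2 = 1$, above the bound value $0$:

```python
from math import comb, floor

def gwp(design, q):
    """Generalized word-length pattern (A_1, ..., A_s) via formula (5)."""
    n, s = len(design), len(design[0])
    # delta[i][k]: number of factors on which runs i and k coincide
    delta = [[sum(a == b for a, b in zip(ri, rk)) for rk in design]
             for ri in design]
    def A(j):
        return sum((-1) ** w * (q - 1) ** (j - w)
                   * comb(s - d, w) * comb(d, j - w)
                   for row in delta for d in row for w in range(j + 1)) / n ** 2
    return [A(j) for j in range(1, s + 1)]

def a2_bound(design, q):
    """Weak-equidistance lower bound (6) on A_2 for balanced (U-type) designs."""
    n, s = len(design), len(design[0])
    coinc = [sum(a == b for a, b in zip(design[i], design[k]))
             for i in range(n) for k in range(i + 1, n)]
    lam0 = sum(coinc) / len(coinc)           # average pairwise coincidence
    lam, eps = floor(lam0), lam0 - floor(lam0)
    return q ** 2 / (2 * n) * ((n - 1) * lam * (lam + 2 * eps - 1)
                               + s * (s - 1) * (1 - n / q ** 2))

oa = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # resolution III
dup = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]  # duplicated column
```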
E-mail address:
[email protected]. 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.011
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
which guarantees the continuous embedding $B^{s_1}_{p_1 q_1}(\Omega) \hookrightarrow C(\bar\Omega)$. Second, we always assume that the embedding $B^{s_1}_{p_1 q_1}(\Omega) \to B^{s_2}_{p_2 q_2}(\Omega)$ is compact, which holds if and only if

$$s_1 - s_2 > d\Big(\frac{1}{p_1} - \frac{1}{p_2}\Big)_{+}.$$

We measure the worst case error of $S_n f$ by

$$\sup\{\|f - S_n f\,|\,B^{s_2}_{p_2 q_2}(\Omega)\| : \|f\,|\,B^{s_1}_{p_1 q_1}(\Omega)\| \le 1\}. \tag{1.2}$$
The same worst case error may also be considered for nonlinear sampling methods

$$S_n f = \varphi(f(x_1), \ldots, f(x_n)), \tag{1.3}$$

where $\varphi : \mathbb{C}^n \to B^{s_2}_{p_2 q_2}(\Omega)$ is an arbitrary mapping. In this paper, we discuss the decay of (1.2) for linear (1.1) and nonlinear (1.3) sampling methods. In some cases we restrict ourselves to the case $\Omega = I^d = (0,1)^d$. This allows us to describe the optimal sampling operator more explicitly. We conjecture, however, that many of these results can be generalised to general bounded Lipschitz domains.

Let $L_p(\Omega)$ stand for the usual Lebesgue space and $W^k_p(\Omega)$, $k \in \mathbb{N}$, for the classical Sobolev space over $\Omega$. Then it is well known that

$$\inf_{S_n} \sup\{\|f - S_n f\,|\,L_{p_2}(\Omega)\| : \|f\,|\,W^k_{p_1}(\Omega)\| \le 1\} \approx n^{-\frac{k}{d} + (\frac{1}{p_1} - \frac{1}{p_2})_{+}}, \tag{1.4}$$
where the infimum in (1.4) runs over all linear sampling operators $S_n$, see (1.1) (cf. [5] or [10]). The result remains true if we switch to the general situation where nonlinear methods $S_n$ are allowed. In [12], this statement has been proved for arbitrary bounded Lipschitz domains, but with the Sobolev spaces replaced by the more general scales of Besov and Triebel–Lizorkin spaces. The target space was always given by $L_{p_2}(\Omega)$. The proof given there uses the simple structure of the Lebesgue space. It is the main aim of this paper to generalise (1.4) and to investigate also other "target" spaces.

Let us present our main results. If $s_2 > 0$, then the quantity

$$\inf_{S_n} \sup\{\|f - S_n f\,|\,B^{s_2}_{p_2 q_2}(\Omega)\| : \|f\,|\,B^{s_1}_{p_1 q_1}(\Omega)\| \le 1\} \tag{1.5}$$

behaves like

$$n^{-\frac{s_1 - s_2}{d} + (\frac{1}{p_1} - \frac{1}{p_2})_{+}}$$

in both the linear and the nonlinear setting. We prove this result only for the special case $\Omega = (0,1)^d$. In this situation, however, we are able to give an explicit description of an order optimal operator, which we introduce now. Namely, if $n \approx 2^{kd}$, where $k \in \mathbb{N}$ is fixed, we use a smooth decomposition of unity $\{\psi_{k,\nu}\}_\nu$ such that $\sum_\nu \psi_{k,\nu}(x) = 1$ for $x \in (0,1)^d$, where the support of $\psi_{k,\nu}$ is concentrated around $2^{-k}\nu$. Then we approximate $f$ locally on $\mathrm{supp}\,\psi_{k,\nu}$ by a polynomial $g_{k,\nu}$ and define

$$S_n f = \sum_\nu g_{k,\nu}\,\psi_{k,\nu}.$$

To calculate each of the $(2^k+2)^d$ functions $g_{k,\nu}$ we need to combine $\binom{M+d-1}{d}$ function values of $f$ in a linear way. Altogether, we need $(2^k+2)^d\binom{M+d-1}{d} \approx 2^{kd} \approx n$ function values of $f$ to obtain
$S_n f$. Here, $M > s_1$ is a fixed natural number. The generalisation of this construction to bounded Lipschitz domains remains a subject of further study.

If $s_2 < 0$, we give the following characterisation of (1.5). If $p_1 \ge p_2$, or if $p_1 < p_2$ and $\frac{d}{p_2} - \frac{d}{p_1} > s_2$, then (1.5) decays like

$$n^{-\frac{s_1}{d}},$$

and if $p_1 < p_2$ and $0 > s_2 > \frac{d}{p_2} - \frac{d}{p_1}$, then (1.5) behaves like

$$n^{-\frac{s_1}{d} + \frac{s_2}{d} + \frac{1}{p_1} - \frac{1}{p_2}}.$$
All these results hold for linear as well as nonlinear methods $S_n$.

These estimates can be applied in connection with elliptic differential operators, which was the actual motivation for this research, cf. [6,7]. Let us briefly introduce this setting. Let

$$A : H \to G$$

be a bounded linear operator from a Hilbert space $H$ to another Hilbert space $G$. We assume that $A$ is boundedly invertible, hence $A(u) = f$ has a unique solution for every $f \in G$. A typical application is an operator equation, where $A$ is an elliptic differential operator and $A : H^{s}_{0}(\Omega) \to H^{-s}(\Omega)$, where $\Omega$ is a bounded Lipschitz domain, $H^{s}_{0}(\Omega)$ is a function space of Sobolev type with fractional order of smoothness $s > 0$ whose elements vanish on the boundary, and $H^{-s}(\Omega)$ is a function space of Sobolev type with negative smoothness $-s < 0$. The classical example is the Poisson equation

$$-\Delta u = f \ \text{in } \Omega \quad\text{and}\quad u = 0 \ \text{on } \partial\Omega.$$

Here, $s = 1$ and $A = -\Delta : H^{1}_{0}(\Omega) \to H^{-1}(\Omega)$ is bounded and boundedly invertible. We want to approximate the solution operator $u = S(f)$ using only function values of $f$. We define the $n$th linear sampling number of the identity $id : H^{-1+t}(\Omega) \to H^{-1}(\Omega)$ by

$$g_n^{\mathrm{lin}}(id : H^{-1+t}(\Omega) \to H^{-1}(\Omega)) = \inf_{S_n} \|id - S_n\,|\,\mathcal{L}(H^{-1+t}(\Omega), H^{-1}(\Omega))\|, \tag{1.6}$$

where $t$ is a positive real number with $-1 + t > \frac{d}{2}$, and the $n$th linear sampling number of $S : H^{-1+t}(\Omega) \to H^{1}(\Omega)$ by

$$g_n^{\mathrm{lin}}(S : H^{-1+t}(\Omega) \to H^{1}(\Omega)) = \inf_{S_n} \|S - S_n\,|\,\mathcal{L}(H^{-1+t}(\Omega), H^{1}(\Omega))\|. \tag{1.7}$$

The infima in (1.6) and (1.7) run over all linear operators $S_n$ of the form (1.1), and $\mathcal{L}(X,Y)$ stands for the space of bounded linear operators between two Banach spaces $X$ and $Y$, equipped with the classical operator norm.
It turns out that these quantities are equivalent (up to multiplicative constants which depend neither on $f$ nor on $n$) and are of the asymptotic order

$$g_n^{\mathrm{lin}}(S : H^{-1+t}(\Omega) \to H^{1}(\Omega)) \approx g_n^{\mathrm{lin}}(id : H^{-1+t}(\Omega) \to H^{-1}(\Omega)) \approx n^{-\frac{-1+t}{d}}.$$

We refer to [6,7] for a detailed discussion of this approach. The estimates of sampling numbers of embeddings between two function spaces therefore translate into estimates of sampling numbers of the solution operator $S$. We observe that the more regular $f$ is, the faster the linear sampling numbers of the solution operator $S$ decay. Let us also point out that optimal linear methods (not restricted to using only function values of $f$) achieve asymptotically a better rate of convergence, namely $n^{-\frac{t}{d}}$. Hence the limitation to sampling operators results in a serious restriction: one has to pay at least $n^{1/d}$ in comparison with optimal linear methods. Using our estimates of sampling numbers of identities between Besov and Triebel–Lizorkin spaces, this result may be generalised as follows.¹ If $p \ge 2$, $1 \le q \le \infty$ and $-1 + t > \frac{d}{p}$, then

$$g_n^{\mathrm{lin}}(S : B^{-1+t}_{pq}(\Omega) \to H^{1}(\Omega)) \approx g_n^{\mathrm{lin}}(id : B^{-1+t}_{pq}(\Omega) \to H^{-1}(\Omega)) \approx n^{-\frac{-1+t}{d}}.$$

If $p < 2$ with $\frac{1}{p} > \frac{1}{d} + \frac{1}{2}$, $1 \le q \le \infty$ and $-1 + t > \frac{d}{p}$, then

$$g_n^{\mathrm{lin}}(S : B^{-1+t}_{pq}(\Omega) \to H^{1}(\Omega)) \approx g_n^{\mathrm{lin}}(id : B^{-1+t}_{pq}(\Omega) \to H^{-1}(\Omega)) \approx n^{-\frac{t}{d} + \frac{1}{p} - \frac{1}{2}}.$$

Finally, if $p < 2$ with $\frac{1}{p} \le \frac{1}{d} + \frac{1}{2}$, $1 \le q \le \infty$ and $-1 + t > \frac{d}{p}$, then

$$g_n^{\mathrm{lin}}(S : B^{-1+t}_{pq}(\Omega) \to H^{1}(\Omega)) \approx g_n^{\mathrm{lin}}(id : B^{-1+t}_{pq}(\Omega) \to H^{-1}(\Omega)) \approx n^{-\frac{-1+t}{d}}.$$
We prove the same results also for the nonlinear sampling numbers $g_n(S)$. Altogether, the regularity information on $f$ may now be described by an essentially broader scale of function spaces.

All unimportant constants are denoted by the letter $c$, whose meaning may differ from one occurrence to another. If $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ are two sequences of positive real numbers, we write $a_n \lesssim b_n$ if, and only if, there is a positive real number $c > 0$ such that $a_n \le c\,b_n$, $n \in \mathbb{N}$. Furthermore, $a_n \approx b_n$ means that $a_n \lesssim b_n$ and simultaneously $b_n \lesssim a_n$.

2. Sampling numbers

The notation and basic facts about function spaces which we shall need later on are included in the Appendix. We now introduce the concept of sampling numbers.

Definition 2.1. Let $\Omega$ be a bounded Lipschitz domain. Let $G_1(\Omega)$ be a space of continuous functions on $\Omega$ and $G_2(\Omega) \subset D'(\Omega)$ be a space of distributions on $\Omega$. Suppose that the embedding $id : G_1(\Omega) \to G_2(\Omega)$ is compact.

¹ Although the results are stated only for Besov spaces, they are proved also for Triebel–Lizorkin spaces, which include the fractional Sobolev spaces as a special case.
For $\{x_j\}_{j=1}^{n} \subset \Omega$ we define the information map

$$N_n : G_1(\Omega) \to \mathbb{C}^n, \qquad N_n f = (f(x_1), \ldots, f(x_n)), \quad f \in G_1(\Omega).$$

For any (linear or nonlinear) mapping $\varphi_n : \mathbb{C}^n \to G_2(\Omega)$ we consider

$$S_n : G_1(\Omega) \to G_2(\Omega), \qquad S_n = \varphi_n \circ N_n.$$

(i) Then, for all $n \in \mathbb{N}$, the $n$th sampling number $g_n(id)$ is defined by

$$g_n(id) = \inf_{S_n} \sup\{\|f - S_n f\,|\,G_2(\Omega)\| : \|f\,|\,G_1(\Omega)\| \le 1\}, \tag{2.1}$$
where the infimum is taken over all $n$-tuples $\{x_j\}_{j=1}^{n} \subset \Omega$ and all (linear or nonlinear) $\varphi_n$.
(ii) For all $n \in \mathbb{N}$, the $n$th linear sampling number $g_n^{\mathrm{lin}}(id)$ is defined by (2.1), where now only linear mappings $\varphi_n$ are admitted.

2.1. The case $s_2 > 0$

In this section, we discuss the case where $\Omega = I^d = (0,1)^d$ is the unit cube, $G_1(\Omega) = A^{s_1}_{p_1 q_1}(\Omega)$ and $G_2(\Omega) = A^{s_2}_{p_2 q_2}(\Omega)$ with

$$s_1 > \frac{d}{p_1} \quad\text{and}\quad s_1 - d\Big(\frac{1}{p_1} - \frac{1}{p_2}\Big)_{+} > s_2 > 0.$$

Here, $A^{s}_{pq}(\Omega)$ stands either for a Besov space $B^{s}_{pq}(\Omega)$ or a Triebel–Lizorkin space $F^{s}_{pq}(\Omega)$, see Definition A.3 for details. We start with the simplest and most important case, namely $p_1 = p_2 = q_1 = q_2$.

Proposition 2.2. Let $\Omega = I^d = (0,1)^d$. Let $G_1(\Omega) = B^{s_1}_{pp}(\Omega)$ and $G_2(\Omega) = B^{s_2}_{pp}(\Omega)$ with $1 \le p \le \infty$,

$$s_1 > \frac{d}{p} \quad\text{and}\quad s_1 > s_2 > 0.$$

Then

$$g_n^{\mathrm{lin}}(id) \lesssim n^{-\frac{s_1 - s_2}{d}}.$$
Proof. First, we introduce the necessary notation. Let $a > 0$, $z \in \mathbb{R}^d$ and $U \subset \mathbb{R}^d$. Then

$$aU = \{ax : x \in U\} \quad\text{and}\quad z + aU = \{z + ax : x \in U\}. \tag{2.2}$$

Furthermore, if $k \in \mathbb{N}_0$ and $\nu \in \mathbb{Z}^d$, we set

$$Q_{k,\nu} = \{x \in \mathbb{R}^d : 2^{-k}\nu_i < x_i < 2^{-k}(\nu_i + 1)\},$$
$$Q^{k,\nu} = \big\{x \in I^d : 2^{-k}\big(\nu_i - \tfrac{1}{2}\big) < x_i < 2^{-k}\big(\nu_i + \tfrac{3}{2}\big)\big\}.$$

We point out that (up to a set of measure zero) $I^d = \bigcup\{Q_{k,\nu} : 0 \le \nu_i \le 2^k - 1,\ i = 1, 2, \ldots, d\}$. Next, we introduce a smooth decomposition of unity, first on $\mathbb{R}^d$ and then its restriction to $I^d$. Let $\tilde\psi \in \mathcal{S}(\mathbb{R}^d)$ with

$$\mathrm{supp}\,\tilde\psi \subset \big(-\tfrac{1}{2}, \tfrac{3}{2}\big)^d \quad\text{and}\quad \sum_{\nu \in \mathbb{Z}^d} \tilde\psi(x - \nu) = 1, \quad x \in \mathbb{R}^d.$$
Then we define

$$\psi_{k,\nu}(x) = \begin{cases} \tilde\psi(2^k x - \nu) & \text{if } x \in I^d, \\ 0 & \text{otherwise.} \end{cases} \tag{2.3}$$

Let us denote $A_k = \{-1, 0, \ldots, 2^k\}^d$. By (2.3), the following identities are true for every $k \in \mathbb{N}$:

$$\sum_{\nu \in A_k} \psi_{k,\nu}(x) = \sum_{\nu \in \mathbb{Z}^d} \psi_{k,\nu}(x) = \chi_{I^d}(x) = \begin{cases} 1 & \text{if } x \in I^d, \\ 0 & \text{otherwise,} \end{cases}$$

$$\mathrm{supp}\,\psi_{k,\nu} \subset Q^{k,\nu}, \quad \nu \in A_k.$$

Now we define linear approximation operators $\tilde S_k$. Take $f \in G_1(I^d)$ and consider the decomposition

$$f = \sum_{\nu \in A_k} f\,\psi_{k,\nu}.$$

To each $Q^{k,\nu}$ we associate $g_{k,\nu} \in \mathcal{P}^M(Q^{k,\nu})$ such that $g_{k,\nu}(2^{-k}\cdot)$ approximates $f(2^{-k}\cdot)$ on $2^k Q^{k,\nu}$ according to Corollary A.6, see the Appendix,

$$\|(f - g_{k,\nu})(2^{-k}\cdot)\,|\,B^{s_1}_{pp}(2^k Q^{k,\nu})\| \lesssim \bigg(\int_0^1 t^{-s_1 p}\,\big\|\big(d_t^{M,2^k Q^{k,\nu}} f(2^{-k}\cdot)\big)(x)\,\big|\,L_p(2^k Q^{k,\nu})\big\|^p\,\frac{dt}{t}\bigg)^{1/p}. \tag{2.4}$$

The operators $\tilde S_k : G_1(I^d) \to G_2(I^d)$ are defined by

$$\tilde S_k f = \sum_{\nu \in A_k} g_{k,\nu}\,\psi_{k,\nu}, \quad k \in \mathbb{N}. \tag{2.5}$$

Trivially, the right-hand side of (2.5) belongs to $G_1(I^d)$ and hence also to $G_2(I^d)$. The operators $\tilde S_k$ use $\binom{M+d-1}{d}\cdot(2^k + 2)^d \approx 2^{kd}$ points. So, it is enough to prove the estimate

$$\Big\|\sum_{\nu \in A_k}(f - g_{k,\nu})\psi_{k,\nu}\,\Big|\,B^{s_2}_{pp}(I^d)\Big\| \lesssim 2^{-k(s_1 - s_2)}\,\|f\,|\,B^{s_1}_{pp}(I^d)\|.$$

We use the dilation property (cf. [9, Proposition 2.2.1]) as well as the embedding $B^{s_1}_{pp}(\mathbb{R}^d) \hookrightarrow B^{s_2}_{pp}(\mathbb{R}^d)$ and obtain

$$\Big\|\sum_{\nu \in A_k}(f - g_{k,\nu})\psi_{k,\nu}\,\Big|\,B^{s_2}_{pp}(I^d)\Big\| \lesssim 2^{k(s_2 - \frac{d}{p})}\Big\|\sum_{\nu \in A_k}(f - g_{k,\nu})(2^{-k}\cdot)\,\psi_{k,\nu}(2^{-k}\cdot)\,\Big|\,B^{s_2}_{pp}(2^k I^d)\Big\| \lesssim 2^{k(s_2 - \frac{d}{p})}\Big\|\sum_{\nu \in A_k}(f - g_{k,\nu})(2^{-k}\cdot)\,\psi_{k,\nu}(2^{-k}\cdot)\,\Big|\,B^{s_1}_{pp}(2^k I^d)\Big\|. \tag{2.6}$$
We claim that

$$\Big\|\sum_{\nu \in A_k}(f - g_{k,\nu})(2^{-k}\cdot)\,\psi_{k,\nu}(2^{-k}\cdot)\,\Big|\,B^{s_1}_{pp}(2^k I^d)\Big\| \lesssim \bigg(\sum_{\nu \in A_k}\big\|(f - g_{k,\nu})(2^{-k}\cdot)\,\big|\,B^{s_1}_{pp}(2^k Q^{k,\nu})\big\|^p\bigg)^{1/p}. \tag{2.7}$$

To prove (2.7), we first decompose $A_k$ into $\bigcup_{\kappa=1}^{K} A_k^{\kappa}$, with the number $K \in \mathbb{N}$ independent of $k \in \mathbb{N}$, so that

$$\mathrm{dist}\big(\mathrm{supp}\,\psi_{k,\nu_1}(2^{-k}\cdot),\ \mathrm{supp}\,\psi_{k,\nu_2}(2^{-k}\cdot)\big) > 1 \tag{2.8}$$

for every $\nu_1 \ne \nu_2 \in A_k^{\kappa}$ and every $\kappa = 1, \ldots, K$. To every $\nu \in A_k$ we associate $E_\nu((f - g_{k,\nu})(2^{-k}\cdot))$ defined on $\mathbb{R}^d$ such that

$$E_\nu((f - g_{k,\nu})(2^{-k}x)) = (f - g_{k,\nu})(2^{-k}x), \quad x \in 2^k Q^{k,\nu}, \tag{2.9}$$

$$E_\nu((f - g_{k,\nu})(2^{-k}x)) = 0 \quad\text{if } x \in \mathrm{supp}\,\psi_{k,\mu}(2^{-k}\cdot) \text{ with } \mu \in A_k^{\kappa},\ \mu \ne \nu, \tag{2.10}$$

and

$$\|E_\nu((f - g_{k,\nu})(2^{-k}\cdot))\,|\,B^{s_1}_{pp}(\mathbb{R}^d)\| \le c\,\|(f - g_{k,\nu})(2^{-k}\cdot)\,|\,B^{s_1}_{pp}(2^k Q^{k,\nu})\|. \tag{2.11}$$

The existence of $E_\nu((f - g_{k,\nu})(2^{-k}\cdot))$ satisfying (2.9)–(2.11) follows directly from Definition A.3, possibly combined with some smooth cut-off function and the pointwise multiplier assertion, cf. [15, Theorem 2.8.2]. Denoting

$$\tilde\psi_{k,\nu}(x) = \tilde\psi(2^k x - \nu), \quad x \in \mathbb{R}^d,\ k \in \mathbb{N},\ \nu \in \mathbb{Z}^d,$$

we get

$$\Big\|\sum_{\nu \in A_k}(f - g_{k,\nu})(2^{-k}\cdot)\,\psi_{k,\nu}(2^{-k}\cdot)\,\Big|\,B^{s_1}_{pp}(2^k I^d)\Big\| \le \sum_{\kappa=1}^{K}\Big\|\sum_{\nu \in A_k^{\kappa}}(f - g_{k,\nu})(2^{-k}\cdot)\,\psi_{k,\nu}(2^{-k}\cdot)\,\Big|\,B^{s_1}_{pp}(2^k I^d)\Big\| \lesssim \sum_{\kappa=1}^{K}\Big\|\sum_{\nu \in A_k^{\kappa}} E_\nu((f - g_{k,\nu})(2^{-k}\cdot))\,\tilde\psi_{k,\nu}(2^{-k}\cdot)\,\Big|\,B^{s_1}_{pp}(\mathbb{R}^d)\Big\|. \tag{2.12}$$
By (2.8) and the so-called localisation property, cf. [16, Chapter 2.4.7], we may estimate the last expression from above by

$$\sum_{\kappa=1}^{K}\bigg(\sum_{\nu\in A_k^{\kappa}}\big\|E_\nu((f-g_{k,\nu})(2^{-k}\cdot))\,\tilde\psi_{k,\nu}(2^{-k}\cdot)\,\big|\,B^{s_1}_{pp}(\mathbb{R}^d)\big\|^{p}\bigg)^{1/p} \lesssim \bigg(\sum_{\kappa=1}^{K}\sum_{\nu\in A_k^{\kappa}}\big\|E_\nu((f-g_{k,\nu})(2^{-k}\cdot))\,\tilde\psi_{k,\nu}(2^{-k}\cdot)\,\big|\,B^{s_1}_{pp}(\mathbb{R}^d)\big\|^{p}\bigg)^{1/p} = \bigg(\sum_{\nu\in A_k}\big\|E_\nu((f-g_{k,\nu})(2^{-k}\cdot))\,\tilde\psi_{k,\nu}(2^{-k}\cdot)\,\big|\,B^{s_1}_{pp}(\mathbb{R}^d)\big\|^{p}\bigg)^{1/p}.$$

Together with Lemma A.7 and (2.11) this finally leads to

$$\Big\|\sum_{\nu\in A_k}(f-g_{k,\nu})(2^{-k}\cdot)\,\psi_{k,\nu}(2^{-k}\cdot)\,\Big|\,B^{s_1}_{pp}(2^k I^d)\Big\| \lesssim \bigg(\sum_{\nu\in A_k}\big\|E_\nu((f-g_{k,\nu})(2^{-k}\cdot))\,\big|\,B^{s_1}_{pp}(\mathbb{R}^d)\big\|^{p}\cdot\big\|\tilde\psi_{k,\nu}(2^{-k}\cdot)\,\big|\,B^{s_1}_{pp}(\mathbb{R}^d)\big\|^{p}\bigg)^{1/p} \lesssim \bigg(\sum_{\nu\in A_k}\big\|E_\nu((f-g_{k,\nu})(2^{-k}\cdot))\,\big|\,B^{s_1}_{pp}(\mathbb{R}^d)\big\|^{p}\bigg)^{1/p} \lesssim \bigg(\sum_{\nu\in A_k}\big\|(f-g_{k,\nu})(2^{-k}\cdot)\,\big|\,B^{s_1}_{pp}(2^k Q^{k,\nu})\big\|^{p}\bigg)^{1/p},$$
which finishes the proof of (2.7). We insert (2.7) into (2.6) and use (2.4) together with (A.4):

$$\Big\|\sum_{\nu \in A_k}(f - g_{k,\nu})\psi_{k,\nu}\,\Big|\,B^{s_2}_{pp}(I^d)\Big\| \lesssim 2^{k(s_2 - \frac{d}{p})}\bigg(\sum_{\nu \in A_k}\int_0^1 t^{-s_1 p}\,\big\|\big(d_t^{M,2^k Q^{k,\nu}} f(2^{-k}\cdot)\big)(x)\,\big|\,L_p(2^k Q^{k,\nu})\big\|^p\,\frac{dt}{t}\bigg)^{1/p} \lesssim 2^{k(s_2 - \frac{d}{p})}\bigg(\sum_{\nu \in A_k}\int_0^1 t^{-s_1 p}\,\big\|\big(d_{2^{-k}t}^{M,Q^{k,\nu}} f\big)(2^{-k}x)\,\big|\,L_p(2^k Q^{k,\nu})\big\|^p\,\frac{dt}{t}\bigg)^{1/p}.$$
The rest is done by direct substitutions and Theorem A.4:

$$\Big\|\sum_{\nu \in A_k}(f - g_{k,\nu})\psi_{k,\nu}\,\Big|\,B^{s_2}_{pp}(I^d)\Big\| \lesssim 2^{k(s_2 - s_1 - \frac{d}{p})}\bigg(\sum_{\nu \in A_k}\int_0^{2^{-k}} \tau^{-s_1 p}\,\big\|\big(d_\tau^{M,Q^{k,\nu}} f\big)(2^{-k}x)\,\big|\,L_p(2^k Q^{k,\nu})\big\|^p\,\frac{d\tau}{\tau}\bigg)^{1/p} \lesssim 2^{k(s_2 - s_1)}\bigg(\sum_{\nu \in A_k}\int_0^{2^{-k}} \tau^{-s_1 p}\,\big\|\big(d_\tau^{M,Q^{k,\nu}} f\big)(x)\,\big|\,L_p(Q^{k,\nu})\big\|^p\,\frac{d\tau}{\tau}\bigg)^{1/p} \lesssim 2^{-k(s_1 - s_2)}\bigg(\int_0^{2^{-k}} \tau^{-s_1 p}\,\big\|\big(d_\tau^{M,I^d} f\big)(x)\,\big|\,L_p(I^d)\big\|^p\,\frac{d\tau}{\tau}\bigg)^{1/p} \lesssim 2^{-k(s_1 - s_2)}\,\|f\,|\,B^{s_1}_{pp}(I^d)\|.$$
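The approximation operator $\tilde S_k$ is easy to prototype numerically. Below is a 1-D sketch ($d = 1$; names are mine, not the paper's): NumPy's least-squares polynomial fit stands in for the local polynomial approximation of Corollary A.6, and piecewise-linear hat functions stand in for the smooth partition of unity $\{\psi_{k,\nu}\}$. For a smooth $f$ the error drops as $k$ grows, consistent with the rate proved above.

```python
import numpy as np

def sample_operator(f, k, M=3):
    """S~_k in d = 1: local least-squares fits of degree < M on overlapping
    patches, blended by a hat-function partition of unity; ~2^k samples."""
    h = 2.0 ** -k
    nodes = np.arange(2 ** k + 1) * h            # nodes 2^{-k} * nu in [0, 1]
    fits = []
    for c in nodes:
        # sample f on the patch around the node (clipped to [0, 1])
        xs = np.linspace(max(c - h, 0.0), min(c + h, 1.0), 2 * M)
        fits.append(np.polyfit(xs, f(xs), M - 1))
    def Sn(x):
        x = np.asarray(x, dtype=float)
        num = np.zeros_like(x)
        den = np.zeros_like(x)
        for c, coef in zip(nodes, fits):
            w = np.maximum(0.0, 1.0 - np.abs(x - c) / h)   # hat at node c
            num += w * np.polyval(coef, x)
            den += w
        return num / den                         # the hats sum to 1 on [0, 1]
    return Sn

f = np.sin
x = np.linspace(0.0, 1.0, 501)
errs = [np.max(np.abs(f(x) - sample_operator(f, k)(x))) for k in (2, 4)]
```

Doubling $k$ quadruples the number of samples and shrinks the sup-norm error by roughly $2^{-kM}$ for this choice of local degree.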
Next we consider the case of general integrability and summability parameters.

Proposition 2.3. Let $\Omega = I^d = (0,1)^d$. Let $G_1(\Omega) = A^{s_1}_{p_1 q_1}(\Omega)$ and $G_2(\Omega) = A^{s_2}_{p_2 q_2}(\Omega)$ with $1 \le p_1, p_2, q_1, q_2 \le \infty$ ($p_1, p_2 < \infty$ in the $F$-case),

$$s_1 > \frac{d}{p_1} \quad\text{and}\quad s_1 - d\Big(\frac{1}{p_1} - \frac{1}{p_2}\Big)_{+} > s_2 > 0. \tag{2.13}$$

Then

$$g_n^{\mathrm{lin}}(id) \lesssim n^{-\frac{s_1 - s_2}{d} + (\frac{1}{p_1} - \frac{1}{p_2})_{+}}. \tag{2.14}$$

Proof. First, we deal with the case $p_1 = p_2 = p$ and $p \ne q_1$ and/or $p \ne q_2$. We use the well-known real interpolation formula, cf. [13,1,15,17],

$$B^{r}_{pq}(\mathbb{R}^d) = \big(B^{r_0}_{pp}(\mathbb{R}^d), B^{r_1}_{pp}(\mathbb{R}^d)\big)_{\theta,q}$$

and its counterpart

$$B^{r}_{pq}(I^d) = \big(B^{r_0}_{pp}(I^d), B^{r_1}_{pp}(I^d)\big)_{\theta,q}$$

for $1 \le p, q \le \infty$, $0 < \theta < 1$, $r_0 < r_1$, $r = (1-\theta)r_0 + \theta r_1$. If, for example, $p \ne q_2$, we find two different real numbers $s_2'$ and $s_2''$ such that

$$s_1 > s_2',\ s_2'' > 0, \qquad s_2 = (1-\theta)s_2' + \theta s_2''$$
and apply Proposition 2.2 to the embeddings $id' : B^{s_1}_{pp}(I^d) \to B^{s_2'}_{pp}(I^d)$ and $id'' : B^{s_1}_{pp}(I^d) \to B^{s_2''}_{pp}(I^d)$, which together with the interpolation formula factor the embedding $id : B^{s_1}_{pp}(I^d) \to B^{s_2}_{pq_2}(I^d)$. Using the same approximation operator $\tilde S_k$, we may interpolate the estimates for $\|f - \tilde S_k f\,|\,B^{s_2'}_{pp}(I^d)\|$ and $\|f - \tilde S_k f\,|\,B^{s_2''}_{pp}(I^d)\|$ and obtain (2.14). If also $p \ne q_1$, we proceed in the same way.

If $p_1 < p_2$, we define $s_0$ by

$$s_1 > s_0 := s_2 + d\Big(\frac{1}{p_1} - \frac{1}{p_2}\Big) > s_2 > 0$$

and use the chain of embeddings $B^{s_1}_{p_1 q_1}(I^d) \to B^{s_0}_{p_1 q_2}(I^d) \to B^{s_2}_{p_2 q_2}(I^d)$. The first embedding provides the estimate

$$g_n^{\mathrm{lin}}(id) \lesssim n^{-\frac{s_1 - s_0}{d}} = n^{-\frac{s_1 - s_2}{d} + \frac{1}{p_1} - \frac{1}{p_2}},$$

the second one is bounded. If $p_1 > p_2$, we use the embeddings $B^{s_1}_{p_1 q_1}(I^d) \to B^{s_2}_{p_1 q_2}(I^d) \to B^{s_2}_{p_2 q_2}(I^d)$. The second embedding is bounded; the first one together with Proposition 2.2 gives the result. This finishes the proof in the $B$-case. The $F$-case then follows through the trivial embeddings, cf. [15, 2.3.2],

$$F^{s_1}_{p_1 q_1}(I^d) \to B^{s_1}_{p_1,\infty}(I^d) \to B^{s_2}_{p_2,1}(I^d) \to F^{s_2}_{p_2 q_2}(I^d).$$
Theorem 2.4. Let $\Omega = I^d = (0,1)^d$. Let $G_1(\Omega) = A^{s_1}_{p_1 q_1}(\Omega)$ and $G_2(\Omega) = A^{s_2}_{p_2 q_2}(\Omega)$ with $1 \le p_1, p_2, q_1, q_2 \le \infty$ ($p_1, p_2 < \infty$ in the $F$-case) and (2.13). Then

$$g_n(id) \approx g_n^{\mathrm{lin}}(id) \approx n^{-\frac{s_1 - s_2}{d} + (\frac{1}{p_1} - \frac{1}{p_2})_{+}}. \tag{2.15}$$

Proof. According to Proposition 2.3, it is enough to prove that

$$g_n(id) \gtrsim n^{-\frac{s_1 - s_2}{d} + (\frac{1}{p_1} - \frac{1}{p_2})_{+}}. \tag{2.16}$$
We use the following simple observation (cf. [12, Proposition 20]). For $\Psi = \{x_j\}_{j=1}^{n} \subset \Omega$ we denote

$$G_1^{\Psi}(\Omega) = \{f \in G_1(\Omega) : f(x_j) = 0 \text{ for all } j = 1, \ldots, n\}.$$
Then

$$g_n(id) \approx \inf_{\Psi} \sup\{\|f\,|\,G_2(\Omega)\| : f \in G_1^{\Psi}(\Omega),\ \|f\,|\,G_1(\Omega)\| = 1\} \tag{2.17}$$
$$= \inf_{\Psi} \|id : G_1^{\Psi}(\Omega) \to G_2(\Omega)\|, \tag{2.18}$$

where both infima extend over all sets $\Psi = \{x_j\}_{j=1}^{n} \subset \Omega$.
To prove (2.16), we construct for every $\Psi = \{x_j\}_{j=1}^{2^{ld}}$, $l \in \mathbb{N}$, a function $\varphi_l \in G_1^{\Psi}(\Omega)$ with

$$\|\varphi_l\,|\,G_1(\Omega)\| \lesssim 1 \quad\text{and}\quad \|\varphi_l\,|\,G_2(\Omega)\| \gtrsim 2^{l\big(s_2 - s_1 + d(\frac{1}{p_1} - \frac{1}{p_2})_{+}\big)}, \tag{2.19}$$
where the constants of equivalence do not depend on $l \in \mathbb{N}$. We rely on the wavelet characterisation of the spaces $A^{s}_{pq}(\mathbb{R}^d)$, as described in [18, Section 3.1]. Let

$$\psi_F \in C^K(\mathbb{R}) \quad\text{and}\quad \psi_M \in C^K(\mathbb{R}), \quad K \in \mathbb{N},$$

be the Daubechies compactly supported $K$-wavelets on $\mathbb{R}$, with $K$ large enough. Then we define

$$\psi(x) = \prod_{i=1}^{d} \psi_M(x_i), \quad x = (x_1, \ldots, x_d) \in \mathbb{R}^d,$$

and

$$\psi_m^j(x) = \psi(2^j x - m), \quad j \in \mathbb{N}_0,\ m \in \mathbb{Z}^d.$$

Then the function

$$\varphi_j(x) = \sum_{m} \lambda_{jm}\,\psi_m^j(x), \quad j \in \mathbb{N}, \tag{2.20}$$

satisfies

$$\|\varphi_j\,|\,A^{s}_{pq}(\Omega)\| \approx 2^{j(s - \frac{d}{p})}\Big(\sum_{m} |\lambda_{jm}|^p\Big)^{1/p} \tag{2.21}$$

with constants independent of $j \in \mathbb{N}$ and of the sequence $\lambda = \{\lambda_{jm}\}$. The summation in (2.20) and (2.21) runs over those $m \in \mathbb{Z}^d$ for which the support of $\psi_m^j$ is included in $\Omega$. The proof of (2.21) is based on [18, Theorem 3.5]. First, this theorem tells us that the $A^{s}_{pq}(\Omega)$-norm of (2.20) may be estimated from above by the right-hand side of (2.21). On the other hand, considering another extension of $\varphi_j$ to $\mathbb{R}^d$ and its (unique) wavelet decomposition, we get the opposite inequality.

There is a number $k \in \mathbb{N}$ with the following property. For any $l \in \mathbb{N}$ and any $\Psi = \{x_j\}_{j=1}^{2^{ld}}$, there are $m_j \in \mathbb{Z}^d$, $j = 1, \ldots, 2^{ld}$, such that

$$\mathrm{supp}\,\psi_{m_j}^{k+l} \subset \Omega \quad\text{and}\quad \mathrm{supp}\,\psi_{m_j}^{k+l} \cap \Psi = \emptyset \quad\text{for } j = 1, \ldots, 2^{ld}.$$

Step 1: $p_1 \le p_2$. In this case, we take in (2.20) $\lambda_{k+l,m_1} = 2^{-(k+l)(s_1 - \frac{d}{p_1})}$ and $\lambda_{k+l,m_n} = 0$ for $n = 2, \ldots, 2^{ld}$, and apply (2.21) twice to verify (2.19).
Step 2: $p_1 > p_2$. In this case, we take $\lambda_{k+l,m_n} = 2^{-(k+l)s_1}$, $n = 1, \ldots, 2^{ld}$, in (2.20) and apply again (2.21) twice to prove (2.19).

2.2. The case $s_2 = 0$

In the case $s_2 = 0$, new phenomena come into play. First we point out that Lemma A.8 for $s = 0$ gives an immediate counterpart of (2.6), and this leads to the following result.

Theorem 2.5. Let $\Omega = I^d = (0,1)^d$. Let $id : G_1(\Omega) \to G_2(\Omega)$ with $G_1(\Omega) = B^{s}_{p_1 q_1}(\Omega)$, $G_2(\Omega) = B^{0}_{p_2 q_2}(\Omega)$ and $1 \le p_1, q_1, p_2, q_2 \le \infty$, $s > \frac{d}{p_1}$. Then

$$n^{-\frac{s}{d} + (\frac{1}{p_1} - \frac{1}{p_2})_{+}} \lesssim g_n(id) \le g_n^{\mathrm{lin}}(id) \lesssim n^{-\frac{s}{d} + (\frac{1}{p_1} - \frac{1}{p_2})_{+}}\,(1 + \log n)^{1/q_2}, \quad n \in \mathbb{N}. \tag{2.22}$$
If the target space is a Lebesgue space, this can be improved, cf. [12].

Theorem 2.6. Let $\Omega$ be a bounded Lipschitz domain in $\mathbb{R}^d$. Let $id : G_1(\Omega) = A^{s}_{pq}(\Omega) \to L_r(\Omega) = G_2(\Omega)$ with $1 \le p, q \le \infty$, $s > \frac{d}{p}$ and $1 \le r \le \infty$ ($p < \infty$ in the $F$-case). Then

$$g_n(id) \approx g_n^{\mathrm{lin}}(id) \approx n^{-\frac{s}{d} + (\frac{1}{p} - \frac{1}{r})_{+}}, \quad n \in \mathbb{N}.$$
Remark 2.7. We show by one example that the logarithmic factor cannot be removed in general. Let $\Omega = I^d = (0,1)^d$ and consider the embedding

$$id : B^{s}_{1,1}(\Omega) \to B^{0}_{1,1}(\Omega).$$

Take $\varphi \in \mathcal{S}(\mathbb{R}^d)$ with $\mathrm{supp}\,\varphi \subset \Omega$ and $\hat\varphi(0) \ne 0$. For every $k \in \mathbb{N}$ and every $\Psi = \{x_j\}_{j=1}^{n} \subset \Omega$, $n = 2^{kd}$, we set $f_k(x) = \varphi(2^{k+1}(x - x_0))$, where $x_0$ is chosen such that $\mathrm{supp}\,f_k \cap \Psi = \emptyset$ and $\mathrm{supp}\,f_k \subset \Omega$. We claim that

$$\|f_k\,|\,B^{s}_{1,1}(I^d)\| \le c\,2^{k(s-d)} \tag{2.23}$$

and

$$\|f_k\,|\,B^{0}_{1,1}(I^d)\| \ge c\,k\,2^{-kd}. \tag{2.24}$$
Combining (2.23) with (2.24), it follows that

g_n(id) ≈ g_n^{lin}(id) ≈ n^{−s/d} (1 + log n),  n ∈ N.
The proof of (2.23) follows directly from Lemma A.8. To prove (2.24), let l ∈ N be the smallest natural number such that

ψ̂(ξ) ≠ 0  for |ξ| ≤ 2^{−l},

and write for k ≥ 2l

‖f_k | B^0_{1,1}(I^d)‖ ≥ c ‖f_k | B^0_{1,1}(R^d)‖ = c Σ_{j=0}^∞ ‖(φ_j f̂_k)^∨ | L_1(R^d)‖
 ≥ c Σ_{j=0}^{k−l−1} ‖(φ_1(2^{−j} ·) 2^{(−k−1)d} ψ̂(2^{−k−1} ·) e^{−i⟨·, x^Γ⟩})^∨ | L_1(R^d)‖
 = c 2^{(−k−1)d} Σ_{j=0}^{k−l−1} ‖(φ_1(2^{−j} ·) ψ̂(2^{−k−1} ·))^∨ | L_1(R^d)‖
 = c Σ_{j=0}^{k−l−1} ‖(φ_1(2^{−j+k+1} ·) ψ̂(·))^∨ (2^{k+1} x) | L_1(R^d)‖
 = c 2^{(−k−1)d} Σ_{j=0}^{k−l−1} ‖(φ_1(2^{−j+k+1} ·) ψ̂(·))^∨ (x) | L_1(R^d)‖.  (2.25)
To estimate each of the summands from below, we consider the function

(φ_1(2^{−j+k+1} ·))^∨ = (φ_1(2^{−j+k+1} ·) · ψ̂(·) · φ_0(2^l ·)/ψ̂(·))^∨

and use Young's inequality to estimate its L_1-norm:

‖φ_1^∨ | L_1(R^d)‖ = ‖(φ_1(2^{−j+k+1} ·))^∨ | L_1(R^d)‖ ≤ ‖(φ_1(2^{−j+k+1} ·) ψ̂(·))^∨ | L_1(R^d)‖ · ‖(φ_0(2^l ·)/ψ̂(·))^∨ | L_1(R^d)‖.  (2.26)

Now, (2.24) is a combination of (2.25) and (2.26).

2.3. The case s_2 < 0

As the last case, we consider the situation s_2 < 0.

Theorem 2.8. Let Ω be a bounded Lipschitz domain in R^d. Let id : G_1(Ω) = A^{s_1}_{p_1 q_1}(Ω) → G_2(Ω) = A^{s_2}_{p_2 q_2}(Ω) with 1 ≤ p_1, p_2, q_1, q_2 ≤ ∞ (with p_1, p_2 < ∞ in the F-case) and

s_1 > d/p_1,  s_2 < 0.
If p_1 ≥ p_2, then

g_n(id) ≈ g_n^{lin}(id) ≈ n^{−s_1/d}.  (2.27)

If p_1 < p_2 and s_2 > d/p_2 − d/p_1, then

g_n(id) ≈ g_n^{lin}(id) ≈ n^{−s_1/d + s_2/d + 1/p_1 − 1/p_2}.  (2.28)

If p_1 < p_2 and d/p_2 − d/p_1 > s_2, then

g_n(id) ≈ g_n^{lin}(id) ≈ n^{−s_1/d}.  (2.29)
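As a consistency check (ours, not part of the original text), the exponent in (2.28) reduces to that of (2.29) at the boundary s_2 = d/p_2 − d/p_1:

```latex
\left.-\frac{s_1}{d}+\frac{s_2}{d}+\frac{1}{p_1}-\frac{1}{p_2}\right|_{s_2=\frac{d}{p_2}-\frac{d}{p_1}}
= -\frac{s_1}{d}+\Bigl(\frac{1}{p_2}-\frac{1}{p_1}\Bigr)+\frac{1}{p_1}-\frac{1}{p_2}
= -\frac{s_1}{d},
```

so the two regimes match continuously along the dividing line.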
Proof. Step 1: In this step, we prove two estimates from below. First, using the method from the proof of Theorem 2.4, we obtain

g_n^{lin}(id) ≥ g_n(id) ≳ n^{−(s_1−s_2)/d + 1/p_1 − 1/p_2}

exactly as in the case s_2 > 0. To prove the second estimate from below, namely

g_n^{lin}(id) ≥ g_n(id) ≳ n^{−s_1/d},  (2.30)
we proceed as follows. We rely on the atomic decomposition of the spaces A^{s_1}_{p_1 q_1}(R^d) as described in [18, Chapter 1.5]. For every set Γ ⊂ Ω with |Γ| = 2^{jd} we construct a function

ψ_j(x) = Σ_{m=1}^{M_j} λ_{jm} a_{jm}(x),  x ∈ R^d,

where M_j ≈ 2^{jd}, λ_{jm} = 2^{−jd/p_1} for m = 1, ..., M_j, and the a_{jm} are positive atoms in the sense of [18, Definition 1.15]. As s_1 > 0, no moment conditions are needed. We suppose that supp a_{jm} ∩ Γ = ∅ and supp a_{jm} ⊂ Ω. Altogether, we get

‖ψ_j | A^{s_1}_{p_1 q_1}(Ω)‖ ≤ ‖ψ_j | A^{s_1}_{p_1 q_1}(R^d)‖ ≲ 1

and

‖ψ_j | L_1(Ω)‖ = ∫_Ω ψ_j(x) dx ≈ ‖Σ_{m=1}^{M_j} λ_{jm} a_{jm} | L_1(R^d)‖ ≈ 2^{jd} · 2^{−jd/p_1} · 2^{−jd} · 2^{−j(s_1 − d/p_1)} = 2^{−js_1}.

Finally, we choose a non-negative function κ ∈ S(R^d) such that the mapping

f ↦ ∫ κ(x) f(x) dx

yields a bounded linear functional on A^{s_2}_{p_2 q_2}(Ω), supp κ ⊂ Ω and ∫ κ(x) ψ_j(x) dx ≈ ∫ ψ_j(x) dx. This leads to

2^{−js_1} ≈ ‖ψ_j | L_1(Ω)‖ ≈ ∫ κ(x) ψ_j(x) dx ≲ ‖ψ_j | A^{s_2}_{p_2 q_2}(Ω)‖.

Hence, (2.30) is proved, and it implies all estimates from below included in the theorem.
Step 2: If p_1 ≥ p_2 we use the following chain of embeddings:

A^{s_1}_{p_1 q_1}(Ω) → L_{p_1}(Ω) → A^{s_2}_{p_2 q_2}(Ω)  (2.31)

and obtain

g_n^{lin}(id) ≤ g_n^{lin}(id : A^{s_1}_{p_1 q_1}(Ω) → L_{p_1}(Ω)) · ‖id : L_{p_1}(Ω) → A^{s_2}_{p_2 q_2}(Ω)‖ ≲ n^{−s_1/d}.  (2.32)

If p_1 < p_2 and 0 > d/p_2 − d/p_1 > s_2, then (2.31) holds true as well and, consequently, also (2.32) remains true. If p_1 < p_2 and 0 > s_2 > d/p_2 − d/p_1, we define r > 0 by 1/r := −s_2/d + 1/p_2. It follows that p_1 < r < p_2. Using the embeddings

A^{s_1}_{p_1 q_1}(Ω) → L_r(Ω) → A^{s_2}_{p_2 p_2}(Ω)  (2.33)

we get

g_n^{lin}(id) ≤ g_n^{lin}(id : A^{s_1}_{p_1 q_1}(Ω) → L_r(Ω)) · ‖id : L_r(Ω) → A^{s_2}_{p_2 p_2}(Ω)‖ ≲ n^{−s_1/d + 1/p_1 − 1/r} = n^{−s_1/d + s_2/d + 1/p_1 − 1/p_2}.
This proves the upper estimate in (2.28) if p_2 = q_2. The general case then follows by interpolation, similar to the proof of Proposition 2.3.

2.4. Comparison with approximation numbers

In this closing part we wish to compare the sampling numbers of

id : B^{s_1}_{p_1 q_1}(Ω) → B^{s_2}_{p_2 q_2}(Ω)  (2.34)

for Ω = (0,1)^d with the corresponding approximation numbers. Let us first recall their definition.

Definition 2.9. Let A, B be Banach spaces and let T be a compact linear operator from A to B. Then for all n ∈ N the nth approximation number a_n(T) of T is defined by

a_n(T) = inf{‖T − L‖ : L ∈ L(A, B), rank L ≤ n},  (2.35)

where rank L is the dimension of the range of L. Obviously, a_n(id) represents the approximation of id by linear operators with the dimension of the range smaller than or equal to n, in general not restricted to involve only function values. Hence

a_n(id) ≤ g_n^{lin}(id),  n ∈ N.
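In the Hilbert-space (matrix) case the infimum in (2.35) is attained by truncating the singular value decomposition, so the spectral-norm error of the best rank-n approximant equals the (n+1)-st singular value (Eckart–Young). A short Python check of this special case (illustrative, not from the paper; all names are ours):

```python
import numpy as np

# Rank-n truncation of the SVD attains the infimum in (2.35) for matrices
# (Eckart-Young): the spectral-norm error equals the (n+1)-st singular value.
rng = np.random.default_rng(0)
T = rng.standard_normal((6, 6))
U, s, Vt = np.linalg.svd(T)   # singular values s are sorted in decreasing order
n = 2
T_n = U[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]   # best rank-n approximant
assert np.isclose(np.linalg.norm(T - T_n, 2), s[n])
```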
We again assume that

s_1 > d/p_1,  s_1 − s_2 > d (1/p_1 − 1/p_2)_+,  (2.36)

which ensures that (2.34) is compact and its sampling numbers are well defined. The approximation numbers of (2.34) are well known; we refer to [2,14,4,18] for details. We wish to discuss when the equivalence a_n(id) ≈ g_n^{lin}(id) holds true. The comparison of our results with the known results
for a_n(id) shows that this is the case if either
1. s_2 > 0 and 1 ≤ p_2 ≤ p_1 ≤ ∞, or
2. s_2 > 0 and 1 ≤ p_1 ≤ p_2 ≤ 2 or 2 ≤ p_1 ≤ p_2 ≤ ∞, or
3. 0 > s_2 > d(1/p_2 − 1/p_1) and 1 ≤ p_1 ≤ p_2 ≤ 2 or 2 ≤ p_1 ≤ p_2 ≤ ∞.

Acknowledgment

I would like to thank Erich Novak, Winfried Sickel, Hans Triebel and the anonymous referee for many valuable discussions and comments on the topic.

Appendix A. Function spaces on domains

A.1. Function spaces on R^d

We use standard notation: N denotes the collection of all natural numbers, R^d is the Euclidean d-dimensional space, where d ∈ N, and C stands for the complex plane. Let S(R^d) be the Schwartz space of all complex-valued, rapidly decreasing, infinitely differentiable functions on R^d and let S′(R^d) be its dual, the space of all tempered distributions. Furthermore, L_p(R^d) with 1 ≤ p ≤ ∞ are the Lebesgue spaces endowed with the norm

‖f | L_p(R^d)‖ = (∫_{R^d} |f(x)|^p dx)^{1/p},  1 ≤ p < ∞,
‖f | L_∞(R^d)‖ = ess sup_{x ∈ R^d} |f(x)|.
For ψ ∈ S(R^d) we denote by

ψ̂(ξ) = (F ψ)(ξ) = (2π)^{−d/2} ∫_{R^d} e^{−i⟨x, ξ⟩} ψ(x) dx,  ξ ∈ R^d,

its Fourier transform, and by ψ^∨ or F^{−1} ψ its inverse Fourier transform. We give a Fourier-analytic definition of Besov and Triebel–Lizorkin spaces, which relies on the so-called dyadic resolution of unity. Let φ ∈ S(R^d) with

φ(x) = 1 if |x| ≤ 1  and  φ(x) = 0 if |x| ≥ 3/2.  (A.1)

We put φ_0 = φ and φ_j(x) = φ(2^{−j} x) − φ(2^{−j+1} x) for j ∈ N and x ∈ R^d. This leads to the identity

Σ_{j=0}^∞ φ_j(x) = 1,  x ∈ R^d.

Definition A.1. (i) Let s ∈ R, 1 ≤ p, q ≤ ∞. Then B^s_{pq}(R^d) is the collection of all f ∈ S′(R^d) such that

‖f | B^s_{pq}(R^d)‖ = (Σ_{j=0}^∞ 2^{jsq} ‖(φ_j f̂)^∨ | L_p(R^d)‖^q)^{1/q} < ∞  (A.2)

(with the usual modification for q = ∞).
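The telescoping structure behind the identity Σ_j φ_j = 1 can be seen numerically. A minimal sketch (assuming a piecewise-linear stand-in for the C^∞ cutoff φ, which suffices to exhibit the telescoping; all function names are ours):

```python
import numpy as np

def phi(x):
    """Piecewise-linear stand-in for the cutoff in (A.1):
    phi = 1 for |x| <= 1 and phi = 0 for |x| >= 3/2."""
    return np.clip((1.5 - np.abs(x)) / 0.5, 0.0, 1.0)

def phi_j(x, j):
    """Dyadic pieces: phi_0 = phi, phi_j = phi(2^{-j} x) - phi(2^{-j+1} x)."""
    if j == 0:
        return phi(x)
    return phi(2.0 ** -j * x) - phi(2.0 ** (-j + 1) * x)

x = np.linspace(-100.0, 100.0, 2001)
# The partial sum telescopes to phi(2^{-12} x), which equals 1 on this range.
total = sum(phi_j(x, j) for j in range(13))
assert np.allclose(total, 1.0)
```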
(ii) Let s ∈ R, 1 ≤ p < ∞, 1 ≤ q ≤ ∞. Then F^s_{pq}(R^d) is the collection of all f ∈ S′(R^d) such that

‖f | F^s_{pq}(R^d)‖ = ‖(Σ_{j=0}^∞ 2^{jsq} |(φ_j f̂)^∨(·)|^q)^{1/q} | L_p(R^d)‖ < ∞  (A.3)
(with the usual modification for q = ∞).

Remark A.2. These spaces have a long history. In this context we recommend [13,15,16,18] as standard references. We point out that the spaces B^s_{pq}(R^d) and F^s_{pq}(R^d) are independent of the choice of φ in the sense of equivalent norms. Special cases of these two scales include Lebesgue spaces, Sobolev spaces, Hölder–Zygmund spaces and many other important function spaces. We omit any detailed discussion.

A.2. Function spaces on domains

Let Ω be a bounded domain. Let D(Ω) = C_0^∞(Ω) be the collection of all complex-valued infinitely differentiable functions with compact support in Ω and let D′(Ω) be its dual, the space of all complex-valued distributions on Ω. Let g ∈ S′(R^d). Then we denote by g|Ω its restriction to Ω:

g|Ω ∈ D′(Ω),  (g|Ω)(ϕ) = g(ϕ) for ϕ ∈ D(Ω).

Definition A.3. Let Ω be a bounded domain in R^d. Let s ∈ R, 1 ≤ p, q ≤ ∞ with p < ∞ in the F-case. Let A^s_{pq} stand either for B^s_{pq} or F^s_{pq}. Then

A^s_{pq}(Ω) = {f ∈ D′(Ω) : ∃ g ∈ A^s_{pq}(R^d) : g|Ω = f}

and

‖f | A^s_{pq}(Ω)‖ = inf ‖g | A^s_{pq}(R^d)‖,

where the infimum is taken over all g ∈ A^s_{pq}(R^d) such that g|Ω = f.

We collect some important properties of the spaces A^s_{pq}(Ω) which will be useful later on. For this reason, we have to restrict ourselves to bounded Lipschitz domains. We use a standard definition of the notion of Lipschitz domain; the reader may consult, for example, [18, Chapter 1.11.4].

Let x ∈ R^d, h ∈ R^d and M ∈ N. Then

(Δ_h^{M+1} f)(x) = Δ_h^1 (Δ_h^M f)(x)  with  (Δ_h^1 f)(x) = f(x + h) − f(x)
are the usual differences in R^d. For x ∈ Ω we consider the differences with respect to Ω:

(Δ_{h,Ω}^M f)(x) = (Δ_h^M f)(x) if x + lh ∈ Ω for l = 0, ..., M, and (Δ_{h,Ω}^M f)(x) = 0 otherwise.
We also need to adapt the classical ball means of differences to bounded domains. Let M ∈ N, t > 0, x ∈ Ω. Then we define

V^M(x, t) = {h ∈ R^d : |h| < t, x + τh ∈ Ω for 0 ≤ τ ≤ M}

and

d_t^{M,Ω} f(x) = t^{−d} ∫_{V^M(x,t)} |(Δ_h^M f)(x)| dh.

We shall also use the simple relation (cf. [12, (4.10)])

(d_t^{M,Ω} f(λ ·))(x) = (d_{λt}^{M,λΩ} f)(λx),  x ∈ Ω, 0 < λ, t < ∞.  (A.4)
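The restricted differences Δ^M_{h,Ω} can be sketched directly. The following Python helper (our naming; the domain Ω is an interval for simplicity) returns the M-th difference when all the nodes x + lh stay inside the domain, and 0 otherwise, exactly as in the definition above:

```python
from math import comb

def diff_M(f, x, h, M, domain=(0.0, 1.0)):
    """M-th difference Delta^M_{h,Omega} f(x) on an interval domain:
    returns 0 unless x + l*h stays inside the domain for l = 0, ..., M."""
    a, b = domain
    if not all(a <= x + l * h <= b for l in range(M + 1)):
        return 0.0
    # standard expansion of the iterated forward difference
    return sum((-1) ** (M - l) * comb(M, l) * f(x + l * h) for l in range(M + 1))
```

For instance, the second difference of t ↦ t² is exactly 2h², and the third difference of any quadratic vanishes, which is why moduli built from Δ^M detect smoothness only up to order M.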
The following theorem connects the classical definition of Besov and Triebel–Lizorkin spaces using differences with Definition A.3. We refer to [8] and [18, 1.11.9] for details and references on this topic.

Theorem A.4. Let Ω be a bounded Lipschitz domain in R^d. Let 1 ≤ p, q ≤ ∞ and 0 < s < M ∈ N. Then B^s_{pq}(Ω) is the collection of all f ∈ L_p(Ω) such that

‖f | L_p(Ω)‖ + (∫_0^1 t^{−sq} ‖d_t^{M,Ω} f | L_p(Ω)‖^q dt/t)^{1/q}

is finite (equivalent norms).

We shall also use that B^s_{pq}(R^d) is a multiplication algebra above the critical smoothness. Let 1 ≤ p, q ≤ ∞ and s > d/p. Then

‖h_1 · h_2 | B^s_{pq}(R^d)‖ ≤ c ‖h_1 | B^s_{pq}(R^d)‖ · ‖h_2 | B^s_{pq}(R^d)‖,

where the constant c does not depend on h_1 and h_2.

Finally, we consider the dilation operator T_k : f ↦ f(2^k ·), k ∈ N, and its behaviour on the scale of Besov spaces. For the proof, we refer to [3, 1.7; 9, 2.3.1].

Lemma A.8. Let s ≥ 0, 1 ≤ p, q ≤ ∞ and k ∈ N. Then the operator T_k is bounded on B^s_{p,q}(R^d)
and its norm is bounded by c 2^{k(s − d/p)} if s > 0 and by c 2^{−kd/p} (1 + k)^{1/q} if s = 0. The constant c does not depend on k ∈ N.
References

[1] J. Bergh, J. Löfström, Interpolation Spaces. An Introduction, Springer, Berlin, 1976.
[2] M.Sh. Birman, M.Z. Solomyak, Piecewise polynomial approximation of functions of the class W_p^α, Mat. Sb. (N.S.) 73 (1967) 331–355; English translation: Math. USSR Sb. 2 (1967) 295–317.
[3] G. Bourdaud, Sur les opérateurs pseudo-différentiels à coefficients peu réguliers, Habilitation thesis, Université de Paris-Sud, Paris, 1983.
[4] A.M. Caetano, About approximation numbers in function spaces, J. Approx. Theory 94 (1998) 383–395.
[5] P.G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, 1978.
[6] S. Dahlke, E. Novak, W. Sickel, Optimal approximation of elliptic problems by linear and nonlinear mappings I, J. Complexity 22 (2006) 29–49.
[7] S. Dahlke, E. Novak, W. Sickel, Optimal approximation of elliptic problems by linear and nonlinear mappings II, J. Complexity 22 (2006) 549–603.
[8] S. Dispa, Intrinsic characterisation of Besov spaces on Lipschitz domains, Math. Nachr. 260 (2003) 21–33.
[9] D.E. Edmunds, H. Triebel, Function Spaces, Entropy Numbers, Differential Operators, Cambridge University Press, Cambridge, 1996.
[10] S.N. Kudryavtsev, The best accuracy of reconstruction of finitely smooth functions from their values at a given number of points, Izv. Math. 62 (1) (1998) 19–53.
[11] W. Light, W. Cheney, A Course in Approximation Theory, Brooks/Cole, Pacific Grove, 1999.
[12] E. Novak, H. Triebel, Function spaces in Lipschitz domains and optimal rates of convergence for sampling, Constr. Approx. 23 (2006) 325–350.
[13] J. Peetre, New Thoughts on Besov Spaces, Duke University Mathematics Series, Duke University Press, Durham, 1976.
[14] V.M. Tikhomirov, Analysis II. Convex Analysis and Approximation Theory, Springer, Berlin, 1990.
[15] H. Triebel, Theory of Function Spaces, Birkhäuser, Basel, 1983.
[16] H. Triebel, Theory of Function Spaces II, Birkhäuser, Basel, 1992.
[17] H. Triebel, Function spaces in Lipschitz domains and on Lipschitz manifolds. Characteristic functions as pointwise multipliers, Rev. Mat. Complut. 15 (2002) 475–524.
[18] H. Triebel, Theory of Function Spaces III, Birkhäuser, Basel, 2006.
[19] H. Triebel, Sampling numbers and embedding constants, Trudy Mat. Inst. Steklov 248 (2005) 275–284.
Journal of Complexity 23 (2007) 793 – 801 www.elsevier.com/locate/jco
Quantum lower bounds by entropy numbers Stefan Heinrich∗ Department of Computer Science, University of Kaiserslautern, D-67653 Kaiserslautern, Germany Received 30 November 2006; accepted 30 January 2007 Available online 13 March 2007
Abstract We use entropy numbers in combination with the polynomial method to derive a new general lower bound for the nth minimal error in the quantum setting of information-based complexity. As an application, we improve some lower bounds on quantum approximation of embeddings between finite dimensional Lp spaces and of Sobolev embeddings. © 2007 Elsevier Inc. All rights reserved. Keywords: Quantum information-based complexity; Minimal quantum error; Lower bound; Entropy number
1. Introduction There is one major technique for proving lower bounds in the quantum setting of informationbased complexity (IBC) as introduced in [5]. It uses the polynomial method [1] together with a result on approximation by polynomials from [14]. This method has been applied in [5,9,19]. Other papers on the quantum complexity of continuous problems use this implicitly by reducing mean computation to the problem under consideration and then using the lower bound for mean computation of [14] directly [15,18,11,16]. This approach, however, does not work for the case of approximation of embedding operators in spaces with norms different from the infinity norm. To settle such situations, a more sophisticated way of reduction to known bounds was developed in [6], based on a multiplicativity property of the nth minimal quantum error. In this paper we introduce an approach which is new for the IBC quantum setting. We again use the polynomial method of [1], but combine it with methods related to entropy [4]. We derive lower bounds for the nth minimal quantum error in terms of certain entropy numbers. Similar ∗ Fax: +49 631 205 3270.
E-mail address:
[email protected] URL: http://www.uni-kl.de/AG-Heinrich. 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.01.007
794
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
ideas have been applied before in [17], the model and methods, however, being different; see also the related work [13]. As an application, we improve the lower bounds of [6,7] on approximation as well as those of [8] by removing the logarithmic factors. Let us also mention that a modification of the polynomial method based on trigonometric polynomials was used in [2,3] for proving lower bounds for a type of query different from that introduced in [5], the so-called power query [16]. Our method can also be applied in this setting and simplifies the analysis from [2,3]. We comment on this at the end of the paper.

2. Lower bounds by entropy

We work in the quantum setting of IBC as introduced in [5]. We refer to this paper extensively. Let D and K be nonempty sets, let F(D, K) denote the set of all functions on D with values in K, let F ⊆ F(D, K) be nonempty, and let G be a normed linear space. Let S be a mapping from F to G, the solution operator, which we seek to approximate. Let A be a quantum algorithm from F to G. The error of A at input f ∈ F is the smallest ε ≥ 0 such that with probability at least 3/4 the algorithm output A(f) is within distance ε of S(f); formally,

e(S, A, f) = inf{ε ≥ 0 : P{‖S(f) − A(f)‖ ≤ ε} ≥ 3/4}.

The error over the class F is then defined as

e(S, A, F) = sup_{f ∈ F} e(S, A, f).

For any subset C ⊆ G define the function p_C : F → R by

p_C(f) = P{A(f) ∈ C}  (f ∈ F),

the probability that the output of algorithm A at input f belongs to C. This quantity is well defined for all subsets C since the output of A takes only finitely many values, see [5]. Furthermore, define

P_{A,F} = span{p_C : C ⊆ G} ⊆ F(F, R)

to be the linear span of the functions p_C.

We need some notions related to entropy. We refer to [4] for the definitions. For a nonempty subset W of a normed space G and k ∈ N (we use the notation N = {1, 2, ...} and N_0 = {0, 1, 2, ...}) define the kth inner entropy number as

φ_k(W, G) = sup{ε : there exist u_1, ..., u_{k+1} ∈ W such that ‖u_i − u_j‖ ≥ 2ε for all 1 ≤ i ≠ j ≤ k + 1}.  (1)

It is worthwhile mentioning a related notion. The kth entropy number is defined to be

ε_k(W, G) = inf{ε : there exist g_1, ..., g_k ∈ G such that min_{1 ≤ i ≤ k} ‖g − g_i‖_G ≤ ε for all g ∈ W}.  (2)

Then

φ_k(W, G) ≤ ε_k(W, G) ≤ 2 φ_k(W, G),  (3)
see [4, relations (1.1.3) and (1.1.4)]. Also observe that the first numbers of both types are related to the radius and diameter of W as follows:

φ_1(W, G) = (1/2) diam(W, G),  ε_1(W, G) = rad(W, G).  (4)
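The quantities in (1)–(4) can be checked on a toy finite set. The following Python sketch (illustrative only; all names are ours, and `entropy_upper` restricts the covering centers to W itself, which gives an upper bound related to ε_k) computes φ_k and a covering radius by brute force:

```python
import itertools
import math

def inner_entropy(points, k):
    """phi_k of a finite set W: largest eps such that some k+1 points of W
    are pairwise at distance >= 2*eps (brute force, illustration only)."""
    best = 0.0
    for subset in itertools.combinations(points, k + 1):
        sep = min(math.dist(u, v) for u, v in itertools.combinations(subset, 2))
        best = max(best, sep / 2.0)
    return best

def entropy_upper(points, k):
    """Covering radius with k centers restricted to W; upper-bounds eps_k."""
    best = math.inf
    for centers in itertools.combinations(points, k):
        radius = max(min(math.dist(p, c) for c in centers) for p in points)
        best = min(best, radius)
    return best

pts = [(0.0,), (1.0,), (2.0,), (3.0,)]
assert inner_entropy(pts, 1) == 1.5   # phi_1 = half the diameter, cf. (4)
assert inner_entropy(pts, 1) <= entropy_upper(pts, 1)   # consistent with (3)
```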
Entropy numbers of bounded linear operators (that means, the entropy numbers of the image of the unit ball under the action of the operator), as well as their relation to various s-numbers and to eigenvalues, are well studied; see again [4] and references therein.

Our basic lemma relates the error e(S, A, F) of a quantum algorithm A from F to G to the dimension of P_{A,F} and the entropy of S(F) ⊆ G.

Lemma 1. (i) Let k ∈ N be such that

k + 1 > (log_2 5) dim P_{A,F}.  (5)

Then

e(S, A, F) ≥ φ_k(S(F), G).  (6)

(ii) If A is an algorithm without queries, then

e(S, A, F) ≥ φ_1(S(F), G).  (7)
Proof. The first part of the proof is the same for both cases. For case (i) we assume that k satisfies (5), while in case (ii) we set k = 1. Let f_1, ..., f_{k+1} ∈ F be arbitrary elements and put

ε = min{‖S(f_i) − S(f_j)‖ : 1 ≤ i ≠ j ≤ k + 1}.  (8)

It suffices to show that

e(S, A, F) ≥ ε/2.  (9)

For ε = 0 this is trivial, so we suppose ε > 0. We assume the contrary of (9), that is,

e(S, A, F) < ε/2.  (10)

By (8), the subsets V_i ⊂ G defined by

V_i = {g ∈ G : ‖S(f_i) − g‖ < ε/2}  (i = 1, ..., k + 1)  (11)

are disjoint. It follows from (10) and (11) that for i = 1, ..., k + 1

P{A(f_i) ∈ V_i} ≥ 3/4.  (12)

Let us first complete the proof of (ii): if A has no queries, its output does not depend on f ∈ F, and in particular, the distribution of the random variables A(f_1) and A(f_2) is the same. But then (12) implies P{A(f_1) ∈ V_1 ∩ V_2} ≥ 1/2, thus V_1 ∩ V_2 ≠ ∅, a contradiction, which proves (9) in case (ii).
Now we deal with case (i). Let C be the set of all C ⊂ G of the form

C = ∪_{i ∈ I} V_i

with I being any subset of {1, ..., k + 1}. Clearly,

|C| = 2^{k+1}.  (13)

Let P_{A,F} be endowed with the supremum norm

‖p‖_∞ = sup_{f ∈ F} |p(f)|.

We have

‖p_C‖_∞ ≤ 1  (C ∈ C).  (14)

Moreover,

‖p_{C_1} − p_{C_2}‖_∞ ≥ 1/2  (C_1 ≠ C_2 ∈ C).  (15)

Indeed, for C_1 ≠ C_2 ∈ C there is an i with 1 ≤ i ≤ k + 1 such that V_i ⊆ C_1 \ C_2 or V_i ⊆ C_2 \ C_1. Without loss of generality we assume the first. Then, because of (12), we have

p_{C_1}(f_i) = P{A(f_i) ∈ C_1} ≥ P{A(f_i) ∈ V_i} ≥ 3/4,

while

p_{C_2}(f_i) = P{A(f_i) ∈ C_2} ≤ P{A(f_i) ∈ G \ V_i} ≤ 1/4,

hence |p_{C_1}(f_i) − p_{C_2}(f_i)| ≥ 1/2, implying (15). For p ∈ P_{A,F} let B(p, r) be the closed ball of radius r around p in P_{A,F}. By (15) the balls B(p_C, 1/4) have disjoint interiors for C ∈ C. Moreover, by (14),

∪_{C ∈ C} B(p_C, 1/4) ⊆ B(0, 5/4).

A volume comparison gives

2^{k+1} = |C| ≤ 5^{dim P_{A,F}},

hence, taking logarithms, we get a contradiction to (5), which completes the proof.
Let e_n^q(S, F) denote the nth minimal quantum error, that is, the infimum of e(S, A, F) taken over all quantum algorithms A from F to G with at most n queries (see [5]). As an immediate consequence of Lemma 1, and also for later use, we note the following.

Corollary 1.

(1/2) diam(S(F), G) ≤ e_0^q(S, F) ≤ rad(S(F), G).  (16)
Proof. The lower bound follows from Lemma 1(ii) and (4). The upper bound is obtained by taking for any δ > 0 a point g ∈ G with

‖S(f) − g‖ ≤ rad(S(F), G) + δ  for all f ∈ F,

and then using the trivial algorithm which outputs g for all f ∈ F with probability 1.
Next we recall some facts from [5, Section 4]. Let L ∈ N and for each u = (u_1, ..., u_L) ∈ {0,1}^L let f_u ∈ F(D, K) be assigned such that the following is satisfied:

Condition I. For each t ∈ D there is an ℓ, 1 ≤ ℓ ≤ L, such that f_u(t) depends only on u_ℓ; in other words, for u, u′ ∈ {0,1}^L, u_ℓ = u′_ℓ implies f_u(t) = f_{u′}(t).

The following result was shown in [5, Corollary 2], based on the idea of the quantum polynomial method [1].

Lemma 2. Let L ∈ N and assume that (f_u)_{u ∈ {0,1}^L} ⊆ F(D, K) satisfies Condition I. Let n ∈ N_0 and let A be a quantum algorithm from F(D, K) to G with n quantum queries. Then for each subset C ⊆ G,

p_C(f_u) = p_C(f_{(u_1, ..., u_L)}),

considered as a function of the variables u_1, ..., u_L ∈ {0,1}, is a real multilinear polynomial of degree at most 2n.

Now we are ready to state the new lower bound on the nth minimal quantum error.

Proposition 1. Let D, K be nonempty sets, let F ⊆ F(D, K) be a nonempty set of functions, G a normed space, S : F → G a mapping, and L ∈ N. Suppose L̄ = (f_u)_{u ∈ {0,1}^L} ⊆ F(D, K) is a system of functions satisfying Condition I. Then

e_n^q(S, F) ≥ φ_k(S(F ∩ L̄), G)

whenever k, n ∈ N satisfy 2n ≤ L and

k + 1 > (log_2 5) (eL/(2n))^{2n}.  (17)
Proof. Let n ∈ N with 2n ≤ L and let A be a quantum algorithm from F to G with no more than n queries. Note that, by definition, a quantum algorithm from F ⊆ F(D, K) to G is always also a quantum algorithm from F(D, K) to G (see [5, p. 7]). We show that

e(S, A, F) ≥ φ_k(S(F ∩ L̄), G)  (18)

for all k ∈ N satisfying (17). Let M_{L,2n} be the linear space of real multilinear polynomials in L variables of degree not exceeding 2n. Since 2n ≤ L, its dimension is

dim M_{L,2n} = Σ_{i=0}^{2n} C(L, i) ≤ (eL/(2n))^{2n}  (19)
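The dimension bound (19) is easy to check numerically. A small Python sanity check (ours; it assumes nothing beyond the stated inequality Σ_{i ≤ m} C(L, i) ≤ (eL/m)^m for m = 2n ≤ L):

```python
from math import comb, e

def dim_multilinear(L, deg):
    """Dimension of the space of real multilinear polynomials in L variables
    of degree at most deg (deg <= L): a sum of binomial coefficients."""
    return sum(comb(L, i) for i in range(deg + 1))

for L, n in [(20, 2), (40, 5), (100, 10)]:
    deg = 2 * n
    assert dim_multilinear(L, deg) <= (e * L / deg) ** deg
```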
(see, e.g., [12, (4.7), p. 122] for the inequality). Set U = {u ∈ {0,1}^L : f_u ∈ F} and let M_{L,2n}(U) denote the space of all restrictions of functions from M_{L,2n} to U. Clearly,

dim M_{L,2n}(U) ≤ dim M_{L,2n}.  (20)

Define Φ : P_{A,F∩L̄} → F(U, R) by setting, for p ∈ P_{A,F∩L̄} and u ∈ U,

(Φ p)(u) = p(f_u).

Obviously, Φ is linear; moreover, for C ⊆ G,

(Φ p_C)(u) = p_C(f_u)  (u ∈ U).

By Lemma 2, p_C(f_u), as a function of u ∈ U, is the restriction of an element of M_{L,2n} to U. Hence Φ p_C ∈ M_{L,2n}(U), and by linearity and the definition of P_{A,F∩L̄} as the linear span of the functions p_C, we get Φ(P_{A,F∩L̄}) ⊆ M_{L,2n}(U). Furthermore, Φ is one-to-one, since {f_u : u ∈ U} = F ∩ L̄. Using (19) and (20) it follows that

dim P_{A,F∩L̄} ≤ dim M_{L,2n}(U) ≤ (eL/(2n))^{2n}.

Consequently, for k satisfying (17),

k + 1 > (log_2 5) dim P_{A,F∩L̄}.

Now (18) follows from Lemma 1.
3. Some applications

For N ∈ N and 1 ≤ p ≤ ∞, let L_p^N denote the space of all functions f : {1, ..., N} → R, equipped with the norm

‖f‖_{L_p^N} = ((1/N) Σ_{i=1}^N |f(i)|^p)^{1/p}  if p < ∞,  ‖f‖_{L_∞^N} = max_{1 ≤ i ≤ N} |f(i)|,

and let B(L_p^N) be its unit ball. Define J_{pq}^N : L_p^N → L_q^N to be the identity operator, J_{pq}^N f = f (f ∈ L_p^N).
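The normalized norms above can be written out directly. A small Python sketch (our naming); note that because of the 1/N normalization the norms increase with p, so the identity J_{pq}^N is a contraction for q ≤ p:

```python
def lp_N_norm(f, p):
    """Normalized norm of L_p^N for a sequence f of N reals."""
    N = len(f)
    if p == float("inf"):
        return max(abs(v) for v in f)
    return (sum(abs(v) ** p for v in f) / N) ** (1.0 / p)

f = [1.0, 0.0, 0.0, 0.0]
assert lp_N_norm(f, 1) == 0.25
assert lp_N_norm(f, 2) == 0.5
assert lp_N_norm(f, float("inf")) == 1.0
```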
As already mentioned, the lower bound for approximation of J_{pq}^N was obtained using a multiplicativity property of the nth minimal quantum error [6, Proposition 1]. The result involved some logarithmic factors of negative power [6, Proposition 6]. Based on Proposition 1 above we improve this bound by removing the logarithmic factors.

Proposition 2. Let 1 ≤ p, q ≤ ∞. There is a constant c > 0 such that for all n ∈ N_0 and N ∈ N with n ≤ cN,

e_n^q(J_{pq}^N, B(L_p^N)) ≥ 1/8.
Proof. It suffices to prove the case p = ∞, q = 1. We put L = N and f_u = u for u ∈ {0,1}^N. Clearly, the system L̄ = (f_u)_{u ∈ {0,1}^N} satisfies Condition I and

L̄ ⊂ B(L_∞^N).  (21)

Let {f_{u_i} : 1 ≤ i ≤ k + 1} be a maximal system in L̄ with

‖f_{u_i} − f_{u_j}‖_{L_1^N} ≥ 1/4  (1 ≤ i ≠ j ≤ k + 1),  (22)

i.e., a system which has no proper superset in L̄ satisfying (22). Maximality implies

{0,1}^N = ∪_{i=1}^{k+1} {u ∈ {0,1}^N : ‖f_u − f_{u_i}‖_{L_1^N} < 1/4}.
On the other hand,

|{u ∈ {0,1}^N : ‖f_u − f_{u_i}‖_{L_1^N} < 1/4}| = Σ_{0 ≤ j < N/4} C(N, j) ≤ (4e)^{N/4}  (i = 1, ..., k + 1),

hence

2^N ≤ (k + 1)(4e)^{N/4},  (23)

and consequently k + 1 ≥ 2^{c_1 N} with c_1 = 1 − (2 + log_2 e)/4 > 0; hence k ∈ N. From (22) we obtain

φ_k(J_{∞,1}^N(L̄), L_1^N) ≥ 1/8.  (24)
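The maximality/counting argument behind (23) can be replayed numerically for a small N. The following Python sketch (ours, illustrative only, N = 10) builds a greedy maximal system with pairwise L_1^N distance ≥ 1/4 and checks the covering inequality:

```python
import itertools
from math import comb

N = 10

def dist(u, v):
    """L_1^N distance between two 0/1 sequences (normalized Hamming distance)."""
    return sum(a != b for a, b in zip(u, v)) / N

# Greedy maximal system with pairwise L_1^N distance >= 1/4, as in (22).
code = []
for u in itertools.product((0, 1), repeat=N):
    if all(dist(u, c) >= 0.25 for c in code):
        code.append(u)

# By maximality, every point of {0,1}^N lies within distance < 1/4 of a
# codeword, and each such "ball" has sum_{j < N/4} C(N, j) points -- cf. (23).
ball = sum(comb(N, j) for j in range(-(-N // 4)))   # j = 0, ..., ceil(N/4)-1
assert len(code) * ball >= 2 ** N
```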
Consider the function g : (0, 1] → R,

g(x) = x (log_2 e + log_2(1/x)),

so that (eN/(2n))^{2n} = 2^{N g(2n/N)} for 2n ≤ N. It is elementary to check that g is monotonically increasing; moreover, g(x) → 0 as x → 0. Choose 0 < c_2 ≤ 1 in such a way that g(x) ≤ c_1/2 for 0 < x ≤ c_2. Then for n ≤ c_2 N/2 condition (17) is satisfied with k as above (for N large enough), and Proposition 1 together with (24) yields the claim.

Corollary 2. Let 1 ≤ p, q ≤ ∞ and r/d > max(1/p, 2/p − 2/q). Then there is a constant c > 0 such that for all n ∈ N,

e_n^q(J_{pq}, B(W_p^r([0,1]^d))) ≥ c n^{−r/d}.
Furthermore, the lower bounds from [6] were also used in [8, Proposition 3, Corollary 3]. Using Proposition 2, these results can be improved in a similar way. We omit the details. Let us finally comment on lower bounds for power queries introduced in [16]. An inspection of the proof of Lemma 1 shows that the type of query is not used at all in the argument, so the
statement also holds for power queries. One part of the argument in both [2,3] consists of proving that for a quantum algorithm with at most n power queries and for a suitable subset F_0 ⊆ F, which can be identified with the interval [0,1], the respective space P_{A,F_0} is contained in the (complex) linear span of the functions e^{2πiλt} (t ∈ [0,1]), with frequencies λ from a set of cardinality not greater than c^n for some c > 0; hence dim P_{A,F_0} ≤ 2c^n. Moreover, S(F_0) can also be identified with the unit interval. Now Lemma 1 above directly yields the logarithmic lower bounds of [2,3], since the kth inner entropy number of the unit interval is of order k^{−1}.

References

[1] R. Beals, H. Buhrman, R. Cleve, M. Mosca, R. de Wolf, Quantum lower bounds by polynomials, in: Proceedings of the 39th IEEE FOCS, 1998, pp. 352–361; see also http://arXiv.org/abs/quant-ph/9802049.
[2] A. Bessen, A lower bound for quantum phase estimation, Phys. Rev. A 71 (2005) 042313; see also http://arXiv.org/abs/quant-ph/0412008.
[3] A. Bessen, A lower bound for the Sturm–Liouville eigenvalue problem on a quantum computer, J. Complexity 22 (2006) 660–675; see also http://arXiv.org/abs/quant-ph/0512109.
[4] B. Carl, I. Stephani, Entropy, Compactness and the Approximation of Operators, Cambridge University Press, Cambridge, 1990.
[5] S. Heinrich, Quantum summation with an application to integration, J. Complexity 18 (2002) 1–50; see also http://arXiv.org/abs/quant-ph/0105116.
[6] S. Heinrich, Quantum approximation I. Embeddings of finite dimensional Lp spaces, J. Complexity 20 (2004) 5–26; see also http://arXiv.org/abs/quant-ph/0305030.
[7] S. Heinrich, Quantum approximation II. Sobolev embeddings, J. Complexity 20 (2004) 27–45; see also http://arXiv.org/abs/quant-ph/0305031.
[8] S. Heinrich, On the power of quantum algorithms for vector valued mean computation, Monte Carlo Methods Appl. 10 (2004) 297–310.
[9] S. Heinrich, E. Novak, On a problem in quantum summation, J. Complexity 19 (2003) 1–18; see also http://arXiv.org/abs/quant-ph/0109038.
[11] B. Kacewicz, Almost optimal solution of initial-value problems by randomized and quantum algorithms, J. Complexity 22 (2006) 676–690; see also http://arXiv.org/abs/quant-ph/0510045.
[12] J. Matoušek, Geometric Discrepancy. An Illustrated Guide, Springer, Berlin, 1999.
[13] A. Nayak, Optimal lower bounds for quantum automata and random access codes, in: Proceedings of the 40th IEEE FOCS, 1999, p. 369; see also http://arXiv.org/abs/quant-ph/9904093.
[14] A. Nayak, F. Wu, The quantum query complexity of approximating the median and related statistics, in: STOC, May 1999, pp. 384–393; see also http://arXiv.org/abs/quant-ph/9804066.
[15] E. Novak, Quantum complexity of integration, J. Complexity 17 (2001) 2–16; see also http://arXiv.org/abs/quant-ph/0008124.
[16] A. Papageorgiou, H. Woźniakowski, Classical and quantum complexity of the Sturm–Liouville eigenvalue problem, Quantum Inform. Process. 4 (2005) 87–127; see also http://arXiv.org/abs/quant-ph/0502054.
[17] Y. Shi, Entropy lower bounds for quantum decision tree complexity, Inform. Process. Lett. 81 (1) (2002) 23–27; see also http://arXiv.org/abs/quant-ph/0008095.
[18] J.F. Traub, H. Woźniakowski, Path integration on a quantum computer, Quantum Inform. Process. 1 (5) (2002) 365–388; see also http://arXiv.org/abs/quant-ph/0109113.
[19] C. Wiegand, Quantum complexity of parametric integration, J. Complexity 20 (2004) 75–96; see also http://arXiv.org/abs/quant-ph/0305103.
Journal of Complexity 23 (2007) 802 – 827 www.elsevier.com/locate/jco
On the complexity of the multivariate Sturm–Liouville eigenvalue problem

A. Papageorgiou∗
Department of Computer Science, Columbia University, New York, USA
Received 30 November 2006; accepted 12 March 2007; available online 28 March 2007

Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

We study the complexity of approximating the smallest eigenvalue of −Δ + q with Dirichlet boundary conditions on the d-dimensional unit cube. Here Δ is the Laplacian, and the function q is non-negative and has continuous first order partial derivatives. We consider deterministic and randomized classical algorithms, as well as quantum algorithms using quantum queries of two types: bit queries and power queries. We seek algorithms that solve the problem with accuracy ε. We exhibit lower and upper bounds for the problem complexity. The upper bounds follow from the cost of particular algorithms. The classical deterministic algorithm is optimal. Optimality is understood modulo constant factors that depend on d. The randomized algorithm uses an optimal number of function evaluations of q when d ≤ 2. The classical algorithms have cost exponential in d, since they need to solve an eigenvalue problem involving a matrix with size exponential in d. We show that the cost of quantum algorithms is not exponential in d, regardless of the type of queries they use. Power queries enjoy a clear advantage over bit queries and lead to an optimal complexity algorithm.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Eigenvalue problem; Eigenvalue approximation
1. Introduction

In a recent paper with Woźniakowski [21] we studied the classical and quantum complexity of the Sturm–Liouville eigenvalue problem. This paper extends those results to the multidimensional case. By analogy with the Sturm–Liouville eigenvalue problem [9] in one dimension, we consider the eigenvalue problem −Δu + qu = λu defined on the d-dimensional unit cube with Dirichlet boundary conditions. Here Δ is the d-dimensional Laplacian, and q is a non-negative function
∗ Fax: +1 212 666 0140.
E-mail address:
[email protected]. 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.002
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
803
of d variables whose first order partial derivatives exist and are continuous. Then we study the complexity of approximating the smallest eigenvalue λ(q) with accuracy ε. We assume that q is not explicitly known but we can sample it at any point of the unit cube. Any algorithm solving this problem will need to compute a number of evaluations of q and to combine them to obtain an approximation of the eigenvalue of interest.

Classical algorithms may be deterministic or randomized. The former evaluate q at deterministically chosen points, while the latter can sample q at randomly chosen points. Moreover, randomized algorithms may also combine the evaluations of q randomly. We obtain the worst case error of classical deterministic algorithms, and the worst expected error of randomized algorithms. We address the information cost of classical algorithms, i.e., the number of function evaluations the algorithms use, as well as their total cost by taking into account the additional cost of the operations that are used for combining the function evaluations. Accordingly, the minimal information cost of any algorithm solving the problem with accuracy ε is the information complexity of the problem, while the minimal total cost of any algorithm with error at most ε is the problem complexity. Clearly, the information complexity provides a lower bound for the problem complexity.

Quantum algorithms use quantum queries to evaluate q at deterministically chosen points. (Recently, quantum algorithms with randomized queries have been considered [30], but we do not deal with them in this paper.) The query information is combined using a number of quantum operations. Quantum algorithms succeed in producing an ε-approximation with probability, say, 3/4. The minimal number of queries of any algorithm solving the problem with accuracy ε is the query complexity.
The total cost of a quantum algorithm takes into account the additional quantum operations, excluding the ones used for queries, required to solve the problem with accuracy ε. We will distinguish between quantum algorithms using two types of queries, bit queries and power queries. Bit queries are oracle calls similar to those in Grover’s search algorithm [13]. Power queries are obtained by considering the propagator of the system at different time steps, as in phase estimation [19]. In some cases, quantum algorithms may be used to solve parts of the problem while other parts may be solved classically. In such a case we need to consider the cost of the classical and the quantum parts. The definition of the error of algorithms and the details of the model of computation in the different settings can be found in [21] but we will include them in this paper for the convenience of the reader. Turning to the eigenvalue problem we show a perturbation formula relating the eigenvalues (q) and (q) ¯ for two functions q and q, ¯ as in [21]. In particular, we show that (q) = (q) ¯ + ¯ u2q¯ (x) dx + O q − q ¯ 2∞ , (q(x) − q(x)) Id
where u_{q̄} is the eigenfunction that corresponds to λ(q̄). Using this equation we reduce the eigenvalue problem to the integration problem. For deterministic and randomized classical algorithms we use known lower bounds [25] for the information complexity of integration to obtain lower bounds for the information complexity of the eigenvalue problem. For upper bounds we study the cost of particular algorithms that approximate λ(q) with error ε. We show that by discretizing the continuous problem and solving the resulting matrix eigenvalue problem we obtain an optimal deterministic algorithm. Optimality is understood modulo multiplicative constants that depend on d. We derive a randomized algorithm using the perturbation formula above. Roughly speaking, the idea is to first approximate q by a
804
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
function q̄, and then to approximate the first two terms on the right-hand side of the perturbation formula. Using a matrix discretization we approximate λ(q̄), and using Monte Carlo (MC) we approximate the weighted integral. We derive the cost of the algorithm and show that it has optimal information complexity only when d ≤ 2. Proving the optimality of this algorithm for d > 2 is an open question at this time. In summary, denoting by n(ε) and comp(ε) the information complexity and the problem complexity, for deterministic algorithms we have

n(ε) = Θ(ε^{-d}),
Ω(c ε^{-d}) = comp(ε) = O(c ε^{-d} + ε^{-d} log ε^{-1}),

while for randomized algorithms we have

Ω(ε^{-2d/(d+2)}) = n(ε) = O(ε^{-max(2/3, d/2)}),
Ω(ε^{-2d/(d+2)}) = comp(ε) = O(c ε^{-max(2/3, d/2)} + ε^{-d} log ε^{-1}),

where the asymptotic constants depend on d, and c denotes the cost of one function evaluation. It is worth pointing out that even if one is able to obtain matching information complexity bounds for any d, the combinatorial cost (i.e., the number of operations excluding function evaluations) of the randomized algorithm is still exponential in d, because we have to solve a matrix eigenvalue problem and the size of the matrix is exponential in d. For quantum algorithms, we treat algorithms using bit queries and power queries separately. For quantum algorithms with bit queries we use the perturbation formula above to reduce the problem to integration. We obtain lower bounds for the query complexity of integration, which yield lower bounds for the query complexity of the eigenvalue problem. We can modify the classical randomized algorithm discussed above to obtain a hybrid algorithm, i.e., an algorithm with classical and quantum parts. The only difference from the randomized algorithm is that, instead of using MC, we approximate the weighted integral in the perturbation formula by a quantum algorithm. The quantum algorithm that approximates the integral is due to Novak [20].
We show that the number of queries plus the number of classical function evaluations of the hybrid algorithm matches the query complexity lower bound only when d = 1. Then q ∈ C^1([0,1]), while for q ∈ C^2([0,1]) the same result has been shown in [21]. When d > 1 we only show that the algorithm has an information cost, and uses a number of classical function evaluations, that is exponential in d. The cost of approximating q by q̄ with error ε is dominant in the worst case. As we already indicated, even if we are able to show matching upper and lower bounds for the query complexity that are also proportional to the classical information cost when d > 1, the number of classical operations required by the algorithm is still exponential in d, due to the cost of the matrix eigenvalue problem. However, there is a different quantum algorithm (without any classical parts) that uses bit queries and whose cost is not exponential in d. Indeed, we can use phase estimation to solve the problem. Phase estimation typically uses power queries [1,19], but we can approximate the power queries using a number of bit queries that is polynomial in ε^{-1}, where the degree of the polynomial is independent of d. Denoting the query complexity by n^{query}(ε), we show that for bit queries

Ω(ε^{-d/(d+1)}) = n^{query}(ε) = O(ε^{-6} log^2 ε^{-1}),
where the asymptotic constants depend on d. Moreover, the algorithm uses a number of quantum operations, excluding the queries, proportional to d ε^{-6} log^4 ε^{-1}, a number of qubits proportional to d log ε^{-1}, and it succeeds with probability at least 3/4. We remark that due to the results of [30] the number of qubits is optimal modulo multiplicative constants. Phase estimation with power queries has a considerable advantage, since n^{query}(ε) = Θ(log ε^{-1}), where the asymptotic constant is an absolute constant, and the lower bound follows using the results of [5,6]. The number of quantum operations, excluding queries, is proportional to log^2 ε^{-1}, the number of qubits is proportional to d log ε^{-1} and thereby optimal, while the algorithm succeeds with probability at least 3/4.

2. Problem definition

Let I_d = [0,1]^d and consider the class of functions

Q = { q : I_d → [0,1] : q, D_j q := ∂q/∂x_j ∈ C(I_d), ‖D_j q‖_∞ ≤ 1, j = 1, ..., d, ‖q‖_∞ ≤ 1 },

where ‖·‖_∞ denotes the supremum norm. For q ∈ Q, define L_q := −Δ + q, where Δ = Σ_{j=1}^d ∂^2/∂x_j^2 is the Laplacian, and consider the eigenvalue problem

L_q u = λu,  x ∈ (0,1)^d,   (1)
u(x) ≡ 0,  x ∈ ∂I_d.   (2)

In the variational form, the smallest eigenvalue λ = λ(q) of (1), (2) is given by

λ(q) = min_{0 ≠ u ∈ H_0^1} [ ∫_{I_d} ( Σ_{j=1}^d [D_j u(x)]^2 + q(x)u^2(x) ) dx ] / [ ∫_{I_d} u^2(x) dx ].   (3)
We study the complexity of classical and quantum algorithms approximating λ(q) with error ε. We show asymptotic bounds for the error of the algorithms and the problem complexity, assuming that d is fixed. Henceforth, all asymptotic constants in the error estimates, the complexity estimates and the cost of algorithms are either absolute constants or depend on d. We will often address these constants explicitly. In some cases their nature will be evident from the properties of the algorithm under consideration but, especially when the constants are omitted from the discussion, the reader may assume for simplicity that they depend only on d.
2.1. Preliminary analysis

The properties of the eigenvalues and eigenvectors of problems such as (1), (2) (defined on a rectangular domain) are discussed extensively in [23], where it is shown that the eigenfunctions are continuous and have continuous first-order partial derivatives, up to and including the boundary of I_d. The operator L_q is symmetric and its eigenvalues and eigenvectors are real. The eigenvalues are positive and can be indexed in non-decreasing order,

0 < λ_1(q) ≤ λ_2(q) ≤ ··· ≤ λ_k(q) ≤ ··· ,

and the sequence of eigenvalues tends to infinity. We denote the corresponding eigenvectors by u_{q,k}, k = 1, 2, .... The smallest eigenvalue λ(q) ≡ λ_1(q) is simple, the corresponding eigenspace has dimension one, and the eigenvector u_q ≡ u_{q,1} is uniquely determined up to sign. It is convenient to assume that the u_{q,k} are normalized, i.e.,

‖u_{q,k}‖_{L_2} := ( ∫_{I_d} u_{q,k}^2(x) dx )^{1/2} = 1,  k = 1, 2, ....
Thus they form a complete orthonormal system in L_2(I_d). Then (3) becomes

λ(q) = min_{u ∈ H_0^1, ‖u‖_{L_2} = 1} ∫_{I_d} ( Σ_{j=1}^d (D_j u)^2(x) + q(x)u^2(x) ) dx
     = ∫_{I_d} ( Σ_{j=1}^d (D_j u_q)^2(x) + q(x)u_q^2(x) ) dx.   (4)
For additional details concerning the properties of eigenvalues and eigenfunctions of elliptic operators, as well as numerical methods approximating them, see [2,9,11,12] and the references therein. For a constant function q ≡ c we know that

λ(c) = dπ^2 + c  and  u_c(x_1, ..., x_d) = 2^{d/2} Π_{j=1}^d sin(πx_j).
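This pair of closed-form expressions is easy to check numerically. The following is a minimal sketch (assuming NumPy; the grid size m and the constant c are illustrative choices) that discretizes L_c for d = 1 by standard second-order finite differences and compares the smallest eigenvalue of the resulting matrix with π^2 + c.

```python
import numpy as np

# Check λ(c) = π² + c for d = 1 and q ≡ c: discretize L_c = -d²/dx² + c on
# (0, 1) with zero boundary conditions, using m interior grid points.
m = 400
h = 1.0 / (m + 1)
c = 0.7

# -Δ_h: tridiagonal matrix with 2/h² on the diagonal and -1/h² off it.
lap = (np.diag(2.0 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
       - np.diag(np.ones(m - 1), -1)) / h**2
lam_min = np.linalg.eigvalsh(lap + c * np.eye(m))[0]

print(abs(lam_min - (np.pi**2 + c)))  # small discretization error, O(h²)
```

The discrepancy observed is the O(h^2) discretization error of the finite-difference scheme, consistent with the error estimates discussed later in the text.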
It is also known that the eigenvalues of L_q are non-decreasing functions of q [9,23], i.e., q(x) ≤ q̄(x) for all x ∈ [0,1]^d implies that λ_k(q) ≤ λ_k(q̄) for all k = 1, 2, .... Thus, using (4) for the class Q we get

dπ^2 = λ(0) ≤ λ(q) ≤ dπ^2 + 1,  q ∈ Q.

For d > 1, the eigenvalues of L_q are, in general, not all simple. However, as in the case d = 1, the smallest eigenvalue λ(q) is simple and is well separated from the remaining eigenvalues. This is because of the non-decreasing property of the eigenvalues of L_q with respect to q, and the fact that the second smallest eigenvalue of L_0 is equal to λ_2(0) = (d + 3)π^2. Therefore, using λ(q) ≤ dπ^2 + 1, we obtain

λ_k(q) − λ(q) ≥ λ_2(q) − λ(q) ≥ 3π^2 − 1,  k ≥ 2, q ∈ Q.   (5)
We will use this fact to establish an estimate for the smallest eigenvalue by considering a perturbation of q. For any two functions q, q̄ ∈ Q we have

|λ(q) − λ(q̄)| ≤ ‖q − q̄‖_∞,   (6)
‖u_q − u_{q̄}‖_{L_2} ≤ O(‖q − q̄‖_∞),   (7)
λ(q) = λ(q̄) + ∫_{I_d} (q(x) − q̄(x)) u_{q̄}^2(x) dx + O(‖q − q̄‖_∞^2).   (8)

Eqs. (6) and (8) are derived as in [21]. They follow from elementary arguments and (7). For the convenience of the reader we point out that it is easy to show that

λ(q) ≤ λ(q̄) + ∫_{I_d} (q(x) − q̄(x)) u_q^2(x) dx + ∫_{I_d} (q(x) − q̄(x)) (u_{q̄}^2(x) − u_q^2(x)) dx,

and similarly

λ(q̄) ≤ λ(q) + ∫_{I_d} (q̄(x) − q(x)) u_q^2(x) dx.

These inequalities imply (6), and using them with (7) we obtain (8). Moreover,

∫_{I_d} (q(x) − q̄(x)) (u_{q̄}^2(x) − u_q^2(x)) dx ≥ 0.
We prove Eq. (7) using an approach similar to that in [29], which is based on the separation between λ_2(q) and λ(q). It is a different proof from the one used in [21]. Indeed, let q and q̄ be two functions from the class Q and consider L_q and L_{q̄}. Let λ_k(q), u_{q,k} and λ_k(q̄), u_{q̄,k} be the eigenvalues and the normalized eigenvectors, k = 1, 2, ..., of L_q and L_{q̄}, respectively. Then

L_{q̄} u_q − λ(q) u_q = L_q u_q + (q̄ − q) u_q − λ(q) u_q,

which implies that

‖L_{q̄} u_q − λ(q) u_q‖_{L_2} = ‖(q̄ − q) u_q‖_{L_2} ≤ ‖q̄ − q‖_∞.

Since the eigenvectors of L_{q̄} form a complete orthonormal system in L_2(I_d) we have

u_q = Σ_{k=1}^∞ a_k u_{q̄,k}  with  ‖u_q‖_{L_2}^2 = Σ_{k=1}^∞ a_k^2 = 1,  a_k ∈ R,

and

L_{q̄} u_q = Σ_{k=1}^∞ a_k λ_k(q̄) u_{q̄,k}.

Thus

‖q − q̄‖_∞^2 ≥ ‖ Σ_{k=1}^∞ a_k [λ_k(q̄) − λ(q)] u_{q̄,k} ‖_{L_2}^2 = Σ_{k=1}^∞ a_k^2 |λ_k(q̄) − λ(q)|^2
  ≥ Σ_{k=2}^∞ a_k^2 |λ_k(q̄) − λ(q)|^2 ≥ (3π^2 − 1)^2 Σ_{k=2}^∞ a_k^2 = (3π^2 − 1)^2 (1 − a_1^2),
where the last inequality is due to the lower bound (5). Thus

a_1^2 ≥ 1 − (3π^2 − 1)^{-2} ‖q − q̄‖_∞^2.   (9)

Observe that the inequality above implies that a_1^2 ≥ 0.99. Without loss of generality we assume that the sign of u_q has been chosen so that a_1 > 0, and then

a_1 ≥ ( 1 − (3π^2 − 1)^{-2} ‖q − q̄‖_∞^2 )^{1/2}.   (10)

Also

‖u_q − u_{q̄}‖_{L_2}^2 = (1 − a_1)^2 + Σ_{k=2}^∞ a_k^2 = (1 − a_1)^2 + 1 − a_1^2 = 2(1 − a_1)
  ≤ 2 ( 1 − ( 1 − (3π^2 − 1)^{-2} ‖q − q̄‖_∞^2 )^{1/2} ) ≤ 2 (3π^2 − 1)^{-2} ‖q − q̄‖_∞^2,   (11)

where the last inequality is due to the fact that (3π^2 − 1)^{-2} ‖q − q̄‖_∞^2 ∈ (0, 1), and this proves (7). As a final remark, we observe that the same analysis that led to (9) can be used to establish that λ(q) is indeed a simple eigenvalue for any q ∈ Q. Clearly λ(0) is simple. Using (9) with q̄ = 0, we obtain that the square of the projection of u_0 onto u_q is bounded from below, i.e., that ( ∫_{I_d} u_0(x) u_q(x) dx )^2 > 1/2. If λ(q) were not simple and the eigenspace corresponding to it had dimension greater than one, there would be at least two orthogonal eigenfunctions u_{q,1} and u_{q,2} (both corresponding to λ(q)). Then each of the projections of u_{q,1} and u_{q,2} on u_0 would satisfy the preceding inequality (since λ(0) is simple). Thus, expanding u_0 using the eigenfunctions of L_q would lead us to conclude that ‖u_0‖_{L_2} > 1, a contradiction since we have assumed u_0 is a normalized eigenfunction.

3. Classical algorithms

Let us now discuss the type of classical algorithms we consider, and define how we measure their error and cost. These algorithms can be either deterministic or randomized. They use information about the functions q from Q by computing q(t_i) for some discretization points t_i ∈ [0,1]^d. Here i = 1, 2, ..., n_q, for some n_q, and the points t_i can be adaptively chosen, i.e., t_i can be a function t_i = t_i(t_1, q(t_1), ..., t_{i−1}, q(t_{i−1})) of the previously computed function values and points, for i ≥ 2. The number n_q can also be adaptively chosen; see, e.g., [25] for details. A classical deterministic algorithm produces an approximation φ(q) = φ(q(t_1), ..., q(t_{n_q})) to the smallest eigenvalue λ(q) based on finitely many values of q computed at deterministic points. Let n = sup_{q∈Q} n_q. We assume that n < ∞. The worst case error of such a deterministic
algorithm is given by

e^{wor}(φ, n) = sup_{q∈Q} |λ(q) − φ(q)|.   (12)

A classical randomized algorithm produces an approximation to λ(q) based on finitely many evaluations of q computed at random points, and has the form

φ(q) = φ_ω(q(t_{1,ω}), ..., q(t_{n_{q,ω},ω})),

where ω, t_{i,ω} and n_{q,ω} are random variables. We assume that the mappings ω → t_{i,ω} = t_i(t_{1,ω}, q(t_{1,ω}), ..., t_{i−1,ω}, q(t_{i−1,ω})), ω → φ_ω and ω → n_{q,ω} are measurable. Let n_q = E(n_{q,ω}) be the expected number of values of the function q with respect to ω. As before, we assume that n = sup_{q∈Q} n_q < ∞. The randomized error of such a randomized algorithm is given by

e^{ran}(φ, n) = sup_{q∈Q} ( E[λ(q) − φ(q)]^2 )^{1/2}.   (13)

We denote the minimal number of function values needed to compute an ε-approximation of the Sturm–Liouville eigenvalue problem in the worst case and randomized settings by

n^{wor}(ε) = min{ n : ∃φ such that e^{wor}(φ, n) ≤ ε }

and

n^{ran}(ε) = min{ n : ∃φ such that e^{ran}(φ, n) ≤ ε },

respectively. We refer to n^{wor}(ε) and n^{ran}(ε) as the worst case and the randomized information complexity, respectively. We also consider the cost of combining the function evaluations. For a function q ∈ Q, let m_q be the number of arithmetic operations used by an algorithm in order to combine n_q function values and obtain the final result. Then the worst case cost of an algorithm is defined as

cost^{wor}(φ) = sup_{q∈Q} ( c n_q + m_q ),

where c denotes the cost of one evaluation of q. The worst case complexity comp^{wor}(ε) is defined as the minimal cost of an algorithm whose worst case error is at most ε,

comp^{wor}(ε) = min{ cost^{wor}(φ) : φ such that e^{wor}(φ, n) ≤ ε }.

Obviously, comp^{wor}(ε) ≥ c n^{wor}(ε). The cost of a randomized algorithm using n = sup_{q∈Q} E(n_{q,ω}) < ∞ randomized function evaluations is defined as

cost^{ran}(φ) = sup_{q∈Q} ( E[ c n_{q,ω} + m_{q,ω} ]^2 )^{1/2},
where m_{q,ω} is the number of arithmetic operations used by the algorithm for a function q from Q and a random variable ω. The randomized complexity

comp^{ran}(ε) = min{ cost^{ran}(φ) : φ such that e^{ran}(φ, n) ≤ ε }

is the minimal cost of an algorithm whose randomized error is at most ε. Obviously, comp^{ran}(ε) ≥ c n^{ran}(ε).

3.1. Deterministic algorithms

In this section we derive lower and upper bounds for the error and the complexity of deterministic algorithms in the worst case. We begin with the lower bounds. Our derivation is based on the proof in [21], which deals with the case d = 1. Let q̄ ≡ 1/2 and consider q ∈ Q such that ‖q − 1/2‖_∞ ≤ c. Then u_{1/2} is known, and (8) becomes

λ(q) = dπ^2 + 1/2 + 2^d ∫_{I_d} ( q(x_1, ..., x_d) − 1/2 ) Π_{j=1}^d sin^2(πx_j) dx_1 ··· dx_d + δ,   (14)

where |δ| = O(c^2). Recall that δ ≤ 0 because the first three terms on the right-hand side of the equation overestimate λ(q), due to (4). Functions that differ by a constant satisfy the above equation with the same value of δ. Assume that c > 0 is sufficiently small so that c + |δ| < 1/2. We will reduce the eigenvalue problem to the multivariate integration problem and use the well-known [25] lower bounds for integration to establish a lower bound for the eigenvalue problem. Consider the class of functions

F_c = { f : I_d → R : f, D_j f ∈ C(I_d), ‖D_j f‖_∞ ≤ 1, j = 1, ..., d, ‖f‖_∞ ≤ c }   (15)

and the approximation of weighted integrals of the form

S(f) = ∫_{I_d} f(x_1, ..., x_d) Π_{j=1}^d sin^2(πx_j) dx_1 ··· dx_d.   (16)
The worst case error of any deterministic algorithm approximating such integrals using n points in I_d is Ω(n^{-1/d}), where the asymptotic constant depends on d. Here we assume that n is large enough so that c ≫ n^{-1/d}. This lower bound is known [25] for integration without weights, but the same proofs carry over to this case. Take an f ∈ F_c and set q = f + 1/2. Then q belongs to Q. The functions q ± δ also belong to the class Q because c + |δ| < 1/2. Let q̃ = q − δ. Then

λ(q̃) = dπ^2 + 1/2 + 2^d S(f).

Let λ̂(q̃) be an algorithm approximating λ(q̃) using n function evaluations of q̃ at deterministic points. Then

φ(f) = 2^{-d} ( λ̂(q̃) − dπ^2 − 1/2 )   (17)
is an algorithm approximating the weighted integral S(f) with error

|S(f) − φ(f)| = 2^{-d} |λ̂(q̃) − λ(q̃)|.

In the worst case with respect to f this quantity is Ω(n^{-1/d}). Hence, the error of any deterministic algorithm λ̂ that approximates λ(q), for q ∈ Q, using n evaluations of q is bounded from below as follows:

e^{wor}(λ̂, n) = sup_{q∈Q} |λ̂(q) − λ(q)| = Ω(n^{-1/d}),

where the asymptotic constant depends on d. Therefore, the worst case information complexity n^{wor}(ε) is bounded from below by a quantity proportional to ε^{-d}. Let us now consider upper bounds for the problem complexity. We discretize L_q at the points (i_1 h, ..., i_d h), i_j = 1, ..., m, j = 1, ..., d, where h = (m + 1)^{-1}, and we obtain an m^d × m^d matrix M_h(q) = −Δ_h + B_h(q), where −Δ_h is the m^d × m^d matrix resulting from the (2d + 1)-point finite difference discretization of the Laplacian [11,12]. The matrix B_h(q) is diagonal, containing the evaluations of q at all the discretization points. The matrix M_h(q) is sparse, symmetric positive definite, and its smallest eigenvalue approximates the smallest eigenvalue of L_q with error O(h) [26,27], i.e.,

|λ(q̄) − λ(M_h(q̄))| = O(h).

For example, when d = 2, −Δ_h = h^{-2} times the m^2 × m^2 block tridiagonal matrix with diagonal blocks T_h and off-diagonal blocks −I, where I is the m × m identity matrix, T_h is the m × m tridiagonal matrix with 4 on the diagonal and −1 on the sub- and superdiagonal, and

B_h(q̄) = diag(b_{11}, ..., b_{ij}, ..., b_{mm}),  b_{ij} = q̄(ih, jh),  i, j = 1, ..., m;

see [11, p. 270] for more details. The matrix −Δ_h has been extensively studied in the literature; see [11,12] and the references therein. Its eigenvalues and eigenvectors are known. The smallest eigenvalue of M_h(0) = −Δ_h is

λ(M_h(0)) = 4dh^{-2} sin^2(πh/2) = dπ^2 (1 + O(h^2)).

Moreover, the eigenvectors of M_h(0) are tensor products of the eigenvectors of the corresponding matrix in the one-dimensional case d = 1. This is also trivially true for the eigenvectors of M_h(c), where c is any constant. Using results concerning the eigenvalues of perturbed symmetric matrices [29], we have that the smallest eigenvalue λ(M_h(q)) of M_h(q) satisfies

|λ(M_h(0)) − λ(M_h(q))| ≤ 1.
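The construction just described can be sketched as follows for d = 2, using Kronecker sums (assuming NumPy; the grid size m and the function q are illustrative choices, and a dense eigensolver stands in for the sparse methods one would use in practice).

```python
import numpy as np

# Sketch of M_h(q) = -Δ_h + B_h(q) for d = 2.
m = 20
h = 1.0 / (m + 1)
x = np.arange(1, m + 1) * h

# 1-D second-difference matrix (without the h^{-2} factor): tridiag(-1, 2, -1).
T1 = (np.diag(2.0 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
      - np.diag(np.ones(m - 1), -1))
I = np.eye(m)

# -Δ_h = h^{-2} (T1 ⊗ I + I ⊗ T1); its diagonal blocks are the matrices T_h.
neg_lap = (np.kron(T1, I) + np.kron(I, T1)) / h**2

# B_h(q): diagonal matrix of q evaluated at the grid points (ih, jh).
q = lambda x1, x2: 0.5 + 0.25 * x1 * x2        # an arbitrary q with values in [0, 1]
X1, X2 = np.meshgrid(x, x, indexing="ij")
M = neg_lap + np.diag(q(X1, X2).ravel())

lam_min = np.linalg.eigvalsh(M)[0]
# Sanity check: λ(M_h(0)) = 4dh^{-2} sin²(πh/2) with d = 2.
lam0 = np.linalg.eigvalsh(neg_lap)[0]
print(lam_min, abs(lam0 - 8.0 * np.sin(np.pi * h / 2)**2 / h**2))
```

As the text notes, M_h(q) is sparse; one would exploit this (for instance via bisection on the positive definiteness of M_h(q) − λI, or a sparse eigensolver) instead of forming dense matrices.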
Moreover, the eigenvalues λ_k(M_h(q)), k = 1, ..., m^d (indexed in non-decreasing order), satisfy an inequality similar to (5), namely,

λ_k(M_h(q)) − λ(M_h(q)) ≥ λ_2(M_h(q)) − λ(M_h(q)) ≥ 3π^2 − 2,  k ≥ 2, q ∈ Q.   (18)

The inequalities follow from results concerning the eigenvalues of the sum of two symmetric matrices [29, p. 101] and the separation of the eigenvalues of the matrix M_h(0) = −Δ_h. We can approximate the smallest eigenvalue of M_h(q) with error h using the bisection method [11, p. 228] in O(log m) steps. Each step takes a number of arithmetic operations proportional to the number of non-zero elements of M_h(q), which is O(m^d), with the asymptotic constant depending on d. Hence, the total cost of approximating λ(M_h(q)) is O(m^d log m), where the asymptotic constant depends on d. Setting m + 1 = ε^{-1}, we obtain an algorithm that approximates λ(q) by the smallest eigenvalue of the matrix M_ε(q). This algorithm has error O(ε), uses ε^{-d} evaluations of q, and O(ε^{-d} log ε^{-1}) arithmetic operations. Combining the lower bound for n^{wor}(ε) from the first part of this section with the cost of the algorithm above, we obtain the following theorem.

Theorem 3.1.

n^{wor}(ε) = Θ(ε^{-d}),
Ω(c ε^{-d}) = comp^{wor}(ε) = O(c ε^{-d} + ε^{-d} log ε^{-1}),

where the asymptotic constants depend on d.

We conclude this section by remarking that we can extend these results about n^{wor}(ε) to the case where q has continuous and bounded partial derivatives up to order r. The same approach yields that n^{wor}(ε) = Θ(ε^{-d/r}). So we have a delayed curse of dimension.

3.2. Randomized algorithms

We first prove lower bounds for n^{ran}(ε), just as we proved lower bounds for n^{wor}(ε). We reduce the problem to multivariate integration and use the known randomized information complexity lower bounds for integration. Recall the perturbation formula (14), the definition (15) of the class F_c, and the weighted integration problem (16). Assuming that n is sufficiently large so that c ≫ n^{-(d+2)/(2d)}, we know [25] that the error of any randomized algorithm that approximates the weighted integral S(f) using n function evaluations at randomly chosen points is bounded from below by a quantity proportional to n^{-(d+2)/(2d)}. (As we already mentioned, this is known for integrals without weights, but the same proofs carry over to this case.) For f ∈ F_c, set q = f + 1/2 ∈ Q and q̃ = q − δ ∈ Q; see (14). Let λ̂ be any randomized algorithm that uses n function evaluations to approximate λ(q̃). Then φ(f), defined by replacing the deterministic algorithm with λ̂ in (17), is a randomized algorithm approximating S(f), and its error is
( E[S(f) − φ(f)]^2 )^{1/2} = 2^{-d} ( E[λ̂(q̃) − λ(q̃)]^2 )^{1/2}.

Taking the worst case with respect to f, we see that this quantity is Ω(n^{-(d+2)/(2d)}).
Therefore, for any randomized algorithm λ̂ that approximates λ(q), for q ∈ Q, using n function evaluations of q at randomly chosen points, we have

e^{ran}(λ̂, n) = Ω(n^{-(d+2)/(2d)}),

which implies that

n^{ran}(ε) = Ω(ε^{-2d/(d+2)}),

where the asymptotic constant depends on d. We now derive upper bounds for comp^{ran} by constructing an algorithm. First we take (n + 1)^d samples of q on a grid of equally spaced points (i_1/n, ..., i_d/n), i_j = 0, ..., n, j = 1, ..., d. Using these points we construct a piecewise polynomial q̃ by interpolation. For instance, q̃ can be a natural spline. Then ‖q̃ − q‖_∞ = O(n^{-1}). Setting q̄ = q̃ + O(n^{-1}) we have that q̄ ≥ 0 and ‖q̄ − q‖_∞ = O(n^{-1}). Clearly, given the evaluations of q, q̄ can be constructed with O(n^d) arithmetic operations. The perturbation formula (8) for q and q̄ becomes

λ(q) = λ(q̄) + ∫_{I_d} (q(x) − q̄(x)) u_{q̄}^2(x) dx + O(n^{-2}).   (19)
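The construction of q̄ from grid samples can be sketched as follows for d = 1 (assuming NumPy; piecewise-linear interpolation is used here as a simpler stand-in for the natural spline mentioned above, and the constant shift of size 1/n is an illustrative choice of the O(n^{-1}) term).

```python
import numpy as np

# Build q̄ from n+1 equispaced samples of a sample potential q.
q = lambda x: 0.5 + 0.4 * x * np.sin(np.pi * x)   # a smooth q with values in [0, 1]

n = 100
grid = np.linspace(0.0, 1.0, n + 1)
samples = q(grid)

q_tilde = lambda x: np.interp(x, grid, samples)   # piecewise-linear interpolant q̃
shift = 1.0 / n                                   # O(n^{-1}) shift, so q̄ = q̃ + O(n^{-1}) ≥ 0
q_bar = lambda x: q_tilde(x) + shift

xs = np.linspace(0.0, 1.0, 10_001)
err = np.max(np.abs(q_bar(xs) - q(xs)))
print(err)  # ‖q̄ - q‖_∞ = O(n^{-1})
```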
We will approximate λ(q) by an algorithm that

1. computes λ̂(q̄), an approximation of λ(q̄), by discretizing L_{q̄} and solving a matrix eigenvalue problem,
2. replaces u_{q̄} in the integral above by an approximate eigenfunction û_{q̄}, and
3. approximates the resulting integral by MC.

Therefore, (19) becomes

λ(q) = λ(q̄) + ∫_{I_d} (q(x) − q̄(x)) û_{q̄}^2(x) dx + ∫_{I_d} (q(x) − q̄(x)) (u_{q̄}^2(x) − û_{q̄}^2(x)) dx + O(n^{-2}),   (20)
and the algorithm approximates the first two terms on the right-hand side of this expression. In particular, the algorithm is given by

λ̃(q) := λ̂(q̄) + (1/k) Σ_{i=1}^k (q(t_{i,ω}) − q̄(t_{i,ω})) û_{q̄}^2(t_{i,ω}),   (21)

where t_{1,ω}, ..., t_{k,ω} are independent random points uniformly distributed in I_d. Then the expected error of this algorithm satisfies

( E[λ(q) − λ̃(q)]^2 )^{1/2} ≤ |λ(q̄) − λ̂(q̄)|
  + ( E[ ∫_{I_d} (q(x) − q̄(x)) û_{q̄}^2(x) dx − (1/k) Σ_{i=1}^k (q(t_{i,ω}) − q̄(t_{i,ω})) û_{q̄}^2(t_{i,ω}) ]^2 )^{1/2}
  + ‖u_{q̄} − û_{q̄}‖_{L_2} O(n^{-1}) + O(n^{-2}).   (22)
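The Monte Carlo step of (21) can be sketched as follows for d = 1 (assuming NumPy; q, q̄ and û_{q̄} are illustrative stand-ins chosen so that the exact value of the weighted integral is known).

```python
import numpy as np

# Estimate ∫ (q - q̄) û² dx from k uniform samples, as in algorithm (21).
rng = np.random.default_rng(7)

q = lambda x: 0.5 + 0.3 * np.sin(np.pi * x)
q_bar = lambda x: 0.5 + 0.3 * np.sin(np.pi * x) - 0.01   # so q - q̄ ≡ 0.01
u_hat = lambda x: np.sqrt(2.0) * np.sin(np.pi * x)       # normalized in L₂(0,1)

k = 200_000
t = rng.random(k)                                        # uniform points t_{i,ω}
mc = np.mean((q(t) - q_bar(t)) * u_hat(t)**2)

# Exact value: ∫ 0.01 · 2 sin²(πx) dx = 0.01.
print(mc, abs(mc - 0.01))
```

Since the integrand is bounded by ‖q − q̄‖_∞ times û^2, the statistical error scales as in (22): proportional to the L_2 norm of the integrand times k^{-1/2}.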
Let us now discuss the individual steps of the algorithm and the resulting errors. We discretize the operator L_{q̄} on a grid with mesh size h = (m + 1)^{-1}, exactly as we did in the previous section. The smallest eigenvalue λ(M_h(q̄)) of the resulting matrix approximates λ(q̄) with error

|λ(q̄) − λ(M_h(q̄))| = O(h);   (23)
see [27]. We approximate λ(M_h(q̄)) by λ̂(M_h(q̄)) with error

|λ̂(M_h(q̄)) − λ(M_h(q̄))| ≤ h,   (24)

which we obtain using the bisection method with cost proportional to m^d log m times a constant that depends on d. In (21), we set λ̂(q̄) := λ̂(M_h(q̄)). We now show how to construct the approximate eigenfunction û_{q̄} required for the second step of our algorithm. Let z = z(M_h(q̄)) be the eigenvector of M_h(q̄) that corresponds to λ(M_h(q̄)). We assume that z is normalized so that

‖z‖_2 := ( Σ_{k=1}^{m^d} z_k^2 )^{1/2} = 1.

Given λ̂(M_h(q̄)), we compute an approximation of z using inverse iteration with the matrix M_h(q̄) − λ̂(M_h(q̄)) I. We can compute the determinant of this matrix with cost proportional to m^d. If the matrix is singular, we can perturb λ̂(M_h(q̄)) by h to obtain a non-singular matrix. The initial vector in the inverse iteration is z_0, the eigenvector of M_h(0) that corresponds to its smallest eigenvalue. Observe that the separation of the eigenvalues of M_h(q̄), as expressed by (18), and arguments similar to those that led to (9), which can be found in [29, p. 172], yield that (z_0^T z)^2 ≥ 1 − ‖q̄‖_∞^2 / (3π^2 − 2)^2. Since the projection of the initial vector onto the eigenvector of interest is sufficiently large, with O(log m) inverse iteration steps we obtain an approximate eigenvector ẑ, with ‖ẑ‖_2 = 1, such that ‖ẑ − z‖_2 = O(h). The total cost to obtain ẑ is O(m^d log m). The Rayleigh quotient

λ_h = ẑ^T M_h(q̄) ẑ / ‖ẑ‖_2^2

also approximates λ(M_h(q̄)), with error O(h). Using ẑ we construct the approximate eigenfunction û_{q̄} of L_{q̄} by a method suggested by Courant [10] and used in [27]. In particular, we subdivide I_d into simplices whose vertices are the grid points. Then we construct a piecewise linear function on each simplex that is zero on the boundary of I_d and interpolates the values of ẑ at the grid points; see [27] for the details. We denote the interpolating function by ũ_{q̄}. The cost of constructing ũ_{q̄} is O(m^d). Consider now the Rayleigh quotient

λ̃ = ∫_{I_d} ( Σ_{j=1}^d [D_j ũ_{q̄}(x)]^2 + q̄(x) ũ_{q̄}^2(x) ) dx / ‖ũ_{q̄}‖_{L_2}^2   (25)

for the function ũ_{q̄}. From [27] we know that

λ(q̄) ≤ λ̃ ≤ λ_h + O(h).
815
Since |h − (Mh (q))| ¯ = O(h), the equation above and (23) imply that | − (q)| ¯ = O(h).
(26)
We set uˆ q¯ := u˜ q¯ /u˜ q¯ L2 with cost O(md ). Let us now estimate uq¯ − uˆ q¯ L2 . Consider the ∞ eigenvalues ( q) ¯ and eigenvectors u , k = 1, . . . , of L . Then we have u ˆ = k q,k ¯ q ¯ q ¯ ¯ , k=1 ak uq,k ∞ 2 where k=1 ak = 1. Thus from (25) we obtain =
∞
k (q)a ¯ k2 .
k=1
Equivalently, 0=
∞
ak2 [k (q) ¯ − ]
=
k=1
∞
ak2 [k (q) ¯ − ] − a12 [ − 1 (q)]. ¯
k=2
¯ we obtain Using (5) and (26) and the fact that (q) ¯ = 1 (q) a12 [ − (q)] ¯ (32 − 2)
∞
ak2 = (32 − 2)(1 − a12 ),
k=2
and using (26) again, we find that 1 − a12 = O(h). Hence, uq¯ − uˆ q¯ 2L2 = O(h).
(27)
The proof of the last equation is the same as the proof we used to derive (11) from Eq. (9). Recall that the algorithm (21) uses MC to approximate the first integral in (20). It is well known that MC with k function evaluations has error bounded from above by the L2 norm of the integrand times k −1/2 , i.e., the MC error does not exceed n−1 k −1/2 .
(28)
˜ Combining (22) with (23), (24), (27), (28) we obtain that the expected error of the algorithm (q), described in (21), is bounded from above by a quantity proportional to m−1 + n−1 k −1/2 + n−1 m−1/2 + n−2 .
(29)
The cost of this algorithm is equal to nd evaluations of q at deterministic points, plus k evaluations involving q (i.e., evaluations of (q − q) ¯ uˆ 2q¯ ), plus a number of arithmetic operations propord d tional to n + m log m + k times a constant that depends on d. Taking m−1 = ε and observing that we can take k = nd without changing the order of magnitude of the cost of the algorithm, expression (29) becomes ε + n−(d+2)/2 + n−1 ε 1/2 + n−2 .
(30)
The number of evaluations of q is proportional to nd and the number of arithmetic operations is proportional to nd + ε −d log ε−1 times a constant that depends on d.
The cost of approximating q by q̄ is proportional to n^d. It is worth noting that this is the dominant part of the algorithm cost. Indeed, even though we can approximate the first integral of (21) with high accuracy using MC with O(n^d) function evaluations, the advantages of this approximation are lost when n^{-(d+2)/2} = O(n^{-2}), since the eigenvalue error depends on O(n^{-2}), as seen in (30). Therefore, when d ≤ 2 we get error of order ε with ε^{-2d/(d+2)} function evaluations, while for d > 2 we get error of order ε with ε^{-d/2} function evaluations. In both cases the number of arithmetic operations is proportional to ε^{-d} log ε^{-1} times a constant that depends on d. We summarize the results of this section in the following theorem.

Theorem 3.2.

Ω(ε^{-2d/(d+2)}) = n^{ran}(ε) = O(ε^{-max(2/3, d/2)}),
Ω(ε^{-2d/(d+2)}) = comp^{ran}(ε) = O(c ε^{-max(2/3, d/2)} + ε^{-d} log ε^{-1}),

where the asymptotic constants depend on d.

When d > 2 we do not have matching upper and lower bounds for n^{ran}(ε), and improving the upper bound is an open problem at this time. One possibility would be to use a perturbation formula of higher order of accuracy. On the other hand, we see that if we consider functions that have continuous and bounded mixed partial derivatives up to order r, then our approach yields that

Ω(ε^{-2d/(2r+d)}) = n^{ran}(ε) = O(ε^{-max(2d/(2r+d), d/(2r))}),

which extends the range of values of d for which we do have matching upper and lower bounds to 1 ≤ d ≤ 2r.

4. Quantum algorithms

A quantum algorithm applies a sequence of unitary transformations to an initial state, and the final state is measured. See [3,8,14,19] for the details of the quantum model of computation. We briefly summarize this model to the extent necessary for this paper. The initial state |ψ_0⟩ is a unit vector of the ν-fold tensor product Hilbert space H_ν = C^2 ⊗ ··· ⊗ C^2, for some appropriately chosen integer ν, where C^2 is the two-dimensional space of complex numbers. The dimension of H_ν is 2^ν.
The number ν denotes the number of qubits used in the quantum computation. The final state |ψ⟩ is also a unit vector of H_ν and is obtained from the initial state |ψ_0⟩ by applying a number of unitary 2^ν × 2^ν matrices, i.e.,

|ψ⟩ := U_T Q_Y U_{T−1} Q_Y ··· U_1 Q_Y U_0 |ψ_0⟩.   (31)
Here, U_0, U_1, ..., U_T are unitary matrices that do not depend on the input function q. The unitary matrix Q_Y with Y = [q(t_1), ..., q(t_n)] is called a quantum query and depends on n (with n ≤ 2^ν) function evaluations of q computed at some non-adaptive points t_i ∈ I_d. The quantum query Q_Y is the only source of information about q. The integer T denotes the number of quantum queries we choose to use. At the end of the quantum algorithm, a measurement is applied to its final state |ψ⟩. The measurement produces one of M outcomes, where M ≤ 2^ν. Outcome j ∈ {0, 1, ..., M − 1} occurs with probability p_Y(j), which depends on j and the input Y. Knowing the outcome j, we compute an approximation λ̂_Y(j) of the smallest eigenvalue on a classical computer.
We now define the error in the quantum setting. In this setting, we want to approximate the smallest eigenvalue λ(q) with a probability p > 1/2. For simplicity, we take p = 3/4 in the rest of this section. As is common for quantum algorithms, we can achieve an ε-approximation with probability arbitrarily close to 1 by repeating the original quantum algorithm and taking the median as the final approximation. The local error of the quantum algorithm with T queries that computes λ̂_Y(j) for the function q ∈ Q and the outcome j ∈ {0, 1, ..., M − 1} is defined by

e(λ̂_Y, T) = min{ α : Σ_{j : |λ(q) − λ̂_Y(j)| ≤ α} p_Y(j) ≥ 3/4 }.

This can be equivalently rewritten as

e(λ̂_Y, T) = min_{A : μ(A) ≥ 3/4} max_{j ∈ A} |λ(q) − λ̂_Y(j)|,

where A ⊂ {0, 1, ..., M − 1} and μ(A) = Σ_{j ∈ A} p_Y(j).
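The definition of the local error can be traced with a small computation: sort the outcomes by their distance to λ(q) and accumulate probability until 3/4 is reached (the outcome values and probabilities below are purely illustrative).

```python
# Local error e(λ̂_Y, T): smallest α whose α-neighborhood of λ(q)
# captures measurement probability at least 3/4.
lam = 10.0                                    # true eigenvalue λ(q)
outputs = [9.2, 9.9, 10.05, 10.4, 12.0]       # λ̂_Y(j) for j = 0, ..., M-1
probs = [0.05, 0.35, 0.30, 0.20, 0.10]        # p_Y(j)

by_dist = sorted(zip(outputs, probs), key=lambda op: abs(op[0] - lam))
acc, err = 0.0, 0.0
for out, p in by_dist:
    acc += p
    err = abs(out - lam)
    if acc >= 0.75:
        break
print(err)
```

Here the set A realizing the minimum in the equivalent formulation consists of the outcomes closest to λ(q) whose total probability first reaches 3/4.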
The worst probabilistic error of a quantum algorithm λ̂ with T queries for the Sturm–Liouville eigenvalue problem is defined by

e^{quant}(λ̂, T) = sup{ e(λ̂_Y, T) : Y = [q(t_1), ..., q(t_n)], t_i ∈ [0,1]^d, for q ∈ Q }.   (32)

We define the query complexity n^{query}(ε) of a quantum algorithm by

n^{query}(ε) = min{ T : ∃λ̂ such that e^{quant}(λ̂, T) ≤ ε }.   (33)
Moreover, since we will be dealing with two types of queries, bit queries and power queries, we will use the notation n^{bit-query}(ε) and n^{power-query}(ε), respectively, to label the query complexity by the type of queries used. In principle, quantum algorithms may have many measurements applied between sequences of unitary transformations of the form presented above. However, any algorithm with many measurements and a total of T quantum queries can be simulated by a quantum algorithm with only one measurement at the end; for details see, e.g., [14]. Classical algorithms in floating or fixed point arithmetic can also be written in the form of (31). Indeed, all classical bit operations can be simulated by quantum computations; see, e.g., [4]. Classically computed function values will correspond to bit queries, which we discuss in the next section. We formally use the real number model of computation [24]. Since our eigenvalue problem is well conditioned and properly normalized, we obtain practically the same results in floating or fixed point arithmetic. More precisely, it is enough to use O(log ε^{-1}) mantissa bits, and the cost of bit operations in floating or fixed point arithmetic is of the same order as the cost in the real number model multiplied by a power of log ε^{-1}. Hybrid algorithms, which are combinations of classical and quantum algorithms, can be viewed as finite sequences of algorithms of the form (31) and can be expressed as one quantum algorithm of the form (31); see [14,15]. Consequently, when proving lower bounds it suffices to consider only algorithms of the form (31). For upper bounds it is sometimes convenient to distinguish between classical and quantum computations and to charge their costs differently. The cost of classical computations was defined in the previous section. The cost of quantum computations is defined as
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
the sum of the number of quantum queries multiplied by the cost of a query, plus the number of quantum operations other than queries. It is also important to indicate how many qubits are used by the quantum algorithm.

4.1. Bit queries

Quantum queries are important in the complexity analysis of quantum algorithms. A quantum query corresponds to a function evaluation in classical computation. By analogy with the complexity analysis of classical algorithms, we analyze the cost of quantum algorithms in terms of the number of quantum queries that are necessary to compute an ε-approximation with probability 3/4. Clearly, this number is a lower bound on the quantum complexity, which is defined as the minimal total cost of a quantum algorithm that solves the problem.

Different quantum queries have been studied in the literature. Probably the most commonly studied query is the bit query as used in Grover's search algorithm [13]. For a Boolean function f: {0, 1, ..., 2^m − 1} → {0, 1}, the bit query is defined by

Q_f |j⟩|k⟩ = |j⟩|k ⊕ f(j)⟩.

Here ν = m + 1, |j⟩ ∈ H_m, and |k⟩ ∈ H_1, with ⊕ denoting addition modulo 2. For real functions q the bit query is constructed by taking the most significant bits of the function evaluated at some points t_j. More precisely, as in [14], the bit query for the function q has the form

Q_q |j⟩|k⟩ = |j⟩|k ⊕ β(q(τ(j)))⟩,

where the number of qubits is now ν = m + m′ and |j⟩ ∈ H_m, |k⟩ ∈ H_m′, with some functions β: [0, 1] → {0, 1, ..., 2^m′ − 1} and τ: {0, 1, ..., 2^m − 1} → I^d, and ⊕ denotes addition modulo 2^m′. Hence, we compute q at t_j = τ(j) ∈ I^d and then take the m′ most significant bits of q(t_j) by β(q(t_j)); for details and a possible use of ancilla qubits see again [14].

The quantum amplitude amplification algorithm of Brassard et al. [7] computes the mean of a Boolean function defined on a set of N elements with accuracy ε and probability 3/4 using of order min{N, ε^-1} bit queries.
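The Boolean bit query Q_f above is simply a permutation of the computational basis: it XORs f(j) into the target qubit and leaves |j⟩ unchanged. The following pure-Python sketch (our own illustration; the index packing (j, k) ↦ 2j + k is our convention, not from the paper) checks the two defining properties: Q_f is its own inverse, and it flips the target qubit exactly when f(j) = 1.

```python
def bit_query(f, m):
    """Return the basis permutation induced by Q_f |j>|k> = |j>|k XOR f(j)>,
    for f: {0,...,2^m - 1} -> {0,1}. Basis state (j, k) is packed as 2*j + k."""
    size = 2 ** (m + 1)  # nu = m + 1 qubits in total
    perm = [0] * size
    for j in range(2 ** m):
        for k in (0, 1):
            perm[2 * j + k] = 2 * j + (k ^ f(j))
    return perm

def compose(p, q):
    """Composition of two basis permutations: (p o q)[i] = p[q[i]]."""
    return [p[x] for x in q]

# Hypothetical example: f marks the multiples of 3 among {0,...,7}.
f = lambda j: 1 if j % 3 == 0 else 0
Q = bit_query(f, 3)

identity = list(range(len(Q)))
involution = compose(Q, Q) == identity                # Q_f is self-inverse
flips = [Q[2 * j] == 2 * j + 1 for j in range(8)]     # target flipped iff f(j) = 1
```

Since ⊕ is addition modulo 2, applying the query twice restores the original state, which is why the same oracle serves for both "querying" and "uncomputing" in quantum algorithms.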
Modulo multiplicative factors, it is an optimal algorithm in terms of the number of bit queries. This algorithm can also be used to approximate the mean of a real function f: I^d → R with |f(x)| ≤ M, x ∈ I^d, see [14,20]. More precisely, if we want to approximate

S_N(f) := (1/N) Σ_{j=0}^{N−1} f(x_j)

for some x_j ∈ I^d and N, then the amplitude amplification algorithm QS_N(f) approximates S_N(f) such that

|S_N(f) − QS_N(f)| ≤ ε  (34)

with probability 3/4, using of order min(N, Mε^-1) bit queries, min(N, Mε^-1) log N quantum operations, and log N qubits.

We begin by showing a lower bound for the query complexity, n^bit-query(ε), of the eigenvalue problem. We do this by first estimating the bit query complexity, n^bit-query(ε, INT_{F_c}), of the weighted integration problem (16) in the class F_c, as defined in (15), and then reducing the eigenvalue problem to the integration problem.
From [20] we have n^bit-query(ε, INT_{F_c}) = O(ε^{−d/(d+1)}). Consider now any quantum algorithm that solves the integration problem with error ε and probability at least 3/4, using k bit queries. Let h(x_1, ..., x_d) = α ∏_{j=1}^d h_j(x_j) for (x_1, ..., x_d) ∈ I^d, where h_j(x) = x²(1 − x)², x ∈ [0, 1], and h(x_1, ..., x_d) = 0 for (x_1, ..., x_d) ∈ R^d \ I^d. Here, α is a constant such that h ∈ F_1, where F_1 is defined by (15) with c = 1. For each j = 1, ..., d and i = 0, ..., n − 1, let h_{i,j}(x) = h_j(n(x − i/n)). Then the support of h_{i,j} is [i/n, (i + 1)/n]. We obtain n^d functions on I^d. Each function is defined by

h_{i_1,...,i_d}(x_1, ..., x_d) = (α/n) ∏_{j=1}^d h_{i_j,j}(x_j),

and its support is the cube ∏_{j=1}^d [i_j/n, (i_j + 1)/n], i_j = 0, ..., n − 1. For notational convenience we re-index these functions, in any desirable way, and denote them by g_ℓ, ℓ = 0, ..., n^d − 1 (i.e., g_ℓ = h_{i_1,...,i_d}). Thus ‖g_ℓ‖_∞ = O(n^{−1}), and assuming that c is at least of order n^{−1} we have g_ℓ ∈ F_c. Then

∫_{I^d} g_ℓ(x) dx = n^{−d−1} ∫_{I^d} h(x) dx,  ℓ = 0, ..., n^d − 1.

Consider now any Boolean function B: {0, 1, ..., n^d − 1} → {0, 1} and define the function

f_B(x) = Σ_{ℓ=0}^{n^d − 1} B(ℓ) g_ℓ(x),  x ∈ I^d.

Then f_B ∈ F_c and

∫_{I^d} f_B(x) dx = ( ∫_{I^d} h(x) dx / n ) · (1/n^d) Σ_{ℓ=0}^{n^d − 1} B(ℓ).

Thus, computing the Boolean mean is reduced to computing the integral of f_B. From [18] we know that k < n^d bit queries yield error Ω(k^{−1}) in the approximation of the Boolean mean. Therefore, by setting k = γ n^d, γ ∈ (0, 1), we obtain that the error in approximating the integral of f_B is Ω(n^{−(d+1)}). Hence, for error ε we need k = Ω(ε^{−d/(d+1)}) bit queries. Using the upper bound of [20] we obtain n^bit-query(ε, INT_{F_c}) = Θ(ε^{−d/(d+1)}). This complexity bound remains valid if c depends on ε and c(ε) → 0 as ε → 0, but not very fast. Therefore, when c(ε)ε^{−1/(d+1)} → ∞ as ε → 0, the bit query complexity for integration in the class F_{c(ε)} is Θ(ε^{−d/(d+1)}).

Now that we have the bit query complexity for integration, we reduce the eigenvalue problem to integration and obtain a lower bound for the bit query complexity of the eigenvalue problem. This is done in exactly the same way as for classical deterministic and randomized algorithms. In particular, using Eq. (14) we see that any algorithm approximating λ(q̃) can be used to derive an
algorithm that solves the integration problem S(f) defined in (16), with f = q̃ − 1/2 belonging to the class F_c (15). We omit the details since we have already presented this argument twice. Therefore, solving the eigenvalue problem with error ε and probability at least 3/4 implies that we can solve the integration problem with error O(ε) and probability at least 3/4. Consequently, the bit query complexity n^bit-query(ε) of the eigenvalue problem is at least as large as the bit query complexity of the integration problem. We have proved the following theorem.

Theorem 4.1. n^bit-query(ε) = Ω(ε^{−d/(d+1)}).

To derive a quantum algorithm for the eigenvalue problem we can slightly modify the randomized algorithm we presented previously. The third and last step of the randomized algorithm approximates a weighted integral using MC. The quantum algorithm will approximate that integral using the amplitude amplification algorithm [7]. In particular, the quantum algorithm approximates the first two terms on the right-hand side of Eq. (20) by

λ̃(q) := λ̂(q̄) + φ((q − q̄) û²_q̄),  (35)

where, just as before, q̄ approximates q with error O(n^{−1}), λ̂(q̄) := λ(M_h(q̄)), h = (m + 1)^{−1}, while φ((q − q̄) û²_q̄) is the result of the amplitude amplification algorithm with T bit queries as applied in [20] for the approximation of the integral of (q − q̄) û²_q̄ in (20). Since ‖(q − q̄) û²_q̄‖_∞ = O(n^{−1}), with probability 3/4 the error of (35) is bounded from above by

|λ(q̄) − λ̂(q̄)| + O((nT)^{−1}) + ‖u_q̄ − û_q̄‖_{L₂} O(n^{−1}) + O(n^{−2}),

where the second term is the error of the quantum algorithm φ; see also (34). We have seen that |λ(q̄) − λ̂(q̄)| = O(m^{−1}) and ‖u_q̄ − û_q̄‖_{L₂} = O(m^{−1/2}). This yields an error proportional to

m^{−1} + (nT)^{−1} + n^{−1} m^{−1/2} + n^{−2}.

The algorithm uses n^d evaluations of q at deterministic points, plus a number of classical operations proportional to n^d + m^d log m times a constant that depends on d. The algorithm also uses T bit queries involving q, plus of order log² T + d log m quantum operations, excluding the cost of queries; for the details see [7,19]. Note that log² T operations are sufficient for the quantum implementation of the Fourier transform used in the amplitude amplification algorithm. The number of qubits is of order log T + d log m.

Setting m^{−1} = ε² and T = O(n^d), we get that the error of our algorithm is bounded from above by a quantity proportional to

ε² + n^{−(d+1)} + n^{−1} ε + n^{−2}.

Note that when d ≥ 2 we do not necessarily have to take as many as O(n^d) queries, since reducing the integration error does not reduce the upper bound of the algorithm error, which still depends on n^{−2}. However, taking T = O(n^d) does not change the order of magnitude of the cost of the algorithm. The dominant component of the cost of the algorithm is the n^d classical function evaluations required for the approximation of q by q̄. Finally, setting n = ε^{−1/2} yields error O(ε).
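The parameter choice concluding this argument can be sanity-checked numerically. The sketch below (our own illustration) plugs m^{−1} = ε² and n = ε^{−1/2} into the four error terms m^{−1} + (nT)^{−1} + n^{−1} m^{−1/2} + n^{−2} with T = n^d, and verifies that each term is O(ε).

```python
def error_terms(eps, d):
    """Evaluate the four error terms of the hybrid algorithm for the
    parameter choice m^{-1} = eps^2, n = eps^{-1/2}, T = n^d."""
    m_inv = eps ** 2                    # m^{-1} = eps^2
    n = eps ** -0.5                     # n = eps^{-1/2}
    T = n ** d                          # T = O(n^d) bit queries
    return [m_inv,                      # matrix-eigenvalue term
            1.0 / (n * T),              # amplitude-amplification term
            (1.0 / n) * m_inv ** 0.5,   # eigenvector-error term
            n ** -2.0]                  # n^{-2} discretization term

terms = error_terms(1e-4, d=2)
```

For ε = 10⁻⁴ and d = 2 the dominant term is n^{−2} = ε, exactly as the text claims; the other three terms are strictly smaller.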
Theorem 4.2. The eigenvalue problem can be solved with probability 3/4 and error O(ε) by the hybrid algorithm (35). This algorithm uses

• ε^{−d/2} classical function evaluations,
• ε^{−2d} log ε^{−1} (times a constant that depends on d) classical arithmetic operations,
• ε^{−d/2} bit queries,
• d² log² ε^{−1} (times a constant independent of d) quantum operations, excluding queries (mostly used for the quantum implementation of the Fourier transform),
• and a number of qubits proportional to d log ε^{−1}.
We see that the number of bit queries used by this algorithm matches the bit query complexity only when d = 1. Perhaps, as in the case of the randomized algorithm, we can improve this situation and obtain matching upper and lower bounds for the number of bit queries using a perturbation formula with higher order terms. This is an open question at this time. Nevertheless, even if this question has a positive answer, the number of arithmetic operations will remain exponential in d; this is also true for the deterministic and randomized algorithms we have seen, because all of them solve a matrix eigenvalue problem and the size of the matrix is exponential in d.

We can solve the eigenvalue problem with cost (number of queries plus other operations) that is not exponential in d using a quantum algorithm without any classical components. The details of the algorithm will become apparent after we discuss, in the next section, a quantum algorithm that solves the eigenvalue problem using a different type of queries, called power queries. This algorithm is based on phase estimation [19], a quantum algorithm approximating an eigenvalue of a Hermitian matrix, which solves the problem with O(log ε^{−1}) power queries. Each of the power queries can be approximated by bit queries using the Trotter formula [19] and phase kick-back [8]. The number of bit queries required for the approximation of each power query is a polynomial in ε^{−1} whose degree is independent of d. In particular, the degree of this polynomial depends only on the norm of the matrix whose eigenvalue is sought, which is independent of d, and on the accuracy demand ε. We have the following theorem, whose proof we postpone to the next section.

Theorem 4.3. Phase estimation applied for the approximation of the smallest eigenvalue of M_ε(q) achieves error O(ε) with probability at least 3/4 using a number of bit queries proportional to ε^{−6} log² ε^{−1}.
The initial state for phase estimation is the eigenvector of M_ε(0) = −Δ_ε that corresponds to its smallest eigenvalue. The algorithm uses a number of quantum operations, excluding bit queries, proportional to d ε^{−6} log⁴ ε^{−1}, and a number of qubits proportional to d log ε^{−1}. Consequently, n^bit-query(ε) = O(ε^{−6} log² ε^{−1}).

4.2. Power queries

In this section, we consider power queries as they have been described in [21]. For some problems, a quantum algorithm can be written in the form

|ψ⟩ := U_T W̃_T U_{T−1} W̃_{T−1} ⋯ U_1 W̃_1 U_0 |0⟩.  (36)

Here U_1, ..., U_T denote unitary matrices independent of the function q, just as before, whereas the unitary matrices W̃_j are of the form controlled-W_j, see [19, p. 178]. Then W_j = W^{p_j} for an n × n unitary matrix W that depends on the input of the computational problem, and for some
non-negative integers p_1, ..., p_T. Without loss of generality we assume that n is a power of two. Let {|y_k⟩} be orthonormalized eigenvectors of W, so that W|y_k⟩ = λ_k|y_k⟩ with the corresponding eigenvalue λ_k, where |λ_k| = 1 and λ_k = e^{iθ_k} with θ_k ∈ [0, 2π) for k = 1, 2, ..., n. For the unit vectors |x_ℓ⟩ = a_ℓ|0⟩ + b_ℓ|1⟩ ∈ C², ℓ = 1, 2, ..., r, the quantum query W̃_j is defined as

W̃_j |x_1⟩|x_2⟩ ⋯ |x_r⟩|y_k⟩ = |x_1⟩ ⋯ |x_{j−1}⟩ ( a_j|0⟩ + b_j e^{i p_j θ_k}|1⟩ ) |x_{j+1}⟩ ⋯ |x_r⟩|y_k⟩.  (37)

Hence, W̃_j is a 2^ν × 2^ν unitary matrix with ν = r + log n. We stress that the exponent p_j only affects the power of the complex number e^{iθ_k}.

W̃_j is called a power query since it is derived from powers of W. Power queries have been successfully used for a number of problems including the phase estimation problem, see [8,19]. The phase estimation algorithm approximates an eigenvalue of a unitary operator W using a good approximation [1] of the corresponding eigenvector as part of the initial state. The powers of W are defined by p_i = 2^{i−1}. Therefore, phase estimation uses queries with W_1 = W, W_2 = W², W_3 = W^{2²}, ..., W_m = W^{2^{m−1}}. It is typically assumed, see [8], that we do not explicitly know W but we are given quantum devices that perform controlled-W, controlled-W², controlled-W^{2²}, and so on.

For our eigenvalue problem, we discretize the operator L_q on a grid with mesh size h, as we did when we were discussing deterministic algorithms. We obtain an m^d × m^d matrix M_h(q), with h = (m + 1)^{−1}, that is symmetric positive definite. Then we define the matrix

W = exp(iγ M_h(q)) with i = √−1 and a positive γ,  (38)

which is unitary since M_h(q) is symmetric. Using the powers of W we obtain the matrices W̃_j used in (36). Accordingly, we modify the query definition in Eq. (31) by assuming, as in [19, Chapter 5], that for each j the W̃_j is one quantum query. Hence for algorithms that can be expressed in the form (36), the number of power queries is T, independently of the powers p_j.

With the understanding that the number of queries T is defined differently in this section than before, the error e^quant(φ̂, T) of the algorithm (36) is given by (32). Similarly, the power query complexity n^power-query(ε) is defined by (33).

We now exhibit a quantum algorithm with power queries that approximates λ(q) with error O(ε). Consider W defined by (38) with γ = 1/(2d), i.e.,
W = exp( i M_h(q)/(2d) ).  (39)

The eigenvalues of W are e^{i λ_j(M_h(q))/(2d)}, with λ_j(M_h(q)) being the eigenvalues of the m^d × m^d matrix M_h(q). Without loss of generality we assume that m is a power of two. These eigenvalues can be written as e^{2πi φ_j}, where

φ_j = φ_j(M_h(q)) = λ_j(M_h(q)) / (4πd)

are called phases. We are interested in estimating the smallest phase φ_1(M_h(q)), which belongs to (0, 1) since λ_1(M_h(q)) ∈ [dπ², dπ² + 1]. We denote the eigenvector of M_h(q) and W that corresponds to λ_j(M_h(q)) by z_j(M_h(q)), with ‖z_j(M_h(q))‖_2 = 1, j = 1, ..., m^d, indexed in non-decreasing order of eigenvalues.
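The claim that the smallest phase lies strictly inside (0, 1) is a one-line computation: λ/(4πd) with λ ∈ [dπ², dπ² + 1] gives π/4 + O(1/d). A small check (our own illustration) confirms this over a range of dimensions.

```python
import math

def phase(lam, d):
    """Phase phi = lambda / (4*pi*d) assigned to an eigenvalue lambda of
    M_h(q) by the scaling W = exp(i M_h(q) / (2d))."""
    return lam / (4 * math.pi * d)

# lambda_1(M_h(q)) lies in [d*pi^2, d*pi^2 + 1]; the resulting phase must
# fall strictly inside (0, 1) for every dimension d.
phases = []
for d in range(1, 51):
    lo, hi = d * math.pi ** 2, d * math.pi ** 2 + 1
    phases.append((phase(lo, d), phase(hi, d)))
```

The lower endpoint is exactly π/4 ≈ 0.785 for every d, and the upper endpoint is π/4 + 1/(4πd) < 1, so phase estimation never has to resolve a wrap-around at 0 or 1.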
Phase estimation, see [19, Section 5.2], is a quantum algorithm that approximates the phase φ_1(M_h(q)). Clearly, to compute an ε-approximation of λ_1(M_h(q)), it is enough to compute an ε/(4πd)-approximation of φ_1(M_h(q)). The initial state of the phase estimation algorithm is |0⟩^⊗b |z_1⟩, where b is related to the accuracy of the algorithm and will be determined later, while |z_1⟩ = |z_1(M_h(q))⟩. It is helpful to think of two registers holding the initial state. The top register is b qubits long and holds |0⟩^⊗b, while the bottom register holds the eigenvector |z_1⟩.

Abrams and Lloyd [1] showed that phase estimation can still be used even if the eigenvector |z_1⟩ is replaced by a good approximation |ψ⟩. More precisely, expanding |ψ⟩ in the basis of the eigenvectors |z_j⟩, the initial state takes the form

|0⟩^⊗b |ψ⟩ = |0⟩^⊗b Σ_{k=1}^{m^d} d_k |z_k⟩.

The success probability of the algorithm depends on |d_1|², the square of the projection of |ψ⟩ onto |z_1⟩. Omitting the details, which are not important in the analysis here and can be found in [1,21], a measurement of the top register of the final state of phase estimation will, with probability at least (8/π²)|d_1|², produce an index j ∈ [0, 2^b − 1] such that

| j/2^b − φ_1(M_h(q)) | ≤ 1/2^b.

The cost of phase estimation is equal to b power queries, plus a number of operations proportional to b² + d log m, plus the cost for preparing the initial state |ψ⟩. The number of qubits used is b + d log m. We remark that the O(b²) operations are for the quantum implementation of the (inverse) Fourier transform used in phase estimation [19].

Taking into account that the matrix eigenvalue approximates λ(q) with error O(h), where h = (m + 1)^{−1}, we obtain that

| λ(q) − 4πd j/2^b | ≤ 4πd/2^b + O(h).

Therefore, it suffices to set h = ε and b = log ε^{−1} to obtain error O(ε) in the approximation of λ(q). Under these conditions, the cost of the algorithm is equal to log ε^{−1} power queries, plus a number of operations proportional to log² ε^{−1}, plus the cost of preparing the initial state.

Recall that we want to implement a good approximation |ψ⟩ of |z_1⟩ leading to success probability at least 3/4. Consider the eigenvector corresponding to the smallest eigenvalue when q = 0. Denote this eigenvector by |z_1(M_ε(0))⟩, M_ε(0) = −Δ_ε. Then |z_1(M_ε(0))⟩ = |z_1^(1)⟩^⊗d, i.e., |z_1(M_ε(0))⟩ is the tensor product of the eigenvectors of the ε^{−1} × ε^{−1} matrix of the corresponding one-dimensional problem (i.e., when d = 1) [11]. Each |z_1^(1)⟩ can be implemented using the Fourier transform with a number of operations proportional to log² ε^{−1}, see [19, p. 209] and [17,28] for more details. Therefore, we can implement |z_1(M_ε(0))⟩ with cost proportional to d log² ε^{−1}.
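The measurement statistics quoted above can be reproduced with a small classical simulation. For an exact eigenvector input (|d_1| = 1), the probability of register outcome j is |2^{−b} Σ_k e^{2πik(φ − j/2^b)}|², the textbook inverse-QFT amplitude. The pure-Python sketch below (our own; the sample phase 0.3141 is arbitrary) confirms that the outcomes with |j/2^b − φ| ≤ 2^{−b} together carry probability at least 8/π².

```python
import cmath
import math

def pe_distribution(phi, b):
    """Outcome probabilities of b-qubit phase estimation run on an exact
    eigenvector with eigenphase phi (|d_1| = 1):
    P(j) = | 2^-b * sum_k exp(2 pi i k (phi - j/2^b)) |^2."""
    n = 2 ** b
    probs = []
    for j in range(n):
        amp = sum(cmath.exp(1j * 2.0 * math.pi * k * (phi - j / n))
                  for k in range(n)) / n
        probs.append(abs(amp) ** 2)
    return probs

def success_prob(phi, b):
    """Total probability of measuring an index j with |j/2^b - phi| <= 2^-b,
    using the wrap-around (circle) distance."""
    n = 2 ** b
    total = 0.0
    for j, pj in enumerate(pe_distribution(phi, b)):
        dist = abs(j / n - phi)
        if min(dist, 1.0 - dist) <= 1.0 / n + 1e-12:
            total += pj
    return total

p = success_prob(0.3141, 6)
```

The two grid points nearest φ always lie within 2^{−b} of it, and their combined weight is bounded below by 8/π² ≈ 0.81, which is where the (8/π²)|d_1|² factor in the text comes from.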
Now consider any q ∈ Q. Since we know that the eigenvalues of M_ε(q) are well separated (18), using |z_1(M_ε(0))⟩ as an approximate eigenvector we find that the square of its projection onto z_1(M_ε(q)) satisfies [29, p. 173]

|⟨z_1(M_ε(q)) | z_1(M_ε(0))⟩|² ≥ 1 − 1/(3π² − 2)².

Define the initial state of phase estimation using |ψ⟩ := |z_1(M_ε(0))⟩ to obtain |d_1|² ≥ 1 − (3π² − 2)^{−2}, which leads to a success probability

(8/π²) |d_1|² ≥ (8/π²) ( 1 − 1/(3π² − 2)² ) ≥ 3/4.

We have proved the following theorem.

Theorem 4.4. The eigenvalue problem can be solved with error O(ε), and probability at least 3/4, by discretizing L_q and then approximating the smallest eigenvalue of the resulting matrix M_ε(q) by phase estimation that uses power queries. The initial state of phase estimation uses the eigenvector of M_ε(0) = −Δ_ε that corresponds to its smallest eigenvalue. The cost of the algorithm is proportional to

• log ε^{−1} power queries,
• log² ε^{−1} + d log ε^{−1} quantum operations,
• d log ε^{−1} qubits.

Let us now turn to the query complexity n^power-query(ε). The previous theorem implies that n^power-query(ε) = O(log ε^{−1}). Consider a function q ∈ Q such that q(x_1, ..., x_d) = Σ_{j=1}^d g(x_j), where g ∈ C¹([0, 1]) is non-negative with ‖g‖_∞ ≤ 1 and ‖g′‖_∞ ≤ 1. Then [23, p. 113] the eigenvalue problem (1), (2) has a separable solution which is obtained by solving the Sturm–Liouville eigenvalue problem

−y″(x) + g(x) y(x) = λ y(x),  x ∈ (0, 1),
y(0) = y(1) = 0.

Denoting the smallest eigenvalue of this problem by λ(g), we have λ(q) = d λ(g). Any algorithm that approximates λ(q) with error O(ε) also approximates λ(g) with error O(ε). Using the power query lower bound for the Sturm–Liouville eigenvalue problem [5,6], we conclude that any quantum algorithm with power queries that approximates λ(q) with error O(ε) must use Ω(log ε^{−1}) queries. Combining the lower bound with the previous theorem leads to tight power query complexity bounds.

Theorem 4.5. n^power-query(ε) = Θ(log ε^{−1}).

We are now ready to prove the upper bound for the bit-query complexity of Theorem 4.3.
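The separability λ(q) = d λ(g) invoked in the lower bound above is mirrored by the discretized problem: for g = 0 the eigenvalues of the d-dimensional discrete Laplacian are coordinate-wise sums of one-dimensional ones, so its smallest eigenvalue is d times the one-dimensional minimum. The sketch below (our own illustration; standard (−1, 2, −1)/h² Dirichlet stencil) verifies the known one-dimensional eigenpair by a direct residual check.

```python
import math

def lap1d_apply(v, h):
    """Apply the 1-D discrete Laplacian (Dirichlet boundary,
    stencil (-1, 2, -1)/h^2) to the vector v."""
    m = len(v)
    out = []
    for i in range(m):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < m - 1 else 0.0
        out.append((2.0 * v[i] - left - right) / h ** 2)
    return out

m = 127
h = 1.0 / (m + 1)
# Known smallest eigenpair: v_j = sin(pi*j*h), lam1 = (2/h^2)(1 - cos(pi*h)),
# which tends to pi^2 as h -> 0.
v = [math.sin(math.pi * (j + 1) * h) for j in range(m)]
lam1 = 2.0 / h ** 2 * (1.0 - math.cos(math.pi * h))
residual = max(abs(a - lam1 * b) for a, b in zip(lap1d_apply(v, h), v))

def lam_min_ddim(d):
    """Smallest eigenvalue of the d-dimensional discrete Laplacian: the
    1-D eigenvalues add coordinate-wise, mirroring lambda(q) = d*lambda(g)."""
    return d * lam1
```

This also explains why the initial state |z_1(M_ε(0))⟩ factors as a d-fold tensor product of one-dimensional eigenvectors, which is what makes its preparation cheap.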
Proof of Theorem 4.3. We use phase estimation as in the proof of Theorem 4.4, but instead of power queries we will use bit queries to approximate them. Recall Eq. (39), with h = ε. The matrix M_ε(q) has size m^d × m^d with (m + 1)^{−1} = ε. Its largest eigenvalue does not exceed 4dε^{−2} + 1 [11, p. 268]. Therefore, we have ‖(2d)^{−1} M_ε(q)‖_2 ≤ (4dε^{−2} + 1)/(2d). For κ = 4dε^{−2} + 1 we have ‖(2dκ)^{−1} M_ε(q)‖_2 ≤ 1. Recall that (2dκ)^{−1} M_ε(q) = −(2dκ)^{−1} Δ_ε + (2dκ)^{−1} B_ε(q). For notational convenience define A_1 = −(2dκ)^{−1} Δ_ε and A_2 = (2dκ)^{−1} B_ε(q). Then ‖A_1‖_2 ≤ 1 and ‖A_2‖_2 ≤ 1. Using the Trotter formula [19, p. 208] we have

‖ e^{i(A_1+A_2)/k} − e^{iA_1/k} e^{iA_2/k} ‖_2 ≤ c k^{−2},

where c is a constant (see also [16,22] and the references therein). From (39) we have

W^L = e^{i(A_1+A_2)κL}  for any L ∈ N,

and therefore

‖ W^L − ( e^{iA_1/k} e^{iA_2/k} )^{kκL} ‖_2 ≤ c κL/k.  (40)
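The k^{−2} rate of the Trotter bound above is easy to observe numerically. The sketch below (our own illustration, with small Hermitian 2×2 matrices standing in for the scaled operators and the Frobenius norm in place of the spectral norm) checks that the one-step splitting error shrinks by roughly a factor of 4 when k is doubled.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(a, terms=30):
    """Matrix exponential of a 2x2 complex matrix via its Taylor series
    (adequate here because the matrices have small norm)."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    power = [[1 + 0j, 0j], [0j, 1 + 0j]]
    fact = 1.0
    for n in range(1, terms):
        power = mat_mul(power, a)
        fact *= n
        result = [[result[i][j] + power[i][j] / fact for j in range(2)]
                  for i in range(2)]
    return result

def frob_dist(a, b):
    return math.sqrt(sum(abs(a[i][j] - b[i][j]) ** 2
                         for i in range(2) for j in range(2)))

A1 = [[0.0, 0.3], [0.3, 0.0]]    # Hermitian, small norm
A2 = [[0.5, 0.0], [0.0, -0.2]]   # Hermitian, does not commute with A1

def split_error(k):
    """|| exp(i(A1+A2)/k) - exp(iA1/k) exp(iA2/k) ||_F for k subdivisions."""
    s = lambda m: [[1j * m[i][j] / k for j in range(2)] for i in range(2)]
    total = [[A1[i][j] + A2[i][j] for j in range(2)] for i in range(2)]
    return frob_dist(mat_exp(s(total)),
                     mat_mul(mat_exp(s(A1)), mat_exp(s(A2))))

ratio = split_error(10) / split_error(20)   # close to 4 for an O(k^-2) error
```

The leading error term is the commutator [A_1, A_2]/(2k²), so doubling k quarters the one-step error; the kL-fold product in (40) then accumulates these steps linearly, which produces the L/k dependence.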
In phase estimation we require the maximum power of W to be of order ε^{−1}. Setting L = O(ε^{−1}) in the equation above, we have that κL is of order ε^{−3}. Thus for k proportional to ε^{−3} log² ε^{−1}, the error in the approximation of the matrix exponential (40) is O(log^{−2} ε^{−1}). From [8] we know that using bit queries and phase kick-back we can obtain e^{iA_2/k}. Hence, to approximate the O(log ε^{−1}) power queries of phase estimation the algorithm needs a total number of bit queries proportional to ε^{−6} log² ε^{−1}. Since the eigenvalues and eigenvectors of −Δ_ε are known, each of the e^{iA_1/k} can be implemented using the quantum Fourier transform with a number of quantum operations proportional to d log² ε^{−1}. Thus the total number of quantum operations, excluding bit queries, required to approximate all the power queries is proportional to d ε^{−6} log⁴ ε^{−1}.

Using (40) to approximate the power queries only changes the success probability of phase estimation [19, p. 195]. Since phase estimation uses of order log ε^{−1} power queries and each is approximated with error O(log^{−2} ε^{−1}), the success probability may be reduced by a quantity proportional to log^{−1} ε^{−1}. Therefore, for ε sufficiently small, the probability remains greater than or equal to 3/4.

We conclude by addressing the qubit complexity of our problem. By qubit complexity we mean the minimum number of qubits required for a quantum algorithm to achieve error ε. We denote the qubit complexity by n^qubit(ε). The qubit complexity is related to the classical information complexity n^wor(ε) by n^qubit(ε) = Ω(log n^wor(ε)). This is shown in [30] and it holds regardless of the type of queries used. Since n^wor(ε) = Θ(ε^{−d}), we get n^qubit(ε) = Ω(log ε^{−1}). On the other hand, phase estimation solves the problem with error O(ε) using a number of qubits proportional to d log ε^{−1}. We have proved the following theorem.
Theorem 4.6. n^qubit(ε) = Θ(log ε^{−1}).

Acknowledgments

I am very grateful to A. Bessen for the extensive discussions we had and his insightful remarks that significantly improved this paper. I thank J.H. Lai, J.F. Traub and A.G. Werschulz for their comments and suggestions.

References

[1] D.S. Abrams, S. Lloyd, Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors, Phys. Rev. Lett. 83 (1999) 5162–5165.
[2] I. Babuska, J. Osborn, Eigenvalue problems, in: P.G. Ciarlet, J.L. Lions (Eds.), Handbook of Numerical Analysis, vol. II, North-Holland, Amsterdam, 1991, pp. 641–787.
[3] R. Beals, H. Buhrman, R. Cleve, M. Mosca, R. de Wolf, Quantum lower bounds by polynomials, in: Proceedings FOCS'98, 1998, pp. 352–361, also http://arXiv.org/quant-ph/9802049.
[4] E. Bernstein, U. Vazirani, Quantum complexity theory, SIAM J. Comput. 26 (5) (1997) 1411–1473.
[5] A.J. Bessen, A lower bound for phase estimation, Phys. Rev. A 71 (4) (2005) 042313, also http://arXiv.org/quant-ph/0412008.
[6] A.J. Bessen, A lower bound for the Sturm–Liouville eigenvalue problem on a quantum computer, J. Complexity 22 (5) (2006) 660–675, also http://arXiv.org/quant-ph/04512109.
[7] G. Brassard, P. Hoyer, M. Mosca, A. Tapp, Quantum amplitude amplification and estimation, in: Contemporary Mathematics, vol. 305, American Mathematical Society, Providence, RI, 2002, pp. 53–74, also http://arXiv.org/quant-ph/0005055.
[8] R. Cleve, A. Ekert, C. Macchiavello, M. Mosca, Quantum algorithms revisited, Proc. R. Soc. London A 454 (1998) 339–354.
[9] R. Courant, D. Hilbert, Methods of Mathematical Physics, vol. I, Wiley Classics Library, Wiley-Interscience, New York, 1989.
[10] R. Courant, Variational methods for the solution of problems of equilibrium and vibrations, Bull. Amer. Math. Soc. 49 (1943) 1–23.
[11] J.W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[12] G.E. Forsythe, W.R. Wasow, Finite-Difference Methods for Partial Differential Equations, Dover, New York, 2004.
[13] L. Grover, Quantum mechanics helps in searching for a needle in a haystack, Phys. Rev. Lett. 79 (2) (1997) 325–328, also http://arXiv.org/quant-ph/9706033.
[14] S. Heinrich, Quantum summation with an application to integration, J. Complexity 18 (1) (2002) 1–50, also http://arXiv.org/quant-ph/0105116.
[15] S. Heinrich, Quantum integration in Sobolev spaces, J. Complexity 19 (2003) 19–42.
[16] T. Jahnke, C. Lubich, Error bounds for exponential operator splitting, BIT 40 (4) (2000) 735–744.
[17] A. Klappenecker, M. Rötteler, Discrete Cosine Transforms on Quantum Computers, 2001, http://arXiv.org/quant-ph/0111038.
[18] A. Nayak, F. Wu, The quantum query complexity of approximating the median and related statistics, in: Proceedings of the 31st Annual ACM Symposium on the Theory of Computing (STOC), 1999, pp. 384–393, also LANL preprint quant-ph/9804066.
[19] M.A. Nielsen, I.L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, UK, 2000.
[20] E. Novak, Quantum complexity of integration, J. Complexity 17 (2001) 2–16, also http://arXiv.org/quant-ph/0008124.
[21] A. Papageorgiou, H. Woźniakowski, Classical and quantum complexity of the Sturm–Liouville eigenvalue problem, Quantum Inform. Process. 4 (2005) 87–127, also http://arXiv.org/quant-ph/0502054.
[22] M. Suzuki, General theory of higher-order decomposition of exponential operators and symplectic integrators, Phys. Lett. A 165 (1992) 387–395.
[23] E.C. Titchmarsh, Eigenfunction Expansions Associated with Second-Order Differential Equations, Part B, Oxford University Press, Oxford, UK, 1958.
[24] J.F. Traub, A continuous model of computation, Phys. Today (May 1999) 39–43.
[25] J.F. Traub, G.W. Wasilkowski, H. Woźniakowski, Information-Based Complexity, Academic Press, New York, 1988.
[26] H.F. Weinberger, Upper and lower bounds for eigenvalues by finite difference methods, Comm. Pure Appl. Math. IX (1956) 613–623.
[27] H.F. Weinberger, Lower bounds for higher eigenvalues by finite difference methods, Pacific J. Math. 8 (2) (1958) 339–368.
[28] M.V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A.K. Peters, Wellesley, 1994.
[29] J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, Oxford, UK, 1965.
[30] H. Woźniakowski, The quantum setting with randomized queries for continuous problems, Quantum Inform. Process. 5 (2) (2006) 83–130, also http://arXiv.org/quant-ph/060196.
Journal of Complexity 23 (2007) 828 – 850 www.elsevier.com/locate/jco
Cubature formulas for function spaces with moderate smoothness
Michael Gnewuch∗, René Lindloh¹, Reinhold Schneider, Anand Srivastav
Institut für Informatik, Christian-Albrechts-Universität zu Kiel, Christian-Albrechts-Platz 4, 24098 Kiel, Germany
Received 30 November 2006; accepted 25 July 2007
Available online 14 September 2007
To Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

We construct simple algorithms for high-dimensional numerical integration of function classes with moderate smoothness. These classes consist of square-integrable functions over the d-dimensional unit cube whose coefficients with respect to certain multiwavelet expansions decay rapidly. Such a class contains discontinuous functions on the one hand and, for the right choice of parameters, the quite natural d-fold tensor product of a Sobolev space H^s[0, 1] on the other hand. The algorithms are based on one-dimensional quadrature rules appropriate for the integration of the particular wavelets under consideration and on Smolyak's construction. We provide upper bounds for the worst-case error of our cubature rule in terms of the number of function calls. We additionally prove lower bounds showing that our method is optimal in dimension d = 1 and almost optimal (up to logarithmic factors) in higher dimensions. We perform numerical tests which allow the comparison with other cubature methods.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Numerical integration; Smolyak's algorithm; Sparse grids; Multiwavelets
1. Introduction The computation of high-dimensional integrals is a difficult task arising, e.g., from applications in physics, quantum chemistry, and finance. The traditional methods used in lower dimensions, ∗ Corresponding author. Fax: +49 431 880 1725.
E-mail addresses:
[email protected] (M. Gnewuch),
[email protected] (R. Lindloh),
[email protected] (R. Schneider),
[email protected] (A. Srivastav). 1 Partially supported by the DFG-Graduiertenkolleg 357 “Effiziente Algorithmen und Mehrskalenmethoden”.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.07.002
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
such as product rules of one-dimensional quadratures, are usually too costly in high dimensions, since the number of function calls increases exponentially with the dimension. In this paper we present a cubature method which can be used to handle the following multivariate integration problem also in higher dimensions:

Problem definition. We want to approximate the integral

I(f) = ∫_{[0,1]^d} f(x) dx

for functions f: [0, 1)^d → R belonging to function classes H of theoretical or practical interest. It is important from the viewpoint of applicability of high-dimensional cubature that the function class is general and rich and contains important classes arising in numerical mathematics. A general cubature formula with N sample points {x_1, x_2, ..., x_N} ⊂ [0, 1]^d is given by

Q_N(f) = Σ_{ℓ=1}^{N} ω_ℓ f(x_ℓ),

where {ω_1, ..., ω_N} is some suitable set of weights. To measure the quality of a given cubature Q_N we use the worst case error over H defined by

err(H, Q_N) := sup_{f ∈ H, ‖f‖ = 1} err(f, Q_N),

where err(f, Q_N) := |I(f) − Q_N(f)|. As I and Q_N are linear, err(H, Q_N) is nothing but the operator norm ‖I − Q_N‖_op induced by the norm of H.

Results. The function classes we consider in this paper are certain Hilbert spaces

H_s = { f ∈ L²[0,1]^d | ‖f‖_s < ∞ },

which are spanned by multiwavelets and are characterized by the discrete norms ‖f‖_s² = Σ_λ 2^{2s|λ|} ⟨f, ψ_λ⟩², with ψ_λ ranging over the multiwavelet basis and |λ| denoting the level of ψ_λ. The functions in H_s are continuously embedded in L²[0,1]^d, and under proper requirements H_s contains classical function spaces like Sobolev spaces. Our aim is to provide a cubature method that guarantees a (nearly) optimal worst case error and which is easy to implement. For arbitrary parameters s > 1/2 we show that its worst case error over H_s is of the form

O( log(N)^{(d−1)(s+1/2)} / N^s ),

where N denotes the number of sample points used. We also prove a lower bound Ω( log(N)^{(d−1)/2} / N^s ) for all cubatures on H_s using N sample points. This shows that the presented integration method converges on H_s asymptotically almost optimally.

Our cubatures are based on one-dimensional quadratures chosen with respect to the particular space H_s under consideration, and Smolyak's construction. More precisely, we use composite quadrature rules of a fixed order n. These rules are exact for piecewise polynomials of order n. The presented Smolyak construction is related to tensor product multiwavelet expansions in the way that the cubature is exact on finite multiwavelet series up to a critical level.
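Smolyak's construction combines only those tensor-product levels j = (j_1, ..., j_d) with a bounded level sum, rather than the full product grid. A small sketch (our own illustration) enumerates the sparse level set {j ∈ N_0^d : |j|_1 ≤ L} and compares its size, C(L+d, d), with the (L+1)^d levels of the full tensor product.

```python
from itertools import product
from math import comb

def sparse_indices(d, L):
    """Multi-indices j in N_0^d with j_1 + ... + j_d <= L, the level set
    underlying Smolyak's construction."""
    return [j for j in product(range(L + 1), repeat=d) if sum(j) <= L]

d, L = 4, 6
sparse = len(sparse_indices(d, L))   # equals comb(L + d, d)
full = (L + 1) ** d                  # full tensor-product level set
```

Already for d = 4 and L = 6 the sparse set has 210 levels against 2401 for the full product, and the gap widens polynomially in L but combinatorially in d; this thinning is what keeps the cost of sparse-grid cubature from growing exponentially with d at a fixed level.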
Related work. To some extent our work is motivated by [13], where the considered function classes depend on Haar wavelet series and a randomized cubature given by a quasi-Monte Carlo rule using so-called scrambled nets (see, e.g., [15]) is studied. These classes of Haar wavelets are included in the classes of multiwavelets that we consider. Notice that cubature rules using scrambled nets are not exact for (piecewise) polynomials of higher degree, in contrast to our method.

It is known that Smolyak's construction leads in general to almost optimal approximations in any dimension d > 1 as long as the underlying one-dimensional quadrature rule is optimal. The application of Smolyak's construction to numerical integration has been studied in a number of papers so far, see, e.g., [3,4,11,12,14,16,20,24] and the literature mentioned therein. The error bounds provided in these papers were usually proved on Korobov spaces or spaces of functions with bounded mixed derivatives, i.e., on spaces of functions with a certain degree of smoothness. For our method we provide good error bounds with respect to the Hilbert spaces H_s of not necessarily smooth functions. Note that the power of the logarithm of N in our upper bound is (d − 1)/2 less than the power in the corresponding upper bounds appearing in the papers mentioned above.

This paper is organized as follows: In Section 2 we define multiwavelets and introduce the spaces on which our cubatures of prescribed level should be exact. In Section 3 we present one-dimensional quadratures suited to evaluate the integrals of the univariate wavelets introduced in Section 2. We define a scale of Hilbert spaces of square integrable functions over [0, 1) via wavelet coefficients and prove an optimal error bound for our quadrature with respect to these spaces. In Section 4 we use Smolyak's construction to obtain from our one-dimensional quadratures cubature rules for multivariate integrands.
After giving a precise definition of the class of Hilbert spaces H_s of multivariate functions, we consider error bounds for our cubatures, first in terms of the level of the cubature and then in terms of the number of function calls. We also provide lower bounds for the worst case error of any cubature Q_N using N sample points. These lower bounds show that our cubature method is asymptotically almost optimal (up to logarithmic factors). In Section 5 we report on several numerical tests which allow us to compare our method with known methods. In Section 6 we provide a conclusion and make some remarks concerning future work.

2. Discontinuous multiwavelet bases

2.1. The one-dimensional case

We start by giving a short construction of a class of bases of L_2[0,1] that are called discontinuous multiwavelet bases. This topic has already been studied in the mathematical literature, see, e.g., [2,18,23]. By Π_n we denote the set of polynomials of order n, i.e., of degree strictly smaller than n, on [0,1). Let h_0, h_1, ..., h_{n−1} denote the set of the first n Legendre polynomials on the interval [0,1); an explicit expression for these polynomials is given by

h_j(x) = (−1)^j Σ_{k=0}^{j} binom(j,k) binom(j+k,k) (−x)^k
for all x ∈ [0,1), see, e.g., [1]. These polynomials form an orthogonal basis of Π_n and are orthogonal to polynomials of lower order:

∫_0^1 h_j(x) x^i dx = 0,  i = 0, 1, ..., j−1.
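As a quick illustration (not part of the paper), the explicit formula and the orthogonality to lower-order monomials can be checked in exact rational arithmetic; the helper names below are ours:

```python
from math import comb
from fractions import Fraction

def legendre_shifted(j):
    """Coefficient list of h_j on [0,1): h_j(x) = sum_k c[k] * x^k,
    via the explicit formula h_j(x) = (-1)^j sum_k C(j,k) C(j+k,k) (-x)^k."""
    return [Fraction((-1) ** (j + k) * comb(j, k) * comb(j + k, k))
            for k in range(j + 1)]

def moment(coeffs, i):
    """Exact integral of h_j(x) * x^i over [0,1), integrated term by term."""
    return sum(c / (k + i + 1) for k, c in enumerate(coeffs))

h2 = legendre_shifted(2)
assert h2 == [1, -6, 6]            # h_2(x) = 6x^2 - 6x + 1
assert moment(h2, 0) == 0          # orthogonal to 1
assert moment(h2, 1) == 0          # orthogonal to x
assert moment(h2, 2) != 0          # but not to x^2
```

Since all coefficients are integers, Fraction keeps the moments exact, so the orthogonality check is not affected by rounding.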
For convenience we extend the polynomials h_j by zero to the whole real line. With the help of these (piecewise) polynomials we define for i = 0, 1, ..., n−1 a set of scaling functions φ_i(x) := h_i(x)/‖h_i‖_2, where ‖·‖_2 is the usual norm on L_2[0,1]. For arbitrary j ∈ N_0 we use the shorthand ∇_j := {0, 1, 2, ..., 2^j − 1}. We consider dilated and translated versions

φ^j_{i,k} := 2^{j/2} φ_i(2^j · − k),  i = 0, 1, ..., n−1, j ∈ N_0, k ∈ ∇_j,

of the scaling functions φ_i. Observe that these functions have compact support

supp φ^j_{i,k} = [2^{−j}k, 2^{−j}(k+1)] =: I^j_k

and

⟨φ^j_{i,k}, φ^j_{i',k'}⟩ = δ_{i,i'} δ_{k,k'}.

Furthermore, we define spaces of piecewise polynomial functions of order n,

V^j_n := span{φ^j_{i,k} | i = 0, 1, ..., n−1, k ∈ ∇_j}.

It is obvious that the spaces V^j_n have dimension 2^j n and that they are nested in the following way: Π_n = V^0_n ⊂ V^1_n ⊂ ··· ⊂ L_2[0,1].

For j = 0, 1, 2, ... we define the 2^j n-dimensional space W^j_n to be the orthogonal complement of V^j_n in V^{j+1}_n, i.e.,

W^j_n := {ψ ∈ V^{j+1}_n | ⟨ψ, v⟩ = 0 for all v ∈ V^j_n}.

This leads to the orthogonal decomposition

V^j_n = V^0_n ⊕ W^0_n ⊕ W^1_n ⊕ ··· ⊕ W^{j−1}_n

of V^j_n. Let (ψ_i)_{i=0}^{n−1} be an orthonormal basis of W^0_n. (An explicit construction of such a basis in more general situations is, e.g., given in [18, Subsection 5.4.1].) Then it is straightforward to verify that the 2^j n functions

ψ^j_{i,k} := 2^{j/2} ψ_i(2^j · − k),  i = 0, ..., n−1, k ∈ ∇_j,

form an orthonormal basis of W^j_n. The functions (ψ_i)_{i=0}^{n−1} are called multiwavelets and are obviously also piecewise polynomials of degree strictly less than n. Multiwavelets are supported on canonical intervals

supp ψ^j_{i,k} = I^j_k

and satisfy the orthogonality condition

⟨ψ^j_{i,k}, ψ^m_{l,n}⟩ = δ_{i,l} δ_{j,m} δ_{k,n}.

Since the spaces W^j_n are orthogonal to V^0_n = Π_n, we have vanishing moments

∫_0^1 ψ^j_{i,k}(x) x^ν dx = 0,  ν = 0, 1, ..., n−1.  (2.1)
Next we define the space

V := ⋃_{j=0}^∞ V^j_n = V^0_n ⊕ ⊕_{j=0}^∞ W^j_n.
Notice that V contains all elements of the well-known Haar basis; therefore V is dense in L_2[0,1]. We follow the convention from [18] and define ψ^{−1}_i := φ_i (please do not confuse this notation with the notation of inverse functions), ∇_{−1} := {0} and I^{−1}_0 := [0,1]. A so-called multiwavelet basis of order n for L_2[0,1] is given by

{ψ^j_{i,k} | i = 0, 1, ..., n−1, j ≥ −1, k ∈ ∇_j},

and for every f ∈ L_2[0,1] we get the following unique multiwavelet expansion

f = Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{i=0}^{n−1} ⟨f, ψ^j_{i,k}⟩ ψ^j_{i,k}.
2.2. The multivariate case

In this subsection we extend the concept of multiwavelet bases to higher dimensions. Here we follow an approach that is suitable for our later analysis. For a given multi-index j ∈ Z^d we put |j| := j_1 + j_2 + ··· + j_d, and for i ∈ N_0^d let |i|_∞ := max{i_1, ..., i_d}. A multivariate multiwavelet basis of L_2[0,1]^d is given by so-called tensor product wavelets. For n ∈ N, we define the approximation space on level L by

V^{d,L}_n := Σ_{|j|=L} ⊗_{i=1}^d V^{j_i}_n.  (2.2)

Similarly to the one-dimensional case we put

V^d := ⋃_{L=0}^∞ V^{d,L}_n.

Since V = V^1 is dense in L_2[0,1], the space V^d is dense in L_2[0,1]^d. Thus we obtain the following expansion for f ∈ L_2[0,1]^d:

f = Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{|i|_∞=0}^{n−1} ⟨f, ψ^j_{i,k}⟩ ψ^j_{i,k},
where j = (j_1, ..., j_d) ≥ −1 is meant in the way that j_u ≥ −1 for all u = 1, ..., d. (In the following all inequalities between vectors and between a vector and a scalar are meant componentwise.) Furthermore, we used the shorthands ∇_j = ∇_{j_1} × ··· × ∇_{j_d} and

ψ^j_{i,k} := ⊗_{u=1}^d ψ^{j_u}_{i_u,k_u}.

If the d-dimensional canonical interval I^j_k is defined by

I^j_k := I^{j_1}_{k_1} × I^{j_2}_{k_2} × ··· × I^{j_d}_{k_d},

then supp ψ^j_{i,k} = I^j_k holds.

3. One-dimensional integration

3.1. One-dimensional quadrature formulas

Recall that a general one-dimensional quadrature is given by

Q_m(f) = Σ_{μ=1}^m w_μ f(x_μ),  (3.1)
where {x_1, ..., x_m} ⊂ [0,1] are the sample points and w_1, ..., w_m ∈ R are the weights. Since we are here interested in quadrature formulas with high polynomial exactness (like the Newton–Cotes, Clenshaw–Curtis or Gauss formulas), we confine ourselves to the case Σ_{μ=1}^m w_μ = 1. For a detailed discussion of one-dimensional quadrature formulas see, e.g., [7]. Our aim is to give a simple construction of quadrature formulas Q_N which satisfy, for a given polynomial order n and a so-called critical level l,

err(h, Q_N) = 0 for all h ∈ V^l_n.
We get the requested quadrature by scaling and translating a simpler one-dimensional quadrature formula Q_m that is exact for all polynomials of order n on [0,1]. If Q_m has the explicit form (3.1), then our resulting quadrature uses 2^l m sample points and is given by

A_m(l,1)(f) := Σ_{k∈∇_l} Σ_{μ=1}^m 2^{−l} w_μ f(2^{−l} x_μ + 2^{−l} k).  (3.2)
A_m(l,1) is exact for polynomials of degree strictly less than n on the canonical intervals I^j_k, j ≤ l, k ∈ ∇_j, and therefore also on the whole space V^l_n. Let us call a sequence of quadratures or cubatures (Q_N)_N nested if the corresponding sets of sample points (X_N)_N are nested, i.e., if X_N ⊆ X_{N+1} for all N. Whether our quadratures (A_m(l,1))_l are nested or not depends of course on the set of sample points X of the underlying quadrature Q_m. If we, e.g., consider the case n = 1, then we may choose Q_m to be the midpoint rule Q_m(f) = f(1/2), which results in the non-nestedness of our quadratures (A_m(l,1))_l. If, on the other hand, we choose the rule Q_m(f) = f(0), then our quadratures are indeed nested. (Notice that in the latter case A_m(l,1) is nothing but the iterated trapezoidal rule for periodic functions.)
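A minimal sketch of the scaled rule (3.2), in our own notation with the base rule passed as point and weight lists: scaling a rule that is exact on Π_n to every dyadic interval of level l preserves exactness, so with a 2-point Gauss base rule the composite rule still integrates cubics exactly.

```python
import math

def A(l, points, weights, f):
    """The quadrature A_m(l,1) of (3.2): the base rule (points, weights)
    on [0,1], scaled and translated to each interval [2^-l k, 2^-l (k+1)]."""
    h = 2.0 ** (-l)
    return sum(h * w * f(h * x + h * k)
               for k in range(2 ** l)
               for x, w in zip(points, weights))

# 2-point Gauss rule on [0,1]: exact for all polynomials of order n = 4
pts = [0.5 - 0.5 / math.sqrt(3), 0.5 + 0.5 / math.sqrt(3)]
wts = [0.5, 0.5]

approx = A(3, pts, wts, lambda x: x ** 3)   # uses 2^3 * 2 = 16 sample points
assert abs(approx - 0.25) < 1e-12           # int_0^1 x^3 dx = 1/4, exact
```

Replacing `pts, wts` by `[0.5], [1.0]` (midpoint) or `[0.0], [1.0]` (left endpoint) reproduces the non-nested and nested examples discussed above.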
3.2. Error analysis

For the error analysis of our one-dimensional quadrature method let n ∈ N, and let {ψ^j_{i,k} | i = 0, 1, 2, ..., n−1, j ≥ −1, k ∈ ∇_j} be the multiwavelet basis of order n defined in Section 2.1. For s > 0 we define a discrete norm

|f|²_{s,n} := Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{i=0}^{n−1} 2^{j·2s} ⟨f, ψ^j_{i,k}⟩²  (3.3)

on the space

H_{s,n} := {f ∈ L_2[0,1] | |f|_{s,n} < ∞},  (3.4)

consisting of functions whose wavelet coefficients decrease rapidly. Point evaluations are obviously well defined on the linear span of the functions ψ^j_{i,k}, i = 0, 1, ..., n−1, j ≥ −1, k ∈ ∇_j. Moreover, it is easy to see that they can be extended to bounded linear functionals on H_{s,n} as long as s > 1/2. On these spaces quadrature formulas are therefore well defined. Now we choose an m = m(n) and an underlying quadrature rule Q_m as in (3.1) such that Q_m is exact on Π_n. Let A_m(l,1) be as in (3.2). Then the wavelet expansion of a function f ∈ H_{s,n} and the Cauchy–Schwarz inequality yield the following error bound for our algorithm A_m(l,1):

Theorem 3.1. Let s > 1/2 and n ∈ N. Let Q_m and A_m(l,1) be as above. Then there exists a constant C > 0 such that

err(H_{s,n}, A_m(l,1)) ≤ C 2^{−ls}.  (3.5)
Proof. Let f ∈ H_{s,n}. The quadrature error is given by

err(f, A_m(l,1)) = |I(f) − A_m(l,1)f| = | Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{i=0}^{n−1} ⟨f, ψ^j_{i,k}⟩ (I(ψ^j_{i,k}) − A_m(l,1)ψ^j_{i,k}) |.

The Cauchy–Schwarz inequality yields

err(f, A_m(l,1)) ≤ |f|_{s,n} ( Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{i=0}^{n−1} 2^{−j·2s} (I(ψ^j_{i,k}) − A_m(l,1)ψ^j_{i,k})² )^{1/2}.

Recall that the Cauchy–Schwarz inequality leads to a tight worst case error bound. Because of the polynomial exactness and the vanishing moments we therefore get

err(H_{s,n}, A_m(l,1))² = Σ_{j ≥ l} Σ_{k∈∇_j} Σ_{i=0}^{n−1} 2^{−j·2s} (A_m(l,1)ψ^j_{i,k})².

By some easy calculations and with the identities supp ψ^j_{i,k} = I^j_k and ‖ψ^j_{i,k}‖_∞ = 2^{j/2} ‖ψ_i‖_∞ we get

err(H_{s,n}, A_m(l,1))² ≤ Σ_{j ≥ l} Σ_{k∈∇_j} Σ_{i=0}^{n−1} 2^{−j·2s} { Σ_{k'∈∇_l} Σ_{μ=1}^m 2^{−l} |w_μ| ‖ψ^j_{i,k}‖_∞ 1_{I^j_k}(2^{−l}x_μ + 2^{−l}k') }²
= 2^{−2l} Σ_{j ≥ l} 2^{j(1−2s)} Σ_{k∈∇_j} Σ_{i=0}^{n−1} ‖ψ_i‖²_∞ { Σ_{k'∈∇_l} Σ_{μ=1}^m |w_μ| 1_{I^j_k}(2^{−l}x_μ + 2^{−l}k') }².

For j ≥ l and k ∈ ∇_j let κ = κ(j,k,l) be the unique element of ∇_l such that 2^{−l}κ ≤ 2^{−j}k < 2^{−j}(k+1) ≤ 2^{−l}(κ+1). Then

err(H_{s,n}, A_m(l,1))² ≤ 2^{−2l} Σ_{j ≥ l} 2^{j(1−2s)} Σ_{i=0}^{n−1} ‖ψ_i‖²_∞ Σ_{k∈∇_j} { Σ_{μ=1}^m |w_μ| 1_{I^j_k}(2^{−l}x_μ + 2^{−l}κ) }²
≤ 2^{−2l} Σ_{j ≥ l} 2^{j(1−2s)} Σ_{i=0}^{n−1} ‖ψ_i‖²_∞ Σ_{κ∈∇_l} { Σ_{μ=1}^m |w_μ| 1_{I^l_κ}(2^{−l}x_μ + 2^{−l}κ) }²
≤ 2^{−2l} Σ_{j ≥ l} 2^{j(1−2s)} Σ_{i=0}^{n−1} ‖ψ_i‖²_∞ |∇_l| { Σ_{μ=1}^m |w_μ| }².

Note that |∇_l| = 2^l. We can upper bound the integration error by

err(H_{s,n}, A_m(l,1))² ≤ 2^{−l} Σ_{j ≥ l} 2^{j(1−2s)} Σ_{i=0}^{n−1} ‖ψ_i‖²_∞ { Σ_{μ=1}^m |w_μ| }²
= Σ_{i=0}^{n−1} ‖ψ_i‖²_∞ { Σ_{μ=1}^m |w_μ| }² 2^{−l·2s} Σ_{j ≥ 0} 2^{j(1−2s)}
= Σ_{i=0}^{n−1} ‖ψ_i‖²_∞ { Σ_{μ=1}^m |w_μ| }² · 2^{−l·2s} / (1 − 2^{1−2s}).

Thus we proved that (3.5) holds with the constant

C = (1 − 2^{1−2s})^{−1/2} ( Σ_{i=0}^{n−1} ‖ψ_i‖²_∞ )^{1/2} Σ_{μ=1}^m |w_μ|. □
Remark 3.2. The error estimate in Theorem 3.1 is asymptotically optimal as Theorem 4.9 will reveal.
4. Multivariate numerical integration

4.1. The d-dimensional cubature method

Now we extend our one-dimensional algorithm A_m(l,1) to a d-dimensional cubature. This is done via Smolyak's construction: the so-called difference quadrature of level l ≥ 0 is defined by

Δ_l := A_m(l,1) − A_m(l−1,1),  with A_m(−1,1) := 0.

Smolyak's construction of level L is then given by

A_m(L,d) := Σ_{l ∈ N_0^d, |l| ≤ L} (Δ_{l_1} ⊗ Δ_{l_2} ⊗ ··· ⊗ Δ_{l_d}).

Examples of sets of sample points used by Smolyak's algorithm are provided in Fig. 1. Notice that we have Δ_0 = Q_m.

Fig. 1. A_3(5,2) and A_2(3,2) with underlying Gauss quadrature. In the right diagram "+" denotes sample points with positive, "o" sample points with negative weights.

Let us recall that in the one-dimensional case A_m(l,1) is exact on V^l_n. In the d-dimensional case, it is not too difficult to show the exactness of A_m(L,d) on V^{d,L}_n.

Theorem 4.1. The cubature A_m(L,d) is exact on the approximation space V^{d,L}_n.

The proof follows the lines of the proof of [14, Theorem 2] and proceeds via induction over the dimension.

4.2. Upper bounds for the cubature error

For the error analysis we consider product spaces which are based on the spaces H_{s,n} used for our one-dimensional quadrature error bounds. These seem to be the natural spaces for our variation of Smolyak's construction. For a function f we define a norm

|f|²_{d,s,n} := Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{|i|_∞=0}^{n−1} 2^{|j|·2s} ⟨f, ψ^j_{i,k}⟩²  (4.1)

and the space

H^d_{s,n} := {f ∈ L_2[0,1]^d | |f|_{d,s,n} < ∞}.
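For a tensor-product integrand the functional Δ_{l_1} ⊗ ··· ⊗ Δ_{l_d} factorizes into a product of one-dimensional differences, which gives a compact way (our illustrative code, not the paper's implementation) to evaluate A_m(L,d) and to observe the exactness asserted by Theorem 4.1:

```python
import math
from itertools import product

def A1(l, pts, wts, f):
    """One-dimensional quadrature A_m(l,1); by convention A_m(-1,1) := 0."""
    if l < 0:
        return 0.0
    h = 2.0 ** (-l)
    return sum(h * w * f(h * x + h * k)
               for k in range(2 ** l) for x, w in zip(pts, wts))

def delta(l, pts, wts, f):
    """Difference quadrature Delta_l = A_m(l,1) - A_m(l-1,1)."""
    return A1(l, pts, wts, f) - A1(l - 1, pts, wts, f)

def smolyak_product(L, factors, pts, wts):
    """A_m(L,d) applied to f(x) = prod_u f_u(x_u): the tensor functional
    factorizes, (Delta_l1 x ... x Delta_ld)(f) = prod_u Delta_{l_u}(f_u)."""
    d = len(factors)
    total = 0.0
    for lvec in product(range(L + 1), repeat=d):
        if sum(lvec) <= L:
            term = 1.0
            for lu, fu in zip(lvec, factors):
                term *= delta(lu, pts, wts, fu)
            total += term
    return total

# 2-point Gauss base rule on [0,1] (exact for cubics)
pts = [0.5 - 0.5 / math.sqrt(3), 0.5 + 0.5 / math.sqrt(3)]
wts = [0.5, 0.5]

# f(x,y) = x^3 y^3 already lies in the level-0 approximation space,
# so A_m(L,2) integrates it exactly for every L >= 0
val = smolyak_product(2, [lambda x: x ** 3, lambda y: y ** 3], pts, wts)
assert abs(val - 1.0 / 16.0) < 1e-12
```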
In [24, Lemma 2] Wasilkowski and Woźniakowski provided an error bound that is valid not only for d-dimensional cubatures, but also for more general d-dimensional approximation algorithms based on Smolyak's construction. Adapting the corresponding proof, we see that our one-dimensional error bound from Theorem 3.1 implies the following result.

Theorem 4.2. Let d, n ∈ N, and let the one-dimensional quadrature Q_m be exact on Π_n. For s > 1/2 let C be the constant from (3.5). The worst case error of A_m(L,d) satisfies

err(H^d_{s,n}, A_m(L,d)) ≤ C max{2, C(1+2^s)}^{d−1} binom(L+d, d−1) 2^{−Ls}.

Instead of explaining the proof in detail we want to provide a better upper bound in which essentially the term binom(L+d, d−1) ∼ L^{d−1} is replaced by L^{(d−1)/2}. Before establishing the corresponding theorem, we state a simple helpful lemma and a well-known identity.

Lemma 4.3. Let i, j ∈ Z^d with n−1 ≥ i ≥ 0 and j ≥ −1, and let k ∈ ∇_j. Assume that j_d = −1 and i_d = 0. If i', j' and k' denote the (d−1)-dimensional vectors consisting of the first d−1 components of i, j and k, respectively, then we have for all L ∈ N_0

A_m(L,d) ψ^j_{i,k} = A_m(L,d−1) ψ^{j'}_{i',k'}.

Proof. We have ψ^{j_d}_{i_d,k_d} = ψ^{−1}_0 = φ_0 = 1_{[0,1)}, implying Δ_0 ψ^{j_d}_{i_d,k_d} = 1 and Δ_l ψ^{j_d}_{i_d,k_d} = 0 for all l ≥ 1. Now the lemma follows immediately from the definition of A_m(L,d). □

A well-known formula expressing A_m(L,d) solely in terms of tensor quadratures is

A_m(L,d) = Σ_{L−d+1 ≤ |l| ≤ L} (−1)^{L−|l|} binom(d−1, L−|l|) ⊗_{u=1}^d A_m(l_u,1).  (4.2)
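The combination identity (4.2) can be checked numerically against the defining sum of tensorized differences; the sketch below (our code, restricted to product integrands so both forms factorize) compares the two:

```python
import math
from itertools import product

def A1(l, pts, wts, f):
    """One-dimensional A_m(l,1); A_m(-1,1) := 0."""
    if l < 0:
        return 0.0
    h = 2.0 ** (-l)
    return sum(h * w * f(h * x + h * k)
               for k in range(2 ** l) for x, w in zip(pts, wts))

def smolyak_diff(L, d, pts, wts, fs):
    """A_m(L,d) via the differences Delta_l, for f(x) = prod_u fs[u](x_u)."""
    out = 0.0
    for l in product(range(L + 1), repeat=d):
        if sum(l) <= L:
            out += math.prod(A1(lu, pts, wts, fu) - A1(lu - 1, pts, wts, fu)
                             for lu, fu in zip(l, fs))
    return out

def smolyak_comb(L, d, pts, wts, fs):
    """Identity (4.2): only levels L-d+1 <= |l| <= L, binomial weights."""
    out = 0.0
    for l in product(range(L + 1), repeat=d):
        if L - d + 1 <= sum(l) <= L:
            coef = (-1) ** (L - sum(l)) * math.comb(d - 1, L - sum(l))
            out += coef * math.prod(A1(lu, pts, wts, fu)
                                    for lu, fu in zip(l, fs))
    return out

pts, wts = [0.5], [1.0]          # midpoint rule as the base quadrature
fs = [math.exp, math.sin, lambda x: x * x]
a = smolyak_diff(4, 3, pts, wts, fs)
b = smolyak_comb(4, 3, pts, wts, fs)
assert abs(a - b) < 1e-10        # both forms agree up to rounding
```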
A proof of this identity can, e.g., be found in [24, Lemma 1].

Theorem 4.4. Let d, n ∈ N, and let the one-dimensional quadrature Q_m be exact on Π_n. For s > 1/2 there exists a constant C > 0 such that for all L > 0 the worst case error of A_m(L,d) satisfies

err(H^d_{s,n}, A_m(L,d)) ≤ C 2^{−Ls} L^{(d−1)/2}.
Proof. For the sake of brevity we do not try to give a reasonably good bound for the constant C in the theorem; instead we use rather rough estimates and a generic constant C, which may depend on n, m, s and d, but not on the given level L.
We proceed via induction on d. The case d = 1 has already been treated in Theorem 3.1. Let now d ≥ 2, and let the induction hypothesis hold for d − 1. Similarly as in the one-dimensional case we can use the Cauchy–Schwarz inequality and the exactness of A_m(L,d) on V^{d,L}_n to get

err(H^d_{s,n}, A_m(L,d))² = Σ_{j ≥ −1, |j| ≥ L−d+1} Σ_{k∈∇_j} Σ_{|i|_∞=0}^{n−1} 2^{−|j|·2s} {A_m(L,d) ψ^j_{i,k}}²;

hereby note that ψ^j_{i,k} ∈ ⊗_{ν=1}^d V^{l_ν}_n if and only if l_ν > j_ν for all ν ∈ {1, ..., d}, i.e., ψ^j_{i,k} ∈ V^{d,L}_n if and only if |j| < L − d + 1. To avoid technical difficulties, we now show that the summation over the index sets

U(ν) := {(i,j) | n−1 ≥ |i|_∞ ≥ 0, i_ν = 0, |j| ≥ L−d+1, j_ν = −1}

for ν ∈ {1, ..., d} does not contribute essentially to the square of the worst case error. Indeed, if i', j', and k' denote the (d−1)-dimensional vectors obtained by deleting the νth component, then Lemma 4.3 yields

Σ_{(i,j)∈U(ν)} Σ_{k∈∇_j} 2^{−|j|·2s} {A_m(L,d) ψ^j_{i,k}}²
= Σ_{|i'|_∞=0}^{n−1} Σ_{|j'| ≥ L−(d−2)} Σ_{k'∈∇_{j'}} 2^{−(|j'|−1)·2s} {A_m(L,d−1) ψ^{j'}_{i',k'}}²
≤ 2^{2s} err(H^{d−1}_{s,n}, A_m(L,d−1))² ≤ C 2^{−2Ls} L^{d−2},

where in the last step we used the induction hypothesis. So let us now consider solely pairs (i,j) where for all ν ∈ {1, ..., d} we have i_ν ≥ 1 or j_ν ≥ 0. For such pairs (i,j), for k ∈ ∇_j and κ ∈ {L−d+1, ..., L} let us define

S^{j,κ}_{i,k} := { l ∈ N_0^d | |l| = κ ∧ Π_{u=1}^d A_m(l_u,1) ψ^{j_u}_{i_u,k_u} ≠ 0 }.

If j_u < l_u then, due to the exactness of A_m(l_u,1) on V^{l_u}_n, we have A_m(l_u,1) ψ^{j_u}_{i_u,k_u} = 0 (since i_u ≥ 1 or j_u ≥ 0, the integral of ψ^{j_u}_{i_u,k_u} vanishes). Thus

S^{j,κ}_{i,k} ⊆ S̃^{j,κ} := { l ∈ N_0^d | |l| = κ ∧ ∀u ∈ {1, ..., d}: l_u ≤ j_u }.

A coarse estimate of the cardinality of S̃^{j,κ} is

|S̃^{j,κ}| ≤ binom(|j| − κ + d − 1, d−1).

(One can verify this bound by starting with j and counting the ways to distribute the difference |j| − κ over the components of j to get an l ∈ Z^d with |l| = κ and l_u ≤ j_u for 1 ≤ u ≤ d.) With these observations and with identity (4.2) we get

err(H^d_{s,n}, A_m(L,d))² ≤ C 2^{−2Ls} L^{d−2} + C Σ_{|i|_∞=0}^{n−1} Σ_{|j| ≥ L−d+1} Σ_{k∈∇_j} 2^{−|j|·2s} { Σ_{κ=L−d+1}^{L} Σ_{l∈S̃^{j,κ}} | Π_{u=1}^d A_m(l_u,1) ψ^{j_u}_{i_u,k_u} | }².

Since for l ∈ S̃^{j,κ} the tensor quadrature ⊗_{u=1}^d A_m(l_u,1) uses at most m^d sample points from supp(ψ^j_{i,k}), we have

| Π_{u=1}^d A_m(l_u,1) ψ^{j_u}_{i_u,k_u} | ≤ 2^{−|l|} m^d Π_{u=1}^d ‖ψ^{j_u}_{i_u,k_u}‖_∞ ≤ 2^{−κ} m^d 2^{|j|/2} M,

where

M := ( max_{i=0,...,n−1} max{‖φ_i‖_∞, ‖ψ_i‖_∞} )^d.

Since each of the tensor quadratures ⊗_{u=1}^d A_m(l_u,1) uses not more than m^d 2^L points, we have to make at most C binom(|j|−(L−d+1)+d−1, d−1) 2^L function evaluations to calculate the term inside the parentheses. For fixed i and j all the ψ^j_{i,k}, k ∈ ∇_j, have pairwise disjoint support, and thus only the summation over some subset ∇̃_j of ∇_j with

|∇̃_j| ≤ C binom(|j|−(L−d+1)+d−1, d−1) 2^L

yields a non-trivial contribution to our estimate. Altogether we get (suppressing the lower order term C 2^{−2Ls} L^{d−2})

err(H^d_{s,n}, A_m(L,d))² ≤ C Σ_{κ=L−d+1}^∞ Σ_{|j|=κ} |∇̃_j| 2^{−2sκ} { binom(κ−(L−d+1)+d−1, d−1) 2^{−L} 2^{κ/2} }².

Our estimate for |∇̃_j| and |{ j | j ≥ −1, |j| = κ }| = binom(κ+2d−1, d−1) lead to

err(H^d_{s,n}, A_m(L,d))² ≤ C 2^{−L} Σ_{κ=L−d+1}^∞ 2^{κ(1−2s)} binom(κ+2d−1, d−1) binom(κ−(L−d+1)+d−1, d−1)³
≤ C 2^{−L} 2^{(L−d+1)(1−2s)} Σ_{κ=0}^∞ 2^{κ(1−2s)} binom(L+κ+d, d−1) binom(κ+d−1, d−1)³
≤ C ( Σ_{κ=0}^∞ 2^{κ(1−2s)} binom(κ+d−1, d−1)⁴ ) 2^{−2Ls} L^{d−1}.

The sum inside the parentheses converges as s > 1/2. □
From the abstract definition of our function space H_{s,n} it is not immediately clear whether it contains a reasonable class of interesting functions beyond the piecewise polynomials. At least in the case where the parameter n is strictly larger than s, the Sobolev space H^s[0,1] is continuously embedded in H_{s,n}. There are several ways to define Sobolev spaces with non-integer index s ∈ R; one can use for example the Fourier transform

f̂(ξ) := ∫_R f(x) e^{−iξx} dx
to define the norm

‖f‖²_s = ∫_R (1 + |y|²)^s |f̂(y)|² dy

and the space H^s(R) = { f ∈ L_2 | ‖f‖_s < ∞ }. For the interval [0,1] we define H^s[0,1] = H^s(R)|_{[0,1]} by restriction, i.e., f ∈ H^s[0,1] if there exists a function g ∈ H^s(R) such that g|_{[0,1]} = f in the sense of distributions, and

‖f‖_{H^s[0,1]} = inf_{g: f = g|_{[0,1]}} ‖g‖_s.

The continuous embedding of H^s[0,1] into H_{s,n} is established by a Jackson type inequality.

Theorem 4.5. Let (ψ_i)_{i=0}^{n−1} be multiwavelets of order n. For all s < n the inclusion H^s[0,1] ⊂ H_{s,n} holds. More precisely, there exists a constant K > 0 such that for every f ∈ H^s[0,1] we have

Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{i=0}^{n−1} 2^{j·2s} ⟨f, ψ^j_{i,k}⟩² ≤ K² ‖f‖²_{H^s[0,1]}.

For a proof of the theorem see, e.g., [5,18,23]. Notice that in general we cannot hope to prove equivalence of the norms on H_{s,n} and H^s[0,1]. This is obvious in the case where s > 1/2: H_{s,n} contains discontinuous functions, while H^s[0,1] does not.

The mixed Sobolev space H^s_mix is defined by

H^s_mix = H^s[0,1] ⊗ H^s[0,1] ⊗ ··· ⊗ H^s[0,1]  (d times),

i.e., it is the complete d-fold tensor product of the Hilbert space H^s[0,1]. In terms of H^s_mix, Theorem 4.4 reads as follows:

Corollary 4.6. Let s > 1/2 and n > s. Let the one-dimensional quadrature Q_m be exact on Π_n. Then there exists a constant C > 0 such that for every L > 0

err(H^s_mix, A_m(L,d)) ≤ C 2^{−Ls} L^{(d−1)/2}.
Now we analyze the cost of the cubature algorithm A_m(L,d). Identity (4.2) shows clearly that the number of multiplications and additions performed by the algorithm A_m(L,d) is more or less proportional to the number of function evaluations. Since the cost of one function evaluation is in general much greater than the cost of an arithmetic operation, we concentrate here on the number of sample points N = N_m(L,d) used by A_m(L,d). Since for l ∈ N_0^d and a general d-variate function f the operator ⊗_{u=1}^d A_m(l_u,1) uses 2^{|l|} m^d function values, identity (4.2)
gives us

N ≤ Σ_{L−d+1 ≤ |l| ≤ L} 2^{|l|} m^d ≤ m^d 2^L Σ_{j=0}^{d−1} 2^{j−d+1} binom(L+j, d−1) ≤ m^d 2^{L+1} binom(L+d−1, d−1).
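The point count and the effect of nestedness can be made concrete by enumerating the union of the product grids that contribute in (4.2); a rough sketch (our code, comparing a 2-point Gauss rule as a non-nested base rule with the nested rule Q_m(f) = f(0)):

```python
import math
from itertools import product

def sample_points(L, d, pts):
    """Distinct sample points of A_m(L,d): by the combination formula (4.2)
    only levels with L-d+1 <= |l| <= L contribute product grids."""
    grid = set()
    for lvec in product(range(L + 1), repeat=d):
        if not (L - d + 1 <= sum(lvec) <= L):
            continue
        axes = [[(x + k) / 2 ** lu for k in range(2 ** lu) for x in pts]
                for lu in lvec]
        grid.update(product(*axes))
    return grid

L, d = 5, 2
gauss = [0.5 - 0.5 / math.sqrt(3), 0.5 + 0.5 / math.sqrt(3)]  # non-nested
left = [0.0]                                                   # nested
n_gauss = len(sample_points(L, d, gauss))
n_left = len(sample_points(L, d, left))
bound = 2 ** d * 2 ** (L + 1) * math.comb(L + d - 1, d - 1)    # m = 2 case
assert n_gauss <= bound
assert n_left < n_gauss   # nested dyadic points coincide across levels
```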
The bound on N can be improved if our cubatures (A_m(L,d))_L are nested, i.e., if the set of sample points used by A_m(L,d) is a subset of the set of sample points of A_m(L+1,d) for all L. As pointed out in Section 3.1, the right choice of the underlying quadrature Q_m implies that the quadratures (A_m(l,1))_l are nested, which again implies (see (4.2)) that the cubatures (A_m(L,d))_L are nested. Although we get for our cubatures, regardless of whether they are nested or not, the asymptotic estimate N = O(2^L L^{d−1}), the hidden constants in the big-O notation are reasonably smaller if we have nestedness. The upper bound on N, Theorem 4.4, and some elementary calculations lead to the following corollary.

Corollary 4.7. Let d, n ∈ N and let Q_m be exact on Π_n. For s > 1/2 the worst case error of A_m(L,d) satisfies

err(H^d_{s,n}, A_m(L,d)) = O( log(N_m(L,d))^{(d−1)(s+1/2)} / N_m(L,d)^s ).
Remark 4.8. Recall that H^s_mix is continuously embedded in H^d_{s,n} if s < n. In this situation Corollary 4.7 holds in particular for H^s_mix in place of H^d_{s,n}.

4.3. Lower bounds for the cubature error

In the previous section we discussed error bounds for our d-dimensional cubature rule based on Smolyak's construction with respect to the spaces H^d_{s,n} and H^s_mix. For the considered spaces H^d_{s,n} there is a general method to prove lower bounds for the worst case error of any cubature Q_N. In [13] Heinrich et al. presented a lower bound for Haar wavelet spaces that can be extended to the spaces H^d_{s,n}. (It is not hard to verify that their spaces H_{wav,s} coincide (for base b = 2) with our spaces H^d_{s,1}.) The idea is to construct a finite linear combination f of weighted (multi)wavelet series that is zero on all canonical intervals of a fixed chosen level which contain a sample point of Q_N. This should be done in such a way that the d-dimensional integral I(f) is large while the norm |f|_{d,s,n} remains small. (Similar proof ideas have appeared in the mathematical literature before; cf., e.g., the well-known proof of Roth of the lower bound for the L_2-discrepancy [17].)

Theorem 4.9. Let s > 1/2 and n ∈ N. There exists a constant C > 0 such that for any d-dimensional cubature rule Q_N using N sample points we have

err(H^d_{s,n}, Q_N) ≥ C (log N)^{(d−1)/2} / N^s.
Proof. Let P ⊂ [0,1]^d, |P| = N, be the set of sample points used by the cubature rule Q_N. For all l ∈ N_0^d we define a function

f_l(x) = 1 for all x ∈ I^l_k, k ∈ ∇_l with I^l_k ∩ P = ∅, and f_l(x) = 0 else.

Now we choose the uniquely determined integer L that satisfies 2^{L−1} < 2N ≤ 2^L and define a function

f = Σ_{|l|=L} f_l.

Hence we get for the norm of our candidate

|f|²_{d,s,n} = Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{|i|_∞=0}^{n−1} 2^{|j|·2s} ⟨f, ψ^j_{i,k}⟩²
= Σ_{|l|=|l'|=L} Σ_{j ≥ −1} Σ_{k∈∇_j} Σ_{|i|_∞=0}^{n−1} 2^{|j|·2s} ⟨f_l, ψ^j_{i,k}⟩ ⟨f_{l'}, ψ^j_{i,k}⟩.

Due to (2.1) the inner product ⟨f_l, ψ^j_{i,k}⟩ vanishes if one of the indices ν satisfies j_ν ≥ l_ν ≥ 0. Furthermore, if we put M := ( max_{i=0,...,n−1} max{‖φ_i‖_∞, ‖ψ_i‖_∞} )^d, we have

|⟨f_l, ψ^j_{i,k}⟩| ≤ ‖ψ^j_{i,k}‖_∞ ‖f_l‖_∞ vol(I^j_k) ≤ M |∇_j|^{−1/2}.

Therefore we get

|f|²_{d,s,n} ≤ n^d M² Σ_{|l|=|l'|=L} Σ_j 2^{|j|·2s} |∇_j|^{−1} …
Since l_r, r > 2, is not of cotype 2, we get that for X = l_r, r > 2, (∗) is not true. It is known that (∗) is true for Banach spaces which have the Sazonov property (see [26, Theorem 6.2.4]). For example, the space X = l_r, 1 ≤ r ≤ 2, has the Sazonov property (see [26, Corollary to Theorem 6.2.1]), and therefore (∗) is true for X = l_r, 1 ≤ r ≤ 2.

3.6. Gaussian disintegration

In this subsection we will establish the existence of a disintegration of a Gaussian measure with respect to a continuous linear mapping.

Theorem 3.11. Let X, Y be real separable Banach spaces, μ be a Gaussian measure on B(X) with mean zero and covariance operator C : X^∗ → X. Let also T : X → Y be a continuous linear operator and ν := T(μ) be the image of μ under T. Then there exist a Borel measurable mapping m : Y → X, a Gaussian covariance R : X^∗ → X with R ≤ C and a disintegration (q_y)_{y∈Y} of μ on B(X) with respect to T such that for a fixed y ∈ Y, q_y is a Gaussian measure on B(X) with mean m(y) ∈ X and covariance operator R. Moreover:

(a) If C_ν = T C T^∗ : Y^∗ → Y is a finite-rank operator, then ν(C_ν(Y^∗)) = 1 and the mapping m : Y → X is a continuous linear operator with the property T(m(y)) = y, ∀y ∈ C_ν(Y^∗).

(b) If C_ν = T C T^∗ : Y^∗ → Y is not a finite-rank operator, then there exists a vector subspace Y_0 ⊂ Y such that Y_0 ∈ B(Y), ν(Y_0) = 1 and the restriction of the mapping m : Y → X to Y_0 is a Borel measurable linear operator with the property T(m(y)) = y, ∀y ∈ Y_0.

Proof. Clearly, ν is a Gaussian measure on B(Y) with mean m_ν = T(m_μ) = 0 and covariance operator C_ν = T C T^∗. We consider separately three cases and show that in each of these cases the conditions from Subsection 2.3 are satisfied.
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
Case 1: C_ν = 0. In this case the conclusion of the theorem is satisfied with the identically zero mapping m : Y → X and with the Gaussian covariance R = C.

Case 2: 1 ≤ dim(C_ν(Y^∗)) < ∞. We have ν(C_ν(Y^∗)) = 1 by Lemma 3.3. By Lemma 3.5 we can select from Y^∗ a finite C_ν-representing sequence y_i^∗, i = 1, ..., n, and write x_i^∗ = T^∗ y_i^∗, i = 1, ..., n. Define then mappings m : Y → X and R : X^∗ → X by the equalities

m(y) = Σ_{i=1}^n ⟨y, y_i^∗⟩ C x_i^∗,  ∀y ∈ Y,

and

R x^∗ = C x^∗ − Σ_{i=1}^n ⟨C x_i^∗, x^∗⟩ C x_i^∗,  ∀x^∗ ∈ X^∗.

Clearly m : Y → X is continuous linear, and the equality T(m(y)) = y, ∀y ∈ C_ν(Y^∗), holds because y_i^∗, i = 1, ..., n, is a C_ν-representing sequence. Let us see that R : X^∗ → X is a Gaussian covariance with R ≤ C. In fact, define R_1 : X^∗ → X by the equality

R_1 x^∗ = Σ_{i=1}^n ⟨C x_i^∗, x^∗⟩ C x_i^∗,  ∀x^∗ ∈ X^∗.

Clearly, the C_ν-orthonormality of y_i^∗, i = 1, ..., n, implies C-orthonormality of x_i^∗, i = 1, ..., n. Hence by Lemma 3.4(b), R_1 : X^∗ → X is a symmetric positive operator and R_1 ≤ C. This shows that the operator R = C − R_1 is also symmetric positive and satisfies the condition R ≤ C. Therefore, by Proposition 3.9, R is a Gaussian covariance. Since R is a Gaussian covariance, according to Lemma 3.8, for every y ∈ Y we get the existence of the Gaussian measure q_y on B(X) with the mean m(y) and with the covariance operator R.

Now we show that the family (q_y)_{y∈Y} is a disintegration of μ with respect to T. Fix A ∈ B(X). The function y → q_y(A) is B(Y)-measurable as the composition of the B(X)-measurable function x → q_0(A − x) with the continuous linear mapping m : Y → X. Consequently, condition (Dis1) is satisfied. (Dis2) is also satisfied with Y_0 := C_ν(Y^∗). In fact, fix y ∈ Y_0. As we have noted, ν(C_ν(Y^∗)) = 1 and the equality T(m(y)) = y holds. The Gaussian measure q_y ∘ T^{−1} has the mean T(m(y)) = y and the covariance operator T R T^∗ = 0, hence (see Lemma 3.3) q_y ∘ T^{−1}({y}) = 1 and therefore q_y(T^{−1}({y})) = 1. Let us check now (Dis3); we must show that μ is equal to the mixture of (q_y)_{y∈Y} with respect to the mixing measure ν. Taking into account implication (ii) ⇒ (i) of Proposition 3.2, it is sufficient to prove the equality

μ̂(x^∗) = ∫_Y q̂_y(x^∗) dν(y),  ∀x^∗ ∈ X^∗.  (3.7)
Fix x^∗ ∈ X^∗. Since

q̂_y(x^∗) = exp{ i⟨m(y), x^∗⟩ − (1/2)⟨R x^∗, x^∗⟩ },  ∀y ∈ Y,

we get

∫_Y q̂_y(x^∗) dν(y) = exp{ −(1/2)⟨R x^∗, x^∗⟩ } ∫_Y exp{ i⟨m(y), x^∗⟩ } dν(y).
Clearly,

∫_Y exp{ i⟨m(y), x^∗⟩ } dν(y) = ν̂(m^∗ x^∗) = exp{ −(1/2)⟨C_ν m^∗ x^∗, m^∗ x^∗⟩ }.
Since m^∗ x^∗ = Σ_{i=1}^n ⟨C x_i^∗, x^∗⟩ y_i^∗ and y_i^∗, i = 1, ..., n, are C_ν-orthonormal, we have ⟨C_ν m^∗ x^∗, m^∗ x^∗⟩ = Σ_{i=1}^n ⟨C x_i^∗, x^∗⟩² = ⟨R_1 x^∗, x^∗⟩. Therefore, we get

∫_Y q̂_y(x^∗) dν(y) = exp{ −(1/2)⟨R x^∗, x^∗⟩ } exp{ −(1/2)⟨R_1 x^∗, x^∗⟩ } = exp{ −(1/2)⟨C x^∗, x^∗⟩ },

and consequently relation (3.7) is proved.

Case 3: dim(C_ν(Y^∗)) = ∞. By Lemma 3.5 we can select from Y^∗ an infinite C_ν-representing sequence y_i^∗, i = 1, 2, ..., and write x_i^∗ = T^∗ y_i^∗, i = 1, 2, .... For a fixed natural number n introduce a continuous linear mapping m_n : Y → X by the equality

m_n(y) = Σ_{i=1}^n ⟨y, y_i^∗⟩ C x_i^∗,  ∀y ∈ Y.

Let Y_2 := { y ∈ Y : the sequence (m_n(y))_{n∈N} converges in X } and Y_3 := { y ∈ Y : lim_n ‖y − Σ_{i=1}^n ⟨y, y_i^∗⟩ C_ν y_i^∗‖_Y = 0 }. Introduce then a mapping m : Y → X as follows: m(y) = 0 for y ∈ Y \ Y_2 and

m(y) = Σ_{i=1}^∞ ⟨y, y_i^∗⟩ C x_i^∗ = lim_n Σ_{i=1}^n ⟨y, y_i^∗⟩ C x_i^∗,  ∀y ∈ Y_2.

Define also mappings R_1, R : X^∗ → X by the equalities

R_1 x^∗ = Σ_{i=1}^∞ ⟨C x_i^∗, x^∗⟩ C x_i^∗,  ∀x^∗ ∈ X^∗,  R = C − R_1.  (3.8)

Since C_ν-orthonormality of y_i^∗, i = 1, 2, ..., implies C-orthonormality of x_i^∗, i = 1, 2, ..., by Lemma 3.4(b) the equality (3.8) defines a symmetric positive operator R_1 : X^∗ → X with R_1 ≤ C. This shows that the operator R = C − R_1 is also symmetric positive and satisfies the condition R ≤ C. Now we will see that the conclusion of the theorem is satisfied with these m and R. First we will prove the following statement.

Claim. We have ν(Y_2) = 1 and ν(Y_3) = 1.

Proof. As above we can see that

∫_Y exp{ i⟨m_n(y), x^∗⟩ } dν(y) = ν̂(m_n^∗ x^∗) = exp{ −(1/2)⟨C_ν m_n^∗ x^∗, m_n^∗ x^∗⟩ } = exp{ −(1/2) Σ_{i=1}^n ⟨C x_i^∗, x^∗⟩² },  ∀n ∈ N, ∀x^∗ ∈ X^∗.
Hence,

lim_n ∫_Y exp{ i⟨m_n(y), x^∗⟩ } dν(y) = exp{ −(1/2)⟨R_1 x^∗, x^∗⟩ },  ∀x^∗ ∈ X^∗.  (3.9)

Since R_1 ≤ C, by Proposition 3.9 R_1 is a Gaussian covariance. From this, according to Lemma 3.8, we get the existence of a mean-zero Gaussian measure μ_1 on B(X) with the covariance operator R_1. From this and (3.9) we get

lim_n ∫_Y exp{ i⟨m_n(y), x^∗⟩ } dν(y) = μ̂_1(x^∗),  ∀x^∗ ∈ X^∗.  (3.10)
Observe now that, since ν is a mean-zero Gaussian measure, the C_ν-orthonormality of y_i^∗, i = 1, 2, ..., implies that y_i^∗, i = 1, 2, ..., are independent standard Gaussian random variables on the probability space (Y, B(Y), ν). This observation and relation (3.10), according to the Ito–Nisio theorem (see implication (c) ⇒ (a) of [26, Theorem 5.2.4]), imply ν(Y_2) = 1. The equality ν(Y_3) = 1 can be verified analogously, and our claim is proved.

Now we continue the proof of the theorem. Since R ≤ C, by Proposition 3.9 R is a Gaussian covariance. From this, according to Lemma 3.8, for every y ∈ Y we get the existence of the Gaussian measure q_y on B(X) with the mean m(y) and the covariance operator R. We now show that the family (q_y)_{y∈Y} is the disintegration of μ with respect to T.

(Dis1) Clearly m is a Borel measurable mapping. Hence, (Dis1) can be verified as in Case 2.

(Dis2) is also satisfied with Y_0 := Y_2 ∩ Y_3. In fact, according to our claim we have ν(Y_0) = 1. Fix y ∈ Y_0. The equality T(m(y)) = y holds because on the one hand lim_n m_n(y) = m(y) (as y ∈ Y_2), hence lim_n T(m_n(y)) = T(m(y)); on the other hand, lim_n ‖T(m_n(y)) − y‖_Y = 0 (as y ∈ Y_3). Using this, we get that the Gaussian measure q_y ∘ T^{−1} has the mean T(m(y)) = y and the covariance operator T R T^∗ = 0, hence (see Lemma 3.3) q_y ∘ T^{−1}({y}) = 1, and therefore q_y(T^{−1}({y})) = 1.

(Dis3) Note first that according to the relation lim_n m_n(y) = m(y), ∀y ∈ Y_2, from ν(Y_2) = 1 and (3.9) we get
∫_Y exp{ i⟨m(y), x^∗⟩ } dν(y) = exp{ −(1/2)⟨R_1 x^∗, x^∗⟩ },  ∀x^∗ ∈ X^∗.  (3.11)

Now (Dis3) can be verified using implication (ii) ⇒ (i) of Proposition 3.2 and relation (3.11) as in Case 2. □

Remark 3.12. (1) It follows from the uniqueness part of Theorem 2.4 that the disintegration described in Theorem 3.11 is unique.

(2) (Suggested to pay attention to by one of the referees.) If in Theorem 3.11 the mapping T is injective, then T(X) ∈ B(Y), there exists a vector subspace Y_0 ⊂ T(X) such that Y_0 ∈ B(Y), ν(Y_0) = 1 and m(y) = T^{−1}(y), ∀y ∈ Y_0; moreover, q_y = δ_{m(y)}, ∀y ∈ Y_0.

Corollary 3.13. Let X be a real separable Banach space, μ be a Gaussian measure on B(X) with mean zero and non-zero covariance operator C : X^∗ → X. Let also n be a natural number, x_i^∗, i = 1, ..., n, be a C-orthonormal sequence and T : X → R^n be the linear mapping induced by the sequence x_i^∗, i = 1, ..., n. Then ν = T(μ) is the standard Gaussian measure on B(R^n) and there exists a disintegration (q_y)_{y∈R^n} of μ on B(X) with respect to T such that for a fixed y = (y_1, ..., y_n) ∈ R^n, q_y is a Gaussian measure on B(X) with mean m(y) = Σ_{i=1}^n y_i C x_i^∗ ∈ X and covariance operator R : X^∗ → X defined by the equality R x^∗ = C x^∗ − Σ_{i=1}^n ⟨C x_i^∗, x^∗⟩ C x_i^∗, ∀x^∗ ∈ X^∗.
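In matrix form this corollary is ordinary Gaussian conditioning: writing the C-orthonormal functionals as rows of a matrix A with A C Aᵀ = I, one gets m(y) = C Aᵀ y and R = C − C Aᵀ A C. A sketch (our code, with arbitrarily chosen finite dimensions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2

# A mean-zero Gaussian on X = R^d with symmetric positive definite covariance C
B = rng.standard_normal((d, d))
C = B @ B.T + d * np.eye(d)

# Build a C-orthonormal sequence x_1*, ..., x_n* (rows of A) by
# Gram-Schmidt in the inner product <u, v>_C = u^T C v
A = rng.standard_normal((n, d))
for i in range(n):
    for j in range(i):
        A[i] -= (A[j] @ C @ A[i]) * A[j]
    A[i] /= np.sqrt(A[i] @ C @ A[i])
assert np.allclose(A @ C @ A.T, np.eye(n))   # nu = T(mu) is standard Gaussian

# The disintegration of Corollary 3.13: conditional mean and covariance
y = rng.standard_normal(n)
m_y = C @ A.T @ y                 # m(y) = sum_i y_i C x_i*
R = C - C @ A.T @ A @ C           # R x* = C x* - sum_i <C x_i*, x*> C x_i*

assert np.allclose(A @ m_y, y)    # T(m(y)) = y
# R agrees with the classical Gaussian conditioning formula
R_classic = C - C @ A.T @ np.linalg.inv(A @ C @ A.T) @ A @ C
assert np.allclose(R, R_classic)
```

Because A C Aᵀ = I, the usual Schur-complement formula collapses to R = C − C Aᵀ A C, which is exactly the operator in the corollary.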
Remark 3.14. (1) Corollary 3.13 was obtained earlier in [12]; this result is presented also in [24, Appendix, Lemma 2.9.6] and in [20, Theorem 3.4.1]. One of the key points of the proof in [12] is Proposition 3.9, which was derived there from a statement which later turned out to be not correct in the general case (see Remark 3.10).

(2) Note finally that the conclusion of Corollary 3.13 remains valid also when μ is a Gaussian Radon measure in a Hausdorff locally convex space X and T : X → R^n is a μ-measurable linear mapping induced by a finite sequence of μ-measurable and μ-orthonormal linear functionals [2, Proposition 6.11.4].

4. The existence of average-case optimal algorithms

4.1. IBC-formulations

Let us describe briefly the best approximation problem in terms of the theory of IBC as it is presented in [24]. Let X, Y be non-empty sets, G a (real or complex) normed space, S : X → G and N : X → Y be mappings, and Φ a non-empty set of mappings φ : Y → G. Let us agree to call S the solution operator, N the information operator, and Φ the set of admissible algorithms. Moreover, fix a mapping e : G^X × G^X → [0, ∞] and call it the error criterion.

Problem. Compute an approximation of S by means of the given information N and the given algorithms φ ∈ Φ in such a way as to make the error e(S, φ ∘ N) as small as possible.

An algorithm φ_0 ∈ Φ which achieves the smallest possible error (whenever it exists) will be called the optimal algorithm. Traditionally, as an error criterion the functional e_∞ is chosen, defined by the equality

e_∞(S, T) = sup_{x∈X} ‖Sx − Tx‖_G,  S, T ∈ G^X.

For a given φ ∈ Φ the quantity e_∞(S, φ ∘ N) is called the worst-case error. An algorithm φ_0 ∈ Φ which achieves the smallest possible worst-case error (whenever it exists) will be called the worst-case optimal algorithm. We refer to [24] for the justification of this terminology and for illustrating examples.

To introduce a different error criterion, let us assume further that the set X is endowed with a σ-algebra A on which a probability measure μ is given, the set Y is endowed with a σ-algebra F, the solution operator S : X → G belongs to L_2(X, A, μ; G), the information operator N : X → Y is (A, F)-measurable and ν : F → [0,1] is the distribution of N with respect to μ. The set of admissible algorithms Φ is contained in L_2(Y, F, ν; G). As an error criterion let us choose the functional e_{2,μ} defined by the equality

e_{2,μ}(S, T) = ( ∫_X ‖Sx − Tx‖²_G dμ(x) )^{1/2},  S, T ∈ L_2(X, A, μ; G).

For a given φ ∈ Φ we have that φ ∘ N ∈ L_2(X, A, μ; G), hence the quantity e_{2,μ}(S, φ ∘ N) is well defined and it is called the average-case error. An algorithm φ_0 ∈ Φ which achieves the
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
865
smallest possible average-case error (whenever it exists) will be called the average-case optimal algorithm. 4.2. Average-case optimal algorithms via disintegration In this subsection we shall see that by using disintegration it is possible to prove the existence and, at the same time, to find an explicit form of the average-case optimal algorithm. Proposition 4.1. Let X, G, Y be separable Banach spaces and be a mean-zero Gaussian measure on B(X). Let, moreover, S : X → G be a continuous linear solution operator; : X → Y be a continuous linear information operator; (qy )y∈Y be the Gaussian disintegration of with respect to and finally m : Y → X be the mapping from Theorem 3.11. Then 0 = S ◦ m : Y → G is an average-case optimal algorithm for S and . Proof. As it is well known every Gaussian measure in a separable Banach space is of strong order 2. We have S ∈ L2 (X, B(X), ; G) because S is continuous linear and so ◦ S −1 is a Gaussian measure on B(G). Since qy , y ∈ Y are also Gaussian measures we also have
Sx 2G dqy (x) < ∞, ∀y ∈ Y. X
From the last relation and = q by Lemma 2.3 we can conclude that 0 ∈ L2 (Y, F, ; G). Fix arbitrarily y ∈ Y . Since the Gaussian measure qy has mean m(y) and S is a continuous linear operator, we have 0 (y) = Sm(y) = S(x) dqy (x). X
From the last equality, since the Gaussian measure qy ◦ S −1 is symmetric with respect to its mean 0 (y), we get 2
S(x) − (y) G dqy (x) S(x)−0 (y) 2G dqy (x), ∀∈L2 (Y, F, ; G). (4.1) X
X
−1 ({y})
for y ∈ Y . By property (Dis2) we can find Y0 ∈ F with (Y0 ) = 1 such Let Xy := that qy (Xy ) = 1, ∀y ∈ Y0 . Fix arbitrarily y ∈ Y0 and ∈ L2 (Y, F, ; G). Using inequality (4.1) and Lemma 2.3 we obtain 2 2
S(x) − ((x)) G d(x) =
S(x) − ((x)) G dqy (x) d (y) X
Y0
Xy
× Y0
Xy
Y0
=
X
and the proof is finished.
Xy
S(x) − (y) 2G dqy (x) d (y)
S(x) − 0 (y) 2G dqy (x)
S(x) − 0 ((x)) 2G d(x)
d (y)
866
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
Remark 4.2. Proposition 4.1 for the case Y = Rn and : X → Rn is a linear mapping induced by some C -orthonormal sequence xi∗ , i = 1, . . . , n, was obtained earlier in [24]. Acknowledgments We are grateful to the referees for their valuable remarks and suggestions. References [1] S.K. Berberian, A note on the disintegration of measures, Proc. Amer. Math. Soc. 71 (1) (1978) 115–116. [2] V.I. Bogachev, Gaussian measures, Mathematical Surveys and Monographs, vol. 62, American Mathematical Society, Providence, Rhode Island, 1998, xi, 433pp. [3] N. Bourbaki, Integration Vectorielle, Hermann, Paris, 1959, 105pp. Chapter VI. [4] S.D. Chaterji, Disintegration of measures and lifting. Vector and operator valued measures and applications, Proceedings of a symposium on vector and operator valued measures and applications, held at Snowbird Resort, Alta, Utah, August 7–12, 1972, Academic Press, New York, 1973, pp. 69–83. [5] Iu.A. Davydov, M.A. Lifshits, N.V. Smorodina, Local properties of distributions of stochastic functionals, translated from the 1995 Russian original by V. E. Naza˘ıkinski˘ı and M.A. Shishkova, Translations of Mathematical Monographs, vol. 173, American Mathematical Society, Providence, RI, 1998, xiv+184pp. ISBN 0-8218-0584-3. [6] G.A. Edgar, Disintegration of measures and the vector-valued Radon–Nikodým theorem, Duke Math. J. 42 (3) (1975) 447–450. [7] A.M. Faden, The existence of regular conditional probabilities: necessary and sufficient conditions, Ann. Probab. 13 (1985) 288–298. [8] S. Graf, L.D. Mauldin, A classification of disintegrations of measures, Measure and Measurable Dynamics (Rochester, NY, 1987), Contemporary Mathematics, vol. 94, American Mathemtical Society, Providence, RI, 1989, pp. 147–158. [9] H. Helson, Disintegration of measures, Harmonic Analysis and Hypergroups (Delhi, 1995), Trends in Mathematics, Birkhäuser Boston, Boston, MA, 1998, pp. 47–50. [10] J. 
Hoffmann-Jorgensen, The Theory of Analytic Spaces, Aarhus Universitet, Matematisk Institut, Various Publication Series, vol. 10, June, 1970, 314pp. [11] J. Hoffmann-Jorgensen, Existence of conditional probabilities, Math. Scand. 28 (1971) 257–264. [12] D. Lee, G.W. Wasilkowski, Approximation of linear functionals on a Banach space with a Gaussian measure, J. Complexity 2 (1) (1986) 12–43. [13] D. Maharam, Strict disintegration of measures, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 32 (1975) 73–79. [14] K. Musial, Existence of proper regular conditional probabilities, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 22 (1972) 8–12. [15] J.V. Neumann, Zur Operatoremethode In Der Klassischen Mechanik, Ann. Math. 33 (3) (1933) 587–682. [16] J. Neveu, Bases matematiques du calcul des probabilites, Masson et Cie, Paris, 1964 (Russian transl. Translated and annotated by V.V. Sazonov. Mir, Moscow, 1969, 309pp). [17] J.K. Pachl, Disintegration and compact measures, Math. Scand. 43 (1) (1978/1979) 157–168. [18] K.R. Parthasarathy, Probability Measures on Metric Spaces, Academic Press, New York, London, 1967. [19] K.R. Parthasarathy, Introduction to Probability and Measure, Springer, New York, 1978 xii+312pp. ISBN 0-38791135-9 (Russian translation: Moscow, Mir, 1983, 344pp.). [20] L. Plaskota, Noisy Information and Computational Complexity, Cambridge University Press, Cambridge, 1996 xii+308pp. [21] M.M. Rao, Conditional measures and applications, Marcel Dekker, New York, 1993 xiv+417pp. [22] A. Tortrat, Lois indefinement divisibles ( ∈ I ) dans un group topologique abelian metrisable X. Cas des espaces vectoriels, C. R. Acad. Sci. Paris 261 (1965) 4973–4975. [23] A. Tortrat, Désintégration d’une probabilité, statistiques exhaustives, Séminaire de Probabilités, XI (University of Strasbourg, Strasbourg, 1975/1976), Lecture Notes in Mathematics, vol. 581, Springer, Berlin, 1977, pp. 539–565, (in French). [24] J.F. Traub, G.W. Wasilkowski, H. 
Wozniakowski, Information-based complexity, with contributions by A. G. Werschultz and T. Boult, Computer Science and Scientific Computing, Academic Press, Boston, MA, 1988, xiv+523pp. [25] N.N. Vakhania, Probability Distributions on Linear Spaces, North-Holland, Amsterdam, 1981. [26] N.N. Vakhania, V.I. Tarieladze, S.A. Chobanyan, Probability Distributions on Banach Spaces, Reidel, Dordrecht, 1987.
Journal of Complexity 23 (2007) 867 – 889 www.elsevier.com/locate/jco
Free-knot spline approximation of stochastic processes Jakob Creutziga , Thomas Müller-Gronbachb , Klaus Rittera,∗ a Fachbereich Mathematik, Technische Universität Darmstadt, SchloYgartenstraYe 7, 64289 Darmstadt, Germany b Fakultät für Mathematik und Informatik, FernUniversität Hagen, LützowstraYe 125, 58084 Hagen, Germany
Received 8 December 2006; accepted 26 May 2007 Available online 22 June 2007 Dedicated to Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract We study optimal approximation of stochastic processes by polynomial splines with free knots. The number of free knots is either a priori fixed or may depend on the particular trajectory. For the s-fold integrated Wiener process as well as for scalar diffusion processes we determine the asymptotic behavior of the average Lp -distance to the splines spaces, as the (expected) number of free knots tends to infinity. © 2007 Elsevier Inc. All rights reserved. Keywords: Integrated Wiener process; Diffusion process; Stochastic differential equation; Optimal spline approximation; Free knots
1. Introduction Consider a stochastic process X = (X(t))t 0 with continuous paths on a probability space (, A, P ). We study optimal approximation of X on the unit interval by polynomial splines with free knots, which has first been treated in [11]. For k ∈ N and r ∈ N0 we let r denote the set of polynomials of degree at most r, and we consider the space k,r of polynomial splines =
k j =1
1]tj −1 ,tj ] · j ,
where 0 = t0 < · · · < tk = 1 and 1 , . . . , k ∈ r . Furthermore, we let Nk,r denote the class of ∗ Corresponding author.
E-mail address:
[email protected] (K. Ritter). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.05.003
868
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
mappings : → k,r , X and for 1p ∞ and 1 q < ∞ we define 1/q q ∈ Nk,r . ek,r (X, Lp , q) = inf E∗ X − X :X Lp [0,1] Here we use the outer expectation value E∗ in order to avoid cumbersome measurability considerations. The reader is referred to [21] for a detailed study of the outer integral and expectation. Note that ek,r (X, Lp , q) is the q-average Lp -distance of the process X to the spline space k,r . A natural extension of this methodology is not to work with an a priori chosen number of free knots, but only to control the average number of knots needed. This leads to the definition r = ∞ k=1 k,r and to the study of the class Nr of mappings : → r . X ∈ Nr we define For a spline approximation method X = E∗ (min{k ∈ N : X(·) ∈ k,r }), (X) − 1 is the expected number of free knots used by X. Subject to the bound (X)k, i.e., (X) the minimal achievable error for approximation of X in the class Nr is given by 1/q av q ∈ Nr , (X) k . (X, Lp , q) = inf E∗ X − X :X ek,r Lp [0,1] av as k tends to infinity. We shall study the asymptotics of the quantities ek,r and ek,r The spline spaces k,r form nonlinear manifolds that consist of k-term linear combinations of functions of the form 1]t,1] · with 0 t < 1 and ∈ r . We refer to [7, Section 6] for a detailed treatment in the context of nonlinear approximation. Hence we are addressing a so-called nonlinear approximation problem. While nonlinear approximation is extensively studied for deterministic functions, see [7] for a survey, much less is known for stochastic processes, i.e., for random functions. Here we refer to [2,3], where wavelet methods are analyzed, and to [11]. In the latter paper nonlinear approximation is related to approximation based on partial information, as studied in information-based complexity, and spline approximation with free knots is analyzed as a particular instance.
2. Main results For two sequences (ak )k∈N and (bk )k∈N of positive real numbers we write ak ≈ bk if limk→∞ ak /bk = 1, and ak bk if lim inf k→∞ ak /bk 1. Additionally, ak bk means c1 ak /bk c2 for all k ∈ N and some positive constants ci . Fix s ∈ N0 and let W (s) denote an s-fold integrated Wiener process. In [11], the following result was proved. Theorem 1. For r ∈ N0 with r s, av ek,r (W (s) , L∞ , 1) ek,r (W (s) , L∞ , 1) k −(s+1/2) .
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
869
Our first result refines and extends this theorem. Consider the stopping time r,s,p = inf t > 0 : inf W (s) − Lp [0,t] > 1 , ∈r
which yields the length of the maximal subinterval [0, r,s,p ] that permits best approximation of W (s) from r with error at most one. We have 0 < E r,s,p < ∞, see (14), and we put =s+
1 2
+ 1/p
as well as cr,s,p = (E r,s,p )− and bs,p = (s + 21 )s+1/2 · p −1/p · − , where, for p = ∞, we use the convention ∞0 = 1. Theorem 2. Let r ∈ N0 with r s and 1 q < ∞. Then, for p = ∞, av ek,r (W (s) , L∞ , q) ≈ ek,r (W (s) , L∞ , q) ≈ cr,s,∞ · k −(s+1/2) .
(1)
Furthermore, for 1 p < ∞, bs,p · cr,s,p · k −(s+1/2) ek,r (W (s) , Lp , q)cr,s,p · k −(s+1/2)
(2)
av ek,r (W (s) , Lp , q) k −(s+1/2) .
(3)
and
Note that the bounds provided by (1) and (2) do not depend on the averaging parameter q. Furthermore, lim bs,p = 1
p→∞
for every s ∈ N, but lim bs,p = 0
s→∞
for every 1 p < ∞. We conjecture that the upper bound in (2) is sharp. (p) ∈ Nk,r that achieve the upper bounds in (1) We have an explicit construction of methods X k and (2), i.e., 1/q ∗ (p) q ≈ cr,s,p · k −(s+1/2) , (4) E W (s) − X k Lp [0,1] see (10) and (21). Moreover, these methods a.s. satisfy Lp [0,1] ≈ cr,s,p · k −(s+1/2) W (s) − X k (p)
(5)
as well, while k Lp [0,1] bs,p · cr,s,p · k −(s+1/2) W (s) − X
(6)
870
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
k ∈ Nk,r . Note that the right-hand sides in (5) holds a.s. for every sequence of approximations X (s) and (6) do not depend on the specific path of W , i.e., on ∈ . Our second result deals with approximation of a scalar diffusion process given by the stochastic differential equation dX(t) = a(X(t)) dt + b(X(t)) dW (t), X(0) = x0 .
t 0, (7)
Here x0 ∈ R, and W denotes a one-dimensional Wiener process. Moreover, we assume that the functions a, b : R → R satisfy (A1) a is Lipschitz continuous. (A2) b is differentiable with a bounded derivative. (A3) b(x0 ) = 0. Theorem 3. Let r ∈ N0 , 1 q < ∞, and 1 p ∞. Then av ek,r (X, Lp , q) ek,r (X, Lp , q) k −1/2
holds for the strong solution X of Eq. (7). For a diffusion process X piecewise linear interpolation with free knots is frequently used in connection with adaptive step-size control. Theorem 3 provides a lower bound for the Lp -error of any such numerical algorithm, no matter whether just Wiener increments or, e.g., arbitrary multiple Itô-integrals are used. Under slightly stronger conditions on the diffusion coefficient b, error estimates in [9,17] lead to refined upper bounds in Theorem 3 for the case 1 p < ∞, as follows. Put 1/p2 p (p1 , p2 ) = E b ◦ XL2p [0,1] 1
for 1 p1 , p2 < ∞. Furthermore, let B denote a Brownian bridge on [0, 1] and define 1/p p (p) = E BLp [0,1] . Then ek,1 (X, Lp , p) (p) · (2p/(p + 2), p) · k −1/2 and av ek,1 (X, Lp , p) (p) · (2p/(p + 2), 2p/(p + 2)) · k −1/2 .
We add that these upper bounds are achieved by piecewise linear interpolation of modified Milstein schemes with adaptive step-size control for the Wiener increments. In the case p = ∞ it is interesting to compare the results on free-knot spline approximation with average k-widths of X. The latter quantities are defined by
1/q q dk (X, Lp , q) = inf E inf X − Lp [0,1] ,
∈
where the infimum is taken over all linear subspaces ⊆ Lp [0, 1] of dimension at most k. For X = W (s) as well as in the diffusion case we have dk (X, L∞ , q) k −(s+1/2) ,
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
871
see [4,14–16,6]. Almost optimal linear subspaces are not known explicitly, since the proof of the upper bound for dk (X, L∞ , q) is non-constructive. We add that in the case of an s-fold integrated Wiener process piecewise polynomial interpolation of W (s) at equidistant knots i/k only yields errors of order (ln k)1/2 · k −(s+1/2) , see [20] for results and references. Similarly, in the diffusion k ∈ Nr that are only based on pointwise evaluation of W and satisfy (X k ) k case, methods X 1/2 −1/2 , see [18]. can at most achieve errors of order (ln k) · k The rest of the paper is organized as follows. In the next section, some auxiliary results about approximation of a fixed function by piecewise polynomial splines are established. In Section 4, this is used to prove Theorem 2, as well as Eqs. (4)–(6). Section 5 is devoted to the proof of Theorem 3. In the Appendix, we prove an auxiliary result about convergence of negative moments of means and a small deviation result, which controls the probability that a path of W (s) stays close to the space r . 3. Approximation of deterministic functions Let r ∈ N0 and 1 p ∞ be fixed. We introduce error measures, which allow to determine suitable free knots for spline approximation. For f ∈ C [0, ∞[ and 0 u < v we put
[u,v] (f ) = inf f − Lp [u,v] . ∈r
Furthermore, for ε > 0, we put 0,ε (f ) = 0, and we define j,ε (f ) = inf{t > j −1,ε (f ) : [j −1,ε (f ),t] (f ) > ε} for j 1. Here inf ∅ = ∞, as usual. Put Ij (f ) = {ε > 0 : j,ε (f ) < ∞}. Lemma 4. Let j ∈ N. (i) If ε ∈ Ij (f ) then
[j −1,ε (f ),j,ε (f )] (f ) = ε. (ii) The set Ij (f ) is an interval, and the mapping ε → j,ε (f ) is strictly increasing and rightcontinuous on Ij (f ). Furthermore, j,ε (f ) > j −1,ε (f ) if ε ∈ Ij −1 (f ), and limε→∞ j,ε (f ) = ∞. (iii) If v → [u,v] (f ) is strictly increasing for every u 0, then ε → j,ε (f ) is continuous on Ij (f ). Proof. First we show that the mapping (u, v) → [u,v] (f ) is continuous. Put J1 = [u/2, u + (v − u)/3] as well as J2 = [v − (v − u)/3, 2v]. Moreover, let (t) = ri=0 i · t i for ∈ Rr+1 , and define a norm on Rr+1 by = Lp [u+(v−u)/3,v−(v−u)/3] . If (x, y) ∈ J1 × J2 and f − Lp [x,y] = [x,y] (f ) then Lp [x,y] [u/2,2v] (f ) + f Lp [u/2,2v] .
872
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
Hence there exists a compact set K ⊆ Rr+1 such that
[x,y] (f ) = inf f − Lp [x,y] ∈K
for every (x, y) ∈ J1 × J2 . Since (x, y, ) → f − Lp [x,y] defines a continuous mapping on J1 × J2 × K, we conclude that (x, y) → inf ∈K f − Lp [x,y] is continuous, too, on J1 × J2 . Continuity and monotonicity of v → [u,v] (f ) immediately imply (i). The monotonicity stated in (ii) will be verified inductively. Let 0 < ε1 < ε2 with ε2 ∈ Ij (f ), and suppose that j −1,ε1 (f )j −1,ε2 (f ). Note that the latter holds true by definition for j = 1. From (i) we get
[j −1,ε1 (f ),j,ε2 (f )] (f ) [j −1,ε2 (f ),j,ε2 (f )] (f ) = ε2 . This implies j,ε1 (f )j,ε2 (f ), and (i) excludes equality to hold here. Since [u,v] (f )f Lp [u,v] , the mappings ε →j,ε (f ) are unbounded and j,ε (f )>j −1,ε (f ) if ε ∈ Ij −1 (f ). For the proof of the continuity properties stated in (ii) and (iii) we also proceed inductively, and we use (i) and the monotonicity from (ii). Consider a sequence (εn )n∈N in Ij (f ), which converges monotonically to ε ∈ Ij (f ), and put t = limn→∞ j,εn (f ). Assume that limn→∞ j −1,εn (f ) = j −1,ε (f ), which obviously holds true for j = 1. Continuity of (u, v) → [u,v] (f ) and (i) imply [j −1,ε (f ),t] (f ) = ε, so that t j,ε (f ). For a decreasing sequence (εn )n∈N we also have j,ε (f ) t. For an increasing sequence (εn )n∈N we use the strict monotonicity of v → [u,v] (f ) to derive t = j,ε (f ). Let F denote the class of functions f ∈ C [0, ∞[ that satisfy j,ε (f ) < ∞
(8)
for every j ∈ N and ε > 0 as well as lim j,ε (f ) = 0
(9)
ε→0
for every j ∈ N. Let k ∈ N. We now present an almost optimal spline approximation method of degree r with k − 1 free knots for functions f ∈ F . Put k (f ) = inf{ε > 0 : k,ε (f )1} and note that (9) together with Lemma 4(ii) implies k (f ) ∈ ]0, ∞[. Let j = j,k (f ) (f ) for j = 0, . . . , k and define (p)
k (f ) =
k j =1
1]j −1 ,j ] · argmin f − Lp [j −1 ,j ] . ∈r
(10)
Note that Lemma 4 guarantees (p)
f − k (f )Lp [j −1 ,j ] = k (f )
(11)
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
873
for j = 1, . . . , k and k 1.
(12) (p)
The spline k (f )|[0,1] ∈ k,r enjoys the following optimality properties. Proposition 5. Let k ∈ N and f ∈ F . (i) For 1p ∞, (p)
f − k (f )Lp [0,1] k 1/p · k (f ). (ii) For p = ∞ and every ∈ k,r , f − L∞ [0,1] k (f ). (iii) For 1p < ∞, every ∈ k,r , and every m ∈ N with m > k, f − Lp [0,1] (m − k + 1)1/p · m (f ). Proof. For p < ∞, (p)
p
f − k (f )Lp [0,1]
k j =1
(p)
p
f − k (f )Lp [j −1 ,j ] = k · (k (f ))p
follows from (11) and (12). For p = ∞, (i) is verified analogously. Consider a polynomial spline ∈ k,r and let 0 = t0 < · · · < tk = 1 denote the corresponding knots. Furthermore, let ∈ ]0, 1[. For the proof of (ii) we put
j = j, ·k (f ) (f ) for j = 0, . . . , k. Then k < 1, which implies [ j −1 , j ] ⊆ [tj −1 , tj ] for some j ∈ {1, . . . , k}. Consequently, by Lemma 4, f − L∞ [0,1] f − L∞ [ j −1 , j ] inf f − L∞ [ j −1 , j ] = · k (f ). ∈r
For the proof of (iii) we define
= , ·m (f ) (f ) for = 0, . . . , m. Then m < 1, which implies [ i −1 , i ] ⊆ [tji −1 , tji ] for some indices 1j1 · · · jm−k+1 k and 1 1 < · · · < m−k+1 m. Hence, by Lemma 4, f
p − Lp [0,1]
m−k+1 i=1
p
inf f − Lp [ −1 , ] = (m − k + 1) · p · (m (f ))p .
∈r
i
i
for 1p < ∞. Letting tend to one completes the proof.
874
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
4. Approximation of integrated Wiener processes Let W denote a Wiener process and consider the s-fold integrated Wiener processes W (s) defined by W (0) = W and t W (s) (t) = W (s−1) (u) du 0
for t 0 and s ∈ N. We briefly discuss some properties of W (s) that will be important in the sequel. The scaling property of the Wiener process implies that for every > 0 the process ( −(s+1/2) · (s) W ( · t))t 0 is an s-fold integrated Wiener process, too. This fact will be called the scaling property of W (s) . While W (s) has no longer independent increments for s 1, the influence of the past is very explicit. For z > 0 we define z W (s) inductively by zW
(0)
zW
(s)
(t) = W (t + z) − W (z)
and (t) =
t zW
(s−1)
(u) du.
0
Then it is easy to check that W (s) (t + z) =
s t i (s−i) (z) + z W (s) (t). W i!
(13)
i=0
Consider the filtration generated by W, which coincides with the filtration generated by W (s) , and let denote a stopping time with P ( < ∞) = 1. Then the strong Markov property of W implies that the process W
(s)
= ( W (s) (t))t 0
is an s-fold integrated Wiener process, too. Moreover, the processes W (s) and (1[0,] (t)·W (t))t 0 are independent, and consequently, the processes W (s) and (1[0,] (t)·W (s) (t))t 0 are independent as well. These facts will be called the strong Markov property of W (s) . Fix s ∈ N0 . In the sequel we assume that r s. For any fixed ε > 0 we consider the sequence of stopping times j,ε (W (s) ), which turn out to be finite a.s., see (14), and therefore are strictly increasing, see Lemma 4. Moreover, for j ∈ N, we define j,ε = j,ε (W (s) ) − j −1,ε (W (s) ). These random variables yield the lengths of consecutive maximal subintervals that permit best approximation from the space r with error at most ε. Recall that F ⊆ C [0, ∞[ is defined via properties (8) and (9) and that = s + 21 + 1/p. In the case s = 0 and r = 1 the analogous construction with interpolation instead of best approximation has already been used for the study of rates of convergence in the functional law of the iterated logarithm, see [8].
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
875
Lemma 6. The s-fold integrated Wiener process W (s) satisfies P (W (s) ∈ F ) = 1. For every ε > 0 and m ∈ N the random variables j,ε form an i.i.d. sequence with 1,ε = ε1/ · 1,1 d
and E (m 1,1 ) < ∞.
Proof. We claim that E (j,ε (W (s) )) < ∞
(14)
for every j ∈ N. For the case j = 1 let Z = [0,1] (W (s) ) and note that
[0,t] (W (s) ) = t · Z d
follows for t > 0 from the scaling property of W (s) . Hence we have P (1,ε (W (s) ) < t) = P ( [0,t] (W (s) ) > ε) = P (Z > ε · t − ),
(15)
which, in particular, yields 1,ε (W (s) ) = ε1/ · 1,1 (W (s) ). d
(16)
According to Corollary 17, there exists a constant c > 0 such that P (Z ) exp(−c · −1/(s+1/2) ) holds for every ∈ ]0, 1]. We conclude that P (1,1 (W s) ) > t) exp(−c · t) (s) if t 1, which implies E (m 1,1 (W )) < ∞ for every m ∈ N. Next, let j 2, put = j −1,ε (W (s) ) and = j,ε (W (s) ), and assume that E (m ) < ∞. From representation (13) and the fact that r s we derive
[, ] (W (s) ) = [0, −] ( W (s) ), and hence it follows that = + 1,ε ( W (s) ).
(17)
We have E ((1,ε ( W (s) ))m ) < ∞, since W (s) is an s-fold integrated Wiener process again, and consequently E (( )m ) < ∞. We turn to the properties of the sequence j,ε . Due to (16) and (17) we have j,ε = 1,ε ( W (s) ) = 1,ε (W (s) ) = ε1/ · 1,1 . d
d
Furthermore, j,ε and (1[0,] (t) · W (s) (t))t 0 are independent because of the strong Markov property of W (s) , and therefore j,ε and (1,ε , . . . , j −1,ε ) are independent as well.
876
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
It remains to show that the trajectories of W (s) a.s. satisfy (9). By the properties of the sequence j,ε we have j,ε (W (s) ) = ε1/ · j,1 (W (s) ). d
(18)
Observing (14) we conclude that
(s) P lim j,ε (W ) t = lim P (j,ε (W (s) ) t) ε→0
ε→0
= lim P (j,1 (W (s) ) t/ε 1/ ) = 0 ε→0
for every t > 0, which completes the proof.
Because of Lemma 6, Proposition 5 yields upper and lower bounds for the error of spline approximation of W (s) in terms of the random variable Vk = k (W (s) ). Remark 7. Note that W (s) a.s. satisfies W (s) |[u,v] ∈ r for all 0 u < v. Assume that p < ∞. Then v → [u,v] (W (s) ) is a.s. strictly increasing for all u0. We use Lemma 4(iii) and Lemma 6 to conclude that, with probability one, Vk is the unique solution of k,Vk (W (s) ) = 1. Consequently, due to (11), we a.s. have equality in Proposition 5(i) for 1 p < ∞, too. Note that with positive probability solutions ε of the equation k,ε (W (s) ) = 1 fail to exist in the case p = ∞. To complete the analysis of spline approximation methods we study the asymptotic behavior of the sequence Vk . Lemma 8. For every 1 q < ∞, q 1/q E Vk ≈ (k · E (1,1 ))− . Furthermore, with probability one, Vk ≈ (k · E (1,1 ))− . Proof. Put Sk = 1/k ·
k
j,1
j =1
and use (18) to obtain −
P (Vk ε) = P (k,ε (W (s) ) 1) = P (k − · Sk ε). Therefore −q
E (Vk ) = k −q · E (Sk q
),
(19)
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
877
and for the first statement it remains to show that −q
E (Sk
) ≈ (E (1,1 ))−q .
The latter fact follows from Proposition 15, if we can verify that 1,1 has a proper lower tail behavior (29). To this end we use (15) and the large deviation estimate (33) to obtain P (1,1 < ) = P ( [0,1] (W (s) ) > − ) P (W (s) Lp [0,1] > − ) exp(−c · −2 ) with some constant c > 0 for all 1. In order to prove the second statement, put Sk∗ = (k · 2 )−1/2 ·
k
(j,1 − ),
j =1
where = E (1,1 ) and 2 denotes the variance of 1,1 . Let > 1. Then P (Vk > · (k · )− ) = P (Sk < −1/ · ) = P (Sk∗ < k 1/2 · ) with = ( −1/ − 1)/ · < 0, due to (19). We apply a local version of the central limit theorem, which holds for i.i.d. sequences with a finite third moment, see [19, Theorem V.14], to obtain P (Vk > · (k · )− ) c1 · k
−1/2
· (1 + k
1/2
· | |)
−3
+ (2)
c2 · k −2
−1/2
·
k 1/2 · −∞
exp(−u2 /2) du
with constants ci > 0. For every < 1 we get P (Vk < · (k · )− ) c2 · k −2
(20)
in the same way. It remains to apply the Borel–Cantelli Lemma.
4.1. Proof of (4), (5), and the upper bounds in (1), (2), (3) Consider the methods (p) = (p) (W (s) ) ∈ Nk,r . X k k
(21)
Observe Remark 7 and use Proposition 5(i) as well as Lemma 6 to obtain Lp [0,1] = k 1/p · Vk W (s) − X k (p)
a.s.
Now, apply Lemma 8 to obtain (4) and (5). Clearly, (4) implies the upper bounds in (1), (2), and (3).
878
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
4.2. Proof of (6) and the lower bound in (2) k ∈ Nk,r and put Consider an arbitrary sequence of approximations X mk = /(s + 21 ) · k. Use Lemma 6, and apply Proposition 5(ii) in the case p = ∞ and Proposition 5(iii) in the case p < ∞ to obtain k Lp [0,1] (mk − k + 1)1/p · Vm W (s) − X k
a.s.
Clearly, mk ≈ /(s + 1/2) · k. Hence, by Lemma 8, q 1/q (mk − k)1/p · Vmk ≈ (mk − k)1/p · E Vmk ≈ k −(s+1/2) · p −1/p · − · (s + 21 )s+1/2 · (E (1,1 ))− with probability one, which implies (6) and the lower bound in (2). 4.3. Proof of the lower bound in (1) k ∈ Nr such that (X k ) k, i.e., Let k ∈ N and consider X
∞ ∗ · 1B k E
(22)
=1
∈ ,r \ −1,r , where 0,r = ∅. By Proposition 5(ii) and Lemma 6, for B = X(·)
∞ q ∗ k q E 1 · V E∗ W (s) − X B . L [0,1] ∞
=1
For ∈ ]0, 1[, = E (1,1 ), and L ∈ N we define A = V > · ( · )− , and CL =
L
B .
=1
Since (f )+1 (f ) for f ∈ F , we obtain ∞ L ∞ q q q 1B · V 1B · VL + 1B · V =1
=1
=L+1 ∞ q 1B ∩AL · VL + =1 =L+1
L
q
1B ∩A · V
∞
q −q · L−q · 1CL ∩AL + −q · 1B ∩A
l=L+1
q −q · L−q · (1CL − 1AcL ) +
∞ l=L+1
−q · (1B − 1Ac )
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
879
with probability one, which implies
∞ ∞ q −q q ∗ ∗ −q −q 1B · V E L · 1CL + · 1B ·E =1
l=L+1 ∞
−E L−q · 1AcL +
−q · 1Ac .
l=L+1
From (20) we infer that P (Ac ) c1 · −2 with a constant c1 > 0. Hence there exists a constant c2 > 0 such that (L) = E
∗
L
−q
∞
· 1CL +
−q
· 1B − c2 · L−q−1
l=L+1
satisfies k q −q q · E∗ W (s) − X L
∞ [0,1]
(L)
(23)
for every L ∈ N. Put = (1 + 2q)/(2 + 2q), and take L(k) ∈ [k − 1, k ]. We claim that there exists a constant c3 > 0 such that
1+q k q · (L(k)) 1 − k −(1−)q − c3 · k −1/2 .
(24)
First, assume that the outer probability of CL satisfies P ∗ (CL ) k −(1−)q . Then
k q · (L(k)) k q · k −q · P ∗ (CL ) − c2 · (k − 1)−q−1 1 − c3 · k −1/2 with a constant c3 > 0. Next, assume P ∗ (CL ) < k −(1−)q and use (22) to derive
∞ −(1−)q ∗ c ∗ P (CL ) = E 1B 1−k = E∗
∞
l=L+1
( · 1B )q/(1+q) · (−q · 1B )1/(1+q)
l=L+1
∗
E
∞
q/(1+q) 1/(1+q) ∞ −q · 1B · · 1B
l=L+1
l=L+1
q/(1+q) 1/(1+q) ∞ ∞ E∗ · 1B · E∗ −q · 1B l=L+1
l=L+1
1/(1+q)
∞ q/(1+q) ∗ −q k · E · 1B . l=L+1
880
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
Consequently,
∞ −q · 1B − c2 · (k − 1)−q−1 k q · (L(k)) k q · E∗
1−k
=L+1
1+q −(1−)q
− c3 · k −1/2 ,
which completes the proof of (24). By (23) and (24), k q E∗ W (s) − X q −q · k −q L [0,1] ∞
for every ∈ ]0, 1[. 4.4. Proof of the lower bound in (3) av (W (s) , L , 1). For further use, Clearly it suffices to establish the lower bound claimed for ek,r 1 we shall prove a more general result.
Lemma 9. For every s ∈ N there exists a constant c > 0 with the following property. For every ∈ Nr , every A ∈ A with P (A) 4 , and every t ∈ ]0, 1] we have X 5
∗ (s) L [0,t] c · t s+3/2 · ((X)) −(s+1/2) . E 1A · W − X 1 Proof. Because of the scaling property of W (s) it suffices to study the particular case t = 1. < ∞ and put k = (X) as well as Assume that (X) ∈ 2k,r }. B = {X Then ∗ k (X)E ((2k + 1) · 1B c ) = (2k + 1) · P ∗ (B c ),
which implies P ∗ (B) 21 . Due to Lemma 6 and Proposition 5(iii), L [0,1] 1B · 2k · V4k 1B · W (s) − X 1
a.s.
Put = E (1,1 ), choose 0 < c < (2)− , and define Dk = {Vk > c · k − }. By (19) we obtain P (Dk ) = P (Sk c−1/ ) P (Sk 2). Hence lim P (Dk ) = 1
k→∞
due to the law of large numbers, and consequently P ∗ (B ∩ Dk ) 25 if k is sufficiently large, say k k0 . We conclude that L [0,1] 1A∩B∩D · c · 21−2 · k −(s+1/2) 1A∩B∩D4k · W (s) − X 1 4k and
P ∗ (A ∩ B
a.s.
∩ D4k ) 1/5 if 4k k0 . Take outer expectations to complete the proof.
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
881
Lemma 9 with A = and t = 1 yields the lower bound in (3) 5. Approximation of diffusion processes Let X denote the solution of the stochastic differential equation (7) with initial value x0 , and recall that the drift coefficient a and the diffusion coefficient b are supposed to satisfy conditions (A1)–(A3). In the following we use c to denote unspecified positive constants, which may only depend on x0 , a, b and the averaging parameter 1 q < ∞. Note that q
E XL∞ [0,1] < ∞ and E
sup
t∈[s1 ,s2 ]
(25)
|X(t) − X(s1 )|q c · (s2 − s1 )q/2
(26)
for all 1q < ∞ and 0 s1 s2 1, see [10, p. 138]. 5.1. Proof of the upper bound in Theorem 3 In order to establish the upper bound, it suffices to consider the case of p = ∞ and r = 0, i.e., nonlinear approximation in supremum norm with piecewise constant splines. We dissect X into its martingale part t M(t) = b(X(s)) dW (s) 0
and
$$Y(t) = x_0 + \int_0^t a(X(s))\, ds.$$

Lemma 10. For all $1 \le q < \infty$ and $k \in \mathbb{N}$, there exists an approximation $\widehat{Y} \in N_{k,0}$ such that
$$\left(E^* \|Y - \widehat{Y}\|_{L_\infty[0,1]}^q\right)^{1/q} \le c \cdot k^{-1}.$$

Proof. Put $\|g\|_{\mathrm{Lip}} = \sup_{0 \le s < t \le 1} |g(t) - g(s)|/(t - s)$. Since $Y' = a(X)$, conditions (A1)–(A3) together with (25) yield $E\, \|Y\|_{\mathrm{Lip}}^q < \infty$ for all $1 \le q < \infty$, and the piecewise constant interpolation of $Y$ at the equidistant points $i/k$ satisfies the stated bound. □

5.2. Proof of the lower bound in Theorem 3

Without loss of generality we may assume $b(x_0) > 0$. Choose $\delta > 0$ as well as a function $b_0 : \mathbb{R} \to \mathbb{R}$ such that:
(a) $b_0$ is differentiable with a bounded derivative,
(b) $\inf_{x \in \mathbb{R}} b_0(x) \ge b(x_0)/2$,
(c) $b_0 = b$ on the interval $[x_0 - \delta, x_0 + \delta]$.

We will use a Lamperti transform based on the space-transformation
$$g(x) = \int_{x_0}^x \frac{1}{b_0(u)}\, du.$$
Note that $g' = 1/b_0$ and $g'' = -b_0'/b_0^2$, and define $H_1, H_2 : C[0,\infty[ \to C[0,\infty[$ by
$$H_1(f)(t) = \int_0^t \left(g' a + \tfrac{1}{2} g'' \cdot b^2\right)(f(s))\, ds$$
and $H_2(f)(t) = g(f(t))$. Put $H = H_2 - H_1$. Then, by the Itô formula,
$$H(X)(t) = \int_0^t \frac{b(X(s))}{b_0(X(s))}\, dW(s).$$
The idea of the proof is as follows. We show that any good spline approximation of $X$ leads to a good spline approximation of $H(X)$. However, since with a high probability $X$ stays within $[x_0 - \delta, x_0 + \delta]$ for some short (but nonrandom) period of time, approximation of $H(X)$ is not easier than approximation of $W$, modulo constants.

First, we consider approximation of $H_1(X)$.

Lemma 12. For every $k \in \mathbb{N}$ there exists an approximation $\widehat{X}_1 \in N_{k,0}$ such that
$$E^* \|H_1(X) - \widehat{X}_1\|_{L_1[0,1]} \le c \cdot k^{-1}.$$
Proof. Observe that $\left|\left(g' a + \tfrac12 g'' \cdot b^2\right)(x)\right| \le c \cdot (1 + x^2)$, and proceed as in the proof of Lemma 10. □

Next, we relate approximation of $X$ to approximation of $H_2(X)$.

Lemma 13. For every approximation $\widehat{X} \in N_r$ with $\ell(\widehat{X}) < \infty$ there exists an approximation $\widehat{X}_2 \in N_r$ such that $\ell(\widehat{X}_2) \le 2 \cdot \ell(\widehat{X})$ and
$$E^* \|H_2(X) - \widehat{X}_2\|_{L_1[0,1]} \le c \cdot \left( E^* \|X - \widehat{X}\|_{L_1[0,1]} + 1/\ell(\widehat{X}) \right).$$
Proof. For a fixed $\omega \in \Omega$ let $\widehat{X}(\omega)$ be given by
$$\widehat{X}(\omega) = \sum_{j=1}^k 1_{]t_{j-1}, t_j]} \cdot \pi_j.$$
We refine the corresponding partition to a partition $0 = \tilde{t}_0 < \cdots < \tilde{t}_{\tilde{k}} = 1$ that contains all the points $i/\ell$, where $\ell = \ell(\widehat{X})$. Furthermore, we define the polynomials $\tilde{\pi}_j \in \Pi_r$ by
$$\widehat{X}(\omega) = \sum_{j=1}^{\tilde{k}} 1_{]\tilde{t}_{j-1}, \tilde{t}_j]} \cdot \tilde{\pi}_j.$$
Put $f = X(\omega)$ and define
$$\widehat{X}_2(\omega) = \sum_{j=1}^{\tilde{k}} 1_{]\tilde{t}_{j-1}, \tilde{t}_j]} \cdot q_j$$
with polynomials
$$q_j = g(f(\tilde{t}_{j-1})) + g'(f(\tilde{t}_{j-1})) \cdot (\tilde{\pi}_j - f(\tilde{t}_{j-1})) \in \Pi_r.$$
Let $f_2 = \widehat{X}_2(\omega)$. If $t \in \,]\tilde{t}_{j-1}, \tilde{t}_j] \subseteq \,](i-1)/\ell, i/\ell]$, then
$$\begin{aligned}
|H_2(f)(t) - f_2(t)| &= \left| g(f(t)) - g(f(\tilde{t}_{j-1})) - g'(f(\tilde{t}_{j-1})) \cdot (\tilde{\pi}_j(t) - f(\tilde{t}_{j-1})) \right| \\
&\le \left| g(f(t)) - g(f(\tilde{t}_{j-1})) - g'(f(\tilde{t}_{j-1})) \cdot (f(t) - f(\tilde{t}_{j-1})) \right| + \left| g'(f(\tilde{t}_{j-1})) \right| \cdot |f(t) - \tilde{\pi}_j(t)| \\
&\le c \cdot \left( |f(t) - f(\tilde{t}_{j-1})|^2 + |f(t) - \tilde{\pi}_j(t)| \right) \\
&\le c \cdot \sup_{s \in ](i-1)/\ell,\, i/\ell]} \left( |f(s) - f((i-1)/\ell)|^2 + |f(s) - \tilde{\pi}_j(s)| \right).
\end{aligned}$$
Consequently, we may invoke (26) to derive
$$E^* \|H_2(X) - \widehat{X}_2\|_{L_1[0,1]} \le c \cdot \left( 1/\ell(\widehat{X}) + E^* \|X - \widehat{X}\|_{L_1[0,1]} \right).$$
Moreover, $\ell(\widehat{X}_2) \le 2 \cdot \ell(\widehat{X})$. □
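The mechanism behind the quadratic term in Lemma 13 is the second-order accuracy of the first-order Taylor polynomial of $g$. A quick numerical sketch (using the illustrative transform $g = \log$, the Lamperti transform for $b(x) = x$, not the $g$ constructed in this section):

```python
import math

# Hedged sketch: check that |g(y) - (g(x) + g'(x)(y - x))| <= C * (y - x)^2
# for an illustrative smooth transform g = log (not the paper's g).
def taylor_remainder(g, dg, x, y):
    """Error of the first-order Taylor expansion of g at x, evaluated at y."""
    return abs(g(y) - (g(x) + dg(x) * (y - x)))

g = math.log
dg = lambda x: 1.0 / x

x = 1.0
worst_ratio = 0.0
for i in range(-100, 101):
    u = 0.1 * i / 100.0            # increments y - x in [-0.1, 0.1]
    if u == 0.0:
        continue
    r = taylor_remainder(g, dg, x, x + u)
    worst_ratio = max(worst_ratio, r / u ** 2)

print(worst_ratio)                 # stays bounded, close to |g''(x)|/2 = 0.5
```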
We proceed with establishing a lower bound for approximation of $H(X)$.

Lemma 14. For every approximation $\widehat{X} \in N_r$,
$$E^* \|H(X) - \widehat{X}\|_{L_1[0,1]} \ge c \cdot (\ell(\widehat{X}))^{-1/2}.$$

Proof. Choose $t_0 \in \,]0, 1]$ such that
$$A = \Big\{ \sup_{t \in [0, t_0]} |X(t) - x_0| \le \delta \Big\}$$
satisfies $P(A) \ge 4/5$. On $A$ we have $b_0(X(t)) = b(X(t))$ for $t \le t_0$, so that $H(X)(t) = W(t)$ on $[0, t_0]$. Observe that
$$1_A \cdot \|H(X) - \widehat{X}\|_{L_1[0,1]} \ge 1_A \cdot \|W - \widehat{X}\|_{L_1[0,t_0]},$$
and apply Lemma 9 with $s = 0$. □

Now, consider any approximation $\widehat{X} \in N_r$ with $k - 1 < \ell(\widehat{X}) \le k$, and choose $\widehat{X}_1$ and $\widehat{X}_2$ according to Lemmas 12 and 13, respectively. Then
$$\begin{aligned}
E^* \|H(X) - (\widehat{X}_2 - \widehat{X}_1)\|_{L_1[0,1]} &\le E^* \|H_2(X) - \widehat{X}_2\|_{L_1[0,1]} + E^* \|H_1(X) - \widehat{X}_1\|_{L_1[0,1]} \\
&\le c \cdot \left( E^* \|X - \widehat{X}\|_{L_1[0,1]} + (\ell(\widehat{X}))^{-1} + k^{-1} \right) \\
&\le c \cdot \left( E^* \|X - \widehat{X}\|_{L_1[0,1]} + k^{-1} \right).
\end{aligned}$$
On the other hand, $\ell(\widehat{X}_2 - \widehat{X}_1) \le \ell(\widehat{X}_2) + \ell(\widehat{X}_1) \le 3 \cdot k$, so that
$$E^* \|H(X) - (\widehat{X}_2 - \widehat{X}_1)\|_{L_1[0,1]} \ge c \cdot k^{-1/2}$$
follows from Lemma 14. We conclude that
$$E^* \|X - \widehat{X}\|_{L_1[0,1]} \ge c \cdot k^{-1/2},$$
as claimed.

Acknowledgment

The authors are grateful to Mikhail Lifshits for helpful discussions. In particular, he pointed out to us the approach in Appendix B. We thank Wenbo Li for discussions on the subject and for providing us with Ref. [8]. We are also grateful for numerous comments from the anonymous referees, which led to an improvement of the presentation.

Appendix A. Convergence of negative moments of means

Let $(\xi_i)_{i \in \mathbb{N}}$ be an i.i.d. sequence of random variables such that $\xi_1 > 0$ a.s. and $E(\xi_1) < \infty$. Put
$$S_k = \frac{1}{k} \sum_{i=1}^k \xi_i.$$
Proposition 15. For every $\alpha > 0$,
$$\liminf_{k \to \infty} E(S_k^{-\alpha}) \ge (E(\xi_1))^{-\alpha}.$$
If
$$P(\xi_1 < v) \le c \cdot v^{\gamma}, \quad v \in \,]0, v_0], \tag{29}$$
for some constants $c, \gamma, v_0 > 0$, then
$$\lim_{k \to \infty} E(S_k^{-\alpha}) = (E(\xi_1))^{-\alpha}.$$

Proof. Put $\mu = E(\xi_1)$ and define $g_k(v) = \alpha \cdot v^{-(\alpha+1)} \cdot P(S_k < v)$. Thanks to the weak law of large numbers, $P(S_k < v)$ tends to $1_{]\mu, \infty[}(v)$ for every $v \ne \mu$. Hence, by Lebesgue's theorem,
$$\lim_{k \to \infty} \int_{\mu/2}^{\infty} g_k(v)\, dv = \mu^{-\alpha}. \tag{30}$$
Since
$$E(S_k^{-\alpha}) = \int_0^{\infty} P(S_k^{-\alpha} > u)\, du = \int_0^{\infty} g_k(v)\, dv,$$
the asymptotic lower bound for $E(S_k^{-\alpha})$ follows from (30).

Given (29), we may assume without loss of generality that $c \cdot v_0^{\gamma} < 1$. We first consider the case $\xi_1 \le 1$ a.s., and we put
$$A_k = \int_{v_0/k}^{\mu/2} g_k(v)\, dv \quad \text{and} \quad B_k = \int_0^{v_0/k} g_k(v)\, dv.$$
For $v_0/k \le v \le \mu/2$ we use Hoeffding's inequality to obtain
$$g_k(v) \le \alpha \cdot v^{-(\alpha+1)} \cdot P(|S_k - \mu| > \mu/2) \le \alpha \cdot (k/v_0)^{\alpha+1} \cdot 2 \exp(-k \mu^2/2),$$
which implies
$$\lim_{k \to \infty} A_k = 0.$$
On the other hand, if $\gamma k > \alpha$, then
$$\begin{aligned}
B_k &= k^{\alpha} \cdot \alpha \cdot \int_0^{v_0} v^{-(\alpha+1)} \cdot P\Big( \sum_{i=1}^k \xi_i < v \Big)\, dv \\
&\le k^{\alpha} \cdot \alpha \cdot \int_0^{v_0} v^{-(\alpha+1)} \cdot (P(\xi_1 < v))^k\, dv \\
&\le k^{\alpha} \cdot \alpha \cdot c^k \cdot \int_0^{v_0} v^{\gamma k - (\alpha+1)}\, dv = k^{\alpha} \cdot \alpha \cdot (\gamma k - \alpha)^{-1} \cdot c^k \cdot v_0^{\gamma k - \alpha},
\end{aligned}$$
and therefore
$$\lim_{k \to \infty} B_k = 0.$$
In view of (30) we have thus proved the proposition in the case of bounded variables $\xi_i$.

In the general case put $\xi_{i,N} = \min\{N, \xi_i\}$ as well as $S_{k,N} = \frac{1}{k} \sum_{i=1}^k \xi_{i,N}$, and apply the result for bounded variables to obtain
$$\limsup_{k \to \infty} E(S_k^{-\alpha}) \le \inf_{N \in \mathbb{N}} \limsup_{k \to \infty} E(S_{k,N}^{-\alpha}) = \inf_{N \in \mathbb{N}} (E\, \xi_{1,N})^{-\alpha} = (E\, \xi_1)^{-\alpha}$$
by the monotone convergence theorem. □
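Proposition 15 lends itself to a quick simulation. The sketch below (illustrative parameters, not from the paper) draws $\xi_i$ from the exponential distribution with mean $1$, for which condition (29) holds with $\gamma = 1$, and estimates $E(S_k^{-\alpha})$ for $\alpha = 1$; the estimate should approach $(E\,\xi_1)^{-1} = 1$ as $k$ grows.

```python
import random

random.seed(0)

def mean_negative_moment(k, alpha, n_samples):
    """Monte Carlo estimate of E(S_k^{-alpha}) for S_k the mean of k Exp(1) draws."""
    total = 0.0
    for _ in range(n_samples):
        s_k = sum(random.expovariate(1.0) for _ in range(k)) / k
        total += s_k ** (-alpha)
    return total / n_samples

est_small_k = mean_negative_moment(5, 1.0, 2000)    # noticeably above the limit
est_large_k = mean_negative_moment(500, 1.0, 2000)  # close to (E xi_1)^{-1} = 1
print(est_small_k, est_large_k)
```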
Appendix B. Small deviations of $W^{(s)}$ from $\Pi_r$

Let $X$ denote a centered Gaussian random variable with values in a normed space $(E, \|\cdot\|)$, and consider a finite-dimensional linear subspace $\Pi \subset E$. We are interested in the small deviation behavior of
$$d(X, \Pi) = \inf_{\pi \in \Pi} \|X - \pi\|.$$
Obviously,
$$P(\|X\| \le \varepsilon) \le P(d(X, \Pi) \le \varepsilon) \tag{31}$$
for every $\varepsilon > 0$. We establish an upper bound for $P(d(X, \Pi) \le \varepsilon)$ that involves large deviations of $X$, too.

Proposition 16. If $\dim(\Pi) = r$, then
$$P(d(X, \Pi) \le \varepsilon) \le (4\lambda/\varepsilon)^r \cdot P(\|X\| \le 2\varepsilon) + P(\|X\| \ge \lambda - \varepsilon)$$
for all $\varepsilon, \lambda > 0$.

Proof. Put $B_{\rho}(x) = \{y \in E : \|y - x\| \le \rho\}$ for $x \in E$ and $\rho > 0$, and consider the sets $A = \Pi \cap B_{\lambda}(0)$ and $B = B_{\varepsilon}(0)$. Then
$$\{d(X, \Pi) \le \varepsilon\} \subset \{X \in A + B\} \cup \{\|X\| \ge \lambda - \varepsilon\},$$
and therefore it suffices to prove
$$P(X \in A + B) \le (4\lambda/\varepsilon)^r \cdot P(\|X\| \le 2\varepsilon). \tag{32}$$
Since $\frac{1}{\lambda} \cdot A \subset \Pi \cap B_1(0)$, the $\varepsilon$-covering number of $A$ is not larger than $(4\lambda/\varepsilon)^r$, see [1, Eq. (1.1.10)]. Hence
$$A \subset \bigcup_{i=1}^n B_{\varepsilon}(x_i)$$
for some $x_1, \ldots, x_n \in E$ with $n \le (4\lambda/\varepsilon)^r$, and consequently,
$$A + B \subset \bigcup_{i=1}^n B_{2\varepsilon}(x_i).$$
Due to Anderson's inequality we have $P(X \in B_{2\varepsilon}(x_i)) \le P(X \in B_{2\varepsilon}(0))$, which implies (32). □

Now we turn to the specific case of $X = (W^{(s)}(t))_{t \in [0,1]}$ and $E = L_p[0,1]$, and we consider the subspace $\Pi = \Pi_r$ of polynomials of degree at most $r$. According to the large deviation principle for the $s$-fold integrated Wiener process,
$$-\log P(\|W^{(s)}\|_{L_p[0,1]} > t) \approx t^2 \tag{33}$$
as $t$ tends to infinity, see, e.g., [5]. Furthermore, the small ball probabilities satisfy
$$-\log P(\|W^{(s)}\|_{L_p[0,1]} \le \varepsilon) \approx \varepsilon^{-1/(s+1/2)} \tag{34}$$
as $\varepsilon$ tends to zero, see, e.g., [12,13].

Corollary 17. For all $r, s \in \mathbb{N}_0$ and $1 \le p \le \infty$ we have
$$-\log P(d(W^{(s)}, \Pi_r) \le \varepsilon) \approx \varepsilon^{-1/(s+1/2)}$$
as $\varepsilon$ tends to zero.

Proof. From (31) and (34) we derive
$$-\log P(d(W^{(s)}, \Pi_r) \le \varepsilon) \le -\log P(\|W^{(s)}\|_{L_p[0,1]} \le \varepsilon) \approx \varepsilon^{-1/(s+1/2)},$$
yielding the upper bound in the corollary. For the lower bound we employ Proposition 16 with $\lambda = \varepsilon^{-\rho}$ for $\rho = (2s+1)^{-1}$ to obtain
$$P(d(W^{(s)}, \Pi_r) \le \varepsilon) \le 4^r \cdot \varepsilon^{-r(1+\rho)} \cdot P(\|W^{(s)}\|_{L_p[0,1]} \le 2\varepsilon) + P(\|W^{(s)}\|_{L_p[0,1]} \ge \varepsilon^{-\rho} - \varepsilon). \tag{35}$$
However, for $\varepsilon^{1+\rho} \le 1/2$ we have $\varepsilon^{-\rho}/2 \le \varepsilon^{-\rho} - \varepsilon \le \varepsilon^{-\rho}$ and thus, using (33),
$$-\log P(\|W^{(s)}\|_{L_p[0,1]} \ge \varepsilon^{-\rho} - \varepsilon) \approx \varepsilon^{-2\rho} = \varepsilon^{-1/(s+1/2)}$$
as $\varepsilon$ tends to zero. Furthermore, by (34),
$$-\log\left( 4^r \cdot \varepsilon^{-r(1+\rho)} \cdot P(\|W^{(s)}\|_{L_p[0,1]} \le 2\varepsilon) \right) \approx \varepsilon^{-1/(s+1/2)}.$$
The latter two estimates, together with (35) and the elementary inequality $\log(x+y) \le \log(2) + \max(\log(x), \log(y))$, yield the lower bound in the corollary. □

References

[1] B. Carl, I. Stephani, Entropy, Compactness and the Approximation of Operators, Cambridge University Press, Cambridge, 1990.
[2] A. Cohen, J.-P. d'Ales, Nonlinear approximation of random functions, SIAM J. Appl. Math. 57 (1997) 518–540.
[3] A. Cohen, I. Daubechies, O.G. Guleryuz, M.T. Orchard, On the importance of combining wavelet-based nonlinear approximation with coding strategies, IEEE Trans. Inform. Theory 48 (2002) 1895–1921.
[4] J. Creutzig, Relations between classical, average, and probabilistic Kolmogorov widths, J. Complexity 18 (2002) 287–303.
[5] A. Dembo, O. Zeitouni, Large Deviations Techniques and Applications, Springer, New York, 1998.
[6] S. Dereich, T. Müller-Gronbach, K. Ritter, Infinite-dimensional quadrature and quantization, Preprint, 2006, arXiv:math.PR/0601240v1.
[7] R. DeVore, Nonlinear approximation, Acta Numer. 7 (1998) 51–150.
[8] K. Grill, On the rate of convergence in Strassen's law of the iterated logarithm, Probab. Theory Related Fields 74 (1987) 583–589.
[9] N. Hofmann, T. Müller-Gronbach, K. Ritter, The optimal discretization of stochastic differential equations, J. Complexity 17 (2001) 117–153.
[10] P.E. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations, Springer, Berlin, 1995.
[11] M. Kon, L. Plaskota, Information-based nonlinear approximation: an average case setting, J. Complexity 21 (2005) 211–229.
[12] W. Li, Q.M. Shao, Gaussian processes: inequalities, small ball probabilities and applications, in: D.N. Shanbhag et al. (Eds.), Stochastic Processes: Theory and Methods, Handbook of Statistics, vol. 19, North-Holland, Amsterdam, 2001, pp. 533–597.
[13] M. Lifshits, Asymptotic behaviour of small ball probabilities, in: B. Grigelionis et al. (Eds.), Proceedings of the Seventh Vilnius Conference 1998, TEV-VSP, Vilnius, 1999, pp. 153–168.
[14] V.E. Maiorov, Widths of spaces endowed with a Gaussian measure, Russian Acad. Sci. Dokl. Math. 45 (1992) 305–309.
[15] V.E. Maiorov, Average n-widths of the Wiener space in the L∞-norm, J. Complexity 9 (1993) 222–230.
[16] V.E. Maiorov, Widths and distribution of values of the approximation functional on the Sobolev space with measure, Constr. Approx. 12 (1996) 443–462.
[17] T. Müller-Gronbach, Strong approximation of systems of stochastic differential equations, Habilitationsschrift, TU Darmstadt, 2002.
[18] T. Müller-Gronbach, The optimal uniform approximation of systems of stochastic differential equations, Ann. Appl. Probab. 12 (2002) 664–690.
[19] V.V. Petrov, Sums of Independent Random Variables, Springer, Berlin, 1975.
[20] K. Ritter, Average-Case Analysis of Numerical Problems, Lecture Notes in Mathematics, vol. 1733, Springer, Berlin, 2000.
[21] A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes, Springer, New York, 1996.
Journal of Complexity 23 (2007) 890–917
www.elsevier.com/locate/jco

On the best interval quadrature formulae for classes of differentiable periodic functions

V.F. Babenko^{a,b,∗}, D.S. Skorokhodov^a
^a Dnepropetrovsk National University, Ukraine
^b Institute of Applied Mathematics and Mechanics of NAS, Ukraine

Received 8 January 2007; accepted 20 March 2007
Available online 7 April 2007
Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday

Abstract

In this paper we solve the problem of the optimal interval quadrature formula for the class $W^r F$ of differentiable periodic functions with a rearrangement invariant set $F$ of derivatives of order $r$. We prove that the formula with equal coefficients and $n$ node intervals having equidistant midpoints is optimal for the classes under consideration. To this end, a sharp inequality for antiderivatives of rearrangements of averaged monosplines is proved.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Quadrature formulae; Monosplines; Rearrangements
1. Introduction, notations, statement of the problem

Let $L_p$, $1 \le p \le \infty$, be the space of $2\pi$-periodic functions $f : \mathbb{R} \to \mathbb{R}$ with the usual norm
$$\|f\|_p = \begin{cases} \left( \int_0^{2\pi} |f(t)|^p\, dt \right)^{1/p} & \text{if } p < \infty, \\ \operatorname{ess\,sup}\{|f(t)| : t \in [0, 2\pi)\} & \text{if } p = \infty. \end{cases}$$
Let also $C_{2\pi}$ be the space of continuous $2\pi$-periodic functions $f : \mathbb{R} \to \mathbb{R}$ endowed with the uniform norm $\|f\|_C$.
∗ Corresponding author. Dnepropetrovsk National University, Ukraine.
E-mail address:
[email protected] (V.F. Babenko). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.005
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
891
Denote by $K_n$, $n = 1, 2, \ldots$, the set of all possible quadrature formulae of the form
$$\lambda(f) = \sum_{j=1}^n a_j f(x_j),$$
where $x_1 < x_2 < \cdots < x_n < x_1 + 2\pi$, $a_j \in \mathbb{R}$. Let $M$ be some (non-symmetric in general) class of continuous $2\pi$-periodic functions. For $f \in M$ and $\lambda \in K_n$ set
$$R(f, \lambda) = \int_0^{2\pi} f(t)\, dt - \lambda(f).$$
The error of approximate integration with the help of the formula $\lambda \in K_n$ on the class $M$ we shall characterize by the pair of values
$$R^{\pm}(M, \lambda) = \sup\{R(\pm f, \lambda) : f \in M\}$$
or, equivalently, with the help of the interval
$$\Delta(M, \lambda) := [-R^-(M, \lambda), R^+(M, \lambda)].$$
Certainly, for symmetric classes $M$ we have $R^+(M, \lambda) = R^-(M, \lambda)$. Set
$$R_{\pm}(M, K_n) = \inf\{R^{\pm}(M, \lambda) : \lambda \in K_n\}. \tag{1.1}$$
The Kolmogorov problem about the best quadrature formula for the class $M$ can be formulated in the following way. Find the values (1.1) and find the formulae $\lambda \in K_n$ that realize the infimum on the right-hand side of (1.1), if such formulae exist. The case when there exists a quadrature formula $\bar{\lambda}$ which realizes the infimum in both $R_+(M, K_n)$ and $R_-(M, K_n)$ is especially interesting. For this $\bar{\lambda}$ and for an arbitrary formula $\lambda$ we shall have $\Delta(M, \bar{\lambda}) \subset \Delta(M, \lambda)$. A quadrature formula satisfying the latter conditions will be called optimal for the class $M$.

Let $0 < h < \pi/n$ be given. Denote by $K_n^i(h)$ the set of so-called interval quadrature formulae of the form
$$\lambda^i(f) = \sum_{j=1}^n b_j\, \frac{1}{2h} \int_{y_j - h}^{y_j + h} f(t)\, dt,$$
where $y_1 < y_2 < \cdots < y_n < y_1 + 2\pi$, $b_j \in \mathbb{R}$. For $f \in M$ and $\lambda^i \in K_n^i(h)$ set
$$R(f, \lambda^i) = \int_0^{2\pi} f(t)\, dt - \lambda^i(f).$$
The error of approximate integration with the help of $\lambda^i \in K_n^i(h)$ on the class $M$ we shall characterize by the pair of values
$$R^{\pm}(M, \lambda^i) = \sup\{R(\pm f, \lambda^i) : f \in M\}$$
or, equivalently, with the help of the interval
$$\Delta(M, \lambda^i) := [-R^-(M, \lambda^i), R^+(M, \lambda^i)].$$
As above, for symmetric classes $M$ we have $R^+(M, \lambda^i) = R^-(M, \lambda^i)$. Set
$$R_{\pm}(M, K_n^i(h)) = \inf\{R^{\pm}(M, \lambda^i) : \lambda^i \in K_n^i(h)\}. \tag{1.2}$$
The analog of the Kolmogorov problem about the best interval quadrature formula for the class $M$ can be formulated in the following way. Find the values (1.2) and find the formulae $\lambda^i \in K_n^i(h)$ that realize the infimum on the right-hand side of (1.2). For interval formulae, as well as for usual quadrature formulae, the case when there exists an interval quadrature formula $\bar{\lambda}^i$ which realizes the infimum in both $R_+(M, K_n^i(h))$ and $R_-(M, K_n^i(h))$ is especially interesting. For this $\bar{\lambda}^i$ and for an arbitrary formula $\lambda^i$ we shall have $\Delta(M, \bar{\lambda}^i) \subset \Delta(M, \lambda^i)$. An interval quadrature formula satisfying the latter conditions will be called optimal for the class $M$.

From the applications point of view, interval quadrature formulae are more natural than the usual quadrature formulae based on values at points, since quite often the result of measuring a physical quantity, due to the structure of the measurement devices, is an average value over some interval of the function describing the studied quantity. Note that one can obtain the usual quadrature formula from the corresponding interval quadrature formula as a limit case, setting $h \to 0$.

Given $h > 0$, define the Steklov operator $S_h : L_1 \to C_{2\pi}$ in the following way:
$$S_h(f)(x) := \frac{1}{2h} \int_{x-h}^{x+h} f(t)\, dt.$$
We shall often write $f^h$ instead of $S_h(f)$. It can easily be seen that the problem of finding the optimal interval quadrature formula for the class $M$ can be considered as the problem of finding the optimal usual quadrature formula for the class $S_h(M) := \{S_h(f) : f \in M\}$.

Let $f \in L_1$. The notation $f \perp 1$ means that
$$\int_0^{2\pi} f(t)\, dt = 0.$$
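Numerically, the Steklov average can be computed with any quadrature rule, and for $f = \sin$ there is a closed form to check against: $S_h(\sin)(x) = \sin(x)\sin(h)/h$. A minimal sketch (the midpoint rule and the grid size $M$ are arbitrary choices):

```python
import math

def steklov(f, x, h, M=2000):
    """Midpoint-rule approximation of S_h(f)(x) = (1/2h) * integral of f over [x-h, x+h]."""
    step = 2 * h / M
    return sum(f(x - h + (i + 0.5) * step) for i in range(M)) / M

h = 0.3
errs = [abs(steklov(math.sin, x, h) - math.sin(x) * math.sin(h) / h)
        for x in (0.0, 1.0, 2.5, 5.0)]
print(max(errs))    # only the small discretization error remains
```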
Let $F$ be a subset of $L_1$ such that $\{f \in F : f \perp 1\} \ne \emptyset$. For $r = 1, 2, \ldots$ denote by $W^r F$ the class of functions $f$ that have a locally absolutely continuous derivative $f^{(r-1)}$ and are such that $f^{(r)} \in F$. In the case when $F$ is the unit ball of the space $L_p$ we obtain the standard Sobolev class $W_p^r$ of periodic functions.

For a non-negative function $f \in L_1$ let us denote by $P(f, t)$ the decreasing rearrangement (see e.g. [9, p. 130; 10, pp. 92–93]) of the restriction of $f$ to $[0, 2\pi)$. If $g$ is an arbitrary function from $L_1$, then set (see e.g. [10, p. 99])
$$\Pi(g, t) = P(g_+, t) - P(g_-, 2\pi - t),$$
where $g_{\pm}(t) = \max\{\pm g(t); 0\}$. The set $F \subset L_1$ is called rearrangement invariant or, shortly, $\Pi$-invariant if the conditions $f \in F$ and $\Pi(g) = \Pi(f)$ imply $g \in F$.
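On a uniform grid, the decreasing rearrangement $P(f, t)$ of a non-negative function amounts to sorting the sampled values in decreasing order; being a rearrangement, it preserves the integral over the period. A discrete sketch (the sample function is illustrative):

```python
import math

N = 1000
grid = [2 * math.pi * i / N for i in range(N)]
f = [abs(math.sin(3 * t)) + 0.1 for t in grid]    # non-negative sample values

P = sorted(f, reverse=True)                       # discrete decreasing rearrangement

w = 2 * math.pi / N
integral_f = sum(f) * w
integral_P = sum(P) * w
is_decreasing = all(P[i] >= P[i + 1] for i in range(N - 1))
print(is_decreasing, abs(integral_f - integral_P))
```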
In order to illustrate the variety of the classes $W^r F$ with $\Pi$-invariant sets $F$ we mention some examples.

1. For $F$ one can take the unit sphere of any symmetric space of $2\pi$-periodic functions embedded in $L_1$; in particular, the unit sphere in the space $L_p$, $1 \le p \le \infty$, or in Orlicz [11], Lorentz and Marcinkiewicz [12,22] spaces.
2. Let $\Phi$ be an arbitrary non-negative, non-decreasing function defined on $[0, \infty)$. One can take
$$F = F(\Phi) = \left\{ f \in L_1 : \int_0^{2\pi} \Phi(|f(t)|)\, dt \le 1 \right\}.$$
3. Let $\alpha, \beta > 0$ and $1 \le p \le \infty$. One can take
$$F = F_{p;\alpha,\beta} = \{f : \|\alpha f_+ + \beta f_-\|_p \le 1\}.$$
We shall denote the corresponding class $W^r F_{p;\alpha,\beta}$ by $W^r_{p;\alpha,\beta}$.
4. Very interesting classes $W^r F_{f,\Pi}$ correspond to the set $F_{f,\Pi} = \{g \in L_1 : \Pi(g) = \Pi(f)\}$, where $f$ is a fixed function from $L_1$, $f \perp 1$.
5. For $F$ one can take the set
$$F_{f,P} = \{g \in L_1 : P(|g|, t) = P(|f|, t),\ t \in [0, 2\pi)\}$$
or
$$F'_{f,P} = \{g \in L_1 : P(|g|, t) \le P(|f|, t),\ t \in [0, 2\pi)\}.$$
The list of examples could, of course, be continued.

The following integral representation for functions $f \in W^r F$ plays an essential role in the investigation of various extremal problems for the classes $W^r F$. Let
$$D_r(x) = \sum_{j=1}^{\infty} j^{-r} \cos(jx - r\pi/2), \quad r \in \mathbb{N},$$
be the Bernoulli kernel. Then
$$f(x) = \frac{a_0}{2} + \frac{1}{\pi} \int_0^{2\pi} D_r(x - t) f^{(r)}(t)\, dt = \frac{a_0}{2} + (D_r * f^{(r)})(x), \tag{1.3}$$
where
$$a_0 = \frac{1}{\pi} \int_0^{2\pi} f(t)\, dt.$$
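For $r = 1$ the Bernoulli kernel has a closed form, since $\cos(jx - \pi/2) = \sin(jx)$: $D_1(x) = \sum_{j \ge 1} \sin(jx)/j = (\pi - x)/2$ on $(0, 2\pi)$. This gives a convenient check for truncations of the series (the truncation length $N$ is an arbitrary choice):

```python
import math

def D1_partial(x, N=20000):
    """Truncated Bernoulli kernel D_1 (the case r = 1)."""
    return sum(math.sin(j * x) / j for j in range(1, N + 1))

xs = [0.5, math.pi / 2, math.pi, 4.0]
errs = [abs(D1_partial(x) - (math.pi - x) / 2) for x in xs]
print(max(errs))    # the partial sums converge (slowly) to (pi - x)/2
```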
Note that, considering the problem of optimization of quadrature formulae or interval quadrature formulae for the classes $W^r F$, we may restrict our consideration to formulae $\lambda$ from $K_n$ such that $\sum_{j=1}^n a_j = 2\pi$, or to formulae $\lambda^i$ from $K_n^i(h)$ such that $\sum_{j=1}^n b_j = 2\pi$, only. For such formulae set
$$m(t) = m_{\lambda,r}(t) = -\frac{1}{\pi} \sum_{j=1}^n a_j D_r(x_j - t)$$
and
$$m^h(t) = m^h_{\lambda^i,r}(t) = -\frac{1}{\pi} \sum_{j=1}^n b_j D_r^h(y_j - t).$$
Set $M_n^r := \{m_{\lambda,r} : \lambda \in K_n\}$ and $M_n^{r,h} = S_h(M_n^r) := \{m^h_{\lambda^i,r} : \lambda^i \in K_n^i(h)\}$. Functions from $M_n^r$ and from $M_n^{r,h}$ will be called monosplines and averaged monosplines, respectively. With the help of representation (1.3) one can obtain the error of approximate integration by these formulae in the form
$$R(f, \lambda) = \int_0^{2\pi} f^{(r)}(t)\, m(t)\, dt, \quad m = m_{\lambda,r} \in M_n^r,$$
if $\lambda \in K_n$, or in the form
$$R(f, \lambda^i) = \int_0^{2\pi} f^{(r)}(t)\, m^h(t)\, dt, \quad m^h = m^h_{\lambda^i,r} \in M_n^{r,h}, \tag{1.4}$$
if $\lambda^i \in K_n^i(h)$.

Denote by $S_n^r(\alpha, \beta)$, $n = 1, 2, \ldots$, $r = 0, 1, \ldots$, $\alpha, \beta > 0$, the set of functions $f \in W^r_{\infty;\alpha,\beta}$ with zero mean value on a period such that $\alpha^{-1} (f^{(r)})_+ + \beta^{-1} (f^{(r)})_- \equiv 1$ and $f^{(r)}$ admits at most $2n$ changes of sign on a period.

In this paper we shall discuss the Kolmogorov problems on optimal quadrature formulae and optimal interval quadrature formulae for classes $W^r F$ with $\Pi$-invariant sets $F$. We shall show that for any fixed $h \in (0, \pi/n)$ the interval quadrature formula having equidistant nodes $y_j$, $j = 1, \ldots, n$, and equal coefficients $b_j = 2\pi/n$ is optimal for the class $W^r F$ among all interval quadrature formulae from $K_n^i(h)$. To this end a sharp inequality for antiderivatives of rearrangements of averaged monosplines will be proved.

The paper is organized in the following way. In Section 2 we present the known results, formulate the main results of the paper, and describe the ideas of the proof. Some auxiliary results are presented in Section 3. In Sections 4–7 we prove the results formulated in Section 2.

2. Background, main results, scheme of the proof

Set
$$\lambda_n(f) = \frac{2\pi}{n} \sum_{j=1}^n f(2\pi j/n)$$
and
$$\lambda_n^i(f) = \frac{2\pi}{n} \sum_{j=1}^n \frac{1}{2h} \int_{2\pi j/n - h}^{2\pi j/n + h} f(t)\, dt.$$
In addition, set
$$m_{n,r}(x) = -\frac{2}{n} \sum_{j=1}^n D_r\left( \frac{2\pi j}{n} - x \right)$$
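For a trigonometric polynomial of degree less than $n$, both $\lambda_n$ and $\lambda_n^i$ reproduce the integral over a period exactly, because the Steklov average only rescales each harmonic ($S_h(\sin k\,\cdot)(t) = \sin(kt)\sin(kh)/(kh)$). A numerical sketch with an illustrative test function:

```python
import math

def f(t):
    return math.sin(3 * t) + math.cos(t)    # degree 3 < n; integral over a period is 0

n, h = 8, 0.2

lam_n = (2 * math.pi / n) * sum(f(2 * math.pi * j / n) for j in range(1, n + 1))

def steklov_f(y, M=4000):
    """Midpoint-rule approximation of the average of f over [y-h, y+h]."""
    step = 2 * h / M
    return sum(f(y - h + (i + 0.5) * step) for i in range(M)) / M

lam_n_int = (2 * math.pi / n) * sum(steklov_f(2 * math.pi * j / n)
                                    for j in range(1, n + 1))
print(lam_n, lam_n_int)    # both are (numerically) the exact integral, 0
```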
and denote by $\varphi_{n,r;\alpha,\beta}$ the $r$th periodic integral with zero mean value over a period of the $2\pi n^{-1}$-periodic function $\varphi_{n,0;\alpha,\beta}$ which equals $\alpha$ on the interval $[0, 2\pi n^{-1} \beta (\alpha+\beta)^{-1})$ and equals $-\beta$ on the interval $[2\pi n^{-1} \beta (\alpha+\beta)^{-1}, 2\pi n^{-1})$.

It was proved in the papers of Motornyi [16], Ligun [14], and Zhensykbaev (see [23,24]) that if $M = W_p^r$, $r = 1, 2, \ldots$, $1 \le p \le \infty$, then
$$R_{\pm}(M, K_n) = R^{\pm}(M, \lambda_n).$$
At the same time this does not hold for some natural analogues of the class $W_p^r$ [19]. Therefore, it was an interesting problem to determine the most general conditions on the class $M$ of functions that ensure the optimality of the formula $\lambda_n$. This problem was solved by Babenko [3,5]. He proved the following:

Theorem A. Let $n, r = 1, 2, \ldots$, and let $F \subset L_1$ be rearrangement invariant. Then
$$R_{\pm}(W^r F, K_n) = R^{\pm}(W^r F, \lambda_n) = \sup\left\{ \int_0^{2\pi} \Pi(\pm f, t)\, \Pi(m_{n,r}, t)\, dt : f \in F,\ f \perp 1 \right\}.$$

Let us describe the scheme of the proof of Theorem A. For non-negative $2\pi$-periodic functions $f$ and $F$ we shall write $f \prec F$ if for any $x \in [0, 2\pi]$,
$$\int_0^x P(f, t)\, dt \le \int_0^x P(F, t)\, dt.$$
The following extremal property of monosplines was proved in [5] in order to establish Theorem A.

Theorem B. Let $n, r = 1, 2, \ldots$. Then for any $m \in M_n^r$ and any $c \in \mathbb{R}$,
$$(m_{n,r} - c)_{\pm} \prec (m - c)_{\pm}.$$

To prove Theorem B it was enough (see Theorem 10 in Section 3) to prove the following:

Theorem C. Let $n, r = 1, 2, \ldots$. Then for any $m \in M_n^r$ and any $\alpha, \beta > 0$,
$$E_0(m_{n,r})_{1;\alpha,\beta} \le E_0(m)_{1;\alpha,\beta}.$$
(For the definition of the values $E_0(f)_{1;\alpha,\beta}$, $f \in L_1$, see Section 3.)

To prove this it was enough to prove:

Theorem D. Let $r = 1, 2, \ldots$ and $\alpha, \beta > 0$. Then for an arbitrary $n \in \mathbb{N}$ the quadrature formula $\lambda_n$ with equidistant nodes and equal coefficients is optimal for the class $W^r_{\infty;\alpha,\beta}$. Moreover,
$$R_{\pm}(W^r_{\infty;\alpha,\beta}, K_n) = R^{\pm}(W^r_{\infty;\alpha,\beta}, \lambda_n) = E_0(m_{n,r})_{1;\alpha^{-1},\beta^{-1}} = -2\pi \min_u\left( \pm \varphi_{n,r;\alpha^{-1},\beta^{-1}}(u) \right).$$
To prove the last theorem the following two theorems were established:

Theorem E. For any $n$ points $x_1 < x_2 < \cdots < x_n < x_1 + 2\pi$ there exists a spline $g \in S_n^r(\alpha, \beta)$ with equal minima at these points.

Theorem F. Let $n, r = 1, 2, \ldots$ and $\alpha, \beta, \gamma, \delta > 0$. Then for any $g \in S_n^r(\alpha, \beta)$,
$$E_0(\varphi_{n,r;\alpha,\beta})_{1;\gamma,\delta} \le E_0(g)_{1;\gamma,\delta}.$$

Interval quadrature formulae have been considered by many mathematicians (see for instance [18,20,13,21,4,17,15]). Results about optimal interval quadrature formulae for classes of differentiable periodic functions are known for the classes $W_1^r$ [4], $W_{\infty}^r$ [17], and $W^1 F$ [7,8].

The main result of our paper is the following:

Theorem 1. Let $n, r = 1, 2, \ldots$ and $0 < h < \pi/n$. Then for an arbitrary $\Pi$-invariant set $F$,
$$R_{\pm}(W^r F, K_n^i(h)) = R^{\pm}(W^r F, \lambda_n^i) = R_{\pm}(S_h(W^r F), K_n) = R^{\pm}(S_h(W^r F), \lambda_n) = \sup\left\{ \int_0^{2\pi} \Pi(\pm f, t)\, \Pi(S_h(m_{n,r}), t)\, dt : f \in F,\ f \perp 1 \right\}.$$

To prove this theorem we shall use the scheme presented above for the proof of Theorem A. In particular, we shall prove the following theorem, which is of independent interest.

Theorem 2. Let $n, r = 1, 2, \ldots$ and $0 < h < \pi/n$. Then for any $m^h \in M_n^{r,h}$ and any $c \in \mathbb{R}$,
$$(S_h(m_{n,r}) - c)_{\pm} \prec (m^h - c)_{\pm}.$$

To prove Theorem 2 it is enough to prove:

Theorem 3. Let $n, r = 1, 2, \ldots$. Then for any $m^h \in M_n^{r,h}$ and any $\alpha, \beta > 0$,
$$E_0(S_h(m_{n,r}))_{1;\alpha,\beta} \le E_0(m^h)_{1;\alpha,\beta}.$$

To prove this it suffices to prove the following:

Theorem 4. Let $r = 1, 2, \ldots$ and $\alpha, \beta > 0$. Then for an arbitrary $n \in \mathbb{N}$ the interval quadrature formula $\lambda_n^i$ with equal coefficients and node intervals having equidistant midpoints is optimal for the class $W^r_{\infty;\alpha,\beta}$. Furthermore,
$$R_{\pm}(W^r_{\infty;\alpha,\beta}, K_n^i(h)) = R^{\pm}(W^r_{\infty;\alpha,\beta}, \lambda_n^i) = E_0(S_h(m_{n,r}))_{1;\alpha^{-1},\beta^{-1}} = -2\pi \min_u\left( \pm \varphi^h_{n,r;\alpha^{-1},\beta^{-1}}(u) \right).$$

To prove Theorem 4 we shall prove the following two theorems.

Theorem 5. For every system of points $x_1 < x_2 < \cdots < x_n < 2\pi + x_1$ there exists a function $f_r \in S_n^r(\alpha, \beta)$ such that $f_r^h$ attains equal minimal values at these points.
Theorem 6. Let $n, r = 1, 2, \ldots$, $0 < h < \pi/n$ and $\alpha, \beta, \gamma, \delta > 0$. Then for every $f \in S_n^r(\alpha, \beta)$,
$$E_0(\varphi^h_{n,r;\alpha,\beta})_{1;\gamma,\delta} \le E_0(S_h(f))_{1;\gamma,\delta}.$$

The implementation of this outline meets serious difficulties connected with the fact that the Steklov operator $S_h$ does not have the following property: for any $2\pi$-periodic function $f$ having zero mean value on a period,
$$\nu(S_h(f)) \le \nu(f),$$
where $\nu(f)$ is the number of sign changes of the function $f$ on a period. To overcome these difficulties we shall prove the following theorem, which plays the crucial role in the proofs of Theorems 5 and 6.

Theorem 7. Let $n, r = 1, 2, \ldots$ and $\alpha, \beta > 0$. Let splines $s_1, s_2 \in S_n^0(\alpha, \beta)$ be such that $\nu(s_1^h) = \nu(s_2^h) = 2n$. If $f(t) = s_1(t) - s_2(t)$ then $\nu(f^h) \le \nu(f)$.

3. Some auxiliary results

Here we present some known definitions and results which will be frequently used in the rest of the paper.

Let $1 \le p \le \infty$, let $f \in L_p$, and let $H$ be a subspace of $L_1$. We denote by $E(f; H)_p$ the best approximation of the function $f$ by the subspace $H$ in the $L_p$-metric, i.e.
$$E(f; H)_p = \inf\{\|f - u\|_p : u \in H\}.$$
In addition, let
$$E^{\pm}(f; H)_p = \inf\{\|f - u\|_p : \pm u \le \pm f,\ u \in H\}$$
denote the best one-sided approximation of the function $f$ by the subspace $H$ in the $L_p$-metric. Let $\alpha, \beta > 0$. We denote by $E(f; H)_{p;\alpha,\beta}$ the best $(\alpha, \beta)$-approximation [2] of the function $f$ by the subspace $H$ in the $L_p$-metric, i.e.
$$E(f; H)_{p;\alpha,\beta} = \inf\{\|\alpha (f - u)_+ + \beta (f - u)_-\|_p : u \in H\}.$$
For $\alpha = \beta$ we obtain, up to a constant factor, the usual best approximation (instead of $E(f; H)_{p;1,1}$ we shall write $E(f; H)_p$). By virtue of Theorem 2 in [2], as $\beta \to \infty$ ($\alpha \to \infty$), $E(f; H)_{p;1,\beta}$ ($E(f; H)_{p;\alpha,1}$) tends monotone non-decreasingly to the best approximation from below (from above) of the function $f$ by the elements of $H$ in the $L_p$-metric, $E^+(f; H)_p$ ($E^-(f; H)_p$), i.e.
$$\lim_{\beta \to \infty} E(f; H)_{p;1,\beta} = E^+(f; H)_p, \qquad \lim_{\alpha \to \infty} E(f; H)_{p;\alpha,1} = E^-(f; H)_p.$$
This allows us to include the problem of best approximation without constraints and the problem of best one-sided approximation into a family of problems of the same type with "loose" constraints, and to consider them from a general point of view (see for this reason also [3]). In what follows we shall allow $+\infty$ for $\alpha$ or $\beta$, identifying $E(f; H)_{p;\alpha,\beta}$ with the corresponding one-sided approximation. When $H$ is the space of all constants, let $E_0(f)_{p;\alpha,\beta} = E(f; H)_{p;\alpha,\beta}$.
Theorem 8 (Criterion for the best $(\alpha,\beta)$-approximation; [2], Theorem 4). Let $H$ be a finite-dimensional subspace of $L_p$, $1 \le p < \infty$, and $\alpha, \beta > 0$. For an element $u_0 \in H$ to be the best $(\alpha,\beta)$-approximation to $f \in L_p$ in the $L_p$-metric, it is sufficient and (for $p = 1$, in the case when $f - u_0$ almost everywhere differs from $0$) necessary that for any $u \in H$,
$$\int_0^{2\pi} u(t)\, |f(t) - u_0(t)|^{p-1} \left[ \alpha^p \operatorname{sign}(f(t) - u_0(t))_+ - \beta^p \operatorname{sign}(f(t) - u_0(t))_- \right] dt = 0.$$

Theorem 9 (Duality theorem for the best $(\alpha,\beta)$-approximation; [2], Theorem 5). Let $1 \le p \le \infty$.

Let $\varepsilon > 0$ and $x \in [0, 2\pi)$. Define
$$A_{\varepsilon}(x) = \frac{1}{2\pi} \sum_{q=-\infty}^{\infty} \frac{e^{iqx}}{\operatorname{ch}(q\varepsilon)}.$$
It is easy to verify that the convolution of the function $A_{\varepsilon}$ with an arbitrary periodic function is analytic on the real line. Hence the convolution of $A_{\varepsilon}$ with an arbitrary function, not identically constant, differs from zero almost everywhere. It is known (see, for example, [6]) that for every function $f \in C_{2\pi}$,
$$\nu(A_{\varepsilon} * f) \le \nu(f). \tag{3.1}$$
In addition, for every $f \in C_{2\pi}$, $\|(A_{\varepsilon} * f)(\cdot) - f(\cdot)\|_{C_{2\pi}} \to 0$ as $\varepsilon \to 0$.

Let $n, r = 1, 2, \ldots$ and $0 < h < \pi/n$. Due to Lemma 5.1 from [5], it is easy to verify that the following lemma holds.

Lemma 3.1. Let the spline $g \in S_n^r(\alpha, \beta)$ with nodes at the points $x_1, \ldots, x_{2l}$ be such that $g^{(r)}(x) = \alpha$ for $x \in (x_1, x_2)$. Then
$$(A_{\varepsilon} * g^h)(x) = ((A_{\varepsilon} * D_r^h) * g^{(r)})(x) = (\alpha + \beta) \sum_{j=1}^{2l} (-1)^j \int_0^{2\pi} D_1(x_j - t)\, [D_r * A_{\varepsilon}]^h(x - t)\, dt.$$
Lemma 3.2 (Babenko [5], Lemma 5.2). Let the function $g \in L_1$ be almost everywhere different from every fixed constant and $g \perp 1$. Then
$$E_0(g)_{1;\alpha,\beta} = \inf_{\lambda \in \mathbb{R}} \left( \frac{\alpha + \beta}{2} \int_0^{2\pi} |g(t) - \lambda|\, dt + 2\pi \cdot \frac{\beta - \alpha}{2} \cdot \lambda \right).$$

Lemma 3.3. Let $s \in S_n^0(\alpha, \beta)$, and let $x_1 < x_2 < \cdots < x_{2l} < x_1 + 2\pi$ be the nodes of $s$. Let $\lambda \in \mathbb{R}$, and
$$F(x_1, \ldots, x_{2l}; \lambda) = \int_0^{2\pi} \left| (\alpha + \beta) \sum_{j=1}^{2l} (-1)^j \int_0^{2\pi} D_1(x_j - t)\, [D_r * A_{\varepsilon}]^h(x - t)\, dt - \lambda \right| dx.$$
Then $F$ is continuously differentiable in the sense that the partial derivatives $\frac{\partial F}{\partial \lambda}$ and $\frac{\partial F}{\partial x_k}$, $k = 1, \ldots, 2l$, exist and are continuous. Moreover,
$$\frac{\partial F}{\partial \lambda} = -\int_0^{2\pi} \operatorname{sign}\left( (\alpha + \beta) \sum_{j=1}^{2l} (-1)^j \int_0^{2\pi} D_1(x_j - t)\, [D_r * A_{\varepsilon}]^h(x - t)\, dt - \lambda \right) dx,$$
$$\frac{\partial F}{\partial x_k} = (-1)^k (\alpha + \beta) \int_0^{2\pi} [A_{\varepsilon} * D_r]^h(x_k - x) \cdot \operatorname{sign}\left( (\alpha + \beta) \sum_{j=1}^{2l} (-1)^j \int_0^{2\pi} D_1(x_j - t)\, [D_r * A_{\varepsilon}]^h(x - t)\, dt - \lambda \right) dx.$$
This lemma can be proved analogously to the proof of Lemma 5.3 from [5].

Lemma 3.4 (See Babenko [6]). Let $n, r = 1, 2, \ldots$, $\alpha, \beta > 0$, and $l \in \mathbb{N}$, $l < n$. Then for an arbitrary $t \in [0, 2\pi)$,
$$\min_u (A_{\varepsilon} * \varphi_{l,r;\alpha,\beta})(u) < (A_{\varepsilon} * \varphi_{n,r;\alpha,\beta})(t) < \max_u (A_{\varepsilon} * \varphi_{l,r;\alpha,\beta})(u).$$
The statement of this lemma was noted in [6].

The following theorems represent the statements of Theorem 2.3 and Lemmas 2.2–2.3 from [5].

Theorem 10. Let $f$ and $F$ be continuous $2\pi$-periodic functions with zero mean value on a period, and for all $\alpha, \beta > 0$ let $E_0(f)_{1;\alpha,\beta} \le E_0(F)_{1;\alpha,\beta}$. Then
$$f_{\pm} \prec F_{\pm}. \tag{3.2}$$
Theorem 11. For any $f \in L_1$ with zero mean value on a period and for any $F \in L_1$ the following equality holds:
$$\sup\left\{ \int_0^{2\pi} g(t) F(t)\, dt : \Pi(g) = \Pi(f) \right\} = \int_0^{2\pi} \Pi(f, t)\, \Pi(F, t)\, dt.$$
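A discrete analogue of Theorem 11 is the classical rearrangement inequality: among all rearrangements $g$ of a fixed vector $f$, the sum $\sum_i g_i F_i$ is maximized by pairing similarly ordered values. A brute-force sketch with illustrative vectors:

```python
import itertools

f = [3.0, -1.0, 0.5, 2.0, -4.5, 0.25]
F = [1.0, 0.0, -2.0, 4.0, 0.5, -1.5]

# Maximum of sum g_i * F_i over all rearrangements g of f, by brute force ...
brute = max(sum(a * b for a, b in zip(perm, F))
            for perm in itertools.permutations(f))
# ... equals the similarly-ordered pairing (Hardy-Littlewood).
sorted_pairing = sum(a * b for a, b in zip(sorted(f), sorted(F)))
print(brute, sorted_pairing)
```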
Theorem 12. Let the $2\pi$-periodic functions $f$ and $F$ be continuous with zero mean values on a period and such that for all $c \in \mathbb{R}$ and $x \in [0, 2\pi)$ the inequality (3.2) holds. Then for any function $g \in L_1$ with zero mean value on a period we have
$$\int_0^{2\pi} \Pi(g, t)\, \Pi(f, t)\, dt \le \int_0^{2\pi} \Pi(g, t)\, \Pi(F, t)\, dt.$$

4. Some properties of averaged $(\alpha, \beta)$-splines

In this section we shall prove Theorem 7, which plays a very important role in the rest of the paper. Let $n$ be a positive integer, $\alpha, \beta > 0$, and $0 < h < \pi/n$. The following two results represent a generalization of Lemmas 2 and 3 from the paper of Motornyi [17] to the case of non-symmetric perfect splines.

Lemma 4.1. Let $s \in S_n^0(\alpha, \beta)$ be an arbitrary spline and let us denote by $x_1 < x_2 < \cdots < x_{2n}$ its nodes on a period. Then the Steklov function $s^h$ is non-decreasing on the interval $(x_j - h, x_{j+1} - h)$ if $s(t) \equiv \alpha$ on the interval $(x_j, x_{j+1})$, and is non-increasing on $(x_j - h, x_{j+1} - h)$ if $s(t) \equiv -\beta$ on $(x_j, x_{j+1})$.

Proof. Let us consider the first derivative of the Steklov function:
$$(s^h)'(t) = \frac{d}{dt}\, \frac{1}{2h} \int_{t-h}^{t+h} s(u)\, du = \frac{1}{2h}\, [s(t+h) - s(t-h)].$$
This provides $(s^h)'(t - h) = [s(t) - s(t - 2h)]/2h$. It can easily be seen that $(s^h)'(t - h) \ge 0$ for $t \in (x_j, x_{j+1})$ if $s(t) \equiv \alpha$ on the same interval, and $(s^h)'(t - h) \le 0$ for $t \in (x_j, x_{j+1})$ if $s(t) \equiv -\beta$ on the same interval. Thus $s^h$ is non-decreasing on $(x_j - h, x_{j+1} - h)$ if $s(t) \equiv \alpha$ on the interval $(x_j, x_{j+1})$. Similarly, $s^h$ is non-increasing if $s(t) \equiv -\beta$ on the interval $(x_j, x_{j+1})$. This is the desired conclusion. □

Lemma 4.2. Let $s \in S_n^0(\alpha, \beta)$ and assume that $\nu(s^h) = 2n$. Then the length of the interval $(x_j, x_{j+1})$ is greater than $2h\beta/(\alpha + \beta)$ in the case $s(t) \equiv \alpha$ on this interval, and is greater than $2h\alpha/(\alpha + \beta)$ in the case $s(t) \equiv -\beta$ on $(x_j, x_{j+1})$.

Proof. Let $x_1 < x_2 < \cdots < x_{2n} < x_1 + 2\pi$ denote the nodes of the spline $s$ and let $x_{2n+1} = x_1 + 2\pi$. Since $\nu(s^h) = 2n$, by the previous lemma we have that $s^h(x_j - h)\, s^h(x_{j+1} - h) < 0$, $j = 1, \ldots, 2n - 1$. Without loss of generality, we may assume $\operatorname{sign} s^h(x_j - h) = (-1)^j$, $j = 1, \ldots, 2n$. From this it follows that $s(t) \equiv \alpha$ on the interval $(x_1, x_2)$. Note that the sum of the lengths of all intervals $(x_j, x_{j+1})$, $j = 1, \ldots, 2n$, on which $s$ attains the value $\alpha$ is equal to $2\pi\beta/(\alpha + \beta)$. Then there exists an interval on which $s(t) \equiv \alpha$ with length greater than $2h\beta/(\alpha + \beta)$. Similarly, there exists an interval on which $s(t) \equiv -\beta$ with length greater than $2h\alpha/(\alpha + \beta)$. Suppose the assertion of the lemma is false. Then, due to the remark above, we obtain two possible cases:

(1) There exists $1 \le j \le 2n$ such that $s(t) \equiv \alpha$ for $t \in (x_{j-1}, x_j)$, $s(t) \equiv -\beta$ for $t \in (x_j, x_{j+1})$, the length of the interval $(x_{j-1}, x_j)$ is greater than $2h\beta/(\alpha + \beta)$, and the length of the interval $(x_j, x_{j+1})$ is less than $2h\alpha/(\alpha + \beta)$.
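The derivative identity in the proof of Lemma 4.1 is easy to verify numerically for a concrete two-valued step function (an illustrative choice: $\alpha = 1$, $\beta = 2$, nodes $0$ and $\pi$; here the Steklov average is computed in closed form):

```python
import math

def s(t):
    """Step function: alpha = 1 on [0, pi), -beta = -2 on [pi, 2*pi)."""
    return 1.0 if (t % (2 * math.pi)) < math.pi else -2.0

def s_h(t, h=0.3):
    """Exact Steklov average of s, for windows [t-h, t+h] inside [0, 2*pi)."""
    a, b = t - h, t + h
    len_pos = max(0.0, min(b, math.pi) - max(a, 0.0))   # length where s = 1
    return (1.0 * len_pos - 2.0 * ((b - a) - len_pos)) / (2 * h)

h, d, t = 0.3, 1e-5, 3.0      # t - h < pi < t + h, so s^h is decreasing near t
numeric = (s_h(t + d, h) - s_h(t - d, h)) / (2 * d)     # (s^h)'(t), central difference
exact = (s(t + h) - s(t - h)) / (2 * h)                 # the identity from the proof
print(numeric, exact)         # both equal -5 for these parameters
```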
(2) There exists 1 ≤ j ≤ 2n such that s(t) ≡ −β for t ∈ (x_{j−1}, x_j), s(t) ≡ α for t ∈ (x_j, x_{j+1}), the length of the interval (x_{j−1}, x_j) is greater than 2hα/(α + β) and the length of the interval (x_j, x_{j+1}) is less than 2hβ/(α + β).
We consider the first case in detail; the second one can be studied similarly. Without loss of generality, we may assume that j = 2. From this we have s^h(x_3 − h) < 0, since sign s^h(x_3 − h) = −1. Let us consider the case x_3 − 2h ≤ x_2 − 2hβ/(α + β). Then

s^h(x_3 − h) = (1/(2h)) ∫_{x_3−2h}^{x_3} s(t) dt
= (1/(2h)) [ ∫_{x_3−2h}^{x_2−2hβ/(α+β)} s(t) dt + ∫_{x_2−2hβ/(α+β)}^{x_2} s(t) dt + ∫_{x_2}^{x_3} s(t) dt ]
≥ (1/(2h)) [ (−β) · (x_2 − 2hβ/(α + β) − x_3 + 2h) + α · 2hβ/(α + β) − β · (x_3 − x_2) ] = 0.

In the case x_3 − 2h ≥ x_2 − 2hβ/(α + β) we obtain

s^h(x_3 − h) = (1/(2h)) [ ∫_{x_3−2h}^{x_2} s(t) dt + ∫_{x_2}^{x_3} s(t) dt ] = (1/(2h)) · [2hα − (α + β)(x_3 − x_2)] > 0.

Thus, s^h(x_3 − h) ≥ 0, which contradicts the fact that s^h(x_3 − h) < 0.
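The zero lower bound obtained in the first case of the proof of Lemma 4.2 can be checked by direct expansion; a short worked verification, assuming the reconstructed parameters α, β > 0 of the spline class S_n^0(α, β):

```latex
\begin{aligned}
&\tfrac{1}{2h}\Bigl[(-\beta)\Bigl(x_2-\tfrac{2h\beta}{\alpha+\beta}-x_3+2h\Bigr)
  +\alpha\cdot\tfrac{2h\beta}{\alpha+\beta}-\beta(x_3-x_2)\Bigr]\\
&\quad=\tfrac{1}{2h}\Bigl[-\beta x_2+\tfrac{2h\beta^2}{\alpha+\beta}+\beta x_3-2h\beta
  +\tfrac{2h\alpha\beta}{\alpha+\beta}-\beta x_3+\beta x_2\Bigr]\\
&\quad=\tfrac{1}{2h}\Bigl[\tfrac{2h\beta(\alpha+\beta)}{\alpha+\beta}-2h\beta\Bigr]=0,
\end{aligned}
```

so the three integrals indeed sum to a non-negative quantity, as claimed.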
The following statement is a trivial corollary of Lemma 4.2.

Lemma 4.3. Let the spline s ∈ S_n^0(α, β) be such that μ((s^h)') = 2n. Then for an arbitrary point x ∈ [0, 2π) the spline s has at most two sign changes on the interval (x − h, x + h).

Due to Lemma 4.3, considering the different possibilities for the location of the points where the splines s_1 and s_2 change their signs, we obtain that the following lemma holds.

Lemma 4.4. Let splines s_1, s_2 ∈ S_n^0(α, β) be such that μ((s_1^h)') = μ((s_2^h)') = 2n, and let x be an arbitrary point from the interval [0, 2π). Then the difference f(t) = s_1(t) − s_2(t) has at most two sign changes on the interval (x − h, x + h).

Lemma 4.5. Let splines s_1, s_2 ∈ S_n^0(α, β) be such that μ((s_1^h)') = μ((s_2^h)') = 2n. Assume there exists a point x ∈ [0, 2π) such that the function f(t) = s_1(t) − s_2(t) has exactly two sign changes on the interval (x − h, x + h). Then there exists x̃ > 0 such that the function f has exactly one sign change on the interval (x + x̃ − h, x + x̃ + h). Moreover, f^h(y) = f^h(x) for arbitrary y ∈ [x, x + x̃].

Proof. Let x ∈ [0, 2π) satisfy the conditions of the lemma. Analyzing the different possibilities for the location of the nodes of the splines s_1 and s_2 on the interval [x − h, x + h], we conclude that the function f has exactly two sign changes on this interval only when both splines s_1 and s_2 have exactly two sign changes on the interval [x − h, x + h] and there exists a neighborhood U(x − h) of the point x − h such that s_1(t) ≡ const and s_2(t) ≡ const when t ∈ U(x − h), and s_1(x − h) · s_2(x − h) < 0. Without loss of generality, we may assume that s_1(x − h) = α and s_2(x − h) = −β.
Let x_{1,1}, x_{1,2}, x_{1,3} and x_{1,4} be the neighboring nodes of the spline s_1 such that x_{1,1} ≤ x − h < x_{1,2} < x_{1,3} < x + h ≤ x_{1,4}, and let x_{2,1}, x_{2,2}, x_{2,3} and x_{2,4} be the neighboring nodes of the spline s_2 such that x_{2,1} ≤ x − h < x_{2,2} < x_{2,3} < x + h ≤ x_{2,4}. Then s_1(t) ≡ α in the case t ∈ (x_{1,1}, x_{1,2}) or t ∈ (x_{1,3}, x_{1,4}), and s_1(t) ≡ −β in the case t ∈ (x_{1,2}, x_{1,3}). At the same time s_2(t) ≡ −β in the case t ∈ (x_{2,1}, x_{2,2}) or t ∈ (x_{2,3}, x_{2,4}), and s_2(t) ≡ α in the case t ∈ (x_{2,2}, x_{2,3}).
Set x̃ = min{x_{1,2} − x + h; x_{2,2} − x + h}. Without loss of generality we assume x̃ = x_{1,2} − x + h. This implies that the splines s_1 and s_2 are equal to α and −β, respectively, on the interval (x − h, x_{1,2}). Thus we obtain f(t) = α + β for t ∈ (x − h, x_{1,2}). At the same time the splines s_1 and s_2 are equal to α and −β, respectively, on the interval (x + h, x + x̃ + h). Indeed, applying Lemma 4.2 we obtain x_{1,3} − x_{1,2} > 2hα/(α + β) and x_{1,4} − x_{1,3} > 2hβ/(α + β). From the last inequalities we conclude that x_{1,4} − x_{1,2} > 2h = x + h + x̃ − x_{1,2}, hence that x + x̃ + h < x_{1,4}. Similarly, x + x̃ + h < x_{2,4}. By these arguments, for an arbitrary y ∈ [0, x̃] we have

f(x − h + y) = f(x − h) = α + β = f(x + h) = f(x + h + y).

For every z ∈ [0, x̃] it can be easily seen that z < 2h. It follows that the following equalities hold:

f^h(x + z) − f^h(x) = (1/(2h)) [ ∫_{x+z−h}^{x+z+h} f(t) dt − ∫_{x−h}^{x+h} f(t) dt ]
= (1/(2h)) [ ∫_{x+h}^{x+z+h} f(t) dt − ∫_{x−h}^{x+z−h} f(t) dt ]
= (1/(2h)) [ ∫_0^z f(x + h + τ) dτ − ∫_0^z f(x − h + τ) dτ ] = 0.

Obviously, the function f does not have more than two sign changes on the interval [x + x̃ − h, x + x̃ + h], which completes the proof.

Proof of Theorem 7. Set μ(f^h) = 2b, where b is a positive integer. Due to Lemmas 4.4 and 4.5, there exist points x_1 < x_2 < ⋯ < x_{2b} < x_1 + 2π such that sign f^h(x_j) = (−1)^j, j = 1, …, 2b, and the function f has at most one sign change on each of the intervals [x_j − h, x_j + h], j = 1, …, 2b. Clearly, for every j = 1, …, 2b there exists a non-empty interval Δ_j ⊂ [x_j − h, x_j + h] such that sign f(t) = (−1)^j on it. Let us denote by y_j, y_j* and y_j** the midpoint, the left and the right endpoints of the interval Δ_j, respectively. This implies x_j − h < y_j < x_j + h for every j = 1, …, 2b.
We shall show that the sequence {y_j}_{j=1}^{2b} increases and sign f(y_j) = (−1)^j, j = 1, …, 2b. The second proposition holds by the choice of the points y_j. Suppose there exists j_0 such that y_{j_0} > y_{j_0+1}. Without loss of generality we may take j_0 = 1. It can be easily seen that y_2 ∈ (x_2 − h, x_2 + h), and we conclude from the assumption and the inequality x_1 − h < x_2 − h that y_1 ∈ (x_2 − h, x_2 + h)
and y_2 ∈ (x_1 − h, x_1 + h). It is easy to verify that x_1 − h < x_2 − h ≤ y_2* < y_2** ≤ y_1* < y_1** ≤ x_1 + h < x_2 + h. Thus, f(t) ≥ 0 when t ∈ (x_1 − h, x_2 − h), since otherwise there would exist three points from the interval [x_1 − h, x_1 + h] with alternating sign. Similarly, f(t) ≤ 0 when t ∈ (x_1 + h, x_2 + h). Therefore,

0 < f^h(x_2) − f^h(x_1) = (1/(2h)) [ − ∫_{x_1−h}^{x_2−h} f(t) dt + ∫_{x_1+h}^{x_2+h} f(t) dt ] ≤ 0,
which is impossible. Thus, y_1 < y_2 < ⋯ < y_{2b} < y_1 + 2π and sign f(y_j) = (−1)^j, j = 1, …, 2b. This gives μ(f) ≥ 2b = μ(f^h).

5. On the existence of a spline from S_h(S_n^r(α, β)) with prescribed minima

This section is devoted to the proof of Theorem 5. This theorem can be proved in many ways; we shall use methods from the paper [16]. Let r, n = 1, 2, …, 0 < h < π/n and α, β > 0. Let Ñ_n^r denote the set of functions f which can be represented in the form

f = g^h + a,  g ∈ S_n^r(α, β), a ∈ ℝ,

and have exactly 2n extrema on a period. It can be easily seen that the Steklov function of every 2π/n-periodic function f ∈ S_n^r(α, β) belongs to the set Ñ_n^r. Hence, Ñ_n^r ≠ ∅.
Let f ∈ S_n^0(α, β), and let ζ_1 < ζ_2 < ⋯ < ζ_{2n} < ζ_1 + 2π be the nodes of the spline f such that f(t) ≡ α when t ∈ (ζ_1, ζ_2). Then, since f has zero mean value, the following equality holds:

Σ_{j=1}^{2n} (−1)^j ζ_j = 2πβ/(α + β).  (5.1)

Hence, every system of points ζ_1 < ζ_2 < ⋯ < ζ_{2n−1} < ζ_1 + 2π such that

2πβ/(α + β) − Σ_{j=1}^{2n−1} (−1)^j ζ_j < ζ_1 + 2π

uniquely determines some spline f ∈ S_n^0(α, β). Such a system of points we shall denote by ξ = {ζ_j}_{j=1}^{2n−1}, and we shall call it the determining system for the spline f. In addition, we shall denote by f_ξ the spline which corresponds to the system of points ξ. Let ξ be a given determining system for some spline. Then set

ζ_{2n} = 2πβ/(α + β) − Σ_{j=1}^{2n−1} (−1)^j ζ_j  and  ζ_{2n+1} = ζ_1 + 2π.
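Equality (5.1) can be recovered directly from the zero-mean condition; a short derivation, assuming the reconstructed notation in which f ≡ α on (ζ_1, ζ_2), (ζ_3, ζ_4), … and f ≡ −β on the complementary intervals:

```latex
\begin{aligned}
0=\int_0^{2\pi}f(t)\,dt
  &=\alpha\sum_{k=1}^{n}(\zeta_{2k}-\zeta_{2k-1})
   -\beta\Bigl(2\pi-\sum_{k=1}^{n}(\zeta_{2k}-\zeta_{2k-1})\Bigr),\\
\text{so}\quad
\sum_{j=1}^{2n}(-1)^{j}\zeta_j
  &=\sum_{k=1}^{n}(\zeta_{2k}-\zeta_{2k-1})
   =\frac{2\pi\beta}{\alpha+\beta}.
\end{aligned}
```

In particular, prescribing the first 2n − 1 nodes determines the last node uniquely, which is exactly the definition of ζ_{2n} above.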
Lemma 5.1. Let ξ = {ζ_j} and η = {η_j} be determining systems for the splines f_ξ and f_η, respectively. If the difference f_ξ(t) − f_η(t) changes sign exactly 2n times on [0, 2π) and

ζ_j < η_{j+1} < ζ_{j+2},  j = 1, …, 2n − 1,

then it is necessary that ζ_j ≠ η_j for every j = 1, …, 2n.

Proof. Let ζ_0 = ζ_{2n} − 2π and η_0 = η_{2n} − 2π. It is easy to verify that the difference f_ξ(t) − f_η(t) changes sign at most once on each of the intervals (ζ_{j−1}, ζ_{j+1}) and (η_{j−1}, η_{j+1}), j = 1, …, 2n. Assume to the contrary that there exists 1 ≤ j ≤ 2n such that ζ_j = η_j. There are two possible cases: ζ_{j−1} ≤ η_{j−1} and ζ_{j−1} ≥ η_{j−1}. We consider the first case. In this case f_ξ(t) − f_η(t) does not have sign changes on the interval (η_{j−1}, ζ_{j+1}), which contradicts the assumption μ(f_ξ − f_η) = 2n. The second case can be studied similarly.

We shall denote by U_δ(ξ) the closed ball with center ξ = (ζ_1, …, ζ_{2n−1}) and radius δ > 0 in the (2n − 1)-dimensional space ℝ^{2n−1} with the norm ‖ξ‖ := max_j |ζ_j|.
Lemma 5.2. Let ξ ∈ ℝ^{2n−1} be the determining system for the spline f_ξ ∈ S_n^0(α, β). Then there exists δ > 0 such that an arbitrary point η ∈ U_δ(ξ) is a determining system for some spline f_η ∈ S_n^0(α, β).

This lemma can be proved similarly to Lemma 3.2 in [16]. Let ξ be the determining system for the spline f_ξ ∈ S_n^0(α, β) such that f_ξ^{h,r} ∈ Ñ_n^r, where

f_ξ^{h,r}(t) = (I_r f_ξ^h)(t) = (D_r ∗ f_ξ^h)(t) := ∫_0^{2π} D_r(t − τ) f_ξ^h(τ) dτ.

Since I_r is a bounded operator, we may assume δ to be such that f_η^{h,r} ∈ Ñ_n^r for every η ∈ U_δ(ξ). Let us consider an arbitrary interval (a, a + 2π) containing n points x_1 < x_2 < ⋯ < x_n at which f_ξ^{h,r} attains its minima. We may choose δ > 0 such that for every η ∈ U_δ(ξ) the points y_1 < y_2 < ⋯ < y_n at which f_η^{h,r} attains its minima belong to the interval (a, a + 2π). For every point η ∈ U_δ(ξ) let

Φ(η) = {y_1, …, y_n, f_η^{h,r}(y_2) − f_η^{h,r}(y_1), …, f_η^{h,r}(y_n) − f_η^{h,r}(y_1)}.

Clearly, the mapping Φ from the ball U_δ(ξ) into ℝ^{2n−1} is continuous.

Lemma 5.3. There exists δ_1 < δ such that the restriction of the mapping Φ to the ball U_{δ_1}(ξ) is injective.

Proof. Let a ≤ t_1 ≤ t_2 ≤ ⋯ ≤ t_{2n} < a + 2π be the points at which f_ξ^{h,r} attains its local extrema.
Set m_j := f_ξ^{h,r}(t_j), j = 1, …, 2n. Let us denote by w_0 the smallest number satisfying the equality

ω(f_ξ^{h,r}; w_0) = (1/2) min_{j=1,…,2n} |m_{j+1} − m_j|,

where m_{2n+1} = m_1 and ω(g; t) is the modulus of continuity of the function g. Let Δ := (1/4) min_{j=1,…,2n} |ζ_{j+1} − ζ_j|. For every 0 < ε < (1/8) min_{j=1,…,2n} |m_{j+1} − m_j| let us choose δ_1 such that δ_1 < min{δ; w_0/2; Δ/2} and such that for an arbitrary η ∈ U_{δ_1}(ξ) the distance between the functions f_η^{h,r} and f_ξ^{h,r} in the L_∞-metric does not exceed ε. Due to the definition of the numbers ε and w_0, we have that the distance between neighboring points of local extremum of the function f_η^{h,r}, η ∈ U_{δ_1}(ξ), is greater than or equal to w_0.
Now suppose the assertion of the lemma is false. Then there exist two points η, θ ∈ U_{δ_1}(ξ), η ≠ θ, such that Φ(η) = Φ(θ). Let {y_j}_{j=1}^n and {z_j}_{j=1}^n be the points from the interval (a, a + 2π) at which f_η^{h,r} and f_θ^{h,r} attain their local minima, respectively. Hence, y_j = z_j and f_η^{h,r}(y_j) − f_η^{h,r}(y_1) = f_θ^{h,r}(z_j) − f_θ^{h,r}(z_1), j = 1, …, n.
Let u = η_1 − θ_1, and let us consider the function f_θ(t − u). We consider the case u > 0 in detail; the case u < 0 can be studied similarly. Clearly, {η_1, θ_2 + u, …, θ_{2n−1} + u} is a determining system for the spline f_θ(t − u). Let θ_{2n} be chosen such that

θ_{2n} = 2πβ/(α + β) − Σ_{j=1}^{2n−1} (−1)^j θ_j.

For every j = 1, …, 2n − 2 we have θ_j + u < θ_j + 2δ_1 ≤ η_{j+1} ≤ θ_{j+2} − 2δ_1 < θ_{j+2} + u, since |u| ≤ 2δ_1 < 2Δ. Let us apply Lemma 5.1 to the determining systems for the splines f_θ(t − u) and f_η(t). Since the first points of these systems are equal, Lemma 5.1 shows that the difference f_θ(t − u) − f_η(t) has at most 2n − 1 sign changes.
Define g_1(t) := f_θ^{h,r}(t − u) − f_θ^{h,r}(y_1), g_2(t) := f_η^{h,r}(t) − f_η^{h,r}(y_1). We shall show that the difference g_1(t) − g_2(t) has at least two sign changes on every interval [y_j, y_{j+1}], j = 1, …, n. Since |u| < 2δ_1 < w_0,

g_1(y_j) − g_2(y_j) > 0,  j = 1, …, n.

Furthermore,

g_1(y_j + u) − g_2(y_j + u) < 0,  j = 1, …, n.

Hence,

μ(g_1 − g_2) ≥ 2n.

At the same time (g_1 − g_2)'(t) = f_θ^{h,r−1}(t − u) − f_η^{h,r−1}(t) and μ(f_θ^{h,r−1}) = μ(f_η^{h,r−1}) = 2n. Therefore, applying Rolle's theorem and Theorem 7 we obtain

2n ≤ μ(g_1 − g_2) ≤ μ((g_1 − g_2)') = μ(f_θ^{h,r−1}(· − u) − f_η^{h,r−1}(·)) ≤ ⋯ ≤ μ(f_θ^h(· − u) − f_η^h(·)) ≤ μ(f_θ(· − u) − f_η(·)) ≤ 2n − 1,

which is impossible. This proves the lemma.
Since the mapping Φ is continuous, we derive from the last lemma that Φ is a homeomorphism from U_{δ_1}(ξ) into ℝ^{2n−1}.
Let us denote by E the set of points x = (x_1, x_2, …, x_{n−1}) ∈ ℝ^{n−1} such that 0 < x_1 < ⋯ < x_{n−1} < 2π. Obviously, E is a connected set. Let E_0^r ⊂ E be such that for every point x ∈ E_0^r there exists a function f_ξ^{h,r} ∈ Ñ_n^r with equal local minima at the points 0, x_1, …, x_{n−1}. The set E_0^r is non-empty, since (2π/n, 4π/n, …, 2(n − 1)π/n) ∈ E_0^r. In fact, for an arbitrary 2π/n-periodic function f_ξ^{h,r} ∈ Ñ_n^r we can choose a number b such that the function f_ξ^{h,r}(t + b) attains its minima at the points 2kπ/n, k = 0, …, n − 1.

Lemma 5.4. The set E_0^r is open in E.

Proof. For an arbitrary x ∈ E_0^r there exists a function f_ξ^{h,r} ∈ Ñ_n^r that attains equal local minima at the points 0, x_1, …, x_{n−1}. Due to Lemma 5.3, there exists a ball U_{δ_1}(ξ) such that the mapping Φ: U_{δ_1}(ξ) → ℝ^{2n−1} is a homeomorphism. By virtue of the theorem on invariance of the domain (see [1, p. 196]), this provides that Φ(ξ) = (0, x_1, …, x_{n−1}, 0, …, 0) is an interior point of Φ(U_{δ_1}(ξ)). Moreover, there exists a neighborhood of the point x such that for every point y ∈ E from this neighborhood (0, y_1, …, y_{n−1}, 0, …, 0) ∈ Φ(U_{δ_1}(ξ)). Thus, there exists η ∈ U_{δ_1}(ξ) such that Φ(η) = (0, y_1, …, y_{n−1}, 0, …, 0). This completes the proof.

Lemma 5.5. The set E_0^r is closed in E.

Proof. Let x ∈ E and let the sequence {x^m}_{m=1}^∞ ⊂ E_0^r converge to x as m → ∞. By the definition of the sequence {x^m}, for every point x^m there exists a spline f_{ξ_m} ∈ S_n^0(α, β) with a determining system ξ_m = {ζ_j^m}_{j=1}^{2n−1} such that

f_{ξ_m}^{h,r}(0) = f_{ξ_m}^{h,r}(x_j^m),  j = 1, …, n − 1,

and

f_{ξ_m}^{h,r−1}(0) = f_{ξ_m}^{h,r−1}(x_j^m) = 0,  j = 1, …, n − 1.

It can be easily seen that there exists a subsequence {ξ_{m_k}} which tends to some point ξ ∈ ℝ^{2n−1} as k → ∞. Clearly, ξ is the determining system for the spline f_ξ ∈ S_n^0(α, β). This implies that ‖f_{ξ_{m_k}} − f_ξ‖_1 → 0 as k → ∞. From this, the sequence {f_{ξ_{m_k}}^{h,b}} converges uniformly to f_ξ^{h,b} for an arbitrary integer 0 ≤ b ≤ r. Thus, we have

f_{ξ_{m_k}}^{h,r}(x_j^{m_k}) → f_ξ^{h,r}(x_j)  and  f_{ξ_{m_k}}^{h,r−1}(x_j^{m_k}) → f_ξ^{h,r−1}(x_j),

as k → ∞, for every j = 1, …, n − 1, and

f_{ξ_{m_k}}^{h,r}(0) → f_ξ^{h,r}(0)  and  f_{ξ_{m_k}}^{h,r−1}(0) → f_ξ^{h,r−1}(0),

as k → ∞. Hence,

f_ξ^{h,r}(x_j) = f_ξ^{h,r}(0),  j = 1, …, n − 1,
f_ξ^{h,r−1}(x_j) = f_ξ^{h,r−1}(0) = 0,  j = 1, …, n − 1,
and f_ξ^{h,r} attains its minima at the points 0, x_1, …, x_{n−1}. This proves the lemma.
To summarize, observe that E_0^r is a non-empty, open and closed subset of the connected set E. This gives E_0^r = E. Thus, the last remark proves Theorem 5.

6. Proof of Theorem 6

In this section we shall prove the following:

Theorem 13. Let n, r = 1, 2, …, 0 < h < π/n and α, β, p, q, ε > 0. Then for every function f ∈ S_n^r(α, β),

E_0(A_ε ∗ φ_{n,r;α,β}^h)_{1;p,q} ≤ E_0(A_ε ∗ f^h)_{1;p,q}.

We shall establish Theorem 6 by letting ε → 0.
Let n, r = 1, 2, …, α, β > 0 and 0 < h < π/n. Note that the nodes x_1 < x_2 < ⋯ < x_{2l} < x_1 + 2π, l ≤ n, of a spline g ∈ S_n^r(α, β) for which g^{(r)} attains the value α on the interval (x_1, x_2) satisfy

Σ_{j=1}^{2l} (−1)^j x_j = 2πβ/(α + β).
Fix α, β, p, q, ε, n, r and consider the extremal problem

E_0(A_ε ∗ g^h)_{1;p,q} → inf,  g ∈ S_n^r(α, β).  (6.1)

Since A_ε ∗ S_h(S_n^r(α, β)) := {A_ε ∗ s : s ∈ S_h(S_n^r(α, β))} is compact in the topology of uniform convergence and E_0(A_ε ∗ g^h)_{1;p,q} depends continuously on g ∈ S_n^r(α, β), a solution of problem (6.1) exists. Assume that the spline solving problem (6.1) has exactly 2l, l ≤ n, nodes. Due to Lemmas 3.1 and 3.2, the nodes x_1 < ⋯ < x_{2l} < x_1 + 2π of this spline are also solutions of the following problem:

((p + q)/2) ∫_0^{2π} | (α + β) Σ_{j=1}^{2l} (−1)^j ∫_0^{2π} D_1(x_j − t) [A_ε ∗ D_r]^h(x − t) dt − λ | dx − ((p − q)/2) · 2πλ → min,  (6.2)

under the constraint

Σ_{j=1}^{2l} (−1)^j x_j = 2πβ/(α + β),  λ ∈ ℝ.  (6.3)

Due to Lemma 3.3, we can apply the Lagrange multiplier method to study problem (6.2). This implies the following necessary conditions to be satisfied by the solutions x_1, …, x_{2l}, λ of this problem:

((p + q)/2) ∫_0^{2π} sign( (α + β) Σ_{j=1}^{2l} (−1)^j ∫_0^{2π} D_1(x_j − t) [A_ε ∗ D_r]^h(x − t) dt − λ ) dx − ((p − q)/2) · 2π = 0,  (6.4)

(−1)^k ((p + q)/2) (α + β) ∫_0^{2π} [A_ε ∗ D_r]^h(x_k − x) sign( (α + β) Σ_{j=1}^{2l} (−1)^j D_1(x_j − t) [A_ε ∗ D_r]^h(x − t) − λ ) dx = (−1)^{k+1} γ,  k = 1, …, 2l,  (6.5)

Σ_{j=1}^{2l} (−1)^j x_j = 2πβ/(α + β),  (6.6)

where γ is the Lagrange multiplier.
Let x_1 < x_2 < ⋯ < x_{2l} < x_1 + 2π be such that relation (6.6) holds. For a given number m = 0, 1, … set

f_m(x) = (α + β) Σ_{j=1}^{2l} (−1)^{j+m} D_{m+1}(x_j − x).

Using this notation we have

(α + β) Σ_{j=1}^{2l} (−1)^j ∫_0^{2π} D_1(x_j − t) [A_ε ∗ D_r]^h(x − t) dt = (A_ε ∗ f_r^h)(x).

Conditions (6.4)–(6.6) can be written as follows. If (x_1, …, x_{2l}, λ) is a solution of problem (6.2), then:
(1) f_r ∈ S_l^r(α, β) and A_ε ∗ f_r^h ∈ A_ε ∗ S_h(S_l^r(α, β)), so that A_ε ∗ f_r^h is a solution of problem (6.1).
(2) λ is the constant of the best (p, q)-approximation of A_ε ∗ f_r^h in the space L_1, and if

g_0(x) = p sign((A_ε ∗ f_r^h)(x) − λ)_+ − q sign((A_ε ∗ f_r^h)(x) − λ)_− = ((p + q)/2) sign((A_ε ∗ f_r^h)(x) − λ) + (p − q)/2,

then

sign g_0(x) = sign((A_ε ∗ f_r^h)(x) − λ)  (6.7)

and g_r(x) = (D_r ∗ g_0)(x) ∈ S_l^r(p, q), and consequently (A_ε ∗ g_r^h)(x) ∈ A_ε ∗ S_h(S_l^r(p, q)).
(3) A_ε ∗ g_r^h attains equal values at the points x_j (the nodes of f_0) and sign((A_ε ∗ g_r^h)(x) − (A_ε ∗ g_r^h)(x_1)) = ± sign f_0(x).

Note that condition (1) follows from relation (6.6). As for condition (2), the statement that λ is the constant of the best (p, q)-approximation of A_ε ∗ f_r^h in the space L_1 follows from condition (6.4) and Theorem 8. From the fact that μ(f_0) = 2l, Lemma 4.1, Rolle's theorem, property (3.1) and relation (6.7) we have g_r ∈ S_l^r(p, q). Finally, as for condition (3), the fact that (A_ε ∗ g_r^h)(x) attains equal values at the points x_j (the nodes of f_0) follows from condition (6.5). In addition, we can apply Lemma 4.1, Rolle's theorem and property (3.1) to verify that the difference (A_ε ∗ g_r^h)(x) − (A_ε ∗ g_r^h)(x_1) does not have zeros different from the x_j and that this difference changes its sign at the points x_j.
We shall now prove the following:

Theorem 14. Conditions (1)–(3) can be satisfied (up to a translation of the argument) only by the function (A_ε ∗ f_r^h)(x) = (A_ε ∗ φ_{l,r;α,β}^h)(x).

For a number y ∈ ℝ set

F_{y,0}(x) := f_0(x) − f_0(x + y),  F_{y,r}(x) := f_r(x) − f_r(x + y)

and

H_{y,0}(x) := g_0(x) − g_0(x + y),  H_{y,r}(x) := g_r(x) − g_r(x + y).

The function A_ε ∗ H_{y,r}^h has only isolated zeros. By ν(ψ) let us denote the number of zeros of the function ψ on a period, counted according to the following rule: the simple isolated zeros of ψ are counted once, while the multiple zeros are counted two times.

Lemma 6.1. For any y ∈ ℝ,

ν(A_ε ∗ H_{y,r}^h) ≥ μ(F_{y,0}).

Proof. In fact, if on a period there exist 2s points t_1 < t_2 < ⋯ < t_{2s} at which F_{y,0} has non-zero values with alternating sign, then, by condition (3), A_ε ∗ H_{y,r}^h also has non-zero values at these points with alternating sign. This completes the proof.

Lemma 6.2. Let r ≥ 2. Then for any y ∈ ℝ,

ν(A_ε ∗ H_{y,r}^h) ≤ μ(F_{y,0}).

Proof. Lemma 6.2 is an analogue of Lemma 5.5 from the paper of Babenko [5]. Let ν(A_ε ∗ H_{y,r}^h) = 2s. Then, by virtue of Rolle's theorem and our method of enumerating zeros on a period, there exist 2s different zeros of the function (A_ε ∗ H_{y,r}^h)'. However, between neighboring zeros of (A_ε ∗ H_{y,r}^h)', the function (A_ε ∗ H_{y,r}^h)'' alternates its sign at least once. Applying Rolle's theorem we obtain

μ(A_ε ∗ H_{y,0}^h) ≥ ⋯ ≥ μ(A_ε ∗ H_{y,r−2}^h) = μ((A_ε ∗ H_{y,r}^h)'') ≥ 2s.

From property (3.1) of the function A_ε(x) we conclude that

μ(H_{y,0}^h) ≥ 2s.

Let us ensure that the functions g_0(x) and g_0(x + y) satisfy the conditions of Theorem 7. To this end it suffices to verify that the function g_0^h has exactly 2l sign changes on a period. By condition (3), the function (A_ε ∗ g_r^h)(x) − (A_ε ∗ g_r^h)(x_1) changes its sign at the nodes of f_0. This implies that this function has exactly 2l sign changes. Hence, due to property (3.1), the difference g_r^h(x) − g_r^h(x_1) has at least 2l sign changes on a period. However, by Rolle's theorem,

μ(g_0^h) ≥ 2l.

Finally, by Lemma 4.1, the function g_0 has at least 2l sign changes. At the same time, due to condition (2), g_0 ∈ S_l^0(p, q). This provides that g_0, and consequently g_0^h, has exactly 2l sign changes on a period. Thus, the functions g_0(x) and g_0(x + y) satisfy the conditions of Theorem 7. Applying Theorem 7 we conclude that

μ(H_{y,0}) ≥ 2s.

As a consequence, there exist 2s points t_1, …, t_{2s} on a period such that H_{y,0} attains non-zero values at these points and alternates its sign when the argument passes from t_j to t_{j+1}. Because of (6.7), we have

μ(A_ε ∗ F_{y,r}^h) ≥ 2s.

Applying Rolle's theorem and property (3.1) yields

ν(A_ε ∗ H_{y,r}^h) = 2s ≤ μ(A_ε ∗ F_{y,r}^h) ≤ μ(A_ε ∗ F_{y,0}^h) ≤ μ(F_{y,0}^h).  (6.8)

The functions f_0(x) and f_0(x + y) satisfy the conditions of Theorem 7. In fact, we have already established that μ(g_0) = 2l. By relation (6.7),

μ((A_ε ∗ f_r^h)(·) − λ) = 2l.

Hence, applying property (3.1) and Rolle's theorem we obtain

μ(f_0^h) ≥ 2l.
However, μ(f_0^h) ≤ μ(f_0) (Lemma 4.1), and since μ(f_0) = 2l by the definition of f_0, we conclude that

μ(f_0^h) = 2l.

Finally, applying Theorem 7 yields

μ(F_{y,0}^h) ≤ μ(F_{y,0}).

Comparing (6.8) with the latter inequality we obtain

ν(A_ε ∗ H_{y,r}^h) ≤ μ(F_{y,0}).

Proof of Theorem 14. Due to Lemmas 6.1 and 6.2 we conclude that

ν(A_ε ∗ H_{y,r}^h) = μ(A_ε ∗ H_{y,r}^h) = μ(F_{y,0}),

as ν(A_ε ∗ H_{y,r}^h) ≥ μ(A_ε ∗ H_{y,r}^h) for any y ∈ ℝ for which A_ε ∗ H_{y,r}^h and F_{y,0} are not identically zero. Thus, every non-identically zero difference must have only isolated simple zeros. We shall show that it follows that the function A_ε ∗ g_r^h is 2πl^{−1}-periodic.
Let T be the minimal period of A_ε ∗ g_r^h and let a_1 be the point of the smallest local maximum of A_ε ∗ g_r^h. We prove that A_ε ∗ g_r^h has exactly two zeros on the interval [a_1, a_1 + T). Assume to the contrary that the function A_ε ∗ g_r^h has at least four zeros on the interval [a_1, a_1 + T). However, then there is at least one local maximum of A_ε ∗ g_r^h on this interval. Let a_2 be the point of the local maximum of A_ε ∗ g_r^h nearest to a_1 from the right, and a_3 the point of the local maximum of A_ε ∗ g_r^h nearest to a_1 + T from the left. Moreover, let b_1 be the point of the local minimum of A_ε ∗ g_r^h nearest to a_1 from the right, and b_2 the point of the local minimum of A_ε ∗ g_r^h nearest to a_1 + T from the left. We shall prove that there exists y ∈ (0, T) such that A_ε ∗ H_{y,r}^h has a multiple zero at some point on the period. This will show that A_ε ∗ g_r^h has a period y < T, i.e., we obtain a contradiction to the minimality of the period T.
If (A_ε ∗ g_r^h)(a_1) = (A_ε ∗ g_r^h)(a_2), then we can choose y = a_2 − a_1 < T. Hence, (A_ε ∗ H_{y,r}^h)(a_1) = (A_ε ∗ g_r^h)(a_1) − (A_ε ∗ g_r^h)(a_1 + a_2 − a_1) = 0 and (A_ε ∗ H_{y,r}^h)'(a_1) = 0. This provides that a_1 is a multiple zero of A_ε ∗ H_{y,r}^h. Now assume (A_ε ∗ g_r^h)(a_2) > (A_ε ∗ g_r^h)(a_1) and (A_ε ∗ g_r^h)(a_3) > (A_ε ∗ g_r^h)(a_1). Let us consider the values (A_ε ∗ g_r^h)(b_1) and (A_ε ∗ g_r^h)(b_2). If they are equal, then we can choose y = b_2 − b_1. Without loss of generality, we may assume (A_ε ∗ g_r^h)(b_2) > (A_ε ∗ g_r^h)(b_1). Hence, there exist c_1 ∈ (a_1, b_1) and c_2 ∈ (a_3, b_2) such that (A_ε ∗ g_r^h)(c_1) = (A_ε ∗ g_r^h)(b_2) and (A_ε ∗ g_r^h)(c_2) = (A_ε ∗ g_r^h)(a_1).
Let us show that there exist θ ∈ [a_1, c_1] and η ∈ [c_2, b_2] such that (A_ε ∗ g_r^h)(θ) = (A_ε ∗ g_r^h)(η) and (A_ε ∗ g_r^h)'(θ) = (A_ε ∗ g_r^h)'(η). It can be easily seen that A_ε ∗ g_r^h decreases on the intervals [a_1, c_1] and [c_2, b_2]. In addition, (A_ε ∗ g_r^h)(t) attains every value from the interval [(A_ε ∗ g_r^h)(b_2), (A_ε ∗ g_r^h)(a_1)] when t ∈ [a_1, c_1]. Similarly, (A_ε ∗ g_r^h)(t) attains every value from the interval [(A_ε ∗ g_r^h)(b_2), (A_ε ∗ g_r^h)(a_1)] when t ∈ [c_2, b_2]. Therefore, there exist functions ψ_1 = (A_ε ∗ g_r^h|_{[a_1,c_1]})^{−1} and ψ_2 = (A_ε ∗ g_r^h|_{[c_2,b_2]})^{−1}, defined on the interval [(A_ε ∗ g_r^h)(b_2), (A_ε ∗ g_r^h)(a_1)], which are continuously differentiable. Then lim_{x→x_0} ψ_1'(x) = ∞ when x_0 = (A_ε ∗ g_r^h)(a_1), and the limit is finite when x_0 = (A_ε ∗ g_r^h)(b_2). In addition, lim_{x→x_0} ψ_2'(x) = ∞ when x_0 = (A_ε ∗ g_r^h)(b_2), and the limit is finite when x_0 = (A_ε ∗ g_r^h)(a_1). Thus, there exists w ∈ [(A_ε ∗ g_r^h)(b_2), (A_ε ∗ g_r^h)(a_1)] such that ψ_1'(w) = ψ_2'(w). Hence, there exist θ ∈ (a_1, c_1) and η ∈ (c_2, b_2) such that (A_ε ∗ g_r^h)(θ) = (A_ε ∗ g_r^h)(η) = w and

(A_ε ∗ g_r^h)'(θ) = 1/ψ_1'(w) = 1/ψ_2'(w) = (A_ε ∗ g_r^h)'(η).
Then y = η − θ < T is a period of A_ε ∗ g_r^h, which is impossible. This implies that A_ε ∗ g_r^h has exactly two zeros on [a_1, a_1 + T). Since A_ε ∗ g_r^h has 2l zeros on [0, 2π), from the last note we have that T = 2πl^{−1}. As a consequence, A_ε ∗ g_r^h has period 2πl^{−1}. However, then both f_0 and A_ε ∗ f_r^h are 2πl^{−1}-periodic, so that A_ε ∗ f_r^h = A_ε ∗ φ_{l,r;α,β}^h up to a translation of the argument. Theorem 14 is proved.
To prove Theorem 13 it remains to show that

E_0(A_ε ∗ φ_{n,r;α,β}^h)_{1;p,q} < E_0(A_ε ∗ φ_{l,r;α,β}^h)_{1;p,q}  (6.9)

as soon as l < n. The proof falls naturally into four parts.

Lemma 6.3. Let l < n, α, β > 0 and r = 1, 2, …. Then for an arbitrary x ∈ [0, 2π),

min_t (A_ε ∗ φ_{l,r;α,β}^h)(t) < (A_ε ∗ φ_{n,r;α,β}^h)(x) < max_t (A_ε ∗ φ_{l,r;α,β}^h)(t).  (6.10)

Proof. We shall prove the second inequality of (6.10); the first one can be established similarly. From Lemma 3.4 we have that (A_ε ∗ φ_{n,r;α,β})(x) < max_t (A_ε ∗ φ_{l,r;α,β})(t) for an arbitrary x ∈ [0, 2π). Let y, z ∈ ℝ be such that

max_t (A_ε ∗ φ_{n,r;α,β}^h)(t) = (A_ε ∗ φ_{n,r;α,β}^h)(y)

and

max_t (A_ε ∗ φ_{l,r;α,β}^h)(t) = (A_ε ∗ φ_{l,r;α,β}^h)(z).

Let us consider the function f(t) = (A_ε ∗ φ_{l,r;α,β})(z + t) − (A_ε ∗ φ_{n,r;α,β})(y + t). It follows that f(−h) = f(h), and there exists a point τ ∈ [−h, h] such that f(τ) > 0. It can be easily seen that f does not have sign changes on [−h, h] when f(h) > 0. Then f(t) > 0 for every t ∈ [−h, h] and

f^h(0) = (A_ε ∗ φ_{l,r;α,β}^h)(z) − (A_ε ∗ φ_{n,r;α,β}^h)(y) = (1/(2h)) ∫_{−h}^{h} f(t) dt > 0.

Now we shall consider the case f(h) < 0. Let the point τ_0 ∈ (−h − 2π/n, −h) be such that (A_ε ∗ φ_{l,r;α,β})(z + τ_0) = (A_ε ∗ φ_{l,r;α,β})(z + τ_0 + 2π/n). Then f has exactly two sign changes on the interval [τ_0, τ_0 + 2π/n]. Therefore,

f^h(0) = (A_ε ∗ φ_{l,r;α,β}^h)(z) − (A_ε ∗ φ_{n,r;α,β}^h)(y) = (1/(2h)) ∫_{−h}^{h} f(t) dt ≥ (1/(2h)) ∫_{τ_0}^{τ_0+2π/n} f(t) dt > 0,

which can be easily verified. This completes the proof.

Lemma 6.4. Let ξ, η ∈ ℝ be such that (A_ε ∗ φ_{n,r;α,β}^h)(ξ) = (A_ε ∗ φ_{l,r;α,β}^h)(η). Then |(A_ε ∗ φ_{n,r−1;α,β}^h)(ξ)| ≤ |(A_ε ∗ φ_{l,r−1;α,β}^h)(η)| as soon as (A_ε ∗ φ_{n,r−1;α,β}^h)(ξ) · (A_ε ∗ φ_{l,r−1;α,β}^h)(η) > 0.
Proof. Let x_1 < x_2 < ⋯ < x_{2l} < x_1 + 2π be the points of extrema of the function A_ε ∗ φ_{l,r;α,β}^h. Assume to the contrary that there exist points ξ, η ∈ ℝ such that (A_ε ∗ φ_{n,r;α,β}^h)(ξ) = (A_ε ∗ φ_{l,r;α,β}^h)(η) and |(A_ε ∗ φ_{n,r−1;α,β}^h)(ξ)| > |(A_ε ∗ φ_{l,r−1;α,β}^h)(η)|, although (A_ε ∗ φ_{n,r−1;α,β}^h)(ξ) · (A_ε ∗ φ_{l,r−1;α,β}^h)(η) > 0. Applying Theorem 7 we obtain that the function f(t) = (A_ε ∗ φ_{l,r;α,β}^h)(t) − (A_ε ∗ φ_{n,r;α,β}^h)(t + ξ − η) has exactly one zero on every interval [x_j, x_{j+1}), j = 1, …, 2l, x_{2l+1} = x_1 + 2π. Without loss of generality we may assume that (A_ε ∗ φ_{n,r−1;α,β}^h)(ξ) > 0. This implies f(η) = 0 and f'(η) < 0. Let η ∈ [x_j, x_{j+1}). Thus, there exists at least one more zero of f either on the interval (η, x_{j+1}) or on the interval [x_j, η), which is impossible.

Lemma 6.5. Let l < n. Then

(A_ε ∗ φ_{n,r;α,β}^h)_± ≺ (A_ε ∗ φ_{l,r;α,β}^h)_±.  (6.11)

Proof. Let us consider the rearrangements of the functions (A_ε ∗ φ_{l,r;α,β}^h)(t) − λ and (A_ε ∗ φ_{n,r;α,β}^h)(t) − λ for an arbitrary λ ∈ ℝ. Applying Lemma 6.3 yields

π(A_ε ∗ φ_{n,r;α,β}^h − λ, 0) < π(A_ε ∗ φ_{l,r;α,β}^h − λ, 0)

and

π(A_ε ∗ φ_{n,r;α,β}^h − λ, 2π) > π(A_ε ∗ φ_{l,r;α,β}^h − λ, 2π).

Obviously,

∫_0^{2π} π(A_ε ∗ φ_{n,r;α,β}^h − λ, t) dt = ∫_0^{2π} π(A_ε ∗ φ_{l,r;α,β}^h − λ, t) dt = −2πλ.  (6.12)

It follows that π(A_ε ∗ φ_{n,r;α,β}^h − λ, t) and π(A_ε ∗ φ_{l,r;α,β}^h − λ, t) intersect at least at one point of [0, 2π). We shall prove that there exists exactly one point of intersection of these functions. Assume to the contrary that there exist two points of intersection. Hence, there exist points x_n and x_l such that π(A_ε ∗ φ_{n,r;α,β}^h − λ, x_n) = π(A_ε ∗ φ_{l,r;α,β}^h − λ, x_l) = z and π'(A_ε ∗ φ_{n,r;α,β}^h − λ, x_n) < π'(A_ε ∗ φ_{l,r;α,β}^h − λ, x_l). Let points x_n' < x_n'' and x_l' < x_l'' from [0, 2π) be such that

(A_ε ∗ φ_{n,r;α,β}^h)(x_n') − λ = (A_ε ∗ φ_{n,r;α,β}^h)(x_n'') − λ = (A_ε ∗ φ_{l,r;α,β}^h)(x_l') − λ = (A_ε ∗ φ_{l,r;α,β}^h)(x_l'') − λ = z

and (A_ε ∗ φ_{n,r;α,β}^h)(x) − λ > z for every x ∈ (x_n', x_n''), as well as (A_ε ∗ φ_{l,r;α,β}^h)(x) − λ > z for every x ∈ (x_l', x_l''), since the equality (A_ε ∗ φ_{n,r;α,β}^h)(x) = c, c ∈ (min_u (A_ε ∗ φ_{n,r;α,β}^h)(u), max_u (A_ε ∗ φ_{n,r;α,β}^h)(u)), always has exactly 2n solutions on the period. Thus,

π'(A_ε ∗ φ_{n,r;α,β}^h − λ, x_n) = (1/n) · 1 / ( 1/(A_ε ∗ φ_{n,r−1;α,β}^h)(x_n'') − 1/(A_ε ∗ φ_{n,r−1;α,β}^h)(x_n') )

and

π'(A_ε ∗ φ_{l,r;α,β}^h − λ, x_l) = (1/l) · 1 / ( 1/(A_ε ∗ φ_{l,r−1;α,β}^h)(x_l'') − 1/(A_ε ∗ φ_{l,r−1;α,β}^h)(x_l') ).

Applying Lemma 6.4 we obtain

(A_ε ∗ φ_{n,r−1;α,β}^h)(x_n') < (A_ε ∗ φ_{l,r−1;α,β}^h)(x_l')

and

(A_ε ∗ φ_{n,r−1;α,β}^h)(x_n'') > (A_ε ∗ φ_{l,r−1;α,β}^h)(x_l'').

This provides

π'(A_ε ∗ φ_{l,r;α,β}^h − λ, x_l) = (1/l) · 1 / ( 1/(A_ε ∗ φ_{l,r−1;α,β}^h)(x_l'') − 1/(A_ε ∗ φ_{l,r−1;α,β}^h)(x_l') )
≤ (1/l) · 1 / ( 1/(A_ε ∗ φ_{n,r−1;α,β}^h)(x_n'') − 1/(A_ε ∗ φ_{n,r−1;α,β}^h)(x_n') )
≤ π'(A_ε ∗ φ_{n,r;α,β}^h − λ, x_n),

which is impossible. Therefore, for every x ∈ [0, 2π)

∫_0^x π(A_ε ∗ φ_{n,r;α,β}^h − λ, t) dt ≤ ∫_0^x π(A_ε ∗ φ_{l,r;α,β}^h − λ, t) dt

for arbitrary λ ∈ ℝ. Due to (6.12), it follows immediately that inequality (6.11) holds, which is the desired conclusion.

Relation (6.9) easily follows from Lemma 6.5. In fact, taking x = 2π and λ to be the constant of the best (p, q)-approximation of the function A_ε ∗ φ_{l,r;α,β}^h in the space L_1, we can assert that

E_0(A_ε ∗ φ_{n,r;α,β}^h)_{1;p,q} ≤ ∫_0^{2π} [ p((A_ε ∗ φ_{n,r;α,β}^h)(t) − λ)_+ + q((A_ε ∗ φ_{n,r;α,β}^h)(t) − λ)_− ] dt
= ∫_0^{2π} p π(((A_ε ∗ φ_{n,r;α,β}^h) − λ)_+, t) dt + ∫_0^{2π} q π(((A_ε ∗ φ_{n,r;α,β}^h) − λ)_−, t) dt
≤ ∫_0^{2π} p π(((A_ε ∗ φ_{l,r;α,β}^h) − λ)_+, t) dt + ∫_0^{2π} q π(((A_ε ∗ φ_{l,r;α,β}^h) − λ)_−, t) dt
= E_0(A_ε ∗ φ_{l,r;α,β}^h)_{1;p,q}.
Thus, inequality (6.9) holds, which proves Theorem 13. Letting ε → 0, we obtain that Theorem 6 holds.

7. Optimal interval quadrature formula on the classes W^r F (Proof of Theorems 1–4)

Let n, r = 1, 2, …, 0 < h < π/n and α, β > 0. Let x_1 < x_2 < ⋯ < x_n < x_1 + 2π. Due to Theorem 5, there exists a spline f_{±,x̄;α,β}^h ∈ S_h(S_n^r(α, β)) that attains equal minimal values at the points {x_j}_{j=1}^n. Then,

inf_{a_j} sup_{f ∈ W^r_{∞;α^{−1},β^{−1}}} [ ± ∫_0^{2π} f^h(t) dt ∓ Σ_{j=1}^n a_j f^h(x_j) ] ≥ ∫_0^{2π} [ ±f_{±,x̄;α,β}^h(t) − min_u (±f_{±,x̄;α,β}^h(u)) ] dt.  (7.1)

For the formula λ_n^i with equidistant nodes we have

R^±(W^r_{∞;α^{−1},β^{−1}}, λ_n^i) = R^±(S_h(W^r_{∞;α^{−1},β^{−1}}), λ_n) = ∫_0^{2π} [ ±φ_{n,r;α,β}^h(t) − min_u (±φ_{n,r;α,β}^h(u)) ] dt.  (7.2)

In fact, due to (7.1), it suffices to prove that the left-hand side does not exceed the right-hand side. Let λ be the constant of the best (α, β)-approximation of S_h(m_{n,r}). Restricting our consideration to R^+(W^r_{∞;α^{−1},β^{−1}}, λ_n^i) and taking into account (1.4) and Theorem 9, we have

R^+(W^r_{∞;α^{−1},β^{−1}}, λ_n^i) = R^+(S_h(W^r_{∞;α^{−1},β^{−1}}), λ_n) = E_0(S_h(m_{n,r}))_{1;α,β}
= −(2π/n) Σ_{j=1}^n ∫_0^{2π} D_r^h(2πj/n − x) [ sign(S_h(m_{n,r}) − λ)_+ − sign(S_h(m_{n,r}) − λ)_− ] dx
= −2π · min_t φ_{n,r;α,β}^h(t) = ∫_0^{2π} [ φ_{n,r;α,β}^h(x) − min_t φ_{n,r;α,β}^h(t) ] dx.

Finally, note that from Theorem 6 the equality

inf_{g ∈ S_n^r(α,β)} E_0^±(g^h)_1 = E_0^±(φ_{n,r;α,β}^h)_1  (7.3)

easily follows. Comparing relations (7.1)–(7.3), we conclude that Theorem 4 holds.
Now we are ready to prove Theorem 2. In view of Theorem 10, it suffices to prove Theorem 3, i.e., that for all α, β > 0 and for any monospline m_i^h we have

E_0(S_h(m_{n,r}))_{1;α,β} ≤ E_0(m_i^h)_{1;α,β}.  (7.4)
However, by the duality Theorem 9 and the representation (1.4) for R(f^h, λ), we see that if the monospline m_i^h corresponds to the quadrature formula λ_i ∈ K_n^i(h), then

E_0(m_i^h)_{1;α,β} = R^+(W^r_{∞;α^{−1},β^{−1}}; λ_i).

From this and from Theorem 4 (since S_h(m_{n,r}) corresponds to the formula λ_n^i), inequality (7.4) follows, and Theorems 3 and 4 are proved.
Now we shall prove Theorem 1. We obtain from relation (1.4) and Theorems 2, 10 and 11 that

R^±(W^r F, λ_i) = R^±(S_h(W^r F), λ) = sup{ ∫_0^{2π} (±f(t)) S_h(m)(t) dt : f ∈ F, f ⊥ 1 }
= sup{ sup_{g: π(g)=π(f)} ∫_0^{2π} (±g(t)) S_h(m)(t) dt : f ∈ F, f ⊥ 1 }
= sup{ ∫_0^{2π} π(±f, t) π(S_h(m), t) dt : f ∈ F, f ⊥ 1 }
≥ sup{ ∫_0^{2π} π(±f, t) π(S_h(m_{n,r}), t) dt : f ∈ F, f ⊥ 1 }
= R^±(S_h(W^r F), λ_n) = R^±(W^r F, λ_n^i).

Thus, Theorem 1 is proved.

References

[1] P.S. Aleksandrov, Combinatorial Topology, OGIZ, Moscow, 1947 (in Russian); English translation: Combinatorial Topology, vol. 1, Graylock Press, Albany, NY, 1956.
[2] V.F. Babenko, Nonsymmetric approximations in the spaces of summable functions, Ukrainian Math. J. 34 (1982) 409–416 (in Russian).
[3] V.F. Babenko, Inequalities for rearrangements of differentiable periodic functions, problems of approximation and integrating, Dokl. USSR 272 (1983) 1038–1041 (in Russian).
[4] V.F. Babenko, On a certain problem of optimization of the approximate integration, in: Studies on Modern Problems of Summation and Approximation of Functions and their Applications, Dnepropetrovsk University, Dnepropetrovsk, 1984, pp. 3–13 (in Russian).
[5] V.F. Babenko, Approximations, widths and optimal quadrature formulae for classes of periodic functions with rearrangement invariant sets of derivatives, Anal. Math. 13 (1987) 15–28.
[6] V.F. Babenko, Widths and optimal quadrature formulae for convolution classes, Ukrainian Math. J. 43 (1991) 1135–1148.
[7] S.V. Borodachov, On optimization of interval quadrature formulae on some nonsymmetric classes of periodic functions, Bull. Dnepropetrovsk Univ. Math. 4 (1999) 19–24 (in Russian).
[8] S.V. Borodachov, On optimization of interval quadrature formulae on some classes of absolutely continuous functions, Bull. Dnepropetrovsk Univ. Math. 5 (2000) 28–34 (in Russian).
[9] N.P. Korneichuk, Extremal Problems of Approximation Theory, Nauka, Moscow, 1976 (in Russian).
[10] N.P. Korneichuk, A.A. Ligun, V.G. Doronin, Approximation with Constraints, Naukova Dumka, Kiev, 1982 (in Russian).
[11] M.A. Krasnosel'skii, Ya.B. Rutickii, Convex Functions and Orlicz Spaces, Fizmatgiz, Moscow, 1958 (in Russian).
[12] S.G. Krein, Yu.I. Petunin, E.M. Semenov, Interpolation of Linear Operators, Nauka, Moscow, 1978 (in Russian).
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
Journal of Complexity 23 (2007) 918 – 925 www.elsevier.com/locate/jco
Deterministic constructions of compressed sensing matrices☆

Ronald A. DeVore

Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA

Received 8 January 2007; accepted 16 April 2007

With high esteem to Professor Henryk Woźniakowski on the occasion of his 60th birthday

Available online 4 May 2007
Abstract Compressed sensing is a new area of signal processing. Its goal is to minimize the number of samples that need to be taken from a signal for faithful reconstruction. The performance of compressed sensing on signal classes is directly related to Gelfand widths. Similar to the deeper constructions of optimal subspaces in Gelfand widths, most sampling algorithms are based on randomization. However, for possible circuit implementation, it is important to understand what can be done with purely deterministic sampling. In this note, we show how to construct sampling matrices using finite fields. One such construction gives cyclic matrices which are interesting for circuit implementation. While the guaranteed performance of these deterministic constructions is not comparable to the random constructions, these matrices have the best known performance for purely deterministic constructions. © 2007 Elsevier Inc. All rights reserved. Keywords: Compressed sensing; Sampling; Widths; Deterministic construction
1. Introduction

Compressed sensing (CS) offers an alternative to the classical Shannon theory for sampling signals. The Shannon theory models signals as bandlimited and encodes them through their time samples. The Shannon approach is problematic for broadband signals since the high sampling rates cannot be implemented in circuitry. In CS one replaces the bandlimited model of signals by the assumption that the signal is sparse or compressible with respect to some basis or dictionary of waveforms, and enlarges the concept of a sample to include the application of any linear functional.
☆ This research was conducted while the author was the visiting Texas Instruments Professor at Rice University. E-mail address:
[email protected].
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.04.002
R.A. DeVore / Journal of Complexity 23 (2007) 918 – 925
919
Much of the methodology of CS traces back to early work on Gelfand widths and information based complexity (IBC); see [6,5,4] for a discussion of these connections. This paper will be concerned with the discrete CS problem where we are given a discrete signal which is a vector x ∈ R^N with N large and we wish to capture x by linear information. This means that we are allowed to sample x by inner products v · x of x with vectors v. We are interested in seeing how well we can do given a budget n < N in the number of samples we are allowed to take. This should be contrasted to the usual paradigm in compression, where one represents the signal with respect to some basis, computes all of its coefficients, but then retains only a small number (in our case n) of the largest of these coefficients to obtain compression. Here we want to see if we can avoid computing all of these coefficients and merely take a compressed number of samples to begin with.

If we choose n sampling vectors then our sampling can be represented by an n × N matrix Φ (called a CS matrix) whose rows are the vectors v that have been chosen for the sampling. Thus, the information we extract from x through Φ is the vector y = Φx, which lies in the lower dimensional space R^n. The question becomes: What are good sampling matrices Φ? To give this question a precise formulation, we need to specify several ingredients. First, what will we allow as decoders Δ of y; that is, how will we recover x or an approximation x̄ to x from y. Here we will be very general and consider any mapping Δ from R^n → R^N as a potential decoder. The mapping Δ will generally be nonlinear, in contrast to Φ, which is assumed to be linear. The problem of having practical, numerically implementable decoders is an important one and to a large extent separates CS from the earlier work on widths and IBC. However, this will not be the concern of this paper.
Given that the dimensions n, N of our problem are fixed, we let A_{n,N} denote the set of all encoding–decoding pairs (Φ, Δ) where Φ is an n × N matrix and Δ maps R^n → R^N. A second ingredient is how we shall measure distortion. The vector x̄ := Δ(Φx) will in general not be the same as x. We can measure the distortion x − x̄ in any norm on R^N. The typical choices are the ℓ_p^N norms:

‖x‖_{ℓ_p^N} := ( Σ_{j=1}^N |x_j|^p )^{1/p} for 0 < p < ∞, and ‖x‖_{ℓ_∞^N} := max_{j=1,…,N} |x_j|.  (1.1)

There are several ways in which we can measure the performance of a CS matrix (see [4]). In this paper, we shall restrict our attention to only one method, which relates to Gelfand widths. Given a vector x ∈ R^N, the performance of the encoding–decoding pair (Φ, Δ) in the metric of ℓ_p^N is given by

E(x, Φ, Δ)_{ℓ_p^N} := ‖x − Δ(Φx)‖_{ℓ_p^N}.  (1.2)

Rather than measure the performance on each individual x, we shall measure performance on a class K. If K is a bounded set contained in R^N, the error of this encoding–decoding on K is given by

E(K, Φ, Δ)_{ℓ_p^N} := sup_{x ∈ K} E(x, Φ, Δ)_{ℓ_p^N}.  (1.3)

Thus, the error on the class K is determined by the largest error on K. The best possible performance of an encoder–decoder pair is given by

E_{n,N}(K)_{ℓ_p^N} := inf_{(Φ,Δ) ∈ A_{n,N}} E(K, Φ, Δ)_{ℓ_p^N}.  (1.4)
We say that an encoder–decoder pair (Φ, Δ) ∈ A_{n,N} is near optimal on K with constant M if

E(K, Φ, Δ)_{ℓ_p^N} ≤ M E_{n,N}(K)_{ℓ_p^N}.  (1.5)

If M = 1 we say the pair is optimal. This is the so-called min–max way of measuring optimality prevalent in approximation theory, information based complexity, and statistics.

Given a set K, the optimal performance E_{n,N}(K)_{ℓ_p^N} of CS is directly connected with the Gelfand widths of the set K. If K is a compact set in ℓ_p^N and n is a positive integer, then the Gelfand width of K is by definition

d^n(K)_{ℓ_p^N} := inf_Y sup{‖x‖_{ℓ_p^N} : x ∈ K ∩ Y},  (1.6)

where the infimum is taken over all subspaces Y of ℓ_p^N with codimension n. If K = −K and K + K ⊂ C_0 K for some constant C_0, then

d^n(K)_{ℓ_p^N} ≤ E_{n,N}(K)_{ℓ_p^N} ≤ C_0 d^n(K)_{ℓ_p^N},  1 ≤ n ≤ N.  (1.7)
In other words, finding the best performance of encoding–decoding on K is equivalent to finding its Gelfand width. The relation between these two problems is the following. If (Φ, Δ) is an encoding–decoding pair for CS on K, then the null space Y of Φ is a space of codimension n which is a candidate for Gelfand widths. Conversely, given any space Y for Gelfand widths, any basis for its orthogonal complement gives a CS matrix Φ for CS on K. Using these correspondences, one easily proves (1.7) (see [4]).

The Gelfand widths of the unit balls K = U(ℓ_q^N) in ℓ_p^N are known up to multiplicative constants. We highlight only one of these results, for the Gelfand width of U(ℓ_1^N) in ℓ_2^N, which is the deepest result in this field. It states that there exist absolute constants C_1, C_2 such that

C_1 √(log(N/n)/n) ≤ d^n(U(ℓ_1^N))_{ℓ_2^N} ≤ C_2 √(log(N/n)/n).  (1.8)

The upper estimate in (1.8) was proved by Kashin [8], save for the correct power of the logarithm. Later Garnaev and Gluskin proved the upper and lower bounds in (1.8) (see [7]). The upper bound is proved via random constructions, and there remains to this date no deterministic proof of the upper bound in (1.8). In CS, their constructions correspond to random matrices whose entries are independent realizations of a Gaussian or Bernoulli random variable.

Our interest in this paper centers around deterministic constructions of matrices Φ for CS. We ask how close we can get to the Gelfand width of classes with such constructions. We shall give constructions of matrices Φ using finite fields which are related to the use of finite fields to prove results on Kolmogorov widths as given in [2]. A related construction using number theory was given by Maiorov [11] (see also [10] for another deterministic construction). Our constructions will not give optimal or near optimal performance, as will be explained later. However, their performance is the best known to the author for deterministic constructions.
We shall also consider modifications of this construction so that the resulting matrices Φ are circulant (each row of Φ is a certain shift of the previous row, with wrapping). The importance of circulant matrices is that they can be more readily implemented in circuits.

An outline of our paper is the following. In the next section, we discuss the restricted isometry property (RIP) introduced by Candès and Tao [3] and how this property guarantees upper bounds for the performance of CS matrices on classes. The following section gives our construction of CS matrices and the proof that they satisfy a RIP. The final section gives some concluding remarks.
2. Some simple results about CS matrices

How can we decide if a given matrix Φ is good for CS? Candès and Tao [3] have introduced a condition on matrices which they call the restricted isometry property, and they show that whenever a matrix satisfies this property, we can obtain estimates for its performance on sets K = U(ℓ_q^N). For the remainder of this paper, ‖·‖ will always denote an ℓ_2 norm. All other norms will be subscripted.

If k ≥ 1 is an integer, we denote by Σ_k the set of all vectors x ∈ R^N such that at most k of the coordinates of x are nonzero. In other words, Σ_k is the union of all the k-dimensional spaces X_T, #(T) = k, where T ⊂ {1, …, N} and X_T is the linear space of all x ∈ R^N which vanish outside of T. Given any vector x ∈ R^N, we define

σ_k(x)_{ℓ_p^N} := inf_{z ∈ Σ_k} ‖x − z‖_{ℓ_p^N},  (2.1)

which is the error of k-term approximation to x in ℓ_p^N. Following Candès and Tao, we say that Φ has the RIP of order k and constant δ ∈ (0, 1) if

(1 − δ)‖x‖² ≤ ‖Φx‖² ≤ (1 + δ)‖x‖²,  x ∈ Σ_k.  (2.2)
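The best k-term approximation error σ_k(x) defined in (2.1) is easy to compute: for an ℓ_p norm with 0 < p < ∞, the infimum over Σ_k is attained by keeping the k largest-magnitude coordinates of x and zeroing the rest. A minimal illustration in plain Python (written for this note, not code from the paper):

```python
def sigma_k(x, k, p=2.0):
    """Best k-term approximation error of x in the l_p norm (0 < p < inf).
    The optimal z in Sigma_k keeps the k largest |x_j|, so the error is the
    l_p norm of the remaining smallest-magnitude coordinates."""
    tail = sorted((abs(v) for v in x), reverse=True)[k:]
    return sum(t ** p for t in tail) ** (1.0 / p)

x = [5.0, -0.5, 3.0, 0.25, -2.0]
print(sigma_k(x, 2, p=1.0))  # keeps 5 and 3; error |-2| + |-0.5| + |0.25| = 2.75
```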
Notice that Φx ∈ R^n, so that ‖Φx‖ is the ℓ_2^n norm. To get a better understanding of this property, consider the n × #(T) matrices Φ_T formed by the columns of Φ with indices from T. Then (2.2) is equivalent to showing that the Grammian matrices

A_T := Φ_T^t Φ_T,  #(T) = k,  (2.3)

are bounded and boundedly invertible on ℓ_2 with bounds as in (2.2), uniformly for all T such that #(T) = k. The matrix A_T is symmetric and nonnegative definite, so this is equivalent to each of these matrices having all of its eigenvalues in [1 − δ, 1 + δ]. The importance of the RIP is seen from the following theorem of Candès and Tao [3] (reinterpreted in [4]). If the n × N matrix Φ satisfies the RIP of order 3k with constant δ ∈ (0, 1), then there is a decoder Δ such that for any vector x ∈ R^N, we have

‖x − Δ(Φx)‖_{ℓ_2^N} ≤ C σ_k(x)_{ℓ_1^N} / √k.  (2.4)

This means that the bigger the value of k for which we can verify the RIP, the better the guarantee we have on the performance of Φ. As an example, let us return to the case of the set K = U(ℓ_1^N). If an n × N matrix Φ has the RIP of order k, then (2.4) shows that

d^n(U(ℓ_1^N))_{ℓ_2^N} ≤ E_{n,N}(U(ℓ_1^N))_{ℓ_2^N} ≤ C/√k.  (2.5)
To get the optimal result we want Φ to satisfy the RIP of order k = n/log(N/n). Matrices of this type can be constructed using random variables such as Gaussian or Bernoulli as their entries (see [1] for example). However, there are no deterministic constructions for k of this size. In the next section, we shall give a deterministic construction of matrices which satisfy the RIP for a more modest range of k.
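Checking the RIP exactly means examining the eigenvalues of every A_T with #(T) = k, which is combinatorially expensive. A cheap sufficient bound, in the same spirit as the diagonal-dominance argument of the next section, follows from the Gershgorin circle theorem: for unit-norm columns whose pairwise inner products have magnitude at most μ, every eigenvalue of every A_T lies in [1 − (k − 1)μ, 1 + (k − 1)μ], so the RIP of order k holds with δ = (k − 1)μ. A sketch in pure Python (the example matrix is an arbitrary toy, not one of the paper's constructions):

```python
from itertools import combinations

def coherence(columns):
    """Max absolute inner product between distinct (unit-norm) columns."""
    return max(abs(sum(a * b for a, b in zip(u, v)))
               for u, v in combinations(columns, 2))

def gershgorin_rip_bound(columns, k):
    """For unit-norm columns, A_T has unit diagonal and off-diagonal entries
    bounded by the coherence mu; by Gershgorin every eigenvalue lies in
    [1 - (k-1)*mu, 1 + (k-1)*mu], hence delta_k <= (k-1)*mu."""
    return (k - 1) * coherence(columns)

# toy 2 x 3 matrix with unit-norm columns
s = 2 ** -0.5
cols = [(1.0, 0.0), (0.0, 1.0), (s, s)]
print(gershgorin_rip_bound(cols, 2))  # equals the coherence, 1/sqrt(2)
```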
3. Deterministic constructions of CS matrices

We shall give a deterministic construction of matrices which satisfy the RIP. The vehicle for this construction is finite fields F. For simplicity of exposition, we shall consider only the case that F has prime order and hence is the field of integers modulo p. The results we prove can be established for other finite fields as well. Given F, we consider the set F × F of ordered pairs. Note that this set has n := p² elements. Given any integer 0 < r < p, we let P_r denote the set of polynomials of degree at most r on F. There are N := p^{r+1} such polynomials. Any polynomial Q ∈ P_r can be represented as Q(x) = a_0 + a_1 x + ⋯ + a_r x^r, where the coefficients a_0, …, a_r are in F. If we consider this polynomial as a mapping of F to F, then its graph G(Q) is the set of ordered pairs (x, Q(x)), x ∈ F. This graph is a subset of F × F. We order the elements of F × F lexicographically as (0, 0), (0, 1), …, (p − 1, p − 1). For any Q ∈ P_r, we denote by v_Q the vector indexed on F × F which takes the value one at any ordered pair from the graph of Q and the value zero otherwise. Note that there are exactly p ones in v_Q: one in the first p entries, one in the next p entries, and so on.

Theorem 3.1. Let Φ_0 be the n × N matrix with columns v_Q, Q ∈ P_r, with these columns ordered lexicographically with respect to the coefficients of the polynomials. Then the matrix Φ := (1/√p) Φ_0 satisfies the RIP with δ = (k − 1)r/p for any k < p/r + 1.

Proof. Let T be any subset of column indices with #(T) = k and let Φ_T be the matrix created from Φ by selecting these columns. The Grammian matrix A_T := Φ_T^t Φ_T has entries (1/p) v_Q · v_R with Q, R ∈ P_r. The diagonal entries of A_T are all one. For any Q, R ∈ P_r with Q ≠ R, there are at most r values of x ∈ F such that Q(x) = R(x). So any off-diagonal entry of A_T is at most r/p. It follows that the off-diagonal entries in any row or column of A_T have sum at most (k − 1)r/p = δ < 1 whenever k < p/r + 1.
Hence we can write

A_T = I + B_T,  (3.1)

where ‖B_T‖ ≤ δ, the norm being taken on either ℓ_1 or ℓ_∞. By interpolation of operators, the norm of B_T is at most δ as an operator from ℓ_2 to ℓ_2. It follows that the spectral norm of A_T is at most 1 + δ and that of its inverse is at most (1 − δ)^{−1}. This verifies (2.2) and proves the theorem.

Notice that since n = p² and N = p^{r+1}, log(N/n) = (r − 1) log p = (r − 1) log(n)/2, so we have constructed matrices that satisfy the RIP for the range k − 1 < p/r < √n · log n/(2 log(N/n)).

Our next goal is to modify the above construction to obtain circulant matrices Φ = (φ_{i,j}). A circulant matrix has the property that

φ_{i+1, j+λ} = φ_{i,j},  (3.2)

where λ := N/n and the arithmetic on indices is done modulo N. Hence a circulant matrix is determined by its first λ columns. Once these columns have been specified, all other entries are determined by imposing condition (3.2). Each other column will be a cyclic shift of one of the first λ columns. As in the previous theorem, our construction will use the vectors v_Q, Q ∈ P_r, to generate the first λ columns. However, now we must be more selective in which polynomials we choose for these columns. Let us observe how we fill out the matrix Φ from its first λ columns. The next block of λ columns is each gotten by a cyclic shift. For example, each column with index m + λ
with m ∈ {1, …, λ} is obtained by taking the entries in column m and shifting them down one, while the last entry in the mth column is moved to the top position. We continue in this fashion to the next block of λ columns, and so forth. There will be n = p² such blocks. Consider the jth block, 0 ≤ j ≤ n − 1. We can write j = a + bp with a, b ∈ {0, …, p − 1}. Each column in this block will be a cyclic shift of the corresponding column v_Q from the first block. Recall that we index the rows of Φ by (x, y) ∈ F × F. The entry in the (x, y) position of v_Q will now occupy the position (x′, y′), where y′ = y + j = y + a modulo p and x′ = x + b modulo p or x′ = x + b + 1 modulo p. Since the ones in v_Q occur precisely in the positions (x, Q(x)), the new ones in the corresponding column of block j will occur in positions (x′, y′) where y′ = Q(x) + a modulo p and x′ = x + b modulo p or x′ = x + b + 1 modulo p.

To describe the set of polynomials we shall use for the columns, we define the equivalence relation that two polynomials P, Q of degree at most r over F are equivalent (written P ≡ Q) if there exist a, b ∈ F such that

P(x) = Q(x + a) + b,  ∀x ∈ F.  (3.3)
Let us see what the structure of such an equivalence class is. For this, we use the following simple lemma.

Lemma 3.2. If f is any function on F for which there exist a, b ∈ F, not both zero, such that f(x) = f(x + a) + b for all x ∈ F, then f is a linear function.

Proof. It follows that f(a) = f(0) − b and, more generally, f(ka) = f(0) − kb for each k ∈ F. If a ≠ 0, then ka, k = 1, …, p, exhaust F, and so f(x) = f(0) − a^{−1}bx for all x ∈ F, so that f is linear. If a = 0, then f(x) = f(x) + b and hence b = 0 as well.

Let us now consider the equivalence classes. One equivalence class consists of all the constant functions; there are p functions in this equivalence class. For each P(x) = αx with α ≠ 0, its equivalence class will consist of all linear functions of the form αx + b, b ∈ F; there are again p functions in each of these equivalence classes. Finally, if P is a polynomial which is not linear, then its equivalence class will consist of the p² polynomials P(x + a) + b corresponding to the p² choices of a, b (see Lemma 3.2).

Let Λ_r consist of a set of representatives from each of the equivalence classes which do not consist of linear polynomials. That is, we choose one representative from each of these equivalence classes, except that we never take polynomials of degree ≤ 1. Let us see what the cardinality of Λ_r is. There are p^{r+1} polynomials of degree at most r and p² linear polynomials. So there are p^{r+1} − p² polynomials which are not linear. They are divided into sets of size p² (the equivalence classes). Hence, λ := #(Λ_r) = p^{r−1} − 1. Now, there are n = p² cyclic shifts, so N = λn = p^{r+1} − p². In going further in this section, let Φ_0 denote the circulant matrix whose first λ columns are the v_Q, Q ∈ Λ_r, written in lexicographic order. Our next lemma bounds the inner products of any two columns of Φ_0.

Lemma 3.3. For any two columns v ≠ w of the matrix Φ_0, we have

|v · w| ≤ 4r.  (3.4)
Proof. Each of the columns v, w of Φ_0 can be described as a cyclic shift of vectors v_Q, v_R with Q, R ∈ Λ_r. As we have observed above, there are integers a_0, b_0 (depending only on v) such that a one occurs in column v at a position (x′, y′) if and only if x′ = x + b_0 + ε_0 and y′ = Q(x) + a_0 with x ∈ F and ε_0 ∈ {0, 1}. Similarly, a one occurs in column w at position (x″, y″) if and only if x″ = x̄ + b_1 + ε_1 and y″ = R(x̄) + a_1 with x̄ ∈ F and ε_1 ∈ {0, 1}. The inner product v · w counts the number of row positions for which there is a one in each of these two columns, that is, the number of solutions to x + b_0 + ε_0 = x̄ + b_1 + ε_1 and Q(x) + a_0 = R(x̄) + a_1 with x, x̄ ∈ F and ε_0, ε_1 ∈ {0, 1}.

Consider first the case when Q ≠ R. We fix one of the four possibilities for ε_0, ε_1. These equations mean that x̄ = x + b and R(x + b) = Q(x) + a with b = b_0 − b_1 + ε_0 − ε_1 and a = a_0 − a_1. Since R ≠ Q, we know that R(· + b) is not identical to Q(·) + a, because these R and Q are not equivalent. In this case the only possible x which can satisfy the above are the zeros of the nonzero polynomial R(· + b) − Q(·) − a. Thus there are at most r such x, because this latter polynomial has degree at most r. Since there are four possibilities for (ε_0, ε_1), we have |v · w| ≤ 4r as desired.

Now consider the case when R = Q and any one of the four possible values for (ε_0, ε_1). Similar to the case just handled, we have that x̄ = x + b and Q(x + b) − a = Q(x). We are interested in the number of x for which this can happen. As long as these two polynomials are not identical, this can happen at most r times. But we know that they can only be identical if Q is linear (see Lemma 3.2), and linear polynomials are not in Λ_r. Thus, even in the case Q = R, we also have that |v · w| is at most 4r.

Theorem 3.4. The cyclic matrix Φ := (1/√p) Φ_0 has the RIP (2.2) with δ = 4(k − 1)r/p whenever k − 1 < p/(4r).

Proof. The proof is the same as that of Theorem 3.1.
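Both constructions can be realized numerically for small parameters. The sketch below (plain Python, written for this note rather than taken from the paper) builds the flat matrix Φ0 of Theorem 3.1 and the circulant matrix of Theorem 3.4 for p = 5 and r = 2, and checks the combinatorial facts the proofs rest on: every column contains exactly p ones, distinct polynomial columns have inner product at most r, and distinct circulant columns have inner product at most 4r.

```python
from itertools import product, combinations

p, r = 5, 2  # small prime field F_p and polynomial degree bound

def column(values):
    """Indicator vector on F x F (lexicographic) of the graph {(x, Q(x))}."""
    col = [0] * (p * p)
    for x, y in enumerate(values):
        col[x * p + y] = 1
    return col

# --- Theorem 3.1: one column per polynomial of degree <= r ----------------
polys = [tuple(sum(c * x ** i for i, c in enumerate(coeffs)) % p
               for x in range(p))
         for coeffs in product(range(p), repeat=r + 1)]
flat_cols = [column(q) for q in polys]
assert len(flat_cols) == p ** (r + 1)
assert all(sum(c) == p for c in flat_cols)
# distinct polynomials of degree <= r agree in at most r points
assert max(sum(a * b for a, b in zip(u, v))
           for u, v in combinations(flat_cols, 2)) <= r

# --- Theorem 3.4: representatives of nonlinear classes, then shifts -------
def canonical(q):
    """Minimal element of the orbit {x -> q(x + a) + b} of a value table."""
    return min(tuple((q[(x + a) % p] + b) % p for x in range(p))
               for a in range(p) for b in range(p))

nonlinear = [q for q in set(polys)
             if any(q[x] != (q[0] + x * ((q[1] - q[0]) % p)) % p
                    for x in range(p))]
reps = sorted(set(canonical(q) for q in nonlinear))
lam = len(reps)
assert lam == p ** (r - 1) - 1  # lambda = N/n

def rotate(col, t):
    """Cyclic downward shift of a column by t rows."""
    n = len(col)
    return [col[(i - t) % n] for i in range(n)]

first = [column(q) for q in reps]
circ_cols = [rotate(first[m], t) for t in range(p * p) for m in range(lam)]
assert len(circ_cols) == p ** (r + 1) - p * p  # N = lambda * n
assert max(sum(a * b for a, b in zip(u, v))
           for u, v in combinations(circ_cols, 2)) <= 4 * r
print("all structural checks passed")
```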
Notice that since n = p² and N = p^{r+1} − p², log(N/n) < (r − 1) log p = (r − 1) log(n)/2, so we have constructed matrices that satisfy the RIP for the range k − 1 < p/(4r) < √n · log n/(8 log(N/n)).

4. Concluding remarks

The matrices of our two theorems satisfy the RIP of order k for k ≤ C√n · log n/log(N/n), which is the largest range of k known to the author for deterministic constructions. However, it falls far short of the range k ≤ Cn/log(N/n) known for probabilistic constructions. The fact is that we know from probabilistic constructions that there exist n × N matrices with entries ±1/√n that satisfy the RIP for the larger range k ≤ Cn/log(N/n). We just cannot explicitly describe one of these matrices when N and n are large. It is therefore very interesting to try to obtain a larger range of k with deterministic methods and to understand whether there are any essential limitations to deterministic methods.

Let us point out some of the deficiencies in our approach. First, we began by asking what are good compressed sensing matrices. The restricted isometry property is just a sufficient condition to guarantee that a matrix has good performance on classes. Two matrices can have exactly the same performance on classes and yet one will satisfy the RIP and the other not. So there may be a more direct avenue to constructing good CS matrices that does not go through the RIP.

The RIP is a condition on the spectral norm of the matrices A_T = Φ_T^t Φ_T. We have bounded the spectral norm by bounding the ℓ_1 and ℓ_∞ norms (which are much easier to handle than the spectral norm) and then using interpolation. The bounds we have gotten on k appear to be the best we could expect to get by this approach. Indeed, with an eye toward results on the distribution of scalar products of unit vectors (see [9, Lemma 4.1, Chapter 14]), it seems that we could not
improve much on the bounds we gave for diagonal dominance. Of course, the spectral norm of a matrix can be much smaller than its ℓ_1 and ℓ_∞ norms. Thus it may be that estimating the spectral norm directly is the way to go to obtain stronger results than ours.

Acknowledgments

The author thanks the Electrical and Computer Engineering Department at Rice, in particular Professor Rich Baraniuk, for their great hospitality. This research was supported by the Office of Naval Research Contracts ONR-N00014-03-1-0051, ONR/DEPSCoR N00014-03-1-0675, and ONR/DEPSCoR N00014-05-1-0715; and the National Science Foundation Grant DMS-354707.

References

[1] R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, The Johnson–Lindenstrauss lemma meets compressed sensing, Constr. Approx., to appear.
[2] C. de Boor, R. DeVore, K. Hoellig, Mixed norm n-widths, Proc. Amer. Math. Soc. 80 (1980) 577–583.
[3] E. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51 (2005) 4203–4215.
[4] A. Cohen, W. Dahmen, R. DeVore, Compressed sensing and best k-term approximation, submitted for publication.
[5] R. DeVore, Optimal computation, in: Proceedings of ICM 2006, vol. I, Madrid, European Mathematical Society Publishing House, 2007, to appear.
[6] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory 52 (2006) 1289–1306.
[7] E.D. Gluskin, Norms of random matrices and widths of finite-dimensional sets, Math. USSR Sb. 48 (1984) 173–182.
[8] B. Kashin, The widths of certain finite dimensional sets and classes of smooth functions, Izv. Akad. Nauk SSSR Ser. Mat. 41 (1977) 334–351.
[9] G.G. Lorentz, M. von Golitschek, Yu. Makovoz, Constructive Approximation: Advanced Problems, Grundlehren der mathematischen Wissenschaften, vol. 304, Springer, Berlin, Heidelberg, 1996.
[10] V. Maiorov, Trigonometric widths of Sobolev classes in the space Lq, Mat. Zametki 40 (1986) 161–173.
[11] V. Maiorov, Linear diameters of Sobolev classes, Soviet Math. Dokl. 43 (1991) 1127–1130.
Journal of Complexity 23 (2007) 926 – 936 www.elsevier.com/locate/jco
On linear codes with large weights simultaneously for the Rosenbloom–Tsfasman and Hamming metrics☆

M.M. Skriganov

Steklov Mathematical Institute, Fontanka 27, St. Petersburg 191023, Russia

Received 26 January 2007; accepted 26 February 2007
Available online 24 March 2007

Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract We show that maximum distance separable (MDS) codes, or more generally nearly MDS codes, for the Rosenbloom–Tsfasman metric can meet the Gilbert–Varshamov bound for their Hamming weights. The proof is based on a careful analysis of orbits of a linear group preserving the Rosenbloom–Tsfasman metric. © 2007 Elsevier Inc. All rights reserved. Keywords: Coding theory with non-Hamming metrics
1. Introduction

A new approach to the theory of uniformly distributed point sets was developed in the recent papers [1,8,9]. This approach crucially depends on a specific version of coding theory where, unlike the classical coding theory, two basic metrics are involved. One of them is the standard Hamming metric, while the other one is the Rosenbloom–Tsfasman metric introduced in [7]. In the present paper, we address an aspect of such a version of coding theory.

Suppose that a linear code C ⊂ F_q^ℓ over a finite field F_q with a large Rosenbloom–Tsfasman weight ρ(C) is given. What can one say about the Hamming weight δ(C) of this code? Simple examples show that in general δ(C) is not controlled by ρ(C). However, it turns out (see our main Theorem 3.1 in Section 3) that if one considers the orbit of the code C under the action of a linear group preserving the weight ρ(C), then a portion of the codes on this orbit have large Hamming weights. Furthermore, if C is a maximum distance separable (briefly, MDS) code, or more generally a nearly MDS code for the Rosenbloom–Tsfasman metric, then there exist codes on
☆ Supported by RFFI (Project No. 05-01-00935). E-mail address:
[email protected].
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.02.004
M.M. Skriganov / Journal of Complexity 23 (2007) 926 – 936
927
the orbit of C which meet the Gilbert–Varshamov bound for their Hamming weights (see Theorem 3.2 below). We conjecture that point distributions constructed in terms of such specific codes have a series of remarkable properties. The author hopes to consider these intriguing questions in forthcoming papers.

The present paper is organized as follows. In Section 2, preliminary material on coding theory is discussed. Our main Theorem 3.1 is given in Section 3. This section also contains asymptotic consequences of Theorem 3.1, given in Theorem 3.2. In Section 4, we consider the structure of orbits of a group preserving the Rosenbloom–Tsfasman metric, and relying on this consideration, we complete the proof of Theorem 3.1 in Section 5.

2. Preliminaries

Let Mat_{n,s}(F_q) denote the linear space of all matrices with n rows and s columns with entries from a fixed finite field F_q of q elements. Clearly, the space Mat_{n,s}(F_q) is a direct product of n copies of the space Mat_{1,s}(F_q), so that

Mat_{n,s}(F_q) = Mat_{1,s}(F_q) × ⋯ × Mat_{1,s}(F_q) ≅ F_q^ℓ,  ℓ = ns.  (2.1)

By definition (cf. [4]), the Hamming weight δ(ξ), ξ ∈ Mat_{n,s}(F_q), is equal to the number of nonzero entries of the matrix ξ. In this case, δ(ξ_1 − ξ_2) defines the Hamming metric on the space Mat_{n,s}(F_q). The Rosenbloom–Tsfasman weight ρ(ξ), ξ ∈ Mat_{n,s}(F_q), is defined as follows. At first, let n = 1 and ξ = (ξ_1, …, ξ_s) ∈ Mat_{1,s}(F_q). Then, we put ρ(0) = 0, and

ρ(ξ) = max{i : ξ_i ≠ 0} for ξ ≠ 0.  (2.2)

Now, let

ξ = (ξ_1, …, ξ_n)^T ∈ Mat_{n,s}(F_q),  ξ_j ∈ Mat_{1,s}(F_q), 1 ≤ j ≤ n,

denote a matrix with rows ξ_j. Then, we put

ρ(ξ) = Σ_{j=1}^n ρ(ξ_j).  (2.3)
It is easy to check that ρ(ξ) = 0 if and only if ξ = 0, and that the weights (2.2) and (2.3) satisfy the triangle inequality. Thus, ρ(ξ_1 − ξ_2) defines the Rosenbloom–Tsfasman metric on the space Mat_{n,s}(F_q). Note that definition (2.2) implies an even stronger inequality

ρ(ξ_1 − ξ_2) ≤ max{ρ(ξ_1), ρ(ξ_2)},  ξ_1, ξ_2 ∈ Mat_{1,s}(F_q).  (2.4)

Thus, the Rosenbloom–Tsfasman metric for n = 1 is an ultrametric. It is obvious that

δ(ξ) ≤ ρ(ξ) ≤ sδ(ξ),  (2.5)
and these inequalities cannot be improved on the whole space Mat_{n,s}(F_q). Thus, for large s the metric ρ is stronger than δ. For s = 1 both metrics coincide. It is remarkable that fundamental concepts related to the Hamming metric can be very naturally extended to the Rosenbloom–Tsfasman metric (see [2,7,8]).

Following [8], we introduce a group T_s^n of linear transformations on Mat_{n,s}(F_q) preserving the weight ρ. At first, let n = 1, ξ = (ξ_1, …, ξ_s) ∈ Mat_{1,s}(F_q), and let T_s denote the group of all lower triangular s × s matrices over F_q with arbitrary nonzero diagonal elements. From definition (2.2), we immediately conclude that the linear mappings

t : ξ ∈ Mat_{1,s}(F_q) → ξt ∈ Mat_{1,s}(F_q),  t ∈ T_s,  (2.6)

preserve the weight ρ: we have ρ(ξt) = ρ(ξ). Now, let ξ = (ξ_1, …, ξ_n) ∈ Mat_{n,s}(F_q), ξ_j ∈ Mat_{1,s}(F_q), 1 ≤ j ≤ n, and let

T_s^n = T_s × ⋯ × T_s  (n copies)  (2.7)

denote a direct product of n copies of T_s. Then, the linear mappings

τ : ξ = (ξ_1, …, ξ_n) → ξτ = (ξ_1 t_1, …, ξ_n t_n) ∈ Mat_{n,s}(F_q),  τ = (t_1, …, t_n) ∈ T_s^n,  (2.8)

preserve the weight ρ: we have ρ(ξτ) = ρ(ξ). Obviously, the orders of the groups T_s^n are given by

#{T_s^n} = (q − 1)^{ns} q^{ns(s−1)/2}.  (2.9)
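The invariance claims in (2.6)–(2.8) are mechanical to verify by brute force for small parameters. A sketch in plain Python (F_q represented as integers modulo q; the particular triangular matrices are arbitrary samples chosen for this note, not from the paper):

```python
from itertools import product

q, s = 3, 4  # small field F_q and row length s

def rho_row(xi):
    """Rosenbloom-Tsfasman weight (2.2) of a row: largest 1-based index
    of a nonzero entry, or 0 for the zero row."""
    return max((i + 1 for i, c in enumerate(xi) if c != 0), default=0)

def right_multiply(xi, t):
    """Row-vector times matrix: xi -> xi * t over F_q."""
    return tuple(sum(xi[i] * t[i][j] for i in range(s)) % q
                 for j in range(s))

# two sample lower triangular matrices with nonzero diagonal entries
ts = [
    [[1 if i == j else ((i + j) % q if i > j else 0) for j in range(s)]
     for i in range(s)],
    [[2 if i == j else (1 if i > j else 0) for j in range(s)]
     for i in range(s)],
]
# check rho(xi * t) == rho(xi) for every row xi and each sampled t
for xi in product(range(q), repeat=s):
    for t in ts:
        assert rho_row(right_multiply(xi, t)) == rho_row(xi)
print("rho is invariant under the sampled lower triangular maps")
```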
We write #{·} for the cardinality of a finite set. Note that the full group of linear transformations preserving the Rosenbloom–Tsfasman weight is a semidirect product of the group T_s^n and the group of all permutations of rows in matrices ξ ∈ Mat_{n,s}(F_q). This claim was conjectured in [8] and proved in [3]. However, in the present paper we do not use this fact.

Finally, we recall (see [4] for details) that the Hamming ball

B_{n,s}(r) = {ξ ∈ Mat_{n,s}(F_q) : δ(ξ) ≤ r},  r ≥ 0,  (2.10)

has the cardinality

V_q(ℓ, r) = #{B_{n,s}(r)} = Σ_{i=0}^{⌊r⌋} (ℓ choose i)(q − 1)^i,  (2.11)

where ⌊·⌋ denotes the integer part of a real number, and ℓ = ns as given in (2.1). Furthermore, for each ω ∈ [0, (q − 1)/q], we have asymptotically

ℓ^{−1} log_q V_q(ℓ, ωℓ) = H_q(ω) + o(1), as ℓ → ∞,  (2.12)

where log_q denotes the log in base q and H_q is the q-ary entropy function: H_q(0) = 0, and H_q(ω) = ω log_q(q − 1) − ω log_q ω − (1 − ω) log_q(1 − ω) for 0 < ω ≤ (q − 1)/q.
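The asymptotics (2.12) can be observed numerically: the normalized q-ary log-volume of the Hamming ball of radius ωℓ approaches H_q(ω) as ℓ grows. A minimal sketch (plain Python; the parameters are illustrative):

```python
from math import comb, log

def V(q, l, r):
    """Cardinality (2.11) of the Hamming ball of radius r in F_q^l."""
    return sum(comb(l, i) * (q - 1) ** i for i in range(r + 1))

def H(q, w):
    """q-ary entropy function, defined for 0 <= w <= (q-1)/q."""
    if w == 0:
        return 0.0
    return (w * log(q - 1, q) - w * log(w, q) - (1 - w) * log(1 - w, q))

q = 2
for l in (50, 200, 800):
    r = (3 * l) // 10  # radius = 0.3 * l, kept integral
    print(l, round(log(V(q, l, r), q) / l, 4))
print("limit H_2(0.3) =", round(H(2, 0.3), 4))
```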
M.M. Skriganov / Journal of Complexity 23 (2007) 926 – 936
929
Note that H_q(ω) is a continuous monotonic function, increasing on the interval [0, (q − 1)/q] from 0 to 1. Therefore, the inverse function H_q^←(x) is continuous and monotonic on the interval [0, 1], increasing from 0 to (q − 1)/q. We have listed the main auxiliary facts. Some additional facts will be given in the next section.

3. The main results
A linear code C is a subspace of Mat_{n,s}(F_q). The parameter N = ns is called the length of a code. We will consider only linear codes C ≠ {0}. Introduce the Hamming and Rosenbloom–Tsfasman (minimum) weights for a linear code C ⊂ Mat_{n,s}(F_q) by

    wt(C) = min {wt(Ω) : Ω ∈ C\{0}},    (3.1)

where wt denotes either of the weights ν or ρ. Obviously, the group T_s^n preserves the weight ρ: we have ρ(Cτ) = ρ(C), τ ∈ T_s^n, where Cτ = {Ωτ : Ω ∈ C}. In view of (3.1) and (2.5), we have

    ν(C) ≤ ρ(C) ≤ s ν(C).    (3.2)
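The weight chain above is easy to check by brute force. The sketch below is illustrative and assumes the standard definitions: the Hamming weight ν counts nonzero entries, and the Rosenbloom–Tsfasman weight ρ is the sum over rows of the largest (1-based) index of a nonzero entry.

```python
import numpy as np

def hamming_weight(M):
    """nu(Omega): number of nonzero entries of the n x s matrix Omega."""
    return int(np.count_nonzero(M))

def rt_weight(M):
    """rho(Omega): Rosenbloom-Tsfasman weight, the sum over rows of the
    largest 1-based index of a nonzero entry (0 for a zero row)."""
    total = 0
    for row in M:
        nz = np.nonzero(row)[0]
        total += int(nz[-1]) + 1 if nz.size else 0
    return total

# spot-check the chain nu(Omega) <= rho(Omega) <= s * nu(Omega)
rng = np.random.default_rng(0)
q, n, s = 3, 4, 5
for _ in range(1000):
    M = rng.integers(0, q, size=(n, s))
    nu, rho = hamming_weight(M), rt_weight(M)
    assert nu <= rho <= s * nu
print("chain nu <= rho <= s*nu verified on 1000 random matrices")
```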
Thus, if the weight ν(C) is large, the weight ρ(C) is also large. However, as was mentioned in the Introduction, our concern here is with the opposite situation, when the weight ρ(C) is known to be large and we are interested in whether the weight ν(C) can be large as well. Our main result is the following:

Theorem 3.1. Let C ⊂ Mat_{n,s}(F_q) be an arbitrary linear code. Suppose that the inequality

    q^{ρ(C)} ≥ q (q/(q − 1))^n V_q(N, d − 1)    (3.3)

holds for some positive integer d. Then, there exists a nonempty subset G(C) ⊂ T_s^n such that the bound

    ν(Cτ) ≥ d    (3.4)

holds for all transformations τ ∈ G(C). Furthermore, the cardinality of the subset G(C) satisfies the bound

    #{G(C)}/#{T_s^n} > 1 − q (q/(q − 1))^n V_q(N, d − 1) q^{−ρ(C)} ≥ 0.    (3.5)
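The theorem is probabilistic in flavor: a large fraction of transformations τ ∈ T_s^n is "good". A tiny Monte Carlo sketch of this phenomenon follows. It is illustrative only: q is prime so that arithmetic is mod q, the code C is a random one (its parameters are not tied to hypothesis (3.3)), and the row action γ → γt is assumed as in (2.8).

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
q, n, s, k, d = 2, 2, 3, 2, 2   # tiny, illustrative parameters

# span a (at most) k-dimensional random code C in Mat_{n,s}(F_q)
gens = rng.integers(0, q, size=(k, n, s))
code = [sum(c * G for c, G in zip(coeffs, gens)) % q
        for coeffs in itertools.product(range(q), repeat=k)]

def random_Ts():
    """Random element of T_s: lower triangular, nonzero diagonal."""
    t = np.tril(rng.integers(0, q, size=(s, s)))
    np.fill_diagonal(t, rng.integers(1, q, size=s))
    return t

def min_hamming(codewords):
    wts = [np.count_nonzero(M) for M in codewords if np.any(M)]
    return min(wts) if wts else 0

good, trials = 0, 500
for _ in range(trials):
    tau = [random_Ts() for _ in range(n)]   # tau = (t_1, ..., t_n) in T_s^n
    Ctau = [np.stack([M[j] @ tau[j] % q for j in range(n)]) for M in code]
    if min_hamming(Ctau) >= d:
        good += 1
print("empirical fraction of good transformations:", good / trials)
```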
The proof of Theorem 3.1 will be given in Section 5. Now we wish to derive some asymptotic consequences of Theorem 3.1. Both weights ν(C) and ρ(C) (see (3.1)) satisfy the bound

    wt(C) ≤ N − k(C) + 1,    (3.6)

where k(C) denotes the dimension of the linear subspace C ⊂ Mat_{n,s}(F_q). For the Hamming weight this is the well-known Singleton bound (see [4]), and for the Rosenbloom–Tsfasman weight this bound was proved in [7] (see also [1] and [8]).
If for one of the weights ν(C) or ρ(C) we have equality in (3.6),

    wt(C) = N − k(C) + 1,    (3.7)

then the code C is called an MDS code for the corresponding metric. Trivial MDS codes of dimensions 1, N − 1, and N can be easily constructed (say, in the last case C = Mat_{n,s}(F_q)). Nontrivial MDS codes (of dimension 1 < k(C) < N − 1) for the Rosenbloom–Tsfasman metric and s → ∞ exist if and only if q ≥ n − 1 (see [8]). The corresponding conditions in the case of the Hamming metric can be found in [4]. Let us write

    ρ(C) = N − k(C) + 1 − Δ(C),    (3.8)
where the nonnegative parameter Δ(C) is called the deficiency of the code C. Thus, MDS codes have zero deficiency. Let an infinite sequence of linear codes C_{n,s} ⊂ Mat_{n,s}(F_q), s → ∞, be given. The codes C_{n,s} are called nearly MDS codes for the Rosenbloom–Tsfasman metric if Δ(C_{n,s}) = o(N) as s → ∞. One can easily construct linear codes C_{n,s} ⊂ Mat_{n,s}(F_q) of deficiency Δ(C_{n,s}) = O(n log n) (see [8]). Obviously, these codes are nearly MDS codes if log n = o(s) as s → ∞. With the more complicated methods of [5], one can construct codes of deficiency Δ(C_{n,s}) = O(n). Moreover, this bound cannot be improved for large n. Obviously, such codes are always nearly MDS codes.

The role of both metrics ν and ρ in the context of uniformly distributed point sets is discussed in detail in [9]. In particular, using the dual codes to linear codes C_{n,s} ⊂ Mat_{n,s}(F_q) of dimension k(C_{n,s}) = (n − 1)s and small deficiency Δ(C_{n,s}), one obtains very good distributions of q^s points in the n-dimensional unit cube. If additionally the Hamming weights of the codes C_{n,s} are large, then the corresponding distributions of q^s points have the minimal order of the L_p-discrepancies (see [1] and [9]). In applications to the theory of uniformly distributed point sets, the parameter n is usually assumed to be fixed while the parameter s → ∞. The situation when s is fixed and n → ∞ is also of interest for applications, but in this case the behavior of the corresponding point distributions turns out to be very specific (see [6]). Note that in the last case the metrics ν and ρ are equivalent (see (2.5) and (3.2)).

For convenience, we normalize various characteristics of a code by the quantity N = ns. More precisely, we write

    ρ̄(C) = ρ(C)/N,    ν̄(C) = ν(C)/N,    k̄(C) = k(C)/N,    Δ̄(C) = Δ(C)/N.

In this notation, relation (3.8) for nearly MDS codes can be written in the form

    ρ̄(C_{n,s}) = 1 − k̄(C_{n,s}) + 1/N − Δ̄(C_{n,s}) = 1 − k̄(C_{n,s}) + o(1)    as s → ∞.    (3.9)
Recall that in coding theory the parameter k̄( · ) is known as the rate of a linear code. Obviously, the group T_s^n preserves the rate: we have k̄(C_{n,s}τ) = k̄(C_{n,s}), τ ∈ T_s^n. With the above remarks, we have the following corollary of Theorem 3.1.
Theorem 3.2. Let C_{n,s}, s → ∞, be an infinite sequence of linear nearly MDS codes for the Rosenbloom–Tsfasman metric. Suppose also that

    ρ(C_{n,s}) ≥ n{1 − log_q(q − 1)} + 1,    as s → ∞.    (3.10)

Then, for all sufficiently large s, there exist nonempty subsets G(C_{n,s}) ⊂ T_s^n such that the Gilbert–Varshamov bound (cf. [4])

    ν̄(C_{n,s}τ) ≥ H_q^{−1}(1 − k̄(C_{n,s})) + o(1),    s → ∞,    (3.11)
holds for all transformations τ ∈ G(C_{n,s}). The cardinality of the subsets G(C_{n,s}) is given by (3.5) with C = C_{n,s}.

Proof. By assumption (3.10), we observe that for all sufficiently large s, inequality (3.3) holds for d = 1, at least. Let D_{n,s} ≥ 1 be the largest positive integer such that inequality (3.3) holds for C = C_{n,s} and d = D_{n,s}. Then

    q (q/(q − 1))^n V_q(N, D_{n,s}) > q^{ρ(C_{n,s})} ≥ q (q/(q − 1))^n V_q(N, D_{n,s} − 1).    (3.12)

Let us put D̄_{n,s} = D_{n,s}/N. Taking the log_q of each term in the inequalities (3.12) and using the asymptotic formula (2.12), we find that

    (1/s){1 − log_q(q − 1)} + 1/N + H_q(D̄_{n,s}) + o(1) > ρ̄(C_{n,s}) ≥ (1/s){1 − log_q(q − 1)} + 1/N + H_q(D̄_{n,s} − 1/N) + o(1),    as s → ∞.

Therefore,

    ρ̄(C_{n,s}) = H_q(D̄_{n,s}) + o(1),    as s → ∞,

and

    D̄_{n,s} = H_q^{−1}(ρ̄(C_{n,s})) + o(1) = H_q^{−1}(1 − k̄(C_{n,s})) + o(1),    as s → ∞.    (3.13)

In these asymptotic calculations we used the fact that both functions H_q( · ) and H_q^{−1}( · ) are continuous. By Theorem 3.1, for all sufficiently large s, there exist nonempty subsets G(C_{n,s}) ⊂ T_s^n such that the bound

    ν̄(C_{n,s}τ) ≥ D̄_{n,s}    (3.14)

holds for all transformations τ ∈ G(C_{n,s}). Substituting the asymptotic formula (3.13) into the bound (3.14), we obtain the inequality (3.11). The proof of Theorem 3.2 is complete.
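The inverse entropy function H_q^{−1} appearing in (3.11) and (3.13) has no closed form, but since H_q is continuous and strictly increasing on [0, (q − 1)/q] it can be computed by bisection. The sketch below is illustrative; the sample rates are arbitrary.

```python
from math import log

def entropy_q(q, delta):
    """q-ary entropy function H_q(delta), with H_q(0) = 0."""
    if delta <= 0:
        return 0.0
    return (delta * log(q - 1, q) - delta * log(delta, q)
            - (1 - delta) * log(1 - delta, q))

def inv_entropy_q(q, y, tol=1e-12):
    """H_q^{-1}(y) for y in [0, 1], by bisection on [0, (q-1)/q],
    where H_q is strictly increasing."""
    lo, hi = 0.0, (q - 1) / q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if entropy_q(q, mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Gilbert-Varshamov relative distance delta = H_q^{-1}(1 - R) as in (3.11)
for R in (0.25, 0.5, 0.75):
    print(R, inv_entropy_q(2, 1 - R))
```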
4. Orbits of the group T_s^n on Mat_{n,s}(F_q)

First of all, we wish to describe the structure of orbits of the group T_s^n on the space Mat_{n,s}(F_q). Let n = 1; then we introduce the boxes

    σ_a = {γ ∈ Mat_{1,s}(F_q) : ρ(γ) = a},    a ∈ Q_s,    (4.1)

in Mat_{1,s}(F_q), where Q_s = {0, 1, ..., s}. For arbitrary n, we put

    Σ_A = ∏_{j=1}^n σ_{a_j} = {Ω = (γ_1, ..., γ_n) ∈ Mat_{n,s}(F_q) : ρ(γ_j) = a_j, 1 ≤ j ≤ n},    (4.2)

where A = (a_1, ..., a_n) ∈ Q_s^n. Obviously, Σ_{A_1} ∩ Σ_{A_2} = ∅ if A_1 ≠ A_2, and the space Mat_{n,s}(F_q) can be represented as a disjoint union of all Σ_A, A ∈ Q_s^n, so that

    Mat_{n,s}(F_q) = ∪_{A ∈ Q_s^n} Σ_A.    (4.3)
The following is an improvement of Proposition 2.2(i) of [2].

Lemma 4.1. (i) The orbits of the group T_s^n on Mat_{n,s}(F_q) coincide with the boxes Σ_A, A ∈ Q_s^n.
(ii) The cardinality of the boxes Σ_A, A ∈ Q_s^n, is given by

    #{Σ_A} = (q − 1)^{ν(A)} q^{a_1 + ··· + a_n − ν(A)},    (4.4)

where ν(A) denotes the "Hamming weight" of the integer vector A = (a_1, ..., a_n), given by the number of nonzero entries of A.
(iii) The stabilizer S(Ω_A) = {τ ∈ T_s^n : Ω_A τ = Ω_A} of a point Ω_A ∈ Σ_A is a subgroup of T_s^n of order

    #{S(Ω_A)} = #{T_s^n}/#{Σ_A} = (q − 1)^{ns − ν(A)} q^{ns(s−1)/2 − a_1 − ··· − a_n + ν(A)}.    (4.5)

Proof. (i) First, let n = 1. Then σ_0 = {0} and the statement is trivial. If a ≥ 1, then the box σ_a consists of all rows γ = (γ_1, ..., γ_s) with γ_j = 0 for j > a, arbitrary γ_j ∈ F_q for j < a, and arbitrary γ_a ∈ F_q^* = F_q\{0}. Write γ_a = (γ_{1,a}, ..., γ_{s,a}) with γ_{j,a} = δ_{j,a}, where δ_{j,a} is the Kronecker symbol. For a lower triangular matrix t = (t_{j,i}) ∈ T_s, t_{j,i} = 0 for j > i, we have γ_a t = (t_{1,a}, ..., t_{s,a}) ∈ σ_a. Thus, σ_a = {γ_a t : t ∈ T_s} is an orbit of the group T_s. This proves statement (i) for n = 1. In view of formulas (2.1), (2.7), and (4.2), this also implies statement (i) for arbitrary n.

(ii) The above description of the structure of the boxes σ_a implies the formula

    #{σ_a} = 1 if a = 0;    #{σ_a} = (q − 1) q^{a−1} if 1 ≤ a ≤ s.    (4.6)

From (4.2), we conclude that

    #{Σ_A} = ∏_{j=1}^n #{σ_{a_j}}.    (4.7)

Substituting (4.6) into (4.7), we obtain (4.4).
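Formulas (4.4), (4.6) and (4.7) can be verified by brute-force enumeration for tiny parameters. The sketch below is illustrative and assumes the RT weight of a row is the largest 1-based index of a nonzero entry.

```python
import itertools
from collections import Counter

def rho_row(row):
    """RT weight of a single row: largest 1-based index of a nonzero entry."""
    nz = [j for j, x in enumerate(row) if x]
    return nz[-1] + 1 if nz else 0

q, n, s = 3, 2, 2
counts = Counter()
rows = list(itertools.product(range(q), repeat=s))
for mats in itertools.product(rows, repeat=n):
    A = tuple(rho_row(r) for r in mats)   # box label A = (a_1, ..., a_n)
    counts[A] += 1

def box_size(A):
    """Formula (4.4): (q-1)^{nu(A)} * q^{a_1+...+a_n - nu(A)}."""
    nuA = sum(1 for a in A if a > 0)
    return (q - 1) ** nuA * q ** (sum(A) - nuA)

# every box has exactly the cardinality predicted by (4.4)
assert all(counts[A] == box_size(A) for A in counts)
print("boxes:", dict(counts))
```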
(iii) Each orbit Σ_A, A ∈ Q_s^n, can be identified with a homogeneous space: Σ_A ≅ T_s^n / S(Ω_A). Therefore,

    #{Σ_A} = #{T_s^n}/#{S(Ω_A)},    so that    #{S(Ω_A)} = #{T_s^n}/#{Σ_A},    (4.8)

and (4.5) follows from (4.8), (4.4), and (2.9). The proof of Lemma 4.1 is complete.

Let two points Ω_1 and Ω_2 in Mat_{n,s}(F_q) be given. What is the number of solutions τ ∈ T_s^n of the equation Ω_1 τ = Ω_2? We put

    N(Ω_1, Ω_2) = {τ ∈ T_s^n : Ω_1 τ = Ω_2} ⊂ T_s^n    (4.9)

and

    μ(Ω_1, Ω_2) = #{N(Ω_1, Ω_2)}.    (4.10)

Lemma 4.2. (i) If Ω_1 ∈ Σ_{A_1}, Ω_2 ∈ Σ_{A_2}, and A_1 ≠ A_2, then μ(Ω_1, Ω_2) = 0.
(ii) If Ω_1 ∈ Σ_A and Ω_2 ∈ Σ_A, then μ(Ω_1, Ω_2) = #{S(Ω_A)}.

Proof. (i) The statement is a trivial consequence of Lemma 4.1(i). (ii) Since both points Ω_1 and Ω_2 belong to the same orbit Σ_A, we can write Ω_1 = Ω_A τ_1, Ω_2 = Ω_A τ_2 for a fixed point Ω_A ∈ Σ_A and some τ_1, τ_2 ∈ T_s^n. Therefore, the equation Ω_1 τ = Ω_2 takes the form Ω_A τ_1 τ = Ω_A τ_2, or Ω_A τ_1 τ τ_2^{−1} = Ω_A. This gives

    μ(Ω_1, Ω_2) = #{τ ∈ T_s^n : τ_1 τ τ_2^{−1} ∈ S(Ω_A)} = #{τ_1^{−1} S(Ω_A) τ_2} = #{S(Ω_A)}.
The proof of Lemma 4.2 is complete.
Now our interest is with the distribution of points of a code C ⊂ Mat_{n,s}(F_q) in the boxes Σ_A, A ∈ Q_s^n.

Lemma 4.3. Let C ⊂ Mat_{n,s}(F_q) be an arbitrary linear code. Then #{C ∩ Σ_0} = 1, and for nonzero A = (a_1, ..., a_n) ∈ Q_s^n

    #{C ∩ Σ_A} = 0    if 0 < a_1 + ··· + a_n < ρ(C),

and

    #{C ∩ Σ_A} ≤ q^{a_1 + ··· + a_n − ρ(C) + 1}    if a_1 + ··· + a_n ≥ ρ(C).

This is Lemma 2.2 of [9]. It is worth noting that the ultrametric inequality (2.4) is crucial for the proof of this result. Relying on the above three lemmas, we can easily complete the proof of Theorem 3.1.

5. Proof of Theorem 3.1

Let a linear code C ⊂ Mat_{n,s}(F_q) be given. Fix a Hamming ball B(d − 1) ⊂ Mat_{n,s}(F_q) of radius d − 1, where d ≥ 1 is an integer (see (2.10)).
Let us split the group T_s^n into a disjoint union of two subsets T_s^n = G(C) ∪ B(C), where the subset G(C) of "good" transformations consists of all τ ∈ T_s^n such that Ω_1 τ ≠ Ω_2 for all Ω_1 ∈ C\{0} and Ω_2 ∈ B(d − 1)\{0}, and the subset B(C) of "bad" transformations consists of all τ ∈ T_s^n such that Ω_1 τ = Ω_2 for at least one pair Ω_1 ∈ C\{0} and Ω_2 ∈ B(d − 1)\{0}. From these definitions we immediately conclude that

    #{G(C)} + #{B(C)} = #{T_s^n},    (5.1)

and

    ν(Cτ) ≥ d    (5.2)

for all transformations τ ∈ G(C). Let us estimate the cardinality of the subset of bad transformations. With definitions (4.9) and (4.10), we have

    B(C) ⊂ ∪_{Ω_1, Ω_2} {N(Ω_1, Ω_2) : Ω_1 ∈ C\{0}, Ω_2 ∈ B(d − 1)\{0}}

and

    #{B(C)} ≤ Σ_{Ω_1, Ω_2} {μ(Ω_1, Ω_2) : Ω_1 ∈ C\{0}, Ω_2 ∈ B(d − 1)\{0}}.    (5.3)
Here, for simplicity, we write ∪{E_ω : ω ∈ O} instead of ∪_{ω∈O} E_ω and Σ{f(ω) : ω ∈ O} instead of Σ_{ω∈O} f(ω) if the corresponding region O is rather cumbersome to be indicated under the symbol for union or summation.

For convenience, we denote by Φ_d(C) the sum in (5.3). Using (4.3), we can write this sum in the form

    Φ_d(C) = Σ_{A_1, A_2 ∈ Q_s^n} Σ_{Ω_1, Ω_2} {μ(Ω_1, Ω_2) : Ω_1 ∈ (C\{0}) ∩ Σ_{A_1}, Ω_2 ∈ (B(d − 1)\{0}) ∩ Σ_{A_2}}.    (5.4)

By Lemma 4.2(i), all terms in (5.4) with A_1 ≠ A_2 vanish. Therefore,

    Φ_d(C) = Σ_{A ∈ Q_s^n} Σ_{Ω_1, Ω_2} {μ(Ω_1, Ω_2) : Ω_1 ∈ (C\{0}) ∩ Σ_A, Ω_2 ∈ (B(d − 1)\{0}) ∩ Σ_A}
           = Σ_{A ∈ Q_s^n\{0}} Σ_{Ω_1, Ω_2} {μ(Ω_1, Ω_2) : Ω_1 ∈ C ∩ Σ_A, Ω_2 ∈ B(d − 1) ∩ Σ_A}.
It then follows from Lemma 4.2(ii) and Lemma 4.1(iii) that

    Φ_d(C) = Σ_{A ∈ Q_s^n\{0}} Σ_{Ω_1, Ω_2} {#{S(Ω_A)} : Ω_1 ∈ C ∩ Σ_A, Ω_2 ∈ B(d − 1) ∩ Σ_A}
           = Σ_{A ∈ Q_s^n\{0}} #{S(Ω_A)} #{C ∩ Σ_A} #{B(d − 1) ∩ Σ_A}
           = #{T_s^n} Σ_{A ∈ Q_s^n\{0}} (#{C ∩ Σ_A}/#{Σ_A}) #{B(d − 1) ∩ Σ_A}.    (5.5)
With Lemma 4.3 we obtain an upper bound for the last sum in (5.5), giving the inequality

    Φ_d(C) ≤ #{T_s^n} Σ_A {(q^{a_1+···+a_n−ρ(C)+1} / ((q − 1)^{ν(A)} q^{a_1+···+a_n−ν(A)})) #{B(d − 1) ∩ Σ_A} : A = (a_1, ..., a_n) ∈ Q_s^n, a_1 + ··· + a_n ≥ ρ(C)}
    = #{T_s^n} q^{−ρ(C)} q Σ_A {(q/(q − 1))^{ν(A)} #{B(d − 1) ∩ Σ_A} : A = (a_1, ..., a_n) ∈ Q_s^n, a_1 + ··· + a_n ≥ ρ(C)}
    ≤ #{T_s^n} q^{−ρ(C)} q (q/(q − 1))^n Σ_A {#{B(d − 1) ∩ Σ_A} : A = (a_1, ..., a_n) ∈ Q_s^n, a_1 + ··· + a_n ≥ ρ(C)}
    < #{T_s^n} q^{−ρ(C)} q (q/(q − 1))^n Σ_{A ∈ Q_s^n} #{B(d − 1) ∩ Σ_A}
    = #{T_s^n} q^{−ρ(C)} q (q/(q − 1))^n V_q(N, d − 1),    (5.6)

where V_q(N, d − 1) is the cardinality of the ball B(d − 1) (see (2.11)). Combining (5.3) and (5.6), we find an upper bound for the cardinality of the subset of bad transformations, in the form

    #{B(C)} < #{T_s^n} q (q/(q − 1))^n V_q(N, d − 1) q^{−ρ(C)}.

Substituting this inequality into (5.1), we find the following lower bound for the cardinality of the subset of good transformations:

    #{G(C)}/#{T_s^n} > 1 − q (q/(q − 1))^n V_q(N, d − 1) q^{−ρ(C)}.    (5.7)

Suppose that inequality (3.3) of Theorem 3.1 holds. Then, it follows from (5.7) that #{G(C)} > 0, and the subset G(C) is nonempty. Therefore, the bound (5.2) holds for all transformations τ ∈ G(C). The proof of Theorem 3.1 is complete.
Acknowledgments

The author is grateful to Michael Tsfasman, Serge Vlăduţ, and Henryk Woźniakowski for many interesting and valuable discussions. The author is also grateful to the referees for their helpful remarks and suggestions, and to Grzegorz Wasilkowski for his diligent handling of this paper.

References

[1] W.W.L. Chen, M.M. Skriganov, Explicit constructions in the classical mean squares problem in irregularities of point distribution, J. Reine Angew. Math. 545 (2002) 67–95.
[2] S.T. Dougherty, M.M. Skriganov, MacWilliams duality and the Rosenbloom–Tsfasman metric, Moscow Math. J. 2 (1) (2002) 81–97.
[3] K. Lee, Automorphism group of the Rosenbloom–Tsfasman space, European J. Combin. 24 (2003) 607–612.
[4] J.H. van Lint, Introduction to Coding Theory, third ed., Graduate Texts in Mathematics, vol. 86, Springer, Berlin, 1999.
[5] H. Niederreiter, C.P. Xing, Low-discrepancy sequences and global function fields with many rational points, Finite Fields Appl. 2 (1996) 241–273.
[6] E. Novak, H. Woźniakowski, When are integration and discrepancy tractable?, in: R.A. DeVore et al. (Eds.), Foundations of Computational Mathematics (Oxford, 1999), Cambridge University Press, Cambridge, 2001, pp. 211–266.
[7] M.Yu. Rosenbloom, M.A. Tsfasman, Codes for the m-metric, Problemy Peredachi Informatsii 33 (1) (1997) 55–63 (English translation in Probl. Inf. Transm. 33 (1) (1997) 45–52).
[8] M.M. Skriganov, Coding theory and uniform distributions, Algebra i Analiz 13 (2) (2001) 191–239 (English translation in St. Petersburg Math. J. 13 (2) (2002) 301–337).
[9] M.M. Skriganov, Harmonic analysis on totally disconnected groups and irregularities of point distributions, J. Reine Angew. Math. 600 (2006) 25–49.
Journal of Complexity 23 (2007) 937 – 951 www.elsevier.com/locate/jco
Computation of local radius of information in SM-IBC identification of nonlinear systems Mario Milanese, Carlo Novara∗ Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy Received 26 January 2007; accepted 29 May 2007 Available online 27 July 2007
Abstract System identification consists in finding a model of an unknown system starting from a finite set of noise-corrupted data. A fundamental problem in this context is to assess the accuracy of the identified model. In this paper, the problem is investigated for the case of nonlinear systems within the Set Membership–Information Based Complexity framework of [M. Milanese, C. Novara, Set membership identification of nonlinear systems, Automatica 40(6) (2004) 957–975]. In that paper, a (locally) optimal algorithm has been derived, giving (locally) optimal models in nonlinear regression form. The corresponding (local) radius of information, providing the worst-case identification error, can consequently be used to measure the quality of the identified model. In the present paper, two algorithms are proposed for the computation of the local radius of information: The first provides the exact value but requires a computational complexity exponential in the dimension of the regressor space. The second is approximate but involves a polynomial (quadratic) complexity. © 2007 Elsevier Inc. All rights reserved. Keywords: Radius of information computation; Nonlinear systems identification; Set membership; Information based complexity
1. Introduction

Consider a nonlinear discrete-time dynamic system in regression form

    y^{t+1} = f_0(w^t),    w^t = [y^t ... y^{t−n_y+1} u^t ... u^{t−n_u+1}],    (1)

where y^t ∈ R, u^t ∈ R^m, n = n_y + m n_u and f_0 : W ⊂ R^n → R.

This work has been partly supported by Ministero dell'Università e della Ricerca of Italy, under the National Projects "Advanced control and identification techniques for innovative applications" and "Control of advanced systems of transmission, suspension, steering and braking for the management of the vehicle dynamics".
∗ Corresponding author. Fax: +39 011 564 7099.
E-mail addresses:
[email protected] (M. Milanese),
[email protected] (C. Novara). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.05.004
938
M. Milanese, C. Novara / Journal of Complexity 23 (2007) 937 – 951
The problem of system identification is to find, from a set of noise-corrupted measurements of y^t and w^t, an estimate f̂ of f_0 giving a small, possibly minimal, identification error ‖f_0 − f̂‖, where ‖ · ‖ is a suitable norm. This error is not known and, since data are finite and noise corrupted, a reliable estimate of the identification error can be obtained only if some information on f_0 and on the noise is available. In the literature [20,10,15], the information on f_0 is typically given by assuming that it belongs to some finitely parametrized subset F(θ) of functions. In some cases, knowledge of the laws governing the system (mechanical, economical, biological, etc.) generating the data may provide information on its structure, where some basic parameters have to be calibrated from available data. In other situations, when the laws are too complex or not sufficiently known, the usual approach is to consider that f_0 belongs to a finitely parametrized set of functions F(θ) = {f(w, θ) = Σ_{i=1}^r α_i σ_i(w, β_i), α_i ∈ R}, where θ = [α, β] and the σ_i's are given functions. Then, measured data are used to derive an estimate θ̂ of θ, and f(w, θ̂) is used as the estimate of f_0. Basic to this approach is the proper choice of the parametric family of functions f(w, θ), typically realized by some search on different functional forms of the σ_i's, e.g. linear, polynomial, sigmoidal, wavelet, etc., and on the number r [20]. This search may be quite time consuming, and in any case leads to approximate model structures. The evaluation of the effects of this approximation on identification errors is at present a largely open problem. Another critical point is that the estimate of θ is usually obtained by minimization of a non-convex error function. Such a minimization may become trapped in local minima and thus provide a bad estimate. In [12] an alternative approach is proposed, formulating the problem in a set membership (SM)–information based complexity (IBC) framework.
The SM framework is used in systems identification to deal with approximate model structures and finite sample accuracy evaluation, see, e.g. [13,14,11,18,1]. The SM framework, being related to approximation and interpolation of multivariable functions with bounded derivatives from the knowledge of a finite number of their values, has strong connections with the IBC framework, see, e.g. [21,16,22]. In the nonlinear SM-IBC approach of [12], no assumptions on the functional form of f_0 are required. An assumption on the regularity of f_0 is used instead, given by a bound on its gradient. Moreover, the noise is assumed bounded, in contrast with statistical approaches, which rely on assumptions such as stationarity, ergodicity, uncorrelation, type of distribution, etc. The validity of these assumptions may be difficult to test in many applications and is certainly lost in the presence of approximate modelling. In the nonlinear SM-IBC approach a locally optimal identification algorithm is derived, which gives an estimate of f_0 with minimal guaranteed L_p identification error, without requiring iterative minimization and thus avoiding the problem of local minima. A quantity r_I, called the (local) radius of information, giving the worst-case identification error, is also defined. The radius of information r_I allows one to assess the accuracy achieved by the optimal estimate. More generally, r_I allows one to assess the quality of the overall identification procedure, involving specific problems such as input type selection, sampling time choice, input channel selection, regressor choice, and model order selection [17]. These problems are quite relevant in system identification [10,4]. In this paper, the problem of computing the radius of information r_I is considered. Two algorithms are proposed: The first provides the exact value of r_I but requires a computational complexity which increases exponentially with the dimension n of the regressor space.
The second provides an approximate value of r_I and involves a polynomial (quadratic) complexity. The paper is organized as follows. Section 2 summarizes the nonlinear SM-IBC method. In Section 3, we introduce the notion of hyperbolic Voronoi diagram (HVD), which is used to
compute the radius of information. Section 4 illustrates the two algorithms for the computation of the local radius of information. In Section 5, a numerical example is shown.

2. SM-IBC identification of nonlinear systems

In this section the main concepts and results of the nonlinear SM-IBC identification method [12] are summarized. Consider that a set of noise-corrupted data Ỹ^T = {ỹ^t, t = 1, ..., T} and W̃^T = {w̃^t, t = 1, ..., T} generated by (1) is available. Then

    ỹ^{t+1} = f_0(w̃^t) + d^t,    t = 1, ..., T,    (2)
where the term d^t accounts for the fact that ỹ^{t+1} and w̃^t are not exactly known. The aim is to derive an estimate f̂ of f_0 from the available measurements (Ỹ^T, W̃^T). An identification algorithm is an operator φ mapping available data (Ỹ^T, W̃^T) into an estimate f̂ of f_0: φ(Ỹ^T, W̃^T) = f̂ ≈ f_0. The algorithm should be chosen to give a small (possibly minimal) L_p error ‖f_0 − f̂‖_p, where

    ‖f‖_p = (∫_W |f(w)|^p dw)^{1/p},  p ∈ [1, ∞);    ‖f‖_∞ = ess sup_{w∈W} |f(w)|,    (3)

and W is a bounded convex set in R^n. Whatever algorithm is chosen, no information on the identification error can be derived, unless some assumptions are made on the function f_0 and the noise d. The typical approach in the literature is to assume a finitely parametrized functional form for f_0 (linear, bilinear, neural network, etc.) and statistical models for the noise [6,20,15,9]. In the SM-IBC approach, different and somewhat weaker assumptions are taken, not requiring the selection of a parametric form for f_0, but related to its derivatives. Moreover, the noise sequence D^T = {d^t, t = 1, ..., T} is only supposed bounded.

Prior assumptions on f_0:  f_0 ∈ K = {f ∈ C^1(W) : ‖f′(w)‖ ≤ γ, ∀w ∈ W}.
Prior assumptions on noise:  D^T ∈ D = {{d^t, t = 1, ..., T} : |d^t| ≤ ε, t = 1, ..., T}.

Here, f′(w) denotes the gradient of f(w) and ‖x‖ = (Σ_{i=1}^n x_i^2)^{1/2} is the Euclidean norm. As typical in any estimation theory, the problem of checking the validity of the prior assumptions arises. This problem is considered in [12], where a validation analysis is provided, which also allows one to properly choose the values of the bounds γ and ε. A key role in this SM framework is played by the feasible systems set, often called the "unfalsified systems set", i.e. the set of all systems consistent with prior information and measured data.

Definition 1. Feasible systems set:

    FSS^T = {f ∈ K : |ỹ^{t+1} − f(w̃^t)| ≤ ε, t = 1, ..., T}.    (4)
The feasible systems set FSS^T summarizes all the information on the mechanism generating the data that is available up to time T. If the prior assumptions are "true", then f_0 ∈ FSS^T, an important property for evaluating the accuracy of identification. Using the notion of feasible systems set, we can define an identification algorithm as an operator φ mapping all available information about the function f_0, the noise d, and the data (Ỹ^T, W̃^T) up to time
T, summarized by FSS^T, into an estimate f̂ of f_0: φ(FSS^T) = f̂ ≈ f_0. For a given estimate f̂ = φ(FSS^T), the related L_p error ‖f_0 − f̂‖_p cannot be exactly computed, but its tightest bound is given by ‖f_0 − f̂‖_p ≤ sup_{f∈FSS^T} ‖f − f̂‖_p. This motivates the following definition of the identification error, often indicated as local worst-case or guaranteed error.

Definition 2. The local identification error of the estimate f̂ = φ(FSS^T) is

    E[φ(FSS^T)] = E(f̂) = sup_{f∈FSS^T} ‖f − f̂‖_p.

Looking for algorithms that minimize the identification error leads to the following optimality concepts.

Definition 3. An algorithm φ* is called locally optimal if

    E[φ*(FSS^T)] = inf_φ E[φ(FSS^T)] = r_I.

The quantity r_I, called the local radius of information, gives the minimal identification error that can be guaranteed by any estimate based on the available information up to time T. Define the functions

    f̄(w) = min_{t=1,...,T} (h̄^t + γ‖w − w̃^t‖),    h̄^t = ỹ^{t+1} + ε,
    f̲(w) = max_{t=1,...,T} (h̲^t − γ‖w − w̃^t‖),    h̲^t = ỹ^{t+1} − ε,    (5)

where "min" and "max" are to be intended for fixed w (the same holds for "inf" and "sup" in statement (iii) of Theorem 1 below). The next result shows that the algorithm

    φ_c(FSS^T) = f_c = (1/2)(f̄ + f̲)

is optimal for any L_p norm, that the corresponding minimal identification error can actually be computed, and that the functions f̄ and f̲, called optimal bounds, are the tightest upper and lower bounds of f_0.

Theorem 1 (Milanese and Novara [12]). For any L_p(W) norm, with p ∈ [1, ∞]:
(i) The identification algorithm φ_c(FSS^T) = f_c is locally optimal.
(ii) E(f_c) = (1/2)‖f̄ − f̲‖_p = r_I = inf_φ E[φ(FSS^T)].
(iii) f̄(w) = sup_{f∈FSS^T} f(w), f̲(w) = inf_{f∈FSS^T} f(w).
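The optimal bounds of (5) and the central estimate f_c are straightforward to evaluate pointwise. The sketch below uses an illustrative toy system and parameter values (not from the paper), chosen so that the data satisfy the prior assumptions.

```python
import numpy as np

def make_bounds(W_data, y_data, gamma, eps):
    """Optimal bounds f_up (upper) and f_low (lower) of (5), and the
    central, locally optimal estimate f_c of Theorem 1."""
    W_data = np.asarray(W_data, float)
    h_up = np.asarray(y_data, float) + eps      # upper levels h^t = y~^{t+1} + eps
    h_low = np.asarray(y_data, float) - eps     # lower levels h^t = y~^{t+1} - eps

    def f_up(w):
        return float(np.min(h_up + gamma * np.linalg.norm(W_data - w, axis=1)))

    def f_low(w):
        return float(np.max(h_low - gamma * np.linalg.norm(W_data - w, axis=1)))

    def f_c(w):
        return 0.5 * (f_up(w) + f_low(w))

    return f_up, f_low, f_c

# toy system f0(w) = sin(w1) + cos(w2); its gradient norm is at most sqrt(2)
rng = np.random.default_rng(0)
W_data = rng.uniform(0, 2, size=(50, 2))
y_data = (np.sin(W_data[:, 0]) + np.cos(W_data[:, 1])
          + rng.uniform(-0.05, 0.05, size=50))
f_up, f_low, f_c = make_bounds(W_data, y_data, gamma=np.sqrt(2), eps=0.05)
w = np.array([1.0, 1.0])
print(f_low(w), f_c(w), f_up(w))
```

Since the priors hold for this toy data, the true value f_0(w) is guaranteed to lie between f_low(w) and f_up(w).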
Note that the functions f_c, f̄ and f̲ are not C^1(W), since they are defined by means of "min" and "max" over finite sets of functions. Nevertheless, in [12] it is shown that they are C^1 almost everywhere on W.

Remark. The local identification error actually depends on f_0 and D^T, i.e. E(f̂) = E(f̂, f_0, D^T). In the IBC literature [21,16], a global error of given algorithms is often considered, defined as

    E^g(φ) = sup_{f_0 ∈ F(γ), D^T ∈ D} E[φ(FSS^T), f_0, D^T].

An algorithm φ^g is called globally optimal if E^g(φ^g) = inf_φ E^g(φ). Note that a locally optimal algorithm φ* is globally optimal, but φ^g is not in general locally optimal. Thus, the local optimality concept considered in this paper is stronger than the global optimality concept. In the rest of the paper the local optimality concept will be considered and the term local will be omitted.

3. Hyperbolic Voronoi diagrams

In this section, the notion of HVD introduced in [12] is recalled. The HVDs are a generalization of standard Voronoi diagrams (see, e.g. [2]) and are used in the present paper to compute the radius of information. Consider the set of points W̃^T = {w̃^t, t = 1, 2, ..., T} and a T × T antisymmetric matrix B. Let β^{tτ} be the element of B at the tth row and τth column. Then define:

• The (n − 1)-dimensional hyperbola H^{tτ}:

    H^{tτ} = {w ∈ R^n : ‖w − w̃^t‖ − ‖w − w̃^τ‖ = β^{tτ}, τ ≠ t}.

• The n-dimensional region S^{tτ} containing w̃^t:

    S^{tτ} = {w ∈ R^n : ‖w − w̃^t‖ − ‖w − w̃^τ‖ < β^{tτ}, τ ≠ t}.

• The hyperbolic cell C^t: C^t = ∩_{τ≠t} S^{tτ}.

The cells C^t are also called n-faces. The surfaces H̃^{tτ} = H^{tτ} ∩ [C^t], where [C^t] is the closure of C^t, are called (n − 1)-faces. The intersections between the (n − 1)-faces generate other cells of dimension d, with 0 ≤ d < n − 1, called d-faces. The 0-faces are called vertices.

Definition 4. The HVD V(W̃^T, B) is defined as the set of all d-faces, 0 ≤ d ≤ n.

If β^{tτ} = 0, ∀t, τ, all hyperbolas H^{tτ} degenerate into hyperplanes and the definitions become the ones of standard Voronoi diagrams [2]. The next theorem shows some properties of HVDs useful for characterizing the optimal bounds f̄ and f̲.

Theorem 2 (Milanese and Novara [12]).
(i) C^t ≠ ∅ ⟺ ‖w̃^τ − w̃^t‖ > β^{τt}, ∀τ ≠ t,
(ii) C^t ∩ C^τ = ∅, t ≠ τ, and
(iii) ∪_{t=1}^T [C^t] = R^n, where [C^t] is the closure of C^t.
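By the definitions above, testing whether a point w lies in the open cell C^t amounts to checking the T − 1 defining inequalities. The sketch below is illustrative: the offsets β^{tτ} are taken of the "level difference" form (h^τ − h^t)/γ used later for the HVDs associated with the optimal bounds, all data being hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n, gamma = 5, 2, 1.0
W_data = rng.uniform(0, 1, size=(T, n))   # points w~^t
h = rng.uniform(0, 1, size=T)             # hypothetical levels h^t
B = (h[None, :] - h[:, None]) / gamma     # antisymmetric: B[t, tau] = (h^tau - h^t)/gamma

def cell_index(w):
    """Index t with w in the open cell C^t, i.e.
    ||w - w~^t|| - ||w - w~^tau|| < B[t, tau] for all tau != t;
    returns None if w lies on an (n-1)-face (a tie)."""
    d = np.linalg.norm(W_data - w, axis=1)
    for t in range(T):
        if all(d[t] - d[tau] < B[t, tau] for tau in range(T) if tau != t):
            return t
    return None

# almost every point falls in exactly one cell (Theorem 2(ii)-(iii))
labels = [cell_index(w) for w in rng.uniform(0, 1, size=(500, n))]
print("non-empty cells hit:", sorted({l for l in labels if l is not None}))
```

With offsets of this form, membership in C^t is equivalent to t being the unique minimizer of γ‖w − w̃^t‖ + h^t, which is why (almost surely) every random point gets exactly one label.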
This result shows that the non-empty cells of an HVD give a complete partition of R^n, so that any w ∈ R^n belongs to some (n − 1)-dimensional hyperbola H^{tτ} or to one (and only one) cell C^t. Now, for the given functions f̄ and f̲, consider the HVDs V̄ and V̲ defined as

    V̄ = V(W̃^T, B̄),    V̲ = V(W̃^T, B̲),

where β̄^{tτ} = (h̄^τ − h̄^t)/γ and β̲^{tτ} = (h̲^t − h̲^τ)/γ. Let C̄^t, t = 1, 2, ..., T, be the cells of V̄ and C̲^t, t = 1, 2, ..., T, be the cells of V̲. The following result and the comments below show the connection between the HVDs V̄ and V̲ and the optimal bounds f̄(w) and f̲(w).

Theorem 3 (Milanese and Novara [12]).
(i) Let C̄^t be a non-empty cell of V̄. Then f̄(w) = h̄^t + γ‖w − w̃^t‖, ∀w ∈ [C̄^t].
(ii) Let C̲^t be a non-empty cell of V̲. Then f̲(w) = h̲^t − γ‖w − w̃^t‖, ∀w ∈ [C̲^t].

This theorem shows that, for w belonging to a non-empty cell C̄^t, the function f̄(w) is given by the cone in R^n × R defined by the equation y = h̄^t + γ‖w − w̃^t‖, with vertex of coordinates (w̃^t, h̄^t) and axis along the y-dimension. Since from Theorem 2 the non-empty cells of V̄ give a complete partition of the regressor space R^n, f̄ is a piece-wise conic function over a suitable partition of R^n that can be derived from the HVD V̄. Indeed, the intersection of the two cones y = h̄^t + γ‖w − w̃^t‖ and y = h̄^τ + γ‖w − w̃^τ‖, projected on R^n, gives the hyperbola H^{tτ} = {w ∈ R^n : ‖w − w̃^t‖ − ‖w − w̃^τ‖ = β̄^{tτ}, τ ≠ t} that defines the HVD V̄. Similar considerations hold for the relation between f̲ and V̲.

4. Radius of information computation

Let us define the following error function:

    f_e(w) = (1/2)(f̄(w) − f̲(w)),

which allows us to write the radius of information as r_I = ‖f_e‖_p, where the norm is defined in (3). The analytical computation of ‖f_e‖_p does not appear feasible, since f_e is a quite "complicated" function (see Section 2). Let us consider the numerical computation of ‖f_e‖_p. The standard approach (see, e.g. [21,23] and the references therein) for the numerical computation of the L_p norm of a function f(w) ∈ C^r(W) is to evaluate f(w) on a set of m points:

    f(w^1), f(w^2), ..., f(w^m):  w^1, w^2, ..., w^m ∈ W.

Then, the norm is approximated as

    ‖f‖_p ≈ (Σ_{k=1}^m a_k |f(w^k)|^p)^{1/p},  p ∈ [1, ∞);    ‖f‖_∞ ≈ max_{k=1,...,m} |f(w^k)|,
where the a_k are suitably chosen. For a_k = 1/m, we have the widely used quasi-Monte Carlo algorithms. This approach is simple and easy to implement but is affected by two relevant problems:

(1) For finite m, the computed quantity is only an approximation of ‖f‖_p.
(2) In general, the number of points m required to obtain a certain degree of approximation grows exponentially with the dimension n of the set W: m ≈ c η^{−n/r}, where c is a positive number, η is the approximation error (the difference between ‖f‖_p and its computed approximation), and f ∈ C^r(W) (see [21,23]). This is the well-known curse of dimensionality, by which norm computation is intractable for large values of n.

In this paper, we focus on the L_∞ norm, which is a most relevant case in nonlinear SM identification. Two methods for the computation of r_I = ‖f_e‖_∞ are introduced. The first method provides an exact evaluation of r_I using a finite set of points. Such a method is still affected by the exponential dependence on the dimension n. The second method is approximate but not affected by the exponential dependence on n.

Consider the HVDs individuated by the functions f̄ and f̲, introduced in Section 3:

    V̄ = V(W̃^T, B̄),    V̲ = V(W̃^T, B̲).
Let C̄^t, t = 1, 2, ..., T, be the cells of V̄ and C̲^t, t = 1, 2, ..., T, be the cells of V̲. Denote by [X] the closure of a set X and by ∂X the boundary of a set X, and define

    B^{tk} = [C̄^t] ∩ [C̲^k] ∩ W,    t, k = 1, 2, ..., T.    (6)
Assume that $W \subset \mathbb{R}^n$ is a convex polytope. The following result shows that the exact value of $r_I$ can be calculated by evaluating the error function over the finite set of points $B_0$.

Theorem 4. The radius of information $r_I$ is given by
$$r_I = \max_{w \in B_0} f_e(w).$$
Proof. From point (iii) of Theorem 2 it directly follows that the sets $B^{tk}$ constitute a complete partition of $W$, i.e. $W = \bigcup_{t,k=1}^{T} B^{tk}$. The radius of information can thus be expressed as
$$r_I = \|f_e\|_\infty = \operatorname*{ess\,sup}_{w \in W}\,|f_e(w)| = \max_{t,k=1,\dots,T}\ \max_{w \in B^{tk}} f_e(w).$$
Hence, let us consider the computation of $\max_{w \in B^{tk}} f_e(w)$. From Theorem 3 we have
$$f_e(w) = \frac{1}{2}\big(\overline{h}^t - \underline{h}^k\big) + \frac{\gamma}{2}\big(\|w - \widetilde{w}^t\| + \|w - \widetilde{w}^k\|\big), \qquad w \in B^{tk}. \tag{7}$$
This expression shows that $f_e(w)$ is a convex function on $B^{tk}$, since $\|w - \widetilde{w}^t\|$ and $\|w - \widetilde{w}^k\|$ are convex functions. A function that is convex on a compact set attains its maximum on the boundary of the set; see, e.g., [19]. Then, defining
$$w_M^{tk} \doteq \arg\max_{w \in B^{tk}} f_e(w),$$
we have that
$$w_M^{tk} \in \partial B^{tk}. \tag{8}$$
The boundary $\partial B^{tk}$ is composed of the $(n-1)$-faces of $B^{tk}$, hence $w_M^{tk}$ is on an $(n-1)$-face of $B^{tk}$. An $(n-1)$-face of $B^{tk}$ is either a portion of an $(n-1)$-face of $\overline{V}$, or a portion of an $(n-1)$-face of $\underline{V}$, or a portion of $\partial W$.
Consider the case that $w_M^{tk}$ lies on an $(n-1)$-face of $\overline{V}$. Then $w_M^{tk} \in H^{t\ell}$ for some $\ell$, where $H^{t\ell}$ is the $(n-1)$-dimensional hyperbola defined by
$$H^{t\ell} \doteq \big\{w \in \mathbb{R}^n : \|w - \widetilde{w}^t\| - \|w - \widetilde{w}^\ell\| = \big(\overline{h}^\ell - \overline{h}^t\big)/\gamma\big\}.$$
Suppose that this hyperbola has curvature oriented towards $\widetilde{w}^\ell$. Since the level surfaces of $f_e(w)$ are ellipsoids with curvature oriented towards $\widetilde{w}^t$, it follows that $f_e(w)$ is convex on the $(n-1)$-face individuated by $H^{t\ell}$. This implies that $w_M^{tk}$ is on the boundary of the $(n-1)$-face. If the hyperbola $H^{t\ell}$ has curvature oriented towards $\widetilde{w}^t$, we can write $f_e(w)$ as
$$f_e(w) = \frac{1}{2}\big(\overline{h}^\ell - \underline{h}^k\big) + \frac{\gamma}{2}\big(\|w - \widetilde{w}^\ell\| + \|w - \widetilde{w}^k\|\big), \qquad w \in H^{t\ell}.$$
The level surfaces of this function are ellipsoids with curvature oriented towards $\widetilde{w}^\ell$, and thus $w_M^{tk}$ is on the boundary of the $(n-1)$-face. Similarly, it can be seen that this property holds also if the maximum lies on an $(n-1)$-face of $\underline{V}$. Therefore, if $w_M^{tk}$ lies on an $(n-1)$-face of $\overline{V}$ or $\underline{V}$, it is on the boundary of that face, i.e. on an $(n-2)$-face of $B^{tk}$.
This property holds also in the case that $w_M^{tk}$ is on an $(n-1)$-face $B_1^{tk}$ belonging to $\partial W$. Indeed, $B_1^{tk}$ is a portion of a plane and thus a convex set. This implies that the error function $f_e$ is convex on $B_1^{tk}$ and that its maximum is on the boundary of $B_1^{tk}$, i.e. on an $(n-2)$-face of $B^{tk}$.
Iterating this argument for $n-3, n-4, \dots, 0$, we obtain that $w_M^{tk}$ is on a 0-face of $B^{tk}$, i.e. $w_M^{tk} \in B_0^{tk}$. The claim of the theorem follows, since $w_M^{tk} \in B_0^{tk}$ for all $t, k = 1, 2, \dots, T$. □
The computation of $r_I$ indicated in Theorem 4 requires calculating the vertices of the sets $B^{tk}$. An algorithm for this calculation has been developed in Matlab®. The main functions of the algorithm (the main program and the function vertices) are reported below in a code-like format. The other functions are only described qualitatively, since their code is quite complex and not essential to understanding how the algorithm works.
Algorithm

Main program:
    VERT = [];
    for t = 1 : T
        for k = 1 : T
            Vert = vertices(w_tk);
            VERT = [VERT Vert];
        end
    end

Function vertices:
    v = vert_search(w_tk);
    Vert = v;
    Vpv = v;
    a = 0;
    while a == 0
        V = Vpv;
        Vpv = [];
        b = 0;
        for i = 1 : size(V, 2)
            [Vfn, b(i)] = first_neighbours(V(:, i));
            Vpv = [Vpv Vfn];
        end
        a = all(b == 1);
        Vert = [Vert Vpv];
    end

Function vert_search: this function takes a starting point $w^{tk}$ as input and returns a vertex $v \in B_0^{tk}$.

Function first_neighbours: this function takes a vertex $V(:,i) \in B_0^{tk}$ as input and returns the set Vfn of all vertices of $B_0^{tk}$ that are first neighbours of $V(:,i)$. It also makes it possible to check whether all the points of $B_0^{tk}$ have been computed. In particular, if $b = 1$ at every step of the for loop in the function vertices, then Vert $= B_0^{tk}$; in this case the while loop stops and all the points of $B_0^{tk}$ are contained in Vert. On the contrary, if $b \neq 1$ at some step of the for loop, the while loop continues until Vert $= B_0^{tk}$.

The function vertices evaluates the vertices $B_0^{tk}$ of a cell $B^{tk}$ for given $t, k$. In order to evaluate all the vertices of $\bigcup_{t,k} B_0^{tk}$, the main program runs this function for all $t, k = 1, 2, \dots, T$. Clearly, a vertex of a cell is also a vertex of other cells; in order to avoid unnecessary computations, the function first_neighbours also recognizes whether a vertex has already been evaluated and skips its computation. A simplified version of the algorithm, requiring only one for loop in the main program, has been implemented for the computation of the vertices of a single HVD.

The computation of $r_I$ as indicated in Theorem 4 and in the above algorithm can in principle be performed for any dimension $n$ of the regressor space. However, as happens for standard Voronoi diagrams [2], the computational complexity needed to evaluate the vertices is exponential in $n$.
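Once a finite candidate set of vertices is available, Theorem 4 reduces the computation of $r_I$ to a finite maximization of the error function. The following is only a sketch: it assumes the standard SM bounding functions (with data $\widetilde{w}^t$, $\widetilde{h}^t$, noise bound $\epsilon$ and Lipschitz constant $\gamma$) in the form used in set membership identification; the function names are illustrative, not from the paper.

```python
import numpy as np

def error_function(w, w_data, h_data, eps, gamma):
    """Central error f_e(w) = (fbar(w) - funder(w)) / 2 for the assumed
    SM bounding functions:
        fbar(w)   = min_t ( h~^t + eps + gamma * ||w - w~^t|| )
        funder(w) = max_t ( h~^t - eps - gamma * ||w - w~^t|| )."""
    d = np.linalg.norm(w_data - w, axis=1)      # ||w - w~^t|| for every t
    fbar = np.min(h_data + eps + gamma * d)     # optimal upper bound
    funder = np.max(h_data - eps - gamma * d)   # optimal lower bound
    return 0.5 * (fbar - funder)

def radius_of_information(vertices, w_data, h_data, eps, gamma):
    """Theorem 4: r_I is the maximum of f_e over the finite vertex set B_0."""
    return max(error_function(v, w_data, h_data, eps, gamma) for v in vertices)
```

With two one-dimensional data points at 0 and 1 (both with $\widetilde{h} = 0$), the error is largest midway between them, as expected from the cell geometry.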
This issue can be overcome by means of the theorem below, which allows computing an approximate radius of information at low computational cost.
Define the following HVD: $H \doteq V(W, \widetilde{w}^T, 0)$. Let $C^t = [A^t] \cap W$, where $A^t$ are the cells of $H$. The following lemma, describing some properties of the sets $C^t$ and $B^{tt}$, is essential for the calculation of the approximate radius of information.

Lemma 1. The sets $C^t$ and $B^{tt}$ are convex and $B^{tt} \subseteq C^t$.

Proof. The HVD $H$ is a standard Voronoi diagram (see Section 3), hence the cells $A^t$ are polyhedra, i.e. convex sets. It follows that $C^t$ is a convex set, being the intersection of two convex sets ($W$ is assumed convex).
From the definition of $\overline{C}^t$ and $\underline{C}^t$ in Section 3, we have that $B^{tt}$ is given by
$$B^{tt} = \Bigg[\bigcap_{\ell \neq t} \overline{S}^{t\ell}\Bigg] \cap \Bigg[\bigcap_{\ell \neq t} \underline{S}^{t\ell}\Bigg] \cap W, \tag{9}$$
where
$$\overline{S}^{t\ell} \doteq \big\{w \in \mathbb{R}^n : \|w - \widetilde{w}^t\| - \|w - \widetilde{w}^\ell\| < \big(\overline{h}^\ell - \overline{h}^t\big)/\gamma\big\},$$
$$\underline{S}^{t\ell} \doteq \big\{w \in \mathbb{R}^n : \|w - \widetilde{w}^t\| - \|w - \widetilde{w}^\ell\| < \big(\underline{h}^t - \underline{h}^\ell\big)/\gamma\big\}.$$
Note that $\overline{S}^{t\ell} \subseteq \underline{S}^{t\ell}$ if $\widetilde{h}^\ell \le \widetilde{h}^t$, and $\underline{S}^{t\ell} \subseteq \overline{S}^{t\ell}$ if $\widetilde{h}^\ell > \widetilde{h}^t$. Eq. (9) can thus be written as
$$B^{tt} = \Bigg[\bigcap_{\ell \neq t} S^{t\ell}\Bigg] \cap W, \qquad S^{t\ell} \doteq \big\{w \in \mathbb{R}^n : \|w - \widetilde{w}^t\| - \|w - \widetilde{w}^\ell\| < \lambda^{t\ell}\big\},$$
with $\lambda^{t\ell} = \big(\widetilde{h}^\ell - \widetilde{h}^t\big)/\gamma$ if $\widetilde{h}^\ell \le \widetilde{h}^t$, or $\lambda^{t\ell} = \big(\widetilde{h}^t - \widetilde{h}^\ell\big)/\gamma$ if $\widetilde{h}^\ell > \widetilde{h}^t$.
It is easy to see that the sets $S^{t\ell}$ are convex regions. Indeed, the surface that defines $S^{t\ell}$, individuated by the equation $\|w - \widetilde{w}^t\| - \|w - \widetilde{w}^\ell\| = \lambda^{t\ell}$, is an $(n-1)$-dimensional hyperbola with curvature oriented towards $\widetilde{w}^t$, i.e. towards $S^{t\ell}$. It follows that $B^{tt}$ is convex, being the intersection of convex sets.
The cells $A^t$ are defined by
$$A^t \doteq \bigcap_{\ell \neq t} S_0^{t\ell}, \qquad S_0^{t\ell} \doteq \big\{w \in \mathbb{R}^n : \|w - \widetilde{w}^t\| - \|w - \widetilde{w}^\ell\| < 0\big\}.$$
Since $\lambda^{t\ell} \le 0$, we have that $S^{t\ell} \subseteq S_0^{t\ell}$. This implies that $B^{tt} \subseteq C^t$. □
Consider the following optimization problems:
$$\overline{\delta}^t_i = \max_{w \in C^t} \big|w_i - \widetilde{w}^t_i\big|, \quad i = 1, 2, \dots, n, \qquad \overline{\delta}^t = \big\|\big(\overline{\delta}^t_1, \dots, \overline{\delta}^t_n\big)\big\|, \tag{10}$$
$$w^{i,t} = \arg\max_{w \in B^{tt}} \big|w_i - \widetilde{w}^t_i\big|, \quad i = 1, 2, \dots, n, \qquad \underline{\delta}^t = \max_{i} \big\|w^{i,t} - \widetilde{w}^t\big\|. \tag{11}$$
The following theorem provides upper and lower bounds for the radius of information.
Theorem 5. The radius of information $r_I$ is bounded as
$$\underline{r} \le r_I \le \overline{r}, \tag{12}$$
where $\overline{r} = \epsilon + \gamma \max_t \overline{\delta}^t$ and $\underline{r} = \epsilon + \gamma \max_t \underline{\delta}^t$.
Proof. From Theorem 3 we have that the error function can be expressed as
$$f_e(w) = \frac{1}{2}\big(\overline{h}^j - \underline{h}^k\big) + \frac{\gamma}{2}\big(\|w - \widetilde{w}^j\| + \|w - \widetilde{w}^k\|\big),$$
where
$$j = \arg\min_t \big(\overline{h}^t + \gamma\|w - \widetilde{w}^t\|\big), \qquad k = \arg\max_t \big(\underline{h}^t - \gamma\|w - \widetilde{w}^t\|\big). \tag{13}$$
The HVD $H$ is a standard Voronoi diagram (see Section 3), hence the sets $C^t$ constitute a complete partition of $W$. Suppose that $w \in C^t$. From (13) it follows that
$$\overline{h}^j + \gamma\|w - \widetilde{w}^j\| \le \overline{h}^t + \gamma\|w - \widetilde{w}^t\|, \qquad -\underline{h}^k + \gamma\|w - \widetilde{w}^k\| \le -\underline{h}^t + \gamma\|w - \widetilde{w}^t\|.$$
We have therefore
$$f_e(w) \le \frac{1}{2}\big(\overline{h}^t - \underline{h}^t\big) + \frac{\gamma}{2}\big(\|w - \widetilde{w}^t\| + \|w - \widetilde{w}^t\|\big) = \epsilon + \gamma\|w - \widetilde{w}^t\|, \qquad w \in C^t.$$
Hence $\max_{w \in C^t} f_e(w) \le \max_{w \in C^t}\big(\epsilon + \gamma\|w - \widetilde{w}^t\|\big)$. From the definition of $\overline{\delta}^t$, it is easy to see that $\max_{w \in C^t} \|w - \widetilde{w}^t\| \le \overline{\delta}^t$, which yields
$$\max_{w \in C^t} f_e(w) \le \epsilon + \gamma\,\overline{\delta}^t.$$
Since the cells $C^t$, $t = 1, 2, \dots, T$, define a complete partition of $W$, we have
$$r_I = \|f_e\|_\infty = \operatorname*{ess\,sup}_{w \in W}\,|f_e(w)| = \max_{t=1,\dots,T}\ \max_{w \in C^t} f_e(w)$$
and then
$$r_I = \max_{t=1,\dots,T}\ \max_{w \in C^t} f_e(w) \le \max_{t=1,\dots,T}\big(\epsilon + \gamma\,\overline{\delta}^t\big),$$
which proves that $\overline{r}$ is an upper bound of $r_I$.
Let us now show that $\underline{r} \le r_I$. Since $B^{tt} \subseteq C^t$ (see Lemma 1), we have that
$$r_I = \max_{t=1,\dots,T}\ \max_{w \in C^t} f_e(w) \ge \max_{w \in B^{tt}} f_e(w) \quad \forall t.$$
Theorem 3 shows that, for $w \in B^{tt}$, the error function can be expressed as
$$f_e(w) = \epsilon + \gamma\|w - \widetilde{w}^t\|.$$
From the definition of $\underline{\delta}^t$ it follows that $\max_{w \in B^{tt}} \|w - \widetilde{w}^t\| \ge \underline{\delta}^t$ for all $t$. We have thus
$$r_I \ge \epsilon + \gamma \max_{t=1,\dots,T} \underline{\delta}^t,$$
which shows that $\underline{r}$ is a lower bound of $r_I$. □
Note that the optimization problems (10) and (11) can be easily solved. Indeed, (10) is equivalent to the following optimization problems: $a = \max_{w \in C^t}\big(w_i - \widetilde{w}^t_i\big)$, $b = \min_{w \in C^t}\big(w_i - \widetilde{w}^t_i\big)$, $\overline{\delta}^t_i = \max(|a|, |b|)$. The first two problems are convex, since $C^t$ is a convex set (see Lemma 1) and $w_i - \widetilde{w}^t_i$ is a convex function; the third one is trivial. The same argument holds for the first of Eqs. (11); the second of Eqs. (11) is trivial.
The following approximate radius of information
$$\widehat{r}_I \doteq \tfrac{1}{2}\big(\overline{r} + \underline{r}\big) \tag{14}$$
is an estimate of $r_I$ and can be used when the dimension $n$ of the regressor space is large.

Remark. The computational complexity of evaluating $\widehat{r}_I$ is $O(n^2)$. Indeed, a complexity $O(n)$ is required for the evaluation of $\overline{\delta}^t_i$ or $w^{i,t}$, since it must be verified that constraints such as $\|w - \widetilde{w}^t\| \le \|w - \widetilde{w}^\ell\|$ are satisfied. Clearly, the complexity involved in the computation of the norm $\|x\| = \sqrt{\sum_{i=1}^n x_i^2}$ is $O(n)$. Since $\overline{\delta}^t_i$ and $w^{i,t}$ must be calculated for $i = 1, 2, \dots, n$, it follows that the computation of $\overline{\delta}^t$ and $\underline{\delta}^t$, and thus the computation of $\overline{r}$, $\underline{r}$ and $\widehat{r}_I$, has complexity $O(n^2)$. Note that, while the calculation of $r_I$ becomes intractable in practice for $n \ge 5$ or 6, the calculation of $\widehat{r}_I$ can be performed for large values of $n$ without significant problems.
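As an illustration of why problems of type (10) are tractable: the cells $A^t$ of the standard Voronoi diagram are polyhedra, so over $C^t$ each coordinate range can be found by linear programming. A minimal sketch under that assumption (using SciPy's `linprog`; the helper names are hypothetical, and $W$ is taken as a box):

```python
import numpy as np
from scipy.optimize import linprog

def voronoi_cell(w_data, t):
    """Linear inequalities A w <= b for the Voronoi cell of site t:
    ||w - w~^t|| <= ||w - w~^l||  <=>  2 (w~^l - w~^t)^T w <= ||w~^l||^2 - ||w~^t||^2."""
    wt = w_data[t]
    A, b = [], []
    for l, wl in enumerate(w_data):
        if l == t:
            continue
        A.append(2.0 * (wl - wt))
        b.append(float(np.dot(wl, wl) - np.dot(wt, wt)))
    return np.array(A), np.array(b)

def delta_bar(w_data, t, box_lo, box_hi):
    """Problem (10): for each coordinate i solve two LPs (max and min of w_i
    over C^t = cell(t) intersected with the box W), take
    delta^t_i = max(|hi - w~^t_i|, |lo - w~^t_i|), and return the
    Euclidean norm of (delta^t_1, ..., delta^t_n)."""
    A, b = voronoi_cell(w_data, t)
    n = w_data.shape[1]
    bounds = list(zip(box_lo, box_hi))
    deltas = np.empty(n)
    for i in range(n):
        c = np.zeros(n)
        c[i] = 1.0
        lo = linprog(c, A_ub=A, b_ub=b, bounds=bounds).fun
        hi = -linprog(-c, A_ub=A, b_ub=b, bounds=bounds).fun
        deltas[i] = max(abs(hi - w_data[t, i]), abs(lo - w_data[t, i]))
    return float(np.linalg.norm(deltas))
```

For the problems (11) the feasible set $B^{tt}$ is convex but not polyhedral, so a general convex solver would replace the LPs; the overall structure is the same.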
5. Example

The radius of information allows assessing the accuracy achieved by the optimal estimate provided by Theorem 1. More in general, the radius of information allows assessing the quality of the overall identification procedure, involving specific problems such as input type selection, sampling time choice, input channel selection, regressor choice, and model order selection [17]. These problems are quite relevant in system identification [10,4]. In the literature, much effort has been spent on solving them for linear systems; see, e.g., [10,3,7]. On the contrary, very few studies on nonlinear systems are available [5,8]. In this example, we have considered an input type selection problem for the following nonlinear system:
$$y(t+1) = 0.88\,y(t) - 0.12\tanh[15\,y(t)] + 0.06\,u(t). \tag{15}$$
The initial condition $y(1) = 0$ has been assumed. Three input types have been used:
$$U^{(1)} = \{3\sin(0.2t),\ t = 1, \dots, T\}, \quad U^{(2)} = \{3\sin(0.0009t^2),\ t = 1, \dots, T\}, \quad U^{(3)} = \{WN(0, 4, t),\ t = 1, \dots, T\}, \tag{16}$$
where $WN(0, 4, t)$ is a white Gaussian noise of mean 0 and variance 4. For each input type, a simulation of system (15) of length $T = 300$ has been performed and the corresponding exact radius of information $r_I$, approximate radius of information $\widehat{r}_I$, lower
Fig. 1. Input sequences.
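Under the stated setup, the simulations of system (15) with the inputs (16) can be reproduced with a short script ($WN(0,4,t)$ realized with a pseudo-random Gaussian generator of standard deviation 2; the function names are illustrative, not from the paper):

```python
import numpy as np

def simulate(u, y1=0.0):
    """Simulate system (15): y(t+1) = 0.88 y(t) - 0.12 tanh(15 y(t)) + 0.06 u(t)."""
    y = np.empty(len(u) + 1)
    y[0] = y1                                   # initial condition y(1) = 0
    for t in range(len(u)):
        y[t + 1] = 0.88 * y[t] - 0.12 * np.tanh(15.0 * y[t]) + 0.06 * u[t]
    return y

T = 300
tt = np.arange(1, T + 1)
rng = np.random.default_rng(0)
U1 = 3.0 * np.sin(0.2 * tt)           # slow sinusoid
U2 = 3.0 * np.sin(0.0009 * tt ** 2)   # chirp-like input
U3 = rng.normal(0.0, 2.0, T)          # WN(0, 4): variance 4 -> std 2
# regressors w(t) = [y(t) u(t)] for each input type
regressors = {name: np.column_stack([simulate(u)[:-1], u])
              for name, u in [("U1", U1), ("U2", U2), ("U3", U3)]}
```

Plotting the three regressor clouds against the box $W$ reproduces the qualitative picture of Fig. 2: the white-noise input explores $W$ more uniformly.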
bound $\underline{r}_I$, and upper bound $\overline{r}_I$ have been computed. The sequences $U^{(1)}$, $U^{(2)}$, $U^{(3)}$ used in these simulations are shown in Fig. 1. The regressor has been defined as
$$w(t) = [y(t)\ \ u(t)].$$
The regressor domain of interest has been assumed to be the rectangular region indicated in Fig. 2, defined by
$$W \doteq \{w : w_1 \le 0.35,\ w_1 \ge -0.35,\ w_2 \le 3.5,\ w_2 \ge -3.5\}.$$
The values $\epsilon = 0$ and $\gamma = 1.5$ have been taken on the basis of the procedure proposed in [12]. The values of the exact radius of information $r_I$, approximate radius of information $\widehat{r}_I$, lower bound $\underline{r}_I$, and upper bound $\overline{r}_I$ obtained are shown in Table 1. The fact that $U^{(3)}$ provides a lower radius of information, and hence a higher identification accuracy, could be related to the more uniform exploration of the regressor domain $W$ provided by $U^{(3)}$ with respect to $U^{(1)}$ and $U^{(2)}$. This can be observed in Fig. 2, where the "measured" regressors are shown for the three simulations. Considering the values of the exact and approximate radius of information in Table 1, we can conclude that $U^{(3)}$ is the best input type among $\{U^{(1)}, U^{(2)}, U^{(3)}\}$ to be used for the identification of system (15).
Fig. 2. “Measured” regressors.
Table 1
Values of $r_I$, $\widehat{r}_I$, $\underline{r}_I$, $\overline{r}_I$ corresponding to input sequences $U^{(1)}$, $U^{(2)}$, $U^{(3)}$

              $r_I$    $\widehat{r}_I$    $\underline{r}_I$    $\overline{r}_I$
$U^{(1)}$     1.08     0.88               0.66                 1.1
$U^{(2)}$     0.97     1.00               0.96                 1.04
$U^{(3)}$     0.49     0.49               0.47                 0.51
6. Conclusions

Within the SM-IBC approach to nonlinear system identification, a quantity called the radius of information, giving the worst-case identification error, is defined. The radius of information is important in order to assess the quality of a given model and, more in general, of a whole identification procedure. In this paper, two algorithms for the evaluation of the radius of information have been proposed: the first is exact but requires a complexity exponential in the dimension of the regressor space; the second is approximate and involves a quadratic complexity.
References
[1] J. Chen, G. Gu, Control-Oriented System Identification: An H∞ Approach, Wiley, New York, 2000.
[2] H. Edelsbrunner, Algorithms in Combinatorial Geometry, Springer, Berlin, 1987.
[3] K.R. Godfrey, Perturbation Signals for System Identification, Prentice-Hall International, New York, 1993.
[4] G. Goodwin, R. Payne, Dynamic System Identification: Experiment Design and Data Analysis, Academic Press, New York, 1977.
[5] D. Gorinevsky, On the persistency of excitation in radial basis function network identification of nonlinear systems, IEEE Trans. Neural Networks 6 (1995) 1237–1244.
[6] R. Haber, H. Unbehauen, Structure identification of nonlinear dynamic systems—a survey on input/output approaches, Automatica 26 (1990) 651–677.
[7] H. Hjalmarsson, From experiment design to closed loop control, Automatica 41 (2005) 393–438.
[8] K. Hsu, C. Novara, T. Vincent, M. Milanese, K. Poolla, Parametric and nonparametric curve fitting, Automatica 42 (11) (2006) 1869–1873.
[9] R. Isermann, S. Ernst, O. Nelles, Identification with dynamic neural networks—architectures, comparisons, applications, in: Sysid 97, vol. 3, 1997, pp. 997–1022.
[10] L. Ljung, System Identification: Theory for the User, Prentice-Hall, Upper Saddle River, NJ, 1999.
[11] M. Milanese, J. Norton, H.P. Lahanier, E. Walter, Bounding Approaches to System Identification, Plenum Press, New York, 1996.
[12] M. Milanese, C. Novara, Set membership identification of nonlinear systems, Automatica 40 (6) (2004) 957–975.
[13] M. Milanese, R. Tempo, Optimal algorithms theory for robust estimation and prediction, IEEE Trans. Automatic Control 30 (1985) 730–738.
[14] M. Milanese, A. Vicino, Optimal estimation theory for dynamic systems with set membership uncertainty: an overview, Automatica 27 (1991) 997–1009.
[15] K.S. Narendra, S. Mukhopadhyay, Neural networks for system identification, in: Sysid 97, vol. 2, 1997, pp. 763–770.
[16] E. Novak, Deterministic and Stochastic Error Bounds in Numerical Analysis, vol. 1349, Springer, Berlin, 1988.
[17] C. Novara, Experiment design in nonlinear set membership identification, in: American Control Conference, New York City, USA, 2007.
[18] J.R. Partington, Interpolation, Identification and Sampling, vol. 17, Clarendon Press–Oxford, New York, 1997.
[19] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[20] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P. Glorennec, H. Hjalmarsson, A. Juditsky, Nonlinear black-box modeling in system identification: a unified overview, Automatica 31 (1995) 1691–1723.
[21] J.F. Traub, G.W. Wasilkowski, H. Woźniakowski, Information-Based Complexity, Academic Press, 1988.
[22] G.W. Wasilkowski, H. Woźniakowski, Complexity of weighted approximation over R^d, J. Complexity 17 (2001) 722–740.
[23] H. Woźniakowski, Open problems for tractability of multivariate integration, J. Complexity 19 (2003) 434–444.
Journal of Complexity 23 (2007) 952 – 961 www.elsevier.com/locate/jco
A note on two fixed point problems
Ch. Boonyasiriwat^a, K. Sikorski^{a,∗,1}, Ch. Xiong^{b,1}
^a School of Computing, University of Utah, Salt Lake City, UT 84112, USA
^b Department of Chemistry, University of Utah, Salt Lake City, UT 84112, USA
Received 20 March 2006; accepted 19 April 2007 Available online 10 May 2007 We dedicate this paper to Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract
We extend the applicability of the Exterior Ellipsoid Algorithm for approximating n-dimensional fixed points of directionally nonexpanding functions. Such functions model many practical problems that cannot be formulated in the smaller class of globally nonexpanding functions. The upper bound $2n^2\ln(2/\epsilon)$ on the number of function evaluations for finding $\epsilon$-residual approximations to the fixed points remains the same for the larger class. We also present a modified version of a hybrid bisection–secant method for efficient approximation of univariate fixed point problems in combustion chemistry.
© 2007 Elsevier Inc. All rights reserved.
Keywords: Fixed point problems; Optimal algorithms; Nonlinear equations; Ellipsoid algorithm; Computational complexity
1. Introduction

An upper bound on the number of function evaluations needed to compute an $\epsilon$-residual approximation $x^*$ to some fixed point of a function $f$, i.e. $\|f(x^*) - x^*\|_2 \le \epsilon$, for a function $f$ that is globally nonexpanding in the 2-nd norm, is $2n^2\ln(1/\epsilon)$ in $n$ dimensions (see [3, Section 3]). This bound is realized by the Exterior Ellipsoid Algorithm (EEA). It is much better than the best known bounds $O((1/\epsilon)^2)$ for the Krasnoselski–Mann type iterations [11], and is within a factor of $n$ from the
∗ Corresponding author.
E-mail addresses: [email protected] (Ch. Boonyasiriwat), [email protected] (K. Sikorski), [email protected] (Ch. Xiong).
1 Partially supported by DOE under the C-SAFE center.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.04.004
Ch. Boonyasiriwat et al. / Journal of Complexity 23 (2007) 952 – 961
953
best possible bound $O(n\ln(1/\epsilon))$, $\epsilon \to 0$ [8,10], realized by the Centroid and Interior Ellipsoid Algorithms (IEA). At the conference in Bedlewo (Poland), co-organized in 2004 by Professor Woźniakowski, Dr. Vassin asked us whether these bounds and algorithms could be extended to larger, more practical classes of functions that are only nonexpanding in the direction of fixed points. We stress that these larger classes contain functions that may be globally expanding, may be discontinuous, or may have unbounded derivatives. It turns out that the answer to Dr. Vassin's question is positive. We show, with a simple proof, that the ellipsoid algorithms are applicable to the larger class and that the complexity bounds stay the same as for globally nonexpanding functions. Several numerical tests of a new, numerically stable implementation of the EEA, as well as comparisons with simple iteration and Newton-type methods, are presented in a separate paper [4]. We also introduce a univariate hyper-bisection/secant (HBS) method for approximating fixed points of certain combustion chemistry problems. This algorithm enjoys an average-case number of iterations $O(\log\log(1/\epsilon))$ for computing $\epsilon$-absolute solutions. It is a modification of the bisection–secant method of Novak, Ritter and Woźniakowski, which was proven by them to be optimal in the average case [12], with an average number of function evaluations $O(\log\log(1/\epsilon))$. We stress that the ellipsoid algorithms are not applicable in the infinity-norm case, since the "cutting ball/plane" Lemma 3.1, which makes possible the construction of exterior/interior ellipsoids, does not hold in that case. For the infinity-norm case we developed a Bisection Envelope algorithm (BEFix) [14] and a Bisection Envelope Deep-Cut algorithm (BEDFix) [15] for approximating fixed points of two-dimensional nonexpanding functions. Those algorithms enjoy the minimal number of function evaluations $2\lceil\log_2(1/\epsilon)\rceil + 1$.
We also developed a (non-optimal) recursive fixed point algorithm (PFix) for approximating fixed points of n-dimensional nonexpanding functions with respect to the infinity norm (see [16,17]). We note that the minimal number of function evaluations needed for finding $\epsilon$-residual solutions for expanding functions with expansion factor $\lambda > 1$ is exponential, $\Omega\big((\lambda/\epsilon)^{n-1}\big)$, as $\epsilon \to 0$ [7,6].
2. Classes of functions

Given the domain $B = \{x \in \mathbb{R}^n : \|x\| \le 1\}$, the n-dimensional unit ball, we consider the class of Lipschitz continuous functions
$$B_\rho \equiv \{f : B \to B : \|f(x) - f(y)\| \le \rho\,\|x - y\|,\ \forall x, y \in B\}, \tag{1}$$
where $n \ge 2$, $\|\cdot\| = \|\cdot\|_2$, and $0 < \rho \le 1$; in the case when $0 < \rho < 1$, the functions of the class are contractive.

The bracketing interval $I_i$ is updated according to the sign of $f(x_i)$:
$$I_i = \begin{cases} [\,l_{i-1},\, x_i\,] & \text{if } f(x_i) > 0,\\ [\,x_i,\, r_{i-1}\,] & \text{if } f(x_i) < 0,\\ [\,x_i,\, x_i\,] & \text{if } f(x_i) = 0. \end{cases}$$
The complete method is summarized in the flowchart in Fig. 2.
The BRS method is almost optimal on the average. The average number $m^{\mathrm{aver}}$ of function evaluations for finding an $\epsilon$-approximation to the solution is bounded by
$$m^{\mathrm{aver}} \le \frac{1}{\log\frac{1+\sqrt{5}}{2}}\,\log\log\frac{1}{\epsilon} + A,$$
where $A$ is a constant [12].
For practical combustion simulations, the initial solid bulk temperature $T_0$ usually varies within the interval TE = [280, 460] K and the gas phase pressure $P$ within the interval PR = [0, 3000] atm. To carry out the tests, we selected 60 × 50 evenly spaced grid nodes in the set of parameters TE × PR. Choosing $\epsilon = 10^{-4}$, the average number of iterations is 10.5, where the average is defined as the total number of iterations divided by the number of tested functions. We observed that for low $P$ and high $T_0$ it took, in the worst case, 12–13 iterations to solve the problem. We derive the HBS method, a modification of the BRS method, in order to lower the average and worst-case numbers of iterations.

4.1.2. HBS method
To derive the HBS method, we first divide the parameter set TE × PR = [280, 460] × [0, 3000] into three subdomains $D_i$, $i = 1, 2, 3$, by the two lines $P = 4(T_0 - 250)$ and $P = 15(T_0 - 250)$: the point $(T_0, P)$ belongs to
$$\begin{cases} D_1 & \text{if } P \le 4(T_0 - 250),\\ D_2 & \text{if } 4(T_0 - 250) < P \le 15(T_0 - 250),\\ D_3 & \text{if } P > 15(T_0 - 250). \end{cases}$$
For each subdomain, we run two steps of the hyper-bisection method defined as
$$\mathrm{Hyperbis}_i = l_i + \alpha_i\,(r_i - l_i),$$
where $\alpha_i = \dfrac{T_{s,i} - l_i}{r_i - l_i} \in [0, 1]$. Extensive numerical experiments indicate that in subdomain $D_1$ the solutions are distributed around the point $T_{\min} + \beta\,(T_{\max} - T_{\min})$, where $\beta = 0.12$. We therefore utilize $\alpha_1 = \beta$ for the first step of hyper-bisection, and
$$\alpha_2 = \begin{cases} \lambda & \text{if } f(\mathrm{Hyperbis}_1) < 0,\\ 1 - \lambda & \text{otherwise}, \end{cases}$$
where $\lambda = 0.2$. Those choices guarantee that in most cases the solution is in the interval $[\mathrm{Hyperbis}_1, \mathrm{Hyperbis}_2]$. The same strategy applies to subdomains $D_2$ and $D_3$, with $\beta = 0.18$ for $D_2$ and $\beta = 0.25$ for $D_3$. The parameter $\lambda$ equals 0.2 for all subdomains. Ideally, the solution interval is reduced to 2–5% of its original length after two steps of the hyper-bisection. Thereafter, the BRS method is used to find the solution. Choosing the same set of test functions and $\epsilon = 10^{-4}$, the average number of iterations of the HBS method is 5.7 (worst case 6), as compared with 10.5 (worst case 13) for the BRS method. We remark that the secant method in the BRS algorithm could be replaced by Newton's method in order to get an asymptotically quadratic rate of convergence. This would, however, increase the cost of each iteration by a factor of at least two, since each step of Newton's method requires the computation of a function value and a derivative, whereas a secant step needs only one function evaluation. As a result, the total computational cost would increase.
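For reference, a generic bracketing bisection–secant hybrid can be sketched as follows. This is not the exact BRS/HBS scheme of [12] (it uses the Illinois anti-stagnation rule instead), but it illustrates the idea of combining guaranteed bracket reduction with fast secant steps. Applied to a fixed point problem $x = g(x)$, one takes $f(x) = g(x) - x$.

```python
def hybrid_bisection_secant(f, lo, hi, eps=1e-10, max_iter=200):
    """Bracketing bisection-secant hybrid (a generic sketch, not the paper's
    BRS/HBS): take a secant / false-position step; if it leaves the current
    bracket, fall back to bisection. The Illinois rule (halving the retained
    endpoint's function value) prevents one-sided stagnation.
    Assumes f(lo) * f(hi) <= 0."""
    flo, fhi = f(lo), f(hi)
    assert flo * fhi <= 0, "root must be bracketed"
    side = 0
    for _ in range(max_iter):
        if hi - lo < eps:
            break
        s = (lo * fhi - hi * flo) / (fhi - flo)   # secant step
        if not (lo < s < hi):
            s = 0.5 * (lo + hi)                   # bisection fallback
        fs = f(s)
        if fs == 0.0:
            return s
        if flo * fs < 0:
            hi, fhi = s, fs
            if side == -1:
                flo *= 0.5                        # Illinois trick
            side = -1
        else:
            lo, flo = s, fs
            if side == 1:
                fhi *= 0.5
            side = 1
    return 0.5 * (lo + hi)
```

On smooth problems the secant steps dominate and convergence is superlinear, while the bracket guarantees worst-case robustness.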
4.1.3. Conclusion
A hybrid bisection–secant method was developed for solving nonlinear equations derived from a combustion model. For this specific univariate zero-finding problem, two additional steps of a hyper-bisection method, added to the original algorithm, reduce the average number of iterations from 10.5 to 5.7 and the worst-case number of iterations from 13 to 6. This represents a significant improvement in the cost of carrying out large-scale combustion simulations, since this zero-finding problem has to be solved at every cell and every time step, over billions of cells and millions of time steps.

Acknowledgments
We would like to thank the referees for the comments that significantly improved our paper.

References
[1] A. Ageev, V. Vassin, E. Bessonova, V. Markusevich, Radiosounding ionosphere on two frequencies. Algorithmic analysis of Fredholm–Stiltjes integral equation, in: Theoretical Problems in Geophysics, 1997, pp. 100–118 (in Russian).
[2] R.G. Bland, D. Goldfarb, M. Todd, The ellipsoid method: a survey, Oper. Res. 6 (1981) 1039–1090.
[3] C. Boonyasiriwat, Circumscribed ellipsoid algorithm for fixed point problems, M.S. Thesis, University of Utah, Salt Lake City, UT, 2004.
[4] C. Boonyasiriwat, K. Sikorski, Ch.-W. Tsay, Algorithm XXX: circumscribed ellipsoid algorithm for fixed points, 2007, submitted to ACM TOMS.
[5] L.E. Brouwer, Über Abbildungen von Mannigfaltigkeiten, Math. Ann. 71 (1912) 97–115.
[6] X. Deng, X. Chen, On algorithms for discrete and approximate Brouwer fixed points, in: H.N. Gabov, R. Fagin (Eds.), Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22–24, 2005, ACM, New York, 2005.
[7] M.D. Hirsch, C. Papadimitriou, S. Vavasis, Exponential lower bounds for finding Brouwer fixed points, J. Complexity 5 (1989) 379–416.
[8] Z. Huang, L. Khachiyan, K. Sikorski, Approximating fixed points of weakly contracting mappings, J. Complexity 15 (1999) 200–213.
[9] L. Khachiyan, Polynomial algorithm in linear programming, Soviet. Math. Dokl. 20 (1979) 191–194.
[10] L. Khachiyan, Private email communication to K. Sikorski, 2000.
[11] U. Kochlenbach, Effective uniform bounds from proofs in abstract functional analysis, in: B. Cooper, B. Loewe, A. Sorbi (Eds.), CiE 2005 New Computational Paradigms: Changing Conceptions of What is Computable, Springer, Berlin, 2005.
[12] E. Novak, K. Ritter, H. Woźniakowski, Average case optimality of a hybrid secant-bisection method, Math. Comput. 64 (1995) 1517–1539.
[13] G. Perestonina, I. Prutkin, L. Timerkhanova, V. Vassin, Solving three-dimensional inverse problems of gravimetry and magnetometry for three layer medium, Math. Modeling 15 (2) (2003) 69–76 (in Russian).
[14] S. Shellman, K. Sikorski, A two-dimensional bisection envelope algorithm for fixed points, J. Complexity 18 (2) (2002) 641–659.
[15] S. Shellman, K. Sikorski, Algorithm 825: a deep-cut bisection envelope algorithm for fixed points, ACM Trans. Math. Soft. 29 (3) (2003) 309–325.
[16] S. Shellman, K. Sikorski, A recursive algorithm for the infinity-norm fixed point problem, J. Complexity 19 (6) (2003) 799–834.
[17] S. Shellman, K. Sikorski, Algorithm 848: a recursive fixed point algorithm for the infinity-norm case, ACM Trans. Math. Soft. 31 (4) (2005) 580–587.
[18] K. Sikorski, Bisection is optimal, Numer. Math. 40 (1982) 111–117.
[19] K. Sikorski, Fast algorithms for the computation of fixed points, in: M. Milanese, R. Tempo, A. Vicino (Eds.), Robustness in Identification and Control, Plenum Press, New York, 1982, pp. 49–59.
[20] K. Sikorski, Optimal Solution of Nonlinear Equations, Oxford Press, New York, 2001.
[21] K. Sikorski, C.W. Tsay, H. Woźniakowski, An ellipsoid algorithm for the computation of fixed points, J. Complexity 9 (1993) 181–200.
[22] K. Sikorski, H. Woźniakowski, Complexity of fixed points, J. Complexity 3 (1987) 388–405.
[23] C.W. Tsay, Fixed point computation and parallel algorithms for solving wave equations, Ph.D. Thesis, University of Utah, Salt Lake City, UT, 1994.
[24] V. Vassin, Ill-posed problems with a priori information: methods and applications, Institute of Mathematics and Mechanics, Russian Academy of Sciences, Ural Subdivision, 2005.
[25] V. Vassin, A. Ageev, Ill-posed Problems with A Priori Information, VSP, Utrecht, The Netherlands, 1995.
[26] V. Vassin, E. Eremin, Feyer type operators and iterative processes, Russian Academy of Sciences, Ural Subdivision, Ekaterinburg, 2005 (in Russian).
[27] V. Vassin, T. Sereznikova, Two stage method for approximation of nonsmooth solutions and reconstruction of noisy images, Automat. Telemechanica 2 (2004) 12 (in Russian).
[28] M. Ward, A new modeling paradigm for the steady deflagration of homogeneous energetic materials, M.S. Thesis, University of Illinois, Urbana-Champaign, 1997.
[29] M. Ward, S. Son, M. Brewster, Role of gas- and condensed-phase kinetics in burning rate control of energetic solids, Combust. Theory Modeling 2 (1998) 293–312.
[30] M. Ward, S. Son, M. Brewster, Steady deflagration of HMX with simple kinetics: a gas phase chain reaction model, Combust. Flame 114 (1998) 556–568.
[31] Ch. Xiong, Optimal nonlinear solvers for sub-grid scale combustion models, MS-CES Report, University of Utah, Computational Engineering and Science Program, 2005.
Journal of Complexity 23 (2007) 962–963
http://www.elsevier.com/locate/jco
Author Index for Volume 23

A: Avendaño, Martín, 193
B: Babenko, V.F., 346, 890; Bauer, Frank, 52; Berkes, István, 516; Boonyasiriwat, Ch., 952; Borodachov, S.V., 346; Bournez, Olivier, 317; Briquel, Irénée, 594; Butcher, J.C., 560
C: Calafiore, Giuseppe, 301; Campagnolo, Manuel L., 317; Cattani, Eduardo, 82; Chen, W.W.L., 662; Chèze, Guillaume, 380; Creutzig, Jakob, 867; Cucker, Felipe, 594
D: Dabbene, Fabrizio, 301; Dahlke, Stephan, 614; DeVore, Ronald A., 918; Dick, Josef, 436, 581, 649, 752; Dickenstein, Alicia, 82; Dryja, Maksymilian, 715
F: Fang, Kai-Tai, 740; Fu, Fang-Wei, 423
G: Galvis, Juan, 715; Gill, Hardeep S., 603; Gnewuch, Michael, 262, 828; Graça, Daniel S., 317
H: Hackbusch, Wolfgang, 697; Hainry, Emmanuel, 317; Heinrich, Stefan, 793; Hesse, K., 528; Hesse, Kerstin, 25; Huang, F.L., 73; Hui, Yao, 245
J: Jackiewicz, Z., 560
K: Kacewicz, Boleslaw, 421; Kaltenbacher, Barbara, 225; Kapusta, Joanna, 336; Khoromskij, Boris N., 697; Ko, Ker-I, 2; Krick, Teresa, 193; Kritzer, Peter, 581, 752; Kuo, Frances Y., 25, 752
L: Lazarov, R.D., 498; Lecerf, Grégoire, 380; Lemieux, Christiane, 603; Li, Runze, 740; Lindloh, René, 828
M: Maller, Michael, 217; Margenov, S.D., 498; Mathé, Peter, 673; Meidl, Wilfried, 169; Mhaskar, H.N., 528; Milanese, Mario, 937; Müller-Gronbach, Thomas, 867
N: Nie, Jiawang, 135; Niederreiter, Harald, 1, 169, 423; Novak, Erich, 614, 673; Novara, Carlo, 937
O: Osipenko, K.Yu., 653
P: Papageorgiou, A., 802; Peña, Javier, 245; Pereverzev, Sergei, 52; Pereverzev, Sergei V., 454; Philipp, Walter, 516; Pillichshammer, Friedrich, 436, 581; Plaskota, Leszek, 421
R: Ritter, Klaus, 867; Rivera, Juan Carlos, 245; Rosasco, Lorenzo, 52
S: Sarkis, Marcus, 715; Scheiblechner, Peter, 359; Scheicher, Klaus, 152; Schmid, Wolfgang Ch., 581; Schneider, Reinhold, 828; Schweighofer, Markus, 135; Sickel, Winfried, 614; Sikorski, K., 952; Skorokhodov, D.S., 890; Skriganov, M.M., 926; Sloan, I.H., 528; Sloan, Ian H., 25, 752; Smarzewski, Ryszard, 336; Sombra, Martín, 193; Srivastav, Anand, 828
T: Tarieladze, Vaja, 851; Tempo, Roberto, 301; Thamban Nair, M., 454; Tichy, Robert F., 516; Traub, Joseph F., 1; Travaglini, G., 662; Triebel, Hans, 468
V: Vakhania, Nicholas, 851; Venkateswarlu, Ayineedi, 169; Vera, Juan Carlos, 245; Vybíral, Jan, 773
W: Wasilkowski, Grzegorz, 421; Wedenskaya, E.V., 653; Werschulz, Arthur G., 553; Whitehead, Jennifer, 217; Woźniakowski, Henryk, 1, 262; Wright, W.M., 560; Wu, Qiang, 108
X: Xiong, Ch., 952
Y: Ying, Yiming, 108; Yu, Fuxiang, 2
Z: Zhang, Aijun, 740; Zhang, S., 73; Zhou, Ding-Xuan, 108
Contents continued from inside back cover

Sampling numbers and function spaces
Jan Vybíral
773
Quantum lower bounds by entropy numbers Stefan Heinrich
793
On the complexity of the multivariate Sturm–Liouville eigenvalue problem A. Papageorgiou
802
Cubature formulas for function spaces with moderate smoothness Michael Gnewuch, René Lindloh, Reinhold Schneider, Anand Srivastav
828
Disintegration of Gaussian measures and average-case optimal algorithms Vaja Tarieladze, Nicholas Vakhania
851
Free-knot spline approximation of stochastic processes Jakob Creutzig, Thomas Müller-Gronbach, Klaus Ritter
867
On the best interval quadrature formulae for classes of differentiable periodic functions
V.F. Babenko, D.S. Skorokhodov
890

Deterministic constructions of compressed sensing matrices
Ronald A. DeVore
918

On linear codes with large weights simultaneously for the Rosenbloom–Tsfasman and Hamming metrics
M.M. Skriganov
926

Computation of local radius of information in SM-IBC identification of nonlinear systems
Mario Milanese, Carlo Novara
937
A note on two fixed point problems Ch. Boonyasiriwat, K. Sikorski, Ch. Xiong
952
Author Index for Volume 23
962