Asymptotic theory of quantum statistical inference

Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, ...

Author: Masahito Hayashi

11 downloads 1005 Views 3MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form

DOWNLOAD PDF

Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.

ASYMPTOTIC THEORY OF QUANTUM STATISTICAL INFERENCE Selected Papers Copyright © 2005 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-256-015-7

Printed in Singapore.

December 28, 2004

13:56

WSPC / Master file for review volume with part divider — 9in x 6in

PREFACE

In order to obtain information from a quantum system of interest, we need to perform a quantum measurement and extract the desired information from the obtained data. Needless to say, we had better optimize the above two processes for obtaining information of the quantum system. This research area is called “Quantum Statistical Inference,” and is much required for realizing many quantum information processing tasks. This research field was initiated in the middle of 1960s, and was studied up to 1980 by American and Russian researchers, for example Holevo, Yuen, Kennedy, and Belavkin, etc. Their research is summarized in the following books: [a] C. W. Helstrom, Quantum Detection and Estimation Theory, (Academic Press, New York, 1976). [b] A. S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory, (North-Holland, Amsterdam, 1982); Originally published in Russian (1980). However, these researchers did not consider the asymptotic aspects while the asymptotic theory is essential for the large sample case in statistics. Moreover, an elegant general theory has been established in classical statistical inference theory. In the middle of the 1980s, a different research direction has been started by Nagaoka (who is an expert in mathematical statistics and information geometry), which focused on the asymptotic theory. In the 1990s, several Japanese researchers (Fujiwara, Matsumoto, Ogawa, Hayashi) have been influenced by Nagaoka, and joined Quantum Statistical Inference. Hence, in the 1990s, by combining the mathematical formulation of quantum mechanics and mathematical statistics, these Japanese researchers obtained several good results in Quantum Statistical Inference. Especially, the Japanese v

entire

December 28, 2004

vi

13:56


Asymptotic Theory of Quantum Statistical Inference

researchers have deeply discussed its asymptotic aspects, which had not been studied in the earlier stage. Recently, Quantum Statistical Inference has drawn the attention of several European statisticians (Gill, BandorffNielsen, Jupp, Gut¸˘a, Ballester, etc.) who joined this research field. On the other hand, several different directions of this research area were started in Europe after the 1990s by physicists, Massar, Popescu, D’Ariano, Buˇzek, Keyl, Werner, Bagan, Baig, Mu˜ noz-Tapia, Gisin, Vidal, Latorre, Pascual, Tarrach etc. They were motivated by the foundations of physics. However, despite this progress, there has been no English book covering the great progress in quantum statistical inference since Holevo’s textbook. Therefore, this book: Asymptotic Theory of Quantum Statistical Inference is expected to contribute not only to quantum information science and statistics but also to various other research fields. Moreover, for the reader’s convenience, I add introductions (Chap 0 and introductions for every part), which explain preliminary concepts, historical backgrounds, and related researches. Indeed, some contributors strongly advised me to add introductions for the convienience of readers unfamiliar to quantum statistical inference, and agreed with these introductions. Particularly, a contributor strongly recommended me to introduce some researchers that are not contained in this book. However, these introductions are written based only on my personal point of view. Thus, I, the editor of this book, assume all responsibility for these parts. These introductions are different from the collective opinion of all contributors∗. Therefore, they should be used only as reference materials for the historical significances of contained papers, and their significances should be decided by readers themselves. I wish to thank Japan Science and Technology Agency for covering the reprint permission fee through ERATO Quantum Computation and Information Project. I would like to express my gratitude to Professor Hiroshi Imai and all members of Quantum Computation and Information Project for their warm support. I am particularly indebted to Professor Richard Gill, Professor Keiji Matsumoto, Professor Hiroshi Nagaoka, and Mr. Fuyuhiko Tanaka for a lot of helpful comments on the introductions. I am especially grateful to Mr. Fuyuhiko Tanaka, Dr. Kouki Yonezawa, Mr. Susumu Tokumoto, and Dr. Yosuke Kikuchi for supporting the editing process. Finally, my thanks go to all contributors and translators for their ∗ Indeed,

a contributor sent me a strong objection against containing these introductions because he disagree with the content of these introductions.

entire

December 28, 2004

13:56


Preface

vii

contribution and careful proof reading. Masahito Hayashi ERATO, Quantum Computation and Information Project Japan Science and Technology Agency Hongo, Tokyo, Japan

entire

December 28, 2004

viii

13:56



entire

December 28, 2004

13:56


CONTENTS

Preface

v

First Appearance . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1

List of Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . .

5

0

8

Introduction to Quantum Statistical Inference . . . . . . . . . .

Part I: Hypothesis Testing

17

1

Introduction to Part I . . . . . . . . . . . . . . . . . . . . . . . .

19

2

Strong Converse and Stein’s Lemma in Quantum Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . .

28

Tomohiro Ogawa and Hiroshi Nagaoka

3

The Proper Formula for Relative Entropy and its Asymptotics in Quantum Probability . . . . . . . . . . . . . . . . . . .

43

Fumio Hiai and D´ enes Petz

4

Strong Converse Theorems in Quantum Information Theory . .

64

Hiroshi Nagaoka

5

Asymptotics of Quantum Relative Entropy from a Representation Theoretical Viewpoint . . . . . . . . . . . . . . . . .

66

Masahito Hayashi

6

Quantum Birthday Problems: Geometrical Aspects of Quantum Random Coding . . . . . . . . . . . . . . . . . . . . .

75

Akio Fujiwara

Part II: Quantum Cram´ er-Rao Bound in Mixed States Model

89

7

91

Introduction to Part II . . . . . . . . . . . . . . . . . . . . . . . ix

entire

December 28, 2004

x

8

13:56



A New Approach to Cramér-Rao Bounds for Quantum State Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Hiroshi Nagaoka

9

On Fisher Information of Quantum Statistical Models . . . . . . 113 Hiroshi Nagaoka

10 On the Parameter Estimation Problem for Quantum Statistical Models . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Hiroshi Nagaoka

11 A Generalization of the Simultaneous Diagonalization of Hermitian Matrices and its Relation to Quantum Estimation Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Hiroshi Nagaoka

12 A Linear Programming Approach to Attainable CramérRao Type Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Masahito Hayashi

13 Statistical Model with Measurement Degree of Freedom and Quantum Physics . . . . . . . . . . . . . . . . . . . . . . . . 162 Masahito Hayashi and Keiji Matsumoto

14 Asymptotic Quantum Theory for the Thermal States Family . . 170 Masahito Hayashi

15 State Estimation for Large Ensembles . . . . . . . . . . . . . . . 178 Richard D. Gill and Serge Massar

Part III: Quantum Cram´ er-Rao Bound in Pure States Model

215

16 Introduction to Part III . . . . . . . . . . . . . . . . . . . . . . . 217 17 Quantum Fisher Metric and Estimation for Pure State Models . 220 Akio Fujiwara and Hiroshi Nagaoka

18 Geometry of Quantum Estimation Theory . . . . . . . . . . . . 229 Akio Fujiwara

19 An Estimation Theoretical Characterization of Coherent States . 287 Akio Fujiwara and Hiroshi Nagaoka

20 A Geometrical Approach to Quantum Estimation Theory . . . . 305 Keiji Matsumoto

entire

December 28, 2004

13:56


Contents

Part IV: Group Symmetric Approach to Pure States Model

xi

351

21 Introduction to Part IV . . . . . . . . . . . . . . . . . . . . . . . 353 22 Optimal Extraction of Information from Finite Quantum Ensembles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356 Serge Massar and Sandu Popescu

23 Asymptotic Estimation Theory for a Finite-Dimensional Pure State Model . . . . . . . . . . . . . . . . . . . . . . . . . . 365 Masahito Hayashi

24 Optimal Universal Quantum Cloning and State Estimation . . . 379 Dagmar Bruß, Artur Ekert, and Chiara Macchiavello

25 Bounds for Generalized Uncertainty of Shift Parameter . . . . . 386 Alexander S. Holevo

Part V: Large Deviation Theory in Quantum Estimation

393

26 Introduction to Part V . . . . . . . . . . . . . . . . . . . . . . . 395 27 On the Relation between Kullback Divergence and Fisher Information: From Classical Systems to Quantum Systems . . . 399 Hiroshi Nagaoka

28 Two Quantum Analogues of Fisher Information from a Large Deviation Viewpoint of Quantum Estimation . . . . . . . 420 Masahito Hayashi

29 Estimating the Spectrum of a Density Operator . . . . . . . . . 458 Michael Keyl and Reinhard F. Werner

Part VI: Further Topics on Quantum Statistical Inference

469

30 Introduction to Part VI . . . . . . . . . . . . . . . . . . . . . . . 471 31 Optimal Quantum Clocks . . . . . . . . . . . . . . . . . . . . . . 477 Vladim´ır Buˇzek, R. Derka, and Serge Massar

32 Quantum Channel Identification Problem . . . . . . . . . . . . . 487 Akio Fujiwara

33 Homodyning as Universal Detection . . . . . . . . . . . . . . . . 494 Giacomo Mauro D’Ariano

entire

December 28, 2004

xii

13:56



34 On the Measurement of Qubits . . . . . . . . . . . . . . . . . . . 509 Daniel F. V. James, Paul G. Kwiat, William J. Munro, and Andrew G. White

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539

entire

December 28, 2004

13:56


FIRST APPEARANCE

Part I Chap. 2: T. Ogawa and H. Nagaoka, “Strong converse and Stein’s lemma in quantum hypothesis testing,” IEEE Trans. Inform. Theory, 46, 2428–2433, (2000); quant-ph/9906090, 1999: Received December 27, 1999; revised March 30, 2000. The material in this paper was presented at the 1999 IEEE Informtation Theory and Networking Workshop, Metsovo, Greece, June 27 July 1, 1999. Chap. 3: F. Hiai and D. Petz, “The proper formula for relative entropy and its asymptotics in quantum probability,” Commun. Math. Phys. 143, 99–114, (1991): Received 22 February 1991; in revised form June 3, 1991. Chap. 4: H. Nagaoka, “Strong converse theorems in quantum information theory,” In Proceedings of ERATO Workshop on Quantum Information Science 2001, Univ. Tokyo, Tokyo, Japan, September 6–8, 2001, pp. 33. Chap. 5: M. Hayashi, “Asymptotics of quantum relative entropy from a representation theoretical viewpoint,” J. Phys. A: Math. and Gen., 34, 3413–3419, (2001): Received 23 December 2000; quantph/9704040, 1997. Chap. 6: A. Fujiwara, “Quantum birthday problems: geometrical aspects of quantum random coding,” IEEE Trans. Inform. Theory, 47, 2644–2649, (2001): Received September 5, 2000; revised March 8, 2001. The material in this paper was presented in part at The 23rd Symposium on Information Theory and Its Applications, Aso, Kumamoto, Japan, October 10–13, 2000.

1

entire

December 28, 2004

2

13:56


First Appearance

Part II Chap. 8: H. Nagaoka, “A new approach to Cramér-Rao bounds for quantum state estimation,” IEICE Technical Report, 89, 228, IT 89-42, 9–14, (1989). Chap. 9: H. Nagaoka, “On Fisher information of quantum statistical models,” In Proceedings of 10th Symposium on Information Theory and Its Applications (SITA), Enoshima, Kanagawa, Japan, November 19–21, 1987, p. 241–246. (Originally written in Japanese. It was translated to English by F. Tanaka and Y. Tsuda). Chap. 10: H. Nagaoka, “On the parameter estimation problem for quantum statistical models,” In Proceedings of 12th Symposium on Information Theory and Its Applications (SITA), Inuyama, Aichi, Japan, December 6–9, 1989, p 577–582. Chap. 11: H. Nagaoka, “A generalization of the simultaneous diagonalization of Hermitian matrices and its relation to quantum estimation theory,” Trans. Jap. Soc. Indust. Appl. Math., 1, 43–56, (1991) (Originally written in Japanese. It was translated to English by F. Tanaka and Y. Tsuda). Chap. 12: M. Hayashi, “A linear programming approach to attainable Cramér-Rao type bounds,” in O. Hirota, A.S. Holevo, and C.A. Caves, eds., Quantum Communication, Computing, and Measurement, Plenum, New York, 1997, 99–108: The material in this paper was presented at The Third International Conference on Quantum Communication and Measurement, Hakone, Shizuoka, Japan, September 25–30, 1996. Chap. 13: M. Hayashi and K. Matsumoto, “Statistical model with measurement degree of freedom and quantum physics,” Surikaiseki Kenkyusho Kokyuroku, 1055, 96–110, (1998). The material in this paper was presented at RIMS workshop “Large Deviation and Statistical Inference” RIMS, Kyoto Univ., Kyoto, Japan, March 16–18, 1998. (Originally written in Japanese. It was translated to English by K. Matsumoto). Chap. 14: M. Hayashi, “Asymptotic quantum theory for the thermal states family,” in P. Kumar, G.M. D’ariano, and O. Hirota, eds., Quantum Communication, Computing, and Measurement 2, Kluwer/Plenum, New York, 2000, 99–104: The material in this paper was presented at The Fourth International Conference on Quantum Communication, Measurement, and Computing,

entire

December 28, 2004

13:56


First Appearance

3

Evanston, Illinois, USA, August, 22–27, 1998. Chap. 15: R. Gill and S. Massar, “State estimation for large ensembles,” Phys. Rev. A, 61, 042312, (2000); quant-ph/9902063, 1999: Received 18 February 1999; published 17 March 2000. Part III Chap. 17: A. Fujiwara and H. Nagaoka, “Quantum Fisher metric and estimation for pure state models,” Physics Letters, 201A, 119–124, (1995): Received 3 January 1995; accepted 7 April 1995. Chap. 18: A. Fujiwara, “Geometry of quantum estimation theory,” Part II of A Geometrical Study in Quantum Information Systems, Doctoral thesis, University of Tokyo, (1995). Chap. 19: A. Fujiwara and H. Nagaoka, “An estimation theoretical characterization of coherent states,” J. Math. Phys., 40, 4227–4239, (1999): Received 23 May 1997; accepted for publication 2 June 1999. Chap. 20: K. Matsumoto, “A geometrical approach to quantum estimation theory,” Doctoral thesis. Graduate School of Mathematical Sciences, University of Tokyo, (1997). Part IV Chap. 22: S. Massar and S. Popescu, “Optimal extraction of information from finite quantum ensembles,” Phys. Rev. Lett., 74, 1259, (1995): Received 18 October 1993. Chap. 23: M. Hayashi, “Asymptotic estimation theory for a finitedimensional pure state model,” J. Phys. A: Math. and Gen., 31, 4633–4655, (1998): Received 29 September 1997; quantph/9704041. Chap. 24: D. Bruß, A. Ekert, and C. Macchiavello, “Optimal universal quantum cloning and state estimation,” Phys. Rev. Lett., 81, 2598– 2601, (1998): Received 1 December 1997; quant-ph/9712019. Chap. 25: A.S. Holevo, “Bounds for generalized uncertainty of the shift parameter,” Lecture Notes in Mathematics, 1021, 243–251, (1983). Part V Chap. 27: H. Nagaoka, “On the relation between Kullback divergence and Fisher information — from classical systems to quantum

entire

December 28, 2004

4

13:56


First Appearance

systems —,” Proceedings of Joint Mini-workshop for data compression theory and fundamental open problems in information theory, 63–72, (1992) (Originally written in Japanese. It was translated to English by H. Nagaoka). Chap. 28: M. Hayashi, “Two quantum analogues of Fisher information from a large deviation viewpoint of quantum estimation,” J. Phys. A: Math. and Gen., 35, 7689-7727, (2002): Received 12 February 2002, in final form 14 June 2002 Published 28 August 2002; quantph/0202003. Chap. 29: M. Keyl and R.F. Werner, “Estimating the spectrum of a density operator,” Phys. Rev. A, 64, 052311, (2001): Received 7 March 2001; published 15 October 2001; quant-ph/0102027. Part VI Chap. 31: V. Buˇzek, R. Derka, and S. Massar, “Optimal quantum clocks,” Phys. Rev. Lett., 82, 2207, (1999): Received 17 August 1998; quantph/9808042, 1998. Chap. 32: A. Fujiwara, “Quantum channel identification problem,” Phys. Rev. A, 63, 042304, (2001): Received 25 August 2000; published 16 March 2001. Chap. 33: G.M. D’Ariano, “Homodyning as universal detection,” in O. Hirota, A.S. Holevo, and C.A. Caves, eds., Quantum Communication, Computing, and Measurement, Plenum, New York, 253–264, 1997; quant-ph/9701011: The material in this paper was presented at The Third International Conference on Quantum Communication and Measurement, Hakone, Shizuoka, Japan, September 25– 30, 1996. Chap. 34: D.F.V. James, P.G. Kwiat, W.J. Munro, and A.G. White, “On the measurement of qubits,” Phys. Rev. A, 64, 052312, (2001): Received 20 March 2001; published 16 October 2001; quantph/0103121.

entire

December 28, 2004

13:56


LIST OF CONTRIBUTORS

Dagmar Bruß Chap. 24 Inst. f. Theoret. Physik Universit¨ at Hannover Appelstr. 2, D-30167 Hannover, Germany [email protected]

Akio Fujiwara Chap. 6,17,18,19,32 Department of Mathematics Graduate School of Science Osaka University Toyonaka, Osaka 560-0043, Japan [email protected]

Vladim´ır Buˇ zek Chap. 31 Institute of Physics Slovak Academy of Sciences Dubravska cesta 9, 842 28 Bratislava, Slovakia [email protected]

Richard Gill Chap. 15 Mathematical Institute University Utrecht Box 80010, 3508 TA Utrecht, Netherlands [email protected]

Giacomo Mauro D’Ariano Chap. 33 Dipartimento di Fisica “A. Volta” via Bassi 6, I-27100 Pavia, Italy [email protected]

Masahito Hayashi Chap. 5,12,13,14,23,28 ERATO, Quantum Computation and Information Project, JST Bunkyo-ku, Tokyo 113-0033, Japan [email protected]

R. Derka Chap. 31

Fumio Hiai Chap. 3 Division of Mathematics Graduate School of Information Sciences Tohoku University Aoba-ku, Sendai 980-8579, Japan [email protected]

Artur Ekert Chap. 24 Centre for Quantum Computation University of Cambridge Cambridge CB3 0WA, UK [email protected]

5

entire

December 28, 2004

13:56


6

List of Contributors

Alexander S. Holevo Chap. 25 Steklov Mathematical Institute Gubkina 8, 119991 Moscow, Russia [email protected]

Chiara Macchiavello Chap. 24 Dipartimento di Fisica “A.Volta” Universit` a di Pavia 27100 Pavia, Italy

Daniel F. V. James Chap. 34 Theoretical Division T-4 MS B-283 Los Alamos National Laboratory Los Alamos, NM 87545, USA [email protected]

Serge Massar Chap. 15,22,31 Service de Physique Théorique CP 225 Université Libre de Bruxelles Bvd. du Triomphe, B-1050 Bruxelles, Belgium [email protected]

Michael Keyl Chap. 29 Institut fur Mathematische Physik TU-Braunschweig Mendelssohnstrase 3 38106 Braunschweig, Germany [email protected] Paul G. Kwiat Chap. 34 University of Illinois 1110 W. Green St., Urbana IL 61801-3080, USA [email protected] Keiji Matsumoto Chap. 13,20 Quantum Information Science Group National Institute of Informatics Chiyoda-ku, Tokyo 101-8430, Japan ERATO, Quantum Computation and Information Project, JST Bunkyo-ku, Tokyo 113-0033, Japan [email protected]

William J. Munro Chap. 34 Hewlett-Packard Laboratories Filton Road, Stoke Gifford Bristol BS34 8QZ, UK [email protected] Hiroshi Nagaoka Chap. 2,4,8,9,10,11,17,19,27 Graduate School of Information Systems The University of Electro-communications Chofu-shi, Tokyo 182-8585, Japan [email protected] Tomohiro Ogawa Chap. 2 Dept. of Mathematical Informatics Graduate School of Information Science and Technology University of Tokyo Bunkyo-ku, Tokyo 113-8656, Japan [email protected]

entire

December 28, 2004

13:56


List of Contributors

D´ enes Petz Chap. 3 Department for Mathematical Analysis Mathematical Institute Budapest University of Technology and Economics Megyetem rakpart 3-9, H-1111 Budapest, Hungary [email protected] Sandu Popescu Chap. 22 H H Wills Physics Lab. Bristol University Tyndall Av., Bristol BS8 1TL, UK [email protected] Andrew G. White Chap. 34 Department of Physics University of Queensland Brisbane, QLD 4072, Australia [email protected]

7

Reinhard F. Werner Chap. 29 TU Braunschweig Institut fur Mathematische Physik Mendelssohnstr. 3 38106 Braunschweig, Germany [email protected] Translators of Nagaoka’s paper: Fuyuhiko Tanaka Chap. 9,11 Dept. of Mathematical Informatics Graduate School of Information Science and Technology University of Tokyo Bunkyo-ku, Tokyo 113-8656, Japan [email protected] Yoshiyuki Tsuda Chap. 9,11 21COE, Chuo University Bunkyo-ku, Tokyo 112-8551, Japan [email protected]

entire

December 28, 2004

13:56


CHAPTER 0

Introduction to Quantum Statistical Inference

1. Classical/Quantum Statistical Inference Quantum mechanics has a statistical aspect at its core. Suppose that the state ρ of the given particle is unknown to us. If we want to fully know the density matrix of ρ, we need an infinite ensemble of quantum particles prepared in the same state. However, infinite ensembles do not exist in practice. Given a finite ensemble of identically prepared particles, how well can we estimate the state from the finite ensemble using results of quantum measurement? This problem is called quantum statistical inference, and the same problem for probability distributions has already been studied in mathematical statistics. Researchers in this field have investigated the problem of determining the unknown probability distribution from given finite data coming from it. (Statistical inference for probability distributions is called “classical” statistical inference in contrast with “quantum” statistical inference.) Thus, quantum statistical inference can be regarded as the quantum extension of classical statistical inference. Therefore, in order to comprehend it, we should first briefly outline the classical statistical inference theory.

1.1. Buildup of Classical Statistical Inference Usually researchers in classical statistical inference divide their problems into two classes: estimation and hypothesis testing. In estimation, they estimate the unknown distribution from given finite data. Usually, they assume that the unknown probability distribution belongs to a certain parameterized subset, i.e., a family of probability distributions. Using this assumption, they replace their problem by the estimation of the parameter identifying the distribution. On the other hand, in hypothesis testing, they test whether the given hypothesis is true or not. The formulations of classical estimation are given as follows: In classical statistical inference, we often assume that the data are independent, 8

entire

December 28, 2004

13:56



9

identically distributed (i.i.d.) with the unknown probability distribution p∗ . Based on this assumption, we can easily treat estimation theory in the case where the number n of data is large, i.e., in the large sample case. This is because asymptotic expansions are available in the large sample case. Usually, we evaluate the estimation error by the mean square error (MSE). The MSE of an estimator is almost proportional to 1/n in the large sample case. Hence, in this case, it is suitable to focus on the coefficient of the term 1/n. The optimization theory of this coefficient is called “asymptotic optimality theory,” (or “asymptotic theory”) and is one of the main topics in classical estimation theory. Fortunately, a unified treatment of asymptotic theory is available, which is a great advantage. On the other hand, there is no unified approach for the small sample case, i.e., the case where the number n of data is small. Hence, several methods are proposed as follows. One is the Bayesian approach, in which it is assumed in addition that the parameter is a random variable with a known distribution. This prior distribution is decided based on prior knowledge† . However, we cannot use the Bayesian approach when our prior knowledge is insuffucient to decide the prior distribution. On the other hand, there are several non-Bayesian approaches, e.g., minimum variance unbiased estimation, minimax method, invariance approach, and so on. In the minimum variance unbiased estimation, one demands unbiasedness: the expectation of the estimator always equals the true parameter; and one minimizes the error of the estimators satisfying this condition. However, there is no good practical reason to restrict our estimators to unbiased estimators. In the minimax method, one focuses on the worst error for an estimator given the family of probability distributions. This method provides an attractive estimator when the family of probability distribution has a homogenous structure. Otherwise, unfortunately, the optimal solution is not necessarily an attractive estimator. Further, the invariance approach is useful for a homogenous family of probability distribution, but is not so useful in other families. Thus, there is no one method that can always be successfully applied in the small sample case, i.e., there is no unified approach [0-3]. Similarly, we can consider these methods for hypothesis testing, but ∗ Currently,

many mathematical statisticians treat dependent data, and they obtained several basic results similar to the i.i.d. case. However, the discussion of the dependent case is so difficult that we mainly focus on the i.i.d. case in this book. † The optimal performance in the Bayesian approach almost coincides with those in nonBayesian approaches in the large sample case.

entire

December 28, 2004

10

13:56



they suffer the same drawbacks.

1.2. Research Direction of Quantum Statistical Inference Due to the success of asymptotic theory in classical statistical inference, it is natural to attempt to establish an asymptotic theory in quantum statistical inference. Since we can easily apply classical asymptotic theory to the i.i.d. case, we need the following repeatability in quantum statistical inference: • The unknown state can be prepared independently and identically. This condition is a quantum analogue of the i.i.d. condition and is called quantum i.i.d. condition. It will be more formally formulated in the final section in this chapter. Recently, these desired asymptotic theories have been established by reducing our problem in the large sample case as follows. First, the process of estimating/testing the unknown quantum state is divided into two parts: One is the appropriate choice of the quantum measurement, the other is that of the function providing the desired decision from the obtained data. While the latter belongs to the problem of classical statistics, the former is the central issue in quantum statistical inference. When we perform the same quantum measurement on every system, we can apply classical asymptotic theory. Therefore, our problem is reduced to the optimization of a suitable information quantity determined by our measurement‡ .

2. Significance of Classical/Quantum Statistical Inference We now consider what circumstances enable quantum statistical inference to be useful. First, we should remark that the importance of classical statistical inference depends on the amount of data. For example, if the number of observations is sufficiently huge (e.g., greater than 1,000,000,000) in a simple system, the obtained histogram almost equals the true distribution so that more technical analysis is not required§ . On the other hand, if the number of data is not so huge (e.g., 1,000), by choosing a suitable estimator, we can improve the error. Especially, if the number of data is not ‡ Even

in the quantum i.i.d. case, when we use a correlated measurement on the composite system, we cannot use this reduction. § If the model is complex and the amount of data is huge, we need theoretical analysis for treating data.

entire

December 28, 2004

13:56



11

so huge but sufficiently large, we can apply asymptotic theory so that the approximately optimal error can be simply characterized¶. On the quantum side, it is assumed that the ensemble of quantum systems is prepared independently and identically. Quantum statistical inference is powerful when the size of the ensemble is not huge. Such a situation often arises in quantum information processing. That is, a quantum state generator is needed in quantum information processing, and quantum statistical inference is a very useful method for identifying the quantum state ρ generated by it. Indeed, this method is more useful in the remote preparation of an entangled state because an identical state can be repeatedly produced; but the number of repetitions is limited. Moreover, we can also consider applications to experiments with elementary particles. In this research area, it is often important to precisely estimate physical constants through experiments with energy restrictions . In these experiments, only a limited amount of data is available so that quantum statistical inference can be expected to provide a guideline for the experiment. However, quantum statistical inference (especially, its asymptotic theory) is not necessarily effective for every case. For example, the usual asymptotic theory of quantum statistical inference is not useful in the detection process in optical communication, although quantum statistical inference was initiated in relation to its optimization. This is because the quantum analogue of the i.i.d. setting is not satisfied in this case. But, as is mentioned in Sec. 1.3 of Chap. 1, a modified application is appropriate for the detection process [I-7,I-8]. As another example, we can cite the NMR system. In this case, the ensemble is so huge that we can treat not each individual particles but only the ensemble of particles, i.e., we obtain only the average value of a measured observable with the ensemble as the data. Moreover, the huge size of the ensemble guarantees that the obtained average value almost equals the true average. Therefore, quantum statistical inference has little use in this case.

¶ The complexity of a suitable estimator in the sense of mathematical statistics is also huge in the huge sample case. But, the complexity is not so huge when the number of data is not so huge but sufficiently large. M. Hotta, M. Ozawa, “Quantum Estimation by Local Observables,” Phys. Rev. A, 70, 022327 (2004); quant-ph/0401187.

entire

December 28, 2004

12

13:56



3. Organization of This Book This book consists of the following parts Part I Hypothesis Testing Part II Quantum Cramér-Rao Bound I (Mixed States Model) Part III Quantum Cramér-Rao Bound II (Pure States Model and Geometrical Study) Part IV Group Symmetric Approach to Pure States Model Part V Large Deviation Theory in Quantum Estimation Part VI Further Topics on Quantum Statistical Inference Every part begins with an introduction treating the relations between the included papers and related research. As is mentioned in Preface, these introductions are written based on the editor’s personal point of view, and explain related papers. Part I consists of papers concerning quantum hypothesis testing, especially on quantum Stein’s lemma, and the remaining parts consist of papers concerning quantum estimation. Parts II and III discuss the asymptotic theory of quantum estimation. Part II treats its general theory and the mixed state case. Part III treats the pure state case. In particular, Part III discusses the geometrical approach, which is effective for the pure state case and even for the mixed state case using the method of unbiased estimators. Part IV treats the group symmetry approach in the pure state case, in which the minimax method is effective even in the small sample case. Its optimal solution is also the optimal Bayesian estimator under the invariant distribution. This part also discussed the relation between the small sample case and the large sample case. Part V discusses large deviation theory of quantum estimation. This topic is closely related to hypothesis testing discussed in Part I. Part VI treats the remaining two topics, the estimation of a quantum process and estimation by a fixed quantum measurement. The latter is often called tomography. Chapter 33 treats the estimation of an unknown state with the tomographic method in an infinite-dimensional system. Chapter 34 considers the application of quantum estimation to real physical systems. The reader interested in the mathematical aspects of quantum statistical inference may read Chapter 1 first. The reader interested in the practical aspects of quantum statistical inference may read Chapter 34 first. This chapter explains its experimental significance.

entire

December 28, 2004

13:56



13

Many papers in quantum statistical inference have appeared in the following e-print server∗∗: http://arxiv.org/archive/quant-ph whose preprints are abbreviated as quant-ph/???????. Several other preprints appeared in Mathematical Engineering Technical Reports, which is abbreviated as Math. Eng. Tech. Rep. These technical reports were previously managed by Mathematical Engineering Section of Department of Mathematical Engineering and Information Physics in School of Engineering, the University of Tokyo, and currently, by Department of Mathematical Informatics in Graduate School of Information Science and Technology, the University of Tokyo. Most of them are available at: http://www.keisu.t.u-tokyo.ac.jp/Research/techrep.0.html 4. Mathematical Formulation Here, we review the mathematical foundation of quantum measurement theory [0-1,0-2]. A quantum system is described by a Hilbert space H, which is a finite or infinite dimensional linear space with an Hermitian inner product. Any state of the system H is described by a density matrix/operator ρ, which is a positive semi-definite Hermitian matrix/operator with trace 1. The most general description of quantum measurement is provided by the mathematical concept of positive operator-valued measure (POVM) (or probability operator measure (POM)), which is a resolution of the unity on the system H. Generally, the resolution M = {M (ω)}ω∈Ω is called a POVM if it satisfies the following conditions†† : M (ω) = I. M (ω) = M (ω)∗ , M (ω) ≥ 0, ω∈Ω

When we perform the quantum measurement M = {M (ω)}ω∈Ω on the system H whose state is ρ, the data ω obeys the probability distribution PρM : PρM (ω) = Tr ρM (ω). The quantity PρM (ω) satisfies the axioms of probability as checked in the following case: The positivity of Tr ρM (ω) follows from the positivity of ρ ∗∗ A

contributor has an objection against introducing these preprints as further reading papers. However, the editor has decided to introduce them in order to provide the readers with more intormation. †† Some papers describe an outcome of a POVM by a subscript as M = {M } ω ω∈Ω .

entire

December 28, 2004

14

13:56



and M (ω). The condition ω∈Ω PρM (ω) = 1 is checked by the calculation: ω∈Ω Tr ρM (ω) = Tr ρ ω∈Ω M (ω) = Tr ρI = Tr ρ = 1. Thus, this description is sufficient to treat the output probability, but is not sufficient for the description of state demolition caused by the quantum measurement. Here, we do not treat the case when the set Ω is continuous, but its formulation in the continuous case is mentioned in section 2 of chapter 8. Furthermore, when the two systems H1 and H2 are independently prepared in the states ρ1 and ρ2 , respectively, the composite system is described by H1 ⊗ H2 , and its state is by ρ1 ⊗ ρ2 . Especially, when the n systems are prepared independently and identically, the composite system is given by the n tensor product space H⊗n ≡ H ⊗ · · · ⊗ H and its state is n

described by an n-fold tensor product state ρ⊗n ≡ ρ ⊗ · · · ⊗ ρ. Since this n

can be regarded as a quantum analogue of the independent and identically distributed (i.i.d.) condition, it is called quantum i.i.d. condition. When we perform the same quantum measurement M = {M (ω)}ω∈Ω on each of the components H, the quantum measurement on the composite system H⊗n with n outcomes ω n ≡ (ω1 , . . . ωn ) is described by M n = {M n (ωn )}: ωn ) ≡ M (ω1 ) ⊗ . . . ⊗ M (ωn ). M n (

(1)

We can also choose the k-th measurement M(k,ωk−1 ) as a function of the k − 1 outcomes ωk−1 = (ω1 , . . . ωk−1 ) of the preceding measurements. This measurement on the composite system can be written by the POVM M n = ωn )}: {M n ( M n (ω1 , . . . , ωn ) ≡

n

M(k,ωk−1 ) (ωk ).

(2)

k=1

Such a POVM is called adaptive. On the other hand, if a POVM M n = {M n (ω)} has the following form, it is called separable: F(1) (x) ⊗ . . . ⊗ F(n) (x). (3) M n (ω) = x∈Xω

Here, we should remark that {F(k) (x)} is not neccessarily a POVM. Especially, a POVM M n = {M n (ω)} having the following form is separable. M n (ω) = F(1) (ω) ⊗ . . . ⊗ F(n) (ω). Thus, any adaptive POVM is separable, i.e., the set of separable POVMs is a wider class of POVMs than that of adaptive POVMs. The set of separable POVMs also contains any POVM based on local operations and two-way

entire

December 28, 2004

13:56



15

classical communications‡‡ . Usually, it is very difficult to realize an arbitrary POVM on the composite system. Therefore, it is useful to find a good measurement among POVMs satisfying the same measurement condition (1) or the adaptive condition (2). As is mentioned in Sec. 4 of Chap. 7, the the class (3) is important form a theoretical viewpoint. For further introduction to quantum statistics, statistical researchers may read the survey paper [0-4], physical researchers may read Helstrom’s textbook [0-1], and mathematical researchers may read Holevo’s textbook [0-2]. Also, another book for a desired physical side of quantum statistical inference is planned [0-5]. References [0-1] C.W. Helstrom, Quantum Detection and Estimation Theory, (Academic Press, New York, 1976). [0-2] A.S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory, (North-Holland, Amsterdam, 1982); Originally published in Russian (1980). [0-3] A.W. van der Vaart, Asmyptotic Statistics, (Cambridge Univ. Press, 1998). [0-4] O.E. Barndorff-Nielsen, R.D. Gill, and P.E. Jupp, “On Quantum Statistical Inference,” (with discussion) Journal of the Royal Statistical Society. Series B, 65, 775–816 (2003). ˇ aˇcek, Quantum State Estimation, (Springer, 2004). [0-5] M. Paris and J. Reh´

‡‡ E.

M. Rains, IEEE Trans. Inform. Theory, 47, 2921 (2001).

entire

December 28, 2004

16

13:56



entire

December 28, 2004

13:56


PART I Hypothesis Testing

Chap. 2: T. Ogawa and H. Nagaoka “Strong converse and Stein’s lemma in quantum hypothesis testing” . . 28 Chap. 3: F. Hiai and D. Petz “The proper formula for relative entropy and its asymptotics in quantum probability” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Chap. 4: H. Nagaoka “Strong converse theorems in quantum information theory” . . . . . . . 64 Chap. 5: M. Hayashi “Asymptotics of quantum relative entropy from a representation theoretical viewpoint” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Chap. 6: A. Fujiwara “Quantum birthday problems: Geometrical aspects of quantum random coding” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

entire

December 28, 2004

13:56


entire

December 28, 2004

13:56


CHAPTER 1

Introduction to Part I 1. Quantum Hypothesis Testing 1.1. Setting of Quantum Hypothesis Testing In the quantum hypothesis testing, it is required to decide whether the given hypothesis (which is often called the null hypothesis) for the density is true or not. Especially, the following setting is the simplest, and is called simple hypothesis testing: the null hypothesis for the true density is given as a density ρ and the alternative hypothesis (which is assumed to be valid if the null hypothesis is not valid) is also given as another density σ. Even in this simple setting, no quantum version of the fundamental asymptotic theory had been established until 1990. In the following sections, we briefly review the history of the development of its quantum version with an outline of its formulation. (For a more rigorous formulation, please refer to section 1 of [Chap. 2].) In this problem, we focus on two error probabilities: the first error probability is the probability of incorrectly deciding σ to be the true state when the true state is ρ, and the second error probability is the opposite error probability. In general, these two error probabilities do not have the same importance. Usually, we optimize the second error probability under the constraint that the first error probability is less than a given threshold. In the classical version of this problem, i.e., the test of two probability distributions p(ω) and q(ω), the likelihood ratio p(ω) q(ω) plays an important role. In particular, as mentioned in Neyman Pearson Lemma, the optimal test is given by a likelihood ratio test, which is defined by the likelihood ratio. In the quantum version, Holevo introduced a quantum counterpart of a likelihood ratio test, and obtained the quantum Neyman Pearson Lemma [I-1]. Following this result, Helstrom presented it in a more familiar form [I-2]. However, we cannot derive explicit forms of those error probabilities by only using this method.

19

entire

December 28, 2004

13:56


Introduction to Part I

20

1.2. Quantum Stein’s Lemma In the classical asymptotic i.i.d. setting (i.e., with a sufficiently large i.i.d. classical ensemble), there is an explicit characterization for error probabilities as follows. Here, we focus on the constraint that the first error probability is lower than a given constant . It is known that the optimal second error probability goes to 0 exponentially when the number of data goes to infinity. Furthermore, Stein’s Lemma guarantees that this exponential rate equals the relative entropy∗ D(p q) ≡ ω p(ω)(log p(ω) − log q(ω)). Using this lemma, we can characterize the asymptotic behavior of both of the two error probabilities. It should be noted that this decreasing rate is independent of the constant . As its quantum extension, one may think that this optimal decreasing rate is the quantum relative entropy D(ρ σ) ≡ Tr ρ(log ρ − log σ) even in quantum hypothesis testing. But, the solution of this problem is not so easy. In the quantum setting, when we perform a quantum measurement M , the data obeys a probability distribution PρM . Thus, if we fix a quantum measurement M , our problem is reduced to the test of two probability distributions PρM and PσM . That is, if we perform the same measurement M for the ensemble of the unknown state, the decreasing error rate equals classical relative entropy D(PρM PσM ) by performing a suitable classical data manipulation. Here, we should remark that the monotonicity of the quantum relative entropy† guarantees that D(PρM PσM ) ≤ D(ρ σ). Thus, for the optimal second error decreasing rate to be the quantum relative entropy, the achievement of the above inequality seems to be needed. But, in general, no quantum measurement M is known to satisfy this equality. This fact seems to indicate that the optimal decreasing rate is less than the quantum relative entropy. However, if it is proved that there exists a quantum measurement M n n n on an n-fold composite-system, such that the quantity n1 D(PρM⊗n PσM⊗n ) is sufficiently close to the quantum relative entropy D(ρ σ), we can construct a test whose decreasing error rate is close to the quantum relative entropy D(ρ σ). That is, the existence of such a test guaranteed by the application n n of the classical Stein’s Lemma to the hypotheses PρM⊗n vs. PσM⊗n if there exists such a sequence of quantum measurements. Hiai and Petz [Chap.

∗ In

statistics this is often called Kullback-Leibler divergence. refer to eq.(1) of Chap. 5.

† Please

entire

December 28, 2004

13:56



21

3] ‡ proved its existence§ , i.e., by combining classical Stein’s lemma, they showed that the Quantum relative entropy can be achieved by the optimal second error exponent with the constant constraint. Undoubtedly, their achievement is a great progress in Quantum hypothesis testing. However, Quantum Stein’s lemma was not completed only by the above result because an opposite inequality is required. In information theory, such a possibility part (achieving part, e.g., Hiai-Petz’s result) is called the direct part while the impossibility part is called the converse part. One may consider that the simple combination of the inequality D(PρM PσM ) ≤ D(ρ σ) and classical Stein’s Lemma would yield the converse part of quantum Stein’s Lemma. However, this is not true and more careful discussions are required, because in general, we must treat a sequence of correlated quann tum measurements M n , in which the probability distribution PρM⊗n is not an i.i.d. distribution. Concerning the converse part of Quantum Stein’s lemma, Hiai and Petz derived its weak version, i.e., proved a quantum version of the weak converse, which will be explained in Sec. 3. After their success, Ogawa and Nagaoka succeeded in completely proving its converse part, i.e., the Quantum Stein’s Lemma has been proved [Chap. 2]. After this work, Nagaoka [Chap. 4] simplified its converse part. 1.3. Further Analysis of Quantum Stein’s Lemma The first extension of Hiai-Petz’s result was performed by Hayashi [Chap. 5] with a representation-theoretical method. Their construction of M n satn n isfying n1 D(PρM⊗n PσM⊗n ) → D(ρ σ) depends on ρ and σ. Hayashi showed that such a measurement M n can be chosen independently of ρ, and consequently obtained another proof of the direct part. However, the above-mentioned proofs of quantum Stein’s Lemma require the classical Stein’s Lemma. Hence, these approaches did not directly construct the test achieving the optimal rate. Hayashi [I-3] found a pair of a measurement M n and a classical data process which achieves the optimal rate as follows: First, he characterized a measurement attaining this optimal ‡ A trigger for the work of [Chap.3] was the first meeting of Hiai and Nagaoka in 1990 when they were in the same university (but in different departments), in which Nagaoka asked the possibility of extending Stein’s lemma to the quantum case. § Their method is quite simple, but their paper was written with the notations of operator algebra so that it may not be familiar to researchers in quantum information. This is because at that time there was few researchers in quantum information. Therefore, for the reader’s convenience, Ogawa-Nagaoka’s paper is treated as the first paper.

entire

December 28, 2004

22

13:56



rate with a suitable likelihood test depending only on σ. He checked that the measurement proposed by Hayashi [Chap. 5] satisfies this condition. n The main difficulty is that the probability PρM⊗n is neither identical nor independent, and has been solved with the use of the information-spectrum method, which was introduced by Han¶ . This method is a powerful tool for the analysis of classical hypothesis testing problems in such a general setting. Nagaoka and Hayashi [I-5] focused on a quantum version of the information spectrum approach of hypothesis testing, i.e., they asymptotically treated the simple quantum hypothesis testing problem with a general sequence of two density matrixes that are hypotheses, and obtained general formulas of this setting. In the classical setting, Hoeffding focused on the constraint on the decreasing rate of the second error probability, and obtained the optimal asymptotic decreasing rate of the first error probability under this constraint. Ogawa and Hayashi [I-4] proceeded to the analysis of its quantum version, and derived a lower bound of its optimal rate, which directly yields another proof of the direct part of quantum Stein’s Lemma without use of the classical Stein’s Lemma. This setting can be treated from the viewpoint of non-regular parametric estimation [I-6]. Furthermore, Ogawa and Nagaoka [I-7] remarked that the direct part of the channel coding theorem can be proved based on the direct part of Quantum Stein’s lemma. Following this research direction, Hayashi and Nagaoka [I-8] derived the capacity of the classical-quantum channel in a general setting. They also obtained a simple proof of Hiai-Petz’s result, at Remark 15, whose proof is based on more elementary knowledge. 1.4. Further Topics in Quantum Hypothesis Testing Several quantum processings require the preparation of the maximally entangled state (Bell state). Thus, it is an interesting application of quantum hypothesis testing to test whether the experimentally realized state is truely the maximally entangled state or not. Several suitable measurements for this purpose have been proposed, and its performance and its optimality ¶ T. S. Han, “Hypothesis testing with the general source,” IEEE Trans. Inform. Theory, 46, no.7, 2415–2427 (2000). W. Hoeffding, “On probabilities of large deviations,” Proceedings of Symposium the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 203-219, Berkeley, University of California Press, 1965.

entire

December 28, 2004

13:56



23

have been discussed by the papers [I-9,I-10,I-11]. On the other hand, van Dam et al. [I-12] discussed an optimal experiment for proving nonlocality of quantum mechanics. Furthermore, Osawa and Matsumoto [I-13] considered the time-energy uncertainty relation from the viewpoint of hypothesis testing. 2. Multi-Hypotheses Testing and Distinguishability Since the number of assumed hypotheses is not necessarily two, we need to treat the multi-hypotheses case. Using a linear programming method, Yuen, Kennedy, and Lax [I-14] characterized the minimization problem of the average error probability with arbitrary prior probability∗∗ . This result was generalized to convex optimization problems, e.g., the optimization of mutual information by Holevo [I-15]. When this average error is 0, the set of hypotheses states is said to be distinguishable. While the distinguishability of a set of pure states can be easily characterized by the orthogonality between the elements, its entangled version by local operation and classical communication (LOCC) is slightly complicated. Fan [I-16] obtained several remarkable results in this case. The asymptotic extension of distinguishability has a difficulty that is different from the entangled case. In order to analyze this problem, Fujiwara [Chap. 6] introduced two asymptotic extensions of orthogonality. The first extension is the asymptotic strong orthogonality, where the sum of all of the fidelities between distinct elements goes to 0. The second is the asymptotic weak orthogonality, where the value dividing the above sum by the number of elements goes to 0†† . When every element is randomly generated by a given probability, the maximum size satisfying the asymptotic weak orthogonality is characterized by the von Neumann entropy, but the maximum satisfying the asymptotic strong orthogonality is characterized by the quantum Rényi entropy with degree 2. Here, we should remark that as was solved by Hausladen et al. [I-17], the von Neumann entropy equals the capacity of quantum channel coding with pure states, and that the quantum Rényi entropy with degree 2 coincides with the smaller lower bound of the capacity by Stratonovich and Vantsjan [I-18]. Thus, Fujiwara’s results can ∗∗ Indeed,

this result first appeared in H. P. Yuen, “Communication Theory of quantum systems,” Res. Lab. Electron., M.I.T., Cambridge Mass., Tech. Rep. 482, Aug. 1971. However, as Holevo pointed out to him, the first proof is invalid in the infinite-dimensional case. It was corrected in Yuen, Kennedy, and Lax [I-14]. †† The definition of asymptotic weak orthogonality is seemingly different from the original one. But, they are essentially equivalent.

entire

December 28, 2004

13:56



24

be regarded as their characterizations from the viewpoint of asymptotic weak/strong orthogonality. 3. Mathematical Treatment of Classical Hypothesis Testing In classical simple hypothesis testing with probability distributions p and q on the probability space Ω under null and alternative, any test is described by the acceptance region T for p, which is a subset of Ω. Hence, the first error probability is 1 − p(T ), and the second one is q(T ). Lemma 1: Neyman Pearson Lemma:‡‡ Assume that 1 − p(Tr ) = . Then the equation Tr = argmin q(T )

(1)

1−p(T )≤

holds, where the likelihood ratio test Tr with the ratio r is given by Tr ≡ {i ∈ Ω|pi ≥ rqi }. Here, for the the reader’s convenience, we give a proof of the classical Neyman-Pearson lemma. Proof: Let T be a test satisfying 1 − p(T ) ≤ . Thus, + rq(Tr ) = 1 − p(Tr ) + rq(Tr ) = 1 + (rq − p)(Tr ) ≤1 + (rq − p)(T ) = 1 − p(T ) + rq(T ) ≤ + rq(T ). Therefore, we obtain (1). Next, we proceed to the classical Stein’s lemma. Let pn and q n be the n- independent and identical distributions of p and q, i.e., pn (ω n ) ≡ p(ω1 ) . . . p(ωn ). Lemma 2: Stein’s Lemma: The equation lim

−1 log βn∗ (p q|) = D(p q) n

holds, where we define βn∗ (p q|) as βn∗ (p q|) ≡ ‡‡ If

min

1−pn (Tn )≤

q n (Tn ).

we allow randomized likelihood test, the statement can be extended as follows: For a given > 0, there exists a randomized likelihood test Tr such that 1 − p(Tr ) ≤ and 1 − p(T ) ≤ ⇒ q(Tr ) ≤ q(T ). In statistics, this extended statement is called Neyman Pearson Lemma.

entire

December 28, 2004

13:56



25

Indeed, classical Stein’s lemma can be divided into the following two parts: Direct Part: For any δ > 0, there exists a sequence of acceptance regions Tn of n identical observations such that pn (Tn ) → 0,

lim

−1 log q n (Tn ) ≥ D(p q) − δ. n

(2)

(Strong) Converse Part: −1 log βn∗ (p q|) ≤ D(p q). n

lim

(3)

Combining the above statement, we can prove Stein’s lemma. The following statement is known as a weaker converse statement. Weak Converse Part: If n → 0, then lim

−1 log βn∗ (p q|n ) ≤ D(p q). n

(4)

Here, for reader’s convenience, we give a proof of classical Stein’s lemma. Proof: For arbitrary real number n δ > 0, we choose

an acceptance re gion Tn = ω n = (ω1 , . . . , ωn ) pqn (ω n ) ≥ en(D(pq)−δ) . Then, the first error probability n n n n 1 1 − p (Tn ) = p (log p − log q)(ωi ) < D(p q) − δ ω n i=1 n goes to 0, because the random variable n1 i=1 (log p − log q)(ωi ) goes to D(p q) in pn probability. Thus, for a sufficienltly large N , we have 1 − pn (Tn ) ≤ , for any n ≥ N . On the other hand, the second error probability satisfies q n (Tn ) =

q n (ωn ) ≤

ω n ∈Tn

≤ e−n(D(pq)−δ)

e−n(D(pq)−δ) pn (ωn )

ω n ∈Tn

pn (ωn ) = e−n(D(pq)−δ) .

ωn

Thus, we obtain the direct part (2). Next, we show the opposite inequality lim

−1 log βn∗ (p q|) ≤ D(p q). n

(5)

entire

December 28, 2004

13:56



26

Let Tn be an acceptance region achieving the minimum value βn∗ (p q|), then

βn∗ (p q|) ≥ q n ω n ∈ Tn en(D(pq)+δ) q n (ω n ) ≥ pn (ω n )

≥ e−n(D(pq)+δ) pn ω n ∈ Tn en(D(pq)+δ) q n (ω n ) ≥ pn (ω n )

c , ≥ e−n(D(pq)+δ) pn (Tn ) − pn ω n en(D(pq)+δ) q n (ω n ) ≥ pn (ω n ) where δ is an arbitrary positive number. Since the random variable n 1 p − log q)(ωi ) goes to D(p q) in the pn probability, the probai=1 (log n c goesto 0. This implies that bility pn ω n en(D(pq)+δ) q n (ω n ) ≥ pn (ω n )

c n n n n(D(pq)+δ) n n n n p (Tn ) − p ω e q (ω ) ≥ p (ω ) goes to 1 − > 0. Therefore, we obtain −1 log βn∗ (p q|) ≤ D(p q) + δ. n Since δ is arbitrary, we obtain the converse part (3). lim

Further Reading [I-1] A. S. Holevo, “An analog of the theory of statistical decisions in noncommutative theory of probability,” Trudy Moskov. Mat. Obˇsˇc., 26, 133–149 (1972). (English translation is Trans. Moscow Math. Soc., 26, 133–149 (1972).) [I-2] C. W. Helstrom, Quantum Detection and Estimation Theory, (Academic Press, New York, 1976). [I-3] M. Hayashi, “Optimal sequence of quantum measurements in the sense of Stein’s lemma in quantum hypothesis testing,” J. Phys. A: Math. and Gen., 35, 10759–10773 (2002); quant-ph/0208020. [I-4] T. Ogawa and M. Hayashi, “On Error Exponents in Quantum Hypothesis Testing,” IEEE Trans. Inform. Theory, 50, 1368–1372 (2004); quant-ph/0206151. [I-5] H. Nagaoka and M. Hayashi, “An Information-Spectrum Approach to Classical and Quantum Hypothesis Testing for Simple Hypotheses,” quant-ph/0206185. [I-6] Y. Tsuda and K. Matsumoto, “Quantum estimation for nondifferentiable models,” quant-ph/0207150. [I-7] T. Ogawa and H. Nagaoka, “A New Proof of the Channel Coding Theorem via Hypothesis Testing in Quantum Information Theory,” quant-ph/0208139.

entire

December 28, 2004

13:56



27

[I-8] M. Hayashi and H. Nagaoka, “General formulas for capacity of classical-quantum channels,” IEEE Trans. Inform. Theory, 49, 1753– 1768 (2003); quant-ph/0206186. [I-9] S. Virmani and M. B. Plenio, “Construction of extremal local positiveoperator-valued measures under symmetry,” Phys. Rev. A, 67, 062308 (2003). [I-10] M. Barbieri, F. De Martini, G. Di Nepi,, P. Mataloni, G. M. D’Ariano, and C. Macchiavello, “Experimental Detection of Entanglement with Polarized Photons,” quant-ph/0307003 (2003). [I-11] Y. Tsuda, M. Hayashi, and K. Matsumoto, in preparation. [I-12] W. van Dam, R. D. Gill, and P. D. Gr¨ uwald, “The statistical strength of nonlocality proofs,” quant-ph/0307125 (2003). [I-13] S. Osawa and K. Matsumoto, “Considerations in the Time-Energy Uncertainty Relation from the Viewpoint of Hypotheis Testing,” in O. Hirota, A.S. Holevo, and C.A. Caves, eds., Quantum Communication, Computing, and Measurement, Plenum, New York, 1997, 381–386. [I-14] H. P. Yuen, R. S. Kennedy and M. Lax, “Optimum Testing of Multiple Hypotheses in Quantum Detection Theory,” IEEE Trans. Inform. Theory, 21, 125-134 (1975). [I-15] A. S. Holevo, “Statistical decision theory for quantum systems,” J. Multivariate Analysis, 3, 337-394 (1973). [I-16] H. Fan, “Distinguishability and indistinguishability by local operations and classical communication,” Phys. Rev. Lett., 92, 177905 (2004); quant-ph/0311026. [I-17] P. Hausladen, R. Jozsa, B. Schumacher, M. Westmoreland, W. Wootters, “Classical information capacity of a quantum channel,” Phys. Rev. A, 54, 1869–1876 (1996). [I-18] R. L. Stratonovich and A. G. Vantsjan, “On asymptotically errorless decoding in pure quantum channels,” Probl. Control Inform. Theory, 7, 161–174 (1978).

entire

December 28, 2004

13:56


CHAPTER 2

Strong Converse and Stein’s Lemma in Quantum Hypothesis Testing Tomohiro Ogawa and Hiroshi Nagaoka Abstract. The hypothesis testing problem for two quantum states is treated. We show a new inequality between the errors of the first kind and the second kind, which complements the result of Hiai and Petz to establish the quantum version of Stein’s lemma. The inequality is also used to show a bound on the probability of errors of the first kind when the power exponent for the probability of errors of the second kind exceeds the quantum relative entropy, which yields the strong converse in quantum hypothesis testing. Finally, we discuss the relation between the bound and the power exponent derived by Han and Kobayashi in classical hypothesis testing. Index Terms: Quantum hypothesis testing, quantum information theory, quantum relative entropy, Stein’s lemma, strong converse.

1. Introduction Let H be a Hilbert space which represents a physical system of interest. We assume dim H < ∞ for mathematical simplicity. Let L(H) be the set of linear operators on H and denote the set of density operators on H by S(H) = {ρ ∈ L(H) | ρ = ρ∗ ≥ 0, Tr[ρ] = 1}. def

We study the hypothesis testing problem for the null hypothesis H0 : ρ⊗n ∈ S(H⊗n ) versus the alternative hypothesis H1 : σ ⊗n ∈ S(H⊗n ), where ρ⊗n and σ ⊗n are the nth-tensor powers of arbitrarily given density operators ρ and σ in S(H). The problem is to decide which hypothesis is true based on a two-valued quantum measurement {An (H0 ), An (H1 )}, where An (H0 ) and An (H1 ) are positive semidefinite operators on H⊗n obeying An (H0 ) + An (H1 ) = In (the identity in L(H⊗n )). In the sequel, an operator An ∈ L(H⊗n ) satisfying 0 ≤ An ≤ In , or a sequence {An }∞ n=1 of such operators, is called a test, identifying it with the measurement {An (H0 ), An (H1 )} = c 2000 IEEE. Reprinted, with permission, from IEEE Trans. Inform. Theory, 46, 24282433, 2000. 28

entire

December 28, 2004

13:56


Strong Converse and Stein’s Lemma in Quantum Hypothesis Testing

29

{An , In − An }. For a test An , the error probabilities of the first kind and the second kind are, respectively, defined by αn (An ) = Tr[ ρ⊗n (In − An ) ] and βn (An ) = Tr[ σ ⊗n An ]. def

def

We see that αn (An ) is the probability of erroneously accepting σ ⊗n when ρ⊗n is true and βn (An ) is the error probability of the converse situation. Our interest lies in the asymptotic behavior of these error probabilities when n goes to ∞. In [8], Hiai and Petz proved an important theorem which claims that the quantum relative entropy def

D(ρ||σ) = Tr[ ρ(log ρ − log σ) ]

(1)

is obtained as a limit of classical relative entropies (Kullback divergences). Applying this theorem to the hypothesis testing problem, they showed that for any ε > 0 lim sup n→∞

1 log βn∗ (ε) ≤ −D(ρ||σ), n

(2)

where βn∗ (ε) = min{βn (An ) | An ∈ L(H⊗n ), 0 ≤ An ≤ I, αn (An ) ≤ ε}. def

(3)

In other words, for each ε > 0 there exists a test {An } such that αn (An ) ≤ ε for all n and βn (An ) ∼ e−nr with r ≥ D(ρ||σ) as n → ∞. Here we note that the ε-constraint on αn (An ) can be strengthened to the convergence of αn (An ) to 0 while keeping the rate condition on βn (An ) unchanged; i.e., there exists a test {An } satisfying lim αn (An ) = 0

n→∞

and

lim sup n→∞

1 log βn (An ) ≤ −D(ρ||σ). n

(4)

Indeed, it follows from (2) that for each ε > 0 there exist a number n0 (ε) and a test {An (ε) ∈ L(H⊗n )}∞ n=n0 (ε) such that αn (An (ε)) ≤ ε and (1/n) log βn (An (ε)) ≤ −D(ρ||σ) + ε for all n ≥ n0 (ε). An example of {An } satisfying (4) is then given by An = An (εk ) if

n0 (εk ) ≤ n < n0 (εk+1 ),

where {εk }∞ k=1 is an arbitrary sequence of positive numbers converging monotonically to 0. These results are categorized into the “direct part” in the sense that they are concerned with existence of good tests. For the “converse part”

entire

December 28, 2004

13:56



30

concerned with nonexistence of too good tests, Hiai and Petz obtained 1 1 D(ρ||σ) ≤ lim inf log βn∗ (ε). − (5) n→∞ n 1−ε They derived this from nD(ρ||σ) = D(ρ⊗n ||σ ⊗n ) ≥ d(αn (An )||1 − βn (An )) = −h(αn (An )) − αn (An ) log(1 − βn (An )) − (1 − αn (An )) log βn (An ) ≥ − log 2 − (1 − αn (An )) log βn (An ),

(6)

where def

d(p||q) = p log

1−p p + (1 − p) log , q 1−q

def

h(p) = −p log p − (1 − p) log(1 − p), and the first inequality follows from the monotonicity of the quantum relative entropy [10, 15] (see also Remark 3 below). Note that there is a gap between (2) and (5). In this paper∗ we will derive a new lower bound on the error probabilities which is tighter than (6), whereby the preceding results (2) and (5) will be complemented to establish the quantum version of Stein’s lemma (see e.g. [3], p. 115): 1 log βn∗ (ε) = −D(ρ||σ). n Furthermore, the same bound will be applied to investigate the asymptotic behavior of αn (An ) under the exponential-type constraint on βn (An ) and to prove the strong converse to (4) whose classical counterpart is found in [2, 7]; i.e., if βn (An ) goes to 0 with a rate βn (An ) ∼ e−nr for some r > D(ρ||σ) then αn (An ) necessarily goes to 1. Note that (6) can prove only the weak converse that αn (An ) does not go to 0 under the same assumption. Finally, we will compare our result with that of Han and Kobayashi [7] in classical hypothesis testing. lim

n→∞

2. A Bound on the Error Probabilities In the sequel we assume that Rng ρ ⊂ Rng σ, where Rng denotes the range of an operator, so that the relative entropy and the quantity Tr[ ρ1+s σ −s ] for s > 0 are well defined. Obviously the assumption is satisfied if ρ and σ are strictly positive operators. ∗A

brief summary of the results was published in [12].

entire

December 28, 2004

13:56



31

Let λ be an arbitrary real number and let the spectral decomposition of the operator ρ⊗n − enλ σ ⊗n be written as ρ⊗n − enλ σ ⊗n = µn,j En,j , (7) j∈Jn

where {µn,j }j∈Jn are the eigenvalues and {En,j }j∈Jn are the operators representing the orthogonal projections onto the eigenspaces. Define a test Xn,λ by def En,j , Xn,λ = j∈Kn def

where Kn = {j | µn,j ≥ 0}. Noting the equivalence pn (xn ) 1 log n n ≥ λ ⇐⇒ pn (xn ) − enλ q n (xn ) ≥ 0, n q (x ) we see that Xn,λ is a quantum analog of the classical likelihood ratio test. Indeed, Xn,λ turns out to be optimal in the sense of Neyman-Pearson or that of Bayes according to the quantum Neyman-Pearson lemma (see [6], p. 108) which is based on the following fact. Lemma 1: Every test An satisfies Tr[ (ρ⊗n − enλ σ ⊗n )Xn,λ ] ≥ Tr[ (ρ⊗n − enλ σ ⊗n )An ]. Proof: Tr[ (ρ⊗n − enλ σ ⊗n )An ] = ≤

µn,j Tr[ En,j An ] ≤

j∈Jn

(8)

µn,j Tr[ En,j An ]

j∈Kn

µn,j Tr[ En,j ] = Tr[ (ρ⊗n − enλ σ ⊗n )Xn,λ ].

j∈Kn

We use this lemma to derive a new lower bound on the error probabilities of quantum tests. Theorem 2: For any test An and any λ ∈ R we have 1 − αn (An ) ≤ e−nϕ(λ) + enλ βn (An ),

(9)

where def

ϕ(λ) = max {λs − ψ(s)}

(10)

ψ(s) = log Tr[ ρ1+s σ −s ].

(11)

0≤s≤1

def

entire

December 28, 2004

13:56


entire


32

Proof: Define probability distributions pn = {pn,j }j∈Jn and qn = {qn,j }j∈Jn on the set Jn by pn,j = Tr[ ρ⊗n En,j ] qn,j = Tr[ σ ⊗n En,j ]. Noting that µn,j Tr[ En,j ] = pn,j − enλ qn,j follows from (7) and using the Markov inequality, we have for ∀s ≥ 0 p

n,J pn,j = Pr ≥ enλ J ∼ pn Tr[ ρ⊗n Xn,λ ] = qn,J j∈Kn p s n,J −s ≤ e−nλs E p1+s J ∼ pn = e−nλs n,j qn,j qn,J j∈Jn

where J ∼ pn means that J is a random variable subject to pn . Now we restrict the range of s to 0 ≤ s ≤ 1 and invoke the following inequality (see Remark 3 below): −s ⊗n 1+s ⊗n −s p1+s ) (σ ) . (12) n,j qn,j ≤ Tr(ρ j∈Jn

We thus have Tr[ ρ⊗n Xn,λ ] ≤ min e−nλs Tr(ρ⊗n )1+s (σ ⊗n )−s = min e−n{λs−ψ(s)} = e−nϕ(λ) . 0≤s≤1

0≤s≤1

Combining this with (8), the theorem is proved as follows: 1 − αn (An ) = Tr[ ρ⊗n An ] ≤ Tr[ (ρ⊗n − enλ σ ⊗n )Xn,λ ] + enλ Tr[ σ ⊗n An ] ≤ Tr[ ρ⊗n Xn,λ ] + enλ Tr[ σ ⊗n An ] ≤ e−nϕ(λ) + enλ βn (An ).

Remark 3: As a quantum extension of Csiszár’s f -divergence ([4]), Petz [13, 14] introduced a quantity, which we denote by Sf (ρ||σ), for arbitrary density operators (or arbitrary states on an arbitrary von Neumann algebra) ρ, σ and an arbitrary operator convex function f : R+ → R. Here we say that f is operator convex (see e.g. [1], p. 123) if f ((1 − λ)A + λB) ≤ (1 − λ)f (A) + λf (B) holds for all positive matrices A, B, and all 0 ≤ λ ≤ 1. Using the spectral representations ai Fi σ = bj Gj , (13) ρ= i

j

December 28, 2004

13:56



33

where {ai }, {bj } are the eigenvalues and {Fi }, {Gj } are the orthogonal projections onto the eigenspaces, Sf (ρ||σ) is represented as bj Sf (ρ σ) = Tr Fi Gj . ai f (14) ai i,j The definition of Sf (ρ||σ) is summarized in Appendix A. The relative entropy D defined by (1) is an example of Sf corresponding to the operator convex function f (u) = − log u. Other important operator convex functions are f (u) = −ut for 0 ≤ t ≤ 1 and f (u) = ut for 1 ≤ t ≤ 2 or −1 ≤ t ≤ 0, and the corresponding f -divergences are represented as   − Tr[ρ1−t σ t ], if 0 ≤ t ≤ 1 (15) Sf (ρ||σ) =  1−t t Tr[ρ σ ], if 1 ≤ t ≤ 2 or − 1 ≤ t ≤ 0. Petz showed that the monotonicity property ([10, 15]) of the relative entropy can be extended to the class of Sf . In particular, Sf is monotone with respect to an arbitrary measurement (positive operator-valued measure or POM) {Ej }, which is a family of positive semidefinite operators satisfying j Ej = I, in the following sense. Let {pj } and {qj } be the probability distributions determined from ρ, σ and {Ej } by pj = Tr[ρEj ] and qj = Tr[σEj ]. Then the f -divergence def Sf (p||q) = pj f (qj /pj ) j

is not greater than Sf (ρ||σ). In the case of (15) we have t Tr[ρ1−t σ t ] ≤ p1−t if 0 ≤ t ≤ 1, j qj

(16)

j

and Tr[ρ1−t σ t ] ≥

t p1−t j qj

if

1 ≤ t ≤ 2 or − 1 ≤ t ≤ 0.

(17)

j

Now we can see that inequality (12) is a special case of (17). We also note that the monotonicity of − Tr[ρ1−t σ t ] for 0 ≤ t ≤ 1 including (16) is closely related to the Wigner-Yanase-Dyson-Lieb concavity (e.g. [9]) and follows from the interpolation theory of Uhlmann ([15]) as well as from Petz’s theorem. Remark 4: The proof of Theorem 2 is essentially based on an argument which is familiar in classical large deviation theory. It is interesting to compare it with the proof of Cramér’s theorem (e.g. [5], p. 26).

entire

December 28, 2004

13:56



34

3. Properties of ψ(s) and ϕ(λ) Let us observe some properties of the functions ψ(s) and ϕ(λ) for the later argument. First, we have d −ψ(s) 1+s −s e ρ σ ds = −e−ψ(s) ψ (s)ρ1+s σ −s + e−ψ(s) ρ1+s (log ρ)σ −s − e−ψ(s) ρ1+s (log σ)σ −s = e−ψ(s) ρ1+s A σ −s

(18)

where A = log ρ − log σ − ψ (s). def

Since the trace of e−ψ(s) ρ1+s σ −s is identically equal to 1 by the definition of ψ, its derivative vanishes and we obtain from (18) Tr[ρ1+s A σ −s ] = 0,

(19)

ψ (s) = e−ψ(s) Tr[ρ1+s (log ρ − log σ)σ −s ].

(20)

or, equivalently,

This implies that ψ (0) = D(ρ||σ) since ψ(0) = 0. From (19) we also have d −ψ(s) e Tr[ρ1+s A σ −s ] ds

d d e−ψ(s) σ −s ρ1+s A + e−ψ(s) Tr ρ1+s A σ −s . = Tr ds ds Substituting (18) and (d/ds)A = −ψ (s) into this, we have 0=

ψ (s) = e−ψ(s) Tr[ρ1+s Aσ −s A] 1+s ∗ 1+s s s ρ 2 Aσ − 2 >0 = e−ψ(s) Tr ρ 2 Aσ − 2

(21)

which means that ψ(s) is a convex function. Next, ϕ(λ) is the Legendre transformation of ψ(s) and hence is convex. Letting def

sˆ(λ) = arg max{λs − ψ(s)} 0≤s≤1

so that ϕ(λ) = λˆ s(λ) − ψ(ˆ s(λ)), we can see from Figs. 2.1 and 2.2 the following facts. (i)

If λ ≤ ψ (0) = D(ρ||σ), sˆ(λ) = 0 and ϕ(λ) = 0.

entire

December 28, 2004

13:56



35

s

( )

s

'() s^() 1

0

Fig. 2.1.

2

0

(1)

D(jj ) s s

The graph of ψ(s)

(1)

'()

r

D(jj ) 0

(1)

(1)

u(r )

0

D(jj )

Fig. 2.2.

(r)

0

(1)

2

0

(1)

(1)

The graph of ϕ(λ)

(ii) If D(ρ||σ) < λ < ψ (1), 0 < sˆ(λ) < 1,

ϕ(λ) > 0

s(λ)) = λ. and ψ (ˆ

(iii) If λ > ψ (1), sˆ(λ) = 1 and ϕ(λ) = λ − ψ(1) > 0. Although not essential in the following, it may be worth pointing out that for 0 ≤ s ≤ 1 we have the inverse Legendre transformation ψ(s) =

max

−∞ 0 and 1 − ε − e−nϕ(λ) > 0 for sufficiently large n. Thus (24) yields 1 1 log βn∗ (ε) ≥ −λ + log(1 − ε − e−nϕ(λ) ), n n and hence 1 lim inf log βn∗ (ε) ≥ −λ. n→∞ n Since λ can be arbitrarily close to D(ρ||σ), (23) has been proved. 5. The Strong Converse Property For each r > 0, there uniquely exists a number λ∗ = λ∗ (r) such that ϕ(λ∗ ) = r − λ∗ (see Fig. 2.2), and we define u(r) = ϕ(λ∗ (r)) = r − λ∗ (r). def

Theorem 6: For any test {An } and any r > 0, if 1 lim sup log βn (An ) ≤ −r n→∞ n then 1 lim sup log(1 − αn (An )) ≤ −u(r). n→∞ n

(25)

(26)

(27)

entire

December 28, 2004

13:56



37

Proof: Given an arbitrary δ > 0, it follows from the assumption (26) that there exists a number n0 such that βn (An ) ≤ e−n(r−δ)

∀n ≥ n0 .

∗

Then letting λ = λ (r) in (9), we have ∗

∗

1 − αn (An ) ≤ e−nϕ(λ ) + e−n(r−δ−λ

)

= e−nu(r) + e−n(u(r)−δ) ,

∀n ≥ n0 ,

and hence lim sup n→∞

1 log(1 − αn (An )) ≤ −u(r) + δ. n

Since δ > 0 is arbitrary, (27) has been proved. Now we can easily see that if r > D(ρ||σ) then u(r) > 0 (see Fig. 2.2). Thus we conclude that the strong converse holds as follows. Corollary 7: For any test An , if 1 lim sup log βn (An ) < −D(ρ||σ), n n→∞

(28)

then αn (An ) goes to 1 exponentially. A compact expression of the function u(r) is given in the following theorem. This is an analog of the representation of the rate function for classical hypothesis testing given by Blahut (Theorem 7 [3]), although what he treated there was not the exponent of 1 − αn (An ) but that of αn (An ). Theorem 8:

u(r) = max

0≤s≤1

s 1 r− ψ(s) . 1+s 1+s

(29)

Proof: For any r > 0 and any 0 ≤ s ≤ 1, we obtain from (25) u(r) = ϕ(λ∗ (r)) ≥ λ∗ (r)s − ψ(s)

(30)

and u(r) = r − λ∗ (r). Eliminating λ∗ (r) from these equations, we have s 1 r− ψ(s). (31) 1+s 1+s Now there exists a number s in 0 ≤ s ≤ 1 achieving the equality in (30) by the definition of ϕ(λ), and the s achieves the equality in (31) as well. Consequently, we obtain (29). u(r) ≥

entire

December 28, 2004

13:56



38

6. Relation with Classical Hypothesis Testing In this section, we discuss the relation between u(r) and the power exponent derived by Han and Kobayashi [7] for classical hypothesis testing. Let p and q be arbitrary probability distributions on a finite set X and denote their nth independent and identically distributed (i.i.d.) extensions by pn and q n . We consider the hypothesis testing problem for the hypotheses pn and q n and denote the error probabilities by αn (An ) = pn (Acn ) and βn (An ) = q n (An ) for an acceptance region An ⊂ X n . After Blahut [3] showed that if βn (An ) ≤ e−nr with r > D(p||q) then αn (An ) necessarily tends to 1 as n → ∞ at least with a polynomial rate, Han and Kobayashi [7] proved that the actual rate is exponential and found the optimal exponent. Namely, letting α∗n (r) = min{αn (An ) | An ⊂ X n , βn (An ) ≤ e−nr }, def

they derived lim inf n→∞

1 log(1 − α∗n (r)) = −˜ u(r), n

(32)

where def

u ˜(r) =

min pˆ : D(p||q)≤r ˆ

{D(ˆ p||p) + r − D(ˆ p||q)} .

(33)

Now assume that r is greater than but sufficiently near D(p||q). Then as remarked in [7], the minimum of (33) is attained with D(ˆ p||q) = r and hence we have u ˜(r) =

min pˆ : D(p||q)=r ˆ

D(ˆ p||p).

As shown in Appendix B, (34) is rewritten as    s  1 r− log u ˜(r) = max p1+s qj−s , j 0≤s≤1  1 + s  1+s

(34)

(35)

j∈X

which just corresponds to (29). It is interesting to observe that a direct translation of the argument of Appendix B for deriving (35) from (34) into the quantum setting yields s 1 r− D(ˆ ρ||ρ) = max ψ(s) min 0≤s≤1 1 + s 1+s ρˆ : D(ρ||σ)=r ˆ where

def ψ(s) = log Tr e(1+s) log ρ−s log σ .

entire

December 28, 2004

13:56



39

By the Golden-Thompson inequality Tr eA+B ≤ Tr eA eB for Hermitian operators A, B (see e.g. [1], p. 261), we can see that ψ(s) ≥ ψ(s) and that the equality holds if and only if ρ and σ commute. Thus we have min

ρˆ : D(ρ||σ)=r ˆ

D(ˆ ρ||ρ) ≥ ϕ(λ∗ ).

(36)

7. Concluding Remarks We have derived a new inequality for quantum hypothesis testing problem and proved the quantum Stein’s lemma and the strong converse property as its applications. The method used here is a kind of combination of classical large deviation technique such as Cramér’s theorem and the use of the monotonicity property of some quantum information measures. It is interesting to compare the results and the argument of this paper with those of [11] which proved the strong converse in the quantum channel coding problem. The former is essentially based on the operator convexity of f (u) = ut for 1 ≤ t ≤ 2 or −1 ≤ t ≤ 0, while the latter relies upon the operator concavity and the operator monotonicity of f (u) = ut for 0 ≤ t ≤ 1. In classical hypothesis testing the optimal exponent has been obtained by [7]. On the other hand, it is yet to be determined whether the exponent u(r) derived here for the quantum hypothesis testing is optimal or not. Appendix A. Definition of Sf (ρ||σ) Here we summarize the original definition of Sf (ρ σ) given by Petz [14], translating it into our notation, for readers’ convenience. Given arbitrary strictly positive density operators ρ, σ ∈ S(H), define the relative modular operator ∆ = ∆σ,ρ : L(H) → L(H) by ∆(A) = σAρ−1 def

∀A ∈ L(H)

or, equivalently, by + ∆(A), B− ρ = A, Bσ

∀A, B ∈ L(H)

entire

December 28, 2004

13:56



40

+ where , − ρ and , σ denote the inner products on L(H) such that

∗ def def ∗ A, B− and A, B+ ρ = Tr ρA B σ = Tr σBA . Then since ∆ is Hermitian and positive definite as an operator on the Hilbert space (L(H), , − ρ ), the ordinary operator calculus enables us to define the Hermitian operator f (∆) on the same Hilbert space for an arbitrary function f : R+ → R. Now let ∗ def SfA (ρ σ) = A, f (∆)(A)− ρ = Tr ρA f (∆)(A) for an arbitrary operator A ∈ L(H), and let def

Sf (ρ σ) = SfI (ρ σ). Petz called SfA (ρ σ) an quasientropy and proved its monotonicity in the case when f is operator convex. Using the spectral representations (13) we have bj bj Gj Fi ∆(Gj Fi ) = Gj Fi and f (∆)(Gj Fi ) = f ai ai which, combined with I =

i

j

Gj Fi , leads to (14).

B. Proof of (35) Let us define a family of probability distributions {p(s) | − ∞ < s < ∞} including p(0) = p and p(−1) = q as follows: p(s)j = e−ψ(s) p1+s qj−s j def

˜

(j ∈ X )

where ˜ def ψ(s) = log

p1+s qj−s . j

j∈X

On the assumption that r is greater than but sufficiently near D(p||q), there exists a number 0 < s∗ < 1 such that D(p(s∗ )||q) = r. We claim p(s∗ ) = arg min D(ˆ p||p). pˆ : D(p||q)=r ˆ

(37)

entire

December 28, 2004

13:56



41

To prove this, let pˆ be an arbitrary distribution satisfying D(ˆ p||q) = r. Then we have 0 = D(ˆ p||q) − D(p(s∗ )||q) = {ˆ pj log pˆj − pˆj log qj − p(s∗ )j log p(s∗ )j + p(s∗ )j log qj } j

= D(ˆ p||p(s∗ )) +

{ˆ pj − p(s∗ )j }{log p(s∗ )j − log qj }

j ∗

= D(ˆ p||p(s )) + (1 + s∗ )

{ˆ pj − p(s∗ )j }{log pj − log qj }

j

and similarly ∆ = D(ˆ p||p)−D(p(s∗ )||p) = D(ˆ p||p(s∗ )) + s∗ def

{ˆ pj − p(s∗ )j }{log pj − log qj } .

j

Eliminating the common term equations we obtain ∆=

j

{ˆ pj − p(s∗ )j }{log pj − log qj } from these

1 D(ˆ p||p(s∗ )) ≥ 0, 1 + s∗

which proves (37). Next we claim ˜ ∗ ) = r − λ∗ , D(p(s∗ )||p) = ϕ(λ where def

ϕ(λ) ˜ = max

0≤s≤1

˜ λs − ψ(s) ,

λ∗ =

def

(38)

p(s∗ )j (log pj − log qj ).

j

Since a direct calculation based on the assumption that D(p(s∗ )||q) = r yields ˜ ∗ ) = r − λ∗ , D(p(s∗ )||p) = λ∗ s∗ − ψ(s we have only to show that

˜ s∗ = arg max λ∗ s − ψ(s) . s

˜ Observing that ψ(s) is a convex function, this equation is reduced to λ∗ = ∗ ˜ ψ (s ), which can be verified by a direct calculation (cf. (20)). Now it is easy to see that (35) is obtained from (34), (37), and (38) in the same way as the proof of (29).

entire

December 28, 2004

13:56


42


Acknowledgments The authors wish to thank Prof. F. Hiai for his helpful comments. They are also grateful to Prof. T. S. Han for encouraging them to prove the strong converse for quantum hypothesis testing. References [1] R. Bhatia, Matrix Analysis, Springer, New York, 1997. [2] R. E. Blahut, “Hypothesis testing and information theory,” IEEE Trans. Inform. Theory, 20, 405–417, (1974). [3] R.E. Blahut, Principles and practice of information theory, Reading, Addison-Wesley, MA, 1987. [4] I. Csisz´ ar, “Information type measures of difference of probability distributions and indirect observations,” Studia Sci. Math. Hungar., 2, 299–318, 1967. [5] A. Dembo and O. Zeitouni, Large deviation techniques and applications, Jones and Bartlett, Boston, 1993. [6] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York, 1976. [7] T.S. Han and K. Kobayashi, “The strong converse theorem for hypothesis testing,” IEEE Trans. Inform. Theory, 35, 178–180 (1989). [8] F. Hiai and D. Petz, “The proper formula for relative entropy and its asymptotics in quantum probability,” Commun. Math. Phys., 143, 99–114 (1991). (Chap. 3 of this book) [9] E. H. Lieb, “Convex trace functions and the Wigner-Yanase-Dyson conjecture,” Advances in Math., 11, 267–288 (1973). [10] G. Lindblad, “Completely positive maps and entropy inequalities,” Commun. Math. Phys., 40, 147–151 (1975). [11] T. Ogawa and H. Nagaoka, “Strong converse to the quantum channel coding theorem,” IEEE Trans. Inform. Theory, 45, 2486–2489 (1999). [12] T. Ogawa and H. Nagaoka, “Strong converse theorems in the quantum information theory,” In Proc. of 1999 IEEE Information Theory and Networking Workshop, (ITW99 Metsovo) 54, 1999. [13] D. Petz, “Quasi-entropies for states of a von Neumann algebra,” Publ. RIMS, Kyoto Univ., 21, 787–800 (1985). [14] D. Petz, “Quasientropies for finite quantum systems,” Rep. Math. Phys., 23, 57–65 (1986). [15] A. Uhlmann, “Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in an interpolation theory,” Commun. Math. Phys., 54, 21–32 (1977).

entire

December 28, 2004

13:56


CHAPTER 3

The Proper Formula for Relative Entropy and its Asymptotics in Quantum Probability Fumio Hiai and Dénes Petz Abstract. Umegaki’s relative entropy S(ω, ϕ) = Tr Dω (log Dω − log Dϕ ) (of states ω and ϕ with density operators Dω and Dϕ , respectively) is shown to be an asymptotic exponent considered from the quantum hypothesis testing viewpoint. It is also proved that some other versions of the relative entropy give rise to the same asymptotics as Umegaki’s one. As a byproduct, the inequality Tr A log AB ≥ Tr A(log A + log B) is obtained for positive definite matrices A and B.

1. Introduction and Motivation The relative entropy is an information quantity attached to two states of a system. In commutative (or classical) probability theory the states correspond to measures on a measurable space. When ν = (ν1 , ν2 , . . . , νn ) and µ = (µ1 , µ2 , . . . , µn ) are probability distributions, for the sake of simplicity, on an n-point space, the relative entropy (called also information divergence) introduced by Kullback and Leibler [17] is defined by S(ν, µ) =

i

νi log

νi . µi

(1)

In noncommutative (or quantum) probability theory the relative entropy of normal positive functionals was first studied by Umegaki [33] in the case of semifinite von Neumann algebras as the noncommutative extension of information divergence. Later on Araki [1, 2] extended it to the case of general von Neumann algebras by means of the notion of a relative modular operator. On the other hand Uhlmann [32] introduced the relative entropy of positive functionals of arbitrary ∗-algebras by a quadratic interpolation method. The importance of relative entropy has been justified by the fact that one encounters this quantity in dealing with a number of different problems. c 1991 Springer-Verlag. Reprinted, with permission, from Commun. Math. Phys., 143, 99-114, 1991. 43

entire

December 28, 2004

13:56



44

In quantum theory the states of a system correspond to positive operators of trace one. (These operators are called densities.) In particular, in the setting of matrix algebras Umegaki’s relative entropy of a state ω with respect to another state ϕ is defined by S(ω, ϕ) = Tr Dω (log Dω − log Dϕ ),

(2)

where Tr denotes the usual trace on matrices and Dω the density of ω with respect to Tr. [Note that S(ω, ϕ) sometimes is written as S(ϕ, ω).] This definition does not seem to be canonical. Compared with (1) one could suppose that other expressions like ω(pi ) : ω(pi ) log Sco (ω, ϕ) = sup ϕ(p i) i p1 , . . . , pn are projectors, pi = 1 , (3) Scp (ω, ϕ) = sup

i

i

ω(ai ) ω(ai ) log ai = 1 : a1 , . . . , an ≥ 0, ϕ(ai ) i

(4)

or −1 1/2 SBS (ω, ϕ) = Tr Dω log Dω1/2 Dϕ Dω

(5)

are as good as (2). The quantities Sco and Scp appeared in [8] and they may be related to observations which are projection-valued or positive operator-valued measures. The definitions (4) and (3) are of the form sup{S(ω ◦ α, ϕ ◦ α) : α}, where α runs over all positive (and multiplicative, respectively) unital maps of finite dimensional commutative C ∗ -algebras into the given matrix algebra. Given a1 , . . . , an ≥ 0, i ai = 1, the in equality i ω(ai ) log ω(ai )/ϕ(ai ) ≤ S(ω, ϕ) follows by applying the monotonicity of relative entropy [16, 32] to a positive unital map α((ξi )) = ∞ i ξi ai , (ξi ) ∈ ln . See also [8, 9] for this inequality. Thus, the inequality Scp (ω, ϕ) ≤ S(ω, ϕ) holds, while Sco (ω, ϕ) ≤ S(ω, ϕ) is trivial. But the equality here is very restrictive. In fact, it is known [24] that if ω and ϕ are faithful normal states of a von Neumann algebra M, then ω must commute with ϕ whenever S(ω|N , ϕ|N ) = S(ω, ϕ) < +∞ holds for some commutative von Neumann subalgebra N of M. The quantity SBS was introduced in [5] (in a more general setting) and it appeared in [12] in operator form. In this paper it will be shown that the entropy quantities Sco , Scp , and S give rise to the same asymptotic mean in the infinite tensor product system.

entire

December 28, 2004

13:56


The Proper Formula for Relative Entropy and its Asymptotics

45

We want to deal with the question of the proper definition of information divergence of two states in noncommutative probability theory. This question can be approached from two essentially different points of view. One can search for plausible postulates which should be satisfied by a good notion of relative entropy and one can try to show that S(ω, ϕ) is the only functional which meets all the desiderata. In this point it was proved in [27] that up to a constant factor only Umegaki’s relative entropy satisfies a reasonable set of postulates. Our approach in the present paper is more pragmatic. We consider the asymptotics of certain probabilities and observe that Umegaki’s relative entropy naturally shows up. In Sec. 2 of this paper we state the main results in the framework of finite dimensional C ∗ -algebras. Let A be a finite dimensional C ∗ -algebra (i.e., a finite direct sum of matrix algebras) with a fixed state ϕ. As the reference state we take the product state ϕ∞ = ⊗∞ −∞ ϕ on the infinite A. Let ψ be a stationary (i.e. invariant C ∗ -tensor product A∞ = ⊗∞ −∞ for the right shift) state of A∞ . Then one has the mean relative entropy SM (ψ, ϕ∞ ) = limn→∞ n1 S(ψn , ϕn ), where ψn = ψ| ⊗n1 A and ϕn =ϕ| ⊗n1 A. The mean relative entropy plays an important role in classical as well as quantum statistical mechanics and behaves as a rate function in limit theorems of large deviation type (cf. [11, 25, 28, 29]). Our first theorem says that limn→∞ n1 Sco (ψn , ϕn ) = SM (ψ, ϕ) for every stationary state ψ of A∞ . In particular we have limn→∞ n1 Sco (ωn , ϕn ) = S(ω, ϕ) for every state ω of A. This has an interesting corollary that SBS (ω, ϕ) ≥ S(ω, ϕ) for all states ω and ϕ of A. Moreover for n ≥ 1 and 0 < ε < 1 let us introduce the following quantities: n βε (ψn , ϕn ) = inf log ϕn (q) : q is a projection in ⊗1 A with ψn ≥ 1 − ε (6) and

Spr (ψn , ϕn ) = sup

ψn (q) 1 − ψn (q) + (1 − ψn (q)) log : ϕn (q) 1 − ϕn (q) q is a projection in ⊗n1 A . (7)

ψn (q) log

The quantity βε (ψn , ϕn ) has a natural meaning from the viewpoint of quantum hypothesis testing (cf. [4, 6, 13]). More precisely, let us suppose two hypotheses H0 and H1 so that the system A∞ has states ψ and ϕ∞ under H0 and H1 , respectively. A projection q in ⊗n1 A means a “quantum question”

entire

December 28, 2004

46

13:56



of size n, whose outcomes are the eigenvalues 1 or 0. We decide that H0 (respectively H1 ) is true if the outcome of q is 1 (respectively 0). Then ϕn (q) (respectively ϕn (1 − q)) gives the “probability” of the error of accepting H0 (respectively H1 ) when H1 , (respectively H0 ) actually is true. In this way we can consider the quantity exp{βε (ψn , ϕn )} as the bound of the first error probability over all decision rules of size n such that the second error probability does not exceed ε. Now assume that ψ is completely ergodic in the sense that it is ergodic for any power of the right shift. (This is the case when ψ is weakly mixing.) Then our second theorem says that lim supn→∞ n1 βε (ψn , ϕn ) ≤ 1 SM (ψ, ϕ∞ ). Thus we −SM (ψ, ϕ∞ ) and lim inf n→∞ n1 βε (ψn , ϕn ) ≥ − 1−ε can relate the mean relative entropy to a certain kind of asymptotic error bound in the quantum hypothesis testing. It could be mentioned that a desire for the visualization of noncommutative relative entropy as the logarithm of certain probabilities was formulated in [8] in connection with an interpretation of quantum theory. We finally establish that limn→∞ n1 Spr (ψn , ϕn ) = SM (ψ, ϕ∞ ). The proofs of these theorems are given in Sec. 3. The first theorem can be proved by a direct combinatorial computation. The proof of the second theorem is based on the Shannon-McMillan-Breiman theorem and the mean ergodic theorem together with the first theorem. In Sec. 4 we note that our theorems hold in AF C ∗ -algebras or hyperfinite von Neumann algebras as well. Furthermore we show that if ψ is a tracial ergodic state of A∞ , then limn→∞ n1 βε (ψn , ϕn ) = −SM (ψ, ϕ∞ ) holds for every 0 < ε < 1. 2. Main Results in Finite Dimensional C ∗ -Algebras In this section we state our main theorems in the setting of finite dimensional C ∗ -algebras. Their proofs will be presented in Sec. 3. Although the theorems hold true in the framework of AF C ∗ -algebras or hyperfinite von Neumann algebras as will be noted in Sec. 4, these extensions are very easy and our essential ideas consist in the finite dimensional case; so we restrict our detailed discussions to this case. First let us fix the notations. Let A be a finite dimensional C ∗ -algebra. Then A is identified with ⊗L l=1 Mdl (C) which is the direct sum of dl × dl matrix algebras Mdl (C), canonically dl ∗ represented on the Hilbert space ⊗L l=1 C . We denote by An the n-fold C n ∗ tensor product ⊗1 A for n ≥ 1, and by A∞ the two-sided infinite C -tensor product ⊗∞ −∞ A. Then {An } is considered as an increasing sequence of (fi-

entire

December 28, 2004

13:56



47

nite dimensional) C ∗ -subalgebras of A∞ by the natural inclusions. Let γ denote the right shift automorphism of A∞ . A state ψ of A∞ is said to be stationary if ψ is γ-invariant (i.e. ψ ◦ γ = ϕ). For a state ϕ of A∞ we denote n by ϕ∞ the infinite product state ⊗∞ −∞ ϕ, and let ϕn = ϕ∞ |An (= ⊗1 ϕ). We fix a state ϕ of A in the following discussions. Let ψ be a stationary state of A∞ and ψn = ψ|An , n ≥ 1. The relative entropies S(ψn , ϕn ) are defined as (2). Then in view of the superadditivity of relative entropy [23] and the stationarity of ψ we get S(ψm+n , ϕm+n ) ≥ S(ψm , ϕm ) + S(ψn , ϕn ),

m, n ≥ 1

so that limn→∞ n1 S(ψn , ϕn ) exists; in fact (cf. [10]) lim

n→∞

1 1 S(ψn , ϕn ) = sup S(ψn , ϕn ). n n≥1 n

(8)

We denote this limit by SM (ψ, ϕ∞ ), which is called the mean relative entropy of ψ with respect to ϕ∞ . Note that if ω is a state of A then SM (ω∞ , ϕ∞ ) = S(ω, ϕ) because S(ωn , ϕn ) = nS(ω, ϕ), n ≥ 1. In Sec. 1 besides Umegaki’s relative entropy we referred to some other entropy quantities (3)-(5) and (7). The quantities Sco (ψn , ϕn ) as in (3) are equivalently defined by Sco (ψn , ϕn ) = sup{S(ψn |B, ϕn |B) : B is a commutative C ∗ -subalgebra of An }. The next theorem shows that the asymptotic limit of Sco (ψn , ϕn ) exists and coincides with the mean relative entropy. Theorem 1: For every stationary state ψ of A∞ , lim

n→∞

1 Sco (ψn , ϕn ) = SM (ψ, ϕn ). n

(9)

A stationary state ψ of A is said to be ergodic if it is extremal in the set of stationary states. For ergodicity in general C ∗ -dynamical systems, see [10, 30] for instance. We say that ψ is weakly mixing if for every a, b ∈ A∞ , n−1 1 |ψ(γ i (a)b) − ψ(a)ψ(b)| = 0. n→∞ n i=0

lim

Obviously this is the case when ψ is strongly mixing (or strongly clustering), that is, limn→∞ ψ(γ n (a)b) = ψ(a)ψ(b) for every a, b ∈ A∞ . As in classical ergodic theory it is known [10] that if ψ is weakly mixing then it is ergodic. Note that the product state ω∞ defined by a state ω of A is strongly mixing.

entire

December 28, 2004

13:56



48

In the following we say that ψ is completely ergodic if it is ergodic for all γ n , n ≥ 1. We know that a weakly mixing state ψ is completely ergodic because it is weakly mixing for all γ n . Let βn (ψn , ϕn ) be defined by (6) for n ≥ 1 and 0 < ε < 1. The next theorem shows that we have for large n 1 βn (ψn , ϕn ) ≈ exp{−SM (ψ, ϕ∞ )} exp n when ε is sufficiently small and ψ is completely ergodic. Thus exp{−SM (ψ, ϕ∞ )} can be considered as the asymptotic error bound in the quantum hypothesis test for {ψ, ϕ∞ }. Theorem 2: If ψ is a completely ergodic state of A∞ , then for every 0 < ε < 1, 1 (10) lim sup βε (ψn , ϕn ) ≤ −SM (ψ, ϕ∞ ), n→∞ n 1 1 SM (ψ, ϕ∞ ). (11) lim inf βε (ψn , ϕn ) ≥ − n→∞ n 1−ε It may be possible that limn→∞ n1 βε (ψn , ϕn ) = −SM (ψ, ϕ∞ ) for every 0 < ε < 1 and every ergodic state ψ. In particular when ψ is a tracial ergodic state, this will be shown in Sec. 4. The quantities Spr (ψn , ϕn ) in (7) are also defined by Spr (ψn , ϕn ) = sup{S(ψn |B, ϕn |B) : B is a two dimensional subalgebra of An }. Note that Spr (ψn , ϕn ) ≤ Sco (ψn , ϕn ) ≤ Scp (ψn , ϕn ) ≤ S(ψn , ϕn )

(12)

by the monotonicity of relative entropy as mentioned in Sec. 1. As for completely ergodic states we can make Theorem 1 extremely sharp as follows. Indeed the method in proving (11) will work for Theorem 3 as well. Theorem 3: If ψ is a completely ergodic state of A∞ , then 1 lim Spr (ψn , ϕn ) = SM (ψ, ϕ∞ ). n→∞ n As a special case we have for every state ω of A, 1 1 1 lim Spr (ωn , ϕn ) = lim Sco (ωn , ϕn ) = lim Scp (ωn , ϕn ) = S(ω, ϕ). n→∞ n n→∞ n n→∞ n (13)

entire

December 28, 2004

13:56



49

This means that Umegaki’s relative entropy comes out when we first adopt any of the quantities Spr , Sco or Scp and then take asymptotics. The following examples show that the ergodicity assumption of ψ is essential in Theorems 2 and 3. We are indebted to the referee for the first example. Example 4: (1) Let 0 < ε < 1/2 and ψ be a stationary state of A∞ with ψ "= ϕ∞ . Then SM (2εϕ∞ + (1 − 2ε)ψ, ϕ∞ ) > 0 because otherwise by (8) we get 2εϕ∞ + (1 − 2ε)ψ = ϕ∞ so that ψ = ϕ∞ . On the other hand, since 2εϕn (q) + (1 − 2ε)ψn (q) ≥ 1 − ε implies ϕn (q) ≥ 1/2 for a projection q in An , we have βε (2εϕn + (1 − 2ε)ψn , ϕn ) ≥ − log 2. Hence lim

n→∞

1 βε (2εϕn + (1 − 2ε)ψn , ϕn ) = 0. n

(2) Let A = C ⊕ C and ω 1 , ω 2 , ϕ be given with the densities (1, 0), (0, 1), 1 2 (α, 1 − α), respectively, where 0 < α < 1/2. Let ψ = 12 (ω∞ + ω∞ ). By the affinity of SM (·, ϕ∞ ) (see [28]) we get SM (ψ, ϕ∞ ) =

1 1 {S(ω 1 , ϕ) + S(ω 2 , ϕ)} = − {log α + log(1 − α)}. 2 2

But we easily see that Spr (ψn , ϕn ) is the maximum of log

1 = −n log{αn + (1 − α)n }1/n αn + (1 − α)n

and

1 1 1 1 1 log n + log log α < SM (ψ, ϕ∞ ). = max − log(1 − α), − 2 2α 2 2(1 − α)n 2

In the rest of this section, using Theorem 1 let us establish the relation between the relative entropy S(ω, ϕ) and the entropy quantity SBS (ω, ϕ) in (5). The expression (5) implicitly means that SBS (ω, ϕ) = +∞ if the support projection of ω is not dominated by that of ϕ. The main properties of SBS follow from [12] devoted to an operator-valued relative entropy. We here state the additivity and the monotonicity of SBS . Proposition 5: (1) SBS (ω1 ⊗ ω2 , ϕ1 ⊗ ϕ2 ) = SBS (ω1 , ϕ1 ) when ωi and ϕi are states of (finite dimensional) C ∗ -algebras Ai , i = 1, 2. (2) SBS (ω|B, ϕ|B) ≤ SBS (ω, ϕ) for any C ∗ -subalgebra B of A.

entire

December 28, 2004

13:56



50

Indeed (1) is obvious from the definition. Although (2) follows from the operator-valued version of [12], we briefly recall the proof for the convenience of the reader. We may assume that ϕ is faithful. Since X(log X ∗ X) = (log XX ∗ )X holds for a matrix X, it follows that −1/2 −1/2 Dω Dϕ )), SBS (ω, ϕ) = − Tr(Dϕ η(Dϕ

where η(t) = −t log t, t ≥ 0. By the operator-concavity of η we get −1/2 −1/2 −1/2 −1/2 α(η(Dϕ Dω Dϕ )) ≤ η(α(Dϕ Dω Dϕ )),

where α(X) = E(Dϕ )−1/2 E(Dϕ XDϕ )E(Dϕ )−1/2 with the conditional expectation E from A onto B with respect to Tr. Hence 1/2

1/2

1/2 −1/2 −1/2 1/2 E(Dϕ (Dϕ Dω Dϕ )Dϕ )

≤ E(Dϕ )1/2 η(E(Dϕ )−1/2 E(Dω )E(Dϕ )−1/2 )E(Dϕ )1/2 . Taking the trace of both sides proves (2). Now Theorem 1 together with Proposition 5 has an interesting consequence as follows. Corollary 6: SBS (ω, ϕ) ≥ S(ω, ϕ) for all states ω and ϕ of A. Proof: Since SBS = S for commuting states, we have SBS (ω, ϕ) ≥ Sco (ω, ϕ) by (2) of the above proposition. In particular, for every n ≥ 1, SBS (ωn , ϕn ) ≥ Sco (ωn , ϕn ), so that by the above (1), 1 Sco (ωn , ϕn ). n Letting n → ∞ we infer the corollary due to Theorem 1. SBS (ω, ϕ) ≥

It is quite remarkable that the corollary is equivalent to the trace inequality Tr A log A1/2 BA1/2 ≥ Tr A(log A + log B)

(14)

for nonnegative matrices A and B. In fact, (14) is immediate from the corollary when B is invertible. Take the limit from B + εI, ε > 0, for general nonnegative B. When A and B are positive invertible, one can define log AB by analytical functional calculus or by power series and get the equality Tr A log A1/2 BA1/2 = Tr A log AB because Tr A(A1/2 BA1/2 )n = Tr A(AB)n for n ≥ 1.

entire

December 28, 2004

13:56



51

3. Proofs of Theorems In this section we present the proofs of Theorems 1–3. Let us keep the notations fixed in the previous section. Let Tr denote the canonical trace of A such that Tr(e) = 1 for all one dimensional projections e in A. Let Dϕ , be the density of ϕ with Tr, and K be the sum of the sizes of

respect to L simple summands of A i.e. K = l=1 dl . Taking the spectral decompoK sition of Dϕ , one can write Dϕ = k=1 λk ek , where ek are one dimensional projections. Let n be an arbitrary fixed positive integer. For each K-tuple K (n1 , n2 , . . . , nk ) of nonnegative integers with k=1 nk = n, we denote by In1 ,...,nK the set of all (i1 , i2 , . . . , in ) such that #{j : ij = k} = nk for 1 ≤ k ≤ K, and define the projection pn1 ,...,nK in An by pn1 ,...,nK = ei1 ⊗ ei2 ⊗ · · · ⊗ ein .

(i1 ,...,in )∈In1 ,...,nK

Then n1 ,...,nK pn1 ,...,nK = 1 and the density Dϕn of ϕn with respect to Trn = ⊗ni Tr is given by K ! nk n Dϕ n = ⊗ 1 Dϕ = λk (15) pn1 ,...,nK . n1 ,...,nK

k=1

Let En denote the conditional expectation from An onto ⊗n1 ,...,nK pn1 ,...,nK An pn1 ,...,nK with respect to Trn , which is given by En (x) = pn1 ,...,nK xpn1 ,...,nK , x ∈ An . (16) n1 ,...,nK

The (von Neumann) entropy S(ω) of a state ω of An is defined by S(ω) = − Trn (Dω log Dω ), where Dω is the density of ω with respect to Trn . Lemma 7: If ω is a state of An and B is the commutative subalgebra of An generated by {pn1 ,...,nK Dω pn1 ,...,nK }n1 ,...,nK ∪ {pn1 ,...,nK }n1 ,...,nK , then S(ω, ϕn ) = S(ω|B, ϕ|B) + S(ω ◦ En ) − S(ω). Proof: Let s(ω) and s(ϕn ) denote the support projections of ω and ϕn , respectively. Since Dϕn ∈ B by (15), we have s(ϕn ) = s(ϕn |B) ∈ B. If s(ω) ≤ s(ϕn ) is not satisfied, then the desired equality holds because S(ω, ϕn ) = S(ω|B, ϕn |B) = +∞. So suppose s(ω) ≤ s(ϕn ). In this case we may assume that ϕn is faithful; otherwise consider the restrictions of ω and ϕn to s(ϕn )An s(ϕn ) and Bs(ϕn ). Since B is included in the centralizer

entire

December 28, 2004

13:56



52

of ϕn , the conditional expectation EB from An onto B with respect to ϕn exists due to [31]. Hence we get by [14, 22, 23] S(ω, ϕn ) = S(ω|B, ϕ|B) + S(ω, ω ◦ EB ).

(17)

Since En (Dω ) ∈ B and B ⊂ En (An ) by (16), we get for every a ∈ An , (ω ◦ EB )(a) = Trn (Dω EB (a)) = Trn (En (Dω )EB (a)) −1 −1 En (Dω )EB (a)) = ϕn (EB (Dϕ En (Dω )a)) = ϕn (Dϕ n n −1 En (Dω )a) = Trn (En (Dω )a), = ϕn (Dϕ n

so that En (Dω ) is the density of ω ◦ EB as well as ω ◦ En with respect to Trn . Therefore ω ◦ EB = ω ◦ En and S(ω, ω ◦ En ) = Trn Dω (log Dω − log En (Dω )) = S(ω ◦ En ) − S(ω). (18) The desired equality follows from (17) and (18). Lemma 8: For every state ω of An , S(ω ◦ En ) − S(ω) ≤ K log(n + 1). Proof: In view of the joint convexity of the relative entropy [1, 16] it suffices by (18) to show the case when ω is a pure state. In this case Dω is a one dimensional projection and S(ω) = 0. Since each pn1 ,...,nK Dω pn1 ,...,nK is of rank one or zero, it follows that the rank of En (Dω ) is at most K # (n1 , . . . , nk ) : n1 , . . . , nk ≥ 0, nk = k (≤ (n + 1)K ). k=1

Therefore S(ω ◦ En ) ≤ log(n + 1)K = K log(n + 1), as desired. Proof of Theorem 1:

For every n we have

Sco (ψn , ϕn ) ≤ S(ψn , ϕn ) ≤ Sco (ψn , ϕn ) + K log(n + 1), by (12) and by Lemmas 7 and 8 applied to ψn . This proves (9). In the sequel of this section we assume that ψ is a completely ergodic state of A∞ . Proof of (10) of Theorem 2: For any r > −SM (ψ, ϕ∞ ) let us choose h < SM (ψ, ϕ∞ ) and δ > 0 with −h + δ < r. By Theorem 1 there exists a commutative C ∗ -subalgebra of B of Al for some l ≥ 1 such that S(ψl |B, ϕl |B) ≥ lh.

(19)

entire

December 28, 2004

13:56



53

We can consider Bk as a C ∗ -subalgebra of Akl , k ≥ 1, and B∞ as a C ∗ subalgebra of A∞ . Let σ = γ l |B∞ , the right shift on B∞ . Define µ = ϕl |B, ν = ψ|B∞ , µ∞ = ⊗∞ −∞ µ, µk = µ∞ |Bk and νk = ν|Bk . These states may be identified with the probability measures on the corresponding underlying spaces. Since ψ is completely ergodic, we can readily see that ν is ergodic for σ. In the following we work in the (commutative) von Neumann algebra πν (B∞ ) where πν is the GNS representation of B∞ associated with ν. We denote the normal extensions of ν and σ to πν (B∞ ) by the same ν and σ. First suppose ν1 % µ (i.e. s(ν1 ) ≤ s(µ)) is not satisfied. Then there is a projection p in B such

" that µ(p) = 0 and ν1 (p) > 0. Since the ergodicity k−1 i of ν implies that πν i=0 σ (p) → 1 strongly as k → ∞, there exists k0

" "k0 −1 i k0 −1 i such that ν i=0 σ (p) ≥ 1 − ε. Set q = i=0 σ (p) and n0 = k0 l. Then q is a projection in An0 such that ϕ(q) ≥ 1 − ε and ! k −1 0 # i σ (p) ≤ ko µ(p) = 0. ϕn0 (q) = µk0 i=0

Therefore βε (ψn , ϕn ) = −∞ for all n ≥ n0 , proving (10). Next suppose ν1 % µ. Then we see that νk % µk for every k because s(νk ) ≤ ⊗k1 s(ν1 ) ≤ ⊗k1 s(µ) = s(µk ). Let m denote the trace of B such that m(e) = 1 for all atoms e in B (i.e. m is the counting measure on the underlying space of B). Let us consider the selfadjoint operators Hk , k ≥ 1, in πν (B∞ ) given by 1 dνk dµk Hk = − πν log πν log k dmk dmk k−1 1 dνk 1 i dµ = πν log σ − πν log . (20) k dmk k i=0 dm By the Shannon-McMillan-Breiman theorem (cf. [3, 21]) the first term of (20) converges ν-almost uniformly as k → ∞ to the Kolmogorov-Sinai entropy hν (σ) of σ relative to ν. On the other hand, by the mean ergodic theorem the second term of (20) converges strongly as k → ∞ to ν1 (log dµ/dm). Thus we see that Hk converges in µ-measure to 1 1 dµk dνk h0 = lim νk log − log = lim S(νk , µk ) = SM (ν, µ∞ ). k→∞ k k→∞ k dmk dmk (21) Now for each k let pk be the projection in Bk with pk ≤ s(νk ) such that πν (pk ) is the spectral projection of Hk corresponding to the interval (h0 − δ, h0 + δ). Then there exists k0 such that ν(pk ) ≥ 1 − ε for all

entire

December 28, 2004

13:56



54

k ≥ k0 . Since πν (pk ) is the spectral projection of exp(kHk ) corresponding to (ek(h0 −δ) , ek(h0 +δ) ), we get πν (pk ) ≤ e−k(h0 −δ) πν (pk ) exp(kHk ) −1 dνk dµk = e−k(h0 −δ) pk , dmk dmk so that since pk ≤ s(νk ) pk ≤ e

−k(h0 −δ)

Therefore µk (pk ) ≤ e

−k(h0 −δ)

pk

dνk dmk

dµk dmk

−1 .

(22)

dνk mk p k = e−k(h0 −δ) νk (pk ) dmk

≤ e−k(h0 −δ) ≤ e−k(lh−δ) ,

(23)

because h0 ≥ S(ν1 , µ) ≥ lh by (21), (8) and (19). For each n ≥ k0 l let n = kl + j, where k ≥ k0 , 0 ≤ j < l, and put qn = pk . Then we have qn ∈ An and ψ(qn ) = 1 − ε, so that 1 1 1 k βε (ψn , ϕn ) ≤ log ϕn (qn ) ≤ log µk (pk ) ≤ − h+δ n n (k + 1)l k+1 by (23). This proves (10) thanks to −h + δ < r. Proof of (11) of Theorem 2: First suppose SM (ψ, ϕ∞ ) = 0. Then ψ = ϕ∞ by (2.1) and hence βε (ψn , ϕn ) ≥ log(1−ε), n ≥ 1, so that (11) is immediate. Now suppose SM (ψ, ϕ∞ ) > 0. For each n set βn = βε (ψn , ϕn ) and choose a projection qn in An such that ψn (qn ) ≥ 1 − ε and ϕn (qn ) < eβn +1 . Letting 0 < h < SM (ψ, ϕ∞ ), by (10) we have βn < −nh for sufficiently large n. Hence eβn +1 → 0 as n → ∞. Define Bn = Cqn + C(1 − qn ) which is a two dimensional subalgebra of An . Then by monotonicity S(ψn , ϕn ) ≥ S(ψn |Bn , ϕn |Bn ) = F (ψn (qn ), ϕn (qn )), where F (s, t) = s log

1−s s + (1 − s) log . t 1−t

Since for 0 < t < s ≤ 1, t−s ∂F (s, t) = < 0, ∂t t(1 − t)

(24)

entire

December 28, 2004

13:56



55

we have for every n large enough S(ψn , ϕn ) ≥ F (ψn (qn ), eβn +1 ) = −ψn (qn )(βn + 1) − (1 − ψn (qn )) log(1 − eβn +1 ) +ψn (qn ) log ψn (qn ) + (1 − ψn (qn )) log(1 − ψn (qn )) ≥ −(1 − ε)(βn + 1) − log 2. Therefore (1 − ε) lim inf n→∞

1 1 βn ≥ −(1 − ε) lim inf βn ≥ −SM (ψ, ϕ∞ ), n n

as desired. Proof of Theorem 3:

It suffices by monotonicity to show that

1 Spr (ψn , ϕn ) ≥ SM (ψ, ϕ∞ ). (25) n We may suppose SM (ω, ϕ∞ ) > 0. Let 0 < ε < 1 and 0 < h < SM (ψ, ϕ∞ ). By (10) we have βε (ψn , ϕn ) < −nh for sufficiently large n. Hence for each such n we can choose a projection qn in An such that ψn (qn ) ≥ 1 − ε and ϕn (qn ) < e−nh . Let Bn = Cqn + C(1 − qn ) and F be the function in (24). Then we have as in the proof of (11), lim inf n→∞

Spr (ψn , ϕn ) ≥ S(ψn |Bn , ϕn |Bn ) ≥ F (ψn (qn ), ϕm (qn )) ≥ F (ψn (qn ), e−nh ) ≥ n(1 − ε)h − log 2 for every n large enough. Therefore 1 Spr (ψn , ϕn ) ≥ (1 − ε)h. n We obtain (25) letting ε → 0 and h → SM (ψ, ϕ∞ ). lim inf n→∞

Indeed it is enough in Theorems 2 and 3 to assume that ψ is a stationary state of A∞ which is ergodic for γ n for infinitely many n. 4. Extensions In this section we observe that our theorems in Sec. 2 remain true in AF C ∗ -algebras or hyperfinite von Neumann algebras. When A is a general C ∗ algebra (always assumed to be unital), given two states ω and ϕ of A one can define the relative entropy S(ω, ϕ) of ω with respect to ϕ as Uhlmann’s relative entropy [32]. But this is also defined through Araki’s one [1, 2] for normal states of von Neumann algebras as follows: if π is a representation of A such that ω and ϕ have the respective normal extensions ω ¯ and ϕ¯ to

entire

December 28, 2004

13:56



56

π(A) with ω ¯ ◦ π = ω and ϕ¯ ◦ ϕ = ϕ, then we have S(ω, ϕ) = S(¯ ω , ϕ) ¯ (see [15, 25]). Let A be a C ∗ -algebra with a fixed state ϕ. As in Sec. 2 we take the ∗ n C -tensor products A∞ = ⊗∞ −∞ A, An = ⊗1 A, n ≥ 1, the right shift automorphism of A∞ and the product state ϕ∞ of A∞ . Moreover for a stationary state ψ of A∞ the mean relative entropy SM (ψ, ϕ∞ ) is defined as (8). To extend the theorems to the case of an AF C ∗ -algebra, we first give the following lemma. Lemma 9: Let {A(j)} be an increasing net of C ∗ -subalgebras of A such $ that A = j A(j). Let ψ be a stationary state of A∞ . If ϕ(j) = ϕ|A(j) and ψ(j) = ψ|A(j)∞ , then SM (ψ(j), ϕ(j)∞ ) increases to SM (ψ, ϕ∞ ). $ Proof: For each n, since An = j A(j), we have sup S(ψn |A(j)n , ϕn |A(j)n ) = S(ψn , ϕn ) j

by the martingale convergence of relative entropy [2, 16] applied under some representation of An . Hence 1 1 S(ψn , ϕn ) = sup sup S(ψn |A(j)n , ϕn |A(j)n ) n n j n≥1 n≥1

SM (ψ, ϕ∞ ) = sup

= sup SM (ψ(j), ϕ(j)∞ ). j

The increasingness is immediate from the monotonicity of relative entropy. Now assume that A is an AF C ∗ -algebra. Then there is an increasing $ net {A(j)} of finite dimensional subalgebras of A such that A = j A(j). Let ψ be a stationary state of A∞ and define Sco (ψn , ϕn ), n ≥ 1, as (3). For each j and n we have by Lemmas 7 and 8 applied to ψn |A(j)n , S(ψn |A(j)n , ϕn |A(j)n ) ≤ Sco (ψn , ϕn ) + Kj log(n + 1), where Kj is the sum of the sizes of simple summands of A(j). Hence for every j, SM (ψ(j), ϕ(j)∞ ) ≤ lim inf n→∞

1 Sco (ψn , ϕn ), n

so that Lemma 9 shows the equality (9). Furthermore by the above argument we can choose, given h < SM (ψ, ϕ∞ ), a finite dimensional commutative subalgebra B of Al for some l > 1 such that (19) holds. Hence the proof of (10) of Theorem 2 works well. Thus we infer that Theorems 1–3

entire

December 28, 2004

13:56



57

remain true for every stationary or completely ergodic state ψ of A∞ when A is an AF C ∗ -algebra. Our theorems can be formulated in the framework of von Neumann algebras too. Let M be a von Neumann algebra with a fixed normal state n tensor product ⊗1 M for n ≥ 1, ϕ. Let Mn be the n-fold von Neumann % & n and M∞ the C ∗ -completion of ∪∞ n=1 ⊗−n M . Then {Mn } is an increasing sequence of von Neumann algebras included in M∞ . We have the right shift and the product state ϕ∞ of M∞ as before. If {M(j)} is an increasing net

$ of von Neumann subalgebras of M such that M = M(j) and if j ϕ is a stationary state of M∞ such that ϕn = ϕ|Mn is normal for every n, then the same conclusion as Lemma 9 holds. Now assume that M is hyperfinite (i.e. approximately finite dimensional). Of course this is the case when M = B(H), the algebra of all bounded operators on a Hilbert space H. Then the same results as Theorems 1–3 hold for every stationary or completely ergodic state ϕ of M∞ such that ϕn is normal for every n. In particular we have (13) for every normal state ω of M. Assume again that A is finite dimensional with a state ϕ. We finally consider the special setting where ψ is a tracial ergodic state of A∞ . In this case we denote by ψ¯ and γ¯ the respective normal extensions of ψ and γ to πψ (A∞ ) . Then ψ¯ becomes a faithful normal tracial state of πψ (A∞ ) with ¯ Let Tr be the canonical trace of A mentioned at the beginning of ψ¯ ◦ γ¯ = ψ. Sec. 3 and Z(A) be the center of A. Although it is a challenging open problem to establish the noncommutative Shannon-McMillan-Breiman theorem for general ergodic states of A∞ , the special case of the following lemma was given in [20]. In fact, this follows from the classical case for ψ| ⊗∞ −∞ Z(A) n because dψn /d Trn belongs to ⊗1 Z(A), the center of An . Lemma 10: Let A and ψ be as above. Then − n1 πψ (log dψn /d Trn ) con¯ ¯ to a constant h. (h is the verges ψ-almost uniformly and in L1 (ψ)-norm dynamical entropy [7] of γ relative to ψ.) On the other hand the noncommutative mean ergodic theorem was given in [18]. Based on these convergence results we have the next proposition which is a stronger form of Theorem 2 and extends the classical results in [4, 6]. A similar result is given in [19] when ψ is the trace Tr and ψ is a product state. Proposition 11: Let ψ be a tracial ergodic state of A∞ . Then for every

entire

December 28, 2004

13:56



58

0 < ε < 1, lim

n→∞

1 βn (ψn , ϕn ) = −SM (ψ, ϕ∞ ). n

Proof: We shall show that lim inf n→∞

1 βε (ψn , ϕn ) ≥ −SM (ψ, ϕ∞ ). n

(26)

Our method in the following will allow the proof of (10) of Theorem 2 to be adapted to give 1 lim sup βε (ψn , ϕn ) ≤ −SM (ψ, ϕ∞ ). n n→∞ When s(ψ1 ) ≤ s(ϕ) is not satisfied, we can see arguing as in the proof of (10) that βε (ψn , ϕn ) = −∞ for every n large enough. This proves the proposition because SM (ψ, ϕ∞ ) ≥ S(ψ1 , ϕ) = +∞. Now suppose s(ψ1 ) ≤ s(ϕ). Then we have s(ψn ) ≤ ⊗n1 s(ψ1 ) ≤ ⊗n1 s(ϕ) = s(ϕn ) for every n. Define the selfadjoint operators Hn , n ≥ 1, in πψ (A∞ ) by 1 dψn dϕn Hn = − πψ log πψ log n d Trn d Trn n−1 1 dψn 1 i dϕ γ¯ = πψ log − πψ log . n d Trn n i=0 d Tr Lemma 10 and the noncommutative mean ergodic theorem imply that Hn ¯ converges in ψ-measure as n → ∞ to 1 dϕn dψn h0 = lim ψn log − log = SM (ψ, ϕ∞ ). n→∞ n d Trn d Trn Let 0 < δ < 1 − ε. For each n let pn be the projection in An with pn ≤ s(ψn ) such that πψ (pn ) is the spectral projection of Hn corresponding to (h0 − δ, h0 + δ). Then there exists n0 such that ψ(pn ) ≥ 1 − δ for all n ≥ n0 . Since dψn /d Trn , dϕn /d Trn and pn commute with one another, we get as (22), −1 dψn dϕn . (27) pn ≥ e−n(h0 +δ) pn log log d Trn d Trn For each n choose a projection qn in A such that ψ(qn ) ≥ 1 − ε and ϕn (qn ) < exp{βε (ψn , ϕn ) + 1}.

(28)

entire

December 28, 2004

13:56



59

Let qn = qn ∧ pn . Then for n > n0 we get thanks to the traciality of ψ, ψ(1 − qn ) ≤ ψ(1 − qn ) + ψ(1 − pn ) ≤ ε + δ.

(29)

Furthermore by (27)

dϕn dϕn = Trn qn ϕn (qn ) ≥ = Trn qn pn qn d Trn d Trn dψn −n(h0 +δ) ≥e Trn qn pn qn d Trn −n(h0 +δ) dψn Trn qn (30) = e−n(h0 +δ) ψn (qn ). =e d Trn ϕn (qn )

By (28)–(30) we have for n ≥ n0 exp{βε (ψn , ϕn ) + 1} ≥ e−n(h0 +δ) (1 − ε − δ), so that since 1 − ε − δ > 0, 1 1 {βε (ψn , ϕn ) + 1} ≥ −h0 − δ + log(1 − ε − δ). n n Therefore 1 lim inf βε (ψn , ϕn ) ≥ −h0 − δ. n→∞ n This implies (26). Acknowledgments The first author wishes to thank Professor Hiroshi Nagaoka who suggested to him the asymptotic theory for relative entropy. The second author is grateful to Professors L. Accardi and I. Csiszár for discussions on abelian subalgebras and I-divergency. The authors would also like to thank the referee for many valuable remarks on the first version of this paper. References [1] H. Araki, “Relative entropy of states of von Neumann algebras,” Publ. RIMS, Kyoto Univ., 11, 809–833 (1976). [2] H. Araki, “Relative entropy for states of von Neumann algebras, II,” Publ. RIMS, Kyoto Univ., 13, 173–192 (1977). [3] P. Billingsley, Ergodic theory and information, Wiley, New York, 1965. [4] R.E. Blahut, Principles and practice of information theory, Reading, Addison-Wesley, MA, 1987. [5] V.P. Belavkin and P. Staszewski, “C ∗ -algebraic generalization of relative entropy and entropy,” Ann. Inst. H. Poincar´ e, Sec. A 37, 51–58 (1982).

entire

December 28, 2004

60

13:56



[6] L. Csisz´ ar and J. K¨ orner, Information theory, coding theorems for discrete memoryless systems, Akademiai Kiado/Orlando: Academic Press, Budapest, 1981. [7] A. Connes and E. Størmer, “Entropy for automorphisms of II1 von Neumann algebras,” Acta Math., 134, 289–306 (1975). [8] M.J. Donald, “On the relative entropy,” Commun. Math. Phys., 105, 13–34 (1986). [9] M.J. Donald, “Continuity and relative hamiltonians,” Commun. Math. Phys., 136, 625–632, (1991). [10] S. Doplicher and D. Kastler, “Ergodic states in a non-commutative ergodic theory,” Commun. Math. Phys., 7, 1–20 (1968). [11] R.S. Ellis, Entropy, large deviations, and statistical mechanics, Berlin, Heidelberg, Springer, New York, 1985. [12] J.I. Fujii and E. Kamei, “Relative operator entropy in noncommutative information theory,” Math. Japan., 34, 341–348 (1989). [13] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York, 1976. [14] F. Hiai, M. Ohya, and M. Tsukada, “Sufficiency, KMS condition and relative entropy in von Neumann algebras,” Pacific J. Math., 96, 99–109 (1981). [15] F. Hiai, M. Ohya, and M. Tsukada, “Sufficiency and relative entropy in ∗-algebras with applications in quantum systems,” Pacific J. Math., 107, 117–140 (1983). [16] H. Kosaki, “Relative entropy of states: a variational expression,” J. Operator Theory, 16, 335–348 (1986). [17] S. Kullback and R.A. Leibler, “On information and sufficiency,” Ann. Math. Statist., 22, 79–86 (1951). [18] L. Kov´ acs and J. Sz´ ucs, “Ergodic type theorems in von Neumann algebras,” Acta Sci. Math., 27, 233–246 (1966). [19] M. Ohya and D. Petz, “Notes on quantum entropy,” preprint∗ . [20] M. Ohya, M. Tsukada, and H. Umegaki, “A formulation of noncommutative McMilian theorem,” Proc. Jpn. Acad., 63A, 54–63 (1987). [21] W. Parry, Entropy and generators in ergodic theory, Benjamin, New York, 1969. [22] D. Petz, “Properties of quantum entropy,” in L. Accardi and W. von Waldenfels, eds., Quantum probability and applications, II, Lect. Notes Math., 1136, 428–441, Berlin, Heidelberg, Springer, New York, 1985. [23] D. Petz, “Properties of the relative entropy of states of von Neumann algebras,” Acta Math. Hungar., 47, 65–72 (1986). [24] D. Petz, “Sufficient subalgebras and the relative entropy of states of a von Neumann algebra,” Commun. Math. Phys., 105, 123–131 (1986). [25] D. Petz, “First steps towards a Donsker and Varadhan theory in operator algebras,” Lect. Notes Math., 1442, 311–319. Berlin, Heidelberg, Springer, New York, 1990. ∗ Editorial

note: This paper has been published as M. Ohya and D. Petz, “Notes on quantum entropy,” Studia Sci. Math. Hungar., 31, 423–430 (1996).

entire

December 28, 2004

13:56



61

[26] D. Petz, “On certain properties of the relative entropy of states of operator algebras,” Math. Z (to appear)† . [27] D. Petz, “Characterization of the relative entropy of states of matrix algebras,” preprint‡ . [28] D. Petz, G.A. Raggio, and A. Verbeure, “Asymptotics of Varadhan-type and Gibbs variational principle,” Commun. Math. Phys., 121, 271–282 (1989). [29] G.A. Raggio and R.F. Werner, “Quantum statistical mechanics of general mean field systems,” Helv. Phys. Acta, 62, 980–1003 (1989). [30] E. Størmer, “Large groups of automorphisms of C ∗ -algebras,” Commun. Math. Phys., 5, 1–22 (1967). [31] M. Takesaki, “Conditional expectations in von Neumann algebras,” J. Funct. Anal., 9, 306–321 (1972). [32] A. Uhlmann, “Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in an interpolation theory,” Commun. Math. Phys., 54, 21–32 (1977). [33] H. Umegaki, “Conditional expectation in an operator algebra, IV (entropy and information),” K¯ odai Math. Sem. Rep., 14, 59–85 (1962).

Technical Terms in Operator Algebra§ operator-concavity A real continuous function f on an interval I is said to be operator concave if the inequality f (λA + (1 − λ)B) ≤ λf (A) + (1 − λ)f (B) in the positive semidefiniteness order holds for any selfadjoint operators A, B with spectra inside I and for any 0 < λ < 1, where f (A) and others are defined via continuous functional calculus. AF C ∗ -algebra A C ∗ -algebra A is said to be AF (approximately finite ) if there is a net (usually a sequence) {Aj } of finite dimensional ∗-subalgebras of A such $ that j Aj is norm-dense in A. hyperfinite von Neumann algebra Let M be a von Neumann algebra on a separable Hilbert space H, i.e., M is a ∗-subalgebra (containing the identity operator) of B(H) such that M = M , the double commutant of M, or equivalently M is closed in the † Editorial note: This paper has been published as D. Petz, “On certain properties of the relative entropy of states of operator algebras,” Math. Z., 206, 351–361 (1991). ‡ Editorial note: This paper has been published as D. Petz, “Characterization of the relative entropy of states of matrix algebras,” Acta Math. Hungar., 59, 449–455 (1992). § This part is written by F. Hiai based on editor’s special request

entire

December 28, 2004

13:56


62


weak operator topology. If there is an increasing sequence {Mn } of finite $∞ dimensional ∗-subalgebras of M such that ( n=1 Mn ) = M, M is said to be hyperfinite or AFD (approximately finite dimensional ). A fundamental result of Connes says that M is hyperfinite if and only if it is injective, i.e., there is a norm one projection from B(H) onto M. GNS representation Let A be a C ∗ -algebra with identity 1 and ϕ be a state on A. A Hilbert space Hϕ is defined as the completion of A/Nϕ with an inner product a+Nϕ , b+Nϕ = ϕ(b∗ a), a, b ∈ A, where Nϕ = {a ∈ A : ϕ(a∗ a) = 0}. The ∗-representation πϕ : A → B(Hϕ ) can be defined by πϕ (a)(b + Nϕ ) = ab + Nϕ , a, b ∈ A, so that ϕ(a) = πϕ (a)ξϕ , ξϕ and Hϕ = πϕ (A)ξϕ where ξϕ = 1+Nϕ. Such a triple (πϕ , Hϕ , ξϕ ) is unique up to unitary equivalence, and it is called the GNS (Gelfand-Naimark-Segal ) representation of A associated with ϕ. mean ergodic theorem The mean ergodic theorem of von Neumann says that if U is a unitary operator on a Hilbert space H and P is the orthogonal projection form k converges H onto the subspace {ξ ∈ H : U ξ = ξ}, then n−1 n−1 k=0 U to P in the strong operator topology. When σ is an automorphism of a C ∗ -algebra A and ϕ is a state on A with ϕ ◦ σ = ϕ, there is a unitary operator Uσ on Hϕ implementing σ so that πϕ (σ(a)) = Uσ πϕ (a)Uσ∗ , a ∈ A, where (πϕ , Hϕ , ξϕ ) is the GNS representation associated with ϕ. Since πϕ (σ k (a))ξϕ = Uσk πϕ (a)ξϕ , the mean ergodic theorem can be applied to speak the convergence of πϕ (σ k (a))ξϕ . Kolmogorov-Sinai entropy Let (X, B, µ) be a probability space and T be an automorphism on X with µ ◦ T −1 = µ. For any finite measurable partition A of X, let "n−1 "n−1 Hµ ( k=0 T −k A) denote the (Shannon) entropy of k=0 T −k A, the finite partition generated by A, T −1 A, . . . , T −(n−1) A, with respect to µ. The "n−1 limit hµ (A, T ) = limn→∞ n−1 Hµ ( k=0 T −k A) exists, and the KolmogorovSinai entropy hµ (T ) is defined as the supremum of hµ (A, T ) over all finite measurable partitions A. If A0 is a measurable partition such that "∞ −k A0 = B, then hµ (T ) = hµ (A0 , T ) (the Kolmogorov-Sinai theok=−∞ T rem). In particular, when X is the infinite product space of {1, 2, . . . , d} (the

entire

December 28, 2004

13:56



63

alphabets) with the shift T and µ is a shift-invariant probability measure on X, hµ (T ) is the mean entropy of µ or the average information contained in the input-source µ per letter. Shannon-McMillan-Breiman theorem Let (X, B, µ, T ) be as in the item of Kolmogorov-Sinai entropy. For each finite measurable partition A of X, let Iµ (A) be the information of A with respect to µ defined by Iµ (A) = − A∈A χA (x) log µ(A) (so the expectation of Iµ (A) is the entropy Hµ (A)). The Shannon-McMillan-Breiman " −k A) converges to theorem asserts that if T is ergodic then n−1 Iµ ( n−1 k=0 T 1 hµ (A, T ) in L and µ-almost surely. This theorem played a crucial role in information theory and coding theory.

entire

December 28, 2004

13:56


CHAPTER 4

Strong Converse Theorems in Quantum Information Theory Hiroshi Nagaoka Abstract. There are many situations in information theory where it is required to find the threshold value of some relevant quantity for which a kind of error probability can asymptotically be made arbitrarily small. The strong converse property means that when the quantity exceeds the threshold value the error probability inevitably goes to one asymptotically. In quantum information theory, the strong converse property is known for (classical) capacity of quantum memoryless channel ([1, 4]) and for hypothesis testing problem of two quantum i.i.d. states ([2]). In this article we provide new proofs, which are based on a simple and unified argument, with these results.

1. Key Inequality Let ρ and σ be density operators on a common Hilbert space H. For any s ∈ k [0, 1] and any POVM {Xi }ki=1 it holds that i=1 (Tr[ρXi ])1+s (Tr[σXi ])−s ≤ Tr[ρ1+s σ −s ], which follows from a result in [3] and was used to prove the strong converse theorem for quantum hypothesis testing in [2]. In particular, for any self-adjoint operator A satisfying 0 ≤ A ≤ I, which we call a test in general, we have (Tr[ρA])1+s (Tr[σA])−s ≤ Tr[ρ1+s σ −s ].

(1)

This is the key inequality for the arguments below. 2. Hypothesis Testing def

def

Given a test An on H⊗n , let αn (An ) = 1 − Tr[ρ⊗n An ] and βn (An ) = Tr[σ ⊗n An ], which are interpreted as the error probabilities of the first and second kinds, and apply the above inequality to ρ⊗n , σ ⊗n and An to yield 1+s s log(1 − αn (An )) − log βn (An ) ≤ ψ(s), n n def

def

where ψ(s) = log Tr[ρ1+s σ −s ]. From lims↓0 ψ(s)/s = Tr[ρ(log ρ − log σ)] = def

D(ρ σ), we see that if r = − lim supn→∞ n1 log βn (An ) > D(ρ σ) then αn (An ) necessarily goes to one with exponential rate s ψ(s) 1 r− lim sup log(1 − αn (An )) ≤ − max . 0≤s≤1 1 + s 1+s n→∞ n 64

entire

December 28, 2004

13:56


Strong Converse Theorems in Quantum Information Theory

65

3. Channel Coding Let ∆ be a compact subset of S(H) (:the set of density operators on H), which is supposed to be the set of possible received states of a quantum communication channel. For n, M ∈ N and ε > 0, an (n, M, ε)(n) code consists of a family of product states ρi = ρi,1 ⊗ · · · ⊗ ρi,n with (n) ⊗n satρi,j ∈ ∆, i = 1, . . . , M , and a POVM X (n) = {Xi }M i=1 on H (n) (n) 1 isfying M i Tr[ρi Xi ] ≥ 1 − ε. Setting ρ, σ and A in (1) to be the (n) (n) 1 1 block diagonal matrices M diag{ρ1 , . . . , ρM }, M diag{σ ⊗n , . . . , σ ⊗n } and (n) (n) diag{X1 , . . . , XM }, where σ is an arbitrary state in S(H), we have (1 − ε)1+s M s ≤

M n 1 −s Tr[ρ1+s ] i,j σ M i=1 j=1

≤ max(Tr[ρ1+s σ −s ])n . ρ∈∆

It follows that if a sequence of (n, Mn , εn )-codes for n = 1, 2, . . . satisfies 1 def R = lim inf log Mn > min max D(ρ σ) n→∞ n σ∈S(H) ρ∈∆ = max λi D(ρi

λj ρj ), λ,{ρi }⊂∆

i

j

then εn necessarily goes to one with exponential rate s 1 log Tr[ρ1+s σ −s ] R − min max lim sup log(1 − εn ) ≤ − max . 0≤s≤1 1 + s σ∈S(H) ρ∈∆ 1+s n→∞ n References [1] T. Ogawa and H. Nagaoka, IEEE Trans. Inform. Theory, 45, 2486–2489 (1999). [2] T. Ogawa and H. Nagaoka, IEEE Trans. Inform. Theory, 46, 2428-2433 (2000). (Chap. 2 of this book) [3] D. Petz, Rep. Math. Phys., 23, 57–65 (1986); Publ. RIMS, Kyoto Univ., 21, 787–800 (1985). [4] A. Winter, IEEE Trans. Inform. Theory, 45, 2481-2485 (1999).

entire

December 28, 2004

13:56


CHAPTER 5

Asymptotics of Quantum Relative Entropy from a Representation Theoretical Viewpoint Masahito Hayashi Abstract. In this paper it is proved that the quantum relative entropy D(ρσ) can be asymptotically attained by the relative entropy of probabilities given by a certain sequence of positive-operator-valued measurements (POVMs). The sequence of POVMs depends on σ, but is independent of the choice of ρ.

1. Introduction In classical statistical theory, the relative entropy D(p q) is an information quantity which represents the statistical efficiency in distinguishing a probability measure p from another probability measure q. In the quantum mechanical case, states are described by density operators on a Hilbert space H, which represents a physical system of interest. We can distinguish quantum states by data given through quantum measurements. A quantum measurement is described by a positiveoperator-valued measure (POVM) M = {Mi }i∈I , which is a partition of the unit into positive operators. A POVM M = {Mi }i∈I satisfying Mi2 = Mi for any index i is called a projection-valued measure (PVM); this plays an important role in this paper. When the quantum measurement corresponding to a POVM M is made on the system in a state ρ, the data obey the probability distribution PM ρ (i) := Tr Mi ρ. The quantum relative entropy D(ρ σ) := Tr ρ(log ρ − log σ) is known as a quantum analogue of the relative entropy. However, the information quantity, which is directly linked to statistical significance, is not the quantum relative entropy D(ρ σ), but the relative entropy DM (ρ σ) := D(PρM PσM ). Concerning the relation between the two quantities D(ρ σ) and DM (ρ σ), we have the following inequality from the monotonicity of the quantum relative entropy [5, 9] DM (ρ σ) ≤ D(ρ σ) This chapter is reprinted from J. Phys. A: Math. and Gen., 34, 3413-3419, 2001. 66

(1)

entire

December 28, 2004

13:56


Asymptotics of Quantum Relative Entropy

67

where the equality holds for some M if and only if ρσ = σρ (see Petz [8], proposition 1.16 in Ohya-Petz [6] and Fujiwara-Nagaoka [1]). As for the inequality (1), Hiai-Petz [4] proved that even if the states ρ and σ are not commutative with one another, the equality is attained in an asymptotic setting as described below. First, we introduce the quantum i.i.d. condition in order to treat an asymptotic setting. Suppose that n independent physical systems are given in the same state ρ, then the quantum state of the composite system is described by ρ⊗n , defined by ρ⊗n := ρ ⊗ · · · ⊗ ρ on H⊗n n

where the tensored space H⊗n is defined by H⊗n := H ⊗ · · · ⊗ H . n

We call this condition the quantum i.i.d. condition, which is a quantum analogue of the independent-identical-distribution condition. Under the quantum i.i.d. condition, the equation D(ρ⊗n σ ⊗n ) = nD(ρ σ) holds. Hiai-Petz [4] proved the following theorem. (For the infinitedimensional case, see Petz [7].) Theorem 1: Let k be the dimension of H and let σ and ρn be states on H and H⊗n , respectively. Then there exists a POVM M n on tensored space H⊗n such that (k − 1) log(n + 1) 1 D(ρn σ ⊗n ) − n n 1 Mn 1 ≤ D (ρn σ ⊗n ) ≤ D(ρn σ ⊗n ). n n

(2)

When the limit limn→∞ n1 D(ρn σ ⊗n ) converges, the equation 1 Mn 1 D (ρn σ ⊗n ) = lim D(ρn σ ⊗n ) n→∞ n n→∞ n lim

holds. Theorem 1 tells us that there exists a sequence {Mn } satisfying (2) which may depend on both {ρn } and σ. In this paper, using a representation theoretical argument on the representation of SL(H) on H⊗n , we prove that there exists a sequence {Mn } satisfying (2) which depends only on σ

entire

December 28, 2004

13:56


Masahito Hayashi

68

when the sequence {ρn } satisfies the quantum i.i.d. condition; i.e., the state ρn is ρ⊗n . The following is the main theorem. Theorem 2: Let k be the dimension of H and let σ be a state on H. Then there exists a POVM M n on the tensored space H⊗n which satisfies D(ρ σ) −

n 1 (k − 1) log(n + 1) ≤ DM (ρ⊗n σ ⊗n ) ≤ D(ρ σ) ∀ρ. n n

(3)

In section 2, we prove theorem 2 and discuss the difference between our proof and the proof of theorem 1 given by Hiai-Petz [4]. In section 3, we explain some results in representation theory which are necessary for the proof of theorem 2. In the end of section 3, we construct the POVM M n satisfying (3). In section 4, we extend theorem 2 to the infinite-dimensional case. If we perform the POVM M n satisfying (3), we can attain the quantum relative entropy D(ρ σ) w.r.t. the rate of the second error probability in the quantum hypothesis testing: the null hypothesis is ρ⊗n and the alternative is σ ⊗n . Theorem 2 claims that the POVM M n is independent of the alternative ρ. Therefore, the POVM M n is useful for the quantum hypothesis testing, in which the alternative hypothesis consists of plural tensored states. In addition, an application of theorem 2 to the quantum estimation will be discussed in another paper [3] in preparation. 2. Proof of the Main Theorem In this section, we will prove the main theorem after some discussions about PVMs and the quantum relative entropy in the non-asymptotic setting. We make some definitions for this purpose. A state ρ is called commutative with a PVM E(= {Ei }) on H if ρEi = Ei ρ for any index i. For PVMs E(= {Ei }i∈I ), F (= {Fj }j∈J ), the notation E ≤ F means that for any index i ∈ I there exists a subset (F/E)i of the index set J such that Ei = j∈(F/E)i Fj . For a state ρ, we denote E(ρ) by the spectral measure of ρ, which can be regarded as a PVM. The map EE with respect to a PVM E is defined as: Ei ρEi (4) EE : ρ )→ i

which is an affine map from the set of states to itself. Note that the state EE (ρ) is commutative with a PVM E. If a PVM F = {Fj } is commutative with a PVM E = {Ei }, then we can define the PVM F × E = {Fj Ei }, which satisfies that F × E ≥ E and F × E ≥ F .

entire

December 28, 2004

13:56



69

Theorem 3: Let E be a PVM such that w(E) := supi dim Ei < ∞. If states σ, ρ are commutative with the PVM E and a PVM F satisfies E ≤ F, E(σ) ≤ F , then we have D(ρ σ) − log w(E) ≤ D(EF (ρ) EF (σ)) ≤ D(ρ σ).

(5)

This theorem follows from lemma 5 and lemma 6 below. Using theorem 3 and the following lemma, we will prove the main theorem. Lemma 4: There exists a PVM E n on H⊗n which is commutative with ρ⊗n for any ρ and satisfies the relation w(E n ) ≤ (n + 1)(k−1) . Lemma 4 is proved in the next section from a representation theoretical viewpoint. Now, let E n be a PVM satisfying the condition given in lemma 4. Then there exists a PVM F n such that F n ≥ E n × E(σ ⊗n ) and that w(F n ) = 1. Using theorem 3, we have the following. D(ρ σ)−

(k − 1) log(n + 1) 1 ≤ D(EF n (ρ⊗n ) EF n (σ ⊗n )) ≤ D(ρ σ) ∀ρ.(6) n n

Since the condition w(F n ) = 1 implies D(EF n (ρ⊗n ) EF n (σ ⊗n )) = n DF (ρ⊗n σ ⊗n ), we obtain theorem 2. Let us compare the above argument with that of Hiai-Petz [4]. We first note that the inequality DF (ρ σ) ≤ D(EF (ρ) EF (σ)) holds for any PVM, but the equality, in general, does not hold unless w(F ) = 1. Instead of (6), Hiai-Petz proved the following: 1 (k − 1) log(n + 1) 1 D(ρn σ ⊗n ) − ≤ D(EE(σ⊗n ) (ρ⊗n ) EE(σ⊗n ) (σ ⊗n )) n n n 1 (7) ≤ D(ρn σ ⊗n ). n In the case when ρn = ρ⊗n , this is the same as (6) except that E(σ ⊗n ) is substituted for F n . Since w(E(σ ⊗n )) = 1 does not hold, however, D(EE(σ⊗n ) (ρ⊗n ) EE(σ⊗n ) (σ ⊗n )) cannot be replaced with ⊗n DE(σ ) (ρ⊗n σ ⊗n ) here. In other words, even though the PVM E(σ ⊗n ) does not depend on a state ρ, the inequality (7) does not imply the existence of a PVM M n depending only on σ and satisfying (3) for all ρ. Indeed, the PVM M n which was shown to satisfy (2) in [4] is not E(σ ⊗n ), but E(σ ⊗n ) × E(EE(σ⊗n ) (ρn )), which depends on ρn in general. Therefore, the discussion in Hiai-Petz [4] does not imply theorem 2. Now, we prove two lemmas used in a proof of theorem 3. The following lemma 5 is the same as lemma 3.1 in Hiai-Petz [4] and theorem 1.13 in

entire

December 28, 2004

13:56


Masahito Hayashi

70

Ohya-Petz [6]. However, lemma 5 is proved in the following because it plays a particularly important role in our proof of the main theorem. Lemma 5: Let ρ, σ be states. If a PVM F satisfies E(σ) ≤ F , then D(ρ σ) = D(EF (ρ) EF (σ)) + D(ρ EF (ρ)).

(8)

Proof: As E(σ) ≤ F and F is commutative with σ, we have Tr EF (ρ) log EF (σ) = Tr ρ log σ. Since ρ is commutative with log ρ, we have Tr EF (ρ) log ρ = Tr ρ log ρ. Therefore, we get the following: D(EF (ρ) EF (σ)) − D(ρ σ) = Tr EF (ρ)(log EF (ρ) − log EF (σ)) − Tr ρ(log ρ − log σ) = Tr EF (ρ)(log EF (ρ) − log ρ). This proves (8). Lemma 6: Let E, F be PVMs such that E ≤ F . If a state ρ is commutative with E, then we have D(ρ EF (ρ)) ≤ log w(E). Proof: Let ai := Tr Ei ρEi , ρi := a1i Ei ρEi . Then we have ρ = EF (ρ) = i ai EF (ρi ), i ai = 1. Therefore, D(ρ EF (ρ)) = =

(9) i

ai ρi ,

Tr Ei ρ(log ρ − log EF (ρ))

i

Tr Ei ρEi (Ei log ρEi − Ei log EF (ρ)Ei )

i

=

ai D(ρi EF (ρi )) ≤ sup D(ρi EF (ρi )) i

i

= sup (Tr ρi log ρi − Tr EF (ρi ) log EF (ρi )) i

≤ − sup Tr EF (ρi ) log EF (ρi ) ≤ sup log dim Ei = log w(E). i

i

Thus, we obtain inequality (9). It is interesting to compare (9) with lemma 3.2 in Hiai-Petz [4], which was used to show (7); i.e., D(ρ EF (ρ)) ≤ log h(F ) where h(F ) denotes the number of indices i ∈ I for the PVM F = {Fi }i∈I .

entire

December 28, 2004

13:56



71

3. Quantum I.I.D. Condition from Group Theoretical Viewpoint In this section, we discuss the quantum i.i.d. condition from a group theoretical viewpoint for the purpose of lemma 4. In section 3.1, we consider the relation between irreducible representations and PVMs. In section 3.2, we discuss the quantum i.i.d. condition and PVMs from a theoretical viewpoint. 3.1. Group representation and its irreducible decomposition Let V be a finite-dimensional vector space over the complex numbers C. A map π from a group G to the generalized linear group of a vector space V is called a representation on V if the map π is homomorphism, i.e., π(g1 )π(g2 ) = π(g1 g2 ), ∀g1 , g2 ∈ G. A subspace W of V is called invariant with respect to a representation π if the vector π(g)w belongs to the subspace W for any vector w ∈ W and any element g ∈ G. A representation π is called irreducible if there is no proper nonzero invariant subspace of V with respect to π. Let π1 and π2 be representations of a group G on V1 and V2 , respectively. The tensored representation π1 ⊗ π2 of G on V1 ⊗ V2 is defined as (π1 ⊗ π2 )(g) = π1 (g) ⊗ π2 (g), and the direct sum representation π1 ⊕ π2 of G on V1 ⊕ V2 is also defined as (π1 ⊕ π2 )(g) = π1 (g) ⊕ π2 (g). In the following, we treat a representation π of a group G on a finitedimensional Hilbert space H. The following facts are crucial in the later arguments. There exists an irreducible decomposition H = H1 ⊕ · · · ⊕ Hl such that the irreducible components are orthogonal to one another if for any element g ∈ G there exists an element g ∗ ∈ G such that π(g)∗ = π(g ∗ ), where π(g)∗ denotes the adjoint of the linear map π(g). We can regard the irreducible decomposition H = H1 ⊕ · · · ⊕ Hl as the PVM {PHi }li=1 , where PHi denotes the projection to Hi . If two representations π1 , π2 satisfy the preceding condition, then the tensored representation π1 ⊗ π2 also satisfies it. Note that, in general, an irreducible decomposition of a representation satisfying the preceding condition is not unique. In other words, we cannot uniquely define the PVM from such a representation. 3.2. Relation between the tensored representation and PVMs Let the dimension of the Hilbert space H be k. Concerning the natural representation πSL(H) of the special linear group SL(H) on H, we consider

entire

December 28, 2004

72

13:56


Masahito Hayashi

⊗n its nth tensored representation πSL(H) := πSL(H) ⊗ · · · ⊗ πSL(H) on the n

tensored space H⊗n . For any element g ∈ SL(H), the relation πSL(H) (g)∗ = πSL(H) (g ∗ ) holds, where the element g ∗ ∈ SL(H) denotes the adjoint matrix of the matrix g. Consequently, there exists an irreducible decomposition of ⊗n regarded as a PVM and we denote the set of such PVMs by Ir⊗n . πSL(H) From the Weyl dimension formula ((7.1.8) or (7.1.17) in GoodmanWallch [2]), the nth symmetric tensored space is the maximum-dimensional space in the irreducible subspaces with respect to the nth tensored repre⊗n . Its dimension is equal to the repeated combination k Hn sentation πSL(H) & %n+k−1& % = = n+1 Hk−1 ≤ (n + 1)k−1 . Thus, evaluated by k Hn = n+k−1 k−1 n any element E n ∈ Ir⊗n satisfies that w(E n ) ≤ (n + 1)k−1 .

(10)

Lemma 7: A PVM E n ∈ Ir⊗n is commutative with the nth tensored state ρ⊗n of any state ρ on H. Proof: If det ρ "= 0, then this lemma is trivial from the fact that det(ρ)−1 ρ ∈ SL(H). If det ρ = 0, there exists a sequence {ρi }∞ i=1 such that det ρi "= 0 ⊗n → ρ as i → ∞. Because a PVM and ρi → ρ as i → ∞. We have ρ⊗n i ⊗n ⊗n n E ∈ Ir is commutative with ρi , it is, also, commutative with ρ⊗n . Now, we can see that lemma 4 follows from lemma 7 and (10). From the above discussion, if the PVM M n satisfies w(M n ) = 1 and E(σ ⊗n ) × E n ≤ M n for some E n ∈ Ir⊗n , the inequality (3) holds. In many cases, the relation w(E(σ ⊗n ) × E n ) = 1 holds. Therefore, in these cases, the PVM E(σ ⊗n ) × E n satisfies (3). For example, if the eigenvalues of σ are rationally independent of each other, the relation holds for arbitrary n. Also, in the spin-1/2 system except σ = 12 Id, the relation holds. In this case, the POVM E(σ ⊗n ) × E n can be regarded as a simultaneous measurement of the total momentum and the momentum of the direction specified by σ.

4. Infinite-Dimensional Case Next, we prove an infinite-dimensional version of theorem 2. Let B(H) be the set of bounded operators on H, and B(H)⊗n be B(H) ⊗ · · · ⊗ B(H). Ac n

cording to [6], from the separability of H, there exists a finite-dimensional approximation of H, i.e., a sequence {αn : B(Hn ) → B(H)} of unit-

entire

December 28, 2004

13:56



73

preserving completely positive maps such that Hn is finite-dimensional and lim D(α∗n (ρ) α∗n (σ)) = D(ρ σ)

n→∞

(11)

for any states ρ, σ on H such that µσ ≤ ρ ≤ λσ for some positive real numbers µ, λ, where α∗n is the adjoint of αn . From (3) and (11), for any positive integer n there exists a pair (ln , M n := {Min }) of an integer and a PVM on Hn⊗ln such that ' & n % (α∗n (ρ))⊗ln '(α∗n (σ))⊗ln DM 1 ∗ ∗ D(αn (ρ) αn (σ)) − (12) < . ln n ⊗l The completely positive map α⊗l to B(H)⊗l is defined as n from B(Hn ) ⊗l αn (A1 ⊗%A2 ⊗& · · · ⊗ Al ) = αn (A1 ) ⊗ αn (A2 ) ⊗ · · · ⊗ αn (Al ) for ∀Ai ∈ B(H). ∗ n n (ρ⊗l ) = α∗n (ρ)⊗l . Let M n := {α⊗l We have α⊗l n (Mi )}, then from (11), n (12) we obtain

% & % ⊗l &' &∗ % ⊗l & '% Mn ⊗ln ∗ n α ρ n ' α⊗l σ n M n ⊗ln ⊗ln D n n D (ρ σ ) = ln ln ' & n % α∗n (ρ)⊗ln ' α∗n (σ)⊗ln DM = ln 1 ∗ ∗ > D (αn (ρ) αn (σ)) + → D(ρ σ) as n → ∞. n Therefore, we obtain an infinite-dimensional version of theorem 2. Note that such a POVM M n is independent of ρ.

5. Conclusions It is proved that the quantum relative entropy D(ρ σ) is attained by the sequence of the relative entropies given by a certain sequence of PVMs which is independent of ρ. This formula is closely related to the quantum asymptotic detection. The physical realization of the sequence of measurements corresponding to PVMs satisfying (3) are left for future study. In the spin-1/2 system, it follows from the representation theoretical viewpoint in section 3 that it is enough to simultaneously measure the total momentum and the momentum of the direction specified by σ. Acknowledgments The author wishes to thank Professor H. Nagaoka for encouragement and useful advice to complete this manuscript. He is also grateful to thank Professor D. Petz for essential suggestions about the extension in the infinite-

entire

December 28, 2004

13:56


Masahito Hayashi

74

dimensional case. He is indebted to Professor A. Fujiwara and Dr. M. Ishii for several discussions on this topic. References [1] A. Fujiwara and H. Nagaoka, IEEE Trans. Inform. Theory, 44, 1071–1086 (1998). [2] R. Goodman and N. Wallch, Representations and Invariants of the Classical Groups, Cambridge University Press, 1998. [3] M. Hayashi, in preparation∗ . [4] F. Hiai and D. Petz, Commun. Math. Phys., 143, 99-114 (1991). (Chap. 3 of this book) [5] G. Lindblad, Commun. Math. Phys., 40, 147 (1975). [6] M. Ohya and D. Petz, Quantum Entropy and Its Use, Springer-Verlag, Berlin, 1993. [7] D. Petz, J. Funct. Anal., 120, 82–97 (1994). [8] D. Petz, “Properties of quantum entropies,” in L. Accardi and W. von Waldenfels, eds., Quantum Probability and Application II (Lecture Notes in Mathematics), 1136, 428-441, Springer, Berlin, 1985. [9] A. Uhlmann, Commun. Math. Phys. 54, 21 (1977).

∗ Editorial

book.

note: The paper [3] has been published, and is reprinted in Chap. 28 of this

entire

December 28, 2004

13:56


CHAPTER 6

Quantum Birthday Problems: Geometrical Aspects of Quantum Random Coding Akio Fujiwara Abstract. This correspondence explores asymptotics of randomly generated vectors on extended Hilbert spaces. In particular, we are interested to know how “orthogonal” these vectors are. We investigate two types of asymptotic orthogonality, the weak orthogonality and the strong orthogonality, that are regarded as quantum analogs of the classical birthday problem and its variant. As regards the weak orthogonality, a new characterization of the von Neumann entropy is derived, and a mechanism behind the noiseless quantum channel coding theorem is clarified. As regards the strong orthogonality, on the other hand, a characterization of the quantum Rényi entropy of degree 2 is derived. Index Terms: Asymptotic orthogonality, birthday problem, quantum channel coding, quantum information theory, random vector, R´ enyi entropy, von Neumann entropy.

1. Introduction Let H be a Hilbert space and let p be a probability measure on H the support of which is a countable set X of unit vectors. We assume that |X | ≥ 2 and that vectors in X are not parallel to each other. Associated with the probability measure p is the density operator p(φ) |φφ|, (1) ρ := φ∈X

where, and in what follows, we use the Dirac notation. Let {Xk (i)}ki be X -valued random variables independent and identically distributed (i.i.d.) with respect to p, and let {Ln }n be an increasing sequence of natural numbers. For each n ∈ N , we define Ln random vectors {Ψ(n) (i)}1≤i≤Ln on H⊗n by Ψ(n) (i) := X1 (i) ⊗ X2 (i) ⊗ · · · ⊗ Xn (i),

(i = 1, ..., Ln ).

c 2001 IEEE. Reprinted, with permission, from IEEE Trans. Inform. Theory, 47, 6, 2644-2649, 2001. 75

entire

December 28, 2004

13:56


Akio Fujiwara

76

We denote the inner product of the ith and jth vectors by n (n)

gij := Ψ(n) (i)|Ψ(n) (j) =

Xk (i)|Xk (j).

(2)

k=1 (n)

Note that gii = 1 for all n and 1 ≤ i ≤ Ln , and that, for fixed i and j (n) (i "= j), gij converges to 0 almost surely as n → ∞. Thus it is natural to inquire how “orthogonal” those random vectors are. For later convenience, we denote the ordered list of the Ln random vectors by C (n) . To motivate our problem, let us consider the special case when X forms (n) an orthonormal system. In this case, gij = 0 if Xk (i) "= Xk (j) for some k, (n)

and gij = 1 otherwise. Put differently, the Gram matrix  (n)  (n) g11 · · · g1Ln  . ..   G(n) :=  .   .. (n) (n) gLn 1 · · · gLn Ln gives a yes/no table indicating whether the ith n-tuple (X1 (i), ..., Xn (i)) (n) (n) and the jth n-tuple (X1 (j), ..., Xn (j)) are identical (gij = 1) or not (gij = 0). As a consequence, orthogonality problems for the random vectors are reduced to combinatorial ones when X is orthonormal. In his paper [8], Rényi posed several combinatorial problems that can be regarded as asymptotic versions of the classical birthday problem (cf. [1]) and its variants, and characterized classical entropies. From among these, let us recast two problems in terms of orthogonalities of random vectors. Let X be orthonormal. We say that C (n) satisfies a weak orthogonality condition with respect to the ith vector if the event (n)

Ei

:= {the ith vector Ψ(n) (i) is orthogonal to the other vectors in C (n) }

occurs, and that C (n) satisfies a strong orthogonality condition if the event F (n) := {the vectors in C (n) are mutually orthogonal} occurs. We are interested to know how fast can Ln be increased under the (n) condition that the probability P (Ei ) for some (then any) i, or P (F (n) ), tends to 1 as n → ∞. Let Cw (p) [resp., Cs (p)] be the supremum of (n) lim supn→∞ log Ln /n over all sequences {Ln }n that satisfy P (Ei ) → 1 [resp., P (F (n) ) → 1]. We may call Cw (p) [resp., Cs (p)] the weak [resp., strong] orthogonality capacity of the probability measure p. Since we are now dealing with a probability measure p that has an orthonormal support X , the problems are essentially combinatorial, and it is not too difficult to

entire

December 28, 2004

13:56


Quantum Birthday Problems: Geometrical Aspects of Quantum Random Coding

77

show that Cw (p) = H(p) and Cs (p) = H2 (p)/2, where H(p) and H2 (p) are the Shannon entropy and the Rényi entropy [8] of degree 2 with respect to the probability measure p. Let us now return to the general case when X is not necessarily orthonormal. Although the vectors in C (n) may not be strictly orthogonal in this case, it would be quite possible that they are “almost” orthogonal for sufficiently large n. It is, therefore, expected that quantum entropies might be characterized via asymptotic properties of the set C (n) of random vectors (as Rényi did for classical entropies via combinatorics). The purpose of this paper is to extend the notions of weak and strong orthogonality of random vectors to a general probability measure p, and to determine the corresponding capacities. In fact, with proper definitions of “asymptotic” orthogonalities, it is shown in Theorems 1 and 4 that the weak orthogonality capacity Cw (p) is given by the von Neumann entropy H(ρ) := −Tr ρ log ρ, and the strong orthogonality capacity Cs (p) is given by half the quantum Rényi entropy of degree 2 H2 (ρ) := − log Tr ρ2 , where the probability measure p and the density operator ρ are connected by (1). Since these results obviously generalize the above mentioned classical characterizations by Rényi, our problems may be called quantum birthday problems. It should be emphasized that each capacity depends only on the density operator ρ, so that, for probability measures p and q which give the same density operator, Cw (p) = Cw (q) and Cs (p) = Cs (q) hold. The orthogonality of vectors in C (n) is closely related to their “distinguishability” in quantum measurement theory. Let a physical system of interest be represented by the Hilbert space H⊗n , and let a unit vector in H⊗n correspond to a quantum pure state. When X is orthonormal, any two vectors in C (n) are either orthogonal or identical, so that there is a quantum-mechanical measurement on H⊗n that distinguishes distinct vectors in C (n) with probability one. In this sense, strict orthogonality implies strict distinguishability by a certain measurement. As a matter of fact, this corresponds to the classical situation: unlimited distinguishability for distinct objects is precisely the central dogma of the classical theory, and one can restore Rényi’s original problems by replacing the word(s) “orthogonal (n) (to)” with “distinct (from)” in the above definitions of the events Ei and F (n) . When X is not orthonormal, on the other hand, distinct vectors in

entire

December 28, 2004

13:56


Akio Fujiwara

78

C (n) are not necessarily orthogonal, so that they are not always strictly distinguishable in the sense of quantum mechanics. We therefore have to deal with, so to say, “asymptotic distinguishability” of random vectors. In fact, we will clarify a close connection between asymptotic orthogonality and the noiseless quantum channel coding problem. 2. Weak Orthogonality Let us start with a preliminary consideration as to what is the proper extension of the notion of weak orthogonality to the case when X is not necessarily orthonormal. If the ith vector in C (n) is “almost” orthogonal to the other (n) vectors in C (n) , then the inner products gij , (j = 1, ..., Ln , j "= i), must all be sufficiently small. Thus the proper extension of weak orthogonality (n) might be such that the random variables gij converge to 0 simultaneously for all j("= i) as n → ∞ in a certain mode of convergence. For example, the condition that (n)

Yi

:=

Ln

(n)

|gij |2 → 0

in probability

(3)

j(=i)

might be a candidate. However, in anticiption of a characterization of the von Neumann entropy as well as a relationship with the quantum channel coding problem, we adopt a slightly different approach. (In fact, it is shown in Appendix A that the condition (3) does not characterize the von Neumann entropy.) For each n, let L(n) be a subspace of H⊗n and let ΠL(n) be the projection operator onto the subspace L(n) . Given a pair (C (n) , L(n) ), let us denote the inner product of the projected ith and jth vectors by (n)

gîj := ΠL(n) Ψ(n) (i)|ΠL(n) Ψ(n) (j), and define random variables (n) Yî :=

Ln

(n)

|ˆ gij |2 .

j(=i)

We say that a sequence {C (n) }n satisfies the asymptotic weak orthogonality condition if there is a sequence {L(n) }n of subspaces such that the following conditions are satisfied for all i: (n) (i) gîi → 1 in probability, (n) (ii) Yî → 0 in probability.

entire

December 28, 2004

13:56



79

Some remarks are in order. Condition (i) implies that, for sufficiently large n, the ith vector Ψ(n) (i) will be almost parallel to L(n) , so that the projected ith vector ΠL(n) Ψ(n) (i) be almost identical to the original vector Ψ(n) (i). Condition (ii) implies that, for sufficiently large n, all the vectors Ψ(n) (j), (j "= i), will be simultaneously almost orthogonal to the projected ith vector ΠL(n) Ψ(n) (i). It should be emphasized that the choice of subspaces {L(n) }n is independent of the index i. Theorem 1: (von Neumann entropy as the weak orthogonality capacity) Given a probability measure p, let Cw (p) be the supremum of lim supn→∞ log Ln /n over all sequences {C (n) }n that satisfy the asymptotic weak orthogonality condition. Then Cw (p) = H(ρ), where H(ρ) is the von Neumann entropy for the density operator (1). Before proceeding to the proof, we mention a close connection between Theorem 1 and the noiseless quantum channel coding theorem [4]. Let us regard C (n) as a quantum random codebook. Given a vector (a quantum codeword) in C (n) , our task is to estimate, by means of a certain measurement, which vector among C (n) is the actual one. Associated with a codebook C (n) and a subspace L(n) is the Gram operator G :=

Ln

|ΠL(n) Ψ(n) (j)ΠL(n) Ψ(n) (j)|.

j=1

The operator G is strictly positive on the subspace Lˆ(n) := Span{ΠL(n) Ψ(n) (j); 1 ≤ j ≤ Ln }. Let the operator G −1 be the inverse of G on Lˆ(n) and zero on the orthogonal complement Lˆ(n)⊥ . According to [4], we introduce a measurement M (n) by   Ln   M (n) := |ˆ µ(Ln )|, I − |ˆ µ(j)ˆ µ(j)| , (4) µ(1)ˆ µ(1)|, . . . , |ˆ µ(Ln )ˆ   j=1

where µ ˆ(j) are vectors on Lˆ(n) defined by µ ˆ(j) := G −1/2 ΠL(n) Ψ(n) (j) = G −1/2 Ψ(n) (j). We can regard M (n) as a decoder for the codebook C (n) , in which the ith entry |ˆ µ(i)ˆ µ(i)|, (1 ≤ i ≤ Ln ), corresponds to the ith codeword in C (n) , and the (Ln + 1)st entry to the wild card. (Note that the decoder (4) with the special choice of a subspace L(n) = H⊗n was introduced by Holevo [5]).

entire

December 28, 2004

13:56


Akio Fujiwara

80

The idea for adopting the decoder (4) is this: when the ith vector Ψ(n) (i) is strictly orthogonal to the other vectors in C (n) , then the Gram operator G with L(n) := H⊗n is decomposed into the orthogonal direct sum |Ψ(n) (j)Ψ(n) (j)|, G = |Ψ(n) (i)Ψ(n) (i)| ⊕ j=i

so that

 G −1/2 = |Ψ(n) (i)Ψ(n) (i)| ⊕ 

−1/2 |Ψ(n) (j)Ψ(n) (j)|

,

j=i

and µ ˆ(i) = G −1/2 Ψ(n) (i) = Ψ(n) (i). (n)

As a consequence, the decoding error probability Pe (i) for the ith codeword Ψ(n) (i) by the decoder (4) is µ(i)ˆ µ(i)| = 0. Pe(n) (i) = 1 − Tr |Ψ(n) (i)Ψ(n) (i)|ˆ Thus it is expected that, when the ith vector is almost orthogonal to the (n) other vectors in C (n) , the decoding error probability Pe (i) will be small. In fact, this expectation is verified by the following (n) (n) Lemma 2: If |ˆ gii |2 > 1 − ε and Yî < ε hold for some i and 0 < ε < 1, (n) then the decoding error probability Pe (i) for the ith codeword Ψ(n) (i) by the decoder (4) is upper bounded by 32 ε.

Proof: See Appendix B. Corollary 3: If {C (n) }n satisfies asymptotic weak orthogonality condition, (n) (n) then E[Pe ] → 0 as n → ∞, where Pe is the average decoding error probability for the code (C (n) , M (n) ), and E[ · ] denotes the expectation. (n)

Proof: Let the event Ei (ε) be defined by (n) (n) (n) Ei (ε) := {|ˆ gii |2 > 1 − ε and Yî < ε}.

By the assumption of asymptotic weak orthogonality, for all ε > 0 and (n) δ > 0, there is an N such that for all n ≥ N and i, P (Ei (ε)) > 1 − δ holds. Then (n)

(n)

E[Pe(n) (i)] = E[Pe(n) (i); Ei (ε)] + E[Pe(n) (i); Ei (ε)c ] 3 (n) (n) ≤ E[Pe(n) (i); Ei (ε)] + P (Ei (ε)c ) < ε + δ. 2

entire

December 28, 2004

13:56



81

2 Here, E[X; A] := A X dP , and Lemma 2 is used in the last inequality. Since this upper bound is independent of i, we have E[Pe(n) ] =

Ln 1 3 E[Pe(n) (i)] < ε + δ. Ln i=1 2

This completes the proof. Theorem 1 and Corollary 3 clarify why the decoder of the type (4) has fitted to the random coding technique in the proof of the direct part of the noiseless quantum channel coding theorem [4]. The notion of asymptotic weak orthogonality for the random codebook C (n) thus explicates the physical implication of the probabilistic distinguishability among codewords in the quantum channel coding problem as well as the geometrical mechanism behind the decoder (4). Proof of Theorem 1: We first prove the direct part Cw (p) ≥ H(ρ). Fix an arbitrarily small positive constant δ and, for each n, let Ln be such that Ln < en(H(ρ)−4δ) . We show that there is a sequence {L(n) }n of subspaces for which the asymptotic weak orthogonality conditions (i) and (ii) hold for all i. The idea of the proof is similar to [4]: we take L(n) to be the (n) δ-typical subspace Λδ with respect to the density ρ. (For the reader’s convenience, the definition and the basic properties of the δ-typical subspace are summarized in Appendix C.) Since (n)

gîi = ΠΛ(n) Ψ(n) (i)|ΠΛ(n) Ψ(n) (i) = Tr |Ψ(n) (i)Ψ(n) (i)|ΠΛ(n) , δ

δ

δ

we have E[ˆ gii ] = Tr ρ⊗n ΠΛ(n) > 1 − δ (n)

δ

for all i and all sufficiently large n (see (9)). This proves (i). On the other hand, for all j("= i), (n)

|ˆ gij |2 = |ΠΛ(n) Ψ(n) (i)|ΠΛ(n) Ψ(n) (j)|2 δ

δ

= Tr ΠΛ(n) |Ψ(n) (i)Ψ(n) (i)|ΠΛ(n) |Ψ(n) (j)Ψ(n) (j)|ΠΛ(n) , δ

δ

δ

so that E|ˆ gij |2 = Tr ΠΛ(n) ρ⊗n ΠΛ(n) ρ⊗n ΠΛ(n) = Tr (ρ⊗n )2 ΠΛ(n) ≤ e−n(H(ρ)−3δ) , (n)

δ

δ

δ

δ

(see (10)), and (n)

E[Yî

] ≤ (Ln − 1) e−n(H(ρ)−3δ) < e−nδ .

entire

December 28, 2004

13:56


Akio Fujiwara

82

(n)

Thus Yî → 0 in L1 for all i, proving (ii). This completes the proof of the direct part Cw (p) ≥ H(ρ). We next prove the converse part Cw (p) ≤ H(ρ). Let X be the random variable uniformly distributed over C (n) and let Y be the random variable representing the outcome of the corresponding decoder M (n) defined by (4). Then by virtue of Fano’s inequality, log 2 + Pe(n) log Ln ≥ H(X|Y ) = H(X) − I(X : Y ) ' ! Ln Ln ' 1 1 ' = log Ln − D (n) |Ψ(n) (j)Ψ(n) (j)| ' |Ψ(n) (k)Ψ(n) (k)| ' Ln Ln j=1 M k=1 ' ! Ln Ln ' 1 ' 1 (n) (n) (n) (n) ≥ log Ln − D |Ψ (j)Ψ (j)| ' |Ψ (k)Ψ (k)| ' Ln Ln j=1 k=1 ! Ln 1 (n) (n) |Ψ (k)Ψ (k)| . = log Ln − H Ln k=1

Here D(σ τ ) := Tr σ(log σ − log τ ) is the quantum relative entropy between the quantum states σ and τ (with supp σ ⊂ supp τ ), and DM (σ τ ) denotes the classical Kullback-Leibler divergence between the probability distributions p(·) := Tr σM (·) and q(·) := Tr τ M (·) over the outcomes of the measurement M . (See [3] for notations.) The second inequality is due to the familiar monotonicity relation of the relative entropy (Theorem 1.5 [7]). Now taking the expectation for the above inequality, and using the concavity of the von Neumann entropy, we have 3 !4 Ln 1 (n) (n) (n) |Ψ (k)Ψ (k)| log 2 + E[Pe ] log Ln ≥ log Ln − E H Ln k=1 ! Ln 1 (n) (n) ≥ log Ln − H E |Ψ (k)Ψ (k)| Ln k=1

= log Ln − H(ρ⊗n ) = log Ln − nH(ρ). Therefore log 2 log Ln ≤ H(ρ) + . n n Thus in order to assure the asymptotic weak orthogonality (so that (n) E[Pe ] → 0 as n → ∞ by Corollary 3), lim supn log Ln /n must be less than or equal to H(ρ). This completes the proof of the converse part Cw (p) ≤ H(ρ). (1 − E[Pe(n) ])

entire

December 28, 2004

13:56



83

3. Strong Orthogonality If the vectors in C (n) are mutually strictly orthogonal, then the Gram matrix (n) (n) G(n) = [gij ], where gij is the inner product (2), is reduced to the identity. Therefore if they are mutually “almost” orthogonal, the Gram matrix is expected to be close to the identity. This observation prompts us to define the strong orthogonality as follows. Given C (n) , let the random variable Z (n) be defined by the squared sum of off-diagonal elements of G(n) , i.e., Z (n) :=

Ln Ln

(n)

|gij |2 .

i=1 j(=i)

We say that a sequence {C (n) }n satisfies the asymptotic strong orthogonality condition if the following two conditions are satisfied: (i) Z (n) → 0 in probability, (ii) the sequence {Z (n) }n is uniformly integrable. Theorem 4: (Quantum Rényi entropy as the strong orthogonality capacity) Given a probability measure p, let Cs (p) be the supremum of lim supn→∞ log Ln /n over all sequences {C (n) }n that satisfy asymptotic strong orthogonality condition. Then Cs (p) = 12 H2 (ρ), where H2 (ρ) is the quantum Rényi entropy of degree 2 for the density operator (1). Proof: For i "= j, (n)

|gij |2 = |Ψ(n) (i)|Ψ(n) (j)|2 = Tr (|Ψ(n) (i)Ψ(n) (i)|)(|Ψ(n) (j)Ψ(n) (j)|), so that E|gij |2 = Tr (ρ⊗n )2 = (Tr ρ2 )n = e−nH2 (ρ) , (n)

and E[Z (n) ] = Ln (Ln − 1) e−nH2 (ρ) . Let δ > 0 be an arbitrarily small positive consitant. If Ln < en(H2 (ρ)/2−δ) then E[Z (n) ] → 0, and if Ln > en(H2 (ρ)/2+δ) then E[Z (n) ] → ∞. This completes the proof. Several remarks are in order. If {C (n) }n satisfies condition (i) for asymptotic strong orthogonality, then in a similar way to Lemma 2, it can be (n) shown that the decoding error probabilities {Pe (i)}1≤i≤Ln by the de (n) Ln Pe (i) → 0 in L1 as n → ∞. coder (4) with L(n) := H⊗n exhibit i=1 (Compare this with Corollary 3.) The notion of asymptotic strong orthogonality thus leads to a new, strong type of probabilistic distinguishability

entire

December 28, 2004

13:56


Akio Fujiwara

84

in quantum measurement theory. It is not clear whether there is a coding theorem in which this strong distinguishability playes a pivotal role. A related question whether the converse part of Theorem 4 holds without the uniform integrability condition (ii) is also still open. 4. Conclusions We have shown that asymptotic orthogonalities of random vectors lead us to new, geometrical, characterizations of the von Neumann entropy and the quantum Rényi entropy of degree 2. These characterizations are closely related to the distinguishability of the vectors by quantum mechanical measurements. In particular, a mechanism behind the random coding technique for the noiseless quantum channel coding theorem was clarified. Appendix A. Remark on Condition (3) In this appendix, we show that condition (3) does not characterize the von Neumann entropy. Let φ0 , φ1 be unit vectors in H with a := |φ0 |φ1 |2 < 1, and let p be the probability measure on H such that p(φ0 ) = p(φ1 ) = 12 . The nonzero eigenvalues of the corresponding density operator ρ = 12 |φ0 φ0 | + √ 1 a)/2, so the quantum Rényi entropy of degree 2 is 2 |φ1 φ1 | are (1 ± 1+a . 2 Let {Xk (i)}ki be {φ0 , φ1 }-valued random variables i.i.d. with respect to p, and let Ψ(n) (i) = X1 (i) ⊗ · · · ⊗ Xn (i). The squared norm of the inner (n) product gij then becomes h2 := H2 (ρ) = − log

n (n)

|gij |2 =

|Xk (i)|Xk (j)|2 = aNij , k=1

where Nij is the number of indices k for which Xk (i) "= Xk (j). Now fix a number i arbitrarily. Then it is easily shown that for each n, (n) 2 {|gij | ; j ∈ N , j "= i} are i.i.d. random variables, each taking the value a7 n −n with probability 2 , where I = 0, ..., n. In particular, they have the I expectation 1 + a n (n) m(n) := E |gij |2 = = e−nh2 , 2 and the variance 1 + a2 n 1 + a 2n (n) v (n) := V |gij |2 = − . 2 2

entire

December 28, 2004

13:56



85

We claim the following. Proposition 5:

Let ε be a positive constant and let (n)

Yi

=

Ln

(n)

|gij |2 .

j(=i)

If Ln < en(h2 −ε) , then Yi converges to 0 in probability as n → ∞, and if (n) Ln > en(h2 +ε) , then Yi does not. (n)

Proof: Assume first that Ln < en(h2 −ε) . Then E[Yi ] = (Ln − 1) m(n) < (n) e−nε → 0 as n → ∞, proving that Yi → 0 in probability. To prove the second part, we use the following inequality: 1 4 (n) (n) , (5) < P Yi < (Ln − 1) m 2 (Ln − 1) m(n) (n)

which is verified as follows. (L − 1)m(n) (Ln − 1)m(n) (n) n (n) P Yi < ≤ P Yi −(Ln −1) m(n) > 2 2 2 (n) 4 v 4 2 (n) V [Yi ] = · (n) < . < (n) (n) (Ln − 1) m (Ln − 1) m m (Ln − 1) m(n) Now assume that Ln > en(h2 +ε) . Then (Ln − 1) m(n) > enε − m(n) → ∞ as (n) n → ∞. This fact and the inequality (5) together prove that Yi does not converge to 0 in probability. According to Proposition 5, it is h2 (the quantum Rényi entropy of de(n) gree 2) that characterizes the asymptotic behavior of Yi in this example. As a consequence, one cannot characterize the von Neumann entropy as the capacity for condition (3) in general. Remark 6: If the definition of the asymptotic weak orthogonality is such (n) (n) that Yi → 0 in probability and the sequence {Yi }n is uniformly integrable for all i, then it can be shown in a similar way to Theorem 4 that Cw (p) = H2 (ρ) for a general probability measure p. Appendix B. Proof of Lemma 2 Due to the symmetry, it suffices to consider the case when i = 1. Let {ˆ ek }k be a complete orthonormal system (CONS) of the finite dimensional subspace Lˆ(n) with eˆ1 :=

ΠL(n) Ψ(n) (1) .

ΠL(n) Ψ(n) (1)

entire

December 28, 2004

13:56


Akio Fujiwara

86

The (1, 1)th matrix element of G −1/2 with respect to the CONS {ˆ ek }k is

G −1/2

11

:= ˆ e1 |G −1/2 eˆ1 = =

ΠL(n) Ψ(n) (1)|G −1/2 ΠL(n) Ψ(n) (1)

ΠL(n) Ψ(n) (1) 2

Ψ(n) (1)|G −1/2 Ψ(n) (1) .

ΠL(n) Ψ(n) (1) 2

(6)

(n)

The error probability Pe (1) for the first codeword Ψ(n) (1) in C (n) with respect to the decoder (4) is evaluated as µ(1)|2 Pe(n) (1) = 1 − |Ψ(n) (1)|ˆ = 1 − ΠL(n) Ψ(n) (1) 4 G −1/2

= 1 − |Ψ(n) (1)|G −1/2 Ψ(n) (1)|2 2 −1 ≤ 1 − ΠL(n) Ψ(n) (1) 4 (G11 ) .(7)

11

Here G11 stands for the (1, 1)th matrix element of G, and we have used (6) and the inequality

−1/2 ≥ (G11 ) , G −1/2 11

which is verified by Lemma 7 below. On the other hand, G11 = |ˆ e1 |ΠL(n) Ψ(n) (1)|2 + |ˆ e1 |ΠL(n) Ψ(n) (j)|2 j≥2

1 = ΠL(n) Ψ(n) (1) 2 + |ΠL(n) Ψ(n) (1)|ΠL(n) Ψ(n) (j)|2

ΠL(n) Ψ(n) (1) 2 j≥2 ! (n) (1)|ΠL(n) Ψ(n) (j)|2 j≥2 |ΠL(n) Ψ (n) 2 = ΠL(n) Ψ (1) 1 +

ΠL(n) Ψ(n) (1) 4 1 ε . (8) < ΠL(n) Ψ(n) (1) 2 1 + = ΠL(n) Ψ(n) (1) 2 1−ε 1−ε Substituting (8) into (7), we have Pe(n) (1) < 1 − ΠL(n) Ψ(n) (1) 2 (1 − ε) < 1 − (1 − ε)3/2
1, and (Am )11 ≤ otherwise.

)m

entire

December 28, 2004

13:56



Proof: Let A=

87

λk Ek

k

be the spectral decomposition. Then A−1/2 = (λk )−1/2 Ek , k

so that −1/2

(A

)11 =

−1/2

(λk )

e1 |Ek e1 ≥

k

!−1/2 λk e1 |Ek e1

= (A11 )−1/2 .

k

Here we have used Jensen’s inequality and the fact that e1 |Ek e1 ≥ 0 for all k and k e1 |Ek e1 = 1. Appendix C. Typical Subspaces In this appendix, we give a brief account of the so-called typical subspace (cf. [9, 6, 4]). Given a density operator ρ on H, let ρ= λj Ej j∈J

be a Schatten decomposition, where λj > 0 for all j ∈ J and j λj = 1. Note that λ := (λ1 , λ2 , ...) is naturally regarded as a probability distribution on the index set J such that λ(j) = λj . A Schatten decomposition of ρ⊗n is given by ρ⊗n = (λj1 · · · λjn )(Ej1 ⊗ · · · ⊗ Ejn ). (j1 ,...,jn )∈J n

Obviously, the eigenvalues of ρ⊗n form a probability distribution λn , the i.i.d. extension of λ, on the set J n . Given a density operator ρ and a positive constant δ, an eigenvalue (λj1 · · · λjn ) of ρ⊗n is called δ-typical if the sequence j1 , ..., jn of indices is δ-typical (p. 51 [2]) with respect to the probability distribution λn , that is, if e−n(H(ρ)+δ) ≤ λn (j1 , ..., jn ) ≤ e−n(H(ρ)−δ) . It follows that for all sufficiently large n, (a) e−n(H(ρ)+δ) ≤ (a δ-typical eigenvalue) ≤ e−n(H(ρ)−δ) , (b) (the sum of δ-typical eigenvalues) > 1 − δ, (c) (1 − δ)en(H(ρ)−δ) ≤ (the number of δ-typical eigenvalues) ≤ en(H(ρ)+δ) .

entire

December 28, 2004

13:56


Akio Fujiwara

88

Here (a) is a direct consequence of the definition, and (b) and (c) follow from the asymptotic equipartition property (Theorem 3.1.2 [2]). (n) Let Λδ (⊂ H⊗n ) be the linear span of such eigenvectors of ρ⊗n that (n) correspond to δ-typical eigenvalues. The subspace Λδ is called δ-typical with respect to the density ρ. Let ΠΛ(n) be the projection operator onto the (n)

δ

δ-typical subspace Λδ . Clearly the operators ρ and ΠΛ(n) commute. And δ it follows immediately from (a)–(c) that, for all sufficiently large n, Tr ρ⊗n ΠΛ(n) > 1 − δ, δ 2

⊗n 2 Tr (ρ ) ΠΛ(n) ≤ e−n(H(ρ)−δ) en(H(ρ)+δ) = e−n(H(ρ)−3δ) .

(9) (10)

δ

Acknowledgments The author wishes to thank Michael Camarri and Hidetoshi Shimokawa for their helpful comments. He also acknowledges support of the Telecommunications Advancement Foundation. References [1] M. Camarri and J. Pitman, “Limit distributions and random trees derived from the birthday problem with unequal probabilities,” Electron. J. Probab., 5, 1–18 (2000). [2] T.M. Cover and J. Thomas, Elements of Information Theory, Wiley, New York, 1991. [3] A. Fujiwara and H. Nagaoka, “Operational capacity and pseudeclassicality of a quantum channel,” IEEE Trans. Inform. Theory, 44, 1071–1086 (1998). [4] P. Hausladen, R. Jozsa, B. Schumacher, M. Westmoreland, and W.K. Wootters, “Classical information capacity of a quantum channel,” Phys. Rev. A, 54, 1869–1876 (1996). [5] A.S. Holevo, “Capacity of a quantum communication channel,” Probl. Pered. Inform., 15, 3-11 (1979). Translation in Probl. Inform. Transm., 15, 247–253 (1979). [6] R. Jozsa and B. Schumacher, “A new proof of the quantum noiseless source coding theorem,” J. Mod. Opt., 41, 2343–2349 (1994). [7] M. Ohya and D. Petz, Quantum entropy and its use, Springer-Verlag, Berlin, 1993. [8] A. Rényi, “On the foundations of information theory,” Rev. Int. Statist. Inst., 1, 1–14 (1965). [9] B. Schumacher, “Quantum coding,” Phys. Rev. A, 51, 2738-2747 (1995).

entire

December 28, 2004

13:56


PART II Quantum Cram´ er-Rao Bound in Mixed States Model Chap. 8: H. Nagaoka “A new approach to Cramér-Rao bounds for quantum state estimation” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Chap. 9: H. Nagaoka “On Fisher information of quantum statistical models” . . . . . . . . . 113 Chap. 10: H. Nagaoka “On the parameter estimation problem for quantum statistical models” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Chap. 11: H. Nagaoka “A generalization of the simultaneous diagonalization of Hermitian matrices and its relation to quantum estimation theory” . . . . . . . . 133 Chap. 12: M. Hayashi “A linear programming approach to attainable Cramér-Rao type bounds” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Chap. 13: M. Hayashi and K. Matsumoto “Statistical model with measurement degree of freedom and quantum physics” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Chap. 14: M. Hayashi “Asymptotic quantum theory for the thermal states family” . . . . . . 170 Chap. 15: R. D. Gill and S. Massar “State estimation for large ensembles” . . . . . . . . . . . . . . . . . . . 178

entire

December 28, 2004

13:56


entire

December 28, 2004

13:56


CHAPTER 7

Introduction to Part II 1. Main Issue in Quantum Estimation The aim of quantum statistical estimation theory is to estimate the true density matrix of the quantum system of interest through suitable quantum measurements. Its process is divided into two parts: One is the measurement process, in which we perform a quantum measurement on the quantum system of interest. The other is data manipulation, in which we estimate the true parameter from the obtained outcomes. Thus, for our purpose, we choose these two processes to be optimal. While the latter process can be treated as a problem in the classical estimation theory, the selection of the former is treated specially in the quantum case. Therefore, the choice of the measurement is the main issue in quantum estimation. 2. Asymptotic Theory of Classical Estimation Here, let us briefly review estimation theory in classical statistics. Statistical researchers usually assume that the true probability distribution belongs to a certain parametric subset of probability distributions, which we refer to as a family or a family of probability distributions, that is described by {pθ |θ ∈ Θ}, where Θ is the parameter space. The family of probability distributions is chosen on their prior knowledge. This assumption is effective for reducing the estimation error, and is a standard method in classical statistics. Under the one-parameter case, the estimation error is usually measured by the mean square error (MSE), which is defined as the expectation of the square of the difference between the true parameter and the estimated parameter under the true probability distribution. When the amount of obtained data is large enough, the MSE of a suitable estimator is approximately proportional to the inverse of the number of observations. In such an asymptotic case, it is natural to focus on its coefficient; and the optimal coefficient equals the Fisher information, which is defined as the expectation of the squared logarithmic derivative of the probability density with respect to the parameter. 91

entire

December 28, 2004

92

13:56


Introduction to Part II

In the multi-parameter case, some researchers focus on the sum of the MSEs of the respective parameters. In the asymptotic case, that of the optimal estimator is also proportional to the inverse of the number of observations, and its coefficient equals the trace of the inverse of the Fisher information matrix, which can be regarded as a multi-parameter extension of Fisher information. 3. Attainable Quantum Cram´ er-Rao Type Bound In the quantum case, similarly to the classical case, it is often known that the true quantum state belongs to a certain parametric subset of densities, referred to as a quantum state family, which is described by {ρθ |θ ∈ Θ}. Since the above-mentioned general framework has been established in the asymptotic theory of classical estimation, it is very natural to extend it to the quantum setting. In this setting, when we fix a quantum measurement M , the outcome obeys a probability distribution PρM . Thus, the choice of the quantum measurement M implies the choice of the family of probability distributions {PρMθ |θ ∈ Θ}. Hence, when we perform the same measurement M on the separate systems in the quantum i.i.d. case, the MSE of the best classical data manipulation is asymptotically characterized by Fisher information (or Fisher information matrix) of the probability distribution family {PρMθ |θ ∈ Θ}. Therefore, the central issue of quantum estimation is the minimization problem of the inverse of Fisher information (or the trace of the inverse of Fisher information matrix) over the family of probability distributions defined by the quantum measurement. Its minimum value is called the most informative Cramér-Rao bound, the attainable Cramér-Rao type bound, or the non-quantum-correlational Cramér-Rao type bound, which is simplified to the CR bound in the following. Indeed, it is possible to replace the minimization problem by the minimization of the trace of the covariance matrix of a locally unbiased estimator. This is because the former minimum is equivalent to the latter minimum, as was shown by Nagaoka [Chap. 8]∗ . As the first analysis of the CR bound, Helstrom [II-1] introduced symmetric logarithmic derivative (SLD) and SLD Fisher information as quantum analogues of logarithmic derivative and Fisher information, respectively. Using these, he solved this optimization problem in the one∗ While

the paper [Chap. 9] is an older paper than [Chap. 8], the paper [Chap. 8] is treated as the first paper in Part II because Chap. 8 is more introductory.

entire

December 28, 2004

13:56



93

parameter case. However, the optimization remained open in the multiparameter case. The selected papers [Chaps. 5,8,9,12,15,16] treated this problem, and calculated the CR bound in some specific cases. Here, we review the history of the CR bound in the multi-parameter case. Extensive progress in this case was first made by Yuen and Lax [II-2]. They solved this problem in the estimation problem of the coherent states family in thermal noise. Following this result, Holevo [0-2] expanded their result to the estimation problem of the expectation parameter of quantum Gaussian state family. His method is based on right logarithmic derivative (RLD) and another quantum analogue of Fisher information matrix which is called the RLD Fisher information matrix. He also obtained a lower bound to the CR bound, which can be applied in much great generality, and is called the Holevo bound. However, the researches until 1980 did not focus on the relation between asymptotic theory and the latter minimum value† . They were only concerned with the minimization under the unbiasedness condition or the locally unbiasedness condition. Since Nagaoka [Chap. 8] pointed out this relation in the middle of the 1980s, its asymptotic aspect has been studied. Nagaoka [Chap. 8] derived a more simple expression for the Holevo bound and a better lower bound in the two-parameter case, which is called the Nagaoka bound. Nagaoka [Chap. 11] calculated the CR bound in any two-parameter family in the quantum two-level system by showing that the Nagaoka bound is achievable in these families. He treated the simultaneous measurement of two observables [Chap. 11], which was generalized to that of multi-observables by Hayashi‡ . Hayashi [Chap. 12] calculated the CR bound of any family in the quantum two-level system [Chap. 12]. His method is based on the duality theorem of linear programming, and can be applied to the three-parameter case as well as the two-parameter case. After this progress, Gill and Massar [Chap. 15] independently obtained the CR bound of any family in the quantum two level system. (More details on the mathematical statistical side of their approach is discussed in another paper [II-3].) Their approach is completely different from the preceding ones, and is simpler than the previous approaches.

† Only

Young’s paper [II-7] treated this relation. Hayashi “On simultaneous measurement of noncommutative observables,” Development of infinite-dimensional non-commutative analysis Surikaisekikenkyusho (RIMS), Kyoto Univ., Kokyuroku No. 1099, 96–118, (1999), (In Japanese).

‡ M.

entire

December 28, 2004

94

13:56



4. Attainable Quantum Cram´ er-Rao Type Bound with Quantum Correlation In the previous section, we discussed the case where we do not use quantum correlation on the tensor product space in the measuring apparatus. In the following, we focus on the possible advantage of collective measurements§ , i.e., we consider whether using quantum correlation in the measuring process can improve the CR bound or not. It is easily verified that the collective measurement cannot improve this bound in the one-parameter case because the SLD Fisher information on the n-tensor product space equals n times the SLD Fisher information. But, the situation is completely changed in the multi-parameter case as follows: Hayashi discussed this problem in the quantum two-level system [II-5] and the simultaneous estimation problem of the displacement parameter and thermal noise parameter for a family of coherent states in thermal noise [Chap. 14]. Regarding the quantum two-level system, the tensor product states can be approximated by the coherent states family in the thermal noise. Thus, the results of the latter can be applied to the former case. Moreover, using group the representation theory, Matsumoto treated the CR bound in any family in the finitedimensional system¶ . Using the quantum central limit theorem, Hayashi obtained the same results in a more general setting [II-9]. In addition, any adaptive POVM cannot improve the Fisher information matrix based on a single POVM. This quantum two-level system case was proven by Gill and Massar [Chap. 15], and its general case by Hayashi [II-6]. Thus, for further improvement, we require a POVM that is not contained by separable POVMs. Furthermore, Tsuda and Matsumoto [I-6] extended some of the above results of Quantum Cramér-Rao bounds to the non-differentiable settings. 5. Global Attainability of CR Bound It is a serious problem, however, that the optimal measurement realizing the CR bound depends on the true state, in general. Nagaoka focused on a family wherein such a measurement is independent of the true state, and called such a family quasi-classical. He obtained several equivalent conditions for § We use the term ‘collective measurement’ in the sense of quantum measurement in whose process we use quantum correlation on the tensor product space, while some experimental researchers use it in the sense of the measurement concerning the ensemble average. ¶ K. Matsumoto, seminar note, (1999), (In Japanese).

entire

December 28, 2004

13:56



95

quasi-classicality [Chap. 9], whereas Young obtained similar results [II-7]. However, even if the state family is not quasi-classical, by adaptively choosing the measurement, we can asymptotically achieve the CR bound. This possibility was pointed by Nagaoka [Chap. 10], and independently proven by Hayashi & Matsumoto [Chap. 13] and Gill & Massar [Chap. 15] . This result implies that in the one-parameter case, the optimal error with quantum correlation can be asymptotically attained by adaptively choosing the measurement on the single system. Indeed, Fischer and Freyberger [II-10] proposed an experimental scheme of adaptive measurement, and Hannemann et al. [II-11] experimentally realized their proposal. Moreover, Nagaoka also focused on a quantum analogue of an efficient estimator [Chap. 9], and discussed a quantum extension of information geometry [Chap. 10] (This discussion is also contained in Amari and Nagaoka [II-12]). Finally, we should remark that there are other results regarding the CR bound in the pure states case. These will be discussed in the next part.

6. Mathematical Formulation in Classical Estimation 6.1. Cram´ er-Rao Inequality For the reader’s convenience, we will review the asymptotic theory in classical estimation, more formally. Assume that n data ωn ≡ (ω1 , . . . , ωn ) are independently generated subject to an unknown probability distribution, which belongs to a oneparametric probability family {pθ |θ ∈ R}. Then, the estimator is given by a function θˆn from the n observations to the real number. Usually, the estimation error is evaluated by the MSE, which is defined by MSEθ (θˆn ) ≡

(θ − θˆn (ωn ))2 pnθ (ωn ),

(1)

ωn

where pnθ is the n-fold product density obtained from pθ . When the estimator θˆn satisfies the following condition, it is called unbiased: θ=

θˆn (ωn )pnθ (ωn ),

∀θ ∈ R.

(2)

ωn

The

Gill-Massar paper was inspired by Barndorff-Nielsen and Gill [II-8], which roughly sketched how asymptotically optimal adaptive estimation in the one-paramter case could be done, in general.

entire

December 28, 2004

13:56



96

Any unbiased estimator θˆn satisfies the inequality: 1 MSEθ (θˆn ) ≥ nJθ

(3)

which is called the Cramér-Rao inequality, and Jθ is the Fisher information defined as Jθ ≡

d log pθ (ω) 2 dθ

ω

pθ (ω).

(4)

If an unbiased estimator attains the above equality at all points, it is called efficient. Indeed, it is easily verified that the following three conditions are equivalent for a family {pθ |θ ∈ R}: (i) There exists an efficient estimator X for the family. (ii) There exists an efficient estimator X n for the n-fold family {pnθ |θ ∈ R}. (iii) The family {pθ |θ ∈ R} has another parameter θ˜ such that ˜ ˜ − ψ(θ)), pθ˜(ω) = exp(θX(ω) ˜ ˜ ≡ log exp(θX(ω)). ψ(θ)

(5) (6)

ω

Especially, if the above condition is satisfied, the family {pθ |θ ∈ R} is called exponential. Moreover, when the condition (iii) holds, we have the following relations θ=

˜ dψ(θ) , dθ˜

dφ(θ) θ˜ = , dθ

˜ − ψ(θ). ˜ where φ(θ) ≡ maxθ˜ θθ 6.2. Asymptotic Setting When the number n of obtained data is large, the maximum likelihood n estimator θˆML ( ωn ) ≡ arg maxθ pnθ (ωn ) is almost optimal in the following sense: The maximum likelihood estimator satisfies 1 n )= , lim nMSEθ (θˆML Jθ

n→∞

(7)

entire

December 28, 2004

13:56



97

when the family satisfies some regularity conditions∗∗ . Conversely, it is known that any suitable†† , the estimator θˆn satisfies nMSEθ (θˆn )

1 Jθ

(8)

when the number n of obtained data is large. Therefore, we can conclude that the inverse of Fisher information provides the optimal performance in the asymptotic setting. 6.3. Multi-Parametric Case Next, we proceed to the multi-parametric case. We assume that n observations ωn ≡ (ω1 , . . . , ωn ) are independently generated subject to the unknown probability distribution, which belongs to a multi-parametric probability family {pθ |θ = (θ1 , . . . , θd ) ∈ Θ ⊂ Rd }. In this case, the estimator is denoted by a function θˆn from n observations to the parameter space Θ or the larger set Rd , and we focus on the sum of MSE of respective parameters, or the MSE matrix: ˆ MSEi,j (θi − θˆni (ωn ))(θj − θˆnj (ωn ))pnθ (ωn ), (9) θ (θn ) ≡ ωn

whose trace equals the sum of the MSEs of the respective parameters. In this case, an estimator θˆn = (θˆn1 , . . . , θˆnd ) is called unbiased if it satisfies (10) θˆni (ωn )pnθ (ωn ), ∀θ ∈ Θ, ∀i. θi = ωn

Any unbiased estimator θˆn satisfies the inequality

1 i,j −1 ˆ MSEi,j (J ) , θ (θn ) ≥ n θ

(11)

which is called the multi-parameter Cramér-Rao inequality, and (Jθi,j ) is the Fisher information matrix defined by ∂ log pθ (ω) ∂ log pθ (ω) (12) pθ (ω). Jθi,j ≡ ∂θi ∂θj ω ∗∗ Statistical

researchers often focus on the MSE of the limiting distribution instead of the limit of MSE. This is because the inequalities (7) and (8) hold with less conditions in the version of limiting distributions [0-3]. †† However, there is a problem in a rigorous definition of such a ‘suitable estimator’. This problem is called superefficiency, and there are many different solutions in classical theory. For detials, please refer to van der Vaart [0-3].

entire

December 28, 2004

13:56



98

n Furthermore, the maximum likelihood estimator (MLE) θˆML (ωn ) ≡ n ωn ) satisfies arg maxθ pθ ( i,j −1 ˆn lim nMSEi,j , θ (θML ) = (Jθ )

n→∞

(13)

when the family satisfies some regularity conditions. Thus, regarding the sum of MSE, we have lim n

n→∞

d

i,j −1 ˆn MSEi,i ), θ (θML ) = tr((Jθ )

(14)

i=1

where (Jθi,j )−1 denotes the inverse of the matrix (Jθi,j ). Note that its trace d tr((Jθi,j )−1 ) is different with i=1 (Jθi,i )−1 in general. Therefore, similarly to the one-parametric case, the maximum likelihood estimator is asymptotically optimal, and the inverse of Fisher information provides the optimal performance in the asymptotic setting. A similar relation of the multi-parametric case as (3) is summarized in Appendix K in Chap. 28.

Further Reading [II-1] C. W. Helstrom, “Minimum mean-square error estimation in quantum statistics,” Phys. Lett. 25A, 101–102 (1976). [II-2] H. P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of non-selfadjoint observables,” IEEE Trans. Inform. Theory, 19, 740 (1973). [II-3] R. Gill, “Quantum asymptotics,” In: State of the Art in Probability and Statistics, IMS Lecture Notes - Monograph Series 36 (2001), 255–285, eds A.W. van der Vaart, M. de Gunst, C.A.J. Klaassen. [II-4] M. Hayashi “A Linear Programming Approach to Attainable Cramer-Rao type bound and Randomness Conditions,” quantph/9704044. [II-5] M. Hayashi, “Asymptotic Quantum Parameter Estimation in Spin 1/2 System,” quant-ph/9710040. [II-6] M. Hayashi, An Introduction to Quantum Information, (Saiensu-sha, Tokyo, 2004) (In Japanese). Its English translation is planned. [II-7] T. Y. Young, “Asymptotic Efficient Approaches to QuantumMechanical Parameter Estimation,” Information Sciences, 9, 25–42 (1975).

entire

December 28, 2004

13:56



99

[II-8] O.E. Barndorff-Nielsen and R.D. Gill, “Fisher information in quantum statistics,” J. Phys. A: Math. Gen., 30, 4481–4490 (2000); quant-ph/9808009. [II-9] M. Hayashi, “Quantum estimations and quantum central limit theorems,” S¯ ugaku 55, no. 4, 368–391, (2003), (In Japanese); Its English translation will be published from American Mathematical Society. [II-10] D. G. Fischer and M. Freyberger, “Estimating mixed quantum states,” quant-ph/0005090. [II-11] Th. Hannemann, D. Reiß, Ch. Balzer, W. Neuhauser, P. E. Toschek, and Ch. Wunderlich, “Estimation of qubit states in a factorizing basis,” quant-ph/0110068. [II-12] S. Amari and H. Nagaoka, Methods of Information Geometry, (AMS & Oxford University Press, 2000).

entire

December 28, 2004

13:56


CHAPTER 8

A New Approach to Cram´ er-Rao Bounds for Quantum State Estimation Hiroshi Nagaoka

1. Introduction Motivated mainly by the recent progress in the field of optical communications, the parameter estimation problem for a quantum statistical model S = {Sθ ; θ = (θ1 , . . . , θn ) ∈ Θ }, where {Sθ } are density operators on a Hilbert space, has been actively studied. The problem is to estimate the ‘true’ state which is unknown but is assumed to be represented by a density operator in H. This can be regarded as a noncommutative analogue of the parameter estimation problem for a ‘classical’ statistical model P = {Pθ ; θ = (θ1 , . . . , θn ) ∈ Θ }, where {Pθ } are probability distributions, and many notions in the classical theory can find their analogues in the quantum theory. The unbiasedness of an estimator is one of these notions and is very important in the quantum case, also. The fundamental theorem in the classical theory of unbiased estimation is the Cramér-Rao inequality which gives a lower bound for the variance of an arbitrary unbiased estimator. In the quantum case, however, several lower bounds of Cramér-Rao type (CR bounds, for short) are known. In particular, C S which is based on the notion of ‘symmetric logarithmic derivatives’ [1, 2] and C R which is based on the notion of ‘right logarithmic derivatives’ [5] are well-known. A. S. Holevo introduced and studied another CR bound C H which, in a sense, unifies C S and C R [3, 4]. However, C H is defined via some minimization procedure and is not generally written in an explicit form. Moreover, C H is not ‘optimal’ in general, in the sense that there may be a CR bound which is more informative than C H . The theory of quantum CR bounds is not yet completed. In this paper, some new results on quantum CR bounds are presented. First, we define the ‘most informative’ CR bound C MI and derive a general c 1989 IEICE. Reprinted, with permission, from IEICE Technical Report, IT 89-42, 9-14, 1989 100

entire

December 28, 2004

13:56


A New Approach to Cram´ er-Rao Bounds for Quantum State Estimation

101

property of this bound. In the 1-parameter case n = 1, it is shown that C MI = C S . Next, we introduce a new CR bound C NEW for the 2-parameter case n = 2, which is based on a fundamental inequality on simultaneous measurements of two noncommuting observables. It is shown that C MI ≥ C NEW ≥ C H in general. Furthermore, in the simplest noncommutative case dimH = 2, we show that C MI = C NEW . In this paper, we do not delve into the mathematically rigorous treatment of operators on a Hilbert space, but often proceed in an ‘intuitive’ manner, assuming several regularity conditions tacitly. For the rigorous argument, see [4]. 2. Quantum Measurement Theory In this section we give a brief review of the quantum measurement theory for the later discussion. Suppose that a quantum physical system is given. Then a complex Hilbert space H corresponds to the system, and physics of the system is described in terms of mathematics on H. We denote the inner product of two vectors φ and ψ in H by (φ|ψ). Let X be a linear operator on H. Then X ∗ (the adjoint operator of X) and Tr X (the trace of X) are defined as usual; i.e., (φ|Xψ) = (X ∗ φ|ψ) for ∀φ, ∀ψ and Tr X = j (ψj |Xψj ) where {ψj } is an arbitrary orthonormal basis of H. It should be noted that Tr XY = Tr Y X for ∀X, ∀Y . When X = X ∗ holds, X is called a Hermitian operator. We write X > 0(≥ 0) when (ψ|Xψ) > 0(≥ 0) for ∀ψ:nonzero. A Hermitian operator S satisfying S ≥ 0 and Tr S = 1 is called a density operator. It is also called a state, because a physical state of the quantum system is represented by a density operator. A state S is said to be pure if rankS = 1. In this case S becomes the orthogonal projection onto a 1dimensional linear subspace of H. Using the Dirac’s notation, a pure state S is written as S = |ψ)(ψ| by a vector ψ of norm 1, i.e., (ψ|ψ) = 1. In general, any state S can be written as a mixture of pure states: pj |ψj )(ψj | (1) S= j

where {pj } are positive numbers satisfying j pj = 1, and {ψj } are vectors of norm 1. Indeed, letting {pj } be the nonzero eigenvalues of S and {ψj } be the corresponding eigenvectors constituting a orthonormal system, S is written as (1). A state which is not pure is called a mixed state.

entire

December 28, 2004

13:56


Hiroshi Nagaoka

102

Next we proceed to mathematical description of quantum measurements. Consider an arbitrary measurement with values in a set Ω ; i.e., if the measurement is performed to the quantum system, we get an element of Ω as the outcome. Owing to the statistical nature of quantum mechanics, the outcome fluctuates in general under a probability distribution on (Ω , B), where B is a σ-algebra consisting of subsets of Ω . This probability distribution depends on the state of the system in the following way: when the present state of the system is represented by a density operator S, the probability that the outcome lies in B, where B is a set in B, is written as PS (B) = Tr SM (B) where M : B )→ M (B) is an operator-valued set function defined on B such that (i) M (φ) = 0, M (Ω ) = I (the identity on H) ; ∗ (ii) M (B) 5 = M (B)≥ 0 for ∀B ∈ B; M (Bj ) for any (at most countable) disjoint se(iii) M ( Bj ) = j

j

quence {Bj } of sets in B. It should be noted that M is determined by the considered measurement and does not depend on the state S. Motivated by this fact, a measurement on a measurable space (Ω , B) is mathematically defined as an operatorvalued set function M satisfying (i)–(iii) above. (Such an M is also called a probability operator-valued measure on (Ω , B).) We often omit B from (Ω , B) when B is the ‘natural’ σ-algebra on Ω such as the totality of Borel subsets in the case of a topological Ω or the totality of subsets in the case of finite Ω . If a measurement M on (Ω , B) is projection-valued in the sense that (iv) M (B)2 = M (B)

for ∀B ∈ B,

we say that M is simple. It can be shown that, under the conditions (i)–(iii), (iv) is equivalent to the orthogonality condition: (v) M (B1 ) M (B2 ) = 0

if B1 ∩ B2 = φ.

In many standard texts of quantum mechanics, only this type of measurements is referred to as the representation of measurements. It is known that an arbitrary measurement over the system H can be ‘realized’ by a simple measurement over the ‘composite system’ of H and an additional independent system H0 in a fixed state S0 (, see [4]).

entire

December 28, 2004

13:56



103

Simple measurements on R(the set of all real numbers) are of particular importance because they represent the measurements of real-valued physical quantities called observables. As is well-known, an arbitrary observable corresponds to a Hermitian operator, say X, which has the spectral representation 6 ∞ x E(dx), (2) X = −∞

where E is a simple measurement on R. This E represents the measurement of the observable corresponding to X. Note that the correspondence between Hermitian operators and simple measurements on R by (2) is oneto-one. 3. Parameter Estimation and Lower Bounds of Cram´ er-Rao Type (CR Bounds) Let Θ be an open set in Rn , and let S = {Sθ ; θ ∈ Θ } be a family of density operators on a Hilbert space H smoothly parametrized by an ndimensional parameter θ = (θ1 , . . . , θn ) with range Θ . Such a family is called an n-dimensional quantum statistical model. We consider the parameter estimation problem for the model S. An estimator for the parameter θ based on some measurement is itself regarded as a measurement with outcomes in the parameter space Θ , and is represented by a probability operator-valued measure on Θ . For a technical reason, we define an estimator as a measurement on Rn , permitting the possibility of getting an estimate out of Θ . Such a definition is often adopted in the classical statistical theory, also. ˆ be an estimator. When the system is in the state Sθ for some Let M θ ∈ Θ , the probability that the estimate lies in a Borel set B in Rn is written as ˆ (B) . Pˆθ (B) = Tr Sθ M ˆ is said to be unbiased if for ∀θ ∈ Θ M 6 ˆ = θj (j = 1, 2, . . . , n). θˆj Pˆθ (dθ) Differentiating this equation we obtain 6 ∂ ˆ = δ j (j, k = 1, 2, . . . , n), θˆj k Pˆθ (dθ) k ∂θ

(3)

(4)

entire

December 28, 2004

104

13:56


Hiroshi Nagaoka

ˆ satisfies (3) and (4) at a fixed where δkj is the Kronecker’s delta. When M ˆ point θ ∈ Θ , we say that M is locally unbiased at θ. Obviously, an estimator is unbiased if and only if it is locally unbiased at every θ ∈ Θ . ˆ is locally unbiased at a fixed point θ. Then the covariSuppose that M ˆ ] = [v jk ] ∈ Rn×n , is defined ˆ at the state Sθ , say Vθ [M ance matrix of M θ as 6 jk ˆ vθ = (θˆj − θj )(θˆk − θk )Pˆθ (dθ). We describe the accuracy of the estimation at θ by ˆ] = gjk vθjk , tr GVθ [M j,k

where tr denotes the trace of an n × n matrix in distinction from Tr for an operator on H, and G = [gjk ] ∈ Rn×n is a positive-definite matrix. Now we define a lower bound of Cramér-Rao type (CR bound, for short) (with respect to G at θ) as a quantity C which satisfies ˆ] ≥ C tr GVθ [M ˆ which is locally unbiased at θ. for any estimator M From now on, we assume that any state Sθ in S is nondegenerate; i.e., Sθ > 0 for ∀θ ∈ Θ . This assumption on the model S will simplify the argument on CR bounds very much. 4. Some Examples of CR Bounds We introduce two well-known CR bounds here. Suppose that G and θ are given and fixed. First, as quantum analogues of 1 ∂ ∂ log pθ = pθ ∂θj pθ ∂θj for a (classical) statistical model of probability distribution functions {pθ }, we define the symmetric logarithmic derivatives at θ {LSθ,j ; j = 1, . . . , n} and the right logarithmic derivatives at θ {LR θ,j ; j = 1, . . . , n} as follows: LSθ,j is a Hermitian operator on H such that ∂ 1 Sθ = (Sθ LSθ,j + LSθ,j Sθ ), j ∂θ 2 and LR θ,j is a linear operator on H such that ∂ Sθ = Sθ L R θ,j . ∂θj

entire

December 28, 2004

13:56



105

Using these logarithmic derivatives, quantum analogues of the Fisher information matrix are defined: JθS be an n × n real symmetric matrix whose (j, k) element is [JθS ]jk = Re Tr Sθ LSθ,j LSθ,k , and JθR be an n × n complex Hermitian matrix whose (j, k) element is R ∗ [JθR ]jk = Tr Sθ LR θ,k (Lθ,j ) .

Let C R = tr G Re(JθR )−1 + tr abs G Im(JθR )−1 ,

def

def

C S = tr G(JθS )−1 ,

where abs A for a matrix A is defined by  |α1 | def  .. abs A = T  .

  −1 T |αn |

when A is diagonalized as





α1

 A = T

..

 −1 T

. αn

and therefore we have: tr abs A = j |αj |. Then it can be shown that both C S and C R are CR bounds [1, 2, 4, 5]. A.S.Holevo introduced another CR bound, say C H , through which we can understand C S and C R in a unified manner [3, 4]. C H is based on the following lemma. Lemma 1: Suppose that

6 f (ω)M (dω) = F Ω

where M is a measurement on a set Ω , f is a complex-valued function defined on Ω , and F is an operator on H. Then we have 6 |f (ω)|2 M (dω) ≥ F F ∗ . Ω

Proof: 6 6 |f (ω)|2 M (dω) − F F ∗ = {f (ω) − F }M (dω){f (ω) − F }∗ ≥ 0. Ω

Ω

entire

December 28, 2004

13:56


Hiroshi Nagaoka

106

ˆ be an estimator which is locally unbiased at θ, and let Let M 6 def ˆ ˆ (dθ) (θˆj − θj )M (j = 1, 2, . . . , n). Xj =

(5)

It is noted that the local unbiasedness condition (3)(4) is written as Tr Sθ X j = 0 (j = 1, 2, . . . , n) ; ∂ Tr k Sθ X j = δkj (j, k = 1, 2, . . . , n). ∂θ

(6) (7)

For an arbitrary complex vector ξ = [ξ j ] ∈ Cn , define the function f : Rn → C and the operator F by ˆ = f (θ)

n

ξj (θˆj − θj ),

F =

j=1

Then we have that

2

n

ξj X j .

j=1

ˆM ˆ = F from (5), and therefore Lemma 1 claims ˆ (dθ) f (θ) 6

from which we obtain 6

ˆ ≥ F F ∗, ˆ 2M ˆ (dθ) |f (θ)|

ˆ ≥ Tr Sθ F F ∗ . ˆ 2 Tr Sθ M ˆ (dθ) |f (θ)|

ˆ ]ξ by means of the The left-hand side of this inequality is written as ξ ∗ Vθ [M ˆ ], while the right-hand side is written as ξ ∗ Zθ [X]ξ covariance matrix Vθ [M = [z jk ] ∈ Cn×n such that = (X 1 , . . . , X n )) by means of Zθ [X] (where X z jk = Tr Sθ X k X j .

(8)

Thus we have ˆ ] ≥ Zθ [X]. Vθ [M The above consideration leads us to define the CR bound C H as def

G) | X ∈ Xθ } C H = min{λθ (X; where

def λθ (X; G) = min tr GV

V is an n × n real symmetric matrix satisfying V ≥ Zθ [X]

+ tr abs G ImZθ [X], = tr G ReZθ [X]

([4])

entire

December 28, 2004

13:56



107

def

= (X 1 , . . . , X n ) | {X j } are Hermitian Xθ = {X operators satisfying (6) and (7) }. C H is ‘more informative’ than both C S and C R in the sense that C H ≥ max{C S , C R }. This can be seen immediately from the following relations: | X ∈ Xθ }, C S = min{tr G ReZθ [X]

G) | X ∈ X˜θ } C R = min{λθ (X;

where X˜θ is an extension of Xθ such that def = (X 1 , . . . , X n ) | {X j } are linear operators X˜θ = {X

and satisfy (6) and (7) } G) is extended to be defined for non-Hermitian and, correspondingly, λθ (X; by rewriting (8) to X z jk = Tr Sθ X k (X j )∗ . 5. The Most Informative CR Bound The most informative CR bound C MI is defined as the maximum of all the CR bounds, or equivalently def ˆ] | M ˆ is locally unbiased at θ}. C MI = min{tr GVθ [M

It is also written as G) | X ∈ Xθ } C MI = min{νθ (X;

(9)

where G) def ˆ]| M ˆ is an estimator satisfying = min{tr GVθ [M νθ (X; 6 j ˆ ˆ (dθ), X = (θˆj − θj )M j = 1, . . . , n}. So, the study of C MI can be divided into two steps: the first step is to seek G), and the second step is to solve the for an explicit expression of νθ (X; minimization problem (9). Actually, we will carry out this program for a special case later. Now we elucidate the relation between C MI and the Cramér-Rao bound of the classical statistical model obtained from S by fixing a measurement. Let M be a measurement on (Ω , B) and let def

PθM (B) = Tr Sθ M (B)

(B ∈ B,

θ ∈ Θ ).

entire

December 28, 2004

13:56


Hiroshi Nagaoka

108

def

Then P M = {PθM ; θ ∈ Θ } becomes a classical statistical model on (Ω , B). Since {Sθ } are assumed to be nondegenerate, the condition PθM (B) = 0 is equivalent to M (B) = 0 and hence does not depend on θ. This means that there exists a measure µ on (Ω , B) such that, for ∀θ ∈ Θ , the density M function of PθM w.r.t. µ, say pM θ = dPθ /dµ, is defined and is positive almost everywhere. So, we can define the Fisher information matrix JθM ∈ Rn×n as 6 ∂ ∂ def M [JθM ]jk = { j log pM log pM θ (ω)}{ θ (ω)}Pθ (dω). ∂θ ∂θk Theorem 2: C MI = min{tr G(JθM )−1 | M is a measurement}. Proof: Denote the right-hand side by C. The (classical) Cramér-Rao theˆ ] ≥ (J Mˆ )−1 for an arbitrary estimator M ˆ which is orem says that Vθ [M θ MI ≥ C. On the other hand, for an locally unbiased at θ. This leads to C arbitrary measurement M on a set Ω , define the mapping θˆ : Ω → Rn by n ∂ θˆj (ω) = θj + [(JθM )−1 ]jk k log pM (j = 1, 2, . . . , n, ω ∈ Ω ), θ (ω), ∂θ k=1

ˆ by and define the estimator M ˆ (B) = M (θˆ−1 (B)) M

(B : a Borel set in Rn ).

ˆ is locally unbiased at θ and satisfies Vθ [M ˆ] = It then turns out that M M −1 MI (Jθ ) . This leads to C ≤ C. Here we consider the single parameter case n = 1. Let Eθ be the spectral resolution of the Hermitian operator θ + (JθS )−1 LSθ ; i.e., Eθ is a simple measurement on R such that 6 S −1 S ˆ θ (dθ). ˆ θE θ + (Jθ ) Lθ = Then it follows from the definition of LSθ that Eθ is locally unbiased at θ. Moreover, noting that Eθ is a simple measurement, we can see that Vθ [Eθ ] = (JθS )−1 . This leads to C MI = C S , and from Theorem 2 we have JθS = JθEθ = max{JθM | M is a measurement }. This means that Eθ is most informative among all the measurements with respect to the parameter estimation problem for S at the fixed point θ.

entire

December 28, 2004

13:56



109

6. A New CR Bound for the 2-Parameter Case In this section we introduce a new CR bound C NEW for the two parameter case n = 2. The following lemma is fundamental. Lemma 3: Let A and B be Hermitian operators and let S be a nondegenerate density operator. If a measurement M on R2 satisfies 6 6 aM (da db) = A, bM (da db) = B (10) then

6 (a2 + b2 )Tr SM (da db) ≥ Tr S(A2 + B 2 ) + Tr Abs S[A, B]

(11)

def

where [A, B] = AB − BA and, for an operator X having the eigenvalues def {λj }, Tr Abs X = j |λj |. Proof: Since (10) is written as 6 (a ± ib)M (da db) = A ± iB,

(i2 = −1)

it follows from Lemma 1 that 6 6 2 2 |a ∓ ib|2 M (da db) (a + b )M (da db) = ≥ (A ∓ iB)(A ± iB) = A2 + B 2 ± i[A, B], which leads to 6 1 1 1 1 1 1 (a2 + b2 )S 2 M (da db)S 2 ≥ S 2 (A2 + B 2 )S 2 ± iS 2 [A, B]S 2 . 1

(12) 1

Let {λj } be the eigenvalues of the Hermitian operator iS 2 [A, B]S 2 and {ψj } be the corresponding eigenvectors constituting an orthonormal basis. Then from (12) we have 6 1 1 1 1 (a2 + b2 ) (ψj | S 2 M (da db)S 2 ψj ) ≥ (ψj | S 2 (A2 + B 2 )S 2 ψj ) + |λj |. Applying j to the both sides, we obtain 6 1 1 (a2 + b2 )Tr SM (da db) ≥ Tr S(A2 + B 2 ) + Tr Abs S 2 [A, B]S 2 , 1

1

which is identical with (11) because the eigenvalues of S 2 [A, B]S 2 are equal 1 1 1 1 to those of S[A, B] = S 2 (S 2 [A, B]S 2 )S − 2 .

entire

December 28, 2004

13:56


Hiroshi Nagaoka

110

1

1

Remark 4: In the above lemma we assumed that iS 2 [A, B]S 2 has discrete eigenvalues. Some mathematical refinement would be desired so that the lemma is reformulated and is proved in a more general setting. Remark 5: It is well-known that if [A, B] = 0 there 2 exists a simple measurement M satisfying (10). In this case, we have (a2 + b2 )M (da db) = A2 + B 2 , and therefore the equality in (11) holds for ∀S. Next, suppose that A and B satisfy the Heisenberg canonical commutation relation [A, B] = i ( is a positive constant). Then, for ∀a ∈ R, ∀b ∈ R, we can define ψ = |a, b) as a vector of norm 1 satisfying (A + iB)ψ = (a + ib)ψ, and a measurement M is defined by da db . 2π 2 This measurement is shown to satisfy (10) and (a2 + b2 )M (da db) = A2 + B 2 + ([4]). Again, the equality in (11) holds for ∀S. For arbitrarily given A and B, however, there does not always exist a measurement M satisfying (10) and achieving the lower bound in (11) for ∀S simultaneously. Nevertheless, we can consider the problem whether, for arbitrarily given A, B and S, there always exists a measurement M satisfying (10) and achieving the lower bound in (11). This has been affirmatively solved for the case dim H = 2 by the author∗ , but is still an open problem for the general case at present. def

M (da db) = |a, b)(b, a|

Now, suppose that we are given a 2-parameter model S = {Sθ ; θ = (θ , θ2 ) ∈ Θ } and a positive-definite matrix G = [gjk ] ∈ R2×2 . Let 1

def G) | X = (X 1 , X 2 ) ∈ Xθ } C NEW = min{µθ (X;

where G) def + µθ (X; = tr G ReZθ [X]

√ det G Tr Abs Sθ [X 1 , X 2 ].

We have the following theorem. Theorem 6: C NEW is a CR bound, and is more informative than C H . ˆ be an arbitrary estimator, and let Proof: Let M def

ˆ = a(θ)

2

h1k (θˆk − θk ),

def

ˆ = b(θ)

k=1 ∗ Additional

author’s note: See Chap. 11 of this book.

2 k=1

h2k (θˆk − θk )

entire

December 28, 2004

13:56



111

1

where hjk denotes the (j, k) element of G 2 . Then we have 6 ˆ ˆ 2 + {b(θ)} ˆ 2 ]Tr Sθ M ˆ (dθ). ˆ tr GVθ [M ] = [{a(θ)} Hence, defining def

A =

6 ˆM ˆ = ˆ (dθ) a(θ)

2

h1k X k ,

def

B =

6 ˆM ˆ = ˆ (dθ) b(θ)

k=1

2

h2k X k ,

k=1

ˆ by (5), we obtain from Lemma 3 where (X 1 , X 2 ) are defined from M ˆ ] ≥ Tr Sθ (A2 + B 2 ) + Tr Abs Sθ [A, B] = µθ (X; G). tr GVθ [M This inequality immediately proves the first assertion of the theorem that C NEW is a CR bound. The second assertion that C NEW ≥ C H follows from the inequality G) ≥ λθ (X; G) µθ (X;

= (X 1 , X 2 )), (∀X

(13)

which is proved in the following way. It is seen that 7 87 8 i g11 g12 Tr Sθ [X 1 , X 2 ] 0 2 G ImZθ [X] = g21 g22 0 − 2i Tr Sθ [X 1 , X 2 ] 7 8 i −g12 g11 . = Tr Sθ [X 1 , X 2 ] −g22 g21 2 √ This matrix has the eigenvalues ± 12 det G Tr Sθ [X 1 , X 2 ], and hence we have = tr G ImZθ [X]

√ √ det G | Tr Sθ [X 1 , X 2 ] | ≤ det G Tr Abs Sθ [X 1 , X 2 ] ,

which yields (13). In the simplest noncommutative case n = 2, which is known to describe the spin of a spin- 21 particle, we have the following very strong theorem. Regrettably, the proof is omitted here for want of space. Theorem 7: If dim H = 2, we have √ C NEW = C MI = C S + ( det G / det JθS ) Tr Abs Sθ [LSθ,1 , LSθ,2 ].

entire

December 28, 2004

13:56


Hiroshi Nagaoka

112

References [1] C.W. Helstrom, “Minimum mean-square error estimation in quantum statistics,” Phys. Lett., 25A, 101–102 (1967). [2] C.W. Helstrom, Quantum detection and estimation theory, (Academic Press, New York, 1976). [3] A.S. Holevo, “Noncommutative analogues of the Cramér-Rao inequality in the quantum measurement theory,” In Lecture Notes in Math., 550, 194–222 (1976). [4] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, (NorthHolland, Amsterdam, 1982). [5] H.P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of non-selfadjoint observables,” IEEE Trans. Inform. Theory, 19, 740–750 (1973).

Additional Note by Author Invoking some results from the paper “A generalization of the simultaneous diagonalization of Hermitian matrices and its relation to quantum estimation theory” included in this volume, Theorem 7 is readily proved as follows. = (X1 , X2 ), let For a pair of Hermitian operators X 7 1 1 8 1 2 def = X , X X , X , Φ[X] = ReZθ [X] X 2 , X 1 X 2 , X 2 def

where X, Y = ReTr Sθ XY . It then follows from Theorem 2 and Lemma 8 in “A generalization of ...” that 9 G) = νθ (X; G) = tr GΦ[X] + 2 det GΦ[X]. µθ (X; = (K 1 , K 2 ) be defined by In addition, letting K Kj =

2

(JθS )−1

jk

LSθ,k ,

k=1

∈ Xθ and Φ[X] ≥ Φ[K] = (J S )−1 for any X ∈ Xθ , which is the we have K θ S essence of the fact that C is a CR bound. Therefore we obtain 9 + 2 det GΦ[X] C NEW = C MI = min tr GΦ[X] X∈X θ

9 = tr G(JθS )−1 + 2 det G(JθS )−1 . The expression of Theorem 7 is obtained from this by applying Lemma 8 in 9 “A generalization of ...” again to yield Tr Abs Sθ [LSθ,1 , LSθ,2 ] = 2

det JθS .

entire

December 28, 2004

13:56


CHAPTER 9

On Fisher Information of Quantum Statistical Models Hiroshi Nagaoka

1. Introduction In various fields of up-to-date sciences and technologies, quantum noise (quantum mechanical fluctuation) is a big obstacle (one is beyond far and one is impending immediately) in the way of their progress. The characteristic of quantum noise is specified by the quantum mechanical state (statistical operator) of the system, which is considerably different from the way that the characteristic of the ordinary noise is specified by the probability distribution. Therefore, classical statistics and information theory based on the ordinary probability theory are not sufficient to deal with the quantum noise, and we newly need to develop quantum statistics and quantum information theory based on quantum mechanical probability theory (that is also called non-commutative probability theory). This kind of theory is mainly studied in relation to optical communications now, and it is expected that it will be applied to various fields by and by. Moreover, the results of such theoretical researches may also suggest new technical feasibilities in the future. In this paper, we will consider a parametric estimation problem with respect to a one-parameter model {Pθ | θ ∈ Θ} (Θ ⊂ 1) consisting of statistical operators. This is the quantum version of an ordinary (classical) parametric estimation problem. In the classical case, the accuracy of estimation can be evaluated by the so-called Fisher information I(θ). That is, the variance of an unbiased estimator cannot be smaller than 1/I(θ) (CramérRao inequality). If there exists an unbiased estimator attaining this lower bound, the model belongs to the class called exponential family, and it is known to have various good properties. In addition, even in a general model, this lower bound is attainable asymptotically (as data size → ∞). This paper was originally written in Japanese. It was translated to English by Fuyuhiko Tanaka and Yoshiyuki Tsuda. 113

entire

December 28, 2004

114

13:56


Hiroshi Nagaoka

Moreover, through consideration on the independence and the sufficiency of a statistic, I(θ) is provided with a meaning as the information of data with respect to estimation of θ. On the other hand, in the case of the quantum system, analogues of the Fisher information are defined based on symmetric logarithmic derivatives or right logarithmic derivatives. It is shown that quantum Cramér-Rao inequalities hold with respect to them [1, 2, 3]. However, the definition of them gives an impression as if they were defined ad hoc so that the Cramér-Rao inequalities hold. Moreover, the meaning of them as information is not obvious. In this paper, we will investigate the Fisher information defined by the symmetric logarithmic derivatives from various points of view and consider its meaning. In the sequel, the Hilbert space of the quantum system of interest is assumed to be finite-dimensional. Although this assumption causes a considerable loss of generality, the essence of the following arguments should be valid even in the infinite-dimensional case. Due to this assumption, we are free from complications accompanying functional-analytical arguments and are able to argue quantum mechanical non-commutativity in the framework of elementary linear algebra.

2. Statistical Operator and Non-Commutative Probability Theory In this section, we explain the basic idea of non-commutative probability theory. Now, suppose that the quantum system of interest is described by a finite-dimensional complex Hilbert space H. Let the inner product on H be denoted by (x, y) ∈ C (x, y ∈ H), and let L be the set of linear operators (linear transformations) on H. Then, for each A ∈ L, the conjugate operator A∗ ∈ L is defined by (x, Ay) = (A∗ x, y)

(∀x, ∀y ∈ H).

An operator A ∈ L is called a Hermitian operator iff A = A∗ holds, and the set of Hermitian operators is denoted by Lh . Each A ∈ Lh corresponds to an observable in the quantum system H, and the eigenvalues of A represent values which the observable can take. A Hermitian operator plays a role of a random variable in the ordinary probability theory. One can perform various measurements on the quantum system H, and each measurement corresponds in a one-to-one manner to a family of oper-

entire

December 28, 2004

13:56


On Fisher Information of Quantum Statistical Models

115

ators Π = {πk | k = 1, 2, . . .} satisfying the following: πk = πk∗ ∈ Lh ,

(1)

πk2

(2)

= πk ,

k "= l ⇒ πk πl = 0, πk = Id (identity).

(3) (4)

k

Moreover, for the measurement corresponding to Π, there is a one-to-one correspondence between an outcome of the measurement and a πk . The conditions (1) and (2) are equivalent to that πk is the orthogonal projection to a subspace Vk of H, and the conditions (3) and (4) imply that the {Vk } satisfies : Vk = H. k "= l ⇒ Vk ⊥ Vl (orthogonal), k

A family of operators Π = {πk } satisfying (1)-(4) is called an orthogonal resolution of identity or a measurement (Note that, in this paper, we do not consider the notion of generalized quantum measurement∗ [1, 2, 3].) Any observable A ∈ Lh can be written as ak πk (5) A = k

by a measurement Π = {πk }. For example, we can choose the orthogonal projection to the eigenspace Vk = {x | Ax = ak x} as πk . In this case, k "= l ⇒ ak "= al

(6)

holds, and then (5) is called the spectral decomposition of A. In general, equation (5) is interpreted to mean that the value of the observable A is determined by the measurement Π. Namely, when the outcome of the measurement corresponding to πk is obtained, the value of A is determined to be ak . The quantum system H can take various states. Each state corresponds in a one-to-one manner to a P ∈ Lh satisfying the following:

∗ Additional

P = P ∗ ≥ 0 (i.e. (x, P x) ≥ 0 (∀x ∈ H)),

(7)

Tr P = 1,

(8)

author’s note: This is another name of the POVM.

entire

December 28, 2004

13:56


Hiroshi Nagaoka

116

where, in general, Tr A (A ∈ L) denotes the trace of A, and, by using an arbitrary orthonormal basis {u1 , u2 , . . .} of H, it can be written as (uk , Auk ). Tr A = k

Note that Tr AB = Tr BA. An operator P satisfying (7) and (8) is called a statistical operator, a density operator, a state etc., and it plays a role of a probability distribution in the ordinary probability theory. When we perform the measurement Π under the state P , the probability of obtaining the outcome πk , say p(k | Π), is given by p(k | Π) = Tr P πk .

(9)

In terms of the spectral decomposition (5) and (6) of the observable A, equation (9) represents the probability of obtaining ak as the outcome of the measurement of A. From this, the expected value of A is given by the following formula: ak Tr P πk = Tr P A. (10) EP [A] = k

Hereafter, we focus on the regular statistical operators P (so that P > 0) and denote the set of them by P. Moreover, for each P ∈ P, a complex inner product ( )P and a real inner product P on L are defined by the following: (A, B)P = Tr P BA∗ , def def

A, BP = Re(A, B)P = (A, B ∈ L).

(11) 1 {(A, B)P + (B, A)P } 2

(12)

3. Quantum Cram´ er-Rao Inequalities Suppose that a one-parameter model {Pθ | θ ∈ Θ} ⊂ P of regular statistical operators is given. Then, let us consider the problem of estimating θ in the situation where the true state lies in the model {Pθ } and θ is unknown. Since the estimation is given based on a certain measurement, an estimator itself is an observable and denoted by a Hermitian operator. An estimator T ∈ Lh is called an unbiased estimator iff it satisfies def

Eθ [T ] = Tr Pθ T = θ (∀θ ∈ Θ).

(13)

Then, the variance of T is represented as follows: Vθ [T ] = Tr Pθ (T − θ)2 .

(14)

entire

December 28, 2004

13:56



117

Here, for each θ ∈ Θ, from the positivity of Pθ , there exists unique Lθ = L∗θ ∈ Lh satisfying d 1 Pθ = (Pθ Lθ + Lθ Pθ ). dθ 2

(15)

This Lθ is called the symmetric logarithmic derivative of the model {Pθ } at θ. In addition, d ˜ θ def L = Pθ−1 · Pθ dθ

(16)

is called the right logarithmic derivative. For them, let def

I(θ) = Tr Pθ (Lθ )2 , def

˜ I(θ) = Tr

Pθ L˜θ L˜∗θ ,

(17) (18)

then, for any unbiased estimator T , Vθ [T ] ≥ 1/I(θ), ˜ Vθ [T ] ≥ 1/I(θ)

(19) (20)

——Quantum Cramér-Rao inequality—— hold [1, 2, 3]. In fact, since ˜ I(θ) ≤ I(θ)

(21)

always holds [1, 2], if we have (19), then we do not need (20). However, in the case when the dimension of the parameter θ is more than one, an example is known such that the inequality based on the right logarithmic derivative gives a better evaluation than that based on the symmetric logarithmic derivative [1, 2, 3]. Now, we slightly generalize (19)–(21) and prove them. Let Kθ be the set of Kθ ∈ L satisfying the following equation: 1 d Pθ = (Pθ Kθ + Kθ∗ Pθ ). dθ 2

(22)

˜ θ belong to Kθ . Hence, it is enough to show that, for any Both Lθ and L Kθ ∈ Kθ , Vθ [T ] ≥ 1/TrPθ Kθ Kθ∗ , TrPθ Kθ Kθ∗

≥ I(θ).

and

(23) (24)

entire

December 28, 2004

13:56


Hiroshi Nagaoka

118

First, since it holds from (22) that ... (13) dθ d T − θ, Kθ Pθ = Tr Pθ (T − θ) = = 1, dθ dθ Schwarz’s inequality yields that T − θ, T − θPθ · Kθ , Kθ Pθ ≥ 1. This is exactly (23). Next, we observe that the equation (22) is equivalent to the following condition: d Pθ X for ∀ X ∈ Lh . Kθ , XPθ = Tr dθ In the right-hand side of the above equation Kθ does not appear. This means that Kθ and Lh are orthogonal (with respect to the inner product Pθ ). Hence, the orthogonal projection of ∀Kθ ∈ Kθ to Lh is the intersection of Kθ and Lh , that is Lθ . Hence, we obtain the following: Kθ , Kθ Pθ ≥ Lθ , Lθ Pθ . This is exactly (24). It follows from the above result that the symmetric logarithmic derivative Lθ gives the best lower bound among the logarithmic derivatives Kθ satisfying (22). Thus, in what follows, we call I(θ), which is defined from L(θ) by (17), the Fisher information of the model {Pθ }, and will investigate its properties. 4. Quantum Exponential Family An unbiased estimator attaining the lower bound in (19) is called an efficient estimator of θ. We shall clarify the condition under which there exists an efficient estimator. First, from the argument of the previous section, if the equality in (19) holds, then, for each θ ∈ Θ, there exists a constant c(θ) satisfying the following: Lθ = c(θ)(T − θ). Then, we can integrate (15) as follows: 1

1

Pθ = e 2 {β(θ)T −γ(θ)}P0 e 2 {β(θ)T −γ(θ)}, where def

6

β(θ) =

0

θ

c(θ )dθ ,

def

6

γ(θ) =

0

θ

θ c(θ )dθ .

(25)

entire

December 28, 2004

13:56



119

Here, 0 ∈ Θ is assumed for convenience. Conversely, if the model {Pθ } can be written in the form (25), then T can be shown to be an efficient estimator of the parameter θ¯ defined by dγ def (dγ/dθ) = . θ¯ = dβ/dθ dβ Hence, we obtain the following. Theorem 1: For a model {Pθ }, an efficient estimator of some parameter exists, if and only if {Pθ } can be written in the form (25). Equation (25) is an extension of the exponential family in the classical case to the quantum system, and we call it a quantum exponential family. In (25), letting T = k tk πk be the spectral decomposition of T , the distribution pθ (tk | T ) = TrPθ πk of the outcome of the measurement of T can be written as pθ (tk | T ) = p0 (tk | T )eβ(θ)tk −γ(θ) , which forms a classical exponential family. 5. Measurements of the Quantum System and Fisher Information When a measurement Π = {πk } is fixed, the probability distribution of the outcome of the measurement under a state Pθ is def

pθ (k | Π) = TrPθ πk .

(26)

Then, {pθ (· | Π) | θ ∈ Θ} is a classical statistical model, and the accuracy of the parametric estimation can be evaluated by the following Fisher information: 2 d def pθ (k | Π) . pθ (k | Π) (27) I(θ | Π) = dθ k

Now, let us study the relation between this classical I(θ | Π) and the quantum I(θ). First, let def M(Π) = ak πk ak ∈ C ⊂ L, k def Mh (Π) = M(Π) ∩ Lh = ak πk ak ∈ R . k

entire

December 28, 2004

13:56


Hiroshi Nagaoka

120

The set M = M(Π) satisfies the following. (i) For ∀X, ∀Y ∈ M, ∀a, ∀b ∈ C, aX + bY ∈ M, XY ∈ M, X ∗ ∈ M. (ii) Id ∈ M. In general, an M satisfying (i) and (ii) above is called a ∗-subalgebra of L. In the case when M = M(Π), it also satisfies the following commutativity. (iii) For ∀X, ∀Y ∈ M, XY = Y X. Moreover, Mh (Π) is the set of observables whose values are determined by the measurement Π. Here, we define Lθ (Π) ∈ Mh (Π) by the following equation: d def Lθ (Π) = (28) log pθ (k | Π) · πk . dθ k

Then, for ∀k, we have Lθ , πk Pθ

1 Tr(Pθ Lθ + Lθ Pθ )πk 2 ... (15) d d = Tr Pθ πk = pθ (k | Π) log pθ (k | Π) = Lθ (Π), πk Pθ , dθ dθ =

ReTrPθ Lθ πk =

so that Lθ , XPθ = Lθ (Π), XPθ

(∀X ∈ Mh (Π)).

Therefore, Lθ (Π) is the orthogonal projection of Lθ to Mh (Π). Here, since I(θ) = Lθ , Lθ Pθ and I(θ | Π) = Lθ (Π), Lθ (Π)Pθ , we finally obtain the following. Theorem 2: For any measurement Π, the following hold for each θ ∈ Θ: I(θ) ≥ I(θ | Π),

and

I(θ) = I(θ | Π) ⇐⇒ Lθ ∈ Mh (Π). Corollary 3: It holds that I(θ) = max I(θ | Π). Π

6. Quasi-Classical Model From the result of the previous section, we see the following. That is, when the true distribution is Pθ , the parametric estimation based on a measurement Πθ satisfying Lθ ∈ Mh (Πθ ) (the spectral decomposition of Lθ for instance) is optimal. In general, however, since Πθ changes depending on

entire

December 28, 2004

13:56



121

θ, there is no measurement that is optimal for all θ. In this point lies the difficulty peculiar to the estimation problem in the quantum system. If we can take Πθ independent of θ, namely, if there exists a measurement Π satisfying Lθ ∈ Mh (Π) for ∀θ ∈ Θ,

(29)

I(θ) = I(θ | Π) for ∀θ ∈ Θ

(30)

then

holds (; in this case, Π is said to preserve the Fisher information), and it is sufficient with respect to the estimation of θ to perform only the measurement Π. In this case, if we carry out, for example, the maximal likelihood estimation for the classical model {pθ (· | Π) | θ ∈ Θ} by using the outcome of the measurement Π, then the lower bound of Cramér-Rao (19) is attained asymptotically (as the number of times of the measurement → ∞). Thus, when there exists Π satisfying (29) (or (30)) we call {Pθ } a quasi-classical model. Now, suppose that (29) holds. Then, since it holds that Lθ Lθ = Lθ Lθ (∀θ, ∀θ ∈ Θ), (15) can be integrated as follows: Pθ = Mθ P0 Mθ ,

(31)

where def

1

Mθ = e 2

2θ 0

Lθ dθ

∈ Mh (Π).

Conversely, if {Pθ } is expressed by a certain {Mθ } ⊂ Mh (Π) as (31), then d Mθ )Mθ−1 so that (29) holds. Hence the following hold. we have Lθ = 2( dθ Theorem 4: A measurement Π preserves the Fisher information of a model {Pθ } if and only if there exists {Mθ } ⊂ Mh (Π) satisfying (31). Corollary 5: A model {Pθ } is quasi-classical if and only if it can be written as (31) by some commutative {Mθ } ⊂ Lh (i.e. Mθ Mθ = Mθ Mθ (∀θ, ∀θ )). In particular, a quantum exponential family (25) is quasi-classical. 7. Sufficiency of Measurements and Fisher Information For a classical model, a statistic preserves the Fisher information if and only if it is a sufficient statistic. In this section, we study the relation between the condition that a measurement Π preserves the Fisher information and the sufficiency of Π with respect to {Pθ }.

entire

December 28, 2004

13:56


Hiroshi Nagaoka

122

First, we shall explain the concept of conditional expectation in the quantum system [4]. Given a statistical operator P ∈ P and a ∗-subalgebra M of L, let EP (· | M) denote the orthogonal projection from L to M with respect to the complex inner product ( )P . Namely, for ∀A ∈ L, we have EP (A | M) ∈ M, and (A, X)P = (EP (A | M), X)p

(∀X ∈ M).

(32)

In the case when P and M satisfy a certain relation† , one can show that EP (· | M) satisfies desirable properties as conditional expectation. Moreover, it is also known that in the other case there does not exist such a desirable conditional expectation. Nevertheless, in this paper, for any P and M we call EP (· | M) the conditional expectation with respect to (P, M). Next, we shall describe the sufficiency. When EP (· | M) = EQ (· | M) holds for two statistical operators P , Q ∈ P and a ∗-subalgebra M of L, we say M is sufficient with respect to {P, Q}. (In [4], it is also required that P , Q and M satisfy the “certain relation” mentioned above.) This property is equivalent to (A, X)Q = (EP (A | M), X)Q

(∀A ∈ L, ∀X ∈ M).

(33)

Here, letting N = P −1 Q, it holds that (A, X)Q = (A, N X)P , and hence (33) can be rewritten as follows: (A, N X)P = (EP (A | M), N X)P

for ∀A ∈ L, ∀X ∈ M.

(34)

If N ∈ M, then the above equation holds from (32). Conversely, if the above equation holds, we can show that N ∈ M by letting X = Id especially. Therefore, M is sufficient with respect to {P, Q} if and only if P −1 Q ∈ M. When M = M(Π), we shall say a measurement Π is sufficient instead of saying M(Π) is sufficient. Now, if Π is sufficient with respect to {P, Q}, then N = P −1 Q ∈ M(Π), and hence we can write ak πk (ak ∈ C). N = k † Additional author’s note: Mentioned here is a relation equivalent to the condition that EP (A | M) ≤ A for ∀A ∈ L, where · denotes the operator norm; see J. Tomiyama, Proc. Japan Acad., 33, 608-612 (1957).

entire

December 28, 2004

13:56



123

Then, since TrQπk = TrP N πk = ak TrP πk for each k, we see that ak is a positive real number. Hence, the following holds: N = N ∗ > 0.

(35)

Since Q = P N is Hermitian, P N = N P and P Q = QP hold from (35). Now, we shall return to the subject of a model {Pθ }. In this case, also, if we say Π is sufficient with respect to {Pθ } when the conditional expectation EPθ (· | M(Π)) is independent of θ, then the following holds. Theorem 6: The condition that a measurement Π is sufficient with respect to {Pθ } is equivalent to the existence of {Nθ } ⊂ M(Π) satisfying the following equation: Pθ = P0 Nθ

(∀θ ∈ Θ).

(36)

Moreover, from (36), the following relations are derived: 0 < Nθ ∈ Mh (Π),

(37)

Pθ Nθ = Nθ Pθ ,

(38)

Pθ Pθ = Pθ Pθ

(39)

(∀θ, ∀θ ∈ Θ). In general, a model {Pθ } is called commutative if it satisfies (39). This condition is equivalent to the existence of a measurement Π satisfying {Pθ } ⊂ Mh (Π ). Although Theorem 6 does not necessarily implies Π = Π , Π also turns out to be sufficient with respect to {Pθ }. (Π satisfies the definition of sufficiency in Ref. [4] also.) Hence, the following holds. Corollary 7: There exists a measurement that is sufficient with respect to a model {Pθ } if and only if the model is commutative. Now, if Π is sufficient with respect to {Pθ }, then, from (36)-(38), the symmetric logarithmic derivative {Lθ } of {Pθ } is given by d Lθ = Nθ Nθ−1 ∈ Mh (Π), dθ so that it satisfies (29). Hence, we obtain the following. Theorem 8: If a measurement Π is sufficient with respect to a model {Pθ }, then Π preserves the Fisher information of {Pθ }.

entire

December 28, 2004

13:56

124


Hiroshi Nagaoka

On the other hand, it is easily seen that the converse of Theorem 8 does not hold. For, a quasi-classical model (31) is commutative only in a special case. For example, in the quantum exponential family (25), the measurement Π of the efficient estimator T preserves the Fisher information, but Π is not sufficient with respect to {Pθ } if T P0 "= P0 T . From this, we see that the sufficiency is a quite stronger condition than the preservation of the Fisher information. It is significantly different from the result in the classical case and inditates that we need deeper consideration of the meaning of the condition that the Fisher information is preserved. 8. Conclusion Throughout the paper, we considered the Fisher information of a quantum statistical model which has one parameter, mainly focusing on its relations with measurement. As future issues, first, by extending the result of this paper, we could discuss the Fisher information, more generally, in the case where a measurement is restricted to an arbitrary ∗-subalgebra of L. In addition, as a more important problem, there remains the extension to the case where the number of parameters is more than one. Since this concerns the problem of simultaneous measurement of non-commuting observables, it is much more difficult than the one-parameter case, and even on the Cramér-Rao inequality, no unified theory has been established so far [1, 2, 3]. References [1] C.W. Helstrom, Quantum detection and estimation theory, (Academic Press, New York, 1976). [2] O. Hirota, Optical communication theory (in Japanese), (Morikita Shuppan, Japan, 1985). [3] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, (NorthHolland, Amsterdam, 1982). [4] H. Umegaki and M. Ohya, Quantum theoretical entropy (in Japanese), (Kyoritsu Shuppan, Japan, 1984).

entire

December 28, 2004

13:56


CHAPTER 10

On the Parameter Estimation Problem for Quantum Statistical Models Hiroshi Nagaoka Abstract. The parameter estimation problem for a quantum statistical model S = {Sθ ; θ = (θ1 , . . . , θn ) ∈ Θ}, where {Sθ } are quantum states (i.e. density operators) and Θ is an open set in Rn , is considered. The problem is to estimate the true state which is unknown but is assumed to lie in S. The quantum analogues of various notions developed in the classical statistical estimation theory are studied; e.g., unbiased estimator, lower bound of Cramér-Rao type, efficient estimator, asymptotic efficiency, differential geometrical structures, etc. It is shown that the quantum theory for the one-parameter case (n = 1) is constructed almost in parallel with the classical theory, while several difficulties lie ahead of us to construct the general theory of quantum estimation for the multi-parameter case.

1. Introduction Motivated mainly by the recent progress in the field of optical communications, the quantum parameter estimation problem has been actively studied. The problem is to estimate the ‘true’ state which is unknown but is assumed to be represented by a density operator in a quantum statistical model S = {Sθ ; θ = (θ1 , . . . , θn ) ∈ Θ }. This can be regarded as a quantum analogue of the parameter estimation problem for a ‘classical’ statistical model P = {pθ ; θ = (θ1 , . . . , θn ) ∈ Θ }, where {pθ } are probability density functions, and many notions in the classical theory can find their analogues in the quantum theory. In this paper we study some of these notions such as unbiased estimator, lower bound of Cramér-Rao type, efficient estimator, exponential family, asymptotic efficiency, differential geometrical structures of a model, etc. from a comparative viewpoint between the classical case and the quantum case. It will be shown that the quantum theory for the one-parameter case (n = 1) is constructed almost in parallel with the classical theory, while several difficulties lie ahead of us to construct the general quantum theory for the multi-parameter case. In this paper, we do not delve into the mathematically rigorous treatment of operators on a Hilbert space, but often proceed in an ‘intuitive’ 125

entire

December 28, 2004

126

13:56


Hiroshi Nagaoka

manner, assuming several regularity conditions tacitly. For the rigorous argument, see [6]. 8. Efficient Estimator and Exponential Family∗ For an n-parameter classical statistical model P = {pθ ; θ ∈ Θ ⊂ Rn } of probability density functions on (Ω , B), an estimator θˆ : Ω → Rn is said to be efficient if θˆ is unbiased and achieves the Cramér-Rao bound (i.e., its covariance matrix coincides with the inverse of the Fisher information matrix) for ∀θ ∈ Θ . It should be noted that the unbiasedness and the efficiency of an estimator depend on the parametrization of P. If pθ in P is written as n βj (θ)tj (ω) − γ(θ)] (1) pθ (ω) = α(ω) exp[ j=1

by n + 1 functions {β1 , . . . , βn , γ} defined on Θ and n + 1 functions {α, t1 , . . . , tn } defined on Ω , P is called an exponential family. In this case, t = (t1 , . . . , tn ) becomes an efficient estimator for the parameter τ = (τ 1 , . . . , τ n ) defined by 6 ∂γ def τ j (θ) = tj (ω)pθ (ω)dω = . ∂βj Conversely, if θˆ is an efficient estimator for a parameter θ of a model P, then P is written as (1) where tj = θˆj (j = 1, . . . , n). Therefore the necessary and sufficient conditon for P to have an efficient estimator for some parametrization is that P is an exponential family. These are well-known results in the classical estimation theory. The corresponding problem in the quantum case is more complicated because we must consider the weight matrix G. It should be noted that there is no reason to restrict G to be constant w.r.t. θ. So we suppose that a weight field G = {Gθ ; θ ∈ Θ } is given in addition to the model ˆ is said to be efficient S = {Sθ ; θ ∈ Θ }. In this situation, an estimator M ˆ if M is unbiased and satisfies ˆ ] = C MI (Gθ ) tr Gθ Vθ [M θ ∗ Editorial

for ∀θ ∈ Θ

note: Section 2 - Section 7 are preliminary parts, where the author gave basic definitions and reviewed the results obtained in his previous paper “A New Approach to Cramér-Rao Bounds for Quantum State Estimation,” which is included in this volume as Chap. 8. Since definitions and notations of this paper are same as those of the previous paper, readers can understand this paper without these sections by referring to Chap. 8 if neccessary. Therefore, they are omitted with author’s permission.

entire

December 28, 2004

13:56


On the Parameter Estimation Problem for Quantum Statistical Models

127

where CθMI (Gθ ) denotes the most informative CR bound w.r.t. Gθ at θ. In the one parameter case n = 1, however, the efficiency of an estimator does not depend on G as in the classical case. Furthermore, the notion of one parameter exponential family has its quantum analogue as follows. Let S = {Sθ ; θ ∈ Θ ⊂ R} be a one parameter quantum model. Tracing ˆ ] ≥ C S , we can see that the carefully the derivation of the inequality Vθ [M ˆ to achieve the CR bound C S for ∀θ ∈ Θ , condition for an estimator M ˆ , is that M ˆ is the measurement of which is equivalent to the efficiency of M an observable T satisfying LSθ = c(θ)(T − θ),

∀θ ∈ Θ .

(2)

where LSθ is the symmetric logarithmic derivative such that ∂ 1 Sθ = (Sθ LSθ + LSθ Sθ ) ∂θj 2

(3)

and c(θ) is a constant. If (2) holds, the differential equation (3) for Sθ is solved as Sθ = exp[{β(θ)T − γ(θ)}/2] A exp[{β(θ)T − γ(θ)}/2]

(4)

where def

6

θ

β(θ) =

c(σ)dσ,

def

6

θ

γ(θ) =

θ0

σc(σ)dσ,

def

A = Sθ 0 .

θ0

Conversely, if Sθ in S is written as (4) by functions {β, γ} defined on Θ and Hermitian operators {A, T } with A > 0, we have LSθ =

dγ dβ T− , dθ dθ

and therefore the measurement of T is an efficient estimator for the parameter def

τ = Tr Sθ T =

dγ . dβ

A one parameter model S of the form (4) is called a quantum exponential family. In the multi-parameter case, the condition for a pair (S, G) to have an efficient estimator is not known yet, although there are several important examples of (S, G) for which the existence of efficient estimators have been shown [4, 6].

entire

December 28, 2004

13:56


Hiroshi Nagaoka

128

9. Asymptotic Efficiency Consider the problem of estimating θ under the assumption that observations can be made on a number N of independent, identical systems, all with the same but unknown state Sθ in the model S. An estimation based ˆ (N ) on Rn over the on N observations is represented by a measurement M def Hilbert space H(N ) = H ⊗ · · · ⊗ H, the tensor product of N copies of H. In the asymptotic theory of classical statistical estimations, the following results are well-known. Suppose that an estimator θˆ(N ) based on N observations for a model P = {pθ ; θ ∈ Θ } is consistent in the sense that, for ∀θ ∈ Θ , θˆ(N ) converges to θ as N → ∞ in probability w.r.t. pθ . Then, under some regularity conditions, the covariance matrix Vθ [θˆ(N ) ] always satisfies lim N Vθ [θˆ(N ) ] ≥ Jθ−1

N →∞

(5)

for ∀θ ∈ Θ , where Jθ is the Fisher information matrix of P at θ. This can be regarded as an asymptotic version of the Cramér-Rao inequality, which is based on the fact that the Fisher information matrix for N observations is given by N Jθ . When the equality holds in (5) for ∀θ ∈ Θ , the estimator θˆ(N ) is said to be (first-order) asymptotically efficient. It can be shown that the maximum likelihood estimator (MLE) is asymptotically efficient for an arbitrary model satisfying some regularity conditions. Let us proceed to the quantum case. Suppose that a model S = {Sθ ; θ ∈ Θ } and a weight field G = {Gθ ; θ ∈ Θ } are given, and let Mθ be a measurement satisfying† CθMI (Gθ ) = tr Gθ (JθMθ )−1 .

(6)

We consider an addaptive estimation procedure as follows: choose the initial value θˆ0 ∈ Θ arbitrarily, and then carry out the following (i) and (ii) for N = 1, 2, . . . successively. (i) Perform the measurement MθˆN −1 , and denote the measurement result by xN . (ii) Using the obtained data (x1 , x2 , . . . , xN ), compute the next estimate θˆN by MLE; i.e., θˆN is defined as θ maximizing the likelihood function pθ (x1 |θˆ0 )pθ (x2 |θˆ1 ) · · · · · pθ (xN |θˆN −1 ) ˆ where pθ (x|θ)dx = Tr Sθ Mθˆ(dx). It is clear that the procedure defines ˆ (N ) over H(N ) for N = 1, 2, . . ., although we do not give an estimator M † Additional

author’s note: See Theorem 2 in Chap. 8 of this volume.

entire

December 28, 2004

13:56



129

ˆ (N ) . When all the quantum system to be here the explicit expression of M observed are in the state Sθ , the probability that θˆN belongs to B ⊂ Rn (N ) ˆ (N ) (N ) def is written as Tr Sθ M (B), where Sθ = Sθ ⊗ · · · ⊗ Sθ (N -tuple). ˆ (N ) , which means that θˆ(N ) tends to θ as N → ∞, The consistency of M then immediately follows from the consitency of MLE. Hence, when N is sufficiently large, the computation of θˆ(N ) becomes nearly equal to that of MLE for the model {pθ (·|θ) ; θ ∈ Θ }, whose Fisher information matrices are {JθM θ ; θ ∈ Θ }. Thus, recalling the asymptotic efficiency of MLE and ˆ (N ) ] of the estimator (6), it is expected that the covariance matrix Vθ [M satisfies ˆ (N ) ] = CθMI (Gθ ), lim N tr Gθ Vθ [M

N →∞

∀θ ∈ Θ .

(7)

Can this property be called the asymptotic efficiency of the estimator? In the one parameter case, the answer is ‘yes’. Indeed, it is easy to see that, def (N ) for the model S (N ) = {Sθ ; θ ∈ Θ }, JθS for S is replaced with N JθS [4] MI and hence Cθ (Gθ ) is replaced with N −1 CθMI (Gθ ) (see Section 6‡ ). This ˆ (N ) is best in the totality proves that the asymptotic performance (7) of M of consistent estimators. In the multi-parameter case, however, we do not yet know whether (7) is best or not. 10. Geometrical Structures A classical statistical model P = {pθ ; θ ∈ Θ } can be regarded as a differential manifold with a coordinate system θ on which a Riemannian metric J is defined by the Fisher information matrices {Jθ ; θ ∈ Θ } and a couple of affine connections, the exponential connection Γ and the mixture connection Γ˜ , are defined in the following way [1]: n 6 [Γθ ]kij = {∂i ∂j log pθ (ω)}{∂h pθ (ω)}dω · [Jθ−1 ]hk [Γ˜θ ]kij =

h=1 n

6

{∂i ∂j pθ (ω)}{∂h log pθ (ω)}dω · [Jθ−1 ]hk

h=1 def

where ∂i = ∂/∂θi . Using the Kullback information 6 p(ω) dω, K(p, q) = p(ω) log q(ω) ‡ Editorial

note: This omitted section explains that C MI = C S in the one-parameter case; see section 5 of Chap.8 in this book.

entire

December 28, 2004

13:56


Hiroshi Nagaoka

130

these are also written as [2] [Jθ ]ij =

−∂i ∂j K(pθ , pθ )|θ =θ ,

[Γθ ]kij

=−

n

∂i ∂j ∂h K(pθ , pθ )|θ =θ · [Jθ−1 ]hk

h=1

[Γ˜θ ]kij = −

n

∂i ∂j ∂h K(pθ , pθ )|θ =θ · [Jθ−1 ]hk

h=1

where that

def ∂i =

i ∂/∂θ . Γ and Γ˜ are mutually dual w.r.t. J [9, 1] in the sense

∂i [Jθ ]jk =

n

[Γθ ]hij [Jθ ]hk + [Γ˜θ ]hik [Jθ ]hj . h=1

The results in the classical estimation theory mentioned in the previous sections can be understood in this geometrical framework. For instance, an exponential family is characterized as a Γ -autoparallel submanifold of the whole manifold consisting of all the probability density functions which are positive almost everywhere. A quantum statistical model S = {Sθ ; θ ∈ Θ } can also be regarded as a manifold with a coordinate system θ. In the sequel we introduce some differential geometrical structures on S and examine whether or not they help us to understand the estimation problem for S. H. Umegaki [10] defined a quantum analogue of the Kullback information, called the relative entropy, by def

K(S, Q) = Tr S(log S − log Q) for density operators S and Q. Using this K in the same way as in the classical case, we can introduce on S a Riemannian metric J K and a couple of affine connections {Γ K , Γ˜ K } which are mutually dual w.r.t. J K . (J K was treated in [7].) However, it seems that the geometrical structure (J K , Γ K , Γ˜ K ) has little connection with the estimation theory. We consider, for instance, a quantum analogue of the notion of exponential family from the viewpoint of this geometry. A one-parameter model S = {Sθ ; θ ∈ Θ ⊂ R} is Γ K -autoparallel (i.e., a Γ K -geodesic) as a submanifold of the whole manifold consisting of all the nonsingular density operators, if and only if it is written as Sθ = exp[log A + β(θ)T − γ(θ)]

(8)

by functions {β, γ} defined on Θ and Hermitian operators {A, T } with A > 0. (As in the classical case, such a model is also characterized as a

entire

December 28, 2004

13:56



131

set of S minimizing K(S, Q) for some Q under the restriction that Tr ST has a given value.) It should be noted that (8) is different from the form of quantum exponential family (4) unless AT = T A. If A = I, both (4) and (8) are reduced to the form of a Gibbs state with the Hamiltonian T appearing in the quantum statistical physics. The symmetric logarithmic derivatives {LSθ,j ; j = 1, . . . , n} provides us with another geometrical structure (J S , Γ S , Γ˜ S ) on S as follows: [JθS ]ij = Re Tr Sθ LSθ,i LSθ,j = Tr (∂i Sθ )(LSθ,j ) n [ΓθS ]kij = Tr (∂i LSθ,j )(∂h Sθ ) · [(JθS )−1 ]hk [Γ˜θS ]kij =

h=1 n

Tr (∂i ∂j Sθ )(LSθ,h ) · [(JθS )−1 ]hk .

h=1

˜ S are mutually dual w.r.t. J S and that It is easy to see that Γ S and Γ S K S ˜ ˜ Γ = Γ . We note that Γ is not torsion-free (i.e., [ΓθS ]kij "= [ΓθS ]kji ) in general and hence, being different from the previously mentioned structures, (J S , Γ S , Γ˜ S ) cannot be derived from a nonnegative distance-like function such as K. The important fact on this geometry is that a one-parameter quantum exponential family (4) is characterized as a Γ S -geodesic. Regrettably, the notion of multi-dimensional Γ S -autoparallel submanifold does not correspond to the existence of an efficient estimator. Let G = {Gθ ; θ ∈ Θ } be a weight field, and denote JθMθ in (6) by JθG . It can be shown that, if we regard G as a Riemannian metric on S, i.e., if G is regarded as a covariant tensor field of degree 2 w.r.t. coordinate transformations, then J G = {JθG ; θ ∈ Θ } also becomes a Riemannian metric on S. This fact seems to suggest that geometrical considerations will be useful for studying the quantum estimation poroblem, although at present we do not possess a satisfactory geometrical theory for the problem. Finally, we note that the following matrix inequality holds in general, whose proof is omitted for want of space. JθK ≥ JθS ≥ JθG

(∀θ ∈ Θ ).

References [1] S. Amari, Differential-geometrical methods in statistics, Lecture Notes in Statist. 28, Springer-Verlag, 1985. [2] S. Eguchi, “Second order efficiency of minimum contrast estimators in a curved exponential family,” Ann. Statist., 11, 793–803 (1983).

entire

December 28, 2004

132

13:56


Hiroshi Nagaoka

[3] C.W. Helstrom, “Minimum mean-square error estimation in quantum statistics,” Phys. Lett., 25A, 101–102 (1967). [4] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York, 1976. [5] A.S. Holevo, “Noncommutative analogues of the Cramér-Rao inequality in the quantum measurement theory,” In Lecture Notes in Math., 550, 194–222 (1976). [6] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, 1982. [7] R.S. Ingarden, et al., “Information geometry of quantum statistical systems,” Tesor, N.S., 37, 105–111 (1982). [8] H. Nagaoka, “A new approach to Cramér-Rao bounds for quantum state estimation,” IEICE Technical Report, 89, 228, IT 89-42, 9–14, (1989). (Chap. 8. in this book) [9] H. Nagaoka and S. Amari, “Differential geometry of smooth families of probability distributions,” Math. Eng. Tech. Rep. 82-7, Univ. Tokyo (1982). [10] H. Umegaki, “Conditional expectation in an operator algebra, III,” Kodai Math. Sem. Rep., 11, 51–64 (1959). [11] H.P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of non-selfadjoint observables,” IEEE Trans. Inform. Theory, 19, 740–750 (1973).

entire

December 28, 2004

13:56


CHAPTER 11

A Generalization of the Simultaneous Diagonalization of Hermitian Matrices and its Relation to Quantum Estimation Theory Hiroshi Nagaoka Abstract. We study the problem of minimizing a quadratic quantity defined for given two Hermitian matrices X, Y and a positive-definite Hermitian matrix. This problem is reduced to the simultaneous diagonalization of X, Y when XY = Y X. We derive a lower bound for the quantity, and in some special cases solve the problem by showing that the lower bound is achievable. This problem is closely related to a simultaneous measurement of quantum mechanical observables which are not commuting and has an application in the theory of quantum state estimation.

1. Introduction Suppose that two Hermitian matrices {X, Y } ⊂ Cn×n and a positivedefinite Hermitian matrix S ∈ Cn×n are given arbitrarily. For a natural number m, we consider m nonnegative-definite Hermitian matrices {M1 , . . . , Mm } ⊂ Cn×n and 2m real numbers {x1 , . . . , xm ; y1 , . . . , ym } ⊂ R satisfying m

Mj = I

(identity),

j=1 m

m

j=1

j=1

xj Mj = X,

yj Mj = Y,

(1)

(2)

and define def

∆ =

m

(x2j + yj2 ) Tr[SMj ] (≥ 0),

(3)

j=1

c 1991 Japan Society for Industrial and Applied Mathematics (JSIAM). Reprinted, with permission, from Transactions of Japan Society for Industrial and Applied Mathematics, 1, 43-56, 1991. All or any part of this article may not be reproduced in any form or by any means without permission from JSIAM. This paper was originally written in Japanese. It was translated to English by Fuyuhiko Tanaka and Yoshiyuki Tsuda. 133

entire

December 28, 2004

13:56


Hiroshi Nagaoka

134

where Tr denotes the trace. For given X, Y and S, how can we obtain {M1 , . . . , Mm ; x1 , . . . , xm ; y1 , . . . , ym } minimizing ∆ and the minimum of ∆? Here, no restriction is set to the natural number m. That is to say, we wish to minimize ∆ with m allowed to be arbitrarily large (possibly infinite). In this paper, we investigate this problem in relation to the theory of quantum state estimation and solve it in some special cases. If X and Y are commuting (i.e. [X, Y ] = XY − Y X = 0), then the above problem can be easily solved by the simultaneous diagonalization of X, Y as will be explained later. That is, if we find eigenvalues {x1 , . . . , xn ; y1 . . . , yn } ⊂ R and eigenvectors {u1 , . . . , un } ⊂ Cn satisfying Xuj = xj uj , u∗j uk

Y u j = yj u j ,

(4)

= δjk ,

(5)

and set Mj = uj u∗j ∈ Cn×n ,

(6)

then {Mj ; xj ; yj } satisfies (1) and (2), and attains min ∆ = Tr[S(X +Y )], the minimum of ∆. As it is known well, in the quantum mechanics, an arbitrary physical quantity (which is called an observable) X is represented by a Hermitian operator X on a Hilbert space, and the measurement of X corresponds to the spectral decomposition (namely, diagonalization) of X. Moreover, the mathematical fact that two commuting Hermitian operators X and Y can be diagonalized simultaneously is considered as an expression of the physical fact that the corresponding observables X and Y can be measured simultaneously. On the other hand, if [X, Y ] "= 0, then X and Y cannot be diagonalized simultaneously, which is considered as a mathematical expression of the uncertainty principle asserting that X and Y cannot be measured simultaneously. The above fact is written in most of standard textbooks of the quantum mechanics. The concept of measurement there, however, covers only those measurements which can determine the quantum state of the system in arbitrary accuracy, and does not cover all the measurements defined as the operations carried out for the purpose of obtaining a certain information from the physical system. (The former one is sometimes called a simple measurement and the latter a generalized measurement.) Indeed, there are many realistic situations where non-commuting physical quantities such as the position and the momentum are measured simultaneously (sacrificing accuracy to some extent), and they fall into the category of generalized 2

2

entire

December 28, 2004

13:56


A Generalization of the Simultaneous Diagonalization of Hermitian Matrices

135

measurements. Mathematically, the extension from the simple measurement to the generalized measurement corresponds to the one from {Mj } written by (5), (6) (the set of projections to mutually orthogonal subspaces) to the set of nonnegative-definite Hermitian matrices {Mj } satisfying only (1) (see Section 3). Considering such a background, one can easily imagine that the problem mentioned in the first paragraph is related to the simultaneous measurement of non-commuting observables in a certain sense. Indeed, the formulation presented in this paper is a kind of generalization of the simultaneous measurement of two canonical conjugate observables ([1, 3, 5]) which is a well-known concept (see Section 6). However, it is difficult to give a unique definition to the concept of the simultaneous measurement for general noncommuting observables. Instead, we require the formulation where we set a certain criterion and conditions adapted for the purpose of the measurement and find out the optimal measurement under the criterion among those satisfying the conditions. The problem treated in this paper is to find out the measurement minimizing the deviation ∆ at the given quantum state S among measurements satisfying the unbiasedness condition (2) on the expectation, and is presented for the purpose of parametric estimation for a quantum statisitical model (quantum state estimation) (see Section 3). The estimation theory for quantum states is a field which has been studied mainly concerned with the optical communications so far, and its formulation can be considered as a kind of extension of the ordinary (classical) statistical estimation problem. Indeed, quantum versions of several concepts of the statistical estimation theory, e.g., Cramér-Rao’s inequality, Fisher information, have been studied and optimal estimation methods have been obtained for some important models as their applications ([7, 3, 5]). However, these results are insufficient in respect of generality, and currently the general theory of quantum estimation corresponding to the classical estimation theory has not been established yet. In constructing such a general theory, the most significant difficulty is that the fundamental mathematical theory on generalized measurements has not been fully developed. For example, the estimation of complex amplitude of coherent light in background thermal noise ([7]) and its generalization — the estimation of the expectation parameter for the quantum Gaussian state ([5], Chap. VI) — are the most similar cases to the classical estimation theory. However, since the theory is developed deeply depending on the simultaneous measurement theory of canonical conjugate observables, when we try to extend

entire

December 28, 2004

13:56


Hiroshi Nagaoka

136

it to more general case, we face the above-mentioned mathematical problem and get immediately bogged down with it. The present study is an attempt with the awareness of these issues. If the general theory of quantum estimation is developed, it is expected that, for various situations in which obtaining some information from a quantum system is aimed at, the optimal measurement will be derived mathematically and a wide range of applications will be made possible through the physical consideration on its implementation. 2. General Formulation of Problem The problem described in Section 1 can be naturally generalized from the following four points of view. (1) To generalize n × n matrices to operators on a general Hilbert space. (2) To extend from the two X, Y to the case of X1 , . . . , Xp of arbitrary number. (3) To extend the square sum in the criterion ∆ to a general positive quadratic form. (4) To rewrite the expression of finite {Mj ; xj ; yj } to measure-theoretical one including infinite or continuously infinite cases. The most general formulation including the whole is as follows. Let H be a Hilbert space and let X1 , . . . , Xp and S be Hermitian operators on H. Assume that S is positive-definite and its trace is finite. In addition, let G = [gij ] ∈ Rp×p be a symmetric matrix of positive-definite. When these are given, we consider a map M : B )→ M (B) with the domain def

B ( = the σ-field of all Borel sets of Rp ) and the range of operators on H satisfying the following conditions. (i)

∀B ∈ B;

(ii) (iii)

M (B) = M (B)∗ ≥ 0 M (φ) = 0,

(nonnegative-definite)

M (Rp ) = I

(identity)

For a sequence of at most countable {Bj } ⊂ B 5 satisfying Bi ∩ Bj = φ (∀i "= ∀j), M ( Bj ) = M (Bj ) j

j

6 (iv)

ξj M (dξ) = Xj

(dξ = dξ1 · · · dξp )

(j = 1, . . . , p).

If M (B) is not an operator but a scalar, (i)-(iii) is just the definition of a probability measure on (Rp , B). Therefore, such an M is sometimes called a probability operator measure.∗ Moreover, the integration of (iv) is defined ∗ Additional

author’s note: This is a synonym for POVM.

entire

December 28, 2004

13:56



137

just in the same way as the case where an M is a usual measure. Now, for such an M , our problem is to minimize 6 p p def gij ξi ξj Tr[SM (dξ)]. (7) ∆ = ∆(M ; S; G) = i=1 j=1 def

In the above, since µ(dξ) = Tr[SM (dξ)] is a measure in the usual sense, ∆ is always a nonnegative real number. Let ∆∗ (X1 , . . . , Xp ; S; G) be the minimum of ∆. In general, a solution of the problem depends on a weight matrix G. However, if the case of G = Ip is generally solved, then we can also obtain a solution for arbitrary G. Namely, obtaining ∆∗ (X1 , . . . , Xp ; S; G) is equivalent to obtaining ∆∗ (Y1 , . . . , Yp ; S; Ip ), where G1/2 = [fij ] and p Yi = j=1 fij Xj . For {Mj ; xj ; yj } satisfying (1) and (2), if we set def Mj M (B) = j:(xj ,yj )∈B

for any Borel set B ⊂ R2 and set p = 2, X1 = X, X2 = Y , then (i)-(iv) hold. Moreover, in this case, ∆ in (3) is equal to ∆(M ; S; I2 ). Therefore, the problem in this section is a generalization of the problem in Section 1. 3. Background of Problem In quantum mechanics, there corresponds a Hilbert space H to the quantum system of interest, and various physical concepts for the system are expressed by mathematical concepts for H. For example, an observable is expressed by a Hermitian operator on H and a state of the system is expressed by a nonnegative-definite Hermitian operator of trace one (which is called a density operator or a statistical operator ). On the other hand, a (generalized) measurement performed to the system is expressed by a probability operator measure satisfying (i)-(iii) in Section 2. A little more precise description is as follows. A measurement for a quantum mechanical system has the following two aspects. One is the outcome of the measurement, generally considered as a statistic according to a probability distribution. This probability distribution depends on the state S just before the measurement. The other aspect of a measurement is the state change of the system S → S caused by the measurement. The mathematical concept describing the former one is the probability operator measure (see, for example, [3, 5, 4]). Now, suppose that a measurement takes values on Rp . For such

entire

December 28, 2004

13:56


Hiroshi Nagaoka

138

a measurement, there corresponds a probability operator measure M on (Rp , B), and the probability distribution of the outcome of the measurement is given by P (B) = Tr[SM (B)]

(∀B ∈ B),

(8)

where S is a density operator describing the state of the system just before the measurement. It can be easily checked from (i)-(iii) in Section 2 and the fact that S is a density operator (S = S ∗ ≥ 0 and Tr[S] = 1) that the above P is a probability measure on (Rp , B). A p-parameter family of density operators S = {Sθ ; θ = (θ1 , . . . , θp ) ∈ Θ} (Θ is an open set of Rp ) is called a p-dimensional quantum statistical model. For the family S, a parameter estimation problem can be considered in the same way as one in the ordinary statistical theory. That is, assuming that the true state of the observed physical system belongs to the model S, our problem is to find an estimator which estimates the unknown parameter θ as accurately as possible. In this case, an estimator is a measurement with values in Rp , so that it is described by a probability operator measure M on (Rp , B). When the state of the system is Sθ , the probability distribution of the def estimator can be written as Pθ (B) = Tr[Sθ M (B)] (∀B ∈ B). When the equation 6 (9) ξj Pθ (dξ) = θj (∀j ∈ {1, . . . , p}) holds for ∀θ ∈ Θ, the estimator is called unbiased. With this equation the variance-covariance matrix V (θ) = [vij (θ)] ∈ Rp×p is represented as 6 6 vij (θ) = (ξi − θi )(ξj − θj )Pθ (dξ) = ξi ξj Tr[Sθ M (dξ)] − θi θj . Now, let us give a real symmetric matrix G = [gij ] ∈ Rp×p as a weight and regard tr[GV (θ)] =

p p i=1 j=1

gij vij (θ) = ∆(M ; Sθ ; G) −

p p

gij θi θj

i=1 j=1

as a quantity representing the deviation of the estimated value (from the true value θ) with respect to the estimator M . (The trace tr of p×p matrices is distinguished from the trace Tr of operators on H.) Obtaining a lower bound of tr[GV (θ)] for unbiased estimators is equivalent to obtaining a quantum version of Cramér-Rao inequality in the ordinary statistics and plays an important role in the theory of quantum

entire

December 28, 2004

13:56



139

estimation. (As a matter of fact, we consider another condition called the local unbiasedness at θ, which is a weaker and localized version of the unbiasedness, to develop the theory more simply, although we do not explain the detail here. See [5, 6].) For an estimator M , when we define Hermitian operators {X1 , . . . , Xp } by Equation (iv) in Section 2, unbiasedness (9) is described by (∀j ∈ {1, . . . , p}).

Tr[Sθ Xj ] = θj

(10)

Therefore, obtaining a lower bound of tr[GV (θ)] is reduced to obtaining a lower bound of ∆∗ (X1 , . . . , Xp ; Sθ ; G) for {X1 , . . . , Xp } satisfying (10). Thus we find that the problem mentioned in Section 2 is relevant to the theory of quantum estimation.

4. Fundamental Inequality We again consider the problem in Section 1 in the following. That is to say, def def we consider the problem of minimizing ∆ ( = ∆(M ; S)) in (3) for M = {Mj ; xj ; yj } satisfying (1) and (2) when n × n Hermitian matrices X, Y and an n×n positive-definite Hermitian matrix S are given. (The minimum is denoted by ∆∗ (X, Y ; S).) For this problem, the following holds. Theorem 1: Let {λ1 , . . . , λn } be the eigenvalues of the matrix iS 1/2 [X, Y ]S 1/2 (i2 = −1). Then, we have ∆(M ; S)

( ≥ ∆∗ (X, Y ; S) ) ≥ Tr[S(X 2 + Y 2 )] +

n

|λk |

(11)

k=1 def

= Γ(X, Y ; S).

def

def

Proof: For each j ∈ {1, . . . , m}, let zj = xj ± iyj and Z = X ± iY (each sign chosen in the corresponding order), then (zj I − Z)Mj (zj I − Z)∗ is a nonnegative-definite Hermitian matrix. Hence, m

(zj I − Z)Mj (zj I − Z)∗ =

j=1

=

m j=1 m

|zj |2 Mj − ZZ ∗

(← (1), (2))

(x2j + yj2 )Mj − (X ± iY )(X ∓ iY )

j=1

entire

December 28, 2004

13:56


Hiroshi Nagaoka

140

is also a nonnegative-definite Hermitian matrix, and we obtain the matrix inequality m

(x2j + yj2 )Mj ≥ X 2 + Y 2 ± i[X, Y ].

j=1

Since S = S ∗ > 0 from the assumption, the above formula results in m

1

1

1

1

1

1

(x2j + yj2 )S 2 Mj S 2 ≥ S 2 (X 2 + Y 2 )S 2 ± iS 2 [X, Y ]S 2 .

(12)

j=1

Here, since iS 1/2 [X, Y ]S 1/2 is a Hermitian matrix, it has eigenvalues {λ1 , . . . , λn } (⊂ R) and orthonormal eigenvectors {u1 , . . . , un } (⊂ Cn ). Multiplying u∗k from the left and uk from the right in both sides of Equation (12), we have m

1

1

1

1

(x2j + yj2 )u∗k S 2 Mj S 2 uk ≥ u∗k S 2 (X 2 + Y 2 )S 2 uk + |λk |,

(13)

j=1

and obtain (11) by summing up both sides with respect to k. The above theorem holds in the same form even for a general probability operator measure M on (R2 ; B). Moreover, we may expect it to be naturally extended to the case of operators on an infinite-dimensional Hilbert space (under some regularity conditions). 5. The Commutative Case If [X, Y ] = 0, then Γ(X, Y ; S) = Tr[S(X 2 + Y 2 )] holds. In this case, if we define M = {Mj ; xj ; yj } by (4)-(6), then (1) and (2) obviously hold and m j=1

x2j Mj = X 2 ,

m

yj2 Mj = Y 2

j=1

hold, so that we have ∆(M ; S) = Tr[S(X 2 + Y 2 )]. Therefore, it gives the minimum of ∆ and ∆∗ (X, Y ; S) = Γ(X, Y ; S) holds. The solution M in this case depends only on X and Y , and is independent of S. Moreover, this solution also gives the minimum of ∆(M ; S; G) with respect to any positive-definite weight matrix G ∈ R2×2 . 6. The Case of Canonical Commutation Relation We consider two Hermitian operators X and Y satisfying the Heisenberg’s canonical commutation relation [X, Y ] = iµI (µ is a positive constant) such

entire

December 28, 2004

13:56



141

as the position and the momentum. In this case, even though the Hilbert space H is necessarily infinite-dimensional, from iS 1/2 [X, Y ]S 1/2 = −µS we can write Γ(X, Y ; S) = Tr[S(X 2 + Y 2 )] + µTr[S], and Theorem 1 holds as it is. In the following, we will show the existence of a probability operator measure M attaining the lower bound of Theorem 1. For arbitrary two real numbers x and y, we consider a subspace of H consisting of ψ satisfying (X + iY )ψ = (x + iy)ψ

(14)

(such a ψ is called a minimally uncertain state or a coherent state etc.), and let Ex,y be the orthogonal projection from H to this subspace. Note that (X + iY )Ex,y = (x + iy)Ex,y and its conjugate Ex,y (X − iY ) = (x − iy)Ex,y hold. Then, it is known that the formula 6 1 (15) Ex,y dx dy = I 2πµ holds ([2]). This relation is often referred to as the completeness or the over-completeness of the coherent states. By multiplying X + iY from the left in both sides of this formula, we have 6 1 (16) (x + iy)Ex,y dx dy = X + iY, 2πµ which can be also written as 6 1 xEx,y dx dy = X, 2πµ

1 2πµ

6 yEx,y dx dy = Y.

(17)

Moreover, if we multiply X − iY from the right in both sides of (16), from [X, Y ] = iµI, we have 6 1 (x2 + y 2 )Ex,y dx dy = X 2 + Y 2 + µI. (18) 2πµ (See, for example, Chap. III in [5]for rigorous treatments of these formulae.) From the completeness (15), a probability operator measure M on def

(R2 , B) is defined by M (dx dy) = Ex,y dx dy/2πµ, and from (17) and (18), ∆(M ; S) = Tr[S(X 2 + Y 2 )] + µTr[S] = Γ(X, Y ; S) holds for any density operator S. From the above, we see that ∆∗ (X, Y ; S) = Γ(X, Y ; S) holds also in the case of canonical commutation relation. The solution in this case is independent of S as well as the one in the last section, whereas, when we consider the minimization of ∆(M ; S; G) in general, the solution depends on G.

entire

December 28, 2004

13:56


Hiroshi Nagaoka

142

The estimation of complex amplitude of coherent light in background thermal noise ([7]) and the estimation of expectation parameter of quantum Gaussian state as its generalization ([5], Chap. VI) can be analyzed in a similar way to the one in the classical estimation theory, for which the optimal unbiased estimators have been obtained. The mathematical basis of the theory developed there can be found in what we mentioned in the present section. That is, for these models, one can obtain concretely the lower bound of the deviation of the estimated value tr[GV (θ)] defined in Section 3 by applying the above-mentioned relation, and since the estimator M attaining the lower bound does not depend on a state Sθ , the uniformly optimal estimator is obtained. 7. 2 × 2 Matrix Case In this section, we prove the following theorem. Theorem 2: In the case of 2 × 2 matrices, ∆∗ (X, Y ; S) = Γ(X, Y ; S) always holds. That is, for arbitrarily given 2 × 2 matrices X, Y and 2 × 2 positivedefinite Hermitian matrix S, we construct M = {Mj ; xj yj } satisfying (1) and (2) such that ∆(M ; S) = Γ(X, Y ; S). First, we remark the following fact. Lemma 3: If X, Y, S satisfy ∆∗ (X, Y ; S) = Γ(X, Y ; S), then, ∆∗ (X , Y ; S) = Γ(X , Y ; S) also holds, where X = X + aI and Y = Y + bI (∀a ∈ R, ∀b ∈ R). Proof: Suppose that M = {Mj ; xj ; yj } satisfy (1), (2) and ∆(M ; S) = Γ(X, Y ; S). Then, setting xj = xj + a, yj = yj + b and M = {Mj ; xj ; yj }, (1) and (2) hold for M , X , Y , and we can write ∆(M ; S) = (xj 2 + yj 2 )Tr[SMj ] j

= ∆(M ; S) + 2aTr[SX] + 2bTr[SY ] + (a2 + b2 )Tr[S]. Moreover, since [X , Y ] = [X, Y ], Γ(X , Y ; S) = Γ(X, Y ; S) − Tr[S(X 2 + Y 2 )] + Tr[S(X + Y )] 2

2

= Γ(X, Y ; S) + 2aTr[SX] + 2bTr[SY ] + (a2 + b2 )Tr[S]

entire

December 28, 2004

13:56



143

holds. Therefore, we have ∆(M ; S) = Γ(X , Y ; S). From this lemma, without loss of generality, we can assume Tr[SX] = Tr[SY ] = 0.

(19)

Let us make some additional simplifications. First, since the commutative case has been solved in Section 5, here we assume [X, Y ] "= 0. Moreover, we can assume Tr[S] = 1 without loss of generality. Now, under these assumptions, we introduce V = {V ∈ C2×2 | V = V ∗ and Tr[SV ] = 0} def def

W = {aX + bY | a ∈ R, b ∈ R}, for which it turns out that V is a three-dimensional real linear space and W is its two-dimensional subspace. Moreover, an inner product on V is defined as follows: def

V1 , V2 = Re(Tr[SV1 V2 ]) (V1 , V2 ∈ V) (Re means the real part). Setting def

φ(W ) = W, XX + W, Y Y ∈ W

(20)

for ∀W ∈ W, φ : W )→ φ(W ) defines a linear transformation on W. Lemma 4: The map φ is symmetric and positive-definite with respect to the inner product , . Proof: From φ(W1 ), W2 = W1 , XX, W2 + W1 , Y Y, W2 = W1 , φ(W2 ), φ is symmetric, and from φ(W ), W = W, X2 + W, Y 2 , φ is positive-definite. Therefore, φ has positive eigenvalues {µ1 , µ2 } and orthonormal eigenvectors {U1 , U2 } ⊂ W. Namely, we have φ(Uj ) = µj Uj ,

(21)

Uj , Uk = δjk .

(22)

entire

December 28, 2004

13:56


Hiroshi Nagaoka

144

Since {U1 , U2 } forms an orthonormal basis of W, the matrices X and Y can be represented as X = X, U1 U1 + X, U2 U2 ,

(23)

Y = Y, U1 U1 + Y, U2 U2 .

(24)

Moreover, the following holds: µj = φ(Uj ), Uj = Uj , X2 + Uj , Y 2

(← (20)).

(25)

On the other hand, since Uj (j = 1, 2) are 2×2 Hermitian matrices, they have eigenvalues {uj1 , uj2 } (⊂ R) and, letting {Ej1 , Ej2 } be the matrices of orthogonal projections to the corresponding eigenspaces, we have Uj =

2

ujk Ejk ,

(26)

k=1 2

Ejk = I.

(27)

k=1

Moreover, it holds that Uj 2 =

2

ujk 2 Ejk .

(28)

k=1

Now, we set for j, k ∈ {1, 2} def

pj =

√ √ √ µj /( µ1 + µ2 ),

def

Mjk = pj Ejk ,

(29) (30)

xjk = p−1 j X, Uj ujk ,

(31)

yjk = p−1 j Y, Uj ujk .

(32)

def

def

Lemma 5: We have 2 2

Mjk = I,

(33)

j=1 k=1 2 2 j=1 k=1

xjk Mjk = X,

2 2

yjk Mjk = Y.

(34)

j=1 k=1

Proof: Equation (33) is obvious from (27) and p1 + p2 = 1. Equation (34) is also obvious from (26), (23) and (24).

entire

December 28, 2004

13:56



145

Remark 6: Equations (26) and (30) indicate that the generalized measurement {Mjk } is a randomized measurement where either one of the two observables {U1 , U2 } is chosen with the probability {p1 , p2 } respectively and the simple measurement of it is performed. In what follows, we prove the equation ∆(M ; S) = Γ(X, Y ; S) for this M = {Mjk ; xjk ; yjk }. Lemma 7: It holds that √ ∆(M ; S) = µ1 + µ2 + 2 µ1 µ2 = X, X + Y, Y + 2

; X, XY, Y − X, Y 2 .

Proof: From (30), (31) and (28), we have 2 2

xjk 2 Mjk =

j=1 k=1

2 2

2 2 p−1 j X, Uj ujk Ejk =

j=1 k=1

2

2 2 p−1 j X, Uj Uj .

j=1

Similarly, we have 2 2

yjk 2 Mjk =

j=1 k=1

2

2 2 p−1 j Y, Uj Uj .

j=1

By summing them up, we obtain from (25) and (29) 2 2

(xjk 2 + yjk 2 )Mjk =

j=1 k=1

2 j=1

2 p−1 j µj Uj =

2

(µj +

√

µ1 µ2 )Uj2 .

j=1

Multiplying S from the left and taking trace of this equation, we obtain the first equality in the desired equation. (Note that Tr[SUj2 ] = Uj , Uj = 1.) Moreover, choosing {X, Y } as a basis of W, the matrix representation of φ is given by 7 8 X, X X, Y Φ= , Y, X Y, Y and hence, µ1 +µ2 and µ1 µ2 are expressed by the trace and the determinant of Φ respectively, which proves the second equality. def

Lemma 8: Letting {λ1 , λ2 } be the eigenvalues of Λ = iS 1/2 [X, Y ]S 1/2 , we have 2 j=1

|λj | = 2

; X, XY, Y − X, Y 2 .

entire

December 28, 2004

13:56


Hiroshi Nagaoka

146

Proof: All the quantities in the above equation are invariant under the transformations X → U XU ∗ , Y → U Y U ∗ and S → U SU ∗ for an arbitrary 2×2 unitary matrix U . Therefore, without loss of generality, we can assume that S is a diagonal matrix of the form 8 7 s1 0 , (sj > 0, s1 + s2 = 1). S= 0 s2 When we define 7; 8 s2 /s1 ;0 def V1 = , 0 − s1 /s2

7 def

V2 =

7

8 01 , 10

def

V3 =

8 0 i , −i 0

they form a basis of V. Therefore, X(∈ V) and Y (∈ V) can be written as X=

3

aj Vj ,

Y =

j=1

3

bj Vj .

j=1 def

From straightforward computation, the (j, k) component qjk of Q = S 1/2 XY S 1/2 is given by q11 = s2 a1 b1 + s1 (a2 b2 + a3 b3 ) − is1 (a2 b3 − a3 b2 ), q12 = s2 a1 b2 − s1 a2 b1 + i(s2 a1 b3 − s1 a3 b1 ), q21 = s2 a2 b1 − s1 a1 b2 + i(s1 a1 b3 − s2 a3 b1 ), q22 = s1 a1 b1 + s2 (a2 b2 + a3 b3 ) + is2 (a2 b3 − a3 b2 ). Hence, we have X, Y = Re(Tr[Q]) =

3

aj bj = a · b,

(35)

j=1

where a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ). The above formula holds as long as both X and Y are elements of V, which implies that {V1 , V2 , V3 } forms an orthonormal basis of V. Therefore, X, X =

3 j=1

a2j = a 2 ,

Y, Y =

3

b2j = b 2

j=1

also hold. On the other hand, Λ can be written as 7 8 2s1 c1 c2 + ic3 Λ = i(Q − Q∗ ) = , c2 − ic3 −2s2 c1

(36)

entire

December 28, 2004

13:56



147

where def

c1 = a 2 b 3 − a 3 b 2 ,

def

c2 = a 3 b 1 − a 1 b 3 ,

def

c3 = a 1 b 2 − a 2 b 1 .

(37)

The vector c = (c1 , c2 , c3 ) equals a × b, i.e., the vector product of a and b. Since λ1 λ2 = det Λ = −4s1 s2 c21 − c22 − c23 ≤ 0, we have 2

|λj | = |λ1 − λ2 | =

j=1

=

; (λ1 + λ2 )2 − 4λ1 λ2

(38)

9 ; (Tr[Λ])2 − 4 det Λ = 2 c21 + c22 + c23 = 2 a × b .

Letting θ be the angle between a and b, we can write

a × b 2 = a 2 · b 2 sin2 θ = a 2 · b 2 (1 − cos2 θ) = a 2 · b 2 −(a · b)2 , so that from (35) and (36) we have

a × b 2 = X, XY, Y − X, Y 2 .

(39)

From (38) and (39), we obtain the desired equation. From Lemmas 7 and 8, ∆(M ; S) = Γ(X, Y ; S) holds and therefore Theorem 2 has been proved. The above solution generally depends on S, which is different from the cases treated in Sections 5 and 6. 8. Conclusion For the problem of minimizing ∆ posed in Section 1, we have obtained Γ(X, Y ; S) as a lower bound and showed that the bound is attainable for some cases. From these results, one may conjecture that ∆∗ (X, Y ; S) = Γ(X, Y ; S) always holds in the case of general n × n matrices, or of operators on an arbitrary Hilbert space. So far, the problem of whether this conjecture is true or not has not been solved. When the method used in Section 7 is applied to the general case of n × n matrices, Lemmas 3, 4, 5 and 7 still hold by the same arguments, but Lemma 8 does not hold in general. Therefore, even if we construct M by the same method, the equation ∆(M ; S) = Γ(X, Y ; S) does not necessarily hold. In order to solve the case of n ≥ 3, it seems necessary to essentially extend the argument of Section 7. Indeed, it is very difficult at the current

entire

December 28, 2004

148

13:56


Hiroshi Nagaoka

stage to find out a meaningful relation between the solution in Section 6 and the one in Section 7. However, if we succeed in constructing a general theory which can be applied to any n, the gap will be filled and a new light will be shed on the method used in Section 6. Even for n × n matrices to which the method of Section 7 cannot be applied, there are some special cases in which it is possible to construct M satisfying ∆(M ; S) = Γ(X, Y ; S) by some combinatoric trial and error†. Thus, the author would expect that a key to the general theory will be obtained by accumulating such examples of construction. Though the above arguments are all for two X and Y , it is also important to extend them to the case of an arbitrary number of {X1 , . . . , Xp } mentioned in Section 2. For that purpose, we have to first extend Theorem 1 of Section 4. The extension, however, will require a completely new insight since the proof of Theorem 1 is deeply dependent on the condition p = 2. In this paper, we took up a mathematical problem as one of the difficulties arising in constructing the general theory of quantum state estimation and focused only on it. Our argument reveals a mathematical difficulty in generalizing known results for special models. At the same time, it is also an attempt to explore the possibility of overcoming such a difficulty. Of course, the problem taken up here is a mere part of the whole theory of quantum estimation and we can hardly expect meaningful applications only with it. For example, if the problem of this paper is completely solved, then ∆∗ (X1 , . . . , Xp ; Sθ ; G) in Section 3 can be obtained, but to obtain the lower bound of tr[GV (θ)] from this, i.e., to derive the Cramér-Rao’s inequality, we further need to obtain the lower bound of ∆∗ (X1 , . . . , Xp ; Sθ ; G) for {X1 , . . . , Xp } satisfying the (local) unbiasedness. (In the case of 2 × 2 matrices, the lower bound can be obtained and the author will discuss it at another opportunity‡ .) Moreover, since the Cramér-Rao’s inequality gives the lower bound for the deviation with the unknown parameter θ fixed, it is necessary to clarify in what case the lower bound is uniformly attained independent of the value of θ, namely, what corresponds to the concepts of efficient estimator and of exponential family in the classical estimation theory. Thus, the contribution of this paper to the theory of quantum estimation is not enough by itself, or rather, we may say that its significance is to give a basis † Dr. Naoki Suehiro showed interesting constructions for some numerical examples of 3×3 in a private communication to the author. ‡ Additional author’s note: See the note at the end of chap. 8 in this book.

entire

December 28, 2004

13:56



149

to consider the problems mentioned above as well as problems concerning such notions as the asymptotic efficiency and the sufficiency. References [1] E. Arthurs and J.L. Kelly Jr., “On the simultaneous measurement of a pair of conjugate observables,” Bell System Tech. J., 44, 725–729 (1965). [2] R.J. Glauber, “Coherent and incoherent states of the radiation field,” Phys. Rev. 131, 6, 2766–2788 (1963). [3] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York, 1976. [4] O. Hirota, Optical communication theory, Morikita Shuppan, Japan, 1985 (in Japanese). [5] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, 1982. [6] H. Nagaoka, “A new approach to Cramér-Rao bounds for quantum state estimation,” IEICE Technical Report, 89, 228, IT 89-42, 9–14, (1989). (Chap. 8. in this book) [7] H.P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of nonselfadjoint observables,” IEEE Trans. Inform. Theory, 19, 740–750 (1973).

entire

December 28, 2004

13:56


CHAPTER 12

A Linear Programming Approach to Attainable Cram´ er-Rao Type Bounds Masahito Hayashi Abstract. The author studies the relation between the attainable Cramér-Rao type bound and the duality theorem in the infinite dimensional linear programming. By this approach, the attainable CramérRao type bound for a 3-parameter spin-1/2 model is explicitly derived.

1. Introduction It is well-known that the lower bound of quantum Cramér-Rao inequality Vρ (M ) ≥ JρS cannot be attained unless all the SLDs commute, where we denote by Vρ (M ) a covariance matrix for a state ρ by a measurement M , the SLD Fisher information matrix for a state ρ by JρS . We therefore often treat an optimization problem for tr GVρ (M ) to be minimized, where G is an arbitrary real positive symmetric matrix. If there is a function Cρ (possibly depending on G) such that tr GVρ ≥ Cρ holds for all M , Cρ is called a Cramér-Rao type bound, or simply a CR bound. Our purpose is to find the most informative (i.e., attainable) CR bound under locally unbiasedness conditions. To author’s knowledge, there has been known only two models for which this optimization problem was explicitly solved in mixed state models. One is the estimation of complex amplitudes of coherent signals in Gaussian noise solved by Yuen and Lax [8], and Holevo [5]. Another one is the estimation of a 2-parameter spin-1/2 model solved by Nagaoka [6]. In pure state case, see Fujiwara and Nagaoka [1]. In this paper, a completely different approach to the optimization problem is given based on an infinite dimensional linear programming technique [7]. In addition, the most informative CR bound for a 3-parameter spin-1/2 model is explicitly derived by this approach.

c 1997 Kluwer. Reprinted, with permission, from Quantum Communication, Computing, and Measurement, (edited by Hirota, O., Holevo, A. S., and Caves, C.A.), Plenum, New York, 1997, 99–108. 150

entire

December 28, 2004

13:56


A Linear Programming Approach to Attainable Cram´ er-Rao Type Bounds

151

2. SLD Inner Product and Locally Unbiasedness Conditions Definition 1: For a subset Θ ⊂ Rn the map f : Θ → Tsa (H) is called a C k -map, if the k-th derivative of f is well defined on the interior of Θ, where Tsa (H) is the set of selfadjoint trace class operators on H. +,1 By Tsa (H) we denote the set {ρ ∈ Tsa (H)|ρ ≥ 0, trH ρ = 1}. +,1 (H) an n-dimensional C k -state manifold, Definition 2: We call P ⊂ Tsa k if P is an n-dimensional C -manifold which satisfies the condition: there exists a family {Uλ }λ∈Λ of open sets of P such that for any λ ∈ Λ there exist Θλ ⊂ Rn and C k map φλ : Θλ → Uλ .

We assume that we are given a family of density operators P which is a state manifold. The space L2sa (ρ) is defined as follows. Definition 3: For ρ ∈ Tsa (H), L2sa (ρ) consists of selfadjoint operators X on H satisfying the following conditions: 1) φj ∈ D(X) with respect to j such that sj "= 0 sj Xφj |Xφj < ∞, 2) X|Xsa ρ :=

(1) (2)

j

where ρ =

j

sj |φj φj | is the spectral decomposition of ρ.

Definition 4: We call a state ρ faithful, if it satisfies the following condition: X ∈ L2sa (ρ) and X|Xsa ρ = 0 =⇒ X = 0

(3)

We assume that all of the given family of density operators are faithful. For X, Y ∈ L2sa (ρ), define 1 sa sa (4) X|Y sa X + Y |X + Y := − X − Y |X − Y ρ ρ ρ . 4 We denote the norm with respect to this inner product by Sρ . We call the inner product the SLD inner product. Theorem 5: If H is separable, L2sa (ρ) is a real Hilbert space with respect to the SLD inner product. For a proof see Holevo [4].

entire

December 28, 2004

13:56


Masahito Hayashi

152

We define ρ ◦ X := 12 (ρ · X + X · ρ) ∈ Tsa (H) for X ∈ L2sa (ρ). We denote the inner product of the real Hilbert space L2sa (ρ) by JρS . We identify the 2 dual of L2sa (ρ) with L2,∗ sa (ρ) := {ρ ◦ X|X ∈ Lsa (ρ)} as follows. R ∈

∈

2 L2,∗ sa (ρ) × Lsa (ρ) →

(x, X)

(5)

)→ trH xX.

We may regard JρS as an element of Homsa (L2sa (ρ), L2,∗ sa (ρ)) by

X

∈

∈

JρS : L2sa (ρ) → L2,∗ sa (ρ) (6)

)→ ρ ◦ X.

∂ρ We identify ∂θ∂ i ∈ Tρ P with ∂θ i ∈ Tsa (H), and we assume that Tρ P ⊂ ρ ρ ρ,−1 2,∗ . We identify Tρ∗ P with JS,T (Tρ P ). We Lsa (ρ). We denote JS |Tρ∗ P by JS,T ρ,−1 call the inner product JS,T on Tρ P the *SLD inner product. By S we ρ,−1 . By M(Ω, H) we denote the denote the norm of *SLD inner product JS,T set of generalized measurements on H whose measurable space is Ω.

Definition 6: We define an affine map E from M(Tρ P, H) to Hom(Tsa (H), Tρ P ) by 6

x trH M ( dx)τ , ∀τ ∈ Tsa (H). (7) E(M )(τ ) := Tρ P

Let us define the locally unbiasedness condition. Definition 7: We call M ∈ M(Tρ P, H) locally unbiased on ρ ∈ P , if the map E(M ) : Tsa (H) → Tρ P satisfies the following condition: E(M )(ρ) = 0 E(M )|Tρ P = IdTρ P .

(8) (9)

We denote the set of locally unbiased measurements on ρ ∈ P by U(Tρ P ). Lemma 8: For M ∈ M(Tρ P, H), the condition (9) is equivalent to the following equation: 6 trH a(x)M ( dx) = trTρ P a, ∀a ∈ End(Tρ P ). (10) Tρ P

By taking basis, it is easy to verify this. Let g be a nonnegative inner product on Tρ P , then we call inf M∈U (Tρ P ) trTρ P Vρ (M )g the attainable Cramér-Rao bound.

entire

December 28, 2004

13:56



153

3. Linear Programming Approach We introduce a new approach to the attainable Cramér-Rao type bound. In this approach, we apply the duality theorem of the infinite dimensional linear programming [7]. But, we don’t have to know the duality theorem for this section. In the noncommutative case, there is no infimum of covariance matrices under the locally unbiasedness conditions. Therefore, we minimize the following value Dgρ under the locally unbiasedness conditions. Let g be a nonnegative inner product on Tρ P . Definition 9: We define the deviation Dgρ for a measurement M ∈ M(Tρ P, H) as follow: 6 Dgρ (M ) := trTρ∗ P gVρ (M ) = g(x, x) trH M ( dx)ρ. Tρ P

We introduce the useful theorem to minimize the deviation Dgρ (M ) under the locally unbiasedness conditions. Theorem 10: We have the inequality inf M∈U (Tρ P )

Dgρ (M ) ≥

sup (a,S)∈U ∗ (g)

(trTρ P a + trH S),

(11)

where U ∗ (g) + (H), ∀x ∈ Tρ P }. := {(a, S) ∈ End(Tρ P ) × Tsa (H)|g(x, x) · ρ − S − a(x) ∈ Tsa

Notice that Tρ P is a subset Tsa (H). We call the calculation of sup(a,S)∈U ∗(g) (trTρ P a + trH S) the dual problem. Corollary 11: If there exist a sequence of locally unbiased measurements {Mk } and an element (a , S ) of U ∗ (g) satisfying the condition Rρg (a , S ; Mk ) → 0 (as k → 0),

(12)

then lim Dgρ (Mk ) = trTρ P a + trH S =

k→∞

=

sup (a,S)∈U ∗ (g)

where Rρg is defined as

inf M∈U (Tρ P )

Dgρ (M )

(trTρ P a + trH S),

(13)

6

Rρg (a, S; M ) := trH

Rgρ (a, S; x)M ( dx)

(14)

Tρ P

Rgρ (a, S; x) := g(x, x) · ρ − S − a(x).

(15)

entire

December 28, 2004

13:56


Masahito Hayashi

154

We call (a, S) ∈ U ∗ (g) the Lagrange multiplier. Proof of Theorem 10 and Corollary 11: For M ∈ U(Tρ P ) and (a, S) ∈ U ∗ (g), we have Rρg (a, S; M ) 6 6 = trH g(x, x) · ρM ( dx) − trH Tρ P

6 SM ( dx) − trH Tρ P

a(x)M ( dx) Tρ P

= Dgρ (M ) − trTρ P a − trH S.

(16)

Since we have Rgρ (a, S; x) ≥ 0 for any x ∈ Tρ P , we obtain Rρg (a, S; M ) ≥ 0. By (16), the proof is complete. When dim H < ∞, we have the equality in (11). See Hayashi [2,3]. 4. Maximum In this section, we consider the dual problem. Let us define a linear functional on End(Tρ P ) × Tsa (H), denoted by Tr, in the following way, where ρ,−1 . in this section, we regard Tρ P as a real Hilbert space with respect to JS,T

Tr : End(Tρ P ) × Tsa (H) →

R

∈

∈

Definition 12: The functional Tr is defined by

(a, S)

)→ trTρ P a + trH S.

Lemma 13: If the dimension of H is finite, then the set U ∗ (g) ∩ Tr−1 ([0, ∞)) is compact. We assume that the norm of End(Tρ P ) is the operator norm o , and the norm of Tsa (H) is the trace norm t . We define the norm o,t of End(Tρ P ) × Tsa (H) as follows:

(a, S) o,t := a o + S t , ∀(a, S) ∈ End(Tρ P ) × Tsa (H). Proof: We have + (H)}. U ∗ (g) = ∩x∈Tρ P {(a, S)|g(x, x) · ρ − S − a(x) ∈ Tsa + (H)} is closed. Thus, U ∗ (g) Moreover, {(a, S)|g(x, x) · ρ − S − a(x) ∈ Tsa −1 ∗ is closed. Because Tr ([0, ∞)) is closed, U (g) ∩ Tr−1 ([0, ∞)) is closed. Therefore, it is sufficient to show that it is bounded with respect to the norm

o,t . Denote n := dim Tρ P . For (a, S) ∈ U ∗ (g) ∩ Tr−1 ([0, ∞)), we have

entire

December 28, 2004

13:56



155

trTρ P a ≤ n a o . Choose z ∈ Tρ P such that z S = 1, a(z) S = a o. For r > 0, we have g(r · z, r · z)ρ − a(r · z) − S ≥ 0.

(17)

Substitute r = 0, then −S ≥ 0. Let us calculate the left hand side of (17). ρ,−1 (a(z)) ◦ ρ − S g(r · z, r · z)ρ − a(r · z) − S = r2 · g(z, z)ρ − r · JS,T

; 1 ρ,−1 JS,T g(z, z)r − ; (a(z)) · ρ = 2 g(z, z)

; 1 ρ,−1 · JS,T g(z, z)r − ; (a(z)) 2 g(z, z) 1 ρ,−1 J ρ,−1 (a(z)) · ρ · JS,T (a(z)) − S. (18) − 4g(z, z) S,T

Let {ei } be a complete orthonormal system of H which consists of eigenvectors of ρ,−1 ρ,−1 1 (a(z)). Substitute r for the eigenvalue αi of 2g(z,z) JS,T (a(z)) corJS,T responding to the eigenvector ei , then we have < ; 1 ρ,−1 ei g(z, z)αi − ; (a(z)) · ρ JS,T 2 g(z, z) =

; 1 ρ,−1 g(z, z)αi − ; (a(z)) ei = 0. JS,T · 2 g(z, z) By (17), we have = < 1 ρ,−1 ρ,−1 ei − JS,T (a(z)) · ρ · JS,T (a(z)) − S ei ≥ 0. 4g(z, z) Therefore, we have

1 ρ,−1 ρ,−1 trH − JS,T (a(z)) · ρ · JS,T (a(z)) − S ≥ 0. 4g(z, z) Thus, we get trH S ≤ −

1

a 2o a(z)|a(z)ρS = − , 4g(z, z) 4g(z, z)

trH S ≤ −

a 2o . 4 g o

Therefore, we obtain 0 ≤ Tr(a, S) ≤ n a o −

a 2o . 4 g o

ao ). Thus, 0 ≤ a o ≤ 4n g o. As −S ≥ 0, we have Hence, 0 ≤ a o(n − 4g o

S = − trH S. Therefore, we obtain the following inequalities:

0 ≤ S ≤ tr a ≤ n a o ≤ 4 g on2 .

entire

December 28, 2004

13:56


Masahito Hayashi

156

Thus, U ∗ (g) ∩ Tr−1 ([0, ∞)) is bounded, hence compact. We have the following corollary. Corollary 14: There exists the maximum of the right hand side of (11). Assume that ρ ∈ P1 ⊂ P2 and Tρ P1 ⊂ Tρ P2 , Tρ P1 "= Tρ P2 . From the imbedding map i : P1 P→ P2 , we have diρ : Tρ P1 P→ Tρ P2 and di∗ρ : Tρ∗ P2 → Tρ∗ P1 . By identifying the dual Tρ∗ Pi with Tρ Pi (i = 1, 2), we may regard di∗ρ as di∗ρ : Tρ P2 → Tρ P1 . Let g be a nonnegative inner product on Tρ P1 , then diρ g di∗ρ is a nonnegative inner product on Tρ P2 . Lemma 15: We have the inequality max

(a,S)∈U ∗ (g)

(trTρ P1 a + trH S) ≤

max

(a ,S)∈U ∗ ( diρ g di∗ ρ)

(trTρ P2 a + trH S).

(19)

Moreover the equality holds, if and only if there exists (a , S) ∈ U ∗ ( diρ g di∗ρ ) such that a (Tρ P1 ) ⊂ Tρ P1 , and the maximum of the right hand side is attained by (a , S). Proof: We have ( diρ a di∗ρ , S) ∈ U ∗ ( diρ g di∗ρ ) for (a, S) ∈ U ∗ (g). (20)

∈

∈

F : U ∗ (g) → U ∗ ( diρ g di∗ρ ) (a, S) )→ ( diρ a di∗ρ , S). Then, we have Tr(a, S) = Tr(F (a, S)). In (19) the equality holds, if and only if max

(a ,S)∈U ∗ ( diρ g di∗ ρ)

Tr(a , S) =

max

(a ,S)∈Im F

Tr(a , S).

(21)

By the definition of U ∗ ( diρ g di∗ρ ), as diρ g di∗ρ (Ker di∗ρ ) = 0, we have a (Ker di∗ρ ) = 0 for (a , S) ∈ U ∗ ( diρ g di∗ρ ). Thus, (a , S) ∈ Im F for (a , S) ∈ U ∗ ( diρ g di∗ρ ), if and only if a (Tρ P1 ) ⊂ Tρ P1 . Thus, the proof is complete.

5. 3-Parameter Spin-1/2 Model In this section, we consider a 3-parameter spin-1/2 model. Let us define the Pauli matrices σ1 , σ2 , σ3 in the usual way: 01 0 −i 1 0 σ1 = , σ2 = , σ3 = . 10 i 0 0 −1

entire

December 28, 2004

13:56



157

+,1 Assume that dim H = 2, dim Tρ P = 3, and that ρ ∈ Tsa (H) is not a 1 pure state. We may assume that ρ = 2 (Id + ασ3 ). By S(Tρ P ) we denote the unit sphere in Tρ P with respect to the *SLD inner product. We assume that g is a quadratic form on Tρ P . In this section, we assume Tρ P to be ρ . f3 = a real Hilbert space with respect to the *SLD inner product JS,T √

1−α2 σ3 , 2 i

f i = σ2i , (i = 1, 2) is an orthonormal base on Tρ P . The dual −α 1 Id + √1−α σ , fi = σi (i = 1, 2) . We need some base of f is f3 = √1−α 2 2 3 lemmas. Lemma 16: For y ∈ Tρ P , we have det(r IdH −J −1 (y)) > > ! ! αy 3 αy 3 α2 (y 3 )2 α2 (y 3 )2 = r+ √ +1 +1 . r+ √ + − (1 − α2 ) (1 − α2 ) 1 − α2 1 − α2 Proof: We have J −1 (y) = y 3

3 1 (−α Id +σ ) + y i σi . H 3 1 − α2 i=2

(22)

√ √ 1 2 Since ; there exists t ∈ R s.t. exp( −1tσ32)(y σ1 + y σ2 ) exp(− −1tσ3 ) = (y 1 )2 + (y 2 )2 σ1 , we may assume that y = 0. We have J

−1

(y) =

−α+1 3 √ y y1 1−α2 −α−1 3 √ y1 y 1−α2

! .

Therefore, αy 3 det(r IdH −J −1 (y)) = r2 + √ r − (y 1 )2 − (y 3 )2 1 − α2 > > ! ! αy 3 α2 (y 3 )2 α2 (y 3 )2 αy 3 +1 r+ √ +1 . = r+ √ + − (1 − α2 ) (1 − α2 ) 1 − α2 1 − α2

Lemma 17: For z ∈ S(Tρ P ), we have J −1 (z) · ρ · J −1 (z) = (IdH −ρ).

(23)

entire

December 28, 2004

13:56


Masahito Hayashi

158

Proof: In the same way as above, we may assume that z 3 = 0. Then we have ! −α+1 3 1+α √ z z1 0 2 −1 −1 1−α 2 J (z) · ρ · J (z) = −α−1 3 √ z1 z 0 1−α 2 1−α2 ! −α+1 3 1 √ z z 2 1−α · −α−1 3 √ z1 z 1−α2 1−α 0 2 = IdH −ρ. = 0 1+α 2 For x ∈ Tρ P , we denote the spectral measure of J −1 x by MJ −1 x ∈ Ms (R, H). For y ∈ Tρ P we define the map (y) in the following way: ∈

∈

(y) : R → Tρ P (24)

c )→ c · y. Put MJy−1 x := (y)∗ MJ −1 x ∈ M(Tρ P, H). This is important to establish the optimal measurement. Lemma 18: We have the following equation with respect to E(MJy−1 x ) ∈ 2,∗ 2 Hom(L2,∗ sa (ρ), Tρ P ) ⊂ Hom(Lsa (ρ), Lsa (ρ)). E(MJy−1 x ) = y ⊗ J −1 x = |yx|. Proof: For τ ∈ L2,∗ sa (ρ) = Tsa (H), we have 6 y x1 trH MJy−1 x ( dx1 )τ = y trH τ J −1 x. E(MJ −1 x )(τ ) =

(25)

(26)

Tρ P

By Theorem 10 and Corollary 11 we obtain the following theorem about the attainable Cramér-Rao type bound. Theorem 19: Let g be a nonnegative inner product on Tρ P , then ; inf Dgρ (M ) = (trTρ P Jg)2 . M∈U (Tρ P )

(27)

Moreover, the optimal measurement is given by a random measurement (i.e., a convex combination of simple measurements).

entire

December 28, 2004

13:56



159

Proof: Take the Lagrange multipliers in the following way: a(x) := 2β ·

; β2 Jgx, S := − (Id − ασ3 ) = −β 2 (IdH −ρ), 2

where we put β := trTρ P

√ Jg. Then, we have

Rgρ (a, S; x) = g(x, x) · ρ − a(x) − S ; ; ; = J −1 ( Jgx, Jgx) · ρ − 2β · Jgx + β 2 (IdH −ρ) = J −1 (y, y) · ρ − 2β · y + β 2 (IdH −ρ) = r2 ρ − 2βrz + β 2 (IdH −ρ) = r2 ρ − 2βrρ ◦ J −1 (z) + β 2 (IdH −ρ)

= r − βJ −1 (z) ρ r − βJ −1 (z) + β 2 (IdH −ρ) − J −1 (z) · ρ · J −1 (z)

= r − βJ −1 (z) ρ r − βJ −1 (z) , √ where z ∈ SJ −1 (Tρ P ), r > 0, y = r · z, y = Jg(x). We derive the last equation from Lemma 17. Therefore, (a, S) ∈ U ∗ (g). Thus, trTρ P a+trH S =

√ 2 trTρ P Jg is a Cramér-Rao type bound. First let us consider the case in which g is nondegenerate. By Lemma 16, one of eigenvalues of βJ −1 z is positive and another is negative. We denote the positive one by r(z)+ , another by r(z)− . We denote their eigenvectors by v(z)+ , v(z)− , respectively. As we have r(−z)+ = −r(z)− , r(−z)− = −r(z)+ , v(−z)+ = v(z)− , v(−z)− = v(z)+ , we have √

−1

z Rρg (a, S; MβJJg ) −1 z = < = v(z)+ r(z)+ − βJ −1 (z) ρ r(z)+ − βJ −1 (z) v(z)+ = < + v(−z)+ r(−z)+ − βJ −1 (z) ρ r(−z)+ − βJ −1 (z) v(−z)+

= 0.

(28)

Diagonalize

√ Jg in the following way:

3 3 ; −1 Jg = Wi ei ⊗ J ei = Wi |ei ei |, ei ∈ S(Tρ P ). i=1

Put M :=

3

√ −1 Jg ei Wi i=1 β MβJ −1 ei

(29)

i=1

∈ M(Tρ P, H). Then, we have Rρg (a, S; M ) =

entire

December 28, 2004

13:56


Masahito Hayashi

160

0. By (7) and Lemma 18, we have E(M ) =

=

3 Wi i=1 3 i=1

β

√

−1

ei E(MβJJg )= −1 e i

3 Wi ; −1 =< Jg ei βei β i=1

Wi |Wi−1 ei βei | = |ei ei |. β i=1 3

Thus, the measurement M ∈ M(Tρ P, H) satisfies the locally unbiasedness conditions. As it satisfies the conditions of Corollary 3.1, the random mea√ surement M ∈ M(Tρ P, H) attains a Cramér-Rao type bound(trTρ P Jg)2 . √ Thus, (trTρ P Jg)2 is the attainable Cramér-Rao type bound. Next let us consider the case in which g is degenerate. Let {An } be a sequence of positive valued 3×3 symmetric matrices with limn→∞ An = Jg, √ √ where every An and Jg commute with each other, and trH An = Jg = β. Let us take complete orthonormal system e1 , e2 , e3 of the eigenvectors √ of An , Jg. Let Win , i = 1, 2, 3 be the eigenvalues of An . We define Mn ∈ M(Tρ P, H) in the following way: Mn :=

3 Wn i

i=1

βn

A−1 e

MβJn−1 eii .

(30)

We can show Mn ∈ U(Tρ P ) in the same way as the case in which g is nondegenerate. Then, we have lim Rρg (a, S; Mn ) =

n→∞

3

Win ρ A−1 e Rg (MβJn−1 eii ) = 0. n→∞ β i=1 lim

The proof is complete. +,1 Finally let us consider a 2-parameter case. Put P1 := Tsa (C2 ) and assume 2 2 that P2 ⊂ P1 , dim P2 = 2. Let g := i=1 (Wi ) ei ⊗ ei be a quadratic form on Tρ P2 , where e1 , e2 is complete orthonormal system of Tρ∗ P2 . The maximum of the duality problem is given the following Lagrange multiplier, ; ; a(x) := 2β · ( Jg)(x), S := −β 2 (Id − ρ), β = tr Jg,

(a, S) satisfies the conditions of the equality of Lemma 15. The optimal measurement is given by the random measurement which is the limit of the optimal measurement sequence given in the case of degenerate g in Theorem 19.

entire

December 28, 2004

13:56



161

Acknowledgments I wish to thank Professor A. Fujiwara for introducing me into this subject, and Professor K. Ueno for useful comments about this paper. References [1] A. Fujiwara and H. Nagaoka, “Coherency in view of quantum estimation theory,” in K. Fujikawa and Y.A. Ono, eds., Quantum coherence and decoherence, Elsevier, Amsterdam, 303-306, 1996. [2] M. Hayashi, The minimization of deviation under locally unbiased conditions, Master Thesis, Department of Mathematics, Kyoto University, 1996 (in Japanese). [3] M. Hayashi, “A linear programming approach to attainable Cramér-Rao type bounds and randomness condition,” quant-ph/9704044. [4] A.S. Holevo, “Commutation superoperator of a state and its application to the noncommutative statistics,” Rep. Math. Phys., 12, 251-271 (1977). [5] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, Amsterdam, 1982. [6] H. Nagaoka, “A generalization of the simultaneous diagonalization of Hermitian matrices and its relation to quantum estimation theory,” Trans. Jap. Soc. Indust. Appl. Math., 1, 43-56, (1991), (in Japanese). (Chap. 11 of this book) [7] R.M. Van Style and R.J.B. Wets, “A duality theory for abstract mathematical programs with applications to optimal control theory,” Journal of mathematical analysis and application, 22, 679-706 (1968). [8] H.P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of nonselfadjoint observables,” IEEE Trans. Inform. Theory, 19, 740–750 (1973).

entire

December 28, 2004

13:56


CHAPTER 13

Statistical Model with Measurement Degree of Freedom and Quantum Physics Masahito Hayashi and Keiji Matsumoto Abstract. The asymptotic efficiency of statistical estimate of unknown quantum states is discussed, both in adaptive and collective settings. Adaptive bounds are written in single-letterized form, and collective bounds are written in limiting expression. Our arguments clarify mathematical regularity conditions.

1. Introduction In the estimation of the unknown density operator by use of the experimental data, the error can be reduced by the improvement of the design of the experiment. Therefore, it is natural to ask what is the limit of the improvement. To answer the question, Helstrom [2] founded the quantum estimation theory, in analogy with classical estimation theory (in the manuscript, we refer to statistical estimation theory of probability distribution as ‘classical estimation theory’). Often, for simplicity, it is assumed that a state belongs to a family M = {ρθ |θ ∈ Θ ⊂ Rm } of states, which is called model and that the finite dimensional parameter θ is to be estimated statistically. He considered the quantum analogue of Cramér-Rao inequality, which gives the lower bound of mean square error of locally unbiased estimate. This bound, however, is not achievable at all, when the number of the data is finite. Let us assume that the number n of the data tends to infinite. Then, if some regularity conditions are assumed, it is concluded that if the estimate is consistent, i.e., the estimate converges to the true value of parameter, the first order asymptotic term of mean square error satisfies the CramérRao inequality, and that the bound is achieved for all θ ∈ Θ. This kind of discussion is called first order asymptotic theory. The quantum version of first order asymptotic theory is started by H. Nagaoka [6, 7]. He defined, in our terminology, the quasi-quantum This paper was originally written in Japanese. It was translated to English by Keiji Matsumoto. 162

entire

December 28, 2004

13:56


Statistical Model with Measurement Degree of Freedom and Quantum Physics

163

Cramér-Rao type bound, and pointed out that the bound is achieved asymptotically and globally. The proof of achievability, however, is only roughly sketched in his paper. In this manuscript, the proof of the achievability of the bound is fully written out, and the regularity conditions for the achievability is revealed. In addition, we defined another bound, the quantum Cramér-Rao type bound, and showed that the new bound is also achievable, if the use of quantum correlation between samples are allowed. 2. Preliminaries ˆ An estimate θˆ is obtained as a function θ(ω) of data ω ∈ Ω to Rm . The purpose of the theory is to obtain the best estimate and its accuracy. The optimization is done by the appropriate choice of the measuring apparatus ˆ and the function θ(ω) from data to the estimate. m Let σ(R ) be a σ- field in the space Rm . Whatever apparatus is used, the data ω ∈ Ω lie in a measurable subset B ∈ σ(Rm ) of Ω writes Pr{ω ∈ B|θ} = tr ρ(θ)M (B),

(1)

when the true value of the parameter is θ. Here, M , which is called positive operator-valued measure (POM, in short), is a mapping from subsets B ⊂ Ω to non-negative Hermitian operators in H, such that M (φ) = O, M (Ω) = I, M (

∞ 5

i=1

Bi ),=

∞

M (Bi ) (Bi ∩ Bj = φ, i "= j) (2)

i=1

(see p. 53 [2] and p. 50 [3]). Conversely, some apparatus corresponds to any POM M [8, 9]. Therefore, we refer to the measurement which is controlled ˆ M ) is called an estimator. by the POM M as ‘measurement M ’. A pair (θ, M The classical Fisher information matrix Jθ by the POM M is defined, as in classical estimation theory, 8 76 dPM dPM M θ θ ∂j log dP , Jθ := ∂i log dν dν ω∈Ω where ∂i = ∂/∂θi , PM θ (B) := tr ρθ M (B), and ν is some underlying measure (in the manuscript, we assume that for any POM M , there is a measure ν in Ω such that PM θ ≺ ν for all θ ∈ Θ). Denote the mean square error matrix of ˆ M ], and, as the measure of accuracy, let us take TrGVθ [M ], ˆ M ) by Vθ [θ, (θ, where G is nonnegative symmetric real matrix. If G = diag(g1 , · · · , gm ), TrGVθ [M ] is the weighed sum of mean square error of the estimate θî of each component θi of the parameter.

entire

December 28, 2004

164

13:56


Masahito Hayashi and Keiji Matsumoto

ˆ M ) at θ by, Let us define locally unbiased estimator (θ, 6 ˆ M ] := θˆj (ω) tr ρθ M (dω) = θj , (j = 1, · · · , m). Eθ [θ, 6 θˆj (ω) tr ∂k ρθ M (dω) = δkj , (j, k = 1, · · · , m).

(3) (4)

Then, JθM is characterized by, ˆ M ] | θˆ : (θ, ˆ M ) is locally unbiased}, JθM−1 = inf{Vθ [θ, and the quasi-quantum Cramér-Rao type bound Cθ (G) is defined by, ˆ M ] | M is locally unbiased} Cθ (G) := inf{TrGVθ [θ, = inf{TrGJθM−1 | M is a POM in H}. Nagaoka pointed out that the quasi-quantum Cramér-Rao type bound is achievable asymptotically for every θ ∈ Θ [6]. Cθ (G) is calculated explicitly for several special cases [1, 5]. of the unknown state ρθ are given. Then the Suppose n-i.i.d. state ρ⊗n θ ˆ sequence {(θn , Mn )}, where Mn is a POM in H⊗n , is said to be MSE consistent if the estimate θˆn converges to the true parameter in the mean square error, i.e., limn→∞ Vθ [(θˆn , Mn )] = 0. 3. The Quasi-Classical Cram´ er-Rao Type Bound 3.1. The lower bound Let M(1) , ..., M(n) be a sequence of the POMs in H, and apply the measurement M(1) to the first sample, and the measurement M(2) to the second sample, and so on. The choice of M(k) is dependent on the outcome ωk−1 = (ω(1) , .., ω(k−1) ) of M(1) , ..., M(k−1) . To reveal the dependency of ωk−1 , we write M(k) [ωk−1 ]. M(k) on Let us define the POM Mn in Hn which takes value in Ωn by, 6 n Mn (B) = M(k) [ωk−1 ](dω(k) ). ωn ∈B k=1

Then the data ωn is controlled by the probability distribution PθMn (B) = ⊗n tr ρθ Mn (B). The estimator is said to be asymptotically unbiased if i 6

i ˆ := θˆni (ω) − θi PM (Bn,θ ) = Bθ θn , Mn θ ( dω) → 0 as n → ∞ (5) Ω

i

∂ i ˆ (An,θ )ij = Aθ θˆn , Mn := E [θn , Mn ] → δji as n → ∞. ∂θj θ j

(6)

entire

December 28, 2004

13:56



165

The MSE consistent estimator satisfies (5) always. Therefore, if appropriate regularity conditions are assumed so that the differential, the integral and the trace commute with each other, then (6) is also satisfied, and the estimator will be asymptotically unbiased. Theorem 1: If {(θˆn , Mn )} is MSE consistent, and limn→∞ nVθ [(θˆn , Mn )] exists, lim n Tr GVθ [(θˆn , Mn )] ≥ Cθ (G).

(7)

n→∞

Proof: In the almost same manner as classical estimation theory, (6) leads to,

−1 t nVθ [θˆn , Mn ] ≥ nAn JθMn An (8) Elementary calculation leads to, 1 Mn Mn Jθ = Jθ θ , n n where Mθ ∈ M is a POM in H which is defined by 6 n n n n Bk ) = M(k) [ωk−1 ](Bk )PM ωn ). Mθ ( θ ( d k=1

(9)

ωn k=1

(8) and (9) yield

n −1 M t Tr GnVθ [θˆn , Mn ] ≥ Tr GAn,θ Jθ θ An,θ ≥ Cθ ( t An,θ GAn,θ ). (10) Passing both sides of (10) to the limit n → ∞, we have the theorem.

3.2. Estimator which achieves the bound The estimator defined in the following achieves the equality in the inequality (7) if the regularity conditions (B.1–4) are satisfied. The proof will be presented later in the subsection 3.4. √ First, apply the measurement M0 to n samples of unknown state ρθ , and calculate θˇn which satisfies (12). Second, apply the measure√ ment Mθˇn to the remaining n − n samples, where Mθ is defined by −1

≤ Cθ (G) + ε , Tr G JθMθ

(11)

entire

December 28, 2004

166

13:56



(11) is satisfied. Then, θˆn is defined to be θn (θˇn ), where θn (θ ) is defined by, θn (θ ) = argmax θ∈Θ

n √ k= n+1

M

log

dPθ θ (ωk ). dν

3.3. Regularity conditions (B.1) There is a POM M0 and θˇn which satisfies n ˇ lim PM θ { θ − θn > δ} = 0,

n→∞

∀δ > 0.

(12)

(B.2) K := supθ∈Θ θ is finite. (B.3) θn (θ ) achieves the equality in the classical asymptotic Cramér-Rao M inequality of the family {Pθ θ | θ ∈ Θ} of probability distributions. (B.4) The higher order term of mean square error of θn (θ ) is uniformly bounded when θ − θ < δ1 for some δ1 > 0. In other words, for any ε > 0, θ ∈ Θ, there exists a positive real number δ1 > 0 and a natural number N such that, −1

√ < ε, (n − n) Tr GVθ,n − Tr G J Mθ (13) θ ∀n ≥ N, ∀θ s.t. θ − θ ≤ δ1 , where Vθ,n is the conditional mean square error matrix of θ n (θ ) when θ is given. (B.5) For any ε > 0, θ ∈ Θ, there exists δ2 > 0, such that,

−1 ˇ < δ2 . Tr G J Mθˇ < ε, ∀θ, ∀θˇ s.t. θ − θ

− C (G) (14) θ θ (B.1) is satisfied almost always, and (B.2) is not restrictive . For θ n (θ ) M is the maximum likelihood estimator of the family {Pθ θ } of probability distributions, (B.3) is satisfied in usual cases. The validity of (B.4), however, is hard to verify. Therefore, in the future, this condition needs to be replaced by other conditions. Obviously, (B.5) reduces to the following (B.5.1–2), both of which are natural. (B.5.1) The map θ → ) Cθ (G) is continuous. M (B.5.2) For any θ , the map θ )→ [Jθ θ ]−1 is continuous.

entire

December 28, 2004

13:56



167

3.4. Proof of achievability Theorem 2: If the model M satisfy conditions (B.1–5) in the following, then we have, lim n Tr GVθ [θˆn , Mn ] = Cθ (G), ∀θ ∈ Θ,

(15)

n→∞

Proof: Let us choose δ1 , δ2 and N so that (13 − 14) are satisfied, and define δ := min(δ1 , δ2 ). Then, if n ≥ N , we have, n Tr GVθ [θˆn , Mn ] 6 n =n Tr GVθ [θn (θˇn ), Mn ]PM θ ( dω) 6 n ≤n Tr GVθ [θn (θˇn ), Mn ]PM θ ( dω) ˇ θ−θn ≤δ 6 + K 2 Tr G n √ ≤ n− n

≤

n √ n− n

θ−θˇn >δ

M −1 ˇn θ n + ε + ε PM Tr G Jθ θ ( dω)

6 θ−θˇn ≤δ

6 θ−θˇn ≤δ

n PM θ ( dω)

n ˇ + nK 2 Tr GPM θ { θ − θn > δ} n (Cθ (G) + 2ε + ε ) PM θ ( dω) n ˇ + nK 2 Tr GPM θ { θ − θn > δ}

≤

n n ˇ √ (Cθ (G) + 2ε + ε ) + nK 2 Tr GPM θ { θ − θn > δ}. n− n

(B.1) implies that the third term of the last end of the equation tends to n as n → ∞. Therefore, we have, for every ε > 0 and for every ε > 0, lim n Tr GVθ [θˆn , Mn ] ≤ Cθ (G) + 2ε + ε .

n→∞

which leads to the theorem. 4. Use of Quantum Correlation In this section, we consider the minimization of asymptotic mean square error where Mn runs every POM which satisfies MSE consistency. Physically, this means we allow the use of interactions between samples. So far, we considered POM which takes value in Ω, or the totality of all the possible data. Instead, in this section, we consider POM with the values in Rd , for if M is a POM with values in Ω, M ◦ θˆ−1 is POM with the

entire

December 28, 2004

13:56



168

values in Rd . MSE consistency is defined in the same way as the precedent sections. Let Cθn (G) denote the quasi-quantum Cramér-Rao type bound of the ⊗n . Then, the quantum family {ρ⊗n θ |θ ∈ Θ} of density operators in H Q Cramér-Rao type bound Cθ (G) is defined by, CθQ (G) := lim nCθn (G). n→∞

For Cθ (G) ≥

nCθn (G)

holds true, we have, Cθ (G) ≥ CθQ (G).

Theorem 3: If the sequence {M n }∞ n=1 is MSE consistent, we have, lim n tr GVθ (M n ) ≥ CθQ (G).

(16)

n→∞

Proof: In the almost same manner as the proof of theorem 1, we have,

n −1 t Vθ [Mn ] ≥ An,θ JθM An,θ ,

−1 % & t An,θ ≥ nCθn tAn,θ GAn,θ , n Tr GVθ [Mn ] ≥ n tr GAn,θ JθMn which approaches (16) as n → ∞. If the family {ρ⊗n θ |θ ∈ Θ} of density operators satisfies (B.1–5), we have the following theorem. Theorem 4: There is an MSE consistent sequence {Mn } of POM such that limn→∞ n tr GVθ [Mn ] ≤ CθQ (G) + ε is satisfied for every ε > 0 and for every θ ∈ Θ. Proof: Let us divide n samples into n2 groups each of which is consist n1 n1 , . . . , M(n be a sequence of POMs in H⊗n1 . of n1 samples, and let M(1) 2) n1 1 to the first group ρ⊗n of samples, and apply Apply the measurement M(1) θ n1 n1 is dependent M(2) to the second samples, and so on. The choice of M(k) n1 n1 on the outcome of the measurements M(1) , . . . , M(k−1) . With n1 fixed, let us approach n2 to ∞. Then, theorem 2 implies the existence of a MSE consistent sequence {Mn } of POM which satisfies lim n tr GVθ [Mn ] = lim n1 n2 tr GVθ [Mn ] = n1 Cθn1 (G).

n→∞

n2 →∞

(17)

For any epsilon, if n1 is sufficiently large, limn→∞ n tr GVθ (M n ) ≤ CθQ (G) + ε is satisfied, and we have the theorem.

entire

December 28, 2004

13:56



169

References [1] M. Hayashi, “A linear programming approach to attainable Cramér-Rao type bound,” in Quantum Communication, Computing, and Measurement, O. Hirota, A.S. Holevo, and C.M. Caves, eds., Plenum Publishing, New York, 1997. (Chap. 12 of this book) [2] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York, 1976. [3] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, Amsterdam, 1982 (in Russian, 1980). [4] E.L. Lehmann, Theory of point estimation, John Wiley, 1983. [5] K. Matsumoto, A geometrical approach to quantum estimation theory, doctoral thesis, Graduate School of Mathematical Sciences, University of Tokyo, 1997. (Chap. 20 of this book) [6] H. Nagaoka, “On the parameter estimation problem for quantum statistical models,” In Proceeding of 12th Symposium on Information Theory and Its Applications, p. 577–582 (1989). (Chap. 10 of this book) [7] H. Nagaoka, “On the relation between Kullback divergence and Fisher information — from classical systems to quantum systems —,” Proceedings of Joint Mini-workshop for data compression theory and fundamental open problems in information theory, 63–72, (1992), (in Japanese). (Chap. 27 of this book) [8] M. Ozawa, “Quantum measuring processes of continuous observables,” J. Math. Phys., 25, 79–87 (1984). [9] W.F. Steinspring, “Positive functions on C ∗ -algebras,” Proc. Am. Math. Soc., 6, 211–216 (1955).

entire

December 28, 2004

13:56


CHAPTER 14

Asymptotic Quantum Theory for the Thermal States Family Masahito Hayashi Abstract. Concerning state estimation, we will compare two cases. In one case we cannot use the quantum correlations between samples. In the other case, we can use them. In addition, under the later case, we will propose a method which simultaneously measures the complex amplitude and the expected photon number for the displaced thermal states.

1. Introduction Quantum estimation is essentially different from classical estimation regarding the following two points. The first point is that we cannot simultaneously construct the optimal estimators corresponding to respective parameters because of non-commutativity between them. It has been a serious problem since the beginning of the quantum estimation [4, 5, 8]. The second point is that we can reduce the estimation error under the assumption that we can prepare independent and identical samples of the unknown quantum state. It was pointed by Nagaoka [6, 7] concerning the large deviation theory in one-parameter estimation. The purpose of this paper is to clear the second point concerning the mean square error (MSE). Our situation is divided into the following two cases. In the first case, we estimate the unknown state by independently measuring every sample. In this case, we may decide the n-th POVM from n − 1 data which have been already given. In the second case, we estimate the unknown state by regarding n-sample system as a single composite system. In this case, we may use POVMs which are indivisible into every sample system. In order to construct these POVMs, we need to use quantum correlations between every sample. The former is called the non-quantum correlation case and c 2000 Kluwer. Reprinted, with permission, from Quantum Communication, Computing, and Measurement 2, (edited by Kumar, P., D’Ariano, G. M., and Hirota, O.), Kluwer/Plenum, New York, 2000, 99–104. 170

entire

December 28, 2004

13:56


Asymptotic Quantum Estimation Theory for the Thermal States Family

171

the later the quantum correlation case. When the unknown state is a pure state, the errors of both are asymptotically equivalent in the first order [2]. Concerning the spin 1/2 system, see Hayashi [1]. In this paper, we formulate a general theory for the asymptotic quantum estimation. It is applied to the simultaneous estimation of the expected photon number and the complex amplitude for the quantum displaced thermal state. 2. Asymptotic Estimation Theory In this paper, we use a quantum state family S parameterized by finite parameters θ1 , · · · , θd : S := {ρθ ∈ S(H)|θ = (θ1 , . . . , θd ) ∈ Θ ⊂ Rd }, where the set S(H) denotes the set of densities on H. For simplicity, we assume that ρθ is nondegenerate. 2.1. Non-quantum correlation case The non-quantum correlation case is formulated as follows. A pair En = ({Mk }nk=1 , θˆn ) is called a recursive estimator where θˆn is a function estimating the unknown parameter from n data, and {Mk }nk=1 is a sequence of POVMs M1 , M2 (ω1 ), . . . , Mn (ω1 , . . . , ωn−1 ) as follows: the n-th POVM Mk (ω1 , . . . , ωk−1 ) is determined by k − 1 data which have been already given. A sequence {En }∞ n=1 of recursive estimators is called a recursive MSE consistent estimator if 6 6 6 ' '2 ' ' lim · · · 'θˆn (ω1 , ω2 , . . . , ωn ) − θ' PρEθn ( dω1 , dω2 , . . . , dωn ) = 0, n→∞ n

∀θ ∈ Θ, where PρEn ( dω1 , dω2 , . . . , dωn ) = tr ρM1 ( dω1 )tr ρM2 (ω1 )( dω2 ) . . . . . . tr ρMn (ω1 , . . . , ωn−1 )( dωn ). We define the non-quantum-correlational Cramér-Rao type bound CθN QC (G) for a weighted matrix G(G is a d × d real positive symmetric matrix.) as: lim inf Tr nGVθ (En ) CθN QC (G) := inf n→∞ {En }∞ is a recursive MSE consistent estimator , n=1

entire

December 28, 2004

13:56


Masahito Hayashi

172

where the MSE matrix Vθ (En ) is given by: 6 6 i,j Vθ (En ) = · · · θˆni (ω1 , . . . , ωn ) − θi θˆnj (ω1 , . . . , ωn ) − θj n

×PρEθn ( dω1 , . . . , dωn ). We have the following equation:

&−1 % CθN QC (G) = inf Tr G JθM M is a POVM on H , where JθM denotes the Fisher information matrix at θ of {tr ρθ M ( dω)|θ ∈ Θ}. It is derived by Jensen’s inequality [3]. Under some regularlity conditions, we show that there exists a recursive MSE consistent estimator {En }∞ n=1 such that [3]: n Tr GVθ (En ) → CθN QC (G) as n → ∞, ∀θ ∈ Θ. According Holevo [5], we have JθM ≤ J˜θ , where J˜θ is the RLD Fisher ˜ θ;j , L ˜ θ;i )∗ ρθ L ˜ θ;i := (ρθ )−1 ∂ρθi . information matrix defined as: J˜θ;i,j := tr(L ∂θ Therefore we have the following inequality CθN QC (G) ≥ CθR (G), where

CθR (G) := inf tr GV |V is a d × d real symmetric matrix V ≥ J˜θ−1 √ √ = Tr G Re J˜θ−1 + Tr G Im J˜θ−1 G .

2.2. Quantum correlation case Next, we formulate the quantum correlation case. For this purpose, we consider a quantum counterpart of independent and identically distributed condition. If H1 , . . . , Hn are n Hilbert spaces which correspond to the physical systems, then their composite system is represented by the tensor Hilbert space. H(n) := H1 ⊗ · · · ⊗ Hn . Thus, a state on the composite system is denoted by a density operator ρ(n) on H(n) . In particular if n element systems (H1 , . . . , Hn ) of the composite system H(n) are independent of each other, there exists a density ρk on Hk such that ρ(n) = ρ1 ⊗ · · · ⊗ ρn , on H(n) .

entire

December 28, 2004

13:56



173

The condition: H1 = · · · = Hn = H, ρ1 = · · · = ρn = ρ corresponds to the independent and identically distributed condition in the classical case. Therefore, we consider the parameter estimation problem for (n) the family {ρθ on H|θ ∈ Θ} which is called the n-i.i.d. extended family. n is a In this case, we use a sequence {M n }∞ n=1 of POVMs where M d POVM on H whose measurable set is R as an estimator. A sequence n ∞ {M n }∞ n=1 is called an MSE consistent estimator if {M }n=1 satisfies 6 (n) ˆ = 0, ∀θ ∈ Θ. lim

θˆ − θ 2 tr ρθ M n ( dθ) n→∞

R

A recursive MSE consistent estimator can be regarded as an MSE consistent estimator because a recursive estimator En = ({Mk }nk=1 , θˆn ) is regarded as a POVM M (En ) as follows: 6 n Mk (ω1 , . . . , ωk−1 )( dωk ), ∀B ⊂ Rd on H(n) . M (En )(B) := −1 θˆn (B) k=1

We define the quantum-correlational Cramér-Rao type bound CθQC (G) for a weighted matrix G as: CθQC (G) := inf lim inf n Tr GVθ (M n ) n→∞

{M n }∞ is an MSE consistent estimator n=1 where the MSE matrix Vθ (M n ) is given by: 6 (n) ˆ Vθi,j (M n ) = (θî − θi )(θˆj − θj ) tr ρθ M n ( dθ). Rd

We have the following equation CθQC (G) = lim inf nCθn (G), n→∞

Cθn (G)

denotes the non-quantum-correlational Cramér-Rao type where bound for the n-i.i.d. extended family [3]. From the definition of the ni.i.d. extended family, we have CθN QC (G) ≥ nCθn (G). Therefore, we have n the inequality CθN QC (G) ≥ CθQC (G) Moreover, since JθM ≤ nJ˜θ for any POVM M n on H(n) , the inequality CθQC (G) ≥ CθR (G) holds. Thus, CθN QC (G) ≥ CθQC (G) ≥ CθR (G). CθN QC (G)

CθQC (G)

(1)

Therefore, the difference between and means the difference of the quantum correlation case from the non-quantum correlation

entire

December 28, 2004

174

13:56


Masahito Hayashi

case. Under some regularlity conditions, we can show that there exists an MSE consistent estimator {M n }∞ n=1 such that [3]: n Tr GVθ (M n ) → CθQC (G) as n → ∞, ∀θ ∈ Θ. 3. Quantum Displaced Thermal States Family Now we consider the estimation for the the complex amplitude ζ and expected photon number N for the quantum displaced thermal states family defined as: 6 1 |ζ − α|2 2 exp − |αα| d α ζ ∈ C, N > 0 . S := ρζ,N := πN C N 3.1. Estimation of complex amplitude ζ In the case of that photon number√N is known, we estimate the tow unknown parameters ζ = (θ1 + iθ2 )/ 2. This estimation problem is investigated by Yuen & Lax [8] and Holevo [5]. In this case they calculated the inverse J˜θ−1 of the RLD Fisher information matrix as: i N + 12 −1 2 ˜ Jθ = . − 2i N + 12 They calculated the non-quantum-correlational Cramér-Rao type bound CθN QC (G) as follows: 9 1 (2) CθN QC (G) = CθR (G) = 2 N + g1 + g12 − g22 − g32 , 2 where the weighted matrix G is parameterized as: g1 + g2 g3 G= . g3 g1 − g2 From (1) and (2), we have the following equations. CθN QC (G) = CθQC (G) = CθR (G).

(3)

In this case, the optimal estimator is the squeezed heterodyne. 3.2. Simultaneous estimation of complex amplitude ζ and expected photon number N Next we consider the case of that both of the expected photon number N and the complex amplitude ζ are unknown. In this case, we estimate three

entire

December 28, 2004

13:56



175

unknown parameters θ1 , θ2 and θ3 = N . The first equation of (3) does not hold. Therefore, the squeezed heterodyne is not optimal. The inverse J˜θ−1 of the RLD Fisher information matrix is calculates as:   i 0 N + 12 2 . J˜θ−1 =  − 2i N + 12 0 0 0 N (N + 1) Therefore we can calculated CθR (G) as: 9 1 CθR (G) = g0 N (N + 1) + 2 N + g1 + g12 − g22 − g32 , 2 if the weighted matrix G can be parameterized as:   0 g1 + g2 g3 G =  g3 g1 − g2 0  . 0 0 g0

(4)

If the weighted matrix G can be parameterized as (4), we obtain the following equations: CθN QC (G) > CθQC (G) = CθR (G). CθN QC (G)

CθQC (G)

(5) CθN QC (G)

A proof for > is omitted. The inequality > QC Cθ (G) means that we cannot the simultaneous measurement of the photon number counting and heterodyne for a single sample. 3.3. Construction of an MSE consistent estimator R {M n }∞ n=1 attaining Cθ (I) Now, for a weighted matrix I, we construct an MSE consistent estimator {M n }∞ n=1 such that lim n tr Vθ (M n ) = CθR (I), ∀θ ∈ Θ.

n→∞

It is sufficient for CθQC (I) = CθR (I) to construct such an MSE consistent estimator. Every POVM M n is constructed in the following step: (1) Evolve the unknown state ρζ,N ⊗ · · · ⊗ ρζ,N as: n

ρζ,N ⊗ · · · ⊗ ρζ,N → Un ρζ,N ⊗ · · · ⊗ ρζ,N Un∗ n

n

= ρ√nζ,N ⊗ ρ0,N ⊗ · · · ⊗ ρ0,N on H(n) , n−1

entire

December 28, 2004

13:56


Masahito Hayashi

176

where Un = exp φn−1 (a∗n a1 − a∗1 an ) · · · · · · exp φ2 (a∗3 a1 − a∗1 a3 ) exp φ1 (a∗2 a1 − a∗1 a2 ) on H(n) 1 φi = arctan √ , i

i = 1, 2, . . . , n − 1.

ai denotes the annihilation operator on Hi . (2) Measure the first sample ρ√nζ,N by the heterodyne, then we get the estimate ζˆ of the complex amplitude. (3) Measure the others by the photon counting, then we obtain n − 1 data which obey the probability distribution P N (k): 1 P (k) = N +1 N

N N +1

k ,

k = 0, 1 . . . .

ˆ of the expected photon number N (4) We obtain the estimate N by the maximum likelihood estimator of the probability distribution P N (k1 ), . . . , P N (kn−1 ). 4. Conclusion We formulate an asymptotic quantum estimation theory. This theory is applied to the simultaneous measurement of the photon number counting and the heterodyne for displaced thermal states. It is a future study to realize the MSE consistent estimator proposed in this paper in an actual physical system. Acknowledgments This work was supported by the Research Fellowships of the Japan Society for the Promotion of Science for Young Scientists No. 9404. The author would like to thank Dr. K. Matsumoto for useful discussions of these topics. References [1] M. Hayashi, “Asymptotic quantum parameter estimation in spin 1/2 system,” LANL e-print quant-ph/9710040, (1997). [2] M. Hayashi, “Asymptotic estimation theory for a finite-dimensional pure state model,” J. Phys. A: Math. Gen., 31, 4633, (1998). (Chap. 23 of this book)

entire

December 28, 2004

13:56



177

[3] M. Hayashi and K. Matsumoto, “Quantum mechanics as a statistical model which allows a free choice of measurements,” in Large deviation and statistical inference RIMS koukyuuroku, 1055, RIMS, Kyoto, (1998). (in Japanese). (Chap. 13 of this book) [4] C.W. Helstrom, Quantum Detection and Estimation Theory, Academic Press, New York, 1976. [5] A.S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory, NorthHolland, Amsterdam, 1982. [6] H. Nagaoka, “On the relation between Kullback divergence and Fisher information — from classical systems to quantum systems —,” Proceedings of Joint Mini-workshop for data compression theory and fundamental open problems in information theory, 63–72, (1992), (In Japanese). (Chap. 27 of this book) [7] H. Nagaoka, “Two quantum analogues of the large deviation Cramr-Rao ´ inequality,” In Proc. 1994 IEEE Int. Symp. on Information Theory, 118, (1994). [8] H.P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of nonselfadjoint observables,” IEEE Trans. Inform. Theory, 19, 740–750 (1973).

entire

December 28, 2004

13:56


CHAPTER 15

State Estimation for Large Ensembles Richard D. Gill and Serge Massar Abstract. We consider the problem of estimating the state of a large but finite number N of identical quantum systems. As N becomes large the problem simplifies dramatically. The only relevant measure of the quality of estimation becomes the mean quadratic error matrix. Here we present a bound on this quantity: a quantum Cramér-Rao inequality. This bound expresses succinctly how in the quantum case one can trade information about one parameter for information about another. The bound holds for arbitrary measurements on pure states, but only for separable measurements on mixed states—a striking example of nonlocality without entanglement for mixed but not for pure states. Cramér-Rao bounds are generally only derived for unbiased estimators. Here we give a version of our bound for biased estimators, and a simple asymptotic version for large N . Finally we prove that when the unknown state belongs to a two-dimensional Hilbert space our quantum Cramér-Rao bound can always be attained and we provide an explicit measurement strategy that attains it. Thus we have a complete solution to the problem of estimating as efficiently as possible the unknown state of a large ensemble of qubits in the same pure state. The same is true for qubits in the same mixed state if one restricts oneself to separable measurements, but non-separable measurements allow dramatic increase of efficiency. Exactly how much increase is possible is a major open problem.

1. Introduction One of the central problems of quantum measurement theory is the estimation of an unknown quantum state. Originally only of theoretical interest, this problem is becoming of increasing practical importance. Indeed there are now several beautiful experimental realizations of quantum state reconstruction in such diverse systems as quantum optics [17], molecular states [6], trapped ions [13] and atoms in motion [12]. The theoretical work which is the basis for these experiments is concerned with devising measurement strategies that are simple to realize exReprinted (abstract/excerpt/figure) with permission from R. Gill and S. Massar, Phys. c Rev. A, 61, 042312, 2000. 2000 by The American Physical Society 178

entire

December 28, 2004

13:56


State Estimation for Large Ensembles

179

perimentally and which allow an unambiguous reconstruction of the quantum state. The best known such technique is quantum state tomography [21], adapted in [14] for the case of finite-dimensional Hilbert spaces. However, other techniques are also available; see Ref. [22] for a recent discussion in the case of finite-dimensional Hilbert spaces. However, all these works suppose that the measurements are perfect and that any operator can be measured with infinite precision. However, in general the quality of the reconstruction will be limited by experimental error [10] or by finite statistics. The present work is devoted to studying this latter aspect, when the unknown state belongs to a finite-dimensional Hilbert space. Thus the setting of the problem is that we may dispose of a finite number N of copies of an unknown quantum state ρ (pure or mixed). Our task is to determine ρ as well as possible. This is by now a classical problem [8, 9]. A common approach is first to specify a cost function which numerically quantifies the deviation of the estimate from the true state. One then tries to devise a measurement and estimation strategy which minimizes the mean cost. Since the mean cost typically depends on the unknown state itself, one typically averages over all possible states to arrive at a single number expressing the quality of the estimation. However optimal strategies have only been found in some simple highly symmetric cases (the covariant measurements of Ref. [9], see also Ref. [15, 20]). However, when the number of copies N becomes large, one can hope that the problem becomes simpler so that one might be able to find the optimal strategies in this limit. The reason for this is that in the large-N limit the estimation problem ceases to be a ‘global’ problem and becomes ‘local’. Indeed for small N the estimated state will often be very different from the true state. Hence the optimal measurement strategy must take into account the behaviour of the cost function for large estimation errors. On the other hand, in the limit of an infinite number of copies any two states can be distinguished with certainty. So the relevant question to ask about the estimation strategy is at what rate it distinguishes neighboring states. In that case we are only concerned with the behaviour of the estimator and of the cost function very close to the true value. To formulate the problem with precision, let us suppose that the unknown state ρ(θ) depends on a vector of p unknown real parameters θ = (θ1 , . . . , θp ). For instance the θi could correspond to various settings or physical properties of the apparatus that produces the state ρ. After carrying out a measurement on the N copies of ρ, one will guess what is θ. Call θˆN = (θˆ1N , . . . , θˆpN ) the guessed value. For a good estimation strategy

entire

December 28, 2004

180

13:56


Richard D. Gill and Serge Massar

we expect the mean quadratic error (MQE) to decrease as 1/N :

Wij (θ) (1) Eθ (θîN − θi )(θˆjN − θj ) ≡ VijN (θ) = N where the scaled MQE matrix W (θ) = (Wij (θ)) = N V N (θ) does not depend on N . Eθ denotes the mean taken over repetitions of the measurement with the value of θ fixed. ˆ θ), which measures how much Consider now a smooth cost function f (θ, ˆ the estimated value θ differs from the true value θ of the parameter. f will have a minimum at θˆ = θ, hence can be expanded as ˆ θ) = f0 (θ) + Cij (θ)(θî − θi )(θˆj − θj ) + O( θˆ − θ 3 ) (2) f (θ, ij

where C(θ) = (Cij (θ)) is a non-negative matrix. Thus for a reasonable estimation strategy the mean value of the cost will decrease as Eθ (f (θˆN , θ)) = f0 (θ) + N −1 Cij (θ)Wij (θ) + o(N −1 ) (3) ij

since we expect the expectation value of higher order terms in θˆ − θ to decrease faster then 1/N . The problem has become local: only the quadratic cost matrix C(θ) and the scaled mean quadratic error matrix W (θ) at θ intervene. The essential question about state estimation for large ensembles is therefore what scaled MQE matrices W (θ) are attainable through arbitrary measurement and estimation procedures? In particular, what does the boundary of this set of attainable MQE matrices look like? In the case when the parameter θ is one-dimensional (p = 1), the problem has been solved: a bound on the variance of unbiased estimators—the quantum Cramér-Rao bound—was given in Ref. [8], and a strategy for attaining the bound in the large-N limit was proposed in Ref. [1]. This justifies taking the bound to induce a ‘distinguishability metric’ on the space of states [23, 3]. In the case of a multidimensional parameter however, though different bounds for the matrix W have been established, in general they are not tight [8, 24, 9]. In this paper we present a bound for W in the multiparameter case which is inspired by the discussion in [1]. This bound expresses in a natural way how one can trade information about one parameter for information about another. The interest of this new bound depends on the precise problem one is considering: (i) When ρ(θ) = |ψ(θ)ψ(θ)| is a pure state belonging to a 2 dimensional Hilbert space, the bound is sharp: it provides a necessary and sufficient

entire

December 28, 2004

13:56



(ii) (iii)

(iv)

(v)

181

condition that W must satisfy in order to be attainable. Furthermore, the bound can be attained by carrying out separate measurements on each particle. This completely solves the problem of estimating the state of a large ensemble of spin-1/2 particles (qubits) in the same pure state. When ρ(θ) is a pure state belonging to a Hilbert space of dimension d larger then 2, then our bound on W applies, but it is not sharp. When the unknown state is mixed and belongs to a 2 dimensional Hilbert space, and if one restricts oneself to measurements that act separately on each particle, then our bound applies and is sharp. When the unknown state is mixed and belongs to a Hilbert space of dimension d > 2, and if one restricts oneself to measurements that act separately on each particle, then our bound applies but is not sharp. If the unknown state is mixed and one allows collective measurements, then our bound is not necessarily satisfied.

This last point is surprising and points to a fundamental difference between measuring pure states and mixed states. Indeed it is known that carrying out measurements on several identical copies of the same pure state can generally be done better with collective measurements on the different copies [16, 15]. This is known as “nonlocality without entanglement” [2]. The first point shows that in the limit of a large number of copies, pure states of spin-1/2 do not exhibit nonlocality without entanglement. On the other hand, the last point shows that in the limit of a large number of copies mixed states of spin-1/2 continue to exhibit non locality without entanglement. To describe our bound on W , we first consider for simplicity the case of a pure state of spin-1/2 particles. Suppose the unknown state is a spin 1/2 known to be in a pure state, and the state is known to be almost pointing in the +z direction: 1 |ψ(θ1 , θ2 ) = |↑z + (θ1 + iθ2 ) |↓z 2

(4)

where we have written an expression valid to first order in θ1 , θ2 . Suppose we carry out a measurement of the operator σx . We obtain the outcome ±x with probability p(±x) = (1±θ1 )/2. Thus the outcome of this measurement tells us about the value of θ1 . Similarly we can carry out a measurement of σy . We obtain the outcome ±y with probability p(±y) = (1 ± θ2 )/2. The outcome of this measurement tells us about θ2 . But the measurements σx and σy are incompatible, i.e., the operators do not commute and cannot be

entire

December 28, 2004

182

13:56



measured simultaneously. Thus if one obtains knowledge about θ1 , it is at the expense of θ2 . Indeed suppose one has N copies of the state ψ and one measures σx on N1 copies and σy on N2 = N −N1 copies. Our estimator for θ1 is the fraction of +x outcomes minus the fraction of −x outcomes. This estimator is unbiased. The resulting uncertainty (at the point θ1 = θ2 = 0) about θ1 is then Eθ ((θˆ1 − θ1 )2 ) = N11 . Similarly we can estimate θ2 and the corresponding uncertainty is Eθ ((θˆ2 − θ2 )2 ) = 1 . We can combine these N2

two expressions in the following relation: 1 1 1 1 + = N + N =N V11 V22 Eθ ((θˆ1 − θ1 )2 ) Eθ ((θˆ2 − θ2 )2 )

(5)

which expresses in a compact form how we can trade knowledge about θ1 for knowledge about θ2 . We shall show that it is impossible to do better than precisely Eq. (5) when one restricts attention to unbiased estimators based on arbitrary measurements, and asymptotically not possible to do better with any estimator whatsover. To generalize Eq. (5), we rewrite it in a more abstract form, and state it as an inequality. We use polar coordinates to parameterize the unknown state of the spin-1/2 particle: |ψ = cos η2 | ↑ + sin η2 eiϕ | ↓. We introduce the tensor Hηη = 1 ,

Hϕϕ = sin2 η

,

Hηϕ = 0

(6)

which is simply the Euclidean metric on the sphere. Then bound (5) can be reexpressed as tr H −1 (V N )−1 ≤ N

(7)

where V N is the MQE matrix defined in Eq. (1). For mixed states belonging to a 2 dimensional Hilbert space, (7) can be generalized as follows. Let us suppose that the state ρ(θ) depends on three unknown parameters. Then we can parameterize it by ρ(θ) = 12 (I + i θi σi ) where I is the identity matrix, σi are the Pauli matrices, and the three parameters θi obey θ 2 = i θi 2 ≤ 1. We now introduce the tensor Hij (θ) = δij +

θi θj 1 − θ 2

(8)

which generalizes tensor (6) to the case of mixed states. Then, upon restricting oneself to separable measurements, we will show that the MQE matrix V N must satisfy (exactly for unbiased estimators, and otherwise asymptotically) tr H(θ)−1 V N (θ)−1 ≤ N .

(9)

entire

December 28, 2004

13:56



183

As an application of these results, the minimum of the cost function (3) in the case of spin-1/2 particles (for mixed states restricting oneself to separable measurement) is 2

; tr H(θ)−1/2 C(θ)H(θ)−1/2 ˆ θ)) = f0 (θ) + + o(1/N ) (10) min Eθ (f (θ, N which is obtained simply by minimizing Eq. (3) subject to the constraints (7) or (9). We can compare Eq. (10) with the exact results which are known in the case of covariant measurements on pure states of spin-1/2 particles [9, 15]. In this problem one is given N spin-1/2 particles polarized along the direction Ω. Ω is uniformly distributed on the sphere. One wants to devise a measurement and estimation strategy that minimize the mean value of the cost function cos2 ω/2, where ω is the angle between the estimated direction ˆ and the true direction Ω. Expanding the cost function to second order in Ω ω (to obtain the quadratic cost matrix C) and averaging Eq. (10) over the sphere, one finds E(cos2 ω/2) ≥ 1 −

1 1 + o( ) N N

(11)

which in the limit for large N coincides with the results (exact for all N ) of Refs. [9, 15]. If the directions Ω are not uniformly distributed, then Refs. [9, 15] do not apply, but Eq. (10) stays valid. However, we cannot compare our results with the recent analysis of covariant measurements on mixed states [20] because we suppose separability of the measurement, whereas Ref. [20] does not. Equations (7) and (9) have a simple generalization to the case of particles belonging to higher-dimensional Hilbert spaces. But in these cases these bounds are no longer sharp. In order to appreciate the above results, we must recall some results from classical statistical inference. This is the subject of Sec. 2. 2. Classical Cram´ er-Rao Bound Consider a random variable X with probability density p(x, θ). The connection with the quantum problem is that we can view p(x, θ) as the probability density that a quantum measurement on the system yields an outcome x given that the state was ρ(θ). We take a random sample of size N from the distribution, and use it to estimate the value of each parameter θi . Call θîN

entire

December 28, 2004

13:56



184

the estimated value. The following results about the MQE matrix of the estimator are well known. (1) Suppose that the estimator is unbiased, that is, Eθ (θˆN − θ) 2= 0, where Eθ is the expectation value at fixed θ, i.e., the integral dxp(x|θ). Define its MQE matrix V N (θ) by VijN (θ) = Eθ ((θîN − θi )(θˆjN − θj )).

(12)

Furthermore define the Fisher information matrix I(θ) by 6 ∂θ p(x|θ)∂θj p(x|θ) .(13) Iij (θ) = Eθ (∂θi ln p(X|θ)∂θj ln p(X|θ)) = dx i p(x|θ) Then, for any N , the following inequalities, known as the Cramér-Rao inequalities, hold [4, 8]: V N (θ) ≥ I(θ)−1 /N

(14)

or, equivalently V N (θ)

−1

≤ N I(θ),

(15)

the inequality meaning that the difference of the two sides is a nonnegative matrix. (2) The hypothesis of unbiased estimators is very restrictive since most estimators will be biased. Happily it is possible to relax this condition. Here are just two of the many results available: (a) First of all, if one is interested in averaging the mean cost over possible values of θ with respect to a given prior distribution λ(θ), then there is a Bayesian version of the Cramér-Rao inequality, the van Trees inequality [7, 19]. In the multivariate case, upon giving oneself a quadratic cost function determined by a matrix C(θ), one can derive the inequality 2 6 dθλ(θ) tr C(θ)I −1 (θ) α − 2 (16) dθλ(θ)tr C(θ)V N (θ) ≥ N N where α is a positive number that depends on C(θ), I(θ), and λ(θ), but is independent of N . (b) The second approach makes no reference to any prior distribution for θ, but only holds in the limit N tending to infinity and lays a mild restriction on the estimators considered. Specifically, if the √ probability distribution of N (θˆN − θ) converges uniformly in θ towards a distribution depending continuously on θ, say of a random

entire

December 28, 2004

13:56



185

vector Z, then the limiting scaled MQE matrix W (θ) defined by Wij (θ) = Eθ (Zi Zj ) obeys W ≥ I −1 . (3) Furthermore in the limit of arbitrarily large samples one can attain the Cramér-Rao bound. This is proven by explicitly constructing an estimator that attains the bound in the extended senses (2a) (apart from the 1/N 2 term) or (2b) just indicated: the maximum likelihood estimator (MLE). Modern statistical theory contains many other results having the same flavour as point (2) above, namely, that the Cramér-Rao bound holds in an approximate sense for large N , without the restriction to biased estimators. Result (2a) applies to a larger class of estimators than (2b), but only gives a result on the average behaviour over different values of θ. On the other hand combining results (3) and (2b) tells us that the maximum likelihood estimator is for large N an optimal estimator for each value of θ separately. The reason why in (2b) additional regularity is demanded is because of the phenomenon of super-efficiency (see Ref. [18] for a recent discussion), whereby an estimator can have mean quadratic error of smaller order than 1/N at isolated points. Modern statistical theory (see again Refs. [18] or [11]) has concentrated on the more difficult problem of obtaining non-Bayesian results (i.e., pointwise rather than average) making much use of the technical tool of “local asymptotic normality.” A major challenge in the quantum case is to obtain a result of type (2b) when this technique is definitely not available.

3. Quantum Cram´ er-Rao Bound In this paper we show that similar results to (1), (2a), (2b), and (3) can be obtained when one must estimate the state of an unknown quantum system ρ(θ) of which one possesses N copies. This problem is most simply addressed, following Ref. [3], by decomposing it into a first (quantum) step in which one carries out a measurement on ρN = ρ ⊗ · · · ⊗ ρ and a second (classical) step in which one uses the result of the measurement to estimate the value of the parameters θ. The most general way to describe the measurement is by a positive operator-valued measurement (POVM) M = (Mξ ) whose elements satisfy Mξ ≥ 0, ξ Mξ = I. (For simplicity we take the outcomes of the POVM to be discrete. The generalization to an arbitrary outcome space is just a question of translating into measure-theoretic language.)

entire

December 28, 2004

13:56



186

Quantum mechanics tells us the probability to obtain outcome ξ given state ρ(θ): p(ξ|θ) = Tr ρN (θ)Mξ .

(17)

From the outcome ξ of the measurement one must guess what are the values of the p parameters θi . Call θˆN the estimated value of the parameter vector. We want to obtain bounds on the MQE matrix V N (θ) of the estimator θˆN when the true parameter value is θ, thus VijN (θ) = Eθ (θîN − θi )(θˆjN − θj ). To proceed we temporarily make the simplifying assumption that the estimators are unbiased, Eθ θˆN = θ. Then we can apply the classical CramérRao inequality to the probability distribution p(ξ|θ) to obtain V N ≥ I N (M, θ)−1

(18)

(V N )−1 ≤ I N (M, θ)

(19)

or

where the Fisher information matrix I N for the measurement M is defined by N (M, θ) = Iij

∂i p(ξ|θ)∂j p(ξ|θ) ξ

p(ξ|θ)

=

N Tr(ρN ,i Mξ ) Tr(ρ,j Mξ ) ξ

Tr(Mξ ρN )

(20)

N with ρN ,i = ∂θi ρ . These expressions suggest the following questions.

(1) is there a simple bound for the MQE V N of unbiased estimators θˆN , or equivalently for the Fisher information I N (M, θ)? (2) is the bound also valid for sufficiently well behaved but possibly biased estimators—at least in the limit of large N ? (3) can this bound be attained—at least in the limit of a large number of copies N ? Most of the work on this subject has been devoted to answering the question (1). We now recall what is known about these questions. Suppose first the parameter θ is one-dimensional, p = 1. The symmetric logarithmic derivative (SLD) λθ of ρ is the Hermitian matrix defined implicitly by ρ,θ =

λθ ρ + ρλθ . 2

(21)

entire

December 28, 2004

13:56



In a basis where ρ is diagonal, ρ = yield

k

187

pk |k k|, this can be inverted to 2 . pk + pl

(22)

N Iθθ (M, θ) ≤ N Tr ρλθ λθ .

(23)

(λθ )kl = (ρ,θ )kl Then we have the bound

Furthermore it was suggested in Ref. [1] how to adapt the classical MLE so as to attain, in the limit of large N , the bound (23). In the multiparameter case the bound based on the SLD can be generalized in a natural way. Define the SLD along direction θi by ρ,i =

λi ρ + ρλi , 2

(24)

and Helstrom’s quantum information matrix H by Hij = Tr ρ

λi λj + λj λi . 2

(25)

(This is the same matrix that was introduced for spin-1/2 particles for a particular choice of parameters in (6) and (8)). Then one can prove the bound [8], I N (M, θ) ≤ N H(θ).

(26)

(This can be deduced directly from Eq. (23) as proven in [3]. Indeed since Eq. (23) holds for each path in parameter space, it implies the matrix equation (26)). However, this bound is in general not achievable. Another bound has been proposed based on an asymmetric logarithmic derivative (ALD) [24] which in some cases is better than Eq. (26). Holevo [9] has proposed yet another bound that is stronger then both the SLD and the ALD bounds, but this bound is not explicit: it requires a further minimization. As far as we know no general achievable bound is known in the multiparameter case. The difficulty in obtaining a simple bound in the multiparameter case is that there are many inequivalent ways in which one can minimize the MQE matrix VijN . That is, in order to build a good estimator one must make a choice of what one wants to estimate, and according to this choice the measurement strategy followed will be different. Hence a bound in the form of a matrix inequality like Eq. (26) cannot be expected to be tight.

entire

December 28, 2004

13:56

188



4. Results In this paper we obtain answers to the three questions raised above in the multiparameter case. Our results are summarized in this section. We first discuss point 1), that is bounds on the Fisher information. We shall show the following: Theorem 1: When ρ(θ) = |ψ(θ)ψ(θ)| is a pure state, then the Fisher information I N (M, θ) defined in Ref. (20) must satisfy the relation tr H −1 (θ)I N (M, θ) ≤ (d − 1)N

(27)

where H −1 is the inverse of the quantum information matrix defined in Eq. (25), and d is the dimension of the Hilbert space to which ρ(θ) belongs. Note that this inequality (27) is invariant under change of parameterization θ → θ (θ). This result immediately gives an inequality for the mean quadratic error matrix of unbiased estimators θˆN by invoking the classical Cramér-Rao inequality in order to replace I N (M, θ) by the inverse of the MQE V N (θ): tr H −1 (θ)(N V N (θ))−1 ≤ d − 1 .

(28)

Theorem 2: When ρ(θ) is a mixed state, and if the measurement M consists of separate measurements on each particle, then the Fisher information also satisfies Eq. (27). Hence for separable measurements on a mixed state, the MQE matrix of an unbiased estimator satisfies Eq. (28). Theorem 3: In the case of mixed states, it is in general possible to devise a collective measurement for which the Fisher information does not satisfy the inequality (27). The second part of the paper consists of proving that the constraint (28) also holds for biased estimators under suitable additional conditions. We give two forms of this generalized form of Eq. (28) corresponding to the two forms (2a) and (2b) of the generalized classical Cramér-Rao inequality. Consider N copies of a state ρ(θ). If ρ is pure we can make either collective or separable measurements. If ρ is mixed we restrict ourselves to separable measurements [since Theorem 3 shows that in this case collective measurements can beat Eq. (27)]. Based on the outcome of the measurement we estimate the value of the parameter vector θ. Call θˆ the estimator, and denote by V N = V N (θ) its MQE matrix when the true value of the parameter is θ.

entire

December 28, 2004

13:56



189

We shall prove the following generalization of result of type 2b) concerning the behaviour of the mean quadratic error matrix as N tends to infinity: Theorem 4: Suppose that the scaled MQE N V N (θ) has the limit W (θ) as N → ∞. Suppose that the convergence is uniform in θ, and that W is continuous at the point θ = θ0 . Furthermore we suppose that H and its derivatives are bounded in a neighbourhood of this point. Then we shall prove in section 6 that W (θ0 ) must satisfy tr H −1 (θ0 )W −1 (θ0 ) ≤ (d − 1).

(29)

This result gives a bound on the mean value of a quadratic cost function C as N tends to infinity. Indeed, using a Lagrange multiplier to impose the condition (29), the minimum cost is readily found to be 2 9 1 1 1 0 N 0 − − 0 0 0 2 2 tr H (θ )C(θ )H (θ ) . (30) lim N tr C(θ )V (θ ) ≥ N →∞ d−1 In terms of a cost function, it is also possible to prove a Bayesian version of the Cramér-Rao inequality which is the analogue of the classical result (2a). Theorem 5: Suppose that one is given a quadratic cost function C(θ) and a prior distribution λ(θ) for the parameters θ. If C, λ, and H are sufficiently smooth functions of θ (the continuity of the first derivatives is sufficient), while λ is zero outside a compact region with smooth boundary, then 6 dθλ(θ) tr C(θ)V N (θ) 1 ≥ (d − 1)N

6

2 9 α 1 1 − − H 2 (θ)C(θ)H 2 (θ) − 2 dθλ(θ) tr N

(31)

where α is a constant independent of N but which depends on C, λ and H. Theorems 1, 2, 4, and 5 put bounds on the MQE matrix of an estimator of an unknown state ρ(θ) (for mixed states, under the restriction that the measurement is separable). The third part of this paper is devoted to showing that in the case of spin-1/2 systems (d = 2) these bounds can be attained. We first show that at any point θ0 we can attain equality in Eq. (27). Theorem 6: Suppose one has N spin-1/2 particles in an unknown (possibly mixed) state ρ(θ). Fix any point θ0 . Give yourself a matrix G0 satisfying

entire

December 28, 2004

190

13:56



tr H −1 (θ0 )G0 ≤ 1. We call G0 the target scaled information matrix. Then 0 there exists a measurement M θ (depending on the choice of θ0 ) acting on 0 each spin separately such that I N (M θ , θ0 ) = N G0 . This measurement is described in detail in Sec. 7.1. For large N we can also approximately attain equality at all points θ simultaneously. Theorem 7: Suppose one has N spin-1/2 particles in an unknown pure state |ψ(θ), or suppose that one has N spin-1/2 particles in an unknown mixed state ρ(θ). In the latter case we also require that the state never be pure, i.e., Tr ρ(θ)2 < 1 for all θ. Give oneself a smooth positive matrix function G(θ) satisfying tr H −1 (θ)G(θ) ≤ 1 for all θ, the target scaled information for each possible value of θ. Define the corresponding target scaled MQE matrix W (θ) = G(θ)−1 . Suppose that W (θ) is nonsingular [i.e., G(θ) never has a zero eigenvalue]. Then there exists a measurement M acting on each spin ˆ whose MQE matrix V N (θ) separately, and a corresponding estimator θ, satisfies V N (θ) =

W (θ) + o(1/N ) N

(32)

√ for all values of θ simultaneously. For this estimation strategy N (θˆ − θ) converges in distribution towards N (0, W ), the normal distribution with mean zero and covariance W . The measurement M and estimation strategy is described in detail in Sec. 7.2. It is interesting to note that the measurement strategy which satisfies Eq. (32) is an adaptive one. That is, one first carries out a measurement on a small fraction of the particles. This gives a preliminary estimate of the quantum state which allows a fine tuning of the measurements that are carried out on the remaining particles. This is to be contrasted with previously proposed state estimation strategies in the case of finite-dimensional Hilbert spaces [14, 22] in which the same measurement is carried out on all the particles. The necessity of an adaptive measurement strategy if one wants to minimise the MQE was pointed out in Ref. [1]. When the unknown state belongs to a Hilbert space of dimension d > 2, then bound (27) cannot be attained in general. Indeed, we shall show in Sec. 5.6 that for d > 2, neither Eq. (26) nor (27) implies the other.

entire

December 28, 2004

13:56



191

5. New Quantum Cram´ er-Rao Inequality In this section we prove Theorems 1, 2, and 3. That is, we prove Eq. (27) for general measurements in the case of pure states and for separate measurements on each particle in the case of mixed states. 5.1. Preliminary results The first step in proving Eq. (27) is to show that one can restrict oneself to POVM’s whose elements are proportional to one-dimensional projectors. Indeed, any POVM can always be refined to yield a POVM whose elements are proportional to one dimensional projectors. We call such a measurement exhaustive. This yields a refined probability distribution (p(ξ, θ)). It is well known that under such refining of the probability distribution, the Fisher information can only increase [5]. The second step in proving Eq. (27) consists of increasing the number of parameters. Suppose that ρ(θ) depends on p parameters θi , i = 1, . . . , p. If ρ = |ψ(θ)ψ(θ)| is a pure state, then p ≤ 2d − 2 (since |ψ(θ) is normalized and defined up to a phase). If ρ is a mixed state, then Hermiticity and the condition Tr ρ = 1 impose that p ≤ d2 − 1. Suppose that p < ν is less then the maximum number of possible parameters (ν = 2d − 2 or ν = d2 − 1 according to whether the state is pure or mixed). Then one can always increase the number of parameters up to the maximum. Indeed let us suppose that to the p parameters, one adds independent parameters θi , ˜ i = p + 1, . . . , ν. We can now consider the quantum information matrix H, ˜ and Fisher information matrix I, for the completed set of parameters. We shall show below that ˜ −1 (θ)IÑ (M, θ). tr H −1 (θ)I N (M, θ) ≤ tr H

(33)

Therefore it will be sufficient to prove Eq. (27) in the case when there are ν parameters. To prove Eq. (33), fix a particular point θ0 . At this point we have the derivative ρ,i and SLD λi of ρ for i = 1, . . . , p. Introduce a set of Hermitian matrices λi with Tr ρ(θ0 )λi = 0, for i = p + 1, . . . , ν, such that λi λi + λi λi = 0 , i = 1, . . . , p , i = p + 1, . . . , ν. (34) 2 This is always possible because we can view (34) as a scalar product between λi and λi and a Gram-Schmidt orthogonalization procedure will then yield the matrices λi . Now define matrices ρ,i by ρ,i = (ρ(θ0 )λi + λi ρ(θ0 ))/2 and define additional parameters θi satisfying, at θ0 : ∂θi ρ = ρ,i . The point Tr ρ(θ0 )

entire

December 28, 2004

13:56



192

of this construction is that because of Eq. (34), the quantum information ˜ is block diagonal with the first block equal to H. Let I(M ˜ ) be the matrix H Fisher information matrix for the enlarged set of parameters (but the same ˜ ) = tr(H ˜ −1 )11 I˜11 (M ) + tr(H ˜ −1 )22 I˜22 (M ) ˜ −1 I(M measurement). Then tr H where the indices 11 and 22 denote the blocks of these matrices corresponding to the original and new parameters. But both terms are non-negative ˜ −1 )11 = H −1 , so we obsince all matrices involved are non-negative, and (H tain Eq. (33) at θ0 and for the particular parameters just introduced. But since the right-hand side of Eq. (33) is invariant under reparameterization, it is true for any parameterization, and at any θ. 5.2. Pure states To proceed we shall consider a POVM whose elements are proportional to one dimensional projectors, and calculate explicitly the left-hand side of Eq. (27) in the case where the number of parameters is the maximum p = 2d − 2 in a basis where H is diagonal. We fix a point θ0 . At this point we chose a basis such that ρ(θ0 ) = |11| .

(35)

Hence the density matrix of the N copies is ρN = |11| ⊗ . . . ⊗ |11| .

(36)

Consider now the 2d − 2 Hermitian operators ρ,k+ = |1 k| + |k 1| ,

ρ,k− = i |1 k| − i |k 1| ,

1 < k ≤ d . (37)

We choose a parameterization such that in the vicinity of θ0 , it has the 0 )ρ,k± with the unknown parameters θk± , form ρ = ρ(θ0 ) + k,± (θk± − θk± k = 2, . . . , d. With this parameterization the derivatives of ρN are ρN ,k± = ρ,k± ⊗ ρ · · · ⊗ ρ + · · · + ρ ⊗ · · · ⊗ ρ,k± .

(38)

One then calculates the SLD of ρ, and hence the quantum information matrix H. One verifies that in this basis H is diagonal: Hk±,k ± = 4δkk δ±± .

(39)

Consider any POVM whose elements are proportional to one dimensional projectors Mξ = |ψξ ψξ |

,

|ψξ =

d k1 =1

...

d kN =1

aξk1 ...kN |k1 . . . kN .

(40)

entire

December 28, 2004

13:56



193

The completeness relation ξ Mξ = I takes the form = δ k k . . . δk k a∗ξk1 ...kN aξk1 ...kN . 1 1 N N

(41)

ξ

To proceed we need the formulas Tr ρ(θ0 )Mξ = |aξ1...1 |2

(42)

(a∗ξ1...1 aξ1...kp =k...1 + a∗ξ1...kp =k...1 aξ1...1 ),

(43)

and Tr ρ(θ0 ),k+ Mξ =

N p=1

and similarly for Tr ρ(θ0 ),k− Mξ . Thus we obtain (Tr ρ(θ0 ),k+ Mξ )2 + (Tr ρ(θ0 ),k− Mξ )2 =

N

4|aξ1...1 |2 |aξ1...kp =k...1 |2 . (44)

p=1

Putting everything together yields tr H −1 I(M ) =

ξ

=

1 1 (Tr ρ(θ0 ),k+ Mξ )2 + (Tr ρ(θ0 ),k− Mξ )2 0 Tr ρ(θ )Mξ 4 ± d

k=2

d N k=2 p=1

|aξ1...kp =k...1 |2 = N (d − 1),

(45)

ξ

which proves that equality holds in Eq. (27) for arbitrary exhaustive measurements in the case of pure states. 5.3. One mixed state Deriving Eq. (27) for mixed states is more complicated than for pure states, and we shall proceed in two steps. First we shall consider the case of one mixed state (N = 1), and show that equality in Eq. (27) holds in this case for arbitrary exhaustive measurements. Then we shall consider the case of an arbitrary number N of mixed states. We first diagonalize ρ at a point θ0 : ρ(θ0 ) = dk=1 pk |k k|. We now introduce the following complete set of Hermitian traceless matrices: ρ,kl+ = |k l| + |l k| , ρ,m =

d k=1

cmk |k k| ,

ρ,kl− = i |k l| − i |l k| , m = 1, . . . , d − 1,

k < l, (46)

entire

December 28, 2004

13:56



194

where the coefficients cmk obey 1 cmk = 0, cm k cmk = δm m . pk k

(47)

k

Let us denote the matrices ρ,kl± and ρ,m collectively as ρ,i . [They constitute a set of generators of su(d)]. We choose a parameterization such that in the vicinity of θ0 , it has the form ρ = ρ(θ0 )+ i (θi − θi0 )ρ,i . One then calculates the SLD of ρ, and from this the quantum information matrix H. One verifies that in this basis H is diagonal: Hkl±,k l ± =

4 δkk δll δ±± , pk + pl

Hkl±,m = 0,

Hm,m = δm m . (48)

Consider any POVM whose elements are proportional to one dimensional projectors aξk |k . (49) Mξ = |ψξ ψξ | , |ψξ = k

The left-hand side of Eq. (27) can now be written as tr H −1 I(M ) ! pk +pl 1 2 2 ψξ |ρ,kl± |ψξ + ψξ |ρ,m |ψξ . (50) = ψξ |ρ|ψξ 4 ± m ξ

k N (d − 1) = 2. This shows that the optimal Fisher information is non additive. 5.6. Comparison with other Quantum Cram´ er-Rao bounds An important question raised by bound (27) is how it compares to other quantum Cramér-Rao bounds obtained in the literature. In this respect, our most important result is that Eq. (27) is both a necessary and sufficient condition that I(M, θ) must satisfy when the dimensionality of the system d equals 2 and the state is pure. This will be proven and discussed in detail in Sec. 7. When d > 2, Eq. (27) is not a sufficient condition that I(M, θ) must satisfy. To see this let us compare Eq. (27) with the bound derived by Helstrom based on the SLD. This bound is the matrix inequality I N (M, θ) ≤ N H(θ);

entire

December 28, 2004

13:56



198

see Eq. (26). The comparison is most easily carried out by defining the map 1 1 trix F = N1 H − 2 I N H − 2 = i=1 γi fi ⊗ fi where γi are the eigenvalues of F and fi its eigenvectors. Helstrom’s bound can be reexpressed as γi ≤ 1 for all i, whereas bound (27) states that i γi ≤ d − 1. From these expressions it results that the bound (27) is better then Helstrom’s bound for d = 2. For d > 2 and p ≤ d − 1, Helstrom’s bound is better than Eq. (27) as is seen by summing the inequalities γi ≤ 1 to obtain i γi ≤ p. For p > d − 1, neither Helstrom’s bound nor bound (27) are better than the other. Yuen and Lax [24] proposed another matrix bound based on an asymmetric logarithmic derivative. This bound is known to be worse then the bound based on the SLD in the case of one parameter, but it can be better, for some loss functions, in the case of two or more parameters. We have however not been able to make a detailed comparison between the bound based on the ALD and Eq. (27). Although when d > 2, bound (27) is not a sufficient condition it can be complemented by additional constraints based on partial traces of H −1 I N (M, θ) which we now exhibit. Consider a subset i = 1, . . . , p (p < p) of the parameters. Let ρ,i be the corresponding derivatives of ρ(θ). Let us define the effective dimension d of the space in which these parameters act at the point θ0 as follows. Let Π be a projector that commutes with ρ(θ0 ) ([Π, ρ(θ0 )] = 0) and such that ρ,i , i = 1, . . . , p acts only within the eigenspace of Π (that is Πρ,i Π = ρ,i ). Then d is the smallest dimension of the eigenspace of such a projector Π (d = Tr Π). To be more explicit, let us reexpress the definition of d in coordinates. First we diagonalize ρ(θ0 ) = k pk |kk|. If some pk are equal this can be done in many ways. The projector Π projects onto some of the d eigenvectors of ρ: Π = k=1 |kk|. Next we write the operators ρ,i in this d basis: ρ,i = k,l=1 (ρ,i )kl |kl|, where the fact that the indices k, l go from one to d expresses the fact that ρ,i acts only within the eigenspace of Π. Finally we choose the smallest such d . We will show that

p

N 0 Hi−1 j Ii j (M, θ ) ≤ N (d − 1) .

(63)

i ,j =1

Before proving this result let us illustrate it by an example. Consider an unknown pure state in d dimensions. In the neighbourhood of a particular point we can parameterize the state by ψ = |1 + (θ2 + iη2 ) |2 + ... + (θd + iηd ) |d

(64)

entire

December 28, 2004

13:56



199

where the unknown parameters are θi and ηi , i = 2, . . . , d. There are thus 2d − 2 parameters. At the point θ = η = 0, H is diagonal in this parameterization: Hθi θj = δij , Hηi ηj = δij , and Hθi ηj = 0. Hence Eq. (27) takes the form

IθNi θi (M, θ = η = 0) + IηNi ηi (M, θ = η = 0) ≤ N (d − 1).

(65)

i

But using (63) we also find the constraints IθNi θi (M, θ = η = 0) + IηNi ηi (M, θ = η = 0) ≤ N

,

i = 2, . . . , d

(66)

which are stronger than Eq. (65) since they must hold separately, but by summing them one obtains Eq. (65). The proof of Eq. (63) proceeds as in Sec. 5. First we can restrict ourselves to POVM’s whose elements are proportional to one dimensional projectors. Second we can restrict ourselves to the subspace Π in evaluating Eq. (63). This follows from the inequality I(M )i j =

Tr(ρ,i Mξ ) Tr(ρ,j Mξ ) ξ

=

ξ

≤

Tr(ρMξ )

Tr(ρ,i ΠMξ Π) Tr(ρ,j ΠMξ Π) Tr(ρΠMξ Π) + Tr(ρ(1 − Π)Mξ (1 − Π))

Tr(ρ,i ΠMξ Π) Tr(ρ,j ΠMξ Π) Tr(ρΠMξ Π)

ξ

.

(67)

Note that equality in Eq. (67) holds when the measurement consists of one dimensional projectors and when the POVM decomposes into the sum of two POVM’s acting on the subspaces spanned by Π and 1 − Π separately (i.e., the POVM elements Mξ = |ψξ ψξ | must commute with Π and 1−Π). 2 Third, we can increase the number of parameters from p to d − 1. We then introduce exactly as in Eq. (46) a parameterization in which the ρ,i are particularly simple, but in place of Eq. (52) we use 1≤m ≤d

c m k c m l = δ k l p k −

p k p l . Tr(Πρ)

(68)

After these preliminary steps the left-hand side of Eq. (63) is calculated exactly as in Secs. 5.2, 5.3, and 5.4.

entire

December 28, 2004

13:56



200

6. Dropping the Condition of Unbiased Estimators 6.1. Quantum van Trees inequality In the previous section we proved a bound on the MQE of unbiased estimators θˆN of N copies of the quantum system ρ(θ) (with the additional condition that if ρ is mixed the measurement should be separable). In this section we shall prove Theorems 4 and 5, that under additional conditions it is possible to drop the hypothesis that the estimator is unbiased. The starting point for the results in this section is a Bayesian form of the Cramér-Rao inequality, the van Trees inequality [19], and in particular the multivariate form of the van Trees inequality proven in Ref. [7]. Adapted to the problem of estimating a quantum state, this inequality takes the following form. Let θˆN be an arbitrary estimator of the parameter θ based on a measurement M of the system ρN (θ). Suppose it has MQE matrix V N (θ), and Fisher information matrix I N (M, θ). Let λ(θ) be a smooth density supported on a compact region (with smooth boundary) of the parameter space, and suppose λ vanishes on the boundary. By Eλ we denote expectation over a random parameter value Θ with the probability density λ(θ). Let C(θ) and D(θ) be two p × p matrix valued functions of θ, the former being symmetric and positive definite. Then the multivariate van Trees inequality reads Eλ tr C(Θ)V N (Θ) ≥ where

Eλ

(Eλ tr D(Θ))2 −1 tr C(Θ) D(Θ)I N (M, Θ)D(Θ)"

˜ + I(λ)

,

(69)

"

denotes the transpose of the matrix, and 6 1 ˜ I(λ) = dθ Cij (θ)−1 ∂θk {Dik (θ)λ(θ)}∂θl {Djl (θ)λ(θ)}. λ(θ)

(70)

ijkl

As a first application of this inequality we shall prove Theorem 5, that is bound the minimum value averaged over θ of a quadratic cost function. Let C(θ) be the quadratic cost function. Consider the matrix Wopt (θ) that minimizes for each value of θ the cost tr C(θ)W (θ) under the condition that tr H(θ)−1 W (θ)−1 ≤ d − 1. One easily finds that Wopt

√ H −1/2 CH −1/2 −1/2 ; 1/2 −1 1/2 −1/2 H = H C H H d−1 √ tr C 1/2 H −1 C 1/2 −1/2 ; 1/2 −1 1/2 −1/2 C = C H C C d−1 tr

(71) (72)

entire

December 28, 2004

13:56



and that

√ 2 tr C 1/2 H −1 C 1/2

2

√ tr H −1/2 CH −1/2

tr CWopt =

=

d−1

201

d−1

.

(73)

In Eq. (69) we choose D(θ) = C(θ)Wopt (θ). Thus tr D(θ) = tr C(θ)Wopt (θ) is given by Eq. (73). Note that D(θ)" C(θ)−1 D(θ) = Wopt (θ)C(θ)Wopt (θ) =

tr C(θ)Wopt (θ) H(θ)−1 .(74) d−1

Thus tr C(θ)Wopt (θ) tr H(θ)−1 I N (M, θ) d−1 (75) ≤ N tr C(θ)Wopt (θ) .

tr D(θ)" C(θ)−1 D(θ)I N (M, θ) =

Inserting these expressions into Eq. (69), one obtains (Eλ tr C(Θ)Wopt (Θ))2 ˜ N Eλ tr C(Θ)Wopt (Θ)+ I(λ) 2 Eλ tr C(Θ)Wopt (Θ) α ≥ − 2 N N

Eλ tr C(Θ)V N (Θ) ≥

where α=

˜ I(λ) Eλ tr C(Θ)Wopt (Θ)

(76)

is independent of N . This proves that upon averaging over θ it is impossible (for large N ) to improve over the minimum cost [Eq. (30)]. 6.2. Asymptotic version of the Cram´ er-Rao inequality We now prove Theorem 4, that is an asymptotic version of our main inequality (28) which is valid at every point θ and does not make the assumption of unbiased estimators. We must, however, slightly restrict the class of competing estimators since otherwise by the phenomenon of super-efficiency we can beat a given estimator at any specific value of the parameter, though we pay for this by bad behaviour closer and closer to the chosen value as N becomes larger. The restriction on the class of estimators is that N times their mean quadratic error matrix must converge uniformly in a neighbourhood of the true value θ0 of θ to a limit W (θ), continuous at θ0 . We assume that both W (θ0 ) and H(θ0 ) are nonsingular. Furthermore, we shall require some mild

entire

December 28, 2004

13:56



202

smoothness conditions on H(θ) in a neighbourhood of θ0 : it must be continuous at θ0 with bounded partial derivatives with respect to the parameter in a neighbourhood of θ0 . Note that imposing regularity conditions on H is natural since it corresponds to supposing that the θi smoothly parameterize the allowed density matrices. Suppose that, as N → ∞, N V N (θ) → W (θ) uniformly in θ in a neighbourhood of θ0 , with W continuous at θ0 ; write W 0 = W (θ0 ). Now in Eq. (69) let us make the following choices for the matrix functions C and D: C(θ) = W 0

−1

H −1 (θ)W 0

−1

,

D(θ) = W 0

−1

H −1 (θ).

Then Eq. (69) (multiplied throughout by N ) and (70) become Eλ tr W 0

−1

H −1 (Θ)W 0

−1

−1

N V N (Θ) ≥

(Eλ tr W 0 H −1 (Θ))2 1 −1 I N (M, Θ) + 1 I(λ) ˜ N Eλ tr H N −1

≥ and

6 ˜ I(λ) =

dθ

(Eλ tr W 0 H −1 (Θ))2 ˜ (d − 1) + N1 I(λ)

1 −1 −1 Hij (θ)∂θk {Hik (θ)λ(θ)}∂θl {Hjl (θ)λ(θ)}, λ(θ)

(77)

(78)

ijkl

where we have used our central inequality (27) to pass to Eq. (77). Now suppose that the quantity (78) is finite (we will give conditions for that in a moment). By the assumed uniform convergence of N V N to W , upon letting N → ∞, Eq. (77) becomes Eλ tr W 0

−1

H −1 (Θ)W 0

−1

−1

W (Θ) ≥

(Eλ tr W 0 H −1 (Θ))2 . (d − 1)

(79)

Now suppose the density λ in this equation (the probability density of Θ) is replaced by an element λm in a sequence of densities, concentrating on smaller and smaller neighbourhoods of θ0 as m → ∞. Assume that H(θ) is continuous at θ0 . Recall our earlier assumption that W (θ) is also continuous at θ0 , with W 0 = W (θ0 ). Then taking the limit as m → ∞ of Eq. (79) yields tr W −1 (θ0 )H −1 (θ0 ) ≥ (tr W −1 (θ0 )H −1 (θ0 ))2 /(d − 1)

entire

December 28, 2004

13:56



203

or the required limiting form of Eq. (27): tr W −1 (θ0 )H −1 (θ0 ) ≤ (d − 1). ˜ m) It remains to discuss whether it was reasonable to assume that I(λ is finite (for each m separately). Note that this quantity only depends on the prior density λ and on H(θ), where λ is one of a sequence of densities supported by smaller and smaller neighbourhoods of θ0 . We already assumed that H(θ) was continuous at θ0 . It is certainly possible to specify prior densities λm concentrating on the ball of radius 1/m, say, satisfying the smoothness assumptions in Ref. [7] and with, for each m, finite Fisher information matrix 6 1 dθ m ∂θk {λm (θ)}∂θl {λm (θ)}. λ (θ) Consideration of Eq. (78) then shows that it suffices further just to assume −1 (θ)} is, for each i, k, bounded in a neighbourhood of θ0 . that ∂θk {Hik In conclusion we have shown that under mild smoothness conditions on H(θ), the limiting mean quadratic error matrix W of a sufficiently regular but otherwise arbitrary sequence of estimators must satisfy the asymptotic version of our central inequality tr H −1 W −1 ≤ d − 1. 7. Attaining the Cram´ er-Rao Bound in Two Dimensions We shall now show that bounds (27), (29), and (31) are sharp in the case of pure states of spin-1/2 systems, and of separable measurements in the case of mixed states of spin-1/2 systems. In particular, in the limit of a large number of copies N any target scaled MQE matrix W that satisfies tr H −1 W −1 ≤ 1 can be attained (provided W is non singular). We shall show this by explicitly constructing a measurement strategy that attains the bound. In Sec. 6 we have already shown that if tr H −1 W −1 > 1, then it cannot be attained. 7.1. Attaining the bound at a fixed point θ 0 The first step in the proof is to consider the case of one copy of the unknown state (N = 1) and fix a particular point θ0 . Then we show that for any target information matrix G(θ0 ) that satisfies tr H −1 (θ0 )G(θ0 ) ≤ 1, we can 0 build a measurement M = M θ , in general depending on θ0 , such that 0 I(M θ , θ0 ) = G(θ0 ). In the next sections we shall show how to use this intermediate result to build a measurement and estimation strategy whose asymptotic MQE is equal to W (θ) = G(θ)−1 for all θ.

entire

December 28, 2004

13:56



204

Let us first consider the case of pure states. At θ0 , the state is |ψ 0 . We introduce a parameterization θ1 , θ2 such that in the vicinity of |ψ 0 , the unknown state is |ψ(θ) = |ψ 0 + (θ1 + iθ2 )|ψ 1 .

(80)

Thus the original point θ0 corresponds to the new θ1 = θ2 = 0. In this parameterization, H is proportional to the identity at θ1 = θ2 = 0: Hθ1 θ1 (0) = Hθ2 θ2 (0) = 1, Hθ1 θ2 (0) = 0. We now diagonalize the matrix G. Thus there exist new parameters θ1 = cos λ θ1 + sin λ θ2 and θ2 = − sin λ θ1 + cos λ θ2 such that Gθ1 θ1 (0) = g1 ≥ 0, Gθ2 θ2 (0) = g2 ≥ 0, and Gθ1 θ2 (0) = 0. In terms of the parameters θ1 , θ2 , the unknown state is written

|ψ 0 = |ψ 0 + (θ1 + iθ2 )|ψ 1 1

(81)

where |ψ = e |ψ . 0 The POVM M θ consists of measuring the observable |ψ 0 ψ 1 | + |ψ 1 ψ 0 | with probability g1 , of measuring the observable i(|ψ 0 ψ 1 | − |ψ 1 ψ 0 |) with probability g2 , and of measuring nothing (or measuring the identity) with probability 1 − g1 − g2 . It is straightforward to verify that 0 the Fisher information at θ0 in a measurement of the POVM M θ is equal to G(θ0 ). Let us now turn to the case of mixed states. We suppose that there are three unknown parameters. We use a parameterization in which ρ(θ) = (1/2)(I + θ · σ), with θ < 1. Without loss of generality we can suppose that θ0 = (0, 0, n), so that ρ(θ0 ) = (1/2 + n/2) |1 1| + (1/2 − n/2) |2 2| = 1 2 (I + nσz ). The tangent space at ρ is spanned by the Pauli √ matrices ρ,x = ρ ρ,12− ), ρ = σ (= ), and ρ = σ (= ρ 1 − n2 ), where in σx (= ,12+ ,y y ,z z ,1 2 2 parentheses we have given the relation to the basis used in Sec. 5.3. In this coordinate system H(θ0 ) is diagonal with eigenvalues 1, 1, 1/(1 − n2 ). Take any symmetric positive matrix G satisfying tr GH −1 (θ0 ) ≤ 1. De 1 1 fine the matrix F = H − 2 GH − 2 = i γi fi ⊗ fi , where γi and fi are the eigenvalues and eigenvectors of F . The condition tr GH −1 (θ0 ) ≤ 1 can 1 then be rewritten i γi ≤ 1. If we define gi = H 2 fi , then we can write G = i γi gi ⊗ gi . Denote mi = gi / gi . Consider the measurement of the spin along the direction mi . This is the POVM consisting of the two projectors P+mi = 12 (I + mi .σ) and P−mi = 1 2 (I − mi .σ). The information matrix for this measurement is Tr(P±m σk ) Tr(P±m σl ) (mi )k (mi )l i i = (82) I(P±mi )kl = Tr(P ρ) (1 − n2 (mi )2z ) ±m ± iλ

1

entire

December 28, 2004

13:56



205

where (mi )k is component k of vector mi . Therefore this information matrix is proportional to gi ⊗ gi . One verifies that it obeys tr H −1 I(P±mi ) = 1, as it must by our findings in Sec. 5 since the measurement is exhaustive, N = 1, and p = d2 − 1. Therefore, the coefficient of proportionality is 1, and I(P±mi ) = gi ⊗ gi .

(83)

We now combine such POVM’s to obtain the POVM whose elements are γ1 P+m1 , γ1 P−m1 , γ2 P+m2 , γ2 P−m2 , γ3 P+m3 , γ3 P−m3 , (1 − γ1 − γ2 − γ3 ). (84) The information matrix for this measurement is just the sum γ1 I(P±m1 ) + γ2 I(P±m2 ) + γ3 I(P±m3 ) = i γi gi ⊗ gi = G. Thus POVM (84) attains the target information G at the point θ0 . 7.2. Attaining the bound for every θ and arbitrary N by separable measurements We now prove Theorem 7, that states that in the case of spin half particles we can attain the bound (29) for every θ. Give yourself a continuous matrix W (θ), the target-scaled MQE matrix, satisfying Eq. (29) for every θ. Define G(θ) = W (θ)−1 , the target-scaled information matrix, which therefore satisfies Eq. (27). We will show that there exists a separable measurement and an estimation strategy on N copies of the state ρ(θ) such that the MQE matrix V N of the estimator satisfies 1 Wij (θ) +o V N (θ)ij = Eθ ((θî − θi )(θˆj − θj )) = (85) N N for all θ. In fact this holds uniformly in θ in a sufficiently small neighbourhood of any given point. This is proven by constructing explicitly a measurement and estimation strategy that satisfies Eq. (85), following the lines of Ref. [1]. The measurement and estimation strategy we propose is the following: first take a fraction N0 = O(N a ) of the states, for some fixed 0 < a < 1, and on one-third of them measure σx , on one third σy and on one-third σz . One obtains from each measurement of σx the outcome ±1 with probabilities 1 2 (1±θx ), and similarly for σy , σz . Using these data we make a first estimate ˜ for instance by equating the observed relative frequencies of of θ, call it θ, ±1 in the three kinds of measurement to their theoretical values. If the state is pure this determines a first estimate of the direction of polarization. If the

entire

December 28, 2004

206

13:56



state is mixed it is possible that the initial estimate suggests that the Bloch vector lies outside the unit sphere. This only occurs with exponentially small probability (in N0 ), and if this is the case the measurement is discarded. As discussed below this only affects the mean quadratic error by o(1/N ). On the remaining N = N − N0 states we carry out the measurement ˜ ˜ ˜ ˜ which we have just shown how to M = M θ such that I(M θ , θ) = G(θ), θ˜ construct. Note that I(M , θ) = G(θ) is only guaranteed when θ is precisely ˜ Write I(M, θ; θ) ˜ for the Fisher information about θ, based on equal to θ. θ˜ ˜ while the true value of the parameter is the measurement M optimal at θ, ˜ actually θ. Given θ, each of the N second stage measurements represents ˜ = Tr M θ˜ρ(θ). We use one draw from the probability distribution p(ξ|θ; θ) ξ the classical MLE based on this data only (with θ˜ fixed at its observed ˆ value) to estimate what is the value of θ. Call this estimated value θ. 0 Let ε > 0 be fixed, arbitrarily small. Let θ denote the true value of θ. For given δ > 0 let B(θ0 , δ) denote the ball of radius δ about θ0 . Fix a convenient matrix norm · . We have the exponential bound Pr{θ˜ ∈ B(θ0 , δ)} ≥ 1 − Ce−DN0 δ

2

(86)

for some positive numbers C and D (depending on δ). The reason we take N0 proportional to N a for some 0 < a < 1 is that this ensures that 1 − Ce−DN0 = o(1/N ). Modern results [11] on the MLE θˆ state that, under certain regularity conditions, the conditional MQE matrix of θˆ satisfies (at θ = θ0 , and ˜ conditional on θ) ˜ −1 I(M, θ0 ; θ) 1 N 0 ˜ (87) +o V (θ ; θ) = N N uniformly in θ0 . However, for the next step in our argument this same result must be true uniformly in θ˜ for given θ0 . This could be verified by careful reworking of the proof in Ref. [11]. Rather than doing this, in Secs. 7.3 and 7.4 we will explicitly calculate the conditional MQE matrix of our estimator, and show that it satisfies Eq. (87) uniformly in θ˜ in a small enough neighbourhood B(θ0 , δ) of θ0 . The “little o” in Eq. (87) refers to the chosen matrix norm. ˜ −1 is continuous in θ˜ at θ˜ = θ0 , at We will also need that I(M, θ0 ; θ) which point it is equal to our construction the target-scaled MQE W (θ0 ). This is also established in Sec. 7.3. Therefore, replacing if necessary δ ˜ −1 is within ε of by a smaller value, we can guarantee that I(M, θ0 ; θ) I(M, θ0 ; θ0 )−1 = W (θ0 ) for all θ˜ ∈ B(θ0 , δ).

entire

December 28, 2004

13:56



207

˜ is If θ˜ is outside the domain B(θ0 , δ), then the norm of V N (θ0 ; θ) bounded by a constant A since θ belongs to a compact domain. Putting everything together we find that '6 ' ' ' N 0 0 N 0 ˜ 0 ˜ ' N V (θ ; θ) − W (θ ) dP (θ)'

N V (θ ) − W (θ ) = ' ' 6 ˜ − W (θ0 ) dP (θ) ˜ + AN C e−DN0 ≤

N V N (θ0 ; θ) B(θ 0 ,δ) 6 ˜ −1 + o(1) − W (θ0 ) dP (θ) ˜ + o(1) ≤ ε + o(1) + o(1). =

I(M, θ0 ; θ)

B(θ 0 ,δ)

It follows since N /N → 1 as N → ∞ that lim sup N V N (θ0 )−W (θ0 ) ≤ ε. N →∞

Since ε was arbitrary, we obtain Eq. (85). 7.3. Analysis of the conditional mean quadratic error We first consider the case of impure states, with the parameterization 1 ρ = (I + θ.σ), with (θi )2 < 1. (88) 2 where we have imposed that the state is never pure. This case turns out to allow the most explicit and straightforward analysis because the relation between the frequency of the outcomes and the parameters θ is linear. For other cases the analysis is more delicate, and is discussed in the next subsection. In general, smoothness assumptions will have to be made on the parameterization ρ = ρ(θ). We suppose that W (θ) is non-singular and continuous in θ. Consequently the γi (defined in Sec. 7.1) depend continuously on θ, and are all strictly positive at the true value θ0 of θ. Given the initial estimate, the second state measurement can be implemented as follows: for each of the N = N − N0 observations, independently of one another, with probability γi , measure the projectors P±mi , in other γi , do words, measure the spin observable mi .σ. With probability 1 − nothing. We emphasize that the γi and mi all depend on the initial estimate θ˜ ˜ and H(θ). ˜ In the following, all probability calculations are through W (θ) ˜ conditional on a given value of θ. For simplicity we will modify the procedure in the following two ways: first, rather than taking a random number of each of the three types of measurement, we will take the fixed (expected) numbers Aγi N B (and neglect

entire

December 28, 2004

13:56


208


the difference between Aγi N B and γi N ). Second, we will ignore the con straint (θi )2 ≤ 1. These two modifications make the maximum likelihood estimator easier to analyze, but do not change its asymptotic MQE. Later we will sketch how to extend the calculations to the original constrained maximum likelihood estimator based on random numbers of measurements of each observable. Now measuring mi .σ produces the values ±1 with probabilities p±i = 1 2 (1 ± θ.mi ). Since our data consist of three binomially distributed counts and we have three parameters θ1 , θ2 and θ3 , the maximum likelihood estimator can be described, using the invariance of maximum likelihood estimators under 1–1 reparametrization, as follows: set the theoretical values p±i equal to their empirical counterparts (relative frequencies of ±1 in the γi N observations of the ith spin) and solve the resulting three equations in three unknowns. To be explicit, define ηi = 2p+i − 1 = θ.mi and let ηî be its empirical counterpart. Recall that mi = gi / gi and gi = H 1/2 fi , where the fi are the orthonormal eigenvectors of H −1/2 GH −1/2 , and where H and G are ˜ G(θ), ˜ and θ˜ is the preliminary estimate of θ. Then we can rewrite H(θ), ηi = θ · mi = θ · gi / gi = θ · H 1/2 fi / H 1/2 fi = (H 1/2 θ) · fi / H 1/2 fi , from which we obtain (H 1/2 θ) · fi = H 1/2 fi ηi and hence θ = H −1/2

H 1/2 fi ηi fi .

i

The same relation holds between θˆ and ηˆ and yields the sought for expression for θˆ in terms of the empirical relative frequencies. Observing that ηî are independent with variance 4p+mi p−mi /(γi N ) = ˆ conditional on the preliminary (1 − (θmi )2 )/(γi N ), the MQE matrix of θ, ˜ estimate θ, is 1 (θ0 · H 1/2 fi )2 ˜ = 1 V N (θ0 ; θ) 1 −

H 1/2 fi 2 H −1/2 (fi ⊗fi )H −1/2 . N i γi

H 1/2 fi 2 (89)

entire

December 28, 2004

13:56



209

There is no o(1/N ) term here, so we do not have to check uniform convergence: the limiting value is attained exactly. Actually we cheated by replacing Aγi N B by γi N . This does introduce a o(1/N ) error into Eq. (89) ˜ are uniformly in a neighbourhood of θ0 in which γi , which depend on θ, bounded away from zero, and H and its inverse are bounded. One may verify that Eq. (89) reduces to W (θ0 )/N at θ˜ = θ0 (indeed 2 2 2 2 2 ˜ (θ˜ · H 1/2 fi )2 = n fiz2 and H 1/2 fi 2 = 1−n +n2 fiz ). But this at θ0 = θ, 1−n 1−n computation is really superfluous since, at this point, we are computing the MQE of the maximum likelihood estimator based on a measurement with, by our construction, Fisher information equal to the inverse of W (θ0 ). (The modifications to our procedure do not alter the Fisher information). The two quantities must be equal by the classical large sample results for the maximum likelihood estimator. We finally need to show the continuity in θ˜ at θ˜ = θ0 of N times the quantity in Eq. (89). This is evident if γi are all different at θ0 . Both 1 1 the eigenvalues and the eigenvectors of H − 2 GH − 2 are then continuous functions of θ˜ at θ0 . However, there is a potential difficulty if some γi are equal to one another at θ˜ = θ0 . In this case, the eigenvectors fi are not continuous functions of θ˜ at this point, and not even uniquely defined there. We argue as follows that this does not destroy continuity of the mean quadratic error. Consider a sequence of points θñ approaching θ0 . This generates a sequence of eigenvectors fin and eigenvalues γin . The eigenvalues converge to the γi but the eigenvectors need not converge at all. However by compactness of the set of unit vectors in R3 , there is a subsequence along which the eigenvectors fin converge; and they must converge to a possible choice of eigenvectors at θ0 . Thus along this subsequence the mean quadratic error (89) does converge to a limit given by the same formula evaluated at the limiting fi etc. But this limit is equal by construction to the inverse of the target information matrix G(θ). A standard argument now shows that the limiting mean quadratic error is continuous at θ˜ = θ0 . The MQE of θˆ given θ˜ (times N ) therefore converges uniformly in a sufficiently small neighbourhood of θ0 to a limit continuous at that point, and is equal to W (θ0 ) there. In our derivation of Eq. (85) we required the parameter and its estimator to be bounded. By dropping the constraint on the length of θ we have inadvertently lost this property. Suppose we replace our modified estimator θˆ by the actual maximum likelihood estimator respecting the constraint. The two only differ when the unconstrained estimator lies outside the unit sphere; but this event only occurs with an exponentially small probability,

entire

December 28, 2004

210

13:56



˜ provided γi are uniformly bounded away from 0 in the given uniformly in θ, neighbourhood of θ0 . From this it can be shown that the mean quadratic ˜ error is altered by an amount o(1/N ) uniformly in θ. If we had worked with random numbers of measurements of each spin variable, when computing the mean quadratic error we would first have copied the computation above conditional on the numbers of measurements, say Xi , of each spin mi . These numbers are binomially distributed with parameter N and γi . The conditional mean quadratic error would be the same as the expression above but with 1/(γi N ) replaced by 1/Xi (and special provision taken for the possible outcome Xi = 0). So to complete the argument we must show that E(1/Xi ) = 1/(γi N ) + o(1/N ) uniformly ˜ This can also be shown to be true, using the fact that Xi /N only in θ. differs from its mean by more than a fixed amount with exponentially small probability as N → ∞, and we restrict attention to θ˜ in a neighbourhood of θ0 , where γi are bounded away from zero. Inspection of our argument shows that the convergence of the mean quadratic error is uniform in θ0 , as long as we keep away from the boundary of the parameter space. By the convergence of the normalized binomial distribution to the normal distribution, the representation of the estimator we gave above also shows that it is asymptotically normally distributed with asymptotic covariance matrix equal to the target covariance matrix W . Moreover, if X 1 has the binomial (n, p) distribution, then n 2 (X/n − p) converges in distribution to the normal with mean zero and variance p(1 − p), uniformly in p. Thus the convergence in distribution is also uniform in θ0 as long as we keep away from the boundary of the parameter space.

7.4. Conditional mean quadratic error for other models The preceding subsection gave a complete analysis of the mean quadratic error, given the preliminary estimate θ˜ for the 3 unknown parameters θj , of parameterization (88). We shall first analyze the mean quadratic error when the unknown parameters are functions φi (θj ) of the parameters θj . We shall then consider the important case when the state is pure and depends on two unknown parameters, and finally the case when the state is pure or mixed and depends on one unknown parameter, or is mixed and depends on two unknown parameters. Our first result is that if the change of parameters φi (θj ) is locally C 1 , then the MQE matrix of the φi is obtained from the MQE of the θj by the Jacobian ∂φi /∂θj except eventually at isolated points. This follows from the

entire

December 28, 2004

13:56



211

fact that under a smooth (locally C 1 ) parameterization, the δ method (firstorder Taylor expansion) allows √ usNto conclude a uniform convergence of the probability distribution of N (φ? − φ) to a normal limit with the target mean quadratic error. If the φi and their derivatives ∂φi /∂θj are bounded, then this proves our claim. If there are points where φi or their derivatives ∂φi /∂θj are infinite, then convergence in distributions not necessarily imply convergence of moments. However, a truncation device allows one to modify ? replacing it by 0 if any component is larger than cN a the estimate φ, for given c and a [use the method of Lemma II.8.2 [11], together with the exponential inequality (86) for the multinomial distribution]. With this 0 minor modification one can show (uniform in φ in √ a neighbourhood of φ ) ? convergence of the moments of the corresponding N (φ−φ) to the moments of its limiting distribution, and hence achievement of the bound in the sense of Theorem 4. In particular, if the parameter φ is bounded then the truncation is superfluous. Now turn to the pure state analog of model (88). Obtain a preliminary estimate of the location of ρ on the surface of the Poincaré sphere using the same method as in the mixed case, but always projecting onto the surface of the sphere. Next, after rotation to transform the preliminary estimate into “spin-up,” reparameterize to ρ = 12 (1 + φ · σ), where the parameters to be estimated are (φ1 , φ2 ) = (θ1 , θ2 ) of the parameterization (81) while √ φ3 = (1 − φ21 − φ22 ). The preliminary estimate is at φ1 = φ2 = 0. The optimal measurement at this point according to Sec. 7.1 consists of measurements of the spins σ1 and σ2 on specified proportions of the remaining copies. The resulting estimator of the parameter (φ1 , φ2 ) is a linear function of binomial counts, and hence its mean quadratic error can be studied exactly as in Sec. 7.3. Then we must transfer back to the originally specified parameterization, for instance polar coordinates. This is done as in the preceding paragraph. If the transformation is locally C 1 , then uniform convergence in distribution to the normal law also transfers back; there is also a convergence of the mean quadratic error if the original parameter space is bounded. Otherwise a truncation might be necessary. In any case, we can exhibit a procedure optimal in the sense of Theorem 4. It remains to consider one- and two-dimensional sub-models of the full mixed model, and one-dimensional sub-models of the full pure model. We suppose that the model specifies a smooth curve or surface in the interior of the Poincaré sphere, or a smooth curve on its surface; smoothly parameterized by a one- or two-dimensional parameter as appropriate. The first stage of the procedure is just as before, finishing in projection of an estimated

entire

December 28, 2004

212

13:56



density matrix into the model. Then we reparameterize locally, augmenting the dimension of the parameter to convert the model into a full mixed or pure model, respectively. The target information for the extra parameters is zero. Compute as before the optimal measurement at this point. Because of the zero values in the target information matrix, there will be zero eigenvalues γi in the computation of Sec. 7.1. Thus the optimal measurement will involve specified fractions of measurement of spin in the same number of directions as the dimension of the model. Compute the maximum likelihood estimator of the original parameters based on this data. If the parameterization is smooth enough, the estimator will yet again achieve the bound of Theorem 4.

8. Conclusions and Open Questions In this paper we have solved some of the theoretical problems that arise when trying to estimate the state of a quantum system of which one possesses a large number of copies. This constitutes a preliminary step toward solving the question with which Helstrom concluded his book [8]: “. . . mathematical statisticians are often concerned with asymptotic properties of decision strategies and estimators, . . . When the parameters of a quantum density operator are estimated on the basis of many observations, how does the accuracy of the estimates depend on the number of observations as that number grows very large? Under what conditions have the estimates asymptotic normal distributions? Problems such as these, and still others that doubtless will occur to physicists and mathematicians, remain to be solved within the framework of the quantum-mechanical theory.” In the case of pure states of spin-1/2 particles, the problem has been solved. The key result is that in the limit of large N , the variance of the estimate is bounded by Eq. (28), and the bound can be attained by separate von Neumann measurements on each particle. In the case of mixed states of spin-1/2 particles the state estimation problem for large N has been solved if one restricts oneself to separable measurements. However, if one considers nonseparable measurements, then one can improve the quality of the estimate, which shows that the Fisher information, which in classical statistics is additive, is no longer so for quantum state estimation. For the case of mixed states of spin-1/2 particles, or for higher spins, we do not know what the “outer” boundary of the set of (rescaled) achievable Fisher information matrices based on arbitrary (nonseparable) measure-

entire

December 28, 2004

13:56



213

ments of N systems looks like. We have some indications about the shape of this set (see Sec. 5.6), and we know that it is convex and compact. Acknowledgments S.M. thanks Utrecht University, The Netherlands, where part of this work was carried out. He is research associate of the Belgium National Research Fund. R.D.G. thanks the generous hospitality of the Department of Mathematics and Statistics, University of Western Australia. References [1] O.E. Barndorff-Nielsen and R.D. Gill, quant-ph/9808009, (1989)∗ . [2] C.H. Bennett, D.P. DiVincenzo, C.A. Fuchs, T. Mor, E. Rains, P.W. Shor, J.A. Smolin, and W.K. Wootters, Phys. Rev. A, 59, 1070, (1999). [3] S.L. Braunstein and C.M. Caves, Phys. Rev. Lett., 72, 3439, (1994). [4] H. Cramér, Mathematical methods of statistics, Princeton University, Princeton, NJ, 1946. [5] A.P. Dempster, N.M. Laird, and D.R. Rubin, J. Roy. Statist. Soc. B, 39, 1, (1977). [6] T.J. Dunn, I.A. Walmsley, and S. Mukamel, Phys. Rev. Lett., 74, 884, (1995). [7] R.D. Gill and B.Y. Levit, Bernouilli 1(1/2), 59, (1995). [8] C.W. Helstrom, Quantum detection and estimation theory, Academic, New York, 1976. [9] A. S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, Amsterdam, 1982. [10] Z. Hradil, J. Summhammer, and H. Rauch, quant-ph/9806014, (1998)† . [11] I.A. Ibragimov and R.Z Has’minskii, Statistical estimation—asymptotic theory, Springer, Berlin, 1981. Theorem III.1.1 (p. 174) is the theorem on uniform convergence of MLE’s. [12] C. Kurtsiefer, T. Pfau, and J. Mlynek, Nature (London) 386, 150, (1997). [13] D. Leibfried, D. Meekhof, B.E. King, C. Monroe, W.M. Itano, and D.J. Wineland, Phys. Rev. Lett., 77, 4281, (1996). [14] U. Leonhardt, Phys. Rev. Lett., 74, 4101, (1995)b. [15] S. Massar and S. Popescu, Phys. Rev. Lett., 74, 1259, (1995). (Chap. 22 of this book) [16] A. Peres and W.K. Wootters, Phys. Rev. Lett., 66, 1119, (1991). ∗ Editorial

note: This paper has been published as O.E. Barndorff-Nielsen and R.D. Gill, “Fisher information in quantum statistics,” J. Phys. A: Math. Gen., 30, 4481–4490 (2000). † Editorial note: This paper has been published as Z. Hradil, J. Summhammer, and H. Rauch, “Quantum tomography as normalization of incompatible observations,” Phys. Lett., A261, 20–24 (1999).

entire

December 28, 2004

214

13:56



[17] D.T. Smithey, M. Beck, M.G. Raymer, and A. Faridani, Phys. Rev. Lett., 70, 1244, (1993). [18] A.W. van der Vaart, in D. Pollard, E. Torgersen, and G.L. Yang, eds., Festschrift for Lucien le cam, Springer-Verlag, 1997, Chap. 27, 397-410. [19] H.L. van Trees, Detection, estimation and modulation theory, Wiley, New York, 1968. Pt. 1. [20] G. Vidal, J.I. Latorre, P. Pascual, and R. Tarrach, quant-ph/9812068, (1998)‡ . [21] K. Vogel and H. Risken, Phys. Rev. A., 40, 2847, (1989). [22] S. Weigert, quant-ph/9809065, (1998)§ . [23] W.K. Wootters, Phys. Rev. D, 23, 357, (1981). [24] H.P. Yuen and M. Lax, IEEE Trans. Inform. Theory, 19, 740, (1973).

Additional Note by Author The paper (*), Gill (2001), presents the same results as Gill & Massar (2002), but is written for mathematical statisticians rather than for physicists. It contains a deeper mathematical discussion of alternative, precise, definitions of asymptotic optimality, and of the regularity conditions needed to guarantee results of various type (e.g., optimality of mean square error of limit, versus optimal limit of mean square error; problem of phenomenon of super-efficient estimators). An important innovation of these papers was the use of the van Trees inequality, a Bayesian version of Cramér-Rao, in order to deal rigorously with these issues. (*) Richard D. Gill “Asymptotics in Quantum Statistics,” math.ST/0405571; pp. 255-285 in: State of the Art in Probability and Statistics, IMS Lecture Notes - Monograph Series 36 (2001), eds A.W. van der Vaart, M. de Gunst, C.A.J. Klaassen.

‡ Editorial note: This paper has been published as G. Vidal, J.I. Latorre, P. Pascual, and R. Tarrach, “Optimal minimal measurements of mixed states,” Phys. Rev. A, 60, 126 (1999). § Additional author’s note: This paper exists only on quant-ph, not in journal form. It reviews various analytical (i.e., assuming no statistical error) approaches to state reconstruction for finite dimensional systems, including some approaches of the the author, which can be found (via quant-ph) in some other, published, papers.

entire

December 28, 2004

13:56


PART III Quantum Cram´ er-Rao Bound in Pure States Model

Chap. 17: A. Fujiwara and H. Nagaoka “Quantum Fisher metric and estimation for pure state models” . . . . 220 Chap. 18: A. Fujiwara “Geometry of quantum estimation theory” . . . . . . . . . . . . . . . . . 229 Chap. 19: A. Fujiwara and H. Nagaoka “An estimation theoretical characterization of coherent states” . . . . 287 Chap. 20: K. Matsumoto “A geometrical approach to quantum estimation theory” . . . . . . . . 305

entire

December 28, 2004

13:56


entire

December 28, 2004

13:56


CHAPTER 16

Introduction to Part III 1. Quantum Cram´ er-Rao Bound in Pure States Model As mentioned in chapter 7, the asymptotic theory is one of the central issues in quantum statistical inference. However, the pure state case cannot be treated just as a special case of the mixed case. As is well known, a pure density matrix has no inverse. This property causes several difficulties in quantum estimation as well as other related topics. However, it has been found that this property of pure states simplifies the calculation of the CR bound. In the following, we review the history of these results. In Chap. 7, we mentioned that Helstrom [II-1] solved an optimization problem in the one-parameter case. The SLD is not uniquely defined in the pure states case. On the other hand, Holevo uniquely defined the SLD as a vector in the the quantum L2 space in a general setting (Chapters II and VI of Holevo [0-2]). Fujiwara and Nagaoka [Chap. 17] developed essentially the same argument for the pure state case in particular, and expressed the quantum L2 space as a quotient space, concretely. They also calculated SLD Fisher information in the pure state case (Also see Fujiwara [III-1]). Moreover, another difficulty is caused by the non-invertibility. As is mentioned in Chap. 7, Holevo derived the CR bound of the quantum Gaussian state family based on RLD Fisher information matrix. However, the RLD diverges, i.e., can not be defined in the pure state case. Thus, we cannot directly apply Holevo’s method to the family of pure coherent states while this family is the noiseless limit of the quantum Gaussian state family. In order to avoid this difficulty, Fujiwara and Nagaoka [III-2] invented another method and calculated the CR bound in this case. Their method can be applied to a certain two-parameter pure states family called the two-parameter coherent model, which contains the two-parameter pure states family in the quantum two-level system. Applying their method to the quantum two-level system, they provided a simpler proof than Nagaoka’s [Chap. 8] proof in this case. Combining both papers [Chap. 17, III-2], they pointed out the difference between the structures of SLD and RLD in the pure state case. Indeed, such a difficulty appears in the non-invertible case. Luati [III-3]

217

entire

December 28, 2004

218

13:56


Introduction to Part III

discussed POVMs achieving the optimal Fisher information when the rank of the density matrix is two. Furthermore, Fujiwara extended their result to the multi-parameter coherent model and calculated the CR bound of this model in his doctoral thesis [Chap. 18]. Since his thesis contains reviews of previous results, e.g., the paper [III-2] etc., as well as several original results, it is quite introductory. Following this result, Fujiwara and Nagaoka [Chap. 19] simplified its proof. Moreover, Matsumoto invented a more general approach that could be applied to any pure states model [Chap. 20, III-5]. Using this method, he solved this problem in any two-parameter pure states family. His result covers all known results regarding the CR bound in the pure state case. Next, we focus on the possibility of reducing the CR bound by the use of a quantum collective measurement on the tensor product space. It is easily checked by Matsumoto’s characterization of CR bounds in the pure states model, that the CR bound of the n-fold tensor product system in pure states model equals n1 times that for one single system. This is because the CR bound can be defined only by the inner products between the vectors ∂ψ(θ) ∂ψ(θ) ∂θ i − ψ(θ), ∂θ i ψ(θ), where ψ(θ) is the state vector. This fact does not appear in Matsumoto [Chap. 20], but does in Matsumoto [III-5]. As mentioned in Chap. 7, Hayashi and Matsumoto [Chap. 12] and Gill and Massar [Chap. 15] showed that the first-order coefficient of the optimal error equals the limit of n times the CR bound of the n-fold tensor product system. Combining it with the above statement, we can verify that the optimal error can be, asymptotically to the first order, attained by an adaptive measurement.

2. Geometrical Study of Quantum Estimation It is necessary to discuss other problems apart from calculating the CR bound. Since information geometry succeeded in geometrically characterizing several statistical properties in classical statistics [II-12], it is a natural problem to geometrically characterize quantum states families from an estimation-theoretical viewpoint. The first success of this direction is a quantum extension of the concept of ‘exponential family’ by Nagaoka [Chap. 9]. Nagaoka [Chap. 10, III-4] also considered quantum extension of exponential connection. Based on results, Fujiwara and Nagaoka [Chap. 17] and Fujiwara [Chap. 18] obtained remarkable results of geometrical aspects of quantum estimation theory as well as CR bounds of coherent model. They

entire

December 28, 2004

13:56


Introduction to Part III

219

characterized the quantum exponential family in the pure states case in a different form, and obtained an insightful relation with the time-energy uncertainty relation. Moreover, Matsumoto focused on Uhlmann geometry and Berry phase, and discussed their relation with quantum estimation theory [Chap. 20]. He also discussed the uncertainty principle from the viewpoint of quantum estimation theory [III-6]. Further Reading [III-1] A. Fujiwara, “One-parameter pure state estimation based on the symmetric logarithmic derivative,” Math. Eng. Tech. Rep. 94-8, Univ. Tokyo (1994). [III-2] A. Fujiwara and H. Nagaoka, “Coherency in view of quantum estimation theory,” in Quantum coherence and decoherence, edited by K. Fujikawa and Y. A. Ono (Elsevier, Amsterdam, 1996), p. 303. [III-3] A. Luati, “Maximum fisher information in mixed state quantum systems,” Annals of Statistics, 32, 1770–1779 (2004). [III-4] H. Nagaoka, “Differential geometrical aspects of quantum state estimation and relative entropy,” Quantum Communications and Measurement, (edited by Belavkin, Hirota, and Hudson), Plenum, New York, 1995, 449–452.; Math. Eng. Tech. Rep., 94-14, Univ. Tokyo (1994). [III-5] K. Matsumoto, “A new approach to the Cramer-Rao type bound of the pure state model,” J. Phys. A: Math. and Gen., 35, 3111–3124 (2002). [III-6] K. Matsumoto, “Uncertainty Principle in View of Quantum Estimation Theory,” Math. Eng. Tech. Rep., 97-08, Univ. Tokyo (1997).

entire

December 28, 2004

13:56


CHAPTER 17

Quantum Fisher Metric and Estimation for Pure State Models Akio Fujiwara and Hiroshi Nagaoka Abstract. A statistical parameter estimation theory for quantum pure state models is presented. We first investigate the basic framework of pure state estimation theory and derive quantum counterparts of the Fisher metric. We then formulate a one-parameter estimation theory, based on the symmetric logarithmic derivatives, and clarify the differences between pure state models and strictly positive models.

1. Introduction A quantum statistical model is a family of density operators ρθ defined on a certain separable Hilbert space H with finite-dimensional real parameters θ = (θi )ni=1 which are to be estimated statistically. In order to avoid singularities, the conventional quantum estimation theory [7, 8] has been often restricted to models that are composed of strictly positive density operators. It was Helstrom [6] who successfully introduced the symmetric logarithmic derivative for the one-parameter estimation theory as a quantum counterpart of the logarithmic derivative in the classical estimation theory. The right logarithmic derivative is another successful counterpart introduced by Yuen and Lax [15] in the expectation parameter estimation theory for quantum Gaussian models, which provided a theoretical background of optical communication theory. Quantum information theorists have also kept away from degenerate states, such as pure states, for mathematical convenience [12]. Indeed, the von Neumann entropy cannot distinguish the pure states, and the relative entropy diverge. In this paper, however, we try to construct an estimation theory for pure state models, and clarify the differences between the pure state case and the strictly positive state case.

Reprinted from Physics Letters, Vol 201A, A. Fujiwara and H. Nagaoka, “Quantum c Fisher metric and estimation for pure state models,” Pages 119-124, 1995, with permission from Elsevier. 220

entire

December 28, 2004

13:56


Quantum Fisher Metric and Estimation for Pure State Models

221

2. Preliminaries Let H be a Hilbert space with inner product ψ|ϕ for every ψ, ϕ ∈ H. Further, let L and Lsa are, respectively, the set of all (bounded) linear operators and all self-adjoint operators on H. Given a possibly degenerate density operator ρ, we define sesquilinear forms on L: (1) (A, B)ρ = Tr ρBA∗ , 1 (2) A, Bρ = Tr ρ(BA∗ + A∗ B), 2 where A, B ∈ L. These are pre-inner products on L, i.e., possess all properties of inner products except that (K, K)ρ and K, Kρ may be equal to zero for a nonzero K ∈ L. Note that the Schwarz inequality also holds for pre-inner product. The forms (·, ·)ρ and ·, ·ρ become inner products if and only if ρ > 0. If rank ρ = 1 or equivalently ρ2 = ρ, ρ is called pure. The following lemmas are fundamental. Lemma 1: Suppose ρ is pure. Then the following three conditions for linear operators K ∈ L are equivalent. (i) (K, K)ρ = 0, (ii) ρK = 0, (iii) Tr ρK = 0 and ρK + K ∗ ρ = 0. Proof: Let us express as ρ = |ψψ| where |ψ is a normalized vector in H. Then the following equivalent sequence (K, K)ρ = 0 ⇐⇒ ψ|KK ∗ |ψ = 0 ⇐⇒ ψ|K = 0 ⇐⇒ |ψψ|K = 0 yield (i)⇔(ii). Further, (ii)⇒(iii) is trivial. (iii)⇒(ii) is shown as follows. Operating ψ| from the left to the assumption |ψψ|K + K ∗ |ψψ| = 0, and invoking another assumption Tr ρK = 0 ⇔ ψ|K ∗ |ψ = 0, we have 0 = ψ|ψψ|K + ψ|K ∗ |ψψ| = ψ|K. Therefore |ψψ|K = 0. Lemma 2: Suppose ρ is pure. Then the following three conditions for linear operators K ∈ L are equivalent. (i) K, Kρ = 0, (ii) ρK = ρK ∗ = 0,

entire

December 28, 2004

13:56


Akio Fujiwara and Hiroshi Nagaoka

222

(iii) Tr ρK = 0, ρK + K ∗ ρ = 0, and ρK ∗ + Kρ = 0. Proof: This is a straightforward consequence of Lemma 1 since K, Kρ = 1 ∗ ∗ 2 [(K, K)ρ + (K , K )ρ ]. Lemma 3: Suppose ρ is pure. Then the following three conditions for self– adjoint operators K ∈ Lsa are equivalent. (i) K, Kρ = 0, (ii) ρK = 0, (iii) ρK + Kρ = 0. Proof: Straightforward by setting K = K ∗ in Lemma 2. Denote by K(ρ) the set of linear operators K ∈ L satisfying (K, K)ρ = 0, which are called the kernel of the pre-inner product (·, ·)ρ . Also denote by Ksa (ρ) the set of self-adjoint operators K ∈ Lsa satisfying K, Kρ = 0, which are called the kernel of the pre-inner product ·, ·ρ . Note Ksa (ρ) = K(ρ) ∩ Lsa . 3. Quantum Fisher Metric Suppose we are given an n-parameter pure state model: S = {ρθ ; ρ∗θ = ρθ , Tr ρθ = 1, ρ2θ = ρθ , θ ∈ Θ ⊂ Rn }.

(3)

We define a family of quantum analogues of the logarithmic derivative by ∂ρθ 1 = [ρθ Lθ,j + L∗θ,j ρθ ], j ∂θ 2

Tr ρθ Lθ,j = 0.

(4)

LSθ,j = LS∗ θ,j

(5)

For instance, ∂ρθ 1 = [ρθ LSθ,j + LSθ,j ρθ ], j ∂θ 2

defines the symmetric logarithmic derivative (SLD) LSθ,j introduced by Helstrom [6]. Furthermore, since every pure state model is written in the form ρθ = Uθ ρ0 Uθ∗ , where Uθ is unitary, we have another useful logarithmic derivative 1 ∂ρθ A A = [ρθ LA Tr ρθ LA LA∗ (6) θ,j − Lθ,j ρθ ], θ,j = 0, θ,j = −Lθ,j , ∂θj 2 which may be called the anti-symmetric logarithmic derivative (ALD). Indeed, the ALD is closely related to the local generator Aθ,j = −i(∂Uθ /∂θj )Uθ∗ of the unitary Uθ such as LA θ,j = −2i(Aθ,j − Tr ρθ Aθ,j ).

entire

December 28, 2004

13:56



223

Thus, (4) defines a certain family of logarithmic derivatives [10]. Denote by T (ρθ ) the linear span (over R) of logarithmic derivatives which satisfy (4). Lemma 4: Suppose ρθ is pure and an arbitrary linear operator A ∈ L is given. Then all the quantities (A, Lθ,j )ρθ are identical for every logarithmic derivatives Lθ,j ∈ T (ρθ ) which correspond to the same direction θj . Proof: Take any logarithmic derivatives Lθ,j and Lθ,j which correspond to the same θj , and denote K = Lθ,j − Lθ,j . Then, from (4), K satisfies condition (iii) in Lemma 1. Therefore (K, K)ρθ = 0 holds. This and the Schwarz inequality |(A, K)ρθ |2 ≤ (A, A)ρθ (K, K)ρθ , lead to (A, K)ρθ = 0 for all A ∈ L. From Lemma 4, we can define uniquely the complex Fisher information matrix Jθ for the family of logarithmic derivatives (4) whose (j, k) entry is (Lθ,j , Lθ,k )ρθ . The SLD is also not uniquely determined for pure state models. Denote by T S (ρθ ) the linear spam (over R) of SLDs which satisfy (5). Lemma 5: Suppose ρθ is pure and an arbitrary self-adjoint operator A ∈ Lsa is given. Then all the quantities A, LSθ,j ρθ are identical for every SLD LSθ,j ∈ T S (ρθ ) which corresponds to the same direction θj . Proof: By using Lemma 3, it is proved in the same way as Lemma 4. From Lemma 5, we can define uniquely the real Fisher information matrix JθS for the family of SLDs (5) whose (j, k) entry is LSθ,j , LSθ,k ρθ , which is called the SLD–Fisher information matrix. The above results are summarized by the following theorem. Theorem 6: Suppose {ρθ } is a pure state model. Then the complex Fisher information matrix Jθ = [(L θ,j , Lθ,k )ρθ ] and the SLD–Fisher information

matrix JθS = LSθ,j , LSθ,k ρθ are uniquely determined on the quotient spaces

T (ρθ )/K(ρθ ) and T S (ρθ )/Ksa (ρθ ), respectively. They are related by JθS = Re Jθ . The (j, k) entries of Jθ and JθS become (Jθ )jk = 4 Tr ρθ (∂k ρθ )(∂j ρθ ),

(7)

(JθS )jk = 2 Tr(∂j ρθ )(∂k ρθ ),

(8)

and

entire

December 28, 2004

224

13:56



respectively, where ∂j = ∂/∂θj . In particular, JθS is identical, up to a constant factor, to the Fubini–Study metric. Proof: We only need to prove (8). Differentiating ρθ = ρ2θ , ∂j ρθ = (∂j ρθ )ρθ + ρθ (∂j ρθ ).

(9)

This relation indicates that 2∂j ρθ is a representative of the SLD. Then (JθS )jk = 2∂j ρθ , 2∂k ρθ ρθ = 2 Tr ρθ [(∂j ρθ )(∂k ρθ ) + (∂k ρθ )(∂j ρθ )].

(10)

Further, multiplying ρθ with (9), we have ρθ (∂j ρθ )ρθ = 0.

(11)

Therefore, by using (9) and (11), (∂j ρθ )(∂k ρθ ) = [(∂j ρθ )ρθ + ρθ (∂j ρθ )][(∂k ρθ )ρθ + ρθ (∂k ρθ )] = (∂j ρθ )ρθ (∂k ρθ ) + ρθ (∂j ρθ )(∂k ρθ )ρθ . This, along with (10), leads to relation (8). Denoting ρθ = |ψψ| Tr(∂j ρθ )(∂k ρθ ) = 2[Re ∂j ψ|∂k ψ + ψ|∂j ψψ|∂k ψ], which is identical to the Fubini–Study metric [1, 9]. The Fubini–Study metric is known as a gauge invariant metric on a projective Hilbert space [13]. Theorem 6 gives another meaning of the Fubini– Study metric, i.e., the statistical distance. Wootters [14] also investigated from a statistical viewpoint the distance between two rays, and obtained d(ψ, ϕ) = cos−1 |ψ|ϕ|. This is identical, up to a constant factor, to the geodesic distance as measured by the Fubini–Study metric [2]. Theorem 6, together with the following Theorem 7, reveals a deeper connection between them. 4. Parameter Estimation for Pure State Models In this section, we give a parameter estimation theory for pure state models based on the SLD. Suppose we are given an n-parameter pure state model (3). In order to handle simultaneous probability distributions of possibly mutually non-commuting observables, an extended framework of measurement theory is needed (p. 53 [7], p. 50 [8]). An estimator for θ is identified to a generalized measurement which takes values on Θ. The expectation vector with respect to the measurement M at the state ρθ is defined as 6 ˆ M (dθ). ˆ Eθ [M ] = θP θ

entire

December 28, 2004

13:56



225

The measurement M is called unbiased if Eθ [M ] = θ holds for all θ ∈ Θ, i.e., 6 ˆ = θj , (j = 1, · · · , n). θˆj PθM (dθ) (12) Differentiation yields 6 ∂ ˆ = δj , θˆj k PθM (dθ) k ∂θ

(j, k = 1, · · · , n).

(13)

If (12) and (13) hold at a certain θ, M is called locally unbiased at θ. Obviously, M is unbiased iff M is locally unbiased at every θ ∈ Θ. Letting M be a locally unbiased measurement at θ, we define the covariance matrix Vθ [M ] = [vθjk ] ∈ Rn×n with respect to M at the state ρθ by 6 jk ˆ vθ = (θˆj − θj )(θˆk − θk )PθM (dθ). (14) A lower bound for Vθ [M ] is given by the following theorem, which is a pure state version of quantum Cramér–Rao theorem. Theorem 7: Given a pure state model ρθ , the following inequality holds for any locally unbiased measurement M : % &−1 . (15) Vθ [M ] ≥ JθS Proof: It is proved in almost the same way as the strictly positive case (p. 274 [8]), except that ·, ·ρθ is a pre-inner product now. When the model is one dimensional, the measurement M is identified with a certain self-adjoint operator T , and the inequalities in the theorem become scalar, i.e., 1 (16) Vθ [T ] ≥ S . Jθ In this case, the lower bound 1/ Tr ρθ (LSθ )2 can be attained by the unbiased estimators 2 dρθ + Kθ , ∀ Kθ ∈ Ksa (ρθ ), (17) T = θI + S Jθ dθ where I is the identity. Since dρθ /dθ and Kθ do not commute in general, the measurement which attains the lower bound (16) is not determined uniquely. This fact provides significant features in the pure state estimation theory. On the other hand, when the dimension n ≥ 2, the matrix equality in (15) cannot be attained in general, because of the impossibility of the

entire

December 28, 2004

13:56

226



exact simultaneous measurement of non-commuting observables (in von Neumann’s sense). We must, therefore, abandon the strategy of finding the measurement that minimizes the covariance matrix itself. Rather, we often adopt another strategy as follows: Given a positive definite real matrix G = [gjk ] ∈ Rn×n , find the measurement M that minimizes the quantity gjk vθjk . (18) tr GVθ [M ] = jk

If there is a constant C such that tr GVθ [M ] ≥ C holds for all M , C is called a Cramér–Rao type bound or simply a CR bound, which may depend on both G and θ. For instance, it has been shown that the following quantity is a CR bound [8, 11]. C S = tr G(JθS )−1 . This bound is, however, not always the most informative one unless n = 1. For instance, it has been shown that the CR bound based on the right logarithmic derivative is the most informative one for coherent models [3, 4]. Nevertheless, there have been few results that derived the most informative CR bounds, as the strictly positive case. The construction of the general quantum parameter estimation theory for n ≥ 2 is left for future study. 5. Examples Here we give two examples of one-parameter pure state estimation. Let us first consider a model of the form ρθ = eiθH/ ρ0 e−iθH/ .

(19)

Here, H is the time independent Hamiltonian of the system, the Planck constant, and θ the time parameter. We first assume ρ0 > 0. According to the one-parameter estimation theory for strictly positive models [10], V [T ] ≥

1 1 ≥ J(Lθ ) JθS

(20)

holds, where Lθ is any logarithmic derivative which satisfy (4) and J(Lθ ) = (Lθ , Lθ )ρθ . The first equality is attained iff T = θI +LSθ /JθS , and the second equality holds iff Lθ = LSθ . (Note LSθ is unique for strictly positive models.) Now, LA θ = −2i(H − Tr ρθ H)/ is the ALD for the model (19) and the corresponding Cramér–Rao inequality becomes Vθ [T ] ≥

2 , 4Vθ [H]

(21)

entire

December 28, 2004

13:56



227

where T is an arbitrary unbiased estimator T for the time parameter θ. This inequality is nothing but a time-energy uncertainty relation. It is notable that the lower bound cannot be attained for any T since LA θ is not the SLD for strictly positive models. We next assume that ρ0 is pure. In this case, Theorem 6 and (16) (17) assert that the equality in (21) is locally attainable. This is a significant difference between the strictly positive models and the pure state models. Since both the ALD LA θ = −2i(H − Tr ρθ H)/ and the SLD–Fisher information for the pure state models JθS = 2 Tr(dρθ /dθ)2 can be obtained directly from the Liouville–von Neumann equation, this result is not specific to the case where the Hamiltonian is time independent, but is quite general. We next consider the efficiency of an estimator. An unbiased estimator T is called efficient if the equality in (16) holds for all θ ∈ Θ. It has been proved in [10] that a one-parameter model ρθ has an efficient estimator iff the model takes the form 1

1

ρθ = e 2 [β(θ)T −γ(θ)] ρ0 e 2 [β(θ)T −γ(θ)],

(22)

where β(θ), γ(θ) are real functions. Now, let us consider a model of the form ρθ = eif (θ)A ρ0 e−if (θ)A , where f (θ) is a real monotonic odd function and A ∈ Lsa . If ρ0 > 0, then it is shown that there exists an efficient estimator for θ only when A is a canonical observable [5]. This cannot be the case unless the underlying Hilbert space H is infinite dimensional. On the other hand, if ρ0 is pure, then there may exist an efficient estimator even if A is not canonical, because of the uncertainty Kθ ∈ Ksa (ρθ ) in (17). For instance, the spin 1/2 model 7 8 1 π 1 11 f (θ) = ( − cos−1 θ), A = σy , ρ0 = , 2 2 2 11 has an efficient estimator σz for the parameter θ. Indeed, this model permits another form 7 7 8 8 ; 1 1 1+θ 1+θ log σz ρ0 exp log σz . ρθ = 1 − θ2 exp (23) 4 1−θ 4 1−θ This is not a paradox since, in the pure state model, the estimator which attains the Cramér–Rao bound (16) is adjustable for every points ρθ up to the uncertainty of the kernel Ksa (ρθ ), see also [5].

entire

December 28, 2004

13:56

228



6. Conclusions A quantum estimation theory for pure state models was presented. We first derived some mathematical lemmas to treat the pure state models and derived quantum counterpart of the Fisher metric. The statistical significance of the Fubini–Study metric was also stressed. We then formulated the oneparameter pure state estimation theory based on the symmetric logarithmic derivative and disclosed the characteristics of pure states. Some examples were given in order to clarify the difference between the pure state models and the strictly positive models. The construction of the general quantum multi-parameter estimation theory is, however, left for future study, as the strictly positive model case. References [1] S. Abe, “Quantized geometry associated with uncertainty and correlation,” Phys. Rev. A, 48 4102 (1993), and the references therein. [2] J. Anandan and Y. Aharonov, Phys. Rev. Lett., 65, 1697, (1990). [3] A. Fujiwara, “Multi-parameter pure state estimation based on the right logarithmic derivative,” Math. Eng. Tech. Rep., 94-9, Univ. Tokyo (1994). [4] A. Fujiwara, “Linear random measurements of two non-commuting observables,” Math. Eng. Tech. Rep., 94-10, Univ. Tokyo (1994). [5] A. Fujiwara, “Information geometry of quantum states based on the symmetric logarithmic derivative,” Math. Eng. Tech. Rep., 94-11, Univ. Tokyo (1994). [6] C.W. Helstrom, Phys. Lett., 25A, 101 (1967). [7] C.W. Helstrom, Quantum Detection and Estimation Theory, Academic Press, New York, 1976. [8] A.S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory, NorthHolland, Amsterdam, 1982. [9] S. Kobayashi and K. Nomizu, Foundations of Differential Geometry, II, John Wiley, New York, 1969. [10] H. Nagaoka, In Proceeding of 10th Symposium on Information Theory and Its Applications, p. 19–21, 1987. (Chap. 9 of this book) [11] H. Nagaoka, IEICE Technical Report, 89, 228, IT 89-42, 9–14 (1989). (Chap. 8 of this book) [12] D. Petz, Entropy in Quantum Probability I, in Quantum Probability and Related Topics VII, 275–297, World Scientific, Singapore, 1992. [13] J.P. Provost and G. Vallee, Comm. Math. Phys., 76, 289 (1980). [14] W.K. Wootters, Phys. Rev. D, 23, 357 (1981). [15] H.P. Yuen and M. Lax, IEEE Trans. Inform. Theory, 19, 740 (1973).

entire

December 28, 2004

13:56


CHAPTER 18

Geometry of Quantum Estimation Theory Akio Fujiwara∗ 1. Introduction Quantum estimation theory, the quantum counterpart of the classical statistical estimation theory, originated in the study of finding the optimal detection of continuous signals in optical communication theory. The purpose of quantum estimation theory is to find the optimal measurement that gives the best estimates of the true state which lies in a certain parametric family of quantum states. A quantum statistical model is a family of density operators ρθ defined on a certain separable Hilbert space H with finite-dimensional real parameters θ = (θi )ni=1 which are to be estimated statistically. It was Helstrom [14] who founded the quantum estimation theory and successfully introduced the symmetric logarithmic derivative for the one-parameter estimation theory as a quantum counterpart of the logarithmic derivative in the classical estimation theory. Yuen and Lax [32] investigated the two-parameter case in the expectation parameter estimation for quantum Gaussian models by introducing the right logarithmic derivative, where each parameter corresponds to mutually non-commuting two observables. These results have provided theoretical background of optical communication theory. Though many other researches have been made in multi-parameter estimation theory, the general solution is far from reaching as yet. For the reader’s convenience, we first give a brief summary of the conventional quantum parameter estimation theory. For details, consult [6, 15, 17]. Let H be a separable Hilbert space which corresponds to a physical system with inner product φ|ψ, (φ, ψ ∈ H ), and L and Lsa be, respectively, Editorial note: This chapter is reprinted from Part II “Geometry of quantum estimation theory” of A. Fujiwara’s Doctoral thesis A Geometrical Study in Quantum Information Systems. Originally, this part has three chapters. However, in this book, every paper is treated as one chapter. Thus, these three chapters are reprinted as three sections. ∗ Additional author’s note: This chapter is partly based on a joint work with H. Nagaoka. 229

entire

December 28, 2004

13:56


Akio Fujiwara

230

the set of all (bounded) linear operators and all self-adjoint operators on H . A quantum state is represented by a density operator ρ ∈ Lsa which satisfies ρ ≥ 0 and Tr ρ = 1. A state ρ is called pure if rank ρ = 1. In order to handle simultaneous probability distributions of possibly mutually noncommuting observables, an extended framework of measurement theory is needed (p. 53 [15], p. 50 [17]). A generalized measurement {M (B)}B∈F on a measurable space (Ω, F ) is an operator-valued set function which satisfy the following axioms: 1 M (φ) = 0, M (Ω) = I, 2 M (B) = M (B)∗ ≥ 0, (∀ B ∈ F ), $ 3 M ( j Bj ) = j M (Bj ), (for all at most countable disjoint sequence {Bj } ⊂ F). By fixing a state ρ and a measurement M , outcomes of the measurement form random variables, whose simultaneous distribution is given by PρM (B) = Tr ρM (B). In particular, a measurement M is called simple if it satisfies, in addition to the above three axioms, M (B)2 = M (B), (∀ B ∈ F ), or equivalently M (B1 )M (B2 ) = 0, (∀ B1 ∩ ∀ B2 = φ). In the following, simple measurement is denoted by E. Here we give an example of generalized measurement, which is called the random measurement (p. 70 [15]). Imagine N measuring apparatuses, which yield outcomes from the same finite set whose elements being labeled with the integers j. Suppose the nth apparatus is described by a (n) simple measurement {Ej }j . Select an apparatus n (n = 1, 2, · · · , N ) with probability ξn and apply to the physical system. Then the probability for obtaining the jth outcome is Pρ (j) =

N

(n)

ξn Tr ρEj

= Tr ρMj ,

n=1

where Mj =

N

(n)

ξn Ej

n=1

form a generally non-orthogonal measurement. Let S = {ρθ ; ρθ = ρ∗θ > 0, Tr ρθ = 1, θ ∈ Θ ⊂ Rn }

(1)

be the statistical parametric model composed of strictly positive density operators. Here, θ is the parameter to be estimated statistically. An estimator for θ is identified to a generalized measurement which takes values

entire

December 28, 2004

13:56


Geometry of Quantum Estimation Theory

231

on Θ. The expectation vector with respect to the measurement M at the state ρθ is defined as 6 ˆ M (dθ). ˆ Eθ [M ] = θP θ The measurement M is called unbiased if Eθ [M ] = θ holds for all θ ∈ Θ, i.e., 6 ˆ = θj , (j = 1, · · · , n). θˆj PθM (dθ) (2) Differentiation yields 6 ∂ ˆ = δj , θˆj k PθM (dθ) k ∂θ

(j, k = 1, · · · , n).

(3)

If (2) and (3) hold at a certain θ, M is called locally unbiased at θ. Obviously, M is unbiased iff M is locally unbiased at every θ ∈ Θ. A locally unbiased measurement is accompanied by n self-adjoint operators: 6 j ˆ (j = 1, · · · , n). (4) X = (θˆj − θj )M (dθ), The locally unbiasedness conditions (2) (3) are then re-expressed as ∂ρθ j X = δkj , (j, k = 1, · · · , n). (5) ∂θk Since these self-adjoint operators {X j } are usefully employed in the following, we may call them locally unbiased operators. Letting M be a locally unbiased measurement at θ, we define the covariance matrix Vθ [M ] = [vθjk ] ∈ Rn×n with respect to M at the state ρθ by 6 jk ˆ (6) vθ = (θˆj − θj )(θˆk − θk )PθM (dθ). Tr ρθ X j = 0,

Tr

In order to obtain lower bounds for Vθ [M ], let us consider a quantum analogue of the logarithmic derivative denoted by Lθ : ∂ρθ 1 = [ρθ Lθ,j + L∗θ,j ρθ ]. j ∂θ 2

(7)

For instance, 1 ∂ρθ = [ρθ LSθ,j + LSθ,j ρθ ], ∂θj 2

LSθ,j = LS∗ θ,j

(8)

defines the symmetric logarithmic derivative (SLD) LSθ,j introduced by Helstrom [14], and ∂ρθ = ρθ L R θ,j ∂θj

(9)

entire

December 28, 2004

13:56


Akio Fujiwara

232

defines the right logarithmic derivative (RLD) LR θ,j introduced by Yuen and Lax [32]. Thus, (7) defines a certain class of logarithmic derivatives. Correspondingly, we define the quantum analogue of Fisher information matrix Jθ = [(Lθ,j , Lθ,k )ρθ ], where the inner product (·, ·)ρ on L is defined by (A, B)ρ = Tr ρ BA∗ .

(10)

We also define another inner product on L as A, Bρ =

1 Tr ρ (BA∗ + A∗ B). 2

(11)

Then, the following quantum version of Cramér–Rao theorem holds. Proposition 1: For any locally unbiased measurement M , the following inequality holds: Vθ [M ] ≥ (Re Jθ )−1 ,

(12)

S where Re Jθ = (Jθ + Jθ )/2. In particular, for the SLD, Jθ = Re Jθ = LSθ,j , LSθ,k ρθ is called the SLD–Fisher information matrix. Moreover, for the RLD,

holds, where JθR matrix.

Vθ [M ] ≥ (JθR )−1 (13) R = (LR θ,j , Lθ,k )ρθ is called the RLD–Fisher information

When the model is one dimensional, the inequalities in the theorem become scalar. In this case, it is shown that the lower bound (Re Jθ )−1 = (Jθ )−1 becomes most informative, i.e., it takes the maximal value, iff the SLD is adopted, and the corresponding lower bound (JθS )−1 = 1/Tr ρθ (LSθ )2 can be attained by the estimator T = θI + LSθ /JθS , where I is the identity. Thus, the one parameter quantum estimation theory is quite analogous to the classical one when the SLD is used. On the other hand, for the dimension n ≥ 2, the matrix equalities in (12) and (13) cannot be attained in general because of the impossibility of the exact simultaneous measurement of non-commuting observables. In other words, this inequality is not so useful in the quantum multiparameter estimation theory, which is in contrast to the classical estimation theory. We must, therefore, abandon the strategy of finding the measurement that minimizes the covariance matrix itself. Rather, we often adopt

entire

December 28, 2004

13:56



233

an alternative strategy as follows: Given a non-negative definite real matrix G = [gjk ] ∈ Rn×n , find the measurement M that minimizes the quantity gjk vθjk . (14) tr GVθ [M ] = jk

If there is a constant C such that tr GVθ [M ] ≥ C holds for all M , C is called a Cramér–Rao type bound or simply a CR bound, which may depend on both G and θ. For instance, it is shown that the following two quantities are both CR bounds [23]. C S = tr G(JθS )−1 , C

R

=

tr G Re (JθR )−1

(15) + tr

abs G Im (JθR )−1 .

(16)

Here, for a given matrix X, Im X = (X − X)/2i, and tr abs X denotes the absolute sum of the eigenvalues of X. Let us call these CR bounds, respectively, the SLD–bound and the RLD–bound. The most informative CR bound is the maximum value of such C for given G and θ. Yuen and Lax [32] proved that the above C R is most informative for the Gaussian model, and they explicitly constructed the optimum measurement which attains C R . Holevo (p. 285 [17]) derived another CR bound which, though an implicit form, is not less informative than C S and C R . Nagaoka [23] (cf. [24]) investigated in detail the relation between these CR bounds. He also derived a new CR bound for two-dimensional models, which is not less informative than Holevo’s one, and obtained explicitly the most informative CR bound specific to the spin 1/2 model. The construction of the general quantum parameter estimation theory for n ≥ 2 is left for future study. 2. Dualistic Geometry Based on the Logarithmic Derivatives As was mentioned in Part I† , there exist a variety of manners to define quantum counterparts of geometrical notions used in the classical estimation theory such as divergence, logarithmic derivative, Fisher information, Cramér–Rao inequality, etc. Moreover, only a few of them have been proved crucial in solving concrete quantum estimation problems so far [14, 23, 32]. Therefore, if we invoke an easy analogy of a certain aspect of classical information geometry, we cannot expect fruitful geometrical viewpoints. In order not to construct abundance of useless imitations, we must stand upon † Editorial

note: “Part I” means Part I of A. Fujiwara’s Doctoral thesis A Geometrical Study in Quantum Information Systems.

entire

December 28, 2004

13:56


Akio Fujiwara

234

a deeper understanding of inherent geometrical aspects of quantum estimation theory. In this section, we offer a new geometrical structure of quantum states and investigate its significance in quantum estimation theory [10]. In Section 2.1, we define a dualistic geometrical structure of quantum states based on the symmetric logarithmic derivatives. This structure has a non-vanishing torsion field in general, so that there exists no divergence function on the quantum space. In Section 2.2, the autoparallelity of a quantum state model is investigated in detail, which also clarifies the difference between classical and various quantum criterions of estimation. In Section 2.3, a condition for a one-parameter unitary model to be autoparallel, i.e., to have an efficient estimator, is derived. This condition indicates the importance of canonical observables in an estimation theoretical viewpoint. Some examples are also presented.

2.1. Dualistic structure on quantum states For brevity, the readers are assumed familiar with some elementary knowledge of classical information geometry [2]. Let H be a separable Hilbert space which corresponds to a physical system with inner product φ|ψ, (φ, ψ ∈ H ). A quantum state is represented by a density operator ρ which satisfies ρ = ρ∗ ≥ 0 and Tr ρ = 1. In this paper, we further assume ρ > 0 for simplicity. Denote by P the totality of such strictly positive density operators. Let us start with definitions of quantum analogues of some basic notions in classical theory. We first define the mixture representation of a tangent vector ∂ at ρ by the isomorphism ∂ ↔ ∂ρ and call ∂ρ the (m) m-tangent vector. Denote their totality by Tρ (P ). On the other hand, as was mentioned in the previous section, there exist a variety of candidates of the quantum counterpart of the exponential representation of the tangent vector due to the non-commutativity. Nevertheless, we here adopt the isomorphism ∂ ↔ L defined by ∂ρ =

1 [ρL + Lρ], 2

L = L∗

as the exponential representation, since it gives the definition of the socalled symmetric logarithmic derivative (SLD) which leads us to the most informative Cramér–Rao bound in the one parameter estimation theory (e) [22]. Let us call L the e-tangent vector and denote their totality by Tρ (P ). These isomorphisms are characterized by the following schemata:

entire

December 28, 2004

13:56



235

• m-tangent vector (m)

(P )= {G ; G = G∗ , Tr G = 0}

∈

∈

Tρ (P ) = Tρ =

∂

G

where ∂ρ = G

• e-tangent vector Tρ (P ) = Tρ (P )= {H ; H = H ∗ , Tr ρH = 0} ∂

∈

∈

(e)

=

H

where

∂ρ =

1 [ρH + Hρ], 2

H = H∗

Correspondingly, we define two kinds of parallel translations: • m-parallel translation (m)

Tρ (P) = Tρ (P )EG ↓ ↓ ↓ (m) Tρ (P ) = Tρ (P )EG • e-parallel translation (e)

Tρ (P ) = Tρ (P )EH ↓ ↓ ↓ (e) Tρ (P ) = Tρ (P )EH − Tr ρ H These parallel translations naturally define two affine connections ∇(m) and ∇(e) , which are called the mixture connection (m-connection) and the exponential connection (e-connection). Further, by using the symmetrized inner product A, Bρ = 12 Tr ρ(A∗ B + BA∗ ), we define the Riemannian metric at ρ by gρ (∂i , ∂j ) = Li , Lj ρ =

1 Tr ρ(Li Lj + Lj Li ). 2

Since the metric can be written as gρ (∂i , ∂j ) = Tr (∂i ρ)Lj , the two connections ∇(m) and ∇(e) are mutually dual with respect to the metric g in the following sense: For arbitrary vector fields X, Y, Z on P, (m)

(e)

Xg(Y, Z) = g(∇X Y, Z) + g(Y, ∇X Z) holds. It is evident that both curvature tensors with respect to the connections ∇(m) and ∇(e) vanish, since the two parallel translations are defined independently of the choice of the path connecting ρ and ρ . The torsion tensor with respect to the m-connection also vanishes, whereas the torsion

entire

December 28, 2004

13:56


Akio Fujiwara

236

tensor T (e) with respect to the e-connection does not vanish in general since it becomes 1 (17) T (e) (∂j , ∂k )ρ = [[Lj , Lk ], ρ]. 4 From this fact, the divergence function does not exist on P in general. Let us prove (17). We first construct two e-parallel vector fields X1 and X2 by translating two arbitrarily fixed tangent vectors ∂1 , ∂2 at ρ0 with respect to the e-connection. Then (e)

(e)

∇X1 X2 = ∇X2 X1 = 0, and the torsion becomes (e)

(e)

T (e) (X1 , X2 ) = ∇X1 X2 − ∇X2 X1 − [X1 , X2 ] = −[X1 , X2 ]. Letting the e-representation of the tangent vector ∂j (j = 1, 2) at ρ0 be Lj , the e-representation of the tangent vector (Xj )ρ becomes (e)

(Xj )ρ = (Xj )ρ = Lj − Tr ρLj , which acts on ρ as 1 1 (e) (e) Xj ρ = ρ(Xj )ρ + (Xj )ρ ρ = [ (ρLj + Lj ρ) − 2ρTr ρLj ] . (18) 2 2 Since the quantity X2 X1 ρ describe the change of X1 ρ when ρ is slightly moved along X2 -direction, 1 (e) (e) (e) (e) X2 X1 ρ = (X2 ρ)(X1 )ρ + (X1 )ρ (X2 ρ) 2 1 = [ ((X2 ρ)L1 + L1 (X2 ρ)) − 2(X2 ρ)Tr ρL1 − 2ρTr (X2 ρ)L1 ] . 2 Substituting (18) into the above equation, we have 1 (ρL2 L1 + L1 L2 ρ)+ { symmetric terms with respect to L1 , L2 }. 4 We can evaluate X1 X2 ρ in the same way, yielding X2 X1 ρ =

1 [[L1 , L2 ], ρ]. 4 Now, since ρ0 is arbitrary, the torsion at any point ρ ∈ P becomes T (e) (X1 , X2 )ρ = [X1 , X2 ]ρ =

1 [[Lj , Lk ], ρ], (19) 4 where Lj is the e-representation of the tangent vector ∂j at ρ. For an (e) arbitrary submanifold M , the torsion TM is obtained by projecting T (e) onto the tangent space of M with respect to the Riemannian metric. T (e) (∂j , ∂k )ρ =

entire

December 28, 2004

13:56



237

2.2. Autoparallelity in quantum estimation theory In classical estimation theory, one of the most important geometrical notion is the autoparallelity of a model with respect to the e-connection, which is an equivalent condition for the existence of the efficient estimator of the model [2]. An e-autoparallel model is also called an exponential family which takes the form pθ (x) = exp θi fi (x) − ψ(θ) , where θ = (θ1 , · · · , θn ) is the n-dimensional e-affine parameter to be estimated statistically, ψ(θ) the normalization factor, and Einstein’s summa tion convention θi fi (x) = i θi fi (x) is used. In this section, we investigate the quantum counterpart of this notion in detail. Let us first consider some conditions relevant to the efficiency of the estimator. We add some terminologies: 1 A locally unbiased measurement M is called locally efficient at θ if Vθ [M ] ≤ Vθ [M ] holds for every locally unbiased measurement M at θ. A measurement M is called efficient if M is locally efficient for all θ. 2 Given an arbitrary weight (real symmetric non-negative matrix) G, a locally unbiased measurement M is called G–locally efficient at θ if tr GVθ [M ] ≤ tr GVθ [M ] holds for every locally unbiased measurement M at θ. Given an arbitrary weight field G = {Gθ | θ ∈ Θ}, a measurement M is called G–efficient if M is Gθ –locally efficient for all θ. In particular, if Gθ ≡ G for all θ, M is called G-efficient. There exist some evident relations between these notions, which are listed in the following propositions to put the issues in order. Proposition 2: The following conditions for a measurement M are equivalent: (i) (ii) (iii) (iv)

M is locally efficient at θ. For all G > 0, M is G–locally efficient at θ. For all v = (v1 , · · · , vn ) ∈ Rn , M is vT v–locally efficient at θ. M is locally unbiased at θ and Vθ [M ] = (Jθ )−1 holds at θ.

Proposition 3: The following conditions for a measurement M are equivalent:

entire

December 28, 2004

238

(i) (ii) (iii) (iv) (v)

13:56


Akio Fujiwara

M is efficient. For all G, M is G–efficient. For all G > 0, M is G–efficient. For all v = (v1 , · · · , vn ) ∈ Rn , M is vT v–efficient. M is unbiased and Vθ [M ] = (Jθ )−1 holds for all θ.

Proposition 4: Given a model S = {ρθ | θ ∈ Θ}, consider the following conditions: (i) S has an efficient measurement. (ii) S has (possibly G-dependent) G–efficient measurements MG for all G. (iii) S has (possibly G-dependent) G–efficient measurements MG for all G > 0. (iv) S has (possibly vT v-dependent) vT v–efficient measurements M{vT v} for all v = (v1 , · · · , vn ) ∈ Rn . (v) There exists a certain G for which S has a G–efficient measurement. (vi) S is e-autoparallel, all the SLDs commute, and θ is an m-affine coordinate system. In classical theory, all these conditions are equivalent. In quantum case, however, only the following relations hold in general:   (iii) (i) ⇐⇒ (vi), (i) =⇒ (ii) =⇒ (iv)  (v) Proposition 4 clarifies the difference between the classical and the quantum estimation theories. Indeed, since the SLDs do not commute in general, we cannot expect the existence of an efficient estimator. We therefore have adopted another strategy of minimizing the weighted sum of the covariances (14) instead of the classical minimization strategy of the covariance matrix itself. Nevertheless, Proposition 4 also indicates that there exist a variety of strategies which are not mutually equivalent in general. Indeed, the relations between the conditions (iii) (iv) (v) in Proposition 4 are not yet clear. By the way, the condition (iv) in Proposition 4 is closely related to the one-parameter quantum estimation theory which is rather well-established so far. Indeed, the restriction of the Cramér–Rao inequality (12) in the v-direction vVθ [M ]vT ≥ v(Jθ )−1 vT

(20) T

gives explicitly the locally attainable lower bound for tr (v v)Vθ [M ]. More precisely, the following lemma holds.

entire

December 28, 2004

13:56



239

Lemma 5: Given a model S = {ρθ ; θ = (θ1 , . . . , θn ) ∈ Θ} and an arbitrary v ∈ Rn , consider the differential equation dθi = vj J ji , dt

(21)

where J ij is the (i, j) entry of (Jθ )−1 , and denote by θ(t) the solution of (21) for an arbitrarily fixed initial condition θ0 ∈ Θ. Then the restricted Cramér– Rao inequality (20) can be regarded as the one dimensional Cramér–Rao inequality for the one parameter sub-model ρθ(t) . Proof: Let Lj be the SLD for the parameter θj . For every locally unbiased measurement M of θ, 6 ˆ T (v) = vi θî M (dθ) becomes a locally unbiased estimator of the parameter θ(v) = vi θi at θ and satisfies 6 6 ˆ j = vi Re Tr Lj ρθ ˆ T (v) − θ(v), Lj ρθ = Re Tr ρθ vi θî M (dθ)L θî M (dθ) 6 6 ∂ρθ ˆ = vi ∂ Tr ρθ ˆ = vi ∂ θi = vj . θî M (dθ) θî M (dθ) = vi Tr j j ∂θ ∂θ ∂θj Then the orthogonal projection of T (v) − θ(v) onto the SLD–tangent space span {Lj }nj=1 with respect to the inner product ·, ·ρθ is S(v) = vj J ji Li since, by expressing S(v) = ci Li , vj = T (v) − θ(v), Lj ρθ = S(v), Lj ρθ = ci Li , Lj ρθ = ci Jij , where Jij , J ij are the (i, j) entries of Jθ , (Jθ )−1 , respectively. Therefore, the inequality T (v) − θ(v), T (v) − θ(v)ρθ ≥ S(v), S(v)ρθ characteristic of the projection is nothing but (20). Now, let us determine the sub-model ρθ(t) whose Cramér–Rao inequality becomes (20), i.e., whose SLD is S(v). Observing dρθ(t) 1 ∂ρθ dθi 1 dθi [S(v)ρθ(t) + ρθ(t) S(v)] = = = [Li ρθ + ρθ Li ] , i 2 dt ∂θ dt 2 dt we have S(v) = Li

dθi . dt

entire

December 28, 2004

240

13:56


Akio Fujiwara

Therefore, the desired sub-model is determined by the following differential equation dθi = vj J ji , dt

θ(0) = θ0 .

Since d i vi θ = vi J ij vj > 0, dt we can take t = vi θi , which proves the lemma. This lemma immediately leads us to the following theorem, which shows that the condition (iv) in Proposition 4 is a sufficient condition for the model to be totally e-geodesic. Theorem 6: If a model S = {ρθ ; θ = (θ1 , . . . , θn ) ∈ Θ} has (possibly v-dependent) vT v–efficient measurements for all v = (v1 , · · · , vn ) ∈ Rn , then S is totally e-geodesic and θ is the m-affine coordinate system. Proof: From the assumption, the restricted Cramér–Rao bound (20) can be attained by a certain θ-independent measurement. Then, by invoking Lemma 5, the sub-model ρθ(t) which is determined by v and ρ0 has an efficient estimator for t. Then ρθ(t) must be an e-geodesic and t is an maffine parameter. Since v is arbitrary, the model is totally e-geodesic and θ is m-affine. Theorem 6 asserts that the minimization strategy of tr GVθ [M ], (G ≥ 0) is closely related to the geometric structure of the model. Indeed, if the model is not totally e-geodesic (hence not e-autoparallel), we may not expect the existence of G-efficient measurement. Therefore the geodesic nature of a model is an important notion in quantum estimation theory. In general, an autoparallel submanifold is automatically a totally geodesic submanifold. Conversely, a totally geodesic submanifold becomes an autoparallel submanifold if the enveloping manifold is torsion free (II, p. 53 [20]). In this sense, the assumption of Theorem 6 may have little to do with the condition for a model to be e-autoparallel. Though a necessary and sufficient condition for a model to be e-autoparallel has been obtained by Nagaoka [27], the relation between the assumption of Theorem 6 and the e-autoparallelity of the model is not yet clarified so far.

entire

December 28, 2004

13:56



241

2.3. Autoparallelity in unitary models Since a model ρθ is represented, in general, by a spectral decomposition 6 ρθ = pθ (x)Eθ (dx), the parameter change is composed of two parts: eigenvalue part (classical part) and unitary part (purely quantum part). One extreme case is a model where Eθ is θ-independent (classical statistical model). Let us call another extreme a unitary model where pθ is θ-independent. Owing to group theoretical symmetry of the physical system, we often encounter such unitary models. Therefore, it is expected that the study of unitary models may bring us some important suggestions toward the construction of general quantum estimation theory. In this section, we investigate the geodesic nature of unitary models. Since Theorem 6 indicates the importance of the decomposition of the model into foliation of one-dimensional submanifolds, we consider here onedimensional unitary models and derive a necessary and sufficient condition for the model to be an e-geodesic. Let us consider a one parameter unitary model of the form ρt = eif (t)A ρ0 e−if (t)A ,

(22)

where A is a self-adjoint generator, ρ0 a strictly positive initial density operator, f (t) a real smooth monotonic odd function with one dimensional real parameter t. Let us recall the commutation operator D introduced by Holevo [17]. For an arbitrary state ρ and a self-adjoint operator X, D is defined by i(Xρ − ρX) =

1 ((DX)ρ + ρ(DX)), 2

(DX)∗ = DX.

(23)

This is an anti-symmetric super-operator such that A, DBρ = −DA, Bρ holds for all A, B ∈ Lsa . Note that DX is uniquely determined iff ρ > 0. Lemma 7: Denoting by Dt the commutation operator with respect to the unitary model ρt , then

Dt A = eif (t)A (D0 A)e−if (t)A , holds, and the SLD becomes Lt = f (t)eif (t)A (D0 A)e−if (t)A .

entire

December 28, 2004

13:56


Akio Fujiwara

242

Proof: dρt = if (t)[Aρt − ρt A] = if (t)eif (t)A [Aρ0 − ρ0 A]e−if (t)A dt 1 = f (t)eif (t)A [ρ0 (D0 A) + (D0 A)ρ0 ]e−if (t)A 2 1 = f (t) ρt eif (t)A (D0 A)e−if (t)A + eif (t)A (D0 A)e−if (t)A ρt . 2 This proves the lemma. Lemma 8: The unitary model (22) becomes an e-geodesic iff there exists a real function J(t) such that the self-adjoint operator T =

f (0) D0 A J(0)

satisfies the following relation eif (t)A T e−if (t)A =

f (0)J(t) (T − t). f (t)J(0)

In this case, T becomes the efficient estimator of the parameter t, and J(t) becomes the SLD–Fisher information of the model. Proof: In general, a one-parameter model ρt has an efficient estimator T of the parameter t iff there exists a real function J(t) such that Lt = J(t)(T − t) holds. Comparing this and Lemma 7, we have f (t)eif (t)A (D0 A)e−if (t)A = J(t)(T − t). Setting t = 0, we have the definition of T , which immediately leads to the lemma. Lemma 9: Suppose we are given self-adjoint operators A, B, and real functions f (t), g(t) such that f (−t) = −f (t), g(0) = 1. The equality of the form eif (t)A Be−if (t)A = g(t)(B − t) holds iff there exists a non-zero real number µ such that f (t) =

t , µ

g(t) = 1,

[A, B] = iµ.

(24)

entire

December 28, 2004

13:56



243

Proof: The sufficiency follows immediately from the expansion formula 2 1 it it itA/µ −itA/µ e Be = B + [A, B] + [A, [A, B]] + · · · . µ 2! µ We show the necessity. Since the left-hand side of (24) is an equi-spectrum transformation, the right-hand side is also so only when B has continuous spectrum, so that the Hilbert space H on which B acts must be infinite dimensional. Denoting the eigen-equation for B by ( |b ∈ H ,

B|b = b|b,

b ∈ R ).

(25)

Then, from (24), we have eif (t)A Be−if (t)A |b = g(t)(b − t)|b, or Be−if (t)A |b = g(t)(b − t)e−if (t)A |b, which indicates that e−if (t)A |b is an eigenvector of B with eigenvalue g(t)(b − t). On the other hand, operating eif (t)A to (25) from the left, and using (24), we have eif (t)A Be−if (t)A eif (t)A |b = b eif (t)A |b = g(t)(B − t)eif (t)A |b, which leads to

Beif (t)A |b =

b + t eif (t)A |b. g(t)

This indicates that eif (t)A |b is an eigenvector of B with eigenvalue (b/g(t)+ t). Now, e−if (t)A |b and eif (t)A |b are one-parameter family of eigenvectors of B which start from a common eigenvector |b and, since f (t) is assumed odd, these eigenvectors must be related by e−if (t)A |b = eif (−t)A |b. Therefore, the corresponding eigenvalues must be identical: b −t , g(t)(b − t) = g(−t) or 1 b g(t) − − t{g(t) − 1} = 0. g(−t) Since this relation must hold for any b and t, we have g(t) = 1. In this case, (24) is reduced to eif (t)A Be−if (t)A = B − t.

entire

December 28, 2004

13:56


Akio Fujiwara

244

Expanding the left-hand side as B + if (t)[A, B] +

{if (t)}2 [A, [A, B]] + · · · = B − t, 2!

and differentiating by t and setting t = 0, we have [A, B] = i/f (0), since f (0) = 0. Substituting this commutation relation into the above expansion, we see f (t) = f (0)t. Setting µ = 1/f (0), we have the conditions in the lemma. These lemmas immediately lead us to the following main theorem. Theorem 10: The one-parameter unitary model (22) becomes e-geodesic iff there exist non-zero reals µ and J such that f (t) =

t , µ

[A, D0 A] = iµ2 J

holds. In this case, T = D0 A/µJ is the efficient estimator for the parameter t, and J is the SLD–Fisher information of the model. Proof: From Lemma 9, the conditions derived in Lemma 8 are rewritten as f (t) =

t , µ

J(t) = 1, J(0)

[A, T ] = iµ,

T =

1 D0 A. µJ(0)

Setting J = J(0), we have the theorem. This theorem shows that, only when the generator is a canonical observable, the one parameter unitary model becomes an e-geodesic, i.e., it has an efficient estimator. Note that this fact is strongly indebted to the assumption that the model is strictly positive. For instance, this condition can be considerably loosened for pure state models, see the next section. Let us further determine ρ0 which satisfy the relation

D0 A = µJB,

[A, B] = iµ.

Taking the non-commutative Fourier transformation (see Appendix A) of the identity ρ0 A − Aρ0 =

i µJ[ρ0 B + Bρ0 ], 2

we have −xF x,k {ρ0 } = µ2 J

∂ Fx,k {ρ0 }. ∂x

entire

December 28, 2004

13:56



245

Integrating this under the condition F0,0 {ρ0 } = 1, we have F x,k {ρ0 } = exp(−

x2 )F (k), 2µ2 J

where F (k) is an arbitrary function which satisfies F (0) = 1 and some regularity conditions. Then ρ0 is written in the Weyl representation as 7 8 6 dxdk i x2 . (26) ρ0 = exp(− 2 )F (k) exp − (kA + xB) 2µ J µ 2πµ For instance, it becomes a quantum Gaussian state if we take F (k) to be Gaussian, see Appendix A. Corollary 11: Quantum Gaussian model is e-autoparallel. Proof: For a Gaussian model, SLDs are Lq = (Q − q)/σq2 ,

Lp = (P − p)/σp2 ,

the commutation operator

DQ = (P − p)/σp2 , and the SLD Fisher metric

DP = −(Q − q)/σq2 ,

7

8 1/σp2 0 J= , 0 1/σq2

as shown in Appendix A. The restricted one-dimensional sub-model in the direction v = (vq , vp ) is then determined by the SLD S = vj J ji Li = vq (Q − q) + vp (P − p) =

1 DR,

where R = −vq σq2 P + vp σp2 Q. Note R is independent of q and p. The submodel is therefore written also in the form ρt = eiRt/ ρ0 e−iRt/ . Observing % & [R, D0 R] = i2 vq2 σq2 + vp2 σp2 we can conclude, with the help of Theorem 10, that the sub-model is an e-geodesic. Since v is arbitrary, a Gaussian model is proved to be totally e-geodesic. Then it is also e-autoparallel since the model is torsion free.

entire

December 28, 2004

13:56


Akio Fujiwara

246

Here we give another example in the spin 2 × 2 representation. An egeodesic (quantum exponential family) which has an efficient estimator σz is written as [22] 1

1

ρt = e 2 [tσz −γ(t)] ρ0 e 2 [tσz −γ(t)] , If we set ρ0 =

8 7 1 1 x0 , 2 x0 1

γ(t) = log[Tr ρ0 etσz ].

(27)

−1 ≤ x0 ≤ 1

without loss of generality, then (27) becomes x0 , y(t) = 0, z(t) = tanh t x(t) = cosh t in the Stokes representation, i.e., ρt = 12 (I + x(t)σx + y(t)σy + z(t)σz ). It indicates that the e-geodesic is an ellipse of the form 2 x(t) + {z(t)}2 = 1, x0 which connects the north and south poles. Therefore, it cannot be a unitary model unless x0 = ±1, i.e., unless ρ0 is a pure state, since the equieigenvalue surfaces are spherical shells centered at the origin. 2.4. Conclusions of Section 2 An information geometrical aspects of quantum statistical models were studied. We first introduced a natural dualistic structure on the quantum state space based on the symmetric logarithmic derivatives. We next investigated the autoparallelity and clarified the difference between the classical and the quantum estimation theory. The importance of canonical observables in quantum estimation theory was also stressed. 3. Pure State Estimation Theory In order to avoid singularities, the conventional quantum estimation theory has been often restricted to models that are composed of strictly positive density operators. Quantum information theorists have also kept away from degenerate states, such as pure states, for mathematical convenience [28]. Indeed, the von Neumann entropy cannot distinguish pure states, and the relative entropies diverge. In this section, however, we try to construct an estimation theory for pure state models, and clarify the differences between the pure state case

entire

December 28, 2004

13:56



247

and the strictly positive state case [7, 8, 9]. In Section 3.1, we prove some crucial lemmas which will provide fundamentals of the pure state estimation theory. In Section 3.2, we study the quantum counterpart of the logarithmic derivative and the Fisher information which played important roles in the classical estimation theory. The quantum statistical significance of the Fubini–Study metric is also clarified. In Section 3.3, we provide one parameter pure state estimation theory based on the symmetric logarithmic derivative. This is rather analogous to the conventional quantum estimation theory, but reveals the essential difference between the pure state models and the strictly positive models. In Section 3.4, we study a multi-parameter quantum estimation theory based on the right logarithmic derivative. The estimation theoretical significance of the coherent models is also clarified. In order to demonstrate the results, some examples are also presented. 3.1. Preliminaries Given a possibly degenerate density operator ρ on H , we define sesquilinear forms on L: (A, B)ρ = Tr ρBA∗ , 1 A, Bρ = Tr ρ(BA∗ + A∗ B), 2

(28) (29)

where A, B ∈ L. These are apparently the same as the inner products (10) and (11), but are now pre-inner products on L that possess all properties of inner products except that (K, K)ρ and K, Kρ may be equal to zero for a nonzero K ∈ L. Note that the Schwarz inequality also holds for pre-inner products. The forms (·, ·)ρ and ·, ·ρ become inner products if and only if ρ > 0. If rank ρ = 1 or equivalently ρ2 = ρ, ρ is called pure. The following lemmas are fundamental. Lemma 12: Suppose ρ is pure. Then the following three conditions for linear operators K ∈ L are equivalent. (i) (K, K)ρ = 0, (ii) ρK = 0, (iii) Tr ρK = 0 and ρK + K ∗ ρ = 0. Proof: Let us express as ρ = |ψψ| where |ψ is a normalized vector in H . Then the following equivalent sequence (K, K)ρ = 0 ⇐⇒ ψ|KK ∗ |ψ = 0 ⇐⇒ ψ|K = 0 ⇐⇒ |ψψ|K = 0,

entire

December 28, 2004

13:56


Akio Fujiwara

248

yield (i)⇔(ii). Further, (ii)⇒(iii) is trivial. (iii)⇒(ii) is shown as follows. Operating ψ| from the left to the assumption |ψψ|K + K ∗ |ψψ| = 0, and invoking another assumption Tr ρK = 0 ⇔ ψ|K ∗ |ψ = 0, we have 0 = ψ|ψψ|K + ψ|K ∗ |ψψ| = ψ|K. Therefore |ψψ|K = 0. Lemma 13: Suppose ρ is pure. Then the following three conditions for linear operators K ∈ L are equivalent. (i) K, Kρ = 0, (ii) ρK = ρK ∗ = 0, (iii) Tr ρK = 0, ρK + K ∗ ρ = 0, and ρK ∗ + Kρ = 0. Proof: (i)⇔(ii) is shown as follows: K, Kρ = 0 ⇐⇒ ψ|KK ∗ |ψ + ψ|K ∗ K|ψ = 0 ⇐⇒ ψ|K = 0,

ψ|K ∗ = 0

⇐⇒ ρK = ρK ∗ = 0. (ii)⇔(iii) is a straightforward consequence of Lemma 12. Lemma 14: Suppose ρ is pure. Then the following three conditions for self-adjoint operators K ∈ Lsa are equivalent. (i) K, Kρ = 0, (ii) ρK = 0, (iii) ρK + Kρ = 0. Proof: Straightforward by setting K = K ∗ in Lemma 13. These lemmas are effectively employed in the pure state estimation theory. Denote by K(ρ) the set of linear operators K ∈ L satisfying (K, K)ρ = 0, which are called the kernel of the pre-inner product (·, ·)ρ . Also denote by Ksa (ρ) the set of self-adjoint operators K ∈ Lsa satisfying K, Kρ = 0, which are called the kernel of the pre-inner product ·, ·ρ . We next consider Holevo’s commutation operator D, which is defined in the same way as (23). i(Xρ − ρX) =

1 ((DX)ρ + ρ(DX)) , 2

X, DX ∈ Lsa .

(30)

entire

December 28, 2004

13:56



249

If ρ is degenerate, DX is not determined uniquely. In particular, the following lemma holds. Lemma 15: Suppose ρ is pure. Then D is regarded as a super-operator on the quotient space Lsa /Ksa (ρ), and is defined by (DX) ρ = 2i(X − Tr ρX)ρ,

(X ∈ Lsa /Ksa (ρ)).

(31)

Proof: Let us denote two distinct images of A ∈ Lsa by (DA) and (DA) , then K = (DA) − (DA) ∈ Lsa satisfies Kρ + ρK = 0 and, from Lemma 13, K ∈ Ksa (ρ). Further, observing DK, DKρ = −D2 K, Kρ , K ∈ Ksa (ρ) implies DK ∈ Ksa (ρ). Therefore, D is regarded as a super-operator on Lsa /Ksa (ρ). Further, re-expressing (30) as 8 7 8∗ 7 DX DX + i(X − Tr ρX) + + i(X − Tr ρX) ρ = 0, ρ 2 2 and using Lemma 12 together with the identity Tr ρ(DX) = 0, we have an equivalent equation (31).

3.2. Quantum Fisher metric Suppose we are given an n-parameter pure state model: S = {ρθ ; ρ∗θ = ρθ , Tr ρθ = 1, ρ2θ = ρθ , θ ∈ Θ ⊂ Rn }.

(32)

As in the strictly positive case, we define a family of quantum analogues of the logarithmic derivative by 1 ∂ρθ = [ρθ Lθ,j + L∗θ,j ρθ ], j ∂θ 2

Tr ρθ Lθ,j = 0.

(33)

LSθ,j = LS∗ θ,j

(34)

For instance, ∂ρθ 1 = [ρθ LSθ,j + LSθ,j ρθ ], ∂θj 2

defines the symmetric logarithmic derivative (SLD). Furthermore, since every pure state model is written in the form ρθ = Uθ ρ0 Uθ∗ , where Uθ is unitary, we have another useful logarithmic derivative ∂ρθ 1 A = [ρθ LA θ,j − Lθ,j ρθ ], ∂θj 2

Tr ρθ LA θ,j = 0,

A∗ LA θ,j = −Lθ,j ,

(35)

which may be called the anti-symmetric logarithmic derivative (ALD). Indeed, the ALD is closely related to the local generator Aθ,j =

entire

December 28, 2004

250

13:56


Akio Fujiwara

−i(∂Uθ /∂θj )Uθ∗ of the unitary Uθ such as LA θ,j = −2iAθ,j . Thus, (33) defines a certain family of logarithmic derivatives [22]. Denote by T (ρθ ) the linear span (over R) of logarithmic derivatives which satisfy (33). Lemma 16: Suppose ρθ is pure and an arbitrary linear operator A ∈ L is given. Then all the quantities (A, Lθ,j )ρθ are identical for every logarithmic derivative Lθ,j ∈ T (ρθ ) which corresponds to the same direction θj . Proof: Take any logarithmic derivatives Lθ,j and Lθ,j which correspond to the same θj , and denote K = Lθ,j − Lθ,j . Then, from (33), K satisfies the condition (iii) of Lemma 12. Therefore (K, K)ρθ = 0 holds. This and the Schwarz inequality |(A, K)ρθ |2 ≤ (A, A)ρθ (K, K)ρθ lead to (A, K)ρθ = 0 for all A ∈ L. From Lemma 16, we can define uniquely the complex Fisher information matrix Jθ for the family of logarithmic derivatives (33) whose (j, k) entry is (Lθ,j , Lθ,k )ρθ . The SLD is also not uniquely determined for pure state models. Denote by T S (ρθ ) the linear span of SLDs which satisfy (34). Lemma 17: Suppose ρθ is pure and an arbitrary self-adjoint operator A ∈ Lsa is given. Then all the quantities A, LSθ,j ρθ are identical for every SLD LSθ,j ∈ T S (ρθ ) which corresponds to the same direction θj . Proof: By using Lemma 14, it is proved in the same way as Lemma 16. From Lemma 17, we can define uniquely the real Fisher information matrix JθS for the family of SLDs (34) whose (j, k) entry is LSθ,j , LSθ,k ρθ , which is called the SLD–Fisher information matrix. The above results are summarized by the following theorem. Theorem 18: Suppose ρθ is pure. Then the complex Fisher information S matrix Jθ = [(L θ,j , Lθ,k )ρθ ] and the SLD–Fisher information matrix Jθ = LSθ,j , LSθ,k ρθ are uniquely determined on the quotient spaces T (ρθ )/K(ρθ ) and T S (ρθ )/Ksa (ρθ ), respectively. They are related by JθS = Re Jθ . The (j, k) entry of JθS becomes (JθS )jk = 2Tr (∂j ρθ )(∂k ρθ ),

(36)

where ∂j = ∂/∂θj . This metric is identical, up to a constant factor, to the Fubini–Study metric.

entire

December 28, 2004

13:56



251

Proof: We only need to prove (36). Differentiating ρθ = ρ2θ , ∂j ρθ = (∂j ρθ )ρθ + ρθ (∂j ρθ ).

(37)

This relation indicates that 2∂j ρθ is a representative of the SLD. Then (JθS )jk = 2∂j ρθ , 2∂k ρθ ρθ = 2Tr ρθ [(∂j ρθ )(∂k ρθ ) + (∂k ρθ )(∂j ρθ )].

(38)

Further, multiplying ρθ with (37), we have ρθ (∂j ρθ )ρθ = 0.

(39)

Therefore, by using (37) and (39), (∂j ρθ )(∂k ρθ ) = [(∂j ρθ )ρθ + ρθ (∂j ρθ )][(∂k ρθ )ρθ + ρθ (∂k ρθ )] = (∂j ρθ )ρθ (∂k ρθ ) + ρθ (∂j ρθ )(∂k ρθ )ρθ . This, along with (38), leads to relation (36). Denoting ρθ = |ψψ| Tr (∂j ρθ )(∂k ρθ ) = 2[Re ∂j ψ|∂k ψ + ψ|∂j ψψ|∂k ψ], which is identical to the Fubini–Study metric [1, 20]. The Fubini–Study metric is known as a gauge invariant metric on a projective Hilbert space [29]. Theorem 18 gives another meaning of the Fubini–Study metric, i.e., the statistical distance. Wootters [31] also investigated from a statistical viewpoint the distance between two rays, and obtained d(ψ, ϕ) = cos−1 |ψ|ϕ|. This is identical, up to a constant factor, to the geodesic distance as measured by the Fubini–Study metric [3]. Theorem 18, together with Theorem 19 below, reveals a deeper connection between them. 3.3. One parameter pure state estimation In this section, we give a parameter estimation theory of pure state models based on the SLD. As is the strictly positive model case, the following theorem holds. Theorem 19: Given a pure state model ρθ , the following inequality holds for any locally unbiased measurement M : % &−1 Vθ [M ] ≥ JθS . (40) Proof: It is proved in almost the same way as the strictly positive case (p. 274 [17]), except that ·, ·ρθ is a pre-inner product now.

entire

December 28, 2004

13:56


Akio Fujiwara

252

When the model is one dimensional, the measurement M is identified with a certain self-adjoint operator T , and the inequalities in the theorem become scalar, i.e., Vθ [T ] ≥

1 . JθS

(41)

In this case, the lower bound 1/Tr ρθ (LSθ )2 can be attained by the unbiased estimators T = θI +

2 dρθ + Kθ , JθS dθ

∀

Kθ ∈ Ksa (ρθ ),

(42)

where I is the identity. Since dρθ /dθ and Kθ do not commute in general, the measurement which attains the lower bound (41) is not determined uniquely. This fact provides significant features in the pure state estimation theory which are demonstrated in the following examples. 3.3.1. Time–energy uncertainty relation Let us consider a model of the form ρθ = eiθH/ ρ0 e−iθH/ . Here, H is the time independent Hamiltonian of the system, the Planck constant, and θ the time parameter. According to the one parameter estimation theory for strictly positive models [22], V [T ] ≥

1 1 ≥ S J(Lθ ) Jθ

(43)

holds, where Lθ is any logarithmic derivative which satisfies (33) and J(Lθ ) = (Lθ , Lθ )ρθ . The first equality is attained when and only when T = θI + LSθ /JθS , and the second equality holds iff Lθ = LSθ . Now, LA θ = −2iH/ is an ALD for the model and the corresponding Cramér-Rao inequality becomes Vθ [T ] ≥

2 , 4Vθ [H]

(44)

where T is an arbitrary unbiased estimator T for the time parameter θ. This inequality is nothing but a time-energy uncertainty relation. If ρ0 > 0, then this lower bound cannot be attained for any T since LA θ is not an SLD, whereas Theorem 2 asserts that, if ρ0 is pure, the equality in (44) is

entire

December 28, 2004

13:56



253

locally attainable. This is a significant difference between the strictly positive models and the pure state models. Since the ALD LA θ = −2iH/ and the SLD–Fisher information for the pure state models JθS = 2Tr (dρθ /dθ)2 are both obtainable directly from the Liouville–von Neumann equation, this result is not specific to the case where the Hamiltonian is time independent, but is quite general. 3.3.2. Efficient estimator An unbiased estimator T is called efficient if the equality in (41) holds for all θ ∈ Θ. Nagaoka [22] has proved that a one parameter model ρθ has an efficient estimator when and only when the model takes the form 1

1

ρθ = e 2 [β(θ)T −γ(θ)] ρ0 e 2 [β(θ)T −γ(θ)],

(45)

where β(θ), γ(θ) are real functions. Let us consider a model of the form ρθ = eif (θ)A ρ0 e−if (θ)A , where f (θ) is a real monotonic odd function and A ∈ Lsa . If ρ0 > 0, then it was shown that there exists an efficient estimator for θ only when A is a canonical observable, see Theorem 10. On the other hand, if ρ0 is pure, then there may exist an efficient estimator even if A is not canonical, because of the uncertainty Kθ ∈ Ksa (ρθ ) in (42). For instance, the spin 1/2 model 7 8 1 11 1 π f (θ) = ( − cos−1 θ), A = σy , ρ0 = , 2 2 2 11 has an efficient estimator σz for the parameter θ. Indeed, this model admits another form 7 7 8 8 ; 1 1 1+θ 1+θ 2 log σz ρ0 exp log σz . (46) ρθ = 1 − θ exp 4 1−θ 4 1−θ This is not a paradox since, in the pure state model, the estimator which attains the Cramér–Rao bound (41) is adjustable for every points ρθ up to the uncertainty of the kernel Ksa (ρθ ). 3.4. Multi-parameter pure state estimation When the dimension n ≥ 2, the matrix equality in (40) cannot be attained in general, because of the impossibility of the exact simultaneous measurement of non-commuting observables (in von–Neumann’s sense). We must, therefore, adopt another strategy to minimize tr GVθ [M ], (G ≥ 0). Since

entire

December 28, 2004

13:56


Akio Fujiwara

254

there is no prototype for general theory of quantum multi-parameter estimation even in the strictly positive case, let us restrict ourselves here to seeking the estimation theory based on the RLD. This may sound strange since the RLD defined by (9) does not always exist for degenerate states. However, it is essential to notice that what we need is not the RLD itself but the inverse of the RLD–Fisher information matrix, as is understood by (13). We start with the following theorem‡ , which is an extension of Holevo’s result originally obtained in the strictly positive case (p. 280 [17]). Hereafter, the subscripts θ of the SLDs are omitted for simplicity. Theorem 20: Suppose we are given a pure state model ρθ . Let {ρθ (ε) ; ε > 0} be a family of strictly positive density operators ρθ (ε) having a parameter ε which satisfy limε↓0 ρθ (ε) = ρθ , and denote the corresponding RLD by S LR θ (ε). If the SLD–tangent space T (ρθ )/Ksa (ρθ ) is D -invariant, then % &−1 % S &−1 i % S &−1 % S &−1 J lim J R (ε) = J + D J (47) ε↓0 2 R S S holds, where J R (ε) = (LR j (ε), Lk (ε))ρθ (ε) and D = i Tr ρθ [Lj , Lk ] . Proof: Observing the identities % &∗ 1 ∂ρθ = (ρθ LSj + LSj ρθ ) = LR ρθ j ∂θj 2 and i (A, B)ρ = A, (I + D)Bρ , 2 we have, for all X ∈ L, i S LSj , Xρθ = (LR j , X)ρθ = (I + D )Lj , Xρθ . 2 Then LSj = (I + 2i D)LR j and 7 8 i R S −1 S J (ε) = Lj (ε), (I + D(ε)) Lk (ε)ρθ (ε) . 2 Since I + 2i D(ε) is symmetric with respect to ·, ·ρθ , the following lemma immediately leads us to (47), by setting V = Lsa , xj (ε) = LSj (ε), and A(ε) = I + 2i D(ε). ‡ Additional

author’s note: There is a crucial error in the proof of Lemma 21. For a modified argument as well as an improved version of Theorem 20, see A. Fujiwara and H. Nagaoka, J. Math. Phys., 40, 4227-4239 (1999), which is reprinted in Chap. 19 of this book.

entire

December 28, 2004

13:56



255

Lemma 21: Let V be a d-dimensional linear space (possibly d = ∞). Suppose we are given a family of n(< d) linearly independent vectors {xj (ε)}nj=1 in V , a family of inner products ·, ·ε , and a family of symmetric operators A(ε) on V with respect to the inner product, having a parameter ε ≥ 0. Assume A(ε) is invertible for ε > 0 but A(0) is not. Further, the linear span of {xj (0)}nj=1 is A(0)-invariant in V . Denote n × n matrices −1 J R (ε) = xj (ε), A (ε)xk (ε)ε , J(ε) = xj (ε), xk (ε)ε . Then

% &−1 lim J R (ε) = J −1 (0) [xj (0), A(0)xk (0)0 ] J −1 (0). ε↓0

(48)

Proof: Let us denote, by W ⊥ (ε), the orthogonal complement of W (ε) = span {xj (ε)}nj=1 with respect to the inner product ·, ·ε in V . Further, let {yj (ε)}dj=n+1 be a basis of W ⊥ (ε) and construct a basis {zj (ε)}dj=1 of V by combining them as xj (ε), j = 1, · · · , n, zj (ε) = yj (ε), j = n + 1, · · · , d. Consider enlarged d × d matrices JR (ε) = zj (ε), A−1 (ε)zk (ε)ε ,

J(ε) = [zj (ε), zk (ε)ε ] .

Since V = W (ε) ⊕ W ⊥ (ε) is, of course, A(ε)-invariant, the inverse of JR (ε) is explicitly given as

−1 JR (ε) = J−1 (ε) [zj (ε), A(ε)zk (ε)ε ] J−1 (ε).

−1 This matrix is well-defined even for ε = 0. Decompose JR (ε) into blocks: 7 87 87 8 7 8 P O AB P O P AP P BQ = , OQ CD OQ QCP QDQ where the three matrices in the left-hand side correspond to

J−1 (ε),

[zj (ε), A(ε)zk (ε)ε ] ,

J−1 (ε),

respectively, P, A are n×n matrices, and Q, D are (d−n)×(d−n) matrices. Further, it is easy to see that B = C = O for ε = 0, since W (0) is A(0) −1 R becomes a block invariant and A(0) is symmetric. Then limε↓0 J (ε) diagonal matrix, and the limit of the first n × n block P AP approaches (48).

entire

December 28, 2004

256

13:56


Akio Fujiwara

Note that Tr ρθ [LSj , LSk ] in Theorem 20 also independent of the uncertainty of the SLD because of Lemma 14. Therefore, Theorem 20 asserts that the inverse of the RLD–Fisher information matrix can be obtained directly from the SLD, without using the diverging RLD–Fisher information matrix itself. Then, it may be important to investigate the condition for the SLD–tangent space to be D-invariant. The following theorem characterizes the structure of D-invariant SLD–tangent space. Theorem 22: The D-invariant SLD–tangent space T S (ρθ )/Ksa (ρθ ) has an even dimension and is decomposed into direct sum of two-dimensional D-invariant subspaces. Moreover, by taking an appropriate basis of T S (ρθ )/Ksa (ρθ ), the operation of D can be written in the form      ˜S ˜S L L 0 2 1 1  L  L ˜ S   −2 0 ˜S       2 2     L  S ˜ S3  0 2  L  ˜3      ˜S   ˜S   −2 0 L4  =    L4  . (49) D  .   .   ..  .   .   .  .   .    S  S    ˜ 2m−1   ˜ 2m−1  L 0 2L ˜ S2m ˜ S2m −2 0 L L Proof: Since Tr ρ(DX) = 0 holds for all X ∈ Lsa /Ksa , (31) is rewritten as [−4(X − Tr ρX)]ρ = 2i[DX − Tr ρ(DX)]ρ. Comparing this equation to (31) with X replaced by DX, we have

D2 X = −4(X − Tr ρX). In particular, D2 X = −4X holds for every X which satisfies Tr ρX = 0. We may, therefore, write this relation as D2 = −4 on the SLD–tangent space T S (ρθ )/Ksa (ρθ ) for short. Take an arbitrary element e1 of the SLD–tangent space T S (ρθ )/Ksa (ρθ ), and let e2 = De1 . Then from the assumption, e2 ∈ T S (ρθ )/Ksa (ρθ ), and De2 = −4e1 since D2 = −4. Therefore, S1 (ρθ ) = span {e1 , e2 } is a Dinvariant subspace of T S (ρθ )/Ksa (ρθ ) and T S (ρθ )/Ksa (ρθ ) = S1 (ρθ ) ⊕ S1 (ρθ )⊥ , where S1 (ρθ )⊥ is the orthogonal complement of S1 (ρθ ) with respect to ·, ·ρθ . Repeating the same procedure to S1 (ρθ )⊥ , we have T S (ρθ )/Ksa (ρθ ) = S1 (ρθ ) ⊕ S2 (ρθ ) ⊕ · · · ⊕ Sm (ρθ ).

entire

December 28, 2004

13:56



257

In particular, dim[T S (ρθ )/Ksa (ρθ )] = 2m. We next investigate the structure of two-dimensional D-invariant subspace S1 (ρθ ) = span {e1 , e2 }. Expressing the operation of D in a matrix form 7 8 7 87 8 e x y e1 D 1 = , (x, y, z, w ∈ R) e2 e2 zw and using D = −4, we have 2

x2 + yz = −4,

y(x + w) = 0,

z(x + w) = 0,

w2 + yz = −4.

These equations do not contradict the identities e1 , De1 ρ = e2 , De2 ρ = 0 iff x + w = 0, y "= 0, z "= 0. In this case 7 8 7 87 8 e1 x y e1 D = . e2 e2 −(x2 + 4)/y −x Furthermore, the transformation of the basis 7 S8 7 87 8 ˜1 L 2/y 0 e1 ˜ S2 = x/y 1 e2 L yields

7

˜S L D ˜ 1S L2

8

7

0 2 = −2 0

87

8 ˜S L 1 ˜ S2 . L

Repeating the same procedure to other invariant subspaces, we have the theorem. ˜ S }2m of the SLD–tangent space Definition 23: The basis {L j j=1 S T (ρθ )/Ksa (ρθ ) which is subject to the transformation law (49) is called ρθ –symplectic. From Theorem 22, it is sufficient to consider a two-dimensional D-invariant SLD tangent space. The following theorem gives the condition for the model to have a two-dimensional D-invariant SLD–tangent space at ρθ . Theorem 24: For the pure state model {ρθ = |θθ|}, the following two conditions are equivalent. ˜ S }j=1,2 is a |θθ|–symplectic basis. (i) {L j ˜ S2 )|θ = 0. ˜ S1 + iL (ii) (L ˜ S2 } is D-invariant. ˜ S1 , L The linear span of such basis span{L

entire

December 28, 2004

13:56


Akio Fujiwara

258

˜ S in (31), we have Proof: We first assume (i). Letting X = L 1 ˜ S1 )ρθ = 0. ˜ S1 + i DL (L 2 ˜ S = 2L ˜ S yields (ii). Then, DL 1 2 ˜ S2 )ρθ = 0, ˜ S1 + iL Next we assume (ii). Since (L ˜ S ρθ = 2iL ˜ S ρθ , 2L 2 1

˜ S ρθ = 2iL ˜ S ρθ . −2L 1 2

Comparing these equations to (31), we have (i). Since the condition (ii) in Theorem 24 is similar to the definition of the coherent states in quantum theory [18], we shall make the following definition. Definition 25: The pure state model {ρθ = |θθ|} which satisfy the condition in Theorem 24 is called coherent. Thus the D-invariancy is equivalent to the coherency of the model. The next fact, a straightforward consequence of Theorem 24, characterizes a global structure. Corollary 26: Consider the pure state model of the form ρθ = Uθ ρ0 Uθ∗ where {Uθ } forms a projective unitary group. This model is coherent iff T S (ρ0 )/Ksa (ρ0 ) is D-invariant, i.e., the model has a ρ0 –symplectic basis. ˜ S U ∗ }j=1,2 becomes ˜ S }j=1,2 is a ρ0 –symplectic basis, then {Uθ L Indeed, if {L j j θ a ρθ –symplectic basis. Now we investigate the RLD–bound for coherent models. From Theorem 22, it is sufficient to consider a two-parameter coherent model. In this case, the RLD–bound can be explicitly obtained by substituting (47) into (16). Thus we have Proposition 27: The RLD–bound for a coherent model ρθ is √ detG R S Tr ρθ [LS1 , LS2 ] . C =C + S detJ Since the identity

(50)

Tr ρθ [LS1 , LS2 ] = Tr Abs ρθ [LS1 , LS2 ].

holds for any pure state models, the above CR–bound is similar to the Nagaoka bound, which has been proved most informative when ρθ is represented in 2 × 2 matrix [23]. More strongly, we will show in the next subsection (see Theorem 29) that this CR–bound is most informative for any

entire

December 28, 2004

13:56



259

pure coherent models, by constructing explicitly a generalized measurement that attains this bound. Thus, the coherent model has a nice property from an estimation theoretical viewpoint. Before leaving this subsection, we give two examples of coherent model. 3.4.1. Canonical coherent states Let us first consider the family of canonical coherent states ρz = |zz| in a one-dimensional harmonic oscillator with frequency ω, where z = (ωq + ip)/2 ∈ C, see [11, 12, 19]. This can be regarded as a two-parameter pure state model which has real parameters q and p. It is shown that the representative elements of SLDs are LSq =

2ω (Q − q),

LSp =

2 (P − p), ω

and

DLSq = 2ωLSp ,

2 ω

DLSp = − LSq .

Letting ˜ Sq = LSq = ω(Q − q), L 2

˜ Sp = ω LSp = P − p, L 2

we have ˜ S = 2L ˜S, DL q p

˜ S = −2L ˜S. DL p q

˜ Sq , L ˜ Sp } forms a ρz –symplectic basis. Therefore, from This indicates that {L Theorem 24, ˜ Sq + iL ˜ Sp )|z = [ω(Q − q) + i(P − p)]|z = 0, (L which is nothing but the definition of canonical coherent states. Furthermore, from Theorem 20, we obtain 7 2 8 σP i/2 (J R )−1 = , 2 −i/2 σQ 2 = /2ω, and the corresponding RLD–bound where σP2 = ω/2, σQ √ 2 gP VP [M ] + gQ VQ [M ] ≥ gP σP2 + gQ σQ + gP gQ

is identical to the pure state limit of the most informative CR bound obtained by Yuen and Lax [32] (p. 281 [17]).

entire

December 28, 2004

13:56


Akio Fujiwara

260

3.4.2. Spin coherent states Another example is the family of spin coherent states, see Appendix B. Let (θ, ϕ) be the polar coordinates where the north pole is θ = 0 and x–axis corresponds to ϕ = 0. The spin coherent state |θ, ϕ is defined as |θ, ϕ = R[θ, ϕ]|j = exp [iθ(Jx sin ϕ − Jy cos ϕ)] |j, where |j is the highest occupied state in the spin j system. It is shown that the SLDs at the north pole in the direction of ϕ = 0 and ϕ = π/2 are, respectively, 2Jx , 2Jy and the operation of D becomes DJx = 2Jy , ˜ S1 = Jx and L ˜ S2 = Jy form a |jj|–symplectic DJy = −2Jx . Therefore, L basis and from Theorem 24,

˜ S1 + iL ˜ S2 |j = J+ |j = 0, L where J+ = Jx + iJy is the spin creation operator. This is nothing but the definition of the terminal state |j. From this fact, we can immediately conclude that the model which comprises the totality of the spin coherent states ρθ,ϕ = |θ, ϕθ, ϕ| = R[θ, ϕ]|jj|R[θ, ϕ]−1 has D-invariant SLD tangent space at every point on the sphere. Indeed, since R[θ, ϕ] form a compact Lie group, Corollary 26 asserts that

˜ S2 R[θ, ϕ]−1 ˜ S1 R[θ, ϕ]−1 , R[θ, ϕ]L R[θ, ϕ]L form a |θ, ϕθ, ϕ|–symplectic basis. In particular, a two-parameter spin 1/2 model has D-invariant SLD tangent space and 7 2 8 % R &−1 1 sin θ −i sin θ J = . 1 sin2 θ i sin θ The corresponding CR bound is 2 √ gϕ + gθ gϕ . sin2 θ sin θ This bound is identical to the pure state limit of the most informative CR bound obtained by Nagaoka [23]. gθ Vθ [M ] + gϕ Vϕ [M ] ≥ gθ +

3.5. Linear random measurement Suppose we are given a two-parameter model {ρθ ; θ = (θ1 , θ2 ) ∈ R2 }, the corresponding SLD being {LSθ,1, LSθ,2 }. In the following, we often drop the

entire

December 28, 2004

13:56



261

Fig. 1. Random measurement of two non-commuting observables. It is expected that the optimal measurement might be given by a random measurement of two “observables” S 1 2 LS 1 and L2 which gives the best estimates for θ and θ , respectively.

subscript θ for notational convenience since we only consider local properties of the model. If LS1 and LS2 commute, then we can estimate the parameters (θ1 , θ2 ) in the same way as in the classical theory. Therefore, suppose [LS1 , LS2 ] "= 0. If one of the two parameters, say θ2 , is fixed, then we obtain a one-parameter sub-model {ρθ1 θ2 ; θ2 = const.}. For this sub-model, we have an optimum locally unbiased estimator T 1 = θ1 I + LS1 /(J S )11 for the parameter θ1 as was mentioned in the previous section, where (J S )11 is the (1, 1) component of the Fisher information matrix J S . Therefore it is natural to ask whether the optimum estimation for the original two parameter model ρθ can be realized by a random measurement of two “observables” LS1 and LS2 . More generally, let us investigate the infimum of tr GV [M ] with respect to the random measurements M of linearly independent two observables A1 , A2 in the linear span LS = {a1 LS1 + a2 LS2 ; a1 , a2 ∈ R}, see Figure 1. Note that if we obtain the optimum measurement M for G = I, the corresponding locally unbiased operators √ being X j , the solution n j k j G = [fjk ]. We then for general weight G is Y = k=1 fk X , where consider only G = I case in the following without loss of generality. Denote the dual basis by Li = J ij LSj where J ij is the (i, j) entry of inverse of the SLD–Fisher information matrix. A pair of self-adjoint operators X 1 , X 2 whose expectations vanish is locally unbiased with respect to the parameters θ1 , θ2 iff LSi , X j = δij because of (5), where ·, · = ·, ·ρθ . This condition is rewritten as Li , X j = Li , Lj , which indicates that Lj is the orthogonal projection of X j onto the SLD–tangent space with respect to ·, ·, see Figure 2. We therefore restrict ourselves to the case X j = Lj , (j = 1, 2). Further, let A1 , A2 ∈ LS be linearly independent observables which are to be measured at random, assuming Aj , Aj = 1

entire

December 28, 2004

13:56


Akio Fujiwara

262

Fig. 2. Geometrical interpretation of local unbiasedness condition. Every locally unbiased operators X j have the common orthogonal projection Lj which is the dual of LS j.

without loss of generality. Their spectral decompositions are written as aj (ξ)Ej (ξ). (51) Aj = ξ

Let us construct their random measurement as follows. Select one of A1 , A2 according to the probability p1 , p2 , respectively, and make an exact measurement of it (in von Neumann’s sense). The corresponding resolution of identity is defined by Mj (ξ) = pj Ej (ξ).

(52)

When we selected an observable Aj and obtained an outcome aj (ξ), we identify this result to a pair of real quantities b1j (ξ) and b2j (ξ), which are connected to L1 , L2 by bkj (ξ) =

1 k j L , A aj (ξ), pj

(j, k = 1, 2),

(53)

where {A1 , A2 } is the dual basis of {A1 , A2 } in LS with respect to the inner product ·, ·. Indeed, the following relation holds: Lk =

2 j=1

bkj (ξ)Mj (ξ).

(54)

ξ

Then, by comparing the decomposition (54) to (4), we can evaluate the

entire

December 28, 2004

13:56



263

equi-weighted trace of covariance matrix as tr V [M ] =

2 % 1 &2 % 2 &2 bj (ξ) + bj (ξ) Tr ρMj (ξ) j=1

ξ

1 L1 , Aj 2 + L2 , Aj 2 . = pj j

(55)

Our aim is to find the infimum of (55) with respect to {pj } and {Aj }. Observing the fact that µ1 /p1 + µ2 /p2 , (p1 + p2 = 1) takes its minimum √ √ √ √ √ ( µ1 + µ2 )2 when and only when pj = µj /( µ1 + µ2 ), we have min tr V [M ] = {pj }

2 ; ; L1 , A1 2 + L2 , A1 2 + L1 , A2 2 + L2 , A2 2 .

(56) The problem is then reduced to the minimization of (56) with respect to A1 , A2 . The normalization conditions A1 , A1 = A2 , A2 = 1 impose the following constraints on A1 , A2 , A1 , A1 = A2 , A2 =

1 , 1 − α2

A1 , A2 = −

α , 1 − α2

(57)

where α = A1 , A2 . Let us define a linear transformation φ : LS −→ LS by φ(W ) = L1 , W L1 + L2 , W L2 .

(58)

Since φ is symmetric and positive definite, it has positive eigenvalues λ1 , λ2 and unit eigenvectors U1 , U2 , satisfying φ(Uj ) = λj Uj ,

(j = 1, 2).

(59)

By expanding as Ai = aij Uj , the problem to be solved is written in the form ; 2 ; 1 , φ(A1 ) + 2 , φ(A2 ) A A minimize 1 2 A ,A

= minimize 1 2 A ,A

; 2 ; λ1 (a11 )2 + λ2 (a12 )2 + λ1 (a21 )2 + λ2 (a22 )2 .

The constraints (57) become (a11 )2 +(a12 )2 = (a21 )2 +(a22 )2 =

1 , 1 − α2

(a11 )(a21 )+(a12 )(a22 ) = −

α . 1 − α2

entire

December 28, 2004

13:56


264

Akio Fujiwara

Introduce another parameterization by 1 1 a11 = √ cos θ, a12 = √ sin θ, 2 1−α 1 − α2 1 1 a21 = √ cos ϕ, a22 = √ sin ϕ. 1 − α2 1 − α2 Then the remaining third constraint becomes cos(θ −ϕ) = −α, and we have the constraint free minimization problem as 82 79 9 1 2 θ + λ sin2 θ + 2 ϕ + λ sin2 ϕ λ cos λ cos minimize 1 2 1 2 θ,ϕ,α 1 − α2 √ 2 ; 1 = minimize a + b cos 2θ + a + b cos 2ϕ , θ,ϕ sin2 (θ − ϕ) where a = (λ1 + λ2 )/2, b = (λ1 − λ2 )/2. By using the inequality (see Appendix C) √ √ ; √ 1 a + b cos 2θ + a + b cos 2ϕ ≥ a + b + a − b, (60) sin(θ − ϕ) which holds for 0 < θ − ϕ < π, we obtain the desired infimum ; ; 2 min min tr V [M ] = λ1 + λ2 {Aj } {pj } ; = L1 , L1 + L2 , L2 + 2 L1 , L1 L2 , L2 − L1 , L2 2 . (61) The last equality follows from the fact that the trace λ1 + λ2 and the determinant λ1 λ2 of the linear transformation φ is independent of the choice of the basis which represents φ in a matrix form. Since the equality in (60) holds on a sinusoidal periodic curve, there exist continuous potency of optimum measurements that attain the minimum (61). The lower bound (61) is first appeared in [25, 26], although its meaning is not clearly stated. Let us give two applications which clarify the significance of the lower bound (61). 3.5.1. Gaussian model For a quantum Gaussian model (see Appendix A), the SLDs become 1 1 LSp = 2 (P − p), LSq = 2 (Q − q), σq σp where σq2 = Q − q, Q − q and σp2 = P − p, P − p are variances of Q and P , respectively. Then, 7 7 2 8 8 1/σq2 0 σq 0 S −1 ) = JS = , (J , 0 1/σp2 0 σp2

entire

December 28, 2004

13:56



265

and the duals of LSq and LSp are Lq = Q − q,

Lp = P − p,

respectively. Therefore, the bound (61) becomes tr V [M ] = σq2 + σp2 + 2σq σp . This is greater than the most informative bound (see Appendix A) tr V [M ] = σq2 + σp2 + unless σq σp = /2, i.e., unless the system is in a minimum uncertainty state. Furthermore, even in the pure coherent state case, it is an open question whether the covariant measurement M (p, q) = U [p, q]|00|U ∗ [p, q], which is known to attain the above most informative bound (p. 281 [17]), can be constructed directly from the continuous combination of the above optimal random measurements. 3.5.2. Pure coherent models In this section, we will prove that the RLD–bound (50) gives the most informative CR–bound for any quantum coherent model by showing that it is attainable by random measurements. Definition 28: The parameter θ of a model ρθ is called ρ0 –canonical if the SLD–Fisher information matrix with respect to θ is in diagonal form at ρ0 . This condition is not restrictive since, by a certain transformation of coordinate system, we can always diagonalize the SLD–Fisher information matrix. Theorem 29: Suppose the parameter θ of the coherent model ρθ is ρ0 – canonical. Then the corresponding RLD–bound is attainable at ρ0 by a certain random measurement. Proof: After some calculation, we have another expression for the RLD– bound (50) as (62) C R = L1 , L1 + L2 , L2 + Tr ρθ [L1 , L2 ] , where we set G = I. On the other hand, from the assumption, there exists ˜ S1 , L ˜ S2 } non–zero real numbers c1 , c2 and normalized ρ0 –symplectic basis {L

entire

December 28, 2004

13:56


Akio Fujiwara

266

˜ S , or Lj = L ˜ S /cj . Substituting this relation to the such that LSj = cj L j j condition (ii) in Theorem 24, we have (c1 L1 + ic2 L2 )ρ0 = 0, which is nothing but the minimum uncertainty condition in the Heisenberg’s uncertainty relation. Then L1 , L1 L2 , L2 =

2 1 Tr ρ0 [L1 , L2 ] , 4

L1 , L2 = 0.

(63)

Comparing (61) (62) and (63), we have the theorem. This proof is indebted to the special choice of the coordinate system. It is not yet clear whether C R can be attained in an arbitrary coordinate system§ . Here we give an example of a two parameter coherent model which have non-diagonal SLD–Fisher information matrix, called the photon squeezed state model (see Appendix D). Throughout this example, adjoint operators and complex conjugate numbers are denoted by † and ∗, respectively, according to the convention in physics. A photon squeezed state |zξ is defined as 1 2 |zξ = D(z)S(ξ)|0, D(z) = exp(za† − z ∗ a), S(ξ) = exp[ (ξa† − ξ ∗ a2 )], 2 where z, ξ are complex numbers, a and a† are the annihilation and creation a† ] = 0, and |0 is operators of a boson satisfying [a, a† ] = 1, [a, a] = [a† , √ the Fock vacuum with respect to a. Letting z = (q + ip)/ 2, we may regard the family of density operators ρz = |zξξ z| = D(z)ρ0 D∗ (z),

ρ0 = |0ξξ 0|

as a quantum parametric model which is parameterized by two real numbers q and p. The √ pre-inner products with respect to the position √ operator Q = (a + a† )/ 2 and the momentum operator P = (a − a† )/i 2 become 8 7 (Q − q, Q − q)ρz (Q − q, P − p)ρz (P − p, Q − q)ρz (P − p, P − p)ρz 7 8 1 (cosh 2s + cos θ sinh 2s) (sin θ sinh 2s + i) = , (sin θ sinh 2s − i) (cosh 2s − cos θ sinh 2s) 2 § Additional author’s note: This problem has been resolved affirmatively in A. Fujiwara and H. Nagaoka, J. Math. Phys., 40, 4227-4239 (1999), which is reprinted as Chap. 19 of this book.

entire

December 28, 2004

13:56



267

where ξ = seiθ . Noting Tr ρz LA j = 0 (j = q, p), the ALDs are readily obtained as LA q = 2i(P − p),

LA p = −2i(Q − q).

Further, observing the relation b|zξ = β|zξ , where b = S(ξ)aS −1 (ξ) = a cosh s − a† eiθ sinh s, β = z cosh s − z ∗ eiθ sinh s, we immediately have a normalized ρz –symplectic basis as √ ˜ S = 2[(Q − q)(cosh s − cos θ sinh s) − (P − p) sin θ sinh s], L q √ ˜ Sp = 2[(P − p)(cosh s + cos θ sinh s) − (Q − q) sin θ sinh s]. L On the other hand, from (30) (34) and (35), the SLDs are given as LSj = A D(−LA j /2i), (j = q, p). Then, by expanding {Lj } into linear combinations S S S ˜ = 2L ˜ S = −2L ˜ }, and using the relations DL ˜ and DL ˜ S , we have of {L q p p q j LSq = 2[(Q − q)(cosh 2s − cos θ sinh 2s) − (P − p) sin θ sinh 2s], LSp = 2[(P − p)(cosh 2s + cos θ sinh 2s) − (Q − q) sin θ sinh 2s]. Then the SLD–Fisher information matrix becomes 7 S S 8 Lq , Lq ρz LSq , LSp ρz JS = LSp , LSq ρz LSp , LSp ρz 7 8 cosh 2s − cos θ sinh 2s − sin θ sinh 2s =2 . − sin θ sinh 2s cosh 2s + cos θ sinh 2s We thus have the dual basis of LSq , LSp as Lq = Q − q,

Lp = P − p.

Furthermore, this metric is, up to a constant factor, identical to the Fubini– Study metric [1], and is also identical to the real part of the complex ALD– Fisher information matrix 8 7 A A A (Lq , Lq )ρz (LA A q , Lp )ρz J = A A A (LA p , Lq )ρz (Lp , Lp )ρz 7 8 cosh 2s − cos θ sinh 2s − sin θ sinh 2s + i =2 , − sin θ sinh 2s − i cosh 2s + cos θ sinh 2s

entire

December 28, 2004

13:56


Akio Fujiwara

268

as is understood by Theorem 18. This result indicates that the coordinate system (q, p) is a non-orthogonal one. Let us change the coordinate system (q, p) into another one (q , p ) such that the corresponding SLD LSq , LSp are orthogonal at ρz with respect to the pre-inner product ·, ·ρz . For instance, the transformation of coordinate system q = q(cosh s − cos θ sinh s) − p sin θ sinh s p = p(cosh s + cos θ sinh s) − q sin θ sinh s √ S S √ S ˜ q , L = 2L ˜ p . Then, according to Theorem 4.1, there lead to LSq = 2L p exist random measurements that attain the RLD bound with respect to (q , p ). In this example, moreover, min tr V [M ] for the original parameter (q, p) in (61) also coincides with C R in (62) as min tr V [M ] = C R = cosh 2s + 1. M

3.6. Geometry of pure state estimation In this section, we make a supplementary consideration toward the information geometry of quantum pure states. This attempt has not been completed and most problems are left for future study¶ . 3.6.1. Dualistic structure of pure state space We have introduced in Chapter 2 a dualistic structure in the space of strictly positive density operators, where the m-tangent vector and the e-tangent vector are uniquely determined and a pair of tele-parallel translations are naturally defined which make the state space dually curvature free. In the pure state case, however, the e-tangent vector (the SLD) is not uniquely determined, i.e., it is determined up to the kernel Ksa (ρ) = {K ; ρK = 0, K = K ∗ } as shown in the previous chapter. Moreover, to make matters worse, we cannot even introduce tele-parallel translations, since the set of mtangent and e-tangent vectors cannot be identified with the set of operators satisfying Tr G = 0 and Tr ρH = 0, respectively. This implies that the pure state space P inherently has non-vanishing curvatures even in the dualistic geometry. We must therefore define a dualistic structure by means of a pair of infinitesimal parallel translations, i.e., covariant derivatives. Let us first define m- and e-representations of tangent vectors. ¶ Additional author’s note: For more information, see A. Fujiwara, “Geometry of quantum information systems,” in Geometry in Present Day Sciences, ed. O.E. BandorffNielsen and E.V.B. Jensen (World Scientific, Singapore, 1999) 35–48.

entire

December 28, 2004

13:56



269

• m-tangent vector (m)

∂

(P)

∈

∈

Tρ (P ) = Tρ =

∂ρ

• e-tangent vector (e)

∂

∈

∈

Tρ (P ) = Tρ (P ) =

L

Here, L is the SLD defined by 1 ∂ρ = [ρL + Lρ], L = L∗ , 2 which is determined up to Ksa (ρ). As shown in the previous chapter, the Riemannian metric is uniquely defined regardless of the uncertainty of L. 1 gρ (∂i , ∂j ) = Li , Lj ρ = Tr ρ(Li Lj + Lj Li ) = Tr (∂i ρ)Lj . 2 On the other hand, a pair of covariant derivatives are not uniquely determined due to the uncertainty of L: • m-covariant derivative (m)

∇∂i ∂j = ∂i ∂j ρ. • e-covariant derivative (e)

∇∂i ∂j = ∂i Lj − Tr ρ(∂i Lj ) +

δKj . dθi

Here, δKj is an arbitrary infinitesimal element in Ksa (ρ). Note δKj /dθi ∈ Ksa (ρ) and it does not mean the derivative of Kj (ρ). The corresponding connection coefficients become (m)

(m)

(e)

(e)

Γij,k = ∇∂i ∂j , ∂k ρ = Tr (∂i ∂j ρ)Lk , Γij,k = ∇∂i ∂j , ∂k ρ = Tr (∂i Lj )(∂k ρ), where Tr (δKj /dθi )(∂k ρ) = 0 is used. These formulae immediately lead to the duality of the connections with respect to the Riemannian metric: (m)

(e)

∂i g(∂j , ∂k ) = g(∇∂i ∂j , ∂k ) + g(∂j , ∇∂i ∂k ). Thus, the dualistic structure is determined by specifying a smooth family of e-tangent vectors {Li (ρ)}. Additional

author’s note: These definitions need slight modifications. See the reference cited in the footnote of p. 262.

entire

December 28, 2004

13:56


Akio Fujiwara

270

3.6.2. Curvature and torsion As stated before, the curvatures for pure state space never vanishes. On the other hand, we can expect that the torsion, which was inevitably nonzero in the strictly positive state case, may vanish in the pure state case by a suitable choice of the smooth family of e-tangent vectors. The torsion tensor is given (p. 46 [2]) by (e)

(e)

(e)

Sij,k = Γij,k − Γji,k = Tr (∂i Lj − ∂j Li )(∂k ρ). Therefore, if we adopt, for instance, 2∂ρ as the representative element of the SLD, the torsion tensor indeed vanishes. The next theorem gives the general condition for the torsion tensor to vanish. Theorem 30: Suppose we adopt a smooth family of e-tangent vectors {Li (ρ)}. The corresponding torsion tensor vanishes iff ∂i Kj (ρ) − ∂j Ki (ρ) ∈ Ksa (ρ), where Ki (ρ) denotes the deviation of e-tangent vector Li (ρ) from 2∂i ρ, i.e. Li (ρ) = 2∂i ρ + Ki (ρ),

Ki (ρ) ∈ Ksa (ρ).

Proof: The sufficiency is obvious. We show the necessity. For notational simplicity, we drop the argument ρ. (e)

Sij,k = Tr (∂i Lj − ∂j Li )(∂k ρ) = Tr (∂i Kj − ∂j Ki )(∂k ρ) 1 = Tr (∂i Kj − ∂j Ki )(ρLk + Lk ρ) 2 1 = Tr {(∂i Kj − ∂j Ki )ρ + ρ(∂i Kj − ∂j Ki )}Lk . 2 Since ρK = 0, ρ(∂K) = −(∂ρ)K holds. Substituting this and its conjugate into the above equation, we have (e)

1 Tr {−Kj (∂i ρ) + Ki (∂j ρ) − (∂i ρ)Kj + (∂j ρ)Ki }Lk 2 1 = Tr {−Kj (ρLi + Li ρ) + Ki (ρLj + Lj ρ) 4 −(ρLi + Li ρ)Kj + (ρLj + Lj ρ)Ki }Lk 1 = Tr {(Ki Lj − Kj Li )ρLk + Lk ρ(Lj Ki − Li Kj )} 4 1 = Re lk |φij , 2

Sij,k =

entire

December 28, 2004

13:56



271

where ρ = |ψψ|, and |lk = Lk |ψ, |φij = (Ki Lj − Kj Li )|ψ. Since (e) |lk ∈ H is arbitrary except that ψ|lk = 0, Sij,k = 0 implies |φij = 0 for all i, j (Note also ψ|φij = 0). This is equivalent to (Ki Lj − Kj Li )ρ = 0, or Ki (∂j ρ) = Kj (∂i ρ), which implies (∂i Kj )ρ = (∂j Ki )ρ due to ρK = 0. Therefore ∂i Kj − ∂j Ki ∈ Ksa Theorem 30 asserts that, by a suitable choice of the element of the kernel, the torsion of the dualistic structure for pure states may vanish. For instance, the dualistic structure reduces to the Riemannian structure if we adopt L(ρ) = 2∂ρ as a smooth family of SLDs, which corresponds to setting K(ρ) = 0 in the theorem. Remarkably, the following fact holds for 2 × 2 matrix representation. Corollary 31: The dualistic structure for 2 × 2 matrix representation has vanishing torsion iff it is Riemannian. Proof: For the pure state ρ represented by the following 2 × 2 matrix 1 (I + xσx + yσy + zσz ), x2 + y 2 + z 2 = 1, 2 the kernel Ksa (ρ) becomes one-dimensional, and is generated by ρxyz =

1 (I − xσx − yσy − zσz ). 2 Letting Ki = fi K, Kj = fj K, K=

∂i Kj − ∂j Ki = (∂i fj )K − (∂j fi )K + fj (∂i K) − fi (∂j K). The first two terms in the right-hand side belong to Ksa (ρ), whereas the other terms become so iff fi = fj = 0 since both ∂i K and ∂j K do not belong to Ksa (ρ). 3.7. Conclusions A quantum estimation theory of the pure state models was presented. We first investigated a general framework of the pure state estimation theory and derived quantum counterpart of the Fisher metric. The statistical significance of the Fubini–Study metric was also stressed. We then formulated the one-parameter pure state estimation theory based on the symmetric logarithmic derivative and disclosed the characteristics of the pure states. Some examples were also given in order to demonstrate the one-parameter pure state estimation, and clarified the difference between the pure state models and the strictly positive models.

entire

December 28, 2004

13:56

272


Akio Fujiwara

We next considered the possibility of the estimation theory based on the right logarithmic derivatives. We then investigated D–invariancy of the SLD–tangent space, which led to the notion of coherent models, and derived explicitly the RLD–bound. Some examples were also given. We further derived an explicit infimum of tr V [M ] with respect to the linear random measurement M in the linear span of SLDs. The corresponding lower bound was attainable by infinitely many (continuous potency) measurements. The RLD–bound for pure coherent models was also studied, and was found to be most informative. The construction of the general quantum multi-parameter estimation theory is left for future study, as the strictly positive model case. Finally, we considered naively the information geometry of quantum pure states. The dualistic structure for pure states was determined by specifying a smooth family of kernels. The curvature is inevitable, whereas the torsion may vanish, whose necessary and sufficient condition was also derived. This attempt has not been completed and most problems are left for future study. For instance, a contrastive study of autoparallelities with respect to various e-connections (including Levi–Civita connection) shall be urgently challenged.

Appendix A. Parameter Estimation of a Quantum Gaussian Model In this appendix, we present an accessible review of the parameter estimation theory of quantum Gaussian models. We start with a brief summary of the Weyl representation of Hilbert-Schmidt operators and the noncommutative Fourier transforms under the Fock space representation. For details, see p. 223 [17] or p. 178 [19]. For an arbitrary Hilbert-Schmidt operator A, define 8 7 i (kQ + xP ) = a(x, k), (64) F x,k {A} = Tr A exp where [Q, P ] = i and a positive constant (Planck’s constant), then 7 8 6 dxdk i A = a(x, k) exp − (kQ + xP ) (65) 2π holds. (64) is called the non-commutative Fourier transform of A, and (65) is called the Weyl representation of A. In particular, the non-commutative

entire

December 28, 2004

13:56



273

Fourier transform F x,k {ρ} of a density operator ρ is called the quantum characteristic function. With the help of the Baker–Hausdorff formula, we have 8 7 8 7 8 7 i i i kx exp kQ exp xP F x,k {ρ} = Tr ρ exp 2 7 8 7 8 7 8 i i i = Tr ρ exp − kx exp xP exp kQ . 2 Differentiating these equations, we have useful formulae. For instance, 7 8 7 8 7 8 i i i i i ∂ x + Q exp kx exp kQ exp xP F x,k {ρ} = Tr ρ ∂k 2 2 i i = xF x,k {ρ} + Fx,k {ρQ}, 2 yields 8 7 ∂ x F x,k {ρQ} = − − i F x,k {ρ}. 2 ∂k In the same way,

7 8 ∂ x F x,k {Qρ} = + − i F x,k {ρ} 2 ∂k 7 8 ∂ k F x,k {ρP } = + − i Fx,k {ρ} 2 ∂x 7 8 ∂ k F x,k {P ρ} = − − i Fx,k {ρ}. 2 ∂x

Then we obtain the following formulae ∂ F x,k {ρ} ∂k ∂ Fx,k {ρP − P ρ} = +k Fx,k {ρ}, F x,k {ρP + P ρ} = −2i F x,k {ρ} ∂x The next formula is also useful, which translates the Weyl representation into Glauber-Sudarshan’s P-representation, and vice versa ([6], p. 179 (8.142), p. 185 (8.159) [19]). Fx,k {ρQ − Qρ} = −xFx,k {ρ},

F x,k {ρQ + Qρ} = −2i

Proposition 32: The two representations of a trace class operator T 8 7 6 6 dxdk dpdq i = ϕ(p, q)|p, qp, q| T = t(x, k) exp − (kQ + xP ) 2π 2π are related by ϕ(x, ˜ k) = t(x, k) exp[

1 2 (x + k 2 )], 4

entire

December 28, 2004

274

13:56


Akio Fujiwara

where ϕ(x, ˜ k) is the Fourier transform of ϕ(p, q), i.e., 8 7 6 dpdq i . ϕ(x, ˜ k) = ϕ(p, q) exp (kq + xp) 2π A Gaussian state of a harmonic oscillator (Boson) is a model for the state of coherent lights associated with thermal noise. A Gaussian state of one degree of freedom with angular frequency ω is defined by GlauberSudarshan’s P-representation as 6 1 ϕθ (z)|zz| d2 z, (66) ρθ = π where ωx + iy dxdy ωq + ip , z= √ , d2 z = , θ= √ 2 2ω 2ω with q, p being the expectation of position and momentum, respectively. Further, |z is the Boson coherent state and 7 8 1 |z − θ|2 exp − , (67) ϕθ (z) = N N where N is a positive quantity which is related to the system temperature (Chap. V [17]). The model can be written also in the form 8 7 i ∗ (68) Uθ = exp (pQ − qP ) . ρθ = U θ ρ0 U θ , With the help of Proposition 32, the corresponding quantum characteristic function becomes 8 7 i (kQ + xP ) F x,k {ρθ } = Tr ρθ exp 7 8 i 1 2 2 = exp (kq + xp) − 2 (σQ k + σP2 x2 ) , (69) 2 where

1 2 = N + σQ , 2

1 σP2 = N + 2

(70)

are the variances of position Q and momentum P , respectively, which satisfy 1 . σQ σP = N + 2 Let us first calculate the SLDs for the Gaussian model (66). From (69), 3 4 7 8 2 σQ i σP2 ∂ i ∂ p − 2 x F x,k {ρ}, q − 2 k F x,k {ρ}. F x,k {ρ} = F x,k {ρ} = ∂x ∂k

entire

December 28, 2004

13:56



275

Then, by using the formulae obtained above, we have i Fx,k {ρ(P − p) + (P − p)ρ}, 2σP2 . i Fx,k {ρP − P ρ} = − 2 Fx,k {ρ(Q − q) + (Q − q)ρ}. 2σQ

F x,k {ρQ − Qρ} = +

(71)

By using (68) and the Baker–Hausdorff formula, i ∂ρ = − (P ρ − ρP ), ∂q

∂ρ i = (Qρ − ρQ). ∂p

Taking their non-commutative Fourier transformations and using (71), we get F x,k {

i ∂ρ 1 } = − F x,k {P ρ − ρP } = 2 F x,k {ρ(Q − q) + (Q − q)ρ}, ∂q 2σQ

F x,k {

i ∂ρ 1 } = Fx,k {Qρ − ρQ} = F x,k {ρ(P − p) + (P − p)ρ}, ∂p 2σP2

yielding LSq =

1 2 (Q − q), σQ

LSp =

1 (P − p). σP2

(72)

We next calculate the RLDs. The next lemma is the key. Lemma 33: i LSj = (I + D)LR j . 2 Proof: By using the definition of the commutation operator (23), we have (A, B)ρ = Tr ρBA∗ =

1 1 Tr ρ(BA∗ + A∗ B) + Tr ρ(BA∗ − A∗ B) 2 2

i = A, (I + D)Bρ . 2 Now, operating an arbitrary bounded X to the identity ∂ρθ 1 = (ρθ LSj + LSj ρθ ) = LR∗ j ρθ j ∂θ 2 from the right, and then taking the trace, we have i R LSj , Xρ = (LR j , X)ρ = (I + D )Lj , Xρ . 2 Since X is arbitrary, we have the lemma.

entire

December 28, 2004

13:56


Akio Fujiwara

276

Now, comparing (71) and the definition of D (23), we have

DQ =

(P − p), σP2

DP = −

2 (Q − q). σQ

(73)

The RLDs for the Gaussian model are then calculated from (72) and (73) together with Lemma 33 as 1 i LR [σ 2 (P − p) + (Q − q)], p = 2 2 σP σQ − 2 /4 Q 2 LR q =

1 i 2 2 − 2 /4 [σP (Q − q) − 2 (P − p)]. σP2 σQ

(74)

The SLD– and the RLD–Fisher informations are readily calculated from (72) and (74) as 7 8 1/σP2 0 JS = , (75) 2 0 1/σQ 8 7 2 1 σQ −i/2 , (76) JR = 2 2 σP σQ − 2 /4 i/2 σP2 the corresponding Cramér–Rao inequalities being 7 2 8 σP 0 V [M ] ≥ (J S )−1 = , 2 0 σQ 7 2 8 σP i/2 R −1 V [M ] ≥ (J ) = . 2 −i/2 σQ

(77) (78)

Then the CR bounds for SLD (15) and RLD (16) becomes, respectively, 2 = CS, gP VP [M ] + gQ VQ [M ] ≥ gP σP2 + gQ σQ √ 2 + gP gQ = C R . gP VP [M ] + gQ VQ [M ] ≥ gP σP2 + gQ σQ

(79) (80)

Obviously C R > C S . Moreover, it is shown that C R is most informative ([32], p. 281 [17]). B. Spin Coherent State In this appendix, we give a comprehensible review of spin coherent states. Throughout this appendix, adjoint operators and complex conjugate numbers are denoted by † and ∗, respectively, according to the convention in physics. A spin coherent state, originally introduced through a mathematical analogy of the canonical coherent state, permits some fashion of definition since it has been published almost at the same time by several authors.

entire

December 28, 2004

13:56



277

Here we adopt a rather natural description∗∗ of it, based on Radcliffe’s pioneering work [30] and Arecchi et al.’s elegant description [4]. Let εabc be the completely anti-symmetric tensor and define the su(2) algebra by (a, b, c ∈ {x, y, z}).

[Ja , Jb ] = iεabc Jc ,

Further, we take a complete orthonormal basis in the irreducible representation of su(2) with highest weight j as the eigenvectors of Jz , i.e. Jz |j, m = m|j, m,

(m = j, j − 1, · · · , −j).

In the following we write |j, m = |m for short. Now, the spin coherent state |µ (µ ∈ C) is defined by |µ = N

−1/2 µJ−

e

|j = N

−1/2

2j 1/2 2j p=0

p

µp |j − p,

(81)

where |j is the highest occupied state, J− = Jx − iJy the spin annihilation operator, N the normalization factor, and the next identity 1/2 p! 2j! p |j − p (82) (J− ) |j = (2j − p)! is used in the second equality. This definition is comparable to that of the canonical coherent state |z, i.e. |z = e−|z|

2

/2 za† −z ∗ a

e

e

|0 = e−|z|

2

/2 za†

e

|0.

From (81), µ|µ = N −1

2j 2j |µ|2p = N −1 (1 + |µ|2 )2j , p p=0

then the normalized spin coherent state becomes |µ =

2j 1/2 2j 1 1 µJ− e |j = µp |j − p. 2 j 2 j (1 + |µ| ) (1 + |µ| ) p=0 p

(83)

We next list up some useful mathematical properties of the spin coherent state, which are straightforward consequences of the representation (83). ∗∗ This

description was adopted by Lieb in [21].

entire

December 28, 2004

13:56


Akio Fujiwara

278

The inner product is (1 + λ∗ µ)2j , (1 + |λ|2 )j (1 + |µ|2 )j 2j |λ − µ|2 . |λ|µ|2 = 1 − (1 + |λ|2 )(1 + |µ|2 ) λ|µ =

(84) (85)

The resolution of identity becomes 6 2j d2 µ 2j + 1 |µµ| = |j − pj − p| = 1. π (1 + |µ|2 )2 p=0

(86)

Further, we have 2µ λ|µ, 1 + λ∗ µ 1 − λ∗ µ λ|Jz |µ = j λ|µ. 1 + λ∗ µ

λ|J+ |µ = j

The following formula is also useful. k l λ|J+ J− |µ

1 = (1 + |λ|2 )j (1 + |µ|2 )j

λ|J− |µ = j

∂ ∂λ∗

k

2λ∗ λ|µ, 1 + λ∗ µ

∂ ∂µ

l

(1 + λ∗ µ)2j .

(87)

Finally, we present an intuitive physical interpretation of spin coherent states. Let (θ, ϕ) the polar coordinate system on a unit sphere in which the north pole corresponds to θ = 0 and the x-axis corresponds to ϕ = 0. The rotation at an angle θ with respect to the axis n = (sin ϕ, − cos ϕ, 0) is then described by the operator R [θ, ϕ] = eiθ(Jx sin ϕ−Jy cos ϕ) = eξJ− −ξ

∗

J+

= eµJ− e− log(1+|µ|

2

)Jz −µ∗ J+

e

.

(88)

Here θ iϕ θ µ = tan eiϕ , (89) e , 2 2 and the third equality in (88) is called the disentangling theorem [4], which corresponds to the Baker–Hausdorff formula in the Heisenberg algebra. By using (88), the spin coherent state can be rewritten as ξ=

|µ = (1 + |µ|2 )−j eµJ− |j = eµJ− e− log(1+|µ|

2

)Jz −µ∗ J+

e

|j = R [θ, ϕ]|j.

This expression indicates that the spin coherent state is given by operating the rotation R [θ, ϕ] to the highest occupied state (north pole) |j. Here µ

entire

December 28, 2004

13:56



279

and (θ, ϕ) are mutually connected by (89). Geometrically, this correspondence is the stereo projection from the south pole to the tangent space at the north pole. In this sense, the spin coherent state is also denoted as |µ = |θ, ϕ. C. Proof of Inequality (7.33) Denote the right-hand side of (60) by f (θ, ϕ), and set b = 1 without loss of generality, i.e., √ ; 1 a + cos 2θ + a + cos 2ϕ , (a > 1). (90) f (θ, ϕ) = sin(θ − ϕ) Extremizing conditions ∂f /∂θ = 0 and ∂f /∂ϕ = 0 lead to ; a + cos 2θ + (a + b cos 2θ)(a + b cos 2ϕ) cot(θ − ϕ) + sin 2θ = 0, (91) and ; a + cos 2ϕ + (a + b cos 2θ)(a + b cos 2ϕ) cot(θ − ϕ) − sin 2ϕ = 0, (92) respectively. Subtracting these two equalities, we have (cos 2θ − cos 2ϕ) cot(θ − ϕ) + sin 2θ + sin 2ϕ = 0, but this is an identity. So the equalities (91) and (92) are equivalent. In other words, (91) is the only extremizing condition for (90). On the other hand, adding the equalities (91) and (92), we have, cos(θ − ϕ) = −

2 cos(θ + ϕ), f 2 (θ, ϕ)

(93)

which is also equivalent to either (91) or (92). We show that, under the condition 0 < θ − ϕ < π, (93) have a unique globally connected solution. Suppose (93) have possibly disconnected solutions (curves) labeled by Cn . Note that f (θ, ϕ) is constant on each Cn since ∂f /∂θ = ∂f /∂ϕ = 0 along the curve. Let us denote the corresponding constant value of 2/f 2 (θ, ϕ) on Cn by An . Furthermore, let us change the coordinate system as θ − ϕ = π/2 − δ, θ + ϕ = π/2 − x. Then ; 1 ; f (θ, ϕ) = fˆ(x, δ) = a + cos(x − δ) + a − cos(x + δ) , (a > 1). cos δ (94) and the extremizing condition (93) becomes π π (95) sin δ = −An sin x, (− < δ < ). 2 2

entire

December 28, 2004

13:56


Akio Fujiwara

280

Therefore, we can regard Cn as the connected component of the solution of (95) which crosses the x-axis at x = nπ (n = 0, ±1, · · · ). Since An is constant on each curve Cn , we can evaluate its value at the x-intercepts as 2 2 √ . = √ 2 ˆ ( a + 1 + a − 1)2 f (nπ, 0)

An =

In particular, An does not depend on n and is less than unity. Therefore, the extremizing condition (95), which is rewritten as 2 √ sin x, sin δ = − √ ( a + 1 + a − 1)2 has a unique globally connected solution 8 7 2 √ sin x . δ = − arcsin √ ( a + 1 + a − 1)2

(96)

In other words, all the Cn ’s are identical with each other. It is evident that fˆ(x, δ) takes its minimum along the curve (96). Therefore, √ √ f (θ, ϕ) ≥ fˆ(0, 0) = a + 1 + a − 1. D. Photon Squeezed State In this appendix, we give a quick review of photon squeezed states. Throughout this appendix, adjoint operators and complex conjugate numbers are denoted by † and ∗, respectively, according to the convention in physics. Let a, a† be the Boson annihilation and creation operators, respectively, which satisfy [a, a† ] = 1. Proposition 34: The following relations hold. (i) (ii)

2

[a† , a] = −2a† ,

[a2 , a† ] = 2a. 1 2 Given ξ ∈ C, define A = (ξa† − ξ ∗ a2 ). Then it satisfies 2 A† = −A,

(iii)

(v)

[A, a† ] = −ξ ∗ a.

Define b = e−A a eA , then b† = e−A a† eA , [b, b† ] = 1 and b = a cosh |ξ| + a†

(iv)

[A, a] = −ξa† ,

†

e−A eza †

−z ∗ a A

∗

ξ sinh |ξ|, |ξ| †

e = ezb †

∗

−z ∗ b

za − z a = βb − β b

a = b cosh |ξ| − b†

ξ sinh |ξ|. |ξ|

β = z cosh |ξ| + z ∗

ξ sinh |ξ|. |ξ|

.

where

entire

December 28, 2004

13:56



281

Proof: (i) is a direct consequence of [a, a† ] = 1 and (ii) is immediately derived from (i). Further, (iii) is derived as 1 e−A a eA = a + [−A, a] + [−A, [−A, a]] + · · · 2! 1 1 = a + ξa† + ξξ ∗ a + ξξ ∗ ξa† + · · · 2! 3! |ξ|2 ξ ξ |ξ|3 = a 1+ + · · · + a† + · · · = a cosh |ξ| + a† sinh |ξ|. |ξ| + 2! |ξ| 3! |ξ| †

(iv) is obtained by a Taylor expansion of eza straightforward consequence of (iii).

−z ∗ a

and using (iii). (v) is a

Let us define the following unitary operators. 1 2 ˆ ξ (z) = exp(zb† − z ∗ b). D(z) = exp(za† − z ∗ a), S(ξ) = exp[ (ξa† − ξ ∗ a2 )], D 2 D(z) and S(ξ) are called, respectively, the shift operator and the squeezing operator. Clearly D† (z) = D−1 (z) = D(−z) etc. hold and the next formulae are derived immediately from the above relations (iii) (iv) and (v). Proposition 35: The following relations hold. (i)

b = S −1 (ξ)a S(ξ) = a cosh |ξ| + a†

(ii)

ˆ ξ (z). S −1 (ξ)D(z)S(ξ) = D

(iii)

ˆ ξ (β), D(z) = D

ξ sinh |ξ|. |ξ|

β = z cosh |ξ| + z ∗

ξ sinh |ξ|. |ξ|

Now let us define the squeezed state. There are mainly two fashions to define the photon squeezed state [5, 33]. Definition 36: (Yuen)

b|βξ = β|βξ .

Definition 37: (Caves) |zξ = D(z)S(ξ)|0. These fashions are related by the following proposition. Proposition 38: The following relations hold. (i) (ii) (iii)

|βξ = S −1 (ξ)|β. ˆ ξ (β)|0 . |βξ = D ξ |βξ = |z(−ξ) .

Proof: (i) bS −1 (ξ)|β = {S −1 (ξ)a S(ξ)}S −1 (ξ)|β = βS −1 (ξ)|β. ˆ ξ (β)S −1 (ξ)|0 = D ˆ ξ (β)|0 . (ii) |βξ = S −1 (ξ)|β = S −1 (ξ)D(β)|0 = D ξ

entire

December 28, 2004

13:56


Akio Fujiwara

282

(iii)

ˆ ξ (β)S −1 (ξ)|0 = |β . |z(−ξ) = D(z)S(−ξ)|0 = D ξ

We can use both expressions of a squeezed state interchangeably through the formula (iii) in Proposition 38, provided that when we write as |zξ = |β(−ξ) , all the ξ in b and β must automatically be changed into −ξ †† . Now, let us calculate some expectational properties of squeezed states. Let the real and imaginary parts of a be Xr , Xi , respectively, and the real and imaginary parts of z be xr , xi , respectively. Introducing µ = cosh |ξ|,

ν=

ξ sinh |ξ|, |ξ|

which satisfy b = µa + νa† ,

|µ|2 − |ν|2 = 1,

β = µz + νz ∗ ,

a = µ∗ b − νb† ,

the expectation of Xr can be calculated as 1 a + a† |β = β|(µ − ν)∗ b + (µ − ν)b† |β 2 2 1 1 ∗ = [(µ − ν) β + (µ − ν)β ∗ ] = (z + z ∗ )(|µ|2 − |ν|2 ) = xr . 2 2

β|Xr |β = β|

In the same way, we have β|Xr |β = xr ,

β|Xi |β = xi .

Further, by using [b, b† ] = 1, 2

= =

=

= †† As

β|(Xr − xr )2 |β = β|Xr2 |β − β|Xr |β 2 a + a† a + a† |β2 β| |β − β| 2 2 1 2 β|[(µ − ν)∗2 b2 + (µ − ν)∗ (µ − ν)(bb† + b† b) + (µ − ν)2 b† ]|β 4 1 2 − [(µ − ν)∗ β + (µ − ν)β ∗ ] 4 1 (µ − ν)∗2 β 2 + |µ − ν|2 (1 + 2|β|2 ) + (µ − ν)2 β ∗2 4 1 − [(µ − ν)∗ β + (µ − ν)β ∗ ]2 4 1 |µ − ν|2 . 4

we shall see later, |zξ and |z(−ξ) are states which have the same center and principal axes but have opposite squeezing ratios.

entire

December 28, 2004

13:56



283

In the same way, we have 1 |µ − ν|2 , β|(Xr − xr )(Xi − xi )|β = 4 1 β|(Xi − xi )2 |β = |µ + ν|2 , β|(Xi − xi )(Xr − xr )|β = 4

β|(Xr − xr )2 |β =

i ∗ (µ ν − µν ∗ + 1), 4 i ∗ (µ ν − µν ∗ − 1). 4

These results indicate that the covariances of a squeezed state are independent of the complex amplitude z. Expressing ξ = seiθ and µ = cosh s, ν = −eiθ sinh s, the covariance matrix of Xr , Xi for the state |zξ = |β(−ξ) becomes 7 8 1 cosh 2s + cos θ sinh 2s sin θ sinh 2s V = sin θ sinh 2s cosh 2s − cos θ sinh 2s 4   θ θ 2 θ 2s 2 θ −2s 2s −2s e + e (e cos cos sin − e ) sin 2 2 2 2 1   =    4 θ 2s 2 θ θ 2s −2s −2s 2 θ e sin +e (e − e ) sin cos cos 2 2 2 2 7 2s 8 θ 1 θ 0 e −1 = O , (97) −2s O 0 e 4 2 2 where, O(ϕ) is the orthogonal (rotational) matrix defined by 7 8 cos ϕ − sin ϕ O(ϕ) = . sin ϕ cos ϕ Thus V has eigenvalues e2s /4 and e−2s /4, and the corresponding eigenvectors are 8 7 8 7 − sin θ2 cos θ2 , , sin θ2 cos θ2 respectively, which is illustrated in Figure 3‡‡ . Moreover, since the corresponding principal axes u and v are related to the original coordinate system (x, y) by 87 8 7 8 7 u x cos θ2 − sin θ2 , = sin θ2 cos θ2 v y 7 87 8 x y 1/a2 0 x ( )2 + ( )2 = 1 ⇐⇒ (x y) = 1, principal lengths of an ellip2 y 0 1/b a b soid are the squared roots of the inverse of the eigenvalues of the corresponding quadratic form. On the other hand, the inverse of covariance matrix determines a quadratic form, as is exemplified by the Gaussian distribution. Thus the squared roots of the eigenvalues of covariance matrix gives the principal lengths.

‡‡ Since

entire

December 28, 2004

13:56


Akio Fujiwara

284

Fig. 3.

Schematic configuration of a squeezed state.

and since the equation of the ellipse centered at the origin becomes u=

1 s e cos φ, 2

v=

1 −s e sin φ, 2

we have

@ θ e−s θ e2s θ e−2s θ es sin sin φ = cos2 + sin2 cos(φ + α), x = cos cos φ − 2 2 2 2 4 2 4 2 @ s −s 2s −2s e θ e θ e θ e θ y = sin cos φ + cos sin φ = sin2 + cos2 cos(φ + β). 2 2 2 2 4 2 4 2 These equations indicate that the widths of orthogonal projections of the ellipse onto the x, y axes are identical to the variances of Xr , Xi , respectively∗ . References [1] S. Abe, “Quantized geometry associated with uncertainty and correlation,” Phys. Rev. A, 48, 4102–4106 (1993). ∗ The

√ covariance matrix of Q, P is 2V , since Xr + iXi = (Q + iP )/ 2.

entire

December 28, 2004

13:56



285

[2] S. Amari, Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, Vol. 28 (Springer, Berlin, 1985). [3] J. Anandan and Y. Aharonov, “Geometry of quantum evolution,” Phys. Rev. Lett., 65, 1697–1700 (1990). [4] F. T. Arecchi, E. Courtens, R. Glimore, and H. Thomas, “Atomic coherent states in quantum optics,” Phys. Rev., 6, 2211–2237 (1972). [5] C. M. Caves, K. S. Thorne, R. W. P. Drever, V. D. Sandberg, and M. Zimmermann, “On the measurement of a weak classical force coupled to a quantum-mechanical oscillator. I. Issue of principle,” Rev. Mod. Phys., 52, 341–392 (1980). [6] A. Fujiwara, “Statistical estimation theory of quantum states,” Master thesis, Department of Mathematical Engineering and Information Physics, Univ. Tokyo (1993) (in Japanese). [7] A. Fujiwara, “One-parameter pure state estimation based on the symmetric logarithmic derivative,” Math. Eng. Tech. Rep. 94-8, Univ. Tokyo (1994): A. Fujiwara and H. Nagaoka, “Quantum Fisher metric and estimation for pure state models,” Phys. Lett. A (submitted)† . [8] A. Fujiwara, “Multi-parameter pure state estimation based on the right logarithmic derivative,” Math. Eng. Tech. Rep., 94-9, Univ. Tokyo (1994). [9] A. Fujiwara, “Linear random measurements of two non-commuting observables,” Math. Eng. Tech. Rep., 94-10, Univ. Tokyo (1994). [10] A. Fujiwara, “Information geometry of quantum states based on the symmetric logarithmic derivative,” Math. Eng. Tech. Rep., 94-11, Univ. Tokyo (1994). [11] R. J. Glauber, “The quantum theory of optical coherence,” Phys. Rev., 130, 2529–2539 (1963). [12] R. J. Glauber, “Coherent and incoherent states of the radiation field,” Phys. Rev., 131, 2766–2788 (1963). [13] H. Hasegawa, “α-divergence of the non-commutative information geometry,” Rep. Math. Phys., 33, 87–93 (1993). [14] C. W. Helstrom, “Minimum mean-square error estimation in quantum statistics,” Phys. Lett., 25A, 101–102 (1967). [15] C. W. Helstrom, Quantum Detection and Estimation Theory (Academic Press, New York, 1976). [16] A. S. Holevo, “Statistical decision theory for quantum systems,” J. Multivariate Anal. 3, 337–394 (1973). [17] A. S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory (North-Holland, Amsterdam, 1982) (in Russian, 1980). [18] J. R. Klauder and B. Skagerstam, Coherent States – Applications in Physics and Mathematical Physics (World Scientific, Singapore, 1985). [19] J. R. Klauder and E. C. G. Sudarshan, Fundamentals of Quantum Optics (Benjamin, New York, 1968). [20] S. Kobayashi and K. Nomizu, Foundations of Differential Geometry, I, II

† Editorial

note: This paper has been published, and reprinted in Chap. 17.

entire

December 28, 2004

286

13:56


Akio Fujiwara

(John Wiley, New York, 1963, 1969). [21] E. H. Lieb, “The classical limit of quantum spin systems,” Commun. Math. Phys., 31, 327–340 (1973). [22] H. Nagaoka, “On Fisher information of quantum statistical models,” In Proceedings of 10th Symposium on Information Theory and Its Applications, 241–246, 1987 (in Japanese). (Chap. 9 of this book). [23] H. Nagaoka, “A new approach to Cramér-Rao bounds for quantum state estimation,” IEICE Technical Report, IT 89-42, 9-14, (1989). (Chap. 8 of this book). [24] H. Nagaoka, “On the parameter estimation problem for quantum statistical models,” In Proceedings of 12th Symposium on Information Theory and Its Applications, 577–582 (1989). (Chap. 10 of this book). [25] H. Nagaoka, “On a mathematical problem of matrices and quantum estimation theory,” In Proceedings of 13th Symposium on Information Theory and Its Applications, 681–686 (1991) (in Japanese). [26] H. Nagaoka, “A generalization of the simultaneous diagonalization of Hermitian matrices and its relation to quantum estimation theory,” Trans. Jap. Soc. Indust. Appl. Math., 1, 43–56 (1991) (in Japanese). (Chap. 11 of this book). [27] H. Nagaoka, unpublished manuscript‡ . [28] D. Petz, “Entropy in Quantum Probability I,” Quantum Probability and Related Topics Vol. VII, 275–297 (World Scientific, 1992). [29] J. P. Provost and G. Vallee, “Riemannian structure on manifolds of quantum states,” Commun. Math. Phys., 76, 289–301 (1980). [30] J. M. Radcliffe, “Some properties of coherent spin states,” J. Phys. A: Gen. Phys., 4, 313–323 (1971). [31] W. K. Wootters, “Statistical distance and Hilbert space,” Phys. Rev. D, 23, 357–362 (1981). [32] H. P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of non-selfadjoint observables,” IEEE Trans. Inform. Theory, 19, 740–750 (1973). [33] H. P. Yuen, “Two-photon coherent states of the radiation field,” Phys. Rev. A, 13, 2226–2243 (1976).

‡ Additional

author’s note: An extended version of this manuscript is available: H. Nagaoka and A. Fujiwara, “Autoparallelity of a quantum statistical manifold,” Technical Report UEC-IS-2000-4, Grad. School of Inform. Syst., The Univ. of ElectroCommunications, 2000.

entire

December 28, 2004

13:56


CHAPTER 19

An Estimation Theoretical Characterization of Coherent States Akio Fujiwara and Hiroshi Nagaoka Abstract. We introduce a class of quantum pure state models called the coherent models. A coherent model is an even dimensional manifold of pure states whose tangent space is characterized by a symplectic structure. In a rigorous framework of noncommutative statistics, it is shown that a coherent model inherits and expands the original spirit of the minimum uncertainty property of coherent states.

1. Introduction Quantum estimation theory, originated in optical communications, offers a rigorous approach toward the optimization of detection processes in quantum communication systems [7, 8]. It aims to find, for a given smooth parametric family of density operators (a model) P = {ρθ ; θ = (θ1 , . . . , θn ) ∈ Θ ⊂ Rn }, the optimum measurement (positive operator-valued measure) M = {M (B) ; B is a Borel set in Rn } for the parameter θ under the unbiasedness condition: For all θ ∈ Θ, 6 ˆ = θj , j = 1, . . . , n. θˆj Tr ρθ M (dθ) Here Tr denotes the operator trace. Normally a more tractable (weaker) condition is adopted, called the local unbiasedness condition: A measurement M is called locally unbiased at a given point θ if M satisfies at θ the above equality and its formal differentiation 6 ∂ ˆ = δj , i, j = 1, . . . , n. θˆj Tr ρθ M (dθ) i ∂θi It is well-known that when n = 1, the quantum Cramér-Rao inequality with respect to the symmetric logarithmic derivative (SLD) offers the achievable lower bound (i.e., the bound attained by a certain measurement) of the variance of estimation. This is also regarded as a rigorous modification c 1999 AIP Publishing. Reprinted, with permission, from J. Math. Phys., 40, 4227–4239, 1999. 287

entire

December 28, 2004

288

13:56



of the uncertainty relation. When n ≥ 2, on the other hand, a matrix version of the SLD Cramér-Rao inequality itself does not always have an absolute significance because the lower bound cannot be attained in general unless the model has commutative SLDs. We therefore often deal with the minimization problem of the scalar quantity tr GVθ [M ] with respect to M , where tr denotes the matrix trace on the parameter space Θ, G a real symmetric positive matrix representing the weight, and Vθ [M ] the covariance matrix at θ with respect to a locally unbiased measurement M whose (i, j) entry is 6 ij ˆ (Vθ [M ]) = (θî − θi )(θˆj − θj )Tr ρθ M (dθ). If there is a number C such that tr GVθ [M ] ≥ C holds for all M , C is called a Cramér-Rao type bound or simply a CR bound. The CR bound C may depend on both G and θ. The problem of finding the achievable CR bound is in general a hard one and has been solved only in a few special models such as the quantum gaussian model [16, 8] and the two-dimensional spin 1/2 model [12, 13]. Holevo showed that if a model having the right logarithmic derivative (RLD) exhibits a certain “nice” property of a tangent space, the CR bound based on the RLD is expressed only in terms of the SLDs (p.280 [8]). Moreover it was shown that this gives the achievable CR bound for the gaussian model of quantum oscillators. Motivated by these facts and that the SLD Fisher information is well-defined also for pure state models [4], we will introduce a class of pure state models called the coherent models [5] each having a “nice” tangent space, and will explore their parameter estimation theory. The construction of the paper is as follows. In Section 2, we explore some basic characteristics inherent in pure state models which are closely related with Holevo’s commutation operator. In Section 3, a special class of pure state models, called the coherent models, is introduced of which the SLD tangent space forms an invariant subspace with respect to the commutation operator. In Section 4, we derive a CR bound, called the generalized RLD bound, for a model that has an invariant SLD tangent space with respect to the commutation operator. Here the model is not assumed to be pure. In Section 5, we show that for a coherent model, there exists a random measurement which attains the generalized RLD bound. In Section 6, the above results are demonstrated in two simple coherent models: a canonical squeezed state model and a spin coherent state model. In the final section

entire

December 28, 2004

13:56


An Estimation Theoretical Characterization of Coherent States

289

we give conclusions. 2. Commutation Operator In the study of noncommutative statistics, Holevo introduced useful mathematical tools called the square summable operators and the commutation operators associated with quantum states. We here give a brief summary: for details, consult [8]. Let H be a separable complex Hilbert space which corresponds to a physical system of interest, and let ρ be a fixed density operator. We define a real Hilbert space L2h (ρ) associated with ρ by the completion of Bh (H), the set of bounded self-adjoint operators, with respect to the pre-inner product X, Y ρ = Re Tr ρXY . Letting ρ = j sj |ψj ψj | be the spectral representation, an element X ∈ L2h (ρ) can be regarded as an equivalence class of such self-adjoint operators (called square summable operators) satisfying j sj Xψj 2 < ∞ (so that ψj ∈ Dom(X) if sj "= 0) under the identification X1 ∼ X2 if X1 ψj = X2 ψj for sj "= 0. The space L2h (ρ) thus provides a convenient tool to cope with unbounded observables. Let L2 (ρ) be the complexification of L2h (ρ). Note that L2 (ρ) is also regarded as the completion of B(H), the set of bounded operators, with respect to the pre-inner product X, Y ρ =

1 Tr ρ (Y X ∗ + X ∗ Y ). 2

Thus L2 (ρ) is regarded as a complex Hilbert space with the inner product ·, ·ρ . We further introduce two sesquilinear forms on B(H) by (X, Y )ρ = Tr ρ Y X ∗ ,

[X, Y ]ρ =

1 Tr ρ (Y X ∗ − X ∗ Y ), 2i

and extend them to L2 (ρ) by continuity. The commutation operator Dρ : L2 (ρ) → L2 (ρ) with respect to ρ is defined by [X, Y ]ρ = X, Dρ Y ρ , which is formally represented by the operator equation ρ(Dρ X) + (Dρ X)ρ = 1i (ρX − Xρ). (To be precise, this definition is different from Holevo’s original definition by a factor of 2.) The operator Dρ is a complex-linear bounded skew-adjoint operator. Moreover, since the forms [·, ·]ρ and ·, ·ρ are real on the real subspace L2h (ρ), this subspace is invariant under the operation of Dρ . Thus Dρ can also be regarded as a real-linear bounded skew-adjoint operator when restricted to L2h (ρ) as Dρ : L2h (ρ) → L2h (ρ). Our main concern lies in the case where ρ is pure. In this case the above setting is considerably simplified as follows: Let ρ = |ψψ|. Then for

entire

December 28, 2004

13:56

290



X, Y ∈ L2 (ρ), 1 {Y ∗ ψ|X ∗ ψ + Xψ|Y ψ}, 2 1 [X, Y ]ρ = {Y ∗ ψ|X ∗ ψ − Xψ|Y ψ}, 2i (X, Y )ρ = Y ∗ ψ|X ∗ ψ.

X, Y ρ =

Here Xψ, for example, stands for the vector X1 ψ where X1 is an arbitrary representative of X. (It is independent of the choice of a representative.) In particular, if X, Y ∈ L2h (ρ) we have X, Y ρ = Re Y ψ|Xψ = Re Xψ|Y ψ,

(1)

[X, Y ]ρ = Im Y ψ|Xψ = −Im Xψ|Y ψ,

(2)

(X, Y )ρ = Y ψ|Xψ = Xψ|Y ψ.

(3)

It should be noted that operators X and Y (whether bounded or not) are identified with each other in L2 (ρ) iff Xψ = Y ψ and X ∗ ψ = Y ∗ ψ. In particular, self-adjoint operators X and Y are identified in L2h (ρ) iff Xψ = Y ψ. Lemma 1: Let ρ = |ψψ|. Then for all X ∈ L2h (ρ), (Dρ X)ψ = i(X − ψ|XψI)ψ, where I denotes the identity in L2h (ρ). Proof: For X ∈ L2h (ρ), let Z be the element in L2h (ρ) having a representative Z1 = i(|Xψψ| − |ψXψ|). Then Zψ = i(X − ψ|XψI)ψ. On the other hand, for Y ∈ L2h (ρ), we have Y ψ|Zψ = i{Y ψ|Xψ − ψ|Xψψ|Y ψ}, and hence Y, Zρ = [Y, X]ρ because of (1) and (2). Thus Z = Dρ X, which completes the proof. Note that Lemma 1 does not imply Dρ X = i(X − ψ|XψI), since the right hand side is not a self-adjoint element in L2 (ρ) unless it equals 0. Let us introduce a linear subspace Th (ρ) = {X ∈ L2h (ρ) ; I, Xρ = 0} of L2h (ρ). Here ρ is not necessarily pure. This subspace is itself a real Hilbert space with the inner product ·, ·ρ . Now consider again the special case

entire

December 28, 2004

13:56



291

that ρ is pure: ρ = |ψψ|. Then from Lemma 1, we obtain the important relation: (Dρ X)ψ = (iX)ψ,

X ∈ Th (ρ).

(4)

This equation, combined with (1), implies that Dρ is a unitary transformation on (Th (ρ), ·, ·ρ ). In particular, Dρ is nondegenerate on Th (ρ), and so is the skew-symmetric bilinear form [·, ·]ρ . In other words, the real linear space Th (ρ) is regarded as a symplectic space [6] with the symplectic form [·, ·]ρ . We also note that Dρ2 = −I on Th (ρ) (I denotes the identity operator acting on Th (ρ)), since Dρ is unitary and skew-adjoint. Indeed, equation (4) immediately leads to (Dρ2 X)ψ = −Xψ and hence Dρ2 X = −X for all X ∈ Th (ρ), whereas Dρ X "= iX as mentioned earlier. In other words, Dρ is an almost complex structure on Th (ρ). 3. Coherent Model Let P = {ρθ ; θ = (θ1 , . . . , θn ) ∈ Θ} be an n-dimensional model, where ρθ are not necessarily pure for the present, and Θ is an open subset of Rn . We assume the following regularity conditions: (a) The parametrization θ )→ ρθ is assumed to be appropriately smooth and nondegenerate so that the derivatives {∂ρθ /∂θj }nj=1 exist in trace-class and form a linearly independent set at each point θ. (b) There exists a constant c such that 2 ∂ ∂θj Tr ρθ X ≤ cX, Xρθ for all X ∈ B(H) and j. From the condition (b), the linear functionals X )→ (∂/∂θj )Tr ρθ X can be extended to continuous linear functionals on L2 (ρθ ). Given a model that satisfies (a) and (b), the symmetric logarithmic derivative (SLD) LSθ,j in the jth direction is defined by the requirement that ∂ Tr ρθ X = LSθ,j , Xρθ , LSθ,j ∈ L2 (ρθ ) ∂θj for all X ∈ L2 (ρθ ). It is easily verified that LSθ,j ∈ L2h (ρθ ); so the definition is formally written as ∂ρθ /∂θj = 12 (LSθ,j ρθ + ρθ LSθ,j ). The SLDs belong to Th (ρθ ) since I, LSθ,j ρθ = (∂/∂θj )Tr ρθ = 0, and the SLD Fisher informa tion matrix defined by JθS = LSθ,j , LSθ,k ρθ gives a Cramér-Rao inequality

entire

December 28, 2004

292

13:56



Vθ [M ] ≥ (JθS )−1 , where M is an arbitrary locally unbiased measurement for the parameter θ, see p. 276 [8]. In the rest of this section, we restrict ourselves to pure state models. Some remarks are in order. First, by differentiating the identity ρ2θ = ρθ , we see that the element in L2h (ρθ ) having a representative 2∂ρθ /∂θj gives the SLD LSθ,j . Thus, for a pure state model, the condition (a) implies (b). Second, associated with a pure state model {ρθ ; θ ∈ Θ} is, at least locally, a smooth family {ψθ ; θ ∈ Θ} of normalized vectors in H such that ρθ = |ψθ ψθ |. In what follows, we shall frequently use this representation. A convenient way of finding SLDs for a pure state model ρθ is as follows: Let LA θ,j be the anti-symmetric logarithmic derivative (ALD) satisfying ∂ Tr ρθ X = [LA θ,j , X]ρθ , ∂θj

LA θ,j ∈ Th (ρθ )

A for all X ∈ L2 (ρθ ), or formally ∂ρθ /∂θj = (LA θ,j ρθ − ρθ Lθ,j )/2i. (This definition is different from [4] by a factor of i.) Then the SLD is given by S A LSθ,j = −Dθ LA θ,j where Dθ = Dρθ , since Lθ,j , Xρθ = [Lθ,j , X]ρθ . Note that 2 A S since Dθ = −I on Th (ρθ ), then Lθ,j = Dθ Lθ,j , which assures the existence and the uniqueness of the ALD for a pure state model. The advantage of the use of the ALD is this: Every pure state model can be expressed in the form ρθ = Uθ ρ0 Uθ∗ where {Uθ }θ is a smooth family of unitary operators (which do not necessarily form a group representation), so that the ALD is explicitly given by

LA θ,j = 2i(Aθ,j − I, Aθ,j ρθ ), where Aθ,j is the skew-adjoint element in L2 (ρθ ) having a representative (∂Uθ /∂θj )Uθ∗ , the local generator of Uθ . For a group covariant pure state model, the generator of the group is usually obvious. Let TθS (P) be the real-linear subspace of Th (ρθ ) spanned by the SLDs {LSθ,j }j . Since the tangent vectors of the manifold P at the point ρθ are faithfully represented by the elements of TθS (P) via the correspondence (∂/∂θj )θ )→ LSθ,j , we call TθS (P) the SLD tangent space of the model P at θ. A pure state model P = {ρθ ; θ ∈ Θ} is called locally coherent at θ if TθS (P) is Dθ -invariant. The model is called coherent if it is locally coherent for all θ ∈ Θ. When the Hilbert space H is finite-dimensional, the totality of pure states forms a complex projective space and is an example of coherent model. The Riemannian metric on the model induced by the SLD Fisher information matrix JθS is identical to the Fubini-Study metric up to a con-

entire

December 28, 2004

13:56



293

stant factor [4] and hence is a K¨ ahler metric. The associated fundamental 2-form [10] in this case is nothing but the symplectic structure [·, ·]ρ . ∗ Theorem 2: Consider a pure state model of the form ρθ = Ug(θ) ρ0 Ug(θ) where {Ug ; g ∈ G} is a projective unitary representation of a Lie group G and g(·) : θ )→ g(θ) is the parametrization of the elements of G by a local coordinate system satisfying g(0) = e (: the unit element). This model is coherent iff it is locally coherent at ρ0 .

Proof: We only need to prove the if part. Let Λθ : G → G be the left translation by g(θ)−1 which maps h )→ g(θ)−1 h. Then its differential by a nonsingular real matrix (dΛθ )g(θ) : Tg(θ) (G) → Te (G) % is represented & akj (θ) such that (dΛθ )g(θ) [∂g(θ)/∂θj ]θ = k akj (θ) [∂g(θ)/∂θk ]θ=0 . Now ∗ , where Λθ (g(θ + ∆θ)) = g(∆θ ), we find that since ρθ+∆θ = Ug(θ) ρ∆θ Ug(θ) k j ∂ρθ /∂θ = k aj (θ) Uθ [∂ρθ /∂θk ]θ=0 Uθ∗ . This implies that the SLDs at θ are given by LSθ,j = i akj (θ) Uθ LS0,k Uθ∗ . As a consequence LSθ,j ψθ =

akj (θ) Uθ LS0,k ψ0 .

(5)

k

Here we have set as ρθ = |ψθ ψθ | with ψθ = Uθ ψ0 . Now suppose P is locally coherent at ρ0 . Then the vector (D0 LS0,k )ψ0 = iLS0,k ψ0 (see (4)) belongs to the real linear span of {LS0,k ψ0 }nk =1 ; hence the vector (Dθ LSθ,j )ψθ = iLSθ,j ψθ belongs to the real linear span of {LSθ,j ψθ }nj =1 because of (5) and the nonsingularity of the matrix akj (θ). This implies that P is locally coherent at every point θ. It is clear from the definition that if P is locally coherent at θ, then TθS (P) forms a symplectic space with the symplectic form being the restriction of [·, ·]ρθ . In particular, the dimensionality of TθS (P) is necessarily ˜ S }2m satisfying even (say n = 2m), and it has a symplectic basis {L θ,j j=1

˜ Sθ,j , L ˜ Sθ,k ]ρ [L θ

  −1, if j is odd and k = j + 1 = 1, if j is even and k = j − 1  0, otherwise.

Furthermore, since Dθ is unitary on TθS (P) with respect to the inner prod˜ S } to be orthonormal. Such a basis, which we uct ·, ·ρθ , we can take {L θ,j

entire

December 28, 2004

13:56



294

shall call a normalized ρθ -symplectic basis, satisfies  ˜S      ˜S Lθ,1 Lθ,1 0 1  L   −1 0   L ˜S ˜S θ,2  θ,2      L      S S ˜ ˜ 0 1      Lθ,3   ˜ θ,3      ˜S −1 0 LSθ,4  =   L Dθ  θ,4     .  .. .. ..      .       . .  S     S ˜ ˜  Lθ,2m−1   0 1   Lθ,2m−1  ˜S ˜S −1 0 L L θ,2m

(6)

θ,2m

Thus the SLD tangent space of a coherent model is decomposed into twodimensional Dθ -invariant subspaces. This suggests the importance of studying two-dimensional coherent models. Now, let us characterize a two-dimensional coherent model. Theorem 3: For a two-dimensional pure state model P = {ρθ = |ψθ ψθ | ; θ ∈ Θ}, the following three conditions are equivalent. (α) P is locally coherent at θ. (β) LSθ,1 ψθ and LSθ,2 ψθ are linearly dependent. A (γ) LA θ,1 ψθ and Lθ,2 ψθ are linearly dependent.

Before going to the proof, we should remark that the condition (β) does not conflict with the fact that LSθ,1 and LSθ,2 are linearly independent due to the nondegeneracy of the parametrization θ )→ ρθ . Indeed, the linear independence of {LSθ,1, LSθ,2 } is concerned with the real linear structure of L2h (ρθ ) and is equivalent to the real linear independence of {LSθ,1ψθ , LSθ,2 ψθ }. On the other hand, the condition (β) asserts the complex linear dependence of the same vectors. Proof: The proof relies essentially on (4). We only need to show that (α)⇔(β), since (β)⇔(γ) is obvious from the identity LSθ,j ψθ = A S −(Dθ LA θ,j )ψθ = −iLθ,j ψθ . Let ϕj := Lθ,j ψθ , and assume (α) first. Then there exist real numbers x, y such that Dθ LSθ,1 = xLSθ,1 + yLSθ,2 . This is equivalent to iϕ1 = xϕ1 + yϕ2 and leads to (β). Assume (β) in turn. Recalling the real linear independence of {ϕ1 , ϕ2 }, we see that there exist real numbers x, y satisfying ϕ2 = (x + iy)ϕ1 with y "= 0. It then follows that iϕ1 = −(x/y)ϕ1 + (1/y)ϕ2 and Dθ LSθ,1 = −(x/y)LSθ,1 + (1/y)LSθ,2. Similarly Dθ LSθ,2 is shown to be a real linear combination of {LSθ,1, LSθ,2 } and thus (α) is verified. The following corollary, whose proof is now straightforward from Theorem 3 and (4), offers a mostly useful method to treat group covariant

entire

December 28, 2004

13:56



295

coherent models as exemplifed in Section 6. Moreover the equation (7) in the corollary reveals a close connection with the conventional definition of coherent states [9]. Indeed, this fact gave a motive for the nomenclature of the coherent model. Corollary 4: Let P = {ρθ = |ψθ ψθ |; θ ∈ Θ} be a two-dimensional pure A state model and let TθA (P) be the real linear span of ALDs {LA θ,1 , Lθ,2 } at θ. Then P is locally coherent at θ iff there exist nonzero elements X1 , X2 in TθA (P) satisfying (X1 + iX2 )ψθ = 0.

(7)

Moreover, (7) is also necessary and sufficient for {cXj }j=1,2 to form a normalized ρθ -symplectic basis of TθS (P) (= TθA (P)) with a common normalizing constant c. Under the condition (7), the linear relations LA θ,1 = c11 X1 + c12 X2 ,

LA θ,2 = c21 X1 + c22 X2

LSθ,1 = c12 X1 − c11 X2 ,

LSθ,2 = c22 X1 − c21 X2 .

imply

4. Generalized RLD Bound Throughout this section we consider an n-dimensional model P = {ρθ } of general (i.e., not necessarily pure) states satisfying the regularity conditions (a) and (b) presented in Section 3. Let L2+ (ρ) denote the completion of B(H) with respect to the pre-inner product (·, ·)ρ . Since (X, X)ρ ≤ 2X, Xρ , then L2 (ρ) ⊂ L2+ (ρ). The right logarithmic derivative (RLD) LR θ,j in the jth direction of a model P = {ρθ }, when it exists, is defined by the requirement that ∂ 2 Tr ρθ X = (LR LR θ,j , X)ρθ , θ,j ∈ L+ (ρθ ) ∂θj ∗ R for all X ∈ L2+ (ρθ ), or formally ∂ρθ /∂θj = (LR θ,j ) ρθ = ρθ Lθ,j . The covariance matrix of an arbitrary locally unbiased estimator M is then bounded from below as

where JθR

(8) Vθ [M ] ≥ (JθR )−1 , R = (LR θ,j , Lθ,k )ρθ is the RLD Fisher information matrix (p. 279

[8]). When a real positive definite matrix G is specified as the weight for the estimation accuracy, the total deviation is bounded from below as tr GVθ [M ] ≥ C R := tr G Re (JθR )−1 + tr abs G Im (JθR )−1 ,

(9)

entire

December 28, 2004

296

13:56



where tr abs A denotes the absolute sum of the eigenvalues of matrix A, see p. 284 [8]. The RLD thus gives a CR bound and plays a crucial role in optical communication theory [16, 8]. The RLD exists iff there is a constant c such that 2 ∂ (10) ∂θj Tr ρθ X ≤ c(X, X)ρθ for all X ∈ B(H). Thus the RLD does not always exist for a model satisfying the weaker condition (b). In particular it never exists for a pure state model ρθ = |ψθ ψθ |. To see this, let us fix a θ arbitrarily and take a vector x ∈ H such that ψθ |x = 0 and ∂ψθ /∂θj |x "= 0. (This is indeed possible because ψθ and ∂ψθ /∂θj are linearly independent owing to (∂/∂θj )ψθ |ψθ = 0 and (∂/∂θj )|ψθ ψθ | "= 0.) Then X = |xψθ | satisfies (X, X)ρθ = 0 and (∂/∂θj )Tr ρθ X "= 0. It is, however, important to notice that what is needed in estimation theory is not the RLD itself but the inverse of the RLD Fisher information matrix, as indicated by (8) and (9). In his book (p. 280 [8]), Holevo has shown that when a model satisfying the regularity conditions (a) and (10) has a Dθ -invariant SLD tangent space, the (JθR )−1 is expressed only in terms of SLDs; so is the CR bound (9). We generalize this result to a wider class of models that satisfy only the weaker conditions (a) and (b). Theorem 5: Suppose we are given an n-dimensional model P = {ρθ } having a Dθ -invariant SLD tangent space TθS (P). Then for all locally unbiased measurements M at θ, % &−1 % &−1 % &−1 + i JθS Dθ JθS , Vθ [M ] ≥ JθS where Dθ = [LSθ,j , LSθ,k ]ρθ . Proof: Let us introduce a family of inner products on L2 (ρθ ) having a parameter ε ∈ (0, 1]: (X, Y )(ε) ρθ = (1 − ε)(X, Y )ρθ + εX, Y ρθ . Since εX, Xρθ ≤ (X, X)(ε) ρθ ≤ (2 − ε)X, Xρθ , (ε)

there exists, for each ε, a unique operator Lθ,j ∈ L2 (ρθ ) which satisfies ∂ (ε) Tr ρθ X = (Lθ,j , X)(ε) ρθ ∂θj

entire

December 28, 2004

13:56



297

for all X ∈ L2 (ρθ ). Then in a quite similar way to the derivation of the quantum Cramér-Rao inequality, we have −1

(ε) (ε) (ε) (ε) (11) , Jθ = (Lθ,j , Lθ,k )(ε) Vθ [M ] ≥ Jθ ρθ . (ε)

Now observing the identity (X, Y )ρθ = X, Y ρθ + i(1 − ε)[X, Y ]ρθ , and using the definition of Dρθ = Dθ , we see that for all Y ∈ L2 (ρθ ), ∂ (ε) (ε) Tr ρθ Y = LSθ,j , Y ρθ = (Lθ,j , Y )(ε) ρθ = {I + i(1 − ε)Dθ }Lθ,j , Y ρθ . ∂θj (ε)

(ε)

(ε) (ε)

Then LSθ,j = {I + i(1 − ε)Dθ }Lθ,j , hence (Lθ,j , Lθ,k )ρθ = LSθ,j , {I + i(1 − ε)Dθ }−1 LSθ,k ρθ . Let us introduce Dirac’s notation |LSθ,j for the Hilbert space (L2 (ρθ ), ·, ·ρθ ), and let Γθ := |LSθ,1 , · · · , |LSθ,n . Then Γ∗θ Γθ = JθS and Γ∗θ Dθ Γθ = Dθ . And the matrix Jθ can be written in the form (ε) Jθ = Γ∗θ {I + i(1 − ε)Dθ }−1 Γθ . Thus from the assumption that TθS (P) is (ε) Dθ -invariant, the inverse of Jθ is explicitly given by

−1 % &−1 % &−1 (ε) Jθ = JθS Γ∗θ {I + i(1 − ε)Dθ } Γθ JθS % &−1 % &−1 % &−1 = JθS + i(1 − ε) JθS Dθ JθS . (12) (ε)

Combining (11) and (12), and taking the limit ε ↓ 0, we have the theorem. Theorem 5 asserts that even for a model that does not have the RLDs, (ε) the limε↓0 (Jθ )−1 indeed gives a generalization of (JθR )−1 as long as the SLD tangent space is Dθ -invariant. Then by using Theorem 5 and an analogous argument to the derivation of (9), we obtain the CR bound % &−1 % &−1 % &−1 C R = tr G JθS + tr abs G JθS Dθ JθS , (13) for models each having a Dθ -invariant SLD tangent space TθS (P). This may be called a generalized RLD bound. We will show in the next section that this bound is achievable in a coherent model. 5. Optimal Estimation for Two-Dimensional Coherent Models We now proceed to a parameter estimation for a pure coherent model. In particular, taking into account the symplectic structure (6) of the SLD tangent space, we restrict ourselves to a two-dimensional case. We note that as long as we are concerned with the achievable CR bound at each point on the model {ρθ }, we can take the weight as G = I without loss of generality. In

entire

December 28, 2004

298

13:56



fact, let M be a locally unbiased measurement for the parameter θ = (θ1 , θ2 ) ˆ be the corresponding joint distribution. and let p(θˆ1 , θˆ2 )dθˆ = Tr ρθ M (dθ) The coordinate transformation η i = j hij θj , where H = [hij ] is a real regular matrix, then induces another measurement N (dˆ η ) which corresponds η = p(θˆ1 , θˆ2 )dθˆ and is locally unbiased for to the joint distribution q(ˆ η 1 , ηˆ2 )dˆ the parameter η = (η 1 , η 2 ). In this case, tr Vη [N ] = tr (tHH)Vθ [M ]. Thus the parameter estimation for θ with the weight G = tHH is equivalent to that for η with the weight I. Now suppose we are given a two-dimensional coherent model P = {ρθ ; θ = (θ1 , θ2 ) ∈ Θ}. Let {Li } be the dual basis of the SLDs: Li = j J ij LSθ,j with J ij being the (i, j) entry of (JθS )−1 . Then 3 4 % S &−1 L1 , L1 ρθ L1 , L2 ρθ Jθ = L2 , L1 ρθ L2 , L2 ρθ and % S &−1 % &−1 Jθ Dθ JθS =

3

0 2 1 L ,L ρ

θ

L1 , L2 0

4

ρθ

Thus the generalized RLD bound (13) for G = I can be rewritten in the form (14) C R = L1 , L1 ρθ + L2 , L2 ρθ + 2 [L1 , L2 ]ρθ . We will show that the bound C R is achievable. In what follows, we fix a θ = (θ1 , θ2 ) arbitrarily. Let us consider a random measurement as follows. We first introduce a linear transformation φ : TθS (P) −→ TθS (P) by φ(X) = L1 , Xρθ L1 + L2 , Xρθ L2 . Since φ is symmetric and positive definite, it has positive eigenvalues λ1 , λ2 , and mutually orthogonal unit eigenvectors A1 , A2 satisfying φ(Aν ) = λν Aν , ν = 1, 2. We next take positive numbers p1 , p2 satisfying p1 + p2 = 1. Now letting 6 ν = 1, 2 ξEν (dξ), be the spectral decompositions of arbitrarily fixed representatives of Aν , we define a generalized measurement M (ν, dξ) = pν Eν (dξ).

entire

December 28, 2004

13:56



299

This has the following physical interpretation: Select one of the two “observables” A1 , A2 according to the probability p1 , p2 , respectively, and measure it in a usual sense. Now suppose we have selected Aν and have obtained an outcome ξ. We identify this result with a pair of real quantities ξ θî (ν, ξ) = θi + Li , Aν ρθ , pν

i = 1, 2.

The pair {θî (ν, ξ)}i=1,2 satisfies the local unbiasedness condition at θ: 2 6 ν=1 2 6 ν=1

θî (ν, ξ) Tr ρθ M (ν, dξ) = θi , ∂ θî (ν, ξ) j Tr ρθ M (ν, dξ) = δji , ∂θ

i = 1, 2

i, j = 1, 2.

(15)

(16)

To prove (15), we used the fact that Aν ∈ TθS (P), i.e., I, Aν ρθ = 0. To prove (16), observe that 6 ∂ ξ j Tr ρθ Eν (dξ) = LSθ,j , Aν ρθ , ∂θ so that the left-hand side of (16) becomes 2

Li , Aν ρθ LSθ,j , Aν ρθ = Li , LSθ,j ρθ = δji .

ν=1

With this measurement M , tr Vθ [M ] = =

2 6 7

θˆ1 (ν, ξ) − θ1

2

2 8

+ θˆ2 (ν, ξ) − θ2 Tr ρθ M (ν, dξ)

ν=1 2

1 1 2 2 L , Aν ρθ + L2 , Aν ρθ . p ν=1 ν

In the second equality, we used the fact that 6 ξ 2 Tr ρθ Eν (dξ) = Aν , Aν ρθ = 1. √ √ Since, for given µ1 , µ2 > 0, µ1 /p1 + µ2 /p2 takes the minimum ( µ1 + µ2 )2

entire

December 28, 2004

13:56



300

at pν =

√ √ √ µν /( µ1 + µ2 ), we see

min tr Vθ [M ] {pν }

=

82 79 9 2 2 2 2 L1 , A1 ρθ + L2 , A1 ρθ + L1 , A2 ρθ + L2 , A2 ρθ

9

A1 , φ(A1 )ρθ + ; ; 2 = λ1 + λ2 =

2 9 A2 , φ(A2 )ρθ

= L1 , L1 ρθ + L2 , L2 ρθ + 2

9 L1 , L1 ρθ L2 , L2 ρθ − L1 , L2 2ρθ .(17)

The last equality follows from the fact that the trace λ1 + λ2 and the determinant λ1 λ2 of the linear transformation φ are independent of the choice of the basis which represents φ in a matrix form. The random measurement presented above was first introduced in [13] by one of the present authors. In that paper, it was also shown that the problem of finding the achievable CR bound for an arbitrary two-parameter faithful spin 1/2 model can be reduced to an easy minimization problem. Interestingly, the explicit solution of the minimization problem, i.e., the achievable CR bound, turns out to be identical to the quantity (17), although the model treated there is not pure nor has in general a Dρ -invariant tangent space. Now we establish the relation between (14) and (17) for a coherent model. Theorem 6: For a two-dimensional coherent model {ρθ = |ψθ ψθ |}, the lower bound (14) is identical to (17). In other words, the generalized RLD bound (14) is achievable. Proof: By Theorem 3, L1 ψθ and L2 ψθ are linearly dependent. Therefore 7 1 8 L ψθ |L1 ψθ L1 ψθ |L2 ψθ det = 0, L2 ψθ |L1 ψθ L2 ψθ |L2 ψθ which leads to (Im L1 ψθ |L2 ψθ )2 = L1 ψθ |L1 ψθ L2 ψθ |L2 ψθ − (Re L1 ψθ |L2 ψθ )2 . By (1) and (2), this can be read as 1 2 2 [L , L ]ρ = L1 , L1 L2 , L2 − L1 , L2 2 , θ ρθ ρθ ρθ which proves the theorem.

entire

December 28, 2004

13:56



301

It should be noted that a more convincing result has been obtained by Matsumoto [11]. He proved that the CR bound (13) is achievable for a 2m-dimensional coherent model with an arbitrary weight G. It is also worth noting that the achievability of (14) is closely related to the Heisenberg uncertainty relation. By a coordinate transformation, we can assume that the SLD Fisher information matrix is diagonal at a fixed ρθ = |ψθ ψθ |. Then there exist nonzero real numbers c1 , c2 and normalized ˜ S . Then Lj = L ˜S, L ˜ S } such that LS = cj L ˜ S /cj , and ρθ -symplectic basis {L 1 2 j j j by (7) (c1 L1 + ic2 L2 )ψθ = 0. This is nothing but the equality condition for the Heisenberg uncertainty relation. So we have 2 L1 , L1 ρθ L2 , L2 ρθ = [L1 , L2 ]ρθ . This equation, combined with the assumption that L1 , L2 ρθ = 0, gives another proof of Theorem 6 for an orthogonal parametrization at ρθ . 6. Examples In this section we calculate the achievable CR bounds for canonical and spin coherent models. Throughout this section, adjoint operators and complex conjugate numbers are denoted by † and ∗, respectively, according to the convention in physics. Also we use the same letter for both a square summable operator and the corresponding element in L2h (ρ). 6.1. Canonical squeezed state model The canonical squeezed state [2, 15] is defined by ρq,p = D(q, p)|0ξξ 0|D† (q, p),

(q, p ∈ R),

† ∗ where D(q, √ p) = exp(za †− z a) denotes the shift operator with z = (q + ip)/ 2, and a and a are annihilation and creation operators, re√ 2 spectively, with a = (Q + iP )/ 2. Further |0ξ = exp[(ξa† − ξ ∗ a2 )/2]|0 is the squeezed vacuum with |0 the Fock vacuum, and ξ a complex number which represents the squeezing ratio: let ξ = seiθ . Comparing the identity b|zξ = β|zξ with Corollary 4, where |zξ = D(q, p)|0ξ , b = a cosh s − a† eiθ sinh s, and β = z cosh s − z ∗ eiθ sinh s, we

entire

December 28, 2004

13:56



302

see that ρq,p is a two-dimensional coherent model, and a normalized ρq,p symplectic basis is given by √ ˜ S = 2[(Q − qI)(cosh s − cos θ sinh s) − (P − pI) sin θ sinh s], L 1 √ ˜ S = 2[(P − pI)(cosh s + cos θ sinh s) − (Q − qI) sin θ sinh s]. L 2

The SLDs at ρq,p are calculated by operating −Dq,p to ALDs at ρq,p . By A expanding ALDs LA q = 2(P −pI), Lp = −2(Q−qI) into linear combinations ˜ S1 = L ˜ S2 = −L ˜ S2 , and using the relations Dq,p L ˜ S2 , Dq,p L ˜ S1 , we have ˜ S1 , L of L LSq = 2[(Q − qI)(cosh 2s − cos θ sinh 2s) − (P − pI) sin θ sinh 2s], LSp = 2[(P − pI)(cosh 2s + cos θ sinh 2s) − (Q − qI) sin θ sinh 2s]. The corresponding SLD Fisher information matrix becomes 7 8 cosh 2s − cos θ sinh 2s − sin θ sinh 2s S Jq,p = 2 . − sin θ sinh 2s cosh 2s + cos θ sinh 2s Then from (17), we have min tr Vq,p [M ] = cosh 2s + 1. M

6.2. Spin coherent state model The spin coherent state [1, 14] in the spin j representation is defined by ρθ,ϕ = R(θ, ϕ)|jj|R† (θ, ϕ),

(0 ≤ θ ≤ π, 0 ≤ ϕ < 2π),

where (θ, ϕ) is the polar coordinate system (the north pole is θ = 0 and x-axis corresponds to ϕ = 0), R(θ, ϕ) = exp [iθ(Jx sin ϕ − Jy cos ϕ)] the rotation through an angle −θ about an axis (sin ϕ, − cos ϕ, 0), and |j the highest weight state with respect to Jz that corresponds to the north pole. Since J+ |j = (Jx + iJy )|j = 0, we find that ρθ,ϕ is a two-dimensional ˜ S (0, 0) = coherent model, and;a normalized ρ0,0 -symplectic basis is L 1 ; S ˜ 2/jJx , L2 (0, 0) = 2/jJy . A normalized ρθ,ϕ -symplectic basis is then calculated as ˜ S (θ, ϕ) = R(θ, ϕ)L ˜ S (0, 0)R† (θ, ϕ), L k k where k = 1, 2. On the other hand, the generators of rotations about axes (sin ϕ, − cos ϕ, 0) and (cos ϕ, sin ϕ, 0) at θ = 0 are i(Jx sin ϕ − Jy cos ϕ)

entire

December 28, 2004

13:56



303

and i(Jx cos ϕ + Jy sin ϕ), respectively. Therefore ALDs for the model at ρθ,ϕ are given by † LA θ (θ, ϕ) = R(θ, ϕ){−2(Jx sin ϕ − Jy cos ϕ)}R (θ, ϕ)

; S ˜ 1 (θ, ϕ) sin ϕ − L ˜ S2 (θ, ϕ) cos ϕ , = − 2j L † LA ϕ (θ, ϕ) = R(θ, ϕ){−2(Jx cos ϕ + Jy sin ϕ) sin θ}R (θ, ϕ)

; ˜ S1 (θ, ϕ) sin θ cos ϕ + L ˜ S2 (θ, ϕ) sin θ sin ϕ . = − 2j L

The SLDs at ρθ,ϕ are calculated by operating −Dθ,ϕ to ALDs, to obtain

; S ˜ 1 (θ, ϕ) cos ϕ + L ˜ S2 (θ, ϕ) sin ϕ , LSθ (θ, ϕ) = 2j L

; S ˜ (θ, ϕ) sin θ sin ϕ − L ˜ S (θ, ϕ) sin θ cos ϕ . LSϕ (θ, ϕ) = − 2j L 1 2 ˜ S (θ, ϕ)}k=1,2 is orthonormal, the SLD Fisher Since ρθ,ϕ -symplectic basis {L k information matrix and the matrix D are easily calculated: 7 7 8 8 2j 0 0 −2j sin θ S Jθ,ϕ = = . , D θ,ϕ 0 2j sin2 θ 2j sin θ 0 We thus have 1 min tr Vθ,ϕ [M ] = M 2j

1+

1 sin θ

2 .

7. Conclusions We introduced a class of quantum pure state models called the coherent models. They are characterized by a symplectic structure of the tangent space, and have a close connection with the conventional generalized coherent states in mathematical physics. A Cramér-Rao type bound for a coherent model was derived by an analogous argument to the derivation of the right logarithmic derivative bound. Moreover, by an argument of random measurement, this lower bound was found to be achievable. Acknowledgment We thank Keiji Matsumoto for helpful suggestions, with which the manuscript has been improved as compared with the early version [3] of this paper.

entire

December 28, 2004

13:56

304



References [1] F.T. Arecchi, E. Courtens, R. Glimore, and H. Thomas, Phys. Rev., 6, 2211 (1972). [2] C.M. Caves, K.S. Thorne, R.W.P. Drever, V.D. Sandberg, and M. Zimmermann, Rev. Mod. Phys., 52, 341 (1980). [3] A. Fujiwara, Math. Eng. Tech. Rep., 94-10, Univ. Tokyo (1994). [4] A. Fujiwara and H. Nagaoka, Phys. Lett., A201, 119 (1995). (Chap. 17 of this book) [5] A. Fujiwara and H. Nagaoka, in K. Fujikawa and Y.A. Ono, eds., Quantum coherence and decoherence, Elsevier, Amsterdam, 1996, p. 303. [6] V. Guillemin and S. Sternberg, Symplectic techniques in physics, Cambridge Univ. Press, Cambridge, 1984. [7] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York, 1976. [8] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, Amsterdam, 1982. [9] J.R. Klauder and B. Skagerstam, Coherent states, World Scientific, Singapore, 1985. [10] S. Kobayashi and K. Nomizu, Foundations of differential geometry, II, John Wiley, New York, 1969. [11] K. Matsumoto, A geometrical approach to quantum estimation theory, Ph.D. Thesis, University of Tokyo, 1997. (Chap. 20 of this book) [12] H. Nagaoka, IEICE Technical Report, IT 89-42, 9-14 (1989). (Chap. 8 of this book) [13] H. Nagaoka, Trans. Jap. Soc. Indust. Appl. Math., 1, 43-56, (1991) (in Japanese). (Chap. 11 of this book) [14] J. M. Radcliffe, J. Phys. A: Gen. Phys., 4, 313 (1971). [15] H. P. Yuen, Phys. Rev., A13, 2226 (1976). [16] H. P. Yuen and M. Lax, IEEE Trans. Inform. Theory, 19, 740–750 (1973).

entire

December 28, 2004

13:56


CHAPTER 20

A Geometrical Approach to Quantum Estimation Theory Keiji Matsumoto 1. Introduction 1.1. The purposes of the thesis The most important purpose of the thesis is pursuit for the geometrical theory of statistical estimation of the quantum mechanical state. In the statistical theory of the probability distribution, S. Amari and his coworkers have formulated a geometrical theory, so called information geometry, and successfully applied to various statistical problems [3]. H. Nagaoka, one of Amari’s coworkers, pointed out that the duality between e- and m-connections sits at the heart of the information geometry. He also formulated quantum information geometry by use of this idea of the mutually dual connections, and applied to characterization of the model which has the efficient estimator [16, 17]. However, his information geometry does not give any insight into the problem of determination of the attainable CR (Cramér-Rao) type bound, nor characterize the condition for the SLD CR bound, which is of special interest for some reasons, is attainable. After all, Nagaoka’s geometry deal with the global properties of the model, while the attainable CR type bound is related local properties of the model. Hence, it seems that another geometric structure is needed for the thorough description of the quantum estimation theory. On the other hand, Berry’s phase, discovered by M. V. Berry as the nonintegrable phase factor in the adiabatic motion [4], is naturally understood as a curvature of the natural connection in the principle fiber bundle over This chapter is reprinted from K. Matsumoto’s Doctoral thesis “A Geometrical Approach to Quantum Estimation Theory”. This doctoral thesis contains the topics, Estimation of the temperature, Nagaoka’s information geometry, Uncertainty principle in view of quantum estimation theory, Time-energy uncertainty in view of hypothesis test, w-connection and e-connection, Attainability of SLD CR bound. But, these parts are omitted in this reprint with author’s permission. 305

entire

December 28, 2004

306

13:56


Keiji Matsumoto

the space of pure states whose structure group is U (1) [1]. In 1986, Uhlmann generalized the geometry to the space of mixed states. Though the physical meaning of Uhlmann’s geometry is not known, Berry’s phase is applied to the explanation of various phenomena [22]. The author conjectures that Berry-Uhlmann’s curvature reflects local properties of the model. To prove the statement, it is needed to determine the attainable CR type bound for arbitrary models, which is far out of our reach. However, for the 2-parameter pure state model, the author presents complete answer to the problem. In addition, for the faithful state model and pure state model, it is shown that SLD CR bound is attainable if and only if the model is free of Berry-Uhlmann curvature. Furthermore, we try a kind of unification of the two geometries, Nagaoka’s information geometry and Uhlmann’s parallelism. Second most important purpose is the determination of the attainable CR type bound, and the development of new methodology for that purpose. In the pure state models, this purpose is achieved, though not completely, to large extent. A new methodology direct approach is formulated and successfully applied to the 2-parameter pure state model, the coherent model. For the arbitrary pure state model, calculated is the attainable CR type bound whose weight matrix is the SLD Fisher information matrix. Looking back, no one has ever determined the attainable CR type bound for this wide range of models. Third, some considerations about such physical problems as the uncertainty principle are done. The time-energy uncertainty is nicely formulated as a hypothesis test, and the position-momentum uncertainty as a estimation of the mean values of the position and the momentum operators. In this formulation of the position-momentum uncertainty, it is shown that the mean values of the position and the momentum operators are simultaneously estimated up to arbitrarily high efficiency, if the particle is prepared carefully.

1.2. Organization of the thesis The thesis is divided into three parts: the faithful model theory, the pure state theory, the general model theory; The reason for this organization is that the extent of the achievement of the purposes is different in these three cases. Before these three parts, section 2 gives brief review of the estimation theory of probability distributions, the quantum mechanical theory of the

entire

December 28, 2004

13:56


A Geometrical Approach to Quantum Estimation Theory

307

measurement, and the quantum estimation theory, and section 3 gives the geometrical and the estimation-theoretical framework, commonly used in any of the following three parts. 2. Preliminaries In this section, statistical estimation theory and quantum mechanics are reviewed briefly. For the thorough description of estimation theory, see, for example, Ref. [14]. As for quantum mechanics, see Ref. [20], or other text books. Some basic concepts in quantum estimation theory are introduced also. 2.1. Classical estimation theory Throughout this section, the usual estimation theory, or the estimation theory of the probability distribution is called classical estimation theory, in the sense that the theory is not quantum mechanical. The theme of the classical estimation theory is identification of the probability distribution from which the N data x1 , x2 , ..., xN is produced. Usually, the probability distribution is assumed to be a member of a model, or a family M = {p(x|θ)|θ ∈ Θ ⊂ Rm } of probability distributions and that the finite dimensional parameter θ ∈ Θ ⊂ Rm is to be estimated statistically. ˆ 1 , x2 , ..., xN ) of parameter θ is the estimate Unbiased estimator θˆ = θ(x which satisfies 6 N ˆ ≡ dx1 dx2 ...dxN θ(x ˆ 1 , x2 , ..., xN ) Eθ [θ] p(xi |θ) = θ, (1) i=1

that is, the estimate which gives the true value of parameter in average. For the technical reason, we also define locally unbiased estimator θˆ at θ0 by ˆ = θ0 , ∂j Eθ [θî ] Eθ0 [θ] = δji . θ=θ0

The estimator is unbiased iff it is locally unbiased at every θ ∈ Θ. For the variance of locally unbiased estimator at θ, the following theorem gives bound of efficiency of the estimation.

entire

December 28, 2004

13:56


Keiji Matsumoto

308

Theorem 1: (Cramér-Rao inequality) For any locally unbiased estimate θˆ at θ, ˆ ≥ 1 J −1 (θ). (2) Vθ [θ] N Here, N is the number of the data and J(θ) is m× m real symmetric matrix defined by 8 76 (3) J(θ) ≡ dxp(x|θ)∂i ln p(x|θ)∂j ln p(x|θ) , where ∂i stands for ∂/∂θi . The best estimator, or the estimator θˆ satisfying (4), is given by i θî (x1 , ..., xN ) = θˆ(θ) (x1 , ..., xN ) ≡ θi +

m

N

[J

−1

p(xk |θ).

ij

(θ)] ∂j ln

j=1

k=1

J(θ) is called Fisher information matrix, because the larger the J(θ) is, the more precise estimate can be done with the same number of data. Metaphorically speaking, we obtain as much information as J(θ) per data. Actually, as easily seen by putting N = 1 in Cramér-Rao (CR) inequality, we can obtain J(θ) as the minimum variance of locally unbiased estimate when only one data is given. ˆ ≥ J −1 (θ). Vθ [θ]

(4)

The trouble with the (4) is that the best estimator θˆ(θ) is dependent on the true value of the parameter θ, which is unknown to us. When the true value of parameter is not θ0 , the estimate θˆ(θ0 ) is not even locally unbiased at the true value of the parameter. To avoid this dilemma, we give up with the unbiased estimator, and focus on the consistent estimator defined by ˆ 1 , x2 , ..., xN )] = θ. lim Eθ [θ(x

N →∞

For the consistent estimator, we also have the following theorem. Theorem 2: If the estimator is consistent, ˆ ≥ 1 J −1 (θ) + o Vθ [θ] N

1 N

holds true. The maximum likelihood estimator θˆMLE , which is defined by,   N   ln p(xi |θ) θ ∈ Θ ⊂ Rm , θˆMLE ≡ argmax   j=1

(5)

entire

December 28, 2004

13:56



309

is consistent and achieves the equality in (5). Notice that to obtain θˆMLE , we need no information about the true value of the parameter beforehand. Hence, the Fisher information matrix is a good measure of the efficiency of the optimal consistent estimator. 2.2. Quantum mechanics and measurement theory In the quantum mechanics, the state of physical system is described by the density operator ρ, which is a nonnegative Hermitian operator whose trace is equal to 1, in a separable Hilbert space H, whose dimension is denoted by d ≤ ∞ hereafter. We denote by P(H) the space of density operators in H, by Pr (H) the the space of density operators whose rank is r, and by P+ (H) the space of strictly positive definite density operators. P(H), Pr (H), and P+ (H) are often simply denoted by P, Pr , and P+ , respectively . Let Ω be a space of all possible outcomes of an experiment, and σ(Ω) be a σ- field in Ω. When the density operator of the system is ρ, the probability that the data ω ∈ Ω lies in B ∈ σ(Ω) writes Pr{ω ∈ B|ρ} = tr ρM (B),

(6)

by use of the map M from σ(Ω) to nonnegative Hermitian operator which satisfies ∞ ∞ 5 M (Bi ) (Bi ∩ Bj = φ, i "= j) (7) M (φ) = O, M (Ω) = I, M ( Bi ) = i=1

i=1

so that (6) define a probability measure (see Ref. [11], p. 53 and Ref. [12], p. 50). We call the map M the measurement, because there always exists a physical experiment corresponding to the map M which satisfies (7) [19, 23]. 2.3. Unbiased estimator in quantum estimation theory The purpose of the quantum estimation is to identify the density operator of the given physical system from the data obtained by the appropriately designed experiment. For simplicity, we usually assume that the density operator is a member of a model, or a manifold of M = {ρ(θ)|θ ∈ Θ ⊂ Rm } ⊂ P, and that the parameter θ is to be estimated statistically. For example, M is the set of spin states with given wave function part and unknown spin part. To estimate the parameter, we perform an experiment to obtain the ˆ data ω by which we calculate an estimator θˆ by estimator θ(ω). A pair

entire

December 28, 2004

13:56


Keiji Matsumoto

310

ˆ M, Ω) of a space Ω of data, a measurement M , and an estimator θ(∗) ˆ (θ, is also called an estimator. The expectation of f (ω) with respect to the probability measure (6) is denoted by Eθ [f (ω)|M ]. We have seen that the locally unbiased estimator played a key role in classical estimation theory. Hence we try to keep the same track also in the quantum estimation theory. ˆ M, Ω) is said to be unbiased if The estimator (θ, ˆ ]=θ Eθ [θ(ω)|M

(8)

holds for all θ ∈ Θ. If (8) and ∂i Eθ [θj (ω)|M ] = δij (i, j = 1, ..., m) ˆ M, Ω) is called locally unbiased at θ. hold at a particular θ, (θ, It is also reasonable to include calculation of the estimate from data into the process of measurement. In this point of view, the estimate θˆ itself is produced by the measurement process, and the data space Ω is Rm . Therefore, by the term ‘estimator’ we also mean the measurement which takes value on Rm . In this case, the unbiased estimator is a measurement which takes value on Rm which satisfies Eθ [M ] = θ holds for all θ ∈ Θ, where,

(9)

6

Eθ [M ] ≡

ˆ ρ(θ)M (dθ). ˆ θtr

If (9) and [∂i Eθ [M ]]j = δij (i, j = 1, ..., m) hold at a particular θ, M is called locally unbiased at θ. We denote by Vθ [M ] the covariance matrix of the estimator M when the true value of the parameter is θ. Obviously, these two definition of the estimator are equivalent. Therefore, in some situations, we prefer the former to the latter, while in other situations the latter is preferred for the sake of simplicity. 3. Conceptual Framework 3.1. Horizontal lift and SLD In this section, except for the pure state model theory, d ≡ dim H is assumed to be finite for the sake of clarity. The author believes the essence of the discussion will not be damaged by this restriction.

entire

December 28, 2004

13:56



311

Let Wr be the space of d × r complex and full-rank matrix W such that tr W W ∗ = 1, Pr the space of density operators whose rank is r, and π the map from Wr to Pr such that ρ = π(W ) ≡ W W ∗ . Because π(W U ) is identical to π(W ) iff U is a r × r unitary matrix, it is natural to see the space Wr as the total space of the principal fiber bundle with the base space Pr and the structure group U (d) [13]. One possible physical interpretation of W is a representation of a state vector |W in a bigger Hilbert space H ⊗ H . Here, the dimension of H is r and the operation π(∗) corresponds to the partial trace of |W W | over H . In this subsection, basic concepts about the tangent bundle T (Wr ) over Wr , which is a real manifold with the real parameter ζ = (ζ 1 , ..., ζ 2rd−1 )T , are introduced. The matrix representation M(∂/∂ζ i ) of the tangent vector ∂/∂ζ i (throughout the thesis, the tangent vector is understood as the differential operator) is a d × r complex matrix such that ∂ ∂ M ≡ 2 i W (ζ). ∂ζ i ∂ζ The real span of the matrix representations is {X | Re tr XW ∗ (ζ) = 0, X ∈ M (d, r, C)}. We introduce the inner product ∗, ∗W to T (Wr ) such that, ˆ Yˆ W ≡ ˆ ij Re(MYˆ )ij + Im(MX) ˆ ij Im(MYˆ )ij ) X, (Re(MX) i,j

ˆ = Re tr ( (MX)(M Yˆ )∗ ), which is invariant under the action of U ∈ U (r) to the matrix representation of the tangent vector from right side, ˆ MYˆ W , ˆ (MX)U, (MYˆ )U W U = MX,

U ∈ U (r),

Let us decompose TW (Wr ) into the direct sum of the horizontal subspace LS W and the vertical subspace KW where LS W is defined by ˆ | W ∗ (MX) ˆ = (MX) ˆ ∗ W }, LS W ≡ {X

(10)

entire

December 28, 2004

13:56


Keiji Matsumoto

312

and KW is the orthogonal complement space TW (Wr ) H LS W with respect ˆ ∈ KW satisfies to the inner product ∗, ∗W . X ∗ ˆ ˆ ∗ = 0, (MX)W + W (MX)

(11)

ˆ = 0, π∗ (X)

(12)

or its equivalence,

where π∗ is the differential map of π. A member of the horizontal subspace and the vertical subspace are called a horizontal vector and vertical vector, ˆ ∈ TW (Wr ) by the projection onto the horizonrespectively. The image of X tal subspace LS W is called the horizontal component, while the image by the projection onto the vertical subspace KW is called vertical component. The horizontal lift hW is a mapping from Tπ(W ) (Pr ) to TW (Wr ) such that π∗ h (X) = X, h (X) ∈ LS W . W

W

Because of the following theorem, the matrix representation of the horizontal lift π∗ (hW (X)) is a representation of the tangent vector X ∈ Tπ(W ) (Pr ). Theorem 3: hW is an isomorphism from Tπ(W ) (Pr ) to LS W . Proof: First, notice that for any Yˆ ∈ TW (Wr ), W +εM(Yˆ ) also is a member of Wr if ε is small enough. Therefore, we have π∗ ( TW (Wr ) ) ⊂ Tπ(W ) (Pr ). Second, we prove that the map π∗ |LS W is a one to one map from LS W ˆ = 0 when X ˆ ∈ KW . to Tπ(W ) (Pr ). For that, it is sufficient to prove that X This statement is proved to be true because KW is orthogonal to LS W . Finally, checking the dimension of Tπ(W ) (Pr ) is equal to LS W , we have the theorem. Using the horizontal lift, the inner product ∗, ∗ in T (Pr ) is deduced from ∗, ∗: AA BB . X, Y π(W ) = h (X), h (Y ) W

W

W

The horizontal lift h satisfies the following equality so that the above definition of the inner product ∗, ∗ is self-consistent: BB AA BB AA = , (U, U ∈ U (n) ). h X, h Y h X, h Y W

W

WU

WU

entire

December 28, 2004

13:56



313

The symmetrized logarithmic derivative (SLD, in short) of X ∈ Tπ(W ) (Pr ) is the Hermitian operator LSX in H defined by the equation 1 S (L ρ(θ) + ρ(θ)LSX ), (13) 2 X where θ is a real parameter which is assigned to a member of Pr . Iff the density operator is strictly positive, SLD is uniquely defined by (13). LS∂/∂θi is often denoted simply by LSi . SLD is closely related to the horizontal lift by the following equation: M h X = LSX W. (14) Xρ(θ) =

W

3.2. Definition of Uhlmann’s parallelism Berry’s phase, by far confirmed by several experiments, is a holonomy of a natural connection in the line bundle over the space of pure states [1, 4]. In 1986, Uhlmann generalized the theory to include mixed states in the Hilbert space H [24, 25, 26]. Throughout this section, for the sake of clarity, d ≡ dim H is assumed to be finite. For notational simplicity, the argument θ is omitted, as long as the omission is not misleading. Define a horizontal lift of a curve C = {ρ(t)|t ∈ R} in Pr as a curve Ch = {W (t)|t ∈ R} in Wr which satisfies C = π(Ch ) and d dW (t) =M . (15) h dt W (t) dt Then, the relative phase factor (RPF) between ρ(t0 ) and ρ(t1 ) along the curve C is the unitary matrix U which satisfies the equation ˆ 1 U, W (t1 ) = W ˆ 1 ) and ˆ 1 satisfies ρ(t1 ) = π(W where W ˆ ∗ W (t0 ) = W ∗ (t0 )W ˆ 1. W 1 RPF is said to vanish when it is equal to the identity. 3.3. RPF for infinitesimal loop The RPF for the infinitesimal loop (θ1 , ..., θi , ..., θj + dθj , ..., θm ) ← (θ1 , ..., θi + dθi , ..., θj + dθj , ...., θm ) (16) ↓ ↑ 1 i j m 1 i i j m (θ , ..., θ + dθ , ..., θ , ...., θ ) θ = (θ , ..., θ , ..., θ , ...., θ ) →

entire

December 28, 2004

314

13:56


Keiji Matsumoto

is calculated up to the second order of dθ by expanding the solution of the equation (15) to that order: 1 I + W −1 Fij W dθi dθj + o(dθ)2 , 2 1 Fij = (∂i LSj − ∂j LSi ) − [LSi , LSj ]. (17) 2 Note that Fij is a representation of the curvature form, and that RPF for any closed loop vanishes iff Fij is zero at any point in M. 3.4. The SLD Cram´ er-Rao inequality In parallel with the classical estimation theory, in the quantum estimation theory, we have the following SLD CR inequality, which is proved for the faithful state model by Helstrom [10, 11], for the pure state model by Fujiwara and Nagaoka [8], and for the general case by Fujiwara and Matsumoto [6]: ˆ | M ] ≥ (J S (θ))−1 , Vθ [θ(ω)

(18)

ˆ ˆ i.e., Vθ [θ(ω) | M ] − (J S (θ))−1 is non-negative definite. Here Vθ [θ(ω) | M ] is S ˆ a covariance matrix of an unbiased estimator (θ, M, Ω), and J (θ) is called SLD Fisher information matrix, and is defined by 3A B 4 ∂ ∂ S , (19) = [Re tr ρ(θ)LSi (θ)LSj (θ)], J (θ) ≡ ∂θi ∂θj ρ(θ) which is nothing but the metric tensor of the inner product ∗, ∗. The inequality (18) is of special interest, because J S−1 , called SLD CR bound, is the one of the best bounds in the sense explained later. To prove the inequality (18), we set some notations, and present some ˆ M, Ω), we define the notation M as lemmas. For unbiased estimator (θ, follows, 6 ˆ M, W ) ≡ (θî (ω) − θi )M (dω)W. Mi (θ, ˆ M ] is m × m matrix defined by Z[θ, ˆ M ] = tr Mi (θ, ˆ M, W ) (Mj (θ, ˆ M, W ) )∗ Z[θ,

(20)

Lemma 4: Following two inequalities are valid: ˆ M ]. V [θˆ | M ] ≥ Re Z[θ,

(21)

entire

December 28, 2004

13:56



315

ˆ M ]. V [θˆ | M ] ≥ Z[θ,

(22)

ˆ M ] ≥ J S−1 Re Z[θ,

(23)

Lemma 5:

holds. The equality is valid iff ∂ j ˆ S−1 j,k S S−1 j,k M (θ, M, W ) = [J ] Lk W, = [J ] M h , (24) W ∂θ k k

k

They are proved in almost the same manner as the strictly positive case (see p. 88 and p.274 [12], respectively). Lemma 4 and 5 lead to the SLD CR inequality (18). Theorem 6: SLD Fisher information gives a lower bound of covariance matrix of an unbiased measurement, i.e., (18) holds true. The SLD CR bound (18) is the best bound in the following sense. Theorem 7: Letting A be a real hermitian matrix which is larger than ˆ M, Ω) J S−1 , that is, A > J S−1 , there exists such an unbiased estimator (θ, that V [θˆ | M ] is not smaller than A. Proof: Let v ∈ Rm be the real vector such that 6 ∂ ∃C ⊂ R tr ρ(θ)Ev (dt) "= 0, (i = 1, ..., m), ∂θi C

(25)

where Ev is a projection valued measure obtained by the spectral decom position of i,j vj (θj I + [J S−1 ]i,j LSj ), where I and vi denotes identity and ith component of v. The condition (25) implies the existence of an estimator θˆv (ω) which makes the triplet (θˆv , Ev , Ω) locally unbiased at θ. For that triplet (θˆv , Ev , Ω), we have

vT V [θˆ | Ev ] − J S−1 v = 0. If ε is small enough, for any real vector v ∈ Rm , v + εv0 satisfies the condition (25), or its equivalence,

(26) (v + εv0 )T V [θˆv+εv0 | Ev+εv0 ] − J S−1 (v + εv0 ) = 0. Let us assume that there exists a real matrix A which satisfies ˆ ] ≥ A > J S−1 V [θ|M

entire

December 28, 2004

316

13:56


Keiji Matsumoto

for any unbiased estimator. Then, by virtue of (26), we have for any real vector v0 ∈ Rm and enough small ε, % & (v + εv0 )T A − J S−1 (v + εv0 ) = 0, whose second derivative with respect to ε yields % & v0T A − J S−1 v0 = 0. Because v0 ∈ Rm is arbitrary, we have A − J S−1 = 0, which contradicts with the assumption A > J S−1 . Theorem 8: If the model M has only one parameter, the equality in (18) is achievable. ˆ be a projection valued measurement and an estimator Proof: Let E and θ(∗) which satisfies, 76 8i S−1 ji S ˆ J Lj . (θ(ω) − θ)M (dω) = j

ˆ E, Ω) is locally unbiased at θ and attains SLD CR Then, the triplet (θ, bound. Theorem 8 implies the statistical significance of the natural metric ∗, ∗. A possible geometrical interpretation of (8): the closer two states ρ(t) and ρ(t + dt) are, the harder it is to distinguish ρ(t) from ρ(t + dt). The SLD CR inequality (18) looks quite analogical to CR inequality in classical estimation theory. However, as will be found out later, the equality is not generally attainable. 3.5. The attainable Cram´ er-Rao type bound In the previous section, SLD CR bound is proved to be the best bound. However, as will be turned out, this best bound is attainable only in the special cases, that is, the case when the model is locally quasi-classical. In general case, therefore, we must give up to find a tight lower bound of covariance matrix in the form of the matrix inequality. Instead, we determine the region Vθ (M) of the map V [∗] from unbiased estimators to m × m real positive symmetric matrices (so far as no confusion is expected, we write V for Vθ (M)). Especially, the boundary bdV is of interest, because V is convex as in the following lemma.

entire

December 28, 2004

13:56



317

Lemma 9: V is convex. Proof: In this proof, we define the estimator to be the measurement which takes value in Rm . Let M and M be an unbiased estimator. Because λV [M ] + (1 − λ)V [M ] = V [λM + (1 − λ)M ] holds true and λM +(1−λ)M is an unbiased estimator, we have the lemma.

Lemma 10: If a matrix V is a member of V, V + V0 is also a member of V for any arbitrary nonnegative real symmetric matrix V0 . ˆ M, Ω) be a locally unbiased estimator whose covariance maProof: Let (θ, trix is V , and define ε(ω) by

1/2 ˆ ])−1/2 θ(ω) ˆ ε(ω) ≡ V0 (V [θ|M −θ . ˆ + ε(ω) is also a locally unbiased estimator, and its Then, θˆ∗ (ω) = θ(ω) covariance matrix is equal to V + V0 . To obtain bdV, the following procedure is used in this thesis. Define an inner product of two real symmetric matrices A and B by Tr AB. Then, the set {V |Tr V G = const.} is a hyperplain perpendicular to the vector G. Because of lemmas 9-10, bdV is the collection of all the matrices V ∈ V which achieve the minimum of Tr GV for a certain symmetric real nonnegative definite matrix G. However, when the model has too many parameters, the dimension of the space Sym(m) of real symmetric matrices is so large that Vθ (M) is extremely hard to determine. In such cases, we calculate CR(G, θ, M) ≡ min{Tr GV | V ∈ Vθ (M)}

(27)

for an arbitrary nonnegative symmetric real matrix G, and call it the attainable CR (Cramér-Rao) type bound. The matrix G is called weight matrix. Often, we drop G, θ and/or M when no confusion is expected. If CR(G, θ, M) is smaller than CR(G, θ , M ) for any weight matrix G, the Vθ (M) of is located in the ‘lower part’ of Sym(m) compared with that of Vθ (M ). To make the estimational meaning of (27) clear, let us consider a diagonal weight matrix G = diag(g1 , g2 , ..., gm ). Letting vii be the (i, i)-th component of V [M ], gi vii , Tr GV [M ] = i

entire

December 28, 2004

318

13:56


Keiji Matsumoto

is the weighed sum of the variances of the estimations of the parameters θi (i = 1, ..., m). If the accuracy of estimation of, for example, the parameter θ1 is required more than other parameters, then g1 is set larger than any other gi , and the estimator which minimize i gi vii is to be used. 3.6. The locally quasi-classical model and the quasi-classical model In this section, the condition for SLD CR bound to be attainable is reviewed briefly, in the case of the faithful model, any member of which is faithful, i.e., a reversible operator. We denote the space of the faithful states by P+ (H), or simply by P+ . For mathematical simplicity, the dimension d of the Hilbert space H is assumed to be finite. The author believes the essence of the discussion will not be damaged by this restriction. As for the equality in (18) in the faithful model, we have the following theorem, which is proved by Nagaoka [18]. Theorem 11: The equality in (18) is attainable at θ iff [LSi (θ), LSj (θ)] = 0 for any i, j. Letting |ω be a simultaneous eigenvector of the matrices {LSj (θ)|j = 1, ..., m} and λi (ω) be the eigenvalue of LSi (θ) corresponding to |ω, the equality is attained by the estimator (θˆ(θ) , M(θ) , Ω) such that Ω = {ω|ω = 1, ..., n}, M(θ) (ω) = |ωω|, j (ω) = θj + θˆ(θ)

d

[(J S )−1 ]jk λk (ω).

(28)

k=1

The model M is said to be locally quasi-classical at θ iff LSi (θ) and commute for any i, j, because in this case (18) gives the attainable lower bound of the covariance matrix of the unbiased estimator, as its classical counterpart does. However, it should be noted that even if the model is locally quasi-classical at any θ ∈ Θ, theorem 11 do not tell us the optimal experiment scheme, because the measurement M(θ) in (28) generally depends on the unknown parameter θ (so does the best experiment scheme). Therefore, let us move to easier case. Suppose that LSi (θ) and LSj (θ ) commute even when θ "= θ , in addition to being locally quasi-classical at any θ ∈ Θ. Then, the measurement M(θ) in (28), denoted by Mbest in the remainder of the section, is uniformly optimal for all θ (so is the corresponding scheme). We say such a model is quasi-classical [28]. LSj (θ)

entire

December 28, 2004

13:56



319

After the best experiment is done, the rest of our task is to estimate the value of the parameter θ, where Mbest is substituted into M . Hence, in this case, the quantum estimation reduces to the classical estimation. 4. Uhlmann Connection and the Estimation Theory 4.1. Some new facts about RPF In this subsection, we derive conditions for RPF to vanish, which is used to characterize the classes of manifold defined in the previous subsection. For notational simplicity, the argument θ is omitted, as long as the omission is not misleading. Theorem 12: RPF for any closed loop vanishes iff [ LSi (θ), LSj (θ) ] = 0 for any θ ∈ Θ. In other words, (29) Fij (θ) = 0 ⇐⇒ LSi (θ), LSj (θ) = 0. Proof: If Fij equals zero, then both of the two terms in the left-hand side of (17) must vanish, because the first term is Hermitian and the second term is skew Hermitian. Hence, if Fij = 0, [ LSi , LSj ] vanishes. On the other hand, the identity ∂i ∂j ρ − ∂j ∂i ρ = 0, or its equivalence 1 S S 1 S S S S S S ∂i Lj − ∂j Li − [Li , Lj ] ρ + ρ ∂i Lj − ∂j Li + [Li , Lj ] = 0, 2 2 implies that ∂i LSj − ∂j LSi vanishes if [ LSi , LSj ] = 0, because ∂i LSj − ∂j LSi is Hermitian and ρ is positive definite. Thus we see Fij = 0 if LSi and LSj commute. A manifold M is said to be parallel when the RPF between any two points along any curve vanishes. From the definition, if M is parallel, RPF along any closed loop vanishes, but the reverse is not necessarily true. The following theorem is a generalization of Uhlmann’s theory of Ω-horizontal real plane [26]. Theorem 13: The following three conditions are equivalent. (i) M is parallel. (ii) Any element ρ(θ) of M writes M (θ) ρ0 M (θ), where M (θ) is Hermitian and M (θ0 ) M (θ1 ) = M (θ1 ) M (θ0 ) for any θ0 , θ1 ∈ Θ. (iii) ∀i, j, ∀θ0 , θ1 ∈ Θ, LSi (θ0 ), LSj (θ1 ) = 0. Proof: Let W (θt ) = M (θt )W0 be a horizontal lift of {ρ(θt ) | t ∈ R} ⊂ M. Then, W0∗ W (θt ) = W ∗ (θt ) W0 implies M (θt ) = M ∗ (θt ),

entire

December 28, 2004

320

13:56


Keiji Matsumoto

and W ∗ (θt0 ) W (θt1 ) = W ∗ (θt1 ) W (θt0 ) implies M (θt0 ) M (θt1 ) = M (θt1 ) M (θt0 ). Thus we get (1) ⇒ (2). Obviously, the reverse also holds true. For the proof of (2) ⇔ (3), see pp. 31–33 [28]. 4.2. Uhlmann’s parallelism in the quantum estimation theory To conclude the section, we present the theorems which geometrically characterize the locally quasi-classical model and quasi-classical model, by the vanishing conditions of RPF, implying the close tie between Uhlmann parallel transport and the quantum estimation theory. They are straightforward consequences of the definitions of the terminologies and theorems 11-13. Theorem 14: M is locally quasi-classical at θ iff Fij (θ) = 0 for any i, j. M is locally quasi-classical at any θ ∈ Θ iff the RPF for any loop vanishes. Theorem 15: M is quasi-classical iff M is parallel. 5. The Pure State Estimation Theory 5.1. Historical review of the theory and the purpose of the section First, we review the history of quantum estimation theory to clarify the purpose of the section. In parallel with the classical estimation theory, in 1967, Helstrom show that in the faithful state model, the covariance matrix is larger than or equal to the inverse of SLD Fisher information matrix, and that in the one parameter faithful model, the bound is attainable [10, 11]. On the other hand, in the multi-parameter model, it is proved that there is no matrix which makes attainable lower bound of covariance matrix, because of non-commutative nature of quantum theory. Therefore, we deal with the attainable CR type bound defined in the subsection 3.5. Lower bounds of Tr GVθ [M ] is, attainable or not, called Cramér-Rao (CR) type bound. Five years after Helstrom’s work, Yuen and Lax found out the exact form of the attainable CR type bound of the Gaussian state model, which is a faithful 2-parameter model obtained by superposition of coherent states by Gaussian kernel [27]. Their work is remarkable not only because it was first calculation of the attainable CR type bound of a multi-parameter model, but also because they established a kind of methodology, which we

entire

December 28, 2004

13:56



321

call indirect approach hereafter, to calculate the attainable CR type bound. First an auxiliary bound which is not generally attainable is found out and then it is proved to be attained in the specific cases. In their work, they used so-called RLD bound, which was used and generalized by several authors. Holevo completed their work by solving analytically subtle problems and generalizing RLD bound [12]. In 1998, Nagaoka calculated the attainable CR type bound of the faithful 2-parameter spin-1/2 model using Nagaoka bound, which is also another auxiliary bound [15]. The models had been assumed to be faithful till Fujiwara and Nagaoka formulated the problem in the pure state model, and calculated the CR type bound for the 1-parameter model by use of generalized SLD bound and that of the 2-parameter coherent model by use of generalized RLD bound in 1995 [8, 9]. The approach in this section, called direct approach in contrast with indirect approach, is essentially different from the approaches of other authors. We reduce the given minimization problem to the problem which is easy enough to be solved directly by elementary calculus. The methodology is successfully applied to the general 2-parameter pure state model, coherent model with arbitrary number of parameters, and the minimization of Tr J S (θ)Vθ [M ] for arbitrary pure state model. These are relatively general category in comparison with the cases treated by other authors. In the 2parameter pure state model, the existence of the order parameter β which is a good index of noncommutative nature between the parameters. As a by-product, we have remarkable corollary, which asserts that even for non-commutative cases, a simple measurement attains the lower bound. 5.2. Notations In this subsection, we consider data space Ω to be Rm , and each ω ∈ Ω to ˆ be the estimate θ. The pure state model M is assumed to be induced by M = π(N ) from the manifold ˜ N = {|φ(θ) | |φ(θ) ∈ H}, ˜ is the set of the members of H with unit length, where H ˜ = {|φ | |φ ∈ H, φ|φ = 1}. H If this assumption is made, the horizontal lift of the tangent vector is taken for granted. Uniqueness is proved mostly in the same way as the proof of theorem 3.

entire

December 28, 2004

13:56


Keiji Matsumoto

322

We denote by |lX the matrix representation of the horizontal lift of X ∈ Tρ(θ) (M), and |li is short for |l∂/∂θi . |lX satisfies Xρ(θ) =

1 (|lX φ(θ)| + |φ(θ)lX |), 2

(30)

and lX |φ(θ) = 0.

(31)

Notice that spanR {|li | i = 1, ..., m} is a representation of Tρ(θ) (M) because of unique existence of |lX . Therefore, we often also call the matrix representation |lX horizontal lift. We call Mi (M, |φ(θ)) estimation vector of the parameter θi by a measurement M at |φ(θ), An estimation vector Mi (M, |φ(θ)) is said to be locally unbiased iff M is locally unbiased. The local unbiasedness conditions for estimating vectors writes φ(θ)|Mi (M, |φ(θ) ) = 0, Relj (θ)|Mi (M, |φ(θ) ) = δji (i, j = 1, ..., m).

(32)

Often, we omit the argument θ of |lj (θ), |φ(θ), ρ(θ), and J S (θ) and denote them simply by |lj , |φ, ρ, J S . We denote the ordered pair of vectors [ |l1 , |l2 , ..., |lm ] by L. In this notation, the SLD Fisher information matrix J S writes J S = ReL∗ L ≡ Re [ li |lj ] , ˜ Generally, for the ordered and the imaginary part of L∗ L is denoted by J. pairs X = [ |x1 , |x2 , ..., |xm ], Y = [ |y 1 , |y 2 , ..., |y m ], of vectors, we define

X∗ Y = xi |y j

for notational simplicity. Then, letting X be 1 M (M, |φ ), M2 (M, |φ ), ..., Mm (M, |φ ) , the unbiasedness conditions (32) writes Re X∗ L = Im , Re (Mi (M, |φ ) )∗ |lj = Im ,

(33)

entire

December 28, 2004

13:56



323

where Im is the m × m unit matrix, and the matrix Zθ [M ] defined in the equation (20) writes Zθ [M ] = X∗ X 5.3. The commuting theorem and the locally quasi-classical model In this subsection, the necessary and sufficient condition for SLD CR bound to be attainable is studied. Fujiwara proved the following theorem [7]. Theorem 16: (Fujiwara [7]) SLD CR bound is attainable iff SLD’s {LSi |i = 1, ..., m} can be chosen so that [LSi , LSj ] = 0, (i, j = 1..., m). We prove another necessary and sufficient condition which is much easier to check for the concrete examples, by use of the following commuting theorem, which plays key role in our direct approach to pure state estimation. Theorem 17: If there exists a unbiased measurement M such that V [M ] = ReX∗ X,

(34)

where the ordered pair X is X = M1 (M, |φ ), M2 (M, |φ ), ..., Mm (M, |φ ) ,

(35)

then, ImX∗ X = 0

(36)

holds true. conversely, if (36) holds true for some ordered pair X of vectors, then there exists a simple, or projection valued, unbiased estimator E which satisfies (34), (35), and ! m 5 2 m ˆ ˆ ˆ ˆ {θκ } = 0, (37) E({θκ }) = E({θκ }), E({θ0 }) = E0 , E R / κ=0

for some {θˆκ |θˆκ ∈ R , κ = 0, ..., m + 1}, where E0 is a projection onto orthogonal complement subspace of spanC {X}. m

Proof: If (34) holds, inequality (22) in lemma 4 leads to ReX∗ X ≥ X∗ X,

or 0 ≥ iImX∗ X,

entire

December 28, 2004

13:56


Keiji Matsumoto

324

which implies ImX∗ X = 0. Conversely, Let us assume that (36) holds true. Applying Schmidt’s orthogonalization to the system {|φ, X} of vectors, we obtain the orthonormal system {|bi | i = 1, ..., m + 1} by which the system X of vectors write of vectors such that,   m+1 m+1 m+1 j  X= λ1j |bj , λ2j |bj , ..., λm j |b , j=1

j=1

j=1

λij

where (i = 1, ..., m, j = 1, ..., m + 1) are real numbers. Letting O = [oij ] be a (m + 1) × (m + 1) real orthogonal matrix such that φ|

m+1

oij |bj "= 0,

j=1

m+1

oij |bj by |b i , ith member of the ordered pair X writes   m+1 m+1 m+1 m+1 i k m+1 m+1 k k j=1 λj oj k i k k i k  |b b |φ. λj oj |b = λj oj  |b = k |φ b j=1 j=1 k=1 k=1 k=1 and denoting

j=1

Therefore, since the system {|b i |i = 1, ..., m+1} of vectors is orthonormal, we obtain an unbiased measurement which satisfies (37) as follows: m+1 i κ j=1 λj oj ˆ , κ = 1, ..., m + 1, θˆ0 = 0, θκ = b κ |φ m+1 κ κ κ κ ˆ ˆ E({θκ }) = |b b |, κ = 1, ..., m + 1 E(θ0 ) = I − |b b |. κ=1

Here, I is the identity in H. Theorem 18: SLD CR bound is attainable iff J˜ = ImL∗ L = 0

(38)

lj |li is real for any i, j. Conversely, if (38) holds true, SLD CR bound is achieved by a simple measurement, i.e., a projection valued measurement. Proof: If SLD CR bound is attainable, by virtue of lemma 4-5, we have (36) and (24), which lead directly to (38). Conversely, if Imlj |lk = 0 for any j, k, by virtue of commuting theorem, there exists such a simple measurement E that [J S−1 ]j,k |lk = M(E, |φ). k

entire

December 28, 2004

13:56



325

Elementary calculations show that the covariance matrix of this measurement equals J S−1 . Our theorem is equivalent to Fujiwara’s one, because by virtue of commuting theorem, lj |li is real iff there exist such SLD’s that LSi and LSj commute for any i, j. However, our condition is much easier to be checked, because SLD’s are not unique in the pure state model. Example 19: Often, a model is defined by an initial state and generators, ρ(θ0 ) = ρ,

∂i ρ(θ) = i[Hi (θ), ρ(θ)],

ρ(θ) = π(|φ(θ)).

Then, lj |li is real iff φ(θ)|[ Hi (θ), Hj (θ) ]|φ(θ) is 0. Because of theorem 17, φ(θ)|[ Hi (θ), Hj (θ) ]|φ(θ) = 0 is equivalent to the existence of generators which commute with each other, [Hi (θ), Hj (θ)] = 0. Because of this example and the theorem by Fujiwara, we may metaphorically say that SLD CR bound is attainable iff any two parameter has ‘classical nature’ at θ, because often classical limit of a quantum system is obtained by taking such a limit that commutation relations of observables tend to 0. Throughout the paper, we say that a manifold M is locally quasi-classical at θ iff lj |li is real at θ. The following remark describes another ‘classical’ aspect of the condition Imlj |li = 0. Example 20: The model M = π(N ), where N is a real span of some orthonormal basis of H, is locally quasi-classical at any point. As is illustrated in this example, when the model M = π(N ) is locally quasi-classical at θ0 , N behaves like an element of real Hilbert space around θ0 . Metaphorically speaking, |φ(θ)’s phase parts don’t change around θ0 at all, and N looks like the family of the square root of the probability distributions. 5.4. The reduction theorem and the direct approach Theorem 21: (Naimark’s theorem, see pp.64–68 [12].) Any generalized measurement M in H can be dilated to a simple measurement E in a larger Hilbert space K ⊂ H, so that M (B) = P E(B)P will hold, where P is the projection from K onto H.

(39)

entire

December 28, 2004

13:56


Keiji Matsumoto

326

Naimark’s theorem, mixed with commuting theorem, leads to the following reduction theorem, which sits at the heart of our direct approach. Theorem 22: Let M be a m-dimensional manifold in P1 , and Bθ be a system {|φ , |li | i = 1, ..., m} of vectors in 2m + 1-dimensional Hilbert space Kθ such that φ |lj = φ|lj = 0,

li |lj = li |lj ,

φ |φ = φ|φ = 1

for any i, j. Then, for any locally unbiased estimator M at θ in H, there is a simple ‘locally unbiased’ measurement E in Kθ , |xi = Mi (E, |φ ) ∈ Kθ

(40)

x |φ = 0, i

Rexi |lj = δji (i, j = 1, ..., m), whose ‘covariance matrix’ V [E] equals V [M ], 8 76 i i ˆj j ˆ ˆ V [M ] = V [E] ≡ (θ − θ )(θ − θ )φ |E(dθ)|φ ,

(41)

(42)

Proof: For any locally unbiased measurement M , there exists a Hilbert space HM and a simple measurement EM in HM which satisfies (39) by virtue of Naimark’s theorem. Note that EM is also locally unbiased. Mapping spanC {|φ, |li , Mi (M, |φ) | i = 1, ..., m} isometrically onto Kθ so that {|φ, |li | i = 1, ..., m} are mapped to {|φ , |li | i = 1, ..., m} , we denote the images of {Mi (M, |φ) | i = 1, ..., m} by {|xi | i = 1, ..., m}. Then, by virtue of the commuting theorem, we can construct a simple measurement E in Kθ satisfying the equations (40) - (42). In our direct approach, the reduction theorem reduces the determination of V to the determination of the set of matrices V = ReX∗ X where a system X = [|x1 , |x2 , ..., |xm ] of elements of Kθ H {|φ } which satisfies (36) and (41). In the same way, we minimize Tr GReX∗ X, where {|xi | i = 1, ..., m} ∈ Kθ H {|φ }, under the restriction such that the equations (36) and (41) are satisfied, instead of minimization of tr GV where V runs through V. Now, the problem is simplified to the large extent, because we only need to treat with vectors {|xi | i = 1, ..., m} in finite dimensional Hilbert space Kθ instead of measurements, or operator valued measures.

entire

December 28, 2004

13:56



327

We conclude this subsection with a corollary of reduction theorem, which is seemingly paradoxical, since historically, non-projection-valued measurement is introduced to describe measurements of non-commuting observables. Corollary 23: When the dimension of H is larger than or equal to 2m+ 1, for any unbiased measurement M in H, there is a simple measurement E in H which has the same covariance matrix as that of M . Proof: Choose {|li | i = 1, ..., m} to be {|li | i = 1, ..., m}. Especially, if H is infinite dimensional, as is the space of wave functions, the assumption of the corollary is always satisfied. 5.5. Lagrange’s method of indeterminate coefficients in the pure state estimation theory Now, we apply our direct approach to the problems presented in the end of subsection 3.5. To minimize the functional Tr GReX∗ X of vectors in Kθ , Lagrange’s indeterminate coefficients method is employed. First, denoting an ordered pair {|li | i = 1, ..., } of vectors in Kθ also by L, the symbol which is used also for an ordered pair {|li | i = 1, ..., } of vectors in H, we define a function Lag(X) by Lag(X) ≡ ReTr X∗ XG − 2Tr ((ReX∗ L − Im )Ξ) − Tr ImX∗ XΛ,

(43)

where Ξ, Λ are matrices whose components are Lagrange’s indeterminate coefficients. Here, Λ can be chosen to be antisymmetric, for Tr ImX∗ XΛ = Tr ImX∗ X(Λ − ΛT )/2 holds true and only skew symmetric part of Λ appears in (43). From here, we follow the routine of Lagrange’s method of indeterminate coefficients. Differentiating L(X + εδX) with respect to ε and substituting 0 into ε in the derivative, we get ReTr (δX∗ (2XG − 2LΞ − 2iXΛ)) = 0. Because δX is arbitrary, X(G − iΛ) = LΞ

(44)

is induced. Multiplying X∗ to both sides of (44), the real part of the outcoming equation, together with (33), yields Ξ = ReX∗ XG = V G.

(45)

entire

December 28, 2004

328

13:56


Keiji Matsumoto

Substituting (45) into (44) , we obtain X(G − iΛ) = LV G.

(46)

If (33), (36), (46) and V = ReX∗ X are solved for V , X and real skew symmetric matrix Λ, our problems will be perfectly solved. However, so far, solutions only for special cases are known. The rest of this subsection is devoted to the proof of the theorem which claim a little stronger assertion than the corollary 23. Though the real linear space spanR {L} is always m-dimensional for the parameters not to be redundant, the dimension of the complex linear space spanC {L}, or the rank of L, is not necessarily equal to m. If rankC L = rankG = m is assumed, since the rank of the matrix V is m as is proved soon, the rank of the left hand side of (46) is m, so is the rank of the right hand side, implying that G − iΛ is invertible. The rank of the matrix V is m because (33) implies that the dimension of spanR {X} is m. Since X is given by LV G(G − iΛ)−1 , we can conclude that spanC {X} should be a subspace of spanC {L}. Therefore, by the same argument as in the proof of the corollary 23, we obtain the following theorem. Theorem 24: Suppose that the dimension H is larger than or equal to m + 1, and that the dimension of the complex linear space spanC {L} is m. Then, for any strictly positive weight matrix G, the attainable CR type bound CR(G) is attained by a simple measurement. 5.6. The model with two parameters In this subsection, we determine V for the arbitrary 2-parameter pure state model. The equation(46), mixed with (36), leads to (G − iΛ)V (G − iΛ) = GV L∗ LV G.

(47)

whose real part and imaginary part are GV G − ΛV Λ = GV J S V G,

(48)

˜ G, GV Λ + ΛV G = −GV JV

(49)

and

where J˜ denotes ImL∗ L, respectively. We assert that when the matrix G is strictly positive, (47) is equivalent to the existence of X which satisfies (33), (36), (47), and V = ReX∗ X. If real

entire

December 28, 2004

13:56



329

positive symmetric matrix V and real antisymmetric matrix Λ satisfying (47) exist, X which satisfies (46) and (36) is given by X = U V 1/2 , where U is a 2m + 1 by m complex matrix such that U ∗ U = Im . If G is strictly positive, X = U V 1/2 also satisfies (33), because V G = ReX∗ LV G is obtained by multiplication of X∗ to and taking real part of the both sides of (46), and our assertion is proved. Hence, our task is to solve (48) and (49) for real positive symmetric matrix V and real antisymmetric matrix Λ, if G is strictly positive. When G is not strictly positive, after solving (48) and (49), we must check whether there exists an ordered pair X of vectors which satisfies (33), (36) and V = ReX∗ X. In the remainder of this subsection, we use the coordinate system where S J is equal to the identity Im . Given an arbitrary coordinate system {θi |i = i 1, ..., m}, such a coordinate system {θ | i = 1, ..., m} is obtained by the following coordinate transform: θ = i

m [(J S )1/2 ]ij θj (i = 1, ..., m)

(50)

j=1

By this coordinate transform, V is transformed as: V = (J S )1/2 V (J S )1/2 .

(51)

Therefore, the result in the originally given coordinate is obtained as a transformation of the result in the coordinate system {θ i | i = 1, ..., m}, by using (51) in the converse way. So far, we have not assumed dim M = 2. When dim M = 2, covariance matrices are members of the space Sym(2) of 2 × 2 symmetric matrices which is parameterized by the real variables x, y, and z, where 7 8 z+x y Sym(2) = V V = . y z−x Before tackling the equations (48) and (49), three useful facts about this parameterization are noted. First, letting A is a symmetric real matrix which is represented by (Ax , Ay , Az ) in the (x, y, z)-space, the set PM(A) of all matrices larger than A is PM(A) = {(x, y, z) | (z − Az )2 − (x − Ax )2 − (y − Ay )2 ≥ 0, (z − Az ) ≥ 0}, that is, interior of a upside-down corn with its vertex at A = (Ax , Ay , Az ). Hence, V is a subset of PM(Im ), or inside of an upside-down corn with its

entire

December 28, 2004

330

13:56


Keiji Matsumoto

vertex at (0, 0, 1) because of SLD CR bound. When the model M is locally quasi-classical at θ, V coincides with PM(Im ). Second, an action of rotation matrix Rθ to V such that Rθ V RθT , where 7 8 cos θ − sin θ Rθ = , sin θ cos θ corresponds to the rotation in the (x, y, z)-space around z-axis by the angle 2θ. Third, we have the following lemma. Lemma 25: V is rotationally symmetric around z-axis, if the coordinate in M is chosen so that J S writes the unit matrix Im . Proof: V is the set of every matrix which writes ReX∗ X using a 2m + 1 by m complex matrix X satisfying (33) and (36). Therefore, the rotational symmetry of V around z-axis is equivalent to the existence of a 2m + 1 × m complex matrix Y satisfying (33), (36) and ∀θ Rθ X∗ XRθT = Y∗ Y,

(52)

for any 2m + 1 × m complex matrix X which satisfies (33) and (36). Because ˜ elementary calculation shows that of L∗ L = Im + iJ, L∗ L = ReRθ L∗ LRθT , or, that for some unitary matrix U in Kθ , L∗ U = Rθ L∗ , which leads, together with (33), to ReL∗ U X = Rθ .

(53)

Therefore, Y = U XRθT satisfies (52), and we have the lemma. Because of lemma 25, V is determined if the boundary of the intersection V˜ of V and the zx-plane is calculated. Note that the ‘inner product’ Tr GV of G and V does not take its minimum at V ∈ V˜ unless G is in the zx˜ only diagonal weight matrix is needed to plane. Therefore, to obtain bdV, be considered.

entire

December 28, 2004

13:56



331

Let us begin with the case of a positive definite weight matrix. In this case, we only need to deal with (48) and (49). Let 7 8 7 8 7 8 7 8 0 −β 0 −λ 10 u0 ˜ J= , and Λ = , G= ,V = , β 0 λ 0 0g 0v where g, v, and u are positive real real numbers. Note that |β| ≤ 1 holds, because L∗ L = Im + iJ˜ is nonnegative definite. Then, (48) and (49) writes u + vλ2 − u2 = 0, vg 2 + uλ2 − v 2 g 2 = 0, vgλ + uλ + uvβg = 0. The necessary and sufficient condition for λ and positive g to exist is, after some calculations, (u − 1)1/2 + (v − 1)1/2 − |β|(uv)1/2 = 0.

(54)

Note that u and v are larger than or equal to 1, because V ≥ J S−1 = Im . Substitution of u = z + x and v = z − x into (54), after some calculations, leads to |β| = |β|(z + x − 1)1/2 (z − x − 1)1/2 ±(1 − β 2 )1/2 ((z + x − 1)1/2 + (z − x − 1)1/2 )

(55)

Fig.1 shows that the lower sign in the equation (55), |β| = |β|(z + x − 1)1/2 (z − x − 1)1/2 +(1 − β 2 )1/2 ((z + x − 1)1/2 + (z − x − 1)1/2 )

(56)

˜ In (56), x takes value ranging from −β 2 /(1 − β 2 ) to gives a part of bdV. 2 2 β /(1 − β ) if |β| is smaller than 1. When |β| = 1, x varies from −∞ to ∞. This restriction on the range of x comes from the positivity of z − x − 1 and z + x − 1. When the weight matrix G is 7 8 7 8 10 00 G= , or G = , (57) 00 01 we must treat the case of |β| = 1 and the case of |β| < 1 separately. If |β| = 1, there exists no (2m + 1) × m complex matrix X which satisfies V = ReX∗ X, (47), (33), and (36). On the other hand, if |β| < 1, such

entire

December 28, 2004

13:56


Keiji Matsumoto

332

z

2( 1 − (1 − β2)1/2)−1 (ii)

(i)

2( 1 + (1 − β2)1/2)−1 1

−1 − β2(1 − β2 ) −1

0

x

1 β (1 − β ) 2

2 −1

Fig. 1. Two stationary lines; (i) |β|(z + x − 1)1/2 (z − x − 1)1/2 + (1 − β 2 )1/2 ((z + x − 1)1/2 + (z − x − 1)1/2 ) = |β|; (ii) |β|(z + x − 1)1/2 (z − x − 1)1/2 − (1 − β 2 )1/2 ((z + x − 1)1/2 + (z − x − 1)1/2 ) = |β|;

complex matrix X always exists and V = ReX∗ X is given by, in terms of (x, y, z), z = −x + 1, x ≤ −

β2 1 − β2

or z = x + 1, x ≥

β2 1 − β2

(58)

With the help of (56) and (58), V is depicted as Fig.2. The intersection of z-axis and bdV gives CR(J S ) = min{J S V |V ∈ V} =

4 , 1 + (1 − |β|2 )1/2

(59)

where the equality holds in any coordinate of the model M. Simple calculation leads to following theorem. Theorem 26: If a model M has lager value of |β| at θ than another model N has at θ , the Vθ (M) is a subset of Vθ (N ). By virtue of this theorem, |β| can be seen as a measure of ‘uncertainty’ between the two parameters. Two extreme cases are worthy of special attention; When |β| = 0, the model M is locally quasi-classical at θ and V is largest. On the other hand, if |β| = 1 , V is smallest and uncertainty

entire

December 28, 2004

13:56



333

between θ1 and θ2 is maximum. In the latter case, we say that the model is coherent at θ. (b) z

1

−1 − β2(1 − β2 ) −1 (a)

0

2( 1 + (1 − β2)1/2)−1 x

1 β (1 − β ) 2

2 −1

(c) z

z

2

1

−1

0

x

1

Fig. 2.

−1

0

1

x

(a) |β| = 0; (b) 0 < |β| < 1; (c) |β| = 1.

Example 27: We define generalized spin coherent model [2] by Mj,m = π(N ) Nj,m = {|φ(θ) | |φ(θ) = exp iθ1 (sin θ2 Sx − cos θ2 Sy )|j, m, 0 ≤ θ1 < π, 0 ≤ θ2 < 2π},

(60)

where Sx , Sy , Sz are spin operators, and |j, m is defined by, Sz |j, m = m|j, m,

(Sx2 + Sy2 + Sz2 )|j, m = 2 j(j + 1)|j, m.

j takes value of half integers, and m is a half integer such that −j ≤ m ≤ j.

entire

December 28, 2004

13:56


Keiji Matsumoto

334

Then after tedious calculations, we obtain 2 2 M h ∂1 = 2i(sin θ Sx − cos θ Sy )|j, m |j,m$ 1 2 2 1 M h ∂2 = 2i{−sin θ (cos θ Sx +sin θ Sy )+(cos θ − 1)(Sz −m)}|j, m, |j,m$

and

7

J S = 22 (j 2 + j − m2 ) βj,m =

8 1 0 , 0 sin2 θ1

7 J˜ =

8 0 2m2 sin θ1 , −2m2 sin θ1 0

m . j 2 + j − m2

If m = αj, where α < 1 is a constant, βj,m tends to zero as j → ∞, and the model Mj,m becomes locally quasi-classical. However, if m = j, the model Mj,m is coherent for any j. 5.7. Multiplication of the imaginary unit As is shown in the previous subsection, in the 2-parameter model, |β|, a good index of ‘uncertainty’ between two parameters, or a measure of how distinct the model is from the classical model. It can be easily shown that, whatever coordinate of the model M is chosen, ±iβ are the eigenvalues ˜ which is deeply related to the complex structure of of the matrix J S−1 J, the model. Actually, that matrix stands for the linear map D from Tρ (M) onto Tρ (M) defined as in the followings; First, we multiply the imaginary unit i to |lX = M(h(|lX )) and M−1 and π∗ are applied successively to i|lX . Since π∗ (M−1 (i|lX )) is not a member of Tρ (M) generally, we project π∗ (M−1 (i|lX )) ∈ Tρ (P1 ) to Tρ (M) with respect to the metric ∗|∗ρ , and we obtain DX ∈ Tρ (M). multiplication of i −− −→ i|lX ∈ spanR {L, iL} |lX ∈ spanR L ↑ ↓ M−1 , π∗ −1 π∗ (M (i|lX )) ∈ Tρ (P1 ) h|φ$ , M ↑ project ↓ w.r.t. ∗, ∗ρ X ∈ Tρ (M) −− −→ DX ∈ Tρ (M) D The following theorems are straightforward consequences of the above discussion.

entire

December 28, 2004

13:56



335

Theorem 28: The absolute value of an eigenvalues of D, or equivalently, ˜ is smaller than or equal to 1. of J S−1 J, 5.8. The coherent model As for the model with arbitrary number of parameters, the model is said to be coherent at θ iff all of the eigenvalues of (J S )−1 J˜ are ±i. When the number of parameters is 2, this definition of coherency reduces to |β| = 1. It should be noted that the eigenvalues of J S−1 J˜ are , whether the model is coherent or not, of the form ±iβj or 0. Therefore, the number of parameters of the coherent model is even. In this subsection, we determine the attainable CR type bound of the coherent model with arbitrary numbers of parameters. The coherent model is worthy of attention firstly because the coherent model is ‘the maximal uncertainty’ model, secondly because there are many physically important coherent models. Because J S−1 J˜ is a representation of D, or of multiplication of the imaginary unit i, the following theorem. Theorem 29: The model M is coherent at θ iff ˜ 2 = −Im (J S−1 J)

(61)

holds true. Theorem 30: The model M is coherent at θ iff spanR {iL} is identical to spanR L, or equivalently, iff spanR {L, iL} is identical to spanR L. This theorem leads to the following theorem. Theorem 31: The model M is coherent at θ iff the dimension of spanC L is m/2. Proof: First, we assume that dimC spanC L = m/2.

(62)

Because spanR L is a m-dimensional subspace of spanR {L, iL} whose dimension is smaller than or equal to m because of (62), we have spanR {L, iL} = spanR L, or coherency of the model at θ. Conversely, let us assume that the model is coherent at θ. If we take an orthonormal basis {ej | j = 1, ..., m} of Tρ (M) such that ej+m/2 = Dei (i =

entire

December 28, 2004

13:56


Keiji Matsumoto

336

1, 2, ..., m/2), then M(h(ej+m/2 )) = iM(h(ej ))(i = 1, 2, ..., m/2) holds true, and any element |u of spanR L = spanR {L, iL} writes |u =

m

m/2

aj M(h(ej )) =

j=1

(aj + iaj+(m/2) )M(h(ej )),

j=1

implying that the dimension of spanC L is m/2. In 1996, Fujiwara and Nagaoka [9] determined the attainable CR type bound of the two parameter coherent model. In the following, more generally, we calculate the bound of the coherent model with arbitrary number of parameters. Lemma 32: In the case of the coherent model, ReL∗ X = Im or its equivalence ReL∗ (X − LJ S−1 ) = 0, implies ˜ S−1 . L∗ X = Im + iJJ Proof: ImL∗ (X − LJ S−1 ) = −ReiL∗ (X − LJ S−1 ) = 0.

(63)

Here, spanR L = spanR iL is used to deduce the last equality. (63) and ReL∗ (X − LJ S−1 ) = 0 implies ˜ S−1 . L∗ X = L∗ LJ S−1 = Im + iJJ

Multiplication of L∗ to the both sides of (46), together with the lemma presented above, yields ˜ S−1 )(G − iΛ) = (J S + iJ)V ˜ G. (Im + iJJ

(64)

By virtue of (61), both of the real part and the imaginary part of (64) give the same equation, ˜ S−1 Λ = J S V G, G + JJ or G1/2 V G1/2 − G1/2 J S−1 G1/2 =

˜ S−1 G1/2 G1/2 J S−1 JJ

G−1/2 ΛG−1/2 . (65)

entire

December 28, 2004

13:56



337

˜ S−1 G1/2 Therefore, letting ai and bi denote the eigenvalues of G1/2 J S−1 JJ and G−1/2 ΛG−1/2 respectively, we have ˜ S−1 G1/2 ), (G−1/2 ΛG−1/2 ) = 0 (G1/2 J S−1 JJ

˜ S−1 G1/2 )(G−1/2 ΛG−1/2 ) = Tr (G1/2 J S−1 JJ |ai ||bi |, i

˜ S−1 because (65) and SLD CR inequality implies that (G1/2 J S−1 JJ ×G1/2 )(G−1/2 ΛG−1/2 ) is positive Hermitian. On the other hand, (46) or its equivalence, XG1/2 (Im − iG−1/2 ΛG−1/2 ) = LV G1/2 ,

(66)

implies |bi | = 1 (i = 1, ..., m), because the rank of Im − iG−1/2 ΛG−1/2 is shown to be m/2 from (66), and the rank of matrices L, X, and V . rankC X = rankR X = m, hold true, where the last equation is valid by virtue of ImX∗ X = 0. After all, letting Tr absA denote the sum of the absolute values of the eigenvalues of A, we have the following theorem. Theorem 33: ˜ S−1 , CR(G) = Tr GJ S−1 + Tr abs GJ S−1 JJ where letting |A| = (AA∗ )1/2 , the covariance matrix V such that ˜ S−1 G1/2 G−1/2 . V = J S−1 + G−1/2 G1/2 J S−1 JJ attain the minimum. To check the coherency of the model, the following theorem, which is induced from theorem 28, is useful. Theorem 34: the model is coherent at θ iff ˜ |detJ S | = |detJ|. Example 35: (squeezed state model) Squeezed state model, which has four parameters, is defined by M = {ρ(z, ξ) | ρ(z, ξ) = π(|z, ξ), z, ξ ∈ C}, where |z, ξ = D(z)S(ξ)|0, D(z) = exp(za† − za), S(ξ) = exp

1 ( ξa†2 − ξa2 ) . 2

entire

December 28, 2004

13:56


Keiji Matsumoto

338

√ Here, the operator a is defined as a = (Q + iP )/ 2 where P and 9 Q satisfy 2 1 the canonical commutation relation [P, Q] = −i. Letting z = (θ + 4

iθ2 ), and ξ = θ3 e−2iθ (0 ≤ θ3 , 0 ≤ θ4 ≤ 2π), we have 2i 2i M h ∂θ1 = (−P + θ2 )|z, ξ, M h ∂θ2 = (Q − θ1 )|z, ξ, |z,ξ$ |z,ξ$ 4 4 M h ∂θ3 = −i(e2iθ a2 − e−2iθ a†2 )|z, ξ, |z,ξ$

M

h ∂θ4

|z,ξ$

1 = i4(sinh2 θ3 )(a† a + )|z, ξ, 2 4

4

−i2(sinh θ3 )(cosh θ3 )(e2iθ a2 + e−2iθ a†2 )|z, ξ, and



cosh 2θ3 − sinh 2θ3 cos 2θ4 sinh 2θ3 sin 2θ4 3 4 cosh 2θ3 + sinh 2θ3 cos 2θ4 sinh 2θ sin 2θ 0 0 0 0   0 1 0 0  0 0 2  −1 0 . J˜ =  0 − 2 sinh 2θ3   0 0 0 0 2 sinh 2θ3 0

2 JS =  

 0 0  0 0 ,  0 2 3 0 sinh 2θ

Coherency of this model is easily checked by theorem 34, ˜ = 4 sinh2 2θ3 . |detJ S | = |detJ| 2 Example 36: (spin coherent model) As is pointed out by Fujiwara [9], spin coherent model Ms,s , where Ms,m is defined by (60), is coherent. Example 37: (total space model) The total space model is the space of all the pure state P1 in finite dimensional Hilbert space H. By virtue of theorem 30, the coherency of the model is proved by checking that spanC L is invariant by the multiplication of the imaginary unit i. Let |l be a horizontal lift of a tangent vector at |φ. Then, i|l is also a horizontal lift of ˜ another tangent vector at |φ, because |φ + i|ldt is a member of H. 5.9. Informationally exclusive, and independent parameters In a m-parameter model M, we say parameter θi and θj are informationally independent at θ0 , iff Reli |lj |θ=θ0 = Imli |lj |θ=θ0 = 0,

entire

December 28, 2004

13:56



339

because if the equation holds true, in the estimation of the parameters θ1 , θ2 of the 2-parameter submodel M(1, 2|θ0 ) of M, where M(1, 2|θ0 ) ≡ ρ(θ) θ = ( θ1 , θ2 , θ03 , ..., θ0m ), (θ1 , θ2 ) ∈ R2 , (67) both of the parameters can be estimated up to the accuracy which is achieved in the estimation of the parameter of the 1-parameter submodels M(1|θ0 ) and M(2|θ0 ), where M(1|θ0 ) ≡ {ρ(θ) | θ = (θ1 , θ02 , ..., θ0m ), θ1 ∈ R}, M(2|θ0 ) ≡ {ρ(θ) | θ = (θ01 , θ2 , ..., θ0m ), θ2 ∈ R}. On the other hand, iff Rel1 |l2 |θ=θ0 = 0,

(68)

and M(1, 2|θ0 ) is coherent, or equivalently,

Iml1 |l2 |θ=θ0 = (l1 |l1 l2 |l2 )1/2

θ=θ0

hold true, we say the parameters are informationally exclusive at θ0 . Fujiwara and Nagaoka [9] showed that in the coherent model with two orthogonal parameters, the attainable CR type bound is achieved by applying the best measurement for each parameter alternatively to the system. This fact implies that if two parameters are informationally exclusive, the one of them do not contain any information about the other. In fact, we have the following theorem. Theorem 38: If two parameters θ1 and θ2 are informationally exclusive, any unbiased measurement M in M(1, 2|θ0 ) which estimates θ1 as accurately as possible, i.e., 6 ˆ = J S−1 11 (θˆ1 − θ1 )2 Tr ρ(θ0 )M (dθ) (69) can extract no information about θ2 from the system, i.e., ! ∂ρ 2 = Reφ|M (B)|l2 = 0, ∀B ⊂ R Tr M (B) 2 ∂θ θ=θ0

(70)

and vice versa. Proof: We prove the theorem only for the measurements which writes 6 ˆ θ|µ(d ˆ ˆ |θ θ). M (B) = B

entire

December 28, 2004

13:56


entire

Keiji Matsumoto

340

The proof for general case will be discussed elsewhere. If (69) holds true, as in the proof of lemma 4 (see p. 88 [12]), ˆ (θˆ1 − θ1 )|φ − 1 |l1 = 0 (71) θ| l1 |l1 must hold. On the other hand, because of coherency of M(1, 2|θ0 ) and (68), for some real number a, we have |l2 = ia|l1 , by use of which it is shown that (70) is equivalent to ˆ θ|l ˆ 1 ) = 0. Im( φ|θ This equation is obviously true if (71) is true, and we have the theorem.

5.10. Direct sum of models For the submodels M1 ≡ M(1, 2, ..., m1 | θ0 ), M2 ≡ M(m1 , m1 + 1, ..., m | θ0 ) of M, where M(1, 2, ..., m1 | θ0 ) and M(m1 , m1 + 1, ..., m | θ0 ) are defined almost in the same way as the definition (67) M(1, 2 | θ0 ), we write M|θ0 = M1 ⊕ M2 |θ0 , and say that M is sum of M and M at θ0 . m − m1 is denoted by m2 . Lemma 39: If any parameter of M1 is informationally independent of any parameter of M2 at θ0 , and the weight matrix G writes 7 8 G1 0 G= , 0 G2 then CR(G, θ0 , M) = CR(G1 , θ0 , M1 ) + CR(G2 , θ0 , M2 ). When the assumption of the lemma is satisfied, M1 and M2 are said to be informationally independent at θ0 . Proof: Let θˆ1 (ω) and θˆ2 (ω) be the vector whose components are the estimates of θ1 , θ2 , ..., θm1 , and θm1 +1 , θm1 +2 , ..., θm ,

θˆ1 (ω) = θˆ1 (ω), θˆ2 (ω), ..., θˆm1 (ω) , θˆ2 (ω) = θˆm1 +1 (ω), θˆm1 +2 (ω), ..., θˆm (ω) .

December 28, 2004

13:56



341

ˆ Ω) is locally unbiased, (M, θî , Ω) is locally unbiased. ThereThen, if (M, θ, fore, we have, ˆ Ω) is locally unbiased} {M | (M, ∃θî , Ω) is locally unbiased} ⊃ {M | (M, ∃θ, which yields,

ˆ Ω) is locally unbiased ˆ ] (M, θ, min Tr GV [θ|M

ˆ Ω) is locally unbiased = min Tr G1 V [θˆ1 |M ] (M, θ,

ˆ Ω) is locally unbiased + min Tr G2 V [θˆ2 |M ] (M, θ,

≥ min Tr G1 V [θˆ1 |M ] (M, θˆ1 , Ω) is locally unbiased

+ min Tr G2 V [θˆ2 |M ] (M, θˆ2 , Ω) is locally unbiased ,

or its equivalence, CR(G, M) ≥ CR(G1 , M1 ) + CR(G2 , M2 ).

(72)

Because M1 and M2 are informationally independent, L for M writes 7 8 L1 0 L= , 0 L2 in the appropriate coordinate, where L1 = [|l1 , |l2 , ..., |lm1 ], and L2 is in the same manner. In that coordinate, X writes 7 8 X1 X12 X= . X21 X2 Therefore, if ReX∗i Li = Imi , ImX∗i Xi = 0 (i = 1, 2) holds true, the measurements corresponding to X is locally unbiased, and CR(G, M) = min { Tr GX∗ X | ReX∗ L = Im , ImX∗ X = 0 } 2 Tr Gi X∗i Xi ReX∗i Li = Imi , ImX∗i Xi = 0, (i = 1, 2) ≤ min i=1

= CR(G1 , M1 ) + CR(G2 , M2 ), which, mixed with (72) leads to the lemma.

entire

December 28, 2004

13:56


Keiji Matsumoto

342

6. Berry’s Phase in Quantum Estimation Theory 6.1. Berry’s phase In this subsection, we review the geometrical theory of Berry’s phase. Berry’s phase was discovered by M.V. Berry in 1984 [4], and confirmed by many experimental facts [22]. In 1987, Aharonov and Anandan [1] pointed out that Berry’s phase is naturally interpreted as a curvature in the fiber bundle over P1 . Actually, Berry’s phase is nothing but the Uhlmann’s curvature restricted to P1 [24, 26]. Uhlmann’s RPF in the space P1 takes value in the set of unimodular complex number. On the other hand, Berry’s phase takes value in real numbers. They are related as Berry’s phase = −i ln(Uhlmann’s RPF). The Berry’s phase for the infinitesimal loop (16) is calculated up to the second order of dθ as 1 φ(0)|Fij |φ(0)dθi dθj + o(dθ)2 2i Because Berry’s phase is independent of the choice of SLD, we can take LSi to be 2∂i ρ. Then, the phase is equal to 1˜ Jij dθi dθj + o(dθ)2 , 2 where J˜ij is equal to 12 Imli |lj . Mathematically,

J˜ij dθi dθj

(73)

i,j

corresponds to the curvature form. 6.2. Berry’s phase in quantum estimation theory It must be noted that the curvature form is deeply related to the multiplication of the imaginary unit D. Actually, The curvature form (73) is identical to a map from Tθ (M) × Tθ (M) to R such that A B 1 J˜ij dθi dθj : (∂i , ∂j ) −→ ∂i , D∂j . 2 ij Hence, the eigenvalues of D can be interpreted in terms of Berry’s phase. Concretely speaking, taking the coordinate system which is orthonormal at

entire

December 28, 2004

13:56



343

θ in terms of the metric ∗, ∗θ , they are the half of the Berry’s phase obtained when the state goes around the infinitesimal loop (16). Especially, when the model is only with two parameters, the eigenvalues of D are the half of the Berry’s phase per unit area, where unit of the area is naturally induced from the metric ∗, ∗θ . Therefore, we can roughly say that the more the Berry’s phase for the loop (16), the harder it is to estimate θi and θj simultaneously. Namely, θi and θj are informationally independent iff the Berry’s phase for the loop (16) vanishes and ∂i , ∂j θ = 0 holds. On the other hand, iff the Berry’s phase for the loop (16) is maximal and ∂i , ∂j θ = 0 holds, i.e., β = 1, the two parameters are informationally exclusive. This discussion is parallel to that in the subsection 4.2, which was about relations between Uhlmann’s parallelism and the noncommutative nature of the quantum estimation theory of the faithful model. Because Berry’s phase is nothing but the restriction of Uhlmann’s RPF to the pure state model, this parallelism is natural. What about the models with arbitrary number of parameters? By virtue of theorem 18, if Berry’s phase for any closed loop vanishes, the model is locally quasi-classical. For general pure state models, we have the following theorem.

Theorem 40: For any pure state model,

˜ S−1/2 )1/2 }−2 CR(J S ) = Tr {Re(Im + iJ S−1/2 JJ 2 = . 1 + (1 − |α|2 )1/2 α∈{eigenvalues of D}

The estimation theoretical significance of CR(J S ) is hard to verify. However, this value remains invariant under any transform of the coordinate in the model M, and can be a good index of distance between V and J S−1 . Proof: Because CR(J S ) is invariant by any affine coordinate transform in the model M, we choose a coordinate in which J S writes Im and J˜ writes

entire

December 28, 2004

13:56


Keiji Matsumoto

344



0 −β1 0 · · · 0  β1 0 0 · · · 0   .  0 0 . . . . . . ..   . .. . . . .  . . . 0 .  .  J˜ =  0 0 · · · 0 0   0 0 · · · 0 βl    0 0 ··· 0 0   . .. . . .. ..  .. . . . . 0 0 ··· 0 0

 0 ··· 0 0 ··· 0  .. . . ..  . . .    0 0 ··· 0  −βl 0 · · · 0  .  0 0 ··· 0  . . 0 0 . . ..   .. . . . . ..  . . . . 0 ··· ··· 0 0 0 .. .

Then, The model M is decomposed into the direct sum of the submodels one or two parameter Mκ , : M= Mκ , κ

where any two submodels Mκ and Mκ are informationally independent, and J˜ of a two parameter submodel Mκ is 7 8 0 −βκ . βκ 0 Because the weight matrix J S = Im writes in the form of direct some of the weight matrix on the model Mκ , by virtue of lemma 39 and the equation (59), we have the theorem.

6.3. Berry’s phase in the global theory of quantum estimation In this subsection, we present a geometrical sufficient condition for the pure state model M to be quasi-classical in the sense of subsection 3.6. For simplicity, we say that the manifold N in P1 is a horizontal lift of the model M if π(N ) = M,

∀|φ(θ) ∈ N ,

1 ∂ |φ(θ) ∈ LS |φ(θ)$ . 2 ∂θi

The horizontal lift N exists iff M is quasi-classical. Theorem 41: If H is finite dimensional and the model M is parallel, that model is globally classical in the sense of subsection 3.6.

entire

December 28, 2004

13:56



345

Proof: Because the model is parallel, Berry’s phase for any closed loop also vanishes, i.e., there exists a horizontal lift N of the model M. For the model M to be parallel, any members |φ(θ), |φ(θ ) of the horizontal lift N must satisfy φ(θ)|φ(θ ) ∈ R, which implies that N is a subset of the real separable Hilbert space R spanned by the appropriately chosen basis {|ej | j = 1, 2, ..., d} in H,   d   N ⊂ R = |φ |φ = aj |ej . (74)   j We claim that the optimal estimator is constructed by use of the measurement 6 M (B) = O|e1 e1 |OT µ(dO), B

where O is an orthogonal transform in R, and µ(∗) is the invariant measure in the space of all the orthogonal transform in R. To prove this claim, in the following, we calculate the classical Fisher information matrix of the probability distribution family 6 6 p(O| θ)µ(dO) = φ(θ)|M (dO)|φ(θ) | θ ∈ Θ . p(O| θ) B

B

From the definition, the (i, j) th component of the classical Fisher information matrix is 6 ∂i p(O| θ)∂j p(O | θ) µ(dO) p(O | θ) Ω 6 4∂i φ(θ)|O|e1 e1 |OT |φ(θ)∂j φ(θ)|O|e1 e1 |OT |φ(θ) = µ(dO) φ(θ)|O|e1 e1 |OT |φ(θ) 6Ω = µ(dO) 4∂i φ(θ)|O|e1 ∂j φ(θ)|O|e1 Ω 6 = 4∂i φ(θ)| µ(dO) O|e1 e1 |OT |∂j φ(θ) = 4∂i φ(θ)|∂j φ(θ) = li |lj , Ω

which is nothing but the (i, j) th component of the SLD Fisher information matrix, because li |lj = 4∂i φ(θ)|∂j φ(θ) is real by virtue of (74). We define the model is quasi-classical in the wider sense iff for any θ ∈ Θ, there is an open ball B(θ, ε) in Rm such that the model M(θ, ε) = {ρ(θ) | ρ(θ) ∈ M, θ ∈ B(θ, ε) ⊂ Θ}

entire

December 28, 2004

13:56


Keiji Matsumoto

346

is quasi-classical. Theorem 42: If the model M is parallel, that model is quasi-classical in the wider sense. Proof: Let us define N and R in the same way as in the proof of theorem 41. Then, the optimal measurement in M(θ, ε) for some ε is obtained by use of commuting theorem. The converse of the latter theorem is, however, not true, because the following counter-examples exist. Example 43: We consider the position shifted model which is defined by Mx = π(Nx ) Nx = {|φ(θ) | |φ(θ) = const. × (x − θ)2 e−(x−θ)

2

+ig(x−θ)

, θ ∈ R},

where c is a normalizing constant, g the function such that 0 (x ≥ 0), g(x) = α (x < 0). Then, as easily checked, Nx is a horizontal lift of the model Mx , and φ(θ)|φ(θ ) is not real unless α = nπ (n = 0, 1, ...). However, SLD CR bound is uniformly attained by the measurement obtained by the spectral decomposition Ex (dx) = |xx|dx of the position operator, where |x0 = δ(x − x0 ), as is checked by comparing SLD Fisher information of the model Mx and the classical Fisher information of the probability distribution family {p(x|θ) | p(x|θ) = |φ(θ)|x|2 , θ ∈ R}. Note that |φ(θ) is an eigenstate of the Hamiltonian 2 d2 2 1 2 H=− + 2x + 2 , 2m dx2 m x whose potential has two wells with infinite height of wall between them. Example 44: Let H be L2 ([0, 2π], C), and define a one parameter model M such that, M = π(N ) N = {|φ(θ) | |φ(θ) = const. × (2 − cos ω) eiα(f (ω−θ)+θ) , (0 ≤ ω, θ < 2π)}, (75)

entire

December 28, 2004

13:56



347

where α is a real number and f the function defined by ω − θ (ω − θ ≥ 0) f (ω − θ) = . ω + 2π − θ (ω − θ < 0) Physically, (75) is an eigenstate of the Hamiltonian H such that, 2 2 d A − B cos ω H =− − iα + , 2m dω 2 − cos ω which characterize the dynamics of an electron confined to the onedimensional ring which encircles magnetic flax Φ = 2παc/e, where m is the mass of the electron, −e the charge of the electron, c the velocity of light, and A, B the appropriately chosen constant. It is easily checked that N is a horizontal lift of the model M, and that the model M is not parallel unless α = nπ (n = 0, 1, ...). However, consider the projection valued measure Eω such that Eω (dω) = |ωω|dω, where |ω0 = δ(ω − ω0 ). Then, it is easily checked that the classical Fisher information of the probability distribution family {p(ω|θ) | p(ω|θ) = |φ(θ)|ω|2 , 0 ≤ ω, θ < 2π} is equal to the SLD Fisher information of M. 6.4. Antiunitary operators The transformation A |˜ a = A|a, |˜b = A|b is said to be antiunitary iff ˜ a|˜b = a|b,

A(α|a + β|b) = αA|a + βA|b,

where x means the complex conjugate of x. Fix an orthonormal basis B = {|i | i = 1, 2, ..., d}, and we can then define antiunitary operator KB which takes complex conjugate of any components in this basis, ! αi |i = αi |i. KB i

i

For different basis B, B , we have KB = U KB U ∗ ,

entire

December 28, 2004

348

13:56


Keiji Matsumoto

where U is a unitary operator corresponding to the change of the basis. ˜ is invariant Suppose that any member of the manifold N = {|φ} in H ˜ = A|φ, |φ˜ = A|φ . Then, we by the antiunitary operator A, and let |φ have ˜ = φ |φ ∈ R. φ|φ = φ˜ |φ Conversely, if φ|φ is real for any |φ, |φ ∈ N , by Schmidt’s orthonormalization, we can obtain the basis B such that N is subset of the real span of B, which means any member of N is invariant by the antiunitary operator KB . Therefore, the premise of the statement of theorems 41-42 is satisfied iff the horizontal lift of the model is invariant by some antiunitary operator. 6.5. Time reversal symmetry As an example of the antiunitary operator, we discuss time reversal operator (see pp. 266–282 [20]). The time reversal operator T is an antiunitary operator in L2 (R3 , C) which transforms the wave function ψ(x) ∈ L2 (R3 , C) as: T ψ(x) = ψ(x) = K{|x$} ψ(x). The term ‘time reversal’ came from the fact that if ψ(x, t) is a solution of the Sch¨ odinger equation 2 2 ∂ψ = − ∇ + V ψ, i ∂t 2m then ψ(x, −t) is also its solution. The operator T is sometimes called motion reversal operator, since it transforms the momentum eigenstate eip·x/ corresponding to eigenvalue p to the eigenstate e−ip·x/ corresponding to eigenvalue −p. Define the position shifted model by Mx = {ρ(θ) | ρ(θ) = π(ψ(x − x0 ) ), x0 ∈ R3 }, and suppose that any member of the horizontal lift Nx of the model Mx has time reversal symmetry. Then, since time reversal operator T is antiunitary, the model Mx is quasi-classical in the wider sense. The spectral decomposition of the position operator gives optimal measurement. Now, we discuss the generalization of time reversal operator. The antiunitary transform Tα : eip·x/ → eiα(p) e−ip·x/

entire

December 28, 2004

13:56



349

is also called motion reversal operator, or time reversal operator. If any member ψ(x − x0 ) of the horizontal lift Nx of the position shifted model Mx is invariant by the time reversal operator Tα , 6 ψ(x − x0 ) ψ(x − x0 ) dx ∈ R (76) R3

holds true for any x0 , x0 , which is equivalent to the premise of theorems 41-42. Conversely, if (76) holds true, Fourier transform of (76) leads to |Ψ(p)|2 = |Ψ(−p)|2 , where

6 1 Ψ(p) = √ ψ(x)e−ip·x/ dx. 2π Therefore, any member of Nx is transformed to itself by the time reversal operator Tα such that Tα : eip·x/ → ei(β(p)+β(−p) ) e−ip·x/ , where eiβ(p) =

Ψ(p) . |Ψ(p)|

Theorem 45: (76) is equivalent to the existence of the time reversal operator which transforms any member of the horizontal lift Nx to itself. References [1] Y. Aharonov and J. Anandan, “Phase change during a cyclic quantum evolution,” Phys. Rev. Lett., 58, 1593–1596 (1987). [2] S. Abe, “Quantized geometry associated with uncertainty and correlation,” Phys. Rev. A, 48, 4102–4106 (1993). [3] S. Amari, Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, 28, Springer, Berlin, 1985. [4] M.V. Berry, “Quantal phase factors accompanying adiabatic changes,” Proc. Roy. Soc. London, A392, 45–57 (1984). [5] P.J. Bickel, “The 1980 world memorial lectures on adaptive estimation,” Ann. Stats., 10, 3, 647–671 (1982). [6] A. Fujiwara, A geometrical study in quantum information systems, doctoral thesis, 1995. [7] A. Fujiwara, private communication. [8] A. Fujiwara and H. Nagaoka, “Quantum Fisher metric and estimation for pure state models,” Phys. Lett., 201A 119–124 (1995). (Chap. 16 of this book)

entire

December 28, 2004

350

13:56


Keiji Matsumoto

[9] A. Fujiwara and H. Nagaoka, “Coherency in view of quantum estimation theory,” in Quantum coherence and decoherence, by K. Fujikawa and Y. A. Ono, eds., Elsevier, Amsterdam, 1996, 303–306. [10] C.W. Helstrom, “Minimum mean-square error estimation in quantum statistics,” Phys. Lett., 25A, 101–102 (1967). [11] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York, 1976. [12] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, Amsterdam, 1982 (in Russian, 1980). [13] S. Kobayashi and K. Nomizu, Foundations of Differential Geometry, I, II, John Wiley, New York, 1963, 1969. [14] E.L. Lehmann, Theory of point estimation, John Wiley, 1983. [15] H. Nagaoka, “A new approach to Cramér-Rao bounds for quantum state estimation,” IEICE Technical Report, 89, 228, IT 89-42, 9–14, (1989). (Chap. 8 of this book) [16] H. Nagaoka, “On the parameter estimation problem for quantum statistical models,” In Proceeding of 12th Symposium on Information Theory and Its Applications, p. 577–582 (1989). (Chap. 10 of this book). [17] H. Nagaoka, “Differential geometrical aspects of quantum state estimation and relative entropy,” Math. Eng. Tech. Rep., 94-14, Univ. Tokyo (1994). [18] H. Nagaoka, private communication, 1, 1996. [19] M. Ozawa, “Quantum measuring processes of continuous observables,” J. Math. Phys., 25, 79–87 (1984). [20] J.J. Sakurai, Modern quantum mechanics, Benjamin/Cummings Publishing Company, Inc, 1985. [21] J. Samuel and R. Bhandari, “General setting for Berry’s phase,” Phys.Rev.Lett., 60, 2239-2342 (1988). [22] A. Shapere, F. Wilczek, Geometric phases in physics, Advanced Series in Mathematical Physics, 5, World Scientific, 1989. [23] W.F. Steinspring, “Positive functions on C ∗ -algebras,” Proc. Am. Math. Soc., 6, 211–216 (1955). [24] A. Uhlmann, “Parallel transport and ‘Quantum holonomy’ along density operators,” Rep. Math. Phys., 24, 229–240 (1986). [25] A. Uhlmann, “An energy dispersion estimate,” Phys. Lett., 161A, 329–331 (1992). [26] A. Uhlmann, “Density operators as an arena for differential geometry,” Rep. Math. Phys., 33, 253–263 (1993). [27] H.P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of nonselfadjoint observables,” IEEE Trans. Inform. Theory, 19, 6, 740–750 (1973). [28] T.Y. Young, “Asymptotic efficient approaches to quantum-mechanical parameter estimation,” Information Sciences, 9, 25–42 (1975).

entire

December 28, 2004

13:56


PART IV Group Symmetric Approach to Pure States Model

Chap. 22: S. Massar and S. Popescu “Optimal extraction of information from finite quantum ensembles” . 356 Chap. 23: M. Hayashi “Asymptotic estimation theory for a finite-dimensional pure state model” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 Chap. 24: D. Bruß, A. Ekert, and C. Macchiavello “Optimal universal quantum cloning and state estimation” . . . . . . . 379 Chap. 25: A.S. Holevo “Bounds for generalized uncertainty of the shift parameter” . . . . . . 386

entire

December 28, 2004

13:56


entire

December 28, 2004

13:56


CHAPTER 21

Introduction to Part IV 1. Optimal Performance in the Covariant Model Group-theoretical symmetry is a fundamental property in quantum physics, and providess a very powerful method to research several topics in quantum theory. Especially, when a group acts on the quantum system, the irreducibility simplifies the problems very much. Quantum statistical inference has several important models with group-theoretical symmetry, although classically such models have less importance. Thus, it is a natural research direction to apply group-theoretical symmetry to quantum statistical inference. The first approach based on symmetry was developed by Helstrom [IV1], and he treated the estimation problem of one-parametric covariant pure states families. Following his result, Holevo [IV-2,IV-3] formulated a quantum covariant state family and a covariant measurement in a more general setting. If we adopt Bayesian critrion with an invariant prior and an invariant loss function in the estimation of any covariant model, the optimal estimator can be given as the optimal covariant estimators which is the minimax estimator. This statement is called the quantum Hunt-Stein theorem [IV-2,IV-3], whose details are reviewed in section 3 in Chap. 23. Using this method, he obtained minimum errors of several important group covariant models. However, he did not explicitly focus on the case of n-identicalcopies. After this work, Holevo [Chap. 24] focused on the n-identical-copies setting in a one-parametric shifted pure states model, and evaluated the asymptotic behavior of the estimation error. His result has been extended by Holevo [IV-4]. Massar and Popescu [Chap. 22] focused on the same setting in the full pure states model in the quantum two-level system. They proved 1 in the covariant setting, whereas Holevo that the optimal error equals n+2 implicitly obtained the same result in section 10 of chapter IV in Holevo’s textbook [IV-3]. Hayashi [Chap. 23] treated the same problem with a more general covariant setting in a full pure states model in any finite-dimensional system. Here, we should remark that Massar and Popescu [Chap. 22] and

353

entire

December 28, 2004

354

13:56


Introduction to Part IV

ˆ 2 between the true Holevo [Chap. 24] treated only the error 1 − |φ, φ| ˆ state |φ and the estimated state |φ; however Hayashi [Chap. 23] treated ˆ 2 as the error. Furthermore, an arbitrary monotone function of 1 − |φ, φ| Vidal et al. [IV-5] treated the optimal error of the mixed states model in the two-level system with the n-identical copies under any covariant Bayes prior. Bagan et al. [IV-6] treated the asymptotic behaviour of its average error. 2. Optimization among Realizable POVMs These optimal measurements, however, are not necessarily realizable because they are usually collective measurements on the joint system of many particles. On the other hand, if a quantum measurement on the whole system consists of the repetition of the same quantum measurement of a single-particle system, it is realized relatively easily. Massar and Popescu [Chap. 22] proved that the optimal error cannot be realized by a separable measurement in the finite-copies case of their model. However, Hayashi [Chap. 23] indicated that the optimal error can be asymptotically attained to the first order by the repetition of the optimal measurement of a single copy without the adaptive method. He also showed that there is a small gap concerning large deviation evaluation between these two settings. In addition, it is easily verified that these covariant models are coherent models in the sense of [Chaps. 18,19]. Therefore, the optimal coefficients with the first asymptotic order of these models can be calculated by Fujiwara [Chap. 18], and Fujiwara and Nagaoka [Chap. 19]. Also, from the discussion in chapter 16, we can check that the optimal coefficients can be attained by an adaptive measurement. 3. Application to Other Problems The group representation theoretical method is useful for analyzing the approximate cloning of a pure state with N input qubits and M output qubits [IV-7]. In particular, Bruß, Ekert, and Macchiavello [Chap. 25] focused on the relation between quantum estimation and quantum approximate cloning, and obtained the optimal performance of quantum approximate cloning in this case. This is a good application of quantum estimation, and indicates the importance of quantum estimation for the foundation of quantum information. This result has been extended to the d-dimensional case by Werner [IV-8]. These results indicate that quantum estimation can be regarded as the quantum cloning with infinite output qubits.

entire

December 28, 2004

13:56


Introduction to Part IV

355

Another application of this method is the estimation of the direction of polarization with two spin 1/2 particles by Gisin and Massar [IV-9], who found that the antiparallel spin pair is more informative than the parallel spin pair. Moreover, Buˇzek, Derka, and Massar [Chap. 31] applied this method to the estimation of an unknown unitary action. Further Reading [IV-1] C. W. Helstrom, Internat. Journ. Theor. Phys., 11, 357 (1974). [IV-2] A. S. Holevo, “Covariant Measurements and Uncertainty Relations,” Rep. Math. Phys., 16, 385–400 (1979). [IV-3] A. S. Holevo, “Covariant Measurements and Optimality,” Chapter IV of Probabilistic and Statistical Aspects of Quantum Theory, (North-Holland, Amsterdam, 1982); Originally published in Russian (1980). [IV-4] A. S. Holevo, “Asymptotic estimation of shift parameter of a quantum state,” quant-ph/0307225 (2003); Theory of Probability and Its Applications, 49, No 2, (2004). [IV-5] G. Vidal, J. I. Latorre, P. Pascual, and R. Tarrach, “Optimal minimal measurements of mixed states,” Phys. Rev. A, 60, 126 (1999). [IV-6] E. Bagan, M. Baig, R. Mu˜ noz-Tapia, and A. Rodriguez, “Collective versus local measurements in a qubit mixed-state estimation,” Phys. Rev. A, 69, 010304(R) (2004); quant-ph/0307199. [IV-7] N. Gisin and S. Popescu, “Spin flips and quantum information for anti-parallel spins,” Phys. Rev. Lett., 83, 432 (1999); quantph/9901072. [IV-8] R. F. Werner, “Optimal Cloning of Pure States,” Phys. Rev. A, 58, 1827–1832 (1998); quant-ph/9804001. [IV-9] N. Gisin and S. Massar, “Optimal Quantum Cloning Machines,” Phys. Rev. Lett., 79, 2153–2156 (1997); quant-ph/9705046.

entire

December 28, 2004

13:56


CHAPTER 22

Optimal Extraction of Information from Finite Quantum Ensembles Serge Massar and Sandu Popescu Abstract. Given only a finite ensemble of identically prepared particles, how precisely can one determine their states? We describe optimal measurement procedures in the case of spin-1/2 particles. Furthermore, we prove that optimal measurement procedures must necessarily view the ensemble as a single composite system rather than as the sum of its components, i.e., optimal measurements cannot be realized by separate measurements on each particle.

A basic assumption of quantum mechanics is that if an infinite ensemble of identically prepared particles is given the quantum state of the particles can be determined exactly. But in practice one never encounters such infinite ensembles, and very often they are not even large (statistical). Given such a finite ensemble its state can only be determined approximately. How much knowledge can be obtained from such finite ensembles? How quickly does one approach exact knowledge as the system becomes large? What experimental strategies furnish the maximum knowledge? We solve these problems in some particular cases. This is not of academic interest only. The solution to such problems is expected to lead to applications in the fields of quantum information transmission, quantum cryptography, and quantum computation. A fundamental question related to the above has been raised by Peres and Wootters [6]. Is an ensemble of identically prepared particles, viewed as an entity, more than the sum of its components? That is, could more be learned about the ensemble by performing a measurement on all the constituent particles together than by performing separate measurements on each particle? Peres and Wootters conjectured that this is the case. In the present Letter we prove their conjecture, although not in its letter but in its spirit. Reprinted (abstract/excerpt/figure) with permission from S. Massar and S. Popescu, c Phys. Rev. Lett., 74, 1259, 1995. 1995 by The American Physical Society 356

entire

December 28, 2004

13:56


Optimal Extraction of Information from Finite Quantum Ensembles

357

We have answered the above problems in the context of a simple “quantum game.” The game consists of many runs. In each run a player receives N spin-1/2 particles, all polarized in the same direction. The player knows that the N spins are parallel. He also knows that in each run they are polarized in another direction, randomly and uniformly distributed in space, and that the Hamiltonian of the particles is the same in each run. The player is allowed to do any measurement he wants and is finally required to guess the polarization direction. (The answer must consist of indicating a direction, i.e., the player is not allowed to say something like “with probability P1 the particles were polarized along....”) The score of the run is cos2 (α/2), where α is the angle between the real and guessed directions. The final score is the average of the scores obtained in each run. The aim is to obtain the maximal score. As it has been defined the score is a number between 0 and 1. If no measurement is performed, but a polarization direction is simply guessed at randomly, the score is 1/2. The improvement over 1/2 actually represents the “information” gain; scores less than 1/2 also correspond to a gain in information, but in this case the guessed direction is systematically opposite to that of the spins. The maximal score (1) corresponds to perfect knowledge of the direction of polarization. In this Letter it is shown that the maximum score obtainable is (N + 1)/(N + 2), which tends towards 1 as N tends to infinity, i.e., as expected for infinite ensembles the direction of polarization can be determined exactly. Clearly, the above “game” is only a particular way in which the problem of optimizing measurements on finite ensembles might be formulated. Peres and Wootters, for example, used the same rules, but a different score, inspired by information theory, and a different distribution of spin directions. This Letter is organized as follows. A general formalism describing an experiment is used to obtain the equations an optimal experiment must satisfy. Next we consider the game outlined above and obtain (N + 1)/(N + 2) as an upper bound on the score, and optimal experiments that attain this score are exhibited. Last, the conjecture of Peres and Wootters is proven when the system consists of two parallel spins. Following von Neumann [3], every measurement can be considered as having two stages. The first stage is the interaction between two quantum systems, the measured system and the measuring device; the second stage consists of “reading” the measuring device. Let us denote by {|i} an orthonormal basis of the Hilbert space of the measured system, and |ψ0 MD the initial state of the measuring device.

entire

December 28, 2004

358

13:56


Serge Massar and Sandu Popescu

The first stage of the measurement consists of the interaction between the measured system and the measuring device, |f |ψfi MD . (1) |i|ψ0 MD → f

If the measured system started in an arbitrary space |ϕ linearity implies |φ|ψ0 MD → i|φ|f |ψfi MD . (2) i,f

To allow for the most general measurements we impose no restrictions on the measuring device nor on the interaction with the measured system. The dimension of the Hilbert space of the measuring device is arbitrary and might be much larger than that of the measured system. The wave functions |ψfi MD are not necessarily normalized, nor orthogonal to each other. The only constraints they obey are ψfi |ψfi = δii , (3) f

which follow from the unitarity of the time evolution describing the interaction with the measured system and the normalization of |ψ0 MD . (To simplify notation, in this formula and throughout the text we drop the subscript MD whenever it is obvious that the state belongs to the Hilbert space of the measuring device.) The second stage of the experiment consists of reading the state of the measuring device. This is implemented by considering a complete set of orthogonal projectors {Pξ }. Different outcomes of the experiment correspond to finding the measuring device in the different eigenspaces of the projectors Pξ . The probability of the outcome ξ if the initial state were |φ is φ|i i|φMD ψfi |Pξ |ψfi MD . (4) P (φ, ξ) = i,i ,f

Here also, for the sake of generality, the number of possible outcomes ξ of the measurement is left arbitrary and can be larger than the dimension of the Hilbert space of the measured system. We stress, because of its importance to our purpose, the complete generality of the about formalism. It includes ideal measurements (as described in the postulates of quantum mechanics [3]) but also fuzzy measurements, repeated experiments on the same system, etc. For example, a postitiveoperator-valued measure (POVM) [2, 4] is described in our formalism by simply considering the ancilla as a part of the measuring device and letting

entire

December 28, 2004

13:56



359

the rest of the measuring device act on both the measured system and the ancilla. Upon finding the measuring device to be in the state ξ, some “information” is obtained about the state of the system. This information could be expressed as a function S(ξ, φ). The average value of S is 6 DφP (φ, ξ)S(φ, ξ), (5) S= ξ

where the sums run over the outcomes ξ of the experiment and the initial states φ of the system to be measured, with a measure Dφ corresponding to their distribution. The problem at hand is to maximize (5) with respect to the possible measurements and guessing strategies, while respecting the unitary relations (3). Below, this program will be carried out in detail in the case of parallel spins. Before proceeding we simplify the formalism by choosing the Pξ ’s to be one dimensional projection operators onto a basis {|eξ } of the Hilbert space of the measuring device. Indeed, by decomposing the original projectors as a sum of one dimensional projectors, that is, by a more accurate reading of the measuring device, the information obtained in the measurement can only increase. We now turn to the specific problem considered in the introduction. The system to be measured consists of N parallel spins polarized in a random direction, say (θ, φ). Denote this state |Nθ,φ = | ↑θ,φ . . . ↑θ,φ .

(6)

N

The Hilbert space of the N spins can be decomposed into a sum of subspaces having different total spin S with S = N/2, N/2 − 1, . . .. Since our system consists of N parallel spins, it will always belong to the subspace of highest spin so we have to specify the measuring interaction only for this subspace. A basis of this subspace is |m, m = −N/2, . . . , N/2, which is shorthand for |S = N/2, Sz = m. The unitary evolution of the spins plus measuring device is given by N

|m|ψ0 MD → |v = m

2

|f |ψfm MD ,

(7)

f =1

where {|f } is a complete base of the Hilbert space of N spins. The prob-

entire

December 28, 2004

13:56



360

ability to obtain the result ξ is N

2

N/2

P (Nθ,φ ; ξ) =

Nθ,φ |mψfm |eξ eξ |ψfm m |Nθ,φ .

(8)

m,m =−N/2 f =1

Upon finding the measuring device to be in the state ξ, one guesses a direction of polarization θξ , φξ and obtains a score S = S(θ, φ; θξ , φξ ) = cos2 (α/2), as explained in the introduction. We have finally arrived at the mathematical formulation of our problem. We have to maximize the average score SN , 6 sin θdθdφ P (Nθ,φ ; ξ)S(θ, φ; θξ , φξ ), (9) SN = 4π ξ

with the unitary constraints (written in the reading basis |eξ ) N

v |v m

m

2 = ψfm |eξ eξ |ψfm = δ mm . ξ

(10)

j=1

The variables of this problem are |ψfm which encode the measuring interaction, |eξ which encode the reading procedure, and θξ , φξ which encode the guessing strategy. It is worth noting that the final states of the measuring device |ψfm and the reading base vectors |eξ always appear together, via the scalar product ψfm |eξ , so we do not have to vary them independently. Clearly, the reason behind this is that, given a particular measuring interaction and reading procedure, one can always find a completely equivalent experiment by changing both the finite states of the measuring device and the way the result is read. Rather than work with the complete set of consrtaints (10), it is convenient to consider first only the constraint

N/2

N/2

v m |v m =

m=−N/2

m=−N/2 f

ψfm |eξ eξ |ψfm = N + 1,

(11)

ξ

which follows immediately from (10). The maximal value of the score obtained by using this single constraint equation (11) is larger or equal to the true maximum, obtained when all the constraints are considered. In our case the two maxima coincide. We shall first find the maximum of the reduced problem and then exhibit a solution of the complete problem that attains the same score.

entire

December 28, 2004

13:56



361

Upon adding to SN the constraint (11) multiplied by the Lagrange multiplier λ and varying with respect to ψfm |eξ considered as independent variables, one obtains the following linear equations: eξ |ψfm [Mmm (θξ , φξ ) − λδmm ] = 0, (12) m

where M

6 mm

(θξ , φξ ) =

sin θdθdφ Nθ,φ |mm |Nθ,φ S(θ, φ; θξ , φξ ). 4π

(13)

Upon multiplying the mth equation (12) by ψfm |eξ , summing over m, f , and ξ, and using (11), a concise expression for the external value of SN is found to be SN extremum = λ(N + 1). Equation (12) has a nontrivial solution if and only if λ is an eigenvalue of M (θξ , φξ ) (the trivial solutions correspond to eξ |ψfm = 0 for all m, implying that the outcome ξ is never realized). The spherical symmetry inherent to this problem can be used to show that the matrix M (θξ , φξ ) transforms according to the adjoint representation of SU(2): M (θξ , φξ ) = U (θξ , φξ )M (θξ = 0)U † (θξ , φξ ) where U (θξ , φξ ) is an element of the N + 1 dimensional irreducible unitary representation of SU(2) that realizes rotations of the spins, sending the +z direction onto the θξ , φξ direction. It follows that the eigenvalues of M (θξ , φξ ) are independent of |θξ , φξ . Taking θξ = 0 direct computation shows that M is diagonal and its largest eigenvalue is λ = 1/(N + 2). So for this reduced problem the corresponding maximal value of the average score is SN extremum = (N + 1)/(N + 2). We now exhibit optical experiments that attain the score SN = (N + 1)/(N + 2), thereby proving that this upper bound on the score can be realized. In the case N = 1, one experiment (among many) that attains the score of 2/3 is realized by measuring the projection of the spin along a given axis, say the z axis (a Stern-Gerlach experiment), and according to whether the spin is found to be polarized along the +z or −z direction, to guess that this is the direction along which it is polarized. In the case N = 2, one possible optimal experiment consists of the standard measurement of a nondegenerate operator that has the following four eigenstates: √ 3 1 |S + | ↑nˆ i ↑nˆ j i = 1, . . . , 4, (14) 2 2 î where |S is the singlet state, ↑nˆ i represents a spin polarized along the n direction, and the four directions n ˆ i are oriented towards the corners of

entire

December 28, 2004

13:56



362

a tetrahedron. [The phases used in the definition of ↑nˆ i are such that the four states (14) are orthogonal.] The only requirement of the corresponding eigenvalues is that they be different from each other so that the measurement can distinguish between all four eiganstates. If the spins are found to be in the ith eigenstate, the guessed direction is n ˆ i. In the above optimal measurements the Hilbert space of the measuring device is finite dimensional and the number of possible outcomes of the measurement is finite. By counting the number of parameters it can be shown that such finite dimensional optimal measurements exist in the general case (N spins) but we have not been able to construct one explicitly. However, allowing for an infinite set of possible outcomes and using the spherical symmetry inherent to our problem, a measurement that attains the optimal score SN extremum can be constructed [5]. The measuring device that gets correlated to the spins is a particle moving on the surface of a sphere. Reading the measuring device consists of measuring the position of the particle. Upon finding it to be located at θ, φ, one guesses that the spins were aligned along the θ, φ direction. Let the reading basis, corresponding to a particle localized at θ, φ, be denoted |eθ,φ , with the normalization eθφ |eθ φ = 4πδθθ δϕϕ / sin θ. The experiment is described by the unitary evolution equation (7) with m |ψf given by |ψfm =

√ 6 N +1 dθdφ sin θ √ UmN/2 (θ, φ)|eθφ , 4π 2N

(15)

where U (θ, φ) is the same unitary matrix as before. One readily verifies that the |ψfm obey the unitary relations and that the average score obtained by this measurement is (N + 1)/(N + 2). We now come to the crux of our Letter: Must an optimal measurement on an ensemble of parallel spins necessarily treat the ensemble as an entity, i.e., as a single composite system? We show below that there exist no optimal experiments consisting of separate measurements on each spin, even though the result of one measurement may be used to decide which measurement is to be performed next on the other spin. (For related work on optimizing separable measurements, see [1].) For simplicity we consider the case of two spins. One first makes an arbitrary measurement on the first spin, + | ↑1 |ψ0 MD1 → | ↑1 |φ+ ↑ MD1 + | ↓1 |φ↓ MD1 , − | ↓1 |ψ0 MD1 → | ↑1 |ψ↑ MD1 + | ↓1 |ψ↓− MD1 ,

(16)

entire

December 28, 2004

13:56



363

where MD1 denotes the first measuring device. The outcomes of this measurement are obtained by projecting the state of MD1 onto the reading base |eξ1 . According to the outcome ξ1 , different measurements are carried out on the second spin (i.e., the interaction of the second measuring device with the second spin is parametrized by ξ1 ), + | ↑2 |ψ0 MD2 → | ↑2 |φ+ ↑,ξ1 MD2 + | ↓2 |φ↓,ξ1 MD2 , − − | ↓2 |ψ0 MD2 → | ↑2 |ψ↑,ξ MD2 + | ↓2 |ψ↓,ξ MD2 , 1 1

(17)

where MD2 denotes the second measuring device. The outcomes of this measurement are obtained by projecting the state of MD2 onto the reading basis |gξ2 ,ξ1 ; the index ξ1 appears because the way the results of the second measurement are read may also depend on the outcomes ξ1 of the first measurement. Putting it all together one obtains | ↑1 | ↑2 |ϕ0 MD1 |ψ0 MD2 → |f 1 |f 2 eξ1 |ψf+ f =↑, ↓ f =↑, ↓

ξ1 ,ξ2

×gξ2 ,ξ1 |ϕ+ f ,ξ1 |eξ1 |gξ2 ,ξ1

(18)

and similarly for the other initial states | ↑1 | ↓2 , | ↓1 | ↑2 , | ↓1 | ↓2 . Equation (18) is a particular case of the general evolution (7), the measuring basis |eξ being replaced by the basis |eξ1 |gξ2 ,ξ1 . Indeed the two successive measurements considered here correspond in the general formalism to a single measuring device consisting of the two pieces MD1 and MD2, and the action of the human observer who “reads” the result of MD1 and decides accordingly what measurement to do next is replaced by MD2 automatically getting correlated to the final state of MD1, and tuning its interaction with the second spin accordingly. The unitary relations (10) are now replaced by the unitary relations obeyed by each measuring device separately: + + − − ∗ )∗ = 1, f,ξ1 af,ξ1 (af,ξ f,ξ1 af,ξ1 (af,ξ1 ) = 1, 1 (19) + − ∗ f,ξ1 af,ξ1 (af,ξ1 ) = 0,

+ + − − ∗ ∗ f ,ξ2 ,ξ1 ) = 1, f ,ξ2 bf ,ξ2 ,ξ1 (b f ,ξ2 bf ,ξ2 ,ξ1 (bf ,ξ2 ,ξ1 ) + − ∗ f ,ξ2 bf ,ξ2 ,ξ1 (bf ,ξ1 ,ξ1 ) = 0,

= 1,

(20)

where, as above, f , f =↑, ↓ and + + a+ b+ f,ξ1 = eξ1 |ψf , f ,ξ2 ,ξ1 = gξ2 ,ξ1 |ϕf ,ξ1 , − − − af,ξ1 = eξ1 |ψf , bf ,ξ2 ,ξ1 = gξ2 ,ξ1 |aϕ− f ,ξ1 .

(21)

entire

December 28, 2004

13:56

364



After completing these two measurements, one guesses a direction of polarization θξ1 ξ2 , φξ1 ξ2 which depends of course on both outcomes. If (18) is to describe an optimal experiment it must satisfy (12) with λ = 1/4 [since any optimal experiment on two parallel spins must necessarily also be an extremum of the reduced problem with (11) as the only constraint]. Explicitly Eq. (12) takes the form (dropping the indices f , f , ξ1 , ξ2 ) −2Sa+ b+ + CE(a+ b− + a− b+ ) = 0, 2CSE a b − (a+ b− + a− b+ ) + 2CSEa− b− = 0, SE ∗ (a+ b− + a− b+ ) − 2Ca− b− = 0, ∗ + +

(22)

which must be satisfied for all i, j, ξ1 , ξ2 . We have used the notation C = cos(θξ1 ξ2 /2), S = sin(θξ1 ξ2 /2) and E = eiφξ1 ξ2 . Equations (22) solve to yield a+ f,ξ1

a− f,ξ1

=

b+ f ,ξ2 ,ξ1 b− f ,ξ2 ,ξ1

=

CE . S

(23)

Upon inserting this relation into the unitary relations (20) a contradiction is readily obtained, thereby proving that experiments such as (18) cannot be optimal experiments. We have generalized this proof to the case where a finite number of measurements are carried out alternatively on the two spins. Whether an infinite number of such alternating measurements can reach the optimal score is still an open problem. It is a pleasure for us to thank P. Bieliavsky for very helpful discussions. References [1] K.R.W. Jones, Ann. Phys., (N.Y.) 207, 140, (1991). [2] C.W. Helstrom, Quantum Detection and Estimation Theory, Academic, New York, 74-83, 1976. [3] J. von Neumann, Mathematical Foundations of Quantum Machanics, Prinston Univ. Press, Princeton, 1955. [4] A. Peres, Found. Phys., 20, 1441 (1990); Nucl. Phys., B340, 333 (1990). [5] A. Peres, Quantum Theory: Concepts and Methods Kluwer Academic Press, Dordrecht, 386, 1993. [6] A. Peres and W.K. Wootters, Phys. Rev. Lett. 66, 1119 (1991).

entire

December 28, 2004

13:56


CHAPTER 23

Asymptotic Estimation Theory for a Finite-Dimensional Pure State Model Masahito Hayashi Abstract. The optimization of measurement for n samples of pure sates are studied. The error of the optimal measurement for n samples is asymptotically compared with the one of the maximum likelihood estimators from n data given by the optimal measurement for one sample.

1. Introduction Recently, there has been a rise in the necessity for studies about statistical estimation for the unknown state, related to the corresponding advance in measuring technologies in quantum optics. An investigation including both quantum theory and mathematical statistics is necessary for an essential understanding of quantum theory because it has statistical aspects [15, 17]. Therefore, it is indeed important to optimize the measuring process with respect to the estimation of the unknown state. Such research is known as quantum estimation, and was initiated by Helstrom in the late 1960s, originating in the optimization of the detecting process in optical communications [15]. In the classical statistical estimation, one searches for the most suitable estimator for which probability measure describes the objective probabilistic phenomenon. In quantum estimation, one searches for the most suitable measurement for which density operator describes the objective quantum state. Contained among important results are three estimation problems. The first is of the complex amplitude of coherent light in thermal noise and the second is of the expectation parameters of quantum Gaussian state. The former was studied by Yuen and Lax [25] and the latter by Holevo [17]. These studies discovered that heterodyning is the most suitable for the estimation of the complex amplitude of coherent light in thermal noise. This chapter is reprinted from J. Phys. A: Math. and Gen., 31, 4633–4655, 1998. Since Appendices consist of the technical proofs, they are omitted. They are available at quantph/9704041. 365

entire

December 28, 2004

366

13:56


Masahito Hayashi

The third is a formulation of the covariant measurement with respect to an action of a group. It was studied by Holevo [16, 17]. In the formulation, he established a quantum analogue of Hunt-Stein theorem. Quantum estimation, was first used in the evaluation of the estimation error of a single sample of the unknown state as it had advanced in connection with the optimization of the measuring process in optical communications. Thus, early studies were lacking in asymptotic aspects, i.e., there were little researches with respect to reducing the estimation error by quantum correlations between samples. Recently, studies concerning the estimation of the unknown state have attracted many physicists [4, 5, 18, 19]. Some of them were drawn by the variation of the measuring precision with respect to the number of samples of the unknown state [2, 24]. Nagaoka [23] studied, for the first time, asymptotic aspects of quantum estimation. He paid particular attention to the quantum correlations between samples of the unknown state, and studied the relation between the asymptotic estimation and the local detection of a one-parameter family of quantum states. In the early 1990s, Fujiwara and Nagaoka [8, 9, 10] studied the estimation problem for a multiparameter family consisting of pure states. They pioneered studies into the estimation problem of the complex amplitude of noiseless coherent light. However, there had been some studies with respect to that of coherent light in thermal noise. The research found that heterodyning is the most suitable for the estimation of the complex amplitude of noiseless coherent light. In 1996, Matsumoto [21] established a more general formulation of the estimation for a multiparameter family consisting of pure states. Moreover in 1991, Nagaoka [22] treated the estimation problem for two-parameter families of mixed states in a spin- 12 system, and in 1997 the author [12, 13] treated it for 3-parameter families of mixed states in a spin- 12 system. However, there are no asymptotic aspects in these works concerning multiparameter families. There is more need of this type of investigation into one- and multiparameter families. Can quantum estimation reduce the estimation error by using the quantum correlations between samples, under the preparation of sufficient samples of the unknown state? To answer this question, in this paper, we treat a family, consisting of all of pure states on a Hilbert space H ∗ under the ∗ Where

H denotes a finite-dimensional Hilbert space which corresponds to the physical system of interest.

entire

December 28, 2004

13:56


Asymptotic Estimation Theory for a Finite-Dimensional Pure State Model

367

preparation of n samples of the unknown state, with the estimation problem. In section 2, we use, as a tool, the composite system consisting of n samples as a single system. The quantum i.i.d. condition is introduced as the quantum counterpart of the independent and identical distributions condition (3). In section 3, we review Holevo’s result concerning covariant measurements which will be used in the following sections. In section 4, we apply Holevo’s result to the optimization of measurements on the composite system, which results in obtaining the most suitable measurement (theorem 6). We asymptotically calculate the estimation error by the optimal measurement in the sense of both the error mean square and large deviation (see (9), (10), (11), and (13)). The first term of the right-hand side of (10) is consistent with the value conjectured from the results by Fujiwara and Nagaoka [10] and Matsumoto [21]. However, the optimal measurement may be too difficult for modern technology to realize when using more than one sample. In section 5, we use this estimation problem under the following guidelines. The samples are divided into pairs consisting of a maximum of m samples. By measuring each pair with the optimal measurement in section 4, we create some data. The estimated value is given by manipulating these data. The restricted condition is called m-semiclassical (see (14)). We compare an m-semiclassical measurement with the optimal measurement of section 4, with respect to the estimation error under the preparation of a sufficient amount of samples. When we use the maximum likelihood estimator to manipulate the data, the error mean square of both asymptotically coincide in the first order (see (10) and (19)). However, when the radius of allowable errors is finite, the error of large deviation type in the latter is smaller than that in the former type (see (11) and (20)). Both coincide in the case of the maximum likelihood estimator under the limit where the radius goes to infinitesimal (13), (18). Can we asymptotically realize a small estimation error as the optimal measurements in section 4 has? It is, physically, sufficient to construct the optimal measurement for one sample. In section 5, we show how to construct it (see (25)). Most of the proofs of this paper are given in the appendices. In view of multiparameter families of mixed states in spin- 12 system, Hayashi [14] has discussed the same problem using Cramér-Rao type bound.

entire

December 28, 2004

13:56


Masahito Hayashi

368

2. Pure State n-I.I.D. Model In this section, we use the mathematical formulation of the estimation for pure states. Let k be the dimension of the Hilbert space H, and P(H) be the set of pure states on H. In quantum physics, the most general description of a quantum measurement is given by the mathematical concept of a positive operator-valued measure (POVM) [15, 17] on the system of state space. Generally, if Ω is measurable space, a measurement M satisfies the following M (B) = M (B)∗ M (B) ≥ 0 M (φ) = 0 M (Ω) = Id on H M (∪i Bi ) = M (Bi ) for Bi ∩ Bj = φ (i "= j)

∀B ⊂ Ω

i

{Bi } is countable subsets of Ω. In this paper, M(Ω, H) denotes the set of POVMs on H whose measurable set is Ω. A measurement M ∈ M(Ω, H) is said to be simple if M (B) is a projection for any Borel B ⊂ Ω. A measurement M is random if it is described as a convex combination of simple measurements. A random measurement M = i ai Mi (Mi is simple and ai > 0.) can be realized when every measurement Mi is taken with the probability ai . In this paper, we consider measurements whose measurable set is P(H) since it is known that the unknown state is included in P(H). Next, we define two distances charactering the homogeneous space P(H). Definition 1: Fubini-Study distance df s (which is the geodesic distance of the Fubini-Study metric) is defined as: ; π (1) cos df s (ρ, ρˆ) = tr ρˆ ρ 0 ≤ df s (ρ, ρˆ) ≤ . 2 The Bures’ distance db is defined in the usual way: ; ρ. (2) db (ρ, ρˆ) := 1 − tr ρˆ It is introduced by Bures [3] in a mathematical context. Let W (ρ, ρˆ) be a measure of deviation of the measured value ρˆ from the actual value ρ, then we have the following equivalent conditions. • W (ρ, ρˆ) = W (gρg ∗ , g ρˆg ∗ ) ∀ρ, ∀ˆ ρ ∈ P(H) ∀g ∈ SU(k). • There exists a function h on [0, 1] such that W (ρ, ρˆ) = h ◦ df s (ρ, ρˆ). It is natural to assume that a deviation measure W (ρ, ρˆ) is monotone increasing with respect to the Fubini-Study distance df s .

entire

December 28, 2004

13:56



369

If H1 , . . . , Hn are n Hilbert spaces which correspond to the physical systems, then their composite system is represented by the tensor Hilbert space H(n) := H1 ⊗ · · · ⊗ Hn = ⊗ni=1 Hi . Thus, a state on the composite system is denoted by a density operator ρ on H(n) . In particular if n element systems {Hi } of the composite system H(n) are independent of each other, there exists a density ρi on Hi such that ρ(n) = ρ1 ⊗ · · · ⊗ ρn = ⊗ni=1 ρi . The condition: H1 = · · · = Hn = H,

ρ1 = · · · = ρn = ρ

(3)

corresponds to the independent and identically distributed (i.i.d.) condition in the classical case. In this paper, we use this estimation problem under this condition (3) called the quantum i.i.d. condition. This condition means that identical n samples are independently prepared. The model {ρ(n) = ρ ⊗ · · · ⊗ ρ |ρ ∈ P(H)} is called n-i.i.d. model. As ρ is a pure n

state, H(n) and ρ(n) are simplified as follows. Letting ρ = |φφ| ∈ P(H), n (n) C D (n) (n) (n) we have ρ φ , φ = φ := φ ⊗ · · · ⊗ φ. Because all of the vectors φ(n) is included in n-times symmetric tensor space, for any measurement M ∈ M(Ω, H(n) ) on the n-times tensor space H(n) , the measurement ˜ ( dω) := P (n) M ( dω)P (n) ∈ M(Ω, Hs(n) ) on the n-times symmetric tenM Hs Hs (n) ˜ ( dω)ρ(n) for any ρ ∈ H, sor space Hs satisfies that tr M ( dω)ρ(n) = tr M (n)

where Hs denotes the n-times symmetric tensor space on H. Therefore, (n) all possible measurements can be regarded as elements of M(P(H), Hs ). (n) The mean error of the measurement Π ∈ M(P(H), Hs ) with respect to a deviation measure2 W (ρ, ρˆ), provided that the actual state is ρ, is W,(n) (Π) := P(H) W (ρ, ρˆ) tr(Π( dˆ ρ)ρ(n) ). In minimax approach equal to Dρ the maximum possible error with respect to a deviation measure W (ρ, ρˆ), W,(n) (Π) is minimized. DW,(n) (Π) := maxρ∈P(H) Dρ 3. Quantum Hunt-Stein Theorem In this section, the quantum Hunt-Stein theorem, established by Holevo [17, 16], is summarized. Let G be a compact transitive Lie group of all transformations on a compact parametric set Θ, and {Vg } a continuous unitary irreducible representation of G in a finite-dimensional Hilbert space H := Ck , and µ a σ-finite invariant measure on group G such that µ(G) = 1. In this section, we consider the following condition for a measurement. Definition 2: A measurement Π ∈ M(Θ, H ) is covariant with respect to

entire

December 28, 2004

370

13:56


Masahito Hayashi

{Vg } if Vg∗ Π(B)Vg = Π(Bg−1 ) for any g ∈ G and any Borel B ⊂ Θ, where Bg := {gθ|θ ∈ B}. M(Θ, V ) denotes the set of covariant measurements with respect to {Vg }. Covariant measurements are characterized by the following theorem. Theorem 3: The map V θ from the set S(H ) of densities on H to M(Θ, V ) is surjective for any θ ∈ Θ, where V θ (P ) is defined as 2 V θ (P )(B) := k {gθ∈B} Vg P Vg∗ µ( dg) for any B ∈ B(Θ) and P ∈ S(H ). In this section, we treat with the following condition for a family of states. Definition 4: The family is called covariant under the representation {Vg } of group G acting on Θ, if Sgθ = Vg Sθ Vg∗ for any g ∈ G and θ ∈ Θ. Assuming that the object is prepared in one of the states {Sθ |θ ∈ Θ} but the actual value of θ is unknown, then the difficulty is estimating this value as close as possible to a measurement on the object. We shall solve this problem by means of the quantum statistical decision theory. ˆ be a measure of deviation of the measured value θˆ from the Let W (θ, θ) ˆ is invariant: actual value θ. It is natural to assume that W (θ, θ) ˆ = W (gθ, g θ) ˆ ∀θ, ∀θˆ ∈ Θ, ∀g ∈ G. W (θ, θ)

(4)

The mean error of the measurement Π ∈ M(Θ, H ) with respect to a deˆ viation measure 2 W (θ, θ), provided that the actual state is Sθ , is equal to W,S ˆ ˆ θ ). Following the classical statistical deDθ (Π) := Θ W (θ, θ) tr(Π( dθ)S cision theory, we can form two functionals of DθW giving a total measure of precision of the measurement Π. In Bayes’ approach we take the mean of DθW with respect to a given prior distribution π( 2 dθ). The measurement minimizing the resulting functional DπW,S (Π) := Θ DθW,S (Π)π( dθ) is called Bayesian. This quantity represents the mean error in the situation where θ is a random parameter with known distribution π( dθ). In particular, as Θ, G are compact and ‘nothing is known’ about θ, it is natural to take for π( dθ) the ‘uniform’ distribution, i.e., normalized invariant measure ν( dθ) defined as ν(B) := µ({gθ ∈ B}). It is independent of the choice of θ ∈ Θ. In minimax approach the maximum possible error with respect to a ˆ DW,S (Π) := maxθ∈Θ DW,S (Π) is minimized. deviation measure W (θ, θ), θ The minimizing measurement is called minimax. Because G is compact, we shall show that in the covariant case the minima of Bayes and minimax criteria coincide and are achieved on a covari-

entire

December 28, 2004

13:56



371

ant measurement. We obtain the following quantum Hunt-Stein theorem [16, 17]. It is easy to prove the theorem. Theorem 5: For a covariant measurement Π ∈ M(Θ, V ), we obtain DθW,S (Π) = DνW,S (Π) = DW,S (Π). For Π ∈ M(Θ, H ), denote Πg (B) := Vg Π(Bg )Vg∗2 for B ∈ B(Θ). In¯ troducing the “averaged” measurement Π(B) := G Πg−1 (B)µ( dg), we 2 W,S ¯ W,S W,S have Dν (Π) = G Dν (Πg−1 )µ( dg) = Dν (Π). Thus, DW,S (Π) ≥ ¯ In this case, minimax approach and Bayes’ approach DνW,S (Π) = DνW,S (Π). with respect to ν( dθ) are2 equivalent. Therefore we minimize the following ˆ (θ)P , where value DθW,S ◦ V θ (P ) = k G W (θ, gθ) tr Sθ Vg P Vg∗ µ( dg) = tr W 6 6 ˆ ˆν( dθ). ˆ ˆ (θ) := k W W (θ, gθ)V ∗ Sθ Vg µ( dg) = k W (θ, θ)S g

G

Θ

θ

Thus, it is sufficient to consider the following minimization minP ∈S(H) ˆ (θ)P = minP ∈P(H ) tr W ˆ (θ)P . tr W 4. Optimal Measurement in Pure State n-I.I.D. Model In this section we apply the theory of section 3 to the problem of section 2. (n) We let Θ := P(H), H := Hs , G := SU(k), and Sρ := ρ(n) . Then, the invariant measure ν on P(H) is equivalent to the measure defined by the volume bundle induced by the Fubini-Study metric. We let the action (n) representation of the natural {Vg } of G = SU(k) to Hs be the tensor & %n+k−1 representation. In this case, we have k = k−1 . Theorem 6: If a deviation measure W (ρ, ρˆ) is monotone increasing with respect to the Fubini-Study distance df s , we find that minP0 ∈P(H(n) ) s ˆ (ρ)P0 = tr W ˆ (ρ)ρ(n) . tr W For a proof see appendix A. Thus, V ρ (ρ(n) ) is the optimal measurement with respect to a deviation measure W (ρ, ρˆ). The optimal measurement is (n) independent of the choice of ρ and W since V ρ0 (ρ0 ) = V ρ (ρ(n) ). This ρ) := optimal is denoted by Πn and is described as Πn ( dˆ %n+k−1& measurement (n) ˆ ν( dˆ ρ). Under the following chart (6), the optimal measurements k−1 ρ are denoted as: =< n + k − 1 Πn ( dθ) = (5) φ(θ)(n) φ(θ)(n) ν( dθ) k−1

entire

December 28, 2004

13:56


Masahito Hayashi

372

for θ ∈ {θ ∈ R2k−2 |θ1 , . . . , θk−1 ∈ [0, π/2], θk , . . . , θ2k−2 ∈ [0, 2π)}, where we defined as follows:   cos θ1   eiθk sin θ1 cos θ2     iθk+1 e sin θ1 sin θ2 cos θ3    . (6) φ(θ) :=  ..    .  iθ2k−3  e sin θ1 sin θ2 sin θ3 · · · sin θk−2 cos θk−1  iθ2k−2 e sin θ1 sin θ2 sin θ3 · · · sin θk−2 sin θk−1 The invariant measures ν( dθ) described above is from p. 31 [11] ν( dθ) =

(k − 1)! sin2k−3 θ1 sin2k−5 θ2 · · · sin θk−1 cos θ1 cos θ2 · · · π k−1 · · · cos θk−1 dθ1 dθ2 · · · dθ2k−2 . (7)

Lemma 7: If the deviation measure W is characterized as W (ρ, ρˆ) = h ◦ df s (ρ, ρˆ), we can describe the maximum possible error of the opti& 2 π2 % 2n+1 θ cal measurement Πn as DW,(n) (Πn ) = 2(k − 1) n+k−1 k−1 0 h(θ) cos 2k−3 θ dθ. sin For a proof, see appendix B. Next, we asymptotically calculate the error of the optimal measurements Πn in the third order. Theorem 8: When the deviation measure W is described as W = dγb , we can asymptotically calculate the maximum possible error of the optimal measurement as: γ

γ

lim Ddb ,(n) (Πn )n 2 =

n→∞

Γ(k − 1 + γ/2) . Γ(k − 1)

(8)

Specially in the case of γ = 2, we have 2

Ddb ,(n) (Πn )n =

i ∞ (k − 1)n k = (k − 1) → k − 1 as n → ∞. (9) − n+k n i=0

When the deviation measure is defined by the square of the Fubini-Study distance, we can asymptotically calculate the maximum possible error of the optimal measurement as: 2 1 23k − 7 1 2 Ddf s ,(n) (Πn )n ∼ = (k − 1) − k(k − 1) + k(k − 1) 3 n 45 n2 as n → ∞.

(10)

entire

December 28, 2004

13:56



373

The error of the sequence {Πn }∞ n=1 of the optimal measurements can be calculated in the sense of large deviation as:

(n) 1 log PrρΠn {ρˆ ∈ P(H)|df s (ρ, ρˆ) ≥ ε} n log n ∼ = log cos2 ε + (k − 2) n 1 + (− log(k − 2)! + 2(k − 2) log(sin ε) − 2 log(cos ε)) n 2 1 k −k−2 2 + (k − 2) cot ε as n → ∞, (11) + 2 n2 where PrSM B denotes the probability of B with respect to the probability measure tr(M ( dω)S) for a Borel B ⊂ Ω, a measurement M ∈ M(Ω, H ) and a state S ∈ S(H ). For a proof, see appendix C. The first term of the right hand of (11) coincide with the logarithm of the fidelity [20]. In this paper, ε in equations (11) is called admissible radius. Since log cos2 ε = −1 (12) lim ε→0 ε2 we obtain the following large deviation approximation

(n) 1 lim lim 2 log PrρΠn {ρˆ ∈ P(H)|df s (ρ, ρˆ) ≥ ε} = −1. (13) ε→0 n→∞ ε n 5. Semiclassical Measurement In this section, we consider measurements which allow the quantum correlation between finite samples only. A measurement M on H(nm) is called m-semiclassical if there exists an estimator T on the probability space P(H) × · · · × P(H) whose range is P(H) such that n

6

M (B) = T −1 (B)

Πm ( dρ1 ) ⊗ · · · ⊗ Πm ( dρn ) ∀B ⊂ P(H).

(14)

n

We compare the error between m-semiclassical measurements and the optimal measurement Πnm for nm samples of the unknown state as the equations (10), (11), and (13). In doing this comparison, we bear in mind asymptotic estimation theory in classical statistics. In classical statistics, it is assumed that the sequence of estimators satisfies the consistency.

entire

December 28, 2004

13:56


Masahito Hayashi

374

Definition 9: A sequence {T (n) }∞ n=1 of estimators on a probability space Ω is called consistent with respect to a family {pθ |θ ∈ Θ} of probability distributions on Ω, if it satisfies the condition (15), where every T (n) is a probability variable on the probability space Ω × · · · × Ω whose domain n

is Θ (n)

pθ {dJ (T (n) , θ) > ε} → 0

as n → ∞

∀θ ∈ Θ, ∀ε > 0

(15)

where dJ denotes the geodesic distance defined by the Fisher information (n) metric and pθ denotes the probability measure pθ × · · · × pθ on the prob n

ability space Ω × · · · × Ω. n

It is well known that the following theorem establishes under the preceding consistency [1, 6, 7]. Theorem 10: If a sequence {T (n) }∞ n=1 of estimators is a consistent estimator with respect to a family {pθ |θ ∈ Θ} of probability distributions on a probability space Ω which satisfies some regularity, then we have the following inequalities 6 6 6

(n) · · · d2J T (n) (x1 , x2 , . . . , xn ), θ pθ ( dx1 , . . . , dxn ) ≥ dim Θ lim n n→∞ n

(16)

1 (n) lim log pθ {D(pT (n) pθ ) ≥ ε} ≥ −ε (17) n→∞ n

1 1 (n) lim lim log pθ {dJ (T (n) , θ) ≥ ε} ≥ − (18) ε→0 n→∞ ε2 n 2 where D(p q) denotes the information divergence of a probability distribution q 2with respect to another probability distribution p defined by D(p q) := Ω (log p(ω) − log q(ω)) p(ω) dω. Under some regularity conditions, the lower bounds of (16) and (18) can be attained by the maximum likelihood estimator when the family {pθ |θ ∈ Θ} is exponential, but generally cannot be attained. For the comparison, we apply theorem 10 to the family of distributions ρ)ρ(m) |ρ ∈ P(H)} given by the measurement Πm and the fam{tr Πm ( dˆ ily of states {ρ(m) |ρ ∈ P(H)}. We consider the sequence of measurements (n) ∞ }n=1 on {T(n,m) }∞ n=1 which corresponds to the consistent estimator {T (m)

the family of distributions {tr Πm ( dˆ ρ)ρ1 |ρ ∈ P(H)}, where T(n,m) is the

entire

December 28, 2004

13:56



375

measurement on H(nm) defined by the estimator T (n) and n data given by the measurement Πm ⊗ · · · ⊗ Πm and the state ρ(nm) . From the sym n

metry of P(H) and Πm , the information divergence of a probability mea(m) ρ)ρ2 is desure with respect to another a probability measure tr Πm ( dˆ termined by the the Fubini-Study distance ε between ρ1 and ρ2 . Thus, the divergence can be denoted by Dm (ε). From lemma 11, the geodesic distance dΠm with respect to Fisher information metric in√the family of ρ)ρ(m) |ρ ∈ P(H)} is given by dΠm = 2mdf s . Since distributions {tr Πm ( dˆ dim P(H) = 2(k − 1), we have the following inequalities: 2

lim nmDdf s ,(nm) (T(n,m) ) = 6 d2f s (ρ, ρˆ) tr(T(n,m) ( dˆ ρ)ρ(nm) ) ≥ k − 1 lim max nm

n→∞

n→∞ ρ∈P(H)

(19)

P(H)

(nm) Dm (ε) 1 log PrρT(n,m) {ρˆ ∈ P(H)|df s (ρ, ρˆ) ≥ ε} ≥ − (20) n→∞ nm m (nm) 1 lim lim 2 log PrρT(n,m) {ρˆ ∈ P(H)|df s (ρ, ρˆ) ≥ ε} ≥ −1. (21) ε→0 n→∞ ε nm The lower bound of (19) is consistent with the first term of the right-hand side of (10) and the lower bound of (21) is consistent with the right-hand side of (13). When the sequence of measurements {T (n) }∞ n=1 corresponds to the maximum likelihood estimator, the lower bounds (19) and (21) can be attained. We have the following lemma concerning the comparison of the lower bound − Dmm(ε) of (20) and the first term 2 log cos ε of the right-hand side of (11).

lim

Lemma 11: We can calculate the divergence Dm (ε) and the distance dΠm as: m % & Dm (ε) sin2i ε = → − log 1 − sin2 ε = − log cos2 ε as m → ∞ (22) m i i=1 √ (23) dΠm = 2mdf s . Therefore,

Dm (ε) m

is monotone increasing with respect to m.

For a proof, see appendix D. (22) infers that −m log cos2 ε − Dm (ε) → 0 as ε → 0 (24) mε2m which means that the first term on the right-hand side of (11) cannot be attained by a semiclassical measurement. However, the first term of 0
0 if ρ "= σ D(ρ σ) (36) = 0 if ρ = σ.

entire

December 28, 2004

13:56


On the Relation between Kullback Divergence and Fisher Information

409

It also satisfies a property corresponding to the monotonicity of the Kullback divergence. Here the monotonicity of the Kullback divergence means that for an arbitrary probability distributions p, q on a set X and for an arbitrary mapping (random variable) f : X → Y we have D(p q) ≥ D(pf

qf ), where pf and qf respectively denote the probability distributions of f with respect to p and q. In order to formulate the general monotonicity of quantum relative entropy it is necessary to introduce the notion of operator algebras and states on them (see [15]), but for the present purpose we need only the special form of monotonicity that for any measurement Π = {πi } we have def

D(ρ σ) ≥ DΠ (ρ σ) = D(p q)

(37)

where def

p(i) = Tr[ρπi ],

def

q(i) = Tr[σπi ].

Here DΠ (ρ σ) represents the Kullback divergence for the probability distributions with respect to the measurement Π. In the case when ρσ = σρ we have the equality in (37) by using a Π which diagonalizes ρ and σ simultaneously. In the other case, however, the equality cannot hold: i.e., D(ρ σ) > sup DΠ (ρ σ) if Π

ρσ "= σρ.

(38)

Let us now explain the main result of Hiai and Petz[8] claiming that the inequality in (38) can be replaced with the equality asymptotically. For a natural number n, let the tensor product of n copies of ρ and σ def Gn def Gn ρ and σ (n) = σ respectively. These are be denoted by ρ(n) = def Gn density operators on the tensor-product Hilbert space H(n) = H (the composite space of n copies of H), and correspond to the classical i.i.d. assumption. It is then easy to show that D(ρ(n) σ (n) ) = nD(ρ σ).

(39)

Moreover, it follows from the monotonicity that for an arbitrary measurement Π(n) on H(n) D(ρ(n) σ (n) ) ≥ DΠ(n) (ρ(n) σ (n) ),

(40)

1 sup D (n) (ρ(n) σ (n) ). n Π(n) Π

(41)

and hence D(ρ σ) ≥

entire

December 28, 2004

13:56


Hiroshi Nagaoka

410

Now, letting n → ∞ we have ([8]) D(ρ σ) = lim

n→∞

1 sup D (n) (ρ(n) σ (n) ). n Π(n) Π

(42)

Remark 1: In the above statement, Π(n) is a set of projections corresponding to an arbitrary orthogonal decomposition of the tensor-product space H(n) and includes the special case when Π(n) is represented using n measurements Π1 = {π1,i }, Π2 = {π2,i }, . . ., Πn = {πn,i } on H as n

(n) (n) where πi1 i2 ···in = Π(n) = πi1 i2 ···in πk,ik , (43) k=1

which is denoted by Π(n) =

n

Πk .

(44)

k=1

This type of measurement is physically realized by independently applying measurements Π1 , . . . , Πn one by one to the n systems. A general Π(n) , on the other hand, is a measurement in which the totality of the n systems is considered as a single system, and includes cases in which quantum mechanical interactions among the element systems are utilized. If we restrict the range of sup in (41) to those Π(n) having the form (44), it turns out that the RHS is supΠ DΠ (ρ σ), being independent of n, and that (42) does not hold. Furthermore, Hiai and Petz [8] combined (42) with the classical Stein’s lemma to obtain the result mentioned below. Let us consider the hypothesis Gn H with testing problem about states on the quantum system H(n) = G n ρ” and H1 : “the the two hypotheses H0 : “the true state is ρ(n) = Gn σ”. A test in this case is defined as an observable true state is σ (n) = ϕ(n) on H(n) taking values in {0, 1}. Such a ϕ(n) is a Hermitian operator with its eigenvalues lying in {0, 1}, which means that it is a projection operator, and {ϕ(n) , I − ϕ(n) } forms a binary measurement on H(n) . The error probabilities of the 1st and 2nd kinds of the test ϕ(n) are respectively represented as α(ϕ(n) ) = Tr[ρ(n) ϕ(n) ], (n)

β(ϕ

) = 1 − Tr[σ

(n)

(45) (n)

ϕ

].

(46)

Letting for an arbitrary ε > 0 def

β∗ (n; ε) = inf {β(ϕ(n) ) | α(ϕ(n) ) ≤ ε}

(47)

entire

December 28, 2004

13:56



411

as in (9), we have [8]: 1 log β∗ (n; ε) ≤ −D(ρ σ), n 1 1 lim log β∗ (n; ε) ≥ − D(ρ σ). 1−ε n→∞ n lim

(48)

n→∞

(49)

It should be noted that these inequalities are a little weaker in comparison with the equation (10) in the classical case and cannot be regarded as a final result. 5. The Symmetric Logarithmic Derivative and the Quantum Fisher Information The definition (5) (6) of the Fisher information based on the symmetric logarithmic derivative was introduced by C. W. Helstrom [7] to formulate a version of Cramér-Rao inequality for parameter estimation with respect to a quantum statistical model. In this section we explain some basic results concerning parameter estimation and the Fisher information for a 1-parameter quantum statistical model {ρθ ; θ ∈ Θ (⊂ R)}, where appropriate regularity conditions on such matters as the smoothness of θ )→ ρθ are tacitly assumed. We begin with stating a result corresponding to the classical CramérRao inequality (11) in the case when n = 1. Generally, an estimator for the parameter θ is represented by an observable (i.e. a Hermitian operator) T on H. If it satisfies the unbiasedness: def

Eθ [T ] = Tr[ρθ T ] = θ

(∀θ ∈ Θ),

(50)

then we have the following quantum Cramér-Rao inequality: Eθ [(T − θ)2 ] = Tr[ρθ (T − θI)2 ] ≥ 1/J(θ),

(51)

where J(θ) is the (quantum) Fisher information defined by (5) and (6). G (n) When we consider ρθ = n ρθ instead of ρθ , the corresponding symmetric (n) logarithmic derivative Lθ is represented as (n)

Lθ

= Lθ ⊗ I ⊗ · · · ⊗ I + I ⊗ Lθ ⊗ · · · ⊗ I + · · · + I ⊗ I ⊗ · · · ⊗ Lθ . (n)

It follows that the Fisher information J (n) (θ) of the model {ρθ } is J (n) (θ) = nJ(θ),

(52)

and hence we have an inequality similar to (11). (In this case an estimator T (n) is a Hermitian operator on H(n) .) In addition, the asymptotic CramérRao inequality similar to (12) also holds.

entire

December 28, 2004

412

13:56


Hiroshi Nagaoka

As with the property (37) of the relative entropy, it holds for the Fisher information that for any measurement Π = {πi } 3 2 4 d def def log pθ J(θ) ≥ JΠ (θ) = Eθ , where pθ (i) = Tr[ρθ πi ]. (53) dθ In contrast to (38), however, we always have [10] J(θ) = max JΠ (θ). Π

(54)

Here a Π attaining the max is given by the measurement of the observable represented by the symmetric logarithmic derivative Lθ (i.e. the orthogonal projections onto the eigenspaces of Lθ ). Note that this measurement depends on θ in general. 6. Relation between the Relative Entropy and the Quantum Fisher Information In this section we show that a relation similar to (3) does not generally hold in the quantum case, and derive a result to understand the implication of this fact from an estimation-theoretical viewpoint. Since D(ρθ+∆θ ρθ ) = O((∆θ)2 ) holds from (36), we can define a ˜ nonnegative quantity J(θ) by 1˜ 2 + o((∆θ)2 ). (55) D(ρθ+∆θ ρθ ) = J(θ)(∆θ) 2 This is also represented as 7 2 8 7 2 8 7 2 8 ∂ ∂ ∂ ˜ J(θ) = D(ρξ ρη ) D(ρξ ρθ ) = D(ρθ ρη ) =− . ∂ξ 2 ∂η 2 ∂ξ∂η ξ=θ η=θ ξ=η=θ For any measurement Π on H we have D(ρθ+∆θ ρθ ) ≥ DΠ (ρθ+∆θ ρθ ) ˜ by the monotonicity, which leads to J(θ) ≥ JΠ (θ) by (55) and (3). Hence we obtain the following inequality from (54): ˜ ≥ J(θ). J(θ)

(56)

This inequality can be also verified by a direct calculation as follows. First, ρθ has the spectral decomposition: ρθ ψj (θ) = λj (θ)ψj (θ),

(57)

where {ψj (θ)} are chosen to form an orthogonal normal basis of H for every θ. That is, ψj (θ)|ψk (θ) = δjk ,

(58)

entire

December 28, 2004

13:56



413

where ·|· denotes the inner product on H. Using the Dirac notation we have λj (θ) |ψj (θ)ψj (θ)|. (59) ρθ = j

It then follows from (28) (29) that the eigenvalues {λj (θ)} are nonnegative real numbers and satisfy j λj (θ) = 1. Note that there may be some equal values in {λj (θ)}. We further assume that λj (θ) and ψj (θ) are smooth functions of θ. In this situation, with the aid of log ρθ = log λj (θ) |ψj (θ)ψj (θ)| j

and Lθ =

2ψj (θ)|(dρθ /dθ)ψk (θ) j

k

λj (θ) + λk (θ)

|ψj (θ)ψk (θ)|,

we can show the following relations: d λj 2 ˜ λj { log λj } + 2 λj log J(θ) = ajk dθ λk j j k λj − λk 2 d λj { log λj }2 + 4 λj ajk , J(θ) = dθ λj + λk j j

(60)

(61)

k

where ajk

2 dψj |ψk . = dθ

def

(62)

Noting that ajk = akj follows from (58), inequality (56) can be derived from (60) and (61). We also see that the equality does not generally hold in (56). The results given below are considered to correspond to (15)-(18) in the classical theory, which enable us to understand the meaning of inequality (56) from an estimation-theoretical viewpoint. From here on we treat sequences {T (n) } of observables T (n) on H(n) which are regarded as (sequences of) estimators. Then the probability distribution of T (n) with Gn (n) ρθ is determined. When it holds for this respect to the state ρθ = distribution that lim Pθ {|T (n) − θ| ≥ ε} = 0,

n→∞

we call {T (n) } a consistent estimator.

(∀θ ∈ Θ, ∀ε > 0)

(63)

entire

December 28, 2004

13:56


Hiroshi Nagaoka

414

Theorem 2: For ∀θ ∈ Θ and ∀ε > 0, 1 log Pθ {T (n) ≥ θ + ε} = −D(ρθ+ε ρθ ), n→∞ n 1 log Pθ {T (n) ≤ θ − ε} = −D(ρθ−ε ρθ ), inf lim (n) {T } n→∞ n inf

{T (n) }

lim

(64) (65)

where the inf’s are taken over all consistent estimators. From Theorem 2 and (55) we obtain the following. Corollary 3: For ∀θ ∈ Θ, 1 1˜ log Pθ {T (n) ≥ θ + ε} = − J(θ), ε2 n 2 1 1 ˜ lim inf lim 2 log Pθ {T (n) ≤ θ − ε} = − J(θ). ε↓0 {T (n) } n→∞ ε n 2 lim inf

lim

ε↓0 {T (n) } n→∞

(66) (67)

Theorem 4: For ∀θ ∈ Θ, 1 1 log Pθ {T (n) ≥ θ + ε} = − J(θ), ε2 n 2 1 1 log Pθ {T (n) ≤ θ − ε} = − J(θ), inf lim lim 2 {T (n) } ε↓0 n→∞ ε2 n inf lim lim

{T (n) } ε↓0 n→∞

(68) (69)

where the inf’s are tentatively considered to be taken over all consistent estimators {T (n)}, whereas there remains a certain problem as will be mentioned in the remark after the proof. Proof of Theorem 2: Here we prove (64) only, with ‘LHS ≥ RHS’ shown first. Let {T (n) } be an arbitrary consistent estimator, and let def

pn = Pθ {T (n) ≥ θ + ε},

def

qn = Pθ {T (n) ≥ θ + ε}

for arbitrary ε > 0 and θ > θ + ε. From the consistency of {T (n) } we have lim pn = 0,

n→∞

lim qn = 1.

n→∞

(70)

In addition, it follows from the monotonicity of the Kullback divergence that (n)

(n)

qn 1 − qn + (1 − qn ) log pn 1 − pn = −qn log pn − (1 − qn ) log(1 − pn ) − h(qn ), (71)

DT (n) (ρθ ρθ ) ≥ qn log

entire

December 28, 2004

13:56



(n)

415

(n)

where DT (n) (ρθ ρθ ) denotes the Kullback divergence for the probability (n)

def

(n)

distributions of T (n) with respect to the states ρθ and ρθ , and h(qn ) = −qn log qn − (1 − qn ) log(1 − qn ). Noting that (71) leads to

1 (n) (n) DT (n) (ρθ ρθ ) + (1 − qn ) log(1 − qn ) + h(qn ) , log pn ≥ − qn we obtain from (70) lim n→∞

1 1 (n) (n) log pn ≥ − lim DT (n) (ρθ ρθ ). n→∞ n n

(72)

Now, the monotonicity of the relative entropy yields (n)

(n)

(n)

(n)

DT (n) (ρθ ρθ ) ≤ D(ρθ ρθ ) = nD(ρθ ρθ ), and therefore from the above inequality we have 1 log pn ≥ −D(ρθ ρθ ). n→∞ n lim

Since θ is arbitrary within θ > θ + ε, we obtain 1 log pn ≥ −D(ρθ+ε ρθ ). n n→∞ lim

‘LHS ≥ RHS’ in (64) has thus been proved. We next proceed to show ‘LHS ≤ RHS’. Fix θ and ε > 0 arbitrarily. Then for any δ > 0 it follows from (42) that there exist a natural number (l) l and a measurement Π(l) = {πi } on H(l) satisfying 1 1 (l) (l) (l) (l) D (l) (ρθ+ε ρθ ) = D(pθ+ε pθ ) ≥ D(ρθ+ε ρθ ) − δ, l Π l (l)

(73) (l)

where pθ denotes the probability distribution defined by pθ (i) = (l) (l) Tr[ρθ πi ]. Using this measurement, let us construct for each n ≥ l an observable T (n) on H(n) in the following way. First, let k = kn and j = jn be natural numbers determined by n = lk +j, 0 ≤ j ≤ l−1. Regarding H(n) as the composite system of k copies of H(l) and one H(j) , the operation to perform the measurement Π(l) to each H(l) and no measurement to H(j) (n) defines a measurement Π(n) on H(n) . That is, Π(n) = {πi1 ···ik } is defined by (n)

(l)

(l)

πi1 ···ik = πi1 ⊗ · · · ⊗ πik ⊗ I (j) . (l)

(74)

Concerning the statistical model {pθ } consisting of the probability distri(l) butions pθ , let t(k) = t(k) (i1 , . . . , ik ) be an estimator for θ, based on k data

entire

December 28, 2004

13:56


Hiroshi Nagaoka

416

(i1 , . . . , ik ), which satisfies the consistency and attains the min in (17) in the limit of k → ∞, and define def (n) t(k) (i1 , . . . , ik ) πi1 ···ik . (75) T (n) = i1 ···ik

(n)

In other words, T represents the process of computing t(k) (i1 , . . . , ik ), as an estimate of θ, from the measurement result (i1 , . . . , ik ) of the measurement Π(n) . Owing to the property of t(k) , it turns out that T (n) is a consistent estimator and satisfies 1 1 log Pθ {T (n) ≥ θ + ε} = lim log Pθ {t(kn ) ≥ θ + ε} lim n→∞ n n→∞ lkn + jn 1 1 1 (l) (l) lim log Pθ {t(k) ≥ θ + ε} = − D(pθ+ε pθ ). (76) = l k→∞ k l Consequently, from (73) we have 1 log Pθ {T (n) ≥ θ + ε} ≤ −D(ρθ+ε ρθ ) + δ. n n→∞ lim

(77)

Since δ > 0 is arbitrary, ‘LHS ≤ RHS’ in (64) has been shown. Proof of Theorem 4: We prove (68) only, beginning with showing ‘LHS ≥ RHS’. As was shown in (72), for any consistent estimator {T (n)} and ∀ε > 0, ∀θ > θ + ε, we have 1 1 (n) (n) log Pθ {T (n) ≥ θ + ε} ≥ − lim DT (n) (ρθ ρθ ), lim n→∞ n n→∞ n which yields 1 1 (n) (n) lim lim 2 log Pθ {T (n) ≥ θ + ε} ≥ − lim lim 2 DT (n) (ρθ+ε+ε2 ρθ ) n→∞ ε↓0 ε n ε n n→∞ ε↓0 1 1 (n) 1 (∗) (n) (n) = − lim lim 2 DT (n) (ρθ+ε+ε2 ρθ ) = − lim JT (n) (θ). (78) n→∞ ε↓0 ε n 2 n→∞ n (n)

(See the remark below for (∗).) Here JT (n) (θ) denotes the Fisher information of the statistical model consisting of the probability distributions of T (n) (n) with respect to the states ρθ , and the last equality follows from (3). Since by (52) and (53) (n)

JT (n) (θ) ≤ J (n) (θ) = nJ(θ), ‘LHS ≥ RHS’ in (68) is proved by (78). We next show ‘LHS ≤ RHS’. For an arbitrarily fixed θ, let Π = {πi } be a measurement on H which attains the max in (54); i.e., J(θ) = JΠ (θ).

(79)

entire

December 28, 2004

13:56



417

Let t(n) = t(n) (i1 , . . . , in ) be a consistent estimator (e.g. the maximum likelihood estimator) for the model {pθ } consisting of the probability distribudef

tions pθ (i) = Tr[ρθ πi ] which attains the lower bound of (15) (− 12 JΠ (θ) in the present case), and define def T (n) = t(n) (i1 , . . . , in ) πi1 ⊗ · · · ⊗ πin . (80) i1 ···in

is an observable on H(n) and satisfies 1 1 1 lim lim 2 log Pθ {T (n) ≥ θ + ε} = − JΠ (θ) = − J(θ). 2 2 ε↓0 n→∞ ε n

Then T

(n)

(81)

‘LHS ≤ RHS’ has thus been proved. Remark 5: In the proof above we have changed the order of limn→∞ and limε↓0 at (∗) in (78), but its validity is not necessarily clear. Actually we need to investigate whether the validity follows from the consistency of T (n) or otherwise what kind of assumption is necessary for the validity. The main point is, however, whether inequality (78) itself, i.e. 1 1 (n) 1 log Pθ {T (n) ≥ θ + ε} ≥ − lim JT (n) (θ), 2n n→∞ ε 2 n ε↓0 n→∞

lim lim

is valid or not. So let us tentatively consider the set C of consistent estimators satisfying this inequality and replace inf {T (n) } with inf {T (n) }∈C in (64)-(69) all at once. Then it is clear that ‘LHS ≥ RHS’ holds in these equations, but in fact ‘LHS ≤ RHS’ also holds. That is, the inf’s in (64)-(69) are attained in the range {T (n) } ∈ C. The reason is as follows. In the classical case, we can show that for a consistent estimator satisfying a certain regularity condition we always have lim lim

ε↓0 n→∞

1 1 1 (n) log Pθ {T (n) ≥ θ + ε} = − lim JT (n) (θ). ε2 n 2 n→∞ n

(82)

The estimators (75) and (80) constructed to show ‘LHS ≤ RHS’ in the proofs of (64) and (68) are both directly derived from classical estimators and satisfy (82). Therefore these estimators belong to C. 7. Discussions Comparing (66) (67) with (68) (69), the difference between them lies in the positions of limε↓0 and inf {T (n) } , which immediately implies inequality (56). The results may be given an intuitive explanation as follows; equations (66) and (68) are both evaluating error probabilities in discriminating θ and θ+ε

entire

December 28, 2004

13:56


Hiroshi Nagaoka

418

for infinitesimally small ε > 0, but they differ in that in (66) optimization (of {T (n) }) is made for each ε, while in (68) a kind of ‘uniform’ optimization for all infinitesimal ε > 0 is considered, so the condition of the latter is more restrictive than that of the former to yield a larger error probability. In the classical case, we can also consider both kinds of optimizations, but it turns out that they coincide with each other owing to (3). In the quantum case, on the other hand, they differ in general. The difference takes an extreme form in the case when the elements ρθ of the model are pure states. We say that ρθ is a pure state when its rank is 1, which is characterized by the condition ρ2θ = ρθ (under (28) and (29)). In this case there is only one 1 in the eigenvalues {λj (θ)} of ρθ with the rest being 0. Hence it follows from ˜ (60) and (61) that J(θ) < ∞ = J(θ). There is also a remarkable difference between the substance of the inf’s in (66) (67) and that in (68) (69). As mentioned in Remark 1 in section 4, if we only consider measurements on H(n) of the form given in (44), then (42) breaks down and so do (64)-(67). On the other hand, in the proof of Th. 2 the estimator T (n) in (80) is constructed from the measurement Gn Π, where Π is a measurement on H satisfying (79), and this Π(n) = T (n) attains the inf in (68). Therefore, the result remains the same in (68) and (69) even if we restrict ourselves to measurements on H(n) of the form (44). Corresponding to (13) and (14), it is obvious from (66)-(69) that 1 1˜ log Pθ {|T (n) − θ| ≥ ε} ≥ − J(θ), ε2 n 2 1 1 log Pθ {|T (n) − θ| ≥ ε} ≥ − J(θ). inf lim lim 2 {T (n) } ε↓0 n→∞ ε2 n lim inf

lim

ε↓0 {T (n) } n→∞

(83) (84)

The equality in (83) does not generally hold, while as for (84), letting t(n) in constructing T (n) by (80) be the maximum likelihood estimator, we have lim lim ε↓0 n→∞

1 1 log Pθ {|T (n) − θ| ≥ ε} = − J(θ), ε2 n 2

and therefore 1 1 log Pθ {|T (n) − θ| ≥ ε} = − J(θ). 2n ε 2 ε↓0 n→∞

inf lim lim

{T (n) }

(85)

Although the above-mentioned results are all evaluations at a fixed θ, from a proper estimation-theoretical standpoint it is more desirable to have uniform evaluations for θ. In the classical case, the maximum likelihood estimator attains the equalities in (13), (15) and (16) uniformly (i.e. for all

entire

December 28, 2004

13:56



419

θ). In the quantum case, it is also expected that an estimator uniformly attaining the inf’s in (68), (69) and (85) can be constructed by modifying the construction (80) with application of the method [11, 13] in which the maximum likelihood estimation is carried out together with an adaptive search of a measurement approximating the Π in (79) . It is impossible, however, to apply the same argument to (66) and (67). References [1] S. Amari, “Differential geometry of curved exponential families — curvatures and information loss,” Ann. Statist., 10, 357-385, (1982). [2] S. Amari, Differential-geometrical methods in statistics, Lecture Notes in Statistics 28, Springer-Verlag, 1985. [3] R.R. Bahadur, “On the asymptotic efficiency of tests and estimates,” Sankhy¯ a, 22, 229-252, (1960). [4] J.A. Bucklew, Large deviation techniques in decision, simulation, and estimation, John Wiley & Sons, New York, 1990. [5] R.E. Blahut, Principles and practice of information theory, Addison-Wesley, New York, 1987. [6] I. Csisz´ ar and J. K¨ orner, Information theory: coding theorems for discrete memoryless systems, Academic Press, New York, 1981. [7] C.W. Helstrom, Quantum detection and estimation theory, (Academic Press, New York, 1976). [8] F. Hiai and D. Petz, “The proper formula for relative entropy and its asymptotics in quantum probability,” Commun. Math. Phys., 143, 99–114, (1991). (Chap. 3 of this book). [9] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, (NorthHolland, Amsterdam, 1982). [10] H. Nagaoka, “On Fisher information of quantum statistical models,” In Proceedings of 10th Symposium on Information Theory and Its Applications, p. 241–246, 1987 (in Japanese). (Chap. 9 of this book). [11] H. Nagaoka, “An asymptotically efficient estimator for a one-dimensional parametric model of quantum statistical operators,” Proc. of 1988 IEEE International Symposium on Information Theory, p. 198, 1988. [12] H. Nagaoka, “A new approach to Cramér-Rao bounds for quantum state estimation,” IEICE Technical Report, IT 89-42, 9–14, (1989). (Chap. 8 of this book). [13] H. Nagaoka, “On the parameter estimation problem for quantum statistical models,” In Proceedings of 12th Symposium on Information Theory and Its Applications, p. 577–582 (1989) (In Japanese). (Chap. 10 of this book). [14] H.P. Yuen and M. Lax, “Multiple-parameter quantum estimation and measurement of non-selfadjoint observables,” IEEE Trans. Inform. Theory, 19, 740–750 (1973). [15] H. Umegaki and M. Ohya, Quantum theoretical entropy (in Japanese), (Kyoritsu Shuppan, Japan, 1984).

entire

December 28, 2004

13:56


CHAPTER 28

Two Quantum Analogues of Fisher Information from a Large Deviation Viewpoint of Quantum Estimation Masahito Hayashi Abstract. We discuss two quantum analogues of Fisher information, symmetric logarithmic derivative (SLD) Fisher information and KuboMori-Bogoljubov (KMB) Fisher information from a large deviation viewpoint of quantum estimation and prove that the former gives the true bound and the latter gives the bound of consistent superefficient estimators. In another comparison, it is shown that the difference between them is characterized by the change of the order of limits.

1. Introduction Fisher information not only plays a central role in statistical inference, but also coincides with a natural inner product in a distribution family. It is defined as 6 dpθ (ω) (1) lθ (ω)2 pθ (ω) dω, lθ (ω)pθ (ω) = Jθ := dθ Ω for a probability distribution family {pθ |θ ∈ Θ ⊂ R} with a probability space Ω. However, the quantum version of Fisher information cannot be uniquely determined. In general, there is a serious arbitrariness concerning the order among non-commutative observables in the quantization of products of several variables. The problem of the arbitrarity of the quantum version of Fisher information is due to the same reason. The geometrical properties of its quantum analogues have been discussed by many authors [1, 34, 35, 36]. One quantum analogue is the Kubo-Mori-Bogoljubov (KMB) Fisher information J˜ρ defined by 6 1 6 1 dρθ t ˜ 1−t ˜ ˜ Jθ := (2) Tr ρθ Lθ ρθ Lθ dt, ρtθ L˜θ ρ1−t dt = θ dθ 0 0 This chapter is reprinted from J. Phys. A: Math. and Gen., 35, 7689-7727, 2002. Since large part of Appendices consist of the technical proofs, they are omitted. They are available at quant-ph/0202003. 420

entire

December 28, 2004

13:56


Two Quantum Analogues of Fisher Information

421

for a quantum state family {ρθ ∈ S(H)|θ ∈ Θ}, where S(H) is the set of density matrixes on H and the Hilbert space H corresponds to the physical system of interest [1, 34, 35, 36]. As proven in B, it can be characterized as the limit of quantum relative entropy, which plays an important role in several topics of quantum information theory, for example, quantum channel coding [24, 38], quantum source coding [16, 37, 42] and quantum hypothesis testing [22, 33]. Moreover, as mentioned in section 3, this inner product is closely related to the canonical correlation of the linear response theory in statistical mechanics [27]. As mentioned in A, it appears to be the most natural quantum extension from an information geometrical viewpoint. Thus, one might expect that it is significant in quantum estimation, but its estimation-theoretical characterization has not been sufficiently clarified. Another quantum analogue is symmetric logarithmic derivative (SLD) Fisher information 1 dρθ (Lθ ρθ + ρθ Lθ ) = , (3) Jθ := Tr L2θ ρθ , 2 dθ where Lθ is called the symmetric logarithmic derivative [20]. It is closely related to the achievable lower bound of mean square error (MSE) not only for the one-parameter case [20, 21, 23], but also for the multi-parameter case [11, 13, 14] in quantum estimation. The difference between the two can be regarded as the difference in the order of the operators, and reflects the two ways of defining Fisher information for a probability distribution family. Currently, the former is closely related to the quantum information theory while the latter is related to the quantum estimation theory. These two inner products have been discussed only in separate contexts. In this paper, to clarify the difference between them, we introduce a large deviation viewpoint of quantum estimation as a unified viewpoint, whose classical version was initiated by Bahadur [2, 3, 4]. This method may not be conventional in mathematical statistics, but seems a suitable setting for a comparison between two quantum analogues from an estimation viewpoint. This type of comparison was initiated by Nagaoka [31, 32], and is discussed in further depth in this paper. Such a large deviation evaluation of quantum estimation is closely related to the exponent of the overflow probability of quantum universal variable-length coding [19]. This paper is structured as follows: Before we state the main results, we review the classical estimation theory including Bahadur’s large deviation theory, which has been done in section 2. After this review, we briefly outline the main results in section 3, i.e., the difference is characterized from three

entire

December 28, 2004

13:56


Masahito Hayashi

422

contexts. To simplify the notations, even if we need the Gauss notation [ ], we omit it when this does not cause confusion. Some proofs are very complicated and are presented in the appendixes. 2. Review of Classical Estimation Theory We review the relationship between the parameter estimation for the probability distribution family {pθ |θ ∈ Θ ⊂ R} with a probability space Ω and its Fisher information. The definition of Fisher information is given not only by (1), but also by2 the limit of the relative entropy (Kullback-Leibler divergence) D(p q) := Ω (log p(ω) − log q(ω)) p(ω) dω as Jθ := lim

ε→0

2 D(pθ+ε pθ ). ε2

(4)

These two definitions (1) and (4) coincide under some regularity conditions for a family. Next, we consider a map f from Ω to Ω . Similarly to other information quantities, (for example Kullback divergence etc) the inequality Jθ ≥ Jθ

(5)

holds, where Jθ is Fisher information of the family {pθ ◦f −1 |θ ∈ Θ}. Inequalˇ ity (5) is called the monotonicity. According to Cencov [8], any information quantities satisfying (5) coincide with a constant times Fisher information Jθ . For an estimator that is defined as a map from the data set Ω to the parameter set Θ, we sometimes consider the unbiasedness condition: 6 T (ω)pθ (ω) dω = θ, ∀θ ∈ Θ. (6) Ω

The MSE of any unbiased estimator T is evaluated by the following inequality (Cramér-Rao inequality), 6 1 (T (ω) − θ)2 pθ (ω) dω ≥ , (7) Jθ Ω which follows from Schwartz inequality with respect to (w.r.t.) the inner 2 product X, Y := Ω X(ω)Y (ω)pθ (ω) dω for variables X, Y . When the number of data ωn := (ω1 , . . . , ωn ), which obeys the unknown probability pθ , is sufficiently large, we discuss a sequence {Tn } of estimators Tn (ωn ). If {Tn } is suitable as a sequence of estimators, we can expect that it converges

entire

December 28, 2004

13:56



423

to the true parameter θ in probability, i.e., it satisfies the weak consistency condition: lim pn {|Tn n→∞ θ

− θ| > ε} = 0,

∀ε > 0, ∀θ ∈ Θ.

(8)

Usually, the performance of a sequence {Tn } of estimators is measured by the speed of its convergence. As one criterion, we focus on the speed of the convergence in MSE. If a sequence {Tn } of estimators satisfies the weak consistency condition and some regularity conditions, the asymptotic version of Cramér-Rao inequality, 6 1 , (9) lim n (Tn (ωn ) − θ)2 pnθ (ω) dω ≥ n→∞ J θ Ω holds. If it satisfies only the weak consistency condition, it is possible that it surpasses the bound of (9) at a specific subset. Such a sequence of estimators is called superefficient. We can reduce its error to any amount at a specific subset with the measure 0 under the weak consistency condition (8). As another criterion, we evaluate the decreasing rate of the tail probability: −1 log pnθ {|Tn − θ| > ε}. (10) n→∞ n This method was initiated by Bahadur [2, 3, 4], and was a much discussed topic among mathematical statisticians in the 1970’s. From the monotonicity of the divergence, we can prove the inequality β({Tn }, θ, ε) := lim

β({Tn }, θ, ε) ≤ min{D(pθ+ε pθ ), D(pθ−ε pθ )}

(11)

for any weakly consistent sequence {Tn } of estimators. Its proof is essentially given in our proof of Theorem 5. Since it is difficult to analyze β({Tn }, θ, ε) except in the case of an exponential family, we focus on another quantity α({Tn }, θ) := limε→0 ε12 β({Tn }, θ, ε). For an exponential family, see Appendix K. Taking the limit ε → +0, we obtain the inequality Jθ . (12) 2 If Tn is the maximum likelihood estimator (MLE), the equality of (12) holds under some regularity conditions for the family [4, 9]. This type of discussion is different from the MSE type of discussion in deriving (12) from only the weak consistency condition. Therefore, there is no consistent superefficient estimator w.r.t. the large deviation evaluation. Indeed, we can relate the above large deviation type of discussion in the estimation to Stein’s lemma in simple hypothesis testing as follows. In α({Tn }, θ) ≤

entire

December 28, 2004

13:56


Masahito Hayashi

424

simple hypothesis testing, we decide whether the null hypothesis should be accepted or rejected from the data ωn := (ω1 , . . . , ωn ) which obeys an unknown probability. For the decision, we must define an acceptance region An as a subset of Ωn . If the null hypothesis is p and the alternative is q, the first error (though the true distribution is p, we reject the null hypothesis) probability β1,n (An ) and the second error (though the true distribution is q, we accept the null hypothesis) probability β2,n (An ) are given by β1,n (An ) := 1 − pn (An ),

β2,n (An ) := q n (An ).

Regarding the decreasing rate of the second error probability under the constant constraint of the first error probability, the equation lim

n→∞

−1 log min{β2,n (An )|β1,n (An ) ≤ ε} = D(p q), n

ε >0

(13)

holds (Stein’s lemma). Inequality (11) can be derived from this lemma. We can regard the large deviation type of evaluation in the estimation to be Stein’s lemma in the case where the null hypothesis is close to the alternative one. 3. Outline of Main Results Let us return to the quantum case. In a quantum setting, we focus two quantum analogues of Fisher information, KMB Fisher information J˜θ and SLD Fisher information Jθ . Indeed, if the state ρθ is degenerate, SLD Lθ is not uniquely determined. However, as is proven in Appendix C, SLD Fisher information Jθ is uniquely determined, i.e., it is independent of the choice of the SLD Lθ . On the other hand, according to Chap. 7 in Amari and Nagaoka [1], L˜θ has another form d log ρθ L˜θ = . dθ

(14)

As is proven by using formula (14) in Appendix B, KMB Fisher information J˜θ can be characterized as the limit of the the quantum relative entropy D(ρ σ) := Tr ρ(log ρ − log σ) in the following way 2 J˜θ = lim 2 D(ρθ+ε ρθ ). ε→0 ε

(15)

Moreover, in the linear response theory of statistical physics, given an equilibrium state ρ, when a variable A fluctuates with a small value δ, another

entire

December 28, 2004

13:56



425

variable B also is thought to fluctuate with a constant times δ [27]. Its coefficient is called the canonical correlation and given by 6 1 Tr ρtθ (A − Tr ρA)ρ1−t (16) θ (B − Tr ρB) dt. 0

Thus, KMB Fisher information J˜θ is thought to be more natural from a viewpoint of statistical physics. As another quantum analogue, right logarithmic derivative (RLD) Fisher information Jˇθ : ˇθL ˇ ∗θ , Jˇθ := Tr ρθ L

dρθ ρθ Lˇθ = dθ

θ ˇ is known. When ρθ does not commute dρ dθ and ρθ > 0, the RLD Lθ is not self-adjoint. Since it is not useful in the one-parameter case, we do not discuss it in this paper. Since the difference in definitions can be regarded as the difference in the order of operators, these quantum analogues coincide when all states of the family are commutative with each other. However, in the general case, they do not coincide and the inequality J˜θ ≥ Jθ holds, as exemplified in section 4. Concerning some information-geometrical properties, see Appendix A. In the following, we consider how the roles these quantum analogues of Fisher information play in the parameter estimation for the state family. As is discussed in detail in section 4, the estimator is described by the pair of positive operator valued measure (POVM) M (which corresponds to the measurement and is defined in section 4) and the map from the data set to the parameter space Θ. Similarly to the classical case, we can define an unbiased estimator. For any unbiased estimator E, the SLD Cramér-Rao inequality

V (E) ≥

1 Jθ

(17)

holds, where V (E) is the mean square error (MSE) of the estimator E. In an asymptotic setting, as a quantum analogue of the n-i.i.d. condition, we treat the quantum n-i.i.d. condition, i.e., we consider the case where the number of systems independently prepared in the same unknown state is sufficiently large, in section 5. In this case, the measurement is denoted by a POVM M n on the composite system H⊗n and the state is described by the tensor product density matrix ρ⊗n . Of course, such POVMs include a POVM that requires quantum correlations between the respective quantum systems in the measurement apparatus. Similarly to the classical case,

entire

December 28, 2004

426

13:56


Masahito Hayashi

= {E n } of estimators, we can define the weak consisfor a sequence E tency condition given in (31). In mathematical statistics, the square root n consistency, local asymptotic minimax theorems and Bayesian theorem are important topics as the asymptotic theory, but it seems too difficult to link these quantum settings and KMB Fisher information J˜θ . Thus, in this paper, in order to compare two quantum analogues from a unified framework, we adopt Bahadur’s large deviation theory as follows. As is discussed in sec θ, ε), α(E, θ). Similarly tion 5, we can similarly define the quantities β(E, to (11)(12), under the weak consistency (WC) condition, the inequalities θ, ε) ≤ min{D(ρθ+ε ρθ ), D(ρθ−ε ρθ )} β(E, θ) ≤ 1 J˜θ (18) α(E, 2 hold. From these discussions, the bound in the large deviation type of evaluation seems different from the one in the MSE case. However, as mentioned in section 6, the inequality θ) ≤ α(E,

1 Jθ 2

(19)

satisfies the strong consistency (SC) condition inholds if the sequence E troduced in section 6 as a stronger condition. As is mentioned in section 7, these bounds can be attained in their respective senses. Therefore, roughly speaking, the difference between the two quantum analogues can be regarded as the difference in consistency conditions and can be characterized as 1 1 1 1 θ, ε) = Jθ , sup lim inf 2 β(E, θ, ε) = J˜θ . sup lim inf 2 β(E, ε→0 ε→0 ε 2 ε 2 SC WC E: E: Even if we restrict our estimators to strongly consistent ones, the difference between two appears as 1 , θ, ε) = Jθ sup lim inf 2 β(M 2 ε→0 ε M:SC 1 , θ, ε) = J˜θ , lim 2 sup β(M 2 ε→0 ε M:SC

(20) (21)

where, for a precise statement, as expressed in section 9, we need more complicated definitions. However, we should consider that the bound J2θ is more meaningful for the following two reasons. The first reason is the fact that we can construct the sequence of estimators attaining the bound J2θ at all points, which is

entire

December 28, 2004

13:56



427

proven in section 7. On the other hand, there is a sequence of estimators ˜ attaining the bound J2θ at one point θ, but it cannot attain the bound at all points. The other reason is the naturalness of the conditions for deriving the bound J2θ . In other words, an estimator attaining J2θ is natural, but ˜

an estimator attaining J2θ is very irregular. Such a sequence of estimators can be regarded as a consistent superefficient estimator and does not satisfy regularity conditions other than the weak consistency condition. This type of discussion of the superefficiency is different from the MSE type of discussion in that any consistent superefficient estimator is bounded by inequality (18). To consider the difference between the two quantum analogues of Fisher information in more details, we must analyze how we can achieve the bound J˜θ 2 . It is important in this analysis to consider the relationship between the above discussion and the quantum version of Stein’s lemma in simple hypothesis testing. Similarly to the classical case, when the null hypothesis is the state ρ and the alternative is the state σ, we evaluate the decreasing rate of the second error probability under the constant constraint ε > 0 of the first error probability. As was proven in quantum Stein’s lemma, its exponential component is given by the quantum relative entropy D(ρ σ) for any ε > 0. Hiai and Petz [22] constructed a sequence of tests to attain the optimal rate D(ρ σ), by constructing the sequence {M n } of POVMs such that n 1 Mn lim D(PM (22) ρ Pσ ) = D(ρ σ). n→∞ n Ogawa and Nagaoka [33] proved that there is no test exceeding the bound D(ρ σ). It was proven by Hayashi [15] that by using the group representation theory, we can construct the POVM satisfying (22) independently of ρ. For the reader’s convenience, we give a review of this in Appendix J. As discussed in section 7.2, this type of construction is useful for the ˜ construction of an estimator attaining the bound J2θ at one point. Since the proper bound of the large deviation is J2θ , we cannot regard the quantum estimation as the limit of the quantum Stein’s lemma. ˜ In order to consider the properties of estimators attaining the bound J2θ at one point from another viewpoint, we consider the restriction that makes such a construction impossible. We introduce a class of estimators whose POVMs do not require a quantum correlation in the quantum apparatus in section 8. In this class, we assume that the POVM on the l-th system is chosen from l − 1 data. We call such an estimator an adaptive estimator. satisfies the weak consistency condition, the When an adaptive estimator E

entire

December 28, 2004

13:56


Masahito Hayashi

428

inequality 1 Jθ (23) 2 holds (See section 6). Similarly, we can define a class of estimators that use quantum correlations up to m systems. We call such an estimator an m-adaptive estimator. For any m-adaptive weakly consistent estimator E, inequality (23) holds. Therefore, it is impossible to construct a sequence ˜ of estimators attaining the bound J2θ if we fix the number of systems in which we use quantum correlations. As mentioned in section 8, taking limit m → ∞, we obtain θ) ≤ α(E,

lim lim

sup

m→∞ ε→0 M :m-AWC

1 , θ, ε) = Jθ , β(M 2 ε 2

(24)

where m-AWC denotes an m-adaptive weakly consistent estimator. However, as the third characterization of the difference between the two quantum analogues, as precisely mentioned in section 9, the equation ˜ 1 , θ, ε) = Jθ (25) sup β( M 2 ε→0 m→∞ 2 M:m-ASC ε holds, where m-ASC denotes an m-adaptive strongly consistent estimator. A more narrow class of estimators is treated in equation (25) than in equation (21). Equations (24) and (25) indicate that the order of limits limm→∞ and limε→0 is more crucial than the difference between two types of consistencies. lim lim

Remark 1: In the estimation only of the spectrum of a density matrix in a unitary-invariant family, the natural inner product in the parameter space is unique and equals Fisher inner product in the distribution family whose element is the probability distribution corresponding to eigenvalues of a density matrix. In addition, the achievable bound is derived by Keyl and Werner [26], and coincides with the bound uniquely given by the above inner product. For detail, see Appendix L. 4. Review of Non-Asymptotic Setting in Quantum Estimation In a quantum system, in order to discuss the probability distribution which the data obeys, we must define a POVM. A POVM M is defined as a map from Borel sets of the data set Ω to the set of bounded, self-adjoint and positive semi-definite operators, which

entire

December 28, 2004

13:56



satisfies M (φ) = 0,

M (Ω) = I,

429

M (Bi ) = M (∪Bi ) for disjoint sets.

i

If the state on the quantum system H is a density operator ρ and we perform a measurement corresponding to a POVM M on the system, the data obeys the probability distribution PM ρ (B) := Tr ρM (B). If a POVM 2 M satisfies M (B) = M (B) for any Borel set B, M is called a projectionvalued measure (PVM). The spectral measure of a self-adjoint operator X is a PVM, and is denoted by E(X). For 1 > λ > 0 and any POVMs M1 and M2 taking values in Ω, the POVM B )→ λM1 (B) + (1 − λ)M2 (B) is called the random combination of M1 and M2 in the ratio λ : 1 − λ. Even if M1 ’s data set Ω1 is different from M2 ’s data set Ω2 , M1 and M2 can H be regarded as POVMs taking values in the disjoint union set Ω1 Ω2 := (Ω1 × {1}) ∪ (Ω2 × {2}). In this case, we can define a random combination H of M1 and M2 as a POVM taking values in Ω1 Ω2 and call it the disjoint random combination. In this paper, we simplify the probability PM ρθ and M M

P ) to P , D(θ

θ the relative entropies D(ρθ0 ρθ1 ) and D(PM 0 1 ) and ρθ0 ρθ1 θ M D (θ0 θ1 ), respectively. In the one-parameter quantum estimation, the estimator is described by a pair comprising a POVM and a map from its data set to the real number set R. Since the POVM M ◦ T −1 takes values in the real number set R, we can regard any estimator as a POVM taking values in the real number set R. In order to evaluate MSE, Helstrom [20, 21] derived the SLD CramérRao inequality as a quantum counterpart of Cramér-Rao inequality (29). If an estimator M satisfies 6 x Tr ρθ M ( dx) = θ, ∀θ ∈ Θ, (26) R

it is called unbiased. If θ−θ0 is sufficiently small, we can obtain the following approximation in the neighborhood of θ0 : 6 ! 6 ∂ρθ x Tr ρθ0 M ( dx) + x Tr M ( dx) (θ − θ0 ) ∼ = θ0 + (θ − θ0 ). ∂θ θ=θ0 R R It implies the following two conditions: 6 ∂ρθ x Tr M ( dx) = 1 ∂θ θ=θ0 R 6 x Tr ρθ0 M ( dx) = θ0 . R

(27) (28)

entire

December 28, 2004

13:56


Masahito Hayashi

430

If an estimator M satisfies (27) and (28), it is called locally unbiased at θ0 . For any locally unbiased estimator M (at θ), the inequality, which is called the SLD Cramér-Rao inequality, 6 1 (x − θ)2 Tr ρθ M ( dx) ≥ (29) J θ R holds. Similarly to the classical case, this inequality is derived from the Schwartz inequality with respect to SLD Fisher information X|Y := X [20, 21, 23]. Tr ρθ XY +Y 2 The equality of (29) holds when the estimator is given by the spectral Lθ θ decomposition E( L Jθ + θ) of Jθ + θ, where Lθ is the SLD at θ and is with defined by (3). This implies that SLD Fisher information JθL0θ coincides E( J 0 +θ0 ) θ0 Fisher information at θ0 of the probability family Pθ θ ∈ Θ . The monotonicity of quantum relative [29, 39] gives the following entropy Lθ E( J 0 +θ0 ) evaluation of the probability family Pθ θ0 θ ∈ Θ :

D

E

Lθ 0 Jθ 0

+θ0

(θ θ0 ) ≤ D(θ θ0 ).

Taking the limit θ → θ0 , we have Jθ ≤ J˜θ .

(30)

In this paper, we discuss inequality (30) from the viewpoint of the large deviation type of evaluation of the quantum estimation. The following families are treated as simple examples of the one-parameter quantum state family, in the latter. Example 2: [One-parameter equatorial spin 1/2 system state family]: 1 1 + r cos θ r sin θ 0 ≤ θ < 2π Sr := ρθ := r sin θ 1 − r cos θ 2 In this family, we calculate D(ρθ ρ0 ) =

1+r r (1 − cos θ) log , 2 1−r

1+r r J˜θ = log , 2 1−r

Jθ = r2 .

Since the relations J˜θ = ∞ and Jθ = 1 hold in the case of r = 1, the two quantum analogues are completely different.

entire

December 28, 2004

13:56



431

Example 3: [One-parameter quantum Gaussian state family and half-line quantum Gaussian state family]: We define the boson coher|α|2 ∞ αn √ ent vector |α := e− 2 n=0 n! |n, where |n is the number vector on L2 (R). The quantum Gaussian state is defined as 6 |α−θ|2 1 |αα|e− N d2 α, ∀θ ∈ C. ρθ := πN C We call {ρθ |θ ∈ R} the one-parameter quantum Gaussian state family, and call {ρθ |θ ≥ 0(θ ∈ R+ = [0, ∞))} the half-line quantum Gaussian state family. In this family, we can calculate 1 1 2 2 ˜ |θ − θ0 | , Jθ = 2 log 1 + , Jθ = . D(ρθ ρθ0 ) = log 1 + N N N + 12 5. The Bound under the Weak Consistency Condition We introduce the quantum independent-identical density (i.i.d.) condition in order to treat an asymptotic setting. Suppose that n-independent physical systems are prepared in the same state ρ. Then, the quantum state of the composite system is described by ρ⊗n := ρ ⊗ · · · ⊗ ρ on H⊗n , n

where the tensor product space H

⊗n

is defined by

H⊗n := H · · ⊗ H . ⊗ · n

We call this condition the quantum i.i.d. condition, which is a quantum analogue of the independent-identical distribution condition. In this setting, any estimator is described by a POVM M n on H⊗n , whose data set is R. In n n n n n and D(PM

PM ) to PM and DM (θ0 θ1 ). this paper, we simplify PM θ ρ⊗n ρ⊗n ρ⊗n θ

θ0

θ1

The notation M × n denotes the POVM in which we perform the POVM M for the respective n systems. Definition 4: [Weak consistency condition]: A sequence of estimators := {M n }∞ is called weakly consistent if M n=1

n lim PM |θˆ − θ| > ε = 0, ∀θ ∈ Θ, ∀ε > 0, (31) θ n→∞

where θˆ is the estimated value.

entire

December 28, 2004

432

13:56


Masahito Hayashi

This definition means that the estimated value θˆ converges to the true value θ in probability, and can be regarded as the quantum extension of (8). Now, we focus on the exponential component of the tail probability as follows:

, θ, ε) := lim sup −1 log PM n |θˆ − θ| > ε . β(M θ n n→∞ , θ, ε) We usually discuss the following value instead of β(M , θ) := lim sup α(M ε→0

1 , θ, ε) β(M ε2

(32)

, θ, ε). The following theorem can because it is too difficult to discuss β(M be proven from the monotonicity of the quantum relative entropy. Theorem 5: (Nagaoka [31, 32]) If a POVM M n on H⊗n satisfies the weakly consistent condition (31), the inequalities , θ, ε) ≤ inf{D(ρθ ρθ )||θ − θ | < ε} β(M ˜ , θ) ≤ Jθ α(M 2

(33) (34)

hold. Even if the parameter set Θ is not open (e.g., the closed half-line R+ := [0, ∞)), this theorem holds. Proof: The monotonicity of the quantum relative entropy yields the inequality pn,θ 1 − pn,θ ⊗n + (1 − pn,θ ) log , D(ρ⊗n θ ρθ ) ≥ pn,θ log pn,θ 1 − pn,θ |θ − θ| > ε, where we denote for any θ such that satisfies

Mn by pn,θ . Using the inequality the probability Pθ |θˆ − θ| > ε − (1 − pn,θ ) log (1 − pn,θ ) ≥ 0, we have

n ˆ − θ| > ε ⊗n log PM | θ θ D(ρ⊗n log pn,θ θ ρθ ) + h(pn,θ ) =− ≤ − , (35) n n npn,θ where h is the binary entropy defined by h(x) := −x log x−(1−x) log(1−x). Since the assumption guarantees that pn,θ → 1, the inequality , θ, ε) ≤ D(ρθ ρθ ) β(M holds, where we use the additivity of quantum relative entropy: ⊗n D(ρ⊗n θ ρθ ) = nD(ρθ ρθ ).

(36)

entire

December 28, 2004

13:56



433

Thus, we obtain (33). Taking the limit ε → 0 in inequality (36), we obtain (34). As another proof, we can prove this inequality as a corollary of the quantum Stein’s lemma [22, 33]. 6. The Bound under the Strong Consistency Condition As discussed in section 4, the SLD Cramér-Rao inequality guarantees that the lower bound of MSE is given by SLD Fisher information. Therefore, it is expected that the bound is connected with SLD Fisher information for large deviation. In order to discuss the relationship between SLD Fisher information and the bound for large deviation, we need another characterization with respect to the limit of the tail probability. We thus define

, θ, ε) := lim inf −1 log PM n |θˆ − θ| > ε β(M θ n→∞ n , θ) := lim inf 1 β(M , θ, ε). (37) α(M ε→0 ε2 , θ) with In the following, we attempt to link the quantity α(M SLD Fisher information. For this purpose, it is suitable to focus on an information quantity that satisfies the additivity and the monotonicity, as in the proof of Theorem 5. Its limit SLD 9 should be √ √ Fisher information. The Bures distance b(ρ, σ) := 2(1 − Tr | ρ σ|) = 9 √ √ √ √ minU:unitary Tr( ρ − σU )( ρ − σU )∗ is known to be an information quantity whose limit is SLD Fisher information, as mentioned in Lemma 6. Of course, it can be regarded as a quantum analogue of the Hellinger distance, and satisfies the monotonicity. Lemma 6: (Uhlmann [40], Matsumoto [30]). If there exists an SLD Lθ satisfying (3), then the equation 1 b2 (ρθ , ρθ+ε ) Jθ = lim ε→0 4 ε2

(38)

holds. A proof of Lemma 6 is given in Appendix C. As discussed in the latter, the Bures distance satisfies the monotonicity. Unfortunately, the Bures distance does not satisfy the additivity. √ √ However, the quantum affinity I(ρ σ) := −8 log Tr ρ σ = % & −8 log 1 − 12 b(ρ, σ)2 satisfies the additivity: I(ρ⊗n σ ⊗n ) = nI(ρ σ).

(39)

entire

December 28, 2004

13:56


Masahito Hayashi

434

Its classical version is called affinity in the following form [28]: ! √ √ I(p q) = −8 log pi qi .

(40)

i

As a trivial deformation of (38), the equation lim

ε→0

I(ρθ ρθ+ε ) = Jθ ε2

(41)

holds. The quantum affinity satisfies the monotonicity w.r.t. any measurement M (Jozsa [25], Fuchs [10]): 9 9 % M' M& M M ' Pρ (ω) Pσ (ω) . (42) I(ρ σ) ≥ I Pρ Pσ = −8 log ω

The most simple proof of (42) is given by Fuchs [10] who directly proved that 9 9 9 √ √ M (ω) . ρσ ρ ≤ PM (ω) P (43) Tr ρ σ ω

For the reader’s convenience, a proof of (43) is given in D. From (39),(41) and (42), we can expect that SLD Fisher information is, in a sense, closely related to a large deviation type of bound. From the additivity and the monotonicity of the quantum affinity, we can show the following lemma. Lemma 7: The inequality

, θ, sδ) + β (M , θ + δ, (1 − s)δ) ≤ I(ρθ ρθ+δ ) β (M 4 inf {s|1≥s≥0}

(44)

, θ, δ) := limε→+0 β(M , θ, δ − ε). holds, where we define β (M A proof of Lemma 7 is given in Appendix E. However, Lemma 7 cannot , θ) under the weak consistency condition, yield an inequality w.r.t. α(M unlike inequality (36). Therefore, we consider a stronger condition, which is given in the following. Definition 8: [Strong consistency condition]: A sequence of estima = {M n }∞ tors M n=1 is called strongly consistent if the convergence of (37) is , θ) is continuous for θ. A sequence uniform for the parameter θ and if α(M of estimators is called strongly consistent at θ if there exists a neighborhood U of θ such that it is strongly consistent in U .

entire

December 28, 2004

13:56



435

The square root n consistency is familiar in the field of mathematical statistics. However, in the large deviation setting, this strong consistency seems more suitable than the square root n consistency. As a corollary of Lemma 7, we have the following theorem. Theorem 9: Assume that there exists the SLD Lθ satisfying (3). If a se = {M n }∞ is strongly consistent at θ, then the quence of estimators M n=1 inequality , θ) ≤ Jθ α(M 2

(45)

holds. Proof: From the above assumption, for any real ε > 0 and any element , θ) − θ ∈ Θ, there exists a sufficiently small real δ > 0 such that (α(M 2 ε)ε ≤ β (M , θ, ε ), β (M , θ + δ, ε ) for ∀ε < δ. Therefore, inequality (44) yields the relations % 2 2 & , θ) − ε)δ 2 = 4(α(M , θ) − ε) inf s δ + (1 − s)2 δ 2 2(α(M {s|1≥s≥0}

β (M , θ, sδ) + β (M , θ + δ, (1 − s)δ) ≤ I(ρθ ρθ+δ ). (46) ≤ 4 inf {s|1≥s≥0}

Lemmas 6 and (46) guarantee (45) for ∀θ ∈ Θ. Remark 10: Inequality (43) can be regarded as a special case of the monotonicity w.r.t. any trace-preserving CP (completely positive) map C : S(H1 ) → S(H2 ): 2 ; % √ √ &2 ; Tr ρ σ ≤ Tr C(ρ) C(σ) (47) which is proven by Jozsa [25] because the map ρ )→ PM ρ can be regarded as a trace-preserving CP map from the C ∗ algebra of bounded operators on H to the commutative C ∗ algebra C(Ω), where Ω is the data set. 7. Attainabilities of the Bounds Next, we discuss the attainabilities of the two bounds J˜θ and Jθ in their respective senses. In this section, we discuss the attainabilities in two cases: the first case is the one-parameter quantum Gaussian state family, and the second case is an arbitrary one-parameter finite-dimensional quantum state family that satisfies some assumptions.

entire

December 28, 2004

13:56


Masahito Hayashi

436

7.1. One-parameter quantum Gaussian state family In this subsection, we discuss the attainabilities in the one-parameter quantum Gaussian state family. Theorem 11: In the one-parameter quantum Gaussian state family, the s = {M s,n }∞ (defined in the following) satisfies sequence of estimators M n=1 the strong consistency condition and the relations s , θ) = s , θ) = α(M α(M

1 Jθ = . 2 N + 12

(48)

s ]: We perform the POVM E(Q) for all systems, [Construction of M where Q is the position operator on L2 (R). The estimated value ξn is determined to be the mean value of n data. Proof: Since the equation @ 2 −2(x−αx )2 E(Q) e dx P|α$)α| ( dx) = π holds, we have the equation 6

|α−θ|2 1 E(Q) E(Q) E(Q) ( dx) = Pρθ ( dx) = P|α$)α| ( dx)e− N d2 α Pθ πN C > 2 2 − 2(x−θ) e 2N +1 dx. = π(2N + 1) Thus, we obtain the equation > PM θ

s,n

( dξn ) =

2 2 − 2(ξn −θ) e (2N +1)n dξn , π(2N + 1)n

which implies that s , θ, ε) = lim β(M

s,n −1 ε2 log PM {|ξn − θ| > ε} = . θ n N + 12

(49)

s = {M s,n }∞ Therefore, the sequence of estimators M n=1 attains the bound Jθ and satisfies the strong consistency condition. 2 Proposition 12: In the half-line quantum Gaussian state family, the sequence of estimators M w = {M w,n}∞ n=0 (defined in the following) satisfies the weak consistency condition and the strong consistency condition at

entire

December 28, 2004

13:56



437

R+ \ {0} and the relations

J˜0 1 = log 1 + , 2 N 1 Jθ = , ∀θ ∈ R+ \ {0}. α(M w , θ) = α(M w , θ) = 2 N + 12

α(M w , 0) = α(M w , 0) =

(50) (51)

This proposition indicates the significance of the uniformity of the convergence of (37). This proposition is proven in Appendix G. [Construction of M w ]: We perform the following unitary evolution: ⊗(n−1)

ρ⊗n )→ ρ√nθ ⊗ ρ0 θ

.

For detail, see Appendix F. We perform the number measurement E(N ) of the first system whose state is ρ√nθ , and let k be its data, where the number operator N is defined as N := n n|nn|. The estimated value Tn 9 is determined by Tn :=

k n.

Theorem 13: In the one-parameter quantum Gaussian state family, for }∞ any θ ∈ R, the sequence of estimators Mθw1 = {Mθw,n n=1 (defined in the 1 following) satisfies the weak consistency condition and the relations J˜θ 1 w w = log 1 + α(Mθ1 , θ1 ) = α(Mθ1 , θ1 ) = . (52) 2 N [Construction of Mθw1 ]: We divide n systems into two groups. One consists √ √ of n systems and the other, of n − n systems. We perform the PVM E(Q) for every system in the first group. Let ξ√n be the mean value in √ the first group, i.e., we perform the PVM M s, n for the first system. At the second step, we perform the following unitary evolution for the second group. √ ⊗(n− n)

ρθ

√ ⊗(n− n)

)→ ρθ−θ1

√

For details, see Appendix F.√ We perform the POVM M w,n− n for the ⊗(n− n) ; the data is written as Tn−√n . Then, we system whose state is ρθ−θ1 decide the final estimated value θˆ as θˆ := θ1 + sgn(ξ√n − θ1 )Tn−√n . Proof: Since M w,n

Pθ1 θ1

√ w,n− n ˆ Tn−√n > ε , θ − θ1 > ε = PM 0

entire

December 28, 2004

13:56


Masahito Hayashi

438

we have

M w,n −1 β(Mθw1 , θ1 ) = lim log Pθ1 θ1 θˆ − θ1 > ε n √ √ w,n− n n − n −1 Tn−√n > ε = β(M w , 0). √ log PM = lim 0 n n− n As is shown in Appendix G, we have 1 β(Mw , 0) = ε2 log 1 + , N which implies (52). Next, we prove the consistency in the case where θ > θ1 . In this case, it is sufficient √to discuss the case where θ − θ√1 > ε > 0. Since the first measurement M s, n and the second one M w,n− n are performed independently, we obtain

√ M w,n w,n− n Tn−√n − (θ − θ1 ) > ε Pθ θ1 θˆ − θ1 > ε ≤ PM θ √ s, n +PM ξ√n − θ1 ≤ 0 . θ Proposition 12 guarantees that the first term goes to 0, and Theorem 11 guarantees that the second term goes to 0. Thus, we obtain the consistency of Mθw1 . Similarly, we can prove the weak consistency the case where θ < θ1 .

7.2. Finite dimensional family In this subsection, we treat the case where the dimension of the Hilbert space H is k (finite). As for the attainability of the RHS of inequality (45), we have the following lemma. Lemma 14: Let θ0 be fixed in Θ. Under Assumptions 1 and 2, the sequence s (defined in the following) satisfies the strong consistency of estimators M θ0 condition at θ0 (defined in Def. 8) and the relation α(Mθs0 , θ0 ) = α(Mθs0 , θ0 ) =

Jθ0 . 2

(53)

[Assumption 1]: The map θ )→ ρθ is C 1 and ρθ > 0. L [Assumption 2]: The map θ )→ Tr ρθ Jθθ0 is injective i.e., one-to-one. 0 L [Construction of Ms ]: We perform the POVM E( θ0 ) for all systems. θ0

Jθ0

The estimated value is determined to be the mean value plus θ0 .

entire

December 28, 2004

13:56



439

Proof of Lemma 14: From Assumption 2, the weak consistency is satisfied. Let δ > 0 be a sufficiently small number. Define the function Lθ0 Tr ρθ Lθ0 − . (54) φθ,θ0 (s) := Tr ρθ exp s Jθ0 Jθ0 ' '

L Tr ρ L 'L ' Since ' Jθθ0 ' < ∞ and Tr ρθ Jθθ0 − Jθθ θ0 = 0, we have 0

0

0

φθ,θ0 (s) − 1 1 lim = Tr ρθ 2 s→0 s 2

Lθ0 Tr ρθ Lθ0 − Jθ0 Jθ0

2 .

When θ − θ0 is sufficiently small, the function x → sups (xs − log φθ,θ0 (s)) is continuous in (−δ, δ). Using Cramér’s theorem [7], we have

M s,n −1 log Pθ θ0 |θˆ − θ0 | > ε = lim n→∞ n min sup(εs − log φθ,θ0 (s)), sup(−εs − log φθ,θ0 (s )) s

s

for ε < δ. Taking the limit ε → 0, we have −1 Mθs,n lim lim 2 Pθ0 0 {|θˆ − θ0 | > ε} ε→0 n→∞ ε n sups (εs − log φθ,θ0 (s)) sups (−εs − log φθ,θ0 (s)) 1 , lim , = min lim = c−1 2 2 ε→0 ε→0 ε ε 2 θ,θ0 where

cθ,θ0 := Tr ρθ

Lθ0 Tr ρθ Lθ0 − Jθ0 Jθ0

2

because 1 1 εs − log φθ,θ0 (s) ∼ = εs − log(1 + cθ,θ0 s2 ) ∼ = εs − cθ,θ0 s2 2 2 cθ,θ0 ε2 ε =− . + s− 2 cθ,θ0 2cθ,θ0 The above convergence is uniform for the neighborhood of θ0 . Taking the limit θ → θ0 , we have 2 2 Lθ0 Lθ0 Tr ρθ Lθ0 Tr ρθ0 Lθ0 −1 − = Jθ0 = Tr ρθ0 − . lim Tr ρθ θ→θ0 Jθ0 Jθ0 Jθ0 Jθ0 Thus, we can check (53) and the strong consistency in the neighborhood of θ0 .

entire

December 28, 2004

440

13:56


Masahito Hayashi

s depends on the true parameter However, this sequence of estimators M δ θ0 . We should construct a sequence of estimators that satisfies the strong J consistency condition and attains the bound 2θ0 at all points θ0 . Since such a construction is too difficult, we introduce another strong consistency condition that is weaker than the above and under which inequality (45) holds. We construct a sequence of estimators that satisfies this strong consistency condition and attains the bound given in (45) for all θ in a weak sense. = [Second strong consistency condition]: A sequence of estimators M {M n } is called second strongly consistent if there exists a sequence of func , θ, ε)}∞ such that tions {β m (M m=1 1 , θ, ε) = α(M , θ). • lim lim 2 β m (M m→∞ ε→0 ε 1 , θ, ε) ≤ α(M , θ) holds. Its LHS converges locally uni• lim 2 β m (M ε→0 ε formly to θ. , θ, ε) ≥ β (M , θ, ε), for δ > ∀ε > 0. • ∀m, ∃δ > 0 s.t. β(M m Similarly to Theorem 9, we can prove inequality (45) under the second strong consistency condition. Under these preparations, we state a theorem with respect to the attainability of the bound Jθ . The following theorem can be regarded as a special case of Theorem 8 of [18]. Theorem 15: Under Assumptions 1 and 3, the sequence of estimators s = {M s,n }∞ M n=1 (defined in the following) satisfies the second strong conδ δ sistency condition and the relations s , θ) = α(M s , θ) = (1 − δ) α(M δ δ

Jθ . 2

(55)

s is independent of the unknown parameter The sequence of estimators M δ s,n θ. Every Mδ is an adaptive estimator and will be defined in section 8. Its proof is given in Appendix H. [Assumption 3]: The following set is compact.   2 !−1 2   Lθˇ Tr ρθ Lθˇ Lθˇ Tr ρθ Lθˇ ˇ∈ Θ ∀θ, θ − , Tr ρθ − Tr ρθ   Jθˇ Jθˇ Jθˇ Jθˇ If the state family is included by a bounded closed set consisting of positive definite operators, Assumption 3 is satisfied.

entire

December 28, 2004

13:56



441

s ]: We perform a faithful POVM Mf (defined in the [Construction of M δ following) for the first δn systems. Then, the data (ω1 , . . . , ωδn ) obey the M probability family {Pθ f |θ ∈ Θ}. We denote the maximum likelihood esˇ Next, we perform the timator (MLE) w.r.t. the data (ω1 , . . . , ωδn ) by θ. measurement E(Lθˇ) defined by the spectral measure of Lθˇ for other (1−δ)n systems. Then, we have data (ωδn+1 , . . . , ωn ). We decide the final estimated value Tθˇn as Tr ρTθˇn Lθˇ =

n 1 ωi . (1 − δ)n i=δn+1

Definition 16: A POVM M is called faithful, if the map ρ ∈ S(H) )→ PM ρ is one-to-one. An example of faithful POVM, which is a POVM taking values in the set of pure states on H, is given by Mh ( dρ) := kρν( dρ) , where ν is the invariant (w.r.t. the action of SU(H)) probability measure on the set of pure states on H. As another example, if L1 , . . . Lk2 −1 is a basis of the space of self-adjoint traceless operators, a disjoint random combination of PVMs E(L1 ), . . . E(Lk2 −1 ) is faithful. Note that a disjoint random combination is defined in section 4. √ √ Remark 17: By dividing n systems into n and n − n systems, Gill and Massar [11] constructed an estimator which asymptotically attains the optimal bound w.r.t. MSE, and Hayashi and Matsumoto [17] constructed a similar estimator by dividing them into bn and n − bn systems, where lim bnn = 0. However, in our proof, it is difficult to show the attainability of the bound (45) in such a division. Perhaps, there may exist a family in which such an estimator does not attain the bound (45). At least, it is essential in our proof that the number of the first group bn satisfy lim bnn > 0. Conversely, as is mentioned in Theorems 13 and 18, by dividing n sys√ √ tems into n and n − n systems, we can construct an estimator attaining the bound (34) at one point. We must use quantum correlations in the quantum apparatus to achieve ˜ the bound J2θ . The following theorem can be easily extended to the multiparameter case. Theorem 18: We assume Assumption 1 and that D(ρθ ρθ1 ) < ∞ for ∀θ1 , ∀θ ∈ Θ. Then, for any θ1 ∈ Θ, the sequence of estimators Mθw1 =

entire

December 28, 2004

13:56


Masahito Hayashi

442

{Mθw,n }∞ n=1 satisfies the weak consistency condition (31), and the equations 1 {D(ρθ ρθ1 )||θ1 − θ | > ε}, (56) β(Mθw1 , θ1 , ε) = β(Mθw1 , θ1 , ε) = inf θ ∈Θ

J˜θ α(Mθw1 , θ1 ) = α(Mθw1 , θ1 ) = 1 . 2

(57)

The sequence of estimators Mθw1 depends on the unknown parameter θ1 but not on ε > 0. is Its proof is given in Appendix I. In the following construction, Mθw,n 1 constructed from the PVM Eθn1 , which is defined from a group-theoretical viewpoint in in Appendix J.3. ]: We divide the n systems into two groups. We [Construction of Mθw,n 1 √ perform a faithful POVM Mf for the first group of n systems. Then, M the data (ω1 , . . . , ω√n ) obey the probability Pθ f . We let θˇ be the MLE of M

the data (ω1 , . . . , ω√n ) under the probability family {Pθ f |θ ∈ Θ}. Next, √

n− n

we perform the correlational PVM Eθ1 for the composite system which √ the data ω √obeys consists of the other √group of n − n systems. Then, √ E

n−

n

E

n−

n

E

n−

n

the probability Pθ θ1 . If en(1−δn− n )D(ρθˇρθ1 ) Pθ1θ1 (ω) ≥ Pθˇ θ1 (ω), the estimated value Tn is decided to be θ1 , where δn := 11 . If not, Tn is n5 ˇ decided to be θ. √

The following lemma proven in Appendix J plays an important role in the proof of Theorem 18. Lemma 19: For three parameters θ0 , θ1 and θ2 and δ > 0, the inequalities En En 1 Pθ0θ1 − log Pθ2θ1 (ω) + Tr ρθ0 log ρθ2 ≥ δ n ≤ exp −n

sup (δ − Tr ρθ0 log ρθ2 )t

0≤t≤1

En

Pθ0θ1

(k + 1) log(n + 1) − log Tr ρθ0 ρθ2 −t −t n Eθn1 1 log Pθ1 (ω) − Tr ρθ0 log ρθ1 ≥ δ n t ≤ exp −n sup(δ + Tr ρθ0 log ρθ1 )t − log Tr ρθ0 ρθ1 0≤t

hold.

! (58)

(59)

entire

December 28, 2004

13:56



443

We obtain the following theorem as a review of the above discussion. Theorem 20: From Theorems 5, 9 and 15 and Lemma 14, we have the equations ˜ 1 , θ, ε) = sup lim inf 1 β(M , θ, ε) = Jθ (60) sup lim sup 2 β(M ε→0 ε2 2 ε→0 ε : WC WC M M: 1 , θ, ε) = Jθ sup lim inf 2 β(M (61) ε→0 ε 2 : SC at θ M as an operational comparison of J˜θ and Jθ under Assumptions 1, 2 and 3. , θ, ε) in equations (60). , θ, ε) with β(M We can replace β(M We can also prove (30) as a consequence of equations (60) and (61). 8. Adaptive Estimators In this section, we assume that the dimension of the Hilbert space H is finite. We consider estimators whose POVM is adaptively chosen from the data. We choose the l-th POVM Ml (ωl−1 ) on H from l − 1 data ωl−1 := (ω1 , . . . , ωl−1 ). Its POVM M n is described by M n ( ωn ) := M1 (ω1 ) ⊗ M2 (ω1 ; ω2 ) ⊗ · · · ⊗ Mn (ωn−1 ; ωn ).

(62)

In this setting, the estimator is written as the pair En = (M , Tn ) of the POVM M n satisfying (62) and the function Tn : Ωn )→ Θ. Such an estimator En is called an adaptive estimator. As a larger class of POVMs, the separable POVM is well known. A POVM M n on H⊗n is called separable if it is written as n

M n = {M1 (ω) ⊗ · · · ⊗ Mn (ω)}ω∈Ω on H⊗n , where Mi (ω) is a positive semi-definite operator on H. For any separable estimator (M n , Tn ), the relations In n Tr ρθ Ml (ω) Mn D (θ θ ) = Tr ρθ Ml (ω) log Inl=1 l=1 Tr ρθ Ml (ω) ω∈Ω = =

l =1 n

Tr ρθ Ml (ω)

ω∈Ω l =1 n

l=1

=

l=1

log

Tr ρθ Ml (ω) Tr ρθ Ml (ω)

aθ,l (ω) Tr ρθ Ml (ω) log

l=1 ω∈Ω n

n

DMθ,l (θ θ ) ≤ n

aθ,l (ω) Tr ρθ Ml (ω) aθ,l (ω) Tr ρθ Ml (ω)

sup M:POVM on

DM (θ θ ) H

(63)

entire

December 28, 2004

13:56


Masahito Hayashi

444

hold, where the POVM Mθ,l on H is defined by  Mθ,l (ω) := aθ,l (ω)Ml (ω),



aθ,l (ω) := 

Tr ρθ Ml (ω) .

l =l

= {En } = Theorem 21: If a sequence of separable estimators M n {(M , Tn )} satisfies the weak consistency condition, the inequalities , θ1 , ε) ≤ β(M , θ1 ) ≤ α(M

inf

DM (θ θ1 )

sup

|θ−θ1 | >ε M:POVM

on

(64)

H

Jθ1 2

(65)

hold. Proof: Similarly to (35), the monotonicity of quantum relative entropy yields n

ωn ) − θ1 | > ε} log PM DM (θ θ1 ) + h(Pn ) θ1 {|Tn ( ≤ , − n nPn n

n

ωn ) − θ1 | > ε}. From the weak consistency, we have where Pn := PM θ {|Tn ( Pn → 1. Thus, we obtain (64) from (63). Since H is finite-dimensional, the set of extremal points of POVMs is compact. Therefore, the convergence limε→0 ε12 DM (θ1 + ε θ1 ) is uniform w.r.t. M . This implies that 1 ε→0 ε2 lim

sup M:POVM on

DM (θ1 + ε θ1 ) H

1 Jθ = sup (66) lim DM (θ1 + ε θ1 ) = 1 . ε→0 ε2 2 M:POVM on H The last equation is derived from (29). The preceding theorem holds for any adaptive estimator. As a simple extension, we can define an m-adaptive estimator that satisfies (62) when every ωl−1 ) is a POVM on Hm . As a corollary of Theorem 21, we have the Ml ( following. = {En } = Corollary 22: If a sequence of m-adaptive estimators M n {(M , Tn )} satisfies the weak consistency condition, then the inequalities , θ1 , ε) ≤ β(M

inf

|θ−θ1 | >ε

, θ1 ) ≤ Jθ1 α(M 2 hold.

sup M:POVM on

H⊗m

1 M D (θ θ1 ) m

(67) (68)

entire

December 28, 2004

13:56



445

Now, we obtain the equation lim lim

sup

m→∞ ε→0 M :m-AWC

Jθ 1 . β M , θ, ε) = ε2 2

(69)

The part of ≥ holds because an adaptive estimator attaining the bound is constructed in Theorem 15, and the part of ≤ follows from (67) and the equation lim

ε→0

=

sup M:POVM on

sup M:POVM on

H⊗m

lim

H⊗m

ε→0

1 DM (θ1 + ε θ1 ) ε2 m 1 ε2 m

DM (θ1 + ε θ1 ) =

Jθ1 , 2

which is proven in a similar manner as (66). 9. Difference in Order among Limits and Supremums Theorem 20 yields another operational comparison as 1 , θ, ε) = sup lim inf 2 β(M ε→0 ε M: SC at θ 1 , θ, ε) = lim sup β(M ε→0 ε2 M: SC at θ

Jθ 2

(70)

J˜θ . 2

(71)

Equation (70) equals (61) and equation (71) follows from Theorem 23. ˜ Therefore, the difference between J2θ and J2θ can be regarded as the difference in the order of lim inf ε→0 and supM : SC . This comparison was naively discussed by Nagaoka [31, 32]. Theorem 23: We adopt Assumption 1 in Theorem 15 and D(ρθ ρθ1 ) < m,δ = {M m,δ,n } of ∞ for ∀θ ∈ Θ. For any δ > 0, there exists a sequence M θ0 θ0 m-adaptive estimators satisfying the strong consistency condition and the inequality M m,δ,n −1 log Pθ0 θ0 {|θˆ − θ0 | > ε} n→∞ nm

lim

≥ (1 − δ) inf { D(θ θ0 )| |θ − θ0 | > ε} −

(1 − δ)(k − 1) log(m + 1) . m

However, using Theorem 23, we obtain a stronger equation than (71): lim lim

sup ε→0 m→∞ M:m-ASC at

θ

˜ 1 , θ, ε) = Jθ , β( M ε2 2

(72)

entire

December 28, 2004

13:56


Masahito Hayashi

446

where m-ASC at θ denotes m-adaptive and is strongly consistent at θ. This equation is in contrast with (69). Of course, the part of ≤ for (72) follows from (67). The part of ≥ for (72) is derived from the above theorem. The following two lemmas are essential for our proof of Theorem 23. Lemma 24: For two parameters θ1 and θ0 , the inequality m

mD(θ0 θ1 ) − (k − 1) log(m + 1) ≤ DEθ1 (θ0 θ1 ) ≤ mD(θ0 θ1 )

(73)

holds, where the PVM Eθm1 on H⊗m is defined in Appendix J.3. It is independent of θ0 . This lemma was proven by Hayashi [15] and can be regarded as an improvement of Hiai and Petz’s result [22]. However, Hiai and Petz’s original version is sufficient for our proof of Theorem 23. For the reader’s convenience, the proof is presented in Appendix J.3. Lemma 25: Let Y be a curved exponential family and X be an exponential family including Y . For a curved exponential family and an exponential family, see Chap 4 in Amari and Nagaoka [1] or Barndorff-Nielsen [5]. ML (ω n ) for the exponential In this setting, for n-i.i.d. data, the MLE TX,n family X is a sufficient statistic for the curved exponential family Y , where ωn := (ω1 , . . . , ωn ). Using the map T : X → Y , we can define an estimator ML , and for an estimator TY , there exists a map T : X → Y such that T ◦ TX,n ML . We can identify a map T from X to Y with a sequence of TY = T ◦ TX,n ML ( ωn ). We define the map Tθ0 : X → Y as estimators T ◦ TX,n Tθ0 := arg min{D(x θ)|D(θ θ0 ) ≤ D(x θ0 )}. θ∈Y

(74)

When Y is an exponential family (i.e., flat), Tθ0 coincides with the projection to Y . Then, the sequence of estimators corresponding to the map Tθ0 satisfies the strong consistency at θ0 and the equation −1 ML log pnθ0 { Tθ0 ◦ TX,n (ωn ) − θ0 > ε} lim n→∞ n (75) = inf {D(θ θ0 )| θ − θ0 > ε} θ∈Y

holds. Proof: It is well known that for any subset X ⊂ X, the equation 1 ML lim − log pnθ0 {TX,n (ωn ) ∈ X } = inf D(x θ0 ) n→∞ x∈X n

(76)

holds. For the reader’s convenience, we present a proof of (76) in Appendix K. Thus, equation (75) follows from (74) and (76). If Y is an exponential

entire

December 28, 2004

13:56



447

ML family, then the estimator Tθ0 ◦ TX,n coincides with the MLE and satisfies the strong consistency. Otherwise, we choose a neighborhood U of θ0 so that we can approximate the neighborhood U by the tangent space. The ML can be approximated by the MLE and satisfies the estimator Tθ0 ◦ TX,n strong consistency at U . Thus, it also satisfies the strong consistency at θ0 .

Proof of Theorem 23: Let M = {Mi } be a faithful POVM defined in section 7.2 such that the number of operators Mi is finite. For any m and any δ > 0, we define the POVM Mθm0 to be the disjoint random combination of M × m and Eθm0 with the ratio δ : 1 − δ. Note that a disjoint random combination is defined in section 4. From the definition of Mθm0 , the inequality m

m

(1 − δ)DEθ0 (θ θ) ≤ DMθ0 (θ θ)

(77) Mm Pθ θ0

holds. Since the map θ )→ PM is also θ is one-to-one, the map θ )→ one-to-one. Since M and Eθm0 are finite-resolutions of the identity, the oneMm

parameter family {Pθ θ0 |θ ∈ Θ} is a subset of multi-nominal distributions X, which is an exponential family. Applying Lemma 25, we have M m ×n −1 ML log Pθ0 θ0 {|Tθ0 ◦ TX,n (ωn ) − θ0 | > ε} n→∞ nm m 1 = inf {DMθ0 (θ θ0 ) |θ − θ0 | > ε} M θ∈Θ

m (1 − δ) inf DEθ0 (θ θ0 ) |θ − θ0 | > ε ≥ m (1 − δ)(k − 1) log(m + 1) , ≥ (1 − δ) inf { D(θ θ0 )| |θ − θ0 | > ε} − m where the first inequality follows from (77) and the second inequality follows from (73).

lim

Remark 26: In the case of the one-parameter equatorial spin 1/2 system Eθm0

state family, the map θ )→ Pθ not Eθm0 but Mθm0 .

is not one-to-one. Therefore, we must treat

Conclusions It has been clarified that SLD Fisher information Jθ gives the essential large deviation bound in the quantum estimation and KMB Fisher information J˜θ gives the large deviation bound of consistent superefficient estimators. ˜ Since estimators attaining the bound J2θ are unnatural, the bound J2θ is

entire

December 28, 2004

13:56


Masahito Hayashi

448

more important from the viewpoint of quantum estimation than the bound J˜θ 2 . On the other hand, as is mentioned in A, concerning a quantum analogue of information geometry from the viewpoint of e-connections, KMB is the most natural among the quantum versions of Fisher information. The interpretation of these two facts which seem to contradict each other, remains a problem. Similarly, it is a future problem to explain geometrically the relationship between the change of the orders of limits and the difference between the two quantum analogues of Fisher information. Acknowledgments The author wishes to thank Professor H. Nagaoka for encouragement and essential advice regarding this manuscript. He also wishes to thank Professor K. Matsumoto for useful advice regarding this manuscript. He is grateful to Professor S. Amari and Professor A. Tomita and two anonymous referees, whose comments helped to considerably improve the presentation. Appendix A. Brief Review of Information-geometrical Properties of Jθ , J˜θ and Jˇθ The quantum analogues of Fisher information Jθ , J˜θ and Jˇθ are obtained from the the inner products Jρ , J˜ρ and Jˇρ on the linear space consisting of self-adjoint operators: 6 1 ˜ B ρ1−t dt = B ˜B, J˜ρ (A, B) := Tr AL ρt L 0

Jρ (A, B) := Tr ALB , ˇB, Jˇρ (A, B) := Tr AL

1 (LB ρ + ρLB ) = B 2 ˇB B = ρL

in the following way: dρθ dρθ dρθ dρθ dρθ dρθ ˜ ˇ ˜ ˇ , , , Jθ = Jρθ , Jθ = Jρθ , Jθ = Jρθ . dθ dθ dθ dθ dθ dθ In the multi-dimensional case, these are regarded as metrics as follows. For example, we can define a metrics ∂ρθ ∂ρθ , (78) ∂i , ∂j = Jρθ ∂θi ∂θj on the tangent space at θ, and the RHS of (78) is called SLD Fisher matrix.

entire

December 28, 2004

13:56



449

In quantum setting, any information precessing is described by a tracepreserving CP (completely positive) map C : S(H) → S(H ). These inner product satisfy the monotonicity: dρθ dρθ dC(ρθ ) dC(ρθ ) , , Jρθ ≥ JC(ρθ ) dθ dθ dθ dθ dρθ dρθ dC(ρθ ) dC(ρθ ) ˜ ˜ , ≥ JC(ρθ ) , Jρθ dθ dθ dθ dθ dρθ dρθ dC(ρθ ) dC(ρθ ) ˇ ˇ Jρθ , , ≥ JC(ρθ ) dθ dθ dθ dθ for a one-parametric density family {ρθ ∈ S(H)|θ ∈ Θ ⊂ R} [1]. These inequalities can be regarded as the quantum versions of (5). An inner product satisfying the above is called a monotone inner product. According Petz [34], the inner product Jˇρ is the maximum one among normalized monotone inner products, and the inner product Jρ is the minimum one. In the information geometry community, we usually discuss the torsion. As is known within this community, α-connection is a generalization of e-connection. The torsion of α-connection concerning Fisher inner product vanishes in any distribution family [1]. In quantum setting, we can define the e-connections with respect to several quantum Fisher inner products. One may expect that in a quantum setting, its torsion vanishes in any density family. However, for only the inner product J˜ρ , the torsion of e-connection vanishes in any density family [1]. Thus, KMB Fisher information seems the most natural quantum analogue of Fisher information, from an informationgeometrical viewpoint. B. Proof of (15) From (14), we can calculate as D(ρθ+ε ρθ ) = Tr (ρθ+ε (log ρθ+ε − log ρθ )) d log ρθ 1 d2 log ρθ 2 dρθ ∼ ε ε+ ε = Tr ρθ + dθ dθ 2 dθ2

1 dρθ ˜ d2 log ρθ Lθ + Tr ρθ ε2 . = Tr ρθ L˜θ ε + Tr dθ 2 dθ2 Next, we calculate the above coefficients 6 1

dρθ dt = Tr Tr ρθ L˜θ = Tr ρtθ L˜θ ρ1−t = 0. θ dθ 0

(79)

(80)

entire

December 28, 2004

13:56


Masahito Hayashi

450

Using (80) and (14), we have dρθ d log ρθ d2 log ρθ d log ρθ d Tr ρθ = Tr ρ − Tr θ dθ2 dθ dθ dθ dθ dρθ ˜ Lθ = −J˜θ . (81) = − Tr dθ From (79), (80) and (81), we obtain 1 D(ρθ+ε ρθ ) ∼ = J˜θ ε2 . 2 C. Proof of Lemma 6 We define the unitary operator Uε as √ √ & % √ √ √ √ b2 (ρθ , ρθ+ε ) = 2 1 − Tr ρθ ρθ+ε = Tr( ρ − σUε )( ρ − σUε )∗ . √ Letting W (ε) be ρθ+ε Uε , then we have b2 (ρθ , ρθ+ε ) = Tr(W (0) − W (ε))(W (0) − W (ε))∗ ∗ dW dW dW dW ∼ (0)ε − (0)ε ∼ (0) (0)∗ ε2 . = Tr − = Tr dε dε dε dε As is proven in the following discussion, the SLD L satisfies dW 1 (0) = LW (0). dε 2

(82)

Therefore, we have 1 1 b2 (ρθ , ρθ+ε ) ∼ = Tr LW (0)W (0)∗ Lε2 = Tr L2 ρθ ε. 4 4 We obtain (38). It is sufficient to show (82). From the definition of the Bures distance, we have √ √ √ √ b2 (ρθ , ρθ+ε ) = min Tr( ρθ − ρθ+ε U )( ρθ − ρθ+ε U )∗ U:unitary √ √ √ √ = 2 − max Tr ρθ ρθ+ε U ∗ + U ρθ+ε ρθ U:unitary √ √ √ √ = 2 − Tr ρθ ρθ+ε + ρθ+ε ρθ %√ √ √ & √ = 2 − Tr ρθ ρθ+ε U (ε)∗ + U (ε) ρθ+ε ρθ , √ √ √ √ which implies that ρθ ρθ+ε U (ε)∗ = U (ε) ρθ+ε ρθ . Therefore, W (0)W (ε)∗ = W (ε)W (0)∗ . Taking the derivative, we have W (0)

dW dW (0)∗ = (0)W (0)∗ , dε dε

entire

December 28, 2004

13:56



451

which implies that there exists a self-adjoint operator L such that 1 dW (0) = LW (0). dε 2 Since ρθ+ε = W (ε)W (ε)∗ , we have dρ 1 (θ) = (LW (0)W (0)∗ + W (0)W (0)∗ L) . dθ 2 Thus, the operator L coincides with the SLD.

D. Proof of (43) Let M = {Mi } be an arbitrary POVM. We choose the unitary U satisfying ; U σ 1/2 ρ1/2 = ρ1/2 σρ1/2 . Using the Schwartz inequality, we have 9 9 PM (ω) PM ρ σ (ω) @ ∗ @ ∗ 1/2 1/2 ∗ 1/2 1/2 ∗ 1/2 1/2 Tr Mω ρ1/2 Mω σ U Mω ρ1/2 = Tr Mω σ U

∗ ≥ Tr Mω1/2 σ 1/2 U ∗ Mω1/2 ρ1/2 = Tr U σ 1/2 Mω ρ1/2 . Therefore, 9 9 M PM Tr U σ 1/2 Mω ρ1/2 Tr U σ 1/2 Mω ρ1/2 ≥ ρ (ω) Pσ (ω) ≥ ω ω ω ; = Tr U σ 1/2 ρ1/2 = Tr ρ1/2 σρ1/2 . E. Proof of Lemma 7 Let m and ε be an arbitrary positive integer and an arbitrary positive real number, respectively. There exists a sufficiently large integer N such that n 1 , θ, δ i + ε ˆ − θ| > δ i ≤ −β M log PM | θ θ n m m n δ 1 ˆ − (θ + δ)| > δ (m − i) ≤ −β M , θ + δ, log PM (m − i) +ε | θ θ+δ n m m

entire

December 28, 2004

13:56


Masahito Hayashi

452

for i = 0, . . . , m and ∀n ≥ N . From the monotonicity (42) and the additivity (39) of quantum affinity, we perform the following evaluation: n 1 ⊗n − I(ρθ ρθ+δ ) = − I(ρ⊗n θ ρθ+δ ) 8 8

12

12 n n ˆ θˆ ≤ θ PM ≤ log PM θ θ+δ θ ≤ θ

12

1 n ˆ 2 θ + δ < θˆ PM θ+δ θ + δ < θ 12 m δ δ Mn ˆ + Pθ θ + (i − 1) < θ ≤ θ + i m m i=1 12 ! δ δ Mn ×Pθ+δ θ + (i − 1) < θˆ ≤ θ + i m m +PM θ

n

12

12 n n ˆ ˆ ≤ log PM + PM θ − θ > δ θ+δ θ − (θ + δ) ≥ δ θ 12 δ ˆ θ − θ > (i − 1) m i=1 12 ! δ ˆ Mn ×Pθ+δ θ − (θ + δ) ≥ (m − i) m 1

12 2 n n δ ˆ ˆ (m − 1)δ + PM ≤ log PM θ − θ > δ θ+δ θ − (θ + δ) > θ m 12 m δ Mn ˆ + Pθ θ − θ > (i − 1) m i=1 12 ! n δ ˆ (m − i − 1) ×PM θ+δ θ − (θ + δ) > m n , θ, δ (m − 1) − ε ≤ log exp − β M 2 m

n , θ + δ, δ − ε β M + exp − 2 m δ n exp − + β M , θ, (i − 1) − ε 2 m i=1 ! n δ , θ + δ, (m − i − 1) − ε − β M 2 m +

m

PM θ

n

entire

December 28, 2004

13:56



453

δ n ≤ log(m + 2) exp − min β M , θ, (i − 1) 2 0≤i≤m m δ + β M , θ + δ, (m − i − 1) − 2ε m n , θ, δ (i − 1) min β M = log(m + 2) − 0≤i≤m 2 m ! δ +β M , θ + δ, (m − i − 1) − 2ε , m , θ, a) = 0 for any negative real number a. Taking where we assume that β(M the limit n → ∞ after dividing by n, we have 1 I(ρθ ρθ+δ ) 8 δ δ 1 ≥ min β M , θ, (i − 1) + β M , θ + δ, (m − i − 1) − 2ε . 2 0≤i≤m m m Since ε > 0 is arbitrary, the inequality 1 I(ρθ ρθ+δ ) 8 δ δ 1 ≥ min β M , θ, (i − 1) + β M , θ + δ, (m − i − 1) 2 0≤i≤m m m holds. Taking the limit m → ∞, we obtain (44). K. Large Deviation Theory for an Exponential Family In this section, we review the large deviation theory for an exponential family. A d-dimensional probability family is called an exponential family if there exist linearly independent real-valued random variables F1 , . . . , Fd and a probability distribution p on the probability space Ω such that the family consists of the probability distribution d ! i pθ ( dω) := exp θ Fi (ω) − ψ(θ) p( dω) 6

i=1

ψ(θ) := log

exp Ω

d

! i

θ Fi (ω) p( dω).

i=1

In this family, the parametric space is given by Θ := {θ ∈ Rd |0, < ψ(θ) < ∞}, the parameter θ is called the natural parameter and the function ψ(θ)

entire

December 28, 2004

13:56


Masahito Hayashi

454

is called the potential. We define the dual potential φ(θ) and the dual parameter η(θ), called the expectation parameter, as 6 ∂ψ(θ) = log Fi (ω)pθ ( dω) (83) ηi (θ) := ∂θi Ω ! d i φ(θ) := max θ ηi (θ) − ψ(θ ) . θ

i=1

From (83), we have φ(θ) =

d

θi ηi (θ) − ψ(θ).

i=1

In this family, the sufficient statistics are given by F1 (ω), . . . , Fd (ω). The ˆ ˆ = Fi (ω). The KL divergence D(θ θ0 ) := MLE θ(ω) is given by ηi (θ(ω)) D(pθ pθ0 ) is calculated by 6 pθ (ω) pθ ( dω) log D(θ θ0 ) = pθ0 (ω) Ω 6 (θi − θ0i )Fi (ω) + ψ(θ0 ) − ψ(θ)pθ ( dω) = Ω

=

i

(θi − θ0i )ηi (ω) + ψ(θ0 ) − ψ(θ) = φ(θ) + ψ(θ0 ) −

i

= max

θ

= max θ

i

+ ψ(θ0 ) −

i

i

(θ −

i θ 0 )ηi (θ)

6 − log

exp Ω

i

θ0i ηi (ω)

i

! θ ηi (θ) − ψ(θ )

θ0i ηi (θ)

i

! (θ − i

θ0i )Fi (ω)

pθ ( dω).

i

Next, we discuss the n-i.i.d. extension of the family {pθ |θ ∈ Θ}. For the data ωn := (ω1 , . . . , ωn ) ∈ Ωn , the probability distribution pnθ (ωn ) := pθ (ω1 ) . . . pθ (ωn ) is given by ! n i ωn ) = exp n θ Fn,i (ωn ) − nψ(θ) pn ( dωn ) pθ ( i

1 Fi (ωk ). n n

pn ( dωn ) := p( dω1 ) . . . p( dωn ),

Fn,i (ωn ) :=

k=1

Since the expectation parameter of the probability family {pnθ |θ ∈ Θ} is given by nηi (θ), the MLE θˆn (ωn ) is given by nηi (θˆn (ωn )) = nFn,i (ωn ).

(84)

entire

December 28, 2004

13:56



455

Applying Cramér’s Theorem [7] to the random variables F1 , . . . , Fd and the distribution pθ0 , for any subset S ⊂ Rd we have ! i −1 log pnθ0 {Fn ∈ S} θ (ηi − Eθ0 (Fi )) − ψθ0 (θ ) ≤ lim inf sup n→∞ n η∈S θ ∈Rd i ! i ≤ inf sup θ (ηi − Eθ0 (Fi )) − ψθ0 (θ ) , η∈intS θ ∈Rd

where

i

6

6

Eθ0 (Fi )) :=

Fi (ω)pθ ( dω),

ψθ0 (θ) :=

Ω

exp Ω

! i

θ Fi (ω) pθ ( dω)

i

Fn ( ωn ) := (Fn,1 ( ωn ), . . . , Fn,d (ωn )), and intS denotes the interior of S, which is consistent with (S c )c . Since ! i sup θ (ηi − Eθ0 (Fi )) − ψθ0 (θ ) θ ∈Rd

i

= sup θ ∈Rd

! i

θ (ηi − ηi (θ0 )) − ψ(θ )

+ ψ(θ0 ) = D(θ θ0 )

i

and the map θ )→ D(θ θ0 ) is continuous, it follows from (84) that lim

n→∞

−1 log pnθ0 {θˆn ∈ Θ } = inf D(θ θ0 ) θ∈Θ n

for any subset Θ ⊂ Θ, which is equivalent to (76). Conversely, if an estiωn )} satisfies the weak consistency mator {Tn ( lim pn { Tn ( ωn ) n→∞ θ

− θ > ε} → 0,

∀ε > 0, ∀θ ∈ Θ,

then, similarly to (33), we can prove lim

n→∞

−1 log pnθ0 {Tn (ωn ) ∈ Θ } ≤ inf D(θ θ0 ). θ∈Θ n

Therefore, we can conclude that the MLE is optimal in the large deviation sense for exponential families. References [1] S. Amari and H. Nagaoka, Methods of information geometry, AMS & Oxford University Press, 2000. [2] R.R. Bahadur, Sankhy¯ a, 22, 229, (1960). [3] R.R. Bahadur, Ann. Math. Stat., 38, 303, 1967.

entire

December 28, 2004

456

13:56


Masahito Hayashi

[4] R.R. Bahadur, Some limit theorems in statistics, Regional Conf. Series in Applied Mathematics, 4, SIAM, Philadelphia, 1971. [5] O.E. Barndorff-Nielsen, Information and exponential families in statistical theory, Wiley, New York, 1978. [6] R. Bhatia, Matrix analysis, Springer, New York, 1997. [7] J.A. Bucklew, Large deviation techniques in decision, simulation and estimation, John Wiley & Sons, New York, 1990. ˇ [8] N.N. Cencov, Statistical decision rules and optimal inference, American Mathematical Society, Rhode Island, U.S.A., 1982 (Originally published in Russian, Nauka, Moscow, 1972). [9] J.C. Fu, Ann. Stat. 1, 745, (1973). [10] C.A. Fuchs, Distinguishability and accessible information in quantum theory, Ph. D. Dissertation, University of New Mexico, 1995; quant-ph/9601020. [11] R. Gill and S. Massar, Phys. Rev. A, 61, 042312, 2000. [12] R. Goodman and N. Wallach, Representations and invariants of the classical groups, Cambridge University Press, Cambridge, 1998. [13] M. Hayashi quant-ph/9704044, 1997. [14] M. Hayashi, “A linear programming approach to attainable Cramer-Rao type bound,” in O. Hirota, A.S. Holevo, and C.M. Caves, eds., Quantum communication, computing, and measurement, Plenum Publishing, 99-108, 1997. (Chap. 12 of this book). [15] M. Hayashi, J. Phys. A: Math. and Gen., 34, 3413, (2001) (Originally appeared in quant-ph/9704040). (Chap. 5 of this book). [16] M. Hayashi, “Exponents of quantum fixed-length pure state source coding,” to be published in Phys. Rev. A, (2002)∗ ; quant-ph/0202002. [17] M. Hayashi and K. Matsumoto, “Statistical model with a option for measurements and quantum mechanics,” RIMS koukyuroku, 1055, 96-110, 1998 (In Japanese). (Chap. 13 of this book). [18] M. Hayashi and K. Matsumoto, IEICE, J83-A, 6, 629, (2000) (In Japanese). [19] M. Hayashi and K. Matsumoto, Quantum universal variable-length source coding, quant-ph/0202001, to be published in Phys. Rev. A, (2002)† . [20] C.W. Helstrom, Phys. Lett. 25A, 101, (1967). [21] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York, 1976. [22] F. Hiai and D. Petz, Commun. Math. Phys. 143, 99–114, (1991). (Chap. 3 of this book) [23] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, Amsterdam, 1982. [24] A.S. Holevo, IEEE Trans. Inform. Theory, 44, 269, (1998); quantph/9611023. [25] R. Jozsa, J. Mod. Opt., 41, 2315, (1994). [26] K. Keyl and R.F. Werner, Phys. Rev. A, 64, 052311, (2001); quantph/0102027. (Chap. 29 of this book) ∗ Editorial † Editorial

note: The paper [16] has been published as Phys. Rev. A 66, 032321 (2002). note: The paper [19] has been published as Phys. Rev. A 66, 022311 (2002).

entire

December 28, 2004

13:56



457

[27] R. Kubo, M. Toda, and N. Hashitsume, Statistical physics II, nonequilibrium statistical mechanics, Springer Series on Solid State Sciences, SpringerVerlag, 1991. [28] L. LeCam, Asymptotic methods in statistical decision theory, SpringerVerlag, New York, 1986. [29] G. Lindblad, Commun. Math. Phys., 40, 147, (1975). [30] K. Matsumoto, Geometry of quantum states, master thesis, the University of Tokyo, 1995 (In Japanese). [31] H. Nagaoka, “An asymptotically efficient estimator for a one-dimensional parametric model of quantum statistical operators,” In Proc. of IEEE International Symposium on Information Theory, 198, (1988). [32] H. Nagaoka, “On the relation between Kullback divergence and Fisher information — from classical systems to quantum systems —,” Proceedings of Joint Mini-workshop for data compression theory and fundamental open problems in information theory, 63-72, (1992) (in Japanese). (Chap. 27 of this book). [33] T. Ogawa and H. Nagaoka, IEEE Trans. Inform. Theory, 46, 2428-2433, (2000) (Chap. 2 of this book); quant-ph/9906090, 1999. [34] D. Petz, Linear algebra and its applications, 224, 81, (1996). [35] D. Petz and C. Sud´ ar, “Extending the Fisher metric to density matrices,” In O.E. Barndorff-Nielsen and E.B.V. Jensen, eds., Geometry in Present Day Science, 21, World Scientific, Singapore, 1998. [36] D. Petz and G. Toth, Lett. Math. Phys., 27, 205 (1993). [37] B. Schumacher, Phys. Rev A, 51, 2738 (1995). [38] B. Schumacher and M.D. Westmoreland, Phys. Rev. A, 56, 131 (1997). [39] A. Uhlmann, Commun. Math. Phys., 54, 21 (1977). [40] A. Uhlmann, Rep. Math. Phys., 33, 253–263 (1993). [41] H. Weyl, The classical groups, their invariants and representations, Princeton University Press, Princeton, NJ, 1939. [42] A. Winter, “Coding theorems of quantum information theory,” Ph.D. dissertation, Uni Bielefeld, 2000; quant-ph/9907077.

entire

December 28, 2004

13:56


CHAPTER 29

Estimating the Spectrum of a Density Operator Michael Keyl and Reinhard F. Werner Abstract. Given N quantum systems prepared according to the same density operator ρ, we propose a measurement on the N -fold system that approximately yields the spectrum of ρ. The projections of the proposed observable decompose the Hilbert space according to the irreducible representations of the permutations on N points, and are labeled by Young frames, whose relative row lengths estimate the eigenvalues of ρ in decreasing order. We show convergence of these estimates in the limit N → ∞, and that the probability for errors decreases exponentially with a rate we compute explicitly.

1. Introduction The density operator of a quantum system describes the preparation of the system in all details relevant to statistical experiments. Like a classical probability distribution it cannot be measured on a single system, but can be estimated only on an ensemble sequence of identically prepared systems. In fact, if we could determine the density operator (or, in the pure case, the wave function) on a single quantum system, we could combine the measurement with a device repreparing several systems with the measured density operator, in contradiction to the well-known no-cloning theorem [22]. This points to a close connection between the problem of estimating the density operator and approximate cloning. In the case of inputs promised to be in a pure state the optimal solutions to both problems are known [20, 11, 2], and it turns out that in a sense the limit of the cloning problem for output number M → ∞ is equivalent to the estimation problem. The “optimal” cloning transformation was shown in this case to be quite insensitive to the figure of merit defining optimality [11]. In the case of mixed input states much less is known about the cloning problem. It is likely that in this case there may be different natural figures Reprinted (abstract/excerpt/figure) with permission from M. Keyl and R.F. Werner, c Phys. Rev. A 64, 052311, 2001. 2001 by The American Physical Society 458

entire

December 28, 2004

13:56


Estimating the Spectrum of a Density Operator

459

of merit leading to inequivalent “optimal” solutions. Even the classical version of the problem is not trivial, and is related to the so-called bootstrap technique [6] in classical statistics. The estimation problem certainly has many solutions. In fact, any procedure of determining the density matrix through the measurement of the expectations of a suitable “quorum” of observables [18], such as in quantum state tomography [13] is a solution. Other methods include adaptive schemes [8] where the result of one measurement is used to select the next one. In all these cases, the estimate amounts to the measurement of an observable on the full input state ρ⊗N , which factorizes into one-site observables. What we are concerned with here, as in the work of Vidal et al. [17], is the search for improved estimates, admitting arbitrary observables on the N -fold input system, including “entangled” ones. In contrast to [17], however, we are not interested in estimators that are optimal for a more or less general figure of merit, but in the asymptotic behavior if the number N of input systems goes to infinity (in this context see also the work of Gill and Massar [9]). When H ∼ = Cd is the Hilbert space of a single system, the overall input density operator of the estimation problem is ρ⊗n , which exists on the N th tensor power H⊗N . This space has a natural orthogonal decomposition according to the irreducible representations of the permutation group of N points, acting as the permutations of the tensor factors. Equivalently, this is the decomposition according to the irreducible representations of the unitary group on H (see below). It is well known that this orthogonal decomposition is labeled by Young frames, i.e., by the arrangements of N boxes into d rows of lengths Y1 ≥ Y2 ≥ · · · ≥ Yd ≥ 0 with α Yα = N . There is a striking similarity here with the spectra we want to estimate, which are given by sequences of the eigenvalues of ρ, say, r1 ≥ r2 ≥ · · · ≥ rd ≥ 0, with α rα = 1. The basic idea of this paper is to show that this is not a superficial similarity: measuring the Young frame (by an observable whose eigenprojections are the projections in the orthogonal decomposition) is, in fact, a good estimate of the spectrum. More precisely, we show that the probability for the error |Yα /N − rα | to be larger than a fixed ε for some α decreases exponentially as N → ∞. The group theoretic ideas just sketched are nothing new but go back to Weyl [21] and are in the meantime a standard tool within quantum mechanics. Examples of works where similar methods are used in quantum information are [4, 17, 1, 10, 12]. In particular, [12] is closely related to the present paper because similar techniques are used there. This concerns in

entire

December 28, 2004

460

13:56


Michael Keyl and Reinhard F. Werner

particular the theory of large deviations[7], and a result by Duffield [5] on the large deviation properties of tensor powers of group presentations. This will allow us to compute the rate of exponential convergence explicitly. 2. Statement on the Result In order to state our result, explicitly, we need to recall the decomposition theory for N -fold tensor products. Throughout, the one-particle space H will be the d-dimensional Hilbert space Cd , with d < ∞. Two group representations play a crucial role: first, the representation X )→ X ⊗N of the general linear group GL(d, C) and, secondly, the representation p )→ Sp of the permutations p ∈ SN on N points, represented by permuting the tensor factors: Sp ψ1 ⊗ · · · ⊗ ψN = ψp−1 1 ⊗ · · · ⊗ ψp−1 N .

(1)

The basic result [15] is that these two representations are “commutants” of each other, i.e., any operator on H⊗N commuting with all X ⊗N is a linear combination of the Sp , and conversely. This leads to the decomposition H⊗N ∼ = ⊕ RY ⊗ SY ,

(2)

X ⊗N ∼ = ⊕ π Y (X) ⊗ 1,

(3)

Y Y

ˆ Y (p), Sp ∼ = ⊕1⊗π

(4)

Y

ˆ Y : SN → B(SY ) are irreducible where π Y : GL(d, C) → B(RY ) and π representations, and the restriction of πY to unitary operators is unitary. The summation index Y runs over all Young frames with d rows and N boxes, as described in the Introduction. We denote by PY the projection onto the corresponding summand in the above decomposition. Let us consider now the estimation problem. As already discussed in the Introduction, we are searching for an observable EN describing a measurement on N d-level systems, whose readouts are possible spectra of d-level density operators. The set of possible spectra will be denoted by Σ=

d s ∈ Rd x ✄ 0, xj = 1 ,

(5)

j=1

where x ✄ 0 denotes the ordering relation on Rd given by s ✄ 0 :⇔ sj > sj+1 for all j = 1, . . . , d − 1.

(6)

entire

December 28, 2004

13:56



461

Technically, EN must be a positive operator-valued measure on this set, assigning to each measurable subset ∆ ⊂ Σ a positive operator EN (∆) ∈ B(H⊗N ), whose expectation in any given state is interpreted as the probability for the measurement to yield a result s ∈ ∆. The criterion for a good estimator EN is that, for any one-particle density operator ρ, the value measured on a state ρ⊗N is likely to be close to the true spectrum r ∈ Σ of ρ, i.e., that the probability KN (∆) := Tr [EN (∆)ρ⊗N ]

(7)

is small when ∆ is the complement of a small ball around r. Of course, we will look at this problem for large N . So our task is to find a whole sequence of observables EN , N = 1, 2, . . ., making error probabilities like Eq. (7) go to zero as N → ∞. The search for efficient estimation strategies EN can be simplified greatly by symmetry arguments. To see this, consider a permutation p ∈ SN . If we insert the transformed estimator Sp EN (∆)Sp∗ into Eq. (7) we see immediately that KN (∆) remains unchanged. Replacing EN (∆) by the av erage N !−1 p∈SN Sp EN (∆)Sp∗ shows that we may assume [EN (∆), Sp ] = 0 for all permutations p, without loss of estimation quality. A similar argument together with the fact that the quality of the estimate is judged by some criterion not depending on the choice of a basis in H shows that we may assume in addition that EN (∆) commutes with all unitaries U ⊗N . But this implies according to Eqs. (3) and (4) that EN (∆) must be a function of the projection operators PY : H⊗N → RY ⊗ SY defined at the beginning of this section. If we require in addition that each EN (∆) be a projection, which is suggestive for ruling out unnecessary fuzziness, EN must be of the form EN (∆) = PY , (8) Y :sN (Y )∈∆

where sN is an arbitrary mapping assigning to each Young frame Y (with d rows and N boxes) an estimate sN (Y ) ∈ Σ. In other words, the estimation proceeds by first measuring the Young frame projections PY and then computing an estimate sN (Y ) on the basis of the result Y . The simplest choice is clearly to take the normalized Young frames themselves as the estimate, i.e., sN (Y ) = Y /N.

(9)

It turns out somewhat surprisingly that with this choice the EN (∆) from Eq. (8) form an asymptotically exact estimator. By this we mean that, for

entire

December 28, 2004

462

13:56



Fig. 1. Probability distribution Tr (ρ⊗N PY ) for d = 3, N = 20, 100, 500, and r = (0.6, 0.3, 0.1). The set Σ is the triangle with corners A = (1, 0, 0), B = (1/2, 1/2, 0), C = (1/3, 1/3, 1/3).

every ρ, the probability measures KN from Eq. (7) converge weakly to the point measure at the spectrum r of ρ. Explicitly, for each continuous function f on Σ we have 6 Y f (s)KN (ds) = lim f (10) lim Tr (ρ⊗N PY ) = f (r). N →∞ Σ N →∞ N Y

We illustrate this in Fig. 1, for d = 3 and ρ a density operator with spectrum r = (0.6, 0.3, 0.1). Then Σ is a triangle with corners A = (1, 0, 0), B = (1/2, 1/2, 0), and C = (1/3, 1/3, 1/3), and we plot the probabilities Tr (ρ⊗N PY ) over Y /N ∈ Σ. The explicit computation uses the Weyl character formula (Chap. 9, Sec. 9.1 [15]), which we do not need elsewhere in the paper. Clearly, the distribution is peaked at the true spectrum and our claim

entire

December 28, 2004

13:56



463

is that this will become exact in the limit N → ∞. To prove convergence we will use large deviation methods which give us not only the convergence just stated but an exponential error estimate of the form KN (∆) ≈ exp(−N inf I(s)), s∈∆

(11)

where I denotes a positive function on Σ, called the rate function, which vanishes only for s = r. For the statement of the main theorem we say that a measurable subset ∆ ⊂ Σ has “small boundary” if its interior is dense in its closure. A typical choice for ∆ is the complement of a ball around the true spectrum. Theorem 1: The estimator defined in Eqs. (8) and (9) is asymptotically exact. Moreover, we have the error estimate lim

N →∞

1 ln KN (∆) = − inf I(s) s∈∆ N

(12)

for any set ∆ ⊂ Σ with small boundary, where the rate function I : Σ → [0, ∞] is sj (ln sj − ln rj ). (13) I(s) = j

The expression for I is the relative entropy [14] of the probability vectors s and r. Relative entropies occur also as the rate functions in large deviation properties of independent identically distributed (classical [3] or quantum [19]) random variables, although there seems to be no direct way to reduce the above theorem to these standard setups. 3. Sketch of Proof Rather than giving a proof of every detail, our aim here is to explain why the scaled Young frames Y /N appear in the estimation problem. The crucial observation is that the Young frame (Y1 , . . . , Yd ) is the highest weight of the representation π Y in the ordering ✄ and this ordering is directly related to picking out the fastest growing exponential in certain integrals of the measurement KN . The integrals we need to study are the Laplace transforms of the measures KN . We introduce the “scaled cumulant generating function” 6 1 KN (ds)eN η·s , (14) c(η) = lim ln N →∞ N Σ

entire

December 28, 2004

464

13:56



where η ∈ Rd , and η · s is the scalar product. If the measures KN behave like Eq. (11) the integrand near s behaves like exp N [η · s − I(s)], and the largest contribution comes from the fastest growing exponential: c(η) = sup[η · s − I(s)].

(15)

s

This is an instance of Varadhan’s theorem [16], which has a converse, the G¨ artner-Ellis theorem (Theorem II.6.1 [7]): if the limit (14) exists and is differentiable then the estimate in the theorem holds, with the rate function determined from Eq. (15) by inverse Legendre transformation. We will follow Duffield [5] by computing the limit (14) from group theoretical data. Consider the “maximally Abelian subgroup” C ⊂ GL(d, C) of diagonal matrices ρh = diag[exp(h1 ), . . . , exp(hd )]

(16)

for h ∈ Cd . Since these commute, all the operators π Y (ρh ) commute in every representation π Y , and can hence be simultaneously diagonalized. The vectors µ = (µ1 , . . . , µd ) such that π Y (ρh )ψ = exp(µ · h)ψ for some nonzero vector ψ are called weights of the representation πY . The dimension m(µ) of the corresponding eigenspace is called multiplicity of µ. One particular weight (with multiplicity 1) is the Young frame Y itself (interpreted as an element of Rd ) and it turns out that Y is the maximum (the “highest weight”) among all weights of π Y , in the ✄ ordering from Eq. (6). Representation theory of semisimple Lie algebras [15] shows that each irreducible, analytic representation of GL(d, C) is uniquely characterized (up to unitary equivalence) by its highest weight Y . In order to estimate the integral (14), we need the quantities tr(ρ⊗N PY ). For simplicity we assume that ρ is nonsingular, i.e., an element of GL(d, C). By Eq. (3) we have Tr (ρ⊗N PY ) = Tr [π y (ρ) ⊗ 1] = χY (ρ)dim(SY ),

(17)

χY (ρ) := tr[π Y (ρ)]

(18)

where

is the character of the representation π Y . Since χY is unitarily invariant [χY (U ρU ∗ ) = χY (ρ)] we may assume that without loss of generality that ρ is diagonal and its matrix elements are arranged in descending order. Using the notation from Eq. (16) this assumption reads exp(hj ) = 1. (19) ρ = ρh ∈ C with h ✄ 0 and j

entire

December 28, 2004

13:56



Hence we can express χY (ρ) in terms of the weights of π Y : χY (ρ) = m(µ) exp(µ · h),

465

(20)

µ

where the sum is taken over all weights µ of π Y . Since h ✄ 0 and Y ✄ µ for all µ we see that exp(Y ·h) is the largest exponential. We therefore estimate exp(Y · h) ≤ χY (ρ) ≤ dim(RY ) exp(Y · h).

(21)

Hence, if we introduce for any h, η ∈ Rd , h, η ✄ 0 the two expressions 6 J(h, η) = KN (ds)eN η·s (22) Σ N η·Y /N = Tr (ρ⊗N (23) h PY )e Y

=

χY (ρh )eη·Y dim(SY )

(24)

Y

and J (h, η) =

e(h+η)·Y dim(SY )

(25)

Y

we get J (h, η) ≤ J(h, η) ≤ dim(RY )J (h, η).

(26)

If we combine this with the consequence of Weyl’s dimension formula that dim(RY ) is bounded above by a polynomial p(N ) in N , uniformly in Y (Lemma 2.2 [5]), and take logarithms we get ln J (h, η) ≤ ln J(h, η) ≤ const × ln N + ln J (h, η).

(27)

Since J (h, η) grows exponentially in N its logarithm is linear in N and we see that J(h, η) and J (h, η) are asymptotically equivalent in the sense that (1/N )[ln J(h, η) − ln J (h, η)] → 0.

(28)

In the same sense we can continue the chain of equivalences J(h, η) ≈ J (h, η) = J (h + η, 0) ≈ J(h + η, 0) 6 = KN (ds).

(29) (30)

Σ

Here we have Eq. (22) for J(h+ η, 0), and the h+ η dependence is contained in KN (ds) via ρh+η . Together with the definition of KN in Eq. (7) this

entire

December 28, 2004

13:56



466

implies

6 J(h, η) ≈ Σ

KN (ds) = KN (Σ) = Tr (EN (Σ)ρ⊗N h+η )

N = Tr (ρ⊗N h+η ) = (trρh+η ) .

(31) (32)

Hence, if rα = exp(hα ) are the eigenvalues of a nonsingular density operator, we get for Eq. (14) the expression rα exp(ηα ). (33) c(η) = ln α

It is then a simple calculus exercise to verify the above rate function as the Legendre transform I(s) = supη [η · s − c(η)]. This concludes our sketch of proof. In order to expand it into a full proof, one needs to extend the computation of c(η) to η " ✄0, and prove that this extension has the required regularity properties for the application of the converse Varadhan’s theorem cited above. This has been carried out by Duffield [5] in a context that is, on the one hand wider, because it includes tensor powers of much more general representations of semisimple Lie groups, but on the other hand narrower, because it contains only the case ρ = d−1 1 of our theorem. However, one can extend Duffield’s result by multiplying his measures KN by the factor χY (ρ)/χY (1) and using for this factor the estimate (21). 4. Discussion Although the estimate we discuss is asymptotically exact, it is not at all clear whether and in what sense it might be optimal, even for finite N . We have experimented with various figures of merit for estimation and found different “optimal” estimators for low N , rarely coinciding with the EN determined by Eq. (9). It is also not at all clear how much could be gained by optimization here. An interesting extension will also be the construction of estimators for the full density operator. It is very suggestive to compose this out of the above estimator for the spectrum, and to use for each Young frame a covariant observables to estimate the eigenbasis of ρ. The density of the covariant observable might be based on the highest weight vector of π Y . Acknowledgements Funding by the European Union project EQUIP (Contract No. IST1999-11053) and financial support from the DFG (Bonn) are gratefully acknowledged.

entire

December 28, 2004

13:56



467

References [1] A. Ac´ın, R. Tarrach, and G. Vidal, Phys. Rev. A, 61, 062307 (2000). [2] D. Bruß and C. Macchiavello, Phys. Lett. A, 253, 249 (1999). [3] H. Cramér, Colloque consacré ` a la théorie des probabilités, 736 of Actualités scientifique et industrielles, 5-23, Hermann, Paris, 1938. [4] J.I. Cirac, A.K. Ekert, and C. Macchiavello, Phys. Rev. Lett., 82, 4344 (1999). [5] N.G. Duffield, Proc. Am. Math. Soc. 109, 503 (1990). [6] B. Efron and R.J. Tibshirani, An Introduction to the Bootstrap (Chapman and Hall, New York, 1993). [7] R.S. Ellis, Entropy, Large Deviations, and Statistical Mechanics Springer, Berlin, 1985. [8] D.G. Fischer and M. Freyberger, Phys. Lett. A, 273, 293 (2000). [9] R.D. Gill and S. Massar, Phys. Rev. A, 61, 042312, (2000). (Chap. 15 of this book). [10] N. Gisin, quant-ph/0010036, 2001. [11] M. Keyl and R.F. Werner, J. Math. Phys., 40, 3283, (1999). [12] M. Keyl and R.F. Werner, Ann. Inst. Henri Poincar´ e, 2, 1 (2001). [13] U. Leonhardt, Measuring the quantum state of light, Cambridge University Press, Cambridge, 1997. [14] M. Ohya and D. Petz, Quantum Entropy and Its Use (Springer, Berlin, 1993). [15] B. Simon, Representations of finite and compact groups, American Mathematical Society, Providence, RI, 1996. [16] S.R.S. Varadhan, Commun. Pure Appl. Math., 19, 261 (1966). [17] G. Vidal, J.I. Latorre, P. Pascual, and R. Tarrach, Phys. Rev. A, 60, 126 (1999). [18] S. Weigert, in H.D. Doebner, S.T. Ali, M. Keyl, and R.F. Werner, eds., Trends in Quantum Mechanics, 146-156, World Scientific, Singapore, 2000. [19] R. Werner, in L. Accardi, ed., Quantum Probability and Related Topics, VII, 349-381, World Scientific, Singapore, 1992. [20] R. Werner, Phys. Rev. A, 58, 1827 (1998). [21] H. Weyl, The classical groups, Princeton University, Princeton, NJ, 1946. [22] W.K. Wootters and W.H. Zurek, Nature, London, 299, 802 (1982).

entire

December 28, 2004

468

13:56



entire

December 28, 2004

13:56


PART VI Further Topics on Quantum Statistical Inference

Chap. 31: V. Buˇzek, R. Derka, and S. Massar “Optimal quantum clocks” . . . . . . . . . . . . . . . . . . . . . . . . . . . 477 Chap. 32: A. Fujiwara “Quantum channel identification problem” . . . . . . . . . . . . . . . . . 487 Chap. 33: G.M. D’Ariano “Homodyning as universal detection” . . . . . . . . . . . . . . . . . . . . 494 Chap. 34: D.F.V. James, P.G. Kwiat, W.J. Munro, and A.G. White “On the measurement of qubits” . . . . . . . . . . . . . . . . . . . . . . . 509

entire

December 28, 2004

13:56


entire

December 28, 2004

13:56


CHAPTER 30

Introduction to Part VI 1. Estimation of Suantum State-Evolution Process The purpose of quantum statistical inference is not limited to the estimation of an unknown state, but quantum statistical inference can be applied to estimation of an unknown quantum state-evolution process. For this type of estimation, we assume that we are able to choose suitable input states as well as quantum measurements. If there is no noise, the state-evolution process is described as an action of a unitary matrix. Buˇzek, Derka, and Massar [Chap. 31] focused on the estimation of a one-parameter unknown iθ/2

0 that can operate on n systems simultaneously. unitary action e 0 e−iθ/2 Using Helstrom and Holevo’s formula [IV-1, IV-3], they showed that the θˆ π2 optimal estimation error sin2 θ− 2 goes to 0 with the rate 4n2 . Their optimal input state is entangled between all identical input systems. Fujiwara [VI-1] focused on that of a single unknown SU(2) action, and analyzed the merit of the input state entangled with the reference system. This problem was extended by Ballester [VI-2]. Ac´ın et al. [VI-3] also discussed related topics. Bagan et al. [VI-4] derived the optimal estimation of the unknown n-identical SU(2) operations with use of entanglement with the reference system. They also showed that the optimal error goes to 0 with the rate of π2 ∗ n2 . They used effectively applied the Clebsch-Gordan coefficients method to this problem. Hayashi [VI-5] also treated the same problem by a diffenrent method. He derived a notable relation between this problem and that of Buˇzek et al., and applied the obtained relation to this problem. Furthermore, Bagan et al. [VI-6], and Chiribella et al. [VI-7] independently showed that the estimation error goes to 0 with the same rate. These papers pointed out that the multiplicity of the same irreducible representations can be regarded as the reference system, i.e., the effect of ‘self-entanglement’.

∗ They

ˆ −1 ) goes to 0 with the rate proved that the error 6 − 2χ1 (U U

8π 2 , where χj (g) n2 ˆ −1 ) 1−χ2 1/2 (U U

denotes the trace of g in the spin-j representation. If we focus on the error the error goes to 0 with the rate

π2 n2

because χ1/2 471

(g)2

= 1 + χ1 (g).

4

,

entire

December 28, 2004

13:56

472


Introduction to Part VI

Moreover, Fujiwara [Chap. 32] focused on the estimation of a depolarizing channel, which is a typical example of noisy quantum state evolutions. By the careful discussion of Fisher information, he proved the advantage of the input state entangled with a reference system. He also discussed a slightly different setting. After this result, Sasaki et al. [VI-8] discussed a similar estimation problem with the Bayesian approach in a non-asymptotic setting. Fischer et al. [VI-9] focused the use of the maximally entangled input state for the estimation of the Pauli channel. Fujiwara and Imai [VI-10] and Hayashi† proved that the maximally entangled input state yields the optimal estimation of the unknown n-identical Pauli channel operations, by different methods. Furthermore, Fujiwara and Imai [VI-10] proved that only a maximally entangled input state realizes the optimal estimation in this problem. Based on this result, Fujiwara [VI-11] and Tanaka‡ treated the eatimation problem of the amplitude damping channel, independently. Especially, Fujiwara [VI-11] proceeded to the estimation problem of the generalized amplitude damping channel, which is the more general and difficult part. De Martini et al. [VI-12] implemented an experiment for the estimation of an unknown unitary. 2. Tomography The previous parts treat the estimation with the choice of the quantum measurement. In the following, we treat the estimation with a fixed quantum measurement that is experimentally available. This type of approach is called tomography, and the infinite-dimensional case was studied first. Vogel and Risken [VI-13] focused on the Boson-Fock space, and proposed an estimation method of the density operator on this space based on an available quantum measurement that is referred to as Homodyne measurement. Any state on the Boson-Fock space is known to be described by a Wigner function. However, we cannot directly obtain this function through a quantum measurement. They proposed a method of estimating the Wigner function from the probability distribution p(x; φ) of the data x of the measurement of the observable Xφ ≡ Q cos φ + P sin φ. That is, their method requires the probability densities p(x; φ) of all angles φ. For this method, they derived a formula calculating the Wigner function from the probability distribution p(x; φ) of the data x of the measurement † Hayashi,

Chapter 6 [II-6]. Tanaka, Investigation on Fisher metric on the classical statistical models and the quantum ones, master thesis, the University of Tokyo, 2004 (in Japanese).

‡ F.

entire

December 28, 2004

13:56



473

of the observable Xφ ≡ Q cos φ + P sin φ. Based on their results, Smithey et al. [VI-14] implemented the required measurements in a real optical system, and experimentally estimated Wigner function. These researches estimated the true state by directly using the relation between the Wigner function and p(x; φ). However, there is a gap between the true distribution p(x; φ) and the experimentally estimated distribution pˆ(x; φ). Therefore, we need to consider how large the difference between the true state and the estimated state is. For this analysis, D’Ariano [Chap.33] constructed a function K : p )→ K[p] transforming p(x; φ) to the density operator ρ. He focused on matrix elements n|K[p]|m based on the number state |n, and proved that the norm of the linear functional of p )→ n|K[p]|m is finite. This fact guarantees that the difference between the true matrix element and the estimated one is small if the difference between p(x; φ) and pˆ(x; φ) is also small. Furthermore, he discussed the case of finite quantum efficiency. However, there is no reason to use only the linear function K[ˆ p] for the estimation of the density operator from the data pˆ(x; φ). If we use this method in the case when the true state is pure or nearly pure, the estimated density operator may have negative eigenvalues, i.e., may not be a density operator. Banaszek et al. [VI-15] proposed a method estimating the density operator ρˆ(X n ) from the finite data X n ≡ (X1 , Φ1 ), . . . , (Xn , Φn ). Their method is regarded as a modification from the maximum likelihood estimation (MLE)§ , which is a standard estimating method in classical statistical inference. Gill and Gut¸˘a [VI-16] called this method the sieved maximum likelihood estimation, and proved that the norm ρ − ρˆ(X n ) 1 between the true density operator ρ and the estimated one ρˆ(X n ) converges to 0 with the probability 1 i.e., this convergence is almost sure. Furthermore, Artiles et al. [VI-17] discussed the implementation of this estimation, in more details. The “tomographic” method can be also applied to the finite-dimensional case. In the above method, we obtain the probability distribution corresponding to the 2projection Eφ (x) at first, and obtain ρ from a linear calculation, where xEφ (x) dx is the spectral decomposition of Xφ . Thus, this method is generalized to a more general setting on a finite-dimensional space H as follows: Let X1 , . . . , Xk be a basis of the linear space of matrix

§ The

definition of MLE is given in Sec. 6.2 of Chap. 7.

entire

December 28, 2004

13:56



474

on H, and Y1 , . . . , Yk be the dual basis, i.e., satisfy Tr Yi Xj = δi,j . Based on the above preparation, we measure observables Xi several times, and obtain the respective average values µi and the estimated density matrix by [VI-19, eq.(19) of Chap. 34] ρˆ = µi Yi , Tr ρˆXi = µi . i

This method is called linear tomography. However, if we use this method in the nearly pure state case, ρˆ may have negative eigenvalues. Another problem is that this method uses only the averages µi f the respective measured observables Xi . Usually, the empirical distribution is sufficient information to estimate the true distribution in statistics. If we use only the averages, we ignore much useful information and the accuracy of the estimation goes down. Thus, for a more precise estimation, we need to focus on the probability distribution under the measurements of the observables X1 , . . . , Xk . Hradil [VI-20] and James et al. [Chap. 34] applied the maximum likelihood estimation to this problem. This method is called maximum likelihood tomography. James et al. [Chap. 34] compared the maximum likelihood tomography and linear tomography with a physical realizable setting, and demonstrated the superiority of the former¶ . Their comparison is based on the application to the real optical system. Moreover, Usami et al. [VI-18] applied maximum likelihood tomography to their experimental data, and checked its accuracy. They discussed the merit of the use of AIC (Akaike Information Criterion). As another method, Buˇzek et al. [VI-21] applied Bayes inference to the above setting under a prior distribution. This method is called Bayesian reconstruction, and they compared it with maximum entropy method. Further Reading [VI-1] A. Fujiwara, “Estimation of SU(2) operation and dense coding: An information geometric approach,” Phys. Rev. A, 65, 012316 (2002). [VI-2] M. A. Ballester, “Estimation of unitary quantum operations,” Phys. Rev. A, 69, 022303 (2004); quant-ph/0305104. ¶ Of

course, this superiority is promised by the results of classical statistics.

entire

December 28, 2004

13:56



475

[VI-3] A. Ac´ın, E. Jané, and G. Vidal, “Optimal estimation of quantum dynamics,” Phys. Rev. A, 64, 050302(R) (2001); quant-ph/0012015. [VI-4] E. Bagan, M. Baig, and R. Mu˜ noz-Tapia, “Entanglement-assisted aligment of reference frames using a dense covariant coding,” Phys. Rev. A, 69, 050303(R) (2004); quant-ph/0303019. [VI-5] M. Hayashi, “Estimation of SU(2) action by using entanglement,” quant-ph/0407053 (2004); ibid, Proceedings of The Ninth Quantum Information Technology Symposium (QIT9), NTT Basic Research Laboratories, Atsugi, Kangawa, Japan, December 11–12, 2003, pp.9– 13 (In Japanese); ibid, Proceedings of The 7th International Conference on Quantum Communication, Measurement and Computing, Glasgow, UK, July 25–29, 2004 (AIP), pp. 269–272. [VI-6] E. Bagan, M. Baig, and R. Mu˜ noz-Tapia, “Quantum reverseengineering and reference-frame alignment without nonlocal correlations,” Phys. Rev. A, 70, 030301(R) (2004); quant-ph/0405082 (2004). [VI-7] G. Chiribella, G. M. D’Ariano, P. Perinotti, and M. F. Sacchi, “Efficient use of quantum resources for the transmission of a reference frame,” quant-ph/0405095 (2004); Phys. Rev. Lett. 93, 180503 (2004). [VI-8] M. Sasaki, M. Ban, and S.M. Barnett, “Optimal parameter estimation of a depolarizing channel,” Phys. Rev. A, 66, 022308 (2002); quant-ph/0203113. [VI-9] D. G. Fischer, H. Mack, M. A. Cirone, and M. Freyberger, “Enhanced estimation of a noisy quantum channel using entanglement,” Phys. Rev. A, 64, 022309 (2001); quant-ph/0103160. [VI-10] A. Fujiwara and H. Imai, “Quantum parameter estimation of a generalized Pauli channel,” J. Phys. A: Math. and Gen., 36, 8093–8103 (2003). [VI-11] A. Fujiwara, “Estimation of a generalized amplitude-damping channel,” Phys. Rev. A, 70, 012317 (2004). [VI-12] F. De Martini, A. Mazzei, M. Ricci, and G. M. D’Ariano, “Pauli Tomography: complete characterization of a single qubit device,” quant-ph/0207143; appear in Phys. Rev. Lett.. [VI-13] K. Vogel and H. Risken, “Determination of quasiprobability distribution in terms of probability distributions for the rotated quadrature phase,” Phys. Rev. A, 40, 2847–2849 (1989). [VI-14] D.T. Smithey, M. Beck, M. G. Raymer, and A. Faridani, “Mea-

entire

December 28, 2004

476

[VI-15]

[VI-16] [VI-17]

[VI-18]

[VI-19]

[VI-20] [VI-21]

13:56



surement of the Wigner Distribution and the Density Matrix of a Light Mode Using Optical Homodyne Tomography: Application to Squeezed States and the Vacuum,” Phys. Rev. Lett., 70, 1244–1247 (1993). K. Banaszek, G. M. D’Ariano, M. G. A. Paris, and M. F. Sacchi, “Maximum-likelihood estimation of the density matrix,” Phys. Rev. A, 61, 010304 (2000); quant-ph/9909052. R. Gill and M. Gut¸˘a, “An invitation to quantum tomography,” quant-ph/0303020. L.M. Artiles, R.D. Gill, and M.I. Gut¸˘a, “An invitation to quantum tomography (II),” Journal of the Royal Statistical Society. Series B (Statistical Methodology); Preprint: math.ST/0405595. K. Usami, Y. Nambu, Y. Tsuda, K. Matsumoto, and K. Nakamura, “Accuracy of quantum-state estimation utilizing Akaike’s information criterion,” Phys. Rev. A, 68, 022314 (2003); quant-ph/0306083. V. Buˇzek, G. Adam, and G. Drobn´ y, “Quantum state reconstruction and detection of quantum coherences on different observation levels,” Phys. Rev. A, 54, 804–820 (1996). Z. Hradil, “Quantum-state estimation,” Phys. Rev. A, 55, R1561– R1564 (1997); quant-ph/9609012. V. Buˇzek, R. Derka, G. Adam, and P. L. Knight “Reconstruction of Quantum States of Spin Systems: From Quantum Bayesian Inference to Quantum Tomography,” Annals of Physics, 266, 454–496 (1998); quant-ph/9701038.

entire

December 28, 2004

13:56


CHAPTER 31

Optimal Quantum Clocks Vladim´ır Buˇzek, R. Derka, and Serge Massar Abstract. A quantum clock must satisfy two basic constraints. The first is a bound on the time resolution of the clock given by the difference between its maximum and minimum energy eigenvalues. The second follows from Holevo’s bound on how much classical information can be encoded in a quantum system. We show that asymptotically, as the dimension of the Hilbert space of the clock tends to infinity, both constraints can be satisfied simultaneously. The experimental realization of such an optimal quantum clock using trapped ions is discussed.

Recent technical advances in the laser cooling and trapping of ions suggest that coherent manipulations of trapped ions will be performed in the not too far future [13]. Apart from various important applications such as quantum information processing or improving high-precision spectroscopy these techniques also allow us to test fundamental concepts of quantum theory. In particular, much deeper insight into the problem of quantum measurement can be obtained. In this Letter we study the problem of building an optimal quantum clock from an ensemble of N ions. To be specific let us assume an ion trap with N two-level ions all in the ground state |Ψ = |0 ⊗ · · · ⊗ |0. This state is an eigenstate of the free Hamiltonian and thus cannot record time. Therefore the first step in building a clock is to bring the system to ˆ which is not an energy eigenstate. For instance one can an initial state Ω apply a Ramsey pulse whose shape and duration is chosen such that it puts all the ions in the product state ˆ prod = ρˆ ⊗ · · · ⊗ ρˆ ; Ω

ρˆ =

1 (|0 + |1)(0| + 1|). 2

(1)

We shall also consider more general states, but shall always take them Reprinted (abstract/excerpt/figure) with permission from V. Buˇzek, R. Derka, and S. c Massar, Phys. Rev. Lett., 82, 2207, 1999. 1999 by The American Physical Society 477

entire

December 28, 2004

478

13:56


Vladim´ır Buˇ zek, R. Derka, and Serge Massar

to belong to the symmetric subspace of the N ions. The basis vectors of this space will be denoted |m, m = 0, 1 . . . N . They are the completely symmetrized states of N two-level ions with m ions in the excited state and (N − m) ions in the ground state. The states |m have energy Em = m (this defines our unit of energy, setting = 1 then defines our unit of time). The reason we can restrict ourselves to the symmetric subspace is that we can map any clock state onto the symmetric subspace without affecting its dynamics. Indeed consider an initial state Ω = |ψψ| that does not belong to the symmetric subspace of the atoms.% We & can decompose N c |m, α where |m, α, α = 1, ..., ( |ψ = mα m α m ) denote a basis ˆ that maps of the states with energy m. Consider the unitary operator U the state |ψ onto the symmetric subspace without changing its energy: ˆ cmα |m, α = cm |m where |m is as before the symmetric state with U α ˆ commutes with the Hamiltonian the performance of energy m. Since U the clock based on |ψ is identical to the clock whose initial state is the ˆ symmetric state |ψsym = U|ψ. After the preparation stage, the ions evolve in time according to the ˆ (t) = exp{−itH}. ˆ The task ˆ ˆ (t)Ω Û ˆ † (t), U Hamiltonian evolution Ω(t) = U is to determine the elapsed time t by carrying out a measurement on the ions. Note that because of the indeterminism of quantum mechanics it is impossible, given a single set of N two-level ions, to determine the elapsed time with certainty. The best we can do is to estimate the elapsed time based on the result of a measurement on the system [4]. Making a good quantum clock requires a double optimization. First of all one can optimize the measurement. This aspect has been studied in detail in [4] where the best measuring strategy was derived. But one can ˆ of the system. It is this second optimization also optimize the initial state Ω that is studied in this Letter. ˆ it is Before turning to the problem of optimizing the initial state Ω, instructive to review the fundamental limitations on the performance of quantum clocks. Let us first consider a simple classical clock that can then be generalized to the quantum case. Our classical clock consists of a set of n registers. Each register is either in the 0 or the 1 state. Thus the classical clock consists of n bits, and can be in 2n different states. The dynamics of the clock is as follows: the first register flips from 0 to 1 or from 1 to 0 every 2π2−n , the second register flips every 2π2−n+1 , etc. The last register flips every π. This clock thus measures time modulo 2π. Note that this clock has an inherent uncertainty since it cannot measure time with a precision better than 2π2−n . Throughout this Letter the time uncertainty is defined

entire

December 28, 2004

13:56


Optimal Quantum Clocks

479

as ∆t2 = [(testimate − ttrue )(mod 2π)]2 .

(2)

√ For the classical clock ∆tclass = π2−n / 3. It is straightforward to replace the classical clock by a quantum version. The quantum clock consists of n two-level systems (qubits). The first qubit has an energy splitting between the levels of 2n−1 so that it has the same period as the corresponding classical register. The second qubit has an energy splitting of 2n−2 and so on up to the last qubit that has an energy splitting of 1. Considered together these n qubits constitute a quantum system with 2n equally spaced energy levels. The mapping between the Hilbert space of this abstract quantum clock and the symmetric subspace of N two-level ions is straightforward when N + 1 = 2n . Indeed in this case the dimension and energy spectrum of both Hilbert spaces coincide. Note, however, that this is not a mapping between the qubits of the clock and the ions individually, but between energy eigenstates. This comparison between classical and quantum clocks suggests that a quantum clock built out of N ions cannot behave better then a classical clock built out of ln(N + 1) registers. That this is indeed the case follows from two fundamental constraints: The first constraint is a bound on the time resolution of the clock that results from its energy spectrum. Indeed the time-energy uncertainty [7, 2] ∆t∆E ≥

1 , 2

(3)

ˆ ˆ Ω)] ˆ 2 relates the uncertainty in the estimated ˆ 2 Ω)−[Tr( H where ∆E 2 = Tr(H time [defined by Eq. (2)] to the spread in energy of the clock. In the present case√there is a state with a maximum energy uncertainty, namely |ψ+ = (1/ 2)(|N + |0) for which ∆E = N/2. Inserting in Eq. (3) shows that for any clock built out of N ions ∆t ≥

1 . N

(4)

Note that one cannot attain equality in Eq. (3). Indeed the state |ψ+ evolves with a period 2π/N , hence it necessarily has a large time uncertainty. Thus Eq. (4) is only a lower bound and maximizing the energy uncertainty as in [1] is not necessarily an optimal procedure.

entire

December 28, 2004

480

13:56



The second fundamental constraint the clock must obey is a bound on the total information it can carry. Indeed Holevo [5] has shown that one cannot encode and subsequently retrieve reliably more than n bits of classical information into n qubits. Letting a clock evolve for a given time interval t can be viewed as trying to encode information about the classical variable t into the state of the clock. Hence a measurement on our model clock [consisting of n = ln(N +1) qubits] cannot retrieve more than ln(N +1) bits of information about t. Together these two bounds imply that the quantum clock cannot perform better than the corresponding classical clock: it cannot carry more information and it cannot have better resolution. But is it possible, by making an optimal measurement and choosing in an astute manner the initial state of the clock, to make a quantum clock with similar performances to the classical one? Our main result is to show that this is indeed the case for clocks built out of symmetric states of N ions and to provide an algorithm for constructing such an optimal clock. The problem of constructing quantum clocks has been considered previously in [10, 9]. However, the best clocks considered in these papers are based on the phase state |Ψ0 described below. As √ we shall see for these clocks the time uncertainty is very large ∆t = (1/ N ) and is very far from reaching equality in Eq. (4). Recently, Vaidman and Belkind [12] considered the problem of a clock for which equality holds in Eq. (3). They showed that in the limit of large N the product states satisfy this condition. √ However, for the product state the energy uncertainty is very small: ∆E = N /2 hence they also do not saturate Eq. (4). Furthermore, clocks based on product state also do not attain Holevo’s bound. A similar approach to the one used here, namely optimizing both the initial state and the measurement on a system of N ions was considered in [6] with the aim of using the ions as an improved frequency standard. This problem can be rephrased in the following way: one disposes of a classical but noisy clock which provides some a priori knowledge about the time t and one wants to improve the knowledge of t by using the N ions. On the other hand, in the present Letter we suppose that there is no prior knowledge about t. The other difference with the present work is that our aim is to study the fundamental structure of quantum mechanics. We therefore neglect the effect of noise during the preparation and measurement stages and decoherence during the evolution. On the other hand, taking these effects into account was central to [6].

entire

December 28, 2004

13:56



481

In order to find how to build an optimal clock we must delve in detail into the functioning of this device. We first recall Holevo’s results concerning the optimal measurement strategy [4]. The measurement is described ˆ r }R by a positive operator measurement (POVM), that is a set {O r=1 of pos ˆ ˆ itive Hermitian operators such that r Or = 1. To each outcome r of the measurement we associate an estimate tr of the time elapsed. The difference between the estimated time tr and the true time t is measured by a cost function f (tr − t). Here we note that because of the periodicity of the clock, f has to be periodic. We also take f (t) to be an even function. The task is to minimize the mean value of the cost function 6 2π dt ˆ ˆ r Ω(t)]f (tr − t) . (5) Tr[O f¯ = 2π 0 r To proceed we expand the cost function in Fourier series: f (t) = w0 −

∞

wk cos kt.

(6)

k=1

The essential hypothesis made by Holevo is positivity of the Fourier coefficients: wk ≥ 0, (k = 1, 2, ...). He also supposes that the initial state ˆ = |ωω|, |ω = am |m is a pure state (and we make a phase conΩ m vention such that am is real and positive). He then shows that ∞

k=1

m,m |m−m |=k

1 wk f¯ ≥ w0 − 2

am am ,

(7)

and equality is attained only if the measurement is of the form ˆr = pr |Ψr Ψr | ; O

pr ≥ 0

;

ˆ

|Ψr = eitr H |Ψ0 ,

N 1 |m, |Ψ0 = √ N + 1 m=0

(8)

ˆ ˆ with the completeness relation r O r = 1. Several remarks about this result are called for: 1 Holevo supposed that the initial state is a pure state. If the initial state ˆ= is mixed, Ω i pi |ψi ψi |, then one finds that the corresponding cost is bounded by the average of the bounds Eq. (7). This shows that in building a good quantum clock one should always take the initial state to be pure.

entire

December 28, 2004

482

13:56



2 Holevo considered covariant measurements in which the times tr takes the continuum of values between 0 and 2π. But as shown in [3] the completeness relation can also be satisfied by taking a discreet set of 2π j times tj = N +1 , j = 0, ..., N . These “phase” states |Ψj [8] form an orthonormal basis of the Hilbert space, and this measurement is therefore a von Neumann measurement. Thus it is not necessary to use an ancilla to make the optimal measurement in this case. 3 In Eq. (7) only the values k = 0, ..., N intervene because of the condition |m − m | = k. That is, only the first N + 1 Fourier coefficients of the cost function are meaningful. 4 Because of the condition of positivity of wk , not all cost functions are covered by this result, but several important examples are 4 sin2 2t = 2(1 − cos t), |t(mod 2π)|, | sin 2t |, −δ[t(mod 2π)]. The most notable absence from this list is the quadratic deviation t2 [as defined in Eq. (2)] but it can be well approximated by the the first cost function since 4 sin2 2t = t2 for |t| % π. We now turn back to the central problem of this Letter, namely how to optimize the initial state of the system so that the time estimate is as good as possible. This corresponds to minimizing the right-hand side of Eq. (7) with respect to am . To this end let us define the matrix 1 wk (δm,m +k + δm+k,m ). 2 N

Fmm = w0 δm,m −

(9)

k=1

We must then minimize f¯ = aT Fˆ a under the condition aT a = 1. Using a Lagrange multiplier we find the eigenvalue equation (Fˆ − λˆ1)a = 0, and the task is therefore to find the smallest eigenvalue and eigenvector of Fˆ . Let us first consider the cost function f = 4 sin2 t/2. The advantage of this cost function is that the matrix F is particularly simple in this case. Furthermore, for errors much smaller then 2π, f and ∆t2 as defined in Eq. (2) coincide. Hence the first constraint on clock resolution Eq. (4) can be approximately replaced by f¯ ≥ 1/N 2 . If the initial state is the product state Eq. 9(1) then the mean cost is N −1 −N ( N )( N )] which for large given by the expression f¯ = 2[1 − 2 i=0

i

i+1

N decreases as f¯ = 1/N . If the initial state is the phase state |Ψ0 , then direct calculation shows that f¯ = N2+1 . Thus for both states ∆t = N −1/2 and one is very far from attaining equality in Eq. (4).

entire

December 28, 2004

13:56



483

However, neither of these two states is optimal. To find the optimum we note that the matrix Fˆ = 2δmm − δmm +1 − δm+1m can be viewed as the discretized second derivative operator Fˆ = −d2 /dx2 with von Neumann boundary conditions. The lowest eigenvalue of Fˆ is therefore approximately λmin = π 2 /(N + 1)2 and the corresponding eigenvector is √ π(m + 1/2) 2 |m. (10) sin |Ψopt = √ N +1 N +1 m 2 Thus in this case the cost decreases for large N as f¯opt = (Nπ+1)2 corresponding to ∆topt = (Nπ+1) . Therefore, up to a factor of π the optimal clock attains the bound Eq. (4). It is also interesting to consider the situation where the cost function is the delta function f = −δ[t(mod 2π)]. In this case Fmm = − 21π for all m, m . One checks that the phase state |Ψ0 is the eigenvector of Fˆ with minimal eigenvalue λ = − 2Nπ . Note that one could also have taken a smeared delta function since only the first N + 1 terms intervene in Eq. (7). The smeared delta function is approximately zero everywhere except in an interval of about 1/N around zero where it is equal to N . Thus for this cost function one wants to maximize the frequency with which the estimated value of t is within about N1 of the true value. But there is no extra cost if the estimated value is very far from the true one. It is for this reason that taking |Ψ0 as the initial state when the cost function is 4 sin2 t/2 is bad since making estimates that are wildly off is strongly penalized in that case. The mean value of a cost function gives only very partial information about the sensitivity of a clock. The full information is encoded in the ˆr |t)P (t)/P (O ˆr ) = P (O ˆr |t) N +1 that the true time probability P (t|tr ) = P (O 2π is t given that the readout of the measurement is tr . In the case of the optimal state |Ψopt , one finds that

Popt (t|tr ) = N

[1 + cos(N + 1)T ](1 + cos T ) , 1 − cos(T + Nπ+1 ) 1 − cos(T − Nπ+1 )

(11)

where N = 4(Nπ+1)3 and T = t − tr . This distribution (see Fig. 1) has a central peak of width 3π/(N + 1) and tails which decrease for 2πg|t − tr |gN −1 as Popt (t|tr ) = N −3 |t − tr |−4 . For the phase state |Ψ0 one finds P|Ψ0 $ (t|tr ) =

[1 − cos(N + 1)T ] 1 . 2π(N + 1) (1 − cos T )

(12)

entire

December 28, 2004

484

13:56



3.5 3.0

P(t|tr = )

2.5 2.0 1.5 1.0 0.5 0.0

0

1

2

3

t

4

5

6

Fig. 1. A plot of the a posteriori probability P (t|tr ) that the time was t given that the measurement yielded outcome tr = π for different initial states (N = 20). The dotted line corresponds to the product state Eq. (1), the solid line to the phase state |Ψ0 (8), and the dashed line to the optimal state |Ψopt (10).

This distribution has a slightly tighter central peak of width 2π/(N + 1) but the tails of the distribution decrease as N −1 |t − tr |−2 . It is these slowly decreasing tails that give the main contribution to ∆t = N −1/2 . For the product state Eq. (1) the distribution Pprod (t|tr ) has a very wide central peak of width = √1N . In this case it is the wide central peak that gives rise to the large time uncertainty. Using these distributions it is possible to calculate the number of bits of information about time that is encoded in the outcomes of the measurement. This is given by the mutual information 6 6 P (tr ) dtP (t|tr ) ln P (t|tr ). I = − dtP (t) ln P (t) + r

In all cases the integral in the second term is dominated by the central peak. Thus one finds that for the product state only 12 ln N bits of information about time are obtained, whereas for both the states |Ψ0 and |Ψopt one obtains ln N bits, thereby saturating Holevo’s bound. In summary we have seen that for different cost functions there are different optimal clocks. There is, of course, no cost function that is in an absolute sense better then another, and the choice of a particular cost

entire

December 28, 2004

13:56



485

function depends on the physical context. Nevertheless, the experimental realization of a quantum clock based on the state |Ψopt seems particularly desirable because it combines the attractive features that the a posteriori probability P (t|tr ) has a tight central peak and rapidly decreasing tails. Carrying out such an experiment with trapped ions presents two main ˆ Such coherent difficulties. The first is the preparation of the initial state Ω. manipulation of trapped ions is one of the current experimental challenges. A possibly important simplifying feature of the quantum clock is that since it is symmetric in the N ions one does not need to address each ion individually. The second problem is to realize the optimal phase measurement as discussed in the present Letter. Recent experiments [11] suggest that this type of coherent preparation and specific projective measurements are possible for systems with a moderate number of trapped ions. Therefore, one may hope that it will be feasible to make optimal quantum clocks in the not too distant future.

Acknowledgements Our special thanks to Lev Vaidman for his criticism and careful rereading of the manuscript. We also thank Win van Dam, Susana Huelga, Peter Knight and Christopher Monroe for helpful discussions. This work was in part supported by the Royal Society and by the Slovak Academy of Sciences. V. B. and S. M. thank the Benasque Center for Physics where part of this work was carried out. S.M. would like to thank Utrecht University where most of this work was carried out; he is a chercheur qualifié du FNRS.

References [1] J.J. Bollinger et al., Phys. Rev. A, 54, R4649 (1996) [2] S.L. Braunstein and C.M. Caves, Phys. Rev. Lett., 72, 3439 (1994); S.L. Braunstein, C.M. Caves, and G.J. Milburn, Ann. Phys. 247, 135 (1996). [3] R. Derka, V. Buˇzek, and A.K. Ekert, Phys. Rev. Lett 80, 1571 (1998). [4] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, Amsterdam, 1982. [5] A.S. Holevo, Probl. Pereda. Inf. 9, 3, 1973 [Probl. Inf. Transm. 9, 177 (1973)]. [6] S.F. Huelga et al., Phys. Rev. Lett., 79, 3865 (1997). [7] L. Mandelstam and I.E. Tamm, Izv. Akad. Nauk. SSSR Fiz., 9, 122 (1945); J. Phys. (Moscow) 9, 249 (1945). [8] D.T. Pegg and S.M. Barnett, Europhys. Lett., 6, 483 (1988). [9] A. Peres, Am. J. Phys., 48, 552 (1980). [10] H. Salecker and E.P. Wigner, Phys. Rev., 109, 571 (1958).

entire

December 28, 2004

486

13:56



[11] Q.A. Turchette et al., Phys. Rev. Lett., 81, 3631 (1998). [12] L. Vaidman (private communication); O. Belkind, M.Sc. thesis, Tel Aviv University, 1998. [13] D.J. Wineland, et al., quant-ph/9710025∗ .

∗ Editorial

note: This paper has been published as D.J. Wineland, et al., “Experimental issues in coherent quantum-state manipulation of trapped atomic ions,” J. Res. Natl. Inst. Stand. Tech., 103, 259 (1998).

entire

December 28, 2004

13:56


CHAPTER 32

Quantum Channel Identification Problem Akio Fujiwara Abstract. This paper explores an application of quantum entanglement. The problem treated here is the quantum channel identification problem: given a parametric family {Γθ }θ of quantum channels, find the best strategy of estimating the true value of the parameter θ. As a simple example, we study the estimation problem of the isotropic depolarization parameter θ for a two-level quantum system H C 2 . In the framework of noncommutative statistics, it is shown that the optimal input state on H ⊗ H to the channel exhibits a transition-like behavior according to the value of the parameter θ.

Let H be a Hilbert space that represents the physical system of interest and let S(H) be the set of density operators on H. It is well known [4] that a dynamical change Γ : S(H) → S(H) of the physical system, called a quantum channel, is represented by a trace-preserving completely positive map. But how can we identify the quantum channel that we have in a laboratory? A general scheme may be as follows: input a well-prepared state σ to the quantum channel and estimate the dynamical change σ )→ Γ(σ) by performing a certain measurement on the output state Γ(σ). It is then natural to inquire what is the best strategy of estimating a quantum channel. The purpose of this paper is to study this problem from a noncommutative statistical point of view. For mathematical simplicity, we restrict ourselves to the case in which the quantum channel to be identified lies in a smooth parametric family {Γθ ; θ = (θ1 , ..., θn ) ∈ Θ} of quantum channels. When H is finite-dimensional, this is not an essential restriction [1]. Once an input state σ for the channel is fixed, we have a parametric family {Γθ (σ)}θ∈Θ of output states, and as long as the parametrization θ )→ Γθ (σ) is nondegenerate, the problem of estimating the quantum channel is reduced to a parameter estimation problem for the noncommutative statistical model {Γθ (σ)}θ∈Θ . As a consequence, the parameter estimation problem for a family {Γθ }θ∈Θ of quantum channels amounts to finding an Reprinted (abstract/excerpt/figure) with permission from A. Fujiwara, Phys. Rev. A 63, c 042304, 2001. 2001 by The American Physical Society 487

entire

December 28, 2004

488

13:56


Akio Fujiwara

optimal input state σ for the channel and an optimal estimator for the parametric family {Γθ (σ)}θ of output states. One may imagine that this problem does not exceed the realm of conventional quantum estimation theory [2, 3]. But, in fact, it opens a new field of research in noncommutative statistics. Since each channel Γθ is completely positive, it can be extended to the composite quantum system H ⊗ H. In view of statistical parameter estimation, there are two essentially different extensions that have the same parametrization θ as Γθ : one is Γθ ⊗ I : S(H ⊗ H) → S(H ⊗ H), where I denotes the identity channel, and the other is Γθ ⊗ Γθ : S(H ⊗ H) → S(H ⊗ H). A question arises naturally: what happens when we use an entangled state as an input to the extended channel? In what follows, we demonstrate a somewhat nontrivial aspect of this problem. Let H := C 2 and let the channel Γθ : S(H) → S(H) be definde by 7 7 8 8 1 1 + z x − iy 1 1 + θz θ(x − iy) Γθ = . 2 x + iy 1 − z 2 θ(x + iy) 1 − θz The parameter θ represents the magnitude of isotropic depolarization. The channel can be uniquely extended on the 2 × 2 matrix algebra C 2×2 as follows. 7 7 7 7 8 8 8 8 1 0 2θ 1 1+θ 0 01 10 Γθ = = , Γθ , 00 00 0 1−θ 2 0 0 2 7 7 7 7 8 8 8 8 1 0 0 1 1−θ 0 00 00 = = Γθ , Γθ . 10 01 0 1+θ 2 2θ 0 2 To ensure that Γθ is completely positive, the parameter θ must lie in the closed interval Θ := [− 13 , 1]. (See [1].) We thus have a one-parameter family {Γθ ; θ ∈ Θ} of quantum channels, and our task is to estimate the true value of θ. Before proceeding to the parameter estimation for {Γθ }θ , we give a brief account of the one-parameter quantum estimation theory for density operators. (Consult [2] or [3] for details.) Given a one-parameter family {ρθ }θ of density operators, an estimator for the parameter θ is represented by a Hermitian operator T , normally with a requirement that the estimator should be (locally) unbiased: that is, if the system is in the state ρθ , then the expectation Eθ [T ] := Trρθ T of the estimator T should be identical to θ. It is easy to show that every (locally) unbiased estimator T for the parameter θ satisfies the quantum Cramér-Rao inequality Vθ [T ] ≥ (Jθ )−1 , where Vθ [T ] := Trρθ (T − θ)2 is the variance of estimator T ,

entire

December 28, 2004

13:56


Quantum Channel Identification Problem

489

and Jθ := J(ρθ ) := Trρθ (Lθ )2 is the quantum Fisher information with Lθ the symmetric logarithmic derivative (SLD), i.e., the Hermitian operator that satisfies the equation 1 dρθ = (Lθ ρθ + ρθ Lθ ). dθ 2 It is important to notice that the lower bound (Jθ )−1 in the quantum Cramér-Rao inequality is achievable (at least locally). In other words, the inverse of the SLD Fisher information gives the ultimate limit of estimation. As a consequence, the larger the SLD Fisher information is, the more accurately we can estimate the parameter θ. Let us return to the parameter estimation problem for the oneparameter family {Γθ }θ of quantum channels. Taking account of the abovementioned one-parameter estimation theory for density operators, our task is reduced to finding an optimal input for the channel that maximizes the SLD Fisher information of the corresponding parametric family of output states. We start with the maximization of the SLD Fisher information of the family {Γθ (σ)}θ with respect to the input state σ ∈ S(H). An important observation is that the maximum is attained by a pure state. To see this, it suffices to prove the convexity of the SLD Fisher information, i.e., if ρθ = λσθ + (1 − λ)τθ for a constant λ between 0 and 1, then J(ρθ ) ≤ λJ(σθ ) + (1 − λ)J(τθ ). Let us introduce the states ρ˜θ := λσθ ⊕ (1 − λ)τθ on H⊕H. It is easy to show that J(˜ ρθ ) = λJ(σθ )+(1−λ)J(τθ ). By identifying H ⊕ H with C 2 ⊗ H, we consider the partial trace TrC 2 : S(H ⊕ H) → S(H). Since TrC 2 is a stochastic (i.e., trace preserving completely positive) map, the monotonicity of the SLD Fisher information with respect to a stochastic map [5] shows that J(˜ ρθ ) ≥ J(TrC 2 ρ˜θ ) = J(ρθ ). This completes the proof of the convexity of the SLD Fisher information. Now we are ready to specify an optimal input state. Since our channel Γθ is unitarily invariant (i.e., isotropic in the Stokes parameter space), we can take without loss of generality the optimal input to be σ = |ee| where e| = (1, 0). The corresponding output state ρθ := Γθ (σ) is 7 8 1 1+θ 0 ρθ = . 0 1−θ 2 Since the state ρθ is isomorphic to the classical coin flipping in which “heads” occur with probability (1 + θ)/2, the SLD Fisher information be-

entire

December 28, 2004

490

13:56


Akio Fujiwara

comes Jθ =

1 . 1 − θ2

We next study the extended channel Γθ ⊗ I : S(H ⊗ H) → S(H ⊗ H). In this case we can use a possibly entangled state as the input. For the same reason as above, we can take the input to be a pure state: σ ˆ = |ψψ| where ψ ∈ H ⊗ H. By the Schmidt decomposition, the vector ψ is represented as √ √ (1) |ψ = x |e1 |f 1 + 1 − x |e2 |f 2 , where x is a real number between 0 and 1, and {e1 , e2 } and {f 1 , f 2 } are orthonormal bases of H = C 2 . Since the channels Γθ and I are both unitarily invariant, we can assume without loss of generality that the optimal input takes the form (1) with e1 | = f 1 | = (0, 1) and e2 | = f 2 | = (1, 0). The constant x remains to be determined. The corresponding output state σ ) becomes ρˆθ := Γθ ⊗ I(ˆ ;   (1 − x)(1 + θ) 0 0 2 x(1 − x) θ  1 0 x(1 − θ) 0 0 . ρˆθ =    0 (1 − x)(1 − θ) 0 2 ; 0 0 0 x(1 + θ) 2 x(1 − x) θ The SLD for the family {ρˆθ }θ is given by ;   4 x(1 − x) 1 + 2θ − 3θ2 − 8θx 0 0  (1 − θ2 )(1 + 3θ) (1 − θ)(1 + 3θ)      1   0 0 0 −   1−θ ˆθ =  L , 1   0 0 0 −   1−θ   ; 2  4 x(1 − x) 1 − 6θ − 3θ + 8θx  0 0 (1 − θ)(1 + 3θ) (1 − θ2 )(1 + 3θ) and the SLD Fisher information is 1 + 3θ + 8x(1 − x) Jˆθ = . (1 − θ2 )(1 + 3θ) When x = 0 or 1, the above SLD Fisher information Jˆθ is identical to Jθ . This is a matter of course: the input state is disentangled in this case and no information about the parameter θ is available via the independent channel I. When x "= 0 and "= 1, the SLD Fisher information Jˆθ diverges at θ = 1 and − 31 . This is because the complete positivity of the channel Γθ breakes across these values. Now let us specify the optimal input state. For every

entire

December 28, 2004

13:56



491

θ, the SLD Fisher information Jˆθ takes the maximum 3/(1 − θ)(1 + 3θ) at x = 12 . Therefore the optimal input for the channel Γθ ⊗ I is the maximally entangled state. The implication of this result is profound: although we use the channel Γθ only once, extra information about the channel is obtained via entanglement of the input state. In particular, the use of entanglement improves exceedingly the performance of estimation as θ approaches − 13 . Let us proceed to the analysis of the other extended channel Γθ ⊗ Γθ : S(H ⊗ H) → S(H ⊗ H). As before, we can take the input to be a pure state σ ˇ = |ψψ| where ψ is given by Eq. (1) with e1 | = f 1 | = (0, 1) σ) and e2 | = f 2 | = (1, 0). The corresponding output state ρˇθ := Γθ ⊗ Γθ (ˇ becomes ;   0 4 x(1 − x) θ2 x(1−θ)2 +(1−x)(1+θ)2 0  1 0 1 − θ2 0 0 . ρˇθ =  2   0 0 0 1 − θ 4 ; 2 2 2 0 0 x(1+θ) +(1−x)(1−θ) 4 x(1 − x) θ Since the SLD for the family {ρˇθ }θ is too complicated to write down, we give the SLD Fisher information only. 4θ4 + 5θ2 − 1 8θ2 x(1 − x) 1 − θ2 + . + 2 Jˇθ = 2 4 4 2 2θ (1 − θ ) 1−θ 2θ (1 + θ )(1 − θ2 + 16θ2 x(1 − x)) When x = 0 or 1, the above SLD Fisher information Jˇθ becomes 2/(1 − θ ), which precisely doubles the Jθ . Again this is a matter of course: the input state is disentangled in this case and the same amount of information about the parameter θ is obtained per independent use of the channel Γθ . When x "= 0 and "= 1, the SLD Fisher information Jˇθ diverges at θ = 1 but does not at θ = − 13 . This is because the requirement of positivity for the channel Γθ ⊗ Γθ is strictly weaker than that for the channel Γθ ⊗ I (i.e., the complete positivity for Γθ ). Now we examine a rather unexpected behavior of the optimal input state. For √13 ≤ θ < 1, the SLD Fisher information Jˇθ takes the maximum 12θ2 /(1 − θ2 )(1 + 3θ2 ) at x = 1 , while 2

2

for − 13 ≤ θ ≤ √13 , it takes the maximum 2/(1 − θ2 ) at x = 0 and 1. (See Fig. 1.) Namely, the optimal input state “jumps” from the maximally √ entangled state to a disentangled state at θ = 1/ 3. It is surprising that the seemingly homogeneous family {Γθ }θ of depolarization channels involves a transition-like behavior. Finally we mention the possibility of extending the channel Γθ in the form Γθ ⊗ Γ where Γ is a channel that is known to the observer and is independent of θ. Since the channel Γθ ⊗Γ is decomposed into (I ⊗Γ )(Γθ ⊗

entire

December 28, 2004

13:56


Akio Fujiwara

492

SLD Fisher information

5 4 3 2 1 0 0

0.2

0.4

0.6

0.8

1

x √ Fig. 1. SLD Fisher information Jˇθ versus x, for θ = 0.7 (dashed), θ = 1/ 3 (solid), and θ = 0.3 (chained).

I), the monotonicity argument for the SLD Fisher information with respect to a stochastic map allows us to deduce that the best choice of the channel Γ is the identity channel. To conclude, among those we have considered on the second extension H ⊗ H of the quantum system, the best strategy of estimating the isotropic depolarization parameter θ is the following. For √13 ≤ θ ≤ 1, use Γθ ⊗Γθ and input a maximally entangled state on H ⊗ H; for 13 ≤ θ ≤ √13 , use Γθ twice independently and input any pure state on H each time; for − 31 ≤ θ ≤ 13 , use Γθ ⊗ I and input a maximally entangled state on H ⊗ H. We have demonstrated a nontrivial aspect of statistical estimation problem for a quantum channel. Other problems, such as the use of the nth extension H⊗n and its asymptotics, or the multi parameter quantum channel estimation, will be presented elsewhere. References [1] A. Fujiwara and P. Algoet, Phys. Rev. A 59, 3290 (1999). [2] C.W. Helstrom, Quantum detection and estimation theory, Academic, New York, 1976. [3] A.S. Holevo, Probabilistic and statistical aspects of quantum theory, NorthHolland, Amsterdam, 1982. [4] K. Kraus, States, effects, and operations, Lecture Notes in Physics, 190 Springer, Berlin, 1983. [5] Intuitively speaking, the monotonicity of the SLD Fisher information implies that the distance of two nearby quantum states measured by the SLD Fisher metric shrinks as they are mapped by a stochastic map. For a complete

entire

December 28, 2004

13:56



493

characterization of monotone metrics, see D. Petz, Linear Algebr. Appl., 244, 81, (1996); D. Petz and Cs. Sud´ ar, “Extending the Fisher metric to density matrices,” in O.E. Barndorff-Nielsen and E.B. Vedel Jensen, eds., Geometry in present day sciences, 21-33, World Scientific, Singapore (1999).

entire

December 28, 2004

13:56


CHAPTER 33

Homodyning as Universal Detection Giacomo Mauro D’Ariano Abstract. Homodyne tomography — i.e., homodyning while scanning the local oscillator phase — is now a well assessed method for “measuring” the quantum state. In this paper I will show how it can be used as a kind of universal detection, for measuring generic field operators, however at expense of some additional noise. The general class of field operators that can be measured in this way is presented, and includes also operators that are inaccessible to heterodyne detection. The noise from tomographical homodyning is compared to that from heterodyning, for those operators that can be measured in both ways. It turns out that for some operators homodyning is better than heterodyning when the mean photon number is sufficiently small. Finally, the robustness of the method to additive phase-insensitive noise is analyzed. It is shown that just half photon of thermal noise would spoil the measurement completely.

1. Introduction Homodyne tomography is the only viable method currently known for determining the detailed state of a quantum harmonic oscillator — a mode of the electromagnetic field. The state measurement is achieved by repeating many homodyne measurements at different phases φ with respect to the local oscillator (LO). The experimental work of the group in Eugene-Oregon [18] undoubtedly established the feasibility of the method, even though the earlier data analysis were based on a filtered procedure that affected the results with systematic errors. Later, the theoretical group in Pavia-Italy presented an exact reconstruction algorithm [7], which is the method currently adopted in actual experiments (see, for example, Refs. [16] and [17]). The reconstruction algorithm of Ref. [7] was later greatly simplified [5], so that it was possible also to recognize the feasibility of the method even for nonideal quantum efficiency η < 1 at the homodyne detector, and, at the c 1997 Kluwer. Reprinted, with permission, from Quantum Communication, Computing, and Measurement, (edited by Hirota, O., Holevo, A. S., and Caves, C.A.), Plenum, New York, 1997, 253–264. 494

entire

December 28, 2004

13:56


Homodyning as Universal Detection

495

same time, establishing lower bounds for η for any given matrix representation. After these first results, further theoretical progress has been made, understanding the mechanisms that underly the generation of statistical errors [2], thus limiting the sensitivity of the method. More recently, for η = 1 non trivial factorization formulas have been recognized [19, 14] for the “pattern functions” [15] that are necessary to reconstruct the photon statistics. In this paper I will show how homodyne tomography can also be used as a method for measuring generic field operators. In fact, due to statistical errors, the measured matrix elements cannot be used to obtain expectations of field operators, and a different algorithm for analyzing homodyne data is needed suited to the particular field operator whose expectation one wants to estimate. Here, I will present an algorithm valid for any operator that admits a normal ordered expansion, giving the general class of operators that can be measured in this way, also as a function of the quantum efficiency η. Hence, from the same bunch of homodyne experimental data, now one can obtain not only the density matrix of the state, but also the expectation value of various field operators, including some operators that are inaccessible to heterodyne detection. However, the price to pay for such detection flexibility is that all measured quantities will be affected by noise. But, if one compares this noise with that from heterodyning (for those operators that can be measured in both ways), it turns out that for some operators homodyning is less noisy than heterodyning, at least for small mean photon numbers. Finally, I will show that the method of homodyne tomography is quite robust to sources of additive noise. Focusing attention on the most common situation in which the noise is Gaussian and independent on the LO phase, I will show that this kind of noise produces the same effect of nonunit quantum efficiency at detectors. Generalizing the result of Ref. [5], I will give bounds for the overall rms noise level below which the tomographical reconstruction is still possible. I will show that the smearing effect of half photon of thermal noise in average is sufficient to completely spoil the measurement, making the experimental errors growing up unbounded.

2. Short Up-to-Date Review on Homodyne Tomography The homodyne tomography method is designed to obtain a general matrix element ψ|ˆ U|ϕ in form of expectation of a function of the homodyne

entire

December 28, 2004

496

13:56


Giacomo Mauro D’Ariano

outcomes at different phases with respect to the LO. In equations, one has 6 +∞ 6 π dφ ψ|ˆ U|ϕ = dx p(x; φ) fψϕ (x; φ) , (1) 0 π −∞ where p(x; φ) denotes probability distribution of the outcome x of the % † the & 1 iφ −iφ quadrature x ˆφ = 2 a e + ae of the field mode with particle operators a and a† at phase φ with respect to the LO. Notice that it is sufficient xφ . One to average only over φ ∈ [0, π], due to the symmetry x ˆφ+π = −ˆ wants the function fψϕ (x; φ) bounded for all x, whence every moment will be bounded for any possible (a priori unknown) probability distribution p(x; φ). Then, according to the central-limit theorem, one is guaranteed that the integral in Eq. (1) can be sampled statistically over a sufficiently large set of data, and the average values for different experiments will be Gaussian distributed, allowing estimation of confidence intervals. If, on the other hand, the kernel fψϕ (x; φ) turns out to be unbounded, then we will say that the matrix element cannot be measured by homodyne tomography. The easiest way to obtain the integral kernel fψϕ (x; φ) is starting from the operator identity 6 2 † † d α Tr(ˆ Ue−αa+αa ) e−αa +αa (2) Uˆ = π which, by changing to polar variables α = (i/2)keiφ , becomes 6 π 6 dφ +∞ dk |k| Tr(ˆ Ueikˆxφ ) e−ikˆxφ . Uˆ = π −∞ 4 0

(3)

Equation (2) is nothing but the operator form of the Fourier-transform relation between Wigner function and characteristic function: it can also be considered as an operator form of the Moyal identity 6 2 d z ˆ † (z)|ml|D(z)|n ˆ k|D = k|nl|m . (4) π The trace-average in Eq. (3) can be evaluated in terms of p(x, φ), using the ˆφ , and exchanging the integrals over complete set {|xφ } of eigenvectors of x x and k. One obtains 6 6 π dφ +∞ Uˆ = dx p(x; φ)K(x − x ˆφ ) , (5) π −∞ 0 where the integral kernel K(x) is given by 1 1 1 1 , K(x) = − P 2 ≡ − lim Re 2 x (x + iε)2 ε→0+ 2

(6)

entire

December 28, 2004

13:56



497

P denoting the Cauchy principal value. Taking matrix elements of both sides of Eq. (5) between vectors ψ and ϕ, we obtain the sampling formula we were looking for, namely 6 6 π dφ +∞ dx p(x; φ)ψ|K(x − x ˆφ )|ϕ . (7) ψ|ˆ U|ϕ = π −∞ 0 Hence, the matrix element ψ|ˆ U|ϕ is obtained by averaging the function fψϕ (x; φ) ≡ ψ|K(x − xˆφ )|ϕ over homodyne data at different phases φ. As we will see soon, despite K(x) is unbounded, for particular vectors ψ and ϕ in the Hilbert space the matrix element ψ|K(x − xˆφ )|ϕ is bounded, and thus the integral (7) can be sampled experimentally. Before analyzing specific matrix representations, I recall how the sampling formula (7) can be generalized to the case of nonunit quantum efficiency. Low efficiency homodyne detection simply produces a probability pη (x; φ) that is a Gaussian convolution of the ideal probability p(x; φ) for η = 1 (see, for example, Ref. [3]). In terms of the generating functions of the x ˆφ -moments one has 6 +∞ 6 +∞ 1−η 2 ikx k dx pη (x; φ)e = exp − dx p(x; φ)eikx . (8) 8η −∞ −∞ Upon substituting Eq. (8) into Eq. (3), and by following the same lines that lead us to Eq. (5), one obtains the operator identity 6 π 6 dφ +∞ dx pη (x; φ)Kη (x − x ˆφ ) , (9) Uˆ = π −∞ 0 where now the kernel reads 6 +∞ 1−η 2 1 k + ikx . Kη (x) = Re dk k exp 2 8η 0

(10)

The desired sampling formula for ψ|ˆ U|ϕ is obtained again as in Eq. (7), by taking matrix elements of both sides of Eq. (10). Notice that now the kernel Kη (x) is not even a tempered distribution: however, as we will see immediˆφ ) are bounded for some representaately, the matrix elements of Kη (x − x ˆφ )|ϕ tions, depending on the value of η. The matrix elements ψ|Kη (x − a are bounded if the following inequality is satisfied for all phases φ ∈ [0, π] η>

1 , 1 + 4ε2 (φ)

(11)

where ε2 (φ) is the harmonic mean 2 ε2 (φ)

=

1 ε2ψ (φ)

+

1 ε2ϕ (φ)

,

(12)

entire

December 28, 2004

13:56



498

and ε2υ (φ) is the “resolution” of the vector |υ in the x ˆφ -representation, namely: 7 8 x2 2 |φ x|υ| = exp − 2 . (13) 2ευ (φ) In Eq. (13) the symbol = stands for the leading term as a function of x, † ˆφ for eigenvalue and |xφ ≡ eia aφ |x denote eigen-ket of the quadrature x x. Upon maximizing Eq. (11) with respect to φ one obtains the bound η>

1 , 1 + 4ε2

ε2 = min {ε2 (φ)} . φ∈[0,π]

(14)

One can easily see that the bound is η > 1/2 for both number-state and coherent-state representations, whereas it is η > (1 + s2 )−1 ≥ 1/2 for squeezed-state representations with minimum squeezing factor s < 1. On the other hand, for the quadrature representation one has η > 1, which means that this matrix representation cannot be measured. The value η = 1/2 is actually an absolute bound for all representations satisfying the “Heisenberg relation” ε(φ)ε(φ + π2 ) ≥ 14 with the equal sign, which include all known representations (for a discussion on the existence of exotic representations see Ref. [4]). Here, I want to emphasize that the existence of such a lower bound for quantum efficiency is actually of fundamental relevance, as it prevents measuring the wave function of a single system using schemes of weak repeated indirect measurements on the same system [10]. At the end of this section, from Ref. [5] I report for completeness the kernel n|K(x − x ˆφ )|m for matrix elements between number eigenstates. One has > 2 2 n! −idφ d+2 n|Kη (x − x ˆφ )|n + d = e 2κ (15) e−κ x (n + d)! n (−)ν n + d × (2ν + d + 1)!κ2ν Re (−i)d D−(2ν+d+2) (−2iκx) , ν! n−ν ν=0 ; where κ = η/(2η − 1), and Dσ (z) denotes the parabolic cylinder function. For η = 1 the kernel factorizes as follows [19, 14] n|K(x − x ˆφ )|n + d = e−idφ [2xun (x)vn+d (x) −

√

n + 1un+1 (x)vn+d (x) −

√

m+1un (x)vn+d+1 (x)],

(16)

entire

December 28, 2004

13:56



499

where un (x) and vn (x) are the regular and irregular energy eigen-functions of the harmonic oscillator j 1/4 2 2 1 ∂x e−x , x− uj (x) = √ 2 π j! j 6 √2x 2 1 ∂x 1/4 −x2 vj (x) = √ (2π) e dt et . (17) x− 2 j! 0 3. Measuring Generic Field Operators Homodyne tomography provides the maximum achievable information on the quantum state, and, in principle, the knowledge of the density matrix ˆ = Tr[O ˆ Uˆ] of any should allow one to calculate the expectation value O ˆ However, this is generally true only when one has an analytic observable O. knowledge of the density matrix, but it is not true when the matrix has been obtained experimentally. In fact, the Hilbert space is actually infinite dimensional, whereas experimentally one can achieve only a finite matrix, each element being affected by an experimental error. Notice that, even though the method allows one to extract any matrix element in the Hilbert space from the same bunch of experimental data, however, it is the way in which errors converge in the Hilbert space that determines the actual ˆ Uˆ]. To make things more concrete, possibility of estimating the trace Tr[O let us fix the case of the number representation, and suppose we want to estimate the average photon number a† a. In Ref. [9] it has been shown that for nonunit quantum efficiency the statistical error for the diagonal matrix element n|ˆ U|n diverges faster than exponentially versus n, whereas for ; η = 1 the error saturates for large n to the universal value εn = 2/N that depends only on the number N of experimental data, but is independent on both n and on the quantum state. Even for the unrealistic case η = 1, one can see immediately that the estimated expectation value a† a = H−1 n=0 nUnn based on the measured matrix elements Unn , is not guaranteed to converge versus the truncated-space dimension H, because the error on Unn is nonvanishing versus n. Clearly in this way I am not proving that the expectation a† a is unobtainable from homodyne data, because matrix errors convergence depends on the chosen representation basis, whence the ineffectiveness of the method may rely in the data processing, more than in the actual information contained in the bunch of experimental data. Therefore, the question is: is it possible to estimate a generic expectation ˆ directly from homodyne data, without using the measured density value O matrix? As we will see soon, the answer is positive in most cases of interest,

entire

December 28, 2004

13:56



500

ˆ will be referred to as and the procedure for estimating the expectation O ˆ homodyning the observable O. ˆ I mean averaging an appropriate kerBy homodyning the observable O ˆ nel function R[O](x; φ) (independent on the state Uˆ) over the experimental homodyne data, achieving in this way the expectation value of the observˆ for every state Uˆ. Hence, the kernel function R[O](x; ˆ able O φ) is defined through the identity 6 6 π dφ +∞ ˆ ˆ = dx p(x; φ)R[O](x; φ) . (18) O π −∞ 0 ˆ From the definition of R[O](x; φ) in Eq. (18), and from Eqs. (2) and (3) — which generally hold true for any Hilbert-Schmidt operator in place of Uˆ — one obtains 6 π 6 dφ +∞ ˆ= ˆ O dx R[O](x; φ)|xφ φ x| , (19) π −∞ 0 ˆ with the kernel R[O](x; φ) given by ˆ ˆ R[O](x; φ) = Tr[OK(x −x ˆφ )] ,

(20)

and K(x) given in Eq. (6). The validity of Eq. (20), however, is limited ˆ otherwise it is ill defined. only to the case of a Hilbert-Schmidt operator O, ˆ Nevertheless, one can obtain the explicit form of the kernel R[O](x; φ) in a different way. Starting from the identity involving trilinear products of Hermite polynomials [11] 6 +∞ m+n+k 1 2 2 2 π 2 k!m!n! , (21) dx e−x Hk (x) Hm (x) Hn (x) = (s − k)!(s − m)!(s − n)! −∞ for k + m + n = 2s even . Richter proved the following nontrivial formula for the expectation value of the normally ordered field operators [20] √ 6 π 6 dφ +∞ †n m i(m−n)φ Hn+m ( 2x) √ a a = (22) dx p(x; φ)e % &, π −∞ 2n+m n+m 0 n which corresponds to the kernel †n m

R[a a ](x; φ) = e

i(m−n)φ

√ Hn+m ( 2x) √ % & . 2n+m n+m n

(23)

This result can be easily extended to the case of nonunit quantum efficiency η < 1, as the normally ordered expectation a†n am just gets an extra factor

entire

December 28, 2004

13:56



501

1

η 2 (n+m) . Therefore, one has †n m

Rη [a a ](x; φ) = e

i(m−n)φ

√ Hn+m ( 2x) ; % &, (2η)n+m n+m n

(24)

ˆ φ) is defined as in Eq. (18), but with the experiwhere the kernel Rη [O](x; mental probability distribution pη (x; φ). From Eq. (24) by linearity on can obtain the kernel Rη [fˆ](x; φ) for any operator function fˆ that has normal ordered expansion fˆ ≡ f (a, a† ) =

∞

(n) †n m fnm a a .

(25)

nm=0

From Eq. (24) one obtains Rη [fˆ](x; φ) =

√ ∞ ∞ Hs ( 2x)

(n) i(m−n)φ fnm e n!m!δn+m,s s/2 s!(2η) s=0 nm=0 √ ∞ Hs ( 2x)is ds = F [fˆ](v; φ), s!(2η)s/2 dv s s=0

where ∞

F [fˆ](v; φ) =

nm=0

(n) fnm

−1 n+m (−iv)n+m ei(m−n)φ . m

Continuing from Eq. (26) one obtains 1 d2 2ix d ˆ Rη [f ](x; φ) = exp +√ 2η dv 2 η dv and finally Rη [fˆ](x; φ) =

6

+∞

−∞

(26)

v=0

F [fˆ](v; φ) ,

(27)

(28)

v=0

η 2 dw √ ; e− 2 w F [fˆ](w + 2ix/ η; φ) . −1 2πη

(29)

Hence one concludes that the operator fˆ can be measured by homodyne tomography if the function F [fˆ](v; φ) in Eq. (27) grows slower than exp(−ηv 2 /2) for v → ∞, and the integral in Eq. (29) grows at most exponentially for x → ∞ (assuming p(x; φ) goes to zero faster than exponentially at x → ∞). ˆ ˆ One can φ) for some operators O. In Table 3 I report the kernel Rη [O](x; see that for the raising operator eˆ+ the kernel diverges at η = 1/2+ , namely ˆ s in the same table it can be measured only for η > 1/2. The operator W ¯ for ordering parameter s gives the generalized Wigner function Ws (α, α)

entire

December 28, 2004

13:56



502

ˆ ˆ † (α)W ˆ s ]. From the expression through the identity Ws (α, α) ¯ = Tr[D(α)ˆ UD ˆ of Rη [Ws ](x; φ) it follows that by homodyning with quantum efficiency η one can measure the generalized Wigner function only for s < 1 − η −1 : in particular, as already noticed in Refs. [5], the usual Wigner function for s = 0 cannot be measured for any quantum efficiency [in fact one would ˆ ˆ † (α)W ˆ 0 D(α)](x; φ) = K[x − Re(αe−iφ )], with K(x) unbounded have R1 [D as given in Eq. (6)]. ˆ Rη [O](x; φ) √ Hn+m ( 2x) ei(m−n)φ √ %n+m& 2n+m n

ˆ O (1)

a†n am

(2)

a

2eiφ x

(3)

a2

e2iφ (4x2 − 1)

(4)

a† a

(5)

(a† a)2

(6)

† . : D (α) := e−αa eαa

2x2 − 8 4 3x

− 2x2

iφ 1 2x exp[− 2η (αeiφ )2 + √ η αe ]

ˆ†

+

−2iφ 1+ α αe 1 2x exp[− 2η (αe−iφ )2 − √ αe−iφ ] η 2iφ 1+ α αe

1 2xe−iφ √2πη

(7)

(8) (9)

. eˆ+ = a† √

. ˆs = W

2 π(1−s)

2

s+1 s−1

|n + d n|

a† a

2 +∞

e−v × (1+z) 2Φ

1 1+a† a

1 2

6

∞ 0

2e−t dt π(1 − s) −

dv

−∞ 3 x2 , 2, 2 ; 1+z −1 2

z= > 1 η

cos 2

e−v −1 2η

2t (1 − s) −

! 1 η

x

n|K(x − x ˆφ )|n + d in Eqs. (15) and (16)

ˆ ˆ [The symbol Table 1: Kernel Rη [O](x; φ), as defined in Eq. (18), for some operators O. Φ(a, b; x) denotes the customary confluent hypergeometric function.]

3.1. Comparison between homodyne tomography and heterodyning We have seen that from the same bunch of homodyne tomography data, not only one can recover the density matrix of the field, but also one can measure any field observable fˆ ≡ f (a, a† ) having normal ordered ex∞ (n) †n m and bounded integral in pansion fˆ ≡ f (n) (a, a† ) = nm=0 fnm a a Eq. (29) — this holds true in particular for any polynomial function

entire

December 28, 2004

13:56



503

of the annihilation and creation operators. This situation can be compared with the case of heterodyne detection, where again one measures general field observables, but admitting anti-normal ordered expansion ∞ (a) fˆ ≡ f (a) (a, a† ) = nm=0 fnm am a†n , in which case the expectation value is obtained through the heterodyne average 6 2 d α (a) f (α, α)α|ˆ fˆ = U|α . (30) π For η = 1 the heterodyne probability is just the Q-function Q(α, α) = 1 U|α, whereas for η = 1 it will be Gaussian convoluted. As shown by π α|ˆ Baltin [1], generally the anti-normal expansion either is not defined, or is not consistent on the Fock basis, namely f (a) (a, a† )|n has infinite norm or is different from fˆ(a, a† )|n for some n ≥ 0. In particular, let us focus ∞ † l attention on functions of the number operator f (a† a) = l=0 cl (a a) , (n) (a) ∞ ∞ f (n) (a† a) = l=0 cl a†l al , f (a) (a† a) = l=0 cl al a†l . Baltin has shown that [1] (n)

cl

=

1 l!

6

+∞ −∞

6

dλ g(λ)(e−iλ − 1)l =

l (−)l−k f (k) , k!(l − k)! k=0

(−)k f (−k − 1) 1 , dλ eiλ g(λ)(1 − eiλ )l = l! −∞ k!(l − k)! k=0 6 +∞ dx . g(λ) = f (x)eiλx . −∞ 2π (a)

cl

=

+∞

l

(31)

From Eqs. (31) one can see that the normal ordered expansion is always well defined, whereas the anti-normal ordering needs extending the domain of f to negative integers. However, even though the anti-normal expansion is defined, this does not mean that the expectation of f (a† a) can be obtained through heterodyning, because the integral in Eq. (30) may not exist. Actually, this is the case when the anti-normal expansion is not consistent on the Fock basis. In fact, for the exponential function f (a† a) = exp(−µa† a) one has f (a) (|α|2 ) = eµ exp[(1 − eµ )|α|2 ]; on the Fock basis f (a) (a† a)|n is a binomial expansion with finite convergence radius, and this gives the consistency condition |1 − eµ | < 1. However, one can take the analytic continuation corresponding for 1 − eµ < 1, which coincides with the condition that the integral in Eq. (30) exists for any state Uˆ (the Q-function vanishes as exp(−|α|2 ) for α → ∞, at least for states with limited photon number). This argument can be extended by Fourier transform to more general functions f (a† a), leading to the conclusion that there are field op-

entire

December 28, 2004

504

13:56



erators that cannot be heterodyne-measured, even though they have well defined anti-normal expansion, but the expansion is not consistent on the ˆ s in Fock basis. As two examples, I consider the field operators eˆ+ and W Table 3. According to Eqs. (31) it follows that the operator eˆ+ does not admit an anti-normal expansion, whence it cannot be heterodyne detected. This is in agreement with the fact that according to Table 3 we can homodyne eˆ+ only for η > 1/2, and heterodyning is equivalent to homodyning with effective quantum efficiency η = 1/2 (which corresponds to the 3 dB ˆ s is noise due to the joint measurement [21]). The case of the operator W different. It admits both normal-ordered and anti-normal-ordered forms:

2 2 2 † † ˆs = 2 W : exp − a a := − : exp a a : , where : ... : A π(1−s) 1−s π(1+s) 1+s denotes normal ordering and : . . . :A anti-normal. However, the consistency condition for anti-normal ordering is 2/(s + 1) < 1, with s ≤ 1, which imˆ s for s > −1, again in agreement with the plies that one can heterodyne W value of s achievable by homodyne tomography at η = 1/2. Now I briefly analyze the additional noise from homodyning field operators, and compare them with the heterodyne noise. For a complex random variable z = u + iv the noise is given by the eigenvalues N (±) = |z|2 − |z|2 ± |z 2 − z 2 | of the covariance matrix. When homodyning the field, the random variable is z ≡ 2eiφ x [22] and the average over-line denotes the double integral over x and φ in Eq. (18). From Table (3) one has z = a, z 2 = a2 , |z|2 = 2a† a+1, e2iφ = 0 [23]. In this way one finds that the noise (±) from homodyning the field is Nhom [a] = 1 + 2a† a − |a|2 ± |a2 − a2 |. On the other hand, when heterodyning, z becomes the heterodyne output photocurrent, whence z = a, z 2 = a2 , |z|2 = a† a + 1, and one has (±) Nhet [a] = 1 + a† a − |a|2 ± |a2 − a2 |, so that the tomographical noise is larger than the heterodyne noise by a term equal to the average photon number, i. e. Nhom [a] = Nhet [a] + a† a . (±)

(±)

(32)

Therefore, homodyning the field is always more noisy than heterodyning it. On the other hand, for other field observables it may happen that homodyne tomography is less noisy than heterodyne detection. For example, one can n] when homodyning the photon number easily evaluate the noise Nhom [ˆ † n ˆ = a a. The random variable corresponding to the photon number is ν(z) = 12 (|z|2 − 1) ≡ 2x2 − 12 , and from Table 3 we see that the noise . n] = ∆ν 2 (z) can be written as Nhom [ˆ n] = ∆ˆ n2 + 12 ˆ n2 + n ˆ + 1 Nhom [ˆ [9]. When heterodyning the field, the random variable corresponding to the photon number is ν(z) = |z|2 − 1, and from the relation |z|4 = a†2 a2 one

entire

December 28, 2004

13:56



505

. obtains Nhet [ˆ n] = ∆ν 2 (z) = ∆ˆ n2 + ˆ n + 1, namely 1 2 Nhom [ˆ n −n n] = Nhet [ˆ n] + ˆ ˆ − 1 . (33) 2 We thus conclude that homodyning the photon number is less noisy √ than heterodyning it for sufficiently low mean photon number ˆ n < 12 (1 + 5). 4. Homodyne Tomography in Presence of Additive Phase-Insensitive Noise In this section I consider the case of additive Gaussian noise, in the typical situation in which the noise is phase-insensitive. This kind of noise is described by a density matrix evolved by the master equation (34) ∂t Uˆ(t) = 2 AL[a† ] + BL[a] Uˆ(t) , . 1 † † where L[ˆ c] denotes the Lindblad super-operator L[ˆ c]ˆ U = cÛˆcˆ − 2 [ˆ c cˆ, Uˆ]+ . −iφ Due to the phase invariance L[ae ] = L[a] the dynamical evolution does not depend on the phase, and the noise is phase insensitive. From the U(t)] = gain ≡ Tr[aˆ U(0)] with evolution of the averaged field aout ≡ Tr[aˆ g = exp[(A − B)t], we can see that for A > B Eq. (34) describes phaseinsensitive amplification with field-gain g, whereas for B > A it describes phase-insensitive attenuation, with g < 1. Concretely, for A > B Eq. (34) models unsaturated parametric amplification with thermal idler [average photon number m ¯ = B/(A − B)], or unsaturated laser action [A and B proportional to atomic populations on the upper and lower lasing levels respectively]. For B > A, on the other hand, the same equation describes a field mode damped toward the thermal distribution [inverse photon lifetime Γ = 2(B − A), equilibrium photon number m ¯ = A/(B − A)], or a loss g < 1 along an optical fiber or at a beam-splitter, or even due to frequency conversion [6]. The borderline case A = B leaves the average field invariant, but introduces noise that changes the average photon number as a† aout = ¯ , where n ¯ = 2At. In this case the solution of Eq. (34) can be a† ain + n cast into the simple form 6 2 % & d β ˆ U(0)D ˆ † (β) . exp −|β|2 /¯ n D(β)ˆ (35) Uˆ(t) = π¯ n This is the Gaussian displacement noise studied in Refs. [13, 12] and commonly referred to as “thermal noise” [regarding the misuse of this terminology, see Ref. [12]], which can be used to model many kinds of undesired environmental effects, typically due to linear interactions with random classical fluctuating fields.

entire

December 28, 2004

13:56



506

Eq. (34) has the following simple Fokker-Planck differential representa¯ for ordering tion [8] in terms of the generalized Wigner function Ws (α, α) parameter s 2 ¯ t) = Q(∂α α + ∂α¯ α) ¯ + 2Ds ∂α, ¯ t) , (36) ∂t Ws (α, α; α ¯ Ws (α, α; where Q = B − A and 2Ds = A + B + s(A − B). For nonunit quantum efficiency η and after a noise-diffusion time t the homodyne probability distribution pη (x; φ; t) can be evaluated as the marginal distribution of the Wigner function for ordering parameter s = 1 − η −1 , namely 6 +∞ % & dyW1−η−1 (x + iy)eiφ , (x − iy)e−iφ ; t . (37) pη (x; φ; t) = −∞

The solution of Eq. (36) is the Gaussian convolution [8] 7 8 6 2 d β |α − gβ|2 ¯ 0) , ¯ t) = exp − Ws (β, β; Ws (α, α; πδs2 δs2 Ds δs2 = (1 − e−2Qt ) , Q

(38)

and using Eq. (37) one obtains the homodyne probability distribution 3 4 6 ∞ dx (x − g −1 x)2 Qt 9 pη (x; φ; t) = e exp − pη (x ; φ), (39) 2 2 2∆ −1 −∞ 1−η 2π∆1−η−1 2 where ∆2η = 12 g −2 δ1−η −1 . It is easy to see that the generating function of the x ˆφ -moments with the experimental probability pη (x; φ; t) can be written in term of the probability distribution p(x; φ) for perfect homodyning as follows 6 +∞ dx pη (x; φ; t)eikx = −∞

6 +∞ 1 2 2 2 1−η 2 2 g k dx p(x; φ)eigkx . (40) exp − g ∆η k − 2 8η −∞

Eq. (40) has the same form of Eq. (8), but with the Fourier variable k multiplied by g and with an overall effective quantum efficiency η∗ given by 2A (g −2 − 1) . (41) B−A On the other hand, following the same lines that lead us to Eq. (9), we obtain the operator identity 6 π 6 dφ +∞ dx pη∗ (x; φ; t)Kη∗ (g −1 x − x ˆφ ) , (42) Uˆ ≡ Uˆ(0) = π −∞ 0 η∗−1 = η −1 + 4∆2η = g −2 η −1 +

entire

December 28, 2004

13:56



507

ˆ one should use which also means that when homodyning the operator O −1 Rη∗ (g x; φ) in place of Rη (x; φ), namely, more generally, one needs to re-scale the homodyne outcomes by the gain and use the effective quantum efficiency η∗ in Eq. (41). In terms of the gain g and of the input-output photon numbers, the effective quantum efficiency reads η∗−1 = η −1 + g −2 (2a† aout + η −1 ) − (2a† ain + η −1 ) .

(43)

In the case of pure displacement Gaussian noise (A = B), Eq. (43) becomes η∗−1 = η −1 + 2¯ n,

(44)

which means that the bound η∗ > 1/2 is surpassed already for n ¯ ≥ 1: in other worlds, it is just sufficient to have half photon of thermal noise to completely spoil the tomographic reconstruction. References [1] R. Baltin, J. Phys. A Math. Gen., 16 2721 (1983); Phys. Lett. A, 102 332 (1984). [2] G.M. D’Ariano, Quantum Semiclass. Opt., 7, 693 (1995). [3] G.M. D’Ariano, “Quantum estimation theory and optical detection,” in T. Hakioglu and A.S. Shumovsky, eds., Concepts and advances in quantum optics and spectroscopy of solids, Kluwer, Amsterdam, 1996. [4] G.M. D’Ariano, “Measuring quantum states,” in the same book of Ref. ([3]). [5] G.M. D’Ariano, U. Leonhardt, and H. Paul, Phys. Rev. A, 52 R1801 (1995). [6] G.M. D’Ariano and C. Macchiavello, Phys. Rev. A, 48 3947 (1993). [7] G.M. D’Ariano, C. Macchiavello, and M.G.A. Paris, Phys. Rev. A, 50, 4298 (1994); Phys. Lett., 195A, 31 (1994). [8] G.M. D’Ariano, C. Macchiavello, and M.G.A. Paris, “Information gain in quantum communication channels,” In V.P. Belavkin, O. Hirota and R.L. Hudson, eds., Quantum Communication and Measurement, 339, Plenum Press, New York and London, 1995. [9] G.M. D’Ariano, C. Macchiavello, and N. Sterpi, “Systematic and statistical errors in homodyne measurements of the density matrix,” submitted to Phys. Rev. A. [10] G.M. D’Ariano and H.P. Yuen, Phys. Rev. Lett., 76 2832, (1996). [11] I.S. Gradshteyn and I.M. Ryzhik, Table of integrals, series, and products, Academic Press, 1980. [12] M. J. W. Hall, Phys. Rev. A, 50 3295 (1994). [13] M.J.W. Hall, “Phase and noise,” in V.P. Belavkin, O. hirota, and R.L. Hudson, eds., Quantum Communication and Measurement, 53-59, Plenum Press, New York and London, 1995. [14] U. Leonhardt, M. Munroe, T. Kiss, Th. Richter, and M.G. Raymer, Opt. Comm., 127, 144 (1996).

entire

December 28, 2004

508

13:56



[15] U. Leonhardt, H. Paul and G.M. D’Ariano, Phys. Rev. A, 52 4899 (1995); H. Paul, U. Leonhardt, and G.M. D’Ariano, Acta Phys. Slov., 45, 261 (1995). [16] M. Munroe, D. Boggavarapu, M. E. Anderson, and M. G. Raymer, Phys. Rev. A, 52, R924 (1995). [17] S. Schiller, G. Breitenbach, S.F. Pereira, T. MHuller, and J. Mlynek, Phys. Rev. Lett., 77 2933 (1996); see also: G. Breitenbach, S. Schiller, and J. Mlynek, Quantum state reconstruction of coherent light and squeezed light, on this volume. [18] D.T. Smithey, M. Beck, M.G. Raymer, and A. Faridani, Phys. Rev. Lett., 70, 1244 (1993). [19] Th. Richter, Phys. Lett. A, 221 327 (1996). [20] Th. Richter, Phys. Rev. A, 53 1197 (1996). [21] H. P. Yuen, Phys. Lett., 91A, 101 (1982). [22] Notice that for the complex random variable z = 2eiφ x the phase φ is a scanning parameter imposed by the detector. (Actually, the best way to experimentally scan the integral in Eq. (18) is just to pick up the phase φ at random.) Nevertheless, the argument of the complex number z is still a genuine random variable, because the sign of x is random, and depends on the value of φ. One has arg(z) = φ + π(1 − sgn(x)). For example, for any highly excited coherent state |α the probability distribution of arg(z) will approach a uniform distribution on [arg(α) − π/2, arg(α) + π/2]. [23] One should remember that, the phase φ is imposed by the detector, and is uniformly scanned (randomly or not) in the interval [0, π]. This leads to e2iφ = 0, independently on the state ,ˆ.

entire

December 28, 2004

13:56


CHAPTER 34

On the Measurement of Qubits Daniel F. V. James, Paul G. Kwiat, William J. Munro, and Andrew G. White Abstract. We describe in detail the theory underpinning the measurement of density matrices of a pair of quantum two-level systems (“qubits”). Our particular emphasis is on qubits realized by the two polarization degrees of freedom of a pair of entangled photons generated in a down-conversion experiment; however, the discussion applies in general, regardless of the actual physical realization. Two techniques are discussed, namely, a tomographic reconstruction (in which the density matrix is linearly related to a set of measured quantities) and a maximum likelihood technique which requires numerical optimization (but has the advantage of producing density matrices which are always non-negative definite). In addition, a detailed error analysis is presented, allowing errors in quantities derived from the density matrix, such as the entropy or entanglement of formation, to be estimated. Examples based on down-conversion experiments are used to illustrate our results.

1. Introduction The ability to create, manipulate, and characterize quantum states is becoming an increasingly important area of physical research, with implications for areas of technology such as quantum computing, quantum cryptography, and communications. With a series of measurements on a large enough number of identically prepared copies of a quantum system, one can infer, to a reasonable approximation, the quantum state of the system. Arguably, the first such experimental technique for determining the state of a quantum system was devised by George Stokes in 1852 [23]. His famous four parameters allow an experimenter to determine uniquely the polarization state of a light beam. With the insight provided by nearly 150 years of progress in optical physics, we can consider coherent light beams to be an Reprinted (abstract/excerpt/figure) with permission from Daniel F.V. James, Paul G. c Kwiat, William J. Munro, and Andrew G. White, Phys. Rev. A, 64, 052312, 2001. 2001 by The American Physical Society 509

entire

December 28, 2004

13:56


510 Daniel F. V. James, Paul G. Kwiat, William J. Munro, and Andrew G. White

ensemble of two-level quantum mechanical systems, the two levels being the two polarization degrees of freedom of the photons; the Stokes parameters allow one to determine the density matrix describing this ensemble. More recently, experimental techniques for the measurement of the more subtle quantum properties of light have been the subject of intensive investigation (see Ref. [15] for a comprehensive and erudite exposition of this subject). In various experimental circumstances it has been found reasonably straightforward to devise a simple linear tomographic technique in which the density matrix (or Wigner function) of a quantum state is found from a linear transformation of experimental data. However, there is one important drawback to this method, in that the recovered state might not correspond to a physical state because of experimental noise. For example, density matrices for any quantum state must be Hermitian, positive semi-definite matrices with unit trace. The tomographically measured matrices often fail to be positive semi-definite, especially when measuring low-entropy states. To avoid this problem the “maximum likelihood” tomographic approach to the estimation of quantum states has been developed [8, 24, 2, 9, 20]. In this approach the density matrix that is “mostly likely” to have produced a measured data set is determined by numerical optimization. In the past decade several groups have successfully employed tomographic techniques for the measurement of quantum mechanical systems. In 1990 Risley et al. reported the measurement of the density matrix for the nine sublevels of the n = 3 level of hydrogen atoms formed following collision between H+ ions and He atoms, in conditions of high symmetry which simplified the tomographic problem [1]. Since then, in 1993 Smithey et al. made a homodyne measurement of the Wigner function of a single mode of light [19]. Other explorations of the quantum states of single mode light fields have been made by Mlynek et al. [11] and Bachor et al.[30]. Other quantum systems whose density matrices have been investigated experimentally include the vibrations of molecules [6], the motion ions and atoms [14, 10], and the internal angular momentum quantum state of the F = 4 ground state of a cesium atom [12]. The quantum states of multiple spin1 2 nuclei have been measured in the high-temperature regime using NMR techniques [5], albeit in systems of such high entropy that the creation of entangled states is necessarily precluded [4]. The measurement of the quantum state of entangled qubit pairs, realized using the polarization degrees of freedom of a pair of photons created in a parametric down-conversion experiment, was reported by us recently [26]. In this paper we will examine techniques in detail for quantum state

entire

December 28, 2004

13:56


On the Measurement of Qubits

511

measurement as it applies to multiple correlated two-level quantum mechanical systems (or “qubits” in the terminology of quantum information). Our particular emphasis is qubits realized via the two polarization degrees of freedom of photons, data from which we use to illustrate our results. However, these techniques are readily applicable to other technologies proposed for creating entangled states of pairs of two-level systems. Because of the central importance of qubit systems to the emergent discipline of quantum computation, a thorough explanation of the techniques needed to characterize the qubit states will be of relevance to workers in the various diverse experimental fields currently under consideration for quantum computation technology [31]. This paper is organized as follows: In Section 2 we explore the analogy with the Stokes parameters, and how they lead naturally to a scheme for measurement of an arbitrary number of two-level systems. In Section 3, we discuss the measurement of a pair of qubits in more detail, presenting the validity condition for an arbitrary measurement scheme and introducing the set of 16 measurements employed in our experiments. Section 4 deals with our method for maximum likelihood reconstruction and in Section 5 we demonstrate how to calculate the errors in such measurements, and how these errors propagate to quantities calculated from the density matrix.

2. The Stokes Parameters and Quantum State Tomography As mentioned above, there is a direct analogy between the measurement of the polarization state of a light beam and the measurement of the density matrix of an ensemble of two-level quantum mechanical systems. Here we explore this analogy in more detail.

2.1. Single qubit tomography The Stokes parameters are defined from a set of four intensity measurements [7]: (i) with a filter that transmits 50% of the incident radiation, regardless of its polarization; (ii) with a polarizer that transmits only horizontally polarized light; (iii) with a polarizer that transmits only light polarized at 45o to the horizontal; and (iv) with a polarizer that transmits only right circularly polarized light. The number of photons counted by a detector, which is proportional to the classical intensity, in these four experiments

entire

December 28, 2004

13:56



are as follows: N N (H|ˆ ρ|H + V |ˆ ρ|V ) = (R|ˆ ρ|R + L|ˆ ρ|L) , n0 = 2 2 N n1 = N (H|ˆ (R|ˆ ρ|R + R|ˆ ρ|L + L|ˆ ρ|R + L|ˆ ρ|L) , ρ|H) = 2 % & ¯ ρ|D ¯ = N (R|ˆ n2 = N D|ˆ ρ|R + L|ˆ ρ|L − iL|ˆ ρ|R + iR|ˆ ρ|L) , 2 n3 = N (R|ˆ ρ|R) . (1) √ √ ¯ = (|H − |V ) / 2 = exp(iπ/4) (|R + i|L) / 2 and Here |H, |V , |D √ |R = (|H − i|V ) / 2 are the kets representing qubits polarized in the linear horizontal, linear vertical, linear diagonal (45o ), and right-circular senses respectively, ρˆ is the (2 × 2) density matrix for the polarization degrees of the light (or for a two-level quantum system), and N is a constant dependent on the detector efficiency and light intensity. The Stokes parameters, which fully characterize the polarization state of the light, are then defined by ρ|R + L|ˆ ρ|L) , S0 ≡ 2n0 = N (R|ˆ S1 ≡ 2 (n1 − n0 ) = N (R|ˆ ρ|L + L|ˆ ρ|R) , ρ|L − L|ˆ ρ|R) , S2 ≡ 2 (n2 − n0 ) = N i (R|ˆ ρ|R − L|ˆ ρ|L) . S3 ≡ 2 (n3 − n0 ) = N (R|ˆ

(2)

We can now relate the Stokes parameters to the density matrix ρˆ by the formula 3 1 Si ρˆ = σ î , (3) 2 i=0 S0 where σ ˆ0 = |RR| + |LL| is the single qubit identity operator and σ ˆ1 = ˆ3 = |RR| − |LL| are |RL| + |LR|, σ ˆ2 = i(|LR| − |RL|, and σ the Pauli spin operators. Thus the measurement of the Stokes parameters can be considered equivalent to a tomographic measurement of the density matrix of an ensemble of single qubits. 2.2. Multiple beam Stokes parameters: Multiple qubit tomography The generalization of the Stokes scheme to measure the state of multiple photon beams (or multiple qubits) is reasonably straightforward. One should, however, be aware that importance differences exist between the one-photon and the multiple photon cases. Single photons, at least in the

entire

December 28, 2004

13:56



513

current context, can be described in a purely classical manner, and the density matrix can be related to the purely classical concept of the coherency matrix [28]. For multiple photons one has the possibility of nonclassical correlations occurring, with quintessentially quantum mechanical phenomena such as entanglement being present. We will return to the concept of entanglement and how it may be measured later in this paper. An n-qubit state is characterized by a density matrix which may be written as follows: ρˆ =

1 2n

3

ri1 ,i2 ,...in σ î1 ⊗ σ î2 ⊗ . . . ⊗ σ în ,

(4)

i1 ,i2 ,...in =0

where the 4n parameters ri1 ,i2 ,...in are real numbers. The normalization property of the density matrices requires that r0,0,...0 = 1, and so the density matrix is specified by 4n − 1 real parameters. The symbol ⊗ represents the tensor product between operators acting on the Hilbert spaces associated with the separate qubits. As Stokes showed, the state of a single qubit can be determined by taking a set of four projection measurements which are represented by the four ¯ D|, ¯ µ ˆ1 = |HH|, µ ˆ2 = |D ˆ3 = |RR|. operators µ ˆ0 = |HH| + |V V |, µ Similarly, the state of two qubits can be determined by the set of 16 meaˆ j (i, j = 0, 1, 2, 3). More genersurements represented by the operators µ î ⊗ µ ally the state of an n-qubit system can be determined by 4n measurements ˆ i2 ⊗. . .⊗ µ în (ik = 0, 1, 2, 3 and k = 1, 2, . . . n). given by the operators µ î1 ⊗ µ This “tree” structure for multi-qubit measurement is illustrated in Fig. 1. µ^ 0

1 qubit 2 qubits

3 qubits

µ^ 0

µ^ 1

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 2

µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3 •• •

µ^ 1 µ^ 0

µ^ 1

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 2 µ^ 2

µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0

µ^ 1

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 3 µ^ 2

µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0

µ^ 1

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 2

µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

µ^ 0 µ^ 1µ^ 2µ^ 3

................................................................................................................................................................................................................................................

Fig. 1. Tree diagram representing number and type of measurements necessary for toˆ1 , µ ˆ2 , µ ˆ3 } suffice to reconstruct mography. For a single qubit, the measurements {ˆ µ0 , µ the state, e.g., measurements of the horizontal, vertical, diagonal and right-circular polarization components, (H,V,D,R). For two qubits, 16 double-coincidence measurements ˆ0 , µ ˆ0 µ ˆ1 , . . . , µ ˆ3 µ ˆ3 }), increasing to 64 three-coincidence measurements are necessary ({ˆ µ0 µ ˆ0 µ ˆ0 , µ ˆ0 µ ˆ0 µ ˆ1 , . . . , µ ˆ3 µ ˆ3 µ ˆ3 }), and so on, as shown. for three qubits ({ˆ µ0 µ

entire

December 28, 2004

13:56



The proof of this conjecture is reasonably straightforward. The outcome of a measurement is given by the formula n = N Tr {ˆ ρµ ˆ} ,

(5)

where ρˆ is the density matrix, µ ˆ is the measurement operator, and N is a constant of proportionality which can be determined from the data. Thus in our n-qubit case the outcomes of the various measurement are ρ (ˆ µi1 ⊗ µ î2 ⊗ . . . ⊗ µ în )} . ni1 ,i2 ,...in = N Tr {ˆ

(6)

Substituting from Eq. (4) we obtain ni1 ,i2 ,...in

N = n 2

3

Tr {ˆ µi1 σ ˆj1 } Tr {ˆ µi2 σ ˆj2 } . . . Tr {ˆ µin σ ˆjn } ri1 ,i2 ,...in .

j1 ,j2 ,...jn =0

(7) As can be easily verified, the single qubit measurement operators µ î are 3 î = j=0 Υij σ ˆj , where linear combinations of the Pauli operators σ ˆj , i.e., µ Υij are the elements of the matrix   1 0 0 0  1/2 1/2 0 0   Υ= (8)  1/2 0 1/2 0  . 1/2 0 0 1/2 ˆj } = 2δij (where δij is the Kronecker Further, we have the relation Tr {ˆ σi σ delta). Hence Eq. (7) becomes 3

ni1 ,i2 ,...in = N

Υi1 j1 Υi2 j2 . . . Υin jn ri1 ,i2 ,...in .

(9)

j1 ,j2 ,...jn =0

Introducing the left inverse of the matrix Υ, defined so that 3 −1 )ik Υkj = δij and whose elements are k=0 (Υ   1 000  −1 2 0 0   (10) Υ−1 =   −1 0 2 0  , −1 0 0 2 we can find a formula for the parameters ri1 ,i2 ,...in in terms of the measured quantities ni1 ,i2 ,...in , viz., N ri1 ,i2 ,...in =

3

% % −1 & % −1 & & . . . Υ−1 in jn ni1 ,i2 ,...in Υ Υ i1 j1 i2 j2

j1 ,j2 ,...jn =0

≡ Si1 ,i2 ,...in .

(11)

entire

December 28, 2004

13:56



515

In Eq. (11) we have introduced the n-photon Stokes parameter Si1 ,i2 ,...in , defined in an analogous manner to the single photon Stokes parameters given in Eq. (2). Since, as already noted, r0,0,...0 = 1, we can make the identification S0,0,...0 = N , and so the density matrix for the n-qubit system can be written in terms of the Stokes parameters as follows: ρˆ =

1 2n

3 i1 ,i2 ,...in

Si1 ,i2 ,...in σ î1 ⊗ σ î2 ⊗ . . . ⊗ σ în . S0,0,...0 =0

(12)

This is a recipe for measurement of the density matrices which, assuming perfect experimental conditions and the complete absence of noise, will always work. It is important to realize that the set of four Stokes measureˆ1 , µ ˆ2 , µ ˆ3 } are not unique: there may be circumstances in which ments {ˆ µ0 , µ it is more convenient to use some other set, which is equivalent. A more ˆ1 = |V V |, typical set, at least in optical experiments, is µ ˆ0 = |HH|, µ ˆ3 = |RR|. µ ˆ2 = |DD|, µ In the following section we will explore more general schemes for the measurement of two qubits, starting with a discussion, in some detail, of how the measurements are actually performed. 3. Generalized Tomographic Reconstruction of the Polarization State of Two Photons 3.1. Experimental set-up The experimental arrangement used in our experiments is shown schematically in Fig. 2. An optical system consisting of lasers, polarization elements, and nonlinear optical crystals (collectively characterized for the purposes of this paper as a “black-box,”) is used to generate pairs of qubits in an almost arbitrary quantum state of their polarization degrees of freedom. A full description of this optical system and how such quantum states can be prepared can be found in Ref. [13, 27, 3] ∗ . The output of the black box consists of a pair of beams of light, whose quanta can be measured by means of photodetectors. To project the light beams onto a polarization state of the experimenter’s choosing, three optical elements are placed in the beam ∗ It

is important to realize that the entangled photon pairs are produced in a nondeterministic manner: one cannot specify with certainty when a photon pair will be emitted; indeed there is a small probability of generating four or six or higher number of photons. Thus we can only postselectively generate entangled photon pairs: i.e., one only knows that the state was created after if has been measured.

entire

December 28, 2004

13:56



in front of each detector: a polarizer (which transmits only vertically polarized light), a quarter-wave plate, and a half-wave plate. The angles of the fast axes of both of the wave plates can be set arbitrarily, allowing the |V projection state fixed by the polarizer to be rotated into any polarization state that the experimenter may wish.

photon detector polarizer

“Black Box” source of photon pairs in arbitrary quantum states

QWP HWP

Coincidence Detector output

polarizer photon detector

Fig. 2. Schematic illustration of the experimental arrangement. QWP stands for quarter-wave plate, HWP for half-wave plate; the angles of both pairs of wave plates can be set independently giving the experimenter four degrees of freedom with which to set the projection state. In the experiment, the polarizers were realized using polarizing prisms, arranged to transmit vertically polarized light.

Using the Jones calculus notation, with the convention, 0 1 = |V, = |H, 1 0

(13)

where |V (|H) is the ket for a vertically (horizontally) polarized beam, the effect of quarter- and half-wave plates whose fast axes are at angles q and h with respect to the vertical axis, respectively, are given by the 2 × 2 matrices i−cos(2q) sin(2q) ˆQW P (q) = √1 ˆHW P (q) = cos(2h) −sin(2h) . U , U −sin(2h) −cos(2h) sin(2q) i+cos(2q) 2 Thus the projection state for the measurement in one of the beams is given by (1) ˆQW P (q) · U ˆHW P (h) · 0 = a(h, q)|H + b(h, q)|V, |ψproj (h, q) = U 1

entire

December 28, 2004

13:56



517

where, neglecting an overall phase, the functions a(h, q) and b(h, q) are given by 1 a(h, q) = √ (sin(2h) − i sin[2(h − q)]) , 2 1 b(h, q) = − √ (cos(2h) + i cos[2(h − q)]) . (14) 2 The projection state for the two beams is given by (2)

(1)

(1)

|ψproj (h1 , q1 , h2 , q2 ) = |ψproj (h1 , q1 ) ⊗ |ψproj (h2 , q2 ) = a(h1 , q1 )a(h2 , q2 )|HH + a(h1 , q1 )b(h2 , q2 )|HV +b(h1 , q1 )a(h2 , q2 )|VH + b(h1 , q1 )b(h2 , q2 )|VV.

(15)

We shall denote the projection state corresponding to one particular set of wave plate angles {h1,ν , q1,ν , h2,ν , q2,ν } by the ket |ψν ;† thus the projection measurement is represented by the operator µ ˆν = |ψν ψν |. Consequently, the average number of coincidence counts that will be observed in a given experimental run is ρ|ψν nν = N ψν |ˆ

(16)

where ρˆ is the density matrix describing the ensemble of qubits, and N is a constant dependent on the photon flux and detector efficiencies. In what follows, it will be convenient to consider the quantities sν defined by ρ|ψν . sν = ψν |ˆ

(17)

3.2. Tomographically complete set of measurements In Section 2 we have given one possible set of projection measurements {|ψν ψν |} which uniquely determine the density matrix ρˆ. However, one can conceive of situations in which these will not be the most convenient set of measurements to make. Here we address the problem of finding other sets of suitable measurements. The smallest number of states required for such measurements can be found by a simple argument: there are 15 real unknown parameters which determine a 4 × 4 density matrix, plus there is the single unknown real parameter N , making a total of 16. In order to proceed it is helpful to convert the 4 × 4 matrix ρˆ into a 16-dimensional column vector. To do this we use a set of 16 linearly † Here the first subscript on the wave plate angle refers one of the two photon beams; the second subscript distinguishes which of the 16 different experimental states is under consideration.

entire

December 28, 2004

13:56



ˆ ν } which have the following mathematical independent 4 × 4 matrices {Γ properties:

ˆν · Γ ˆ µ = δν,µ , Tr Γ

16

Aˆ =

ˆ ν · Aˆ ˆ ν Tr Γ Γ

ˆ ∀A,

(18)

ν=1

ˆ ν matrices is in fact where Aˆ is an arbitrary 4 × 4 matrix. Finding a set of Γ reasonably straightforward: for example, the set of (appropriately normalized) generators of the Lie algebra SU(2)⊗ SU(2) fulfill the required criteria (for reference, we list this set in Appendix A). These matrices are of course ˆj (i, j = 0, 1, 2, 3) simply a relabeling of the two-qubit Pauli matrices σ î ⊗ σ discussed above. Using these matrices the density matrix can be written as ρˆ =

16

ˆ ν rν , Γ

(19)

ν=1

where rν is the νth element of a 16-element column vector, given by the formula

ˆ ν · ρˆ (20) rν = Tr Γ Substituting Eq. (19) into Eq. (16), we obtain the following linear relationship between the measured coincidence counts nν and the elements of the vector rµ : nν = N

16

Bν,µ rµ

(21)

µ=1

where the 16 × 16 matrix Bν,µ is given by ˆ µ |ψν . Bν,µ = ψν |Γ

(22)

Immediately we find a necessary and sufficient condition for the completeness of the set of tomographic states {|ψν }: if the matrix Bν,µ is nonsingular, then Eq. (21) can be inverted to give rν = (N )−1

16 % −1 & B n . ν,µ µ

(23)

µ=1

The set of 16 tomographic states that we employed are given in Table 1. They can be shown to satisfy the condition that Bν,µ is nonsingular. By no means are these states unique in this regard: these were the states chosen principally for experimental convenience.

entire

December 28, 2004

13:56



ν 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Mode 1 |H |H |V |V |R |R |D |D |D |D |R |H |V |V |H |R

Mode 2 |H |V |V |H |H |V |V |H |R |D |D |D |D |L |L |L

h1 45o 45o 0 0 22.5o 22.5o 22.5o 22.5o 22.5o 22.5o 22.5o 45o 0 0 45o 22.5o

q1 0 0 0 0 0 0 45o 45o 45o 45o 0 0 0 0 0 0

519

h2 45o 0 0 45o 45o 0 0 45o 22.5o 22.5o 22.5o 22.5o 22.5o 22.5o 22.5o 22.5o

q2 0 0 0 0 0 0 0 0 0 45o 45o 45o 45o 90o 90o 90o

Table 1: The tomographic analysis states used in our experiments. The number of coincidence counts measured in projections measurements provide a set of 16 data that allow the density matrix of the √ state of the two modes √ to be estimated. We have used √ the notation |D ≡ (|H + |V) / 2, |L ≡ (|H + i|V) / 2 and |R ≡ (|H − i|V) / 2. Note that, when the measurement are taken in the order given by the table, only one wave plate angle has to be changed between measurements.

These states can be realized by setting specific values of the half- and quarter-wave plate angles. The appropriate values of these angles (measured from the vertical) are given in Table 1. Note that overall phase factors do not affect the results of projection measurements. Substituting Eq. (23) into Eq. (19), we find that ρˆ = (N )−1

16

ˆ ν nν = M

ν=1

16

ˆ ν sν , M

(24)

ν=1

ˆ ν are defined by where the sixteen 4 × 4 matrices M ˆν = M

16 % −1 & ˆ . Γ B ν,µ µ

(25)

ν=1

ˆ ν matrices allows a compact form of linear toThe introduction of the M mographic reconstruction, Eq. (24), which will be most useful in the error

entire

December 28, 2004

13:56



ˆ ν matrices, valid for our set of tomographic analysis that follows. These M states, are listed in Appendix B, together with some of their important properties. We can use one of these properties, Eq. (80), to obtain the value of the unknown quantity N . That relationship implies ˆ ν }|ψν ψν |ˆ Tr{M ρ = ρˆ. (26) ν

Taking the trace of this formula, and multiplying by N we obtain: ˆ ν }nν = N . Tr{M

(27)

ν

For our set of tomographic states, it can be shown that ˆ ν } = 1 if ν = 1, 2, 3, 4 Tr{M 0 if ν = 5, . . . 16;

(28)

hence the value of the unknown parameter N in our experiments is given by: N =

4

nν = N (HH|ˆ ρ|HH + HV |ˆ ρ|HV + V H|ˆ ρ|V H + V V |ˆ ρ|V V ).

ν=1

Thus we obtain the final formula for the tomographic reconstruction of the density matrices of our states: 16 !J 4 ! ˆ ν nν M ρˆ = nν . (29) ν=1

ν=1

As an example, the following set of 16 counts were taken for the purpose of tomographically determining the density matrix for an ensemble of qubits all prepared in a specific quantum state: n1 = 34749, n2 = 324, n3 = 35805, n4 = 444, n5 = 16324, n6 = 17521, n7 = 13441, n8 = 16901, n9 = 17932, n10 = 32028, n11 = 15132, n12 = 17238, n13 = 13171, n14 = 17170, n15 = 16722, n16 = 33586. Applying Eq. (29) we find 

 0.4872 −0.0042 + i0.0114 −0.0098 − i0.0178 0.5192 + i0.0380  −0.0042 − i0.0114 0.0045 0.0271 − i0.0146 −0.0648 − i0.0076  . ρˆ =   −0.0098 + i0.0178 0.0271 + i0.0146 0.0062 −0.0695 + i0.0134  0.5192 − i0.0380 −0.0648 + i0.0076 −0.0695 − i0.0134 0.5020

(30) This matrix is shown graphically in Fig. 3(left). Note that, by construction, the density matrix is normalized, i.e., Tr{ρˆ} = 1 and Hermitian, i.e., ρˆ† = ρˆ . However, when one calculates the

entire

December 28, 2004

13:56



521

eigenvalues of this measured density matrix, one finds the values 1.02155, 0.0681238, -0.065274, and -0.024396; and also, Tr{ρˆ2 } = 1.053. Density matrices for all physical states must have the property of positive semidefiniteness, which (in conjunction with the normalization and Hermiticity properties) implies that all of the eigenvalues must lie in the interval [0, 1], their sum being 1; this in turn implies that 0 ≤ Tr{ρˆ2 } ≤ 1. Clearly, the density matrix reconstructed above by linear tomography violates these condition. From our experience of tomographic measurements of various mixed and entangled states prepared experimentally, this seems to happen roughly 75% of the time for low entropy, highly entangled states; it seems to have a higher probability of producing the correct result for states of higher entropy, but the cautious experimenter should check every time. The obvious culprit for this problem is experimental inaccuracies and statistical fluctuations of coincidence counts, which mean that the actual numbers of counts recorded in a real experiment differ from those that can be calculated by Eq. (16). Thus the linear reconstruction is of limited value for states of low entropy (which are of most experimental interest because of their application to quantum information technology); however, as we shall see, the linear approach does provide a useful starting point for the numerical optimization approach to density matrix estimation which we will discuss in the next section. 4. Maximum Likelihood Estimation As mentioned in Section 3, the tomographic measurement of density matrices can produce results which violate important basic properties such as positivity. To avoid this problem, the maximum likelihood estimation of density matrices may be employed. Here we describe a simple realization of this technique. 4.1. Basic approach Our approach to the maximum likelihood estimation of the density matrix is as follows. (i) Generate a formula for an explicitly “physical” density matrix, i.e., a matrix that has the three important properties of normalization, Hermiticity, and positivity. This matrix will be a function of 16 real variables (denoted {t1 , t2 , . . . , t16 }). We will denote the matrix as ρˆp (t1 , t2 , . . . , t16 ). (ii) Introduce a “likelihood function” which quantifies how good the density matrix ρˆp (t1 , t2 , . . . t16 ) is in relation to the experimental data.

entire

December 28, 2004

13:56



This likelihood function is a function of the 16 real parameters tν and of the 16 experimental data nν . We will denote this function as L(t1 , t2 , . . . , t16 ; n1 , n2 , . . . , n16 ). (iii) Using standard numerical optimization techniques, find the op(opt) (opt) (opt) timum set of variables {t1 , t2 , . . . , t16 } for which the function L(t1 , t2 , . . . , t16 ; n1 , n2 , . . . n16 ) has its maximum value. The best estimate (opt) (opt) (opt) for the density matrix is then ρˆ(t1 , t2 , . . . , t16 ). The details of how these three steps can be carried out are described in the next three subsections. 4.2. Physical density matrices The property of non-negative definiteness for any matrix Gˆ is written mathematically as ˆ ψ|G|ψ ≥0

∀|ψ.

(31)

Any matrix that can be written in the form Gˆ = Tˆ † Tˆ must be non-negative definite. To see that this is the case, substitute into Eq. (31) ψ|Tˆ† Tˆ |ψ = ψ |ψ ≥ 0,

(32)

where we have defined |ψ = Tˆ |ψ. Furthermore, (Tˆ † Tˆ )† = Tˆ † (Tˆ † )† = Tˆ † Tˆ , i.e., Gˆ = Tˆ † Tˆ must be Hermitian. To ensure normalization, one can simply divide by the trace: thus the matrix gˆ given by the formula gˆ = Tˆ † Tˆ/Tr{Tˆ † Tˆ }

(33)

has all three of the mathematical properties that we require for density matrices. For the two-qubit system, we have a 4 × 4 density matrix with 15 independent real parameters. Since it will be useful to be able to invert relation (33), it is convenient to choose a tridiagonal form for Tˆ : 

0 0 t1  t5 + it6 t 0 2 Tˆ(t) =   t11 + it12 t7 + it8 t3 t15 + it16 t13 + it14 t9 + it10

 0 0 . 0 t4

(34)

Thus the explicitly “physical” density matrix ρˆp is given by the formula ρˆp (t) = Tˆ † (t)Tˆ (t)/Tr{Tˆ † (t)Tˆ (t)}.

(35)

entire

December 28, 2004

13:56



523

For future reference, the inverse relationship, by which the elements of Tˆ can be expressed in terms of the elements of ρˆ, is as follows:  9  ∆ 0 0 0 (1) M11       @   (1) (1) M11  9 M12 0 0  (2)   (1) (2) M11,22  M11 M11,22    (36) Tˆ =  . @   (2) (2) (2)   M12,23 M11,23 M11,22  0   √ρ 9 M(2) √ρ 9 M(2)  ρ44 44  44  11,22 11,22     √ √ρ41 √ρ42 √ρ43 ρ 44 ρ44 ρ44 ρ44 (1)

Here we have used the notation ∆ = Det(ˆ ρ); Mij is the first minor of ρˆ, i.e., the determinant of the 3 × 3 matrix formed by deleting the ith row and (2) j-th column of ρˆ; Mij,kl is the second minor of ρˆ, i.e., the determinant of the 2 × 2 matrix formed by deleting the ith and kth rows and jth and lth columns of ρˆ (i "= k and j "= l). 4.3. The likelihood function The measurement data consists of a set of 16 coincidence counts nν (ν = ρ|ψν . Let us assume 1, 2, . . . , 16) whose expected value is n ¯ ν = N ψν |ˆ that the noise on these coincidence measurements has a Gaussian probability distribution. Thus the probability of obtaining a set of 16 counts {n1 , n2 , . . . , n16 } is 7 8 16 ¯ ν )2 1 (nν − n exp − P (n1 , n2 , . . . , n16 ) = , (37) Nnorm ν=1 2σν2 where σν is the standard deviation for νth coincidence measurement (given √ ¯ ν ) and Nnorm is the normalization constant. For our approximately by n physical density matrix ρˆp the number of counts expected for the νth measurement is ρp (t1 , t2 , . . . , t16 ) |ψν . n ¯ ν (t1 , t2 , . . . , t16 ) = N ψν |ˆ

(38)

Thus the likelihood that the matrix ρˆp (t1 , t2 , . . . , t16 ) could produce the measured data {n1 , n2 , . . . , n16 } is P (n1 , n2 , . . . , n16 ) = 1 Nnorm

7 8 (N ψν |ˆ ρp (t1 , t2 , . . . , t16 ) |ψν − nν )2 exp − , 2N ψν |ˆ ρp (t1 , t2 , . . . , t16 ) |ψν ) ν=1 16

(39)

entire

December 28, 2004

13:56



4 where N = ν=1 Nν . Rather than find maximum value of P (t1 , t2 , . . . , t16 ) it simplifies things somewhat to find the maximum of its logarithm (which is mathematically equivalent).‡ Thus the optimization problem reduces to finding the minimum of the following function: L (t1 , t2 , . . . , t16 ) =

16 (N ψν |ˆ ρp (t1 , t2 , . . . , t16 ) |ψν − nν )2 ν=1

2N ψν |ˆ ρp (t1 , t2 , . . . , t16 ) |ψν )

.

(40)

This is the “likelihood” function that we employed in our numerical optimization routine. 4.4. Numerical optimization We used the Mathematica 4.0 routine FindMinimum which executes a multidimensional Powell direction set algorithm (see Ref. [18] for a description of this algorithm). To execute this routine, one requires an initial estimate for the values of t1 , t2 , . . . , t16 . For this, we used the tomographic estimate of the density matrix in the inverse relation (36), allowing us to determine a set of values for t1 , t2 , . . . , t16 . Since the tomographic density matrix may not be nonnegative definite, the values of the tν ’s deduced in this manner are not necessarily real. Thus for our initial guess we used the real parts of the tν ’s deduced from the tomographic density matrix. For the example given in Section 2, the maximum likelihood estimate is 

 0.5069 −0.0239 + i0.0106 −0.0412 − i0.0221 0.4833 + i0.0329  −0.0239 − i0.0106 0.0048 0.0023 + i0.0019 −0.0296 − i0.0077  . ρˆ =   −0.0412 + i0.0221 0.0023 − i0.0019 0.0045 −0.0425 + i0.0192  0.4833 − i0.0329 −0.0296 + i0.0077 −0.0425 − i0.0192 0.4839

(41) This matrix is illustrated in Fig. 3(right). In this case, the matrix has eigenvalues 0.986022, 0.0139777, 0, and 0; and Tr{ρˆ2 } = 0.972435, indicating that, while the linear reconstruction gave a nonphysical density matrix, the maximum likelihood reconstruction gives a legitimate density matrix. 5. Error Analysis In this section we present an analysis of the errors inherent in the tomographic scheme described in Section 3. Two sources of error are found to ‡ Note

that here we neglect the dependence of the normalization constant on t1 , t2 , . . . , t16 , which only weakly affects solution for the most likely state.

entire

December 28, 2004

13:56



525

0.6

0.6

0.4

0.4 0.2

HH

0.2

HV

0

HH HV

0

VH

VH HH

HH VH

VH

VV

HH

VV

HH VH

VH

0.6

0.6

0.4

0.4 0.2

HH

0.2

HV

0

HH HV

0

VH

VH HH

HH VH

VH

VV

HH

VV

HH VH

VH

Fig. 3. Graphical representation of the density matrix of a state as estimated by linear tomography (left) and by maximum likelihood tomography (right) from the experimental data given in the text. The upper plot is the real part of ρˆ, the lower plot the imaginary part.

be important: the shot noise error in the measured coincidence counts nν and the uncertainty in the settings of the angles of the wave plates used to make the tomographic projection states. We will analyze these two sources separately. In addition to determining the density matrix of a pair of qubits, one is often also interested in quantities derived from the density matrix, such as the entropy or the entanglement of formation. For completeness, we will also derive the errors in some of these quantities. 5.1. Errors due to count statistics From Eq. (29) we see that the density matrix is specified by a set of 16 parameters sν defined by sν = nν /N ,

4

(42)

where nν are the measured coincidence counts and N = ν=1 nν . We can determine the errors in sν using the following formula [16]: 16 ∂sν ∂sµ δsν δsµ = δnλ δnκ , (43) ∂nλ ∂nκ λ,κ=1

entire

December 28, 2004

13:56



where the overbar denotes the ensemble average of the random uncertainties δsν and δnλ . The measured coincidence counts nλ are statistically independent Poissonian random variables, which implies the following relation: δnλ δnκ = nλ δλ,κ ,

(44)

where δλ,κ is the Kronecker delta. Taking the derivative of Eq. (42), we find that ∂sµ 1 nµ = δµν − 2 Dν , ∂nν N N where Dν =

4 λ=1

(45)

  1 if 1 ≤ ν ≤ 4 δλ,ν =



(46) 0 if 5 ≤ ν ≤ 16.

Substituting Eq. (45) into Eq. (43) and using Eq. (44), we obtain the result nµ nν nµ δsν δsµ = 2 δν,µ + (1 − Dµ − Dν ). (47) N N3 In most experimental circumstances N g1, and so the second term in Eq. (47) is negligibly small in comparison to the first. We shall therefore ignore it, and use the approximate expression in the subsequent discussion: nµ sµ δν,µ . δsν δsµ ≈ 2 δν,µ ≡ (48) N N 5.2. Errors due to angular settings uncertainties Using the formula (17) for the parameters sν we can find the dependence of the measured density matrix on errors in the tomographic states. The derivative of sν with respect to some generic wave plate settings angle θ is ∂ ∂ ∂sν = ψν | ρˆ|ψν + ψν |ˆ |ψν , ρ (49) ∂θ ∂θ ∂θ where |ψν is the ket of the νth projection state [see Eq. (15)]. Substituting Eq. (24) we find 7 8 16 ∂ ∂ ∂sν sµ (50) = ψν | Mˆµ |ψν + ψν |Mˆµ |ψν . ∂θ ∂θ ∂θ µ=1 For convenience, we shall label the four wave plate angles {h1,ν , q1,ν , h2,ν , q2,ν }, which specify the νth state by {θν,1 , θν,2 , θν,3 , θν,4 }, respectively. Clearly the µth state does not depend on any of the νth set

entire

December 28, 2004

13:56



527

of angles. Thus we obtain the following expression for the derivatives of sν with respect to wave plate settings: 16 ∂sν (i) = δν,λ sµ fν,µ , ∂θλ,i µ=1

where

(i) fν,µ =

(51)

∂ ∂ ψν | Mˆµ |ψν + ψν |Mˆµ |ψν . ∂θν,i ∂θν,i

(52)

(i)

The 1024 quantities fν,µ can be determined by taking the derivatives of the functional forms of the tomographic states given by Eqs. (14) and (15), and evaluating those derivatives at the appropriate values of the arguments (see Table 1). The errors in the angles are assumed to be uncorrelated, as would be the case if each wave plate were adjusted for each of the 16 measurements. In reality, for qubit experiments, only one or two of the four wave plates are adjusted between one measurement and the next. However, the assumption of uncorrelated angular errors greatly simplifies the calculation (which is, after all, only an estimate of the errors), and seems to produce reasonable figures for our error bars.§ Thus, with the assumption δθν,i δθµ,j = δν,µ δi,j (∆θ)2

(53)

(where ∆θ is the rms uncertainty in the setting of the wave plate, with an estimated value of 0.25o for our apparatus), we obtain the following expression for the errors in sν due to angular settings: δsν δsµ = δν,µ

4 16

(i)

(i) fν,ε fν,λ sε sλ .

(54)

i=1 ε,λ=1

Combining Eqs. (54) and (48) we obtain the following formula for the total error in the quantities sν : δsν δsµ = δν,µ Λν where

 Λν = 

§ In

4 16

sν + N i=1

(55) 

(i) fν,ε fν,λ sε sλ  . (i)

(56)

ε,λ=1

other experimental circumstances, such as the measurement of the joint state of two spin-1/2 particles, the tomography would be realized by performing unitary operations on the spins prior to measurement. In this case, an assumption analogous to ours will be wholly justified.

entire

December 28, 2004

13:56



These 16 quantities can be calculated using the parameters sν and the (i) constants fν,ε . Note that the same result can be obtained by assuming a priori that the errors in sν are all uncorrelated, with Λν = δs2ν ; the more rigorous treatment given here is necessary, however, to demonstrate this fact. For a typical number of counts, say N = 10000, it is found that the contribution of errors from the two causes is roughly comparable; for larger numbers of counts, the angular settings will become the dominant source of error. Based on these results, the errors in the values of the various elements of the density matrix estimated by the linear tomographic technique described in Section 3 are as follows: 2

(∆ρi,j ) =

16 16 % &2 ∂ρi,j ∂ρi,j Mν(i,j) Λν δsν δsµ = ∂sν ∂sµ ν,µ=1 ν=1

(57)

ˆν. where Mν(i,j) is the i, j element of the matrix M A convenient way in which to estimate errors for a maximum likelihood tomographic technique (rather than a linear tomographic technique) is to employ the above formulas, with the slight modification that the parameter sν should be recalculated from Eq. (17) using the estimated density matrix ρêst . This does not take into account errors inherent in the maximum likelihood technique itself. 5.3. Errors in quantities derived from the density matrix When calculating the propagation of errors, it is actually more convenient to use the errors in the sν parameters [given by Eq. (56)], rather than the errors in the elements of density matrix itself (which have non-negligible correlations). 5.3.1. von Neumann entropy The von Neumann entropy is an important measure of the purity of a quantum state ρˆ. It is defined by [17] ρ)} = − S = −Tr {ˆ ρ log2 (ˆ

4

pa log2 pa ,

(58)

a=1

where pa is an eigenvalue of ρˆ, i.e., ρˆ|φa = pa |φa ,

(59)

entire

December 28, 2004

13:56



529

|φa being the ath eigenstate (a= 1,. . . , 4). The error in this quantity is given by 2 16 ∂S 2 (∆S) = Λν . (60) ∂sν ν=1 Applying the chain rule, we find 4 ∂pa ∂S ∂S = . ∂sν ∂sν ∂pa a=1

(61)

The partial differential of an eigenvalue can be easily found by perturbation theory. As is well known (e.g., [22]) the change in the eigenvalue λa of a ˆ due to a perturbation in the matrix δ W ˆ is matrix W ˆ |φa , δλa = φa |δ W

(62)

ˆ corresponding to the eigenvalue λa . Thus where |φa is the eigenvector of W the derivative of λa with respect to some variable x is given by L K ∂W ∂λa ˆ = φa (63) φa . ∂x ∂x Since ρˆ =

16 ν=1

ˆ ν sν , we find that M ∂pa ˆ ν |φa = φa |M ∂sν

(64)

and so, taking the derivative of Eq. (58), Eq. (61) becomes 4 ∂S ˆ ν |φa [1 + ln pa ] . φa |M =− ∂sν ln 2 a=1 Hence 2

(∆S) =

4 16 ν=1

ˆ ν |φa [1 + ln pa ] φa |M ln 2 a=1

(65)

!2 Λν .

(66)

For the experimental example given above, S = 0.106 ± 0.049. 5.3.2. Linear entropy The “linear entropy” is used to quantify the degree of mixture of a quantum state in an analytically convenient form, although unlike the von Neumann entropy it has no direct information theoretic implications. In a normalized

entire

December 28, 2004

13:56



form (defined so that its value lies between zero and one), the linear entropy for a two-qubit system is defined by ! 4 2 & 4 4% 2 1 − Tr ρˆ 1− = P= pa . (67) 3 3 a=1 To calculate the error in this quantity, we need the following partial derivative: 4 4 ∂P 8 ∂pa 8 ˆ ν |φa =− pa =− pa φa |M ∂sν 3 a=1 ∂sν 3 a=1 16 8 ˆ ˆ

8 ˆ

Tr Mµ Mν sµ = − Tr ρˆMν = − 3 3 µ=1

Hence the error in the linear entropy is 16 !2 2 16 16 ∂P 8 ˆ ˆ

2 (∆P) = Λν = Tr Mµ Mν sµ Λν . ∂sν 3 µ=1 ν ν=1

(68)

(69)

For the example given in Sections 3 and 4, P = 0.037 ± 0.026. 5.3.3. Concurrence, entanglement of formation and tangle The concurrence, entanglement of formation, and tangle are measures of the quantum coherence properties of a mixed quantum state [29]. For two qubits,¶ concurrence is defined as follows: consider the non-Hermitian maˆ where the superscript T denotes transpose and the “spin ˆ = ρˆΣˆ ˆ ρT Σ trix R ˆ is defined by flip matrix” Σ   0 0 0 −1   ˆ =  0 0 1 0 . Σ (70)  0 10 0  −1 0 0 0 ˆ depends on the basis chosen; we have asNote that the definition of Σ sumed here the “computational basis” {|HH, |HV , |V H, |V V }. In what ¶ The

analysis in this subsection applies to the two-qubit case only. Measures of entanglement for mixed n-qubit systems are a subject of on-going research: see, for example, [25] for a recent survey. It may be possible to measure entanglement directlty, without quantum state tomography; this possibility was investigated in [21].

entire

December 28, 2004

13:56



531

ˆ in the following form: follows, it will be convenient to write R 16 ˆ=1 R qˆµ,ν sµ sν , 2 µ,ν=1

(71)

ˆM ˆ TΣ ˆ +M ˆνΣ ˆM ˆ T Σ. ˆ The left and right eigenstates and ˆ µΣ where qˆµ,ν = M ν µ ˆ eigenvalues of the matrix R we shall denote by ξa |, |ζa , and ra , respectively, i.e., ˆ = ra ξa |, ξa |R

ˆ a = ra |ζa . R|ζ

(72)

We shall assume that these eigenstates are normalized in the usual manner for biorthogonal expansions, i.e., ξa |ζb = δa,b . Further we shall assume that the eigenvalues are numbered in decreasing order, so that r1 ≥ r2 ≥ r3 ≥ r4 . The concurrence is then defined by the formula 4 √ √ √ √ √ 3 C = max {0, r1 − r2 − r3 − r4 } = max 0, −a sgn ra , 2 a=1 where sgn(x) = 1 if x > 0 and sgn(x) = −1 if x < 0. The tangle is given by T = C 2 and the entanglement of formation by ! √ 1 + 1 − C2 , (73) E=h 2 where h(x) = −xlog2 x − (1 − x)log2 (1 − x). Because h(x) is a monotonically increasing function, these three quantities are to some extent equivalent measures of the entanglement of a mixed state. To calculate the errors in these rather complicated functions, we must employ the perturbation theory for non-Hermitian matrices (see Appendix C for more details). We need to evaluate the following partial derivative: 4 4 ˆ 1 ∂ra 1 3 3 ∂R ∂C −a −a = sgn = sgn |ζa √ √ ξa | ∂sν 2 2 ra ∂sν 2 2 ra ∂sν a=1 a=1 4 16 1 3 −a sgn qµ,ν sµ |ζa , (74) = √ ξa |ˆ 2 2 ra a=1 µ=1 where the function sgn (x) is the sign of the quantity x: it takes the value 1 if x > 0 and −1 if x < 0. Thus sgn (3/2 − a) is equal to +1 if a = 1 and

entire

December 28, 2004

13:56



−1 if a = 2, 3, or 4. Hence the error in the concurrence is 2 16 ∂C 2 Λν (∆C) = ∂sν ν=1 !2 4 16 16 3 1 = sgn qµ,ν sµ |ζa Λν . −a √ ξa |ˆ 2 2 ra ν=1 a=1 µ=1

(75)

For our example the concurrence is 0.963 ± 0.018. Once we know the error in the concurrence, the errors in the tangle and the entanglement of formation can be found straightforwardly: ! √ 2 1 − C 1 + C ∆C, h ∆T = 2C∆C, ∆E = √ 2 1 − C2 where h (x) is the derivative of h(x). For our example the tangle is 0.928 ± 0.034 and the entanglement of formation is 0.947 ± 0.025. 6. Conclusions In conclusion, we have presented a technique for reconstructing density matrices of qubit systems, including a full error analysis. We have extended the latter through to calculation of quantities of interest in quantum information, such as the entropy and concurrence. Without loss of generality, we have used the example of polarization qubits of entangled photons, but we stress that these techniques can be adapted to any physical realization of qubits. Acknowledgements The authors would like to thank Joe Altepeter, Mauro d’Ariano, Zdenek Hradil, Susana Huelga, Kurt Jacobs, Poul Jessen, Jian Li, James D. Malley, Michael Nielsen, Mike Raymer, Sze Tan, and Jaroslav Reh´ aˇcek for useful discussions and correspondence. This work was supported in part by the U.S. National Security Agency, and Advanced Research and Development Activity (ARDA), by the Los Alamos National Laboratory LDRD program, and by the Australian Research Council.

entire

December 28, 2004

13:56



533

Appendix ˆ A. The Γ-Matrices ˆ One possible set of Γ-matrices are generators of SU(2) ⊗ SU(2), normalized so that the conditions given in Eq. (18) are fulfilled. These matrices are 

 0100   ˆ1 = 1  1 0 0 0  , Γ 2 0 0 0 1 0010   0010   ˆ4 = 1  0 0 0 1  , Γ 2 1 0 0 0 0100   0 0 1 0   ˆ 7 = 1  0 0 0 −1  , Γ 2 1 0 0 0  0 −1 0 0   0 0 0 −1   ˆ 10 = 1  0 0 1 0  , Γ 2  0 1 0 0  −1 0 0 0   01 0 0   ˆ 13 = 1  1 0 0 0  , Γ 2  0 0 0 −1  0 0 −1 0   1000   ˆ 16 = 1  0 1 0 0  . Γ 2 0 0 1 0 0001



   0 −i 0 0 1 0 0 0     ˆ2 = 1  i 0 0 0  , Γ ˆ 3 = 1  0 −1 0 0  , Γ 2  0 0 0 −i  2 0 0 1 0  0 0 i 0 0 0 0 −1     0001 0 0 0 −i     ˆ5 = 1  0 0 1 0  , Γ ˆ6 = 1  0 0 i 0  , Γ 2 0 1 0 0 2  0 −i 0 0  1000 i 0 0 0     0 0 −i 0 0 0 0 −i     ˆ 8 = 1  0 0 0 −i  , Γ ˆ 9 = 1  0 0 −i 0  , Γ 2 i 0 0 0  2 0 i 0 0  0i 0 0 i0 0 0     0 0 −i 0 10 0 0     ˆ 11 = 1  0 0 0 i  , Γ ˆ 12 = 1  0 1 0 0  , Γ 2  i 0 0 0 2  0 0 −1 0  0 −i 0 0 0 0 0 −1     0 −i 0 0 1 0 0 0     ˆ 14 = 1  i 0 0 0  , Γ ˆ 15 = 1  0 −1 0 0  , Γ 2 0 0 0 i  2  0 0 −1 0  0 0 −i 0 0 0 0 1

As noted in the text, this is only one possible choice for these matrices, and the final results are independent of the choice. ˆ -Matrices and Some of Their Properties B. The M ˆ matrices, defined by Eq. (25), are as follows: The M  2 −(1 − i) −(1 + i) 1 0 i  −(1 + i) ˆ1 = M  −(1 − i) −i 0 2 1 0 0

  1 0 −(1 − i) 1 0 −(1 + i) 2 ˆ2 =  , M   0 0 −i 2 0 1 −(1 − i)

 0 1 i −(1 + i)  , 0 0 0 0

entire

December 28, 2004

13:56


534 Daniel F. V. James, Paul G. Kwiat, William J. Munro, and Andrew G. White     0 0 0 1 0 0 −(1 + i) 1 1 1 0 0 i −(1 + i) 0 0 i 0    ˆ3 = ˆ4 =  M , M , 0  −(1 − i) −i −i 0 −(1 − i)  2 −(1 − i)  2 2 1 −(1 − i) −(1 + i) 2 1 0 −(1 + i) 0     0 0 2i −(1 + i) 0 0 0 −(1 + i) 1 1 0 0 (1 − i) 0 0 0 (1 − i) 2i     ˆ5 = ˆ6 =  M  −2i , M , (1 + i) 0 0 0 (1 + i) 0 0 2 2 −(1 − i) 0 0 0 −(1 − i) −2i 0 0   0 0 0 −(1 + i) 1 0 0 −(1 − i) 2   ˆ7 = M  , 0 −(1 + i) 0 0 2 −(1 − i) 2 0 0       0 0 0 i 0 0 0 1 0 0 2 −(1 + i) 1 0 0 −i 0 0 0 1 0 0 0 −(1 − i) 0       ˆ8 = ˆ 10 =  ˆ9 =  M , M ,  , M 0 i 0 0 0 1 0 0 2 −(1 + i) 0 0 2 −i 0 0 0 1 0 0 0 −(1 − i) 0 0 0     0 0 0 i 0 2 0 −(1 + i) 1 0 0 i 0 2 0 −(1 + i) 0     ˆ 11 =  ˆ 12 =  M , M , 0 −i 0 0  0 −(1 − i) 0 0 2 −i 0 0 0 −(1 − i) 0 0 0   0 0 0 −(1 + i) 1 0 0 −(1 + i) 0   ˆ 13 = M  , 0 −(1 − i) 0 2 2 −(1 − i) 0 2 0   0 0 0 −(1 − i) 1 0 0 (1 − i) 0   ˆ 14 = M ,  0 (1 + i) 0 −2i  2 −(1 + i) 0 2i 0     0 0 0 1 0 −2i 0 −(1 − i) 1 0 0 −1 0 2i 0 (1 − i) 0     ˆ 15 = ˆ 16 =  M .  , M 0 −1 0 0  0 (1 + i) 0 0 2 1 0 0 0 −(1 + i) 0 0 0

The form of these matrices is independent of the chosen set of matrices ˆ ν } used to convert the density matrix into a column vector. However, {Γ ˆ ν matrices do depend on the set of tomographic states |ψν . the M There are some useful properties of these matrices which we will now derive. From Eq. (25), we have % & ˆ ν |ψµ = ˆ λ |ψµ B −1 ψµ |Γ . (76) ψµ |M λ,ν λ

ˆ λ |ψµ = Bµ,λ ; thus we obtain the result From Eq. (22) we have ψµ |Γ ˆ ν |ψµ = δµ,ν . ψµ |M

(77)

If we denote the basis set for the four-dimensional Hilbert space by {|i (i = 1, 2, 3, 4)}, then Eq. (24) can be written as follows: ˆ ν |jψν |kl|ψν k|ˆ i|ˆ ρ|j = i|M ρ|l. (78) k,l

ν

entire

December 28, 2004

13:56



535

Since Eq. (78) is valid for arbitrary states ρ, ˆ we obtain the following relationship: ˆ ν |jψν |kl|ψν = δik δjl . i|M (79) ν

Contracting Eq. (79) over the indices (i, j) we obtain:

ˆ ν |ψν ψν | = I, ˆ Tr M

(80)

ν

where Iˆ is the identity operator for our four dimensional Hilbert space. A second relationship can be obtained by contracting Eq. (79), viz., ˆ ν |j = δij , i|M (81) ν

or, in operator notation,

ˆ ν = I. ˆ M

(82)

ν

C. Perturbation Theory for Non-Hermitian Matrices Whereas perturbation theory for Hermitian matrices is covered in most quantum mechanics textbooks, the case of non-Hermitian matrices is less familiar, and so we will present it here. The problem is as follows. Given ˆ 0 [32], i.e., the eigenspectrum of a matrix R ˆ 0 = ra ξa |, ξa |R

ˆ 0 |ζa = ra |ζa , R

where ξa |ζb = δa,b

(83)

we wish to find expressions for the eigenvalues ra and eigenstates ξa | and ˆ = R ˆ 0 + δ R. ˆ |ζa of the perturbed matrix R We start with the standard assumption of perturbation theory, i.e., that the perturbed quantities ra , ξa |, and |ζa can be expressed as power series of some parameter λ: ra = ra(0) + λra(1) + λ2 ra(2) + · · · , |ζa ξa |

= =

|ζa(0) + λ|ζa(1) + λ2 |ζa(2) + · · · ξa(0) | + λξa(1) | + λ2 ξa(2) | + · · ·

(84) ,

(85)

.

(86)

entire

December 28, 2004

13:56



ˆ = R ˆ 0 + λδ R, ˆ and comparing terms of equal powers of λ in the Writing R eigenequations, one obtains the following formulas: ˆ 0 |ζ (0) = r(0) |ζ (0) , R a a a (0) ˆ (0) (0) ξa |R0 = ra ξa |,

(0) ˆ ˆ ˆ − r(1) |ζ (0) , R0 − ra I |ζa(1) = − δ R a a

ˆ 0 − r(0) Iˆ = −ξ (0) | δ R ˆ − r(1) . ξa(1) | R a a a

(87) (88) (89) (90)

Equations (87) and (88) imply that, as might be expected, |ζa(0) = |ζa ,

ξa(0) | = ξa |,

ra(0) = ra .

Taking the inner product of Eq. (89) with ξa |, and using the bi-orthogonal property Eq. (83), we obtain ˆ a . ra(1) = ξa |δ R|ζ

(91)

ˆ a . δra ≡ ra − ra ≈ ξa |δ R|ζ

(92)

This implies that

Thus, dividing both sides by some differential increment δx and taking the limit δx → 0, we obtain ˆ ∂ra ∂R = ξa | |ζa . ∂x ∂x

(93)

ˆ and Using the completeness property of the eigenstates, b |ζb ξb | = I, ˆ the identity R0 = b rb |ζb ξb | we obtain the following formula −1

1 ˆ 0 − ra Iˆ = |ζb ξb |. (94) R rb − ra b b=a

Applying this to Eq. (89) we obtain |δζa(1)

≡

|ζa

− |ζa ≈ −

b b=a

Similarly, Eqs. (90) and (94) imply δξa | ≡

δξa |

− δξa | ≈ −

b b=a

ˆ a ξb |δ R|ζ rb − ra

!

ˆ b ξa |δ R|ζ rb − ra

|ζb .

! ξb |.

(95)

entire

December 28, 2004

13:56



537

References [1] J.R. Ashburn, R.A. Cline, P.J.M. Vanderburgt, W.B. Westerveld, and J.S. Risley, Phys. Rev. A, 41, 2407 (1990). [2] K. Banaszek, G.M. D’Ariano, M.G.A. Paris and, M.F. Sacchi, Phys. Rev. A, 61, 010304 (1999). [3] A. Berglund, Undergraduate thesis, Dartmouth College, June 2000; A. Berglund, quant-ph/0010001. [4] S.L. Braunstein, C.M. Caves, R. Jozsa, N. Linden, S. Popescu, and R. Schack, Phys. Rev. Lett., 83, 1054 (1999). [5] I.L. Chuang, N. Gershenfeld, and M. Kubinec, Phys. Rev. Lett., 80, 3408 (1998). [6] T.J. Dunn, I.A. Walmsley, and S. Mukamel, Phys. Rev. Lett., 74, 884 (1995). [7] E. Hecht and A. Zajac, Optics, Sec. 8.12, Addision-Wesley, Reading, MA, 1974. [8] Z. Hradil, Phys. Rev. A, 55, R1561 (1997). [9] Z. Hradil, J. Summhammer, G. Badurek, and H. Rauch, Phys. Rev. A, 62, 014101 (2000). [10] C. Kurtsiefer, T. Pfau, and J. Mlynek, Nature (London), 386, 150-153 (1997). [11] G. Breitenbach, S. Schiller, J. Mlynek, Nature, London 387, 471 (1997). [12] G. Klose, G. Smith, and P.S. Jessen, Phys. Rev. Lett., 86, 4721 (2001). [13] P.G. Kwiat, E. Waks, A.G. White, I. Appelbaum, and P.H. Eberhard, Phys. Rev. A, 60, R773 (1999). [14] D. Leibfried, D.M. Meekhof, B.E. King, C. Monroe, W.M. Itano and D.J. Wineland Phys. Rev. Lett. 77, 4281-4285 (1996); D. Leibfried, T. Pfau and C. Monroe, Phys. Today 51(4), 22 (1998). [15] U. Leonhardt, Measuring the quantum state of light, Cambridge University Press, 1997. [16] A.C. Melissinos, Experiments in modern physics, Sec. 10.4, 467-479, Academic Press, New York, 1966. [17] M.A. Nielsen and I.L. Chuang, Quantum computation and quantum information, Chap. 11, Cambridge University Press, Cambridge, 2000. [18] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd ed., Sec. 10.5, Cambridge University Press, Cambridge, 1992. [19] D.T. Smithey, M. Beck, M.G. Raymer, and A. Faridani, Phys. Rev. Lett., 70, 1244 (1993); see also the discussion of polarization effects in M.G. Raymer, D.F. McAlister, and A. Funk, in P. Kumar, ed., Quantum Communication, Computing, and Measurement ’98, 147-162, Plenum, New York, 2000. ˇ aˇcek, Z. Hradil, and M. Jeˇzek, Phys. Rev. A, 63, 040303 (2001). [20] J. Reh´ [21] J.M.G. Sancho and S.F. Huelga, Phys. Rev. A, 61, 042303 (2000). [22] L.I. Schiff, Quantum mechanics, 3rd ed., Eq. (31.8), 246, McGraw-Hill, New York, 1968. [23] G.C. Stokes, Trans. Cambr. Phil. Soc. 9, 399 (1852). [24] S.M. Tan, J. Mod. Opt., 44, 2233 (1997).

entire

December 28, 2004

13:56



[25] B.M. Terhal, J. Theoretical Computer Science, 287, 313 (2002). [26] A.G. White, D.F.V. James, P.H. Eberhard, and P.G. Kwiat, Phys. Rev. Lett. 83, 3103 (1999). [27] A.G. White, D.F.V. James, W.J. Munro, and P.G. Kwiat, Phys. Rev. A, 65, 012301 (2002). [28] E. Wolf, Nuovo Cimento 13, 1165, 1959; L. Mandel and E. Wolf, Optical coherence and quantum optics, Chap. 6, Cambridge University Press, Cambridge, 1995. [29] W.K. Wootters, Phys. Rev. Lett., 80, 2245 (1998); V. Coffman, J. Kundu, and W.K. Wootters, Phys. Rev. A, 61, 052306 (2000). [30] J.W. Wu, P.K. Lam, M.B., Gray and H.-A. Bachor, Opt. Express, 3, 154 (1998). [31] A recent overview of many quantum computation technologies is given in S. Braunstein and Ho.-K. Lo, Scalable Quantum Computers: Paving the Way to Realization, Wiley, New York, 2001; see also Fortschr. Phys., 48, 9-11 (2000). [32] The properties of the eigenvectors and eigenvalues of non-Hermitian matrices is discussed in P.M. Morse and H. Feshbach, Methods of theoretical physics, I, 884, McGraw-Hill, New York, 1953, et seq.

entire

December 28, 2004

13:56


INDEX

∗-subalgebra, 120 C k -state manifold, 151 C k -map, 151 G–locally efficient, 237 G-efficient, 237 m-semiclassical, 373 n-i.i.d. extended family, 173 δ-typical, 87, 88 G–efficient, 237 (first-order) asymptotically efficient, 128

character, 464 coherent, 292, 333, 335 coherent state, 141 collective measurement, 94 commutation operator, 289 commutative, 68, 123 commuting theorem, 323 completely ergodic, 46, 48 completely positive map, 381 completeness, 141 composite systems, 408 concatenation, 380, 383 conditional expectation, 122 consistent, 128 consistent estimator, 308, 413 converse part, 21 covariant, 369, 370 CR bound, 104 Cramér-Rao inequality, 96 Cramér-Rao type bound, 320

acceptance region, 424 AF C ∗ -algebra, 46, 56, 61 anti-normal ordered, 503 anti-symmetric logarithmic derivative, 222, 249 antiunitary, 347 approximate cloning, 458 asymptotic efficiency, 129 asymptotic strong orthogonality, 23, 83 asymptotic weak orthogonality, 23, 78 asymptotically exact, 461 asymptotically unbiased, 164 attainable Cramér-Rao type bound, 92, 317 auxiliary qubits, 379

density operator, 101, 116, 137, 230, 234 deviation, 153 direct approach, 306, 321 direct part, 21 direct sum representation, 71 disjoint random combination, 429 distinguishable, 23 duality, 305

base space, 311 Bayesian approach, 9 Bayesian reconstruction, 474 Black Cow factor, 380 Bloch vector, 379, 380

e-tangent vector, 234 effective quantum efficiency, 506 efficient, 96, 126, 227, 237, 253 539

entire

December 28, 2004

13:56


540

efficient estimator, 118 ergodic, 47 estimating the spectrum, 458 estimation vector, 322 estimator, 103, 163, 224, 230, 309, 310, 411 exponential connection, 129 exponential error estimate, 463 exponential representation, 234 faithful, 151, 441 faithful model, 318 family, 91 family of probability distributions, 8, 91 fidelity, 379, 381, 382 first error probability, 46 first order asymptotic theory, 162 Fisher information, 91, 121 Fisher information matrix, 92, 308 Fubini-Study distance, 368 G¨ artner-Ellis theorem, 464 Gaussian displacement noise, 505 Gaussian state, 274 generalized measurement, 134, 230 generalized quantum measurement, 115 generalized spin coherent model, 333 GNS representation, 53, 62 half-line quantum Gaussian state family, 431 highest weight, 463 Homodyne, 472 homodyning the observable, 500 horizontal component, 312 horizontal lift, 312, 313, 344 horizontal subspace, 311 horizontal vector, 312 hyperfinite, 57 hyperfinite von Neumann algebra, 46, 61 indirect approach, 321 infinite C ∗ -tensor product, 45

Index

information divergence, 43 information geometry, 305 informationally exclusive, 339 informationally independent, 338, 340 invariance approach, 9 invariant, 71 irreducible, 71 KMB Fisher information, 396 Kolmogorov-Sinai entropy, 53, 62 large deviation, 460 large deviation method, 463 large deviation theory, 395 likelihood ratio test, 19 linear tomography, 474 local unbiasedness, 139 locally coherent, 292 locally efficient, 237 locally quasi-classical, 318 locally unbiased, 152, 225, 231, 310, 322 locally unbiased at, 104 locally unbiased estimator, 307 locally unbiased operators, 231 LOCC, 23 logarithmic derivative, 91 lower bound of Cramér-Rao type, 104 m-tangent vector, 234 martingale convergence of relative entropy, 56 matrix representation, 311 maximum likelihood estimator (MLE), 98 maximum likelihood tomography, 474 mean ergodic theorem, 46, 53, 62 mean relative entropy, 45, 47 mean square error (MSE), 91 measurement, 102, 115, 134, 137, 309, 407 minimally uncertain state, 141 minimax, 9 minimum variance unbiased estimation, 9 mixed, 101

entire

December 28, 2004

13:56


Index

mixture connection, 129 mixture representation, 234 model, 162, 307, 309 monotonicity, 20 monotonicity of relative entropy, 44 most informative CR bound, 107 most informative Cramér-Rao bound, 92 motion reversal operator, 348 MSE consistent, 164 MSE consistent estimator, 173 multiplicity, 464 mutually dual, 130 Neyman Pearson Lemma, 19, 24 non-commutative Fourier transform, 272 non-quantum-correlational Cramér-Rao type bound, 92, 171 normal ordered, 502 normalized ρθ -symplectic basis, 294 observable, 103, 408 observation, 44 one-parameter quantum Gaussian state family, 431 operator-concavity, 50, 61 operator-valued relative entropy, 49 optimal, 466 optimal quantum cloning, 385 orthogonal resolution of identity, 115 outcome of the measurement, 137 over-completeness, 141 parallel, 319 POM, 13 position shifted model, 346, 348 positive operator-valued measure, 13, 44, 163, 368, 461 POVM, 13, 368, 382 pre-inner product, 221, 247 principal fiber bundle, 311 probability operator measure, 13, 136 probability operator-valued measure, 102 product state, 45

541

projection-valued, 102 projection-valued measure, 44 pure, 101, 221, 230, 247 pure states, 418 quantum birthday problems, 77 quantum channel, 487 quantum characteristic function, 273 quantum cloning, 379, 380 quantum Cramér-Rao inequality, 114 quantum Cramér-Rao type bound, 168 quantum exponential family, 119, 127 quantum hypothesis testing, 45 quantum i.i.d., 10, 14 quantum information geometry, 305 quantum Neyman Pearson Lemma, 19 quantum relative entropy, 20 quantum state family, 92 quantum statistical model, 103 Quantum Stein’s Lemma, 21 quantum-correlational Cramér-Rao type bound, 173 quasi-classical, 94, 121, 318 quasi-classical in the wider sense, 345 quasi-quantum Cramér-Rao type bound, 164 random combination, 429 rate function, 463 recursive estimator, 171 recursive MSE consistent estimator, 171 relative entropy, 20, 43, 130 relative phase factor, 313 representation, 71 right logarithmic derivative, 93, 104, 117, 232, 295 RLD, 93 RLD Fisher information matrix, 93, 172 scaled cumulant generating function, 463 second error probability, 46

entire

December 28, 2004

13:56


542

Shannon-McMillan-Breiman theorem, 46, 53, 57, 63 shrinking factor, 380, 381, 383, 384 sieved maximum likelihood estimation, 473 simple, 102, 230 simple hypothesis testing, 19 simple measurement, 134, 407 SLD CR bound, 314 SLD CR inequality, 314 SLD Fisher information, 92, 396 SLD Fisher information matrix, 223, 250, 314 SLD tangent space, 292 small boundary, 463 spectral decomposition, 115 spin coherent model, 338 state, 44, 101, 115, 116, 137, 406 state change caused by the measurement, 407 stationary, 47 statistical operator, 116, 137 statistics of the measurement result, 407 Stein’s Lemma, 20, 24 Stokes parameters, 512 strong consistency, 397 strong orthogonality, 76 strong orthogonality capacity, 76 strongly clustering, 47 strongly consistent, 434 strongly consistent at θ, 434 strongly mixing, 47 structure group, 311 sufficiency, 122 sufficient, 122, 123 superadditivity of relative entropy, 47 symmetric logarithmic derivative, 92, 104, 117, 222, 231, 249, 291, 401 symmetric subspace, 379, 384 symmetrized logarithmic derivative, 313 tensored representation, 71 test, 410 time reversal operator, 348

Index

tomography, 472 total space model, 338 trace, 44 Uhlmann fidelity, 384 Uhlmann’s relative entropy, 55 Umegaki’s relative entropy, 44 unbiased, 95, 103, 138, 225, 231, 310 unbiased estimator, 116, 307 unbiasedness, 411, 422 uncertainty principle, 134 universal N → M cloner, 385 universal N → M quantum cloning machine, 380 universal and symmetric quantum cloning machine, 379 universal cloning machine, 381, 384 Varadhan’s theorem, 464 vertical component, 312 vertical subspace, 311 vertical vector, 312 weak consistency, 423, 426 weak orthogonality, 76 weak orthogonality capacity, 76 weakly consistent, 396, 431 weakly mixing, 47 weight field, 126 weight matrix, 317 weights, 464 Weyl representation, 272 Wigner function, 472 Young frame, 459, 460

entire

Asymptotic theory of quantum statistical inference

Asymptotic Theory Of Quantum Statistical Inference: Selected Papers

Geometrical Foundations of Asymptotic Inference

Quantum Statistical Theory of Superconductivity

Quantum Statistical Theory of Superconductivity

Quantum statistical theory of superconductivity

Quantum Statistical Theory of Superconductivity

Quantum Statistical Theory of Superconductivity

Quantum Statistical Theory of Superconductivity

Statistical Inference

Statistical Inference

Principles of Statistical Inference

Statistical Inference

Statistical Inference

Statistical Inference

Essentials of statistical inference

Statistical Inference

Principles of statistical inference

Principles of Statistical Inference

Essentials of Statistical Inference

Statistical experiments and decisions: asymptotic theory

Asymptotic Theory of Statistical Inference for Time Series (Springer Series in Statistics)

Asymptotic Theory of Statistical Inference for Time Series (Springer Series in Statistics)

Introduction to statistical inference

Introduction to Statistical Inference

Modes of Parametric Statistical Inference

Probabilistic and Statistical Aspects of Quantum Theory

Probability and Statistical Inference

Statistical Inference in Science

Probability and Statistical Inference

Asymptotic theory of quantum statistical inference