CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
This page intentionally left blank
March 31, 2008
18:49
CUUS063...
80 downloads
728 Views
3MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
This page intentionally left blank
March 31, 2008
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
Computational Complexity A Conceptual Perspective
Complexity Theory is a central field of the theoretical foundations of computer science. It is concerned with the general study of the intrinsic complexity of computational tasks; that is, it addresses the question of what can be achieved within limited time (and/or with other limited natural computational resources). This book offers a conceptual perspective on Complexity Theory. It is intended to serve as an introduction for advanced undergraduate and graduate students, either as a textbook or for selfstudy. The book will also be useful to experts, since it provides expositions of the various subareas of Complexity Theory such as hardness amplification, pseudorandomness, and probabilistic proof systems. In each case, the author starts by posing the intuitive questions that are addressed by the subarea and then discusses the choices made in the actual formulation of these questions, the approaches that lead to the answers, and the ideas that are embedded in these answers. Oded Goldreich is a Professor of Computer Science at the Weizmann Institute of Science and an Incumbent of the Meyer W. Weisgal Professorial Chair. He is an editor for the SIAM Journal on Computing, the Journal of Cryptology, and Computational Complexity and previously authored the books Modern Cryptography, Probabilistic Proofs and Pseudorandomness, and the twovolume work Foundations of Cryptography.
i
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
ii
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
Computational Complexity A Conceptual Perspective
Oded Goldreich Weizmann Institute of Science
iii
18:49
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo Cambridge University Press The Edinburgh Building, Cambridge CB2 8RU, UK Published in the United States of America by Cambridge University Press, New York www.cambridge.org Information on this title: www.cambridge.org/9780521884730 © Oded Goldreich 2008 This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published in print format 2008
ISBN13 9780511398827
eBook (EBL)
ISBN13
hardback
9780521884730
Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or thirdparty internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
to Dana
v
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
vi
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
Contents
page xiii
List of Figures Preface Organization and Chapter Summaries Acknowledgments
xv xvii xxiii
1 Introduction and Preliminaries 1.1 Introduction 1.1.1 A Brief Overview of Complexity Theory 1.1.2 Characteristics of Complexity Theory 1.1.3 Contents of This Book 1.1.4 Approach and Style of This Book 1.1.5 Standard Notations and Other Conventions 1.2 Computational Tasks and Models 1.2.1 Representation 1.2.2 Computational Tasks 1.2.3 Uniform Models (Algorithms) 1.2.4 Nonuniform Models (Circuits and Advice) 1.2.5 Complexity Classes Chapter Notes
1 1 2 6 8 12 16 17 18 18 20 36 42 43
2 P, NP, and NPCompleteness 2.1 The P Versus NP Question 2.1.1 The Search Version: Finding Versus Checking 2.1.2 The Decision Version: Proving Versus Verifying 2.1.3 Equivalence of the Two Formulations 2.1.4 Two Technical Comments Regarding NP 2.1.5 The Traditional Definition of NP 2.1.6 In Support of P Different from NP 2.1.7 Philosophical Meditations 2.2 PolynomialTime Reductions 2.2.1 The General Notion of a Reduction 2.2.2 Reducing Optimization Problems to Search Problems 2.2.3 SelfReducibility of Search Problems 2.2.4 Digest and General Perspective
44 46 47 50 54 55 55 57 58 58 59 61 63 67
vii
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CONTENTS
NPCompleteness 2.3.1 Definitions 2.3.2 The Existence of NPComplete Problems 2.3.3 Some Natural NPComplete Problems 2.3.4 NP Sets That Are Neither in P nor NPComplete 2.3.5 Reflections on Complete Problems 2.4 Three Relatively Advanced Topics 2.4.1 Promise Problems 2.4.2 Optimal Search Algorithms for NP 2.4.3 The Class coNP and Its Intersection with NP Chapter Notes Exercises 2.3
67 68 69 71 81 85 87 87 92 94 97 99
3 Variations on P and NP 3.1 Nonuniform Polynomial Time (P/poly) 3.1.1 Boolean Circuits 3.1.2 Machines That Take Advice 3.2 The PolynomialTime Hierarchy (PH) 3.2.1 Alternation of Quantifiers 3.2.2 Nondeterministic Oracle Machines 3.2.3 The P/poly Versus NP Question and PH Chapter Notes Exercises
108 108 109 111 113 114 117 119 121 122
4 More Resources, More Power? 4.1 Nonuniform Complexity Hierarchies 4.2 Time Hierarchies and Gaps 4.2.1 Time Hierarchies 4.2.2 Time Gaps and Speedup 4.3 Space Hierarchies and Gaps Chapter Notes Exercises
127 128 129 129 136 139 139 140
5 Space Complexity 5.1 General Preliminaries and Issues 5.1.1 Important Conventions 5.1.2 On the Minimal Amount of Useful Computation Space 5.1.3 Time Versus Space 5.1.4 Circuit Evaluation 5.2 Logarithmic Space 5.2.1 The Class L 5.2.2 LogSpace Reductions 5.2.3 LogSpace Uniformity and Stronger Notions 5.2.4 Undirected Connectivity 5.3 Nondeterministic Space Complexity 5.3.1 Two Models 5.3.2 NL and Directed Connectivity 5.3.3 A Retrospective Discussion
143 144 144 145 146 153 153 154 154 155 155 162 162 164 171
viii
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CONTENTS
5.4 PSPACE and Games Chapter Notes Exercises
172 175 175
6 Randomness and Counting 6.1 Probabilistic Polynomial Time 6.1.1 Basic Modeling Issues 6.1.2 TwoSided Error: The Complexity Class BPP 6.1.3 OneSided Error: The Complexity Classes RP and coRP 6.1.4 ZeroSided Error: The Complexity Class ZPP 6.1.5 Randomized LogSpace 6.2 Counting 6.2.1 Exact Counting 6.2.2 Approximate Counting 6.2.3 Searching for Unique Solutions 6.2.4 Uniform Generation of Solutions Chapter Notes Exercises
184 185 186 189 193 199 199 202 202 211 217 220 227 230
7 The Bright Side of Hardness 7.1 OneWay Functions 7.1.1 Generating Hard Instances and OneWay Functions 7.1.2 Amplification of Weak OneWay Functions 7.1.3 HardCore Predicates 7.1.4 Reflections on Hardness Amplification 7.2 Hard Problems in E 7.2.1 Amplification with Respect to PolynomialSize Circuits 7.2.2 Amplification with Respect to ExponentialSize Circuits Chapter Notes Exercises
241 242 243 245 250 255 255 257 270 277 278
8 Pseudorandom Generators Introduction 8.1 The General Paradigm 8.2 GeneralPurpose Pseudorandom Generators 8.2.1 The Basic Definition 8.2.2 The Archetypical Application 8.2.3 Computational Indistinguishability 8.2.4 Amplifying the Stretch Function 8.2.5 Constructions 8.2.6 Nonuniformly Strong Pseudorandom Generators 8.2.7 Stronger Notions and Conceptual Reflections 8.3 Derandomization of TimeComplexity Classes 8.3.1 Defining Canonical Derandomizers 8.3.2 Constructing Canonical Derandomizers 8.3.3 Technical Variations and Conceptual Reflections 8.4 SpaceBounded Distinguishers 8.4.1 Definitional Issues
284 285 288 290 291 292 295 299 301 304 305 307 308 310 313 315 316
ix
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CONTENTS
8.4.2 Two Constructions SpecialPurpose Generators 8.5.1 Pairwise Independence Generators 8.5.2 SmallBias Generators 8.5.3 Random Walks on Expanders Chapter Notes Exercises 8.5
9 Probabilistic Proof Systems Introduction and Preliminaries 9.1 Interactive Proof Systems 9.1.1 Motivation and Perspective 9.1.2 Definition 9.1.3 The Power of Interactive Proofs 9.1.4 Variants and Finer Structure: An Overview 9.1.5 On Computationally Bounded Provers: An Overview 9.2 ZeroKnowledge Proof Systems 9.2.1 Definitional Issues 9.2.2 The Power of ZeroKnowledge 9.2.3 Proofs of Knowledge – A Parenthetical Subsection 9.3 Probabilistically Checkable Proof Systems 9.3.1 Definition 9.3.2 The Power of Probabilistically Checkable Proofs 9.3.3 PCP and Approximation 9.3.4 More on PCP Itself: An Overview Chapter Notes Exercises
318 325 326 329 332 334 337 349 350 352 352 354 357 363 366 368 369 372 378 380 381 383 398 401 404 406
10 Relaxing the Requirements 10.1 Approximation 10.1.1 Search or Optimization 10.1.2 Decision or Property Testing 10.2 AverageCase Complexity 10.2.1 The Basic Theory 10.2.2 Ramifications Chapter Notes Exercises
416 417 418 423 428 430 442 451 453
Epilogue
461
Appendix A: Glossary of Complexity Classes A.1 Preliminaries A.2 AlgorithmBased Classes A.2.1 Time Complexity Classes A.2.2 Space Complexity Classes A.3 CircuitBased Classes
463 463 464 464 467 467
Appendix B: On the Quest for Lower Bounds B.1 Preliminaries
469 469
x
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CONTENTS
B.2 Boolean Circuit Complexity B.2.1 Basic Results and Questions B.2.2 Monotone Circuits B.2.3 BoundedDepth Circuits B.2.4 Formula Size B.3 Arithmetic Circuits B.3.1 Univariate Polynomials B.3.2 Multivariate Polynomials B.4 Proof Complexity B.4.1 Logical Proof Systems B.4.2 Algebraic Proof Systems B.4.3 Geometric Proof Systems
471 472 473 473 474 475 476 476 478 480 480 481
Appendix C: On the Foundations of Modern Cryptography C.1 Introduction and Preliminaries C.1.1 The Underlying Principles C.1.2 The Computational Model C.1.3 Organization and Beyond C.2 Computational Difficulty C.2.1 OneWay Functions C.2.2 HardCore Predicates C.3 Pseudorandomness C.3.1 Computational Indistinguishability C.3.2 Pseudorandom Generators C.3.3 Pseudorandom Functions C.4 ZeroKnowledge C.4.1 The Simulation Paradigm C.4.2 The Actual Definition C.4.3 A General Result and a Generic Application C.4.4 Definitional Variations and Related Notions C.5 Encryption Schemes C.5.1 Definitions C.5.2 Constructions C.5.3 Beyond Eavesdropping Security C.6 Signatures and Message Authentication C.6.1 Definitions C.6.2 Constructions C.7 General Cryptographic Protocols C.7.1 The Definitional Approach and Some Models C.7.2 Some Known Results C.7.3 Construction Paradigms and Two Simple Protocols C.7.4 Concluding Remarks
482 482 483 485 486 487 487 489 490 490 491 492 494 494 494 495 497 500 502 504 505 507 508 509 511 512 517 517 522
Appendix D: Probabilistic Preliminaries and Advanced Topics in Randomization
523
D.1 Probabilistic Preliminaries D.1.1 Notational Conventions D.1.2 Three Inequalities
523 523 524
xi
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CONTENTS
D.2 Hashing D.2.1 Definitions D.2.2 Constructions D.2.3 The Leftover Hash Lemma D.3 Sampling D.3.1 Formal Setting D.3.2 Known Results D.3.3 Hitters D.4 Randomness Extractors D.4.1 Definitions and Various Perspectives D.4.2 Constructions
528 528 529 529 533 533 534 535 536 537 541
Appendix E: Explicit Constructions E.1 ErrorCorrecting Codes E.1.1 Basic Notions E.1.2 A Few Popular Codes E.1.3 Two Additional Computational Problems E.1.4 A ListDecoding Bound E.2 Expander Graphs E.2.1 Definitions and Properties E.2.2 Constructions
545 546 546 547 551 553 554 555 561
Appendix F: Some Omitted Proofs F.1 Proving That PH Reduces to #P F.2 Proving That IP( f ) ⊆ AM(O( f )) ⊆ AM( f ) F.2.1 Emulating General Interactive Proofs by AMGames F.2.2 Linear Speedup for AM
566 566 572 572 578
Appendix G: Some Computational Problems G.1 Graphs G.2 Boolean Formulae G.3 Finite Fields, Polynomials, and Vector Spaces G.4 The Determinant and the Permanent G.5 Primes and Composite Numbers
583 583 585 586 587 587
Bibliography Index
589 601
xii
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
List of Figures
1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4 2.5 3.1 4.1 5.1 5.2 5.3 6.1 6.2 6.3 6.4 6.5 7.1 7.2 8.1 8.2 8.3 8.4 8.5 8.6 9.1 9.2 9.3 9.4 9.5 9.6
Dependencies among the advanced chapters. A single step by a Turing machine. A circuit computing f (x1 , x2 , x3 , x4 ) = (x1 ⊕ x2 , x1 ∧ ¬x2 ∧ x4 ). Recursive construction of parity circuits and formulae. Consecutive computation steps of a Turing machine. The idea underlying the reduction of CSAT to SAT. The reduction to G3C – the clause gadget and its subgadget. The reduction to G3C – connecting the gadgets. The world view under P = coN P ∩ N P = N P. Two levels of the Polynomialtime Hierarchy. The Gap Theorem – determining the value of t(n). Algorithmic composition for spacebounded computation. The recursive procedure in N L ⊆ DSPACE(O(log2 )). The main step in proving N L = coN L. The reduction to the Permanent – tracks connecting gadgets. The reduction to the Permanent – the gadget’s effect. The reduction to the Permanent – A Deus ex Machina gadget. The reduction to the Permanent – a structured gadget. The reduction to the Permanent – the box’s effect. The hardcore of a oneway function – an illustration. Proofs of hardness amplification: Organization. Pseudorandom generators – an illustration. Analysis of stretch amplification – the i th hybrid. Derandomization of BPL – the generator. An affine transformation affected by a Toeplitz matrix. The LFSR smallbias generator. Pseudorandom generators at a glance. Arithmetization of CNF formulae. Zeroknowledge proofs – an illustration. The PCP model – an illustration. Testing consistency of linear and quadratic forms. Composition of PCP system – an illustration. The amplifying reduction in the second proof of the PCP Theorem.
xiii
page 9 23 38 42 74 76 81 82 97 119 137 147 167 170 207 207 208 209 209 250 258 287 300 321 327 330 335 360 369 380 387 389 397
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
LIST OF FIGURES
10.1 10.2 E.1 F.1 F.2
Two types of averagecase completeness. Worstcase versus averagecase assumptions. Detail of the ZigZag product of G and G. The transformation of an MAgame into an AMgame. The transformation of MAMA into AMA.
xiv
446 451 562 579 581
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
Preface
The quest for efficiency is ancient and universal, as time and other resources are always in shortage. Thus, the question of which tasks can be performed efficiently is central to the human experience. A key step toward the systematic study of the aforementioned question is a rigorous definition of the notion of a task and of procedures for solving tasks. These definitions were provided by computability theory, which emerged in the 1930s. This theory focuses on computational tasks, and considers automated procedures (i.e., computing devices and algorithms) that may solve such tasks. In focusing attention on computational tasks and algorithms, computability theory has set the stage for the study of the computational resources (like time) that are required by such algorithms. When this study focuses on the resources that are necessary for any algorithm that solves a particular task (or a task of a particular type), the study becomes part of the theory of Computational Complexity (also known as Complexity Theory).1 Complexity Theory is a central field of the theoretical foundations of computer science. It is concerned with the study of the intrinsic complexity of computational tasks. That is, a typical complexity theoretic study refers to the computational resources required to solve a computational task (or a class of such tasks), rather than referring to a specific algorithm or an algorithmic schema. Actually, research in Complexity Theory tends to start with and focus on the computational resources themselves, and addresses the effect of limiting these resources on the class of tasks that can be solved. Thus, Computational Complexity is the general study of what can be achieved within limited time (and/or other limited natural computational resources). The (halfcentury) history of Complexity Theory has witnessed two main research efforts (or directions). The first direction is aimed toward actually establishing concrete lower bounds on the complexity of computational problems, via an analysis of the evolution of the process of computation. Thus, in a sense, the heart of this direction is a “lowlevel” analysis of computation. Most research in circuit complexity and in proof complexity falls within this category. In contrast, a second research effort is aimed at exploring the connections among computational problems and notions, without being able to provide absolute statements regarding the individual problems or notions. This effort may be 1
In contrast, when the focus is on the design and analysis of specific algorithms (rather than on the intrinsic complexity of the task), the study becomes part of a related subfield that may be called Algorithmic Design and Analysis. Furthermore, Algorithmic Design and Analysis tends to be subdivided according to the domain of mathematics, science, and engineering in which the computational tasks arise. In contrast, Complexity Theory typically maintains a unity of the study of tasks solvable within certain resources (regardless of the origins of these tasks).
xv
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
PREFACE
viewed as a “highlevel” study of computation. The theory of NPcompleteness as well as the studies of approximation, probabilistic proof systems, pseudorandomness, and cryptography all fall within this category. The current book focuses on the latter effort (or direction). The main reason for our decision to focus on the “highlevel” direction is the clear conceptual significance of the known results. That is, many known results in this direction have an extremely appealing conceptual message, which can be appreciated also by nonexperts. Furthermore, these conceptual aspects can be explained without entering excessive technical detail. Consequently, the “highlevel” direction is more suitable for an exposition in a book of the current nature.2 The last paragraph brings us to a discussion of the nature of the current book, which is captured by the subtitle (i.e., “a conceptual perspective”). Our main thesis is that Complexity Theory is extremely rich in conceptual content, and that this content should be explicitly communicated in expositions and courses on the subject. The desire to provide a corresponding textbook is indeed the motivation for writing the current book and its main governing principle. This book offers a conceptual perspective on Complexity Theory, and the presentation is designed to highlight this perspective. It is intended to serve as an introduction to the field, and can be used either as a textbook or for selfstudy. Indeed, the book’s primary target audience consists of students who wish to learn Complexity Theory and educators who intend to teach a course on Complexity Theory. Still, we hope that the book will be useful also to experts, especially to experts in one subarea of Complexity Theory who seek an introduction to and/or an overview of some other subarea. It is also hoped that the book may help promote general interest in Complexity Theory and make this field acccessible to general readers with adequate background (which consists mainly of being comfortable with abstract discussions, definitions, and proofs). However, we do expect most readers to have a basic knowledge of algorithms, or at least be fairly comfortable with the notion of an algorithm. The book focuses on several subareas of Complexity Theory (see the following organization and chapter summaries). In each case, the exposition starts from the intuitive questions addressed by the subarea, as embodied in the concepts that it studies. The exposition discusses the fundamental importance of these questions, the choices made in the actual formulation of these questions and notions, the approaches that underlie the answers, and the ideas that are embedded in these answers. Our view is that these (“nontechnical”) aspects are the core of the field, and the presentation attempts to reflect this view. We note that being guided by the conceptual contents of the material leads, in some cases, to technical simplifications. Indeed, for many of the results presented in this book, the presentation of the proof is different (and arguably easier to understand) than the standard presentations. Web site for notices regarding this book. We intend to maintain a Web site listing corrections of various types. The location of the site is http://www.wisdom.weizmann.ac.il/∼oded/ccbook.html 2
In addition, we mention a subjective reason for our decision: The “highlevel” direction is within our own expertise, while this cannot be said about the “lowlevel” direction.
xvi
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
Organization and Chapter Summaries
This book consists of ten chapters and seven appendices. The chapters constitute the core of this book and are written in a style adequate for a textbook, whereas the appendices provide either relevant background or additional perspective and are written in the style of a survey article. The relative length and ordering of the chapters (and appendices) do not reflect their relative importance, but rather an attempt at the best logical order (i.e., minimizing the number of forward pointers). Following are brief summaries of the book’s chapters and appendices. These summaries are more novicefriendly than those provided in Section 1.1.3 but less detailed than the summaries provided at the beginning of each chapter. Chapter 1: Introduction and Preliminaries. The introduction provides a highlevel overview of some of the content of Complexity Theory as well as a discussion of some of the characteristic features of this field. In addition, the introduction contains several important comments regarding the approach and conventions of the current book. The preliminaries provide the relevant background on computability theory, which is the setting in which complexity theoretic questions are being studied. Most importantly, central notions such as search and decision problems, algorithms that solve such problems, and their complexity are defined. In addition, this part presents the basic notions underlying nonuniform models of computation (like Boolean circuits). Chapter 2: P, NP, and NPCompleteness. The P versus NP Question can be phrased as asking whether or not finding solutions is harder than checking the correctness of solutions. An alternative formulation asks whether or not discovering proofs is harder than verifying their correctness, that is, is proving harder than verifying. It is widely believed that the answer to the two equivalent formulations is that finding (resp., proving) is harder than checking (resp., verifying); that is, it is believed that P is different from NP. At present, when faced with a hard problem in NP, we can only hope to prove that it is not in P assuming that NP is different from P. This is where the theory of NPcompleteness, which is based on the notion of a reduction, comes into the picture. In general, one computational problem is reducible to another problem if it is possible to efficiently solve the former when provided with an (efficient) algorithm for solving the latter. A problem (in NP) is NPcomplete if any problem in NP is reducible to it. Amazingly enough, NPcomplete problems exist, and furthermore, hundreds of natural computational problems arising in many different areas of mathematics and science are NPcomplete.
xvii
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
ORGANIZATION AND CHAPTER SUMMARIES
Chapter 3: Variations on P and NP. Nonuniform polynomial time (P/poly) captures efficient computations that are carried out by devices that handle specific input lengths. The basic formalism ignores the complexity of constructing such devices (i.e., a uniformity condition), but a finer formalism (based on “machines that take advice”) allows us to quantify the amount of nonuniformity. This provides a generalization of P. In contrast, the Polynomialtime Hierarchy (PH) generalizes NP by considering statements expressed by a quantified Boolean formula with a fixed number of alternations of existential and universal quantifiers. It is widely believed that each quantifier alternation adds expressive power to the class of such formulae. The two different classes are related by showing that if NP is contained in P/poly then the Polynomialtime Hierarchy collapses to its second level (i.e., 2 ). Chapter 4: More Resources, More Power? When using “nice” functions to determine an algorithm’s resources, it is indeed the case that more resources allow for more tasks to be performed. However, when “ugly” functions are used for the same purpose, increasing the resources may have no effect. By nice functions we mean functions that can be computed without exceeding the amount of resources that they specify. Thus, we get results asserting, for example, that there are problems that are solvable in cubic time but not in quadratic time. In the case of nonuniform models of computation, the issue of “nicety” does not arise, and it is easy to establish separation results. Chapter 5: Space Complexity. This chapter is devoted to the study of the space complexity of computations, while focusing on two rather extreme cases. The first case is that of algorithms having logarithmic space complexity, which seem a proper and natural subset of the set of polynomialtime algorithms. The second case is that of algorithms having polynomial space complexity, which in turn can solve almost all computational problems considered in this book. Among the many results presented in this chapter are a logspace algorithm for exploring (undirected) graphs, and a logspace reduction of the set of directed graphs that are not strongly connected to the set of directed graphs that are strongly connected. These results capture fundamental properties of space complexity, which seems to differentiate it from time complexity. Chapter 6: Randomness and Counting. Probabilistic polynomialtime algorithms with various types of failure give rise to complexity classes such as BPP, RP, and ZPP. The results presented include the emulation of probabilistic choices by nonuniform advice (i.e., BPP ⊂ P/poly) and the emulation of twosided probabilistic error by an ∃∀sequence of quantifiers (i.e., BPP ⊆ 2 ). Turning to counting problems (i.e., counting the number of solutions for NPtype problems), we distinguish between exact counting and approximate counting (in the sense of relative approximation). While any problem in PH is reducible to the exact counting class #P, approximate counting (for #P) is (probabilistically) reducible to N P. Additional related topics include #Pcompleteness, the complexity of searching for unique solutions, and the relation between approximate counting and generating almost uniformly distributed solutions. Chapter 7: The Bright Side of Hardness. It turns out that hard problems can be “put to work” to our benefit, most notably in cryptography. One key issue that arises in this context is bridging the gap between “occasional” hardness (e.g., worstcase hardness or mild averagecase hardness) and “typical” hardness (i.e., strong averagecase hardness).
xviii
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
ORGANIZATION AND CHAPTER SUMMARIES
We consider two conjectures that are related to P = N P. The first conjecture is that there are problems that are solvable in exponential time but are not solvable by (nonuniform) families of small (say, polynomialsize) circuits. We show that these types of worstcase conjectures can be transformed into averagecase hardness results that yield nontrivial derandomizations of BPP (and even BPP = P). The second conjecture is that there are problems in NP for which it is easy to generate (solved) instances that are hard to solve for other people. This conjecture is captured in the notion of oneway functions, which are functions that are easy to evaluate but hard to invert (in an averagecase sense). We show that functions that are hard to invert in a relatively mild averagecase sense yield functions that are hard to invert almost everywhere, and that the latter yield predicates that are very hard to approximate (called hardcore predicates). The latter are useful for the construction of generalpurpose pseudorandom generators, as well as for a host of cryptographic applications. Chapter 8: Pseudorandom Generators. A fresh view of the question of randomness was taken in the theory of computing: It has been postulated that a distribution is pseudorandom if it cannot be told apart from the uniform distribution by any efficient procedure. The paradigm, originally associating efficient procedures with polynomialtime algorithms, has been applied also with respect to a variety of limited classes of such distinguishing procedures. The archetypical case of pseudorandom generators refers to efficient generators that fool any feasible procedure; that is, the potential distinguisher is any probabilistic polynomialtime algorithm, which may be more complex than the generator itself. These generators are called generalpurpose, because their output can be safely used in any efficient application. In contrast, for purposes of derandomization, one may use pseudorandom generators that are somewhat more complex than the potential distinguisher (which represents the algorithm to be derandomized). Following this approach and using various hardness assumptions, one may obtain corresponding derandomizations of BPP (including a full derandomization; i.e., BPP = P). Other forms of pseudorandom generators include ones that fool spacebounded distinguishers, and even weaker ones that only exhibit some limited random behavior (e.g., outputting a pairwise independent sequence). Chapter 9: Probabilistic Proof Systems. Randomized and interactive verification procedures, giving rise to interactive proof systems, seem much more powerful than their deterministic counterparts. In particular, interactive proof systems exist for any set in PSPACE ⊇ coN P (e.g., for the set of unsatisfied propositional formulae), whereas it is widely believed that some sets in coN P do not have NPproof systems. Interactive proofs allow the meaningful conceptualization of zeroknowledge proofs, which are interactive proofs that yield nothing (to the verifier) beyond the fact that the assertion is indeed valid. Under reasonable complexity assumptions, every set in N P has a zeroknowledge proof system. (This result has many applications in cryptography.) A third type of probabilistic proof system underlies the model of PCPs, which stands for probabilistically checkable proofs. These are (redundant) NPproofs that offer a tradeoff between the number of locations (randomly) examined in the proof and the confidence in its validity. In particular, a small constant error probability can be obtained by reading a constant number of bits in the redundant NPproof. The PCP Theorem asserts that NPproofs can be efficiently transformed into PCPs. The study of PCPs is closely related to the study of the complexity of approximation problems.
xix
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
ORGANIZATION AND CHAPTER SUMMARIES
Chapter 10: Relaxing the Requirements. In light of the apparent infeasibility of solving numerous useful computational problems, it is natural to seek relaxations of those problems that remain useful for the original applications and yet allow for feasible solving procedures. Two such types of relaxation are provided by adequate notions of approximation and a theory of averagecase complexity. The notions of approximation refer to the computational problems themselves; that is, for each problem instance we extend the set of admissible solutions. In the context of search problems this means settling for solutions that have a value that is “sufficiently close” to the value of the optimal solution, whereas in the context of decision problems this means settling for procedures that distinguish yesinstances from instances that are “far” from any yesinstance. Turning to averagecase complexity, we note that a systematic study of this notion requires the development of a nontrivial conceptual framework. One major aspect of this framework is limiting the class of distributions in a way that, on the one hand, allows for various types of natural distributions and, on the other hand, prevents the collapse of averagecase hardness to worstcase hardness. Appendix A: Glossary of Complexity Classes. The glossary provides selfcontained definitions of most complexity classes mentioned in the book. The glossary is partitioned into two parts, dealing separately with complexity classes that are defined in terms of algorithms and their resources (i.e., time and space complexity of Turing machines) and complexity classes defined in terms of nonuniform circuits (and referring to their size and depth). In particular, the following classes are defined: P, N P, coN P, BPP, RP, coRP, ZPP, #P, PH, E, EX P, N EX P, L, N L, RL, PSPACE, P/poly, N C k , and AC k . Appendix B: On the Quest for Lower Bounds. This brief survey describes the most famous attempts at proving lower bounds on the complexity of natural computational problems. The first part, devoted to Circuit Complexity, reviews lower bounds for the size of (restricted) circuits that solve natural computational problems. This represents a program whose longterm goal is proving that P = N P. The second part, devoted to Proof Complexity, reviews lower bounds on the length of (restricted) propositional proofs of natural tautologies. This represents a program whose longterm goal is proving that N P = coN P. Appendix C: On the Foundations of Modern Cryptography. This survey of the foundations of cryptography focuses on the paradigms, approaches, and techniques that are used to conceptualize, define, and provide solutions to natural security concerns. It presents some of these conceptual tools as well as some of the fundamental results obtained using them. The appendix augments the partial treatment of oneway functions, pseudorandom generators, and zeroknowledge proofs (included in Chapters 7–9). Using these basic tools, the appendix provides a treatment of basic cryptographic applications such as encryption, signatures, and general cryptographic protocols. Appendix D: Probabilistic Preliminaries and Advanced Topics in Randomization. The probabilistic preliminaries include conventions regarding random variables as well as three useful inequalities (i.e., Markov’s Inequality, Chebyshev’s Inequality, and Chernoff Bound). The advanced topics include constructions and lemmas regarding families of hashing functions, a study of the sample and randomness complexities of estimating the
xx
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
ORGANIZATION AND CHAPTER SUMMARIES
average value of an arbitrary function, and the problem of randomness extraction (i.e., procedures for extracting almost perfect randomness from sources of weak or defected randomness). Appendix E: Explicit Constructions. Complexity Theory provides a clear perspective on the intuitive notion of an explicit construction. This perspective is demonstrated with respect to errorcorrecting codes and expander graphs. Starting with codes, the appendix focuses on various computational aspects, and offers a review of several popular constructions as well as a construction of a binary code of constant rate and constant relative distance. Also included are a brief review of the notions of locally testable and locally decodable codes, and a useful upper bound on the number of codewords that are close to any single sequence. Turning to expander graphs, the appendix contains a review of two standard definitions of expanders, two levels of explicitness, two properties of expanders that are related to (singlestep and multistep) random walks on them, and two explicit constructions of expander graphs. Appendix F: Some Omitted Proofs. This appendix contains some proofs that were not included in the main text (for a variety of reasons) and still are beneficial as alternatives to the original and/or standard presentations. Included are a proof that PH is reducible to #P via randomized Karpreductions, and the presentation of two useful transformations regarding interactive proof systems. Appendix G: Some Computational Problems. This appendix includes definitions of most of the specific computational problems that are referred to in the main text. In particular, it contains a brief introduction to graph algorithms, Boolean formulae, and finite fields.
xxi
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
xxii
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
Acknowledgments
My perspective on Complexity Theory was most influenced by Shimon Even and Leonid Levin. In fact, it was hard not to be influenced by these two remarkable and highly opinionated researchers (especially for somebody like me who was fortunate to spend a lot of time with them).1 Shimon Even viewed Complexity Theory as the study of the limitations of algorithms, a study concerned with natural computational resources and natural computational tasks. Complexity Theory was there to guide the engineer and to address the deepest questions that bother an intellectually curious computer scientist. I believe that this book shares Shimon’s perspective of Complexity Theory as evolving around such questions. Leonid Levin emphasized the general principles that underlie Complexity Theory, rejecting any “modeldependent effects” as well as the common coupling of Complexity Theory with the theory of automata and formal languages. In my opinion, this book is greatly influenced by these perspectives of Leonid. I wish to acknowledge the influence of numerous other colleagues on my professional perspectives and attitudes. These include Shafi Goldwasser, Dick Karp, Silvio Micali, and Avi Wigderson. Needless to say, this is but a partial list that reflects influences of which I am most aware. The year I spent at Radcliffe Institute for Advanced Study (of Harvard University) was instrumental in my decision to undertake the writing of this book. I am grateful to Radcliffe for creating such an empowering atmosphere. I also wish to thank many colleagues for their comments and advice (or help) regarding earlier versions of this text. A partial list includes Noga Alon, Noam Livne, Dieter van Melkebeek, Omer Reingold, Dana Ron, Ronen Shaltiel, Amir Shpilka, Madhu Sudan, Salil Vadhan, and Avi Wigderson. Lastly, I am grateful to Mohammad Mahmoody Ghidary and Or Meir for their careful reading of drafts of this manuscript and for the numerous corrections and suggestions that they have provided. Relation to previous texts. Some of the text of this book has been adapted from previous texts of mine. In particular, Chapters 8 and 9 were written based on my surveys [90, Chap. 3] and [90, Chap. 2], respectively; but the exposition has been extensively revised 1
Shimon Even was my graduate studies adviser (at the Technion, 1980–83), whereas I had a lot of meetings with Leonid Levin during my postdoctoral period (at MIT, 1983–86).
xxiii
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
ACKNOWLEDGMENTS
to fit the significantly different aims of the current book. Similarly, Section 7.1 and Appendix C were written based on my survey [90, Chap. 1] and books [91, 92]; but, again, the previous texts are very different in many ways. In contrast, Appendix B was adapted with relatively little modifications from an early draft of a section in an article by Avi Wigderson and myself [107].
xxiv
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER ONE
Introduction and Preliminaries
When you set out on your journey to Ithaca, pray that the road is long, full of adventure, full of knowledge. K. P. Cavafy, “Ithaca” The current chapter consists of two parts. The first part provides a highlevel introduction to (computational) Complexity Theory. This introduction is much more detailed than the laconic statements made in the preface, but is quite sparse when compared to the richness of the field. In addition, the introduction contains several important comments regarding the contents, approach, and conventions of the current book. approximation
averagecase
pseudorandomness
PCP ZK
IP
PSPACE
PH
BPP
RP
NP
P
coNP
NL
L
lower bounds
The second part of this chapter provides the necessary preliminaries to the rest of the book. It includes a discussion of computational tasks and computational models, as well as natural complexity measures associated with the latter. More specifically, this part recalls the basic notions and results of computability theory (including the definition of Turing machines, some undecidability results, the notion of universal machines, and the definition of oracle machines). In addition, this part presents the basic notions underlying nonuniform models of computation (like Boolean circuits).
1.1. Introduction This introduction consists of two parts: The first part refers to the area itself, whereas the second part refers to the current book. The first part provides a brief overview of Complexity Theory (Section 1.1.1) as well as some reflections about its characteristics (Section 1.1.2). The second part describes the contents of this book (Section 1.1.3), the
1
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
considerations underlying the choice of topics as well as the way they are presented (Section 1.1.4), and various notations and conventions (Section 1.1.5).
1.1.1. A Brief Overview of Complexity Theory Out of the tough came forth sweetness1 Judges, 14:14
The following brief overview is intended to give a flavor of the questions addressed by Complexity Theory. This overview is quite vague, and is merely meant as a teaser. Most of the topics mentioned in it will be discussed at length in the various chapters of this book. Complexity Theory is concerned with the study of the intrinsic complexity of computational tasks. Its “final” goals include the determination of the complexity of any welldefined task. Additional goals include obtaining an understanding of the relations between various computational phenomena (e.g., relating one fact regarding computational complexity to another). Indeed, we may say that the former type of goal is concerned with absolute answers regarding specific computational phenomena, whereas the latter type is concerned with questions regarding the relation between computational phenomena. Interestingly, so far Complexity Theory has been more successful in coping with goals of the latter (“relative”) type. In fact, the failure to resolve questions of the “absolute” type led to the flourishing of methods for coping with questions of the “relative” type. Musing for a moment, let us say that, in general, the difficulty of obtaining absolute answers may naturally lead to seeking conditional answers, which may in turn reveal interesting relations between phenomena. Furthermore, the lack of absolute understanding of individual phenomena seems to facilitate the development of methods for relating different phenomena. Anyhow, this is what happened in Complexity Theory. Putting aside for a moment the frustration caused by the failure of obtaining absolute answers, we must admit that there is something fascinating in the success of relating different phenomena: In some sense, relations between phenomena are more revealing than absolute statements about individual phenomena. Indeed, the first example that comes to mind is the theory of NPcompleteness. Let us consider this theory, for a moment, from the perspective of these two types of goals. Complexity Theory has failed to determine the intrinsic complexity of tasks such as finding a satisfying assignment to a given (satisfiable) propositional formula or finding a 3coloring of a given (3colorable) graph. But it has succeeded in establishing that these two seemingly different computational tasks are in some sense the same (or, more precisely, are computationally equivalent). We find this success amazing and exciting, and hope that the reader shares these feelings. The same feeling of wonder and excitement is generated by many of the other discoveries of Complexity Theory. Indeed, the reader is invited to join a fast tour of some of the other questions and answers that make up the field of Complexity Theory. We will indeed start with the P versus NP Question. Our daily experience is that it is harder to solve a problem than it is to check the correctness of a solution (e.g., think of either a puzzle or a research problem). Is this experience merely a coincidence or does it represent a fundamental fact of life (i.e., a property of the world)? Could you imagine a 1
The quote is commonly interpreted as meaning that benefit arose out of misfortune.
2
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.1. INTRODUCTION
world in which solving any problem is not significantly harder than checking a solution to it? Would the term “solving a problem” not lose its meaning in such a hypothetical (and impossible, in our opinion) world? The denial of the plausibility of such a hypothetical world (in which “solving” is not harder than “checking”) is what “P different from NP” actually means, where P represents tasks that are efficiently solvable and NP represents tasks for which solutions can be efficiently checked. The mathematically (or theoretically) inclined reader may also consider the task of proving theorems versus the task of verifying the validity of proofs. Indeed, finding proofs is a special type of the aforementioned task of “solving a problem” (and verifying the validity of proofs is a corresponding case of checking correctness). Again, “P different from NP” means that there are theorems that are harder to prove than to be convinced of their correctness when presented with a proof. This means that the notion of a “proof” is meaningful; that is, proofs do help when one is seeking to be convinced of the correctness of assertions. Here NP represents sets of assertions that can be efficiently verified with the help of adequate proofs, and P represents sets of assertions that can be efficiently verified from scratch (i.e., without proofs). In light of the foregoing discussion it is clear that the P versus NP Question is a fundamental scientific question of farreaching consequences. The fact that this question seems beyond our current reach led to the development of the theory of NPcompleteness. Loosely speaking, this theory identifies a set of computational problems that are as hard as NP. That is, the fate of the P versus NP Question lies with each of these problems: If any of these problems is easy to solve then so are all problems in NP. Thus, showing that a problem is NPcomplete provides evidence of its intractability (assuming, of course, “P different than NP”). Indeed, demonstrating the NPcompleteness of computational tasks is a central tool in indicating hardness of natural computational problems, and it has been used extensively both in computer science and in other disciplines. We note that NPcompleteness indicates not only the conjectured intractability of a problem but also its “richness” in the sense that the problem is rich enough to “encode” any other problem in NP. The use of the term “encoding” is justified by the exact meaning of NPcompleteness, which in turn establishes relations between different computational problems (without referring to their “absolute” complexity). The foregoing discussion of NPcompleteness hints at the importance of representation, since it referred to different problems that encode one another. Indeed, the importance of representation is a central aspect of Complexity Theory. In general, Complexity Theory is concerned with problems for which the solutions are implicit in the problem’s statement (or rather in the instance). That is, the problem (or rather its instance) contains all necessary information, and one merely needs to process this information in order to supply the answer.2 Thus, Complexity Theory is concerned with manipulation of information, and its transformation from one representation (in which the information is given) to another representation (which is the one desired). Indeed, a solution to a computational problem is merely a different representation of the information given, that is, a representation in which the answer is explicit rather than implicit. For example, the answer to the question of whether or not a given Boolean formula is satisfiable is implicit in the formula itself (but the task is to make the answer explicit). Thus, Complexity Theory clarifies a central 2
In contrast, in other disciplines, solving a problem may require gathering information that is not available in the problem’s statement. This information may either be available from auxiliary (past) records or be obtained by conducting new experiments.
3
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
issue regarding representation, that is, the distinction between what is explicit and what is implicit in a representation. Furthermore, it even suggests a quantification of the level of nonexplicitness. In general, Complexity Theory provides new viewpoints on various phenomena that were considered also by past thinkers. Examples include the aforementioned concepts of solutions, proofs, and representation as well as concepts like randomness, knowledge, interaction, secrecy, and learning. We next discuss the latter concepts and the perspective offered by Complexity Theory. The concept of randomness has puzzled thinkers for ages. Their perspective can be described as ontological: They asked “what is randomness” and wondered whether it exists, at all (or is the world deterministic). The perspective of Complexity Theory is behavioristic: It is based on defining objects as equivalent if they cannot be told apart by any efficient procedure. That is, a coin toss is (defined to be) “random” (even if one believes that the universe is deterministic) if it is infeasible to predict the coin’s outcome. Likewise, a string (or a distribution of strings) is “random” if it is infeasible to distinguish it from the uniform distribution (regardless of whether or not one can generate the latter). Interestingly, randomness (or rather pseudorandomness) defined this way is efficiently expandable; that is, under a reasonable complexity assumption (to be discussed next), short pseudorandom strings can be deterministically expanded into long pseudorandom strings. Indeed, it turns out that randomness is intimately related to intractability. Firstly, note that the very definition of pseudorandomness refers to intractability (i.e., the infeasibility of distinguishing a pseudorandomness object from a uniformly distributed object). Secondly, as stated, a complexity assumption, which refers to the existence of functions that are easy to evaluate but hard to invert (called oneway functions), implies the existence of deterministic programs (called pseudorandom generators) that stretch short random seeds into long pseudorandom sequences. In fact, it turns out that the existence of pseudorandom generators is equivalent to the existence of oneway functions. Complexity Theory offers its own perspective on the concept of knowledge (and distinguishes it from information). Specifically, Complexity Theory views knowledge as the result of a hard computation. Thus, whatever can be efficiently done by anyone is not considered knowledge. In particular, the result of an easy computation applied to publicly available information is not considered knowledge. In contrast, the value of a hardtocompute function applied to publicly available information is knowledge, and if somebody provides you with such a value then it has provided you with knowledge. This discussion is related to the notion of zeroknowledge interactions, which are interactions in which no knowledge is gained. Such interactions may still be useful, because they may convince a party of the correctness of specific data that was provided beforehand. For example, a zeroknowledge interactive proof may convince a party that a given graph is 3colorable without yielding any 3coloring. The foregoing paragraph has explicitly referred to interaction, viewing it as a vehicle for gaining knowledge and/or gaining confidence. Let us highlight the latter application by noting that it may be easier to verify an assertion when allowed to interact with a prover rather than when reading a proof. Put differently, interaction with a good teacher may be more beneficial than reading any book. We comment that the added power of such interactive proofs is rooted in their being randomized (i.e., the verification procedure is randomized), because if the verifier’s questions can be determined beforehand then the prover may just provide the transcript of the interaction as a traditional written proof.
4
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.1. INTRODUCTION
Another concept related to knowledge is that of secrecy: Knowledge is something that one party may have while another party does not have (and cannot feasibly obtain by itself) – thus, in some sense knowledge is a secret. In general, Complexity Theory is related to cryptography, where the latter is broadly defined as the study of systems that are easy to use but hard to abuse. Typically, such systems involve secrets, randomness, and interaction as well as a complexity gap between the ease of proper usage and the infeasibility of causing the system to deviate from its prescribed behavior. Thus, much of cryptography is based on complexity theoretic assumptions and its results are typically transformations of relatively simple computational primitives (e.g., oneway functions) into more complex cryptographic applications (e.g., secure encryption schemes). We have already mentioned the concept of learning when referring to learning from a teacher versus learning from a book. Recall that Complexity Theory provides evidence to the advantage of the former. This is in the context of gaining knowledge about publicly available information. In contrast, computational learning theory is concerned with learning objects that are only partially available to the learner (i.e., reconstructing a function based on its value at a few random locations or even at locations chosen by the learner). Complexity Theory sheds light on the intrinsic limitations of learning (in this sense). Complexity Theory deals with a variety of computational tasks. We have already mentioned two fundamental types of tasks: searching for solutions (or rather “finding solutions”) and making decisions (e.g., regarding the validity of assertions). We have also hinted that in some cases these two types of tasks can be related. Now we consider two additional types of tasks: counting the number of solutions and generating random solutions. Clearly, both the latter tasks are at least as hard as finding arbitrary solutions to the corresponding problem, but it turns out that for some natural problems they are not significantly harder. Specifically, under some natural conditions on the problem, approximately counting the number of solutions and generating an approximately random solution is not significantly harder than finding an arbitrary solution. Having mentioned the notion of approximation, we note that the study of the complexity of finding “approximate solutions” is also of natural importance. One type of approximation problems refers to an objective function defined on the set of potential solutions: Rather than finding a solution that attains the optimal value, the approximation task consists of finding a solution that attains an “almost optimal” value, where the notion of “almost optimal” may be understood in different ways giving rise to different levels of approximation. Interestingly, in many cases, even a very relaxed level of approximation is as difficult to obtain as solving the original (exact) search problem (i.e., finding an approximate solution is as hard as finding an optimal solution). Surprisingly, these hardnessofapproximation results are related to the study of probabilistically checkable proofs, which are proofs that allow for ultrafast probabilistic verification. Amazingly, every proof can be efficiently transformed into one that allows for probabilistic verification based on probing a constant number of bits (in the alleged proof). Turning back to approximation problems, we note that in other cases a reasonable level of approximation is easier to achieve than solving the original (exact) search problem. Approximation is a natural relaxation of various computational problems. Another natural relaxation is the study of averagecase complexity, where the “average” is taken over some “simple” distributions (representing a model of the problem’s instances that may occur in practice). We stress that, although it was not stated explicitly, the entire discussion so far has referred to “worstcase” analysis of algorithms. We mention that worstcase complexity is a more robust notion than averagecase complexity. For starters,
5
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
one avoids the controversial question of which instances are “important in practice” and correspondingly the selection of the class of distributions for which averagecase analysis is to be conducted. Nevertheless, a relatively robust theory of averagecase complexity has been suggested, albeit it is less developed than the theory of worstcase complexity. In view of the central role of randomness in Complexity Theory (as evident, say, in the study of pseudorandomness, probabilistic proof systems, and cryptography), one may wonder as to whether the randomness needed for the various applications can be obtained in real life. One specific question, which received a lot of attention, is the possibility of “purifying” randomness (or “extracting good randomness from bad sources”). That is, can we use “defected” sources of randomness in order to implement almost perfect sources of randomness? The answer depends, of course, on the model of such defected sources. This study turned out to be related to Complexity Theory, where the most tight connection is between some type of randomness extractors and some type of pseudorandom generators. So far we have focused on the time complexity of computational tasks, while relying on the natural association of efficiency with time. However, time is not the only resource one should care about. Another important resource is space: the amount of (temporary) memory consumed by the computation. The study of space complexity has uncovered several fascinating phenomena, which seem to indicate a fundamental difference between space complexity and time complexity. For example, in the context of space complexity, verifying proofs of validity of assertions (of any specific type) has the same complexity as verifying proofs of invalidity for the same type of assertions. In case the reader feels dizzy, it is no wonder. We took an ultrafast air tour of some mountain tops, and dizziness is to be expected. Needless to say, the rest of the book offers a totally different touring experience. We will climb some of these mountains by foot, step by step, and will often stop to look around and reflect. Absolute Results (aka. Lower Bounds). As stated upfront, absolute results are not known for many of the “big questions” of Complexity Theory (most notably the P versus NP Question). However, several highly nontrivial absolute results have been proved. For example, it was shown that using negation can speed up the computation of monotone functions (which do not require negation for their mere computation). In addition, many promising techniques were introduced and employed with the aim of providing a lowlevel analysis of the progress of computation. However, as stated in the preface, the focus of this book is elsewhere.
1.1.2. Characteristics of Complexity Theory We are successful because we use the right level of abstraction. Avi Wigderson (1996)
Using the “right level of abstraction” seems to be a main characteristic of the theory of computation at large. The right level of abstraction means abstracting away secondorder details, which tend to be context dependent, while using definitions that reflect the main issues (rather than abstracting them away, too). Indeed, using the right level of abstraction calls for an extensive exercising of good judgment, and one indication for having chosen the right abstractions is the result of their study.
6
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.1. INTRODUCTION
One major choice, taken by the theory of computation at large, is the choice of a model of computation and corresponding complexity measures and classes. The choice, which is currently taken for granted, was to use a simple model that avoids both the extreme of being too realistic (and thus too detailed) as well as the extreme of being too abstract (and vague). On the one hand, the main model of computation (which is used in Complexity Theory) does not try to mimic (or mirror) the actual operation of reallife computers used at a specific historical time. Such a choice would have made it very hard to develop Complexity Theory as we know it and to uncover the fundamental relations discussed in this book: The mass of details would have obscured the view. On the other hand, avoiding any reference to any concrete model (like in the case of recursive function theory) does not encourage the introduction and study of natural measures of complexity. Indeed, as we shall see in Section 1.2.3, the choice was (and is) to use a simple model of computation (which does not mirror reallife computers), while avoiding any effects that are specific to that model (by keeping an eye on a host of variants and alternative models). The freedom from the specifics of the basic model is obtained by considering complexity classes that are invariant under a change of model (as long as the alternative model is “reasonable”). Another major choice is the use of asymptotic analysis. Specifically, we consider the complexity of an algorithm as a function of its input length, and study the asymptotic behavior of this function. It turns out that structure that is hidden by concrete quantities appears at the limit. Furthermore, depending on the case, we classify functions according to different criteria. For example, in the case of time complexity we consider classes of functions that are closed under multiplication, whereas in case of space complexity we consider closure under addition. In each case, the choice is governed by the nature of the complexity measure being considered. Indeed, one could have developed a theory without using these conventions, but this would have resulted in a far more cumbersome theory. For example, rather than saying that finding a satisfying assignment for a given formula is polynomialtime reducible to deciding the satisfiability of some other formulae, one could have stated the exact functional dependence of the complexity of the search problem on the complexity of the decision problem. Both the aforementioned choices are common to other branches of the theory of computation. One aspect that makes Complexity Theory unique is its perspective on the most basic question of the theory of computation, that is, the way it studies the question of what can be efficiently computed. The perspective of Complexity Theory is general in nature. This is reflected in its primary focus on the relevant notion of efficiency (captured by corresponding resource bounds) rather than on specific computational problems. In most cases, complexity theoretic studies do not refer to any specific computational problems or refer to such problems merely as an illustration. Furthermore, even when specific computational problems are studied, this study is (explicitly or at least implicitly) aimed at understanding the computational limitations of certain resource bounds. The aforementioned general perspective seems linked to the significant role of conceptual considerations in the field: The rigorous study of an intuitive notion of efficiency must be initiated with an adequate choice of definitions. Since this study refers to any possible (relevant) computation, the definitions cannot be derived by abstracting some concrete reality (e.g., a specific algorithmic schema). Indeed, the definitions attempt to capture any possible reality, which means that the choice of definitions is governed by conceptual principles and not merely by empirical observations.
7
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
1.1.3. Contents of This Book This book is intended to serve as an introduction to Computational Complexity Theory. It consists of ten chapters and seven appendices, and can be used either as a textbook or for selfstudy. The chapters constitute the core of this book and are written in a style adequate for a textbook, whereas the appendices provide either relevant background or additional perspective and are written in the style of a survey article.
1.1.3.1. Overall Organization of the Book Section 1.2 and Chapter 2 are a prerequisite for the rest of the book. Technically speaking, the notions and results that appear in these parts are extensively used in the rest of the book. More importantly, the former parts are the conceptual framework that shapes the field and provides a good perspective on the field’s questions and answers. Indeed, Section 1.2 and Chapter 2 provide the very basic material that must be understood by anybody having an interest in Complexity Theory. In contrast, the rest of the book covers more advanced material, which means that none of it can be claimed to be absolutely necessary for a basic understanding of Complexity Theory. In particular, although some advanced chapters refer to material in other advanced chapters, the relation between these chapters is not a fundamental one. Thus, one may choose to read and/or teach an arbitrary subset of the advanced chapters and do so in an arbitrary order, provided one is willing to follow the relevant references to some parts of other chapters (see Figure 1.1). Needless to say, we recommend reading and/or teaching all the advanced chapters, and doing so by following the order presented in this book. As illustrated by Figure 1.1, some chapters (i.e., Chapters 3, 6, and 10) lump together topics that are usually presented separately. These decisions are related to our perspective on the corresponding topics. Turning to the appendices, we note that some of them (e.g., Appendix G and parts of Appendices D and E) provide background information that is required in some of the advanced chapters. In contrast, other appendices (e.g., Appendices B and C and other parts of Appendices D and E) provide additional perspective that augments the advanced chapters. (The function of Appendices A and F will be clarified in §1.1.3.2.) 1.1.3.2. Contents of the Specific Parts The rest of this section provides a brief summary of the contents of the various chapters and appendices. This summary is intended for the teacher and/or the expert, whereas the student is referred to the more novicefriendly summaries that appear in the book’s preface. Section 1.2: Preliminaries. This section provides the relevant background on computability theory, which is the basis for the rest of this book (as well as for Complexity Theory at large). Most importantly, it contains a discussion of central notions such as search and decision problems, algorithms that solve such problems, and their complexity. In addition, this section presents nonuniform models of computation (e.g., Boolean circuits). Chapter 2: P, NP, and NPcompleteness. This chapter presents the PvsNP Question both in terms of search problems and in terms of decision problems. The second main
8
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.1. INTRODUCTION
10.2
10.1.1 10.1.2 approx. (of opt.)
average case
prop. test.
9.2 ZK
9.3
7.1
9.1 IP
PCP
OWF
7.2 in E
8.2 8.3 8.4 8.5 gen. pur.
8.1
deran. space
paragidm
7.1.3
6.1 rand.
6.2 count.
6.1.5 (RL)
5.2.4
3.1
3.2
P/poly
PH
4.3 space
5.3.1
4.2 TIME
5.2 5.3 5.4 L NL PSPACE
4.1 advice
5.1 general
3.2.3
Figure 1.1: Dependencies among the advanced chapters. Solid arrows indicate the use of specific results that are stated in the section to which the arrow points. Dashed lines (and arrows) indicate an important conceptual connection; the wider the line, the tighter the connection. When relations are only between subsections, their index is indicated.
topic of this chapter is the theory of NPcompleteness. The chapter also provides a treatment of the general notion of a (polynomial time) reduction, with special emphasis on selfreducibility. Additional topics include the existence of problems in NP that are neither NPcomplete nor in P, optimal search algorithms, the class coNP, and promise problems. Chapter 3: Variations on P and NP. This chapter provides a treatment of nonuniform polynomial time (P/poly) and of the Polynomialtime Hierarchy (PH). Each of the two classes is defined in two equivalent ways (e.g., P/poly is defined both in terms of circuits and in terms of “machines that take advice”). In addition, it is shown that if NP is contained in P/poly then PH collapses to its second level (i.e., 2 ). Chapter 4: More Resources, More Power? The focus of this chapter is on hierarchy theorems, which assert that typically more resources allow for solving more problems. These results depend on using bounding functions that can be computed without exceeding the amount of resources that they specify, and otherwise gap theorems may apply. Chapter 5: Space Complexity. Among the results presented in this chapter are a logspace algorithm for testing connectivity of (undirected) graphs, a proof that N L = coN L,
9
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
and complete problems for N L and PSPACE (under logspace and polytime reductions, respectively). Chapter 6: Randomness and Counting. This chapter focuses on various randomized complexity classes (i.e., BPP, RP, and ZPP) and the counting class #P. The results presented in this chapter include BPP ⊂ P/poly and BPP ⊆ 2 , the #Pcompleteness of the Permanent, the connection between approximate counting and uniform generation of solutions, and the randomized reductions of approximate counting to N P and of N P to solving problems with unique solutions. Chapter 7: The Bright Side of Hardness. This chapter deals with two conjectures that are related to P = N P. The first conjecture is that there are problems in E that are not solvable by (nonuniform) families of small (say, polynomialsize) circuits, whereas the second conjecture is equivalent to the notion of oneway functions. Most of this chapter is devoted to “hardness amplification” results that convert these conjectures into tools that can be used for nontrivial derandomizations of BPP (resp., for a host of cryptographic applications). Chapter 8: Pseudorandom Generators. The pivot of this chapter is the notion of computational indistinguishability and corresponding notions of pseudorandomness. The definition of generalpurpose pseudorandom generators (running in polynomial time and withstanding any polynomialtime distinguisher) is presented as a special case of a general paradigm. The chapter also contains a presentation of other instantiations of the latter paradigm, including generators aimed at derandomizing complexity classes such as BPP, generators withstanding spacebounded distinguishers, and some specialpurpose generators. Chapter 9: Probabilistic Proof Systems. This chapter provides a treatment of three types of probabilistic proof systems: interactive proofs, zeroknowledge proofs, and probabilistic checkable proofs. The results presented include IP = PSPACE, zeroknowledge proofs for any NPset, and the PCP Theorem. For the latter, only overviews of the two different known proofs are provided. Chapter 10: Relaxing the Requirements. This chapter provides a treatment of two types of approximation problems and a theory of averagecase (or rather typicalcase) complexity. The traditional type of approximation problem refers to search problems and consists of a relaxation of standard optimization problems. The second type is known as “property testing” and consists of a relaxation of standard decision problems. The theory of averagecase complexity involves several nontrivial definitional choices (e.g., an adequate choice of the class of distributions). Appendix A: Glossary of Complexity Classes. The glossary provides selfcontained definitions of most complexity classes mentioned in the book. Appendix B: On the Quest for Lower Bounds. The first part, devoted to Circuit Complexity, reviews lower bounds for the size of (restricted) circuits that solve natural computational problems. The second part, devoted to Proof Complexity, reviews lower bounds on the length of (restricted) propositional proofs of natural tautologies.
10
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.1. INTRODUCTION
Appendix C: On the Foundations of Modern Cryptography. The first part of this appendix augments the partial treatment of oneway functions, pseudorandom generators, and zeroknowledge proofs (included in Chapters 7–9). Using these basic tools, the second part provides a treatment of basic cryptographic applications such as encryption, signatures, and general cryptographic protocols. Appendix D: Probabilistic Preliminaries and Advanced Topics in Randomization. The probabilistic preliminaries include conventions regarding random variables and overviews of three useful inequalities (i.e., Markov’s Inequality, Chebyshev’s Inequality, and Chernoff Bound). The advanced topics include constructions of hashing functions and variants of the Leftover Hashing Lemma, and overviews of samplers and extractors (i.e., the problem of randomness extraction). Appendix E: Explicit Constructions. This appendix focuses on various computational aspects of errorcorrecting codes and expander graphs. On the topic of codes, the appendix contains a review of the Hadamard code, ReedSolomon codes, ReedMuller codes, and a construction of a binary code of constant rate and constant relative distance. Also included are a brief review of the notions of locally testable and locally decodable codes, and a listdecoding bound. On the topic of expander graphs, the appendix contains a review of the standard definitions and properties as well as a presentation of the MargulisGabberGalil and the ZigZag constructions. Appendix F: Some Omitted Proofs. This appendix contains some proofs that are beneficial as alternatives to the original and/or standard presentations. Included are proofs that PH is reducible to #P via randomized Karpreductions, and that IP( f ) ⊆ AM(O( f )) ⊆ AM( f ). Appendix G: Some Computational Problems. This appendix contains a brief introduction to graph algorithms, Boolean formulae, and finite fields. Bibliography. As stated in §1.1.4.4, we tried to keep the bibliographic list as short as possible (and still reached over a couple of hundred entries). As a result, many relevant references were omitted. In general, our choice of references was biased in favor of textbooks and survey articles. We tried, however, not to omit references to key papers in an area. Absent from this book. As stated in the preface, the current book does not provide a uniform cover of the various areas of Complexity Theory. Notable omissions include the areas of Circuit Complexity (cf. [46, 236]) and Proof Complexity (cf. [27]), which are briefly reviewed in Appendix B. Additional topics that are commonly covered in Complexity Theory courses but are omitted here include the study of branching programs and decision trees (cf. [237]), parallel computation [141], and communication complexity [148]. We mention that the forthcoming textbook of Arora and Barak [14] contains a treatment of all these topics. Finally, we mention two areas that we consider related to Complexity Theory, although this view is not very common. These areas are distributed computing [17] and computational learning theory [142].
11
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
1.1.4. Approach and Style of This Book According to a common opinion, the most important aspect of a scientific work is the technical result that it achieves, whereas explanations and motivations are merely redundancy introduced for the sake of “error correction” and/or comfort. It is further believed that, like in a work of art, the interpretation of the work should be left with the reader. The author strongly disagrees with the aforementioned opinions, and argues that there is a fundamental difference between art and science, and that this difference refers exactly to the meaning of a piece of work. Science is concerned with meaning (and not with form), and in its quest for truth and/or understanding, science follows philosophy (and not art). The author holds the opinion that the most important aspects of a scientific work are the intuitive question that it addresses, the reason that it addresses this question, the way it phrases the question, the approach that underlies its answer, and the ideas that are embedded in the answer. Following this view, it is important to communicate these aspects of the work. The foregoing issues are even more acute when it comes to Complexity Theory, firstly because conceptual considerations seem to play an even more central role in Complexity Theory than in other fields (cf. Section 1.1.2). Secondly (and even more importantly), Complexity Theory is extremely rich in conceptual content. Thus, communicating this content is of primary importance, and failing to do so misses the most important aspects of this theory. Unfortunately, the conceptual content of Complexity Theory is rarely communicated (explicitly) in books and/or surveys of the area.3 The annoying (and quite amazing) consequences are students who have only a vague understanding of the meaning and general relevance of the fundamental notions and results that they were taught. The author’s view is that these consequences are easy to avoid by taking the time to explicitly discuss the meaning of definitions and results. A closely related issue is using the “right” definitions (i.e., those that reflect better the fundamental nature of the notion being defined) and emphasizing the (conceptually) “right” results. The current book is written accordingly.
1.1.4.1. The General Principle In accordance with the foregoing, the focus of this book is on the conceptual aspects of the technical material. Whenever presenting a subject, the starting point is the intuitive questions being addressed. The presentation explains the importance of these questions, the specific ways that they are phrased (i.e., the choices made in the actual formulation), the approaches that underlie the answers, and the ideas that are embedded in these answers. Thus, a significant portion of the text is devoted to motivating discussions that refer to the concepts and ideas that underlie the actual definitions and results. The material is organized around conceptual themes, which reflect fundamental notions and/or general questions. Specific computational problems are rarely referred to, with exceptions that are used either for the sake of clarity or because the specific 3
It is tempting to speculate on the reasons for this phenomenon. One speculation is that communicating the conceptual content of Complexity Theory involves making bold philosophical assertions that are technically straightforward, whereas this combination does not fit the personality of most researchers in Complexity Theory.
12
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.1. INTRODUCTION
problem happens to capture a general conceptual phenomenon. For example, in this book, “complete problems” (e.g., NPcomplete problems) are always secondary to the class for which they are complete.4
1.1.4.2. On a Few Specific Choices Our technical presentation often differs from the standard one. In many cases this is due to conceptual considerations. At times, this leads to some technical simplifications. In this subsection we only discuss general themes and/or choices that have a global impact on much of the presentation. This discussion is intended mainly for the teacher and/or the expert. Avoiding nondeterministic machines. We try to avoid nondeterministic machines as much as possible. As argued in several places (e.g., Section 2.1.5), we believe that these fictitious “machines” have a negative effect both from a conceptual and technical point of view. The conceptual damage caused by using nondeterministic machines is that it is unclear why one should care about what such machines can do. Needless to say, the reason to care is clear when noting that these fictitious “machines” offer a (convenient but rather slothful) way of phrasing fundamental issues. The technical damage caused by using nondeterministic machines is that they tend to confuse the students. Furthermore, they do not offer the best way to handle more advanced issues (e.g., counting classes). In contrast, we use search problems as the basis for much of the presentation. Specifically, the class PC (see Definition 2.3), which consists of search problems having efficiently checkable solutions, plays a central role in our presentation. Indeed, defining this class is slightly more complicated than the standard definition of N P (which is based on nondeterministic machines), but the technical benefits start accumulating as we proceed. Needless to say, the class PC is a fundamental class of computational problems and this fact is the main motivation for its presentation. (Indeed, the most conceptually appealing phrasing of the PvsNP Question consists of asking whether every search problem in PC can be solved efficiently.) Avoiding modeldependent effects. Complexity Theory evolves around the notion of efficient computation. Indeed, a rigorous study of this notion seems to require reference to some concrete model of computation; however, all questions and answers considered in this book are invariant under the choice of such a concrete model, provided of course that the model is “reasonable” (which, needless to say, is a matter of intuition). The foregoing text reflects the tension between the need to make rigorous definitions and the desire to be independent of technical choices, which are unavoidable when making rigorous definitions. It also reflects the fact that, by their fundamental nature, the questions that we address are quite modelindependent (i.e., are independent of various technical choices). Note that we do not deny the existence of modeldependent 4
We admit that a very natural computational problem can give rise to a class of problems that are computationally equivalent to it, and that in such a case the class may be less interesting than the original problem. This is not the case for any of the complexity classes presented in this book. Still, in some cases (e.g., N P and #P), the historical evolution actually went from a specific computational problem to a class of problems that are computationally equivalent to it. However, in all cases presented in this book, a retrospective evaluation of the material suggests that the class is actually more important than the original problem.
13
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
questions, but rather avoid addressing such questions and view them as less fundamental in nature. In contrast to common beliefs, the foregoing comments refer not only to time complexity but also to space complexity. However, in both cases, the claim of invariance may not hold for marginally small resources (e.g., linear time or sublogarithmic space). In contrast to the foregoing paragraph, in some cases we choose to be specific. The most notorious case is the association of efficiency with polynomialtime complexity (see §1.2.3.5). Indeed, all the questions and answers regarding efficient computation can be phrased without referring to polynomialtime complexity (i.e., by stating explicit functional relations between the complexities of the problems involved), but such a generalized treatment will be painful to follow.
1.1.4.3. On the Presentation of Technical Details In general, the more complex the technical material is, the more levels of expositions we employ (starting from the most highlevel exposition, and when necessary providing more than one level of details). In particular, whenever a proof is not very simple, we try to present the key ideas first, and postpone implementation details to later. We also try to clearly indicate the passage from a highlevel presentation to its implementation details (e.g., by using phrases such as “details follow”). In some cases, especially in the case of advanced results, only proof sketches are provided and the implication is that the reader should be able to fill up the missing details. Few results are stated without a proof. In some of these cases the proof idea or a proof overview is provided, but the reader is not expected to be able to fill up the highly nontrivial details. (In these cases, the text clearly indicates this state of affairs.) One notable example is the proof of the PCP Theorem (9.16). We tried to avoid the presentation of material that, in our opinion, neither is the “last word” on the subject nor represents the “right” way of approaching the subject. Thus, we do not always present the “best” known result. 1.1.4.4. Organizational Principles Each of the main chapters starts with a highlevel summary and ends with chapter notes and exercises. The latter are not aimed at testing or inspiring creativity, but are rather designed to help verify the basic understanding of the main text. In some cases, exercises (augmented by adequate guidelines) are used for presenting additional related material. The book contains material that ranges from topics currently taught in undergraduate courses (on computability and basic Complexity Theory) to topics currently taught mostly in advanced graduate courses. Although this situation may (and hopefully will) change in the future so that undergraduates will enjoy greater exposure to Complexity Theory, we believe that it will continue to be the case that typical readers of the advanced chapters will be more sophisticated than typical readers of the basic chapters (i.e., Section 1.2 and Chapter 2). Accordingly, the style of presentation becomes more sophisticated as one progresses from Chapter 2 to later chapters. As stated in the preface, this book focuses on the highlevel approach to Complexity Theory, whereas the lowlevel approach (i.e., lower bounds) is only briefly reviewed in Appendix B. Other appendices contain material that is closely related to Complexity Theory but is not an integral part of it (e.g., the foundations of
14
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.1. INTRODUCTION
cryptography).5 Further details on the contents of the various chapters and appendices are provided in Section 1.1.3. In an attempt to keep the bibliographic list from becoming longer than an average chapter, we omitted many relevant references. One trick used toward this end is referring to lists of references in other texts, especially when the latter are cited anyhow. Indeed, our choices of references were biased in favor of textbooks and survey articles, because we believe that they provide the best way to further learn about a research direction and/or approach. We tried, however, not to omit references to key papers in an area. In some cases, when we needed a reference for a result of interest and could not resort to the aforementioned trick, we also cited less central papers. As a matter of policy, we tried to avoid references and credits in the main text. The few exceptions are either pointers to texts that provide details that we chose to omit or usage of terms (bearing researchers’ names) that are too popular to avoid. In general, in each chapter, references and credits are provided in the chapter’s notes. Teaching note: The text also includes some teaching notes, which are typeset as this one. Some of these notes express quite opinionated recommendations and/or justify various expositional choices made in the text.
1.1.4.5. A Call for Tolerance This book attempts to accommodate a wide variety of readers, ranging from readers with no prior knowledge of Complexity Theory to experts in the field. This attempt is reflected in tailoring the presentation, in each part of the book, for the readers with the least background who are expected to read this part. However, in a few cases, advanced comments that are mostly directed at more advanced readers could not be avoided. Thus, readers with more background may skip some details, while readers with less background may ignore some advanced comments. An attempt was made to facilitate such selective reading by an adequate labeling of the text, but in many places the readers are expected to exercise their own judgment (and tolerate the fact that they are asked to invest some extra effort in order to accommodate the interests of other types of readers). We stress that the different parts of the book do envision different ranges of possible readers. Specifically, while Section 1.2 and Chapter 2 are intended mainly for readers with no background in Complexity Theory (and even no background in computability), the subsequent chapters do assume such basic background. In addition to familiarity with the basic material, the more advanced parts of the book also assume a higher level of technical sophistication. 1.1.4.6. Additional Comments Regarding Motivation The author’s guess is that the text will be criticized for lengthy discussions of technically trivial issues. Indeed, most researchers dismiss various conceptual clarifications as being 5
As further articulated in Section 7.1, we recommend not including a basic treatment of cryptography within a course on Complexity Theory. Indeed, cryptography may be claimed to be the most appealing application of Complexity Theory, but a superficial treatment of cryptography (from this perspective) is likely to be misleading and cause more harm than good.
15
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
trivial and devote all their attention to the technically challenging parts of the material. The consequence is students who master the technical material but are confused about its meaning. In contrast, the author recommends not being embarrassed at devoting time to conceptual clarifications, even if some students may view them as obvious. The motivational discussions presented in the text do not necessarily represent the original motivation of the researchers who pioneered a specific study and/or contributed greatly to it. Instead, these discussions provide what the author considers to be a good motivation and/or a good perspective on the corresponding concepts.
1.1.5. Standard Notations and Other Conventions Following are some notations and conventions that are freely used in this book. Standard asymptotic notation: When referring to integral functions, we use the standard asymptotic notation; that is, for f, g : N → N, we write f = O(g) (resp., f = (g)) if there exists a constant c > 0 such that f (n) ≤ c · g(n) (resp., f (n) ≥ c · g(n)) holds for all n ∈ N. We usually denote by “poly” an unspecified polynomial, and write f (n) = poly(n) instead of “there exists a polynomial p such that f (n) ≤ p(n) for all n ∈ N.” We also use the notation f = O(g) that means f (n) = poly(log n) · g(n), and f = o(g) (resp., f = ω(g)) that means f (n) < c · g(n) (resp., f (n) > c · g(n)) for every constant c > 0 and all sufficiently large n. Integrality issues: Typically, we ignore integrality issues. This means that we may assume that log2 n is an integer rather than using a more cumbersome form as log2 n. Likewise, we may assume that various equalities are satisfied by integers (e.g., 2n = m m ), even when this cannot possibly be the case (e.g., 2n = 3m ). In all these cases, one should consider integers that approximately satisfy the relevant equations (and deal with the problems that emerge by such approximations, which will be ignored in the current text). Standard combinatorial and graph theory terms and notation: For any set S, we denote by 2 S the set of all subsets of S (i.e., 2 S = {S : S ⊆ S}). For a natural number def n ∈ N, we denote [n] = {1, . . . , n}. Many of the computational problems that we mention refer to finite (undirected) graphs. Such a graph, denoted G = (V, E), consists of a set of vertices, denoted V , and a set of edges, denoted E, which are unordered pairs of vertices. By default, graphs are undirected, whereas directed graphs consist of vertices and directed edges, where a directed edge is an order pair of vertices. We also refer to other graphtheoretic terms such as connectivity, being acyclic (i.e., having no simple cycles), being a tree (i.e., being connected and acyclic), kcolorability, etc. For further background on graphs and computational problems regarding graphs, the reader is referred to Appendix G.1. Typographic conventions: We denote formally defined complexity classes by calligraphic letters (e.g., N P), but we do so only after defining these classes. Furthermore, when we wish to maintain some ambiguity regarding the specific formulation of a class of problems we use Roman font (e.g., NP may denote either a class of search problems or a class of decision problems). Likewise, we denote formally defined computational problems by typewriter font (e.g., SAT). In contrast, generic problems and algorithms will be denoted by italic font.
16
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
1.2. Computational Tasks and Models But, you may say, we asked you to speak about women and fiction – what, has that got to do with a room of one’s own? I will try to explain. Virginia Woolf, A Room of One’s Own
This section provides the necessary preliminaries for the rest of the book; that is, we discuss the notion of a computational task and present computational models (for describing methods) for solving such tasks. We start by introducing the general framework for our discussion of computational tasks (or problems): This framework refers to the representation of instances (as binary sequences) and focuses on two types of tasks (i.e., searching for solutions and making decisions). In order to facilitate a study of methods for solving such tasks, the latter are defined with respect to infinitely many possible instances (each being a finite object).6 Once computational tasks are defined, we turn to methods for solving such tasks, which are described in terms of some model of computation. The description of such models is the main contents of this section. Specifically, we consider two types of models of computation: uniform models and nonuniform models. The uniform models correspond to the intuitive notion of an algorithm, and will provide the stage for the rest of the book (which focuses on efficient algorithms). In contrast, nonuniform models (e.g., Boolean circuits) facilitate a closer look at the way a computation progresses, and will be used only sporadically in this book. Organization of Section 1.2. Sections 1.2.1–1.2.3 correspond to the contents of a traditional computability course, except that our presentation emphasizes some aspects and deemphasizes others. In particular, the presentation highlights the notion of a universal machine (see §1.2.3.4), justifies the association of efficient computation with polynomialtime algorithms (§1.2.3.5), and provides a definition of oracle machines (§1.2.3.6). This material (with the exception of Kolmogorov Complexity) is taken for granted in the rest of the current book. In contrast, Section 1.2.4 presents basic preliminaries regarding nonuniform models of computation (i.e., various types of Boolean circuits), and these are only used lightly in the rest of the book. (We also call the reader’s attention to the discussion of generic complexity classes in Section 1.2.5.) Thus, whereas Sections 1.2.1–1.2.3 (and 1.2.5) are absolute prerequisites for the rest of this book, Section 1.2.4 is not. Teaching note: The author believes that there is no real need for a semesterlong course in computability (i.e., a course that focuses on what can be computed rather than on what can be computed efficiently). Instead, undergraduates should take a course in Computational Complexity, which should contain the computability aspects that serve as a basis for the rest of the course. Specifically, the former aspects should occupy at most 25% of the course, and the focus should be on basic complexity issues (captured by P, NP, and NPcompleteness) augmented by a selection of some more advanced material. Indeed, such a course can be based on Chapters 1 and 2 of the current book (augmented by a selection of some topics from other chapters).
6
The comparison of different methods seems to require the consideration of infinitely many possible instances; otherwise, the choice of the language in which the methods are described may totally dominate and even distort the discussion (cf. the discussion of Kolmogorov Complexity in §1.2.3.4).
17
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
1.2.1. Representation In mathematics and related sciences, it is customary to discuss objects without specifying their representation. This is not possible in the theory of computation, where the representation of objects plays a central role. In a sense, a computation merely transforms one representation of an object to another representation of the same object. In particular, a computation designed to solve some problem merely transforms the problem instance to its solution, where the latter can be thought of as a (possibly partial) representation of the instance. Indeed, the answer to any fully specified question is implicit in the question itself, and computation is employed to make this answer explicit. Computational tasks refers to objects that are represented in some canonical way, where such canonical representation provides an “explicit” and “full” (but not “overly redundant”) description of the corresponding object. We will consider only finite objects like numbers, sets, graphs, and functions (and keep distinguishing these types of objects although, actually, they are all equivalent). While the representation of numbers, sets, and functions is quite straightforward, we refer the reader to Appendix G.1 for a discussion of the representation of graphs. Strings. We consider finite objects, each represented by a finite binary sequence, called a string. For a natural number n, we denote by {0, 1}n the set of all strings of length n, hereafter referred to as n bit (long) strings. The set of all strings is denoted {0, 1}∗ ; that is, {0, 1}∗ = ∪n∈N {0, 1}n . For x ∈ {0, 1}∗ , we denote by x the length of x (i.e., x ∈ {0, 1}x ), and often denote by xi the i th bit of x (i.e., x = x1 x2 · · · xx ). For x, y ∈ {0, 1}∗ , we denote by x y the string resulting from concatenation of the strings x and y. At times, we associate {0, 1}∗ ×{0, 1}∗ with {0, 1}∗ ; the reader should merely consider an adequate encoding (e.g., the pair (x1 · · · xm , y1 · · · yn ) ∈ {0, 1}∗ × {0, 1}∗ may be encoded by the string x1 x1 · · · xm xm 01y1 · · · yn ∈ {0, 1}∗ ). Likewise, we may represent sequences of strings (of fixed or varying length) as single strings. When we wish to emphasize that such a sequence (or some other object) is to be considered as a single object we use the notation · (e.g., “the pair (x, y) is encoded as the string x, y”). Numbers. Unless stated differently, natural numbers will be encoded by their binary n−1 expansion; that is, the string bn−1 · · · b1 b0 ∈ {0, 1}n encodes the number i=0 bi · 2i , where typically we assume that this representation has no leading zeros (i.e., bn−1 = 1). Rational numbers will be represented as pairs of natural numbers. In the rare cases in which one considers real numbers as part of the input to a computational problem, one actually means rational approximations of these real numbers. Special symbols. We denote the empty string by λ (i.e., λ ∈ {0, 1}∗ and λ = 0), and the empty set by ∅. It will be convenient to use some special symbols that are not in {0, 1}∗ . One such symbol is ⊥, which typically denotes an indication (e.g., produced by some algorithm) that something is wrong.
1.2.2. Computational Tasks Two fundamental types of computational tasks are the socalled search problems and decision problems. In both cases, the key notions are the problem’s instances and the problem’s specification.
18
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
1.2.2.1. Search Problems A search problem consists of a specification of a set of valid solutions (possibly an empty one) for each possible instance. That is, given an instance, one is required to find a corresponding solution (or to determine that no such solution exists). For example, consider the problem in which one is given a system of equations and is asked to find a valid solution. Needless to say, much of computer science is concerned with solving various search problems (e.g., finding shortest paths in a graph, sorting a list of numbers, finding an occurrence of a given pattern in a given string, etc). Furthermore, search problems correspond to the daily notion of “solving a problem” (e.g., finding one’s way between two locations), and thus a discussion of the possibility and complexity of solving search problems corresponds to the natural concerns of most people. In the following definition of solving search problems, the potential solver is a function (which may be thought of as a solving strategy), and the sets of possible solutions associated with each of the various instances are “packed” into a single binary relation. Definition 1.1 (solving a search problem): Let R ⊆ {0, 1}∗ × {0, 1}∗ and def R(x) = {y : (x, y) ∈ R} denote the set of solutions for the instance x. A function f : {0, 1}∗ → {0, 1}∗ ∪ {⊥} solves the search problem of R if for every x the following holds: if R(x) = ∅ then f (x) ∈ R(x) and otherwise f (x) = ⊥. Indeed, R = {(x, y) ∈ {0, 1}∗ × {0, 1}∗ : y ∈ R(x)}, and the solver f is required to find a solution (i.e., given x output y ∈ R(x)) whenever one exists (i.e., the set R(x) is not empty). It is also required that the solver f never outputs a wrong solution (i.e., if R(x) = ∅ then f (x) ∈ R(x) and if R(x) = ∅ then f (x) = ⊥), which in turn means that f indicates whether x has any solution. A special case of interest is the case of search problems having a unique solution (for each possible instance); that is, the case that R(x) = 1 for every x. In this case, R is essentially a (total) function, and solving the search problem of R means computing (or def evaluating) the function R (or rather the function R defined by R (x) = y if and only if R(x) = {y}). Popular examples include sorting a sequence of numbers, multiplying integers, finding the prime factorization of a composite number, etc.
1.2.2.2. Decision Problems A decision problem consists of a specification of a subset of the possible instances. Given an instance, one is required to determine whether the instance is in the specified set (e.g., the set of prime numbers, the set of connected graphs, or the set of sorted sequences). For example, consider the problem where one is given a natural number, and is asked to determine whether or not the number is a prime. One important case, which corresponds to the aforementioned search problems, is the case of the set of instances having a solution; that is, for any binary relation R ⊆ {0, 1}∗ × {0, 1}∗ we consider the set {x : R(x) = ∅}. Indeed, being able to determine whether or not a solution exists is a prerequisite to being able to solve the corresponding search problem (as per Definition 1.1). In general, decision problems refer to the natural task of making a binary decision, a task that is not uncommon in daily life (e.g., determining whether a traffic light is red). In any case, in the following definition of solving decision problems, the potential solver is again a function; that is, in this case the solver is a Boolean function, which is supposed to indicate membership in a predetermined set.
19
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
Definition 1.2 (solving a decision problem): Let S ⊆ {0, 1}∗ . A function f : {0, 1}∗ → {0, 1} solves the decision problem of S (or decides membership in S) if for every x it holds that f (x) = 1 if and only if x ∈ S. We often identify the decision problem of S with S itself, and identify S with its characteristic function (i.e., with the function χ S : {0, 1}∗ → {0, 1} defined such that χ S (x) = 1 if and only if x ∈ S). Note that if f solves the search problem of R then the Boolean def function f : {0, 1}∗ → {0, 1} defined by f (x) = 1 if and only if f (x) = ⊥ solves the decision problem of {x : R(x) = ∅}. Reflection. Most people would consider search problems to be more natural than decision problems: Typically, people seek solutions more than they stop to wonder whether or not solutions exist. Definitely, search problems are not less important than decision problems; it is merely that their study tends to require more cumbersome formulations. This is the main reason that most expositions choose to focus on decision problems. The current book attempts to devote at least a significant amount of attention also to search problems.
1.2.2.3. Promise Problems (an Advanced Comment) Many natural search and decision problems are captured more naturally by the terminology of promise problems, in which the domain of possible instances is a subset of {0, 1}∗ rather than {0, 1}∗ itself. In particular, note that the natural formulation of many search and decision problems refers to instances of a certain type (e.g., a system of equations, a pair of numbers, a graph), whereas the natural representation of these objects uses only a strict subset of {0, 1}∗ . For the time being, we ignore this issue, but we shall revisit it in Section 2.4.1. Here we just note that, in typical cases, the issue can be ignored by postulating that every string represents some legitimate object (e.g., each string that is not used in the natural representation of these objects is postulated as a representation of some fixed object). 1.2.3. Uniform Models (Algorithms) Science is One. Laci Lov´asz (according to Silvio Micali, ca. 1990)
We finally reach the heart of the current section (Section 1.2), which is the definition of uniform models of computation. We are all familiar with computers and with the ability of computer programs to manipulate data. This familiarity seems to be rooted in the positive side of computing; that is, we have some experience regarding some things that computers can do. In contrast, Complexity Theory is focused at what computers cannot do, or rather with drawing the line between what can be done and what cannot be done. Drawing such a line requires a precise formulation of all possible computational processes; that is, we should have a clear model of all possible computational processes (rather than some familiarity with some computational processes).
1.2.3.1. Overview and General Principles Before being formal, let we offer a general and abstract description, which is aimed at capturing any artificial as well as natural process. Indeed, artificial processes will be associated with computers, whereas by natural processes we mean (attempts to
20
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
model) the “mechanical” aspects of the natural reality (be it physical, biological, or even social). A computation is a process that modifies an environment via repeated applications of a predetermined rule. The key restriction is that this rule is simple: In each application it depends on and affects only a (small) portion of the environment, called the active zone. We contrast the a priori bounded size of the active zone (and of the modification rule) with the a priori unbounded size of the entire environment. We note that, although each application of the rule has a very limited effect, the effect of many applications of the rule may be very complex. Put in other words, a computation may modify the relevant environment in a very complex way, although it is merely a process of repeatedly applying a simple rule. As hinted, the notion of computation can be used to model the “mechanical” aspects of the natural reality, that is, the rules that determine the evolution of the reality (rather than the specific state of the reality at a specific time). In this case, the starting point of the study is the actual evolution process that takes place in the natural reality, and the goal of the study is finding the (computation) rule that underlies this natural process. In a sense, the goal of science at large can be phrased as finding (simple) rules that govern various aspects of reality (or rather one’s abstraction of these aspects of reality). Our focus, however, is on artificial computation rules designed by humans in order to achieve specific desired effects on a corresponding artificial environment. Thus, our starting point is a desired functionality, and our aim is to design computation rules that effect it. Such a computation rule is referred to as an algorithm. Loosely speaking, an algorithm corresponds to a computer program written in a highlevel (abstract) programming language. Let us elaborate. We are interested in the transformation of the environment as affected by the computational process (or the algorithm). Throughout (most of) this book, we will assume that, when invoked on any finite initial environment, the computation halts after a finite number of steps. Typically, the initial environment to which the computation is applied encodes an input string, and the end environment (i.e., at the termination of the computation) encodes an output string. We consider the mapping from inputs to outputs induced by the computation; that is, for each possible input x, we consider the output y obtained at the end of a computation initiated with input x, and say that the computation maps input x to output y. Thus, a computation rule (or an algorithm) determines a function (computed by it): This function is exactly the aforementioned mapping of inputs to outputs. In the rest of this book (i.e., outside the current chapter), we will also consider the number of steps (i.e., applications of the rule) taken by the computation on each possible input. The latter function is called the time complexity of the computational process (or algorithm). While time complexity is defined per input, we will often consider it per input length, taking the maximum over all inputs of the same length. In order to define computation (and computation time) rigorously, one needs to specify some model of computation, that is, provide a concrete definition of environments and a class of rules that may be applied to them. Such a model corresponds to an abstraction of a real computer (be it a PC, mainframe, or network of computers). One simple abstract model that is commonly used is that of Turing machines (see, §1.2.3.2). Thus, specific algorithms are typically formalized by corresponding Turing machines (and their time complexity is represented by the time complexity of the corresponding Turing machines). We stress, however, that most results in the theory of computation hold regardless of the specific computational model used, as long as it is “reasonable”
21
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
(i.e., satisfies the aforementioned simplicity condition and can perform some apparently simple computations). What is being computed? The foregoing discussion has implicitly referred to algorithms (i.e., computational processes) as means of computing functions. Specifically, an algorithm A computes the function f A : {0, 1}∗ → {0, 1}∗ defined by f A (x) = y if, when invoked on input x, algorithm A halts with output y. However, algorithms can also serve as means of “solving search problems” or “making decisions” (as in Definitions 1.1 and 1.2). Specifically, we will say that algorithm A solves the search problem of R (resp., decides membership in S) if f A solves the search problem of R (resp., decides membership in S). In the rest of this exposition we associate the algorithm A with the function f A computed by it; that is, we write A(x) instead of f A (x). For the sake of future reference, we summarize the foregoing discussion. Definition 1.3 (algorithms as problem solvers): We denote by A(x) the output of algorithm A on input x. Algorithm A solves the search problem R (resp., the decision problem S) if A, viewed as a function, solves R (resp., S). Organization of the rest of Section 1.2.3. In §1.2.3.2 we provide a rough description of the model of Turing machines. This is done merely for the sake of providing a concrete model that supports the study of computation and its complexity, whereas most of the material in this book will not depend on the specifics of this model. In §1.2.3.3 and §1.2.3.4 we discuss two fundamental properties of any reasonable model of computation: the existence of uncomputable functions and the existence of universal computations. The time (and space) complexity of computation is defined in §1.2.3.5. We also discuss oracle machines and restricted models of computation (in §1.2.3.6 and §1.2.3.7, respectively).
1.2.3.2. A Concrete Model: Turing Machines The model of Turing machines offers a relatively simple formulation of the notion of an algorithm. The fact that the model is very simple complicates the design of machines that solve problems of interest, but makes the analysis of such machines simpler. Since the focus of Complexity Theory is on the analysis of machines and not on their design, the tradeoff offered by this model is suitable for our purposes. We stress again that the model is merely used as a concrete formulation of the intuitive notion of an algorithm, whereas we actually care about the intuitive notion and not about its formulation. In particular, all results mentioned in this book hold for any other “reasonable” formulation of the notion of an algorithm. The model of Turing machines is not meant to provide an accurate (or “tight”) model of reallife computers, but rather to capture their inherent limitations and abilities (i.e., a computational task can be solved by a reallife computer if and only if it can be solved by a Turing machine). In comparison to reallife computers, the model of Turing machines is extremely oversimplified and abstracts away many issues that are of great concern to computer practice. However, these issues are irrelevant to the higherlevel questions addressed by Complexity Theory. Indeed, as usual, good practice requires more refined understanding than the one provided by a good theory, but one should first provide the latter. Historically, the model of Turing machines was invented before modern computers were even built, and was meant to provide a concrete model of computation and a definition
22
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
b 5
3
2
2
2
3
1
0
















b 5
3
2
2
2
3
3
0
Figure 1.2: A single step by a Turing machine.
of computable functions.7 Indeed, this concrete model clarified fundamental properties of computable functions and plays a key role in defining the complexity of computable functions. The model of Turing machines was envisioned as an abstraction of the process of an algebraic computation carried out by a human using a sheet of paper. In such a process, at each time, the human looks at some location on the paper, and depending on what he/she sees and what he/she has in mind (which is little . . .), he/she modifies the contents of this location and shifts his/her look to an adjacent location. The Actual Model. Following is a highlevel description of the model of Turing machines; the interested reader is referred to standard textbooks (e.g., [208]) for further details. Recall that we need to specify the set of possible environments, the set of machines (or computation rules), and the effect of applying such a rule on an environment. • The main component in the environment of a Turing machine is an infinite sequence of cells, each capable of holding a single symbol (i.e., member of a finite set ⊃ {0, 1}). This sequence is envisioned as starting at a leftmost cell, and extending infinitely to the right (cf. Figure 1.2). In addition, the environment contains the current location of the machine on this sequence, and the internal state of the machine (which is a member of a finite set Q). The aforementioned sequence of cells is called the tape, and its contents combined with the machine’s location and its internal state is called the instantaneous configuration of the machine. • The main component in the Turing machine itself is a finite rule (i.e., a finite function), called the transition function, which is defined over the set of all possible symbolstate pairs. Specifically, the transition function is a mapping from × Q to × Q × {−1, 0, +1}, where {−1, +1, 0} correspond to a movement instruction (which is either “left” or “right” or “stay,” respectively). In addition, the machine’s description specifies an initial state and a halting state, and the computation of the machine halts when the machine enters its halting state.8 7
In contrast, the abstract definition of “recursive functions” yields a class of “computable” functions without referring to any model of computation (but rather based on the intuition that any such model should support recursive functional composition). 8 Envisioning the tape as in Figure 1.2, we also use the convention that if the machine tries to move left of the end of the tape then it is considered to have halted.
23
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
We stress that, in contrast to the finite description of the machine, the tape has an a priori unbounded length (and is considered, for simplicity, as being infinite). • A single computation step of such a Turing machine depends on its current location on the tape, on the contents of the corresponding cell, and on the internal state of the machine. Based on the latter two elements, the transition function determines a new symbolstate pair as well as a movement instruction (i.e., “left” or “right” or “stay”). The machine modifies the contents of the said cell and its internal state accordingly, and moves as directed. That is, suppose that the machine is in state q and resides in a cell containing the symbol σ , and suppose that the transition function maps (σ, q) to (σ , q , D). Then, the machine modifies the contents of the said cell to σ , modifies its internal state to q , and moves one cell in direction D. Figure 1.2 shows a single step of a Turing machine that, when in state ‘b’ and seeing a binary symbol σ , replaces σ with the symbol σ + 2, maintains its internal state, and moves one position to the right.9 Formally, we define the successive configuration function that maps each instantaneous configuration to the one resulting by letting the machine take a single step. This function modifies its argument in a very minor manner, as described in the foregoing; that is, the contents of at most one cell (i.e., at which the machine currently resides) is changed, and in addition the internal state of the machine and its location may change, too. The initial environment (or configuration) of a Turing machine consists of the machine residing in the first (i.e., leftmost) cell and being in its initial state. Typically, one also mandates that, in the initial configuration, a prefix of the tape’s cells hold bit values, which concatenated together are considered the input, and the rest of the tape’s cells hold a special symbol (which in Figure 1.2 is denoted by ‘’). Once the machine halts, the output is defined as the contents of the cells that are to the left of its location (at termination time).10 Thus, each machine defines a function mapping inputs to outputs, called the function computed by the machine. Multitape Turing machines. We comment that in most expositions, one refers to the location of the “head of the machine” on the tape (rather than to the “location of the machine on the tape”). The standard terminology is more intuitive when extending the basic model, which refers to a single tape, to a model that supports a constant number of tapes. In the corresponding model of socalled multitape machines, the machine maintains a single head on each such tape, and each step of the machine depends on and affects the cells that are at the machine’s head location on each tape. As we shall see in Chapter 5 (and in §1.2.3.5), the extension of the model to multitape Turing machines is crucial to the definition of space complexity. A less fundamental advantage of the model of multitape Turing machines is that it facilitates the design of machines that compute functions of interest. 9
Figure 1.2 corresponds to a machine that, when in the initial state (i.e., ‘a’), replaces the symbol σ by σ + 4, modifies its internal state to ‘b’, and moves one position to the right. Indeed, “marking” the leftmost cell (in order to allow for recognizing it in the future) is a common practice in the design of Turing machines. 10 By an alternative convention, the machine halts while residing in the leftmost cell, and the output is defined as the maximal prefix of the tape contents that contains only bit values.
24
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
Teaching note: We strongly recommend avoiding the standard practice of teaching the student to program with Turing machines. These exercises seem very painful and pointless. Instead, one should prove that the Turing machine model is exactly as powerful as a model that is closer to a reallife computer (see the following “sanity check”); that is, a function can be computed by a Turing machine if and only if it is computable by a machine of the latter model. For starters, one may prove that a function can be computed by a singletape Turing machine if and only if it is computable by a multitape (e.g., twotape) Turing machine.
The ChurchTuring Thesis. The entire point of the model of Turing machines is its simplicity. That is, in comparison to more “realistic” models of computation, it is simpler to formulate the model of Turing machines and to analyze machines in this model. The ChurchTuring Thesis asserts that nothing is lost by considering the Turing machine model: A function can be computed by some Turing machine if and only if it can be computed by some machine of any other “reasonable and general” model of computation. This is a thesis, rather than a theorem, because it refers to an intuitive notion (i.e., the notion of a reasonable and general model of computation) that is left undefined on purpose. The model should be reasonable in the sense that it should allow only computation rules that are “simple” in some intuitive sense. For example, we should be able to envision a mechanical implementation of these computation rules. On the other hand, the model should allow for computation of “simple” functions that are definitely computable according to our intuition. At the very least the model should allow for emulation of Turing machines (i.e., computation of the function that, given a description of a Turing machine and an instantaneous configuration, returns the successive configuration). A philosophical comment. The fact that a thesis is used to link an intuitive concept to a formal definition is common practice in any science (or, more broadly, in any attempt to reason rigorously about intuitive concepts). Any attempt to rigorously define an intuitive concept yields a formal definition that necessarily differs from the original intuition, and the question of correspondence between these two objects arises. This question can never be rigorously treated, because one of the objects that it relates to is undefined. That is, the question of correspondence between the intuition and the definition always transcends a rigorous treatment (i.e., it always belongs to the domain of the intuition). A sanity check: Turing machines can emulate an abstract RAM. To gain confidence in the ChurchTuring Thesis, one may attempt to define an abstract randomaccess machine (RAM), and verify that it can be emulated by a Turing machine. An abstract RAM consists of an infinite number of memory cells, each capable of holding an integer, a finite number of similar registers, one designated as program counter, and a program consisting of instructions selected from a finite set. The set of possible instructions includes the following instructions: • reset(r ), where r is an index of a register, results in setting the value of register r to zero; • inc(r ), where r is an index of a register, results in incrementing the content of register r . Similarly dec(r ) causes a decrement;
25
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
• load(r1 , r2 ), where r1 and r2 are indices of registers, results in loading to register r1 the contents of the memory location m, where m is the current contents of register r2 ; • store(r1 , r2 ), stores the contents of register r1 in the memory, analogously to load; • condgoto(r, ), where r is an index of a register and does not exceed the program length, results in setting the program counter to − 1 if the content of register r is nonnegative. The program counter is incremented after the execution of each instruction, and the next instruction to be executed by the machine is the one to which the program counter points (and the machine halts if the program counter exceeds the program’s length). The input to the machine may be defined as the contents of the first n memory cells, where n is placed in a special input register. We note that, as stated, the abstract RAM model is as powerful as the Turing machine model (see the following details). However, in order to make the RAM model closer to reallife computers, we may augment it with additional instructions that are available on reallife computers like the instruction add(r1 , r2 ) (resp., mult(r1 , r2 )) that results in adding (resp., multiplying) the contents of registers r1 and r2 (and placing the result in register r1 ). We suggest proving that this abstract RAM can be emulated by a Turing machine.11 (Hint: note that during the emulation, we only need to hold the input, the contents of all registers, and the contents of the memory cells that were accessed during the computation.)12 Reflections: Observe that the abstract RAM model is significantly more cumbersome than the Turing machine model. Furthermore, seeking a sound choice of the instruction set (i.e., the instructions to be allowed in the model) creates a vicious cycle (because the sound guideline for such a choice should have been allowing only instructions that correspond to “simple” operations, whereas the latter correspond to easily computable functions . . .). This vicious cycle was avoided in the foregoing paragraph by trusting the reader to include only instructions that are available in some reallife computer. (We comment that this empirical consideration is justifiable in the current context, because our current goal is merely linking the Turing machine model with the reader’s experience of reallife computers.)
1.2.3.3. Uncomputable Functions Strictly speaking, the current subsection is not necessary for the rest of this book, but we feel that it provides a useful perspective. In contrast to what every layman would think, we know that not all functions are computable. Indeed, an important message to be communicated to the world is that not every welldefined task can be solved by applying a “reasonable” automated procedure (i.e., a procedure that has a simple description that can be applied to any instance of the problem at hand). Furthermore, not only is it the case that there exist uncomputable 11
We emphasize this direction of the equivalence of the two models, because the RAM model is introduced in order to convince the reader that Turing machines are not too weak (as a model of general computation). The fact that they are not too strong seems selfevident. Thus, it seems pointless to prove that the RAM model can emulate Turing machines. Still, note that this is indeed the case, by using the RAM’s memory cells to store the contents of the cells of the Turing machine’s tape. 12 Thus, at each time, the Turning machine’s tape contains a list of the RAM’s memory cells that were accessed so far as well as their current contents. When we emulate a RAM instruction, we first check whether the relevant RAM cell appears on this list, and augment the list by a corresponding entry or modify this entry as needed.
26
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
functions, but it is also the case that most functions are uncomputable. In fact, only relatively few functions are computable. Theorem 1.4 (on the scarcity of computable functions): The set of computable functions is countable, whereas the set of all functions (from strings to string) has cardinality ℵ. We stress that the theorem holds for any reasonable model of computation. In fact, it only relies on the postulate that each machine in the model has a finite description (i.e., can be described by a string). Proof: Since each computable function is computable by a machine that has a finite description, there is a 1–1 correspondence between the set of computable functions and the set of strings (which in turn is in 1–1 correspondence to the natural numbers). On the other hand, there is a 1–1 correspondence between the set of Boolean functions (i.e., functions from strings to a single bit) and the set of real number in [0, 1). This correspondence associates each real r ∈ [0, 1) to the function f : N → {0, 1} such that f (i) is the i th bit in the infinite binary expansion of r . The Halting Problem: In contrast to the discussion in §1.2.3.1, at this point we also consider machines that may not halt on some inputs. (The functions computed by such machines are partial functions that are defined only on inputs on which the machine halts.) Again, we rely on the postulate that each machine in the model has a finite description, and denote the description of machine M by M ∈ {0, 1}∗ . The halting def function, h : {0, 1}∗ × {0, 1}∗ → {0, 1}, is defined such that h(M, x) = 1 if and only if M halts on input x. The following result goes beyond Theorem 1.4 by pointing to an explicit function (of natural interest) that is not computable. Theorem 1.5 (undecidability of the halting problem): The halting function is not computable. The term undecidability means that the corresponding decision problem cannot be solved by an algorithm. That is, Theorem 1.5 asserts that the decision problem associated with the set h−1 (1) = {(M, x) : h(M, x) = 1} is not solvable by an algorithm (i.e., there exists no algorithm that, given a pair (M, x), decides whether or not M halts on input x). Actually, the following proof shows that there exists no algorithm that, given M, decides whether or not M halts on input M. Proof: We will show that even the restriction of h to its “diagonal” (i.e., the function def d(M) = h(M, M)) is not computable. Note that the value of d(M) refers to the question of what happens when we feed M with its own description, which is indeed a “nasty” (but legitimate) thing to do. We will actually do something “worse”: toward the contradiction, we will consider the value of d when evaluated at a (machine that is related to a) hypothetical machine that supposedly computes d. We start by considering a related function, d , and showing that this function is uncomputable. The function d is defined on purpose so as to foil any attempt
27
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
to compute it; that is, for every machine M, the value d (M) is defined to differ from M(M). Specifically, the function d : {0, 1}∗ → {0, 1} is defined such that def d (M) = 1 if and only if M halts on input M with output 0. (That is, d (M) = 0 if either M does not halt on input M or its output does not equal the value 0.) Now, suppose, toward the contradiction, that d is computable by some machine, denoted Md . Note that machine Md is supposed to halt on every input, and so Md halts on input Md . But, by definition of d , it holds that d (Md ) = 1 if and only if Md halts on input Md with output 0 (i.e., if and only if Md (Md ) = 0). Thus, Md (Md ) = d (Md ) in contradiction to the hypothesis that Md computes d . We next prove that d is uncomputable, and thus h is uncomputable (because d(z) = h(z, z) for every z). To prove that d is uncomputable, we show that if d is computable then so is d (which we already know not to be the case). Indeed, suppose toward the contradiction that A is an algorithm for computing d (i.e., A(M) = d(M) for every machine M). Then we construct an algorithm for computing d , which given M , invokes A on M , where M is defined to operate as follows: 1. On input x, machine M emulates M on input x. 2. If M halts on input x with output 0 then M halts. 3. If M halts on input x with an output different from 0 then M enters an infinite loop (and thus does not halt). 4. Otherwise (i.e., M does not halt on input x), then machine M does not halt (because it just stays stuck in Step 1 forever). Note that the mapping from M to M is easily computable (by augmenting M with instructions to test its output and enter an infinite loop if necessary), and that d(M ) = d (M ), because M halts on x if and only if M halts on x with output 0. We thus derived an algorithm for computing d (i.e., transform the input M into M and output A(M )), which contradicts the already established fact by which d is uncomputable. Turingreductions. The core of the second part of the proof of Theorem 1.5 is an algorithm that solves one problem (i.e., computes d ) by using as a subroutine an algorithm that solves another problem (i.e., computes d (or h)). In fact, the first algorithm is actually an algorithmic scheme that refers to a “functionally specified” subroutine rather than to an actual (implementation of such a) subroutine, which may not exist. Such an algorithmic scheme is called a Turingreduction (see formulation in §1.2.3.6). Hence, we have Turingreduced the computation of d to the computation of d, which in turn Turingreduces to h. The “natural” (“positive”) meaning of a Turingreduction of f to f is that, when given an algorithm for computing f , we obtain an algorithm for computing f . In contrast, the proof of Theorem 1.5 uses the “unnatural” (“negative”) counterpositive: If (as we know) there exists no algorithm for computing f = d then there exists no algorithm for computing f = d (which is what we wanted to prove). Jumping ahead, we mention that resourcebounded Turingreductions (e.g., polynomialtime reductions) play a central role in Complexity Theory itself, and again they are used mostly in a “negative” way. We will define such reductions and extensively use them in subsequent chapters.
28
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
Rice’s Theorem. The undecidability of the halting problem (or rather the fact that the function d is uncomputable) is a special case of a more general phenomenon: Every nontrivial decision problem regarding the function computed by a given Turing machine has no algorithmic solution. We state this fact next, clarifying the definition of the aforementioned class of problems. (Again, we refer to Turing machines that may not halt on all inputs.) Theorem 1.6 (Rice’s Theorem): Let F be any nontrivial subset13 of the set of all computable partial functions, and let SF be the set of strings that describe machines that compute functions in F. Then deciding membership in SF cannot be solved by an algorithm. Theorem 1.6 can be proved by a Turingreduction from d. We do not provide a proof because this is too remote from the main subject matter of the book. We stress that Theorems 1.5 and 1.6 hold for any reasonable model of computation (referring both to the potential solvers and to the machines the description of which is given as input to these solvers). Thus, Theorem 1.6 means that no algorithm can determine any nontrivial property of the function computed by a given computer program (written in any programming language). For example, no algorithm can determine whether or not a given computer program halts on each possible input. The relevance of this assertion to the project of program verification is obvious. The Post Correspondence Problem. We mention that undecidability arises also outside of the domain of questions regarding computing devices (given as input). Specifically, we consider the Post Correspondence Problem in which the input consists of two sequences of strings, (α1 , . . . , αk ) and (β1 , . . . , βk ), and the question is whether or not there exists a sequence of indices i 1 , . . . , i ∈ {1, . . . , k} such that αi1 · · · αi = βi1 · · · βi . (We stress that the length of this sequence is not a priori bounded.)14 Theorem 1.7: The Post Correspondence Problem is undecidable. Again, the omitted proof is by a Turingreduction from d (or h).15
1.2.3.4. Universal Algorithms So far we have used the postulate that, in any reasonable model of computation, each machine (or computation rule) has a finite description. Furthermore, we also used the fact that such model should allow for the easy modification of such descriptions such that the resulting machine computes an easily related function (see the proof of Theorem 1.5). Here we go one step further and postulate that the description of machines (in this model) is “effective” in the following natural sense: There exists an algorithm that, given a description of a machine (resp., computation rule) and a corresponding environment, determines the environment that results from performing a single step of this machine on 13
The set S is called a nontrivial subset of U if both S and U \ S are nonempty. Clearly, if F is a trivial set of computable functions then the corresponding decision problem can be solved by a “trivial” algorithm that outputs the corresponding constant bit. 14 In contrast, the existence of an adequate sequence of a specified length can be determined in time that is exponential in this length. 15 We mention that the reduction maps an instance (M, x) of h to a pair of sequences ((α1 , . . . , αk ), (β1 , . . . , βk )) such that only α1 and β1 depend on x, whereas k as well as the other strings depend only on M.
29
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
this environment (resp., the effect of a single application of the computation rule). This algorithm can, in turn, be implemented in the said model of computation (assuming this model is general; see the ChurchTuring Thesis). Successive applications of this algorithm leads to the notion of a universal machine, which (for concreteness) is formulated next in terms of Turing machines. Definition 1.8 (universal machines): A universal Turing machine is a Turing machine that on input a description of a machine M and an input x returns the value of M(x) if M halts on x and otherwise does not halt. That is, a universal Turing machine computes the partial function u on pairs (M, x) such that M halts on input x, in which case it holds that u(M, x) = M(x). That is, u(M, x) = M(x) if M halts on input x and u is undefined on (M, x) otherwise. We note that if M halts on all possible inputs then u(M, x) is defined for every x. We stress that the mere fact that we have defined something (i.e., a universal Turing machine) does not mean that it exists. Yet, as hinted in the foregoing discussion and obvious to anyone who has written a computer program (and thought about what he/she was doing), universal Turing machines do exist. Theorem 1.9: There exists a universal Turing machine. Theorem 1.9 asserts that the partial function u is computable. In contrast, it can be shown that any extension of u to a total function is uncomputable. That is, for any total function uˆ that agrees with the partial function u on all the inputs on which the latter is defined, it holds that uˆ is uncomputable.16 Proof: Given a pair (M, x), we just emulate the computation of machine M on input x. This emulation is straightforward, because (by the effectiveness of the description of M) we can iteratively determine the next instantaneous configuration of the computation of M on input x. If the said computation halts then we will obtain its output and can output it (and so, on input (M, x), our algorithm returns M(x)). Otherwise, we turn out emulating an infinite computation, which means that our algorithm does not halt on input (M, x). Thus, the foregoing emulation procedure constitutes a universal machine (i.e., yields an algorithm for computing u). As hinted already, the existence of universal machines is the fundamental fact underlying the paradigm of generalpurpose computers. Indeed, a specific Turing machine (or algorithm) is a device that solves a specific problem. A priori, solving each problem would have required building a new physical device, that allows for this problem to be solved in the physical world (rather than as a thought experiment). The existence of a universal machine asserts that it is enough to build one physical device, that is, a general purpose computer. Any specific problem can then be solved by writing a corresponding program 16
The claim is easy to prove for the total function uˆ that extends u and assigns the special symbol ⊥ to inputs on def def which u is undefined (i.e., uˆ (M, x) = ⊥ if u is not defined on (M, x) and uˆ (M, x) = u(M, x) otherwise). In this case h(M, x) = 1 if and only if u(M, ˆ x) = ⊥, and so the halting function h is Turingreducible to uˆ . In the general case, we may adapt the proof of Theorem 1.5 by using the fact that, for a machine M that halts on every input, it holds that uˆ (M, x) = u(M, x) for every x (and in particular for x = M).
30
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
to be executed (or emulated) by the generalpurpose computer. Thus, universal machines correspond to generalpurpose computers, and provide the basis for separating hardware from software. In other words, the existence of universal machines says that software can be viewed as (part of the) input. In addition to their practical importance, the existence of universal machines (and their variants) has important consequences in the theories of computability and Computational Complexity. To demonstrate the point, we note that Theorem 1.6 implies that many questions about the behavior of a fixed universal machine on certain input types are undecidable. For example, it follows that, for some fixed machines (i.e., universal ones), there is no algorithm that determines whether or not the (fixed) machine halts on a given input. Revisiting the proof of Theorem 1.7 (see footnote 15), it follows that the Post Correspondence Problem remains undecidable even if the input sequences are restricted to have a specific length (i.e., k is fixed). A more important application of universal machines to the theory of computability follows. A detour: Kolmogorov Complexity. The existence of universal machines, which may be viewed as universal languages for writing effective and succinct descriptions of objects, plays a central role in Kolmogorov Complexity. Loosely speaking, the latter theory is concerned with the length of (effective) descriptions of objects, and views the minimum such length as the inherent “complexity” of the object; that is, “simple” objects (or phenomena) are those having short description (resp., short explanation), whereas “complex” objects have no short description. Needless to say, these (effective) descriptions have to refer to some fixed “language” (i.e., to a fixed machine that, given a succinct description of an object, produces its explicit description). Fixing any machine M, a string x is called a description of s with respect to M if M(x) = s. The complexity of s with respect to M, denoted K M (s), is the length of the shortest description of s with respect to M. Certainly, we want to fix M such that every string has a description with respect to M, and furthermore such that this description is not “significantly” longer than the description with respect to a different machine M . The following theorem makes it natural to use a universal machine as the “point of reference” (i.e., as the aforementioned M). Theorem 1.10 (complexity wrt a universal machine): Let U be a universal machine. Then, for every machine M , there exists a constant c such that K U (s) ≤ K M (s) + c for every string s. The theorem follows by (setting c = O(M ) and) observing that if x is a description of s with respect to M then (M , x) is a description of s with respect to U . Here it is important to use an adequate encoding of pairs of strings (e.g., the pair (σ1 · · · σk , τ1 · · · τ ) is encoded by the string σ1 σ1 · · · σk σk 01τ1 · · · τ ). Fixing any universal machine U , we def define the Kolmogorov Complexity of a string s as K (s) = K U (s). The reader may easily verify the following facts: 1.
K (s) ≤ s + O(1), for every s. (Hint: Apply Theorem 1.10 to a machine that computes the identity mapping.)
2. There exist infinitely many strings s such that K (s) s. (Hint: Consider s = 1n . Alternatively, consider any machine M such that M(x) x for every x.)
31
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
3. Some strings of length n have complexity at least n. Furthermore, for every n and i, {s ∈ {0, 1}n : K (s) ≤ n − i} < 2n−i+1 (Hint: Different strings must have different descriptions with respect to U .)
It can be shown that the function K is uncomputable. The proof is related to the paradox captured by the following “description” of a natural number: the largest natural number that can be described by an English sentence of up to a thousand letters. (The paradox amounts to observing that if the above number is well defined then so is the integer successor of the largest natural number that can be described by an English sentence of up to a thousand letters.) Needless to say, the foregoing sentences presuppose that any
English sentence is a legitimate description in some adequate sense (e.g., in the sense captured by Kolmogorov Complexity). Specifically, the foregoing sentences presuppose that we can determine the Kolmogorov Complexity of each natural number, and furthermore that we can effectively produce the largest number that has Kolmogorov Complexity not exceeding some threshold. Indeed, the paradox suggests a proof of the fact that the latter task cannot be performed; that is, there exists no algorithm that given t produces the lexicographically last string s such that K (s) ≤ t, because if such an algorithm A would have existed then K (s) ≤ O(A) + log t and K (s0) < K (s) + O(1) < t in contradiction to the definition of s.
1.2.3.5. Time and Space Complexity Fixing a model of computation (e.g., Turing machines) and focusing on algorithms that halt on each input, we consider the number of steps (i.e., applications of the computation rule) taken by the algorithm on each possible input. The latter function is called the time complexity of the algorithm (or machine); that is, t A : {0, 1}∗ → N is called the time complexity of algorithm A if, for every x, on input x algorithm A halts after exactly t A (x) steps. We will be mostly interested in the dependence of the time complexity on the input length, when taking the maximum over all inputs of the relevant length. That is, for t A as def in the foregoing, we will consider T A : N → N defined by T A (n) = maxx∈{0,1}n {t A (x)}. Abusing terminology, we sometimes refer to T A as the time complexity of A. The time complexity of a problem. As stated in the preface and in the introduction, typically Complexity Theory is not concerned with the (time) complexity of a specific algorithm. It is rather concerned with the (time) complexity of a problem, assuming that this problem is solvable at all (by some algorithm). Intuitively, the time complexity of such a problem is defined as the time complexity of the fastest algorithm that solves this problem (assuming that the latter term is well defined).17 Actually, we shall be interested in upper and lower bounds on the (time) complexity of algorithms that solve the problem. Thus, when we say that a certain problem has complexity T , we actually mean that has complexity at most T . Likewise, when we say that requires time T , we actually mean that has time complexity at least T . 17
Advanced comment: As we shall see in Section 4.2.2 (cf. Theorem 4.8), the naive assumption that a “fastest algorithm” for solving a problem exists is not always justified. On the other hand, the assumption is essentially justified in some important cases (see, e.g., Theorem 2.33). But even in these cases the said algorithm is “fastest” (or “optimal”) only up to a constant factor.
32
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
Recall that the foregoing discussion refers to some fixed model of computation. Indeed, the complexity of a problem may depend on the specific model of computation in which algorithms that solve are implemented. The following CobhamEdmonds Thesis asserts that the variation (in the time complexity) is not too big, and in particular is irrelevant to much of the current focus of Complexity Theory (e.g., for the PvsNP Question). The CobhamEdmonds Thesis. As just stated, the time complexity of a problem may depend on the model of computation. For example, deciding membership in the set {x x : x ∈ {0, 1}∗ } can be done in linear time on a twotape Turing machine, but requires quadratic time on a singletape Turing machine.18 On the other hand, any problem that has time complexity t in the model of multitape Turing machines has complexity O(t 2 ) in the model of singletape Turing machines. The CobhamEdmonds Thesis asserts that the time complexities in any two “reasonable and general” models of computation are polynomially related. That is, a problem has time complexity t in some “reasonable and general” model of computation if and only if it has time complexity poly(t) in the model of (singletape) Turing machines. Indeed, the CobhamEdmonds Thesis strengthens the ChurchTuring Thesis. It asserts not only that the class of solvable problems is invariant as far as “reasonable and general” models of computation are concerned, but also that the time complexity (of the solvable problems) in such models is polynomially related. Efficient algorithms. As hinted in the foregoing discussions, much of Complexity Theory is concerned with efficient algorithms. The latter are defined as polynomialtime algorithms (i.e., algorithms that have time complexity that is upperbounded by a polynomial in the length of the input). By the CobhamEdmonds Thesis, the definition of this class is invariant under the choice of a “reasonable and general” model of computation. The association of efficient algorithms with polynomialtime computation is grounded in the following two considerations: • Philosophical consideration: Intuitively, efficient algorithms are those that can be implemented within a number of steps that is a moderately growing function of the input length. To allow for reading the entire input, at least linear time should be allowed. On the other hand, apparently slow algorithms and in particular “exhaustive search” algorithms, which take exponential time, must be avoided. Furthermore, a good definition of the class of efficient algorithms should be closed under natural compositions of algorithms (as well as be robust with respect to reasonable models of computation and with respect to simple changes in the encoding of problems’ instances). Choosing polynomials as the set of time bounds for efficient algorithms satisfies all the foregoing requirements: Polynomials constitute a “closed” set of moderately growing functions, where “closure” means closure under addition, multiplication, and functional composition. These closure properties guarantee the closure of the class of 18
Proving the latter fact is quite nontrivial. One proof is by a “reduction” from a communication complexity problem [148, Sec. 12.2]. Intuitively, a singletape Turing machine that decides membership in the aforementioned set can be viewed as a channel of communication between the two parts of the input. Focusing our attention on inputs of the form y0n z0n , for y, z ∈ {0, 1}n , each time the machine passes from the first part to the second part it carries O(1) bits of information (in its internal state) while making at least n steps. The proof is completed by invoking the linear lower bound on the communication complexity of the (twoargument) identity function (i.e., id(y, z) = 1 if y = z and id(y, z) = 0 otherwise, cf. [148, Chap. 1]).
33
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
efficient algorithms under natural compositions of algorithms (as well as its robustness with respect to any reasonable and general model of computation). Furthermore, polynomialtime algorithms can conduct computations that are apparently simple (although not necessarily trivial), and on the other hand they do not include algorithms that are apparently inefficient (like exhaustive search). • Empirical consideration: It is clear that algorithms that are considered efficient in practice have a running time that is bounded by a small polynomial (at least on the inputs that occur in practice). The question is whether any polynomial time algorithm can be considered efficient in an intuitive sense. The belief, which is supported by past experience, is that every natural problem that can be solved in polynomial time also has a “reasonably efficient” algorithm. We stress that the association of efficient algorithms with polynomialtime computation is not essential to most of the notions, results, and questions of Complexity Theory. Any other class of algorithms that supports the aforementioned closure properties and allows for conducting some simple computations but not overly complex ones gives rise to a similar theory, albeit the formulation of such a theory may be more complicated. Specifically, all results and questions treated in this book are concerned with the relation among the complexities of different computational tasks (rather than with providing absolute assertions about the complexity of some computational tasks). These relations can be stated explicitly, by stating how any upper bound on the time complexity of one task gets translated to an upper bound on the time complexity of another task.19 Such cumbersome statements will maintain the contents of the standard statements; they will merely be much more complicated. Thus, we follow the tradition of focusing on polynomialtime computations, while stressing that this focus both is natural and provides the simplest way of addressing the fundamental issues underlying the nature of efficient computation. Universal machines, revisited. The notion of time complexity gives rise to a timebounded version of the universal function u (presented in §1.2.3.4). Specifically, we def define u (M, x, t) = y if on input x machine M halts within t steps and outputs the def string y, and u (M, x, t) = ⊥ if on input x machine M makes more than t steps. Unlike u, the function u is a total function. Furthermore, unlike any extension of u to a total function, the function u is computable. Moreover, u is computable by a machine U that, on input X = (M, x, t), halts after poly(X ) steps. Indeed, machine U is a variant of a universal machine (i.e., on input X , machine U merely emulates M for t steps rather than emulating M till it halts (and potentially indefinitely)). Note that the number of steps taken by U depends on the specific model of computation (and that some overhead is unavoidable because emulating each step of M requires reading the relevant portion of the description of M). Space complexity. Another natural measure of the “complexity” of an algorithm (or a task) is the amount of memory consumed by the computation. We refer to the memory 19
For example, the NPcompleteness of SAT (cf. Theorem 2.22) implies that any algorithm solving SAT in time T yields an algorithm that factors composite numbers in time T such that T (n) = poly(n) · (1 + T (poly(n))). (More generally, if the correctness of solutions for nbit instances of some search problem R can be verified in time t(n) then the hypothesis regarding SAT implies that solutions (for nbit instances of R) can be found in time T such that T (n) = t(n) · (1 + T (O(t(n))2 )).)
34
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
used for storing some intermediate results of the computation. Since much of our focus will be on using memory that is sublinear in the input length, it is important to use a model in which one can differentiate memory used for computation from memory used for storing the initial input or the final output. In the context of Turing machines, this is done by considering multitape Turing machines such that the input is presented on a special readonly tape (called the input tape), the output is written on a special writeonly tape (called the output tape), and intermediate results are stored on a worktape. Thus, the input and output tapes cannot be used for storing intermediate results. The space complexity of such a machine M is defined as a function s M such that s M (x) is the number of cells of the worktape that are scanned by M on input x. As in the case of time complexity, we def will usually refer to S A (n) = maxx∈{0,1}n {s A (x)}.
1.2.3.6. Oracle Machines The notion of Turingreductions, which was discussed in §1.2.3.3, is captured by the following definition of socalled oracle machines. Loosely speaking, an oracle machine is a machine that is augmented such that it may pose questions to the outside. We consider the case in which these questions, called queries, are answered consistently by some function f : {0, 1}∗ → {0, 1}∗ , called the oracle. That is, if the machine makes a query q then the answer it obtains is f (q). In such a case, we say that the oracle machine is given access to the oracle f . For an oracle machine M, a string x and a function f , we denote by M f (x) the output of M on input x when given access to the oracle f . (Reexamining the second part of the proof of Theorem 1.5, observe that we have actually described an oracle machine that computes d when given access to the oracle d.) The notion of an oracle machine extends the notion of a standard computing device (machine), and thus a rigorous formulation of the former extends a formal model of the latter. Specifically, extending the model of Turing machines, we derive the following model of oracle Turing machines. Definition 1.11 (using an oracle): • An oracle machine is a Turing machine with a special additional tape, called the oracle tape, and two special states, called oracle invocation and oracle spoke. • The computation of the oracle machine M on input x and access to the oracle f : {0, 1}∗ → {0, 1}∗ is defined based on the successive configuration function. For configurations with a state different from oracle invocation the next configuration is defined as usual. Let γ be a configuration in which the machine’s state is oracle invocation and suppose that the actual contents of the oracle tape is q (i.e., q is the contents of the maximal prefix of the tape that holds bit values).20 Then, the configuration following γ is identical to γ , except that the state is oracle spoke, and the actual contents of the oracle tape is f (q). The string q is called M’s query and f (q) is called the oracle’s reply. • The output of the oracle machine M on input x when given oracle access to f is denoted M f (x). 20
This fits the definition of the actual initial contents of a tape of a Turing machine (cf. §1.2.3.2). A common convention is that the oracle can be invoked only when the machine’s head resides at the leftmost cell of the oracle tape. We comment that, in the context of space complexity, one uses two oracle tapes: a writeonly tape for the query and a readonly tape for the answer.
35
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
We stress that the running time of an oracle machine is the number of steps made during its (own) computation, and that the oracle’s reply on each query is obtained in a single step.
1.2.3.7. Restricted Models We mention that restricted models of computation are often mentioned in the context of a course on computability, but they will play no role in the current book. One such model is the model of finite automata, which in some variant coincides with Turing machines that have spacecomplexity zero (equiv., constant). In our opinion, the most important motivation for the study of these restricted models of computation is that they provide simple models for some natural (or artificial) phenomena. This motivation, however, seems only remotely related to the study of the complexity of various computational tasks, which calls for the consideration of general models of computation and the evaluation of the complexity of computation with respect to such models. Teaching note: Indeed, we reject the common coupling of computability theory with the theory of automata and formal languages. Although the historical links between these two theories (at least in the West) cannot be denied, this fact cannot justify coupling two fundamentally different theories (especially when such a coupling promotes a wrong perspective on computability theory). Thus, in our opinion, the study of any of the lower levels of Chomsky’s Hierarchy [123, Chap. 9] should be decoupled from the study of computability theory (let alone the study of Complexity Theory).
1.2.4. Nonuniform Models (Circuits and Advice) Camille: Like Thelma and Louise. But . . . without the guns. Petra: Oh, well, no guns, I don’t know . . . Patricia Rozema, When Night Is Falling, 1995
The main use of nonuniform models of computation, in this book, will be as a source of some natural computational problems (cf. §2.3.3.1 and Theorem 5.4). In addition, these models will be briefly studied in Sections 3.1 and 4.1. By a nonuniform model of computation we mean a model in which for each possible input length a different computing device is considered, while there is no “uniformity” requirement relating devices that correspond to different input lengths. Furthermore, this collection of devices is infinite by nature, and (in the absence of a uniformity requirement) this collection may not even have a finite description. Nevertheless, each device in the collection has a finite description. In fact, the relationship between the size of the device (resp., the length of its description) and the length of the input that it handles will be of major concern. Nonuniform models of computation are studied either toward the development of lowerbound techniques or as simplified limits on the ability of efficient algorithms.21 21
The second case refers mainly to efficient algorithms that are given a pair of inputs (of (polynomially) related length) such that these algorithms are analyzed with respect to fixing one input (arbitrarily) and varying the other input (typically, at random). Typical examples include the context of derandomization (cf. Section 8.3) and the setting of zeroknowledge (cf. Section 9.2).
36
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
In both cases, the uniformity condition is eliminated in the interest of simplicity and with the hope (and belief) that nothing substantial is lost as far as the issues at hand are concerned. In the context of developing lower bounds, the hope is that the finiteness of all parameters (i.e., the input length and the device’s description) will allow for the application of combinatorial techniques to analyze the limitations of certain settings of parameters. We will focus on two related models of nonuniform computing devices: Boolean circuits (§1.2.4.1) and “machines that take advice” (§1.2.4.2). The former model is more adequate for the study of the evolution of computation (i.e., development of lowerbound techniques), whereas the latter is more adequate for modeling purposes (e.g., limiting the ability of efficient algorithms).
1.2.4.1. Boolean Circuits The most popular model of nonuniform computation is the one of Boolean circuits. Historically, this model was introduced for the purpose of describing the “logic operation” of reallife electronic circuits. Ironically, nowadays this model provides the stage for some of the most practically removed studies in Complexity Theory (which aim at developing methods that may eventually lead to an understanding of the inherent limitations of efficient algorithms). A Boolean circuit is a directed acyclic graph22 with labels on the vertices, to be discussed shortly. For the sake of simplicity, we disallow isolated vertices (i.e., vertices with no incoming or outgoing edges), and thus the graph’s vertices are of three types: sources, sinks, and internal vertices. 1. Internal vertices are vertices having incoming and outgoing edges (i.e., they have indegree and outdegree at least 1). In the context of Boolean circuits, internal vertices are called gates. Each gate is labeled by a Boolean operation, where the operations that are typically considered are ∧, ∨, and ¬ (corresponding to and, or, and neg). In addition, we require that gates labeled ¬ have indegree 1. The indegree of ∧gates and ∨gates may be any number greater than zero, and the same holds for the outdegree of any gate. 2. The graph sources (i.e., vertices with no incoming edges) are called input terminals. Each input terminal is labeled by a natural number (which is to be thought of as the index of an input variable). (For the sake of defining formulae (see §1.2.4.3), we allow different input terminals to be labeled by the same number.)23 3. The graph sinks (i.e., vertices with no outgoing edges) are called output terminals, and we require that they have indegree 1. Each output terminal is labeled by a natural number such that if the circuit has m output terminals then they are labeled 1, 2, . . . , m. That is, we disallow different output terminals to be labeled by the same number, and insist that the labels of the output terminals be consecutive numbers. (Indeed, the labels of the output terminals will correspond to the indices of locations in the circuit’s output.) 22
See Appendix G.1. This is not needed in the case of general circuits, because we can just feed outgoing edges of the same input terminal to many gates. Note, however, that this is not allowed in the case of formulae, where all nonsinks are required to have outdegree exactly 1. 23
37
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
1
2
neg
neg
and
and
4
3
neg
and and
0
or or
2
1
Figure 1.3: A circuit computing f (x1 , x2 , x3 , x4 ) = (x1 ⊕ x2 , x1 ∧ ¬x2 ∧ x4 ).
For the sake of simplicity, we also mandate that the labels of the input terminals be consecutive numbers.24 A Boolean circuit with n different input labels and m output terminals induces (and indeed computes) a function from {0, 1}n to {0, 1}m defined as follows. For any fixed string x ∈ {0, 1}n , we iteratively define the value of vertices in the circuit such that the input terminals are assigned the corresponding bits in x = x1 · · · xn and the values of other vertices are determined in the natural manner. That is: • An input terminal with label i ∈ {1, . . . , n} is assigned the i th bit of x (i.e., the value xi ). • If the children of a gate (of indegree d) that is labeled ∧ have values v1 , v2 , . . . , vd , d vi . The value of a gate labeled ∨ (or ¬) is then the gate is assigned the value ∧i=1 determined analogously. Indeed, the hypothesis that the circuit is acyclic implies that the following natural process of determining values for the circuit’s vertices is well defined: As long as the value of some vertex is undetermined, there exists a vertex such that its value is undetermined but the values of all its children are determined. Thus, the process can make progress, and terminates when the values of all vertices (including the output terminals) are determined. The value of the circuit on input x (i.e., the output computed by the circuit on input x) is y = y1 · · · ym , where yi is the value assigned by the foregoing process to the output terminal labeled i. We note that there exists a polynomialtime algorithm that, given a circuit C and a corresponding input x, outputs the value of C on input x. This algorithm determines the values of the circuit’s vertices, going from the circuit’s input terminals to its output terminals. 24
This convention slightly complicates the construction of circuits that ignore some of the input values. Specifically, we use artificial gadgets that have incoming edges from the corresponding input terminals, and compute an adequate constant. To avoid having this constant as an output terminal, we feed it into an auxiliary gate such that the value of the latter is determined by the other incoming edge (e.g., a constant 0 fed into an ∨gate). See an example of dealing with x3 in Figure 1.3.
38
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
We say that a family of circuits (Cn )n∈N computes a function f : {0, 1}∗ → {0, 1}∗ if for every n the circuit Cn computes the restriction of f to strings of length n. In other words, for every x ∈ {0, 1}∗ , it must hold that Cx (x) = f (x). Bounded and unbounded fanin. We will be most interested in circuits in which each gate has at most two incoming edges. In this case, the types of (twoargument) Boolean operations that we allow is immaterial (as long as we consider a “full basis” of such operations, i.e., a set of operations that can implement any other twoargument Boolean operation). Such circuits are called circuits of bounded fanin. In contrast, other studies are concerned with circuits of unbounded fanin, where each gate may have an arbitrary number of incoming edges. Needless to say, in the case of circuits of unbounded fanin, the choice of allowed Boolean operations is important and one focuses on operations that are “uniform” (across the number of operants, e.g., ∧ and ∨). Circuit size as a complexity measure. The size of a circuit is the number of its edges. When considering a family of circuits (Cn )n∈N that computes a function f : {0, 1}∗ → {0, 1}∗ , we are interested in the size of Cn as a function of n. Specifically, we say that this family has size complexity s : N → N if for every n the size of Cn is s(n). The circuit complexity of a function f , denoted s f , is the infimum of the size complexity of all families of circuits that compute f . Alternatively, for each n we may consider the size of the smallest circuit that computes the restriction of f to nbit strings (denoted f n ), and set s f (n) accordingly. We stress that nonuniformity is implicit in this definition, because no conditions are made regarding the relation between the various circuits used to compute the function on different input lengths.25 On the circuit complexity of functions. We highlight some simple facts about the circuit complexity of functions. (These facts are in clear correspondence to facts regarding Kolmogorov Complexity mentioned in §1.2.3.4.) 1. Most importantly, any Boolean function can be computed by some family of circuits, and thus the circuit complexity of any function is well defined. Furthermore, each function has at most exponential circuit complexity. (Hint: The function f n : {0, 1}n → {0, 1} can be computed by a circuit of size O(n2n ) that implements a lookup table.)
2. Some functions have polynomial circuit complexity. In particular, any function that has time complexity t (i.e., is computed by an algorithm of time complexity t) has circuit complexity poly(t). Furthermore, the corresponding circuit family is uniform (in a natural sense to be discussed in the next paragraph). (Hint: Consider a Turing machine that computes the function, and consider its computation on a generic nbit long input. The corresponding computation can be emulated by a circuit that consists of t(n) layers such that each layer represents an instantaneous configuration of the machine, and the relation between consecutive configurations is captured by (“uniform”) local gadgets in the circuit. For further details see the proof of Theorem 2.21, which presents a similar emulation.) 25
Advanced comment: We also note that, in contrast to footnote 17, the circuit model and the (circuit size) complexity measure support the notion of an optimal computing device: Each function f has a unique size complexity s f (and not merely upper and lower bounds on its complexity).
39
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
3. Almost all Boolean functions have exponential circuit complexity. Specifically, the number of functions mapping {0, 1}n to {0, 1} that can be computed by some circuit of size s is smaller than s 2s . v s (Hint: The number of circuits having v vertices and s edges is at most 2 ·
2
+ v .)
Note that the first fact implies that families of circuits can compute functions that are uncomputable by algorithms. Furthermore, this phenomenon also occurs when restricting attention to families of polynomial size circuits. See further discussion in §1.2.4.2. Uniform families. A family of polynomialsize circuits (Cn )n∈N is called uniform if given n one can construct the circuit Cn in poly(n)time. Note that if a function is computable by a uniform family of polynomialsize circuits then it is computable by a polynomialtime algorithm. This algorithm first constructs the adequate circuit (which can be done in polynomial time by the uniformity hypothesis), and then evaluates this circuit on the given input (which can be done in time that is polynomial in the size of the circuit). Note that limitations on the computing power of arbitrary families of polynomialsize circuits certainly hold for uniform families (of polynomial size), which in turn yield limitations on the computing power of polynomialtime algorithms. Thus, lower bounds on the circuit complexity of functions yield analogous lower bounds on their time complexity. Furthermore, as is often the case in mathematics and science, disposing of an auxiliary condition that is not well understood (i.e., uniformity) may turn out fruitful. Indeed, this has occured in the study of classes of restricted circuits, which is reviewed in §1.2.4.3 (and Appendix B.2).
1.2.4.2. Machines That Take Advice General (nonuniform) circuit families and uniform circuit families are two extremes with respect to the “amounts of nonuniformity” in the computing device. Intuitively, in the former, nonuniformity is only bounded by the size of the device, whereas in the latter the amounts of nonuniformity is zero. Here we consider a model that allows the decoupling of the size of the computing device from the amount of nonuniformity, which may range from zero to the device’s size. Specifically, we consider algorithms that “take a nonuniform advice” that depends only on the input length. The amount of nonuniformity will be defined as equaling the length of the corresponding advice (as a function of the input length). Definition 1.12 (taking advice): We say that algorithm A computes the function f using advice of length : N → N if there exists an infinite sequence (an )n∈N such that 1. for every x ∈ {0, 1}∗ , it holds that A(ax , x) = f (x). 2. for every n ∈ N, it holds that an  = (n). The sequence (an )n∈N is called the advice sequence. Note that any function having circuit complexity s can be computed using advice of length O(s log s), where the log factor is due to the fact that a graph with v vertices and e edges can be described by a string of length 2e log2 v. Note that the model of machines that use advice allows for some sharper bounds than the ones stated in §1.2.4.1: Every function
40
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
1.2. COMPUTATIONAL TASKS AND MODELS
can be computed using advice of length such that (n) = 2n , and some uncomputable functions can be computed using advice of length 1. Theorem 1.13 (the power of advice): There exist functions that can be computed using onebit advice but cannot be computed without advice. Proof: Starting with any uncomputable Boolean function f : N → {0, 1}, consider the function f defined as f (x) = f (x). Note that f is Turingreducible to f (e.g., on input n make any nbit query to f , and return the answer).26 Thus, f cannot be computed without advice. On the other hand, f can be easily computed by using the advice sequence (an )n∈N such that an = f (n); that is, the algorithm merely outputs the advice bit (and indeed ax = f (x) = f (x), for every x ∈ {0, 1}∗ ).
1.2.4.3. Restricted Models The model of Boolean circuits (cf. §1.2.4.1) allows for the introduction of many natural subclasses of computing devices. Following is a laconic review of a few of these subclasses. For further detail regarding the study of these subclasses, the interested reader is referred to Appendix B.2. Since we shall refer to various types of Boolean formulae in the rest of this book, we suggest not skiping the following two paragraphs. Boolean formulae. In (general) Boolean circuits the nonsink vertices are allowed arbitrary outdegree. This means that the same intermediate value can be reused without being recomputed (and while increasing the size complexity by only one unit). Such “free” reusage of intermediate values is disallowed in Boolean formula, which are formally defined as Boolean circuits in which all nonsink vertices have outdegree 1. This means that the underlying graph of a Boolean formula is a tree (see §G.2), and it can be written as a Boolean expression over Boolean variables by traversing this tree (and registering the vertices’ labels in the order traversed). Indeed, we have allowed different input terminals to be assigned the same label in order to allow formulae in which the same variable occurs multiple times. As in the case of general circuits, one is interested in the size of these restricted circuits (i.e., the size of families of formulae computing various functions). We mention that quadratic lower bounds are known for the formula size of simple functions (e.g., parity), whereas these functions have linear circuit complexity. This discrepancy is depicted in Figure 1.4. Formulae in CNF and DNF. A restricted type of Boolean formula consists of formulae that are in conjunctive normal form (CNF). Such a formula consists of a conjunction of clauses, where each clause is a disjunction of literals each being either a variable or its negation. That is, such formulae are represented by layered circuits of unbounded fanin in which the first layer consists of neggates that compute the negation of input variables, the second layer consists of orgates that compute the logicalor of subsets of inputs and negated inputs, and the third layer consists of a single andgate that computes the logicaland of the values computed in the second layer. Note that each Boolean function can be computed by a family of CNF formula of exponential size, and that the size of 26
Indeed, this Turingreduction is not efficient (i.e., it runs in exponential time in n = log2 n), but this is immaterial in the current context.
41
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION AND PRELIMINARIES
PARITY
PARITY
of x .... x 1
of x
n
...x
n+1
neg
neg
2n
PARITY
PARITY
of x .... x 1
of x
n
PARITY
...x
n+1
PARITY
of x .... x
2n
1
of x
n
...x
n+1
neg
2n
neg
and
and
and
and
or
or
Figure 1.4: Recursive construction of parity circuits and formulae.
CNF formulae may be exponentially larger than the size of ordinary formulae computing the same function (e.g., parity). For a constant k, a formula is said to be in k CNF if its CNF has disjunctions of size at most k. An analogous restricted type of Boolean formula refers to formulae that are in disjunctive normal form (DNF). Such a formula consists of a disjunction of conjunctions of literals, and when each conjunction has at most k literals we say that the formula is in k DNF. Constantdepth circuits. Circuits have a “natural structure” (i.e., their structure as graphs). One natural parameter regarding this structure is the depth of a circuit, which is defined as the longest directed path from any source to any sink. Of special interest are constantdepth circuits of unbounded fanin. We mention that subexponential lower bounds are known for the size of such circuits that compute a simple function (e.g., parity). Monotone circuits. The circuit model also allows for the consideration of monotone computing devices: A monotone circuit is one having only monotone gates (e.g., gates computing ∧ and ∨, but no negation gates (i.e., ¬gates)). Needless to say, monotone circuits can only compute monotone functions, where a function f : {0, 1}n → {0, 1} is called monotone if for any x y it holds that f (x) ≤ f (y) (where x1 · · · xn y1 · · · yn if and only if for every bit position i it holds that xi ≤ yi ). One natural question is whether, as far as monotone functions are concerned, there is a substantial loss in using only monotone circuits. The answer is yes: There exist monotone functions that have polynomial circuit complexity but require subexponential size monotone circuits.
1.2.5. Complexity Classes Complexity classes are sets of computational problems. Typically, such classes are defined by fixing three parameters: 1. A type of computational problems (see Section 1.2.2). Indeed, most classes refer to decision problems, but classes of search problems, promise problems, and other types of problems will also be considered. 2. A model of computation, which may be either uniform (see Section 1.2.3) or nonuniform (see Section 1.2.4).
42
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER NOTES
3. A complexity measure and a limiting function (or a set of functions), which put together limit the class of computations of the previous item; that is, we refer to the class of computations that have complexity not exceeding the specified function (or set of functions). For example, in §1.2.3.5, we mentioned time complexity and space complexity, which apply to any uniform model of computation. We also mentioned polynomialtime computations, which are computations in which the time complexity (as a function) does not exceed some polynomial (i.e., a member of the set of polynomial functions). The most common complexity classes refer to decision problems, and are sometimes defined as classes of sets rather than classes of the corresponding decision problems. That is, one often says that a set S ⊆ {0, 1}∗ is in the class C rather than saying that the problem of deciding membership in S is in the class C. Likewise, one talks of classes of relations rather than classes of the corresponding search problems (i.e., saying that R ⊆ {0, 1}∗ × {0, 1}∗ is in the class C means that the search problem of R is in the class C).
Chapter Notes It is quite remarkable that the theories of uniform and nonuniform computational devices have emerged in two single papers. We refer to Turing’s paper [225], which introduced the model of Turing machines, and to Shannon’s paper [203], which introduced Boolean circuits. In addition to introducing the Turing machine model and arguing that it corresponds to the intuitive notion of computability, Turing’s paper [225] introduces universal machines and contains proofs of undecidability (e.g., of the Halting Problem). The ChurchTuring Thesis is attributed to the works of Church [55] and Turing [225]. In both works, this thesis is invoked for claiming that the fact that Turing machines cannot solve some problem implies that this problem cannot be solved in any “reasonable” model of computation. The RAM model is attributed to von Neumann’s report [234]. The association of efficient computation with polynomialtime algorithms is attributed to the papers of Cobham [57] and Edmonds [70]. It is interesting to note that Cobham’s starting point was his desire to present a philosophically sound concept of efficient algorithms, whereas Edmonds’s starting point was his desire to articulate why certain algorithms are “good” in practice. Rice’s Theorem is proven in [192], and the undecidability of the Post Correspondence Problem is proven in [181]. The formulation of machines that take advice (as well as the equivalence to the circuit model) originates in [139].
43
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER TWO
P, NP, and NPCompleteness
For as much as many have taken in hand to set forth in order a declaration of those things which are most surely believed among us; Even as they delivered them unto us, who from the beginning were eyewitnesses, and ministers of the word; It seems good to me also, having had perfect understanding of all things from the very first, to write unto thee in order, most excellent Theophilus; That thou mightest know the certainty of those things, wherein thou hast been instructed. Luke, 1:1–4 The main focus of this chapter is the PvsNP Question and the theory of NPcompleteness. Additional topics covered in this chapter include the general notion of a polynomialtime reduction (with a special emphasis on selfreducibility), the existence of problems in NP that are neither NPcomplete nor in P, the class coNP, optimal search algorithms, and promise problems. Summary: Loosely speaking, the PvsNP Question refers to search problems for which the correctness of solutions can be efficiently checked (i.e., if there is an efficient algorithm that given a solution to a given instance determines whether or not the solution is correct). Such search problems correspond to the class NP, and the question is whether or not all these search problems can be solved efficiently (i.e., if there is an efficient algorithm that given an instance finds a correct solution). Thus, the PvsNP Question can be phrased as asking whether or not finding solutions is harder than checking the correctness of solutions. An alternative formulation, in terms of decision problems, refers to assertions that have efficiently verifiable proofs (of relatively short length). Such sets of assertions correspond to the class NP, and the question is whether or not proofs for such assertions can be found efficiently (i.e., if there is an efficient algorithm that given an assertion determines its validity and/or finds a proof for its validity). Thus, the PvsNP Question can be phrased as asking whether or not discovering proofs is harder than verifying their correctness, that is, if proving is harder than verifying (or if proofs are valuable at all). Indeed, it is widely believed that the answer to the two equivalent formulations is that finding (resp., discovering) is harder than checking (resp.,
44
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
verifying), that is, that P is different than NP. The fact that this natural conjecture is unsettled seems to be one of the big sources of frustration of Complexity Theory. The author’s opinion, however, is that this feeling of frustration is out of place. In any case, at present, when faced with a hard problem in NP, we cannot expect to prove that the problem is not in P (unconditionally). The best we can expect is a conditional proof that the said problem is not in P, based on the assumption that NP is different from P. The contrapositive is proving that if the said problem is in P, then so is any problem in NP (i.e., NP equals P). This is where the theory of NPcompleteness comes into the picture. The theory of NPcompleteness is based on the notion of a reduction, which is a relation between computational problems. Loosely speaking, one computational problem is reducible to another problem if it is possible to efficiently solve the former when provided with an (efficient) algorithm for solving the latter. Thus, the first problem is not harder to solve than the second one. A problem (in NP) is NPcomplete if any problem in NP is reducible to it. Thus, the fate of the entire class NP (with respect to inclusion in P) rests with each individual NPcomplete problem. In particular, showing that a problem is NPcomplete implies that this problem is not in P unless NP equals P. Amazingly enough, NPcomplete problems exist, and furthermore, hundreds of natural computational problems arising in many different areas of mathematics and science are NPcomplete. We stress that NPcomplete problems are not the only hard problems in NP (i.e., if P is different than NP then NP contains problems that are neither NPcomplete nor in P). We also note that the PvsNP Question is not about inventing sophisticated algorithms or ruling out their existence, but rather boils down to the analysis of a single known algorithm; that is, we will present an optimal search algorithm for any problem in NP, while having no clue about its time complexity. Teaching note: Indeed, we suggest presenting the PvsNP Question both in terms of search problems and in terms of decision problems. Furthermore, in the latter case, we suggest introducing NP by explicitly referring to the terminology of proof systems. As for the theory of NPcompleteness, we suggest emphasizing the mere existence of NPcomplete problems.
Prerequisites. We assume familiarity with the notions of search and decision problems (see Section 1.2.2), algorithms (see Section 1.2.3), and their time complexity (see §1.2.3.5). We will also refer to the notion of an oracle machine (see §1.2.3.6). Organization. In Section 2.1 we present the two formulations of the PvsNP Question. The general notion of a reduction is presented in Section 2.2, where we highlight its applicability outside the domain of NPcompleteness. Section 2.3 is devoted to the theory of NPcompleteness, whereas Section 2.4 treats three relatively advanced topics (i.e., the framework of promise problems, the existence of optimal search algorithms for NP, and the class coNP).
45
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Teaching note: This chapter has more teaching notes than any other chapter in the book. This reflects the author’s concern regarding the way in which this fundamental material is often taught. Specifically, it is the author’s impression that the material covered in this chapter is often taught in wrong ways, which fail to communicate its fundamental nature.
2.1. The P Versus NP Question Our daily experience is that it is harder to solve a problem than it is to check the correctness of a solution. Is this experience merely a coincidence or does it represent a fundamental fact of life (or a property of the world)? This is the essence of the P versus NP Question, where P represents search problems that are efficiently solvable and NP represents search problems for which solutions can be efficiently checked. Another natural question captured by the P versus NP Question is whether proving theorems is harder that verifying the validity of these proofs. In other words, the question is whether deciding membership in a set is harder than being convinced of this membership by an adequate proof. In this case, P represents decision problems that are efficiently solvable, whereas NP represents sets that have efficiently checkable proofs of membership. These two meanings of the P versus NP Question are rigorously presented and discussed in Sections 2.1.1 and 2.1.2, respectively. The equivalence of the two formulations is shown in Section 2.1.3, and the common belief that P is different from NP is further discussed in Section 2.1.6. We start by recalling the notion of efficient computation. Teaching note: Most students have heard of P and NP before, but we suspect that many of them have not obtained a good explanation of what the PvsNP Question actually represents. This unfortunate situation is due to using the standard technical definition of NP (which refers to the fictitious and confusing device called a nondeterministic polynomialtime machine). Instead, we advocate the use of the more cumbersome definitions, sketched in the foregoing paragraphs (and elaborated in Sections 2.1.1 and 2.1.2), which clearly capture the fundamental nature of NP.
The notion of efficient computation. Recall that we associate efficient computation with polynomialtime algorithms.1 This association is justified by the fact that polynomials are a class of moderately growing functions that is closed under operations that correspond to natural composition of algorithms. Furthermore, the class of polynomialtime algorithms is independent of the specific model of computation, as long as the latter is “reasonable” (cf. the CobhamEdmonds Thesis). Both issues are discussed in §1.2.3.5. Advanced note on the representation of problem instances. As noted in §1.2.2.3, many natural (search and decision) problems are captured more naturally by the terminology of promise problems (cf. Section 2.4.1), where the domain of possible instances is a subset of {0, 1}∗ rather than {0, 1}∗ itself. For example, computational problems in graph theory presume some simple encoding of graphs as strings, but this encoding is typically not 1
Advanced comment: In this chapter, we consider deterministic (polynomialtime) algorithms as the basic model of efficient computation. A more liberal view, which also includes probabilistic (polynomialtime) algorithms, is presented in Chapter 6. We stress that the most important facts and questions that are addressed in the current chapter hold also with respect to probabilistic polynomialtime algorithms.
46
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.1. THE P VERSUS NP QUESTION
onto (i.e., not all strings encode graphs), and thus not all strings are legitimate instances. However, in these cases, the set of legitimate instances (e.g., encodings of graphs) is efficiently recognizable (i.e., membership in it can be decided in polynomial time). Thus, artificially extending the set of instances to the set of all possible strings (and allowing trivial solutions for the corresponding dummy instances) does not change the complexity of the original problem. We further discuss this issue in Section 2.4.1.
2.1.1. The Search Version: Finding Versus Checking Teaching note: Complexity theorists are so accustomed to focusing on decision problems that they seem to forget that search problems are at least as natural as decision problems. Furthermore, to many nonexperts, search problems may seem even more natural than decision problems: Typically, people seek solutions more than they pause to wonder whether or not solutions exist. Thus, we recommend starting with a formulation of the PvsNP Question in terms of search problems. Admittedly, the cost is more cumbersome formulations, but it is more than worthwhile.
Teaching note: In order to reflect the importance of the search version as well as allow less cumbersome formulations, we chose to introduce notations for the two search classes corresponding to P and NP: These classes are denoted PF and PC (standing for Polynomialtime Find and Polynomialtime Check, respectively). The teacher may prefer using notations and terms that are more evocative of P and NP (such as Psearch and NPsearch), and actually we also do so in some motivational discussions (especially in advanced chapters of this book). (Still, in our opinion, in the long run, the students and the field may be served better by using standardlooking notations.)
Much of computer science is concerned with solving various search problems (as in Definition 1.1). Examples of such problems include finding a solution to a system of linear (or polynomial) equations, finding a prime factor of a given integer, finding a spanning tree in a graph, finding a short traveling salesman tour in a metric space, and finding a scheduling of jobs to machines such that various constraints are satisfied. Furthermore, search problems correspond to the daily notion of “solving problems” and thus are of natural general interest. In the current section, we will consider the question of which search problems can be solved efficiently. One type of search problems that cannot be solved efficiently consists of search problems for which the solutions are too long in terms of the problem’s instances. In such a case, merely typing the solution amounts to an activity that is deemed inefficient. Thus, we focus our attention on search problems that are not in this class. That is, we consider only search problems in which the length of the solution is bounded by a polynomial in the length of the instance. Recalling that search problems are associated with binary relations (see Definition 1.1), we focus our attention on polynomially bounded relations. Definition 2.1 (polynomially bounded relations): We say that R ⊆ {0, 1}∗ × {0, 1}∗ is polynomially bounded if there exists a polynomial p such that for every (x, y) ∈ R it holds that y ≤ p(x).
47
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Recall that (x, y) ∈ R means that y is a solution to the problem instance x, where R represents the problem itself. For example, in the case of finding a prime factor of a given integer, we refer to a relation R such that (x, y) ∈ R if the integer y is a prime factor of the integer x. For a polynomially bounded relation R it makes sense to ask whether or not, given a problem instance x, one can efficiently find an adequate solution y (i.e., find y such that (x, y) ∈ R). The polynomial bound on the length of the solution (i.e., y) guarantees that a negative answer is not merely due to the length of the required solution.
2.1.1.1. The Class P as a Natural Class of Search Problems Recall that we are interested in the class of search problems that can be solved efficiently, that is, problems for which solutions (whenever they exist) can be found efficiently. Restricting our attention to polynomially bounded relations, we identify the corresponding fundamental class of search problem (or binary relation), denoted PF (standing for “Polynomialtime Find”). (The relationship between PF and the standard definition of P will be discussed in Sections 2.1.3 and 2.2.3.) The following definition refers to the formulation of solving search problems provided in Definition 1.1. Definition 2.2 (efficiently solvable search problems): • The search problem of a polynomially bounded relation R ⊆ {0, 1}∗ × {0, 1}∗ is efficiently solvable if there exists a polynomial time algorithm A such that, def for every x, it holds that A(x) ∈ R(x) = {y : (x, y) ∈ R} if and only if R(x) is not empty. Furthermore, if R(x) = ∅ then A(x) = ⊥, indicating that x has no solution. • We denote by PF the class of search problems that are efficiently solvable (and correspond to polynomially bounded relations). That is, R ∈ PF if R is polynomially bounded and there exists a polynomialtime algorithm that given x finds y such that (x, y) ∈ R (or asserts that no such y exists). Note that R(x) denotes the set of valid solutions for the problem instance x. Thus, the solver A is required to find a valid solution (i.e., satisfy A(x) ∈ R(x)) whenever such a solution exists (i.e., R(x) is not empty). On the other hand, if the instance x has no solution (i.e., R(x) = ∅) then clearly A(x) ∈ R(x). The extra condition (also made in Definition 1.1) requires that in this case A(x) = ⊥. Thus, algorithm A always outputs a correct answer, which is a valid solution in the case that such a solution exists and otherwise provides an indication that no solution exists. We have defined a fundamental class of problems, and we do know of many natural problems in this class (e.g., solving linear equations over the rationals, finding a perfect matching in a graph, etc). However, we must admit that we do not have a good understanding regarding the actual contents of this class (i.e., we are unable to characterize many natural problems with respect to membership in this class). This situation is quite common in Complexity Theory, and seems to be a consequence of the fact that complexity classes are defined in terms of the “external behavior” (of potential algorithms) rather than in terms of the “internal structure” (of the problem). Turning back to PF, we note that, while it contains many natural search problems, there are also many natural search problems that are not known to be in PF. A natural class containing a host of such problems is presented next.
48
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.1. THE P VERSUS NP QUESTION
2.1.1.2. The Class NP as Another Natural Class of Search Problems Natural search problems have the property that valid solutions can be efficiently recognized. That is, given an instance x of the problem R and a candidate solution y, one can efficiently determine whether or not y is a valid solution for x (with respect to the problem R, i.e., whether or not y ∈ R(x)). The class of all such search problems is a natural class per se, because it is not clear why one should care about a solution unless one can recognize a valid solution once given. Furthermore, this class is a natural domain of candidates for PF, because the ability to efficiently recognize a valid solution seems to be a natural (albeit not absolute) prerequisite for a discussion regarding the complexity of finding such solutions. We restrict our attention again to polynomially bounded relations, and consider the class of relations for which membership of pairs in the relation can be decided efficiently. We stress that we consider deciding membership of given pairs of the form (x, y) in a def fixed relation R, and not deciding membership of x in the set S R = {x : R(x) = ∅}. (The relationship between the following definition and the standard definition of NP will be discussed in Sections 2.1.3–2.1.5 and 2.2.3.) Definition 2.3 (search problems with efficiently checkable solutions): • The search problem of a polynomially bounded relation R ⊆ {0, 1}∗ × {0, 1}∗ has efficiently checkable solutions if there exists a polynomialtime algorithm A such that, for every x and y, it holds that A(x, y) = 1 if and only if (x, y) ∈ R. • We denote by PC (standing for “Polynomialtime Check”) the class of search problems that correspond to polynomially bounded binary relations that have efficiently checkable solutions. That is, R ∈ PC if the following two conditions hold: 1. For some polynomial p, if (x, y) ∈ R then y ≤ p(x). 2. There exists a polynomialtime algorithm that given (x, y) determines whether or not (x, y) ∈ R. The class PC contains thousands of natural problems (e.g., finding a traveling salesman tour of length that does not exceed a given threshold, finding the prime factorization of a given composite, etc). In each of these natural problems, the correctness of solutions can be checked efficiently (e.g., given a traveling salesman tour it is easy to compute its length and check whether or not it exceeds the given threshold).2 The class PC is the natural domain for the study of which problems are in PF, because the ability to efficiently recognize a valid solution is a natural prerequisite for a discussion regarding the complexity of finding such solutions. We warn, however, that PF contains (unnatural) problems that are not in PC (see Exercise 2.1).
2.1.1.3. The P Versus NP Question in Terms of Search Problems Is it the case that every search problem in PC is in PF? That is, if one can efficiently check the correctness of solutions with respect to some (polynomially bounded) relation R, then is it necessarily the case that the search problem of R can be solved efficiently? In other words, if it is easy to check whether or not a given solution for a given instance is correct, then is it also easy to find a solution to a given instance? 2
In the traveling salesman problem (TSP), the instance is a matrix of distances between cities and a threshold, and the task is to find a tour that passes all cities and covers a total distance that does not exceed the threshold.
49
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
If PC ⊆ PF then this would mean that whenever solutions to given instances can be efficiently checked (for correctness) it is also the case that such solutions can be efficiently found (when given only the instance). This would mean that all reasonable search problems (i.e., all problems in PC) are easy to solve. Needless to say, such a situation would contradict the intuitive feeling (and the daily experience) that some reasonable search problems are hard to solve. Furthermore, in such a case, the notion of “solving a problem” would lose its meaning (because finding a solution will not be significantly more difficult than checking its validity). On the other hand, if PC \ PF = ∅ then there exist reasonable search problems (i.e., some problems in PC) that are hard to solve. This conforms with our basic intuition by which some reasonable problems are easy to solve whereas others are hard to solve. Furthermore, it reconfirms the intuitive gap between the notions of solving and checking (asserting that in some cases “solving” is significantly harder than “checking”).
2.1.2. The Decision Version: Proving Versus Verifying As we shall see in the sequel, the study of search problems (e.g., the PCvsPF Question) can be “reduced” to the study of decision problems. Since the latter problems have a less cumbersome terminology, Complexity Theory tends to focus on them (and maintains its relevance to the study of search problems via the aforementioned reduction). Thus, the study of decision problems provides a convenient way for studying search problems. For example, the study of the complexity of deciding the satisfiability of Boolean formulae provides a convenient way for studying the complexity of finding satisfying assignments for such formulae. We wish to stress, however, that decision problems are interesting and natural per se (i.e., beyond their role in the study of search problems). After all, some people do care about the truth, and so determining whether certain claims are true is a natural computational problem. Specifically, determining whether a given object (e.g., a Boolean formula) has some predetermined property (e.g., is satisfiable) constitutes an appealing computational problem. The PvsNP Question refers to the complexity of solving such problems for a wide and natural class of properties associated with the class NP. The latter class refers to properties that have “efficient proof systems” allowing for the verification of the claim that a given object has a predetermined property (i.e., is a member of a predetermined set). Jumping ahead, we mention that the PvsNP Question refers to the question of whether properties that have efficient proof systems can also be decided efficiently (without proofs). Let us clarify all these notions. Properties of objects are modeled as subsets of the set of all possible objects (i.e., a property is associated with the set of objects having this property). For example, the property of being a prime is associated with the set of prime numbers, and the property of being connected (resp., having a Hamiltonian path) is associated with the set of connected (resp., Hamiltonian) graphs. Thus, we focus on deciding membership in sets (as in Definition 1.2). The standard formulation of the PvsNP Question refers to the questionable equality of two natural classes of decision problems, denoted P and NP (and defined in §2.1.2.1 and §2.1.2.2, respectively).
2.1.2.1. The Class P as a Natural Class of Decision Problems Needless to say, we are interested in the class of decision problems that are efficiently solvable. This class is traditionally denoted P (standing for “Polynomial time”). The
50
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.1. THE P VERSUS NP QUESTION
following definition refers to the formulation of solving decision problems (provided in Definition 1.2). Definition 2.4 (efficiently solvable decision problems): • A decision problem S ⊆ {0, 1}∗ is efficiently solvable if there exists a polynomialtime algorithm A such that, for every x, it holds that A(x) = 1 if and only if x ∈ S. • We denote by P the class of decision problems that are efficiently solvable. As in Definition 2.2, we have defined a fundamental class of problems, which contains many natural problems (e.g., determining whether or not a given graph is connected), but we do not have a good understanding regarding its actual contents (i.e., we are unable to characterize many natural problems with respect to membership in this class). In fact, there are many natural decision problems that are not known to reside in P, and a natural class containing a host of such problems is presented next. This class of decision problems is denoted NP (for reasons that will become evident in Section 2.1.5).
2.1.2.2. The Class NP and NPproof Systems We view NP as the class of decision problems that have efficiently verifiable proof systems. Loosely speaking, we say that a set S has a proof system if instances in S have valid proofs of membership (i.e., proofs accepted as valid by the system), whereas instances not in S have no valid proofs. Indeed, proofs are defined as strings that (when accompanying the instance) are accepted by the (efficient) verification procedure. We say that V is a verification procedure for membership in S if it satisfies the following two conditions: 1. Completeness: True assertions have valid proofs; that is, proofs accepted as valid by V . Bearing in mind that assertions refer to membership in S, this means that for every x ∈ S there exists a string y such that V (x, y) = 1 (i.e., V accepts y as a valid proof for the membership of x in S). 2. Soundness: False assertions have no valid proofs. That is, for every x ∈ S and every string y it holds that V (x, y) = 0, which means that V rejects y as a proof for the membership of x in S. We note that the soundness condition captures the “security” of the verification procedure, that is, its ability not to be fooled (by anything) into proclaiming a wrong assertion. The completeness condition captures the “viability” of the verification procedure, that is, its ability to be convinced of any valid assertion, when presented with an adequate proof. (We stress that, in general, proof systems are defined in terms of their verification procedures, which must satisfy adequate completeness and soundness conditions.) Our focus here is on efficient verification procedures that utilize relatively short proofs (i.e., proofs that are of length that is polynomially bounded by the length of the corresponding assertion).3 3
Advanced comment: In a continuation of footnote 1, we note that in this chapter we consider deterministic (polynomialtime) verification procedures, and consequently, the completeness and soundness conditions that we state here are errorless. In contrast, in Chapter 9, we will consider various types of probabilistic (polynomialtime) verification procedures as well as probabilistic completeness and soundness conditions. A common theme that underlies both treatments is that efficient verification is interpreted as meaning verification by a process that runs in time that is polynomial in the length of the assertion. In the current chapter, we use the equivalent formulation that
51
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Let us consider a couple of examples before turning to the actual definition. Starting with the set of Hamiltonian graphs, we note that this set has a verification procedure that, given a pair (G, π), accepts if and only if π is a Hamiltonian path in the graph G. In this case π serves as a proof that G is Hamiltonian. Note that such proofs are relatively short (i.e., the path is actually shorter than the description of the graph) and are easy to verify. Needless to say, this proof system satisfies the aforementioned completeness and soundness conditions. Turning to the case of satisfiable Boolean formulae, given a formula φ and a truth assignment τ , the verification procedure instantiates φ (according to τ ), and accepts if and only if simplifying the resulting Boolean expression yields the value true. In this case τ serves as a proof that φ is satisfiable, and the alleged proofs are indeed relatively short and easy to verify. Definition 2.5 (efficiently verifiable proof systems): • A decision problem S ⊆ {0, 1}∗ has an efficiently verifiable proof system if there exists a polynomial p and a polynomialtime (verification) algorithm V such that the following two conditions hold: 1. Completeness: For every x ∈ S, there exists y of length at most p(x) such that V (x, y) = 1. (Such a string y is called an NPwitness for x ∈ S.) 2. Soundness: For every x ∈ S and every y, it holds that V (x, y) = 0. Thus, x ∈ S if and only if there exists y of length at most p(x) such that V (x, y) = 1. In such a case, we say that S has an NPproof system, and refer to V as its verification procedure (or as the proof system itself). • We denote by N P the class of decision problems that have efficiently verifiable proof systems. We note that the term NPwitness is commonly used.4 In some cases, V (or the set of pairs accepted by V ) is called a witness relation of S. We stress that the same set S may have many different NPproof systems (see Exercise 2.2), and that in some cases the difference is not artificial (see Exercise 2.3). Teaching note: Using Definition 2.5, it is typically easy to show that natural decision problems are in N P. All that is needed is designing adequate NPproofs of membership, which is typically quite straightforward and natural, because natural decision problems are typically phrased as asking about the existence of a structure (or object) that can be easily verified as valid. For example, SAT is defined as the set of satisfiable Boolean formulae, which means asking about the existence of satisfying assignments. Indeed, we can efficiently check whether a given assignment satisfies a given formula, which means that we have (a verification procedure for) an NPproof system for SAT.
considers the running time as a function of the total length of the assertion and the proof, but require that the latter has length that is polynomially bounded by the length of the assertion. 4 In most cases this is done without explicitly defining V , which is understood from the context and/or by common practice. In many texts, V is not called a proof system (nor a verification procedure of such a system), although this term is most adequate.
52
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.1. THE P VERSUS NP QUESTION
Note that for any search problem R in PC, the set of instances that have a solution with def respect to R (i.e., the set S R = {x : R(x) = ∅}) is in N P. Specifically, for any R ∈ PC, def consider the verification procedure V such that V (x, y) = 1 if and only if (x, y) ∈ R, and note that the latter condition can be decided in poly(x)time. Thus, any search problem in PC can be viewed as a problem of searching for (efficiently verifiable) proofs (i.e., NPwitnesses for membership in the set of instances having solutions). On the other hand, any NPproof system gives rise to a natural search problem in PC; that is, the problem of searching for a valid proof (i.e., an NPwitness) for the given instance (i.e, the verification procedure V yields the search problem that corresponds to R = {(x, y) : V (x, y) = 1}). Thus, S ∈ N P if and only if there exists R ∈ PC such that S = {x : R(x) = ∅}.
Teaching note: The last paragraph suggests another easy way of showing that natural decision problems are in N P: just thinking of the corresponding natural search problem. The point is that natural decision problems (in N P) are phrased as referring to whether a solution exists for the corresponding natural search problem. For example, in the case of SAT, the question is whether there exists a satisfying assignment to a given Boolean formula, and the corresponding search problem is finding such an assignment. But in all these cases, it is easy to check the correctness of solutions; that is, the corresponding search problem is in PC, which implies that the decision problem is in N P.
Observe that P ⊆ N P holds: A verification procedure for claims of membership in a set S ∈ P may just ignore the alleged NPwitness and run the decision procedure that is guaranteed by the hypothesis S ∈ P; that is, V (x, y) = A(x), where A is the aforementioned decision procedure. Indeed, the latter verification procedure is quite an abuse of the term (because it makes no use of the proof); however, it is a legitimate one. As we shall shortly see, the PvsNP Question refers to the question of whether such proofoblivious verification procedures can be used for every set that has some efficiently verifiable proof system. (Indeed, given that P ⊆ N P, the PvsNP Question is whether N P ⊆ P.)
2.1.2.3. The P Versus NP Question in Terms of Decision Problems Is it the case that NPproofs are useless? That is, is it the case that for every efficiently verifiable proof system one can easily determine the validity of assertions without looking at the proof? If that were the case, then proofs would be meaningless, because they would offer no fundamental advantage over directly determining the validity of the assertion. The conjecture P = N P asserts that proofs are useful: There exists sets in N P that cannot be decided by a polynomialtime algorithm, and so for these sets obtaining a proof of membership (for some instances) is useful (because we cannot efficiently determine membership by ourselves). In the foregoing paragraph we viewed P = N P as asserting the advantage of obtaining proofs over deciding the truth by ourselves. That is, P = N P asserts that (in some cases) verifying is easier than deciding. A slightly different perspective is that P = N P asserts that finding proofs is harder than verifying their validity. This is the case because, for any set S that has an NPproof system, the ability to efficiently find proofs of membership with respect to this system (i.e., finding an NPwitness of membership in S for any given x ∈ S), yields the ability to decide membership in S. Thus, for S ∈ N P \ P, it must be
53
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
harder to find proofs of membership in S than to verify the validity of such proofs (which can be done in polynomial time).
2.1.3. Equivalence of the Two Formulations As hinted several times, the two formulations of the PvsNP Questions are equivalent. That is, every search problem having efficiently checkable solutions is solvable in polynomialtime (i.e., PC ⊆ PF) if and only if membership in any set that has an NPproof system can be decided in polynomialtime (i.e., N P ⊆ P). Recalling that P ⊆ N P (whereas PF is not contained in PC (Exercise 2.1)), we prove the following. Theorem 2.6: PC ⊆ PF if and only if P = N P. Proof: Suppose, on the one hand, that the inclusion holds for the search version (i.e., PC ⊆ PF). We will show that this implies the existence of an efficient algorithm for finding NPwitnesses for any set in N P, which in turn implies that this set is in P. Specifically, let S be an arbitrary set in N P, and V be the corresponding verification def procedure (i.e., satisfying the conditions in Definition 2.5). Then R = {(x, y) : V (x, y) = 1} is a polynomially bounded relation in PC, and by the hypothesis its search problem is solvable in polynomialtime (i.e., R ∈ PC ⊆ PF). Denoting by A the polynomialtime algorithm solving the search problem of R, we decide membership in S in the obvious way. That is, on input x, we output 1 if and only if A(x) = ⊥, where the latter event holds if and only if A(x) ∈ R(x), which in turn occurs if and only if R(x) = ∅ (equiv., x ∈ S). Thus, N P ⊆ P (and N P = P) follows. Suppose, on the other hand, that N P = P. We will show that this implies an efficient algorithm for determining whether a given string y is a prefix of some solution to a given instance x of a search problem in PC, which in turn yields an efficient algorithm for finding solutions. Specifically, let R be an arbitrary search def problem in PC. Then the set S R = {x, y : ∃y s.t. (x, y y ) ∈ R} is in N P (because R ∈ PC), and hence S R is in P (by the hypothesis N P = P). This yields a polynomialtime algorithm for solving the search problem of R, by extending a prefix of a potential solution bit by bit (while using the decision procedure to determine whether or not the current prefix is valid). That is, on input x, we first check whether or not (x, λ) ∈ S R and output ⊥ (indicating R(x) = ∅) in case (x, λ) ∈ S R . Next, we proceed in iterations, maintaining the invariant that (x, y ) ∈ S R . In each iteration, we set y ← y 0 if (x, y 0) ∈ S R and y ← y 1 if (x, y 1) ∈ S R . If none of these conditions hold (which happens after at most polynomially many iterations) then the current y satisfies (x, y ) ∈ R. Thus, for an arbitrary R ∈ PC we obtain that R ∈ PF, and PC ⊆ PF follows. Reflection. The first part of the proof of Theorem 2.6 associates with each set S in N P a natural relation R (in PC). Specifically, R consists of all pairs (x, y) such that y is an NPwitness for membership of x in S. Thus, the search problem of R consists of finding such an NPwitness, when given x as input. Indeed, R is called the witness relation of S, and solving the search problem of R allows for deciding membership in S. Thus, R ∈ PC ⊆ PF implies S ∈ P. In the second part of the proof, we associate with each R ∈ PC a set S R
54
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.1. THE P VERSUS NP QUESTION def
(in N P), but S R is more “expressive” than the set S R = {x : ∃y s.t. (x, y) ∈ R} (which gives rise to R as its witness relation). Specifically, S R consists of strings that encode pairs (x, y ) such that y is a prefix of some string in R(x) = {y : (x, y) ∈ R}. The key observation is that deciding membership in S R allows for solving the search problem of R; that is, S R ∈ P implies R ∈ PF. Conclusion. Theorem 2.6 justifies the traditional focus on the decision version of the PvsNP Question. Indeed, given that both formulations of the question are equivalent, we may just study the less cumbersome one.
2.1.4. Two Technical Comments Regarding NP Recall that when defining PC (resp., N P) we have explicitly confined our attention to search problems of polynomially bounded relations (resp., NPwitnesses of polynomial length). An alternative formulation may allow a binary relation R to be in PC (resp., S ∈ N P) if membership of (x, y) in R can be decided in time that is polynomial in the length of x (resp., the verification of a candidate NPwitness y for membership of x in S is required to be performed in poly(x)time). Indeed, this means that the validity of y can be determined without reading all of it (which means that some substring of y can be used as the effective y in the original definitions). We comment that problems in PC (resp., N P) can be solved in exponential time (i.e., time exp(poly(x)) for input x). This can be done by an exhaustive search among all possible candidate solutions (resp., all possible candidate NPwitnesses). Thus, N P ⊆ EX P, where EX P denote the class of decision problems that can be solved in exponential time (i.e., time exp(poly(x)) for input x).
2.1.5. The Traditional Definition of NP Unfortunately, Definition 2.5 is not the commonly used definition of N P. Instead, traditionally, N P is defined as the class of sets that can be decided by a fictitious device called a nondeterministic polynomialtime machine (which explains the source of the notation NP). The reason that this class of fictitious devices is interesting is due to the fact that it captures (indirectly) the definition of NPproofs. Since the reader may come across the traditional definition of N P when studying different works, the author feels obliged to provide the traditional definition as well as a proof of its equivalence to Definition 2.5. Definition 2.7 (nondeterministic polynomialtime Turing machines): • A nondeterministic Turing machine is defined as in §1.2.3.2, except that the transition function maps symbolstate pairs to subsets of triples (rather than to a single triple) in × Q × {−1, 0, +1}. Accordingly, the configuration following a specific instantaneous configuration may be one of several possibilities, each determined by a different possible triple. Thus, the computations of a nondeterministic machine on a fixed input may result in different outputs. In the context of decision problems, one typically considers the question of whether or not there exists a computation that starting with a fixed input halts with output 1. We say that the nondeterministic machine M accept x if there
55
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
exists a computation of M, on input x, that halts with output 1. The set accepted by a nondeterministic machine is the set of inputs that are accepted by the machine. • A nondeterministic polynomialtime Turing machine is defined as one that makes a number of steps that is polynomial in the length of the input. Traditionally, N P is defined as the class of sets that are each accepted by some nondeterministic polynomialtime Turing machine. We stress that Definition 2.7 refers to a fictitious model of computation. Specifically, Definition 2.7 makes no reference to the number (or fraction) of possible computations of the machine (on a specific input) that yield a specific output.5 Definition 2.7 only refers to whether or not computations leading to a certain output exist (for a specific input). The question of what the mere existence of such possible computations means (in terms of real life) is not addressed, because the model of a nondeterministic machine is not meant to provide a reasonable model of a (reallife) computer. The model is meant to capture something completely different (i.e., it is meant to provide an elegant definition of the class N P, while relying on the fact that Definition 2.7 is equivalent to Definition 2.5). Teaching note: Whether or not Definition 2.7 is elegant is a matter of taste. For sure, many students find Definition 2.7 quite confusing, possibly because they assume that it represents some natural model of computation and consequently they allow themselves to be fooled by their intuition regarding such models. (Needless to say, the students’ intuition regarding computation is irrelevant when applied to a fictitious model.)
Note that, unlike other definitions in this chapter, Definition 2.7 makes explicit reference to a specific model of computation. Still, a similar extension can be applied to other models of computation by considering adequate nondeterministic computation rules. Also note that, without loss of generality, we may assume that the transition function maps each possible symbolstate pair to exactly two triples (cf. Exercise 2.4). Theorem 2.8: Definition 2.5 is equivalent to Definition 2.7. That is, a set S has an NPproof system if and only if there exists a nondeterministic polynomialtime machine that accepts S. Proof Sketch: Suppose, on the one hand, that the set S has an NPproof system, and let us denote the corresponding verification procedure by V . Consider the following nondeterministic polynomialtime machine, denoted M. On input x, machine M makes an adequate m = poly(x) number of nondeterministic steps, producing (nondeterministically) a string y ∈ {0, 1}m , and then emulates V (x, y). We stress that these nondeterministic steps may result in producing any mbit string y. Recall that x ∈ S if and only if there exists y of length at most poly(x) such that V (x, y) = 1. This implies that the set accepted by M equals S. 5
Advanced comment: In contrast, the definition of a probabilistic machine refers to this number (or, equivalently, to the probability that the machine produces a specific output, when the probability is essentially taken uniformly over all possible computations). Thus, a probabilistic machine refers to a natural model of computation that can be realized provided we can equip the machine with a source of randomness. For details, see Section 6.1.1.
56
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.1. THE P VERSUS NP QUESTION
Suppose, on the other hand, that there exists a nondeterministic polynomialtime machine M that accepts the set S. Consider a deterministic machine M that on input (x, y), where y has adequate length, emulates a computation of M on input x while using y to determine the nondeterministic steps of M. That is, the i th step of M on input x is determined by the i th bit of y (which indicates which of the two possible moves to make at the current step). Note that x ∈ S if and only if there exists y of length at most poly(x) such that M (x, y) = 1. Thus, M gives rise to an NPproof system for S.
2.1.6. In Support of P Different from NP Intuition and concepts constitute . . . the elements of all our knowledge, so that neither concepts without an intuition in some way corresponding to them, nor intuition without concepts, can yield knowledge. Immanuel Kant (1724–1804)
Kant speaks of the importance of both philosophical considerations (referred to as “concepts”) and empirical considerations (referred to as “intuition”) to science (referred to as (sound) “knowledge”). It is widely believed that P is different than NP; that is, that PC contains search problems that are not efficiently solvable, and that there are NPproof systems for sets that cannot be decided efficiently. This belief is supported by both philosophical and empirical considerations. • Philosophical considerations: Both formulations of the PvsNP Question refer to natural questions about which we have strong conceptions. The notion of solving a (search) problem seems to presume that, at least in some cases (if not in general), finding a solution is significantly harder than checking whether a presented solution is correct. This translates to PC \ PF = ∅. Likewise, the notion of a proof seems to presume that, at least in some cases (if not in general), the proof is useful in determining the validity of the assertion, that is, that verifying the validity of an assertion may be made significantly easier when provided with a proof. This translates to P = N P, which also implies that it is significantly harder to find proofs than to verify their correctness, which again coincides with the daily experience of researchers and students. • Empirical considerations: The class NP (or rather PC) contains thousands of different problems for which no efficient solving procedure is known. Many of these problems have arisen in vastly different disciplines, and were the subject of extensive research by numerous different communities of scientists and engineers. These essentially independent studies have all failed to provide efficient algorithms for solving these problems, a failure that is extremely hard to attribute to sheer coincidence or a stroke of bad luck. Throughout the rest of this book, we will adopt the common belief that P is different from NP. At some places, we will explicitly use this conjecture (or even stronger assumptions), whereas in other places we will present results that are interesting (if and) only if P = N P (e.g., the entire theory of NPcompleteness becomes uninteresting if P = N P). The P = N P conjecture is indeed very appealing and intuitive. The fact that this natural conjecture is unsettled seems to be one of the sources of frustration of many
57
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
complexity theorists. The author’s opinion, however, is that this feeling of frustration is not justified. In contrast, the fact that Complexity Theory evolves around natural and simply stated questions that are so difficult to resolve makes its study very exciting.
2.1.7. Philosophical Meditations Whoever does not value preoccupation with thoughts, can skip this chapter. Robert Musil, The Man Without Qualities, Chap. 28
The inherent limitations of our scientific knowledgewe were articulated by Kant, who argued that our knowledge cannot transcend our way of understanding. The “ways of understanding” are predetermined; they precede any knowledge acquisition and are the precondition to such acquisition. In a sense, Wittgenstein refined the analysis, arguing that knowledge must be formulated in a language, and the latter must be subject to a (sound) mechanism of assigning meaning. Thus, the inherent limitations of any possible “meaningassigning mechanism” impose limitations on what can be (meaningfully) said. Both philosophers spoke of the relation between the world and our thoughts. They took for granted (or rather assumed) that, in the domain of wellformulated thoughts (e.g., logic), every valid conclusion can be effectively reached (i.e., every valid assertion can be effectively proved). Indeed, this naive assumption was refuted by G¨odel. In a similar vain, Turing’s work asserts that there exist welldefined problems that cannot be solved by welldefined methods. The latter assertion transcends the philosophical considerations of the first paragraph: It asserts that the limitations of our ability are not due only to the gap between the “world as is” and our model of it. Indeed, this assertion refers to inherent limitations on any rational process even when this process is applied to wellformulated information and is aimed at a wellformulated goal. Indeed, in contrast to naive presumptions, not every wellformulated problem can be (effectively) solved. The P = N P conjecture goes even beyond the foregoing. It limits the domain of the discussion to “fair” problems, that is, to problems for which valid solutions can be efficiently recognized as such. Indeed, there is something feigned in problems for which one cannot efficiently recognize valid solutions. Avoiding such feigned and/or unfair problems, P = N P means that (even with this limitation) there exist problems that are inherently unsolvable in the sense that they cannot be solved efficiently. That is, in contrast to naive presumptions, not every problem that refers to efficiently recognizable solutions can be solved efficiently. In fact, the gap between the complexity of recognizing solutions and the complexity of finding them vouches for the meaningfulness of the notion of a problem.
2.2. PolynomialTime Reductions We present a general notion of (polynomialtime) reductions among computational problems, and view the notion of a “Karpreduction” as an important special case that suffices (and is more convenient) in many cases. Reductions play a key role in the theory of NPcompleteness, which is the topic of Section 2.3. In the current section, we stress the fundamental nature of the notion of a reduction per se and highlight two specific applications (i.e., reducing search and optimization problems to decision problems). Furthermore,
58
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.2. POLYNOMIALTIME REDUCTIONS
in the latter applications, it will be important to use the general notion of a reduction (i.e., “Cookreduction” rather than “Karpreduction”). Teaching note: We assume that many students have heard of reductions, but we fear that most have obtained a conceptually poor view of their fundamental nature. In particular, we fear that reductions are identified with the theory of NPcompleteness, while reductions have numerous other important applications that have little to do with NPcompleteness (or completeness with respect to some other class). Furthermore, we believe that it is important to show that natural search and optimization problems can be reduced to decision problems.
2.2.1. The General Notion of a Reduction Reductions are procedures that use “functionally specified” subroutines. That is, the functionality of the subroutine is specified, but its operation remains unspecified and its running time is counted at unit cost. Analogously to algorithms, which are modeled by Turing machines, reductions can be modeled as oracle (Turing) machines. A reduction solves one computational problem (which may be either a search or a decision problem) by using oracle (or subroutine) calls to another computational problem (which again may be either a search or a decision problem).
2.2.1.1. The Actual Formulation The notion of a general algorithmic reduction was discussed in §1.2.3.3 and §1.2.3.6. These reductions, called Turingreductions (cf. §1.2.3.3) and modeled by oracle machines (cf. §1.2.3.6), made no reference to the time complexity of the main algorithm (i.e., the oracle machine). Here, we focus on efficient (i.e., polynomialtime) reductions, which are often called Cookreductions. That is, we consider oracle machines (as in Definition 1.11) that run in time polynomial in the length of their input. We stress that the running time of an oracle machine is the number of steps made during its (own) computation, and that the oracle’s reply on each query is obtained in a single step. The key property of efficient reductions is that they allow for the transformation of efficient implementations of the subroutine into efficient implementations of the task reduced to it. That is, as we shall see, if one problem is Cookreducible to another problem and the latter is polynomialtime solvable then so is the former. The most popular case is that of reducing decision problems to decision problems, but we will also consider reducing search problems to search problems and reducing search problems to decision problems. Note that when reducing to a decision problem, the oracle is determined as the single valid solver of the decision problem (i.e., the function f : {0, 1}∗ → {0, 1} solves the decision problem of membership in S if, for every x, it holds that f (x) = 1 if x ∈ S and f (x) = 0 otherwise). In contrast, when reducing to a search problem the oracle is not uniquely determined because there may be many different valid solvers (i.e., the function f : {0, 1}∗ → {0, 1}∗ ∪ {⊥} solves the search problem of R if, for every x, it holds that f (x) ∈ R(x) if x ∈ S R and f (x) = ⊥ otherwise). We capture both cases in the following definition. Definition 2.9 (Cookreduction): A problem is Cookreducible to a problem if there exists a polynomialtime oracle machine M such that for every function f that solves it holds that M f solves , where M f (x) denotes the output of machine M on input x when given oracle access to f .
59
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Note that (resp., ) may be either a search problem or a decision problem (or even a yet undefined type of a problem). At this point the reader should verify that if is Cookreducible to and is solvable in polynomial time then so is . (See Exercise 2.5 for other properties of Cookreductions.) Observe that the second part of the proof of Theorem 2.6 is actually a Cookreduction of the search problem of any R in PC to a decision problem regarding a related set S R = {(x, y ) : ∃y s.t. (x, y y ) ∈ R}, which in N P. Thus, that proof establishes the following result. Theorem 2.10: Every search problem in PC is Cookreducible to some decision problem in N P. We shall see a tighter relation between search and decision problems in Section 2.2.3; that is, in some cases, R will be reduced to S R = {x : ∃y s.t. (x, y) ∈ R} rather than to S R .
2.2.1.2. Special Cases A Karpreduction is a special case of a reduction (from a decision problem to a decision problem). Specifically, for decision problems S and S , we say that S is Karpreducible to S if there is a reduction of S to S that operates as follows: On input x (an instance for S), the reduction computes x , makes query x to the oracle S (i.e., invokes the subroutine for S on input x ), and answers whatever the latter returns. This reduction is often represented by the polynomialtime computable mapping of x to x ; that is, the standard definition of a Karpreduction is actually as follows. Definition 2.11 (Karpreduction): A polynomialtime computable function f is called a Karpreduction of S to S if, for every x, it holds that x ∈ S if and only if f (x) ∈ S . Thus, syntactically speaking, a Karpreduction is not a Cookreduction, but it trivially gives rise to one (i.e., on input x, the oracle machine makes query f (x), and returns the oracle answer). Being slightly inaccurate but essentially correct, we shall say that Karpreductions are special cases of Cookreductions. Needless to say, Karpreductions constitute a very restricted case of Cookreductions. Still, this restricted case suffices for many applications (e.g., most importantly for the theory of NPcompleteness (when developed for decision problems)), but not for reducing a search problem to a decision problem. Furthermore, whereas each decision problem is Cookreducible to its complement, some decision problems are not Karpreducible to their complement (see Exercises 2.7 and 2.33). We comment that Karpreductions may (and should) be augmented in order to handle reductions of search problems to search problems. Such an augmented Karpreduction of the search problem of R to the search problem of R operates as follows: On input x (an instance for R), the reduction computes x , makes query x to the oracle R (i.e., invokes the subroutine for searching R on input x ) obtaining y such that (x , y ) ∈ R , and uses y to compute a solution y to x (i.e., y ∈ R(x)). Thus, such a reduction can be represented by two polynomialtime computable mappings, f and g, such that (x, g(x, y )) ∈ R for any y that is a solution of f (x) (i.e., for y that satisfies ( f (x), y ) ∈ R ). (Indeed, in general, unlike in the case of decision problems, the reduction cannot just return y as an answer to x.) This augmentation is called a Levinreduction and, analogously to the case
60
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.2. POLYNOMIALTIME REDUCTIONS
of a Karpreduction, it is often represented by the two aforementioned polynomialtime computable mappings (i.e., of x to x , and of (x, y ) to y). Definition 2.12 (Levinreduction): A pair of polynomialtime computable functions, f and g, is called a Levinreduction of R to R if f is a Karpreduction of S R = {x : ∃y s.t. (x, y) ∈ R} to S R = {x : ∃y s.t. (x , y ) ∈ R } and for every x ∈ S R and y ∈ R ( f (x)) it holds that (x, g(x, y )) ∈ R, where R (x ) = {y : (x , y ) ∈ R }. Indeed, the function f preserves the existence of solutions; that is, for any x, it holds that R(x) = ∅ if and only if R ( f (x)) = ∅. As for the second function (i.e., g), it maps any solution y for the reduced instance f (x) to a solution for the original instance x (where this mapping may also depend on x). We note that it is also natural to consider a third function that maps solutions for R to solutions for R (see Exercise 2.28).
2.2.1.3. Terminology and a Brief Discussion In the sequel, whenever we neglect to mention the type of a reduction, we refer to a Cookreduction. Two additional terms, which will be particularly useful in the advanced chapters, are presented next. • We say that two problems are computationally equivalent if they are reducible to one another. This means that the two problems are essentially as hard (or as easy). Note that computationally equivalent problems need not reside in the same complexity class. For example, as we shall see in Section 2.2.3, there exist many natural R ∈ PC such that the search problem of R and the decision problem of S R = {x : ∃y s.t. (x, y) ∈ R} are computationally equivalent, although (even syntactically) the two problems do not belong to the same class (i.e., R ∈ PC whereas S R ∈ N P). Also, each decision problem is computationally equivalent to its complement, although the two problems may not belong to the same class (see, e.g., Section 2.4.3). • We say that a class of problems, C, is reducible to a problem if every problem in C is reducible to . We say that the class C is reducible to the class C if for every ∈ C there exists ∈ C such that is reducible to . For example, Theorem 2.10 asserts that PC is reducible to N P. The fact that we allow Cookreductions is essential to various important connections between decision problems and other computational problems. For example, as will be shown in Section 2.2.2, a natural class of optimization problems is reducible to N P. Also recall that PC is reducible to N P (cf. Theorem 2.10). Furthermore, as will be shown in Section 2.2.3, many natural search problems in PC are reducible to a corresponding natural decision problem in N P (rather than merely to some problem in N P). In all of these results, the reductions in use are (and must be) Cookreductions.
2.2.2. Reducing Optimization Problems to Search Problems Many search problems refer to a set of potential solutions, associated with each problem instance, such that different solutions are assigned different “values” (resp., “costs”). In such a case, one may be interested in finding a solution that has value exceeding some threshold (resp., cost below some threshold). Alternatively, one may seek a solution of
61
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
maximum value (resp., minimum cost). For simplicity, let us focus on the case of a value that we wish to maximize. Still, there are two different objectives (i.e., exceeding a threshold and optimizing), giving rise to two different (auxiliary) search problems related to the same relation R. Specifically, for a binary relation R and a value function f : {0, 1}∗ × {0, 1}∗ → R, we consider two search problems. 1. Exceeding a threshold: Given a pair (x, v) the task is to find y ∈ R(x) such that f (x, y) ≥ v, where R(x) = {y : (x, y) ∈ R}. That is, we are actually referring to the search problem of the relation def
R f = {(x, v, y) : (x, y) ∈ R ∧ f (x, y) ≥ v},
(2.1)
where x, v denotes a string that encodes the pair (x, v). 2. Maximization: Given x the task is to find y ∈ R(x) such that f (x, y) = vx , where vx is the maximum value of f (x, y ) over all y ∈ R(x). That is, we are actually referring to the search problem of the relation { f (x, y )}}. R f = {(x, y) ∈ R : f (x, y) = max def
y ∈R(x)
(2.2)
Examples of value functions include the size of a clique in a graph, the amount of flow in a network (with link capacities), etc. The task may be to find a clique of size exceeding a given threshold in a given graph or to find a maximumsize clique in a given graph. Note that, in these examples, the “base” search problem (i.e., the relation R) is quite easy to solve, and the difficulty arises from the auxiliary condition on the value of a solution (presented in R f and R f ). Indeed, one may trivialize R (i.e., let R(x) = {0, 1}poly(x) for every x), and impose all necessary structure by the function f (see Exercise 2.8). We confine ourselves to the case that f is polynomialtime computable, which in particular means that f (x, y) can be represented by a rational number of length polynomial in x + y. We will show next that, in this case, the two aforementioned search problems (i.e., of R f and R f ) are computationally equivalent. Theorem 2.13: For any polynomialtime computable f : {0, 1}∗ ×{0, 1}∗ → R and a polynomially bounded binary relation R, let R f and R f be as in Eq. (2.1) and Eq. (2.2), respectively. Then the search problems of R f and R f are computationally equivalent. Note that, for R ∈ PC and polynomialtime computable f , it holds that R f ∈ PC. Combining Theorems 2.10 and 2.13, it follows that in this case both R f and R f are reducible to N P. We note, however, that even in this case it does not necessarily hold that R f ∈ PC. See further discussion following the proof. Proof: The search problem of R f is reduced to the search problem of R f by finding an optimal solution (for the given instance) and comparing its value to the given threshold value. That is, we construct an oracle machine that solves R f by making a single query to R f . Specifically, on input (x, v), the machine issues the query x (to a solver for R f ), obtaining the optimal solution y (or an indication ⊥ that R(x) = ∅), computes f (x, y), and returns y if f (x, y) ≥ v. Otherwise (i.e., either y = ⊥ or f (x, y) < v), the machine returns an indication that R f (x, v) = ∅. Turning to the opposite direction, we reduce the search problem of R f to the search problem of R f by first finding the optimal value vx = max y∈R(x) { f (x, y)}
62
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.2. POLYNOMIALTIME REDUCTIONS
(by binary search on its possible values), and next finding a solution of value vx . In both steps, we use oracle calls to R f . For simplicity, we assume that f assigns positive integer values, and let = poly(x) be such that f (x, y) ≤ 2 − 1 for every y ∈ R(x). Then, on input x, we first find vx = max{ f (x, y) : y ∈ R(x)}, by making oracle calls of the form x, v. The point is that vx < v if and only if R f (x, v) = ∅, which in turn is indicated by the oracle answer ⊥ (to the query x, v). Making queries, we determine vx (see Exercise 2.9). Note that in case R(x) = ∅, all answers will indicate that R f (x, v) = ∅, which we treat as if vx = 0. Finally, we make the query (x, vx ), and halt returning the oracle’s answer (which is y ∈ R(x) such that f (x, y) = vx if vx > 0 and an indication that R(x) = ∅ otherwise). Proof’s Digest: Note that the first direction uses the hypothesis that f is polynomialtime computable, whereas the opposite direction only used the fact that the optimal value lies in a finite space of exponential size that can be “efficiently searched.” While the first direction can be proved using a Levinreduction, this seems impossible for the opposite direction (in general). On the complexity of R f and R f . We focus on the natural case in which R ∈ PC and f is polynomialtime computable. In this case, Theorem 2.13 asserts that R f and R f are computationally equivalent. A closer look reveals, however, that R f ∈ PC always holds, whereas R f ∈ PC does not necessarily hold. That is, the problem of finding a solution (for a given instance) that exceeds a given threshold is in the class PC, whereas the problem of finding an optimal solution is not necessarily in the class PC. For example, the problem of finding a clique of a given size K in a given graph G is in PC, whereas the problem of finding a maximum size clique in a given graph G is not known (and is quite unlikely) to be in PC (although it is Cookreducible to PC). Indeed, the class of problems that are reducible to PC is a natural and interesting class (see further discussion at the end of Section 3.2.1). Needless to say, for every R ∈ PC and polynomialtime computable f , the former class contains R f .
2.2.3. SelfReducibility of Search Problems The results to be presented in this section further justify the focus on decision problems. Loosely speaking, these results show that for many natural relations R, the question of whether or not the search problem of R is efficiently solvable (i.e., is in PF) is equivalent to the question of whether or not the “decision problem implicit in R” (i.e., S R = {x : ∃y s.t. (x, y) ∈ R}) is efficiently solvable (i.e., is in P). In fact, we will show that these two computational problems (i.e., R and S R ) are computationally equivalent. Note that the decision problem of S R is easily reducible to the search problem of R, and so our focus is on the other direction. That is, we are interested in relations R for which the search problem of R is reducible to the decision problem of S R . In such a case, we say that R is selfreducible. Teaching note: Our usage of the term selfreducibility differs from the traditional one. Traditionally, a decision problem is called (downward) selfreducible if it is Cookreducible to itself via a reduction that on input x only makes queries that are smaller than x (according to some appropriate measure on the size of strings). Under some natural restrictions
63
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
(i.e., the reduction takes the disjunction of the oracle answers) such reductions yield reductions of search to decision (as discussed in the main text). For further details, see Exercise 2.13.
Definition 2.14 (the decision implicit in a search and selfreducibility): The decision problem implicit in the search problem of R is deciding membership in the set S R = {x : R(x) = ∅}, where R(x) = {y : (x, y) ∈ R}. The search problem of R is called selfreducible if it can be reduced to the decision problem of S R . Note that the search problem of R and the problem of deciding membership in S R refer to the same instances: The search problem requires finding an adequate solution (i.e., given x find y ∈ R(x)), whereas the decision problem refers to the question of whether such solutions exist (i.e., given x determine whether or not R(x) is nonempty). Thus, S R is really the “decision problem implicit in R,” because it is a decision problem that one implicitly solves when solving the search problem of R. Indeed, for any R, the decision problem of S R is easily reducible to the search problem for R (and if R is in PC then S R is in N P).6 It follows that if a search problem R is selfreducible then it is computationally equivalent to the decision problem S R . Note that the general notion of a reduction (i.e., Cookreduction) seems inherent to the notion of selfreducibility. This is the case not only due to syntactic considerations, but also due to the following inherent reason. An oracle to any decision problem returns a single bit per invocation, while the intractability of a search problem in PC must be due to lacking more than a “single bit of information” (see Exercise 2.10). We shall see that selfreducibility is a property of many natural search problems (including all NPcomplete search problems). This justifies the relevance of decision problems to search problems in a stronger sense than established in Section 2.1.3: Recall that in Section 2.1.3 we showed that the fate of the search problem class PC (wrt PF) is determined by the fate of the decision problem class N P (wrt P). Here we show that, for many natural search problems in PC (i.e., selfreducible ones), the fate of such a problem R (wrt PF) is determined by the fate of the decision problem S R (wrt P), where S R is the decision problem implicit in R.
2.2.3.1. Examples We now present a few search problems that are selfreducible. We start with SAT (see Appendix G.2), the set of satisfiable Boolean formulae (in CNF), and consider the search problem in which given a formula one should provide a truth assignment that satisfies it. The corresponding relation is denoted RSAT ; that is, (φ, τ ) ∈ RSAT if τ is a satisfying assignment to the formula φ. The decision problem implicit in RSAT is indeed SAT. Note that RSAT is in PC (i.e., it is polynomially bounded and membership of (φ, τ ) in RSAT is easy to decide (by evaluating a Boolean expression)). Proposition 2.15 (RSAT is selfreducible): The search problem of RSAT is reducible to SAT. Thus, the search problem of RSAT is computationally equivalent to deciding membership in SAT. Hence, in studying the complexity of SAT, we also address the complexity of the search problem of RSAT . 6
For example, the reduction invokes the search oracle and answer 1 if and only if the oracle returns some string (rather than the “no solution” symbol).
64
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.2. POLYNOMIALTIME REDUCTIONS
Proof: We present an oracle machine that solves the search problem of RSAT by making oracle calls to SAT. Given a formula φ, we find a satisfying assignment to φ (in case such an assignment exists) as follows. First, we query SAT on φ itself, and return an indication that there is no solution if the oracle answer is 0 (indicating φ ∈ SAT). Otherwise, we let τ , initiated to the empty string, denote a prefix of a satisfying assignment of φ. We proceed in iterations, where in each iteration we extend τ by one bit. This is done as follows: First we derive a formula, denoted φ , by setting the first τ  + 1 variables of φ according to the values τ 0. We then query SAT on φ (which means that we ask whether or not τ 0 is a prefix of a satisfying assignment of φ). If the answer is positive then we set τ ← τ 0; otherwise we set τ ← τ 1. This procedure relies on the fact that if τ is a prefix of a satisfying assignment of φ and τ 0 is not a prefix of a satisfying assignment of φ then τ 1 must be a prefix of a satisfying assignment of φ. We wish to highlight a key point that has been blurred in the foregoing description. Recall that the formula φ is obtained by replacing some variables by constants, which means that φ per se contains Boolean variables as well as Boolean constants. However, the standard definition of SAT disallows Boolean constants in its instances.7 Nevertheless, φ can be simplified such that the resulting formula contains no Boolean constants. This simplification is performed according to the straightforward Boolean rules: That is, the constant false can be omitted from any clause, but if a clause contains only occurrences of the constant false then the entire formula simplifies to false. Likewise, if the constant true appears in a clause then the entire clause can be omitted, and if all clauses are omitted then the entire formula simplifies to true. Needless to say, if the simplification process yields a Boolean constant then we may skip the query, and otherwise we just use the simplified form of φ as our query. Other examples. Reductions analogous to the one used in the proof of Proposition 2.15 can also be presented for other search problems (and not only for NPcomplete ones). Two such examples are searching for a 3coloring of a given graph and searching for an isomorphism between a given pair of graphs (where the first problem is known to be NPcomplete and the second problem is believed not to be NPcomplete). In both cases, the reduction of the search problem to the corresponding decision problem consists of iteratively extending a prefix of a valid solution, by making suitable queries in order to decide which extension to use. Note, however, that in these two cases the process of getting rid of constants (representing partial solutions) is more involved. Specifically, in the case of Graph 3Colorability (resp., Graph Isomorphism) we need to enforce a partial coloring of a given graph (resp., a partial isomorphism between a given pair of graphs); see Exercises 2.11 and 2.12, respectively. Reflection. The proof of Proposition 2.15 (as well as the proofs of similar results) consists of two observations. 1. For every relation R in PC, it holds that the search problem of R is reducible to the decision problem of S R = {(x, y ) : ∃y s.t. (x, y y ) ∈ R}. Such a 7
While the problem seems rather technical at the current setting (as it merely amounts to whether or not the definition of SAT allows Boolean constants in its instances), it is far from being so technical in other cases (see Exercises 2.11 and 2.12).
65
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
reduction is explicit in the proof of Theorem 2.6 and is implicit in the proof of Proposition 2.15. 2. For specific R ∈ PC (e.g., SSAT ), deciding membership in S R is reducible to deciding membership in S R = {x : ∃y s.t. (x, y) ∈ R}. This is where the specific structure of SAT was used, allowing for a direct and natural transformation of instances of S R to instances of S R . (We comment that if S R is NPcomplete then S R , which is always in N P, is reducible to S R by the mere fact that S R is NPcomplete; this comment is related to the following advanced comment.) For an arbitrary R ∈ PC, deciding membership in S R is not necessarily reducible to deciding membership in S R . Furthermore, deciding membership in S R is not necessarily reducible to the search problem of R. (See Exercises 2.14, 2.15, and 2.16.) In general, selfreducibility is a property of the search problem and not of the decision problem implicit in it. Furthermore, under plausible assumptions (e.g., the intractability of factoring), there exist relations R1 , R2 ∈ PC having the same implicitdecision problem (i.e., {x : R1 (x) = ∅} = {x : R2 (x) = ∅}) such that R1 is selfreducible but R2 is not (see Exercise 2.17). However, for many natural decision problems this phenomenon does not arise; that is, for many natural NPdecision problems S, any NPwitness relation associated with S (i.e., R ∈ PC such that {x : R(x) = ∅} = S) is selfreducible. Indeed, see the other examples following the proof of Proposition 2.15 as well as the advanced discussion in §2.2.3.2.
2.2.3.2. SelfReducibility of NPComplete Problems Teaching note: In this advanced subsection, we assume that the students have heard of NPcompleteness. Actually, we only need the students to know the definition of NPcompleteness (i.e., a set S is N Pcomplete if S ∈ N P and every set in N P is reducible to S). Yet, the teacher may prefer postponing the presentation of the following advanced discussion to Section 2.3.1 (or even to a later stage).
Recall that, in general, selfreducibility is a property of the search problem R and not of the decision problem implicit in it (i.e., S R = {x : R(x) = ∅}). In contrast, in the special case of NPcomplete problems, selfreducibility holds for any witness relation associated with the (NPcomplete) decision problem. That is, all search problems that refer to finding NPwitnesses for any NPcomplete decision problem are selfreducible. Theorem 2.16: For every R in PC such that S R is N Pcomplete, the search problem of R is reducible to deciding membership in S R . In many cases, as in the proof of Proposition 2.15, the reduction of the search problem to the corresponding decision problem is quite natural. The following proof presents a generic reduction (which may be “unnatural” in some cases). Proof: In order to reduce the search problem of R to deciding S R , we compose the following two reductions: 1. A reduction of the search problem of R to deciding membership in S R = {(x, y ) : ∃y s.t. (x, y y ) ∈ R}.
66
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
As stated in the foregoing paragraph (titled “Reflection”), such a reduction is implicit in the proof of Proposition 2.15 (as well as being explicit in the proof of Theorem 2.6). 2. A reduction of S R to S R . This reduction exists by the hypothesis that S R is N Pcomplete and the fact that S R ∈ N P. (Note that we do not assume that this reduction is a Karpreduction, and furthermore it may be an “unnatural” reduction). The theorem follows.
2.2.4. Digest and General Perspective Recall that we presented (polynomialtime) reductions as (efficient) algorithms that use functionally specified subroutines. That is, an efficient reduction of problem to problem is an efficient algorithm that solves while making subroutine calls to any procedure that solves . This presentation fits the “natural” (“positive”) application of such a reduction; that is, combining such a reduction with an efficient implementation of the subroutine (solving ), we obtain an efficient algorithm for solving . We note that the existence of a polynomialtime reduction of to actually means more than the latter implication. For example, even applying such a reduction to an inefficient algorithm for solving yields something for ; that is, if is solvable in time t then is solvable in time t such that t(n) = poly(n) · t (poly(n)) (e.g., if t (n) = n log2 n then t(n) = n O(log n) ). Thus, the existence of a polynomialtime reduction of to yields an upper bound on the time complexity of in terms of the time complexity of . We note that tighter relations between the complexity of and can be established whenever the reduction satisfies additional properties. For example, suppose that is polynomialtime reducible to by a reduction that makes queries of linear length (i.e., on input x each query has length O(x)). Then, if is solvable in time t then is solvable √ √ n in time t such that t(n) = poly(n) · t (O(n)) (e.g., if t (n) = 2 then t(n) = 2 O( n) ). We further note that bounding other complexity measures of the reduction (e.g., its space complexity) allows for relating the corresponding complexities of the problems; see Section 5.2.2. In contrast to the foregoing “positive” applications of polynomialtime reductions, the theory of NPcompleteness (presented in Section 2.3) is famous for its “negative” application of such reductions. Let us elaborate. The fact that is polynomialtime reducible to means that if solving is feasible then solving is feasible. The direct “positive” application starts with the hypothesis that is feasibly solvable and infers that so is . In contrast, the “negative” application uses the counterpositive: it starts with the hypothesis that solving is infeasible and infers that the same holds for .
2.3. NPCompleteness In light of the difficulty of settling the PvsNP Question, when faced with a hard problem H in NP, we cannot expect to prove that H is not in P (unconditionally). The best we can expect is a conditional proof that H is not in P, based on the assumption that NP is different from P. The contrapositive is proving that if H is in P, then so is any problem in NP (i.e., NP equals P). One possible way of proving such an assertion is showing that
67
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
any problem in NP is polynomialtime reducible to H. This is the essence of the theory of NPcompleteness. Teaching note: Some students have heard of NPcompleteness before, but we suspect that many have missed important conceptual points. Specifically, we fear that they missed the point that the mere existence of NPcomplete problems is amazing (let alone that these problems include natural ones such as SAT). We believe that this situation is a consequence of presenting the detailed proof of Cook’s Theorem as the very first thing right after defining NPcompleteness.
2.3.1. Definitions The standard definition of NPcompleteness refers to decision problems. In the following we will also present a definition of NPcomplete (or rather PCcomplete) search problems. In both cases, NPcompleteness of a problem combines two conditions: 1. is in the class (i.e., being in N P or PC, depending on whether is a decision or a search problem). 2. Each problem in the class is reducible to . This condition is called NPhardness. Although a perfectly good definition of NPhardness could have allowed arbitrary Cookreductions, it turns out that Karpreductions (resp., Levinreductions) suffice for establishing the NPhardness of all natural NPcomplete decision (resp., search) problems. Consequently, NPcompleteness is usually defined using this restricted notion of a polynomialtime reduction. Definition 2.17 (NPcompleteness of decision problems, restricted notion): A set S is N P complete if it is in N P and every set in N P is Karpreducible to S. A set is N P hard if every set in N P is Karpreducible to it. Indeed, there is no reason to insist on Karpreductions (rather than using arbitrary Cookreductions), except that the restricted notion suffices for all known demonstrations of NPcompleteness and is easier to work with. An analogous definition applies to search problems. Definition 2.18 (NPcompleteness of search problems, restricted notion): A binary relation R is PC complete if it is in PC and every relation in PC is Levinreducible to R. In the sequel, we will sometimes abuse the terminology and refer to search problems as NPcomplete (rather than PCcomplete). Likewise, we will say that a search problem is NPhard (rather than PC hard) if every relation in PC is Levinreducible to it. We stress that the mere fact that we have defined a property (i.e., NPcompleteness) does not mean that there exist objects that satisfy this property. It is indeed remarkable that NPcomplete problems do exist. Such problems are “universal” in the sense that solving them allows for solving any other (reasonable) problem (i.e., problems in NP).
68
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
2.3.2. The Existence of NPComplete Problems We suggest not to confuse the mere existence of NPcomplete problems, which is remarkable by itself, with the even more remarkable existence of “natural” NPcomplete problems. The following proof delivers the first message as well as focuses on the essence of NPcompleteness, rather than on more complicated technical details. The essence of NPcompleteness is that a single computational problem may “effectively encode” a wide class of seemingly unrelated problems. Theorem 2.19: There exist NPcomplete relations and sets. Proof: The proof (as well as any other NPcompleteness proofs) is based on the observation that some decision problems in N P (resp., search problems in PC) are “rich enough” to encode all decision problems in N P (resp., all search problems in PC). This fact is most obvious for the “generic” decision and search problems, denoted Su and Ru (and defined next), which are used to derive the simplest proof of the current theorem. We consider the following relation Ru and the decision problem Su implicit in Ru (i.e., Su = {x : ∃y s.t. (x, y) ∈ Ru }). Both problems refer to the same type of instances, which in turn have the form x = M, x, 1t , where M is a description of a (deterministic) Turing machine, x is a string, and t is a natural number. The number t is given in unary (rather than in binary) in order to allow various complexity measures, which depend on the instance length, to be polynomial in t (rather than polylogarithmic in t). Definition. The relation Ru consists of pairs (M, x, 1t , y) such that M accepts def
the input pair (x, y) within t steps, where y ≤ t.8 The corresponding set Su = {x : ∃y s.t. (x, y) ∈ Ru } consists of triples M, x, 1t such that machine M accepts some input of the form (x, ·) within t steps. It is easy to see that Ru is in PC and that Su is in N P. Indeed, Ru is recognizable by a universal Turing machine, which on input (M, x, 1t , y) emulates (t steps of) the computation of M on (x, y). (The fact that Su ∈ N P follows similarly.) We comment that u indeed stands for universal (i.e., universal machine), and the proof extends to any reasonable model of computation (which has adequate universal machines). We now turn to show that Ru and Su are NPhard in the adequate sense (i.e., Ru is PChard and Su is N Phard). We first show that any set in N P is Karpreducible to Su . Let S be a set in N P and let us denote its witness relation by R; that is, R is in PC and x ∈ S if and only if there exists y such that (x, y) ∈ R. Let p R be a polynomial bounding the length of solutions in R (i.e., y ≤ p R (x) for every (x, y) ∈ R), let M R be a polynomialtime machine deciding membership (of alleged (x, y) pairs) in R, and let t R be a polynomial bounding its running time. Then, the desired Karpreduction maps an instance x (for S) to the instance M R , x, 1t R (x+ p R (x)) (for Su ); that is, def
x !→ f (x) = M R , x, 1t R (x+ p R (x)) . 8
(2.3)
Instead of requiring that y ≤ t, one may require that M is “canonical” in the sense that it reads its entire input before halting.
69
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Note that this mapping can be computed in polynomial time, and that x ∈ S if and only if f (x) = M R , x, 1t R (x+ p R (x)) ∈ Su . Details follow. First, note that the mapping f does depend (of course) on S, and so it may depend on the fixed objects M R , p R , and TR (which depend on S). Thus, computing f on input x calls for printing the fixed string M R , copying x, and printing a number of 1’s that is a fixed polynomial in the length of x. Hence, f is polynomialtime computable. Second, recall that x ∈ S if and only if there exists y such that y ≤ p R (x) and (x, y) ∈ R. Since M R accepts (x, y) ∈ R within t R (x + y) steps, it follows that x ∈ S if and only if there exists y such that y ≤ p R (x) and M R accepts (x, y) within t R (x + y) steps. It follows that x ∈ S if and only if f (x) ∈ Su . We now turn to the search version. For reducing the search problem of any R ∈ PC to the search problem of Ru , we use essentially the same reduction. On input an instance x (for R), we make the query M R , x, 1t R (x+ p R (x)) to the search problem of Ru and return whatever the latter returns. Note that if x ∈ S then the answer will be “no solution,” whereas for every x and y it holds that (x, y) ∈ R if and only if (M R , x, 1t R (x+ p R (x)) , y) ∈ Ru . Thus, a Levinreduction of R to Ru consists of the pair of functions ( f, g), where f is the foregoing Karpreduction and g(x, y) = y. Note that indeed, for every ( f (x), y) ∈ Ru , it holds that (x, g(x, y)) = (x, y) ∈ R. Advanced comment. Note that the role of 1t in the definition of Ru is to allow placing Ru in PC. In contrast, consider the relation Ru that consists of pairs (M, x, t, y) such that M accepts x y within t steps. Indeed, the difference is that in Ru the time bound t appears in unary notation, whereas in Ru it appears in binary. Then, as will become obvious in §4.2.1.2, membership in Ru cannot be decided in polynomialtime. Going even further, we note that omitting t altogether from the problem instance yields a search problem def that is not solvable at all. That is, consider the relation R H = {(M, x, y) : M(x y) = 1} (which is related to the halting problem). Indeed, the search problem of any relation (and in particular of any relation in PC) is Karpreducible to the search problem of R H , but the latter is not solvable at all (i.e., there exists no algorithm that halts on every input and on input x = M, x outputs y such that (x, y) ∈ R H if and only if such a y exists).
Bounded Halting and Nonhalting We note that the problem shown to be NPcomplete in the proof of Theorem 2.19 is related to the following two problems, called Bounded Halting and Bounded Nonhalting. Fixing any programming language, the instance to each of these problems consists of a program π and a time bound t (presented in unary). The decision version of Bounded Halting (resp., Bounded Nonhalting) consists of determining whether or not there exists an input (of length at most t) on which the program π halts in t steps (resp., does not halt in t steps), whereas the search problem consists of finding such an input. The decision version of Bounded Nonhalting refers to a fundamental computational problem in the area of program verification; specifically, the problem of determining whether a given program halts within a given time bound on all inputs of a given length.9 9
The length parameter need not equal the time bound. Indeed, a more general version of the problem refers to two bounds, and t, and to whether the given program halts within t steps on each possible bit input. It is easy to prove
70
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
We have mentioned Bounded Halting because it is often referred to in the literature, but we believe that Bounded Nonhalting is much more relevant to the project of program verification (because one seeks programs that halt on all inputs rather than programs that halt on some input). It is easy to prove that both problems are NPcomplete (see Exercise 2.19). Note that the two (decision) problems are not complementary (i.e., (M, 1t ) may be a yesinstance of both decision problems).10 The fact that Bounded Nonhalting is probably intractable (i.e., is intractable provided that P = N P) is even more relevant to the project of program verification than the fact that the Halting Problem is undecidable. The reason being that the latter problem (as well as other related undecidable problems) refers to arbitrarily long computations, whereas the former problem refers to an explicitly bounded number of computational steps. Specifically, Bounded Nonhalting is concerned with the existence of an input that causes the program to violate a certain condition (i.e., halting) within a given timebound. In light of the foregoing, the common practice of bashing Bounded (Non)halting as an “unnatural” problem seems very odd at an age in which computer programs play such a central role. (Nevertheless, we will use the term “natural” in this traditionally and odd sense in the next title, which refers to natural computational problems that seem unrelated to computation.)
2.3.3. Some Natural NPComplete Problems Having established the mere existence of NPcomplete problems, we now turn to prove the existence of NPcomplete problems that do not (explicitly) refer to computation in the problem’s definition. We stress that thousands of such problems are known (and a list of several hundreds can be found in [85]). We will prove that deciding the satisfiability of propositional formulae is NPcomplete (i.e., Cook’s Theorem), and also present some combinatorial problems that are NPcomplete. This presentation is aimed at providing a (small) sample of natural NPcompleteness results as well as some tools toward proving NPcompleteness of new problems of interest. We start by making a comment regarding the latter issue. The reduction presented in the proof of Theorem 2.19 is called “generic” because it (explicitly) refers to any (generic) NPproblem. That is, we actually presented a scheme for the design of reductions from any desired NPproblem to the single problem proved to be NPcomplete. Indeed, in doing so, we have followed the definition of NPcompleteness. However, once we know some NPcomplete problems, a different route is open to us. We may establish the NPcompleteness of a new problem by reducing a known NPcomplete problem to the new problem. This alternative route is indeed a common practice, and it is based on the following simple proposition. that the problem remains NPcomplete also in the case that the instances are restricted to having parameters and t such that t = p(), for any fixed polynomial p (e.g., p(n) = n 2 , rather than p(n) = n as used in the main text). 10 Indeed, (M, 1t ) can not be a noinstance of both decision problems, but this does not make the problems complementary. In fact, the two decision problems yield a threeway partition of the instances (M, 1t ): (1) pairs (M, 1t ) such that for every input x (of length at most t) the computation of M(x) halts within t steps, (2) pairs (M, 1t ) for which such halting occurs on some inputs but not on all inputs, and (3) pairs (M, 1t ) such that there exists no input (of length at most t) on which M halts in t steps. Note that instances of type (1) are exactly the noinstances of Bounded Nonhalting, whereas instances of type (3) are exactly the noinstances of Bounded Halting.
71
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Proposition 2.20: If an NPcomplete problem is reducible to some problem in NP then is NPcomplete. Furthermore, reducibility via Karpreductions (resp., Levinreductions) is preserved. Proof: The proof boils down to asserting the transitivity of reductions. Specifically, the NPhardness of means that every problem in NP is reducible to , which in turn is reducible to . Thus, by transitivity of reduction (see Exercise 2.6), every problem in NP is reducible to , which means that is NPhard and the proposition follows.
2.3.3.1. Circuit and Formula Satisfiability: CSAT and SAT We consider two related computational problems, CSAT and SAT, which refer (in the decision version) to the satisfiability of Boolean circuits and formulae, respectively. (We refer the reader to the definitions of Boolean circuits, formulae, and CNF formulae that appear in §1.2.4.1.) Teaching note: We suggest establishing the NPcompleteness of SAT by a reduction from the circuit satisfaction problem (CSAT), after establishing the NPcompleteness of the latter. Doing so allows for decoupling two important parts of the proof of the NPcompleteness of SAT: the emulation of Turing machines by circuits, and the emulation of circuits by formulae with auxiliary variables.
CSAT. Recall that Boolean circuits are directed acyclic graphs with internal vertices, called gates, labeled by Boolean operations (of arity either 2 or 1), and external vertices called terminals that are associated with either inputs or outputs. When setting the inputs of such a circuit, all internal nodes are assigned values in the natural way, and this yields a value to the output(s), called an evaluation of the circuit on the given input. The evaluation of circuit C on input z is denoted C(z). We focus on circuits with a single output, and let CSAT denote the set of satisfiable Boolean circuits (i.e., a circuit C is in CSAT if there exists an input z such that C(z) = 1). We also consider the related relation RCSAT = {(C, z) : C(z) = 1}. Theorem 2.21 (NPcompleteness of CSAT): The set (resp., relation) CSAT (resp., RCSAT ) is N Pcomplete (resp., PCcomplete). Proof: It is easy to see that CSAT ∈ N P (resp., RCSAT ∈ PC). Thus, we turn to showing that these problems are NPhard. We will focus on the decision version (but also discuss the search version). We will present (again, but for the last time in this book) a generic reduction, this time of any NPproblem to CSAT. The reduction is based on the observation, mentioned in §1.2.4.1, that the computation of polynomialtime algorithms can be emulated by polynomialsize circuits. In the current context, we wish to emulate the computation of a fixed machine M on input (x, y), where x is fixed and y varies (but y = poly(x) and the total number of steps of M(x, y) is polynomial in x + y). Thus, x will be “hardwired” into the circuit, whereas y will serve as the input to the circuit. The circuit itself, denoted C x , will consists of “layers” such
72
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
that each layer will represent an instantaneous configuration of the machine M, and the relation between consecutive configurations in a computation of this machine will be captured by (“uniform”) local gadgets in the circuit. The number of layers will depend on (x and on) the polynomial that upperbounds the running time of M, and an additional gadget will be used to detect whether the last configuration is accepting. Thus, only the first layer of the circuit C x (which will represent an initial configuration with input prefixed by x) will depend on x. The punch line is that determining whether, for a given x, there exists a y (y = poly(x)) such that M(x, y) = 1 (in a given number of steps) will be reduced to whether there exists a y such that C x (y) = 1. Performing this reduction for any machine M R that corresponds to any R ∈ PC (as in the proof of Theorem 2.19), we establish the fact that CSAT is NPcomplete. Details follow. Recall that we wish to reduce an arbitrary set S ∈ N P to CSAT. Let R, p R , M R , and t R be as in the proof of Theorem 2.19 (i.e., R is the witness relation of S, whereas p R bounds the length of the NPwitnesses, M R is the machine deciding membership in R, and t R is its polynomial time bound). Without loss of generality (and for simplicity), suppose that M R is a onetape Turing machine. We will construct a def Karpreduction that maps an instance x (for S) to a circuit, denoted f (x) = C x , such that C x (y) = 1 if and only if M R accepts the input (x, y) within t R (x + p R (x)) steps. Thus, it will follow that x ∈ S if and only if there exists y ∈ {0, 1} p R (x) such that C x (y) = 1 (i.e., if and only if C x ∈ CSAT). The circuit C x will depend on x as well as on M R , p R , and t R . (We stress that M R , p R , and t R are fixed, whereas x varies and is thus explicit in our notation.) Before describing the circuit C x , let us consider a possible computation of M R on input (x, y), where x is fixed and y represents a generic string of length at most p R (x). Such a computation proceeds for t = t R (x + p R (x)) steps, and corresponds to a sequence of t + 1 instantaneous configurations, each of length t. Each such configuration can be encoded by t pairs of symbols, where the first symbol in each pair indicates the contents of a cell and the second symbol indicates either a state of the machine or the fact that the machine is not located in this cell. Thus, each pair is a member of × (Q ∪ {⊥}), where is the finite “work alphabet” of M R , Q is its finite set of internal states, and ⊥ is an indication that the machine is not present at a cell. The initial configuration includes x y as input, and the decision of M R (x, y) can be read from (the leftmost cell of) the last configuration.11 With the exception of the first row, the values of the entries in each row are determined by the entries of the row just above it, where this determination reflects the transition function of M R . Furthermore, the value of each entry in the said array is determined by the values of (up to) three entries that reside in the row above it (see Exercise 2.20). Thus, the aforementioned computation is represented by a (t + 1) × t array, where each entry encodes one out of a constant number of possibilities, which in turn can be encoded by a constantlength bit string. See Figure 2.1. The circuit C x has a structure that corresponds to the aforementioned array. Each entry in the array is represented by a constant number of gates such that when C x is evaluated at y these gates will be assigned values that encode the contents of the said entry (in the computation of M R (x, y)). In particular, the entries of the first row of 11
We refer to the output convention presented in §1.2.3.2, by which the output is written in the leftmost cells and the machine halts at the cell to its right.
73
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
(1,a) (1,) (0,) (y1,) (y2,) (,)
(,)
(,)
(,)
(,)
(3,) (1,b) (0,) (y1,) (y2,) (,)
(,)
(,)
(,)
(,)
(3,) (1,) (0,b) (y1,) (y2,) (,)
(,)
(,)
(,)
(,)
initial configuration (with input 110y1y2 )
(3,) (1,c) (0,) (3,c) (1,) (0,) (1,) (1,f) (0,)
last configuration
Figure 2.1: An array representing ten consecutive computation steps on input 110y1 y2 . Blank characters as well as the indication that the machine is not present in the cell are marked by a hyphen (). The three arrows represent the determination of an entry by the three entries that reside above it. The machine underlying this example accepts the input if and only if the input contains a zero.
the array are “encoded” by hardwiring the reduction’s input (i.e., x), and feeding the circuit’s input (i.e., y) to the adequate input terminals. That is, the circuit has p R (x) (“real”) input terminals (corresponding to y), and the hardwiring of constants to the other O(t − p R (x)) gates that represent the first row is done by simple gadgets (as in Figure 1.3). Indeed, the additional hardwiring in the first row corresponds to the other fixed elements of the initial configuration (i.e., the blank symbols, and the encoding of the initial state and of the initial location; cf. Figure 2.1). The entries of subsequent rows will be “encoded” (or rather computed at evaluation time) by using constantsize circuits that determine the value of an entry based on the three relevant entries in the row above it. Recall that each entry is encoded by a constant number of gates, and thus these constantsize circuits merely compute the constantsize function described in Exercise 2.20. In addition, the circuit C x has a few extra gates that check the values of the entries of the last row in order to determine whether or not it encodes an accepting configuration.12 Note that the circuit C x can be constructed in polynomialtime from the string x, because we just need to encode x in an appropriate manner as well as generate a “highly uniform” gridlike circuit of size O(t R (x + p R (x))2 ).13 Although the foregoing construction of C x capitalizes on various specific details of the (onetape) Turing machine model, it can be easily adapted to other natural 12
In continuation of footnote 11, we note that it suffices to check the values of the two leftmost entries of the last row. We assumed here that the circuit propagates a halting configuration to the last row. Alternatively, we may check for the existence of an accepting/halting configuration in the entire array, since this condition is quite simple. 13 Advanced comment: A more efficient construction, which generates almostlinear sized circuits (i.e., circuits (t R (x + p R (x)))) is known; see [180]. of size O
74
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
models of efficient computation (by showing that in such models the transformation from one configuration to the subsequent one can be emulated by a (polynomialtime constructible) circuit).14 Alternatively, we recall the CobhamEdmonds Thesis asserting that any problem that is solvable in polynomial time (on some “reasonable” model) can be solved in polynomial time by a (onetape) Turing machine. Turning back to the circuit C x , we observe that indeed C x (y) = 1 if and only if M R accepts the input (x, y) while making at most t = t R (x + p R (x)) steps. Recalling that S = {x : ∃y s.t. y ≤ p R (x) ∧ (x, y) ∈ R} and that M R decides membership in R in time t R , we infer that x ∈ S if and only if f (x) = C x ∈ CSAT. Furthermore, (x, y) ∈ R if and only if ( f (x), y) ∈ RCSAT . It follows that f is a Karpreduction of def S to CSAT, and, for g(x, y) = y, it holds that ( f, g) is a Levinreduction of R to RCSAT . The theorem follows. SAT. Recall that Boolean formulae are special types of Boolean circuits (i.e., circuits having a tree structure).15 We further restrict our attention to formulae given in conjunctive normal form (CNF). We denote by SAT the set of satisfiable CNF formulae (i.e., a CNF formula φ is in SAT if there exists a truth assignment τ such that φ(τ ) = 1). We also consider the related relation RSAT = {(φ, τ ) : φ(τ ) = 1}. Theorem 2.22 (NPcompleteness of SAT): The set (resp., relation) SAT (resp., RSAT ) is N Pcomplete (resp., PCcomplete). Proof: Since the set of possible instances of SAT is a subset of the set of instances of CSAT, it is clear that SAT ∈ N P (resp., RSAT ∈ PC). To prove that SAT is NPhard, we reduce CSAT to SAT (and use Proposition 2.20). The reduction boils down to introducing auxiliary variables in order to “cut” the computation of an arbitrary (“deep”) circuit into a conjunction of related computations of “shallow” circuits (i.e., depth2 circuits) of unbounded fanin, which in turn may be presented as a CNF formula. The aforementioned auxiliary variables hold the possible values of the internal gates of the original circuit, and the clauses of the CNF formula enforce the consistency of these values with the corresponding gate operation. For example, if gatei and gate j feed into gatek , which is a ∧gate, then the corresponding auxiliary variables gi , g j , gk should satisfy the Boolean condition gk ≡ (gi ∧ g j ), which can be written as a 3CNF with four clauses. Details follow. We start by Karpreducing CSAT to SAT. Given a Boolean circuit C, with n input terminals and m gates, we first construct m constantsize formulae on n + m variables, where the first n variables correspond to the input terminals of the circuit, and the other m variables correspond to its gates. The i th formula will depend on the variable that correspond to the i th gate and the 12 variables that correspond to the vertices that feed into this gate (i.e., 2 vertices in case of ∧gate or ∨gate and a single vertex in case of a ¬gate, where these vertices may be either input terminals or other gates). This (constantsize) formula will be satisfied by a truth assignment if and only if this assignment matches the gate’s functionality (i.e., feeding this gate 14
Advanced comment: Indeed, presenting such circuits is very easy in the case of all natural models (e.g., the RAM model), where each bit in the next configuration can be expressed by a simple Boolean formula in the bits of the previous configuration. 15 For an alternative definition, see Appendix G.2.
75
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS 1
3
2
1
2
3
3
g1 gate1
g1
g2
g3
g2
g3
g4
g4
and
gate2
eq
gate3
and
or
and
or
eq
neg
eq
eq
and
and
gate4
neg
Figure 2.2: Using auxiliary variables (i.e., the gi ’s) to “cut” a depth5 circuit (into a CNF). The dashed regions will be replaced by equivalent CNF formulae. The dashed cycle representing an unbounded fanin andgate is the conjunction of all constantsize circuits (which enforce the functionalities of the original gates) and the variable that represents the gate that feeds the output terminal in the original circuit.
with the corresponding values result in the corresponding output value). Note that these constantsize formulae can be written as constantsize CNF formulae (in fact, as 3CNF formulae).16 Taking the conjunction of these m formulae and the variable associated with the gate that feeds into the output terminal, we obtain a formula φ in CNF (see Figure 2.2, where n = 3 and m = 4). Note that φ can be constructed in polynomial time from the circuit C; that is, the mapping of C to φ = f (C) is polynomialtime computable. We claim that C is in CSAT if and only if φ is in SAT. 1. Suppose that for some string s it holds that C(s) = 1. Then, assigning to the i th auxiliary variable the value that is assigned to the i th gate of C when evaluated on s, we obtain (together with s) a truth assignment that satisfies φ. This is the case because such an assignment satisfies all m constantsize CNFs as well as the variable associated with the output of C. 2. On the other hand, if τ satisfies φ then the first n bits in τ correspond to an input on which C evaluates to 1. This is the case because the m constantsize CNFs guarantee that the variables of φ are assigned values that correspond to the evaluation of C on the first n bits of τ , while the fact that the variable associated with the output of C has value true guarantees that this evaluation of C yields the value 1. Note that the latter mapping (of τ to its nbit prefix) is the second mapping required by the definition of a Levinreduction. Thus, we have established that f is a Karpreduction of CSAT to SAT, and that augmenting f with the aforementioned second mapping yields a Levinreduction of RCSAT to RSAT . 16
Recall that any Boolean function can be written as a CNF formula having size that is exponential in the length of its input, which in this case is a constant (i.e., either 2 or 3). Indeed, note that the Boolean functions that we refer to here depend on 23 Boolean variables (since they indicate whether or not the corresponding values respect the gate’s functionality).
76
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
Comment. The fact that the second mapping required by the definition of a Levinreduction is explicit in the proof of the validity of the corresponding Karpreduction is a fairly common phenomenon. Actually (see Exercise 2.28), typical presentations of Karpreductions provide two auxiliary polynomialtime computable mappings (in addition to the main mapping of instances from one problem (e.g., CSAT) to instances of another problem (e.g., SAT)): The first auxiliary mapping is of solutions for the preimage instance (e.g., of CSAT) to solutions for the image instance of the reduction (e.g., of SAT), whereas the second mapping goes the other way around. (Note that only the main mapping and the second auxiliary mapping are required in the definition of a Levinreduction.) For example, the proof of the validity of the Karpreduction of CSAT to SAT, denoted f , specified two additional mappings h and g such that (C, s) ∈ RCSAT implies ( f (C), h(C, s)) ∈ RSAT and ( f (C), τ ) ∈ RSAT implies (C, g(C, τ )) ∈ RCSAT . Specifically, in the proof of Theorem 2.22, we used h(C, s) = (s, a1 , . . . , am ) where ai is the value assigned to the i th gate in the evaluation of C(s), and g(C, τ ) being the nbit prefix of τ . 3SAT. Note that the formulae resulting from the Karpreduction presented in the proof of Theorem 2.22 are in conjunctive normal form (CNF) with each clause referring to at most three variables. Thus, the above reduction actually establishes the NPcompleteness of 3SAT (i.e., SAT restricted to CNF formula with up to three variables per clause). Alternatively, one may Karpreduce SAT (i.e., satisfiability of CNF formula) to 3SAT (i.e., satisfiability of 3CNF formula), by replacing long clauses with conjunctions of threevariable clauses (using auxiliary variables; see Exercise 2.21). Either way, we get the following result, where the furthermore part is proved by an additional reduction. Proposition 2.23: 3SAT is NPcomplete. Furthermore, the problem remains NPcomplete also if we restrict the instances such that each variable appears in at most three clauses. Proof Sketch: The furthermore part is proved by reduction from 3SAT. We just replace each occurrence of each Boolean variable by a new copy of this variable, and add clauses to enforce that all these copies are assigned the same value. Specifically, replacing the variable z by copies z (1) , . . . , z (m) , we add the clauses z (i+1) ∨ ¬z (i) for i = 1 . . . , m (where m + 1 is understood as 1). Related problems. Note that instances of SAT can be viewed as systems of Boolean conditions over Boolean variables. Such systems can be emulated by various types of systems of arithmetic conditions, implying the NPhardness of solving the latter types of systems. Examples include systems of integer linear inequalities (see Exercise 2.23), and systems of quadratic equalities (see Exercise 2.25).
2.3.3.2. Combinatorics and Graph Theory Teaching note: The purpose of this subsection is to expose the students to a sample of NPcompleteness results and proof techniques (i.e., the design of reductions among computational problems). The author believes that this traditional material is insightful, but one may skip it in the context of a complexity class.
77
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
We present just a few of the many appealing combinatorial problems that are known to be NPcomplete. Throughout this section, we focus on the decision versions of the various problems, and adopt a more informal style. Specifically, we will present a typical decision problem as a problem of deciding whether a given instance, which belongs to a set of relevant instances, is a “yesinstance” or a “noinstance” (rather than referring to deciding membership of arbitrary strings in a set of yesinstances). For further discussion of this style and its rigorous formulation, see Section 2.4.1. We will also neglect showing that these decision problems are in NP; indeed, for natural problems in NP, showing membership in NP is typically straightforward. Set Cover. We start with the Set Cover problem, in which an instance consists of a collection of finite sets S1 , . . . , Sm and an integer K andthe question (for decision) is m Si (i.e., indices i 1 , . . . , i K whether or not there exist (at most)17 K sets that cover i=1 K m such that j=1 Si j = i=1 Si ). Proposition 2.24: Set Cover is NPcomplete. Proof Sketch: We sketch a reduction of SAT to Set Cover. For a CNF formula φ with m clauses and n variables, we consider the sets S1,t , S1,f , .., Sn,t , Sn,f ⊆ {1, . . . , m} such that Si,t (resp., Si,f ) is the set of the indices of the clauses (of φ) that are satisfied by setting the i th variable to true (resp., false). That is, if the i th variable appears unnegated (resp., negated) in the j th clause then j ∈ Si,t (resp., j ∈ Si,f ). Note that the union of these 2n sets equals {1, . . . , m}. Now, on input φ, the reduction outputs def the Set Cover instance f (φ) = ((S1 , .., S2n ), n), where S2i−1 = Si,t ∪ {m + i} and S2i = Si,f ∪ {m + i} for i = 1, . . . , n. Note that f is computable in polynomial time, and that if φ is satisfied by τ1 · · · τn then the collection {S2i−τi : i = 1, . . . , n} covers {1, . . . , m + n}. Thus, φ ∈ S AT implies that f (φ) is a yesinstance of Set Cover. On the other hand, each cover of {m + 1, . . . , m + n} ⊂ {1, . . . , m + n} must include either S2i−1 or S2i for each i. Thus, a cover of {1, . . . , m + n} using n of the S j ’s must contain, for every i, either S2i−1 or S2i but not both. Setting τi accordingly (i.e., τi = 1 if and only if S2i−1 is in the cover) implies that {S2i−τi : i = 1, . . . , n} covers {1, . . . , m}, which in turn implies that τ1 · · · τn satisfies φ. Thus, if f (φ) is a yesinstance of Set Cover then φ ∈ SAT. Exact Cover and 3XC. The Exact Cover problem is similar to the set cover problem, except that here the sets that are used in the cover are not allowed to intersect. That is, each element in the universe should be covered by exactly one set in the cover. Restricting the set of instances to sequences of subsets each having exactly three elements, we get the restricted problem called 3Exact Cover (3XC), where it is unnecessary to specify the number of sets to be used in the cover. The problem 3XC is rather technical, but it is quite useful for demonstrating the NPcompleteness of other problems (by reducing 3XC to them). Proposition 2.25: 3Exact Cover is NPcomplete. 17
Clearly, in the case of Set Cover, the two formulations (i.e., asking for exactly K sets or at most K sets) are computationally equivalent.
78
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
Indeed, it follows that the Exact Cover (in which sets of arbitrary size are allowed) is NPcomplete. This follows both for the case that the number of sets in the desired cover is unspecified and for the various cases in which this number is bounded (i.e., upperbounded or lowerbounded or both). Proof Sketch: The reduction is obtained by composing three reductions. We first reduce a restricted case of 3SAT to a restricted version of Set Cover, denoted 3SC, in which each set has at most three elements (and an instance consists, as in the general case, of a sequence of finite sets as well as an integer K ). Specifically, we refer to 3SAT instances that are restricted such that each variable appears in at most three clauses, and recall that this restricted problem is NPcomplete (see Proposition 2.23). Actually, we further reduce this special case of 3SAT to one in which each literal appears in at most two clauses.18 Now, we reduce the new version of 3SAT to 3SC by using the (very same) reduction presented in the proof of Proposition 2.24, and observing that the size of each set in the reduced instance is at most three (i.e., one more than the number of occurrences of the corresponding literal). Next, we reduce 3SC to the following restricted case of Exact Cover, denoted 3XC’, in which each set has at most three elements, an instance consists of a sequence of finite sets as well as an integer K , and the question is whether there exists an exact cover with at most K sets. The reduction maps an instance ((S1 , . . . , Sm ), K ) of 3SC to the instance (C , K ) such that C is a collection of all subsets of each of the sets S1 , . . . , Sm . Since each Si has size at most 3, we introduce at most 7 nonempty subsets per each such set, and the reduction can be computed in polynomial time. The reader may easily verify the validity of this reduction. Finally, we reduce m3XC’ to 3XC. Consider an instance ((S1 , . . . , Sm ), K ) of 3XC’, Si = [n]. If n > 3K then this is definitely a noinstance, and suppose that i=1 which can be mapped to a dummy noinstance of 3XC, and so we assume that def x = 3K − n ≥ 0. Note that x represents the “excess” covering ability of an exact cover having K sets, each having three elements. Thus, we augment the set system with x new elements, denoted n + 1, . . . , 3K , and replace each Si such that Si  < 3 by a subcollection of 3sets that cover Si as well as arbitrary elements from {n + 1, . . . , 3K }. That is, in case Si  = 2, the set Si is replaced by the subcollection (Si ∪ {n + 1}, . . . , Si ∪ {3K }), whereas a singleton Si is replaced by the sets Si ∪ { j1 , j2 } for every j1 < j2 in {n + 1, . . . , 3K }. In addition, we add all possible 3subsets of {n + 1, . . . , 3K }. This completes the description of the third reduction, the validity of which is left as an exercise. Vertex Cover, Independent Set, and Clique. Turning to graph theoretic problems (see Appendix G.1), we start with the Vertex Cover problem, which is a special case of the Set Cover problem. The instances consist of pairs (G, K ), where G = (V, E) is a simple graph and K is an integer, and the problem is whether or not there exists a set 18
This can be done by observing that if all three occurrences of a variable are of the same type (i.e., they are all negated or all nonnegated) then this variable can be assigned a value that satisfies all clauses in which it appears, and so the variable and the clauses in which it appears can be omitted from the instance. This yields a reduction of 3SAT instances in which each variable appears in at most three clauses to 3SAT instances in which each literal appears in at most two clauses. Actually, a closer look at the proof of Proposition 2.23 reveals the fact that the reduced instances satisfy the latter property anyhow.
79
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
of (at most) K vertices that is incident to all graph edges (i.e., each edge in G has at least one endpoint in this set). Indeed, this instance of Vertex Cover can be viewed as an instance of Set Cover by considering the collection of sets (Sv )v∈V , where Sv denotes the set of edges incident at vertex v (i.e., Sv = {e ∈ E : v ∈ e}). Thus, the NPhardness of Set Cover follows from the NPhardness of Vertex Cover (but this implication is unhelpful for us here: We already know that Set Cover is NPhard and we wish to prove that Vertex Cover is NPhard). We also note that the Vertex Cover problem is computationally equivalent to the Independent Set and Clique problems (see Exercise 2.26), and thus it suffices to establish the NPhardness of one of these problems. Proposition 2.26: The problems Vertex Cover, Independent Set and Clique are NPcomplete. Teaching note: The following reduction is not the “standard” one (see Exercise 2.27). It is rather adapted from the FGLSSreduction (see Exercise 9.18), and is used here in anticipation of the latter. Furthermore, although the following reduction tends to create a larger graph, the author finds it clearer than the “standard” reduction.
Proof Sketch: We show a reduction from 3SAT to Independent Set. On input a 3CNF formula φ with m clauses and n variables, we construct a graph with 7m vertices, denoted G φ . The vertices are grouped in m cliques, each corresponding to one of the clauses and containing 7 vertices that correspond to the 7 truth assignments (to the 3 variables in the clause) that satisfy the clause. In addition to the internal edges of these m cliques, we add an edge between each pair of vertices that correspond to partial assignments that are mutually inconsistent. That is, if a specific (satisfying) assignment to the variables of the i th clause is inconsistent with some (satisfying) assignment to the variables of the j th clause then we connect the corresponding vertices by an edge. (Note that the internal edges of the m cliques may be viewed as a special case of the edges connecting mutually inconsistent partial assignments.) Thus, on input φ, the reduction outputs the pair (G φ , m). Note that if φ is satisfiable by a truth assignment τ then there are no edges between the m vertices that correspond to the partial satisfying assignment derived from τ . (We stress that any truth assignment to φ yields an independent set, but only a satisfying assignment guarantees that this independent set contains a vertex from each of the m cliques.) Thus, φ ∈ SAT implies that G φ has an independent set of size m. On the other hand, an independent set of size m in G φ must contain exactly one vertex in each of the m cliques, and thus induces a truth assignment that satisfies φ. (We stress that each independent set induces a consistent truth assignment to φ, because the partial assignments selected in the various cliques must be consistent, and that an independent set containing a vertex from a specific clique induces an assignment that satisfies the corresponding clause.) Thus, if G φ has an independent set of size m then φ ∈ SAT. Graph 3Colorability (G3C). In this problem the instances are graphs and the question is whether or not the graph can be colored using three colors such that neighboring vertices are not assigned the same color.
80
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
2
x
3
y
T1
1
M
T2 T3
Figure 2.3: The clause gadget and its subgadget. In a generic 3coloring of the subgadget it must hold that if x = y then x = y = 1. Thus, if the three terminals of the gadget are assigned the same color, χ, then M is also assigned the color χ.
Proposition 2.27: Graph 3Colorability is NPcomplete. Proof Sketch: We reduce 3SAT to G3C by mapping a 3CNF formula φ to the graph G φ , which consists of two special (“designated”) vertices, a gadget per each variable of φ, a gadget per each clause of φ, and edges connecting some of these components. • The two designated vertices are called ground and false, and are connected by an edge that ensures that they must be given different colors in any 3coloring of G φ . We will refer to the color assigned to the vertex ground (resp., false) by the name ground (resp., false). The third color will be called true. • The gadget associated with variable x is a pair of vertices, associated with the two literals of x (i.e., x and ¬x). These vertices are connected by an edge, and each of them is also connected to the vertex ground. Thus, in a 3coloring of G φ one of the vertices associated with the variable is colored true and the other is colored false. • The gadget associated with a clause C is depicted in Figure 2.3. It contains a master vertex, denoted M, and three terminal vertices, denoted T1, T2, and T3. The master vertex is connected by edges to the vertices ground and false, and thus in a 3coloring of G φ the master vertex must be colored true. The gadget has the property that it is possible to color the terminals with any combination of the colors true and false, except for coloring all terminals with false. The terminals of the gadget associated with clause C will be identified with the vertices that are associated with the corresponding literals appearing in C. This means that the various clause gadgets are not vertexdisjoint but may rather share some terminals (with the variable gadgets as well as among themselves).19 See Figure 2.4. Verifying the validity of the reduction is left as an exercise.
2.3.4. NP Sets That Are Neither in P nor NPComplete As stated in Section 2.3.3, thousands of problems have been shown to be NPcomplete (cf., [85, Apdx.], which contains a list of more than three hundreds main entries). Things have reached a situation in which people seem to expect any NPset to be either NPcomplete or in P. This naive view is wrong: Assuming N P = P, there exist, sets 19
Alternatively, we may use disjoint gadgets and “connect” each terminal with the corresponding literal (in the corresponding vertex gadget). Such a connection (i.e., an auxiliary gadget) should force the two endpoints to have the same color in any 3coloring of the graph.
81
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
GROUND
FALSE
the two designated verices
variable gadgets
clause gadgets
Figure 2.4: A single clause gadget and the relevant variables gadgets.
in N P that are neither NPcomplete nor in P, where here NPhardness allows also Cookreductions. Theorem 2.28: Assuming N P = P, there exists a set T in N P \ P such that some sets in N P are not Cookreducible to T . Theorem 2.28 asserts that if N P = P then N P is partitioned into three nonempty classes: the class P, the class of problems to which N P is Cookreducible, and the rest, denoted N PI. We already know that the first two classes are not empty, and Theorem 2.28 establishes the nonemptiness of N PI under the condition that N P = P, which is actually a necessary condition (because if N P = P then every set in N P is Cookreducible to any other set in N P). The following proof of Theorem 2.28 presents an unnatural decision problem in N PI. We mention that some natural decision problems (e.g., some that are computationally equivalent to factoring) are conjectured to be in N PI. We also mention that if N P = def coN P, where coN P = {{0, 1}∗ \ S : S ∈ N P}, then = N P ∩ coN P ⊆ P ∪ N PI holds (as a corollary to Theorem 2.35). Thus, if N P = coN P then \ P is a (natural) subset of N PI, and the nonemptiness of N PI follows provided that = P. Recall that Theorem 2.28 establishes the nonemptiness of N PI under the seemingly weaker assumption that N P = P. Teaching note: We recommend either stating Theorem 2.28 without a proof or merely presenting the proof idea.
Proof Sketch: The basic idea is modifying an arbitrary set in N P \ P so as to fail all possible reductions (from N P to the modified set) as well as all possible polynomialtime decision procedures (for the modified set). Specifically, starting with S ∈ N P \ P, we derive S ⊂ S such that on the one hand there is no polynomialtime
82
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
reduction of S to S while on the other hand S ∈ N P \ P. The process of modifying S into S proceeds in iterations, alternatively failing a potential reduction (by dropping sufficiently many strings from the rest of S) and failing a potential decision procedure (by including sufficiently many strings from the rest of S). Specifically, each potential reduction of S to S can be failed by dropping finitely many elements from the current S , whereas each potential decision procedure can be failed by keeping finitely many elements of the current S . These two assertions are based on the following two corresponding facts: 1. Any polynomialtime reduction (of any set not in P) to any finite set (e.g., a finite subset of S) must fail, because only sets in P are Cookreducible to a finite set. Thus, for any finite set F1 and any potential reduction (i.e., a polynomialtime oracle machine), there exists an input x on which this reduction to F1 fails. We stress that the aforementioned reduction fails while the only queries that are answered positively are those residing in F1 . Furthermore, the aforementioned failure is due to a finite set of queries (i.e., the set of all queries made by the reduction when invoked on an input that is smaller or equal to x). Thus, for every finite set F1 ⊂ S ⊆ S, any reduction of S to S can be failed by dropping a finite number of elements from S and without dropping elements of F1 . 2. For every finite set F2 , any polynomialtime decision procedure for S \ F2 must fail, because S is Cookreducible to S \ F2 . Thus, for any potential decision procedure (i.e., a polynomialtime algorithm), there exists an input x on which this procedure fails. We stress that this failure is due to a finite “prefix” of S \ F2 (i.e., the set {z ∈ S \ F2 : z ≤ x}). Thus, for every finite set F2 , any polynomialtime decision procedure for S \ F2 can be failed by keeping a finite subset of S \ F2 . As stated, the process of modifying S into S proceeds in iterations, alternatively failing a potential reduction (by dropping finitely many strings from the rest of S) and failing a potential decision procedure (by including finitely many strings from the rest of S). This can be done efficiently because it is inessential to determine the first possible points of alternation (in which sufficiently many strings were dropped (resp., included) to fail the next potential reduction (resp., decision procedure)). It suffices to guarantee that adequate points of alternation (albeit highly nonoptimal ones) can be efficiently determined. Thus, S is the intersection of S and some set in P, which implies that S ∈ N P. Following are some comments regarding the implementation of the foregoing idea. The first issue is that the foregoing plan calls for an (“effective”) enumeration of all polynomialtime oracle machines (resp., polynomialtime algorithms). However, none of these sets can be enumerated (by an algorithm). Instead, we enumerate all corresponding machines along with all possible polynomials, and for each pair (M, p) we consider executions of machine M with time bound specified by the polynomial p. That is, we use the machine M p obtained from the pair (M, p) by suspending the execution of M on input x after p(x) steps. We stress that we do not know whether machine M runs in polynomial time, but the computations of any polynomialtime machine is “covered” by some pair (M, p). Next, let us clarify the process in which reductions and decision procedures are ruled out. We present a construction of a “filter” set F in P such that the final set S
83
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
will equal S ∩ F. Recall that we need to select F such that each polynomialtime reduction of S to S ∩ F fails, and each polynomialtime procedure for deciding S ∩ F fails. The key observation is that for every finite F each polynomialtime reduction of S to S ∩ F fails, whereas for every cofinite F (i.e., finite {0, 1}∗ \ F ) each polynomialtime procedure for deciding S ∩ F fails. Furthermore, each of these failures occurs on some input, and such an input can be determined by finite portions of S and F. Thus, we alternate between failing possible reductions and decision procedures on some inputs, while not trying to determine the “optimal” points of alternation but rather determining points of alternation in an efficient manner (which in turn allows for efficiently deciding membership in F). Specifically, we let F = {x : f (x) ≡ 1 mod 2}, where f : N → {0} ∪ N will be defined such that (i) each of the first f (n) − 1 machines is failed by some input of length at most n, and (ii) the value f (n) can be computed in time poly(n). The value of f (n) is defined by the following process that performs exactly n 3 computation steps (where cubic time is a rather arbitrary choice). The process proceeds in (an a priori unknown number of) iterations, where in the i + 1st iteration we try to find an input on which the i + 1st (modified) machine fails. Specifically, in the i + 1st iteration we scan all inputs, in lexicographic order, until we find an input on which the i + 1st (modified) machine fails, where this machine is an oracle machine if i + 1 is odd and a standard machine otherwise. If we detect a failure of the i + 1st machine, then we increment i and proceed to the next iteration. When we reach the allowed number of steps (i.e., n 3 steps), we halt outputting the current value of i (i.e., the current i is output as the value of f (n)). Needless to say, this description is heavily based on determining whether or not the i + 1st machine fails on specific inputs. Intuitively, these inputs will be much shorter than n, and so performing these decisions in time n 3 (or so) is not out of the question – see next paragraph. In order to determine whether or not a failure (of the i + 1st machine) occurs on a particular input x, we need to emulate the computation of this machine on input x as well as determine whether x is in the relevant set (which is either S or S = S ∩ F). Recall that if i + 1 is even then we need to fail a standard machine (which attempts to decide S ) and otherwise we need to fail an oracle machine (which attempts to reduce S to S ). Thus, for even i + 1 we need to determine whether x is in S = S ∩ F, whereas for odd i + 1 we need to determine whether x is in S as well as whether some other strings (which appear as queries) are in S . Deciding membership in S ∈ N P can be done in exponential time (by using the exhaustive search algorithm that tries all possible NPwitnesses). Indeed, this means that when computing f (n) we may only complete the treatment of inputs that are of logarithmic (in n) length, but anyhow in n 3 steps we cannot hope to reach (in our lexicographic scanning) strings of length greater than 3 log2 n. As for deciding membership in F, this requires the ability to compute f on adequate integers. That is, we may need to compute the value of f (n ) for various integers n , but as noted n will be much smaller than n (since n ≤ poly(x) ≤ poly(log n)). Thus, the value of f (n ) is just computed recursively (while counting the recursive steps in our total number of steps).20 The point is that, when considering an input x, we may need the 20
We do not bother to present a more efficient implementation of this process. That is, we may afford to recompute f (n ) every time we need it (rather than store it for later use).
84
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.3. NPCOMPLETENESS
values of f only on {1, . . . , pi+1 (x)}, where pi+1 is the polynomial bounding the running time of the i + 1st (modified) machine, and obtaining such a value takes at most pi+1 (x)3 steps. We conclude that the number of steps performed toward determining whether or not a failure (of the i + 1st machine) occurs on the input x is upperbounded by an (exponential) function of x. As hinted in the foregoing, the procedure will complete n 3 steps long before examining inputs of length greater than 3 log2 n, but this does not matter. What matters is that f is unbounded (see Exercise 2.34). Furthermore, by construction, f (n) is computed in poly(n) time. Comment. The proof of Theorem 2.28 actually establishes that for every S ∈ P there exists S ∈ P such that S is Karpreducible to S but S is not Cookreducible to S .21 Thus, if P = N P then there exists an infinite sequence of sets S1 , S2 , . . . in N P \ P such that Si+1 is Karpreducible to Si but Si is not Cookreducible to Si+1 . That is, there exists an infinite hierarchy of problems (albeit unnatural ones), all in N P, such that each problem is “easier” than the previous ones (in the sense that it can be reduced to the previous problems while these problems cannot be reduced to it).
2.3.5. Reflections on Complete Problems This book will perhaps only be understood by those who have themselves already thought the thoughts which are expressed in it – or similar thoughts. It is therefore not a textbook. Its object would be attained if it afforded pleasure to one who read it with understanding. Ludwig Wittgenstein, Tractatus LogicoPhilosophicus
Indeed, this section should be viewed as an invitation to meditate together on questions of the type what enables the existence of complete problems? Accordingly, the style is intentionally naive and imprecise; this entire section may be viewed as an openended exercise, asking the reader to consider substantiations of the vague text. We know that NPcomplete problems exist. The question we ask here is what aspects in our modeling of problems enables the existence of complete problems. We should, of course, bear in mind that completeness refers to a class of problems; the complete problem should “encode” each problem in the class and be itself in the class. Since the first aspect, hereafter referred to as encodability of a class, is amazing enough (at least to a layman), we start by asking what enables it. We identify two fundamental paradigms, regarding the modeling of problems, that seem essential to the encodability of any (infinite) class of problems: 1. Each problem refers to an infinite set of possible instances. 2. The specification of each problem uses a finite description (e.g., an algorithm that enumerates all the possible solutions for any given instance).22 These two paradigms seem somewhat conflicting, yet put together they suggest the definition of a universal problem, that is, a problem that refers to instances of the form (D, x), The said Karpreduction (of S to S) maps x to itself if x ∈ F and otherwise maps x to a fixed noinstance of S. This seems the most naive notion of a description of a problem. An alternative notion of a description refers to an algorithm that recognizes all valid instancesolution pairs (as in the definition of NP). However, at this point, we allow also “noneffective” descriptions (as giving rise to the Halting Problem). 21 22
85
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
where D is a description of a problem and x is an instance to that problem (and we seek a solution to x with respect to D). Intuitively, this universal problem can encode any other problem (provided that problems are modeled in a way that conforms with the foregoing paradigms): Solving the universal problem allows solving any other problem.23 Note that the foregoing universal problem is actually complete with respect to the class of all problems, but it is not complete with respect to any class that contains only (algorithmically) solvable problems (because this universal problem is not solvable). Turning our attention to classes of solvable problems, we seek versions of the universal problem that are complete for these classes. One archetypical difficulty that arises is that, given a description D (as part of the instance to the universal problem), we cannot tell whether or not D is a description of a problem in a predetermined class C (because this decision problem is unsolvable). This fact is relevant because24 if the universal problem requires solving instances that refer to a problem not in C then intuitively it cannot be itself in C. Before turning to the resolution of the foregoing difficulty, we note that the aforementioned modeling paradigms are pivotal to the theory of computation at large. In particular, so far we made no reference to any complexity consideration. Indeed, a complexity consideration is the key to resolving the foregoing difficulty: The idea is modifying any description D into a description D such that D is always in C, and D agrees with D in the case that D is in C (i.e., in this case they described exactly the same problem). We stress that in the case that D is not in C, the corresponding problem D may be arbitrary (as long as it is in C). Such a modification is possible with respect to many complexity theoretic classes. We consider two different types of classes, where in both cases the class is defined in terms of the time complexity of algorithms that do something related to the problem (e.g., recognize valid solutions, as in the definition of NP). 1. Classes defined by a single timebound function t (e.g., t(n) = n 3 ). In this case, any algorithm D is modified to the algorithm D that, on input x, emulates (up to) t(x) steps of the execution of D(x). The modified version of the universal problem treats the instance (D, x) as (D , x). This version can encode any problem in the said class C. But will this (version of the universal) problem be itself in C? The answer depends both on the efficiency of emulation in the corresponding computational model and on the growth rate of t. For example, for tripleexponential t, the answer will be definitely yes, because t(x) steps can be emulated in poly(t(x)) time (in any reasonable model) while t((D, x)) > t(x + 1) > poly(t(x)). On the other hand, in most reasonable models, the emulation of t(x) steps requires ω(t(x)) time while for any polynomial t it holds that t(n + O(1)) < 2t(n). 2. Classes defined by a family of infinitely many functions of different growth rate (e.g., polynomials). We can, of course, select a function t that grows faster than any function in the family and proceed as in the prior case, but then the resulting universal problem will definitely not be in the class. 23
Recall, however, that the universal problem is not (algorithmically) solvable. Thus, both clauses of the implication are false. Indeed, the notion of a problem is rather vague at this stage; it certainly extends beyond the set of all solvable problems. 24 Here we ignore the possibility of using promise problems, which do enable avoiding such instances without requiring anybody to recognize them. Indeed, using promise problems resolves this difficulty, but the issues discussed following the next paragraph remain valid.
86
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.4. THREE RELATIVELY ADVANCED TOPICS
Note that in the current case, a complete problem will indeed be striking because, in particular, it will be associated with one function t0 that grows more moderately than some other functions in the family (e.g., a fixed polynomial grows more moderately than other polynomials). Seemingly this means that the algorithm describing the universal machine should be faster than some algorithms that describe some other problems in the class. This impression presumes that the instances of both problems are (approximately) of the same length, and so we intensionally violate this presumption by artificially increasing the length of the description of the instances to the universal problem. For example, if D is associated with the time bound t D , then the instance −1 2 (D, x) to the universal problem is presented as, say, (D, x, 1t0 (t D (x) ) ), where in the case of NP we used t0 (n) = n. We believe that the last item explains the existence of NPcomplete problems. But what about the NPcompleteness of SAT? We first note that the NPhardness of CSAT is an immediate consequence of the fact that Boolean circuits can emulate algorithms.25 This fundamental fact is rooted in the notion of an algorithm (which postulates the simplicity of a single computational step) and holds for any reasonable model of computation. Thus, for every D and x, the problem of finding a string y such that D(x, y) = 1 is “encoded” as finding a string y such that C D,x (y) = 1, where C D,x is a Boolean circuit that is easily derived from (D, x). In contrast to the fundamental fact underlying the NPhardness of CSAT, the NPhardness of SAT relies on a clever trick that allows for encoding instances of CSAT as instances of SAT. As stated, the NPcompleteness of SAT is proved by encoding instances of CSAT as instances of SAT. Similarly, the NPcompleteness of other new problems is proved by encoding instances of problems that are already known to be NPcomplete. Typically, these encodings operate in a local manner, mapping small components of the original instance to local gadgets in the produced instance. Indeed, these problemspecific gadgets are the core of the encoding phenomenon. Presented with such a gadget, it is typically easy to verify that it works. Thus, one cannot be surprised by most of these gadgets, but the fact that they exist for thousands of natural problem is definitely amazing.
2.4. Three Relatively Advanced Topics In this section we discuss three relatively advanced topics. The first topic, which was eluded to in previous sections, is the notion of promise problems (Section 2.4.1). Next we present an optimal search algorithm for NP (Section 2.4.2), and discuss the class (coNP) of sets that are complements of sets in NP. Teaching note: These topics are typically not mentioned in a basic course on complexity. Still, depending on time constraints, we suggest discussing them at least at a high level.
2.4.1. Promise Problems Promise problems are a natural generalization of search and decision problems, where one explicitly considers a set of legitimate instances (rather than considering any string as 25
The fact that CSAT is in NP is a consequence of the fact that the circuit evaluation problem is solvable in polynomial time.
87
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
a legitimate instance). As noted previously, this generalization provides a more adequate formulation of natural computational problems (and indeed this formulation is used in all informal discussions). For example, in §2.3.3.2 we presented such problems using phrases like “given a graph and an integer . . .” (or “given a collection of sets . . .”). In other words, we assumed that the input instance has a certain format (or rather we “promised the solver” that this is the case). Indeed, we claimed that in these cases the assumption can be removed without affecting the complexity of the problem, but we avoided providing a formal treatment of this issue, which is done next. Teaching note: The notion of promise problems was originally introduced in the context of decision problems, and is typically used only in that context. However, we believe that promise problems are as natural in the context of search problems.
2.4.1.1. Definitions In the context of search problems, a promise problem is a relaxation in which one is only required to find solutions to instances in a predetermined set, called the promise. The requirement regarding efficient checkability of solutions is adapted in an analogous manner. Definition 2.29 (search problems with a promise): A search problem with a promise consists of a binary relation R ⊆ {0, 1}∗ × {0, 1}∗ and a promise set P. Such a problem is also referred to as the search problem R with promise P. • The search problem R with promise P is solved by algorithm A if for every x ∈ P it holds that (x, A(x)) ∈ R if x ∈ S R = {x : R(x) = ∅} and A(x) = ⊥ otherwise, where R(x) = {y : (x, y) ∈ R}. def The time complexity of A on inputs in P is defined as T AP (n) = maxx∈P∩{0,1}n {t A (x)}, where t A (x) is the running time of A(x) and T AP (n) = 0 if P ∩ {0, 1}n = ∅. • The search problem R with promise P is in the promise problem extension of PF if there exists a polynomialtime algorithm that solves this problem.26 • The search problem R with promise P is in the promise problem extension of PC if there exists a polynomial T and an algorithm A such that, for every x ∈ P and y ∈ {0, 1}∗ , algorithm A makes at most T (x) steps and it holds that A(x, y) = 1 if and only if (x, y) ∈ R. We stress that nothing is required of the solver in the case that the input violates the promise (i.e., x ∈ P); in particular, in such a case the algorithm may halt with a wrong output. (Indeed, the standard formulation of search problems is obtained by considering the trivial promise P = {0, 1}∗ .)27 In addition to the foregoing motivation for promise problems, we mention one natural class of search problems with a promise. These are search problem in which the promise is that the instance has a solution (i.e., in terms of 26
In this case it does not matter whether the time complexity of A is defined on inputs in P or on all possible strings. Suppose that A has (polynomial) time complexity T on inputs in P; then we can modify A to halt on any input x after at most T (x) steps. This modification may only effects the output of A on inputs not in P (which is OK by us). The modification can be implemented in polynomial time by computing t = T (x) and emulating the execution of A(x) for t steps. A similar comment applies to the definition of PC, P, and N P. 27 Here we refer to the formulation presented in Section 2.1.4.
88
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.4. THREE RELATIVELY ADVANCED TOPICS def
the foregoing notation P = S R , where S R = {x : ∃y s.t. (x, y) ∈ R}). We refer to such search problems by the name candid search problems. Definition 2.30 (candid search problems): An algorithm A solves the candid search problem of the binary relation R if for every x ∈ S R (i.e., for every (x, y) ∈ R) it holds that (x, A(x)) ∈ R. The time complexity of such an algorithm is defined def as T AS R (n) = maxx∈P∩{0,1}n {t A (x)}, where t A (x) is the running time of A(x) and T AS R (n) = 0 if P ∩ {0, 1}n = ∅. Note that nothing is required when x ∈ S R : In particular, algorithm A may either output a wrong solution (although no solutions exist) or run for more than T AS R (x) steps. The first case can be essentially eliminated whenever R ∈ PC. Furthermore, for R ∈ PC, if we “know” the time complexity of algorithm A (e.g., if we can compute T AS R (n) in poly(n)time), then we may modify A into an algorithm A that solves the (general) search problem of R (i.e., halts with a correct output on each input) in time T A (n) = T AS R (n) + poly(n). However, we do not necessarily know the running time of an algorithm that we consider. Furthermore, as we shall see in Section 2.4.2, the naive assumption by which we always know the running time of an algorithm that we design is not valid either. Decision problems with a promise. In the context of decision problems, a promise problem is a relaxation in which one is only required to determine the status of instances that belong to a predetermined set, called the promise. The requirement of efficient verification is adapted in an analogous manner. In view of the standard usage of the term, we refer to decision problems with a promise by the name promise problems. Formally, promise problems refer to a threeway partition of the set of all strings into yesinstances, noinstances, and instances that violate the promise. Standard decision problems are obtained as a special case by insisting that all inputs are allowed (i.e., the promise is trivial). Definition 2.31 (promise problems): A promise problem consists of a pair of nonintersecting sets of strings, denoted (Syes , Sno ), and Syes ∪ Sno is called the promise. • The promise problem (Syes , Sno ) is solved by algorithm A if for every x ∈ Syes it holds that A(x) = 1 and for every x ∈ Sno it holds that A(x) = 0. The promise problem is in the promise problem extension of P if there exists a polynomialtime algorithm that solves it. • The promise problem (Syes , Sno ) is in the promise problem extension of N P if there exists a polynomial p and a polynomialtime algorithm V such that the following two conditions hold: 1. Completeness: For every x ∈ Syes , there exists y of length at most p(x) such that V (x, y) = 1. 2. Soundness: For every x ∈ Sno and every y, it holds that V (x, y) = 0. We stress that for algorithms of polynomialtime complexity, it does not matter whether the time complexity is defined only on inputs that satisfy the promise or on all strings (see footnote 26). Thus, the extended classes P and N P (like PF and PC) are invariant under this choice.
89
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Reducibility among promise problems. The notion of a Cookreduction extend naturally to promise problems, when postulating that a query that violates the promise (of the problem at the target of the reduction) may be answered arbitrarily.28 That is, the oracle machine should solve the original problem no matter how queries that violate the promise are answered. The latter requirement is consistent with the conceptual meaning of reductions and promise problems. Recall that reductions capture procedures that make subroutine calls to an arbitrary procedure that solves the reduced problem. But, in the case of promise problems, such a solver may behave arbitrarily on instances that violate the promise. We stress that the main property of a reduction is preserved (see Exercise 2.35): If the promise problem is Cookreducible to a promise problem that is solvable in polynomial time, then is solvable in polynomial time. We warn that the extension of a complexity class to promise problems does not necessarily inherit the “structural” properties of the standard class. For example, in contrast to Theorem 2.35, there exists promise problems in N P ∩ coN P such that every set in N P can be Cookreduced to them: see Exercise 2.36. Needless to say, N P = coN P does not seem to follow from Exercise 2.36. See further discussion at the end of §2.4.1.2.
2.4.1.2. Applications The following discussion refers both to the decision and search versions of promise problems. Recall that promise problems offer the most direct way of formulating natural computational problems (e.g., when referring to computational problems regarding graphs, the promise mandates that the input is a graph). In addition to the foregoing application of promise problems, we mention their use in formulating the natural notion of a restriction of a computational problem to a subset of the instances. Specifically, such a restriction means that the promise set of the restricted problem is a subset of the promise set of the unrestricted problem. Definition 2.32 (restriction of computational problems): • For any P ⊆ P and binary relation R, we say that the search problem R with promise P is a restriction of the search problem R with promise P. , Sno ) is a restriction of the promise problem • We say that the promise problem (Syes (Syes , Sno ) if both Syes ⊆ Syes and Sno ⊆ Sno hold. For example, when we say that 3SAT is a restriction of SAT, we refer to the fact that the set of allowed instances is now restricted to 3CNF formulae (rather than to arbitrary CNF formulae). In both cases, the computational problem is to determine satisfiability (or to find a satisfying assignment), but the set of instances (i.e., the promise set) is further restricted in the case of 3SAT. The fact that a restricted problem is never harder than the original problem is captured by the fact that the restricted problem is reducible to the original one (via the identity mapping). Other uses and some reservations. In addition to the two aforementioned generic uses of the notion of a promise problem, we mention that this notion provides adequate 28
It follows that Karpreductions among promise problems are not allowed to make queries that violate the promise. Specifically, we say that the promise problem = (yes , no ) is Karpreducible to the promise problem = (yes , no ) if there exists a polynomialtime mapping f such that for every x ∈ yes (resp., x ∈ no ) it holds that f (x) ∈ yes (resp., f (x) ∈ no ).
90
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.4. THREE RELATIVELY ADVANCED TOPICS
formulations for a variety of specific computational complexity notions and results. Examples include the notion of “unique solutions” (see Section 6.2.3) and the formulation of “gap problems” as capturing various approximation tasks (see Section 10.1). In all these cases, promise problems allow for discussing natural computational problems and making statements about their inherent complexity. Thus, the complexity of promise problems (and classes of such problems) addresses natural questions and concerns. Consequently, demonstrating the intractability of a promise problem that belongs to some class (e.g., saying that some promise problem in N P cannot be solved by a polynomialtime algorithm) carries the same conceptual message as demonstrating the intractability of a standard problem in the corresponding class. In contrast, as indicated at the end of §2.4.1.1, structural properties of promise problems may not hold for the corresponding classes of standard problems (e.g., see Exercise 2.36). Indeed, we do distinguish here between the inherent (or absolute) properties such as intractability and structural (or relative) properties such as reducibility.
2.4.1.3. The Standard Convention of Avoiding Promise Problems Recall that, although promise problems provide a good framework for presenting natural computational problems, we managed to avoid this framework in previous sections. This was done by relying on the fact that, for all the (natural) problems considered in the previous sections, it is easy to decide whether or not a given instance satisfies the promise. For example, given a formula it is easy to decide whether or not it is in CNF (or 3CNF). Actually, the issue arises already when talking about formulae: What we are actually given is a string that is supposed to encode a formula (under some predetermined encoding scheme), and so the promise (which is easy to decide for natural encoding schemes) is that the input string is a valid encoding of some formula. In any case, if the promise is efficiently recognizable (i.e., membership in it can be decided in polynomial time), then we may avoid mentioning the promise by using one of the following two “nasty” conventions: 1. Extending the set of instances to the set of all possible strings (and allowing trivial solutions for the corresponding dummy instances). For example, in the case of a search problem, we may either define all instances that violate the promise to have no solution or define them to have a trivial solution (e.g., be a solution for themselves); that is, for a search problem R with promise P, we may consider the (standard) search problem of R where R is modified such that R(x) = ∅ for every x ∈ P (or, say, R(x) = {x} for every x ∈ P). In the case of a promise (decision) problem (Syes , Sno ), we may consider the problem of deciding membership in Syes , which means that instances that violate the promise are considered as noinstances. 2. Considering every string as a valid encoding of an object that satisfies the promise. That is, fixing any string x0 that satisfies the promise, we consider every string that violates the promise as if it were x0 . In the case of a search problem R with promise P, this means considering the (standard) search problem of R where R is modified such that R(x) = R(x0 ) for every x ∈ P. Similarly, in the case of a promise (decision) problem (Syes , Sno ), we consider the problem of deciding membership in Syes (provided x0 ∈ Sno and otherwise we consider the problem of deciding membership in {0, 1}∗ \ Sno ). We stress that in the case that the promise is efficiently recognizable the aforementioned conventions (or modifications) do not effect the complexity of the relevant (search or
91
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
decision) problem. That is, rather than considering the original promise problem, we consider a (search or decision) problem (without a promise) that is computationally equivalent to the original one. Thus, in some sense we lose nothing by studying the latter problem rather than the original one. On the other hand, even in the case that these two problems are computationally equivalent, it is useful to have a formulation that allows for distinguishing between them (as we do distinguish between the different NPcomplete problems although they are all computationally equivalent). This conceptual concern becomes of crucial importance in the case (to be discussed next) that the promise is not efficiently recognizable. The foregoing transformations of promise problems into computationally equivalent standard (decision and search) problems do not necessarily preserve the complexity of the problem in the case that the promise is not efficiently recognizable. In this case, the terminology of promise problems is unavoidable. Consider, for example, the problem of deciding whether a Hamiltonian graph is 3colorable. On the face of it, such a problem may have fundamentally different complexity than the problem of deciding whether a given graph is both Hamiltonian and 3colorable. In spite of the foregoing opinions, we adopt the convention of focusing on standard decision and search problems. That is, by default, all complexity classes discussed in this book refer to standard decision and search problems, and the exceptions in which we refer to promise problems are explicitly stated as such. Such exceptions appear in Sections 2.4.2, 6.1.3, 6.2.3, and 10.1.
2.4.2. Optimal Search Algorithms for NP We actually refer to solving the candid search problem of any relation in PC. Recall that PC is the class of search problems that allow for efficient checking of the correctness of candidate solutions (see Definition 2.3), and that the candid search problem is a search problem in which the solver is promised that the given instance has a solution (see Definition 2.30). We claim the existence of an optimal algorithm for solving the candid search problem of any relation in PC. Furthermore, we will explicitly present such an algorithm, and prove that it is optimal in a very strong sense: For any algorithm solving the candid search problem of R ∈ PC, our algorithm solves the same problem in time that is at most a constant factor slower (ignoring a fixed additive polynomial term, which may be disregarded in the case that the problem is not solvable in polynomial time). Needless to say, we do not know the time complexity of the aforementioned optimal algorithm (indeed if we knew it then we would have resolved the PvsNP Question). In fact, the PvsNP Question boils down to determining the time complexity of a single explicitly presented algorithm (i.e., the optimal algorithm claimed in Theorem 2.33). Theorem 2.33: For every binary relation R ∈ PC there exists an algorithm A that satisfies the following: 1. A solves the candid search problem of R. 2. There exists a polynomial p such that for every algorithm A that solves the candid search problem of R and for every x ∈ S R (i.e., for every (x, y) ∈ R) it holds that t A (x) = O(t A (x) + p(x)), where t A (x) (resp., t A (x)) denotes the number of steps taken by A (resp., A ) on input x.
92
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.4. THREE RELATIVELY ADVANCED TOPICS
Interestingly, we establish the optimality of A without knowing what its (optimal) running time is. Furthermore, the optimality claim is “pointwise” (i.e., it refers to any input) rather than “global” (i.e., referring to the (worstcase) time complexity as a function of the input length). We stress that the hidden constant in the Onotation depends only on A , but in the following proof this dependence is exponential in the length of the description of algorithm A (and it is not known whether a better dependence can be achieved). Indeed, this dependence as well as the idea underlying it constitute one negative aspect of this otherwise amazing result. Another negative aspect is that the optimality of algorithm A refers only to inputs that have a solution (i.e., inputs in S R ). Finally, we note that the theorem as stated refers only to models of computation that have machines that can emulate a given number of steps of other machines with a constant overhead. We mention that in most natural models the overhead of such emulation is at most polylogarithmic in the number A (x) + p(x)). of steps, in which case it holds that t A (x) = O(t Proof Sketch: Fixing R, we let M be a polynomialtime algorithm that decides membership in R, and let p be a polynomial bounding the running time of M (as a function of the length of the first element in the input pair). Using M, we present an algorithm A that solves the candid search problem of R as follows. On input x, algorithm A emulates all possible search algorithms “in parallel” (on input x), checks the result provided by each of them (using M), and halts whenever it recognizes a correct solution. Indeed, most of the emulated algorithms are totally irrelevant to the search, but using M we can screen the bad solutions offered by them and output a good solution once obtained. Since there are infinitely many possible algorithms, it may not be clear what we mean by the expression “emulating all possible algorithms in parallel.” What we mean is emulating them at different “rates” such that the infinite sum of these rates converges to 1 (or to any other constant). Specifically, we will emulate the i th possible algorithm at rate 1/(i + 1)2 , which means emulating a single step of this algorithm per (i + 1)2 emulation steps (performed for all algorithms). Note that a straightforward implementation of this idea may create a significant overhead, involved in switching frequently from the emulation of one machine to the emulation of another. Instead, we present an alternative implementation that proceeds in iterations. In the j th iteration, for i = 1, . . . , 2 j/2 − 1, algorithm A emulates 2 j /(i + 1)2 steps of the i th machine (where the machines are ordered according to the lexicographic order of their descriptions). Each of these emulations is conducted in one chunk, and thus the overhead of switching between the various emulations is insignificant (in comparison to the total number of steps being emulated). In the case that one or more of these emulations (on input x) halt with output y, algorithm A invokes M on input (x, y) and output y if and only if M(x, y) = 1. Furthermore, the verification of a solution provided by a candidate algorithm is also emulated at the expense of its step count. (Put in other words, we augment each algorithm with a canonical procedure (i.e., M) that checks the validity of the solution offered by the algorithm.) By its construction, whenever A(x) outputs a string y (i.e., y = ⊥) it must hold that (x, y) ∈ R. To show the optimality of A, we consider an arbitrary algorithm A that solves the candid search problem of R. Our aim is to show that A is
93
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
not much slower than A . Intuitively, this is the case because the overhead of A results from emulating other algorithms (in addition to A ), but the total number of emulation steps wasted (due to these algorithms) is inversely proportional to the rate of algorithm A , which in turn is exponentially related to the length of the description of A . The punch line is that since A is fixed, the length of its description is a constant. Details follow. For every x, let us denote by t (x) the number of steps taken by A on input x, where t (x) also accounts for the running time of M(x, ·); that is, t (x) = t A (x) + p(x), where t A (x) is the number of steps taken by A (x) itself. Then, the emulation of t (x) steps of A on input x is “covered” by the j th iteration of A, provided that 2 j /(2A +1 )2 ≥ t (x) where A  denotes the length of the description of A . (Indeed, we use the fact that the algorithms are emulated in lexicographic order, and note that there are at most 2A +1 − 2 algorithms that precede A in lexicographic order.) Thus, on input x, algorithm A halts after at most j A (x) iterations, where j A (x) = 2(A  + 1) + log2 (t A (x) + p(x)), after emulating a total number of steps that is at most def
t(x) =
j A (x) 2 j/2 −1 j=1
i=1
2j < 2 j A (x)+1 = 22A +3 · (t A (x) + p(x)). (i + 1)2
The question of how much time is required for emulating these many steps depends on the specific model of computation. In many models of computation, the emulation steps of the emulating of t steps of one machine by another machine requires O(t) machines, and in some models this emulation can even be performed with constant overhead. The theorem follows. Comment. By construction, the foregoing algorithm A does not halt on input x ∈ S R . This can be easily rectified by letting A emulate a straightforward exhaustive search for a solution, and halt with output ⊥ if and only if this exhaustive search indicates that there is no solution to the current input. This extra emulation can be performed in parallel to all other emulations (e.g., at a rate of one step for the extra emulation per each step of everything else).
2.4.3. The Class coNP and Its Intersection with NP By prepending the name of a complexity class (of decision problems) with the prefix “co” we mean the class of complement sets; that is, coC = {{0, 1}∗ \ S : S ∈ C}. def
(2.4)
Specifically, coN P = {{0, 1}∗ \ S : S ∈ N P} is the class of sets that are complements of sets in N P. Recalling that sets in N P are characterized by their witness relations such that x ∈ S if and only if there exists an adequate NPwitness, it follows that their complement sets consist of all instances for which there are no NPwitnesses (i.e., x ∈ {0, 1}∗ \ S if there is no NPwitness for x being in S). For example, SAT ∈ N P implies that the set of unsatisfiable CNF formulae is in coN P. Likewise, the set of graphs that are not 3colorable is in coN P. (Jumping ahead, we mention that it is widely believed that these sets are not in N P.)
94
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
2.4. THREE RELATIVELY ADVANCED TOPICS
Another perspective on coN P is obtained by considering the search problems in PC. Recall that for such R ∈ PC, the set of instances having a solution (i.e., S R = {x : ∃y s.t. (x, y) ∈ R}) is in N P. It follows that the set of instances having no solution (i.e., {0, 1}∗ \ S R = {x : ∀y (x, y) ∈ R}) is in coN P. It is widely believed that N P = coN P (which means that N P is not closed under complementation). Indeed, this conjecture implies P = N P (because P is closed under complementation). The conjecture N P = coN P means that some sets in coN P do not have NPproof systems (because N P is the class of sets having NPproof systems). As we will show next, under this conjecture, the complements of NPcomplete sets do not have NPproof systems; for example, there exists no NPproof system for proving that a given CNF formula is not satisfiable. We first establish this fact for NPcompleteness in the standard sense (i.e., under Karpreductions, as in Definition 2.17). Proposition 2.34: Suppose that N P = coN P and let S ∈ N P such that every set def in N P is Karpreducible to S. Then S = {0, 1}∗ \ S is not in N P. Proof Sketch: We first observe that the fact that every set in N P is Karpreducible to S implies that every set in coN P is Karpreducible to S. We next claim that if S is in N P then every set that is Karpreducible to S is also in N P. Applying the claim to S = S, we conclude that S ∈ N P implies coN P ⊆ N P, which in turn implies N P = coN P in contradiction to the main hypothesis. We now turn to prove the foregoing claim; that is, we prove that if S has an NPproof system and S is Karpreducible to S then S has an NPproof system. Let V be the verification procedure associated with S , and let f be a Karpreduction of S to S . Then, we define the verification procedure V (for membership in S ) by V (x, y) = V ( f (x), y). That is, any NPwitness that f (x) ∈ S serves as an NPwitness for x ∈ S (and these are the only NPwitnesses for x ∈ S ). This may not be a “natural” proof system (for S ), but it is definitely an NPproof system for S . Assuming that N P = coN P, Proposition 2.34 implies that sets in N P ∩ coN P cannot be NPcomplete with respect to Karpreductions. In light of other limitations of Karpreductions (see, e.g., Exercise 2.7), one may wonder whether or not the exclusion of NPcomplete sets from the class N P ∩ coN P is due to the use of a restricted notion of reductions (i.e., Karpreductions). The following theorem asserts that this is not the case: Some sets in N P cannot be reduced to sets in the intersection N P ∩ coN P even under general reductions (i.e., Cookreductions). Theorem 2.35: If every set in N P can be Cookreduced to some set in N P ∩ coN P then N P = coN P. In particular, assuming N P = coN P, no set in N P ∩ coN P can be NPcomplete, even when NPcompleteness is defined with respect to Cookreductions. Since N P ∩ coN P is conjectured to be a proper superset of P, it follows (assuming N P = coN P) that there are decision problems in N P that are neither in P nor NPhard (i.e., specifically, the decision problems in (N P ∩ coN P) \ P). We stress that Theorem 2.35 refers to standard decision problems and not to promise problems (see Section 2.4.1 and Exercise 2.36).
95
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Proof: Analogously to the proof of Proposition 2.34 , the current proof boils down to proving that if S is Cookreducible to a set in N P ∩ coN P then S ∈ N P ∩ coN P. Using this claim, the theorem’s hypothesis implies that N P ⊆ N P ∩ coN P, which in turn implies N P ⊆ coN P and N P = coN P (see Exercise 2.37). Fixing any S and S ∈ N P ∩ coN P such that S is Cookreducible to S , we prove that S ∈ N P (and the proof that S ∈ coN P is similar).29 Let us denote by M the oracle machine reducing S to S . That is, on input x, machine M makes queries and decides whether or not to accept x, and its decision is correct provided that all queries are answered according to S . To show that S ∈ N P, we will present an NPproof system for S. This proof system (or rather its verification procedure), denoted V , accepts a pair of the form (x, ((z 1 , σ1 , w1 ), . . . , (z t , σt , wt )) if the following two conditions hold: 1. On input x, machine M accepts after making the queries z 1 , . . . , z t , and obtaining the corresponding answers σ1 , . . . , σt . That is, V checks that, on input x, after obtaining the answers σ1 , . . . , σi−1 to the first i − 1 queries, the i th query made by M equals z i . In addition, V checks that, on input x and after receiving the answers σ1 , . . . , σt , machine M halts with output 1 (indicating acceptance). Note that V does not have oracle access to S . The procedure V rather emulates the computation of M(x) by answering, for each i, the i th query of M(x) by using the bit σi (provided to V as part of its input). The correctness of these answers will be verified (by V ) separately (i.e., see the next item). 2. For every i, it holds that if σi = 1 then wi is an NPwitness for z i ∈ S , whereas if σi = 0 then wi is an NPwitness for z i ∈ {0, 1}∗ \ S . Thus, if this condition holds then it is the case that each σi indicates the correct status of z i with respect to S (i.e., σi = 1 if and only if z i ∈ S ). def
We stress that we use the fact that both S and S = {0, 1}∗ \ S have NPproof systems, and refer to the corresponding NPwitnesses. Note that V is indeed an NPproof system for S. Firstly, the length of the corresponding witnesses is bounded by the running time of the reduction (and the length of the NPwitnesses supplied for the various queries). Next note that V runs in polynomialtime (i.e., verifying the first condition requires an emulation of the polynomialtime execution of M on input x when using the σi ’s to emulate the oracle, whereas verifying the second condition is done by invoking the relevant NPproof systems). Finally, observe that x ∈ S if and only if there exists a def sequence y = ((z 1 , σ1 , w1 ), . . . , (z t , σt , wt )) such that V (x, y) = 1. In particular, V (x, y) = 1 holds only if y contains a valid sequence of queries and answers as made in a computation of M on input x and oracle access to S , and M accepts based on that sequence. The world view – a digest. Recall that on top of the P = N P conjecture, we mentioned two other conjectures (which clearly imply P = N P): def
Alternatively, we show that S ∈ coN P by applying the following argument to S = {0, 1}∗ \ S and noting that S is Cookreducible to S (via S, or alternatively that S is Cookreducible to {0, 1}∗ \ S ∈ N P ∩ coN P). 29
96
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER NOTES
NP
NPC P coNPC
coNP
Figure 2.5: The world view under P = coN P ∩ N P = N P .
1. The conjecture that N P = coN P (equivalently, N P ∩ coN P = N P). This conjecture is equivalent to the conjecture that CNF formulae have no short proofs of unsatisfiability (i.e., the set {0, 1}∗ \ SAT has no NPproof system). 2. The conjecture that N P ∩ coN P = P. Notable candidates for the class N P ∩ coN P = P include decision problems that are computationally equivalent to the integer factorization problem (i.e., the search problem (in PC) in which, given a composite number, the task is to find its prime factors). Combining these conjectures, we get the world view depicted in Figure 2.5, which also shows the class of coN Pcomplete sets (defined next). Definition 2.36: A set S is called coN P hard if every set in coN P is Karpreducible to S. A set is called coN P complete if it is both in coN P and coN Phard. Indeed, insisting on Karpreductions is essential for a distinction between N Phardness and coN Phardness.
Chapter Notes Many sources provide historical accounts of the developments that led to the formulation of the PvsNP Problem and to the discovery of the theory of NPcompleteness (see, e.g., [85, Sec. 1.5] and [221]). Still, we feel that we should not refrain from offering our own impressions, which are based on the texts of the original papers. Nowadays, the theory of NPcompleteness is commonly attributed to Cook [58], Karp [138], and Levin [152]. It seems that Cook’s starting point was his interest in theoremproving procedures for propositional calculus [58, p. 151]. Trying to provide evidence of the difficulty of deciding whether or not a given formula is a tautology, he identified N P as a class containing “many apparently difficult problems” (cf, e.g., [58, p. 151]), and showed that any problem in N P is reducible to deciding membership in the set of 3DNF tautologies. In particular, Cook emphasized the importance of the concept of polynomialtime reductions and the complexity class N P (both explicitly defined for
97
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
the first time in his paper). He also showed that CLIQUE is computationally equivalent to SAT, and envisioned a class of problems of the same nature. Karp’s paper [138] can be viewed as fulfilling Cook’s prophecy: Stimulated by Cook’s work, Karp demonstrated that a “large number of classic difficult computational problems, arising in fields such as mathematical programming, graph theory, combinatorics, computational logic and switching theory, are [NP]complete (and thus equivalent)” [138, p. 86]. Specifically, his list of twentyone NPcomplete problems includes Integer Linear Programming, Hamilton Circuit, Chromatic Number, Exact Set Cover, Steiner Tree, Knapsack, Job Scheduling, and Max Cut. Interestingly, Karp defined N P in terms of verification procedures (i.e., Definition 2.5), pointed to its relation to “backtrack search of polynomial bounded depth” [138, p. 86], and viewed N P as the residence of a “wide range of important computational problems” (which are not in P). Independently of these developments, while being in the USSR, Levin proved the existence of “universal search problems” (where universality meant NPcompleteness). The starting point of Levin’s work [152] was his interest in the “perebor” conjecture asserting the inherent need for brute force in some search problems that have efficiently checkable solutions (i.e., problems in PC). Levin emphasized the implication of polynomialtime reductions on the relation between the time complexity of the related problems (for any growth rate of the time complexity), asserted the NPcompleteness of six “classical search problems,” and claimed that the underlying method “provides a mean for readily obtaining” similar results for “many other important search problems.” It is interesting to note that although the works of Cook [58], Karp [138], and Levin [152] were received with different levels of enthusiasm, none of the contemporaries realized the depth of the discovery and the difficulty of the question posed (i.e., the PvsNP Question). This fact is evident in every account from the early 1970s, and may explain the frustration of the corresponding generation of researchers, which expected the PvsNP Question to be resolved in their lifetime (if not in a matter of years). Needless to say, the author’s opinion is that there was absolutely no justification for these expectations, and that one should have actually expected quite the opposite. We mention that the three “founding papers” of the theory of NPcompleteness (i.e., Cook [58], Karp [138], and Levin [152]) use the three different types of reductions used in this chapter. Specifically, Cook uses the general notion of polynomialtime reduction [58], often referred to as Cookreductions (Definition 2.9). The notion of Karpreductions (Definition 2.11) originates from Karp’s paper [138], whereas its augmentation to search problems (i.e., Definition 2.12) originates from Levin’s paper [152]. It is worth stressing that Levin’s work is stated in terms of search problems, unlike Cook’s and Karp’s works, which treat decision problems. The reductions presented in §2.3.3.2 are not necessarily the original ones. Most notably, the reduction establishing the NPhardness of the Independent Set problem (i.e., Proposition 2.26) is adapted from [74] (see also Exercise 9.18). In contrast, the reductions presented in §2.3.3.1 are merely a reinterpretation of the original reduction as presented in [58]. The equivalence of the two definitions of N P (i.e., Theorem 2.8) was proved in [138]. The existence of NPsets that are neither in P nor NPcomplete (i.e., Theorem 2.28) was proven by Ladner [149], Theorem 2.35 was proven by Selman [198], and the existence of optimal search algorithms for NPrelations (i.e., Theorem 2.33) was proven by Levin [152]. (Interestingly, the latter result was proven in the same paper in which Levin presented the discovery of NPcompleteness, independently of Cook and Karp.) Promise problems
98
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
were explicitly introduced by Even, Selman, and Yacobi [72]; see [94] for a survey of their numerous applications. We mention that the standard reductions used to establish natural NPcompleteness results have several additional properties or can be modified to have such properties. These properties include an efficient transformation of solutions in the direction of the reduction (see Exercise 2.28), the preservation of the number of solutions (see Exercise 2.29), the computability by a logspace algorithm (see Section 5.2.2), and the invertibility in polynomialtime (see Exercise 2.30). We also mention the fact that all known NPcomplete sets are (effectively) isomorphic (see Exercise 2.31).
Exercises Exercise 2.1 (PF contains problems that are not in PC): Show that PF contains some (unnatural) problems that are not in PC. Guideline: Consider the relation R = {(x, 1) : x ∈ {0, 1}∗ } ∪ {(x, 0) : x ∈ S}, where S is some undecidable set. Note that R is the disjoint union of two binary relations, denoted R1 and R2 , where R1 is in PF whereas R2 is not in PC. Furthermore, for every x it holds that R1 (x) = ∅.
Exercise 2.2: Show that any S ∈ N P has many different NPproof systems (i.e., verification procedures V1 , V2 , . . . such that Vi (x, y) = 1 does not imply V j (x, y) = 1 for i = j). Guideline: For V and p as in Definition 2.5, define Vi (x, y) = 1 if y = p(x) + i and there exists a prefix y of y such that V (x, y ) = 1.
Exercise 2.3: Relying on the fact that primality is decidable in polynomial time and assuming that there is no polynomialtime factorization algorithm, present two “natural but fundamentally different” NPproof systems for the set of composite numbers. Guideline: Consider the following verification procedures V1 and V2 for the set of composite numbers. Let V1 (n, y) = 1 if and only if y = n and n is not a prime, and V2 (n, m) = 1 if and only if m is a nontrivial divisor of n. Show that valid proofs with respect to V1 are easy to find, whereas valid proofs with respect to V2 are hard to find.
Exercise 2.4: Regarding Definition 2.7, show that if S is accepted by some nondeterministic machine of time complexity t then it is accepted by a nondeterministic machine of time complexity O(t) that has a transition function that maps each possible symbolstate pair to exactly two triples. Exercise 2.5: Verify the following properties of Cookreductions: 1. If is Cookreducible to and is solvable in polynomial time then so is . 2. Cookreductions are transitive (i.e., if is Cookreducible to and is Cookreducible to then is Cookreducible to ). 3. If is solvable in polynomial time then it is Cookreducible to any problem . In continuation of the last item, show that a problem is solvable in polynomial time if and only if it is Cookreducible to a trivial problem (e.g., deciding membership in the empty set). Exercise 2.6: Show that Karpreductions (and Levinreductions) are transitive.
99
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Exercise 2.7: Show that some decision problems are not Karpreducible to their complement (e.g., the empty set is not Karpreducible to {0, 1}∗ ). A popular exercise of dubious nature is showing that any decision problem in P is Karpreducible to any nontrivial decision problem, where the decision problem regarding a set S is called nontrivial if S = ∅ and S = {0, 1}∗ . It follows that every nontrivial set in P is Karpreducible to its complement. Exercise 2.8 (reducing search problems to optimization problems): For every polynomially bounded relation R (resp., R ∈ PC), present a function f (resp., a polynomialtime computable function f ) such that the search problem of R is computationally equivalent to the search problem in which given (x, v) one has to find a y ∈ {0, 1}poly(x) such that f (x, y) ≥ v. (Hint: Use a Boolean function.)
Exercise 2.9 (binary search): Show that using binary queries of the form “is z < v” it is possible to determine the value of an integer z that is a priori known to reside in the interval [0, 2 − 1]. Guideline: Consider a process that iteratively halves the interval in which z is known to reside.
Exercise 2.10: Show that if R ∈ PC \ PF is selfreducible then the relevant Cookreduction makes more than a logarithmic number of queries to S R . More generally, show that if R ∈ PC \ PF is Cookreducible to any decision problem, then this reduction makes more than a logarithmic number of queries. Guideline: Note that the oracle answers can be emulated by trying all possibilities, and that the correctness of the output of the oracle machine can be efficiently tested.
Exercise 2.11: Show that the standard search problem of Graph 3Colorability is selfreducible, where this search problem consists of finding a 3coloring for a given input graph. Guideline: Iteratively extend the current prefix of a 3coloring of the graph by making adequate oracle calls to the decision problem of Graph 3Colorability. (Specifically, encode the question of whether or not (χ1 , . . . , χt ) ∈ {1, 2, 3}t is a prefix of a 3coloring of the graph G as a query regarding the 3colorability of an auxiliary graph G .)30
Exercise 2.12: Show that the standard search problem of Graph Isomorphism is selfreducible, where this search problem consists of finding an isomorphism between a given pair of graphs. Guideline: Iteratively extend the current prefix of an isomorphism between the two N vertex graphs by making adequate oracle calls to the decision problem of Graph Isomorphism. (Specifically, encode the question of whether or not (π1 , . . . , πt ) ∈ [N ]t 30
Note that we merely need to check whether G has a 3coloring in which the equalities and inequalities induced by (χ1 , . . . , χt ) hold. This can be done by adequate gadgets (e.g., inequality is enforced by an edge between the corresponding vertices, whereas equality is enforced by an adequate subgraph that includes the relevant vertices as well as auxiliary vertices). For Part 1 of Exercise 2.13, equality is better enforced by combining the two vertices.
100
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
is a prefix of an isomorphism between G 1 = ([N ], E 1 ) and G 2 = ([N ], E 2 ) as a query regarding isomorphism between two auxiliary graphs G 1 and G 2 .)31
Exercise 2.13 (downward selfreducibility): We say that a set S is downward selfreducible if there exists a Cookreduction of S to itself that only makes queries that are each shorter than the reduction’s input (i.e., if on input x the reduction makes the query q then q < x).32 1. Show that SAT is downward selfreducible with respect to a natural encoding of CNF formulae. Note that this encoding should have the property that instantiating a variable in a formula results in a shorter formula. A harder exercise consists of showing that Graph 3Colorability is downward selfreducible with respect to some reasonable encoding of graphs. Note that this encoding has to be selected carefully (if it is to work for a process analogous to the one used in Exercise 2.11). 2. Suppose that S is downward selfreducible by a reduction that outputs the disjunction of the oracle answers. (Note that this is the case for SAT.) Show that in this case, S is characterized by a witness relation R ∈ PC (i.e., S = {x : R(x) = ∅}) that is selfreducible (i.e., the search problem of R is Cookreducible to S). Needless to say, it follows that S ∈ N P. Guideline: Include (x0 , x1 , . . . , xt ) in R if xt ∈ S ∩ {0, 1} O(1) and, for every i ∈ {0, 1, . . . , t − 1}, on input xi the selfreduction makes a set of queries that contains xi+1 . Prove that, indeed, R ∈ PC and S = {x : R(x) = ∅}.
Note that the notion of downward selfreducibility may be generalized in some natural ways. For example, we may say that S is downward selfreducible also in case it is computationally equivalent via Karpreductions to some set that is downward selfreducible (in the foregoing strict sense). Note that Part 2 still holds. Exercise 2.14 (NP problems that are not selfreducible): 1. Assuming that P = N P ∩ coN P, show that there exists a search problem that is in PC but is not selfreducible. Guideline: Given S ∈ N P ∩ coN P \ P, present relations R1 , R2 ∈ PC such that S = {x : R1 (x) = ∅} = {x : R2 (x) = ∅}. Then, consider the relation R = {(x, 1y) : (x, y) ∈ R1 } ∪ {(x, 0y) : (x, y) ∈ R2 }, and prove that R ∈ PF but S R = {0, 1}∗ .
2. Prove that if a search problem R is not selfreducible then (1) R ∈ PF and (2) the set S R = {(x, y ) : ∃y s.t. (x, y y ) ∈ R} is not Cookreducible to S R = {x : ∃y s.t. (x, y) ∈ R}. Exercise 2.15 (extending any prefix of any solution versus PC and PF): Assuming that P = N P, present a search problem R in PC ∩ PF such that deciding S R is not reducible to the search problem of R. 31
This can be done by attaching adequate gadgets to pairs of vertices that we wish to be mapped to one another (by the isomorphism). For example, we may connect the vertices in the i th pair to an auxiliary star consisting of (N + i) vertices. 32 Note that on some instances the reduction may make no queries at all. (This prevents a possible nonviability of the definition due to very short instances.)
101
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Guideline: Consider the relation R = {(x, 0x) : x ∈ {0, 1}∗ } ∪ {(x, 1y) : (x, y) ∈ R }, where R is an arbitrary relation in PC \ PF , and prove that R ∈ PF but S R ∈ P.
Exercise 2.16: In continuation of Exercise 2.14, present a natural search problem R in PC such that if factoring integers is intractable then the search problem R (and so also S R ) is not reducible to S R . Guideline: Consider the relation R such that (N , Q) ∈ R if the integer Q is a nontrivial divisor of the integer N . Use the fact that the set of prime numbers is in P.
Exercise 2.17: In continuation of Exercises 2.14 and 2.16, show that under suitable assumptions, there exist relations R1 , R2 ∈ PC having the same implicit decision problem (i.e., {x : R1 (x) = ∅} = {x : R2 (x) = ∅}) such that R1 is selfreducible but R2 is not. Exercise 2.18: Provide an alternative proof of Theorem 2.16 without referring to the set S R = {(x, y ) : ∃y s.t. (x, y y ) ∈ R}. (Hint: Use Proposition 2.15.) Guideline: Reduce the search problem of R to the search problem of RSAT , next reduce RSAT to SAT, and finally reduce SAT to S R . Justify the existence of each of these three reductions.
Exercise 2.19: Prove that Bounded Halting and Bounded Nonhalting are NPcomplete, where the problems are defined as follows. The instance consists of a pair (M, 1t ), where M is a Turing machine and t is an integer. The decision version of Bounded Halting (resp., Bounded Nonhalting) consists of determining whether or not there exists an input (of length at most t) on which M halts (resp., does not halt) in t steps, whereas the search problem consists of finding such an input. Guideline: Either modify the proof of Theorem 2.19 or present a reduction of (say) the search problem of Ru to the search problem of Bounded (Non)Halting. (Indeed, the exercise is more straightforward in the case of Bounded Halting.)
Exercise 2.20: In the proof of Theorem 2.21, we claimed that the value of each entry in the “array of configurations” of a machine M is determined by the values of the three entries that reside in the row above it (as in Figure 2.1). Present a function f M : 3 → , where = × (Q ∪ {⊥}), that substantiates this claim. Guideline: For example, for every σ1 , σ2 , σ3 ∈ , it holds that f M ((σ1 , ⊥), (σ2 , ⊥), (σ3 , ⊥)) = (σ2 , ⊥). More interestingly, if the transition function of M maps (σ, q) to (τ, p, +1) then, for every σ1 , σ2 , σ3 ∈ Q, it holds that f M ((σ, q), (σ2 , ⊥), (σ3 , ⊥)) = (σ2 , p) and f M ((σ1 , ⊥), (σ, q), (σ3 , ⊥)) = (τ, ⊥).
Exercise 2.21: Present and analyze a reduction of SAT to 3SAT. Guideline: For a clause C, consider auxiliary variables such that the i th variable indicates whether one of the first i literals is satisfied, and replace C by a 3CNF that uses the original variables of C as well as the auxiliary variables. For example, the t clause ∨i=1 xi is replaced by the conjunction of 3CNFs that are logically equivalent to the formulae (y2 ≡ (x1 ∨ x2 )), (yi ≡ (yi−1 ∨ xi )) for i = 3, . . . , t, and yt . We comment that this is not the standard reduction, but we find it conceptually more appealing.33 33
t The standard reduction replaces the clause ∨i=1 xi by the conjunction of the 3CNFs (x1 ∨ x2 ∨ z 2 ), ((¬z i−1 ) ∨ xi ∨ z i ) for i = 3, . . . , t, and ¬z t .
102
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Exercise 2.22 (efficient solvability of 2SAT): In contrast to Exercise 2.21, prove that 2SAT (i.e., the satisfiability of 2CNF formulae) is in P. Guideline: Consider the following “forcing process” for CNF formulae. If the formula contains a singleton clause (i.e., a clause having a single literal), then the corresponding variable is assigned the only value that satisfies the clause, and the formula is simplified accordingly (possibly yielding a constant formula, which is either true or false). The process is repeated until the formula is either a constant or contains only nonsingleton clauses. Note that a formula φ is satisfiable if and only if the formula obtained from φ by the forcing process is satisfiable. Consider the following algorithm for solving the search problem associated with 2SAT. 1. Choose an arbitrary variable in φ. For each σ ∈ {0, 1}, denote by φσ the formula obtained from φ by assigning this variable the value σ . 2. If, for some σ ∈ {0, 1}, applying the forcing process to φσ yields a (nonconstant) 2CNF formula φ , then set φ ← φ and goto Step 1. (The case that this happens for both σ ∈ {0, 1} is treated as the case that this happens for a single σ ; that is, in such a case we proceed with an arbitrary choice of σ .) 3. If one of these assignments yields (via the application of the forcing process) the constant true then we halt with a satisfying assignment for the original formula. Otherwise (i.e., both assignments yield the constant false), we halt asserting that the original formula is unsatisfiable. Proving the correctness of this algorithm boils down to observing that the arbitrary choice made in Step 2 is immaterial. Indeed, this observation relies on the fact that we refer to 2CNF formulae.
Exercise 2.23 (Integer Linear Programming): Prove that the following problem is NPcomplete. An instance of the problem is a systems of linear inequalities (say with integer constants), and the problem is to determine whether the system has an integer solution. A typical instance of this decision problem follows. x + 2y − z ≥ 3 −3x − z ≥ −5 x ≥0 −x ≥ −1 Guideline: Reduce from SAT. Specifically, consider an arithmetization of the input CNF by replacing ∨ with addition and ¬x by 1 − x. Thus, each clause gives rise to an inequality (e.g., the clause x ∨ ¬y is replaced by the inequality x + (1 − y) ≥ 1, which simplifies to x − y ≥ 2). Enforce a 01 solution by introducing inequalities of the form x ≥ 0 and −x ≥ −1, for every variable x.
Exercise 2.24 (Maximum Satisfiability of Linear Systems over GF(2)): Prove that the following problem is NPcomplete. An instance of the problem consists of a systems of linear equations over GF(2) and an integer k, and the problem is to determine whether there exists an assignment that satisfies at least k equations. (Note that the problem of determining whether there exists an assignment that satisfies all the equations is in P.) Guideline: Reduce from 3SAT, using the following arithmetization. Replace each clause that contains t ≤ 3 literals by 2t − 1 linear GF(2) equations that correspond to
103
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
the different nonempty subsets of these literals, and assert that their sum (modulo 2) equals one; for example, the clause x ∨ ¬y is replaced by the equations x + (1 − y) = 1, x = 1, and 1 − y = 1. Identifying {false, true} with {0, 1}, prove that if the original clause is satisfied by a Boolean assignment v then exactly 2t−1 of the corresponding equations are satisfied by v, whereas if the original clause is unsatisfied by v then none of the corresponding equations is satisfied by v.
Exercise 2.25 (Satisfiability of Quadratic Systems over GF(2)): Prove that the following problem is NPcomplete. An instance of the problem consists of a system of quadratic equations over GF(2), and the problem is to determine whether there exists an assignment that satisfies all the equations. Note that the result holds also for systems of quadratic equations over the reals (by adding conditions that enforce a value in {0, 1}). Guideline: Start by showing that the corresponding problem for cubic equations is NPcomplete, by a reduction from 3SAT that maps the clause x ∨ ¬y ∨ z to the equation (1 − x) · y · (1 − z) = 0. Reduce the problem for cubic equations to the problem for quadratic equations by introducing auxiliary variables; that is, given an instance with variables x1 , . . . , xn , introduce the auxiliary variables xi, j ’s and add equations of the form xi, j = xi · x j .
Exercise 2.26 (Clique and Independent Set): An instance of the Independent Set problem consists of a pair (G, K ), where G is a graph and K is an integer, and the question is whether or not the graph G contains an independent set (i.e., a set with no edges between its members) of size (at least) K . The Clique problem is analogous. Prove that both problems are computationally equivalent via Karpreductions to the Vertex Cover problem. Exercise 2.27 (an alternative proof of Proposition 2.26): Consider the following sketch of a reduction of 3SAT to Independent Set. On input a 3CNF formula φ with m clauses and n variables, we construct a graph G φ consisting of m triangles (corresponding to the m clauses) augmented with edges that link conflicting literals. That is, if x appears as the i 1th literal of the j1th clause and ¬x appears as the i 2th literal of the j2th clause, then we draw an edge between the i 1th vertex of the j1th triangle and the i 2th vertex of the j2th triangle. Prove that φ ∈ 3SAT if and only if G φ has an independent set of size m. Exercise 2.28 (additional properties of standard reductions): In continuation of the discussion in the main text, consider the following augmented form of Karpreductions. Such a reduction of R to R consists of three polynomialtime mappings ( f, h, g) such that f is a Karpreduction of S R to S R and the following two conditions hold: 1. For every (x, y) ∈ R it holds that ( f (x), h(x, y)) ∈ R . 2. For every ( f (x), y ) ∈ R it holds that (x, g(x, y )) ∈ R. (We note that this definition is actually the one used by Levin in [152], except that he restricted h and g to depend only on their second argument.) Prove that such a reduction implies both a Karpreduction and a Levinreduction, and show that all reductions presented in this chapter satisfy this augmented requirement. Furthermore, prove that in all these cases the main mapping (i.e., f ) is 11 and polynomialtime invertible.
104
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Exercise 2.29 (parsimonious reductions): Let R, R ∈ PC and let f be a Karpreduction of S R = {x : R(x) = ∅} to S R = {x : R (x) = ∅}. We say that f is parsimonious (with respect to R and R ) if for every x it holds that R(x) = R ( f (x)). For each of the reductions presented in this chapter, check whether or not it is parsimonious. For the reductions that are not parsimonious, find alternative reductions that are parsimonious (cf. [85, Sec. 7.3]). Exercise 2.30 (on polynomialtime invertible reductions (following [37])): We say that a set S is markable if there exists a polynomialtime (marking) algorithm M such that 1. For every x, α ∈ {0, 1}∗ it holds that (a) M(x, α) ∈ S if and only if x ∈ S. (b) M(x, α) > x. 2. There exists a polynomialtime (demarking) algorithm D such that, for every x, α ∈ {0, 1}∗ , it holds that D(M(x, α)) = α. Note that all natural NPsets (e.g., those considered in this chapter) are markable (e.g., for SAT, one may mark a formula by augmenting it with additional satisfiable clauses that use specially designated auxiliary variables). Prove that if S is Karpreducible to S and S is markable then S is Karpreducible to S by a lengthincreasing, onetoone, and polynomialtime invertible mapping.34 Infer that for any natural NPcomplete problem S, any set in N P is Karpreducible to S by a lengthincreasing, onetoone, and polynomialtime invertible mapping. Guideline: Let f be a Karpreduction of S to S, and let M be the guaranteed marking algorithm. Consider the reduction that maps x to M( f (x), x).
Exercise 2.31 (on the isomorphism of NPcomplete sets (following [37])): Suppose that S and T are Karpreducible to one another by lengthincreasing, onetoone, and polynomialtime invertible mappings, denoted f and g, respectively. Using the following guidelines, prove that S and T are “effectively” isomorphic; that is, present a polynomialtime computable and invertible onetoone mapping φ such that T = def φ(S) = {φ(x) : x ∈ S}. def
def
1. Let F = { f (x) : x ∈ {0, 1}∗ } and G = {g(x) : x ∈ {0, 1}∗ }. Using the lengthpreserving condition of f (resp., g), prove that F (resp., G) is a proper subset of {0, 1}∗ . Prove that for every y ∈ {0, 1}∗ there exists a unique triple ( j, x, i) ∈ {1, 2} × {0, 1}∗ × ({0} ∪ N) that satisfies one of the following two conditions: def (a) j = 1, x ∈ G = {0, 1}∗ \ G, and y = (g ◦ f )i (x); def (b) j = 2, x ∈ F = {0, 1}∗ \ F, and y = (g ◦ f )i (g(x)). (In both cases h 0 (z) = z, h i (z) = h(h i−1 (z)), and (g ◦ f )(z) = g( f (z)). Hint: consider the maximal sequence of inverse operations g −1 , f −1 , g −1 , ... that can be applied to y, and note that each inverse shrinks the current string.) def def 2. Let U1 = {(g ◦ f )i (x) : x ∈ G ∧ i ≥ 0} and U2 = {(g ◦ f )i (g(x)) : x ∈ F ∧ i ≥ 0}. Prove that (U1 , U2 ) is a partition of {0, 1}∗ . Using the fact that f and g are lengthincreasing and polynomialtime invertible, present a polynomialtime procedure for deciding membership in the set U1 . 34
When given a string that is not in the image of the mapping, the inverting algorithm returns a special symbol.
105
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
P, NP, AND NPCOMPLETENESS
Prove the same for the sets V1 = {( f ◦ g)i (x) : x ∈ F ∧ i ≥ 0} and V2 = {( f ◦ g)i ( f (x)) : x ∈ G ∧ i ≥ 0}. def def 3. Note that U2 ⊆ G, and define φ(x) = f (x) if x ∈ U1 and φ(x) = g −1 (x) otherwise. (a) Prove that φ is a Karpreduction of S to T . (b) Note that φ maps U1 to f (U1 ) = { f (x) : x ∈U1 } = V2 and U2 to g −1 (U2 ) = {g −1 (x) : x ∈U2 } = V1 . Prove that φ is onetoone and onto. Observe that φ −1 (x) = f −1 (x) if x ∈ f (U1 ) and φ −1 (x) = g(x) otherwise. Prove that φ −1 is a Karpreduction of T to S. Infer that φ(S) = T . Using Exercise 2.30, infer that all natural NPcomplete sets are isomorphic. Exercise 2.32: Prove that a set S is Karpreducible to some set in N P if and only if S is in N P. Guideline: For the nontrivial direction, see the proof of Proposition 2.34.
Exercise 2.33: Recall that the empty set is not Karpreducible to {0, 1}∗ , whereas any set is Cookreducible to its complement. Thus, our focus here is on the Karpreducibility of nontrivial sets to their complements, where a set is nontrivial if it is neither empty nor contains all strings. Furthermore, since any nontrivial set in P is Karpreducible to its complement (see Exercise 2.7), we assume that P = N P and focus on sets in N P \ P. 1. Prove that N P = coN P implies that some sets in N P \ P are Karpreducible to their complements. 2. Prove that N P = coN P implies that some sets in N P \ P are not Karpreducible to their complements. Guideline: Use NPcomplete sets in both parts, and Exercise 2.32 in the second part.
Exercise 2.34: Referring to the proof of Theorem 2.28, prove that the function f is unbounded (i.e., for every i there exists an n such that n 3 steps of the process defined in the proof allow for failing the i + 1st machine). Guideline: Note that f is monotonically nondecreasing (because more steps allow for failing at least as many machines). Assume toward the contradiction that f is bounded. Let i = supn∈N { f (n)} and n be the smallest integer such that f (n ) = i. If i is odd then the set F determined by f is cofinite (because F = {x : f (x) ≡ 1 (mod 2)} ⊇ {x : x ≥ n }). In this case, the i + 1st machine tries to decide S ∩ F (which differs from S on finitely many strings), and must fail on some x. Derive a contradiction by showing that the number of steps taken till reaching and considering this x is at most exp(poly(x)), which is smaller than n 3 for some sufficiently large n. A similar argument applies to the case that i is even, where we use the fact that F ⊆ {x : x < n } is finite and so the relevant reduction of S to S ∩ F must fail on some input x.
Exercise 2.35: Prove that if the promise problem is Cookreducible to a promise problem that is solvable in polynomial time, then is solvable in polynomial time. Note that the solver may not halt on inputs that violate the promise. Guideline: Any polynomialtime algorithm solving any promise problem can be modified such that it halts on all inputs.
106
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Exercise 2.36 (NPcomplete promise problems in coNP (following [72])): Consider the promise problem xSAT having instances that are pairs of CNF formulae. The yesinstances consists of pairs (φ1 , φ2 ) such that φ1 is satisfiable and φ2 is unsatisfiable, whereas the noinstances consists of pairs such that φ1 is unsatisfiable and φ2 is satisfiable. 1. Show that xSAT is in the intersection of (the promise problem classes that are analogous to) N P and coN P. 2. Prove that any promise problem in N P is Cookreducible to xSAT. In designing the reduction, recall that queries that violate the promise may be answered arbitrarily. Guideline: Note that the promise problem version of N P is reducible to SAT, and show a reduction of SAT to xSAT. Specifically, show that the search problem associated with SAT is Cookreducible to xSAT, by adapting the ideas of the proof of Proposition 2.15. That is, suppose that we know (or assume) that τ is a prefix of a satisfying assignment to φ, and we wish to extend τ by one bit. Then, for each σ ∈ {0, 1}, we construct a formula, denoted φσ , by setting the first τ  + 1 variables of φ according to the values τ σ . We query the oracle about the pair (φ1 , φ0 ), and extend τ accordingly (i.e., we extend τ by the value 1 if and only if the answer is positive). Note that if both φ1 and φ0 are satisfiable then it does not matter which bit we use in the extension, whereas if exactly one formula is satisfiable then the oracle answer is reliable.
3. Pinpoint the source of failure of the proof of Theorem 2.35 when applied to the reduction provided in the previous item. Exercise 2.37: For any class C, prove that C ⊆ coC if and only if C = coC.
107
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER THREE
Variations on P and NP
Cast a cold eye On life, on death. Horseman, pass by! W. B. Yeats, “Under Ben Bulben” In this chapter we consider variations on the complexity classes P and NP. We refer specifically to the nonuniform version of P, and to the Polynomialtime Hierarchy (which extends NP). These variations are motivated by relatively technical considerations; still, the resulting classes are referred to quite frequently in the literature. Summary: Nonuniform polynomialtime (P/poly) captures efficient computations that are carried out by devices that can each handle only inputs of a specific length. The basic formalism ignores the complexity of constructing such devices (i.e., a uniformity condition). A finer formalism that allows for quantifying the amount of nonuniformity refers to socalled “machines that take advice.” The Polynomialtime Hierarchy (PH) generalizes NP by considering statements expressed by quantified Boolean formulae with a fixed number of alternations of existential and universal quantifiers. It is widely believed that each quantifier alternation adds expressive power to the class of such formulae. An interesting result that refers to both classes asserts that if NP is contained in P/poly then the Polynomialtime Hierarchy collapses to its second level. This result is commonly interpreted as supporting the common belief that nonuniformity is irrelevant to the PvsNP Question; that is, although P/poly extends beyond the class P, it is believed that P/poly does not contain NP. Except for the latter result, which is presented in Section 3.2.3, the treatments of P/poly (in Section 3.1) and of the Polynomialtime Hierarchy (in Section 3.2) are independent of one another.
3.1. Nonuniform Polynomial Time (P/poly) In this section we consider two formulations of the notion of nonuniform polynomial time, based on the two models of nonuniform computing devices that were
108
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
3.1 NONUNIFORM POLYNOMIAL TIME
presented in Section 1.2.4. That is, we specialize the treatment of nonuniform computing devices, provided in Section 1.2.4, to the case of polynomially bounded complexities. It turns out that both (polynomially bounded) formulations allow for solving the same class of computational problems, which is a strict superset of the class of problems solvable by polynomialtime algorithms. The two models of nonuniform computing devices are Boolean circuits and “machines that take advice” (cf. §1.2.4.1 and §1.2.4.2, respectively). We will focus on the restriction of both models to the case of polynomial complexities, considering (nonuniform) polynomialsize circuits and polynomialtime algorithms that take (nonuniform) advice of polynomially bounded length. The main motivation for considering nonuniform polynomialsize circuits is that their computational limitations imply analogous limitations on polynomialtime algorithms. The hope is that, as is often the case in mathematics and science, disposing of an auxiliary condition (i.e., uniformity) that seems secondary1 and is not well understood may turn out to be fruitful. In particular, the (nonuniform) circuit model facilitates a lowlevel analysis of the evolution of a computation, and allows for the application of combinatorial techniques. The benefit of this approach has been demonstrated in the study of restricted classes of circuits (see Appendix B.2.2 and B.2.3). The main motivation for considering polynomialtime algorithms that take polynomially bounded advice is that such devices are useful in modeling auxiliary information that is available to possible efficient strategies that are of interest to us. We mention two such settings. In cryptography (see Appendix C), the advice is used for accounting for auxiliary information that is available to an adversary. In the context of derandomization (see Section 8.3), the advice is used for accounting for the main input to the randomized algorithm. In addition, the model of polynomialtime algorithms that take advice allows for a quantitative study of the amount of nonuniformity, ranging from zero to polynomial.
3.1.1. Boolean Circuits We refer the reader to §1.2.4.1 for a definition of (families of) Boolean circuits and the functions computed by them. For concreteness and simplicity, we assume throughout this section that all circuits have bounded fanin. We highlight the following result stated in §1.2.4.1: Theorem 3.1 (circuit evaluation): There exists a polynomialtime algorithm that, given a circuit C : {0, 1}n → {0, 1}m and an nbit long string x, returns C(x). Recall that the algorithm works by performing the “valuedetermination” process that underlies the definition of the computation of the circuit on a given input. This process assigns values to each of the circuit vertices based on the values of its children (or the values of the corresponding bit of the input, in the case of an inputterminal vertex). Circuit size as a complexity measure. We recall the definitions of circuit complexity presented in §1.2.4.1: The size of a circuit is defined as the number of edges, and the length of its description is almost linear in the latter; that is, a circuit of size s is 1
The common belief is that the issue of nonuniformity is irrelevant to the PvsNP Question, that is, that resolving the latter question by proving that P = N P is not easier than proving that NP does not have polynomialsize circuits. For further discussion see Appendix B.2 and Section 3.2.3.
109
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
commonly described by the list of its edges and the labels of its vertices, which means that its description length is O(s log s). We are interested in families of circuits that solve computational problems, and thus we say that the circuit family (Cn )n∈N computes the function f : {0, 1}∗ → {0, 1}∗ if for every x ∈ {0, 1}∗ it holds that Cx (x) = f (x). The size complexity of this family is the function s : N → N such that s(n) is the size of Cn . The circuit complexity of a function f , denoted s f , is the size complexity of the smallest family of circuits that computes f . An equivalent formulation follows. Definition 3.2 (circuit complexity): The circuit complexity of f : {0, 1}∗ → {0, 1}∗ is the function s f : N → N such that s f (n) is the size of the smallest circuit that computes the restriction of f to nbit strings. We stress that nonuniformity is implicit in this definition, because no conditions are made regarding the relation between the various circuits that are used to compute the function value on different input lengths. An interesting feature of Definition 3.2 is that, unlike in the case of uniform model of computations, it allows for considering the actual complexity of the function rather than an upper bound on its complexity (cf. §1.2.3.5 and Section 4.2.1). This is a consequence of the fact that the circuit model has no “free parameters” (such as various parameters of the possible algorithm that are used in the uniform model).2 We will be interested in the class of problems that are solvable by families of polynomialsize circuits. That is, a problem is solvable by polynomialsize circuits if it can be solved by a function f that has polynomial circuit complexity (i.e., there exists a polynomial p such that s f (n) ≤ p(n), for every n ∈ N). A detour: Uniform families. A family of polynomialsize circuits (Cn )n∈N is called uniform if given n one can construct the circuit Cn in poly(n)time. More generally: Definition 3.3 (uniformity): A family of circuits (Cn )n∈N is called uniform if there exists an algorithm that on input n outputs Cn within a number of steps that is polynomial in the size of Cn . We note that stronger notions of uniformity have been considered. For example, one may require the existence of a polynomialtime algorithm that on input n and v, returns the label of vertex v as well as the list of its children (or an indication that v is not a vertex in Cn ). For further discussion see Section 5.2.3. Turning back to Definition 3.3, we note that indeed the computation of a uniform family of circuits can be emulated by a uniform computing device. Proposition 3.4: If a problem is solvable by a uniform family of polynomialsize circuits then it is solvable by a polynomialtime algorithm. As was hinted in §1.2.4.1, the converse holds as well. The latter fact follows easily from the proof of Theorem 2.21 (see also the proof of Theorem 3.6). 2
Advanced comment: The “free parameters” in the uniform model include the length of the description of the finite algorithm and its alphabet size. Note that these “free parameters” underlie linear speedup results such as Exercise 4.4, which in turn prevent the specification of the exact (uniform) complexities of functions.
110
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
3.1 NONUNIFORM POLYNOMIAL TIME
Proof: On input x, the algorithm operates in two stages. In the first stage, it def invokes the algorithm guaranteed by the uniformity condition, on input n = x, and obtains the circuit Cn . Next, it invokes the circuit evaluation algorithm (asserted in Theorem 3.1) on input Cn and x, and obtains Cn (x). Since the size of Cn (as well as its description length) is polynomial in n, it follows that each stage of our algorithm runs in polynomial time (i.e., polynomial in n = x). Thus, the algorithm emulates the computation of Cx (x), and does so in time polynomial in the length of its own input (i.e., x).
3.1.2. Machines That Take Advice General (i.e., possibly nonuniform) families of polynomialsize circuits and uniform families of polynomialsize circuits are two extremes with respect to the “amounts of nonuniformity” in the computing device. Intuitively, in the former, nonuniformity is only bounded by the size of the device, whereas in the latter the amounts of nonuniformity is zero. Here we consider a model that allows for decoupling the size of the computing device from the amount of nonuniformity, which may indeed range from zero to the device’s size. Specifically, we consider algorithms that “take a nonuniform advice” that depends only on the input length. The amount of nonuniformity will be defined to equal the length of the corresponding advice (as a function of the input length). Thus, we specialize Definition 1.12 to the case of polynomialtime algorithms. Definition 3.5 (nonuniform polynomialtime and P/poly): We say that a function f is computed in polynomial time with advice of length : N → N if these exists a polynomialtime algorithm A and an infinite advice sequence (an )n∈N such that 1. For every x ∈ {0, 1}∗ , it holds that A(ax , x) = f (x). 2. For every n ∈ N, it holds that an  = (n). We say that a computational problem can be solved in polynomial time with advice of length if a function solving this problem can be computed within these resources. We denote by P/ the class of decision problems that can be solved in polynomial time with advice of length , and by P/poly the union of P/ p taken over all polynomials p. Clearly, P/0 = P. But allowing some (nonempty) advice increases the power of the class (see Theorem 3.7), and allowing advice of length comparable to the time complexity yields a formulation equivalent to circuit complexity (see Theorem 3.6). We highlight the greater flexibility available by the formalism of machines that take advice, which allows for separate specification of time complexity and advice length. (Indeed, this comes at the expense of a more cumbersome formulation; thus, we shall prefer the circuit formulation whenever we consider the case that both complexity measures are polynomial.) Relation to families of polynomialsize circuits. As hinted before, the class of problems solvable by polynomialtime algorithms with polynomially bounded advice equals the class of problems solvable by families of polynomialsize circuits. For concreteness, we state this fact for decision problems.
111
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
Theorem 3.6: A decision problem is in P/poly if and only if it can be solved by a family of polynomialsize circuits. More generally, for any function t, the following proof establishes the equivalence of the power of polynomialtime machines that take advice of length t and families of circuits of size polynomially related to t. Proof Sketch: Suppose that a problem can be solved by a polynomialtime algorithm A using the polynomially bounded advice sequence (an )n∈N . We obtain a family of polynomialsize circuits that solves the same problem by adapting the proof of Theorem 2.21. Specifically, we observe that the computation of A(ax , x) can be emulated by a circuit of poly(x)size, which incorporates ax and is given x as input. That is, we construct a circuit Cn such that Cn (x) = A(an , x) holds for every x ∈ {0, 1}n (analogously to the way C x was constructed in the proof of Theorem 2.21, where it holds that C x (y) = M R (x, y) for every y of adequate length).3 On the other hand, given a family of polynomialsize circuits, we obtain a polynomialtime advicetaking machine that emulates this family when using advice that provides the description of the relevant circuits. Specifically, we transform the evaluation algorithm asserted in Theorem 3.1 into a machine that, given advice α and input x, treats α as a description of a circuit C and evaluates C(x). Indeed, we use the fact that a circuit of size s can be described by a string of length O(s log s), where the log factor is due to the fact that a graph with v vertices and e edges can be described by a string of length 2e log2 v. Another perspective. A set S is called sparse if there exists a polynomial p such that for every n it holds that S ∩ {0, 1}n  ≤ p(n). We note that P/poly equals the class of sets that are Cookreducible to a sparse set (see Exercise 3.2). Thus, SAT is Cookreducible to a sparse set if and only if N P ⊂ P/poly. In contrast, SAT is Karpreducible to a sparse set if and only if N P = P (see Exercise 3.12). The power of P/poly. In continuation of Theorem 1.13 (which focuses on advice and ignores the time complexity of the machine that takes this advice), we prove the following (stronger) result. Theorem 3.7 (the power of advice, revisited): The class P/1 ⊆ P/poly contains P as well as some undecidable problems. Actually, P/1 ⊂ P/poly. Furthermore, by using a counting argument, one can show that for any two polynomially bounded functions 1 , 2 : N → N such that 2 − 1 > 0 is unbounded, it holds that P/1 is strictly contained in P/2 ; see Exercise 3.3. Proof: Clearly, P = P/0 ⊆ P/1 ⊆ P/poly. To prove that P/1 contains some undecidable problems, we review the proof of Theorem 1.13. The latter proof established the existence of an uncomputable Boolean function that only depends on its input length. That is, there exists an undecidable set S ⊂ {0, 1}∗ such that for every pair 3
Advanced comment: Note that an is the only “nonuniform” part in the circuit Cn . Thus, if algorithm A takes no advice (i.e., an = λ for every n) then we obtain a uniform family of circuits.
112
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
3.2 THE POLYNOMIALTIME HIERARCHY
(x, y) of equal length strings it holds that x ∈ S if and only if y ∈ S. In other words, for every x ∈ {0, 1}∗ it holds that x ∈ S if and only if 1x ∈ S. But such a set is easily decidable in polynomial time by a machine that takes one bit of advice; that is, consider the algorithm A that satisfies A(a, x) = a (for a ∈ {0, 1} and x ∈ {0, 1}∗ ) and the advice sequence (an )n∈N such that an = 1 if and only if 1n ∈ S. Note that, indeed, A(ax , x) = 1 if and only if x ∈ S.
3.2. The PolynomialTime Hierarchy (PH) The Polynomialtime Hierarchy is a rather natural generalization of N P. Interestingly, this generalization collapses to P if and only if N P = P, and furthermore it is the largest natural generalization of N P that is known to have this feature. We start with an informal motivating discussion, which will be made formal in Section 3.2.1. Sets in N P can be viewed as sets of valid assertions that can be expressed as quantified Boolean formulae using only existential quantifiers. That is, a set S is in N P if there is a Karpreduction of S to the problem of deciding whether or not an existentially quantified Boolean formula is valid (i.e., an instance x is mapped by this reduction to a formula of the form ∃y1 · · · ∃ym(x) φx (y1 , . . . , ym(x) )). The conjectured intractability of N P seems due to the long sequence of existential quantifiers. Of course, if somebody else (i.e., a “prover”) were to provide us with an adequate assignment (to the yi ’s) whenever such an assignment exists then we would be in good shape. That is, we can efficiently verify proofs of validity of existentially quantified Boolean formulae. But what if we want to verify the validity of universally quantified Boolean formulae (i.e., formulae of the form ∀y1 · · · ∀ym φ(y1 , . . . , ym )). Here we seem to need the help of a totally different entity: We need a “refuter” that is guaranteed to provide us with a refutation whenever such exists, and we need to believe that if we were not presented with such a refutation then it is the case that no refutation exists (and hence the universally quantified formula is valid). Indeed, this new setting (of a “refutation system”) is fundamentally different from the setting of a proof system: In a proof system we are only convinced by proofs (to assertions) that we have verified by ourselves, whereas in the “refutation system” we trust the “refuter” to provide evidence against false assertions.4 Furthermore, there seems to be no way of converting one setting (e.g., the proof system) into another (resp., the refutation system). Taking an additional step, we may consider a more complicated system in which we use two agents: a “supporter” that tries to provide evidence in favor of an assertion and an “objector” that tries to refute it. These two agents conduct a debate (or an argument) in our presence, exchanging messages with the goal of making us (the referee) rule their way. The assertions that can be proven in this system take the form of general quantified formulae with alternating sequences of quantifiers, where the number of alternating sequences equals the number of rounds of interaction in the said system. We stress that the exact length of each sequence of quantifiers of the same type does not matter; what matters is the number of alternating sequences, denoted k. 4
More formally, in proof systems the soundness condition relies only on the actions of the verifier, whereas completeness also relies on the prover’s action (i.e., its using an adequate strategy). In contrast, in a “refutation system” the soundness condition relies on the proper actions of the refuter, whereas completeness does not depend on the refuter’s actions.
113
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
The aforementioned system of alternations can be viewed as a twoparty game, and we may ask ourselves which of the two parties has a kmove winning strategy. In general, we may consider any (01 zerosum) twoparty game, in which the game’s position can be efficiently updated (by any given move) and efficiently evaluated. For such a fixed game, given an initial position, we may ask whether the first party has a (kmove) winning strategy. It seems that answering this type of question for some fixed k does not necessarily allow answering it for k + 1. We now turn to formalizing the foregoing discussion.
3.2.1. Alternation of Quantifiers In the following definition, the aforementioned propositional formula φx is replaced by the input x itself. (Correspondingly, the combination of the Karpreduction and a formulaevaluation algorithm is replaced by the verification algorithm V (see Exercise 3.7).) This is done in order to make the comparison to the definition of N P more transparent (as well as to fit the standard presentations). We also replace a sequence of Boolean quantifiers of the same type by a single corresponding quantifier that quantifies over all strings of the corresponding length. Definition 3.8 (the class k ): For a natural number k, a decision problem S ⊆ {0, 1}∗ is in k if there exists a polynomial p and a polynomialtime algorithm V such that x ∈ S if and only if ∃y1 ∈ {0, 1} p(x) ∀y2 ∈ {0, 1} p(x) ∃y3 ∈ {0, 1} p(x) · · · Q k yk ∈ {0, 1} p(x) s.t. V (x, y1 , . . . , yk ) = 1 where Q k is an existential quantifier if k is odd and is a universal quantifier otherwise. Note that 1 = N P and 0 = P. The Polynomialtime Hierarchy, denoted PH, is the union of all the aforementioned classes (i.e., PH = ∪k k ), and k is often referred to as the k th level of PH. The levels of the Polynomialtime Hierarchy can also be defined def def inductively, by defining k+1 based on k = cok , where cok = {{0, 1}∗ \ S : S ∈ k } (cf. Eq. (2.4)). Proposition 3.9: For every k ≥ 0, a set S is in k+1 if and only if there exists a polynomial p and a set S ∈ k such that S = {x : ∃y ∈ {0, 1} p(x) s.t. (x, y) ∈ S }. Proof: Suppose that S is in k+1 and let p and V be as in Definition 3.8. Then define S as the set of pairs (x, y) such that y = p(x) and ∀z 1 ∈ {0, 1} p(x) ∃z 2 ∈ {0, 1} p(x) · · · Q k z k ∈ {0, 1} p(x) s.t. V (x, y, z 1 , . . . , z k ) = 1 . Note that x ∈ S if and only if there exists y ∈ {0, 1} p(x) such that (x, y) ∈ S , and that S ∈ k (see Exercise 3.6). On the other hand, suppose that for some polynomial p and a set S ∈ k it holds that S = {x : ∃y ∈ {0, 1} p(x) s.t. (x, y) ∈ S }. Then, for some p and V , it holds that (x, y) ∈ S if and only if y = p(x) and
∀z 1 ∈ {0, 1} p (x) ∃z 2 ∈ {0, 1} p (x) · · · Q k z k ∈ {0, 1} p (x) s.t. V ((x, y), z 1 , . . . , z k ) = 1
114
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
3.2. THE POLYNOMIALTIME HIERARCHY
(see Exercise 3.6 again). By using a suitable encoding of y and the z i ’s (as strings of length max( p(x), p (x))) and a trivial modification of V , we conclude that S ∈ k+1 . Determining the winner in kmove games. Definition 3.8 can be interpreted as capturing the complexity of determining the winner in certain efficient twoparty games. Specifically, we refer to twoparty games that satisfy the following three conditions: 1. The parties alternate in taking moves that affect the game’s (global) position, where each move has a description length that is bounded by a polynomial in the length of the current position. 2. The current position can be updated in polynomial time based on the previous position and the current party’s move.5 3. The winner in each position can be determined in polynomial time. Note that the set of initial positions for which the first party has a kmove winning strategy with respect to the foregoing game is in k . Specifically, denoting this set by G, note that an initial position x is in G if there exists a move y1 for the first party, such that for every response move y2 of the second party, there exists a move y3 for the first party, etc., such that after k moves the parties reach a position in which the first party wins, where the final position is determined according to the foregoing Item 2 and the winner in it is determined according to Item 3.6 Thus, G ∈ k . On the other hand, note that any set S ∈ k can be viewed as the set of initial positions (in a suitable game) for which the first party has a kmove winning strategy. Specifically, x ∈ S if starting at the initial position x, there exists a move y1 for the first party, such that for every response move y2 of the second party, there exists a move y3 for the first party, etc., such that after k moves the parties reach a position in which the first party wins, where the final position is defined as (x, y1 , . . . , yk ) and the winner is determined by the predicate V (as in Definition 3.8). PH and the P Versus NP Question. We highlight the fact that PH = P if and only if P = N P. Indeed, the fact that PH = P implies P = N P is purely syntactic, whereas the opposite implication follows from Proposition 3.9 (see also the second part of the proof of Proposition 3.10).7 The fact that P = N P implies PH = P suggests that P = N P can be proved by proving that PH = P. Thus, a separation between two classes (i.e., P = N P) can be shown by separating the smaller class (i.e., P) from a class (i.e., PH) that is believed to be a superset of the other class (i.e., N P). 5
Note that, since we consider a constant number of moves, the length of all possible final positions is bounded by a polynomial in the length of the initial position, and thus all items have an equivalent form in which one refers to the complexity as a function of the length of the initial position. The latter form allows for a smooth generalization to games with a polynomial number of moves (as in Section 5.4), where it is essential to state all complexities in terms of the length of the initial position. 6 Let U be the update algorithm of Item 2 and W be the algorithm that decides the winner as in Item 3. Then the final position is given by computing xi ← U (xi−1 , yi ), for i = 1, . . . , k (where x0 = x), and the winner is W (xk ). Note that, by Item 1, there exists a polynomial p such that yi  ≤ p(xi ), for every i ∈ [k], and it follows that yi  ≤ poly(x). Using a suitable encoding, we obtain a polynomialtime algorithm V such that V (x, y1 , . . . , yk ) = W (xk ), where xk = U (· · · U (U (U (x, y1 ), y2 ), y3 ) . . . , yk ). 7 Advanced comment: We stress that the latter implication is not due to a Cookreduction of PH to N P; in fact, such Cookreductions exist only for a subclass of PH (which is contained in 2 ∩ 2 ).
115
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
The collapsing effect of other equalities. Extending the intuition that underlies the N P = coN P conjecture, it is commonly conjectured that k = k for every k ∈ N. The failure of this conjecture causes the collapse of the Polynomialtime Hierarchy to the corresponding level. Proposition 3.10: For every k ≥ 1, if k = k then k+1 = k , which in turn implies PH = k . The converse also holds (i.e., PH = k implies k+1 = k and k = k ). Needless to say, the first part of Proposition 3.10 (i.e., k = k implies k+1 = k ) does not seem to hold for k = 0, but indeed the second part holds also for k = 0 (i.e., 1 = 0 implies PH = 0 ). Proof: Assuming that k = k , we first show that k+1 = k . For any set S in k+1 , by Proposition 3.9, there exists a polynomial p and a set S ∈ k such that S = {x : ∃y ∈ {0, 1} p(x) s.t. (x, y) ∈ S }. Using the hypothesis, we infer that S ∈ k , and so (using Proposition 3.9 and k ≥ 1) there exists a polynomial p and a set S ∈ k−1 such that S = {x : ∃y ∈ {0, 1} p (x ) s.t. (x , y ) ∈ S }. It follows that
S = {x : ∃y ∈ {0, 1} p(x) ∃z ∈ {0, 1} p ((x,y)) s.t. ((x, y), z) ∈ S }. By collapsing the two adjacent existential quantifiers (and using Proposition 3.9 yet again), we conclude that S ∈ k . This proves the first part of the proposition. Turning to the second part, we note that k+1 = k (or, equivalently, k+1 = k ) implies k+2 = k+1 (again by using Proposition 3.9), and similarly j+2 = j+1 for any j ≥ k. Thus, k+1 = k implies PH = k . Decision problems that are Cookreductions to NP. The Polynomialtime Hierarchy contains all decision problems that are Cookreductions to N P (see Exercise 3.4). As shown next, the latter class contains many natural problems. Recall that in Section 2.2.2 we defined two types of optimization problems and showed that under some natural conditions these two types are computationally equivalent (under Cookreductions). Specifically, one type of problems referred to finding solutions that have a value exceeding some given threshold, whereas the second type called for finding optimal solutions. In Section 2.3 we presented several problems of the first type, and proved that they are NPcomplete. We note that corresponding versions of the second type are believed not to be in NP. For example, we discussed the problem of deciding whether or not a given graph G has a clique of a given size K , and showed that it is NPcomplete. In contract, the problem of deciding whether or not K is the maximum clique size of the graph G is not known (and quite unlikely) to be in N P, although it is Cookreducible to N P. Thus, the class of decision problems that are Cookreducible to N P contains many natural problems that are unlikely to be in N P. The Polynomialtime Hierarchy contains all these problems. Complete problems and a relation to AC0. We note that quantified Boolean formulae with a bounded number of quantifier alternations provide complete problems for the various levels of the Polynomialtime Hierarchy (see Exercise 3.7). We also note the correspondence between these formulae and (highly uniform) constantdepth circuits
116
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
3.2. THE POLYNOMIALTIME HIERARCHY
of unbounded fanin that get as input the truth table of the underlying (quantifierfree) formula (see Exercise 3.8).
3.2.2. Nondeterministic Oracle Machines The Polynomialtime Hierarchy is commonly defined in terms of nondeterministic polynomialtime (oracle) machines that are given oracle access to a set in the lower level of the same hierarchy. Such machines are defined by combining the definitions of nondeterministic (polynomialtime) machines (cf. Definition 2.7) and oracle machines (cf. Definition 1.11). Specifically, for an oracle f : {0, 1}∗ → {0, 1}∗ , a nondeterministic oracle machine M, and a string x, one considers the question of whether or not there exists an accepting (nondeterministic) computation of M on input x and access to the oracle f . The class of sets that can be accepted by nondeterministic polynomialtime (oracle) machines with access to f is denoted N P f . (We note that this notation makes sense because we can associate the class N P with a collection of machines that lends itself to being extended to oracle machines.) For any class of decision problems C, we denote by N P C the union of N P f taken over all decision problems f in C. The following result provides an alternative definition of the Polynomialtime Hierarchy. Proposition 3.11: For every k ≥ 1, it holds that k+1 = N P k . Needless to say, 1 = N P 0 , but this fact is due to simple considerations (i.e., 1 = N P = N P P = N P 0 , where only N P = N P P is nonsyntactic). Proof: Containment in one direction (i.e., k+1 ⊆ N P k ) is almost straightforward: For any S ∈ k+1 , let S ∈ k and p be as in Proposition 3.9; that is, S = {x : ∃y ∈ {0, 1} p(x) s.t. (x, y) ∈ S }. Consider the nondeterministic oracle machine that, on input x, nondeterministically generates y ∈ {0, 1} p(x) and accepts if and only if (the oracle indicates that) (x, y) ∈ S . This machine demonstrates that S ∈ N P k = N P k , where the equality holds by letting the oracle machine flip each (binary) answer that is provided by the oracle.8 For the opposite containment (i.e., N P k ⊆ k+1 ), we generalize the main idea underlying the proof of Theorem 2.35 (which referred to P N P∩coN P ). Specifically, consider any S ∈ N P k , and let M be a nondeterministic polynomialtime oracle machine that accepts S when given oracle access to S ∈ k . Note that machine M may issue several queries to S , and these queries may be determined based on previous oracle answers.9 To simplify the argument, we assume, without loss of generality, that at the very beginning of its execution machine M guesses (nondeterministically) all oracle answers and accepts only if the actual answers match its guesses. Thus, M’s queries to the oracle are determined by its input, denoted x, and its nondeterministic choices, denoted y. We denote by q (i) (x, y) the i th query made by M (on input x and nondeterministic choices y), and by a (i) (x, y) 8
Do not get confused by the fact that the class of oracles may not be closed under complementation. From the point of view of the oracle machine, the oracle is merely a function, and the machine may do with its answer whatever it pleases (and in particular negate it). 9 Indeed, this is unlike the specific machine used toward proving that k+1 ⊆ N P k .
117
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
the corresponding (a priori) guessed answer (which is a bit in y). Thus, x ∈ S if and only if there exists y ∈ {0, 1}poly(x) such that the following two conditions hold: 1. Machine M accepts when it is invoked on input x, makes nondeterministic choices y, and is given a (i) (x, y) as the answer to its i th oracle query. We denote the corresponding (“acceptance”) predicate, which is polynomialtime computable, by A(x, y). We stress that we do not assume here that the a (i) (x, y)’s are consistent with answers that would have been given by the oracle S ; this will be the subject of the next condition. The current condition refers only to the decision of M on a specific input, when M makes a specific sequence of nondeterministic choices, and is provided with specific answers. 2. Each bit a (i) (x, y) is consistent with S ; that is, for every i, it holds that a (i) (x, y) = 1 if and only if q (i) (x, y) ∈ S . Denoting the number of queries made by M (on input x and nondeterministic choices y) by q(x, y) ≤ poly(x), it follows that x ∈ S if and only if q(x,y)
∃y A(x, y) ∧ a (i) (x, y) = 1 ⇔ q (i) (x, y) ∈ S (3.1) i=1
.
···
(i) Q k yk
Denoting the verification algorithm of S by V , Eq. (3.1) equals q(x,y) a (i) (x, y) = 1 ∃y A(x, y) ∧ i=1
⇔
(i) (i) ∃y1 ∀y2
(i)
V q (x, y),
(i) (i) y1 , . . . , yk = 1
.
The proof is completed by observing that the foregoing expression can be rearranged to fit the definition of k+1 . Details follow. Starting with the foregoing expression, we first replace the subexpression E 1 ⇔ E 2 by (E 1 ∧ E 2 ) ∨ (¬E 1 ∧ ¬E 2 ), and then pull all quantifiers outside.10 This way we obtain a quantified expression with k + 1 alternating quantifiers, starting with an existential quantifier. (Note that we get k + 1 alternating quantifiers rather than k, because the case of ¬a (i) (x, y) = 1 introduces an expression of the form (i) (i) (i) (i) (i) ¬∃y1 ∀y2 · · · Q k yk V (q (i) (x, y), y1 , . . . , yk ) = 1, which in turn is equivalent to (i) (i) (i) (i) (i) the expression ∀y1 ∃y2 · · · Q k yk ¬V (q (i) (x, y), y1 , . . . , yk ) = 1.) Once this is done, we may incorporate the computation of all the q (i) (x, y)’s (and a (i) (x, y)’s) as well as the polynomial number of invocations of V (and other logical operations) into the new verification algorithm V . It follows that S ∈ k+1 . A general perspective – what does C1C2 mean? By the foregoing discussion it should be clear that the class C1C2 can be defined for two complexity classes C1 and C2 , provided that 10
For example, note that for predicates P1 and P2 , the expression ∃y (P1 (y) ⇔ ∃z P2 (y, z)) is equivalent to the expression ∃y ((P1 (y) ∧ ∃z P2 (y, z)) ∨ (¬P1 (y) ∧ ¬∃z P2 (y, z))), which in turn is equivalent to the expression ∃y∃z ∀z ((P1 (y) ∧ P2 (y, z )) ∨ (¬P1 (y) ∧ ¬P2 (y, z ))). Note that pulling the quantifiers outside in t t ∃y (i) ∀z (i) P(y (i) , z (i) ) yields an expression of the type ∃y (1) , . . . , y (t) ∀z (1) , . . . , z (t) ∧i=1 P(y (i) , z (i) ). ∧i=1
118
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
3.2. THE POLYNOMIALTIME HIERARCHY
Πk+1 Πk
PΣk
Σk
Σk+1 Figure 3.1: Two levels of the Polynomialtime Hierarchy.
C1 is associated with a class of standard machines that generalizes naturally to a class of oracle machines. Actually, the class C1C2 is not defined based on the class C1 but rather by analogy to it. Specifically, suppose that C1 is the class of sets that are recognizable (or rather accepted) by machines of a certain type (e.g., deterministic or nondeterministic) with certain resource bounds (e.g., time and/or space bounds). Then, we consider analogous oracle machines (i.e., of the same type and with the same resource bounds), and say that S ∈ C1C2 if there exists an adequate oracle machine M1 (i.e., of this type and resource bounds) and a set S2 ∈ C2 such that M1S2 accepts the set S. Decision problems that are Cookreductions to NP, revisited. Using the foregoing notation, the class of decision problems that are Cookreductions to N P is denoted P N P , and thus is a subset of N P N P = 2 (see Exercise 3.9). In contrast, recall that the class of decision problems that are Karpreductions to N P equals N P. The world view. Using the foregoing notation and relying on Exercise 3.9, we note that for every k ≥ 1 it holds that k ∪ k ⊆ P k ⊆ k+1 ∩ k+1 . See Figure 3.1 that depicts the situation, assuming that all the containments are strict.
3.2.3. The P/poly Versus NP Question and PH As stated in Section 3.1, a main motivation for the definition of P/poly is the hope that it can serve to separate P from N P (by showing that N P is not even contained in P/poly, which is a (strict) superset of P). In light of the fact that P/poly extends far beyond P (and in particular contains undecidable problems), one may wonder if this approach does not run the risk of asking too much (because it may be that N P is in P/poly even if P = N P). The common feeling is that the added power of nonuniformity is irrelevant with respect to the PvsNP Question. Ideally, we would like to know that N P ⊂ P/poly may occur only if P = N P, which may be phrased as saying that the Polynomialtime Hierarchy collapses to its zero level. The following result seems to get close to such an implication, showing that N P ⊂ P/poly may occur only if the Polynomialtime Hierarchy collapses to its second level.
119
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
Theorem 3.12: If N P ⊂ P/poly then 2 = 2 . Recall that 2 = 2 implies PH = 2 (see Proposition 3.10). Thus, an unexpected behavior of the nonuniform complexity class P/poly implies an unexpected behavior in the world of uniform complexity (which is the habitat of PH). Proof: Using the hypothesis (i.e., N P ⊂ P/poly) and starting with an arbitrary set S ∈ 2 , we shall show that S ∈ 2 . Let us describe, first, our highlevel approach. Loosely speaking, S ∈ 2 means that x ∈ S if and only if for all y there exists a z such that some (fixed) polynomialtime verifiable condition regarding (x, y, z) holds. Note that the residual condition regarding (x, y) is of the NPtype, and thus (by the hypothesis) it can be verified by a polynomialsize circuit. This suggests saying that x ∈ S if and only if there exists an adequate circuit C such that for all y it holds that C(x, y) = 1. Thus, we managed to switch the order of the universal and existential quantifiers. Specifically, the resulting assertion is of the desired 2 type provided that we can either verify the adequacy condition in coN P (or even in 2 ) or keep out of trouble even in the case that x ∈ S and C is inadequate. In the following proof we implement the latter option by observing that the hypothesis yields small circuits for NPsearch problems (and not only for NPdecision problems). Specifically, we obtain (small) circuits that, given (x, y), find an NPwitness for (x, y) (whenever such a witness exists), and rely on the fact that we can efficiently verify the correctness of NPwitnesses. (The alternative approach of providing a coNPtype procedure for verifying the adequacy of the circuit is pursued in Exercise 3.11.) We now turn to a detailed implementation of the foregoing approach. Let S be an arbitrary set in 2 . Then, by Proposition 3.9, there exists a polynomial p and a set S ∈ N P such that S = {x : ∀y ∈ {0, 1} p(x) (x, y) ∈ S }. Let R ∈ PC be the witness relation corresponding to S ; that is, there exists a polynomial p , such that x = x, y ∈ S if and only if there exists z ∈ {0, 1} p (x ) such that (x , z) ∈ R . It follows that
S = {x : ∀y ∈ {0, 1} p(x) ∃z ∈ {0, 1} p (x,y) (x, y, z) ∈ R }.
(3.2)
Our argument proceeds essentially as follows. By the reduction of PC to N P (see Theorem 2.10), the theorem’s hypothesis (i.e., N P ⊆ P/poly) implies the existence of polynomialsize circuits for solving the search problem of R . Using the existence of these circuits, it follows that for any x ∈ S there exists a small circuit C such that for every y it holds that C (x, y) ∈ R (x, y) (because x, y ∈ S and hence R (x, y) = ∅). On the other hand, for any x ∈ S there exists a y such that x, y ∈ S , and hence for any circuit C it holds that C (x, y) ∈ R (x, y) (for the trivial reason that R (x, y) = ∅). Thus, x ∈ S if and only if there exists a poly(x + p(x))size circuit C such that for all y ∈ {0, 1} p(x) it holds that (x, y, C (x, y)) ∈ R . Letting V (x, C , y) = 1 if and only if (x, y, C (x, y)) ∈ R , we infer that S ∈ 2 . Details follow. Let us first spell out what we mean by polynomialsize circuits for solving a search problem and further justify their existence for the search problem of R . In Section 3.1, we have focused on polynomialsize circuits that solve decision problems. However, the definition sketched in Section 3.1.1 also applies to solving search problems, provided that an appropriate convention is used for encoding
120
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER NOTES
solutions of possibly varying lengths (for instances of fixed length) as strings of fixed length. Next, observe that combining the Cookreduction of PC to N P with the hypothesis N P ⊆ P/poly implies that PC is Cookreducible to P/poly. In particular, this implies that any search problem in PC can be solved by a family of polynomialsize circuits. Note that the resulting circuit that solves nbit long instances of such a problem may incorporate polynomially (in n) many circuits, each solving a decision problem for mbit long instances, where m ∈ [poly(n)]. Needless to say, the size of the resulting circuit that solves the search problem of the aforementioned R ∈ PC (for instances of length n) is upperbounded by poly(n) poly(n) · m=1 poly(m). We next (revisit and) establish the claim that x ∈ S if and only if there exists a poly(x + p(x))size circuit C such that for all y ∈ {0, 1} p(x) it holds that (x, y, C (x, y)) ∈ R . Recall that x ∈ S if and only if for every y ∈ {0, 1} p(x) it holds that (x, y) ∈ S , which means that there exists z ∈ {0, 1} p (x) such that (x, y, z) ∈ R . Also recall that (by the foregoing discussion) there exist polynomialsize circuits for solving the search problem of R . Thus, in the case that x ∈ S, we just use the corresponding circuit C that solves the search problem of R on inputs of length x + p(x). Indeed, this circuit C only depends on n = x + p(x), which in turn is determined by x, and for every x ∈ {0, 1}n it holds that (x , C (x )) ∈ R if and only if x ∈ S . Thus, for x ∈ S, there exists a poly(x + p(x))size circuit C such that for every y ∈ {0, 1} p(x) it holds that (x, y, C (x, y)) ∈ R . On the other hand, if x ∈ S then there exists a y such that for all z it holds that (x, y, z) ∈ R . It follows that, in this case, for every C there exists a y such that (x, y, C (x, y)) ∈ R . We conclude that x ∈ S if and only if ∃C ∈ {0, 1}poly(x+ p(x)) ∀y ∈ {0, 1} p(x) (x, y, C (x, y)) ∈ R .
(3.3)
The key observation regarding the condition stated in Equation (3.3) is that it is of the desired form (of a 2 statement). Specifically, consider the polynomialtime verification procedure V that given x, y and the description of the circuit C , first computes z ← C (x, y) and accepts if and only if (x, y, z) ∈ R , where the latter condition can be verified in polynomial time (because R ∈ PC). Denoting the description of a potential circuit by C , the aforementioned (polynomialtime) computation of V is denoted V (x, C , y), and indeed x ∈ S if and only if ∃C ∈ {0, 1}poly(x+ p(x)) ∀y ∈ {0, 1} p(x) V (x, C , y) = 1. Having established that S ∈ 2 for an arbitrary S ∈ 2 , we conclude that 2 ⊆ 2 . The theorem follows (by applying Exercise 3.9.4).
Chapter Notes The class P/poly was defined by Karp and Lipton [139] as part of a general formulation of “machines that take advice” [139]. They also noted the equivalence to the traditional formulation of polynomialsize circuits as well as the effect of uniformity (Proposition 3.4). The PolynomialTime Hierarchy (PH) was introduced by Stockmeyer [213]. A third equivalent formulation of PH (via socalled alternating machines) can be found in [52].
121
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
The implication of the failure of the conjecture that N P is not contained in P/poly on the Polynomialtime Hierarchy (i.e., Theorem 3.12) was discovered by Karp and Lipton [139]. This interesting connection between nonuniform and uniform complexity provides the main motivation for presenting P/poly and PH in the same chapter.
Exercises Exercise 3.1 (a small variation on the definitions of P/poly): Using an adequate encoding of strings of length smaller than n as nbit strings (e.g., x ∈ ∪i 1, consider first an analogous complete problem for k , and then consider its complement.
Exercise 3.8 (on the relation between PH and AC 0 ): Note that there is an obvious analogy between PH and constantdepth circuits of unbounded fanin, where existential (resp., universal) quantifiers are represented by “large” (resp., ) gates. To articulate this relationship, consider the following definitions. • A family of circuits {C N } is called highly uniform if there exists a polynomialtime algorithm that answers local queries regarding the structure of the relevant circuit. Specifically, on input (N , u, v), the algorithm determines the type of gates represented by the vertices u and v in C N as well as whether there exists a directed edge from u to v. If the vertex represents a terminal then the algorithm also indicates the index of the corresponding inputbit (or outputbit). Note that this algorithm operates in time that is polylogarithmic in the size of C N . We focus on the family of polynomialsize circuits, meaning that the size of C N is polynomial in N , which in turn represents the number of inputs to C N . • Fixing a polynomial p, a psuccinctly represented input Z ∈ {0, 1} N is a circuit c Z of size at most p(log2 N ) such that for every i ∈ [N ] it holds that c Z (i) equals the i th bit of Z . • For a fixed family of highly uniform circuits {C N } and a fixed polynomial p, the problem of evaluating a succinctly represented input is defined as follows. Given psuccinct representation of an input Z ∈ {0, 1} N , determine whether or not C N (Z ) = 1. Prove the following relationship between PH and the problem of evaluating a succinctly represented input with respect to some families of highly uniform circuits of bounded depth. 1. For every k and every S ∈ k , show that there exists a family of highly uniform unbounded fanin circuits of depth k and polynomial size such that S is Karpreducible to evaluating a succinctly represented input (with respect to that family of circuits). That is, the reduction should map an instance x ∈ {0, 1}n to a psuccinct representation of some Z ∈ {0, 1} N such that x ∈ S if and only if C N (Z ) = 1. (Note
123
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
that Z is represented by a circuit c Z such that log2 N ≤ c Z  ≤ poly(n), and thus N ≤ exp(poly(n)).)11 Guideline: Let S ∈ k and let V be the corresponding verification algorithm as in Definition 3.8. That is, x ∈ S if and only if ∃y1 ∀y2 · · · Q k yk , where each yi ∈ {0, 1}poly(x) such k·m that V (x, y , consider the fixed circuit 1 , . . . , y k ) = 1. Then, for m = poly(x) and N = 2 C N (Z ) = i1 ∈[2m ] i2 ∈[2m ] · · · Q ik ∈[2m ] Z i1 ,i2 ,...,ik , and the problem of evaluating C N at an input consisting of the truth table of V (x, · · ·) (i.e., when setting Z i1 ,i2 ,...,ik = V (x, i 1 , . . . , i k ), where [2m ] ≡ {0, 1}m , which means that Z is essentially represented by x).12 Note that the size of C N is O(N ).
2. For every k and every fixed family of highly uniform unbounded fanin circuits of depth k and polynomial size, show that the corresponding problem of evaluating a succinctly represented input is either in k or in k . Guideline: Given a succinct representation of Z , the value of C N (Z ) can be captured by a quantified Boolean formula with k alternating quantifier sequences. This formula quantifies on certain paths from the output of C N to its input terminals; for example, an ∨gate (resp., ∧gate) evaluates to 1 if and only if one (resp., all) of its children evaluates to 1. The children of a vertex as well as the corresponding inputbits can be efficiently recognized based on the uniformity condition regarding C N . The value of the inputbit itself can be efficiently computed from the succinct representation of Z .
Exercise 3.9: Verify the following facts: 1. For every k ≥ 0, it holds that k ⊆ P k ⊆ k+1 . (Recall that, for any complexity class C, the class P C denotes the class of sets that are Cookreducible to some set in C. In particular, P P = P.) 2. For every k ≥ 0, k ⊆ P k ⊆ k+1 . (Hint: For any complexity class C, it holds that P C = P coC and P C = coP C .) 3. For every k ≥ 0, it holds that k ⊆ k+1 and k ⊆ k+1 . Thus, PH = ∪k k . 4. For every k ≥ 0, if k ⊆ k (resp., k ⊆ k ) then k = k . (Hint: See Exercise 2.37.) Exercise 3.10: In continuation of Exercise 3.7, prove the following claims: 1. SAT is computationally equivalent (under Karpreductions) to 1QBF. 2. For every k ≥ 1, it holds that P k = P kQBF and k+1 = N P kQBF . Guideline: Prove that if S is Ccomplete then P C = P S . Note that P C ⊆ P S uses the polynomialtime reductions of C to S, whereas P S ⊆ P C uses S ∈ C. 11
Assuming P = N P, it cannot be that N ≤ poly(n) (because circuit evaluation can be performed in time polynomial in the size of the circuit). 12 Advanced comment: Note that the computational limitations of AC 0 circuits (see, e.g., [83, 115]) imply limitations on the functions of a generic input Z that the aforementioned circuits C N can compute. More importantly, (1) is generic and each bit of Z equals either some fixed these limitations apply also to Z = h(Z ), where Z ∈ {0, 1} N bit in Z or its negation. Unfortunately, these computational limitations do not seem to provide useful information on the limitations of functions of inputs Z that have succinct representation (as obtained by setting Z i1 ,i2 ,...,ik = V (x, i 1 , . . . , i k ), where V is a fixed polynomialtime algorithm and only x ∈ {0, 1}poly(log N ) varies). This fundamental problem is “resolved” in the context of “relativization” by providing V with oracle access to an arbitrary input of length N (1) (or so); cf. [83].
124
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Exercise 3.11 (an alternative proof of Theorem 3.12): In continuation of the discussion in the proof of Theorem 3.12, use the following guidelines to provide an alternative proof of Theorem 3.12. 1. First, prove that if T is downward selfreducible (as defined in Exercise 2.13) then the correctness of circuits deciding T can be decided in coN P. Specifically, denoting by χ the characteristic function of T , show that the set def
cktχ = {(1n , C) : ∀w ∈ {0, 1}n C(w) = χ(w)} is in coN P. Note that you may assume nothing about T , except for the hypothesis that T is downward selfreducible. Guideline: Using the more flexible formulation suggested in Exercise 3.1, it suffices to verify that, for every i < n and every ibit string w, the value C(w) equals the output of the downward selfreduction on input w when obtaining answers according to C. Thus, for every i < n, the correctness of C on inputs of length i follows from its correctness on inputs of length less than i. Needless to say, the correctness of C on the empty string (or on all inputs of some constant length) can be verified by comparison to the fixed value of χ on the empty string (resp., the values of χ on a constant number of strings).
2. Recalling that SAT is downward selfreducible and that N P is Karpreducible to SAT, derive Theorem 3.12 as a corollary of Part 1. Guideline: Let S ∈ 2 and S ∈ N P be as in the proof of Theorem 3.12. Letting f denote a Karpreduction of S to SAT, note that S = {x : ∀y ∈ {0, 1} p(x) f (x, y) ∈ SAT}. Using the hypothesis that SAT has polynomialsize circuits, note that x ∈ S if and only if there exists a poly(x)size circuit C such that (1) C decides SAT correctly on every input of length at most poly(x), and (2) for every y ∈ {0, 1} p(x) it holds that C( f (x, y)) = 1. Infer that S ∈ 2 .
Exercise 3.12: In continuation of Part 2 of Exercise 3.2, we consider the class of sets that are Karpreducible to a sparse set. It can be proven that this class contains SAT if and only if P = N P (see [81]). Here, we consider only the special case in which the sparse set is contained in a polynomialtime decidable set that is itself sparse (e.g., the latter set may be {1}∗ , in which case the former set may be an arbitrary unary set). Actually, prove the following seemingly stronger claim: If SAT is Karpreducible to a set S ⊆ G such that G ∈ P and G \ S is sparse then SAT ∈ P. Using the hypothesis, we outline a polynomialtime procedure for solving the search problem of SAT, and leave the task of providing the details as an exercise. The procedure conducts a DFS on the tree of all possible partial truth assignments to the input formula,13 while truncating the search at nodes that correspond to partial truth assignments that were already demonstrated to be useless. Guideline: The key observation is that each internal node (which yields a formula derived from the initial formulae by instantiating the corresponding partial truth assignment) is mapped by the Karpreduction either to a string not in G (in which case we conclude that the subtree contains no satisfying assignments and backtrack from 13
For an nvariable formulae, the leaves of the tree correspond to all possible nbit long strings, and an internal node corresponding to τ is the parent of the nodes corresponding to τ 0 and τ 1.
125
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
VARIATIONS ON P AND NP
this node) or to a string in G. In the latter case, unless we already know that this string is not in S, we start a scan of the subtree rooted at this node. However, once we backtrack from this internal node, we know that the corresponding element of G is not in S, and we will never scan again a subtree rooted at a node that is mapped to this element. Also note that once we reach a leaf, we can check by ourselves whether or not it corresponds to a satisfying assignment to the initial formula. (Hint: When analyzing the foregoing procedure, note that on input an nvariable poly(φ) formulae φ the number of times we start to scan a subtree is at most n ·  ∪i=1 i {0, 1} ∩ (G \ S).)
126
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER FOUR
More Resources, More Power?
More electricity, less toil. The Israeli Electricity Company, 1960s Is it indeed the case that the more resources one has, the more one can achieve? The answer may seem obvious, but the obvious answer (of yes) actually presumes that the worker knows what resources are at his/her disposal. In this case, when allocated more resources, the worker (or computation) can indeed achieve more. But otherwise, nothing may be gained by adding resources. In the context of Computational Complexity, an algorithm knows the amount of resources that it is allocated if it can determine this amount without exceeding the corresponding resources. This condition is satisfied in all “reasonable” cases, but it may not hold in general. The latter fact should not be that surprising: We already know that some functions are not computable, and if these functions are used to determine resources then the algorithm may be in trouble. Needless to say, this discussion requires some formalization, which is provided in the current chapter. Summary: When using “nice” functions to determine an algorithm’s resources, it is indeed the case that more resources allow for more tasks to be performed. However, when “ugly” functions are used for the same purpose, increasing the resources may have no effect. By nice functions we mean functions that can be computed without exceeding the amount of resources that they specify (e.g., t(n) = n 2 or t(n) = 2n ). Naturally, “ugly” functions do not allow for presenting themselves in such nice forms. The foregoing discussion refers to uniform models of computation and to (natural) resources such as time and space complexities. Thus, we get results asserting, for example, that there are problems that are solvable in cubic time but not in quadratic time. In case of nonuniform models of computation, the issue of “nicety” does not arise, and it is easy to establish separations between levels of circuit complexity that differ by any unbounded amount. Results that separate the class of problems solvable within one resource bound from the class of problems solvable within a larger resource bound are called hierarchy theorems. Results that indicate the nonexistence of
127
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
MORE RESOURCES, MORE POWER?
such separations, hence indicating a “gap” in the growth of computing power (or a “gap” in the existence of algorithms that utilize the added resources), are called gap theorems. A somewhat related phenomenon, called speedup theorems, refers to the inability to define the complexity of some problems. Caveat. Uniform complexity classes based on specific resource bounds (e.g., cubic time) are model dependent. Furthermore, the tightness of separation results (i.e., how much “more time” is required for solving some additional computational problems) is also model dependent. Still, the existence of such separations is a phenomenon common to all reasonable and general models of computation (as referred to in the CobhamEdmonds Thesis). In the following presentation, we will explicitly differentiate modelspecific effects from generic ones. Organization. We will first demonstrate the “more resources yield more power” phenomenon in the context of nonuniform complexity. In this case, the issue of “knowing” the amount of resources allocated to the computing device does not arise, because each device is tailored to the amount of resources allowed for the input length that it handles (see Section 4.1). We then turn to the time complexity of uniform algorithms; indeed, hierarchy and gap theorems for time complexity, presented in Section 4.2, constitute the main part of the current chapter. We end by mentioning analogous results for space complexity (see Section 4.3, which may also be read after Section 5.1).
4.1. Nonuniform Complexity Hierarchies The model of machines that use advice (cf. §1.2.4.2 and Section 3.1.2) offers a very convenient setting for separation results. We refer specifically to classes of the form P/, where : N → N is an arbitrary function (see Definition 3.5). Recall that every Boolean function is in P/2n , by virtue of a trivial algorithm that is given, as advice, the truth table of the function (restricted to the relevant input length). An analogous algorithm underlies the following separation result. Theorem 4.1: For any two functions , δ : N → N such that (n) + δ(n) ≤ 2n and δ is unbounded, it holds that P/ is strictly contained in P/( + δ). def
Proof: Let = + δ, and consider the following advicetaking algorithm A: Given advice an ∈ {0, 1}(n) and input i ∈ {1, . . . , 2n } (viewed as an nbit long string), algorithm A outputs the i th bit of an if i ≤ an  and zero otherwise. Clearly, for any def a = (an )n∈N such that an  = (n), it holds that the function f a (x) = A(ax , x) is in P/. Furthermore, different sequences a yield different functions f a . We claim that some of these functions f a are not in P/ , thus obtaining a separation. The claim is proved by considering all possible (polynomialtime) algorithms A and all possible sequences a = (an )n∈N such that an  = (n). Fixing any algorithm A , we consider the number of nbit long functions that are correctly computed by A (an , ·). Clearly, the number of these functions is at most 2 (n) , and thus A may account for at most 2−δ(n) fraction of the functions f a (even when restricted to nbit strings). Essentially, this consideration holds for every n and every possible A , and
128
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
4.2. TIME HIERARCHIES AND GAPS
thus the measure of the set of functions that are computable by algorithms that take advice of length is zero. Formally, for every n, we consider all advicetaking algorithms that have a description of length shorter than δ(n) − 2. (This guarantees that every advicetaking algorithm will be considered.) Coupled with all possible advice sequences of length , these algorithms can compute at most 2(δ(n)−2)+ (n) different functions of nbit long inputs. The latter number falls short of the 2(n) corresponding functions (of nbit long inputs) that are computable by A with advice of length (n). A somewhat less tight bound can be obtained by using the model of Boolean circuits. In this case, some slackness is needed in order to account for the gap between the upper and lower bounds regarding the number of Boolean functions over {0, 1}n that are computed by Boolean circuits of size s < 2n . Specifically (see Exercise 4.1), an obvious lower bound on this number is 2s/O(log s) whereas an obvious upper bound is s 2s = 22s log2 s . Compare these bounds to the lowerbound 2 (n) and the upperbound 2 (n)+(δ(n)/2) (on the number of functions computable with advice of length (n)), which were used in the proof of Theorem 4.1.
4.2. Time Hierarchies and Gaps In this section we show that in “reasonable cases,” increasing the time complexity allows for more problems to be solved, whereas in “pathological cases,” it may happen that even a dramatic increase in the time complexity provides no additional computing power. As hinted in the introductory comments to the current chapter, the “reasonable cases” correspond to time bounds that can be determined by the algorithm itself within the specified time complexity. We stress that also in the aforementioned “reasonable cases,” the added power does not necessarily refer to natural computational problems. That is, like in the case of nonuniform complexity (i.e., Theorem 4.1), the hierarchy theorems are proven by introducing artificial computational problems. Needless to say, we do not know of natural problems in P that are unsolvable in cubic (or some other fixed polynomial) time (on, say, a twotape Turing machine). Thus, although P contains an infinite hierarchy of computational problems, with each level requiring significantly more time than the previous level, we know of no such hierarchy of natural computational problems. In contrast, so far it has been the case that any natural problem that was shown to be solvable in polynomial time was eventually followed by algorithms having running time that is bounded by a moderate polynomial.
4.2.1. Time Hierarchies Note that the nonuniform computing devices, considered in Section 4.1, were explicitly given the relevant resource bounds (e.g., the length of advice). Actually, they were given the resources themselves (e.g., the advice itself) and did not need to monitor their usage of these resources. In contrast, when designing algorithms of arbitrary time complexity t : N → N, we need to make sure that the algorithm does not exceed the time bound. Furthermore, when invoked on input x, the algorithm is not given the time bound t(x) explicitly, and a reasonable design methodology is to have the algorithm compute this bound (i.e., t(x)) before doing anything else. This, in turn, requires the algorithm to
129
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
MORE RESOURCES, MORE POWER?
read the entire input (see Exercise 4.3) as well as to compute t(n) in O(t(n)) steps (as otherwise this preliminary stage already consumes too much time). The latter requirement motivates the following definition (which is related to the standard definition of “fully time constructibility” (cf. [123, Sec. 12.3])). Definition 4.2 (time constructible functions): A function t : N → N is called time constructible if there exists an algorithm that on input n outputs t(n) using at most t(n) steps. Equivalently, we may require that the mapping 1n !→ t(n) be computable within time complexity t. We warn that the foregoing definition is model dependent; however, typically nice functions are computable even faster (e.g., in poly(log t(n)) steps), in which case the model dependency is irrelevant (for reasonable and general models of computation, as referred to in the CobhamEdmonds Thesis). For example, in any reasonable and n general model, functions like t1 (n) = n 2 , t2 (n) = 2n , and t3 (n) = 22 are computable in poly(log ti (n)) steps. Likewise, for a fixed model of computation (to be understood from the context) and for any function t : N → N, we denote by DTIME(t) the class of decision problems that are solvable in time complexity t. We call the reader’s attention to Exercise 4.4 that asserts that in many cases DTIME(t) = DTIME(t/2).
4.2.1.1. The Time Hierarchy Theorem In the following theorem (which separates DTIME(t1 ) from DTIME(t2 )), we refer to the model of twotape Turing machines. In this case we obtain quite a tight hierarchy in terms of the relation between t1 and t2 . We stress that, using the CobhamEdmonds Thesis, this result yields (possibly less tight) hierarchy theorems for any reasonable and general model of computation. Teaching note: The standard statement of Theorem 4.3 asserts that for any timeconstructible function t2 and every function t1 such that t2 = ω(t1 log t1 ) and t1 (n) > n it holds that DTIME(t1 ) is strictly contained in DTIME(t2 ). The current version is only slightly weaker, but it allows a somewhat simpler and more intuitive proof. We comment on the proof of the standard version of Theorem 4.3 in a teaching note following the proof of the current version.
Theorem 4.3 (time hierarchy for twotape Turing machines): For any timeconstructible function t1 and every function t2 such that t2 (n) ≥ (log t1 (n))2 · t1 (n) and t1 (n) > n it holds that DTIME(t1 ) is strictly contained in DTIME(t2 ). As will become clear from the proof, an analogous result holds for any model in which a universal machine can emulate t steps of another machine in O(t log t) time, where the constant in the Onotation depends on the emulated machine. Before proving Theorem 4.3, we derive the following corollary. Corollary 4.4 (time hierarchy for any reasonable and general model): For any reasonable and general model of computation there exists a positive polynomial p such that for any timecomputable function t1 and every function t2 such that t2 > p(t1 ) and t1 (n) > n it holds that DTIME(t1 ) is strictly contained in DTIME(t2 ).
130
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
4.2. TIME HIERARCHIES AND GAPS
It follows that, for every such model and every polynomial t (such that t(n) > n), there exist problems in P that are not in DTIME(t). It also follows that P is a strict subset of E and even of “quasipolynomial time” (i.e., DTIME(q), where q(n) = exp(poly(log n))); moreover, P is a strict subset of DTIME(q), for any superpolynomial function q (i.e., q(n) = n ω(1) ). We comment that Corollary 4.4 can be proven directly (rather than by invoking Theorem 4.3). This can be done by implementing the ideas that underlie the proof of Theorem 4.3 directly to the model of computation at hand (see Exercise 4.5). In fact, such a direct implementation, which is allowed “polynomial slackness” (i.e., t2 > p(t1 )), is less cumbersome than the implementation presented in the proof of Theorem 4.3 where only 1 )). We also note that a polylogarithmic factor is allowed in the slackness (i.e., t2 ≥ O(t the separation result in Corollary 4.4 can be tightened – for details see Exercise 4.6. Proof of Corollary 4.4: The underlying fact is that separation results regarding any reasonable and general model of computation can be “translated” to analogous results regarding any other such model. Such a translation may affect the time bounds as demonstrated next. Letting DTIME2 denote the classes that correspond to twotape Turing machines (and recalling that DTIME denotes the classes that correspond to the alternative model), we note that DTIME(t1 ) ⊆ DTIME2 (t1 ) and DTIME2 (t2 ) ⊆ DTIME(t2 ), where t1 = poly(t1 ) and t2 is defined such that t2 (n) = poly(t2 (n)). The latter unspecified polynomials, hereafter denoted p1 and p2 , respectively, are the ones guaranteed by the CobhamEdmonds Thesis. Also, the hypothesis that t1 is timeconstructible implies that t1 = p1 (t1 ) is timeconstructible with respect to the twotape Turing machine model. Thus, for a suitable choice of the polynomial p (i.e., p( p1−1 (m)) ≥ p2 (m 2 )), it holds that t2 (n) = p2−1 (t2 (n)) > p2−1 ( p(t1 (n))) = p2−1 p p1−1 (t1 (n)) ≥ t1 (n)2 , where the first inequality holds by the corollary’s hypothesis (i.e., t2 > p(t1 )) and the last inequality holds by the choice of p. Invoking Theorem 4.3 (while noting that t2 (n) > t1 (n)2 ), we obtain the strict inclusion DTIME2 (t1 ) ⊂ DTIME2 (t2 ). Combining the latter with DTIME(t1 ) ⊆ DTIME2 (t1 ) and DTIME2 (t2 ) ⊆ DTIME(t2 ), the corollary follows. Proof of Theorem 4.3: The idea is constructing a Boolean function f such that all machines having time complexity t1 fail to compute f . This is done by associating with each possible machine M a different input x M (e.g., x M = M) and making sure that f (x M ) = M (x M ), where M (x) denotes an emulation of M(x) that is suspended after t1 (x) steps. For example, we may define f (x M ) = 1 − M (x M ). We note that M is used instead of M in order to allow for computing f in time that is related to t1 . The point is that M may be an arbitrary machine that is associated with the input x M , and so M does not necessarily run in time t1 (but, by construction, the corresponding M does run in time t1 ). Implementing the foregoing idea calls for an efficient association of machines to inputs as well as for a relatively efficient emulation of t1 steps of an arbitrary machine. As shown next, both requirements can be met easily. Actually, we are going to use a mapping µ of inputs to machines (i.e., µ will map the aforementioned x M to M) such that each machine is in the range of µ and µ is very easy to compute
131
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
MORE RESOURCES, MORE POWER?
(e.g., indeed, for starters, assume that µ is the identity mapping). Thus, by construction, f ∈ DTIME(t1 ). The issue is presenting a relatively efficient algorithm for computing f , that is, showing that f ∈ DTIME(t2 ). The algorithm for computing f as well as the definition of f (sketched in the first paragraph) are straightforward: On input x, the algorithm computes t = t1 (x), determines the machine M = µ(x) that corresponds to x (outputting a default value if no such machine exists), emulates M(x) for t steps, and returns the value 1 − M (x). Recall that M (x) denotes the timetruncated emulation of M(x) (i.e., the emulation of M(x) suspended after t steps); that is, if M(x) halts within t steps then M (x) = M(x), and otherwise M (x) may be defined arbitrarily. Thus, f (x) = 1 − M (x) if M = µ(x) and (say) f (x) = 0 otherwise. In order to show that f ∈ DTIME(t1 ), we show that each machine of time complexity t1 fails to compute f . Fixing any such machine, M, we consider an input x M such that M = µ(x M ), where such an input exists because µ is onto. Now, on the one hand, M (x M ) = M(x M ) (because M has time complexity t1 ), while on the other hand, f (x M ) = 1 − M (x M ) (by the definition of f ). It follows that M(x) = f (x). We now turn to upperbounding the time complexity of f by analyzing the time complexity of the foregoing algorithm that computes f . Using the time constructibility of t1 and ignoring the easy computation of µ, we focus on the question of how much time is required for emulating t steps of machine M (on input x). We should bear in mind that the time complexity of our algorithm needs to be analyzed in the twotape Turing machine model, whereas M itself is a twotape Turing machine. We start by implementing our algorithm on a threetape Turing machine, and next emulate this machine on a twotape Turing machine. The obvious implementation of our algorithm on a threetape Turing machine uses two tapes for the emulation itself and designates the third tape for the actions of the emulation procedure (e.g., storing the code of the emulated machine and maintaining a stepcounter). Thus, each step of the twotape machine M is emulated using O(M) steps on the threetape machine.1 This also includes the amortized complexity of maintaining a step counter for the emulation (see Exercise 4.7). Next, we need to emulate the foregoing threetape machine on a twotape machine. This is done by using the fact (cf., e.g., [123, Thm. 12.6]) that t steps of a threetape machine can be emulated on a twotape machine in O(t log t ) steps. Thus, the complexity of computing f on input x is upperbounded by O(Tµ(x) (x) log Tµ(x) (x)), where TM (n) = O(M · t1 (n)) represents the cost of emulating t1 (n) steps of the twotape machine M on a threetape machine (as in the foregoing discussion). It turns out that the quality of the separation result that we obtain depends on the choice of the mapping µ (of inputs to machines). Using the naive (identity) · t1 (n)) mapping (i.e., µ(x) = x) we can only establish the theorem for t2 (n) = O(n rather than t2 (n) = O(t1 (n)), because in this case Tµ(x) (x) = O(x · t1 (x)). (Note that, in this case, x M = M is a description of µ(x M ) = M.) The theorem follows by associating the machine M with the input x M = M01m , where m = 2M ; that M is, we may use the mapping µ such that µ(x) = M if x = M012 and µ(x) equals some fixed machine otherwise. In this case µ(x) < log2 x < log t1 (x) and so Tµ(x) (x) = O((log t1 (x)) · t1 (x)). The theorem follows. 1
This overhead accounts both for searching the code of M for the adequate action and for the effecting of this action (which may refer to a larger alphabet than the one used by the emulator).
132
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
4.2. TIME HIERARCHIES AND GAPS
Teaching note: Proving the standard version of Theorem 4.3 cannot be done by associating a sufficiently long input x M with each machine M, because this does not allow for geting rid of the additional unbounded factor in Tµ(x) (x) (i.e., the µ(x) factor that multiplies t1 (x)). Note that the latter factor needs to be computable (at the very least) and thus cannot be accounted for by the generic ωnotation that appears in the standard version (cf. [123, Thm. 12.9]). Instead, a different approach is taken (see footnote 2).
Technical comments. The proof of Theorem 4.3 associates with each potential machine M some input x M and defines the computational problem such that machine M errs on input x M . The association of machines with inputs is rather flexible: We can use any onto mapping of inputs to machines that is efficiently computable and sufficiently shrinking. M Specifically, in the proof, we used the mapping µ such that µ(x) = M if x = M012 and µ(x) equals some fixed machine otherwise. We comment that each machine can be M made to err on infinitely many inputs by redefining µ such that µ(x) = M if M012 is a suffix of x (and µ(x) equals some fixed machine otherwise). We also comment that, in contrast to the proof of Theorem 4.3, the proof of Theorem 1.5 utilizes a rigid mapping of inputs to machines (i.e., there µ(x) = M if x = M). Digest: Diagonalization. The last comment highlights the fact that the proof of Theorem 4.3 is merely a sophisticated version of the proof of Theorem 1.5. Both proofs refer to versions of the universal function, which in the case of the proof of Theorem 4.3 is (implicitly) defined such that its value at (M, x) equals M (x), where M (x) denotes an emulation of M(x) that is suspended after t1 (x) steps.3 Actually, both proofs refers to the “diagonal” of the aforementioned function, which in the case of the proof of Theorem 4.3 is only defined implicitly. That is, the value of the diagonal function at x, denoted d(x), equals the value of the universal function at (µ(x), x). This is actually a definitional schema, as the choice of the function µ remains unspecified. Indeed, setting µ(x) = x corresponds to a “real” diagonal in the matrix depicting the universal function, but any other choice of a 11 mappings µ also yields a “kind of diagonal” of the universal function. Either way, the function f is defined such that for every x it holds that f (x) = d(x). This guarantees that no machine of time complexity t1 can compute f , and the focus is on presenting an algorithm that computes f (which, needless to say, has time complexity greater than t1 ). Part of the proof of Theorem 4.3 is devoted to selecting µ in a way that minimizes the time complexity of computing f , whereas in the proof of Theorem 1.5 we merely need to guarantee that f is computable.
4.2.1.2. Impossibility of Speedup for Universal Computation The time hierarchy theorem (Theorem 4.3) implies that the computation of a universal def machine cannot be significantly sped up. That is, consider the function u (M, x, t) = y if 2
In the standard proof the function f is not defined with reference to t1 (x M ) steps of M(x M ), but rather with reference to the result of emulating M(x M ) while using a total of t2 (x M ) steps in the emulation process (i.e., in the algorithm used to compute f ). This guarantees that f is in DTIME(t2 ), and “pushes the problem” to showing that f is not in DTIME(t1 ). It also explains why t2 (rather than t1 ) is assumed to be timeconstructible. As for the foregoing problem, it is resolved by observing that for each relevant machine (i.e., having time complexity t1 ) the executions on any sufficiently long input will be fully emulated. Thus, we merely need to associate with each M a disjoint set of infinitely many inputs and make sure that M errs on each of these inputs. 3 Needless to say, in the proof of Theorem 1.5, M = M.
133
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
MORE RESOURCES, MORE POWER? def
on input x machine M halts within t steps and outputs the string y, and u (M, x, t) = ⊥ if on input x machine M makes more than t steps. Recall that the value of u (M, x, t) can be computed in O(x + M · t) steps. As shown next, Theorem 4.3 implies that this value (i.e., u (M, x, t)) cannot be computed within significantly fewer steps. Theorem 4.5: There exists no twotape Turing machine that, on input M, x and t, computes u (M, x, t) in o((t + x) · f (M)/ log2 (t + x)) steps, where f is an arbitrary function. A similar result holds for any reasonable and general model of computation (cf., Corollary 4.4). In particular, it follows that u is not computable in polynomial time (because the input t is presented in binary). In fact, one can show that there exists no polynomialtime algorithm for deciding whether or not M halts on input x in t steps (i.e., the set {(M, x, t) : u (M, x, t) = ⊥} is not in P); see Exercise 4.8. Proof: Suppose (toward the contradiction) that, for every fixed M, given x and t > x, the value of u (M, x, t) can be computed in o(t/ log2 t) steps, where the onotation hides a constant that may depend on M. We shall show that this hypothesis implies that for any timeconstructible t1 and t2 (n) = t1 (n) · log2 t1 (n) it holds that DTIME(t2 ) = DTIME(t1 ), which (strongly) contradicts Theorem 4.3. Consider an arbitrary timeconstructible t1 (s.t. t1 (n) > n) and an arbitrary set S ∈ DTIME(t2 ), where t2 (n) = t1 (n) · log2 t1 (n). Let M be a machine of time complexity t2 that decides membership in S, and consider the following algorithm: On input x, the algorithm first computes t = t1 (x), and then computes (and outputs) the value u (M, x, t log2 t). By the time constructibility of t1 , the first computation can be implemented in t steps, and by the contradiction hypothesis the same holds for the second computation. Thus, S can be decided in DTIME(2t1 ) = DTIME(t1 ), implying that DTIME(t2 ) = DTIME(t1 ), which in turn contradicts Theorem 4.3. We conclude that the contradiction hypothesis is wrong, and the theorem follows.
4.2.1.3. Hierarchy Theorem for Nondeterministic Time Analogously to DTIME, for a fixed model of computation (to be understood from the context) and for any function t : N → N, we denote by NTIME(t) the class of sets that are accepted by some nondeterministic machine of time complexity t. Indeed, this definition extends the traditional formulation of N P (as presented in Definition 2.7). Alternatively, analogously to our preferred definition of N P (i.e., Definition 2.5), a set S ⊆ {0, 1}∗ is in NTIME(t) if there exists a lineartime algorithm V such that the two conditions hold: 1. For every x ∈ S there exists y ∈ {0, 1}t(x) such that V (x, y) = 1. 2. For every x ∈ S and every y ∈ {0, 1}∗ it holds that V (x, y) = 0. We warn that the two formulations are not identical, but in sufficiently strong models (e.g., twotape Turing machines) they are related up to logarithmic factors (see Exercise 4.10). The hierarchy theorem itself is similar to the one for deterministic time, except that here we require that t2 (n) ≥ (log t1 (n + 1))2 · t1 (n + 1) (rather than t2 (n) ≥ (log t1 (n))2 · t1 (n)). That is:
134
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
4.2. TIME HIERARCHIES AND GAPS
Theorem 4.6 (nondeterministic time hierarchy for twotape Turing machines): For any timeconstructible and monotonically nondecreasing function t1 and every function t2 such that t2 (n) ≥ (log t1 (n + 1))2 · t1 (n + 1) and t1 (n) > n it holds that NTIME(t1 ) is strictly contained in NTIME(t2 ). Proof: We cannot just apply the proof of Theorem 4.3, because the Boolean function f defined there requires the ability to determine whether there exists a computation of M that accepts the input x M in t1 (x M ) steps. In the current context, M is a nondeterministic machine and so the only way we know how to determine this question (both for a “yes” and “no” answers) is to try all the (2t1 (x M ) ) relevant executions.4 1 )), and so a different But this would put f in DTIME(2t1 ), rather than in NTIME( O(t approach is needed. We associate with each (nondeterministic) machine M a large interval of strings (viewed as integers), denoted I M = [α M , β M ], such that the various intervals do not intersect and such that it is easy to determine for each string x in which interval it resides. For each x ∈ [α M , β M − 1], we define f (x) = 1 if and only if there exists a def nondeterministic computation of M that accepts the input x = x + 1 in t1 (x ) ≤ t1 (x + 1) steps. Thus, if M has time complexity t1 and (nondeterministically) accepts {x : f (x) = 1}, then either M (nondeterministically) accepts each string in the interval I M or M (nondeterministically) accepts no string in I M , because M must nondeterministically accept x if and only if it nondeterministically accepts x = x + 1. So, it is left to deal with the case that M is invariant on I M , which is where the definition of the value of f (β M ) comes into play: We define f (β M ) to equal zero if and only if there exists a nondeterministic computation of M that accepts the input α M in t1 (α M ) steps. We shall select β M to be large enough relative to α M such that we can afford to try all possible computations of M on input α M . Details follow. Let us first recapitulate the definition of f : {0, 1}∗ → {0, 1}, focusing on the case that the input is in some interval I M . We define a Boolean function A M such that A M (z) = 1 if and only if there exists a nondeterministic computation of M that accepts the input z in t1 (z) steps. Then, for x ∈ I M we have A M (x + 1) if x ∈ [α M , β M − 1] f (x) = 1 − A M (α M ) if x = β M Next, we present the following nondeterministic machine for accepting the set {x : f (x) = 1}. We assume that, on input x, it is easy to determine the machine M that corresponds to the interval [α M , β M ] in which x resides.5 We distinguish two cases: 1. On input x ∈ [α M , β M − 1], our nondeterministic machine emulates t1 (x ) steps of a (single) nondeterministic computation of M on input x = x + 1, and decides accordingly (i.e., our machine accepts if and only if the said emulation has accepted). Indeed (as in the proof of Theorem 4.3), this emulation can be performed in time (log t1 (x + 1))2 · t1 (x + 1) ≤ t2 (x). (t1 (x M )) steps, but we cannot do so for “no” Indeed, we can nondeterministically recognize “yes” answers in O answers. 5 For example, we may partition the strings to consecutive intervals such that the i th interval, denoted [αi , βi ], corresponds to the i th machine and for T1 (m) = 22t1 (m) it holds that βi = 1T1 (αi ) and αi+1 = 0T1 (αi )+1 . Note that βi  = T1 (αi ), and thus t1 (βi ) > t1 (αi ) · 2t1 (αi ) . 4
135
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
MORE RESOURCES, MORE POWER?
2. On input x = β M , our machine just tries all 2t1 (α M ) executions of M on input α M and decides in a suitable manner; that is, our machine emulates t1 (α M ) steps in each of the 2t1 (α M ) possible executions of M(α M ) and accepts β M if and only if none of the emulated executions ended accepting α M . Note that this part of our def machine is deterministic, and it amounts to emulating TM = 2t1 (α M ) · t1 (α M ) steps of M. By a suitable choice of the interval [α M , β M ] (e.g., β M  > TM ), this number of steps (i.e., TM ) is smaller than β M  ≤ t1 (β M ), and it follows that these TM steps of M can be emulated in time (log2 t1 (β M ))2 · t1 (β M ) ≤ t2 (β M ). Thus, our nondeterministic machine has time complexity t2 , and it follows that f is in NTIME(t2 ). It remains to show that f is not in NTIME(t1 ). Suppose, on the contrary, that some nondeterministic machine M of time complexity t1 accepts the set {x : f (x) = 1}; that is, for every x it holds that A M (x) = f (x), where A M is as defined in the foregoing (i.e., A M (x) = 1 if and only if there exists a nondeterministic computation of M that accepts the input x in t1 (x) steps). Focusing on the interval [α M , β M ], we have A M (x) = f (x) for every x ∈ [α M , β M ], which (combined with the definition of f ) implies that A M (x) = f (x) = A M (x + 1) for every x ∈ [α M , β M − 1] and A M (β M ) = f (β M ) = 1 − A M (α M ). Thus, we reached a contraction (because we got A M (α M ) = · · · = A M (β M ) = 1 − A M (α M )).
4.2.2. Time Gaps and Speedup In contrast to Theorem 4.3, there exists functions t : N → N such that DTIME(t) = DTIME(t 2 ) (or even DTIME(t) = DTIME(2t )). Needless to say, these functions are not timeconstructible (and thus the aforementioned fact does not contradict Theorem 4.3). The reason for this phenomenon is that, for such functions t, there exist no algorithms that have time complexity above t but below t 2 (resp., 2t ). Theorem 4.7 (the time gap theorem): For every nondecreasing computable function g : N → N there exists a nondecreasing computable function t : N → N such that DTIME(t) = DTIME(g(t)). The foregoing examples referred to g(m) = m 2 and g(m) = 2m . Since we are mainly interested in dramatic gaps (i.e., superpolynomial functions g), the model of computation does not matter here (as long as it is reasonable and general). Proof: Consider an enumeration of all possible algorithms (or machines), which also includes machines that do not halt on some inputs. (Recall that we cannot enumerate the set of all machines that halt on every input.) Let ti denote the time complexity of the i th algorithm; that is, ti (n) = ∞ if the i th machine does not halt on some nbit long input and otherwise ti (n) = maxx∈{0,1}n {Ti (x)}, where Ti (x) denotes the number of steps taken by the i th machine on input x. The basic idea is to define t such that no ti is “sandwiched” between t and g(t), and thus no algorithm will have time complexity between t and g(t). Intuitively, if ti (n) is finite, then we may define t such that t(n) > ti (n) and thus guarantee that ti (n) ∈ [t(n), g(t(n))], whereas if ti (n) = ∞ then any finite value of t(n) will do
136
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
4.2. TIME HIERARCHIES AND GAPS
g(v) t k(n) t j(n)
v
t i(n) t(n1)
v0
current v
Figure 4.1: The Gap Theorem – determining the value of t(n).
(because then ti (n) > g(t(n))). Thus, for every m and n, we can define t(n) such that ti (n) ∈ [t(n), g(t(n))] for every i ∈ [m] (e.g., t(n) = maxi∈[m]:ti (n)=∞ {ti (n)} + 1).6 This yields a weaker version of the theorem in which the function t is neither computable nor nondecreasing. It is easy to modify t such that it is nondecreasing (e.g., t(n) = max(t(n − 1), maxi∈[m]:ti (n)=∞ {ti (n)}) + 1) and so the real challenge is to make t computable. The problem is that we want t to be computable, whereas given n we cannot tell whether or not ti (n) is finite. However, we do not really need to make the latter decision: For each candidate value v of t(n), we should just determine whether or not ti (n) ∈ [v, g(v)], which can be decided by running the i th machine for at most g(v) + 1 steps (on each nbit long string). That is, as far as the i th machine is concerned, we should just find a value v such that either v > ti (n) or g(v) < ti (n) (which includes the case ti (n) = ∞). This can be done by starting with v = v0 (where, say, v0 = t(n − 1) + 1), and increasing v until either v > ti (n) or g(v) < ti (n). The point is that if ti (n) is infinite then we may output v = v0 after emulating 2n · (g(v0 ) + 1) steps, and otherwise we reach a safe value v > ti (n) after performing ti (n) n 2 · j emulation steps. Bearing in mind that we should deal with all at most j=v 0 possible machines, we obtain the following procedure for setting t(n). Let µ : N → N be any unbounded and computable function (e.g., µ(n) = n will do). Starting with v = t(n − 1) + 1, we keep incrementing v until v satisfies, for every i ∈ {1, . . . , µ(n)}, either ti (n) < v or ti (n) > g(v). This condition can be verified by computing µ(n) and g(v), and emulating the execution of each of the first µ(n) machines on each of the nbit long strings for g(v) + 1 steps. The procedure sets t(n) to equal the first value v satisfying the aforementioned condition, and halts. (Figure 4.1 depicts the search for a good value v for t(n).) 6
We may assume, without loss of generality, that t1 (n) = 1 for every n, e.g., by letting the machine that always halts after a single step be the first machine in our enumeration.
137
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
MORE RESOURCES, MORE POWER?
To show that the foregoing procedure halts on every n, consider the set Hn ⊆ {1, . . . , µ(n)} of the indices of the (relevant) machines that halt on all inputs of length n. Then, the procedure definitely halts before reaching the value v = max(Tn , t(n − 1)) + 2, where Tn = maxi∈Hn {ti (n)}. (Indeed, the procedure may halt with a value v ≤ Tn , but this will happen only if g(v) < Tn .) Finally, for the foregoing function t, we prove that DTIME(t) = DTIME(g(t)) holds. Indeed, consider an arbitrary S ∈ DTIME(g(t)), and suppose that the i th algorithm decides S in time at most g(t); that is, for every n, it holds that ti (n) ≤ g(t(n)). Then (by the construction of t), for every n satisfying µ(n) ≥ i, it holds that ti (n) < t(n). It follows that the i th algorithm decides S in time at most t on all but finitely many inputs. Combining this algorithm with a “lookup table” machine that handles the exceptional inputs, we conclude that S ∈ DTIME(t). The theorem follows. Comment. The function t defined by the foregoing proof is computable in time that exceeds g(t). Specifically, the presented procedure computes t(n) (as well as g( f (n))) n · g(t(n)) + Tg (t(n))), where Tg (m) denotes the number of steps required to in time O(2 compute g(m) on input m. Speedup theorems. Theorem 4.7 can be viewed as asserting that some time complexity classes (i.e., DTIME(g(t)) in the theorem) collapse to lower classes (i.e., to DTIME(t)). A conceptually related phenomenon is of problems that have no optimal algorithm (not even in a very mild sense); that is, every algorithm for these (“pathological”) problems can be drastically sped up. It follows that the complexity of these problems cannot be defined (i.e., as the complexity of the best algorithm solving this problem). The following drastic speedup theorem should not be confused with the linear speedup that is an artifact of the definition of a Turing machine (see Exercise 4.4).7 Theorem 4.8 (the time speedup theorem): For every computable (and superlinear) function g there exists a decidable set S such that if S ∈ DTIME(t) then S ∈ DTIME(t ) for t satisfying g(t (n)) < t(n). 2 Taking g(n) = n√ (or g(n) = 2n ), the theorem asserts that, for every t, if S ∈ DTIME(t) then S ∈ DTIME( t) (resp., S ∈ DTIME(log t)). Note that Theorem 4.8 can be applied any (constant) number of times, which means that we cannot give a reasonable estimate to the complexity of deciding membership in S. In contrast, recall that in some important cases, optimal algorithms for solving computational problems do exist. Specifically, algorithms solving (candid) search problems in NP cannot be sped up (see Theorem 2.33), nor can the computation of a universal machine (see Theorem 4.5). We refrain from presenting a proof of Theorem 4.8, but comment on the complexity of the sets involved in this proof. The proof (presented in [123, Sec. 12.6]) provides a construction of a set S in DTIME(t ) \ DTIME(t ) for t (n) = h(n − O(1)) and t (n) = h(n − ω(1)), where h(n) denoted g iterated n times on 2 (i.e., h(n) = g (n) (2), where g (i+1) (m) = g(g (i) (m)) and g (1) = g). The set S is constructed such that for every i > 0 7
Advanced comment: We note that the linear speedup phenomenon was implicitly addressed in the proof of Theorem 4.3, by allowing an emulation overhead that depends on the length of the description of the emulated machine.
138
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER NOTES
there exists a j > i and an algorithm that decides S in time ti but not in time t j , where tk (n) = h(n − k).
4.3. Space Hierarchies and Gaps Hierarchy and gap theorems analogous to Theorem 4.3 and Theorem 4.7, respectively, are known for space complexity. In fact, since spaceefficient emulation of spacebounded machines is simpler than timeefficient emulations of timebounded machines, the results tend to be sharper (and their proofs tend to be simpler). This is most conspicuous in the case of the separation result (stated next), which is optimal (in light of the corresponding linear speedup result; see Exercise 4.12). Before stating the separation result, we need a few preliminaries. We refer the reader to §1.2.3.5 for a definition of space complexity (and to Chapter 5 for further discussion). As in the case of time complexity, we consider a specific model of computation, but the results hold for any other reasonable and general model. Specifically, we consider threetape Turing machines, because we designate two special tapes for input and output. For any function s : N → N, we denote by DSPACE(s) the class of decision problems that are solvable in space complexity s. Analogously to Definition 4.2, we call a function s : N → N spaceconstructible if there exists an algorithm that on input n outputs s(n) while using at most s(n) cells of the worktape. Actually, functions like s1 (n) = log n, s2 (n) = (log n)2 , and s3 (n) = 2n are computable using O(log si (n)) space. Theorem 4.9 (space hierarchy for threetape Turing machines): For any spaceconstructible function s2 and every function s1 such that s2 = ω(s1 ) and s1 (n) > log n it holds that DSPACE(s1 ) is strictly contained in DSPACE(s2 ). Theorem 4.9 is analogous to the traditional version of Theorem 4.3 (rather than to the one we presented), and is proven using the alternative approach sketched in footnote 2. The details are left as an exercise (see Exercise 4.13).
Chapter Notes The material presented in this chapter predates the theory of NPcompleteness and the dominant stature of the PvsNP Question. In these early days, the field (to be known as Complexity Theory) had not yet developed an independent identity and its perspectives were dominated by two classical theories: the theory of computability (and recursive function) and the theory of formal languages. Nevertheless, we believe that the results presented in this chapter are interesting for two reasons. Firstly, as stated up front, these results address the natural question of under what conditions it is the case that more computational resources help. Secondly, these results demonstrate the type of results that one can get with respect to “generic” questions regarding Computational Complexity; that is, questions that refer to arbitrary resource bounds (e.g., the relation between DTIME(t1 ) and DTIME(t2 ) for arbitrary t1 and t2 ). We note that, in contrast to the “generic” questions considered in this chapter, the PvsNP Question as well as the related questions that will be addressed in the rest of this book are not “generic” since they refer to specific classes (which capture natural computational issues). Furthermore, whereas time and space complexity behave in similar
139
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
MORE RESOURCES, MORE POWER?
manner with respect to hierarchies and gaps, they behave quite differently with respect to other questions. The interested reader is referred to Sections 5.1 and 5.3. Getting back to the concrete contents of the current chapter, let us briefly mention the most relevant credits. The hierarchy theorems (e.g., Theorem 4.3) were proven by Hartmanis and Stearns [114]. Gap theorems (e.g., Theorem 4.7) were proven by Borodin [47] (and are often referred to as Borodin’s Gap Theorem). An axiomatic treatment of complexity measures was developed by Blum [38], who also proved corresponding speedup theorems (e.g., Theorem 4.8, which is often referred to as Blum’s Speedup Theorem). A traditional presentation of all the aforementioned topics is provided in [123, Chap. 12], which also presents related techniques (e.g., “translation lemmas”).
Exercises Exercise 4.1: Let Fn (s) denote the number of different Boolean functions over {0, 1}n that are computed by Boolean circuits of size s. Prove that, for any s < 2n , it holds that Fn (s) ≥ 2s/O(log s) and Fn (s) ≤ s 2s . Guideline: Any Boolean function f : {0, 1} → {0, 1} can be computed by a circuit of size s = O( · 2 ). Thus, for every ≤ n, it holds that Fn (s ) ≥ 22 > 2s /O(log s ) . s 2 s On the other hand, the number of circuits of size s is less than 2 · s , where the second factor represents the number of possible choices of pairs of gates that feed any gate in the circuit.
Exercise 4.2 (advice can speed up computation): For every timeconstructible function t, show that there exists a set S in DTIME(t 2 ) \ DTIME(t) that can be decided in linear time using an advice of linear length (i.e., S ∈ DTIME()/ where (n) = O(n)). Guideline: Starting with a set S ∈ DTIME(T 2 ) \ DTIME(T ), where T (m) = t(2m ), x consider the set S = {x02 −x : x ∈ S }.
Exercise 4.3: Referring to any reasonable model of computation (and assuming that the input length is not given explicitly (unlike as in, e.g., Definition 10.10)), prove that any algorithm that has sublinear time complexity actually has constant time complexity. Guideline: Consider the question of whether or not there exists an infinite set of strings S such that when invoked on any input x ∈ S the algorithm reads all of x. Note that if S is infinite then the algorithm cannot have sublinear time complexity, and prove that if S is finite then the algorithm has constant time complexity.
Exercise 4.4 (linear speedup of Turing machine): Prove that any problem that can be solved by a twotape Turing machine that has time complexity t can be solved by another twotape Turing machine having time complexity t , where t (n) = O(n) + (t(n)/2). Guideline: Consider a machine that uses a larger alphabet, capable of encoding a constant (denoted c) number of symbols of the original machine, and thus capable of emulating c steps of the original machine in O(1) steps, where the constant in the Onotation is a universal constant (independent of c). Note that the O(n) term accounts for a preprocessing that converts the binary input to the work alphabet of the new machine (which encodes c input bits in one alphabet symbol). Thus, a similar result for a onetape Turing machine seems to require an additive O(n 2 ) term.
140
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Exercise 4.5 (a direct proof of Corollary 4.4): Present a direct proof of Corollary 4.4 by using the ideas that underlie the proof of Theorem 4.3. Furthermore, prove that if t steps of machine M (in the model at hand) can be emulated by g(M, t) steps of a corresponding universal machine, then Corollary 4.4 holds for any t2 (n) ≥ g(log n, t1 (n)). Guideline: The function f ∈ DTIME(t1 ) is defined exactly as in the proof of Theorem 4.3, where here DTIME denotes the timecomplexity classes of the model at hand. When upperbounding the time complexity of f in this model, let TM (n) denote the number of steps used in emulating t1 (n) steps of machine M, and note that TM (n) = g(M, t1 (n)) and that f ∈ DTIME(T ), where T (n) = maxx∈{0,1}n {Tµ(x) (n)}.
Exercise 4.6 (tightening Corollary 4.4): Prove that, for any reasonable and general model of computation, any constant ε > 0 and any “nice” function t (e.g., either t(n) = n c for any constant c ≥ 1 or t(n) = 2c n for any constant c > 0), it holds that 1+ε DTIME(t) is strictly contained in DTIME(t ). Guideline: Assuming toward the contradiction that DTIME(t) = DTIME( f ◦ t), for f (k) = k 1+ε , derive a contradiction to Corollary 4.4 by proving that for every constant i it holds that DTIME(t) = DTIME( f i ◦ t), where f i denotes i iterative applications of f . Note that proving that DTIME(t) = DTIME( f ◦ t) implies that DTIME( f i−1 ◦ t) = DTIME( f i ◦ t) (for every constant i) requires a “padding argument” (i.e., nbit long inputs are encoded as mbit long inputs such that t(m) = ( f i−1 ◦ t)(n), and indeed n !→ m = (t −1 ◦ f i−1 ◦ t)(n) should be computable in time t(m)).
Exercise 4.7 (constant amortizedtime stepcounter): A stepcounter is an algorithm that runs for a number of steps that is specified in its input. Actually, such an algorithm may run for a somewhat larger number of steps but halt after issuing a number of “signals” as specified in its input, where these signals are defined as entering (and leaving) a designated state (of the algorithm). A stepcounter may be run in parallel to another procedure in order to suspend the execution after a predetermined number of steps (of the other procedure) have elapsed. Show that there exists a simple deterministic machine that, on input n, halts after issuing n signals while making O(n) steps. Guideline: A slightly careful implementation of the straightforward algorithm will do, when coupled with an “amortized” timecomplexity analysis.
Exercise 4.8 (a natural set in E \ P): In continuation of the proof of Theorem 4.5, prove def that the set {(M, x, t) : u (M, x, t) = ⊥} is in E \ P, where E = ∪c DTIME(ec ) and ec (n) = 2cn . Exercise 4.9 (EXPcompleteness): In continuation of Exercise 4.8, prove that every set in EX P is Karpreducible to the set {(M, x, t) : u (M, x, t) = ⊥}. Exercise 4.10: Prove that the two definitions of NTIME, presented in §4.2.1.3, are related up to logarithmic factors. Note the importance of the condition that V has linear (rather than polynomial) time complexity. Guideline: When emulating a nondeterministic machine by the verification procedure V , encode the nondeterministic choices in a “witness” string y such that y is slightly larger than the number of steps taken by the original machine. Specifically, having y = O(t log t), where t denotes the number of steps taken by the original machine, allows for emulating the latter computation in linear time (i.e., linear in y).
141
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
MORE RESOURCES, MORE POWER?
Exercise 4.11: In continuation of Theorem 4.7, prove that for every computable function t : N → N and every nondecreasing computable function g : N → N there exists a nondecreasing computable function t : N → N such that t > t and DTIME(t) = DTIME(g(t)). Exercise 4.12: In continuation of Exercise 4.4, state and prove a linear speedup result for space complexity, when using the standard definition of space as recalled in Section 4.3. (Note that this result does not hold with respect to “binary space complexity” as defined in Section 5.1.1.) Exercise 4.13: Prove Theorem 4.9. As a warmup, prove first a spacecomplexity version of Theorem 4.3. Guideline: Note that providing a spaceefficient emulation of one machine by another machine is easier than providing an analogous timeefficient emulation.
Exercise 4.14 (space gap theorem): In continuation of Theorem 4.7, state and prove a gap theorem for space complexity.
142
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER FIVE
Space Complexity
Open are the double doors of the horizon; unlocked are its bolts. Philip Glass, Akhnaten, Prelude Whereas the number of steps taken during a computation is the primary measure of its efficiency, the amount of temporary storage used by the computation is also a major concern. Furthermore, in some settings, space is even more scarce than time. In addition to the intrinsic interest in space complexity, its study provides an interesting perspective on the study of time complexity. For example, in contrast to the common conjecture by which N P = coN P, we shall see that analogous spacecomplexity classes (e.g., N L) are closed under complementation (e.g., N L = coN L). Summary: This chapter is devoted to the study of the space complexity of computations, while focusing on two rather extreme cases. The first case is that of algorithms having logarithmic space complexity. We view such algorithms as utilizing the naturally minimal amount of temporary storage, where the term “minimal” is used here in an intuitive (but somewhat inaccurate) sense, and note that logarithmic space complexity seems a more stringent requirement than polynomial time. The second case is that of algorithms having polynomial space complexity, which seems a strictly more liberal restriction than polynomial time complexity. Indeed, algorithms utilizing polynomial space can perform almost all the computational tasks considered in this book (e.g., the class PSPACE contains almost all complexity classes considered in this book). We first consider algorithms of logarithmic space complexity. Such algorithms may be used for solving various natural search and decision problems, for providing reductions among such problems, and for yielding a strong notion of uniformity for Boolean circuits. The climax of this part is a logspace algorithm for exploring (undirected) graphs. We then turn to nondeterministic computations, focusing on the complexity class N L that is captured by the problem of deciding directed connectivity of (directed) graphs. The climax of this part is a proof that N L = coN L, which may be paraphrased as a logspace reduction of directed unconnectivity to directed connectivity.
143
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
We conclude with a short discussion of the class PSPACE, proving that the set of satisfiable quantified Boolean formulae is PSPACEcomplete (under polynomialtime reductions). We mention the similarity between this proof and the proof that NSPACE(s) ⊆ DSPACE(O(s 2 )). We stress that, as in the case of time complexity, the main results presented in this chapter hold for any reasonable model of computation.1 In fact, when properly defined, space complexity is even more robust than time complexity. Still, for the sake of clarity, we often refer to the specific model of Turing machines. Organization. Space complexity seems to behave quite differently from time complexity, and seems to require a different mindset as well as auxiliary conventions. Some of the relevant issues are discussed in Section 5.1. We then turn to the study of logarithmic space complexity (see Section 5.2) and the corresponding nondeterministic version (see Section 5.3). Finally, we consider polynomial space complexity (see Section 5.4).
5.1. General Preliminaries and Issues We start by discussing several very important conventions regarding space complexity (see Section 5.1.1). Needless to say, reading Section 5.1.1 is essential for the understanding of the rest of this chapter. (In contrast, the rather parenthetical Section 5.1.2 can be skipped with no significant loss.) We then discuss a variety of issues, highlighting the differences between space complexity and time complexity (see Section 5.1.3). In particular, we call the reader’s attention to the composition lemmas (§5.1.3.1) and related reductions (§5.1.3.3) as well as to the obvious simulation result presented in §5.1.3.2 (i.e., DSPACE(s) ⊆ DTIME(2 O(s) )). Lastly, in Section 5.1.4 we relate circuit size to space complexity by considering the space complexity of circuit evaluation.
5.1.1. Important Conventions Space complexity is meant to measure the amount of temporary storage (i.e., computer’s memory) used when performing a computational task. Since much of our focus will be on using an amount of memory that is sublinear in the input length, it is important to use a model in which one can differentiate memory used for computation from memory used for storing the initial input and/or the final output. That is, we do not want to count the input and output themselves within the space of computation, and thus formulate that they are delivered on special devices that are not considered memory. On the other hand, we have to make sure that the input and output devices cannot be abused for providing work space (which is unaccounted for). This leads to the convention by which the input device (e.g., a designated inputtape of a multitape Turing machine) is readonly, whereas the output device (e.g., a designated outputtape of a such machine) is writeonly. With this convention in place, we define space complexity as accounting only for the use of space on the other (storage) devices (e.g., the worktapes of a multitape Turing machine). Fixing a concrete model of computation (e.g., multitape Turing machines), we denote by DSPACE(s) the class of decision problems that are solvable in space complexity s. The 1
The only exceptions appear in Exercises 5.4 and 5.18, which refer to the notion of a crossing sequence. The use of this notion in these proofs presumes that the machine scans its storage devices in a serial manner. In contrast, we stress that the various notions of an instantaneous configuration do not assume such a machine model.
144
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.1. GENERAL PRELIMINARIES AND ISSUES
space complexity of search problems is defined analogously. Specifically, the standard definition of space complexity (see §1.2.3.5) refers to the number of cells of the worktape scanned by the machine on each input. We prefer, however, an alternative definition, which provides a more accurate account of the actual storage. Specifically, the binary space complexity of a computation refers to the number of bits that can be stored in these cells, thus multiplying the number of cells by the logarithm of the finite set of worktape symbols of the machine.2 The difference between the two aforementioned definitions is mostly immaterial because it amounts to a constant factor and we will usually discard such factors. Nevertheless, aside from being conceptually right, using the definition of binary space complexity facilitates some technical details (because the number of possible “instantaneous configurations” is explicitly upperbounded in terms of binary space complexity, whereas its relation to the standard definition depends on the machine in question). Toward such applications, we also count the finite state of the machine in its space complexity. Furthermore, for the sake of simplicity, we also assume that the machine does not scan the input tape beyond the boundaries of the input, which are indicated by special symbols.3 We stress that individual locations of the (readonly) inputtape (or device) may be read several times. This is essential for many algorithms that use a sublinear amount of space (because such algorithms may need to scan their input more than once while they cannot afford copying their input to their storage device). In contrast, rewriting on (the same location of) the writeonly outputtape is inessential, and in fact can be eliminated at a relatively small cost (see Exercise 5.2). Summary. Let us compile a list of the foregoing conventions. As stated, the first two items on the list are of crucial importance, while the rest are of technical nature (but do facilitate our exposition). 1. Space complexity discards the use of the input and output devices. 2. The input device is readonly and the output device is writeonly. 3. We will usually refer to the binary space complexity of algorithms, where the binary space complexity of a machine M that uses the alphabet , finite state set Q, and has standard space complexity S M is defined as (log2 Q) + (log2 ) · S M . (Recall that S M measures the number of cells of the temporary storage device that are used by M during the computation.) 4. We will assume that the machine does not scan the input device beyond the boundaries of the input. 5. We will assume that the machine does not rewrite to locations of its output device (i.e., it writes to each cell of the output device at most once).
5.1.2. On the Minimal Amount of Useful Computation Space Bearing in mind that one of our main objectives is identifying natural subclasses of P, we consider the question of what is the minimal amount of space that allows for meaningful computations. We note that regular sets [123, Chap. 2] are decidable by constantspace 2
We note that, unlike in the context of time complexity, linear speedup (as in Exercise 4.12) does not seem to represent an actual saving in space resources. Indeed, time can be sped up by using stronger hardware (i.e., a Turing machine with a bigger work alphabet), but the actual space is not really affected by partitioning it into bigger chunks (i.e., using bigger cells). This fact is demonstrated when considering the binary space complexity of the two machines. 3 As indicated by Exercise 5.1, little is lost by this natural assumption.
145
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
Turing machines and that this is all that the latter can decide (see, e.g., [123, Sec. 2.6]). It is tempting to say that sublogarithmic space machines are not more useful than constantspace machines, because it seems impossible to allocate a sublogarithmic amount of space. This wrong intuition is based on the presumption that the allocation of a nonconstant amount of space requires explicitly computing the length of the input, which in turn requires logarithmic space. However, this presumption is wrong: The input itself (in case it is of a proper form) can be used to determine its length (and/or the allowed amount of space).4 In fact, for (n) = log log n, the class DSPACE(O()) is a proper superset of DSPACE(O(1)); see Exercise 5.3. On the other hand, it turns out that doublelogarithmic space is indeed the smallest amount of space that is more useful than constant space (see Exercise 5.4); that is, for (n) = log log n, it holds that DSPACE(o()) = DSPACE(O(1)). In spite of the fact that some nontrivial things can be done in sublogarithmic space complexity, the lowest spacecomplexity class that we shall study in depth is logarithmic space (see Section 5.2). As we shall see, this class is the natural habitat of several fundamental computational phenomena. A parenthetical comment (or a side lesson). Before proceeding, let us highlight the fact that a naive presumption about arbitrary algorithms (i.e., that the use of a nonconstant amount of space requires explicitly computing the length of the input) could have led us to a wrong conclusion. This demonstrates the danger in making “reasonable looking” (but unjustified) presumptions about arbitrary algorithms. We need to be fully aware of this danger whenever we seek impossibility results and/or complexity lower bounds.
5.1.3. Time Versus Space Space complexity behaves very different from time complexity and indeed different paradigms are used in studying it. One notable example is provided by the context of algorithmic composition, discussed next.
5.1.3.1. Two Composition Lemmas Unlike time, space can be reused; but, on the other hand, intermediate results of a computation cannot be recorded for free. These two conflicting aspects are captured in the following composition lemma. Lemma 5.1 (naive composition): Let f 1 : {0, 1}∗ → {0, 1}∗ and f 2 : {0, 1}∗ × {0, 1}∗ → {0, 1}∗ be computable in space s1 and s2 , respectively.5 Then f defined def by f (x) = f 2 (x, f 1 (x)) is computable in space s such that s(n) = max(s1 (n), s2 (n + (n))) + (n) + δ(n), 4
Indeed, for this approach to work, we should be able to detect the case that the input is not of the proper form (and do so within sublogarithmic space). 5 Here (and throughout the chapter) we assume, for simplicity, that all complexity bounds are monotonically nondecreasing. Another minor inaccuracy (in the text) is that we stated the complexity of the algorithm that computes f 2 in a somewhat nonstandard way. Recall that by the standard convention, the complexity of an algorithm should be stated in terms of the length of its input, which in this case is a pair (x, y) that may be encoded as a string of length x + y + 2 log2 x (but not as a string of length x + y). An alternative convention is to state the complexity of such computations in terms of the length of both parts of the input (i.e., have s : N × N → N rather than s : N → N), but we did not do this either.
146
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.1. GENERAL PRELIMINARIES AND ISSUES
x
x
x
A1
A1
A1
f1(x)
f1(x)
f1(x) counters
A2
f(x)
A2
A2
f(x)
f(x)
Figure 5.1: Three composition methods for spacebounded computation. The leftmost figure shows the trivial composition (which just invokes A 1 and A 2 without attempting to economize storage), the middle figure shows the naive composition (of Lemma 5.1), and the rightmost figure shows the emulative composition (of Lemma 5.2). In all figures the filled rectangles represent designated storage spaces. The dotted rectangle represents a virtual storage device.
where (n) = maxx∈{0,1}n { f 1 (x)} and δ(n) = O(log((n) + s2 (n + (n)))) = o(s(n)). Lemma 5.1 is useful when is relatively small, but in many cases max(s1 , s2 ). In these cases, the following composition lemma is more useful. Proof: Indeed, f (x) is computed by first computing and storing f 1 (x), and then reusing the space (used in the first computation) when computing f 2 (x, f 1 (x)). This explains the dominant terms in s(n); that is, the term max(s1 (n), s2 (n + (n))) accounts for the computations themselves (which reuse the same space), whereas the term (n) accounts for storing the intermediate result (i.e., f 1 (x)). The extra term is due to implementation details. Specifically, the same storage device is used both for storing f 1 (x) and for providing workspace for the computation of f 2 , which means that we need to maintain our location on each of these two parts (i.e., the location of the algorithm (that computes f 2 ) on f 1 (x) and its location on its own work space). (See further discussion at end of the proof of Lemma 5.2.) The extra O(1) term accounts for the overhead involved in emulating two algorithms. Lemma 5.2 (emulative composition): Let f 1 , f 2 , s1 , s2 , and f be as in Lemma 5.1. Then f is computable in space s such that s(n) = s1 (n) + s2 (n + (n)) + O(log(n + (n))) + δ(n), where δ(n) = O(log(s1 (n) + s2 (n + (n)))) = o(s(n)). The alternative compositions are depicted in Figure 5.1 (which also shows the most straightforward composition that makes no attempt to economize space). Proof: The idea is avoiding the storage of the temporary value of f 1 (x) by computing each of its bits (“on the fly”) whenever this bit is needed for the computation of
147
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
f 2 . That is, we do not start by computing f 1 (x), but rather start by computing f 2 (x, f 1 (x)) although we do not have some of the bits of the relevant input (i.e., the bits of f 1 (x)). The missing bits will be computed (and recomputed) whenever we need them in the computation of f 2 (x, f 1 (x)). Details follow. Let A1 and A2 be the algorithms (for computing f 1 and f 2 , respectively) guaranteed in the hypothesis.6 Then, on input x ∈ {0, 1}n , we invoke algorithm A2 (for computing f 2 ). Algorithm A2 is invoked on a virtual input, and so when emulating each of its steps we should provide it with the relevant bit. Thus, we should also keep track of the location of A2 on the imaginary (virtual) inputtape. Whenever A2 seeks to read the i th bit of its input, where i ∈ [n + (n)], we provide A2 with this bit by reading it from x if i ≤ n and invoke A1 (x) otherwise. When invoking A1 (x) we provide it with a virtual outputtape, which means that we get the bits of its output one by one and do not record them anywhere. Instead, we count until reaching the (i − n)th outputbit, which we then pass to A2 (as the i th bit of x, f 1 (x)). Note that while invoking A1 (x), we suspend the execution of A2 but keep its current configuration such that we can resume the execution (of A2 ) once we get the desired bit. Thus, we need to allocate separate space for the computation of A2 and for the computation of A1 . In addition, we need to allocate separate storage for maintaining the aforementioned counters (i.e., we use log2 (n + (n)) bits to hold the location of the inputbit currently read by A2 , and log2 (n) bits to hold the index of the outputbit currently produced in the current invocation of A1 ). A final (and tedious) issue is that our description of the composed algorithm refers to two storage devices, one for emulating the computation of A1 and the other for emulating the computation of A2 . The issue is not the fact that the storage (of the composed algorithm) is partitioned between two devices, but rather that our algorithm uses two pointers (one per each of the two storage devices). In contrast, a (“fair”) composition result should yield an algorithm (like A1 and A2 ) that uses a single storage device with a single pointer to locations on this device. Indeed, such an algorithm can be obtained by holding the two original pointers in memory; the additional δ(n) term accounts for this additional storage. Reflection. The algorithm presented in the proof of Lemma 5.2 is wasteful in terms of time: it recomputes f 1 (x) again and again (i.e., once per each access of A2 to the second part of its input). Indeed, our aim was economizing on space and not on time (and the two goals may be conflicting (see, e.g., [59, Sec. 4.3])).
5.1.3.2. An Obvious Bound The time complexity of an algorithm is essentially upper bounded by an exponential function in its space complexity. This is due to an upper bound on the number of possible instantaneous “configurations” of the algorithm (as formulated in the proof of Theorem 5.3), and to the fact that if the computation passes through the same configuration twice then it must loop forever. Theorem 5.3: If an algorithm A has binary space complexity s and halts on every input then it has time complexity t such that t(n) ≤ n · 2s(n)+log2 s(n) . 6
We assume, for simplicity, that algorithm A1 never rewrites on (the same location of) its writeonly outputtape. As shown in Exercise 5.2, this assumption can be justified at an additive cost of O(log (n)). Alternatively, the idea presented in Exercise 5.2 can be incorporated directly in the current proof.
148
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.1. GENERAL PRELIMINARIES AND ISSUES
Note that for s(n) = (log n), the factor of n can be absorbed by 2 O(s(n)) , and so we may just write t(n) = 2 O(s(n)) . Indeed, throughout this chapter (as in most of this book), we will consider only algorithms that halt on every input (see Exercise 5.5 for further discussion). Proof: The proof refers to the notion of an instantaneous configuration (in a computation). Before starting, we warn the reader that this notion may be given different definitions, each tailored to the application at hand. All these definitions share the desire to specify variable information that together with some fixed information determines the next step of the computation being analyzed. In the current proof, we fix an algorithm A and an input x, and consider as variable the contents of the storage device (e.g., worktape of a Turing machine as well as its finite state) and the machine’s location on the input device and on the storage device. Thus, an instantaneous configuration of A(x) consists of the latter three objects (i.e., the contents of the storage device and a pair of locations), and can be encoded by a binary string of length (x) = s(x) + log2 x + log2 s(x).7 The key observation is that the computation A(x) cannot pass through the same instantaneous configuration twice, because otherwise the computation A(x) passes through this configuration infinitely many times, which means that this computation does not halt. This observation is justified by noting that the instantaneous configuration, together with the fixed information (i.e., A and x), determines the next step of the computation. Thus, whatever happens (i steps) after the first time that the computation A(x) passes through configuration γ will also happen (i steps) after the second time that the computation A(x) passes through γ . By the foregoing observation, we infer that the number of steps taken by A on input x is at most 2(x) , because otherwise the same configuration will appear twice in the computation (which contradicts the halting hypothesis). The theorem follows.
5.1.3.3. Subtleties Regarding SpaceBounded Reductions Lemmas 5.1 and 5.2 suffice for the analysis of the effect of manytoone reductions in the context of spacebounded computations. (By a manytoone reduction of the function f to the function g, we mean a mapping π such that for every x it holds that f (x) = g(π(x)).)8 1. (In the spirit of Lemma 5.1:) If f is reducible to g via a manytoone reduction that can be computed in space s1 , and g is computable in space s2 , then f is computable in space s such that s(n) = max(s1 (n), s2 ((n))) + (n) + δ(n), where (n) denotes the maximum length of the image of the reduction when applied to some nbit string and δ(n) = O(log((n) + s2 ((n)))) = o(s(n)). 2. (In the spirit of Lemma 5.2:) For f and g as in Item 1, it follows that f is computable in space s such that s(n) = s1 (n) + s2 ((n)) + O(log (n)) + δ(n), where δ(n) = O(log(s1 (n) + s2 ((n)))) = o(s(n)). 7
Here we rely on the fact that s is the binary space complexity (and not the standard space complexity); see summary item 3 in Section 5.1.1. 8 This is indeed a special case of the setting of Lemmas 5.1 and 5.2 (obtained by letting f 1 = π and f 2 (x, y) = g(y)). However, the results claimed for this special case are better than those obtained by invoking the corresponding lemma (i.e., s2 is applied to (n) rather than to n + (n)).
149
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
Note that by Theorem 5.3, it holds that (n) ≤ 2s1 (n)+log2 s1 (n) · n. We stress the fact that is not upperbounded by s1 itself (as in the analogous case of timebounded computation), but rather by exp(s1 ). Things get much more complicated when we turn to general (spacebounded) reductions, especially when referring to general reductions that make a nonconstant number of queries. A preliminary issue is defining the space complexity of general reductions (i.e., of oracle machines). In the standard definition, the length of the queries and answers is not counted in the space complexity, but the queries of the reduction (resp., answers given to it) are written on (resp., read from) a special device that is writeonly (resp., readonly) for the reduction (and readonly (resp., writeonly) for the invoked oracle). Note that these convention are analogous to the conventions regarding input and output (as well as fit the definitions of spacebounded manytoone reductions that were outlined in the foregoing items). The foregoing conventions suffice for defining general spacebounded reductions. They also suffice for obtaining appealing composition results in some cases (e.g., for reductions that make a single query or, more generally, for the case of nonadaptive queries). But more difficulties arise when seeking composition results for general reductions, which may make several adaptive queries (i.e., queries that depend on the answers to prior queries). As we shall show next, in this case it is essential to upperbound the length of every query and/or every answer in terms of the length of the initial input. Teaching note: The rest of the discussion is quite advanced and laconic (but is inessential to the rest of the chapter).
Recall that the complexity of the algorithm resulting from the composition of an oracle machine and an actual algorithm (which implements the oracle) depends on the length of the queries made by the oracle machine. For example, the space complexity of the foregoing compositions, which referred to singlequery reductions, had an s2 ((n)) term (where (n) represents the length of the query). In general, the length of the first query is upperbounded by an exponential function in the space complexity of the oracle machine, but the same does not necessarily hold for subsequent queries, unless some conventions are added to enforce it. For example, consider a reduction that, on input x and access to an oracle f such that  f (z) = 2z, invokes the oracle x times, where each time it uses as a query the answer obtained to the previous query. This reduction uses constant space, but produces queries that are exponentially longer than the input, whereas the first query of any constantspace reduction has length that is linear in its input. This problem can be resolved by placing explicit bounds on the length of the queries that spacebounded reductions are allowed to make; for example, we may bound the length of all queries by the obvious bound that holds for the length of the first query (i.e., a reduction of space complexity s is allowed to make queries of length at most 2s(n)+log2 s(n) · n). With the aforementioned convention (or restriction) in place, let us consider the composition of general spacebounded reductions with a spacebounded implementation of the oracle. Specifically, we say that a reduction is (, )restricted if, on input x, all oracle queries are of length at most (x) and the corresponding oracle answers are of length at most (x). It turns out that naive composition (in the spirit of Lemma 5.1) remains useful, whereas the emulative composition of Lemma 5.2 breaks down (in the sense that it yields very weak results).
150
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.1. GENERAL PRELIMINARIES AND ISSUES
1. Following Lemma 5.1, we claim that if can be solved in space s1 when given (, )restricted oracle access to and is solvable is space s2 , then is solvable in space s such that s(n) = s1 (n) + s2 ((n)) + (n) + (n) + δ(n), where δ(n) = O(log((n) + (n) + s1 (n) + s2 ((n)))) = o(s(n)). This claim is proved by using a naive emulation that allocates separate space for the reduction (i.e., oracle machine) itself, for the emulation of its query and answer devices, and for the algorithm solving . Note, however, that here we cannot reuse the space of the reduction when running the algorithm that solves , because the reduction’s computation continues after the oracle answer is obtained. The additional δ(n) term accounts for the various pointers of the oracle machine, which need to be stored when the algorithm that solves is invoked (cf. last paragraph in the proof of Lemma 5.2). A related composition result is presented in Exercise 5.7. This composition refrains from storing the current oracle query (but does store the corresponding answer). It yields s(n) = O(s1 (n) + s2 ((n)) + (n) + log (n)), which for (n) < 2 O(s1 (n)) means s(n) = O(s1 (n) + s2 ((n)) + (n)). 2. Turning to the approach underlying the proof of Lemma 5.2, we get into more serious trouble. Specifically, note that recomputing the answer to the i th query requires recomputing the query itself, which unlike in Lemma 5.2 is not the input to the reduction but rather depends on the answers to prior queries, which need to be recomputed as well. Thus, the space required for such an emulation is at least linear in the number of queries. We note that one should not expect a general composition result (i.e., in the spirit of the foregoing Item 1) in which s(n) = F(s1 (n), s2 ((n))) + o(min((n), (n))), where F is any function. One demonstration of this fact is implied by the observation that any computation of space complexity s can be emulated by a constantspace (2s, 2s)restricted reduction to a problem that is solvable in constant space (see Exercise 5.9). Nonadaptive reductions. Composition is much easier in the special case of nonadaptive reductions. Loosely speaking, the queries made by such reductions do not depend on the answers obtained to previous queries. Formulating this notion is not straightforward in the context of spacebounded computation. In the context of timebounded computations, nonadaptive reductions are viewed as consisting of two algorithms: a querygenerating algorithm, which generates a sequence of queries, and an evaluation algorithm, which given the input and a sequence of answers (obtained from the oracle) produces the actual output. The reduction is then viewed as invoking the querygenerating algorithm (and recording the sequence of generated queries), making the designated queries (and recording the answers obtained), and finally invoking the evaluation algorithm on the sequence of answers. Using such a formulation raises the question of how to describe nonadaptive reductions of small space complexity. This question is resolved by designated special storage devices for the aforementioned sequences (of queries and answers) and postulating that these devices can be used only as described. For details, see Exercise 5.8. Note that nonadaptivity resolves most of the difficulties discussed in the foregoing. In particular, the length of each query made by a nonadaptive reduction is upperbounded by an exponential in the space complexity of the reduction (just as in the case of singlequery reductions). Furthermore, composing such reductions with an algorithm that implements the oracle is not more involved than doing the same for singlequery reductions. Thus, as shown in Exercise 5.8, if is reducible to via a nonadaptive reduction of space
151
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
complexity s1 that makes queries of length at most and is solvable is space s2 , then is solvable in space s such that s(n) = O(s1 (n) + s2 ((n))). (Indeed (n) < 2 O(s1 (n)) · n always hold.) Reductions to decision problems. Composition in the case of reductions to decision problems is also easier, because also in this case the length of each query made by the reduction is upperbounded by an exponential in the space complexity of the reduction (see Exercise 5.10). Thus, applying the seminaive composition result of Exercise 5.7 (mentioned in the foregoing Item 1) is very appealing. It follows that if can be solved in space s1 when given oracle access to a decision problem that is solvable is space s2 , then is solvable in space s such that s(n) = O(s1 (n) + s2 (2s1 (n)+log(n·s1 (n)) )). Indeed, if the length of each query in such a reduction is upperbounded by , then we may use s(n) = O(s1 (n) + s2 ((n))). These results, however, are of limited interest, because it seems difficult to construct smallspace reductions of search problems to decision problems (see §5.1.3.4). We mention that an alternative notion of spacebounded reductions is discussed in §5.2.4.2. This notion is more cumbersome and more restricted, but in some cases it allows recursive composition with a smaller overhead than offered by the aforementioned composition results.
5.1.3.4. Search Versus Decision Recall that in the setting of time complexity we allowed ourselves to focus on decision problems, since search problems could be efficiently reduced to decision problems. Unfortunately, these reductions (e.g., the ones underlying Theorem 2.10 and Proposition 2.15) are not adequate for the study of (small) space complexity. Recall that these reductions extend the currently stored prefix of a solution by making a query to an adequate decision problem. Thus, these reductions have space complexity that is lowerbounded by the length of the solution, which makes them irrelevant for the study of smallspace complexity. In light of the foregoing, the study of the space complexity of search problems cannot be “reduced” to the study of the space complexity of decision problems. Thus, while much of our exposition will focus on decision problems, we will keep an eye on the corresponding search problems. Indeed, in many cases, the ideas developed in the study of the decision problems can be adapted to the study of the corresponding search problems (see, e.g., Exercise 5.17). 5.1.3.5. Complexity Hierarchies and Gaps Recall that more space allows for more computation (see Theorem 4.9), provided that the spacebounding function is “nice” in an adequate sense. Actually, the proofs of spacecomplexity hierarchies and gaps are simpler than the analogous proofs for time complexity, because emulations are easier in the context of spacebounded algorithms (cf. Section 4.3). 5.1.3.6. Simultaneous TimeSpace Complexity Recall that, for space complexity that is at least logarithmic, the time of a computation is always upperbounded by an exponential function in the space complexity (see Theorem 5.3). Thus, polylogarithmic space complexity may extend beyond polynomial time, and it make sense to define a class that consists of all decision problems that may be solved by a polynomialtime algorithm of polylogarithmic space complexity. This class,
152
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.2. LOGARITHMIC SPACE
denoted SC, is indeed a natural subclass of P (and contains the class L, which is defined in Section 5.2.1).9 In general, one may define DTISP(t, s) as the class of decision problems solvable by an algorithm that has time complexity t and space complexity s. Note that DTISP(t, s) ⊆ DTIME(t) ∩ DSPACE(s) and that a strict containment may hold. We mention that DTISP(·, ·) provides the arena for the only known absolute (and highly nontrivial) lower bound regarding N P; see [79]. We also note that lower bounds on timespace tradeoffs (see, e.g., [59, Sec. 4.3]) may be stated as referring to the classes DTISP(·, ·).
5.1.4. Circuit Evaluation Recall that Theorem 3.1 asserts the existence of a polynomialtime algorithm that, given a circuit C : {0, 1}n → {0, 1}m and an nbit long string x, returns C(x). For circuits of bounded fanin, the space complexity of such an algorithm can be made linear in the depth of the circuit (which may be logarithmic in its size). This is obtained by the following DFStype algorithm. The algorithm (recursively) determines the value of a gate in the circuit by first determining the value of its first incoming edge and next determining the value of the second incoming edge. Thus, the recursive procedure, started at each output terminal of the circuit, needs only store the path that leads to the currently processed vertex as well as the temporary values computed for each ancestor. Note that this path is determined by indicating, for each vertex on the path, whether we currently process its first or second incoming edge. In the case that we currently process the vertex’s second incoming edge, we need also store the value computed for its first incoming edge. The temporary storage used by the foregoing algorithm, on input (C, x), is thus 2dC + O(log x + log C(x)), where dC denotes the depth of C. The first term in the space bound accounts for the core activity of the algorithm (i.e., the recursion), whereas the other terms account for the overhead involved in manipulating the initial input and final output (i.e., assigning the bits of x to the corresponding input terminals of C and scanning all output terminals of C). Note. Further connections between circuit complexity and space complexity are mentioned in Section 5.2.3 and §5.3.2.2.
5.2. Logarithmic Space Although Exercise 5.3 asserts that “there is life below logspace,” logarithmic space seems to be the smallest amount of space that supports interesting computational phenomena. In particular, logarithmic space is required for merely maintaining an auxiliary counter that holds a position in the input, which seems required in many computations. On the other hand, logarithmic space suffices for solving many natural computational problems, for establishing reductions among many natural computational problems, and for a stringent notion of uniformity (of families of Boolean circuits). Indeed, an important feature of logarithmic space computations is that they are a natural subclass of the polynomialtime computations (see Theorem 5.3). 9
We also mention that BPL ⊆ SC, where BPL is defined in §6.1.5.1 and the result is proved in Section 8.4 (see Theorem 8.23).
153
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
5.2.1. The Class L Focusing on decision problems, we denote by L the class of decision problems that are solvable by algorithms of logarithmic space complexity; that is, L = ∪c DSPACE(c ), def where c (n) = c log2 n. Note that, by Theorem 5.3, L ⊆ P. As hinted, many natural computational problems are in L (see Exercises 5.6 and 5.12 as well as Section 5.2.4). On the other hand, it is widely believed that L = P.
5.2.2. LogSpace Reductions Another class of important logspace computations is the class of logarithmic space reductions. In light of the subtleties discussed in §5.1.3.3, we focus on the case of manytoone reductions. Analogously to the definition of Karpreductions (Definition 2.11), we say that f is a logspace (manytoone) reduction of S to S if f is logspace computable and, for every x, it holds that x ∈ S if and only if f (x) ∈ S . By Lemma 5.2 (and Theorem 5.3), if S is logspace reducible to some set in L then S ∈ L. Similarly, one can define a logspace variant of Levinreductions (Definition 2.12). Both types of reductions are transitive (see Exercise 5.11). Note that Theorem 5.3 applies in this context and implies that these reductions run in polynomial time. Thus, the notion of a logspace manytoone reduction is a special case of a Karpreduction. We observe that all known Karpreductions establishing NPcompleteness results are actually logspace reductions. This is easily verifiable in the case of the reductions presented in Section 2.3.3 (as well as in Section 2.3.2). For example, consider the generic reduction to CSAT presented in the proof of Theorem 2.21: The constructed circuit is “highly uniform” and can be easily constructed in logarithmic space (see also Section 5.2.3). A degeneration of this reduction suffices for proving that every problem in P is logspace reducible to the problem of evaluating a given circuit on a given input. Recall that the latter problem is in P, and thus we may say that it is Pcomplete under logspace reductions. Theorem 5.4 (The complexity of Circuit Evaluation): Let CEVL denote the set of pairs (C, α) such that C is a Boolean circuit and C(α) = 1. Then CEVL is in P and every problem in P is logspace Karpreducible to CEVL. Proof Sketch: Recall that the observation underlying the proof of Theorem 2.21 (as well as the proof of Theorem 3.6) is that the computation of a Turing machine can be emulated by a (“highly uniform”) family of circuits. In the proof of Theorem 2.21, we hardwired the input to the reduction (denoted x) into the circuit (denoted C x ) and introduced input terminals corresponding to the bits of the NPwitness (denoted y). In the current context we leave x as an input to the circuit, while noting that the auxiliary NPwitness does not exist (or has length zero). Thus, the reduction from S ∈ P to CEVL maps the instance x (for S) to the pair (Cx , x), where Cx is a circuit that emulates the computation of the machine that decides membership in S (on any xbit long input). For the sake of future use (in Section 5.2.3), we highlight the fact that Cx can be constructed by a logspace machine that is given the input 1x .
154
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.2. LOGARITHMIC SPACE
The impact of Pcompleteness under logspace reductions. Indeed, Theorem 5.4 implies that L = P if and only if CEVL ∈ L. Other natural problems were proved to have the same property (i.e., being Pcomplete under logspace reductions; cf. [60]). Logspace reductions are used to define completeness with respect to other classes that are assumed to extend beyond L. This restriction of the power of the reduction is definitely needed when the class of interest is contained in P (e.g., N L; see Section 5.3.2). In general, we say that a problem is C complete under logspace reductions if is in C and every problem in C is logspace (manytoone) reducible to . In such a case, if ∈ L then C ⊆ L. As in the case of polynomialtime reductions, we wish to stress that the relevance of logspace reductions extends beyond being a tool for defining complete problems.
5.2.3. LogSpace Uniformity and Stronger Notions Recall that a basic notion of uniformity of a family of circuits (Cn )n∈N , introduced in Definition 3.3, requires the existence of an algorithm that on input n outputs the description of Cn , while using time that is polynomial in the size of Cn . Strengthening Definition 3.3, we say that a family of circuits (Cn )n∈N is logspace uniform if there exists an algorithm that on input n outputs Cn while using space that is logarithmic in the size of Cn . As implied by the following Theorem 5.5 (and implicitly proved in the foregoing Theorem 5.4), the computation of any polynomialtime algorithm can be emulated by a logspace uniform family of (bounded fanin) polynomialsize circuits. On the other hand, in continuation of Section 5.1.4, we note that logspace uniform circuits of bounded fanin and logarithmic depth can be emulated by an algorithm of logarithmic space complexity (i.e., “logspace uniform N C 1 ” is in L; see Exercise 5.12). As mentioned in Section 3.1.1, stronger notions of uniformity have also been considered. Specifically, in an analogy to the discussion in §E.2.1.2, we say that (Cn )n∈N has a strongly explicit construction if there exists an algorithm that runs in polynomial time and linear space such that, on input n and v, the algorithm returns the label of vertex v in Cn as well as the list of its children (or an indication that v is not a vertex in Cn ). Note that if (Cn )n∈N has a strongly explicit construction then it is logspace uniform, because the length of the description of a vertex in Cn is logarithmic in the size of Cn . The proof of Theorem 5.4 actually establishes the following. Theorem 5.5 (strongly uniform circuits emulating P): For every polynomialtime algorithm A there exists a strongly explicit construction of a family of polynomialsize circuits (Cn )n∈N such that for every x it holds that Cx (x) = A(x). Proof Sketch: As noted already, the circuits (Cx )x (considered in the proof of Theorem 5.4) are highly uniform. In particular, the underlying (directed) graph consists of constantsize gadgets that are arranged in an array and are only connected to adjacent gadgets (see the proof of Theorem 2.21).
5.2.4. Undirected Connectivity Exploring a graph (e.g., toward determining its connectivity) is one of the most basic and ubiquitous computational tasks regarding graphs. The standard graphexploration
155
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
algorithms (e.g., BFS and DFS) require temporary storage that is linear in the number of vertices. In contrast, the algorithm presented in this section uses temporary storage that is only logarithmic in the number of vertices. In addition to demonstrating the power of logspace computation, this algorithm (or rather its actual implementation) provides a taste of the type of issues arising in the design of sophisticated logspace algorithms. The intuitive task of “exploring a graph” is captured by the task of deciding whether a given graph is connected.10 In addition to the intrinsic interest in this natural computational problem, we mention that it is computationally equivalent (under logspace reductions) to numerous other computational problems (see, e.g., Exercise 5.16). We note that some related computational problems seem actually harder; for example, determining directed connectivity (in directed graphs) captures the essence of the class N L (see Section 5.3.2). In view of this state of affairs, we emphasize the fact that the computational problem considered here refers to undirected graphs by calling it undirected connectivity. Theorem 5.6: Deciding undirected connectivity (UCONN) is in L The algorithm is based on the fact that UCONN is easy in the special case that the graph consists of a collection of constant degree expanders.11 In particular, if the graph has constant degree and logarithmic diameter then it can be explored using a logarithmic amount of space (which is used for determining a generic path from a fixed starting vertex).12 Needless to say, the input graph does not necessarily consist of a collection of constant degree expanders. The main idea is then to transform the input graph into one that does satisfy the aforementioned condition, while preserving the number of connected components of the graph. Furthermore, the key point is performing such a transformation in logarithmic space. The rest of this section is devoted to the description of such a transformation. We first present the basic approach and next turn to the highly nontrivial implementation details. Teaching note: We recommend leaving the actual proof of Theorem 5.6 (i.e., the rest of this section) for advanced reading. The main reason is its heavy dependence on technical material that is beyond the scope of a course in Complexity Theory.
Getting started. We first note that it is easy to transform the input graph G 0 = (V0 , E 0 ) into a constantdegree graph G 1 that preserves the number of connected components in G 0 . Specifically, each vertex v ∈ V having degree d(v) (in G 0 ) is represented by a cycle Cv of d(v) vertices (in G 1 ), and each edge {u, v} ∈ E 0 is replaced by an edge having one endpoint on the cycle Cv and the other endpoint on the cycle Cu such that each vertex in G 1 has degree three (i.e., has two cycle edges and a single intracycle edge). This transformation can be performed using logarithmic space, and thus (relying on Lemma 5.2) we assume that the input graph has degree three. 10
See Appendix G.1 for basic terminology. At this point, the reader may think that expanders are merely graphs of logarithmic diameter. At a later stage, we will rely on a basic familiarity with a specific definition of expanders as well as with a specific technique for constructing them. The relevant material is contained in Appendix E.2. 12 Indeed, this is analogous to the circuitevaluation algorithm of Section 5.1.4, where the circuit depth corresponds to the diameter and the bounded fanin corresponds to the constant degree. For further details, see Exercise 5.13. 11
156
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.2. LOGARITHMIC SPACE
Our goal is to transform this graph into a collection of expanders, while maintaining the number of connected components. In fact, we shall describe the transformation while pretending that the graph is connected, while noting that otherwise the transformation acts separately on each connected component. A couple of technicalities. For a constant integer d > 2 determined so as to satisfy some additional condition, we may assume that the input graph is actually d 2 regular (albeit is not necessarily simple). Furthermore, we shall assume that this graph is not bipartite. Both assumptions can be justified by augmenting the aforementioned construction of a 3regular graph by adding d 2 − 3 selfloops to each vertex. Prerequisites. Evidently, the notion of an expander graph plays a key role in the aforementioned transformation. For a brief review of this notion, the reader is referred to Appendix E.2. In particular, we assume familiarity with the algebraic definition of expanders (as presented in §E.2.1.1). Furthermore, the transformation relies heavily on the ZigZag product, defined in §E.2.2.2, and the following exposition assumes familiarity with this definition.
5.2.4.1. The Basic Approach Recall that our goal is to transform G 1 into an expander. The transformation is gradual and consists of logarithmically many iterations, where in each iteration an adequate expansion parameter doubles while the graph becomes a constant factor larger and maintains the degree bound. The (expansion) parameter of interest is the relative eigenvalue gap (see §E.2.1.1).13 A constant value of this parameter indicates that the graph is an expander. Initially, this parameter is lowerbounded by (n −2 ), where n is the size of the graph. Since this parameter doubles in each iteration, after logarithmically many iterations this parameter is lowerbounded by a constant (and hence the current graph is an expander). The crux of the aforementioned gradual transformation is the transformation that takes place in each single iteration. This transformation is supposed to double the expansion parameter while maintaining the graph’s degree and increasing the number of vertices by a constant factor. The transformation combines the (standard) graphpowering operation and the ZigZag product presented in §E.2.2.2. Specifically, for adequate positive integers d and c, we start with the d 2 regular graph G 1 = (V1 , E 1 ), and go through a z G for i = 1, . . . , t − 1, where G logarithmic number of iterations letting G i+1 = G ic is a fixed dregular graph with d 2c vertices. That is, in each iteration, we raise the current graph (i.e., G i ) to the power c and combine the resulting graph (d 2c regular) with the fixed (d 2c vertex) graph G using the ZigZag product. Thus, G i+1 is a d 2 regular graph with d i·2c · V1  vertices, where this invariant is preserved by definition of the ZigZag product (i.e., the ZigZag product of a d 2c regular graph G = (V , E ) with the dregular graph G (which has d 2c vertices) yields a d 2 regular graph with d 2c · V  vertices). The analysis of the improvement in the expansion parameter (i.e., the relative eigendef ¯ value gap), denoted δ(·) = 1 − λ(·), relies on Eq. (E.11). Recall that Eq. (E.11) implies 13
Recall that the eigenvalue gap of a regular graph is the difference between the graph’s degree (i.e., the graph’s ¯ largest eigenvalue) and the absolute value of each of the other eigenvalues. The relative eigenvalue bound, denoted λ, is the eigenvalue bound divided by the graph’s degree.
157
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
¯ ¯ ¯ ))/3. Thus, the fixed graph G is sethat if λ(G) < 1/2 then 1 − λ(G z G) > (1 − λ(G ¯ lected such that λ(G) < 1/2, which requires a sufficiently large constant d. Thus, we have ¯ i )c ¯ ic ) 1 − λ(G 1 − λ(G ¯ ic = δ(G i+1 ) = 1 − λ(G z G) > 3 3 ¯ i )c > whereas, for a sufficiently large constant integer c > 0, it holds that 1 − λ(G 14 ¯ min(6 · (1 − λ(G i )), 1/2). It follows that δ(G i+1 ) > min(2δ(G i ), 1/6). Thus, setting ¯ 1 ) = (V1 −2 ), we obtain δ(G t ) > 1/6 as t = O(log V1 ) and using δ(G 1 ) = 1 − λ(G desired. Needless to say, a “detail” of crucial importance is the ability to transform G 1 into G t via a logspace computation. Indeed, the transformation of G i to G i+1 can be performed in logarithmic space (see Exercise 5.14), but we need to compose a logarithmic number of such transformations. Unfortunately, the standard composition lemmas for spacebounded algorithms involve overhead that we cannot afford.15 Still, taking a closer look at the transformation of G i to G i+1 , one may note that it is highly structured, and in some sense it can be implemented in constant space and supports a stronger composition result that incurs only a constant amount of storage per iteration. The resulting implementation (of the iterative transformation of G 1 to G t ) and the underlying formalism will be the subject of §5.2.4.2. (An alternative implementation, provided in [190], can be obtained by unraveling the composition.)
5.2.4.2. The Actual Implementation The spaceefficient implementation of the iterative transformation outlined in §5.2.4.1 is based on the observation that we do not need to explicitly construct the various graphs but merely provide “oracle access” to them. This observation is crucial when applied to the intermediate graphs; that is, rather than constructing G i+1 , when given G i as input, we show how to provide oracle access to G i+1 (i.e., answer “neighborhood queries” regarding G i+1 ) when given oracle access to G i (i.e., an oracle that answers neighborhood queries regarding G i ). This means that we view G i and G i+1 (or rather their incidence lists) as functions (to be evaluated) rather than as strings (to be printed), and show how to reduce the task of finding neighbors in G i+1 (i.e., evaluating the “incidence function” at a given vertex) to the task of finding neighbors in G i . A clarifying discussion. Note that here we are referring to oracle machines that access a finite oracle, which represents a finite variable object (which, in turn, is an instance of some computational problem). Such a machine provides access to a complex object by using its access to a more basic object, which is represented by the oracle. Specifically, such a machine gets an input, which is a “query” regarding the complex object (i.e, the object that the machine tries to emulate), and produces an output (which is the answer to the query). Analogously, these machines make queries, which are queries 14 ¯ i ) < (1 − (1/c)), show that 1 − λ(G ¯ i )c > 1/2. Otherwise, Consider the following two cases: In the case that λ(G def ¯ i ), and using ε ≤ 1/c show that 1 − λ(G ¯ i )c > cε/2. let ε = 1 − λ(G 15 We cannot afford the naive composition (of Lemma 5.1), because it causes an overhead linear in the size of the intermediate result. As for the emulative composition (of Lemma 5.2), it sums up the space complexities of the composed algorithms (not to mention adding another logarithmic term), which would result in a logsquared bound on the space complexity.
158
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.2. LOGARITHMIC SPACE
regarding another object (i.e., the one represented in the oracle), and obtain corresponding answers.16 Like in §5.1.3.3, queries are made via a special writeonly device and the answers are read from a corresponding readonly device, where the use of these devices is not charged in the space complexity. With these conventions in place, we claim that neighborhoods in the d 2 regular graph G i+1 can be computed by a constantspace oracle machine that is given oracle access to the d 2 regular graph G i . That is, letting gi : Vi × [d 2 ] → Vi × [d 2 ] (resp., gi+1 : Vi+1 × [d 2 ] → Vi+1 × [d 2 ]) denote the edgerotation function17 of G i (resp., G i+1 ), we have: Claim 5.7: There exists a constantspace oracle machine that evaluates gi+1 when given oracle access to gi , where the state of the machine is counted in the space complexity. Proof Sketch: We first show that the two basic operations that underly the definition of G i+1 (i.e., powering and ZigZag product with a constant graph) can be performed in constant space. The edgerotation function of G i2 (i.e., the square of the graph G i ) can be evaluated at any desired pair, by evaluating the edgerotation function of G i twice, and using a constant amount of space. Specifically, given v ∈ Vi and j1 , j2 ∈ [d 2 ], we compute gi (gi (v, j1 ), j2 ), which is the edge rotation of (v, j1 , j2 ) in G i2 , as follows. First, making the query (v, j1 ), we obtain the edge rotation of (v, j1 ), denoted (u, k1 ). Next, making the query (u, j2 ), we obtain (w, k2 ), and finally we output (w, k2 , k1 ). We stress that we only use the temporary storage to record k1 , whereas u is directly copied from the oracle answer device to the oracle query device. Accounting also for a constant number of states needed for the various stages of the foregoing activity, we conclude that graph squaring can be performed in constant space. The argument extends to the task of raising the graph to any constant power. Turning to the ZigZag product (of an arbitrary regular graph G with a fixed graph G), we note that the corresponding edgerotation function can be evaluated in constant space (given oracle access to the edgerotation function of G ). This follows directly from Eq. (E.9), noting that the latter calls for a single evaluation of the edgerotation function of G and two simple modifications that only depend on the constantsize graph G (and affect a constant number of bits of the relevant strings). Again, using the fact that it suffices to copy vertex names from the input to the oracle query device (or from the oracle answer device to the output), we conclude that the aforementioned activity can be performed using constant space. The argument extends to a sequential composition of a constant number of operations of the aforementioned type (i.e., graph squaring and ZigZag product with a constant graph). 16
Indeed, the current setting (in which the oracle represents a finite variable object, which in turn is an instance of some computational problem) is different from the standard setting, where the oracle represents a fixed computational problem. Still the mechanism (and/or operations) of these two types of oracle machines is the same: They both get an input (which here is a “query” regarding a variable object rather than an instance of a fixed computational problem) and produce an output (which here is the answer to the query rather than a “solution” for the given instance). Analogously, these machines make queries (which here are queries regarding another variable object rather than queries regarding another fixed computational problem) and obtain corresponding answers. 17 Recall that the edgerotation function of a graph maps the pair (v, j) to the pair (u, k) if vertex u is the j th neighbor of vertex v and v is the k th neighbor of u (see §E.2.2.2).
159
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
Recursive composition. Using Claim 5.7, we wish to obtain a O(t)space oracle machine that evaluates gt by making oracle calls to g1 , where t = O(log V1 ). Such an oracle machine will yield a logspace transformation of G 1 to G t (by evaluating gt at all possible values). It is tempting to hope that an adequate composition lemma, when applied to Claim 5.7, will yield the desired O(t)space oracle machine (reducing the evaluation of gt to g1 ). This is indeed the case, except that the adequate composition lemma is still to be developed (as we do next). We first note that applying a naive composition (as in Lemma 5.1) amounts to an additive overhead of O(log V1 ) per each composition. But we cannot afford more than an amortized constant additive overhead per composition. Applying the emulative composition (as in Lemma 5.2) causes a multiplicative overhead per each composition, which is certainly unaffordable. The composition developed next is a variant of the naive composition, which is beneficial in the context of recursive calls. The basic idea is deviating from the paradigm that allocates separate input/output and query devices to each level in the recursion, and combining all these devices in a single (“global”) device that will be used by all levels of the recursion. That is, rather than following the “structured programming” methodology of using locally designated space for passing information to the subroutine, we use the “bad programming” methodology of passing information through global variables. (As usual, this notion is formulated by referring to the model of multitape Turing machine, but it can be formulated in any other reasonable model of computation.) Definition 5.8 (globaltape oracle machines): A globaltape oracle machine is defined as an oracle machine (cf. Definition 1.11), except that the input, output, and oracle tapes are replaced by a single globaltape. In addition, the machine has a constant number of worktapes, called the localtapes. The machine obtains its input from the globaltape, writes each query on this very tape, obtains the corresponding answer from this tape, and writes its final output on this tape. (We stress that, as a result of invoking the oracle f , the contents of the globaltape changes from q to f (q).)18 The space complexity of such a machine is stated when referring separately to its use of the globaltape and to its use of the localtapes. Clearly, any ordinary oracle machine can be converted into an equivalent globaltape oracle machine. The resulting machine uses a globaltape of length at most n + + m, where n denotes the length of the input, denote the length of the longest query or oracle answer, and m denotes the length of the output. However, combining these three different tapes into one globaltape seems to require holding separate pointers for each of the original tapes, which means that the localtape has to store three corresponding counters (in addition to storing the original worktape). Thus, the resulting machine uses a localtape of length w + log2 n + log2 + log2 m, where w denotes the space complexity of the original machine and the additional logarithmic terms (which are logarithmic in the length of the globaltape) account for the aforementioned counters. Fortunately, the aforementioned counters can be avoided in the case that the original oracle machine can be described as an iterative sequence of transformations (i.e., the input is transformed to the first query, and the i th answer is transformed to the i + 1st query or 18
This means that the prior contents of the globaltape (i.e., the query q) is lost (i.e., it is replaced by the answer f (q)). Thus, if we wish to keep such prior contents then we need to copy it to a localtape. We also stress that, according to the standard oracle invocation conventions, the head location after the oracle responds is at the leftmost cell of the globaltape.
160
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.2. LOGARITHMIC SPACE
to the output, all while maintaining auxiliary information on the worktape). Indeed, the machine presented in the proof of Claim 5.7 has this form, and thus it can be implemented by a globaltape oracle machine that uses a globaltape not longer than its input and a localtape of constant length (rather than a localtape of length that is logarithmic in the length of the globaltape). Claim 5.9 (Claim 5.7, revisited): There exists a globaltape oracle machine that evaluates gi+1 when given oracle access to gi , while using globaltape of length log2 (d 2 · Vi+1 ) and a localtape of constant length. Proof Sketch: Following the proof of Claim 5.7, we merely indicate the exact use of the two tapes. For example, recall that the edgerotation function of the square of G i is evaluated at (v, j1 , j2 ) by evaluating the edgerotation function of the original graph first at (v, j1 ) and then at (u, j2 ), where (u, k1 ) = gi (v, j1 ). This means the globaltape machine first reads (v, j1 , j2 ) from the globaltape and replaces it by the query (v, j1 ), while storing j2 on the localtape. Thus, the machine merely deletes a constant number of bits from the globaltape (and leaves its prefix intact). After invoking the oracle, the machine copies k1 from the globaltape (which currently holds (u, k1 )) to its localtape, and copies j2 from its localtape to the globaltape (such that it contains (u, j2 )). After invoking the oracle for the second time, the globaltape contains (w, k2 ) = gi (u, j2 ), and the machine merely modifies it to (w, k2 , k1 ), which is the desired output. Similarly, note that the edgerotation function of the ZigZag product of the variable graph G with the fixed graph G is evaluated at (u, i, α, β) by querying G at (u, E α (i)) and outputting (v, E β ( j ), β, α), where (v, j ) denotes the oracle answer (see Eq. (E.9)). This means that the globaltape oracle machine first copies α, β from the globaltape to the localtape, transforms the contents of the globaltape from (u, i, α, β) to (u, E α (i)), and makes an analogous transformation after the oracle is invoked. Composing globaltape oracle machines. In the proof of Claim 5.9, we implicitly used sequential composition of computations conducted by globaltape oracle machines.19 In general, when sequentially composing such computations the length of the globaltape (resp., localtape) is the maximum among all composed computations; that is, the current formalism offers a tight bound on naive sequential composition (as opposed to Lemma 5.1). Furthermore, globaltape oracle machines are beneficial in the context of recursive composition, as indicated by Lemma 5.10 (which relies on this model in a crucial way). The key observation is that all levels in the recursive composition may reuse the same global storage, and only the local storage gets added. Consequently, we have the following composition lemma. Lemma 5.10 (recursive composition in the globaltape model): Suppose that there exists a globaltape oracle machine that, for every i = 1, . . . , t − 1, computes f i+1 by making oracle calls to f i while using a globaltape of length L and a localtape of length li , which also accounts for the machine’s state. Then f t can be 19
A similar composition took place in the proof of Claim 5.7, but in Claim 5.9 we asserted a stronger feature of this specific computation.
161
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
computed t−1 by a standard oracle machine that makes calls to f 1 and uses space (li + log2 li ). L + i=1 We shall apply this lemma with f i = gi and t = O(log V1 ) = O(log Vt ), using the bounds L = log2 (d 2 · Vt ) and li = O(1) (as guaranteed by Claim 5.9). Indeed, in this application L equals the length of the input to f t = gt . Proof Sketch: We compute f t by allocating space for the emulation of the globaltape and the localtapes of each level in the recursion. We emulate the recursive computation by capitalizing on the fact that all recursive levels use the same globaltape (for making queries and receiving answers). Recall that in the actual recursion, each level may use the globaltape arbitrarily so long as when it returns control to the invoking machine the globaltape contains the right answer. Thus, the emulation may do the same, and emulate each recursive call by using the space allocated for the globaltape as well as the space designated for the localtape of this level. The emulation should also store the locations of the other levels of therecursion on the corresponding localtapes, but the space needed for this t−1 (i.e., i=1 log2 li ) is clearly smaller than the length of the various localtapes (i.e., t−1 i=1 li ). Conclusion. Combining Claim 5.9 and Lemma 5.10, we conclude that the evaluation of g O(log V1 ) can be reduced to the evaluation of g1 in space O(log V1 ); that is, g O(log V1 ) can be computed by a standard oracle machine that makes calls to g1 and uses space O(log V1 ). Recalling that G 1 can be constructed in logspace (based on the input graph G 0 ), we infer that G = G O(log V1 ) can be constructed in logspace. Theorem 5.6 follows by recalling that G (which has constant degree and logarithmic diameter) can be tested for connectivity in logspace (see Exercise 5.13). Using a similar argument, we can test whether a given pair of vertices are connected in the input graph (see Exercise 5.15). Furthermore, a corresponding path can be found within the same complexity (see Exercise 5.17).
5.3. Nondeterministic Space Complexity The difference between space complexity and time complexity is quite striking in the context of nondeterministic computations. One phenomenon is the huge gap between the power of two formulation of nondeterministic space complexity (see Section 5.3.1), which stands in contrast to the fact that the analogous formulations are equivalent in the context of time complexity. We also highlight the contrast between various results regarding (the standard model of) nondeterministic spacebounded computation (see Section 5.3.2) and the analogous questions in the context of time complexity; one good example is the “question of complementation” (cf. §5.3.2.3).
5.3.1. Two Models Recall that nondeterministic timebounded computations were defined via two equivalent models. In the offline model (underlying the definition of NP as a proof system (see Definition 2.5)), nondeterminism is captured by reference to the existential choice of an auxiliary (“nondeterministic”) input. Thus, in this model, nondeterminism refers
162
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.3. NONDETERMINISTIC SPACE COMPLEXITY
to choices that are transcendental to the machine itself (i.e., choices that are performed “offline”). In contrast, in the online model (underlying the traditional definition of NP (see Definition 2.7)) nondeterminism is captured by reference to the nondeterministic choices of the machine itself. In the context of time complexity, these models are equivalent because the latter online choices can be recorded (almost) for free (see the proof of Theorem 2.8). However, such a recording is not free of charge in the context of space complexity. Let us take a closer look at the relation between the offline and online models. The issue at hand is the cost of emulating offline choices by online choices and vice versa. We stress the fact that in the offline model the nondeterministic choices are recorded “for free” on an adequate auxiliary input device, whereas such a free record is not available in the online model. The fact that the online model can be efficiently emulated by the offline model is almost generic; that is, it holds for any natural notion of complexity, because online nondeterministic choices can be emulated by using consecutive bits of the (offline) nondeterministic input (and without significantly affecting any natural complexity measure). In contrast, the efficient emulation of the offline model by the online model relies on the ability to efficiently maintain (in the online model) a record of nondeterministic choices, which eliminates the advantage of the offline nondeterministic input (which is recorded for free in the offline model). This efficient emulation is possible in the context of time complexity, because in that context a machine may store a sequence of nondeterministic choices (performed online) and retrieve bits of it without significantly affecting the runningtime (i.e., almost “free of charge”). This naive emulation of the offline choices by online choices is not free of charge in the context of spacebounded computation, because (in the online model) each online choice that we record (i.e., store) is charged in the space complexity. Furthermore, typically the number of nondeterministic choices is much larger than the space bound, and thus the naive emulation is not possible in the context of space complexity (because it is prohibitively expensive in terms of space complexity). Let us recapitulate the two models and consider the relation between them in the context of space complexity. In the standard model, called the online model, the machine makes nondeterministic choices “on the fly” (as in Definition 2.7).20 Thus, if the machine may need to refer to such a nondeterministic choice at a latter stage in its computation, then it must store this choice on its storage device (and be charged for it). In contrast, in the socalled offline model the nondeterministic choices are provided from the outside as the bits of a special nondeterministic input. This nondeterministic input is presented on a special readonly device (or tape) that can be scanned in both directions like the main input. We denote by NSPACEonline (s) (resp., NSPACEoffline (s)) the class of sets that are acceptable by an online (resp., offline) nondeterministic machine having space complexity s. We stress that, as in Definition 2.7, the set accepted by a nondeterministic machine M is the set of strings x such that there exists a computation of M on input x that is accepting. (In the case of an online model this existential statement refers to possible nondeterministic choices of the machine itself, whereas in the case of an offline model we refer to a possible choice of a corresponding nondeterministic input.) 20
An alternative but equivalent definition is obtained by considering machines that read a nondeterministic input from a special readonly tape that can be read only in one direction. This stands in contrast to the offline model, where the nondeterministic input is presented on a readonly tape that can be scanned freely.
163
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
The relationship between these two types of classes is not obvious. Indeed, NSPACEonline (s) ⊆ NSPACEoffline (s), but (in general) containment does not hold in the opposite direction. In fact, for s that is at least logarithmic, not only that NSPACEonline (s) = NSPACEoffline (s) but rather NSPACEonline (s) ⊆ NSPACEoffline (s ), where s (n) = O(log s(n)) = o(s(n)). Furthermore, for s that is at least linear, it holds that NSPACEonline (s) = NSPACEoffline ((log s)); see Exercise 5.18. Before proceeding any further, let us justify the focus on the online model in the rest of this section. Indeed, the offline model fits better the motivations to N P (as presented in Section 2.1.2), but the online model seems more adequate for the study of nondeterminism in the context of space complexity. One reason is that an offline nondeterministic input can be used to code computations (see Exercise 5.18), and in a sense allows for “cheating” with respect to the “actual” space complexity of the computation. This is reflected in the fact that the offline model can emulate the online model while using space that is logarithmic in the space used by the online model. A related phes nomenon is that NSPACEoffline (s) is only known to be contained in DTIME(22 ), whereas s NSPACEonline (s) ⊆ DTIME(2 ). This fact motivates the study of N L = NSPACEonline (log), as a study of a (natural) subclass of P. Indeed, the various results regarding N L justify its study in retrospect. In light of the foregoing, we adopt the standard conventions and let NSPACE(s) = NSPACEonline (s). Our main focus will be the study of N L = NSPACE(log). After studying this class in Section 5.3.2, we shall return to the “question of modeling” in Section 5.3.3.
5.3.2. NL and Directed Connectivity This section is devoted to the study of N L, which we view as the nondeterministic analogue of L. Specifically, N L = ∪c NSPACE(c ), where c (n) = c log2 n. (We refer the reader to the definitional issues pertaining to NSPACE = NSPACEonline , which are discussed in Section 5.3.1.) We first note that the proof of Theorem 5.3 can be easily extended to the (online) nondeterministic context. The reason is that moving from the deterministic model to the current model does not affect the number of instantaneous configurations (as defined in the proof of Theorem 5.3), whereas this number bounds the time complexity. Thus, N L ⊆ P. The following problem, called directed connectivity (stCONN), captures the essence of nondeterministic logspace computations (and, in particular, is complete for N L under logspace reductions). The input to stCONN consists of a directed graph G = (V, E) and a pair of vertices (s, t), and the task is to determine whether there exists a directed path from s to t (in G).21 Indeed, the study of N L is often conducted via stCONN. For example, note that N L ⊆ P follows easily from the fact that stCONN is in P (and the fact that N L is logspace reducible to stCONN).
5.3.2.1. Completeness and Beyond Clearly, stCONN is in N L (see Exercise 5.19). As shown next, the N Lcompleteness of stCONN under logspace reductions follows by noting that the computation of any nondeterministic spacebounded machine yields a directed graph in which vertices 21
See Appendix G.1 for basic graph theoretic terminology. We note that, here (and in the sequel), s stands for start and t stands for terminate.
164
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.3. NONDETERMINISTIC SPACE COMPLEXITY
correspond to possible configurations and edges represent the “successive” relation of the computation. In particular, for logspace computations the graph has polynomial size, but in general the relevant graph is strongly explicit (in a natural sense; see Exercise 5.21). Theorem 5.11: Every problem in N L is logspace reducible to stCONN (via a manytoone reduction). Proof Sketch: Fixing a nondeterministic machine M and an input x, we consider the following directed graph G x = (Vx , E x ). The vertices of Vx are possible instantaneous configurations of M(x), where each configuration consists of the contents of the worktape (and the machine’s finite state), the machine’s location on it, and the machine’s location on the input. The directed edges represent single possible moves in such a computation. We stress that such a move depends on the machine M as well as on the (single) bit of x that resides in the location specified by the first configuration (i.e., the configuration corresponding to the start point of the potential edge).22 Note that (for a fixed machine M), given x, the graph G x can be constructed in logspace (by scanning all pairs of vertices and outputting only the pairs that are valid edges (which, in turn, can be tested in constant space)). By definition, the graph G x represents the possible computations of M on input x. In particular, there exists an accepting computation of M on input x if and only if there exists a directed path, in G x , starting at the vertex s that corresponds to the initial configuration and ending at the vertex t that corresponds to a canonical accepting configuration. Thus, x ∈ S if and only if (G x , s, t) is a yesinstance of stCONN. Reflection. We believe that the proof of Theorem 5.11 (see also Exercise 5.21) justifies saying that stCONN captures the essence of nondeterministic spacebounded computations. Note that this (intuitive and informal) statement goes beyond saying that stCONN is N Lcomplete under logspace reductions. We note the discrepancy between the spacecomplexity of undirected connectivity (see Theorem 5.6 and Exercise 5.15) and directed connectivity (see Theorem 5.11 and Exercise 5.23). In this context it is worthwhile to note that determining the existence of relatively short paths (rather than arbitrary paths) in undirected (or directed) graphs is also N Lcomplete under logspace reductions; see Exercise 5.24. On the search version of stCONN. We mention that the search problem corresponding to stCONN is logspace reducible to N L (by a Cookreduction); see Exercise 5.20. Also note that accepting computations of any logspace nondeterministic machine can be found by finding directed paths in directed graphs; indeed, this is a simple demonstration of the thesis that stCONN captures nondeterministic logspace computations.
5.3.2.2. Relating NSPACE to DSPACE Recall that in the context of time complexity, the only known conversion of nondeterministic computation to deterministic computation comes at the cost of an 22
Thus, the actual input x only affects the set of edges of G x (whereas the set of vertices is only affected by x). A related construction is obtained by incorporating in the configuration also the (single) bit of x that resides in the machine’s location on the input. In the latter case, x itself affects Vx (but not E x , except for E x ⊆ Vx ×Vx ).
165
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
exponential blowup in the complexity. In contrast, space complexity allows such a conversion at the cost of a polynomial blowup in the complexity. Theorem 5.12 (Nondeterministic versus deterministic space): For any spaceconstructible s : N → N that is at least logarithmic, it holds that NSPACE(s) ⊆ DSPACE(O(s 2 )). In particular, nondeterministic polynomial space is contained in deterministic polynomial space (and nondeterministic polylogarithmic space is contained in deterministic polylogarithmic space). Proof Sketch: We focus on the special case of N L and the argument extends easily to the general case. Alternatively, the general statement can be derived from the special case by using a suitable upwardtranslation lemma (see, e.g., [123, Sec. 12.5]). The special case boils down to presenting an algorithm for deciding directed connectivity that has logsquare space complexity. The basic idea is that checking whether or not there is a path of length at most 2 from u to v in G reduces (in logspace) to checking whether there is an intermediate vertex w such that there is a path of length at most from u to w def and a path of length at most from w to v. That is, let φG (u, v, ) = 1 if there is def a path of length at most from u to v in G, and φG (u, v, ) = 0 otherwise. Then φG (u, v, 2) can be computed by scanning all vertices w in G, and checking for each w whether both φG (u, w, ) = 1 and φG (w, v, ) = 1 hold.23 Hence, we can compute φG (u, v, 2) by a logspace algorithm that makes oracle calls to φG (·, ·, ), which in turn can be computed recursively in the same manner. Note that the original computational problem (i.e., stCONN) can be cast as computing φG (s, t, V ) (or φG (s, t, 2&log2 V ' )) for a given directed graph G = (V, E) and a given pair of vertices (s, t). Thus, the foregoing recursive procedure yields the theorem’s claim, provided that we use adequate composition results. We take a technically different approach by directly analyzing the recursive procedure at hand. Recall that given a directed graph G = (V, E) and a pair of vertices (s, t), we should merely compute φG (s, t, 2&log2 V ' ). This is done by invoking a recursive procedure that computes φG (u, v, 2) by scanning all vertices in G, and computing for each vertex w the values of φG (u, w, ) and φG (w, v, ). The punch line is that all these computations may reuse the same space, while we need only store one additional bit representing the results of all prior computations. We return the value 1 if and only if for some w it holds that φG (u, w, ) = φG (w, v, ) = 1 (see Figure 5.2). Needless to say, φG (u, v, 1) can be decided easily in logarithmic space. We consider an implementation of the foregoing procedure (of Figure 5.2) in which each level of the recursion uses a designated portion of the entire storage for maintaining the local variables (i.e., w and σ ). The amount of space taken by each level of the recursion is essentially log2 V  (for storing the current value of w), and the number of levels is log2 V . We stress that when computing φG (u, v, 2), we make many recursive calls, but all these calls reuse the same work space (i.e., the portion that is designated to that level). That is, when we compute φG (u, w, ) we 23
Similarly, φG (u, v, 2 + 1) can be computed by scanning all vertices w in G, and checking for each w whether both φG (u, w, + 1) = 1 and φG (w, v, ) = 1 hold.
166
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.3. NONDETERMINISTIC SPACE COMPLEXITY
Recursive computation of φG (u, v, 2), for ≥ 1. For w = 1, . . . , V  do begin (storing the vertex name) Compute σ ← φG (u, w, ) (by a recursive call) Compute σ ← σ ∧ φG (w, v, ) (by a second recursive call) If σ = 1 then return 1. (success: an intermediate vertex was found) End (of scan). return 0. (reached only if the scan was completed without success). Figure 5.2: The recursive procedure in N L ⊆ DSPACE(O (log2 )).
reuse the space that was used for computing φG (u, w , ) for the previous w , and we reuse the same space when we compute φG (w, v, ). Thus, the space complexity of our algorithm is merely the sum of the amount of space used by all recursion levels. It follows that stCONN has logsquare (deterministic) space complexity, and the same follows for all of N L (either by noting that stCONN actually represents any N L computation or by using the logspace reductions of N L to stCONN). Digest. The proof of Theorem 5.12 relies on two main observations. The first observation is that a conjunction (resp., disjunction) of two Boolean conditions can be verified using space s + O(1), where s is the space complexity of verifying a single condition. This follows by applying naive composition (i.e., Lemma 5.1). Actually, the second observation is merely a generalization of the first observation: It asserts that an existential claim (resp., a universally quantified claim) can be verified by scanning all possible values in the relevant domain (and testing the claim for each value), which in terms of space complexity has an additive cost that is logarithmic in the size of the domain. The proof of Theorem 5.12 is facilitated by the fact that we may consider a concrete and simple computational problem such as stCONN. Nevertheless, the same ideas can be applied directly to N L (or any NSPACE class). Placing NL in NC2. The simple formulation of stCONN facilitates placing N L in complexity classes such as N C 2 (i.e., decidability by uniform families of circuits of logsquare depth and bounded fanin). All that is needed is observing that stCONN can be solved by raising the adequate matrix (i.e., the adjacency matrix of the graph augmented with 1entries on the diagonal) to the adequate power (i.e., its dimension). Squaring a matrix can be done by a uniform family circuits of logarithmic depth and bounded fanin (i.e., in NC1), and by repeated squaring the n th power of an nbyn matrix can be computed by a uniform family of bounded fanin circuits of polynomial size and depth O(log2 n); thus, stCONN ∈ N C 2 . Indeed, N L ⊆ N C 2 follows by noting that stCONN actually represents any N L computation (or by noting that any logspace reduction can be computed by a uniform family of logarithmic depth and bounded fanin circuits).
5.3.2.3. Complementation or NL = coNL Recall that (reasonable) nondeterministic timecomplexity classes are not known to be closed under complementation. Furthermore, it is widely believed that N P = coN P. In contrast, (reasonable) nondeterministic spacecomplexity classes
167
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
are closed under complementation, as captured by the result N L = coN L, where def coN L = {{0, 1}∗ \ S : S ∈ N L}. Before proving that N L = coN L, we note that proving this result is equivalent to presenting a logspace Karpreduction of stCONN to its complement (or, equivalently, a reduction in the opposite direction; see Exercise 5.26). Our proof utilizes a different perspective on the NLvscoNL question, by rephrasing this question as referring to the relation between N L and N L ∩ coN L (see Exercise 2.37), and by offering an “operational interpretation” of the class N L ∩ coN L. Recall that a set S is in N L if there exists a nondeterministic logspace machine M that accepts S, and that the acceptance condition of nondeterministic machines is asymmetric in nature. That is, x ∈ S implies the existence of an accepting computation of M on input x, whereas x ∈ S implies that all computations of M on input x are nonaccepting. Thus, the existence of an accepting computation of M on input x is an absolute indication for x ∈ S, but the existence of a rejecting computation of M on input x is not an absolute indication for x ∈ S. In contrast, for S ∈ N L ∩ coN L, there exist absolute indications def both for x ∈ S and for x ∈ S (or, equivalently for x ∈ S = {0, 1}∗ \ S), where each of the two types of indication is provided by a different nondeterministic machine (i.e., either the one accepting S or the one accepting S). Combining both machines, we obtain a single nondeterministic machine that, for every input, sometimes outputs the correct answer and always outputs either the correct answer or a special (“don’t know”) symbol. This yields the following definition, which refers to Boolean functions as a special case. Definition 5.13 (nondeterministic computation of functions): We say that a nondeterministic machine M computes the function f : {0, 1}∗ → {0, 1}∗ if for every x ∈ {0, 1}∗ the following two conditions hold. 1. Every computation of M on input x yields an output in { f (x), ⊥}, where ⊥ ∈ {0, 1}∗ is a special symbol (indicating “don’t know”). 2. There exists a computation of M on input x that yields the output f (x). Note that S ∈ N L ∩ coN L if and only if there exists a nondeterministic logspace machine that computes the characteristic function of S (see Exercise 5.25). Recall that the characteristic function of S, denoted χ S , is the Boolean function satisfying χ S (x) = 1 if x ∈ S and χ S (x) = 0 otherwise. It follows that N L = coN L if and only if for every S ∈ N L there exists a nondeterministic logspace machine that computes χ S . Theorem 5.14 (N L = coN L): For every S ∈ N L there exists a nondeterministic logspace machine that computes χ S . As in the case of Theorem 5.12, the result extends to any spaceconstructible s : N → N that is at least logarithmic; that is, for such s and every S ∈ NSPACE(s), it holds that S ∈ NSPACE(O(s)). This extension can be proved either by generalizing the following proof or by using an adequate upwardtranslation lemma. Proof Sketch: As in the proof of Theorem 5.12, it suffices to present a nondeterministic logspace machine that computes the characteristic function of stCONN, denoted χ (i.e., χ(G, s, t) = 1 if there is a directed path from s to t in G and χ(G, s, t) = 0 otherwise).
168
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.3. NONDETERMINISTIC SPACE COMPLEXITY
We first show that the computation of χ is logspace reducible to determining the number of vertices that are reachable (via a directed path) from a given vertex in a given graph. On input (G, s, t), the reduction computes the number of vertices that are reachable from s in the graph G and compares this number to the number of vertices reachable from s in the graph G obtained by omitting t from G. Clearly, these two numbers are different if and only if vertex t is reachable from vertex v (in the graph G). An alternative reduction that uses a single query is presented in Exercise 5.28. Combining either of these reductions with a nondeterministic logspace machine that computes the number of reachable vertices, we obtain a nondeterministic logspace machine that computes χ. This can be shown by relying either on the nonadaptivity of these reductions or on the fact that the solutions for the target problem have logarithmic length; see Exercise 5.29. Thus, we focus on providing a nondeterministic logspace machine for computing the number of vertices that are reachable from a given vertex in a given graph. Fixing an nvertex graph G = (V, E) and a vertex v, we consider the set of vertices that are reachable from v by a path of length at most i. We denote this set by Ri , and observe that R0 = {v} and that for every i = 1, 2, . . . , it holds that Ri = Ri−1 ∪ {u : ∃w ∈ Ri−1 s.t. (w, u) ∈ E}
(5.1)
Our aim is to (nondeterministically) compute Rn  in logspace. This will be done in n iterations such that at the i th iteration we compute Ri . When computing Ri  we rely on the fact that Ri−1  is known to us, which means that we shall store Ri−1  in memory. We stress that we discard Ri−1  from memory as soon as we complete the computation of Ri , which we store instead. Thus, at each iteration i, our record of past iterations only contains Ri−1 . Computing Ri . Given Ri−1 , we nondeterministically compute Ri  by making a guess (for Ri ), denoted g, and verifying its correctness as follows:
1. We verify that Ri  ≥ g in a straightforward manner. That is, scanning V in some canonical order, we verify for g vertices that they are each in Ri . That is, during the scan, we select nondeterministically g vertices, and for each selected vertex w we verify that w is reachable from v by a path of length at most i, where this verification is performed by just guessing and verifying an adequate path (see Exercise 5.19). We use log2 n bits to store the number of vertices that were already verified to be in Ri , another log2 n bits to store the currently scanned vertex (i.e., w), and another O(log n) bits for implementing the verification of the existence of a path of length at most i from v to w. 2. The verification of the condition Ri  ≤ g (equivalently, V \ Ri  ≥ n − g) is the interesting part of the procedure. Indeed, as we saw, demonstrating membership in Ri is easy, but here we wish to demonstrate nonmembership in Ri . We do so by relying on the fact that we know Ri−1 , which allows for a nondeterministic enumeration of Ri−1 itself, which in turn allows for proofs of nonmembership in Ri (via the use of Eq. (5.1)). Details follows (and an even more structured description is provided in Figure 5.3). Scanning V (again), we verify for n − g (guessed) vertices that they are not in Ri (i.e., are not reachable from v by paths of length at most i). By Eq. (5.1), verifying
169
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
Given Ri−1  and a guess g, the claim g ≥ Ri  is verified as follows. Set c ← 0. (initializing the main counter) For u = 1, . . . , n do begin (the main scan) Guess whether or not u ∈ Ri . For a negative guess (i.e., u ∈ Ri ), do begin (Verify that u ∈ Ri via Eq. (5.1).) Set c ← 0. (initializing a secondary counter) For w = 1, . . . , n do begin (the secondary scan) Guess whether or not w ∈ Ri−1 . For a positive guess (i.e., w ∈ Ri−1 ), do begin Verify that w ∈ Ri−1 (as in Step 1). Verify that u = w and (w, u) ∈ E. If some verification failed then halt with output ⊥ otherwise increment c . End (of handling a positive guess for w ∈ Ri−1 ). End (of secondary scan). (c vertices in Ri−1 were checked) If c < Ri−1  then halt with output ⊥. Otherwise (c = Ri−1 ), increment c. (u verified to be outside of Ri ) End (of handling a negative guess for u ∈ Ri ). End (of main scan). (c vertices were shown outside of Ri ) If c < n − g then halt with output ⊥. Otherwise g ≥ Ri  is verified (since n − Ri  ≥ c ≥ n − g). Figure 5.3: The main step in proving N L = coN L.
that u ∈ Ri amounts to proving that for every w ∈ Ri−1 , it holds that u = w and (w, u) ∈ E. As hinted, the knowledge of Ri−1  allows for the enumeration of Ri−1 , and thus we merely check the aforementioned condition on each vertex in Ri−1 . Thus, verifying that u ∈ Ri is done as follows. (a) We scan V guessing Ri−1  vertices that are in Ri−1 , and verify each such guess in the straightforward manner (i.e., as in Step 1).24 (b) For each w ∈ Ri−1 that was guessed and verified in Step 2a, we verify that both u = w and (w, u) ∈ E. By Eq. (5.1), if u passes the foregoing verification then indeed u ∈ Ri . We use log2 n bits to store the number of vertices that were already verified to be in V \ Ri , another log2 n bits to store the current vertex u, another log2 n bits to count the number of vertices that are currently verified to be in Ri−1 , another log2 n bits to store such a vertex w, and another O(log n) bits for verifying that w ∈ Ri−1 (as in Step 1). If any of the foregoing verifications fails, then the procedure halts outputting the “don’t know” symbol ⊥. Otherwise, it outputs g. Clearly, the foregoing nondeterministic procedure uses a logarithmic amount of space. It can be verified that, when given the correct value of Ri−1 , this procedure nondeterministically computes the value of Ri . That is, if all verifications are 24
Note that implicit in Step 2a is a nondeterministic procedure that computes the mapping (G, v, i, Ri−1 ) → Ri−1 , where Ri−1 denotes the set of vertices that are reachable in G by a path of length at most i from v.
170
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.3. NONDETERMINISTIC SPACE COMPLEXITY
satisfied then it must hold that g = Ri , and if g = Ri  then there exist adequate nondeterministic choices that satisfy all verifications. Recall that Rn is computed iteratively, starting with R0  = 1 and computing Ri  based on Ri−1 . Each iteration i = 1, . . . , n is nondeterministic, and is either completed with the correct value of Ri  (at which point Ri−1  is discarded) or halts in failure (in which case we halt the entire process and output ⊥). This yields a nondeterministic logspace machine for computing Rn , and the theorem follows. Digest. Step 2 is the heart of the proof (of Theorem 5.14). In this step a nondeterministic procedure is used to verify nonmembership in an NLtype set. Indeed, verifying membership in NLtype sets is the archetypical task of nondeterministic procedures (i.e., they are defined so as to fit these tasks), and thus Step 1 is straightforward. In contrast, nondeterministic verification of nonmembership is not a common phenomenon, and thus Step 2 is not straightforward at all. Nevertheless, in the current context (of Step 2), the verification of nonmembership is performed by an iterative (nondeterministic) process that consumes an admissible amount of resources (i.e., a logarithmic amount of space).
5.3.3. A Retrospective Discussion The current section may be viewed as a study of the “power of nondeterminism in computation” (which is a somewhat contradictory term). Recall that we view nondeterministic processes as fictitious abstractions aimed at capturing fundamental phenomena such as the verification of proofs (cf. Section 2.1.5). Since these fictitious abstractions are fundamental in the context of time complexity, we may hope to gain some understanding by a comparative study, specifically, a study of nondeterminism in the context of space complexity. Furthermore, we may discover that nondeterministic spacebounded machines give rise to interesting computational phenomena. The aforementioned hopes seem to come true in the current section. For example, the fact that N L = coN L, while the common conjecture is that N P = coN P, indicates that the latter conjecture is less generic than sometimes stated. It is not that an existential quantifier cannot be “feasibly replaced” by a universal quantifier, but it is rather the case that the feasibility of such a replacement depends very much on the specific notion of feasibility used. Turning to the other type of benefits, we learned that stCONN can be Karpreduced in logspace to stunCONN (i.e., the set of graphs in which there is no directed path between the two designated vertices; see Exercise 5.26). Still, one may ask what does the class N L actually represent (beyond stCONN, which seems actually more than merely a complete problem for this class; see §5.3.2.1). Turning back to Section 5.3.1, we recall that the class NSPACEoffline captures the straightforward notion of spacebounded verification. In this model (called the offline model), the alleged proof is written on a special device (similarly to the assertion being established by it), and this device is being read freely. In contrast, underlying the alternative class NSPACEonline is a notion of proofs that are verified by reading them sequentially (rather than scanning them back and forth). In this case, if the verification procedure may need to reexamine the currently read part of the proof (in the future), then it must store the relevant part (and be charged for this storage). Thus, the online model underlying NSPACEonline refers to the standard process of reading proofs in a sequential manner and taking notes for
171
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
future verification, rather than repeatedly scanning the proof back and forth. The online model reflects the true space complexity of taking such notes and hence of sequential verification of proofs. Indeed (as stated in Section 5.3.1), our feeling is that the offline model allows for an unfair accounting of temporary space as well as for unintendedly long proofs.
5.4. PSPACE and Games As stated in Section 5.2, we rarely encounter computational problems that require less than logarithmic space. On the other hand, we will rarely treat computational problems that require more than polynomial space. The class of decision problems that are solvable def in polynomial space is denoted PSPACE = ∪c DSPACE( pc ), where pc (n) = n c . To get a sense of the power of PSPACE, we observe that PH ⊆ PSPACE; for example, a polynomialspace algorithm can easily verify the quantified condition underlying Definition 3.8. In fact, such an algorithm can handle an unbounded number of alternating quantifiers (see the following Theorem 5.15). On the other hand, by Theorem 5.3, PSPACE ⊆ EX P, where EX P = ∪c DTIME(2 pc ) for pc (n) = n c . The class PSPACE can be interpreted as capturing the complexity of determining the winner in certain efficient twoparty games; specifically, the very games considered in Section 3.2.1 (modulo footnote 5 there). Recall that we refer to twoparty games that satisfy the following three conditions: 1. The parties alternate in taking moves that effect the game’s (global) position, where each move has a description length that is bounded by a polynomial in the length of the initial position. 2. The current position is updated based on the previous position and the current party’s move. This updating can be performed in time that is polynomial in the length of the initial position. (Equivalently, we may require a polynomialtime updating procedure and postulate that the length of the current position be bounded by a polynomial in the length of the initial position.) 3. The winner in each position can be determined in polynomial time. Recall that, for every fixed k, we showed (in Section 3.2.1) a correspondence between k and the problem of determining the existence of a kmove winning strategy (for the first party) in games of the foregoing type. The same correspondence exists between PSPACE and the problem of determining the existence of a winning strategy with polynomially many moves (in games of the foregoing type). That is, on the one hand, the set of initial positions x for which the first party has a poly(x)move winning strategy with respect to the foregoing game is in PSPACE. On the other hand, by the following Theorem 5.15, every set in PSPACE can be viewed as the set of initial positions (in a suitable game) for which the first party has a winning strategy consisting of a polynomial number of moves. Actually, the correspondence is between determining the existence of such winning strategies and deciding the satisfiability of quantified Boolean formulae (QBF); see Exercise 5.30. QBF and PSPACE. A quantified Boolean formula is a Boolean formula (as in SAT) augmented with quantifiers that refer to each variable appearing in the formula. (Note that, unlike in Exercise 3.7, we make no restrictions regarding the number of alternations between existential and universal quantifiers. For further discussion, see
172
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
5.4. PSPACE AND GAMES
Appendix G.2.) As noted before, deciding the satisfiability of quantified Boolean formulae (QBF) in in PSPACE. We next show that every problem in PSPACE is Karpreducible to QBF. Theorem 5.15: QBF is complete for PSPACE under polynomialtime manytoone reductions. Proof: As noted before, QBF is solvable by a polynomialspace algorithm that just evaluates the quantified formula. Specifically, consider a recursive procedure that eliminates a Boolean quantifier by evaluating the value of the two residual formulae, and note that the space used in the first (recursive) evaluation can be reused in the second evaluation. (Alternatively, consider a DFStype procedure as in Section 5.1.4.) Note that the space used is linear in the depth of the recursion, which in turn is linear in the length of the input formula. We now turn to show that any set S ∈ PSPACE is manytoone reducible to QBF. The proof is similar to the proof of Theorem 5.12 (which establishes N L ⊆ DSPACE(log2 )), except that here we work with an implicit graph (see Exercise 5.21, rather than with an explicitly given graph). Specifically, we refer to the directed graph of instantaneous configurations (of the algorithm A deciding membership in S), where here we use a different notion of a configuration that includes also the entire input. That is, in the rest of this proof, a configuration consists of the contents of all storage devices of the algorithm (including the input device) as well as the location of the algorithm on each device. Thus, on input x (to the reduction), we shall consider the directed graph G = G x,A = (Vx , E A ), where Vx represents all possible configurations with input x and E A represents the transition function of algorithm A (i.e., the effect of a single computation step of A). As in the proof of Theorem 5.12, for a graph G, we defined φG (u, v, ) = 1 if there is a path of length at most from u to v in G (and φG (u, v, ) = 0 otherwise). We need to determine φG (s, t, 2m ) for s that encodes the initial configuration of A(x) and t that encodes the canonical accepting configuration, where G depends on the algorithm A and m = poly(x) is such that A(x) uses at most m space and runs for at most 2m steps. By the specific definition of a configuration (which contains all relevant information including the input x), the value of φG (u, v, 1) can be determined easily based solely on the fixed algorithm A (i.e., either u = v or v is a configuration following u). Recall that φG (u, v, 2) = 1 if and only if there exists a configuration w such that both φG (u, w, ) = 1 and φG (w, v, ) = 1 hold. Thus, we obtain the recursion φG (u, v, 2) = ∃w ∈ {0, 1}m φG (u, w, ) ∧ φG (w, v, ),
(5.2)
where the bottom of the recursion (i.e., φG (u, v, 1)) is a simple propositional formula (see the foregoing comment). The problem with Eq. (5.2) is that the expression for φG (·, ·, 2) involves two occurrences of φG (·, ·, ), which doubles the length of the recursively constructed formula (yielding an exponential blowup). Our aim is to express φG (·, ·, 2) while using φG (·, ·, ) only once. This extra restriction, which prevents an exponential blowup, corresponds to the reusing of space in the two evaluations of φG (·, ·, ) that take place in the computation of φG (u, v, 2). The main idea is replacing the condition φG (u, w, ) ∧ φG (w, v, ) by the condition “∀(u v ) ∈ {(u, w), (w, v)} φG (u , v , )” (where we quantify over a
173
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
twoelement set that is not the Boolean set {0, 1}). Next, we reformulate the nonstandard quantifier (which ranges over a specific pair of strings) by using additional quantifiers as well as some simple Boolean conditions. That is, the nonstandard quantifier ∀(u v ) ∈ {(u, w), (w, v)} is replaced by the standard quantifiers ∀σ ∈ {0, 1}∃u , v ∈ {0, 1}m and the auxiliary condition [(σ = 0) ⇒ (u = u ∧ v = w)] ∧ [(σ = 1) ⇒ (u = w ∧ v = v)].
(5.3)
Thus, φG (u, v, 2) holds if and only if there exist w such that for every σ there exists (u , v ) such that both Eq. (5.3) and φG (u , v , ) hold. Note that the length of this expression for φG (·, ·, 2) equals the length of φG (·, ·, ) plus an additive overhead term of O(m). Thus, using a recursive construction, the length of the formula grows only linearly in the number of recursion steps. The reduction itself maps an instance x (of S) to the quantified Boolean formula (sx , t, 2m ), where sx denotes the initial configuration of A(x), (t and m = poly(x) are as in the foregoing discussion), and is recursively defined as follows ∃w ∈ {0, 1}m ∀σ ∈ {0, 1}∃u , v ∈ {0, 1}m [(σ = 0) ⇒ (u = u ∧ v = w)] def (u, v, 2) = ∧ [(σ = 1) ⇒ (u = w ∧ v = v)] ∧ (u , v , )
(5.4)
with (u, v, 1) = 1 if and only if either u = v or there is an edge from u to v. Note that (u, v, 1) is a (fixed) propositional formula with Boolean variables representing the bits of the variables u and v such that (u, v, 1) is satisfied if and only if either u = v or v is a configuration that follows the configuration u in a computation of A. On the other hand, note that (sx , t, 2m ) is a quantified formula in which sx , t and m are fixed and the quantified variables are not shown in the notation. We stress that the mapping of x to (sx , t, 2m ) can be computed in polynomial time. Firstly, note that the propositional formula (u, v, 1), having Boolean variables representing the bits of u and v, expresses extremely simple conditions and can certainly be constructed in polynomial time (i.e., polynomial in the number of Boolean variables, which in turn equals 2m). Next note that, given (u, v, ), which (for > 1) contains quantified variables that are not shown in the notation, we can construct (u, v, 2) by merely replacing variables names and adding quantifiers and Boolean conditions as in the recursive definition of Eq. (5.4). This is certainly doable in polynomial time. Lastly, note that the construction of (sx , t, 2m ) depends mainly on the length of x, where x itself only affects sx (and does so in a trivial manner). Recalling that m = poly(x), it follows that everything is computable in time polynomial in x. Thus, given x, the formula (sx , t, 2m ) can be constructed in polynomial time. Finally, note that x ∈ S if and only if the formula (sx , t, 2m ) is satisfiable. The theorem follows. Other PSPACEcomplete problems. As stated in the beginning of this section, there is a close relationship between PSPACE and determining winning strategies in various games. This relationship was established by considering the generic game that corresponds to the satisfiability of general QBF (see Exercise 5.30). The connection between PSPACE
174
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER NOTES
and determining winning strategies in games is closer than indicated by this generic game: Determining winning strategies in several (generalizations of) natural games is also PSPACEcomplete (see [208, Sec. 8.3]). This further justifies the title of the current section.
Chapter Notes The material presented in the current chapter is based on a mix of “classical” results (proven in the 1970s if not earlier) and “modern” results (proven in the late 1980s and even later). We wish to emphasize the time gap between the formulation of some questions and their resolution. Details follow. We first mention the “classical” results. These include the N Lcompleteness of stCONN, the emulation of nondeterministic spacebounded machines by deterministic spacebounded machines (i.e., Theorem 5.12 due to Savitch [197]), the PSPACEcompleteness of QBF, and the connections between circuit depth and space complexity (see Section 5.1.4 and Exercise 5.12 due to Borodin [48]). Before turning to the “modern” results, we mention that some researchers tend to be discouraged by the impression that “decades of research have failed to answer any of the famous open problems of Complexity Theory.” In our opinion this impression is fundamentally mistaken. Specifically, in addition to the fact that substantial progress toward the understanding of many fundamental issues has been achieved, these researchers tend to forget that some famous open problems were actually resolved. Two such examples were presented in this chapter. The question of whether N L = coN L was a famous open problem for almost two decades. Furthermore, this question is related to an even older open problem dating to the early days of research in the area of formal languages (i.e., to the 1950s).25 This open problem was resolved in 1988 by Immerman [125] and Szelepcsenyi [219], who (independently) proved Theorem 5.14 (i.e., N L = coN L). For more than two decades, undirected connectivity (UCONN) was one of the most appealing examples of the computational power of randomness. Recall that the classical lineartime (deterministic) algorithms (e.g., BFS and DFS) require an extensive use of temporary storage (i.e., linear in the size of the graph). On the other hand, it was known (since 1979, see §6.1.5.2) that, with high probability, a random walk of polynomial length visits all vertices (in the corresponding connected component). Thus, the resulting randomized algorithm for UCONN uses a minimal amount of temporary storage (i.e., logarithmic in the size of the graph). In the early 1990s, this algorithm (as well as the entire class BPL (see Definition 6.11)) was derandomized in polynomial time and polylogarithmic space (see Theorem 8.23), but despite more than a decade of research attempts, a significant gap remained between the space complexity of randomized and deterministic polynomialtime algorithms for this natural and ubiquitous problem. This gap was closed by Reingold [190], who established Theorem 5.6 in 2004.26 Our presentation (in Section 5.2.4) follows Reingold’s ideas, but the specific formulation in §5.2.4.2 does not appear in [190]. 25
Specifically, the class of sets recognized by linearspace nondeterministic machines equals the class of contextsensitive languages (see, e.g., [123, Sec. 9.3]), and thus Theorem 5.14 resolves the question of whether the latter class is closed under complementation. 26 We mention that an almostlogarithmic space algorithm was discovered independently and concurrently by Trifonov [224], using a very different approach.
175
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
Exercises Exercise 5.1 (scanning the inputtape beyond the input): Let A be an arbitrary algorithm of space complexity s. Show that there exists a functionally equivalent algorithm A that has space complexity s (n) = O(s(n) + log n) and does not scan the inputtape beyond the boundaries of the input. Guideline: Prove that on input x, algorithm A does not scan the inputtape beyond distance 2 O(s(x)) from the input. (Extra hint: Consider instantaneous configurations of A(x) that refer to the case that A reads a generic location on the inputtape that is not part of the input.)
Exercise 5.2 (rewriting on the writeonly outputtape): Let A be an arbitrary algorithm of space complexity s. Show that there exists a functionally equivalent algorithm A that never rewrites on (the same location of) its output device and has space complexity s such that s (n) = s(n) + O(log (n)), where (n) = maxx∈{0,1}n A(x). Guideline: Algorithm A proceeds in iterations, where in the i th iteration it outputs the i th bit of A(x) by emulating the computation of A on input x. The i th emulation of A avoids printing A(x), but rather keeps a record of the i th location of A(x)’s outputtape (and terminates by outputting the final value of this bit). Indeed, this emulation requires maintaining the current value of i as well as the current location of the emulated machine (i.e., A) on its outputtape.
Exercise 5.3 (on the power of doublelogarithmic space): For any k ∈ N, let wk denote the concatenation of all kbit long strings (in lexicographic order) separated by ∗’s (i.e., def wk = 0k−2 00 ∗ 0k−2 01 ∗ 0k−2 10 ∗ 0k−2 11 ∗ · · · ∗ 1k ). Show that the set S = {wk : k ∈ N} ⊂ {0, 1, ∗} is not regular and yet is decidable in doublelogarithmic space. Guideline: The nonregularity of S can be shown using standard techniques. Toward developing an algorithm, note that wk  > 2k , and thus O(log k) = O(log log wk ). Membership of x in S is determined by iteratively checking whether x = wi , for i = 1, 2, . . ., while stopping when detecting an obvious case (i.e., either verifying that x = wi or detecting evidence that x = wk for every k ≥ i). By taking advantage of the ∗’s (in wi ), the i th iteration can be implemented in space O(log i). Furthermore, on input x ∈ S, we halt and reject after at most log x iterations. Actually, it is slightly simpler to handle the related set {w1 ∗ ∗w2 ∗ ∗ · · · ∗ ∗wk : k ∈ N}; moreover, in this case the ∗’s can be omitted from the wi ’s (as well as from between them).
Exercise 5.4 (on the weakness of less than double logarithmic space): Prove that for (n) = log log n, it holds that DSPACE(o()) = DSPACE(O(1)). Guideline: Let s denote the machine’s (binary) space complexity. Show that if s is unbounded then it must hold that s(n) = (log log n) infinitely often. Specifically, for every integer m, consider a shortest string x such that on input x the machine uses space at least m. Consider, for each location on the input, the sequence of the residual configurations of the machine (i.e., the contents of its temporary storage)27 such that the i th element in the sequence represents the residual configuration of the machine at the i th time that the machine crosses (or rather passes through) this input location. 27
Note that, unlike in the proof of Theorem 5.3, the machine’s location on the input is not part of the notion of a configuration used here. On the other hand, although not stated explicitly, the configuration also encodes the machine’s location on the storage tape.
176
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
For starters, note that the length of this “crossing sequence” is upperbounded by the def number of possible residual configurations, which is at most t = 2s(x) · s(x). Thus, the number of such crossing sequences is upperbounded by t t . Now, if t t < x/2 then there exist three input locations that have the same crossing sequence, and two of them hold the same bit value. Contracting the string at these two locations, we get a shorter input on which the machine behaves in exactly the same manner, contradicting the hypothesis that x is the shortest input on which the machine uses space at least m. We conclude that t t ≥ x/2 must hold, and s(x) = (log log x) holds for infinitely many x’s.
Exercise 5.5 (space complexity and halting): In continuation of Theorem 5.3, prove that for every algorithm A of (binary) space complexity s there exists an algorithm A of space complexity s (n) = O(s(n) + log n) that halts on every input such that for every x on which A halts it holds that A (x) = A(x). Guideline: On input x, algorithm A emulates the execution of A(x) for at most t(x) + 1 steps, where t(n) = n · 2s(n)+log2 s(n) .
Exercise 5.6 (some logspace algorithms): Present logspace algorithms for the following computational problems. 1. Addition and multiplication of a given pair of integers. Guideline: Relying on Lemma 5.2, first transform the input to a more convenient format, then perform the operation, and finally transformthe result to the adequate format. n−1 n−1 For example, when adding x = i=0 xi 2i and y = i=0 yi 2i , a convenient format is ((x0 , y0 ), . . . , (xn−1 , yn−1 )).
2. Deciding whether two given strings are identical. 3. Finding occurrences of a given pattern p ∈ {0, 1}∗ in a given string s ∈ {0, 1}∗ . 4. Transforming the adjacency matrix representation of a graph to its incidence list representation, and vice versa. 5. Deciding whether the input graph is acyclic (i.e., has no simple cycles). Guideline: Consider a scanning of the graph that proceeds as follows. Upon entering a vertex v via the i th edge incident at it, we exit this vertex using its i + 1st edge if v has degree at least i + 1 and exit via the first edge otherwise. Note that when started at any vertex of any tree, this scanning performs a DFS. On the other hand, for every cyclic graph there exists a vertex v and an edge e incident to v such that if this scanning is started by traversing the edge e from v then it returns to v via an edge different from e.
6. Deciding whether the input graph is a tree. Guideline: Use the fact that a graph G = (V, E) is a tree if and only if it is acyclic and E = V  − 1.
Exercise 5.7 (another composition result): In continuation of the discussion in §5.1.3.3, prove that if can be solved in space s1 when given an (, )restricted oracle access to and is solvable is space s2 , then is solvable in space s such that s(n) = 2s1 (n) + s2 ((n)) + 2 (n) + δ(n), where δ(n) = O(log((n) + (n) + s1 (n) + s2 ((n)))). In particular, if s1 , s2 and are at most logarithmic, then s(n) = O(log n), because (by Exercise 5.10) in this case is at most polynomial.
177
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
Guideline: View the oracleaided computation of as consisting of iterations such that in the i th iteration the i th query (denoted qi ) is determined based on the initial input (denoted x), the i − 1st oracle answer (denoted ai−1 ), and the contents of the worktape at the time that the i − 1st answer was given (denoted wi−1 ). Note that the mapping (x, ai−1 , wi−1 ) → (qi , wi ) can be computed using s1 (x) + δ(x) bits of temporary storage, because the oracle machine effects this mapping (when x, ai−1 and wi−1 reside on different devices). Composing each iteration with the computation of (using a variant of Lemma 5.2), we conclude that the mapping (x, ai−1 , wi−1 ) → (ai , wi ) can be computed (without storing the intermediate qi ) in space s1 (n) + s2 ((n)) + O(log((n) + s1 (n) + s2 ((n)))). Thus, we can emulate the entire computation using space s(n), where the extra space of s1 (n) + 2 (n) bits is used for storing the worktape of the oracle machine and the i − 1st and i th oracle answers.
Exercise 5.8 (nonadaptive reductions): In continuation of the discussion in §5.1.3.3, we define nonadaptive spacebounded reductions as follows. First, for any problem , we define the (“direct product”) problem such that the instances of are sequences of instances of . The sequence y = (y1 , . . . , yt ) is a valid solution (with respect to the problem ) to the instance x = (x1 , . . . , xt ) if and only if for every i ∈ [t] it holds that yi is a valid solution to xi (with respect to the problem ). Now, a nonadaptive reduction of to is defined as a singlequery reduction of to . 1. Note that this definition allows the oracle machine to freely scan the sequence of answers (i.e., it can move freely between the blocks that correspond to different answers). Still, prove that this does not add much power to the machine (in comparison to a machine that reads the oracleanswer device in a “semiunidirectional” manner (i.e., it never reads bits of some answer after reading any bit of any later answer)). That is, prove that a general nonadaptive reduction of space complexity s can be emulated by a nonadaptive reduction of space complexity O(s) that when obtaining the oracle answer (y1 , . . . , yt ) may read bits of yi only before reading any bit of yi+1 , . . . , yt . Guideline: Replace the query sequence x = (x1 , . . . , xt ) by the query sequence (x, x, . . . , x) where the number of repetitions is 2 O(s) .
2. Prove that if is reducible to via a nonadaptive reduction of space complexity s1 that makes queries of length at most and is solvable in space s2 , then is solvable in space s such that s(n) = O(s1 (n) + s2 ((n))). As a warmup, consider first the case of a general singlequery reduction (of to ). Guideline: The composed computation, on input x, can be written as E(x, A(G(x))), where G represents the query generation phase, A represents the application of the solver to each string in the sequence of queries, and E represents the evaluation phase. Analyze the space complexity of this computation by using (variants of) Lemma 5.2.
Exercise 5.9: Referring to the discussion in §5.1.3.3, prove that, for any s, any problem having space complexity s can be solved by a constantspace (2s, 2s)restricted reduction to a problem that is solvable in constant space. Guideline: The reduction is to the “next configuration function” associated with the said algorithm (of space complexity s), where here the configuration contains also the single bit of the input that the machine currently examines (i.e., the value of the bit at the machine’s location on the input device). To facilitate the computation of
178
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
this function, choose a suitable representation of such configurations. Note that the bulk of the operation of the oracle machine consists of iteratively copying (with minor modification) the contents of the oracleanswer tape to the oraclequery tape.
Exercise 5.10: In continuation of §5.1.3.3, we say that a reduction is (·, )restricted if there exists some function such that the reduction is (, )restricted; that is, in this definition only the length of the oracle answers is restricted. Prove that any reduction of space complexity s that is (·, )restricted is (, )restricted for (n) = 2 O(s(n)+ (n)+log n) . Actually, prove that this reduction has time complexity . Guideline: Consider an adequate notion of instantaneous configuration; specifically, such a configuration consists of the contents of both the worktape and the oracleanswer tape as well as the machine’s location on these tapes (and on the input tape).
Exercise 5.11 (transitivity of logspace reductions): Prove that logspace Karpreductions are transitive. Define logspace Levinreductions and prove that they are transitive. Guideline: Use Lemma 5.2, noting that such reductions are merely logspace computable functions.
Exercise 5.12 (logspace uniform N C 1 is in L): Suppose that a problem is solvable by a family of logspace uniform circuits of bounded fanin and depth d such that d(n) ≥ log n. Prove that is solvable by an algorithm having space complexity O(d). Guideline: Combine the algorithm outlined in Section 5.1.4 with the definition of logspace uniformity (using Lemma 5.2).
Exercise 5.13 (UCONN in constant degree graphs of logarithmic diameter): Present a logspace algorithm for deciding the following promise problem, which is parameterized by constants c and d. The input graph satisfies the promise if each vertex has degree at most d and every pair of vertices that reside in the same connected component is connected by a path of length at most c log2 n, where n denotes the number of vertices in the input graph. The task is to decide whether the input graph is connected. Guideline: For every pair of vertices in the graph, we check whether these vertices are connected in the graph. (Alternatively, we may just check whether each vertex is connected to the first vertex.) Relying on the promise, it suffices to inspect all paths of def length at most = c log2 n, and these paths can be enumerated using · &log2 d' bits of storage.
Exercise 5.14 (warmup toward §5.2.4.2): In continuation of §5.2.4.1, present a logspace transformation of G i to G i+1 . Guideline: Given the graph G i as input, we may construct G i+1 by first constructing G = G ic and then constructing G z G. To construct G , we scan all vertices of G i (holding the current vertex in temporary storage), and, for each such vertex, construct its “distance c neighborhood” in G (by using O(c) space for enumerating all possible “distance c neighbors”). Similarly, we can construct the vertex neighborhoods in G z G (by storing the current vertex name and using a constant amount of space for indicating incident edges in G).
179
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
Exercise 5.15 (stUCONN): In continuation of Section 5.2.4, prove that the following computational problem is in L: Given an undirected graph G = (V, E) and two designated vertices, s and t, determine whether there is a path from s to t in G. Guideline: Note that the transformation described in Section 5.2.4 can be easily extended such that it maps vertices in G 0 to vertices in G O(log V ) while preserving the connectivity relation (i.e., u and v are connected in G 0 if and only if their images under the map are connected in G O(log V ) ).
Exercise 5.16 (bipartiteness): Prove that the problem of determining whether or not the input graph is bipartite (i.e., 2colorable) is computationally equivalent under logspace reductions to stUCONN (as defined in Exercise 5.15). Guideline: Both reductions use the mapping of a graph G = (V, E) to a bipartite graph G = (V , E ) such that V = {v (1) , v (2) : v ∈ V } and E = {{u (1) , v (2) }, {u (2) , v (1) } : {u, v} ∈ E}. When reducing to stUCONN note that a vertex v resides on an odd cycle in G if and only if v (1) and v (2) are connected in G . When reducing from stUCONN note that s and t are connected in G by a path of even (resp., odd) length if and only if the graph G ceases to be bipartite when augmented with the edge {s (1) , t (1) } (resp., with the edges {s (1) , x} and {x, t (2) }, where x ∈ V is an auxiliary vertex).
Exercise 5.17 (finding paths in undirected graphs): In continuation of Exercise 5.15, present a logspace algorithm that given an undirected graph G = (V, E) and two designated vertices, s and t, finds a path from s to t in G (in case such a path exists). Guideline: In continuation of Exercise 5.15, we may find and (implicitly) store a logarithmically long path in G O(log V ) that connects a representative of s and a representative of t. Focusing on the task of finding a path in G 0 that corresponds to an edge in G O(log V ) , we note that such a path can be found by using the reduction underlying the combination of Claim 5.9 and Lemma 5.10. (An alternative description appears in [190].)
Exercise 5.18 (relating the two models of NSPACE): Referring to the definitions in Section 5.3.1, prove that for every function s such that log s is spaceconstructible and at least logarithmic, it holds that NSPACEonline (s) = NSPACEoffline ((log s)). Note that NSPACEonline (s) ⊆ NSPACEoffline (O(log s)) holds also for s that is at least logarithmic. Guideline (for NSPACEonline (s) ⊆ NSPACEoffline (O(log s))): Use the nondeterministic input of the offline machine for encoding an accepting computation of the online machine; that is, this input should contain a sequence of consecutive configurations leading from the initial configuration to an accepting configuration, where each configuration contains the contents of the worktape as well as the machine’s state and its locations on the work tape and on the inputtape. The emulating offline machine (which verifies the correctness of the sequence of configurations recorded on its nondeterministic input tape) needs only store its location within the current pair of consecutive configurations that it examines, which requires space logarithmic in the length of a single configuration (which in turn equals s(n) + log2 s(n) + log2 n + O(1)). (Note that this verification relies on a twodirectional access to the nondeterministic input.) Guideline (for NSPACEoffline (s ) ⊆ NSPACEonline (exp(s ))): Here we refer to the notion of a crossing sequence. Specifically, for each location on the offline nondeterministic input, consider the sequence of the residual configurations of the machine, where such a residual configuration consists of the bit residing in this
180
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
nondeterministic tape location, the contents of the machine’s temporary storage, and the machine’s locations on the input and storage tapes (but not its location on the nondeterministic tape). Show that the length of such a crossing sequence is exponential in the space complexity of the offline machine, and that the time complexity of the offline machine is at most doubleexponential in its space complexity (see Exercise 5.4). The online machine merely generates a sequence of crossing sequences (“on the fly”) and checks that each consecutive pair of crossing sequences is consistent. This requires holding two crossing sequences in storage, which require space linear in the length of such sequences (which, in turn, is exponential in the space complexity of the offline machine).
Exercise 5.19 (stCONN and variants of it are in NL): Prove that the following computational problem is in N L. The instances have the form (G, v, w, ), where G = (V, E) is a directed graph, v, w ∈ V , and is an integer, and the question is whether G contains a path of length at most from v to w. Guideline: Consider a nondeterministic machine that generates and verifiers an adequate path on the fly. That is, starting at v0 = v, the machine proceeds in iterations, such that in the i th iteration it nondeterministically generates vi , verifies that (vi−1 , vi ) ∈ E, and checks whether i ≤ and vi = w. Note that this machine need only store the last two vertices on the path (i.e., vi−1 and vi ) as well as the number of edges traversed so far (i.e., i). (Actually, using a careful implementation, it suffices to store only one of these two vertices (as well as the current i).)
Exercise 5.20 (finding directed paths in directed graphs): Present a logspace oracle machine that finds (shortest) directed paths in directed graphs by using an oracle to N L. Conclude that N L = L if and only if such paths can be found by a (standard) logspace algorithm. Guideline: Use a reduction to the decision problem presented in Exercise 5.19, and derive a standard algorithm by using the composition result of Exercise 5.7.
Exercise 5.21 (NSPACE and directed connectivity): Our aim is to establish a relation between general nondeterministic spacebounded computation and directed connectivity in “strongly constructible” graphs that have size exponential in the space bound. Let s be spaceconstructible and at least logarithmic. For every S ∈ NSPACE(s), present a lineartime oracle machine (somewhat as in §5.2.4.2) that given oracle access to x provides oracle access to a directed graph G x of size exp(s(x)) such that x ∈ S if and only if there is a directed path between the first and last vertices of G x . That is, on input a pair (u, v) and oracle access to x, the oracle machine decides whether or not (u, v) is a directed edge in G x . Guideline: Follow the proof of Theorem 5.11.
Exercise 5.22 (an alternative presentation of the proof of Theorem 5.12): We refer to directed graphs in which each vertex has a selfloop. 1. Viewing the adjacency matrices of directed graphs as oracles (cf. Exercise 5.21), present a linearspace oracle machine that determines whether a given pair of vertices is connected by a directed path of length two in the input graph. Note that this oracle machine computes the adjacency relation of the square of the graph represented in the oracle.
181
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
SPACE COMPLEXITY
2. Using naive composition (as in Lemma 5.1), present a quadraticspace oracle machine that determines whether a given pair of vertices is connected by a directed path in the graph represented in the oracle. Note that the machine in item 2 implies that stCONN can be decided in logsquare space. In particular, justify the selfloop assumption made up front. Exercise 5.23 (deciding strong connectivity): A directed graph is called strongly connected if there exists a directed path between every ordered pair of vertices in the graph (or, equivalently, a directed cycle passing through every two vertices). Prove that the problem of deciding whether a directed graph is strongly connected is N Lcomplete under (manytoone) logspace reductions. Guideline (for N Lhardness): Reduce from stCONN. Note that, for any graph G = (V, E), it holds that (G, s, t) is a yesinstance of stCONN if and only if the graph G = (V, E ∪ {(v, s) : v ∈ V } ∪ {(t, v) : v ∈ V }) is strongly connected.
Exercise 5.24 (determining distances in undirected graphs): Prove that the following computational problem is N Lcomplete under (manytoone) logspace reductions: Given an undirected graph G = (V, E), two designated vertices, s and t, and an integer K , determine whether there is a path of length at most (resp., exactly) K from s to t in G. Guideline (for N Lhardness): Reduce from stCONN. Specifically, given a directed graph G = (V, E) and vertices s, t, consider a (“layered”) graph G = (V , E ) V −1 V −2 such that V = ∪i=0 {i, v : v ∈ V } and E = ∪i=0 {{i, u, i + 1, v} : (u, v) ∈ E ∨ u = v}. Note that there exists a directed path from s to t in G if and only if there exists a path of length at most (resp., exactly) V  − 1 between 0, s and V  − 1, t in G . Guideline (for the exact version being in N L): Use N L = coN L.
Exercise 5.25 (an operational interpretation of N L ∩ coN L, N P ∩ coN P, etc.): Referring to Definition 5.13, prove that S ∈ N L ∩ coN L if and only if there exists a nondeterministic logspace machine that computes χ S , where χ S (x) = 1 if x ∈ S and χ S (x) = 0 otherwise. State and prove an analogous result for N P ∩ coN P. Guideline: A nondeterministic machine computing any function f yields, for each value v, a nondeterministic machine of similar complexity that accept {x : f (x) = v}. (Extra hint: Invoke the machine M that computes f and accept if and only if M outputs v.)
On the other hand, for any function f of finite range, combining nondeterministic def machines that accept the various sets Sv = {x : f (x) = v}, we obtain a nondeterministic machine of similar complexity that computes f . (Extra hint: On input x, the combined machine invokes each of the aforementioned machines on input x and outputs the value v if and only if the machine accepting Sv has accepted. In the case that none of the machines accepts, the combined machine outputs ⊥.)
Exercise 5.26 (a graph algorithmic interpretation of N L = coN L): Show that there exists a logspace computable function f such that for every (G, s, t) it holds that (G, s, t) is a yesinstance of stCONN if and only if (G , s , t ) = f (G, s, t) is a noinstance of stCONN.
182
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Exercise 5.27: Referring to Definition 5.13, prove that there exists a nondeterministic logspace machine that computes the distance between two given vertices in a given undirected graph. Guideline: Relate this computational problem to the (exact version of the) decision problem considered in Exercise 5.24.
Exercise 5.28: As an alternative to the twoquery reduction presented in the proof of Theorem 5.14, show that (computing the characteristic function of) stCONN is logspace reducible via a single query to the problem of determining the number of vertices that are reachable from a given vertex in a given graph. (Hint: On input (G, s, t), where G = ([N ], E), consider the number of vertices reachable from s in the graph G = ([2N ], E ∪ {(t, N + i) : i = 1, . . . , N }).)
Exercise 5.29 (reductions and nondeterministic computations): Suppose that computing f is logspace reducible to computing some function g and that it is either the case that the reduction is nonadaptive or that for every x it holds that g(x) = O(log x). Referring to nondeterministic computations as in Definition 5.13, prove that if there exists a nondeterministic logspace machine that computes g then there exists a nondeterministic logspace machine that computes f . Guideline: The point is adapting a composition result that refers to deterministic algorithms (for computing g) into one that applies to nondeterministic computations. Specifically, in the first case we adapt the result of Exercise 5.8, whereas in the second case we adapt the result Exercise 5.7. The idea is running the same procedure as in the deterministic case, and handling the possible failure of the nondeterministic machine that computes g in the natural manner; that is, if any such computation returns the value ⊥ then we just halt outputting ⊥, and otherwise we proceed as in the deterministic case (using the non⊥ values obtained).
Exercise 5.30 (the QBF game): Consider the following twoparty game that is initiated with a quantified Boolean formula. The game features an existential player (which tries to prove that the formula is valid) versus a universal player (which tries to invalidate it). The game consists of the parties scanning the formula from left to right such that when a quantifier is encountered, the corresponding party takes a move that consists of instantiating the corresponding Boolean variable. At the final position, when all variables were instantiated, the existential party is declared the winner if and only if the corresponding Boolean expression evaluates to true. 1. Show that, modulo some technical conventions, the foregoing QBF game fits the framework of efficient twoparty games (described at the beginning of Section 5.4). 2. Prove that any efficient twoparty game can be cast as a QBF game. Guideline: For part 1 define the universal player as winning in any nonfinal position (i.e., a position in which not all variables are instantiated). For part 2, see footnote 6 in Chapter 3.
183
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER SIX
Randomness and Counting
I owe this almost atrocious variety to an institution which other republics do not know or which operates in them in an imperfect and secret manner: the lottery. Jorge Luis Borges, “The Lottery in Babylon” So far, our approach to computing devices was somewhat conservative: We thought of them as executing a deterministic rule. A more liberal and quite realistic approach, which is pursued in this chapter, considers computing devices that use a probabilistic rule. This relaxation has an immediate impact on the notion of efficient computation, which is consequently associated with probabilistic polynomialtime computations rather than with deterministic (polynomialtime) ones. We stress that the association of efficient computation with probabilistic polynomialtime computation makes sense provided that the failure probability of the latter is negligible (which means that it may be safely ignored). The quantitative nature of the failure probability of probabilistic algorithms provides one connection between probabilistic algorithms and counting problems. The latter are indeed a new type of computational problems, and our focus is on counting efficiently recognizable objects (e.g., NPwitnesses for a given instance of set in N P). Randomized procedures turn out to play an important role in the study of such counting problems. Summary: Focusing on probabilistic polynomialtime algorithms, we consider various types of probabilistic failure of such algorithms (e.g., actual error versus failure to produce output). This leads to the formulation of complexity classes such as BPP, RP, and ZPP. The results presented include the existence of (nonuniform) families of polynomialsize circuits that emulate probabilistic polynomialtime algorithms (i.e., BPP ⊂ P/poly) and the fact that BPP resides in the (second level of the) Polynomialtime Hierarchy (i.e., BPP ⊆ 2 ). We then turn to counting problems: specifically, counting the number of solutions for an instance of a search problem in PC (or, equivalently, counting the number of NPwitnesses for an instance of a decision problem in N P). We distinguish between exact counting and approximate counting (in the sense of relative approximation). In particular, while any problem in PH is reducible to the exact counting class #P, approximate counting (for #P) is (probabilistically) reducible to N P.
184
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.1. PROBABILISTIC POLYNOMIAL TIME
In general, counting problems exhibit a “richer structure” than the corresponding search (and decision) problems, even when considering only natural problems. For example, some counting problems are hard in the exact version (e.g., are #Pcomplete) but easy to approximate, while others are NPhard to approximate. In some cases #Pcompleteness is due to the very same reduction that establishes the N Pcompleteness of the corresponding decision problem, whereas in other cases new reductions are required (often because the corresponding decision problem is not N Pcomplete but is rather in P). We also consider two other types of computational problems that are related to approximate counting. The first type refers to promise problems, called unique solution problems, in which the solver is guaranteed that the instance has at most one solution. Many NPcomplete problems are randomly reducible to the corresponding unique solution problems. Lastly, we consider the problem of generating almost uniformly distributed solutions, and show that in many cases this problem is computationally equivalent to approximately counting the number of solutions. Prerequisites. We assume basic familiarity with elementary probability theory (see Appendix D.1). In Section 6.2 we will rely extensively on formulations presented in def Section 2.1 (i.e., the “NP search problem” class PC as well as the sets R(x) = {y : def (x, y) ∈ R}, and S R = {x : R(x) = ∅} defined for every R ∈ PC). In Sections 6.2.2–6.2.4 we shall extensively use various hashing functions and their properties, as presented in Appendix D.2.
6.1. Probabilistic Polynomial Time Considering algorithms that utilize random choices, we extend our notion of efficient algorithms from deterministic polynomialtime algorithms to probabilistic polynomialtime algorithms. Two conflicting questions that arise are whether it is reasonable to allow randomized computational steps and whether adding such steps buys us anything. We first note that random events are an important part of our modeling of the world. We stress that this does not necessarily mean that we assert that the world per se includes genuine random choices, but rather that it is beneficial to model the world as including random choices (i.e., some phenomena appear to us as if they are random in some sense). Furthermore, it seems feasible to generate randomlooking events (e.g., the outcome of a toss coin).1 Thus, postulating that seemingly random choices can be generated by a computer is quite natural (and is in fact common practice). At the very least, this postulate yields an intuitive model of computation and the study of such a model is of natural concern. This leads to the question of whether augmenting the computational model with the ability to make random choices buys us anything. Although randomization is known to 1
Different perspectives on the question of the feasibility of randomized computation are offered in Chapter 8 and Appendix D.4. The pivot of Chapter 8 is the distinction between being actually random and looking random (to computationally restricted observers). In contrast, Appendix D.4 refers to various notions of randomness and to the feasibility of transforming weak forms of randomness into almost perfect forms.
185
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
be essential in several computational settings (e.g., cryptography (cf. Appendix C) and sampling (cf. Appendix D.3)), the question is whether randomization is useful in the context of solving decision (and search) problems. This is indeed a very good question, which is further discussed in §6.1.2.1. In fact, one of the main goals of the current section is putting this question forward. To demonstrate the potential benefit of randomized algorithms, we provide a few examples (cf. §6.1.2.2, §6.1.3.1, and §6.1.5.2).
6.1.1. Basic Modeling Issues Rigorous models of probabilistic (or randomized) algorithms are defined by natural extensions of the basic machine model. We will exemplify this approach by describing the model of probabilistic Turing machines, but we stress that (again) the specific choice of the model is immaterial (as long as it is “reasonable”). A probabilistic Turing machine is defined exactly as a nondeterministic machine (see the first item of Definition 2.7), but the definition of its computation is fundamentally different. Specifically, whereas Definition 2.7 refers to the question of whether or not there exists a computation of the machine that (started on a specific input) reaches a certain configuration, in the case of probabilistic Turing machines we refer to the probability that this event occurs, when at each step a choice is selected uniformly among the relevant possible choices available at this step. That is, if the transition function of the machine maps the current statesymbol pair to several possible triples, then in the corresponding probabilistic computation one of these triples is selected at random (with equal probability) and the next configuration is determined accordingly. These random choices may be viewed as the internal coin tosses of the machine. (Indeed, as in the case of nondeterministic machines, we may assume without loss of generality that the transition function of the machine maps each statesymbol pair to exactly two possible triples; see Exercise 2.4.) We stress the fundamental difference between the fictitious model of a nondeterministic machine and the realistic model of a probabilistic machine. In the case of a nondeterministic machine we consider the existence of an adequate sequence of choices (leading to a desired outcome), and ignore the question of how these choices are actually made. In fact, the selection of such a sequence of choices is merely a mental experiment. In contrast, in the case of a probabilistic machine, at each step a real random choice is actually made (uniformly among a set of predetermined possibilities), and we consider the probability of reaching a desired outcome. In view of the foregoing, we consider the output distribution of such a probabilistic machine on fixed inputs; that is, for a probabilistic machine M and string x ∈ {0, 1}∗ , we denote by M(x) the output distribution of M when invoked on input x, where the probability is taken uniformly over the machine’s internal coin tosses. Needless to say, we will consider the probability that M(x) is a “correct” answer; that is, in the case of a search problem (resp., decision problem) we will be interested in the probability that M(x) is a valid solution for the instance x (resp., represents the correct decision regarding x). The foregoing description views the internal coin tosses of the machine as taking place on the fly; that is, these coin tosses are performed online by the machine itself. An alternative model is one in which the sequence of coin tosses is provided by an external device, on a special “random input” tape. In such a case, we view these coin tosses as performed offline. Specifically, we denote by M (x, r ) the (uniquely defined) output of the residual deterministic machine M , when given the (primary) input x and random input r . Indeed, M is a deterministic machine that takes two inputs (the first representing
186
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.1. PROBABILISTIC POLYNOMIAL TIME
the actual input and the second representing the “random input”), but we consider the def random variable M(x) = M (x, U(x) ), where (x) denotes the number of coin tosses “expected” by M (x, ·). These two perspectives on probabilistic algorithms are closely related: Clearly, the aforementioned residual deterministic machine M yields a randomized machine M that on input x selects at random a string r of adequate length, and invokes M (x, r ). On the other hand, the computation of any randomized machine M is captured by the residual machine M that emulates the actions of M(x) based on an auxiliary input r (obtained by M and representing a possible outcome of the internal coin tosses of M). (Indeed, there is no harm in supplying more coin tosses than are actually used by M, and so the length of the aforementioned auxiliary input may be set to equal the time complexity of M.) For sake of clarity and future reference, we summarize the foregoing discussion in the following definition. Definition 6.1 (online and offline formulations of probabilistic polynomial time): • We say that M is an online probabilistic polynomialtime machine if there exists a polynomial p such that when invoked on any input x ∈ {0, 1}∗ , machine M always halts within at most p(x) steps (regardless of the outcome of its internal coin tosses). In such a case M(x) is a random variable. • We say that M is an offline probabilistic polynomialtime machine if there exists a polynomial p such that, for every x ∈ {0, 1}∗ and r ∈ {0, 1} p(x) , when invoked on the primary input x and the randominput sequence r , machine M halts within at most p(x) steps. In such a case, we will consider the random variable M (x, U p(x) ), where Um denotes a random variable uniformly distributed over {0, 1}m . Clearly, in the context of time complexity, the online and offline formulations are equivalent (i.e., given an online probabilistic polynomialtime machine we can derive a functionally equivalent offline (probabilistic polynomialtime) machine, and vice versa). Thus, in the sequel, we will freely use whichever is more convenient. We stress that the output of a randomized algorithm is no longer a function of its input, but is rather a random variable that depends on the input. Thus, the formulations of solving search and decision problems (see Definitions 1.1 and 1.2, respectively) will be extended to account for the new situation. One major aspect of this extension is that the output may assume values that are not necessarily correct. Needless to say, we would like the output to be correct with very high probability (but not necessarily with probability 1). Failure probability. Indeed, a major aspect of randomized algorithms (probabilistic machines) is that they may fail (see Exercise 6.1). That is, with some specified (“failure”) probability, these algorithms may fail to produce the desired output. We discuss two aspects of this failure: its type and its magnitude. 1. The type of failure is a qualitative notion. One aspect of this type is whether, in case of failure, the algorithm produces a wrong answer or merely an indication that it failed to find a correct answer. Another aspect is whether failure may occur on all instances or merely on certain types of instances. Let us clarify these aspects by considering three natural types of failure, giving rise to three different types of algorithms.
187
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
(a) The most liberal notion of failure is the one of twosided error. This term originates from the setting of decision problems, where it means that (in case of failure) the algorithm may err in both directions (i.e., it may rule that a yesinstance is a noinstance, and vice versa). In the case of search problems twosided error means that, when failing, the algorithm may output a wrong answer on any input. That is, the algorithm may falsely rule that the input has no solution and it may also output a wrong solution (both in case the input has a solution and in case it has no solution). (b) An intermediate notion of failure is the one of onesided error. Again, the term originates from the setting of decision problems, where it means that the algorithm may err only in one direction (i.e., either on yesinstances or on noinstances). Indeed, there are two natural cases depending on whether the algorithm errs on yesinstances but not on noinstances, or the other way around. Analogous cases occur also in the setting of search problems. In one case the algorithm never outputs a wrong solution but may falsely rule that the input has no solution. In the other case the indication that an input has no solution is never wrong, but the algorithm may output a wrong solution. (c) The most conservative notion of failure is the one of zerosided error. In this case, the algorithm’s failure amounts to indicating its failure to find an answer (by outputting a special don’t know symbol). We stress that in this case the algorithm never provides a wrong answer. Indeed, the foregoing discussion ignores the probability of failure, which is the subject of the next item. 2. The magnitude of failure is a quantitative notion. It refers to the probability that the algorithm fails, where the type of failure is fixed (e.g., as in the foregoing discussion). When actually using a randomized algorithm we typically wish its failure probability to be negligible, which intuitively means that the failure event is so rare that it can be ignored in practice. Formally, we say that a quantity is negligible if, as a function of the relevant parameter (e.g., the input length), this quantity vanishes faster than the reciprocal of any positive polynomial. For ease of presentation, we sometimes consider alternative upper bounds on the probability of failure. These bounds are selected in a way that allows (and in fact facilitates) “error reduction” (i.e., converting a probabilistic polynomialtime algorithm that satisfies such an upper bound into one in which the failure probability is negligible). For example, in the case of twosided error we need to be able to distinguish the correct answer from wrong answers by sampling, and in the other types of failure “hitting” a correct answer suffices. In the following three sections (i.e., Sections 6.1.2–6.1.4), we will discuss complexity classes corresponding to the aforementioned three types of failure. For the sake of simplicity, the failure probability itself will be set to a constant that allows error reduction. Randomized reductions. Before turning to the more detailed discussion, we mention that randomized reductions play an important role in Complexity Theory. Such reductions can be defined analogously to the standard Cookreductions (resp., Karpreductions), and again a discussion of the type and magnitude of the failure probability is in place. For clarity, we spell out the twosided error versions.
188
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.1. PROBABILISTIC POLYNOMIAL TIME
• In analogy to Definition 2.9, we say that a problem is probabilistic polynomialtime reducible to a problem if there exists a probabilistic polynomialtime oracle machine M such that, for every function f that solves and for every x, with probability at least 1 − µ(x), the output M f (x) is a correct solution to the instance x, where µ is a negligible function. • In analogy to Definition 2.11, we say that a decision problem S is reducible to a decision problem S via a randomized Karpreduction if there exists a probabilistic polynomialtime algorithm A such that, for every x, it holds that Pr[χ S (A(x)) = χ S (x)] ≥ 1 − µ(x), where χ S (resp., χ S ) is the characteristic function of S (resp., S ) and µ is a negligible function. These reductions preserve efficient solvability and are transitive: See Exercise 6.2.
6.1.2. TwoSided Error: The Complexity Class BPP In this section we consider the most liberal notion of probabilistic polynomialtime algorithms that is still meaningful. We allow the algorithm to err on each input, but require the error probability to be negligible. The latter requirement guarantees the usefulness of such algorithms, because in reality we may ignore the negligible error probability. Before focusing on the decision problem setting, let us say a few words on the search problem setting (see §1.2.2.1). Following the previous paragraph, we say that a probabilistic (polynomialtime) algorithm A solves the search problem of the relation R if for every def x ∈ S R (i.e., R(x) = {y : (x, y) ∈ R} = ∅) it holds that Pr[A(x) ∈ R(x)] > 1 − µ(x) and for every x ∈ S R it holds that Pr[A(x) = ⊥] > 1 − µ(x), where µ is a negligible function. Note that we did not require that, when invoked on input x that has a solution (i.e., R(x) = ∅), the algorithm always outputs the same solution. Indeed, a stronger requirement is that for every such x there exists y ∈ R(x) such that Pr[A(x) = y] > 1 − µ(x). The latter version and quantitative relaxations of it allow for error reduction (see Exercise 6.3). Turning to decision problems, we consider probabilistic polynomialtime algorithms that err with negligible probability. That is, we say that a probabilistic (polynomialtime) algorithm A decides membership in S if for every x it holds that Pr[A(x) = χ S (x)] > 1 − µ(x), where χ S is the characteristic function of S (i.e., χ S (x) = 1 if x ∈ S and χ S (x) = 0 otherwise) and µ is a negligible function. The class of decision problems that are solvable by probabilistic polynomialtime algorithms is denoted BPP, standing for Boundederror Probabilistic Polynomial time. Actually, the standard definition refers to machines that err with probability at most 1/3. Definition 6.2 (the class BPP): A decision problem S is in BPP if there exists a probabilistic polynomialtime algorithm A such that for every x ∈ S it holds that Pr[A(x) = 1] ≥ 2/3 and for every x ∈ S it holds that Pr[A(x) = 0] ≥ 2/3. The choice of the constant 2/3 is immaterial, and any other constant greater than 1/2 will do (and yields the very same class). Similarly, the complementary constant 1/3 can be replaced by various negligible functions (while preserving the class). Both facts are special cases of the robustness of the class, discussed next, which is established using the process of error reduction.
189
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
Error reduction (or confidence amplification). For ε : N→ (0, 0.5), let BPP ε denote the class of decision problems that can be solved in probabilistic polynomial time with error probability upperbounded by ε; that is, S ∈ BPP ε if there exists a probabilistic polynomialtime algorithm A such that for every x it holds that Pr[A(x) = χ S (x)] ≤ ε(x). By definition, BPP = BPP 1/3 . However, a wide range of other classes also equal BPP. In particular, we mention two extreme cases: 1. For every positive polynomial p and ε(n) = (1/2) − (1/ p(n)), the class BPP ε equals BPP. That is, any error that is (“noticeably”) bounded away from 1/2 (i.e., error (1/2) − (1/poly(n))) can be reduced to an error of 1/3. 2. For every positive polynomial p and ε(n) = 2− p(n) , the class BPP ε equals BPP. That is, an error of 1/3 can be further reduced to an exponentially vanishing error. Both facts are proved by invoking the weaker algorithm (i.e., the one having a larger error probability bound) for an adequate number of times, and ruling by majority. We stress that invoking a randomized machine several times means that the random choices made in the various invocations are independent of one another. The success probability of such a process is analyzed by applying an adequate Law of Large Numbers (see Exercise 6.4).
6.1.2.1. On the Power of Randomization Let us turn back to the natural question raised at the beginning of Section 6.1; that is, was anything gained by extending the definition of efficient computation to include also probabilistic polynomialtime ones. This phrasing seems too generic. We certainly gained the ability to toss coins (and generate various distributions). More concretely, randomized algorithms are essential in many settings (see, e.g., Chapter 9, Section 10.1.2, Appendix C, and Appendix D.3) and seem essential in others (see, e.g., Sections 6.2.2–6.2.4). What we mean to ask here is whether allowing randomization increases the power of polynomialtime algorithms also in the restricted context of solving decision and search problems. The question is whether BPP extends beyond P (where clearly P ⊆ BPP). It is commonly conjectured that the answer is negative. Specifically, under some reasonable assumptions, it holds that BPP = P (see Part 1 of Theorem 8.19). We note, however, that a polynomial slow down occurs in the proof of the latter result; that is, randomized algorithms that run in time t(·) are emulated by deterministic algorithms that run in time poly(t(·)). This slow down seems inherent to the aforementioned approach (see §8.3.3.2). Furthermore, for some concrete problems (most notably primality testing (cf. §6.1.2.2)), the known probabilistic polynomialtime algorithm is significantly faster (and conceptually simpler) than the known deterministic polynomialtime algorithm. Thus, we believe that even in the context of decision problems, the notion of probabilistic polynomialtime algorithms is advantageous. We note that the fundamental nature of BPP will remain intact even in the (rather unlikely) case that it turns out that randomization offers no computational advantage (i.e., even if every problem that can be decided in probabilistic polynomial time can be decided by a deterministic algorithm of essentially the same complexity). Such a result would address a fundamental question regarding the power of randomness.2 We now turn 2
By analogy, establishing that IP = PSPACE (cf. Theorem 9.4) does not diminish the importance of any of these classes, because each class models something fundamentally different.
190
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.1. PROBABILISTIC POLYNOMIAL TIME
from the foregoing philosophical (and partially hypothetical) discussion to a concrete discussion of what is known about BPP. BPP is in the Polynomialtime Hierarchy. While it may be that BPP = P, it is not known whether or not BPP is contained in N P. The source of trouble is the twosided error probability of BPP, which is incompatible with the absolute rejection of noinstances required in the definition of N P (see Exercise 6.8). In view of this ignorance, it is interesting to note that BPP resides in the second level of the Polynomialtime Hierarchy (i.e., BPP ⊆ 2 ). This is a corollary of Theorem 6.9. Trivial derandomization. A straightforward way of eliminating randomness from an algorithm is trying all possible outcomes of its internal coin tosses, collecting the relevant statistics, and deciding accordingly. This yields BPP ⊆ PSPACE ⊆ EX P, which is considered the trivial derandomization of BPP. In Section 8.3 we will consider various nontrivial derandomizations of BPP, which are known under various intractability assumptions. The interested reader, who may be puzzled by the connection between derandomization and computational difficulty, is referred to Chapter 8. Nonuniform derandomization. In many settings (and specifically in the context of solving search and decision problems), the power of randomization is superseded by the power of nonuniform advice. Intuitively, the nonuniform advice may specify a sequence of coin tosses that is good for all (primary) inputs of a specific length. In the context of solving search and decision problems, such an advice must be good for each of these inputs,3 and thus its existence is guaranteed only if the error probability is low enough (so as to support a union bound). The latter condition can be guaranteed by error reduction, and thus we get the following result. Theorem 6.3: BPP is (strictly) contained in P/poly. Proof: Recall that P/poly contains undecidable problems (Theorem 3.7), which are certainly not in BPP. Thus, we focus on showing that BPP ⊆ P/poly. By the discussion regarding error reduction, for every S ∈ BPP there exists a (deterministic) polynomialtime algorithm A and a polynomial p such that for every x it holds that Pr[A(x, U p(x) ) = χ S (x)] < 2−x . Using a union bound, it follows that Prr ∈{0,1} p(n) [∃x ∈ {0, 1}n s.t. A(x, r ) = χ S (x)] < 1. Thus, for every n ∈ N, there exists a string rn ∈ {0, 1} p(n) such that for every x ∈ {0, 1}n it holds that A(x, rn ) = χ S (x). Using such a sequence of rn ’s as advice, we obtain the desired nonuniform machine (establishing S ∈ P/poly). Digest. The proof of Theorem 6.3 combines error reduction with a simple application of the Probabilistic Method (cf. [11]), where the latter refers to proving the existence of an object by analyzing the probability that a random object is adequate. In this case, we sought a nonuniform advice, and proved it existence by analyzing the probability that a random advice is good. The latter event was analyzed by identifying the space of possible advice with the set of possible sequences of internal coin tosses of a randomized algorithm. 3
In other contexts (see, e.g., Chapters 7 and 8), it suffices to have an advice that is good on the average, where the average is taken over all relevant (primary) inputs.
191
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
6.1.2.2. A Probabilistic PolynomialTime Primality Test Teaching note: Although primality has been recently shown to be in P, we believe that the following example provides a nice illustration to the power of randomized algorithms.
We present a simple probabilistic polynomialtime algorithm for deciding whether or not a given number is a prime. The only numbertheoretic facts that we use are Fact 1: For every prime p > 2, each quadratic residue mod p has exactly two square
roots mod p (and they sum up to p).4 Fact 2: For every (odd and nonintegerpower) composite number N , each quadratic residue mod N has at least four square roots mod N . Our algorithm uses as a blackbox an algorithm, denoted sqrt, that given a prime p and a quadratic residue mod p, denoted s, returns the smallest among the two modular square roots of s. There is no guarantee as to what the output is in the case that the input is not of the aforementioned form (and in particular in the case that p is not a prime). Thus, we actually present a probabilistic polynomialtime reduction of testing primality to extracting square roots modulo a prime (which is a search problem with a promise; see Section 2.4.1). Construction 6.4 (the reduction): On input a natural number N > 2 do 1. If N is either even or an integer power5 then reject. 2. Uniformly select r ∈ {1, . . . , N − 1}, and set s ← r 2 mod N . 3. Let r ← sqrt(s, N ). If r ≡ ±r (mod N ) then accept else reject. Indeed, in the case that N is composite, the reduction invokes sqrt on an illegitimate input (i.e., it makes a query that violates the promise of the problem at the target of the reduction). In such a case, there is no guarantee as to what sqrt answers, but actually a bluntly wrong answer only plays in our favor. In general, we will show that if N is composite, then the reduction rejects with probability at least 1/2, regardless of how sqrt answers. We mention that there exists a probabilistic polynomialtime algorithm for implementing sqrt (see Exercise 6.16). Proposition 6.5: Construction 6.4 constitutes a probabilistic polynomialtime reduction of testing primality to extracting square roots module a prime. Furthermore, if the input is a prime then the reduction always accepts, and otherwise it rejects with probability at least 1/2. We stress that Proposition 6.5 refers to the reduction itself; that is, sqrt is viewed as a (“perfect”) oracle that, for every prime P and quadratic residue s (mod P), returns r < s/2 such that r 2 ≡ s (mod P). Combining Proposition 6.5 with a probabilistic 4
That is, for every r ∈ {1, . . . , p − 1}, the equation x 2 ≡ r 2 (mod p) has two solutions modulo p (i.e., r and p − r ). 5 This can be checked by scanning all possible powers e ∈ {2, . . . , log2 N }, and (approximately) solving the equation x e = N for each value of e (i.e., finding the smallest integer i such that i e ≥ N ). Such a solution can be found by binary search.
192
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.1. PROBABILISTIC POLYNOMIAL TIME
polynomialtime algorithm that computes sqrt with negligible error probability, we obtain that testing primality is in BPP. Proof: By Fact 1, on input a prime number N , Construction 6.4 always accepts (because in this case, for every r ∈ {1, . . . , N − 1}, it holds that sqrt(r 2 mod N , N ) ∈ {r, N − r }). On the other hand, suppose that N is an odd composite that is not an integer power. Then, by Fact 2, each quadratic residue s has at least four square roots, and each of these square roots is equally likely to be chosen at Step 2 (in other words, s yields no information regarding which of its modular square roots was selected in Step 2). Thus, for every such s, the probability that either sqrt(s, N ) or N − sqrt(s, N ) equal the root chosen in Step 2 is at most 2/4. It follows that, on input a composite number, the reduction rejects with probability at least 1/2. Reflection. Construction 6.4 illustrates an interesting aspect of randomized algorithms (or rather reductions), that is, their ability to take advantage of information that is unknown to the invoked subroutine. Specifically, Construction 6.4 generates a problem instance (N , s), which hides crucial information (regarding how s was generated). Any subroutine that answers correctly in the case that N is prime provides probabilistic evidence that N is a prime, where the probability space refers to the missing information (regarding how s was generated in the case that N is composite). Comment. Testing primality is actually in P. However, the deterministic algorithm demonstrating this fact is more complex than Construction 6.4 (and its analysis is even more complicated).
6.1.3. OneSided Error: The Complexity Classes RP and coRP In this section we consider notions of probabilistic polynomialtime algorithms having onesided error. The notion of onesided error refers to a natural partition of the set of instances, that is, yesinstances versus noinstances in the case of decision problems, and instances having solution versus instances having no solution in the case of search problems. We focus on decision problems, and comment that an analogous treatment can be provided for search problems (see Exercise 6.3). Definition 6.6 (the class RP):6 A decision problem S is in RP if there exists a probabilistic polynomialtime algorithm A such that for every x ∈ S it holds that Pr[A(x) = 1] ≥ 1/2 and for every x ∈ S it holds that Pr[A(x) = 0] = 1. The choice of the constant 1/2 is immaterial, and any other constant greater than zero will do (and yields the very same class). Similarly, this constant can be replaced by 1 − µ(x) for various negligible functions µ (while preserving the class). Both facts are special cases of the robustness of the class (see Exercise 6.5). Observe that RP ⊆ N P (see Exercise 6.8) and that RP ⊆ BPP (by the aforementioned error reduction). Defining coRP = {{0, 1}∗ \ S : S ∈ RP}, note that coRP 6
The initials RP stands for Random Polynomial time, which fails to convey the restricted type of error allowed in this class. The only nice feature of this notation is that it is reminiscent of NP, thus reflecting the fact that RP is a randomized polynomialtime class that is contained in N P.
193
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
corresponds to the opposite direction of onesided error probability. That is, a decision problem S is in coRP if there exists a probabilistic polynomialtime algorithm A such that for every x ∈ S it holds that Pr[A(x) = 1] = 1 and for every x ∈ S it holds that Pr[A(x) = 0] ≥ 1/2.
6.1.3.1. Testing Polynomial Identity An appealing example of a onesided error randomized algorithm refers to the problem of determining whether two polynomials are identical. For simplicity, we assume that we are given an oracle for the evaluation of each of the two polynomials. An alternative presentation that refers to polynomials that are represented by arithmetic circuits (cf. Appendix B.3) yields a standard decision problem in coRP (see Exercise 6.17). Either way, we refer to multivariate polynomials and to the question of whether they are identical over any field (or, equivalently, whether they are identical over a sufficiently large finite field). Note that it suffices to consider finite fields that are larger than the degree of the two polynomials. Construction 6.7 (PolynomialIdentity Test): Let n be an integer and F be a finite field. Given blackbox access to p, q : Fn → F, uniformly select r1 , . . . , rn ∈ F, and accept if and only if p(r1 , . . . , rn ) = q(r1 , . . . , rn ). Clearly, if p ≡ q then Construction 6.7 always accepts. The following lemma implies that if p and q are different polynomials, each of total degree at most d over the finite field F, then Construction 6.7 accepts with probability at most d/F. Lemma 6.8: Let p : Fn → F be a nonzero polynomial of total degree d over the finite field F. Then
Prr1 ,...,rn ∈F [ p(r1 , . . . , rn ) = 0] ≤
d F .
Proof: The lemma is proven by induction on n. The base case of n = 1 follows immediately by the Fundamental Theorem of Algebra (i.e., any nonzero univariate polynomial of degree d has at most d distinct roots). In the induction step, we write p as a polynomial in its first variable with coefficients that are polynomials in the other variables. That is, p(x1 , x2 , . . . , xn ) =
d
pi (x2 , . . . , xn ) · x1i
i=0
where pi is a polynomial of total degree at most d − i. Let i be the largest integer for which pi is not identically zero. Dismissing the case i = 0 and using the induction hypothesis, we have
Prr1 ,r2 ,...,rn [ p(r1 , r2 , . . . , rn ) = 0] ≤ Prr2 ,...,rn [ pi (r2 , . . . , rn ) = 0] + Prr1 ,r2 ,...,rn [ p(r1 , r2 , . . . , rn ) = 0  pi (r2 , . . . , rn ) = 0] ≤
i d −i + F F
194
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.1. PROBABILISTIC POLYNOMIAL TIME
where the second term is bounded by fixing any sequence r2 , . . . , rn for def which pi (r2 , . . . ., rn ) = 0 and considering the univariate polynomial p (x) = p(x, r2 , . . . , rn ) (which by hypothesis is a nonzero polynomial of degree i). Reflection. Lemma 6.8 may be viewed as asserting that for every nonzero polynomial of degree d over F at least a 1 − (d/F) fraction of its domain does not evaluate to zero. Thus, if d F then most of the evaluation points constitute a witness for the fact that the polynomial is nonzero. We know of no efficient deterministic algorithm that, given a representation of the polynomial via an arithmetic circuit, finds such a witness. Indeed, Construction 6.7 attempts to find a witness by merely selecting it at random.
6.1.3.2. Relating BPP to RP A natural question regarding probabilistic polynomialtime algorithms refers to the relation between twosided and onesided error probability. For example, is BPP contained in RP? Loosely speaking, we show that BPP is reducible to coRP by onesided error randomized Karpreductions, where the actual statement refers to the promise problem versions of both classes (briefly defined in the following paragraph). Note that BPP is trivially reducible to coRP by twosided error randomized Karpreductions, whereas a deterministic Karpreduction of BPP to coRP would imply BPP = coRP = RP (see Exercise 6.9). First, we refer the reader to the general discussion of promise problems in Section 2.4.1. Analogously to Definition 2.31, we say that the promise problem = (Syes , Sno ) is in (the promise problem extension of) BPP if there exists a probabilistic polynomialtime algorithm A such that for every x ∈ Syes it holds that Pr[A(x) = 1] ≥ 2/3 and for every x ∈ Sno it holds that Pr[A(x) = 0] ≥ 2/3. Similarly, is in coRP if for every x ∈ Syes it holds that Pr[A(x) = 1] = 1 and for every x ∈ Sno it holds that Pr[A(x) = 0] ≥ 1/2. Probabilistic reductions among promise problems are defined by adapting the conventions of Section 2.4.1; specifically, queries that violate the promise at the target of the reduction may be answered arbitrarily. Theorem 6.9: Any problem in BPP is reducible by a onesided error randomized Karpreduction to coRP, where coRP (and possibly also BPP) denotes the corresponding class of promise problems. Specifically, the reduction always maps a noinstance to a noinstance. It follows that BPP is reducible by a onesided error randomized Cookreduction to RP. Thus, using the conventions of Section 3.2.2 and referring to classes of promise problems, we may write BPP ⊆ RP RP . In fact, since RP RP ⊆ BPP BPP = BPP, we have BPP = RP RP . Theorem 6.9 may be paraphrased as saying that the combination of the onesided error probability of the reduction and the onesided error probability of coRP can account for the twosided error probability of BPP. We warn that this statement is not a triviality like 1 + 1 = 2, and in particular we do not know whether it holds for classes of standard decision problems (rather than for the classes of promise problems considered in Theorem 6.9). Proof: Recall that we can easily reduce the error probability of BPPalgorithms, and derive probabilistic polynomialtime algorithms of exponentially vanishing error
195
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
probability. But this does not eliminate the error altogether (not even on “one side”). In general, there seems to be no hope of eliminating the error, unless we (either do something earthshaking or) change the setting as done when allowing a onesided error randomized reduction to a problem in coRP. The latter setting can be viewed as a twomove randomized game (i.e., a random move by the reduction followed by a random move by the decision procedure of coRP), and it enables the application of different quantifiers to the two moves (i.e., allowing error in one direction in the first quantifier and error in the other direction in the second quantifier). In the next paragraph, which is inessential to the actual proof, we illustrate the potential power of this setting. Teaching note: The following illustration represents an alternative way of proving Theorem 6.9. This way seems conceptually simpler but it requires a starting point (or rather an assumption) that is much harder to establish, where both comparisons are with respect to the actual proof of Theorem 6.9 (which follows the illustration).
An illustration. Suppose that for some set S ∈ BPP there exists a polyno
mial p and an offline BPPalgorithm A such that for every x it holds that Prr ∈{0,1}2 p (x) [A (x, r ) = χ S (x)] < 2−( p (x)+1) ; that is, the algorithm uses 2 p (x) bits of randomness and has error probability smaller than 2− p (x) /2. Note that such an algorithm cannot be obtained by standard error reduction (see Exercise 6.10). Anyhow, such a small error probability allows a partition of the string r such that one part accounts for the entire error probability on yesinstances while the other part accounts for the error probability on noinstances. Specifically, for every x ∈ S, it holds that Prr ∈{0,1} p (x) [(∀r ∈ {0, 1} p (x) ) A (x, r r ) = 1] > 1/2, whereas for every x ∈ S and every r ∈ {0, 1} p (x) it holds that Prr ∈{0,1} p (x) [A (x, r r ) = 1] < 1/2. Thus, the error on yesinstances is “pushed” to the selection of r , whereas the error on noinstances is pushed to the selection of r . This yields a onesided error randomized Karpreduction that maps x to (x, r ), where r is uniformly selected in {0, 1} p (x) , such that deciding S is reduced to the coRP problem (regarding pairs (x, r )) that is decided by the (online) randomized algorithm A defined by def A (x, r ) = A (x, r U p (x) ). For details, see Exercise 6.11. The actual proof, which avoids the aforementioned hypothesis, follows. The actual starting point. Consider any BPPproblem with a characteristic function χ (which, in case of a promise problem, is a partial function, defined only over the promise). By standard error reduction, there exists a probabilistic polynomialtime algorithm A such that for every x on which χ is defined it holds that Pr[A(x) = χ(x)] < µ(x), where µ is a negligible function. Looking at the corresponding residual (offline) algorithm A and denoting by p the polynomial that bounds the running time of A, we have
Prr ∈{0,1} p(x) [A (x, r ) = χ(x)] < µ(x)
1/2. Claim 2: If x is a noinstance (i.e., χ(x) = 0) then Pr[M(x) ∈ no ] = 1.
We start with Claim 2, which is easier to establish. Recall that M(x) = (x, (s1 , . . . , sm )), where s1 , . . . , sm are uniformly and independently distributed in {0, 1}m . We note that (by Eq. (6.1) and χ(x) = 0), for every possible choice of s1 , . . . , sm ∈ {0, 1}m and every i ∈ {1, . . . , m}, the fraction of r ’s that satisfy 1 . Thus, for every possible choice of s1 , . . . , sm ∈ A (x, r ⊕ si ) = 1 is at most 2m m {0, 1} , for at most half of the possible r ∈ {0, 1}m there exists an i such that A (x, r ⊕ si ) = 1 holds. Hence, the reduction M always maps the noinstance x (i.e., χ(x) = 0) to a noinstance of (i.e., an element of no ). Turning to Claim 1 (which refers to χ(x) = 1), we will show shortly that in this case, with very high probability, the reduction M maps x to a yesinstance of . We upperbound the probability that the reduction fails (in case χ(x) = 1) as follows:
Pr[M(x) ∈ yes ] = Prs1 ,...,sm [∃r ∈ {0, 1}m s.t. (∀i) A (x, r ⊕ si ) = 0] ≤
Prs1 ,...,sm [(∀i) A (x, r ⊕ si ) = 0]
r ∈{0,1}m
=
m r ∈{0,1}m
0} for some R ∈ PC (see
202
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING
Section 2.1.2). It also holds that BPP is Cookreducible to #P (see Exercise 6.20). The class #P is sometimes defined in terms of decision problems, as is implicit in the following proposition. Proposition 6.15 (a decisional version of #P): For any f ∈ #P, deciding memberdef ship in S f = {(x, N ) : f (x) ≥ N } is computationally equivalent to computing f . Actually, the claim holds for any function f : {0, 1}∗ → N for which there exists a polynomial p such that for every x ∈ {0, 1}∗ it holds that f (x) ≤ 2 p(x) . Proof: Since the relation R vouching for f ∈ #P (i.e., f (x) = R(x)) is polynomially bounded, there exists a polynomial p such that for every x it holds that f (x) ≤ 2 p(x) . Deciding membership in S f is easily reduced to computing f (i.e., we accept the input (x, N ) if and only if f (x) ≥ N ). Computing f is reducible to deciding S f by using a binary search (see Exercise 2.9). This relies on the fact that, on input x and oracle access to S f , we can determine whether or not f (x) ≥ N by making the query (x, N ). Note that we know a priori that f (x) ∈ [0, 2 p(x) ]. The counting class #P is also related to the problem of enumerating all possible solutions to a given instance (see Exercise 6.21).
6.2.1.1. On the Power of #P As indicated, N P ∪ BPP is (easily) reducible to #P. Furthermore, as stated in Theorem 6.16, the entire Polynomialtime Hierarchy (as defined in Section 3.2) is Cookreducible to #P (i.e., PH ⊆ P #P ). On the other hand, any problem in #P is solvable in polynomial space, and so P #P ⊆ PSPACE. Theorem 6.16: Every set in PH is Cookreducible to #P. We do not present a proof of Theorem 6.16 here, because the known proofs are rather technical. Furthermore, one main idea underlying these proofs appears in a more clear form in the proof of Theorem 6.29. Nevertheless, in Appendix F.1 we present a proof of a related result, which implies that PH is reducible to #P via randomized Karpreductions.
6.2.1.2. Completeness in #P The definition of #Pcompleteness is analogous to the definition of N Pcompleteness. That is, a counting problem f is #P complete if f ∈ #P and every problem in #P is Cookreducible to f . We claim that the counting problems associated with the NPcomplete problems presented in Section 2.3.3 are all #Pcomplete. We warn that this fact is not due to the mere NPcompleteness of these problems, but rather to an additional property of the reductions establishing their NPcompleteness. Specifically, the Karpreductions that were used (or variants of them) have the extra property of preserving the number of NPwitnesses (as captured by the following definition). Definition 6.17 (parsimonious reductions): Let R, R ∈ PC and let g be a Karpreduction of S R = {x : R(x) = ∅} to S R = {x : R (x) = ∅}, where R(x) = {y : (x, y) ∈ R} and R (x) = {y : (x, y) ∈ R }. We say that g is parsimonious (with
203
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
respect to R and R ) if for every x it holds that R(x) = R (g(x)). In such a case we say that g is a parsimonious reduction of R to R . We stress that the condition of being parsimonious refers to the two underlying relations R and R (and not merely to the sets S R and S R ). The requirement that g is a Karpreduction is partially redundant, because if g is polynomialtime computable and for every x it holds that R(x) = R (g(x)), then g constitutes a Karpreduction of S R to S R . Specifically, R(x) = R (g(x)) implies that R(x) > 0 (i.e., x ∈ S R ) if and only if R (g(x)) > 0 (i.e., g(x) ∈ S R ). The reader may easily verify that the Karpreduction underlying the proof of Theorem 2.19 as well as many of the reductions used in Section 2.3.3 are parsimonious (see Exercise 2.29). Theorem 6.18: Let R ∈ PC and suppose that every search problem in PC is parsimoniously reducible to R. Then the counting problem associated with R is #Pcomplete. Proof: Clearly, the counting problem associated with R, denoted #R, is in #P. To show that every f ∈ #P is reducible to f , we consider the relation R ∈ PC that is counted by f ; that is, #R = f . Then, by the hypothesis, there exists a parsimonious reduction g of R to R. This reduction also reduces #R to #R; specifically, #R (x) = #R(g(x)) for every x. Corollaries. As an immediate corollary of Theorem 6.18, we get that counting the number of satisfying assignments to a given CNF formula is #Pcomplete (because RSAT is PCcomplete via parsimonious reductions). Similar statements hold for all the other NPcomplete problems mentioned in Section 2.3.3 and in fact for all NPcomplete problems listed in [85]. These corollaries follow from the fact that all known reductions among natural NPcomplete problems are either parsimonious or can be easily modified to be so. We conclude that many counting problems associated with NPcomplete search problems are #Pcomplete. It turns out that also counting problems associated with efficiently solvable search problems may be #Pcomplete. Theorem 6.19: There exist #Pcomplete counting problems that are associated with efficiently solvable search problems. That is, there exists R ∈ PF (see Definition 2.2) such that #R is #Pcomplete. Theorem 6.19 can be established by presenting artificial #Pcomplete problems (see Exercise 6.22). The following proof uses a natural counting problem. Proof: Consider the relation Rdnf consisting of pairs (φ, τ ) such that φ is a DNF formula and τ is an assignment satisfying it. Note that the search problem of Rdnf is easy to solve (e.g., by picking an arbitrary truth assignment that satisfies the first term in the input formula). To see that #Rdnf is #Pcomplete consider the following reduction from #RSAT (which is #Pcomplete by Theorem 6.18). Given a CNF formula φ, transform ¬φ into a DNF formula φ by applying deMorgan’s
204
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING
Law, query #Rdnf on φ , and return 2n − #Rdnf (φ ), where n denotes the number of variables in φ (resp., φ ). Reflections. We note that Theorem 6.19 is not established by a parsimonious reduction. This fact should not come as a surprise because a parsimonious reduction of #R to #R implies that S R = {x : ∃y s.t. (x, y) ∈ R } is reducible to S R = {x : ∃y s.t. (x, y) ∈ R}, where in our case S R is NPcomplete while S R ∈ P (since R ∈ PF). Nevertheless, the proof of Theorem 6.19 is related to the hardness of some underlying decision problem (i.e., the problem of deciding whether a given DNF formula is a tautology (i.e., whether #Rdnf (φ ) = 2n )). But does there exist a #Pcomplete problem that is “not based on some underlying NPcomplete decision problem”? Amazingly enough, the answer is positive. Theorem 6.20: Counting the number of perfect matchings in a bipartite graph is #Pcomplete.8 Equivalently (see Exercise 6.23), the problem of computing the permanent of matrices with 0/1entries is #Pcomplete. Recall that the permanent of an nbyn matrix M = (m ni, j ), denoted perm(M), equals the sum over all permutations π of [n] of the products i=1 m i,π(i) . Theorem 6.20 is proven by composing the following two (manytoone) reductions (asserted in Propositions 6.21 and 6.22, respectively) and using the fact that #R3SAT is #Pcomplete (see Theorem 6.18 and Exercise 2.29). Needless to say, the resulting reduction is not parsimonious. Proposition 6.21: The counting problem of 3SAT (i.e., #R3SAT ) is reducible to computing the permanent of integer matrices. Furthermore, there exists an even integer c > 0 and a finite set of integers I such that, on input a 3CNF formula φ, the reduction produces an integer matrix Mφ with entries in I such that perm(Mφ ) = cm · #R3SAT (φ) where m denotes the number of clauses in φ. The original proof of Proposition 6.21 uses c = 210 and I = {−1, 0, 1, 2, 3}. It can be shown (see Exercise 6.24 (which relies on Theorem 6.29)) that, for every integer n > 1 that is relatively prime to c, computing the permanent modulo n is NPhard (under randomized reductions). Thus, using the case of c = 210 , this means that computing the permanent modulo n is NPhard for any odd n > 1. In contrast, computing the permanent modulo 2 (which is equivalent to computing the determinant modulo 2) is easy (i.e., can be done in polynomial time and even in N C). Thus, assuming N P ⊆ BPP, Proposition 6.21 cannot hold for an odd c (because by Exercise 6.24 it would follow that computing the permanent modulo 2 is NPhard). We also note that, assuming P = N P, Proposition 6.21 cannot possibly hold for a set I containing only nonnegative integers (see Exercise 6.25). Proposition 6.22: Computing the permanent of integer matrices is reducible to computing the permanent of 0/1matrices. Furthermore, the reduction maps any integer matrix A into a 0/1matrix A such that the permanent of A can be easily computed from A and the permanent of A . 8
See Appendix G.1 for basic terminology regarding graphs.
205
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
Teaching note: We do not recommend presenting the proofs of Propositions 6.21 and 6.22 in class. The highlevel structure of the proof of Proposition 6.21 has the flavor of some sophisticated reductions among NPproblems, but the crucial point is the existence of adequate gadgets. We do not know of a highlevel argument establishing the existence of such gadgets nor of any intuition as to why such gadgets exist.9 Instead, the existence of such gadgets is proved by a design that is both highly nontrivial and ad hoc in nature. Thus, the proof of Proposition 6.21 boils down to a complicated design problem that is solved in a way that has little pedagogical value. In contrast, the proof of Proposition 6.22 uses two simple ideas that can be useful in other settings. With suitable hints, this proof can be used as a good exercise.
Proof of Proposition 6.21: We will use the correspondence between the permanent of a matrix A and the sum of the weights of the cycle covers of the weighted directed graph represented by the matrix A. A cycle cover of a graph is a collection of simple10 vertexdisjoint directed cycles that covers all the graph’s vertices, and its weight is the product of the weights of the corresponding edges. The SWCC of a weighted directed graph is the sum of the weights of all its cycle covers. Given a 3CNF formula φ, we construct a directed weighted graph G φ such that the SWCC of G φ equals equals cm · #R3SAT (φ), where c is a universal constant and m denotes the number of clauses in φ. We may assume, without loss of generality, that each clause of φ has exactly three literals (which are not necessarily distinct). We start with a highlevel description (of the construction) that refers to (clause) gadgets, each containing some internal vertices and internal (weighted) edges, which are unspecified at this point. In addition, each gadget has three pairs of designated vertices, one pair per each literal appearing in the clause, where one vertex in the pair is designated as an entry vertex and the other as an exit vertex. The graph G φ consists of m such gadgets, one per each clause (of φ), and n auxiliary vertices, one per each variable (of φ), as well as some additional directed edges, each having weight 1. Specifically, for each variable, we introduce two tracks, one per each of the possible literals of this variable. The track associated with a literal consists of directed edges (each having weight 1) that form a simple “cycle” passing through the corresponding (auxiliary) vertex as well as through the designated vertices that correspond to the occurrences of this literal in the various clauses. Specifically, for each such occurrence, the track enters the corresponding clause gadget at the entry vertex corresponding to this literal and exits at the corresponding exit vertex. (If a literal does not appear in φ then the corresponding track is a selfloop on the corresponding variable.) See Figure 6.1 showing the two tracks of a variable x that occurs positively in three clauses and negatively in one clause. The entry vertices (resp., exit vertices) are drawn on the top (resp., bottom) part of each gadget. For the purpose of stating the desired properties of the clause gadget, we augment the gadget by nine external edges (of weight 1), one per each pair of (not necessarily matching) entry and exit vertices such that the edge goes from the exit vertex to the entry vertex (see Figure 6.2). (We stress that this is an auxiliary construction that differs from and yet is related to the use of gadgets in the foregoing construction of G φ .) The three edges that link the designated pairs of vertices that correspond to the 9
Indeed, the conjecture that such gadgets exist can only be attributed to ingenuity. Here, a simple cycle is a strongly connected directed graph in which each vertex has a single incoming (resp., outgoing) edge. In particular, selfloops are allowed. 10
206
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING
x +x
+x x
+x
Figure 6.1: Tracks connecting clause gadgets in the reduction to cycle cover.
three literals are called nice. We say that a collection of edges C (e.g., a collection of cycles in the augmented gadget) uses the external edges S if the intersection of C with the set of the (nine) external edges equals S. We postulate the following three properties of the clause gadget. 1. The sum of the weights of all cycle covers (of the gadget) that do not use any external edge (i.e., use the empty set of external edges) equals zero. 2. Let V (S) denote the set of vertices incident to S, and say that S is nice if it is nonempty and the vertices in V (S) can be perfectly matched using nice edges.11 Then, there exists a constant c (indeed the one postulated in the proposition’s claim) such that, for any nice set S, the sum of the weights of all cycle covers that use the external edges S equals c.
Figure 6.2: External edges for the analysis of the effect of a clause gadget. On the left is a gadget with the track edges adjacent to it (as in the real construction). On the right is a gadget and four out of the nine external edges (two of which are nice) used in the analysis.
11
Clearly, any nonempty set of nice edges is a nice set. Thus, a singleton set is nice if and only if the corresponding edge is nice. On the other hand, any set S of three (vertexdisjoint) external edges is nice, because V (S) has a perfect matching using all three nice edges. Thus, the notion of nice sets is “nontrivial” only for sets of two edges. Such a set S is nice if and only if V (S) consists of two pairs of corresponding designated vertices.
207
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
The gadget uses eight vertices, where the first six are the designated (entry and exit) vertices. The entry vertex (resp., exit vertex) associated with the i th literal is numbered i (resp., i + 3). The corresponding adjacency matrix follows. 1 0 0 2 0 0 0 0 0 1 0 0 3 0 0 0 0 0 0 0 0 1 0 0 0 0 −1 1 −1 0 1 1 0 0 −1 −1 2 0 1 1 0 0 0 −1 −1 0 1 1 0 0 1 1 1 0 2 −1 0
0
1
1
1
0
0
1
Note that the edge 3 → 6 can be contracted, but the resulting 7vertex graph will not be consistent with our (inessentially stringent) definition of a gadget by which the six designated vertices should be distinct. Figure 6.3: A Deus ex Machina clause gadget for the reduction to cycle cover.
3. For any nonnice set S = ∅ of external edges, the sum of the weights of all cycle covers that use the external edges S equals zero. Note that the foregoing three cases exhaust all the possible ones. Also note that the set of external edges used by a cycle cover (of the augmented gadget) must be a matching (i.e., these edges must be vertex disjoint). Intuitively, there is a correspondence between nice sets of external edges (of an augmented gadget) and the pairs of edges on tracks that pass through the (unaugmented) gadget. Indeed, we now turn back to G φ , which uses unaugmented gadgets. Using the foregoing properties of the (augmented) gadgets, it can be shown that each satisfying assignment of φ contributes exactly cm to the SWCC of G φ (see Exercise 6.26). It follows that the SWCC of G φ equals cm · #R3SAT (φ). Having established the validity of the abstract reduction, we turn to the implementation of the clause gadget. The first implementation is a Deus ex Machina, with a corresponding adjacency matrix depicted in Figure 6.3. Its validity (for the value c = 12) can be verified by computing the permanent of the corresponding submatrices (see analogous analysis in Exercise 6.28). A more structured implementation of the clause gadget is depicted in Figure 6.4, which refers to a (hexagon) box to be implemented later. The box contains several vertices and weighted edges, but only two of these vertices, called terminals, are connected to the outside (and are shown in Figure 6.4). The clause gadget consists of five copies of this box, where three copies are designated for the three literals of the clause (and are marked LB1, LB2, and LB3), as well as additional vertices and edges shown in Figure 6.4. In particular, the clause gadget contains the six aforementioned designated vertices (i.e., a pair of entry and exit vertices per each literal), two additional vertices (shown at the two extremes of the figure), and some edges (all having weight 1). Each designated vertex has a selfloop, and is incident to a single additional edge that is outgoing (resp., incoming) in case the vertex is an entry vertex (resp., exit vertex) of the gadget. The two terminals of each box that is associated with some literal are connected to the corresponding pair of designated
208
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING entry1
entry2
LB1
LB2
exit1
exit2
entry3
LB3
exit3
Figure 6.4: A structured clause gadget for the reduction to cycle cover.
vertices (e.g., the outgoing edge of entry1 is incident at the right terminal of the box LB1). Note that the five boxes reside on a directed path (going from left to right), and the only edges going in the opposite direction are those drawn below this path. In continuation of the foregoing, we wish to state the desired properties of the box. Again, we do so by considering the augmentation of the box by external edges (of weight 1) incident at the specified vertices. In this case (see Figure 6.5), we have a pair of antiparallel edges connecting the two terminals of the box as well as two selfloops (one on each terminal). We postulate the following three properties of the box. 1. The sum of the weights of all cycle covers (of the box) that do not use any external edge equals zero. 2. There exists a constant b (in our case b = 4) such that, for each of the two antiparallel edges, the sum of the weights of all cycle covers that use this edge equals b. 3. For any (nonempty) set S of the selfloops, the sum of the weights of all cycle covers (of the box) that use S equals zero.
Figure 6.5: External edges for the analysis of the effect of a box. On the left is a box with potential edges adjacent to it (as in the gadget construction). On the right is a box and the four external edges used in the analysis.
209
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
Note that the foregoing three cases exhaust all the possible ones. It can be shown that the conditions regarding the box imply that the construction presented in Figure 6.4 satisfies the conditions that were postulated for the clause gadget (see Exercise 6.27). Specifically, we have c = b5 . As for box itself, a smaller Deus ex Machina is provided by the following 4by4 adjacency matrix 0 1 −1 −1 1 −1 1 1 (6.4) 0 1 1 2 0 1 3 0 where the two terminals correspond to the first and the fourth vertices. Its validity (for the value b = 4) can be verified by computing the permanent of the corresponding submatrices (see Exercise 6.28). Proof of Proposition 6.22: The proof proceeds in two steps. In the first step we show that computing the permanent of integer matrices is reducible to computing the permanent of nonnegative matrices. This reduction proceeds as follows. For an nbyn integer matrix A = (ai, j )i, j∈[n] , let )A)∞ = maxi, j (ai, j ) and Q A = 2(n!) · )A)n∞ + 1. We note that, given A, the value Q A can be computed in polynomial time, and in particular log2 Q A < n 2 log )A)∞ . Given the matrix A, the reduction constructs the nonnegative matrix A = (ai, j mod Q A )i, j∈[n] (i.e., the entries of A are in {0, 1, . . . , Q A − 1}), queries the oracle for the permanent of A , and def outputs v = perm(A ) mod Q A if v < Q A /2 and −(Q A − v) otherwise. The key observation is that
perm(A) ≡ perm(A ) (mod Q A ), while perm(A) ≤ (n!) · )A)n∞ < Q A /2. Thus, perm(A ) mod Q A (which is in {0, 1, . . . , Q A − 1}) determines perm(A). We note that perm(A ) is likely to be much larger than Q A > perm(A); it is merely that perm(A ) and perm(A) are equivalent modulo Q A . In the second step we show that computing the permanent of nonnegative matrices is reducible to computing the permanent of 0/1matrices. In this reduction, we view the computation of the permanent as the computation of the sum of the weights of all the cycle covers (SWCC) of the corresponding weighted directed graph (see proof of Proposition 6.21). Thus, we reduce the computation of the SWCC of directed graphs with nonnegative weights to the computation of the SWCC of unweighted directed graphs with no parallel edges (which correspond to 0/1matrices). The reduction is via local replacements that preserve the value of the SWCC. These local replacements combine the following two local replacements (which preserve the SWCC): t 1. Replacing an edge of weight w = i=1 wi by a path of length t (i.e., t − 1 internal nodes) with the corresponding weights w1 , . . . , wt , and selfloops (with weight 1) on all internal nodes. Note that a cycle cover that uses the original edge corresponds to a cycle cover that uses the entire path, whereas a cycle cover that does not use the original edge corresponds to a cycle cover that uses all the selfloops. t 2. Replacing an edge of weight w = i=1 wi by t parallel 2edge paths such that the first edge on the i th path has weight wi , the second edge has weight 1, and the
210
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING
intermediate node has a selfloop (with weight 1). (Paths of length two are used because parallel edges are not allowed.) Note that a cycle cover that uses the original edge corresponds to a collection of cycle covers that use one out of the t paths (and the selfloops of all other intermediate nodes), whereas a cycle cover that does not use the original edge corresponds to a cycle cover that uses all the selfloops. In particular, we may write each positive edge weight w, having binary expansion σw−1 · · · σ0 , as i:σi =1 (1 + 1)i , and apply the adequate replacements (i.e., first apply the additive replacement to the outer sum (over {i : σi = 1}), next apply the product replacement to each power 2i , and finally apply the additive replacement to each 1 + 1). Applying this process to the matrix A obtained in the first step, we efficiently obtain a matrix A with 0/1entries such that perm(A ) = perm(A ). (In particular, the dimension of A is polynomial in the length of the binary representation of A , which in turn is polynomial in the length of the binary representation of A.) Combining the two reductions (steps), the proposition follows.
6.2.2. Approximate Counting Having seen that exact counting (for relations in PC) seems even harder than solving the corresponding search problems, we turn to relaxations of the counting problem. Before focusing on relative approximation, we briefly consider approximation with (large) additive deviation. Let us consider the counting problem associated with an arbitrary R ∈ PC. Without loss of generality, we assume that all solutions to nbit instances have the same length (n), where indeed is a polynomial. We first note that, while it may be hard to compute #R, given x it is easy to approximate #R(x) up to an additive error of 0.01 · 2(x) (by randomly sampling potential solutions for x). Indeed, such an approximation is very rough, but it is not trivial (and in fact we do not know how to obtain it deterministically). In general, we can efficiently produce at random an estimate of #R(x) that, with high probability, deviates from the correct value by at most an additive term that is related to the absolute upper bound on the number of solutions (i.e., 2(x) ). Proposition 6.23 (approximation with large additive deviation): Let R ∈ PC and be a polynomial such that R ⊆ ∪n∈N {0, 1}n × {0, 1}(n) . Then, for every polynomial p, there exists a probabilistic polynomialtime algorithm A such that for every x ∈ {0, 1}∗ and δ ∈ (0, 1) it holds that
Pr[A(x, δ) − #R(x) > (1/ p(x)) · 2(x) ] < δ.
(6.5)
As usual, δ is presented to A in binary, and hence the running time of A(x, δ) is upperbounded by poly(x · log(1/δ)). Proof Sketch: On input x and δ, algorithm A sets t = ( p(x)2 · log(1/δ)), selects uniformly y1 , . . . , yt and outputs 2(x) · {i : (x, yi ) ∈ R}/t.
211
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
Discussion. Proposition 6.23 is meaningful in the case that #R(x) > (1/ p(x)) · 2(x) holds for some x’s. But otherwise, a trivial approximation (i.e., outputting the constant value zero) meets the bound of Eq. (6.5). In contrast to this notion of additive approximation, a relative factor approximation is typically more meaningful. Specifically, we will be interested in approximating #R(x) up to a constant factor (or some other reasonable factor). In §6.2.2.1, we consider a natural #Pcomplete problem for which such a relative approximation can be obtained in probabilistic polynomial time. We do not expect this to happen for every counting problem in #P, because a relative approximation allows for distinguishing instances having no solution from instances that do have solutions (i.e., deciding membership in S R is reducible to a relative approximation of #R). Thus, relative approximation for all #P is at least as hard as deciding all problems in N P. However, in §6.2.2.2 we show that the former is not harder than the latter; that is, relative approximation for any problem in #P can be obtained by a randomized Cookreduction to N P. Before turning to these results, let us state the underlying definition (and actually strengthen it by requiring approximation to within a factor of 1 ± ε, for ε ∈ (0, 1)).12 Definition 6.24 (approximation with relative deviation): Let f : {0, 1}∗ → N and ε, δ : N → [0, 1]. A randomized process is called an (ε, δ)approximator of f if for every x it holds that
Pr [(x) − f (x) > ε(x) · f (x)] < δ(x).
(6.6)
We say that f is efficiently (1 − ε)approximable (or just (1 − ε)approximable) if there exists a probabilistic polynomialtime algorithm A that constitutes an (ε, 1/3)approximator of f . The error probability of the latter algorithm A (which has error probability 1/3) can be reduced to δ by O(log(1/δ)) repetitions (see Exercise 6.29). Typically, the running time of A will be polynomial in 1/ε, and ε is called the deviation parameter. We comment that the computational problem undelying Definition 6.24 is the search problem of (1 − ε)approximating a function f (i.e., solving the search problem def R f,ε = {(x, v) : v − f (x) ≤ ε(x) · f (x)}). Typically (see Exercise 6.30 for details), this search problem is computationally equivalent to the promise (“gap”) problem of distinguishing elements of the set {(x, v) : v < (1 − ε(x)) · f (x)} and elements of the set {(x, v) : v > (1 + ε(x)) · f (x)}.
6.2.2.1. Relative Approximation for #Rdnf In this subsection we present a natural #Pcomplete problem for which constant factor approximation can be found in probabilistic polynomial time. Stronger results regarding unnatural #Pcomplete problems appear in Exercise 6.31. Consider the relation Rdnf consisting of pairs (φ, τ ) such that φ is a DNF formula and τ is an assignment satisfying it. Recall that the search problem of Rdnf is easy to solve and that the proof of Theorem 6.19 establishes that #Rdnf is #Pcomplete (via a nonparsimonious 12
We refrain from formally defining an Ffactor approximation, for an arbitrary F, although we shall refer to this notion in several informal discussions. There are several ways of defining the aforementioned term (and they are all equivalent when applied to our informal discussions). For example, an Ffactor approximation of #R may mean that, with high probability, the output A(x) satisfies #R(x)/F(x) ≤ A(x) ≤ F(x) · #R(x). Alternatively, we may require that #R(x) ≤ A(x) ≤ F(x) · #R(x) (or, alternatively, that #R(x)/F(x) ≤ A(x) ≤ #R(x).
212
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING
reduction). Still, as we shall see, there exists a probabilistic polynomialtime algorithm that provides a constant factor approximation of #Rdnf . We warn that the fact that #Rdnf is #Pcomplete via a nonparsimonious reduction means that the constant factor approximation for #Rdnf does not seem to imply a similar approximation for all problems in #P. In fact, we should not expect each problem in #P to have a (probabilistic) polynomialtime constantfactor approximation algorithm because this would imply N P ⊆ BPP (since a constant factor approximation allows for distinguishing the case in which the instance has no solution from the case in which the instance has a solution). The approximation algorithm for #Rdnf is obtained by a deterministic reduction of of the task of (ε, 1/3)approximating #Rdnf to an (additive deviation) approximation m Ci , where the type provided in Proposition 6.23. Consider a DNF formula φ = i=1 each Ci : {0, 1}n → {0, 1} is a conjunction. Our task is to approximate the number of assignments that satisfy at least one of the conjunctions. Actually, we will deal with the more general problem in which we are (implicitly) given m subsets S1 , . . . , Sm ⊆ {0, 1}n and wish to approximate  i Si . In our case, each Si is the set of assignments that satisfy the conjunction Ci . In general, we make two computational assumptions regarding these sets (while letting “efficient” mean implementable in time polynomial in n · m): 1. Given i ∈ [m], one can efficiently determine Si . 2. Given i ∈ [m] and J ⊆ [m], one can efficiently approximate Prs∈Si [s ∈ j∈J S j ] up to an additive deviation of 1/poly(n + m). −1 These assumptions are satisfied in our setting (where m Si = Ci (1); see Exercise 6.32). Now, the key observation toward approximating  i=1 Si  is that m m m Si \ S j = Prs∈Si s ∈ S j · Si  (6.7) Si = j i x equals Prh∈Hi [{y ∈ R(x) : h(y) = 0i } ≥ 1], which in turn is upperbounded by α/(α − 1)2 , where α = 2i /R(x) > 1 (using Eq. (6.8) with ε = α − 1).14 Thus, the probability that the procedure does not halt by iteration i x + 4 is upperbounded by 8/49 < 1/6 (because i x > (log2 R(x)) − 1). Thus, with probability at least 5/6, the output is at most 2i x +4 ≤ 16 · R(x) (because i x ≤ log2 R(x)). Thus, with probability at least (7/8) − (1/6) > 2/3, the foregoing procedure outputs a value v such that v/16 ≤ R(x) < 16v. Reducing the deviation by using the ideas presented in Exercise 6.33 (and reducing the error probability as in Exercise 6.29), the theorem follows. Digest. The key observation underlying the proof of Theorem 6.27 is that, while (even with the help of an NPoracle) we cannot directly test whether the number of solutions is greater than a given number, we can test (with the help of an NPoracle) whether the number of solutions that “survive a random sieve” is greater than zero. Since the number of solutions that survive a random sieve reflects the total number of solutions (normalized by the sieve’s density), this offers a way of approximating the total number of solutions. We mention that one can also test whether the number of solutions that “survive a random sieve” is greater than a small number, where small means polynomial in the length of the input (see Exercise 6.35). Specifically, the complexity of this test is linear in the size of the threshold, and not in the length of its binary description. Indeed, in many settings it is more advantageous to use a threshold that is polynomial in some efficiency parameter (rather than using the threshold zero); examples appear in §6.2.4.2 and in [106]. 13
Note that 0 does not reside in the open interval (0, 2ρ), where ρ = R(x)/2i > 0. Here we use the fact that 1 ∈ (2α −1 − 1, 1). A better bound can be obtained by using the hypothesis that, for every y, when h is uniformly selected in Hi , the value of h(y) is uniformly distributed in {0, 1}i . In this case, Prh∈H i [{y ∈ R(x) : h(y) = 0i } ≥ 1] is upperbounded by Eh∈H i [{y ∈ R(x) : h(y) = 0i }] = R(x)/2i . 14
216
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING
6.2.3. Searching for Unique Solutions A natural computational problem (regarding search problems), which arises when discussing the number of solutions, is the problem of distinguishing instances having a single solution from instances having no solution (or finding the unique solution whenever such exists). We mention that instances having a single solution facilitate numerous arguments (see, for example, Exercise 6.24 and §10.2.2.1). Formally, searching for and deciding the existence of unique solutions are defined within the framework of promise problems (see Section 2.4.1). Definition 6.28 (search and decision problems for unique solution instances): The set of instances having unique solutions with respect to the binary relation R is def
def
defined as US R = {x : R(x) = 1}, where R(x) = {y : (x, y) ∈ R}. As usual, we def denote S R = {x : R(x) ≥ 1}, and S R = {0, 1}∗ \ S R = {x : R(x) = 0}. • The problem of finding unique solutions for R is defined as the search problem R with promise US R ∪ S R (see Definition 2.29). In continuation of Definition 2.30, candid searching for unique solutions for R is defined as the search problem R with promise US R . • The problem of deciding unique solutions for R is defined as the promise problem (US R , S R ) (see Definition 2.31). Interestingly, in many natural cases, the promise does not make any of these problems any easier than the original problem. That is, for all known NPcomplete problems, the original problem is reducible in probabilistic polynomial time to the corresponding unique instances problem. Theorem 6.29: Let R ∈ PC and suppose that every search problem in PC is parsimoniously reducible to R. Then solving the search problem of R (resp., deciding membership in S R ) is reducible in probabilistic polynomial time to finding unique solutions for R (resp., to the promise problem (US R , S R )). Furthermore, there exists a probabilistic polynomialtime computable mapping M such that for every x ∈ S R it holds that Pr[M(x) ∈ S R ] = 1, whereas for every x ∈ S R it holds that Pr[M(x) ∈ US R ] ≥ 1/poly(x). We highlight the fact that the hypothesis asserts that R is PCcomplete via parsimonious reductions; this hypothesis is crucial to Theorem 6.29 (see Exercise 6.36). The large (but boundedaway from 1) error probability of the randomized Karpreduction M can be reduced by repetitions, yielding a randomized Cookreduction with exponentially vanishing error probability. Note that the resulting reduction may make many queries that violate the promise, and still yield the correct answer (with high probability) by relying on queries that satisfy the promise. (Specifically, in the case of search problems, we avoid wrong solutions by checking each solution obtained, while in the case of decision problems we rely on the fact that for every x ∈ S R it always holds that M(x) ∈ S R .) Proof: We focus on establishing the furthermore clause (and the main claim follows). The proof uses many of the ideas of the proof of Theorem 6.27, and we refer to the latter for motivation. We shall again make essential use of hashing functions, and rely on the material presented in Appendix D.2.1–D.2.2.
217
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
As in the proof of Theorem 6.27, the idea is to apply a “random sieve” on R(x), this time with the hope that a single element survives. Specifically, if we let each element pass the sieve with probability approximately 1/R(x) then with constant probability a single element survives. In such a case, we shall obtain an instance with a unique solution (i.e., an instance of S R,H having a single NPwitness), which will (essentially) fulfill our quest. Sieving will be performed by a random function selected in an adequate hashing family (see Appendix D.2). A couple of questions arise: 1. How do we get an approximation to R(x)? Note that we need such an approximation in order to determine the adequate hashing family. Note that invoking Theorem 6.27 will not do, because the said oracle machine uses an oracle to N P (which puts us back to square one, let alone that the said reduction makes many queries).15 Instead, we just select m ∈ {0, . . . , poly(x)} uniformly and note that (if R(x) > 0 then) Pr[m = &log2 R(x)'] = 1/poly(x). Next, we randomly map x to (x, m, h), where h is uniformly selected in an adequate hashing family. 2. How does the question of whether a single element of R(x) pass the random sieve translate to an instance of the uniquesolution problem for R? Recall that in the proof of Theorem 6.27 the nonemptiness of the set of elements of R(x) that pass the sieve (defined by h) was determined by checking membership (of (x, m, h)) in S R,H ∈ N P (defined in Eq. (6.9)). Furthermore, the number of NPwitnesses for (x, m, h) ∈ S R,H equals the number of elements of R(x) that pass the sieve. Thus, a single element of R(x) passes the sieve (defined by h) if and only if (x, m, h) ∈ S R,H has a single NPwitness. Using the parsimonious reduction of S R,H to S R (which is guaranteed by the theorem’s hypothesis), we obtained the desired instance. Note that in case R(x) = ∅ the aforementioned mapping always generates a noinstance (of S R,H and thus of S R ). Details follow. Implementation (i.e., the mapping M ). As in the proof of Theorem 6.27, we as
sume, without loss of generality, that R(x) ⊆ {0, 1} , where = poly(x). We start by uniformly selecting m ∈ {1, . . . , + 1} and h ∈ Hm , where Hm is a family of efficiently computable and pairwiseindependent hashing functions (see Definition D.1) mapping bit long strings to mbit long strings. Thus, we obtain an instance (x, m, h) of S R,H ∈ N P such that the set of valid solutions for (x, m, h) equals {y ∈ R(x) : h(y) = 0m }. Using the parsimonious reduction g of the NPwitness relation of S R,H to R (i.e., the NPwitness relation of S R ), we map (x, m, h) to g(x, m, h), and it holds that {y ∈ R(x) : h(y) = 0m } equals R(g(x, m, h)). To summarize, on def input x the randomized mapping M outputs the instance M(x) = g(x, m, h), where m ∈ {1, . . . , + 1} and h ∈ Hm are uniformly selected. The analysis. Note that for any x ∈ S R it holds that Pr[M(x) ∈ S R ] = 1. Assuming def
that x ∈ S R , with probability exactly 1/( + 1) it holds that m = m x , where m x = &log2 R(x)' + 1. Focusing on the case that m = m x , for a uniformly selected 15
Needless to say, both problems can be resolved by using a reduction to uniquesolution instances, but we still do not have such a reduction – we are currently designing it.
218
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING def
h ∈ Hm , we shall lowerbound the probability that the set Rh (x) = {y ∈ R(x) : h(y) = 0m } is a singleton. First, using the InclusionExclusion Principle, we lowerbound Prh∈Hm x [Rh (x) > 0] by Prh∈Hm x [h(y) = 0m x ] − Prh∈Hm x [h(y1 ) = h(y2 ) = 0m x ]. y1 1] by Prh∈Hm x [h(y1 ) = h(y2 ) = 0m x ]. y1 0] − Prh∈Hm x [Rh (x) > 1] ≥ Prh∈Hm x [h(y) = 0m x ] − 2 · Prh∈Hm x [h(y1 ) = h(y2 ) = 0m x ] y∈R(x)
= R(x) · 2−m x − 2 ·
y1 1 − µ (z), where µ (n) = µ(n)/(n). In the rest of the analysis we ignore the probability that the estimate of #R(z) provided by the randomized oracle A (on query z) deviates from the aforementioned interval. (We note that these rare events are the only source of the possible deviation of the output distribution from the uniform distribution on R(x).)21 Next, let us assume for a moment that A is deterministic and that for every x and y it holds that A(g(x; y 0)) + A(g(x; y 1)) ≤ A(g(x; y )).
(6.10)
We also assume that the approximation is correct at the “trivial level” (where one may just check whether or not (x, y) is in R); that is, for every y ∈ {0, 1}(x) , it holds that A(g(x; y)) = 1 if (x, y) ∈ R and A(g(x; y)) = 0 otherwise.
(6.11)
We modify the i th iteration of the foregoing procedure such that, when entering with the (i − 1)bit long prefix y , we set the i th bit to σ ∈ {0, 1} with probability A(g(x; y σ ))/A(g(x; y )) and halt (with output ⊥) with the residual probability (i.e., 1 − (A(g(x; y 0))/A(g(x; y ))) − (A(g(x; y 1))/A(g(x; y )))). Indeed, Eq. (6.10) guarantees that the latter instruction is sound, since the two main probabilities sum up to at most 1. If we completed the last (i.e., (x)th ) iteration, then we output the (x)bit long string that was generated. Thus, as long as Eq. (6.10) holds (but regardless of other aspects of the quality of the approximation), every 20
In fact, many approximate counting algorithms rely explicitly or implicitly on Theorem 6.31 (see, e.g., [168, Sec. 11.3.1] and [131]). 21 Note that the (negligible) effect of these rare events may not be easy to correct. For starters, we do not necessarily get an indication when these rare events occur. Furthermore, these rare events may occur with different probability in the different invocations of algorithm A (i.e., on different queries).
222
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING
y = σ1 · · · σ(x) ∈ R(x) is output with probability A(g(x; σ1 )) A(g(x; σ1 σ2 )) A(g(x; σ1 σ2 · · · σ(x) )) · ··· A(g(x; λ)) A(g(x; σ1 )) A(g(x; σ1 σ2 · · · σ(x)−1 ))
(6.12)
which, by Eq. (6.11), equals 1/A(g(x; λ)). Thus, the procedure outputs each element of R(x) with equal probability, and never outputs a non⊥ value that is outside R(x). It follows that the quality of approximation only affects the probability that the procedure outputs a non⊥ value (which in turn equals R(x)/A(g(x; λ))). The key point is that, as long as Eq. (6.11) holds, the specific approximate values obtained by the procedure are immaterial – with the exception of A(g(x; λ)), all these values “cancel out.” We now turn to enforcing Eq. (6.10) and Eq. (6.11). We may enforce Eq. (6.11) by performing the straightforward check (of whether or not (x, y) ∈ R) rather than invoking A(g(x, y)).22 As for Eq. (6.10), we enforce it artificially by us def ing A (x, y ) = (1 + ε(x))3((x)−y ) · A(g(x; y )) instead of A(g(x; y )). Recalling that A(g(x; y )) = (1 ± ε(x)) · R (x; y ), we have
A (x, y ) > (1 + ε(x))3((x)−y ) · (1 − ε(x)) · R (x; y )
A (x, y σ ) < (1 + ε(x))3((x)−y −1) · (1 + ε(x)) · R (x; y σ ) and the claim (that Eq. (6.10) holds) follows by using (1 − ε(x)) · (1 + ε(x))3 > (1 + ε(x)). Note that the foregoing modification only affects the probability of outputting a non⊥ value; this good event now occurs with probability R (x; λ)/A (x, λ), which is lowerbounded by (1 + ε(x))−(3(x)+1) > 1/2, where the inequality is due to the setting of ε (i.e., ε(n) = 1/5(n)). Finally, we refer to our assumption that A is deterministic. This assumption was only used in order to identify the value of A(g(x, y )) obtained and used in the (y  − 1)st iteration with the value of A(g(x, y )) obtained and used in the y th iteration. The same effect can be obtained by just reusing the former value (in the y th iteration) rather than reinvoking A in order to obtain it. Part 1 follows. Toward Part 2, let use first reduce the task of approximating #R to the task of (exact) uniform generation for R. On input x ∈ S R , the reduction uses the tree of possible prefixes of elements of R(x) in a somewhat different manner. Again, we proceed in iterations, entering the i th iteration with an (i − 1)bit long string y such def that R (x; y ) = {y : (x, y y ) ∈ R} is not empty. At the i th iteration we estimate the bigger among the two fractions R (x; y 0)/R (x; y ) and R (x; y 1)/R (x; y ), by uniformly sampling the uniform distribution over R (x; y ). That is, taking poly(x/ε (x)) uniformly distributed samples in R (x; y ), we obtain with overwhelmingly high probability an approximation of these fractions up to an additive deviation of at most ε (x). This means that we obtain a relative approximation up to a factor of 1 ± 3ε (x) for the fraction (or fractions) that is (resp., are) bigger than 1/3. Indeed, we may not be able to obtain such a good relative approximation of the other fraction (in the case that the other fraction is very small), but this does not matter. It also does not matter that we cannot tell which is the bigger fraction among the two; it only matter that we use an approximation that indicates a quantity that is, Alternatively, we note that since A is a (1 − ε)approximator for ε < 1 it must hold that #R (z) = 0 implies A(z) = 0. Also, since ε < 1/3, if #R (z) = 1 then A(z) ∈ (2/3, 4/3), which may be rounded to 1. 22
223
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
say, bigger than 1/3. We proceed to the next iteration by augmenting y using the bit that corresponds to such a quantity. Specifically, suppose that we obtained the approximations a0 (y ) ≈ R (x; y 0)/R (x; y ) and a1 (y ) ≈ R (x; y 1)/R (x; y ). Then we extend y by the bit 1 if a1 (y ) > a0 (y ) and extend y by the bit 0 otherwise. Finally, when we reach y = σ1 · · · σ(x) such that (x, y) ∈ R, we output aσ1 (λ)−1 · aσ2 (σ1 )−1 · · · aσ(x) (σ1 σ2 · · · σ(x)−1 )−1
(6.13)
(x;σ1 σ2 ···σi ) . where for each i it holds that aσi (σ1 σ2 · · · σi−1 ) is (1 ± 3ε (x)) · RR (x;σ 1 σ2 ···σi−1 ) As in Part 1, actions regarding R (in this case uniform generation in R ) are conducted via the parsimonious reduction g to R. That is, whenever we need to sample uniformly in the set R (x; y ), we sample the set R(g(x; y )) and recover the corresponding element of R (x; y ) by using the mapping guaranteed by the hypothesis that g is strongly parsimonious. Finally, note that so far we assumed a uniform generation procedure for R, but using an (1 − ε )approximate uniform generation merely means that all our approximations deviate by another additive term of ε . Thus, with overwhelmingly high probability, for each i it holds that aσi (σ1 σ2 · · · σi−1 ) is (1 ± 6ε (x)) · R (x; σ1 σ2 · · · σi )/R (x; σ1 σ2 · · · σi−1 ). It follows that, on input x, when using an oracle that provides a (1 − ε )approximate uniform generation for R, with overwhelmingly high probability, the output (as defined in Eq. (6.13)) is in (x)
i=1
(1 ± 6ε (x))−1 ·
R (x; σ1 · · · σi−1 ) R (x; σ1 · · · σi )
(6.14)
where the error probability is due to the unlikely case that in one of the iterations our approximation deviates from the correct value by more than an additive deviation term of 2ε (n). Noting that Eq. (6.14) equals (1 ± 6ε (x))−(x) · R(x) and using (1 ± 6ε (x))−(x) ⊂ (1 ± ε(x)) (which holds for ε = ε/7), Part 2 follows.
6.2.4.2. A Direct Procedure for Uniform Generation We conclude the current chapter by presenting a direct procedure for solving the uniform generation problem of any R ∈ PC. This procedure uses an oracle to N P (or to S R itself in case it is NPcomplete), which is unavoidable because solving the uniform generation problem of R implies solving the corresponding search problem (which in turn implies deciding membership in S R ). One advantage of this procedure, over the reduction presented in §6.2.4.1, is that it solves the uniform generation problem rather than the approximate uniform generation problem. We are going to use hashing again, but this time we use a family of hashing functions having a stronger “uniformity property” (see Appendix D.2.3). Specifically, we will use a family of wise independent hashing functions mapping bit strings to mbit strings, where bounds the length of solutions in R, and rely on the fact that such a family satisfies Lemma D.6. Intuitively, such functions partition {0, 1} into 2m cells and Lemma D.6 asserts that these partitions “uniformly shatter” all sufficiently large sets. That is, for every set S ⊆ {0, 1} of size ( · 2m ), the partition induced by almost every function in this family is such that each cell contains approximately S/2m elements of S. In particular, if S = ( · 2m ) then each cell contains () elements of S. We denote this family of functions by Hm , and rely on the fact that its elements have succinct and effective representation (as defined in Appendix D.2.1).
224
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
6.2. COUNTING
Loosely speaking, the following procedure (for uniform generation) first selects a random hashing function and tests whether it “uniformly shatters” the target set S = R(x). If this condition holds then the procedure selects a cell at random and retrieves all the elements of S residing in the chosen cell. Finally, the procedure either outputs one of the retrieved elements or halts with no output, where each retrieved element is output with a fixed probability p (which is independent of the actual number of elements of S that reside in the chosen cell). This guarantees that each element e ∈ S is output with the same probability (i.e., 2−m · p), regardless of the number of elements of S that reside with e in the same cell. In the following construction, we assume that on input x we also obtain a good approximation to the size of R(x). This assumption can be enforced by using an approximate counting procedure as a preprocessing stage. Alternatively, the ideas presented in the following construction yield such an approximate counting procedure. Construction 6.32 (uniform generation): On input x and m x ∈ {m x , m x + 1}, def where m x = log2 R(x) and R(x) ⊆ {0, 1} , the oracle machine proceeds as follows. 1. Selecting a partition that “uniformly shatters” R(x). The machine sets m = max(0, m x − log2 40) and selects uniformly h ∈ Hm . Such a function defines a partition of {0, 1} into 2m cells,23 and the hope is that each cell contains approximately the same number of elements of R(x). Next, the machine checks that this is indeed the case or rather than no cell contains more than 120 elements of R(x) (i.e., more than twice the expected number). This is done by (1) checking whether or not (x, h, 1120+1 ) is in the set S R,H defined as follows S R,H = {(x , h , 1t ) : ∃v s.t. {y : (x , y) ∈ R ∧ h (y) = v} ≥ t} (6.15) (1)
def
= {(x , h , 1t ) : ∃v, y1 , . . . , yt s.t. ψ (1) (x , h , v, y1 , . . . , yt )}, where ψ (1) (x , h , v, y1 , . . . , yt ) holds if and only if y1 < y2 · · · < yt and for every (1) j ∈ [t] it holds that (x , y j ) ∈ R ∧ h (y j ) = v. Note that S R,H ∈ N P. If the answer is positive (i.e., there exists a cell that contains more than 120 elements of R(x)) then the machine halts with output ⊥. Otherwise, the machine continues with this choice of h. In this case, no cell contains more than 120 elements of R(x) (i.e., for every v ∈ {0, 1}m , it holds that {y : (x, y) ∈ R ∧ h(y) = v} ≤ 120). We stress that this is an absolute guarantee that follows (1) from (x, h, 1120+1 ) ∈ S R,H . 2. Selecting a cell and determining the number of elements of R(x) that are contained in it. The machine selects uniformly v ∈ {0, 1}m and determines def sv = {y : (x, y) ∈ R ∧ h(y) = v} by making queries to the following NPset S R,H = {(x , h , v , 1t ) : ∃y1 , . . . , yt s.t. ψ (1) (x , h , v , y1 , . . . , yt )}. (6.16) (2)
def
(2)
Specifically, for i = 1, . . . , 120, it checks whether (x, h, v, 1i ) is in S R,H , and sets sv to be the largest value of i for which the answer is positive. 23
For sake of uniformity, we also allow the case of m = 0, which is rather artificial. In this case all hashing functions in H0 map {0, 1} to the empty string, which is viewed as 00 , and thus define a trivial partition of {0, 1} (i.e., into a single cell).
225
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
3. Obtaining all the elements of R(x) that are contained in the selected cell, and outputting one of them at random. Using sv , the procedure reconstructs the set def Sv = {y : (x, y) ∈ R ∧ h(y) = v}, by making queries to the following NPset S R,H = {(x , h , v , 1t , j) : ∃y1 , . . . , yt s.t. ψ (3) (x , h , v , y1 , . . . , yt , j)}, (6.17) (3) (1) where ψ (x , h , v , y1 , . . . , yt , j) holds if and only if ψ (x , h , v , y1 , . . . , yt ) holds and the j th bit of y1 · · · yt equals 1. Specifically, for j1 = 1, . . . , sv and j2 = 1, . . . , , we make the query (x, h, v, 1sv , ( j1 − 1) · + j2 ) in order to determine the j2th bit of y j1 . Finally, having recovered Sv , the procedure outputs each y ∈ Sv with probability 1/120, and outputs ⊥ otherwise (i.e., with probability 1 − (sv /120)). (3)
def
Recall that for R(x) = () and m = m x − log2 40, Lemma D.6 implies that, with overwhelmingly high probability (over the choice of h ∈ Hm ), each set {y : (x, y) ∈ R ∧ h(y) = v} has cardinality (1 ± 0.5)R(x)/2m . Thus, ignoring the case of R(x) = O(), Step 1 can be easily adapted to yield an approximate counting procedure for #R; see Exercise 6.41, which also handles the case of R(x) = O() by using ideas as in Step 2. However, our aim is to establish the following result. Proposition 6.33: Construction 6.32 solves the uniform generation problem of R. Proof: Intuitively, by Lemma D.6 (and the setting of m), with overwhelmingly high probability, a uniformly selected h ∈ Hm partitions R(x) into 2m cells, each containing at most 120 elements. Following is the tedious proof of this fact. Since m = max(0, m x − log2 40), we may focus on the case that m x > log2 40 (as in the other case R(x) ≤ 2m x +1 ≤ 80). In this case, by Lemma D.6 (using ε = 0.5 and m = m x − log2 40 ≤ log2 R(x) − log2 20 (which implies m ≤ log2 R(x) − log2 (5/ε2 ))), with overwhelmingly high probability, each set {y : (x, y) ∈ R ∧ h(y) = v} has cardinality (1 ± 0.5)R(x)/2m . Using m x > (log2 R(x)) − 1 (and m = m x − log2 40), it follows that R(x)/2m < 80 and hence each cell contains at most 120 elements of R(x). We also note that, using m x ≤ (log2 R(x)) + 1, it follows that R(x)/2m ≥ 20 and hence each cell contains at least 10 elements of R(x). The key observation, stated in Step 1, is that if the procedure does not halt in Step 1 then it is indeed the case that h induces a partition in which each cell contains at most 120 elements of R(x). The fact that these cells may contain a different number of elements is immaterial, because each element is output with the same probability (i.e., 1/120). What matters is that the average number of elements in the various cells is sufficiently large, because this average number determines the probability that the procedure outputs an element of R(x) (rather than ⊥). Specifically, conditioned on not halting in Step 1, the probability that Step 3 outputs some element of R(x) equals the average number of elements per cell (i.e., R(x)/2m ) divided by 120. Recalling that for m > 0 (resp., m = 0) it holds that R(x)/2m ≥ 20 (resp., R(x) ≥ 1), we conclude that in this case some element of R(x) is output with probability at least 1/6 (resp., R(x)/120). Recalling that Step 1 halts with negligible probability, it follows that the procedure outputs some element of R(x) with probability at least 0.99 · min((R(x)/120), (1/6)).
226
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER NOTES
Comments. We can easily improve the performance of Construction 6.32 by dealing separately with the case m = 0. In such a case, Step 3 can be simplified and improved by uniformly selecting and outputting an element of Sλ (which equals R(x)). Under this modification, the procedure outputs some element of R(x) with probability at least 1/6. In any case, recall that the probability that a uniform generation procedure outputs ⊥ can be deceased by repeated invocations. Digest. Construction 6.32 is the culmination of the “hashing paradigm” that is aimed at allowing various manipulations of arbitrary sets. In particular, as seen in Construction 6.32, hashing can be used in order to partition a large set into an adequate number of small subsets that are of approximately the same size. We stress that hashing is performed by randomly selecting a function in an adequate family. Indeed, the use of randomization for such purposes (i.e., allowing manipulation of large sets) seems indispensable.
Chapter Notes One key aspect of randomized procedures is their success probability, which is obviously a quantitative notion. This aspect provides a clear connection between probabilistic polynomialtime algorithms considered in Section 6.1 and the counting problems considered in Section 6.2 (see also Exercise 6.20). More appealing connections between randomized procedures and counting problems (e.g., the application of randomization in approximate counting) are presented in Section 6.2. These connections justify the presentation of these two topics in the same chapter.
Randomized Algorithms Making people take an unconventional step requires compelling reasons, and indeed the study of randomized algorithms was motivated by a few compelling examples. Ironically, the appeal of the two most famous examples (discussed next) has been somewhat diminished due to subsequent findings, but the fundamental questions that emerged remain fascinating regardless of the status of these two examples. These questions refer to the power of randomization in various computational settings, and in particular in the context of decision and search problems. We shall return to these questions after briefly reviewing the story of the aforementioned examples. The first example: primality testing. For more than two decades, primality testing was the archetypical example of the usefulness of randomization in the context of efficient algorithms. The celebrated algorithms of Solovay and Strassen [211] and of Rabin [184], proposed in the late 1970s, established that deciding primality is in coRP (i.e., these tests always correctly recognize prime numbers, but they may err on composite inputs). (The approach of Construction 6.4, which only establishes that deciding primality is in BPP, is commonly attributed to M. Blum.) In the late 1980s, Adleman and Huang [2] proved that deciding primality is in RP (and thus in ZPP). In the early 2000s, Agrawal, Kayal, and Saxena [3] showed that deciding primality is actually in P. One should note, however, that strong evidence of the fact that deciding primality is in P was actually available from the start: We refer to Miller’s deterministic algorithm [166], which relies on the Extended Riemann Hypothesis.
227
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
The second example: undirected connectivity. Another celebrated example of the power of randomization, specifically in the context of logspace computations, was provided by testing undirected connectivity. The randomwalk algorithm presented in Construction 6.12 is due to Aleliunas, Karp, Lipton, Lov´asz, and Rackoff [5]. Recall that a deterministic logspace algorithm was found twentyfive years later (see Section 5.2.4 or [190]). Another famous example: polynomial identity testing. A third famous example, which dates back to about the same period, is the polynomial identity tester of [65, 199, 243]. This tester, presented in §6.1.3.1, has found many applications in Complexity Theory (some are implicit in subsequent chapters). Needless to say, in the abstract setting of Construction 6.7, randomization is indispensable. Interestingly, the computational version mentioned in Exercise 6.17 has so far resisted derandomization attempts (cf. [134]). Other randomized algorithms. In addition to the three foregoing examples, several other appealing randomized algorithms are known. Confining ourselves to the context of search and decision problems, we mention the algorithms for finding perfect matchings and minimum cuts in graphs (see, e.g., [90, Apdx. B.1] or [168, Sec. 12.4 and Sec. 10.2]), and note the prominent role of randomization in computational number theory (see, e.g., [24] or [168, Chap. 14]). We mention that randomized algorithms are more abundant in the context of approximation problems, and, in particular, for approximate counting problems (cf., e.g., [168, Chap. 11]). For a general textbook on randomized algorithms, we refer the interested reader to [168]. While it can be shown that randomization is essential in several important computational settings (cf., e.g., Chapter 9, Section 10.1.2, Appendix C, and Appendix D.3), a fundamental question is whether randomization is essential in the context of search and decision problems. The prevailing conjecture is that randomization is of limited help in the context of timebounded and spacebounded algorithms. For example, it is conjectured that BPP = P and BPL = L. Note that such conjectures do not rule out the possibility that randomization is also helpful in these contexts; they merely says that this help is limited. For example, it may be the case that any quadratictime randomized algorithm can be emulated by a cubictime deterministic algorithm, but not by a quadratictime deterministic algorithm. On the study of BPP. The conjecture BPP = P is referred to as a full derandomization of BPP, and can be shown to hold under some reasonable intractability assumptions. This result (and related ones) will be presented in Section 8.3. In the current chapter, we only presented unconditional results regarding BPP like BPP ⊂ P/poly and BPP ⊆ PH. Our presentation of Theorem 6.9 follows the proof idea of Lautemann [150]. A different proof technique, which yields a weaker result but found more applications (see, e.g., Theorems 6.27 and F.2), was presented (independently) by Sipser [207]. On the role of promise problems. In addition to their use in the formulation of Theorem 6.9, promise problems allow for establishing complete problems and hierarchy theorems for randomized computation (see Exercises 6.14 and 6.15, respectively). We mention that such results are not known for the corresponding classes of standard decision problems. The technical difficulty is that we do not know how to enumerate and/or recognize probabilistic machines that utilize a nontrivial probabilistic decision rule.
228
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER NOTES
On the feasibility of randomized computation. Different perspectives on this question are offered by Chapter 8 and Appendix D.4. Specifically, as advocated in Chapter 8, generating uniformly distributed bit sequences is not really necessary for implementing randomized algorithms; it suffices to generate sequences that look (to their user) as if they are uniformly distributed. In many cases this leads to reducing the number of coin tosses in such implementations, and at times even to a full (efficient) derandomization (see Sections 8.3 and 8.4). A less radical approach is presented in Appendix D.4, which deals with the task of extracting almost uniformly distributed bit sequences from sources of weak randomness. Needless to say, these two approaches are complementary and can be combined.
Counting Problems The counting class #P was introduced by Valiant [230], who proved that computing the permanent of 0/1matrices is #Pcomplete (i.e., Theorem 6.20). Interestingly, like in the case of Cook’s introduction of NPcompleteness [58], Valiant’s motivation was determining the complexity of a specific problem (i.e., the permanent). Our presentation of Theorem 6.20 is based both on Valiant’s paper [230] and on subsequent studies (most notably [31]). Specifically, the highlevel structure of the reduction presented in Proposition 6.21 as well as the “structured” design of the clause gadget is taken from [230], whereas the Deus Ex Machina gadget presented in Figure 6.3 is based on [31]. The proof of Proposition 6.22 is also based on [31] (with some variants). Turning back to the design of clause gadgets, we regret not being able to cite and/or use a systematic study of this design problem. As noted in the main text, we decided not to present a proof of Toda’s Theorem [220], which asserts that every set in PH is Cookreducible to #P (i.e., Theorem 6.16). Appendix F.1 contains a proof of a related result, which implies that PH is reducible to #P via probabilistic polynomialtime reductions. Alternative proofs can be found in [136, 212, 220]. Approximate counting and related problems. The approximation procedure for #P is due to Stockmeyer [214], following an idea of Sipser [207]. Our exposition, however, follows further developments in the area. The randomized reduction of N P to problems of unique solutions was discovered by Valiant and Vazirani [233]. Again, our exposition is a bit different. The connection between approximate counting and uniform generation (presented in §6.2.4.1) was discovered by Jerrum, Valiant, and Vazirani [132], and turned out to be very useful in the design of algorithms (e.g., in the “Markov Chain approach” (see [168, Sec. 11.3.1])). The direct procedure for uniform generation (presented in §6.2.4.2) is taken from [28]. In continuation of §6.2.2.1, which is based on [140], we refer the interested reader to [131], which presents a probabilistic polynomialtime algorithm for approximating the permanent of nonnegative matrices. This fascinating algorithm is based on the fact that knowing (approximately) certain parameters of a nonnegative matrix M allows for approximating the same parameters for a matrix M , provided that M and M are sufficiently similar. Specifically, M and M may differ only on a single entry, and the ratio of the corresponding values must be sufficiently close to one. Needless to say, the actual observation (is not generic but rather) refers to specific parameters of the matrix, which include its permanent. Thus, given a matrix M for which we need to approximate
229
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
the permanent, we consider a sequence of matrices M0 , . . . , Mt ≈ M such that M0 is the all 1’s matrix (for which it is easy to evaluate the said parameters), and each Mi+1 is obtained from Mi by reducing some adequate entry by a factor sufficiently close to one. This process of (polynomially many) gradual changes allows for transforming the dummy matrix M0 into a matrix Mt that is very close to M (and hence has a permanent that is very close to the permanent of M). Thus, approximately obtaining the parameters of Mt allows for approximating the permanent of M. Finally, we mention that Section 10.1.1 provides a treatment of a different type of approximation problems. Specifically, when given an instance x (for a search problem R), rather than seeking an approximation of the number of solutions (i.e., #R(x)), one seeks an approximation of the value of the best solution (i.e., best y ∈ R(x)), where the value of a solution is defined by an auxiliary function.
Exercises Exercise 6.1: Show that if a search (resp., decision) problem can be solved by a probabilistic polynomialtime algorithm having zero failure probability, then the problem can be solve by a deterministic polynomialtime algorithm. (Hint: Replace the internal coin tosses by a fixed outcome that is easy to generate deterministically (e.g., the allzero sequence).)
Exercise 6.2 (randomized reductions): In continuation of the definitions presented in Section 6.1.1, prove the following: 1. If a problem is probabilistic polynomialtime reducible to a problem that is solvable in probabilistic polynomial time then is solvable in probabilistic polynomial time, where by solving we mean solving correctly except with negligible probability. Warning: Recall that in the case that is a search problem, we required that on input x the solver provides a correct solution with probability at least 1 − µ(x), but we did not require that it always returns the same solution. (Hint: without loss of generality, the reduction does not make the same query twice.) 2. Prove that probabilistic polynomialtime reductions are transitive. 3. Prove that randomized Karpreductions are transitive and that they yield a special case of probabilistic polynomialtime reductions. Define onesided error and zerosided error randomized (Karp and Cook) reductions, and consider the foregoing items when applied to them. Note that the implications for the case of onesided error are somewhat subtle. Exercise 6.3 (on the definition of probabilistically solving a search problem): In continuation of the discussion at the beginning of Section 6.1.2, suppose that for some probabilistic polynomialtime algorithm A and a positive polynomial p the def following holds: For every x ∈ S R = {z : R(z) = ∅} there exists y ∈ R(x) such that Pr[A(x) = y] > 0.5 + (1/ p(x)), whereas for every x ∈ S R it holds that Pr[A(x) = ⊥] > 0.5 + (1/ p(x)). 1. Show that there exists a probabilistic polynomialtime algorithm that solves the search problem of R with negligible error probability. (Hint: See Exercise 6.4 for a related procedure.)
230
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
2. Reflect on the need to require that one (correct) solution occurs with probability greater than 0.5 + (1/ p(x)). Specifically, what can we do if it is only guaranteed that for every x ∈ S R it holds that Pr[A(x) ∈ R(x)] > 0.5 + (1/ p(x)) (and for every x ∈ S R it holds that Pr[A(x) = ⊥] > 0.5 + (1/ p(x)))? Note that R is not necessarily in PC. Indeed, in the case that R ∈ PC we can eliminate the error probability for every x ∈ S R , and perform error reduction for x ∈ S R as in the case of RP. Exercise 6.4 (error reduction for BPP): For ε : N → [0, 1], let BPP ε denote the class of decision problems that can be solved in probabilistic polynomial time with error probability upperbounded by ε. Prove the following two claims: 1. For every positive polynomial p and ε(n) = (1/2) − (1/ p(n)), the class BPP ε equals BPP. 2. For every positive polynomial p and ε(n) = 2− p(n) , the class BPP equals BPP ε . Formulate a corresponding version for the setting of search problems. Specifically, for every input that has a solution, consider the probability that a specific solution is output. Guideline: Given an algorithm A for the syntactically weaker class, consider an algorithm A that on input x invokes A on x for t(x) times, and rules by majority. For Part 1 set t(n) = O( p(n)2 ) and apply Chebyshev’s Inequality. For Part 2 set t(n) = O( p(n)) and apply the Chernoff Bound.
Exercise 6.5 (error reduction for RP): For ρ : N → [0, 1], we define the class of decision problem RP ρ such that it contains S if there exists a probabilistic polynomialtime algorithm A such that for every x ∈ S it holds that Pr[A(x) = 1] ≥ ρ(x) and for every x ∈ S it holds that Pr[A(x) = 0] = 1. Prove the following two claims: 1. For every positive polynomial p, the class RP 1/ p equals RP. 2. For every positive polynomial p, the class RP equals RP ρ , where ρ(n) = 1 − 2− p(n) . (Hint: The onesided error allows for using an “orrule” (rather than a “majorityrule”) for the decision.)
Exercise 6.6 (error reduction for ZPP): For ρ : N → [0, 1], we define the class of decision problem ZPP ρ such that it contains S if there exists a probabilistic polynomialtime algorithm A such that for every x it holds that Pr[A(x) = χ S (x)] ≥ ρ(x) and Pr[A(x) ∈ {χ S (x), ⊥}] = 1, where χ S (x) = 1 if x ∈ S and χ S (x) = 0 otherwise. Prove the following two claims: 1. For every positive polynomial p, the class ZPP 1/ p equals ZPP. 2. For every positive polynomial p, the class ZPP equals ZPP ρ , where ρ(n) = 1 − 2− p(n) . Exercise 6.7 (an alternative definition of ZPP): We say that the decision problem S is solvable in expected probabilistic polynomial time if there exists a randomized algorithm A and a polynomial p such that for every x ∈ {0, 1}∗ it holds that Pr[A(x) = χ S (x)] = 1 and the expected number of steps taken by A(x) is at most p(x). Prove that S ∈ ZPP if and only if S is solvable in expected probabilistic polynomial time.
231
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
Guideline: Repeatedly invoking a ZPP algorithm until it yields an output other than ⊥ yields an expected probabilistic polynomialtime solver. On the other hand, truncating runs of an expected probabilistic polynomialtime algorithm once they exceed twice the expected number of steps (and outputting ⊥ on such runs), we obtain a ZPPalgorithm.
Exercise 6.8: Prove that for every S ∈ N P there exists a probabilistic polynomialtime algorithm A such that for every x ∈ S it holds that Pr[A(x) = 1] > 0 and for every x ∈ S it holds that Pr[A(x) = 0] = 1. That is, A has error probability at most 1 − exp(−poly(x)) on yesinstances but never errs on noinstances. Thus, N P may be fictitiously viewed as having a huge onesided error probability. Exercise 6.9: Let BPP and coRP be classes of promise problems (as in Theorem 6.9). 1. Prove that every problem in BPP is reducible to the set {1} ∈ P by a twosided error randomized Karpreduction. 2. Prove that if a set S is Karpreducible to RP (resp., coRP) via a deterministic reduction then S ∈ RP (resp., S ∈ coRP). Exercise 6.10 (randomnessefficient error reductions): Note that standard error reduction (as in Exercise 6.4) yields error probability δ at the cost of increasing the randomness complexity by a factor of O(log(1/δ)). Using the randomnessefficient error reductions outlined in §D.4.1.3, show that error probability δ can be obtained at the cost of increasing the randomness complexity from r to O(r ) + 1.5 log2 (1/δ). Note that this allows for satisfying the hypothesis made in the illustrative paragraph of the proof of Theorem 6.9. Exercise 6.11: In continuation of the illustrative paragraph in the proof of Theorem 6.9, consider the promise problem = (yes , no ) such that yes = {(x, r ) : r  = p (x) ∧ (∀r ∈ {0, 1}r  ) A (x, r r ) = 1} and no = {(x, r ) : x ∈ S}. Recall that for every x it holds that Prr ∈{0,1}2 p (x) [A (x, r ) = χ S (x)] < 2−( p (x)+1) .
1. Show that mapping x to (x, r ), where r is uniformly distributed in {0, 1} p (x) , constitutes a onesided error randomized Karpreduction of S to . 2. Show that is in the promise problem class coRP. Exercise 6.12 (randomized versions of N P): In continuation of footnote 7, consider the following two variants of MA (which we consider the main randomized version of N P). 1. S ∈ MA(1) if there exists a probabilistic polynomialtime algorithm V such that for every x ∈ S there exists y ∈ {0, 1}poly(x) such that Pr[V (x, y) = 1] ≥ 1/2, whereas for every x ∈ S and every y it holds that Pr[V (x, y) = 0] = 1. 2. S ∈ MA(2) if there exists a probabilistic polynomialtime algorithm V such that for every x ∈ S there exists y ∈ {0, 1}poly(x) such that Pr[V (x, y) = 1] ≥ 2/3, whereas for every x ∈ S and every y it holds that Pr[V (x, y) = 0] ≥ 2/3. Prove that MA(1) = N P whereas MA(2) = MA. Guideline: For the first part, note that a sequence of internal coin tosses that makes V accept (x, y) can be incorporated into y itself (yielding a standard NPwitness). For the second part, apply the ideas underlying the proof of Theorem 6.9, and note that an adequate sequence of shifts (to be used by the verifier) can be incorporated in the single message sent by the prover.
232
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Exercise 6.13 (BPP ⊆ ZPP N P ): In continuation of the proof of Theorem 6.9, present a zeroerror randomized reduction of BPP to N P, where all classes are the standard classes of decision problems. Guideline: On input x, the ZPPmachine uniformly selects s = (s1 , . . . , sm ), and for each σ ∈ {0, 1} makes the query (x, σ, s), which is answered positively by the (coNP) oracle if for every r it holds that ∨i (A(x, r ⊕ si ) = σ ). The machine outputs σ if and only if the query (x, σ, s) was answered positively, and outputs ⊥ otherwise (i.e., both queries were answered negatively).
Exercise 6.14 (completeness for promise problem versions of BPP): Referring to the promise problem version of BPP, present a promise problem that is complete for this class under (deterministic logspace) Karpreductions. Guideline: The promise problem consists of yesinstances that are Boolean circuits that accept at least a 2/3 fraction of their possible inputs and noinstances that are Boolean circuits that reject at least a 2/3 fraction of their possible inputs. The reduction is essentially the one provided in the proof of Theorem 2.21, and the promise is used in an essential way in order to provide a BPPalgorithm.
Exercise 6.15 (hierarchy theorems for promise problem versions of BPTIME): Fixing a model of computation, let BPTIME(t) denote the class of promise problems that are solvable by a randomized algorithm of time complexity t that has a twosided error probability at most 1/3. (The standard definition refers only to decision problems.) Formulate and prove results analogous to Theorem 4.3 and Corollary 4.4. Guideline (by Dieter van Melkebeek): Apply the “delayed diagonalization” method used to prove Theorem 4.6 rather than the simple diagonalization used in Theorem 4.3. Analogously to the proof of Theorem 4.6, for every σ ∈ {0, 1}, define A M (x) = σ if Pr[M (x) = σ ] ≥ 2/3 and define A M (x) = ⊥ otherwise (i.e., if 1/3 < Pr[M (x) = 1] < 2/3), where M (x) denotes the computation of M(x) truncated after t1 (x) steps. For x ∈ [α M , β M − 1], define f (x) = A M (x + 1), where f (x) = ⊥ means that x violates the promise. Define f (β M ) = 1 if A M (α M ) = 0 and f (β M ) = 0 otherwise (i.e., if A M (α M ) ∈ {1, ⊥}). Note that f (x) is computable 1 (x + 1)) by emulating a single computation of M (x) if in randomized time O(t x ∈ [α M , β M − 1] and emulating all computations of M (α M ) if x = β M . Prove that the promise problem f cannot be solved in randomized time t1 , by noting that β M satisfies the promise and that for every x ∈ [α M + 1, β M ] that satisfies the promise (i.e., f (x) ∈ {0, 1}) it holds that if A M (x) = f (x) then f (x − 1) = A M (x) ∈ {0, 1}.
Exercise 6.16 (extracting square roots modulo a prime): Using the following guidelines, present a probabilistic polynomialtime algorithm that, on input a prime P and a quadratic residue s (mod P), returns r such that r 2 ≡ s (mod P). 1. Prove that if P ≡ 3 (mod 4) then s (P+1)/4 mod P is a square root of the quadratic residue s (mod P). 2. Note that the procedure suggested in Item 1 relies on the ability to find an odd integer e such that s e ≡ 1 (mod P). Indeed, once such an e is found, we may output s (e+1)/2 mod P. (In Item 1, we used e = (P − 1)/2, which is odd since P ≡ 3 (mod 4).) Show that it suffices to find an odd integer e together with a residue t and an even integer e such that s e t e ≡ 1 (mod P), because s ≡ s e+1 t e ≡ (s (e+1)/2 t e /2 )2 .
233
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
3. Given a prime P ≡ 1 (mod 4), a quadratic residue s, and any quadratic nonresidue t (i.e., residue t such that t (P−1)/2 ≡ −1 (mod P)), show that e and e as in Item 2 can be efficiently found.24 4. Prove that, for a prime P, with probability 1/2 a uniformly chosen t ∈ {1, . . . , P} satisfies t (P−1)/2 ≡ −1 (mod P). Note that randomization is used only in the last item, which in turn is used only for P ≡ 1 (mod 4). Exercise 6.17: Referring to the definition of arithmetic circuits (cf. Appendix B.3), show that the following decision problem is in coRP: Given a pair of circuits (C1 , C2 ) of depth d over a field that has more than 2d+1 elements, determine whether the circuits compute the same polynomial. Guideline: Note that each of these circuits computes a polynomial of degree at most 2d .
Exercise 6.18 (smallspace randomized stepcounter): As defined in Exercise 4.7, a stepcounter is an algorithm that halts after issuing a number of “signals” as specified in its input, where these signals are defined as entering (and leaving) a designated state (of the algorithm). Recall that a stepcounter may be run in parallel to another procedure in order to suspend the execution after a predetermined number of steps (of the other procedure) have elapsed. Note that there exists a simple deterministic machine that, on input n, halts after issuing n signals while using O(1) + log2 n space (and O(n) time). The goal of this exercise is presenting a (randomized) stepcounter that allows for many more signals while using the same amount of space. Specifically, present a n ) time) (randomized) algorithm that, on input n, uses O(1) + log2 n space (and O(2 n and halts after issuing an expected number of 2 signals. Furthermore, prove that, with probability at least 1 − 2−k+1 , this stepcounter halts after issuing a number of signals that is between 2n−k and 2n+k . Guideline: Repeat the following experiment till reaching success. Each trial consists of uniformly selecting n bits (i.e., tossing n unbiased coins), and is deemed successful if all bits turn out to equal the value 1 (i.e., all outcomes equal HEAD). Note that such a trial can be implemented by using space O(1) + log2 n (mainly for implementing a standard counter for determining the number of bits). Thus, each trial is successful with probability 2−n , and the expected number of trials is 2n .
Exercise 6.19 (analysis of random walks on arbitrary undirected graphs): In order to complete the proof of Proposition 6.13, prove that if {u, v} is an edge of the graph G = (V, E) then E[X u,v ] ≤ 2E. Recall that, for a fixed graph, X u,v is a random variable representing the number of steps taken in a random walk that starts at the vertex u until the vertex v is first encountered. Guideline: Let Z u,v (n) be a random variable counting the number of minimal paths from u to v that appear along a random walk of length n, where the walk starts at the i
Write (P − 1)/2 = (2 j0 + 1) · 2i0 , and note that s (2 j0 +1)·2 0 ≡ 1 (mod P), which may be written as i i ≡ 1 (mod P). Given that for some i > i > 0 and j it holds that s (2 j0 +1)·2 t (2 j +1)·2 ≡ 1 i−1 i (mod P), show how to find i > i − 1 and j such that s (2 j0 +1)·2 t (2 j +1)·2 ≡ 1 (mod P). (Extra hint: 24
i i +1 s (2 j0 +1)·2 0 t (2 j0 +1)·2 0
i −1
s (2 j0 +1)·2 t (2 j +1)·2 we get what we need. i−1
i
≡ ±1 (mod P) and t (2 j0 +1)·2 0 ≡ −1 (mod P).) Applying this reasoning for i 0 times,
234
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
stationary vertex distribution (which is well defined assuming the graph is not bipartite, which in turn may be enforced by adding a selfloop). On one hand, E[X u,v + X v,u ] = limn→∞ (n/E[Z u,v (n)]), due to the memoryless property of the walk. On the other hand, def th letting χv,u (i) = 1 if the edge {u, v} was traversed from n v to u in the i step of such def a random walk and χv,u (i) = 0 otherwise, we have i=1 χv,u (i) ≤ Z u,v (n) + 1 and E[χv,u (i)] = 1/2E (because, in each step, each directed edge appears on the walk with equal probability). It follows that E[X u,v ] < 2E.
Exercise 6.20 (the class PP ⊇ BPP and its relation to #P): In contrast to BPP, which refers to useful probabilistic polynomialtime algorithms, the class PP does not capture such algorithms but is rather closely related to #P. A decision problem S is in PP if there exists a probabilistic polynomialtime algorithm A such that, for every x, it holds that x ∈ S if and only if Pr[A(x) = 1] > 1/2. Note that BPP ⊆ PP. Prove that PP is Cookreducible to #P and vice versa. Guideline: For S ∈ PP (by virtue of the algorithm A), consider the relation R such that (x, r ) ∈ R if and only if A accepts the input x when using the randominput r ∈ {0, 1} p(x) , where p is a suitable polynomial. Thus, x ∈ S if and only if R(x) > 2 p(x)−1 , which in turn can de determined by querying the counting function of R. To reduce f ∈ #P to PP, consider the relation R ∈ PC that is counted by f (i.e., f (x) = R(x)) and the decision problem S f as defined in Proposition 6.15. Let p be the polynomial specifying the length of solutions for R (i.e., (x, y) ∈ R implies y = p(x)), and consider the following algorithm A : On input (x, N ), with probability 1/2, algorithm A uniformly selects y ∈ {0, 1} p(x) and accepts if and only if (x, y) ∈ R, and otherwise (i.e., with the remaining probability of 1/2) algorithm A accepts with probap(x) +0.5 . Prove that (x, N ) ∈ S f if and only if Pr[A (x) = 1] > 1/2. bility exactly 2 2−N p(x)
Exercise 6.21 (enumeration problems): For any binary relation R, define the enumeration problem of R as a function f R : {0, 1}∗ × N → {0, 1}∗ ∪ {⊥} such that f R (x, i) equals the i th element in R(x) if R(x) ≥ i and f R (x, i) = ⊥ otherwise. The above definition refers to the standard lexicographic order on strings, but any other efficient order of strings will do.25 1. Prove that, for any polynomially bounded R, computing #R is reducible to computing fR. 2. Prove that, for any R ∈ PC, computing f R is reducible to some problem in #P. Guideline: Consider the binary relation R = {(x, b, y) : (x, y) ∈ R ∧ y ≤ b}, and show that f R is reducible to #R . (Extra hint: Note that f R (x, i) = y if and only if R (x, y) = i and for every y < y it holds that R (x, y ) < i.)
Exercise 6.22 (artificial #Pcomplete problems): Show that there exists a relation R ∈ PC such that #R is #Pcomplete and S R = {0, 1}∗ . Furthermore, prove that for every R ∈ PC there exists R ∈ PF ∩ PC such that for every x it holds that #R(x) = #R (x) + 1. Note that Theorem 6.19 follows by starting with any relation R ∈ PC such that #R is #Pcomplete. 25
An order of strings is a 11 and onto mapping µ from the natural numbers to the set of all strings. Such order is called efficient if both µ and its inverse are efficiently computable. The standard lexicographic order satisfies µ(i) = y if the string 1y is the (compact) binary expansion of the integer i; that is µ(1) = λ, µ(2) = 0, µ(3) = 1, µ(4) = 00, etc.
235
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
Exercise 6.23 (computing the permanent of integer matrices): Prove that computing the permanent of matrices with 0/1entries is computationally equivalent to computing the number of perfect matchings in bipartite graphs. Guideline: Given a bipartite graph G = ((X, Y ), E), consider the matrix M representing the edges between X and Y (i.e., the (i, j)entry in M is 1 if the i th vertex of X is connected to the j th entry of Y ), and note that only perfect matchings in G contribute to the permanent of M.
Exercise 6.24 (computing the permanent modulo 3): Combining Proposition 6.21 and Theorem 6.29, prove that for every fixed n > 1 that does not divide any power of c, computing the permanent modulo n is NPhard under randomized reductions. Since Proposition 6.21 holds for c = 210 , hardness holds for every integer n > 1 that is not a power of 2. (We mention that, on the other hand, for any fixed n = 2e , the permanent modulo n can be computed in polynomial time [230, Thm. 3].) Guideline: Apply the reduction of Proposition 6.21 to the promise problem of deciding whether a 3CNF formula has a unique satisfiable assignment or is unsatisfiable. Note that for any m it holds that cm ≡ 0 (mod n).
Exercise 6.25 (negative values in Proposition 6.21): Assuming P = N P, prove that Proposition 6.21 cannot hold for a set I containing only nonnegative integers. Note that the claim holds even if the set I is not finite (and even if I is the set of all nonnegative integers). Guideline: A reduction as in Proposition 6.21 yields a Karpreduction of 3SAT to deciding whether the permanent of a matrix with entries in I is nonzero. Note that the permanent of a nonnegative matrix is nonzero if and only if the corresponding bipartite graph has a perfect matching.
Exercise 6.26 (highlevel analysis of the permanent reduction): Establish the correctness of the highlevel reduction presented in the proof of Proposition 6.21. That is, show that if the clause gadget satisfies the three conditions postulated in the said proof, then each satisfying assignment of φ contributes exactly cm to the SWCC of G φ whereas unsatisfying assignments have no contribution. Guideline: Cluster the cycle covers of G φ according to the set of track edges that they use (i.e., the edges of the cycle cover that belong to the various tracks). (Note the correspondence between these edges and the external edges used in the definition of the gadget’s properties.) Using the postulated conditions (regarding the clause gadget) prove that, for each such set T of track edges, if the sum of the weights of all cycle covers that use the track edges T is nonzero then the following hold: 1. The intersection of T with the set of track edges incident at each specific clause gadget is nonempty. Furthermore, if this set contains an incoming edge (resp., outgoing edge) of some entry vertex (resp., exit vertex) then it also contains an outgoing edge (resp., incoming edge) of the corresponding exit vertex (resp., entry vertex). 2. If T contains an edge that belongs to some track then it contains all edges of this track. It follows that, for each variable x, the set T contains the edges of a single track associated with x.
236
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
3. The tracks “picked” by T correspond to a single truth assignment to the variables of φ, and this assignment satisfies φ (because, for each clause, T contains an external edge that corresponds to a literal that satisfies this clause). Note that different sets of the aforementioned type yield different satisfying assignments, and that each satisfying assignment is obtained from some set of the aforementioned type.
Exercise 6.27 (analysis of the implementation of the clause gadget): Establish the correctness of the implementation of the clause gadget presented in the proof of Proposition 6.21. That is, show that if the box satisfies the three conditions postulated in the said proof, then the clause gadget of Figure 6.4 satisfies the conditions postulated for it. Guideline: Cluster the cycle covers of a gadget according to the set of nonbox edges that they use, where nonbox edges are the edges shown in Figure 6.4. Using the postulated conditions (regarding the box) prove that, for each set S of nonbox edges, if the sum of the weights of all cycle covers that use the nonbox edges S is nonzero then the following hold: 1. The intersection of S with the set of edges incident at each box must contain two (nonselfloop) edges, one incident at each of the box’s terminals. Needless to say, one edge is incoming and the other outgoing. Referring to the six edges that connects one of the six designated vertices (of the gadget) with the corresponding box terminals as connectives, note that if S contains a connective incident at the terminal of some box then it must also contain the connective incident at the other terminal. In such a case, we say that this box is picked by S. 2. Each of the three (literaldesignated) boxes that is not picked by S is “traversed” from left to right (i.e., the cycle cover contains an incoming edge of the left terminal and an outgoing edge of the right terminal). Thus, the set S must contain a connective, because otherwise no directed cycle may cover the leftmost vertex shown in Figure 6.4. That is, S must pick some box. 3. The set S is fully determined by the nonempty set of boxes that it picks. The postulated properties of the clause gadget follow, with c = b5 .
Exercise 6.28 (analysis of the design of a box for the clause gadget): Prove that the 4by4 matrix presented in Eq. (6.4) satisfies the properties postulated for the “box” used in the second part of the proof of Proposition 6.21. In particular: 1. Show a correspondence between the conditions required of the box and conditions regarding the value of the permanent of certain submatrices of the adjacency matrix of the graph. (Hint: For example, show that the first condition corresponds to requiring that the value of the permanent of the entire matrix equals zero. The second condition refers to submatrices obtained by omitting either the first row and fourth column or the fourth row and first column.)
2. Verify that the matrix in Eq. (6.4) satisfies the aforementioned conditions (regarding the value of the permanent of certain submatrices). Prove that no 3by3 matrix (and thus also no 2by2 matrix) can satisfy the aforementioned conditions.
237
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
Exercise 6.29 (error reduction for approximate counting): Show that the error probability δ in Definition 6.24 can be reduced from 1/3 (or even (1/2) + (1/poly(x)) to exp(−poly(x)). Guideline: Invoke the weaker procedure for an adequate number of times and take the median value among the values obtained in these invocations.
Exercise 6.30 (approximation and gap problems): Let f : {0, 1}∗ → N be a polynomially bounded function (i.e.,  f (x) = poly(x)) and ε : N → [0, 1] be a noticeable function that is polynomialtime computable. Prove that the search problem associdef ated with the relation R f,ε = {(x, v) : v − f (x) ≤ ε(x) · f (x)} is computationally equivalent to the promise (“gap”) problem of distinguishing elements of the set {(x, v) : v < (1 − ε(x)) · f (x)} and elements of the set {(x, v) : v > (1 + ε(x)) · f (x)}. Exercise 6.31 (strong approximation for some #Pcomplete problems): Show that there exists #Pcomplete problems (albeit unnatural ones) for which an (ε, 0)approximation can be found by a (deterministic) polynomialtime algorithm. Furthermore, the running time depends polynomially on 1/ε. Guideline: Combine any #Pcomplete problem referring to some R1 ∈ PC with a trivial counting problem (e.g., the counting problem associated with the trivial relation R2 = ∪n∈N {(x, y) : x, y ∈ {0, 1}n }). Show that, without loss of generality, it holds that #R1 (x) ≤ 2x/2 . Prove that the counting problem of R = {(x, 1y) : (x, y) ∈ R1 } ∪ {(x, 0y) : (x, y) ∈ R2 } is #Pcomplete (by reducing from #R1 ). Present a deterministic algorithm that, on input x and ε > 0, outputs an (ε, 0)approximation of #R(x) in time poly(x/ε). (Extra hint: distinguish between ε ≥ 2−x/2 and ε < 2−x/2 .)
Exercise 6.32 (relative approximation for DNF satisfaction): Referring to the text of §6.2.2.1, prove the following claims. 1. Both assumptions regarding the general setting hold in case Si = Ci−1 (1), where Ci−1 (1) denotes the set of truth assignments that satisfy the conjunction Ci . Guideline: In establishing the second assumption note that it reduces to the conjunction of the following two assumptions: (a) Given i, one can efficiently generate a uniformly distributed element of Si . Actually, generating a distribution that is almost uniform over Si suffices. (b) Given i and x, one can efficiently determine whether x ∈ Si .
2. Prove Proposition 6.26, relating to details such as the error probability in an implementation of Construction 6.25. 3. Note that Construction 6.25 does not require exact computation of Si . Analyze the output distribution in the case that we can only approximate Si  up to a factor of 1 ± ε . Exercise 6.33 (reducing the relative deviation in approximate counting): Prove that, for any R ∈ PC and every polynomial p and constant δ < 0.5, there exists R ∈ PC such that (1/ p, δ)approximation for #R is reducible to (1/2, δ)approximation for #R . Furthermore, for any F(n) = exp(poly(n)), prove that there exists R ∈ PC such that (1/ p, δ)approximation for #R is reducible to approximating #R to within a factor of F with error probability δ.
238
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Guideline (for the main part): For t(n) = ( p(n)), define R such that (y1 , . . . , yt(x) ) ∈ R (x) if and only if (∀i) yi ∈ R(x). Note that R(x) = R (x)1/t(x) , and thus if a = (1 ± (1/2)) · R (x) then a 1/t(x) = (1 ± (1/2))1/t(x) · R(x).
Exercise 6.34 (deviation reduction in approximate counting, cont.): In continuation of Exercise 6.33, prove that if R is NPcomplete via parsimonious reductions then, for every positive polynomial p and constant δ < 0.5, the problem of (1/ p, δ)approximation for #R is reducible to (1/2, δ)approximation for #R. (Hint: Compose the reduction (to the problem of (1/2, δ)approximation for #R ) provided in Exercise 6.33 with the parsimonious reduction of #R to #R.)
Prove that, for every function F such that F (n) = exp(n o(1) ), we can also reduce the aforementioned problems to the problem of approximating #R to within a factor of F with error probability δ. Guideline: Using R as in Exercise 6.33, we encounter a technical difficulty. The issue is that the composition of the (“amplifying”) reduction of #R to #R with the parsimonious reduction of #R to #R may increase the length of the instance. Indeed, the length of the new instance is polynomial in the length of the original instance, but this polynomial may depend on R , which in turn depends on F . Thus, we cannot use F (n) = exp(n 1/O(1) ) but F (n) = exp(n o(1) ) is fine.
Exercise 6.35: Referring to the procedure in the proof Theorem 6.27, show how to use an NPoracle in order to determine whether the number of solutions that “pass a random sieve” is greater than t. You are allowed queries of length polynomial in the length of x, h and in the size of t. def
Guideline: Consider the set S R,H = {(x, i, h, 1t ) : ∃y1 , . . . , yt s.t. ψ (x, h, y1 , . . . , yt )}, where ψ (x, h, y1 , . . . , yt ) holds if and only if the y j are different and for every j it holds that (x, y j ) ∈ R ∧ h(y j ) = 0i .
Exercise 6.36 (parsimonious reductions and Theorem 6.29): Demonstrate the importance of parsimonious reductions in Theorem 6.29 by proving that there exists a search problem R ∈ PC such that every problem in PC is reducible to R (by a nonparsimonious reduction) and still the the promise problem (US R , S R ) is decidable in polynomial time. Guideline: Consider the following artificial witness relation R for SAT in which (φ, σ τ ) ∈ R if σ ∈ {0, 1} and τ satisfies φ. Note that the standard witness relation of SAT is reducible to R, but this reduction is not parsimonious. Also note that US R = ∅ and thus (US R , S R ) is trivial.
Exercise 6.37: In continuation of Exercise 6.36, prove that there exists a search problem R ∈ PC such that #R is #Pcomplete and still the promise problem (US R , S R ) is decidable in polynomial time. Provide one proof for the case that R is PCcomplete and another proof for R ∈ PF. Guideline: For the first case, the relation R suggested in the guideline to Exercise 6.36 will do. For the second case, rely on Theorem 6.20 and on the fact that it is easy to decide (US R , S R ) when R is the corresponding perfect matching relation (by computing the determinant).
Exercise 6.38: Prove that SAT is randomly reducible to deciding unique solutions for SAT, without using the fact that SAT is NPcomplete via parsimonious reductions.
239
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
RANDOMNESS AND COUNTING
Guideline: Follow the proof of Theorem 6.29, while using the family of pairwise independent hashing functions provided in Construction D.3. Note that, in this case, the condition (τ ∈ RSAT (φ)) ∧ (h(τ ) = 0i ) can be directly encoded as a CNF formula. def That is, consider the formula φh such that φh (z) = φ(z) ∧ (h(z) = 0i ), and note that i h(z) = 0 can be written as the conjunction of i conditions, where each condition is a CNF that is logically equivalent to the parity of some of the bits of z (where the identity of these bits is determined by h).
Exercise 6.39 (search problems with few solutions): For any R ∈ PC and any polynodef mial p, consider the promise problem (FewS R, p , S R ), where FewS R, p = {x : R(x) ≤ p(x)}. 1. Show that, for every R ∈ PC, the (approximate counting) procedure that is presented in the proof of Theorem 6.27 implies a randomized reduction of S R to (FewS R ,100 , S R ), where R (x, i, h) = {y ∈ R(x) : h(x) = 0i }. 2. For any R ∈ PC and any polynomial p , present a randomized reduction of (FewS R , p , S R ) to (US R , S R ), where R (x , m) = {y1 < y2 < · · · < ym : (∀ j) y j ∈ R (x )}. Guideline: Map x to (x , m), where m is uniformly selected in [ p (x )].
Exercise 6.40: Show that the search problem associated with Perfect Matching satisfies the hypothesis of Theorem 6.31. Guideline: For a given graph G = (V, E), encode each matching as an Ebit long string such that the i th is set to 1 if and only if the i th edge is in the matching.
Exercise 6.41 (an alternative procedure for approximate counting): Adapt Step 1 of Construction 6.32 so as to obtain an approximate counting procedure for #R. Guideline: For m = 0, 1, . . . , the procedure invokes Step 1 of Construction 6.32 until a negative answer is obtained, and outputs 120 · 2m for the current value of m. For R(x) > 80, this yields a constant factor approximation of R(x). In fact, we can obtain a better estimate by making additional queries at iteration m (i.e., queries of the form (x, h, 1i ) for i = 10, . . . , 120). The case R(x) ≤ 80 can be treated by using Step 2 of Construction 6.32, in which case we obtain an exact count.
Exercise 6.42: Let R be an arbitrary PCcomplete search problem. Show that approximate counting and uniform generation for R can be randomly reduced to deciding membership in S R , where by approximate counting we mean a (1 − (1/ p)approximation for any polynomial p. Guideline: Note that Construction 6.32 yields such procedures (see also Exercise 6.41), except that they make oracle calls to some other set in N P. Using the NPcompleteness of S R , we are done.
Exercise 6.43: Present a probabilistic polynomialtime algorithm that solves the uniform generation problem of Rdnf , where (φ, τ ) ∈ Rdnf if φ is a DNF formula and τ is an assignment satisfying it. Guideline: Consider the set {(i, e) : i ∈ [m] ∧ e ∈ Si }, where the Si ’s are as in §6.2.2.1.
240
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER SEVEN
The Bright Side of Hardness
So saying she donned her beautiful, glittering golden–Ambrosial sandals, which carry her flying like the wind over the vast land and sea; she grasped the redoubtable bronzeshod spear, so stout and sturdy and strong, wherewith she quells the ranks of heroes who have displeased her, the [brighteyed] daughter of her mighty father. Homer, Odyssey, 1:96–101 The existence of natural computational problems that are (or seem to be) infeasible to solve is usually perceived as bad news, because it means that we cannot do things we wish to do. But this bad news has a positive side, because hard problems can be “put to work” to our benefit, most notably in cryptography. It seems that utilizing hard problems requires the ability to efficiently generate hard instances, which is not guaranteed by the notion of worstcase hardness. In other words, we refer to the gap between “occasional” hardness (e.g., worstcase hardness or mild averagecase hardness) and “typical” hardness (with respect to some tractable distribution). Much of the current chapter is devoted to bridging this gap, which is known by the term hardness amplification. The actual applications of typical hardness are presented in Chapter 8 and Appendix C. Summary: We consider two conjectures that are related to P = N P. The first conjecture is that there are problems that are solvable in exponential time (i.e., in E) but are not solvable by (nonuniform) families of small (say, polynomialsize) circuits. We show that this worstcase conjecture can be transformed into an averagecase hardness result; specifically, we obtain predicates that are strongly “inapproximable” by small circuits. Such predicates are used toward derandomizing BPP in a nontrivial manner (see Section 8.3). The second conjecture is that there are problems in NP (i.e., search problems in PC) for which it is easy to generate (solved) instances that are typically hard to solve (for a party that did not generate these instances). This conjecture is captured in the formulation of oneway functions, which are functions that are easy to evaluate but hard to invert (in an averagecase sense). We show that functions that are hard to invert in a relatively mild averagecase sense yield functions that are hard to invert in a strong averagecase sense, and that the latter
241
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
yield predicates that are very hard to approximate (called hardcore predicates). Such predicates are useful for the construction of generalpurpose pseudorandom generators (see Section 8.2) as well as for a host of cryptographic applications (see Appendix C). In the rest of this chapter, the actual order of presentation of the two aforementioned conjectures and their consequences is reversed: We start (in Section 7.1) with the study of oneway functions, and only later (in Section 7.2) turn to the study of problems in E that are hard for small circuits. Teaching note: We list several reasons for preferring the aforementioned order of presentation. First, we mention the great conceptual appeal of oneway functions and the fact that they have very practical applications. Second, hardness amplification in the context of oneway functions is technically simpler than the amplification of hardness in the context of E. (In fact, Section 7.2 is the most technical text in this book.) Third, some of the techniques that are shared by both treatments seem easier to understand first in the context of oneway functions. Last, the current order facilitates the possibility of teaching hardness amplification only in one incarnation, where the context of oneway functions is recommended as the incarnation of choice (for the aforementioned reasons). If you wish to teach hardness amplification and pseudorandomness in the two aforementioned incarnations, then we suggest following the order of the current text. That is, first teach hardness amplification in its two incarnations, and only next teach pseudorandomness in the corresponding incarnations.
Prerequisites. We assume a basic familiarity with elementary probability theory (see Appendix D.1) and randomized algorithms (see Section 6.1). In particular, standard conventions regarding random variables (presented in Appendix D.1.1) and various “laws of large numbers” (presented in Appendix D.1.2) will be extensively used.
7.1. OneWay Functions Loosely speaking, oneway functions are functions that are easy to evaluate but hard (on the average) to invert. Thus, in assuming that oneway functions exist, we are postulating the existence of efficient processes (i.e., the computation of the function in the forward direction) that are hard to reverse. Analogous phenomena in daily life are known to us in abundance (e.g., the lighting of a match). Thus, the assumption that oneway functions exist is a complexity theoretic analogue of our daily experience. Oneway functions can also be thought of as efficient ways for generating “puzzles” that are infeasible to solve; that is, the puzzle is a random image of the function and a solution is a corresponding preimage. Furthermore, the person generating the puzzle knows a solution to it and can efficiently verify the validity of (possibly other) solutions to the puzzle. In fact, as explained in Section 7.1.1, every mechanism for generating such puzzles can be converted to a oneway function. The reader may note that when presented in terms of generating hard puzzles, oneway functions have a clear cryptographic flavor. Indeed, oneway functions are central to cryptography, but we shall not explore this aspect here (and rather refer the reader to Appendix C). Similarly, oneway functions are closely related to (generalpurpose)
242
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.1. ONEWAY FUNCTIONS
pseudorandom generators, but this connection will be explored in Section 8.2. Instead, in the current section, we will focus on oneway functions per se. Teaching note: While we recommend including a basic treatment of pseudorandomness within a course on Complexity Theory, we do not recommend doing so with respect to cryptography. The reason is that cryptography is far more complex than pseudorandomness (e.g., compare the definition of secure encryption to the definition of pseudorandom generators). The extra complexity is due to conceptual richness, which is something good, except that some of these conceptual issues are central to cryptography but not to Complexity Theory. Thus, teaching cryptography in the context of a course on Complexity Theory is likely to either overload the course with material that is not central to Complexity Theory or cause a superficial and misleading treatment of cryptography. We are not sure as to which of these two possibilities is worse. Still, for the benefit of the interested reader, we have included an overview of the foundations of cryptography as an appendix to the main text (see Appendix C).
7.1.1. Generating Hard Instances and OneWay Functions Let us start by examining the prophecy, made in the preface to this chapter, by which intractable problems can be used to our benefit. The basic idea is that intractable problems offer a way of generating an obstacle that stands in the way of our opponents and thus protects our interests. These opponents may be either real (e.g., in the context of cryptography) or imaginary (e.g., in the context of derandomization), but in both cases we wish to prevent them from seeing something or doing something. Hard obstacles seem useful toward this goal. Let us assume that P = N P or even that N P is not contained in BPP. Can we use this assumption to our benefit? Not really: The N P ⊆ BPP assumption refers to the worstcase complexity of problems, while benefiting from hard problems seems to require the ability to generate hard instances. In particular, the generated instances should be typically hard and not merely occasionally hard; that is, we seek averagecase hardness and not merely worstcase hardness. Taking a short digression, we mention that in Section 7.2 we shall see that worstcase hardness (of N P or even E) can be transformed into averagecase hardness of E. Such a transformation is not known for N P itself, and in some applications (e.g., in cryptography) we do need the hardontheaverage problem to be in N P. In this case, we currently need to assume that, for some problem in N P, it is the case that hard instances are easy to generate (and not merely exist). That is, we assume that N P is “hard on the average” with respect to a distribution that is efficiently samplable. This assumption will be further discussed in Section 10.2. However, for the aforementioned applications (e.g., in cryptography) this assumption does not seem to suffice either: We know how to utilize such “hard on the average” problems only when we can efficiently generate hard instances coupled with adequate solutions.1 That is, we assume that, for some search problem in PC (resp., decision problem in N P), we can efficiently generate instancesolution pairs (resp., yesinstances coupled with corresponding NPwitnesses) such that the instance is hard to solve (resp., 1
We wish to stress the difference between the two gaps discussed here. Our feeling is that the nonusefulness of worstcase hardness (per se) is far more intuitive than the nonusefulness of averagecase hardness that does not correspond to an efficient generation of “solved” instances.
243
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
hard to verify as belonging to the set). Needless to say, the hardness assumption refers to a person who does not get the solution (resp., witness). Thus, we can efficiently generate hard “puzzles” coupled with solutions, and so we may present to others hard puzzles for which we know a solution. Let us formulate the foregoing discussion. Referring to Definition 2.3, we consider a relation R in PC (i.e., R is polynomially bounded and membership in R can be determined in polynomial time), and assume that there exists a probabilistic polynomialtime algorithm G that satisfies the following two conditions: 1. On input 1n , algorithm G always generates a pair in R such that the first element has length n. That is, Pr[G(1n ) ∈ R ∩ ({0, 1}n × {0, 1}∗ )] = 1. 2. It is typically infeasible to find solutions to instances that are generated by G; that is, when only given the first element of G(1n ), it is infeasible to find an adequate solution. Formally, denoting the first element of G(1n ) by G 1 (1n ), for every probabilistic polynomialtime (solver) algorithm S, it holds that Pr[(G 1 (1n ), S(G 1 (1n ))) ∈ R] = µ(n), where µ vanishes faster than any polynomial fraction (i.e., for every positive polynomial p and all sufficiently large n it is the case that µ(n) < 1/ p(n)). We call G a generator of solved intractable instances for R. We will show that such a generator exists if and only if oneway functions exist, where oneway functions are functions that are easy to evaluate but hard (on the average) to invert. That is, a function f : {0, 1}∗ → {0, 1}∗ is called oneway if there is an efficient algorithm that on input x outputs f (x), whereas any feasible algorithm that tries to find a preimage of f (x) under f may succeed only with negligible probability (where the probability is taken uniformly over the choices of x and the algorithm’s coin tosses). Associating feasible computations with probabilistic polynomialtime algorithms and negligible functions with functions that vanish faster than any polynomial fraction, we obtain the following definition. Definition 7.1 (oneway functions): A function f : {0, 1}∗ → {0, 1}∗ is called oneway if the following two conditions hold: 1. Easy to evaluate: There exists a polynomialtime algorithm A such that A(x) = f (x) for every x ∈ {0, 1}∗ . 2. Hard to invert: For every probabilistic polynomialtime algorithm A , every polynomial p, and all sufficiently large n,
Pr x∈{0,1}n [A ( f (x), 1n ) ∈ f −1 ( f (x))]
1 q(m)
(7.4)
Focusing on such a generic m and assuming (see footnote 3) that m = n 2 p(n), we present the following probabilistic polynomialtime algorithm, A , for inverting f . On input y and 1n (where supposedly y = f (x) for some x ∈ {0, 1}n ), algorithm A proceeds by applying the following probabilistic procedure, denoted I , on input y for t (n) times, where t (·) is a polynomial that depends on the polynomials p and q def (specifically, we set t (n) = 2n 2 · p(n) · q(n 2 p(n))). Procedure I (on input y and 1n ): def
For i = 1 to t(n) = n · p(n) do begin
(1) Select uniformly and independently a sequence of strings x1 , . . . , xt(n) ∈ {0, 1}n . (2) Compute (z 1 , . . . , z t(n) ) ← B ( f (x1 ), . . . , f (xi−1 ), y, f (xi+1 ), . . . , f (xt(n) )). (Note that y is placed in the i th position instead of f (xi ).) (3) If f (z i ) = y then halt and output z i . (This is considered a success.) end
Using Eq. (7.4), we now present a lower bound on the success probability of algorithm A , deriving a contradiction to the theorem’s hypothesis. To this end we define a set, denoted Sn , that contains all nbit strings on which the procedure I succeeds with probability greater than n/t (n). (The probability is taken only over the coin
247
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
tosses of procedure I ). Namely, def Sn = x ∈ {0, 1}n : Pr[I ( f (x)) ∈ f −1 ( f (x))] >
n t (n)
In the next two claims we shall show that Sn contains all but at most a 1/2 p(n) fraction of the strings of length n, and that for each string x ∈ Sn algorithm A inverts f on f (x) with probability exponentially close to 1. It will follow that A inverts f on f (Un ) with probability greater than 1 − (1/ p(n)), in contradiction to the theorem’s hypothesis. Claim 7.5.1: For every x ∈ Sn
Pr[A ( f (x)) ∈ f −1 ( f (x))] > 1 − 2−n This claim follows directly from the definitions of Sn and A . Claim 7.5.2:
1 Sn  > 1 − 2 p(n)
· 2n
The rest of the proof is devoted to establishing this claim, and indeed combining Claims 7.5.1 and 7.5.2, the theorem follows. The key observation is that, for every i ∈ [t(n)] and every xi ∈ {0, 1}n \ Sn , it holds that ! Pr B (F(Un2 p(n) )) ∈ F −1 (F(Un2 p(n) ))Un(i) = xi n ≤ Pr[I ( f (xi )) ∈ f −1 ( f (xi ))] ≤ t (n) where Un(1) , . . . , Un(n· p(n)) denote the nbit long blocks in the random variable Un 2 p(n) . It follows that ! def ξ = Pr B (F(Un 2 p(n) )) ∈ F −1 (F(Un 2 p(n) )) ∧ ∃i s.t. Un(i) ∈ {0, 1}n \ Sn ≤
t(n)
Pr B (F(Un2 p(n) )) ∈ F −1 (F(Un2 p(n) )) ∧ Un(i) ∈ {0, 1}n \ Sn
!
i=1
≤ t(n) ·
1 n = t (n) 2q(n 2 p(n))
where the equality is due to t (n) = 2n 2 · p(n) · q(n 2 p(n)) and t(n) = n · p(n). On the other hand, using Eq. (7.4), we have ! ! ξ ≥ Pr B (F(Un 2 p(n) )) ∈ F −1 (F(Un 2 p(n) )) − Pr (∀i) Un(i) ∈ Sn ≥
1 q(n 2 p(n))
− Pr[Un ∈ Sn ]t(n) .
Using t(n) = n · p(n), we get Pr[Un ∈ Sn ] > (1/2q(n 2 p(n)))1/(n· p(n)) , which implies Pr[Un ∈ Sn ] > 1 − (1/2 p(n)) for sufficiently large n. Claim 7.5.2 follows, and so does the theorem.
248
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.1. ONEWAY FUNCTIONS
Digest. Let us recall the structure of the proof of Theorem 7.5. Given a weak oneway function f , we first constructed a polynomialtime computable function F with the intention of later proving that F is strongly oneway. To prove that F is strongly oneway, we used a reducibility argument. The argument transforms efficient algorithms that supposedly contradict the strong onewayness of F into efficient algorithms that contradict the hypothesis that f is weakly oneway. Hence, F must be strongly oneway. We stress that our algorithmic transformation, which is in fact a randomized Cookreduction, makes no implicit or explicit assumptions about the structure of the prospective algorithms for inverting F. Such assumptions (e.g., the “natural” assumption that the inverter of F works independently on each block) cannot be justified (at least not at our current state of understanding of the nature of efficient computations). We use the term a reducibility argument, rather than just saying a reduction, so as to emphasize that we do not refer here to standard (worstcase complexity) reductions. Let us clarify the distinction: In both cases we refer to reducing the task of solving one problem to the task of solving another problem; that is, we use a procedure solving the second task in order to construct a procedure that solves the first task. However, in standard reductions one assumes that the second task has a perfect procedure solving it on all instances (i.e., on the worst case), and constructs such a procedure for the first task. Thus, the reduction may invoke the given procedure (for the second task) on very “nontypical” instances. This cannot be allowed in our reducibility arguments. Here, we are given a procedure that solves the second task with certain probability with respect to a certain distribution. Thus, in employing a reducibility argument, we cannot invoke this procedure on any instance. Instead, we must consider the probability distribution, on instances of the second task, induced by our reduction. In our case (as in many cases) the latter distribution equals the distribution to which the hypothesis (regarding solvability of the second task) refers, but in general these distributions need only be “sufficiently close” in an adequate sense (which depends on the analysis). In any case, a careful consideration of the distribution induced by the reducibility argument is due. (Indeed, the same issue arises in the context of reductions among “distributional problems” considered in Section 10.2.) An informationtheoretic analogue. Theorem 7.5 (or rather its proof) has a natural informationtheoretic (or “probabilistic”) analogue that refers to the amplification of the success probability by repeated experiments: If some event occurs with probability p in a single experiment, then the event will occur with very high probability (i.e., 1 − e−n ) when the experiment is repeated n/ p times. The analogy is to evaluating the function F at a randominput, where each block of this input may be viewed as an attempt to hit the noticeable “hard region” of f . The reader is probably convinced at this stage that the proof of Theorem 7.5 is much more complex than the proof of the informationtheoretic analogue. In the informationtheoretic context the repeated experiments are independent by definition, whereas in the computational context no such independence can be guaranteed. (Indeed, the independence assumption corresponds to the naive argument discussed at the beginning of the proof of Theorem 7.5.) Another indication of the difference between the two settings follows. In the informationtheoretic setting, the probability that the event did not occur in any of the repeated trials decreases exponentially with the number of repetitions. In contrast, in the computational setting we can only reach an unspecified negligible bound on the inverting probabilities of polynomialtime algorithms. Furthermore, for all we know, it may be the case that F can be efficiently inverted on F(Un 2 p(n) ) with success
249
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
x f(x) b(x) Figure 7.1: The hardcore of a oneway function. The solid arrows depict easily computable transformation while the dashed arrows depict infeasible transformations.
probability that is subexponentially decreasing (e.g., with probability 2−(log2 n) ), whereas the analogous informationtheoretic bound is exponentially decreasing (i.e., e−n ). 3
7.1.3. HardCore Predicates Oneway functions per se suffice for one central application: the construction of secure signature schemes (see Appendix C.6). For other applications, one relies not merely on the infeasibility of fully recovering the preimage of a oneway function, but rather on the infeasibility of meaningfully guessing bits in the preimage. The latter notion is captured by the definition of a hardcore predicate. Recall that saying that a function f is oneway means that given a typical y (in the range of f ) it is infeasible to find a preimage of y under f . This does not mean that it is infeasible to find partial information about the preimage(s) of y under f . Specifically, it may be easy to retrieve half of the bits of the preimage (e.g., given a oneway function def f consider the function f defined by f (x, r ) = ( f (x), r ), for every x = r ). We note that hiding partial information (about the function’s preimage) plays an important role in more advanced constructs (e.g., pseudorandom generators and secure encryption). With this motivation in mind, we will show that essentially any oneway function hides specific partial information about its preimage, where this partial information is easy to compute from the preimage itself. This partial information can be considered as a “hardcore” of the difficulty of inverting f . Loosely speaking, a polynomialtime computable (Boolean) predicate b is called a hardcore of a function f if no feasible algorithm, given f (x), can guess b(x) with success probability that is nonnegligibly better than one half. Definition 7.6 (hardcore predicates): A polynomialtime computable predicate b : {0, 1}∗ → {0, 1} is called a hardcore of a function f if for every probabilistic polynomialtime algorithm A , every positive polynomial p(·), and all sufficiently large n’s ! 1 1 Pr A ( f (x)) = b(x) < + 2 p(n) where the probability is taken uniformly over all the possible choices of x ∈ {0, 1}n and all the possible outcomes of the internal coin tosses of algorithm A .
250
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.1. ONEWAY FUNCTIONS
Note that for every b : {0, 1}∗ → {0, 1} and f : {0, 1}∗ → {0, 1}∗ , there exist obvious algorithms that guess b(x) from f (x) with success probability at least one half (e.g., the algorithm that, obliviously of its input, outputs a uniformly chosen bit). Also, if b is a hardcore predicate (of any function) then it follows that b is almost unbiased (i.e., for a uniformly chosen x, the difference Pr[b(x) = 0] − Pr[b(x) = 1] must be a negligible function in n). Since b itself is polynomialtime computable, the failure of efficient algorithms to approximate b(x) from f (x) (with success probability that is nonnegligibly higher than one half) must be due either to an information loss of f (i.e., f not being onetoone) or to the difficulty of inverting f . For example, for σ ∈ {0, 1} and x ∈ {0, 1}∗ , the predicate def b(σ x ) = σ is a hardcore of the function f (σ x ) = 0x . Hence, in this case the fact that b is a hardcore of the function f is due to the fact that f loses information (specifically, the first bit: σ ). On the other hand, in the case that f loses no information (i.e., f is onetoone) a hardcore for f may exist only if f is hard to invert. In general, the interesting case is when being a hardcore is a computational phenomenon rather than an informationtheoretic one (which is due to “information loss” of f ). It turns out that any oneway function has a modified version that possesses a hardcore predicate. Theorem 7.7 (a generic hardcore predicate): For any oneway function f , the innerproduct mod 2 of x and r , denoted b(x, r ), is a hardcore of f (x, r ) = ( f (x), r ). In other words, Theorem 7.7 asserts that, given f (x) and a random subset S ⊆ [x], it is infeasible to guess ⊕i∈S xi significantly better than with probability 1/2, where x = x1 · · · xn is uniformly distributed in {0, 1}n . Proof Sketch: The proof is by a socalled reducibility argument (see Section 7.1.2). Specifically, we reduce the task of inverting f to the task of predicting the hardcore of f , while making sure that the reduction (when applied to input distributed as in the inverting task) generates a distribution as in the definition of the predicting task. Thus, a contradiction to the claim that b is a hardcore of f yields a contradiction to the hypothesis that f is hard to invert. We stress that this argument is far more complex than analyzing the corresponding “probabilistic” situation (i.e., the distribution of (r, b(X, r )), where r ∈ {0, 1}n is uniformly distributed and X is a random variable with superlogarithmic minentropy (which represents the “effective” knowledge of x, when given f (x))).4 Our starting point is a probabilistic polynomialtime algorithm B that satisfies, for some polynomial p and infinitely many n’s, Pr[B( f (X n ), Un ) = b(X n , Un )] > (1/2) + (1/ p(n)), where X n and Un are uniformly and independently distributed over def {0, 1}n . Using a simple averaging argument, we focus on a ε = 1/2 p(n) fraction of the x’s for which Pr[B( f (x), Un ) = b(x, Un )] > (1/2) + ε holds. We will show how to use B in order to invert f , on input f (x), provided that x is in this good set (which has density ε). As a warmup, suppose for a moment that, for the aforementioned x’s, algorithm B succeeds with probability p such that p > 34 + 1/poly(x) rather than p > 12 + 1/poly(x). In this case, retrieving x from f (x) is quite easy: To retrieve the i th bit of 4
The minentropy of X is defined as minv {log2 (1/Pr[X = v])}; that is, if X has minentropy m then maxv {Pr[X = v]} = 2−m . The Leftover Hashing Lemma (see Appendix D.2) implies that, in this case, Pr[b(X, Un ) = 1Un ] = 1 −(m) , where U denotes the uniform distribution over {0, 1}n . n 2 ±2
251
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
x, denoted xi , we randomly select r ∈ {0, 1}x , and obtain B( f (x), r ) and B( f (x), r ⊕ei ), where ei = 0i−1 10x−i and v ⊕ u denotes the addition mod 2 of the binary vectors v and u. A key observation underlying the foregoing scheme as well as the rest of the proof is that b(x,r ⊕s) = b(x, r ) ⊕ b(x, s), which can n be readily verified by writing b(x, y) = i=1 xi yi mod 2 and noting that addition modulo 2 of bits corresponds to their XOR. Now, note that if both B( f (x), r ) = b(x, r ) and B( f (x), r ⊕ei ) = b(x, r ⊕ei ) hold, then B( f (x), r ) ⊕ B( f (x), r ⊕ei ) equals b(x, r ) ⊕ b(x, r ⊕ei ) = b(x, ei ) = xi . The probability that both B( f (x), r ) = b(x, r ) and B( f (x), r ⊕ei ) = b(x, r ⊕ei ) hold, for a random r , 1 . Hence, repeating the foregoing procedure is at least 1 − 2 · (1 − p) > 12 + poly(x) sufficiently many times (using independent random choices of such r ’s) and ruling by majority, we retrieve xi with very high probability. Similarly, we can retrieve all the bits of x, and hence invert f on f (x). However, the entire analysis was con1 ducted under (the unjustifiable) assumption that p > 34 + poly(x) , whereas we only 1 know that p > 2 +ε for ε = 1/poly(x). The problem with the foregoing procedure is that it doubles the original error probability of algorithm B on inputs of the form ( f (x), ·). Under the unrealistic (foregoing) assumption that B’s average error on such inputs is nonnegligibly smaller than 14 , the “errordoubling” phenomenon raises no problems. However, in general (and even in the special case where B’s error is exactly 14 ) the foregoing procedure is unlikely to invert f . Note that the average error probability of B (for a fixed f (x), when the average is taken over a random r ) cannot be decreased by repeating B several times (e.g., for every x, it may be that B always answers correctly on threequarters of the pairs ( f (x), r ), and always errs on the remaining quarter). What is required is an alternative way of using the algorithm B, a way that does not double the original error probability of B. The key idea is generating the r ’s in a way that allows for applying algorithm B only once per each r (and i), instead of twice. Specifically, we will invoke B on ( f (x), r ⊕ei ) in order to obtain a “guess” for b(x, r ⊕ei ), and obtain b(x, r ) in a different way (which does not involve using B). The good news is that the error probability is no longer doubled, since we only use B to get a “guess” of b(x, r ⊕ei ). The bad news is that we still need to know b(x, r ), and it is not clear how we can know b(x, r ) without applying B. The answer is that we can guess b(x, r ) by ourselves. This is fine if we only need to guess b(x, r ) for one r (or logarithmically in x many r ’s), but the problem is that we need to know (and hence guess) the value of b(x, r ) for polynomially many r ’s. The obvious way of guessing these b(x, r )’s yields an exponentially small success probability. Instead, we generate these polynomially many r ’s such that, on the one hand, they are “sufficiently random” whereas, on the other hand, we can guess all the b(x, r )’s with noticeable success probability.5 Specifically, generating the r ’s in a specific pairwise independent manner will satisfy both these (conflicting) requirements. We stress that in case we are successful (in our guesses for all the b(x, r )’s), we can retrieve x with high probability. Hence, we retrieve x with noticeable probability. A word about the way in which the pairwise independent r ’s are generated (and the corresponding b(x, r )’s are guessed) is indeed in place. To generate 5
Alternatively, we can try all polynomially many possible guesses. In such a case, we shall output a list of candidates that, with high probability, contains x. (See Exercise 7.6.)
252
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.1. ONEWAY FUNCTIONS def
m = poly(x) many r ’s, we uniformly (and independently) select = log2 (m + 1) strings in {0, 1}x . Let us denote these strings by s 1 , . . . , s . We then guess b(x, s 1 ) through b(x, s ). Let us denote these guesses, which are uniformly (and independently) chosen in {0, 1}, by σ 1 through σ . Hence, the probability that all our guesses 1 . The different r ’s correspond to the diffor the b(x, s i )’s are correct is 2− = poly(x) ferent nonempty subsets of {1, 2, . . . , }. Specifically, for every such subset J , we def let r J = ⊕ j∈J s j . The reader can easily verify that the r J ’s are pairwise independent and each is uniformly distributed in {0, 1}x ; see Exercise 7.5. The key observation is that b(x, r J ) = b(x, ⊕ j∈J s j ) = ⊕ j∈J b(x, s j ). Hence, our guess for b(x, r J ) is ⊕ j∈J σ j , and with noticeable probability all our guesses are correct. Wrapping up everything, we obtain the following procedure, where ε = 1/poly(n) represents a lower bound on the advantage of B in guessing b(x, ·) for an ε fraction of the x’s (i.e., for these good x’s it holds that Pr[B( f (x), Un ) = b(x, Un )] > 12 + ε). Inverting procedure (on input y = f (x) and parameters n and ε ):
Set = log2 (n/ε2 ) + O(1).
(1) Select uniformly and independently s 1 , . . . , s ∈ {0, 1}n . Select uniformly and independently σ 1 , . . . , σ ∈ {0, 1}. (2) For every nonempty J ⊆ [], compute r J = ⊕ j∈J s j and ρ J = ⊕ j∈J σ j . (3) For i = 1, . . . , n determine the bit z i according to the majority vote of the (2 − 1)long sequence of bits (ρ J ⊕ B( f (x), r J ⊕ei ))∅= J ⊆[] . (4) Output z 1 · · · z n . Note that the “voting scheme” employed in Step 3 uses pairwise independent samples (i.e., the r J ’s), but works essentially as well as it would have worked with independent samples (i.e., the independent r ’s).6 That is, for every i and J , it holds that Prs 1 ,...,s [B( f (x), r J ⊕ei ) = b(x, r J ⊕ei )] > (1/2) + ε, where r J = ⊕ j∈J s j , and (for every fixed i) the events corresponding to different J ’s are pairwise independent. It follows that if for every j ∈ [] it holds that σ j = b(x, s j ), then for every i and J we have
Prs 1 ,...,s [ρ J ⊕ B( f (x), r J ⊕ei ) = b(x, ei )] = Prs 1 ,...,s [B( f (x), r J ⊕ei ) = b(x, r J ⊕ei )] >
(7.5) 1 +ε 2
where the equality is due to ρ J = ⊕ j∈J σ j = b(x, r J ) = b(x, r J ⊕ei ) ⊕ b(x, ei ). Note that Eq. (7.5) refers to the correctness of a single vote for b(x, ei ). Using m = 2 − 1 = O(n/ε2 ) and noting that these (Boolean) votes are pairwise independent, we infer that the probability that the majority of these votes is wrong is upperbounded by 1/2n. Using a union bound on all i’s, we infer that with probability at least 1/2, all majority votes are correct and thus x is retrieved correctly. Recall 6
Our focus here is on the accuracy of the approximation obtained by the sample, and not so much on the error probability. We wish to approximate Pr[b(x, r ) ⊕ B( f (x), r ⊕ei ) = 1] up to an additive term of ε, because such an approximation allows for correctly determining b(x, ei ). A pairwise independent sample of O(t/ε 2 ) points allows for an approximation of a value in [0, 1] up to an additive term of ε with error probability 1/t, whereas a totally random sample of the same size yields error probability exp(−t). Since we can afford setting t = poly(n) and having error probability 1/2n, the difference in the error probability between the two approximation schemes is not important here. For a wider perspective, see Appendix D.1.2 and D.3.
253
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
that the foregoing is conditioned on σ j = b(x, s j ) for every j ∈ [], which in turn holds with probability 2− = (m + 1)−1 = (ε2 /n) = 1/poly(n). Thus, x is retrieved correctly with probability 1/poly(n), and the theorem follows. Digest. Looking at the proof of Theorem 7.7, we note that it actually refers to an arbitrary blackbox Bx (·) that approximates b(x, ·); specifically, in the case of Theorem 7.7 we used def Bx (r ) = B( f (x), r ). In particular, the proof does not use the fact that we can verify the correctness of the preimage recovered by the described process. Thus, the proof actually establishes the existence of a poly(n/ε)time oracle machine that, for every x ∈ {0, 1}n , given oracle access to any Bx : {0, 1}n → {0, 1} satisfying 1 +ε (7.6) 2 outputs x with probability at least poly(ε/n). Specifically, x is output with probability at def least p = (ε2 /n). Noting that x is merely a string for which Eq. (7.6) holds, it follows that the number of strings that satisfy Eq. (7.6) is at most 1/ p. Furthermore, by iterating the foregoing procedure for O(1/ p) times we can obtain all these strings (see Exercise 7.7).
Prr ∈{0,1}n [Bx (r ) = b(x, r )] ≥
Theorem 7.8 (Theorem 7.7, revisited): There exists a probabilistic oracle machine that, given parameters n, ε and oracle access to any function B : {0, 1}n → {0, 1}, halts after poly(n/ε) steps and with probability at least 1/2 outputs a list of all strings x ∈ {0, 1}n that satisfy 1 + ε, 2 where b(x, r ) denotes the innerproduct mod 2 of x and r .
Prr ∈{0,1}n [B(r ) = b(x, r )] ≥
(7.7)
This machine can be modified such that, with high probability, its output list does not include any string x such that Prr ∈{0,1}n [B(r ) = b(x, r )] < 12 + 2ε . Theorem 7.8 means that if given some information about x it is hard to recover x, then given the same information and a random r it is hard to predict b(x, r ). This assertion is proved by the counterpositive (see Exercise 7.14).7 Indeed, the foregoing statement is in the spirit of Theorem 7.7 itself, except that it refers to any “information about x” (rather than to the value f (x)). To demonstrate the point, let us rephrase the foregoing statement as follows: For every randomized process , if given s it is hard to obtain (s) then given s and a uniformly distributed r ∈ {0, 1}(s) it is hard to predict b((s), r ).8 A coding theory perspective. Theorem 7.8 can be viewed as a “list decoding” procedure for the Hadamard code, where the Hadamard encoding of a string x ∈ {0, 1}n is the 2n bit long string containing b(x, r ) for every r ∈ {0, 1}n . Specifically, the function B : {0, 1}n → {0, 1} is viewed as a string of length 2n , and each x ∈ {0, 1}n that satisfies Eq. (7.7) corresponds to a codeword (i.e., the Hadamard encoding of x) that is at distance at most (0.5 − ε) · 2n from B. Theorem 7.8 asserts that the list of all such x’s can be (probabilistic) recovered in poly(n/ε)time, when given direct access to the bits of B (and in particular without reading all of B). This yields a very strong listdecoding result for the Hadamard code, where in list decoding the task is recovering all strings that have an 7
The information available about x is represented in Exercise 7.14 by X n , while x itself is represented by h(X n ). Indeed, s is distributed arbitrarily (as X n in Exercise 7.14). Note that Theorem 7.7 is obtained as a special case by letting (s) be uniformly distributed in f −1 (s). 8
254
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
encoding that is within a specified distance from the given string, in contrast to standard decoding in which the task is recovering the unique information that is encoded in the codeword that is closest to the given string. We mention that list decoding is applicable and valuable in the case that the specified distance does not allow for unique decoding (i.e., the specified distance is greater than half the distance of the code). (Note that a very fast uniquedecoding procedure for the Hadamard code is implicit in the warmup discussion at the beginning of the proof of Theorem 7.7.) Applications of hardcore predicates. Turning back to hardcore predicates, we mention that they play a central role in the construction of generalpurpose pseudorandom generators (see Section 8.2), commitment schemes and zeroknowledge proofs (see Sections 9.2.2 and C.4.3), and encryption schemes (see Appendix C.5).
7.1.4. Reflections on Hardness Amplification Let us take notice that something truly amazing happens in Theorems 7.5 and 7.7. We are not talking merely of using an assumption to derive some conclusion; this is common practice in mathematics and science (and was indeed done several times in previous chapters, starting with Theorem 2.28). The thing that is special about Theorems 7.5 and 7.7 (and we shall see more of this in Section 7.2 as well as in Sections 8.2 and 8.3) is that a relatively mild intractability assumption is shown to imply a stronger intractability result. This strengthening of an intractability phenomenon (aka hardness amplification) takes place while we admit that we do not understand the intractability phenomenon (because we do not understand the nature of efficient computation). Nevertheless, hardness amplification is enabled by the use of the counterpositive, which in this case is called a reducibility argument. At this point things look less miraculous: A reducibility argument calls for the design of a procedure (i.e., a reduction) and a probabilistic analysis of its behavior. The design and analysis of such procedures may not be easy, but it is certainly within the standard expertise of computer science. The fact that hardness amplification is achieved via this counterpositive is best represented in the statement of Theorem 7.8.
7.2. Hard Problems in E As in Section 7.1, we start with the assumption P = N P and seek to use it to our benefit. Again, we shall actually use a seemingly stronger assumption; here, the strengthening is in requiring worstcase hardness with respect to nonuniform models of computation (rather than averagecase hardness with respect to the standard uniform model). Specifically, we shall assume that N P cannot be solved by (nonuniform) families of polynomialsize circuits; that is, N P is not contained in P/poly (even not infinitely often). Our goal is to transform this worstcase assumption into an averagecase condition, which is useful for our applications. Since the transformation will not yield a problem in N P but rather one in E, we might as well take the seemingly weaker assumption by which E is not contained in P/poly (see Exercise 7.9). That is, our starting point is actually that there exists an exponentialtime solvable decision problem such that any family of polynomialsize circuits fails to solve it correctly on all but finitely many input lengths.9 9
Note that our starting point is actually stronger than assuming the existence of a function f in E \ P/poly. Such an assumption would mean that any family of polynomialsize circuits fails to compute f correctly on infinitely many input lengths, whereas our starting point postulates failures on all but finitely many lengths.
255
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
A different perspective on our assumption is provided by the fact that E contains problems that cannot be solved in polynomial time (cf.. Section 4.2.1). The current assumption goes beyond this fact by postulating the failure of nonuniform polynomialtime machines rather than the failure of (uniform) polynomialtime machines. Recall that our goal is to obtain a predicate (i.e., a decision problem) that is computable in exponential time but is inapproximable by polynomialsize circuits. For the sake of later developments, we formulate a general notion of inapproximability. Definition 7.9 (inapproximability, a general formulation): We say that f : {0, 1}∗ → {0, 1} is (S, ρ)inapproximable if for every family of Ssize circuits {Cn }n∈N and all sufficiently large n it holds that
Pr[Cn (Un ) = f (Un )] ≥
ρ(n) 2
(7.8)
We say that f is T inapproximable if it is (T, 1 − (1/T ))inapproximable. We chose the specific form of Eq. (7.8) such that the “level of inapproximability” represented by the parameter ρ will range in (0, 1) and increase with the value of ρ. Specifically, (almosteverywhere) worstcase hardness for circuits of size S is represented by (S, ρ)inapproximability with ρ(n) = 2−n+1 (i.e., in this case Pr[C(Un ) = f (Un )] ≥ 2−n for every circuit Cn of size S(n)). On the other hand, no predicate can be (S, ρ)inapproximable for ρ(n) = 1 − 2−n even with S(n) = O(n) (i.e., Pr[C(Un ) = f (Un )] ≥ 0.5 + 2−n−1 holds for some linearsize circuit; see Exercise 7.10). We note that Eq. (7.8) can be interpreted as an upper bound on the correlation of each adequate circuit with f (i.e., Eq. (7.8) is equivalent to E[χ(C(Un ), f (Un ))] ≤ 1 − ρ(n), where χ(σ, τ ) = 1 if σ = τ and χ(σ, τ ) = −1 otherwise).10 Thus, T inapproximability means that no family of size T circuits can correlate f better than 1/T . We note that the existence of a nonuniformly hard oneway function (as in Definition 7.3) implies the existence of an exponentialtime computable predicate that is T inapproximable for every polynomial T . (For details see Exercise 7.24.) However, our goal in this section is to establish this conclusion under a seemingly weaker assumption. On almosteverywhere hardness. We highlight the fact that both our assumptions and conclusions refer to almosteverywhere hardness. For example, our starting point is not merely that E is not contained in P/poly (or in other circuitsize classes to be discussed), but rather that this is the case almost everywhere. Note that by saying that f has circuit complexity exceeding S, we merely mean that there are infinitely many n’s such that no circuit of size S(n) can compute f correctly on all inputs of length n. In contrast, by saying that f has circuit complexity exceeding S almost everywhere, we mean that for all but finite many n’s no circuit of size S(n) can compute f correctly on all inputs of length n. (Indeed, it is not known whether an “infinitely often” type of hardness implies a corresponding “almosteverywhere” hardness.) 10
Indeed, E[χ (X, Y )] = Pr[X = Y ] − Pr[X = Y ] = 1 − 2Pr[X = Y ].
256
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
The class E. Recall that E denotes the class of exponentialtime solvable decision problems (equivalently, exponentialtime computable Boolean predicates); that is, E = def ∪ε DTIME(tε ), where tε (n) = 2εn . The rest of this section. We start (in Section 7.2.1) with a treatment of assumptions and hardness amplification regarding polynomialsize circuits, which suffice for nontrivial derandomization of BPP. We then turn (in Section 7.2.2) to assumptions and hardness amplification regarding exponentialsize circuits, which yield a “full” derandomization of BPP (i.e., BPP = P). In fact, both sections contain material that is applicable to various other circuitsize bounds, but the motivational focus is as stated. Teaching note: Section 7.2.2 is advanced material, which is best left for independent reading. Furthermore, for one of the central results (i.e., Lemma 7.23) only an outline is provided and the interested reader is referred to the original paper [128].
7.2.1. Amplification with Respect to PolynomialSize Circuits Our goal here is to prove the following result. Theorem 7.10: Suppose that for every polynomial p there exists a problem in E having circuit complexity that is almosteverywhere greater than p. Then there exist polynomialinapproximable Boolean functions in E; that is, for every polynomial p there exists a pinapproximable Boolean function in E. Theorem 7.10 is used toward deriving a meaningful derandomization of BPP under the aforementioned assumption (see Part 2 of Theorem 8.19). We present two proofs of Theorem 7.10. The first proof proceeds in two steps: 1. Starting from the worstcase hypothesis, we first establish some mild level of averagecase hardness (i.e., a mild level of inapproximability). Specifically, we show that for every polynomial p there exists a problem in E that is ( p, ε)inapproximable for ε(n) = 1/n 3 . 2. Using the foregoing mild level of inapproximability, we obtain the desired strong level of inapproximability (i.e., p inapproximability for every polynomial p ). Specifically, for every two polynomials p1 and p2 , we prove that if the function f is t(n) ( p1 , 1/ p2 )inapproximable, then the function F(x1 , . . . , xt(n) ) = ⊕i=1 f (xi ), where t(n) = n · p2 (n) and x1 , . . . , xt(n) ∈ {0, 1}n , is p inapproximable for p (t(n) · n) = p1 (n)(1) /poly(t(n)). This claim is known as Yao’s XOR Lemma and its proof is far more complex than the proof of its informationtheoretic analogue (discussed at the beginning of §7.2.1.2). The second proof of Theorem 7.10 consists of showing that the construction employed in the first step, when composed with Theorem 7.8, actually yields the desired end result. This proof will uncover a connection between hardness amplification and coding theory. Our presentation will thus proceed in three corresponding steps (presented in §7.2.1.1–7.2.1.3, and schematically depicted in Figure 7.2).
257
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
via list decoding (7.2.1.3)
mild averagecase
worstcase HARDNESS
7.2.1.1
HARDNESS
Yao’s XOR
inapprox.
7.2.1.2
derandomized Yao’s XOR (7.2.2) Figure 7.2: Proofs of hardness amplification: Organization.
7.2.1.1. From WorstCase Hardness to Mild AverageCase Hardness The transformation of worstcase hardness into averagecase hardness (even in a mild sense) is indeed remarkable. Note that worstcase hardness may be due to a relatively small number of instances, whereas even mild forms of averagecase hardness refer to a very large number of possible instances.11 In other words, we should transform hardness that may occur on a negligible fraction of the instances into hardness that occurs on a noticeable fraction of the instances. Intuitively, we should “spread” the hardness of few instances (of the original problem) over all (or most) instances (of the transformed problem). The counterpositive view is that computing the value of typical instances of the transformed problem should enable solving the original problem on every instance. The aforementioned transformation is based on the selfcorrection paradigm, to be reviewed first. The paradigm refers to functions g that can be evaluated at any desired point by using the value of g at a few random points, where each of these points is uniformly distributed in the function’s domain (but indeed the points are not independently distributed). The key observation is that if g(x) can be reconstructed based on the value of g at t such random points, then such a reconstruction can tolerate a 1/3t fraction of errors (regarding the values of g). Thus, if we can correctly obtain the value of g on all but at most a 1/3t fraction of its domain, then we can probabilistically recover the correct value of g at any point with very high probability. It follows that if no probabilistic polynomialtime algorithm can correctly compute g in the worstcase sense, then every probabilistic polynomialtime algorithm must fail to correctly compute g on more than a 1/3t fraction of its domain. The archetypical example of a selfcorrectable function is any mvariate polynomial of individual degree d over a finite field F such that F > dm + 1. The value of such a polynomial at any desired point x can be recovered based on the values of dm + 1 points (other than x) that reside on a random line that passes through x. Note that each of these points is uniformly distributed in F m , which is the function’s domain. (For details, see Exercise 7.11.) Recall that we are given an arbitrary function f ∈ E that is hard to compute in the worst case. Needless to say, this function is not necessarily selfcorrectable (based on relatively 11
Indeed, worstcase hardness with respect to polynomialsize circuits cannot be due to a polynomial number of instances, because a polynomial number of instances can be hardwired into such circuits. Still, for all we know, worstcase hardness may be due to a small superpolynomial number of instances (e.g., n log2 n instances). In contrast, even mild forms of averagecase hardness must be due to an exponential number of instances (i.e., 2n /poly(n) instances).
258
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
few points), but it can be extended into such a function. Specifically, we extend f : [N ] → {0, 1} (viewed as f : [N 1/m ]m → {0, 1}) to an mvariate polynomial of individual degree d over a finite field F such that F > dm + 1 and (d + 1)m = N . Intuitively, in terms of worstcase complexity, the extended function is at least as hard as f , which means that it is hard (in the worst case). The point is that the extended function is selfcorrectable and thus its worstcase hardness implies that it must be at least mildly hard in the averagecase. Details follow. Construction 7.11 (multivariate extension):12 For any function f n : {0, 1}n → {0, 1}, a finite field F, a set H ⊂ F, and an integer m such that H m = 2n and F > (H  − 1)m + 1, we consider the function fˆn : F m → F defined as the mvariate polynomial of individual degree H  − 1 that extends f n : H m → {0, 1}. That is, we identify {0, 1}n with H m , and define fˆn as the unique mvariate polynomial of individual degree H  − 1 that satisfies fˆn (x) = f n (x) for every x ∈ H m , where we view {0, 1} as a subset of F. Note that fˆn can be evaluated at any desired point, by evaluating f n on its entire domain, and determining the unique mvariate polynomial of individual degree H  − 1 that agrees with f n on H m (see Exercise 7.12). Thus, for f : {0, 1}∗ → {0, 1} in E, the corresponding fˆ (defined by separately extending the restriction of f to each input length) is also in E. For the sake of preserving various complexity measures, we wish to have F m  = poly(2n ), which leads to setting m = n/ log2 n (yielding H  = n and F = poly(n)). In particular, in this case fˆn is defined over strings of length O(n). The mild averagecase hardness of fˆ follows by the foregoing discussion. In fact, we state and prove a more general result. Theorem 7.12: Suppose that there exists a Boolean function f in E having circuit complexity that is almosteverywhere greater than S. Then, there exists an exponentialtime computable function fˆ : {0, 1}∗ → {0, 1}∗ such that  fˆ(x) ≤ x and for every family of circuit {Cn }n ∈N of size S (n ) = S(n /O(1))/poly(n ) it holds that Pr[Cn (Un ) = fˆ(Un )] > (1/n )2 . Furthermore, fˆ does not depend on S. Theorem 7.12 seems to complete the first step of the proof of Theorem 7.10, except that we desire a Boolean function rather than a function that merely does not stretch its input. The extra step of obtaining a Boolean function that is (poly(n), n −3 )inapproximable is taken in Exercise 7.13.13 Essentially, if fˆ is hard to compute on a noticeable fraction of its inputs then the Boolean predicate that on input (x, i) returns the i th bit of fˆ(x) must be mildly inapproximable. Proof Sketch: Given f as in the hypothesis and for every n ∈ N, we consider the restriction of f to {0, 1}n , denoted f n , and apply Construction 7.11 to it, while using m = n/ log n, H  = n and n 2 < F = poly(n). Recall that the resulting function fˆn maps strings of length n = log2 F m  = O(n) to strings of length 12
The algebraic fact underlying this construction is that for any function f : H m → F there exists a unique mvariate polynomial fˆ : F m → F of individual degree H  − 1 such that for every x ∈ H m it holds that fˆ (x) = f (x). This polynomial is called a multivariate polynomial extension of f , and it can be found in poly(H m log F)time. For details, see Exercise 7.12. 13 A quantitatively stronger bound can be obtained by noting that the proof of Theorem 7.12 actually establishes an error lower bound of ((log n )/(n )2 ) and that  fˆ(x) = O(log x).
259
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
log2 F = O(log n). Following the foregoing discussion, we shall show that circuits that approximate fˆn too well yield circuits that compute f n correctly on each input. Using the hypothesis regarding the size of the latter, we shall derive a lower bound on the size of the former. The actual (reducibility) argument proceeds as follows. We fix an arbitrary circuit Cn that satisfies
Pr[Cn (Un ) = fˆn (Un )] ≥ 1 − (1/n )2 > 1 − (1/3t),
(7.9)
def where t = (H  − 1)m + 1 = o(n 2 ) exceeds the total degree of fˆn . Using the selfcorrection feature of fˆn , we observe that by making t oracle calls to Cn we can probabilistically recover the value of ( fˆn and thus of) f n on each input, with probability at least 2/3. Using error reduction and (nonuniform) derandomization as in the proof of Theorem 6.3,14 we obtain a circuit of size n 3 · Cn  that computes f n . By the hypothesis n 3 · Cn  > S(n), and so Cn  > S(n /O(1))/poly(n ). Recalling that Cn is an arbitrary circuit that satisfies Eq. (7.9), the theorem follows.
Digest. The proof of Theorem 7.12 is actually a worstcase to averagecase reduction. That is, the proof consists of a selfcorrection procedure that allows for the evaluation of f at any desired nbit long point, using oracle calls to any circuit that computes fˆ correctly on a 1 − (1/n )2 fraction of the n bit long inputs. We recall that if f ∈ E then fˆ ∈ E, but we do not know how to preserve the complexity of f in case it is in N P. (Various indications to the difficulty of a worstcase to averagecase reduction for N P are known; see, e.g., [43].) We mention that the ideas underlying the proof of Theorem 7.12 have been applied in a large variety of settings. For example, we shall see applications of the selfcorrection paradigm in §9.3.2.1 and in §9.3.2.2. Furthermore, in §9.3.2.2 we shall reencounter the very same multivariate extension used in the proof of Theorem 7.12.
7.2.1.2. Yao’s XOR Lemma Having obtained a mildly inapproximable predicate, we wish to obtain a strongly inapproximable one. The informationtheoretic context provides an appealing suggestion: Suppose that X is a Boolean random variable (representing the mild inapproximability of the aforementioned predicate) that equals 1 with probability ε. Then XORing the outcome of n/ε independent samples of X yields a bit that equals 1 with probability 0.5 ± exp(−(n)). It is tempting to think that the same should happen in the computational setting. That is, if f is hard to approximate correctly with probability exceeding 1 − ε then XORing the output of f on n/ε nonoverlapping parts of the input should yield a predicate that is hard to approximate correctly with probability that is nonnegligibly higher than 1/2. The latter assertion turns out to be correct, but (even more than in Section 7.1.2) the proof of the computational phenomenon is considerably more complex than the analysis of the informationtheoretic analogue. Theorem 7.13 (Yao’s XOR Lemma): There exists a universal constant c > 0 such that the following holds. If, for some polynomials p1 and p2 , the Boolean function f is 14
First, we apply the foregoing probabilistic procedure O(n) times and take a majority vote. This yields a probabilistic procedure that, on input x ∈ {0, 1}n , invokes Cn for o(n 3 ) times and computes f n (x) correctly with probability greater than 1 − 2−n . Finally, we just fix a sequence of random choices that is good for all 2n possible inputs, and obtain a circuit of size n 3 · Cn  that computes f n correctly on every nbit input.
260
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E t(n)
( p1 , 1/ p2 )inapproximable, then the function F(x1 , . . . , xt(n) ) = ⊕i=1 f (xi ), where t(n) = n · p2 (n) and x1 , . . . , xt(n) ∈ {0, 1}n , is p inapproximable for p (t(n) · n) = p1 (n)c /t(n)1/c . Furthermore, the claim also holds if the polynomials p1 and p2 are replaced by any integer functions. Combining Theorem 7.12 (and Exercise 7.13), and Theorem 7.13, we obtain a proof of Theorem 7.10. (Recall that an alternative proof is presented in §7.2.1.3.) We note that proving Theorem 7.13 seems more difficult than proving Theorem 7.5 (i.e., the amplification of oneway functions), due to two issues. Firstly, unlike in Theorem 7.5, the computational problems are not in PC and thus we cannot efficiently recognize correct solutions to them. Secondly, unlike in Theorem 7.5, solutions to instances of the transformed problem do not correspond to the concatenation of solutions for the original instances, but are rather a function of the latter that loses almost all the information about the latter. The proof of Theorem 7.13 presented next deals with each of these two difficulties separately. Several different proofs of Theorem 7.13 are known. As just stated, the proof that we present is conceptually appealing because it deals separately with two unrelated difficulties. Furthermore, this proof benefits most from the material already presented in Section 7.1. The proof proceeds in two steps: 1. First we prove that the corresponding “direct product” function P(x1 , . . . , xt(n) ) = ( f (x1 ), . . . , f (xt(n) )) is difficult to compute in a strong averagecase sense. 2. Next we establish the desired result by an application of Theorem 7.8. Thus, given Theorem 7.8, our main focus is on the first step, which is of independent interest (and is thus generalized from Boolean functions to arbitrary ones). Theorem 7.14 (the Direct Product Lemma): Let p1 and p2 be two polynomials, and suppose that f : {0, 1}∗ → {0, 1}∗ is such that for every family of p1 size circuits, {Cn }n∈N , and all sufficiently large n ∈ N, it holds that Pr[Cn (Un ) = f (Un )] > 1/ p2 (n). Let P(x1 , . . . , xt(n) ) = ( f (x1 ), . . . , f (xt(n) )), where x1 , . . . , xt(n) ∈ {0, 1}n and t(n) = n · p2 (n). Then, for any ε : N → (0, 1], setting p such that p (t(n) · n) = p1 (n)/poly(t(n)/ε (t(n) · n)), it holds that every family of p size circuits, {Cm }m∈N , satisfies Pr[Cm (Um ) = P(Um )] < ε (m). Furthermore, the claim also holds if the polynomials p1 and p2 are replaced by any integer functions. In particular, for an adequate constant c > 0, selecting ε (t(n) · n) = p1 (n)−c , we obtain p (t(n) · n) = p1 (n)c /t(n)1/c , and so ε (m) ≤ 1/ p (m). Deriving Theorem 7.13 from Theorem 7.14. Theorem 7.13 follows from Theorem 7.14 by considering the function P (x1 , . . . , xt(n) , r ) = b( f (x1 ) · · · f (xt(n) ), r ), where f is a Boolean function, r ∈ {0, 1}t(n) , and b(y, r ) is the innerproduct modulo 2 of the t(n)bit long strings y and r . Note that, for the corresponding P, we have P (x1 , . . . , xt(n) , r ) ≡ b(P(x1 , . . . , xt(n) ), r ), whereas F(x1 , . . . , xt(n) ) = P (x1 , . . . , xt(n) , 1t(n) ). Intuitively, the inapproximability of P should follow from the strong averagecase hardness of P (via Theorem 7.8), whereas it should be possible to reduce the approximation of P to the approximation of F (and thus derive the desired inapproximability of F). Indeed, this
261
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
intuition does not fail, but detailing the argument seems a bit cumbersome (and so we only provide the clues here). Assuming that f is ( p1 , 1/ p2 )inapproximable, we first apply Theorem 7.14 (with ε (t(n) · n) = p1 (n)−c ) and then apply Theorem 7.8 (see Exercise 7.14), inferring that P is p inapproximable for p (t(n) · n) = p1 (n)(1) /poly(t(n)). The less obvious part of the argument is reducing the approximation of P to the approximation of F. The key observation is that " f (z i ) (7.10) P (x1 , . . . , xt(n) , r ) = F(z 1 , . . . , z t(n) ) ⊕ i:ri =0
where z i = xi if ri = 1 and is an arbitrary nbit long string otherwise. Now, if somebody provides us with samples of the distribution (Un , f (Un )), then we can use these samples in the role of the pairs (z i , f (z i )) for the indices i that satisfy ri = 0. Considering a best choice of such samples (i.e., one for which we obtain the best approximation of P ), we obtain a circuit that approximates P (by using a circuit that approximates F and the said choices of samples). (The details are left for Exercise 7.17.) Theorem 7.13 follows. Proving Theorem 7.14. Note that Theorem 7.14 is closely related to Theorem 7.5; see Exercise 7.20 for details. This suggests employing an analogous proof strategy, that is, converting circuits that violate the theorem’s conclusion into circuits that violate the theorem’s hypothesis. We note, however, that things were much simpler in the context of Theorem 7.5: There we could (efficiently) check whether or not a value contained in the output of the circuit that solves the directproduct problem constitutes a correct answer for the corresponding instance of the basic problem. Lacking such an ability in the current context, we shall have to use such values more carefully. Loosely speaking, we shall take a weighted majority vote among various answers, where the weights reflect our confidence in the correctness of the various answers. We establish Theorem 7.14 by applying the following lemma that provides quantitative bounds on the feasibility of computing the direct product of two functions. In this lemma, {Ym }m∈N and {Z m }m∈N are independent probability ensembles such that Ym , Z m ∈ {0, 1}m , and X n = (Y(n) , Z n−(n) ) for some function : N → N. The lemma refers to the success probability of computing the direct product function F : {0, 1}∗ → {0, 1}∗ defined by F(yz) = (F1 (y), F2 (z)), where y = (yz), when given bounds on the success probability of computing F1 and F2 (separately). Needless to say, these probability bounds refer to circuits of certain sizes. (The slackness parameter ε represents a deviation from an idealized result in which the probability of correctly computing F is upperbounded by the product of the probabilities of correctly computing F1 and F2 , where tightening this slackness (i.e., decreasing ε) comes at the cost of decreasing the size of the circuit for which the success probability bound holds.) We stress that the lemma is not symmetric with respect to the two functions: It guarantees a stronger (and in fact lossless) preservation of circuit sizes for one of the functions (which is arbitrarily chosen to be F1 ). Lemma 7.15 (Direct Product, a quantitative twoargument version): For {Ym }, {Z m }, F1 , F2 , , {X n } and F as in the foregoing, let ρ1 (·) be an upper bound on the success probability of s1 (·)size circuits in computing F1 over {Ym }. That is, for every such circuit family {Cm }
Pr[Cm (Ym ) = F1 (Ym )] ≤ ρ1 (m).
262
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
Likewise, suppose that ρ2 (·) is an upper bound on the probability that s2 (·)size circuits compute F2 over {Z m }. Then, for every function ε : N → R, the function ρ defined as def
ρ(n) = ρ1 ((n)) · ρ2 (n − (n)) + ε(n) is an upper bound on the probability that families of s(·)size circuits correctly compute F over {X n }, where s2 (n − (n)) def s(n) = min s1 ((n)), poly(n/ε(n)) . Theorem 7.14 is derived from Lemma 7.15 by using a careful induction, which capitalizes on the highly quantitative form of Lemma 7.15 and in particular on the fact that no loss is incurred for one of the two functions that are used. We first detail this argument, and next establish Lemma 7.15 itself. Deriving Theorem 7.14 from Lemma 7.15. We write P(x1 , x2 , . . . , xt(n) ) as P (t(n)) (x1 , x2 , . . . , xt(n) ), where P (i) (x1 , . . . , xi ) = ( f (x1 ), . . . , f (xi )) and P (i) (x1 , . . . , xi ) ≡ (P (i−1) (x1 , . . . , xi−1 ), f (xi )). For any function ε, we shall prove by induction on i that circuits of size s(n) = p1 (n)/poly(t(n)/ε(n)) cannot compute P (i) (Ui·n ) with success probability greater than (1 − (1/ p2 (n)))i + (i − 1) · ε(n), where p1 and p2 are as in Theorem 7.14. Thus, no s(n)size circuit can compute P (t(n)) (Ut(n)·n ) with success probability greater than (1 − (1/ p2 (n))t(n) + (t(n) − 1) · ε(n) = exp(−n) + (t(n) − 1) · ε(n). Recalling that this is established for any function ε, Theorem 7.14 follows (by using ε(n) = ε (t(n) · n)/t(n), and observing that the setting s(n) = p (t(n) · n) satisfies s(n) = p1 (n)/poly(t(n)/ε(n))). Turning to the induction itself, we first note that its basis (i.e., i = 1) is guaranteed by the theorem’s hypothesis (i.e., the hypothesis of Theorem 7.14 regarding f ). The induction step (i.e., from i to i + 1) will be proved by using Lemma 7.15 with F1 = P (i) and F2 = f , (i) (i) along with the parameter setting ρ1 (i · n) = (1 − (1/ p2 (n))i + (i − 1) · ε(n), s1 (i · n) = (i) (i) s(n), ρ2 (n) = 1 − (1/ p2 (n)) and s2 (n) = poly(n/ε(n)) · s(n) = p1 (n). Details follow. Note that the induction hypothesis (regarding P (i) ) implies that F1 satisfies the hypoth(i) (i) esis of Lemma 7.15 (wrt size s1 and success rate ρ1 ), whereas the theorem’s hypothesis (i) regarding f implies that F2 satisfies the hypothesis of Lemma 7.15 (wrt size s2 and suc(i) cess rate ρ2 ). Thus, F = P (i+1) satisfies the lemma’s conclusion with respect to circuits of (i) (i) (i) (i) size min(s1 (i · n), s2 (n)/poly(n/ε(n))) = s(n) and success rate ρ1 (i · n) · ρ2 (n) + ε(n) i+1 which is upperbounded by (1 − (1/ p2 (n))) + i · ε(n). This completes the induction step. We stress the fact that we used induction for a nonconstant number of steps, and that this was enabled by the highly quantitative form of the inductive claim and the small loss incurred by the inductive step. Specifically, the size bound did not decrease during the induction (although we could afford a small additive loss in each step, but not a constant factor loss).15 Likewise, the success rate suffered an additive increase of ε(n) in each 15
Note that if we had s(n) = min{s1 ((n)), s2 (n − (n))}/2 (in Lemma 7.15) then the foregoing argument would (i+1) (i+1) (i) such that s1 ((i + 1) · n) = min{s1 (i · n), s2 (n)}/2, which yield that P (i+1) is hard for circuits of size s1 (t(n)) would yield a meaningless result for P = P (t(n)) (since s1 (t(n) · n) = min{ p1 (n), p2 (n)}/2t(n) , where t(n) n and pi (n) < 2n ).
263
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
step, which was accommodated by the inductive claim. Thus, assuming the correctness of Lemma 7.15, we have established Theorem 7.14. Proof of Lemma 7.15: Proceeding (as usual) by the contrapositive, we consider a family of s(·)size circuits {Cn }n∈N that violates the lemma’s conclusion; that is, Pr[Cn (X n ) = F(X n )] > ρ(n). We will show how to use such circuits in order to obtain either circuits that violate the lemma’s hypothesis regarding F1 or circuits that violate the lemma’s hypothesis regarding F2 . Toward this end, it is instructive to write the success probability of Cn in a conditional form, while denoting the i th output of Cn (x) by Cn (x)i (i.e., Cn (x) = (Cn (x)1 , Cn (x)2 )):
Pr[Cn (Y(n) , Z n−(n) ) = F(Y(n) , Z n−(n) )] = Pr[Cn (Y(n) , Z n−(n) )1 = F1 (Y(n) )] · Pr[Cn (Y(n) , Z n−(n) )2 = F2 (Z n−(n) )  Cn (Y(n) , Z n−(n) )1 = F1 (Y(n) )]. The basic idea is that if the first factor is greater than ρ1 ((n)) then we immediately derive a circuit (i.e., Cn (y) = Cn (y, Z n−(n) )1 ) contradicting the lemma’s hypothesis regarding F1 , whereas if the second factor is significantly greater than ρ2 (n − (n)) then we can obtain a circuit contradicting the lemma’s hypothesis regarding F2 . The treatment of the latter case is indeed not obvious. The idea is that a sufficiently large sample of (Y(n) , F1 (Y(n) )), which may be hardwired into the circuit, allows for using the conditional probability space (in such a circuit) toward an attempt to approximate F2 . That is, on input z, we select uniformly a string y satisfying Cn (y, z)1 = F1 (y) (from the aforementioned sample), and output Cn (y, z)2 . For a fixed z, sampling of the conditional space (i.e., y’s satisfying Cn (y, z)1 = F1 (y)) is possible provided that Pr[Cn (Y(n) , z)1 = F1 (Y(n) )] holds with noticeable probability. The last caveat motivates a separate treatment of z’s having a noticeable value of Pr[Cn (Y(n) , z)1 = F1 (Y(n) )] and of the rest of z’s (which are essentially ignored). Details follow. Let us first simplify the notations by fixing a generic n and using the abbreviations C = Cn , ε = ε(n), = (n), Y = Y , and Z = Yn− . We call z good if Pr[C(Y, z)1 = F1 (Y )] ≥ ε/2 and let G be the set of good z’s. Next, rather than considering the event C(Y, Z ) = F(Y, Z ), we consider the combined event C(Y, Z ) = F(Y, Z ) ∧ Z ∈ G, which occurs with almost the same probability (up to an additive error term of ε/2). This is the case because, for any z ∈ G, it holds that
Pr[C(Y, z) = F(Y, z)] ≤ Pr[C(Y, z)1 = F1 (Y )] < ε/2 and thus z’s that are not good do not contribute much to Pr[C(Y, Z ) = F(Y, Z )]; that is, Pr[C(Y, Z ) = F(Y, Z ) ∧ Z ∈ G] is lowerbounded by Pr[C(Y, Z ) = F(Y, Z )] − ε/2. Using Pr[C(Y, z) = F(Y, z)] > ρ(n) = ρ1 () · ρ2 (n − ) + ε, we have ε Pr[C(Y, Z ) = F(Y, Z ) ∧ Z ∈ G] > ρ1 () · ρ2 (n − ) + (7.11) 2. We proceed according to the foregoing outline, first showing that if Pr[C(Y, Z )1 = F1 (Y )] > ρ1 () then we immediately derive circuits violating the hypothesis concerning F1 . Actually, we prove something stronger (which we will need for the other case). Claim 7.15.1: For every z, it holds that Pr[C(Y, z)1 = F1 (Y )] ≤ ρ1 ().
264
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
Proof: Otherwise, using any z ∈ {0, 1}n− that satisfies Pr[C(Y, z)1 = F1 (Y )] > def
ρ1 (), we obtain a circuit C (y) = C(y, z)1 that contradicts the lemma’s hypoth2 esis concerning F1 . Using Claim 7.15.1, we show how to obtain a circuit that violates the lemma’s hypothesis concerning F2 , and in doing so we complete the proof of the lemma. Claim 7.15.2: There exists a circuit C of size s2 (n − ) such that
Pr[C (Z ) = F2 (Z )] ≥
Pr[C(Y, Z ) = F(Y, Z ) ∧ Z ∈ G] ε − ρ1 () 2
> ρ2 (n − ) Proof: The second inequality is due to Eq. (7.11), and thus we focus on establishing
the first inequality. We construct the circuit C as suggested in the foregoing outline. Specifically, we take a poly(n/ε)large sample, denoted S, from the distribution def (Y, F1 (Y )) and let C (z) = C(y, z)2 , where (y, v) is uniformly selected among the elements of S for which C(y, z)1 = v holds. Details follow. Let m be a sufficiently large number that is upperbounded by a polynomial in n/ε, and consider a random sequence of m pairs, generated by taking m independent samples from the distribution (Y, F1 (Y )). We stress that we do not assume here that such a sample, denoted S, can be produced by an efficient (uniform) algorithm (but, jumping ahead, we remark that such a sequence can be fixed nonuniformly). For each z ∈ G ⊆ {0, 1}n− , we denote by Sz the set of pairs (y, v) ∈ S for which C(y, z)1 = v. Note that Sz is a random sample of the residual probability space defined by (Y, F1 (Y )) conditioned on C(Y, z)1 = F1 (Y ). Also, with overwhelmingly high probability, Sz  = (n/ε2 ), because z ∈ G implies Pr[C(Y, z)1 = F1 (Y )] ≥ ε/2 and m = (n/ε3 ).16 Thus, for each z ∈ G, with overwhelming probability (taken over the choices of S), the sample Sz provides a good approximation of the conditional probability space.17 In particular, with probability greater than 1 − 2−n , it holds that ε {(y, v) ∈ Sz : C(y, z)2 = F2 (z)} ≥ Pr[C(Y, z)2 = F2 (z)  C(Y, z)1 = F1 (Y )] − . Sz  2 (7.12) Thus, with positive probability, Eq. (7.12) holds for all z ∈ G ⊆ {0, 1}n− . The circuit C computing F2 is now defined as follows. The circuit will contain a set S = {(yi , vi ) : i = 1, . . . , m} (i.e., S is “hardwired” into the circuit C ) such that the following two conditions hold: 1. For every i ∈ [m] it holds that vi = F1 (yi ). 2. For each good z the set Sz = {(y, v) ∈ S : C(y, z)1 = v} satisfies Eq. (7.12). (In particular, Sz is not empty for any good z.) 16
Note that the expected size of Sz is m · ε/2 = (n/ε 2 ). Using the Chernoff Bound, we get Pr S [Sz  < mε/4] = exp(−(n/ε 2 )) < 2−n . 17 For Tz = {y : C(y, z)1 = F1 (y)}, we are interested in a sample S of Tz such that {y ∈ S : C(y, z)2 = F2 (z)}/S  approximates Pr[C(Y, z)2 = F2 (z)  Y ∈ Tz ] up to an additive term of ε/2. Using the Chernoff Bound again, we note that a random S ⊂ Tz of size (n/ε 2 ) provides such an approximation with probability greater than 1 − 2−n .
265
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
On input z, the circuit C first determines the set Sz , by running C for m times and checking, for each i = 1, . . . , m, whether or not C(yi , z) = vi . In case Sz is empty, the circuit returns an arbitrary value. Otherwise, the circuit selects uniformly a pair (y, v) ∈ Sz and outputs C(y, z)2 . (The latter random choice can be eliminated by an averaging argument; see Exercise 7.16.) Using the definition of C and Eq. (7.12), we have: Pr[C (Z ) = F2 (Z )] ≥ Pr[Z = z] · Pr[C (z) = F2 (z)] z∈G
{(y, v) ∈ Sz : C(y, z)2 = F2 (z)} Sz  z∈G
ε Pr[Z = z] · Pr[C(Y, z)2 = F2 (z)  C(Y, z)1 = F1 (Y )] − ≥ 2 z∈G ε Pr[C(Y, z)2 = F2 (z) ∧ C(Y, z)1 = F1 (Y )] = Pr[Z = z] · − Pr[C(Y, z)1 = F1 (Y )] 2 =
Pr[Z = z] ·
z∈G
Next, using Claim 7.15.1, we have: ε Pr[C(Y, z) = F(Y, z)] Pr[C (Z ) = F2 (Z )] ≥ Pr[Z = z] · − ρ1 () 2 z∈G =
Pr[C(Y, Z ) = F(Y, Z ) ∧ Z ∈ G] ε − ρ1 () 2
Finally, using Eq. (7.11), the claim follows.
2
This completes the proof of the lemma.
Comments. Firstly, we wish to call attention to the care with which an inductive argument needs to be carried out in the computational setting, especially when a nonconstant number of inductive steps is concerned. Indeed, our inductive proof of Theorem 7.14 involves invoking a quantitative lemma (i.e., Lemma 7.15) that allows for keeping track of the relevant quantities (e.g., success probability and circuit size) throughout the induction process. Secondly, we mention that Lemma 7.15 (as well as Theorem 7.14) has a uniform complexity version that assumes that one can efficiently sample the distribution (Y(n) , F1 (Y(n) )) (resp., (Un , f (Un ))). For details see [102]. Indeed, a good lesson from the proof of Lemma 7.15 is that nonuniform circuits can “effectively sample” any distribution. Lastly, we mention that Theorem 7.5 (the amplification of oneway functions) and Theorem 7.13 (Yao’s XOR Lemma) also have (tight) quantitative versions (see, e.g., [91, Sec. 2.3.2] and [102, Sec. 3], respectively).
7.2.1.3. List Decoding and Hardness Amplification Recall that Theorem 7.10 was proved in §7.2.1.1–7.2.1.2, by first constructing a mildly inapproximable predicate via Construction 7.11, and then amplifying its hardness via Yao’s XOR Lemma. In this subsection we show that the construction used in the first step (i.e., Construction 7.11) actually yields a strongly inapproximable predicate. Thus, we provide an alternative proof of Theorem 7.10. Specifically, we show that a strongly inapproximable predicate (as asserted in Theorem 7.10) can be obtained by combining Construction 7.11
266
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
(with a suitable choice of parameters) and the innerproduct construction (of Theorem 7.8). The main ingredient of this argument is captured by the following result. Proposition 7.16: Suppose that there exists a Boolean function f in E having circuit complexity that is almosteverywhere greater than S, and let ε : N → [0, 1] satisfying ε(n) > 2−n . Let f n be the restriction of f to {0, 1}n , and let fˆn be the function obtained from f n when applying Construction 7.1118 with H  = n/ε(n) and F = H 3 . Then, the function fˆ : {0, 1}∗ → {0, 1}∗ , defined by fˆ(x) = fˆx/3 (x), is computable in exponential time and for every family of circuit {Cn }n ∈N of size S (n ) = poly(ε(n /3)/n ) · S(n /3) it holds that Pr[Cn (Un ) = fˆ(Un )] < def ε (n ) = ε(n /3). Before turning to the proof of Proposition 7.16, let us describe how it yields an alternative proof of Theorem 7.10. Firstly, for some γ > 0, Proposition 7.16 yields an exponentialtime computable function fˆ such that  fˆ(x) ≤ x and for every family of circuit {Cn }n ∈N of size S (n ) = S(n /3)γ /poly(n ) it holds that Pr[Cn (Un ) = fˆ(Un )] < 1/S (n ). Combining this with Theorem 7.8 (cf. Exercise 7.14), we infer that P(x, r ) = b( fˆ(x), r ), where r  =  fˆ(x) ≤ x, is S inapproximable for S (n ) = S (n /2)(1) /poly(n ). In particular, for every polynomial p, we obtain a pinapproximable predicate in E by applying the foregoing with S(n) = poly(n, p(n)). Thus, Theorem 7.10 follows. Teaching note: The following material is very advanced and is best left for independent reading. Furthermore, its understanding requires being comfortable with basic notions of errorcorrecting codes (as presented in Appendix E.1.1).
Proposition 7.16 is proved by observing that the transformation of f n to fˆn constitutes a “strong” code and that any such code provides a worstcase to (strongly) averagecase reduction. For starters, we note that the mapping f n !→ fˆn is closely related to the ReedMuller code (see §E.1.2.4), whereas we already saw a connection between “hardness amplification” and coding theory (see discussion at the end of Section 7.1.3). In the current context (of reducing the worstcase computation of f n to an averagecase computation of fˆn ), we seek a relatively small circuit that computes f n (correctly on each input) when given oracle access to any function f that agrees with fˆn on a sufficiently large fraction of the domain. Actually, we may relax the requirement by (switching the order of quantifiers and) allowing the small (oracleaided) circuit to depend on f (as well as on f n ), because f represents some small circuit that violates the averagecase complexity requirement with respect to fˆn (and so the combined circuit will violate the worstcase hypothesis regarding f n ). Furthermore, wanting f n ∈ E to imply fˆn ∈ E, we wish the mapping f n !→ fˆn to be computable in time 2 O(n) = poly( f n ). These considerations lead to the following definition of a class of codes, which contain the encoding that underlies the foregoing mapping f n !→ fˆn . 18
Recall that in Construction 7.11 we have H m = 2n , which may yield a noninteger m if we insist on H  = n/ε(n). This problem was avoided in the proof of Theorem 7.12 (where H  = n was used), but is more acute in the current context because of ε (e.g., we may have ε(n) = 2−2n/7 ). Thus, we should either relax the requirement H m = 2n (e.g., allow 2n ≤ H m < 22n ) or relax the requirement H  = n/ε(n). However, for the sake of simplicity, we ignore this issue in the presentation.
267
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
Definition 7.17 (efficient codes supporting implicit decoding): For fixed functions q, : N → N and α : N → (0, 1], the mapping : {0, 1}∗ → {0, 1}∗ is said to be efficient and supports implicit decoding with parameters q, , α if it satisfies the following two conditions: 1. Encoding (or efficiency): The mapping is polynomialtime computable. It is instructive to view as mapping N bit long strings to sequences of length (N ) over [q(N )], and to view each (codeword) (x) ∈ [q(x)](x) as a mapping from [(x)] to [q(x)]. 2. Decoding (in implicit form): There exists a polynomial p such that the following holds. For every w : [(N )] → [q(N )] and every x ∈ {0, 1} N such that (x) is (1 − α(N ))close to w, there exists an oracleaided19 circuit C of size p((log N )/α(N )) such that, for every i ∈ [N ], it holds that C w (i) equals the i th bit of x. The encoding condition implies that is polynomially bounded. The decoding condition refers to any codeword that agrees with the oracle w : [(N )] → [q(N )] on an α(N ) fraction of the (N ) coordinates, where α(N ) may be very small. We highlight the nontriviality of the decoding condition: There are N bits of information in x, while the circuit C may “encode” only p((log N )/α(N )) bits of information about x. Thus, x is (implicitly) recovered by C based mainly on a highly corrupted version of (x). Furthermore, each desired bit of x is recovered (by C) by making at most p((log N )/α(N )) queries to this corrupted version of (x). We mention that the foregoing decoding condition is related to list decoding (as defined in Appendix E.1.1).20 Let us now relate the transformation of f n to fˆn , which underlies Proposition 7.16, to Definition 7.17. We view f n as a binary string of length N = 2n (representing the truth table m 3 of f n : H m → {0, 1}) and analogously view fˆn : F m → F as an element of F F = F N (or as a mapping from [N 3 ] to [F]).21 Recall that the transformation of f n to fˆn is efficient. We mention that this transformation also supports implicit decoding with parameters q, , α such that (N ) = N 3 , α(N ) = ε(n), and q(N ) = (n/ε(n))3 , where N = 2n . The latter fact is highly nontrivial, but establishing it is beyond the scope of the current text (and the interested reader is referred to [218]). We mention that the transformation of f n to fˆn enjoys additional features, which are not required in Definition 7.17 and will not be used in the current context. For example, there are at most O(1/α(2n )2 ) codewords (i.e., fˆn ’s) that are (1 − α(2n ))close to any fixed w : [(2n )] → [q(2n )], and the corresponding oracleaided circuits can 19 Oracleaided circuits are defined analogously to oracle Turing machines. Alternatively, we may consider here oracle machines that take advice such that both the advice length and the machine’s running time are upperbounded by p((log N )/α(N )). The relevant oracles may be viewed either as blocks of binary strings that encode sequences over [q(N )] or as sequences over [q(N )]. Indeed, in the latter case we consider nonbinary oracles, which return elements in [q(N )]. 20 Recall that, on input w ∈ [q(N )](N ) , a listdecoding algorithm for : {0, 1} N → [q(N )](N ) is required to output the list L w containing every string x ∈ {0, 1} N such that (x) agrees with w on α(N ) fraction of the locations. Turning to the foregoing decoding condition (of Definition 7.17), note that it requires outputting (bits of) one specific string x ∈ L w . This can be obtained by hardwiring (in the listdecoding circuit) the index of x in L w , while taking advantage of the fact that the circuit may depend on x and w. Note, however, that the circuit obtained in this way may not satisfy the stringent decoding condition of Definition 7.17, which requires a circuit of p((log N )/α(N ))size. On the other hand, the decoding condition does not refer to the complexity of obtaining the aforementioned oracleaided circuits (and, in particular, may not yield a listdecoding algorithm). 21 Recall that N = 2n = H m and F = H 3 . Hence, Fm = N 3 .
268
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
be constructed in probabilistic p(n/α(2n ))time.22 These results are termed “list decoding with implicit representations” (and we refer the interested reader again to [218]). Our focus is on showing that efficient codes that supports implicit decoding suffice for worstcase to (strongly) averagecase reductions. We state and prove a general result, noting that in the special case of Proposition 7.16 gn = fˆn (and (2n ) = 23n ). Theorem 7.18: Suppose that there exists a Boolean function f in E having circuit complexity that is almosteverywhere greater than S, and let ε : N → (0, 1]. Consider a polynomial : N → N such that n !→ log2 (2n ) is a 11 map of the integers, and let m(n) = log2 (2n ); e.g., if (N ) = N 3 then m(n) = 3n. Suppose that the mapping : {0, 1}∗ → {0, 1}∗ is efficient and supports implicit decoding with parameters q, , α such that α(N ) = ε(log2 N ). Define gn : [(2n )] → [q(2n )] such n that gn (i) equals the i th element of ( f n ) ∈ [q(2n )](2 ) , where f n denotes the 2n bit long description of the truth table of f n . Then, the function g : {0, 1}∗ → {0, 1}∗ , defined by g(z) = gm −1 (z) (z), is computable in exponential time and for every family of circuit {Cn }n ∈N of size S (n ) = poly(ε(m −1 (n ))/n ) · S(m −1 (n )) it holds that def Pr[Cn (Un ) = g(Un )] < ε (n ) = ε(m −1 (n )). Proof Sketch: First note that we can generate the truth table of fn in exponential time, and by the encoding condition of it follows that gn can be evaluated in exponential time. The averagecase hardness of g is established via a reducibility argument as follows. We consider a circuit C = Cn of size S such that Pr[Cn (Un ) = g(Un )] < ε (n ), let n = m −1 (n ), and recall that ε (n ) = ε(n) = α(2n ). Then, C : {0, 1}n → n {0, 1} (viewed as a function) is (1 − α(2 ))close to the function gn , which in turn equals ( f n ). The decoding condition of asserts that we can recover each bit of f n (i.e., evaluate f n ) by an oracleaided circuit D of size p(n/α(2n )) that uses (the function) C as an oracle. Combining (the circuit C ) with the oracleaided circuit D, we obtain a (standard) circuit of size p(n/α(2n )) · S (n ) < S(n) that computes f n . The theorem follows (i.e., the violation of the conclusion regarding g implies the violation of the hypothesis regarding f ). Advanced comment. For simplicity, we formulated Definition 7.17 in a crude manner that suffices for proving Proposition 7.16, where q(N ) = ((log2 N )/α(N ))3 . The issue is the existence of codes that satisfy Definition 7.17: In general, such codes may exist only when using a more careful formulation of the decoding condition that refers to codewords that are (1 − ((1/q(N )) + α(N )))close to the oracle w : [(N )] → [q(N )] rather than being (1 − α(N ))close to it.23 Needless to say, the difference is insignificant in the case that 22
The construction may yield also oracleaided circuits that compute the decoding of codewords that are almost (1 − α(2n ))close to w. That is, there exists a probabilistic p(n/α(2n ))time algorithm that outputs a list of circuits that, with high probability, contains an oracleaided circuit for the decoding of each codeword that is (1 − α(2n ))close to w. Furthermore, with high probability, the list contains only circuits that decode codewords that are (1 − α(2n )/2)close to w. 23 Note that this is the “right” formulation, because in the case that α(N ) < 1/q(N ) it seems impossible to satisfy the decoding condition (as stated in Definition 7.17). Specifically, a random (N )sequence over [q(N )] is expected to be (1 − (1/q(N )))close to any fixed codeword, and with overwhelmingly high probability it will be (1 − ((1 − o(1))/q(N )))close to almost all the codewords, provided (N ) q(N )2 . But in case N > poly(q(N )),
269
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
α(N ) 1/q(N ) (as in Proposition 7.16), but it is significant in case we care about binary codes (i.e., q(N ) = 2, or codes over other small alphabets). We mention that Theorem 7.18 can be adapted to this context (of q(N ) = 2), and directly yields strongly inapproximable predicates. For details, see Exercise 7.21.
7.2.2. Amplification with Respect to ExponentialSize Circuits For the purpose of stronger derandomization of BPP, we start with a stronger assumption regarding the worstcase circuit complexity of E and turn it to a stronger inapproximability result. Theorem 7.19: Suppose that there exists a Boolean function f in E having almosteverywhere exponential circuit complexity; that is, there exists a constant b > 0 such that, for all but finitely many n’s, any circuit that correctly computes f on def {0, 1}n has size at least 2b·n . Then, for some constant c > 0 and T (n) = 2c·n , there exists a T inapproximable Boolean function in E. Theorem 7.19 can be used for deriving a full derandomization of BPP (i.e., BPP = P) under the aforementioned assumption (see Part 1 of Theorem 8.19). Theorem 7.19 follows as a special case of Proposition 7.16 (combined with Theorem 7.8; see Exercise 7.22). An alternative proof, which uses different ideas that are of independent interest, will be briefly reviewed next. The starting point of the latter proof is a mildly inapproximable predicate, as provided by Theorem 7.12. However, here we cannot afford to apply Yao’s XOR Lemma (i.e., Theorem 7.13), because the latter relates the size of circuits that strongly fail to approximate a predicate defined over poly(n)bit long strings to the size of circuits that fail to mildly approximate a predicate defined over nbit long strings. That is, Yao’s XOR Lemma asserts that if f : {0, 1}n → {0, 1} is mildly inapproximable by S f size circuits then F : {0, 1}poly(n) → {0, 1} is strongly inapproximable by S F size circuits, where S F (poly(n)) is polynomially related to S f (n). In particular, S F (poly(n)) < S f (n) seems inherent in this reasoning. For the case of polynomial lower bounds, this is good enough (i.e., if S f can be an arbitrarily large polynomial then so can S F ), but for S f (n) = exp((n)) we cannot obtain S F (m) = exp((m)) (but rather only obtain S F (m) = exp(m (1) )). The source of trouble is that amplification of inapproximability was achieved by taking a polynomial number of independent instances. Indeed, we cannot hope to amplify hardness without applying f on many instances, but these instances need not be independent. Thus, poly(n) the idea is to define F(r ) = ⊕i=1 f (xi ), where x1 , . . . , xpoly(n) ∈ {0, 1}n are generated from r and still r  = O(n). That is, we seek a “derandomized” version of Yao’s XOR Lemma. In other words, we seek a “pseudorandom generator” of a type appropriate for expanding r to dependent xi ’s such that the XOR of the f (xi )’s is as inapproximable as it would have been for independent xi ’s.24
we cannot hope to recover almost all N bit long strings based on poly(q(N ) log N ) bits of advice (per each of them). 24 Indeed, this falls within the general paradigm discussed in Section 8.1. Furthermore, this suggestion provides another perspective on the connection between randomness and computational difficulty, which is the focus of much discussion in Chapter 8 (see, e.g., §8.2.7.2).
270
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
Teaching note: In continuation of footnote 24, we note that there is a strong connection between the rest of this section and Chapter 8. On top of the aforementioned conceptual aspect, we will use technical tools from Chapter 8 toward establishing the derandomized version of the XOR Lemma. These tools include pairwise independence generators (see Section 8.5.1), random walks on expanders (see Section 8.5.3), and the NisanWigderson Construction (Construction 8.17). Indeed, recall that Section 7.2.2 is advanced material, which is best left for independent reading.
The pivot of the proof is the notion of a hard region of a Boolean function. Loosely speaking, S is a hard region of a Boolean function f if f is strongly inapproximable on a random input in S; that is, for every (relatively) small circuit Cn , it holds that Pr[Cn (Un ) = f (Un )Un ∈ S] ≈ 1/2. By definition, {0, 1}∗ is a hard region of any strongly inapproximable predicate. As we shall see, any mildly inapproximable predicate has a hard region of density related to its inapproximability parameter. Loosely speaking, hardness amplification will proceed via methods for generating related instances that hit the hard region with sufficiently high probability. But, first let us study the notion of a hard region.
7.2.2.1. Hard Regions We actually generalize the notion of hard regions to arbitrary distributions. The important special case of uniform distributions (on nbit long strings) is obtained from Definition 7.20 by letting X n equal Un (i.e., the uniform distribution over {0, 1}n ). In general, we only assume that X n ∈ {0, 1}n . Definition 7.20 (hard region relative to arbitrary distribution): Let f : {0, 1}∗ → {0, 1} be a Boolean predicate, {X n }n∈N be a probability ensemble, s : N → N and ε : N → [0, 1]. • We say that a set S is a hard region of f relative to {X n }n∈N with respect to s(·)size circuits and advantage ε(·) if for every n and every circuit Cn of size at most s(n), it holds that 1 Pr[Cn (X n ) = f (X n )X n ∈ S] ≤ + ε(n). 2 • We say that f has a hard region of density ρ(·) relative to {X n }n∈N (with respect to s(·)size circuits and advantage ε(·)) if there exists a set S that is a hard region of f relative to {X n }n∈N (with respect to the foregoing parameters) such that Pr[X n ∈ Sn ] ≥ ρ(n). Note that a Boolean function f is (s, 1 − 2ε)inapproximable if and only if {0, 1}∗ is a hard region of f relative to {Un }n∈N with respect to s(·)size circuits and advantage ε(·). Thus, strongly inapproximable predicates (e.g., Sinapproximable predicates for superpolynomial S) have a hard region of density 1 (with respect to a negligible advantage).25 But this trivial observation does not provide hard regions (with respect to a small (i.e., close to zero) advantage) for mildly inapproximable predicates. Providing such hard regions is the contents of the following theorem. 25
Likewise, mildly inapproximable predicates have a hard region of density 1 with respect to an advantage that is noticeably smaller than 1/2.
271
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
Theorem 7.21 (hard regions for mildly inapproximable predicates): Let f : {0, 1}∗ → {0, 1} be a Boolean predicate, {X n }n∈N be a probability ensemble, s : N → N, and ρ : N → [0, 1] such that ρ(n) > 1/poly(n). Suppose that, for every circuit Cn of size at most s(n), it holds that Pr[Cn (X n ) = f (X n )] ≤ 1 − ρ(n). Then, for every ε : N → (0, 1], the function f has a hard region of density ρ (·) relative to {X n }n∈N with respect to s (·)size circuits and advantage ε(·), where def def ρ (n) = (1 − o(1)) · ρ(n) and s (n) = s(n)/poly(n/ε(n)). In particular, if f is (s, 2ρ)inapproximable then f has a hard region of density ρ (·) ≈ ρ(·) relative to the uniform distribution (with respect to s (·)size circuits and advantage ε(·)). Proof Sketch: 26 The proof proceeds by first establishing that {X n } is “related” to (or rather “dominates”) an ensemble {Yn } such that f is strongly inapproximable on {Yn }, and next showing that this implies the claimed hard region. Indeed, this notion of “related ensembles” plays a central role in the proof. For ρ : N → [0, 1], we say that {X n } ρ dominates {Yn } if for every x it holds that Pr[X n = x] ≥ ρ(n) · Pr[Yn = x]. In this case we also say that {Yn } is ρ dominated by {X n }. We say that {Yn } is critically ρ dominated by {X n } if for every x either Pr[Yn = x] = (1/ρ(n)) · Pr[X n = x] or Pr[Yn = x] = 0.27 The notions of domination and critical domination play a central role in the proof, which consists of two parts. In the first part (Claim 7.21.1), we prove that, for {X n } and ρ as in the theorem’s hypothesis, there exists an ensemble {Yn } that is ρdominated by {X n } such that f is strongly inapproximable on {Yn }. In the second part (Claim 7.21.2), we prove that the existence of such a dominated ensemble implies the existence of an ensemble {Z n } that is critically ρ dominated by {X n } such that f is strongly inapproximable on {Z n }. Finally, we note that such a critically dominated ensemble yields a hard region of f relative to {X n }, and the theorem follows. Claim 7.21.1: Under the hypothesis of the theorem it holds that there exists a prob
ability ensemble {Yn } that is ρdominated by {X n } such that, for every s (n)size circuit Cn , it holds that
Pr[Cn (Yn ) = f (Yn )] ≤
1 ε(n) + 2 2 .
(7.13)
Proof: We start by assuming, toward the contradiction, that for every distribution Yn
that is ρdominated by X n there exists a s (n)size circuit Cn such that Pr[Cn (Yn ) = f (Yn )] > 0.5 + ε (n), where ε (n) = ε(n)/2. One key observation is that there is a correspondence between the set of all distributions that are each ρdominated by X n and the set of all the convex combination of critically ρdominated (by X n ) distributions; that is, each ρdominated distribution is a convex combination of critically ρdominated distributions and vice versa (cf., a special case in §D.4.1.1). Thus, considering an enumeration Yn(1) , . . . , Yn(t) of the critically ρdominated (by X n ) distributions, we conclude that for every distribution π on [t] there exists a 26
See details in [102, Apdx. A]. Actually, we should allow one point of exception, that is, relax the requirement by saying that for at most one string x ∈ {0, 1}n it holds that 0 < Pr[Yn = x] < Pr[X n = x]/ρ(n). This point has little effect on the proof, and is ignored in our presentation. 27
272
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
s (n)size circuit Cn such that t ! π(i) · Pr Cn Yn(i) = f Yn(i) > 0.5 + ε (n).
(7.14)
i=1
Now, consider a finite game between two players, where the first player selects a critically ρdominated (by X n ) distribution, and the second player selects a s (n)size circuit and obtains a payoff as determined by the corresponding success probability; that is, if the first player selects the i th critically dominated distribution and the second player selects the circuit C then the payoff equals Pr[C(Yn(i) ) = f (Yn(i) )]. Eq. (7.14) may be interpreted as saying that for any randomized strategy for the first player there exists a deterministic strategy for the second player yielding average payoff greater than 0.5 + ε (n). The MinMax Principle (cf. von Neumann [235]) asserts that in such a case there exists a randomized strategy for the second player that yields average payoff greater than 0.5 + ε (n) no matter what strategy is employed by the first player. This means that there exists a distribution, denoted Dn , on s (n)size circuits such that for every i it holds that ! Pr Dn Yn(i) = f Yn(i) > 0.5 + ε (n), (7.15) where the probability refers both to the choice of the circuit Dn and to the random variable Yn . Let Bn = {x : Pr[Dn (x) = f (x)] ≤ 0.5 + ε (n)}. Then, Pr[X n ∈ Bn ] < ρ(n), because otherwise we reach a contradiction to Eq. (7.15) by defining Yn such that Pr[Yn = x] = Pr[X n = x]/Pr[X n ∈ Bn ] if x ∈ Bn and Pr[Yn = x] = 0 otherwise.28 By employing standard amplification to Dn , we obtain a distribution Dn over poly(n/ε (n)) · s (n)size circuits such that for every x ∈ {0, 1}n \ Bn it holds that Pr[Dn (x) = f (x)] > 1 − 2−n . It follows that there exists a s(n)sized circuit Cn such that Cn (x) = f (x) for every x ∈ {0, 1}n \ Bn , which implies that Pr[Cn (X n ) = f (X n )] ≥ Pr[X n ∈ {0, 1}n \ Bn ] > 1 − ρ(n), in contradiction to the 2 theorem’s hypothesis. The claim follows. We next show that the conclusion of Claim 7.21.1 (which was stated for ensembles that are ρdominated by {X n }) essentially holds also when allowing only critically ρdominated (by {X n }) ensembles. The following precise statement involves some loss in the domination parameter ρ (as well as in the advantage ε). Claim 7.21.2: If there exists a probability ensemble {Yn } that is ρdominated by
{X n } such that for every s (n)size circuit Cn it holds that Pr[Cn (Yn ) = f (Yn )] ≤ 0.5 + (ε(n)/2), then there exists a probability ensemble {Z n } that is critically ρ dominated by {X n } such that for every s (n)size circuit Cn it holds that Pr[Cn (Z n ) = f (Z n )] ≤ 0.5 + ε(n). In other words, Claim 7.21.2 asserts that the function f has a hard region of density ρ (·) relative to {X n } with respect to s (·)size circuits and advantage ε(·), thus establishing the theorem. The proof of Claim 7.21.2 uses the Probabilistic Method (cf. [11]). Specifically, we select a set Sn at random by including each nbit long string x with probability def ρ(n) · Pr[Yn = x] p(x) = ≤1 (7.16) Pr[X n = x] Note that Yn is ρdominated by X n , whereas by the hypothesis Pr[Dn (Yn ) = f (Yn )] ≤ 0.5 + ε (n). Using the fact that any ρdominated distribution is a convex combination of critically ρdominated distributions, it follows that (i) (i) (i) Pr[Dn (Yn ) = f (Yn )] ≤ 0.5 + ε (n) holds for some critically ρdominated Yn . 28
273
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
independently of the choice of all other strings. It can be shown that, with high probability over the choice of Sn , it holds that Pr[X n ∈ Sn ] ≈ ρ(n) and that Pr[Cn (X n ) = f (X n )X n ∈ Sn ] < 0.5 + ε(n) for every circuit Cn of size s (n). The latter assertion is proved by a union bound on all relevant circuits, while showing that for each such circuit Cn , with probability 1 − exp(−s (n)2 ) over the choice of Sn , it holds that Pr[Cn (X n ) = f (X n )X n ∈ Sn ] − Pr[Cn (Yn ) = f (Yn )] < ε(n)/2. For details, see [102, Apdx. A]. (This completes the proof of the theorem.)
7.2.2.2. Hardness Amplification via Hard Regions Before showing how to use the notion of a hard region in order to prove a derandomized version of Yao’s XOR Lemma, we show how to use it in order to prove the original version of Yao’s XOR Lemma (i.e., Theorem 7.13). An alternative proof of Yao’s XOR Lemma. Let f , p1 , and p2 be as in Theorem 7.13. Then, by Theorem 7.21, for ρ (n) = 1/3 p2 (n) and s (n) = p1 (n)(1) /poly(n), the function f has a hard region S of density ρ (relative to {Un }) with respect to s (·)size circuits and advantage 1/s (·). Thus, for t(n) = n · p2 (n) and F as in Theorem 7.13, with probability at least 1 − (1 − ρ (n))t(n) = 1 − exp(−(n)), one of the t(n) random (nbit long) blocks of F resides in S (i.e., the hard region of f ). Intuitively, this suffices for establishing the strong inapproximability of F. Indeed, suppose toward the contradiction that a small (i.e., p (t(n) · n)size) circuit Cn can approximate F (over Ut(n)·n ) with advantage ε(n) + exp(−(n)), where ε(n) > 1/s (n). Then, the ε(n) term must be due to t(n) · nbit long inputs that contain a block in S. Using an averaging argument, we can first fix the index of this block and then the contents of the other blocks, and infer the following: For some i ∈ [t(n)] and x1 , . . . , xt(n) ∈ {0, 1}n it holds that
Pr[Cn (x , Un , x ) = F(x , Un , x )  Un ∈ S] ≥
1 + ε(n) 2
where x = (x1 , . . . , xi−1 ) ∈ {0, 1}(i−1)·n and x = (xi+1 , . . . , xt(n) ) ∈ {0, 1}(t(n)−i)·n . Hardwiring i ∈ [t(n)], x = (x1 , . . . , xi−1 ) and x = (xi+1 , . . . , xt(n) ) as well as def σ = ⊕ j=i f (x j ) in Cn , we obtain a contradiction to the (established) fact that S is a hard region of f (by using the circuit Cn (z) = Cn (x , z, x ) ⊕ σ ). Thus, Theorem 7.13 follows (for any p (t(n) · n) ≤ s (n) − 1). Derandomized versions of Yao’s XOR Lemma. We first show how to use the notion of a hard region in order to amplify very mild inapproximability to a constant level of inapproximability. Recall that our goal is to obtain such an amplification while applying the given function on many (related) instances, where each instance has length that is linearly related to the length of the input of the resulting function. Indeed, these related instances are produced by applying an adequate “pseudorandom generator” (see Chapter 8). The following amplification utilizes a pairwise independence generator (see Section 8.5.1), denoted G, that stretches 2nbit long seeds to sequences of n strings, each of length n. Lemma 7.22 (derandomized XOR Lemma up to constant inapproximability): Suppose that f : {0, 1}∗ → {0, 1} is (T, ρ)inapproximable, for ρ(n) > 1/poly(n), and assume for simplicity that ρ(n) ≤ 1/n. Let b denote the innerproduct mod 2 predicate, and G be the aforementioned pairwise independence generator. Then F1 (s, r ) = b( f (x1 ) · · · f (xn ), r ), where r  = n = s/2 and (x1 , . . . , xn ) =
274
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
7.2. HARD PROBLEMS IN E
G(s), is (T , ρ )inapproximable for T (n ) = T (n /3)/poly(n ) and ρ (n ) = (n · ρ(n /3)). Needless to say, if f ∈ E then F1 ∈ E. By applying Lemma 7.22 for a constant number of times, we may transform a (T, 1/poly)inapproximable predicate into a (T , (1))inapproximable one, where T (n ) = T (n /O(1))/poly(n ). Proof Sketch: As in the foregoing proof (of the original version of Yao’s XOR Lemma), we first apply Theorem 7.21. This time we set the parameters so as to infer that, for α(n) = ρ(n)/3 and t (n) = T (n)/poly(n), the function f has a hard region S of density α (relative to {Un }) with respect to t (·)size circuits and advantage 0.01. Next, as in §7.2.1.2, we shall consider the corresponding (derandomized) direct product problem; that is, the function P1 (s) = ( f (x1 ), . . . , f (xn )), where s = 2n and (x1 , . . . , xn ) = G(s). We will first show that P1 is hard to compute on an (n · α(n)) fraction of the domain, and the quantified inapproximality of F1 will follow. def One key observation is that, by Exercise 7.23, with probability at least β(n) = n · α(n)/2, at least one of the n strings output by G(U2n ) resides in S. Intuitively, we expect every t (n)sized circuit to fail in computing P1 (U2n ) with probability at least 0.49β(n), because with probability β(n) the sequence G(U2n ) contains an element in the hard region of f (and in this case the value can be guessed correctly with probability at most 0.51). The actual proof relies on a reducibility argument, which is less straightforward than the argument used in the nonderandomized case. For technical reasons,29 we use the condition α(n) < 1/2n (which is guaranteed by the hypothesis that ρ(n) ≤ 1/n and our setting of α(n) = ρ(n)/3). In this case def Exercise 7.23 implies that, with probability at least β(n) = 0.75 · n · α(n), at least one of the n strings output by G(U2n ) resides in S. We shall show that every (t (n) − poly(n))sized circuit fails in computing P1 with probability at least γ (n) = 0.3β(n). As usual, the claim is proved by a reducibility argument. Let G(s)i denote the i th string in the sequence G(s) (i.e., G(s) = (G(s)1 , . . . , G(s)n )), and note that def given i and x we can efficiently sample G i−1 (x) = {s ∈ {0, 1}2n : G(s)i = x}. Given a circuit Cn that computes P1 (U2n ) correctly with probability 1 − γ (n), we consider the circuit Cn that, on input x, uniformly selects i ∈ [n] and s ∈ G i−1 (x), and outputs the i th bit in Cn (s). Then, by the construction (of Cn ) and the hypothesis regarding Cn , it holds that
Pr[Cn (Un ) =
f (Un )Un ∈ S] ≥
n 1 i=1
29
n
· Pr[Cn (U2n ) = P1 (U2n )G(U2n )i ∈ S]
≥
Pr[Cn (U2n ) = P1 (U2n ) ∧ ∃i G i (U2n )i ∈ S] n · maxi {Pr[G(U2n )i ∈ S]}
≥
(1 − γ (n)) − (1 − β(n)) n · α(n)
=
0.7β(n) > 0.52 . n · α(n)
The following argument will rely on the fact that β(n) − γ (n) > 0.51n · α(n), where γ (n) = (β(n)).
275
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
This contradicts the fact that S is a hard region of f with respect to t (·)size circuits and advantage 0.01. Thus, we have established that every (t (n) − poly(n))sized circuit fails in computing P1 with probability at least γ (n) = 0.3β(n). Having established the hardness of P1 , we now infer the mild inapproximability of F1 , where F1 (s, r ) = b(P1 (s), r ). It suffices to employ the simple (warmup) case discussed at the beginning of the proof of Theorem 7.7 (where the predictor errs with probability less than 1/4, rather than the fullfledged result that refers to a prediction error that is only smaller than 1/2). Denoting by ηC (s) = Prr ∈{0,1}n [C(s, r ) = b(P1 (s), r )] the prediction error of the circuit C, we recall that if ηC (s) ≤ 0.24 then C can be used to recover P1 (s). Thus, for circuits C of size T (3n) = t (n)/poly(n) it must hold that Prs [ηC (s) > 0.24] ≥ γ (n). It follows that Es [ηC (s)] > 0.24γ (n), which means that every T (3n)sized circuits fails def to compute (s, r ) !→ b(P1 (s), r ) with probability at least δ(s + r ) = 0.24 · γ (r ). This means that F1 is (T , 2δ)inapproximable, and the lemma follows (when noting that δ(n ) = (n · α(n /3))). The next lemma offers an amplification of constant inapproximability to strong inapproximability. Indeed, combining Theorem 7.12 with Lemmas 7.22 and 7.23 yields Theorem 7.19 (as a special case). Lemma 7.23 (derandomized XOR Lemma starting with constant inapproximability): Suppose that f : {0, 1}∗ → {0, 1} is (T, ρ)inapproximable, for some constant ρ, and let b denote the innerproduct mod 2 predicate. Then there exists an exponentialtime computable function G such that F2 (s, r ) = b( f (x1 ) · · · f (xn ), r ), where (x1 , . . . , xn ) = G(s) and n = (s) = r  = x1  = · · · = xn , is T inapproximable for T (n ) = T (n /O(1))(1) /poly(n ). Again, if f ∈ E then F2 ∈ E. Proof Outline:30 As in the proof of Lemma 7.22, we start by establishing a hard region of density ρ/3 for f (this time with respect to circuits of size T (n)(1) / poly(n) and advantage T (n)−(1) ), and focus on the analysis of the (derandomized) direct product problem corresponding to computing the function P2 (s) = ( f (x1 ), . . . , f (xn )), where s = O(n) and (x1 , . . . , xn ) = G(s). The “generator” G is defined such that G(s s ) = G 1 (s ) ⊕ G 2 (s ), where s  = s , G 1 (s ) = G 2 (s ), and the following conditions hold: 1. G 1 is the Expander Random Walk Generator discussed in Section 8.5.3. It can be shown that G 1 (U O(n) ) outputs a sequence of n strings such that for any set S of density ρ, with probability 1 − exp(−(ρn)), at least (ρn) of the strings hit S. Note that this property is inherited by G, provided G 1 (s ) = G 2 (s ) for any s  = s . It follows that, with probability 1 − exp(−(ρn)), a constant fraction of the xi ’s in the definition of P2 hit the hard region of f . It is tempting to say that small circuits cannot compute P2 better than with probability exp(−(ρn)), but this is clear only in the case that the xi ’s that hit the hard region are distributed independently (and uniformly) in it, which is hardly the case here. Indeed, G 2 is used to handle this problem. 30
For details, see [128].
276
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER NOTES
2. G 2 is the “set projection” system underlying Construction 8.17; specifically, G 2 (s) = (s S1 , . . . , s Sn ), where each Si is an nsubset of [s] and the Si ’s have pairwise intersections of size at most n/O(1).31 An analysis as in the proof of Theorem 8.18 can be employed for showing that the dependency among the xi ’s does not help for computing a particular f (xi ) when given xi as well as all the other f (x j )’s. (Note that this property of G 2 is inherited by G.) The actual analysis of the construction (via a guessing game presented in [128, Sec. 3]), links the success probability of computing P2 to the advantage of guessing f on its hard region. The interested reader is referred to [128]. Digest. Both Lemmas 7.22 and 7.23 are proved by first establishing corresponding derandomized versions of the “direct product” lemma (Theorem 7.14); in fact, the core of these proofs is proving adequate derandomized “direct product” lemmas. We call the reader’s attention to the seemingly crucial role of this step (especially in the proof of Lemma 7.23): We cannot treat the values f (x1 ), ... f (xn ) as if they were independent (at least not for the generator G as postulated in these lemmas), and so we seek to avoid analyzing the probability of correctly computing the XOR of all these values. In contrast, we have established that it is very hard to correctly compute all n values, and thus XORing a random subset of these values yields a strongly inapproximable predicate. (Note that the argument used in Exercise 7.17 fails here, because the xi ’s are not independent, which is the reason that we XOR a random subset of these values rather than all of them.)
Chapter Notes The notion of a oneway function was suggested by Diffie and Hellman [66]. The notion of weak oneway functions as well as the amplification of oneway functions (i.e., Theorem 7.5) were suggested by Yao [239]. A proof of Theorem 7.5 has first appeared in [87]. The concept of hardcore predicates was suggested by Blum and Micali [41]. They also proved that a particular predicate constitutes a hardcore for the “DLP function” (i.e., exponentiation in a finite field), provided that the latter function is oneway. The generic hardcore predicate (of Theorem 7.7) was suggested by Levin, and proven as such by Goldreich and Levin [99]. The proof presented here was suggested by Rackoff. We comment that the original proof has its own merits (cf., e.g., [105]). The construction of canonical derandomizers (see Section 8.3) and, specifically, the NisanWigderson framework (i.e., Construction 8.17) has been the driving force behind the study of inapproximable predicates in E. Theorem 7.10 is due to [22], whereas Theorem 7.19 is due to [128]. Both results rely heavily on variants of Yao’s XOR Lemma, to be reviewed next. Like several other fundamental insights32 attributed to Yao’s paper [239], Yao’s XOR Lemma (i.e., Theorem 7.13) is not even stated in [239] but is rather due to Yao’s oral presentations of his work. The first published proof of Yao’s XOR Lemma was given by Levin (see [102, Sec. 3]). The proof presented in §7.2.1.2 is due to Goldreich, Nisan, and 31
Recall that s S denotes the projection of s on coordinates S ⊆ [s]; that is, for s = σ1 · · · σk and S = {i j : j = 1, . . . , n}, we have s S = σi1 · · · σin . 32 Most notably, the equivalence of pseudorandomness and unpredictability (see Section 8.2.5).
277
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
Wigderson [102, Sec. 5]. For a recent (but brief) review of other proofs of Yao’s XOR Lemma (as well as variants of it), the interested reader is referred to [223]. The notion of a hard region and its applications to proving the original version of Yao’s XOR Lemma are due to Impagliazzo [126] (see also [102, Sec. 4]). The first derandomization of Yao’s XOR Lemma (i.e., Lemma 7.22) also originates in [126], while the second derandomization (i.e., Lemma 7.23) as well as Theorem 7.19 are due to Impagliazzo and Wigderson [128]. The worstcase to averagecase reduction (i.e., §7.2.1.1, yielding Theorem 7.12) is due to [22]. This reduction follows the selfcorrection paradigm of Blum, Luby, and Rubinfeld [40], which was first employed in the context of a (strict)33 worstcase to averagecase reduction by Lipton [157].34 The connection between list decoding and hardness amplification (i.e., §7.2.1.3), yielding alternative proofs of Theorems 7.10 and 7.19, is due to Sudan, Trevisan, and Vadhan [218]. Hardness amplification for N P has been the subject of recent attention: An amplification of mild inapproximability to strong inapproximability is provided in [121], and an indication of the impossibility of a worstcase to averagecase reductions (at least nonadaptive ones) is provided in [43].
Exercises Exercise 7.1: Prove that if one wayfunctions exist then there exist oneway functions that are length preserving (i.e.,  f (x) = x for every x ∈ {0, 1}n ). Guideline: Clearly, for some polynomial p, it holds that  f (x) < p(x) for all x. Assume, without loss of generality, that n !→ p(n) is 11 and increasing, and let p −1 (m) = n if p(n) ≤ m < p(n + 1). Define f (z) = f (x)01z− f (x)−1 , where x is the p −1 (z)bit long prefix of z.
Exercise 7.2: Prove that if a function f is hard to invert in the sense of Definition 7.3 then it is hard to invert in the sense of Definition 7.1. Guideline: Consider a sequence of internal coin tosses that maximizes the probability in Eq. (7.1).
Exercise 7.3: Assuming the existence of oneway functions, prove that there exists a weak oneway function that is not strongly oneway. 33
Earlier uses of the selfcorrection paradigm referred to “two argument problems” and consisted of fixing one argument and randomizing the other (see, e.g., [108]); consider, for example, the decision problem in which given (N , r ) the task is to determine whether x 2 ≡ r (mod N ) has an integer solution, and the randomized process mapping (N , r ) to (N , r ), where r = r · ω2 mod N and ω is uniformly distibuted in [N ]. Loosely speaking, such a process yields a reduction from worstcase complexity to “mixed worst/averagecase” complexity (or from “mixed average/worstcase” to pure averagecase). 34 An earlier use of the selfcorrection paradigm for a strict worstcase to averagecase reduction appears in [19], but it refers to very low complexity classes. Specifically, this reduction refers to the parity function and is computable in AC 0 (implying that parity cannot be approximated in AC 0 , since it cannot be computed in that class (see [83, 240, 115])). The reduction (randomly) maps x ∈ {0, 1}n , viewed as a sequence (x1 , x2 , x3 , . . . , xn ), to the sequence x = (x1 ⊕ r1 , r1 ⊕ x2 ⊕ r2 , r2 ⊕ x3 ⊕ r3 , . . . , rn−1 ⊕ xn ⊕ rn ), where r1 , . . . , rn ∈ {0, 1} are uniformly and independently distributed. Note that x is uniformly distributed in {0, 1}n and that parity(x) = parity(x ) ⊕ rn .
278
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Exercise 7.4 (a universal oneway function (by L. Levin)): Using the notion of a universal machine, present a polynomialtime computable function that is hard to invert (in the sense of Definition 7.1) if and only if there exist oneway functions. Guideline: Consider the function F that parses its input into a pair (M, x) and emulates x3 steps of M on input x. Note that if there exists a oneway function that can be evaluated in cubic time then F is a weak oneway function. Using padding, prove that there exists a oneway function that can be evaluated in cubic time if and only if there exist oneway functions.
Exercise 7.5: For > 1, prove that the following 2 − 1 samples are pairwise independent and uniformly distributed in {0, 1}n . The samples are generated by uniformly and independently selecting strings in {0, 1}n . Denoting these strings by s 1 , . . . , s , we generate 2 − 1 samples corresponding to the different nonempty subsets of {1, 2, . . . , } such def that for subset J we let r J = ⊕ j∈J s j .
Guideline: For J = J , it holds that r J ⊕ r J = ⊕ j∈K s j , where K denotes the symmetric difference of J and J . See related material in Section 8.5.1.
Exercise 7.6 (a variant on the proof of Theorem 7.7): Provide a detailed presentation of the alternative procedure outlined in footnote 5. That is, prove that for every x ∈ {0, 1}n , given oracle access to any Bx : {0, 1}n → {0, 1} that satisfies Eq. (7.6), this procedure makes poly(n/ε) steps and outputs a list of strings that, with probability at least 1/2, contains x. Exercise 7.7 (proving Theorem 7.8): Recall that the proof of Theorem 7.7 establishes the existence of a poly(n/ε)time oracle machine M such that, for every B : {0, 1}n → {0, 1} and every x ∈ {0, 1}n that satisfy Prr [B(r ) = b(x, r )] ≥ 12 + ε, it holds that Pr[M B (n, ε) = x] = (ε2 /n). Show that this implies Theorem 7.8. (Indeed, an alternative proof can be derived by adapting Exercise 7.6.) Guideline: Apply a “coupon collector” argument.
Exercise 7.8: A polynomialtime computable predicate b : {0, 1}∗ → {0, 1} is called a universal hardcore predicate if for every oneway function f , the predicate b is a hardcore of f . Note that the predicate presented in Theorem 7.7 is “almost universal” (i.e., for every oneway function f , that predicate is a hardcore of f (x, r ) = ( f (x), r ), where x = r ). Prove that there exists no universal hardcore predicate. Guideline: Let b be a candidate universal hardcore predicate, and let f be an arbitrary oneway function. Then consider the function f (x) = ( f (x), b(x)).
Exercise 7.9: Prove that if N P is not contained in P/poly then neither is E. Furthermore, for every S : N → N, if some problem in N P does not have circuits of size S then for some constant ε > 0 there exists a problem in E that does not have circuits of size S , where S (n) = S(n ε ). Repeat the exercise for the “almosteverywhere” case. Guideline: Although N P is not known to be in E, it is the case that SAT is in E, which implies that N P is reducible to a problem in E. For the “almosteverywhere” case, address the fact that the said reduction may not preserve the length of the input.
Exercise 7.10: For every function f : {0, 1}n → {0, 1}, present a linearsize circuit Cn such that Pr[C(Un ) = f (Un )] ≥ 0.5 + 2−n . Furthermore, for every t ≤ 2n−1 , present
279
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
a circuit Cn of size O(t · n) such that Pr[C(Un ) = f (Un )] ≥ 0.5 + t · 2−n . Warning: You may not assume that Pr[ f (Un ) = 1] = 0.5. Exercise 7.11 (selfcorrection of lowdegree polynomials): Let d, m be integers, and F def be a finite field of cardinality greater than t = dm + 1. Let p : F m → F be a polynomial of individual degree d, and α1 , . . . , αt be t distinct nonzero elements of F. 1. Show that, for every x, y ∈ F m , the value of p(x) can be efficiently computed from the values of p(x + α1 y), . . . , p(x + αt y), where x and y are viewed as mary vectors over F. 2. Show that, for every x ∈ F m and α ∈ F \ {0}, if we uniformly select r ∈ F m then the point x + αr is uniformly distributed in F m . Conclude that p(x) can be recovered based on t random points, where each point is uniformly distributed in F m . Exercise 7.12 (low degree extension): Prove that for any H ⊂ F and every function f : H m → F there exists an mvariate polynomial fˆ : F m → F of individual degree H  − 1 such that for every x ∈ H m it holds that fˆ(x) = f (x). Guideline: Define fˆ(x) = a∈H m δa (x) · f (a), where δa is an mvariate of individ= 1 whereas δa (x) = 0 for every x ∈ H m \ {a}. ual degree H  − 1 such that δa (a) m Specifically, δa1 ,...,am (x1 , . . . , xm ) = i=1 b∈H \{ai } ((xi − b)/(ai − b)).
Exercise 7.13: Suppose that fˆ and S are as in the conclusion of Theorem 7.12. Prove that there exists a Boolean function g in E that is (S , ε)inapproximable for S (n + O(log n )) = S (n )/n and ε(m) = 1/m 3 . Guideline: Consider the function g defined such that g(x, i) equals the i th bit of fˆ(x).
Exercise 7.14 (a generic application of Theorem 7.8): For any : N → N, let h : {0, 1}∗ → {0, 1}∗ be an arbitrary function (or even a randomized mapping) such that h(x) = (x) for every x ∈ {0, 1}∗ , and {X n }n∈N be a probability ensemble. Suppose that, for some s : N → N and ε : N → (0, 1], for every family of ssize circuits {Cn }n∈N and all sufficiently large n it holds that Pr[Cn (X n ) = h(X n )] ≤ ε(n). Suppose that s : N → N and ε : N → (0, 1] satisfy s (n + (n)) ≤ s(n)/poly(n/ε (n + (n))) and ε (n + (n)) = (n) · ε(n)(1) . Show that Theorem 7.8 implies that for every family of s size circuits {Cn }n ∈N and all sufficiently large n = n + (n) it holds that 1 + ε (n + (n)), 2 where b(y, r ) denotes the innerproduct mod 2 of y and r . Note that if X n is uniform over {0, 1}n then the predicate h (x, r ) = b(h(x), r ), where r  = h(x), is (s , 1 − 2ε )inapproximable. Conclude that, in this case, if ε(n) = 1/s(n) and s (n + (n)) = s(n)(1) /poly(n), then h is s inapproximable. Pr[Cn+(n) (X n , U(n) ) = b(h(X n ), U(n) )] ≤
Exercise 7.15 (reversing Exercise 7.14 (by Viola and Wigderson)): Let : N → N, h : {0, 1}∗ → {0, 1}∗ , {X n }n∈N , and b be as in Exercise 7.14. Let H (x, r ) = b(h(x), r ) and recall that in Exercise 7.14 we reduced guessing h to approximating H . Present a reduction in the opposite direction. That is, show that if H is (s, 1 − ε)inapproximable (over {X n }n∈N ) then every s size circuit succeeds in computing h (over {X n }n∈N ) with probability at most ε, where s (n) = s(n) − O((n)).
280
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
Guideline: As usual, start by assuming the existence of a s size circuit that computes h with success probability exceeding ε. Consider two correlated random variables X and Y , each distributed over {0, 1}(n) , where X represents the value of h(Un ) and Y represents the circuit’s guess for this value. Prove that, for a uniformly distributed def r ∈ {0, 1}(n) , it holds that Pr[b(X, r ) = b(Y, r )] = (1 + p)/2, where p = Pr[X = Y ].
Exercise 7.16 (derandomization via averaging arguments): Let C : {0, 1}n × {0, 1}m → {0, 1} be a circuit, which may represent a “probabilistic circuit” that processes the first input using a sequence of choices that are given as a second input. Let X and Z be two independent random variables distributed over {0, 1}n and {0, 1}m , respectively, and let χ be a Boolean predicate (which may represent a success event regarding the behavior of C). Prove that there exists a string z ∈ {0, 1}m such that for def C z (x) = C(x, z) it holds that Pr[χ(X, C z (X )) = 1] ≥ Pr[χ(X, C(X, Z )) = 1]. Exercise 7.17 (reducing “selective XOR” to “standard XOR”): Let f be a Boolean function, and b(y, r ) denote the innerproduct modulo 2 of the equallength strings y and def r . Suppose that F (x1 , . . . , xt(n) , r ) = b( f (x1 ) · · · f (xt(n) ), r ), where x1 , . . . , xt(n) ∈ {0, 1}n and r ∈ {0, 1}t(n) , is T inapproximable. Assuming that n !→ t(n) · n is 11, def prove that F(x) = F (x, 1t (x) ), where t (t(n) · n) = t(n), is T inapproximable for T (m) = T (m + t (m)) − O(m). Guideline: Reduce the approximation of F to the approximation of F. An important observation is that for any x = (x1 , . . . , xt(n) ), x = (x1 , . . . , xt(n) ), and r = r1 · · · rt(n) such that xi = xi if ri = 1, it holds that F (x, r ) = F(x ) ⊕ ⊕i:ri =0 f (xi ). This suggests a nonuniform reduction of F to F, which uses “adequate” z 1 , . . . , z t(n) ∈ {0, 1}n as well as the corresponding values f (z i )’s as advice. On input x1 , . . . , xt(n) , r1 · · · rt(n) , the reduction sets xi = xi if ri = 1 and xi = z i otherwise, makes the query x = (x1 , . . . , xt(n) ) to F, and returns F(x ) ⊕i:ri =0 f (z i ). Analyze this reduction in the case that z 1 , . . . , z t(n) ∈ {0, 1}n are uniformly distributed, and infer that they can be set to some fixed values (see Exercise 7.16).35
Exercise 7.18 (reducing “standard XOR” to “selective XOR”): In continuation of Exercise 7.17, show a reduction in the opposite direction. That is, for F and F as in Exercise 7.17, show that if F is T inapproximable then F is T inapproximable, where T (m + t (m)) = min(T (m) − O(m), exp(t (m)/O(1)))1/3 . Guideline: Reduce the approximation of F to the approximation of F , using the fact that for any x = (x1 , . . . , xt(n) ) and r = r1 · · · rt(n) it holds that ⊕i∈Sr f (xi ) = F (x, r ), where Sr = {i ∈ [t(n)] : ri = 1}. Note that, with probability 1 − exp(−(t(n)), the set Sr contains at least t(n)/3 indices. Thus, the XOR of t(n)/3 values of f can be reduced to the selective XOR of t(n) such values (by using some of the ideas used in Exercise 7.17 for handling the case that Sr  > t(n)/3). The XOR of t(n) values can be obtained by three XORs (of t(n)/3 values each), at the cost of decreasing the advantage by raising it to a power of three.
Exercise 7.19 (reducing “selective XOR” to direct product): Recall that, in §7.2.1.2, the approximation of the “selective XOR” predicate P was reduced to the guessing 35
That is, assume first that the reduction is given t(n) samples of the distribution (Un , f (Un )), and analyze its success probability on a uniformly distributed input (x, r ) = (x1 , . . . , xt(n) , r1 · · · rt(n) ). Next, apply Exercise 7.16 when X represents the distribution of the actual input (x, r ), and Z represents the distribution of the auxiliary sequence of samples.
281
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
THE BRIGHT SIDE OF HARDNESS
of the value of the direct product function P. Present a reduction in the opposite direction. That is, for P and P as in §7.2.1.2, show that if P is T inapproximable then every T size circuit succeeds in computing P with probability at most 1/T , where T = (T ). Guideline: Use Exercise 7.15.
Exercise 7.20 (Theorem 7.14 versus Theorem 7.5): Consider a generalization of Theorem 7.14 in which f and P are functions from strings to sets of strings such that P(x1 , . . . , xt ) = f (x1 ) × · · · × f (xt ). 1. Prove that if for every family of p1 size circuits, {Cn }n∈N , and all sufficiently large n ∈ N, it holds that Pr[Cn (Un ) ∈ f (Un )] > 1/ p2 (n) then for every family of p size circuits, {Cm }m∈N , it holds that Pr[Cm (Um ) ∈ P(Um )] < ε (m), where ε and p are as in Theorem 7.14. Further generalize the claim by replacing {Un }n∈N with an arbitrary distribution ensemble {X n }n∈N , and replacing Um by a t(n)fold Cartesian product of X n (where m = t(n) · n). 2. Show that the foregoing generalizes both Theorem 7.14 and a nonuniform complexity version of Theorem 7.5. Exercise 7.21 (refinement of the main theme of §7.2.1.3): Consider the following modification of Definition 7.17, in which the decoding condition refers to an agreement threshold of (1/q(N )) + α(N ) rather than to a threshold of α(N ). The modified definition reads as follows (where p is a fixed polynomial): For every w : [(N )] → [q(N )] and x ∈ {0, 1} N such that (x) is (1 − ((1/q(N )) + α(N )))close to w, there exists an oracleaided circuit C of size p((log N )/α(N )) such that C w (i) yields the i th bit of x for every i ∈ [N ]. 1. Formulate and prove a version of Theorem 7.18 that refers to the modified definition (rather than to the original one). Guideline: The modified version should refer to computing g(Um(n) ) with success probability greater than (1/q(n)) + ε(n) (rather than greater than ε(n)).
2. Prove that, when applied to binary codes (i.e., q ≡ 2), the version in Item 1 yields S inapproximable predicates, for S (n ) = S(m −1 (n ))(1) /poly(n ). 3. Prove that the Hadamard code allows implicit decoding under the modified definition (but not according to the original one).36 Guideline: This is the actual contents of Theorem 7.8.
Show that if : {0, 1} N → [q(N )](N ) is a (nonbinary) code that allows implicit decoding then encoding its symbols by the Hadamard code yields a binary code &log q(N )' ({0, 1} N → {0, 1}(N )·2 2 ) that allows implicit decoding. Note that efficient encoding is preserved only if q(N ) ≤ poly(N ). Exercise 7.22 (using Proposition 7.16 to prove Theorem 7.19): Prove Theorem 7.19 by combining Proposition 7.16 and Theorem 7.8. Guideline: Note that, for some γ > 0, Proposition 7.16 yields an exponentialtime computable function fˆ such that  fˆ(x) ≤ x and for every family of circuit {Cn }n ∈N 36
Needless to say, the Hadamard code is not efficient (for the trivial reason that its codewords have exponential length).
282
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
EXERCISES
of size S (n ) = S(n /3)γ /poly(n ) it holds that Pr[Cn (Un ) = fˆ(Un )] < 1/S (n ). Combining this with Theorem 7.8, infer that P(x, r ) = b( fˆ(x), r ), where r  =  fˆ(x) ≤ x, is S inapproximable for S (n ) = S(n /2)(1) /poly(n ). Note that if S(n) = 2(n) then S (n ) = 2(n ) .
Exercise 7.23: Let G be a pairwise independent generator (i.e., as in Lemma 7.22), def S ⊂ {0, 1}n and α = S/2n . Prove that, with probability at least min(n · α, 1)/2, at least one of the n strings output by G(U2n ) resides in S. Furthermore, if α ≤ 1/2n then this probability is at least 0.75 · n · α. Guideline: Using the pairwise independence property and employing the Inclusion Exclusion formula, we lowerbound the aforementioned probability by n · α − n2 · α 2 . If α ≤ 1/n then the claim follows; otherwise we employ the same reasoning to the first 1/α elements in the output of G(U2n ).
Exercise 7.24 (oneway functions versus inapproximable predicates): Prove that the existence of a nonuniformly hard oneway function (as in Definition 7.3) implies the existence of an exponentialtime computable predicate that is T inapproximable (as per Definition 7.9), for every polynomial T . Guideline: Suppose first that the oneway function f is length preserving and 11. Consider the hardcore predicate b guaranteed by Theorem 7.7 for g(x, r ) = ( f (x), r ), define the Boolean function h such that h(z) = b(g −1 (z)), and show that h is T inapproximable for every polynomial T . For the general case a different approach seems needed. Specifically, given a (lengthpreserving) oneway function f , consider the Boolean function h defined as h(z, i, σ ) = 1 if and only if the i th bit of the lexicographically first element in f −1 (z) = {x : f (x) = z} equals σ . (In particular, if f −1 (z) = ∅ then h(z, i, σ ) = 0 for every i and σ .)37 Note that h is computable in exponential time, but is not (worstcase) computable by polynomialsize circuits. Applying Theorem 7.10, we are done. Thus, h may be easy to compute in the averagecase sense (e.g., if f (x) = 0x f (x) for some oneway function f ). 37
283
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
CHAPTER EIGHT
Pseudorandom Generators
Indistinguishable things are identical.1 G. W. Leibniz (1646–1714) A fresh view at the question of randomness has been taken by Complexity Theory: It has been postulated that a distribution is random (or rather pseudorandom) if it cannot be told apart from the uniform distribution by any efficient procedure. Thus, (pseudo)randomness is not an inherent property of an object, but is rather subjective to the observer. At the extreme, this approach says that the question of whether the world is deterministic or allows for some free choice (which may be viewed as sources of randomness) is irrelevant. What matters is how the world looks to us and to various computationally bounded devices. That is, if some phenomenon looks random, then we may just treat it as if it were random. Likewise, if we can generate sequences that cannot be told apart from the uniform distribution by any efficient procedure, then we can use these sequences in any efficient randomized application instead of the ideal coin tosses that are postulated in the design of this application. The pivot of the foregoing approach is the notion of computational indistinguishability, which refers to pairs of distributions that cannot be told apart by efficient procedures. The most fundamental incarnation of this notion associates efficient procedures with polynomialtime algorithms, but other incarnations that restrict attention to other classes of distinguishing procedures also lead to important insights. Likewise, the effective generation of pseudorandom objects, which is of major concern, is actually a general paradigm with numerous useful incarnations (which differ in the Computational Complexity limitations imposed on the generation process). Summary: Pseudorandom generators are efficient deterministic procedures that stretch short random seeds into longer pseudorandom sequences. Thus, a generic formulation of pseudorandom generators consists of specifying three fundamental aspects – the stretch measure of the generators; the class of distinguishers that the generators are supposed to fool (i.e., the algorithms with respect to which the computational indistinguishability requirement should hold); and the resources that the generators are allowed to use (i.e., their own computational complexity). 1
This is Leibniz’s Principle of Identity of Indiscernibles. Leibniz admits that counterexamples to this principle are conceivable but will not occur in real life because God is much too benevolent. We thus believe that he would have agreed to the theme of this chapter, which asserts that indistinguishable things should be considered as if they were identical.
284
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION
The archetypical case of pseudorandom generators refers to efficient generators that fool any feasible procedure; that is, the potential distinguisher is any probabilistic polynomialtime algorithm, which may be more complex than the generator itself (which, in turn, has time complexity bounded by a fixed polynomial). These generators are called generalpurpose, because their output can be safely used in any efficient application. Such (generalpurpose) pseudorandom generators exist if and only if oneway functions exist. In contrast to such (generalpurpose) pseudorandom generators, for the purpose of derandomization a relaxed definition of pseudorandom generators suffices. In particular, for such a purpose, one may use pseudorandom generators that are somewhat more complex than the potential distinguisher (which represents a randomized algorithm to be derandomized). Following this approach, adequate pseudorandom generators yield a full derandomization of BPP (i.e., BPP = P), and such generators can be constructed based on the assumption that some problems in E have no subexponentialsize circuits. It is also beneficial to consider pseudorandom generators that fool spacebounded distinguishers and generators that exhibit some limited random behavior (e.g., outputting a pairwise independent or a smallbias sequence). Such (specialpurpose) pseudorandom generators can be constructed without relying on any computational complexity assumption.
Introduction The “question of randomness” has been puzzling thinkers for ages. Aspects of this question range from philosophical doubts regarding the existence of randomness (in the world) and reflections on the meaning of randomness (in our thinking) to technical questions regarding the measuring of randomness. Among many other things, the second half of the twentieth century has witnessed the development of three theories of randomness, which address different aspects of the foregoing question. The first theory (cf., [63]), initiated by Shannon [204], views randomness as representing lack of information, which in turn is modeled by a probability distribution on the possible values of the missing data. Indeed, Shannon’s Information Theory is rooted in probability theory. Information Theory is focused at distributions that are not perfectly random (i.e., encode information in a redundant manner), and characterizes perfect randomness as the extreme case in which the information contents is maximized (i.e., in this case there is no redundancy at all). Thus, perfect randomness is associated with a unique distribution – the uniform one. In particular, by definition, one cannot (deterministically) generate such perfect random strings from shorter random seeds. The second theory (cf., [153, 156]), initiated by Solomonoff [210], Kolmogorov [147], and Chaitin [51], views randomness as representing lack of structure, which in turn is reflected in the length of the most succinct and effective description of the object. The notion of a succinct and effective description refers to a process that transforms the succinct description to an explicit one. Indeed, this theory of randomness is rooted in computability theory and specifically in the notion of a universal language (equiv., universal machine or computing device; see §1.2.3.4). It measures the randomness (or
285
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
PSEUDORANDOM GENERATORS
complexity) of objects in terms of the shortest program (for a fixed universal machine) that generates the object.2 Like Shannon’s theory, Kolmogorov Complexity is quantitative and perfect random objects appear as an extreme case. However, following Kolmogorov’s approach one may say that a single object, rather than a distribution over objects, is perfectly random. Still, by definition, one cannot (deterministically) generate strings of high Kolmogorov Complexity from short random seeds. The third theory, which is the focus of the current chapter, views randomness as an effect on an observer and thus as being relative to the observer’s abilities (of analysis). The observer’s abilities are captured by its computational abilities (i.e., the complexity of the processes that the observer may apply), and hence, this theory of randomness is rooted in Complexity Theory. This theory of randomness is explicitly aimed at providing a notion of randomness that, unlike the previous two notions, allows for an efficient (and deterministic) generation of random strings from shorter random seeds. The heart of this theory is the suggestion to view objects as equal if they cannot be told apart by any efficient procedure. Consequently, a distribution that cannot be efficiently distinguished from the uniform distribution will be considered random (or rather called pseudorandom). Thus, randomness is not an “inherent” property of objects (or distributions), but is rather relative to an observer (and its computational abilities). To illustrate this approach, let us consider the following mental experiment. Alice and Bob play “head or tail” in one of the following four ways. In each of them, Alice flips an unbiased coin and Bob is asked to guess its outcome before the coin hits the floor. The alternative ways differ by the knowledge Bob has before making his guess. In the first alternative, Bob has to announce his guess before Alice flips the coin. Clearly, in this case Bob wins with probability 1/2. In the second alternative, Bob has to announce his guess while the coin is spinning in the air. Although the outcome is determined in principle by the motion of the coin, Bob does not have accurate information on the motion. Thus we believe that, also in this case, Bob wins with probability 1/2. The third alternative is similar to the second, except that Bob has at his disposal sophisticated equipment capable of providing accurate information on the coin’s motion as well as on the environment affecting the outcome. However, Bob cannot process this information in time to improve his guess. In the fourth alternative, Bob’s recording equipment is directly connected to a powerful computer programmed to solve the motion equations and output a prediction. It is conceivable that in such a case, Bob can improve substantially his guess of the outcome of the coin. We conclude that the randomness of an event is relative to the information and computing resources at our disposal. At the extreme, even events that are fully determined by public information may be perceived as random events by an observer that lacks the relevant information and/or the ability to process it. Our focus will be on the lack of sufficient processing power, and not on the lack of sufficient information. The lack of sufficient 2
We mention that Kolmogorov’s approach is inherently intractable (i.e., Kolmogorov Complexity is uncomputable).
286
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
INTRODUCTION
output sequence
seed
Gen
? a truly random sequence
Figure 8.1: Pseudorandom generators – an illustration.
processing power may be due either to the formidable amount of computation required (for analyzing the event in question) or to the fact that the observer happens to be very limited. A natural notion of pseudorandomness arises – a distribution is pseudorandom if no efficient procedure can distinguish it from the uniform distribution, where efficient procedures are associated with (probabilistic) polynomialtime algorithms. This specific notion of pseudorandomness is indeed the most fundamental one, and much of this chapter is focused on it. Weaker notions of pseudorandomness arise as well – they refer to indistinguishability by weaker procedures such as spacebounded algorithms, constantdepth circuits, and so on. Stretching this approach even further, one may consider algorithms that are designed on purpose so as not to distinguish even weaker forms of “pseudorandom” sequences from random ones (where such algorithms arise naturally when trying to convert some natural randomized algorithms into deterministic ones; see Section 8.5). The foregoing discussion has focused on one aspect of the pseudorandomness question – the resources or type of the observer (or potential distinguisher). Another important aspect is whether such pseudorandom sequences can be generated from much shorter ones, and at what cost (or complexity). A natural approach requires the generation process to be efficient, and furthermore to be fixed before the specific observer is determined. Coupled with the aforementioned strong notion of pseudorandomness, this yields the archetypical notion of pseudorandom generators – those operating in (fixed) polynomial time and producing sequences that are indistinguishable from uniform ones by any polynomialtime observer. In particular, this means that the distinguisher is allowed more resources than the generator. Such (generalpurpose) pseudorandom generators (discussed in Section 8.2) allow for decreasing the randomness complexity of any efficient application, and are thus of great relevance to randomized algorithms and cryptography. The term generalpurpose is meant to emphasize the fact that the same generator is good for all efficient applications, including those that consume more resources than the generator itself. Although generalpurpose pseudorandom generators are very appealing, there are important reasons for also considering the opposite relation between the complexities of the generation and distinguishing tasks; that is, allowing the pseudorandom generator to use more resources (e.g., time or space) than the observer it tries to fool. This alternative is natural in the context of derandomization (i.e., converting randomized algorithms to deterministic ones), where the crucial step is replacing the random input of an algorithm by a pseudorandom input, which in turn can be generated based on a much shorter random seed. In particular, when derandomizing a probabilistic polynomialtime algorithm, the observer (to be fooled by the generator) is a fixed algorithm. In this case, employing a
287
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
PSEUDORANDOM GENERATORS
more complex generator merely means that the complexity of the derived deterministic algorithm is dominated by the complexity of the generator (rather than by the complexity of the original randomized algorithm). Needless to say, allowing the generator to use more resources than the observer that it tries to fool makes the task of designing pseudorandom generators potentially easier, and enables derandomization results that are not known when using generalpurpose pseudorandom generators. The usefulness of this approach is demonstrated in Sections 8.3 through 8.5. We note that the goal of all types of pseudorandom generators is to allow the generation of “sufficiently random” sequences based on much shorter random seeds. Thus, pseudorandom generators offer significant saving in the randomness complexity of various applications (and in some cases eliminating randomness altogether). Saving on randomness is valuable because many applications are severely limited in their ability to generate or obtain truly random bits. Furthermore, typically, generating truly random bits is significantly more expensive than standard computation steps. Thus, randomness is a computational resource that should be considered on top of time complexity (analogously to the consideration of space complexity). Organization. In Section 8.1 we present the general paradigm underlying the various notions of pseudorandom generators. The archetypical case of generalpurpose pseudorandom generators is presented in Section 8.2. We then turn to alternative notions of pseudorandom generators: Generators that suffice for the derandomization of complexity classes such as BPP are discussed in Section 8.3; pseudorandom generators in the domain of spacebounded computations are discussed in Section 8.4; and specialpurpose generators are discussed in Section 8.5. Teaching note: If you can afford teaching only one of the alternative notions of pseudorandom generators, then we suggest teaching the notion of generalpurpose pseudorandom generators (presented in Section 8.2). This notion is more relevant to computer science at large and the technical material is relatively simpler. The chapter is organized to facilitate this option.
Prerequisites. We assume a basic familiarity with elementary probability theory (see Appendix D.1) and randomized algorithms (see Section 6.1). In particular, standard conventions regarding random variables (presented in Appendix D.1.1) will be extensively used. We shall also apply a couple of results from Chapter 7, but these applications will be selfcontained.
8.1. The General Paradigm Teaching note: We advocate a unified view of various notions of pseudorandom generators. That is, we view these notions as incarnations of a general abstract paradigm, to be presented in this section. A teacher who wishes to focus on one of these incarnations may still use this section as a general motivation toward the specific definitions used later. On the other hand, some students may prefer reading this section after studying one of the specific incarnations.
A generic formulation of pseudorandom generators consists of specifying three fundamental aspects – the stretch measure of the generators; the class of distinguishers that the
288
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
8.1. THE GENERAL PARADIGM
generators are supposed to fool (i.e., the algorithms with respect to which the computational indistinguishability requirement should hold); and the resources that the generators are allowed to use (i.e., their own computational complexity). Let us elaborate. Stretch function. A necessary requirement of any notion of a pseudorandom generator is that the generator is a deterministic algorithm that stretches short strings, called seeds, into longer output sequences.3 Specifically, this algorithm stretches kbit long seeds into (k)bit long outputs, where (k) > k. The function : N → N is called the stretch measure (or stretch function) of the generator. In some settings the specific stretch measure is immaterial (e.g., see Section 8.2.4). Computational Indistinguishability. A necessary requirement of any notion of a pseudorandom generator is that the generator “fools” some nontrivial algorithms. That is, it is required that any algorithm taken from a predetermined class of interest cannot distinguish the output produced by the generator (when the generator is fed with a uniformly chosen seed) from a uniformly chosen sequence. Thus, we consider a class D of distinguishers (e.g., probabilistic polynomialtime algorithms) and a class F of (threshold) functions (e.g., reciprocals of positive polynomials), and require that the generator G satisfies the following: For any D ∈ D, any f ∈ F, and for all sufficiently large k’s it holds that  Pr[D(G(Uk )) = 1] − Pr[D(U(k) ) = 1]  < f (k) ,
(8.1)
where Un denotes the uniform distribution over {0, 1}n , and the probability is taken over Uk (resp., U(k) ) as well as over the coin tosses of algorithm D in case it is probabilistic. The reader may think of such a distinguisher, D, as of an observer that tries to tell whether the “tested string” is a random output of the generator (i.e., distributed as G(Uk )) or is a truly random string (i.e., distributed as U(k) ). The condition in Eq. (8.1) requires that D cannot make a meaningful decision; that is, ignoring a negligible difference (represented by f (k)), D’s verdict is the same in both cases.4 The archetypical choice is that D is the set of all probabilistic polynomialtime algorithms, and F is the set of all functions that are the reciprocal of some positive polynomial. Complexity of Generation. The archetypical choice is that the generator has to work in polynomial time (in length of its input – the seed). Other choices will be discussed as well. We note that placing no computational requirements on the generator (or, alternatively, putting very mild requirements such as upperbounding the running time by a doubleexponential function), yields “generators” that can fool any subexponentialsize circuit family (see Exercise 8.1). 3
Indeed, the seed represents the randomness that is used in the generation of the output sequences; that is, the randomized generation process is decoupled into a deterministic algorithm and a random seed. This decoupling facilitates the study of such processes. 4 The class of threshold functions F should be viewed as determining the class of noticeable probabilities (as a function of k). Thus, we require certain functions (i.e., those presented at the l.h.s of Eq. (8.1)) to be smaller than any noticeable function on all but finitely many integers. We call the former functions negligible. Note that a function may be neither noticeable nor negligible (e.g., it may be smaller than any noticeable function on infinitely many values and yet larger than some noticeable function on infinitely many other values).
289
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
PSEUDORANDOM GENERATORS
Notational conventions. We will consistently use k for denoting the length of the seed of a pseudorandom generator, and (k) for denoting the length of the corresponding output. In some cases, this makes our presentation a little more cumbersome (since a more natural presentation may specify some other parameters and let the seedlength be a function of the latter). However, our choice has the advantage of focusing attention on the fundamental parameter of the pseudorandom generation process – the length of the random seed. We note that whenever a pseudorandom generator is used to “derandomize” an algorithm, n will denote the length of the input to this algorithm, and k will be selected as a function of n. Some instantiations of the general paradigm. Two important instantiations of the notion of pseudorandom generators relate to polynomialtime distinguishers. 1. Generalpurpose pseudorandom generators correspond to the case that the generator itself runs in polynomial time and needs to withstand any probabilistic polynomialtime distinguisher, including distinguishers that run for more time than the generator. Thus, the same generator may be used safely in any efficient application. (This notion is treated in Section 8.2.) 2. In contrast, pseudorandom generators intended for derandomization may run more time than the distinguisher, which is viewed as a fixed circuit having size that is upperbounded by a fixed polynomial. (This notion is treated in Section 8.3.) In addition, the general paradigm may be instantiated by focusing on the space complexity of the potential distinguishers (and the generator), rather than on their time complexity. Furthermore, one may also consider distinguishers that merely reflect probabilistic properties such as pairwise independence, smallbias, and hitting frequency.
8.2. GeneralPurpose Pseudorandom Generators Randomness is playing an increasingly important role in computation: It is frequently used in the design of sequential, parallel, and distributed algorithms, and it is of course central to cryptography. Whereas it is convenient to design such algorithms making free use of randomness, it is also desirable to minimize the usage of randomness in real implementations. Thus, generalpurpose pseudorandom generators (as defined next) are a key ingredient in an “algorithmic toolbox” – they provide an automatic compiler of programs written with free usage of randomness into programs that make an economical use of randomness. Organization of this section. Since this is a relatively long section, a short road map seems in place. In Section 8.2.1 we provide the basic definition of generalpurpose pseudorandom generators, and in Section 8.2.2 we describe their archetypical application (which was eluded to in the former paragraph). In Section 8.2.3 we provide a wider perspective on the notion of computational indistinguishability that underlies the basic definition, and in Section 8.2.4 we justify the little concern (shown in Section 8.2.1) regarding the specific stretch function. In Section 8.2.5 we address the existence of generalpurpose pseudorandom generators. In Section 8.2.6 we motivate and discuss a nonuniform version of computational indistinguishability. We conclude in Section 8.2.7 by reviewing other variants and reflecting on various conceptual aspects of the notions discussed in this section.
290
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
8.2. GENERALPURPOSE PSEUDORANDOM GENERATORS
8.2.1. The Basic Definition Loosely speaking, generalpurpose pseudorandom generators are efficient deterministic programs that expand short, randomly selected seeds into longer pseudorandom bit sequences, where the latter are defined as computationally indistinguishable from truly random sequences by any efficient algorithm. Identifying efficiency with polynomialtime operation, this means that the generator (being a fixed algorithm) works within some fixed polynomial time, whereas the distinguisher may be any algorithm that runs in polynomial time. Thus, the distinguisher is potentially more complex than the generator; for example, the distinguisher may run in time that is cubic in the running time of the generator. Furthermore, to facilitate the development of this theory, we allow the distinguisher to be probabilistic (whereas the generator remains deterministic as stated previously). We require that such distinguishers cannot tell the output of the generator from a truly random string of similar length, or rather that the difference that such distinguishers may detect (or “sense”) is negligible. Here, a negligible function is a function that vanishes faster than the reciprocal of any positive polynomial.5 Definition 8.1 (generalpurpose pseudorandom generator): A deterministic polynomialtime algorithm G is called a pseudorandom generator if there exists a stretch function, : N → N (satisfying (k) > k for all k), such that for any probabilistic polynomialtime algorithm D, for any positive polynomial p, and for all sufficiently large k’s it holds that  Pr[D(G(Uk )) = 1] − Pr[D(U(k) ) = 1] 
k for all k. Needless to say, the larger is, the more useful the pseudorandom generator is. Of course, is upperbounded by the running time of the generator (and hence by a polynomial). In Section 8.2.4 we show that any pseudorandom generator (even one having minimal stretch (k) = k + 1) can be used for constructing a pseudorandom generator having any desired (polynomial) stretch function. But before doing so, we rigorously discuss the “saving in 5
Definition 8.1 requires that the functions representing the distinguishing gap of certain algorithms should be smaller than the reciprocal of any positive polynomial for all but finitely many k’s. The former functions are called negligible (cf. footnote 4, when identifying noticeable functions with the reciprocals of any positive polynomial). The notion of negligible probability is robust in the sense that any event that occurs with negligible probability will also occur with negligible probability when the experiment is repeated a “feasible” (i.e., polynomial) number of times. 6 The latter choice is naturally coupled with the association of efficient computation with polynomialtime algorithms: An event that occurs with noticeable probability occurs almost always when the experiment is repeated a “feasible” (i.e., polynomial) number of times.
291
18:49
CUUS063 main
CUUS063 Goldreich
978 0 521 88473 0
March 31, 2008
PSEUDORANDOM GENERATORS
randomness” offered by pseudorandom generators, and provide a wider perspective on the notion of computational indistinguishability that underlies Definition 8.1.
8.2.2. The Archetypical Application We note that “pseudorandom number generators” appeared with the first computers, and have been used ever since for generating random choices (or samples) for various applications. However, typical implementations use generators that are not pseudorandom according to Definition 8.1. Instead, at best, these generators are shown to pass some ad hoc statistical test (cf. [146]). We warn that the fact that a “pseudorandom number generator” passes some statistical tests does not mean that it will pass a new test and that it will be good for a future (untested) application. Needless to say, the approach of subjecting the generator to some ad hoc tests fails to provide general results of the form “for all practical purposes using the output of the generator is as good as using truly unbiased coin tosses.” In contrast, the approach encompassed in Definition 8.1 aims at such generality, and in fact is tailored to obtain it: The notion of computational indistinguishability, which underlines Definition 8.1, covers all possible efficient applications and guarantees that for all of them pseudorandom sequences are as good as truly random ones. Indeed, any efficient randomized algorithm maintains its performance when its internal coin tosses are substituted by a sequence generated by a pseudorandom generator. This substitution is spelled out next. Construction 8.2 (typical application of pseudorandom generator