Computational Complexity Theory, Techniques, and Applications
This book consists of selections from the Encyclopedia of Complexity and Systems Science edited by Robert A. Meyers, published by Springer New York in 2009.
Robert A. Meyers (Ed.)
Computational Complexity Theory, Techniques, and Applications
With 1487 Figures and 234 Tables
ROBERT A. MEYERS, Ph.D.
Editor-in-Chief
RAMTECH LIMITED
122 Escalle Lane
Larkspur, CA 94939
USA
[email protected]

Library of Congress Control Number: 2011940800
ISBN: 978-1-4614-1800-9
This publication is also available as:
Print publication under ISBN 978-1-4614-1799-6 and print and electronic bundle under ISBN 978-1-4614-1801-6

© 2012 Springer Science+Business Media, LLC. All rights reserved.

This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

This book consists of selections from the Encyclopedia of Complexity and Systems Science edited by Robert A. Meyers, published by Springer New York in 2009.

springer.com

Printed on acid-free paper
Preface
Complex systems are systems that comprise many interacting parts with the ability to generate a new quality of collective behavior through self-organization, e.g. the spontaneous formation of temporal, spatial or functional structures. They are therefore adaptive as they evolve and may contain self-driving feedback loops. Thus, complex systems are much more than the sum of their parts. Complex systems are often characterized by extreme sensitivity to initial conditions as well as by emergent behavior that is not readily predictable, or even completely deterministic. The conclusion is that a reductionist (bottom-up) approach is often an incomplete description of a phenomenon. This recognition that the collective behavior of the whole system cannot simply be inferred from an understanding of the behavior of the individual components has led to many new concepts and sophisticated mathematical and modeling tools for application to many scientific, engineering, and societal issues that can be adequately described only in terms of complexity and complex systems.

The inherent difficulty, or hardness, of computational problems in complex systems is a fundamental concept in computational complexity theory. This compendium, Computational Complexity, presents a detailed, integrated view of the theoretical basis, computational methods, and newest applicable approaches to solving inherently difficult problems whose solution requires extensive resources approaching the practical limits of present-day computer systems. Key components of computational complexity are detailed, integrated, and utilized, varying from Parameterized Complexity Theory (e.g. see the articles entitled Quantum Computing and Analog Computation), the Exponential Time Hypothesis (e.g. see the articles on Cellular Automata and Language Theory) and Complexity Class P (e.g. see Quantum Computational Complexity and Cellular Automata, Universality of), to Heuristics (e.g. Social Network Visualization, Methods of and Repeated Games with Incomplete Information) and Parallel Algorithms (e.g. Cellular Automata as Models of Parallel Computation and Optical Computing).

There are 209 articles, organized into 14 sections, each headed by a recognized expert in the field and supported by peer reviewers in addition to the section editor. The sections are:
Agent Based Modeling and Simulation
Cellular Automata, Mathematical Basis of
Complex Networks and Graph Theory
Data Mining and Knowledge Discovery
Game Theory
Granular Computing
Intelligent Systems
Probability and Statistics in Complex Systems
Quantum Information Science
Social Network Analysis
Social Science, Physics and Mathematics, Applications in
Soft Computing
Unconventional Computing
Wavelets
The complete listing of articles and section editors is presented on pages VII to XII.
The articles are written for an audience of advanced university undergraduate and graduate students, professors, and professionals in a wide range of fields who must manage complexity on scales ranging from the atomic and molecular to the societal and global. Each article was selected and peer reviewed by one of our 13 Section Editors, with advice and consultation provided by Board Members Lotfi Zadeh, Stephen Wolfram and Richard Stearns, and by the Editor-in-Chief. This level of coordination assures the reader a level of confidence in the relevance and accuracy of the information far exceeding that generally found on the World Wide Web. Accessibility is also a priority, and for this reason each article includes a glossary of important terms and a concise definition of the subject.

Robert A. Meyers
Editor-in-Chief
Larkspur, California
July 2011
Sections
Agent Based Modeling and Simulation, Section Editor: Filippo Castiglione

Agent Based Computational Economics
Agent Based Modeling and Artificial Life
Agent Based Modeling and Computer Languages
Agent Based Modeling and Simulation, Introduction to
Agent Based Modeling, Large Scale Simulations
Agent Based Modeling, Mathematical Formalism for
Agent-Based Modeling and Simulation
Cellular Automaton Modeling of Tumor Invasion
Computer Graphics and Games, Agent Based Modeling in
Embodied and Situated Agents, Adaptive Behavior in
Interaction Based Computing in Physics
Logic and Geometry of Agents in Agent-Based Modeling
Social Phenomena Simulation
Swarm Intelligence

Cellular Automata, Mathematical Basis of, Section Editor: Andrew Adamatzky

Additive Cellular Automata
Algorithmic Complexity and Cellular Automata
Cellular Automata and Groups
Cellular Automata and Language Theory
Cellular Automata as Models of Parallel Computation
Cellular Automata in Hyperbolic Spaces
Cellular Automata Modeling of Physical Systems
Cellular Automata on Triangular, Pentagonal and Hexagonal Tessellations
Cellular Automata with Memory
Cellular Automata, Classification of
Cellular Automata, Emergent Phenomena in
Cellular Automata, Universality of
Chaotic Behavior of Cellular Automata
Dynamics of Cellular Automata in Noncompact Spaces
Ergodic Theory of Cellular Automata
Evolving Cellular Automata
Firing Squad Synchronization Problem in Cellular Automata
Gliders in Cellular Automata
Growth Phenomena in Cellular Automata
Identification of Cellular Automata
Mathematical Basis of Cellular Automata, Introduction to
Phase Transitions in Cellular Automata
Quantum Cellular Automata
Reversible Cellular Automata
Self-organised Criticality and Cellular Automata
Self-Replication and Cellular Automata
Structurally Dynamic Cellular Automata
Tiling Problem and Undecidability in Cellular Automata
Topological Dynamics of Cellular Automata

Complex Networks and Graph Theory, Section Editor: Geoffrey Canright

Community Structure in Graphs
Complex Gene Regulatory Networks – From Structure to Biological Observables: Cell Fate Determination
Complex Networks and Graph Theory
Complex Networks, Visualization of
Food Webs
Growth Models for Networks
Human Sexual Networks
Internet Topology
Link Analysis and Web Search
Motifs in Graphs
Non-negative Matrices and Digraphs
Random Graphs, a Whirlwind Tour of
Synchronization Phenomena on Networks
World Wide Web, Graph Structure

Data Mining and Knowledge Discovery, Section Editor: Peter Kokol

Data and Dimensionality Reduction in Data Analysis and System Modeling
Data-Mining and Knowledge Discovery, Introduction to
Data-Mining and Knowledge Discovery, Neural Networks in
Data-Mining and Knowledge Discovery: Case Based Reasoning, Nearest Neighbor and Rough Sets
Decision Trees
Discovery Systems
Genetic and Evolutionary Algorithms and Programming: General Introduction and Application to Game Playing
Knowledge Discovery: Clustering
Machine Learning, Ensemble Methods in
Manipulating Data and Dimension Reduction Methods: Feature Selection

Game Theory, Section Editor: Marilda Sotomayor

Bayesian Games: Games with Incomplete Information
Cooperative Games
Cooperative Games (Von Neumann–Morgenstern Stable Sets)
Correlated Equilibria and Communication in Games
Cost Sharing
Differential Games
Dynamic Games with an Application to Climate Change Models
Evolutionary Game Theory
Fair Division
Game Theory and Strategic Complexity
Game Theory, Introduction to
Implementation Theory
Inspection Games
Learning in Games
Market Games and Clubs
Mechanism Design
Networks and Stability
Principal-Agent Models
Repeated Games with Complete Information
Repeated Games with Incomplete Information
Reputation Effects
Signaling Games
Static Games
Stochastic Games
Two-Sided Matching Models
Voting
Voting Procedures, Complexity of
Zero-sum Two Person Games

Granular Computing, Section Editor: Tsau Y. Lin

Cooperative Multi-Hierarchical Query Answering Systems
Dependency and Granularity in Data Mining
Fuzzy Logic
Fuzzy Probability Theory
Fuzzy System Models Evolution from Fuzzy Rule-bases to Fuzzy Functions
Genetic-Fuzzy Data Mining Techniques
Granular Model for Data Mining
Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach
Granular Computing and Modeling of the Uncertainty in Quantum Mechanics
Granular Computing System Vulnerabilities: Exploring the Dark Side of Social Networking Communities
Granular Computing, Information Models for
Granular Computing, Introduction to
Granular Computing, Philosophical Foundation for
Granular Computing, Principles and Perspectives of
Granular Computing: Practices, Theories and Future Directions
Granular Neural Network
Granulation of Knowledge: Similarity Based Approach in Information and Decision Systems
Multi-Granular Computing and Quotient Structure
Nonstandard Analysis, an Invitation to
Rough and Rough-Fuzzy Sets in Design of Information Systems
Rough Set Data Analysis
Rule Induction, Missing Attribute Values and Discretization
Social Networks and Granular Computing

Intelligent Systems, Section Editor: James A. Hendler

Artificial Intelligence in Modeling and Simulation
Intelligent Control
Intelligent Systems, Introduction to
Learning and Planning (Intelligent Systems)
Mobile Agents
Semantic Web

Probability and Statistics in Complex Systems, Section Editor: Henrik Jeldtoft Jensen

Bayesian Statistics
Branching Processes
Complexity in Systems Level Biology and Genetics: Statistical Perspectives
Correlations in Complex Systems
Entropy
Extreme Value Statistics
Field Theoretic Methods
Fluctuations, Importance of: Complexity in the View of Stochastic Processes
Hierarchical Dynamics
Lévy Statistics and Anomalous Transport: Lévy Flights and Subdiffusion
Probability and Statistics in Complex Systems, Introduction to
Probability Densities in Complex Systems, Measuring
Probability Distributions in Complex Systems
Random Matrix Theory
Random Walks in Random Environment
Record Statistics and Dynamics
Stochastic Loewner Evolution: Linking Universality, Criticality and Conformal Invariance in Complex Systems
Stochastic Processes

Quantum Information Science, Section Editor: Joseph F. Traub

Quantum Algorithms
Quantum Algorithms and Complexity for Continuous Problems
Quantum Computational Complexity
Quantum Computing Using Optics
Quantum Computing with Trapped Ions
Quantum Cryptography
Quantum Error Correction and Fault Tolerant Quantum Computing
Quantum Information Processing
Quantum Information Science, Introduction to

Social Network Analysis, Section Editor: John Scott

Network Analysis, Longitudinal Methods of
Positional Analysis and Blockmodelling
Social Network Analysis, Estimation and Sampling in
Social Network Analysis, Graph Theoretical Approaches to
Social Network Analysis, Large-Scale
Social Network Analysis, Overview of
Social Network Analysis, Two-Mode Concepts in
Social Network Visualization, Methods of
Social Networks, Algebraic Models for
Social Networks, Diffusion Processes in
Social Networks, Exponential Random Graph (p*) Models for
Social Science, Physics and Mathematics Applications in, Section Editor: Andrzej Nowak

Minority Games
Rational, Goal-Oriented Agents
Social Processes, Simulation Models of

Soft Computing, Section Editor: Janusz Kacprzyk

Aggregation Operators and Soft Computing
Evolving Fuzzy Systems
Fuzzy Logic, Type-2 and Uncertainty
Fuzzy Optimization
Fuzzy Sets Theory, Foundations of
Hybrid Soft Computing Models for Systems Modeling and Control
Neuro-fuzzy Systems
Possibility Theory
Rough Sets in Decision Making
Rough Sets: Foundations and Perspectives
Soft Computing, Introduction to
Statistics with Imprecise Data

Unconventional Computing, Section Editor: Andrew Adamatzky

Amorphous Computing
Analog Computation
Artificial Chemistry
Bacterial Computing
Cellular Computing
Computing in Geometrical Constrained Excitable Chemical Systems
Computing with Solitons
DNA Computing
Evolution in Materio
Immunecomputing
Mechanical Computing: The Computational Complexity of Physical Devices
Membrane Computing
Molecular Automata
Nanocomputers
Optical Computing
Quantum Computing
Reaction-Diffusion Computing
Reversible Computing
Thermodynamics of Computation
Unconventional Computing, Introduction to
Unconventional Computing, Novel Hardware for

Wavelets, Section Editor: Edward Aboufadel

Bivariate (Two-dimensional) Wavelets
Comparison of Discrete and Continuous Wavelet Transforms
Curvelets and Ridgelets
Multivariate Splines and Their Applications
Multiwavelets
Numerical Issues When Using Wavelets
Popular Wavelet Families and Filters and Their Use
Statistical Applications of Wavelets
Wavelets and PDE Techniques in Image Processing, a Quick Tour of
Wavelets and the Lifting Scheme
Wavelets, Introduction to
About the Editor-in-Chief
Robert A. Meyers
President: RAMTECH Limited
Manager, Chemical Process Technology, TRW Inc.
Postdoctoral Fellow: California Institute of Technology
Ph.D., Chemistry, University of California at Los Angeles
B.A., Chemistry, California State University, San Diego
Biography

Dr. Meyers has worked with more than 25 Nobel laureates during his career.

Research

Dr. Meyers was Manager of Chemical Technology at TRW (now Northrop Grumman) in Redondo Beach, CA, and is now President of RAMTECH Limited. He is co-inventor of the Gravimelt process for desulfurization and demineralization of coal for air pollution and water pollution control. Dr. Meyers is the inventor of, and was project manager for, the DOE-sponsored Magnetohydrodynamics Seed Regeneration Project, which resulted in the construction and successful operation of a pilot plant for production of potassium formate, a chemical utilized for plasma electricity generation and air pollution control. Dr. Meyers managed the pilot-scale DOE project for determining the hydrodynamics of synthetic fuels. He is a co-inventor of several thermo-oxidatively stable polymers which have achieved commercial success as the GE PEI, Upjohn Polyimides and Rhône-Poulenc bismaleimide resins. He has also managed projects in photochemistry, chemical lasers, flue gas scrubbing, oil shale analysis and refining, petroleum analysis and refining, global change measurement from space satellites, analysis and mitigation (carbon dioxide and ozone), hydrometallurgical refining, soil and hazardous waste remediation, novel polymer synthesis, modeling of the economics of space transportation systems, space rigidizable structures, and chemiluminescence-based devices. He is a senior member of the American Institute of Chemical Engineers, a member of the American Physical Society and the American Chemical Society, and serves on the UCLA Chemistry Department Advisory Board. He was a member of the joint USA-Russia working group on air pollution control and of the EPA-sponsored Waste Reduction Institute for Scientists and Engineers.
Dr. Meyers has more than 20 patents and 50 technical papers. He has published in primary literature journals including Science and the Journal of the American Chemical Society, and is listed in Who's Who in America and Who's Who in the World. Dr. Meyers' scientific achievements have been reviewed in feature articles in the popular press, in publications such as The New York Times Science Supplement and The Wall Street Journal, as well as in more specialized publications such as Chemical Engineering and Coal Age. A public service film about Dr. Meyers' chemical desulfurization invention for air pollution control was produced by the Environmental Protection Agency.

Scientific Books

Dr. Meyers is the author or Editor-in-Chief of 12 technical books, one of which won the Association of American Publishers Award as the best book in technology and engineering.

Encyclopedias

Dr. Meyers conceived and has served as Editor-in-Chief of the Academic Press (now Elsevier) Encyclopedia of Physical Science and Technology. This is an 18-volume publication of 780 twenty-page articles written for an audience of university students and practicing professionals. This encyclopedia, first published in 1987, was very successful and, because of this, was revised and reissued in 1992 as a second edition. The third edition was published in 2001 and is now online. Dr. Meyers has completed two editions of the Encyclopedia of Molecular Cell Biology and Molecular Medicine for Wiley-VCH publishers (1995 and 2004). These cover molecular- and cellular-level genetics, biochemistry, pharmacology, diseases and structure determination, as well as cell biology. His eight-volume Encyclopedia of Environmental Analysis and Remediation was published in 1998 by John Wiley & Sons, and his 15-volume Encyclopedia of Analytical Chemistry was published in 2000, also by John Wiley & Sons; all are available online.
Editorial Board Members
LOTFI A. ZADEH
Professor in the Graduate School, Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
STEPHEN WOLFRAM
Founder and CEO, Wolfram Research
Creator, Mathematica®
Author, A New Kind of Science
RICHARD E. STEARNS
Winner of the 1993 Turing Award for foundations of computational complexity
Current interests include computational complexity, automata theory, analysis of algorithms, and game theory.
Section Editors
Agent Based Modeling and Simulation

FILIPPO CASTIGLIONE
Research Scientist
Institute for Computing Applications (IAC) "M. Picone"
National Research Council (CNR), Italy

Cellular Automata, Mathematical Basis of

ANDREW ADAMATZKY
Professor
Faculty of Computing, Engineering and Mathematical Science
University of the West of England

Complex Networks and Graph Theory

GEOFFREY CANRIGHT
Senior Research Scientist
Telenor Research and Innovation
Fornebu, Norway

Data Mining and Knowledge Discovery

PETER KOKOL
Professor
Department of Computer Science
University of Maribor, Slovenia
Game Theory

MARILDA SOTOMAYOR
Professor
Department of Economics, University of São Paulo, Brazil
Department of Economics, Brown University, Providence

Probability and Statistics in Complex Systems

HENRIK JELDTOFT JENSEN
Professor of Mathematical Physics
Department of Mathematics and Institute for Mathematical Sciences
Imperial College London
Granular Computing

TSAU Y. LIN
Professor
Computer Science Department
San Jose State University

Quantum Information Science

JOSEPH F. TRAUB
Edwin Howard Armstrong Professor of Computer Science
Computer Science Department
Columbia University
Intelligent Systems

JAMES A. HENDLER
Senior Constellation Professor of the Tetherless World Research Constellation
Rensselaer Polytechnic Institute

Social Network Analysis

JOHN SCOTT
Professor of Sociology
School of Social Science and Law
University of Plymouth
Social Science, Physics and Mathematics Applications in

ANDRZEJ NOWAK
Director of the Center for Complex Systems, University of Warsaw
Assistant Professor, Psychology Department, Florida Atlantic University

Soft Computing

JANUSZ KACPRZYK
Deputy Director for Scientific Affairs, Professor
Systems Research Institute
Polish Academy of Sciences

Unconventional Computing

ANDREW ADAMATZKY
Professor
Faculty of Computing, Engineering and Mathematical Science
University of the West of England

Wavelets

EDWARD ABOUFADEL
Professor of Mathematics
Grand Valley State University
Table of Contents
Additive Cellular Automata (Burton Voorhees) . . . 1
Agent Based Computational Economics (Moshe Levy) . . . 18
Agent Based Modeling and Artificial Life (Charles M. Macal) . . . 39
Agent Based Modeling and Computer Languages (Michael J. North, Charles M. Macal) . . . 58
Agent Based Modeling, Large Scale Simulations (Hazel R. Parry) . . . 76
Agent Based Modeling, Mathematical Formalism for (Reinhard Laubenbacher, Abdul S. Jarrah, Henning S. Mortveit, S.S. Ravi) . . . 88
Agent Based Modeling and Simulation (Stefania Bandini, Sara Manzoni, Giuseppe Vizzari) . . . 105
Agent Based Modeling and Simulation, Introduction to (Filippo Castiglione) . . . 118
Aggregation Operators and Soft Computing (Vicenç Torra) . . . 122
Algorithmic Complexity and Cellular Automata (Julien Cervelle, Enrico Formenti) . . . 132
Amorphous Computing (Hal Abelson, Jacob Beal, Gerald Jay Sussman) . . . 147
Analog Computation (Bruce J. MacLennan) . . . 161
Artificial Chemistry (Peter Dittrich) . . . 185
Artificial Intelligence in Modeling and Simulation (Bernard Zeigler, Alexandre Muzy, Levent Yilmaz) . . . 204
Bacterial Computing (Martyn Amos) . . . 228
Bayesian Games: Games with Incomplete Information (Shmuel Zamir) . . . 238
Bayesian Statistics (David Draper) . . . 254
Bivariate (Two-dimensional) Wavelets (Bin Han) . . . 275
Branching Processes (Mikko J. Alava, Kent Bækgaard Lauritsen) . . . 285
Cellular Automata as Models of Parallel Computation (Thomas Worsch) . . . 298
Cellular Automata, Classification of (Klaus Sutner) . . . 312
Cellular Automata, Emergent Phenomena in (James E. Hanson) . . . 325
Cellular Automata and Groups (Tullio Ceccherini-Silberstein, Michel Coornaert) . . . 336
Cellular Automata in Hyperbolic Spaces (Maurice Margenstern) . . . 350
Cellular Automata and Language Theory (Martin Kutrib) . . . 359
Cellular Automata with Memory (Ramón Alonso-Sanz) . . . 382
Cellular Automata Modeling of Physical Systems (Bastien Chopard) . . . 407
Cellular Automata in Triangular, Pentagonal and Hexagonal Tessellations (Carter Bays) . . . 434
Cellular Automata, Universality of (Jérôme Durand-Lose) . . . 443
Cellular Automaton Modeling of Tumor Invasion (Haralambos Hatzikirou, Georg Breier, Andreas Deutsch) . . . 456
Cellular Computing (Christof Teuscher) . . . 465
Chaotic Behavior of Cellular Automata (Julien Cervelle, Alberto Dennunzio, Enrico Formenti) . . . 479
Community Structure in Graphs (Santo Fortunato, Claudio Castellano) . . . 490
Comparison of Discrete and Continuous Wavelet Transforms (Palle E. T. Jorgensen, Myung-Sin Song) . . . 513
Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination (Sui Huang, Stuart A. Kauffman) . . . 527
Complexity in Systems Level Biology and Genetics: Statistical Perspectives (David A. Stephens) . . . 561
Complex Networks and Graph Theory (Geoffrey Canright) . . . 579
Complex Networks, Visualization of (Vladimir Batagelj) . . . 589
Computer Graphics and Games, Agent Based Modeling in (Brian Mac Namee) . . . 604
Computing in Geometrical Constrained Excitable Chemical Systems (Jerzy Gorecki, Joanna Natalia Gorecka) . . . 622
Computing with Solitons (Darren Rand, Ken Steiglitz) . . . 646
Cooperative Games (Roberto Serrano) . . . 666
Cooperative Games (Von Neumann–Morgenstern Stable Sets) (Jun Wako, Shigeo Muto) . . . 675
Cooperative Multi-hierarchical Query Answering Systems (Zbigniew W. Ras, Agnieszka Dardzinska) . . . 690
Correlated Equilibria and Communication in Games (Françoise Forges) . . . 695
Correlations in Complex Systems (Renat M. Yulmetyev, Peter Hänggi) . . . 705
Cost Sharing (Maurice Koster) . . . 724
Curvelets and Ridgelets (Jalal Fadili, Jean-Luc Starck) . . . 754
Data and Dimensionality Reduction in Data Analysis and System Modeling (Witold Pedrycz) . . . 774
Data-Mining and Knowledge Discovery: Case-Based Reasoning, Nearest Neighbor and Rough Sets (Lech Polkowski) . . . 789
Data-Mining and Knowledge Discovery, Introduction to (Peter Kokol) . . . 810
Data-Mining and Knowledge Discovery, Neural Networks in (Markus Brameier) . . . 813
Decision Trees (Vili Podgorelec, Milan Zorman) . . . 827
Dependency and Granularity in Data-Mining (Shusaku Tsumoto, Shoji Hirano) . . . 846
Differential Games (Marc Quincampoix) . . . 854
Discovery Systems (Petra Povalej, Mateja Verlic, Gregor Stiglic) . . . 862
DNA Computing (Martyn Amos) . . . 882
Dynamic Games with an Application to Climate Change Models (Prajit K. Dutta) . . . 897
Dynamics of Cellular Automata in Noncompact Spaces (Enrico Formenti, Petr Kůrka) . . . 914
Embodied and Situated Agents, Adaptive Behavior in (Stefano Nolfi) . . . 925
Entropy (Constantino Tsallis) . . . 940
Ergodic Theory of Cellular Automata (Marcus Pivato) . . . 965
Evolutionary Game Theory (William H. Sandholm) . . . 1000
Evolution in Materio (Simon Harding, Julian F. Miller) . . . 1030
Evolving Cellular Automata (Martin Cenek, Melanie Mitchell) . . . 1043
Evolving Fuzzy Systems (Plamen Angelov) . . . 1053
Extreme Value Statistics (Mario Nicodemi) . . . 1066
Fair Division (Steven J. Brams) . . . 1073
Field Theoretic Methods (Uwe Claus Täuber) . . . 1080
Firing Squad Synchronization Problem in Cellular Automata (Hiroshi Umeo) . . . 1094
Fluctuations, Importance of: Complexity in the View of Stochastic Processes (Rudolf Friedrich, Joachim Peinke, M. Reza Rahimi Tabar) . . . 1131
Food Webs (Jennifer A. Dunne) . . . 1155
Fuzzy Logic (Lotfi A. Zadeh) . . . 1177
Fuzzy Logic, Type-2 and Uncertainty (Robert I. John, Jerry M. Mendel) . . . 1201
Fuzzy Optimization (Weldon A. Lodwick, Elizabeth A. Untiedt) . . . 1211
Fuzzy Probability Theory (Michael Beer) . . . 1240
Fuzzy Sets Theory, Foundations of (Janusz Kacprzyk) . . . 1253
Fuzzy System Models Evolution from Fuzzy Rulebases to Fuzzy Functions (I. Burhan Türkşen) . . . 1274
Game Theory, Introduction to (Marilda Sotomayor) . . . 1289
Game Theory and Strategic Complexity (Kalyan Chatterjee, Hamid Sabourian) . . . 1292
Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing (Michael Orlov, Moshe Sipper, Ami Hauptman) . . . 1309
Genetic-Fuzzy Data Mining Techniques (Tzung-Pei Hong, Chun-Hao Chen, Vincent S. Tseng) . . . 1321
Gliders in Cellular Automata (Carter Bays) . . . 1337
Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach (Salvatore Greco, Benedetto Matarazzo, Roman Słowiński) . . . 1347
Granular Computing, Information Models for (Steven A. Demurjian) . . . 1369
Granular Computing, Introduction to (Tsau Young Lin) . . . 1377
Granular Computing and Modeling of the Uncertainty in Quantum Mechanics (Kow-Lung Chang) . . . 1381
Granular Computing, Philosophical Foundation for (Zhengxin Chen) . . . 1389
Granular Computing: Practices, Theories, and Future Directions (Tsau Young Lin) . . . 1404
Granular Computing, Principles and Perspectives of (Jianchao Han, Nick Cercone) . . . 1421
Granular Computing System Vulnerabilities: Exploring the Dark Side of Social Networking Communities (Steve Webb, James Caverlee, Calton Pu) . . . 1433
Granular Model for Data Mining (Anita Wasilewska, Ernestina Menasalvas) . . . 1444
Granular Neural Network (Yan-Qing Zhang) . . . 1455
Granulation of Knowledge: Similarity Based Approach in Information and Decision Systems (Lech Polkowski) . . . 1464
Growth Models for Networks (Sergey N. Dorogovtsev) . . . 1488
Growth Phenomena in Cellular Automata (Janko Gravner) . . . 1499
Hierarchical Dynamics (Martin Nilsson Jacobi) . . . 1514
Human Sexual Networks (Fredrik Liljeros) . . . 1535
Hybrid Soft Computing Models for Systems Modeling and Control (Oscar Castillo, Patricia Melin) . . . 1547
Identification of Cellular Automata (Andrew Adamatzky) . . . 1564
Immunecomputing (Jon Timmis) . . . 1576
Implementation Theory (Luis C. Corchón) . . . 1588
Inspection Games (Rudolf Avenhaus, Morton J. Canty) . . . 1605
Intelligent Control (Clarence W. de Silva) . . . 1619
Intelligent Systems, Introduction to (James Hendler) . . . 1642
Interaction Based Computing in Physics (Franco Bagnoli) . . . 1644
Internet Topology (Yihua He, Georgos Siganos, Michalis Faloutsos) . . . 1663
Knowledge Discovery: Clustering (Pavel Berkhin, Inderjit S. Dhillon) . . . 1681
Learning in Games (John Nachbar) . . . 1695
Learning and Planning (Intelligent Systems) (Ugur Kuter) . . . 1706
Levy Statistics and Anomalous Transport: Levy Flights and Subdiffusion (Ralf Metzler, Aleksei V. Chechkin, Joseph Klafter) . . . 1724
Link Analysis and Web Search (Johannes Bjelland, Geoffrey Canright, Kenth Engø-Monsen) . . . 1746
Logic and Geometry of Agents in Agent-Based Modeling (Samson Abramsky) . . . 1767
Machine Learning, Ensemble Methods in (Sašo Džeroski, Panče Panov, Bernard Ženko) . . . 1781
Manipulating Data and Dimension Reduction Methods: Feature Selection (Huan Liu, Zheng Zhao) . . . 1790
Market Games and Clubs (Myrna Wooders) . . . 1801
Mathematical Basis of Cellular Automata, Introduction to (Andrew Adamatzky) . . . 1819
Mechanical Computing: The Computational Complexity of Physical Devices (John H. Reif) . . . 1821
Mechanism Design (Ron Lavi) . . . 1837
Membrane Computing (Gheorghe Păun) . . . 1851
Minority Games (Chi Ho Yeung, Yi-Cheng Zhang) . . . 1863
Mobile Agents (Niranjan Suri, Jan Vitek) . . . 1880
Molecular Automata (Joanne Macdonald, Darko Stefanovic, Milan Stojanovic) . . . 1894
Motifs in Graphs (Sergi Valverde, Ricard V. Solé) . . . 1919
Multi-Granular Computing and Quotient Structure (Ling Zhang, Bo Zhang) . . . 1929
Multivariate Splines and Their Applications (Ming-Jun Lai) . . . 1939
Multiwavelets (Fritz Keinert) . . . 1981
Nanocomputers (Ferdinand Peper) . . . 1998
Network Analysis, Longitudinal Methods of (Tom A. B. Snijders) . . . 2029
Networks and Stability (Frank H. Page Jr., Myrna Wooders) . . . 2044
Neurofuzzy Systems (Leszek Rutkowski, Krzysztof Cpałka, Robert Nowicki, Agata Pokropińska, Rafał Scherer) . . . 2069
Nonnegative Matrices and Digraphs (Abraham Berman, Naomi Shaked-Monderer) . . . 2082
Nonstandard Analysis, an Invitation to (Wei-Zhe Yang) . . . 2096
Numerical Issues When Using Wavelets (Jean-Luc Starck, Jalal Fadili) . . . 2121
Optical Computing (Thomas J. Naughton, Damien Woods) . . . 2138
Phase Transitions in Cellular Automata (Nino Boccara) . . . 2157
Popular Wavelet Families and Filters and Their Use (Ming-Jun Lai) . . . 2168
Positional Analysis and Blockmodeling (Patrick Doreian) . . . 2226
Possibility Theory (Didier Dubois, Henri Prade) . . . 2240
Principal-Agent Models (Inés Macho-Stadler, David Pérez-Castrillo) . . . 2253
Probability Densities in Complex Systems, Measuring (Gunnar Pruessner) . . . 2267
Probability Distributions in Complex Systems (Didier Sornette) . . . 2286
Probability and Statistics in Complex Systems, Introduction to (Henrik Jeldtoft Jensen) . . . 2301
Quantum Algorithms (Michele Mosca) . . . 2303
Quantum Algorithms and Complexity for Continuous Problems (Anargyros Papageorgiou, Joseph F. Traub) . . . 2334
Quantum Cellular Automata (Karoline Wiesner) . . . 2351
Quantum Computational Complexity (John Watrous) . . . 2361
Quantum Computing (Viv Kendon) . . . 2388
Quantum Computing with Trapped Ions (Wolfgang Lange) . . . 2406
Quantum Computing Using Optics (Gerard J. Milburn, Andrew G. White) . . . 2437
Quantum Cryptography (Hoi-Kwong Lo, Yi Zhao) . . . 2453
Quantum Error Correction and Fault Tolerant Quantum Computing (Markus Grassl, Martin Rötteler) . . . 2478
Quantum Information Processing (Seth Lloyd) . . . 2496
Quantum Information Science, Introduction to (Joseph F. Traub) . . . 2534
Random Graphs, a Whirlwind Tour of (Fan Chung) . . . 2536
Random Matrix Theory (Güler Ergün) . . . 2549
Random Walks in Random Environment (Ofer Zeitouni) . . . 2564
Rational, Goal-Oriented Agents (Rosaria Conte) . . . 2578
Reaction-Diffusion Computing (Andrew Adamatzky) . . . 2594
Record Statistics and Dynamics (Paolo Sibani, Henrik Jeldtoft Jensen) . . . 2611
Repeated Games with Complete Information (Olivier Gossner, Tristan Tomala) . . . 2620
Repeated Games with Incomplete Information (Jérôme Renault) . . . 2635
Reputation Effects (George J. Mailath) . . . 2656
Reversible Cellular Automata (Kenichi Morita) . . . 2668
Reversible Computing (Kenichi Morita) . . . 2685
Rough and Rough-Fuzzy Sets in Design of Information Systems (Theresa Beaubouef, Frederick Petry) . . . 2702
Rough Set Data Analysis (Shusaku Tsumoto) . . . 2716
Rough Sets in Decision Making (Roman Słowiński, Salvatore Greco, Benedetto Matarazzo) . . . 2727
Rough Sets: Foundations and Perspectives (James F. Peters, Andrzej Skowron, Jarosław Stepaniuk) . . . 2761
Rule Induction, Missing Attribute Values and Discretization (Jerzy W. Grzymala-Busse) . . . 2772
Selforganized Criticality and Cellular Automata (Michael Creutz) . . . 2780
Self-Replication and Cellular Automata (Gianluca Tempesti, Daniel Mange, André Stauffer) . . . 2792
Semantic Web (Wendy Hall, Kieron O'Hara) . . . 2810
Signaling Games (Joel Sobel) . . . 2830
Social Network Analysis, Estimation and Sampling in (Ove Frank) . . . 2845
Social Network Analysis, Graph Theoretical Approaches to (Wouter de Nooy) . . . 2864
Social Network Analysis, Large-Scale (Vladimir Batagelj) . . . 2878
Social Network Analysis, Overview of (John Scott) . . . 2898
Social Network Analysis, Two-Mode Concepts in (Stephen P. Borgatti) . . . 2912
Social Networks, Algebraic Models for (Philippa Pattison) . . . 2925
Social Networks, Diffusion Processes in (Thomas W. Valente) . . . 2940
Social Networks, Exponential Random Graph (p*) Models for (Garry Robins) . . . 2953
Social Networks and Granular Computing (Churn-Jung Liau) . . . 2968
Social Network Visualization, Methods of (Linton C. Freeman) . . . 2981
Social Phenomena Simulation (Paul Davidsson, Harko Verhagen) . . . 2999
Social Processes, Simulation Models of (Klaus G. Troitzsch) . . . 3004
Soft Computing, Introduction to (Janusz Kacprzyk) . . . 3020
Static Games (Oscar Volij) . . . 3023
Statistical Applications of Wavelets (Sofia Olhede) . . . 3043
Statistics with Imprecise Data (María Ángeles Gil, Olgierd Hryniewicz) . . . 3052
Stochastic Games (Eilon Solan) . . . 3064
Stochastic Loewner Evolution: Linking Universality, Criticality and Conformal Invariance in Complex Systems (Hans C. Fogedby) . . . 3075
Stochastic Processes (Alan J. McKane) . . . 3097
Structurally Dynamic Cellular Automata (Andrew Ilachinski) . . . 3114
Swarm Intelligence (Gerardo Beni) . . . 3150
Synchronization Phenomena on Networks (Guanrong Chen, Ming Zhao, Tao Zhou, Bing-Hong Wang) . . . 3170
Thermodynamics of Computation (H. John Caulfield, Lei Qian) . . . 3187
Tiling Problem and Undecidability in Cellular Automata (Jarkko Kari) . . . 3198
Topological Dynamics of Cellular Automata (Petr Kůrka) . . . 3212
Two-Sided Matching Models (Marilda Sotomayor, Ömer Özak) . . . 3234
Unconventional Computing, Introduction to (Andrew Adamatzky) . . . 3258
Unconventional Computing, Novel Hardware for (Tetsuya Asai) . . . 3260
Voting (Alvaro Sandroni, Jonathan Pogach, Michela Tincani, Antonio Penta, Deniz Selman) . . . 3280
Voting Procedures, Complexity of (Olivier Hudry) . . . 3291
Wavelets, Introduction to (Edward Aboufadel) . . . 3314
Wavelets and the Lifting Scheme (Anders La Cour-Harbo, Arne Jensen) . . . 3316
Wavelets and PDE Techniques in Image Processing, a Quick Tour of (Hao-Min Zhou, Tony F. Chan, Jianhong Shen) . . . 3341
World Wide Web, Graph Structure (Lada A. Adamic) . . . 3358
Zero-Sum Two Person Games (T.E.S. Raghavan) . . . 3372
List of Glossary Terms . . . 3397
Index . . . 3417
Contributors
ABELSON, HAL Massachusetts Institute of Technology Cambridge USA
ANGELOV, PLAMEN Lancaster University Lancaster UK
ABOUFADEL, EDWARD Grand Valley State University Allendale USA
ASAI, TETSUYA Hokkaido University Sapporo Japan
ABRAMSKY, SAMSON Oxford University Computing Laboratory Oxford UK
AVENHAUS, RUDOLF Armed Forces University Munich Neubiberg Germany
ADAMATZKY, ANDREW University of the West of England Bristol UK
BAGNOLI, FRANCO University of Florence Florence Italy
ADAMIC, LADA A. University of Michigan Ann Arbor USA
BANDINI, STEFANIA University of Milan-Bicocca Milan Italy
ALAVA, MIKKO J. Espoo University of Technology Espoo Finland
ALONSO-SANZ, RAMÓN Universidad Politécnica de Madrid Madrid Spain
BATAGELJ, VLADIMIR University of Ljubljana Ljubljana Slovenia
BAYS, CARTER University of South Carolina Columbia USA
AMOS, MARTYN Manchester Metropolitan University Manchester UK
BEAL, JACOB Massachusetts Institute of Technology Cambridge USA
ÁNGELES GIL, MARÍA University of Oviedo Oviedo Spain
BEAUBOUEF, THERESA Southeastern Louisiana University Hammond USA
BEER, MICHAEL National University of Singapore Kent Ridge Singapore
CANTY, MORTON J. Forschungszentrum Jülich Jülich Germany
BENI , GERARDO University of California Riverside Riverside USA
CASTELLANO, CLAUDIO “Sapienza” Università di Roma Roma Italy
BERKHIN, PAVEL eBay Inc. San Jose USA
BERMAN, ABRAHAM Technion – Israel Institute of Technology Haifa Israel
BJELLAND, JOHANNES Telenor R&I Fornebu Norway
BOCCARA, NINO University of Illinois Chicago USA; CE Saclay Gif-sur-Yvette France
BORGATTI, STEPHEN P. University of Kentucky Lexington USA
CASTIGLIONE, FILIPPO Institute for Computing Applications (IAC) – National Research Council (CNR) Rome Italy
CASTILLO, OSCAR Tijuana Institute of Technology Tijuana Mexico
CAULFIELD, H. JOHN Fisk University Nashville USA
CAVERLEE, JAMES Texas A&M University College Station USA
CECCHERINI-SILBERSTEIN, TULLIO Università del Sannio Benevento Italy
BRAMEIER, MARKUS University of Aarhus Århus Denmark
CENEK, MARTIN Portland State University Portland USA
BRAMS, STEVEN J. New York University New York USA
CERCONE, NICK York University Toronto Canada
BREIER, GEORG Technische Universität Dresden Dresden Germany
CERVELLE, JULIEN Université Paris-Est Marne-la-Vallée France
CANRIGHT, GEOFFREY Telenor R&I Fornebu Norway
CHANG, KOW-LUNG National Taiwan University Taipei Taiwan
CHAN, TONY F. University of California Los Angeles USA
CHATTERJEE, KALYAN The Pennsylvania State University University Park USA
CHECHKIN, ALEKSEI V. Institute for Theoretical Physics NSC KIPT Kharkov Ukraine
CHEN, CHUN-HAO National Cheng Kung University Tainan Taiwan
CHEN, GUANRONG City University of Hong Kong Hong Kong China
CHEN, ZHENGXIN University of Nebraska at Omaha Omaha USA
CHOPARD, BASTIEN University of Geneva Geneva Switzerland
CHUNG, FAN University of California San Diego USA
CONTE, ROSARIA CNR Rome Italy
CPAŁKA, KRZYSZTOF Częstochowa University of Technology Częstochowa Poland; Academy of Humanities and Economics Lodz Poland
CREUTZ, MICHAEL Brookhaven National Laboratory Upton USA
DARDZINSKA, AGNIESZKA Białystok Technical University Białystok Poland
DAVIDSSON, PAUL Blekinge Institute of Technology Ronneby Sweden
DEMURJIAN, STEVEN A. The University of Connecticut Storrs USA
DENNUNZIO, ALBERTO Università degli Studi di Milano-Bicocca Milan Italy
DE NOOY, WOUTER University of Amsterdam Amsterdam The Netherlands
DE SILVA, CLARENCE W. University of British Columbia Vancouver Canada
DEUTSCH, ANDREAS Technische Universität Dresden Dresden Germany
COORNAERT, MICHEL Université Louis Pasteur et CNRS Strasbourg France
DHILLON, INDERJIT S. University of Texas Austin USA
CORCHÓN, LUIS C. Universidad Carlos III Madrid Spain
DITTRICH, PETER Friedrich Schiller University Jena Jena Germany
DOREIAN, PATRICK University of Pittsburgh Pittsburgh USA
FADILI, JALAL École Nationale Supérieure d'Ingénieurs de Caen Caen France
DOROGOVTSEV, SERGEY N. Universidade de Aveiro Aveiro Portugal; A. F. Ioffe Physico-Technical Institute St. Petersburg Russia
FALOUTSOS, MICHALIS University of California Riverside USA
DRAPER, DAVID University of California Santa Cruz USA
DUBOIS, DIDIER Université Paul Sabatier Toulouse France
DUNNE, JENNIFER A. Santa Fe Institute Santa Fe USA; Pacific Ecoinformatics and Computational Ecology Lab Berkeley USA
DURAND-LOSE, JÉRÔME Université d'Orléans Orléans France
DUTTA, PRAJIT K. Columbia University New York USA
DŽEROSKI, SAŠO Jožef Stefan Institute Ljubljana Slovenia
FOGEDBY, HANS C. University of Aarhus Aarhus Denmark; Niels Bohr Institute Copenhagen Denmark
FORGES, FRANÇOISE Université Paris-Dauphine Paris France
FORMENTI, ENRICO Université de Nice Sophia Antipolis Sophia Antipolis France
FORTUNATO, SANTO ISI Foundation Torino Italy
FRANK, OVE Stockholm University Stockholm Sweden
FREEMAN, LINTON C. University of California Irvine USA
ENGØ-MONSEN, KENTH Telenor R&I Fornebu Norway
FRIEDRICH, RUDOLF University of Münster Münster Germany
ERGÜN, GÜLER University of Bath Bath UK
GORECKA, JOANNA NATALIA Polish Academy of Science Warsaw Poland
GORECKI, JERZY Polish Academy of Science Warsaw Poland; Cardinal Stefan Wyszynski University Warsaw Poland
GOSSNER, OLIVIER Northwestern University Paris France
GRASSL, MARKUS Austrian Academy of Sciences Innsbruck Austria
GRAVNER, JANKO University of California Davis USA
GRECO, SALVATORE University of Catania Catania Italy
GRZYMALA-BUSSE, JERZY W. University of Kansas Lawrence USA; Polish Academy of Sciences Warsaw Poland
HANSON, JAMES E. IBM T.J. Watson Research Center Yorktown Heights USA
HARDING, SIMON Memorial University St. John's Canada
HATZIKIROU, HARALAMBOS Technische Universität Dresden Dresden Germany
HAUPTMAN, AMI Ben-Gurion University Beer-Sheva Israel
HENDLER, JAMES Rensselaer Polytechnic Institute Troy USA
HE, YIHUA University of California Riverside USA
HIRANO, SHOJI Shimane University, School of Medicine Enya-cho, Izumo City, Shimane Japan
HALL, WENDY University of Southampton Southampton United Kingdom
HONG, TZUNG-PEI National University of Kaohsiung Kaohsiung Taiwan
HAN, BIN University of Alberta Edmonton Canada
HRYNIEWICZ, OLGIERD Systems Research Institute Warsaw Poland
HÄNGGI, PETER University of Augsburg Augsburg Germany
HUANG, SUI Department of Biological Sciences, University of Calgary Calgary Canada
HAN, JIANCHAO California State University Dominguez Hills, Carson USA
HUDRY, OLIVIER École Nationale Supérieure des Télécommunications Paris France
ILACHINSKI, ANDREW Center for Naval Analyses Alexandria USA
KENDON, VIV University of Leeds Leeds UK
JARRAH, ABDUL S. Virginia Polytechnic Institute and State University Virginia USA
KLAFTER, JOSEPH Tel Aviv University Tel Aviv Israel University of Freiburg Freiburg Germany
JENSEN, ARNE Aalborg University Aalborg East Denmark
JENSEN, HENRIK JELDTOFT Institute for Mathematical Sciences London UK; Imperial College London London UK
JOHN, ROBERT I. De Montfort University Leicester United Kingdom
JORGENSEN, PALLE E. T. The University of Iowa Iowa City USA
KACPRZYK, JANUSZ Polish Academy of Sciences Warsaw Poland
KOKOL, PETER University of Maribor Maribor Slovenia
KOSTER, MAURICE University of Amsterdam Amsterdam Netherlands
KŮRKA, PETR Université de Nice Sophia Antipolis Nice France; Academy of Sciences and Charles University Prague Czechia
KUTER, UGUR University of Maryland College Park USA
KUTRIB, MARTIN Universität Giessen Giessen Germany
KARI, JARKKO University of Turku Turku Finland
LA COUR-HARBO, ANDERS Aalborg University Aalborg East Denmark
KAUFFMAN, STUART A. Department of Biological Sciences, University of Calgary Calgary Canada
LAI, MING-JUN The University of Georgia Athens USA
KEINERT, FRITZ Iowa State University Ames USA
LANGE, WOLFGANG University of Sussex Brighton UK
LAUBENBACHER, REINHARD Virginia Polytechnic Institute and State University Virginia USA
LAURITSEN, KENT BÆKGAARD Danish Meteorological Institute Copenhagen Denmark
LAVI, RON The Technion – Israel Institute of Technology Haifa Israel
LEVY, MOSHE The Hebrew University Jerusalem Israel
LIAU, CHURN-JUNG Academia Sinica Taipei Taiwan
MACAL, CHARLES M. Center for Complex Adaptive Agent Systems Simulation (CAS2) Argonne USA
MACDONALD, JOANNE Columbia University New York USA
MACHO-STADLER, INÉS Universitat Autònoma de Barcelona Barcelona Spain
MACLENNAN, BRUCE J. University of Tennessee Knoxville USA
MAC NAMEE, BRIAN Dublin Institute of Technology Dublin Ireland
LILJEROS, FREDRIK Stockholm University Stockholm Sweden
MAILATH, GEORGE J. University of Pennsylvania Philadelphia USA
LIN, TSAU YOUNG San Jose State University San Jose USA
MANGE, DANIEL Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
LIU, HUAN Arizona State University Tempe USA
MANZONI, SARA University of Milan-Bicocca Milan Italy
LLOYD, SETH MIT Cambridge USA
MARGENSTERN, MAURICE Université Paul Verlaine Metz France
LODWICK, WELDON A. University of Colorado Denver Denver USA
MATARAZZO, BENEDETTO University of Catania Catania Italy
LO, HOI-KWONG University of Toronto Toronto Canada
MCKANE, ALAN J. University of Manchester Manchester UK
MELIN, PATRICIA Tijuana Institute of Technology Tijuana Mexico
MUTO, SHIGEO Tokyo Institute of Technology Tokyo Japan
MENASALVAS, ERNESTINA Facultad de Informatica Madrid Spain
MUZY, ALEXANDRE Università di Corsica Corte France
MENDEL, JERRY M. University of Southern California Los Angeles USA
METZLER, RALF Technical University of Munich Garching Germany
MILBURN, GERARD J. The University of Queensland Brisbane Australia
MILLER, JULIAN F. University of York Heslington UK
MITCHELL, MELANIE Portland State University Portland USA
MORITA, KENICHI Hiroshima University Higashi-Hiroshima Japan
MORTVEIT, HENNING S. Virginia Polytechnic Institute and State University Virginia USA
MOSCA, MICHELE University of Waterloo Waterloo Canada; St. Jerome's University Waterloo Canada; Perimeter Institute for Theoretical Physics Waterloo Canada
NACHBAR, JOHN Washington University St. Louis USA
NAUGHTON, THOMAS J. National University of Ireland Maynooth County Kildare Ireland; University of Oulu, RFMedia Laboratory Ylivieska Finland
NICODEMI, MARIO University of Warwick Coventry UK
NILSSON JACOBI, MARTIN Chalmers University of Technology Gothenburg Sweden
NOLFI, STEFANO National Research Council (CNR) Rome Italy
NORTH, MICHAEL J. Center for Complex Adaptive Agent Systems Simulation (CAS2) Argonne USA
NOWICKI, ROBERT Częstochowa University of Technology Częstochowa Poland
O'HARA, KIERON University of Southampton Southampton United Kingdom
OLHEDE, SOFIA University College London London UK
PEINKE, JOACHIM Carl-von-Ossietzky University Oldenburg Oldenburg Germany
ORLOV, MICHAEL Ben-Gurion University Beer-Sheva Israel
PENTA, ANTONIO University of Pennsylvania Philadelphia USA
ÖZAK, ÖMER Brown University Providence USA
PEPER, FERDINAND National Institute of Information and Communications Technology Kobe Japan
PAGE JR., FRANK H. Indiana University Bloomington USA Université Paris 1 Panthéon-Sorbonne France
PANOV, PANČE Jožef Stefan Institute Ljubljana Slovenia
PAPAGEORGIOU, ANARGYROS Columbia University New York USA
PARRY, HAZEL R. Central Science Laboratory York UK
PATTISON, PHILIPPA University of Melbourne Parkville Australia
PĂUN, GHEORGHE Institute of Mathematics of the Romanian Academy Bucureşti Romania
PEDRYCZ, WITOLD University of Alberta Edmonton Canada Polish Academy of Sciences Warsaw Poland
PÉREZ-CASTRILLO, DAVID Universitat Autònoma de Barcelona Barcelona Spain
PETERS, JAMES F. University of Manitoba Winnipeg Canada
PETRY, FREDERICK Stennis Space Center Mississippi USA
PIVATO, MARCUS Trent University Peterborough Canada
PODGORELEC, VILI University of Maribor Maribor Slovenia
POGACH, JONATHAN University of Pennsylvania Philadelphia USA
POKROPIŃSKA, AGATA Jan Dlugosz University Częstochowa Poland
POLKOWSKI, LECH Polish-Japanese Institute of Information Technology Warsaw Poland
POVALEJ, PETRA University of Maribor Maribor Slovenia
RENAULT, JÉRÔME Université Paris Dauphine Paris France
PRADE, HENRI Université Paul Sabatier Toulouse Cedex France
REZA RAHIMI TABAR, M. Sharif University of Technology Tehran Iran
PRUESSNER, GUNNAR Imperial College London London UK
PU, CALTON Georgia Institute of Technology Atlanta USA
QIAN, LEI Fisk University Nashville USA
QUINCAMPOIX, MARC Université de Bretagne Occidentale Brest France
RAGHAVAN, T.E.S. University of Illinois Chicago USA
RAND, DARREN Massachusetts Institute of Technology Lexington USA
RAS, ZBIGNIEW W. University of North Carolina Charlotte USA Polish Academy of Sciences Warsaw Poland
ROBINS, GARRY University of Melbourne Melbourne Australia
RÖTTELER, MARTIN NEC Laboratories America, Inc. Princeton USA
RUTKOWSKI, LESZEK Częstochowa University of Technology Częstochowa Poland
SABOURIAN, HAMID University of Cambridge Cambridge UK
SANDHOLM, WILLIAM H. University of Wisconsin Madison USA
SANDRONI, ALVARO University of Pennsylvania Philadelphia USA
SCHERER, RAFAŁ Częstochowa University of Technology Częstochowa Poland
RAVI, S.S. University at Albany – State University of New York New York USA
SCOTT, JOHN University of Plymouth Plymouth UK
REIF, JOHN H. Duke University Durham USA
SELMAN, DENIZ University of Pennsylvania Philadelphia USA
SERRANO, ROBERTO Brown University Providence USA IMDEA-Social Sciences Madrid Spain
SHAKED-MONDERER, NAOMI Emek Yezreel College Emek Yezreel Israel
SHEN, JIANHONG Barclays Capital New York USA
SIBANI, PAOLO SDU Odense Denmark
SIGANOS, GEORGOS University of California Riverside USA
SIPPER, MOSHE Ben-Gurion University Beer-Sheva Israel
SKOWRON, ANDRZEJ Warsaw University Warsaw Poland
SŁOWIŃSKI, ROMAN Poznan University of Technology Poznan Poland Polish Academy of Sciences Warsaw Poland
SOLAN, EILON Tel Aviv University Tel Aviv Israel
SOLÉ, RICARD V. Santa Fe Institute Santa Fe USA
SONG, MYUNG-SIN Southern Illinois University Edwardsville USA
SORNETTE, DIDIER ETH Zurich Zurich Switzerland
SOTOMAYOR, MARILDA University of São Paulo/SP São Paulo Brazil Brown University Providence USA
STARCK, JEAN-LUC CEA/Saclay Gif sur Yvette France
STAUFFER, ANDRÉ Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
STEFANOVIC, DARKO University of New Mexico Albuquerque USA
STEIGLITZ, KEN Princeton University Princeton USA
SNIJDERS, TOM A. B. University of Oxford Oxford United Kingdom
STEPANIUK, JAROSŁAW Białystok University of Technology Białystok Poland
SOBEL, JOEL University of California San Diego USA
STEPHENS, DAVID A. McGill University Montreal Canada
STIGLIC, GREGOR University of Maribor Maribor Slovenia
TORRA , VICENÇ Institut d’Investigació en Intelligència Artiﬁcial – CSIC Bellaterra Spain
STOJANOVIC, MILAN Columbia University New York USA
TRAUB, JOSEPH F. Columbia University New York USA
SURI, NIRANJAN Institute for Human and Machine Cognition Pensacola USA
SUSSMAN, GERALD JAY Massachusetts Institute of Technology Cambridge USA
SUTNER, KLAUS Carnegie Mellon University Pittsburgh USA
TROITZSCH, KLAUS G. Universität Koblenz-Landau Koblenz Germany
TSALLIS, CONSTANTINO Centro Brasileiro de Pesquisas Físicas Rio de Janeiro Brazil Santa Fe Institute Santa Fe USA
TÄUBER, UWE CLAUS Virginia Polytechnic Institute and State University Blacksburg USA
TSENG, VINCENT S. National Cheng–Kung University Tainan Taiwan
TEMPESTI, GIANLUCA University of York York UK
TSUMOTO, SHUSAKU Faculty of Medicine, Shimane University Shimane Japan
TEUSCHER, CHRISTOF Los Alamos National Laboratory Los Alamos USA
TÜRKŞEN, I. BURHAN TOBB-ETÜ (Economics and Technology University of the Union of Turkish Chambers and Commodity Exchanges) Ankara Republic of Turkey
TIMMIS, JON University of York York UK
UMEO, HIROSHI University of Osaka Osaka Japan
TINCANI, MICHELA University of Pennsylvania Philadelphia USA
UNTIEDT, ELIZABETH A. University of Colorado Denver Denver USA
TOMALA, TRISTAN HEC Paris Paris France
VALENTE, THOMAS W. University of Southern California Alhambra USA
VALVERDE, SERGI Parc de Recerca Biomedica de Barcelona Barcelona Spain
WEBB, STEVE Georgia Institute of Technology Atlanta USA
VERHAGEN, HARKO Stockholm University and Royal Institute of Technology Stockholm Sweden
WHITE, ANDREW G. The University of Queensland Brisbane Australia
VERLIC, MATEJA University of Maribor Maribor Slovenia
WIESNER, KAROLINE University of Bristol Bristol UK
VITEK, JAN Purdue University West Lafayette USA
VIZZARI, GIUSEPPE University of Milan-Bicocca Milan Italy
VOLIJ, OSCAR Ben-Gurion University Beer-Sheva Israel
WOODERS, MYRNA Vanderbilt University Nashville USA University of Warwick Coventry UK
VOORHEES, BURTON Athabasca University Athabasca Canada
WOODS, DAMIEN University College Cork Cork Ireland University of Seville Seville Spain
WAKO, JUN Gakushuin University Tokyo Japan
WORSCH, THOMAS Universität Karlsruhe Karlsruhe Germany
WANG, BING-HONG University of Science and Technology of China Hefei Anhui China Shanghai Academy of System Science Shanghai China
YANG, WEI-ZHE National Taiwan University Taipei Taiwan
WASILEWSKA, ANITA Stony Brook University Stony Brook USA
WATROUS, JOHN University of Waterloo Waterloo Canada
YEUNG, CHI HO The Hong Kong University of Science and Technology Hong Kong China Université de Fribourg Pérolles, Fribourg Switzerland University of Electronic Science and Technology of China (UESTC) Chengdu China
YILMAZ, LEVENT Auburn University Alabama USA
ZHANG, YAN-QING Georgia State University Atlanta USA
YULMETYEV, RENAT M. Kazan State University Kazan Russia Tatar State University of Pedagogical and Humanities Sciences Kazan Russia
ZHANG, YI-CHENG The Hong Kong University of Science and Technology Hong Kong China Université de Fribourg Pérolles, Fribourg Switzerland University of Electronic Science and Technology of China (UESTC) Chengdu China
ZADEH, LOTFI A. University of California Berkeley USA
ZAMIR, SHMUEL Hebrew University Jerusalem Israel
ZEIGLER, BERNARD University of Arizona Tucson USA
ZEITOUNI, OFER University of Minnesota Minneapolis USA
ŽENKO, BERNARD Jožef Stefan Institute Ljubljana Slovenia
ZHAO, MING University of Science and Technology of China Hefei Anhui China
ZHAO, YI University of Toronto Toronto Canada
ZHAO, ZHENG Arizona State University Tempe USA
ZHOU, HAOMIN Georgia Institute of Technology Atlanta USA
ZHANG, BO Tsinghua University Beijing China
ZHOU, TAO University of Science and Technology of China Hefei Anhui China
ZHANG, LING Anhui University, Hefei Anhui China
ZORMAN, MILAN University of Maribor Maribor Slovenia
Additive Cellular Automata
Additive Cellular Automata
BURTON VOORHEES
Center for Science, Athabasca University, Athabasca, Canada
Article Outline
Glossary
Definition of the Subject
Introduction
Notation and Formal Definitions
Additive Cellular Automata in One Dimension
d-Dimensional Rules
Future Directions
Bibliography
Glossary

Cellular automata Cellular automata are dynamical systems that are discrete in space, time, and value. A state of a cellular automaton is a spatial array of discrete cells, each containing a value chosen from a finite alphabet. The state space for a cellular automaton is the set of all such configurations.

Alphabet of a cellular automaton The alphabet of a cellular automaton is the set of symbols or values that can appear in each cell. The alphabet contains a distinguished symbol called the null or quiescent symbol, usually indicated by 0, which satisfies the condition of an additive identity: 0 + x = x.

Cellular automata rule The rule, or update rule, of a cellular automaton describes how any given state is transformed into its successor state. The update rule of a cellular automaton is described by a rule table, which defines a local neighborhood mapping, or equivalently as a global update mapping.

Additive cellular automata An additive cellular automaton is a cellular automaton whose update rule satisfies the condition that its action on the sum of two states is equal to the sum of its actions on the two states separately.

Linear cellular automata A linear cellular automaton is a cellular automaton whose update rule satisfies the condition that the sum of its actions on two states separately equals its action on the sum of the two states plus its action on the state in which all cells contain the quiescent symbol. Note that some researchers reverse the definitions of additivity and linearity.

Neighborhood The neighborhood of a given cell is the set of cells that contribute to the update of the value in that cell under the specified update rule.

Rule table The rule table of a cellular automaton is a listing of all neighborhoods together with the symbol that each neighborhood maps to under the local update rule.

Local maps of a cellular automaton The local mapping for a cellular automaton is a map from the set of all neighborhoods of a cell to the automaton alphabet.

State transition diagram The state transition diagram (STD) of a cellular automaton is a directed graph with each vertex labeled by a possible state and an edge directed from a vertex x to a vertex y if and only if the state labeling vertex x maps to the state labeling vertex y under application of the automaton update rule.

Transient states A transient state of a cellular automaton is a state that can appear at most once in the evolution of the automaton rule.

Cyclic states A cyclic state of a cellular automaton is a state lying on a cycle of the automaton update rule; hence it is periodically revisited in the evolution of the rule.

Basins of attraction The basins of attraction of a cellular automaton are the equivalence classes of cyclic states together with their associated transient states, with two states being equivalent if they lie on the same cycle of the update rule.

Predecessor state A state x is the predecessor of a state y if and only if x maps to y under application of the cellular automaton update rule. More specifically, a state x is an nth-order predecessor of a state y if it maps to y under n applications of the update rule.

Garden-of-Eden A Garden-of-Eden state is a state that has no predecessor. It can be present only as an initial condition.

Surjectivity A mapping is surjective (or onto) if every state has a predecessor.

Injectivity A mapping is injective (one-to-one) if every state in its domain maps to a unique state in its range.
That is, if states x and y both map to a state z, then x = y.

Reversibility A mapping $X$ is reversible if and only if a second mapping $X^{-1}$ exists such that if $X(x) = y$ then $X^{-1}(y) = x$. For finite state spaces, reversibility and injectivity are identical.

Definition of the Subject

Cellular automata are discrete dynamical systems in which an extended array of symbols from a finite alphabet is iteratively updated according to a specified local rule. They were originally developed by John von Neumann [1,2] in 1948, following suggestions from Stanislaw Ulam, for the purpose of showing that self-replicating automata could be constructed. Von Neumann's construction followed a complicated set of reproduction rules, but later work showed that self-reproducing automata could be constructed with only simple update rules, e.g. [3]. More generally, cellular automata are of interest because they show that highly complex patterns can arise from the application of very simple update rules. While conceptually simple, they provide a robust modeling class for application in a variety of disciplines, e.g. [4], as well as fertile ground for theoretical research. Additive cellular automata are the simplest class of cellular automata. They have been extensively studied from both theoretical and practical perspectives.

Introduction

A wide variety of cellular automata applications, in a number of differing disciplines, has appeared in the past fifty years; see, e.g. [5,6,7,8]. Among other things, cellular automata have been used to model growth and aggregation processes [9,10,11,12]; discrete reaction-diffusion systems [13,14,15,16,17]; spin exchange systems [18,19]; biological pattern formation [20,21]; disease processes and transmission [22,23,24,25,26]; DNA sequences and gene interactions [27,28]; spiral galaxies [29]; social interaction networks [30]; and forest fires [31,32]. They have been used for language and pattern recognition [33,34,35,36,37,38,39]; image processing [40,41]; as parallel computers [42,43,44,45,46,47]; parallel multipliers [48]; sorters [49]; and prime number sieves [50]. In recent years, cellular automata have become important for VLSI logic circuit design [51].
Circuit designers need "simple, regular, modular, and cascadable logic circuit structure to realize a complex function," and cellular automata, which show a significant advantage over linear feedback shift registers, the traditional circuit building block, satisfy this need (see [52] for an extensive survey). Cellular automata, in particular additive cellular automata, are of value for producing high-quality pseudorandom sequences [53,54,55,56,57]; for pseudoexhaustive and deterministic pattern generation [58,59,60,61,62,63]; for signature analysis [64,65,66]; error-correcting codes [67,68]; pseudoassociative memory [69]; and cryptography [70]. In this discussion, attention focuses on the subclass of additive cellular automata. These are the simplest cellular automata, characterized by the property that the action of the update rule on the sum of two states is equal to the sum of the rule acting on each state separately. Hybrid additive rules (i.e., with different cells evolving according to different additive rules) have proved particularly useful for generation of pseudorandom and pseudoexhaustive sequences, signature analysis, and other circuit design applications, e.g. [52,71,72].

The remainder of this article is organized as follows: Sect. "Notation and Formal Definitions" introduces definitions and notational conventions. In Sect. "Additive Cellular Automata in One Dimension", consideration is restricted to one-dimensional rules. The influence of boundary conditions on the evolution of one-dimensional rules, conditions for rule additivity, generation of fractal space-time outputs, equivalent forms of rule representation, injectivity and reversibility, transient lengths, and cycle periods are discussed using several approaches. Taking $X$ as the global operator for an additive cellular automaton, a method for analytic solution of equations of the form $X(\mu) = \beta$ is described. Section "d-Dimensional Rules" describes work on d-dimensional rules defined on tori. The discrete baker transformation is defined and used to generalize one-dimensional results on transient lengths, cycle periods, and similarity of state transition diagrams. Extensive references to the literature are provided throughout, and a set of general references is provided at the end of the bibliography.

Notation and Formal Definitions

Let $S(L) = \{s_i\}$ be the set of lattice sites of a d-dimensional lattice $L$, with $n_r$ equal to the number of lattice sites on dimension $r$. Denote by $A$ a finite symbol set with $|A| = p$ (usually prime). An $A$-configuration on $L$ is a map $\nu : S(L) \to A$ that assigns a symbol from $A$ to each site in $S(L)$. In this way, every $A$-configuration defines a size $n_1 \times \cdots \times n_d$, d-dimensional matrix of symbols drawn from $A$. Denote the set of all $A$-configurations on $L$ by $E(A;L)$.
Each $s_i \in S(L)$ is labeled by an integer vector $\vec{i} = (i_1,\dots,i_d)$, where $i_r$ is the number of sites along the $r$th dimension separating $s_i$ from the assigned origin in $L$. The shift operator on the $r$th dimension of $L$ is the map $\sigma_r : L \to L$ defined by

$$\sigma_r(s_i) = s_j\,, \qquad \vec{j} = (i_1,\dots,i_r - 1,\dots,i_d)\,. \tag{1}$$

Equivalently, the shift maps the value at site $\vec{i}$ to the value at site $\vec{j}$. Let $\mu(s_i,t) = \mu(i_1,\dots,i_d,t) \in A$ be the entry of $\mu$ corresponding to site $s_i$ at iteration $t$ for any discrete dynamical system having $E(A;L)$ as state space. Given a finite set of integer d-tuples $N = \{(k_1,\dots,k_d)\}$, define the
$N$-neighborhood of a site $s_i \in S(L)$ as

$$N(s_i) = \bigl\{\, s_j \;\big|\; \vec{j} = \vec{i} + \vec{k},\ \vec{k} \in N \,\bigr\}\,. \tag{2}$$
A neighborhood configuration is a map $\eta : N(s_0) \to A$. Denote the set of all neighborhood configurations by $E_N(A)$. The rule table for a cellular automaton acting on the state space $E(A;L)$ with standard neighborhood $N(s_0)$ is defined by a map $x : E_N(A) \to A$ (note that this map need not be surjective or injective). The value of $x$ for a given neighborhood configuration is called the (value of the) rule component of that configuration. The map $x : E_N(A) \to A$ induces a global map $X : E(A;L) \to E(A;L)$ as follows: for any given element $\mu(t) \in E(A;L)$, the set $C_\mu(s_i) = \{\mu(s_j,t) \mid s_j \in N(s_i)\}$ is a neighborhood configuration for the site $s_i$; hence the map $\mu(s_i,t) \mapsto x(C_\mu(s_i))$ for all $s_i$ produces a new symbol $\mu(s_i,t+1)$. The site $s_i$ is called the mapping site. When taken over all mapping sites, this produces a matrix $\mu(t+1)$ that is the representation of $X(\mu(t))$. A cellular automaton is indicated by reference to its rule table or to the global map defined by this rule table.

A cellular automaton with global map $X$ is additive if and only if, for all pairs of states $\mu$ and $\beta$,

$$X(\mu + \beta) = X(\mu) + X(\beta)\,. \tag{3}$$

Addition of states is carried out sitewise mod($p$) on the matrix representations of $\mu$ and $\beta$; for example, for a one-dimensional six-site lattice with $p = 3$, the sum of 120112 and 021212 is 111021.

The definition of additivity given in [52] differs slightly from this standard definition. There, a binary-valued cellular automaton is called "linear" if its local rule involves only the XOR operation and "additive" if it involves XOR and/or XNOR. A rule involving XNOR can be written as the binary complement of a rule involving only XOR. In terms of the global operator of the rule, this means that it has the form $\bar{1} + X$, where $X$ satisfies Eq. (3) and $\bar{1}$ represents the rule that maps every site to 1. Thus, $(\bar{1} + X)(\mu + \beta)$ equals $1\cdots1 + X(\mu + \beta)$, while

$$(\bar{1} + X)(\mu) + (\bar{1} + X)(\beta) = 1\cdots1 + 1\cdots1 + X(\mu) + X(\beta) = X(\mu) + X(\beta) \pmod 2\,.$$

In what follows, an additive rule is defined strictly as one obeying Eq. (3), corresponding to rules that are "linear" in [52]. Much of the formal study of cellular automata has focused on the properties and forms of representation of the map $X : E(A;L) \to E(A;L)$. The structure of the state
transition diagram (STD($X$)) of this map is of particular interest.

Example 1 (Continuous Transformations of the Shift Dynamical System) Let $L$ be isomorphic to the set of integers $\mathbb{Z}$. Then $E(A;\mathbb{Z})$ is the set of infinite sequences with entries from $A$. With the product topology induced by the discrete topology on $A$ and $\sigma$ as the left shift map, the system $(E(A;\mathbb{Z}),\sigma)$ is the shift dynamical system on $A$. The set of cellular automata maps $X : E(A;\mathbb{Z}) \to E(A;\mathbb{Z})$ constitutes the class of continuous shift-commuting transformations of $(E(A;\mathbb{Z}),\sigma)$, a fundamental result of Hedlund [73].

Example 2 (Elementary Cellular Automata) Let $L$ be isomorphic to $\mathbb{Z}$ with $A = \{0,1\}$ and $N = \{-1,0,1\}$. The neighborhood of site $s_i$ is $\{s_{i-1}, s_i, s_{i+1}\}$ and $E_N(A) = \{000, 001, 010, 011, 100, 101, 110, 111\}$. In this one-dimensional case, the rule table can be written as $x_i = x(i_0 i_1 i_2)$, where $i_0 i_1 i_2$ is the binary form of the index $i$. Listing this gives the standard form for the rule table of an elementary cellular automaton:

Neighborhood  000  001  010  011  100  101  110  111
Rule value    x_0  x_1  x_2  x_3  x_4  x_5  x_6  x_7

The standard labeling scheme for elementary cellular automata was introduced by Wolfram [74], who observed that the rule table for elementary rules defines the binary number $\sum_{i=0}^{7} x_i 2^i$ and used this number to label the corresponding rule.

Example 3 (The Game of Life) This simple 2-dimensional cellular automaton was invented by John Conway to illustrate a self-reproducing system. It was first presented in 1970 by Martin Gardner [75,76]. The game takes place on a square lattice, either infinite or toroidal. The neighborhood of a cell consists of the eight cells surrounding it. The alphabet is {0,1}: a 1 in a cell indicates that the cell is alive, a 0 that it is dead. The update rules are: (a) if a cell contains a 0, it remains 0 unless exactly three of its neighbors contain a 1; (b) if a cell contains a 1, then it remains a 1 if and only if two or three of its neighbors are 1. This cellular automaton produces a number of interesting patterns, including a variety of fixed points (still lifes); oscillators (period 2); and moving patterns (gliders, spaceships); as well as more exotic patterns such as glider guns, which generate a stream of gliders.

Additive Cellular Automata in One Dimension

Much of the work on cellular automata has focused on rules in one dimension (d = 1). This section reviews some of this work.
Boundary Conditions and Additivity

In the case of one-dimensional cellular automata, the lattice $L$ can be isomorphic to the integers; to the non-negative integers; to the finite set $\{0,\dots,n-1\} \subset \mathbb{Z}$; or to the integers modulo an integer $n$. In the first case, there are no boundary conditions; in the remaining three cases, different boundary conditions apply. If $L$ is isomorphic to $\mathbb{Z}_n$, the integers mod($n$), the boundary conditions are periodic and the lattice is circular (it is a p-adic necklace). This is called a cylindrical cellular automaton [77] because evolution of the rule can be represented as taking place on a cylinder. If the lattice is isomorphic to $\{0,\dots,n-1\}$, null, or Dirichlet, boundary conditions are set [78,79,80]. That is, the symbol assigned to all sites in $L$ outside of this set is the null symbol. When the lattice is isomorphic to the non-negative integers $\mathbb{Z}^+$, null boundary conditions are set at the left boundary. In these latter two cases, the neighborhood structure assumed may influence the need for null conditions.

Example 4 (Elementary Rule 90) Let $\delta$ represent the global map for the elementary cellular automaton rule 90, with rule table

Neighborhood  000  001  010  011  100  101  110  111
Rule value     0    1    0    1    1    0    1    0

For a binary string $\mu$ in $\mathbb{Z}$ or $\mathbb{Z}_n$, the action of rule 90 is defined by $[\delta(\mu)]_i = \mu_{i-1} + \mu_{i+1} \bmod 2$, where all indices are taken mod($n$) in the case of $\mathbb{Z}_n$. In the remaining cases, the null boundary conditions fix the sites outside the lattice at the null symbol, so that, for example, at the left boundary $[\delta(\mu)]_0 = \mu_1 \bmod 2$.
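The additivity condition of Eq. (3) can be checked numerically for rule 90, together with the sitewise mod-p addition of states described in Sect. "Notation and Formal Definitions". A minimal sketch, with our own function names and periodic boundaries:

```python
# Sketch: verify X(mu + beta) = X(mu) + X(beta) (Eq. (3)) for elementary
# rule 90 on a circular lattice, with sitewise addition mod p.
import random

def rule90(config):
    """[delta(mu)]_i = mu_{i-1} + mu_{i+1} mod 2, indices taken mod n."""
    n = len(config)
    return [(config[(j - 1) % n] + config[(j + 1) % n]) % 2 for j in range(n)]

def add(mu, beta, p=2):
    """Sitewise addition of two states mod p."""
    return [(x + y) % p for x, y in zip(mu, beta)]

random.seed(0)
for _ in range(100):
    mu = [random.randint(0, 1) for _ in range(16)]
    beta = [random.randint(0, 1) for _ in range(16)]
    assert rule90(add(mu, beta)) == add(rule90(mu), rule90(beta))

# The sitewise mod-3 sum from the text: 120112 + 021212 = 111021
assert add([1, 2, 0, 1, 1, 2], [0, 2, 1, 2, 1, 2], p=3) == [1, 1, 1, 0, 2, 1]
```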
$$K(w|v) < K_\varphi(w|v) + c\,.$$

Let $z$ be a shortest program for $x$ knowing $v$, and let $z'$ be a shortest program for $y$ knowing $\langle v,x\rangle$. As $\widehat{z;z'}$ is a program for $\langle x,y\rangle$ knowing $v$ with representation system $\varphi$, one has

$$K(\langle x,y\rangle|v) \le |\widehat{z;z'}| + c \le K(x|v) + K(y|\langle v,x\rangle) + 2\log_2(K(x|v)) + 3 + c\,,$$

which completes the proof.

Of course, this inequality is not an equality, since if $x = y$ the complexity $K(\langle x,y\rangle)$ is the same as $K(x)$ up to some constant. When the equality holds, it is said that $x$ and $y$ have no mutual information. Note that if we can represent the combination of programs $z$ and $z'$ in a shorter way, the upper bound is decreased by the same amount. For instance, if we choose $\widehat{x;y} = \ell(\ell(x))^2\,01\,\ell(x)\,x\,y$, it becomes

$$2\log_2\log_2|x| + \log_2|x| + |x| + |y|\,.$$

If we know that a program never contains a special word $u$, then with $\widehat{x;y} = x\,u\,y$ it becomes $|x| + |y|$.

Examples

The properties seen so far allow one to prove most of the relations between complexities of words. For instance, choosing $f : w \mapsto ww$ and $g$ the identity in Corollary 1, we obtain that there is a constant $c$ such that $|K(ww|v) - K(w|v)| < c$. In the same way, letting $f$ be the identity and $g$ be defined by $g(x) = \varepsilon$ in Theorem 3, we get that there is a constant $c$ such that $K(w|v) < K(w) + c$. By Theorem 4, Corollary 1, and choosing $f$ as $\langle x,y\rangle \mapsto xy$ and $g$ as the identity, we have that there is a constant $c$ such that

$$K(xy) < K(x) + K(y) + 2\min(\log_2(K(x)), \log_2(K(y))) + c\,.$$

The next result proves the existence of incompressible words.

Proposition 2 For any $c \in \mathbb{R}^+$, there are at least $2^{n+1} - 2^{n-c}$ c-incompressible words $w$ such that $|w| \le n$.

Proof Each program produces only one word; therefore there cannot be more words whose complexity is below $n$ than the number of programs of size less than $n$. Hence, one finds

$$|\{w : K(w) < n\}| \le 2^n - 1\,.$$

This implies that

$$|\{w : K(w) < |w| - c \text{ and } |w| \le n\}| \le |\{w : K(w) < n - c\}| < 2^{n-c}\,,$$

and, since

$$\{w : K(w) \ge |w| - c \text{ and } |w| \le n\} \supseteq \{w : |w| \le n\} \setminus \{w : K(w) < |w| - c \text{ and } |w| \le n\}\,,$$

it holds that

$$|\{w : K(w) \ge |w| - c \text{ and } |w| \le n\}| \ge 2^{n+1} - 2^{n-c}\,.$$

From this proposition one deduces that half of the words whose size is less than or equal to $n$ reach the upper bound of their Kolmogorov complexity. The incompressibility method relies on the fact that most words are incompressible.

Martin–Löf Tests

Martin–Löf tests give another equivalent definition of random sequences. The idea is simple: a sequence is random if it does not pass any computable test of singularity (i. e.,
a test that selects words from a "negligible" recursively enumerable set). As for Kolmogorov complexity, this notion needs a proper definition and implies a universal test. However, in this review we prefer to express tests in terms of Kolmogorov complexity. (We refer the reader to [13] for more on Martin–Löf tests.)

Let us start with an example. Consider the test "to have the same number of 0s and 1s." Define the set $E = \{w : w \text{ has the same number of 0s and 1s}\}$ and order it in military order (length first, then lexicographic order), $E = \{e_0, e_1, \dots\}$. Consider the computable function $f : x \mapsto e_x$ (which is in fact computable for any decidable set $E$). Note that there are $\binom{2n}{n}$ words in $E$ of length $2n$. Thus, if $e_x$ has length $2n$, then $x$ is less than $\binom{2n}{n}$, whose logarithm is $2n - \log_2\sqrt{\pi n} + O(1)$. Then, using Theorem 3 with function $f$, one finds that

$$K(e_x) < K(x) < \log_2 x < |e_x| - \log_2\sqrt{\frac{|e_x|}{2}}$$

up to an additive constant. We conclude that all members $e_x$ of $E$ are not c-incompressible whenever they are long enough. This notion of test corresponds to "belonging to a small set" in Kolmogorov complexity terms. The next proposition formalizes these ideas.

Proposition 3 Let $E$ be a recursively enumerable set such that $|E \cap \{0,1\}^n| = o(2^n)$. Then, for all constants $c$, there is an integer $M$ such that all words of length greater than $M$ in $E$ are not c-incompressible.

Proof In this proof, we represent integers as binary strings. Let $\widehat{x;y}$ be defined, for integers $x$ and $y$, by

$$\widehat{x;y} = |x|^2\,01\,x\,y\,.$$

Let $f$ be the computable function

$$f : \{0,1\}^* \to \{0,1\}^*\,, \qquad \widehat{x;y} \mapsto \text{the } y\text{th word of length } x \text{ in the enumeration of } E\,.$$

From Theorems 2 and 3, there is a constant $d$ such that for all words $e$ of $E$,

$$K(e) < |f^{-1}(e)| + d\,.$$

Note $u_n = |E \cap \{0,1\}^n|$. From the definition of the function $f$, one has

$$|f^{-1}(e)| \le 3 + 2\log_2(|e|) + \log_2(u_{|e|})\,.$$

As $u_n = o(2^n)$, $\log_2(u_{|e|}) - |e|$ tends to $-\infty$, so there is an integer $M$ such that when $|e| > M$, $K(e) < |f^{-1}(e)| + d < |e| - c$. This proves that no members of $E$ whose length is greater than $M$ are c-incompressible.
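The compressibility of the members of E in the example above can be illustrated numerically: a balanced word of length 2n can be named by its index in an enumeration of E, and log2 C(2n, n) < 2n bits suffice for that index. The following sketch is our own construction (brute-force enumeration, small n only), not code from the article:

```python
# Sketch: balanced words are compressible, since each can be described by its
# index into an enumeration of E = {words with equal numbers of 0s and 1s}.
from itertools import product
from math import comb, log2

def enumerate_balanced(length):
    """E restricted to one length, in lexicographic order (brute force)."""
    return [w for w in product('01', repeat=length)
            if w.count('0') == w.count('1')]

n = 10                                # words of length 2n = 20
words = enumerate_balanced(2 * n)
assert len(words) == comb(2 * n, n)   # there are C(2n, n) such words
index_bits = log2(len(words))         # bits needed to name a word by its index
assert index_bits < 2 * n             # strictly fewer than the word's own bits
print(f"{index_bits:.1f} index bits vs {2 * n} word bits")
```

For 2n = 20 the index costs about 17.5 bits against the word's 20, and the gap log2 sqrt(pi n) grows without bound with n.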
Dynamical Systems and Symbolic Factors

A (discrete-time) dynamical system is a structure $\langle X, f\rangle$, where $X$ is the set of states of the system and $f$ is a function from $X$ to itself that, given a state, tells which is the next state of the system. In the literature, the state space is usually assumed to be a compact metric space and $f$ a continuous function.

The study of the asymptotic properties of a dynamical system is, in general, difficult. Therefore, it can be interesting to associate the studied system with a simpler one and deduce the properties of the original system from the simple one. Indeed, under the compactness assumption, one can associate each system $\langle X, f\rangle$ with its symbolic factor as follows. Consider a finite open covering $\{\beta_0, \beta_1, \dots, \beta_k\}$ of $X$. Label each set $\beta_i$ with a symbol $\alpha_i$. For any orbit of initial condition $x \in X$, we build an infinite word $w_x$ on $\{\alpha_0, \alpha_1, \dots, \alpha_k\}$ such that for all $i \in \mathbb{N}$, $w_x(i) = \alpha_j$ if $f^i(x) \in \beta_j$ for some $j \in \{0, 1, \dots, k\}$. If $f^i(x)$ belongs to $\beta_j \cap \beta_h$ (for $j \ne h$), then arbitrarily choose either $\alpha_j$ or $\alpha_h$. Denote by $S_x$ the set of infinite words associated with the initial condition $x \in X$, and set $S = \bigcup_{x\in X} S_x$. The system $\langle S, \sigma\rangle$ is the symbolic system associated with $\langle X, f\rangle$; $\sigma$ is the shift map defined as $\forall x \in S, \forall i \in \mathbb{N},\ \sigma(x)_i = x_{i+1}$. When $\{\beta_0, \beta_1, \dots, \beta_k\}$ is a clopen partition, then $\langle S, \sigma\rangle$ is a factor of $\langle X, f\rangle$ (see [12] for instance).

Dynamical systems theory has made great strides in recent decades, and a huge quantity of new systems have appeared. As a consequence, scientists have tried to classify them according to different criteria. From interactions among physicists was born the idea that in order to simulate a certain physical phenomenon, one should use a dynamical system whose complexity is not higher than that of the phenomenon under study. The problem is the meaning of the word "complexity"; in general it has a different meaning for each scientist.
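As a concrete instance of the symbolic-factor construction, consider the doubling map f(x) = 2x mod 1 on X = [0, 1) with the two-set covering beta_0 = [0, 1/2), beta_1 = [1/2, 1); the itinerary w_x then reads off the binary expansion of x. The map and covering are illustrative choices of ours, not taken from the article:

```python
# Sketch: the symbolic word w_x for the doubling map f(x) = 2x mod 1,
# with covering beta_0 = [0, 1/2) -> symbol 0, beta_1 = [1/2, 1) -> symbol 1.
from fractions import Fraction

def itinerary(x, steps):
    """w_x(i) = alpha_j iff f^i(x) lies in beta_j (exact rational arithmetic)."""
    word = []
    for _ in range(steps):
        word.append(0 if x < Fraction(1, 2) else 1)
        x = (2 * x) % 1   # the doubling map
    return word

# The itinerary of x recovers the binary expansion of x:
print(itinerary(Fraction(5, 16), 6))  # 5/16 = 0.0101 in binary -> [0, 1, 0, 1, 0, 0]
```

Here the covering is already a partition up to the overlap points, so the symbolic word is unique for all but countably many initial conditions.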
From a computer-science point of view, if we look at the state space of factor systems, they can be seen as sets of bi-infinite words on a fixed alphabet. Hence, each factor $\langle S, \sigma\rangle$ can be associated with a language as follows. For each pair of finite words $u$, $w$, write $u \sqsubseteq w$ if $u$ is a factor of $w$. Given a finite word $u$ and a bi-infinite word $v$, with an abuse of notation we write $u \sqsubseteq v$ if $u$ occurs as a factor in $v$. The language $L(v)$ associated with a bi-infinite word $v \in A^{\mathbb{Z}}$ is defined as

$$L(v) = \{u \in A^* : u \sqsubseteq v\}\,.$$

Finally, the language $L(S)$ associated with the symbolic factor $\langle S, \sigma\rangle$ is given by

$$L(S) = \{u \in A^* : u \sqsubseteq v \text{ for some } v \in S\}\,.$$
The idea is that the complexity of the system $\langle X, f\rangle$ is proportional to the language complexity of its symbolic factor $\langle S, \sigma\rangle$ (see, for example, Topological Dynamics of Cellular Automata). In [4,5], Brudno proposes to evaluate the complexity of symbolic factors using Kolmogorov complexity. Indeed, the complexity of the orbit of initial condition $x \in X$ according to the finite open covering $\{\beta_0, \beta_1, \dots, \beta_k\}$ is defined as

$$K(x, f, \{\beta_0, \beta_1, \dots, \beta_k\}) = \limsup_{n\to\infty}\; \min_{w \in S_x} \frac{K(w_{0:n})}{n}\,.$$

Finally, the complexity of the orbit $x$ is given by

$$K(x, f) = \sup_\beta K(x, f, \beta)\,,$$

where the supremum is taken w.r.t. all possible finite open coverings $\beta$ of $X$. Brudno has proven the following result.

Theorem 5 ([4]) Consider a dynamical system $\langle X, f\rangle$ and an ergodic measure $\mu$. For almost all $x \in X$, $K(x, f) = H_\mu(f)$, where $H_\mu(f)$ is the measure entropy of $\langle X, f\rangle$ (for more on measure entropy see, for example, Ergodic Theory of Cellular Automata).

Cellular Automata

Cellular automata (CA) are (discrete-time) dynamical systems that have received increased attention in the last few decades as formal models of complex systems based on local interaction rules. This is mainly due to their great variety of distinct dynamical behaviors and to the possibility of easily performing large-scale computer simulations. In this section we quickly review the definitions and useful results that are necessary in the sequel.

Consider the set of configurations $C$, which consists of all functions from $\mathbb{Z}^D$ into $A$. The space $C$ is usually equipped with the Cantor metric $d_C$ defined as

$$\forall a, b \in C\,,\quad d_C(a,b) = 2^{-n}\,,\ \text{with}\ n = \min_{\vec{v}\in\mathbb{Z}^D} \bigl\{\|\vec{v}\|_\infty : a(\vec{v}) \ne b(\vec{v})\bigr\}\,, \tag{3}$$
where ‖v‖_∞ denotes the maximum of the absolute values of the components of v. The topology induced by d_C coincides with the product topology induced by the discrete topology on A. With this topology, C is a compact, perfect, and totally disconnected space. Let N = {u₁, …, u_s} be an ordered set of vectors of Z^D and δ : A^s → A be a function.

Definition 5 (CA) The D-dimensional CA based on the local rule δ and the neighborhood frame N is the pair ⟨C, f⟩, where f : C → C is the global transition rule defined as follows:

f(c)(v) = δ(c(v + u₁), …, c(v + u_s))  for all c ∈ C and v ∈ Z^D.  (4)
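As a concrete illustration of Eq. (4), the one-dimensional global map can be sketched on a cyclic finite configuration (the finite/cyclic setting and all names are our simplification, not part of the definition):

```python
# Sketch of the global rule f(c)(v) = delta(c(v+u_1), ..., c(v+u_s))
# on a cyclic finite configuration (a crude stand-in for A^Z).
def global_rule(delta, neighborhood, c):
    n = len(c)
    return [delta(tuple(c[(v + u) % n] for u in neighborhood))
            for v in range(n)]

# Example: elementary rule 110 with neighborhood frame N = (-1, 0, 1)
rule110 = {(1,1,1): 0, (1,1,0): 1, (1,0,1): 1, (1,0,0): 0,
           (0,1,1): 1, (0,1,0): 1, (0,0,1): 1, (0,0,0): 0}
c = [0, 0, 0, 1, 0, 0, 0, 0]
print(global_rule(lambda t: rule110[t], (-1, 0, 1), c))
# -> [0, 0, 1, 1, 0, 0, 0, 0]
```

Iterating `global_rule` yields successive rows of the space-time diagram discussed below.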
Note that the mapping f is (uniformly) continuous with respect to d_C. Hence, the pair ⟨C, f⟩ is a proper (discrete-time) dynamical system. In [7], a new metric on the phase space is introduced to better match the intuitive idea of chaotic behavior with its mathematical formulation. More results in this research direction can be found in [1,2,3,15]. This volume dedicates a whole chapter to the subject (Dynamics of Cellular Automata in Noncompact Spaces). Here we simply recall the definition of the Besicovitch distance, since it will be used in Subsect. "Example 2". Consider the Hamming distance between two words u, v over the same alphabet A, #(u, v) = |{i ∈ N : u_i ≠ v_i}|. This distance can be easily extended to factors of bi-infinite words as follows:

#_{h,k}(u, v) = |{i ∈ [h, k] : u_i ≠ v_i}|,

where h, k ∈ Z and h < k. Finally, the Besicovitch pseudo-distance between any pair of bi-infinite words u, v is defined as

d_B(u, v) = lim sup_{n→∞} #_{−n,n}(u, v) / (2n + 1).

The pseudo-distance d_B can be turned into a distance by passing to the quotient A^Z/≗, where ≗ is the relation of "being at null d_B distance". Roughly speaking, d_B measures the upper density of differences between two bi-infinite words. The space-time diagram Θ is a graphical representation of a CA orbit. Formally, for a D-dimensional CA f with state set A, Θ is the function from A^{Z^D} × N × Z^D to A defined as Θ(x, i, j) = f^i(x)(j), for all x ∈ A^{Z^D}, i ∈ N, and j ∈ Z^D. The limit set Ω_f captures the long-term behavior of a CA on a given set U and is defined as follows:

Ω_f(U) = ∩_{n∈N} f^n(U).
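Both d_C and d_B can be estimated on finite windows; an illustrative Python sketch (the window-based approximation and all names are ours):

```python
# Sketch: window approximations of the Cantor distance and of the
# Besicovitch pseudo-distance, for configurations given as functions Z -> A.
def cantor_distance(a, b, m):
    """2^(-n) with n the least |v| <= m where a and b differ."""
    diffs = [abs(v) for v in range(-m, m + 1) if a(v) != b(v)]
    return 2.0 ** (-min(diffs)) if diffs else 0.0

def besicovitch_estimate(u, v, n):
    """Finite-n value of #_{-n,n}(u, v) / (2n + 1); d_B is its limsup."""
    return sum(1 for i in range(-n, n + 1) if u(i) != v(i)) / (2 * n + 1)

zero = lambda i: 0
alt = lambda i: i % 2          # differs from `zero` on every odd position
print(cantor_distance(zero, alt, 100))        # 0.5 (first difference at |v| = 1)
print(besicovitch_estimate(zero, alt, 1000))  # upper density close to 1/2
```

The two notions disagree sharply here: the two configurations are far apart in the Cantor sense but their differences have density 1/2, which is what d_B measures.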
Unfortunately, any nontrivial property of the CA limit set is undecidable [11]. Other interesting information on a CA's long-term behavior is given by the orbit limit set, defined as follows:

Λ_f(U) = ∪_{u∈U} (O_f(u))′,
where H′ is the set of adherence points of H and O_f(u) denotes the orbit {f^n(u) : n ∈ N} of u. Note that in general the limit set and the orbit limit set give different information. For example, consider a rich configuration c, i.e., a configuration containing all possible finite patterns (here we adopt the terminology of [6]), and the shift map σ. Then Λ_σ({c}) = A^Z, while Ω_σ({c}) is countable.

Algorithmic Complexity as a Demonstration Tool

The incompressibility method is a valuable formal tool for decreasing the combinatorial complexity of problems. It is essentially based on the following ideas (see also Chap. 6 in [13]):

- Incompressible words cannot be produced by short programs. Hence, an incompressible (infinite) word cannot be algorithmically obtained.
- Most words are incompressible. Hence, a word taken at random can usually be considered incompressible without loss of generality.
- If one "proves" a recursive property of incompressible words, then, by Proposition 3, we have a contradiction.
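The second idea rests on a counting argument that is easy to make quantitative; a small sketch (ours):

```python
# Counting argument: there are 2^n binary words of length n, but only
# 2^(n-c) - 1 programs of length < n - c, so fewer than a 2^(-c) fraction
# of the words of length n can be compressed by c or more bits.
def compressible_fraction_bound(n, c):
    return (2 ** (n - c) - 1) / 2 ** n

for c in (1, 5, 10):
    print(f"compressible by {c} bits: fraction < "
          f"{compressible_fraction_bound(30, c):.6f}")
```

So a word "taken at random" is incompressible with overwhelming probability, which is what licenses the second idea above.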
Application to Cellular Automata

A portion of a space-time diagram of a CA can be computed from its local rule and the initial condition. Consequently, the part of the diagram that depends only on a given portion of the initial configuration has at most the complexity of that portion (Fig. 1). This fact often implies strong computational dependencies between the initial part and the final part of a portion of the diagram: if the final part has high complexity, then the initial part must be at least as complex. Using this basic idea, proofs are structured as follows: assume that the final part has high complexity; use the hypothesis to prove that the initial part is not complex; derive a contradiction. This technique provides a faster and clearer way to prove results that could otherwise be obtained only by technical combinatorial arguments. The first example illustrates this fact by rewriting a combinatorial proof in terms of Kolmogorov complexity. The second one is a result that was directly stated in terms of Kolmogorov complexity.

Example 1 Consider the following result about languages recognizable by CA.

Proposition 4 ([17]) The language L = {uvu : u, v ∈ {0,1}*, |u| > 1} is not recognizable by a real-time one-way CA.
Algorithmic Complexity and Cellular Automata, Figure 1 States in the gray zone can be computed from the states of the black line: K(gray zone) ≤ K(black line) up to an additive constant
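The dependency of Fig. 1 can be made concrete: with radius r, a cell after t steps depends only on the 2rt + 1 initial cells below it, so a finite initial segment determines a shrinking triangle of the space-time diagram. An illustrative sketch (the rule and all names are ours):

```python
# Sketch: the portion of a space-time diagram determined by a finite
# initial segment alone (each step loses r cells on each side).
def triangle(delta, segment, r=1):
    rows = [list(segment)]
    while len(rows[-1]) > 2 * r:
        prev = rows[-1]
        rows.append([delta(tuple(prev[i - r:i + r + 1]))
                     for i in range(r, len(prev) - r)])
    return rows

majority = lambda t: 1 if sum(t) >= 2 else 0   # a toy radius-1 local rule
for row in triangle(majority, [1, 1, 0, 1, 0]):
    print(row)
# [1, 1, 0, 1, 0] / [1, 1, 0] / [1]
```

A program for the whole triangle needs only the first row plus the (constant-size) rule, which is exactly the inequality in the caption of Fig. 1.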
Before giving the proof in terms of Kolmogorov complexity, we must recall some concepts. A language L is accepted in real time by a CA if every word x ∈ L is accepted after |x| transitions. An input word is accepted by a CA if a distinguished cell enters an accepting state. A CA is one-way if its neighborhood is {0, 1} (i.e., the central cell and the one on its right).

Proof One-way CA have some interesting properties. First, from a one-way CA recognizing L, a computer can build a one-way CA recognizing L_Σ = {uvu : u, v ∈ Σ*, |u| > 1}, where Σ is any finite alphabet. The idea is to code the ith letter of Σ by 1u₀1u₁⋯1u_k00, where u₀u₁⋯u_k is i written in binary, and to apply the former CA. Second, for any one-way CA of global rule f we have that f^{|w|}(xwy) = f^{|w|}(xw) f^{|w|}(wy) for all words x, y, and w. Now, assume that a one-way CA recognizes L in real time. Then there is an algorithm F that, for any integer n, computes a local rule for a one-way CA that recognizes L_{{0,…,n−1}} in real time. Fix an integer n and choose a 0-incompressible word w of length n(n − 1). Let A be the set of pairs (x, y) with x, y ∈ {0, …, n − 1} and x ≠ y defined by

A = {(x, y) : the (nx + y)th bit of w is 1}.

Since A can be computed from w and vice versa, one finds that K(A) = K(w) = |w| = n(n − 1) up to an additive constant that does not depend on n. Order the set A = {(x₀, y₀), (x₁, y₁), …} and build a new word u as follows:

u₀ = y₀x₀,  u_{i+1} = u_i y_{i+1} x_{i+1} u_i,  u = u_{|A|−1}.
From Lemma 1 in [17], one finds that for all x, y ∈ {0, …, n − 1}, (x, y) ∈ A ⟺ xuy ∈ L. Let δ be the local rule produced by F(n), Q the set of final states, and f the associated global rule. Since

f^{|u|}(xuy) = f^{|u|}(xu) f^{|u|}(uy),

one has that

(x, y) ∈ A ⟺ xuy ∈ L ⟺ δ(f^{|u|}(xu), f^{|u|}(uy)) ∈ Q.

Hence, from the knowledge of each f^{|u|}(xu) and f^{|u|}(uy) for x, y ∈ {0, …, n − 1} — a list of 2n integers in {0, …, n − 1} — one can compute A. We conclude that K(A) ≤ 2n log₂(n) up to an additive constant that does not depend on n. This contradicts K(A) = n(n − 1) for large enough n. □

Example 2 Notation: given x ∈ A^{Z^D}, we denote by x_{→n} the word w ∈ S^{(2n+1)^D} obtained by taking all the states x_i for i ∈ [−n, n]^D in matrix order. The next example shows a proof that directly uses the incompressibility method.

Theorem 6 In the Besicovitch topological space there is no transitive CA.

Recall that a CA f is transitive if for all nonempty open sets A, B there exists n ∈ N such that f^n(A) ∩ B ≠ ∅.

Proof By contradiction, assume that there exists a transitive CA f of radius r with C = |S| states. Let x and y be two configurations such that

∀n ∈ N,  K(x_{→n} | y_{→n}) ≥ n/2.

A simple counting argument proves that such configurations x and y always exist. Since f is transitive, there are two configurations x′ and y′ such that for all n ∈ N

#(x_{→n}, x′_{→n}) ≤ 4εn,   #(y_{→n}, y′_{→n}) ≤ 4δn,  (5)

and an integer u (which only depends on ε and δ) such that

f^u(y′) = x′,  (6)

where ε = δ = (4e^{10} log₂ C)^{−1}. In what follows only n varies, while C, u, x, y, x′, y′, δ, and ε are fixed and independent of n. By Eq. (6), one may compute the word x′_{→n} from the following items:

- y′_{→n}, f, u, and n;
- the two words of y′ of length ur that surround y′_{→n} and that are missing to compute x′_{→n} with Eq. (6).

We obtain that

K(x′_{→n} | y′_{→n}) ≤ 2ur + K(u) + K(n) + K(f) + O(1) ≤ o(n)  (7)

(the notations O and o are defined with respect to n). Note that u is fixed, and hence K(u) is a constant, while K(n) is bounded by log₂ n. Similarly, r and S are fixed, and hence K(f) is a constant bounded by C^{2r+1} log₂ C + O(1).

Let us evaluate K(y′_{→n} | y_{→n}). Let a₁, a₂, a₃, …, a_k be the positive positions at which y_{→n} and y′_{→n} differ, sorted in increasing order. Let b₁ = a₁ and b_i = a_i − a_{i−1} for 2 ≤ i ≤ k. By Eq. (5) we know that k ≤ 4δn. Note that Σ_{i=1}^{k} b_i = a_k ≤ n. Symmetrically, let a′₁, a′₂, a′₃, …, a′_{k′} be the absolute values of the strictly negative positions at which y_{→n} and y′_{→n} differ, sorted in increasing order. Let b′₁ = a′₁ and b′_i = a′_i − a′_{i−1} for 2 ≤ i ≤ k′. Equation (5) states that k′ ≤ 4δn. Since the logarithm is a concave function, one has

(Σ ln b_i)/k ≤ ln(Σ b_i / k) ≤ ln(n/k),

and hence

Σ ln b_i ≤ k ln(n/k),  (8)

which also holds for the b′_i and k′.

Knowledge of the b_i, the b′_i, and the k + k′ states of the cells where y_{→n} differs from y′_{→n} is enough to compute y′_{→n} from y_{→n}. Hence,

K(y′_{→n} | y_{→n}) ≤ Σ ln(b_i) + Σ ln(b′_i) + (k + k′) log₂ C + O(1).

Equation (8) then gives

K(y′_{→n} | y_{→n}) ≤ k ln(n/k) + k′ ln(n/k′) + (k + k′) log₂ C + O(1).

The function k ↦ k ln(n/k) is increasing on [0, n/e]. As k ≤ 4δn = n/(e^{10} log₂ C) ≤ n/e, we have that

k ln(n/k) ≤ 4δn ln(n/(4δn)) ≤ 10n/(e^{10} log₂ C)

and that

(k + k′) log₂ C ≤ 2n log₂ C/(e^{10} log₂ C).

Replacing a, b, and k by a′, b′, and k′, the same sequence of inequalities leads to a similar result. One deduces that

K(y′_{→n} | y_{→n}) ≤ (2 log₂ C + 20)n/(e^{10} log₂ C) + O(1).  (9)

Similarly, Eq. (9) also holds for K(x_{→n} | x′_{→n}). The triangle inequality for Kolmogorov complexity, K(a|b) ≤ K(a|c) + K(c|b) + O(1) (a consequence of Theorems 3 and 4), gives

K(x_{→n} | y_{→n}) ≤ K(x_{→n} | x′_{→n}) + K(x′_{→n} | y′_{→n}) + K(y′_{→n} | y_{→n}) + O(1).

Equations (9) and (7) allow one to conclude that

K(x_{→n} | y_{→n}) ≤ 2(2 log₂ C + 20)n/(e^{10} log₂ C) + o(n).

The hypothesis on x and y was K(x_{→n} | y_{→n}) ≥ n/2. This implies that

n/2 ≤ 2(2 log₂ C + 20)n/(e^{10} log₂ C) + o(n).

The last inequality is false for big enough n. □
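The splitting identity for one-way CA used in Example 1, f^{|w|}(xwy) = f^{|w|}(xw) f^{|w|}(wy), follows from the fact that with neighborhood {0, 1} cell i after t steps depends only on cells i, …, i + t. It can be checked numerically on finite words; a sketch (ours, for an arbitrary random local rule):

```python
import random

# One step of a one-way CA (neighborhood {0, 1}) on a finite word:
# the word shrinks by one cell per step.
def step(delta, word):
    return [delta(word[i], word[i + 1]) for i in range(len(word) - 1)]

def iterate(delta, word, t):
    for _ in range(t):
        word = step(delta, word)
    return word

random.seed(0)
table = {(a, b): random.randrange(2) for a in (0, 1) for b in (0, 1)}
delta = lambda a, b: table[(a, b)]

x = [1, 0, 1]; w = [0, 1, 1, 0]; y = [1, 1]
lhs = iterate(delta, x + w + y, len(w))
rhs = iterate(delta, x + w, len(w)) + iterate(delta, w + y, len(w))
print(lhs == rhs)  # True, for any local rule delta
```

After |w| steps the first |x| cells depend only on xw and the remaining |y| cells only on wy, which is exactly the identity.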
Measuring CA Structural Complexity

Another use of Kolmogorov complexity in the study of CA is to understand the maximum complexity they can produce, by exhibiting examples of CA that show high-complexity characteristics. The question is what "showing high-complexity characteristics" should mean and, more precisely, which characteristics to consider. This section is devoted to structural characteristics of CA, that is to say, complexity that can be observed through static features.

The Case of Tilings

In this section we present the original example, which was given for tilings, often considered a static version of CA. In [10], Durand et al. construct a tile set whose tilings all have maximal complexity. This paper contains two main results. The first one is an upper bound on the complexity of tilings, and the second one is an example of a tiling that reaches this bound. First we recall the definitions about tilings of the plane by Wang tiles.

Definition 6 (Tilings with Wang tiles) Let C be a finite set of colors. A Wang tile is a quadruple (n, s, e, w) of four colors from C corresponding to a square tile whose top color is n, left color is w, right color is e, and bottom color
is s. A Wang tile cannot be rotated, but arbitrarily many copies of it may be used. Given a set of Wang tiles T, we say that the plane can be tiled by T if one can place tiles from T on the square grid Z² such that adjacent borders of neighboring tiles have the same color. A set of tiles T that can tile the plane is called a palette.

The notion of local constraint gives a point of view closer to CA than tilings. Roughly speaking, it specifies the local constraints that a tiling of the plane by 0s and 1s must satisfy. Note that this notion can be defined over any alphabet, but we can equivalently code any letter with 0s and 1s.

Definition 7 (Tilings by local constraints) Let r be a positive integer called the radius. Let C be a set of square patterns of size 2r + 1 made of 0s and 1s (formally, functions from [−r, r]² to {0, 1}). The set C is said to tile the plane if there is a way to put zeros and ones on the 2D grid (formally, a function from Z² to {0, 1}) all of whose patterns of size 2r + 1 are in C. The possible layouts of zeros and ones on the (2D) grid are called the tilings acceptable for C.

Seminal papers on this subject used Wang tiles. We translate these results in terms of local constraints in order to apply them more smoothly to CA. Theorem 7 proves that, among the tilings acceptable for a local constraint, there is always one that is not too complex. Note that such a bound can only concern some acceptable tiling, not all of them: the radius-1 constraint that allows all possible patterns accepts every tiling, including ones of maximal complexity.

Theorem 7 ([10]) Let C be a local constraint. There is a tiling acceptable for C such that the Kolmogorov complexity of its central pattern of size n is O(n) (recall that the maximal complexity of an n × n square pattern is O(n²)).
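The adjacency condition of Definition 6 is straightforward to check on finite patches; an illustrative sketch (the representation and names are ours):

```python
# Sketch: check that a finite placement of Wang tiles (n, s, e, w)
# on a grid is valid -- adjacent edges must share a color.
def valid(grid):
    """grid: 2D list of tiles, each a tuple (north, south, east, west)."""
    rows, cols = len(grid), len(grid[0])
    for i in range(rows):
        for j in range(cols):
            n, s, e, w = grid[i][j]
            if j + 1 < cols and e != grid[i][j + 1][3]:  # east vs west
                return False
            if i + 1 < rows and s != grid[i + 1][j][0]:  # south vs north
                return False
    return True

# a 1x2 strip: the east color of the left tile matches the west of the right
print(valid([[(0, 0, 1, 0), (0, 0, 0, 1)]]))  # True
```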
Proof The idea is simple: if one knows the bits present in a border of width r of a pattern of size n, there are finitely many ways to fill the interior, so the first filling in any computable order (for instance, lexicographically, reading all horizontal lines one after the other) has at most the complexity of the border, since an algorithm can enumerate all possible fillings and take the first one in the chosen order. Then, if one knows the bits in the borders of width r of all central square patterns of size 2^n for all positive integers n (Fig. 2), one can recursively compute for each gap the first possible filling in the chosen computable order. The tiling obtained in this way (this actually defines a tiling, since all cells are eventually assigned a value) has the required complexity: in order to compute the central pattern of size n,
Algorithmic Complexity and Cellular Automata, Figure 2 Nested squares of size 2^k and border width r
the algorithm simply needs all the borders of size 2^k for 2^k ≤ 2n, whose total length is O(n). □

The next result proves that this bound is almost reached.

Theorem 8 Let r be any computable, monotone, and unbounded function. There exists a local constraint C_r such that for all tilings acceptable for C_r, the complexity of the central square pattern of size n is O(n/r(n)).

The original statement does not have the O part. However, note that the simulation of Wang tiles by a local constraint uses a square of size ℓ = ⌈√(log k)⌉ to simulate a single tile from a set of k Wang tiles; a square pattern of size n in the tiling corresponds to a square pattern of size nℓ in the original tiling. The function r can grow very slowly (for instance, like the inverse of the Ackermann function) provided it grows monotonically to infinity and is computable.

Proof The proof is rather technical, and we only give a sketch. The basic idea consists in taking a tiling constructed by Robinson [16] in order to prove that it is undecidable whether a local constraint can tile the plane. This tiling is self-similar and is represented in Fig. 3. As it contains increasingly larger squares that occur periodically (note that the whole tiling is not periodic but only quasi-periodic), one can perform more and more computation steps within these squares (the periodicity is required to ensure that squares are present).
Algorithmic Complexity and Cellular Automata, Figure 3 Robinson’s tiling
Using this tiling, Robinson was able to build a local constraint that simulates any given Turing machine. Note that in the present case the situation is trickier than it seems, since some technical features must be ensured, like the fact that a square must deal with the smaller squares inside it, or the additional constraint needed to make sure that smaller squares
have the same input as the bigger one. Using the constraint to forbid the occurrence of any final state, one gets that the compatible tilings only simulate computations on inputs for which the machine does not halt. To finish the proof, Durand et al. build a local constraint C that simulates a special Turing machine that halts exactly on inputs whose complexity is small. Such a Turing machine exists since, although Kolmogorov complexity is not computable, by running all programs from ε to 1^n a Turing machine can enumerate all words whose complexity is below n and halt if it finds its input among them. Then all tilings compatible with C contain in each square an input on which the Turing machine does not halt, which is therefore an input of high complexity. The function r occurs in the technical arguments since the computation zones do not grow as fast as the side length of the squares. □

The Case of Cellular Automata

As we have seen so far, one of the results of [10] is that a tile set always produces tilings whose central square patterns of size n have a Kolmogorov complexity of O(n), and not n² (which is the maximal complexity). In the case of CA, something similar holds for space-time diagrams. Indeed, if one knows the initial row, one can compute the triangular part of the space-time diagram that depends on it (see Fig. 1). Then, as with tilings, the complexity of an n × n square is the same as the complexity of its first line, i.e., O(n). However, unlike tilings, in CA there is no restriction on the initial configuration; hence every CA has simple space-time diagrams (those arising from simple initial configurations), and in this respect Kolmogorov complexity is not of great help. One idea to refine the analysis is to study how the complexity of configurations evolves under the application of the global rule. This aspect is particularly interesting with respect to dynamical properties and is the subject of the next section.

Consider a CA f with radius r and local rule δ. Its orbit limit set cannot be empty. Indeed, let a be a state of f and consider the configuration ω a ω.
Let s_a = δ(a, …, a). Then f(ω a ω) = ω s_a ω. Consider now the graph whose vertices are the states of the CA and whose edges are the pairs (a, s_a). Since each vertex has an outgoing edge (actually exactly one), the graph must contain a cycle a₀ → a₁ → ⋯ → a_k → a₀. Then each of the configurations ω a_i ω for 0 ≤ i ≤ k is in the orbit limit set, since f^{k+1}(ω a_i ω) = ω a_i ω. This simple fact proves that any orbit limit set (and any limit set) of a CA must contain at least one monochromatic configuration, whose complexity is low (for any reasonable definition). However, one can build a CA whose orbit limit set contains only complex configurations, except
for the mandatory monochromatic one, using the local-constraints technique discussed in the previous section.

Proposition 5 There exists a CA whose orbit limit set contains only complex configurations, except for one monochromatic configuration.

Proof Let r be any computable, monotone, and unbounded function and C_r the associated local constraint. Let A be the alphabet over which C_r is defined. Consider the 2D CA f_r on the alphabet A ∪ {#} (we assume # ∉ A), with the same radius r as C_r. The local rule δ of f_r is defined as follows:

δ(P) = P₀ if P ∈ C_r, and δ(P) = # otherwise,

where P₀ denotes the center of the pattern P. Using this local rule, one can verify the following fact: if the configuration c is not acceptable for C_r, then (O(c))′ = {ω # ω}; otherwise (O(c))′ = {c}. Indeed, if c is acceptable for C_r, then f(c) = c; otherwise there is a position i such that the pattern of c centered at i is not valid for C_r. Then f(c)(i) = #. By simple induction, all cells at distance less than kr from position i become # after k steps. Hence, for all n > 0, after k ≥ (n + 2|i|)/r steps the Cantor distance between f^k(c) and ω # ω is less than 2^{−n}, i.e., O(c) tends to ω # ω. □

Measuring the Complexity of CA Dynamics

The results of the previous section have limited range, since they say something about the quasi-complexity but nothing about the plain complexity of the limit set. To go further we need to introduce some general concepts, namely randomness spaces.

Randomness Spaces

Roughly speaking, a randomness space is a structure made of a topological space and a measure that together make it possible to define which points of the space are random. More formally, we give the following.

Definition 8 ([6]) A randomness space is a structure ⟨X, B, μ⟩, where X is a topological space, B : N → 2^X a total numbering of a subbase of X, and μ a Borel measure.

Given a numbering B of a subbase of open sets of X, one can produce a numbering B′ of a base as follows:

B′(i) = ∩_{j ∈ D(i+1)} B(j),
where D : N → {E : E ⊆ N and E is finite} is the bijection defined by D^{−1}(E) = Σ_{i∈E} 2^i. B′ is called the base derived from the subbase B. Given two sequences of open sets (V_n)
and (U_n) of X, we say that (V_n) is U-computable if there exists a recursively enumerable set H ⊆ N such that

∀n ∈ N,  V_n = ∪_{i ∈ N, ⟨n,i⟩ ∈ H} U_i,
where ⟨i, j⟩ = (i + j)(i + j + 1)/2 + j is the classical bijection between N² and N. Note that this bijection can be extended to N^D (for D > 1) as follows: ⟨x₁, x₂, …, x_k⟩ = ⟨x₁, ⟨x₂, …, x_k⟩⟩.

Definition 9 ([6]) Given a randomness space ⟨X, B, μ⟩, a randomness test on X is a B′-computable sequence (U_n) of open sets such that ∀n ∈ N, μ(U_n) ≤ 2^{−n}. Given a randomness test (U_n), a point x ∈ X is said to fail the test (U_n) if x ∈ ∩_{n∈N} U_n.

In other words, tests select points belonging to sets of null measure. The computability of the tests ensures the effectiveness of the selected null-measure sets.

Definition 10 ([6]) Given a randomness space ⟨X, B, μ⟩, a point x ∈ X is nonrandom if x fails some randomness test. The point x ∈ X is random if it is not nonrandom.

Finally, note that for any D ≥ 1, ⟨A^{Z^D}, B, μ⟩ is a randomness space when setting

B(j + |A|·⟨i₁, …, i_D⟩) = {c ∈ A^{Z^D} : c_{i₁,…,i_D} = a_j},
where A = {a₁, …, a_{|A|}} and μ is the classical product measure built from the uniform Bernoulli measure over A.

Theorem 9 ([6]) Consider a D-dimensional CA f. Then the following statements are equivalent:

1. f is surjective;
2. for all c ∈ A^{Z^D}, if c is rich (i.e., c contains all possible finite patterns), then f(c) is rich;
3. for all c ∈ A^{Z^D}, if c is random, then f(c) is random.

Theorem 10 ([6]) Consider a D-dimensional CA f. Then, for all c ∈ A^{Z^D}, if c is not rich, then f(c) is not rich.

Theorem 11 ([6]) Consider a 1D CA f. Then, for all c ∈ A^Z, if c is nonrandom, then f(c) is nonrandom.
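The pairing bijection used above in the definition of U-computable sequences is easy to implement and invert; an illustrative sketch (ours):

```python
# The classical Cantor pairing bijection <i, j> = (i + j)(i + j + 1)/2 + j
# between N^2 and N, together with its inverse.
def pair(i, j):
    return (i + j) * (i + j + 1) // 2 + j

def unpair(n):
    # find the diagonal t = i + j containing n, then recover j and i
    t = 0
    while (t + 1) * (t + 2) // 2 <= n:
        t += 1
    j = n - t * (t + 1) // 2
    return t - j, j

print(pair(0, 0), pair(1, 0), pair(0, 1))  # 0 1 2
assert all(unpair(pair(i, j)) == (i, j) for i in range(30) for j in range(30))
```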
Note that the result in Theorem 11 is proved only for 1D CA; its generalization to higher dimensions is still an open problem.

Open problem 1 Do D-dimensional CA for D > 1 preserve nonrandomness?

From Theorem 9 we know that the property of preserving randomness (resp. richness) is related to surjectivity.
Hence, randomness (resp. richness) preservation is decidable in one dimension and undecidable in higher dimensions. The opposite questions are still open.

Open problem 2 Is nonrichness (resp. nonrandomness) preservation a decidable property?

Algorithmic Distance

In this section we review an approach to the study of CA dynamics, from the point of view of algorithmic complexity, that is completely different from the one reported in the previous section. For more details on the algorithmic distance see Chaotic Behavior of Cellular Automata. In this new approach we define a new distance using Kolmogorov complexity, in such a way that two points x and y are near if it is "easy" to transform x into y and vice versa by a computer program. In this setting, if a CA turned out to be sensitive to initial conditions, it would mean that it is able to create new information. We will see that this is not the case.

Definition 11 The algorithmic distance between x ∈ A^{Z^D} and y ∈ A^{Z^D} is defined as follows:

d_a(x, y) = lim sup_{n→∞} (K(x_{→n} | y_{→n}) + K(y_{→n} | x_{→n})) / (2(2n + 1)^D).
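Although K is uncomputable, any compressor gives an upper bound on it, so the flavor of d_a can be illustrated with a compression-based approximation in the spirit of the normalized compression distance (this is our illustration, not the article's construction):

```python
import zlib

# Crude approximation of conditional complexity: K(x|y) is bounded, in
# spirit, by how much extra compressed size x adds once y is known.
def approx_cond_K(x, y):
    return max(len(zlib.compress(y + x)) - len(zlib.compress(y)), 0)

def approx_da(x, y):
    # finite-n analogue of Definition 11 in one dimension
    n = len(x)
    return (approx_cond_K(x, y) + approx_cond_K(y, x)) / (2 * n)

x = bytes([0, 1] * 500)   # periodic, low complexity
y = bytes([0, 1] * 500)
print(approx_da(x, y))    # near 0: the two carry the same information
```

As with the true d_a, the value is small whenever one sequence is easily described given the other.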
It is not difficult to see that d_a is only a pseudo-distance, since many points are at null distance (those that differ only on a finite number of cells, for example). Consider the relation ≅ of "being at null d_a distance", i.e., for x, y ∈ A^{Z^D}, x ≅ y if and only if d_a(x, y) = 0. Then ⟨A^{Z^D}/≅, d_a⟩ is a metric space. Note that the definition of d_a does not depend on the chosen additively optimal universal description mode, since the additive constant disappears when dividing by 2(2n + 1)^D and taking the superior limit. Moreover, by Theorem 2, the distance is bounded by 1. The following results summarize the main properties of this new metric space.

Theorem 12 (Chaotic Behavior of Cellular Automata) The metric space ⟨A^{Z^D}/≅, d_a⟩ is perfect, pathwise connected, infinite dimensional, nonseparable, and noncompact.

Theorem 12 says that the new topological space has enough interesting properties to make the study of CA dynamics on it worthwhile. The first interesting result obtained by this approach concerns surjective CA. Recall that surjectivity plays a central role in the study of chaotic behavior, since it is a necessary condition for many other properties used to define deterministic chaos, like expansivity, transitivity, ergodicity, and so on (see Topological Dynamics of Cellular Automata and [8] for more on this subject). The following result proves that in the new topology A^{Z^D}/≅ the situation is completely different.

Proposition 6 (Chaotic Behavior of Cellular Automata) If f is a surjective CA, then d_a(x, f(x)) = 0 for any x ∈ A^{Z^D}/≅. In other words, every CA behaves like the identity on A^{Z^D}/≅.

Proof In order to compute f(x)_{→n} from x_{→n}, one need only know the states in the shell of width r around the central hypercube and the index of f in the set of CA with radius r, state set S, and dimension d; therefore

K(f(x)_{→n} | x_{→n}) ≤ 2dr(2n + 2r + 1)^{d−1} log₂ |S| + K(f) + 2 log₂ K(f) + c,

and similarly

K(x_{→n} | f(x)_{→n}) ≤ 2dr(2n + 1)^{d−1} log₂ |S| + K(f) + 2 log₂ K(f) + c.

Dividing by 2(2n + 1)^d and taking the superior limit, one finds that d_a(x, f(x)) = 0. □

The result of Proposition 6 means that surjective CA can neither create new information nor destroy it; hence they have a high degree of stability from an algorithmic point of view. This contrasts with what happens in the Cantor topology. We conclude that the classical notion of deterministic chaos is orthogonal to "algorithmic chaos", at least as far as CA are concerned.

Proposition 7 (Chaotic Behavior of Cellular Automata) Consider a CA f that is neither surjective nor constant. Then there exist two configurations x, y ∈ A^{Z^D}/≅ such that d_a(x, y) = 0 but d_a(f(x), f(y)) ≠ 0.

In other words, Proposition 7 says that nonsurjective, nonconstant CA are not compatible with ≅ and hence are not continuous on the quotient. This means that, for such a pair of configurations x ≅ y, this kind of CA either completely destroys the information content of x and y or preserves it in one, say x, and destroys it in y.
However, the following result says that a weak form of continuity still persists (see Topological Dynamics of Cellular Automata and [8] for the definitions of equicontinuity point and sensitivity to initial conditions).

Proposition 8 (Chaotic Behavior of Cellular Automata) Consider a CA f, and let ω a ω be the configuration with all cells in state a. Then ω a ω is both a fixed point and an equicontinuity point for f.
Even if CA are not continuous on A^{Z^D}/≅, one can still wonder what happens with respect to the usual properties used to define deterministic chaos. For instance, by Proposition 8, it is clear that no CA is sensitive to initial conditions. The following question is still open.
Open problem 3 Is ω a ω the only equicontinuity point for CA on A^{Z^D}/≅?

Future Directions

In this paper we have illustrated how algorithmic complexity can help in the study of CA dynamics. We essentially used it as a powerful tool to decrease the combinatorial complexity of problems. These kinds of applications are only at their beginning, and many more are expected in the future. For example, in view of the results of Subsect. "Example 1", we wonder if Kolmogorov complexity can help in proving the famous conjecture that languages recognizable in real time by CA form a strict subclass of linear-time recognizable languages (see [9,14]). Another, completely different, development would consist in finding how and whether Theorem 11 extends to higher dimensions. How this property can be restated in the context of the algorithmic distance is also of great interest. Finally, how to extend the results obtained for CA to other dynamical systems is a research direction that must be explored. We are rather confident that this can shed new light on the complexity behavior of such systems.

Acknowledgments

This work has been partially supported by the ANR Blanc Project "Sycomore".

Bibliography

Primary Literature
1. Blanchard F, Formenti E, Kurka P (1999) Cellular automata in the Cantor, Besicovitch and Weyl topological spaces. Complex Syst 11:107–123
2. Blanchard F, Cervelle J, Formenti E (2003) Periodicity and transitivity for cellular automata in Besicovitch topologies. In: Rovan B, Vojtas P (eds) MFCS 2003. LNCS, vol 2747. Springer, Bratislava, pp 228–238
3. Blanchard F, Cervelle J, Formenti E (2005) Some results about chaotic behavior of cellular automata. Theor Comput Sci 349(3):318–336
4. Brudno AA (1978) The complexity of the trajectories of a dynamical system. Russ Math Surv 33(1):197–198
5. Brudno AA (1983) Entropy and the complexity of the trajectories of a dynamical system. Trans Moscow Math Soc 44:127
6. Calude CS, Hertling P, Jürgensen H, Weihrauch K (2001)
Randomness on full shift spaces. Chaos Solitons Fractals 12(3):491–503
7. Cattaneo G, Formenti E, Margara L, Mazoyer J (1997) A shift-invariant metric on S^Z inducing a nontrivial topology. In: Privara I, Ružička P (eds) MFCS'97. LNCS, vol 1295. Springer, Bratislava, pp 179–188
8. Cervelle J, Durand B, Formenti E (2001) Algorithmic information theory and cellular automata dynamics. In: Mathematical Foundations of Computer Science (MFCS'01). Lecture Notes in Computer Science, vol 2136. Springer, Berlin, pp 248–259
9. Delorme M, Mazoyer J (1999) Cellular automata as language recognizers. In: Cellular automata: A parallel model. Kluwer, Dordrecht
10. Durand B, Levin L, Shen A (2001) Complex tilings. In: STOC '01: Proceedings of the 33rd annual ACM symposium on theory of computing, pp 732–739
11. Kari J (1994) Rice's theorem for the limit set of cellular automata. Theor Comput Sci 127(2):229–254
12. Kurka P (1997) Languages, equicontinuity and attractors in cellular automata. Ergod Theory Dyn Syst 17:417–433
13. Li M, Vitányi P (1997) An introduction to Kolmogorov complexity and its applications, 2nd edn. Springer, Berlin
14. Delorme M, Formenti E, Mazoyer J (2000) Open problems. Research Report LIP 2000-25, École Normale Supérieure de Lyon
15. Pivato M (2005) Cellular automata vs. quasi-sturmian systems. Ergod Theory Dyn Syst 25(5):1583–1632
16. Robinson RM (1971) Undecidability and nonperiodicity for tilings of the plane. Invent Math 12(3):177–209
17. Terrier V (1996) Language not recognizable in real time by one-way cellular automata. Theor Comput Sci 156:283–287
18. Wolfram S (2002) A new kind of science. Wolfram Media, Champaign. http://www.wolframscience.com/
Books and Reviews
Batterman RW, White HS (1996) Chaos and algorithmic complexity. Found Phys 26(3):307–336
Bennett CH, Gács P, Li M, Vitányi P, Zurek W (1998) Information distance. IEEE Trans Inf Theory 44(4):1407–1423
Calude CS (2002) Information and randomness. Texts in theoretical computer science, 2nd edn. Springer, Berlin
Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York
White HS (1993) Algorithmic complexity of points in dynamical systems. Ergod Theory Dyn Syst 13:807–830
Amorphous Computing
Amorphous Computing HAL ABELSON, JACOB BEAL, GERALD JAY SUSSMAN Computer Science and Artiﬁcial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA Article Outline Glossary Deﬁnition of the Subject Introduction The Amorphous Computing Model Programming Amorphous Systems Amorphous Computing Paradigms Primitives for Amorphous Computing Means of Combination and Abstraction Supporting Infrastructure and Services Lessons for Engineering Future Directions Bibliography Glossary Amorphous computer A collection of computational particles dispersed irregularly on a surface or throughout a volume, where individual particles have no a priori knowledge of their positions or orientations. Computational particle A (possibly faulty) individual device for an amorphous computer. Each particle has modest computing power and a modest amount of memory. The particles are not synchronized, although they are all capable of operating at similar speeds, since they are fabricated by the same process. All particles are programmed identically, although each particle has means for storing local state and for generating random numbers. Field A function assigning a value to every particle in an amorphous computer. Gradient A basic amorphous computing primitive that estimates the distance from each particle to the nearest particle designated as a source of the gradient. Definition of the Subject The goal of amorphous computing is to identify organizational principles and create programming technologies for obtaining intentional, prespeciﬁed behavior from the cooperation of myriad unreliable parts that are arranged in unknown, irregular, and timevarying ways. The heightened relevance of amorphous computing today stems from the emergence of new technologies that could serve as substrates for information processing systems of
immense power at unprecedentedly low cost, if only we could master the challenge of programming them.

Introduction

Even as the foundations of computer science were being laid, researchers could hardly help noticing the contrast between the robustness of natural organisms and the fragility of the new computing devices. As John von Neumann remarked in 1948 [53]:

With our artificial automata we are moving much more in the dark than nature appears to be with its organisms. We are, and apparently, at least at present, have to be, much more 'scared' by the occurrence of an isolated error and by the malfunction which must be behind it. Our behavior is clearly that of overcaution, generated by ignorance.

Amorphous computing emerged as a field in the mid-1990s, from the convergence of three factors:
1. Inspiration from the cellular automata models for fundamental physics [13,34].
2. Hope that understanding the robustness of biological development could both help overcome the brittleness typical of computer systems and also illuminate the mechanisms of developmental biology.
3. The prospect of nearly free computers in vast quantities.
Microfabrication

One technology that has come to fruition over the past decade is micromechanical electronic component manufacture, which integrates logic circuits, microsensors, actuators, and communication on a single chip. Aggregates of these can be manufactured extremely inexpensively, provided that not all the chips need work correctly, and that there is no need to arrange the chips into precise geometrical configurations or to establish precise interconnections among them. A decade ago, researchers envisioned "smart dust" elements small enough to be borne on air currents to form clouds of communicating sensor particles [26]. Airborne sensor clouds are still a dream, but networks of millimeter-scale particles are now commercially available for environmental monitoring applications [40]. With low enough manufacturing costs, we could mix such particles into bulk materials to form coatings like "smart paint" that can sense data and communicate with the outside world. A smart paint coating on a wall could sense
vibrations, monitor the premises for intruders, or cancel noise. Bridges or buildings coated with smart paint could report on traffic and wind loads and monitor structural integrity. If the particles have actuators, then the paint could even heal small cracks by shifting the material around. Making the particles mobile opens up entirely new classes of applications that are beginning to be explored by research in swarm robotics [35] and modular robotics [49].
Digital computers have always been constructed to behave as precise arrangements of reliable parts, and almost all techniques for organizing computations depend upon this precision and reliability. Amorphous computing seeks to discover new programming methods that do not require precise control over the interaction or arrangement of the individual computing elements and to instantiate these techniques in new programming languages.
Cellular Engineering

The second disruptive technology that motivates the study of amorphous computing is microbiology. Biological organisms have served as motivating metaphors for computing since the days of calculating engines, but it is only over the past decade that we have begun to see how biology could literally be a substrate for computing, through the possibility of constructing digital-logic circuits within individual living cells. In one technology, logic signals are represented not by electrical voltages and currents, but by concentrations of DNA-binding proteins, and logic elements are realized as binding sites where proteins interact through promotion and repression. As a simple example, if A and B are proteins whose concentrations represent logic levels, then an "inverter" can be implemented in DNA as a genetic unit through which A serves as a repressor that blocks the production of B [29,54,55,61]. Since cells can reproduce themselves and obtain energy from their environment, the resulting information-processing units could be manufactured in bulk at very low cost. A technology of cellular engineering is beginning to take shape that can tailor-make programmable cells to function as sensors or delivery vehicles for pharmaceuticals, or even as chemical factories for the assembly of nanoscale structures. Researchers in this emerging field of synthetic biology are starting to assemble registries of standard logical components implemented in DNA that can be inserted into E. coli [48]. The components have been engineered to permit standard means of combining them, so that biological logic designers can assemble circuits in a mix-and-match way, similar to how electrical logic designers create circuits from standard TTL parts. There is even an International Genetically Engineered Machine Competition, where student teams from universities around the world compete to create novel biological devices from parts in the registry [24].

Either of these technologies (microfabricated particles or engineered cells) provides a path to cheap fabrication of aggregates of massive numbers of computing elements. But harnessing these for computing is quite a different matter, because the aggregates are unstructured.

The Amorphous Computing Model
Amorphous computing models the salient features of an unstructured aggregate through the notion of an amorphous computer: a collection of computational particles dispersed irregularly on a surface or throughout a volume, where individual particles have no a priori knowledge of their positions or orientations. The particles are possibly faulty, may contain sensors and effect actions, and in some applications might be mobile. Each particle has modest computing power and a modest amount of memory. The particles are not synchronized, although they are all capable of operating at similar speeds, since they are fabricated by the same process. All particles are programmed identically, although each particle has means for storing local state and for generating random numbers. There may also be several distinguished particles that have been initialized to particular states. Each particle can communicate with a few nearby neighbors. In an electronic amorphous computer the particles might communicate via short-distance radio, whereas bioengineered cells might communicate by chemical signals. Although the details of the communication model can vary, the maximum distance over which two particles can communicate effectively is assumed to be small compared with the size of the entire amorphous computer. Communication is assumed to be unreliable, and a sender has no assurance that a message has been received. (Higher-level protocols with message acknowledgment can be built on top of such unreliable channels.) We assume that the number of particles may be very large (on the order of 10^6 to 10^12). Algorithms appropriate to run on an amorphous computer should be relatively independent of the number of particles: the performance should degrade gracefully as the number of particles decreases.
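The communication structure of such an aggregate is easy to simulate: scatter particles at random and link any pair within a fixed communication radius. The sketch below is illustrative only (the function name and parameters are invented, not part of the amorphous computing literature); note that the positions exist only in the simulator, since the particles themselves have no knowledge of them.

```python
import random

def make_amorphous_computer(n=500, radius=0.08, seed=1):
    """Scatter n particles in the unit square and connect any pair
    closer than the communication radius (simulator-side only)."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            if dx * dx + dy * dy <= radius * radius:
                neighbors[i].append(j)   # links are symmetric, though any
                neighbors[j].append(i)   # message sent on them may be lost
    return pos, neighbors

pos, neighbors = make_amorphous_computer()
avg_degree = sum(len(ns) for ns in neighbors) / len(neighbors)
```

With these (arbitrary) parameters each particle has on the order of ten neighbors, and the radius is small compared with the unit square, matching the locality assumption above.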
Thus, the entire amorphous computer can be regarded as a massively parallel computing system, and previous investigations into massively parallel computing, such as research in cellular automata, are one source of ideas for dealing with amorphous computers. However, amorphous computing diﬀers from investigations into cellular automata, because amorphous mechanisms must be
independent of the detailed configuration, reliability, and synchronization of the particles.

Programming Amorphous Systems

A central theme in amorphous computing is the search for programming paradigms that work within the amorphous model. Here, biology has been a rich source of metaphors for inspiring new programming techniques. In embryonic development, even though the precise arrangements and numbers of the individual cells are highly variable, the genetic "programs" coded in DNA nevertheless produce well-defined intricate shapes and precise forms. Amorphous computers should be able to achieve similar results.

One technique for programming an amorphous computer uses diffusion. One particle (chosen by some symmetry-breaking process) broadcasts a message. This message is received by each of its neighbors, which propagate it to their neighbors, and so on, to create a wave that spreads throughout the system. The message contains a count, and each particle stores the received count and increments it before rebroadcasting. Once a particle has stored its count, it stops rebroadcasting and ignores future count messages. This count-up wave gives each particle a rough measure of its distance from the original source. One can also produce regions of controlled size, by having the count message relayed only if the count is below a designated bound.

Two such count-up waves can be combined to identify a chain of particles between two given particles A and B. Particle A begins by generating a count-up wave as above. This time, however, each intermediate particle, when it receives its count, performs a handshake to identify its "predecessor": the particle from which it received the count (and whose own count will therefore be one less). When the wave of count messages reaches B, B sends a "successor" message, informing its predecessor that it should become part of the chain and should send a message to its predecessor, and so on, all the way back to A.
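The count-up wave and the chain construction can be sketched as a synchronous simulation. This is a hypothetical Python sketch, not code from the article; the eight-particle neighbor graph is invented for illustration, and a real amorphous computer would run the same logic asynchronously over lossy messages.

```python
from collections import deque

def count_up_wave(neighbors, source):
    """Each particle stores the first count it hears, increments it,
    and rebroadcasts; later count messages are ignored."""
    counts = {source: 0}
    frontier = deque([source])
    while frontier:
        p = frontier.popleft()
        for q in neighbors[p]:
            if q not in counts:
                counts[q] = counts[p] + 1
                frontier.append(q)
    return counts

def chain(neighbors, a, b):
    """Trace successor messages from B back toward A: each particle
    hands off to a neighbor whose stored count is one less."""
    counts = count_up_wave(neighbors, a)
    path, p = [b], b
    while counts[p] > 0:
        p = min(neighbors[p], key=lambda q: counts.get(q, float("inf")))
        path.append(p)
    return list(reversed(path))

neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4],
             4: [3, 5], 5: [4, 6, 7], 6: [5], 7: [5]}
counts = count_up_wave(neighbors, 0)   # counts[4] == 3: three hops from 0
```

The synchronous loop only captures the logical structure of the wave; the key property, that the first count heard wins and everything later is ignored, is what makes the mechanism insensitive to the irregular particle distribution.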
Note that this method works, even though the particles are irregularly distributed, provided there is a path from A to B. The motivating metaphor for these two programs is chemical gradient diffusion, which is a foundational mechanism in biological development [60]. In nature, biological mechanisms not only generate elaborate forms, they can also maintain forms and repair them. We can modify the above amorphous line-drawing program so that it produces a self-repairing line: first, particles keep rebroadcasting their count and successor messages. Second, the status of a particle as having a count or being in the chain decays over time unless it is refreshed by new messages. That is, a particle that stops hearing successor messages intended for it will eventually revert to not being in the chain and will stop broadcasting its own successor messages. A particle that stops hearing its count being broadcast will start acting as if it never had a count, pick up a new count from the messages it hears, and start broadcasting the count messages with the new count. Clement and Nagpal [12] demonstrated that this mechanism can be used to generate self-repairing lines and other patterns, and even to reroute lines and patterns around "dead" regions where particles have stopped functioning. The relationship with biology flows in the other direction as well: the amorphous algorithm for repair is a model that is not obviously inconsistent with the facts of angiogenesis in the repair of wounds. Although the existence of the algorithm has no bearing on the facts of the matter, it may stimulate systems-level thinking about models in biological research. For example, Patel et al. use amorphous ideas to analyze the growth of epithelial cells [43].

Amorphous Computing Paradigms

Amorphous computing is still in its infancy. Most of the linguistic investigations based on the amorphous computing model have been carried out in simulation. Nevertheless, this work has yielded a rich variety of programming paradigms that demonstrate that one can in fact achieve robustness in the face of the unreliability of individual particles and the absence of precise organization among them.

Marker Propagation for Amorphous Particles

Weiss's Microbial Colony Language [55] is a marker-propagation language for programming the particles in an amorphous computer. The program to be executed, which is the same for each particle, is constructed as a set of rules. The state of each particle includes a set of binary markers, and rules are enabled by boolean combinations of the markers.
The rules, which have the form (trigger, condition, action), are triggered by the receipt of labelled messages from neighboring particles. A rule may test conditions, set or clear various markers, and broadcast further messages to its neighbors. Each message carries a count that determines how far it will diffuse, and each marker has a lifetime that determines how long its value lasts. Supporting the language's rules is a runtime system that automatically propagates messages and manages the lifetimes of markers, so that the programmer need not deal with these operations explicitly. Weiss's system is powerful, but the level of abstraction is very low. This is because it was motivated by cellular engineering, as something that can be directly implemented by genetic regulatory networks. The language is therefore more useful as a tool set in which to implement higher-level languages such as GPL (see below), serving as a demonstration that, in principle, these higher-level languages can be implemented by genetic regulatory networks as well. Figure 1 shows an example simulation programmed in this language that organizes an initially undifferentiated column of particles into a structure with bands of two alternating colors: a caricature of somites in the developing vertebrate.

Amorphous Computing, Figure 1: A Microbial Colony Language program organizes a tube into a structure similar to that of somites in the developing vertebrate (from [55]).

Amorphous Computing, Figure 2: A pattern generated by GPL whose shape mimics a chain of CMOS inverters (from [15]).

The Growing Point Language

Coore's Growing Point Language (GPL) [15] demonstrates that an amorphous computer can be configured by a program, common to all the computing elements, to generate highly complex patterns, such as the pattern representing the interconnection structure of an arbitrary electrical circuit, as shown in Fig. 2. GPL is inspired by a botanical metaphor based on growing points and tropisms. A growing point is a locus of activity in an amorphous computer. A growing point propagates through the computer by transferring its activity from one computing element to a neighbor.
As a growing point passes through the computer, it effects the differentiation of the behaviors of the particles it visits. Particles secrete "chemical" signals whose count-up waves define gradients, and these attract or repel growing points as directed by programmer-specified "tropisms". Coore demonstrated that these mechanisms are sufficient to permit amorphous computers to generate any arbitrary prespecified graph-structure pattern, up to topology. Unlike real biology, however, once a pattern has been constructed, there is no clear mechanism to maintain it in the face of changes to the material. Also, from a programming-linguistic point of view, there is no clear way to compose shapes by composing growing points. More recently, Gayle and Coore have shown how GPL may be extended to produce arbitrarily large patterns such as arbitrary text strings [21]. D'Hondt and D'Hondt have explored the use of GPL for geometrical constructions and its relations with computational geometry [18,19].

Origami-Based Self-Assembly

Amorphous Computing, Figure 3: Folding an envelope structure (from [38]). A pattern of lines is constructed according to origami axioms. Elements then coordinate to fold the sheet using an actuation model based on epithelial cell morphogenesis. In the figure, black indicates the front side of the sheet, grey indicates the back side, and the various colored bands show the folds and creases that are generated by the amorphous process. The small white spots show gaps in the sheet caused by "dead" or missing cells; the process works despite these.

Nagpal [38] developed a prototype model for controlling programmable materials. She showed how to organize a program to direct an amorphous sheet of deformable particles to cooperate to construct a large family of globally-specified predetermined shapes. Her method, which is inspired by the folding of epithelial tissue, allows a programmer to specify a sequence of folds, where the set of available folds is sufficient to create any origami shape (as shown by Huzita's axioms for origami [23]). Figure 3 shows a sheet of amorphous particles, where particles can
cooperate to create creases and folds, assembling itself into the well-known origami "cup" structure. Nagpal showed how this language of folds can be compiled into a low-level program that can be distributed to all of the particles of the amorphous sheet, similar to Coore's GPL or Weiss's MCL. With a few differences of initial state (for example, particles at the edges of the sheet know that they are edge particles) the particles run their copies of the program, interact with their neighbors, and fold up to make the predetermined shape. This technique is quite robust. Nagpal studied the range of shapes that can be constructed using her method, and their sensitivity to errors of communication, random cell death, and density of the cells. As a programming framework, the origami language has more structure than the growing point language, because the origami methods allow composition of shape constructions. On the other hand, once a shape has been constructed, there is no clear mechanism to maintain existing patterns in the face of changes to the material.

Dynamic Recruitment

In Butera's [10] "paintable computing", processes dynamically recruit computational particles from an amorphous computer to implement their goals. As one of his examples, Butera uses dynamic recruitment to implement a robust storage system for streaming audio and images (Fig. 4). Fragments of the image and audio stream circulate freely through the amorphous computer and are marshaled to a port when needed. The audio fragments also sort themselves into a playable stream as they migrate to the port. To enhance robustness, there are multiple copies of the lower-resolution image fragments and fewer copies of the higher-resolution image fragments. Thus, the image is hard to destroy; with lost fragments the image is degraded but not destroyed.
Amorphous Computing, Figure 4: Butera dynamically controls the flow of information through an amorphous computer. In (a), image fragments spread through the computer so that a degraded copy can be recovered from any segment; the original image is on the left, and the blurry copy on the right has been recovered from the small region shown below it. In (b), audio fragments sort themselves into a playable stream as they migrate to an output port; cooler colors are earlier times (from [10]).
Amorphous Computing, Figure 5: A tracking program written in Proto sends the location of a target region (orange) to a listener (red) along a channel (small red dots) in the network (indicated by green lines). The continuous space and time abstraction allows the same program to run at different resolutions.
Clement and Nagpal [12] also use dynamic recruitment in the development of active gradients, as described below.

Growth and Regeneration

Kondacs [30] showed how to synthesize arbitrary two-dimensional shapes by growing them. These computing units, or "cells", are identically programmed and decentralized, with the ability to engage in only limited, local communication. Growth is implemented by allowing cells to multiply. Each of his cells may create a child cell and place it randomly within a ring around the mother cell. Cells may also die, a fact which Kondacs puts to use for temporary scaffolding when building complex shapes. If a structure requires construction of a narrow neck between two other structures, it can be built precisely by laying down a thick connection and later trimming it to size. Attributes of this system include scalability, robustness, and the ability for self-repair. Just as a starfish can regenerate its entire body from part of a limb, his system can self-repair in the event of agent death: his sphere-network representation allows the structure to be grown starting from any sphere, and every cell contains all necessary information for reproducing the missing structure.

Abstraction to Continuous Space and Time

The amorphous model postulates computing particles distributed throughout a space. If the particles are dense, one can imagine the particles as actually filling the space, and create programming abstractions that view the space itself as the object being programmed, rather than the collection of particles. Beal and Bachrach [1,7] pursued this approach by creating a language, Proto, where programmers specify
the behavior of an amorphous computer as though it were a continuous material filling the space it occupies. Proto programs manipulate fields of values spanning the entire space. Programming primitives are designed to make it simple to compile global operations into operations at each point of the continuum. These operations are approximated by having each device represent a nearby chunk of space. Programs are specified in space and time units that are independent of the distribution of particles and of the particulars of communication and execution on those particles (Fig. 5). Programs are composed functionally, and many of the details of communication and composition are made implicit by Proto's runtime system, allowing complex programs to be expressed simply. Proto has been applied to applications in sensor networks, such as target tracking and threat avoidance, to swarm robotics, and to modular robotics, e.g., generating a planar wave for coordinated actuation. Newton's language Regiment [41,42] also takes a continuous view of space and time. Regiment is organized in terms of stream operations, where each stream represents a time-varying quantity over a part of space, for example, the average value of the temperature over a disc of a given radius centered at a designated point. Regiment, also a functional language, is designed to gather streams of data from regions of the amorphous computer and accumulate them at a single point. This design choice allows Regiment to provide region-wide summary functions that are difficult to implement in Proto.
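The flavor of the field abstraction can be suggested with a toy sketch. Proto itself is a Lisp-like language; the Python below merely imitates the idea that a program manipulates whole fields (here, plain dicts from particle to value) through pointwise operations and one-round neighborhood reductions. All names, values, and the three-particle graph are invented for illustration.

```python
def fmap(f, *fields):
    """Apply f pointwise: one global operation compiled to one local
    operation at every particle."""
    return {p: f(*(fld[p] for fld in fields)) for p in fields[0]}

def nbr_min(field, neighbors):
    """Neighborhood reduction: each particle combines its own value
    with the values it hears from its neighbors."""
    return {p: min([field[p]] + [field[q] for q in neighbors[p]])
            for p in field}

neighbors = {0: [1], 1: [0, 2], 2: [1]}
temp = {0: 21.0, 1: 19.5, 2: 22.3}        # a sensed field
cool = fmap(lambda t: t < 20.0, temp)      # pointwise predicate field
local_min = nbr_min(temp, neighbors)       # one communication round
```

The point of the abstraction is that the programmer writes `fmap` and `nbr_min` compositions over whole fields; the runtime decides how each particle's chunk of space participates.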
Primitives for Amorphous Computing

The previous section illustrated some paradigms that have been developed for programming amorphous systems, each paradigm building on some organizing metaphor.
But eventually, meeting the challenge of amorphous systems will require a more comprehensive linguistic framework. We can approach the task of creating such a framework following the perspective in [59], which views languages in terms of primitives, means of combination, and means of abstraction. The fact that amorphous computers consist of vast numbers of unreliable and unsynchronized particles, arranged in space in ways that are locally unknown, constrains the primitive mechanisms available for organizing cooperation among the particles. While amorphous computers are naturally massively parallel, they are poorly suited to parallelism that depends on explicit synchronization and the use of atomic operations to control concurrent access to resources. However, there are large classes of useful behaviors that can be implemented without these tools. Primitive mechanisms that are appropriate for specifying behavior on amorphous computers include gossip, random choice, fields, and gradients.

Gossip

Gossip, also known as epidemic communication [17,20], is a simple communication mechanism. The goal of a gossip computation is to obtain agreement about the value of some parameter. Each particle broadcasts its opinion of the parameter to its neighbors, and computation is performed by each particle combining the values that it receives from its neighbors, without consideration of the identity of the source. If the computation changes a particle's opinion of the value, it rebroadcasts its new opinion. The process concludes when there are no further broadcasts. For example, an aggregate can agree upon the minimum of the values held by all the particles as follows. Each particle broadcasts its value. Each recipient compares its current value with the value that it receives. If the received value is smaller than its current value, it changes its current value to that minimum and rebroadcasts the new value.
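The minimum-agreement example can be sketched as a round-based simulation. This is a hypothetical Python sketch; the four-particle chain and its values are made up, and a real system would be asynchronous and lossy.

```python
def gossip_min(neighbors, values, max_rounds=100):
    """Each particle rebroadcasts whenever its opinion changes; the
    process stops when a round produces no further broadcasts."""
    opinion = dict(values)
    for _ in range(max_rounds):
        changed = False
        # all particles broadcast; recipients combine what they hear
        heard = {p: [opinion[q] for q in neighbors[p]] for p in opinion}
        for p, msgs in heard.items():
            m = min(msgs + [opinion[p]])
            if m < opinion[p]:
                opinion[p] = m
                changed = True
        if not changed:   # no further broadcasts: agreement reached
            break
    return opinion

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final = gossip_min(neighbors, {0: 7, 1: 3, 2: 9, 3: 5})
```

Note that the combining step never inspects who sent a value, which is exactly the source-anonymity property discussed below.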
The advantage of gossip is that it flows in all directions and is very difficult to disrupt. The disadvantage is that the lack of source information makes it difficult to revise a decision.

Random Choice

Random choice is used to break symmetry, allowing the particles to differentiate their behavior. The simplest use of random choice is to establish local identity of particles: each particle chooses a random number to identify
itself to its neighbors. If the number of possible choices is large enough, then it is unlikely that any nearby particles will choose the same number, and this number can thus be used as an identifier for the particle to its neighbors. Random choice can be combined with gossip to elect leaders, either for the entire system or for local regions. (If collisions in choice can be detected, then the number of choices need not be much larger than the number of neighbors. Also, using gossip to elect leaders makes sense only when we expect a leader to be long-lived, given the difficulty of changing the decision to designate a replacement leader.) To elect a single leader for the entire system, every particle chooses a value, then gossips to find the minimum. The particle with the minimum value becomes the leader. To elect regional leaders, we instead use gossip to carry the identity of the first leader a particle has heard of. Each particle uses random choice as a "coin flip" to decide when to declare itself a leader; if the flip comes up heads enough times before the particle hears of another leader, the particle declares itself a leader and broadcasts that fact to its neighbors. The entire system is thus broken up into contiguous domains of particles that first heard some particular particle declare itself a leader. One challenge in using random choice on an amorphous computer is to ensure that the particulars of particle distribution do not have an unexpected effect on the outcome. For example, if we wish to control the expected size of the domain that each regional leader presides over, then the probability of becoming a leader must depend on the density of the particles.
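The system-wide election just described (random identifiers followed by a gossiped minimum) can be sketched as follows. This is a hypothetical Python sketch; the graph, seed, and round count are invented, and the synchronous rounds stand in for the asynchronous gossip of a real aggregate.

```python
import random

def elect_leader(neighbors, seed=42, id_bits=32):
    """Every particle draws a random identifier; gossiping the minimum
    makes the particle holding it the agreed leader."""
    rng = random.Random(seed)
    ids = {p: rng.getrandbits(id_bits) for p in neighbors}
    opinion = dict(ids)
    # n synchronous rounds always exceed the diameter of a connected graph
    for _ in range(len(neighbors)):
        opinion = {p: min([opinion[p]] + [opinion[q] for q in neighbors[p]])
                   for p in opinion}
    leader = min(ids, key=ids.get)
    # every particle's opinion has converged to the leader's identifier
    assert all(v == ids[leader] for v in opinion.values())
    return leader

neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # a 4-cycle
leader = elect_leader(neighbors)
```

With 32-bit identifiers, collisions among a handful of nearby particles are vanishingly unlikely, which is the "large enough number of choices" condition above.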
We can make amorphous models that approximate the solutions of the classical partialdiﬀerential equations of physics, given appropriate boundary conditions. The amorphous methods can be shown to be consistent, convergent and stable. For example, the algorithm for solving the Laplace equation with Dirichlet conditions is analogous to the way it would be solved on a lattice. Each particle must repeatedly update the value of the solution to be the average of the solutions posted by its neighbors, but the boundary points must not change their values. This algorithm will
153
154
Amorphous Computing
Amorphous Computing, Figure 6 A line being maintained by active gradients, from [12]. A line (black) is constructed between two anchor regions (dark grey) based on the active gradient emitted by the right anchor region (light grays). The line is able to rapidly repair itself following failures because the gradient actively maintains itself
eventually converge, although very slowly, independent of the order of the updates and the details of the local connectedness of the network. There are optimizations, such as overrelaxation, that are just as applicable in the amorphous context as on a regular grid. Katzenelson [27] has shown similar results for the diﬀusion equation, complete with analytic estimates of the errors that arise from the discrete and irregularly connected network. In the diﬀusion equation there is a conserved quantity, the amount of material diﬀusing. Rauch [47] has shown how this can work with the wave equation, illustrating that systems that conserve energy and momentum can also be eﬀectively modeled with an amorphous computer. The simulation of the wave equation does require that the communicating particles know their relative positions, but it is not hard to establish local coordinate systems. Gradients An important primitive in amorphous computing is the gradient, which estimates the distance from each particle to the nearest particle designated as a source. The gradient is inspired by the chemicalgradient diﬀusion process that is crucial to biological development. Amorphous computing builds on this idea, but does not necessarily compute the distance using diﬀusion because simulating diﬀusion can be expensive. The common alternative is a lineartime mechanism that depends on active computation and relaying of information rather than passive diﬀusion. Calculation of a gradient starts with each source particle setting its distance estimate to zero, and every other particle setting its distance
estimate to infinity. The sources then broadcast their estimates to their neighbors. When a particle receives a message from a neighbor, it compares its current distance estimate to the distance through that neighbor. If the distance through the neighbor is less, it adopts that as its estimate, and broadcasts its new estimate onwards. Although the basic form of the gradient is simple, there are several ways in which gradients can be varied to better match the context in which they are used. These choices may be made largely independently, giving a wide variety of options when designing a system. Variations which have been explored include:

Active Gradients  An active gradient [12,14,16] monitors its validity in the face of changing sources and device failure, and maintains correct distance values. For example, if the supporting sources disappear, the gradient is deallocated. A gradient may also carry version information, allowing its source to change more smoothly. Active gradients can provide self-repairing coordinate systems, as a foundation for robust construction of patterns. Figure 6 shows the self-repairing line described earlier in this article. The line's implementation in terms of active gradients provides for self-repair when the underlying amorphous computer is damaged.

Polarity  A gradient may be set to count down from a positive value at the source, rather than to count up from zero. This bounds the distance that the gradient can span, which can help limit resource usage, but may limit the scalability of programs.
Adaptivity As described above, a gradient relaxes once to a distance estimate. If communication is expensive and precision unimportant, the gradient can take the ﬁrst value that arrives and ignore all subsequent values. If we want the gradient to adapt to changes in the distribution of particles or sources, then the particles need to broadcast at regular intervals. We can then have estimates that converge smoothly to precise estimates by adding a restoring force which acts opposite to the relaxation, allowing the gradient to rise when unconstrained by its neighbors. If, on the other hand, we value adaptation speed over smoothness, then each particle can recalculate its distance estimate from scratch with each new batch of values. Carrier Normally, the distance value calculated by the gradient is the signal we are interested in. A gradient may instead be used to carry an arbitrary signal outward from the source. In this case, the value at each particle is the most recently arrived value from the nearest source. Distance Measure A gradient’s distance measure is, of course, dependent on how much knowledge we have about the relative positions of neighbors. It is sometimes advantageous to discard good information and use only hopcount values, since it is easier to make an adaptive gradient using hopcount values. Nonlinear distance measures are also possible, such as a countdown gradient that decays exponentially from the source. Finally, the value of a gradient may depend on more sources than the nearest (this is the case for a chemical gradient), though this may be very expensive to calculate. Coordinates and Clusters Computational particles may be built with restrictions about what can be known about local geometry. A particle may know that it can reliably communicate with a few neighbors. If we assume that these neighbors are all within a disc of some approximate communication radius then distances to others may be estimated by minimum hop count [28]. 
However, it is possible that more elaborate particles can estimate distances to near neighbors. For example, the Cricket localization system [45] uses the fact that sound travels more slowly than radio: distance is estimated from the difference in time of arrival between simultaneously transmitted signals. McLurkin's swarmbots [35] use the ISIS communication system, which gives bearing and range information. Even with the crudest method of determining distances, however, a sufficiently dense amorphous computer can produce local coordinate systems for its particles. We can make an atlas of overlapping coordinate systems, using random symmetry breaking to make new starting baselines [3]. These coordinate systems can be combined and made consistent to form a manifold, even if the amorphous computer is not flat or simply connected.

One way to establish coordinates is to choose two initial particles that are a known distance apart. Each one serves as the source of a gradient. A pair of rectangular axes can be determined by the shortest path between the two particles and by a bisector constructed where the two gradients are equal. These may be refined by averaging and calibrated using the known distance between the selected particles. After the axes are established, they may source new gradients that can be combined to make coordinates for the region near these axes. The coordinate system can be further refined by additional averaging. Other natural coordinate constructions, such as bipolar and elliptical coordinates, are also possible. This kind of construction was pioneered by Coore [14] and Nagpal [37]. Katzenelson [27] did early work to determine the kind of accuracy that can be expected from such a construction.

Spatial clustering can be accomplished with any of a wide variety of algorithms, such as the clubs algorithm [16], LOCI [36], or persistent node partitioning [4]. Clusters can themselves be clustered, forming a hierarchical clustering of logarithmic height.

Means of Combination and Abstraction A programming framework for amorphous systems requires more than primitive mechanisms. We also need suitable means of combination, so that programmers can combine behaviors to produce more complex behaviors, and means of abstraction, so that the compound behaviors can be named and manipulated as units. Here are a few means of combination that have been investigated for amorphous computing.

Spatial and Temporal Sequencing Several behaviors can be strung together in a sequence. The challenge in controlling such a sequence is to determine when one phase has completed and it is safe to move on to the next.
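Returning to the two-gradient coordinate construction described above: once a particle has distance estimates d1 and d2 to the two chosen particles, which are a known distance D apart, it can place itself by trilateration. The sketch below is illustrative only; real hop-count estimates are noisy, and the sign of y (which side of the axis the particle is on) must still be resolved by another cue, such as the bisector gradient.

```python
import math

# Sketch: turning two gradient distance estimates into rectangular
# coordinates.  The two anchor particles lie on the x-axis, D apart.

def local_coords(d1, d2, D):
    # Standard trilateration from two anchors at (0, 0) and (D, 0).
    x = (d1 ** 2 - d2 ** 2 + D ** 2) / (2 * D)
    y = math.sqrt(max(0.0, d1 ** 2 - x ** 2))  # clamp against noisy estimates
    return x, y

# A particle 3 units from one anchor and 4 from the other, anchors 5 apart:
print(local_coords(3.0, 4.0, 5.0))  # approximately (1.8, 2.4)
```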
Trigger rules can be used to detect completion locally. In Coore’s Growing Point Language [15], all of the sequencing decisions are made locally, with diﬀerent growing points progressing independently. There is no diﬃculty of synchronization in this approach because the only time when two growing points need to agree is when they have become spatially coincident. When growing points merge, the independent processes are automatically synchronized.
Nagpal's origami language [38] has long-range operations that cannot overlap in time unless they are in non-interacting regions of the space. The implementation uses barrier synchronization to sequence the operations: when completion is detected locally, a signal is propagated throughout a marked region of the sheet, and the next operation begins after a waiting time determined by the diameter of the sheet. With adaptive gradients, we can use the presence of an inducing signal to run an operation. When the induction signal disappears, the operation ceases and the particles begin the next operation. This allows sequencing to be triggered by the last detection of completion rather than by the first.

Pipelining If a behavior is self-stabilizing (meaning that it converges to a correct state from any arbitrary state), then we can use it in a sequence without knowing when the previous phase completes. The evolving output of the previous phase serves as the input of the next phase, and once the preceding behavior has converged, the self-stabilizing behavior will converge as well. If the previous phase evolves smoothly towards its final state, then by the time it has converged, the next phase may have almost converged as well, working from its partial results. For example, the coordinate system mechanism described above can be pipelined: the final coordinates are being formed even as the farther particles learn that they are not on one of the two axes.

Restriction to Spatial Regions Because the particles of an amorphous computer are distributed in space, it is natural to assign particular behaviors to specific spatial regions. In Beal and Bachrach's work, restriction of a process to a region is a primitive [7]. As another example, when Nagpal's system folds an origami construction, regions on different faces may differentiate so that they fold in different patterns. These folds may, if the physics permits, be performed simultaneously.
It may be necessary to sequence later construction that depends on the completion of the substructures. Regions of space can be named using coordinates, clustering, or implicitly through calculations on ﬁelds. Indeed, one could implement solid modelling on an amorphous computer. Once a region is identiﬁed, a particle can test whether it is a member of that region when deciding whether to run a behavior. It is also necessary to specify how a particle should change its behavior if its membership in a region may vary with time.
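A region-restricted behavior of the kind described above can be sketched as a predicate over a particle's estimated coordinates, re-tested on every round so that membership may vary with time. The names (`in_disc`, `step`) and the role strings are invented for this illustration.

```python
# Sketch: restricting a behavior to a spatial region.  A region is a
# predicate on a particle's (estimated) coordinates; each particle
# re-tests membership every round, so its behavior changes if it
# drifts in or out of the region.

def in_disc(center, radius):
    """Return a region predicate for a disc of the given radius."""
    cx, cy = center
    return lambda x, y: (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2

def step(particles, region, behavior, fallback):
    """Assign `behavior` to particles inside the region, `fallback` outside."""
    return {pid: (behavior if region(x, y) else fallback)
            for pid, (x, y) in particles.items()}

particles = {'a': (0.0, 0.0), 'b': (3.0, 4.0), 'c': (10.0, 0.0)}
roles = step(particles, in_disc((0, 0), 5), 'fold', 'idle')
print(roles)  # {'a': 'fold', 'b': 'fold', 'c': 'idle'}
```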
Modularity and Abstraction Standard means of abstraction may be applied in an amorphous computing context, such as naming procedures, data structures, and processes. The question for amorphous computing is what collection of entities is useful to name. Because geometry is essential in an amorphous computing context, it becomes appropriate to describe computational processes in terms of geometric entities. Thus there are new opportunities for combining geometric structures and naming the combinations. For example, it is appropriate to compute with, combine, and name regions of space, intervals of time, and ﬁelds deﬁned on them [7,42]. It may also be useful to describe the propagation of information through the amorphous computer in terms of the light cone of an event [2]. Not all traditional abstractions extend nicely to an amorphous computing context because of the challenges of scale and the fallibility of parts and interconnect. For example, atomic transactions may be excessively expensive in an amorphous computing context. And yet, some of the goals that a programmer might use atomic transactions to accomplish, such as the approximate enforcement of conservation laws, can be obtained using techniques that are compatible with an amorphous environment, as shown by Rauch [47]. Supporting Infrastructure and Services Amorphous computing languages, with their primitives, means of combination, and means of abstraction, rest on supporting services. One example, described above, is the automatic message propagation and decay in Weiss’s Microbial Colony Language [55]. MCL programs do not need to deal with this explicitly because it is incorporated into the operating system of the MCL machine. Experience with amorphous computing is beginning to identify other key services that amorphous machines must supply. Particle Identity Particles must be able to choose identiﬁers for communicating with their neighbors. 
More generally, there are many operations in an amorphous computation where the particles may need to choose numbers with the property that individual particles choose different numbers. If we are willing to pay the cost, it is possible to build unique identifiers into the particles, as is done with current macroscopic computers. We need only locally unique identifiers, however, so we can obtain them using pseudorandom-number generators. On the surface of it, this may seem problematic, since the particles in the amorphous computer are assumed to be manufactured identically, with identical programs. There are, however, ways to obtain individualized random numbers. For example, the particles are not synchronized, and they are not really physically identical, so they will run at slightly different rates. This difference is enough to allow pseudorandom-number generators to get locally out of sync and produce different sequences of numbers. Amorphous computing particles that have sensors may also get seeds for their pseudorandom-number generators from sensor noise.

Local Geometry and Gradients Particles must maintain connections with their neighbors, tracking whom they can reliably communicate with, along with whatever local geometry information is available. Because particles may fail or move, this information needs to be maintained actively. The geometry information may include distance and bearing to each neighbor, as well as the time it takes to communicate with each neighbor. Many implementations, however, will not be able to give significant distance or bearing information. Since all of this information may be obsolete or inaccurately measured, the particles must also maintain information about how reliable each piece of information is.

An amorphous computer must know the dimension of the space it occupies. This will generally be a constant—either the computer covers a surface or fills a volume. In rare cases, however, the effective dimension of a computer may change: for example, paint is three-dimensional in a bucket and two-dimensional once applied. Combining this information with how the number of accessible correspondents changes with distance, an amorphous process can derive curvature and local density information.

An amorphous computer should also support gradient propagation as part of the infrastructure: a programmer should not have to explicitly deal with the propagation of gradients (or other broadcast communications) in each particle.
A process may explicitly initiate a gradient, or explicitly react to one that it is interested in, but the propagation of the gradient through a particle should be automatically maintained by the infrastructure.

Implementing Communication Communication between neighbors can occur through any number of mechanisms, each with its own set of properties: amorphous computing systems have been built that communicate through directional infrared [35], RF broadcast [22], and low-speed serial cables [9,46]. Simulated systems have also included other mechanisms, such as signals superimposed on the power connections [11] and chemical diffusion [55].

Communication between particles can be made implicit with a neighborhood shared memory. In this arrangement, each particle designates some of its internal state to be shared with its neighbors. The particles communicate regularly, giving each particle a best-effort view of the exposed portions of the states of its neighbors. The contents of the exposed state may be specified explicitly [10,35,58] or implicitly [7]. The shared memory allows the system to be tuned by trading off communication rate against the quality of the synchronization, and by decreasing transmission rates when the exposed state is not changing.

Lessons for Engineering As von Neumann remarked half a century ago, biological systems are strikingly robust when compared with our artificial systems. Even today software is fragile. Computer science is currently built on a foundation that largely assumes the existence of a perfect infrastructure. Integrated circuits are fabricated in clean-room environments, tested deterministically, and discarded if even a single defect is uncovered. Entire software systems fail with single-line errors. In contrast, biological systems rely on local computation, local communication, and local state, yet they exhibit tremendous resilience.

Although this contrast is most striking in computer science, amorphous computing can provide lessons throughout engineering. Amorphous computing concentrates on making systems flexible and adaptable at the expense of efficiency. Amorphous computing requires an engineer to work under extreme conditions. The engineer must arrange the cooperation of vast numbers of identical computational particles to accomplish prespecified goals, but may not depend upon those numbers. We may not depend on any prespecified interconnect of the particles. We may not depend on synchronization of the particles.
We may not depend on the stability of the communications system. We may not depend on the longterm survival of any individual particles. The combination of these obstacles forces us to abandon many of the comforts that are available in more typical engineering domains. By restricting ourselves in this way we obtain some robustness and ﬂexibility, at the cost of potentially ineﬃcient use of resources, because the algorithms that are appropriate are ones that do not take advantage of these assumptions. Algorithms that work well in an amorphous context depend on the average behavior of participating particles. For example, in Nagpal’s origami system a fold that
is speciﬁed will be satisfactory if it is approximately in the right place and if most of the particles on the speciﬁed fold line agree that they are part of the fold line: dissenters will be overruled by the majority. In Proto a programmer can address only regions of space, assumed to be populated by many particles. The programmer may not address individual particles, so failures of individual particles are unlikely to make major perturbations to the behavior of the system. An amorphous computation can be quite immune to details of the macroscopic geometry as well as to the interconnectedness of the particles. Since amorphous computations make their own local coordinate systems, they are relatively independent of coordinate distortions. In an amorphous computation we accept a wide range of outcomes that arise from variations of the local geometry. Tolerance of local variation can lead to surprising ﬂexibility: the mechanisms which allow Nagpal’s origami language to tolerate local distortions allow programs to distort globally as well, and Nagpal shows how such variations can account for the variations in the head shapes of related species of Drosophila [38]. In Coore’s language one speciﬁes the topology of the pattern to be constructed, but only limited information about the geometry. The topology will be obtained, regardless of the local geometry, so long as there is suﬃcient density of particles to support the topology. Amorphous computations based on a continuous model of space (as in Proto) are naturally scale independent. Since an amorphous computer is composed of unsynchronized particles, a program may not depend upon a priori timing of events. The sequencing of phases of a process must be determined by either explicit termination signals or with times measured dynamically. So amorphous computations are timescale independent by construction. A program for an amorphous computer may not depend on the reliability of the particles or the communication paths. 
As a consequence it is necessary to construct the program so as to dynamically compensate for failures. One way to do this is to specify the result as the satisfaction of a set of constraints, and to build the program as a homeostatic mechanism that continually drives the system toward satisfaction of those constraints. For example, an active gradient continually maintains each particle’s estimate of the distance to the source of the gradient. This can be used to establish and maintain connections in the face of failures of particles or relocation of the source. If a system is speciﬁed in this way, repair after injury is a continuation of the development process: an injury causes some constraints to become unsatisﬁed, and the development process builds new structure to heal the injury.
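A small illustration of this homeostatic view: the same relaxation rule that establishes an active gradient also repairs it after an injury, because re-running the rule drives the distance estimates back toward consistency with the surviving neighbors. The sketch assumes synchronous rounds on a fixed grid, and all names are invented for the example.

```python
# Sketch: an active gradient as a homeostatic repair mechanism.  The
# relaxation rule that builds the distance field also repairs it: after
# a particle fails, a few more rounds restore correct estimates for the
# survivors (synchronous rounds assumed for simplicity).

def relax(values, neighbors, sources):
    return {p: 0 if p in sources else
            min((values[n] for n in nbrs), default=float('inf')) + 1
            for p, nbrs in neighbors.items()}

def converge(neighbors, sources, rounds=10):
    values = {p: float('inf') for p in neighbors}
    for _ in range(rounds):
        values = relax(values, neighbors, sources)
    return values

def grid_neighbors(cells):
    """4-connected neighbor table over a set of (row, col) cells."""
    return {c: [(c[0] + dx, c[1] + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if (c[0] + dx, c[1] + dy) in cells]
            for c in cells}

# A 2x3 grid with the source at (0, 0); then particle (0, 1) fails.
cells = {(r, c) for r in range(2) for c in range(3)}
before = converge(grid_neighbors(cells), sources={(0, 0)})
cells.discard((0, 1))
after = converge(grid_neighbors(cells), sources={(0, 0)})
print(before[(0, 2)], after[(0, 2)])  # 2 4 -- the estimate rises as routes detour around the failure
```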
By restricting the assumptions that a programmer can rely upon, we increase the flexibility and reliability of the programs that are constructed. However, it is not yet clear how this limits the range of possible applications of amorphous computing.

Future Directions Computer hardware is almost free, and in the future it will continue to decrease in price and size. Sensors and actuators are improving as well. Future systems will have vast numbers of computing mechanisms with integrated sensors and actuators, to a degree that outstrips our current approaches to system design. When the numbers become large enough, the appropriate programming technology will be amorphous computing. This transition has already begun to appear in several fields:

Sensor networks The success of sensor network research has encouraged the planning and deployment of ever-larger numbers of devices. The ad hoc, time-varying nature of sensor networks has encouraged amorphous approaches, such as communication through directed diffusion [25] and Newton's Regiment language [42].

Robotics Multi-agent robotics is much like sensor networks, except that the devices are mobile and have actuators. Swarm robotics considers independently mobile robots working together as a team, like ants or bees, while modular robotics considers robots that physically attach to one another in order to make shapes or perform actions, working together like the cells of an organism. Gradients are being used to create "flocking" behaviors in swarm robotics [35,44]. In modular robotics, Stoy uses gradients to create shapes [51], while De Rosa et al. form shapes through stochastic growth and decay [49].

Pervasive computing Pervasive computing seeks to exploit the rapid proliferation of wireless computing devices throughout our everyday environment. Mamei and Zambonelli's TOTA system [32] is an amorphous computing implementation supporting a model of programming using fields and gradients [33].
Servat and Drogoul have suggested combining amorphous computing and reactive agent-based systems to produce something they call "pervasive intelligence" [50].

Multicore processors As it becomes more difficult to increase processor speed, chip manufacturers are looking for performance gains through increasing the number of processing cores per chip. Butera's work [10] looks toward a future in which there are thousands of cores per chip and it is no longer reasonable to assume they are all working, or to have them communicate all-to-all.

While much of amorphous computing research is inspired by biological observations, it is also likely that insights and lessons learned from programming amorphous computers will help elucidate some biological problems [43]. Some of this will be stimulated by the emerging engineering of biological systems. Current work in synthetic biology [24,48,61] is centered on controlling the molecular biology of cells. Soon synthetic biologists will begin to engineer biofilms and perhaps direct the construction of multicellular organs, where amorphous computing will become an essential technological tool.
Bibliography

Primary Literature
1. Bachrach J, Beal J (2006) Programming a sensor network as an amorphous medium. In: DCOSS 2006 Posters, June 2006
2. Bachrach J, Beal J, Fujiwara T (2007) Continuous space-time semantics allow adaptive program execution. In: IEEE International Conference on Self-Adaptive and Self-Organizing Systems, 2007
3. Bachrach J, Nagpal R, Salib M, Shrobe H (2003) Experimental results and theoretical analysis of a self-organizing global coordinate system for ad hoc sensor networks. Telecommun Syst J, Special Issue on Wireless System Networks 26(2–4):213–233
4. Beal J (2003) A robust amorphous hierarchy from persistent nodes. In: Commun Syst Netw
5. Beal J (2004) Programming an amorphous computational medium. In: Unconventional Programming Paradigms International Workshop, September 2004
6. Beal J (2005) Amorphous medium language. In: Large-Scale Multi-Agent Systems Workshop (LSMAS). Held in conjunction with AAMAS-05
7. Beal J, Bachrach J (2006) Infrastructure for engineered emergence on sensor/actuator networks. In: IEEE Intelligent Systems, 2006
8. Beal J, Sussman G (2005) Biologically-inspired robust spatial programming. Technical Report AI Memo 2005-001, MIT, January 2005
9. Beebee W M68hc11 gunk api book. http://www.swiss.ai.mit.edu/projects/amorphous/HC11/api.html. Accessed 31 May 2007
10. Butera W (2002) Programming a Paintable Computer. PhD thesis, MIT
11. Campbell J, Pillai P, Goldstein SC (2005) The robot is the tether: Active, adaptive power routing for modular robots with unary inter-robot connectors. In: IROS 2005
12. Clement L, Nagpal R (2003) Self-assembly and self-repairing topologies. In: Workshop on Adaptability in Multi-Agent Systems, RoboCup Australian Open, January 2003
13. Codd EF (1968) Cellular Automata. Academic Press, New York
14. Coore D (1998) Establishing a coordinate system on an amorphous computer. In: MIT Student Workshop on High Performance Computing, 1998
15. Coore D (1999) Botanical Computing: A Developmental Approach to Generating Interconnect Topologies on an Amorphous Computer. PhD thesis, MIT
16. Coore D, Nagpal R, Weiss R (1997) Paradigms for structure in an amorphous computer. Technical Report AI Memo 1614, MIT
17. Demers A, Greene D, Hauser C, Irish W, Larson J, Shenker S, Sturgis H, Swinehart D, Terry D (1987) Epidemic algorithms for replicated database maintenance. In: 7th ACM Symposium on Operating Systems Principles, 1987
18. D'Hondt E, D'Hondt T (2001) Amorphous geometry. In: ECAL 2001
19. D'Hondt E, D'Hondt T (2001) Experiments in amorphous geometry. In: 2001 International Conference on Artificial Intelligence
20. Ganesan D, Krishnamachari B, Woo A, Culler D, Estrin D, Wicker S (2002) An empirical study of epidemic algorithms in large scale multihop wireless networks. Technical Report IRB-TR-02-003, Intel Research Berkeley
21. Gayle O, Coore D (2006) Self-organizing text in an amorphous environment. In: ICCS 2006
22. Hill J, Szewczyk R, Woo A, Culler D, Hollar S, Pister K (2000) System architecture directions for networked sensors. In: ASPLOS, November 2000
23. Huzita H, Scimemi B (1989) The algebra of paper-folding. In: First International Meeting of Origami Science and Technology, 1989
24. iGEM 2006: international Genetically Engineered Machine competition (2006) http://www.igem2006.com. Accessed 31 May 2007
25. Intanagonwiwat C, Govindan R, Estrin D (2000) Directed diffusion: a scalable and robust communication paradigm for sensor networks. In: Mobile Computing and Networking, pp 56–67
26. Kahn JM, Katz RH, Pister KSJ (1999) Mobile networking for smart dust. In: ACM/IEEE Int. Conf. on Mobile Computing and Networking (MobiCom 99), August 1999
27. Katzenelson J (1999) Notes on amorphous computing. (Unpublished draft)
28. Kleinrock L, Silvester J (1978) Optimum transmission radii for packet radio networks or why six is a magic number. In: IEEE Natl Telecommun Conf, December 1978, pp 4.3.1–4.3.5
29.
Knight TF, Sussman GJ (1998) Cellular gate technology. In: First International Conference on Unconventional Models of Computation (UMC98)
30. Kondacs A (2003) Biologically-inspired self-assembly of 2d shapes, using global-to-local compilation. In: International Joint Conference on Artificial Intelligence (IJCAI)
31. Mamei M, Zambonelli F (2003) Spray computers: Frontiers of self-organization for pervasive computing. In: WOA 2003
32. Mamei M, Zambonelli F (2004) Spatial computing: the TOTA approach. In: WOA 2004, pp 126–142
33. Mamei M, Zambonelli F (2005) Physical deployment of digital pheromones through RFID technology. In: AAMAS 2005, pp 1353–1354
34. Margolus N (1988) Physics and Computation. PhD thesis, MIT
35. McLurkin J (2004) Stupid robot tricks: A behavior-based distributed algorithm library for programming swarms of robots. Master's thesis, MIT
36. Mittal V, Demirbas M, Arora A (2003) LOCI: Local clustering service for large scale wireless sensor networks. Technical Report OSU-CISRC-2/03-TR07, Ohio State University
37. Nagpal R (1999) Organizing a global coordinate system from local information on an amorphous computer. Technical Report AI Memo 1666, MIT
38. Nagpal R (2001) Programmable Self-Assembly: Constructing Global Shape using Biologically-inspired Local Interactions and Origami Mathematics. PhD thesis, MIT
39. Nagpal R, Mamei M (2004) Engineering amorphous computing systems. In: Bergenti F, Gleizes MP, Zambonelli F (eds) Methodologies and Software Engineering for Agent Systems: The Agent-Oriented Software Engineering Handbook. Kluwer, New York, pp 303–320
40. Dust Networks. http://www.dustinc.com. Accessed 31 May 2007
41. Newton R, Morrisett G, Welsh M (2007) The Regiment macroprogramming system. In: International Conference on Information Processing in Sensor Networks (IPSN'07)
42. Newton R, Welsh M (2004) Region streams: Functional macroprogramming for sensor networks. In: First International Workshop on Data Management for Sensor Networks (DMSN), August 2004
43. Patel A, Nagpal R, Gibson M, Perrimon N (2006) The emergence of geometric order in proliferating metazoan epithelia. Nature 442:1038–1041
44. Payton D, Daily M, Estowski R, Howard M, Lee C (2001) Pheromone robotics. Autonomous Robots 11:319–324
45. Priyantha N, Chakraborty A, Balakrishnan H (2000) The Cricket location-support system. In: ACM International Conference on Mobile Computing and Networking (ACM MOBICOM), August 2000
46. Raffle H, Parkes A, Ishii H (2004) Topobo: A constructive assembly system with kinetic memory. In: CHI 2004
47. Rauch E (1999) Discrete, amorphous physical models. Master's thesis, MIT
48. Registry of Standard Biological Parts. http://parts.mit.edu. Accessed 31 May 2007
49. De Rosa M, Goldstein SC, Lee P, Campbell J, Pillai P (2006) Scalable shape sculpting via hole motion: Motion planning in lattice-constrained modular robots. In: Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA '06), May 2006
50. Servat D, Drogoul A (2002) Combining amorphous computing and reactive agent-based systems: a paradigm for pervasive intelligence? In: AAMAS 2002
51. Stoy K (2003) Emergent Control of Self-Reconfigurable Robots. PhD thesis, University of Southern Denmark
52. Sutherland A (2003) Towards RSEAM: Resilient serial execution on amorphous machines. Master's thesis, MIT
53. von Neumann J (1951) The general and logical theory of automata. In: Jeffress L (ed) Cerebral Mechanisms in Behavior. Wiley, New York, p 16
54. Weiss R, Knight T (2000) Engineered communications for microbial robotics. In: Sixth International Meeting on DNA Based Computers (DNA6)
55. Weiss R (2001) Cellular Computation and Communications using Engineered Genetic Regulatory Networks. PhD thesis, MIT
56. Welsh M, Mainland G (2004) Programming sensor networks using abstract regions. In: Proceedings of the First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI '04), March 2004
57. Werfel J, Bar-Yam Y, Nagpal R (2005) Building patterned structures with robot swarms. In: IJCAI
58. Whitehouse K, Sharp C, Brewer E, Culler D (2004) Hood: a neighborhood abstraction for sensor networks. In: Proceedings of the 2nd International Conference on Mobile Systems, Applications, and Services
59. Abelson H, Sussman GJ, Sussman J (1996) Structure and Interpretation of Computer Programs, 2nd edn. MIT Press, Cambridge
60. Ashe HL, Briscoe J (2006) The interpretation of morphogen gradients. Development 133:385–394
61. Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, Netravali I (2003) Genetic circuit building blocks for cellular computation, communications, and signal processing. Natural Computing 2(1):47–84
Books and Reviews
Abelson H, Allen D, Coore D, Hanson C, Homsy G, Knight T, Nagpal R, Rauch E, Sussman G, Weiss R (1999) Amorphous computing. Technical Report AIM-1665, MIT
Analog Computation
Analog Computation
BRUCE J. MACLENNAN
Department of Electrical Engineering & Computer Science, University of Tennessee, Knoxville, USA

Article Outline
Glossary
Definition of the Subject
Introduction
Fundamentals of Analog Computing
Analog Computation in Nature
General-Purpose Analog Computation
Analog Computation and the Turing Limit
Analog Thinking
Future Directions
Bibliography

Glossary
Accuracy The closeness of a computation to the corresponding primary system.
BSS The theory of computation over the real numbers defined by Blum, Shub, and Smale.
Church–Turing (CT) computation The model of computation based on the Turing machine and other equivalent abstract computing machines; commonly accepted as defining the limits of digital computation.
EAC Extended analog computer defined by Rubel.
GPAC General-purpose analog computer.
Nomograph A device for the graphical solution of equations by means of a family of curves and a straightedge.
ODE Ordinary differential equation.
PDE Partial differential equation.
Potentiometer A variable resistance, adjustable by the computer operator, used in electronic analog computing as an attenuator for setting constants and parameters in a computation.
Precision The quality of an analog representation or computation, which depends on both resolution and stability.
Primary system The system being simulated, modeled, analyzed, or controlled by an analog computer; also called the target system.
Scaling The adjustment, by constant multiplication, of variables in the primary system (including time) so that the corresponding variables in the analog system are in an appropriate range.
TM Turing machine.
Definition of the Subject Although analog computation was eclipsed by digital computation in the second half of the twentieth century, it is returning as an important alternative computing technology. Indeed, as explained in this article, theoretical results imply that analog computation can escape from the limitations of digital computation. Furthermore, analog computation has emerged as an important theoretical framework for discussing computation in the brain and other natural systems. Analog computation gets its name from an analogy, or systematic relationship, between the physical processes in the computer and those in the system it is intended to model or simulate (the primary system). For example, the electrical quantities voltage, current, and conductance might be used as analogs of the ﬂuid pressure, ﬂow rate, and pipe diameter. More speciﬁcally, in traditional analog computation, physical quantities in the computation obey the same mathematical laws as physical quantities in the primary system. Thus the computational quantities are proportional to the modeled quantities. This is in contrast to digital computation, in which quantities are represented by strings of symbols (e. g., binary digits) that have no direct physical relationship to the modeled quantities. According to the Oxford English Dictionary (2nd ed., s.vv. analogue, digital), these usages emerged in the 1940s. However, in a fundamental sense all computing is based on an analogy, that is, on a systematic relationship between the states and processes in the computer and those in the primary system. In a digital computer, the relationship is more abstract and complex than simple proportionality, but even so simple an analog computer as a slide rule goes beyond strict proportion (i. e., distance on the rule is proportional to the logarithm of the number). 
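The slide-rule example can be made concrete: the rule multiplies by adding two lengths, because position along the scale is proportional to the logarithm of the represented number. A small sketch, with an illustrative `scale` factor standing in for the physical length of the rule:

```python
import math

# Sketch of the slide rule's analogy: multiplication is carried out by
# adding lengths, since position on the scale is proportional to the
# logarithm of the number being represented.

def to_length(x, scale=1.0):
    return scale * math.log10(x)      # physical position on the rule

def from_length(d, scale=1.0):
    return 10 ** (d / scale)          # read the number back off the scale

# "Slide" 2 against 3: add the two lengths, then read off the product.
product = from_length(to_length(2.0) + to_length(3.0))
print(round(product, 6))  # 6.0
```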
In both analog and digital computation—indeed in all computation—the relevant abstract mathematical structure of the problem is realized in the physical states and processes of the computer, but the realization may be more or less direct [40,41,46]. Therefore, despite the etymologies of the terms "analog" and "digital", in modern usage the principal distinction between digital and analog computation is that the former operates on discrete representations in discrete steps, while the latter operates on continuous representations by means of continuous processes (e.g., MacLennan [46], Siegelmann [p. 147 in 78], Small [p. 30 in 82], Weyrick [p. 3 in 89]). That is, the primary distinction resides in the topologies of the states and processes, and it would be more accurate to refer to discrete and continuous computation [p. 39 in 25]. (Consider so-called analog and digital clocks. The principal difference resides in the continuity or discreteness of the representation of time; the motion of the two (or three) hands of an "analog" clock does not mimic the motion of the rotating earth or the position of the sun relative to it.)

Introduction

History

Pre-electronic Analog Computation

Just like digital calculation, analog computation was originally performed by hand. Thus we find several analog computational procedures in the "constructions" of Euclidean geometry (Euclid, fl. 300 BCE), which derive from techniques used in ancient surveying and architecture. For example, Problem II.11 is "to divide a given straight line into two parts, so that the rectangle contained by the whole and one of the parts shall be equal to the square of the other part". Also, Problem VI.13 is "to find a mean proportional between two given straight lines", and VI.30 is "to cut a given straight line in extreme and mean ratio". These procedures do not make use of measurements in terms of any fixed unit or of digital calculation; the lengths and other continuous quantities are manipulated directly (via compass and straightedge). On the other hand, the techniques involve discrete, precise operational steps, and so they can be considered algorithms, but over continuous magnitudes rather than discrete numbers. It is interesting to note that the ancient Greeks distinguished continuous magnitudes (Grk., megethoi), which have physical dimensions (e.g., length, area, rate), from discrete numbers (Grk., arithmoi), which do not [49]. Euclid axiomatizes them separately (magnitudes in Book V, numbers in Book VII), and a mathematical system comprising both discrete and continuous quantities was not achieved until the nineteenth century in the work of Weierstrass and Dedekind.
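Problem VI.13, the mean proportional, illustrates the sense in which such a construction is an algorithm over continuous magnitudes. The sketch below (illustrative only) reproduces the classical semicircle construction numerically: lay the two segments end to end, draw a semicircle on the combined segment as diameter, and erect a perpendicular where the segments meet; its height is the mean proportional.

```python
import math

def mean_proportional(a: float, b: float) -> float:
    """Euclid VI.13 as a continuous 'algorithm': the perpendicular
    erected at the junction of segments a and b, under a semicircle on
    a + b as diameter, has height x with a : x = x : b, i.e. sqrt(a*b)."""
    r = (a + b) / 2.0                # radius of the semicircle
    d = r - a                        # distance from center to the junction
    return math.sqrt(r * r - d * d)  # height of the perpendicular chord

print(mean_proportional(4.0, 9.0))  # 6.0, since 4 : 6 = 6 : 9
```

No unit of measurement or digital arithmetic is presupposed by the geometric procedure itself; the numbers here merely stand in for the lengths manipulated by compass and straightedge.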
The earliest known mechanical analog computer is the "Antikythera mechanism", which was found in 1900 in a shipwreck near the Greek island of Antikythera (between Kythera and Crete). It dates to the second century BCE and appears to be intended for astronomical calculations. The device is sophisticated (at least 70 gears) and well engineered, suggesting that it was not the first of its type, and therefore that other analog computing devices may have been used in the ancient Mediterranean world [22]. Indeed, according to Cicero (Rep. 22) and other authors, Archimedes (c. 287–c. 212 BCE) and other ancient scientists also built analog computers, such as armillary spheres, for astronomical simulation and computation. Other antique mechanical analog computers include the astrolabe, which was used for the determination of latitude and a variety of other astronomical purposes, and the torquetum, which converts astronomical measurements between equatorial, ecliptic, and horizontal coordinates.

A class of special-purpose analog computer that is simple in conception but may be used for a wide range of purposes is the nomograph (also, nomogram, alignment chart). In its most common form, it permits the solution of quite arbitrary equations in three real variables, f(u, v, w) = 0. The nomograph is a chart or graph with scales for each of the variables; typically these scales are curved and have nonuniform numerical markings. Given values for any two of the variables, a straightedge is laid across their positions on their scales, and the value of the third variable is read off where the straightedge crosses the third scale. Nomographs were used to solve many problems in engineering and applied mathematics. They improve intuitive understanding by allowing the relationships among the variables to be visualized, and facilitate exploring their variation by moving the straightedge. Lipka (1918) is an example of a course in graphical and mechanical methods of analog computation, including nomographs and slide rules.

Until the introduction of portable electronic calculators in the early 1970s, the slide rule was the most familiar analog computing device. Slide rules use logarithms for multiplication and division, and they were invented in the early seventeenth century shortly after John Napier's description of logarithms.

The mid-nineteenth century saw the development of the field analogy method by G. Kirchhoff (1824–1887) and others [33]. In this approach an electrical field in an electrolytic tank or conductive paper was used to solve two-dimensional boundary problems for temperature distributions and magnetic fields [p. 34 in 82].
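The field analogy method can be sketched numerically: a steady-state temperature (or potential) distribution satisfies Laplace's equation, and relaxing a grid of values until each node equals the average of its neighbors mimics the electrical equilibrium of a resistive sheet. This is an illustrative sketch with arbitrary boundary values, not a model of any historical apparatus:

```python
# Sketch of the field analogy method: at electrical equilibrium each node
# of a resistive sheet settles to the average of its neighbors, which is
# the discrete form of Laplace's equation, so the same relaxation solves
# a steady-state temperature problem. Here the left edge is held at 1.0
# and the other edges at 0.0 (illustrative boundary values).
n = 20
grid = [[0.0] * n for _ in range(n)]
for i in range(n):
    grid[i][0] = 1.0                     # fixed boundary "voltage"

for _ in range(2000):                    # relax toward equilibrium
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][j - 1] + grid[i][j + 1])

print(round(grid[n // 2][n // 2], 3))    # interior value between 0 and 1
```

The electrolytic tank did this relaxation instantaneously and in a truly continuous medium; the nested loops here are the digital approximation of what the physics provided for free.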
It is an early example of analog field computation.

In the nineteenth century a number of mechanical analog computers were developed for integration and differentiation (e.g., Lipka (1918), pp. 246–256; Clymer [15]). For example, the planimeter measures the area under a curve or within a closed boundary. While the operator moves a pointer along the curve, a rotating wheel accumulates the area. Similarly, the integraph is able to draw the integral of a given function as its shape is traced. Other mechanical devices can draw the derivative of a curve or compute a tangent line at a given point.

In the late nineteenth century William Thomson, Lord Kelvin, constructed several analog computers, including a "tide predictor" and a "harmonic analyzer", which computed the Fourier coefficients of a tidal curve [85,86]. In 1876 he described how the mechanical integrators invented by his brother could be connected together in a feedback loop in order to solve second and higher order differential equations (Small [pp. 34–35, 42 in 82], Thomson [84]). He was unable to construct this differential analyzer, which had to await the invention of the torque amplifier in 1927.

The torque amplifier and other technical advancements permitted Vannevar Bush at MIT to construct the first practical differential analyzer in 1930 [pp. 42–45 in 82]. It had six integrators and could also do addition, subtraction, multiplication, and division. Input data were entered in the form of continuous curves, and the machine automatically plotted the output curves continuously as the equations were integrated. Similar differential analyzers were constructed at other laboratories in the US and the UK.

Setting up a problem on the MIT differential analyzer took a long time; gears and rods had to be arranged to define the required dependencies among the variables. Bush later designed a much more sophisticated machine, the Rockefeller Differential Analyzer, which became operational in 1947. With 18 integrators (out of a planned 30), it provided programmatic control of machine setup, and permitted several jobs to be run simultaneously. Mechanical differential analyzers were rapidly supplanted by electronic analog computers in the mid-1950s, and most were disassembled in the 1960s (Bowles [10], Owens [61], Small [pp. 50–45 in 82]).

During World War II, and in later conflicts, an important application of optical and mechanical analog computation was in "gun directors" and "bomb sights", which performed ballistic computations to accurately target artillery and dropped ordnance.
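Kelvin's scheme of integrators connected in a feedback loop can be sketched numerically. The illustrative fragment below solves x'' = -x the way a differential analyzer would: no algebra is performed, only interconnection, with simple time-stepping standing in for the continuous mechanical integration.

```python
import math

# Kelvin's feedback scheme on a (simulated) differential analyzer: two
# integrators in a loop solve x'' = -x. The first integrator accumulates
# x'' into x', the second accumulates x' into x, and x is fed back
# (negated) as the input x''.
dt = 0.001
x, v = 1.0, 0.0               # initial conditions x(0) = 1, x'(0) = 0
for _ in range(int(math.pi / dt)):  # run for about pi seconds
    a = -x                    # feedback connection: x'' = -x
    v += a * dt               # first integrator: x' = integral of x''
    x += v * dt               # second integrator: x = integral of x'
print(round(x, 2))            # close to -1.0, i.e. cos(pi)
```

The solution emerges from the wiring of the machine, which is exactly why setting up a problem meant physically arranging the interconnections among integrators.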
Electronic Analog Computation in the 20th Century

It is commonly supposed that electronic analog computers were superior to mechanical analog computers, and they were in many respects, including speed, cost, ease of construction, size, and portability [pp. 54–56 in 82]. On the other hand, mechanical integrators produced higher precision results (0.1% vs. 1% for early electronic devices) and had greater mathematical flexibility (they were able to integrate with respect to any variable, not just time). However, many important applications did not require high precision and focused on dynamic systems for which time integration was sufficient.

Analog computers (non-electronic as well as electronic) can be divided into active-element and passive-element computers; the former involve some kind of amplification, the latter do not [pp. 21–4 in 87]. Passive-element computers included the network analyzers, which were developed in the 1920s to analyze electric power distribution networks, and which continued in use through the 1950s [pp. 35–40 in 82]. They were also applied to problems in thermodynamics, aircraft design, and mechanical engineering. In these systems networks or grids of resistive elements or reactive elements (i.e., involving capacitance and inductance as well as resistance) were used to model the spatial distribution of physical quantities such as voltage, current, and power (in electric distribution networks), electrical potential in space, stress in solid materials, temperature (in heat diffusion problems), pressure, fluid flow rate, and wave amplitude [p. 22 in 87]. That is, network analyzers dealt with partial differential equations (PDEs), whereas active-element computers, such as the differential analyzer and its electronic successors, were restricted to ordinary differential equations (ODEs) in which time was the independent variable. Large network analyzers are early examples of analog field computers.

Electronic analog computers became feasible after the invention of the DC operational amplifier ("op amp") c. 1940 [pp. 64, 67–72 in 82]. Already in the 1930s scientists at Bell Telephone Laboratories (BTL) had developed the DC-coupled, feedback-stabilized amplifier, which is the basis of the op amp. In 1940, as the USA prepared to enter World War II, D.L. Parkinson at BTL had a dream in which he saw DC amplifiers being used to control an anti-aircraft gun. As a consequence, with his colleagues C.A. Lovell and B.T. Weber, he wrote a series of papers on "electrical mathematics", which described electrical circuits to "operationalize" addition, subtraction, integration, differentiation, etc. The project to produce an electronic gun director led to the development and refinement of DC op amps suitable for analog computation.
The wartime work at BTL was focused primarily on control applications of analog devices, such as the gun director. Other researchers, such as E. Lakatos at BTL, were more interested in applying them to general-purpose analog computation for science and engineering, which resulted in the design of the General Purpose Analog Computer (GPAC), also called "Gypsy", completed in 1949 [pp. 69–71 in 82]. Building on the BTL op amp design, fundamental work on electronic analog computation was conducted at Columbia University in the 1940s. In particular, this research showed how analog computation could be applied to the simulation of dynamic systems and to the solution of nonlinear equations. Commercial general-purpose analog computers (GPACs) emerged in the late 1940s and early 1950s [pp. 72–73 in 82]. Typically they provided several dozen
integrators, but several GPACs could be connected together to solve larger problems. Later, large-scale GPACs might have up to 500 amplifiers and compute with 0.01%–0.1% precision [p. 233 in 87]. Besides integrators, typical GPACs provided adders, subtracters, multipliers, fixed function generators (e.g., logarithms, exponentials, trigonometric functions), and variable function generators (for user-defined functions) [Chaps. 1.3, 2.4 in 87]. A GPAC was programmed by connecting these components together, often by means of a patch panel. In addition, parameters could be entered by adjusting potentiometers (attenuators), and arbitrary functions could be entered in the form of graphs [pp. 172–81, 2154–156 in 87]. Output devices plotted data continuously or displayed it numerically [pp. 31–30 in 87].

The most basic way of using a GPAC was in single-shot mode [pp. 168–170 in 89]. First, parameters and initial values were entered into the potentiometers. Next, putting a master switch in "reset" mode controlled relays to apply the initial values to the integrators. Turning the switch to "operate" or "compute" mode allowed the computation to take place (i.e., the integrators to integrate). Finally, placing the switch in "hold" mode stopped the computation and stabilized the values, allowing them to be read from the computer (e.g., on voltmeters). Although single-shot operation was also called "slow operation" (in comparison to "repetitive operation", discussed next), it was in practice quite fast. Because all of the devices computed in parallel and at electronic speeds, analog computers usually solved problems in real time but often much faster (Truitt and Rogers [pp. 130–32 in 87], Small [p.
72 in 82]). One common application of GPACs was to explore the effect of one or more parameters on the behavior of a system. To facilitate this exploration of the parameter space, some GPACs provided a repetitive operation mode, which worked as follows (Weyrick [p. 170 in 89], Small [p. 72 in 82]). An electronic clock switched the computer between reset and compute modes at an adjustable rate (e.g., 10–1000 cycles per second) [p. 280, n. 1 in 2]. In effect the simulation was rerun at the clock rate, but if any parameters were adjusted, the simulation results would vary along with them. Therefore, within a few seconds, an entire family of related simulations could be run. More importantly, the operator could acquire an intuitive understanding of the system's dependence on its parameters.

The Eclipse of Analog Computing

A common view is that electronic analog computers were a primitive predecessor of the digital computer, and that their use was just a historical episode, or even a digression, in the inevitable triumph of digital technology. It is supposed that the current digital hegemony is a simple matter of technological superiority. However, the history is much more complicated, and involves a number of social, economic, historical, pedagogical, and also technical factors, which are outside the scope of this article (see Small [81] and Small [82], especially Chap. 8, for more information). In any case, beginning after World War II and continuing for twenty-five years, there was lively debate about the relative merits of analog and digital computation.

Speed was an oft-cited advantage of analog computers [Chap. 8 in 82]. While early digital computers were much faster than mechanical differential analyzers, they were slower (often by several orders of magnitude) than electronic analog computers. Furthermore, although digital computers could perform individual arithmetic operations rapidly, complete problems were solved sequentially, one operation at a time, whereas analog computers operated in parallel. Thus it was argued that increasingly large problems required more time to solve on a digital computer, whereas on an analog computer they might require more hardware but not more time. Even as digital computing speed improved, analog computing retained its advantage for several decades, but this advantage eroded steadily.

Another important issue was the comparative precision of digital and analog computation [Chap. 8 in 82]. Analog computers typically computed with three or four digits of precision, and it was very expensive to do much better, due to the difficulty of manufacturing the parts and other factors. In contrast, digital computers could perform arithmetic operations with many digits of precision, and the hardware cost was approximately proportional to the number of digits. Against this, analog computing advocates argued that many problems did not require such high precision, because the measurements were known to only a few significant figures and the mathematical models were approximations.
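The advocates' point can be illustrated numerically: a computation carried out with many digits of arithmetic precision can still be inaccurate. The following is an illustrative sketch, not a historical computation; it integrates dy/dt = y over [0, 1] in roughly 16-digit floating point, yet a coarse step size leaves an error far larger than the component tolerances of analog machines.

```python
import math

def euler_exp(steps: int) -> float:
    """Integrate dy/dt = y from y(0) = 1 to t = 1 by Euler's method.
    Every arithmetic operation is exact to ~16 digits, but the
    discretization (truncation) error accumulates with each step."""
    y, dt = 1.0, 1.0 / steps
    for _ in range(steps):
        y += y * dt
    return y

exact = math.e
coarse = euler_exp(10)       # relative error ~4.6%: precise but inaccurate
fine = euler_exp(100000)     # the error shrinks only as the step size does
print(coarse, fine, exact)
```

High precision in the individual operations is thus no guarantee of accuracy in the result, which is the distinction drawn next.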
Further, they distinguished between precision and accuracy, where accuracy refers to the conformity of the computation to physical reality, and they argued that digital computation was often less accurate than analog, due to numerical limitations (e.g., truncation, cumulative error in numerical integration). Nevertheless, some important applications, such as the calculation of missile trajectories, required greater precision, and for these digital computation had the advantage. Indeed, to some extent precision was viewed as inherently desirable, even in applications where it was unimportant, and it was easily mistaken for accuracy. (See Sect. "Precision" for more on precision and accuracy.)

There was even a social factor involved, in that the written programs, precision, and exactness of digital computation were associated with mathematics and science, but the hands-on operation, parameter variation, and approximate solutions of analog computation were associated with engineers, and so analog computing inherited "the lower status of engineering vis-à-vis science" [p. 251 in 82]. Thus the status of digital computing was further enhanced as engineering became more mathematical and scientific after World War II [pp. 247–251 in 82].

Already by the mid-1950s the competition between analog and digital had evolved into the idea that they were complementary technologies. This resulted in the development of a variety of hybrid analog/digital computing systems [pp. 251–253, 263–266 in 82]. In some cases this involved using a digital computer to control an analog computer by using digital logic to connect the analog computing elements, set parameters, and gather data. This improved the accessibility and usability of analog computers, but had the disadvantage of distancing the user from the physical analog system. The intercontinental ballistic missile program in the USA stimulated the further development of hybrid computers in the late 1950s and 1960s [81]. These applications required the speed of analog computation to simulate the closed-loop control systems and the precision of digital computation for accurate computation of trajectories. However, by the early 1970s hybrids were being displaced by all-digital systems. Certainly part of the reason was the steady improvement in digital technology, driven by a vibrant digital computer industry, but contemporaries also pointed to an inaccurate perception that analog computing was obsolete and to a lack of education about the advantages and techniques of analog computing.
Another argument made in favor of digital computers was that they were general-purpose, since they could be used in business data processing and other application domains, whereas analog computers were essentially special-purpose, since they were limited to scientific computation [pp. 248–250 in 82]. Against this it was argued that all computing is essentially computing by analogy, and therefore analog computation was general-purpose because the class of analog computers included digital computers! (See also Sect. "Definition of the Subject" on computing by analogy.) Be that as it may, analog computation, as normally understood, is restricted to continuous variables, and so it was not immediately applicable to discrete data, such as that manipulated in business computing and other non-scientific applications. Therefore business (and eventually consumer) applications motivated the computer industry's investment in digital computer technology at the expense of analog technology.
Although it is commonly believed that analog computers quickly disappeared after digital computers became available, this is inaccurate, for both general-purpose and special-purpose analog computers have continued to be used in specialized applications to the present time. For example, a general-purpose electrical (vs. electronic) analog computer, the Anacom, was still in use in 1991. This is not technological atavism, for "there is no doubt considerable truth in the fact that Anacom continued to be used because it effectively met a need in a historically neglected but nevertheless important computer application area" [3]. As mentioned, the reasons for the eclipse of analog computing were not simply the technological superiority of digital computation; the conditions were much more complex. Therefore a change in conditions has necessitated a reevaluation of analog technology.

Analog VLSI

In the mid-1980s, Carver Mead, who had already made important contributions to digital VLSI technology, began to advocate for the development of analog VLSI [51,52]. His motivation was that "the nervous system of even a very simple animal contains computing paradigms that are orders of magnitude more effective than are those found in systems made by humans" and that they "can be realized in our most commonly available technology—silicon integrated circuits" [p. xi in 52]. However, he argued, since these natural computation systems are analog and highly nonlinear, progress would require understanding neural information processing in animals and applying it in a new analog VLSI technology.

Because analog computation is closer to the physical laws by which all computation is realized (which are continuous), analog circuits often use fewer devices than corresponding digital circuits. For example, a four-quadrant adder (capable of adding two signed numbers) can be fabricated from four transistors [pp. 87–88 in 52], and a four-quadrant multiplier from nine to seventeen, depending on the required range of operation [pp. 90–96 in 52]. Intuitions derived from digital logic about what is simple or complex to compute are often misleading when applied to analog computation. For example, two transistors are sufficient to compute the logarithm or exponential, five for the hyperbolic tangent (which is very useful in neural computation), and three for the square root [pp. 70–71, 97–99 in 52]. Thus analog VLSI is an attractive approach to "post-Moore's Law computing" (see Sect. "Future Directions" below). Mead and his colleagues demonstrated a number of analog VLSI devices inspired by the nervous system, including a "silicon retina" and an "electronic cochlea" [Chaps. 15–16 in 52], research that has led to a renaissance of interest in electronic analog computing.
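A rough illustration of why so few transistors suffice (an idealized textbook model with assumed parameter values, not the circuits of [52]): in subthreshold operation a transistor's channel current depends roughly exponentially on its gate voltage, so a single device computes an exponential, and the same law run in reverse (e.g., in a feedback connection) yields a logarithm.

```python
import math

# Idealized subthreshold ("weak inversion") model: I = I0 * exp(V / (n*VT)).
# The parameter values below are typical textbook magnitudes, assumed for
# illustration, not measurements of any particular fabrication process.
I0 = 1e-15    # pre-exponential current scale (amperes), assumed
n = 1.5       # subthreshold slope factor, assumed
VT = 0.0258   # thermal voltage kT/q at room temperature (volts)

def transistor_current(v_gate: float) -> float:
    """One device computes an exponential 'for free' via its physics."""
    return I0 * math.exp(v_gate / (n * VT))

def log_of_current(i_drain: float) -> float:
    """The same exponential law, inverted, computes a logarithm:
    solve I = I0 * exp(V / (n * VT)) for V."""
    return n * VT * math.log(i_drain / I0)

v = 0.3  # gate volts
print(abs(log_of_current(transistor_current(v)) - v) < 1e-12)  # True
```

The computation is performed by the device physics itself, which is the sense in which digital intuitions about circuit complexity mislead here.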
Non-Electronic Analog Computation

As will be explained in the body of this article, analog computation suggests many opportunities for future computing technologies. Many physical phenomena are potential media for analog computation, provided they have useful mathematical structure (i.e., the mathematical laws describing them are mathematical functions useful for general- or special-purpose computation) and they are sufficiently controllable for practical use.

Article Roadmap

The remainder of this article will begin by summarizing the fundamentals of analog computing, starting with the continuous state space and the various processes by which analog computation can be organized in time. Next it will discuss analog computation in nature, which provides models and inspiration for many contemporary uses of analog computation, such as neural networks. Then we consider general-purpose analog computing, both from a theoretical perspective and in terms of practical general-purpose analog computers. This leads to a discussion of the theoretical power of analog computation and in particular to the issue of whether analog computing is in some sense more powerful than digital computing. We briefly consider the cognitive aspects of analog computing, and whether it leads to a different approach to computation than does digital computing. Finally, we conclude with some observations on the role of analog computation in "post-Moore's Law computing".

Fundamentals of Analog Computing

Continuous State Space

As discussed in Sect. "Introduction", the fundamental characteristic that distinguishes analog from digital computation is that the state space is continuous in analog computation and discrete in digital computation. Therefore it might be more accurate to call analog and digital computation continuous and discrete computation, respectively. Furthermore, since the earliest days there have been hybrid computers that combine continuous and discrete state spaces and processes.
Thus, there are several respects in which the state space may be continuous. In the simplest case the state space comprises a finite (generally modest) number of variables, each holding a continuous quantity (e.g., voltage, current, charge). In a traditional GPAC they correspond to the variables in the ODEs defining the computational process, each typically having some independent meaning in the analysis of the problem. Mathematically, the variables are taken to contain bounded real numbers, although complex-valued variables are also possible (e.g., in AC electronic analog computers). In a practical sense, however, their precision is limited by noise, stability, device tolerance, and other factors (discussed below, Sect. "Characteristics of Analog Computation").

In typical analog neural networks the state space is larger in dimension but more structured than in the former case. The artificial neurons are organized into one or more layers, each composed of a (possibly large) number of artificial neurons. Commonly each layer of neurons is densely connected to the next layer. In general the layers each have some meaning in the problem domain, but the individual neurons constituting them do not (and so, in mathematical descriptions, the neurons are typically numbered rather than named). The individual artificial neurons usually perform a simple computation such as this:

$$y = \sigma(s), \quad \text{where} \quad s = b + \sum_{i=1}^{n} w_i x_i,$$

and where y is the activity of the neuron, $x_1, \ldots, x_n$ are the activities of the neurons that provide its inputs, b is a bias term, and $w_1, \ldots, w_n$ are the weights or strengths of the connections. Often the activation function $\sigma$ is a real-valued sigmoid ("S-shaped") function, such as the logistic sigmoid,

$$\sigma(s) = \frac{1}{1 + e^{-s}},$$

in which case the neuron activity y is a real number, but some applications use a discontinuous threshold function, such as the Heaviside function,

$$U(s) = \begin{cases} +1 & \text{if } s \ge 0 \\ 0 & \text{if } s < 0 \end{cases}$$

in which case the activity is a discrete quantity. The saturated-linear or piecewise-linear sigmoid is also used occasionally:

$$\sigma(s) = \begin{cases} 1 & \text{if } s > 1 \\ s & \text{if } 0 \le s \le 1 \\ 0 & \text{if } s < 0 \end{cases}$$

Regardless of whether the activation function is continuous or discrete, the bias b and connection weights $w_1, \ldots, w_n$ are real numbers, as is the "net input" $s = b + \sum_i w_i x_i$ to the activation function. Analog computation may be used to evaluate the linear combination s and the activation function $\sigma(s)$, if it is real-valued. The biases
and weights are normally determined by a learning algorithm (e.g., backpropagation), which is also a good candidate for analog implementation. In summary, the continuous state space of a neural network includes the bias values and net inputs of the neurons and the interconnection strengths between the neurons. It also includes the activity values of the neurons, if the activation function is a real-valued sigmoid function, as is often the case. Often large groups ("layers") of neurons (and the connections between these groups) have some intuitive meaning in the problem domain, but typically the individual neuron activities, bias values, and interconnection weights do not.

If we extrapolate the number of neurons in a layer to the continuum limit, we get a field, which may be defined as a continuous distribution of continuous quantity. Treating a group of artificial or biological neurons as a continuous mass is a reasonable mathematical approximation if their number is sufficiently large and if their spatial arrangement is significant (as it generally is in the brain). Fields are especially useful in modeling cortical maps, in which information is represented by the pattern of activity over a region of neural cortex. In field computation the state space is continuous in two ways: it is continuous in variation but also in space. Therefore, field computation is especially applicable to solving PDEs and to processing spatially extended information such as visual images. Some early analog computing devices were capable of field computation [pp. 114–17, 22–16 in 87]. For example, as previously mentioned (Sect. "Introduction"), large resistor and capacitor networks could be used for solving PDEs such as diffusion problems. In these cases a discrete ensemble of resistors and capacitors was used to approximate a continuous field, while in other cases the computing medium was spatially continuous.
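The single-neuron computation described earlier, $y = \sigma(b + \sum_i w_i x_i)$, can be sketched in a few lines. This is a plain numerical sketch; an analog implementation would evaluate the weighted sum and the sigmoid with physical quantities rather than arithmetic.

```python
import math

def logistic(s: float) -> float:
    """Logistic sigmoid activation, sigma(s) = 1 / (1 + e^-s)."""
    return 1.0 / (1.0 + math.exp(-s))

def neuron(x, w, b):
    """One artificial neuron: y = sigma(b + sum_i w_i * x_i)."""
    s = b + sum(wi * xi for wi, xi in zip(w, x))  # net input
    return logistic(s)

print(neuron([1.0, 0.5], [2.0, -4.0], 0.0))  # sigma(0) = 0.5
```

Note that the continuous state here comprises the activities, biases, and weights alike, all of them real-valued.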
The spatially continuous media included conductive sheets (for two-dimensional fields) and electrolytic tanks (for two- or three-dimensional fields). When they were applied to steady-state spatial problems, these analog computers were called field plotters or potential analyzers. The ability to fabricate very large arrays of analog computing devices, combined with the need to exploit massive parallelism in real-time computation and control applications, creates new opportunities for field computation [37,38,43]. There is also renewed interest in using physical fields in analog computation. For example, Rubel [73] defined an abstract extended analog computer (EAC), which augments Shannon's [77] general purpose analog computer with (unspecified) facilities for field computation, such as PDE solvers (see Sects. "Shannon's Analysis"–"Rubel's Extended Analog Computer" below). J.W. Mills has explored the practical application of these ideas in his artificial neural field networks and VLSI EACs, which use the diffusion of electrons in bulk silicon or conductive gels and plastics for 2D and 3D field computation [53,54].

Computational Process

We have considered the continuous state space, which is the basis for analog computing, but there are a variety of ways in which analog computers can operate on the state. In particular, the state can change continuously in time or be updated at distinct instants (as in digital computation).

Continuous Time

Since the laws of physics on which analog computing is based are differential equations, many analog computations proceed in continuous real time. Also, as we have seen, an important application of analog computers in the late 19th and early 20th centuries was the integration of ODEs in which time is the independent variable. A common technique in analog simulation of physical systems is time scaling, in which the differential equations are altered systematically so the simulation proceeds either more slowly or more quickly than the primary system (see Sect. "Characteristics of Analog Computation" for more on time scaling). On the other hand, because analog computations are close to the physical processes that realize them, analog computing is rapid, which makes it very suitable for real-time control applications.

In principle, any mathematically describable physical process operating on time-varying physical quantities can be used for analog computation. In practice, however, analog computers typically provide familiar operations that scientists and engineers use in differential equations [70,87]. These include basic arithmetic operations, such as algebraic sum and difference ($u(t) = v(t) \pm w(t)$), constant multiplication or scaling ($u(t) = cv(t)$), variable multiplication and division ($u(t) = v(t)w(t)$, $u(t) = v(t)/w(t)$), and sign inversion ($u(t) = -v(t)$).
Transcendental functions may be provided, such as the exponential ($u(t) = \exp v(t)$), logarithm ($u(t) = \ln v(t)$), trigonometric functions ($u(t) = \sin v(t)$, etc.), and resolvers for converting between polar and rectangular coordinates. Most important, of course, is definite integration ($u(t) = v_0 + \int_0^t v(\tau)\,d\tau$), but differentiation may also be provided ($u(t) = \dot{v}(t)$). Generally, however, direct differentiation is avoided, since noise tends to have a higher frequency than the signal, and therefore differentiation amplifies noise; typically problems are reformulated to avoid direct differentiation [pp. 26–27 in 89]. As previously mentioned, many GPACs include (arbitrary) function generators, which allow the use of functions defined
only by a graph and for which no mathematical definition might be available; in this way empirically defined functions can be used [pp. 32–42 in 70]. Thus, given a graph $(x, f(x))$, or a sufficient set of samples, $(x_k, f(x_k))$, the function generator approximates $u(t) = f(v(t))$. Rather less common are generators for arbitrary functions of two variables, $u(t) = f(v(t), w(t))$, in which the function may be defined by a surface, $(x, y, f(x, y))$, or by sufficient samples from it.

Although analog computing is primarily continuous, there are situations in which discontinuous behavior is required. Therefore some analog computers provide comparators, which produce a discontinuous result depending on the relative value of two input values. For example,

$$u = \begin{cases} k & \text{if } v \ge w \\ 0 & \text{if } v < w \end{cases}$$

Typically, this would be implemented as a Heaviside (unit step) function applied to the difference of the inputs, $u = k\,U(v - w)$. In addition to allowing the definition of discontinuous functions, comparators provide a primitive decision-making ability, and may be used, for example, to terminate a computation (switching the computer from "operate" to "hold" mode).

Other operations that have proved useful in analog computation are time delays and noise generators [Chap. 7 in 31]. The function of a time delay is simply to retard the signal by an adjustable delay $T > 0$: $u(t + T) = v(t)$. One common application is to model delays in the primary system (e.g., human response time). Typically a noise generator produces time-invariant, Gaussian-distributed noise with zero mean and a flat power spectrum (over a band compatible with the analog computing process). The standard deviation can be adjusted by scaling, the mean can be shifted by addition, and the spectrum altered by filtering, as required by the application. Historically noise generators were used to model noise and other random effects in the primary system, to determine, for example, its sensitivity to effects such as turbulence.
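The calibration just described (scale the standard deviation, shift the mean) can be sketched as follows; this is an illustrative numerical stand-in for an electronic noise source, with filtering of the spectrum omitted.

```python
import random

random.seed(0)  # for a reproducible illustration

def noise_source(std: float = 1.0, mean: float = 0.0, samples: int = 100000):
    """Sketch of a calibrated noise generator: start from a unit Gaussian
    source, scale it to set the standard deviation, and add an offset to
    shift the mean."""
    return [mean + std * random.gauss(0.0, 1.0) for _ in range(samples)]

xs = noise_source(std=0.5, mean=2.0)
m = sum(xs) / len(xs)
sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
print(round(m, 1), round(sd, 1))  # approximately 2.0 and 0.5
```

On an analog machine the scaling and offset would be performed by an attenuator and an adder acting on the noise generator's output.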
However, noise can make a positive contribution in some analog computing algorithms (e.g., for symmetry breaking and in simulated annealing, weight perturbation learning, and stochastic resonance). As already mentioned, some analog computing devices for the direct solution of PDEs have been developed. In general a PDE solver depends on an analogous physical process, that is, on a process obeying the same class of PDEs that it is intended to solve. For example, in Mills's EAC, diffusion of electrons in conductive sheets or solids is used to solve diffusion equations [53,54]. Historically, PDEs were solved on electronic GPACs by discretizing all but one of the independent variables, thus replacing the differential equations by difference equations [pp. 173–193 in 70]. That is, computation over a field was approximated by computation over a finite real array. Reaction-diffusion computation is an important example of continuous-time analog computing (see Reaction-Diffusion Computing). The state is represented by a set of time-varying chemical concentration fields, c_1, …, c_n. These fields are distributed across a one-, two-, or three-dimensional space Ω, so that, for x ∈ Ω, c_k(x, t) represents the concentration of chemical k at location x and time t. Computation proceeds in continuous time according to reaction-diffusion equations, which have the form

  ∂c/∂t = D ∇²c + F(c),

where c = (c_1, …, c_n)^T is the vector of concentrations, D = diag(d_1, …, d_n) is a diagonal matrix of positive diffusion rates, and F is a nonlinear vector function that describes how the chemical reactions affect the concentrations. Some neural net models operate in continuous time and thus are examples of continuous-time analog computation. For example, Grossberg [26,27,28] defines the activity of a neuron by differential equations such as this:

  ẋ_i = −a_i x_i + Σ_{j=1}^{n} b_{ij} w⁺_{ij} f_j(x_j) − Σ_{j=1}^{n} c_{ij} w⁻_{ij} g_j(x_j) + I_i .
This describes the continuous change in the activity of neuron i resulting from passive decay (first term), positive feedback from other neurons (second term), negative feedback (third term), and input (last term). The f_j and g_j are nonlinear activation functions, and the w⁺_ij and w⁻_ij are adaptable excitatory and inhibitory connection strengths, respectively. The continuous Hopfield network is another example of continuous-time analog computation [30]. The output y_i of a neuron is a nonlinear function of its internal state x_i, y_i = σ(x_i), where the hyperbolic tangent is usually used as the activation function, σ(x) = tanh x, because its range is [−1, 1]. The internal state is defined by a differential equation,

  τ_i ẋ_i = −a_i x_i + b_i + Σ_{j=1}^{n} w_{ij} y_j ,

where τ_i is a time constant, a_i is the decay rate, b_i is the bias, and w_ij is the connection weight to neuron i from
neuron j. In a Hopfield network every neuron is symmetrically connected to every other (w_ij = w_ji) but not to itself (w_ii = 0). Of course analog VLSI implementations of neural networks also operate in continuous time (e.g., [20,52]). Concurrent with the resurgence of interest in analog computation have been innovative reconceptualizations of continuous-time computation. For example, Brockett [12] has shown that dynamical systems can solve a number of problems normally considered to be intrinsically sequential. In particular, a certain system of ODEs (a nonperiodic finite Toda lattice) can sort a list of numbers by continuous-time analog computation. The system is started with the vector x equal to the values to be sorted and a vector y initialized to small nonzero values; the y vector converges to a sorted permutation of x.

Sequential Time

Sequential-time computation refers to computation in which discrete computational operations take place in succession but at no definite interval [88]. Ordinary digital computer programs take place in sequential time, for the operations occur one after another, but the individual operations are not required to have any specific duration, so long as they take finite time. One of the oldest examples of sequential analog computation is provided by the compass-and-straightedge constructions of traditional Euclidean geometry (Sect. "Introduction"). These computations proceed by a sequence of discrete operations, but the individual operations involve continuous representations (e.g., compass settings, straightedge positions) and operate on a continuous state (the figure under construction). Slide rule calculation might seem to be an example of sequential analog computation, but on closer inspection we see that although the operations are performed by an analog device, the intermediate results are recorded digitally (and so this part of the state space is discrete). Thus it is a kind of hybrid computation.
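Brockett's continuous-time sorting, described above, can be illustrated numerically. The sketch below uses the symmetric double-bracket flow dX/dt = [X, [X, N]], a standard formulation of this class of results; the diagonal of X holds the values to be sorted, the small off-diagonal entries play the role of the y vector, and the step size, perturbation, and duration are ad hoc choices of mine, not from the source.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    AB, BA = mat_mul(A, B), mat_mul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def toda_sort(values, eps=0.1, dt=2e-3, steps=10000):
    """Sort by Euler-integrating the double-bracket flow dX/dt = [X,[X,N]].

    X starts symmetric tridiagonal: the values on the diagonal, small eps
    off-diagonal (the 'small nonzero' perturbation); N = diag(1, ..., n).
    The flow increases tr(XN), so the diagonal converges to ascending order.
    """
    n = len(values)
    X = [[0.0] * n for _ in range(n)]
    for i, v in enumerate(values):
        X[i][i] = float(v)
    for i in range(n - 1):
        X[i][i + 1] = X[i + 1][i] = eps
    N = [[float(i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        G = commutator(X, commutator(X, N))  # dX/dt
        for i in range(n):
            for j in range(n):
                X[i][j] += dt * G[i][j]
    return [round(X[i][i]) for i in range(n)]
```

The final diagonal entries are the eigenvalues of the slightly perturbed starting matrix, so for well-separated integer inputs rounding recovers the sorted list.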
The familiar digital computer automates sequential digital computations that once were performed manually by human "computers". Sequential analog computation can be similarly automated. That is, just as the control unit of an ordinary digital computer sequences digital computations, so a digital control unit can sequence analog computations. In addition to the analog computation devices (adders, multipliers, etc.), such a computer must provide variables and registers capable of holding continuous quantities between the sequential steps of the computation (see also Sect. "Discrete Time" below). The primitive operations of sequential-time analog computation are typically similar to those in continuous-time computation (e.g., addition, multiplication, transcendental functions), but integration and differentiation with respect to sequential time do not make sense. However, continuous-time integration within a single step, and space-domain integration, as in PDE solvers or field computation devices, are compatible with sequential analog computation. In general, any model of digital computation can be converted to a similar model of sequential analog computation by changing the discrete state space to a continuum, and making appropriate changes to the rest of the model. For example, we can make an analog Turing machine by allowing it to write a bounded real number (rather than a symbol from a finite alphabet) onto a tape cell. The Turing machine's finite control can be altered to test for tape markings in some specified range. Similarly, in a series of publications Blum, Shub, and Smale developed a theory of computation over the reals, which is an abstract model of sequential-time analog computation [6,7]. In this "BSS model" programs are represented as flowcharts, but they are able to operate on real-valued variables. Using this model they were able to prove a number of theorems about the complexity of sequential analog algorithms. The BSS model, and some other sequential analog computation models, assume that it is possible to make exact comparisons between real numbers (analogous to exact comparisons between integers or discrete symbols in digital computation) and to use the result of the comparison to control the path of execution. Comparisons of this kind are problematic because they imply infinite precision in the comparator (which may be defensible in a mathematical model but is impossible in physical analog devices), and because they make the execution path a discontinuous function of the state (whereas analog computation is usually continuous). Indeed, it has been argued that this is not "true" analog computation [p. 148 in 78]. Many artificial neural network models are examples of sequential-time analog computation.
In a simple feedforward neural network, an input vector is processed by the layers in order, as in a pipeline. That is, the output of layer n becomes the input of layer n + 1. Since the model does not make any assumptions about the amount of time it takes a vector to be processed by each layer and to propagate to the next, execution takes place in sequential time. Most recurrent neural networks, which have feedback, also operate in sequential time, since the activities of all the neurons are updated synchronously (that is, signals propagate through the layers, or back to earlier layers, in lockstep). Many artificial neural-net learning algorithms are also sequential-time analog computations. For example, the
backpropagation algorithm updates a network's weights, moving sequentially backward through the layers. In summary, the correctness of sequential-time computation (analog or digital) depends on the order of operations, not on their duration, and similarly the efficiency of sequential computations is evaluated in terms of the number of operations, not their total duration.

Discrete Time

Discrete-time analog computation has similarities to both continuous-time and sequential analog computation. Like the latter, it proceeds by a sequence of discrete (analog) computation steps; like the former, these steps occur at a constant rate in real time (e.g., some "frame rate"). If the real-time rate is sufficient for the application, then discrete-time computation can approximate continuous-time computation (including integration and differentiation). Some electronic GPACs implemented discrete-time analog computation by a modification of repetitive operation mode, called iterative analog computation [Chap. 9 in 2]. Recall (Sect. "Electronic Analog Computation in the 20th Century") that in repetitive operation mode a clock rapidly switched the computer between reset and compute modes, thus repeating the same analog computation, but with different parameters (set by the operator). However, each repetition was independent of the others. Iterative operation was different in that analog values computed by one iteration could be used as initial values in the next. This was accomplished by means of an analog memory circuit (based on an op amp) that sampled an analog value at the end of one compute cycle (effectively during hold mode) and used it to initialize an integrator during the following reset cycle. (A modified version of the memory circuit could be used to retain a value over several iterations.) Iterative computation was used for problems such as determining, by iterative search or refinement, the initial conditions that would lead to a desired state at a future time.
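Iterative operation, in which an outer discrete-time loop refines the initial condition of an inner continuous-time run, can be sketched in simulation. The falling-body problem, the bisection refinement, and all parameters below are illustrative assumptions of mine, not from the source.

```python
def run_once(v0, T=1.0, dt=1e-3, g=9.8):
    """Inner 'analog' computation: Euler-integrate dx/dt = v, dv/dt = -g
    from x = 0 for problem time T, returning the final position x(T)."""
    x, v, t = 0.0, v0, 0.0
    while t < T:
        x += v * dt
        v -= g * dt
        t += dt
    return x

def find_initial_velocity(target, lo=0.0, hi=100.0, iters=40):
    """Outer iterative loop: each 'compute cycle' runs the inner model once,
    then refines the initial velocity by bisection until x(T) hits target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if run_once(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With target 10.0, T = 1, and g = 9.8, the loop converges close to the analytic value v0 = 10 + 4.9 = 14.9. Only the result of one run (here, the bisection bounds) carries over to the next, just as the analog memory circuit carried one iteration's value into the following cycle.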
Since the analog computations were iterated at a fixed clock rate, iterative operation is an example of discrete-time analog computation. However, the clock rate is not directly relevant in some applications (such as the iterative solution of boundary value problems), in which case iterative operation is better characterized as sequential analog computation. The principal contemporary examples of discrete-time analog computing are in neural network applications to time-series analysis and (discrete-time) control. In each of these cases the input to the neural net is a sequence of discrete-time samples, which propagate through the net and generate discrete-time output signals. Many of these neural nets are recurrent, that is, values from later layers
are fed back into earlier layers, which allows the net to remember information from one sample to the next.

Analog Computer Programs

The concept of a program is central to digital computing, both practically, for it is the means for programming general-purpose digital computers, and theoretically, for it defines the limits of what can be computed by a universal machine, such as a universal Turing machine. Therefore it is important to discuss means for describing or specifying analog computations. Traditionally, analog computers were used to solve ODEs (and sometimes PDEs), and so in one sense a mathematical differential equation is one way to represent an analog computation. However, since the equations were usually not suitable for direct solution on an analog computer, the process of programming involved the translation of the equations into a schematic diagram showing how the analog computing devices (integrators etc.) should be connected to solve the problem. These diagrams are the closest analogies to digital computer programs and may be compared to flowcharts, which were once popular in digital computer programming. It is worth noting, however, that flowcharts (and ordinary computer programs) represent sequences among operations, whereas analog computing diagrams represent functional relationships among variables, and therefore a kind of parallel data flow. Differential equations and schematic diagrams are suitable for continuous-time computation, but for sequential analog computation something more akin to a conventional digital program can be used. Thus, as previously discussed (Sect. "Sequential Time"), the BSS system uses flowcharts to describe sequential computations over the reals. Similarly, C. Moore [55] defines recursive functions over the reals by means of a notation similar to a programming language. In principle any sort of analog computation might involve constants that are arbitrary real numbers, which therefore might not be expressible in finite form (e.g., as a finite string of digits). Although this is of theoretical interest (see Sect. "Real-valued Inputs, Outputs, and Constants" below), from a practical standpoint these constants could be set with at most about four digits of precision [p. 11 in 70]. Indeed, automatic potentiometer-setting devices were constructed that read a series of decimal numerals from punched paper tape and used them to set the potentiometers for the constants [pp. 358–360 in 87]. Nevertheless it is worth observing that analog computers do allow continuous inputs that need not be expressed in digital notation, for example, when the parameters of a simulation are continuously varied by the operator. In principle, therefore, an analog program can incorporate constants that are represented by a real-valued physical quantity (e.g., an angle or a distance), which need not be expressed digitally. Further, as we have seen (Sect. "Electronic Analog Computation in the 20th Century"), some electronic analog computers could compute a function by means of an arbitrarily drawn curve, that is, a function not represented by an equation or a finite set of digitized points. Therefore, in the context of analog computing it is natural to expand the concept of a program beyond discrete symbols to include continuous representations (scalar magnitudes, vectors, curves, shapes, surfaces, etc.). Typically such continuous representations would be used as adjuncts to conventional discrete representations of the analog computational process, such as equations or diagrams. However, in some cases the most natural static representation of the process is itself continuous, in which case it is more like a "guiding image" than a textual prescription [42]. A simple example is a potential surface, which defines a continuum of trajectories from initial states (possible inputs) to fixed-point attractors (the results of the computations). Such a "program" may define a deterministic computation (e.g., if the computation proceeds by gradient descent), or it may constrain a nondeterministic computation (e.g., if the computation may proceed by any potential-decreasing trajectory). Thus analog computation suggests a broadened notion of programs and programming.

Characteristics of Analog Computation

Precision

Analog computation is evaluated in terms of both accuracy and precision, but the two must be distinguished carefully [pp. 25–28 in 2; pp. 12–13 in 89; pp. 257–261 in 82].
Accuracy refers primarily to the relationship between a simulation and the primary system it is simulating or, more generally, to the relationship between the results of a computation and the mathematically correct result. Accuracy is a result of many factors, including the mathematical model chosen, the way it is set up on a computer, and the precision of the analog computing devices. Precision, therefore, is a narrower notion, which refers to the quality of a representation or computing device. In analog computing, precision depends on resolution (fineness of operation) and stability (absence of drift), and may be measured as a fraction of the represented value. Thus a precision of 0.01% means that the representation will stay within 0.01% of the represented value for a reasonable period of time. For purposes of comparing analog devices, the precision is usually expressed as a fraction of full-scale
variation (i.e., the difference between the maximum and minimum representable values). It is apparent that the precision of analog computing devices depends on many factors. One is the choice of physical process and the way it is utilized in the device. For example a linear mathematical operation can be realized by using a linear region of a nonlinear physical process, but the realization will be approximate and have some inherent imprecision. Also, associated, unavoidable physical effects (e.g., loading, and leakage and other losses) may prevent precise implementation of an intended mathematical function. Further, there are fundamental physical limitations to resolution (e.g., quantum effects, diffraction). Noise is inevitable, both intrinsic (e.g., thermal noise) and extrinsic (e.g., ambient radiation). Changes in ambient physical conditions, such as temperature, can affect the physical processes and decrease precision. At slower time scales, materials and components age and their physical characteristics change. In addition, there are always technical and economic limits to the control of components, materials, and processes in analog device fabrication. The precision of analog and digital computing devices depends on very different factors. The precision of a (binary) digital device depends on the number of bits, which influences the amount of hardware, but not its quality. For example, a 64-bit adder is about twice the size of a 32-bit adder, but can be made out of the same components. At worst, the size of a digital device might increase with the square of the number of bits of precision. This is because binary digital devices only need to represent two states, and therefore they can operate in saturation. The fabrication standards sufficient for the first bit of precision are also sufficient for the 64th bit. Analog devices, in contrast, need to be able to represent a continuum of states precisely.
Therefore, the fabrication of high-precision analog devices is much more expensive than that of low-precision devices, since the quality of components, materials, and processes must be much more carefully controlled. Doubling the precision of an analog device may be expensive, whereas the cost of each additional bit of digital precision is incremental; that is, the cost is proportional to the logarithm of the precision expressed as a fraction of full range. The foregoing considerations might seem to be a convincing argument for the superiority of digital to analog technology, and indeed they were an important factor in the competition between analog and digital computers in the middle of the twentieth century [pp. 257–261 in 82]. However, as was argued at that time, many computer applications do not require high precision. Indeed, in many engineering applications, the input data are known to only a few digits, and the equations may be approximate or derived from experiments. In these cases the very high precision of digital computation is unnecessary and may in fact be misleading (e.g., if one displays all 14 digits of a result that is accurate to only three). Furthermore, many applications in image processing and control do not require high precision. More recently, research in artificial neural networks (ANNs) has shown that low-precision analog computation is sufficient for almost all ANN applications. Indeed, neural information processing in the brain seems to operate with very low precision (perhaps less than 10% [p. 378 in 50]), for which it compensates with massive parallelism. For example, by coarse coding a population of low-precision devices can represent information with relatively high precision [pp. 91–96 in 74; 75].

Scaling

An important aspect of analog computing is scaling, which is used to adjust a problem to an analog computer. First is time scaling, which adjusts a problem to the characteristic time scale at which a computer operates, a consequence of its design and the physical processes by which it is realized [pp. 37–44 in 62; pp. 262–263 in 70; pp. 241–243 in 89]. For example, we might want a simulation to proceed on a very different time scale from the primary system. Thus a weather or economic simulation should proceed faster than real time in order to get useful predictions. Conversely, we might want to slow down a simulation of protein folding so that we can observe the stages in the process. Also, for accurate results it is necessary to avoid exceeding the maximum response rate of the analog devices, which might dictate a slower simulation speed. On the other hand, too slow a computation might be inaccurate as a consequence of instability (e.g., drift and leakage in the integrators). Time scaling affects only time-dependent operations such as integration. For example, suppose t, time in the primary system or "problem time", is related to τ, time in the computer, by τ = βt.
Therefore, an integration u(t) = ∫₀ᵗ v(t′) dt′ in the primary system is replaced by the integration u(τ) = β⁻¹ ∫₀^τ v(τ′) dτ′ on the computer. Thus time scaling may be accomplished simply by decreasing the input gain to the integrator by a factor of β. Fundamental to analog computation is the representation of a continuous quantity in the primary system by a continuous quantity in the computer. For example, a displacement x in meters might be represented by a potential V in volts. The two are related by an amplitude or magnitude scale factor, V = αx (with units volts/meter), chosen to meet two criteria [pp. 103–106 in 2; Chap. 4 in 62; pp. 127–128 in 70; pp. 233–240 in 89]. On the one hand, α must be sufficiently small so that the range of the problem variable is accommodated within the range of values supported by the computing device. Exceeding the device's intended operating range may lead to inaccurate results (e.g., forcing a linear device into nonlinear behavior). On the other hand, the scale factor should not be too small, or relevant variation in the problem variable will be less than the resolution of the device, also leading to inaccuracy. (Recall that precision is specified as a fraction of full-range variation.) In addition to the explicit variables of the primary system, there are implicit variables, such as the time derivatives of the explicit variables, and scale factors must be chosen for them too. For example, in addition to displacement x, a problem might include velocity ẋ and acceleration ẍ. Therefore, scale factors α, α′, and α″ must be chosen so that αx, α′ẋ, and α″ẍ have an appropriate range of variation (neither too large nor too small). Once a scale factor has been chosen, the primary system equations are adjusted to obtain the analog computing equations. For example, if we have scaled u = αx and v = α′ẋ, then the integration x(t) = ∫₀ᵗ ẋ(t′) dt′ would be computed by the scaled equation

  u(t) = (α/α′) ∫₀ᵗ v(t′) dt′ .

This is accomplished by simply setting the input gain of the integrator to α/α′. In practice, time scaling and magnitude scaling are not independent [p. 262 in 70]. For example, if the derivatives of a variable can be large, then the variable can change rapidly, and so it may be necessary to slow down the computation to avoid exceeding the high-frequency response of the computer. Conversely, small derivatives might require the computation to be run faster to avoid integrator leakage etc. Appropriate scale factors are determined by considering both the physics and the mathematics of the problem [pp. 40–44 in 62]. That is, first, the physics of the primary system may limit the ranges of the variables and their derivatives.
Second, analysis of the mathematical equations describing the system can give additional information on the ranges of the variables. For example, in some cases the natural frequency of a system can be estimated from the coefficients of the differential equations; the maximum of the nth derivative is then estimated as the nth power of this frequency [p. 42 in 62; pp. 238–240 in 89]. In any case, it is not necessary to have accurate values for the ranges; rough estimates giving orders of magnitude are adequate. It is tempting to think of magnitude scaling as a problem unique to analog computing, but before the invention of floating-point numbers it was also necessary in digital computer programming. In any case it is an essential aspect of analog computing, in which physical processes are used for computation more directly than they are in digital computing. Although the necessity of scaling has been a source of criticism, advocates for analog computing have argued that it is a blessing in disguise, because it leads to improved understanding of the primary system, which was often the goal of the computation in the first place [5; Chap. 8 in 82]. Practitioners of analog computing are more likely to have an intuitive understanding of both the primary system and its mathematical description (see Sect. "Analog Thinking").

Analog Computation in Nature

Computational processes—that is to say, information processing and control—occur in many living systems, most obviously in nervous systems, but also in the self-organized behavior of groups of organisms. In most cases natural computation is analog, either because it makes use of continuous natural processes, or because it makes use of discrete but stochastic processes. Several examples will be considered briefly.

Neural Computation

In the past, neurons were thought of as binary computing devices, something like digital logic gates. This was a consequence of the "all or nothing" response of a neuron, which refers to the fact that it does or does not generate an action potential (voltage spike) depending, respectively, on whether its total input exceeds a threshold or not (more accurately, it generates an action potential if the membrane depolarization at the axon hillock exceeds the threshold and the neuron is not in its refractory period). Certainly some neurons (e.g., so-called "command neurons") do act something like logic gates. However, most neurons are analyzed better as analog devices, because the rate of impulse generation represents significant information.
In particular, an amplitude code, the membrane potential near the axon hillock (which is a summation of the electrical influences on the neuron), is translated into a rate code for more reliable long-distance transmission along the axons. Nevertheless, the code is low precision (about one digit), since information theory shows that it takes at least N milliseconds (and probably more like 5N ms) to discriminate N values [39]. The rate code is translated back to an amplitude code by the synapses, since successive impulses release neurotransmitter from the axon terminal, which diffuses across the synaptic cleft to receptors. Thus a synapse acts as a leaky integrator to time-average the impulses. As previously discussed (Sect. "Continuous State Space"), many artificial neural net models have real-valued
neural activities, which correspond to rate-encoded axonal signals of biological neurons. On the other hand, these models typically treat the input connections as simple real-valued weights, which ignores the analog signal processing that takes place in the dendritic trees of biological neurons. The dendritic trees of many neurons are complex structures, which often have thousands of synaptic inputs. The binding of neurotransmitters to receptors causes minute voltage fluctuations, which propagate along the membrane and ultimately cause voltage fluctuations at the axon hillock, which influence the impulse rate. Since the dendrites have both resistance and capacitance, to a first approximation the signal propagation is described by the "cable equations", which describe passive signal propagation in cables of specified diameter, capacitance, and resistance [Chap. 1 in 1]. Therefore, to a first approximation, a neuron's dendritic net operates as an adaptive linear analog filter with thousands of inputs, and so it is capable of quite complex signal processing. More accurately, however, it must be treated as a nonlinear analog filter, since voltage-gated ion channels introduce nonlinear effects. The extent of analog signal processing in dendritic trees is still poorly understood. In most cases, then, neural information processing is treated best as low-precision analog computation. Although individual neurons have quite broadly tuned responses, accuracy in perception and sensorimotor control is achieved through coarse coding, as already discussed (Sect. "Characteristics of Analog Computation"). Further, one widely used neural representation is the cortical map, in which neurons are systematically arranged in accord with one or more dimensions of their stimulus space, so that stimuli are represented by patterns of activity over the map.
(Examples are tonotopic maps, in which pitch is mapped to cortical location, and retinotopic maps, in which cortical location represents retinal location.) Since neural density in the cortex is at least 146 000 neurons per square millimeter [p. 51 in 14], even relatively small cortical maps can be treated as fields, and information processing in them as analog field computation. Overall, the brain demonstrates what can be accomplished by massively parallel analog computation, even if the individual devices are comparatively slow and of low precision.

Adaptive Self-Organization in Social Insects

Another example of analog computation in nature is provided by the self-organizing behavior of social insects, microorganisms, and other populations [13]. Often such organisms respond to concentrations, or gradients in the concentrations, of chemicals produced by other members
of the population. These chemicals may be deposited and diffuse through the environment. In other cases, insects and other organisms communicate by contact, but may maintain estimates of the relative proportions of different kinds of contacts. Because the quantities are effectively continuous, all these are examples of analog control and computation. Self-organizing populations provide many informative examples of the use of natural processes for analog information processing and control. For example, diffusion of pheromones is a common means of self-organization in insect colonies, facilitating the creation of paths to resources, the construction of nests, and many other functions [13]. Real diffusion (as opposed to sequential simulations of it) executes, in effect, a massively parallel search of paths from the chemical's source to its recipients and allows the identification of near-optimal paths. Furthermore, if the chemical degrades, as is generally the case, then the system will be adaptive, in effect continually searching out the shortest paths, so long as the source continues to function [13]. Simulated diffusion has been applied to robot path planning [32,69].

Genetic Circuits

Another example of natural analog computing is provided by the genetic regulatory networks that control the behavior of cells, in multicellular organisms as well as single-celled ones [16]. These networks are defined by the mutually interdependent regulatory genes, promoters, and repressors that control the internal and external behavior of a cell. The interdependencies are mediated by proteins, the synthesis of which is governed by genes, and which in turn regulate the synthesis of other gene products (or themselves). Since it is the quantities of these substances that are relevant, many of the regulatory motifs can be described in computational terms as adders, subtracters, integrators, etc. Thus the genetic regulatory network implements an analog control system for the cell [68].
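The view of regulatory motifs as analog computing elements can be made concrete. In the minimal sketch below, all rate constants, the Hill-function form, and the function names are my own illustrative assumptions, not from the source: protein synthesis is driven by the binding probability of an activator, and first-order decay makes the motif behave as a leaky integrator whose steady state encodes the regulatory input.

```python
def hill(a, K=1.0, n=2):
    """Probability-like binding of the activator to the regulatory site,
    as a function of activator concentration a (K and n are made up)."""
    return a**n / (K**n + a**n)

def simulate_gene(activator, k_syn=2.0, k_dec=1.0, dt=0.01, T=10.0):
    """Leaky-integrator dynamics dp/dt = k_syn*hill(activator) - k_dec*p,
    Euler-integrated; returns the protein level p at time T."""
    p, t = 0.0, 0.0
    while t < T:
        p += dt * (k_syn * hill(activator) - k_dec * p)
        t += dt
    return p
```

At steady state p ≈ k_syn · hill(a) / k_dec, so the protein level continuously encodes the activator concentration, in line with the "adders and integrators" description above.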
It might be argued that the number of intracellular molecules of a particular protein is a (relatively small) discrete number, and therefore that it is inaccurate to treat it as a continuous quantity. However, the molecular processes in the cell are stochastic, and so the relevant quantity is the probability that a regulatory protein will bind to a regulatory site. Further, the processes take place in continuous real time, and so the rates are generally the significant quantities. Finally, although in some cases gene activity is either on or off (more accurately: very low), in other cases it varies continuously between these extremes [pp. 388–390 in 29].
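The reading of regulatory motifs as adders and integrators can be made concrete with a toy model. The sketch below treats a mutual-repression motif as a pair of coupled ODEs integrated by forward Euler; the Hill exponent and rate constants are illustrative choices, not values from the cited literature.

```python
# Sketch: a two-gene mutual-repression motif treated as an analog circuit.
# Each equation acts as an "integrator" whose input is a nonlinear (Hill)
# function of the other gene's product. All constants are illustrative.

def toggle_switch(x0, y0, a=10.0, n=2, dt=0.01, steps=5000):
    x, y = x0, y0
    for _ in range(steps):
        dx = a / (1.0 + y**n) - x   # synthesis repressed by y, first-order decay
        dy = a / (1.0 + x**n) - y   # synthesis repressed by x, first-order decay
        x += dt * dx                # forward-Euler stand-in for the continuous
        y += dt * dy                # integration the cell performs in real time
    return x, y

# A small asymmetry in the initial state is amplified into one of two
# stable outcomes; the continuous quantities settle, they are not clocked.
hi, lo = toggle_switch(1.0, 0.0)
lo2, hi2 = toggle_switch(0.0, 1.0)
```

With these parameters the dynamics are bistable, so the motif behaves as an analog counterpart of a latch: whichever gene starts slightly ahead ends up fully expressed while the other is repressed to a low level.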
Embryological development combines the analog control of individual cells with the sort of self-organization of populations seen in social insects and other colonial organisms. Locomotion of the cells and the expression of specific genes are controlled by chemical signals, among other mechanisms [16,17]. Thus PDEs have proved useful in explaining some aspects of development; for example, reaction-diffusion equations have been used to describe the formation of hair-coat patterns and related phenomena [13,48,57]; see Reaction-Diffusion Computing. Therefore the developmental process is governed by naturally occurring analog computation.

Is Everything a Computer?

It might seem that any continuous physical process could be viewed as analog computation, which would make the term almost meaningless. As the question has been put, is it meaningful (or useful) to say that the solar system is computing Kepler's laws? In fact, it is possible and worthwhile to make a distinction between computation and other physical processes that happen to be described by mathematical laws [40,41,44,46]. If we recall the original meaning of analog computation (Sect. "Definition of the Subject"), we see that the computational system is used to solve some mathematical problem with respect to a primary system. What makes this possible is that the computational system and the primary system have the same, or systematically related, abstract (mathematical) structures. Thus the computational system can inform us about the primary system, or be used to control it, etc. Although from a practical standpoint some analogs are better than others, in principle any physical system can be used that obeys the same equations as the primary system. Based on these considerations we may define computation as a physical process the purpose of which is the abstract manipulation of abstract objects (i.e., information processing); this definition applies to analog, digital, and hybrid computation [40,41,44,46].
Therefore, to determine if a natural system is computational we need to look to its purpose or function within the context of the living system of which it is a part. One test of whether its function is the abstract manipulation of abstract objects is to ask whether it could still fulfill its function if realized by different physical processes, a property called multiple realizability. (Similarly, in artificial systems, a simulation of the economy might be realized equally accurately by a hydraulic analog computer or an electronic analog computer [5].) By this standard, the majority of the nervous system is purely computational; in principle it could be
replaced by electronic devices obeying the same differential equations. In the other cases we have considered (self-organization of living populations, genetic circuits) there are instances of both pure computation and computation mixed with other functions (for example, where the specific substances used have other, e.g. metabolic, roles in the living system).

General-Purpose Analog Computation

The Importance of General-Purpose Computers

Although special-purpose analog and digital computers have been developed, and continue to be developed, for many purposes, the importance of general-purpose computers, which can be adapted easily for a wide variety of purposes, has been recognized since at least the nineteenth century. Babbage's plans for a general-purpose digital computer, his analytical engine (1835), are well known, but a general-purpose differential analyzer was advocated by Kelvin [84]. Practical general-purpose analog and digital computers were first developed at about the same time: from the early 1930s through the war years. General-purpose computers of both kinds permit the prototyping of special-purpose computers and, more importantly, permit the flexible reuse of computer hardware for different or evolving purposes. The concept of a general-purpose computer is useful also for determining the limits of a computing paradigm. If one can design, theoretically or practically, a universal computer, that is, a general-purpose computer capable of simulating any computer in a relevant class, then anything uncomputable by the universal computer will also be uncomputable by any computer in that class. This is, of course, the approach used to show that certain functions are uncomputable by any Turing machine because they are uncomputable by a universal Turing machine. For the same reason, the concept of general-purpose analog computers, and in particular of universal analog computers, is theoretically important for establishing limits to analog computation.
General-Purpose Electronic Analog Computers

Before taking up these theoretical issues, it is worth recalling that a typical electronic GPAC would include linear elements, such as adders, subtracters, constant multipliers, integrators, and differentiators; nonlinear elements, such as variable multipliers and function generators; other computational elements, such as comparators, noise generators, and delay elements (Sect. "Electronic Analog Computation in the 20th Century"). These are, of
course, in addition to input/output devices, which would not affect its computational abilities.

Shannon's Analysis

Claude Shannon did an important analysis of the computational capabilities of the differential analyzer, which applies to many GPACs [76,77]. He considered an abstract differential analyzer equipped with an unlimited number of integrators, adders, constant multipliers, and function generators (for functions with only a finite number of finite discontinuities), with at most one source of drive (which limits possible interconnections between units). This was based on prior work that had shown that almost all the generally used elementary functions could be generated with addition and integration. We will summarize informally a few of Shannon's results; for details, please consult the original paper. First Shannon offers proofs that, by setting up the correct ODEs, a GPAC with the mentioned facilities can generate a function if and only if it is not hypertranscendental (Theorem II); thus the GPAC can generate any differentially algebraic function (a very large class, including the familiar algebraic and transcendental functions), but not, for example, Euler's gamma function or Riemann's zeta function. He also shows that the GPAC can generate functions derived from generable functions, such as the integrals, derivatives, inverses, and compositions of generable functions (Thms. III, IV). These results can be generalized to functions of any number of variables, and to their compositions, partial derivatives, and inverses with respect to any one variable (Thms. VI, VII, IX, X). Next Shannon shows that a function of any number of variables that is continuous over a closed region of space can be approximated arbitrarily closely over that region with a finite number of adders and integrators (Thms. V, VIII). Shannon then turns from the generation of functions to the solution of ODEs and shows that the GPAC can solve any system of ODEs defined in terms of non-hypertranscendental functions (Thm. XI).
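Shannon's starting point, that the commonly used elementary functions arise from addition and integration alone, can be illustrated by the classic two-integrator loop that generates sine and cosine. The discrete Euler step below is only a stand-in for the continuous integration a differential analyzer performs.

```python
import math

# Two integrators wired in a feedback loop generate sin and cos:
#   y1' = y2,  y2' = -y1,  with y1(0) = 0, y2(0) = 1,
# so that y1(t) = sin t and y2(t) = cos t. On a differential analyzer
# these are two integrator units connected to each other; here we must
# discretize time, which a real GPAC does not.

def gpac_sine(t_end, dt=1e-4):
    y1, y2 = 0.0, 1.0                         # initial "settings"
    t = 0.0
    while t < t_end:
        y1, y2 = y1 + dt * y2, y2 - dt * y1   # simultaneous integrator update
        t += dt
    return y1                                  # y1(t_end) ~= sin(t_end)

approx = gpac_sine(math.pi / 2)
exact = math.sin(math.pi / 2)
```

No function generator is needed: the sine emerges from the interconnection of integrators, which is exactly the sense in which such functions are "generable".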
Finally, Shannon addresses a question that might seem of limited interest, but turns out to be relevant to the computational power of analog computers (see Sect. "Analog Computation and the Turing Limit" below). To understand it we must recall that he was investigating the differential analyzer, a mechanical analog computer, but similar issues arise in other analog computing technologies. The question is whether it is possible to perform an arbitrary constant multiplication, u = kv, by means of gear ratios. He shows that if we have just two gear ratios a and b (a, b ≠ 0, 1), such that b is not a rational power of a, then
by combinations of these gears we can approximate k arbitrarily closely (Thm. XII). That is, to approximate multiplication by arbitrary real numbers, it is sufficient to be able to multiply by a, b, and their inverses, provided a and b are not related by a rational power. Shannon mentions an alternative method of constant multiplication, which uses integration, kv = ∫₀^v k dv, but this requires setting the integrand to the constant function k. Therefore, multiplying by an arbitrary real number requires the ability to input an arbitrary real as the integrand. The issue of real-valued inputs and outputs to analog computers is relevant both to their theoretical power and to practical matters of their application (see Sect. "Real-Valued Inputs, Outputs, and Constants"). Shannon's proofs, which were incomplete, were eventually refined by M. Pour-El [63] and finally corrected by L. Lipshitz and L.A. Rubel [35]. Rubel [72] proved that Shannon's GPAC cannot solve the Dirichlet problem for Laplace's equation on the disk; indeed, it is limited to initial-value problems for algebraic ODEs. Specifically, the Shannon–Pour-El Thesis is that the outputs of the GPAC are exactly the solutions of the algebraic differential equations, that is, equations of the form

P[x, y(x), y'(x), y''(x), ..., y^(n)(x)] = 0,

where P is a polynomial that is not identically vanishing in any of its variables (these are the differentially algebraic functions) [71]. (For details please consult the cited papers.) The limitations of Shannon's GPAC motivated Rubel's definition of the Extended Analog Computer.

Rubel's Extended Analog Computer

The combination of Rubel's [71] conviction that the brain is an analog computer together with the limitations of Shannon's GPAC led him to propose the Extended Analog Computer (EAC) [73]. Like Shannon's GPAC (and the Turing machine), the EAC is a conceptual computer intended to facilitate theoretical investigation of the limits of a class of computers.
The EAC extends the GPAC in a number of respects. For example, whereas the GPAC solves equations defined over a single variable (time), the EAC can generate functions over any finite number of real variables. Further, whereas the GPAC is restricted to initial-value problems for ODEs, the EAC solves both initial- and boundary-value problems for a variety of PDEs. The EAC is structured into a series of levels, each more powerful than the ones below it, from which it accepts inputs. The inputs to the lowest level are a finite number of real variables ("settings"). At this level it operates on real
polynomials, from which it is able to generate the differentially algebraic functions. The computing on each level is accomplished by conceptual analog devices, which include constant real-number generators, adders, multipliers, differentiators, "substituters" (for function composition), devices for analytic continuation, and inverters, which solve systems of equations defined over functions generated by the lower levels. Most characteristic of the EAC is the "boundary-value-problem box", which solves systems of PDEs and ODEs subject to boundary conditions and other constraints. The PDEs are defined in terms of functions generated by the lower levels. Such PDE solvers may seem implausible, and so it is important to recall that field-computing devices for this purpose were implemented in some practical analog computers (see Sect. "History") and more recently in Mills' EAC [54]. As Rubel observed, PDE solvers could be implemented by physical processes that obey the same PDEs (heat equation, wave equation, etc.). (See also Sect. "Future Directions" below.) Finally, the EAC is required to be "extremely well-posed", which means that each level is relatively insensitive to perturbations in its inputs; thus "all the outputs depend in a strongly deterministic and stable way on the initial settings of the machine" [73]. Rubel [73] proves that the EAC can compute everything that the GPAC can compute, but also such functions as the gamma and zeta functions, and that it can solve the Dirichlet problem for Laplace's equation on the disk, all of which are beyond the GPAC's capabilities. Further, whereas the GPAC can compute differentially algebraic functions of time, the EAC can compute differentially algebraic functions of any finite number of real variables. In fact, Rubel did not find any real-analytic (C^∞) function that is not computable on the EAC, but he observes that if the EAC can indeed generate every real-analytic function, it would be too broad to be useful as a model of analog computation.
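To make the contrast concrete: the exponential and sine functions satisfy algebraic differential equations of the form given by the Shannon–Pour-El Thesis, and so lie within the GPAC's reach, whereas by Hölder's theorem the gamma function satisfies no such equation (it is hypertranscendental) and thus separates the EAC from the GPAC.

```latex
% Differentially algebraic examples (GPAC-generable):
%   y = e^x     satisfies   P[x, y, y']      = y' - y  = 0
%   y = \sin x  satisfies   P[x, y, y', y''] = y'' + y = 0
% Hypertranscendental (EAC-computable but not GPAC-generable):
%   Holder's theorem: \Gamma(x) satisfies no equation
%     P[x, y, y', \ldots, y^{(n)}] = 0
%   with P a nonzero polynomial.
```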
Analog Computation and the Turing Limit

Introduction

The Church–Turing Thesis asserts that anything that is effectively computable is computable by a Turing machine, but the Turing machine and equivalent models, such as the lambda calculus, are models of discrete computation, and so it is natural to wonder how analog computing compares in power, and in particular whether it can compute beyond the "Turing limit". Superficial answers are easy to obtain, but the issue is subtle because it depends upon choices among definitions, none of which is obviously correct, it involves the foundations of mathematics and its
philosophy, and it raises epistemological issues about the role of models in scientific theories. Nevertheless this is an active research area, but many of the results are apparently inconsistent because of the differing assumptions on which they are based. Therefore this section will be limited to a mention of a few of the interesting results, without attempting a comprehensive, systematic, or detailed survey; Siegelmann [78] can serve as an introduction to the literature.

A Sampling of Theoretical Results

Continuous-Time Models

P. Orponen's [59] 1997 survey of continuous-time computation theory is a good introduction to the literature as of that time; here we give a sample of these and more recent results. There are several results showing that, under various assumptions, analog computers have at least the power of Turing machines (TMs). For example, M.S. Branicky [11] showed that a TM could be simulated by ODEs, but he used non-differentiable functions; O. Bournez et al. [8] provide an alternative construction using only analytic functions. They also prove that GPAC computability coincides with (Turing-)computable analysis, which is surprising, since the gamma function is Turing-computable but, as we have seen, the GPAC cannot generate it. The paradox is resolved by a distinction between generating a function and computing it, with the latter, broader notion permitting convergent computation of the function (that is, as t → ∞). However, the computational power of general ODEs has not been determined in general [p. 149 in 78]. M.B. Pour-El and I. Richards exhibit a Turing-computable ODE that does not have a Turing-computable solution [64,66]. M. Stannett [83] also defined a continuous-time analog computer that could solve the halting problem. C. Moore [55] defines a class of continuous-time recursive functions over the reals, which includes a zero-finding operator μ.
Functions can be classified into a hierarchy depending on the number of uses of μ, with the lowest level (no μs) corresponding approximately to Shannon's GPAC. Higher levels can compute non-Turing-computable functions, such as the decision procedure for the halting problem, but he questions whether this result is relevant in the physical world, which is constrained by "noise, quantum effects, finite accuracy, and limited resources". O. Bournez and M. Cosnard [9] have extended these results and shown that many dynamical systems have super-Turing power. S. Omohundro [58] showed that a system of ten coupled nonlinear PDEs could simulate an arbitrary cellular automaton (see Mathematical Basis of Cellular Automata, Introduction to), which implies that PDEs have at least Turing power. Further, D. Wolpert and B.J. MacLennan [90,91] showed that any TM can be simulated by a field computer with linear dynamics, but the construction uses Dirac delta functions. Pour-El and Richards exhibit a wave equation in three-dimensional space with Turing-computable initial conditions, but for which the unique solution is Turing-uncomputable [65,66].

Sequential-Time Models

We will mention a few of the results that have been obtained concerning the power of sequential-time analog computation. Although the BSS model has been investigated extensively, its power has not been completely determined [6,7]. It is known to depend on whether just rational numbers or arbitrary real numbers are allowed in its programs [p. 148 in 78]. A coupled map lattice (CML) is a cellular automaton with real-valued states (see Mathematical Basis of Cellular Automata, Introduction to); it is a sequential-time analog computer, which can be considered a discrete-space approximation to a simple sequential-time field computer. P. Orponen and M. Matamala [60] showed that a finite CML can simulate a universal Turing machine. However, since a CML can simulate a BSS program or a recurrent neural network (see Sect. "Recurrent Neural Networks" below), it actually has super-Turing power [p. 149 in 78]. Recurrent neural networks are some of the most important examples of sequential analog computers, and so the following section is devoted to them.

Recurrent Neural Networks

With the renewed interest in neural networks in the mid-1980s, many investigators wondered whether recurrent neural nets have super-Turing power. M. Garzon and S. Franklin showed that a sequential-time net with a countable infinity of neurons could exceed Turing power [21,23,24]. Indeed, Siegelmann and E.D. Sontag [80] showed that finite neural nets with real-valued weights have super-Turing power, but W.
Maass and Sontag [36] showed that recurrent nets with Gaussian or similar noise had sub-Turing power, illustrating again the dependence of these results on assumptions about what is a reasonable mathematical idealization of analog computing. For recent results on recurrent neural networks, we will restrict our attention to the work of Siegelmann [78], who addresses the computational power of these networks in terms of the classes of languages they can recognize. Without loss of generality the languages are restricted to sets of binary strings. A string to be tested is fed to the network one bit at a time, along with an input that indicates when the end of the input string has been reached. The network is said to decide whether the string is in the language if it correctly indicates whether it is in the set or not, after some finite number of sequential steps since input began. Siegelmann shows that, if exponential time is allowed for recognition, finite recurrent neural networks with real-valued weights (and saturated-linear activation functions) can compute all languages, and thus they are more powerful than Turing machines. Similarly, stochastic networks with rational weights also have super-Turing power, although less power than the deterministic nets with real weights. (Specifically, they compute P/poly and BPP/log respectively; see Siegelmann [Chaps. 4, 9 in 78] for details.) She further argues that these neural networks serve as a "standard model" of (sequential) analog computation (comparable to Turing machines in Church–Turing computation), and therefore that the limits and capabilities of these nets apply to sequential analog computation generally. Siegelmann [p. 156 in 78] observes that the super-Turing power of recurrent neural networks is a consequence of their use of non-rational real-valued weights. In effect, a real number can contain an infinite number of bits of information. This raises the question of how the non-rational weights of a network can ever be set, since it is not possible to define a physical quantity with infinite precision. However, although non-rational weights may not be able to be set from outside the network, they can be computed within the network by learning algorithms, which are analog computations. Thus, Siegelmann suggests, the fundamental distinction may be between static computational models, such as the Turing machine and its equivalents, and dynamically evolving computational models, which can tune continuously variable parameters and thereby achieve super-Turing power.
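The observation that a single real-valued weight can hold unboundedly many bits is easy to demonstrate. The sketch below uses a base-4, Cantor-style encoding, similar in spirit to (though much simpler than) the published real-weight network constructions, and recovers the bits using only affine maps and the saturated-linear activation mentioned above.

```python
# A single real number as an unbounded bit store: encode bits b_1 b_2 ...
# as w = sum_i (2*b_i + 1) / 4**i, a Cantor-set style encoding. This is an
# illustration in the spirit of the real-weight constructions, not the
# exact published construction.

def encode(bits):
    return sum((2 * b + 1) / 4**(i + 1) for i, b in enumerate(bits))

def sat(x):
    """Saturated-linear activation: identity clipped to [0, 1]."""
    return min(1.0, max(0.0, x))

def decode(w, n):
    """Recover n bits from w using only affine maps plus saturation."""
    out = []
    for _ in range(n):
        bit = int(sat(4 * w - 2))        # leading base-4 digit is 3 iff bit = 1
        out.append(bit)
        w = 4 * w - (2 * bit + 1)        # shift the consumed digit out
    return out

msg = [1, 0, 1, 1, 0, 0, 1]
w = encode(msg)                           # one real number holds the string
recovered = decode(w, len(msg))
```

Because every step is an affine map followed by saturation, the decoding loop is exactly the kind of update a recurrent net with saturated-linear units can perform; an infinite bit stream encoded this way would simply never exhaust w.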
Dissipative Models

Beyond the issue of the power of analog computing relative to the Turing limit, there are also questions of its relative efficiency. For example, could analog computing solve NP-hard problems in polynomial or even linear time? In traditional computational complexity theory, efficiency issues are addressed in terms of the asymptotic number of computation steps to compute a function as the size of the function's input increases. One way to address corresponding issues in an analog context is by treating an analog computation as a dissipative system, which in this context means a system that decreases some quantity (analogous to energy) so that the system state converges to a point attractor. From this perspective, the initial state of the system incorporates the input
to the computation, and the attractor represents its output. Therefore, H.T. Siegelmann, S. Fishman, and A. Ben-Hur have developed a complexity theory for dissipative systems, in both sequential and continuous time, which addresses the rate of convergence in terms of the underlying rates of the system [4,79]. The relation between dissipative complexity classes (e.g., P_d, NP_d) and corresponding classical complexity classes (P, NP) remains unclear [p. 151 in 78].

Real-Valued Inputs, Outputs, and Constants

A common argument, with relevance to the theoretical power of analog computation, is that an input to an analog computer must be determined by setting a dial to a number or by typing a number into a digital-to-analog conversion device, and therefore that the input will be a rational number. The same argument applies to any internal constants in the analog computation. Similarly, it is argued, any output from an analog computer must be measured, and the accuracy of measurement is limited, so that the result will be a rational number. Therefore, it is claimed, real numbers are irrelevant to analog computing, since any practical analog computer computes a function from the rationals to the rationals, and can therefore be simulated by a Turing machine. (See related arguments by Martin Davis [18,19].) There are a number of interrelated issues here, which may be considered briefly. First, the argument is couched in terms of the input or output of digital representations, and the numbers so represented are necessarily rational (more generally, computable). This seems natural enough when we think of an analog computer as a calculating device, and in fact many historical analog computers were used in this way and had digital inputs and outputs (since this is our most reliable way of recording and reproducing quantities).
However, in many analog control systems, the inputs and outputs are continuous physical quantities that vary continuously in time (also a continuous physical quantity); that is, according to current physical theory, these quantities are real numbers, which vary according to differential equations. It is worth recalling that physical quantities are neither rational nor irrational; they can be so classified only in comparison with each other or with respect to a unit, that is, only if they are measured and digitally represented. Furthermore, physical quantities are neither computable nor uncomputable (in a Church–Turing sense); these terms apply only to discrete representations of these quantities (i.e., to numerals or other digital representations).
Therefore, in accord with ordinary mathematical descriptions of physical processes, analog computations can be treated as having arbitrary real numbers (in some range) as inputs, outputs, or internal states; like other continuous processes, continuous-time analog computations pass through all the reals in some range, including non-Turing-computable reals. Paradoxically, however, these same physical processes can be simulated on digital computers.

The Issue of Simulation by Turing Machines and Digital Computers

Theoretical results about the computational power, relative to Turing machines, of neural networks and other analog models of computation raise difficult issues, some of which are epistemological rather than strictly technical. On the one hand, we have a series of theoretical results proving the super-Turing power of analog computation models of various kinds. On the other hand, we have the obvious fact that neural nets are routinely simulated on ordinary digital computers, which have at most the power of Turing machines. Furthermore, it is reasonable to suppose that any physical process that might be used to realize analog computation, and certainly the known processes, could be simulated on a digital computer, as is done routinely in computational science. This would seem to be incontrovertible proof that analog computation is no more powerful than Turing machines. The crux of the paradox lies, of course, in the non-Turing-computable reals. These numbers are a familiar, accepted, and necessary part of standard mathematics, in which physical theory is formulated, but from the standpoint of Church–Turing (CT) computation they do not exist. This suggests that the paradox is not a contradiction, but reflects a divergence between the goals and assumptions of the two models of computation.
The Problem of Models of Computation

These issues may be put in context by recalling that the Church–Turing (CT) model of computation is in fact a model, and therefore that it has the limitations of all models. A model is a cognitive tool that improves our ability to understand some class of phenomena by preserving relevant characteristics of the phenomena while altering other, irrelevant (or less relevant) characteristics. For example, a scale model alters the size (taken to be irrelevant) while preserving shape and other characteristics. Often a model achieves its purposes by making simplifying or idealizing assumptions, which facilitate analysis or simulation of the system. For example, we may use a linear mathematical model of a physical process that is only approximately linear. For a model to be effective it must preserve characteristics and make simplifying assumptions that are appropriate to the domain of questions it is intended to answer, its frame of relevance [46]. If a model is applied to problems outside of its frame of relevance, then it may give answers that are misleading or incorrect, because they depend more on the simplifying assumptions than on the phenomena being modeled. Therefore we must be especially cautious in applying a model outside of its frame of relevance, or even at the limits of its frame, where the simplifying assumptions become progressively less appropriate. The problem is aggravated by the fact that often the frame of relevance is not explicitly defined, but resides in a tacit background of practices and skills within some discipline. Therefore, to determine the applicability of the CT model of computation to analog computing, we must consider the frame of relevance of the CT model. This is easiest if we recall the domain of issues and questions it was originally developed to address: issues of effective calculability and derivability in formalized mathematics. This frame of relevance determines many of the assumptions of the CT model, for example, that information is represented by finite discrete structures of symbols from a finite alphabet, that information processing proceeds by the application of definite formal rules at discrete instants of time, and that a computational or derivational process must be completed in a finite number of these steps (see MacLennan [45,46] for a more detailed discussion of the frame of relevance of the CT model). Many of these assumptions are incompatible with analog computing and with the frames of relevance of many models of analog computation.

Relevant Issues for Analog Computation

Analog computation is often used for control. Historically, analog computers were used in control systems and to simulate control systems, but contemporary analog VLSI is also frequently applied in control. Natural analog computation also frequently serves a control function, for example, sensorimotor control by the nervous system, genetic regulation in cells, and self-organized cooperation in insect colonies. Therefore, control systems provide one frame of relevance for models of analog computation. In this frame of relevance real-time response is a critical issue, which models of analog computation, therefore, ought to be able to address. Thus it is necessary to be able to relate the speed and frequency response of analog computation to the rates of the physical processes by which the computation is realized. Traditional methods of algorithm
analysis, which are based on sequential time and asymptotic behavior, are inadequate in this frame of relevance. On the one hand, the constants (time scale factors), which reflect the underlying rate of computation, are absolutely critical (but ignored in asymptotic analysis); on the other hand, in control applications the asymptotic behavior of algorithms is generally irrelevant, since the inputs are typically fixed in size or of a limited range of sizes. The CT model of computation is oriented around the idea that the purpose of a computation is to evaluate a mathematical function. Therefore the basic criterion of adequacy for a computation is correctness, that is, that given a precise representation of an input to the function, it will produce (after finitely many steps) a precise representation of the corresponding output of the function. In the context of natural computation and control, however, other criteria may be equally or even more relevant. For example, robustness is important: how well does the system respond in the presence of noise, uncertainty, imprecision, and error, which are unavoidable in real natural and artificial control systems, and how well does it respond to defects and damage, which arise in many natural and artificial contexts? Since the real world is unpredictable, flexibility is also important: how well does an artificial system respond to inputs for which it was not designed, and how well does a natural system behave in situations outside the range of those to which it is evolutionarily adapted? Therefore, adaptability (through learning and other means) is another important issue in this frame of relevance (see MacLennan [45,46] for a more detailed discussion of the frames of relevance of natural computation and control).

Transcending Turing Computability

Thus we see that many applications of analog computation raise different questions from those addressed by the CT model of computation; the most useful models of analog computing will have a different frame of relevance. In order to address traditional questions such as whether analog computers can compute "beyond the Turing limit", or whether they can solve NP-hard problems in polynomial time, it is necessary to construct models of analog computation within the CT frame of relevance. Unfortunately, constructing such models requires making commitments about many issues (such as the representation of reals and the discretization of time) that may affect the answers to these questions, but are fundamentally unimportant in the frame of relevance of the most useful applications of the concept of analog computation. Therefore, being overly focused on traditional problems in the theory of computation (which was formulated for a different frame of relevance) may distract us from formulating models of analog computation that can address important issues in its own frame of relevance.

Analog Thinking

It will be worthwhile to say a few words about the cognitive implications of analog computing, which are a largely forgotten aspect of the analog vs. digital debates of the late 20th century. For example, it was argued that analog computing provides a deeper intuitive understanding of a system than the alternatives do [5, Chap. 8 in 82]. On the one hand, analog computers afforded a means of understanding analytically intractable systems by means of "dynamic models". By setting up an analog simulation, it was possible to vary the parameters and explore interactively the behavior of a dynamical system that could not be analyzed mathematically. Digital simulations, in contrast, were orders of magnitude slower and did not permit this kind of interactive investigation. (Performance has improved sufficiently in contemporary digital computers so that in many cases digital simulations can be used as dynamic models, sometimes with an interface that mimics an analog computer; see [5].) Analog computing is also relevant to the cognitive distinction between knowing how (procedural knowledge) and knowing that (factual knowledge) [Chap. 8 in 82]. The latter ("know-that") is more characteristic of scientific culture, which strives for generality and exactness, often by designing experiments that allow phenomena to be studied in isolation, whereas the former ("know-how") is more characteristic of engineering culture; at least it was so through the first half of the twentieth century, before the development of "engineering science" and the widespread use of analytic techniques in engineering education and practice. Engineers were faced with analytically intractable systems, with inexact measurements, and with empirical relationships (characteristic curves, etc.), all of which made analog computers attractive for solving engineering problems.
Furthermore, because analog computing made use of physical phenomena that were mathematically analogous to those in the primary system, the engineer’s intuition and understanding of one system could be transferred to the other. Some commentators have mourned the loss of hands-on intuitive understanding attendant on the increasingly scientific orientation of engineering education and the disappearance of analog computers [5,34,61,67]. I will mention one last cognitive issue relevant to the differences between analog and digital computing. As already discussed in Sect. “Characteristics of Analog Computation”, it is generally agreed that it is less expensive to achieve high precision with digital technology than with analog technology. Of course, high precision may not be important, for example when the available data are inexact or in natural computation. Further, some advocates of analog computing argue that high-precision digital results are often misleading [82, p. 261]. Precision does not imply accuracy, and the fact that an answer is displayed with 10 digits does not guarantee that it is accurate to 10 digits; in particular, engineering data may be known to only a few significant figures, and the accuracy of digital calculation may be limited by numerical problems. Therefore, users of digital computers might fall into the trap of trusting their apparently exact results, whereas users of modest-precision analog computers were more inclined to healthy skepticism about their computations. Or so it was claimed.

Future Directions

Certainly there are many purposes that are best served by digital technology; indeed there is a tendency nowadays to think that everything is done better digitally. Therefore it will be worthwhile to consider whether analog computation should have a role in future computing technologies. I will argue that the approaching end of Moore’s Law [56], which has predicted exponential growth in digital logic densities, will encourage the development of new analog computing technologies. Two avenues present themselves as ways toward greater computing power: faster individual computing elements and greater densities of computing elements. Greater density increases power by facilitating parallel computing, and by enabling greater computing power to be put into smaller packages.
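The claim that precision does not imply accuracy is easy to demonstrate in floating-point arithmetic. The snippet below is an illustrative aside, not an example from the article:

```python
# A digitally computed result can display many exact-looking digits and
# still be wrong. Two classic numerical traps in 64-bit floating point:

# 1. Absorption: near 1e16 the spacing between adjacent doubles is 2,
#    so adding 1 is silently lost.
print((1e16 + 1) - 1e16)   # 0.0, though the true value is 1

# 2. Catastrophic cancellation: subtracting nearly equal numbers destroys
#    most of the significant digits that the display still shows.
x = 1e-8
naive = 1 - (1 - x)        # mathematically equal to x
print(f"{naive:.17g}")     # 17 digits displayed, far fewer of them accurate
```

Ten displayed digits of the second result look as authoritative as the first, yet only the leading few are meaningful, which is exactly the trap described above.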
Other things being equal, the fewer the layers of implementation between the computational operations and the physical processes that realize them (that is, the more directly the physical processes implement the computations), the more quickly they will be able to proceed. Since most physical processes are continuous (defined by differential equations), analog computation is generally faster than digital. For example, we may compare analog addition, implemented directly by the additive combination of physical quantities, with the sequential process of digital addition. Similarly, other things being equal, the fewer physical devices required to implement a computational element, the greater will be the density of these elements. Therefore, in general, the closer the computational process is to the physical processes that realize it, the fewer devices will be required, and so the continuity of physical law suggests that analog computation has the potential for greater density than digital. For example, four transistors can realize analog addition, whereas many more are required for digital addition. Both considerations argue for an increasing role of analog computation in post-Moore’s Law computing. From this broad perspective, there are many physical phenomena that are potentially usable for future analog computing technologies. We seek phenomena that can be described by well-known and useful mathematical functions (e.g., addition, multiplication, exponential, logarithm, convolution). These descriptions do not need to be exact for the phenomena to be useful in many applications, for which limited range and precision are adequate. Furthermore, in some applications speed is not an important criterion; for example, in some control applications, small size, low power, robustness, etc. may be more important than speed, so long as the computer responds quickly enough to accomplish the control task. Of course there are many other considerations in determining whether given physical phenomena can be used for practical analog computation in a given application [47]. These include stability, controllability, manufacturability, and the ease of interfacing with input and output transducers and other devices. Nevertheless, in the post-Moore’s Law world, we will have to be willing to consider all physical phenomena as potential computing technologies, and in many cases we will find that analog computing is the most effective way to utilize them. Natural computation provides many examples of effective analog computation realized by relatively slow, low-precision operations, often through massive parallelism. Therefore, post-Moore’s Law computing has much to learn from the natural world.

Bibliography

Primary Literature

1. Anderson JA (1995) An Introduction to Neural Networks. MIT Press, Cambridge
2. Ashley JR (1963) Introduction to Analog Computing. Wiley, New York
3. Aspray W (1993) Edwin L.
Harder and the Anacom: Analog computing at Westinghouse. IEEE Ann Hist Comput 15(2):35–52
4. Ben-Hur A, Siegelmann HT, Fishman S (2002) A theory of complexity for continuous time systems. J Complex 18:51–86
5. Bissell CC (2004) A great disappearing act: The electronic analogue computer. In: IEEE Conference on the History of Electronics, Bletchley, June 2004, pp 28–30
6. Blum L, Cucker F, Shub M, Smale S (1998) Complexity and Real Computation. Springer, Berlin
7. Blum L, Shub M, Smale S (1988) On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines. Bull Am Math Soc 21:1–46
8. Bournez O, Campagnolo ML, Graça DS, Hainry E (2006) The General Purpose Analog Computer and computable analysis are two equivalent paradigms of analog computation. In: Theory and Applications of Models of Computation (TAMC 2006). Lecture Notes in Computer Science, vol 3959. Springer, Berlin, pp 631–643
9. Bournez O, Cosnard M (1996) On the computational power of dynamical systems and hybrid systems. Theor Comput Sci 168(2):417–459
10. Bowles MD (1996) US technological enthusiasm and British technological skepticism in the age of the analog brain. IEEE Ann Hist Comput 18(4):5–15
11. Branicky MS (1994) Analog computation with continuous ODEs. In: Proceedings IEEE Workshop on Physics and Computation, Dallas, pp 265–274
12. Brockett RW (1988) Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems. In: Proceedings 27th IEEE Conference on Decision and Control, Austin, December 1988, pp 799–803
13. Camazine S, Deneubourg JL, Franks NR, Sneyd J, Theraulaz G, Bonabeau E (2001) Self-Organization in Biological Systems. Princeton Univ. Press, New York
14. Changeux JP (1985) Neuronal Man: The Biology of Mind (trans: Garey LL). Oxford University Press, Oxford
15. Clymer AB (1993) The mechanical analog computers of Hannibal Ford and William Newell. IEEE Ann Hist Comput 15(2):19–34
16. Davidson EH (2006) The Regulatory Genome: Gene Regulatory Networks in Development and Evolution. Academic Press, Amsterdam
17. Davies JA (2005) Mechanisms of Morphogenesis. Elsevier, Amsterdam
18. Davis M (2004) The myth of hypercomputation. In: Teuscher C (ed) Alan Turing: Life and Legacy of a Great Thinker. Springer, Berlin, pp 195–212
19. Davis M (2006) Why there is no such discipline as hypercomputation. Appl Math Comput 178:4–7
20. Fakhraie SM, Smith KC (1997) VLSI-Compatible Implementation for Artificial Neural Networks. Kluwer, Boston
21. Franklin S, Garzon M (1990) Neural computability. In: Omidvar OM (ed) Progress in Neural Networks, vol 1. Ablex, Norwood, pp 127–145
22. Freeth T, Bitsakis Y, Moussas X, Seiradakis JH, Tselikas A, Mangou H, Zafeiropoulou M, Hadland R, Bate D, Ramsey A, Allen M, Crawley A, Hockley P, Malzbender T, Gelb D, Ambrisco W, Edmunds MG (2006) Decoding the ancient Greek astronomical calculator known as the Antikythera mechanism. Nature 444:587–591
23. Garzon M, Franklin S (1989) Neural computability II (extended abstract). In: Proceedings, IJCNN International Joint Conference on Neural Networks, vol 1. Institute of Electrical and Electronic Engineers, New York, pp 631–637
24. Garzon M, Franklin S (1990) Computation on graphs. In: Omidvar OM (ed) Progress in Neural Networks, vol 2. Ablex, Norwood
25. Goldstine HH (1972) The Computer from Pascal to von Neumann. Princeton University Press, Princeton
26. Grossberg S (1967) Nonlinear difference-differential equations
in prediction and learning theory. Proc Nat Acad Sci USA 58(4):1329–1334
27. Grossberg S (1973) Contour enhancement, short term memory, and constancies in reverberating neural networks. Stud Appl Math LII:213–257
28. Grossberg S (1976) Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol Cybern 23:121–134
29. Hartl DL (1994) Genetics, 3rd edn. Jones & Bartlett, Boston
30. Hopfield JJ (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proc Nat Acad Sci USA 81:3088–3092
31. Howe RM (1961) Design Fundamentals of Analog Computer Components. Van Nostrand, Princeton
32. Khatib O (1986) Real-time obstacle avoidance for manipulators and mobile robots. Int J Robot Res 5:90–99
33. Kirchhoff G (1845) Ueber den Durchgang eines elektrischen Stromes durch eine Ebene, insbesondere durch eine kreisförmige. Ann Phys Chem 140(64)(4):497–514
34. Lang GF (2000) Analog was not a computer trademark! Why would anyone write about analog computers in year 2000? Sound Vib 34(8):16–24
35. Lipshitz L, Rubel LA (1987) A differentially algebraic replacement theorem. Proc Am Math Soc 99(2):367–372
36. Maass W, Sontag ED (1999) Analog neural nets with Gaussian or other common noise distributions cannot recognize arbitrary regular languages. Neural Comput 11(3):771–782
37. MacLennan BJ (1987) Technology-independent design of neurocomputers: The universal field computer. In: Caudill M, Butler C (eds) Proceedings of the IEEE First International Conference on Neural Networks, vol 3. IEEE Press, pp 39–49
38. MacLennan BJ (1990) Field computation: A theoretical framework for massively parallel analog computation, parts I–IV. Technical Report CS-90-100, Department of Computer Science, University of Tennessee, Knoxville. Available from http://www.cs.utk.edu/~mclennan. Accessed 20 May 2008
39. MacLennan BJ (1991) Gabor representations of spatiotemporal visual images. Technical Report CS-91-144, Department of Computer Science, University of Tennessee, Knoxville. Available from http://www.cs.utk.edu/~mclennan. Accessed 20 May 2008
40. MacLennan BJ (1994) Continuous computation and the emergence of the discrete. In: Pribram KH (ed) Origins: Brain & Self-Organization. Lawrence Erlbaum, Hillsdale, pp 121–151
41. MacLennan BJ (1994) “Words lie in our way”. Minds Mach 4(4):421–437
42. MacLennan BJ (1995) Continuous formal systems: A unifying model in language and cognition. In: Proceedings of the IEEE Workshop on Architectures for Semiotic Modeling and Situation Analysis in Large Complex Systems, Monterey, August 1995, pp 161–172. Also available from http://www.cs.utk.edu/~mclennan. Accessed 20 May 2008
43. MacLennan BJ (1999) Field computation in natural and artificial intelligence. Inf Sci 119:73–89
44. MacLennan BJ (2001) Can differential equations compute? Technical Report UT-CS-01-459, Department of Computer Science, University of Tennessee, Knoxville. Available from http://www.cs.utk.edu/~mclennan. Accessed 20 May 2008
45. MacLennan BJ (2003) Transcending Turing computability. Minds Mach 13:3–22
46. MacLennan BJ (2004) Natural computation and non-Turing models of computation. Theor Comput Sci 317:115–145
47. MacLennan BJ (in press) Super-Turing or non-Turing? Extending the concept of computation. Int J Unconv Comput
48. Maini PK, Othmer HG (eds) (2001) Mathematical Models for Biological Pattern Formation. Springer, New York
49. Maziarz EA, Greenwood T (1968) Greek Mathematical Philosophy. Frederick Ungar, New York
50. McClelland JL, Rumelhart DE, the PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 2: Psychological and Biological Models. MIT Press, Cambridge
51. Mead C (1987) Silicon models of neural computation. In: Caudill M, Butler C (eds) Proceedings, IEEE First International Conference on Neural Networks, vol I. IEEE Press, Piscataway, pp 91–106
52. Mead C (1989) Analog VLSI and Neural Systems. Addison-Wesley, Reading
53. Mills JW (1996) The continuous retina: Image processing with a single-sensor artificial neural field network. In: Proceedings IEEE Conference on Neural Networks. IEEE Press, Piscataway
54. Mills JW, Himebaugh B, Kopecky B, Parker M, Shue C, Weilemann C (2006) “Empty space” computes: The evolution of an unconventional supercomputer. In: Proceedings of the 3rd Conference on Computing Frontiers, New York, May 2006. ACM Press, pp 115–126
55. Moore C (1996) Recursion theory on the reals and continuous-time computation. Theor Comput Sci 162:23–44
56. Moore GE (1965) Cramming more components onto integrated circuits. Electronics 38(8):114–117
57. Murray JD (1977) Lectures on Nonlinear Differential-Equation Models in Biology. Oxford University Press, Oxford
58. Omohundro S (1984) Modeling cellular automata with partial differential equations. Physica D 10:128–134
59. Orponen P (1997) A survey of continuous-time computation theory. In: Advances in Algorithms, Languages, and Complexity. Kluwer, Dordrecht, pp 209–224
60. Orponen P, Matamala M (1996) Universal computation by finite two-dimensional coupled map lattices. In: Proceedings, Physics and Computation 1996. New England Complex Systems Institute, Cambridge, pp 243–247
61. Owens L (1986) Vannevar Bush and the differential analyzer: The text and context of an early computer. Technol Culture 27(1):63–95
62. Peterson GR (1967) Basic Analog Computation. Macmillan, New York
63. Pour-El MB (1974) Abstract computability and its relation to the general purpose analog computer (some connections between logic, differential equations and analog computers). Trans Am Math Soc 199:1–29
64. Pour-El MB, Richards I (1979) A computable ordinary differential equation which possesses no computable solution. Ann Math Log 17:61–90
65. Pour-El MB, Richards I (1981) The wave equation with computable initial data such that its unique solution is not computable. Adv Math 39:215–239
66. Pour-El MB, Richards I (1982) Noncomputability in models of physical phenomena. Int J Theor Phys 21:553–555
67. Puchta S (1996) On the role of mathematics and mathematical
knowledge in the invention of Vannevar Bush’s early analog computers. IEEE Ann Hist Comput 18(4):49–59
68. Reiner JM (1968) The Organism as an Adaptive Control System. Prentice-Hall, Englewood Cliffs
69. Rimon E, Koditschek DE (1989) The construction of analytic diffeomorphisms for exact robot navigation on star worlds. In: Proceedings of the 1989 IEEE International Conference on Robotics and Automation, Scottsdale AZ. IEEE Press, New York, pp 21–26
70. Rogers AE, Connolly TW (1960) Analog Computation in Engineering Design. McGraw-Hill, New York
71. Rubel LA (1985) The brain as an analog computer. J Theor Neurobiol 4:73–81
72. Rubel LA (1988) Some mathematical limitations of the general-purpose analog computer. Adv Appl Math 9:22–34
73. Rubel LA (1993) The extended analog computer. Adv Appl Math 14:39–50
74. Rumelhart DE, McClelland JL, the PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations. MIT Press, Cambridge
75. Sanger TD (1996) Probability density estimation for the interpretation of neural population codes. J Neurophysiol 76:2790–2793
76. Shannon CE (1941) Mathematical theory of the differential analyzer. J Math Phys Mass Inst Technol 20:337–354
77. Shannon CE (1993) Mathematical theory of the differential analyzer. In: Sloane NJA, Wyner AD (eds) Claude Elwood Shannon: Collected Papers. IEEE Press, New York, pp 496–513
78. Siegelmann HT (1999) Neural Networks and Analog Computation: Beyond the Turing Limit. Birkhäuser, Boston
79. Siegelmann HT, Ben-Hur A, Fishman S (1999) Computational complexity for continuous time dynamics. Phys Rev Lett 83(7):1463–1466
80. Siegelmann HT, Sontag ED (1994) Analog computation via neural networks. Theor Comput Sci 131:331–360
81. Small JS (1993) General-purpose electronic analog computing. IEEE Ann Hist Comput 15(2):8–18
82. Small JS (2001) The Analogue Alternative: The electronic analogue computer in Britain and the USA, 1930–1975. Routledge, London & New York
83. Stannett M (1990) X-machines and the halting problem: Building a super-Turing machine. Form Asp Comput 2:331–341
84. Thomson W (Lord Kelvin) (1876) Mechanical integration of the general linear differential equation of any order with variable coefficients. Proc Royal Soc 24:271–275
85. Thomson W (Lord Kelvin) (1878) Harmonic analyzer. Proc Royal Soc 27:371–373
86. Thomson W (Lord Kelvin) (1938) The tides. In: The Harvard Classics, vol 30: Scientific Papers. Collier, New York, pp 274–307
87. Truitt TD, Rogers AE (1960) Basics of Analog Computers. John F. Rider, New York
88. van Gelder T (1997) Dynamics and cognition. In: Haugeland J (ed) Mind Design II: Philosophy, Psychology and Artificial Intelligence, revised & enlarged edition. MIT Press, Cambridge MA, Chap 16, pp 421–450
89. Weyrick RC (1969) Fundamentals of Analog Computers. Prentice-Hall, Englewood Cliffs
90. Wolpert DH (1991) A computationally universal field computer which is purely linear. Technical Report LAUR-91-2937, Los Alamos National Laboratory, Los Alamos
91. Wolpert DH, MacLennan BJ (1993) A computationally universal field computer that is purely linear. Technical Report CS-93-206, Department of Computer Science, University of Tennessee, Knoxville
Books and Reviews

Bissell CC (2004) A great disappearing act: The electronic analogue computer. In: IEEE Conference on the History of Electronics, Bletchley, June 2004, pp 28–30
Fifer S (1961) Analog computation: Theory, techniques and applications, 4 vols. McGraw-Hill, New York
Lipka J (1918) Graphical and mechanical computation. Wiley, New York
Mead C (1989) Analog VLSI and neural systems. Addison-Wesley, Reading
Siegelmann HT (1999) Neural networks and analog computation: Beyond the Turing limit. Birkhäuser, Boston
Small JS (1993) General-purpose electronic analog computing: 1945–1965. IEEE Ann Hist Comput 15(2):8–18
Small JS (2001) The analogue alternative: The electronic analogue computer in Britain and the USA, 1930–1975. Routledge, London & New York
Artificial Chemistry
PETER DITTRICH
Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
Article Outline

Glossary
Definition of the Subject
Introduction
Basic Building Blocks of an Artificial Chemistry
Structure-to-Function Mapping
Space
Theory
Evolution
Information Processing
Future Directions
Bibliography
Glossary

Molecular species A molecular species is an abstract class denoting an ensemble of identical molecules. Equivalently the terms “species”, “compound”, or just “molecule” are used; in some specific contexts also the terms “substrate” or “metabolite”.
Molecule A molecule is a concrete instance of a molecular species. Molecules are those entities of an artificial chemistry that react. Note that sometimes the term “molecule” is used equivalently to molecular species.
Reaction network A set of molecular species together with a set of reaction rules. Formally, a reaction network is equivalent to a Petri net. A reaction network describes the stoichiometric structure of a reaction system.
Order of a reaction The order of a reaction is the sum of the exponents of concentrations in the kinetic rate law (if given). Note that an order can be fractional. If only the stoichiometric coefficients of the reaction rule are given (Eq. (1)), the order is the sum of the coefficients of the left-hand side species. When assuming mass-action kinetics, both definitions are equivalent.
Autocatalytic set A (self-maintaining) set where each molecule is produced catalytically by molecules from that set. Note that an autocatalytic set may produce molecules not present in that set.
Closure A set of molecules A is closed if no combination of molecules from A can react to form a molecule outside A. Note that the term “closure” has also been used to denote the catalytic closure of an autocatalytic set.
Self-maintaining A set of molecules is called self-maintaining if it is able to maintain all its constituents. In a purely autocatalytic system under flow condition, this means that every molecule can be catalytically produced by at least one reaction among molecules from the set.
(Chemical) Organization A closed and self-maintaining set of molecules.
Multiset A multiset is like a set, but its elements can appear more than once; that is, each element has a multiplicity, e.g., {a, a, b, c, c, c}.

Definition of the Subject

Artificial chemistries are chemical-like systems or abstract models of chemical processes. They are studied in order to illuminate and understand fundamental principles of chemical systems, as well as to exploit the chemical metaphor as a design principle for information processing systems in fields like chemical computing or nonlinear optimization. An artificial chemistry (AC) is usually a formal (and, more seldom, a physical) system that consists of objects called molecules, which interact according to rules called reactions. Compared to conventional chemical models, artificial chemistries are more abstract in the sense that there is usually no one-to-one mapping between the molecules (and reactions) of the artificial chemistry and real molecules (and reactions). An artificial chemistry aims at capturing the logic of chemistry rather than trying to explain a particular chemical system. More formally, an artificial chemistry can be defined by a set of molecular species M, a set of reaction rules R, and specifications of the dynamics, e.g., kinetic laws, update algorithm, and geometry of the reaction vessel. The scientific motivation for AC research is to build abstract models in order to understand chemical-like systems in domains ranging from physics, chemistry, and biology to computer science and sociology.
Abstract chemical models to study fundamental principles, such as spatial pattern formation and self-replication, can be traced back to the beginning of modern computer science in the 1940s [91,97]. Since then a growing number of approaches have tackled questions concerning the origin of complex forms, the origin of life itself [19,81], or cellular diversity [48]. In the same way, a variety of approaches for constructing ACs have appeared, ranging from simple ordinary differential equations [20] to complex individual-based algebraic transformation systems [26,36,51].
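The closure and self-maintenance conditions defined in the glossary can be made concrete with a small sketch. The reaction set below is an invented toy, and the self-maintenance test is the simplified flow condition from the glossary (every molecule producible by a reaction among molecules of the set), not a general stoichiometric criterion:

```python
# Invented toy reaction set: each reaction is (reactants, products).
REACTIONS = [
    ({"a", "b"}, {"c"}),   # a + b -> c
    ({"c"}, {"a", "b"}),   # c -> a + b
    ({"a", "c"}, {"d"}),   # a + c -> d
]

def closure(species):
    """Smallest superset of `species` from which no reaction leads outside:
    add the products of applicable reactions until a fixpoint is reached."""
    closed = set(species)
    changed = True
    while changed:
        changed = False
        for reactants, products in REACTIONS:
            if reactants <= closed and not products <= closed:
                closed |= products
                changed = True
    return closed

def is_self_maintaining(species):
    """Simplified flow condition: every member is a product of some
    reaction whose reactants all lie inside the set."""
    species = set(species)
    producible = set()
    for reactants, products in REACTIONS:
        if reactants <= species:
            producible |= products
    return species <= producible

def is_organization(species):
    """A (chemical) organization is a closed and self-maintaining set."""
    return closure(species) == set(species) and is_self_maintaining(species)

print(sorted(closure({"a", "b"})))            # ['a', 'b', 'c', 'd']
print(is_organization({"a", "b", "c", "d"}))  # True
```

In this toy network the set {a, b} is not closed (it produces c and then d), but its closure {a, b, c, d} is both closed and self-maintaining, i.e. an organization in the sense of the glossary.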
The engineering motivation for AC research aims at employing the chemical metaphor as a programming and computing paradigm. Approaches can be distinguished according to whether chemistry serves as a paradigm to program or construct conventional in silico information processing systems [4], or whether real molecules are used for information processing, as in molecular computing [13] or DNA computing [2] (see Cellular Computing, DNA Computing).

Introduction

The term artificial chemistry appeared around 1990 in the field of artificial life [72]. However, models that fall under the definition of artificial chemistry appeared decades earlier, such as von Neumann’s universal self-replicating automata [97]. In the 1950s, a famous abstract chemical system was introduced by Turing [91] in order to show how spatial diffusion can destabilize a chemical system, leading to spatial pattern formation. Turing’s artificial chemistries consist of only a handful of chemical species interacting according to a couple of reaction rules, each carefully designed. The dynamics is simulated by a partial differential equation. Turing’s model possesses the structure of a conventional reaction kinetics chemical model (Subsect. “The Chemical Differential Equation”). However, it does not aim at modeling a particular reaction process, but at exploring how, in principle, spatial patterns can be generated by simple chemical processes. Turing’s artificial chemistries allow one to study whether a particular mechanism (e.g., destabilization through diffusion) can explain a particular phenomenon (e.g., symmetry breaking and pattern formation). The example illustrates that studying artificial chemistries is more of a synthetic approach. That is, understanding should be gained through the synthesis of an artificial system and its observation under various conditions [45,55]. The underlying assumption is that understanding a phenomenon and knowing how this phenomenon can be (re)created (without copying it) are closely related [96].
Today’s artificial chemistries are much more complex than Turing’s model. They consist of a huge, sometimes infinite, number of different molecular species, and their reaction rules are defined not manually one by one, but implicitly (Sect. “Structure-to-Function Mapping”), and can evolve (Sect. “Evolution”). Questions that are tackled include the structure and organization of large chemical networks, the origin of life, prebiotic chemical evolution, and the emergence of chemical information and its processing.
Introductory Example

It is relatively easy to create a complex, infinitely sized reaction network. Assume that the set of possible molecules M consists of strings over an alphabet. Next, we have to define a reaction function describing what happens when two molecules a1, a2 ∈ M collide. A general approach is to map a1 to a machine, operator, or function f = fold(a1) with f : M → M ∪ {elastic}. The mapping fold can be defined in various ways, such as by interpreting a1 as a computer program, a Turing machine, or a lambda expression (Sect. “Structure-to-Function Mapping”). Next, we apply the machine f derived from a1 to the second molecule. If f(a2) = elastic, the molecules collided elastically and nothing happens. This could be the case if the machine f does not halt within a predefined amount of time. Otherwise the molecules react and a new molecule a3 = f(a2) is produced. How this new molecule changes the reaction vessel depends on the algorithm simulating the dynamics of the vessel. Algorithm 1 is a simple, widely applied algorithm that simulates second-order catalytic reactions under flow conditions: because the population size is constant and thus limited, there is competition, and only those molecules that are somehow created will persist. First, the algorithm chooses two molecules randomly, which simulates a collision. Note that a realistic measure of time takes the number of collisions, not the number of reaction events, into account. Then, we determine stochastically whether the selected molecules react. If the molecules react, a randomly selected molecule from the population is replaced by the new product, which assures that the population size stays constant. The replaced molecules constitute the outflow. The inflow is implicitly assumed and is generated by the catalytic production of molecules.
From a chemical point of view, the algorithm assumes that a product is built from an energy-rich substrate, which does not appear in the algorithm and which is assumed to be available at constant concentration. The following section (Sect. “Basic Building Blocks of an Artificial Chemistry”) describes the basic building blocks of an artificial chemistry in more detail. In Sect. “Structure-to-Function Mapping” we will explore various techniques to define reaction rules. The role of space is briefly discussed in Sect. “Space”. Then the important theoretical concepts of an autocatalytic set and a chemical organization are explained (Sect. “Theory”). Sect. “Evolution” shows how artificial chemistries can be used to study (chemical) evolution. This article does not discuss information processing in detail, because there is a series of other specialized articles on that topic, which are summarized in Sect. “Information Processing”.
INPUT: P : population (array of k molecules)
randomPosition() := randomInt(0, k-1);
reaction(r1, r2) := (fold(r1))(r2);
while not terminate do
    reactant1 := P[randomPosition()];
    reactant2 := P[randomPosition()];
    product := reaction(reactant1, reactant2);
    if product != elastic then
        P[randomPosition()] := product;
    fi
    t := t + 1/k ;; increment simulated time
od

Artificial Chemistry, Algorithm 1
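The listing above can be transcribed into a short runnable sketch. As in the listing, every non-elastic collision reacts; the concrete fold used here, which reads a three-character molecule "x>y" as the rewrite rule "replace x by y", is an invented toy mapping, not one from the article:

```python
import random
from collections import Counter

K = 100  # constant population size (the array of k molecules)

def fold(molecule):
    """Toy structure-to-function mapping (invented for illustration):
    a string 'x>y' folds into the function that rewrites x to y in its
    argument; any other string collides elastically with everything."""
    if len(molecule) == 3 and molecule[1] == ">":
        src, dst = molecule[0], molecule[2]
        return lambda other: other.replace(src, dst)
    return lambda other: None  # None stands for an elastic collision

def reaction(r1, r2):
    return fold(r1)(r2)

random.seed(0)
population = [random.choice(["a>b", "b>a", "aab", "abb"]) for _ in range(K)]
t = 0.0
while t < 10.0:
    reactant1 = random.choice(population)   # simulate a collision
    reactant2 = random.choice(population)
    product = reaction(reactant1, reactant2)
    if product is not None:                 # reactive collision
        # Overwriting a random position keeps the population size constant;
        # the replaced molecule constitutes the outflow.
        population[random.randrange(K)] = product
    t += 1.0 / K  # time advances per collision, not per reaction event

print(Counter(population).most_common(3))
```

Because the population size is fixed, competition arises without any explicit fitness function: only species that keep being produced by collisions persist.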
Basic Building Blocks of an Artificial Chemistry
Molecules

The first step in defining an AC requires that we define the set of all molecular species that can appear in the AC. The easiest way to specify this set is to enumerate explicitly all molecules as symbols, for example, M = {H2, O2, H2O, H2O2} or, equivalently, M = {a, b, c, d}. A symbol is just a name without any structure. Most conventional (bio)chemical reaction system models are defined this way. Also many artificial chemistries where networks are designed "by hand" [20], are created randomly [85], or are evolved by changing the reaction rules [41,42,43] use only symbolic molecules without any structure. Alternatively, molecules can possess a structure. In that case, the set of molecules is implicitly defined, for example, M = {all polymers that can be made from the two monomers a and b}. In this case, the set of all possible molecules can even become infinite, or at least quite large. A vast variety of such rule-based molecule definitions can be found in different approaches. For example, structured molecules may be abstract character sequences [3,23,48,64], sequences of instructions [1,39], lambda-expressions [26], combinators [83], binary strings [5,16,90], numbers [8], machine code [72], graphs [78], swarms [79], or proofs [28]. We can call a molecule's representation its structure, in contrast to its function, which is given by the reaction rules R. The description of the valid molecules and their structure is usually the first step when an AC is defined. This is analogous to the part of chemistry that describes what kinds of atom configurations form stable molecules and how these molecules appear.

Reaction Rules

The set of reaction rules R describes how molecules from M = {a1, a2, ...} interact. A rule ρ ∈ R can be written according to the chemical notation of reaction rules in the form

l_{a_1,\rho} a_1 + l_{a_2,\rho} a_2 + \cdots \to r_{a_1,\rho} a_1 + r_{a_2,\rho} a_2 + \cdots .   (1)

The stoichiometric coefficients l_{a,\rho} and r_{a,\rho} describe the amount of molecular species a ∈ M in reaction ρ ∈ R on the left-hand and right-hand side, respectively. Together, the stoichiometric coefficients define the stoichiometric matrix

S = (s_{a,\rho}) = (r_{a,\rho} - l_{a,\rho}) .   (2)
An entry s_{a,\rho} of the stoichiometric matrix denotes the net amount of molecules of type a produced in reaction ρ. A reaction rule determines a multiset of molecules (the left-hand side) that can react and subsequently be replaced by the molecules on the right-hand side. Note that the sign "+" is not an operator here, but only separates the components on either side. The set of all molecules M and the set of reaction rules R define the reaction network ⟨M, R⟩ of the AC. The reaction network is equivalent to a Petri net [71]. It can be represented by two matrices, (l_{a,\rho}) and (r_{a,\rho}), made up of the stoichiometric coefficients. Equivalently we can represent the reaction network as a hypergraph, or as a directed bipartite graph with two node types denoting reaction rules and molecular species, respectively, and edge labels for the stoichiometry. A rule is applicable only if certain conditions are fulfilled. The major condition is that all of the left-hand side components must be available. This condition can
be broadened easily to include other parameters such as neighborhood, rate constants, the probability of a reaction, the influence of modifier species, or energy consumption. In such a case, a reaction rule also contains additional information or further parameters. Whether or not these additional predicates are taken into consideration depends on the objectives of the artificial chemistry. If it is meant to simulate real chemistry as accurately as possible, it is necessary to integrate these parameters into the simulation system. If the goal is to build an abstract model, many of these parameters can be omitted. As for the set of molecules, we can define the set of reaction rules explicitly by enumerating all rules symbolically [20], or we can define it implicitly by referring to the structure of the molecules. Implicit definitions of reaction rules use, for example, string matching/string concatenation [3,60,64], lambda-calculus [26,27], Turing machines [40,90], finite state machines [98], machine code languages [1,16,78,87], proof theory [27], matrix multiplication [5], swarm dynamics [79], or simple arithmetic operations [8]. Note that in some cases the reactions can emerge as the result of the interaction of many atomic particles, whose dynamics are specified by force fields or rules [79]. Section "Structure-to-Function Mapping" will discuss implicitly defined reactions in more detail.

Dynamics

The third element of an artificial chemistry is its specification of how the reaction rules give rise to a dynamic process of molecules reacting in some kind of reaction vessel. It is assumed that the reaction vessel contains multiple copies of molecules, which can be seen as instances of the molecular species M. This section summarizes how the dynamics of a reaction vessel (which usually contains a huge number of molecules) can be modeled and simulated.
The approaches can be characterized roughly by whether each molecule is treated explicitly or whether all molecules of one type are represented by a number denoting their frequency or concentration.

The Chemical Differential Equation

A reaction network ⟨M, R⟩ specifies the structure of a reaction system, but does not contain any notion of time. A common way to specify the dynamics of the reaction system is by a system of ordinary differential equations of the form

\dot{x}(t) = S \, v(x(t)) ,   (3)
where x = (x_1, ..., x_m)^T ∈ R^m is a concentration vector depending on time t, S is the stoichiometric matrix derived from R (Eq. (2)), and v = (v_1, ..., v_r)^T ∈ R^r is a flux vector depending on the current concentration vector. A flux v_ρ ≥ 0 describes the velocity or turnover rate of reaction ρ ∈ R. The actual value of v_ρ usually depends on the concentrations of the species participating in the reaction (i.e., LHS(ρ) = {a ∈ M : l_{a,ρ} > 0}) and sometimes on additional modifier species, whose concentration is not influenced by that reaction. There are two important assumptions that are due to the nature of reaction systems. These assumptions relate the kinetic function v to the reaction rules R:

Assumption 1 A reaction ρ can only take place if all species of its left-hand side LHS(ρ) are present. This implies that for all molecules a ∈ M and reactions ρ ∈ R with a ∈ LHS(ρ), if x_a = 0 then v_ρ = 0. The flux v_ρ must be zero if the concentration x_a of a molecule appearing on the left-hand side of this reaction is zero. This assumption reflects the obvious fact that a molecule has to be present to react.

Assumption 2 If all species LHS(ρ) of a reaction ρ ∈ R are present in the reactor (i.e., for all a ∈ LHS(ρ), x_a > 0), the flux of that reaction is positive (i.e., v_ρ > 0). In other words, the flux v_ρ must be positive if all molecules required for that reaction are present, even in small quantities. Note that this assumption implies that a modifier species necessary for that reaction must appear as a species on the left (and right) hand side of ρ.

In short, taking Assumptions 1 and 2 together, we demand for a chemical differential equation:

v_\rho > 0 \Leftrightarrow \forall a \in \mathrm{LHS}(\rho) : x_a > 0 .   (4)
Note that Assumption 1 is absolutely necessary. Although Assumption 2 is very reasonable, there might be situations where it is not true. Assume, for example, a reaction of the form a + a → b. If there is only one single molecule a left in the reaction vessel, the reaction cannot take place. Note, however, that if we want to model this effect, an ODE is the wrong approach and we should choose a method like those described in the following sections. There are a large number of kinetic laws fulfilling these assumptions (Eq. (4)), including all laws that are usually applied in practice. The most fundamental of these kinetic laws is mass-action kinetics, which is just the product of the concentrations of the interacting molecules (cf. [32]):

v_\rho(x) = \prod_{a \in M} x_a^{l_{a,\rho}} .   (5)
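As an illustration (a minimal sketch, not from the article), Eq. (3) with mass-action fluxes (Eq. (5)) can be integrated with a forward-Euler step. The example network a + b → c and the rate constant are assumptions chosen for concreteness.

```python
def mass_action_flux(x, lhs, k):
    """v_rho(x) = k_rho * prod_a x_a^{l_{a,rho}}  (Eq. (5), with rate constants k)."""
    v = []
    for rho, rate in enumerate(k):
        f = rate
        for a, l in enumerate(lhs[rho]):
            f *= x[a] ** l
        v.append(f)
    return v

def euler_step(x, lhs, rhs, k, dt):
    """One Euler step of x' = S v(x) (Eq. (3)), with S = rhs - lhs (Eq. (2))."""
    v = mass_action_flux(x, lhs, k)
    return [xa + dt * sum((rhs[r][a] - lhs[r][a]) * v[r] for r in range(len(k)))
            for a, xa in enumerate(x)]

# single second-order reaction a + b -> c   (species order: a, b, c)
lhs = [[1, 1, 0]]
rhs = [[0, 0, 1]]
x = [1.0, 1.0, 0.0]
for _ in range(1000):
    x = euler_step(x, lhs, rhs, k=[1.0], dt=0.01)
# a and b decay together while c accumulates; the mass x_a + x_c is conserved
```

The stoichiometric matrix appears here only implicitly as `rhs - lhs`, exactly as in Eq. (2).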
The sum of the exponents in the kinetic law, \sum_{a \in M} l_{a,\rho}, is called the order of the reaction. Most more complicated laws, like Michaelis–Menten kinetics, are derived from mass-action kinetics. In this sense, mass-action kinetics is the most general approach, allowing one to emulate all more complicated laws derived from it, but at the expense of a larger number of molecular species.

Explicitly Simulated Collisions

The chemical differential equation is a time- and state-continuous approximation of a discrete system where molecules collide stochastically and react. In the introduction we saw how this can be simulated by a rather simple algorithm, which is widely used in AC research [5,16,26]. The algorithm presented in the introduction simulates a second-order catalytic flow system with constant population size. It is relatively easy to extend this algorithm to include reaction rates and arbitrary orders (Algorithm 2). First, the algorithm chooses a subset from P, which simulates a collision among molecules. Note that the resulting subset could be empty, which can be used to simulate an inflow (i.e., reactions of the form ∅ → A, where the left-hand side is empty). By defining the probability of obtaining a subset of size n, we can control the rate of reactions of order n. If the molecules react, the reactants are removed from the population and the products are added. Note that this algorithm can be interpreted as a stochastic rewriting system operating on a multiset of molecules [70,88]. For large population size k, the dynamics created by the algorithm tends to the continuous dynamics of the chemical differential equation assuming mass-action kinetics. For the special case of second-order catalytic flow, as described by the algorithm in the introduction, the dynamics can be described by the catalytic network equation [85]

\dot{x}_k = \sum_{i,j \in M} \alpha^k_{i,j} x_i x_j - x_k \Phi(x) , \quad \text{with} \quad \Phi(x) = \sum_{i,j,k \in M} \alpha^k_{i,j} x_i x_j ,   (6)
which is a generalization of the replicator equation [35]. If the reaction function is deterministic, we can set the kinetic constants to

\alpha^k_{i,j} = \begin{cases} 1 & \text{if } \mathrm{reaction}(i,j) = k , \\ 0 & \text{otherwise} . \end{cases}   (7)

This dynamic is of particular interest because competition is generated due to a limited population size. Note that the limited population size is equivalent to an unlimited population size with a limited substrate, e.g., [3].
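A small numerical sketch (an illustration with an arbitrary random α tensor, not one from the literature) shows a key property of Eq. (6): the dilution flux Φ(x) exactly balances production, so the concentration vector stays on the simplex (Σ_k x_k = 1).

```python
import random

def catalytic_step(x, alpha, dt):
    """Forward-Euler step of Eq. (6):
    x_k' = sum_{i,j} alpha[k][i][j] x_i x_j - x_k * Phi(x)."""
    n = len(x)
    prod = [sum(alpha[k][i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            for k in range(n)]
    phi = sum(prod)                      # Phi(x): total production flux
    return [x[k] + dt * (prod[k] - x[k] * phi) for k in range(n)]

random.seed(0)
n = 4
# random 0/1 kinetic constants, as in Eq. (7) for some arbitrary reaction table
alpha = [[[random.choice([0, 1]) for _ in range(n)] for _ in range(n)]
         for _ in range(n)]
x = [1.0 / n] * n
for _ in range(2000):
    x = catalytic_step(x, alpha, dt=0.01)
```

Species without catalytic support decay toward zero, which is the competition effect mentioned in the text.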
Discrete-Event Simulation

If the copy number of a particular molecular species becomes high, or if the reaction rates differ strongly, it is more efficient to use discrete-event simulation techniques. The most famous technique is Gillespie's algorithm [32]. Roughly, in each step, the algorithm generates two random numbers depending on the current population state to determine the next reaction to occur as well as the time interval Δt until it occurs. Then the simulated time is advanced, t := t + Δt, and the molecule count of the population is updated according to the reaction that occurred. The Gillespie algorithm generates a statistically correct trajectory of the chemical master equation. The chemical master equation is the most general way to formulate the stochastic dynamics of a chemical system: a first-order differential equation describing the time evolution of the probability of a reaction system to occupy each one of its possible discrete states. In a well-stirred (non-spatial) AC, a state is equivalent to a multiset of molecules.

Structure-to-Function Mapping

A fundamental "logic" of real chemical systems is that molecules possess a structure and that this structure determines how the molecule interacts with other molecules. Thus, there is a mapping from the structure of a molecule to its function, which determines the reaction rules. In real chemistry the mapping from structure to function is given by natural or physical laws. In artificial chemistries the structure-to-function mapping is usually given by some kind of algorithm. In most cases, an artificial chemistry using a structure-to-function mapping does not aim at modeling real molecular dynamics in detail. Rather, the aim is to capture the fact that there is a structure, a function, and a relation between them. Having an AC with a structure-to-function mapping at hand, we can study strongly constructive dynamical systems [27,28].
These are systems where, through the interaction of their components, novel molecular species appear.

while not terminate() do
    reactants := chooseASubsetFrom(P);
    if randomReal(0, 1) < reactionProbability(reactants) then
        products := reaction(reactants);
        P := remove(P, reactants);
        P := insert(P, products);
    fi
    t := t + 1/sizeOf(P)    ;; increment simulated time
od

Artificial Chemistry, Algorithm 2

Example: Lambda-Calculus (AlChemy)

This section demonstrates a structure-to-function mapping that employs a concept from computer science: the λ-calculus. The λ-calculus has been used by Fontana [26] and Fontana and Buss [27] to define a constructive artificial chemistry. In the so-called AlChemy, a molecule is a normalized λ-expression. A λ-expression is a word over an alphabet A = {λ, ., (, )} ∪ V, where V = {x1, x2, ...} is an infinite set of available variable names. The set of λ-expressions Λ is defined for x ∈ V, s1 ∈ Λ, s2 ∈ Λ by

x ∈ Λ    (variable name) ,
λx.s2 ∈ Λ    (abstraction) ,
(s2)s1 ∈ Λ    (application) .
An abstraction λx.s2 can be interpreted as a function definition, where x is the parameter in the "body" s2. The expression (s2)s1 can be interpreted as the application of s2 on s1, which is formalized by the rewriting rule

(λx.s2)s1 = s2[x ← s1] ,   (8)

where s2[x ← s1] denotes the term that is generated by replacing every unbound occurrence of x in s2 by s1. A variable x is bound if it appears in a form like ...(λx. ... x ...).... It is also not allowed to apply the rewriting rule if a variable would become bound. Example: Let s1 = λx1.(x1)λx2.x2 and s2 = λx3.x3; then we can derive

(s1)s2 ⇒ (λx1.(x1)λx2.x2)λx3.x3 ⇒ (λx3.x3)λx2.x2 ⇒ λx2.x2 .   (9)
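This derivation can be reproduced with a small interpreter (an illustrative sketch, not Fontana's implementation): terms are encoded as nested tuples, and substitution is naive, assuming all bound variable names are distinct, which holds for the example above.

```python
def subst(t, x, s):
    """Replace occurrences of variable x in t by s.
    Naive: assumes all bound variable names are distinct (no capture)."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def reduce_step(t):
    """One normal-order beta-reduction step; returns (term, reduced?)."""
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':                        # beta redex: apply Eq. (8)
            return subst(f[2], f[1], a), True
        f2, r = reduce_step(f)
        if r:
            return ('app', f2, a), True
        a2, r = reduce_step(a)
        return ('app', f2, a2), r
    if t[0] == 'lam':
        b, r = reduce_step(t[2])
        return ('lam', t[1], b), r
    return t, False

def normal_form(t, max_steps=100):
    """Reduce to normal form; give up (elastic collision) after max_steps."""
    for _ in range(max_steps):
        t, reduced = reduce_step(t)
        if not reduced:
            return t
    return None  # resources exhausted -> collision treated as elastic

# s1 = λx1.(x1)λx2.x2 ,  s2 = λx3.x3  (the terms of Eq. (9))
s1 = ('lam', 'x1', ('app', ('var', 'x1'), ('lam', 'x2', ('var', 'x2'))))
s2 = ('lam', 'x3', ('var', 'x3'))
print(normal_form(('app', s1, s2)))   # ('lam', 'x2', ('var', 'x2')), i.e. λx2.x2
```

The `max_steps` bound plays the role of the resource limit on normalization discussed below.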
The simplest way to define a set of second-order catalytic reactions R is by applying a molecule s1 to another molecule s2:

s1 + s2 → s1 + s2 + normalForm((s1)s2) .   (10)
The procedure normalForm reduces its argument term to normal form, which is in practice bounded by a maximum of available time and memory; if these resources are exceeded before termination, the collision is considered to be elastic. It should be noted that the λ-calculus allows an elegant generalization of the collision rule by defining it through a λ-expression Φ ∈ Λ: s1 + s2 → s1 + s2 + normalForm(((Φ)s1)s2). In order to simulate the artificial chemistry, Fontana and Buss [27] used an algorithm like the one described in Subsect. "Explicitly Simulated Collisions",
which simulates second-order catalytic mass-action kinetics in a well-stirred population under flow conditions. AlChemy possesses a couple of important general properties: Molecules come in two forms, as passive data possessing a structure and as active machines operating on those structures. This generates a "strange loop" like in typogenetics [36,94], which allows molecules to refer to themselves and to operate on themselves. Structures operate on structures, and by doing so change the way structures operate on structures, and so on, creating a "strange", self-referential dynamic loop. Obviously, there are always some laws that cannot be changed, which are here the rules of the lambda-calculus, defining the structure-to-function mapping. Thus, we can interpret these fixed and predefined laws of an artificial chemistry as its natural or physical laws.

Arithmetic and Logic Operations

One of the easiest ways to define reaction rules implicitly is to apply arithmetic operations taken from mathematics. Even the simplest rules can generate interesting behaviors. Assume, for example, that the molecules are natural numbers, M = {0, 1, 2, ..., n − 1}, and the reaction rules are defined by division: reaction(a1, a2) = a1/a2 if a1 is a multiple of a2; otherwise the molecules do not react. For the dynamics we assume a finite, discrete population and an algorithm like the one described in Subsect. "Explicitly Simulated Collisions". With increasing population size, the resulting reaction system displays a phase transition at which the population becomes able to produce prime numbers with high probability (see [8] for details). In a typical simulation, the initially high diversity of a random population is reduced, leaving a set of non-reacting prime numbers. A quite different behavior can be obtained by simply replacing the division by addition: reaction(a1, a2) = a1 + a2 mod n, with n = |M| the number of molecular species. Initializing the well-stirred tank reactor only with
the molecular species 1, the diversity rapidly increases towards its maximum, and the reactor behaves apparently totally chaotically after a very short transient phase (for a sufficiently large population size). However, there are still regularities, which depend on the prime factors of n (cf. [15]).

Matrix-Multiplication Chemistry

More complex reaction operators can be defined that operate on vectors or strings instead of scalar numbers as molecules. The matrix-multiplication chemistry introduced by Banzhaf [5,6,7] uses binary strings as molecules. The reaction between two binary strings is performed by folding one of them into a matrix, which then operates on the other string by multiplication.
The system has a couple of interesting properties: Despite the relatively simple definition of the basic reaction mechanism, the resulting complexity of the reaction system and its dynamics is surprisingly high. As in typogenetics and Fontana's lambda-chemistry, molecules appear in two forms, as passive data (binary strings) and as active operators (binary matrices). The folding of a binary string to a matrix is the central operation of the structure-to-function mapping. The matrix is multiplied with substrings of the operand, and thus some kind of locality is preserved, which mimics the local operation of macromolecules (e.g., ribosomes or restriction enzymes) on polymers (e.g., RNA or DNA). Locality is also conserved in the folding, because bits that are close in the string are also close in the matrix.
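The fold–multiply–concatenate scheme can be sketched in a few lines (an illustration rather than Banzhaf's implementation; the fold orientation follows Eq. (11) as reconstructed below, and the threshold Θ = 1 is an assumption).

```python
def fold(s):
    """Fold a 4-bit string s = (s1, s2, s3, s4) into a 2x2 matrix,
    following Eq. (11); the orientation is an assumption."""
    return [[s[0], s[2]],
            [s[1], s[3]]]

def threshold_multiply(M, x, theta=1):
    """y_j = 1 if sum_i x_i * M[i][j] >= theta, else 0  (Eq. (13))."""
    n = len(x)
    return [1 if sum(x[i] * M[i][j] for i in range(n)) >= theta else 0
            for j in range(len(M[0]))]

def react(s1, s2, theta=1):
    """s1 + s2 => s3: fold s1, threshold-multiply both halves of s2, concatenate."""
    M = fold(s1)
    return (threshold_multiply(M, s2[:2], theta)
            + threshold_multiply(M, s2[2:], theta))

s3 = react([1, 0, 0, 1], [1, 1, 0, 0])   # identity-matrix fold copies each half
```

With s1 folding to the identity matrix, the product s3 reproduces s2, a simple example of a self-supporting (autocatalytic) string.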
Example of a reaction for 4-bit molecules. Assume a reaction s1 + s2 ⇒ s3. The general approach is:

1. Fold s1 to a matrix M. Example: for s1 = (s11, s12, s13, s14),

M = \begin{pmatrix} s_{11} & s_{13} \\ s_{12} & s_{14} \end{pmatrix} .   (11)

2. Multiply M with subsequences of s2. Example: Let s2 = (s21, s22, s23, s24) be divided into the two subsequences s2^{12} = (s21, s22) and s2^{34} = (s23, s24). Then we can multiply M with the subsequences:

s_3^{12} = M \odot s_2^{12} , \quad s_3^{34} = M \odot s_2^{34} .   (12)

3. Compose s3 by concatenating the products. Example: s3 = s3^{12} ⊕ s3^{34}.

There are various ways of defining the vector–matrix product ⊙. It was mainly used with the following threshold multiplication. Given a bit vector x = (x1, ..., xn) and a bit matrix M = (M_{ij}), the term y = M ⊙ x is defined by

y_j = \begin{cases} 0 & \text{if } \sum_{i=1}^{n} x_i M_{i,j} < \Theta , \\ 1 & \text{otherwise} . \end{cases}   (13)

The threshold multiplication is similar to the common matrix–vector product, except that the resulting vector is mapped to a binary vector using the threshold Θ. Simulating the dynamics by an ODE or by explicit molecular collisions (Subsect. "Explicitly Simulated Collisions"), we can observe that such a system develops into a steady state where some string species support each other in production and thus become a stable autocatalytic cycle, whereas others disappear due to the competition in the reactor.

Autocatalytic Polymer Chemistries
In order to study the emergence and evolution of autocatalytic sets [19,46,75], Bagley, Farmer, Fontana, Kauffman and others [3,23,47,48,59] have used artificial chemistries where the molecules are character sequences (e.g., M = {a, b, aa, ab, ba, bb, aaa, aab, ...}) and the reactions are concatenation and cleavage, for example:

aa + babb ⇌ aababb   (slow) .   (14)
Additionally, each molecule can act as a catalyst, enhancing the rate of a concatenation reaction:

aa + babb + bbb ⇌ bbb + aababb   (fast) .   (15)
Figure 1 shows an example of an autocatalytic network that appeared. There are two time scales. Reactions that are not catalyzed are assumed to be very slow and are not depicted, whereas catalyzed reactions, inflow, and decay are fast. Typical experiments simulate a well-stirred flow reactor using a metadynamical ODE framework [3]; that is, the ODE model is dynamically changed when new molecular species appear or present species vanish. Note that the structure of the molecules does not fully determine the reaction rules. Which molecule catalyzes which reaction (the dotted arrows in Fig. 1) is assigned explicitly at random. An interesting aspect is that the catalytic or autocatalytic polymer sets (or reaction networks) evolve without having a genome [47,48]. The simulation studies show how small, spontaneous fluctuations can be amplified by an autocatalytic network, possibly leading to a modification of the entire network [3]. Recent studies by Fernando and Rowe [24] include further aspects like energy conservation and compartmentalization.
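The two reaction types and the two time scales can be sketched as follows (an illustration; the explicit catalyst table stands in for the random assignment described in the text, and the rate constants are arbitrary).

```python
def concatenate(x, y):
    """Concatenation reaction, e.g. aa + babb -> aababb (Eq. (14))."""
    return x + y

def cleavages(z):
    """All ways to cut polymer z into two non-empty parts (reverse reaction)."""
    return [(z[:i], z[i:]) for i in range(1, len(z))]

# hypothetical catalyst assignment (in [3] it is chosen at random)
catalyst = {('aa', 'babb'): 'bbb'}

def rate(x, y, vessel, k_slow=1e-3, k_fast=1.0):
    """Reaction is fast only if its assigned catalyst is present (Eq. (15))."""
    return k_fast if catalyst.get((x, y)) in vessel else k_slow

vessel = {'aa', 'babb', 'bbb'}
print(concatenate('aa', 'babb'), rate('aa', 'babb', vessel))  # aababb 1.0
```

Removing bbb from the vessel drops the reaction back to the slow, uncatalyzed rate, which is how spontaneous fluctuations can switch parts of the network on or off.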
Artificial Chemistry, Figure 1 Example of an autocatalytic polymer network. The dotted lines represent catalytic links, e.g., aa + ba + aaa → aaba + aaa. All molecules are subject to a dilution flow. In this example, all polymerization reactions are reversible and there is a continuous inflow of the monomers a and b. Note that the network contains further autocatalytic networks, for example, {a, b, aa, ba} and {a, aa}, of which some are closed and thus organizations, for example, {a, b, aa, ba}. The set {a, aa} is not closed, because b and ba can be produced.
Bagley and Farmer [3] showed that autocatalytic sets can be silhouetted against a noisy background of spontaneously reacting molecules under moderate (that is, neither too low nor too high) flow.

Artificial Chemistries Inspired by Turing Machines

The concept of an abstract automaton or Turing machine provides a basis for a variety of structure-to-function mappings. In these approaches molecules are usually represented as sequences of bits, characters, or instructions [36,51,63,86]. A sequence of bits specifies the behavior of a machine by coding for its state transition function. Thus, as in the matrix-multiplication chemistry and the lambda-chemistry, a molecule appears in two forms, namely as passive data (e.g., a binary string) and as an active machine. Here, too, we can call the mapping from a binary string into its machine "folding", which might be nondeterministic and may depend on other molecules (e.g., [40]). In the early 1970s, Laing [51] argued for abstract, non-analogous models in order to develop a general theory for living systems. For developing such a general theory, so-called artificial organisms would be required. Laing suggested a series of artificial organisms [51,52,53] that should allow one to study general properties of life and thus would allow one to derive a theory which is not restricted to the instance of life we observe on earth. The artificial organisms consist of different compartments, e.g., a "brain" plus "body" parts. These compartments contain binary strings as molecules. Strings are translated to a sequence of instructions forming a three-dimensional shape (cf. [78,86,94]). In order to perform a reaction, two molecules are attached such that they touch at one position (Fig. 2). One of the molecules is executed like a Turing machine, manipulating the passive data molecule as its tape. Laing proved that his artificial organisms are able to perform universal computation. He also demonstrated different forms of self-reproduction, self-description and self-inspection using his molecular machines [53]. Typogenetics is a similar approach, which was introduced in 1979 by Hofstadter in order to illustrate the "formal logic of life" [36]. Later, typogenetics was simulated and investigated in more detail [67,94,95]. The molecules of typogenetics are character sequences (called strands) over the alphabet A, C, G, T. The reaction rules are "typographic" manipulations based on a set of predefined basic operations like cutting, insertion, or deletion of characters. A sequence of such operations forms a unit (called an enzyme) which may operate on a character sequence like a Turing machine on its tape. A character string can be "translated" to an enzyme (i.e., a sequence of operations) by mapping two characters to an operation according to a predefined "genetic code".
Artificial Chemistry, Figure 2 Illustration of Laing's molecular machines. A program molecule is associated with a data molecule. Figure from [52]

Another artificial chemistry whose reaction mechanism is inspired by the Turing machine was suggested by McCaskill [63] in order to study the self-organization of complex chemical systems consisting of catalytic polymers. Variants of this AC were realized later in a specially designed parallel reconfigurable computer based on FPGAs (Field Programmable Gate Arrays) [12,18,89]. In this approach, molecules are binary strings of fixed length (e.g., 20 [63]) or of variable length [12]. As in previous approaches, a string codes for an automaton able to manipulate other strings. And again, pattern matching is used to check whether two molecules can react and to obtain the "binding site", i.e., the location where the active molecule (machine) manipulates the passive molecule (tape). The general reaction scheme can be written as:

s_M + s_T → s_M + s'_T .   (16)

In experimental studies, self-replicating strings appeared frequently. In coevolution with parasites, an evolutionary arms race started among these species, and the self-replicating strings diversified to an extent that the parasites could not coadapt and went extinct. In a spatial environment (e.g., a 2D lattice), sets of cooperating polymers evolved, interacting in a hypercyclic fashion. The authors also observed a chemoton-like [30] cooperative behavior, with spatially isolated, membrane-bounded, evolutionarily stable molecular organizations.

Machines with Fixed Tape Size

In order to perform large-scale systematic simulation studies, like the investigation of noise [40] or intrinsic evolution [16], it makes sense to limit the study to molecules of fixed, tractable length. Ikegami and Hashimoto [40] developed an abstract artificial chemistry with two types of molecular species: 7-bit long tapes, which are mapped to 16-bit long machines. Tapes and machines form two separate populations, simulated in parallel. Reactions take place between a tape and a machine according to the following reaction scheme:

s_M + s_T → s_M + s_T + s'_M + s'_T .   (17)

A machine s_M can react with a tape s_T if its head matches a substring of the tape s_T and its tail matches a different substring of the tape s_T. The machine operates only between these two substrings (called the reading frame), which results in a tape s'_T. The tape s'_T is then "translated" (folded) into a machine s'_M. Ikegami and Hashimoto [40] showed that under the influence of low noise, simple autocatalytic loops are formed. When the noise level is increased, the reaction network is destabilized by parasites, but after a relatively long transient phase (about 1000 generations) a very stable, dense reaction network appears, called a core network [40]. A core network maintains its relatively high diversity even if the noise is deactivated. The active mutation rate is high. When the noise level is very high, only small, degenerated core networks emerge, with a low diversity and very low (or even no) active mutation. The core networks which emerge under the influence of external noise are very stable, so that there is no further development after they have appeared.

Assembler Automata

An assembler automaton is like a parallel von Neumann machine. It consists of a core memory and a set of processing units running in parallel. Inspired by Core Wars [14], assembler automata have been used to create certain artificial life systems, such as Coreworld [72,73], Tierra [74], Avida [1], and Amoeba [69]. Although these machines have been classified as artificial chemistries [72], it is in general difficult to identify molecules or reactions. Furthermore, the assembler automaton Tierra has explicitly
been designed to imitate the Cambrian explosion, and not a chemical process. Nevertheless, in some cases we can interpret the execution of an assembler automaton as a chemical process; this is possible in a particularly clear way in experiments with Avida. Here, a molecule is a single assembler program, which is protected by a memory management system. The system is initialized with manually written programs that are able to self-replicate. Therefore, in a basic version of Avida, only unimolecular first-order reactions occur, which are of replicator type. The reaction scheme can be written as a → a + mutation(a). The function mutation represents the possible errors that can occur while the program is self-replicating.

Lattice Molecular Systems

In this section we discuss systems which consist of a regular lattice, where each lattice site can hold a part (e.g., an atom) of a molecule. Between parts, bonds can be formed, so that a molecule covers many lattice sites. This is different from systems where a lattice site holds a complete molecule, like in Avida. The important difference is that in lattice molecular systems, the space of the molecular structure is identical to the space in which the molecules are floating around. In systems where a molecule covers just one lattice site, the molecular structure is described in a different space, independently of the space in which the molecule is located. Lattice molecular systems have been intensively used to model polymers [92], protein folding [82], and RNA structures. Besides approaches which intend to model real molecular dynamics as accurately as possible, there are also approaches which try to build abstract models. These models should give insight into statistical properties of polymers, like their energy landscapes and folding processes [76,77], but are not meant to address questions concerning the origin and evolution of self-maintaining organizations or molecular information processing.
For these questions, more abstract systems are studied, as described in the following. Varela, Maturana, and Uribe introduced in [93] a lattice molecular system to illustrate their concept of autopoiesis (cf. [65,99]). The system consists of a 2D square lattice. Each lattice site can be occupied by one of the following atoms: substrate S, catalyst K, and monomer L. Atoms may form bonds and thus form molecular structures on the lattice. If molecules come close to each other, they may react according to the following reaction rules:

(1) Composition (formation of a monomer): K + 2S → K + L
(2) Concatenation: ...LL + L → ...LLL and L + L → LL, the formation of a bond between a monomer and another monomer with no more than one bond. This reaction is inhibited by a double-bonded monomer [65].
(3) Disintegration (decomposition of a monomer): L → 2S

Artificial Chemistry, Figure 3 Illustration of a lattice molecular automaton that contains an autopoietic entity. Its membrane is formed by a chain of L monomers and encloses a catalyst K. Only substrate S may diffuse through the membrane. Substrate inside is catalyzed by K to form free monomers. If the membrane is damaged by disintegration of a monomer L, it can be quickly repaired by a free monomer. See [65,93] for details

Figure 3 illustrates an autopoietic entity that may arise. Note that it is quite difficult to find the right dynamical conditions under which such autopoietic structures are stable [65,68].
Cellular Automata

Cellular automata (see Mathematical Basis of Cellular Automata, Introduction to) can be used as a medium to simulate chemistry-like processes in various ways. An obvious approach is to use the cellular automaton to model space, where each cell can hold a molecule (as in Avida) or an atom (as in lattice molecular automata). However, there are approaches where it is not clear at the onset what a molecule or reaction is. The model specification does not contain any notion of a molecule or reaction, so that an observer has to identify them. Molecules can be equated
with self-propagating patterns like gliders, with self-reproducing loops [54,80], or with the moving boundary between two homogeneous domains [37]. Note that in the latter case, particles become visible as space-time structures, which are defined by boundaries and not by connected sets of cells with specific states. An interaction between these boundaries is then interpreted as a reaction.

Mechanical Artificial Chemistries

There are also physical systems that can be regarded as artificial chemistries. Hosokawa et al. [38] presented a mechanical self-assembly system consisting of triangular shapes that form bonds by permanent magnets. Interpreting attachment and detachment as chemical reactions, Hosokawa et al. [38] derived a chemical differential equation (Subsect. "The Chemical Differential Equation") modeling the kinetics of the structures appearing in the system. Another approach uses millimeter-sized magnetic discs at a liquid–air interface, subject to a magnetic field produced by a rotating permanent magnet [33]. These magnetic discs exhibit various types of structures, which might even interact in a way resembling chemical reactions. The important difference from the first approach is that the rotating magnetic discs form dissipative structures, which require a continuous energy supply.

Artificial Chemistry, Figure 4 Example of mechanical artificial chemistries. a Self-assembling magnetic tiles by Hosokawa et al. [38]. b Rotating magnetic discs by Grzybowski et al. [33]. Figures taken (and modified) from [33,38]
Semi-Realistic Molecular Structures and Graph Rewriting

Recent developments in artificial chemistry aim at more realistic molecular structures and reactions, like the oo chemistry by Bersini [10] (see also [57]) or toy chemistry [9]. These approaches apply graph rewriting, which is a powerful and quite universal approach for defining transformations of graph-like structures [50]. In toy chemistry, molecules are represented as labeled graphs, i.e., by their structural formulas; their basic properties are derived by a simplified version of the extended Hückel MO theory that operates directly on the graphs; chemical reaction mechanisms are implemented as graph-rewriting rules acting on the structural formulas; and reactivities and selectivities are modeled by a variant of frontier molecular orbital theory based on the extended Hückel scheme. Figure 5 shows an example of a graph-rewriting rule for unimolecular reactions.

Space

Many approaches discussed above assume that the reaction vessel is well-stirred. However, especially in living systems, space plays an important role. Cells, for example, are not a bag of enzymes, but are spatially highly structured.
Artificial Chemistry, Figure 5
An example of a graph-rewriting rule (top) and its application to the synthesis of a bridge ring system (bottom). Figure from [9]
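A graph-rewriting rule, as used in toy chemistry, matches a subgraph of a molecule's structural formula and rewrites its bonds. The following sketch is deliberately abstract and illustrative: the representation and the "ring closure" rule are invented for this example and are not the actual rules of [9].

```python
# A molecule is a graph: node labels (atoms) plus a set of undirected bonds.
class Molecule:
    def __init__(self, atoms, bonds):
        self.atoms = dict(atoms)              # node id -> element label
        self.bonds = {frozenset(b) for b in bonds}

# Illustrative unimolecular rule: close a three-membered ring.
# Pattern: X-Y-Z bonded in a chain with X, Z not bonded -> add bond X-Z.
def ring_closure(mol):
    """Return all molecules reachable by one application of the rule."""
    products = []
    nodes = list(mol.atoms)
    for y in nodes:
        nbrs = [n for n in nodes if frozenset((y, n)) in mol.bonds]
        for x in nbrs:
            for z in nbrs:
                if x < z and frozenset((x, z)) not in mol.bonds:
                    products.append(Molecule(mol.atoms,
                                             mol.bonds | {frozenset((x, z))}))
    return products

# a propane-like chain C1-C2-C3 closes to a cyclopropane-like ring
chain = Molecule({1: "C", 2: "C", 3: "C"}, [(1, 2), (2, 3)])
rings = ring_closure(chain)
print(len(rings))
```

Matching the left-hand side of a rule is, in general, a subgraph-isomorphism problem; real graph-rewriting engines such as those cited in [50] handle this systematically.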
Moreover, it is assumed that space has played an important role in the origin of life.

Techniques for Modeling Space

In chemistry, space is usually modeled by assuming a Euclidean space (with partial differential equations or particle-based models) or by compartments, which we also find in many artificial chemistries [29,70]. However, some artificial chemistries use more abstract, "computer friendly" spaces, such as a core memory as in Tierra [74] or a planar triangular graph [83]. There are systems like MGS [31] that allow one to specify various topological structures easily. Approaches like P systems and membrane computing [70] even allow one to change the spatial structure dynamically (see Membrane Computing).

Phenomena in Space

In general, space delays the extinction of species by fitter species, which is, for example, used in Avida to obtain more complex evolutionary patterns [1]. Space usually leads to higher diversity. Moreover, systems that are unstable in a well-stirred reaction vessel can become stable in a spatial setting. An example is the hypercycle [20], which stabilizes against parasites when space is introduced [11]. Conversely, space can destabilize an equilibrium, leading to symmetry breaking, which has been suggested as an important mechanism underlying morphogenesis [91]. Space can support the coexistence of chemical species that would not coexist in a well-stirred system [49]. Space is also necessary for the formation of autopoietic structures [93] and the formation of units that undergo Darwinian evolution [24].
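Compartment-based space can be sketched as multisets of molecules held in nested membranes, with reaction rules local to each compartment and transport rules moving molecules across membranes. The toy illustration below is in the spirit of P systems but is not the formal model of [70]; all names and the rule are invented for the example.

```python
from collections import Counter

# Two nested compartments: an inner vesicle inside an outer vessel.
state = {"outer": Counter({"a": 5}), "inner": Counter({"b": 2})}

def react(comp):
    """Local rule a + a -> b, applied once if possible."""
    if state[comp]["a"] >= 2:
        state[comp]["a"] -= 2
        state[comp]["b"] += 1

def transport(src, dst, mol):
    """Move one molecule across the membrane, if present."""
    if state[src][mol] > 0:
        state[src][mol] -= 1
        state[dst][mol] += 1

react("outer")                      # outer: two a consumed, one b produced
transport("outer", "inner", "b")    # the new b crosses into the vesicle
print(dict(state["outer"]), dict(state["inner"]))
```

Because each rule only touches one compartment (or one membrane crossing), well-stirred kinetics can be kept locally while the global behavior depends on the compartment topology.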
Theory

There is a large body of theory from domains like chemical reaction system theory [21], Petri net theory [71], and rewriting system theory (e.g., P systems [70]) which also applies to artificial chemistries. In this section, however, we shall investigate in more detail those theoretical concepts that have emerged within artificial chemistry research. When working with complex artificial chemistries, we are usually more interested in the qualitative than the quantitative nature of the dynamics. That is, we study how the set of molecular species present in the reaction vessel changes over time, rather than studying a detailed trajectory in concentration space.

The most prominent qualitative concept is the autocatalytic set, which has been proposed as an important element in the origin of life [19,46,75]. An autocatalytic set is a set of molecular species where each species is produced by at least one catalytic reaction within the set [41,47] (Fig. 1). This property has also been called self-maintaining [27] or (catalytic) closure [48]. Formally: a set of species A ⊆ M is called an autocatalytic set (or, sometimes, catalytically closed set) if for every species¹ a ∈ A there is a catalyst a′ ∈ A and a reaction ρ ∈ R such that a′ is a catalyst in ρ (i.e., a′ ∈ LHS(ρ) and a′ ∈ RHS(ρ)), a is produced by ρ (i.e., a ∈ RHS(ρ)), and ρ can take place in A (i.e., LHS(ρ) ⊆ A).

Example 1 (autocatalytic sets) R = {a → 2a, a → a + b, a → ∅, b → ∅}. Molecule a catalyzes its own production and the production of b. Both molecules are subject to a dilution flow (or, equivalently, spontaneous decay), which is usually assumed in ACs studying autocatalytic sets. In this example there are three autocatalytic sets: two nontrivial autocatalytic sets, {a} and {a, b}, and the empty set {}, which is, from a mathematical point of view, also an autocatalytic set. The term "autocatalytic set" makes sense only in ACs where catalysis is possible; it is not useful when applied to arbitrary reaction networks.
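The definition above can be checked mechanically. A minimal sketch (the encoding of reactions as pairs of multisets and the function name are illustrative, not from the article), applied to the network of Example 1, where a dilution reaction has an empty right-hand side:

```python
# Example 1: R = {a -> 2a, a -> a + b, a -> , b -> }; a catalyst appears
# on both sides of a reaction.
R = [({"a": 1}, {"a": 2}),          # a -> 2a
     ({"a": 1}, {"a": 1, "b": 1}),  # a -> a + b
     ({"a": 1}, {}),                # a ->   (dilution)
     ({"b": 1}, {})]                # b ->   (dilution)

def is_autocatalytic(A, reactions):
    """Every a in A must be produced by a reaction that can run in A
    (LHS within A) and that is catalyzed by some member of A."""
    for a in A:
        if not any(set(lhs) <= A
                   and any(c in lhs and c in rhs for c in A)
                   and a in rhs
                   for lhs, rhs in reactions):
            return False
    return True

subsets = [set(), {"a"}, {"b"}, {"a", "b"}]
print([s for s in subsets if is_autocatalytic(s, R)])
# the three autocatalytic sets of Example 1: {}, {a}, and {a, b}
```

The set {b} fails because the only reaction that can run in {b} is the dilution b → ∅, which produces nothing.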
For this reason, Dittrich and Speroni di Fenizio [17] introduced a general notion of self-maintenance, which includes the autocatalytic set as a special case. Formally: given a reaction network ⟨M, R⟩ with m = |M| molecules and r = |R| reactions, let S = (s_{a,ρ}) be the (m × r) stoichiometric matrix implied by the reaction rules R, where s_{a,ρ} denotes the number of molecules of type a produced in reaction ρ. A set of molecules C ⊆ M is called self-maintaining if there exists a flux vector v ∈ ℝ^r such that the following three conditions apply: (1) for all reactions ρ that can take place in C (i.e., LHS(ρ) ⊆ C), the flux v_ρ > 0; (2) for all remaining reactions (i.e., LHS(ρ) ⊄ C), the flux v_ρ = 0; and (3) for all molecules a ∈ C, the production rate (Sv)_a ≥ 0. Here v_ρ denotes the element of v describing the flux (i.e., rate) of reaction ρ, and (Sv)_a is the production rate of molecule a given flux vector v.

In Example 1 there are three self-maintaining (autocatalytic) sets. Interestingly, there are self-maintaining sets that cannot make up a stationary state. In our Example 1, only two of the three self-maintaining sets are species combinations that can make up a stationary state (i.e., a state x′ for which 0 = Sv(x′) holds). The self-maintaining set {a} cannot make up a stationary state because a generates b through the reaction a → a + b. Thus, there is no stationary state (of Eq. (3) with Assumptions 1 and 2) in which the concentration of a is positive while the concentration of b is zero.

In order to filter out those less interesting self-maintaining sets, Fontana and Buss [27] introduced a concept taken from mathematics called closure. Formally: a set of species A ⊆ M is closed if for all reactions ρ with LHS(ρ) ⊆ A (the reactions that can take place in A), the products are also contained in A, i.e., RHS(ρ) ⊆ A.

Closure and self-maintenance lead to the important concept of a chemical organization [17,27]: given an arbitrary reaction network ⟨M, R⟩, a set of molecular species that is closed and self-maintaining is called an organization. The importance of an organization is illustrated by a theorem roughly saying that, given a fixed point of the chemical ODE (Eq. (3)), the species with positive concentrations form an organization [17]. In other words, we only have to check those species combinations for stationary states that are organizations.

¹ For general reaction systems, this definition has to be refined. When A contains species that are part of the inflow, like a and b in Fig. 1, but which are not produced in a catalytic way, we might want them to be part of an autocatalytic set. Assume, for example, the set A = {a, b, aa, ba} from Fig. 1, where aa catalyzes the production of aa and ba, while using up "substrate" a and b, which are not catalytically produced.
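Closure is a simple set check, while self-maintenance asks for a feasible flux vector, which in general calls for linear programming. For the tiny network of Example 1, a brute-force search over a small grid of candidate fluxes suffices. The sketch below is illustrative (names and the grid are assumptions, not from the article):

```python
from itertools import combinations, product

species = ["a", "b"]
# Example 1 as (consumed, produced) multisets; last two are the dilution flow
R = [({"a": 1}, {"a": 2}),          # a -> 2a
     ({"a": 1}, {"a": 1, "b": 1}),  # a -> a + b
     ({"a": 1}, {}),                # a ->
     ({"b": 1}, {})]                # b ->

def stoich(s, lhs, rhs):            # S[s, rho] = produced - consumed
    return rhs.get(s, 0) - lhs.get(s, 0)

def is_closed(C):
    return all(set(rhs) <= C for lhs, rhs in R if set(lhs) <= C)

def is_self_maintaining(C, grid=(0.1, 1.0, 2.0, 4.0)):
    active = [r for r, (lhs, _) in enumerate(R) if set(lhs) <= C]
    if not active:                  # no applicable reaction: trivially OK
        return True
    # search positive fluxes for active reactions (inactive ones get flux 0)
    for v in product(grid, repeat=len(active)):
        rates = {s: sum(v[k] * stoich(s, *R[r])
                        for k, r in enumerate(active)) for s in C}
        if all(rate >= 0 for rate in rates.values()):
            return True
    return False

orgs = [set(c) for n in range(3) for c in combinations(species, n)
        if is_closed(set(c)) and is_self_maintaining(set(c))]
print(orgs)   # the organizations of Example 1: {} and {a, b}
```

As the text notes, {a} is self-maintaining but not closed (it produces b), so only {} and {a, b} are organizations, matching the candidates for stationary states.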
The set of all organizations can be visualized nicely by a Hasse diagram, which sketches the hierarchical (organizational) structure of the reaction network (Fig. 6). The dynamics of the artificial chemistry can then be explained within this Hasse diagram as a movement from organization to organization [61,84]. Note that in systems under flow condition, the (finite) set of all organizations forms an algebraic lattice [17]. That is, given two organizations, there is always a unique organization union and organization intersection. Although the concept of autopoiesis [93] has been described informally in quite some detail, a stringent formal definition is lacking. In a way, we can interpret the formal concepts above as an attempt to approach necessary properties of an autopoietic system step by step by precise formal means.

Artificial Chemistry, Figure 6
Lattice of organizations of the autocatalytic network shown in Fig. 1. An organization is a closed and self-maintaining set of species. Two organizations are connected by a line if one is contained in the other and there is no organization in between. The vertical position of an organization is determined by the number of species it contains

Obviously, being (contained in) at least one organization is a necessary condition for a chemical autopoietic system. But it is not sufficient. Missing are a notion of robustness and a spatial concept that formalizes a system's ability to maintain (and self-create) its own identity, e.g., through maintaining a membrane (Fig. 3).

Evolution

With artificial chemistries we can study chemical evolution. Chemical evolution (also called "prebiotic evolution") describes the first step in the development of life, such as the formation of complex organic molecules from simpler (in)organic compounds [62]. Because there are no prebiotic fossils, the study of chemical evolution has to rely on experiments [56,66] and theoretical (simulation) models [19]. Artificial chemistries aim at capturing the
constructive nature of these chemical systems and try to reproduce their evolution in computer simulations. The approaches can be distinguished by whether the evolution is driven by external operators like mutation, or whether variation and selection appear intrinsically through the chemical dynamics itself.

Extrinsic Evolution

In extrinsic approaches, an external variation operator changes the reaction network by adding, removing, or manipulating a reaction, which may also lead to the addition or removal of chemical species. In this approach, a molecule does not need to possess a structure. The following example by Jain and Krishna [41] shows that this approach allows one to create an evolving system quite elegantly. Let us assume that the reaction network consists of m species M = {1, …, m}. There are first-order catalytic reaction rules of the form (i → i + j) ∈ R and a general dilution flow (a → ∅) ∈ R for all species a ∈ M. The reaction network is completely described by a directed graph represented by the adjacency matrix C = (c_{i,j}), i, j ∈ M, c_{i,j} ∈ {0, 1}, with c_{i,j} = 1 if molecule j catalyzes the production of i, and c_{i,j} = 0 otherwise. To prevent self-replicators from dominating the system, Jain and Krishna assume c_{i,i} = 0 for all molecules i ∈ M. At the beginning, the reaction network is randomly initialized: for i ≠ j, c_{i,j} = 1 with probability p and c_{i,j} = 0 otherwise. In order to simulate the dynamics, we assume a population of molecules represented by the concentration vector x = (x_1, …, x_m), where x_i represents the current concentration of species i. The whole system is simulated in the following way:

Step 1: Simulate the chemical differential equation

    ẋ_i = Σ_{j∈M} c_{i,j} x_j − x_i Σ_{k,j∈M} c_{k,j} x_j    (18)

until a steady state is reached. Note that this steady state is usually independent of the initial concentrations, cf. [85].
Step 2: Select the "mutating" species i, which is the species with the smallest concentration in the steady state.
Step 3: Update the reaction network by replacing the mutating species by a new species, which is created randomly in the same way as the initial species. That is, the entries of the ith row and ith column of the adjacency matrix C are replaced by randomly chosen entries with the same probability p as during initialization.
Step 4: Go to Step 1.

Note that there are two explicitly simulated time scales: a slow, evolutionary time scale at which the reaction network evolves (Steps 2 and 3), and a fast time scale at which the molecules catalytically react (Step 1).

In this model we can observe how an autocatalytic set inevitably appears after a period of disorder. After its arrival, the largest autocatalytic set rapidly increases its connectivity until it spans the whole network. Subsequently, the connectivity converges to a steady state determined by the rate of external perturbations. The resulting highly nonrandom network is not fully stable, so we can also study the causes of crashes and recoveries. For example, Jain and Krishna [44] identified that, in the absence of large external perturbations, the appearance of a new viable species is a major cause of large extinctions and recoveries. Furthermore, crashes can be caused by the extinction of a "keystone species". Note that for these observations, the new species created in Step 3 do not need to inherit any information from other species.

Intrinsic Evolution

Evolutionary phenomena can also be caused by the intrinsic dynamics of the (artificial) chemistry. In this case, external operators like those mutating the reaction network are not required. As opposed to the previous approach, we can define the whole reaction network at the outset and let only the composition of molecules present in the reaction vessel evolve. The reaction rules are (usually) defined by a structure-to-function mapping similar to those described in Sect. "Structure-to-Function Mapping".

Artificial Chemistry, Figure 7
Illustration of syntactically and semantically closed organizations. Each organization consists of an infinite number of molecular species (connected by a dotted line). A circle represents a molecule having the structure λx_1.λx_2.… [27]

The advantage of this approach is that we need not define an external operator changing the reaction network's topology. Furthermore, the evolution can be more realistic because, for example, molecular species that have vanished can reenter at a later time, which does not happen in an approach like the one described previously. Also, when we have a structure-to-function mapping, we can study how the structure of the molecules is related to the emergence of chemical organizations. There are various approaches using, for example, Turing machines [63], lambda calculus [26,27], abstract automata [16], or combinators [83] for the structure-to-function mapping. In all of these approaches we can observe an important phenomenon: while the AC evolves, autocatalytic sets or, more generally, chemical organizations become visible. As in the system by Jain and Krishna, this effect is also indicated by an increase of the connectivity of the species within the population. When an organization becomes visible, the system has focused on a subspace of the whole set of possible molecules. Those molecules belonging to the emerged organization usually possess relatively high concentrations, because they are generated by many reactions. The closure of this set is usually smaller than the universe M. Depending on the setting, the emerged organization can consist of a single self-replicator, a small set of mutually producing molecules, or a large set of different molecules. Note that in the latter case the organization can be so large (even infinite) that not all of its molecules are present in the population (see Fig. 7 for an example). Nevertheless, the population can carry the organization if the system can keep a generating set of molecules within the population. An emerged organization can be quite stable, but it can also change, either through external perturbations like randomly inserted molecules, or through internal fluctuations caused by reactions among molecules that are not part of the emerged organization but have remained in small quantities in the population; see [61] for an example. Dittrich and Banzhaf [16] have shown that it is even possible to obtain evolutionary behavior without any externally generated variation, that is, without any inflow of random molecules and without any explicitly defined fitness function or selection process. Selection emerges as a result of the dilution flow and the limited population size.
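The extrinsic-evolution loop of Jain and Krishna (Steps 1-4) can be sketched compactly. The following NumPy version is a minimal illustration; the parameter values are arbitrary, and the steady state of Eq. (18) is approximated by iterating the dynamics rather than solved exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 20, 0.1

# random catalytic network: C[i, j] = 1 if j catalyzes production of i
C = (rng.random((m, m)) < p).astype(float)
np.fill_diagonal(C, 0.0)                    # no self-replicators

def steady_state(C, steps=2000, dt=0.01):
    x = np.full(m, 1.0 / m)
    for _ in range(steps):
        growth = C @ x                      # sum_j c_{i,j} x_j
        x += dt * (growth - x * growth.sum())   # Eq. (18)
        x = np.clip(x, 0.0, None)
        x /= x.sum()
    return x

for epoch in range(100):
    x = steady_state(C)                     # Step 1: fast catalytic dynamics
    i = int(np.argmin(x))                   # Step 2: least-fit species mutates
    C[i, :] = rng.random(m) < p             # Step 3: rewire row and column
    C[:, i] = rng.random(m) < p
    C[i, i] = 0.0
print("links in final network:", int(C.sum()))
```

Tracking `C.sum()` over the epochs shows the connectivity growth discussed above once an autocatalytic set takes over the network.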
Artificial Chemistry, Figure 8
Example of a self-evolving artificial chemistry. The figure shows the concentration of some selected species in a reaction vessel containing approximately 10⁴ different species. In this example, molecules are binary strings of length 32 bits. Two binary strings react by mapping one of them to a finite state machine operating on the second binary string. There is only a general, nonselective dilution flow. No other operators like mutation, variation, or selection are applied. Note that the structure of a new species tends to be similar to the structure of the species it has been created from. The dynamics are simulated by the algorithm of the Introduction. Population size k = 10⁶ molecules. Figure from [16]
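The reaction scheme of Figure 8 can be sketched as follows. The particular string-to-machine decoding below is invented for illustration; the actual mapping of Dittrich and Banzhaf [16] differs.

```python
# Molecules are 32-bit integers. One operand is decoded into a toy
# two-state machine (one output bit and one next-state bit per
# state/input pair -- a hypothetical encoding) that rewrites the
# other operand bit by bit.
def react(program, data, nbits=32):
    state, out = 0, 0
    for k in range(nbits):
        bit = (data >> k) & 1
        idx = (2 * (2 * state + bit)) % nbits   # table slot for (state, bit)
        out_bit = (program >> idx) & 1
        state = (program >> (idx + 1)) & 1
        out |= out_bit << k
    return out

a, b = 0xDEADBEEF, 0x12345678
print(f"{react(a, b):08X}")
```

The essential property is that the product is again a 32-bit string, so the set of molecules is closed under reaction, and products tend to share bit patterns with the program string, mirroring the structural similarity noted in the caption.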
Syntactic and Semantic Closure

In many "constructive", implicitly defined artificial chemistries we can observe the appearance of species that share syntactical and functional similarities which are invariant under the reaction operation. Fontana and Buss [27] called this phenomenon syntactic and semantic closure. Syntactic closure refers to the observation that the molecules within an emerged organization O ⊆ M are structurally similar; that is, they share certain structural features. This allows one to describe the set of molecules O compactly by a formal language or grammar. If we instead picked a subset A from M at random, the expected length of A's description would be on the order of its size. Assume, for example, that we pick one million strings of length 100 at random from the set of all strings of length 100; then we would need about 100 million characters to describe the resulting set. Interestingly, the organizations that appear in implicitly defined ACs can be described much more compactly by a grammar. Syntactic and semantic closure can be illustrated with an example taken from [27], where O ⊆ M is even infinite in size. In particular experiments with the lambda chemistry, molecules appeared that possess the following structure:

    A_{i,j} ≡ λx_1.λx_2. … λx_i.x_j    with    j ≤ i ,    (19)
for example λx_1.λx_2.λx_3.x_2. Syntactic closure means that we can specify such structural rules characterizing the molecules of O, and that these structural rules are invariant under the reaction mechanism: if molecules with the structure given by Eq. (19) react, their product can also be described by Eq. (19). Semantic closure means that we can describe the reactions taking place within O by referring to the molecules' grammatical structure (e.g., Eq. (19)). In our example, all reactions within O can be described by the following laws (illustrated by Fig. 7):

    for all i, j > 1 and all k, l:    A_{i,j} + A_{k,l} ⟹ A_{i−1,j−1} ;
    for all i > 1, j = 1 and all k, l:    A_{i,j} + A_{k,l} ⟹ A_{k+i−1,l+i−1} .    (20)

For example, λx_1.λx_2.λx_3.λx_4.x_3 + λx_1.λx_2.x_2 ⟹ λx_1.λx_2.λx_3.x_2 and λx_1.λx_2.x_1 + λx_1.λx_2.λx_3.λx_4.x_3 ⟹ λx_1.λx_2.λx_3.λx_4.λx_5.x_4, respectively (Fig. 7). As a result of the semantic closure we need not refer to the underlying reaction mechanism (e.g., the lambda calculus). Instead, we can explain the reactions within O on a more abstract level by referring to the grammatical structure (e.g., Eq. (19)).

Note that syntactic and semantic closure also appear in real chemical systems and are exploited by chemists to organize chemical explanations. It might even be stated that without this phenomenon, chemistry as we know it would not be possible.

Information Processing

The chemical metaphor gives rise to a variety of computing concepts, which are explained in detail in further articles of this encyclopedia. In approaches like amorphous computing (see Amorphous Computing), chemical mechanisms are used as a subsystem for communication between a huge number of simple, spatially distributed processing units. In membrane computing (see Membrane Computing), rewriting operations are used not only to react molecules, but also to model active transport between compartments and to change the spatial compartment structure itself. Besides these theoretical and in-silico artificial chemistries, there are approaches that aim at using real molecules. Spatial patterns like waves and solitons in excitable media can be exploited to compute (see Reaction-Diffusion Computing and Computing in Geometrical Constrained Excitable Chemical Systems), whereas other approaches like conformational computing and DNA computing (see DNA Computing) do not rely on spatially structured reaction vessels. Other closely related topics are molecular automata (see Molecular Automata) and bacterial computing (see Bacterial Computing).

Future Directions

Currently, we can observe a convergence of artificial chemistries and chemical models towards more realistic artificial chemistries [9]. On the one hand, models of systems biology are inspired by techniques from artificial chemistries, e.g., in the domain of rule-based modeling [22,34]. On the other hand, artificial chemistries become more realistic, for example, by adopting more realistic molecular structures or by using techniques from computational chemistry to calculate energies [9,57]. Furthermore, the relation between artificial chemistries and real biological systems is made more and more explicit [25,45,58].

In the future, novel theories and techniques have to be developed to handle complex, implicitly defined reaction systems. This development will especially be driven by the needs of systems biology, as implicitly defined (i.e., rule-based) models become more frequent.

Another open challenge lies in the creation of realistic (chemical) open-ended evolutionary systems. For example, an artificial chemistry with "true" open-ended evolution, or at least a satisfyingly long transient phase comparable to the natural process in which novelties continuously appear, has not been presented yet. The reason for this might be lacking computing power, insufficient manpower to implement what we know, or missing knowledge concerning the fundamental mechanisms of (chemical) evolution. Artificial chemistries could provide a powerful platform to study whether the mechanisms that we believe explain (chemical) evolution are indeed candidates. As sketched above, there is a broad range of currently explored practical application domains for artificial chemistries in technical artifacts, such as ambient computing, amorphous computing, organic computing, and smart materials. Eventually, artificial chemistry might come back to the realm of real chemistry and inspire the design of novel computing reaction systems, as in the fields of molecular computing, molecular communication, bacterial computing, and synthetic biology.
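As a closing illustration of the semantic closure discussed above: the reaction laws of Eq. (20) can be computed on index pairs alone, without running the underlying lambda calculus. A small sketch (the pair encoding is an assumption for this example):

```python
# A(i, j) stands for the lambda term  λx1.λx2. ... λxi.xj  with j <= i.
def react(a, b):
    """Apply the laws of Eq. (20) to A(i, j) + A(k, l)."""
    (i, j), (k, l) = a, b
    if i > 1 and j > 1:
        return (i - 1, j - 1)           # first law of Eq. (20)
    if i > 1 and j == 1:
        return (k + i - 1, l + i - 1)   # second law of Eq. (20)
    raise ValueError("laws not defined for i == 1")

# the two worked examples from the text:
# A(4,3) + A(2,2) => A(3,2)   and   A(2,1) + A(4,3) => A(5,4)
print(react((4, 3), (2, 2)), react((2, 1), (4, 3)))
# prints: (3, 2) (5, 4)
```

That the products again satisfy j ≤ i is precisely the syntactic closure of the family A_{i,j} under reaction.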
Bibliography

Primary Literature

1. Adami C, Brown CT (1994) Evolutionary learning in the 2D artificial life system Avida. In: Brooks RA, Maes P (eds) Proc artificial life IV. MIT Press, Cambridge, pp 377–381. ISBN 0262521903
2. Adleman LM (1994) Molecular computation of solutions to combinatorial problems. Science 266:1021
3. Bagley RJ, Farmer JD (1992) Spontaneous emergence of a metabolism. In: Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II. Addison-Wesley, Redwood City, pp 93–140. ISBN 0201525704
4. Banâtre JP, Métayer DL (1986) A new computational model and its discipline of programming. Technical Report RR-0566. INRIA, Rennes
5. Banzhaf W (1993) Self-replicating sequences of binary numbers – foundations I and II: General and strings of length n = 4. Biol Cybern 69:269–281
6. Banzhaf W (1994) Self-replicating sequences of binary numbers: The build-up of complexity. Complex Syst 8:215–225
7. Banzhaf W (1995) Self-organizing algorithms derived from RNA interactions. In: Banzhaf W, Eeckman FH (eds) Evolution and biocomputing. LNCS, vol 899. Springer, Berlin, pp 69–103
8. Banzhaf W, Dittrich P, Rauhe H (1996) Emergent computation by catalytic reactions. Nanotechnology 7(1):307–314
9. Benkö G, Flamm C, Stadler PF (2003) A graph-based toy model of chemistry. J Chem Inf Comput Sci 43(4):1085–1093. doi:10.1021/ci0200570
10. Bersini H (2000) Reaction mechanisms in the oo chemistry. In: Bedau MA, McCaskill JS, Packard NH, Rasmussen S (eds) Artificial life VII. MIT Press, Cambridge, pp 39–48
11. Boerlijst MC, Hogeweg P (1991) Spiral wave structure in prebiotic evolution: Hypercycles stable against parasites. Physica D 48(1):17–28
12. Breyer J, Ackermann J, McCaskill J (1999) Evolving reaction-diffusion ecosystems with self-assembling structure in thin films. Artif Life 4(1):25–40
13. Conrad M (1992) Molecular computing: The key-lock paradigm. Computer 25:11–22
14. Dewdney AK (1984) In the game called Core War hostile programs engage in a battle of bits. Sci Amer 250:14–22
15. Dittrich P (2001) On artificial chemistries. PhD thesis, University of Dortmund
16. Dittrich P, Banzhaf W (1998) Self-evolution in a constructive binary string system. Artif Life 4(2):203–220
17. Dittrich P, Speroni di Fenizio P (2007) Chemical organization theory. Bull Math Biol 69(4):1199–1231. doi:10.1007/s11538-006-9130-8
18. Ehricht R, Ellinger T, McCaskill JS (1997) Cooperative amplification of templates by cross-hybridization (CATCH). Eur J Biochem 243(1/2):358–364
19. Eigen M (1971) Self-organization of matter and the evolution of biological macromolecules. Naturwissenschaften 58(10):465–523
20. Eigen M, Schuster P (1977) The hypercycle: A principle of natural self-organisation, part A. Naturwissenschaften 64(11):541–565
21. Érdi P, Tóth J (1989) Mathematical models of chemical reactions: Theory and applications of deterministic and stochastic models. Princeton University Press, Princeton
22. Faeder JR, Blinov ML, Goldstein B, Hlavacek WS (2005) Rule-based modeling of biochemical networks. Complexity. doi:10.1002/cplx.20074
23. Farmer JD, Kauffman SA, Packard NH (1986) Autocatalytic replication of polymers. Physica D 22:50–67
24. Fernando C, Rowe J (2007) Natural selection in chemical evolution. J Theor Biol 247(1):152–167. doi:10.1016/j.jtbi.2007.01.028
25. Fernando C, von Kiedrowski G, Szathmáry E (2007) A stochastic model of nonenzymatic nucleic acid replication: Elongators sequester replicators. J Mol Evol 64(5):572–585. doi:10.1007/s00239-006-0218-4
26. Fontana W (1992) Algorithmic chemistry. In: Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II. Addison-Wesley, Redwood City, pp 159–210
27. Fontana W, Buss LW (1994) 'The arrival of the fittest': Toward a theory of biological organization. Bull Math Biol 56:1–64
28. Fontana W, Buss LW (1996) The barrier of objects: From dynamical systems to bounded organization. In: Casti J, Karlqvist A (eds) Boundaries and barriers. Addison-Wesley, Redwood City, pp 56–116
29. Furusawa C, Kaneko K (1998) Emergence of multicellular organisms with dynamic differentiation and spatial pattern. Artif Life 4:79–93
30. Gánti T (1975) Organization of chemical reactions into dividing and metabolizing units: The chemotons. Biosystems 7(1):15–21
31. Giavitto JL, Michel O (2001) MGS: A rule-based programming language for complex objects and collections. Electron Note Theor Comput Sci 59(4):286–304
32. Gillespie DT (1976) General method for numerically simulating stochastic time evolution of coupled chemical reactions. J Comput Phys 22(4):403–434
33. Grzybowski BA, Stone HA, Whitesides GM (2000) Dynamic self-assembly of magnetized, millimetre-sized objects rotating at a liquid-air interface. Nature 405(6790):1033–1036
34. Hlavacek W, Faeder J, Blinov M, Posner R, Hucka M, Fontana W (2006) Rules for modeling signal-transduction systems. Sci STKE 2006:re6
35. Hofbauer J, Sigmund K (1988) Dynamical systems and the theory of evolution. University Press, Cambridge
36. Hofstadter DR (1979) Gödel, Escher, Bach: An eternal golden braid. Basic Books Inc, New York. ISBN 0465026850
37. Hordijk W, Crutchfield JP, Mitchell M (1996) Embedded-particle computation in evolved cellular automata. In: Toffoli T, Biafore M, Leão J (eds) PhysComp96. New England Complex Systems Institute, Cambridge, pp 153–158
38. Hosokawa K, Shimoyama I, Miura H (1994) Dynamics of self-assembling systems: Analogy with chemical kinetics. Artif Life 1(4):413–427
39. Hutton TJ (2002) Evolvable self-replicating molecules in an artificial chemistry. Artif Life 8(4):341–356
40. Ikegami T, Hashimoto T (1995) Active mutation in self-reproducing networks of machines and tapes. Artif Life 2(3):305–318
41. Jain S, Krishna S (1998) Autocatalytic sets and the growth of complexity in an evolutionary model. Phys Rev Lett 81(25):5684–5687
42. Jain S, Krishna S (1999) Emergence and growth of complex networks in adaptive systems. Comput Phys Commun 122:116–121
43. Jain S, Krishna S (2001) A model for the emergence of cooperation, interdependence, and structure in evolving networks. Proc Natl Acad Sci USA 98(2):543–547
44. Jain S, Krishna S (2002) Large extinctions in an evolutionary model: The role of innovation and keystone species. Proc Natl Acad Sci USA 99(4):2055–2060. doi:10.1073/pnas.032618499
45. Kaneko K (2007) Life: An introduction to complex systems biology. Springer, Berlin
46. Kauffman SA (1971) Cellular homeostasis, epigenesis and replication in randomly aggregated macromolecular systems. J Cybern 1:71–96
47. Kauffman SA (1986) Autocatalytic sets of proteins. J Theor Biol 119:1–24
48. Kauffman SA (1993) The origins of order: Self-organization and selection in evolution. Oxford University Press, New York
49. Kirner T, Ackermann J, Ehricht R, McCaskill JS (1999) Complex patterns predicted in an in vitro experimental model system for the evolution of molecular cooperation. Biophys Chem 79(3):163–186
50. Kniemeyer O, Buck-Sorlin GH, Kurth W (2004) A graph grammar approach to artificial life. Artif Life 10(4):413–431. doi:10.1162/1064546041766451
51. Laing R (1972) Artificial organisms and autonomous cell rules. J Cybern 2(1):38–49
52. Laing R (1975) Some alternative reproductive strategies in artificial molecular machines. J Theor Biol 54:63–84
53. Laing R (1977) Automaton models of reproduction by self-inspection. J Theor Biol 66:437–456
54. Langton CG (1984) Self-reproduction in cellular automata. Physica D 10(1–2):135–144
55. Langton CG (1989) Artificial life. In: Langton CG (ed) Proc of artificial life. Addison-Wesley, Redwood City, pp 1–48
56. Lazcano A, Bada JL (2003) The 1953 Stanley L. Miller experiment: Fifty years of prebiotic organic chemistry. Orig Life Evol Biosph 33(3):235–242
57. Lenaerts T, Bersini H (2009) A synthon approach to artificial chemistry. Artif Life 9 (in press)
58. Lenski RE, Ofria C, Collier TC, Adami C (1999) Genome complexity, robustness and genetic interactions in digital organisms. Nature 400(6745):661–664
59. Lohn JD, Colombano S, Scargle J, Stassinopoulos D, Haith GL (1998) Evolution of catalytic reaction sets using genetic algorithms. In: Proc IEEE international conference on evolutionary computation. IEEE, New York, pp 487–492
60. Lugowski MW (1989) Computational metabolism: Towards biological geometries for computing. In: Langton CG (ed) Artificial life. Addison-Wesley, Redwood City, pp 341–368. ISBN 0201093464
61. Matsumaru N, Speroni di Fenizio P, Centler F, Dittrich P (2006) On the evolution of chemical organizations. In: Artmann S, Dittrich P (eds) Proc of the 7th German workshop of artificial life. IOS Press, Amsterdam, pp 135–146
62. Maynard Smith J, Szathmáry E (1995) The major transitions in evolution. Oxford University Press, New York
63. McCaskill JS (1988) Polymer chemistry on tape: A computational model for emergent genetics. Internal report. MPI for Biophysical Chemistry, Göttingen
64. McCaskill JS, Chorongiewski H, Mekelburg D, Tangen U, Gemm U (1994) Configurable computer hardware to simulate long-time self-organization of biopolymers. Ber Bunsenges Phys Chem 98(9):1114
65. McMullin B, Varela FJ (1997) Rediscovering computational autopoiesis. In: Husbands P, Harvey I (eds) Fourth European conference on artificial life. MIT Press, Cambridge, pp 38–47
66. Miller SL (1953) A production of amino acids under possible primitive earth conditions. Science 117(3046):528–529
67. Morris HC (1989) Typogenetics: A logic for artificial life. In: Langton CG (ed) Artificial life. Addison-Wesley, Redwood City, pp 341–368
68. Ono N, Ikegami T (2000) Self-maintenance and self-reproduction in an abstract cell model. J Theor Biol 206(2):243–253
69. Pargellis AN (1996) The spontaneous generation of digital "life". Physica D 91(1–2):86–96
70. Păun G (2000) Computing with membranes. J Comput Syst Sci 61(1):108–143
71. Petri CA (1962) Kommunikation mit Automaten. PhD thesis, University of Bonn
72. Rasmussen S, Knudsen C, Feldberg R, Hindsholm M (1990) The coreworld: Emergence and evolution of cooperative structures in a computational chemistry. Physica D 42:111–134
73. Rasmussen S, Knudsen C, Feldberg R (1992) Dynamics of programmable matter. In: Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II. Addison-Wesley, Redwood City, pp 211–291. ISBN 0201525704
74. Ray TS (1992) An approach to the synthesis of life. In: Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II. Addison-Wesley, Redwood City, pp 371–408
75. Rössler OE (1971) A system theoretic model for biogenesis. Z Naturforsch B 26(8):741–746
76. Sali A, Shakhnovich E, Karplus M (1994) How does a protein fold? Nature 369(6477):248–251
77. Sali A, Shakhnovich E, Karplus M (1994) Kinetics of protein folding: A lattice model study of the requirements for folding to the native state. J Mol Biol 235(5):1614–1636
78. Salzberg C (2007) A graph-based reflexive artificial chemistry. Biosystems 87(1):1–12
Artificial Chemistry
79. Sayama H (2009) Swarm chemistry. Artif Life. (in press) 80. Sayama H (1998) Introduction of structural dissolution into Langton’s selfreproducing loop. In: Adami C, Belew R, Kitano H, Taylor C (eds) Artificial life VI. MIT Press, Cambridge, pp 114–122 81. Segre D, BenEli D, Lancet D (2000) Compositional genomes: Prebiotic information transfer in mutually catalytic noncovalent assemblies. Proc Natl Acad Sci USA 97(8):4112–4117 82. Socci ND, Onuchic JN (1995) Folding kinetics of proteinlike heteropolymers. J Chem Phys 101(2):1519–1528 83. Speroni di Fenizio P (2000) A less abstract artficial chemistry. In: Bedau MA, McCaskill JS, Packard NH, Rasmussen S (eds) Artificial life VII. MIT Press, Cambridge, pp 49–53 84. Speroni Di Fenizio P, Dittrich P (2002) Artificial chemistry’s global dynamics. Movement in the lattice of organisation. J Three Dimens Images 16(4):160–163. ISSN 13422189 85. Stadler PF, Fontana W, Miller JH (1993) Random catalytic reaction networks. Physica D 63:378–392 86. Suzuki H (2007) Mathematical folding of node chains in a molecular network. Biosystems 87(2–3):125–135. doi:10. 1016/j.biosystems.2006.09.005 87. Suzuki K, Ikegami T (2006) Spatialpatterninduced evolution of a selfreplicating loop network. Artif Life 12(4):461–485. doi: 10.1162/artl.2006.12.4.461 88. Suzuki Y, Tanaka H (1997) Symbolic chemical system based on abstract rewriting and its behavior pattern. Artif Life Robotics 1:211–219 89. Tangen U, Schulte L, McCaskill JS (1997) A parallel hardware evolvable computer polyp. In: Pocek KL, Arnold J (eds) IEEE symposium on FPGAs for custopm computing machines. IEEE Computer Society, Los Alamitos
90. Thürk M (1993) Ein Modell zur Selbstorganisation von Automatenalgorithmen zum Studium molekularer Evolution. Ph D thesis, Universität Jena 91. Turing AM (1952) The chemical basis of morphogenesis. Phil Trans R Soc London B 237:37–72 92. Vanderzande C (1998) Lattice models of polymers. Cambridge University Press, Cambridge 93. Varela FJ, Maturana HR, Uribe R (1974) Autopoiesis: The organization of living systems. BioSystems 5(4):187–196 94. Varetto L (1993) Typogenetics: An artificial genetic system. J Theor Biol 160(2):185–205 95. Varetto L (1998) Studying artificial life with a molecular automaton. J Theor Biol 193(2):257–285 96. Vico G (1710) De antiquissima Italorum sapientia ex linguae originibus eruenda librir tres. Neapel 97. von Neumann J, Burks A (ed) (1966) The theory of selfreproducing automata. University of Illinois Press, Urbana 98. Zauner KP, Conrad M (1996) Simulating the interplay of structure, kinetics, and dynamics in complex biochemical networks. In: Hofestädt R, Lengauer T, Löffler M, Schomburg D (eds) Computer science and biology GCB’96. University of Leipzig, Leipzig, pp 336–338 99. Zeleny M (1977) Selforganization of living systems: A formal model of autopoiesis. Int J General Sci 4:13–28
Books and Reviews Adami C (1998) Introduction to artificial life. Springer, New York Dittrich P, Ziegler J, Banzhaf W (2001) Artificial chemistries – a review. Artif Life 7(3):225–275 Hofbauer J, Sigmund K (1998) Evolutionary games and population dynamics. Cambridge University Press, Cambridge
203
204
Artificial Intelligence in Modeling and Simulation

BERNARD ZEIGLER 1, ALEXANDRE MUZY 2, LEVENT YILMAZ 3
1 Arizona Center for Integrative Modeling and Simulation, University of Arizona, Tucson, USA
2 CNRS, Università di Corsica, Corte, France
3 Auburn University, Alabama, USA

Article Outline
Glossary
Definition of the Subject
Introduction
Review of System Theory and Framework for Modeling and Simulation
Fundamental Problems in M&S
AI-Related Software Background
AI Methods in Fundamental Problems of M&S
Automation of M&S
SES/Model Base Architecture for an Automated Modeler/Simulationist
Intelligent Agents in Simulation
Future Directions
Bibliography

Glossary

Behavior: The observable manifestation of an interaction with a system.
DEVS: Discrete Event System Specification, a formalism that describes models developed for simulation; applications include simulation-based testing of collaborative services.
Endomorphic agents: Agents that contain models of themselves and/or of other endomorphic agents.
Levels of interoperability: Levels at which systems can interoperate, such as syntactic, semantic, and pragmatic. The higher the level, the more effective the information exchange among participants.
Levels of system specification: Levels at which dynamic input/output systems can be described, known, or specified, ranging from behavioral to structural.
Metadata: Data that describe other data; a hierarchical concept in which metadata form a descriptive abstraction above the data they describe.
Model-based automation: Automation of system development and deployment that employs models or system specifications, such as DEVS, to derive artifacts.
Modeling and simulation ontology: The SES is interpreted as an ontology for the domain of hierarchical, modular simulation models specified with the DEVS formalism.
Netcentric environment: Network-centered, typically Internet-centered or web-centered, information exchange medium.
Ontology: Language that describes a state of the world from a particular conceptual view; usually pertains to a particular application domain.
Pragmatic frame: A means of characterizing the consumer's use of the information sent by a producer; formalized using the concept of a processing network model.
Pragmatics: Based on Speech Act Theory, pragmatics focuses on elucidating the intent of the semantics constrained by a given context. Metadata tags to support pragmatics include Authority, Urgency/Consequences, Relationship, Tense, and Completeness.
Predicate logic: An expressive form of declarative language that can describe ontologies using symbols for individuals, operations, variables, and functions with governing axioms and constraints.
Schema: An advanced form of XML document definition; extends the DTD concept.
Semantics: Semantics determines the content of messages in which information is packaged. The meaning of a message is the eventual outcome of the processing that it supports.
Sensor: Device that can sense or detect some aspect of the world or some change in such an aspect.
System specification: Formalism for describing or specifying a system. There are levels of system specification ranging from behavior to structure.
Service-oriented architecture: Web service architecture in which services are designed to be (1) accessed without knowledge of their internals through well-defined interfaces and (2) readily discoverable and composable.
Structure: The internal mechanism that produces the behavior of a system.
System entity structure: Ontological basis for modeling and simulation. Its pruned entity structures can describe both static data sets and dynamic simulation models.
Syntax: Prescribes the form of messages in which information is packaged.
UML: Unified Modeling Language, a software development language and environment that can be used for ontology development and has tools that map UML specifications into XML.
XML: eXtensible Markup Language provides a syntax for document structures containing tagged information

Artificial Intelligence in Modeling and Simulation, Table 1  Hierarchy of system specifications

Level  Name             What we specify at this level
4      Coupled Systems  System built up by several component systems that are coupled together
3      I/O System       System with state and state transitions to generate the behavior
2      I/O Function     Collection of input/output pairs constituting the allowed behavior, partitioned according to the initial state the system is in when the input is applied
1      I/O Behavior     Collection of input/output pairs constituting the allowed behavior of the system from an external black-box view
0      I/O Frame        Input and output variables and ports together with allowed values
where tag definitions set up the basis for semantic interpretation.

Definition of the Subject

This article discusses the role of Artificial Intelligence (AI) in Modeling and Simulation (M&S). AI is the field of computer science that attempts to construct computer systems that emulate human problem-solving behavior, with the goal of understanding human intelligence. M&S is a multidisciplinary field of systems engineering, software engineering, and computer science that seeks to develop robust methodologies for constructing computerized models, with the goal of providing tools that can assist humans in all activities of the M&S enterprise. Although each of these disciplines has its core community, there have been numerous intersections and cross-fertilizations between the two fields. From the perspective of this article, we view M&S as presenting some fundamental and very difficult problems whose solution may benefit from the concepts and techniques of AI.

Introduction

To state the M&S problems that may benefit from AI, we first briefly review a system-theory-based framework for M&S that provides a language and concepts for stating those problems precisely. We then introduce some key problem areas: verification and validation, reuse and composability, and distributed simulation and system of systems interoperability. After some further review of software and AI-related background, we outline some areas of AI that are directly applicable to these problems in M&S. To provide a unifying theme for the problems and solutions, we then raise the question of whether all of M&S can be automated into an integrated autonomous artificial modeler/simulationist. We proceed to explore an approach to developing such an intelligent agent and present a concrete means by which such an agent could engage in M&S. We close with consideration of an advanced feature that such an agent must have if it is to fully emulate human capability: the ability, to a limited but significant extent, to construct and employ models of its own "mind" as well as of the "minds" of other agents.

Review of System Theory and Framework for Modeling and Simulation

Hierarchy of System Specifications

Systems theory [1] deals with a hierarchy of system specifications that defines levels at which a system may be known or specified. Table 1 shows this hierarchy (in simplified form; see [2] for a full exposition). At level 0 we deal with the input and output interface of a system. At level 1 we deal with purely observational recordings of the behavior of a system: an I/O relation consisting of a set of pairs of input behaviors and associated output behaviors. At level 2 we add knowledge of the initial state when the input is applied. This allows partitioning the input/output pairs of level 1 into non-overlapping subsets, each subset associated with a different starting state. At level 3 the system is described by a state space and state transition functions. The transition function describes the state-to-state transitions caused by the inputs and the outputs generated thereupon. At level 4 a system is specified by a set of components and a coupling structure. The components are systems in their own right, with their own state sets and state transition functions, and the coupling structure defines how they interact. A property of coupled systems called "closure under coupling" guarantees that a coupled system at level 4 itself specifies a system at level 3. This property allows hierarchical construction of systems, i.e., that
coupled systems can be used as components in larger coupled systems. As we shall see in a moment, the system specification hierarchy provides a mathematical underpinning for defining a framework for modeling and simulation. Each of the entities (e.g., real world, model, simulation, and experimental frame) will be described as a system known or specified at some level of specification. The essence of modeling and simulation lies in establishing relations between pairs of system descriptions. These relations pertain to the validity of a system description at one level of specification relative to another system description at a different (higher, lower, or equal) level of specification. On the basis of the arrangement of system levels as shown in Table 1, we distinguish between vertical and horizontal relations. A vertical relation is called an association mapping; it takes a system at one level of specification and generates its counterpart at another level of specification. The downward motion in the structure-to-behavior direction formally represents the process by which the behavior of a model is generated. This is relevant in simulation and testing, where the model generates behavior that can then be compared with the desired behavior. The opposite, upward mapping relates a system description at a lower level with one at a higher level of specification. While the downward association of specifications is straightforward, the upward association is much less so, because in the upward direction information is introduced, while in the downward direction information is reduced. Many structures exhibit the same behavior, and recovering a unique structure from a given behavior is not possible. The upward direction, however, is fundamental in the design process, where a structure (system at level 3) has to be found that is capable of generating the desired behavior (system at level 1).
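The downward association can be made concrete with a short sketch. The following is a hypothetical illustration, not from the article: all names and the toy saturating counter are invented. It generates Level 1 behavior (input/output pairs) from a Level 3 specification given as a state transition function and an output function.

```python
# Hypothetical sketch: downward association from a Level 3 specification
# (state space + transition/output functions) to Level 1 behavior (I/O pairs).

def generate_behavior(delta, lam, initial_state, input_segment):
    """Run the state system on an input segment and record the
    resulting output segment, yielding observable I/O pairs."""
    state, outputs = initial_state, []
    for x in input_segment:
        state = delta(state, x)      # state-to-state transition caused by input
        outputs.append(lam(state))   # output generated thereupon
    return list(zip(input_segment, outputs))

# Toy Level 3 system (illustrative): a counter that saturates at 3.
delta = lambda s, x: min(s + x, 3)
lam = lambda s: "full" if s == 3 else "filling"

behavior = generate_behavior(delta, lam, 0, [1, 1, 1, 1])
print(behavior)  # [(1, 'filling'), (1, 'filling'), (1, 'full'), (1, 'full')]
```

Note that the reverse direction cannot be sketched this simply: many choices of `delta` and `lam` would reproduce the same list of I/O pairs, which is exactly why the upward association is not unique.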
Framework for Modeling and Simulation

The framework for M&S as described in [2] establishes entities and their relationships that are central to the M&S enterprise (see Fig. 1). The entities of the framework are: source system, model, simulator, and experimental frame; they are related by the modeling and the simulation relationships. Each entity is formally characterized as a system at an appropriate level of specification of a generic dynamic system.

Source System

The source system is the real or virtual environment that we are interested in modeling. It is viewed as a source of
Artificial Intelligence in Modeling and Simulation, Figure 1 Framework entities and relationships
observable data, in the form of time-indexed trajectories of variables. The data that have been gathered from observing or otherwise experimenting with a system are called the system behavior database. These data are viewed or acquired through experimental frames of interest to the model developer and user. As we shall see, in the case of model validation, these data are the basis for comparison with data generated by a model. Thus, these data must be sufficient to enable reliable comparison, as well as being accepted by both the model developer and the test agency as the basis for comparison. Data sources for this purpose might be measurements taken in prior experiments, mathematical representations of the measured data, or expert knowledge of the system behavior supplied by accepted subject matter experts.

Experimental Frame

An experimental frame is a specification of the conditions under which the system is observed or experimented with [3]. An experimental frame is the operational formulation of the objectives that motivate an M&S project. A frame is realized as a system that interacts with the system of interest to obtain the data of interest under specified conditions. An experimental frame specification consists of four major subsections:

Input stimuli: Specification of the class of admissible input time-dependent stimuli. This is the class from which individual samples will be drawn and injected into the model or system under test for particular experiments.
Control: Specification of the conditions under which the model or system will be initialized, continued under examination, and terminated.
Artificial Intelligence in Modeling and Simulation, Figure 2 Experimental frame and components
Metrics: Specification of the data summarization functions and the measures to be employed to provide quantitative or qualitative measures of the input/output behavior of the model. Examples of such metrics are performance indices, goodness-of-fit criteria, and error accuracy bounds.
Analysis: Specification of the means by which the results of data collection in the frame will be analyzed to arrive at final conclusions.

The data collected in a frame consist of pairs of input/output time functions. When an experimental frame is realized as a system to interact with the model or system under test, the specifications become components of the driving system. For example, a generator of output time functions implements the class of input stimuli. Many experimental frames can be formulated for the same system (both source system and model), and the same experimental frame may apply to many systems. Why would we want to define many frames for the same system? Or apply the same frame to many systems? For the same reason that we might have different objectives in modeling the same system, or have the same objective in modeling different systems. There are two equally valid views of an experimental frame. One views a frame as a definition of the type of data elements that will go into the database. The second views a frame as a system that interacts with the system of interest to obtain the data of
interest under specified conditions. In this view, the frame is characterized by its implementation as a measurement system or observer. In this implementation, a frame typically has three types of components (as shown in Figs. 2 and 3): a generator, which generates input segments to the system; an acceptor, which monitors an experiment to see that the desired experimental conditions are met; and a transducer, which observes and analyzes the system output segments.
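The three frame components can be sketched in code. The following is a hypothetical illustration, not from the article: the queueing-style "system under test", the threshold, and all names are invented. It follows the round-trip-time pattern discussed below: a generator produces request service times, a transducer summarizes round trips into metrics, and an acceptor checks the performance objective.

```python
# Hypothetical sketch of experimental frame components:
# generator, transducer, and acceptor wrapped around a system under test.
import random

def generator(n, seed=0):
    """Generator: produces an input segment of request service times."""
    rng = random.Random(seed)
    return [rng.uniform(0.1, 0.5) for _ in range(n)]

def system_under_test(requests):
    """Stand-in for the model/system under test: each request's round trip
    time is its service time plus a fixed (invented) network overhead."""
    return [t + 0.05 for t in requests]

def transducer(round_trips):
    """Transducer: observes output segments and summarizes them as metrics."""
    return {"avg_rtt": sum(round_trips) / len(round_trips),
            "throughput": len(round_trips)}

def acceptor(metrics, rtt_threshold=0.6):
    """Acceptor: monitors whether the desired experimental conditions
    (here, a performance objective) are met."""
    return metrics["avg_rtt"] <= rtt_threshold

requests = generator(100)
metrics = transducer(system_under_test(requests))
print(acceptor(metrics))  # True: average round trip time meets the objective
```

Swapping in a different acceptor or transducer changes the experiment without touching the system under test, which mirrors why many frames can be applied to the same system.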
Artificial Intelligence in Modeling and Simulation, Figure 3 Experimental frame and its components
Figure 2b illustrates a simple, but ubiquitous, pattern for experimental frames that measure typical job-processing performance metrics, such as round trip time and throughput. Illustrated in the web context, a generator produces service request messages at a given rate. The time that elapses between the sending of a request and its return from a server is the round trip time. A transducer notes the departures and arrivals of requests, allowing it to compute the average round trip time and other related statistics, as well as the throughput and the number of unsatisfied (or lost) requests. An acceptor notes whether performance achieves the developer's objectives, for example, whether the throughput exceeds the desired level and/or whether, say, 99% of the round trip times are below a given threshold. Objectives for modeling relate to the role of the model in systems design, management, or control. Experimental frames translate the objectives into more precise experimentation conditions for the source system or its models. We can distinguish between objectives for verification and validation of (a) models and (b) systems. In the case of models, a model under test is expected to be valid for the source system in each such frame. Having stated the objectives, there is presumably a best level of resolution to answer the questions they raise: the more demanding the questions, the greater the resolution likely to be needed to answer them. Thus, the choice of appropriate levels of abstraction also hinges on the objectives and their experimental frame counterparts. In the case of objectives for verification and validation of systems, we need to be given, or be able to formulate, the requirements for the behavior of the system at the I/O behavior level.
The experimental frame is then formulated to translate these requirements into a set of possible experiments to test whether the system actually performs its required behavior. In addition, we can formulate measures of effectiveness (MOEs) of a system in accomplishing its goals; we call such measures outcome measures. In order to compute such measures, the system must expose relevant variables, which we will call output variables, whose values can be observed during execution runs of the system.
Model

A model is a system specification, such as a set of instructions, rules, equations, or constraints for generating input/output behavior. Models may be expressed in a variety of formalisms that may be understood as means for specifying subclasses of dynamic systems. The Discrete Event System Specification (DEVS) formalism delineates the subclass of discrete event systems, and it can also represent systems specified within traditional formalisms such as differential (continuous) and difference (discrete time) equations [4]. In DEVS, as in systems theory, a model can be atomic, i.e., not further decomposed, or coupled, in which case it consists of components that are coupled or interconnected together.

Simulator

A simulator is any computation system (such as a single processor, a processor network, or, more abstractly, an algorithm) capable of executing a model to generate its behavior. The more general purpose a simulator is, the greater the extent to which it can be configured to execute a variety of model types. In order of increasing capability, simulators can be:
(1) dedicated to a particular model or small class of similar models;
(2) capable of accepting all (practical) models from a wide class, such as an application domain (e.g., communication systems);
(3) restricted to models expressed in a particular modeling formalism, such as continuous differential equation models;
(4) capable of accepting multi-formalism models (having components from several formalism classes, such as continuous and discrete event).
A simulator can take many forms, such as a single computer or multiple computers executing on a network.

Fundamental Problems in M&S

We have now reviewed a system-theory-based framework for M&S that provides a language and concepts in which to formulate key problems in M&S. Next on our agenda is to discuss problem areas including verification and validation, reuse and composability, and distributed simulation and systems of systems interoperability. These are challenging, and heretofore unsolved, problems at the core of the M&S enterprise.

Validation and Verification

The basic concepts of verification and validation (V&V) have been described in different settings, levels of detail, and points of view, and are still evolving. These concepts have been studied by a variety of scientific and engineering disciplines, and various flavors of validation and
Artificial Intelligence in Modeling and Simulation, Figure 4 Basic approach to model validation
verification concepts and techniques have emerged from a modeling and simulation perspective. Within the modeling and simulation community, a variety of methodologies for V&V have been suggested in the literature [5,6,7]. A categorization of 77 verification, validation, and testing techniques, along with 15 principles to guide their application, has been offered [8]. However, these methods vary extensively (e.g., alpha testing, induction, cause-and-effect graphing, inference, predicate calculus, proof of correctness, and user interface testing) and are only loosely related to one another. Therefore, such a categorization can serve only as an informal guideline for the development of a process for V&V of models and systems. Validation and verification concepts are themselves founded on more primitive concepts, such as system specifications and homomorphism, as discussed in the framework of M&S [2]. In this framework, the entities system, experimental frame, model, and simulator take on real importance only when properly related to each other. For example, we build a model of a particular system for some objective, and only some models, not others, are suitable. Thus, it is critical to the success of a simulation modeling effort that certain relationships hold. Two of the most important are validity and simulator correctness. The basic modeling relation, validity, refers to the relation between a model, a source system, and an experimental frame. The most basic concept, replicative validity, is affirmed if, for all the experiments possible within the experimental frame, the behavior of the model and system
agree within acceptable tolerance. The term accuracy is often used in place of validity. Another term, fidelity, is often used for a combination of both validity and detail. Thus, a high-fidelity model may refer to a model that is both highly detailed and valid (in some understood experimental frame). However, when used this way, the assumption seems to be that high detail alone is needed for high fidelity, as if validity were a necessary consequence of high detail. In fact, it is possible to have a very detailed model that is nevertheless very much in error, simply because some of the highly resolved components function in a different manner than their real-system counterparts. The basic approach to model validation is comparison of the behavior generated by a model and the source system it represents within a given experimental frame. The basis for comparison serves as the reference (Fig. 4) against which the accuracy of the model is measured. The basic simulation relation, simulator correctness, is a relation between a simulator and a model. A simulator correctly simulates a model if it is guaranteed to faithfully generate the model's output behavior given its initial state and its input trajectory. In practice, as suggested above, simulators are constructed to execute not just one model but a family of possible models. For example, a network simulator provides both a simulator and a class of network models it can simulate. In such cases, we must establish that a simulator will correctly execute the particular class of models it claims to support. Conceptually, the approach to testing for such execution, illustrated in Fig. 5, is
Artificial Intelligence in Modeling and Simulation, Figure 5 Basic approach to simulator verification
Artificial Intelligence in Modeling and Simulation, Figure 6 Basic approach to system validation
to perform a number of test cases in which the same model is provided to the simulator under test and to a "gold standard simulator" that is known to correctly simulate the model. Of course, such test-case models must lie within the class supported by the simulator under test, and must be presented in the form that it expects to receive them. Comparison of the output behaviors, in the same manner as with model validation, is then employed to check the agreement between the two simulators.
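The gold-standard comparison can be sketched as follows. This is a hypothetical illustration with invented names and a toy model; a real candidate simulator would be an independent implementation rather than a delegate, and real test cases would sample the whole model class the simulator claims to support.

```python
# Hypothetical sketch: checking a simulator under test against a
# "gold standard simulator" on the same test-case model.

def gold_standard_simulator(model, inputs):
    """Reference simulator known to execute the model correctly."""
    state, trace = model["initial"], []
    for x in inputs:
        state = model["delta"](state, x)
        trace.append(model["lambda"](state))
    return trace

def simulator_under_test(model, inputs):
    """Candidate simulator; here it simply delegates, so agreement is
    guaranteed. An independent implementation would go here."""
    return gold_standard_simulator(model, inputs)

def verify(sim, model, test_inputs):
    """Compare output behaviors, in the same manner as model validation."""
    return sim(model, test_inputs) == gold_standard_simulator(model, test_inputs)

# Toy test-case model (invented): accumulate inputs, output the parity.
model = {"initial": 0, "delta": lambda s, x: s + x, "lambda": lambda s: s % 2}
print(verify(simulator_under_test, model, [1, 2, 3]))  # True
```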
If the specifications of both the simulator and the model are available in separate forms, where each can be accessed independently, it may be possible to prove correctness mathematically. The case of system validation is illustrated in Fig. 6. Here the system is considered as a hardware and/or software implementation to be validated against requirements for its input/output behavior. The goal is to develop test models that can stimulate the implemented system with
inputs and can observe its outputs to compare them with those required by the behavior requirements. Also shown is a dotted path in which a reference model is constructed that is capable of simulation execution. Such a reference model is more difficult to develop than the test models, since it requires not only knowing in advance what output to test for, but actually generating that output. Although such a reference model is not required, it may be desirable in situations in which the extra cost of development is justified by the additional range of tests it makes possible and the consequent increase in coverage this may provide.
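The system-validation loop above can be sketched as a tolerance comparison of observed against required I/O behavior. This is a hypothetical illustration: the doubling "implementation", the tolerance value, and all names are invented for the sketch.

```python
# Hypothetical sketch: validating an implemented system against its
# required I/O behavior, within an acceptable tolerance.

def validate_system(system, test_cases, tolerance=1e-3):
    """Each test case pairs an input trajectory with the output trajectory
    required by the I/O behavior requirements; the system is stimulated
    with the inputs and its observed outputs are compared to the required ones."""
    for inputs, required in test_cases:
        observed = [system(x) for x in inputs]
        if any(abs(o - r) > tolerance for o, r in zip(observed, required)):
            return False
    return True

# Invented example: the requirements say the system must double its input.
implementation = lambda x: 2.0 * x
test_cases = [([1.0, 2.0], [2.0, 4.0]), ([0.5], [1.0])]
print(validate_system(implementation, test_cases))  # True
```

Here the required trajectories play the role of the behavior requirements; a reference model, if available, could generate them instead of their being written down in advance.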
ferent components. For systems composed of models with dynamics that are intrinsically heterogeneous, it is crucial to use multiple modeling formalisms to describe them. However, combining diﬀerent model types poses a variety of challenges [9,12,13]. Sarjoughian [14], introduced an approach to multiformalism modeling that employs an interfacing mechanism called a Knowledge Interchange Broker to compose model components expressed in diverse formalisms. The KIB supports translation from the semantics of one formalism into that of a second to ensure coordinated and correct execution simulation algorithms of distinct modeling formalisms.
Model Reuse and Composability
Distributed Simulation and System of Systems Interoperability
Model reuse and composability are two sides of the same coin—it is patently desirable to reuse models, the fruits of earlier or others work. However, typically such models will become components in a larger composite model and must be able to interact meaningfully with them. While software development disciplines are successfully applying a componentbased approach to build software systems, the additional systems dynamics involved in simulation models has resisted straight forward reuse and composition approaches. A model is only reusable to the extent that its original dynamic systems assumptions are consistent with the constraints of the new simulation application. Consequently, without contextual information to guide selection and refactoring, a model may not be reused to advantage within a new experimental frame. Davis and Anderson [9] argue that to foster such reuse, model representation methods should distinguish, and separately specify, the model, simulator, and the experimental frame. However, Yilmaz and Oren [10] pointed out that more contextual information is needed beyond the information provided by the set of experimental frames to which a model is applicable [11], namely, the characterization of the context in which the model was constructed. These authors extended the basic modelsimulatorexperimental frame perspective to emphasize the role of context in reuse. They make a sharp distinction between the objective context within which a simulation model is originally deﬁned and the intentional context in which the model is being qualiﬁed for reuse. They extend the system theoretic levels of speciﬁcation discussed earlier to deﬁne certain behavioral model dependency relations needed to formalize conceptual, realization, and experimental aspects of context. As the scope of simulation applications grows, it is increasingly the case that more than one modeling paradigm is needed to adequately express the dynamics of the dif
The problems of model reuse and composability manifest themselves strongly in the context of distributed simulation, where the objective is to enable existing, geographically dispersed simulators to interact meaningfully, or federate, together. We briefly review experience with interoperability in the distributed simulation context and a linguistically based approach to the System of Systems (SoS) interoperability problem [15]. Sage and Cuppan [16] drew the parallel between viewing the construction of an SoS as a federation of systems and the federation supported by the High Level Architecture (HLA), an IEEE standard fostered by the DoD to enable composition of simulations [17,18]. HLA is a network middleware layer that supports message exchanges among simulations, called federates, in a neutral format. However, experience with HLA has been disappointing and has forced an acknowledgment of the difference between technical interoperability and substantive interoperability [19]. The first only enables heterogeneous simulations to exchange data; it does not guarantee the second, which is the desired outcome of exchanging meaningful data, namely, that coherent interaction among federates takes place. Tolk and Muguira [20] introduced the Levels of Conceptual Interoperability Model (LCIM), which identified seven levels of interoperability among participating systems. These levels can be viewed as a refinement of the operational interoperability type, one of the three types defined by [15]. The operational type concerns linkages between systems in their interactions with one another, the environment, and users. The other types apply to the context in which systems are constructed and acquired: the constructive type relates to linkages between the organizations responsible for system construction, and the programmatic type to linkages between the program offices that manage system acquisition.
Artificial Intelligence in Modeling and Simulation
AI-Related Software Background
To proceed to the discussion of the role of AI in addressing key problems in M&S, we need to provide some further software and AI-related background. We offer a brief historical account of object orientation and agent-based systems as a springboard for discussing the concepts of object frameworks, ontologies, and endomorphic agents.
Object Orientation and Agent-Based Systems
Many of the software technology advances of the last 30 years originated in the field of M&S. Objects, as code modules with both structure and behavior, were first introduced in the SIMULA simulation language [21]. Objects blossomed in various directions and became institutionalized in the widely adopted programming language C++ and later in the infrastructure for the web in the form of Java [22] and its variants. The freedom from straight-line procedural programming that object orientation championed was taken up in AI in two directions: various forms of knowledge representation and of autonomy. Rule-based systems aggregate modular if-then logic elements, the rules, which can be activated in some form of causal sequence (inference chains) by an execution engine [23]. In their passive state, rules represent static, discrete pieces of inferential logic, called declarative knowledge. When activated, however, a rule influences the state of the computation and the activation of subsequent rules, giving the system a dynamic, or procedural, knowledge characteristic as well. Frame-based systems further expanded knowledge representation flexibility and inferencing capability by supporting slots and constraints on their values, organized as frames, as well as taxonomies of frames based on generalization/specialization relationships [24]. Convergence with object orientation became apparent: frames could be identified with objects, and their taxonomic organization with classes within object-style organizations based on subclass hierarchies.
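As a minimal illustration of the rule-based inference chains described above, the sketch below implements forward chaining over a working memory of facts. The rules and fact names are invented for illustration and do not come from any particular production-system implementation.

```python
# Minimal forward-chaining rule engine: rules are (premises, conclusion)
# pairs over a working memory of facts. In their passive state the rules
# are declarative knowledge; firing them chains inferences dynamically.

def forward_chain(facts, rules):
    """Repeatedly fire rules whose premises hold until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)   # rule fires, extending the state
                changed = True          # may enable further rules (a chain)
    return facts

rules = [
    ({"has_clock", "advances_time"}, "is_simulator"),
    ({"is_simulator", "has_model"}, "can_generate_behavior"),
]
derived = forward_chain({"has_clock", "advances_time", "has_model"}, rules)
```

Here the second rule can fire only after the first has added `is_simulator` to the working memory, which is exactly the inference-chain behavior the text describes.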
On the other hand, the modular nature of objects, together with their behavior and interaction with other objects, led to the concept of agents, which embody increased autonomy and self-determination [25]. Agents represent individual threads of computation and are typically deployable in distributed form over computer networks, where they interact with local environments and communicate and coordinate with each other. A wide variety of agent types exists, determined in large part by the variety and sophistication of their processing capacities, ranging from agents that simply gather information on packet traffic in a network to logic-based entities with elaborate reasoning capacity and the authority to make decisions (employing the knowledge representations just mentioned). The step from agents to their aggregates is natural, leading to the concept of multi-agent systems, or societies of agents, especially in the realm of modeling and simulation [26]. To explore the present role of AI in M&S, we project this historical background onto the current concepts of object frameworks, ontologies, and endomorphic agents. The Unified Modeling Language (UML) has gained a strong foothold as the de facto standard for object-based software development. Starting as a diagrammatic means of software system representation, it has evolved into a formally specified language in which the fundamental properties of objects are abstracted and organized [27,28]. Ontologies are models of the world relating to specific aspects or applications; they are typically represented in frame-based languages and form the knowledge components for logical agents on the Semantic Web [29]. A convergence is under way that reinforces the common object-based origins of AI and software engineering: UML is being extended to incorporate ontology representations so that software systems in general will have more explicit models of their domains of operation. As we shall soon see, endomorphic agents are agents that include abstractions of their own structure and behavior within their ontologies of the world [30].
The M&S Framework Within Unified Modeling Language (UML)
With object orientation as unified by UML, and with some background in agent-based systems, we are in a position to discuss the computational realization of the M&S framework discussed earlier. The computational framework is based on the Discrete Event System Specification (DEVS) formalism and has been implemented in various object-oriented environments. Using UML, we can represent the framework as a set of classes and relations, as illustrated in Figs. 7 and 8.
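The class-and-relations view of the framework can be suggested in code. The following sketch mirrors the framework's core entities (model, simulator, experimental frame); the class bodies and method names are our own illustrative choices, not the DEVS API.

```python
# Sketch of the M&S framework entities as classes, loosely mirroring a UML
# class diagram: a Simulator executes a Model, and an ExperimentalFrame
# specifies the conditions under which behavior is observed.

class Model:
    """A state-transition specification (a trivial counter here)."""
    def __init__(self):
        self.state = 0
    def transition(self):
        self.state += 1
        return self.state

class ExperimentalFrame:
    """Conditions of observation: here, admit only even outputs."""
    def accepts(self, output):
        return output % 2 == 0

class Simulator:
    """Executes a model and collects the behavior admitted by a frame."""
    def __init__(self, model, frame):
        self.model, self.frame = model, frame
    def run(self, steps):
        trace = [self.model.transition() for _ in range(steps)]
        return [y for y in trace if self.frame.accepts(y)]

behavior = Simulator(Model(), ExperimentalFrame()).run(5)
```

The separation matters: the same `Model` can be paired with different frames to answer different questions, which is the reuse argument made earlier in this article.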
Various software implementations of DEVS support different subsets of these classes and relations. In particular, we mention a recent implementation of DEVS within a Service-Oriented Architecture (SOA) environment called DEVS/SOA [31,32]. This implementation exploits some of the benefits afforded by the web environment mentioned earlier and provides a context for considering the primary target of our discussion: comprehensive automation of the M&S enterprise. We use one of the UML constructs, the use case diagram, to depict the various capabilities that would be involved in automating all or parts of the M&S enterprise. Use cases are represented by ovals that connect to at least
Artificial Intelligence in Modeling and Simulation, Figure 7 M&S framework formulated within UML
Artificial Intelligence in Modeling and Simulation, Figure 8 M&S framework classes and relations in a UML representation
one actor (stick ﬁgure) and to other use cases through “includes” relations, shown as dotted arrows. For example, a sensor (actor) collects data (use case) which includes storage of data (use case). A memory actor stores and
retrieves models, which includes storage and retrieval (respectively) of data. Constructing models includes retrieving stored data within an experimental frame. Validating models includes retrieving models from memory as components and simulating the composite model to generate data within an experimental frame. The emulator-simulator actor does the simulating to execute the model so that its generated behavior can be matched against the stored data in the experimental frame. The objectives of the human modeler drive the model evaluator and hence the choice of experimental frames to consider, as well as of models to validate. Models can be used in (at least) two time frames [33]: in the long term, they support planning of actions to be taken in the future; in the short term, they support execution and real-time control of previously planned actions.
AI Methods in Fundamental Problems of M&S
The enterprise of modeling and simulation is characterized by activities such as the creation, construction, reuse, composition, verification, and validation of models, simulators, and experimental frames. We have seen that valid model construction requires significant expertise in all components of the M&S enterprise, e.g., modeling formalisms, simulation methods, and domain understanding and knowledge. Needless to say, few people can bring all these elements to the table, and this situation creates a significant bottleneck in such projects. Contributing factors include the scarcity of the trained personnel who must be brought in, the expense of such highly capable experts, and the time needed to construct models at the resolution required for most objectives. This section introduces some AI-related technologies that can ameliorate this situation: Service-Oriented Architecture and the Semantic Web, ontologies, constrained natural language capabilities, and genetic algorithms. Subsequently we will consider these as components in unified, comprehensive, and autonomous automation of M&S.
Artificial Intelligence in Modeling and Simulation, Figure 9 UML use case formulation of the overall M&S enterprise
Service-Oriented Architecture and Semantic Web
On the World Wide Web, a Service-Oriented Architecture is a marketplace of open and discoverable web services incorporating, as they mature, Semantic Web technologies [34]. The eXtensible Markup Language (XML) is the standard format for encoding data sets, and there are standards for sending and receiving XML [35]. Unfortunately, the problem only starts at this level. There are myriad ways, or schemata, to encode data into XML, and a good number of such schemata have already been developed. More often than not, they differ in detail when applied to the same domains. What explains this incompatibility? In a Service-Oriented Architecture, the producer sends messages containing XML documents generated in accordance with a schema. The consumer receives and interprets these messages using the same schema in which they were sent. Such a message encodes a world state description (or changes in it) that is a member of a set delineated by an ontology. The ontology takes into account the pragmatic frame, i.e., a description of how the information will be used in downstream processing. In a SOA environment, data dissemination may be dominated by "user pull of data," incremental transmission, discovery using metadata, and automated retrieval of data to meet user pragmatic frame specifications. This is the SOA concept of data-centered, interface-driven, loose coupling between producers and consumers. The SOA concept requires the development of platform-independent, community-accepted standards that allow raw data to be syntactically packaged into XML and accompanied by metadata describing the semantic and pragmatic information needed to effectively process the data into increasingly higher-value products downstream.
Artificial Intelligence in Modeling and Simulation, Figure 10 Interoperability levels in distributed simulation
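The producer/consumer exchange just described can be sketched with Python's standard `xml.etree` module. The `sensorReport` schema below is invented for illustration; the point is that the round trip works only because both sides commit to the same element and attribute names.

```python
# Sketch of SOA-style data exchange: a producer serializes a world-state
# description to XML, and a consumer parses it using the same agreed
# schema. Element and attribute names here are illustrative.
import xml.etree.ElementTree as ET

def encode(readings):
    # Shared schema: <sensorReport><reading id=... value=.../></sensorReport>
    root = ET.Element("sensorReport")
    for rid, value in readings.items():
        ET.SubElement(root, "reading", id=rid, value=str(value))
    return ET.tostring(root, encoding="unicode")

def decode(document):
    root = ET.fromstring(document)
    return {r.get("id"): float(r.get("value")) for r in root.iter("reading")}

message = encode({"s1": 3.5, "s2": 7.0})
assert decode(message) == {"s1": 3.5, "s2": 7.0}  # round trip under a shared schema
```

A consumer expecting a different schema (say, `<obs>` elements) would parse this message as empty, which is the syntactic face of the incompatibility problem discussed above; the semantic and pragmatic levels are not touched by this sketch at all.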
Ontologies
Semantic Web researchers typically seek to develop intelligent agents that can draw logical inferences from diverse, possibly contradictory ontologies such as a web search might discover. Semantic Web research has accordingly focused on ontologies [34]: logical languages that provide a common vocabulary of terms, and axiomatic relations among them, for a subject area. In contrast, the newly emerging area of ontology integration assumes that human understanding and collaboration will not be replaced by intelligent agents. Its goal, therefore, is to create concepts and tools that help people develop practical solutions to the incompatibility problems that impede "effective" exchange of data, together with ways of testing that such solutions have been correctly implemented. As illustrated in Fig. 10, interoperability of systems can be considered at three linguistically inspired levels: syntactic, semantic, and pragmatic. The levels are summarized in Table 2; more detail is provided in [36].
Constrained Natural Language
Model development can be substantially aided by enabling users to specify modeling constructs using some form of
constrained natural language [37]. The goal is to overcome modeling complexity by letting users with limited or nonexistent formal modeling or programming background convey essential information using natural language, a form of expression that is natural and intuitive. Practicality demands constraining the expressions that can actually be used, so that the linguistic processing is tractable and the input can be interpreted unambiguously. Some techniques allow the user to narrow down the essential components for model construction; their goal is to reduce ambiguity between the user's requirements and the essential model construction components. A natural language interface allows model specification in terms of a verb phrase consisting of a verb, noun, and modifier, for example "build car quickly." Conceptual realization of a model from a verb phrase ties in closely with Checkland's [38] insight that an appropriate verb should be used to express the root definition, or core purpose, of a system. The main barrier between many people and existing modeling software is lack of computer literacy, and this provides an incentive to develop natural language interfaces as a means of bridging the gap. Natural language
Artificial Intelligence in Modeling and Simulation, Table 2 Linguistic levels

Pragmatic – how information in messages is used. A collaboration of systems or services interoperates at this level if the receiver reacts to the message in the manner that the sender intends. Example: an order from a commander is obeyed by the troops in the field as the commander intended. A necessary condition is that the information arrives in a timely manner and that its meaning has been preserved (semantic interoperability).

Semantic – shared understanding of the meaning of messages. A collaboration interoperates at this level if the receiver assigns the same meaning to the message as the sender did. Example: an order from a commander to multinational participants in a coalition operation is understood in a common manner despite translation into different languages. A necessary condition is that the information can be unequivocally extracted from the data (syntactic interoperability).

Syntactic – common rules governing the composition and transmission of messages. A collaboration interoperates at this level if the consumer is able to receive and parse the sender's message. Example: a common network protocol (e.g., IPv4) is employed, ensuring that all nodes on the network can send and receive data bit arrays adhering to a prescribed format.
expression could create modelers out of people who think semantically but do not have the requisite computer skills to express their ideas. A semantic representation frees users to explore the system on the familiar ground of natural language and opens the way for brainstorming, innovation, and testing of models before they leave the drawing board.
Genetic Algorithms
Genetic algorithms are a subset of evolutionary algorithms that model biological processes to search highly complex spaces. A genetic algorithm (GA) allows a population composed of many individuals to evolve, under specified selection rules, toward a state that maximizes "fitness." The theory was developed by John Holland [39] and popularized by Goldberg, who solved a difficult problem involving the control of gas pipeline transmission in his dissertation [40]. Numerous applications of GAs have since been chronicled [41,42]. Recently, GAs have been applied to cutting-edge problems in the automated construction of simulation models, as discussed below [43].
Automation of M&S
We are now ready to suggest a unifying theme for the problems in M&S and possible AI-based solutions by raising the question of whether all of M&S can be automated into an integrated, autonomous, artificial modeler/simulationist. First, we provide some background needed to explore an approach to developing such an intelligent agent based on the System Entity Structure/Model Base framework, a hybrid methodology that combines elements of AI and M&S.
System Entity Structure
The System Entity Structure (SES) concepts were first presented in [44]. They were subsequently extended and implemented in a knowledge-based design environment [45]. Application to model base management originated with [46]. Subsequent formalizations and implementations were developed in [47,48,49,50,51]. Applications to various domains are given in [52].
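The selection/crossover/mutation loop of a genetic algorithm, as described above, can be sketched minimally. The "one-max" fitness used here (count of 1-bits) is a toy stand-in for a real modeling objective, and all parameter values are illustrative.

```python
# Minimal genetic algorithm: a population of bit strings evolves under
# selection, crossover, and mutation toward maximal fitness.
import random

def evolve(n_bits=16, pop_size=30, generations=60, seed=1):
    rng = random.Random(seed)
    fitness = lambda ind: sum(ind)                     # one-max toy fitness
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        for _ in range(pop_size):
            a = max(rng.sample(pop, 3), key=fitness)   # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]                  # one-point crossover
            if rng.random() < 0.1:                     # occasional point mutation
                i = rng.randrange(n_bits)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve()
```

In the model-construction applications cited above, the bit string would encode structural choices in a candidate model and the fitness would score the model against data or objectives; the evolutionary loop itself is unchanged.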
A System Entity Structure is a knowledge representation formalism that focuses on certain elements and relationships relevant to M&S. Entities represent things that exist in the real world, or sometimes in an imagined world. Aspects represent ways of decomposing things into more fine-grained ones. Multi-aspects are aspects for which the components are all of one kind. Specializations represent categories or families of specific forms that a thing can assume. The SES thereby provides the means to represent a family of models as a labeled tree. Two of its key features are support for decomposition and for specialization. The former allows decomposing a large system into smaller systems. The latter supports the representation of alternative choices: specialization enables representing a generic model (e.g., a computer display model) by one of its specialized variations (e.g., a flat panel display or a CRT display). On the basis of SES axiomatic specifications, a family of models (a design space) can be represented and then automatically pruned to generate a simulation model. Such models can be systematically studied and experimented with based on alternative design choices. A salient feature of the SES is its ability to represent models not only in terms of their decomposition and specialization, but also in terms of their aspects: the SES represents alternative decompositions via aspects. The SES formalism provides an operational language for specifying such hierarchical structures. An SES is a structural knowledge representation scheme that systematically organizes a family of possible structures of a system. Such a family characterizes the decomposition, coupling, and taxonomic relationships among entities. An entity represents a real-world object. The decomposition of an entity concerns how it may be broken down into sub-entities. In addition, coupling specifications tell how sub-entities may be coupled together to reconstitute the entity, and are associated with an aspect. The taxonomic relationship concerns admissible variants of an entity. The SES/Model Base framework [52] is a powerful means of supporting the plan-generate-evaluate paradigm in systems design. Within the framework, entity structures organize models in a model base.
Thus, modeling activity within the framework consists of three subactivities: specification of model composition structure, specification of model behavior, and synthesis of a simulation model. The SES is governed by an axiomatic framework in which entities alternate with the other item types (aspects and specializations). For example, a thing is made up of parts; therefore, its entity representation has a corresponding aspect which, in turn, has entities representing the parts. A System Entity Structure specifies a family of hierarchical, modular simulation models, each of which corresponds to a complete pruning of the SES. Thus, the SES formalism can be viewed as an ontology with the set of all simulation models as its domain of discourse. The mapping from the SES to the systems formalism, particularly to the DEVS formalism, is discussed in [36]. We note that simulation models include both static and dynamic elements in any application domain, and hence represent an advanced form of ontology framework.
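A complete pruning of an SES can be sketched as follows, using the computer-display example from the text. The dictionary encoding and function names are our own illustrative choices, not an SES implementation.

```python
# Sketch of a System Entity Structure: entities alternate with aspects
# (decompositions) and specializations (alternative variants). Pruning
# selects one variant per specialization, yielding one model structure
# out of the family the SES represents.

SES = {
    "Computer": {"aspect": ["Processor", "Display"]},          # decomposition
    "Display": {"specialization": ["FlatPanel", "CRT"]},       # alternatives
}

def prune(entity, choices):
    """Resolve each specialization via `choices`; recurse through aspects."""
    node = SES.get(entity, {})
    if "specialization" in node:
        return prune(choices[entity], choices)   # pick one admissible variant
    children = [prune(c, choices) for c in node.get("aspect", [])]
    return {entity: children} if children else entity

model = prune("Computer", {"Display": "FlatPanel"})
# One complete pruning: {"Computer": ["Processor", "FlatPanel"]}
```

Each assignment of choices yields a different complete pruning, which is how the SES delineates a design space whose points are candidate simulation models.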
Artificial Intelligence in Modeling and Simulation, Figure 11 SES/Model base architecture for automated M&S
SES/Model Base Architecture for an Automated Modeler/Simulationist
In this section, we raise the challenge of creating a fully automated modeler/simulationist that can autonomously carry out all the separate functions identified in the M&S framework, as well as the high-level management of these functions that is currently under exclusively human control. Recall the use case diagram in Fig. 9, which depicts the various capabilities that would need to be involved in realizing a completely automated modeler/simulationist. To link up with the primary modules of mind, we assign model construction to the belief generator, interpreting beliefs as models [54]. Motivations outside the M&S component drive the belief evaluator, and hence the choice of experimental frames to consider as well as of models to validate. External desire generators stimulate the imaginer/envisioner to run models to make predictions within pragmatic frames and to assist in action planning. The use case diagram of Fig. 9 is itself a model of how modeling and simulation activities may be carried out within human minds. We need not be committed to particular details at this early stage, but will assume that such a model can be refined to provide a useful representation of human mental activity from the perspective of M&S. This provides the basis for examining how such an artificially intelligent modeler/simulationist might work and for considering the requirements for comprehensive automation of M&S. The SES/MB methodology, introduced earlier, provides a basis for formulating a conceptual architecture for automating all of the M&S activities depicted earlier in Fig. 9 into an integrated system. As illustrated in Fig. 11, the dichotomy between the real-time use of models for acting in the real world and the longer-term development of models to be employed in such real-time activities is manifested in the distinction between passive and active models. Passive models are stored in a repository that can be likened to long-term memory. Such models undergo the life cycle mentioned earlier, in which they are validated and employed within experimental frames of interest for long-term forecasting or decision-making. However, in addition to this quite standard concept of operation, there is an input pathway from the short-term, or working, memory in which models are executed in real time. Such execution, in real-world environments, often reveals deficiencies, which provide the impetus and requirements for instigating new model construction. Whereas long-term model development and application objectives are characterized by experimental frames, short-term execution objectives are characterized by pragmatic frames. As discussed in [36], a pragmatic frame provides a means of
characterizing the use for which an executable model is being sought. Such models must be simple enough to execute within usually stringent real-time deadlines.
The M&S Framework Within Mind Architecture
An influential formulation of recent work relating to mind and brain [53] views mind as the behavior of the brain, where mind is characterized by a massively modular architecture. This means that mind is composed of a large number of modules, each responsible for different functions, and each largely independent of and sparsely connected with the others. Evolution is assumed to favor such differentiation and specialization since, in suitably weakly interactive environments, such modules are less redundant and more efficient in their consumption of space and time resources. Indeed, this formulation is reminiscent of the class of problems characterized by Systems of Systems (SoS), in which the attempt is made to integrate existing systems, originally built to perform specific functions, into a more comprehensive and multifunctional system. As discussed in [15], the components of each system can be viewed as communicating with each other within a common ontology, or model of the world, that is tuned to the smooth functioning of the organization. However, such ontologies may well be mismatched for supporting integration at the systems level. Although the mind builds on the results of a long history of pre-human evolution, the fact that consciousness seems to provide a unified and undifferentiated picture of mind suggests that human evolution has to a large extent solved the SoS integration problem.
The Activity Paradigm for Automated M&S
At least for the initial part of its life, a modeling agent needs to work on a "first order" assumption about its environment, namely, that it can exploit only semantics-free properties [39]. Regularities, such as periodic behaviors and stimulus-response associations, are one source of such semantics-free properties.
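Such regularities can be detected without knowing what the observations mean. A minimal sketch, assuming a discretized observation stream (the signal and lag bound are illustrative), picks the period whose lagged self-agreement is highest:

```python
# Detect a periodic regularity in an observation stream with no semantic
# knowledge of what the signal means: choose the lag at which the stream
# best agrees with a shifted copy of itself (a crude autocorrelation).

def detect_period(signal, max_lag=None):
    max_lag = max_lag or len(signal) // 2
    def agreement(lag):
        pairs = zip(signal, signal[lag:])
        return sum(1 for a, b in pairs if a == b) / (len(signal) - lag)
    return max(range(1, max_lag + 1), key=agreement)

stream = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]   # a period-3 stimulus pattern
period = detect_period(stream)
```

The same agreement measure applied between two different streams would flag stimulus-response associations, the other semantics-free regularity mentioned above.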
In this section, we focus on a fundamental property of systems, such as the brain's neural network, that have a large number of components: the distribution of the system's activity over space and time provides a rich, semantics-free substrate from which models can be generated. Proposing structures and algorithms to track and replicate this activity should support automated modeling and simulation of patterns of reality. The goal of the activity paradigm is to extract mechanisms from natural phenomena and behaviors to automate and guide the M&S process. The brain offers a quintessential illustration of activity and its potential use in constructing models. Figure 12
Artificial Intelligence in Modeling and Simulation, Figure 12 Brain module activities
describes brain electrical activities [54]. Positron emission tomography (PET) is used to record activity inside the brain; the PET scans show what happens inside the brain at rest and when stimulated by words and music. The red areas indicate high brain activity. Language and music produce responses on opposite sides of the brain (showing the subsystem specializations). There are many levels of activity, ranging from low to high. There is a strong link between modularity and the applicability of activity measurement as a useful concept. Indeed, modules represent loci for activity: a distribution of activity would not be discernible over a network were there no modules that could be observed to be in different states of activity. As just illustrated, neuroscientists are exploiting this activity paradigm to associate brain areas with functions and to gain insight into areas that are active or inactive across different, but related, functions, such as language and music processing. We can generalize this approach as a paradigm for an automated modeling agent.
Component, Activity, and Discrete-Event Abstractions
Figure 13 depicts a brain description in terms of components and activities. First, the modeler considers one brain activity (e.g., listening to music). This first-level activity corresponds to simulation components at a lower level. At this level, the activity of the components can be considered to be only on (grey boxes) or off (white boxes). At lower levels, structure and behaviors can be decomposed further: the structure of the higher component level is detailed at lower levels (e.g., down to the neuronal network), and the behavior is detailed through activity and discrete-event abstraction [55]. At the finest levels, the activity of components can be described in detail.
Artificial Intelligence in Modeling and Simulation, Figure 13 Hierarchy of components, activity and discreteevent abstractions
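The discrete-event abstraction of activity can be made concrete as a quantization filter: an event is recorded only when an observed variable has moved by at least a chosen quantum since the last event. The signal values and quantum sizes below are illustrative.

```python
# Quantization as a filter on a continuous flow: an event is emitted only
# when the signal has changed by at least one quantum since the last
# event, so small peaks below the quantum are not significant.

def quantize(samples, quantum):
    events, last = [], samples[0]
    for t, x in enumerate(samples[1:], start=1):
        if abs(x - last) >= quantum:    # a significant change crosses the quantum
            events.append((t, x))
            last = x
    return events

signal = [0.0, 0.05, 0.1, 0.9, 1.0, 1.02, 0.2]
coarse = quantize(signal, quantum=0.5)   # only the large swings survive
fine = quantize(signal, quantum=0.1)     # higher resolution, more events
```

Choosing the quantum sets the level of resolution: a larger quantum yields fewer events and a coarser model, which is the filtering behavior the surrounding discussion attributes to quantum size.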
Using pattern detection and quantization methods, continuously changing variables can also be treated within the activity paradigm. As illustrated in Fig. 14, small slopes and small peaks can signal low activity, whereas large slopes and peaks can signal high activity levels. To provide for scale, discrete-event abstraction can be achieved using quantization [56]. To determine the activity level, a quantum, or measure of significant change, has to be chosen. The quantum size acts as a filter on the continuous flow; note, for example, that in the figure, using the displayed quantum, the smallest peaks will not be significant. Thus, different levels of resolution can be achieved by employing different quantum sizes. A genetic algorithm can be used to find the optimum level of resolution for a given modeling objective [43].
Artificial Intelligence in Modeling and Simulation, Figure 14 Activity sensitivity and discrete events
Activity Tracking
Within the activity paradigm, M&S consists of capturing activity paths through component processing and transformations. To determine the basic structure of the whole system, an automated modeler has to answer questions of the form: where and how is activity produced, received, and transmitted? Figure 15 represents a component-based view of activity flow in a neuronal network. Activity paths through components are represented by full arrows, activity by full circles, and components by squares.
Artificial Intelligence in Modeling and Simulation, Figure 15 Activity paths in neurons
The modeler must generate such a graph based on observed data, but how does it obtain such data? One approach is characteristic of the current use of PET scans by neuroscientists. It exploits a relationship between activity and energy: activity requires consumption of energy; therefore, observing areas of high energy consumption signals areas of high activity. Notice that this correlation requires that energy consumption be
localizable to modules in the same way that activity is localized. For example, current computer architectures that provide a single power source to all components do not lend themselves to such observation. How activity is passed on from component to component can be related to the modularity styles (none, weak, strong) of the components. Concepts relating such modularity styles to activity transfer need to be developed to support an activity-tracking methodology that goes beyond reliance on the energy correlation.
Activity Model Validation
Recall that having generated a model of an observed system (whether through activity tracking or by other means), the next step is validation. To perform such validation, the modeler needs an approach to generating activity profiles from simulation experiments on the model and to comparing these profiles with those observed in the real system. Muzy and Nutaro [57] have developed algorithms that exploit activity tracking to achieve efficient simulation of DEVS models. These algorithms can be
Artificial Intelligence in Modeling and Simulation, Figure 16 Evolution of the use of intelligent agents in simulation
adapted to provide an activity-tracking pattern applicable to a given simulation model, extracting its activity profiles for comparison with those of the modeled system. A forthcoming monograph develops the activity paradigm for M&S in greater detail [58].
Intelligent Agents in Simulation
Recent trends in technology, as well as the use of simulation in exploring complex artificial and natural information processes [62,63], have made it clear that simulation model fidelity and complexity will continue to increase dramatically in the coming decades. The dynamic and distributed nature of simulation applications, the significance of exploratory analysis of complex phenomena [64], and the need to model the micro-level interactions, collaboration, and cooperation among real-world entities are bringing a shift in the way systems are conceptualized. The use of intelligent agents in simulation models is based on the idea that it is possible to represent the behavior of active entities in the world in terms of the interactions of an assembly of agents with their own operational autonomy.
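A minimal sketch of this idea: agents with local state and an autonomous decision rule interact through a shared environment, and the global behavior emerges from their interactions. All names and parameter values here are illustrative, not drawn from any particular agent framework.

```python
# Minimal agent-directed simulation: each agent decides autonomously
# whether to forage from a shared resource pool or rest; the environment
# state emerges from the interleaved actions of all agents.
import random

class Agent:
    def __init__(self, name, rng):
        self.name, self.rng, self.energy = name, rng, 5

    def act(self, environment):
        """Autonomous decision rule: usually forage while resources last."""
        if environment["resources"] > 0 and self.rng.random() < 0.8:
            environment["resources"] -= 1
            self.energy += 1          # foraging gains energy
        else:
            self.energy -= 1          # resting (or failing) costs energy

def simulate(n_agents=4, steps=10, seed=2):
    rng = random.Random(seed)
    env = {"resources": 20}
    agents = [Agent(f"a{i}", rng) for i in range(n_agents)]
    for _ in range(steps):
        for agent in agents:          # every agent acts on each tick
            agent.act(env)
    return env, agents

env, agents = simulate()
```

Even in this toy, resource depletion is not scripted anywhere; it emerges from many autonomous decisions, which is the conceptual shift the paragraph above describes.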
Artificial Intelligence in Modeling and Simulation
Artificial Intelligence in Modeling and Simulation, Figure 17 Agentdirected simulation framework
The early prevailing view on the use of agents in simulation stems from developments in Distributed Artificial Intelligence (DAI), as well as advances in agent architectures and agent-oriented programming. The DAI perspective of modeling systems in terms of entities that are capable of solving problems by means of reasoning through symbol manipulation resulted in various technologies that constitute the basic elements of agent systems. The early work on the design of agent simulators within the DAI community focused on answering the question of how goals and intentions of agents emerge and how they lead to execution of actions that change the state of their environment. The agent-directed approach to simulating agent systems lies at the intersection of several disciplines: DAI, Control Theory, Complex Adaptive Systems (CAS), and Discrete-event Systems/Simulation. As shown in Fig. 16, these core disciplines gave direction to technology, languages, and possible applications, which then influenced the evolution of the synergy between simulation and agent systems. Distributed Artificial Intelligence and Simulation While progress in agent simulators and interpreters resulted in various agent architectures and their computational engines, the ability to coordinate agent ensembles was recognized early as a key challenge [65]. The MACE system [66] is considered one of the major milestones in DAI. Specifically, the proposed DAI system integrated concepts from concurrent programming (e.g., the actor formalism [67]) and knowledge representation to symbolically reason about skills and beliefs pertaining to modeling the environment. Task allocation and coordination were considered fundamental challenges in early DAI systems. The contract net protocol developed by Smith [68] provided the basis for modeling collaboration in simulation of distributed problem solving. Agent Simulation Architectures One of the first agent-oriented simulation languages, AGENT0 [69], provided a framework that enabled the representation of beliefs and intentions of agents. Unlike object-oriented simulation languages such as SIMULA 67 [70], the first object-oriented language for specifying discrete-event systems, AGENT0 and McCarthy's Elephant 2000 language incorporated speech act theory to provide flexible communication mechanisms for agents. DAI and cognitive psychology influenced the development of cognitive agents such as those found in AGENT0, e.g., the Belief-Desires-Intentions (BDI) framework [71]. In contrast, procedural reasoning and control theory provided a basis for the design and implementation of reactive agents. Classical control theory enables the specification of a mathematical model that describes the interaction of a control system and its environment. The analogy between an agent and a control system facilitated the formalization of agent interactions in terms of a formal specification of dynamic systems. The shortcomings of reactive agents (i.e., lack of mechanisms for goal-directed behavior) and cognitive agents (i.e., issues pertaining to computational tractability in deliberative reasoning) led to the development of hybrid architectures such as the RAP system [72]. Agents are often viewed as design metaphors in the development of models for simulation and gaming. Yet this narrow view limits the potential of agents in improving various other dimensions of simulation. To this end, Fig. 17 presents a unified paradigm of Agent-Directed Simulation that consists of two categories as follows:
(1) Simulation for Agents (agent simulation), i.e., simulation of systems that can be modeled by agents (in engineering, human and social dynamics, military applications, etc.) and (2) Agents for Simulation, which can be divided into two groups: agent-supported simulation and agent-based simulation.
Agent Simulation Agent simulation involves the use of agents as design metaphors in developing simulation models, using simulation conceptual frameworks (e.g., discrete-event, activity scanning) to simulate the behavioral dynamics of agent systems and to incorporate autonomous agents that function in parallel to achieve their goals and objectives. Agents possess high-level interaction mechanisms independent of the problem being solved. Communication protocols and mechanisms for interaction via task allocation, coordination of actions, and conflict resolution at varying levels of sophistication are primary elements of agent simulations. Simulating agent systems requires understanding the basic principles, organizational mechanisms, and technologies underlying such systems.
Agent-Based Simulation Agent-based simulation is the use of agent technology to monitor and generate model behavior. This is similar to the use of AI techniques for the generation of model behavior (e.g., qualitative simulation and knowledge-based simulation). The development of novel and advanced simulation methodologies such as multisimulation suggests the use of intelligent agents as simulator coordinators, where runtime decisions for model staging and updating take place to facilitate dynamic composability. The perception feature of agents makes them pertinent for monitoring tasks. Agent-based simulation is also useful for conducting complex experiments and for deliberative knowledge processing such as planning, deciding, and reasoning. Agents are also critical enablers for improving the composability and interoperability of simulation models [73].
Agent-Supported Simulation
Artificial Intelligence in Modeling and Simulation, Figure 18 M&S within mind
Agent-supported simulation deals with the use of agents as a support facility to enable computer assistance by enhancing cognitive capabilities in problem specification and solving. Hence, agent-supported simulation involves the use of intelligent agents to improve simulation and gaming infrastructures or environments. Agent-supported simulation is used for the following purposes: to provide computer assistance for front-end and/or back-end interface functions; to process elements of a simulation study symbolically (for example, for consistency checks and built-in reliability); and to provide cognitive abilities to the elements of a simulation study, such as learning or understanding abilities. For instance, in simulations with defense applications, agents are often used as support facilities to see the battlefield; fuse, integrate, and deconflict the information presented to the decision-maker; generate alarms based on the recognition of specific patterns; filter, sort, track, and prioritize the disseminated information; and generate contingency plans and courses of action. A significant requirement for the design and simulation of agent systems is the distributed knowledge that represents
the mental model that characterizes each agent's beliefs about the environment, itself, and other agents. Endomorphic agent concepts provide a framework for addressing the difficult conceptual issues that arise in this domain.
Artificial Intelligence in Modeling and Simulation, Figure 19 Emergence of endomorphic agents
Endomorphic Agents We now consider an advanced feature that an autonomous, integrated and comprehensive modeler/simulationist agent must have if it is to fully emulate human capability. This is the ability, to a limited but significant extent, to construct and employ models of its own mind as well as of the minds of other agents. We use the term "mind" in the sense just discussed. The concept of an endomorphic agent is illustrated in Figs. 18 and 19 in a sequence of related diagrams. The diagram labeled with an oval with embedded number 1 is that of Fig. 9, with the modifications mentioned earlier to match up with human motivation and desire generation modules. In diagram 2, the label "mind" refers to the set of M&S capabilities depicted in Fig. 9. As in [30], an agent, human or technological, is considered to be composed of a mind and a body. Here, "body" represents the external manifestation of the agent, which is observable by other agents. In contrast, mind is hidden from view and must be a construct, or model, of other agents. In other words, to use the language of evolutionary psychology, agents must develop a "theory of mind" about other
agents from observation of their external behavior. An endomorphic agent is represented in diagram 3 with a mental model of the body and mind of the agent in diagram 2. This second agent is shown more explicitly in diagram 4, with a mental representation of the first agent's body and mind. Diagram 5 depicts the recursive aspect of endomorphism, where the (original) agent of diagram 2 has developed a model of the second agent's body and mind. But the latter model contains the just-mentioned model of the first agent's body and mind. This leads to a potentially infinite regress in which, apparently, each agent can have a representation of the other agent, and by reflection, of himself, that increases in depth of nesting without end. Hofstadter [59] represents a similar concept in the diagram on page 144 of his book, in which the comic character Sluggo is "dreaming of himself dreaming of himself dreaming of himself, without end." He then uses the label on the Morton Salt box on page 145 to show that not all self-reference involves infinite recursion. On the label, the girl carrying the salt box obscures its label with her arm, thereby shutting down the regress. Thus, the salt box has a representation of itself on its label, but this representation is only partial. Reference [30] related the termination of self-reference to the agent's objectives and requirements in constructing models of himself, other agents, and the environment. Briefly, the agent need only go as deep as needed to get a reliable model of the other agents. The agent can
stop at level 1 with a representation of the others' bodies.
Artificial Intelligence in Modeling and Simulation, Figure 20 Interacting models of others in baseball
However, this might not allow predicting another's movements, particularly if the latter has a mind in control of these movements. This would force the first agent to include at least a crude model of the other agent's mind. In a competitive situation, having such a model might give the first agent an advantage, and this might lead the second agent to likewise develop a predictive model of the first agent. With the second agent now seeming to become less predictable, the first agent might develop a model of the second agent's mind that restores lost predictive power. This would likely have to include a reflected representation of himself, although the impending regression could be halted if this representation did not, itself, contain a model of the other agent. Thus, the depth to which competitive endomorphic agents have models of themselves and others might be the product of a co-evolutionary "mental arms race" in which an improvement on one side triggers a contingent improvement on the other, the improvement being an incremental refinement of the internal models by successively adding more levels of nesting. Minsky [60] conjectured that termination of the potentially infinite regress in agents' models of each other within a society of mind might be constrained by sheer limitations on the ability to marshal the resources required to support the necessary computation. We can go further by assuming that agents have differing mental capacities to support such computational nesting. Therefore, an agent
with greater capacity might be able to "outthink" one of lesser capability. This is illustrated by the following real-life story drawn from a newspaper account of a critical play in a baseball game. Interacting Models of Others in Competitive Sport The following account is illustrated in Fig. 20. A Ninth Inning to Forget: Cordero Can't Close, Then Base-Running Gaffe Ends Nats' Rally. Steve Yanda, Washington Post Staff Writer, Jun 24, 2007. Copyright The Washington Post Company. Indians 4, Nationals 3. Nook Logan played out the ending of last night's game in his head as he stood on second base in the bottom of the ninth inning. The bases were loaded with one out and the Washington Nationals trailed by one run. Even if Felipe Lopez, the batter at the plate, grounded the ball, say, right back to Cleveland Indians closer Joe Borowski, the pitcher merely would throw home. Awaiting the toss would be catcher Kelly Shoppach, who would tag the plate and attempt to nail Lopez at first. By the time Shoppach's throw reached first baseman Victor Martinez, Logan figured he would be gliding across the plate with the tying run. Lopez did ground to Borowski, and the closer did fire the ball home. However, Shoppach elected to throw to third instead of first, catching Logan drifting too far off the bag for the final out in the Nationals' 4–3 loss at RFK Stadium. "I thought [Shoppach] was going to throw to first," Logan said. And if the catcher had, would Logan have scored all the way from second? "Easy." We will analyze this account to show how it throws light on the advantage rendered by having an endomorphic capability to process to a nesting depth exceeding that of an opponent. The situation starts in the bottom of the ninth inning, with the Washington Nationals at bat, the bases loaded with one out, and the team trailing by one run. The runner on second base, Nook Logan, plays out the ending of the game in his head. This can be interpreted in terms of endomorphic models as follows. Logan makes a prediction using his models of the opposing pitcher and catcher, namely that the pitcher would throw home and the catcher would tag the plate and attempt to nail the batter at first. Logan then makes a prediction using a model of himself, namely, that he would be able to reach home plate while the catcher's thrown ball was traveling to first base. In actual play, the catcher threw the ball to third and caught Logan out. This is evidence that the catcher was able to play out the simulation to a greater depth than was Logan. The catcher's model of the situation agreed with that of Logan as it related to the other actors. The difference was that the catcher used a model of Logan that predicted that Logan would predict that he (the catcher) would throw to first. Having this prediction, the catcher decided instead to throw the ball to the third baseman, which resulted in putting Logan out. We note that the catcher's model was based on his model of Logan's model, so it was one level greater in depth of nesting than the latter.
To have succeeded, Logan would have had to be able to support one more level, namely, a model of the catcher predicting that the catcher would use his model of Logan to outthink him, and then make the counter move of not starting toward third base. The enigma of such endomorphic agents poses extreme challenges for further research in AI and M&S. The formal and computational framework provided by the M&S approach discussed here may be of particular advantage to cognitive psychologists and philosophers interested in an active area of investigation in which the terms "theory of mind," "simulation," and "mind reading" are employed without much in the way of definition [61].
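The depth-of-nesting advantage in this account can be caricatured in a few lines of code. This is a toy sketch only, not drawn from the article's formalism; the action names and the counter-move table are assumptions made for the illustration.

```python
# Toy model of nested endomorphic prediction: an agent of nesting depth d
# simulates its opponent at depth d - 1 and counters the predicted move.
# The action names and the counter table are illustrative assumptions.

CONVENTIONAL = "throw to first"        # the textbook play everyone expects
COUNTER = {
    "throw to first": "throw to third",
    "throw to third": "throw to first",
}

def choose(depth):
    """Move chosen by an agent able to nest models `depth` levels deep."""
    if depth == 0:
        return CONVENTIONAL            # no model of the other: play the book
    return COUNTER[choose(depth - 1)]  # counter the simulated opponent's move

logan_expects = choose(0)   # Logan models the catcher as playing the book
catcher_plays = choose(1)   # the catcher models Logan's expectation and counters
print(logan_expects, "|", catcher_plays)
```

With one more level, choose(2), Logan would have anticipated the counter, which is exactly the incremental deepening of the "mental arms race" described above.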
Future Directions M&S presents some fundamental and very difficult problems whose solution may benefit from the concepts and techniques of AI. We have discussed some key problem areas, including verification and validation, reuse and composability, and distributed simulation and systems-of-systems interoperability. We have also considered some areas of AI that have direct applicability to problems in M&S, such as Service-Oriented Architecture and the Semantic Web, ontologies, constrained natural language, and genetic algorithms. To provide a unifying theme for the problems and solutions, we raised the question of whether all of M&S can be automated into an integrated autonomous artificial modeler/simulationist. We explored an approach to developing such an intelligent agent based on the System Entity Structure/Model Base framework, a hybrid methodology that combines elements of AI and M&S. We proposed a concrete methodology by which such an agent could engage in M&S based on activity tracking. Implementing such a methodology in automated form presents numerous challenges to AI. We closed with consideration of endomorphic modeling capability, an advanced feature that such an agent must have if it is to fully emulate human M&S capability. Since this capacity implies an infinite regress in which models contain models of themselves without end, it can be had only to a limited degree. This degree, the model nesting depth, is the extent to which an endomorphic agent can marshal the mental resources needed to construct and employ models of its own "mind" as well as of the "minds" of other agents. Understanding it may offer critical insights into competitive co-evolutionary behavior in humans and higher-order primates, and should motivate more intensive research. The enigma of such endomorphic agents provides extreme challenges for further research in AI and M&S, as well as related disciplines such as cognitive science and philosophy.
Bibliography Primary Literature 1. Wymore AW (1993) Model-based systems engineering: An introduction to the mathematical theory of discrete systems and to the tricotyledon theory of system design. CRC, Boca Raton 2. Zeigler BP, Kim TG, Praehofer H (2000) Theory of modeling and simulation. Academic Press, New York 3. Ören TI, Zeigler BP (1979) Concepts for advanced simulation methodologies. Simulation 32(3):69–82 4. http://en.wikipedia.org/wiki/DEVS Accessed Aug 2008 5. Knepell PL, Arangno DC (1993) Simulation validation: a confidence assessment methodology. IEEE Computer Society Press, Los Alamitos
6. Law AM, Kelton WD (1999) Simulation modeling and analysis, 3rd edn. McGraw-Hill, Columbus 7. Sargent RG (1994) Verification and validation of simulation models. In: Winter simulation conference, pp 77–84 8. Balci O (1998) Verification, validation, and testing. In: Winter simulation conference 9. Davis KP, Anderson AR (2003) Improving the composability of department of defense models and simulations, RAND technical report. http://www.rand.org/pubs/monographs/MG101/. Accessed Nov 2007; J Def Model Simul Appl Methodol Technol 1(1):5–17 10. Yilmaz L, Oren TI (2004) A conceptual model for reusable simulations within a model-simulator-context framework. In: Conference on conceptual modeling and simulation, Italy, 28–31 October 11. Traore MK, Muzy A (2004) Capturing the dual relationship between simulation models and their context. Simulation practice and theory. Elsevier 12. Page E, Opper J (1999) Observations on the complexity of composable simulation. In: Proceedings of the winter simulation conference, Orlando, pp 553–560 13. Kasputis S, Ng H (2000) Composable simulations. In: Proceedings of the winter simulation conference, Orlando, pp 1577–1584 14. Sarjoughian HS (2006) Model composability. In: Perrone LF, Wieland FP, Liu J, Lawson BG, Nicol DM, Fujimoto RM (eds) Proceedings of the winter simulation conference, pp 104–158 15. DiMario MJ (2006) System of systems interoperability types and characteristics in joint command and control. In: Proceedings of the 2006 IEEE/SMC international conference on system of systems engineering, Los Angeles, April 2006 16. Sage AP, Cuppan CD (2001) On the systems engineering and management of systems of systems and federation of systems. Information knowledge systems management, vol 2, pp 325–345 17. Dahmann JS, Kuhl F, Weatherly R (1998) Standards for simulation: as simple as possible but not simpler; the high level architecture for simulation. Simulation 71(6):378 18. Sarjoughian HS, Zeigler BP (2000) DEVS and HLA: Complementary paradigms for M&S? Trans SCS 4(17):187–197 19. Yilmaz L (2004) On the need for contextualized introspective simulation models to improve reuse and composability of defense simulations. J Def Model Simul 1(3):135–145 20. Tolk A, Muguira JA (2003) The levels of conceptual interoperability model (LCIM). In: Proceedings of the fall simulation interoperability workshop, http://www.sisostds.org Accessed Aug 2008 21. http://en.wikipedia.org/wiki/Simula Accessed Aug 2008 22. http://en.wikipedia.org/wiki/Java Accessed Aug 2008 23. http://en.wikipedia.org/wiki/Expert_system Accessed Aug 2008 24. http://en.wikipedia.org/wiki/Frame_language Accessed Aug 2008 25. http://en.wikipedia.org/wiki/Agent_based_model Accessed Aug 2008 26. http://www.swarm.org/wiki/Main_Page Accessed Aug 2008 27. Unified Modeling Language (UML) http://www.omg.org/technology/documents/formal/uml.htm 28. Object Modeling Group (OMG) http://www.omg.org 29. http://en.wikipedia.org/wiki/Semantic_web Accessed Aug 2008 30. Zeigler BP (1990) Object oriented simulation with hierarchical, modular models: intelligent agents and endomorphic systems. Academic Press, Orlando 31. http://en.wikipedia.org/wiki/Service_oriented_architecture Accessed Aug 2008 32. Mittal S, Mak E, Nutaro JJ (2006) DEVS-based dynamic modeling & simulation reconfiguration using enhanced DoDAF design process. Special issue on DoDAF. J Def Model Simul 3(4):239–267 33. Zeigler BP (1988) Simulation methodology/model manipulation. In: Encyclopedia of systems and controls. Pergamon Press, England 34. Alexiev V, Breu M, de Bruijn J, Fensel D, Lara R, Lausen H (2005) Information integration with ontologies. Wiley, New York 35. Kim L (2003) Official XMLSPY handbook. Wiley, Indianapolis 36. Zeigler BP, Hammonds P (2007) Modeling & simulation-based data engineering: introducing pragmatics into ontologies for net-centric information exchange. Academic Press, New York 37. Simard RJ, Zeigler BP, Couretas JN (1994) Verb phrase model specification via system entity structures. In: AI and planning in high autonomy systems: distributed interactive simulation environments. Proceedings of the fifth annual conference, 7–9 Dec 1994, pp 192–198 38. Checkland P (1999) Soft systems methodology in action. Wiley, London 39. Holland JH (1992) Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, Cambridge 40. Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley Professional, Princeton 41. Davis L (1987) Genetic algorithms and simulated annealing. Morgan Kaufmann, San Francisco 42. Zbigniew M (1996) Genetic algorithms + data structures = evolution programs. Springer, Heidelberg 43. Cheon S (2007) Experimental frame structuring for automated model construction: application to simulated weather generation. Doctoral dissertation, Dept of ECE, University of Arizona, Tucson 44. Zeigler BP (1984) Multifaceted modelling and discrete event simulation. Academic Press, London 45. Rozenblit JW, Hu J, Zeigler BP, Kim TG (1990) Knowledge-based design and simulation environment (KBDSE): foundational concepts and implementation. J Oper Res Soc 41(6):475–489 46. Kim TG, Lee C, Zeigler BP, Christensen ER (1990) System entity structuring and model base management. IEEE Trans Syst Man Cyber 20(5):1013–1024 47. Zeigler BP, Zhang G (1989) The system entity structure: knowledge representation for simulation modeling and design. In: Widman LA, Loparo KA, Nielsen N (eds) Artificial intelligence, simulation and modeling. Wiley, New York, pp 47–73 48. Luh C, Zeigler BP (1991) Model base management for multifaceted systems. ACM Trans Model Comp Sim 1(3):195–218 49. Couretas J (1998) System entity structure alternatives enumeration environment (SEAS). Doctoral dissertation, Dept of ECE, University of Arizona 50. Park HC, Kim TG (1998) A relational algebraic framework for VHDL models management. Trans SCS 15(2):43–55 51. Chi SD, Lee J, Kim Y (1997) Using the SES/MB framework to analyze traffic flow. Trans SCS 14(4):211–221 52. Cho TH, Zeigler BP, Rozenblit JW (1996) A knowledge-based simulation environment for hierarchical flexible manufacturing. IEEE Trans Syst Man Cyber Part A: Syst Hum 26(1):81–91
53. Carruthers P (2006) The architecture of the mind: massive modularity and the flexibility of thought. Oxford University Press, Oxford 54. Wolpert L (2004) Six impossible things before breakfast: The evolutionary origin of belief. W.W. Norton, London 55. Zeigler BP (2005) Discrete event abstraction: an emerging paradigm for modeling complex adaptive systems. In: Booker L (ed) Perspectives on adaptation in natural and artificial systems: essays in honor of John Holland. Oxford University Press, Oxford 56. Nutaro J, Zeigler BP (2007) On the stability and performance of discrete event methods for simulating continuous systems. J Comput Phys 227(1):797–819 57. Muzy A, Nutaro JJ (2005) Algorithms for efficient implementation of the DEVS & DSDEVS abstract simulators. In: 1st Open International Conference on Modeling and Simulation (OICMS), Clermont-Ferrand, France, pp 273–279 58. Muzy A (in preparation) The activity paradigm for modeling and simulation of complex systems 59. Hofstadter D (2007) I am a strange loop. Basic Books, New York 60. Minsky M (1988) Society of mind. Simon & Schuster, New York 61. Goldman AI (2006) Simulating minds: the philosophy, psychology, and neuroscience of mindreading. Oxford University Press, USA 62. Denning PJ (2007) Computing is a natural science. Commun ACM 50(7):13–18 63. Luck M, McBurney P, Preist C (2003) Agent technology: enabling next generation computing; a roadmap for agent based computing. Agentlink, Liverpool 64. Miller JH, Page SE (2007) Complex adaptive systems: an introduction to computational models of social life. Princeton University Press, Princeton 65. Ferber J (1999) Multi-agent systems: an introduction to distributed artificial intelligence. Addison-Wesley, Princeton 66. Gasser L, Braganza C, Herman N (1987) MACE: an extensible testbed for distributed AI research. In: Distributed artificial intelligence – research notes in artificial intelligence, pp 119–152 67. Agha G, Hewitt C (1985) Concurrent programming using actors: exploiting large-scale parallelism. In: Proceedings of the foundations of software technology and theoretical computer science, fifth conference, pp 19–41 68. Smith RG (1980) The contract net protocol: high-level communication and control in a distributed problem solver. IEEE Trans Comput 29(12):1104–1113 69. Shoham Y (1993) Agent-oriented programming. Artif Intell 60(1):51–92 70. Dahl OJ, Nygaard K (1967) SIMULA 67 common base definition. Norwegian Computing Center, Norway 71. Rao AS, Georgeff MP (1995) BDI agents: from theory to practice. In: Proceedings of the first international conference on multi-agent systems, San Francisco 72. Firby RJ (1992) Building symbolic primitives with continuous control routines. In: Proceedings of the first international conference on AI planning systems, College Park, pp 62–69 73. Yilmaz L, Paspuletti S (2005) Toward a meta-level framework for agent-supported interoperation of defense simulations. J Def Model Simul 2(3):161–175
Books and Reviews Alexiev V, Breu M, de Bruijn J, Fensel D, Lara R, Lausen H (2005) Information integration with ontologies. Wiley, New York Carruthers P (2006) The architecture of the mind: massive modularity and the flexibility of thought. Oxford University Press, Oxford Goldman AI (2006) Simulating minds: the philosophy, psychology, and neuroscience of mindreading. Oxford University Press, USA Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, Cambridge Zeigler BP (1990) Object oriented simulation with hierarchical, modular models: intelligent agents and endomorphic systems. Academic Press, Orlando Zeigler BP, Hammonds P (2007) Modeling & simulation-based data engineering: introducing pragmatics into ontologies for net-centric information exchange. Academic Press, New York Zeigler BP, Kim TG, Praehofer H (2000) Theory of modeling and simulation. Academic Press, New York
Bacterial Computing
Bacterial Computing MARTYN AMOS Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK Article Outline Glossary Definition of the Subject Introduction Motivation for Bacterial Computing The Logic of Life Rewiring Genetic Circuitry Successful Implementations Future Directions Bibliography Glossary DNA Deoxyribonucleic acid. Molecule that encodes the genetic information of cellular organisms. Operon Set of functionally related genes with a common promoter ("on switch"). Plasmid Small circular DNA molecule used to transfer genes from one organism to another. RNA Ribonucleic acid. Molecule similar to DNA, which helps in the conversion of genetic information to proteins. Transcription Conversion of a genetic sequence into RNA. Translation Conversion of an RNA sequence into an amino acid sequence (and, ultimately, a protein). Definition of the Subject Bacterial computing is a conceptual subset of synthetic biology, which is itself an emerging scientific discipline largely concerned with the engineering of biological systems. The goals of synthetic biology may be loosely partitioned into four sets: (1) To better understand the fundamental operation of the biological system being engineered, (2) To extend synthetic chemistry, and create improved systems for the synthesis of molecules, (3) To investigate the "optimization" of existing biological systems for human purposes, (4) To develop and apply rational engineering principles to the design and construction of biological systems. It is on these last two goals that we focus in the current article. The main benefits that may accrue from these studies are both theoretical and practical; the construction and study of synthetic biosystems could improve our quantitative understanding of the fundamental underlying processes, as well as suggesting plausible applications in fields as diverse as pharmaceutical synthesis and delivery, biosensing, tissue engineering, bionanotechnology, biomaterials, energy production and environmental remediation. Introduction Complex natural processes may often be described in terms of networks of computational components, such as Boolean logic gates or artificial neurons. The interaction of biological molecules and the flow of information controlling the development and behavior of organisms is particularly amenable to this approach, and these models are well-established in the biological community. However, only relatively recently have papers appeared proposing the use of such systems to perform useful, human-defined tasks. For example, rather than merely using the network analogy as a convenient technique for clarifying our understanding of complex systems, it may now be possible to harness the power of such systems for the purposes of computation. Despite the relatively recent emergence of biological computing as a distinct research area, the link between biology and computer science is not a new one. Of course, for years biologists have used computers to store and analyze experimental data. Indeed, it is widely accepted that the huge advances of the Human Genome Project (as well as other genome projects) were only made possible by the powerful computational tools available. Bioinformatics has emerged as "the science of the 21st century", requiring the contributions of truly interdisciplinary scientists who are equally at home at the lab bench or writing software at the computer. However, the seeds of the relationship between biology and computer science were sown over fifty years ago, when the latter discipline did not even exist.
When, in the 17th century, the French mathematician and philosopher René Descartes declared to Queen Christina of Sweden that animals could be considered a class of machines, she challenged him to demonstrate how a clock could reproduce. Three centuries later in 1951, with the publication of “The General and Logical Theory of Automata” [38] John von Neumann showed how a machine could indeed construct a copy of itself. Von Neumann believed that the behavior of natural organisms, although orders of magnitude more complex, was similar to that of the most intricate machines of the day. He believed that life was based on logic. We now begin to look at how this view of life may be used, not simply as a useful analogy, but as the practical foundation of a whole new engineering discipline.
Motivation for Bacterial Computing
Here we consider the main motivations behind recent work on bacterial computing (and, more broadly, synthetic biology). Before recombinant DNA technology made it possible to construct new genetic sequences, biologists were restricted to crudely “knocking out” individual genes from an organism’s genome, and then assessing the damage caused (or otherwise). Such knockouts gradually allowed them to piece together fragments of causality, but the process was very time-consuming and error-prone. Since the dawn of genetic engineering – with the ability to synthesize novel gene segments – biologists have been in a position to make much more finely-tuned modifications to their organism of choice, thus generating much more refined data. Other advances in biochemistry have also contributed, allowing scientists – for example – to investigate new types of genetic systems with twelve bases, rather than the traditional four [14]. Such creations have yielded valuable insights into the mechanics of mutation, adaptation and evolution. Researchers in synthetic biology are now extending their work beyond the synthesis of single genes, and are introducing whole new gene complexes into organisms. The objectives behind this work are both theoretical and practical. As Benner and Seymour argue [5], “ . . . a synthetic goal forces scientists to cross uncharted ground to encounter and solve problems that are not easily encountered through [top-down] analysis. This drives the emergence of new paradigms [“world views”] in ways that analysis cannot easily do.” Drew Endy agrees. “Worst-case scenario, it’s a complementary approach to traditional discovery science” [18]. “Best-case scenario, we get systems that are simpler and easier to understand . . . ” Or, to put it bluntly, “Let’s build new biological systems – systems that are easier to understand because we made them that way” [31].
As well as shedding new light on the underlying biology, these novel systems may well have significant practical utility. Such new creations, according to Endy’s “personal wish list” might include “generating biological machines that could clean up toxic waste, detect chemical weapons, perform simple computations, stalk cancer cells, lay down electronic circuits, synthesize complex compounds and even produce hydrogen from sunlight” [18]. In the next Section we begin to consider how this might be achieved, by ﬁrst describing the underlying logic of genetic circuitry. The Logic of Life Twenty years after von Neumann’s seminal paper, Francois Jacob and Jacques Monod identiﬁed speciﬁc natural
processes that could be viewed as behaving according to logical principles: “The logic of biological regulatory systems abides not by Hegelian laws but, like the workings of computers, by the propositional algebra of George Boole.” [29] This conclusion was drawn from earlier work of Jacob and Monod [30]. In addition, Jacob and Monod described the “lactose system” [20], which is one of the archetypal examples of a Boolean biosystem. We describe this system shortly, but ﬁrst give a brief introduction to the operation of genes in general terms. DNA as the Carrier of Genetic Information The central dogma of molecular biology [9] is that DNA produces RNA, which in turn produces proteins. The basic “building blocks” of genetic information are known as genes. Each gene codes for one speciﬁc protein and may be turned on (expressed) or oﬀ (repressed) when required. Transcription and Translation We now describe the processes that determine the structure of a protein, and hence its function. Note that in what follows we assume the processes described occur in bacteria, rather than in higher organisms such as humans. For a full description of the structure of the DNA molecule, see the chapter on DNA computing. In order for a DNA sequence to be converted into a protein molecule, it must be read (transcribed) and the transcript converted (translated) into a protein. Transcription of a gene produces a messenger RNA (mRNA) copy, which can then be translated into a protein. Transcription proceeds as follows. The mRNA copy is synthesized by an enzyme known as RNA polymerase. In order to do this, the RNA polymerase must be able to recognize the speciﬁc region to be transcribed. This speciﬁcity requirement facilitates the regulation of genetic expression, thus preventing the production of unwanted proteins. Transcription begins at speciﬁc sites within the DNA sequence, known as promoters. 
These promoters may be thought of as “markers”, or “signs”, in that they are not transcribed into RNA. The regions that are transcribed into RNA (and eventually translated into protein) are referred to as structural genes. The RNA polymerase recognizes the promoter, and transcription begins. In order for the RNA polymerase to begin transcription, the double helix must be opened so that the sequence of bases may be read. This opening involves the breaking of the hydrogen bonds between bases. The RNA polymerase then
moves along the DNA template strand in the 3′ → 5′ direction. As it does so, the polymerase creates an antiparallel mRNA chain (that is, the mRNA strand is the equivalent of the Watson–Crick complement of the template). However, there is one significant difference, in that RNA contains uracil instead of thymine. Thus, in mRNA terms, “U binds with A.” The RNA polymerase moves along the DNA, the DNA recoiling into its double-helix structure behind it, until it reaches the end of the region to be transcribed. The end of this region is marked by a terminator which, like the promoter, is not transcribed.

Genetic Regulation
Each step of the conversion, from stored information (DNA), through mRNA (messenger), to protein synthesis (effector), is itself catalyzed by other effector molecules. These may be enzymes or other factors that are required for a process to continue (for example, sugars). Consequently, a loop is formed, where products of one gene are required to produce further gene products, and may even influence that gene’s own expression. This process was first described by Jacob and Monod in 1961 [20], a discovery that earned them a share of the 1965 Nobel Prize in Physiology or Medicine. Genes are composed of a number of distinct regions, which control and encode the desired product. These regions are generally of the form promoter–gene–terminator. Transcription may be regulated by effector molecules known as inducers and repressors, which interact with the promoter and increase or decrease the level of transcription. This allows effective control over the expression of proteins, avoiding the production of unnecessary compounds. It is important to note at this stage that, in reality, genetic regulation does not conform to the digital “on-off” model that is popularly portrayed; rather, it is continuous or analog in nature.

The Lac Operon
One of the most well-studied genetic systems is the lac operon. An operon is a set of functionally related genes with a common promoter.
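The transcription rule described above – the mRNA is the antiparallel Watson–Crick complement of the template strand, with uracil in place of thymine – can be sketched in a few lines of Python (a toy model; the function name and example sequence are ours):

```python
# Toy model of transcription: map a DNA template strand (read 3'->5')
# to its mRNA transcript (built 5'->3'). Since "U binds with A",
# template A yields U; the other pairings follow complementarity.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA sequence for the given DNA template strand."""
    return "".join(PAIRING[base] for base in template_3_to_5)

print(transcribe("TACGGT"))  # -> AUGCCA
```

This ignores promoters, terminators and the mechanics of RNA polymerase entirely; it captures only the base-pairing rule.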
An example of this is the lac operon, which contains three structural genes that allow E. coli to utilize the sugar lactose. When E. coli is grown on the sugar glucose, the product of the (separate, and unrelated to the lac operon) lacI gene represses the transcription of the lacZYA operon (i. e., the operon is turned off). However, if lactose is supplied together with glucose, a lactose by-product is produced which interacts with the repressor molecule, preventing
it from repressing the lacZYA operon. This derepression does not itself initiate transcription, since it would be inefficient to utilize lactose if the more common sugar glucose were still available. The operon is positively regulated (i. e., “encouraged”) by a different molecule, whose level increases as the amount of available glucose decreases. Therefore, if lactose were present as the sole carbon source, the lacI repression would be relaxed and the high “encouraging” levels would activate transcription, leading to the synthesis of the lacZYA gene products. Thus, the promoter is under the control of two sugars, and the lacZYA operon is only transcribed when lactose is present and glucose is absent. In essence, Jacob and Monod showed how a gene may be thought of (in very abstract terms) as a binary switch, and how the state of that switch might be affected by the presence or absence of certain molecules. Monod’s point, made in his classic book Chance and Necessity and quoted above, was that biological systems operate not by Hegel’s philosophical or metaphysical logic of understanding, but according to the formal, mathematically grounded logical system of George Boole. What Jacob and Monod found was that the transcription of a gene may be regulated by molecules known as inducers and repressors, which either increase or decrease the “volume” of a gene (corresponding to its level of transcription, which isn’t always as clear-cut and binary as Monod’s quote might suggest). These molecules interact with the promoter region of a gene, allowing the gene’s level to be finely “tuned”. The lac genes are so called because, in the E. coli bacterium, they combine to produce a variety of proteins that allow the cell to metabolise the sugar lactose (which is most commonly found in milk, hence the derivation from the Latin lact, meaning milk). For reasons of efficiency, these proteins should only be produced (i.
e., the genes be turned on) when lactose is present in the cell’s environment. Making these proteins when lactose is absent would be a waste of the cell’s resources, after all. However, a different sugar – glucose – will always be preferable to lactose, if the cell can get it, since glucose is an “easier” form of sugar to metabolise. So, the input to and output from the lac operon may be expressed as a truth table, with G and L standing for glucose and lactose (1 if present, 0 if absent), and O standing for the output of the operon (1 if on, 0 if off):

  G  L  O
  0  0  0
  0  1  1
  1  0  0
  1  1  0
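This truth table is exactly that of a two-input Boolean function, and it can be checked mechanically in a few lines of Python (a sketch; the function name is ours):

```python
# The lac operon as a Boolean gate: the output is on only when
# lactose is present (L = 1) and glucose is absent (G = 0).
def lac_operon(G: int, L: int) -> int:
    return int(L and not G)

# Reproduce the truth table row by row.
for G in (0, 1):
    for L in (0, 1):
        print(G, L, lac_operon(G, L))
```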
The Boolean function that the lac operon therefore physically computes is (L AND (NOT G)), since it only outputs 1 if L = 1 (lactose present) and G = 0 (glucose absent). By showing how one gene could affect the expression of another – just like a transistor feeds into the input of another and affects its state – Jacob and Monod laid the foundations for a new way of thinking about genes; not simply in terms of protein blueprints, but of circuits of interacting parts, or dynamic networks of interconnected switches and logic gates. This view of the genome is now well established [22,23,27], but in the next Section we show how it might be used to guide the engineering of biological systems.

Rewiring Genetic Circuitry
A key difference between the wiring of a computer chip and the circuitry of the cell is that “electronics engineers know exactly how resistors and capacitors are wired to each other because they installed the wiring. But biologists often don’t have a complete picture. They may not know which of thousands of genes and proteins are interacting at a given moment, making it hard to predict how circuits will behave inside cells” [12]. This makes the task of re-engineering vastly more complex. Rather than trying to assimilate the huge amounts of data currently being generated by the various genome projects, synthetic biologists are taking a novel route – simplify and build. “They create models of genetic circuits, build the circuits, see if they work, and adjust them if they don’t – learning about biology in the process. ‘I view it as a reductionist approach to systems biology,’ says biomedical engineer Jim Collins of Boston University” [12]. The field of systems biology has emerged in recent years as an alternative to the traditional reductionist way of doing science. Rather than simply focussing on a single level of description (such as individual proteins), researchers are now seeking to integrate information from many different layers of complexity.
By studying how different biological components interact, rather than simply looking at their structure, systems biologists are attempting to build models of systems from the bottom up. A model is simply an abstract description of how a system operates – for example, a set of equations that describe how a disease spreads throughout a population. The point of a model is to capture the essence of a system’s operation, and it should therefore be as simple as possible. Crucially, a model should also be capable of making predictions, which can then be tested against reality using real data (for example, if infected people are placed in quarantine for a week, how does this affect the progress of the
disease in question, or what happens if I feed this signal into the chip?). The results obtained from these tests may then feed back into further refinements of the model, in a continuous cycle of improvement. When a model suggests a plausible structure for a synthetic genetic circuit, the next stage is to engineer it into the chosen organism, such as a bacterium. Once the structure of the DNA molecule was elucidated and the processes of transcription and translation were understood, molecular biologists were frustrated by the lack of suitable experimental techniques that would facilitate more detailed examination of the genetic material. However, in the early 1970s, several techniques were developed that allowed previously impossible experiments to be carried out (see [8,33]). These techniques quickly led to the first ever successful cloning experiments [19,25]. Cloning is generally defined as “ . . . the production of multiple identical copies of a single gene, cell, virus, or organism.” [35]. This is achieved as follows: a specific sequence (corresponding, perhaps, to a novel gene) is inserted into a circular DNA molecule, known as a plasmid, or vector, producing a recombinant DNA molecule. The vector acts as a vehicle, transporting the sequence into a host cell (usually a bacterium, such as E. coli). Cloning single genes is well established, but is often done on an ad hoc basis. If biological computing is to succeed, it requires some degree of standardization, in the same way that computer manufacturers build different computers, but using a standard library of components. “Biobricks are the first example of standard biological parts,” explains Drew Endy [21]. “You will be able to use biobricks to program systems that do whatever biological systems do.” He continues.
“That way, if in the future, someone asks me to make an organism that, say, counts to 3,000 and then turns left, I can grab the parts I need oﬀ the shelf, hook them together and predict how they will perform” [15]. Each biobrick is a simple component, such as an AND gate, or an inverter (NOT). Put them together one after the other, and you have a NAND (NOTAND) gate, which is all that is needed to build any Boolean circuit (an arbitrary circuit can be translated to an equivalent circuit that uses only NAND gates. It will be much bigger than the original, but it will compute the same function. Such considerations were important in the early stages of integrated circuits, when building diﬀerent logic gates was diﬃcult and expensive). Just as transistors can be used together to build logic gates, and these gates then combined into circuits, there exists a hierarchy of complexity with biobricks. At the bottom are the “parts”, which generally correspond to coding regions for proteins. Then, one level up, we have “devices”, which are built from parts – the oscillator of Elowitz and
Leibler, for example, could be constructed from three inverter devices chained together, since all an inverter does is “flip” its signal from 1 to 0 (or vice versa). This circuit would be an example of the biobricks at the top of the conceptual tree – “systems”, which are collections of parts to do a significant task (like oscillating or counting). Tom Knight at MIT made the first six biobricks, each held in a plasmid ready for use. As we have stated, plasmids can be used to insert novel DNA sequences into the genomes of bacteria, which act as the “testbed” for the biobrick circuits. “Just pour the contents of one of these vials into a standard reagent solution, and the DNA will transform itself into a functional component of the bacteria,” he explains [6]. Drew Endy was instrumental in developing this work further, one invaluable resource being the Registry of Standard Biological Parts [32], the definitive catalogue of new biological components. At the start of 2006, it contained 224 “basic” parts and 459 “composite” parts, with 270 parts “under construction”. Biobricks are still at a relatively early stage, but “Eventually we’ll be able to design and build in silico and go out and have things synthesized,” says Jay Keasling, head of Lawrence Berkeley National Laboratory’s new synthetic biology department [12].

Successful Implementations
Although important foundational work had been performed by Arkin and Ross as early as 1994 [2], the year 2000 was a particularly significant one for synthetic biology. In January two foundational papers appeared back-to-back in the same issue of Nature. “Networks of interacting biomolecules carry out many essential functions in living cells, but the ‘design principles’ underlying the functioning of such intracellular networks remain poorly understood, despite intensive efforts including quantitative analysis of relatively simple systems.
Here we present a complementary approach to this problem: the design and construction of a synthetic network to implement a particular function” [11]. That was the introduction to a paper that Drew Endy would call “the high-water mark of a synthetic genetic circuit that does something” [12]. In the first of the two articles, Michael Elowitz and Stanislas Leibler (then both at Princeton) showed how to build microscopic “Christmas tree lights” using bacteria.

Synthetic Oscillator
In physics, an oscillator is a system that produces a regular, periodic “output”. Familiar examples include a pendulum, a vibrating string, or a lighthouse. Linking several oscillators together in some way gives rise to synchrony – for example, heart cells repeatedly firing in unison, or millions of fireflies blinking on and off, seemingly as one [36]. Leibler actually had two articles published in the same high-impact issue of Nature. The other was a short communication, co-authored with Naama Barkai – also at Princeton, but in the department of Physics [3]. In their paper, titled “Circadian clocks limited by noise”, Leibler and Barkai showed how a simple model of biochemical networks could oscillate reliably, even in the presence of noise. They argued that such oscillations (which might, for example, control the internal circadian clock that tells us when to wake up and when to be tired) are based on networks of genetic regulation. They built a simulation of a simple regulatory network using a Monte Carlo algorithm. They found that, however they perturbed the system, it still oscillated reliably, although, at the time, their results existed only in silico. The other paper by Leibler was much more applied, in the sense that they had constructed a biological circuit [11]. Elowitz and Leibler had succeeded in constructing an artificial genetic oscillator in cells, using a synthetic network of repressors. They called this construction the repressilator. Rather than investigating existing oscillator networks, Elowitz and Leibler decided to build one entirely from first principles. They chose three repressor–promoter pairs that had already been sequenced and characterized, and first built a mathematical model in software. By running the sets of equations, they identified from their simulation results certain molecular characteristics of the components that gave rise to so-called limit cycle oscillations; those that are robust to perturbations. This information from the model’s results led Elowitz and Leibler to select strong promoters and repressor molecules that would rapidly decay.
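The structure of this kind of model can be sketched in Python. The equations below follow the standard repressilator form (each gene’s mRNA is produced at a rate repressed by the previous protein in the ring, and each protein tracks its mRNA); the parameter values and function name are illustrative, not the published fits:

```python
# Minimal repressilator sketch: three genes in a ring, each repressing
# the next, integrated with a simple explicit Euler scheme.
def simulate_repressilator(alpha=50.0, beta=0.2, n=2.0,
                           dt=0.02, steps=50000):
    m = [1.0, 0.0, 0.0]  # mRNA levels for the three genes
    p = [0.0, 0.0, 0.0]  # corresponding protein levels
    trace = []           # record protein 0 over time
    for _ in range(steps):
        new_m, new_p = [], []
        for i in range(3):
            j = (i - 1) % 3  # gene i is repressed by protein j
            dm = -m[i] + alpha / (1.0 + p[j] ** n)  # Hill-type repression
            dp = beta * (m[i] - p[i])               # protein follows mRNA
            new_m.append(m[i] + dt * dm)
            new_p.append(p[i] + dt * dp)
        m, p = new_m, new_p
        trace.append(p[0])
    return trace
```

Plotting the returned trace shows the qualitative behavior Elowitz and Leibler selected for: for suitable parameter choices the three proteins cycle out of phase in a sustained limit-cycle oscillation.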
In order to implement the oscillation, they chose three genes, each of which aﬀected one of the others by repressing it, or turning it oﬀ. For the sake of illustration, we call the genes A, B and C. The product of gene A turns oﬀ (represses) gene B. The absence of B (which represses C) allows C to turn on. C is chosen such that it turns gene A oﬀ again, and the three genes loop continuously in a “daisy chain” eﬀect, turning on and oﬀ in a repetitive cycle. However, some form of reporting is necessary in order to conﬁrm that the oscillation is occurring as planned. Green ﬂuorescent protein (GFP) is a molecule found occurring naturally in the jellyﬁsh Aequorea victoria. Biologists ﬁnd it invaluable because it has one interesting property – when placed under ultraviolet light, it glows. Biologists quickly sequenced the gene responsible for producing this protein, as they realized that it could have
many applications as a reporter. By inserting the gene into an organism, you have a ready-made “status light” – when placed into bacteria, they glow brightly if the gene is turned on, and look normal if it’s turned off. We can think of it in terms of a Boolean circuit – if the circuit outputs the value 1, the GFP gene is expressed and the light turns on. If the value is 0, the gene isn’t expressed, and the light stays off. Elowitz and Leibler set up their gene network so that the GFP gene would be expressed whenever gene C was turned off – when it was turned on, the GFP would gradually decay and fade away. They synthesized the appropriate DNA sequences and inserted them into a plasmid, eventually yielding a population of bacteria that blinked on and off in a repetitive cycle, like miniature lighthouses. Moreover, and perhaps most significantly, the period between flashes was longer than the time taken for the cells to divide, showing that the state of the system had been passed on during reproduction.

Synthetic Toggle Switch
Rather than modelling an existing circuit and then altering it, Elowitz and Leibler had taken a “bottom up” approach to learning about how gene circuits operate. The other notable paper to appear in that issue was written by Timothy Gardner, Charles Cantor and Jim Collins, all of Boston University in the US. In 2000, Gardner, Collins and Cantor observed that genetic switching (such as that observed in the lambda phage [34]) had not yet been “demonstrated in networks of non-specialized regulatory components” [13]. That is to say, at that point nobody had been able to construct a switch out of genes that hadn’t already been “designed” by evolution to perform that specific task. The team had a similar philosophy to that of Elowitz and Leibler, in that their main motivation was being able to test theories about the fundamental behaviour of gene regulatory networks.
“Owing to the difficulty of testing their predictions,” they explained, “these theories have not, in general, been verified experimentally. Here we have integrated theory and experiment by constructing and testing a synthetic, bistable [two-state] gene circuit based on the predictions of a simple mathematical model.” “We were looking for the equivalent of a light switch to flip processes on or off in the cell,” explained Gardner [10]. “Then I realized a way to do this with genes instead of with electric circuits.” The team chose two genes that were mutually inhibitory – that is, each produced a molecule that would turn the other off. One important thing to bear in mind is that the system didn’t have a single input. Although the team acknowledged that bistability might be possible – in theory – using only a single promoter that regulated itself, they anticipated possible problems with robustness and experimental tunability if they used that approach. Instead, they decided to use a system whereby each “side” of the switch could be “pressed” by a different stimulus – the addition of a chemical on one, and a change in temperature on the other. Things were set up so that if the system was in the state induced by the chemical, it would stay in that state until the temperature was changed, and would only change back again if the chemical was reintroduced. Importantly, these stimuli did not have to be applied continuously – a “short sharp” burst was enough to cause the switch to flip over. As with the other experiment, Gardner and his colleagues used GFP as the system state reporter, so that the cells glowed in one state, and looked “normal” in the other. In line with the bottom-up approach, they first created a mathematical model of the system and made some predictions about how it would behave inside a cell. Within a year, Gardner had spliced the appropriate genes into the bacteria, and he was able to flip them – at will – from one state to the other. As McAdams and Arkin observed, synthetic “one way” switches had been created in the mid-1980s, but “this is perhaps the first engineered design exploiting bistability to produce a switch with capability of reversibly switching between two . . . stable states” [28]. The potential applications of such a bacterial switch were clear. As they state in the conclusion to their article, “As a practical device, the toggle switch . . . may find applications in gene therapy and biotechnology.” They also borrowed from the language of computer programming, using an analogy between their construction and the short “applets” written in the Java language, which now allow us to download and run programs in our web browser.
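The mutual-inhibition idea can be sketched with a two-equation model of the kind Gardner and colleagues describe: two repressors, each shutting the other off, giving two stable states. All parameter values and names below are illustrative:

```python
# Genetic toggle switch sketch: two mutually repressing genes, u and v.
# Whichever is pushed high wins, and the state then persists on its own.
def settle(u, v, alpha=10.0, n=2.0, dt=0.05, steps=4000):
    """Euler-integrate the two-repressor model to its steady state."""
    for _ in range(steps):
        du = alpha / (1.0 + v ** n) - u  # u is repressed by v
        dv = alpha / (1.0 + u ** n) - v  # v is repressed by u
        u, v = u + dt * du, v + dt * dv
    return u, v

u1, v1 = settle(5.0, 0.0)  # stimulus 1 pushed u high: settles with u >> v
u2, v2 = settle(0.0, 5.0)  # stimulus 2 pushed v high: settles with v >> u
```

The same equations, started from two different “short sharp” pushes, converge to two different stable states – the bistability that makes the circuit a memory element.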
“Finally, as a cellular memory unit, the toggle forms the basis for ‘genetic applets’ – self-contained, programmable, synthetic gene circuits for the control of cell function”.

Engineered Communication
Towards the end of his life, Alan Turing did some foundational work on pattern formation in nature, in an attempt to explain how zebras get their striped coats or leopards their spots. The study of morphogenesis (from the Greek, morphe – shape, and genesis – creation; “amorphous” therefore means “without shape or structure”) is concerned with how cells split to assume new roles and communicate with one another to form very precise shapes, such as tissues and organs. Turing postulated that the diffusion of chemical signals both within and between cells is
the main driving force behind such complex pattern formation [37]. Although Turing’s work was mainly concerned with the processes occurring amongst cells inside a developing embryo, it is clear that chemical signalling also goes on between bacteria. Ron Weiss of Princeton University was particularly interested in Vibrio fischeri, a bacterium that has a symbiotic relationship with a variety of aquatic creatures, including the Hawaiian squid. This relationship is due mainly to the fact that the bacteria exhibit bioluminescence – they generate a chemical known as luciferase (coded by the Lux gene), a version of which is also found in fireflies, and which causes them to glow when gathered together in numbers. Cells within the primitive light organs of the squid draw in bacteria from the seawater and encourage them to grow. Crucially, once enough bacteria are packed into the light organ they produce a signal to tell the squid cells to stop attracting their colleagues, and only then do they begin to glow. The cells get a safe environment in which to grow, protected from competition, and the squid has a light source by which to navigate and catch prey. The mechanism by which the Vibrio “know” when to start glowing is known as quorum sensing, since there have to be sufficient “members” present for luminescence to occur. The bacteria secrete an autoinducer molecule, known as VAI (Vibrio Auto Inducer), which diffuses through the cell wall. The Lux gene (which generates the glowing chemical) needs to be activated (turned on) by a particular protein – which attracts the attention of the polymerase – but the protein can only do this with help from the VAI. Its particular 3D structure is such that it can’t fit tightly onto the gene unless it’s been slightly bent out of shape. When there’s enough VAI present, it locks onto the protein and alters its conformation, so that it can turn on the gene.
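This threshold behavior – essentially dark below a critical autoinducer concentration, essentially full glow above it – is often caricatured in models as a steep Hill function. The sketch below uses purely illustrative numbers:

```python
# Caricature of quorum sensing: Lux activation rises steeply with
# autoinducer (VAI) concentration around a critical threshold K.
def lux_activation(vai: float, K: float = 1.0, n: float = 8.0) -> float:
    """Hill function: fraction of maximal Lux expression (0..1)."""
    return vai ** n / (K ** n + vai ** n)

low = lux_activation(0.5)   # well below threshold: essentially dark
high = lux_activation(2.0)  # well above threshold: essentially full glow
```

The steepness exponent n controls how switch-like the response is; at the threshold itself the activation is exactly one half.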
Thus, the concentration of VAI is absolutely crucial; once a critical threshold has been passed, the bacteria “know” that there are enough of them present, and they begin to glow. Weiss realized that this quorum-based cell-to-cell communication mechanism could provide a powerful framework for the construction of bacterial devices – imagine, for example, a tube of solution containing engineered bacteria that can be added to a sample of seawater, causing it to glow only if the concentration of a particular pollutant exceeds a certain threshold. Crucially, as we will see shortly, it also allows the possibility of generating precise “complex patterned” development. Weiss set up two colonies of E. coli, one containing “senders”, and the other “receivers”. The idea was that the senders would generate a chemical signal made up of VAI,
which could diffuse across a gap and then be picked up by the receivers. Once a strong enough signal was being communicated, the receivers would glow using GFP to show that it had been picked up. Weiss cloned the appropriate gene sequences (corresponding to a type of biobrick) into his bacteria, placed colonies of receiver cells on a plate, and the receivers started to glow in acknowledgment.

Synthetic Circuit Evolution
In late 2002, Weiss and his colleagues published another paper, this time describing how rigorous engineering principles may be brought to bear on the problem of designing and building entirely new genetic circuitry. The motivation was clear – “biological circuit engineers will have to confront their inability to predict the precise behavior of even the most simple synthetic networks, a serious shortcoming and challenge for the design and construction of more sophisticated genetic circuitry in the future” [39]. Together with colleagues Yohei Yokobayashi and Frances Arnold, Weiss proposed a two-stage strategy: first, design a circuit from the bottom up, as Elowitz and others had before, and clone it into bacteria. Such circuits are highly unlikely to work the first time, “because the behavior of biological components inside living cells is highly context-dependent, the actual circuit performance will likely differ from the design predictions, often resulting in a poorly performing or nonfunctional circuit.” Rather than simply abandoning their design, Weiss and his team decided to then tune the circuit inside the cell itself, by applying the principles of evolution. By inducing mutations in the DNA that they had just introduced, they were able to slightly modify the behaviour of the circuit that it represented. Of course, many of these changes would be catastrophic, giving even worse performance than before, but, occasionally, they observed a minor improvement.
In that case, they kept the “winning” bacteria, and subjected them to another round of mutation, in a repeated cycle. In a microcosmic version of Darwinian evolution, mutation followed by selection of the fittest took an initially unpromising pool of broken circuits and transformed them into winners. “Ron is utilizing the power of evolution to design networks in ways so that they perform exactly the way you want them to,” observed Jim Collins [16]. In a commentary article in the same issue of the journal, Jeff Hasty called this approach “design then mutate” [17]. The team showed how a circuit made up of three genetic gates could be fine-tuned in vivo to give the correct performance, and they concluded that “the approach we have outlined should serve as a robust and widely applicable route to obtaining circuits, as well as new genetic devices, that function inside living cells.”

Pattern Formation

The next topic studied by Weiss and his team was the problem of space – specifically, how to get a population of bacteria to cover a surface with a specific density. This facility could be useful when designing bacterial biosensors – devices that detect chemicals in the environment and produce a response. By controlling the density of the microbial components, it might be possible to tune the sensitivity of the overall device. More importantly, the ability of cells to control their own density would provide a useful “self-destruct” mechanism were these genetically modified bugs ever to be released into the environment for “real world” applications. In “Programmed population control by cell–cell communication and regulated killing” [40], Weiss and his team built on their previous results to demonstrate the ability to keep the density of an E. coli population artificially low – that is, below the “natural” density that could be supported by the available nutrients. They designed a genetic circuit that caused the bacteria to generate a different Vibrio signalling molecule, only this time, instead of making the cells glow, a sufficient concentration would flip a switch inside the cell, turning on a killer gene, encoding a protein that was toxic in sufficient quantities. The system behaved exactly as predicted by their mathematical model. The culture grew at an exponential rate (that is, doubling every time step) for seven hours, before hitting the defined density threshold. At that point the population dropped sharply, as countless cells expired, until the population settled at a steady density significantly (ten times) lower than that of an unmodified “control” colony.
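The qualitative dynamics – exponential growth up to a signal threshold, a sharp die-off, then settling well below the nutrient-limited density – can be reproduced with a toy model. Everything below (the logistic growth term, the sharp threshold switch, and all parameter values) is an illustrative assumption, not the authors’ actual model.

```python
# Toy population-control model: cells grow logistically toward the
# nutrient-limited capacity, but a quorum signal proportional to density
# switches on a killer gene once the density crosses a threshold.
def simulate(hours=60.0, dt=0.01, r=0.7, kill=2.0, threshold=0.1, capacity=1.0):
    n = 1e-4  # initial density, as a fraction of capacity
    history = []
    for _ in range(int(hours / dt)):
        growth = r * n * (1.0 - n / capacity)        # logistic growth
        death = kill * n if n > threshold else 0.0   # killer gene active above threshold
        n += dt * (growth - death)
        history.append(n)
    return history

trace = simulate()
```

The simulated density rises roughly exponentially at first, overshoots the threshold slightly, and then hovers near it – an order of magnitude below the unmodified steady state at `capacity`, echoing the tenfold reduction reported in the paper.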
The team concluded that “The population-control circuit lays the foundations for using cell–cell communication to programme interactions among bacterial colonies, allowing the concept of communication-regulated growth and death to be extended to engineering synthetic ecosystems” [40]. The next stage was to programme cells to form specific spatial patterns in the dish. As we have already mentioned briefly, pattern formation is one characteristic of multicellular systems. This is generally achieved using some form of chemical signalling, combined with a differential response – that is, different cells, although genetically identical, may “read” the environmental signals and react in different ways, depending on their internal state. For example, one cell might be “hungry” and choose to move towards a food source, while an identical cell might choose
to remain in the same spot, since it has adequate levels of energy. The team used a variant of the sender–receiver model, only this time adding a “distance detection” component to the receiver circuit. The senders were placed in the center of the dish, and the receivers distributed uniformly across the surface. The receivers were constructed so that they could measure the strength of the signal being “beamed” from the senders, a signal which decayed over distance (a little like a radio station gradually breaking up as you move out of the reception area). The cells were engineered so that only those that were either “near” to the senders or “far” from the senders would generate a response (those in the middle region were instructed to keep quiet). These cells are genetically identical, and are uniformly distributed over the surface – the differential response comes in the way that they assess the strength of the signal, and make a decision on whether or not to respond. The power of the system was increased further by making the near cells glow green, and those far away glow red (using a different fluorescent protein). When the team set the system running, they observed the formation of a “dartboard” pattern, with the “bullseye” being the colony of senders (instructed to glow cyan, or light blue), which was surrounded by a green ring, which in turn was surrounded by a red ring. By placing three sender colonies in a triangle, they were also able to obtain a green heart-shaped pattern, formed by the intersection of three green circles, as well as other patterns, determined solely by the initial configuration of senders [4].

Bacterial Camera

Rather than generating light, a different team decided to use bacteria to detect light – in the process, building the world’s first microbial camera. By engineering a dense bed of E. coli, a team of students led by Chris Voigt at Berkeley developed light-sensitive “film” capable of storing images at a resolution of 100 megapixels per square inch.
E. coli are not normally sensitive to light, so the group took genes coding for photoreceptors from blue-green algae, and spliced them into their bugs [24]. When light was shone on the cells, it turned on a genetic switch that caused a chemical inside them to permanently darken, thus generating a black “pixel”. By projecting an image onto a plate of bacteria, the team were able to obtain several monochrome images, including the Nature logo and the face of team member Andrew Ellington. Nobel Laureate Sir Harry Kroto, discoverer of “buckyballs”, called the team’s camera an “extremely exciting advance” [26], going on to say that “I have always thought that the first major nanotechnology advances would involve some sort of chemical modification of biology.”

Future Directions

Weiss and his team suggest that “the integration of such systems into higher-level organisms and with different cell functions will have practical applications in three-dimensional tissue engineering, biosensing, and biomaterial fabrication.” [4] One possible use for such a system might lie in the detection of bioweapons – spread a culture of bacteria over a surface, and, with the appropriate control circuit, they will be able to accurately pinpoint the location of any pathogens. Programmed cells could eventually replace artificial tissues, or even organs – current attempts to build such constructions in the laboratory rely on cells arranging themselves around an artificial scaffold. Controlled cellular structure formation could do away with the need for such support – “The way we’re doing tissue engineering, right now, . . . is very unnatural,” argues Weiss. “Clearly cells make scaffolds themselves. If we’re able to program them to do that, we might be able to embed them in the site of injury and have them figure out for themselves what the pattern should be” [7]. In addition to building structures, others are considering engineering cells to act as miniature drug delivery systems – fighting disease or infection from the inside. Adam Arkin and Chris Voigt are currently investigating the use of modified E. coli to battle against cancer tumours, while Jay Keasling and coworkers at Berkeley are looking at engineering circuits into the same bacteria to persuade them to generate a potent antimalarial drug that is normally found in small amounts in wormwood plants. Clearly, bacterial computing/synthetic biology is still at a relatively early stage in its development, although the field is growing at a tremendous pace.
It could be argued, with some justification, that the dominant science of the new millennium may well prove to be at the intersection of biology and computing. As biologist Roger Brent argues, “I think that synthetic biology . . . will be as important to the 21st century as [the] ability to manipulate bits was to the 20th” [1].

Bibliography

Primary Literature
1. Anon (2004) Roger Brent and the alpha project. ACM Ubiquity 5(3)
2. Arkin A, Ross J (1994) Computational functions in biochemical reaction networks. Biophysical J 67:560–578
3. Barkai N, Leibler S (2000) Circadian clocks limited by noise. Nature 403:267–268
4. Basu S, Gerchman Y, Collins CH, Arnold FH, Weiss R (2005) A synthetic multicellular system for programmed pattern formation. Nature 434:1130–1134
5. Benner SA, Sismour M (2005) Synthetic biology. Nature Rev Genet 6:533–543
6. Brown C (2004) BioBricks to help reverse-engineer life. EE Times, June 11
7. Brown S (2005) Command performances. San Diego Union-Tribune, December 14
8. Brown TA (1990) Gene cloning: an introduction, 2nd edn. Chapman and Hall, London
9. Crick F (1970) Central dogma of molecular biology. Nature 227:561–563
10. Eisenberg A (2000) Unlike viruses, bacteria find a welcome in the world of computing. New York Times, June 1
11. Elowitz M, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403:335–338
12. Ferber D (2004) Synthetic biology: microbes made to order. Science 303(5655):158–161
13. Gardner T, Cantor R, Collins J (2000) Construction of a genetic toggle switch in Escherichia coli. Nature 403:339–342
14. Geyer CR, Battersby TR, Benner SA (2003) Nucleobase pairing in expanded Watson–Crick-like genetic information systems. Structure 11:1485–1498
15. Gibbs WW (2004) Synthetic life. Scientific American, April 26
16. Gravitz L (2004) 10 emerging technologies that will change your world. MIT Technol Rev, February
17. Hasty J (2002) Design then mutate. Proc Natl Acad Sci 99(26):16516–16518
18. Hopkin K (2004) Life: the next generation. The Scientist 18(19):56
19. Jackson DA, Symons RH, Berg P (1972) Biochemical method for inserting new genetic information into DNA of simian virus 40: circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proc Natl Acad Sci 69:2904–2909
20. Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318–356
21. Jha A (2005) From the cells up. The Guardian, March 10
22. Kauffman S (1993) Gene regulation networks: a theory for their global structure and behaviors. Current Topics in Developmental Biology 6:145–182
23. Kauffman SA (1993) The origins of order: Self-organization and selection in evolution. Oxford University Press, New York
24. Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA (2005) Engineering Escherichia coli to see light. Nature 438:441–442
25. Lobban PE, Sutton CA (1973) Enzymatic end-to-end joining of DNA molecules. J Mol Biol 78(3):453–471
26. Marks P (2005) For ultrasharp pictures, use a living camera. New Scientist, November 26, p 28
27. McAdams HH, Shapiro L (1995) Circuit simulation of genetic networks. Science 269(5224):650–656
28. McAdams HH, Arkin A (2000) Genetic regulatory circuits: Advances toward a genetic circuit engineering discipline. Current Biol 10:318–320
29. Monod J (1970) Chance and Necessity. Penguin, London
30. Monod J, Changeux JP, Jacob F (1963) Allosteric proteins and cellular control systems. J Mol Biol 6:306–329
31. Morton O (2005) Life, Reinvented. Wired 13(1), January
32. Registry of Standard Biological Parts. http://parts.mit.edu/
33. Old R, Primrose S (1994) Principles of Gene Manipulation: an Introduction to Genetic Engineering, 5th edn. Blackwell, Boston
34. Ptashne M (2004) A Genetic Switch: Phage Lambda Revisited, 3rd edn. Cold Spring Harbor Laboratory Press, Woodbury
35. Roberts L, Murrell C (eds) (1998) An introduction to genetic engineering. Department of Biological Sciences, University of Warwick
36. Strogatz S (2003) Sync: The Emerging Science of Spontaneous Order. Penguin, London
37. Turing AM (1952) The chemical basis of morphogenesis. Phil Trans Roy Soc B 237:37–72
38. von Neumann J (1941) The general and logical theory of automata. In: Cerebral Mechanisms in Behavior. Wiley, New York
39. Yokobayashi Y, Weiss R, Arnold FH (2002) Directed evolution of a genetic circuit. Proc Natl Acad Sci 99(26):16587–16591
40. You L, Cox III RS, Weiss R, Arnold FH (2004) Programmed population control by cell–cell communication and regulated killing. Nature 428:868–871
Books and Reviews
Alon U (2006) An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman and Hall/CRC
Amos M (ed) (2004) Cellular Computing. Series in Systems Biology, Oxford University Press
Amos M (2006) Genesis Machines: The New Science of Biocomputing. Atlantic Books, London
Benner SA (2003) Synthetic biology: Act natural. Nature 421:118
Endy D (2005) Foundations for engineering biology. Nature 436:449–453
Kobayashi H, Kaern M, Araki M, Chung K, Gardner TS, Cantor CR, Collins JJ (2004) Programmable cells: interfacing natural and engineered gene networks. Proc Natl Acad Sci 101(22):8414–8419
Sayler GS, Simpson ML, Cox CD (2004) Emerging foundations: nanoengineering and biomicroelectronics for environmental biotechnology. Curr Opin Microbiol 7:267–273
Bayesian Games: Games with Incomplete Information
SHMUEL ZAMIR
Center for the Study of Rationality, Hebrew University, Jerusalem, Israel
Article Outline

Glossary
Definition of the Subject
Introduction
Harsanyi’s Model: The Notion of Type
Aumann’s Model
Harsanyi’s Model and Hierarchies of Beliefs
The Universal Belief Space
Belief Subspaces
Consistent Beliefs and Common Priors
Bayesian Games and Bayesian Equilibrium
Bayesian Equilibrium and Correlated Equilibrium
Concluding Remarks and Future Directions
Acknowledgments
Bibliography

Glossary

Bayesian game An interactive decision situation involving several decision makers (players) in which each player has beliefs about (i.e. assigns a probability distribution to) the payoff relevant parameters and the beliefs of the other players.
State of nature Payoff relevant data of the game such as payoff functions, value of a random variable, etc. It is convenient to think of a state of nature as a full description of a ‘gameform’ (actions and payoff functions).
Type Also known as state of mind, a type is a full description of a player’s beliefs (about the state of nature), beliefs about the beliefs of the other players, beliefs about the beliefs about his beliefs, etc. ad infinitum.
State of the world A specification of the state of nature (payoff relevant parameters) and the players’ types (beliefs of all levels). That is, a state of the world is a state of nature and a list of the states of mind of all players.
Common prior and consistent beliefs The beliefs of players in a game with incomplete information are said to be consistent if they are derived from the same probability distribution (the common prior) by conditioning on each player’s private information. In other words, if the beliefs are consistent, the only source of differences in beliefs is difference in information.
Bayesian equilibrium A Nash equilibrium of a Bayesian game: a list of behaviors and beliefs such that each player is doing his best to maximize his payoff, according to his beliefs about the behavior of the other players.
Correlated equilibrium A Nash equilibrium in an extension of the game in which there is a chance move, and each player has only partial information about its outcome.

Definition of the Subject

Bayesian games (also known as Games with Incomplete Information) are models of interactive decision situations in which the decision makers (players) have only partial information about the data of the game and about the other players. Clearly this is typically the situation we are facing, and hence the importance of the subject: the basic underlying assumption of classical game theory, according to which the data of the game is common knowledge (CK) among the players, is too strong and often implausible in real situations. The importance of Bayesian games is in providing the tools and methodology to relax this implausible assumption, to enable modeling of the overwhelming majority of real-life situations in which players have only partial information about the payoff relevant data. As a result of the interactive nature of the situation, this methodology turns out to be rather deep and sophisticated, both conceptually and mathematically: adopting the classical Bayesian approach of statistics, we encounter the need to deal with an infinite hierarchy of beliefs: what does each player believe that the other player believes about what he believes . . . is the actual payoff associated with a certain outcome? It is not surprising that this methodological difficulty was a major obstacle in the development of the theory, and this article is largely devoted to explaining and resolving it.

Introduction

A game is a mathematical model for an interactive decision situation involving several decision makers (players) whose decisions affect each other.
A basic, often implicit, assumption is that the data of the game, which we call the state of nature, are common knowledge (CK) among the players. In particular the actions available to the players and the payoﬀ functions are CK. This is a rather strong assumption that says that every player knows all actions and payoﬀ functions of all players, every player knows that all other players know all actions and payoﬀ functions, every player knows that every player knows that every player knows. . . etc. ad inﬁnitum. Bayesian games (also known as
games with incomplete information), which are the subject of this article, are models of interactive decision situations in which each player has only partial information about the payoff relevant parameters of the given situation. Adopting the Bayesian approach, we assume that a player who has only partial knowledge about the state of nature has some beliefs, namely a prior distribution, about the parameters which he does not know or is uncertain about. However, unlike in a statistical problem, which involves a single decision maker, this is not enough in an interactive situation: as the decisions of other players are relevant, so are their beliefs, since they affect their decisions. Thus a player must have beliefs about the beliefs of other players. For the same reason, a player needs beliefs about the beliefs of other players about his beliefs, and so on. This interactive reasoning about beliefs leads unavoidably to infinite hierarchies of beliefs, which look rather intractable. The natural emergence of hierarchies of beliefs is illustrated in the following example:

Example 1 Two players, P1 and P2, play a 2 × 2 game whose payoffs depend on an unknown state of nature s ∈ {1, 2}. Player P1’s actions are {T, B}, player P2’s actions are {L, R} and the payoffs are given in the following matrices:
Assume that the belief (prior) of P1 about the event {s = 1} is p and the belief of P2 about the same event is q. The best action of P1 depends both on his prior and on the action of P2, and similarly for the best action of P2. This is given in the following tables:

Now, since the optimal action of P1 depends not only on his belief p but also on the, unknown to him, action of P2, which depends on his belief q, player P1 must therefore have beliefs about q. These are his second-level beliefs, namely beliefs about beliefs. But then, since this is relevant and unknown to P2, he must have beliefs about that, which will be third-level beliefs of P2, and so on. The whole infinite hierarchies of beliefs of the two players pop out naturally in the analysis of this simple two-person game of incomplete information. The objective of this article is to model this kind of situation. Most of the effort will be devoted to the modeling of the mutual beliefs structure, and only then will we add the underlying game which, together with the beliefs structure, defines a Bayesian game for which we define the notion of Bayesian equilibrium.

Harsanyi’s Model: The Notion of Type
As suggested by our introductory example, the straightforward way to describe the mutual beliefs structure in a situation of incomplete information is to specify explicitly the whole hierarchies of beliefs of the players, that is, the beliefs of each player about the unknown parameters of the game, each player’s beliefs about the other players’ beliefs about these parameters, each player’s beliefs about the other players’ beliefs about his beliefs about the parameters, and so on ad infinitum. This may be called the explicit approach; it is in fact feasible, and it was explored and developed at a later stage of the theory (see [18,5,6,7]). We will come back to it when we discuss the universal belief space. However, for obvious reasons, the explicit approach is mathematically rather cumbersome and hardly manageable. Indeed this was a major obstacle to the development of the theory of games with incomplete information at its early stages. The breakthrough was provided by John Harsanyi [11] in a seminal work that earned him the Nobel Prize some thirty years later. While Harsanyi actually formulated the problem verbally, in an explicit way, he suggested a solution that ‘avoided’ the difficulty of having to deal with infinite hierarchies of beliefs, by providing a much more workable implicit, encapsulated model which we present now.
The key notion in Harsanyi’s model is that of type. Each player can be of several types, where a type is to be thought of as a full description of the player’s beliefs about the state of nature (the data of the game), beliefs about the beliefs of other players about the state of nature and about his own beliefs, etc. One may think of a player’s type as his state of mind; a specific configuration of his brain that contains an answer to any question regarding beliefs about the state of nature and about the types of the other players. Note that this implies self-reference (of a type to itself through the types of other players), which is unavoidable in an interactive decision situation.

A Harsanyi game of incomplete information consists of the following ingredients (to simplify notations, assume all sets to be finite):

I – the players’ set.
S – the set of states of nature.
T_i – the type set of player i ∈ I. Let T = ×_{i∈I} T_i denote the set of type profiles.
Y ⊆ S × T – a set of states of the world.
p ∈ Δ(Y) – a probability distribution on Y, called the common prior. (For a set A, we denote the set of probability distributions on A by Δ(A).)

Remark A state of the world ω thus consists of a state of nature and a list of the types of the players. We denote it as

ω = (s(ω); t_1(ω), . . . , t_n(ω)).

We think of the state of nature as a full description of the game, which we call a gameform. So, if it is a game in strategic form, we write the state of nature at state of the world ω as:

s(ω) = (I, (A_i(ω))_{i∈I}, (u_i(·, ω))_{i∈I}).

The payoff functions u_i depend only on the state of nature and not on the types. That is, for all i ∈ I:

s(ω) = s(ω′) ⇒ u_i(·, ω) = u_i(·, ω′).

The game with incomplete information is played as follows:
(1) A chance move chooses ω = (s(ω); t_1(ω), . . . , t_n(ω)) ∈ Y using the probability distribution p.
(2) Each player is told his chosen type t_i(ω) (but not the chosen state of nature s(ω) and not the other players’ types t_{−i}(ω) = (t_j(ω))_{j≠i}).
(3) The players choose actions simultaneously: player i chooses a_i ∈ A_i(ω) and receives a payoff u_i(a, ω), where a = (a_1, . . . , a_n) is the vector of chosen actions
and ω is the state of the world chosen by the chance move.

Remark The set A_i(ω) of actions available to player i in state of the world ω must be known to him. Since his only information is his type t_i(ω), we must impose that A_i(ω) is T_i-measurable, i.e.,

t_i(ω) = t_i(ω′) ⇒ A_i(ω) = A_i(ω′).

Note that if s(ω) were commonly known among the players, it would be a regular game in strategic form. We use the term ‘gameform’ to indicate that the players have only partial information about s(ω). The players do not know which s(ω) is being played. In other words, in the extensive form game of Harsanyi, the gameforms (s(ω))_{ω∈Y} are not subgames since they are interconnected by information sets: player i does not know which s(ω) is being played since he does not know ω; he knows only his own type t_i(ω). An important application of Harsanyi’s model is made in auction theory, as an auction is a clear situation of incomplete information. For example, in a closed private-value auction of a single indivisible object, the type of a player is his private value for the object, which is typically known to him and not to other players. We come back to this in the section entitled “Examples of Bayesian Equilibria”.

Aumann’s Model

A frequently used model of incomplete information was given by Aumann [2].

Definition 2 An Aumann model of incomplete information is (I, Y, (π_i)_{i∈I}, P) where:
I is the players’ set.
Y is a (finite) set whose elements are called states of the world.
For i ∈ I, π_i is a partition of Y.
P is a probability distribution on Y, also called the common prior.

In this model a state of the world ω ∈ Y is chosen according to the probability distribution P, and each player i is informed of π_i(ω), the element of his partition that contains the chosen state of the world ω. This is the informational structure, which becomes a game with incomplete information if we add a mapping s : Y → S. The state of nature s(ω)
is the gameform corresponding to the state of the world ω (with the requirement that the action sets A_i(ω) are π_i-measurable). It is readily seen that Aumann’s model is a Harsanyi model in which the type set T_i of player i is the set of
his partition elements, i.e., T_i = {π_i(ω) | ω ∈ Y}, and the common prior on Y is P. Conversely, any Harsanyi model is an Aumann model in which the partitions are those defined by the types, i.e., π_i(ω) = {ω′ ∈ Y | t_i(ω′) = t_i(ω)}.

Harsanyi’s Model and Hierarchies of Beliefs

As our starting point in modeling incomplete information situations was the appearance of hierarchies of beliefs, one may ask how the Harsanyi (or Aumann) model is related to hierarchies of beliefs and how it captures this unavoidable feature of incomplete information situations. The main observation towards answering this question is the following:

Proposition 3 Any state of the world in Aumann’s model or any type profile t ∈ T in Harsanyi’s model defines (uniquely) a hierarchy of mutual beliefs among the players.

Let us illustrate the idea of the proof by the following example:

Example Consider a Harsanyi model with two players, I and II, each of whom can be of two types: T_I = {I_1, I_2}, T_II = {II_1, II_2} and thus T = {(I_1, II_1), (I_1, II_2), (I_2, II_1), (I_2, II_2)}. The probability p on types is given by:
Denote the corresponding states of nature by a = s(I_1, II_1), b = s(I_1, II_2), c = s(I_2, II_1) and d = s(I_2, II_2). These are the states of nature about which there is incomplete information. The game in extensive form:
Assume that the state of nature is a. What are the belief hierarchies of the players?
First-level beliefs are obtained by each player from p, by conditioning on his type:

I_1: With probability 1/2 the state is a and with probability 1/2 the state is b.
I_2: With probability 2/3 the state is c and with probability 1/3 the state is d.
II_1: With probability 3/7 the state is a and with probability 4/7 the state is c.
II_2: With probability 3/5 the state is b and with probability 2/5 the state is d.

Second-level beliefs (using shorthand notation for the above beliefs: (1/2)a + (1/2)b, etc.):

I_1: With probability 1/2, player II believes (3/7)a + (4/7)c, and with probability 1/2, player II believes (3/5)b + (2/5)d.
I_2: With probability 2/3, player II believes (3/7)a + (4/7)c, and with probability 1/3, player II believes (3/5)b + (2/5)d.
II_1: With probability 3/7, player I believes (1/2)a + (1/2)b, and with probability 4/7, player I believes (2/3)c + (1/3)d.
II_2: With probability 3/5, player I believes (1/2)a + (1/2)b, and with probability 2/5, player I believes (2/3)c + (1/3)d.

Third-level beliefs:

I_1: With probability 1/2, player II believes that: “With probability 3/7, player I believes (1/2)a + (1/2)b, and with probability 4/7, player I believes (2/3)c + (1/3)d”. And with probability 1/2, player II believes that: “With probability 3/5, player I believes (1/2)a + (1/2)b, and with probability 2/5, player I believes (2/3)c + (1/3)d”.

And so on and so on. The idea is very simple and powerful: since each player of a given type has a probability distribution (beliefs) both about the types of the other players
and about the set S of states of nature, the hierarchies of beliefs are constructed inductively: if the kth-level beliefs (about S) are defined for each type, then the beliefs about types generate the (k+1)th level of beliefs. Thus the compact model of Harsanyi does capture the whole hierarchies of beliefs, and it is rather tractable. The natural question is whether this model can be used for all hierarchies of beliefs. In other words, given any hierarchy of mutual beliefs of a set of players I about a set S of states of nature, can it be represented by a Harsanyi game? This was answered by Mertens and Zamir [18], who constructed the universal belief space; that is, given a set S of states of nature and a finite set I of players, they looked for the space Ω of all possible hierarchies of mutual beliefs about S among the players in I. This construction is outlined in the next section.

The Universal Belief Space

Given a finite set of players I = {1, . . . , n} and a set S of states of nature, which are assumed to be compact, we first identify the mathematical spaces in which the hierarchies of beliefs lie. Recall that Δ(A) denotes the set of probability distributions on A and define inductively the sequence of spaces (X_k)_{k=1}^∞ by

X_1 = Δ(S),  (1)
X_{k+1} = X_k × Δ(S × X_k^{n−1}), for k = 1, 2, . . . .  (2)
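For the finite example above, this inductive construction amounts to repeated conditioning of the common prior. The sketch below uses hypothetical prior values (1/4, 1/4, 1/3, 1/6) on the four type profiles; these numbers are an assumption, chosen so that conditioning reproduces the first-level beliefs listed in the example (recall that each state of nature corresponds one-to-one with a type profile, so beliefs about the other player's type are beliefs about the state).

```python
from collections import defaultdict
from fractions import Fraction as F

# Assumed common prior on type profiles (I_k, II_m) -- illustrative values,
# chosen so that conditioning reproduces the example's first-level beliefs
# (e.g. type I_1 assigns 1/2 : 1/2 to II_1 : II_2).
prior = {("I1", "II1"): F(1, 4), ("I1", "II2"): F(1, 4),
         ("I2", "II1"): F(1, 3), ("I2", "II2"): F(1, 6)}

def beliefs_of(player, own_type):
    """First-level beliefs: condition the prior on one's own type.
    player 0 is I, player 1 is II; returns a distribution over the other's type."""
    joint = defaultdict(F)
    for profile, p in prior.items():
        if profile[player] == own_type:
            joint[profile[1 - player]] += p
    total = sum(joint.values())
    return {other: p / total for other, p in joint.items()}

# Second-level beliefs compose two conditionings: I_1's beliefs about
# player II's (first-level) beliefs.
second_level_I1 = {t2: (p, beliefs_of(1, t2))
                   for t2, p in beliefs_of(0, "I1").items()}
```

Iterating the same composition one more step yields the third-level beliefs of the example, which is exactly the (k+1)th-from-kth induction described in the text.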
Any probability distribution on S can be a first-level belief and is thus in X_1. A second-level belief is a joint probability distribution on S and the first-level beliefs of the other (n − 1) players. This is an element of Δ(S × X_1^{n−1}), and therefore a two-level hierarchy is an element of the product space X_1 × Δ(S × X_1^{n−1}), and so on for any level. Note that at each level the belief is a joint probability distribution on S and the previous-level beliefs, allowing for correlation between the two. In dealing with these probability spaces we need to have some mathematical structure. More specifically, we make use of the weak topology:

Definition 4 A sequence (F_n)_{n=1}^∞ of probability measures (on Ω) converges in the weak topology to the probability measure F if and only if lim_{n→∞} ∫_Ω g(ω) dF_n = ∫_Ω g(ω) dF for all bounded and continuous functions g : Ω → R.

It follows from the compactness of S that all spaces defined by (1)–(2) are compact in the weak topology. However, for k > 1, not every element of X_k represents a coherent hierarchy of beliefs of level k. For example, if (μ_1, μ_2) ∈ X_2, where μ_1 ∈ Δ(S) = X_1 and μ_2 ∈ Δ(S × X_1^{n−1}), then for this to describe meaningful
beliefs of a player, the marginal distribution of μ_2 on S must coincide with μ_1. More generally, any event A in the space of k-level beliefs has to have the same (marginal) probability in any higher-level beliefs. Furthermore, not only are each player’s beliefs coherent, but he also considers only coherent beliefs of the other players (only those that are in the support of his beliefs). Expressing this coherency condition formally yields a selection T_k ⊆ X_k such that T_1 = X_1 = Δ(S). It is proved that the projection of T_{k+1} on X_k is T_k (that is, any coherent k-level hierarchy can be extended to a coherent (k+1)-level hierarchy) and that all the sets T_k are compact. Therefore, the projective limit, T = lim←_k T_k, is well defined and nonempty.¹

Definition 5 The universal type space T is the projective limit of the spaces (T_k)_{k=1}^∞. That is, T is the set of all coherent infinite hierarchies of beliefs regarding S, of a player in I. It does not depend on i since by construction it contains all possible hierarchies of beliefs regarding S, and it is therefore the same for all players. It is determined only by S and the number of players n.

Proposition 6 The universal type space T is compact and satisfies

T ≈ Δ(S × T^{n−1}).  (3)
The sign ≅ in (3) is to be read as an isomorphism, and Proposition 6 says that a type of a player can be identified with a joint probability distribution on the state of nature and the types of the other players. The implicit equation (3) reflects the self-reference and circularity of the notion of type: the type of a player is his beliefs about the state of nature and about all the beliefs of the other players, in particular their beliefs about his own beliefs.

Definition 7  The universal belief space (UBS) is the space Ω defined by

Ω = S × Tⁿ
(4)
Bayesian Games: Games with Incomplete Information

An element of Ω is called a state of the world. Thus a state of the world is ω = (s(ω); t₁(ω), t₂(ω), …, t_n(ω)), with s(ω) ∈ S and t_i(ω) ∈ T for all i ∈ I. This is the specification of the state of nature and the types of all players. The universal belief space Ω is what we looked for: the set of all incomplete-information and mutual-belief configurations of a set of n players regarding the state of nature. In particular, as we will see later, all Harsanyi and Aumann models are embedded in Ω, but it also includes belief configurations that cannot be modeled as Harsanyi games. As we noted before, the UBS is determined only by the set of states of nature S and the set of players I, so it should be denoted Ω(S, I). For the sake of simplicity we shall omit the arguments and write Ω, unless we wish to emphasize the underlying sets S and I.

¹ The projective limit (also known as the inverse limit) of the sequence (T_k)_{k=1}^∞ is the space T of all sequences (μ₁, μ₂, …) ∈ ×_{k=1}^∞ T_k which satisfy: for any k ∈ ℕ, there is a probability distribution ν_k ∈ Δ(S × T_k^(n−1)) such that μ_{k+1} = (μ_k, ν_k).

The execution of the construction of the UBS according to the outline above involves some nontrivial mathematics, as can be seen in Mertens and Zamir [18]. The reason is that even with a finite number of states of nature, the space of first-level beliefs is a continuum, the second level is the space of probability distributions on a continuum, and the third level is the space of probability distributions on the space of probability distributions on a continuum. This requires some structure on these spaces: for a (Borel) measurable event E, let B_i^p(E) be the event "player i of type t_i believes that the probability of E is at least p", that is,

B_i^p(E) = {ω ∈ Ω | t_i(ω)(E) ≥ p}.

Since this is the object of beliefs of players other than i (beliefs of j ≠ i about the beliefs of i), this set must also be measurable. Mertens and Zamir used the weak topology, which is the minimal topology with which the event B_i^p(E) is (Borel) measurable for any (Borel) measurable event E. In this topology, if A is a compact set then Δ(A), the space of all probability distributions on A, is also compact. However, the hierarchic construction can also be made with stronger topologies on Δ(A) (see [9,12,17]). Heifetz and Samet [14] worked out the construction of the universal belief space without topology, using only a measurable structure (which is implied by the assumption that the beliefs of the players are measurable). All these explicit constructions of the belief space are within what is called the semantic approach. Aumann [6] provided another construction of a belief system using the syntactic approach, based on sentences and logical formulas specifying explicitly what each player believes about the state of nature, about the beliefs of the other players about the state of nature, and so on. For a detailed construction see Aumann [6], Heifetz and Mongin [13], and Meier [16]. For a comparison of the syntactic and semantic approaches see Aumann and Heifetz [7].

Belief Subspaces

In constructing the universal belief space we implicitly assumed that each player knows his own type, since we specified only his beliefs about the state of nature and about the beliefs of the other players. In view of that, and since
by (3) a type of player i is a probability distribution on S × T^(I∖{i}), we can view a type t_i also as a probability distribution on Ω = S × T^I in which the marginal distribution on T_i is a degenerate delta function at t_i; that is, if ω = (s(ω); t₁(ω), t₂(ω), …, t_n(ω)), then for all i ∈ I, t_i(ω) ∈ Δ(Ω) and
t_i(ω)[t_i = t_i(ω)] = 1
(5)
In particular it follows that, if Supp(t_i) denotes the support of t_i, then

ω′ ∈ Supp(t_i(ω)) ⟹ t_i(ω′) = t_i(ω)
(6)
Let P_i(ω) = Supp(t_i(ω)) ⊆ Ω. This defines a possibility correspondence: at state of the world ω, player i does not consider as possible any point not in P_i(ω). By (6),

P_i(ω) ∩ P_i(ω′) ≠ ∅ ⟹ P_i(ω) = P_i(ω′).

However, unlike in Aumann's model, P_i does not define a partition of Ω, since it is possible that ω ∉ P_i(ω), and hence the union ∪_{ω∈Ω} P_i(ω) may be strictly smaller than Ω (see Example 7). If ω ∈ P_i(ω) ⊆ Y holds for all ω in some subspace Y ⊆ Ω, then (P_i(ω))_{ω∈Y} is a partition of Y.

As we said, the universal belief space includes all possible beliefs and mutual belief structures over the state of nature. However, in a specific situation of incomplete information, it may well be that only part of Ω is relevant for describing the situation. If the state of the world is ω, then clearly all states of the world in ∪_{i∈I} P_i(ω) are relevant; but this is not all, because if ω′ ∈ P_i(ω), then all states in P_j(ω′), for j ≠ i, are also relevant in the considerations of player i. This observation motivates the following definition:

Definition 8  A belief subspace (BL-subspace) is a closed subset Y of Ω which satisfies

P_i(ω) ⊆ Y   for all i ∈ I and all ω ∈ Y.
(7)
A belief subspace is minimal if it has no proper subset which is also a belief subspace. Given ω ∈ Ω, the belief subspace at ω, denoted by Y(ω), is the minimal subspace containing ω. Since Ω is a BL-subspace, Y(ω) is well defined for all ω ∈ Ω. A BL-subspace is a closed subset of Ω which is also closed under the beliefs of the players: for any ω ∈ Y, it contains all states of the world which are relevant to the situation. If ω′ ∉ Y, then no player believes that ω′ is possible, no player believes that any other player believes that ω′ is possible, no player believes that any player believes that any player believes that ω′ is possible, and so on.
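When the state space is finite, the minimal belief subspace Y(ω) of Definition 8 can be computed mechanically by closing {ω} under the possibility correspondences P_i. The following sketch uses a hypothetical two-state space in the spirit of Example 7 below; the state names, type maps, and function names are our own illustration, not part of the text:

```python
# Hypothetical finite belief space: t[i][w] is player i's type (belief) at
# state w, encoded as a dict {state: probability}.
t = {
    "I":  {"w1": {"w1": 0.5, "w2": 0.5}, "w2": {"w1": 0.5, "w2": 0.5}},
    "II": {"w1": {"w2": 1.0},            "w2": {"w2": 1.0}},
}

def possibility(i, w):
    """P_i(w) = Supp(t_i(w)): the states player i considers possible at w."""
    return {wp for wp, pr in t[i][w].items() if pr > 0}

def belief_subspace(w):
    """Minimal BL-subspace Y(w): close {w} under every player's supports."""
    Y, frontier = {w}, [w]
    while frontier:
        wp = frontier.pop()
        for i in t:
            for w2 in possibility(i, wp) - Y:
                Y.add(w2)
                frontier.append(w2)
    return Y

# Player II never considers w1 possible, yet w1 belongs to its own subspace:
assert possibility("II", "w1") == {"w2"}
assert belief_subspace("w1") == {"w1", "w2"}
```

Here ∪_ω P_II(ω) = {w2} is strictly smaller than Y(w1) = {w1, w2}, which is exactly the phenomenon discussed in Example 7 below.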
Remark 9  The subspace Y(ω) is meant to be the minimal subspace which is belief-closed by all players at the state ω. Thus, a natural definition would be: Ỹ(ω) is the minimal BL-subspace containing P_i(ω) for all i ∈ I. However, if for every player the state ω is not in P_i(ω), then ω ∉ Ỹ(ω). Yet, even if it is not in the belief closure of the players, the real state ω is still relevant (at least for the analyst) because it determines the true state of nature; that is, it determines the true payoffs of the game. This is the reason for adding the true state of the world ω, even though "it may not be in the mind of the players".

It follows from (5), (6) and (7) that a BL-subspace Y has the following structure:

Proposition 10  A closed subset Y of the universal belief space Ω is a BL-subspace if and only if it satisfies the following conditions:
1. For any ω = (s(ω); t₁(ω), t₂(ω), …, t_n(ω)) ∈ Y and for all i, the type t_i(ω) is a probability distribution on Y.
2. For any ω and ω′ in Y, ω′ ∈ Supp(t_i(ω)) ⟹ t_i(ω′) = t_i(ω).

In fact, condition 1 follows directly from Definition 8, while condition 2 follows from the general property of the UBS expressed in (6). Given a BL-subspace Y in Ω(S, I), we denote by T_i the type set of player i,

T_i = {t_i(ω) | ω ∈ Y},

and note that, unlike in the UBS, in a specific model Y the type sets are typically not the same for all i, and the analogue of (4) is

Y ⊆ S × T₁ × ⋯ × T_n.

A BL-subspace is a model of incomplete information about the state of nature. As we saw in Harsanyi's model, in any model of incomplete information about a fixed set S of states of nature, involving the same set of players I, a state of the world ω defines (encapsulates) an infinite hierarchy of mutual beliefs of the players I on S. By the universality of the belief space Ω(S, I), there is ω′ ∈ Ω(S, I) with the same hierarchy of beliefs as that of ω. The mapping of each ω to its corresponding ω′ in Ω(S, I) is called a belief morphism, as it preserves the belief structure. Mertens and Zamir [18] proved that the space Ω(S, I) is universal in the sense that any model Y of incomplete information of the set of players I about the state of nature s ∈ S can be embedded in Ω(S, I) via a belief morphism φ : Y → Ω(S, I) so that φ(Y) is a belief subspace in Ω(S, I). In the following examples we give the BL-subspaces representing some known models.
Examples of Belief Subspaces

Example 1 (A game with complete information)  If the state of nature is s₀ ∈ S, then in the universal belief space Ω(S, I) the game is described by a BL-subspace Y consisting of a single state of the world: Y = {ω}, where

ω = (s₀; [1ω], …, [1ω]).

Here [1ω] is the only possible probability distribution on Y, namely the trivial distribution supported by ω. In particular, the state of nature s₀ (i.e., the data of the game) is commonly known.

Example 2 (Commonly known uncertainty about the state of nature)  Assume that the players' set is I = {1, …, n} and there are k states of nature representing, say, k possible n-dimensional payoff matrices G₁, …, G_k. At the beginning of the game, the payoff matrix is chosen by a chance move according to the probability vector p = (p₁, …, p_k), which is commonly known by the players, but no player receives any information about the outcome of the chance move. The set of states of nature is S = {G₁, …, G_k}. The situation described above is embedded in the UBS Ω(S, I) as the following BL-subspace Y consisting of k states of the world (denoting p ∈ Δ(Y) by [p₁ω₁, …, p_kω_k]):
Y = {ω₁, …, ω_k}
ω₁ = (G₁; [p₁ω₁, …, p_kω_k], …, [p₁ω₁, …, p_kω_k])
ω₂ = (G₂; [p₁ω₁, …, p_kω_k], …, [p₁ω₁, …, p_kω_k])
⋮
ω_k = (G_k; [p₁ω₁, …, p_kω_k], …, [p₁ω₁, …, p_kω_k]).
There is a single type, [p₁ω₁, …, p_kω_k], which is the same for all players. It should be emphasized that the type is a distribution on Y (and not just on the states of nature), which implies that the beliefs [p₁G₁, …, p_kG_k] about the state of nature are commonly known by the players.

Example 3 (Two players with incomplete information on one side)  There are two players, I = {I, II}, and two possible payoff matrices, S = {G₁, G₂}. The payoff matrix is chosen at random with P(s = G₁) = p, known to both players. The outcome of this chance move is known only to player I. Aumann and Maschler have studied such situations in which the chosen matrix is played repeatedly and the issue is how the informed player strategically uses his information (see Aumann and Maschler [8] and its references). This situation is presented in the UBS by the following BL-subspace:

Y = {ω₁, ω₂}
ω₁ = (G₁; [1ω₁], [pω₁, (1−p)ω₂])
ω₂ = (G₂; [1ω₂], [pω₁, (1−p)ω₂]).

Player I has two possible types: I₁ = [1ω₁] when he is informed of G₁, and I₂ = [1ω₂] when he is informed of G₂. Player II has only one type, II = [pω₁, (1−p)ω₂]. We describe this situation in the following extensive-form-like figure, in which the oval forms describe the types of the players at the various vertices.
Example 4 (Incomplete information about the other players' information)  In the next example, taken from Sorin and Zamir [23], one of the two players always knows the state of nature but may be uncertain whether the other player knows it. There are two players, I = {I, II}, and two possible payoff matrices, S = {G₁, G₂}. It is commonly known to both players that the payoff matrix is chosen at random by a toss of a fair coin: P(s = G₁) = 1/2. The outcome of this chance move is told to player I. In addition, if (and only if) the matrix G₁ was chosen, another fair coin toss determines whether to inform player II which payoff matrix was chosen. In any case, player I is not told the result of the second coin toss. This situation is described by the following belief space with three states of the world:

Y = {ω₁, ω₂, ω₃}
ω₁ = (G₁; [1/2 ω₁, 1/2 ω₂], [1ω₁])
ω₂ = (G₁; [1/2 ω₁, 1/2 ω₂], [1/3 ω₂, 2/3 ω₃])
ω₃ = (G₂; [1ω₃], [1/3 ω₂, 2/3 ω₃]).

Each player has two types, and the type sets are:

T_I = {I₁, I₂} = { [1/2 ω₁, 1/2 ω₂], [1ω₃] }
T_II = {II₁, II₂} = { [1ω₁], [1/3 ω₂, 2/3 ω₃] }.

Example 5 (Incomplete information on two sides: A Harsanyi game)  In this example, the set of players is again I = {I, II} and the set of states of nature is S = {s₁₁, s₁₂, s₂₁, s₂₂}. In the universal belief space Ω(S, I), consider the following BL-subspace consisting of four states of the world:

Y = {ω₁₁, ω₁₂, ω₂₁, ω₂₂}
ω₁₁ = (s₁₁; [3/7 ω₁₁, 4/7 ω₁₂], [3/5 ω₁₁, 2/5 ω₂₁])
ω₁₂ = (s₁₂; [3/7 ω₁₁, 4/7 ω₁₂], [4/5 ω₁₂, 1/5 ω₂₂])
ω₂₁ = (s₂₁; [2/3 ω₂₁, 1/3 ω₂₂], [3/5 ω₁₁, 2/5 ω₂₁])
ω₂₂ = (s₂₂; [2/3 ω₂₁, 1/3 ω₂₂], [4/5 ω₁₂, 1/5 ω₂₂])
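Condition 2 of Proposition 10 (equivalently, property (6)) is mechanical to verify on this finite space. A sketch in exact arithmetic, with our own state encoding w11, …, w22 standing for ω₁₁, …, ω₂₂:

```python
from fractions import Fraction as F

# t[player][state]: the type (a belief over states) held at each state.
t = {
    "I": {
        "w11": {"w11": F(3, 7), "w12": F(4, 7)},
        "w12": {"w11": F(3, 7), "w12": F(4, 7)},
        "w21": {"w21": F(2, 3), "w22": F(1, 3)},
        "w22": {"w21": F(2, 3), "w22": F(1, 3)},
    },
    "II": {
        "w11": {"w11": F(3, 5), "w21": F(2, 5)},
        "w21": {"w11": F(3, 5), "w21": F(2, 5)},
        "w12": {"w12": F(4, 5), "w22": F(1, 5)},
        "w22": {"w12": F(4, 5), "w22": F(1, 5)},
    },
}

# Condition (6): if w' is in the support of t_i(w), then t_i(w') = t_i(w).
for i, beliefs in t.items():
    for w, bw in beliefs.items():
        for wp, pr in bw.items():
            assert pr == 0 or beliefs[wp] == bw, (i, w, wp)
print("Y satisfies condition (6), so it is a BL-subspace")
```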
Again, each player has two types, and the type sets are:

T_I = {I₁, I₂} = { [3/7 ω₁₁, 4/7 ω₁₂], [2/3 ω₂₁, 1/3 ω₂₂] }
T_II = {II₁, II₂} = { [3/5 ω₁₁, 2/5 ω₂₁], [4/5 ω₁₂, 1/5 ω₂₂] }.

The type of a player determines his beliefs about the type of the other player. For example, player I of type I₁ assigns probability 3/7 to the state of the world ω₁₁, in which player II is of type II₁, and probability 4/7 to the state of the world ω₁₂, in which player II is of type II₂. Therefore, the beliefs of type I₁ about the types of player II are P(II₁) = 3/7, P(II₂) = 4/7. The mutual beliefs about each other's type are given in the following tables:

Beliefs of player I          Beliefs of player II
      II₁   II₂                    II₁   II₂
I₁    3/7   4/7              I₁    3/5   4/5
I₂    2/3   1/3              I₂    2/5   1/5

Note that in all our examples of belief subspaces, condition (6) is satisfied: the support of a player's type contains only states of the world in which he has that type. The game with incomplete information is described in the following figure:
These are precisely the beliefs of Bayesian players if the pair of types (t_I, t_II) in T = T_I × T_II is chosen according to the prior probability distribution p below, and each player is then informed of his own type:

           II₁    II₂
p:   I₁   3/10   4/10
     I₂   2/10   1/10
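The prior p is uniquely pinned down by the players' conditional beliefs: each row ratio must match player I's beliefs and each column ratio player II's. A sketch of the computation in exact arithmetic (variable names are ours; the belief tables are those displayed above):

```python
from fractions import Fraction as F

# belief_I[i][j] = P(II_j | I_i); belief_II[i][j] = P(I_i | II_j).
belief_I  = [[F(3, 7), F(4, 7)], [F(2, 3), F(1, 3)]]
belief_II = [[F(3, 5), F(4, 5)], [F(2, 5), F(1, 5)]]

# Fix p11 = 1, propagate the odds implied by the conditionals, normalize;
# consistency of the beliefs makes the result well defined.
q = [[F(1), F(0)], [F(0), F(0)]]
q[0][1] = q[0][0] * belief_I[0][1] / belief_I[0][0]    # p12 : p11 = 4 : 3
q[1][0] = q[0][0] * belief_II[1][0] / belief_II[0][0]  # p21 : p11 = 2 : 3
q[1][1] = q[1][0] * belief_I[1][1] / belief_I[1][0]    # p22 : p21 = 1 : 2
total = sum(sum(row) for row in q)
p = [[x / total for x in row] for row in q]
assert p == [[F(3, 10), F(4, 10)], [F(2, 10), F(1, 10)]]

# Cross-check: the conditionals of p reproduce *all* the given beliefs.
for i in range(2):
    assert [p[i][j] / sum(p[i]) for j in range(2)] == belief_I[i]
for j in range(2):
    col = p[0][j] + p[1][j]
    assert [p[i][j] / col for i in range(2)] == [belief_II[0][j], belief_II[1][j]]
```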
In other words, this BL-subspace is a Harsanyi game with type sets T_I, T_II and the prior probability distribution p on the types. Actually, as there is a one-to-one mapping between the type set T and the set S of states of nature, the situation is generated by a chance move choosing the state of nature s_ij ∈ S according to the distribution p (that is, P(s_ij) = P(I_i, II_j) for i and j in {1, 2}), after which player I is informed of i and player II is informed of j. As a matter of fact, all the BL-subspaces in the previous examples can also be written as Harsanyi games, mostly in a trivial way.

Example 6 (Inconsistent beliefs)  In the same universal belief space Ω(S, I) of the previous example, consider now another BL-subspace Ỹ which differs from Y only by changing the type II₁ of player II from [3/5 ω₁₁, 2/5 ω₂₁] to [1/2 ω₁₁, 1/2 ω₂₁]; that is,

Ỹ = {ω₁₁, ω₁₂, ω₂₁, ω₂₂}
ω₁₁ = (s₁₁; [3/7 ω₁₁, 4/7 ω₁₂], [1/2 ω₁₁, 1/2 ω₂₁])
ω₁₂ = (s₁₂; [3/7 ω₁₁, 4/7 ω₁₂], [4/5 ω₁₂, 1/5 ω₂₂])
ω₂₁ = (s₂₁; [2/3 ω₂₁, 1/3 ω₂₂], [1/2 ω₁₁, 1/2 ω₂₁])
ω₂₂ = (s₂₂; [2/3 ω₂₁, 1/3 ω₂₂], [4/5 ω₁₂, 1/5 ω₂₂])

with type sets:

T_I = {I₁, I₂} = { [3/7 ω₁₁, 4/7 ω₁₂], [2/3 ω₂₁, 1/3 ω₂₂] }
T_II = {II₁, II₂} = { [1/2 ω₁₁, 1/2 ω₂₁], [4/5 ω₁₂, 1/5 ω₂₂] }.

Now the mutual beliefs about each other's type are:

Beliefs of player I          Beliefs of player II
      II₁   II₂                    II₁   II₂
I₁    3/7   4/7              I₁    1/2   4/5
I₂    2/3   1/3              I₂    1/2   1/5

Unlike in the previous example, these beliefs cannot be derived from a prior distribution p. According to Harsanyi, these are inconsistent beliefs. A BL-subspace with inconsistent beliefs cannot be described as a Harsanyi or Aumann model; it cannot be described as a game in extensive form.
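Why no prior can work here is visible from an odds-ratio argument (made general in Example 8 below): any common prior p would have to yield the same value of p₁₁p₂₂/(p₁₂p₂₁) whether computed from player I's conditional beliefs or from player II's. A quick check in exact arithmetic (the function name is ours):

```python
from fractions import Fraction as F

def cross_ratio(a1, a2):
    """Odds ratio p11*p22/(p12*p21) implied by the conditionals (a1, a2)."""
    return (a1 / (1 - a1)) * ((1 - a2) / a2)

alpha = (F(3, 7), F(2, 3))   # player I:  P(II_1 | I_1), P(II_1 | I_2)
beta5 = (F(3, 5), F(4, 5))   # player II in Example 5: P(I_1 | II_1), P(I_1 | II_2)
beta6 = (F(1, 2), F(4, 5))   # player II here, with type II_1 changed to 1/2

assert cross_ratio(*alpha) == cross_ratio(*beta5) == F(3, 8)  # Example 5: consistent
assert cross_ratio(*beta6) == F(1, 4)   # 1/4 != 3/8: no common prior in Example 6
```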
Example 7 ("Highly inconsistent" beliefs)  In the previous example, even though the beliefs of the players were inconsistent in all states of the world, the true state was considered possible by all players (for example, in the state ω₁₂ player I assigns to this state probability 4/7 and player II assigns to it probability 4/5). As was emphasized before, the UBS contains all belief configurations, including highly inconsistent or wrong beliefs, as the following example shows. The belief subspace of the two players I and II concerning the state of nature, which can be s₁ or s₂, is given by:

Y = {ω₁, ω₂}
ω₁ = (s₁; [1/2 ω₁, 1/2 ω₂], [1ω₂])
ω₂ = (s₂; [1/2 ω₁, 1/2 ω₂], [1ω₂]).

In the state of the world ω₁, the state of nature is s₁; player I assigns equal probabilities to s₁ and s₂, but player II assigns probability 1 to s₂. In other words, player II does not consider as possible the true state of the world (and also the true state of nature): ω₁ ∉ P_II(ω₁), and consequently ∪_{ω∈Y} P_II(ω) = {ω₂}, which is strictly smaller than Y. By the definition of belief subspace and condition (6), this also implies that ∪_{ω∈Ω} P_II(ω) is strictly smaller than Ω (as it does not contain ω₁).

Consistent Beliefs and Common Priors

A BL-subspace Y is a semantic belief system presenting, via the notion of types, the hierarchies of belief of a set of players having incomplete information about the state of nature. A state of the world captures the situation at what is called the interim stage: each player knows his own type and has beliefs about the state of nature and the types of the other players. The question "what is the real state of the world ω?" is not addressed. In a BL-subspace there is no chance move with an explicit probability distribution that chooses the state of the world, while such a probability distribution is part of a Harsanyi or an Aumann model.
Yet, in the belief space Y of Example 5 in the previous section, such a prior distribution p emerged endogenously from the structure of Y. More specifically, if the state ω ∈ Y is chosen by a chance move according to the probability distribution p and each player i is told his type t_i(ω), then his beliefs are precisely those described by t_i(ω). This is a property of the BL-subspace that we call consistency (which does not hold, for instance, for the BL-subspace Ỹ in Example 6), and that we define now. Let Y ⊆ Ω be a BL-subspace.
Definition 11 (i) A probability distribution p ∈ Δ(Y) is said to be consistent if for any player i ∈ I,

p = ∫_Y t_i(ω) dp.   (8)
(ii) A BL-subspace Y is said to be consistent if there is a consistent probability distribution p with Supp(p) = Y. A consistent BL-subspace will be called a C-subspace. A state of the world ω ∈ Ω is said to be consistent if it is a point in a C-subspace.

The interpretation of (8) is that the probability distribution p is "the average" of the types t_i(ω) of player i (which are also probability distributions on Y), when the average is taken on Y according to p. This definition is not transparent; it is not clear how it captures the consistency property we have just explained in terms of a chance move choosing ω ∈ Y according to p. However, it turns out to be equivalent. For ω ∈ Y denote π_i(ω) = {ω′ ∈ Y | t_i(ω′) = t_i(ω)}; then we have:

Proposition 12  A probability distribution p ∈ Δ(Y) is consistent if and only if

t_i(ω)(A) = p(A | π_i(ω))   (9)

holds for all i ∈ I and for any measurable set A ⊆ Y.

In particular, a Harsanyi or an Aumann model is represented by a consistent BL-subspace since, by construction, the beliefs are derived from a common prior distribution which is part of the data of the model. The role of the prior distribution p in these models is actually not that of an additional parameter of the model but rather that of an additional assumption on the belief system, namely, the consistency assumption. In fact, if a minimal belief subspace is consistent, then the common prior p is uniquely determined by the beliefs, as we saw in Example 5; there is no need to specify p as additional data of the system.

Proposition 13  If ω ∈ Ω is a consistent state of the world, and if Y(ω) is the smallest consistent BL-subspace containing ω, then the consistent probability distribution p on Y(ω) is uniquely determined. (The formulation of this proposition requires some technical qualification if Y(ω) is a continuum.)

Consistency (or the existence of a common prior) is quite a strong assumption. It assumes that differences in beliefs (i.e., in probability assessments) are due only to differences in information; players having precisely the same information will have precisely the same beliefs. It is no surprise that this assumption has strong consequences, the best known of which is due to Aumann [2]: players with consistent beliefs cannot agree to disagree. That is, if at some state of the world it is commonly known that one player assigns probability q₁ to an event E and another player assigns probability q₂ to the same event, then it must be the case that q₁ = q₂. Variants of this result appear under the title of "no trade theorems" (see, e.g., [19]): rational players with consistent beliefs cannot believe that they both can gain from a trade or a bet between them. The plausibility and justification of the common prior assumption have been extensively discussed in the literature (see, e.g., [4,10,11]); it is sometimes referred to as the Harsanyi doctrine. Here we only make the observation that, within the set of BL-subspaces in Ω, the set of consistent BL-subspaces is a set of measure zero. To see the idea of the proof, consider the following example:

Example 8 (Generalization of Examples 5 and 6)  Consider a BL-subspace as in Examples 5 and 6 but with type sets:

T_I = {I₁, I₂} = { [α₁ω₁₁, (1−α₁)ω₁₂], [α₂ω₂₁, (1−α₂)ω₂₂] }
T_II = {II₁, II₂} = { [β₁ω₁₁, (1−β₁)ω₂₁], [β₂ω₁₂, (1−β₂)ω₂₂] }.

For any (α₁, α₂, β₁, β₂) ∈ [0, 1]⁴ this is a BL-subspace. The mutual beliefs about each other's type are:
Beliefs of player I          Beliefs of player II
      II₁     II₂                  II₁     II₂
I₁    α₁     1−α₁            I₁    β₁      β₂
I₂    α₂     1−α₂            I₂    1−β₁   1−β₂

If the subspace is consistent, these beliefs are obtained as conditional distributions from some prior probability distribution p on T = T_I × T_II, say, with entries p_ij = p(I_i, II_j):

           II₁    II₂
p:   I₁    p₁₁    p₁₂
     I₂    p₂₁    p₂₂
This implies (assuming p_ij ≠ 0 for all i and j)

p₁₁/p₁₂ = α₁/(1−α₁),   p₂₁/p₂₂ = α₂/(1−α₂),
and hence p₁₁p₂₂/(p₁₂p₂₁) = [α₁/(1−α₁)]·[(1−α₂)/α₂].

Similarly,

p₁₁/p₂₁ = β₁/(1−β₁),   p₁₂/p₂₂ = β₂/(1−β₂),
and hence p₁₁p₂₂/(p₁₂p₂₁) = [β₁/(1−β₁)]·[(1−β₂)/β₂].

It follows that the types must satisfy

[α₁/(1−α₁)]·[(1−α₂)/α₂] = [β₁/(1−β₁)]·[(1−β₂)/β₂],   (10)

which is generally not the case. More precisely, the set of (α₁, α₂, β₁, β₂) ∈ [0, 1]⁴ satisfying condition (10) is a set of measure zero; it is a three-dimensional set in the four-dimensional cube [0, 1]⁴. Nyarko [21] proved that even the ratio of the dimensions of the set of consistent BL-subspaces to the dimension of the set of all BL-subspaces goes to zero as the latter goes to infinity. Summing up: most BL-subspaces are inconsistent and thus do not satisfy the common prior condition.

Bayesian Games and Bayesian Equilibrium

As we said, a game with incomplete information played by Bayesian players, often called a Bayesian game, is a game in which the players have incomplete information about the data of the game. Being Bayesian, each player has beliefs (a probability distribution) about any relevant data he does not know, including the beliefs of the other players. So far we have developed the belief structure of such a situation, which is a BL-subspace Y in the universal belief space Ω(S, I). Now we add the action sets and the payoff functions. These are actually part of the description of the state of nature: the mapping s : Ω → S assigns to each state of the world ω the game-form s(ω) played at this state. To emphasize this interpretation of s(ω) as a game-form, we denote it also as

Γ(ω) = (I; (A_i(t_i(ω)))_{i∈I}; (u_i(ω))_{i∈I}),

where A_i(t_i(ω)) is the action set (pure strategies) of player i at ω, u_i(ω) : A(ω) → ℝ is his payoff function, and A(ω) = ×_{i∈I} A_i(t_i(ω)) is the set of action profiles at state ω. Note that while the actions of a player depend only on his type, his payoff depends on the actions and types of all the players. For a vector of actions a ∈ A(ω), we write u_i(ω; a) for u_i(ω)(a). Given a BL-subspace Y ⊆ Ω(S, I), we define the Bayesian game on Y as follows:

Definition 14  The Bayesian game on Y is a vector-payoff game in which:
– I = {1, …, n} is the players' set.
– Σ_i, the strategy set of player i, is the set of mappings σ_i : Y → A_i which are T_i-measurable; in particular,

t_i(ω₁) = t_i(ω₂) ⟹ σ_i(ω₁) = σ_i(ω₂).

– Let Σ = ×_{i∈I} Σ_i. The payoff function u_i for player i is the vector-valued function u_i = (u_{t_i})_{t_i∈T_i}, where u_{t_i} (the payoff function of player i of type t_i) is the mapping u_{t_i} : Σ → ℝ defined by

u_{t_i}(σ) = ∫_Y u_i(ω; σ(ω)) dt_i(ω).   (11)

Note that u_{t_i} is T_i-measurable, as it should be. When Y is a finite BL-subspace, the above-defined Bayesian game is an n-person "game" in which the payoff for player i is a vector with one payoff for each of his types (hence a vector of dimension |T_i|). It becomes a regular game-form for a given state of the world ω, since then the payoff to player i is u_{t_i(ω)}. However, these game-forms are not regular games, since they are interconnected: the players do not know which of these "games" they are playing (since they do not know the state of the world ω). Thus, just like a Harsanyi game, a Bayesian game on a BL-subspace Y consists of a family of connected game-forms, one for each ω ∈ Y. However, unlike a Harsanyi game, a Bayesian game has no chance move that chooses the state of the world (or the vector of types). A way to transform a Bayesian game into a regular game was suggested by R. Selten and was named by Harsanyi the Selten game G (see p. 496 in [11]). This is a game with one player for each type, in which each player t_i ∈ T_i chooses a strategy and then selects his (n−1) partners, one from each T_j, j ≠ i, according to his beliefs t_i.
Bayesian Equilibrium

Although a Bayesian game is not a regular game, the Nash equilibrium concept, based on the notion of best reply, can be adapted to yield the solution concept of Bayesian equilibrium (also called Nash–Bayes equilibrium).

Definition 15  A vector of strategies σ = (σ₁, …, σ_n) in a Bayesian game is called a Bayesian equilibrium if for all i ∈ I and for all t_i ∈ T_i,

u_{t_i}(σ) ≥ u_{t_i}(σ_{−i}; σ̃_i)   for all σ̃_i ∈ Σ_i,
(12)
where, as usual, σ_{−i} = (σ_j)_{j≠i} denotes the vector of strategies of the players other than i. Thus, a Bayesian equilibrium specifies a behavior for each player which is a best reply to what he believes is the behavior of the other players, that is, a best reply to the strategies of the other players given his type. In a game with complete information, which corresponds to a BL-subspace with one state of the world (Y = {ω}), there is only one type of each player, the beliefs all assign probability one to a singleton, and the Bayesian equilibrium is just the well-known Nash equilibrium.

Remark 16  It is readily seen that when Y is finite, any Bayesian equilibrium is a Nash equilibrium of the Selten game G, in which each type is a player who selects the types of his partners according to his beliefs. Similarly, we can transform the Bayesian game into an ordinary game in strategic form by defining the payoff function to player i to be ũ_i = Σ_{t_i∈T_i} λ_{t_i} u_{t_i}, where the weights λ_{t_i} are strictly positive. Independently of the values of the constants λ_{t_i}, any Bayesian equilibrium is a Nash equilibrium of this game, and vice versa. In particular, if we choose the constants so that Σ_{t_i∈T_i} λ_{t_i} = 1, we obtain the game suggested by Aumann and Maschler in 1967 (see p. 95 in [8]), and again the set of Nash equilibria of this game is precisely the set of Bayesian equilibria.

The Harsanyi Game Revisited

As we observed in Example 5, the belief structure of a consistent BL-subspace is the same as that of a Harsanyi game after the chance move choosing the types. That is, the embedding of the Harsanyi game as a BL-subspace in the universal belief space is only at the interim stage, after the moment that each player gets to know his type. The Harsanyi game, on the other hand, is at the ex ante stage, before a player knows his type.
Then, what is the relation between the Nash equilibrium in the Harsanyi game at the ex ante stage and the equilibrium at the interim stage, namely, the Bayesian equilibrium of the corresponding BLsubspace?
This is an important question concerning the embedding of the Harsanyi game in the UBS since, as we said before, the chance move choosing the types does not appear explicitly in the UBS. The answer to this question was given by Harsanyi (1967–8) (assuming that each type t_i has positive probability):

Theorem 17 (Harsanyi)  The set of Nash equilibria of a Harsanyi game is identical to the set of Bayesian equilibria of the equivalent BL-subspace in the UBS.

In other words, this theorem states that any equilibrium at the ex ante stage is also an equilibrium at the interim stage, and vice versa. In modeling situations of incomplete information, the interim stage is the natural one; if a player knows his beliefs (type), why should he analyze the situation, as Harsanyi suggests, from the ex ante point of view, as if his type were not known to him and he could equally well be of another type? Theorem 17 provides a technical answer to this question: the equilibria are the same in both games, and the equilibrium strategy of the ex ante game specifies for each type precisely his equilibrium strategy at the interim stage. In that respect, for a player who knows his type, the Harsanyi model is just an auxiliary game with which to compute his equilibrium behavior. Of course, the deeper answer to the question above comes from the interactive nature of the situation: even though player i knows he is of type t_i, he knows that his partners do not know that and may consider the possibility that he is of type t̃_i; since this affects their behavior, the behavior of type t̃_i is also relevant to player i, who knows he is of type t_i. Finally, Theorem 17 makes the Bayesian equilibrium the natural extension of the Nash equilibrium concept to games with incomplete information, for consistent or inconsistent beliefs, when the Harsanyi ordinary game model is unavailable.
Examples of Bayesian Equilibria

In Example 6 there are two players with two types each, and the inconsistent mutual beliefs are given by:

Beliefs of player I          Beliefs of player II
      II₁   II₂                    II₁   II₂
I₁    3/7   4/7              I₁    1/2   4/5
I₂    2/3   1/3              I₂    1/2   1/5
Assume that the payoff matrices for the four type profiles are:
As the beliefs are inconsistent, they cannot be presented by a Harsanyi game. Yet we can compute the Bayesian equilibrium of this Bayesian game. Let (x, y) be the strategy of player I: play the mixed strategy [x(T), (1−x)(B)] when of type I₁, and the mixed strategy [y(T), (1−y)(B)] when of type I₂. Let (z, t) be the strategy of player II: play the mixed strategy [z(L), (1−z)(R)] when of type II₁, and the mixed strategy [t(L), (1−t)(R)] when of type II₂. For 0 < x, y, z, t < 1, each player of each type must be indifferent between his two pure actions; this yields the equilibrium values

x = 3/5,   y = 2/5,   z = 7/9,   t = 2/9.

There is no "expected payoff", since this is a Bayesian game and not a game: the expected payoffs depend on the actual state of the world, i.e., on the actual types of the players and the actual payoff matrix. For example, if the state of the world is ω₁₁ = (G₁₁; I₁, II₁), the expected payoffs are

γ(ω₁₁) = (3/5, 2/5) G₁₁ (7/9, 2/9)ᵀ = (46/45, 6/45).
Similarly,

γ(ω₁₂) = (3/5, 2/5) G₁₂ (2/9, 7/9)ᵀ = (18/45, 4/45),
γ(ω₂₁) = (2/5, 3/5) G₂₁ (7/9, 2/9)ᵀ = (21/45, 21/45),
γ(ω₂₂) = (2/5, 3/5) G₂₂ (2/9, 7/9)ᵀ = (28/45, 70/45).
However, these are the objective payoffs as viewed by the analyst; they are viewed differently by the players. For player i of type ti, the relevant payoff is his subjective payoff u_ti(σ) defined in (11). For example, at state ω11 (or ω12) player I believes that with probability 3/7 the state is ω11, in which case his payoff is 46/45, and with probability 4/7 the state is ω12, in which case his payoff is 18/45. Therefore his subjective expected payoff at state ω11 is 3/7 · 46/45 + 4/7 · 18/45 = 2/3. Similar computations show that in states ω21 or ω22 player I "expects" a payoff of 7/15, while player II "expects" 3/10 at states ω11 or ω21 and 86/225 in states ω12 or ω22.

Bayesian equilibrium is widely used in auction theory, which constitutes an important and successful application of the theory of games with incomplete information. The simplest example is that of two buyers bidding in a first-price auction for an indivisible object. If each buyer i has a private value vi for the object (which is independent of the private value vj of the other buyer), and if he further believes that vj is random with uniform probability distribution on [0, 1], then this is a Bayesian game in which the type of a player is his private valuation; that is, the type sets are T1 = T2 = [0, 1], which is a continuum. This is a consistent Bayesian game (that is, a Harsanyi game) since the beliefs are derived from the uniform probability distribution on T1 × T2 = [0, 1]^2. A Bayesian equilibrium of this game is that in which each player bids half of his private value: bi(vi) = vi/2 (see, e.g., Chap. III in [25]). Although auction theory was developed far beyond this simple example, almost all the models studied so far are Bayesian games with consistent beliefs, that is, Harsanyi games. The main reason, of course, is that consistent Bayesian games are more manageable, since they can be described in terms of an equivalent ordinary game in strategic form.
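The half-value bidding rule can be sanity-checked numerically. The sketch below (an illustration, not part of the original text) fixes the opponent's strategy at bj(vj) = vj/2, so that a bid b wins with probability min(2b, 1), and verifies by grid search that b = v/2 maximizes the expected payoff (v − b) · min(2b, 1) for several private values v:

```python
import numpy as np

# Numerical check of the first-price-auction equilibrium b_i(v_i) = v_i / 2
# cited in the text: if the opponent's value v_j is uniform on [0, 1] and he
# bids v_j / 2, then a bid b wins with probability min(2b, 1), so the expected
# payoff of bidding b with private value v is (v - b) * min(2b, 1).
bids = np.linspace(0.0, 1.0, 100_001)

def expected_payoff(v, b):
    return (v - b) * np.minimum(2 * b, 1.0)

for v in (0.2, 0.5, 0.8, 1.0):
    best_bid = bids[np.argmax(expected_payoff(v, bids))]
    assert abs(best_bid - v / 2) < 1e-3, (v, best_bid)
```

The interior first-order condition d/db [2b(v − b)] = 2v − 4b = 0 gives b = v/2 analytically, matching the grid search.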
However, inconsistent beliefs are rather plausible and exist in the marketplace in general, and even more so in auction situations. An example is collusion among bidders: When a bidding ring is formed, it may well be the case that some of the bidders outside the ring are unaware of its existence and behave under the belief that all bidders are competitive. The members of the ring may or may not know whether the other bidders know about the ring, or they may be uncertain about it. This rather plausible mutual belief situation is typically inconsistent and has to be treated as an inconsistent Bayesian game for which a Bayesian equilibrium is to be found.
Bayesian Equilibrium and Correlated Equilibrium
Correlated equilibrium was introduced in Aumann (1974) as the Nash equilibrium of a game extended by adding to
it random events about which the players have partial information. Basically, starting from an ordinary game, Aumann added a probability space and an information structure and obtained a game with incomplete information, the equilibrium of which he called a correlated equilibrium of the original game. The fact that the Nash equilibrium of a game with incomplete information is the Bayesian equilibrium suggests that the concept of correlated equilibrium is closely related to that of Bayesian equilibrium. In fact, Aumann noticed that and discussed it in a second paper entitled "Correlated equilibrium as an expression of Bayesian rationality" [3]. In this section, we review briefly, by way of an example, the concept of correlated equilibrium, and state formally its relation to the concept of Bayesian equilibrium.

Example 18 Consider a two-person game with actions {T, B} for player 1 and {L, R} for player 2, with corresponding payoffs given in the following matrix (displayed in the original):
This game has three Nash equilibria: (T, R) with payoff (2, 7), (B, L) with payoff (7, 2), and the mixed equilibrium ([2/3(T), 1/3(B)], [2/3(L), 1/3(R)]) with payoff (4 2/3, 4 2/3). Suppose that we add to the game a chance move that chooses an element in {T, B} × {L, R} according to the following probability distribution μ (displayed in the original):
Let us now extend the game G to a game with incomplete information G* in which a chance move chooses an element in {T, B} × {L, R} according to the probability distribution μ above. Player 1 is informed of the first (left) component of the chosen element, and player 2 is informed of the second (right) component. Then each player chooses an action in G and the payoff is made. If we interpret the partial information as a suggestion of which action to choose, then it is readily verified that following the suggestion is a Nash equilibrium of the extended game, yielding a payoff (5, 5). This was called by Aumann a correlated equilibrium of the original game G. In our terminology, the extended game G* is a Bayesian game and its
Nash equilibrium is its Bayesian equilibrium. Thus what we have here is that a correlated equilibrium of a game is just the Bayesian equilibrium of its extension to a game with incomplete information. We now make this a general formal statement. For simplicity, we use the Aumann model of a game with incomplete information. Let G = (I, (Ai)_{i∈I}, (ui)_{i∈I}) be a game in strategic form, where I is the set of players, Ai is the set of actions (pure strategies) of player i, and ui is his payoff function.

Definition 19 Given a game in strategic form G, an incomplete information extension (the I-extension) of the game G is the game G* given by

  G* = (I, (Ai)_{i∈I}, (ui)_{i∈I}, (Y, p), (πi)_{i∈I}),

where (Y, p) is a finite probability space and πi is a partition of Y (the information partition of player i).

This is an Aumann model of incomplete information and, as we noted before, it is also a Harsanyi type-based model in which the type of player i at state ω ∈ Y is ti(ω) = πi(ω), and a strategy of player i is a mapping from his type set to his mixed actions: σi : Ti → Δ(Ai). We identify a correlated equilibrium in the game G with the probability distribution it induces on the vectors of actions A = A1 × ... × An. Thus μ ∈ Δ(A) is a correlated equilibrium of the game G if, when a ∈ A is chosen according to μ and each player i is suggested to play ai, his best reply is in fact to play the action ai. Given a game with incomplete information G* as in Definition 19, any vector of strategies of the players σ = (σ1, ..., σn) induces a probability distribution on the vectors of actions a ∈ A. We denote this as μ_σ ∈ Δ(A). We can now state the relation between correlated and Bayesian equilibria:

Theorem 20 Let σ be a Bayesian equilibrium in the game of incomplete information G* = (I, (Ai)_{i∈I}, (ui)_{i∈I}, (Y, p), (πi)_{i∈I}); then the induced probability distribution μ_σ is a correlated equilibrium of the basic game G = (I, (Ai)_{i∈I}, (ui)_{i∈I}).

The other direction is:

Theorem 21 Let μ be a correlated equilibrium of the game G = (I, (Ai)_{i∈I}, (ui)_{i∈I}); then G has an extension to a game with incomplete information G* = (I, (Ai)_{i∈I}, (ui)_{i∈I}, (Y, p), (πi)_{i∈I}) with a Bayesian equilibrium σ for which μ_σ = μ.
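Theorem 20 can be illustrated numerically on Example 18. The payoff matrix and the distribution μ are not reproduced in this extraction, so the sketch below assumes the standard Aumann (1974) example consistent with the equilibria listed above — payoffs (T, L) = (6, 6), (T, R) = (2, 7), (B, L) = (7, 2), (B, R) = (0, 0), with μ placing probability 1/3 on each of (T, L), (T, R) and (B, L) — and checks that obeying the suggested action is a best reply for both players and that the expected payoffs are (5, 5):

```python
from fractions import Fraction as F

# Payoffs u[(a1, a2)] = (u1, u2) and distribution mu are ASSUMED here (the
# standard Aumann 1974 example); they are not reproduced in the text above.
u = {('T', 'L'): (6, 6), ('T', 'R'): (2, 7),
     ('B', 'L'): (7, 2), ('B', 'R'): (0, 0)}
mu = {('T', 'L'): F(1, 3), ('T', 'R'): F(1, 3),
      ('B', 'L'): F(1, 3), ('B', 'R'): F(0)}
A1, A2 = ('T', 'B'), ('L', 'R')

def obedient(player):
    """Is following the suggested action a best reply for `player`?"""
    own, opp = (A1, A2) if player == 0 else (A2, A1)
    for rec in own:            # the recommended action
        for dev in own:        # a candidate deviation
            # expected gain from playing dev whenever rec is recommended
            gain = sum(mu[(rec, b) if player == 0 else (b, rec)]
                       * (u[(dev, b) if player == 0 else (b, dev)][player]
                          - u[(rec, b) if player == 0 else (b, rec)][player])
                       for b in opp)
            if gain > 0:
                return False
    return True

assert obedient(0) and obedient(1)
payoffs = tuple(sum(mu[a] * u[a][i] for a in mu) for i in (0, 1))
assert payoffs == (F(5), F(5))   # the (5, 5) payoff reported in Example 18
```

Exact rational arithmetic (`fractions.Fraction`) avoids any floating-point tolerance in the incentive checks.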
Concluding Remarks and Future Directions

The Consistency Assumption

To the heated discussion of the merits and justification of the consistency assumption in economic and game-theoretic models, we would like to add a couple of remarks. In our opinion, the appropriate way of modeling an incomplete information situation is at the interim stage, that is, when a player knows his own beliefs (type). The Harsanyi ex ante model is just an auxiliary construction for the analysis. Actually, this was also the view of Harsanyi, who justified his model by proving that it provides the same equilibria as the interim stage situation it generates (Theorem 17). The Harsanyi doctrine says roughly that our models "should be consistent", and if we get an inconsistent model, it must be the case that it is not a "correct" model of the situation at hand. This becomes less convincing if we agree that the interim stage is what we are interested in: Not only are most mutual beliefs inconsistent, as we saw in the section entitled "Consistent Beliefs and Common Priors" above, but it is hard to argue convincingly that the model in Example 5 describes an adequate mutual belief situation while the model in Example 6 does not; the only difference between the two is that in one model a certain type's beliefs are [3/5(ω11), 2/5(ω21)], while in the other model his beliefs are [1/2(ω11), 1/2(ω21)]. Another related point is the fact that if players' beliefs are the data of the situation (in the interim stage), then these are typically imprecise and rather hard to measure. Therefore, any meaningful result of our analysis should be robust to small changes in the beliefs. This cannot be achieved within the consistent belief systems, which are a thin set of measure zero in the universal belief space.

Knowledge and Beliefs

Our interest in this article was mostly in the notion of beliefs of players and less in the notion of knowledge. These are two related but different notions.
Knowledge is defined through a knowledge operator satisfying some axioms. Beliefs are defined by means of probability distributions. Aumann's model, discussed in the section entitled "Aumann's Model" above, has both elements: The knowledge was generated by the partitions of the players, while the beliefs were generated by the probability P on the space Y (and the partitions). Being interested in the subjective beliefs of the player, we could understand "at state of the world ω ∈ Ω player i knows the event E ⊆ Ω" to mean "at state of the world ω ∈ Ω player i assigns to the event E ⊆ Ω probability 1". However, in the universal belief space, "belief with probability 1" does not satisfy a central axiom of the knowledge operator, namely: if at ω ∈ Ω player i knows the event E ⊆ Ω, then ω ∈ E. That is, if a player knows an event, then this event in fact happened. In the universal belief space, where all coherent beliefs are possible, in a state ω ∈ Ω a player may assign probability 1 to the event {ω′} where ω′ ≠ ω. In fact, if in a BL-subspace Y the condition ω ∈ Pi(ω) is satisfied for all i and all ω ∈ Y, then belief with probability 1 is a knowledge operator on Y. This was in fact the case in Aumann's and in Harsanyi's models, where, by construction, the support of the beliefs of a player in the state ω always included ω. For a detailed discussion of the relationship between knowledge and beliefs in the universal belief space, see Vassilakis and Zamir [24].

Future Directions

We have not said much about the existence of Bayesian equilibrium, mainly because it has not been studied enough and there are no general results, especially in the inconsistent case. We can readily see that a Bayesian game on a finite BL-subspace in which each state of nature s(ω) is a finite game-form has a Bayesian equilibrium in mixed strategies. This can be proved, for example, by transforming the Bayesian game into an ordinary finite game (see Remark 16) and applying the Nash theorem for finite games. For games with incomplete information with a continuum of strategies and payoff functions that are not necessarily continuous, there are no general existence results. Even in consistent auction models, existence was proved for specific models separately (see [20,15,22]). Establishing general existence results for large families of Bayesian games is clearly an important future direction of research. Since, as we argued before, most games are Bayesian games, the existence of a Bayesian equilibrium should, and could, reach at least the level of generality available for the existence of a Nash equilibrium.

Acknowledgments

I am grateful to two anonymous reviewers for their helpful comments.
Bibliography
1. Aumann R (1974) Subjectivity and correlation in randomized strategies. J Math Econ 1:67–96
2. Aumann R (1976) Agreeing to disagree. Ann Stat 4:1236–1239
3. Aumann R (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55:1–18
4. Aumann R (1998) Common priors: A reply to Gul. Econometrica 66:929–938
5. Aumann R (1999) Interactive epistemology I: Knowledge. Intern J Game Theory 28:263–300
6. Aumann R (1999) Interactive epistemology II: Probability. Intern J Game Theory 28:301–314
7. Aumann R, Heifetz A (2002) Incomplete information. In: Aumann R, Hart S (eds) Handbook of Game Theory, vol 3. Elsevier, pp 1666–1686
8. Aumann R, Maschler M (1995) Repeated Games with Incomplete Information. MIT Press, Cambridge
9. Brandenburger A, Dekel E (1993) Hierarchies of beliefs and common knowledge. J Econ Theory 59:189–198
10. Gul F (1998) A comment on Aumann's Bayesian view. Econometrica 66:923–927
11. Harsanyi J (1967–8) Games with incomplete information played by 'Bayesian' players, parts I–III. Manag Sci 14:159–182, 320–334, 486–502
12. Heifetz A (1993) The Bayesian formulation of incomplete information, the noncompact case. Intern J Game Theory 21:329–338
13. Heifetz A, Mongin P (2001) Probability logic for type spaces. Games Econ Behav 35:31–53
14. Heifetz A, Samet D (1998) Topology-free typology of beliefs. J Econ Theory 82:324–341
15. Maskin E, Riley J (2000) Asymmetric auctions. Rev Econ Stud 67:413–438
16. Meier M (2001) An infinitary probability logic for type spaces. CORE Discussion Paper 2001/61
17. Mertens JF, Sorin S, Zamir S (1994) Repeated Games, Part A: Background Material. CORE Discussion Paper No. 9420
18. Mertens JF, Zamir S (1985) Formulation of Bayesian analysis for games with incomplete information. Intern J Game Theory 14:1–29
19. Milgrom PR, Stokey N (1982) Information, trade and common knowledge. J Econ Theory 26:17–27
20. Milgrom PR, Weber RJ (1982) A theory of auctions and competitive bidding. Econometrica 50:1089–1122
21. Nyarko Y (1991) Most games violate the Harsanyi doctrine. C.V. Starr working paper #91–39, NYU
22. Reny P, Zamir S (2004) On the existence of pure strategy monotone equilibria in asymmetric first-price auctions. Econometrica 72:1105–1125
23. Sorin S, Zamir S (1985) A 2-person game with lack of information on 1 1/2 sides. Math Oper Res 10:17–23
24. Vassilakis S, Zamir S (1993) Common beliefs and common knowledge. J Math Econ 22:495–505
25. Wolfstetter E (1999) Topics in Microeconomics. Cambridge University Press, Cambridge
Bayesian Statistics
DAVID DRAPER
Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California, Santa Cruz, USA

Article Outline
Glossary
Definition of the Subject and Introduction
The Bayesian Statistical Paradigm
Three Examples
Comparison with the Frequentist Statistical Paradigm
Future Directions
Bibliography

Glossary

Bayes' theorem; prior, likelihood and posterior distributions Given (a) θ, something of interest which is unknown to the person making an uncertainty assessment, conveniently referred to as You, (b) y, an information source which is relevant to decreasing Your uncertainty about θ, (c) a desire to learn about θ from y in a way that is both internally and externally logically consistent, and (d) B, Your background assumptions and judgments about how the world works, as these assumptions and judgments relate to learning about θ from y, it can be shown that You are compelled in this situation to reason within the standard rules of probability as the basis of Your inferences about θ, predictions of future data y*, and decisions in the face of uncertainty (see below for contrasts between inference, prediction and decision-making), and to quantify Your uncertainty about any unknown quantities through conditional probability distributions. When inferences about θ are the goal, Bayes' Theorem provides a means of combining all relevant information internal and external to y:

  p(θ|y, B) = c · p(θ|B) · l(θ|y, B).    (1)
Here, for example in the case in which θ is a real-valued vector of length k, (a) p(θ|B) is Your prior distribution about θ given B (in the form of a probability density function), which quantifies all relevant information available to You about θ external to y, (b) c is a positive normalizing constant, chosen to make the density on the left side of the equation integrate to 1, (c) l(θ|y, B) is Your likelihood distribution for θ given y and B, which is defined to be a density-normalized multiple of Your sampling distribution p(·|θ, B) for future data values y* given θ and B, but reinterpreted as a function of θ for fixed y, and (d) p(θ|y, B) is Your posterior distribution about θ given y and B, which summarizes Your current total information about θ and solves the basic inference problem.

Bayesian parametric and nonparametric modeling (1) Following de Finetti [23], a Bayesian statistical model is a joint predictive distribution p(y1, ..., yn) for observable quantities yi that have not yet been observed, and about which You are therefore uncertain. When the yi are real-valued, often You will not regard them as probabilistically independent (informally, the yi are independent if information about any of them does not help You to predict the others); but it may be possible to identify a parameter vector θ = (θ1, ..., θk) such that You would judge the yi conditionally independent given θ, and would therefore be willing to model them via the relation

  p(y1, ..., yn | θ) = ∏_{i=1}^n p(yi | θ).    (2)
When combined with a prior distribution p(θ) on θ that is appropriate to the context, this is Bayesian parametric modeling, in which p(yi|θ) will often have a standard distributional form (such as binomial, Poisson or Gaussian). (2) When a (finite) parameter vector that induces conditional independence cannot be found, if You judge Your uncertainty about the real-valued yi exchangeable (see below), then a representation theorem of de Finetti [21] states informally that all internally logically consistent predictive distributions p(y1, ..., yn) can be expressed in a way that is equivalent to the hierarchical model (see below)

  (F|B) ~ p(F|B)
  (yi|F, B) ~ IID F,    (3)

where (a) F is the cumulative distribution function (CDF) of the underlying process (y1, y2, ...) from which You are willing to regard p(y1, ..., yn) as (in effect) like a random sample and (b) p(F|B) is Your prior distribution on the space F of all CDFs on the real line. This (placing probability distributions on infinite-dimensional spaces such as F) is Bayesian nonparametric modeling, in which priors involving Dirichlet processes and/or Pólya trees (see Sect. "Inference: Parametric and Non-Parametric Modeling of Count Data") are often used.
Exchangeability A sequence y = (y1, ..., yn) of random variables (for n ≥ 1) is (finitely) exchangeable if the joint probability distribution p(y1, ..., yn) of the elements of y is invariant under permutation of the indices (1, ..., n), and a countably infinite sequence (y1, y2, ...) is (infinitely) exchangeable if every finite subsequence is finitely exchangeable.

Hierarchical modeling Often Your uncertainty about something unknown to You can be seen to have a nested or hierarchical character. One class of examples arises in cluster sampling in fields such as education and medicine, in which students (level 1) are nested within classrooms (level 2) and patients (level 1) within hospitals (level 2); cluster sampling involves random samples (and therefore uncertainty) at two or more levels in such a data hierarchy (examples of this type of hierarchical modeling are given in Sect. "Strengths and Weaknesses of the Two Approaches"). Another, quite different, class of examples of Bayesian hierarchical modeling is exemplified by equation (3) above, in which it was helpful to decompose Your overall predictive uncertainty about (y1, ..., yn) into (a) uncertainty about F and then (b) uncertainty about the yi given F (examples of this type of hierarchical modeling appear in Sect. "Inference and Prediction: Binary Outcomes with No Covariates" and "Inference: Parametric and Non-Parametric Modeling of Count Data").
Inference, prediction and decision-making; samples and populations Given a data source y, inference involves drawing probabilistic conclusions about the underlying process that gave rise to y, prediction involves summarizing uncertainty about future observable data values y*, and decision-making involves looking for optimal behavioral choices in the face of uncertainty (about either the underlying process, or the future, or both). In some cases inference takes the form of reasoning backwards from a sample of data values to a population: a (larger) universe of possible data values from which You judge that the sample has been drawn in a manner that is representative (i.e., so that the sampled and unsampled values in the population are (likely to be) similar in relevant ways).

Mixture modeling Given y, unknown to You, and B, Your background assumptions and judgments relevant to y, You have a choice: You can either model (Your uncertainty about) y directly, through the probability distribution p(y|B), or (if that is not feasible) You can identify a quantity x upon which You judge y to depend and model y hierarchically, in two stages: first by modeling x, through the probability distribution p(x|B), and then by modeling y given x, through the probability distribution p(y|x, B):

  p(y|B) = ∫_X p(y|x, B) p(x|B) dx,    (4)

where X is the space of possible values of x over which Your uncertainty is expressed. This is mixture modeling, a special case of hierarchical modeling (see above). In hierarchical notation, (4) can be re-expressed as

  x ~ p(x|B)
  (y|x) ~ p(y|x, B).    (5)

Examples of mixture modeling in this article include (a) equation (3) above, with F playing the role of x; (b) the basic equation governing Bayesian prediction, discussed in Sect. "The Bayesian Statistical Paradigm"; (c) Bayesian model averaging (Sect. "The Bayesian Statistical Paradigm"); (d) de Finetti's representation theorem for binary outcomes (Sect. "Inference and Prediction: Binary Outcomes with No Covariates"); (e) random-effects parametric and nonparametric modeling of count data (Sect. "Inference: Parametric and Non-Parametric Modeling of Count Data"); and (f) integrated likelihoods in Bayes factors (Sect. "Decision-Making: Variable Selection in Generalized Linear Models; Bayesian Model Selection").

Probability – frequentist and Bayesian In the frequentist probability paradigm, attention is restricted to phenomena that are inherently repeatable under (essentially) identical conditions; then, for an event A of interest, P_f(A) is the limiting relative frequency with which A occurs in the (hypothetical) repetitions, as the number of repetitions n → ∞. By contrast, Your Bayesian probability P_B(A|B) is the numerical weight of evidence, given Your background information B relevant to A, in favor of a true-false proposition A whose truth status is uncertain to You, obeying a series of reasonable axioms to ensure that Your Bayesian probabilities are internally logically consistent.

Utility To ensure internal logical consistency, optimal decision-making proceeds by (a) specifying a utility function U(a, θ0) quantifying the numerical value associated with taking action a if the unknown θ is really θ0 and (b) maximizing expected utility, where the expectation is taken over uncertainty in θ as quantified by the posterior distribution p(θ|y, B).

Definition of the Subject and Introduction

Statistics may be defined as the study of uncertainty: how to measure it, and how to make choices in the face of it. Uncertainty is quantified via probability, of which there are two leading paradigms, frequentist (discussed in Sect. "Comparison with the Frequentist Statistical Paradigm") and Bayesian. In the Bayesian approach to probability, the primitive constructs are true-false propositions A whose truth status is uncertain, and the probability of A is the numerical weight of evidence in favor of A, constrained to obey a set of axioms to ensure that Bayesian probabilities are coherent (internally logically consistent). The discipline of statistics may be divided broadly into four activities: description (graphical and numerical summaries of a data set y, without attempting to reason outward from it; this activity is almost entirely non-probabilistic and will not be discussed further here), inference (drawing probabilistic conclusions about the underlying process that gave rise to y), prediction (summarizing uncertainty about future observable data values y*), and decision-making (looking for optimal behavioral choices in the face of uncertainty). Bayesian statistics is an approach to inference, prediction and decision-making that is based on the Bayesian probability paradigm, in which uncertainty about an unknown θ (this is the inference problem) is quantified by means of a conditional probability distribution p(θ|y, B); here y is all available relevant data and B summarizes the background assumptions and judgments of the person making the uncertainty assessment. Prediction of a future y* is similarly based on the conditional probability distribution p(y*|y, B), and optimal decision-making proceeds by (a) specifying a utility function U(a, θ0) quantifying the numerical reward associated with taking action a if the unknown θ is really θ0 and (b) maximizing expected utility, where the expectation is taken over uncertainty in θ as quantified by p(θ|y, B).
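The mixture identity (4) in the Glossary can be illustrated by simulation; the distributional choices below (x = θ ~ Beta(2, 3), y | θ ~ Bernoulli(θ)) are illustrative and not from the text. Sampling in two stages, as in the hierarchical form (5), should reproduce the marginal p(y = 1 | B) = ∫ θ p(θ|B) dθ = E[θ] = 2/5:

```python
import random

random.seed(1)
n = 200_000

draws = []
for _ in range(n):
    theta = random.betavariate(2, 3)       # stage 1: x ~ p(x|B)
    draws.append(random.random() < theta)  # stage 2: y|x ~ Bernoulli(theta)

# The relative frequency of y = 1 estimates the mixture integral (4).
mc_marginal = sum(draws) / n
assert abs(mc_marginal - 0.4) < 0.01       # E[theta] = 2/(2+3) = 2/5
```

The point of the sketch is that simulating the two-stage hierarchy and then ignoring x is operationally the same thing as sampling from the marginal p(y|B) in (4).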
The Bayesian Statistical Paradigm

Statistics is the branch of mathematical and scientific inquiry devoted to the study of uncertainty: its consequences, and how to behave sensibly in its presence. The subject draws heavily on probability, a discipline which predates it by about 100 years: basic probability theory can be traced [48] to work of Pascal, Fermat and Huygens in the 1650s, and the beginnings of statistics [34,109] are evident in work of Bayes published in the 1760s. The Bayesian statistical paradigm consists of three basic ingredients:

θ, something of interest which is unknown (or only partially known) to the person making the uncertainty assessment, conveniently referred to, in a convention proposed by Good (1950), as You. Often θ is a parameter vector of real numbers (of finite length k, say) or a matrix, but it can literally be almost anything: for example, a function (three leading examples are a cumulative distribution function (CDF), a density, or a regression surface), a phylogenetic tree, an image of a region on the surface of Mars at a particular moment in time, ...

y, an information source which is relevant to decreasing Your uncertainty about θ. Often y is a vector of real numbers (of finite length n, say), but it can also literally be almost anything: for instance, a time series, a movie, the text in a book, ...

A desire to learn about θ from y in a way that is both coherent (internally consistent: in other words, free of internal logical contradictions; Bernardo and Smith [11] give a precise definition of coherence) and well-calibrated (externally consistent: for example, capable of making accurate predictions of future data y*).

It turns out [23,53] that You are compelled in this situation to reason within the standard rules of probability (see below) as the basis of Your inferences about θ, predictions of future data y*, and decisions in the face of uncertainty, and to quantify Your uncertainty about any unknown quantities through conditional probability distributions, as in the following three basic equations of Bayesian statistics:

  p(θ|y, B) = c · p(θ|B) · l(θ|y, B)
  p(y*|y, B) = ∫_Θ p(y*|θ, B) p(θ|y, B) dθ    (6)
  a* = argmax_{a∈A} E_(θ|y,B) U(a, θ).
(The basic rules of probability [71] are: for any true-false propositions A and B and any background assumptions and judgments B, (convexity) 0 ≤ P(A|B) ≤ 1, with equality at 1 iff A is known to be true under B; (multiplication) P(A and B|B) = P(A|B) P(B|A, B) = P(B|B) P(A|B, B); and (addition) P(A or B|B) = P(A|B) + P(B|B) − P(A and B|B).) The meaning of the equations in (6) is as follows. B stands for Your background (often not fully stated) assumptions and judgments about how the world works, as these assumptions and judgments relate to learning about θ from y. B is often omitted from the basic equations (sometimes with unfortunate consequences), yielding the simpler-looking forms

  p(θ|y) = c · p(θ) · l(θ|y)
  p(y*|y) = ∫_Θ p(y*|θ) p(θ|y) dθ    (7)
  a* = argmax_{a∈A} E_(θ|y) U(a, θ).
p(θ|B) is Your prior information about θ given B, in the form of a probability density function (PDF) or probability mass function (PMF) if θ lives continuously or discretely on R^k (this is generically referred to as Your prior distribution), and p(θ|y, B) is Your posterior distribution about θ given y and B, which summarizes Your current total information about θ and solves the basic inference problem. These are actually not very good names for p(θ|B) and p(θ|y, B), because (for example) p(θ|B) really stands for all (relevant) information about θ (given B) external to y, whether that information was obtained before (or after) y arrives, but (a) they do emphasize the sequential nature of learning and (b) through long usage it would be difficult for more accurate names to be adopted.

c (here and throughout) is a generic positive normalizing constant, inserted into the first equation in (6) to make the left-hand side integrate (or sum) to 1 (as any coherent distribution must).

p(y*|θ, B) is Your sampling distribution for future data values y* given θ and B (and presumably You would use the same sampling distribution p(y|θ, B) for (past) data values y, mentally turning the clock back to a point before the data arrives and thinking about what values of y You might see). This assumes that You are willing to regard Your data as like random draws from a population of possible data values (an heroic assumption in some cases, for instance with observational rather than randomized data; this same assumption arises in the frequentist statistical paradigm, discussed below in Sect. "Comparison with the Frequentist Statistical Paradigm").

l(θ|y, B) is Your likelihood function for θ given y and B, which is defined to be any positive constant multiple of the sampling distribution p(y|θ, B) but reinterpreted as a function of θ for fixed y:

  l(θ|y, B) = c · p(y|θ, B).    (8)
The likelihood function is also central to one of the main approaches to frequentist statistical inference, developed by Fisher [37]; the two approaches are contrasted in Sect. "Comparison with the Frequentist Statistical Paradigm". All of the symbols in the first equation in (6) have now been defined, and this equation can be recognized as Bayes' Theorem, named after Bayes [5] because a special case of it appears prominently in work of his that was published posthumously. It describes how to pass coherently from information about θ external to y (quantified in the prior distribution p(θ|B)) to information both internal and external to y (quantified in the posterior distribution p(θ|y, B)), via the likelihood function l(θ|y, B): You multiply the prior and likelihood pointwise in θ and normalize so that the posterior distribution p(θ|y, B) integrates (or sums) to 1.

According to the second equation in (6), p(y*|y, B), Your (posterior) predictive distribution for future data y* given (past) data y and B, which solves the basic prediction problem, must be a weighted average of Your sampling distribution p(y*|θ, B), weighted by Your current best information p(θ|y, B) about θ given y and B; in this integral Θ is the space of possible values of θ over which Your uncertainty is expressed. (The second equation in (6) contains a simplifying assumption that should be mentioned: in full generality, the first term p(y*|θ, B) inside the integral would be p(y*|y, θ, B), but it is almost always the case that the information in y is redundant in the presence of complete knowledge of θ, in which case p(y*|y, θ, B) = p(y*|θ, B); this state of affairs could be described by saying that the past and future are conditionally independent given the truth. A simple example of this phenomenon is provided by coin-tossing: if You are watching a Bernoulli(θ) process unfold (see Sect. "Inference and Prediction: Binary Outcomes with No Covariates") whose success probability θ is unknown to You, the information that 8 of the first 10 tosses have been heads is definitely useful to You in predicting the 11th toss, but if instead You somehow knew that θ was 0.7, the outcome of the first 10 tosses would be irrelevant to You in predicting any future tosses.)
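The coin-tossing aside can be made concrete with the first two equations in (6) on a grid; the uniform prior below is an assumption for illustration (the text specifies no prior). With 8 heads in 10 tosses the posterior is proportional to θ^8 (1 − θ)^2, and the predictive probability of heads on the 11th toss is the posterior mean of θ:

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 100_001)   # grid over the parameter space
prior = np.ones_like(theta)              # ASSUMED uniform prior p(theta|B)
likelihood = theta**8 * (1 - theta)**2   # 8 heads, 2 tails
posterior = prior * likelihood
posterior /= posterior.sum()             # normalization (the constant c)

# Second equation in (6) on the grid:
# p(heads | y, B) ≈ sum over theta of theta * p(theta | y, B)
p_heads_next = float((theta * posterior).sum())
assert abs(p_heads_next - 9 / 12) < 1e-3
```

With the uniform prior the exact posterior is Beta(9, 3), whose mean 9/12 = 0.75 is Laplace's rule of succession; the grid approximation recovers it.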
Finally, in the context of making a choice in the face of uncertainty, A is Your set of possible actions, U(a, θ₀) is the numerical value (utility) You attach to taking action a if the unknown θ is really θ₀ (specified, without loss of generality, so that large utility values are preferred by You), and the third equation in (6) says that to make the choice coherently You should find the action a* that maximizes expected utility (MEU); here the expectation

E_(θ|y,B) [U(a, θ)] = ∫_Θ U(a, θ) p(θ | y, B) dθ   (9)
is taken over uncertainty in θ as quantified by the posterior distribution p(θ | y, B). This summarizes the entire Bayesian statistical paradigm, which is driven by the three equations in (6). Examples of its use include clinical trial design [56] and analysis [105]; spatio-temporal modeling, with environmental applications [101]; forecasting and dynamic linear models [115]; non-parametric estimation of receiver operating characteristic curves, with applications in medicine and agriculture [49]; finite selection models, with health policy applications [79]; Bayesian CART model search, with applications in breast cancer research [16]; construction of radiocarbon calibration curves, with archaeological applications [15]; factor regression models, with applications to gene expression data [114]; mixture modeling for high-density genotyping arrays, with bioinformatic applications [100]; the EM algorithm for Bayesian fitting of latent process models [76]; state-space modeling, with applications in particle filtering [92]; causal inference [42,99]; hierarchical modeling of DNA sequences, with genetic and medical applications [77]; hierarchical Poisson regression modeling, with applications in health care evaluation [17]; multiscale modeling, with engineering and financial applications [33]; expected posterior prior distributions for model selection [91]; nested Dirichlet processes, with applications in the health sciences [96]; Bayesian methods in the study of sustainable fisheries [74,82]; hierarchical non-parametric meta-analysis, with medical and educational applications [81]; and structural equation modeling of multilevel data, with applications to health policy [19].

Challenges to the paradigm include the following:

Q: How do You specify the sampling distribution/likelihood function that quantifies the information about the unknown θ internal to Your data set y?
A: (1) The solution to this problem, which is common to all approaches to statistical inference, involves imagining future data y* from the same process that has yielded or will yield Your data set y; often the variability You expect in future data values can be quantified (at least approximately) through a standard parametric family of distributions (such as the Bernoulli/binomial for binary data, the Poisson for count data, and the Gaussian for real-valued outcomes), and the parameter vector θ of this family becomes the unknown of interest. (2) Uncertainty in the likelihood function is referred to as model uncertainty [67]; a leading approach to quantifying this source of uncertainty is Bayesian model averaging [18,25,52], in which uncertainty about the models M in an ensemble ℳ of models (specifying ℳ is part of B) is assessed and propagated for a quantity, such as a future data value y*, that is common to all models via the expression

p(y* | y, B) = ∫_ℳ p(y* | y, M, B) p(M | y, B) dM .   (10)

In other words, to make coherent predictions in the presence of model uncertainty You should form a weighted average of the conditional predictive distributions p(y* | y, M, B), weighted by the posterior model probabilities p(M | y, B). Other potentially useful approaches to model uncertainty include Bayesian non-parametric modeling, which is examined in Sect. "Inference: Parametric and Non-Parametric Modeling of Count Data", and methods based on cross-validation [110], in which (in Bayesian language) part of the data is used to specify the prior distribution on ℳ (which is an input to calculating the posterior model probabilities) and the rest of the data is employed to update that prior.

Q: How do You quantify information about the unknown θ external to Your data set y in the prior probability distribution p(θ | B)?

A: (1) There is an extensive literature on elicitation of prior (and other) probabilities; notable references include O'Hagan et al. [85] and the citations given there. (2) If θ is a parameter vector and the likelihood function is a member of the exponential family [11], the prior distribution can be chosen in such a way that the prior and posterior distributions for θ have the same mathematical form (such a prior is said to be conjugate to the given likelihood); this may greatly simplify the computations, and often prior information can (at least approximately) be quantified by choosing a member of the conjugate family (see Sect. "Inference and Prediction: Binary Outcomes with No Covariates" for an example of both of these phenomena).
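When the ensemble of models is discrete, the integral in the model-averaging expression (10) reduces to a finite weighted sum. A toy sketch (all numbers invented for illustration):

```python
# Discrete Bayesian model averaging: the overall predictive probability of a
# future event is the per-model prediction averaged over posterior model
# probabilities. The two models and all numbers below are hypothetical.

post_model_prob = {"M1": 0.7, "M2": 0.3}     # p(M | y, B), summing to 1
pred_given_model = {"M1": 0.10, "M2": 0.20}  # p(y* in event | y, M, B)

p_event = sum(post_model_prob[m] * pred_given_model[m] for m in post_model_prob)
print(round(p_event, 2))  # 0.7 * 0.10 + 0.3 * 0.20 = 0.13
```

The averaged prediction lies between the two per-model predictions, pulled toward the model with higher posterior probability; conditioning on a single "best" model instead would discard that model uncertainty.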
In situations where it is not precisely clear how to quantify the available information external to y, two sets of tools are available:

Sensitivity analysis [30], also known as pre-posterior analysis [4]: Before the data have begun to arrive, You can (a) generate data similar to what You expect You will see, (b) choose a plausible prior specification and update it to the posterior on the quantities of greatest interest, (c) repeat (b) across a variety of plausible alternatives, and (d) see if there is substantial stability in conclusions across the variations in prior specification. If so, fine; if not, this approach can be combined with hierarchical modeling [68]: You can collect all of the plausible priors and add a layer hierarchically to the prior specification, with the new layer indexing variation across the prior alternatives.

Bayesian robustness [8,95]: If, for example, the context of the problem implies that You only wish to specify that the prior distribution belongs to an infinite-dimensional class (such as, for priors on (0, 1), the class of monotone non-increasing functions) with (for instance) bounds on the first two moments, You can in turn quantify bounds on summaries of the resulting posterior distribution, which may be narrow enough to demonstrate that Your uncertainty in specifying the prior does not lead to differences that are large in practical terms.

Often context suggests specification of a prior that has relatively little information content in relation to the likelihood information; for reasons that are made clear in Sect. "Inference and Prediction: Binary Outcomes with No Covariates", such priors are referred to as relatively diffuse or flat (the term noninformative is sometimes also used, but this seems worth avoiding, because any prior specification takes a particular position regarding the amount of relevant information external to the data). See Bernardo [10] and Kass and Wasserman [60] for a variety of formal methods for generating diffuse prior distributions.

Q: How do You quantify Your utility function U(a, θ) for optimal decision-making?

A: There is a rather less extensive statistical literature on elicitation of utility than of probability; notable references include Fishburn [35,36], Schervish et al. [103], and the citations in Bernardo and Smith [11]. There is a parallel (and somewhat richer) economics literature on utility elicitation; see, for instance, Abdellaoui [1] and Blavatskyy [12]. Sect. "Decision-Making: Variable Selection in Generalized Linear Models; Bayesian Model Selection" provides a decision-theoretic example.

Suppose that θ = (θ₁, ..., θ_k) is a parameter vector of length k. Then (a) computing the normalizing constant in Bayes' Theorem

c = [ ∫ ⋯ ∫ p(y | θ₁, ..., θ_k, B) p(θ₁, ..., θ_k | B) dθ₁ ⋯ dθ_k ]⁻¹   (11)
involves evaluating a k-dimensional integral; (b) the predictive distribution in the second equation in (6) involves another k-dimensional integral; and (c) the posterior p(θ₁, ..., θ_k | y, B) is a k-dimensional probability distribution, which for k > 2 can be difficult to visualize, so that attention often focuses on the marginal posterior distributions

p(θ_j | y, B) = ∫ ⋯ ∫ p(θ₁, ..., θ_k | y, B) dθ_(−j)   (12)

for j = 1, ..., k, where θ_(−j) is the vector θ with component j omitted; each of these marginal distributions involves a (k − 1)-dimensional integral. If k is large these
integrals can be difficult or impossible to evaluate exactly, and a general method for computing accurate approximations to them proved elusive from the time of Bayes in the eighteenth century until recently (in the late eighteenth century Laplace [63] developed an analytical method, which today bears his name, for approximating integrals that arise in Bayesian work [11], but his method is not as general as the computationally-intensive techniques in widespread current use). Around 1990 there was a fundamental shift in Bayesian computation, with the belated discovery by the statistics profession of a class of techniques – Markov chain Monte Carlo (MCMC) methods [41,44] – for approximating high-dimensional Bayesian integrals in a computationally-intensive manner, which had been published in the chemical physics literature in the 1950s [78]; these methods came into focus for the Bayesian community at a moment when desktop computers had finally become fast enough to make use of such techniques. MCMC methods approximate integrals associated with the posterior distribution p(θ | y, B) by (a) creating a Markov chain whose equilibrium distribution is the desired posterior and (b) sampling from this chain from an initial value θ⁽⁰⁾ (i) until equilibrium has been reached (all draws up to this point are typically discarded) and (ii) for a sufficiently long period thereafter to achieve the desired approximation accuracy.
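A minimal sketch of steps (a) and (b), using a random-walk Metropolis sampler (one of the simplest MCMC algorithms); the one-dimensional Bernoulli target, the Uniform prior, and all tuning settings below are our illustrative choices, chosen so the answer can be checked against the exact conjugate result:

```python
# Random-walk Metropolis targeting the posterior for a Bernoulli success
# probability theta under a Beta(1,1) prior, so the exact posterior mean
# (alpha + s_n) / (alpha + beta + n) is available for comparison.
import math
import random

def log_post(theta, s_n, n, alpha=1.0, beta=1.0):
    """Unnormalized log posterior: Beta(alpha, beta) prior x Bernoulli likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (alpha + s_n - 1) * math.log(theta) + (beta + n - s_n - 1) * math.log(1 - theta)

def metropolis(s_n, n, n_iter=20000, burn_in=2000, step=0.05, seed=42):
    random.seed(seed)
    theta, draws = 0.5, []
    for t in range(n_iter):
        prop = theta + random.gauss(0.0, step)  # symmetric random-walk proposal
        if math.log(random.random()) < log_post(prop, s_n, n) - log_post(theta, s_n, n):
            theta = prop                        # accept the move
        if t >= burn_in:                        # discard pre-equilibrium draws
            draws.append(theta)
    return draws

draws = metropolis(s_n=69, n=385)               # 69 successes in 385 trials
approx = sum(draws) / len(draws)
exact = (1 + 69) / (2 + 385)                    # conjugate posterior mean
print(round(approx, 2), round(exact, 2))        # both near 0.18
```

In real problems the posterior is high-dimensional and no closed form exists; the point of the comparison here is only that the chain's long-run average reproduces the known posterior mean.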
With the advent and refinement of MCMC methods since 1990, the Bayesian integration problem has been solved for a wide variety of models, with more ambitious sampling schemes made possible year after year by increased computing speeds: for instance, in problems in which the dimension of the parameter space is not fixed in advance (an example is regression change-point problems [104], where the outcome y is assumed to depend linearly (apart from stochastic noise) on the predictor(s) x but with an unknown number of changes of slope and intercept and unknown locations for those changes), ordinary MCMC techniques will not work; in such problems methods such as reversible-jump MCMC [47,94] and Markov birth-death processes [108], which create Markov chains that permit trans-dimensional jumps, are required. The main drawback of MCMC methods is that they do not necessarily scale well as n (the number of data observations) increases; one alternative, popular in the machine-learning community, is variational methods [55], which convert the integration problem into an optimization problem by (a) approximating the posterior distribution of interest by a family of distributions yielding a closed-form approximation to the integral
and (b) finding the member of the family that maximizes the accuracy of the approximation.

Bayesian decision theory [6], based on maximizing expected utility, is unambiguous in its normative recommendation for how a single agent (You) should make a choice in the face of uncertainty, and it has had widespread success in fields such as economics (e.g., [2,50]) and medicine (e.g., [87,116]). It is well known, however [3,112], that Bayesian decision theory (or indeed any other formal approach that seeks an optimal behavioral choice) can be problematic when used normatively for group decision-making, because of conflicts in preferences among members of the group. This is an important unsolved problem.
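For a single agent with a discrete unknown and a finite action set, the MEU principle in the third equation in (6) reduces to a finite weighted sum. A toy sketch (the two-state posterior and all utility values are invented for illustration):

```python
# Discrete maximization of expected utility: for each action, average
# U(a, theta) over the posterior p(theta | y, B) and pick the maximizer.
# States, actions, probabilities, and utilities below are all hypothetical.

posterior = {"theta_low": 0.6, "theta_high": 0.4}   # p(theta | y, B)
utility = {                                          # U(a, theta)
    "treat":    {"theta_low": -10, "theta_high": 100},
    "no_treat": {"theta_low":   0, "theta_high": -50},
}

def expected_utility(action):
    return sum(utility[action][t] * p for t, p in posterior.items())

best = max(utility, key=expected_utility)
print(best, expected_utility(best))  # "treat": 0.6 * (-10) + 0.4 * 100 = 34
```

Note that "treat" wins even though it is worse in the more probable state; expected utility trades off probabilities against consequences, which is exactly what a most-probable-state rule would miss.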
Three Examples

Inference and Prediction: Binary Outcomes with No Covariates

Consider the problem of measuring the quality of care at a particular hospital H. One way to do this is to examine the outcomes of that care, such as mortality, after adjusting for the burden of illness brought by the patients to H on admission. As an even simpler version of this problem, consider just the n binary mortality observables y = (y₁, ..., y_n) (with mortality measured within 30 days of admission, say; 1 = died, 0 = lived) that You will see from all of the patients at H with a particular admission diagnosis (heart attack, say) during some prespecified future time window. You acknowledge Your uncertainty about which elements in the sequence will be 0s and which 1s, and You wish to quantify this uncertainty using the Bayesian paradigm.

As de Finetti [20] noted, in this situation Your fundamental imperative is to construct a predictive distribution p(y₁, ..., y_n | B) that expresses Your uncertainty about the future observables, rather than – as is perhaps more common – to reach immediately for a standard family of parametric models for the y_i (in other words, to posit the existence of a vector θ = (θ₁, ..., θ_k) of parameters and to model the observables by appeal to a family p(y_i | θ, B) of probability distributions indexed by θ). Even though the y_i are binary, with all but the smallest values of n it still seems a formidable task to elicit from Yourself an n-dimensional predictive distribution p(y₁, ..., y_n | B). De Finetti [20] showed, however, that the task is easier than it seems. In the absence of any further information about the patients, You notice that Your uncertainty about them is exchangeable: if someone (without telling You) were to rearrange the order in which their mortality outcomes become known to You, Your predictive distribution would not change.

This still seems to leave p(y₁, ..., y_n | B) substantially unspecified (where B now includes the judgment of exchangeability of the y_i), but de Finetti [20] proved a remarkable theorem which shows (in effect) that all exchangeable predictive distributions for a vector of binary observables are representable as mixtures of Bernoulli sampling distributions: if You're willing to regard (y₁, ..., y_n) as the first n terms in an infinitely exchangeable binary sequence (y₁, y₂, ...) (which just means that every finite subsequence is exchangeable), then to achieve coherence Your predictive distribution must be expressible as

p(y₁, ..., y_n | B) = ∫₀¹ θ^(s_n) (1 − θ)^(n − s_n) p(θ | B) dθ ,   (13)

where s_n = Σᵢ₌₁ⁿ y_i. Here the quantity θ on the right side of (13) is more than just an integration variable: the equation says that in Your predictive modeling of the binary y_i You may as well proceed as if

– There is a quantity called θ, interpretable both as the marginal death probability p(y_i = 1 | θ, B) for each patient and as the long-run mortality rate in the infinite sequence (y₁, y₂, ...) (which serves, in effect, as a population of values to which conclusions from the data can be generalized);
– Conditional on θ and B, the y_i are independent identically distributed (IID) Bernoulli(θ); and
– θ can be viewed as a realization of a random variable with density p(θ | B).

In other words, exchangeability of Your uncertainty about a binary process is functionally equivalent to assuming the simple Bayesian hierarchical model [27]

(θ | B) ∼ p(θ | B)
(y_i | θ, B) ∼ IID Bernoulli(θ) ,   (14)
and p(θ | B) is recognizable as Your prior distribution for θ, the underlying death rate for heart attack patients similar to those You expect will arrive at hospital H during the relevant time window.

Consider now the problem of quantitatively specifying prior information about θ. From (13) and (14) the likelihood function is

l(θ | y, B) = c θ^(s_n) (1 − θ)^(n − s_n) ,   (15)

which (when interpreted in the Bayesian manner as a density in θ) is recognizable as a member of the Beta family of probability distributions: for α, β > 0 and 0 < θ < 1,

θ ∼ Beta(α, β)  iff  p(θ) = c θ^(α − 1) (1 − θ)^(β − 1) .   (16)
Moreover, this family has the property that the product of two Beta densities is another Beta density, so by Bayes' Theorem if the prior p(θ | B) is chosen to be Beta(α, β) for some (as-yet unspecified) α > 0 and β > 0, then the posterior will be Beta(α + s_n, β + n − s_n): this is conjugacy (Sect. "The Bayesian Statistical Paradigm") of the Beta family for the Bernoulli/binomial likelihood. In this case the conjugacy leads to a simple interpretation of α and β: the prior acts like a data set with α 1s and β 0s, in the sense that if person 1 does a Bayesian analysis with a Beta(α, β) prior and sample data y = (y₁, ..., y_n) and person 2 instead merges the corresponding "prior data set" with y and does a maximum-likelihood analysis (Sect. "Comparison with the Frequentist Statistical Paradigm") on the resulting merged data, the two people will get the same answers. This also shows that the prior sample size n₀ in the Beta–Bernoulli/binomial model is (α + β). Given that the mean of a Beta(α, β) distribution is α / (α + β), calculation reveals that the posterior mean (α + s_n) / (α + β + n) of θ is a weighted average of the prior mean and the data mean ȳ = (1/n) Σᵢ₌₁ⁿ y_i, with prior and data weights n₀ and n, respectively:

(α + s_n) / (α + β + n) = [n₀ · α/(α + β) + n ȳ] / (n₀ + n) .   (17)
These facts shed intuitive light on how Bayes' Theorem combines information internal and external to a given data source: thinking of prior information as equivalent to a data set is a valuable intuition, even in non-conjugate settings.

The choice of α and β naturally depends on the available information external to y. Consider for illustration two such specifications:

Analyst 1 does a web search and finds that the 30-day mortality rate for heart attack (given average quality of care and average patient sickness at admission) in her country is 15%. The information she has about hospital H is that its care and patient sickness are not likely to be wildly different from the country averages, but that a mortality deviation from the mean, if present, would be more likely to occur on the high side than the low. Having lived in the community served by H for some time and having not heard anything either outstanding or deplorable about the hospital, she would be surprised to find that the underlying heart attack death rate at H was less than (say) 5% or greater than (say) 30%. One way to quantify this information is to set the prior mean to 15% and to place (say) 95% of the prior mass between 5% and 30%. Numerical integration reveals that (α, β) = (4.5, 25.5), with a prior sample size of 30.0, satisfies Analyst 1's constraints.

Analyst 2 has little information external to y and thus wishes to specify a relatively diffuse prior distribution that does not dramatically favor any part of the unit interval. Such a diffuse prior evidently corresponds to a rather small prior sample size; a variety of positive values of α and β near 0 are possible, all of which will lead to a relatively flat prior.

Suppose for illustration that the time period in question is about four years in length and H is a medium-size US hospital; then there will be about n = 385 heart attack patients in the data set y. Suppose further that the observed mortality rate at H comes out ȳ = s_n / n = 69 / 385 ≈ 18%. Figure 1 summarizes the prior-to-posterior updating with this data set and the two priors for Analysts 1 (left panel) and 2 (right panel), with α = β = 1 (the Uniform distribution) for Analyst 2. Even though the two priors are rather different – Analyst 1's prior is skewed, with a prior mean of 0.15 and n₀ = 30; Analyst 2's prior is flat, with a prior mean of 0.5 and n₀ = 2 – it is evident that the posterior distributions are nearly the same in both cases; this is because the data sample size n = 385 is so much larger than either of the prior sample sizes, so that the likelihood information dominates. With both priors the likelihood and posterior distributions are nearly the same, another consequence of n₀ ≪ n.

Bayesian Statistics, Figure 1  Prior-to-posterior updating with two prior specifications in the mortality data set (in both panels, prior: long dotted lines; likelihood: short dotted lines; posterior: solid lines). The left and right panels give the updating with the priors for Analysts 1 and 2, respectively
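The conjugate updating for the two Analysts can be verified in a few lines of code (a sketch; the prior settings and data are those given in the text, the helper names are ours):

```python
# Conjugate Beta-Bernoulli updating for the hospital mortality example:
# Beta(alpha, beta) prior + s_n deaths in n patients -> Beta posterior.

def beta_posterior(alpha, beta, s_n, n):
    """Conjugate update: returns the Beta posterior parameters."""
    return alpha + s_n, beta + (n - s_n)

def beta_mean(a, b):
    return a / (a + b)

n, s_n = 385, 69                        # observed: 69 deaths in 385 patients

# Analyst 1: Beta(4.5, 25.5), prior sample size n0 = 30
a1, b1 = beta_posterior(4.5, 25.5, s_n, n)
# Analyst 2: Uniform = Beta(1, 1), prior sample size n0 = 2
a2, b2 = beta_posterior(1.0, 1.0, s_n, n)

print(round(beta_mean(a1, b1), 3))      # 0.177 (Analyst 1's posterior mean)
print(round(beta_mean(a2, b2), 3))      # 0.181 (Analyst 2's posterior mean)
```

Both posterior means sit close to the data mean 69/385 ≈ 0.179, reproducing the likelihood-dominance point made above: with n = 385 far larger than either prior sample size, the two rather different priors lead to nearly identical posteriors.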
For Analyst 1 the posterior mean, standard deviation, and 95% central posterior interval for θ are (0.177, 0.00241, (0.142, 0.215)), and the corresponding numerical results for Analyst 2 are (0.181, 0.00258, (0.144, 0.221)); again it is clear that the two sets of results are almost identical. With a large sample size, careful elicitation – like that undertaken by Analyst 1 – will often yield results similar to those with a diffuse prior.

The posterior predictive distribution p(y_(n+1) | y₁, ..., y_n, B) for the next observation, having observed the first n, is also straightforward to calculate in closed form with the conjugate prior in this model. It is clear that p(y_(n+1) | y, B) has to be a Bernoulli(θ*) distribution for some θ*, and intuition says that θ* should just be the mean α* / (α* + β*) of the posterior distribution for θ given y, in which α* = α + s_n and β* = β + n − s_n are the parameters of the Beta posterior. To check this, making use of the fact that the normalizing constant in the Beta(α, β) family is Γ(α + β) / [Γ(α) Γ(β)], the second equation in (6) gives

p(y_(n+1) | y₁, ..., y_n, B)
  = ∫₀¹ θ^(y_(n+1)) (1 − θ)^(1 − y_(n+1)) · [Γ(α* + β*) / (Γ(α*) Γ(β*))] θ^(α* − 1) (1 − θ)^(β* − 1) dθ
  = [Γ(α* + β*) / (Γ(α*) Γ(β*))] ∫₀¹ θ^((α* + y_(n+1)) − 1) (1 − θ)^((β* − y_(n+1) + 1) − 1) dθ
  = [Γ(α* + y_(n+1)) Γ(β* − y_(n+1) + 1) / (Γ(α*) Γ(β*))] · [Γ(α* + β*) / Γ(α* + β* + 1)] ;   (18)

this, combined with the fact that Γ(x + 1) / Γ(x) = x for any real x, yields, for example in the case y_(n+1) = 1,

p(y_(n+1) = 1 | y, B) = [Γ(α* + 1) / Γ(α*)] · [Γ(α* + β*) / Γ(α* + β* + 1)] = α* / (α* + β*) ,   (19)
confirming intuition.

Inference: Parametric and Non-Parametric Modeling of Count Data

Most elderly people in the Western world say they would prefer to spend the end of their lives at home, but many instead finish their lives in an institution (a nursing home or hospital). How can elderly people living in their communities be offered health and social services that would help to prevent institutionalization? Hendriksen et al. [51] conducted an experiment in the 1980s in Denmark to test the effectiveness of in-home geriatric assessment (IHGA), a form of preventive medicine in which each person's medical and social needs are assessed and acted upon individually. A total of n = 572 elderly people living in non-institutional settings in a number of villages were randomized, n_C = 287 to a control group, who received standard health care, and n_T = 285 to a treatment group, who received standard care plus IHGA. The number of hospitalizations during the two-year life of the study was an outcome of particular interest. The data are presented and summarized in Table 1.

Bayesian Statistics, Table 1  Distribution of number of hospitalizations in the IHGA study

            Number of Hospitalizations              Sample
Group       0    1   2   3   4  5  6  7    n    Mean   Variance
Control     138  77  46  12  8  4  0  2    287  0.944  1.54
Treatment   147  83  37  13  3  1  1  0    285  0.768  1.02

Evidently IHGA lowered the mean hospitalization rate per two years (for the elderly Danish people in the study, at least) by (0.944 − 0.768) = 0.176, which is about an 18% reduction from the control level, a clinically large difference. The question then becomes, in Bayesian inferential language: what is the posterior distribution for the treatment effect in the entire population P of patients judged exchangeable with those in the study?

Continuing to refer to the relevant analyst as You: with a binary outcome variable and no covariates, in Sect. "Inference and Prediction: Binary Outcomes with No Covariates" the model arose naturally from a judgment of exchangeability of Your uncertainty about all n outcomes, but such a judgment of unconditional exchangeability would not be appropriate initially here; to make such a judgment would be to assert that the treatment and control interventions have the same effect on hospitalization, and it was the point of the study to see if this is true. Here, at least initially, it would be more scientifically appropriate to assert exchangeability separately and in parallel within the two experimental groups, a judgment de Finetti [22] called partial exchangeability and which has more recently been referred to as conditional exchangeability [28,72] given the treatment/control status covariate.

Considering for the moment just the control group outcome values C_i, i = 1, ..., n_C, and seeking as in Sect. "Inference and Prediction: Binary Outcomes with No Covariates" to model them via a predictive distribution p(C₁, ..., C_(n_C) | B), de Finetti's previous representation theorem is not available because the outcomes are real-valued rather than binary, but he proved [21] another theorem for this situation as well: if You're willing to regard (C₁, ..., C_(n_C)) as the first n_C terms in an infinitely exchangeable sequence (C₁, C₂, ...) of values on ℝ (which plays the role of the population P, under the control condition, in this problem), then to achieve coherence Your predictive distribution must be expressible as

p(C₁, ..., C_(n_C) | B) = ∫_F ∏ᵢ₌₁^(n_C) F(C_i) dG(F | B) ;   (20)

here (a) F has an interpretation as F(t) = lim_(n_C→∞) F_(n_C)(t), where F_(n_C) is the empirical CDF based on (C₁, ..., C_(n_C)); (b) G(F | B) = lim_(n_C→∞) p(F_(n_C) | B), where p(· | B) is Your joint probability distribution on (C₁, C₂, ...); and (c) F is the space of all possible CDFs on ℝ. Equation (20) says informally that exchangeability of Your uncertainty about an observable process unfolding on the real line is functionally equivalent to assuming the Bayesian hierarchical model

(F | B) ∼ p(F | B)
(y_i | F, B) ∼ IID F ,   (21)

where p(F | B) is a prior distribution on F. Placing distributions on functions, such as CDFs and regression surfaces, is the topic addressed by the field of Bayesian non-parametric (BNP) modeling [24,80], an area of statistics that has recently moved completely into the realm of day-to-day implementation and relevance through advances in MCMC computational methods. Two rich families of prior distributions on CDFs about which a wealth of practical experience has recently accumulated include (mixtures of) Dirichlet processes [32] and Pólya trees [66].

Parametric modeling is of course also possible with the IHGA data: as noted by Krnjajić et al. [62], who explore both parametric and BNP models for data of this kind, Poisson modeling is a natural choice, since the outcome consists of counts of relatively rare events. The first Poisson model to which one would generally turn is a fixed-effects model, in which (C_i | λ_C) are IID Poisson(λ_C) (i = 1, ..., n_C = 287) and (T_j | λ_T) are IID Poisson(λ_T) (j = 1, ..., n_T = 285), with a diffuse prior on (λ_C, λ_T) if little is known, external to the data set, about the underlying hospitalization rates in the control and treatment groups. However, the last two columns of Table 1 reveal that the sample variance is noticeably larger than the sample mean in both groups, indicating substantial Poisson over-dispersion. For a second, improved, parametric model this suggests a random-effects Poisson model of the form

(C_i | λ_(iC)) ∼ indep Poisson(λ_(iC))
log(λ_(iC)) | β_(0C), σ_C² ∼ IID N(β_(0C), σ_C²) ,   (22)

and similarly for the treatment group, with diffuse priors for (β_(0C), σ_C², β_(0T), σ_T²). As Krnjajić et al. [62] note, from a medical point of view this model is more plausible than the fixed-effects formulation: each patient in the control group has his/her own latent (unobserved) underlying rate of hospitalization λ_(iC), which may well differ from the underlying rates of the other control patients because of unmeasured differences in factors such as health status at the beginning of the experiment (and similarly for the treatment group).
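The over-dispersion that motivates model (22) can be verified directly from the tabulated counts in Table 1 (a sketch; the function name is ours):

```python
# Reproducing the Table 1 summaries: per-group sample mean and variance,
# showing variance > mean (Poisson over-dispersion) in both groups.

def summarize(counts):
    """counts[k] = number of patients with k hospitalizations."""
    n = sum(counts)
    mean = sum(k * c for k, c in enumerate(counts)) / n
    ss = sum(c * (k - mean) ** 2 for k, c in enumerate(counts))
    var = ss / (n - 1)                          # sample variance
    return n, mean, var

control   = [138, 77, 46, 12, 8, 4, 0, 2]       # 0..7 hospitalizations
treatment = [147, 83, 37, 13, 3, 1, 1, 0]

for name, cts in [("control", control), ("treatment", treatment)]:
    n, m, v = summarize(cts)
    print(f"{name}: n={n}, mean={m:.3f}, variance={v:.2f}")
# control: n=287, mean=0.944, variance=1.54
# treatment: n=285, mean=0.768, variance=1.02
```

Since a Poisson distribution has variance equal to its mean, sample variances of roughly 1.5 and 1.3 times the sample means are evidence against the fixed-effects Poisson model and in favor of a mixture formulation such as (22).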
Model (22), when complemented by its analogue in the treatment group, specifies a Lognormal mixture of Poisson distributions for each group and is straightforward to fit by MCMC, but the Gaussian assumption for the mixing distribution is conventional, not motivated by the underlying science of the problem, and if the distribution of the latent variables is not Gaussian – for example, if it is multimodal or skewed – model (22) may well lead to incorrect inferences. Krnjajić et al. [62] therefore also examine several BNP models that are centered on the random-effects Poisson model but which permit learning about the true underlying distribution of the latent variables instead of assuming it is Gaussian. One of their models, when applied (for example) to the control group, was

(C_i | λ_(iC)) ∼ indep Poisson(λ_(iC))
log(λ_(iC)) | G ∼ IID G
(G | α, μ, σ²) ∼ DP[α N(μ, σ²)] .   (23)
Here DP[α N(μ, σ²)] refers to a Dirichlet process prior distribution, on the CDF G of the latent variables, which is centered at the N(μ, σ²) model with precision parameter α. Model (23) is an expansion of the random-effects Poisson model (22), in that the latter is a special case of the former (obtained by letting α → ∞). Model expansion is a common Bayesian analytic tool which helps to assess and propagate model uncertainty: if You are uncertain about a particular modeling detail, instead of fitting a model that assumes this detail is correct with probability 1, embed it in a richer model class of which it is a special case, and let the data tell You about its plausibility.

With the IHGA data, models (22) and (23) turned out to arrive at similar inferential conclusions – in both cases point estimates of the ratio of the treatment mean to the control mean were about 0.82 with a posterior standard deviation of about 0.09, and a posterior probability that the (population) mean ratio was less than 1 of about 0.95 – so that evidence is strong that IHGA lowers mean hospitalizations not just in the sample but in the collection P of elderly people to whom it is appropriate to generalize. But the two modeling approaches need not yield similar results: if the latent variable distribution is far from Gaussian, model (22) will not be able to adjust to this violation of one of its basic assumptions. Krnjajić et al. [62] performed a simulation study in which data sets with 300 observations were generated from various Gaussian and non-Gaussian latent variable distributions and a variety of parametric and BNP models were fit to the resulting count data; Fig. 2 summarizes the prior and posterior predictive distributions from models (22; top panel) and (23; bottom panel) with a bimodal latent variable distribution. The parametric Gaussian random-effects model cannot fit the bimodality on the data scale, but the BNP model – even though centered on the Gaussian as the random-effects distribution – adapts smoothly to the underlying bimodal reality.

Decision-Making: Variable Selection in Generalized Linear Models; Bayesian Model Selection

Variable selection (choosing the "best" subset of predictors) in generalized linear models is an old problem, dating back at least to the 1960s, and many methods [113] have been proposed to try to solve it; but virtually all of them ignore an aspect of the problem that can be important: the cost of data collection of the predictors. An example, studied by Fouskakis and Draper [39], which is an elaboration of the problem examined in Sect. "Inference and Prediction: Binary Outcomes with No Covariates", arises in the field of quality of health care measurement, where patient sickness at admission is often assessed by using logistic regression of an outcome, such as mortality within 30 days of admission, on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale, employing standard variable selection methods (for instance, backward selection from a model with all predictors) to find an "optimal" subset of 10–20 indicators that predict mortality well. The problem with such benefit-only methods is that they ignore the considerable differences among the sickness indicators in the cost of data collection; this issue is crucial when admission sickness is used to drive programs (now implemented or under consideration in several countries, including the US and UK) that attempt to identify substandard hospitals by comparing observed and expected mortality rates (given admission sickness), because such quality of care investigations are typically conducted under cost constraints.
When both data-collection cost and accuracy of prediction of 30-day mortality are considered, a large variable-selection problem arises in which the only variables that make it into the final scale should be those that achieve a cost–benefit trade-off. Variable selection is an example of the broader process of model selection, in which questions such as "Is model M1 better than M2?" and "Is M1 good enough?" arise. These inquiries cannot be addressed, however, without first answering a new set of questions: good enough (better than) for what purpose? Specifying this purpose [26,57,61,70] identifies model selection as a decision problem that should be approached by constructing a contextually relevant utility function and maximizing expected utility. Fouskakis and Draper [39] create a utility
Bayesian Statistics
Bayesian Statistics, Figure 2 Prior (open circles) and posterior (solid circles) predictive distributions under models (22) and (23) (top and bottom panels, respectively) based on a data set generated from a bimodal latent variable distribution. In each panel, the histogram plots the simulated counts
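Returning to the DP prior in model (23): a draw G ~ DP(α · N(μ, σ²)) can be simulated with Sethuraman's stick-breaking construction. The sketch below is illustrative only (truncation tolerance, seed, and parameter values are arbitrary choices, not from the Krnjajić et al. study):

```python
import random

def dp_draw(alpha, mu=0.0, sigma=1.0, tol=1e-8, seed=1):
    # One (truncated) draw G ~ DP(alpha * N(mu, sigma^2)) via stick-breaking:
    # G = sum_k w_k * delta(theta_k), with v_k ~ Beta(1, alpha),
    # w_k = v_k * prod_{j<k} (1 - v_j), and theta_k ~ N(mu, sigma^2)
    # drawn from the centering (base) distribution.
    rng = random.Random(seed)
    weights, atoms, stick = [], [], 1.0
    while stick > tol:
        v = rng.betavariate(1.0, alpha)
        weights.append(stick * v)
        atoms.append(rng.gauss(mu, sigma))
        stick *= 1.0 - v
    return weights, atoms

# Small alpha: G concentrates on a few atoms, so it can sit far from its
# Gaussian centre; large alpha: mass spreads over many atoms and G
# approaches N(mu, sigma^2), recovering model (22) as alpha -> infinity.
w_small, _ = dp_draw(alpha=1.0)
w_large, _ = dp_draw(alpha=500.0)
```

The contrast between `w_small` (a handful of dominant weights) and `w_large` (thousands of tiny weights) is exactly the α → ∞ limiting behavior described in the text.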
function, for variable selection in their severity of illness problem, with two components that are combined additively: a data-collection component (in monetary units, such as US$), which is simply the negative of the total amount of money required to collect data on a given set of patients with a given subset of the sickness indicators; and a predictive-accuracy component, in which a method
is devised to convert increased predictive accuracy into decreased monetary cost by thinking about the consequences of labeling a hospital with bad quality of care "good" and vice versa. One aspect of their work, with a data set (from a RAND study [58]) involving p = 83 sickness indicators gathered on a representative sample of n = 2,532 elderly American patients hospitalized in the period 1980–86 with pneumonia, focused only on the p = 14 variables in the original RAND sickness scale; this was chosen because 2¹⁴ = 16,384 was a small enough number of possible models to do brute-force enumeration of the estimated expected utility (EEU) of all the models.

Bayesian Statistics, Figure 3 Estimated expected utility of all 16,384 variable subsets in the quality of care study based on RAND data

Figure 3 is a parallel boxplot of the EEUs of all 16,384 variable subsets, with the boxplots sorted by the number of variables in each model. The model with no predictors does poorly, with an EEU of about –US$14.5, but from a cost–benefit point of view the RAND sickness scale with all 14 variables is even worse (–US$15.7), because it includes expensive variables that do not add much to the predictive power in relation to cheaper variables that predict almost as well. The best subsets have 4–6 variables and would save about US$8 per patient when compared with the entire 14-variable scale; this would amount to significant savings if the observed-versus-expected assessment method were applied widely. Returning to the general problem of Bayesian model selection, two cases can be distinguished: situations in which the precise purpose to which the model will be put can be specified (as in the variable-selection problem above), and settings in which at least some of the end uses to which the modeling will be put are not yet known. In
this second situation it is still helpful to reason in a decision-theoretic way: the hallmark of a good (bad) model is that it makes good (bad) predictions, so a utility function based on predictive accuracy can be a good general-purpose choice. With (a) a single sample of data y, (b) a future data value y*, and (c) two models M_j (j = 1, 2) for illustration, what is needed is a scoring rule that measures the discrepancy between y* and its predictive distribution p(y* | y, M_j, B) under model M_j. It turns out [46,86] that the optimal (impartial, symmetric, proper) scoring rules are linear functions of log p(y* | y, M_j, B), which has a simple intuitive motivation: if the predictive distribution is Gaussian, for example, then values of y* close to the center (in other words, those for which the prediction has been good) will receive a greater reward than those in the tails. An example [65], in this one-sample setting, of a model selection criterion (a) based on prediction, (b) motivated by utility considerations and (c) with good model discrimination properties [29] is the full-sample log score

LS_FS(M_j | y, B) = (1/n) Σ_{i=1}^n log p(y_i | y, M_j, B) ,   (24)
which is related to the conditional predictive ordinate criterion [90]. Other Bayesian model selection criteria in current use include the following:

Bayes factors [60]: Bayes' Theorem, written in odds form for discriminating between models M1 and M2, says that

p(M1 | y, B) / p(M2 | y, B) = [p(M1 | B) / p(M2 | B)] · [p(y | M1, B) / p(y | M2, B)] ;   (25)

here the prior odds in favor of M1, p(M1 | B) / p(M2 | B), are multiplied by the Bayes factor p(y | M1, B) / p(y | M2, B) to produce the posterior odds p(M1 | y, B) / p(M2 | y, B). According to the logic of this criterion, models with high posterior probability are to be preferred, and if all the models under consideration are equally plausible a priori this reduces to preferring models with larger Bayes factors in their favor. One problem with this approach is that – in parametric models in which model M_j has parameter vector θ_j defined on parameter space Θ_j – the integrated likelihoods p(y | M_j, B) appearing in the Bayes factor can be expressed as

p(y | M_j, B) = ∫_{Θ_j} p(y | θ_j, M_j, B) p(θ_j | M_j, B) dθ_j = E_{(θ_j | M_j, B)} [ p(y | θ_j, M_j, B) ] .   (26)

In other words, the numerator and denominator ingredients in the Bayes factor are each expressible as expectations of likelihood functions with respect to the prior distributions on the model parameters, and if context suggests that these priors should be specified diffusely the resulting Bayes factor can be unstable as a function of precisely how the diffuseness is specified. Various attempts have been made to remedy this instability of Bayes factors (for example, {partial, intrinsic, fractional} Bayes factors, well-calibrated priors, conventional priors, intrinsic priors, expected posterior priors, . . . ; [9]); all of these methods appear to require an appeal to adhockery which is absent from the log score approach.

Deviance Information Criterion (DIC): Given a parametric model p(y | θ_j, M_j, B), Spiegelhalter et al. [106] define the deviance information criterion (DIC) (by analogy with other information criteria) to be a trade-off between (a) an estimate of the model lack of fit, as measured by the deviance D(θ̄_j) (where θ̄_j is the posterior mean of θ_j under M_j; for the purpose of DIC, the deviance of a model [75] is minus twice the logarithm of the likelihood for that model), and (b) a penalty for model complexity equal to twice the effective number of parameters p_{D_j} of the model:

DIC(M_j | y, B) = D(θ̄_j) + 2 p̂_{D_j} .   (27)

When p_{D_j} is difficult to read directly from the model (for example, in complex hierarchical models, especially those with random effects), Spiegelhalter et al. motivate the following estimate, which is easy to compute from standard MCMC output:

p̂_{D_j} = D̄(θ_j) − D(θ̄_j) ;   (28)

in other words, p̂_{D_j} is the difference between the posterior mean of the deviance and the deviance evaluated at the posterior mean of the parameters. DIC is available as an option in several MCMC packages, including WinBUGS [107] and MLwiN [93]. One difficulty with DIC is that the MCMC estimate of p_{D_j} can be poor if the marginal posteriors for one or more parameters (using the parameterization that defines the deviance) are far from Gaussian; reparameterization (onto parameter scales where the posteriors are approximately Normal) helps but can still lead to mediocre estimates of p_{D_j}. Other notable recent references on the subject of Bayesian variable selection include Brown et al. [13], who examine multivariate regression in the context of compositional data, and George and Foster [43], who use empirical Bayes methods in the Gaussian linear model.

Comparison with the Frequentist Statistical Paradigm

Strengths and Weaknesses of the Two Approaches

Frequentist statistics, which has concentrated mainly on inference, proceeds by (i) thinking of the values in a data set y as like a random sample from a population P (a set to which it is hoped that conclusions based on the data can validly be generalized), (ii) specifying a summary θ of interest in P (such as the population mean of the outcome variable), (iii) identifying a function θ̂ of y that can serve as a reasonable estimate of θ, (iv) imagining repeating the random sampling from P to get other data sets y and therefore other values of θ̂, and (v) using the random behavior of θ̂ across these repetitions to make inferential probability statements involving θ. A leading implementation of the frequentist paradigm [37] is based on using the value θ̂_MLE that maximizes the likelihood function as the estimate of θ and obtaining a measure of uncertainty for θ̂_MLE from the curvature of the logarithm of the likelihood function at its maximum; this is maximum-likelihood inference. Each of the frequentist and Bayesian approaches to statistics has strengths and weaknesses. The frequentist paradigm has the advantage that repeated-sampling calculations are often more tractable than manipulations with conditional probability distributions, and it has the clear strength that it focuses attention on the scientifically important issue of calibration: in settings where the true data-generating process is known (e.g., in simulations of random sampling from a known population P), how often does a particular method of statistical inference recover known truth? The frequentist approach has the disadvantage that it only applies to inherently repeatable phenomena, and therefore cannot be used to quantify uncertainty about many true–false propositions of real-world interest (for example, if You are a doctor to whom a new patient (male, say) has just come, strictly speaking You cannot talk about the frequentist probability that this patient is HIV positive; he either is or he is not, and his arriving at Your office is not the outcome of any repeatable process that is straightforward to identify).
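The maximum-likelihood recipe just described (take the estimate from the maximum of the likelihood, and the uncertainty from the curvature of its logarithm there) can be sketched for an Exponential(rate λ) model, where both steps have closed forms. The data, seed, and true rate below are arbitrary illustrative choices:

```python
import math, random

# For y_1, ..., y_n ~ Exponential(lam), the log-likelihood is
# n*log(lam) - lam*sum(y); its maximizer is lam_hat = n/sum(y), and its
# curvature (negative second derivative) at the maximum is n/lam_hat^2.
rng = random.Random(42)
y = [rng.expovariate(2.0) for _ in range(500)]   # simulated data, true rate 2.0

n, s = len(y), sum(y)
lam_hat = n / s                 # MLE: solves d/dlam of the log-likelihood = 0
info = n / lam_hat ** 2         # observed information (curvature at the MLE)
se_hat = 1.0 / math.sqrt(info)  # asymptotic standard error = lam_hat/sqrt(n)
```

With n = 500 the estimate lands close to the true rate 2.0, and the curvature-based standard error is about lam_hat/√n, which is the usual large-sample approximation.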
In practice the frequentist approach also has the weaknesses that (a) model uncertainty is more difficult to assess and propagate in this paradigm, (b) predictive uncertainty assessments are not always straightforward to create from the frequentist point of view (the bootstrap [31] is one possible solution) and (c) inferential calibration may not be easy to achieve when the sample size n is small. An example of several of these drawbacks arises in the construction of confidence intervals [83], in which repeated-sampling statements such as

P_f ( θ̂_low < θ < θ̂_high ) = 0.95   (29)

(where P_f quantifies the frequentist variability in θ̂_low and θ̂_high across repeated samples from P) are interpreted in the frequentist paradigm as suggesting that the unknown θ lies between θ̂_low and θ̂_high with 95% "confidence." Two difficulties with this are that
(a) equation (29) looks like a probability statement about θ but is not, because in the frequentist approach θ is a fixed unknown constant that cannot be described probabilistically, and (b) with small sample sizes nominal 95% confidence intervals based on maximum likelihood estimation can have actual coverage (the percentage of time in repeated sampling that the interval includes the true θ) substantially less than 95%. The Bayesian approach has the following clear advantages: (a) It applies (at least in principle) to uncertainty about anything, whether associated with a repeatable process or not; (b) inference is unambiguously based on the first equation in (6), without the need to face questions such as what constitutes a "reasonable" estimate of θ (step (iii) in the frequentist inferential paradigm above); (c) prediction is straightforwardly and unambiguously based on the second equation in (6); and (d) in the problem of decision analysis a celebrated theorem of Wald [111] says informally that all good decisions can be interpreted as having been arrived at by maximizing expected utility as in the third equation of (6), so the Bayesian approach appears to be the way forward in decision problems rather broadly (but note the final challenge at the end of Sect. "The Bayesian Statistical Paradigm"). The principal disadvantage of the Bayesian approach is that coherence (internal logical consistency) by itself does not guarantee good calibration: You are free in the Bayesian paradigm to insert strong prior information in the modeling process (without violating coherence), and – if this information is seen after the fact to have been out of step with the world – Your inferences, predictions and/or decisions may also be off-target (of course, the same is true in both the frequentist and Bayesian paradigms with regard to Your modeling of the likelihood information).
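Difficulty (b) is easy to demonstrate by simulation. The sketch below uses a textbook case (the Wald interval for a binomial proportion with small n and p near 0), not the multilevel examples discussed next, and all numerical settings are illustrative:

```python
import math, random

# Monte Carlo estimate of the ACTUAL coverage of the nominal 95% Wald
# interval  p_hat +/- 1.96*sqrt(p_hat*(1 - p_hat)/n)  for a binomial
# proportion, when n is small and the true p is near 0.
rng = random.Random(0)
true_p, n, reps, hits = 0.05, 20, 20_000, 0
for _ in range(reps):
    x = sum(rng.random() < true_p for _ in range(n))   # one simulated data set
    p_hat = x / n
    half = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n)
    hits += p_hat - half < true_p < p_hat + half       # does the interval cover?
coverage = hits / reps   # roughly 0.64 here, far below the nominal 0.95
```

Most of the failure comes from data sets with x = 0, where the Wald interval degenerates to a point; the coverage shortfall is of the same qualitative kind as the 72–94% (and worse) coverages reported below for maximum-likelihood intervals on variance components.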
Two examples of frequentist inferences having poor calibration properties in small samples were given by Browne and Draper [14]. Their first example again concerns the measurement of quality of care, which is often studied with cluster samples: a random sample of J hospitals (indexed by j) and a random sample of N total patients (indexed by i) nested in the chosen hospitals is taken, and quality of care for the chosen patients and various hospital- and patient-level predictors are measured. With y_ij as the quality of care score for patient i in hospital j, a first step would often be to fit a variance-components model with random effects at both the hospital and patient levels, to assess the relative magnitudes of within- and between-hospital variability in quality of care:

y_ij = β0 + u_j + e_ij ,   i = 1, . . . , n_j ,   j = 1, . . . , J ,   Σ_{j=1}^J n_j = N ,
(u_j | σ_u²) ~ IID N(0, σ_u²) ,   (e_ij | σ_e²) ~ IID N(0, σ_e²) .   (30)
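One way to see what model (30) asserts is to simulate a cluster sample from it. The parameter values below (β0 = 50, σ_u = 2, σ_e = 5, J = 100 hospitals with n_j = 30 patients each) are arbitrary illustrative choices, not the values used in the Browne–Draper study:

```python
import random, statistics

rng = random.Random(7)
J, n_j = 100, 30
beta0, sigma_u, sigma_e = 50.0, 2.0, 5.0

y = []
for j in range(J):
    u = rng.gauss(0.0, sigma_u)        # hospital effect: (u_j | sigma_u^2) ~ N(0, sigma_u^2)
    y.append([beta0 + u + rng.gauss(0.0, sigma_e)   # patient error e_ij ~ N(0, sigma_e^2)
              for _ in range(n_j)])

# Moment check: in the balanced case the variance of the J hospital means
# estimates sigma_u^2 + sigma_e^2 / n_j (here 4 + 25/30, about 4.83).
hospital_means = [statistics.mean(row) for row in y]
between = statistics.variance(hospital_means)
```

Repeatedly generating data sets like this one, fitting the model, and recording how often the resulting intervals for σ_u² cover the true value 4.0 is exactly the kind of coverage study described next.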
Browne and Draper [14] used a simulation study to show that, with a variety of maximum-likelihood-based methods for creating confidence intervals for σ_u², the actual coverage of nominal 95% intervals ranged from 72–94% across realistic sample sizes and true parameter values in the fields of education and medicine, versus 89–94% for Bayesian methods based on diffuse priors. Their second example involved a reanalysis of a Guatemalan National Survey of Maternal and Child Health [89,97], with three-level data (births nested within mothers within communities), working with the random-effects logistic regression model

(y_ijk | p_ijk) ~ indep. Bernoulli(p_ijk) ,   logit(p_ijk) = β0 + β1 x1_ijk + β2 x2_jk + β3 x3_k + u_jk + v_k ,   (31)

where y_ijk is a binary indicator of modern prenatal care or not and where u_jk ~ N(0, σ_u²) and v_k ~ N(0, σ_v²) were random effects at the mother and community levels (respectively). Simulating data sets with 2,449 births by 1,558 women living in 161 communities (as in the Rodríguez and Goldman study [97]), Browne and Draper [14] showed that things can be even worse for likelihood-based methods in this model, with actual coverages (at nominal 95%) as low as 0–2% for intervals for σ_u² and σ_v², whereas Bayesian methods with diffuse priors again produced actual coverages from 89–96%. The technical problem is that the marginal likelihood functions for random-effects variances are often heavily skewed, with maxima at or near 0 even when the true variance is positive; Bayesian methods, which integrate over the likelihood function rather than maximizing it, can have (much) better small-sample calibration performance as a result.

Some Historical Perspective

The earliest published formal example of an attempt to do statistical inference – to reason backwards from effects to causes – seems to have been Bayes [5], who defined conditional probability for the first time and noted that the result we now call Bayes' Theorem was a trivial consequence of the definition.
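Reasoning backwards from effects to causes with Bayes' Theorem can be illustrated by a small diagnostic-screening calculation; the prevalence, sensitivity, and specificity below are hypothetical numbers chosen for clarity:

```python
# Bayes' Theorem: posterior is (likelihood * prior), renormalized by the
# total probability of the observed effect (a positive test result).
prior = 0.01            # assumed disease prevalence
sens, spec = 0.99, 0.95 # assumed test sensitivity and specificity

p_pos = sens * prior + (1.0 - spec) * (1.0 - prior)   # p(positive result)
posterior = sens * prior / p_pos                      # p(disease | positive)
# Despite the accurate test, the posterior is only 1/6: false positives
# from the large healthy group outweigh true positives from the rare cases.
```

This is the same effects-to-causes direction of inference formalized by Bayes: from an observed effect (the positive test) back to the probability of its cause.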
From the 1760s until the 1920s,
all (or almost all) statistical inference was Bayesian, using the paradigm that Fisher and others referred to as inverse probability; prominent Bayesians of this period included Gauss [40], Laplace [64] and Pearson [88]. This Bayesian consensus changed with the publication of Fisher [37], which laid out a user-friendly program for maximum-likelihood estimation and inference in a wide variety of problems. Fisher railed against Bayesian inference; his principal objection was that in settings where little was known about a parameter (vector) external to the data, a number of prior distributions could be put forward to quantify this relative ignorance. He believed passionately in the late Victorian–Edwardian goal of scientific objectivity, and it bothered him greatly that two analysts with somewhat different diffuse priors might obtain somewhat different posteriors. (There is a Bayesian account of objectivity: a probability is objective if many different people more or less agree on its value. An example would be the probability of drawing a red ball from an urn known to contain 20 red and 80 white balls, if a sincere attempt is made to thoroughly mix the balls without looking at them and to draw the ball in a way that does not tend to favor one ball over another.) There are two problems with Fisher's argument, which he never addressed: 1. He would be perfectly correct to raise this objection to Bayesian analysis if investigators were often forced to do inference based solely on prior information with no data, but in practice with even modest sample sizes the posterior is relatively insensitive to the precise manner in which diffuseness is specified in the prior, because the likelihood information in such situations is relatively so much stronger than the prior information; Sect. "Inference and Prediction: Binary Outcomes with No Covariates" provides an example of this phenomenon. 2.
If Fisher had looked at the entire process of inference with an engineering eye to sensitivity and stability, he would have been forced to admit that uncertainty in how to specify the likelihood function has inferential consequences that are often an order of magnitude larger than those arising from uncertainty in how to specify the prior. It is an inescapable fact that subjectivity, through assumptions and judgments (such as the form of the likelihood function), is an integral part of any statistical analysis in problems of realistic complexity. In spite of these unrebutted flaws in Fisher's objections to Bayesian inference, two schools of frequentist inference – one based on Fisher's maximum-likelihood estimation and significance tests [38], the other based on the confidence intervals and hypothesis tests of Neyman [83] and Neyman and Pearson [84] – came to dominate statistical practice from the 1920s at least through the 1980s. One major reason for this was practical: the Bayesian paradigm is based on integrating over the posterior distribution, and accurate approximations to high-dimensional integrals were not available during the period in question. Fisher's technology, based on differentiation (to find the maximum and curvature of the logarithm of the likelihood function) rather than integration, was a much more tractable approach for its time. Jeffreys [54], working in the field of astronomy, and Savage [102] and Lindley [69], building on de Finetti's results, advocated forcefully for the adoption of Bayesian methods, but prior to the advent of MCMC techniques (in the late 1980s) Bayesians were often in the position of saying that they knew the best way to solve statistical problems but the computations were beyond them. MCMC has removed this practical objection to the Bayesian paradigm for a wide class of problems. The increased availability of affordable computers with decent CPU throughput in the 1980s also helped to overcome one objection raised in Sect. "Strengths and Weaknesses of the Two Approaches" against likelihood methods, that they can produce poorly calibrated inferences with small samples, through the introduction of the bootstrap by Efron [31] in 1979. At this writing (a) both the frequentist and Bayesian paradigms are in vigorous inferential use, with the proportion of Bayesian articles in leading journals continuing an increase that began in the 1980s; (b) Bayesian MCMC analyses are often employed to produce meaningful predictive conclusions, with the use of the bootstrap increasing for frequentist predictive calibration; and (c) the Bayesian paradigm dominates decision analysis.
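The MCMC idea that removed the computational objection can be shown in miniature with a Metropolis sampler. The model below (7 successes in 10 Bernoulli trials with a Uniform prior, so the exact posterior is Beta(8, 4)) and all tuning choices are illustrative, not from any analysis in the article:

```python
import math, random

rng = random.Random(3)
x, n = 7, 10   # observed successes out of n trials

def log_post(theta):
    # Uniform(0,1) prior => log-posterior = log-likelihood + constant.
    if not 0.0 < theta < 1.0:
        return float("-inf")       # prior density is 0 outside (0, 1)
    return x * math.log(theta) + (n - x) * math.log(1.0 - theta)

theta, draws = 0.5, []
for _ in range(50_000):
    prop = theta + rng.gauss(0.0, 0.2)                  # random-walk proposal
    # Metropolis accept/reject: move with probability min(1, post ratio).
    if rng.random() < math.exp(min(0.0, log_post(prop) - log_post(theta))):
        theta = prop
    draws.append(theta)

post_mean = sum(draws[5_000:]) / len(draws[5_000:])     # discard burn-in
# The exact posterior mean is 8/12, so this integrates, not maximizes.
```

The point of the sketch is the one made in the text: the sampler replaces an integral over the posterior with an average over simulated draws, which is what made Bayesian computation routine after the late 1980s.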
A Bayesian–Frequentist Fusion

During the 20th century the debate over which paradigm to use was often framed in such a way that it seemed it was necessary to choose one approach and defend it against attacks from people who had chosen the other, but there is nothing that forces an analyst to choose a single paradigm. Since both approaches have strengths and weaknesses, it seems worthwhile instead to seek a fusion of the two that makes best use of the strengths. Because (a) the Bayesian paradigm appears to be the most flexible way so far developed for quantifying all sources of uncertainty and (b) its main weakness is that coherence does not guarantee good calibration, a number of statisticians, including Rubin [98], Draper [26], and Little [73], have suggested a fusion in which inferences, predictions and decisions are formulated using Bayesian methods and then evaluated for their calibration properties using frequentist methods, for example by using Bayesian models to create 95% predictive intervals for observables not used in the modeling process and seeing if approximately 95% of these intervals include the actual observed values. Analysts more accustomed to the purely frequentist (likelihood) paradigm who prefer not to explicitly make use of prior distributions may still find it useful to reason in a Bayesian way, by integrating over the parameter uncertainty in their likelihood functions rather than maximizing over it, in order to enjoy the superior calibration properties that integration has been demonstrated to provide.

Future Directions

Since the mid-to-late 1980s the Bayesian statistical paradigm has made significant advances in many fields of inquiry, including agriculture, archaeology, astronomy, bioinformatics, biology, economics, education, environmetrics, finance, health policy, and medicine (see Sect. "The Bayesian Statistical Paradigm" for recent citations of work in many of these disciplines). Three areas of methodological and theoretical research appear particularly promising for extending the useful scope of Bayesian work, as follows:

Elicitation of prior distributions and utility functions: It is arguable that too much use is made in Bayesian analysis of diffuse prior distributions, because (a) accurate elicitation of non-diffuse priors is hard work and (b) lingering traces still remain of a desire to at least appear to achieve the unattainable Victorian–Edwardian goal of objectivity, the (false) argument being that the use of diffuse priors somehow equates to an absence of subjectivity (see, e.g., the papers by Berger [7] and Goldstein [45] and the ensuing discussion for a vigorous debate on this issue).
It is also arguable that too much emphasis was placed in the 20th century on inference at the expense of decision-making, with inferential tools such as the Neyman–Pearson hypothesis testing machinery (Sect. "Some Historical Perspective") used incorrectly to make decisions for which they are not optimal; the main reason for this, as noted in Sect. "Strengths and Weaknesses of the Two Approaches" and "Some Historical Perspective", is that (a) the frequentist paradigm was dominant from the 1920s through the 1980s and (b) the high ground in decision theory is dominated by the Bayesian approach.
Relevant citations of excellent recent work on elicitation of prior distributions and utility functions were given in Sect. "The Bayesian Statistical Paradigm"; it is natural to expect that there will be a greater emphasis on decision theory and non-diffuse prior modeling in the future, and elicitation in those fields of Bayesian methodology is an important area of continuing research.

Group decision-making: As noted in Sect. "The Bayesian Statistical Paradigm", maximizing expected utility is an effective method for decision-making by a single agent, but when two or more agents are involved in the decision process this approach cannot be guaranteed to yield a satisfying solution: there may be conflicts in the agents' preferences, particularly if their relationship is at least partly adversarial. With three or more possible actions, transitivity of preference – if You prefer action a1 to a2 and a2 to a3, then You should prefer a1 to a3 – is a criterion that any reasonable decision-making process should obey; informally, a well-known theorem by Arrow [3] states that even if all of the agents' utility functions obey transitivity, there is no way to combine their utility functions into a single decision-making process that is guaranteed to respect transitivity. However, Arrow's theorem is temporally static, in the sense that the agents do not share their utility functions with each other and iterate after doing so, and it assumes that all agents have the same set 𝒜 of feasible actions. If agents A1 and A2 have action spaces 𝒜1 and 𝒜2 that are not identical and they share the details of their utility specification with each other, it is possible that A1 may realize that one of the actions in 𝒜2 that (s)he had not considered is better than any of the actions in 𝒜1, or vice versa; thus a temporally dynamic solution to the problem posed by Arrow's theorem may be possible, even if A1 and A2 are partially adversarial. This is another important area for new research.
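The aggregation failure behind Arrow's theorem has a classic three-agent illustration, the Condorcet cycle; the rankings below are the standard textbook example, not drawn from the article:

```python
# Three agents, each with a perfectly transitive ranking of actions
# a1, a2, a3, whose pairwise MAJORITY preference is intransitive, so
# "majority vote" cannot serve as a combined decision-making process.
rankings = [            # one ranking per agent, best action first
    ["a1", "a2", "a3"],
    ["a2", "a3", "a1"],
    ["a3", "a1", "a2"],
]

def majority_prefers(x, y):
    # True when a strict majority of agents rank x above y.
    votes = sum(r.index(x) < r.index(y) for r in rankings)
    return votes > len(rankings) / 2

cycle = (majority_prefers("a1", "a2") and
         majority_prefers("a2", "a3") and
         majority_prefers("a3", "a1"))   # a1 beats a2 beats a3 beats a1
```

Each individual ranking is transitive, yet the group's pairwise majority preference cycles, which is exactly the kind of breakdown Arrow's theorem generalizes.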
Bayesian computation: Since the late 1980s, simulation-based computation based on Markov chain Monte Carlo (MCMC) methods has made useful Bayesian analyses possible in an increasingly broad range of application areas, and (as noted in Sect. "The Bayesian Statistical Paradigm") increases in computing speed and sophistication of MCMC algorithms have enhanced this trend significantly. However, if a regression-style data set is visualized as a matrix with n rows (one for each subject of inquiry) and k columns (one for each variable measured on the subjects), MCMC methods do not necessarily scale well in either n or k, with the result that they can be too slow to be of practical
use with large data sets (e.g., at current desktop computing speeds, with n and/or k on the order of 10⁵ or greater). Improving the scaling of MCMC methods, or finding a new approach to Bayesian computation that scales better, is thus a third important area for continuing study.
Bibliography

1. Abdellaoui M (2000) Parameter-free elicitation of utility and probability weighting functions. Manag Sci 46:1497–1512
2. Aleskerov F, Bouyssou D, Monjardet B (2007) Utility Maximization, Choice and Preference, 2nd edn. Springer, New York
3. Arrow KJ (1963) Social Choice and Individual Values, 2nd edn. Yale University Press, New Haven CT
4. Barlow RE, Wu AS (1981) Preposterior analysis of Bayes estimators of mean life. Biometrika 68:403–410
5. Bayes T (1764) An essay towards solving a problem in the doctrine of chances. Philos Trans Royal Soc Lond 53:370–418
6. Berger JO (1985) Statistical Decision Theory and Bayesian Analysis. Springer, New York
7. Berger JO (2006) The case for objective Bayesian analysis (with discussion). Bayesian Anal 1:385–472
8. Berger JO, Betro B, Moreno E, Pericchi LR, Ruggeri F, Salinetti G, Wasserman L (eds) (1995) Bayesian Robustness. Institute of Mathematical Statistics Lecture Notes–Monograph Series, vol 29. IMS, Hayward CA
9. Berger JO, Pericchi LR (2001) Objective Bayesian methods for model selection: introduction and comparison. In: Lahiri P (ed) Model Selection. Institute of Mathematical Statistics Lecture Notes–Monograph Series, vol 38. IMS, Beachwood, pp 135–207
10. Bernardo JM (1979) Reference posterior distributions for Bayesian inference (with discussion). J Royal Stat Soc, Series B 41:113–147
11. Bernardo JM, Smith AFM (1994) Bayesian Theory. Wiley, New York
12. Blavatskyy P (2006) Error propagation in the elicitation of utility and probability weighting functions. Theory Decis 60:315–334
13. Brown PJ, Vannucci M, Fearn T (1998) Multivariate Bayesian variable selection and prediction. J Royal Stat Soc, Series B 60:627–641
14. Browne WJ, Draper D (2006) A comparison of Bayesian and likelihood methods for fitting multilevel models (with discussion). Bayesian Anal 1:473–550
15. Buck C, Blackwell P (2008) Bayesian construction of radiocarbon calibration curves (with discussion). In: Case Studies in Bayesian Statistics, vol 9. Springer, New York
16. Chipman H, George EI, McCulloch RE (1998) Bayesian CART model search (with discussion). J Am Stat Assoc 93:935–960
17. Christiansen CL, Morris CN (1997) Hierarchical Poisson regression modeling. J Am Stat Assoc 92:618–632
18. Clyde M, George EI (2004) Model uncertainty. Stat Sci 19:81–94
19. Das S, Chen MH, Kim S, Warren N (2008) A Bayesian structural equations model for multilevel data with missing responses and missing covariates. Bayesian Anal 3:197–224
20. de Finetti B (1930) Funzione caratteristica di un fenomeno aleatorio. Mem R Accad Lincei 4:86–133
21. de Finetti B (1937) La prévision: ses lois logiques, ses sources subjectives. Ann Inst H Poincaré 7:1–68 (reprinted in translation as de Finetti B (1980) Foresight: its logical laws, its subjective sources. In: Kyburg HE, Smokler HE (eds) Studies in Subjective Probability. Dover, New York, pp 93–158)
22. de Finetti B (1938/1980) Sur la condition d'équivalence partielle. Actual Sci Ind 739 (reprinted in translation as de Finetti B (1980) On the condition of partial exchangeability. In: Jeffrey R (ed) Studies in Inductive Logic and Probability. University of California Press, Berkeley, pp 193–206)
23. de Finetti B (1970) Teoria delle Probabilità, vol 1 and 2. Einaudi, Torino (reprinted in translation as de Finetti B (1974–75) Theory of Probability, vol 1 and 2. Wiley, Chichester)
24. Dey D, Müller P, Sinha D (eds) (1998) Practical Nonparametric and Semiparametric Bayesian Statistics. Springer, New York
25. Draper D (1995) Assessment and propagation of model uncertainty (with discussion). J Royal Stat Soc, Series B 57:45–97
26. Draper D (1999) Model uncertainty yes, discrete model averaging maybe. Comment on: Hoeting JA, Madigan D, Raftery AE, Volinsky CT (eds) Bayesian model averaging: a tutorial. Stat Sci 14:405–409
27. Draper D (2007) Bayesian multilevel analysis and MCMC. In: de Leeuw J, Meijer E (eds) Handbook of Multilevel Analysis. Springer, New York, pp 31–94
28. Draper D, Hodges J, Mallows C, Pregibon D (1993) Exchangeability and data analysis (with discussion). J Royal Stat Soc, Series A 156:9–37
29. Draper D, Krnjajić M (2008) Bayesian model specification. Submitted
30. Duran BS, Booker JM (1988) A Bayes sensitivity analysis when using the Beta distribution as a prior. IEEE Trans Reliab 37:239–247
31. Efron B (1979) Bootstrap methods. Ann Stat 7:1–26
32. Ferguson T (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
33. Ferreira MAR, Lee HKH (2007) Multiscale Modeling. Springer, New York
34. Fienberg SE (2006) When did Bayesian inference become "Bayesian"? Bayesian Anal 1:1–40
35. Fishburn PC (1970) Utility Theory for Decision Making. Wiley, New York
36. Fishburn PC (1981) Subjective expected utility: a review of normative theories. Theory Decis 13:139–199
37. Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos Trans Royal Soc Lond, Series A 222:309–368
38. Fisher RA (1925) Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh
39. Fouskakis D, Draper D (2008) Comparing stochastic optimization methods for variable selection in binary outcome prediction, with application to health policy. J Am Stat Assoc, forthcoming
40. Gauss CF (1809) Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium, vol 2. Perthes and Besser, Hamburg
41. Gelfand AE, Smith AFM (1990) Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85:398–409
42. Gelman A, Meng XL (2004) Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives. Wiley, New York
43. George EI, Foster DP (2000) Calibration and empirical Bayes variable selection. Biometrika 87:731–747
44. Gilks WR, Richardson S, Spiegelhalter DJ (eds) (1996) Markov Chain Monte Carlo in Practice. Chapman, New York
45. Goldstein M (2006) Subjective Bayesian analysis: principles and practice (with discussion). Bayesian Anal 1:385–472
46. Good IJ (1950) Probability and the Weighing of Evidence. Charles Griffin, London
47. Green P (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711–713
48. Hacking I (1984) The Emergence of Probability. Cambridge University Press, Cambridge
49. Hanson TE, Kottas A, Branscum AJ (2008) Modelling stochastic order in the analysis of receiver operating characteristic data: Bayesian nonparametric approaches. J Royal Stat Soc, Series C (Applied Statistics) 57:207–226
50. Hellwig K, Speckbacher G, Weniges P (2000) Utility maximization under capital growth constraints. J Math Econ 33:1–12
51. Hendriksen C, Lund E, Stromgard E (1984) Consequences of assessment and intervention among elderly people: a three year randomized controlled trial. Br Med J 289:1522–1524
52. Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417
53. Jaynes ET (2003) Probability Theory: The Logic of Science. Cambridge University Press, Cambridge
54. Jeffreys H (1931) Scientific Inference. Cambridge University Press, Cambridge
55. Jordan MI, Ghahramani Z, Jaakkola TS, Saul L (1999) An introduction to variational methods for graphical models. Mach Learn 37:183–233
56. Kadane JB (ed) (1996) Bayesian Methods and Ethics in a Clinical Trial Design. Wiley, New York
57. Kadane JB, Dickey JM (1980) Bayesian decision theory and the simplification of models. In: Kmenta J, Ramsey J (eds) Evaluation of Econometric Models. Academic Press, New York
58.
42. Gelman A, Meng XL (2004) Applied Bayesian Modeling and Causal Inference From IncompleteData Perspectives. Wiley, New York 43. George EI, Foster DP (2000) Calibration and empirical Bayes variable selection. Biometrika 87:731–747 44. Gilks WR, Richardson S, Spiegelhalter DJ (eds) (1996) Markov Chain Monte Carlo in Practice. Chapman, New York 45. Goldstein M (2006) Subjective Bayesian analysis: principles and practice (with discussion). Bayesian Anal 1:385–472 46. Good IJ (1950) Probability and the Weighing of Evidence. Charles Griffin, London 47. Green P (1995) Reversible jump Markov chain Monte carlo computation and Bayesian model determination. Biometrika 82:711–713 48. Hacking I (1984) The Emergence of Probability. University Press, Cambridge 49. Hanson TE, Kottas A, Branscum AJ (2008) Modelling stochastic order in the analysis of receiver operating characteristic data: Bayesian nonparametric approaches. J Royal Stat Soc, Series C (Applied Statistics) 57:207–226 50. Hellwig K, Speckbacher G, Weniges P (2000) Utility maximization under capital growth constraints. J Math Econ 33:1–12 51. Hendriksen C, Lund E, Stromgard E (1984) Consequences of assessment and intervention among elderly people: a three year randomized controlled trial. Br Med J 289:1522–1524 52. Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417 53. Jaynes ET (2003) Probability Theory: The Logic of Science. Cambridge University Press, Cambridge 54. Jeffreys H (1931) Scientific Inference. Cambridge University Press, Cambridge 55. Jordan MI, Ghahramani Z, Jaakkola TS, Saul L (1999) An introduction to variational methods for graphical models. Mach Learn 37:183–233 56. Kadane JB (ed) (1996) Bayesian Methods and Ethics in a Clinical Trial Design. Wiley, New York 57. Kadane JB, Dickey JM (1980) Bayesian decision theory and the simplification of models. In: Kmenta J, Ramsey J (eds) Evaluation of Econometric Models. Academic Press, New York 58. 
Kahn K, Rubenstein L, Draper D, Kosecoff J, Rogers W, Keeler E, Brook R (1990) The effects of the DRGbased Prospective Payment System on quality of care for hospitalized Medicare patients: An introduction to the series (with discussion). J Am Med Assoc 264:1953–1955 59. Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795 60. Kass RE, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 91:1343–1370 61. Key J, Pericchi LR, Smith AFM (1999) Bayesian model choice: what and why? (with discussion). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian Statistics 6. Clarendon Press, Oxford, pp 343–370 62. Krnjaji´c M, Kottas A, Draper D (2008) Parametric and nonparametric Bayesian model specification: a case study involving models for count data. Comput Stat Data Anal 52:2110–2128 63. Laplace PS (1774) Mémoire sur la probabilité des causes par les évenements. Mém Acad Sci Paris 6:621–656 64. Laplace PS (1812) Théorie Analytique des Probabilités. Courcier, Paris
Bayesian Statistics
65. Laud PW, Ibrahim JG (1995) Predictive model selection. J Royal Stat Soc, Series B 57:247–262 66. Lavine M (1992) Some aspects of Pólya tree distributions for statistical modelling. Ann Stat 20:1222–1235 67. Leamer EE (1978) Specification searches: Ad hoc inference with nonexperimental data. Wiley, New York 68. Leonard T, Hsu JSJ (1999) Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers. Cambridge University Press, Cambridge 69. Lindley DV (1965) Introduction to Probability and Statistics. Cambridge University Press, Cambridge 70. Lindley DV (1968) The choice of variables in multiple regression (with discussion). J Royal Stat Soc, Series B 30:31–66 71. Lindley DV (2006) Understanding Uncertainty. Wiley, New York 72. Lindley DV, Novick MR (1981) The role of exchangeability in inference. Ann Stat 9:45–58 73. Little RJA (2006) Calibrated Bayes: A Bayes/frequentist roadmap. Am Stat 60:213–223 74. Mangel M, Munch SB (2003) Opportunities for Bayesian analysis in the search for sustainable fisheries. ISBA Bulletin 10:3–5 75. McCullagh P, Nelder JA (1989) Generalized Linear Models, 2nd edn. Chapman, New York 76. Meng XL, van Dyk DA (1997) The EM Algorithm: an old folk song sung to a fast new tune (with discussion). J Royal Stat Soc, Series B 59:511–567 77. Merl D, Prado R (2007) Detecting selection in DNA sequences: Bayesian modelling and inference (with discussion). In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M (eds) Bayesian Statistics 8. University Press, Oxford, pp 1–22 78. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equations of state calculations by fast computing machine. J Chem Phys 21:1087–1091 79. Morris CN, Hill J (2000) The Health Insurance Experiment: design using the Finite Selection Model. In: Morton SC, Rolph JE (eds) Public Policy and Statistics: Case Studies from RAND. Springer, New York, pp 29–53 80. 
Müller P, Quintana F (2004) Nonparametric Bayesian data analysis. Stat Sci 19:95–110 81. Müller P, Quintana F, Rosner G (2004) Hierarchical metaanalysis over related nonparametric Bayesian models. J Royal Stat Soc, Series B 66:735–749 82. Munch SB, Kottas A, Mangel M (2005) Bayesian nonparametric analysis of stockrecruitment relationships. Can J Fish Aquat Sci 62:1808–1821 83. Neyman J (1937) Outline of a theory of statistical estimation based on the classical theory of probability. Philos Trans Royal Soc Lond A 236:333–380 84. Neyman J, Pearson ES (1928) On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika 20:175–240 85. O’Hagan A, Buck CE, Daneshkhah A, Eiser JR, Garthwaite PH, Jenkinson DJ, Oakley JE, Rakow T (2006) Uncertain Judgements. Eliciting Experts’ Probabilities. Wiley, New York 86. O’Hagan A, Forster J (2004) Bayesian Inference, 2nd edn. In: Kendall’s Advanced Theory of Statistics, vol 2B. Arnold, London 87. Parmigiani G (2002) Modeling in medical decisionmaking: A Bayesian approach. Wiley, New York
88. Pearson KP (1895) Mathematical contributions to the theory of evolution, II. Skew variation in homogeneous material. Proc Royal Soc Lond 57:257–260 89. Pebley AR, Goldman N (1992) Family, community, ethnic identity, and the use of formal health care services in Guatemala. Working Paper 9212. Office of Population Research, Princeton 90. Pettit LI (1990) The conditional predictive ordinate for the Normal distribution. J Royal Stat Soc, Series B 52:175–184 91. Pérez JM, Berger JO (2002) Expected posterior prior distributions for model selection. Biometrika 89:491–512 92. Polson NG, Stroud JR, Müller P (2008) Practical filtering with sequential parameter learning. J Royal Stat Soc, Series B 70:413–428 93. Rashbash J, Steele F, Browne WJ, Prosser B (2005) A User’s Guide to MLwiN, Version 2.0. Centre for Multilevel Modelling, University of Bristol, Bristol UK; available at www.cmm.bristol. ac.uk Accessed 15 Aug 2008 94. Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion). J Royal Stat Soc, Series B 59:731–792 95. Rios Insua D, Ruggeri F (eds) (2000) Robust Bayesian Analysis. Springer, New York 96. Rodríguez A, Dunston DB, Gelfand AE (2008) The nested Dirichlet process. J Am Stat Assoc, 103, forthcoming 97. Rodríguez G, Goldman N (1995) An assessment of estimation procedures for multilevel models with binary responses. J Royal Stat Soc, Series A 158:73–89 98. Rubin DB (1984) Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann Stat 12:1151– 1172 99. Rubin DB (2005) Bayesian inference for causal effects. In: Rao CR, Dey DK (eds) Handbook of Statistics: Bayesian Thinking, Modeling and Computation, vol 25. Elsevier, Amsterdam, pp 1–16 100. Sabatti C, Lange K (2008) Bayesian Gaussian mixture models for highdensity genotyping arrays. J Am Stat Assoc 103:89–100 101. 
Sansó B, Forest CE, Zantedeschi D (2008) Inferring climate system properties using a computer model (with discussion). Bayesian Anal 3:1–62 102. Savage LJ (1954) The Foundations of Statistics. Wiley, New York 103. Schervish MJ, Seidenfeld T, Kadane JB (1990) Statedependent utilities. J Am Stat Assoc 85:840–847 104. Seidou O, Asselin JJ, Ouarda TMBJ (2007) Bayesian multivariate linear regression with application to change point models in hydrometeorological variables. In: Water Resources Research 43, W08401, doi:10.1029/2005WR004835. 105. Spiegelhalter DJ, Abrams KR, Myles JP (2004) Bayesian Approaches to Clinical Trials and HealthCare Evaluation. Wiley, New York 106. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit (with discussion). J Royal Stat Soc, Series B 64:583–640 107. Spiegelhalter DJ, Thomas A, Best NG (1999) WinBUGS Version 1.2 User Manual. MRC Biostatistics Unit, Cambridge 108. Stephens M (2000) Bayesian analysis of mixture models with an unknown number of components – an alternative to reversiblejump methods. Ann Stat 28:40–74
273
274
Bayesian Statistics
109. Stigler SM (1986) The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press, Cambridge 110. Stone M (1974) Crossvalidation choice and assessment of statistical predictions (with discussion). J Royal Stat Soc, Series B 36:111–147 111. Wald A (1950) Statistical Decision Functions. Wiley, New York 112. Weerahandi S, Zidek JV (1981) MultiBayesian statistical decision theory. J Royal Stat Soc, Series A 144:85–93
113. Weisberg S (2005) Applied Linear Regression, 3rd edn. Wiley, New York 114. West M (2003) Bayesian factor regression models in the “large p, small n paradigm.” Bayesian Statistics 7:723–732 115. West M, Harrison PJ (1997) Bayesian Forecasting and Dynamic Models. Springer, New York 116. Whitehead J (2006) Using Bayesian decision theory in doseescalation studies. In: Chevret S (ed) Statistical Methods for DoseFinding Experiments. Wiley, New York, pp 149–171
Bivariate (Two-dimensional) Wavelets

BIN HAN
Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Canada

Article Outline

Glossary
Definitions
Introduction
Bivariate Refinable Functions and Their Properties
The Projection Method
Bivariate Orthonormal and Biorthogonal Wavelets
Bivariate Riesz Wavelets
Pairs of Dual Wavelet Frames
Future Directions
Bibliography

Glossary

Dilation matrix  A $2\times 2$ matrix $M$ is called a dilation matrix if all the entries of $M$ are integers and all the eigenvalues of $M$ are greater than one in modulus.

Isotropic dilation matrix  A dilation matrix $M$ is said to be isotropic if $M$ is similar to a diagonal matrix and all its eigenvalues have the same modulus.

Wavelet system  A wavelet system is a collection of square integrable functions that are generated from a finite set of functions (which are called wavelets) by using integer shifts and dilations.

Definitions

Throughout this article, $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Z}$ denote the real line, the complex plane, and the set of all integers, respectively. For $1\le p\le\infty$, $L_p(\mathbb{R}^2)$ denotes the set of all Lebesgue measurable bivariate functions $f$ such that $\|f\|_{L_p(\mathbb{R}^2)}^p := \int_{\mathbb{R}^2} |f(x)|^p\,dx < \infty$. In particular, the space $L_2(\mathbb{R}^2)$ of square integrable functions is a Hilbert space under the inner product
$$\langle f, g\rangle := \int_{\mathbb{R}^2} f(x)\,\overline{g(x)}\,dx, \qquad f, g \in L_2(\mathbb{R}^2),$$
where $\overline{g(x)}$ denotes the complex conjugate of the complex number $g(x)$. In applications such as image processing and computer graphics, the following are commonly used isotropic dilation matrices:
$$M_{\sqrt{2}} = \begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix},\quad Q_{\sqrt{2}} = \begin{bmatrix} 1 & 1\\ -1 & 1\end{bmatrix},\quad M_{\sqrt{3}} = \begin{bmatrix} 1 & -2\\ 2 & -1\end{bmatrix},\quad dI_2 = \begin{bmatrix} d & 0\\ 0 & d\end{bmatrix}, \quad (1)$$
where $d$ is an integer with $|d| > 1$. Using a dilation matrix $M$, a bivariate $M$-wavelet system is generated by integer shifts and dilates from a finite set $\{\psi^1,\dots,\psi^L\}$ of functions in $L_2(\mathbb{R}^2)$. More precisely, the set of all the basic wavelet building blocks of an $M$-wavelet system generated by $\{\psi^1,\dots,\psi^L\}$ is given by
$$X_M(\{\psi^1,\dots,\psi^L\}) := \{\psi^{\ell,M}_{j,k} : j\in\mathbb{Z},\ k\in\mathbb{Z}^2,\ \ell=1,\dots,L\}, \quad (2)$$
where
$$\psi^{\ell,M}_{j,k}(x) := |\det M|^{j/2}\,\psi^\ell(M^j x - k), \qquad x\in\mathbb{R}^2. \quad (3)$$
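The defining properties of the matrices in (1) can be checked numerically against the glossary definitions; a minimal sketch (not part of the article, using NumPy, with $d = 2$ chosen for $dI_2$):

```python
import numpy as np

# the four dilation matrices from (1), with d = 2 for dI2
M_sqrt2 = np.array([[1, 1], [1, -1]])
Q_sqrt2 = np.array([[1, 1], [-1, 1]])
M_sqrt3 = np.array([[1, -2], [2, -1]])
dI2 = 2 * np.eye(2, dtype=int)

for M in (M_sqrt2, Q_sqrt2, M_sqrt3, dI2):
    moduli = np.abs(np.linalg.eigvals(M))
    assert np.all(moduli > 1)              # dilation: all eigenvalues > 1 in modulus
    assert np.allclose(moduli, moduli[0])  # isotropic: all moduli equal
    print(moduli[0], abs(round(np.linalg.det(M))))
```

The printed moduli are $\sqrt{2}$, $\sqrt{2}$, $\sqrt{3}$, and $2$, with $|\det M| = 2, 2, 3, 4$ respectively, which is why these matrices are named as they are.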
One of the main goals in wavelet analysis is to find wavelet systems $X_M(\{\psi^1,\dots,\psi^L\})$ with some desirable properties such that any two-dimensional function or signal $f\in L_2(\mathbb{R}^2)$ can be sparsely and efficiently represented under the $M$-wavelet system $X_M(\{\psi^1,\dots,\psi^L\})$:
$$f = \sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} h^\ell_{j,k}(f)\,\psi^{\ell,M}_{j,k}, \quad (4)$$
where $h^\ell_{j,k} : L_2(\mathbb{R}^2)\mapsto\mathbb{C}$ are linear functionals. There are many types of wavelet systems studied in the literature. In the following, let us outline some of the most important types of wavelet systems.

Orthonormal Wavelets

We say that $\{\psi^1,\dots,\psi^L\}$ generates an orthonormal $M$-wavelet basis in $L_2(\mathbb{R}^2)$ if the system $X_M(\{\psi^1,\dots,\psi^L\})$ is an orthonormal basis of the Hilbert space $L_2(\mathbb{R}^2)$. That is, the linear span of elements in $X_M(\{\psi^1,\dots,\psi^L\})$ is dense in $L_2(\mathbb{R}^2)$ and
$$\langle\psi^{\ell,M}_{j,k}, \psi^{\ell',M}_{j',k'}\rangle = \delta_{\ell\ell'}\,\delta_{jj'}\,\delta_{kk'}, \qquad \forall\, j,j'\in\mathbb{Z},\ k,k'\in\mathbb{Z}^2,\ \ell,\ell'=1,\dots,L, \quad (5)$$
where $\delta$ denotes the Dirac sequence such that $\delta_0 = 1$ and $\delta_k = 0$ for all $k\ne 0$. For an orthonormal wavelet basis $X_M(\{\psi^1,\dots,\psi^L\})$, the linear functional $h^\ell_{j,k}$ in (4) is given by $h^\ell_{j,k}(f) = \langle f, \psi^{\ell,M}_{j,k}\rangle$ and the representation in (4) becomes
$$f = \sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} \langle f, \psi^{\ell,M}_{j,k}\rangle\,\psi^{\ell,M}_{j,k}, \qquad f\in L_2(\mathbb{R}^2), \quad (6)$$
with the series converging in $L_2(\mathbb{R}^2)$.

Riesz Wavelets

We say that $\{\psi^1,\dots,\psi^L\}$ generates a Riesz $M$-wavelet basis in $L_2(\mathbb{R}^2)$ if the system $X_M(\{\psi^1,\dots,\psi^L\})$ is a Riesz basis of $L_2(\mathbb{R}^2)$. That is, the linear span of elements in $X_M(\{\psi^1,\dots,\psi^L\})$ is dense in $L_2(\mathbb{R}^2)$ and there exist two positive constants $C_1$ and $C_2$ such that
$$C_1\sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} |c^\ell_{j,k}|^2 \;\le\; \Big\|\sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} c^\ell_{j,k}\,\psi^{\ell,M}_{j,k}\Big\|^2_{L_2(\mathbb{R}^2)} \;\le\; C_2\sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} |c^\ell_{j,k}|^2$$
for all finitely supported sequences $\{c^\ell_{j,k}\}_{j\in\mathbb{Z},\,k\in\mathbb{Z}^2,\,\ell=1,\dots,L}$. Clearly, a Riesz $M$-wavelet generalizes an orthonormal $M$-wavelet by relaxing the orthogonality requirement in (5). For a Riesz basis $X_M(\{\psi^1,\dots,\psi^L\})$, it is well known that there exists a dual Riesz basis $\{\tilde\psi^{\ell,j,k} : j\in\mathbb{Z},\ k\in\mathbb{Z}^2,\ \ell=1,\dots,L\}$ of elements in $L_2(\mathbb{R}^2)$ (this set is not necessarily generated by integer shifts and dilates from some finite set of functions) such that (5) still holds after replacing $\psi^{\ell',M}_{j',k'}$ by $\tilde\psi^{\ell',j',k'}$. For a Riesz wavelet basis, the linear functional in (4) becomes $h^\ell_{j,k}(f) = \langle f, \tilde\psi^{\ell,j,k}\rangle$. In fact, $\tilde\psi^{\ell,j,k} := F^{-1}(\psi^{\ell,M}_{j,k})$, where $F : L_2(\mathbb{R}^2)\mapsto L_2(\mathbb{R}^2)$ is defined to be
$$F(f) := \sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} \langle f, \psi^{\ell,M}_{j,k}\rangle\,\psi^{\ell,M}_{j,k}, \qquad f\in L_2(\mathbb{R}^2). \quad (7)$$

Wavelet Frames

A further generalization of a Riesz wavelet is a wavelet frame. We say that $\{\psi^1,\dots,\psi^L\}$ generates an $M$-wavelet frame in $L_2(\mathbb{R}^2)$ if the system $X_M(\{\psi^1,\dots,\psi^L\})$ is a frame of $L_2(\mathbb{R}^2)$. That is, there exist two positive constants $C_1$ and $C_2$ such that
$$C_1\|f\|^2_{L_2(\mathbb{R}^2)} \;\le\; \sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} |\langle f, \psi^{\ell,M}_{j,k}\rangle|^2 \;\le\; C_2\|f\|^2_{L_2(\mathbb{R}^2)}, \qquad \forall\, f\in L_2(\mathbb{R}^2). \quad (8)$$
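The two-sided inequality (8) has an elementary finite-dimensional analogue that is easy to compute: three equiangular unit vectors in $\mathbb{R}^2$ form a tight frame with $C_1 = C_2 = 3/2$. The sketch below is an illustration of the frame inequality only, not an example from the article:

```python
import numpy as np

# "Mercedes-Benz" frame: three unit vectors at angles 0, 120, 240 degrees.
# For this tight frame the analysis energy equals 1.5 * ||f||^2 for every f,
# the finite-dimensional counterpart of (8) with C1 = C2 = 3/2.
V = np.array([[np.cos(2 * np.pi * k / 3), np.sin(2 * np.pi * k / 3)]
              for k in range(3)])

f = np.array([0.8, -0.3])              # an arbitrary test vector
energy = float(np.sum((V @ f) ** 2))   # sum_k |<f, v_k>|^2
print(energy, 1.5 * float(f @ f))      # the two numbers agree
```

The equality holds because $\sum_k v_k v_k^{\mathsf T} = \tfrac{3}{2} I_2$ for these three vectors; a generic (non-tight) frame would only satisfy the two-sided bound with $C_1 < C_2$.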
It is not difficult to check that a Riesz $M$-wavelet is an $M$-wavelet frame. Then (8) guarantees that the frame operator $F$ in (7) is a bounded and invertible linear operator. For an $M$-wavelet frame, the linear functional in (4) can be chosen to be $h^\ell_{j,k}(f) = \langle f, F^{-1}(\psi^{\ell,M}_{j,k})\rangle$; however, such functionals may not be unique. The fundamental difference between a Riesz wavelet and a wavelet frame lies in that for any given function $f\in L_2(\mathbb{R}^2)$, the representation in (4) is unique under a Riesz wavelet while it may not be unique under a wavelet frame. The representation in (4) with the choice $h^\ell_{j,k}(f) = \langle f, F^{-1}(\psi^{\ell,M}_{j,k})\rangle$ is called the canonical representation of a wavelet frame. In other words, $\{F^{-1}(\psi^{\ell,M}_{j,k}) : j\in\mathbb{Z},\ k\in\mathbb{Z}^2,\ \ell=1,\dots,L\}$ is called the canonical dual frame of the given wavelet frame $X_M(\{\psi^1,\dots,\psi^L\})$.

Biorthogonal Wavelets

We say that $(\{\psi^1,\dots,\psi^L\}, \{\tilde\psi^1,\dots,\tilde\psi^L\})$ generates a pair of biorthogonal $M$-wavelet bases in $L_2(\mathbb{R}^2)$ if each of $X_M(\{\psi^1,\dots,\psi^L\})$ and $X_M(\{\tilde\psi^1,\dots,\tilde\psi^L\})$ is a Riesz basis of $L_2(\mathbb{R}^2)$ and (5) still holds after replacing $\psi^{\ell',M}_{j',k'}$ by $\tilde\psi^{\ell',M}_{j',k'}$. In other words, the dual Riesz basis of $X_M(\{\psi^1,\dots,\psi^L\})$ has the wavelet structure and is given by $X_M(\{\tilde\psi^1,\dots,\tilde\psi^L\})$. For a biorthogonal wavelet, the wavelet representation in (4) becomes
$$f = \sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} \langle f, \tilde\psi^{\ell,M}_{j,k}\rangle\,\psi^{\ell,M}_{j,k}, \qquad f\in L_2(\mathbb{R}^2). \quad (9)$$
Obviously, $\{\psi^1,\dots,\psi^L\}$ generates an orthonormal $M$-wavelet basis in $L_2(\mathbb{R}^2)$ if and only if $(\{\psi^1,\dots,\psi^L\}, \{\psi^1,\dots,\psi^L\})$ generates a pair of biorthogonal $M$-wavelet bases in $L_2(\mathbb{R}^2)$.

Dual Wavelet Frames

Similarly, we have the notion of a pair of dual wavelet frames. We say that $(\{\psi^1,\dots,\psi^L\}, \{\tilde\psi^1,\dots,\tilde\psi^L\})$ generates a pair of dual $M$-wavelet frames in $L_2(\mathbb{R}^2)$ if each of $X_M(\{\psi^1,\dots,\psi^L\})$ and $X_M(\{\tilde\psi^1,\dots,\tilde\psi^L\})$ is an $M$-wavelet frame in $L_2(\mathbb{R}^2)$ and
$$\langle f, g\rangle = \sum_{\ell=1}^{L}\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^2} \langle f, \tilde\psi^{\ell,M}_{j,k}\rangle\,\langle\psi^{\ell,M}_{j,k}, g\rangle, \qquad f, g\in L_2(\mathbb{R}^2).$$
It follows from the above identity that (9) still holds for a pair of dual $M$-wavelet frames. We say that $\{\psi^1,\dots,\psi^L\}$ generates a tight $M$-wavelet frame if $(\{\psi^1,\dots,\psi^L\},$
$\{\psi^1,\dots,\psi^L\})$ generates a pair of dual $M$-wavelet frames in $L_2(\mathbb{R}^2)$. For a tight wavelet frame $\{\psi^1,\dots,\psi^L\}$, the wavelet representation in (6) still holds. So, a tight wavelet frame is a generalization of an orthonormal wavelet basis.

Introduction

Bivariate (two-dimensional) wavelets are of interest in representing and processing two-dimensional data such as images and surfaces. In this article we shall discuss some basic background and results on bivariate wavelets. We denote by $\Pi_J$ the set of all bivariate polynomials of total degree at most $J$. For compactly supported functions $\psi^1,\dots,\psi^L$, we say that $\{\psi^1,\dots,\psi^L\}$ has $J$ vanishing moments if $\langle P, \psi^\ell\rangle = 0$ for all $\ell=1,\dots,L$ and all polynomials $P\in\Pi_{J-1}$. The advantages of wavelet representations largely lie in the following aspects:

1. There is a fast wavelet transform (FWT) for computing the wavelet coefficients $h^\ell_{j,k}(f)$ in the wavelet representation (4).
2. The wavelet representation has good time and frequency localization. Roughly speaking, the basic building blocks $\psi^{\ell,M}_{j,k}$ have good time localization and smoothness.
3. The wavelet representation is sparse. For a smooth function $f$, most wavelet coefficients are negligible. Generally, the wavelet coefficient $h^\ell_{j,k}(f)$ only depends on the information of $f$ in a small neighborhood of the support of $\psi^{\ell,M}_{j,k}$ (more precisely, the support of $\tilde\psi^{\ell,M}_{j,k}$ if $h^\ell_{j,k}(f) = \langle f, \tilde\psi^{\ell,M}_{j,k}\rangle$). If $f$ in a small neighborhood of the support of $\psi^{\ell,M}_{j,k}$ is smooth or behaves like a polynomial to a certain degree, then by the vanishing moments of the dual wavelet functions, $h^\ell_{j,k}(f)$ is often negligible.
4. For a variety of function spaces $\mathcal{B}$ such as Sobolev and Besov spaces, the norm $\|f\|_{\mathcal{B}}$, $f\in\mathcal{B}$, is equivalent to a certain weighted sequence norm of the wavelet coefficients $\{h^\ell_{j,k}(f)\}_{j\in\mathbb{Z},\,k\in\mathbb{Z}^2,\,\ell=1,\dots,L}$.

For more details on advantages and applications of wavelets, see [6,8,14,16,21,58]. For a dilation matrix $M = dI_2$, the easiest way to obtain a bivariate wavelet is to use the tensor product method. In other words, all the functions in the generator set $\{\psi^1,\dots,\psi^L\}\subseteq L_2(\mathbb{R}^2)$ take the form $\psi^\ell(x,y) = f^\ell(x)\,g^\ell(y)$, $x,y\in\mathbb{R}$, where $f^\ell$ and $g^\ell$ are some univariate functions in $L_2(\mathbb{R})$. However, tensor product (also called separable) bivariate wavelets give preference to the horizontal and vertical directions, which may not be desirable in applications such as image processing ([9,16,50,58]).
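For $M = 2I_2$ the tensor product method typically takes a univariate scaling function $\phi$ and wavelet $\psi$ and forms the $L = 3$ generators $\phi\otimes\psi$, $\psi\otimes\phi$, $\psi\otimes\psi$. With the Haar pair this can be checked on a discrete grid; an illustrative sketch (the Haar choice is an assumption, not the article's example):

```python
import numpy as np

# 1D Haar scaling function and wavelet sampled on a uniform grid over [0, 1)
n = 8
phi = np.ones(n)                                           # phi = 1 on [0, 1)
psi = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])  # psi = 1 then -1

# the three separable generators for M = 2*I_2
gens = [np.outer(phi, psi), np.outer(psi, phi), np.outer(psi, psi)]

# discrete inner products reproduce the L2 inner products exactly here
G = np.array([[np.sum(gi * gj) / n**2 for gj in gens] for gi in gens])
print(G)   # the 3x3 identity: the generators are mutually orthonormal
```

The Gram matrix factors as a product of 1D inner products, which is exactly why separable constructions are easy — and why they inherit the horizontal/vertical bias mentioned above.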
Also, for a non-diagonal dilation matrix, it is difficult or impossible to use the tensor product method to obtain bivariate wavelets. Therefore, nonseparable bivariate wavelets themselves are of importance and interest in both theory and application. In this article, we shall address several aspects of bivariate wavelets with a general two-dimensional dilation matrix.

Bivariate Refinable Functions and Their Properties

In order to have a fast wavelet transform to compute the wavelet coefficients in (4), the generators $\psi^1,\dots,\psi^L$ in a wavelet system are generally obtained from a refinable function via a multiresolution analysis [6,15,16,58]. Let $M$ be a $2\times 2$ dilation matrix. For a function $\phi$ in $L_2(\mathbb{R}^2)$, we say that $\phi$ is $M$-refinable if it satisfies the following refinement equation
$$\phi(x) = |\det M| \sum_{k\in\mathbb{Z}^2} a_k\,\phi(Mx - k), \qquad \text{a.e. } x\in\mathbb{R}^2, \quad (10)$$
where $a : \mathbb{Z}^2\mapsto\mathbb{C}$ is a finitely supported sequence on $\mathbb{Z}^2$ satisfying $\sum_{k\in\mathbb{Z}^2} a_k = 1$. Such a sequence $a$ is often called a mask in wavelet analysis and computer graphics, or a lowpass filter in signal processing. Using the Fourier transform $\hat\phi$ of $\phi\in L_2(\mathbb{R}^2)\cap L_1(\mathbb{R}^2)$ and the Fourier series $\hat a$ of $a$, which are defined to be
$$\hat\phi(\xi) := \int_{\mathbb{R}^2} \phi(x)\,e^{-ix\cdot\xi}\,dx \quad\text{and}\quad \hat a(\xi) := \sum_{k\in\mathbb{Z}^2} a_k\,e^{-ik\cdot\xi}, \qquad \xi\in\mathbb{R}^2, \quad (11)$$
the refinement equation (10) can be equivalently rewritten as
$$\hat\phi(M^{\mathsf T}\xi) = \hat a(\xi)\,\hat\phi(\xi), \qquad \xi\in\mathbb{R}^2, \quad (12)$$
where $M^{\mathsf T}$ denotes the transpose of the dilation matrix $M$. Since $\hat a(0) = 1$ and $\hat a$ is a trigonometric polynomial, one can define a function $\hat\phi$ by
$$\hat\phi(\xi) := \prod_{j=1}^{\infty} \hat a\big((M^{\mathsf T})^{-j}\xi\big), \qquad \xi\in\mathbb{R}^2, \quad (13)$$
with the product converging uniformly on any compact set of $\mathbb{R}^2$. It is known ([4,16]) that $\phi$ is a compactly supported tempered distribution and clearly satisfies the refinement equation in (12) with mask $a$. We call $\phi$ the standard refinable function associated with mask $a$ and dilation $M$. Now the generators $\psi^1,\dots,\psi^L$ of a wavelet system are often obtained from the refinable function $\phi$ via
$$\widehat{\psi^\ell}(M^{\mathsf T}\xi) := \widehat{a^\ell}(\xi)\,\hat\phi(\xi), \qquad \xi\in\mathbb{R}^2,\ \ell=1,\dots,L,$$
for some $2\pi$-periodic trigonometric polynomials $\widehat{a^\ell}$. Such sequences $a^\ell$ are called wavelet masks or highpass filters. Except for very few special masks $a$, such as masks for box spline refinable functions (see [20] for box splines), most refinable functions $\phi$, obtained in (13) from mask $a$ and dilation $M$, do not have explicit analytic expressions, and a so-called cascade algorithm or subdivision scheme is used to approximate the refinable function $\phi$. Let $\mathcal{B}$ be a Banach space of bivariate functions. Starting with a suitable initial function $f\in\mathcal{B}$, we iteratively compute a sequence $\{Q_{a,M}^n f\}_{n=0}^{\infty}$ of functions, where
$$Q_{a,M} f := |\det M| \sum_{k\in\mathbb{Z}^2} a_k\, f(M\cdot - k). \quad (14)$$
If the sequence $\{Q_{a,M}^n f\}_{n=0}^{\infty}$ of functions converges to some $f_\infty\in\mathcal{B}$ in the Banach space $\mathcal{B}$, then we have $Q_{a,M} f_\infty = f_\infty$. That is, as a fixed point of the cascade operator $Q_{a,M}$, $f_\infty$ is a solution to the refinement equation (10). A cascade algorithm plays an important role in the study of refinable functions in wavelet analysis and of subdivision surfaces in computer graphics. For more details on cascade algorithms and subdivision schemes, see [4,19,21,22,23,31,36,54] and numerous references therein.

One of the most important properties of a refinable function $\phi$ is its smoothness, which is measured by its $L_p$ smoothness exponent $\nu_p(\phi)$, defined to be
$$\nu_p(\phi) := \sup\{\nu : \phi\in W_p^\nu(\mathbb{R}^2)\}, \qquad 1\le p\le\infty, \quad (15)$$
where $W_p^\nu(\mathbb{R}^2)$ denotes the fractional Sobolev space of order $\nu > 0$ in $L_p(\mathbb{R}^2)$. The notion of sum rules of a mask is closely related to the vanishing moments of a wavelet system ([16,44]). For a bivariate mask $a$, we say that $a$ satisfies the sum rules of order $J$ with the dilation matrix $M$ if
$$\sum_{k\in\mathbb{Z}^2} a_{j+Mk}\, P(j+Mk) = \sum_{k\in\mathbb{Z}^2} a_{Mk}\, P(Mk), \qquad \forall\, P\in\Pi_{J-1}, \quad (16)$$
or equivalently, $\partial_1^{\mu_1}\partial_2^{\mu_2}\hat a(2\pi\omega) = 0$ for all $\mu_1+\mu_2 < J$ and $\omega\in[(M^{\mathsf T})^{-1}\mathbb{Z}^2]\backslash\mathbb{Z}^2$, where $\partial_1$ and $\partial_2$ denote the partial derivatives in the first and second coordinate, respectively. Throughout the article, we denote by $\operatorname{sr}(a,M)$ the highest order of sum rules satisfied by the mask $a$ with the dilation matrix $M$.

To investigate various properties of refinable functions, next we introduce a quantity $\nu_p(a;M)$ in wavelet analysis (see [31]). For a bivariate mask $a$ and a dilation matrix $M$, we denote
$$\nu_p(a;M) := -\log_{\rho(M)}\big[|\det M|^{1-1/p}\,\rho(a;M;p)\big], \qquad 1\le p\le\infty, \quad (17)$$
where $\rho(M)$ denotes the spectral radius of $M$ and
$$\rho(a;M;p) := \max\Big\{\limsup_{n\to\infty} \|a_{n;(\mu_1,\mu_2)}\|^{1/n}_{\ell_p(\mathbb{Z}^2)} \;:\; \mu_1+\mu_2 = \operatorname{sr}(a;M),\ \mu_1,\mu_2\in\mathbb{N}\cup\{0\}\Big\},$$
where the sequence $a_{n;(\mu_1,\mu_2)}$ is defined by its Fourier series as follows: for $\xi = (\xi_1,\xi_2)\in\mathbb{R}^2$,
$$\widehat{a_{n;(\mu_1,\mu_2)}}(\xi) := (1-e^{-i\xi_1})^{\mu_1}(1-e^{-i\xi_2})^{\mu_2}\,\hat a\big((M^{\mathsf T})^{n-1}\xi\big)\,\hat a\big((M^{\mathsf T})^{n-2}\xi\big)\cdots\hat a\big(M^{\mathsf T}\xi\big)\,\hat a(\xi).$$
It is known that ([31,36]) $\nu_p(a;M) \ge \nu_q(a;M) \ge \nu_p(a;M) + (1/q - 1/p)\log_{\rho(M)}|\det M|$ for all $1\le p\le q\le\infty$. Parts of the following result essentially appeared in various forms in many papers in the literature. For the study of refinable functions and cascade algorithms, see [4,10,12,15,16,22,23,25,29,31,45,48,53] and references therein. The following is from [31].

Theorem 1  Let $M$ be a $2\times 2$ isotropic dilation matrix and $a$ be a finitely supported bivariate mask. Let $\phi$ denote the standard refinable function associated with mask $a$ and dilation $M$. Then $\nu_p(\phi) \ge \nu_p(a;M)$ for all $1\le p\le\infty$. If the shifts of $\phi$ are stable, that is, $\{\phi(\cdot-k) : k\in\mathbb{Z}^2\}$ is a Riesz system in $L_p(\mathbb{R}^2)$, then $\nu_p(\phi) = \nu_p(a;M)$. Moreover, for every nonnegative integer $J$, the following statements are equivalent:

1. For every compactly supported function $f\in W_p^J(\mathbb{R}^2)$ (if $p=\infty$, we require $f\in C^J(\mathbb{R}^2)$) such that $\hat f(0) = 1$ and $\partial_1^{\mu_1}\partial_2^{\mu_2}\hat f(2\pi k) = 0$ for all $\mu_1+\mu_2 < J$ and $k\in\mathbb{Z}^2\backslash\{0\}$, the cascade sequence $\{Q_{a,M}^n f\}_{n=0}^{\infty}$ converges in the Sobolev space $W_p^J(\mathbb{R}^2)$ (in fact, the limit function is $\phi$).
2. $\nu_p(a;M) > J$.
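The cascade operator (14) can be implemented directly. The sketch below (an illustration, not the article's code; it assumes $M = 2I_2$ and the tensor-product mask of the piecewise-linear "hat" refinable function) checks the fixed-point property $Q_{a,M} f_\infty = f_\infty$:

```python
import numpy as np

# tensor-product mask of the hat (piecewise-linear) refinable function,
# normalized so that sum_k a_k = 1 as required after (10)
a1 = {-1: 0.25, 0: 0.5, 1: 0.25}
mask = {(k1, k2): a1[k1] * a1[k2] for k1 in a1 for k2 in a1}

def hat(t):
    return max(1.0 - abs(t), 0.0)

def f0(x, y):                 # tensor hat: the exact refinable function here
    return hat(x) * hat(y)

def cascade(f, n):
    """n applications of Q_{a,M}, M = 2*I_2: (Qf)(x) = 4 * sum_k a_k f(2x - k)."""
    if n == 0:
        return f
    g = cascade(f, n - 1)
    return lambda x, y: 4.0 * sum(c * g(2 * x - k1, 2 * y - k2)
                                  for (k1, k2), c in mask.items())

f3 = cascade(f0, 3)
err = abs(f3(0.3, -0.45) - f0(0.3, -0.45))
print(err)   # ~0: the tensor hat is a fixed point of the cascade operator
```

Starting instead from a generic admissible initial function, the iterates $Q_{a,M}^n f$ would converge to this same hat function, as Theorem 1 describes.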
Symmetry of a refinable function is another important property of a wavelet system in applications. For example, symmetry is desirable in order to handle the boundary of an image or to improve the visual quality of a surface ([16,21,22,32,50,58]). Let $M$ be a $2\times 2$ dilation matrix and $G$ be a finite set of $2\times 2$ integer matrices. We say that $G$ is a symmetry group with respect to $M$ ([27,29]) if $G$ is a group under matrix multiplication and $MEM^{-1}\in G$ for all $E\in G$. In dimension two, two commonly used symmetry groups are $D_4$ and $D_6$:
$$D_4 := \Big\{\pm\begin{bmatrix}1&0\\0&1\end{bmatrix},\ \pm\begin{bmatrix}1&0\\0&-1\end{bmatrix},\ \pm\begin{bmatrix}0&1\\1&0\end{bmatrix},\ \pm\begin{bmatrix}0&1\\-1&0\end{bmatrix}\Big\}$$
and
$$D_6 := \Big\{\pm\begin{bmatrix}1&0\\0&1\end{bmatrix},\ \pm\begin{bmatrix}1&-1\\1&0\end{bmatrix},\ \pm\begin{bmatrix}0&-1\\1&-1\end{bmatrix},\ \pm\begin{bmatrix}0&1\\1&0\end{bmatrix},\ \pm\begin{bmatrix}1&0\\1&-1\end{bmatrix},\ \pm\begin{bmatrix}1&-1\\0&-1\end{bmatrix}\Big\}.$$
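That $D_6$ is indeed a symmetry group can be verified mechanically; a sketch assuming the standard twelve-element hexagonal symmetry group, generated by the $\pi/3$ rotation $R$ and a reflection $S$ (the generator presentation is a convenience, not the article's notation):

```python
import numpy as np

R = np.array([[1, -1], [1, 0]])   # rotation by pi/3 on the hexagonal lattice
S = np.array([[0, 1], [1, 0]])    # reflection about the line x = y
D6 = [s * E for s in (1, -1)
      for E in (np.eye(2, dtype=int), R, R @ R, S, S @ R, S @ R @ R)]

key = lambda E: tuple(np.round(np.asarray(E, dtype=float)).astype(int).ravel())
elems = {key(E) for E in D6}
print(len(elems))                 # 12 distinct integer matrices

# group under matrix multiplication: closed under products and inverses
assert all(key(A @ B) in elems for A in D6 for B in D6)
assert all(key(np.linalg.inv(A)) in elems for A in D6)

# symmetry group with respect to M = 2*I_2: M E M^{-1} = E lies in D6
M = 2 * np.eye(2)
assert all(key(M @ A @ np.linalg.inv(M)) in elems for A in D6)
```

The inverses exist over the integers because every element has determinant $\pm 1$; the same script with the four $\pm$-pairs of $D_4$ verifies that group as well.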
For symmetric refinable functions, we have the following result ([32, Proposition 1]):

Proposition 2  Let $M$ be a $2\times 2$ dilation matrix and $G$ be a symmetry group with respect to $M$. Let $a$ be a bivariate mask and $\phi$ be the standard refinable function associated with mask $a$ and dilation $M$. Then $a$ is $G$-symmetric with center $c_a$, that is, $a_{E(k-c_a)+c_a} = a_k$ for all $k\in\mathbb{Z}^2$ and $E\in G$, if and only if $\phi$ is $G$-symmetric with center $c$:
$$\phi(E(\cdot-c)+c) = \phi, \qquad \forall\, E\in G, \quad\text{with } c := (M-I_2)^{-1} c_a.$$

In the following, let us present some examples of bivariate refinable functions.

Example 3  Let $M = 2I_2$ and $\hat a(\xi_1,\xi_2) := \cos^4(\xi_1/2)\cos^4(\xi_2/2)$. Then $a$ is a tensor product mask with $\operatorname{sr}(a;2I_2) = 4$ and $a$ is $D_4$-symmetric with center $0$. The associated standard refinable function is the tensor product spline of order 4. The Catmull–Clark subdivision scheme for quadrilateral meshes in computer graphics is based on this mask $a$ ([21]).

Example 4  Let $M = 2I_2$ and $\hat a(\xi_1,\xi_2) := \cos^2(\xi_1/2)\cos^2(\xi_2/2)\cos^2(\xi_1/2+\xi_2/2)$. Then $\operatorname{sr}(a;2I_2) = 4$ and $a$ is $D_6$-symmetric with center $0$. The associated refinable function is the convolution of the three-direction box spline with itself ([20]). The Loop subdivision scheme for triangular meshes in computer graphics is based on this mask $a$ ([21]).

Another important property of a refinable function is interpolation. We say that a bivariate function $\phi$ is interpolating if $\phi$ is continuous and $\phi(k) = \delta_k$ for all $k\in\mathbb{Z}^2$.

Theorem 5  Let $M$ be a $2\times 2$ dilation matrix and $a$ be a finitely supported bivariate mask. Let $\phi$ denote the standard refinable function associated with mask $a$ and dilation $M$. Then $\phi$ is an interpolating function if and only if $\nu_\infty(a;M) > 0$ and $a$ is an interpolatory mask with dilation $M$, that is, $a_0 = |\det M|^{-1}$ and $a_{Mk} = 0$ for all $k\in\mathbb{Z}^2\backslash\{0\}$.

Example 6  Let $M = 2I_2$ and
$$\hat a(\xi_1,\xi_2) := \cos(\xi_1/2)\cos(\xi_2/2)\cos(\xi_1/2+\xi_2/2)\,\big[1 + 2\cos(\xi_1) + 2\cos(\xi_2) + 2\cos(\xi_1+\xi_2) - \cos(2\xi_1+\xi_2) - \cos(\xi_1+2\xi_2) - \cos(\xi_1-\xi_2)\big]/4. \quad (18)$$
Then $\operatorname{sr}(a;2I_2) = 4$, $a$ is $D_6$-symmetric with center $0$, and $a$ is an interpolatory mask with dilation $2I_2$. Since $\nu_2(a;2I_2) \approx 2.44077$, we have $\nu_\infty(a;2I_2) \ge \nu_2(a;2I_2) - 1 \approx 1.44077 > 0$. By Theorem 5, the associated standard refinable function is interpolating. The butterfly interpolatory subdivision scheme for triangular meshes is based on this mask $a$ (see [23]). For more details on bivariate interpolatory masks and interpolating refinable functions, see [4,19,22,23,25,26,27,31,37,38,39,59].

The Projection Method

In applications, one is interested in analyzing some optimal properties of multivariate wavelets. The projection method is useful for this purpose. Let $r$ and $s$ be two positive integers with $r\le s$. Let $P$ be an $r\times s$ real-valued matrix. For a compactly supported function $\phi$ and a finitely supported mask $a$ in dimension $s$, we define the projected function $P\phi$ and the projected mask $Pa$ in dimension $r$ by
$$\widehat{P\phi}(\xi) := \hat\phi(P^{\mathsf T}\xi) \quad\text{and}\quad \widehat{Pa}(\xi) := \sum_{j=1}^{t} \hat a(P^{\mathsf T}\xi + 2\pi\varepsilon_j), \qquad \xi\in\mathbb{R}^r, \quad (19)$$
where $\hat\phi$ and $\hat a$ are understood to be continuous and $\{\varepsilon_1,\dots,\varepsilon_t\}$ is a complete set of representatives of the distinct cosets of $[P^{\mathsf T}\mathbb{R}^r]/\mathbb{Z}^s$. If $P$ is an integer matrix, then $\widehat{Pa}(\xi) = \hat a(P^{\mathsf T}\xi)$. Now we have the following result on projected refinable functions ([27,28,34,35]).

Theorem 7  Let $N$ be an $s\times s$ dilation matrix and $M$ be an $r\times r$ dilation matrix. Let $P$ be an $r\times s$ integer matrix such that $PN = MP$ and $P\mathbb{Z}^s = \mathbb{Z}^r$. Let $\hat a$ be a $2\pi$-periodic trigonometric polynomial in $s$ variables with $\hat a(0) = 1$ and let $\phi$ be the standard $N$-refinable function associated with mask $a$.
Then $\operatorname{sr}(a;N) \le \operatorname{sr}(Pa;M)$ and $\widehat{P\phi}(M^{\mathsf T}\xi) = \widehat{Pa}(\xi)\,\widehat{P\phi}(\xi)$. That is, $P\phi$ is $M$-refinable with mask $Pa$. Moreover, for all $1\le p\le\infty$,
$$\nu_p(\phi) \le \nu_p(P\phi) \quad\text{and}\quad |\det M|^{1-1/p}\,\rho(Pa;M;p) \le |\det N|^{1-1/p}\,\rho(a;N;p).$$
If we further assume that $\rho(M) = \rho(N)$, then $\nu_p(a;N) \le \nu_p(Pa;M)$.

As pointed out in [34,35], the projection method is closely related to box splines. For a given $r\times s$ (direction) integer matrix $\Gamma$ of rank $r$ with $r\le s$, the Fourier transform of its associated box spline $M_\Gamma$ and its mask $a_\Gamma$ are given by (see [20])
$$\widehat{M_\Gamma}(\xi) := \prod_{k\in\Gamma} \frac{1-e^{-ik\cdot\xi}}{ik\cdot\xi} \quad\text{and}\quad \widehat{a_\Gamma}(\xi) := \prod_{k\in\Gamma} \frac{1+e^{-ik\cdot\xi}}{2}, \qquad \xi\in\mathbb{R}^r, \quad (20)$$
where $k\in\Gamma$ means that $k$ is a column vector of $\Gamma$ and $k$ goes through all the columns of $\Gamma$ once and only once. Let $\chi_{[0,1]^s}$ denote the characteristic function of the unit cube $[0,1]^s$. From (20), it is evident that the box spline $M_\Gamma$ is just the projected function $\Gamma\chi_{[0,1]^s}$, since $\widehat{\Gamma\chi_{[0,1]^s}}(\xi) = \widehat{\chi_{[0,1]^s}}(\Gamma^{\mathsf T}\xi)$, $\xi\in\mathbb{R}^r$. Note that $M_\Gamma$ is $2I_r$-refinable with the mask $a_\Gamma$, since $\widehat{M_\Gamma}(2\xi) = \widehat{a_\Gamma}(\xi)\,\widehat{M_\Gamma}(\xi)$. As an application of the projection method in Theorem 7, we have ([25, Theorem 3.5]):

Corollary 8  Let $M = 2I_s$ and $a$ be an interpolatory mask with dilation $M$ such that $a$ is supported inside $[-3,3]^s$. Then $\nu_\infty(a;2I_s) \le 2$ and therefore $\phi\notin C^2(\mathbb{R}^s)$, where $\phi$ is the standard refinable function associated with mask $a$ and dilation $2I_s$.

We use proof by contradiction. Suppose $\nu_\infty(a;2I_s) > 2$. Let $P = [1,0,\dots,0]$ be a $1\times s$ matrix. Then we must have $\operatorname{sr}(a;2I_s) \ge 3$, which, combined with the other assumptions on $a$, forces $\widehat{Pa}(\xi) = 1/2 + (9/16)\cos(\xi) - (1/16)\cos(3\xi)$. Since $\nu_\infty(Pa;2) = 2$, by Theorem 7 we must have $\nu_\infty(a;2I_s) \le \nu_\infty(Pa;2) = 2$, a contradiction. So, $\nu_\infty(a;2I_s) \le 2$. In particular, the refinable function in the butterfly subdivision scheme in Example 6 is not $C^2$.

The projection method can be used to construct interpolatory masks painlessly ([27]).

Theorem 9  Let $M$ be an $r\times r$ dilation matrix. Then there is an $r\times r$ integer matrix $H$ such that $M\mathbb{Z}^r = H\mathbb{Z}^r$ and $H^r = |\det M|\, I_r$. Let $P := |\det M|^{-1} H$. Then for any (tensor product) interpolatory mask $a$ with dilation $|\det M|\, I_r$, the projected mask $Pa$ is an interpolatory mask with dilation $M$ and $\operatorname{sr}(Pa;M) \ge \operatorname{sr}(a;|\det M|\, I_r)$.

Example 10  For $M = M_{\sqrt{2}}$ or $M = Q_{\sqrt{2}}$, we can take $H := M_{\sqrt{2}}$ in Theorem 9.

For more details on the projection method, see [25,27,28,32,34,35].

Bivariate Orthonormal and Biorthogonal Wavelets

In this section, we shall discuss the analysis and construction of bivariate orthonormal and biorthogonal wavelets. For the analysis of biorthogonal wavelets, the following result is well known (see [9,11,16,24,26,31,46,51,53] and references therein):

Theorem 11  Let $M$ be a $2\times 2$ dilation matrix and $a, \tilde a$ be two finitely supported bivariate masks. Let $\phi$ and $\tilde\phi$ be the standard $M$-refinable functions associated with masks $a$ and $\tilde a$, respectively. Then $\phi, \tilde\phi\in L_2(\mathbb{R}^2)$ and satisfy the biorthogonality relation
˜ k)i D ı k ; h; (
k 2 Z2 ;
(21)
˜ M) > 0, and (a; a˜) is if and only if, 2 (a; M) > 0, 2 ( a; a pair of dual masks: X aˆ( C 2 )aˆ˜ ( C 2 ) D 1 ; 2 R2 ; (22) 2 M T
where M T is a complete set of representatives of distinct cosets of [(M T )1 Z2 ]/Z2 with 0 2 M T . Moreover, if (21) holds and there exist 2periodic trigonometric polynomi˜1 ; : : : ; a˜ m1 with m :D j det Mj such als ab1 ; : : : ; a m1 , ab that
  M_{[\hat{a},\hat{a}^{1},\ldots,\hat{a}^{m-1}]}(\xi)\, \overline{M_{[\hat{\tilde{a}},\hat{\tilde{a}}^{1},\ldots,\hat{\tilde{a}}^{m-1}]}(\xi)}^{T} = I_m , \qquad \xi \in \mathbb{R}^2 ,   (23)

where for ξ ∈ R²,

  M_{[\hat{a},\hat{a}^{1},\ldots,\hat{a}^{m-1}]}(\xi) :=
  \begin{bmatrix}
  \hat{a}(\xi+2\pi\omega_0) & \hat{a}^{1}(\xi+2\pi\omega_0) & \cdots & \hat{a}^{m-1}(\xi+2\pi\omega_0)\\
  \hat{a}(\xi+2\pi\omega_1) & \hat{a}^{1}(\xi+2\pi\omega_1) & \cdots & \hat{a}^{m-1}(\xi+2\pi\omega_1)\\
  \vdots & \vdots & \ddots & \vdots\\
  \hat{a}(\xi+2\pi\omega_{m-1}) & \hat{a}^{1}(\xi+2\pi\omega_{m-1}) & \cdots & \hat{a}^{m-1}(\xi+2\pi\omega_{m-1})
  \end{bmatrix}   (24)

with {ω₀, …, ω_{m−1}} := Ω_{M^T} and ω₀ := 0. Define ψ¹, …, ψ^{m−1}, ψ̃¹, …, ψ̃^{m−1} by

  \hat{\psi}^{\ell}(M^T\xi) := \hat{a}^{\ell}(\xi)\,\hat{\varphi}(\xi) \quad \text{and} \quad \hat{\tilde{\psi}}^{\ell}(M^T\xi) := \hat{\tilde{a}}^{\ell}(\xi)\,\hat{\tilde{\varphi}}(\xi) , \qquad \ell = 1, \ldots, m-1 .   (25)

Then ({ψ¹, …, ψ^{m−1}}, {ψ̃¹, …, ψ̃^{m−1}}) generates a pair of biorthogonal M-wavelet bases in L₂(R²).
Bivariate (Twodimensional) Wavelets
As a direct consequence of Theorem 11, for orthonormal wavelets, we have

Corollary 12 Let M be a 2×2 dilation matrix and a be a finitely supported bivariate mask. Let φ be the standard refinable function associated with mask a and dilation M. Then φ has orthonormal shifts, that is, ⟨φ, φ(·−k)⟩ = δ_k for all k ∈ Z², if and only if ν₂(a; M) > 0 and a is an orthogonal mask (that is, (22) is satisfied with ã̂ replaced by â). If in addition there exist 2π-periodic trigonometric polynomials â¹, …, â^{m−1} with m := |det M| such that M_{[\hat{a},\hat{a}^{1},\ldots,\hat{a}^{m-1}]}(\xi)\, \overline{M_{[\hat{a},\hat{a}^{1},\ldots,\hat{a}^{m-1}]}(\xi)}^{T} = I_m, then {ψ¹, …, ψ^{m−1}}, defined in (25), generates an orthonormal M-wavelet basis in L₂(R²).

The masks a and ã are called lowpass filters and the masks a¹, …, a^{m−1}, ã¹, …, ã^{m−1} are called highpass filters in the engineering literature. The equation in (23) is called the matrix extension problem in the wavelet literature (see [6,16,46,58]). Given a pair of dual masks a and ã, though the existence of â¹, …, â^{m−1}, ã̂¹, …, ã̂^{m−1} (without symmetry) in the matrix extension problem in (23) is guaranteed by the Quillen–Suslin theorem, it is far from a trivial task to construct them (in particular, with symmetry) algorithmically. For the orthogonal case, the general theory of the matrix extension problem in (23) with ã = a and ã^ℓ = a^ℓ, ℓ = 1, …, m−1, remains unanswered. However, when |det M| = 2, the matrix extension problem is trivial. In fact, letting ω ∈ Ω_{M^T}∖{0} (that is, Ω_{M^T} = {0, ω}), one defines â¹(ξ) := e^{−iξ·γ} \overline{ã̂(ξ+2πω)} and ã̂¹(ξ) := e^{−iξ·γ} \overline{â(ξ+2πω)}, where γ ∈ Z² satisfies γ·ω = 1/2. It is not an easy task to construct nonseparable orthogonal masks with desirable properties for a general dilation matrix. For example, for the dilation matrix Q_{√2} in (1), it is not known so far in the literature [9] whether there is a compactly supported C¹ Q_{√2}-refinable function with orthonormal shifts. See [1,2,9,25,27,30,32,50,51,53,57] for more details on the construction of orthogonal masks and orthonormal refinable functions. However, for any dilation matrix, "separable" orthogonal masks with arbitrarily high orders of sum rules can be easily obtained via the projection method. Let M be a 2×2 dilation matrix. Then M = E diag(d₁, d₂) F for some integer matrices E and F such that |det E| = |det F| = 1 and d₁, d₂ ∈ N. Let a be a tensor product orthogonal mask with the diagonal dilation matrix diag(d₁, d₂) satisfying the sum rules of any preassigned order J. Then the projected mask Ea (that is, (Ea)_k := a_{E^{−1}k}, k ∈ Z²) is an orthogonal mask with dilation M and sr(Ea; M) ≥ J. See [27, Corollary 3.4] for more detail.
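To make the orthogonality condition in (22) concrete, the following sketch (an illustration, not taken from the article) numerically checks it for the simplest separable example, the tensor product Haar mask with dilation 2I₂, whose coefficients are a_k = 1/4 on {0,1}²:

```python
import numpy as np

# Symbol of the separable Haar lowpass mask for dilation 2*I_2:
# a_hat(xi) = (1 + e^{-i xi1})(1 + e^{-i xi2}) / 4
def a_hat(xi1, xi2):
    return (1 + np.exp(-1j * xi1)) * (1 + np.exp(-1j * xi2)) / 4.0

# For M = 2*I_2, Omega_{M^T} = {(0,0), (1/2,0), (0,1/2), (1/2,1/2)},
# so the shifts 2*pi*omega are by 0 or pi in each coordinate.
shifts = [(0, 0), (np.pi, 0), (0, np.pi), (np.pi, np.pi)]

def orthogonality_defect(xi1, xi2):
    # For an orthogonal mask, sum_omega |a_hat(xi + 2 pi omega)|^2 = 1,
    # i.e. (22) with the dual symbol equal to a_hat itself.
    s = sum(abs(a_hat(xi1 + u, xi2 + v)) ** 2 for (u, v) in shifts)
    return abs(s - 1.0)

rng = np.random.default_rng(0)
pts = rng.uniform(-np.pi, np.pi, size=(100, 2))
defect = max(orthogonality_defect(x, y) for x, y in pts)
print(defect)  # numerically zero
```

Analytically the sum collapses to (cos²(ξ₁/2)+sin²(ξ₁/2))(cos²(ξ₂/2)+sin²(ξ₂/2)) = 1, which is what the numerical check confirms.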
Using the highpass filters in the matrix extension problem, for a given mask a, dual masks ã of a (without any guaranteed order of sum rules of the dual masks) can be obtained by the lifting scheme in [63] and the stable completion method in [3]. Special families of dual masks with sum rules can be obtained by the convolution method in [24, Proposition 3.7] and [43], but the constructed dual masks generally have longer supports with respect to their orders of sum rules. Another method, which we shall cite here, for constructing all finitely supported dual masks with any preassigned orders of sum rules is the CBC (coset by coset) algorithm proposed in [5,25,26]. For μ = (μ₁, μ₂) and ν = (ν₁, ν₂), we say ν ≤ μ if ν₁ ≤ μ₁ and ν₂ ≤ μ₂. Also we denote |μ| = μ₁ + μ₂, μ! := μ₁! μ₂!, and x^μ := x₁^{μ₁} x₂^{μ₂} for x = (x₁, x₂). We denote by Ω_M a complete set of representatives of the distinct cosets of Z²/[MZ²] with 0 ∈ Ω_M. The following result is from [25].

Theorem 13 (CBC Algorithm) Let M be a 2×2 dilation matrix and a be an interpolatory mask with dilation M. Let J be any preassigned positive integer.

1. Compute the quantities h_a^μ, |μ| < J, from the mask a by the recursive formula:

  h_a^{\mu} := \delta_{\mu} - \sum_{0 \le \nu < \mu} \frac{\mu!}{\nu!\,(\mu-\nu)!}\, h_a^{\nu} \sum_{k \in \mathbb{Z}^2} (-1)^{|\mu-\nu|}\, a_k\, k^{\mu-\nu} .

The dual mask ã is then obtained coset by coset from these quantities; see [25] for the remaining steps of the algorithm.

Theorem 16 Let M be a 2×2 dilation matrix and a be a finitely supported mask with dilation M. If ν₂(a; M) > 0 and ν₂(ã; M) > 0, where ã̂ is the (1,1)-entry of the matrix [M_{[\hat{a},\hat{a}^{1},\ldots,\hat{a}^{m-1}]}(\xi)^{T}]^{-1}, then {ψ¹, …, ψ^{m−1}}, which are defined in (25), generates a Riesz M-wavelet basis in L₂(R²).

Example 17 Let M = 2I₂ and â(ξ₁, ξ₂) := cos²(ξ₁/2) cos²(ξ₂/2) cos²((ξ₁+ξ₂)/2) in Example 4. Then sr(a; 2I₂) = 4 and a is D₆-symmetric with center 0. Define ([41,60])

  \hat{a}^{1}(\xi_1, \xi_2) := e^{-i(\xi_1+\xi_2)}\, \hat{a}(\xi_1+\pi, \xi_2) ,
  \hat{a}^{2}(\xi_1, \xi_2) := e^{-i\xi_2}\, \hat{a}(\xi_1, \xi_2+\pi) ,
  \hat{a}^{3}(\xi_1, \xi_2) := e^{-i\xi_1}\, \hat{a}(\xi_1+\pi, \xi_2+\pi) .

Then all the conditions in Theorem 16 are satisfied and {ψ¹, ψ², ψ³} generates a Riesz 2I₂-wavelet basis in L₂(R²) ([41]). This Riesz wavelet, derived from the Loop scheme, has been used in [49] for mesh compression in computer graphics with impressive performance.

Pairs of Dual Wavelet Frames

In this section, we mention a method for constructing bivariate dual wavelet frames. The following Oblique Extension Principle (OEP) has been proposed in [18] (and independently in [7]; also see [17]) for constructing pairs of dual wavelet frames.

Theorem 18 Let M be a 2×2 dilation matrix and a, ã be two finitely supported masks. Let φ, φ̃ be the standard M-refinable functions with masks a and ã, respectively, such that φ, φ̃ ∈ L₂(R²). If there exist 2π-periodic trigonometric
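As a quick sanity check on Example 17 (a sketch for illustration, not part of the article), the lowpass symbol equals 1 at the origin while the three highpass symbols vanish there, which is the basic normalization/vanishing-moment requirement for wavelet masks:

```python
import numpy as np

# Loop-scheme lowpass symbol from Example 17
def a_hat(x1, x2):
    return (np.cos(x1 / 2) ** 2) * (np.cos(x2 / 2) ** 2) * (np.cos((x1 + x2) / 2) ** 2)

# Highpass symbols: modulated shifts of the lowpass symbol
def a1_hat(x1, x2): return np.exp(-1j * (x1 + x2)) * a_hat(x1 + np.pi, x2)
def a2_hat(x1, x2): return np.exp(-1j * x2) * a_hat(x1, x2 + np.pi)
def a3_hat(x1, x2): return np.exp(-1j * x1) * a_hat(x1 + np.pi, x2 + np.pi)

print(a_hat(0.0, 0.0))        # 1.0: lowpass normalization a_hat(0) = 1
print(abs(a1_hat(0.0, 0.0)))  # 0.0
print(abs(a2_hat(0.0, 0.0)))  # 0.0
print(abs(a3_hat(0.0, 0.0)))  # 0.0
```

Each highpass symbol vanishes at the origin because the shifted lowpass symbol contains a factor cos²(π/2) = 0 there.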
polynomials Θ, â¹, …, â^L, ã̂¹, …, ã̂^L such that Θ(0) = 1, â¹(0) = ⋯ = â^L(0) = ã̂¹(0) = ⋯ = ã̂^L(0) = 0, and

  M_{[\widehat{\Theta(M^T\cdot)a},\,\hat{a}^{1},\ldots,\hat{a}^{L}]}(\xi)\, \overline{M_{[\hat{\tilde{a}},\,\hat{\tilde{a}}^{1},\ldots,\hat{\tilde{a}}^{L}]}(\xi)}^{T} = \operatorname{diag}\big(\Theta(\xi+2\pi\omega_0), \ldots, \Theta(\xi+2\pi\omega_{m-1})\big) ,   (26)

where {ω₀, …, ω_{m−1}} = Ω_{M^T} with ω₀ = 0 as in Theorem 11. Then ({ψ¹, …, ψ^L}, {ψ̃¹, …, ψ̃^L}), defined in (25), generates a pair of dual M-wavelet frames in L₂(R²).

For dimension one, many interesting tight wavelet frames and pairs of dual wavelet frames in L₂(R) have been constructed via the OEP method in the literature; for more details, see [7,17,18,24,30,52,61,62] and references therein. The application of the OEP in high dimensions is much more difficult, mainly due to the matrix extension problem in (26). The projection method can also be used to obtain pairs of dual wavelet frames ([34,35]).

Theorem 19 Let M be an r×r dilation matrix and N be an s×s dilation matrix with r ≤ s. Let P be an r×s integer matrix of rank r such that MP = PN and P^T(Z^r∖[M^T Z^r]) ⊆ Z^s∖[N^T Z^s]. Let ψ¹, …, ψ^L, ψ̃¹, …, ψ̃^L be compactly supported functions in L₂(R^s) such that

  ν₂(ψ^ℓ) > 0 and ν₂(ψ̃^ℓ) > 0 for all ℓ = 1, …, L.
If ({ψ¹, …, ψ^L}, {ψ̃¹, …, ψ̃^L}) generates a pair of dual N-wavelet frames in L₂(R^s), then ({Pψ¹, …, Pψ^L}, {Pψ̃¹, …, Pψ̃^L}) generates a pair of dual M-wavelet frames in L₂(R^r).

Example 20 Let Ξ be an r×s (direction) integer matrix such that Ξ^T(Z^r∖[2Z^r]) ⊆ Z^s∖[2Z^s]. Let M = 2I_r and N = 2I_s. Let {ψ¹, …, ψ^{2^s−1}} be the generators of the tensor product Haar orthonormal wavelet in dimension s, derived from the Haar orthonormal refinable function φ^s := χ_{[0,1]^s}. Then by Theorem 19, {Ξψ¹, …, Ξψ^{2^s−1}} generates a tight 2I_r-wavelet frame in L₂(R^r), and all the projected wavelet functions are derived from the refinable box spline function φ = M_Ξ.

Future Directions

There are still many challenging problems on bivariate wavelets. In the following, we only mention a few here.

1. For any 2×2 dilation matrix M, e.g., M = Q_{√2} in (1), can one always construct a family of MRA compactly supported orthonormal M-wavelet bases with arbitrarily high smoothness (and with symmetry if |det M| > 2)?
2. The matrix extension problem in (23) for orthogonal masks. That is, for a given orthogonal mask a with dilation M, find finitely supported highpass filters a¹, …, a^{m−1} (with symmetry if possible) such that (23) holds with ã = a and ã^ℓ = a^ℓ.

3. The matrix extension problem in (23) for a given pair of dual masks with symmetry. That is, for a given pair of symmetric dual masks a and ã with dilation M, find finitely supported symmetric highpass filters a¹, …, a^{m−1}, ã¹, …, ã^{m−1} such that (23) is satisfied.

4. Directional bivariate wavelets. In order to handle edges of different orientations in images, directional wavelets are of interest in applications. See [13,55] and many references on this topic.

Bibliography

Primary Literature
1. Ayache A (2001) Some methods for constructing nonseparable, orthonormal, compactly supported wavelet bases. Appl Comput Harmon Anal 10:99–111
2. Belogay E, Wang Y (1999) Arbitrarily smooth orthogonal nonseparable wavelets in R². SIAM J Math Anal 30:678–697
3. Carnicer JM, Dahmen W, Peña JM (1996) Local decomposition of refinable spaces and wavelets. Appl Comput Harmon Anal 3:127–153
4. Cavaretta AS, Dahmen W, Micchelli CA (1991) Stationary subdivision. Mem Amer Math Soc 93(453):1–186
5. Chen DR, Han B, Riemenschneider SD (2000) Construction of multivariate biorthogonal wavelets with arbitrary vanishing moments. Adv Comput Math 13:131–165
6. Chui CK (1992) An introduction to wavelets. Academic Press, Boston
7. Chui CK, He W, Stöckler J (2002) Compactly supported tight and sibling frames with maximum vanishing moments. Appl Comput Harmon Anal 13:224–262
8. Cohen A (2003) Numerical analysis of wavelet methods. North-Holland, Amsterdam
9. Cohen A, Daubechies I (1993) Non-separable bidimensional wavelet bases. Rev Mat Iberoamericana 9:51–137
10. Cohen A, Daubechies I (1996) A new technique to estimate the regularity of refinable functions. Rev Mat Iberoamericana 12:527–591
11. Cohen A, Daubechies I, Feauveau JC (1992) Biorthogonal bases of compactly supported wavelets. Comm Pure Appl Math 45:485–560
12. Cohen A, Gröchenig K, Villemoes LF (1999) Regularity of multivariate refinable functions. Constr Approx 15:241–255
13. Cohen A, Schlenker JM (1993) Compactly supported bidimensional wavelet bases with hexagonal symmetry. Constr Approx 9:209–236
14. Dahmen W (1997) Wavelet and multiscale methods for operator equations. Acta Numer 6:55–228
15. Daubechies I (1988) Orthonormal bases of compactly supported wavelets. Comm Pure Appl Math 41:909–996
16. Daubechies I (1992) Ten lectures on wavelets. CBMS-NSF Series. SIAM, Philadelphia
17. Daubechies I, Han B (2004) Pairs of dual wavelet frames from any two refinable functions. Constr Approx 20:325–352
18. Daubechies I, Han B, Ron A, Shen Z (2003) Framelets: MRA-based constructions of wavelet frames. Appl Comput Harmon Anal 14:1–46
19. Dahlke S, Gröchenig K, Maass P (1999) A new approach to interpolating scaling functions. Appl Anal 72:485–500
20. de Boor C, Höllig K, Riemenschneider SD (1993) Box splines. Springer, New York
21. DeRose T, Forsey DR, Kobbelt L, Lounsbery M, Peters J, Schröder P, Zorin D (1998) Subdivision for modeling and animation (course notes)
22. Dyn N, Levin D (2002) Subdivision schemes in geometric modeling. Acta Numer 11:73–144
23. Dyn N, Gregory JA, Levin D (1990) A butterfly subdivision scheme for surface interpolation with tension control. ACM Trans Graph 9:160–169
24. Han B (1997) On dual wavelet tight frames. Appl Comput Harmon Anal 4:380–413
25. Han B (2000) Analysis and construction of optimal multivariate biorthogonal wavelets with compact support. SIAM J Math Anal 31:274–304
26. Han B (2001) Approximation properties and construction of Hermite interpolants and biorthogonal multiwavelets. J Approx Theory 110:18–53
27. Han B (2002) Symmetry property and construction of wavelets with a general dilation matrix. Linear Algebra Appl 353:207–225
28. Han B (2002) Projectable multivariate refinable functions and biorthogonal wavelets. Appl Comput Harmon Anal 13:89–102
29. Han B (2003) Computing the smoothness exponent of a symmetric multivariate refinable function. SIAM J Matrix Anal Appl 24:693–714
30. Han B (2003) Compactly supported tight wavelet frames and orthonormal wavelets of exponential decay with a general dilation matrix. J Comput Appl Math 155:43–67
31. Han B (2003) Vector cascade algorithms and refinable function vectors in Sobolev spaces. J Approx Theory 124:44–88
32. Han B (2004) Symmetric multivariate orthogonal refinable functions. Appl Comput Harmon Anal 17:277–292
33. Han B (2006) On a conjecture about MRA Riesz wavelet bases. Proc Amer Math Soc 134:1973–1983
34. Han B (2006) The projection method in wavelet analysis. In: Chen G, Lai MJ (eds) Modern methods in mathematics. Nashboro Press, Brentwood, pp 202–225
35. Han B (2008) Construction of wavelets and framelets by the projection method. Int J Appl Math Appl 1:1–40
36. Han B, Jia RQ (1998) Multivariate refinement equations and convergence of subdivision schemes. SIAM J Math Anal 29:1177–1199
37. Han B, Jia RQ (1999) Optimal interpolatory subdivision schemes in multidimensional spaces. SIAM J Numer Anal 36:105–124
38. Han B, Jia RQ (2002) Quincunx fundamental refinable functions and quincunx biorthogonal wavelets. Math Comp 71:165–196
39. Han B, Jia RQ (2006) Optimal C² two-dimensional interpolatory ternary subdivision schemes with two-ring stencils. Math Comp 75:1287–1308
40. Han B, Jia RQ (2007) Characterization of Riesz bases of wavelets generated from multiresolution analysis. Appl Comput Harmon Anal 23:321–345
41. Han B, Shen Z (2005) Wavelets from the Loop scheme. J Fourier Anal Appl 11:615–637
42. Han B, Shen Z (2006) Wavelets with short support. SIAM J Math Anal 38:530–556
43. Ji H, Riemenschneider SD, Shen Z (1999) Multivariate compactly supported fundamental refinable functions, duals, and biorthogonal wavelets. Stud Appl Math 102:173–204
44. Jia RQ (1998) Approximation properties of multivariate wavelets. Math Comp 67:647–665
45. Jia RQ (1999) Characterization of smoothness of multivariate refinable functions in Sobolev spaces. Trans Amer Math Soc 351:4089–4112
46. Jia RQ, Micchelli CA (1991) Using the refinement equation for the construction of pre-wavelets II: powers of two. In: Laurent PJ, Le Méhauté A, Schumaker LL (eds) Curves and surfaces. Academic Press, New York, pp 209–246
47. Jia RQ, Wang JZ, Zhou DX (2003) Compactly supported wavelet bases for Sobolev spaces. Appl Comput Harmon Anal 15:224–241
48. Jiang QT (1998) On the regularity of matrix refinable functions. SIAM J Math Anal 29:1157–1176
49. Khodakovsky A, Schröder P, Sweldens W (2000) Progressive geometry compression. Proc SIGGRAPH
50. Kovačević J, Vetterli M (1992) Nonseparable multidimensional perfect reconstruction filter banks and wavelet bases for R^n. IEEE Trans Inform Theory 38:533–555
51. Lai MJ (2006) Construction of multivariate compactly supported orthonormal wavelets. Adv Comput Math 25:41–56
52. Lai MJ, Stöckler J (2006) Construction of multivariate compactly supported tight wavelet frames. Appl Comput Harmon Anal 21:324–348
53. Lawton W, Lee SL, Shen Z (1997) Stability and orthonormality of multivariate refinable functions. SIAM J Math Anal 28:999–1014
54. Lawton W, Lee SL, Shen Z (1998) Convergence of multidimensional cascade algorithm. Numer Math 78:427–438
55. Le Pennec E, Mallat S (2005) Sparse geometric image representations with bandelets. IEEE Trans Image Process 14:423–438
56. Lorentz R, Oswald P (2000) Criteria for hierarchical bases in Sobolev spaces. Appl Comput Harmon Anal 8:32–85
57. Maass P (1996) Families of orthogonal two-dimensional wavelets. SIAM J Math Anal 27:1454–1481
58. Mallat S (1998) A wavelet tour of signal processing. Academic Press, San Diego
59. Riemenschneider SD, Shen Z (1997) Multidimensional interpolatory subdivision schemes. SIAM J Numer Anal 34:2357–2381
60. Riemenschneider SD, Shen ZW (1992) Wavelets and pre-wavelets in low dimensions. J Approx Theory 71:18–38
61. Ron A, Shen Z (1997) Affine systems in L₂(R^d): the analysis of the analysis operator. J Funct Anal 148:408–447
62. Ron A, Shen Z (1997) Affine systems in L₂(R^d) II: dual systems. J Fourier Anal Appl 3:617–637
63. Sweldens W (1996) The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl Comput Harmon Anal 3:186–200
Books and Reviews
Cabrelli C, Heil C, Molter U (2004) Self-similarity and multiwavelets in higher dimensions. Mem Amer Math Soc 170(807):1–82
Branching Processes
MIKKO J. ALAVA¹, KENT BÆKGAARD LAURITSEN²
¹ Department of Engineering Physics, Espoo University of Technology, Espoo, Finland
² Research Department, Danish Meteorological Institute, Copenhagen, Denmark
Article Outline
Glossary
Definition of the Subject
Introduction
Branching Processes
Self-Organized Branching Processes
Scaling and Dissipation
Network Science and Branching Processes
Conclusions
Future Directions
Acknowledgments
Bibliography

Glossary

Markov process A process characterized by a set of probabilities to go from a certain state at time t to another state at time t + 1. These transition probabilities are independent of the history of the process and depend only on a fixed probability assigned to the transition.

Critical properties and scaling The behavior of equilibrium and many non-equilibrium systems in steady states contains critical points, where the systems display scale invariance and the correlation functions exhibit an algebraic behavior characterized by so-called critical exponents. A characteristic of this type of behavior is the lack of finite length and time scales (also reminiscent of fractals). The behavior near the critical points can be described by scaling functions that are universal and do not depend on the detailed microscopic dynamics.

Avalanches When a system is perturbed in such a way that a disturbance propagates throughout the system, one speaks of an avalanche. The local avalanche dynamics may either conserve energy (particles) or dissipate energy. The avalanche may also lose energy when it reaches the system boundary. In the neighborhood of a critical point the avalanche distribution is described by a power-law distribution.

Self-organized criticality (SOC) SOC is the surprising "critical" state in which many systems, from physics to biology to social ones, find themselves. In physics jargon, they exhibit scale invariance, which means that the dynamics – consisting of avalanches – has no typical scale in time or space. The really necessary ingredient is that there is a hidden, fine-tuned balance between how such systems are driven to create the dynamic response, and how they dissipate the input ("energy") to still remain in balance.

Networks These are descriptions of interacting systems, where in graph-theoretical language nodes or vertices are connected by links or edges. The interesting thing can be the structure of the network and its dynamics, or that of a process on top of it, like the spreading of computer viruses on the Internet.

Definition of the Subject

Consider the fate of a human population on a small, isolated island. It consists of a certain number of individuals, and the most obvious question, of importance in particular for the inhabitants of the island, is whether this number will go to zero. Humans die and reproduce in steps of one, and therefore one can try to analyze this fate mathematically by writing down what are called master equations, to describe the dynamics as a "branching process" (BP). The branching here means that if at time t = 0 there are N humans, at the next step t = 1 there can be N − 1 (or N + 1, or N + 2 if the only change from t = 0 was that a pair of twins was born). The outcome will depend in the simplest case on a "branching number" λ, or the number of offspring that a human being will have [1,2,3,4]. If the offspring created are too few, then the population will decay, or reach an "absorbing state" out of which it will never escape. Likewise, if they are many (the Malthusian case in reality, perhaps), exponential growth in time will ensue in the simplest case. In between, there is an interesting twist: a phase transition that separates these two outcomes at a critical value λ_c. As is typical of such transitions in statistical physics, one runs into scale-free behavior.
The lifetime of the population suddenly has no typical scale, and its total size will be a stochastic quantity, described by a probability distribution that again has no typical scale exactly at λ_c. The example of a small island also illustrates the many different twists that one can find in branching processes. The population can be "spatially dispersed", such that the individuals are separated by distance. There are in fact two interacting populations, called "male" and "female", and if the size of one of the populations becomes zero the other one will die out soon as well. The people on the island eat, and there is thus a hidden variable in the dynamics, the availability of food. This causes a history
effect which makes the dynamics of human population what is called "non-Markovian". Imagine, as above, that we look at the number of persons on the island at discrete times. A Markovian process is such that the probabilities to go from a state (say of) N to state N + δN depend only on the fixed probability assigned to the "transition" N → N + δN. Clearly, any relatively faithful description of the death and birth rates of human beings has to consider the average state of nourishment, or whether there is enough food for reproduction.

Introduction

Branching processes are often perfect models of complex systems, or in other words exhibit deep complexity themselves. Consider the following example of a one-dimensional model of activated random walkers [5]. Take a line of sites x_i, i = 1,…,L. Fill the sites randomly to a certain density n = N/L, where N is the preset number of individuals performing the activated random walk. Now let us apply the simple rule that if there are two or more walkers at the same x_j, two of them get "activated" and hop to j − 1 or j + 1, again at random. In other words, this version of the drunken bar-hoppers problem has the twist that they do not like each other. If the system is "periodic", i.e. i = 1 is connected to i = L, then the dynamics is controlled by the density n. For a critical value n_c (estimated by numerical simulations to be about 0.9488… [6]) a phase transition takes place, such that for n < n_c the asymptotic state is the "absorbing one", where all the walkers are immobilized since N_i = 1 or 0. In the opposite case, for n > n_c, there is an active phase such that (in the infinite-L limit) the activity persists forever. This particular model is unquestionably non-Markovian if one considers only the number of active walkers or their density ρ. One needs to know the full state of the N_i to be able to write down exact probabilities for how ρ changes in a discrete step of time.

The most interesting things happen if one changes the one-dimensional lattice by adopting two new rules. If a walker walks out (to i = 0 or i = L + 1), it disappears. Second, if there are no active walkers (ρ = 0), one adds one new walker randomly to the system. Now the activity ρ(t) is always at a marginal value, and the long-term average of n becomes a (possibly L-dependent) constant such that the first statement is true, i.e. the system becomes critical. With these rules, the model of activated random walkers is also known as the Manna model, after the Indian physicist S. S. Manna [5], and it exhibits the phenomenon dubbed "Self-Organized Criticality" (SOC) [7]. Figure 1 shows an example of the dynamics by using what is called an "activity plot",
Branching Processes, Figure 1 We follow the "activity" in a one-dimensional system of random activated walkers. The walkers stroll around the x-axis, and the pattern becomes in fact scale-invariant. This system is such that some of the walkers disappear (by escaping through open boundaries), and to maintain a constant density new ones are added. One question one may ask is what the waiting time is until another (or the same) walker gets activated at the same location after a period of inactivity. As one can see from the figure, this can be a result of old activity getting back, or a new "avalanche" starting from the addition of an outsider (courtesy of Lasse Laurson)
where those locations x_i are marked, both in space and time, which happen to contain just-activated walkers. One can now apply several kinds of measures to the system, but the figure already hints at the reasons why these simple models have found much interest. The structure of the activity is a self-affine fractal (for discussions of fractals, see other reviews in this volume). The main enthusiasm about SOC comes from the avalanches. These are in other words the bursts of activity that separate quiescent times (when ρ = 0). The silence is broken by the addition of a particle or a walker, and it creates an integrated quantity (volume) of activity, s = ∫₀^T ρ(t) dt, where ρ > 0 for 0 < t < T and ρ = 0 at the endpoints 0 and T. The original boost to SOC took place after Per Bak, Chao Tang, and Kay Wiesenfeld published in 1987 a highly influential paper in the premium physics journal Physical Review Letters, and introduced what is called the Bak–Tang–Wiesenfeld (BTW) sandpile model – of which the Manna one is a relative [7]. The BTW and Manna models and many others exhibit the important property that the avalanche sizes s have a scale-free probability distribution, but note that this simple criticality is not always true, not even for the BTW model, which shows so-called multiscaling [8,9]. Scale-free probability distributions are usually written as

  P(s) ∼ s^{−τ_s} f_s(s / L^{D_s}) ,   (1)
here all the subscripts refer to the fact that we look at avalanches. τ_s and D_s define the avalanche exponent and the cutoff exponent, respectively. f_s is a cutoff function that, together with D_s, encodes the fact that the avalanches are restricted somehow by the system size (in the one-dimensional Manna model, if s becomes too large many walkers are lost, and first n drops and then ρ goes to zero). Similar statements can be made about the avalanche durations (T), area or support A, and so forth [10,11,12,13,14]. The discovery of simple-to-define models, yet of very complex behavior, has been a welcome gift, since there are many, many phenomena that exhibit apparently scale-free statistics and/or bursty or intermittent dynamics similar to SOC models. These come from the natural sciences – but not only, since economics and sociology also give rise to cases that need explanations. One particular field where these ideas have found much interest is the physics of materials, ranging from understanding earthquakes to looking at the behavior of vortices in superconductors. The Gutenberg–Richter law of earthquake magnitudes is a power-law, and one can measure similar statistics from fracture experiments by measuring acoustic emission event energies from tensile tests on ordinary paper [15]. The modern theory of networks is concerned with graph or network structures which exhibit scale-free characteristics; in particular, the number of neighbors a vertex has is often a stochastic quantity exhibiting a power-law-like probability distribution [16,17]. Here, branching processes are also finding applications. They can describe the development of network structures, e.g., as measured by the degree distribution, the number of neighbors a node (vertex) is connected to, and as an extension of more usual models give rise to interesting results when the dynamics of populations on networks are being considered.
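The driven, open-boundary activated-walker (Manna) dynamics described above, with the avalanche size measured as the integrated activity s = Σ_t ρ(t), can be sketched as follows; this is an illustrative simulation with assumed parameters (L = 32, 1000 added walkers), not the authors' code:

```python
import random

def manna_avalanche(occ, L, rng):
    """Relax until no site holds >= 2 walkers; return integrated activity.

    occ: walker counts per site (indices 0..L-1). Walkers hopping past
    the open boundaries are lost (dissipation)."""
    size = 0
    active = [i for i in range(L) if occ[i] >= 2]
    while active:
        size += len(active)      # one unit of activity per active site per step
        for i in active:
            occ[i] -= 2          # two walkers get activated...
            for _ in range(2):
                j = i + rng.choice((-1, 1))
                if 0 <= j < L:
                    occ[j] += 1  # ...and hop to a neighbor (or leave the system)
        active = [i for i in range(L) if occ[i] >= 2]
    return size

rng = random.Random(1)
L = 32
occ = [0] * L
sizes = []
for _ in range(1000):            # slow drive: add one walker, then relax fully
    occ[rng.randrange(L)] += 1
    s = manna_avalanche(occ, L, rng)
    if s > 0:
        sizes.append(s)

density = sum(occ) / L           # self-organizes to a roughly constant n
print(density, max(sizes))
```

The density settles near a constant value while the collected avalanche sizes span a broad range, the qualitative signature of the self-organized critical state; measuring τ_s and D_s from such data requires much larger systems and statistics.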
Whenever comparing with real, empirical phenomena, models based on branching processes give a paradigm on two levels. In the case of SOC this is given by a combination of "internal dynamics" and "an ensemble". Of these, the first means, e.g., that activated walkers or particles of type A are moved around with certain rules, and of course there is an enormous variety of possible models; e.g., in the Manna model this is obvious if one splits the walkers into categories A (active) and B (passive). Then, there is the question of how the balance (assuming this balance is true) is maintained. The SOC models do this by a combination of dissipation (by, e.g., particles dropping off the edge of a system) and drive (by addition of B's), where the rates are actually chosen pretty carefully. In the case of growing
networks, one can similarly ask what kind of growth laws allow for scale-free distributions of the node degree. For the theory of complex systems, branching processes thus give rise to two questions: what classes of models are there? What kinds of truly different ensembles are there? The theory of reaction-diffusion systems has tried to answer the first question since the 1970s, and the developments reflect the idea of universality in statistical physics (see, e.g., the article of Dietrich Stauffer in this volume). There, this means that the behavior of systems at "critical points" – such as defined by the λ_c and n_c from above – follows from the dimension at hand and the "universality class" at hand. The activated walkers' model, it has recently been established, belongs to the "fixed energy sandpile" class, and is closely related to other seemingly far-fetched models such as the depinning/pinning of domain walls in magnets (see, e.g., [18]). The second question can be stated in two ways: forgetting about the detailed model at hand, when can one expect complex behavior such as power-law distributions, avalanches, etc.? Second, and more technically, can one derive the exponents such as τ_s and D_s from those of the same model at its "usual" phase transition? The theory of branching processes provides many answers to these questions, and in particular helps to illustrate the influence of boundary conditions and modes of driving on the expected behavior. Thus, one gets a clear idea of the kind of complexity one can expect to see in the many different kinds of systems where avalanches, intermittency, and scaling are observed.

Branching Processes

The mathematical branching process is defined for a set of objects that do not interact. At each iteration, each object can give rise to new objects with some probability p (or in general there can be a set of probabilities). By continuing this iterative process, the objects will form what is referred to as a cluster or an avalanche.
We can now ask questions of the following type: Will the process continue forever? Will the process maybe die out and stop after a finite number of iterations? What will the average lifetime be? What is the average size of the clusters? And what is (the asymptotic form of) the probability that the process is active after a certain number of iterations, etc.

BP Definition

We will consider a simple discrete BP denoted the Galton–Watson process. For other types of BPs (including processes in continuous time) we refer to the book by Harris [1]. We will denote the number of objects at generation n as z_0, z_1, z_2, …, z_n, …. The index n is referred to
as time, and time t = 0 corresponds to the 0th generation, where we take z_0 = 1, i.e., the process starts from one object. We will assume that the transition from generation n to n + 1 is given by a probability law that is independent of n, i.e., the process is Markovian. Finally, it will be assumed that different objects do not interact with one another. The probability measure for the process is characterized as follows: p_k is the probability that an object in the nth generation will have k offspring in the (n + 1)th generation. We assume that p_k is independent of n. The probabilities thus read p_k = Prob(z_1 = k) and fulfill

  Σ_k p_k = 1 .   (2)

Branching Processes, Figure 3: Schematic drawing of the behavior of the probability of extinction q of the branching process. The quantity 1 − q is similar to the order parameter for systems in equilibrium at their critical points, and the quantity m_c is referred to as the critical point for the BP
Figure 2 shows an example of the tree structure for a branching process and the resulting avalanche. One can define the probability generating function f(s) associated with the transition probabilities:

  f(s) = Σ_k p_k s^k .   (3)
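As a concrete illustration (our sketch, not code from the source), the generating function of the process in Fig. 2 (offspring probabilities p_0 = 1 − p, p_2 = p) can be written down and differentiated numerically; its derivative at s = 1 gives the mean offspring number m:

```python
# Generating function f(s) = sum_k p_k s^k for the process of Fig. 2,
# where an object has 0 offspring w.p. 1 - p and 2 offspring w.p. p.

def f(s, p):
    return (1 - p) + p * s ** 2        # p_0 + p_2 s^2

def mean_offspring(p, h=1e-6):
    # m = f'(1), estimated by a central difference
    return (f(1 + h, p) - f(1 - h, p)) / (2 * h)

p = 0.4
assert abs(f(1.0, p) - 1.0) < 1e-12    # normalization: f(1) = sum_k p_k = 1
assert abs(mean_offspring(p) - 2 * p) < 1e-6   # m = 2p for this process
```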
The first and second moments of the number z_1 are denoted by

  m = E z_1 ,  σ² = Var z_1 .   (4)

By taking derivatives of the generating function at s = 1 it follows that

  E z_n = m^n ,  Var z_n = n σ² .   (5)

For the BP defined in Fig. 2, it follows that m = 2p and σ² = 4p(1 − p). An important quantity is the probability of extinction q. It is obtained as follows:

  q = P(z_n → 0) = P(z_n = 0 for some n) = lim_{n→∞} P(z_n = 0) .   (6)

Branching Processes, Figure 2: Schematic drawing of an avalanche in a system with a maximum of n = 3 avalanche generations, corresponding to N = 2^{n+1} − 1 = 15 sites. Each black site relaxes with probability p to two new black sites and with probability 1 − p to two white sites (i.e., p_0 = 1 − p, p_1 = 0, p_2 = p). The black sites are part of an avalanche of size s = 7, whereas the active sites at the boundary yield a boundary avalanche of size σ_3(p, t) = 2

It can be shown that for m ≤ 1, q = 1, while for m > 1 there exists a solution fulfilling q = f(q) with 0 < q < 1 [1]. It is possible to show that lim_{n→∞} P(z_n = k) = 0 for k = 1, 2, 3, …, and that z_n → 0 with probability q and z_n → ∞ with probability 1 − q. Thus, the sequence {z_n} does not remain positive and bounded [1]. The quantity 1 − q is similar to an order parameter for systems in equilibrium, and its behavior is shown schematically in Fig. 3. The behavior around the value m_c = 1, the so-called critical value for the BP (see below), can, in analogy to second-order phase transitions in equilibrium systems, be described by a critical exponent β defined as follows (β = 1, cf. [1]):

  Prob(survival) = 0 for m ≤ m_c ;  Prob(survival) ∝ (m − m_c)^β for m > m_c .   (7)
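The extinction probability can be computed numerically for the process of Fig. 2; a minimal sketch (our illustration, assuming the offspring probabilities p_0 = 1 − p, p_2 = p as above):

```python
# Extinction probability: iterate q <- f(q) starting from q = 0; the
# iteration converges to the smallest root of q = f(q) in [0, 1] [1].

def extinction_probability(p, iters=10000):
    q = 0.0
    for _ in range(iters):
        q = (1 - p) + p * q ** 2       # f(q) for the process of Fig. 2
    return q

assert abs(extinction_probability(0.3) - 1.0) < 1e-6   # m = 0.6 <= 1: q = 1
# For p > 1/2 the smallest root of p q^2 - q + (1 - p) = 0 is q = (1 - p)/p:
assert abs(extinction_probability(0.7) - 0.3 / 0.7) < 1e-6
```

For p above 1/2 (m > 1) the survival probability 1 − q = (2p − 1)/p grows linearly in m − m_c, in accordance with β = 1 in Eq. (7).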
Avalanches and Critical Point

We will next consider the clusters, or avalanches, in more detail and obtain the asymptotic form of the probability distributions. The size of the cluster is given by the sum
z = z_1 + z_2 + z_3 + ⋯. One can also consider other types of clusters, e.g., the activity of the boundary (of a finite tree), and define the boundary avalanche (cf. Fig. 2). For concreteness, we consider the BP defined in Fig. 2. The quantities P_n(s, p) and Q_n(σ, p) denote the probabilities of having an avalanche of size s and of boundary size σ in a system with n generations. The corresponding generating functions are defined by [1]

  f_n(x, p) ≡ Σ_s P_n(s, p) x^s ,   (8)
  g_n(x, p) ≡ Σ_σ Q_n(σ, p) x^σ .   (9)

Due to the hierarchical structure of the branching process, it is possible to write down recursion relations for P_n(s, p) and Q_n(σ, p):

  f_{n+1}(x, p) = x [ (1 − p) + p f_n²(x, p) ] ,   (10)
  g_{n+1}(x, p) = (1 − p) + p g_n²(x, p) ,   (11)

where f_0(x, p) = g_0(x, p) = x. The avalanche distribution D(s) is determined from P_n(s, p) by using the recursion relation (10). The stationary solution of Eq. (10) in the limit n ≫ 1 is given by

  f(x, p) = [1 − √(1 − 4x²p(1 − p))] / (2xp) .   (12)

By expanding Eq. (12) as a series in x, comparing with the definition (8), and using Stirling's formula for the high-order terms, we obtain for sizes 1 ≪ s ≲ n the following behavior:

  P_n(s, p) = √(2(1 − p)/(π p)) s^{−3/2} exp[−s/s_c(p)] .   (13)

The cutoff s_c(p) is given by s_c(p) = −2/ln[4p(1 − p)]. As p → 1/2, s_c(p) → ∞, showing explicitly that the critical value for the branching process is p_c = 1/2 (i.e., m_c = 1), and that the mean-field avalanche exponent, cf. Eq. (1), for the critical branching process is τ = 3/2. The expression (13) is only valid for avalanches which are not affected by the finite size of the system. For avalanches with n ≲ s ≲ N, it is possible to solve the recursion relation (10) and then obtain P_n(s, p) for p ≈ p_c by the use of a Tauberian theorem [19,20,21]. Carrying out such an analysis, one obtains after some algebra P_n(s, p) ≈ A(p) exp[−s/s_0(p)], with functions A(p) and s_0(p) which cannot be determined analytically. Nevertheless, we see that for any p the probabilities P_n(s, p) will decay exponentially. One can also calculate the asymptotic form of Q_n(σ, p) for 1 ≪ σ ≲ n and p ≈ p_c by the use of
a Tauberian theorem [19,20,21]. We will return to this in the next section, where we will also define and investigate the distribution of the time to extinction.

Self-Organized Branching Processes

We now return to the link between self-organized criticality, as mentioned in the introduction, and branching processes. The simplest theoretical approach to SOC is mean-field theory [22], which allows for a qualitative description of the behavior of the SOC state. Mean-field exponents for SOC models have been obtained by various approaches [22,23,24,25], and it turns out that their values (e.g., τ = 3/2) are the same for all the models considered thus far. This fact can easily be understood, since the spreading of an avalanche in mean-field theory can be described by a front consisting of noninteracting particles that can either trigger subsequent activity or die out. This kind of process is reminiscent of a branching process. The connection between branching processes and SOC has been investigated, and it has been argued that the mean-field behavior of sandpile models can be described by a critical branching process [26,27,28]. For a branching process to be critical, one must fine tune a control parameter to a critical value. This, by definition, cannot be the case for a SOC system, where the critical state is approached dynamically without the need to fine tune any parameter. In the so-called self-organized branching process (SOBP), the coupling of the local dynamical rules to a global condition drives the system into a state that is indeed described by a critical branching process [29]. It turns out that the mean-field theory of SOC models can be exactly mapped to the SOBP model. In the mean-field description of the sandpile model (d → ∞) one neglects correlations, which implies that avalanches do not form loops and hence spread as a branching process.
In the SOBP model, an avalanche starts with a single active site, which then relaxes with probability p, leading to two new active sites. With probability 1 − p the initial site does not relax and the avalanche stops. If the avalanche does not stop, one repeats the procedure for the new active sites until no active site remains. The parameter p is the probability that a site relaxes when it is triggered by an external input. For the SOBP branching process, there is a critical value, p_c = 1/2, such that for p > p_c the probability to have an infinite avalanche is nonzero, while for p < p_c all avalanches are finite. Thus, p = p_c corresponds to the critical case, where avalanches are power-law distributed. In this description, however, the boundary conditions are not taken into account, even though they are crucial for the self-organization process. The boundary conditions can be introduced in a natural way by allowing for no more than n generations for each avalanche. Schematically, we can view the evolution of a single avalanche of size s as taking place on a tree of N = 2^{n+1} − 1 sites (see Fig. 2). If the avalanche reaches the boundary of the tree, one counts the number of active sites σ_n (which in the sandpile language corresponds to the energy leaving the system), and we expect that p decreases for the next avalanche. If, on the other hand, the avalanche stops before reaching the boundary, then p will slightly increase. The number of generations n can be thought of as a measure of the linear size of the system. The above avalanche scenario is described by the following dynamical equation for p(t):

  p(t + 1) = p(t) + [1 − σ_n(p, t)] / N ,   (14)
where σ_n, the size of an avalanche reaching the boundary, fluctuates in time and hence acts as a stochastic driving force. If σ_n = 0, then p increases (because some energy has been put into the system without any output), whereas if σ_n > 0 then p decreases (due to energy leaving the system). Equation (14) describes the global dynamics of the SOBP, as opposed to the local dynamics, which is given by the branching process. One can study the model for a fixed value of n and then take the limit n → ∞. In this way, we perform the long-time limit before the "thermodynamic" limit, which corresponds exactly to what happens in sandpile models. We will now show that the SOBP model provides a mean-field theory of self-organized critical systems. Consider for simplicity the sandpile model of activated random walkers from the Introduction [5]: When a particle is added to a site z_i, the site will relax if z_i = 1. In the limit d → ∞, the avalanche will never visit the same site more than once. Accordingly, each site in the avalanche will relax with the same probability p = P(z = 1). Eventually, the avalanche will stop, and σ_n ≥ 0 particles will leave the system. Thus, the total number of particles M(t) evolves according to

  M(t + 1) = M(t) + 1 − σ_n(p, t) .   (15)

The dynamical Eq. (14) for the SOBP model is recovered by noting that M(t) = N P(z = 1) = N p. By taking the continuum time limit of Eq. (14), one obtains

  dp/dt = [1 − (2p)^n] / N + η(p, t) / N ,   (16)

where ⟨σ_n⟩ = (2p)^n, and η(p, t) = ⟨σ_n⟩ − σ_n(p, t) describes the fluctuations in the steady state. Thus, η is obtained by measuring σ_n for each realization of the process. Without the last term, Eq. (16) has a fixed point (dp/dt = 0) at p = p_c = 1/2. On linearizing Eq. (16), one sees that the fixed point is attractive, which demonstrates the self-organization of the SOBP model, since the noise η/N has a vanishingly small effect in the thermodynamic limit [29]. Figure 4 shows the value of p as a function of time. Independent of the initial conditions, one finds that after a transient p(t) reaches the self-organized state described by the critical value p_c = 1/2 and fluctuates around it with short-range correlations (of the order of one time unit). By computing the variance of p(t), one finds that the fluctuations can be very well described by a Gaussian distribution ρ(p) [29]. In the limit N → ∞, the distribution ρ(p) approaches a delta function, δ(p − p_c).

Branching Processes, Figure 4: The value of p as a function of time for a system with n = 10 generations. The two curves refer to two different initial conditions, one above p_c and one below p_c. After a transient, the control parameter p(t) reaches its critical value p_c and fluctuates around it with short-range correlations

Avalanche Distributions
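The probabilities P_n(s, p) entering the distributions of this section can be generated exactly by iterating the recursion (10) on polynomial coefficients; a minimal sketch (our illustration, not code from the source):

```python
# Exact avalanche-size probabilities P_n(s, p) from the recursion
# f_{n+1}(x) = x[(1 - p) + p f_n(x)^2], Eq. (10), iterated on the
# coefficient list of the polynomial f_n (coeffs[s] = P_n(s, p)).

def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def avalanche_distribution(p, n):
    coeffs = [0.0, 1.0]                  # f_0(x) = x
    for _ in range(n):
        sq = convolve(coeffs, coeffs)    # f_n(x)^2
        new = [0.0] * (len(sq) + 1)
        new[1] = 1.0 - p                 # the (1 - p) x term
        for s, c in enumerate(sq):
            new[s + 1] += p * c          # the p x f_n(x)^2 term
        coeffs = new
    return coeffs

P = avalanche_distribution(0.5, 2)
assert abs(sum(P) - 1.0) < 1e-12         # normalization
assert abs(P[1] - 0.5) < 1e-12           # s = 1: the root does not relax
assert abs(P[3] - 0.125) < 1e-12         # s = 3: root relaxes, children stop
```

Avalanche sizes on this binary tree are odd (1, 3, 5, …), and for larger n the coefficients display the s^{−3/2} decay of Eq. (13).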
Figure 5 shows the avalanche size distribution D(s) for different values of the number of generations n. One notices a scaling region, D(s) ∼ s^{−τ} with τ = 3/2, whose size increases with n and which is terminated by an exponential cutoff. This power-law scaling is a signature of the mean-field criticality of the SOBP model. The distribution of active sites at the boundary, D(σ), for different values of the number of generations falls off exponentially [29].
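The self-organization underlying these distributions can be reproduced by directly simulating the update rule (14); a minimal sketch (our illustration; `sobp` and its parameters are our naming):

```python
import random

def sobp(n=10, steps=20000, p0=0.3, seed=1):
    """Self-organized branching process: one avalanche on a tree of n
    generations, then the global update p <- p + (1 - sigma_n)/N, Eq. (14)."""
    random.seed(seed)
    N = 2 ** (n + 1) - 1
    p, trace = p0, []
    for _ in range(steps):
        z = 1                                  # active sites, generation 0
        for _ in range(n):                     # each active site branches w.p. p
            z = 2 * sum(1 for _ in range(z) if random.random() < p)
            if z == 0:
                break
        p += (1.0 - z) / N                     # z = sigma_n at the boundary
        trace.append(p)
    return trace

trace = sobp()
avg = sum(trace[10000:]) / 10000
assert 0.45 < avg < 0.55                       # self-organizes to p_c = 1/2
```

Recording the avalanche sizes along the way reproduces the D(s) ∼ s^{−3/2} scaling region of Fig. 5.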
Branching Processes, Figure 5: Log–log plot of the avalanche distribution D(s) for different system sizes. The number of generations n increases from left to right. A line with slope τ = 3/2 is plotted for reference; it describes the behavior of the data for intermediate s values, cf. Eq. (18). For large s, the distributions fall off exponentially

In the limit n ≫ 1 one can obtain various analytical results and, e.g., calculate the avalanche distribution D(s) for the SOBP model. In addition, one can obtain results for finite, but large, values of n. The distribution D(s) can be calculated as the average value of P_n(s, p) with respect to the probability density ρ(p), i.e., according to the formula

  D(s) = ∫₀¹ dp ρ(p) P_n(s, p) .   (17)

The simulation results in Fig. 4 show that ρ(p) for N ≫ 1 approaches the delta function δ(p − p_c). Thus, from Eqs. (13) and (17) we obtain the power-law behavior

  D(s) = √(2/π) s^{−τ} ,   (18)

where τ = 3/2, and for s ≳ n we obtain an exponential cutoff ∝ exp[−s/s_0(p_c)]. These results are in complete agreement with the numerical results shown in Fig. 5. The deviations from the power-law behavior (18) are due to the fact that Eq. (13) is only valid for 1 ≪ s ≲ n. One can also calculate the asymptotic form of Q_n(σ, p) for 1 ≪ σ ≲ n and p ≈ p_c by the use of a Tauberian theorem [19,20,21]; the result shows that the boundary avalanche distribution is

  D(σ) = ∫₀¹ dp ρ(p) Q_n(σ, p) = (8σ/n²) exp(−2σ/n) ,   (19)

which agrees with simulation results for n ≫ 1, cf. [29].

The avalanche lifetime distribution L(t) ∼ t^{−y} yields the probability to have an avalanche that lasts for a time t. For a system with m generations one obtains L(m) ∼ m^{−2} [1]. Identifying the number of generations m of an avalanche with the time t, we thus obtain the mean-field value y = 2, in agreement with simulations of the SOBP model [29].

In summary, the self-organized branching process captures the physical features of the self-organization mechanism in sandpile models. By explicitly incorporating the boundary conditions, it follows that the dynamics drives the system into a stationary state, which in the thermodynamic limit corresponds to the critical branching process.

Scaling and Dissipation
Sometimes it can be difficult to determine whether the cutoff in the scaling is due to finite-size effects or due to the fact that the system is not at, but only close to, the critical point. In this respect, it is important to test the robustness of critical behavior by understanding which perturbations destroy the critical properties. It has been shown numerically [30,31,32] that breaking the conservation of particle number leads to a characteristic size in the avalanche distributions. We will now allow for dissipation in branching processes and show how the system self-organizes into a subcritical state. In other words, the degree of nonconservation is a relevant parameter in the renormalization-group sense [33]. Consider again the two-state model introduced by Manna [5]. Some degree of nonconservation can be introduced in the model by allowing for energy dissipation in a relaxation event. In a continuous-energy model this can be done by transferring to the neighboring sites only a fraction (1 − ε) of the energy lost by the relaxing site [30]. In a discrete-energy model, such as the Manna two-state model, one can instead introduce ε as the probability that the two particles transferred by the relaxing site are annihilated [31]. For ε = 0 one recovers the original two-state model. Numerical simulations [30,31] show that different ways of implementing dissipation lead to the same effect: a characteristic length is introduced into the system and criticality is lost. As a result, the avalanche size distribution decays not as a pure power law but rather as

  D(s) ∼ s^{−τ} h_s(s/s_c) .   (20)

Here h_s(x) is a cutoff function, and the cutoff size scales as

  s_c ∼ ε^{−φ} .   (21)
The size s is defined as the number of sites that relax in an avalanche. We define the avalanche lifetime T as the number of steps comprising an avalanche. The corresponding distribution decays as

  D(T) ∼ T^{−y} h_T(T/T_c) ,   (22)

where h_T(x) is another cutoff function and T_c is a cutoff that scales as

  T_c ∼ ε^{−Δ} .   (23)

The cutoff or "scaling" functions h_s(x) and h_T(x) fall off exponentially for x ≫ 1. To construct the mean-field theory one proceeds as follows [34]: When a particle is added to an arbitrary site, the site will relax if a particle was already present, which occurs with probability p = P(z = 1), the probability that the site is occupied. If a relaxation occurs, the two particles are transferred with probability 1 − ε to two of the infinitely many nearest neighbors, or they are dissipated with probability ε (see Fig. 6). The avalanche process in the mean-field limit is a branching process. Moreover, the branching process can be described by the effective branching probability

  p̃ ≡ p(1 − ε) ,   (24)

where p̃ is the probability to create two new active sites. We know that there is a critical value p̃ = 1/2, or

  p = p_c ≡ 1 / [2(1 − ε)] .   (25)

Thus, for p > p_c the probability to have an infinite avalanche is nonzero, while for p < p_c all avalanches are finite. The value p = p_c corresponds to the critical case where avalanches are power-law distributed.
The Properties of the Steady State

To address the self-organization, consider the evolution of the total number of particles M(t) in the system after each avalanche:

  M(t + 1) = M(t) + 1 − σ(p, t) − λ(p, t) .   (26)

Here σ is the number of particles that leave the system through the boundaries and λ is the number of particles lost by dissipation. Since (cf. Sect. "Self-Organized Branching Processes") M(t) = N P(z = 1) = N p, we obtain an evolution equation for the parameter p:

  p(t + 1) = p(t) + [1 − σ(p, t) − λ(p, t)] / N .   (27)
This equation reduces to the SOBP model in the case of no dissipation (ε = 0). In the continuum limit one obtains [34]

  dp/dt = [1 − (2p(1 − ε))^n − 2εp H(p(1 − ε))] / N + η(p, t) / N .   (28)

Here we have defined the function H(p(1 − ε)), which can be obtained analytically, and introduced the function η(p, t) to describe the fluctuations around the average values of σ and λ. It can be shown numerically that the effect of this "noise" term is vanishingly small in the limit N → ∞ [34]. Without the noise term, one can study the fixed points of Eq. (28), and one finds that there is only one fixed point,

  p* = 1/2 ,   (29)

independent of the value of ε; the corrections to this value are of order O(1/N). By linearizing Eq. (28), it follows that the fixed point is attractive. This result implies that the SOBP model self-organizes into a state with p = p*. Figure 7 shows the value of p as a function of time for different values of the dissipation ε. We find that, independent of the initial conditions, after a transient p(t) reaches the self-organized steady state.

Branching Processes, Figure 6: Schematic drawing of an avalanche in a system with a maximum of n = 3 avalanche generations, corresponding to N = 2^{n+1} − 1 = 15 sites. Each black site can relax in three different ways: (i) with probability p(1 − ε) it relaxes to two new black sites, (ii) with probability 1 − p the avalanche stops, and (iii) with probability pε two particles are dissipated at the site, which then becomes a marked site, and the avalanche stops. The black sites are part of an avalanche of size s = 6, whereas the active sites at the boundary yield σ_3(p, t) = 2. There was one dissipation event, such that λ = 2
Branching Processes, Figure 7: The value of the control parameter p(t) as a function of time for a system with different levels of dissipation. After a transient, p(t) reaches its fixed-point value p* = 1/2 and fluctuates around it with short-range time correlations
Branching Processes, Figure 8: Phase diagram for the SOBP model with dissipation. The dashed line shows the fixed points p* = 1/2 of the dynamics, with the flow indicated by the arrows. The solid line shows the critical points, cf. Eq. (25)
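The relaxation toward p* shown in Fig. 7 can be reproduced by simulating Eq. (27) with the site rules of Fig. 6; a minimal sketch (our illustration, not code from the source):

```python
import random

def sobp_dissipative(n=10, eps=0.1, steps=20000, p0=0.3, seed=2):
    """Dissipative SOBP: an active site branches w.p. p(1 - eps), dissipates
    two particles w.p. p*eps, otherwise stops; then the global update
    p <- p + (1 - sigma - lam)/N, cf. Eq. (27)."""
    random.seed(seed)
    N = 2 ** (n + 1) - 1
    p, trace = p0, []
    for _ in range(steps):
        z, lam = 1, 0
        for _ in range(n):
            nxt = 0
            for _ in range(z):
                r = random.random()
                if r < p * (1 - eps):
                    nxt += 2               # relaxation: two new active sites
                elif r < p:
                    lam += 2               # dissipation: two particles lost
            z = nxt
            if z == 0:
                break
        p += (1.0 - z - lam) / N           # z = sigma at the boundary
        trace.append(p)
    return trace

for eps in (0.0, 0.1, 0.3):
    tr = sobp_dissipative(eps=eps)
    avg = sum(tr[10000:]) / 10000
    assert 0.45 < avg < 0.55               # p* = 1/2 independent of eps
```

For ε > 0 the stationary value p* = 1/2 lies below p_c = 1/(2(1 − ε)), so the simulated avalanches are subcritical, as in Fig. 8.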
This steady state is described by the fixed-point value p* = 1/2, around which p(t) fluctuates with short-range correlations (of the order of one time unit). The fluctuations around this value decrease with the system size as 1/N. It follows that in the limit N → ∞ the distribution ρ(p) of p approaches a delta function, ρ(p) → δ(p − p*). By comparing the fixed-point value (29) with the critical value (25), we see that in the presence of dissipation (ε > 0) the self-organized steady state of the system is subcritical. Figure 8 is a schematic picture of the phase diagram of the model, including the line p = p_c of critical behavior (25) and the line p = p* of fixed points (29). These two lines intersect only for ε = 0.

Avalanche and Lifetime Distributions

In analogy to the results in Sect. "Self-Organized Branching Processes", we obtain similar formulas for the avalanche size distributions, but with p̃ replacing p. As a result we obtain the distribution

  D(s) = √(2/π) (1 + ε + ⋯) s^{−τ} exp[−s/s_c(ε)] .   (30)

We can expand s_c(p̃(ε)) = −2/ln[4p̃(1 − p̃)] in ε, with the result

  s_c(ε) ≃ 2/ε^φ ,  φ = 2 .   (31)

Furthermore, the mean-field exponent for the critical branching process is obtained by setting ε = 0, i.e.,

  τ = 3/2 .   (32)
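The expansion (31) is easy to check numerically; a small sketch (our illustration), using the cutoff formula below Eq. (13) with p replaced by p̃ at the fixed point p* = 1/2:

```python
import math

# Cutoff s_c from Eq. (13) with the effective branching probability
# p~ = p*(1 - eps) at the fixed point p* = 1/2, cf. Eq. (31).

def s_c(eps):
    pt = 0.5 * (1 - eps)                       # p~ at the fixed point
    return -2.0 / math.log(4 * pt * (1 - pt))  # 4 p~ (1 - p~) = 1 - eps^2

# s_c = -2/ln(1 - eps^2) ~ 2/eps^2 for small eps, i.e. phi = 2:
for eps in (0.1, 0.03, 0.01):
    assert abs(s_c(eps) * eps ** 2 / 2 - 1) < eps ** 2
```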
Branching Processes, Figure 9: Log–log plot of the avalanche distribution D(s) for different levels of dissipation. A line with slope τ = 3/2 is plotted for reference; it describes the behavior of the data for intermediate s values, cf. Eq. (30). For large s, the distributions fall off exponentially. The data collapse is produced according to Eq. (30)
These results are in complete agreement with the simulations of D(s) for the SOBP model with dissipation (cf. Fig. 9). The deviations from the power-law behavior (30) are due to the fact that Eq. (13) is only valid for 1 ≪ s ≲ n. Next, consider the avalanche lifetime distribution D(T), characterizing the probability to obtain an avalanche which spans m generations. It can be shown that the result
Branching Processes, Figure 10: Log–log plot of the lifetime distribution D(T) for different levels of dissipation. A line with slope y = 2 is plotted for reference. Note the initial deviations from the power law for ε = 0 due to the strong corrections to scaling. The data collapse is produced according to Eq. (33)
can be expressed in the scaling form [1,34]

  D(T) ∼ T^{−y} exp(−T/T_c) ,   (33)

where

  T_c ∼ ε^{−Δ} ,  Δ = 1 .   (34)

The lifetime exponent y was defined in Eq. (22), and we confirm the mean-field result

  y = 2 .   (35)
In Fig. 10, we show the data collapse produced by Eq. (33) for the lifetime distributions for different values of ε. In summary, the effect of dissipation on the dynamics of the sandpile model in the mean-field limit (d → ∞) is described by a branching process. The evolution equation for the branching probability has a single attractive fixed point, which in the presence of dissipation is not a critical point. The level of dissipation therefore acts as a relevant parameter for the SOBP model. These results show, in the mean-field limit, that criticality in the sandpile model is lost when dissipation is present.

Network Science and Branching Processes

An interesting mixture of two kinds of "complex systems" is achieved by considering what applications branching processes find when applied to, or run on, networks.
Here, "network" is a loose term in the way it is often used, as in the famous "six degrees" concept of how the social interactions of human beings can be measured to have a "small-world" character [16,17,35]. These structures are usually thought of in terms of graphs, with vertices forming a set G and edges a set E such that an edge e_{gg′} ∈ E connects two vertices g, g′ ∈ G. An edge or link can be either symmetric or asymmetric. Another article in this volume discusses the modern view on networks in depth, providing more information. The structural properties of heterogeneous networks are most commonly measured by the degree k_i of vertex i ∈ G, i.e., the number of nearest neighbors that vertex i is connected to. One can go further by defining weights w_{ij} for the edges e_{ij} [36,37,38]. The interest in heterogeneous networks is largely due to two typical properties. In a scale-free network the structure is such that the degree distribution has a power-law form, P(k) ∼ k^{−α}, up to a size-dependent cutoff k_max. One can show, using mean-field-like reference models (such as the "configuration model", which has no structural correlations and a prescribed P(k)), that such networks tend to have a "small-world" character: the average vertex–vertex distance is only logarithmic in the number of vertices N, d(G) ∼ ln N. This property is related to the fact that the various moments of k depend on α. For ⟨k⟩ to remain finite, α > 2 is required. In many empirically measured networks it appears that α < 3, which implies that ⟨k²⟩ diverges (we assume that P(k) is cut off at a maximum degree k_max for which ∫_{k_max}^∞ P(k) dk ∼ 1/N, from which the divergence follows) [16,17,35]. The first of the two central questions we pose here is: what happens to branching processes on top of such networks? Again, this can be translated into an inquiry about the "phase diagram", given a model, and its particulars.
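The divergence of ⟨k²⟩ for α < 3 under such a cutoff can be illustrated numerically; a small sketch (our illustration, using a truncated power-law P(k) on k = 1 … k_max):

```python
# Moments of a truncated power-law degree distribution P(k) ~ k^(-alpha),
# k = 1 .. k_max. For 2 < alpha < 3, <k> converges with k_max but <k^2>
# keeps growing, which is the divergence discussed in the text.

def moments(alpha, k_max):
    Z = sum(k ** -alpha for k in range(1, k_max + 1))
    k1 = sum(k ** (1 - alpha) for k in range(1, k_max + 1)) / Z   # <k>
    k2 = sum(k ** (2 - alpha) for k in range(1, k_max + 1)) / Z   # <k^2>
    return k1, k2

k1a, k2a = moments(2.5, 10 ** 3)
k1b, k2b = moments(2.5, 10 ** 5)
assert abs(k1b - k1a) < 0.1        # <k> has essentially converged
assert k2b > 8 * k2a               # <k^2> grows strongly with k_max
```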
We would like to understand what the critical threshold λ_c is, and what happens in subcritical and supercritical systems (with the appropriate values of λ). The answers may of course depend on the model at hand, i.e., the universality classes are still an important issue. One easy analytical result for uncorrelated (in particular tree-like) networks is that activity will persist in the branching process if the expected number of offspring of an active "individual" or vertex is at least one. Consider one such vertex i and look at the neighbors of its neighbors: there are of the order of k_i (⟨k_{neighbor of i}⟩ − 1) of these, so λ_c ∝ 1/⟨k²⟩. This demonstrates the highly important result that the critical threshold may vanish in heterogeneous networks, for α ≤ 3 in the example case. Pastor-Satorras and Vespignani were the first to point out this fact for the Susceptible-Infected-Susceptible (SIS)
model [39]. The analysis is relatively easy to carry out using mean-field theory for ρ_k, the average activity of the SIS branching process on nodes of degree k. The central equation reads

  ∂_t ρ_k(t) = −ρ_k(t) + λ k [1 − ρ_k(t)] Θ(λ) .   (36)

Here Θ measures the probability that a given neighbor of a vertex of degree k is infected. A mean-field treatment of Θ, excluding degree correlations, makes it possible to show that the SIS model may have a zero threshold, and it also establishes other interesting properties, such as the degree dependence of ρ_k. As is natural, the branching process concentrates on vertices with higher degrees; that is, ρ_k increases with k. The consequences of a zero epidemic threshold are plentiful and important [40]. The original application of the theory was to the spreading of computer viruses: the Internet is (on the Autonomous System level) a scale-free network [41], and so is the network of websites. The outcome even for subcritical epidemics is a long lifetime, and further complications ensue if the internal dynamics of the spreading does not follow Poissonian statistics in time (e.g., for the waiting time before vertex i sends a virus to a neighbor) but a power-law distribution [42]. Another highly topical issue is the behavior and control of human disease outbreaks. How to vaccinate against and isolate a dangerous virus epidemic depends on the structure of the network on which the spreading takes place, and on the detailed rules and dynamics. An intuitively simple idea is to concentrate on the "hubs", the most connected vertices of the network [43]. These can be airports through which travellers acting as carriers move, or, as in the case of sexually transmitted viruses such as HIV, the most active individuals. There are indeed claims that the network of sexual contacts is scale-free, with a small exponent α. Of interest here is the recent result that for λ > λ_c and α < 3 the spreading of branching processes deviates from the usual expectations.
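Returning to Eq. (36): its stationary state can be found by self-consistent iteration under the standard uncorrelated-network closure for Θ; a minimal sketch (our illustration, not code from the source):

```python
# Stationary SIS mean field on an uncorrelated network:
#   rho_k = lam*k*Theta / (1 + lam*k*Theta),
#   Theta  = sum_k k P(k) rho_k / <k>   (probability that a randomly
#                                        followed link points to an
#                                        infected vertex)

def theta_stationary(pk, lam, iters=2000):
    kavg = sum(k * w for k, w in pk.items())
    theta = 0.5
    for _ in range(iters):
        theta = sum(k * w * lam * k * theta / (1 + lam * k * theta)
                    for k, w in pk.items()) / kavg
    return theta

homogeneous = {4: 1.0}                         # every vertex has degree 4
assert theta_stationary(homogeneous, lam=0.1) < 1e-9   # below lam_c = 1/4
assert abs(theta_stationary(homogeneous, lam=0.5) - 0.5) < 1e-9

# Truncated scale-free P(k) ~ k^-2.5: activity persists at small lam
Z = sum(k ** -2.5 for k in range(1, 1001))
sf = {k: k ** -2.5 / Z for k in range(1, 1001)}
assert theta_stationary(sf, lam=0.1) > 1e-4
```

For the homogeneous case the threshold is λ_c = 1/k, while for the broad degree distribution nonzero activity survives at values of λ well below that, illustrating the (near-)vanishing threshold discussed above.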
The outgrowth covers a finite fraction of the network in a short time (vanishing in the large-N limit), and the growth in time of the number of infected nodes becomes polynomial after an initial exponential phase [44]. Branching processes can also be applied to network structure. The "standard model" of growing networks is the Barabási–Albert model, in which one grows a network from a small seed graph by so-called preferential attachment. That is, two mechanisms operate: (i) new vertices are added, and (ii) old vertices grow more links (their degree increases) by getting linked to the new ones. There is a whole zoo of network growth models, but we next overview some ideas that directly connect to
Branching Processes, Figure 11: Two processes creating a growing or steady-state network: addition and merging of vertices
branching processes. These apply to cases in which vertices disappear by combining with other vertices, are generated from thin air, or split into "subvertices" such that old links from the original vertex are retained. One important mechanism, operative in cellular protein-interaction networks, is the duplication of vertices (proteins), with slight changes to the copy's connectivity compared to the parent vertex. Figure 11 illustrates an example where the structure in the steady state can be fruitfully described by mathematics similar to that used in other, related network problems. The basic rules are twofold: (i) a new vertex is added to the network, and (ii) two randomly chosen vertices merge. This kind of mechanism is reminiscent of aggregation processes, e.g., of sticky particles that form colloidal aggregates in fluids. The simple model leads to a degree distribution of the asymptotic form P(k) ∼ k^{−3/2}, reminiscent of the mean-field branching-process result for the avalanche size distribution (where τ = 3/2, cf. Sect. "Branching Processes") [45,46]. The mathematical tool here is given by rate equations that describe the number of vertices with a certain degree k. These are similar to those one can write for the size of an avalanche in other contexts. It is an interesting question what kinds of generalizations one can find by considering cases where the edges have weights and the elementary processes are made dependent in some way on those weights [47]. It is worth noting that the aggregation/branching-process description of network structure easily develops deep complexity. This can be achieved by choosing the nodes to be joined with an eye to the graph geometry, and/or by splitting vertices according to some particular rules. In the latter case, a natural starting point is to maintain all the links of the original vertex and to distribute them among the descendant vertices. Then one has to choose whether to link the descendants to each other or not; either choice influences the local correlations.
The same is naturally true if one considers reactions of the type k + k′ → k″, which originate from joining two neighboring vertices with degrees k and k′. The likelihood of such a reaction depends on the conditional probability that a vertex of degree k has a neighbor of degree k′. Such processes depend on structural properties of the networks, which themselves change in time; the theoretical understanding is thus difficult due to the correlations. As a final note, in some cases the merging of vertices produces a giant component in the network: a vertex which carries a finite fraction of the total mass (or of the links in the whole network). This is analogous to avalanching systems which are "weakly first-order", in other words, which exhibit statistics with a power-law, scale-free part and then a separate, singular peak; often, this would be a final giant avalanche.
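A toy version of the addition-and-merging dynamics can be simulated by tracking only vertex "masses"; a minimal sketch (our illustration; the full models of [45,46] act on graphs, whereas this keeps only the aggregation bookkeeping):

```python
import random

def add_and_merge(steps=50000, n0=1000, seed=3):
    """Each step: add one unit-mass vertex, then merge two randomly chosen
    vertices. The population size is exactly conserved (+1 and -1 per step)
    and one unit of mass is injected per step."""
    random.seed(seed)
    masses = [1] * n0
    for _ in range(steps):
        masses.append(1)                      # addition of a new vertex
        i, j = random.sample(range(len(masses)), 2)
        masses[i] += masses[j]                # merging conserves mass
        masses.pop(j)
    return masses

masses = add_and_merge()
assert len(masses) == 1000                    # +1 and -1 per step
assert sum(masses) == 1000 + 50000            # one unit injected per step
```

A histogram of the resulting masses develops a broad, power-law-like tail, the analogue of the k^{−3/2} degree distribution discussed above.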
Conclusions

In this short overview we have given some ideas of how to understand complexity via the tool of branching processes. The main message has been that they are an excellent means of understanding "criticality" and "complexity" in many systems. We have concentrated on two particular applications, SOC and networks, to illustrate this. Many other important fields where BP-based ideas find use have been left out, from biology and the dynamics of species and molecules to geophysics and the spatial and temporal properties of, say, earthquakes. An example is the so-called ETAS model used for their modeling (see [48,49,50]).

Future Directions

The applications of branching processes to complex systems continue on various fronts, and one can predict interesting developments in a number of cases. First, as indicated, the dynamics of BPs on networks are not understood very well at all when it comes to the possible scenarios. A simple question is whether the usual language of absorbing-state phase transitions applies, and if not, why, and what the effect of the heterogeneous geometry of complex networks is. In many cases the structure of a network is a dynamic entity, which can be described by generalized branching processes, and there has so far been relatively little work in this direction. The inclusion of spatial effects and temporal memory dynamics is another interesting and important future avenue. Essentially, one searches for more complicated variants of the usual BPs in order to model avalanching systems, or cases where one wants to compute the typical time to reach an absorbing state and the related distribution. Or the question concerns the supercritical state (λ > λ_c) and the spreading from a seed; as noted in the networks section, this can be complicated by the presence of non-Poissonian temporal statistics. Another exciting future task is the description of non-Markovian phenomena, as when, for instance, the avalanche shape is nonsymmetric [51]. This indicates that there is an underlying mechanism which needs to be incorporated into the BP model.

Acknowledgments

We are grateful to our colleague Stefano Zapperi, with whom we have collaborated on topics related to networks, avalanches, and branching processes. This work was supported by the Academy of Finland through the Center of Excellence program (M.J.A.) and EUMETSAT's GRAS Satellite Application Facility (K.B.L.).

Bibliography
Primary Literature

1. Harris TE (1989) The Theory of Branching Processes. Dover, New York
2. Kimmel M, Axelrod DE (2002) Branching Processes in Biology. Springer, New York
3. Athreya KB, Ney PE (2004) Branching Processes. Dover Publications, Inc., Mineola
4. Haccou P, Jagers P, Vatutin VA (2005) Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press, Cambridge
5. Manna SS (1991) J Phys A 24:L363. In this two-state model, the energy takes the two stable values, zi = 0 (empty) and zi = 1 (particle). When zi ≥ zc, with zc = 2, the site relaxes by distributing two particles to two randomly chosen neighbors
6. Dickman R, Alava MJ, Munoz MA, Peltola J, Vespignani A, Zapperi S (2001) Phys Rev E 64:056104
7. Bak P, Tang C, Wiesenfeld K (1987) Phys Rev Lett 59:381; (1988) Phys Rev A 38:364
8. Tebaldi C, De Menech M, Stella AL (1999) Phys Rev Lett 83:3952
9. Stella AL, De Menech M (2001) Physica A 295:1001
10. Vespignani A, Dickman R, Munoz MA, Zapperi S (2000) Phys Rev E 62:4564
11. Dickman R, Munoz MA, Vespignani A, Zapperi S (2000) Braz J Phys 30:27
12. Alava M (2003) Self-Organized Criticality as a Phase Transition. In: Korutcheva E, Cuerno R (eds) Advances in Condensed Matter and Statistical Physics. Nova Publishers (2004), p 45; arXiv:cond-mat/0307688
13. Lubeck S (2004) Int J Mod Phys B 18:3977
14. Jensen HJ (1998) Self-Organized Criticality. Cambridge University Press, Cambridge
15. Alava MJ, Nukala PKNN, Zapperi S (2006) Statistical models of fracture. Adv Phys 55:349–476
16. Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47
17. Dorogovtsev SN, Mendes JFF (2003) Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford; (2002) Adv Phys 51:1079; (2004) arXiv:cond-mat/0404593
18. Bonachela JA, Chate H, Dornic I, Munoz MA (2007) Phys Rev Lett 98:115702
19. Feller W (1971) An Introduction to Probability Theory and its Applications, vol 2, 2nd edn. Wiley, New York
20. Asmussen S, Hering H (1983) Branching Processes. Birkhäuser, Boston
21. Weiss GH (1994) Aspects and Applications of the Random Walk. North-Holland, Amsterdam
22. Tang C, Bak P (1988) J Stat Phys 51:797
23. Dhar D, Majumdar SN (1990) J Phys A 23:4333
24. Janowsky SA, Laberge CA (1993) J Phys A 26:L973
25. Flyvbjerg H, Sneppen K, Bak P (1993) Phys Rev Lett 71:4087; de Boer J, Derrida B, Flyvbjerg H, Jackson AD, Wettig T (1994) Phys Rev Lett 73:906
26. Alstrøm P (1988) Phys Rev A 38:4905
27. Christensen K, Olami Z (1993) Phys Rev E 48:3361
28. García-Pelayo R (1994) Phys Rev E 49:4903
29. Zapperi S, Lauritsen KB, Stanley HE (1995) Phys Rev Lett 75:4071
30. Manna SS, Kiss LB, Kertész J (1990) J Stat Phys 61:923
31. Tadić B, Nowak U, Usadel KD, Ramaswamy R, Padlewski S (1992) Phys Rev A 45:8536
32. Tadić B, Ramaswamy R (1996) Phys Rev E 54:3157
33. Vespignani A, Zapperi S, Pietronero L (1995) Phys Rev E 51:1711
34. Lauritsen KB, Zapperi S, Stanley HE (1996) Phys Rev E 54:2483
35. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167
36. Yook SH, Jeong H, Barabasi AL, Tu Y (2001) Phys Rev Lett 86:5835
37. Barrat A, Barthelemy M, Pastor-Satorras R, Vespignani A (2004) Proc Natl Acad Sci USA 101:3747
38. Barrat A, Barthelemy M, Vespignani A (2004) Phys Rev Lett 92:228701
39. Pastor-Satorras R, Vespignani A (2001) Phys Rev Lett 86:3200
40. Dorogovtsev SN, Goltsev AV, Mendes JFF (2007) arXiv:0705.0010
41. Pastor-Satorras R, Vespignani A (2004) Evolution and Structure of the Internet: A Statistical Physics Approach. Cambridge University Press, Cambridge
42. Vazquez A, Balazs R, Andras L, Barabasi AL (2007) Phys Rev Lett 98:158702
43. Colizza V, Barrat A, Barthelemy M, Vespignani A (2006) PNAS 103:2015
44. Vazquez A (2006) Phys Rev Lett 96:038702
45. Kim BJ, Trusina A, Minnhagen P, Sneppen K (2005) Eur Phys J B 43:369
46. Alava MJ, Dorogovtsev SN (2005) Phys Rev E 71:036107
47. Hui Z, Zi-You G, Gang Y, Wen-Xu W (2006) Chin Phys Lett 23:275
48. Ogata Y (1988) J Am Stat Assoc 83:9
49. Saichev A, Helmstetter A, Sornette D (2005) Pure Appl Geophys 162:1113
50. Lippiello E, Godano C, de Arcangelis L (2007) Phys Rev Lett 98:098501
51. Zapperi S, Castellano C, Colaiori F, Durin G (2005) Nat Phys 1:46
Books and Reviews

Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47
Asmussen S, Hering H (1983) Branching Processes. Birkhäuser, Boston
Athreya KB, Ney PE (2004) Branching Processes. Dover Publications, Inc., Mineola
Feller W (1971) An Introduction to Probability Theory and its Applications, vol 2, 2nd edn. Wiley, New York
Haccou P, Jagers P, Vatutin VA (2005) Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press, Cambridge
Harris TE (1989) The Theory of Branching Processes. Dover, New York
Jensen HJ (1998) Self-Organized Criticality. Cambridge University Press, Cambridge
Kimmel M, Axelrod DE (2002) Branching Processes in Biology. Springer, New York
Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167
Weiss GH (1994) Aspects and Applications of the Random Walk. North-Holland, Amsterdam
Cellular Automata as Models of Parallel Computation

THOMAS WORSCH
Lehrstuhl Informatik für Ingenieure und Naturwissenschaftler, Universität Karlsruhe, Karlsruhe, Germany

Article Outline

Glossary
Definition of the Subject
Introduction
Time and Space Complexity
Measuring and Controlling the Activities
Communication in CA
Future Directions
Bibliography

Glossary

Cellular automaton: The classical fine-grained parallel model introduced by John von Neumann.
Hyperbolic cellular automaton: A cellular automaton resulting from a tessellation of the hyperbolic plane.
Parallel Turing machine: A generalization of Turing's classical model where several control units work cooperatively on the same tape (or set of tapes).
Time complexity: Number of steps needed for computing a result. Usually a function t : N+ → N+, t(n) being the maximum ("worst case") over all inputs of size n.
Space complexity: Number of cells needed for computing a result. Usually a function s : N+ → N+, s(n) being the maximum over all inputs of size n.
State change complexity: Number of proper state changes of cells during a computation. Usually a function sc : N+ → N+, sc(n) being the maximum over all inputs of size n.
Processor complexity: Maximum number of control units of a parallel Turing machine which are simultaneously active during a computation. Usually a function pc : N+ → N+, pc(n) being the maximum over all inputs of size n.
N+: The set {1, 2, 3, …} of positive natural numbers.
Z: The set {…, −3, −2, −1, 0, 1, 2, 3, …} of integers.
Q^G: The set of all (total) functions from a set G to a set Q.

Definition of the Subject

This article will explore the properties of cellular automata (CA) as a parallel model.
The Main Theme

We will first look at the standard model of CA and compare it with Turing machines as the standard sequential model, mainly from a computational complexity point of view. From there we will proceed in two directions: by removing computational power and by adding computational power in different ways, in order to gain insight into the importance of some ingredients of the definition of CA.

What is Left Out

There are topics which we will not cover although they would have fit under the title. One such topic is parallel algorithms for CA. There are algorithmic problems which make sense only for parallel models. Probably the most famous one for CA is the so-called Firing Squad Synchronization Problem. This is the topic of Umeo's article (Firing Squad Synchronization Problem in Cellular Automata), which can also be found in this encyclopedia. Another such topic in this area is the leader election problem. For CA it has received increased attention in recent years; see the paper by Stratmann and Worsch [29] and the references therein for more details. And we do want to mention the most exciting (in our opinion) CA algorithm: Tougne has designed a CA which, starting from a single point, after t steps has generated the discretized circle of radius t, for all t; see [5] for this gem.

There are also models which generalize standard CA by making the cells more powerful. Kutrib has introduced pushdown cellular automata [14]. As the name indicates, in this model each cell does not have a finite memory but can make use of a potentially unbounded stack of symbols. The area of nondeterministic CA is also not covered here. For results concerning formal language recognition with these devices refer to Cellular Automata and Language Theory. All these topics are, unfortunately, beyond the scope of this article.
Structure of the Paper

The core of this article consists of four sections:

Introduction: The main point is the standard definition of Euclidean deterministic synchronous cellular automata. Furthermore, some general aspects of parallel models and typical questions and problems are discussed.

Time and space complexity: After defining the standard computational complexity measures, we compare CA under different resource bounds. The comparison of CA with the Turing machine (TM) gives basic insights into their computational power.

Measuring and controlling activities: There are two approaches to measuring the "amount of parallelism" in CA. One is an additional complexity measure defined directly for CA, the other works via the definition of so-called parallel Turing machines. Both are discussed.

Communication: Here we have a look at "variants" of CA with communication structures other than the one-dimensional line. We sketch the proofs that some of these variants are in the second machine class.

Introduction

In this section we will first formalize the classical model of cellular automata, basically introduced by von Neumann [21]. Afterwards we will recap some general facts about parallel models.

Definition of Cellular Automata

There are several equivalent formalizations of CA, and of course one chooses the one most appropriate for the topics to be investigated. Our point of view will be that each CA consists of a regular arrangement of basic processing elements working in parallel while exchanging data. Below, for each of the words regular, basic, processing, parallel and exchanging, we first give the standard definition for clarification. Then we briefly point out possible alternatives, which will be discussed in more detail in later sections.

Underlying Grid

A cellular automaton (CA) consists of a set G of cells, where each cell has at least one neighbor with which it can exchange data. Informally speaking, one usually assumes a "regular" arrangement of cells and, in particular, identically shaped neighborhoods. For a d-dimensional CA, d ∈ N+, one can think of G = Z^d. Neighbors are specified by a finite set N of coordinate differences called the neighborhood. The cell i ∈ G has as its neighbors the cells i + n for all n ∈ N. Usually one assumes that 0 ∈ N. (Here we write 0 for a vector of d zeros.)
As long as one is not specifically interested in the precise role N is playing, one may assume some standard neighborhood: the von Neumann neighborhood of radius r is N^(r) = {(k_1, …, k_d) | Σ_j |k_j| ≤ r} and the Moore neighborhood of radius r is M^(r) = {(k_1, …, k_d) | max_j |k_j| ≤ r}. The choices of G and N determine the structure of what in a real parallel computer would be called the "communication network". We will usually consider the case G = Z^d and assume that the neighborhood is N = N^(1).

Discussion

The structure of connections between cells is sometimes defined using the concept of Cayley graphs. Refer to the article by Ceccherini-Silberstein (Cellular Automata and Groups), also in this encyclopedia, for details. Another approach is via regular tessellations. For example, the 2-dimensional Euclidean space can be tiled with copies of a square. These can be considered as cells, and cells sharing an edge are neighbors. Similarly one can tile, e.g., the hyperbolic plane with copies of a regular k-gon. This will be considered to some extent in Sect. "Communication in CA". A more thorough exposition can be found in the article by Margenstern (Cellular Automata in Hyperbolic Spaces), also in this encyclopedia. CA resulting, for example, from tessellations of the 2-dimensional Euclidean plane with triangles or hexagons are considered in the article by Bays (Cellular Automata in Triangular, Pentagonal and Hexagonal Tessellations), also in this encyclopedia.

Global and Local Configurations

The basic processing capabilities of each cell are those of a finite automaton. The set of possible states of each cell, denoted by Q, is finite. As inputs to be processed, each cell gets the states of all the cells in its neighborhood. We will write Q^G for the set of all functions from G to Q. Thus each c ∈ Q^G describes a possible global state of the whole CA. We will call these c (global) configurations. On the other hand, functions ℓ : N → Q are called local configurations. We say that in a configuration c cell i observes the local configuration c_{i+N} : N → Q, where c_{i+N}(n) = c(i + n). A cell gets its currently observed local configuration as input. It remains to be defined how they are processed.

Dynamics

The dynamics of a CA are defined by specifying the local dynamics of a single cell and how cells "operate in parallel" (if at all).
In the first sections we will consider the classical case: a local transition function is a function f : Q^N → Q prescribing for each local configuration ℓ ∈ Q^N the next state f(ℓ) of a cell which currently observes ℓ in its neighborhood. In particular, this means that we are considering deterministic behavior of cells. Furthermore, we will first concentrate on CA where all cells are working synchronously: the possible transitions from one configuration to the next one in one global step of the CA can be described by a function F : Q^G → Q^G requiring that all cells make one state transition:

  ∀ i ∈ G : F(c)(i) = f(c_{i+N}) .

For alternative definitions of the dynamic behavior of CA see Sect. "Measuring and Controlling the Activities".

Discussion

Basically, the above definition of CA is the standard one going back to von Neumann [21]; he used G = Z^2 and N = {(−1, 0), (1, 0), (0, −1), (0, 1)} for his construction. But for all of the aspects just defined there are other possibilities, some of which will be discussed in later sections.

Finite Computations on CA

In this article we are interested in using CA as devices for computing, given some finite input, in a finite number of steps a finite output.

Inputs

As the prototypical examples of problems to be solved by CA and other models we will consider the recognition of formal languages. This has the advantages that the inputs have a simple structure and, more importantly, that the output is only one bit (accept or reject) and can be formalized easily. A detailed discussion of CA as formal language recognizers can be found in the article by Kutrib (Cellular Automata and Language Theory).

The input alphabet will be denoted by A. We assume that A ⊆ Q. In addition there has to be a special state q ∈ Q which is called a quiescent state because it has the property that for the quiescent local configuration ℓ_q : N → Q : n ↦ q the local transition function must specify f(ℓ_q) = q. In the literature two input modes are usually considered.

Parallel input mode: For an input w = x_1 … x_n ∈ A^n the initial configuration c_w is defined as

  c_w(i) = x_j  iff i = (j, 0, …, 0)
  c_w(i) = q    otherwise.

Sequential input mode: In this case, all cells are in state q in the initial configuration. But cell (0, …, 0) is a designated input cell and acts differently from the others. It works according to a function g : Q^N × (A ∪ {q}) → Q. During the first n steps the input cell gets input symbol x_j in step j; after the last input symbol it always gets q. CA using this type of input are often called iterative arrays (IA).
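The definitions above can be sketched in a few lines of Python. This is an illustrative toy only (dictionary-backed configurations with finite support, a made-up local rule); the names `initial_configuration`, `global_step`, and the rule `f` are not from the article:

```python
# Sketch of the synchronous global step F(c)(i) = f(c_{i+N}) for a
# one-dimensional CA with neighborhood N = {-1, 0, 1}, together with
# the parallel input mode: input symbols in consecutive cells, the
# quiescent state q everywhere else. Configurations are stored as
# dicts from Z to non-quiescent states. All names are illustrative.

Q_QUIESCENT = "q"
N = (-1, 0, 1)

def initial_configuration(word):
    """Parallel input mode: c_w(i) = x_j iff i = j, q otherwise."""
    return {j + 1: x for j, x in enumerate(word)}

def global_step(c, f):
    """Apply the local rule f to every cell (all cells in parallel)."""
    support = set(c)
    # only cells adjacent to non-quiescent ones can change state
    candidates = {i + n for i in support for n in N}
    new_c = {}
    for i in candidates:
        local = tuple(c.get(i + n, Q_QUIESCENT) for n in N)
        state = f(local)
        if state != Q_QUIESCENT:
            new_c[i] = state
    return new_c

# Toy local rule: a cell copies its left neighbor, so the stored word
# moves one cell to the right per step. Note f(q, q, q) = q as required.
def f(local):
    left, _center, _right = local
    return left

c = initial_configuration("ab")
c = global_step(c, f)
print(sorted(c.items()))  # the word has moved one cell to the right
```

The finite-support representation mirrors the role of the quiescent state: since f(ℓ_q) = q, only finitely many cells can ever leave q, even though G is infinite.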
Unless otherwise noted we will always assume parallel input mode. Conceptually the difference between the two input modes has the same consequences as for TM. If input is provided sequentially, it is meaningful to have a look at computations which "use" fewer than n cells (see the definition of space complexity later on). Technically, some results hold only for the parallel input mode but not for the sequential one, or vice versa. This is the case, for example, when one looks at devices with small time bounds like n or n + √n steps. But as soon as one considers more generally Θ(n) or more steps and a space complexity of at least Θ(n), both CA and IA can simulate each other in linear time: an IA can first read the complete input word, storing it in successive cells, and then start simulating a CA, and a CA can shift the whole word to the cell holding the first input symbol and have it act as the designated input cell of an IA.

Outputs

Concerning output, one usually defines that a CA has finished its work whenever it has reached a stable configuration c, i.e., F(c) = c. In such a case we will also say that the CA halts (although formally one can continue to apply F). An input word w ∈ A+ is accepted iff the cell (1, 0, …, 0), which got the first input symbol, is in an accepting state from a designated finite subset F+ ⊆ Q of states. We write L(C) for the set of all words w ∈ A+ which are accepted by a CA C. For the sake of simplicity we will assume that all deterministic machines under consideration halt for all inputs; e.g., the CA as defined above always reaches a stable configuration. The sequence of all configurations from the initial one for some input w to the stable one is called the computation for input w.

Discussion

For 1-dimensional CA the definition of parallel input is the obvious one. For higher-dimensional CA, say G = Z^2, one could also think of more compact forms of input for one-dimensional words, e.g., inscribing the input symbols row by row into a square with side length ⌈√n⌉. But this requires extra care. Depending on the formal language to be accepted, the special way in which the symbols are input might provide additional information which is useful for the language recognition task at hand.

Since a CA performs work on an infinite number of bits in each step, it would also be possible to consider inputs and outputs of infinite length, e.g., as representations of all real numbers in the interval [0, 1]. There is much less literature about this aspect; see for example chapter 11 of [8]. It is not too surprising that this area is also related to the view of CA as dynamical systems (instead of computers); see the contributions by Formenti (Chaotic Behavior of Cellular Automata) and Kůrka (Topological Dynamics of Cellular Automata).

Example: Recognition of Palindromes

As an example that will also be useful later, consider the formal language Lpal of palindromes of odd length:

  Lpal = { v x v^R | v ∈ A* ∧ x ∈ A } .

(Here v^R is the mirror image of v.) For example, if A contains all Latin letters, saippuakauppias belongs to Lpal (the Finnish word for a soap dealer). It is known that each TM with only one tape and only one head on it (see Subsect. "Turing Machines" for a quick introduction) needs time Ω(n^2) for the recognition of at least some inputs of length n belonging to Lpal [10].

We will sketch a CA recognizing Lpal in time Θ(n). As the set of states for a single cell we use Q = A ∪ {q} ∪ (Q_l × Q_r × Q_v × Q_lr), basically subdividing each cell into 4 "registers", each containing a "substate". The substates from Q_l = {<a, <b, ⊥} and Q_r = {a>, b>, ⊥} are used to shift input symbols to the left and to the right respectively. In the third register a substate from Q_v = {+, −} indicates the results of comparisons. In the fourth register substates from Q_lr = {>, <, <+>, <−>, ⊥} are used to realize "signals" > and < which identify the middle cell and distribute the relevant overall comparison result to all cells. As accepting states one chooses those whose last component is <+>: F+ = Q_l × Q_r × Q_v × {<+>}. There is a total of 3 + 3·3·2·5 = 93 states, and for a complete definition of the local transition function one would have to specify f(x, y, z) for 93^3 = 804357 triples of states. We will not do that, but we will sketch some important parts.

In the first step the registers are initialized. For all x, y, z ∈ A:

  ℓ(−1)  ℓ(0)  ℓ(1)    f(ℓ)
  q      y     z       (<y, y>, +, >)
  x      y     z       (<y, y>, +, ⊥)
  x      y     q       (<y, y>, +, <)
  q      y     q       (<y, y>, +, <+>)

In all later steps, if

  ℓ(−1) = (<x_l, x_r>, v_l, d_l)
  ℓ(0)  = (<y_l, y_r>, v_m, d_m)
  ℓ(1)  = (<z_l, z_r>, v_r, d_r)

then f(ℓ) = (<z_l, x_r>, v'_m, d'_m). Here, of course, the new value of the third register is computed as

  v'_m = +  if v_m = + ∧ z_l = x_r
  v'_m = −  otherwise.

We do not describe the computation of d'_m in detail. Figure 1 shows the computation for the input babbbab, which is a palindrome. Horizontal double lines separate configurations at subsequent time steps. Registers in state ⊥ are simply left empty. As can be seen, there is a triangle in the space-time diagram, consisting of the n input cells at time t = 0 and shrinking at both ends by one cell in each subsequent step, where "a lot of activity" can happen due to the shifting of the input symbols in both directions.

Cellular Automata as Models of Parallel Computation, Figure 1  Recognition of a palindrome; the last configuration is stable and the cell which initially stored the first input symbol is in an accepting state
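The core mechanism of the construction — input symbols streaming left and right past the middle cell, which compares the two arriving symbols in every step — can be sketched directly. This is a simplified simulation of that shifting idea only, not the full 93-state automaton with its signals:

```python
# Simplified sketch of the shifting idea behind the palindrome CA:
# every cell carries an l-register (contents moving left) and an
# r-register (contents moving right); the middle cell compares the two
# symbols it observes in each step. After floor(n/2) steps all
# comparisons word[mid+t] vs word[mid-t] have been made, i.e. time O(n).

def is_odd_palindrome(word):
    n = len(word)
    if n % 2 == 0:
        return False
    left = list(word)    # l-registers: shift one cell to the left per step
    right = list(word)   # r-registers: shift one cell to the right per step
    mid = n // 2
    ok = True
    for _ in range(mid):
        left = left[1:] + [None]      # shift left
        right = [None] + right[:-1]   # shift right
        # the middle cell compares the two symbols currently passing it
        if left[mid] != right[mid]:
            ok = False
    return ok

print(is_odd_palindrome("saippuakauppias"))  # True
print(is_odd_palindrome("babbbab"))          # True
print(is_odd_palindrome("babbbaa"))          # False
```

In step t the middle cell sees word[mid+t] in its l-register and word[mid−t] in its r-register, which is exactly the pair a one-head TM would need Ω(n) head movements to bring together.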
Clearly a two-head TM can also recognize Lpal in linear time, by first moving one head to the last symbol and then synchronously shifting both heads towards each other, comparing the symbols read. Informally speaking, in this case the ability of multi-head TM to transport a small amount of information over a long distance in one step can be "compensated" by CA by shifting a large amount of information over a short distance. We will see in Theorem 3 that this observation can be generalized.

Complexity Measures: Time and Space

For one-dimensional CA it is straightforward to define their time and space complexity. We will consider only worst-case complexity. Remember that we assume that all CA halt for all inputs (reach a stable configuration).

For w ∈ A+ let time′(w) denote the smallest number of steps after which the CA, started from the initial configuration for input w, has reached a stable configuration. Then

  time : N+ → N+ : n ↦ max{ time′(w) | w ∈ A^n }

is called the time complexity of the CA. Similarly, let space′(w) denote the total number of cells which are not quiescent in at least one configuration occurring during the computation for input w. Then

  space : N+ → N+ : n ↦ max{ space′(w) | w ∈ A^n }

is called the space complexity of the CA. If we want to mention a specific CA C, we indicate it as an index, e.g., time_C.

If s and t are functions N+ → N+, we write CA-SPC(s)-TIME(t) for the set of formal languages which can be accepted by some CA C with space_C ≤ s and time_C ≤ t, and analogously CA-SPC(s) and CA-TIME(t) if only one complexity measure is bounded. Thus we only look at upper bounds. For a whole set T of functions, we will use the abbreviation

  CA-TIME(T) = ∪_{t∈T} CA-TIME(t) .

Typical examples will be T = O(n) or T = Pol(n), where in general Pol(f) = ∪_{k∈N+} O(f^k). Resource-bounded complexity classes for other computational models will be denoted similarly. If we want to make the dimension of the CA explicit, we write Z^d-CA-…; if the prefix Z^d is missing, d = 1 is to be assumed. Throughout this article n will always denote the length of input words. Thus a time complexity of Pol(n) simply means polynomial time, and similarly for space, so that TM-TIME(Pol(n)) = P and TM-SPC(Pol(n)) = PSPACE.

Discussion

For higher-dimensional CA the definition of space complexity requires more consideration. One possibility is to count the number of cells used during the computation. A different, but sometimes more convenient, approach is to count the number of cells in the smallest hyper-rectangle comprising all used cells.

Turing Machines

For reference, and because we will consider a parallel variant, we set forth some definitions of Turing machines. In general we allow Turing machines with k work tapes and h heads on each of them. Each square carries a symbol from the tape alphabet B, which includes the blank symbol ␣. The control unit (CU) is a finite automaton with set of states S. The possible actions of a deterministic TM are described by a function f : S × B^kh → S × B^kh × D^kh, where D = {−1, 0, +1} is used for indicating the direction of movement of a head. If the machine reaches a situation in which f(s, b_1, …, b_kh) = (s, b_1, …, b_kh, 0, …, 0) for the current state s and the currently scanned symbols b_1, …, b_kh, we say that it halts.

Initially a word w of length n over the input alphabet A ⊆ B is written on the first tape on squares 1, …, n; all other tape squares are empty, i.e., carry the ␣. An input is accepted if the CU halts in an accepting state from a designated subset F+ ⊆ S. L(T) denotes the formal language of all words accepted by a TM T. We write kT^h-TM-SPC(s)-TIME(t) for the class of all formal languages which can be recognized by TM T with k work tapes and h heads on each of them which have a space complexity space_T ≤ s and time_T ≤ t. If k and/or h is missing, 1 is assumed instead. If the whole prefix kT^h is missing, 1T^1 is assumed. If arbitrary k and h are allowed, we write *T*.

Sequential Versus Parallel Models

Today quite a number of different computational models are known which intuitively look as if they are parallel. Several years ago van Emde Boas [6] observed that many of these models have one property in common: the problems that can be solved in polynomial time on such a model P coincide with the problems that can be solved in polynomial space on Turing machines:

  P-TIME(Pol(n)) = TM-SPC(Pol(n)) = PSPACE .
Here we have chosen P as an abbreviation for a "parallel" model. Models P satisfying this equality are by definition the members of the so-called second machine class. On the other hand, the first machine class is formed by all models S satisfying the relation

  S-SPC(s)-TIME(t) = TM-SPC(Θ(s))-TIME(Pol(t))

at least for some reasonable functions s and t. We deliberately avoid making this more precise. In general there is consensus on which models are in these machine classes. We do want to point out that the naming of the two machine classes does not mean that they are different or even disjoint; this is not known. For example, if P = PSPACE, it might be that the classes coincide. Furthermore there are models, e.g., Savitch's NLPRAM [25], which might be in neither machine class. Another observation is that in order to possibly classify a machine model, it obviously has to have something like "time complexity" and/or "space complexity". This may sound trivial, but we will see in Subsect. "Parallel Turing Machines" that, for example, for so-called parallel Turing machines with several work tapes it is in fact not.

Time and Space Complexity

Comparison of Resource Bounded One-Dimensional CA

It is clear that time and space complexity for CA are Blum measures [2] and hence infinite hierarchies of complexity classes exist. It follows from the more general Theorem 9 for parallel Turing machines that the following holds:

Theorem 1  Let s and t be two functions such that s is fully CA-space-constructible in time t and t is CA-computable in space s and time t. Then:

  ∪_{φ ∉ O(1)} CA-SPC(Θ(s/φ))-TIME(Θ(t/φ)) ⊊ CA-SPC(O(s))-TIME(O(t))
  CA-SPC(o(s))-TIME(o(t)) ⊊ CA-SPC(O(s))-TIME(O(t))
  CA-TIME(o(t)) ⊊ CA-TIME(O(t)) .

The second and third inclusions are simple corollaries of the first one. We do not go into the details of the definition of CA constructibility, but note that for hierarchy results for TM one sometimes needs analogous additional conditions. For details, interested readers are referred to [3,12,17].
We not that for CA the situation is better than for deterministic TM: there one needs f (n) log f (n) 2 o(g(n)) in order to prove TM TIME( f ) ¤ TM T IME(g). Open Problem 2 For the proper inclusions in Theorem 1 the construction used in [34] really needs to increase the space used in order to get the time hierarchy. It is an open problem whether there also exists a time hierarchy if the space complexity is ﬁxed, e. g., as s(n) D n. It is even an open problem to prove or disprove that the inclusion CA SPC(n) TIME (n) CA SPC(n) TIME (2O(n) ) : is proper or not. We will come back to this topic in Subsect. “Parallel Turing Machines” on parallel Turing machines. Comparison with Turing Machines It is wellknown that a TM with one tape and one head on that tape can be simulated by a onedimensional CA. See for example the paper by Smith [27]. But even multitape TM can be simulated by a onedimensional CA without any signiﬁcant loss of time. Theorem 3 For all space bounds s(n) n and all time bounds t(n) n the following holds for onedimensional CA and TM with an arbitrary number of heads on its tapes: T TM SPC(s) TIME(t) CA SPC(s) TIME(O(t)) : Sketch of the Simulation We ﬁrst describe a simulation for 1T 1 TM. In this case the actions of the TM are of the form s; b ! s 0 ; b0 ; d where s; s 0 2 S are old and new state, b; b0 2 B old and new tape symbol and d 2 f1; 0; C1g the direction of head movement. The simulating CA uses three substates in each cell, one for a TM state, one for a tape symbol, and an additional one for shifting tape symbols: Q D Q S Q T Q M . We use Q S D S [ f g and a substate of means, that the cell does not store a state. Similarly Q T D B [ f < ı; ı > g and a substate of < ı or ı > means that there is no symbol stored but a “hole” to be ﬁlled with an adjacent symbol. Substates from Q S D B f < ; > g, like < b and b > , are used for shifting symbols from one cell to the adjacent one to the left or right. 
Instead of moving the state one cell to the right or left whenever the TM moves its head, the tape contents as stored in the CA are shifted in the opposite direction. Assume for example that the TM performs the following actions: s0,d ! s1, d0 , + 1
303
304
Cellular Automata as Models of Parallel Computation
In other words, the CA transports “a large amount of information over a short distance in one step”. Theorem 3 says that this ability is at least as powerful as the ability of multihead TM to transport “a small amount of information over a long distance in one step”. Open Problem 4 The question remains whether some kind of converse also holds and in Theorem 3 an D sign would be correct instead of the , or whether CA are more powerful, i. e., a ¨ sign would be correct. This is not known. The best simulation of CA by TM that is known is the obvious one: states of neighboring cells are stored on adjacent tape squares. For the simulation of one CA step the TM basically makes one sweep across the complete tape segment containing the states of all nonquiescent cells updating them one after the other. As a consequence one gets Cellular Automata as Models of Parallel Computation, Figure 2 Shifting tape contents step by step
s1,e ! s2, e0 , 1 s2,d0 ! s3, d00 , 1 s3,c ! s4, c0 , + 1 Figure 2 shows how shifting the tape in direction d can be achieved by sending the current symbol in that direction and sending a “hole” ı in the opposite direction d. It should be clear that the required state changes of each cell depend only on information available in its neighborhood. A consequence of this approach to incrementally shift the tape contents is that it takes an arbitrary large number of steps until all symbols have been shifted. On the other hand, after only two steps the cell simulating the TM control unit has information about the next symbol visited and can simulate the next TM step and initialize the next tape shift. Clearly the same approach can be used if one wants to simulate a TM with several tapes, each having one head. For each additional tape the CA would use two additional registers analogously to the middle and bottom row used in Fig. 2 for one tape. Stoß [28] has proved that kT h TM (h heads on each tape) can be simulated by (kh)T TM (only one head on each tape) in linear time. Hence there is nothing left to prove. Discussion As one can see in Fig. 2, in every second step one signal is sent to the left and one to the right. Thus, if the TM moves its head a lot and if the tape segment which has to be shifted is already long, many signals are traveling simultaneously.
Theorem 5 For all space bounds s(n) ≥ n and all time bounds t(n) ≥ n:

CA-SPC(s)-TIME(t) ⊆ TM-SPC(s)-TIME(O(s·t)) ⊆ TM-SPC(s)-TIME(O(t²)).

The construction proving the first inclusion needs only a one-head TM, and no possibility is known to take advantage of more heads. The second inclusion follows from the observation that in order to use an initially blank tape square, a TM must move one of its heads there, which requires time. Thus s ∈ O(t).

Taking Theorems 3 and 5 together, one immediately gets:

Corollary 6 Cellular automata are in the first machine class.

It is not known whether CA are in the second machine class. In this regard they are “more like” sequential models. The reason for this is the fact that the number of active processing units only grows polynomially with the number of steps in a computation. In Sect. “Communication in CA” variations of the standard CA model will be considered where this is different.

Measuring and Controlling the Activities

Parallel Turing Machines

One possible way to make a parallel model from Turing machines is to allow several control units (CUs), but with all of them working on the same tape (or tapes). This model can be traced back at least to a paper by Hemmerling [9], who called it Systems of Turing automata. A few years later Wiedermann [32] coined the term Parallel Turing machine (PTM).
We consider only the case where there is only one tape and each of the control units has only one head on that tape. As for sequential TM, we usually drop the prefix 1T1 for PTM, too. Readers interested in the case of PTM with multi-head CUs are referred to [33].

PTM with One-Head Control Units

The specification of a PTM includes a tape alphabet B with a blank symbol and a set S of possible states for each CU. A PTM starts with one CU on the first input symbol, as a sequential 1T1-TM does. During the computation the number of control units may increase and decrease, but all CUs always work cooperatively on one common tape. The idea is to have the CUs act independently unless they are “close” to each other, retaining the idea of only local interactions, as in CA.

A configuration of a PTM is a pair c = (p, b). The mapping b : ℤ → B describes the contents of the tape. Let 2^S denote the power set of S. The mapping p : ℤ → 2^S describes for each tape square i the set of states of the finite automata currently visiting it. In particular, this formalization means that it is not possible to distinguish two automata on the same square and in the same state: the idea is that because of that they will always behave identically and hence need not be distinguished.

The mode of operation of a PTM is determined by the transition function f : 2^S × B → 2^(S×D) × B, where D is the set {−1, 0, 1} of possible movements of a control unit. In order to compute the successor configuration c′ = (p′, b′) of a configuration c = (p, b), f is simultaneously computed for all tape positions i ∈ ℤ. The arguments used are the set of states of the finite automata currently visiting square i and its tape symbol. Let (M′_i, b′_i) = f(p(i), b(i)). Then the new symbol on square i in configuration c′ is b′(i) = b′_i. The set of finite automata on square i is replaced by a new set of finite automata (defined by M′_i ⊆ S × D), each of which changes the tape square according to the indicated direction of movement.
Therefore p′(i) = { q | (q, +1) ∈ M′_(i−1) ∨ (q, 0) ∈ M′_i ∨ (q, −1) ∈ M′_(i+1) }. Thus f induces a global transition function F mapping global configurations to global configurations.

In order to make the model useful (and to come up to some intuitive expectations) it is required that CUs cannot arise “out of nothing” and that the symbol on a tape square can change only if it is visited by at least one CU. In other words we require that ∀ b ∈ B : f(∅, b) = (∅, b). Observe that the number of finite automata on the tape may change during a computation. Automata may vanish, for example if f({s}, b) = (∅, b), and new automata may be generated, for example if f({s}, b) = ({(q, 1), (q′, 0)}, b).
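The global step induced by f can be written down directly. The sketch below is our own rendering, not from the article: configurations are stored sparsely as dictionaries, and the names `ptm_step` and `BLANK` are ours.

```python
BLANK = ' '

def ptm_step(p, b, f):
    """One global step of a PTM with one-head control units.
    p: dict mapping tape square -> frozenset of CU states visiting it
    b: dict mapping tape square -> symbol (BLANK where absent)
    f: local transition, f(states, symbol) = (set of (state, move), symbol)
       with move in {-1, 0, +1}; the restriction f(frozenset(), sym) == (set(), sym)
       lets us evaluate f only on squares where CUs are present."""
    moved, b_new = {}, dict(b)
    for i, states in p.items():
        new_cus, sym = f(states, b.get(i, BLANK))
        b_new[i] = sym                       # a square changes only if visited
        for q, d in new_cus:                 # each new CU moves to square i + d
            moved.setdefault(i + d, set()).add(q)
    p_new = {i: frozenset(s) for i, s in moved.items() if s}
    return p_new, b_new
```

Note how CUs arriving on the same square in the same state merge into one set element, exactly as the formalization above prescribes.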
For the recognition of formal languages we define the initial configuration c_w for an input word w ∈ A⁺ as the one in which w is written on the otherwise blank tape on squares 1, 2, …, |w|, and in which there is exactly one finite automaton in an initial state q₀ on square 1. A configuration (p, b) of a PTM is called accepting iff it is stable (i.e., F((p, b)) = (p, b)) and p(1) ⊆ F₊. The language L(P) recognized by a PTM P is the set of input words for which it reaches an accepting configuration.

Complexity Measures for PTM

Time complexity of a PTM can be defined in the obvious way. For space complexity, one counts the total number of tape squares which are used in at least one configuration. Here we call a tape square i unused in a configuration c = (p, b) if p(i) = ∅ and b(i) is the blank symbol; otherwise it is used.

What makes PTM interesting is the definition of its processor complexity. Let proc′(w) denote the maximum number of CUs which exist simultaneously in a configuration occurring during the computation for input w, and define proc : ℕ₊ → ℕ₊ : n ↦ max{proc′(w) | w ∈ Aⁿ}. For complexity classes we use the notation PTM-SPC(s)-TIME(t)-PROC(p) etc.

The processor complexity is one way to measure (an upper bound on) “how many activities” happened simultaneously. It should be clear that at the lower end one has the case of constant proc(n) = 1, which means that the PTM is in fact (equivalent to) a sequential TM. The other extreme is to have CUs “everywhere”. In that case proc(n) ∈ Θ(space(n)), and one basically has a CA. In other words, processor complexity measures the amount of parallelism of a PTM.
Theorem 7 For all space bounds s and time bounds t:

PTM-SPC(s)-TIME(t)-PROC(1) = TM-SPC(s)-TIME(t)
PTM-SPC(s)-TIME(t)-PROC(s) = CA-SPC(s)-TIME(O(t)).

Under additional constructibility conditions it is even possible to get a generalization of Theorem 5:

Theorem 8 For all functions s(n) ≥ n, t(n) ≥ n, and h(n) ≥ 1, where h is fully PTM processor constructible in space s, time t, and with h processors:

CA-SPC(O(s))-TIME(O(t)) ⊆ PTM-SPC(O(s))-TIME(O(st/h))-PROC(O(h)).

Decreasing the processor complexity indeed leads to the expected slowdown.
Relations Between PTM Complexity Classes (part 1)

The interesting question now is whether different upper bounds on the processor complexity result in different computational power. In general that is not the case, as PTM with only one CU are TM and hence computationally universal. (As a side remark we note that therefore processor complexity cannot be a Blum measure. In fact that should be more or less clear since, e.g., deciding whether a second CU will ever be generated might require finding out whether the first CU, i.e., a TM, ever reaches a specific state.)

In this first part we consider the case where two complexity measures are allowed to grow in order to get a hierarchy. Results which only need one growing measure are the topic of the second part. First of all it turns out that for fixed processor complexity between log n and s(n) there is a space/time hierarchy:

Theorem 9 Let s and t be two functions such that s is fully PTM space constructible in time t and t is PTM computable in space s and time t, and let h ≥ log. Then:

∪_{φ ∉ O(1)} PTM-SPC(Θ(s/φ))-TIME(Θ(t/φ))-PROC(O(h)) ⊊ PTM-SPC(O(s))-TIME(O(t))-PROC(O(h)).

The proof of this theorem applies the usual idea of diagonalization. Technical details can be found in [34].

Instead of keeping processor complexity fixed and letting space complexity grow, one can also do the opposite. As for analogous results for TM, one needs the additional restriction to one fixed tape alphabet. One gets the following result, where the complexity classes carry the additional information about the size of the tape alphabet.

Theorem 10 Let s, t and h be three functions such that s is fully PTM space constructible in time t and with h processors, and that t and h are PTM computable in space s and time t and with h processors, such that in all cases the tape is not written. Let b ≥ 2 be the size of the tape alphabet. Then:

∪_{φ ∉ O(1)} PTM-SPC(s)-TIME(Θ(t/φ))-PROC(h/φ)-ALPH(b) ⊊ PTM-SPC(s)-TIME(Θ(s·t))-PROC(Θ(h))-ALPH(b).

Again we do not go into the details of the constructibility definitions, which can be found in [34]. The important point here is that one can prove that increasing time
and processor complexity by a nonconstant factor does increase the (language recognition) capabilities of PTM, even if the space complexity is fixed, provided that one does not allow any changes to the tape alphabet. In particular the theorem holds for the case space(n) = n.

It is now interesting to reconsider Open Problem 2. Let us assume that

CA-SPC(n)-TIME(n) = CA-SPC(n)-TIME(2^O(n)).

One may choose φ(n) = log n, t(n) = 2^(n/log n) and h(n) = n in Theorem 10. Using that together with Theorem 7, the assumption would give rise to

PTM-SPC(n)-TIME(2^(n/log n)/log n)-PROC(n/log n)-ALPH(b)
  ⊊ PTM-SPC(n)-TIME(n·2^(n/log n))-PROC(n)-ALPH(b)
  = PTM-SPC(n)-TIME(n)-PROC(n)-ALPH(b).

If the polynomial time hierarchy for n-space bounded CA collapses, then there are languages which cannot be recognized by PTM in almost exponential time with n/log n processors but which can be recognized by PTM with n processors in linear time, if the tape alphabet is fixed.

Relations Between PTM Complexity Classes (part 2)

One can get rid of the fixed alphabet condition by using a combinatorial argument for a specific formal language (instead of diagonalization), and even have not only the space but also the processor complexity fixed and still get a time hierarchy. The price to pay is that the range of time bounds is more restricted than in Theorem 10.

Consider the formal language

L_vv = { v c^|v| v | v ∈ {a, b}⁺ }.

It contains all words which can be divided into three segments of equal length such that the first and third are identical. Intuitively, whatever type of machine is used for recognition, it is unavoidable to “move” the complete information from one end to the other. L_vv shares this feature with L_pal. Using a counting argument inspired by Hennie’s concept of crossing sequences [10] applied to L_pal, one can show:

Lemma 11 ([34]) If P is a PTM recognizing L_vv, then time²_P · proc_P ∈ Ω(n³ / log² n).
On the other hand, one can construct a PTM recognizing L_vv with processor complexity n^a for sufficiently nice a:

Lemma 12 ([34]) For each a ∈ ℚ with 0 < a < 1:

L_vv ∈ PTM-SPC(n)-TIME(Θ(n^(2−a)))-PROC(Θ(n^a)).

Putting these lemmas together yields another hierarchy theorem:

Theorem 13 For rational numbers 0 < a < 1 and 0 < ε < 3/2 − a/2:

PTM-SPC(n)-TIME(Θ(n^(3/2−a/2−ε)))-PROC(Θ(n^a)) ⊊ PTM-SPC(n)-TIME(Θ(n^(2−a)))-PROC(Θ(n^a)).

Hence, for a close to 1, a “small” increase in time by some factor n^ε suffices to increase the recognition power of PTM, while the processor complexity is fixed at n^a and the space complexity is fixed at n as well.

Open Problem 14 For the recognition of L_vv there is a gap between the lower bound of time²_P · proc_P ∈ Ω(n³ / log² n) in Lemma 11 and the upper bound of time²_P · proc_P ∈ O(n^(4−a)) in Lemma 12. It is not known whether the upper or the lower bound or both can be improved. An even more difficult problem is to prove a similar result for the case a = 1, i.e., cellular automata, as mentioned in Open Problem 2.

State Change Complexity

In CMOS technology, what costs most of the energy is to make a proper state change, from zero to one or from one to zero. Motivated by this fact, Vollmar [30] introduced the state change complexity for CA. There are two variants based on the same idea: Given a halting CA computation for an input w and a cell i, one can count the number of time points τ, 1 ≤ τ ≤ time′(w), such that cell i is in different states at times τ − 1 and τ. Denote that number by change′(w, i). Define

maxchg′(w) = max_{i∈G} change′(w, i)    and    sumchg′(w) = Σ_{i∈G} change′(w, i)

and

maxchg(n) = max{maxchg′(w) | w ∈ Aⁿ},    sumchg(n) = max{sumchg′(w) | w ∈ Aⁿ}.

For the language L_vv, which already played a role in the previous subsection, one can show:

Lemma 15 Let f(n) be a nondecreasing function which is not in O(log n), i.e., lim_{n→∞} log n / f(n) = 0. Then any CA C recognizing L_vv makes a total of at least Ω(n² / f(n)) state changes in the segment containing the n input cells and n cells to the left and to the right of them. In particular, if time_C ∈ Θ(n), then sumchg_C ∈ Ω(n² / f(n)). Furthermore maxchg_C ∈ Ω(n / f(n)).
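For a concrete CA computation the two measures can be evaluated directly from the definitions. The following sketch is our own illustration (rule 90 is an arbitrary example rule, and the finite segment with quiescent boundary is a simplification):

```python
def state_change_complexity(init, rule, steps, quiescent=0):
    """Run a radius-1 CA on a finite segment (quiescent boundary) for the
    given number of steps and return (maxchg', sumchg') for this input:
    the maximum number of state changes of any single cell, and the total
    number of state changes summed over all cells."""
    cells = list(init)
    changes = [0] * len(cells)
    for _ in range(steps):
        padded = [quiescent] + cells + [quiescent]
        nxt = [rule(padded[i], padded[i + 1], padded[i + 2])
               for i in range(len(cells))]
        changes = [ch + (old != new)
                   for ch, old, new in zip(changes, cells, nxt)]
        cells = nxt
    return max(changes), sum(changes)

rule90 = lambda l, c, r: l ^ r      # an arbitrary example rule
```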
In the paper by Sanders et al. [24] a generalization of Lemma 15 to d-dimensional CA is proved.

Open Problem 16 While the processor complexity of PTM measures how many activities happen simultaneously “across space”, state change complexity measures how many activities happen over time. For both cases we have made use of the same formal language in proofs. That might be an indication that there are connections between the two complexity measures. But no nontrivial results are known until now.

Asynchronous CA

Until now we have only considered one global mode of operation: the so-called synchronous case, where in each global step of the CA all cells must update their states synchronously. Several models have been considered where this requirement has been relaxed. Generally speaking, asynchronous CA are characterized by the fact that in one global step of the CA some cells are active and do update their states (all according to the same local transition function) while others do nothing, i.e., remain in the same state as before. There are then different approaches to specifying restrictions on which cells may be active or not.

Asynchronous update mode. The simplest possibility is to not quantify anything and to say that a configuration c′ is a legal successor of configuration c, denoted c ⊢ c′, iff for all i ∈ G one has c′(i) = c(i) or c′(i) = f(c_(i+N)).

Unordered sequential update mode. In this special case it is required that there is at most one active cell in each global step, i.e., card({ i | c′(i) ≠ c(i) }) ≤ 1.

Since CA with an asynchronous update mode are no longer deterministic from a global (configuration) point of view, it is not completely clear how to define, e.g., formal language recognition and time complexity. Of course one could follow the way it is done for nondeterministic TM. To the best of our knowledge this has not been considered for
asynchronous CA. (There are results for general nondeterministic CA; see for example Cellular Automata and Language Theory.)

It should be noted that Nakamura [20] has provided a very elegant construction for simulating a CA C_s with synchronous update mode on a CA C_a with one of the above asynchronous update modes. Each cell stores the “current” and the “previous” state of a C_s cell before its last activation, together with a counter value T modulo 3 (so Q_a = Q_s × Q_s × {0, 1, 2}). The local transition function f_a is defined in such a way that an activated cell does the following; T always indicates, modulo 3, how often a cell has already been updated. If the counters of all neighbors have value T or T + 1, the current C_s state of the cell is remembered as previous state and a new current state is computed according to f_s from the current and previous C_s states of the neighbors; the selection between current and previous state depends on the counter value of that neighbor. In this case the counter is incremented. If the counter of at least one neighboring cell is at T − 1, the activated cell keeps its complete state as it is. Therefore, if one does want to gain something using asynchronous CA, their local transition functions would have to be designed for that specific usage.

Recently, interest has increased considerably in CA where the “degree of (a)synchrony” is quantified via probabilities. In these cases one considers CA with only a finite number of cells.

Probabilistic update mode. Let 0 ≤ α ≤ 1 be a probability. In probabilistic update mode each legal global step c ⊢ c′ of the CA is assigned a probability by requiring that each cell i independently has probability α of updating its state.

Random sequential update mode. This is the case when in each global step one of the cells in G is chosen uniformly at random and its state is updated, while all others do not change their state. CA operating in this mode are called fully asynchronous by some authors.
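Nakamura's construction can be made concrete. The sketch below is our own rendering for a ring of cells and a radius-1 rule f_s; all names are ours. Each cell holds the triple (current, previous, T) described above, and an activated cell either advances by one simulated synchronous step or, if some neighbor lags behind, does nothing.

```python
def nakamura_step(cells, i, f_s):
    """Activate cell i of an asynchronous CA simulating a synchronous CA
    with radius-1 local rule f_s(left, self, right) on a ring.
    Each cell is a mutable list [current, previous, T], T a counter mod 3."""
    n = len(cells)
    left, me, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
    T = me[2]
    # A lagging neighbor (counter T - 1 mod 3) blocks the update.
    if any(nb[2] == (T - 1) % 3 for nb in (left, right)):
        return
    # A neighbor one step ahead (counter T + 1 mod 3) contributes its
    # *previous* state; a neighbor in sync contributes its current one.
    view = lambda nb: nb[0] if nb[2] == T else nb[1]
    new = f_s(view(left), me[0], view(right))
    me[1], me[0], me[2] = me[0], new, (T + 1) % 3

def sync_step(states, f_s):
    """Reference synchronous step on a ring, for comparison."""
    n = len(states)
    return [f_s(states[(i - 1) % n], states[i], states[(i + 1) % n])
            for i in range(n)]

rule90 = lambda l, c, r: l ^ r      # an arbitrary example rule
```

Whatever the activation order, once every cell has been updated k times its current state equals the state after k synchronous steps of f_s.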
These models can be considered special cases of what is usually called probabilistic or stochastic CA. For these CA the local transition function is no longer a map from Q^N to Q, but from Q^N to [0, 1]^Q. For each ℓ ∈ Q^N the value f(ℓ) is a probability distribution for the next state (satisfying Σ_{q∈Q} f(ℓ)(q) = 1). There are only very few papers about formal language recognition with probabilistic CA; see [18].
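A global step of such a stochastic CA simply samples each cell's next state independently from the distribution f(ℓ) of its neighborhood. The sketch below is our own illustration; the noisy variant of rule 90 with error probability eps is a made-up example, not a rule from the article.

```python
import random

def stochastic_ca_step(cells, f, rng):
    """One step of a stochastic CA on a ring: the local function f maps a
    neighborhood (l, c, r) to a dict {state: probability} summing to 1,
    and every cell samples its next state independently."""
    n = len(cells)
    new = []
    for i in range(n):
        dist = f(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
        states, probs = zip(*dist.items())
        new.append(rng.choices(states, weights=probs)[0])
    return new

def noisy_rule90(l, c, r, eps=0.1):
    """Rule 90 perturbed with error probability eps (made-up example)."""
    q = l ^ r
    return {q: 1.0 - eps, 1 - q: eps}
```

With eps = 0 the step is deterministic and coincides with ordinary rule 90.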
On the other hand, probabilistic update modes have received some attention recently. See for example [22] and the references therein. Development of this area is still at its beginning. Until now, mainly specific local rules have been investigated; for an exception see [7].

Communication in CA

Until now we have considered only one-dimensional Euclidean CA, where one bit of information can reach O(t) cells in t steps. In this section we will have a look at a few possibilities for changing the way cells communicate in a CA. First we have a quick look at CA where the underlying grid is ℤ^d. The topic of the second subsection is CA where the cells are connected to form a tree.

Different Dimensionality

In ℤ^d CA with, e.g., von Neumann neighborhood of radius 1, a cell has the potential to influence O(t^d) cells in t steps. This is a polynomial number of cells. It comes as no surprise that

ℤ^d-CA-SPC(s)-TIME(t) ⊆ TM-SPC(Pol(s))-TIME(Pol(t))

and hence ℤ^d CA are in the first machine class. One might only wonder why for the TM a space bound of Θ(s) might not be sufficient. This is due to the fact that the shape of the cells actually used by the CA might have an “irregular” structure and that the TM has to perform some bookkeeping or simulate a whole (hyper)rectangle of cells encompassing all that are really used by the CA.

Trivially, a d-dimensional CA can be simulated on a d′-dimensional CA, where d′ > d. The question is how much one loses when decreasing the dimensionality. The currently known best result in this direction is by Scheben [26]:

Theorem 17 It is possible to simulate a d′-dimensional CA with running time t on a d-dimensional CA, d < d′, with running time and space O(t^(2 ld⌈d′/d⌉)).
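The O(t^d) count at the start of this subsection is just the size of the radius-t von Neumann ball, i.e., the number of lattice points of ℤ^d within L1-distance t of a cell. It has a simple closed form, which the following sketch (our illustration) computes:

```python
from math import comb

def von_neumann_ball(d, t):
    """Number of cells of Z^d within L1-distance t of a fixed cell, i.e.
    everything a signal can influence in t steps with the radius-1
    von Neumann neighborhood; a polynomial of degree d in t."""
    return sum(2 ** k * comb(d, k) * comb(t, k) for k in range(min(d, t) + 1))
```

For d = 1 this gives 2t + 1, and for d = 2 the familiar diamond of 2t² + 2t + 1 cells.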
It should be noted that the above result is not directly about language recognition; the redistribution of input symbols needed for the simulation is not taken into account. Readers interested in that as well are referred to [1].

Open Problem 18 Try to find simulations of lower-dimensional on higher-dimensional CA which somehow make use of the “higher connectivity” between cells. It is probably much too difficult or even impossible to hope for general
speedups. But efficient use of space (small hypercubes) for computations without losing time might be achievable.

Tree CA and Hyperbolic CA

Starting from the root of a full binary tree one can reach an exponential number 2^t of nodes in t steps. If there are some computing capabilities related to the nodes, there is at least the possibility that such a device might exhibit some kind of strong parallelism. One of the earliest papers in this respect is Wiedermann’s article [31] (unfortunately only available in Slovak). The model introduced there would now be called a parallel Turing machine where the tape is not a linear array of cells, but where the cells are connected in such a way as to form a tree. A proof is sketched showing that these devices can simulate PRAMs in linear time (assuming the so-called logarithmic cost model). PRAMs are in the second machine class. So, indeed in some sense, trees are powerful.

Below we first quickly introduce a PSPACE-complete problem which is a useful tool for proving the power of computational models involving trees. A few examples of such models are considered afterwards.

Quantified Boolean Formula

The instances of the problem Quantified Boolean Formula (QBF, sometimes also called QSAT) have the structure

Q₁x₁ Q₂x₂ ⋯ Q_k x_k : F(x₁, …, x_k).

Here F(x₁, …, x_k) is a Boolean formula with variables x₁, …, x_k and connectives ∧, ∨ and ¬. Each Q_j is one of the quantifiers ∀ or ∃. The problem is to decide whether the formula is true under the obvious interpretation. This problem is known to be complete for PSPACE. All known TM, i.e., all deterministic sequential algorithms, for solving QBF require exponential time. Thus a proof that QBF can be solved by some model M in polynomial time (usually) implies that all problems in PSPACE can be solved by M in polynomial time.
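Sequentially, deciding QBF amounts to a recursion over all 2^k valuations, combining results per quantifier level with AND or OR; this is exactly the exponential-size evaluation tree that the tree models below traverse in parallel. A minimal sketch (our own illustration, not from the article):

```python
def eval_qbf(quantifiers, formula, assignment=()):
    """Evaluate Q1 x1 ... Qk xk : F(x1, ..., xk).
    quantifiers: string over {'A', 'E'} ('A' = forall, 'E' = exists);
    formula: a predicate on a tuple of k Booleans.
    The recursion tree has 2^k leaves, one per valuation; level i combines
    results by AND or OR according to the ith quantifier."""
    if not quantifiers:
        return formula(assignment)
    combine = all if quantifiers[0] == 'A' else any
    return combine(eval_qbf(quantifiers[1:], formula, assignment + (v,))
                   for v in (False, True))
```

The recursion uses only O(k) space besides the formula, reflecting PSPACE membership, but its running time is exponential in k.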
Often this can be paired with the “opposite” result that problems which can be solved in polynomial time on M are in PSPACE, and hence TM-PSPACE = M-P. This, of course, is not the case only for models “with trees”; see [6] for many alternatives.

Tree CA

A tree CA (TCA for short) working on a full d-ary tree can be defined as follows: There is a set of states Q. For the root there is a local transition function f₀ : (A ∪ {□}) × Q × Q^d → Q, which uses an input symbol (if available, and the blank symbol □ otherwise), the root cell’s own state and
those of the d child nodes to compute the next state of the node. And there are d local transition functions f_i : Q × Q × Q^d → Q, where 1 ≤ i ≤ d. The ith child of a node uses f_i to compute its new state depending on the state of its parent node, its own state and the states of its d child nodes. For language recognition, input is provided sequentially to the root node during the first n steps and a blank symbol afterwards. A word is accepted if the root node enters an accepting state from a designated subset F₊ ⊆ Q.

Mycielski and Niwiński [19] were the first to realize that sequential polynomial reductions can be carried out by TCA and that QBF can be recognized by tree CA as well: A formula to be checked, with k variables, is copied and distributed to 2^k “evaluation cells”. The sequences of left/right choices of the paths to them determine a valuation of the variables with zeros and ones. Each evaluation cell uses the subtree below it to evaluate F(x₁, …, x_k) accordingly. The results are propagated up to the root. Each cell on level i below the root, 1 ≤ i ≤ k, above an evaluation cell combines the results using ∨ or ∧, depending on whether the ith quantifier of the formula was ∃ or ∀.

On the other hand, it is routine work to prove that the result of a TCA running in polynomial time can be computed sequentially in polynomial space by a depth-first procedure. Hence one gets:

Theorem 19 TCA-TIME(Pol(n)) = PSPACE.

Thus tree cellular automata are in the second machine class.

Hyperbolic CA

Two-dimensional CA as defined in Sect. “Introduction” can be considered as arising from the tessellation of the Euclidean plane with squares, with cells indexed by ℤ². Therefore, more generally, CA on a grid G = ℤ^d are sometimes called Euclidean CA. Analogously, some hyperbolic CA arise from tessellations of the hyperbolic plane with some regular polygon. They are covered in depth in a separate article (Cellular Automata in Hyperbolic Spaces).
Here we just consider one special case: The two-dimensional hyperbolic plane can be tiled with copies of the regular 6-gon with six right angles. If one considers only one quarter of the plane and draws a graph with the tiles as nodes and links between those nodes which share a common tile edge, one gets the graph depicted in Fig. 3. Basically it is a tree with two types of nodes, black and white ones, and some “additional” edges depicted as dotted lines. The root is a white node. The first child of each node is black. All other children are white; a black node has 2 white children, a white node has 3 white children.
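From the stated child rule one can count the nodes on each level of this tree directly; the code below is our own illustration, and the counts follow from that rule alone.

```python
def hyperbolic_level_sizes(levels):
    """Nodes per level of the tree in Fig. 3: the first child of every node
    is black; in addition a black node has 2 white children and a white
    node has 3 white children. The root is white."""
    black, white = 0, 1
    sizes = [black + white]
    for _ in range(levels):
        black, white = black + white, 2 * black + 3 * white
        sizes.append(black + white)
    return sizes
```

The level sizes grow geometrically (the growth ratio approaches 2 + √3 ≈ 3.73), so a signal starting at the root can reach exponentially many cells in t steps, just as in the binary-tree case.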
Cellular Automata as Models of Parallel Computation, Figure 3: The first levels of a tree of cells resulting from a tiling of the hyperbolic plane with 6-gons
For hyperbolic CA (HCA) one uses a formalism analogous to that described for tree CA. As one can see, HCA are basically trees with some additional edges. It is therefore not surprising that they can accept the languages from PSPACE in polynomial time. The converse inclusion is also proved similarly to the tree case. This gives:
Theorem 20 HCA-TIME(Pol(n)) = PSPACE.

It is also interesting to have a look at the analogs of P, PSPACE, and so on for hyperbolic CA. Somewhat surprisingly, Iwamoto et al. [11] have shown:

Theorem 21 HCA-TIME(Pol(n)) = HCA-SPC(Pol(n)) = NHCA-TIME(Pol(n)) = NHCA-SPC(Pol(n)), where NHCA denotes nondeterministic hyperbolic CA. The analogous equalities hold for exponential time and space.

Outlook

There is yet another possibility for bringing trees into play: trees of configurations. The concept of alternation [4] can be carried over to cellular automata. Since there are several active computational units, the definitions are a little bit more involved, and it turns out that one has several possibilities which also result in models with slightly different properties. But in all cases one gets models from the second machine class. For results, readers are referred to [13] and [23].

On the other end there is some research on what happens if one restricts the possibilities for communication between neighboring cells: Instead of getting information about the complete states of the neighbors, in the extreme case only one bit can be exchanged. See for example [15].

Future Directions

At several points in this paper we have pointed out open problems which deserve further investigation. Here, we want to stress three areas which we consider particularly interesting in the area of “CA as a parallel model”.

Proper Inclusions and Denser Hierarchies

It has been pointed out several times that inclusions of complexity classes are not known to be proper, or that gaps between resource bounds still need to be “large” in order to prove that the inclusion of the related classes is a proper one. For the foreseeable future this remains a wide area for further research. Most probably new techniques will have to be developed to make significant progress.

Activities

The motivation for considering state change complexity was the energy consumption of CMOS hardware. It is known that irreversible physical computational processes must consume energy. This seems not to be the case for reversible ones. Therefore reversible CA are also interesting in this respect. The definition of reversible CA and results for them are the topic of the article by Morita (Reversible Cellular Automata), also in this encyclopedia. Surprisingly, all currently known simulations of irreversible CA on reversible ones (this is possible) exhibit a large state change complexity. This deserves further investigation. Also the examination of CA which are “reversible on the computational core” has been started only recently [16]. There are first surprising results; the impacts on computational complexity are unforeseeable.

Asynchronicity and Randomization

Randomization is an important topic in sequential computing. It is high time that this is also investigated in much
more depth for cellular automata. The same holds for cellular automata where not all cells are updating their states synchronously. These areas promise a wealth of new insights into the essence of fine-grained parallel systems.

Bibliography

1. Achilles AC, Kutrib M, Worsch T (1996) On relations between arrays of processing elements of different dimensionality. In: Vollmar R, Erhard W, Jossifov V (eds) Proceedings Parcella ’96, no. 96 in Mathematical Research. Akademie, Berlin, pp 13–20
2. Blum M (1967) A machine-independent theory of the complexity of recursive functions. J ACM 14:322–336
3. Buchholz T, Kutrib M (1998) On time computability of functions in one-way cellular automata. Acta Inf 35(4):329–352
4. Chandra AK, Kozen DC, Stockmeyer LJ (1981) Alternation. J ACM 28(1):114–133
5. Delorme M, Mazoyer J, Tougne L (1999) Discrete parabolas and circles on 2D cellular automata. Theor Comput Sci 218(2):347–417
6. van Emde Boas P (1990) Machine models and simulations. In: van Leeuwen J (ed) Handbook of Theoretical Computer Science, vol A. Elsevier Science Publishers and MIT Press, Amsterdam, chap 1, pp 1–66
7. Fatès N, Thierry É, Morvan M, Schabanel N (2006) Fully asynchronous behavior of double-quiescent elementary cellular automata. Theor Comput Sci 362:1–16
8. Garzon M (1991) Models of Massive Parallelism. Texts in Theoretical Computer Science. Springer, Berlin
9. Hemmerling A (1979) Concentration of multidimensional tape-bounded systems of Turing automata and cellular spaces. In: Budach L (ed) International Conference on Fundamentals of Computation Theory (FCT ’79). Akademie, Berlin, pp 167–174
10. Hennie FC (1965) One-tape, off-line Turing machine computations. Inf Control 8(6):553–578
11. Iwamoto C, Margenstern M (2004) Time and space complexity classes of hyperbolic cellular automata. IEICE Trans Inf Syst E87-D(3):700–707
12. Iwamoto C, Hatsuyama T, Morita K, Imai K (2002) Constructible functions in cellular automata and their applications to hierarchy results. Theor Comput Sci 270(1–2):797–809
13. Iwamoto C, Tateishi K, Morita K, Imai K (2003) Simulations between multidimensional deterministic and alternating cellular automata. Fundamenta Informaticae 58(3/4):261–271
14. Kutrib M (2008) Efficient pushdown cellular automata: Universality, time and space hierarchies. J Cell Autom 3(2):93–114
15. Kutrib M, Malcher A (2006) Fast cellular automata with restricted inter-cell communication: Computational capacity. In: Navarro G, Bertossi L, Kohayakawa Y (eds) Proceedings Theor Comput Sci (IFIP TCS 2006), pp 151–164
16. Kutrib M, Malcher A (2007) Real-time reversible iterative arrays. In: Csuhaj-Varjú E, Ésik Z (eds) Fundamentals of Computation Theory 2007. LNCS, vol 4639. Springer, Berlin, pp 376–387
17. Mazoyer J, Terrier V (1999) Signals in one-dimensional cellular automata. Theor Comput Sci 217(1):53–80
18. Merkle D, Worsch T (2002) Formal language recognition by stochastic cellular automata. Fundamenta Informaticae 52(1–3):181–199
19. Mycielski J, Niwiński D (1991) Cellular automata on trees, a model for parallel computation. Fundamenta Informaticae XV:139–144
20. Nakamura K (1981) Synchronous to asynchronous transformation of polyautomata. J Comput Syst Sci 23:22–37
21. von Neumann J (1966) Theory of Self-Reproducing Automata. University of Illinois Press, Champaign. Edited and completed by Arthur W. Burks
22. Regnault D, Schabanel N, Thierry É (2007) Progress in the analysis of stochastic 2D cellular automata: a study of asynchronous 2D minority. In: Csuhaj-Varjú E, Ésik Z (eds) Fundamentals of Computation Theory 2007. LNCS, vol 4639. Springer, Berlin, pp 376–387
23. Reischle F, Worsch T (1998) Simulations between alternating CA, alternating TM and circuit families. In: MFCS ’98 Satellite Workshop on Cellular Automata, pp 105–114
24. Sanders P, Vollmar R, Worsch T (2002) Cellular automata: Energy consumption and physical feasibility. Fundamenta Informaticae 52(1–3):233–248
25. Savitch WJ (1978) Parallel and nondeterministic time complexity classes. In: Proc. 5th ICALP, pp 411–424
26. Scheben C (2006) Simulation of d′-dimensional cellular automata on d-dimensional cellular automata. In: El Yacoubi S, Chopard B, Bandini S (eds) Proceedings ACRI 2006. LNCS, vol 4173. Springer, Berlin, pp 131–140
27. Smith AR (1971) Simple computation-universal cellular spaces. J ACM 18(3):339–353
28. Stoß HJ (1970) k-Band-Simulation von k-Kopf-Turing-Maschinen. Computing 6:309–317
29. Stratmann M, Worsch T (2002) Leader election in d-dimensional CA in time diam·log(diam). Future Gener Comput Syst 18(7):939–950
30. Vollmar R (1982) Some remarks about the ‘efficiency’ of polyautomata. Int J Theor Phys 21:1007–1015
31. Wiedermann J (1983) Paralelný Turingov stroj – Model distribuovaného počítača [Parallel Turing machine – A model of a distributed computer]. In: Gruska J (ed) Distribuované a paralelné systémy. CRC, Bratislava, pp 205–214
32. Wiedermann J (1984) Parallel Turing machines. Tech. Rep. RUU-CS-84-11. University Utrecht, Utrecht
33. Worsch T (1997) On parallel Turing machines with multi-head control units. Parallel Comput 23(11):1683–1697
34. Worsch T (1999) Parallel Turing machines with one-head control units and cellular automata. Theor Comput Sci 217(1):3–30
Cellular Automata, Classification of
KLAUS SUTNER
Carnegie Mellon University, Pittsburgh, USA

Article Outline
Glossary
Definition of the Subject
Introduction
Reversibility and Surjectivity
Definability and Computability
Computational Equivalence
Conclusion
Bibliography

Glossary

Cellular automaton  For our purposes, a (one-dimensional) cellular automaton (CA) is given by a local map ρ : Σ^w → Σ, where Σ is the underlying alphabet of the automaton and w is its width. As a data structure, suitable as input to a decision algorithm, a CA can thus be specified by a simple lookup table. We abuse notation and write ρ(x) for the result of applying the global map of the CA to a configuration x ∈ Σ^ℤ.

Wolfram classes  Wolfram proposed a heuristic classification of cellular automata based on observations of typical behaviors. The classification comprises four classes: evolution leads to trivial configurations, evolution leads to periodic configurations, evolution is chaotic, and evolution leads to complicated, persistent structures.

Undecidability  It was recognized by logicians and mathematicians in the first half of the 20th century that there is an abundance of well-defined problems that cannot be solved by means of an algorithm: a mechanical procedure that is guaranteed to terminate after finitely many steps and produce the appropriate answer. The best known example of an undecidable problem is Turing's Halting Problem: there is no algorithm to determine whether a given Turing machine halts when run on an empty tape.

Semidecidability  A problem is said to be semidecidable or computably enumerable if it admits an algorithm that returns "yes" after finitely many steps if this is indeed the correct answer; otherwise the algorithm never terminates. The Halting Problem is the standard example of a semidecidable problem. A problem is decidable if, and only if, the problem itself and its negation are semidecidable.
Universality  A computational device is universal if it is capable of simulating any other computational device. The existence of universal computers was another central insight of the early days of computability theory and is closely related to undecidability.

Reversibility  A discrete dynamical system is reversible if the evolution of the system incurs no loss of information: the state at time t can be recovered from the state at time t + 1. For CAs this means that the global map is injective.

Surjectivity  The global map of a CA is surjective if every configuration appears as the image of another. By contrast, a configuration that fails to have a predecessor is often referred to as a Garden-of-Eden.

Finite configurations  One often considers CA with a special quiescent state: the homogeneous configuration where all cells are in the quiescent state is required to be a fixed point under the global map. Infinite configurations where all but finitely many cells are in the quiescent state are often called finite configurations. This is somewhat of a misnomer; we prefer to speak of configurations with finite support.

Definition of the Subject

Cellular automata display a large variety of behaviors. This was recognized clearly when extensive simulations of cellular automata, and in particular one-dimensional CA, became computationally feasible around 1980. Surprisingly, even when one considers only elementary CA, which are constrained to a binary alphabet and local maps involving only nearest neighbors, complicated behaviors are observed in some cases. In fact, it appears that most behaviors observed in automata with more states and larger neighborhoods already have qualitative analogues in the realm of elementary CA. Careful empirical studies led Wolfram to suggest a phenomenological classification of CA based on the long-term evolution of configurations, see [68,71] and Sect. "Introduction".
While Wolfram’s four classes clearly capture some of the behavior of CA it turns out that any attempt at formalizing this taxonomy meets with considerable diﬃculties. Even apparently simple questions about the behavior of CA turn out to be algorithmically undecidable and it is highly challenging to provide a detailed mathematical analysis of these systems. Introduction In the early 1980’s Wolfram published a collection of 20 open problems in the the theory of CA, see [69]. The ﬁrst problem on his list is “What overall classiﬁcation of cellu
Cellular Automata, Classification of
lar automata behavior can be given?” As Wolfram points out, experimental mathematics provides a ﬁrst answer to this problem: one performs a large number of explicit simulations and observes the patterns associated with the long term evolution of a conﬁguration, see [67,71]. Wolfram proposed a classiﬁcation that is based on extensive simulations in particular of onedimensional cellular automata where the evolution of a conﬁguration can be visualized naturally as a twodimensional image. The classiﬁcation involves four classes that can be described as follows:
W1: Evolution leads to homogeneous fixed points.
W2: Evolution leads to periodic configurations.
W3: Evolution leads to chaotic, aperiodic patterns.
W4: Evolution produces persistent, complex patterns of localized structures.
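Diagrams of the kind this classification rests on are easy to generate. The sketch below (ours, not from the article; the cyclic row is a finite stand-in for the biinfinite lattice) evolves an elementary CA from a single seed and prints the resulting space-time diagram.

```python
def eca_step(cells, rule):
    """One step of an elementary CA (binary alphabet, width 3) on a
    cyclic row.  Bit (4*l + 2*c + r) of the rule number is the new state
    of a cell with left neighbor l, own state c and right neighbor r."""
    n = len(cells)
    return [(rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def space_time(rule, width=31, steps=15):
    """Evolve a single-seed row and collect the space-time diagram."""
    row = [0] * width
    row[width // 2] = 1
    diagram = [row]
    for _ in range(steps):
        row = eca_step(row, rule)
        diagram.append(row)
    return diagram

# rule 254 fills the lattice (fixed-point behavior from this seed),
# rule 30 produces the chaotic pattern usually placed in W3
for rule in (254, 30):
    print(f"rule {rule}:")
    for row in space_time(rule, width=31, steps=8):
        print("".join(".#"[c] for c in row))
```

Visual inspection of such diagrams, for a sample of seeds, is exactly the kind of evidence the phenomenological classification is based on.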
Thus, Wolfram’s ﬁrst three classes follow closely concepts from continuous dynamics: ﬁxed point attractors, periodic attractors and strange attractors, respectively. They correspond roughly to systems with zero temporal and spatial entropy, zero temporal entropy but positive spatial entropy, and positive temporal and spatial entropy, respectively. W4 is more diﬃcult to associate with a continuous analogue except to say that transients are typically very long. To understand this class it is preferable to consider CA as models of massively parallel computation rather than as particular discrete dynamical systems. It was conjectured by Wolfram that W4 automata are capable of performing complicated computations and may often be computationally universal. Four examples of elementary CA that are typical of the four classes are shown in Fig. 1. Li and Packard [32,33] proposed a slightly modiﬁed version of this hierarchy by reﬁning the low classes and in particular Wolfram’s W2. Much like Wolfram’s classiﬁcation, the Li–Packard classiﬁcation is concerned with the asymptotic behavior of the automaton, the structure and behavior of the limiting conﬁgurations. Here is one version of the Li–Packard classiﬁcation, see [33]. LP1: Evolution leads to homogeneous ﬁxed points. LP2: Evolution leads to nonhomogeneous ﬁxed points, perhaps up a to a shift. LP3: Evolution leads to ultimately periodic conﬁgurations. Regions with periodic behavior are separated by domain walls, possibly up to a shift. LP4: Conﬁgurations produce locally chaotic behavior. Regions with chaotic behavior are separated by domain walls, possibly up to a shift. LP5: Evolution leads to chaotic patterns that are spatially unbounded.
LP6: Evolution is complex. Transients are long and lead to complicated space-time patterns which may be non-monotonic in their behavior.

By contrast, a classification closer to traditional dynamical systems theory was introduced by Kůrka, see [27,28]. The classification rests on the notions of equicontinuity, sensitivity to initial conditions and expansivity. Suppose x is a point in some metric space and f a map on that space. Then f is equicontinuous at x if

∀ε > 0 ∃δ > 0 ∀y ∈ B_δ(x), n ∈ ℕ : d(f^n(x), f^n(y)) < ε

where d(·,·) denotes the metric. Thus, all points in a sufficiently small neighborhood of x remain close to the iterates of x for the whole orbit. Global equicontinuity is a fairly strong condition; it implies that the limit set of the automaton is reached after finitely many steps. The map is sensitive (to initial conditions) if

∃ε > 0 ∀x ∀δ > 0 ∃y ∈ B_δ(x) ∃n ∈ ℕ : d(f^n(x), f^n(y)) ≥ ε .

Lastly, the map is positively expansive if

∃ε > 0 ∀x ≠ y ∃n ∈ ℕ : d(f^n(x), f^n(y)) ≥ ε .

Kůrka's classification then takes the following form.

K1: All points are equicontinuous under the global map.
K2: Some but not all points are equicontinuous under the global map.
K3: The global map is sensitive but not positively expansive.
K4: The global map is positively expansive.

This type of classification is perfectly suited to the analysis of uncountable spaces such as the Cantor space {0,1}^ℕ or the full shift space Σ^ℤ, which carry a natural metric structure. For the most part we will not pursue the analysis of CA by topological and measure-theoretic means here and refer to Topological Dynamics of Cellular Automata in this volume for a discussion of these methods. See Sect. "Definability and Computability" for the connections between topology and computability. Given the apparent complexity of observable CA behavior, one might suspect that it is difficult to pinpoint the location of an arbitrary given CA in any particular classification scheme with any precision.
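These metric notions can at least be probed empirically. The sketch below (an illustrative finite-window experiment with invented helper names, not a decision procedure for Kůrka's classes) uses the Cantor-type metric d(x, y) = 2^(−k), with k the least |i| such that x_i ≠ y_i, and tracks a perturbation placed ten cells from the origin under elementary rule 30 versus rule 204, the identity rule.

```python
def step(cells, rule):
    """One step of an elementary CA on a finite window with zero boundary.

    The window only approximates the biinfinite lattice; results are valid
    as long as no difference between the tracked configurations reaches
    the boundary."""
    n = len(cells)
    g = lambda i: cells[i] if 0 <= i < n else 0
    return [(rule >> (4 * g(i - 1) + 2 * g(i) + g(i + 1))) & 1
            for i in range(n)]

def cantor_dist(x, y, origin):
    """d(x, y) = 2^(-k), k the least distance from the origin at which
    x and y differ; 0 if they agree on the whole window."""
    ks = [abs(i - origin) for i in range(len(x)) if x[i] != y[i]]
    return 0.0 if not ks else 2.0 ** (-min(ks))

def perturbation_distance(rule, steps=12, offset=10, half=24):
    """Distance between the orbits of the zero configuration and the
    same configuration with one cell flipped at the given offset."""
    x = [0] * (2 * half + 1)
    y = list(x)
    y[half + offset] ^= 1
    for _ in range(steps):
        x, y = step(x, rule), step(y, rule)
    return cantor_dist(x, y, half)

print(perturbation_distance(204))  # identity rule: the distance stays 2**-10
print(perturbation_distance(30))   # rule 30: the difference spreads toward the origin
```

Under rule 30 the flipped cell spawns a cone of differences whose left edge advances one cell per step, so after twelve steps the orbits differ near the origin and the distance has grown by several orders of magnitude; under the identity rule it never changes.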
This is in contrast to simple parameterizations of the space of CA rules, such as Langton's λ parameter, that are inherently easy to
Cellular Automata, Classification of, Figure 1: Typical examples of the behavior described by Wolfram's classes among elementary cellular automata
compute. Briefly, the λ value of a local map is the fraction of local configurations that map to a nonzero value, see [29,33]. Small λ values result in short transients leading to fixed points or simple periodic configurations. As λ increases, the transients grow longer and the orbits become more and more complex until, at last, the dynamics become chaotic. Informally, sweeping the λ value from 0 to 1 will produce CA in W1, then W2, then W4 and lastly in W3. The last transition appears to be associated with a threshold phenomenon. It is unclear what the connection between Langton's λ value and the computational properties of a CA is, see [37,46]. Other numerical measures that appear to be loosely connected to classifications are the mean field parameters of Gutowitz [20,21] and the Z parameter of Wuensche [72], see also [44]. It seems doubtful that a structured taxonomy along the lines of Wolfram or Li–Packard can be derived from a simple numerical measure such as the λ value alone, or even from a combination of several such values. However, such measures may be useful as empirical evidence for membership in a particular class. Classification also becomes significantly easier when one restricts one's attention to a limited class of CA such as additive CA, see Additive Cellular Automata. In this
context, additive means that the local rule of the automaton has the form ρ(x⃗) = Σᵢ cᵢxᵢ where the coefficients as well as the states are modular numbers. A number of properties starting with injectivity and surjectivity as well as topological properties such as equicontinuity and sensitivity can be expressed in terms of simple arithmetic conditions on the rule coefficients. For example, equicontinuity is equivalent to all prime divisors of the modulus m dividing all coefficients cᵢ, i > 1, see [35] and the references therein. It is also noteworthy that in the linear case methods tend to carry over to arbitrary dimensions; in general there is a significant step in complexity from dimension one to dimension two. No claim is made that the given classifications are complete; in fact, one should think of them as prototypes rather than definitive taxonomies. For example, one might add the class of nilpotent CA at the bottom. A CA is nilpotent if all configurations evolve to a particular fixed point after finitely many steps. Equivalently, by compactness, there is a bound n such that all configurations evolve to the fixed point in no more than n steps. Likewise, we could add the class of intrinsically universal CA at the top. A CA is intrinsically universal if it is capable of simulating all other CA of the same dimension in some reasonable sense. For
a fairly natural notion of simulation see [45]. At any rate, considerable effort is made in the references to elaborate the characteristics of the various classes. For many concrete CA, visual inspection of the orbits of a suitable sample of configurations readily suggests membership in one of the classes.

Reversibility and Surjectivity

A first tentative step towards the classification of a dynamical system is to determine its reversibility, or lack thereof. Thus we are trying to determine whether the evolution of the system is associated with loss of information, or whether it is possible to reconstruct the state of the system at time t from its state at time t + 1. In terms of the global map of the system, we have to decide injectivity. Closely related is the question whether the global map is surjective, i.e., whether there is no Garden-of-Eden: every configuration has a predecessor under the global map. As a consequence, the limit set of the automaton is the whole space. It was shown by Hedlund that for CA the two notions are connected: every reversible CA is also surjective, see [24], Reversible Cellular Automata. As a matter of fact, reversibility of the global map of a CA implies openness of the global map, and openness implies surjectivity. The converse implications are both false. By a well-known theorem of Hedlund [24], the global maps of CA are precisely the continuous maps that commute with the shift. It follows from basic topology that the inverse global map of a reversible CA is again the global map of a suitable CA. Hence, the predecessor configuration of a given configuration can be reconstructed by another, suitably chosen, CA. For results concerning reversibility on the limit set of the automaton see [61]. From the perspective of complexity, the key result concerning reversible systems is the work by Lecerf [30] and Bennett [7].
They show that reversible Turing machines can compute any partial recursive function, modulo a minor technical problem: in a reversible Turing machine there is no loss of information; on the other hand, even simple computable functions are clearly irreversible in the sense that, say, the sum of two natural numbers does not determine these numbers uniquely. To address this issue one has to adjust the notion of computability slightly in the context of reversible computation: given a partial recursive function f : ℕ → ℕ, the function f̂(x) = ⟨x, f(x)⟩ can be computed by a reversible Turing machine, where ⟨·,·⟩ is any effective pairing function. If f itself happens to be injective then there is no need for this coding device and f can be computed by a reversible Turing machine directly. For example, we can compute the product of two primes reversibly. Morita demonstrated that the same holds true for one-dimensional cellular automata [38,40,62], Tiling Problem and Undecidability in Cellular Automata: reversibility is no obstruction to computational universality. As a matter of fact, any irreversible cellular automaton can be simulated by a reversible one, at least on configurations with finite support. Thus one should expect reversible CA to exhibit fairly complicated behavior in general.

For infinite, one-dimensional CA it was shown by Amoroso and Patt [2] that reversibility is decidable. Moreover, it is decidable whether the global map is surjective. An efficient practical algorithm using concepts from automata theory can be found in [55], see also [10,14,23]. The fast algorithm is based on interpreting a one-dimensional CA as a deterministic transducer, see [6,48] for background. The underlying semiautomaton of the transducer is a de Bruijn automaton B whose states are words in Σ^(w−1), where Σ is the alphabet of the CA and w is its width. The transitions are given by ax →ᶜ xb where a, b, c ∈ Σ, x ∈ Σ^(w−2), and c = ρ(axb), ρ being the local map of the CA. Since B is strongly connected, the product automaton of B will contain a strongly connected component C that contains the diagonal D, an isomorphic copy of B. The global map of the CA is reversible if, and only if, C = D is the only nontrivial component. It was shown by Hedlund [24] that surjectivity of the global map is equivalent to local injectivity: the restriction of the map to configurations with finite support must be injective. The latter property holds if, and only if, C = D, and is thus easily decidable. Automata theory does not readily generalize to words of dimension higher than one. Indeed, reversibility and surjectivity in dimensions higher than one are undecidable, see [26] and Tiling Problem and Undecidability in Cellular Automata in this volume for the rather intricate argument needed to establish this fact.
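The transducer test just described is small enough to implement directly for elementary CA. The sketch below (our rendering of the construction; the function names are invented) builds the product of the de Bruijn automaton with itself, computes its strongly connected components, and reads off surjectivity (the component meeting the diagonal D is exactly D) and reversibility (additionally, D is the only nontrivial component).

```python
from itertools import product

def sccs(nodes, succ):
    """Strongly connected components via iterative Kosaraju."""
    order, seen = [], set()

    def dfs(start, adj, out):
        seen.add(start)
        path = [(start, iter(adj[start]))]
        while path:
            node, it = path[-1]
            for m in it:
                if m not in seen:
                    seen.add(m)
                    path.append((m, iter(adj[m])))
                    break
            else:
                path.pop()
                out.append(node)

    for n in nodes:
        if n not in seen:
            dfs(n, succ, order)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for m in succ[n]:
            pred[m].append(n)
    seen = set()
    comps = []
    for n in reversed(order):
        if n not in seen:
            comp = []
            dfs(n, pred, comp)
            comps.append(set(comp))
    return comps

def classify(rule, sigma=(0, 1), w=3):
    """Return (surjective, injective) for an elementary CA, using the
    de Bruijn product automaton described in the text."""
    rho = lambda a, b, c: (rule >> (4 * a + 2 * b + c)) & 1
    dbstates = list(product(sigma, repeat=w - 1))
    nodes = list(product(dbstates, repeat=2))
    succ = {p: [] for p in nodes}
    for u, v in nodes:
        for b, c in product(sigma, repeat=2):
            # the two de Bruijn edges must carry the same output letter
            if rho(*u, b) == rho(*v, c):
                succ[(u, v)].append((u[1:] + (b,), v[1:] + (c,)))
    comps = sccs(nodes, succ)
    diag = {(s, s) for s in dbstates}
    comp_of_diag = next(c for c in comps if c & diag)
    surjective = comp_of_diag == diag
    nontrivial = [c for c in comps
                  if len(c) > 1 or any(n in succ[n] for n in c)]
    injective = surjective and all(c == diag for c in nontrivial)
    return surjective, injective

print(classify(204))  # the identity rule is reversible
print(classify(90))   # rule 90 is surjective but not injective
print(classify(110))  # rule 110 is not even surjective
```

The product automaton has only (|Σ|^(w−1))² states, so for elementary CA the whole test runs over 16 nodes.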
While the structure of reversible one-dimensional CA is well understood, see Tiling Problem and Undecidability in Cellular Automata, [16], and while there is an efficient algorithm to check reversibility, few methods are known that allow for the construction of interesting reversible CA. There is a noteworthy trick due to Fredkin that exploits the reversibility of the Fibonacci equation X_{n+1} = X_n + X_{n−1}. When addition is interpreted as exclusive or, this can be used to construct a second-order CA from any given binary CA; the former can then be recoded as a first-order CA over a 4-letter alphabet. For example, for the open but irreversible elementary CA number 90 we obtain the CA shown in Fig. 2. Another interesting class of reversible one-dimensional CA, the so-called partitioned cellular automata (PCA), is due to Morita and Harao, see [38,39,40]. One
Cellular Automata, Classification of, Figure 2: A reversible automaton obtained by applying Fredkin's construction to the irreversible elementary CA 77
can think of a PCA as a cellular automaton whose cells are divided into multiple tracks; specifically, Morita uses an alphabet of the form Σ = Σ₁ × Σ₂ × Σ₃. The configurations of the automaton can be written as (X, Y, Z) where X ∈ Σ₁^ℤ, Y ∈ Σ₂^ℤ and Z ∈ Σ₃^ℤ. Now consider the shearing map τ defined by τ(X, Y, Z) = (RS(X), Y, LS(Z)), where RS and LS denote the right and left shift, respectively. Given any function f : Σ → Σ we can define a global map f ∘ τ, where f is assumed to be applied pointwise. Since the shearing map is bijective, the CA will be reversible if, and only if, the map f is bijective. It is relatively easy to construct bijections f that cause the CA to perform particular computational tasks, even when a direct construction appears to be entirely intractable.
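The partitioned construction is easy to express in code. The sketch below (illustrative; the track alphabets and the bijection are arbitrary choices of ours, and the lattice is made cyclic for finiteness) shears the outer tracks, applies a pointwise bijection f, and recovers reversibility by inverting f and the shear.

```python
import random
from itertools import product

def shear(conf):
    """(X, Y, Z) -> (RS(X), Y, LS(Z)) on a cyclic lattice: the first
    track moves right, the third moves left."""
    n = len(conf)
    return [(conf[(i - 1) % n][0], conf[i][1], conf[(i + 1) % n][2])
            for i in range(n)]

def unshear(conf):
    n = len(conf)
    return [(conf[(i + 1) % n][0], conf[i][1], conf[(i - 1) % n][2])
            for i in range(n)]

# an arbitrary pointwise bijection f on Sigma = Sigma1 x Sigma2 x Sigma3;
# here all three tracks are binary, so f permutes 8 letters
letters = list(product((0, 1), repeat=3))
perm = letters[:]
random.Random(1).shuffle(perm)
f = dict(zip(letters, perm))
f_inv = {v: k for k, v in f.items()}

def step(conf):
    # the global map f o tau: each new cell depends only on cells
    # i-1, i, i+1, so this really is a radius-1 CA
    return [f[c] for c in shear(conf)]

def step_back(conf):
    return unshear([f_inv[c] for c in conf])

x = [letters[(3 * i) % 8] for i in range(12)]
y = x
for _ in range(5):
    y = step(y)
for _ in range(5):
    y = step_back(y)
assert y == x  # five steps forward, five steps back: the orbit is reversible
```

Any permutation of the letters works; choosing f with computational intent is what makes the construction useful.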
Definability and Computability

Formalizing Wolfram's Classes

Wolfram's classification is an attempt to categorize the complexity of a CA by studying the patterns observed during the long-term evolution of all configurations. The first two classes are relatively easy to observe, but it is difficult to distinguish between the last two. In particular, W4 is closely related to the kind of behavior that would be expected of systems that are capable of performing complicated computations, including the ability to perform universal computation; a property that is notoriously difficult to check, see [52]. The focus on the full configuration space rather than a significant subset thereof corresponds to the worst-case approach well known in complexity theory and is somewhat inferior to an average-case analysis. Indeed, Baldwin and Shelah point out that a product construction can be used to design a CA whose behavior is an amalgamation of the behavior of two given CA, see [3,4]. By combining CA in different classes one obtains striking examples of the weakness of the worst-case approach. A natural example of this mixed type of behavior is elementary CA 184, which displays class II or class III behavior, depending on the initial configuration. Another basic example for this type of behavior is the well-studied elementary CA 30, see Sect. "Conclusion". Still, for many CA a worst-case classification seems to provide useful information about the structural properties of the automaton.

The first attempt at formalizing Wolfram's classes was made by Culik and Yu, who proposed the following hierarchy, given here in cumulative form, see [11]:

CY1: All configurations evolve to a fixed point.
CY2: All configurations evolve to a periodic configuration.
CY3: The orbits of all configurations are decidable.
CY4: No constraints.

The Culik–Yu classification employs two rather different methods. The first two classes can be defined by a simple formula in a suitable logic, whereas the third (and the fourth in the disjoint version of the hierarchy) relies on notions of computability theory. As a general framework for both approaches we consider discrete dynamical systems, structures of the form A = ⟨C, ⊢⟩ where C ⊆ Σ^ℤ is the space of configurations of the system and ⊢ is the "next configuration" relation on C. We will only consider the deterministic case where for each configuration x there exists precisely one configuration y such that x ⊢ y. Hence we are really dealing with algebras with one unary function, but iteration is slightly easier to deal with in the relational setting. The structures most important in this context are the ones arising from a CA.
For any local map ρ we consider the structure A_ρ = ⟨C, ⊢⟩ where the next configuration relation is determined by x ⊢ ρ(x). Using the standard language of first order logic we can readily express properties of the CA in terms of the system A_ρ. For example, the system is reversible, respectively surjective, if the following assertions are valid over A:

∀x, y, z (x ⊢ z and y ⊢ z implies x = y) ,
∀x ∃y (y ⊢ x) .

As we have seen, both properties are easily decidable in the one-dimensional case. In fact, one can express the basic predicate x ⊢ y (as well as equality) in terms of finite state machines on infinite words. These machines are defined like ordinary finite state machines, but the acceptance condition requires that certain states are reached infinitely and co-infinitely often, see [8,19]. The emptiness problem for these automata is easily decidable using graph-theoretic algorithms. Since regular languages on infinite words are closed under union, complementation and projection, much like their finite counterparts, and all the corresponding operations on automata are effective, it follows that one can decide the validity of first order sentences over A_ρ such as the two examples above: the model-checking problem for these structures and first order logic is decidable, see [34]. For example, we can decide whether there is a configuration that has a certain number of predecessors. Alternatively, one can translate these sentences into monadic second order logic of one successor, and use well-known automata-based decision algorithms there directly, see [8]. Similar methods can be used to handle configurations with finite support, corresponding to weak monadic second order logic. Since the complexity of the decision procedure is non-elementary, one should not expect to be able to handle complicated assertions. On the other hand, at least for weak monadic second order logic, practical implementations of the decision method exist, see [17]. There is no hope of generalizing this approach, as the undecidability of, say, reversibility in higher dimensions demonstrates.

Write x →ᵗ y if x evolves to y in exactly t steps, x →⁺ y if x evolves to y in any positive number of steps, and x →* y if x evolves to y in any number of steps. Note that →ᵗ is definable for each fixed t, but →* fails to be so definable in first order logic. This is in analogy to the undefinability of path existence problems in the first order theory of graphs, see [34].
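For spatially periodic configurations the two sentences above can also be checked by brute force over one period, since the global map restricts to a self-map of the finite set of n-periodic configurations. The sketch below (ours; it probes only one finite subsystem, which neither implies nor follows from validity over Σ^ℤ) evaluates the injectivity and surjectivity sentences for the restricted map.

```python
from itertools import product

def step_periodic(conf, rule):
    """An elementary CA acting on one period of an n-periodic configuration."""
    n = len(conf)
    return tuple((rule >> (4 * conf[i - 1] + 2 * conf[i] + conf[(i + 1) % n])) & 1
                 for i in range(n))

def check_sentences(rule, n):
    """Evaluate the injectivity and surjectivity sentences over the finite
    subsystem of n-periodic configurations.  On a finite set a self-map
    is injective iff surjective, so the two answers coincide here."""
    confs = list(product((0, 1), repeat=n))
    images = [step_periodic(c, rule) for c in confs]
    injective = len(set(images)) == len(confs)
    surjective = set(images) == set(confs)
    return injective, surjective

print(check_sentences(204, 4))
print(check_sentences(90, 4))  # 1010 and 0101 share the image 0000
```

Rule 90 is surjective on the full shift, yet fails both sentences on 4-periodic configurations: a reminder that such finite probes are evidence, not a decision procedure.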
Hence it is natural to extend our language so we can express iterations of the global map, either by adding transitive closures or by moving to some limited system of
higher order logic over A where →* is definable, see [8]. Arguably the most basic decision problem associated with a system A that requires iteration of the global map is the Reachability Problem: given two configurations x and y, does the evolution of x lead to y? A closely related but different question is the Confluence Problem: will two configurations x and y evolve to the same limit cycle? Confluence is an equivalence relation and allows for the decomposition of configuration space into limit cycles together with their basins of attraction. The Reachability and Confluence Problems amount to determining, given configurations x and y, whether
x →* y ,
∃z (x →* z and y →* z) ,
respectively. As another example, the first two Culik–Yu classes can be defined like so:
∀x ∃z (x →* z and z ⊢ z) ,
∀x ∃z (x →* z and z →⁺ z) .

It is not difficult to give similar definitions for the lower Li–Packard classes if one extends the language by a function symbol denoting the shift operator. The third Culik–Yu class is somewhat more involved. By definition, a CA lies in the third class if it admits a global decision algorithm to determine whether a given configuration x evolves to another given configuration y in a finite number of steps. In other words, we are looking for automata where the Reachability Problem is algorithmically solvable. While one can agree that W4 roughly translates into undecidability and is thus properly situated in the hierarchy, it is unclear how the chaotic patterns in W3 relate to decidability. No method is known to translate the apparent lack of tangible, persistent patterns in rules such as elementary CA 30 into decision algorithms for Reachability.

There is another, somewhat more technical, problem to overcome in formalizing classifications. Recall that the full configuration space is C = Σ^ℤ. Intuitively, given x ∈ C we can effectively determine the next configuration y = ρ(x). However, classical computability theory does not deal with infinitary objects such as arbitrary configurations, so a bit of care is needed here. The key insight is that we can determine arbitrary finite segments of ρ(x) using only finite segments of x (and, of course, the lookup table for the local map). There are several ways to model computability on Σ^ℤ based on this idea of finite approximations; we refer to [66] for a particularly appealing model based on so-called type-2 Turing machines; the reference also contains many pointers to the literature as well as a comparison between the different approaches. It is easy to see that for any CA the global map ρ as well as all its iterates ρᵗ are computable, the latter uniformly in t.
However, due to the finitary nature of all computations, equality is not decidable in type-2 computability: the partial operator U₀ with U₀(x, y) = 0 if x ≠ y, and U₀(x, y) undefined otherwise, is computable, so that inequality is semidecidable; but the total version U with U(x, y) = 0 if x ≠ y, and U(x, y) = 1 otherwise, is not computable. The last result is perhaps somewhat counterintuitive, but it is inevitable if we strictly adhere to the finite approximation principle. In order to avoid problems of this kind it has become customary to consider certain subspaces of the full configuration space C_all, in particular C_fin, the collection of configurations with finite support, C_per, the collection of spatially periodic configurations, and C_ap, the collection of almost periodic configurations of the form …uuu w vvv… where u, v and w are finite words over the alphabet of the automaton. Thus, an almost periodic configuration differs from a configuration of the form ωu vω (a left-infinite repetition of u followed by a right-infinite repetition of v) in only finitely many places. Configurations with finite support correspond to the special case where u = v = 0 for a special quiescent symbol 0, and spatially periodic configurations correspond to u = v, w = ε. The most general type of configuration that admits a finitary description is the class C_rec of recursive configurations, where the assignment of a state to a cell is given by a computable function. It is clear that all these subspaces are closed under the application of a global map. Except for C_fin, they are also closed under inverse maps in the following sense: given a configuration y in some subspace that has a predecessor x in C_all, there already exists a predecessor in the same subspace, see [55,58]. This is obvious except in the case of recursive configurations. The reference also shows that the recursive predecessor cannot be computed effectively from the target configuration. Thus, for computational purposes the dynamics of the cellular automaton are best reflected in C_ap: it includes all configurations with finite support and we can effectively trace an orbit in both directions. It is not hard to see that C_ap is the least such class. Alas, it is standard procedure to avoid minor technical difficulties arising from the infinitely repeated spatial patterns and establish classifications over the subspace C_fin. There is arguably not much harm in this simplification, since C_fin is a dense subspace of C_all and compactness can be used to lift properties from C_fin to the full configuration space. The Culik–Yu hierarchy is correspondingly defined over C_fin, the class of all configurations of finite support.
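Configurations with finite support admit a direct finitary representation: store only the support. A minimal sketch (ours; binary alphabet, quiescent state 0) applies one global-map step to such a configuration.

```python
def step_finite_support(cells, rule):
    """One step of an elementary CA on a configuration with finite support.

    `cells` maps positions to states; absent positions are quiescent (0).
    Requires rule bit 0 to be 0, i.e. rho(0,0,0) = 0, so that the quiescent
    background is a fixed point and the image again has finite support."""
    assert rule & 1 == 0, "state 0 must be quiescent"
    if not cells:
        return {}
    g = lambda i: cells.get(i, 0)
    out = {}
    # the support can grow by at most the radius (here 1) on each side
    for i in range(min(cells) - 1, max(cells) + 2):
        s = (rule >> (4 * g(i - 1) + 2 * g(i) + g(i + 1))) & 1
        if s:
            out[i] = s
    return out

# a single live cell under rule 90 splits into its two neighbors
print(step_finite_support({0: 1}, 90))
```

The same dictionary representation, extended by two periodic background words, would describe almost periodic configurations; tracing their orbits backwards, however, requires the predecessor machinery discussed above.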
In this setting, the first three classes of this hierarchy are undecidable, and the fourth is undecidable in the disjunctive version: there is no algorithm to test whether a CA admits undecidable orbits. As it turns out, the CA classes are complete in their natural complexity classes within the arithmetical hierarchy [50,52]. Checking membership in the first two classes comes down to performing an infinite number of potentially unbounded searches and can be described logically by a Π₂ expression, a formula of type ∀x ∃y R(x, y) where R is a decidable predicate. Indeed, CY1 and CY2 are both Π₂-complete. Thus, deciding whether all configurations of a CA evolve to a fixed point is equivalent to the classical problem of determining whether a semidecidable set is infinite. The third class is even less amenable to algorithmic attack; one can show that CY3 is Σ₃-complete, see [53]. Thus, deciding whether all orbits are decidable is as difficult as determining whether any given semidecidable set is decidable. It is
not difficult to adjust these undecidability results to similar classes such as the lower levels of the Li–Packard hierarchy, which takes into account spatial displacements of patterns.

Effective Dynamical Systems and Universality

The key property of CA that is responsible for all these undecidability results is the fact that CA are capable of performing arbitrary computations. This is unsurprising when one defines computability in terms of Turing machines, the devices introduced by Turing in the 1930s, see [47,63]. Unlike the Gödel–Herbrand approach using general recursive functions or Church's λ-calculus, Turing's devices are naturally closely related to discrete dynamical systems. For example, we can express an instantaneous description of a Turing machine as a finite sequence

a₋ₗ a₋ₗ₊₁ … a₋₁ p a₁ a₂ … aᵣ

where the aᵢ are tape symbols and p is a state of the machine, with the understanding that the head is positioned at a₁ and that all unspecified tape cells contain the blank symbol. Needless to say, these Turing machine configurations can also be construed as finite-support configurations of a one-dimensional CA. It follows that a one-dimensional CA can be used to simulate an arbitrary Turing machine; hence CA are computationally universal: any computable function whatsoever can already be computed by a CA. Note, though, that the simulation is not entirely trivial. First, we have to rely on input/output conventions. For example, we may insist that objects in the input domain, typically tuples of natural numbers, are translated into a configuration of the CA by a primitive recursive coding function. Second, we need to adopt some convention that determines when the desired output has occurred: we follow the evolution of the input configuration until some "halting" condition applies. Again, this condition must be primitive recursively decidable, though there is considerable leeway as to how the end of a computation should be signaled by the CA. For example, we could insist that a particular cell reaches a special state, that an arbitrary cell reaches a special state, that the configuration be a fixed point, and so forth. Lastly, if and when a halting configuration is reached, we apply a primitive recursive decoding function to obtain the desired output. Restricting the space to configurations that have finite support, that are spatially periodic, and so forth, produces an effective dynamical system: the configurations can be coded as integers in some natural way, and the next configuration relation is primitive
For example, we could insist that a particular cell reaches a special state, that an arbitrary cell reaches a special state, that the conﬁguration be a ﬁxed point and so forth. Lastly, if and when a halting conﬁguration is reached, we a apply a primitive recursive decoding function to obtain the desired output. Restricting the space to conﬁgurations that have ﬁnite support, that are spatially periodic, and so forth, produces an eﬀective dynamical system: the conﬁgurations can be coded as integers in some natural way, and the next conﬁguration relation is primitive recursive in the sense that the corresponding relation on code numbers is so primitive
recursive. A classical example of an effective dynamical system is given by selecting the instantaneous descriptions of a Turing machine M as configurations, and the one-step relation of the Turing machine as the operation of the system. Thus we obtain a system A_M whose orbits represent the computations of the Turing machine. Likewise, given the local map ρ of a CA we obtain a system A_ρ whose operation is the induced global map. While the full configuration space C_all violates the effectiveness condition, any of the spaces C_per, C_fin, C_ap and C_rec will give rise to an effective dynamical system. Closure properties, as well as recent work on the universality of elementary CA rule 110, see Sect. "Conclusion", suggest that the class of almost periodic configurations, also known as backgrounds or wallpapers, see [9,58], is perhaps the most natural setting. Both C_fin and C_ap provide a suitable setting for a CA that simulates a Turing machine: we can interpret A_M as a subspace of A_ρ for some suitably constructed one-dimensional CA ρ; the orbits of the subspace encode computations of the Turing machine. It follows from the undecidability of the Halting Problem for Turing machines that the Reachability Problem for these particular CA is undecidable. Note, though, that orbits in A_M may well be finite, so some care must be taken in setting up the simulation. For example, one can translate halting configurations into fixed points. Another problem is caused by the worst-case nature of our classification schemes: in Turing machines and their associated systems A_M it is only the behavior on specially prepared initial configurations that matters, whereas the behavior of a CA depends on all configurations. The behavior of a Turing machine on all instantaneous descriptions, rather than just the ones that can occur during a legitimate computation on some actual input, was first studied by Davis, see [12,13], and also Hooper [25]. Call a Turing machine stable if it halts on any instantaneous description whatsoever.
With some extra care one can then construct a CA that lies in the first Culik–Yu class, yet has the same computational power as the Turing machine. Davis showed that every total recursive function can already be computed by a stable Turing machine, so membership in CY1 is not an impediment to considerable computational power. The argument rests on a particular decomposition of recursive functions. Alternatively, one can directly manipulate Turing machines to obtain a similar result, see [49,53]. On the other hand, unstable Turing machines yield a natural and coding-free definition of universality: a Turing machine is Davis-universal if the set of all instantaneous descriptions on which the machine halts is Σ₁-complete. The mathematical theory of infinite CA is arguably more elegant than the actually observable finite case. As
Cellular Automata, Classification of
a consequence, classifications are typically concerned with CA operating on infinite grids, so that even a configuration with finite support can carry arbitrarily much information. If we restrict our attention to the space of configurations on a finite grid, a more fine-grained analysis is required. For a finite grid of size n the configuration space has the form Cₙ = [n] → Σ and is itself finite, hence any orbit is ultimately periodic and the Reachability Problem is trivially decidable. However, in practice there is little difference between the finite and infinite case. First, computational complexity issues make it practically impossible to analyze even systems of modest size. The Reachability Problem for finite CA, while decidable, is PSPACE-complete even in the one-dimensional case. Computational hardness appears in many other places. For example, if we try to determine whether a given configuration on a finite grid is a Garden-of-Eden, the problem turns out to be NLOG-complete in dimension one and NP-complete in all higher dimensions, see [56]. Second, it stands to reason that the more interesting classification problem in the finite case takes the following parameterized form: given a local map ρ together with boundary conditions, determine the behavior of ρ on all finite grids. Under periodic boundary conditions this comes down to the study of C_per, and it seems that there is little difference between this and the fixed boundary case. Since all orbits on a finite grid are ultimately periodic, one needs to apply a more fine-grained classification that takes into account transient lengths. It is undecidable whether all configurations on all finite grids evolve to a fixed point under a given local map, see [54]. Thus, there is no algorithm to determine whether
⟨Cₙ; ρ⟩ ⊨ ∀x ∃z (x →* z ∧ z → z)

for all grid sizes n. The transient lengths are trivially bounded by kⁿ, where k is the size of the alphabet of the automaton. It is undecidable whether the transient lengths grow according to some polynomial bound, even when the polynomial in question is constant. Restrictions of the configuration space are one way to obtain an effective dynamical system. Another is to interpret the approximation-based notion of computability on the full space in terms of topology. It is well known that computable maps C_all → C_all are continuous in the standard product topology. The clopen sets in this topology are the finite unions of cylinder sets, where a cylinder set is determined by the values of a configuration in finitely many places. By a celebrated result of Hedlund, the global maps of a CA on the full space are characterized by being continuous and shift-invariant. Perhaps somewhat counterintuitively, the decidable subsets of C_all
are quite weak: they consist precisely of the clopen sets. Now consider a partition of C_all into finitely many clopen sets C₀, C₁, …, C_{n−1}. Thus, it is decidable which block of the partition a given point in the space belongs to. Moreover, Boolean operations on clopen sets as well as application of the global map and the inverse global map are all computable. The partition affords a natural projection π : C_all → Σₙ, where Σₙ = {0, 1, …, n−1} and π(x) = i iff x ∈ Cᵢ. Hence the projection translates orbits in the full space C_all into a class W of ω-words over Σₙ, the symbolic orbits of the system. The Cantor space Σₙ^Z together with the shift describes all logically possible orbits with respect to the given partition, and W describes the symbolic orbits that actually occur in the given CA. The shift operator corresponds to an application of the global map of the CA. The finite factors of W provide information about possible finite traces of an orbit when filtered through the given partition. Whole orbits, again filtered through the partition, can be described by ω-words. To tackle the classification of the CA in terms of W, it was suggested by Delvenne et al., see [15], to call the CA decidable if it is decidable whether W has nonempty intersection with a given ω-regular language. Alas, decidability in this sense is very difficult, its complexity being Σ¹₁-complete and thus outside of the arithmetical hierarchy. Likewise it is suggested to call a CA universal if the problem of deciding membership in the cover of W, the collection of all finite factors, is Σ₁-complete, in analogy to Davis-universality.

Computational Equivalence

In recent work, Wolfram suggests a so-called Principle of Computational Equivalence, or PCE for short, see [71], p. 717.
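The filtering of an orbit through a partition can be sketched in a few lines of Python. The sketch below uses a finite cyclic grid as a stand-in for the full shift space and the simplest cylinder partition, by the value of cell 0, so the projection is just π(x) = x(0); the rule numbers and start configurations are arbitrary illustrative choices.

```python
# Sketch: symbolic orbits obtained by projecting an orbit through a
# clopen (cylinder) partition.  The partition here is by the value of
# cell 0, and the universe is a finite ring of cells.

def eca_step(cells, rule):
    """One step of an elementary CA (radius 1, binary states) with
    periodic boundary conditions; `rule` is the Wolfram rule number."""
    n = len(cells)
    table = [(rule >> i) & 1 for i in range(8)]
    return tuple(table[4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n]]
                 for i in range(n))

def symbolic_orbit(start, rule, steps):
    """Prefix of the omega-word obtained by projecting the orbit:
    record pi(x) = x[0] at every time step."""
    conf, word = tuple(start), []
    for _ in range(steps):
        word.append(conf[0])
        conf = eca_step(conf, rule)
    return word

def finite_factors(word, length):
    """The length-k factors of the (truncated) symbolic orbit."""
    return {tuple(word[i:i + length]) for i in range(len(word) - length + 1)}

w = symbolic_orbit([1, 0, 0, 1, 0, 1, 1, 0], rule=90, steps=32)
print(sorted(finite_factors(w, 3)))
```

The set of factors collected here is a finite approximation of the cover of W for this partition; on the infinite grid, deciding properties of W itself is what becomes Σ¹₁-hard.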
PCE states that most computational processes come in only two flavors: they are either of a very simple kind and avoid undecidability, or they represent a universal computation and are therefore no less complicated than the Halting Problem. Thus, Wolfram proposes a zero-one law: almost all computational systems, and thus in particular all CA, are either as complicated as a universal Turing machine or are computationally simple. As evidence for PCE Wolfram adduces a very large collection of simulations of various effective dynamical systems such as Turing machines, register machines, tag systems, rewrite systems, combinators, and cellular automata. It is pointed out in Chap. 3 of [71] that in all these classes of systems there are surprisingly small examples that exhibit exceedingly complicated behavior, and presumably are capable of universal computation. Thus it is conceivable that universality is a rather common property, a property that is
indeed shared by all systems that are not obviously simple. Of course, it is often very difficult to give a complete proof of the computational universality of a natural system, as opposed to a carefully constructed one, so it is not entirely clear how many of Wolfram's examples are in fact universal. As a case in point, consider the universality proof of Conway's Game of Life, or the argument for elementary CA rule 110. If Wolfram's PCE can be formally established in some form, it stands to reason that it will apply to all effective dynamical systems and in particular to CA. Hence, classifications of CA would be rather straightforward: at the top there would be the class of universal CA, directly preceded by a class similar to the third Culik–Yu class, plus a variety of subclasses along the lines of the lower Li–Packard classes. The corresponding problem in classical computability theory was first considered in the 1930s by Post and is now known as Post's Problem: is there a semidecidable set that fails to be decidable, yet is not as complicated as the Halting Set? In terms of Turing degrees the problem thus is to construct a semidecidable set A such that ∅ <_T A <_T ∅′.

Future Directions

This short survey has only been able to hint at the vast wealth of emergent phenomena that arise in CA. Much work yet remains to be done, in classifying the different structures, identifying general laws governing their behavior, and determining the causal mechanisms that lead them to arise. For example, there are as yet no general techniques for determining whether a given domain is stable in a given CA; for characterizing the set of initial conditions that will eventually give rise to it; or for working out the particles that it supports. In CA of two or more dimensions, a large body of descriptive results is available, but these are more frequently anecdotal than systematic.
A significant barrier to progress has been the lack of good mathematical techniques for identifying, describing, and classifying domains. One promising development in this area is an information-theoretic filtering technique that can operate on configurations of any dimension [13].

Bibliography

Primary Literature
1. Boccara N, Nasser J, Roger M (1991) Particle-like structures and their interactions in spatio-temporal patterns generated by one-dimensional deterministic cellular automaton rules. Phys Rev A 44:866
2. Chate H, Manneville P (1992) Collective behaviors in spatially extended systems with local interactions and synchronous updating. Prog Theor Phys 87:1
3. Crutchfield JP, Hanson JE (1993) Turbulent pattern bases for cellular automata. Physica D 69:279
4. Eloranta K, Nummelin E (1992) The kink of cellular automaton rule 18 performs a random walk. J Stat Phys 69:1131
5. Fisch R, Gravner J, Griffeath D (1991) Threshold-range scaling of excitable cellular automata. Stat Comput 1:23–39
6. Gallas J, Grassberger P, Herrmann H, Ueberholz P (1992) Noisy collective behavior in deterministic cellular automata. Physica A 180:19
7. Grassberger P (1984) Chaos and diffusion in deterministic cellular automata. Physica D 10:52
8. Griffeath D (2008) The primordial soup kitchen. http://psoup.math.wisc.edu/kitchen.html
9. Hanson JE, Crutchfield JP (1992) The attractor-basin portrait of a cellular automaton. J Stat Phys 66:1415
10. Hanson JE, Crutchfield JP (1997) Computational mechanics of cellular automata: An example. Physica D 103:169
11. Hopcroft JE, Ullman JD (1979) Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading
12. Lindgren K, Moore C, Nordahl M (1998) Complexity of two-dimensional patterns. J Stat Phys 91:909
13. Shalizi C, Haslinger R, Rouquier J, Klinker K, Moore C (2006) Automatic filters for the detection of coherent structure in spatiotemporal systems. Phys Rev E 73:036104
14. Wojtowicz M (2008) Mirek's Cellebration. http://www.mirekw.com/ca/
15. Wolfram S (1984) Universality and complexity in cellular automata. Physica D 10:1
Books and Reviews
Das R, Crutchfield JP, Mitchell M, Hanson JE (1995) Evolving Globally Synchronized Cellular Automata. In: Eshelman LJ (ed) Proceedings of the Sixth International Conference on Genetic Algorithms. Morgan Kaufmann, San Mateo
Gerhardt M, Schuster H, Tyson J (1990) A cellular automaton model of excitable media including curvature and dispersion. Science 247:1563
Gutowitz HA (1991) Transients, Cycles, and Complexity in Cellular Automata. Phys Rev A 44:R7881
Henze C, Tyson J (1996) Cellular automaton model of three-dimensional excitable media. J Chem Soc Faraday Trans 92:2883
Hordijk W, Shalizi C, Crutchfield J (2001) Upper bound on the products of particle interactions in cellular automata. Physica D 154:240
Iooss G, Helleman RH, Stora R (eds) (1983) Chaotic Behavior of Deterministic Systems. North-Holland, Amsterdam
Ito H (1988) Intriguing Properties of Global Structure in Some Classes of Finite Cellular Automata. Physica 31D:318
Jen E (1986) Global Properties of Cellular Automata. J Stat Phys 43:219
Kaneko K (1986) Attractors, Basin Structures and Information Processing in Cellular Automata. In: Wolfram S (ed) Theory and Applications of Cellular Automata. World Scientific, Singapore, pp 367
Langton C (1990) Computation at the Edge of Chaos: Phase transitions and emergent computation. Physica D 42:12
Lindgren K (1987) Correlations and Random Information in Cellular Automata. Complex Syst 1:529
Lindgren K, Nordahl M (1988) Complexity Measures and Cellular Automata. Complex Syst 2:409
Lindgren K, Nordahl M (1990) Universal Computation in Simple One-Dimensional Cellular Automata. Complex Syst 4:299
Mitchell M (1998) Computation in Cellular Automata: A Selected Review. In: Schuster H, Gramms T (eds) Nonstandard Computation. Wiley, New York
Cellular Automata, Emergent Phenomena in
Packard NH (1984) Complexity in Growing Patterns in Cellular Automata. In: Demongeot J, Goles E, Tchuente M (eds) Dynamical Behavior of Automata: Theory and Applications. Academic Press, New York
Packard NH (1985) Lattice Models for Solidification and Aggregation. In: Proceedings of the First International Symposium on Form, Tsukuba
Pivato M (2007) Defect Particle Kinematics in One-Dimensional Cellular Automata. Theor Comput Sci 377:205–228
Weimar J (1997) Cellular automata for reaction-diffusion systems. Parallel Comput 23:1699
Wolfram S (1984) Computation Theory of Cellular Automata. Comm Math Phys 96:15
Wolfram S (1986) Theory and Applications of Cellular Automata. World Scientific, Singapore
Wuensche A, Lesser MJ (1992) The Global Dynamics of Cellular Automata. Santa Fe Institute Studies in the Sciences of Complexity, Reference vol 1. Addison-Wesley, Redwood City
Cellular Automata and Groups

TULLIO CECCHERINI-SILBERSTEIN¹, MICHEL COORNAERT²
¹ Dipartimento di Ingegneria, Università del Sannio, Benevento, Italy
² Institut de Recherche Mathématique Avancée, Université Louis Pasteur et CNRS, Strasbourg, France

Article Outline

Glossary
Definition of the Subject
Introduction
Cellular Automata
Cellular Automata with a Finite Alphabet
Linear Cellular Automata
Group Rings and Kaplansky Conjectures
Future Directions
Bibliography

Glossary

Groups A group is a set G endowed with a binary operation G × G ∋ (g, h) ↦ gh ∈ G, called the multiplication, that satisfies the following properties: (i) for all g, h and k in G, (gh)k = g(hk) (associativity); (ii) there exists an element 1_G ∈ G (necessarily unique) such that, for all g in G, 1_G g = g 1_G = g (existence of the identity element); (iii) for each g in G, there exists an element g⁻¹ ∈ G (necessarily unique) such that gg⁻¹ = g⁻¹g = 1_G (existence of the inverses). A group G is said to be Abelian (or commutative) if the operation is commutative, that is, for all g, h ∈ G one has gh = hg. A group F is called free if there is a subset S ⊆ F such that any element g of F can be uniquely written as a reduced word on S, i.e. in the form g = s₁^{α₁} s₂^{α₂} ⋯ sₙ^{αₙ}, where n ≥ 0, sᵢ ∈ S and αᵢ ∈ Z \ {0} for 1 ≤ i ≤ n, and such that sᵢ ≠ s_{i+1} for 1 ≤ i ≤ n − 1. Such a set S is called a free basis for F. The cardinality of S is an invariant of the group F and it is called the rank of F. A group G is finitely generated if there exists a finite subset S ⊆ G such that every element g ∈ G can be expressed as a product of elements of S and their inverses, that is, g = s₁^{ε₁} s₂^{ε₂} ⋯ sₙ^{εₙ}, where n ≥ 0 and sᵢ ∈ S, εᵢ = ±1 for 1 ≤ i ≤ n. The minimal n for which such an expression exists is called the word length of g with respect to S and it is denoted by ℓ(g). The group G is a (discrete) metric space with the
distance function d : G × G → ℝ₊ defined by setting d(g, g′) = ℓ(g⁻¹g′) for all g, g′ ∈ G. The set S is called a finite generating subset for G, and one says that S is symmetric provided that s ∈ S implies s⁻¹ ∈ S. The Cayley graph of a finitely generated group G with respect to a symmetric finite generating subset S ⊆ G is the (undirected) graph Cay(G, S) with vertex set G and where two elements g, g′ ∈ G are joined by an edge if and only if g⁻¹g′ ∈ S. A group G is residually finite if the intersection of all subgroups of G of finite index is trivial. A group G is amenable if it admits a right-invariant mean, that is, a map μ : P(G) → [0, 1], where P(G) denotes the set of all subsets of G, satisfying the following conditions: (i) μ(G) = 1 (normalization); (ii) μ(A ∪ B) = μ(A) + μ(B) for all A, B ∈ P(G) such that A ∩ B = ∅ (finite additivity); (iii) μ(Ag) = μ(A) for all g ∈ G and A ∈ P(G) (right-invariance).

Rings A ring is a set R equipped with two binary operations R × R ∋ (a, b) ↦ a + b ∈ R and R × R ∋ (a, b) ↦ ab ∈ R, called the addition and the multiplication, respectively, such that the following properties are satisfied: (i) R, with the addition operation, is an Abelian group with identity element 0, called the zero element (the inverse of an element a ∈ R is denoted by −a); (ii) the multiplication is associative and admits an identity element 1, called the unit element; (iii) multiplication is distributive with respect to addition, that is, a(b + c) = ab + ac and (b + c)a = ba + ca for all a, b and c ∈ R. A ring R is commutative if ab = ba for all a, b ∈ R. A field is a commutative ring K ≠ {0} where every nonzero element a ∈ K is invertible, that is, there exists a⁻¹ ∈ K such that aa⁻¹ = 1. In a ring R a nontrivial element a is called a zero-divisor if there exists a nonzero element b ∈ R such that either ab = 0 or ba = 0. A ring R is directly finite if whenever ab = 1 then necessarily ba = 1, for all a, b ∈ R.
If the ring M_d(R) of d × d matrices with coefficients in R is directly finite for all d ≥ 1, one says that R is stably finite. Let R be a ring and let G be a group. Denote by R[G] the set of all formal sums Σ_{g∈G} α_g g, where α_g ∈ R and α_g = 0 except for finitely many elements g ∈ G. We define two binary operations on R[G], namely the addition, by setting

(Σ_{g∈G} α_g g) + (Σ_{h∈G} β_h h) = Σ_{g∈G} (α_g + β_g) g,

and the multiplication, by setting

(Σ_{g∈G} α_g g)(Σ_{h∈G} β_h h) = Σ_{g,h∈G} α_g β_h gh = Σ_{g,k∈G} α_g β_{g⁻¹k} k,

where the last sum is obtained by substituting k = gh.
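A minimal computational sketch of these two operations, with elements of R[G] stored as finitely supported dictionaries g ↦ coefficient; the group is passed in through its multiplication, and the example takes G = Z written additively, so that the product is polynomial-style convolution. This is an illustration of the definitions, not code from the literature.

```python
import operator

# Sketch: the group ring R[G].  An element sum_g alpha_g g is a dict
# mapping group elements to nonzero coefficients.

def gr_add(a, b):
    """Coefficient-wise addition: sum_g (alpha_g + beta_g) g."""
    out = dict(a)
    for h, beta in b.items():
        out[h] = out.get(h, 0) + beta
    return {g: c for g, c in out.items() if c != 0}

def gr_mul(a, b, op):
    """Product sum_{g,h} alpha_g beta_h (g*h), with group operation op."""
    out = {}
    for g, alpha in a.items():
        for h, beta in b.items():
            k = op(g, h)
            out[k] = out.get(k, 0) + alpha * beta
    return {g: c for g, c in out.items() if c != 0}

# In Z[Z], writing x for the generator 1: (1 + x)(1 - x) = 1 - x^2.
a = {0: 1, 1: 1}        # 1 + x
b = {0: 1, 1: -1}       # 1 - x
print(gr_mul(a, b, operator.add))   # {0: 1, 2: -1}
```

Since the group operation is a parameter, the same `gr_mul` works verbatim for a nonabelian G; the two displayed forms of the product are just the same double sum indexed by (g, h) or by (g, k) with k = gh.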
Then, with these two operations, R[G] becomes a ring; it is called the group ring of G with coefficients in R.

Cellular automata Let G be a group, called the universe, and let A be a set, called the alphabet. A configuration is a map x : G → A. The set A^G of all configurations is equipped with the right action of G defined by A^G × G ∋ (x, g) ↦ x^g ∈ A^G, where x^g(g′) = x(gg′) for all g′ ∈ G. A cellular automaton over G with coefficients in A is a map τ : A^G → A^G satisfying the following condition: there exists a finite subset M ⊆ G and a map μ : A^M → A such that τ(x)(g) = μ(x^g|_M) for all x ∈ A^G, g ∈ G, where x^g|_M denotes the restriction of x^g to M. Such a set M is called a memory set and μ is called a local defining map for τ. If A = V is a vector space over a field K, then a cellular automaton τ : V^G → V^G, with memory set M ⊆ G and local defining map μ : V^M → V, is said to be linear provided that μ is linear. Two configurations x, x′ ∈ A^G are said to be almost equal if the set {g ∈ G : x(g) ≠ x′(g)} at which they differ is finite. A cellular automaton τ is called preinjective if whenever τ(x) = τ(x′) for two almost equal configurations x, x′ ∈ A^G one necessarily has x = x′. A Garden of Eden configuration is a configuration x ∈ A^G \ τ(A^G). Clearly, GOE configurations exist if and only if τ is not surjective.

Definition of the Subject

A cellular automaton is a self-mapping of the set of configurations of a group defined from local and invariant rules. Cellular automata were first considered only on the n-dimensional lattice group Zⁿ and for configurations taking values in a finite alphabet set, but they may be formally defined on any group and for any alphabet. However, it is usually assumed that the alphabet set is endowed with some mathematical structure and that the local defining rules are related to this structure in some way. It turns out that general properties of cellular automata often reflect properties of the underlying group.
As an example, the Garden of Eden theorem asserts that if the group is amenable and the alphabet is ﬁnite, then the surjectivity of a cellular automaton is equivalent to its preinjectivity
(a weak form of injectivity). There is also a linear version of the Garden of Eden theorem for linear cellular automata and finite-dimensional vector spaces as alphabets. It is an amazing fact that famous conjectures of Kaplansky about the structure of group rings can be reformulated in terms of linear cellular automata.

Introduction

The goal of this paper is to survey results related to the Garden of Eden theorem and the surjunctivity problem for cellular automata. The notion of a cellular automaton goes back to John von Neumann [37] and Stan Ulam [34]. Although cellular automata were first considered only in theoretical computer science, nowadays they play a prominent role also in physics and biology, where they serve as models for several phenomena (see Cellular Automata Modeling of Physical Systems, Chaotic Behavior of Cellular Automata), and in mathematics. In particular, cellular automata are studied in ergodic theory (Ergodic Theory of Cellular Automata) and in the theory of dynamical systems (Topological Dynamics of Cellular Automata), in functional and harmonic analysis, and in group theory. In the classical framework, the universe U is the lattice Z² of integer points in the Euclidean plane and the alphabet A is a finite set, typically A = {0, 1}. The set A^U = {x : U → A} is the configuration space, a map x : U → A is a configuration, and a point (n, m) ∈ U is called a cell. One is given a neighborhood M of the origin (0, 0) ∈ U, typically, for some r > 0, M = {(n, m) ∈ Z² : |n| + |m| ≤ r} (the von Neumann r-ball) or M = {(n, m) ∈ Z² : |n|, |m| ≤ r} (the Moore r-ball), and a local map μ : A^M → A. One then "extends" μ to the whole universe, obtaining a map τ : A^U → A^U, called a cellular automaton, by setting τ(x)(n, m) = μ((x(n + s, m + t))_{(s,t)∈M}). This way, the value τ(x)(n, m) ∈ A of the configuration x at the cell (n, m) ∈ U only depends on the values x(n + s, m + t) of x at its neighboring cells (n + s, m + t) = (n, m) + (s, t) ∈ (n, m) + M; in other words, τ is Z²-equivariant.
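The classical construction above can be sketched directly in Python, using a torus as a finite stand-in for the infinite lattice Z² and an arbitrary illustrative choice of local map μ (majority vote on the Moore 1-ball); the script also checks the Z²-equivariance just noted.

```python
# Sketch: tau(x)(n, m) = mu((x(n+s, m+t))_{(s,t) in M}) on an N x N
# torus, a finite stand-in for Z^2.  The majority local map is an
# arbitrary example, not a rule from the text.

N = 8
MOORE = [(s, t) for s in (-1, 0, 1) for t in (-1, 0, 1)]   # Moore 1-ball

def mu(values):
    """Local map: majority vote over the nine neighborhood values."""
    return 1 if sum(values) >= 5 else 0

def tau(x):
    """Extend mu to the whole universe."""
    return [[mu([x[(n + s) % N][(m + t) % N] for (s, t) in MOORE])
             for m in range(N)] for n in range(N)]

def translate(x, a, b):
    """The shift action of (a, b) in Z^2 on configurations."""
    return [[x[(n + a) % N][(m + b) % N] for m in range(N)] for n in range(N)]

x = [[1 if (n * m) % 5 < 2 else 0 for m in range(N)] for n in range(N)]
# Z^2-equivariance: tau commutes with every translation.
print(tau(translate(x, 2, 3)) == translate(tau(x), 2, 3))   # True
```

Because the same μ is applied at every cell, τ automatically commutes with all translations; this is exactly the equivariance property (2) of the general definition.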
M is called a memory set for τ and μ a local defining map. In 1963 E. F. Moore proved that if a cellular automaton τ : A^{Z²} → A^{Z²} is surjective then it is also preinjective, a weak form of injectivity. Shortly afterwards, John Myhill proved the converse to Moore's theorem. The equivalence of surjectivity and preinjectivity of cellular automata is referred to as the Garden of Eden theorem (briefly, GOE theorem), this biblical terminology being motivated by the fact that it gives necessary and sufficient conditions for the existence of configurations x that are not in the image of τ, i.e. x ∈ A^{Z²} \ τ(A^{Z²}), so that, thinking of (τ, A^{Z²}) as
a discrete dynamical system, with τ being the time, they can appear only as "initial configurations". It was immediately realized that the GOE theorem also holds in higher dimension, namely for cellular automata with universe U = Z^d, the lattice of integer points in the d-dimensional space. Then, Machì and Mignosi [27] gave the definition of a cellular automaton over a finitely generated group and extended the GOE theorem to the class of groups G having subexponential growth, that is, for which the growth function γ_G(n), which counts the elements g ∈ G at "distance" at most n from the unit element 1_G of G, grows more slowly than any exponential; in formulæ, lim_{n→∞} γ_G(n)^{1/n} = 1. Finally, in 1999 Ceccherini-Silberstein, Machì and Scarabotti [9] extended the GOE theorem to the class of amenable groups. It is interesting to note that the notion of an amenable group was also introduced by von Neumann [36]. This class of groups contains all finite groups, all Abelian groups, and in fact all solvable groups, all groups of subexponential growth, and it is closed under the operations of taking subgroups, quotients, directed limits and extensions. In [27] two examples are given of cellular automata with universe the free group F₂ of rank two, the prototype of a nonamenable group, which are respectively surjective but not preinjective and, conversely, preinjective but not surjective, thus providing an instance of the failure of the theorems of Moore and Myhill and so of the GOE theorem. In [9] it is shown that these examples can be extended to the class of groups containing the free group F₂, which are thus necessarily nonamenable. We do not know whether the GOE theorem holds only for amenable groups; the question remains open for groups which are nonamenable but have no free subgroups: by results of Olshanskii [30] and Adyan [1] it is known that this class is nonempty.
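The growth function γ_G(n) can be computed by breadth-first search on the Cayley graph. The sketch below contrasts Z², whose balls grow polynomially (hence subexponentially), with the free group F₂, whose balls grow exponentially; the generator sets are the standard choices.

```python
# Sketch: gamma_G(n) counts group elements of word length at most n.
# Balls are grown by breadth-first search from the identity.

def ball_sizes_Z2(radius):
    """Ball sizes in Z^2 with generators +-e1, +-e2 (L1 balls)."""
    gens = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    ball, frontier, sizes = {(0, 0)}, {(0, 0)}, [1]
    for _ in range(radius):
        frontier = {(x + a, y + b) for (x, y) in frontier
                    for (a, b) in gens} - ball
        ball |= frontier
        sizes.append(len(ball))
    return sizes

def ball_sizes_F2(radius):
    """Ball sizes in the free group F_2: reduced words over a, A, b, B,
    where A stands for a^-1 and B for b^-1."""
    inverse = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}
    ball, frontier, sizes = {''}, {''}, [1]
    for _ in range(radius):
        frontier = {w + g for w in frontier for g in 'aAbB'
                    if not (w and w[-1] == inverse[g])} - ball
        ball |= frontier
        sizes.append(len(ball))
    return sizes

print(ball_sizes_Z2(4))   # [1, 5, 13, 25, 41]: gamma(n) = 2n^2 + 2n + 1
print(ball_sizes_F2(4))   # [1, 5, 17, 53, 161]: gamma(n) = 2*3^n - 1
```

For Z² the n-th root of γ(n) = 2n² + 2n + 1 tends to 1 (subexponential growth), while for F₂ it tends to 3.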
In 1999 Misha Gromov [20], using a quite different terminology, reproved the GOE theorem for cellular automata whose universes are infinite amenable graphs with a dense pseudogroup of holonomies (in other words, such graphs are rich in symmetries). In addition, he considered not only cellular automata from the full configuration space into itself but also between subshifts X, Y of the full space. He used the notion of entropy of a subshift (a concept hidden in the papers [27] and [9]). In the mid-fifties W. Gottschalk introduced the notion of surjunctivity of maps. A map f : X → Y is surjunctive if it is surjective or not injective. We say that a group G is surjunctive if all cellular automata τ : A^G → A^G with finite alphabet are surjunctive. Lawton [18] proved that residually finite groups are surjunctive. From the GOE theorem for amenable groups [9] one immediately deduces that amenable groups are surjunctive as well. Finally Gromov [20] and, independently, Benjamin Weiss [38] proved that all sofic groups (the class of sofic groups contains all residually finite groups and all amenable groups) are surjunctive. It is not known whether or not all groups are surjunctive. In the literature there is a notion of a linear cellular automaton. This means that the alphabet is not only a finite set but also bears the structure of an Abelian group and that the local defining map is a group homomorphism, that is, it preserves the group operation. These are also called additive cellular automata (Additive Cellular Automata). In [5], motivated by [20], we introduced another notion of linearity for cellular automata. Given a group G and a vector space V over a (not necessarily finite) field K, the configuration space is V^G and a cellular automaton τ : V^G → V^G is linear if the local defining map μ : V^M → V is K-linear. The set LCA(V, G) of all linear cellular automata with alphabet V and universe G naturally bears the structure of a ring. The finiteness condition for a set A in the classical framework is now replaced by the finite dimensionality of V. Similarly, the notion of entropy for subshifts X ⊆ A^G is now replaced by that of mean dimension (a notion due to Gromov [20]). In [5] we proved the GOE theorem for linear cellular automata τ : V^G → V^G with alphabet a finite-dimensional vector space and with G an amenable group. Moreover, we proved a linear version of the Gottschalk surjunctivity theorem for residually finite groups. In the same paper we also established a connection with the theory of group rings. Given a group G and a field K, there is a one-to-one correspondence between the elements of the group ring K[G] and the cellular automata τ : K^G → K^G. This correspondence preserves the ring structures of K[G] and LCA(K, G). This led to a reformulation of a long-standing problem, raised by Irving Kaplansky [23], about the absence of zero-divisors in K[G] for G a torsion-free group, in terms of the preinjectivity of all τ ∈ LCA(K, G).
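The correspondence between K[G] and LCA(K, G) can be checked numerically. The sketch below takes G = Z and adopts the convention τ_a(x)(g) = Σ_h a_h x(g + h); the direction convention and the coefficient choices are illustrative assumptions of mine, not notation from any reference. Composing the automata of a and b then agrees with the automaton of the product ab.

```python
# Sketch: a group-ring element a = sum_h a_h h of K[Z] acts on
# configurations x in K^Z as the linear CA tau_a(x)(g) = sum_h a_h x(g+h).
# Composition of automata corresponds to multiplication in K[Z].

def tau(a, x, size):
    """Apply the linear CA of the group-ring element a (dict h -> a_h)
    to the finite-support configuration x (dict g -> value), on the
    window of cells -size <= g < size."""
    return {g: sum(ah * x.get(g + h, 0) for h, ah in a.items())
            for g in range(-size, size)}

def gr_mul(a, b):
    """Convolution product in K[Z]."""
    out = {}
    for h, ah in a.items():
        for k, bk in b.items():
            out[h + k] = out.get(h + k, 0) + ah * bk
    return out

a = {0: 1, 1: -2}                      # example element 1 - 2x
b = {-1: 3, 0: 1}                      # example element 3x^-1 + 1
x = {0: 1, 2: 5}                       # finite-support configuration

lhs = tau(a, tau(b, x, 8), 4)          # tau_a(tau_b(x))
rhs = tau(gr_mul(a, b), x, 4)          # tau_{ab}(x)
print(lhs == rhs)                      # the two automata agree: True
```

This is the ring-homomorphism property underlying the reformulation of Kaplansky's problems: products in K[G] become compositions of linear cellular automata.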
In [6] we proved the linear version of the Gromov–Weiss surjunctivity theorem for sofic groups and established another application to the theory of group rings. We extended the correspondence above to a ring isomorphism between the ring Mat_d(K[G]) of d × d matrices with coefficients in the group ring K[G] and LCA(K^d, G). This led to a reformulation of another famous problem, raised by Irving Kaplansky [24], about the structure of group rings: a group ring K[G] is stably finite if and only if, for all d ≥ 1, all linear cellular automata τ : (K^d)^G → (K^d)^G are surjunctive. As a byproduct we obtained another proof of the fact that group rings over sofic groups are stably finite,
a result previously established by G. Elek and A. Szabó [11] using different methods. The paper is organized as follows. In Sect. "Cellular Automata" we present the general definition of a cellular automaton for any alphabet and any group. This includes a few basic examples, namely Conway's Game of Life, the majority action and the discrete Laplacian. In the subsequent section we restrict our attention to cellular automata with a finite alphabet. We present the notions of Cayley graphs (for finitely generated groups), of amenable groups, and of entropy for G-invariant subsets in the configuration space. This leads to a description of the setting and the statement of the Garden of Eden theorem for amenable groups. We also give detailed expositions of a few examples showing that the hypothesis of amenability cannot, in general, be removed from the assumptions of this theorem. We also present the notions of surjunctivity and of sofic groups, and state the surjunctivity theorem of Gromov and Weiss for sofic groups. In Sect. "Linear Cellular Automata" we introduce the notions of linear cellular automata and of mean dimension for G-invariant subspaces in V^G. We then discuss the linear analogue of the Garden of Eden theorem and, again, we provide explicit examples showing that the assumptions of the theorem (amenability of the group and finite dimensionality of the underlying vector space) cannot, in general, be removed. Finally we present the linear analogue of the surjunctivity theorem of Gromov and Weiss for linear cellular automata over sofic groups. In Sect. "Group Rings and Kaplansky Conjectures" we give the definition of a group ring and present a representation of linear cellular automata as matrices with coefficients in the group ring. This leads to the reformulation of the two long-standing problems raised by Kaplansky about the structure of group rings. Finally, in Sect.
"Future Directions" we present a list of open problems together with a description of more recent results related to the Garden of Eden theorem and to the surjunctivity problem.

Cellular Automata

The Configuration Space

Let G be a group, called the universe, and let A be a set, called the alphabet or the set of states. A configuration is a map x : G → A. The set A^G of all configurations is equipped with the right action of G defined by A^G × G ∋ (x, g) ↦ x^g ∈ A^G, where x^g(g′) = x(gg′) for all g′ ∈ G.

Cellular Automata

A cellular automaton over G with alphabet A is a map τ : A^G → A^G satisfying the following condition: there exists a finite subset M ⊂ G and a map μ : A^M → A such that

τ(x)(g) = μ(x^g|_M)   (1)

for all x ∈ A^G and g ∈ G, where x^g|_M denotes the restriction of x^g to M. Such a set M is called a memory set and μ is called a local defining map for τ. It follows directly from the definition that every cellular automaton τ : A^G → A^G is G-equivariant, i.e., it satisfies

τ(x^g) = τ(x)^g   (2)

for all g ∈ G and x ∈ A^G. Note that if M is a memory set for τ, then any finite set M′ ⊂ G containing M is also a memory set for τ. The local defining map associated with such an M′ is the map μ′ : A^{M′} → A given by μ′ = μ ∘ π, where π : A^{M′} → A^M is the restriction map. However, there exists a unique memory set M_0 of minimal cardinality, called the minimal memory set for τ. We denote by CA(G; A) the set of all cellular automata over G with alphabet A.

Examples

Example 1 (Conway's Game of Life [3]) The most famous example of a cellular automaton is the Game of Life of John Horton Conway. The set of states is A = {0, 1}. State 0 corresponds to absence of life while state 1 indicates life. Therefore passing from 0 to 1 can be interpreted as birth, while passing from 1 to 0 corresponds to death. The universe for Life is the group G = Z², that is, the free Abelian group of rank 2. The minimal memory set is M = {−1, 0, 1}² ⊂ Z², the Moore neighborhood of the origin in Z². It consists of the origin (0, 0) and its eight neighbors ±(1, 0), ±(0, 1), ±(1, 1), ±(1, −1). The corresponding local defining map μ : A^M → A is given by μ(y) = 1 if Σ_{m∈M} y(m) = 3, or if Σ_{m∈M} y(m) = 4 and y((0, 0)) = 1, and μ(y) = 0 otherwise.

[…]

The featured state s_i^(T) is assigned the value 1 for m > 0.5 and 0 for m < 0.5. If m is exactly 0.5, then the last state is assigned (s_i^(T) = σ_i^(T)). This memory mechanism is accumulative in its demand of knowledge of past history: to calculate the memory charge ω_i^(T) it is not necessary to know the whole series σ_i^(t), since it can be sequentially calculated as ω_i^(T) = α ω_i^(T−1) + σ_i^(T). Correspondingly, Ω(T) = (α^T − 1)/(α − 1). The choice of the memory factor α simulates the long-term or remnant memory effect: the limit case α = 1 corresponds to memory with equally weighted records (full memory, equivalent to the mode if k = 2), whereas α ≪ 1 intensifies the contribution of the most recent states and diminishes the contribution of the past ones (short-term working memory). The choice α = 0 leads to the ahistoric model. In the most unbalanced scenario up to T, i.e. σ_i^(1) = ⋯ = σ_i^(T−1) ≠ σ_i^(T), it is:

m(0, 0, …, 0, 1) = (α − 1)/(α^T − 1) = 1/2  ⇒  α^T − 2α + 1 = 0 ,
m(1, 1, …, 1, 0) = (α^T − α)/(α^T − 1) = 1/2  ⇒  α^T − 2α + 1 = 0 .
Thus, memory is only operative if α is greater than a critical α_T that verifies

α_T^T − 2α_T + 1 = 0 ,   (1)

in which case cells will be featured at T with state values different from the last one. The initial operative values are α_3 = 0.61805 and α_4 = 0.5437. When T → ∞, Eq. (1) becomes 1 − 2α = 0; thus, in the k = 2 scenario, α-memory is not effective if α ≤ 0.5.

A Worked Example: The Parity Rule

The so-called parity rule states: a cell becomes alive if the number of live neighbors is odd, and dead otherwise. Figure 2 shows the effect of memory on the parity rule starting from a single live cell in the Moore neighborhood. In accordance with the values of α_3 and α_4 given above: (i) the
pattern at T = 4 is the ahistoric one if α ≤ 0.6, altered when α ≥ 0.7, and (ii) the patterns at T = 5 for α = 0.54 and α = 0.55 differ. Levels of memory that are not low tend to freeze the dynamics from the early time-steps, e.g. over 0.54 in Fig. 2. In the particular case of full memory, small oscillators of short range in time are frequently generated, such as the period-two oscillator that appears as early as T = 2 in Fig. 2. The group of evolution patterns shown in the [0.503, 0.54] interval of α variation in Fig. 2 is rather unexpected for the parity rule, as these patterns are too sophisticated for this simple rule. On the contrary, the evolution patterns with very small memory, α = 0.501, resemble those of the ahistoric model in Fig. 2. But this similitude breaks later on, as Fig. 3 reveals: from T = 19, the parity rule with minimal memory evolves producing patterns notably different from the ahistoric ones. These patterns tend to be framed in squares of size not over T × T, whereas in the ahistoric case the patterns tend to be framed in 2T × 2T square regions, so even minimal memory induces a very notable reduction of the affected cell area in the scenario of Fig. 2. The patterns of the featured cells tend not to be far from the actual ones, albeit examples of notable divergence can be traced in Fig. 2. In the particular case of the minimal memory scenario of Fig. 2, that of α = 0.501, memory has no effect up to T = 9, when the pattern of featured live cells reduces to the initial one; afterward both evolutions are fairly similar up to T = 18, but at this time step the two kinds of patterns differ notably, and from then on the evolution patterns in Fig. 3 diverge notably from the ahistoric ones. Giving consideration to previous states (historic memory) in two-dimensional CA tends to confine the disruption generated by a single live cell. As a rule, full memory tends to generate oscillators, and less historic information retained, i.e. a smaller α value, implies an approach to the ahistoric model in a rather smooth form. But the transition which decreases the memory factor from α = 1.0 (full memory) to α = 0.5 (ahistoric model) is not always regular, and some kind of erratic effect of memory can be traced. The inertial (or conservative) effect of memory dramatically changes the dynamics of the semitotalistic LIFE rule. Thus, (i) the vividness that some small clusters exhibit in LIFE has not been detected in LIFE with memory. In particular, the glider in LIFE does not glide with
Cellular Automata with Memory, Figure 2 The 2D parity rule with memory up to T = 15
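The α-memory featuring just described is easy to state in code. A minimal Python sketch (identifiers are ours): the charge ω and normalizer Ω are accumulated incrementally, and the critical α_T of Eq. (1) is located numerically by bisection:

```python
def featured_state(omega, Omega, last):
    # k = 2 rounding of m = omega/Omega: 1 if m > 0.5, 0 if m < 0.5, tie -> last state
    if 2 * omega > Omega:
        return 1
    if 2 * omega < Omega:
        return 0
    return last

def charge(history, alpha):
    # sequential accumulation: omega^(T) = alpha*omega^(T-1) + sigma^(T),
    # and likewise Omega(T) = alpha*Omega(T-1) + 1
    omega = Omega = 0.0
    for sigma in history:
        omega = alpha * omega + sigma
        Omega = alpha * Omega + 1.0
    return omega, Omega

def critical_alpha(T, tol=1e-12):
    # bisection on f(a) = a**T - 2*a + 1, which is positive at a = 0.5 and
    # negative just below a = 1, so the root alpha_T lies in between
    lo, hi = 0.5, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid**T - 2 * mid + 1 > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, critical_alpha(3) returns the root of α³ − 2α + 1 = 0 (≈ 0.618, the α_3 quoted above), and a history (0, 0, 1) is featured as 0 for α = 0.7 > α_3 but as 1 for α = 0.55 < α_3.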
Cellular Automata with Memory
Cellular Automata with Memory, Figure 3 The 2D parity rule with α = 0.501 memory starting from a single site live cell up to T = 55
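A compact simulation in the spirit of Figs. 2 and 3 can be sketched as follows (our own illustration; grid size and seeding are free parameters). At each step, cells are featured from their charges and the Moore-neighborhood parity rule is applied to the featured states:

```python
def parity_with_memory(n=31, steps=15, alpha=0.8):
    # single live cell in an n x n torus, evolved with alpha-weighted memory
    sigma = [[0] * n for _ in range(n)]
    sigma[n // 2][n // 2] = 1
    omega = [[0.0] * n for _ in range(n)]
    Omega = 0.0
    for _ in range(steps):
        # update charges: omega = alpha*omega + sigma, Omega = alpha*Omega + 1
        Omega = alpha * Omega + 1.0
        for i in range(n):
            for j in range(n):
                omega[i][j] = alpha * omega[i][j] + sigma[i][j]
        # featured states: round(omega/Omega), ties keeping the last state
        s = [[1 if 2 * omega[i][j] > Omega else
              0 if 2 * omega[i][j] < Omega else sigma[i][j]
              for j in range(n)] for i in range(n)]
        # parity rule on featured states: alive iff the number of live
        # Moore neighbours is odd
        sigma = [[sum(s[(i + di) % n][(j + dj) % n]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)
                      if (di, dj) != (0, 0)) % 2
                  for j in range(n)] for i in range(n)]
    return sigma
```

In this sketch, any α ≤ 0.5 reproduces the ahistoric parity evolution exactly, while α well above α_3 kills the single-seed pattern soon after T = 3, an extreme form of the freezing effect discussed above.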
memory, but stabilizes very close to its initial position as the tub; (ii) as the size of a configuration increases, live clusters often tend to persist with a higher number of live cells in LIFE with memory than in the ahistoric formulation; (iii) a single mutant appearing in a stable agar can lead to its destruction in the ahistoric model, whereas its effect tends to be restricted to its proximity with memory [26].

One-Dimensional CA

Elementary rules are one-dimensional, two-state rules operating on nearest neighbors. Following Wolfram's notation, these rules are characterized by a sequence of binary values (β) associated with each of the eight possible triplets
(σ_{i−1}^(T), σ_i^(T), σ_{i+1}^(T)):

111  110  101  100  011  010  001  000
 β1   β2   β3   β4   β5   β6   β7   β8
The rules are conveniently specified by their rule number R = Σ_{i=1}^{8} β_i 2^{8−i}. Legal rules are reflection-symmetric (β5 = β2, β7 = β4) and quiescent (β8 = 0), restrictions that leave 32 possible legal rules. Figure 4 shows the spatio-temporal patterns of legal rules affected by memory when starting from a single live cell [17]. Patterns are shown up to T = 63, with the memory factor varying from 0.6 to 1.0 by 0.1 intervals, and also adopting values close to the limit of its effectivity: 0.5. As a rule, the transition from the fully historic (α = 1.0) to the ahistoric scenario is fairly gradual, so that the patterns become more expanded as less historic memory is retained (smaller α). Rules 50, 122, 178, 250, 94, 222 and 254 are paradigmatic of this smooth evolution. Rules 222 and
254 are not included in Fig. 4 as they evolve like rule 94 but with the inside of the patterns full of active cells. Rules 126 and 182 also present a gradual evolution, although their patterns at high levels of memory hardly resemble the ahistoric ones. Examples without a smooth effect of memory are also present in Fig. 4: (i) rule 150 is sharply restrained at α = 0.6, (ii) the important rule 54 becomes extinct in [0.8, 0.9], but not with full memory, (iii) the rules in the group {18, 90, 146, 218} become extinct from α = 0.501. Memory kills the evolution for these rules already at T = 4 for α values over α_3 (thus over 0.6 in Fig. 4): after T = 3 all the cells, even the two outer cells alive at T = 3, are featured as dead. And (iv) rule 22 becomes extinct for α = 0.501, not at 0.507, 0.6 and 0.7, becomes extinct again at 0.8 and 0.9, and finally generates an oscillator with full memory. It has been argued that rules 18, 22, 122, 146 and 182 simulate rule 90 in that their behavior coincides when restricted to certain spatial subsequences. Starting with a single site live cell, the coincidence fully applies in the historic model for rules 90, 18 and 146. Rule 22 shares with these rules the extinction for high α values, with the notable exception of no extinction in the fully historic model. Rules 122 and 182 diverge in their behavior: there is a gradual decrease in the width of the evolving patterns as α increases, but they do not reach extinction. Figure 5 shows the effect of memory on legal rules when starting at random: the values of sites are initially uncorrelated and chosen at random to be 0 (blank) or 1 (gray) with probability 0.5. Differences in patterns resulting from reversing the center site value are shown as black pixels. Patterns are shown up to T = 60, in a line of size 129 with periodic boundary conditions imposed on the edges. Only the nine legal rules which generate non-periodic patterns in the ahistoric scenario are significantly affected by memory. The patterns with inverted triangles dominate the scene in the ahistoric patterns of Fig. 5, a common appearance that memory tends to eliminate.

Cellular Automata with Memory, Figure 4 Elementary, legal rules with memory from a single site live cell

History has a dramatic effect on Rule 18. Even at the low value of α = 0.6, the appearance of its spatio-temporal pattern fully changes: a number of isolated periodic structures are generated, far from the distinctive inverted-triangle world of the ahistoric pattern. For α = 0.7, the live structures are fewer, advancing the extinction found in [0.8, 0.9]. In the fully historic model, simple periodic patterns survive.
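The Wolfram coding used above can be sketched programmatically (helper names are ours). A rule number is decoded by indexing its bits with the neighborhood triplet read as a 3-bit number, and legality is the symmetry-plus-quiescence check stated earlier:

```python
def eca_step(cells, rule):
    # one step of an elementary CA: the new state of cell i is the bit of `rule`
    # indexed by the triplet (left, centre, right) read as a 3-bit number
    # (periodic boundary conditions)
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def is_legal(rule):
    # legal rules are reflection-symmetric (beta5 = beta2, beta7 = beta4)
    # and quiescent (beta8 = 0)
    b = [(rule >> t) & 1 for t in range(8)]  # b[t] = output for triplet value t
    return b[0] == 0 and b[3] == b[6] and b[1] == b[4]
```

Exactly 32 of the 256 elementary rules pass is_legal, matching the count quoted above; rule 90, for example, maps a single seed to its two neighbors.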
Cellular Automata with Memory, Figure 4 (continued)
Rule 146 is affected by memory in much the same way as Rule 18, because their binary codes differ only in their β1 value. The spatio-temporal patterns of rule 182 and its equivalent rule 146 are reminiscent of each other, though those of rule 182 look like negative photograms of those of rule 146. The effect of memory on rule 22 and on the complex rule 54 is similar. Their spatio-temporal patterns at α = 0.6 and α = 0.7 keep the essentials of the ahistoric ones, although the inverted triangles become enlarged and tend to be more sophisticated at their basis. A notable discontinuity is found for both rules when ascending in the value of the memory factor: at α = 0.8 and α = 0.9 only a few simple structures survive. But unexpectedly, the patterns of the fully historic scenario differ markedly from the others, showing a high degree of synchronization. The four remaining chaotic legal rules (90, 122, 126 and 150) show a much smoother evolution from the ahistoric to the historic scenario: no pattern evolves either to
Cellular Automata with Memory, Figure 5 Elementary, legal rules with memory starting at random
full extinction or to the preservation of only a few isolated persistent propagating structures (solitons). Rules 122 and 126 evolve in a similar form, showing a high degree of synchronization in the fully historic model. As a rule, the effect of memory on the differences in patterns (DP) resulting from reversing the value of the initial center site is reminiscent of that on the spatio-temporal patterns, albeit this very much depends on the actual simulation run. In the case of rule 18, for example, damage is not present in the simulation of Fig. 5. The group of rules 90, 122, 126 and 150 shows a, let us say canonical, fairly gradual evolution from the ahistoric to the historic scenario, so that the DP appear more constrained as more historic memory is retained, with no extinction for any α value. Figure 6 shows the evolution of the density ρ_T of sites with value 1, starting at random (ρ_0 = 0.5). The simulation is implemented for the same rules as in Fig. 4, but with a notably wider lattice: N = 500. A visual inspection of the plots in Fig. 6 ratifies the general features observed in the patterns of Fig. 5 regarding density. That also stands for damage spreading: as a rule, memory depletes the damaged region. In one-dimensional r = 2 CA, the value of a given site depends on the values of the nearest and next-nearest neighbors. Totalistic rules with memory have the form: σ_i^(T+1) = φ(s_{i−2}^(T) + s_{i−1}^(T) + s_i^(T) + s_{i+1}^(T) + s_{i+2}^(T)). The effect of memory on these rules follows the way traced in the r = 1 context, albeit with a rich casuistry studied in [14].
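The r = 2 totalistic transition just described, acting on featured states s, can be sketched as follows (the β lookup table, indexed by the five-cell sum, is our own illustrative parameter):

```python
def totalistic_r2_step(s, beta):
    # r = 2 totalistic rule applied to featured states s:
    # sigma_i^(T+1) = beta[ s_{i-2} + s_{i-1} + s_i + s_{i+1} + s_{i+2} ]
    # (periodic boundary conditions; beta has six entries, for sums 0..5)
    n = len(s)
    return [beta[sum(s[(i + d) % n] for d in (-2, -1, 0, 1, 2))]
            for i in range(n)]
```

With the parity-like table beta = [0, 1, 0, 1, 0, 1], a single live cell activates every site whose five-cell window contains it.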
Cellular Automata with Memory, Figure 5 (continued)
Probabilistic CA

So far the CA considered are deterministic. In order to study perturbations of deterministic CA, as well as transitional changes from one deterministic CA to another, it is natural to generalize the deterministic CA framework to the probabilistic scenario. In the elementary scenario, the β are replaced by probabilities p = P(σ_i^(T+1) = 1 | σ_{i−1}^(T), σ_i^(T), σ_{i+1}^(T)):

111  110  101  100  011  010  001  000
 β1   β2   β3   β4   β5   β6   β7   β8      β ∈ {0, 1}
 p1   p2   p3   p4   p5   p6   p7   p8      p ∈ [0, 1]

As in the deterministic scenario, memory can be embedded in probabilistic CA (PCA) by featuring cells by a summary s_i of past states instead of by their last state σ_i^(T): p = P(σ_i^(T+1) = 1 | s_{i−1}^(T), s_i^(T), s_{i+1}^(T)). Again, memory is embedded into the characterization of cells but not into the construction of the stochastic transition rules, as done in the canonical approach to PCA. We have explored the effect of memory on three different subsets, (0, p2, 0, p4, p2, 0, p4, 0), (0, 0, p3, 1, 0, p6, 1, 0), and (p1, p2, p1, p2, p2, 0, p2, 0), in [9].

Other Memories

A number of average-like memory mechanisms can readily be proposed by using weights different from that implemented in the α-memory mechanism, δ(t) = α^{T−t}. Among the plausible choices of δ we mention the weights δ(t) = t^c and δ(t) = c^t, c ∈ N, in which the larger the value of c, the more heavily the recent past is taken into account, and consequently the closer the scenario is to the ahistoric model [19,21]. Both weights allow for integer-based
Cellular Automata with Memory, Figure 5 (continued)
arithmetics (à la CA), comparing 2ω_i^(T) with Ω(T) to get the featured states s (a clear computational advantage over the α-based model), and are accumulative with respect to charge: ω_i^(T) = ω_i^(T−1) + T^c σ_i^(T), ω_i^(T) = ω_i^(T−1) + c^T σ_i^(T). Nevertheless, they share the same drawback: the powers explode at high values of t, even for c = 2. Limited trailing memory would keep memory of only the last τ states. This is implemented in the context of average memory as ω_i^(T) = Σ_{t=⊤}^{T} δ(t) σ_i^(t), with ⊤ = max(1, T − τ + 1). Limiting the trailing memory approaches the model to the ahistoric model (τ = 1). In the geometrically discounted method, such an effect is more appreciable when the value of α is high, whereas at low α values (already close to the ahistoric model when memory is not limited) the effect of limiting the trailing memory is not so important. In the k = 2 context, if τ = 3, provided that α > α_3 = 0.61805, the memory mechanism turns out to be that of selecting the mode of the last three states, s_i^(T) = mode(σ_i^(T−2), σ_i^(T−1), σ_i^(T)), i.e. the elementary rule 232. Figure 7 shows the effect of this kind of memory on legal rules. As is known, history has a dramatic effect on rules 18, 90, 146 and 218, as their patterns die out as early as T = 4. The case of rule 22 is particular: two branches are generated at T = 17 in the historic model; the patterns of the remaining rules in the historic model are very reminiscent of the ahistoric ones, but, let us say, compressed. Figure 7 also shows the effect of memory on some relevant quiescent asymmetric rules. Rule 2 shifts a single site live cell one space at every time-step in the ahistoric model; with memory the pattern dies at T = 4. This evolution is common to all rules that just shift a single site cell without increasing the number of living cells at T = 2; this is the case of the important rules 184 and 226. The patterns generated by
Cellular Automata with Memory, Figure 6 Evolution of the density starting at random in elementary legal rules. Color code: blue → full memory, black → α = 0.8, red → ahistoric model
Cellular Automata with Memory, Figure 7 Legal (first row of patterns) and quiescent asymmetric elementary rules significantly affected by the mode of the three last states of memory
rules 6 and 14 are rectified by memory (in the sense that the lines in the spatio-temporal pattern have a smaller slope), in such a way that the total number of live cells in the historic and ahistoric spatio-temporal patterns is the same. Again, the historic patterns of the remaining rules in Fig. 7 seem, as a rule, like the ahistoric ones compressed [22].
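The equivalence noted above between τ = 3 trailing α-memory and the mode of the last three states (elementary rule 232) can be checked directly; a small Python sketch (identifiers are ours):

```python
from statistics import mode

def trailing_feature(last3, alpha):
    # tau = 3 trailing alpha-memory over the last three states (s1, s2, s3):
    # omega = a^2*s1 + a*s2 + s3, Omega = a^2 + a + 1, rounded with tie -> last
    s1, s2, s3 = last3
    omega = alpha * alpha * s1 + alpha * s2 + s3
    Omega = alpha * alpha + alpha + 1.0
    if 2 * omega > Omega:
        return 1
    if 2 * omega < Omega:
        return 0
    return s3

triplets = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
# above the critical alpha_3 ~ 0.618 the featuring coincides with the mode
agree = all(trailing_feature(t, 0.7) == mode(t) for t in triplets)
# below it the two mechanisms part company, e.g. on the triplet (1, 1, 0)
disagree = any(trailing_feature(t, 0.55) != mode(t) for t in triplets)
```

For α = 0.7 > α_3 the two featurings coincide on all eight binary triplets; for α = 0.55 < α_3 they differ.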
Cellular Automata with Memory, Figure 8 Rule 150 with elementary rules up to R = 125 as memory
Cellular Automata with Memory, Figure 9 The parity rule with elementary rules as memory. Evolution from T = 4 to 15 in the von Neumann neighborhood starting from a single site live cell
Elementary rules (ER, noted f) can in turn act as memory rules: s_i^(T) = f(σ_i^(T−2), σ_i^(T−1), σ_i^(T)). Figure 8 shows the effect of ER memories up to R = 125 on rule 150 starting from a single site live cell up to T = 13. The effect of ER memories with R > 125 on rule 150, as well as on rule 90, is shown in [23]. In the latter case, complementary memory rules (rules whose rule numbers add up to 255) have the same effect on rule 90 (regardless of the role played by the three last states in f and of the initial configuration). In the ahistoric scenario, rules 90 and 150 are linear (or additive): any initial pattern can be decomposed into the superposition of patterns from a single site seed. Each of these configurations can be evolved independently and the results superposed (modulo two) to obtain the final complete pattern. The additivity of rules 90 and 150 remains in the historic model with linear memory rules.
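Featuring a cell through an elementary rule f acting on its own last three states can be sketched as follows (our own helper, using the bit-indexing convention of the Wolfram rule numbers):

```python
def rule_as_memory(rule, past3):
    # featured state s_i^(T) = f(sigma^(T-2), sigma^(T-1), sigma^(T)),
    # where f is the elementary rule with the given Wolfram number
    a, b, c = past3
    return (rule >> (4 * a + 2 * b + c)) & 1
```

Rule 232 reproduces the mode (majority) of the last three states, and rule 23 = 255 − 232, its complement, always outputs the opposite featured state.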
Figure 9 shows the effect of elementary rules on the 2D parity rule with the von Neumann neighborhood, starting from a single site live cell. This figure shows patterns from T = 4 onward. The consideration of CA rules as memory induces a fairly unexplored explosion of new patterns.

CA with Three States

This section deals with CA with three possible values at each site (k = 3), noted {0, 1, 2}, so the rounding mechanism is implemented by comparing the unrounded weighted mean m to the hallmarks 0.5 and 1.5, assigning the last state in case of equality to either of these values. Thus,

s^(T) = 0 if m^(T) < 0.5 ,
s^(T) = 1 if 0.5 < m^(T) < 1.5 ,
s^(T) = 2 if m^(T) > 1.5 ,

and

s^(T) = σ^(T) if m^(T) = 0.5 or m^(T) = 1.5 .
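As a minimal sketch of this k = 3 rounding with hallmarks (function and argument names are ours):

```python
def feature_k3(omega, Omega, last):
    # round m = omega/Omega to the nearest of {0, 1, 2};
    # at the hallmarks 0.5 and 1.5 keep the last state
    m = omega / Omega
    if m < 0.5:
        return 0
    if m == 0.5:
        return last
    if m < 1.5:
        return 1
    if m == 1.5:
        return last
    return 2
```

For example, a weighted mean of exactly 0.5 keeps the last state, while m = 1.9 rounds up to state 2.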
Cellular Automata with Memory, Figure 10 Parity k = 3 rules starting from a single σ = 1 seed. The red cells are at state 1, the blue ones at state 2
In the most unbalanced cell dynamics, historic memory takes effect after time step T only if α > α_T, with 3α_T^T − 4α_T + 1 = 0, which in the temporal limit becomes 1 − 4α = 0, i.e. α = 0.25. In general, in CA with k states (termed 0 to k − 1), the characteristic equation at T is (2k − 3)α_T^T − 2(k − 1)α_T + 1 = 0, which becomes 1 − 2(k − 1)α = 0 in the temporal limit. It is then concluded that memory does not affect the scenario if α ≤ α(k) = 1/(2(k − 1)). We study first totalistic rules, σ_i^(T+1) = φ(σ_{i−1}^(T) + σ_i^(T) + σ_{i+1}^(T)), characterized by a sequence of ternary values (β_s) associated with each of the seven possible values of the sum s of the neighbors, (β6, β5, β4, β3, β2, β1, β0), with associated rule number R = Σ_{s=0}^{6} β_s 3^s ∈ [0, 2186]. Figure 10 shows the effect of memory on quiescent (β0 = 0) parity rules, i.e. rules with β1, β3 and β5 non-null and β2 = β4 = β6 = 0. Patterns are shown up to T = 26. The pattern for α = 0.3 is shown to test its proximity to the ahistoric one (recall that if α ≤ 0.25 memory takes no effect). Starting with a single site seed, it can be concluded regarding proper three-state rules such as those in Fig. 10 that: (i) as an overall rule, the patterns become more expanded as less historic memory is retained (smaller α); this characteristic inhibition-of-growth effect of memory is traced on rules 300 and 543 in Fig. 10; (ii) the transition from the fully historic to the ahistoric scenario tends to be gradual in regard to the amplitude of the spatio-temporal patterns, although their composition can differ notably, even at close α values; (iii) in contrast to the two-state scenario, memory fires the pattern of some three-state rules that die out in the ahistoric model, and no rule with memory dies out. Thus, the effect of memory on rules 276, 519, 303 and 546 is somewhat unexpected: they die out at α ≤ 0.3, but at α = 0.4 the pattern expands, the expansion being inhibited (in Fig. 10) only at α ≥ 0.8. This activation under memory of rules that die at T = 3 in the ahistoric model is unfeasible in the k = 2 scenario. The features of the evolving patterns starting from a single seed in Fig. 10 are qualitatively reflected when starting at random, as shown with rule 276 in Fig. 11, which is also activated (even at α = 0.3) when starting at random. The effect of average memory (α- and integer-based models, unlimited and limited trailing memory, even τ = 2) and that of the mode of the last three states has been studied in [21]. When working with three or more states, averaging has the inherent tendency to bias the featured state toward the mean value (1 in the k = 3 scenario), which explains the red-shift in the previous figures. This led us to focus in what follows on a much fairer memory mechanism: the mode. Mode memory allows for the manipulation of pure symbols, avoiding any computing/arithmetics. In excitable CA, the three states are featured: resting 0, excited 1 and refractory 2. State transitions from excited to refractory and from refractory to resting are unconditional; they take place independently of the cell's neighborhood state: σ_i^(T) = 1 → σ_i^(T+1) = 2, σ_i^(T) = 2 → σ_i^(T+1) = 0. In [15] the excitation rule adopts a Pavlovian phenomenon of defensive inhibition: when the strength of the stimulus applied exceeds a certain limit the
Cellular Automata with Memory, Figure 11 The k = 3, R = 276 rule starting at random
system 'shuts down'; this can be naively interpreted as an in-built protection against energy loss and exhaustion. To simulate the phenomenon of defensive inhibition we adopt interval excitation rules [2], so that a resting cell becomes excited only if one or two of its neighbors are excited: σ_i^(T) = 0 → σ_i^(T+1) = 1 if #{j ∈ N_i : σ_j^(T) = 1} ∈ {1, 2} [3]. Figure 12 shows the effect of the mode-of-the-last-three-time-steps memory on the defensive-inhibition CA rule with the Moore neighborhood, starting from a simple configuration. At T = 3 the outer excited cells of the actual pattern are featured not as excited but as resting cells (twice resting versus once excited), and the series of evolving patterns with memory diverges from the ahistoric evolution at T = 4, becoming less expanded. Again, memory tends to restrain the evolution. The effect of memory on the beehive rule, a totalistic two-dimensional CA rule with three states implemented in the hexagonal tessellation [57], has been explored in [13].
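The defensive-inhibition transition just described can be sketched cell-wise (our own helper; the count of excited neighbors is taken as given):

```python
def excitable_update(state, excited_neighbours):
    # 0 = resting, 1 = excited, 2 = refractory;
    # transitions 1 -> 2 and 2 -> 0 are unconditional
    if state == 1:
        return 2
    if state == 2:
        return 0
    # defensive inhibition: a resting cell fires only for 1 or 2 excited neighbours
    return 1 if excited_neighbours in (1, 2) else 0
```

A resting cell surrounded by three or more excited neighbors stays resting, which is precisely the 'shut down' against over-stimulation.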
Cellular Automata with Memory, Figure 12 Effect of mode memory on the defensive inhibition CA rule
Reversible CA

The second-order in time implementation based on subtraction modulo the number of states (noted ⊖), σ_i^(T+1) = φ(σ_j^(T) ∈ N_i) ⊖ σ_i^(T−1), readily reverses as σ_i^(T−1) = φ(σ_j^(T) ∈ N_i) ⊖ σ_i^(T+1). To preserve the reversible feature, memory has to be endowed only in the pivotal component of the rule transition, so: σ_i^(T−1) = φ(s_j^(T) ∈ N_i) ⊖ σ_i^(T+1). For reversing from T it is necessary to know not only σ_i^(T) and σ_i^(T+1) but also ω_i^(T), to be compared with Ω(T) in order to obtain:

s_i^(T) = 0 if 2ω_i^(T) < Ω(T) ,
s_i^(T) = σ_i^(T+1) if 2ω_i^(T) = Ω(T) ,
s_i^(T) = 1 if 2ω_i^(T) > Ω(T) .

Then, to progress in the reversing and obtain s_i^(T−1) = round(ω_i^(T−1)/Ω(T−1)), it is necessary to calculate ω_i^(T−1) = (ω_i^(T) − σ_i^(T))/α. But in order to avoid dividing by the memory factor (recall that operations with real numbers are not exact in computer arithmetic), it is preferable to work with ρ_i^(T−1) = ω_i^(T) − σ_i^(T), and to compare these values to Γ(T−1) = Σ_{t=1}^{T−1} α^{T−t}. This leads to:

s_i^(T−1) = 0 if 2ρ_i^(T−1) < Γ(T−1) ,
s_i^(T−1) = σ_i^(T) if 2ρ_i^(T−1) = Γ(T−1) ,
s_i^(T−1) = 1 if 2ρ_i^(T−1) > Γ(T−1) .
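The reversibility of the underlying second-order scheme (shown here without memory, for brevity, and with a toy one-dimensional parity φ of our own choosing) is easy to verify: modulo 2, subtraction is XOR, so the same formula runs the trajectory in both directions:

```python
def phi_parity(cells):
    # phi: parity (XOR) of the two nearest neighbours, periodic boundaries
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]

def forward(prev, cur):
    # sigma^(T+1) = phi(sigma^(T)) (-) sigma^(T-1); modulo 2 the subtraction is XOR
    return [p ^ q for p, q in zip(phi_parity(cur), prev)]

# the identical formula reverses the run: feed it (sigma^(T+1), sigma^(T))
backward = forward
```

Evolving any pair of configurations forward and then applying backward recovers the earlier configuration exactly, which is the unambiguous backtracking property mentioned below.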
Cellular Automata with Memory, Figure 13 The reversible parity rule with memory
In general: ρ_i^(T−τ) = ρ_i^(T−τ+1) − α^{τ−1} σ_i^(T−τ+1), Γ(T−τ) = Γ(T−τ+1) − α^{τ−1}. Figure 13 shows the effect of memory on the reversible parity rule starting from a single site live cell, i.e. the scenario of Figs. 2 and 3 with the reversible qualification. As expected, the simulations corresponding to α = 0.6 or below show the ahistoric pattern at T = 4, whereas memory leads to a different pattern from α = 0.7 on, and the patterns at T = 5 for α = 0.54 and α = 0.55 differ. Again, in the reversible formulation with memory, (i) the configuration of the patterns is notably altered, (ii) the speed of diffusion of the affected area is notably reduced, even by minimal
memory (α = 0.501), and (iii) high levels of memory tend to freeze the dynamics from the early time-steps. We have studied the effect of memory in the reversible formulation of CA in many scenarios, e.g., totalistic k = r = 2 rules [7], or rules with three states [21]. Reversible systems are of interest since they preserve information and energy and allow unambiguous backtracking. They are studied in computer science in order to design computers which would consume less energy [51]. Reversibility is also an important issue in fundamental physics [31,41,52,53]. Gerard 't Hooft, in a speculative paper [34], suggests that a suitably defined deterministic, local reversible CA might provide a viable formalism for constructing field theories on a Planck scale. Svozil [50] also asks for changes in the underlying assumptions of current field theories in order to make their discretization appear more CA-like. Applications of reversible CA with memory in cryptography are being scrutinized [30,42].

Cellular Automata with Memory, Figure 14 The parity rule with four inputs: effect of memory and random rewiring. Distance between two consecutive patterns in the ahistoric model (red) and memory models of α levels 0.6, 0.7, 0.8, 0.9 (dotted) and 1.0 (blue)

Heterogeneous CA

CA on networks have arbitrary connections but, as in proper CA, the transition rule is identical for all cells. This generalization of the CA paradigm addresses the intermediate class between CA and Boolean networks (BN, considered in the following section), in which rules may differ at each site. In networks two topological extremes exist, random and regular networks, which display totally opposite geometric properties. Random networks have low clustering coefficients and a short average path length between nodes, commonly known as the small-world property. Regular graphs, on the other hand, have a large average path length between nodes and high clustering coefficients. In an attempt to build a network with the characteristics observed in real networks, a large clustering coefficient and the small-world property, Watts and Strogatz (WS, [54]) proposed a model built by randomly rewiring a regular lattice. Thus, the WS model interpolates between regular and random networks, taking a single new parameter, the random rewiring degree, i.e. the probability that any node redirects a connection, randomly, to any other. The WS model displays the high clustering coefficient common to regular lattices as well as the small-world property (which has been related to faster flow in information transmission). The long-range links introduced by the randomization procedure dramatically reduce the diameter of the network, even when very few links are rewired. Figure 14 shows the effect of memory and topology on the parity rule with four inputs in a lattice of size 65 × 65 with periodic boundary conditions, starting at random. As expected, memory depletes the Hamming distance between two consecutive patterns in relation to the ahistoric model, particularly when the degree of rewiring is high. With full memory, quasi-oscillators tend to appear. As a rule, the higher the curve the lower the memory factor α, but in the particular case of a regular lattice (and of a lattice with 10% rewiring), the evolution of the distance in the full memory model turns out rather atypical, as it is maintained over some memory models with lower α parameters. Figure 15 shows the evolution of the damage spread when reversing the initial state of the 3 × 3 central cells in the initial scenario of Fig. 14. The fraction of cells with the state reversed is plotted in the regular and 10%-rewiring scenarios. The plots corresponding to higher
Cellular Automata with Memory, Figure 15 Damage up to T = 100 in the parity CA of Fig. 14
Cellular Automata with Memory, Figure 16 Relative Hamming distance between two consecutive patterns. Boolean network with totalistic K = 4 rules in the scenario of Fig. 14
rates of rewiring are very similar to that of the 10% case in Fig. 15. Damage spreads fast very soon as rewiring is present, even to a small extent.

Boolean Networks

In Boolean networks (BN, [38]), in contrast with canonical CA, cells may have arbitrary connections and rules may differ at each site. We work with totalistic rules: σ_i^(T+1) = φ_i(Σ_{j∈N_i} s_j^(T)). The main features of the effect of memory in Fig. 14 are preserved in Fig. 16: (i) the ordering of the historic networks tends to be stronger with a high memory factor, (ii) with full memory, quasi-oscillators appear (it seems that full memory tends to induce oscillation), and (iii) in the particular case of the regular graph (and to a lesser extent in the networks with low rewiring), the evolution of the full memory model turns out rather atypical, as it is maintained over some of the memory models with lower α parameters. The relative Hamming distance between the ahistoric patterns and those of historic rewiring tends to be fairly constant around 0.3 after a very short initial transition period. Figure 17 shows the evolution of the damage when reversing the initial state of the 3 × 3 central cells. As a rule, in every frame, corresponding to increasing rates of random rewiring, the higher the curve the lower the memory factor α. The damage-vanishing effect induced by memory does appear in the regular scenario of Fig. 17, but only full memory controls the damage spreading when the rewiring degree is not high; the dynamics with the remaining α levels tend to the damage propagation that characterizes the ahistoric model. Thus, with up to 10% of connections rewired, full memory notably controls the
Cellular Automata with Memory, Figure 17 Evolution of the damage when reversing the initial state of the 3 × 3 central cells in the scenario of Fig. 16
spreading, but this control capacity tends to disappear with a higher percentage of rewired connections. In fact, with rewiring of 50% or higher, not even full memory seems very effective in altering the final rate of damage, which tends to reach a plateau around 30% regardless of scenario, a level notably coincident with the percolation threshold for site percolation in the simple cubic lattice and with the critical point of the nearest-neighbor Kauffman model on the square lattice [49]: 0.31.
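The rewiring degree used throughout Figs. 14-17 can be realized with a Watts-Strogatz-style procedure; a minimal sketch of our own (degree, probability and seed are illustrative parameters):

```python
import random

def ws_rewire(n, k, p, seed=0):
    # Watts-Strogatz-style rewiring of a ring lattice with n nodes of degree k
    # (k/2 neighbours on each side); each edge is redirected with probability p
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if rng.random() < p:
                # redirect the far end to a random node,
                # avoiding self-loops and duplicate links
                j = rng.randrange(n)
                while j == i or (i, j) in edges or (j, i) in edges:
                    j = rng.randrange(n)
            edges.add((i, j))
    return edges
```

With p = 0 the regular ring lattice is returned unchanged; with p = 1 every link is redirected, giving the random-network extreme; intermediate p interpolates between the two, as in the WS model described above.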
Structurally Dynamic CA

Structurally dynamic cellular automata (SDCA) were suggested by Ilachinski and Halpern [36]. The essential new feature of this model is that the connections between cells are allowed to change according to rules similar in nature to the state transition rules associated with conventional CA. This means that, under certain conditions specified by the link transition rules, links between cells may be created and destroyed; the neighborhood of each cell is dynamic, so the state and link configurations of an SDCA are both dynamic and continually interacting. If cells are numbered 1 to N, their connectivity is specified by an $N \times N$ connectivity matrix in which $\lambda_{ij} = 1$ if cells i and j are connected, and 0 otherwise. So, now: $N_i^{(T)} = \{j \,/\, \lambda_{ij}^{(T)} = 1\}$ and $\sigma_i^{(T+1)} = \phi\bigl(\sigma_j^{(T)} \in N_i^{(T)}\bigr)$. The geodesic distance between two cells i and j, $\delta_{ij}$, is defined as the number of links in the shortest path between i and j. Thus, i and j are direct neighbors if $\delta_{ij} = 1$, and are next-nearest neighbors if $\delta_{ij} = 2$, so $NN_i^{(T)} = \{j \,/\, \delta_{ij}^{(T)} = 2\}$. There are two types of link transition functions in an SDCA: couplers and decouplers; the former add new links, the latter remove them. The coupler and decoupler set determines the link transition rule: $\lambda_{ij}^{(T+1)} = \psi\bigl(\lambda_{ij}^{(T)}, \sigma_i^{(T)}, \sigma_j^{(T)}\bigr)$. Instead of introducing the formalism of the SDCA, we deal here with just one example in which the decoupler rule removes all links connected to cells in which both values are zero ($\lambda_{ij}^{(T)} = 1 \to \lambda_{ij}^{(T+1)} = 0$ iff $\sigma_i^{(T)} + \sigma_j^{(T)} = 0$) and the coupler rule adds links between all next-nearest neighbor sites in which both values are one ($\lambda_{ij}^{(T)} = 0 \to \lambda_{ij}^{(T+1)} = 1$ iff $\sigma_i^{(T)} + \sigma_j^{(T)} = 2$ and $j \in NN_i^{(T)}$). The SDCA with these transition rules for connections, together with the parity rule for mass states, is implemented in Fig. 18, in which the initial Euclidean lattice with four neighbors (so the generic cell has eight next-nearest neighbors) is seeded with a 3×3 block of ones. After the first iteration, most of the lattice structure has decayed as an effect of the decoupler rule, so that the active value cells and links are confined to a small region. After T = 6, the link and value structures become periodic, with a periodicity of two. Memory can be embedded in links in a similar manner as in state values, so the link between any two cells is featured by a mapping of its previous link values: $l_{ij}^{(T)} = l\bigl(\lambda_{ij}^{(1)}, \ldots, \lambda_{ij}^{(T)}\bigr)$. The distance between two cells in the historic model, $d_{ij}$, is defined in terms of l instead of $\lambda$ values, so that i and j are direct neighbors if $d_{ij} = 1$, and are next-nearest neighbors if $d_{ij} = 2$. Now: $N_i^{(T)} = \{j \,/\, d_{ij}^{(T)} = 1\}$, and $NN_i^{(T)} = \{j \,/\, d_{ij}^{(T)} = 2\}$. Generalizing the approach
Cellular Automata with Memory, Figure 18 The SDCA described in the text, up to T = 6
Cellular Automata with Memory, Figure 19 The SD cellular automaton introduced in the text with weighted memory of factor α. Evolution from T = 4 up to T = 9, starting as in Fig. 18
to embedded memory applied to states, the unchanged transition rules ($\phi$ and $\psi$) operate on the featured link and cell state values: $\sigma_i^{(T+1)} = \phi\bigl(s_j^{(T)} \in N_i^{(T)}\bigr)$; $\lambda_{ij}^{(T+1)} = \psi\bigl(l_{ij}^{(T)}, s_i^{(T)}, s_j^{(T)}\bigr)$. Figure 19 shows the effect of α-memory on the cellular automaton introduced above, starting as in Fig. 18. The effect of memory on SDCA in the hexagonal and triangular tessellations is scrutinized in [11]. A plausible wiring dynamics when dealing with excitable CA is that in which the decoupler rule removes all links connected to cells in which both values are at the refractory state ($\lambda_{ij}^{(T)} = 1 \to \lambda_{ij}^{(T+1)} = 0$ iff $\sigma_i^{(T)} = \sigma_j^{(T)} = 2$) and the coupler rule adds links between all next-nearest neighbor sites in which both values are excited ($\lambda_{ij}^{(T)} = 0 \to \lambda_{ij}^{(T+1)} = 1$ iff $\sigma_i^{(T)} = \sigma_j^{(T)} = 1$ and $j \in NN_i^{(T)}$). In the SDCA of Fig. 20, the transition rule for cell states is that of the generalized defensive inhibition rule: a resting cell becomes excited if the ratio of excited cells among its connected neighbors to its total number of connected neighbors lies in the interval [1/8, 2/8]. The initial scenario of Fig. 20 is that of Fig. 12 with the wiring network revealed, that of a Euclidean lattice with eight neighbors, in which the generic cell has 16 next-nearest neighbors. No decoupling occurs at the first iteration in Fig. 20, but the excited cells generate new connections, most of which are lost, together with some of the initial ones, at T = 3. The excited cells at T = 3 generate a crown of new connections at T = 4. Figure 21 shows the ahistoric and mode memory patterns at T = 20. The figure makes apparent the preserving effect of memory. The Fredkin reversible construction is feasible in the SDCA scenario by extending the operation also to links: $\lambda_{ij}^{(T+1)} = \psi\bigl(\lambda_{ij}^{(T)}, \sigma_i^{(T)}, \sigma_j^{(T)}\bigr) \ominus \lambda_{ij}^{(T-1)}$. These automata may be endowed with memory as: $\sigma_i^{(T+1)} = \phi\bigl(s_j^{(T)} \in N_i^{(T)}\bigr) \ominus \sigma_i^{(T-1)}$; $\lambda_{ij}^{(T+1)} = \psi\bigl(l_{ij}^{(T)}, s_i^{(T)}, s_j^{(T)}\bigr) \ominus \lambda_{ij}^{(T-1)}$ [12]. The SDCA seems particularly appropriate for modeling human brain function, in which the relevant role of memory is apparent: updating links between cells imitates the variation of synaptic connections between the neurons represented by the cells. Models similar to SDCA have been adopted to build a dynamical network approach to quantum spacetime physics [45,46]. Reversibility is an important issue at such a fundamental physics level. Technical applications of SDCA may also be traced [47].
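The decoupler/coupler example above can be sketched with a connectivity matrix. The following minimal illustration (function and variable names are ours, not from the text) applies the parity rule to states together with the two link rules: drop links joining two quiescent cells, add links between excited next-nearest neighbors.

```python
import numpy as np

def sdca_step(sigma, lam):
    """One step of the example SDCA: parity rule for states, decoupler
    removing links between two quiescent cells, coupler adding links
    between excited next-nearest neighbors."""
    n = len(sigma)
    # Next-nearest neighbors: reachable in two links but not directly linked.
    two_step = (lam @ lam > 0) & (lam == 0) & ~np.eye(n, dtype=bool)
    # Parity (XOR) rule on the current neighborhood.
    new_sigma = (lam @ sigma) % 2
    # Decoupler: drop links whose two end cells are both 0.
    drop = lam.astype(bool) & (sigma[:, None] + sigma[None, :] == 0)
    # Coupler: connect next-nearest neighbors whose two end cells are both 1.
    add = two_step & (sigma[:, None] + sigma[None, :] == 2)
    new_lam = (lam.astype(bool) & ~drop) | add
    return new_sigma, new_lam.astype(int)
```

On a 4-cycle with states (1, 1, 0, 0), one step turns all states on and removes only the link joining the two quiescent cells.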
Cellular Automata with Memory, Figure 20 The k = 3 SD cellular automaton described in the text, up to T = 4
Cellular Automata with Memory, Figure 21 The SD cellular automaton starting as in Fig. 20 at T = 20, with no memory (left) and mode memory in both cell states and links
Anyway, besides their potential applications, SDCA with memory have an aesthetic and mathematical interest of their own [1,35]. Nevertheless, it seems plausible that further study of SDCA (and of Lattice Gas Automata with dynamical geometry [40]) with memory will turn out to be profitable.

Memory in Other Discrete-Time Contexts

Continuous-Valued CA

The mechanism of implementation of memory adopted here, keeping the transition rule unaltered but applying it to a function of previous states, can be adopted in any spatialized dynamical system. Thus, historic memory can be embedded in:

Continuous-valued CA (or coupled map lattices, in which the state variable ranges in R and the transition rule $\varphi$ is a continuous function [37]), just by considering m instead of $\sigma$ in the application of the updating rule: $\sigma_i^{(T+1)} = \varphi\bigl(m_j^{(T)} \in N_i\bigr)$. An elementary CA of this kind with memory would be [20]: $\sigma_i^{(T+1)} = \frac{1}{3}\bigl(m_{i-1}^{(T)} + m_i^{(T)} + m_{i+1}^{(T)}\bigr)$.

Fuzzy CA, a sort of continuous CA with states ranging in the real interval [0,1]. An illustration of the effect of memory in fuzzy CA is given in [17]. The illustration operates on the elementary rule 90: $\sigma_i^{(T+1)} = \bigl(\sigma_{i-1}^{(T)} \wedge \neg\sigma_{i+1}^{(T)}\bigr) \vee \bigl(\neg\sigma_{i-1}^{(T)} \wedge \sigma_{i+1}^{(T)}\bigr)$, which after fuzzification ($a \vee b \to \min(1, a+b)$; $a \wedge b \to ab$; $\neg a \to 1-a$) yields $\sigma_i^{(T+1)} = \sigma_{i-1}^{(T)} + \sigma_{i+1}^{(T)} - 2\sigma_{i-1}^{(T)}\sigma_{i+1}^{(T)}$; memory is thus incorporated as: $\sigma_i^{(T+1)} = m_{i-1}^{(T)} + m_{i+1}^{(T)} - 2m_{i-1}^{(T)}m_{i+1}^{(T)}$.

Quantum CA, such, for example, as the simple 1D quantum CA models introduced in [32]: $\psi_j^{(T+1)} = \frac{1}{N^{1/2}}\bigl(i\delta\,\psi_{j-1}^{(T)} + \psi_j^{(T)} + i\delta\,\psi_{j+1}^{(T)}\bigr)$,
which would become with memory [20]: $\psi_j^{(T+1)} = \frac{1}{N^{1/2}}\bigl(i\delta\,m_{j-1}^{(T)} + m_j^{(T)} + i\delta\,m_{j+1}^{(T)}\bigr)$.
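The common thread in all these variants is that the rule itself is left unchanged and is simply fed the weighted mean m of each cell's history. A minimal sketch for the fuzzy rule 90 case (names are ours; the weighted mean uses the geometrically discounted average with factor α, and α = 0 recovers the ahistoric rule):

```python
import numpy as np

def fuzzy_rule90_memory(sigma0, alpha, t_max):
    """Fuzzy rule 90 fed with the alpha-weighted mean m of each cell's
    history instead of its last state.  The running weighted sums give
    m^(T) = (sum_t alpha^(T-t) sigma^(t)) / (sum_t alpha^(T-t))."""
    sigma = sigma0.astype(float)
    num, den = sigma.copy(), 1.0            # running weighted sums
    for _ in range(t_max):
        m = num / den                       # weighted mean of the history
        ml, mr = np.roll(m, 1), np.roll(m, -1)
        sigma = ml + mr - 2.0 * ml * mr     # fuzzified rule 90 on trait states
        num = alpha * num + sigma           # update the weighted history
        den = alpha * den + 1.0
    return sigma
```

With α = 0 and a binary single-site seed this reproduces the ordinary rule 90 step; for 0 < α ≤ 1 the states stay within [0, 1] while the history damps the dynamics.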
Spatial Prisoner's Dilemma

The Prisoner's Dilemma (PD) is a game played by two players (A and B), who may choose either to cooperate (C, or 1) or to defect (D, or 0). Mutual cooperators each score the reward R; mutual defectors score the punishment P; D scores the temptation T against C, who scores S (the sucker's payoff) in such an encounter. Provided that T > R > P > S, mutual defection is the only equilibrium strategy pair. Thus, in a single round both players are penalized instead of both being rewarded, but cooperation may be rewarded in an iterated (or spatial) formulation. The game is simplified (while preserving its essentials) if P = S = 0. Choosing R = 1, the model has only one parameter: the temptation T = b. In the spatial version of the PD, each player occupies a site (i, j) in a 2D lattice. In each generation the payoff of a given individual ($p_{i,j}^{(T)}$) is the sum over all interactions with the eight nearest neighbors and with its own site. In the next generation, an individual cell is assigned the decision ($d_{i,j}^{(T)}$) that received the highest payoff among all the cells of its Moore neighborhood. In case of a tie, the cell retains its choice. The spatialized PD (SPD for short) has proved to be a promising tool to explain how cooperation can hold out against the ever-present threat of exploitation [43], a task that presents problems in the classic Darwinian struggle-for-survival framework. When dealing with the SPD, memory can be embedded not only in choices but also in rewards. Thus, in the historic model dealt with here, at T: (i) the payoffs coming from previous rounds are accumulated ($\pi_{i,j}^{(T)}$), and (ii) players are featured by a summary of past decisions ($\delta_{i,j}^{(T)}$). Again, in each round or generation, a given cell plays with each of its eight neighbors and with itself, the decision $\delta$ of the cell of the neighborhood with the highest $\pi$ being adopted.
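One ahistoric SPD generation can be sketched as follows (our own minimal implementation with periodic boundaries; the text only specifies that a cell retains its own choice on a tie, so ties among equally scoring neighbors are resolved here by scan order):

```python
import numpy as np

def spd_step(d, b):
    """One generation of the binary spatial Prisoner's Dilemma with R=1,
    P=S=0, T=b: each cell plays its 8 neighbors and itself, then copies
    the choice of the highest-scoring cell in its Moore neighborhood."""
    shifts = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    # Cooperating partners in the 9-cell neighborhood (self included):
    # a cooperator earns 1 per cooperating partner, a defector earns b.
    coop_partners = sum(np.roll(np.roll(d, di, 0), dj, 1) for di, dj in shifts)
    payoff = np.where(d == 1, coop_partners, b * coop_partners)
    # Adopt the decision of the best-scoring cell; start with one's own,
    # so a tie keeps the current choice.
    best_pay = payoff.copy()
    best_dec = d.copy()
    for di, dj in shifts:
        p = np.roll(np.roll(payoff, di, 0), dj, 1)
        c = np.roll(np.roll(d, di, 0), dj, 1)
        better = p > best_pay
        best_pay = np.where(better, p, best_pay)
        best_dec = np.where(better, c, best_dec)
    return best_dec
```

Starting from a single defector with b = 1.85, the defector scores 8b = 14.8, its eight neighbors score 8, and all other cells score 9; one generation therefore yields the 3×3 block of defectors of Table 1.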
This approach to modeling memory has been rather neglected, the usual one being that of designing strategies that specify the choice for every possible outcome in the sequence of historic choices recalled [33,39]. Table 1 shows the initial scenario starting from a single defector if 8b > 9, i.e., b > 1.125, which means that the neighbors of the initial defector become defectors at T = 2. Nowak and May paid particular attention in their seminal papers to b = 1.85, a high but not excessive temptation value which leads to complex dynamics. After T = 2, defection can advance to a 5×5 square or be
Cellular Automata with Memory, Table 1 Choices at T = 1 and T = 2, and accumulated payoffs after T = 1 and T = 2, starting from a single defector in the SPD; b > 9/8

d(1) = δ(1):
1 1 1 1 1
1 1 1 1 1
1 1 0 1 1
1 1 1 1 1
1 1 1 1 1

p(1) = π(1):
9 9 9  9 9
9 8 8  8 9
9 8 8b 8 9
9 8 8  8 9
9 9 9  9 9

π(2) = αp(1) + p(2):
9α+9 9α+9 9α+9  9α+9  9α+9  9α+9 9α+9
9α+9 9α+8 9α+7  9α+6  9α+7  9α+8 9α+9
9α+9 9α+7 8α+5b 8α+3b 8α+5b 9α+7 9α+9
9α+9 9α+6 8α+3b 8bα   8α+3b 9α+6 9α+9
9α+9 9α+7 8α+5b 8α+3b 8α+5b 9α+7 9α+9
9α+9 9α+8 9α+7  9α+6  9α+7  9α+8 9α+9
9α+9 9α+9 9α+9  9α+9  9α+9  9α+9 9α+9

d(2) = δ(2):
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 0 0 0 1 1
1 1 0 0 0 1 1
1 1 0 0 0 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
restrained as a 3×3 square, depending on the comparison of 8α + 5·1.85 (the maximum accumulated payoff among the recent defectors) with 9α + 9 (that of the non-affected players). As 8α + 5·1.85 = 9α + 9 → α = 0.25, i.e., if α > 0.25, defection remains confined to a 3×3 square at T = 3. Here we see the paradigmatic effect of memory: it tends to prevent the spread of defection. If memory is limited to the last three iterations: $\pi_{i,j}^{(T)} = \alpha^2 p_{i,j}^{(T-2)} + \alpha p_{i,j}^{(T-1)} + p_{i,j}^{(T)}$; $m_{i,j}^{(T)} = \bigl(\alpha^2 d_{i,j}^{(T-2)} + \alpha d_{i,j}^{(T-1)} + d_{i,j}^{(T)}\bigr)/(\alpha^2 + \alpha + 1)$; $\delta_{i,j}^{(T)} = \mathrm{round}\bigl(m_{i,j}^{(T)}\bigr)$, with assignments at T = 2: $\pi_{i,j}^{(2)} = \alpha p_{i,j}^{(1)} + p_{i,j}^{(2)}$; $\delta_{i,j}^{(2)} = d_{i,j}^{(2)}$. Memory has a dramatic restrictive effect on the advance of defection, as shown in Fig. 22. This figure shows the frequency of cooperators (f) starting from a single defector and from a random configuration of defectors in a lattice of size 400×400 with periodic boundary conditions when b = 1.85. When starting from a single defector, f at time step T is computed as the frequency of cooperators within the square of size $(2(T-1)+1)^2$ centered on the initial D site. The ahistoric plot reveals the convergence of f to 0.318 (which seems to be the same value regardless of the initial conditions [43]). Starting from a single defector (a), the model with small memory (α = 0.1) seems to reach a similar f value, but sooner and in a smoother way. The plot corresponding to α = 0.2 still shows an early decay in f that leads it to about 0.6, but higher memory factor values lead f close to or over 0.9 very soon. Starting at random (b), the curves corresponding to
Cellular Automata with Memory, Figure 22 Frequency of cooperators (f) with memory of the last three iterations: a starting from a single defector, b starting at random (f(1) = 0.5). The red curves correspond to the ahistoric model, the blue ones to the full memory model, and the remaining curves to values of α from 0.1 to 0.9 at 0.1 intervals, in which, as a rule, the higher the α the higher the f for any given T
0.1 ≤ α ≤ 0.6 (thus with no memory of choices) mimic the ahistoric curve but with higher f, whereas for α ≥ 0.7 (memory of choices also operative) the frequency of cooperators grows monotonically to reach almost full cooperation: D persists only as scattered, unconnected small oscillators (D-blinkers), as shown in Fig. 23. Similar results are found for any temptation value in the parameter region 0.8 < b < 2.0, in which spatial chaos is characteristic in the ahistoric model. It is then concluded that short-type memory supports cooperation. As a natural extension of the described binary model, the 0–1 assumption underlying the model can be relaxed by allowing for degrees of cooperation in a continuous-valued scenario. Denoting by x the degree of cooperation of player A and by y the degree of cooperation of player B, a consistent way to specify the payoff for values of x and y other than zero or one is simply to interpolate between the extreme payoffs of the binary case. Thus, the payoff that player A receives is:

$$G_A(x, y) = (x,\; 1-x) \begin{pmatrix} R & S \\ T & P \end{pmatrix} \begin{pmatrix} y \\ 1-y \end{pmatrix}.$$

In the continuous-valued historic formulation it is $\delta \equiv m$, including $\delta_{i,j}^{(2)} = \bigl(\alpha d_{i,j}^{(1)} + d_{i,j}^{(2)}\bigr)/(\alpha + 1)$. Table 2 illustrates the initial scenario starting from a single (full) defector. Unlike in the binary model, in which the initial defector never becomes a cooperator, the initial defector cooperates with degree α/(1+α) at T = 3: its neighbors which received the highest accumulated payoff (those in the corners, with $\pi^{(2)} = 8\alpha + 5b > 8b\alpha$) achieved this mean degree of cooperation after T = 2. Memory dramatically constrains the advance of defection in a smooth way, even for the low level α = 0.1. The effect appears much more homogeneous compared to the binary model, with no special case for high values of α, as memory on decisions is always operative in the continuous-valued model [24]. The effect of unlimited trailing memory on the SPD has been studied in [5,6,7,8,9,10].

Cellular Automata with Memory, Figure 23 Patterns at T = 200 starting at random in the scenario of Fig. 22b
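The three-round summaries used above reduce to a couple of lines of code. A sketch (ours), with Python's built-in `round` standing in for the rounding in the text (note that Python rounds halves to even; the half-way case m = 0.5 is not specified in the text):

```python
def memory_summary(payoffs, decisions, alpha):
    """Three-round memory, oldest entries first:
    pi    = a^2 p(T-2) + a p(T-1) + p(T)            (accumulated payoff)
    m     = (a^2 d(T-2) + a d(T-1) + d(T)) / (a^2 + a + 1)
    delta = round(m)                                 (summary of choices)"""
    p2, p1, p0 = payoffs          # p(T-2), p(T-1), p(T)
    d2, d1, d0 = decisions        # d(T-2), d(T-1), d(T)
    pi = alpha**2 * p2 + alpha * p1 + p0
    m = (alpha**2 * d2 + alpha * d1 + d0) / (alpha**2 + alpha + 1)
    return pi, round(m)
```

For instance, with α = 1 a player who cooperated twice and then defected still carries the summary choice δ = round(2/3) = 1.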
Cellular Automata with Memory, Table 2 Weighted mean degrees of cooperation after T = 2 and degrees of cooperation at T = 3, starting with a single defector in the continuous-valued SPD with b = 1.85
δ(2):
1 1        1        1        1
1 α/(1+α)  α/(1+α)  α/(1+α)  1
1 α/(1+α)  0        α/(1+α)  1
1 α/(1+α)  α/(1+α)  α/(1+α)  1
1 1        1        1        1

d(3) (α < 0.25):
1 1        1        1        1        1        1
1 α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  1
1 α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  1
1 α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  1
1 α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  1
1 α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  α/(1+α)  1
1 1        1        1        1        1        1

d(3) (α > 0.25):
1 1 1        1        1        1 1
1 1 1        1        1        1 1
1 1 α/(1+α)  α/(1+α)  α/(1+α)  1 1
1 1 α/(1+α)  α/(1+α)  α/(1+α)  1 1
1 1 α/(1+α)  α/(1+α)  α/(1+α)  1 1
1 1 1        1        1        1 1
1 1 1        1        1        1 1
Discrete-Time Dynamical Systems

Memory can be embedded in any model in which time plays a dynamical role. Thus, Markov chains $p'_{T+1} = p'_T M$ become, with memory, $p'_{T+1} = \delta'_T M$, with $\delta_T$ being a weighted mean of the probability distributions up to T: $\delta_T = m(p_1, \ldots, p_T)$. In this setting, even a minimal incorporation of memory notably alters the evolution of p [23]. Last but not least, conventional, non-spatialized, discrete dynamical systems become, with memory, $x_{T+1} = f(m_T)$, with $m_T$ being an average of past values. As an overall rule, memory leads the dynamics towards a fixed point of the map f [4]. We will introduce an example of this in the context of the PD game, in which players follow the so-called Pavlov strategy: a Pavlov player cooperates if and only if both players opted for the same alternative in the previous move. The name Pavlov stems from the fact that this strategy embodies an almost reflex-like response to the payoff: it repeats its former move if it was rewarded by T or R, but switches behavior if it was punished by receiving only P or S. Coding cooperation as 1 and defection as 0, this strategy can be formulated in terms of the choices x of player A (Pavlov) and y of player B as: $x^{(T+1)} = 1 - |x^{(T)} - y^{(T)}|$. The Pavlov strategy has proved to be very successful in its contests with other strategies [44]. Let us give a simple example of this: suppose that player B adopts an Anti-Pavlov strategy (which cooperates to the extent Pavlov defects), with $y^{(T+1)} = 1 - |1 - x^{(T)} - y^{(T)}|$. Thus, in an iterated Pavlov versus Anti-Pavlov (PAP) contest, with $T(x, y) = \bigl(1 - |x - y|,\; 1 - |1 - x - y|\bigr)$, it is T(0,0) = T(1,1) = (1,0), T(1,0) = (0,1), and T(0,1) = (0,1), so that (0,1) turns out to be immutable. Therefore, in an iterated PAP contest, Pavlov will always defect and Anti-Pavlov will always cooperate. Relaxing the 0–1 assumption in the standard formulation of the PAP contest, degrees of cooperation can be considered in a continuous-valued scenario. Now x and y denote the degrees of cooperation of players A and B respectively, with both x and y lying in [0,1]. In this scenario, not only is (0,1) a fixed point, but also T(0.8, 0.6) = (0.8, 0.6). Computer implementation of the iterated PAP tournament turns out to fully disrupt the theoretical dynamics. The errors caused by the finite precision of computer floating-point arithmetic (a common problem in dynamical systems working modulo 1) make the final fate of every point (0,1), with no exceptions: even the theoretically fixed point (0.8, 0.6) ends up as (0,1) in the computerized implementation. A natural way to incorporate older choices in the strategies of decision is to feature players by a summary
Cellular Automata with Memory, Figure 24 Dynamics of the mean values of x (red) and y (blue) starting from any of the points of the 1×1 square
(m) of their own choices farther back in time. The PAP contest becomes in this way: $x^{(T+1)} = 1 - |m_x^{(T)} - m_y^{(T)}|$; $y^{(T+1)} = 1 - |1 - m_x^{(T)} - m_y^{(T)}|$. The simplest historic extension results from considering only the two last choices: $m\bigl(z^{(T-1)}, z^{(T)}\bigr) = \bigl(\alpha z^{(T-1)} + z^{(T)}\bigr)/(\alpha + 1)$ (z stands for both x and y) [10]. Figure 24 shows the dynamics of the mean values of x and y starting from any of the 101×101 lattice points of the 1×1 square with sides divided at 0.01 intervals. The dynamics in the ahistoric context are rather striking: immediately, at T = 2, both x and y increase from 0.5 up to approximately 0.66 (≈ 2/3), a value which remains stable up to approximately T = 100; but soon after, Pavlov cooperation plummets, with the corresponding firing of the cooperation of Anti-Pavlov: finite precision arithmetic leads every point to (0,1). With memory, Pavlov not only keeps a permanent mean degree of cooperation, but it is higher than that of Anti-Pavlov; memory tends to lead the overall dynamics to the ahistoric (theoretical) fixed point (0.8, 0.6).

Future Directions

Embedding memory in states (and in links if the wiring is also dynamic) broadens the spectrum of CA as a modeling tool in a fairly natural way that is easy to implement on a computer. It is likely that in some contexts, a transition rule with memory could match the correct behavior of the CA system of a given complex system (physical, biological, social, and so on). A major impediment in modeling with CA stems from the difficulty of harnessing their complex behavior to exhibit a particular behavior or perform a particular function. Average memory in CA tends to inhibit complexity, an inhibition that can be modulated by varying the depth of memory, but memory not of average type opens a notable new perspective in CA. This could mean a potential advantage of CA with memory over standard CA as a tool for modeling. Anyway, besides their potential applications, CA with memory (CAM) have an aesthetic and mathematical interest of their own.
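The round-off disruption of the ahistoric PAP contest described above is easy to reproduce. A sketch (ours): in exact arithmetic (0.8, 0.6) is a fixed point of the map, yet the floating-point orbit drifts away from it.

```python
def pap(x, y):
    """One round of the Pavlov vs Anti-Pavlov (PAP) map:
    x' = 1 - |x - y|,  y' = 1 - |1 - x - y|."""
    return 1.0 - abs(x - y), 1.0 - abs(1.0 - x - y)

# The fixed point (0.8, 0.6) is unstable to floating-point round-off:
# the first round already yields 0.8 - 0.6 = 0.20000000000000007, and the
# tiny error grows with each iteration until the orbit leaves the point.
x, y = 0.8, 0.6
escaped = False
for _ in range(500):
    x, y = pap(x, y)
    if abs(x - 0.8) + abs(y - 0.6) > 0.1:
        escaped = True
        break
```

The binary transitions T(0,0) = T(1,1) = (1,0) and T(1,0) = T(0,1) = (0,1) quoted in the text are reproduced exactly, since those inputs are representable without error.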
Thus, it seems plausible that further study of CA with memory will turn out to be profitable, and maybe, as a result of a further rigorous study of CAM, it will be possible to paraphrase T. Toffoli in presenting CAM as an alternative to (rather than an approximation of) integro-differential equations in modeling phenomena with memory.

Bibliography

Primary Literature

1. Adamatzky A (1994) Identification of Cellular Automata. Taylor and Francis
2. Adamatzky A (2001) Computing in Nonlinear Media and Automata Collectives. IoP Publishing, London
3. Adamatzky A, Holland O (1998) Phenomenology of excitation in 2D cellular automata and swarm systems. Chaos Solitons Fractals 9:1233–1265
4. Aicardi F, Invernizzi S (1982) Memory effects in discrete dynamical systems. Int J Bifurc Chaos 2(4):815–830
5. Alonso-Sanz R (1999) The historic Prisoner's Dilemma. Int J Bifurc Chaos 9(6):1197–1210
6. Alonso-Sanz R (2003) Reversible cellular automata with memory. Physica D 175:1–30
7. Alonso-Sanz R (2004) One-dimensional, r = 2 cellular automata with memory. Int J Bifurc Chaos 14:3217–3248
8. Alonso-Sanz R (2004) One-dimensional, r = 2 cellular automata with memory. Int J Bifurc Chaos 14:3217–3248
9. Alonso-Sanz R (2005) Phase transitions in an elementary probabilistic cellular automaton with memory. Physica A 347:383–401; Alonso-Sanz R, Martin M (2004) Elementary probabilistic cellular automata with memory in cells. In: Sloot PMA et al (eds) LNCS, vol 3305. Springer, Berlin, pp 11–20
10. Alonso-Sanz R (2005) The Pavlov versus Anti-Pavlov contest with memory. Int J Bifurc Chaos 15(10):3395–3407
11. Alonso-Sanz R (2006) A structurally dynamic cellular automaton with memory in the triangular tessellation. Complex Syst 17(1):1–15; Alonso-Sanz R, Martin M (2006) A structurally dynamic cellular automaton with memory in the hexagonal tessellation. In: El Yacoubi S, Chopard B, Bandini S (eds) LNCS, vol 4774. Springer, Berlin, pp 30–40
12. Alonso-Sanz R (2007) Reversible structurally dynamic cellular automata with memory: a simple example. J Cell Autom 2:197–201
13. Alonso-Sanz R (2006) The beehive cellular automaton with memory. J Cell Autom 1:195–211
14. Alonso-Sanz R (2007) A structurally dynamic cellular automaton with memory. Chaos Solitons Fractals 32:1285–1295
15. Alonso-Sanz R, Adamatzky A (2008) On memory and structural dynamism in excitable cellular automata with defensive inhibition. Int J Bifurc Chaos 18(2):527–539
16. Alonso-Sanz R, Cardenas JP (2007) On the effect of memory in Boolean networks with disordered dynamics: the K = 4 case. Int J Mod Phys C 18:1313–1327
17. Alonso-Sanz R, Martin M (2002) One-dimensional cellular automata with memory: patterns starting with a single site seed. Int J Bifurc Chaos 12:205–226
18. Alonso-Sanz R, Martin M (2002) Two-dimensional cellular automata with memory: patterns starting with a single site seed. Int J Mod Phys C 13:49–65
19. Alonso-Sanz R, Martin M (2003) Cellular automata with accumulative memory: legal rules starting from a single site seed. Int J Mod Phys C 14:695–719
20. Alonso-Sanz R, Martin M (2004) Elementary cellular automata with memory. Complex Syst 14:99–126
21. Alonso-Sanz R, Martin M (2004) Three-state one-dimensional cellular automata with memory. Chaos Solitons Fractals 21:809–834
22. Alonso-Sanz R, Martin M (2005) One-dimensional cellular automata with memory in cells of the most frequent recent value. Complex Syst 15:203–236
23. Alonso-Sanz R, Martin M (2006) Elementary cellular automata with elementary memory rules in cells: the case of linear rules. J Cell Autom 1:70–86
24. Alonso-Sanz R, Martin M (2006) Memory boosts cooperation. Int J Mod Phys C 17(6):841–852
25. Alonso-Sanz R, Martin MC, Martin M (2000) Discounting in the historic Prisoner's Dilemma. Int J Bifurc Chaos 10(1):87–102
26. Alonso-Sanz R, Martin MC, Martin M (2001) Historic Life. Int J Bifurc Chaos 11(6):1665–1682
27. Alonso-Sanz R, Martin MC, Martin M (2001) The effect of memory in the spatial continuous-valued Prisoner's Dilemma. Int J Bifurc Chaos 11(8):2061–2083
28. Alonso-Sanz R, Martin MC, Martin M (2001) The historic strategist. Int J Bifurc Chaos 11(4):943–966
29. Alonso-Sanz R, Martin MC, Martin M (2001) The historic-stochastic strategist. Int J Bifurc Chaos 11(7):2037–2050
30. Alvarez G, Hernandez A, Hernandez L, Martin A (2005) A secure scheme to share secret color images. Comput Phys Commun 173:9–16
31. Fredkin E (1990) Digital mechanics. An informal process based on reversible universal cellular automata. Physica D 45:254–270
32. Grössing G, Zeilinger A (1988) Structures in quantum cellular automata. Physica B 15:366
33. Hauert C, Schuster HG (1997) Effects of increasing the number of players and memory steps in the iterated Prisoner's Dilemma, a numerical approach. Proc R Soc Lond B 264:513–519
34. 't Hooft G (1988) Equivalence relations between deterministic and quantum mechanical systems. J Stat Phys 53(1/2):323–344
35. Ilachinski A (2000) Cellular Automata. World Scientific, Singapore
36. Ilachinski A, Halpern P (1987) Structurally dynamic cellular automata. Complex Syst 1:503–527
37. Kaneko K (1986) Phenomenology and characterization of coupled map lattices. In: Dynamical Systems and Singular Phenomena. World Scientific, Singapore
38. Kauffman SA (1993) The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, Oxford
39. Lindgren K, Nordahl MG (1994) Evolutionary dynamics of spatial games. Physica D 75:292–309
40. Love PJ, Boghosian BM, Meyer DA (2004) Lattice gas simulations of dynamical geometry in one dimension. Phil Trans R Soc Lond A 362:1667
41. Margolus N (1984) Physics-like models of computation. Physica D 10:81–95
42. Martin del Rey A, Pereira Mateus J, Rodriguez Sanchez G (2005) A secret sharing scheme based on cellular automata. Appl Math Comput 170(2):1356–1364
43. Nowak MA, May RM (1992) Evolutionary games and spatial chaos. Nature 359:826
44. Nowak MA, Sigmund K (1993) A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature 364:56–58
45. Requardt M (1998) Cellular networks as models for Planck-scale physics. J Phys A 31:7797; Requardt M (2006) The continuum limit of discrete geometries. arxiv.org/abs/math-ph/0507017
46. Requardt M (2006) Emergent properties in structurally dynamic disordered cellular networks. J Cell Autom 2:273
47. Rosé H, Hempel H, Schimansky-Geier L (1994) Stochastic dynamics of catalytic CO oxidation on Pt(100). Physica A 206:421–440
48. Sanchez JR, Alonso-Sanz R (2004) Multifractal properties of R90 cellular automaton with memory. Int J Mod Phys C 15:1461
49. Stauffer D, Aharony A (1994) Introduction to Percolation Theory. CRC Press, London
50. Svozil K (1986) Are quantum fields cellular automata? Phys Lett A 119(41):153–156
51. Toffoli T, Margolus N (1987) Cellular Automata Machines. MIT Press, Massachusetts
52. Toffoli T, Margolus N (1990) Invertible cellular automata: a review. Physica D 45:229–253
53. Vichniac G (1984) Simulating physics with cellular automata. Physica D 10:96–115
54. Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature 393:440–442
55. Wolf-Gladrow DA (2000) Lattice-Gas Cellular Automata and Lattice Boltzmann Models. Springer, Berlin
56. Wolfram S (1984) Universality and complexity in cellular automata. Physica D 10:1–35
57. Wuensche A (2005) Glider dynamics in 3-value hexagonal cellular automata: the beehive rule. Int J Unconv Comput 1:375–398
58. Wuensche A, Lesser M (1992) The Global Dynamics of Cellular Automata. Addison-Wesley, Massachusetts
Books and Reviews

Alonso-Sanz R (2008) Cellular Automata with Memory. Old City Publishing, Philadelphia (in press)
Cellular Automata Modeling of Physical Systems
Cellular Automata Modeling of Physical Systems
BASTIEN CHOPARD
Computer Science Department, University of Geneva, Geneva, Switzerland

Article Outline

Glossary
Definition of the Subject
Introduction
Definition of a Cellular Automaton
Limitations, Advantages, Drawbacks, and Extensions
Applications
Future Directions
Bibliography

Cellular automata offer a powerful modeling framework for describing and studying physical systems composed of interacting components. The potential of this approach is demonstrated through applications taken from various fields of physics, such as reaction-diffusion systems, pattern formation phenomena, fluid flows, and road traffic models.

Glossary

BGK models Lattice Boltzmann models in which the collision term $\Omega$ is expressed as a deviation from a given local equilibrium distribution $f^{(0)}$, namely $\Omega = (f^{(0)} - f)/\tau$, where f is the unknown particle distribution and $\tau$ a relaxation time (a parameter of the model). BGK stands for Bhatnagar, Gross, and Krook, who first considered such a collision term, though not specifically in the context of lattice systems.
CA Abbreviation for cellular automata or cellular automaton.
Cell The elementary spatial component of a CA. The cell is characterized by a state whose value evolves in time according to the CA rule.
Cellular automaton System composed of adjacent cells or sites (usually organized as a regular lattice) which evolves in discrete time steps. Each cell is characterized by an internal state whose value belongs to a finite set. The updating of these states is made in parallel according to a local rule involving only a neighborhood of each cell.
Conservation law A property of a physical system in which some quantity (such as mass, momentum, or energy) is locally conserved during the time evolution. These conservation laws should be included in the microdynamics of a CA model because they are essential ingredients governing the macroscopic behavior of any physical system.
Collision The process by which the particles of an LGA change their direction of motion.
Continuity equation An equation of the form $\partial_t \rho + \operatorname{div} \rho u = 0$ expressing the mass (or particle number) conservation law. The quantity $\rho$ is the local density of particles and u the local velocity field.
Critical phenomena The phenomena which occur in the vicinity of a continuous phase transition, characterized by a very long correlation length.
Diffusion A physical process described by the equation $\partial_t \rho = D \nabla^2 \rho$, where $\rho$ is the density of a diffusing substance. Microscopically, diffusion can be viewed as a random motion of particles.
DLA Abbreviation of Diffusion Limited Aggregation, a model of a physical growth process in which diffusing particles stick to an existing cluster when they hit it. Initially, the cluster is reduced to a single seed particle and grows as more and more particles arrive. A DLA cluster is a fractal object whose dimension is typically 1.72 if the experiment is conducted in a two-dimensional space.
Dynamical system A system of equations (differential equations or discretized equations) modeling the dynamical behavior of a physical system.
Equilibrium states States characterizing a closed system or a system in thermal equilibrium with a heat bath.
Ergodicity Property of a system or process for which the time averages of the observables converge, in a probabilistic sense, to their ensemble averages.
Exclusion principle A restriction imposed on LGA or CA models to limit the number of particles per site and/or lattice direction. This ensures that the dynamics can be described with a cellular automata rule with a given maximum number of bits. The consequence of this exclusion principle is that the equilibrium distribution of the particle numbers follows a Fermi–Dirac-like distribution in LGA dynamics.
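The microscopic picture in the Diffusion entry can be checked numerically: for an unbiased random walk, the variance of the walkers' positions grows linearly in time, Var = 2Dt with D = 1/2 in lattice units. A sketch with illustrative parameter values of our choosing:

```python
import numpy as np

def random_walk_variance(n_walkers=100_000, t_max=100, seed=1):
    """Unbiased 1D random walk (steps of +-1): returns the position
    variance at each time step, which should grow like 2Dt = t."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=(t_max, n_walkers))
    positions = steps.cumsum(axis=0)     # walker trajectories
    return positions.var(axis=1)         # ensemble variance vs time

v = random_walk_variance()
```

After 100 steps the measured variance is close to 100, i.e., the diffusive spreading of the density predicted by the macroscopic equation.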
FHP model Abbreviation for the Frisch, Hasslacher, and Pomeau lattice gas model, which was the first serious candidate for simulating two-dimensional hydrodynamics on a hexagonal lattice.
Fractal Mathematical object usually having a geometrical representation and whose spatial dimension is not an integer. The relation between the size of the object and its "mass" does not obey that of usual geometrical objects. A DLA cluster is an example of a fractal.
Front The region where some physical process occurs. Usually the front includes the locations in space that
Cellular Automata Modeling of Physical Systems
are first affected by the phenomena. For instance, in a reaction process between two spatially separated reactants, the front describes the region where the reaction takes place.
HPP model Abbreviation for the Hardy, de Pazzis and Pomeau model. The first two-dimensional LGA, aimed at modeling the behavior of particles colliding on a square lattice with mass and momentum conservation. The HPP model has several physical drawbacks that have been overcome with the FHP model.
Invariant A quantity which is conserved during the evolution of a dynamical system. Some invariants are imposed by the physical laws (mass, momentum, energy) and others result from the model used to describe physical situations (spurious, staggered invariants). Collisional invariants are constant vectors in the space where the Chapman–Enskog expansion is performed, associated with each quantity conserved by the collision term.
Ising model Hamiltonian model describing the ferromagnetic–paramagnetic transition. Each local classical spin variable s_i = ±1 interacts with its neighbors.
Isotropy The property of continuous systems to be invariant under any rotation of the spatial coordinate system. Physical quantities defined on a lattice and obtained by an averaging procedure may or may not be isotropic in the continuous limit; it depends on the type of lattice and the nature of the quantity. Second-order tensors are isotropic on a 2D square lattice, but fourth-order tensors need a hexagonal lattice.
Lattice The set of cells (or sites) making up the spatial area covered by a CA.
Lattice Boltzmann model A physical model defined on a lattice where the variables associated with each site represent an average number of particles or the probability of the presence of a particle with a given velocity. Lattice Boltzmann models can be derived from cellular automata dynamics by an averaging and factorization procedure, or be defined per se, independently of a specific realization.
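As a small illustration of the Ising model entry above, the sketch below evaluates the standard nearest-neighbor energy E = −J Σ⟨ij⟩ s_i s_j of a spin configuration with periodic boundaries (the function name and array conventions are our own, chosen for illustration):

```python
def ising_energy(spins, J=1.0):
    """Energy of a square 2D Ising configuration, E = -J * sum over
    nearest-neighbor pairs of s_i * s_j, with periodic boundaries.
    spins: n x n nested list of +1/-1 values."""
    n = len(spins)
    e = 0.0
    for i in range(n):
        for j in range(n):
            s = spins[i][j]
            # count each bond once via the right and down neighbors
            e -= J * s * (spins[(i + 1) % n][j] + spins[i][(j + 1) % n])
    return e
```

For a 3×3 all-up configuration there are 18 periodic bonds, so the energy is −18J.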
Lattice gas A system defined on a lattice where particles are present and follow given dynamics. Lattice gas automata (LGA) are a particular class of such systems where the dynamics are performed in parallel over all the sites and can be decomposed in two stages: (i) propagation: the particles jump to a nearest-neighbor site, according to their direction of motion, and (ii) collision: the particles entering the same site at the same iteration interact so as to produce a new particle distribution. HPP and FHP are well-known LGA.
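The two-stage collision/propagation dynamics just described can be sketched in a few lines. The code below implements an HPP-like update on a square lattice, where head-on pairs (east–west or north–south alone at a site) rotate by 90 degrees; the array layout and function name are illustrative assumptions, not taken from the text:

```python
import numpy as np

def lga_step(cells):
    """One parallel LGA update: collision, then propagation.
    cells: boolean array of shape (4, H, W); axis 0 holds one occupation
    bit per lattice direction (0=east, 1=north, 2=west, 3=south)."""
    e, n, w, s = cells
    # Collision: an isolated head-on pair is rotated by 90 degrees,
    # which conserves both mass and momentum at the site
    ew = e & w & ~n & ~s          # east-west head-on collisions
    ns = n & s & ~e & ~w          # north-south head-on collisions
    e, w = (e & ~ew) | ns, (w & ~ew) | ns
    n, s = (n & ~ns) | ew, (s & ~ns) | ew
    # Propagation: every particle hops to the nearest neighbor
    # in its direction of motion (periodic boundaries via roll)
    return np.stack([
        np.roll(e, 1, axis=1),    # east movers shift one column right
        np.roll(n, -1, axis=0),   # north movers shift one row up
        np.roll(w, -1, axis=1),
        np.roll(s, 1, axis=0),
    ])
```

Because collision only permutes occupation bits at a site, the total particle number is conserved at every step, as required of an LGA.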
Lattice spacing The separation between two adjacent sites of a regular lattice. Throughout this book, it is denoted by the symbol Δr.
LB Abbreviation for Lattice Boltzmann.
LGA Abbreviation for Lattice Gas Automaton. See lattice gas for a definition.
Local equilibrium Situation in which a large system can be decomposed into subsystems, very small on a macroscopic scale but large on a microscopic scale, such that each subsystem can be assumed to be in thermal equilibrium. The local equilibrium distribution is the function which makes the collision term of a Boltzmann equation vanish.
Lookup table A table in which all possible outcomes of a cellular automata rule are precomputed. The use of a lookup table yields a fast implementation of a cellular automata dynamics since, however complicated a rule is, the evolution of any configuration of a site and its neighbors is directly obtained through a memory access. The size of a lookup table grows exponentially with the number of bits involved in the rule.
Margolus neighborhood A neighborhood made of two-by-two blocks of cells, typically in a two-dimensional square lattice. Each cell is updated according to the values of the other cells in the same block. A different rule may possibly be assigned depending on whether the cell is at the upper left, upper right, lower left or lower right location. After each iteration, the lattice partition defining the Margolus blocks is shifted one cell right and one cell down so that, at every other step, information can be exchanged across the lattice. Can be generalized to higher dimensions.
Microdynamics The Boolean equation governing the time evolution of a LGA model or a cellular automata system.
Moore neighborhood A neighborhood composed of the central cell and all eight nearest and next-nearest neighbors in a two-dimensional square lattice. Can be generalized to higher dimensions.
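The lookup-table idea is easiest to see for an elementary (one-dimensional, two-state, three-cell neighborhood) CA: the 8 possible neighborhood outcomes are precomputed once, and every subsequent update is a pure memory access. The function names below are our own, for illustration:

```python
def make_lookup(rule_number):
    """Precompute the output bit for all 8 (left, center, right)
    neighborhoods of an elementary CA, indexed Wolfram-style by
    the binary expansion of rule_number (0..255)."""
    return {(l, c, r): (rule_number >> (l * 4 + c * 2 + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}

def ca_step(state, table):
    """One synchronous update: each cell is obtained by a single
    table access (periodic boundaries)."""
    n = len(state)
    return [table[(state[i - 1], state[i], state[(i + 1) % n])]
            for i in range(n)]
```

For rules with larger neighborhoods or more states, the same scheme applies but the table size grows exponentially with the number of input bits, exactly as the entry notes.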
Multiparticle models Discrete dynamics modeling a physical system in which an arbitrary number of particles is allowed at each site. This is an extension of an LGA in which no exclusion principle is imposed.
Navier–Stokes equation The equation describing the velocity field u in a fluid flow. For an incompressible fluid (∂_t ρ = 0), it reads

  ∂_t u + (u·∇)u = −(1/ρ)∇P + ν∇²u

where ρ is the density and P the pressure. The Navier–Stokes equation expresses the local momentum conservation in the fluid and, as opposed to the Euler equation, includes the dissipative effects with a viscosity term ν∇²u. Together with the continuity equation, this is the fundamental equation of fluid dynamics.
Neighborhood The set of all cells necessary to compute a cellular automaton rule. A neighborhood is usually composed of several adjacent cells organized in a simple geometrical structure. Moore, von Neumann and Margolus neighborhoods are typical examples.
Occupation numbers Boolean quantities indicating the presence or absence of a particle in a given physical state.
Open system A system communicating with the environment by exchange of energy or matter.
Parallel Refers to an action which is performed simultaneously at several places. A parallel updating rule corresponds to the updating of all cells at the same time, as if resulting from the computations of several independent processors.
Partitioning A technique consisting of dividing space into adjacent domains (through a partition) so that the evolution of each block is uniquely determined by the states of the elements within the block.
Phase transition Change of state obtained when varying a control parameter, such as the one occurring in the boiling or freezing of a liquid, or in the change between the ferromagnetic and paramagnetic states of a magnetic solid.
Propagation The process by which the particles of a LGA are moved to a nearest neighbor, according to the direction of their velocity vector v_i. In one time step Δt, a particle travels from cell r to cell r + v_i Δt, where r + v_i Δt is the nearest neighbor in lattice direction i.
Random walk A series of uncorrelated steps of length unity describing a random path with zero average displacement but characteristic size proportional to the square root of the number of steps.
Reaction-diffusion systems Systems made of one or several species of particles which diffuse and react among themselves to produce some new species.
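The square-root growth stated in the random walk entry is easy to check numerically. This small sketch (the function name and parameter values are arbitrary illustrative choices) estimates the root-mean-square displacement of many independent one-dimensional unit-step walks:

```python
import random

def rms_displacement(n_steps, n_walkers=2000, seed=1):
    """Root-mean-square final displacement of n_walkers independent
    1D random walks of n_steps unit steps each."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_walkers):
        # each step is +1 or -1 with equal probability
        x = sum(rng.choice((-1, 1)) for _ in range(n_steps))
        total += x * x
    return (total / n_walkers) ** 0.5
```

For 100 steps the estimate comes out close to √100 = 10, while the mean displacement itself stays near zero.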
Scaling hypothesis A hypothesis concerning the analytical properties of the thermodynamic potentials and the correlation functions in a problem invariant under a change of scale.
Scaling law Relations among the critical exponents describing the power-law behaviors of physical quantities in systems invariant under a change of scale.
Self-organized criticality Concept aimed at describing a class of dynamical systems which naturally drive
themselves to a state where interesting physics occurs at all scales.
Site Same as a cell, but the preferred terminology in LGA and LB models.
Spatially extended systems Physical systems involving many spatial degrees of freedom which, usually, have rich dynamics and show complex behaviors. Coupled map lattices and cellular automata provide a way to model spatially extended systems.
Spin Internal degree of freedom associated with particles in order to describe their magnetic state. A widely used case is that of classical Ising spins: to each particle, one associates an "arrow" which is allowed to take only two different orientations, up or down.
Time step Interval of time separating two consecutive iterations in the evolution of a discrete-time process, like a CA or a LB model. Throughout this work the time step is denoted by the symbol Δt.
Universality The phenomenon whereby many microscopically different systems exhibit a critical behavior with quantitatively identical properties such as the critical exponents.
Updating Operation consisting of assigning a new value to a set of variables, for instance those describing the states of a cellular automata system. The updating can be done in parallel and synchronously, as is the case in CA dynamics, or sequentially, one variable after another, as is usually the case for Monte Carlo dynamics. Parallel, asynchronous updating is less common but can be envisaged too. Sequential and parallel updating schemes may yield different results since the interdependencies between variables are treated differently.
Viscosity A property of a fluid indicating how much momentum "diffuses" through the fluid in an inhomogeneous flow pattern. Equivalently, it describes the stress occurring between two fluid layers moving with different velocities. A high viscosity means that the resulting drag force is important and low viscosity means th