Volume 144 Number 6 March 18, 2011
www.cell.com
Review Issue Systems Biology
Editor Emilie Marcus Senior Deputy Editor Elena Porro Deputy Editor Robert Kruger Scientific Editors Karen Carniol Kara Cerveny Michaeleen Doucleff Fabiola Rivas Niki Scaplehorn Lara Szewczak Senior Managing Editor Meredith Adinolfi Deputy Managing Editor Andy Smith Lead Illustrator Andrew A. Tang Illustrators Yvonne Blanco Kate Mahan Production Staff Reyna Clancy Editorial Assistant Mary Beth O’Leary
Editorial Board C. David Allis Geneviève Almouzni Uri Alon Angelika Amon Johan Auwerx Richard Axel Cori Bargmann Konrad Basler Bonnie Bassler David Baulcombe Jeffrey Benovic Carolyn Bertozzi Wendy Bickmore Elizabeth Blackburn Joan Brugge Lewis Cantley Joanne Chory David Clapham Andrew Clark Hans Clevers Stephen Cohen Pascale Cossart George Daley Jeff Dangl Ted Dawson Pier Paolo di Fiore Marileen Dogterom Julian Downward Bruce Edgar Steve Elledge Anne Ephrussi Ronald Evans Witold Filipowicz Marco Foiani Elaine Fuchs Yukiko Goda Stephen Goff Joe Goldstein
Douglas Green Leonard Guarente Taekjip Ha Daniel Haber Ulrike Heberlein Mark Hochstrasser Erika Holzbaur Arthur Horwich Tony Hunter James Hurley Richard Hynes Thomas Jessell Tarun Kapoor Narry Kim Mary-Claire King David Kingsley Frank Kirchhoff John Kuriyan Robert Lamb Mark Lemmon Beth Levine Wendell Lim Jennifer Lippincott-Schwartz Dan Littman Richard Losick Scott Lowe Tom Maniatis Matthias Mann Kelsey Martin Joan Massagué Iain Mattaj Satyajit Mayor Ruslan Medzhitov Craig Mello Tom Misteli Tim Mitchison Danesh Moazed Alex Mogilner Paul Nurse
Roy Parker Dana Pe’er Kathrin Plath Carol Prives Klaus Rajewsky Venki Ramakrishnan Rama Ranganathan Anne Ridley Alexander Rudensky Helen Saibil Joshua Sanes Charles Sawyers Randy Schekman Ueli Schibler Joseph Schlessinger Hans Schöler Trina Schroer Geraldine Seydoux Kevan Shokat Pamela Sklar Nahum Sonenberg James Spudich Paul Sternberg Bruce Stillman Azim Surani Keiji Tanaka Craig Thompson Robert Tjian Jürg Tschopp Ulrich von Andrian Gerhard Wagner Detlef Weigel Jonathan Weissman Matthew Welch Tian Xu Shinya Yamanaka Marino Zerial Xiaowei Zhuang Huda Zoghbi
Cell Office Cell, Cell Press, 600 Technology Square, 5th Floor, Cambridge, Massachusetts 02139 Phone: (+1) 617 661 7057, Fax: (+1) 617 661 7061, E-mail:
[email protected] Online Publication: http://www.cell.com Cell (ISSN 0092-8674) is published biweekly by Cell Press, 600 Technology Square, 5th Floor, Cambridge, Massachusetts 02139. The institutional subscription rate for 2011 is $1,605 (US and Canada) or $1,847 (elsewhere). The individual subscription rate is $320 (US and Canada) or $363 (elsewhere). The individual copy price is $50. Periodicals postage paid at Boston, Massachusetts and additional mailing offices. Postmaster: send address changes to Elsevier Customer Service Americas, Cell Press Journals, 11830 Westline Industrial Drive, St. Louis, MO 63146, USA. The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed by Dartmouth Printing Company, Hanover, NH.
Cell Press President & CEO Lynne Herndon Editor in Chief, Vice President of Content Development Emilie Marcus Vice President of Business Development Joanne Tracy Vice President of Web Development and Operations Keith Wollman Senior Product Manager Mark Van Hussen Director of Marketing Jonathan Atkinson Production Manager Meredith Adinolfi
Display Advertising Northeast/Mid-Atlantic: Victoria Macomber, ph: 508 928 1255; fax: 508 928 1256; e-mail:
[email protected] Midwest/Southeast/Eastern Canada: Inez Herrero-Redman, ph: 585 678 4395; fax: 585 678 4722; e-mail: i.herrero@elsevier.com Northwest/Southwest/Western Canada: Lori Young, ph: 646 370 6312; fax: 212 462 1915; e-mail:
[email protected] California: Elizabeth Loennborn, ph: 714 655 1877; fax: 214 452 9627; e-mail:
[email protected] UK/Europe: James Kenney, ph: +44 20 7424 4216; fax: +44 18 6585 3136; e-mail:
[email protected] Asia: Wendy Xie, ph: +86 10 8520 8827; e-mail: w.xie@elsevier.com Classified Advertising United States and Canada: Gordon Sheffield, Key Account Manager, ph: 617 386 2189; fax: 617 397 2805; e-mail: g.sheffi
[email protected] Press Officer Cathleen Genova
UK, Europe, and Asia: Sabrina Dodge, Key Account Manager, ph: +44 20 7424 4997; fax: +44 18 6585 3136; e-mail:
[email protected] ©2011 Elsevier Inc. All rights reserved. This journal and the individual contributions contained in it are protected under copyright by Elsevier Inc., and the following terms and conditions apply to their use:
Photocopying: Single photocopies of single articles may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee are required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for nonprofit educational classroom use. For information on how to seek permission, visit www.elsevier.com/permissions or call (+44) 1865 843830 (UK) / (+1) 215 239 3804 (US).
Permissions: For information on how to seek permission, visit www.elsevier.com/permissions or call (+44) 1865 843830 (UK) / (+1) 215 239 3804 (US).
Derivative Works: Subscribers may reproduce tables of contents or prepare lists of articles including summaries for internal circulation within their institutions. Permission of the Publisher is required for resale or distribution outside the institution. Permission of the Publisher is required for all other derivative works, including compilations and translations (please consult www.elsevier.com/permissions).
Electronic Storage or Usage: Permission of the Publisher is required to store or use electronically any material contained in this journal, including any article or part of an article (please consult www.elsevier.com/permissions). Except as outlined above, no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the Publisher.
Notice: No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence, or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made. Although all advertising material is expected to conform to ethical (medical) standards, inclusion in this publication does not constitute a guarantee or endorsement of the quality or value of such product or of the claims made of it by its manufacturer.
Reprints: Article reprints are available through Cell’s reprint service; for information, contact Nicholas Pavlow (e-mail:
[email protected]; ph: (+1) 212 633 3960). Subscription Orders and Inquiries: Mail, fax, or e-mail address changes to Elsevier Customer Service Americas, allowing 4–6 weeks for processing. Lost or damaged issues will be replaced, subject to availability, if Cell Press is notified within the claim period (US and airmail delivery: 3 months from issue date; surface delivery: 4 months from issue date). Periodical delivery in the US can take up to 3 weeks. Airmail delivery can take 2–4 weeks. The price of a single copy of Cell is $50 (excluding special issues). All orders must be prepaid and in writing. Please include the volume and issue number, payment (check or credit card, MasterCard, Visa, or American Express only), and a delivery address. Allow 4–6 weeks for delivery. Mailing address: Elsevier Customer Service Americas, Cell Press Journals, 11830 Westline Industrial Drive, St. Louis, MO 63146, USA. Toll-free phone within USA/Canada: 866 314 2355; phone for outside US/Canada: (+1) 314 453 7038; fax: (+1) 314 523 5170; e-mail:
[email protected]; internet: www.cellpress.com or <www.cell.com>. Funding Body Agreements and Policies: Elsevier has established agreements and developed policies to allow authors whose articles appear in journals published by Elsevier to comply with potential manuscript archiving requirements as specified as conditions of their grant awards. To learn more about existing agreements and policies, visit http://www.cell.com/cellpress/FundingBodyAgreements. Guide for Authors: For a full and complete guide for authors, please go to www.cell.com/authors.
Leading Edge Cell Volume 144 Number 6, March 18, 2011 IN THIS ISSUE
SELECT 831
Control of Biomolecule Abundance
VOICES 837
Systems Biology: What’s the Next Challenge?
ANALYSIS 839
Systems Biology: Evolving into the Mainstream
C. Macilwain
BOOK REVIEW 842
Don’t Fear the Command Line!
O.G. Troyanskaya
ESSAYS 844
Network News: Innovations in 21st Century Systems Biology
A.P. Arkin and D.V. Schaffer
850
The Cell in an Era of Systems Biology
P. Nurse and J. Hayles
855
Informing Biological Design by Integration of Systems and Synthetic Biology
C.D. Smolke and P.A. Silver
MINIREVIEW 860
Boosting Signal-to-Noise in Complex Biology: Prior Knowledge Is Power
T. Ideker, J. Dutkowski, and L. Hood
PERSPECTIVE 864
Principles and Strategies for Developing Network Models in Cancer
D. Pe’er and N. Hacohen
PRIMER 874
Modeling the Cell Cycle: Why Do Certain Circuits Oscillate?
J.E. Ferrell, Jr., T.Y.-C. Tsai, and Q. Yang
REVIEWS 886
Impulse Control: Temporal Dynamics in Gene Transcription
N. Yosef and A. Regev
897
Signaling from the Living Plasma Membrane
H.E. Grecco, M. Schmick, and P.I.H. Bastiaens
910
Cellular Decision Making and Biological Noise: From Microbes to Mammals
G. Balázsi, A. van Oudenaarden, and J.J. Collins
926
Measuring and Modeling Apoptosis in Single Cells
S.L. Spencer and P.K. Sorger
940
Control of the Embryonic Stem Cell State
R.A. Young
955
Pattern, Growth, and Control
A.D. Lander
970
Evolution of Gene Regulatory Networks Controlling Body Plan Development
I.S. Peter and E.H. Davidson
986
Interactome Networks and Human Disease
M. Vidal, M.E. Cusick, and A.-L. Barabási
SNAPSHOT 1000
Protein-Protein Interaction Networks
J. Seebacher and A.-C. Gavin
ANNOUNCEMENTS POSITIONS AVAILABLE
On the cover: A fundamental concept in systems biology is to represent processes within living cells as mathematical models. In this issue, Yosef and Regev (pp. 886–896) survey the current understanding of the temporal dynamics of gene expression and the underlying models of regulatory circuits. In the cover image, a murine immune dendritic cell is seen through a “systems biology” prism and is depicted as a blueprint with the interior systems of the cell reconstructed into such models. The cell image is courtesy of Alex Shalek, Jacob Robinson, and Hongkun Park (Harvard). Artwork by Sigrid Knemeyer, with modifications by Yvonne Blanco.
Leading Edge
In This Issue In thinking about complexity, it’s frequently invoked that the whole is greater than the sum of its parts. This notion serves as one of the motivating principles of systems biology, which seeks to understand the emergent properties of complex biological systems. Among many biologists, systems biology is also synonymous with the use of particular approaches, including high-throughput techniques, large-scale integration of datasets, and computational modeling to probe system behaviors. There is indeed little doubt that the recent growth of the field has been fueled by the massive expansion in the amount of data being generated in the biological sciences—first from genome sequencing and more recently from such sources as transcriptomics, proteomics, and high-throughput imaging. Given this rising tide of data, there is an urgent need for new ways of analyzing large datasets and for conceptualizing biological complexity. It is in this context that we present our 2011 Special Review Issue on systems biology. The overarching goal of this collection is to highlight biological insights revealed by the quantitative and computational approaches associated with systems biology. To accomplish this, the issue includes topics that span vastly different size and time scales, from protein-protein interactions to disease models, from transcriptional dynamics to evolutionary processes. For the issue’s diversity, depth, and thought-provoking insights, we would like to thank the many distinguished authors and reviewers who generously contributed their time and effort. In reading the issue, we hope that you will find that the collection, like biological systems, is more than the sum of its individual parts, providing a new perspective on this rapidly changing field.
A Field in Flux Although branded as a distinct branch of inquiry only relatively recently, systems biology is already leaving its mark on the larger research landscape. In his Analysis, Colin Macilwain (page 839) gives an assessment of systems biology’s growth worldwide and reports on views from both inside and outside the field on the assimilation of systems approaches into the broader life sciences community. Indeed, techniques such as genome-wide screens and mathematical modeling are now commonplace across many disciplines. The merging of systems approaches and classic cell biology is discussed in an Essay by Paul Nurse and Jacqueline Hayles (page 850) that forecasts the impact of systems biology on our understanding of the cell’s inner workings. For a perspective on the history of the field and its recent advances, see the Essay by Adam Arkin and David Schaffer (page 844), which presents key insights into what has been learned about biological structure and its dynamic organization.
Finding Strength in Numbers (and Equations) As much as any discipline in modern biology, systems biology relies on computation and mathematics to collect data, build models, and make predictions. In their Minireview, Trey Ideker, Janusz Dutkowski, and Leroy Hood (page 860) introduce strategies for leveraging accumulated knowledge about biological systems to boost signal-to-noise in analyzing large-scale datasets. To illustrate the power of these tools and concepts, they cite key studies that range from genome-wide association studies of disease to kinase-phosphatase signaling networks. In a similar vein, Dana Pe’er and Nir Hacohen (Perspective, page 864), using cancer as an example, outline strategies and principles for identifying gene networks relevant to disease phenotypes and discuss the prospects of network modeling for personalizing cancer treatment. Taking their turn at the chalkboard, James Ferrell, Tony Tsai, and Qiong Yang (Primer, page 874) guide us step-by-step through equations that model the cell cycle to explain why certain circuits oscillate. Their demonstration highlights the power of integrating knowledge gleaned from biochemistry and molecular biology with mathematical modeling. Some problems, however, require greater computing power. On this topic, Olga Troyanskaya (Book Review, page 842) comments on a recently published advanced computing how-to guide aimed at biologists. She discusses the book’s strengths and weaknesses, while encouraging bench researchers to embrace complex computation and quantitative experiments. Cell 144, March 18, 2011 ©2011 Elsevier Inc. 827
There’s a Time and Place for Everything In systems papers, network diagrams are often very complex. Yet, even the most complicated “hairball” diagram is a vast simplification of what is occurring in vivo once the intricate spatial landscape of the cell and the careful ordering of events are taken into account. Two Reviews in this issue provide systems-level perspectives on time and space. The spatial domain is tackled in a Review by Hernán Grecco, Malte Schmick, and Philippe Bastiaens (page 897) that explores how the properties and dynamics of the plasma membrane contribute to context-dependent signaling. The temporal aspects of network behavior are examined in a Review by Nir Yosef and Aviv Regev (page 886). They focus on the molecular mechanisms that govern the timing of gene expression, discussing the functional integration of protein factors, cis-regulatory sequences in promoters, and chromatin architecture. Building on this theme of timing, this issue’s Select, written by scientific editors Lara Szewczak of Cell and Brian Plosky of Molecular Cell (page 831), highlights recent papers on the dynamics of RNA and protein abundance and how their levels and resulting cellular outputs are controlled by both intrinsic factors and external conditions.
Decisions, Decisions, Decisions Biological systems are intrinsically noisy. How stochasticity contributes to cellular decision making in diverse settings from microbes to mammals is the topic of the Review by Gábor Balázsi, Alexander van Oudenaarden, and James Collins (page 910). The complex regulation of a specific kind of cell fate, apoptosis, is further explored by Sabrina Spencer and Peter Sorger (page 926). This Review recounts efforts to quantify components of death pathways and the building of models that explain key features of apoptosis, including its cell-to-cell variability. Development relies on a carefully orchestrated series of decisions by individual cells, and the most elemental of these cells are pluripotent embryonic stem cells. From the standpoint of regulatory circuits, what is pluripotency? Richard Young (Review, page 940) addresses this question by discussing what is known about the circuits that maintain stem cells in a pluripotent state and how these circuits are disrupted during differentiation. Transitioning from individual cells to tissues, organs, and whole organisms, Arthur Lander (Review, page 955) explains how systems biology is uncovering the design principles that ensure robustness, precision, and scaling in development, citing examples from studies of vertebrates and invertebrates. The evolutionary implications of changes in gene regulatory networks for body plan development are explored by Isabelle Peter and Eric Davidson (Review, page 970), who argue that alterations at cis-regulatory nodes are especially potent instigators of evolutionary innovation.
Stay Tuned Where does systems biology go from here? We asked Rudolf Aebersold, Peer Bork, Marc Kirschner, Tobias Meyer, and Marian Walhout to comment on the big challenges facing the field. Their diverse viewpoints are presented side-by-side in a new Cell format called Voices (page 837). One expectation for systems biology is that it will lead to a better understanding of complex disease states and to the identification of new therapeutic targets. This theme, touched upon in many pieces in this issue, lies at the core of the Review by Marc Vidal, Michael Cusick, and Albert-László Barabási (page 986) in which they discuss the many different types of interactome networks and how efforts to integrate them may inform our understanding of human disease. To help ease the uninitiated into the study of protein interaction networks, Jan Seebacher and Anne-Claude Gavin (SnapShot, page 1000) have created a helpful one-page guide, which includes information on how networks are constructed, depictions of common types of network topologies, and examples of frequently used network measures. Looking further into the future, Christina Smolke and Pamela Silver (Essay, page 855) argue that synergies between the fields of systems and synthetic biology may accelerate our understanding of the design principles of life and simultaneously enhance our ability to engineer synthetic pathways and organisms for human benefit. Robert P. Kruger
Leading Edge
Select Control of Biomolecule Abundance Like any company that manages its resources with an eye on profitability, a cell regulates its constituent biomolecules, compensating for changes in internal function, external conditions, and sector or organismal trends. This issue’s Select focuses on new findings that reveal broad insights into how dynamic changes in RNAs and proteins are made and how those changes impact cellular function and fitness.
A NET Gain Obtaining an unbiased high-resolution view of ongoing transcription in living cells sheds light on transcriptome diversity and provides insight into regulatory events such as promoter-proximal pausing of RNA polymerase. Churchman and Weissman (2011) now report an approach dubbed native elongating transcript sequencing (NET-seq) that captures nascent transcripts associated with RNA polymerase. By applying NET-seq in budding yeast, the authors confirm the presence of divergent transcription from most promoters, producing both a protein-encoding mRNA in one direction and a short-lived RNA in the other. Somewhat surprisingly, despite the potential for promoters to support bidirectional transcription, histone deacetylation by the Rpd3S complex plays a key role in enforcing directional transcription, favoring the stable coding RNA. They also uncover widespread polymerase pausing and backtracking within gene bodies. Strains that are defective in recovery from backtracking showed significant relocation of pause sites, suggesting that many of the pause sites in gene bodies are normally associated with backtracking. Moreover, NET-seq data combined with previous data sets on nucleosome positioning reveal a strong correlation between pausing and the first four nucleosomes of the average gene. This study offers an unprecedented level of insight into the interplay between transcriptional dynamics and chromatin organization, and as the technique is broadly applicable, it is expected to reveal new mechanistic insights from a range of experimental systems. Churchman, L.S., and Weissman, J.S. (2011). Nature 469, 368–373.
Hello, I Must Be Going Messenger RNAs are largely of low abundance with limited lifetimes. Combating the experimental challenge of measuring RNAs in living cells, Miller et al. (2011) apply a combination of experimental and computational approaches to look at mRNA synthesis and decay in yeast with minimal perturbations to the cells. The trick, which has been previously applied in cells from other organisms, is to induce the cells to incorporate a nucleoside analog, 4-thiouridine, to enable selective isolation of newly synthesized transcripts. Microarray analysis of the selected mRNAs from more than 4000 yeast genes enables quantitative analysis of mRNA synthesis over time, and this information, coupled with decay rates, provides a cellular view on the dynamics of mRNA biogenesis and stability. The results confirm the general idea that most mRNAs are fleeting members of the cellular community, with only a few copies of each mRNA being made in a given cell cycle. The authors also find that transcription and decay can recur for a given sequence repeatedly during the cell cycle. Not unexpectedly, the two processes are uncoupled under basal conditions. However, the pattern changes during a stress response. For example, osmotic shock leads to a transient coregulation between the processes. During the first stage of the cellular response, both mRNA synthesis and decay are repressed as the cells hunker down to protect the resources that they have, followed by an increase in both synthesis and decay as response genes ride to the rescue in a short-term response and are then hustled away to make way for a return to homeostasis. Analysis of these cell-wide patterns also highlights new factors that are involved in the osmotic stress response and suggests candidate protein-protein interactions between transcription factors driving the response.
The general utility of the approach will pave the way for similar analyses in diverse cell types, enabling a fine-scale dissection of mRNA dynamics in response to a variety of conditions. Miller, C., et al. (2011). Mol. Syst. Biol. 7, 458.
What Goes Up Must Come Down
Gene expression is tailored to modulate the levels of proteins available to support cellular processes and pathways under both homeostatic and perturbed conditions. As seen for mRNA levels, which need to be modulated by decay pathways, proteins need to be removed from the cell, and different proteins have different half-lives. Eden et al. (2011) look at two broad mechanisms of protein removal—degradation and dilution—and provide an overview of how these processes influence protein dynamics in human cells. To examine changes in protein half-lives, the authors develop a noninvasive approach, termed bleach-chase, which utilizes a library of fusion proteins tagged with yellow fluorescent protein. They examine individual proteins by bleaching a subset of the expressed fusions and then monitoring the changes in the fluorescent population, allowing them to calculate the decay kinetics for the bleached (and therefore invisible) pool. Looking at 100 proteins, they found a distribution of lifetimes ranging from just less than an hour to slightly less than a day, the latter representing the average time for cell division in the human cancer cells studied. Degradation and dilution effects could be separated by varying the growth conditions and revealed distinct, functionally related subsets of the proteome that are principally dependent on the two mechanisms.
A bleach-chase approach reveals proteome half-life dynamics and the effects of degradation and dilution on different subsets of proteins. Image courtesy of G. Brodsky.
Cell 144, March 18, 2011 © 2011 Elsevier Inc. 831
Perturbing growth rate, for instance with drugs or serum starvation, led to changes in protein half-lives, even when what we think of as standard avenues for degradation like the proteasome were not affected directly. Surprisingly, proteins with longer half-lives were preferentially stabilized, whereas short half-lives remained relatively constant. These findings suggest that drugs used for growth arrest may impact fast growing cells, like cancer cells, in part by knocking the expressed proteome off kilter. Eden, E., et al. (2011). Science 331, 764–768.
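The interplay of degradation and dilution can be made concrete with a little arithmetic. The sketch below assumes simple first-order removal, so that the degradation and dilution rates add; the function names and the example numbers are illustrative assumptions and are not taken from Eden et al.

```python
import math

def dilution_rate(doubling_time_h):
    """Removal rate (per hour) caused by dilution through cell division."""
    return math.log(2) / doubling_time_h

def half_life(degradation_rate, doubling_time_h):
    """Half-life (hours) when first-order degradation and dilution act together."""
    total_rate = degradation_rate + dilution_rate(doubling_time_h)
    return math.log(2) / total_rate

# Hypothetical proteins: one that is never degraded, one degraded with a 1 h half-life.
no_degradation = 0.0
fast_degradation = math.log(2) / 1.0

# Compare a 24 h doubling time with a slowed, 48 h doubling time.
stable_fast_growth = half_life(no_degradation, 24.0)
stable_slow_growth = half_life(no_degradation, 48.0)
unstable_fast_growth = half_life(fast_degradation, 24.0)
unstable_slow_growth = half_life(fast_degradation, 48.0)

print(stable_fast_growth, stable_slow_growth)
print(unstable_fast_growth, unstable_slow_growth)
```

Under these assumptions, slowing growth roughly doubles the half-life of the dilution-dominated protein while barely moving the rapidly degraded one. That is one simple way to read the observation that long-lived proteins are preferentially stabilized, although it is offered here as an illustration and not as the authors' analysis.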
Meddling in Your Neighbor’s Expression Pattern
The mechanisms controlling stability of biomolecules can be broadly categorized, as for proteins in the bleach-chase analysis, or can be finely parsed. Xu et al. (2011) offer insight into a mechanism for fine-tuning mRNA stability that relies on long noncoding RNAs. The authors study budding yeast and show that, under conditions of both genetic and environmental variability, genes whose loci also transcribe noncoding RNAs on the opposite strand—that is, in an antisense orientation—show a greater dynamic range of expression. Interestingly, the range extends to lower expression levels. Mechanistically, it appears that, when transcripts overlap opposing promoters, there is a greater opportunity for the antisense transcript to influence sense expression, particularly when the latter is low, creating a threshold effect. The antisense transcript, in effect, tamps down leaky expression of the sense transcript. However, when transcription is stimulated, the ‘‘off switch’’ set by the antisense transcript can be flipped ‘‘on’’ to allow efficient gene expression. The authors test this proposed model for the SUR7 transcript. The SUR7 gene lies adjacent to the GAL80 gene, and its antisense transcript shares a bidirectional promoter with GAL80. Mutations in the Gal4-binding site found within the promoter not only disrupt GAL80 expression (as expected), but also deregulate the antisense transcript and, as a consequence, SUR7 expression. This observation suggests that studies aimed at making mutations in gene promoter regions may also have effects on upstream genes through changes to antisense transcripts. This type of coregulation not only has implications for studies in more complex eukaryotes, including mouse models, but also impacts studies in synthetic biology, in which the design of many gene circuits centers on promoter-based feedback loops. Xu, Z., et al. (2011). Mol. Syst. Biol. 7, 468.
Fruit Flies as Network Engineers
Although complex and interesting on their own, transcriptional regulation, mRNA dynamics, and protein stability ultimately serve to control intricate and integrated cellular processes. Translating these kinds of genome-wide or large-scale information resources into a blueprint for control in a physiological setting remains a significant challenge—one in which computational or engineering principles have informed recent steps forward. A recent paper from Afek et al. (2011) flips that around, taking a tip from biology to solve the longstanding question in distributed computing of how to best select a maximal independent set (MIS). Appropriate selection of an MIS is key for creating the backbone for wireless networks, setting up routers, and running network protocols. In a computational setting, an MIS is a subset of processors or nodes within a larger network. Every node is either in the MIS or is connected to a node in the MIS. If a node is in the MIS, it cannot be directly connected to another node within the MIS, leading to maximal independence and accelerated processing. Afek and colleagues noted that an analogous task is solved during the development of the Drosophila neural sensory organ precursor cells (SOPs). These cells behave as an MIS in that every cell is either a SOP or physically touches a SOP, and no two SOPs are adjacent to each other. The authors use real-time imaging and molecular modeling to study the molecular networks by which potential SOPs (candidates) are selected from clusters of seemingly equivalent precursor cells and how candidates keep neighboring cells from taking an identical path. The results indicate that candidate SOPs are selected by a random self-nomination process.
Nerve cells self-select to become sensory organ precursors (SOPs, arrows). These cells block neighboring cells from becoming SOPs, causing them to activate the Notch protein (red). Image courtesy of O. Barad.
Once candidacy is declared, perhaps by reaching a critical threshold of a particular protein (such as Delta), the candidate suppresses any potential competition from neighboring cells. The high cell surface levels of Delta activate the Notch pathway in adjacent cells, turning off the SOP gene expression program. Moving from SOP selection in flies back to distributed computing, the authors develop a computational algorithm for MIS selection. The fly-derived algorithm requires fewer assumptions and is more energy efficient than methods currently used by engineers. These findings show that lessons from biological information processing can be used to address computational and engineering challenges. Afek, Y., et al. (2011). Science 331, 183–185.
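The self-nomination and lateral-inhibition idea described above can be sketched in a few lines of code. What follows is a simplified illustration and not the algorithm published by Afek et al.: it assumes a fixed nomination probability `p`, synchronous rounds, and a network given as a dictionary of neighbor sets, and the names `select_mis` and `is_mis` are hypothetical.

```python
import random

def select_mis(neighbors, p=0.5, seed=0):
    """Pick a maximal independent set by random self-nomination.

    neighbors maps each node to the set of nodes it touches (undirected).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    rng = random.Random(seed)
    active = set(neighbors)   # nodes whose fate is still undecided
    mis = set()
    while active:
        # Each undecided node independently nominates itself.
        candidates = {v for v in active if rng.random() < p}
        # A candidate wins only if none of its neighbors nominated as well.
        winners = {v for v in candidates if not (neighbors[v] & candidates)}
        mis |= winners
        # Winners inhibit their neighbors, which drop out for good.
        inhibited = set()
        for v in winners:
            inhibited |= neighbors[v]
        active -= winners | inhibited
    return mis

def is_mis(neighbors, mis):
    """Check independence (no two members touch) and maximality."""
    independent = all(not (neighbors[v] & mis) for v in mis)
    covered = all(v in mis or (neighbors[v] & mis) for v in neighbors)
    return independent and covered

# A ring of six cells, each touching its two neighbors.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
chosen = select_mis(ring)
print(sorted(chosen), is_mis(ring, chosen))
```

In this sketch, a winning node plays the role of a cell with high Delta levels, and removing its neighbors from the undecided pool stands in for Notch activation. Whatever the random draws, the returned set satisfies both MIS conditions stated above.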
Lara Szewczak and Brian Plosky
Leading Edge
Voices
Systems Biology: What’s the Next Challenge?
Scale Matters
A.J. Marian Walhout
University of Massachusetts Medical School
Systems biology has described ‘‘small’’ systems in great detail and with great success, with the yeast galactose system and the eukaryotic cell cycle being two key examples. In my mind, the biggest challenge now is how do we phrase questions and design and interpret experiments that will illuminate how complexity is achieved in larger systems? As the physicist P.W. Anderson aptly wrote in 1972, ‘‘more is different.’’ This idea implies that entirely new concepts will emerge from studying increasingly large systems. For each system, whether an organelle, unicellular organism, metazoan, or plant, we need to define its components and the interactions between them, as well as formulate conceptual frameworks that will help us understand how complexity is achieved, not only by deriving basic design principles, but by truly understanding how simple building blocks function together in a dynamic manner in space and time. This requires the continuous development and application of unbiased and high-throughput technologies to define and perturb large biological systems. A deep and comprehensive knowledge of the way that systems develop and maintain homeostasis and respond to outside cues or insults will ultimately be essential to understand how systems go awry during aging and in disease.
Quality Trumps Quantity
Rudolf Aebersold
ETH Zurich
There is a general sentiment among biologists that we are drowning in data, courtesy of high-throughput technologies in genomics, proteomics, metabolomics, lipidomics, and live cell imaging, with each approach generating terabytes of data per lab. So it may at first seem paradoxical that one of the big challenges for systems biology is the creation of high-quality data sets. A general roadmap for studies in systems biology that I and colleagues have promoted consists of making a series of targeted perturbations in a given system and quantifying the molecular changes that result. The data obtained from multiple, sequential cycles of perturbation and measurement support the generation and refinement of mathematical models describing the dynamic behavior of the system, thus increasing biological understanding. However, for this approach to work, data quality is much more important than data quantity. Data sets useful for systems biology need to quantify each component of the system in each perturbed state with minimal error; meaning, the data sets need to be quantitative, accurate, reproducible, and complete. With the exception of transcription profiling, data sets with these attributes are rare. Fortunately, techniques are advancing at a rapid pace. In particular, new mass spectrometric approaches that accurately quantify predetermined sets of molecules (proteins, phosphoproteins, metabolites, lipids) at very high sensitivity, reproducibility, and dynamic range seem poised to alleviate systems biology’s hunger for high-quality data sets.
Tools, Archives, and Models
Tobias Meyer
Stanford University Medical School
The field faces three major challenges. First, new tools are needed to systematically perturb and monitor signaling processes and functions in both cells and organisms. This requires innovators with engineering and chemistry backgrounds to develop chemical and optical methods to perturb and monitor complex systems as well as improvements in image analysis to facilitate the extraction of quantitative data. Second, large data sets are needed to create interaction maps linking genes to each other and to their biological functions. A big challenge is to organize and interpret the information being generated. This requires computationally skilled biologists with a mindset of librarians to not only generate data, but to also preserve and update data sets. Third, the ultimate goal is generating quantitative models that link genes, proteins, and other biomolecules to cellular and organismal function. The big challenge here is to start with draft models, typically described by differential equations, to propose the most informative perturbation tools and biosensors to then generate new data sets that can, in turn, be used to improve the underlying theory. Systems biology is a quest for finding such converging loops between theory and experiments. This requires a new breed of researchers with detailed knowledge of organismal, cellular, and molecular mechanisms and mathematical expertise, as well as creative ideas for how to attack these issues.
Consider Context
Peer Bork
EMBL
The amount of data being produced to study biological systems at various scales, from molecules to ecosystems, is growing exponentially, and handling this data from local production to its storage in publicly accessible and integratable depositories poses technical challenges that some areas of biology are already confronting. But the conceptual challenges ahead may be even more daunting. A major one is quantifying the impact of context, that is, experimental constraints and environmental factors that influence results. Internal and environmental properties together characterize biological systems, exemplified by human diseases, which are affected by complex genetic and environmental components, the latter being barely understood and still frequently neglected in current studies. Another challenge is how much we can abstract from observations derived from cultivated cell lines, given the absence of a native tissue context? Many current technologies impose noise onto real biological signals (for instance, studying cell populations rather than individual cells is frequently unavoidable), and given the complexity of biological systems in terms of their many interacting elements and confounding variables, how are we to estimate which aspects of a finding remain valid in other settings? Thus, there is an urgent need for generalized formal descriptions of the state and the environmental context of biological systems (metadata), which would not only improve the reproducibility and comparability of observations, but would also enable strategies for quantifying the impact of environmental conditions. Such efforts will help to minimize data overinterpretation (as can easily occur with indirect correlations) and reduce the accumulation of misleading results.
Systems Pharmacology
Marc W. Kirschner
Harvard University
It is in pharmacology that systems biology may face its most practical challenge and opportunity. Although everyone might grant that drug action is both a quantitative and multicomponent problem, the targets of drugs are not pathways, but individual proteins. Hence, the question naturally arises: If the ultimate targets of drugs are products of individual genes, are the qualitative pictures that we currently derive from biochemistry and genetics sufficiently informative to allow those targets to be identified effectively? Or could the process be facilitated by a quantitative understanding of the dynamics of the pathways at a high level of integration? We can all grant that high-level integration exists without agreeing that analyzing it quantitatively will dramatically increase efficiency in the high-stakes search for new and better medicines. We should soon know the answer, however, because approaches for understanding pathways derived from systems biology will certainly merge with more traditional areas of pharmacology, if for no other reason than that present approaches are often unproductive. But it would be wrong to think of systems biology just as a set of tools to bring to pharmacology. Systems biology is invading virgin intellectual territory, much as molecular biology and cell biology did before. And this brash invasion has already begun to raise new questions, pose testable hypotheses, and question long-held beliefs. It will transform how we understand biological behavior. Quantitative and broad knowledge from systems biology, more than just its new tools, could soon bring major new insights to physiology and pharmacology.
Leading Edge
Analysis
Systems Biology: Evolving into the Mainstream
Systems approaches to biology are steadily widening their reach, but the road to integration and acceptance has been fraught with skepticism and technical hurdles. Interdisciplinary research teams at systems biology centers around the globe are working to win over the critics.
In 2008, the US National Research Council (NRC) appointed a panel of 16 leading biologists and engineers to determine how biology could best capitalize on the wellspring of technical advancements inundating the field. Led by Nobel laureate Philip Sharp, the committee published its findings a year later, calling for a ‘‘new biology’’ that would incorporate physics, chemistry, computer science, engineering, and mathematics. ‘‘Biological research is in the midst of a revolutionary change due to the integration of powerful technologies,’’ declares their report Biology in the 21st. Systems biology, which involves building mathematical models of living processes, sits at the heart of this new paradigm.
Systems approaches have been discussed and attempted for almost a century (see Essay by Arkin and Schaffer on page 844 of this issue). But in the late 1990s, a torrent of genomics data, together with the availability of unprecedented computing power, led Leroy Hood, founder of the Institute for Systems Biology in Seattle, and Hiroaki Kitano, a computer scientist with Sony Corporation, to propose a new, integrative ‘‘systems biology.’’ Like the recent blueprint from Sharp’s panel, this systems biology would draw on physics, chemistry, mathematics, and computer science to better understand life. Over the past decade, major funding agencies around the world have been
backing projects and programs devoted specifically to systems biology, and countless biologists have started applying systems approaches in their research. But systems biology’s assimilation into the biology mainstream is still a work in progress, and the road to acceptance has been bumpier than proponents anticipated.
Awkward Adolescence
In essence, Hood and Kitano were advocating a paradigm shift in how biological research is performed. ‘‘We’re looking at a period of evolution in how biology is done,’’ says Adriano Henney, director of Germany’s largest systems biology project, the Virtual Liver Network. Currently, ‘‘systems biology is in a period of awkward adolescence,’’ he says. Biologists still differ sharply over what the emphasis of systems biology ought to be and on its place within the wider world of biology. Additionally, some researchers still question the approach’s validity. And even Henney admits that systems biology ‘‘has not had that many real success stories’’ so far. Others disagree and claim clear successes for systems-based approaches. For example, Denis Noble, co-director of computational physiology at the University of Oxford, attributes the development of two successful angina treatments,
Ivabradine and Ranolazine, to systems approaches used in his laboratory since the 1960s, including a computer-based model of cardiac cells’ electrical rhythms. But there’s no question that integrative systems biology has proven tough to implement. Mathematical modeling has its natural home in fields of engineering, such as jet engine design, in which basic parameters (e.g., temperature and pressure) are readily measurable and their relationships well-established through the laws of thermodynamics. In contrast, the basic parameters of a biological system are hard to measure, vary over disparate timescales, and often reside in a noisy milieu of irrelevant signals. Thus, our global picture of how these components interact is often incomplete. And the mathematics required to model such a complex network is not part of most university biology curricula. ‘‘I’d be the first to admit that the complexities [of biological systems] are horrific,’’ says Noble. ‘‘And I don’t oppose the reductionist paradigm. I just say that it has to be complemented by an integrative paradigm.’’ He believes that even the most committed reductionists are coming around to this perspective.
Many Followers, Fewer Purists
The toughness of systems approaches has constrained the field’s growth, says Kitano. He started the annual International Society of Systems Biology meeting in 2000. It drew 800 attendees in 2004 and 1200 last year. ‘‘Genomics exploded, because if you buy a sequencing machine, anyone can do it,’’ he says. ‘‘But having to combine good biology with good mathematical modeling isn’t easy.’’ Kitano says that Japan’s two major public funding agencies, the Japan Science and Technology Agency (JST) and the Japan Society for the Promotion of Science (JSPS), are supporting systems biology programs.
He estimates their total annual investment in related fields at $30–$50 million but says that it ‘‘hasn’t grown as much as I would wish.’’ This picture in Japan reflects the global pattern: a large and increasing number of cell biologists, immunologists, and other biologists are incorporating systems approaches into their work, but the cadre of researchers expressly devoted to systems biology remains relatively small.
But Hood is working to expand that cadre, having built up the world’s largest center explicitly dedicated to systems biology research. The Institute for Systems Biology (ISB)—set up independently in 2000, when the University of Washington refused to host it—now has 300 staff and an annual budget of some $50 million. In April, it will occupy a 140,000-square-foot complex, with the intention of further expansion. Hood believes that real systems biology requires this kind of concentrated, multidisciplinary effort. But what matters most of all, he says, ‘‘is that systems biology is driven by biology, not by computation.’’ Some scientists have been too reliant on fancy mathematical models, often borrowed from other disciplines, he says. ‘‘We need to create models that are predictive and adaptive,’’ he explains. ‘‘Most of the mathematical models out there aren’t worth a hill of beans.’’ To remedy the situation, Hood argues that biology needs more institutes established and equipped specifically for systems biology, such as the Broad Institute at Massachusetts Institute of Technology and Harvard, ETH Zurich, and the European Molecular Biology Laboratory. The ISB is currently spreading its wings internationally by helping the government of Luxembourg set up Europe’s best-resourced institute for systems biology. Headed by German geneticist Rudi Balling, the Luxembourg Center for Systems Biomedicine will have an annual budget of $20 million.
Networking Hot Spots
Still, public money for systems biology research remains far more plentiful in the United States than elsewhere around the globe. The Center for Bioinformatics and Computational Biology at the US National Institute of General Medical Sciences has an annual budget of $120 million, and a large majority of this funding goes to systems biology research, says its director, Karin Remington. That includes $35 million to support a dozen systems biology centers at universities.
Additional centers are supported by other NIH institutes, including the National Cancer Institute. Considerably more money goes to systems biology projects through mainstream principal investigator grants. But
as Remington and others point out, quantifying the total systems funding is essentially arbitrary because it depends on which projects are regarded as taking a systems approach. The US Department of Energy says that it spent $90 million last year on systems biology through its genomic science program. This number doesn’t include its support for many facilities that apply systems approaches, such as the Joint Genome Institute at Walnut Creek, California. The National Science Foundation also has several systems programs, including the $50 million iPlant Collaborative for computational plant biology, based at the University of Arizona at Tucson. In Europe, the German Federal Ministry of Education and Research (BMBF) has provided the strongest support for systems biology. Early on, German policymakers decided to focus on the study of liver cells. The approach has now evolved into the Virtual Liver Network, which aims to model the whole organ. Overall, BMBF is spending about €50 million annually to fund nine major systems biology initiatives, including the Liver Network, which involves 69 research groups across Germany. In the UK, the Biotechnology and Biological Sciences Research Council (BBSRC) has led the systems biology effort, spending £50 million since 2004 to establish six centers at UK universities, each with a different emphasis. These centers were given 5 year, nonrenewable grants that will expire this year. The BBSRC has supported another series of systems projects to the tune of £23 million over 5 years. And, according to Colin Miles, BBSRC’s head of molecular and cell biology, the council is also supporting 100 standard grants in systems biology, worth a total of £13 million annually. There has been less organized support for systems biology from the Wellcome Trust and the Medical Research Council in the UK. Jim Smith, director of the MRC’s intramural labs and head of its small systems biology division, takes a cautious line on the field’s potential.
He says that development of tools needed to measure parameters, such as the tension of a particular cell substrate, is the first step on a long road to making systems biology work. One nation that has worked hard to establish a presence in the field is Switzerland. In 2008, a special decree from its parliament
established the SystemX initiative, a 4 year, SwF 100 million program headed by Rudolf Aebersold of ETH Zurich. According to Aebersold, SystemX involves 11 partner institutions in 14 main projects, ranging in scope from measurement technologies and modeling methods to mechanisms of signaling networks, such as those in yeast metabolism. Aebersold says that he doesn’t underestimate the difficulty of modeling biological systems. ‘‘The number of degrees of freedom is one thing; the systems are also much noisier, they have not been engineered. Every element that we look at comes along with a lot of historical baggage. And things are tightly interlinked in ways that we don’t yet understand.’’
What’s Next?
Most researchers who have chosen to specialize in systems biology agree with Hood that a large-scale, integrative, multidisciplinary approach is needed for systems biology to flourish. This means that funding agencies must continue to support these dedicated centers and programs. ‘‘You can’t just graft it onto cell or molecular biology,’’ Hood says. Still though, skeptics remain. Molecular biologist Sydney Brenner, for example, says that research money would be better spent on more detailed, reductionist studies of biology until we better understand how the genome programs the cell. ‘‘Eventually, that’s what you have to explain,’’ Brenner says. ‘‘I don’t want to
stop these guys. But I don’t like the sort of religion that says there is a simple path to heaven.’’ On balance, however, an increasingly integrative approach to biology seems inevitable, not so much because we know it will work, but because there is no alternative. The weaknesses of narrower approaches that rely on only one class of data, such as genome-wide association studies, and the limitations of the traditional, reductionist approach are obvious. Models of the cellular networks—imperfect as they may be— offer a route forward. The union of biology and mathematics may be a shotgun wedding, but maybe once its offspring walk and talk, initial misgivings will fade.
Colin Macilwain Edinburgh, UK DOI 10.1016/j.cell.2011.02.044
Cell 144, March 18, 2011 ©2011 Elsevier Inc. 841
Leading Edge
Book Review
Don’t Fear the Command Line!
Practical Computing for Biologists
Authors: Steven Haddock and Casey Dunn
Sunderland, MA, USA: Sinauer Associates, Inc. (2010). 538 pp. $59.95.
Although basic computing skills are routine for most biologists, most of us still struggle with more sophisticated tasks, beyond the ‘‘out of the box’’ solutions. Unfortunately, day-to-day lab work increasingly involves more advanced computing challenges. For example, you may find yourself wanting to merge several gene expression microarray files, convert them into a format compatible with the clustering software, perform statistical analysis on all the resulting clusters, and plot the results. Worse still, you might need to organize your laboratory’s microarray studies into a single local database and apply the above analysis pipeline to all of them. How does a bench biologist solve problems like these? A new textbook, Practical Computing for Biologists, by Steven Haddock of the Monterey Bay Aquarium Research Institute and the University of California, Santa Cruz and Casey Dunn of Brown University aims to teach biology researchers the computing skills necessary in such situations. The authors describe themselves as ‘‘biologists who also happen to have backgrounds in computing,’’ and they provide a problem-oriented approach to addressing data analysis and presentation challenges in modern biology. The book covers a wide range of subjects that truly justifies the title of ‘‘practical computing.’’ In addition to the usual programming-related topics, it also includes a thorough introduction to the programming environment, approaches to combining different programs together, a description of the basic text manipulation tools such as regular expressions, and even an introduction to dealing with digital art and images. As such the book is great value for the money, being at least three books in one. Readers will benefit from the breadth of topics covered, from Python programming and image manipulation, to databases and even electronic circuits. The most dedicated can even start learning about more advanced topics
such as relational databases, though the single chapter covers only the basics of setting up and managing MySQL. One potential omission is the lack of web-related topics, which could perhaps make an interesting addition to the second edition. The textbook’s broad scope is both an advantage and a shortcoming. On the positive side, any student who learns the full content will not only be versed in Python programming and image manipulation but also have a rudimentary understanding of databases and circuits. On
the negative side, some sections might be too advanced for the more casual reader. In addition, the overall organization of the book is sometimes puzzling, making it harder to identify the ‘‘must-read’’ parts. For example, chapter 20 provides useful tips on how to connect to remote computers through secure connections and how to control programs on your machine. These skills would be helpful to have before learning to program in chapters 7 through 13 because most modern scientific computing is performed on remote servers and clusters. In
addition, the section that focuses on combining programs and methods (Part IV) also includes a lone chapter on relational databases, advanced material that perhaps is not necessary for most readers. However, despite these shortcomings, this is an excellent textbook, especially its introduction to programming in Parts I–IV. The book is well written in a lively style that is grounded in many concrete examples. Logical formatting combined with an excellent use of color and icons makes it easy to follow, even for beginners. In addition, the clear explanations ensure that students are not simply retyping example programs but are actually learning how to solve real-world biological problems with programming. The choice of Python as the programming language is astute, given its relative lack of complexity, its broad utility, and the availability of robust scientific and biology-specific libraries. The beginning of the book (Part I) covers basic text manipulation, starting with installing and learning how to use a text editor. The authors chose Mac OS X as the setting for all of the book’s examples and specifically focus on TextWrangler, a free editor that only runs under this operating system. Although such a specific setting makes it easier for students to follow exercises, the choice of one particular operating system and a text editor that is fully tied to it seems like a limitation, especially given the proliferation of freely available text editors for every common platform. The rest of this section provides background for learning how to use regular expressions for search and replace-type operations on various text files, a very useful skill for analyzing biology datasets. The ‘‘meat’’ of the book is in Parts II–IV, which focus on programming both through command-line operations and in Python scripts. This section also cultivates an understanding of how to combine multiple methods together to address more complex tasks.
The authors deserve praise here, as the book strikes an excellent balance between teaching relatively advanced programming skills and remaining utterly practical and easy to follow. For example, a whole chapter is devoted to explaining debugging techniques, an area that many biologist programmers are
never taught, leading to many frustrating hours that force them either to rediscover debugging on their own or to give up programming entirely. The book also provides many practical pointers that can help streamline future programming efforts, such as learning how to best organize data in spreadsheets to simplify down-the-line processing and analysis. The remaining sections of the book are a mix, including sections on creating and working with vector art, manipulating images, and basic electronics, while others focus on practical topics such as installing and troubleshooting software and working on remote computers. The appendices are short but very useful, including a nice reference section for
topics covered earlier in the book (Python, shell and SQL commands, and regular expression terms) as well as a short guide to working with Windows and Linux operating systems. A nice and unusual teaching tool can be found in Appendix 5, where the same template program is presented in several different programming languages. The many examples presented throughout the book provide a solid foundation for the programming concepts and make the book easier to digest. The book is billed as ‘‘standing on its own,’’ and indeed, it could be a good self-study guide for students and professionals. Parts I–IV could also serve as study material for a ‘‘Programming for Biologists’’
course at either the undergraduate or graduate level, although in this case, the book would benefit from the addition of exercises and test questions at the ends of the chapters. Overall, Practical Computing for Biologists is a good choice and great value for a textbook. As a reference manual, there may be better options whose organization is more suited for fast information retrieval (such as the O’Reilly series). However, Practical Computing for Biologists provides a clear and sophisticated background in programming for the experimental scientist and lays the foundations for more advanced topics that many biologists are likely to find increasingly useful.
Olga G. Troyanskaya1,* 1Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA *Correspondence:
[email protected] DOI 10.1016/j.cell.2011.02.042
Essay
Network News: Innovations in 21st Century Systems Biology
Adam P. Arkin1,4,* and David V. Schaffer1,2,3,4
1Department of Bioengineering
2Department of Chemical and Biomolecular Engineering
3The Helen Wills Neuroscience Institute
University of California, Berkeley, Berkeley, CA 94720, USA
4Physical Biosciences Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.008
A decade ago, seminal perspectives and papers set a strong vision for the field of systems biology, and a number of these themes have flourished. Here, we describe key technologies and insights that have elucidated the evolution, architecture, and function of cellular networks, ultimately leading to the first predictive genome-scale regulatory and metabolic models of organisms. Can systems approaches bridge the gap between correlative analysis and mechanistic insights?
Systems biology aims to understand how individual elements of the cell interact to generate behaviors that allow survival in changeable environments and collective cellular organization into structured communities. Ultimately, these cellular networks assemble into larger population networks to form large-scale ecologies and thinking machines, such as humans. Given this central focus on codifying the organizational principles and algorithms of life, we argue that systems biology is not a newly emerging field, but rather a mature synthesis of thought about the implications of biological structure and its dynamic organization, ideas that have been brewing for more than a century.
To many scientists, the beginning of the last decade marked the definition and rise of the field of systems biology. However, systems biology’s conceptual origins date back almost 100 years. In 1917, D’Arcy Thompson formalized the first link between development, evolution, and physics in his treatise On Growth and Form, when he observed that shapes and function of biological systems were fundamentally determined by physical requirements and mechanical laws. In 1939, Walter Cannon, then chairman of the Department of Physiology at Harvard Medical School, coined the term ‘‘homeostasis’’ when he noted that organisms hold essential physiological variables at constant values despite a fluctuating environment (Cannon, 1939). In 1943, the
American mathematician Norbert Wiener, along with his coauthors, proposed that negative feedback loops would be central to maintaining this stability in biological systems (Rosenblueth et al., 1943), thus linking concepts of control and optimality with biological dynamics. Ten years later, the British developmental biologist Conrad Waddington laid some of the modern foundation for systems biology when he presciently conceptualized networks of cellular components (i.e., genes, cells, and tissues) as evolutionarily dynamical systems expressible as solutions to a series of simultaneous differential equations. Over his long career, Waddington argued for a truly dynamic systems theory of cellular decision making driven by gene expression and epigenetics (Waddington, 1954, 1977). When François Jacob and Jacques Lucien Monod unveiled the molecular mechanisms of gene regulation in 1962, they noted, ‘‘it is obvious from the analysis of these [bacterial genetic regulatory] mechanisms that their known elements could be connected into a wide variety of ‘circuits’ endowed with any desired degree of stability’’ (Jacob and Monod, 1962). During the ensuing decade, scientists across a wide array of disciplines started exploring the nonlinear dynamics in biochemical networks. Although experimental data to support their theoretical hypotheses were still largely missing, this period was quite productive, as
numerous fundamental principles came to light. These included the possible mechanisms and advantages of different biochemical switches and oscillators with and without biochemical noise (Goodwin, 1963); new models of metabolic control and engineering (Heinrich and Rapoport, 1974; Kacser and Burns, 1973); the reverse engineering of cellular networks (Bekey and Beneken, 1978); and abstracted models of these networks to understand the evolution and optimization of specific network ‘‘designs’’ (Kauffman, 1969). Indeed, these latter principles of how networks can be structured to achieve particular functions have been used more recently to explicitly predict natural network behavior. Thus, by the early 1970s, the concepts and components were all in place for what encompasses most of what we call ‘‘systems biology’’—the integrated molecular analysis of cellular networks. However, one roadblock remained: experimental data to support the models and hypotheses. This is where the last two decades have revolutionized the field of cellular network inference and analysis. Since the early 1990s, a vast array of technologies has dramatically improved the efficiency of manipulating cells genetically, the measurement of cellular components at high precision and completeness, and the dissemination of materials and information at unprecedented speeds (due to the other network revolution, which
Figure 1. A Simplified Scheme for Organizing Results in the Field of Systems Biology References are placed (subjectively) into this space according to whether their respective study focused more on mechanistic insight or on large-scale correlation analysis (the x axis) and whether the results were primarily principles about cellular networks or predictions of their behavior (the y axis). (Because of space constraints, only the last name of the first author is given).
has also left a conceptual mark on systems biology). Many of these biological technologies are scaling by a Moore’s Law-type (Moore, 1965) dynamic in which every few years, the amount of DNA that can be sequenced or synthesized doubles in size for half the cost (as has the number of transistors on a microchip) (Carlson, 2003). Clearly, this ability to read and write genomic information has profoundly accelerated systems biology.
Principles versus Prediction and Correlation versus Causation
This brief historical perspective suggests that discoveries in systems biology may be organized within a conceptual space (Figure 1). The y axis distinguishes between two relatively distinct objectives: deducing principles of network organization necessary for behaviors versus reverse engineering networks to predict their behavior. Strikingly, with the advent of scaling biological data, two general approaches have evolved to meet these objectives. On one hand, correlative
studies, which are usually on the genomic scale, infer relationships among genes and modules of function. These studies can also annotate genes and their products by a ‘‘guilt-by-association’’ approach in which detailed biochemical information available about one gene or system is transferred to others with correlated behaviors. This strategy contrasts with a ‘‘causal’’ approach in which direct interactions among molecules are tracked to glean mechanistic insights. Interestingly, as genetic and biochemical technologies climb the scaling curves, correlative and causal studies have become more intermingled. In other words, as it becomes possible to rapidly alter any gene (Paddison et al., 2004), modulate any gene’s expression level, and perhaps even reorganize large regions of the genome (Gibson et al., 2010; Wang et al., 2009; Warner et al., 2010), mechanistic studies will become available at a genome scale. Obviously, prediction is not truly antipodal to principles, nor is correlation
distantly removed from causation; indeed, the quadrants are connected. However, when we asked a group of colleagues which systems biology papers over the last decade have been most important to the field, the resulting set of landmark studies naturally clustered into different regions of this systems biology ‘‘plane’’ (Figure 1).
Correlative Approaches
Genome-scale data have fundamentally changed the types of questions that we ask about cellular systems. We can now observe how genomes dynamically change expression in response to environmental conditions and then correlate these results to other phenotypes, such as growth, fate choices, and biosynthetic productivity. Such experiments have inspired several classes of analysis that can vastly improve the data-driven annotation of genomes, more strongly link genotype to phenotype through inferred networks of interaction, and predict behaviors of cellular systems (Figure 1, lower-left quadrant). They have also led to a wide array of conceptual interpretations about the organization and evolution of cellular networks into evolvable modules, the decomposition of these networks into recurrent regulatory ‘‘motifs’’ with useful dynamical function, and the robustness of these architectures to mutation (Figure 1, upper-left quadrant).
Correlative Approaches to Predicting Function
One type of analysis infers properties of biomolecules from correlated changes of genome-scale RNA, protein, DNA copy number, or metabolite abundance as it varies in time and across conditions. Most often, genes sharing common expression dynamics are inferred to share regulators and possibly functional roles, at least at some level (Brown and Botstein, 1999). The challenge in this area has been isolating the set of correlated genes from the background of measurement noise and from those genes with merely coincident coexpression.
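As a minimal sketch of the coexpression idea described above, the following Python snippet computes Pearson correlations between expression profiles and reports gene pairs whose profiles are strongly correlated. The gene names, expression values, and the 0.9 cutoff are hypothetical and chosen purely for illustration; they do not come from any study cited here, and a real analysis would also have to deal with measurement noise and coincidental coexpression.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with correlation >= threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical expression values across five conditions (illustrative only).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}

for g1, g2, r in coexpressed_pairs(profiles):
    print(f"{g1} ~ {g2}: r = {r:.3f}")
```

In this toy data set only geneA and geneB pass the cutoff. The threshold is the crude part of the sketch: it stands in for the statistical work needed to separate genuinely coregulated genes from background.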
Although clustering techniques have been used for decades to derive relationships in complex correlative data sets such as those found in gene expression compendia, in 2000, Cheng and Church introduced an algorithm called ‘‘biclustering’’ that explicitly discovers ‘‘modules’’ from such data. This method identifies groups of genes, or ‘‘modules,’’
with similar patterns of expression over a specific subset of conditions (Cheng and Church, 2000). Individual genes may belong to multiple modules, thereby allowing inference of their numerous functions and combinatorial regulation. This important work inspired an increasing number of algorithms concerned with identifying related sets of biomolecules from complex data and inferring their ‘‘modular’’ function. These algorithms thereby opened the door to discovering an apparent hierarchical modular architecture to cellular regulation, which complements the more informal ‘‘pathway’’ organization with which biologists were familiar. The modules of coherent function also greatly simplify construction and interpretation of predictive models, as they enabled prediction of how different modules, rather than the individual constituent genes, are dynamically deployed—a system formulation that has far fewer variables and thus requires far less data. Gene expression can be an indirect measurement of a component’s contribution to a particular cellular process, and thus, genetic perturbations and activity assays may be required. In seminal work, Giaever et al. (2002) constructed a bar-coded deletion library for the entire genome of Saccharomyces cerevisiae. This library enabled single-pot assays of the relative growth or fitness of each strain when exposed to a specific condition (Giaever et al., 2002). In a subsequent study, a growth phenotype for nearly every gene in yeast was identified using 1000 chemical perturbations (Hillenmeyer et al., 2008). These types of studies can rapidly dissect the cellular targets of drugs and even directly identify specific transporters involved. In addition, these studies have shown that genes displaying changes in expression under a given condition are not always the genes necessary for responding functionally to that condition (Giaever et al., 2002). 
Although the implications of this result are not fully understood, one obvious conclusion is that different types of experiments are required to deduce or even predict function of genes.
Correlative Prediction of Organization
Another type of analysis seeks to infer relationships among gene modules; in other words, the strategy used to infer
function of a single gene is now extended to infer the underlying biochemical network (Arkin et al., 1997). In 2001, Ideker et al. combined genetic, macromolecular interactions and expression data (both protein and gene) to infer how the galactose utilization network in yeast is regulated (Ideker et al., 2001). They then used the resulting ‘‘influence network’’ to predict how the system responds to genetic perturbations. Some of these predictions were validated by experiments, yet others were proven incorrect, suggesting that properties of this well-characterized regulatory network still await discovery. Variants of this approach that applied additional, more sophisticated algorithms from multivariate statistics and machine learning quickly began to have a strong impact on the field. In particular, Hartemink et al. (2001) offered perhaps the first Bayesian approach for rating different network structural hypotheses (i.e., different patterns of molecular interaction) against data. Using a collection of 52 conditions, they demonstrated that it was possible to infer the regulatory interactions in the galactose pathway (Hartemink et al., 2001). Two years later, Segal et al. (2003) increased the power of these algorithms to infer the sets of genes (i.e., modules) regulated by particular transcription factors under specific conditions. This algorithm also correctly predicted new regulatory roles for less-characterized proteins (Segal et al., 2003). In particular, the model predicted that one putative transcription factor (Ypl230w) and two signaling molecules (Kin82 and Ppt1) were important for cellular response to three different conditions: heat shock, hypo-osmotic shift, and entry into stationary phase, respectively. Disrupting the genes elicited no expression phenotype in rich, unstressed conditions but strong changes in expression relative to wild-type in the condition predicted to be relevant for a given gene.
Applying a different statistical approach called ‘‘Partial Least Squares Regression,’’ Janes and colleagues undertook herculean efforts to measure and correlate mammalian cell survival, apoptosis, intracellular protein phosphorylation states, and kinase activities (thereby generating a data set with 7980 intracellular measurements) in response to combinations of
extracellular growth factor and cytokine inputs (Janes et al., 2005, 2006). The resulting model successfully predicted the level of apoptosis as a function of cytokine inputs and led to the new mechanistic insight that cascades of autocrine signaling were involved in mediating downstream cell responses to the extracellular cues. Shortly thereafter, in another landmark paper, Bonneau et al. (2007) demonstrated how the output of a new gene expression biclustering algorithm provided input to a clever regression algorithm that deciphers the transcriptional regulatory network of an archaeon (Halobacterium salinarum NRC-1) and predicts expression responses to >100 conditions (Bonneau et al., 2007). Recently, such correlative systems analyses are scaling up to link biomolecular networks to ecological networks. These pioneering studies are uncovering new scales of biological organization that should lead to entirely new principles of ecosystem function (Zhou et al., 2010). Nevertheless, it is not yet clear how to optimally design perturbation repertoires to achieve maximum accuracy in annotating gene function and regulation and in predictive model inference with minimal expense. Also, it has yet to be proven that the models obtained in these types of studies are sufficiently accurate or inexpensive to have an impact in a medical or industrial setting. Nonetheless, the ability to collect such compendia of data, even from diverse types of experiments, is rapidly becoming a feasible task for even a single laboratory to accomplish. We predict that the increased accessibility to these large-scale data sets will enable the detailed characterization of organisms after their genomes are sequenced and may, ultimately, change what it means to ‘‘complete’’ the genome of an organism.
Uncovering Principles of Network Organization
The fact that clear functional modules of gene expression can be inferred from correlative data sets implies the existence of underlying organizational principles for these networks. Similar hierarchies of modules have been found in large-scale protein interaction data and metabolic networks. Certain ‘‘scale-free’’ topologies of molecular interaction networks have received considerable attention in biology
and other fields. Such topologies, which seem to arise often in both natural and human-designed systems, are characterized by a pattern of interconnectedness among the nodes (e.g., proteins) in which the number of interactions per node follows a power law. Influential papers have suggested that these topologies lead to robustness to perturbation (Jeong et al., 2000) and, in the case of proteins, naturally arise due to the evolutionary process of duplication and divergence (Rzhetsky and Gomez, 2001). Likewise, in developmental biology, it has been argued for decades that for integrated cellular processes to evolve, they must be dissociable into hierarchical, modular units that can adapt their behavior with little interference from other such units. Thus, interaction and expression modules may allow rapid, effective rewiring and tuning of internal dynamics (Price et al., 2007; Singh et al., 2008), such that this ability to evolve may even be a selectable trait (Earl and Deem, 2004). However, caution must be taken in assigning evolutionary meaning to apparent modularity (Lynch, 2007). On slightly smaller size scales, certain topological motifs (that is, stereotypical small networks of regulatory interactions and chemical reactions) may have important control functions for cellular networks (Rao and Arkin, 2001). The availability of large-scale data has, in the last decade, enabled the discovery that certain motifs appear more than expected by random chance (Shen-Orr et al., 2002), including feed-forward and feedback loops (for more on feed-forward loops, see Review by Yosef and Regev on page 886 of this issue). These motifs have potential functional importance, such as noise rejection, and appear physiologically robust but also evolutionarily flexible with tunable function (Voigt et al., 2005). Milo et al.
(2002) hypothesized that these motifs might form a sort of basis set of dynamic functions from which complex optimized networks could be assembled in numerous contexts within and outside of biology (Milo et al., 2002). A beautiful theoretical paper by Segrè et al. (2005) determined another organizational principle of cellular networks. They not only showed that functional modules could be inferred from growth phenotypes of double knockout mutants, but also that
the epistatic interactions between pairs of genes in these modules always fell into one of two classes of interactions: buffering, in which epistasis diminishes the individual phenotypic effects of the two mutations, or aggravating, in which the deleterious, individual effects of two mutations are worsened by their combination (Segrè et al., 2005). Modules were thus ‘‘monochromatic’’ and never contained mixed type genes, a principle that was recently verified experimentally (Costanzo et al., 2010). These architectural principles uncovered from large sets of correlative data are evocative and well supported, but the challenge remains to find incontrovertible evidence for evolutionary selection of these architectures and to fully characterize their functional consequences.
Mechanistic Approaches to Study Causal Relationships
Although large-scale genomic data sets lend themselves to statistical analysis of correlation, causal analysis necessitates more detailed biochemical data on the networks’ effectors, such as proteins, second messengers, and metabolites. Unfortunately, the experimental analyses of these components have not enjoyed the same growth in scale as those of nucleic acids. That is, whereas volumes of data on one-dimensional genomes are readily available, causal analysis also requires multidimensional data on biomolecules’ interactions, reactions and their rates, localization, and transport. Mass spectrometry, imaging, genetic sensors, chemical probes, and other technologies are increasingly providing such data, but not yet at the same magnitude as genomic information. As a result, causal analyses of cellular networks initially focused on elucidating functional principles but are becoming increasingly empowered with data to enable prediction.
Uncovering Principles of Function
Large-scale models of biological networks face the challenges that molecular mechanisms are often complex and nonlinear (e.g., cooperative protein interactions and epigenetic regulation) and many of their inherent parameters are unknown (e.g., affinities and rate constants). However, in some model systems, the biochemistry is sufficiently well characterized to enable the construction of elegant, large-scale models.
As a prime example, Tyson and colleagues (Chen et al., 2004) modeled the cell-cycle control system of Saccharomyces cerevisiae using a set of 35 ordinary differential equations (ODEs) representing molecular mechanisms and mass action (Chen et al., 2004) (for more on modeling the cell cycle, see Primer by Ferrell et al. on page 874 of this issue). The goal of the model was not to account for the full complexity of the system but instead to provide a reasonable approximation of network behavior and to uncover dynamical principles of the architecture. Indeed, their model succeeded in accounting for a majority of mutant phenotypes simulated. Using a similar framework, El-Samad et al. (2005a, 2005b) modeled the heat shock response in Escherichia coli. Despite the simplicity of the response (deploying chaperones to keep proteins folded at higher temperature), this model uncovered complexity in the modular control structure of the system. It also demonstrated how the many feedback loops in this system confer the ability to respond quickly and robustly while also trying to minimize the energetic cost of heat shock protein expression (El-Samad et al., 2005a, 2005b). In another important study, Yi et al. used dynamical systems control theory to analyze bacterial chemotaxis (Yi et al., 2000), another system with well-characterized biochemistry. Building on the principle that negative feedback is often central to biological stability (Rosenblueth et al., 1943), the study found that integral feedback control underlies the robustness of network adaptation to significant perturbations in both the amounts and kinetic parameters of its component proteins. Interestingly, control engineers ‘‘reinvented’’ this strategy and proved that it is required, in certain conditions, to build robustness into electrical circuits and other systems. Deterministic representations of networks are compromised when their constituents are present at low concentrations or undergo slow reactions.
Moreover, early studies suggested that noise can significantly influence network function (Arkin et al., 1998). Elowitz et al. (2002) explored the principle that fluctuations in the quantities and reaction rates of gene expression machinery can cause noise in gene expression at both a global
level in a cell (extrinsic), as well as for an individual gene (intrinsic) (Elowitz et al., 2002). Indeed, subsequent single-molecule imaging studies directly confirmed that both translation (Yu et al., 2006) and transcription (Raj et al., 2006) can underlie such noisy protein expression. The principle that noise is inherent in biological networks raised the question of whether its effects on biological fitness are neutral, positive, or negative. Although the value of noise depends on the system, in certain cases, noise appears to make positive contributions to fitness. Organisms have a need to adapt to changing environments, and two adaptation strategies are sensing and responding to change or stochastically switching phenotype. Two theoretical studies arrived at the principle that, under some conditions, such as when transitions in selective environments are slow or cannot be sensed, stochastic fluctuations in an organism’s phenotype can increase its fitness (Kussell and Leibler, 2005; Wolf et al., 2005). In a study that combined experimental approaches with simulations, Weinberger et al. (2005) investigated this principle by analyzing stochastic effects in HIV infection (Weinberger et al., 2005). Low initial numbers of viral molecules, slow gene expression, and amplification by a positive feedback loop lead to very noisy gene expression, which for some infections yielded long delays in gene expression. This delayed expression contributed to the formation of latent HIV, which is clinically recognized as the most formidable barrier to the elimination of virus from a patient. In an elegant study, Acar et al. (2008) engineered Saccharomyces cerevisiae strains that stochastically switched phenotypes at different rates. Interestingly, they found that the fast-switching strain outgrew the slow-switching strain in environments undergoing rapid fluctuations, whereas the slow-switching strains were more fit in environments that fluctuated slowly (Acar et al., 2008). 
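The trade-off between switching rate and environmental fluctuation can be illustrated with a toy calculation. The Python sketch below is not the model used in any of the studies cited above; it assumes a hypothetical two-phenotype population in an environment that alternates between two states at a fixed period, with made-up growth factors (`g_match`, `g_mismatch`) and a per-step switching probability (`switch_prob`).

```python
from math import log


def long_term_growth_rate(switch_prob, env_period, steps=2000,
                          g_match=2.0, g_mismatch=0.1):
    """Average log growth per step of a two-phenotype population.

    The environment alternates between state 0 and state 1 every
    `env_period` steps. Cells whose phenotype matches the environment
    multiply by `g_match`, the others by `g_mismatch`. After growth,
    each cell flips phenotype with probability `switch_prob`.
    All parameter values are illustrative assumptions.
    """
    frac = [0.5, 0.5]  # fractions of the population in phenotype 0 and 1
    total_log_growth = 0.0
    for t in range(steps):
        env = (t // env_period) % 2
        # Growth step: matched phenotype grows fast, mismatched slowly.
        grown = [
            frac[p] * (g_match if p == env else g_mismatch)
            for p in (0, 1)
        ]
        # Switching step: a fraction of each phenotype converts to the other.
        switched = [
            grown[0] * (1 - switch_prob) + grown[1] * switch_prob,
            grown[1] * (1 - switch_prob) + grown[0] * switch_prob,
        ]
        total = switched[0] + switched[1]
        total_log_growth += log(total)
        # Renormalize so the numbers stay bounded over long runs.
        frac = [switched[0] / total, switched[1] / total]
    return total_log_growth / steps


for period in (1, 100):
    for prob in (0.01, 0.5):
        rate = long_term_growth_rate(prob, period)
        print(f"period={period:3d} switch_prob={prob:.2f} "
              f"log growth per step={rate:.3f}")
```

Under these assumptions, the rarely switching population grows faster when the environment changes slowly, and the frequently switching population grows faster when the environment flips every step. This matches the qualitative pattern described above, although the numbers themselves carry no biological meaning.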

Predictive Analysis of Network and Cell Function
The complexity of molecular mechanisms and the scarcity of biochemical parameters often make the development of predictive models challenging. Ibarra et al. (2002) created a constraints-based whole-cell metabolic model for E. coli, in which stoichiometric, thermodynamic, and other constraints mathematically yielded a solution space of allowed metabolic network states (Ibarra et al., 2002). This model, which requires fewer parameters than full dynamical models, can make predictions of network function that optimize growth under different environmental conditions. Indeed, when Ibarra et al. grew E. coli on a new carbon substrate, the cells evolved to the metabolic state predicted by the model. In some systems, substantive comparison to data can yield deterministic models increasingly capable of prediction. Hoffmann and colleagues (2002) analyzed the mammalian NF-κB system (Hoffmann et al., 2002), in which activation of this transcription factor upregulates expression of IκBα, a negative regulator of NF-κB. Integrating experimental data with a deterministic model enabled prediction of the oscillatory behavior of this module upon stimulation and perturbation. Finally, Schoeberl et al. (2002) developed a model with 94 ODEs to simulate epidermal growth factor signaling through MAP kinase, including receptor trafficking dynamics and intracellular phosphorylation cascades (Schoeberl et al., 2002). This is the first dynamic model of a large cellular signaling network that was carefully parameterized by prior experimental measurements and that yielded predictions of signal transduction dynamics, which were subsequently validated experimentally.

The Next Decade
As systems biology matures, the number of studies linking correlation with causation and principles with prediction continues to grow (Figure 1). Advances in measurement technologies that enable large-scale experiments across an array of parameters and conditions will increasingly meld these correlative and causal approaches, including correlative analyses leading to mechanistic hypothesis testing as well as causal models empowered with sufficient data to make predictions.
In addition, the increasing number of organisms sequenced and the increasing ease of measurement and genetic manipulation will enable deep comparison of systems across phylogenetic trees, thereby enhancing our understanding of mechanistic features that are necessary for function and evolution. The increasing integration of experimental and computational technologies will thus corroborate, deepen, and diversify the theories that the earliest systems biologists used logic to infer, thereby inching us ever closer to that central question: “What is Life?”

ACKNOWLEDGMENTS
We would like to thank our colleagues for suggesting a number of the papers that we reference in this work, and we apologize for not being able to include all of them. The authors would like to acknowledge the National Institutes of Health (R01 GM073010-01), and work conducted by ENIGMA was supported by the Office of Science, Office of Biological and Environmental Research of the US Department of Energy under contract number DE-AC02-05CH11231.

REFERENCES
Acar, M., Mettetal, J.T., and van Oudenaarden, A. (2008). Nat. Genet. 40, 471–475.
Arkin, A., Ross, J., and McAdams, H.H. (1998). Genetics 149, 1633–1648.
Arkin, A.P., Shen, P.-D., and Ross, J. (1997). Science 277, 1275.
Bekey, G.A., and Beneken, J.E.W. (1978). Automatica 14, 41–47.
Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). Cell 131, 1354–1365.
Brown, P.O., and Botstein, D. (1999). Nat. Genet. 21(1, Suppl), 33–37.
Canon, W. (1939). The Wisdom of the Body (London: Norton).
Carlson, R. (2003). Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science 1, 203–214.
Chen, K.C., Calzone, L., Csikasz-Nagy, A., Cross, F.R., Novak, B., and Tyson, J.J. (2004). Mol. Biol. Cell 15, 3841–3862.
Cheng, Y., and Church, G.M. (2000). Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 93–103.
Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh, J.L., Toufighi, K., Mostafavi, S., et al. (2010). Science 327, 425–431.
Earl, D.J., and Deem, M.W. (2004). Proc. Natl. Acad. Sci. USA 101, 11531–11536.
El-Samad, H., Khammash, M., Homescu, C., and Petzold, L. (2005a). Proceedings 16th IFAC World Congress. http://engineering.ucsb.edu/cse/Files/IFACC_HS_OPT04.pdf.
El-Samad, H., Kurata, H., Doyle, J.C., Gross, C.A., and Khammash, M. (2005b). Proc. Natl. Acad. Sci. USA 102, 2736–2741.
Janes, K.A., Albeck, J.G., Gaudet, S., Sorger, P.K., Lauffenburger, D.A., and Yaffe, M.B. (2005). Science 310, 1646–1653.
Schoeberl, B., Eichler-Jonsson, C., Gilles, E.D., and Müller, G. (2002). Nat. Biotechnol. 20, 370–375.
Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. (2002). Science 297, 1183–1186.
Janes, K.A., Gaudet, S., Albeck, J.G., Nielsen, U.B., Lauffenburger, D.A., and Sorger, P.K. (2006). Cell 124, 1225–1239.
Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Nat. Genet. 34, 166–176.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabási, A.L. (2000). Nature 407, 651–654.
Segrè, D., Deluna, A., Church, G.M., and Kishony, R. (2005). Nat. Genet. 37, 77–83.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Véronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., André, B., et al. (2002). Nature 418, 387–391.
Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.-Y., Algire, M.A., Benders, G.A., Montague, M.G., Ma, L., Moodie, M.M., et al. (2010). Science 329, 52–56.
Kacser, H., and Burns, J.A. (1973). Symp. Soc. Exp. Biol. 27, 65–104.
Shen-Orr, S.S., Milo, R., Mangan, S., and Alon, U. (2002). Nat. Genet. 31, 64–68.
Kauffman, S.A. (1969). J. Theor. Biol. 22, 437–467.
Singh, A.H., Wolf, D.M., Wang, P., and Arkin, A.P. (2008). Proc. Natl. Acad. Sci. USA 105, 7500–7505.
Kussell, E., and Leibler, S. (2005). Science 309, 2075–2078.
Voigt, C.A., Wolf, D.M., and Arkin, A.P. (2005). Genetics 169, 1187–1202.
Hartemink, A.J., Gifford, D.K., Jaakkola, T.S., and Young, R.A. (2001). In Pacific Symposium on Biocomputing 2001 (PSB01), R. Altman, A.K. Dunker, L. Hunter, K. Lauderdale, and T. Klein, eds. (New Jersey: World Scientific), pp. 422–433.
Lynch, M. (2007). Proc. Natl. Acad. Sci. USA 104 (Suppl 1), 8597–8604.
Waddington, C.H. (1954). Proceedings of the 9th International Congress of Genetics 9, 232–245.
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002). Science 298, 824–827.
Waddington, C.H. (1977). Tools for thought (New York: Basic Books).
Heinrich, R., and Rapoport, T.A. (1974). Eur. J. Biochem. 42, 89–95.
Moore, G.E. (1965). Electronics 38, 114–117.
Goodwin, B.C. (1963). (London: Academic Press).
Hillenmeyer, M.E., Fung, E., Wildenhain, J., Pierce, S.E., Hoon, S., Lee, W., Proctor, M., St Onge, R.P., Tyers, M., Koller, D., et al. (2008). Science 320, 362–365.
Hoffmann, A., Levchenko, A., Scott, M.L., and Baltimore, D. (2002). Science 298, 1241–1245.
Ibarra, R.U., Edwards, J.S., and Palsson, B.O. (2002). Nature 420, 186–189.
Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K., Bumgarner, R., Goodlett, D.R., Aebersold, R., and Hood, L. (2001). Science 292, 929–934.
Jacob, F., and Monod, J. (1962). Cold Spring Harb. Symp. Quant. Biol. 26, 193–211.
Paddison, P.J., Silva, J.M., Conklin, D.S., Schlabach, M., Li, M., Aruleba, S., Balija, V., O’Shaughnessy, A., Gnoj, L., Scobie, K., et al. (2004). Nature 428, 427–431.
Price, M.N., Dehal, P.S., and Arkin, A.P. (2007). PLoS Comput. Biol. 3, 1739–1750.
Raj, A., Peskin, C.S., Tranchina, D., Vargas, D.Y., and Tyagi, S. (2006). PLoS Biol. 4, e309.
Wang, H.H., Isaacs, F.J., Carr, P.A., Sun, Z.Z., Xu, G., Forest, C.R., and Church, G.M. (2009). Nature 460, 894–898.
Warner, J.R., Reeder, P.J., Karimpour-Fard, A., Woodruff, L.B., and Gill, R.T. (2010). Nat. Biotechnol. 28, 856–862.
Weinberger, L.S., Burnett, J.C., Toettcher, J.E., Arkin, A.P., and Schaffer, D.V. (2005). Cell 122, 169–182.
Wolf, D.M., Vazirani, V.V., and Arkin, A.P. (2005). J. Theor. Biol. 234, 227–253.
Rao, C.V., and Arkin, A.P. (2001). Annu. Rev. Biomed. Eng. 3, 391–419.
Yi, T.-M., Huang, Y., Simon, M.I., and Doyle, J. (2000). Proc. Natl. Acad. Sci. USA 97, 4649–4653.
Rosenbleuth, A., Wiener, N., and Bigelow, J. (1943). Philos. Sci. 10, 18–43.
Yu, J., Xiao, J., Ren, X., Lao, K., and Xie, X.S. (2006). Science 311, 1600–1603.
Rzhetsky, A., and Gomez, S.M. (2001). Bioinformatics 17, 988–996.
Zhou, J., Deng, Y., Luo, F., He, Z., Tu, Q., and Zhi, X. (2010). MBio. 1, e00169-10.
Leading Edge
Essay
The Cell in an Era of Systems Biology
Paul Nurse1,2,* and Jacqueline Hayles1
1Cancer Research UK, London Research Institute, 44 Lincoln’s Inn Fields, London WC2A 3LY, UK
2The Rockefeller University, 1230 York Avenue, New York, NY 10021-6399, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.02.045
The increasing use of high-throughput technologies and computational modeling is revealing new levels of biological function and organization. How are these features of systems biology influencing our view of the cell?

It is difficult to forecast the impact of systems biology on our understanding of the cell, an issue not made any easier by the fact that there is as yet no firm consensus as to what is meant by “systems biology,” although as our colleague Marc Kirschner has said, “we all seem to know it when we see it.” In that spirit, we will discuss here the various attributes and methodological approaches usually associated with systems biology, how they have been applied to cell biology, and how they may be developed to attain a better understanding of how cells work.

Reductionism and Holism
Discussions of systems biology often make a distinction between holistic and reductionist approaches. Our view is that scientific explanations and methodologies are essentially reductionist in nature. However, although it is difficult to imagine a scientific enquiry or explanation that is not reductionist, it is important to keep a focus on the behavior of whole systems in biology and to understand how the interactions and processes brought about by component parts acting at lower levels in a system are constrained by overall functions acting at higher levels. Sometimes those of a more holistic persuasion object to the dominance of molecular explanations in cell biology, but the fact is that most useful explanations in cell biology have to be in terms of molecules because molecules are the most relevant lower-level component into which to decompose the function and organization of the cell. However, not all explanations in biology are molecular; for example, developmental processes may be explained in terms of cell behavior (Towers and Tickle, 2009) and neurobiology by the action of neural networks (Langston et al., 2010), and likewise some insights in cell biology may not arise from strictly molecular explanations.

Biological Function and Organization
One approach to systems biology has been to emphasize the overall biological functions expressed at different levels of biological organization, such as the organelle, the cell, the tissue, the organ, and the organism. The level of the cell occupies a particularly important position (Brenner, 2010; Nurse, 2008) as it is the simplest unit exhibiting the characteristics of life, so understanding biological function at the level of the cell brings us closer to a better appreciation of the nature of life. The differing levels or units of organization from organelle to organism often exhibit teleonomic, that is, apparently purposeful behaviors (Monod, 1972). Examples of purposeful behavior include homeostasis and the maintenance of organizational integrity, the generation of spatial and temporal order, communication within and between the units of organization, and the reproduction of those units. The objective of this approach is to understand how teleonomic behaviors are generated at the different units of organization, usually in terms of molecules and of interactions between molecules. This view of systems biology stresses overall biological function of the relevant biological unit and is an approach encompassed by a number of traditional biological disciplines including physiology and forward genetics. An interest in the overall biological functions of a living organism naturally leads to consideration of the influence of ecology and evolution on how that organism works. This also applies to cells, and
increasingly those interested in systems approaches in cell biology are considering ecological and evolutionary perspectives (Ezov et al., 2006; Liti et al., 2009). Ecology is relevant to the relationships of a cell with other cells and with its physical environment and applies both to free-living single-celled organisms such as the yeasts and Protozoa and to cells within tissues. Ecological and evolutionary perspectives can help to understand how a cell has come to function as it does and to improve awareness of the selective pressures operating on a cell (Ding et al., 2010; Shah et al., 2009). Improved contacts between the ecological and evolutionary communities and cell biologists will enhance these studies.

Ensemble Descriptions
An approach often associated with systems biology is the generation of ensemble descriptions, that is, the collection of data describing the behavior of large numbers of components. This has been made possible by increasingly sophisticated technologies and analytical procedures, which have led to massively parallel collections of different types of data and the establishment of consortia such as ENCODE (http://encodeproject.org/) and databases such as the Saccharomyces Genome Database (SGD) and the fission yeast database (Pombase). The canonical ensemble approach has been whole-genome sequencing, which has allowed the description and comparison of gene contents for a wide range of organisms, facilitating molecular genetic analysis of biological mechanisms far beyond the limited numbers of genetically amenable model organisms. Genome sequencing is particularly useful for cell biology because all living organisms are composed of cells, and so orthologous genes important for cellular phenomena can be studied in a variety of organisms. Cells in different organisms or in different tissues of the same organism allow orthologous genes and related cellular phenomena to be investigated in a range of situations yielding informative comparisons. A good example has been the comparison of cell-cycle control in yeast cells with metazoan embryos (Gould and Nurse, 1989; Murray and Kirschner, 1989). Knowledge of whole-genome sequences also allows gene ablation experiments to be carried out on a genome-wide basis. Two major methodologies have been used: systematic gene deletions and libraries of small interfering RNAs (siRNAs). Other methodologies such as transposition have been used, particularly in prokaryotes (Zhang and Lin, 2009). To date, whole-genome gene deletions have only been completed in bacteria and yeasts (de Berardinis et al., 2008; Giaever et al., 2002; Kim et al., 2010) and have the advantage of completely ablating a gene function, making functional assignments and comparisons of gene functions between organisms more straightforward. siRNA libraries are very versatile as they can be employed in many organisms but can be subject to partial knockdown and off-target effects (Sioud, 2011). Successes using these approaches include the identification of all genes required for the viability of budding and fission yeast cells, for cellular processes such as centromeric cohesion in budding yeast (Marston et al., 2004), and for mitosis in human cells (Kittler et al., 2007; Neumann et al., 2010). Ensemble descriptions have been used extensively, including microarrays for monitoring the types and levels of RNAs, mass spectrometry for studying proteins, and mass spectrometry and chromatography for assessing metabolites.
Ensemble data collections have the advantage of avoiding the dangers of inadvertently “cherry-picking” data when studies are confined to work on limited numbers of gene products, which can result in attaching too much importance to a particular RNA or protein simply because it is the only one under investigation. Comparisons between different cells and organisms allow the identification of gene products that are implicated more universally in a particular cellular phenomenon. For example, a comparative approach has enabled the identification of RNAs whose levels change at transitions through specific cell-cycle stages in a conserved manner in more than one organism (Rustici et al., 2004).

Networks
The availability of ensemble datasets also allows the systematic grouping of genes with related functions. For example, catalogs of genes that when deleted have a similar cellular phenotype will identify gene sets required for particular processes. Similarly, RNA transcripts that behave in a similar manner, such as peaking in level at a particular phase of the cell cycle, reveal RNAs that potentially have related roles. In this way, the “toolkit” required for a specific cellular process can be assembled. Another grouping approach is to construct networks based on gene products that interact with each other. Such networks can be assembled using interaction trap methodologies (such as two-hybrid methodologies and immunoprecipitations) that assess whether molecules are in physical contact. Also important are catalytic interactions resulting in metabolic changes or chemical modifications, such as phosphorylation. Biochemical approaches can be complemented by high-throughput genetic interaction assays (screening, for example, for synthetic lethality), although these do not necessarily provide evidence for direct physical interaction between components. Green fluorescent protein tags can be used to identify molecular components that spatially colocalize, as an indicator of potential functional relationships (Huh et al., 2003; Matsuyama et al., 2006). These various methodologies allow networks to be built up that connect molecular components throughout the cell to generate an overall cellular interactome (Collins et al., 2007; Rual et al., 2005).
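The step from a list of pairwise interactions to a network of connected components can be sketched in a few lines. This is only an illustration under stated assumptions: the gene names and interaction pairs below are hypothetical placeholders rather than data from any of the studies cited, and every interaction is treated as an undirected, unweighted link, which ignores the distinctions between stable, catalytic, and genetic interactions drawn above.

```python
from collections import defaultdict


def build_network(interactions):
    """Turn a list of (gene, gene) pairs into an undirected adjacency map."""
    network = defaultdict(set)
    for gene_a, gene_b in interactions:
        network[gene_a].add(gene_b)
        network[gene_b].add(gene_a)
    return dict(network)


def connected_modules(network):
    """Group genes into sets that are linked by any chain of interactions."""
    seen = set()
    modules = []
    for start in sorted(network):
        if start in seen:
            continue
        module = set()
        stack = [start]
        while stack:                      # depth-first walk from `start`
            gene = stack.pop()
            if gene in module:
                continue
            module.add(gene)
            stack.extend(network[gene] - module)
        seen |= module
        modules.append(sorted(module))
    return modules


# Hypothetical interaction pairs, for illustration only.
example_pairs = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneE")]
example_network = build_network(example_pairs)
example_modules = connected_modules(example_network)
```

Here `example_modules` contains two groups, one linking geneA, geneB, and geneC and one linking geneD and geneE. A gene with links into two otherwise separate groups would merge them into one, which is the simplest sense in which a component can connect different parts of a network.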
The power of these networks is enhanced when they are combined with catalogs of genes involved in a particular cellular function because they lead to a better molecular understanding of the process of interest. Interaction networks can also identify linker components that connect different functional networks and processes (Zhong et al., 2009) and
may therefore have interesting regulatory roles. For many researchers, the creation of interaction networks is a major goal of systems cell biology that is aimed at providing complete networks of different cellular processes. However, achieving this aim may require more sophisticated languages or notations to fully describe how the networks work. Unlike simple networks, such as an airline transportation network, the interaction linkages in biological networks may represent stable complexes or transient catalytic reactions or may reflect the logical nature of the interaction, for example a representation of a negative or positive feedback. The notation used in network descriptions needs to reflect this complexity. It is also important to take account of the fact that the linkages are not always hard-wired because they are mostly based on chemistry with connections established by chemicals diffusing from one component to another. These chemical linkages can readily break and reform to connect different components and remodel the architecture of the network (Bray, 2009).

Quantitative Methodologies
Quantitative methodologies involving both large datasets and the modeling of data are frequently used in systems approaches. The massively parallel collections of data as generated by microarrays, for example, have superseded the more qualitative measurements of traditional molecular biology with techniques such as northern and western blotting. An advantage of good quantification is that it leads to a better appreciation of the effect of the number of molecules within a cell on biological processes. This allows an assessment of the stoichiometry between different molecular components as well as recognition that there may be only a few molecules of a particular type present within a cell.
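A short calculation shows why small molecule numbers matter. The sketch below assumes, purely for illustration, that the number of copies of a molecule per cell follows a Poisson distribution; that assumption and the function names are ours and are not taken from the studies cited here.

```python
import math


def poisson_pmf(copies, mean):
    """Probability of finding exactly `copies` molecules in a cell,
    assuming Poisson-distributed copy numbers with the given mean."""
    return math.exp(-mean) * mean ** copies / math.factorial(copies)


def coefficient_of_variation(mean):
    """Relative cell-to-cell spread (standard deviation / mean) for a
    Poisson distribution, which equals 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean)
```

Under this assumption, `poisson_pmf(0, mean)` is the fraction of cells expected to contain no copies at all, and `coefficient_of_variation(mean)` falls only as the square root of the mean, so the relative spread between cells is far larger for a species present at a few copies than for one present at thousands.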
Some gene transcripts in yeast are present at an average of less than one per cell (Velculescu et al., 1997), shifting our view of regulation from being driven by mass action, which is analog in character, to one that is more stochastic and digital. This means that greater attention is needed on the influence of molecular noise on the cell (Newman et al., 2006). An important issue is whether noise and
the variability it generates between cells are exploited for regulatory purposes, for example to ensure a range of cellular responses to environmental changes such as the competence state of Bacillus subtilis (Maamar et al., 2007; Suel et al., 2006). Monitoring of single cell behaviors has revealed that there is a significant variation between cells that was not appreciated previously by global population analyses (Choi and Kim, 2009). The combined use of photomicrography and robotic microscopes is capable of generating large amounts of data to investigate these effects of noise and stochastic behaviors in cells, for example cell size variability at the G1-S transition in budding yeast (Di Talia et al., 2007). Biochemical processes are generally modeled by deriving differential equations to calculate flux through pathways using in vivo estimates of the rate constants and concentrations of components. Although modeling in cell biology has become more popular in recent years, in part due to the massive increase in data available and to the migration of more theoretically inclined scientists to biology, in the past it was only pursued by a few committed individuals (Novak and Tyson, 1997; Tyson, 1983). The evolutionary biologist John Maynard Smith contended that the act of thinking about a model’s equations greatly clarifies understanding of how the model works. Biologists have a tendency to produce somewhat loosely formulated models summarized in the form of cartoons, and it is useful to subject these to the discipline of writing equations in the expectation that the thought imposed by equation writing will improve understanding of the model’s assumptions and dynamics. However, two major problems are often encountered when generating mathematical models for cell biology: the complexity of the pathways being modeled and the difficulty of estimating the appropriate values for rate constants and the concentration of components. 
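To make the idea of writing equations concrete, consider a deliberately simple two-step pathway, S -> I -> P, with first-order kinetics at each step. This is a minimal sketch and not a model of any real pathway: the species, the rate constants `k1` and `k2`, the starting amount `s0`, and the use of forward-Euler integration are all assumptions chosen for brevity.

```python
def simulate_pathway(k1, k2, s0=1.0, dt=0.01, t_end=20.0):
    """Forward-Euler integration of a two-step pathway S -> I -> P.

    dS/dt = -k1 * S
    dI/dt =  k1 * S - k2 * I
    dP/dt =  k2 * I
    """
    s, i, p = s0, 0.0, 0.0
    trajectory = [(0.0, s, i, p)]
    n_steps = int(round(t_end / dt))
    for n in range(1, n_steps + 1):
        flux_in = k1 * s       # flux through the first step (S -> I)
        flux_out = k2 * i      # flux through the second step (I -> P)
        s += -flux_in * dt
        i += (flux_in - flux_out) * dt
        p += flux_out * dt
        trajectory.append((n * dt, s, i, p))
    return trajectory


# Illustrative rate constants in arbitrary units.
trajectory = simulate_pathway(k1=1.0, k2=0.5)
```

Each tuple in `trajectory` holds the time and the amounts of S, I, and P. Even this toy case makes the two difficulties visible: every additional step adds a species and a rate constant, and the output depends directly on values of `k1` and `k2` that would have to be estimated for a real pathway.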
Biochemical pathways are often complex with many redundant functions, reflecting the fact that evolution does not always lead to, from an engineer’s point of view, the most efficient and economic solutions (Jacob, 1977; Saunders and Ho, 1976). Natural selection acts on pre-existing cells often by making additions to previously operational pathways, and these additions increase redundancy. In this respect modeling in biology may differ from physics where the aesthetic is to search for the simplest and most elegant model to explain a phenomenon. In biology, there are often more elements in a model than are strictly necessary and some act redundantly. The number of elements also increases the degrees of freedom available, reducing confidence in the outcome of the modeling process. There are several ways these difficulties can be addressed. One way used by modelers is to test the sensitivity of models to make sure that they still work well when the parameters used in the equations are varied. If the model still behaves robustly when different values are used in the equations, then confidence in the model is increased. It is also helpful if the biological function being studied can be recapitulated in vitro. Many quite complex processes can be carried out in concentrated Xenopus egg and cell extracts, for example important aspects of cell-cycle control (Blow and Laskey, 1986; Deibler and Kirschner, 2010). In Xenopus extracts, for instance, the levels of biochemical components can be both measured and manipulated more easily than is possible in a living cell. Fluorochrome-based sensor modules combined with light microscopy are also providing better ways of measuring concentrations within cells in vivo, such as protein levels in budding yeast cells (Newman et al., 2006). Another approach is to simplify the biochemical network underlying the biological function of interest, although this is only useful if the essential elements of that process are still maintained. The advantage of simplification is that it reduces the degrees of freedom available, making modeling easier and the outcome more reliable. An example of the potential for network simplification is seen with a recent genetic manipulation of the mitotic control network in fission yeast (Coudreuse and Nurse, 2010).
Many gene products have been identified that regulate the cyclin-dependent kinases (CDKs), and several quantitative models have been generated that can explain how CDKs are controlled to ensure orderly progression through the cell cycle at the correct cell size. Unexpectedly, a number of the gene products
thought to have been important in CDK regulation and mitotic control can be eliminated while still maintaining good size control over mitotic onset. This simplified network focuses attention on those elements that are sufficient to generate good mitotic control and cell size homeostasis, reducing the degrees of freedom and making modeling more straightforward. In a way, this is synthetic biology in reverse; rather than building a simple network de novo, such as an oscillator or clock (Danino et al., 2010; Elowitz and Leibler, 2000), a pre-existing network is simplified. Both approaches lead to the same outcome: the generation of simpler models still capable of explaining the biological function of interest.

Managing Information
Networks and quantitative modeling are closely associated with information management within the cell. Many of the most insightful explanations in cell biology have been made in terms of information flow; this involves understanding how the cell gathers, processes, stores, and uses information in the context of a biological function or phenomenon of interest (Nurse, 2008). Information is gathered from both outside and inside the cell and is processed and communicated to different parts of the cell. Storage of information occurs over a wide range of timescales, from the long timescale seen in heredity (encoded in the DNA sequence and possibly mediated by epigenetics), through to the medium timescale seen with mRNA and gene transcriptional circuits, to the short timescale seen with activated small G proteins (Bonasio et al., 2010; Etienne-Manneville and Hall, 2002; Roy et al., 2010). Information is used to direct cell behaviors, coordinating appropriate responses to changing circumstances. Recognition of the significance of information was crucial at the beginning of molecular biology, particularly in dealing with how information flowed from gene to protein, although it applies to all aspects of cell behavior.
The iconic examples from that time are the concepts that DNA acts as a digital information storage device (Brenner et al., 1961) and that the lac operon regulatory circuit forms a negative feedback loop (Dickson et al., 1975; Lin and Riggs, 1975; Ohki and Sato, 1975). Systems
biology, by generating datasets, networks, and models, provides an opportunity to understand information flow through the cell. In our view, this is one of the most important aspects of systems analyses in cell biology and will help move studies from descriptions of biological phenomena to a better understanding of how they work. Information management involves various processing elements or logic modules that carry out particular computational functions, which can be categorized according to the type of function they carry out. For example, a negative feedback loop communicates information from a late step in a pathway to an earlier step, and if there is increased flow at the later step, then a negative signal is sent to the earlier one, reducing overall flow through the pathway and thus maintaining homeostasis. In contrast, a positive feedback loop sends a positive signal that increases overall flow to generate a switch to maximum flow through the pathway. More complex logic modules produce more sophisticated responses, such as toggles switching between two states, timers measuring elapsed time, oscillators cycling in time, and gradients measuring cellular dimensions (Tyson et al., 2003). The operation of these modules depends on how the various components are linked together and the shapes of the response curves that determine the character of those interactions. There is a need to build on past work to construct a full listing of the different types of logic modules that are operational in cells. Working with engineers and cyberneticists should be helpful in achieving this goal (Alon, 2003; Nurse, 2008). An emphasis on information management may reveal some unexpected features of cells. An example is the potential for dynamics to enrich information transfer through signaling pathways. Such pathways are usually thought of as on/off switches that can only be in one of two states. 
However, if signals are pulsed down the pathway and the output depends on the dynamics of those pulses, then more information can be communicated (von Kriegsheim et al., 2009). This is the same idea that forms the basis for Morse code, a system that communicates complex messages by utilizing the dynamics produced by
a series of dots and dashes. Information is also managed in the three dimensions of cellular space (Scott and Pawson, 2009). Not only must spatial information be generated to define the space of the cell but the availability of various cellular compartments means that different information can be stored in different places and a wide variety of connections between logic modules can be formed and reformed through diffusible chemicals. The richness of behavior possible with this arrangement is reminiscent of the complex behaviors normally associated with neural networks.

A Cell Biology Systems Initiative
The cell is the simplest unit that exhibits the characteristics of life and so is likely to be the most effective level in biology to investigate how life works. The tools and intellectual framework of systems biology will provide great opportunities to achieve this objective by generating the data needed and the approaches required for a comprehensive understanding of the cell. This applies to all types of cells including bacteria, which can have small genomes and where there have been great advances in recent years (Wang et al., 2010). But it is with eukaryotic cells where the greatest benefits are likely to be realized because already much work has been achieved and the conservation of many processes across eukaryotes means that different cell types with differing characteristics and strengths in methodologies can be used to study the same biological phenomena. It’s perhaps not surprising that we, as two yeast geneticists, would recommend the unicellular budding and fission yeasts as good models for studying many aspects of cell biology using a systems approach. Both organisms are eukaryotes with small genomes of only 5000–6000 genes, making systems genomic analyses more straightforward to carry out.
The availability of genome-wide gene deletion collections together with other methods for saturation forward genetics (Guo and Levin, 2010) allows the identification of nearly all the genes in the genome that are involved in a particular cellular function or process. Application of interaction trap procedures together with bioinformatics will help to identify the biochemical roles of gene
functions and to group them into the networks responsible for the process. Genetics can be used to simplify the network, focusing attention on the core gene functions responsible for the process to help with subsequent modeling. Comparisons with cells in other organisms will test whether the conclusions being reached can be generalized across species including human cells and also allow in vitro systems to be developed especially with Xenopus egg extracts. A major aim with the initiative will be to explain as often as possible a cell biological function or process in terms of information management. This requires interdisciplinary approaches and is not so straightforward because cell biology experiments generally yield biochemical results, and there are no easy ways to translate chemistry into the information processing elements or logic modules that we have argued are needed for good understanding. It would be helpful if there were more effective ways to model pathways and networks without having to know all the rate constants and concentrations involved, and we have previously outlined possible procedures that may help with that elsewhere (Nurse, 2008). Despite these difficulties, we are now well placed to apply the methods of systems biology more comprehensively to cell biology to gain greater insight into how cells work.
ACKNOWLEDGMENTS
We would like to thank our colleagues at The Rockefeller University and CRUK London Research Institute, particularly L. Weston and J. Wu, for helpful comments on the manuscript. We would also like to acknowledge the many researchers whose work we have not been able to reference because of space constraints.
REFERENCES
Alon, U. (2003). Science 301, 1866–1867.
Blow, J.J., and Laskey, R.A. (1986). Cell 47, 577–587.
Bonasio, R., Tu, S., and Reinberg, D. (2010). Science 330, 612–616.
Bray, D. (2009). Wetware: A Computer in Every Living Cell (New Haven, CT: Yale University Press).
Brenner, S. (2010). Philos. Trans. R. Soc. Lond. B Biol. Sci. 365, 207–212.
Brenner, S., Jacob, F., and Meselson, M. (1961). Nature 190, 576–581.
Cell 144, March 18, 2011 ©2011 Elsevier Inc. 853
Choi, J.K., and Kim, Y.J. (2009). Nat. Genet. 41, 498–503.
Collins, S.R., Miller, K.M., Maas, N.L., Roguev, A., Fillingham, J., Chu, C.S., Schuldiner, M., Gebbia, M., Recht, J., Shales, M., et al. (2007). Nature 446, 806–810.
Coudreuse, D., and Nurse, P. (2010). Nature 468, 1074–1079.
Danino, T., Mondragon-Palomino, O., Tsimring, L., and Hasty, J. (2010). Nature 463, 326–330.
de Berardinis, V., Vallenet, D., Castelli, V., Besnard, M., Pinet, A., Cruaud, C., Samair, S., Lechaplais, C., Gyapay, G., Richez, C., et al. (2008). Mol. Syst. Biol. 4, 174.
Deibler, R.W., and Kirschner, M.W. (2010). Mol. Cell 37, 753–767.
Di Talia, S., Skotheim, J.M., Bean, J.M., Siggia, E.D., and Cross, F.R. (2007). Nature 448, 947–951.
Dickson, R.C., Abelson, J., Barnes, W.M., and Reznikoff, W.S. (1975). Science 187, 27–35.
Ding, L., Ellis, M.J., Li, S., Larson, D.E., Chen, K., Wallis, J.W., Harris, C.C., McLellan, M.D., Fulton, R.S., Fulton, L.L., et al. (2010). Nature 464, 999–1005.
Elowitz, M.B., and Leibler, S. (2000). Nature 403, 335–338.
Etienne-Manneville, S., and Hall, A. (2002). Nature 420, 629–635.
Ezov, T.K., Boger-Nadjar, E., Frenkel, Z., Katsperovski, I., Kemeny, S., Nevo, E., Korol, A., and Kashi, Y. (2006). Genetics 174, 1455–1468.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., et al. (2002). Nature 418, 387–391.
Gould, K.L., and Nurse, P. (1989). Nature 342, 39–45.
Guo, Y., and Levin, H.L. (2010). Genome Res. 20, 239–248.
Huh, W.K., Falvo, J.V., Gerke, L.C., Carroll, A.S., Howson, R.W., Weissman, J.S., and O’Shea, E.K. (2003). Nature 425, 686–691.
Jacob, F. (1977). Science 196, 1161–1166.
Kim, D.U., Hayles, J., Kim, D., Wood, V., Park, H.O., Won, M., Yoo, H.S., Duhig, T., Nam, M., Palmer, G., et al. (2010). Nat. Biotechnol. 28, 617–623.
Kittler, R., Pelletier, L., Heninger, A.K., Slabicki, M., Theis, M., Miroslaw, L., Poser, I., Lawo, S., Grabner, H., Kozak, K., et al. (2007). Nat. Cell Biol. 9, 1401–1412.
Langston, R.F., Ainge, J.A., Couey, J.J., Canto, C.B., Bjerknes, T.L., Witter, M.P., Moser, E.I., and Moser, M.B. (2010). Science 328, 1576–1580.
Lin, S., and Riggs, A.D. (1975). Cell 4, 107–111.
Liti, G., Carter, D.M., Moses, A.M., Warringer, J., Parts, L., James, S.A., Davey, R.P., Roberts, I.N., Burt, A., Koufopanou, V., et al. (2009). Nature 458, 337–341.
Maamar, H., Raj, A., and Dubnau, D. (2007). Science 317, 526–529.
Marston, A.L., Tham, W.H., Shah, H., and Amon, A. (2004). Science 303, 1367–1370.
Matsuyama, A., Arai, R., Yashiroda, Y., Shirai, A., Kamata, A., Sekido, S., Kobayashi, Y., Hashimoto, A., Hamamoto, M., Hiraoka, Y., et al. (2006). Nat. Biotechnol. 24, 841–847.
Monod, J. (1972). Chance and Necessity: An Essay on the Natural Philosophy of Modern Biology (New York: Vintage Books).
Murray, A.W., and Kirschner, M.W. (1989). Nature 339, 275–280.
Neumann, B., Walter, T., Heriche, J.K., Bulkescher, J., Erfle, H., Conrad, C., Rogers, P., Poser, I., Held, M., Liebel, U., et al. (2010). Nature 464, 721–727.
Newman, J.R., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi, J.L., and Weissman, J.S. (2006). Nature 441, 840–846.
Novak, B., and Tyson, J.J. (1997). Proc. Natl. Acad. Sci. USA 94, 9147–9152.
Nurse, P. (2008). Nature 454, 424–426.
Ohki, M., and Sato, S. (1975). Nature 253, 654–656.
Roy, S., Ernst, J., Kharchenko, P.V., Kheradpour, P., Negre, N., Eaton, M.L., Landolin, J.M., Bristow, C.A., Ma, L., Lin, M.F., et al. (2010). Science 330, 1787–1797.
Rual, J.F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G.F., Gibbons, F.D., Dreze, M., Ayivi-Guedehoussou, N., et al. (2005). Nature 437, 1173–1178.
Rustici, G., Mata, J., Kivinen, K., Lio, P., Penkett, C.J., Burns, G., Hayles, J., Brazma, A., Nurse, P., and Bahler, J. (2004). Nat. Genet. 36, 809–817.
Saunders, P.T., and Ho, M.W. (1976). J. Theor. Biol. 63, 375–384.
Scott, J.D., and Pawson, T. (2009). Science 326, 1220–1224.
Shah, S.P., Morin, R.D., Khattra, J., Prentice, L., Pugh, T., Burleigh, A., Delaney, A., Gelmon, K., Guliany, R., Senz, J., et al. (2009). Nature 461, 809–813.
Sioud, M. (2011). Methods Mol. Biol. 703, 173–187.
Suel, G.M., Garcia-Ojalvo, J., Liberman, L.M., and Elowitz, M.B. (2006). Nature 440, 545–550.
Towers, M., and Tickle, C. (2009). Int. J. Dev. Biol. 53, 805–812.
Tyson, J.J. (1983). J. Theor. Biol. 104, 617–631.
Tyson, J.J., Chen, K.C., and Novak, B. (2003). Curr. Opin. Cell Biol. 15, 221–231.
Velculescu, V.E., Zhang, L., Zhou, W., Vogelstein, J., Basrai, M.A., Bassett, D.E., Jr., Hieter, P., Vogelstein, B., and Kinzler, K.W. (1997). Cell 88, 243–251.
von Kriegsheim, A., Baiocchi, D., Birtwistle, M., Sumpton, D., Bienvenut, W., Morrice, N., Yamada, K., Lamond, A., Kalna, G., Orton, R., et al. (2009). Nat. Cell Biol. 11, 1458–1464.
Wang, Y., Cui, T., Zhang, C., Yang, M., Huang, Y., Li, W., Zhang, L., Gao, C., He, Y., Li, Y., et al. (2010). J. Proteome Res. 9, 6665–6677.
Zhang, R., and Lin, Y. (2009). Nucleic Acids Res. 37, D455–D458.
Zhong, Q., Simonis, N., Li, Q.R., Charloteaux, B., Heuze, F., Klitgord, N., Tam, S., Yu, H., Venkatesan, K., Mou, D., et al. (2009). Mol. Syst. Biol. 5, 321.
Leading Edge
Essay
Informing Biological Design by Integration of Systems and Synthetic Biology
Christina D. Smolke1,* and Pamela A. Silver2,*
1Department of Bioengineering, Stanford University, 473 Via Ortega, Stanford, CA 94305-4201, USA
2Department of Systems Biology and Wyss Institute of Biologically Inspired Engineering, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
*Correspondence: [email protected] (C.D.S.), [email protected] (P.A.S.)
DOI 10.1016/j.cell.2011.02.020
Synthetic biology aims to make the engineering of biology faster and more predictable. In contrast, systems biology focuses on the interaction of myriad components and how these give rise to the dynamic and complex behavior of biological systems. Here, we examine the synergies between these two fields. Biology is the technology of this century. The potential uses of biology to improve the human condition and the future of the planet are myriad. Over the last century, humans have used biology to make many useful things, in part based on discoveries from molecular biology. In addition, researchers have redesigned biological systems to test our fundamental understanding of their components and integrated functions. However, the complexity and reliability of engineered biological systems still cannot approach the diversity and richness exhibited by their natural counterparts. It is then the combined promise of systems biology and synthetic biology that may drive transformative advances in our ability to program biological function. One recent example of the successful engineering of a biological system to address a global challenge in health and medicine is the creation of microbes that produce a precursor to the antimalarial drug artemisinin (Ro et al., 2006). By shifting synthesis from the natural production host (a plant) to one more optimized for rapid production times and inexpensive scale up (a microorganism), researchers were able to develop a process that enabled cheaper supply of this drug, providing a more accessible cure for a disease devastating third world countries. However, the research phase of this project required an investment of over $25 million and 150 person-years of highly trained researcher effort. This investment cannot realistically be replicated for every chemical or material to which we would
apply this approach. Instead, imagine a time when a bioengineer designs a system at the computer, orders the necessary DNA encoding the specified system, and then begins the actual experiment of turning it into life. Thus, one overarching goal of synthetic biology is to make the engineering of biology faster, affordable, and more predictable. Biological systems and their underlying components offer a number of functional parallels with engineered systems. For example, biological sensors are exquisitely sensitive; the olfactory system can detect single odorant molecules and decode them. Biological systems can send and receive signals rapidly and in a highly specific manner. Pathways exist to sense and respond to the environment. Plants and microbes can use sunlight as an energy source. However, biological systems are also uniquely capable of self-replication, mutation, and selection, leading to evolution. Synthetic biologists aim to take advantage of these parallels and develop engineering principles for the design and construction of biological systems. However, an open question is whether we understand biological systems sufficiently to be able to redesign them to fulfill specific requirements. Engineers enjoy the concept of interchangeable parts and modularity. Biology offers many sources of potential modularity but exhibits nonmodular features as well. For many years the gene was regarded as a fundamental modular unit of biology. As such, a gene is capable of transferring a particular phenotype to the
organism. However, we now know that genes display more fine-grained modularity in the form of promoters, open reading frames (ORFs), and regulatory elements. mRNAs contain sequences important for proper intracellular targeting and degradation. Proteins often contain targeting sequences, reactive centers, and degradation sequences. And lastly, entire pathways are modular in that some signaling pathways can be transferred from one organism to another to reconstruct a new state in the engineered organism. This modularity underlies one of the core concepts of synthetic biology—the notion that one can assemble biological systems from well-defined ‘‘parts’’ or modules (Endy, 2005). However, modular assembly approaches have largely remained confounded by the effects of context—that is, the nonmodular aspects of biology. For example, where a gene or an associated regulatory element is located in the genome can impact expression and thus its function. In addition, the location of regulatory elements relative to each other and ORFs can impact their encoded function (Haynes and Silver, 2009). Further analyses provided by systems biology may help to guide the development of standard strategies for assembling genetic modules into functional units. Approaches to Synthetic Biology Given that the goals of synthetic biology are to make the engineering of biology faster and more predictable, and to harness the power of biology for the
common good, the development of new approaches that support the design and construction of genetic systems has been a core activity within the field. Although advances have been made in both areas (fabrication and design), our ability to construct large genetic systems currently surpasses our ability to design such systems, resulting in a growing ‘‘design gap’’ that is a critical issue that synthetic biologists must address. The ability to synthesize large pieces of DNA corresponding to operons, entire pathways, chromosomes, and genomes in a rapid and predictable way is a key approach to system fabrication. Systems biology has provided numerous templates with the abundance of sequenced genomes being deposited daily into publicly accessible databases. Some progress has recently been reported, including the resynthesis of a bacterial genome and its successful insertion into a different bacterial host (Gibson et al., 2010). However, it took researchers nearly 15 years and approximately $30 million to develop various fundamental aspects of this project. Much of this time and cost was methods development that will hopefully reduce the resources needed to carry out such projects in the future. In addition, new high-throughput methods for large-scale DNA synthesis have been recently described (Matzas et al., 2010; Norville et al., 2010; Tian et al., 2009). However, much more work is still needed to develop these technologies to the point where they are accessible to the majority of researchers (that is, in terms of cost and reliability), and systems biology may provide important clues. For example, faster and more reliable ways to synthesize large pieces of DNA may be uncovered by examination of new organisms and thereby reveal new nonchemical methods for DNA synthesis. A second approach is to develop the methods to generate new component functions that can act as sensors, regulators, controllers, and enzyme activities, for example. 
These components will in turn extend the set of parts from which synthetic biologists can build genetic devices and systems. Synthetic biologists work not only with design of DNA that encodes genetic circuits but also with molecular design of biomolecules, such
as RNAs and proteins, to perform new functions. Substantial efforts in the field of protein engineering have contributed to the diversity of functions exhibited by protein components (Dougherty and Arnold, 2009). However, even with these advances, the diversity of component activities that is currently available as parts has been limited, thus limiting the design of genetic circuits. Systems biology may aid in the development of effective strategies for generating new component functions by providing information on how Nature has evolved different functions for macromolecules.
A third approach is the predictable design of complex genetic circuits that lay the foundation for new biological devices and systems. Many circuits designed and built thus far have relied on our fairly detailed knowledge of how gene transcription is regulated. For example, synthetic circuits have applied concepts of positive and negative feedback to generate systems that sense stimuli, remember past events, and promote cell death in both prokaryotic and eukaryotic cells (Burrill and Silver, 2010; Sprinzak and Elowitz, 2005). However, many of these systems have been built in a fairly ad hoc manner, requiring substantial troubleshooting and iterative design to exhibit desired functions, and lack the robust performance standards one might expect as an engineer. Going forward, synthetic biologists need to better understand the parts underlying system design, how to predict their function in a particular genetic context, and how to predict their integrated function with other system parts (Ellis et al., 2009; Savageau, 2011). This biological understanding will then be integrated with computational models to develop computer-aided design tools.
What Does Systems Biology Mean to Synthetic Biology?
As with synthetic biology, many different types of research have been categorized as systems biology.
Broadly speaking, systems biology represents an approach to biological research that focuses on the interactions between components of a biological system and how those interactions give rise to the dynamic behavior of the system in contrast to the more traditional molecular biologists’ reductionist
approach of studying components in isolation from each other (Alon, 2007). Systems biology has been associated with new technologies and methods that allow for quantitative measures of components and component interactions within biological systems, particularly those that allow for genome-wide measurements. In addition, because many of these technologies result in large datasets, systems biology has also been associated with computational tools that support the integration and analysis of these datasets to identify static relationships and interactions between components. Finally, as one of the ultimate goals of systems biology is to be able to predict a system’s dynamic behavior from the component parts, computational tools that can model biological systems-level function from underlying components are associated with this field. However, there are currently a number of challenges and limitations facing the field of systems biology. Paramount is determining how to correctly analyze and draw valid conclusions from large amounts of different types of data ranging from genomics and metabolomics to molecular dynamics in many single cells. Effectively addressing this problem may require new mathematical and computer science approaches. A second key challenge is knowing what kind of measurements to make and how accurate these measurements need to be to fully understand a biological system. Effectively addressing this challenge will require a re-evaluation of how measurements have been made over the past 10 years in systems biology (for instance, the movement from two-hybrid interactions to mass spectrometry to measure protein interactions). It will also require the development of even more sensitive strategies to make time-dependent measurements inside many cells simultaneously. Taken together, systems biology is confronted with the problem of both sensitivity and scale. 
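To make the idea of predicting a system's dynamic behavior from its component parts concrete, the following is a minimal sketch and is not taken from this Essay. It assumes a single gene whose product represses its own synthesis, described by the rate equation dx/dt = beta / (1 + (x/K)^n) - gamma * x. The function name and all parameter values (`beta`, `K`, `n`, `gamma`) are hypothetical and chosen only for illustration; a model of a real circuit would need measured values.

```python
def simulate_autorepressor(beta=10.0, K=1.0, n=2.0, gamma=1.0,
                           x0=0.0, dt=0.01, t_end=20.0):
    """Forward-Euler integration of dx/dt = beta / (1 + (x/K)**n) - gamma * x.

    beta  : maximal production rate (assumed value)
    K     : repression threshold (assumed value)
    n     : Hill coefficient (assumed value)
    gamma : first-order loss rate from degradation and dilution (assumed value)
    """
    steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for _ in range(steps):
        production = beta / (1.0 + (x / K) ** n)  # repressed synthesis
        x += dt * (production - gamma * x)        # one explicit Euler step
        trajectory.append(x)
    return trajectory


trajectory = simulate_autorepressor()
steady_state = trajectory[-1]
# With the assumed defaults the fixed point satisfies x * (1 + x**2) = 10,
# so the trajectory should settle close to x = 2.
print(f"steady-state level: {steady_state:.4f}")
```

Even in a model this small, the predicted steady state depends on every parameter, which illustrates why the question of which measurements to make, and how accurately, matters for systems-level prediction.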
Does the ultimate goal of synthetic biology of the predictable design, construction, and characterization of biological systems rely on findings and approaches from systems biology? Design, analysis, and understanding are integrally linked in engineering methodology. Therefore, it is reasonable to assume that advances
gained through systems biology in our understanding of how biological components interact to form integrated systems will support efforts in synthetic biology to design engineered biological systems. However, there is a different viewpoint that argues that the design principles that systems biologists elucidate for natural biological systems are products of evolution over many millions of years and thus are limited by the history of what came before. It is possible, then, that the design principles elucidated for natural biological systems may not be necessary or optimal for the engineered systems that synthetic biologists may design from scratch on a computer with less of a restriction of generating new function through evolutionary processes and timescales. Both of these views have merit, and the reality is likely somewhere in between: even if synthetic biologists design biological systems to have certain properties that are not generally found in natural systems (i.e., optimized for troubleshooting, tailoring, reuse, removal, designer identification), a greater understanding of how components interact to form integrated systems will inform and support the design process.
Synergy between Systems and Synthetic Biology
Although synthetic biology did not directly emerge from systems biology, there are important parallels between the two fields. Both systems biology and synthetic biology represent fundamental shifts in approaches from the fields they grew out of. Whereas systems biology represents a shift in the more traditional reductionist approach taken in biological research from studying components in isolation to studying integrated components, synthetic biology represents a shift in emphasizing engineering principles and methodology in building biological systems from more traditional genetic engineering research. In addition, both fields take a bottom-up approach, with systems biology emphasizing the understanding of biological systems from the underlying components and synthetic biology emphasizing building biological systems from modular components.
In examining the parallels between the two fields, it is also useful to examine how the key challenges each field is currently facing relate to one another (Figure 1). The challenges synthetic biologists currently face in engineering genetic systems can be classified as relating to either limitations in understanding biological systems or limitations in technical capabilities to study biological systems. The challenges systems biologists currently face in understanding biological systems are related to the complexities associated with studying natural biological systems and the inadequacies of current computational models to capture the physical properties of biological systems. We see several areas where these two fields can be brought together to effectively address these challenges.
Figure 1. The Challenges and Synergies for Systems and Synthetic Biology
The richness and complexity of engineered genetic networks, which synthetic biologists could build, will be advanced by using the knowledge gained through systems biology research. For example, genome sequencing can provide an increased diversity of biological parts that synthetic biologists can use in their gene circuit designs. More importantly, systems biology will provide not just the physical parts but a fundamental understanding of how these components can be integrated effectively with other components and how biological systems integrate diverse components and regulatory mechanisms to achieve robust information transmission and behaviors. The importance of this contribution is highlighted by the limited diversity of parts and regulatory mechanisms that have been integrated into synthetic gene circuits to date, in which the majority of engineered systems rely on a limited number of transcriptional regulators and do not exhibit robust behaviors over different timescales and environmental conditions (Elowitz and Leibler, 2000; Gardner et al., 2000; Purnick and Weiss, 2009). In order to move toward the design of integrated genetic systems, synthetic biologists will need to design more sophisticated genetic circuits that utilize diverse regulatory strategies (specifically, the integration of posttranscriptional and posttranslational mechanisms), balance energetic load, and dynamically modulate system behavior (Lim, 2010; Win et al., 2009).
Another important contribution of systems biology to synthetic biology is associated with the technologies and tools for analyzing biological systems. Synthetic biologists often spend the bulk of their effort in a design, characterization, and optimization loop, where original designs are modified based on characterization data to achieve the desired system behavior. The tools developed by systems biologists to study components in a system and their interactions can be applied to analyzing synthetic systems and troubleshooting the system performance. This is particularly true in cases in which the synthetic gene network may have
unanticipated effects on native pathways in the cell that may in turn affect system behavior. A common example of this challenge can arise in engineering metabolic pathways, for which synthetic biologists can use genome-wide profiling of transcript, protein, or metabolite levels to identify undesired effects of introducing the synthetic pathway in the host cell on critical functions such as redox balance, cofactor levels, and stress response (Mukhopadhyay et al., 2008). As another example, systems biologists have developed a variety of computational tools for modeling biological systems and sharing information on biological components across different databases. These tools will be useful foundations for synthetic biologists looking to develop methods to standardize and share information across component libraries and develop computer-aided design tools for biological systems. Advances in synthetic biology will provide key contributions to systems biology research by creating new tools for interfacing with and manipulating biological systems. Research aimed at understanding a biological system often utilizes methods to perturb or manipulate that system and examine the resulting behavior of the modified system. Synthetic biologists are developing novel genetic devices that can be used by systems biologists to interface with native networks and precisely probe and manipulate those systems. For example, synthetic genetic devices have recently been used to rewire signaling pathways and create novel interactions between unrelated cellular components (Culler et al., 2010; Lim, 2010). In addition, synthetic biology can contribute strategies for simplifying and isolating biological components and their interactions through the application of diverse approaches for implementing specific component interactions. Synthetic biology can also provide new simulation platforms for systems biology. 
For example, systems biologists currently develop mathematical models to represent the behavior of their systems and use these models to predict the behavior of their systems under different perturbations and environments. However, the development of these models often requires assumptions that are imperfect
matches for the physical model of a cell (i.e., hard sphere, dilute gas models), such that the ability of current computational models to capture system behavior is limited at best. The potential advances in constructing genetic systems coming from synthetic biology research may enable systems biologists to shift from computational models to physical models for their systems by implementing simulations inside of cells. Specifically, scalable and affordable DNA synthesis technology can allow systems biologists to build many modified versions of natural systems to test their understanding of those systems. Perspective of the Future Moving forward, the synergy between synthetic and systems biology will drive transformative advances in biotechnology. The impact includes not only further understanding of the complexity of biological systems but the ability to use this information to, for instance, design better drugs, commodity manufacturing processes, and cell-based therapies (Ducat et al., 2011). As one example, the complexity of biosynthesis processes that can be engineered has been recently advanced through the integration of a number of pathway construction and optimization tools, including genomic discovery and engineering (Bayer et al., 2009; Ro et al., 2006; Wang et al., 2009), in vivo screens for enzyme activity (Pfleger et al., 2006), and enzyme localization strategies (Dueber et al., 2009). Future efforts will focus on the development of more advanced tools for bioprocess optimization, such as those enabling noninvasive monitoring of pathway flux (Win and Smolke, 2007), closed loop embedded control of biosynthesis system behavior (Dunlop et al., 2010; Farmer and Liao, 2000), and biosynthesis compartmentalization and specialization. 
As another example, systems engineering strategies will play key roles in addressing current challenges in cellular therapies by enabling the programming of cell-fate decisions (Culler et al., 2010), differentiated states (Deans et al., 2007), improved engraftment and targeting (Chen et al., 2010), and effective kill switches (Callura et al., 2010). Ultimately, researchers will design systems that incorporate evolution—designing gene circuits that exhibit
desired, evolvable behaviors and eventually constructing ecosystems that exhibit dynamic and predictable behavior patterns. However, it is important to look at history in thinking about the promises and risks of synthetic biology. Molecular biology, and in particular the insertion of foreign genes into microbes, was met with circumspection by both the public and scientific communities. At the time, scientists made promises to the public (for example, the production of human insulin by engineered bacteria) and delivered on at least some of these promises (Villa-Komaroff et al., 1978). So, what can we expect from the interplay between systems biology and synthetic biology in the near and long term? In the near term, we have already seen companies promise to deliver on new fuels and carbon-based products (such as plastics), and in 5 years' time this will be a partial reality, thereby starting to take petroleum out of the production loop. We believe that, in 10 years' time, many high-value commodities, including drugs, will be produced biologically as the result of synthetic biology efforts. In the much longer time frame of 20 to 50 years, we hope that synthetic biology will lead to new cell-based therapies, the expansion of immunotherapy, synthetic organs and tissues, and rebuilding devastated environments and ecosystems. These anticipated futures bring us to the controversial areas in synthetic biology. How do we think about a future that could involve the reprogramming of entire organisms? Should we consider engineering ecosystems to support sustainable agriculture, environmental remediation, and pathogen removal and to treat human disease? How far should and can we go in reprogramming life to form new types of cells, tissues, and entire organisms? These are only some of the potential benefits and questions scientists, engineers, policy makers, governments, and, most importantly, the public will need to ponder.
Molecular biologists set standards for safe use of engineered organisms over 30 years ago. However, as research in synthetic biology is advancing toward the goals of making biology easier to engineer, the issues of safety and ethical use are being revisited as we write this Essay. In fact,
a recent US government report captures many of the critical issues around public benefits and responsible stewardship (Presidential Commission for the Study of Bioethical Issues, 2010). Although each field could in principle exist without the other, we instead feel that the natural interplay between design, analysis, and understanding highlights the important relationship between systems biology and synthetic biology. Systems biology brings added layers of information that will further empower future efforts to design synthetic biological systems. Synthetic biology brings new technologies and tools that can be applied to effectively test our understanding of natural biological systems. By integrating the contributions of these rapidly evolving fields, scientists and engineers together will be well positioned to transform health, well-being, and the environment in the years to come.
ACKNOWLEDGMENTS
P.A.S. is supported by funds from the NIH, DOD, DOE, NSF, and the Wyss Institute for Biologically Inspired Engineering. C.D.S. is supported by funds from the NIH, NSF, and the Alfred P. Sloan Foundation. The authors thank Drew Endy and Jeff Way for comments.
REFERENCES
Alon, U. (2007). An Introduction to Systems Biology: Design Principles of Biological Circuits (Boca Raton, FL: Chapman and Hall/CRC Press).
Bayer, T.S., Widmaier, D.M., Temme, K., Mirsky, E.A., Santi, D.V., and Voigt, C.A. (2009). J. Am. Chem. Soc. 131, 6508–6515.
Burrill, D.R., and Silver, P.A. (2010). Cell 140, 13–18.
Callura, J.M., Dwyer, D.J., Isaacs, F.J., Cantor, C.R., and Collins, J.J. (2010). Proc. Natl. Acad. Sci. USA 107, 15898–15903.
Chen, Y.Y., Jensen, M.C., and Smolke, C.D. (2010). Proc. Natl. Acad. Sci. USA 107, 8531–8536.
Culler, S.J., Hoff, K.G., and Smolke, C.D. (2010). Science 330, 1251–1255.
Deans, T.L., Cantor, C.R., and Collins, J.J. (2007). Cell 130, 363–372.
Dougherty, M.J., and Arnold, F.H. (2009). Curr. Opin. Biotechnol. 20, 486–491.
Ducat, D.C., Way, J.C., and Silver, P.A. (2011). Trends Biotechnol. 29, 95–103.
Dueber, J.E., Wu, G.C., Malmirchegini, G.R., Moon, T.S., Petzold, C.J., Ullal, A.V., Prather, K.L., and Keasling, J.D. (2009). Nat. Biotechnol. 27, 753–759.
Dunlop, M.J., Keasling, J.D., and Mukhopadhyay, A. (2010). Syst. Synth. Biol. 4, 95–104.
Ellis, T., Wang, X., and Collins, J.J. (2009). Nat. Biotechnol. 27, 465–471.
Elowitz, M.B., and Leibler, S. (2000). Nature 403, 335–338.
Endy, D. (2005). Nature 438, 449–453.
Farmer, W.R., and Liao, J.C. (2000). Nat. Biotechnol. 18, 533–537.
Gardner, T.S., Cantor, C.C., and Collins, J.J. (2000). Nature 403, 339–342.
Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.Y., Algire, M.A., Benders, G.A., Montague, M.G., Ma, L., Moodie, M.M., et al. (2010). Science 329, 52–56.
Haynes, K., and Silver, P.A. (2009). J. Cell Biol. 187, 589–596.
Lim, W.A. (2010). Nat. Rev. Mol. Cell Biol. 11, 393–403.
Matzas, M., Stahler, P.F., Kefer, N., Siebelt, N., Boisguerin, V., Leonard, J.T., Keller, A., Stahler, C.F., Haberle, P., Gharizaden, B., et al. (2010). Nat. Biotechnol. 28, 1291–1294.
Mukhopadhyay, A., Redding, A.M., Rutherford, B.J., and Keasling, J.D. (2008). Curr. Opin. Biotechnol. 19, 228–234.
Norville, J.E., Derda, R., Drinkwater, K.A., Leschziner, A.E., and Knight, T.R. (2010). J. Biol. Eng. 4, 17.
Pfleger, B.F., Pitera, D.J., Smolke, C.D., and Keasling, J.D. (2006). Nat. Biotechnol. 24, 1027–1032.
Presidential Commission for the Study of Bioethical Issues. (2010). http://www.bioethics.gov/.
Purnick, P.E., and Weiss, R. (2009). Nat. Rev. Mol. Cell Biol. 10, 410–422.
Ro, D.K., Paradise, E.M., Ouellet, M., Fisher, K.J., Newman, K.L., Ndungu, J.M., Ho, K.A., Eachus, R.A., Ham, T.S., Kirby, J., et al. (2006). Nature 440, 940–943.
Savageau, M. (2011). Ann. Biomed. Eng. Published online January 4, 2011. 10.1007/s10439-010-0220-2.
Sprinzak, D., and Elowitz, M.B. (2005). Nature 438, 443–448.
Tian, J., Ma, K., and Saaem, I. (2009). Mol. Biosyst. 5, 14–22.
Villa-Komaroff, L., Efstradiadis, A., Broome, S., Lomedico, P., Tizard, R., Naber, S.P., Chick, W.L., and Gilbert, W. (1978). Proc. Natl. Acad. Sci. USA 75, 3727–3731.
Wang, H.H., Isaacs, F.J., Carr, P.A., Sun, Z.Z., Xu, G., Forest, C.R., and Church, G.M. (2009). Nature 460, 894–898.
Win, M.N., and Smolke, C.D. (2007). Proc. Natl. Acad. Sci. USA 104, 14283–14288.
Win, M.N., Liang, J.C., and Smolke, C.D. (2009). Chem. Biol. 16, 298–310.
Cell 144, March 18, 2011 ©2011 Elsevier Inc. 859
Leading Edge
Minireview
Boosting Signal-to-Noise in Complex Biology: Prior Knowledge Is Power
Trey Ideker,1,2,* Janusz Dutkowski,1 and Leroy Hood3
1Departments of Medicine and Bioengineering
2Institute for Genomic Medicine
University of California, San Diego, La Jolla, CA 92093, USA
3Institute for Systems Biology, Seattle, WA 98103, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.007
A major difficulty in the analysis of complex biological systems is dealing with the low signal-to-noise inherent to nearly all large biological datasets. We discuss powerful bioinformatic concepts for boosting signal-to-noise through external knowledge incorporated in processing units we call filters and integrators. These concepts are illustrated in four landmark studies that have provided model implementations of filters, integrators, or both.
Introduction
Complexity is the grand challenge for science and engineering in the 21st century. Complex systems, by definition, have many parts in an intricate arrangement that gives rise to seemingly inexplicable or emergent behaviors. For example, a radio captures an electromagnetic signal and converts it through electronic circuitry into sound that we hear. To most, the radio is a black box with an input (electromagnetic waves) and an output (sound waves). However, understanding the inner workings of this box requires going head-to-head with the challenges of complexity. What are the component parts of the system and how are these parts interconnected? How do these connections influence functions and dynamic system outputs? In biology, ultimately one would like to create models that predict the emergent behaviors of complex entities and even re-engineer these behaviors to humankind’s benefit.
To decipher complexity, biologists have developed an impressive array of technologies (next-generation sequencing, tandem mass spectrometry, cell-based screening, and so on) that are capable of generating millions of molecular measurements in a single run. This enormous amount of data, however, is typically accompanied by a fundamental problem: a very low signal-to-noise ratio. For example, the millions of single-nucleotide variants (SNVs) found in a typical genome-wide association study or by the International Cancer Genome Consortium (Hudson et al., 2010) make it extremely difficult to identify which particular SNVs are the true causes of disease. Due to the overwhelming number of measurements, such analyses either lack power to detect the true signal or must admit an unacceptable amount of noise.
Fortunately, biologists have two major weapons with which signal-to-noise may be improved. First is what we know about complexity, which can and should be used as strong prior assumptions when analyzing biological data.
Known principles of complexity such as modularity, hierarchical organization, evolution, and inheritance (Hartwell et al., 1999) all provide
important insights into how biological systems are constructed and how they function. Second is the availability of data in many complementary layers, including the genome, transcriptome, proteome, metabolome, and interactome. A recent wave of new bioinformatic methods has demonstrated how both weapons (strong prior assumptions related to complexity and systematic accumulation of complementary data) can be used together or separately to achieve substantial increases in signal-to-noise.
In what follows, we summarize these developments within a general paradigm for signal detection in biology. Central to this paradigm are processing units we call filters and integrators, which draw on prior biological assumptions and complementary data to reduce noise and to boost statistical power. To illustrate these ideas in context, we review four landmark studies that have provided model implementations of filters and integrators.
The Signal Detection Paradigm
Imagine a biological dataset as a stream of information flowing into a hypothetical signal detection device (Figure 1A). The information flow is quantized into atomic units or events, representing measurements for entities such as genes or proteins, protein interactions, SNVs, pathways, cells, or individuals. Each event contains a certain amount of information, ranging from a single measurement (e.g., strength of protein interaction) to thousands (e.g., an SNV state or gene expression value over a population of patients). Some events represent true biological signals, with the definition of “signal” depending exquisitely on the type of results the experimentalist is looking for (e.g., an SNV causing disease or a true protein interaction; many examples are given later).
The remaining events are noise, which can be due to errors that are technical in nature (uncontrollable variation in different instrument readings collected from the same sample) or biological in nature (uncontrollable variation in different samples collected from the same biological condition). An event may also be considered part of noise even if it is biological and
Figure 1. Boosting Signal-to-Noise in Biological Data using Prior Knowledge
(A) Signal detection paradigm in which an input data stream is routed through a series of filtering and integration units, ending in a statistical test that makes accept or reject decisions. Symbols: m, information per event or sample size; Δ, effect size; tα, decision threshold; FDR, false discovery rate.
(B) Probability distribution P(t) of the test statistic t over the entire data stream of signal plus noise (purple). This distribution is factored into a red signal and a blue noise component. FDR and power are visualized in terms of the areas under these curves to the right of tα.
(C) Effect of varying parameters on the signal, noise, and signal plus noise probability distributions. The power is increased by more than 6-fold compared to (B), at an identical FDR. Colors are shown as in (B).
(D) MAGENTA, a specific implementation of the signal detection paradigm for pathway-based disease gene mapping as described in Segrè et al. (2010).
reproducible, simply because it encodes aspects of phenotype irrelevant to the current studies. To make a decision on which events are signal, the device scores each event and accepts those for which the score exceeds a statistically defined decision threshold (Figure 1A). It is precisely this decision that becomes problematic in many large-scale biological studies, in which one either mistakenly rejects a large proportion of the true signal (low statistical power) or must tolerate a high proportion of accepted events that are noise (high false discovery rate or FDR).
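The trade-off between power and FDR at a given decision threshold can be made concrete with a small numerical sketch. The following Python example is only an illustration and is not taken from any of the studies discussed here: it assumes that noise scores follow a standard normal distribution, that signal scores follow a unit-variance normal distribution centered at Δ·√m, and that a known fraction of events is true signal. The function name `power_and_fdr` and the parameter `signal_fraction` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_and_fdr(threshold, effect_size, sample_size, signal_fraction):
    """Return (power, FDR) for events accepted when their score exceeds threshold.

    Assumptions of this sketch (not from the source text):
    noise scores ~ N(0, 1); signal scores ~ N(effect_size * sqrt(sample_size), 1);
    signal_fraction is the share of all events that are true signal.
    """
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(effect_size * sqrt(sample_size), 1.0)

    power = 1.0 - signal.cdf(threshold)            # accepted share of true signal
    false_positive = 1.0 - noise.cdf(threshold)    # accepted share of noise

    accepted_signal = signal_fraction * power
    accepted_noise = (1.0 - signal_fraction) * false_positive
    accepted = accepted_signal + accepted_noise
    fdr = accepted_noise / accepted if accepted > 0 else 0.0
    return power, fdr


if __name__ == "__main__":
    # Lenient versus strict threshold on the same data stream.
    for t in (2.0, 4.0):
        p, f = power_and_fdr(t, effect_size=1.0, sample_size=4, signal_fraction=0.01)
        print(f"threshold={t}: power={p:.3f}, FDR={f:.3f}")
    # A filter that removes noise raises the signal fraction of what remains.
    p, f = power_and_fdr(2.0, effect_size=1.0, sample_size=4, signal_fraction=0.10)
    print(f"after filtering: power={p:.3f}, FDR={f:.3f}")
```

Under these assumptions, raising the threshold lowers the FDR at the cost of power, whereas enriching the stream for signal lowers the FDR without touching the threshold.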
Boosting Signal with Filters and Integrators
To increase signal-to-noise, a pivotal trend in bioinformatics has been to augment the signal detection process with complementary datasets and with prior knowledge about the nature of signal. The vast majority of these approaches fall into either of two categories that we call filters and integrators (Table S1 available online). Filters attempt to cull some events from the information flow immediately and reject them as noise. For example, a detection system for differential expression might reject certain genes immediately if their expression levels fail to exceed a background value in any condition. Integrators, on the other hand, transform the information flow by aggregating individual events into larger units to yield a fundamentally new type of information, or by integrating together different types of information (Hwang et al., 2009). For example, genes might be aggregated into clusters of similar expression or of related function, in which the median levels of the clusters (not their individual genes) are propagated as the “events” on which final accept/reject decisions are made (Park et al., 2007). Importantly, the combining of filters or integrators results in a new device that itself can be recombined with other signal detection systems in a modular fashion.
Both filters and integrators influence statistical power and FDR, but by fundamentally different means. Filters reduce the fraction of noise passing through the system and, as a consequence, the FDR. Alternatively, as filters are added, FDR can be held constant by relaxing the decision threshold, resulting in higher statistical power (Figures 1B and 1C). By comparison, integrators combine a train of weak signals into fewer stronger events, leading to an increase in “effect size” and thus a direct increase in statistical power.
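The two kinds of processing unit can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not an implementation from any cited study: the gene names, expression values, cluster assignments, and the function names `background_filter` and `cluster_integrator` are all hypothetical.

```python
from statistics import median


def background_filter(expression, background):
    """Filter: drop genes whose expression never exceeds background in any condition."""
    return {gene: levels for gene, levels in expression.items()
            if max(levels) > background}


def cluster_integrator(expression, clusters):
    """Integrator: replace individual genes by the per-condition median of each cluster."""
    events = {}
    for name, genes in clusters.items():
        members = [expression[g] for g in genes if g in expression]
        if members:  # clusters emptied by an upstream filter produce no event
            events[name] = [median(column) for column in zip(*members)]
    return events


# Hypothetical expression levels over two conditions.
expression = {
    "geneA": [1.0, 5.0],
    "geneB": [2.0, 6.0],
    "geneC": [0.1, 0.2],   # never above background
    "geneD": [3.0, 7.0],
}
clusters = {"cluster1": ["geneA", "geneB", "geneD"], "cluster2": ["geneC"]}

if __name__ == "__main__":
    # Units compose in a modular fashion: the filter's output feeds the integrator.
    events = cluster_integrator(background_filter(expression, 0.5), clusters)
    print(events)
```

Because each unit takes and returns the same kind of event stream, the composed pipeline can itself be treated as a single unit and chained with others.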
These methods complement the more classical means of boosting power by increasing the amount of information per event (also called the sample size) (Figure 1A). In each of the following four examples, boosting power with a combination of filters and integrators has been critical to the success of a landmark genome-scale analysis project.
Example 1: Pathway-Level Integration of Genome-wide Association Studies
Genome-wide association studies (GWAS) seek to identify polymorphisms, such as SNVs, that cause a disease or other phenotypic trait of interest. Despite the success of this strategy in mapping SNVs underlying many diseases, the identified loci typically explain only a small proportion of the heritable variation. For such diseases, one likely explanation is that the genetic contribution is distributed over many functionally related loci with large collective impact but with only modest individual effects that do not reach genome-wide significance in single-SNV tests (Wang et al., 2010; Yang et al., 2010).
Based on this hypothesis, Segrè et al. (2010) investigated the collective impact of mitochondrial gene variation in type II diabetes. They described a method called MAGENTA that performs a meta-analysis of many different GWAS to achieve larger sample sizes than any single study, thereby increasing statistical power. MAGENTA also includes both filtering and integration steps (Figure 1D). First, a filter is applied so that SNVs that fall far from genes are removed. Next an integrator is applied to transform SNVs to genes, such that each gene is assigned a
score equal to the most significant p value of association among its SNVs. Gene scores are further corrected for confounding factors such as gene size, number of SNVs per kilobase, and genetic linkage. Finally, a second integrator combines the scores across sets of genes assigned to the same biochemical function or pathway, resulting in a single pathway-level p value of association.
Simulation studies using MAGENTA suggest a potentially large boost in power to detect disease associations (Figure S1A). For example, the method has 50% power to detect enrichment for a pathway containing 100 genes of which 10 genes have weak association to the trait of interest. This performance is compared to only 10% power to detect any of the 10 genes at the single-SNV level. At this increased power, MAGENTA did not identify any mitochondrial pathways as functionally associated with type II diabetes, suggesting that mitochondria have overall low genetic contribution to diabetes susceptibility, a surprise given the conventional wisdom about the disease. On the other hand, in an independent analysis of genes influencing cholesterol, MAGENTA identified pathways related to fatty acid metabolism that had been missed by classical GWAS.
Example 2: Mapping Disease Genes in Complete Genomes
Sequencing and analysis of individual human genomes is one of the most exciting emerging areas of biology, made possible by the rapid advances in next-generation sequencing (Metzker, 2010). As complete genome sequencing becomes pervasive, one of the most important challenges will be to determine how such sequences should best be analyzed to map disease genes. The signal filtering and integration paradigm provides an excellent framework for developing methods in this arena. As a landmark example, Roach et al. (2010) described a filtering methodology for disease genes based on the complete genomic sequences of a nuclear family of four.
This approach was used to identify just three candidate mutant genes, one of which is responsible for Miller syndrome, a rare recessive Mendelian disorder for which both offspring, but neither parent, were affected. To begin the analysis, the four genome sequences were processed to identify approximately 3.7 million SNVs across the family. SNVs were then directed through a series of filters (Figure S2A). In the first, SNVs were rejected if they were unlikely to influence a gene-coding region annotated in the human genome reference map (http://genome.ucsc.edu/), leaving approximately 1% of SNVs that led to missense or nonsense mutations or fell precisely onto splice junctions. A second filter removed SNVs that were common in the human population and thus were unlikely to cause a rare Mendelian disorder. Like the first one, this filter yielded an approximate 100-fold decrease in the number of candidates. A third filter was designed to check inheritance patterns, which can be gleaned only from a family of related genomes. SNVs were removed that had a non-Mendelian pattern of inheritance (the result of DNA sequencing errors) or did not segregate as expected for a recessive disease gene, in which each affected child must inherit recessive alleles from both parents. This filter yielded another 4- to 5-fold decrease in candidate SNVs versus using only a single parental genome. Finally,
an integrator was used to translate all remaining SNVs into their corresponding genes. Using the entire system of filters and integrators under a compound heterozygote recessive model, a total of three genes were identified as candidates. One of these (DHODH) was concurrently shown to be the cause of Miller syndrome. In this way, the family genome sequencing approach used the principles of Mendelian genetics (prior knowledge) to correct approximately 70% of the sequencing errors, directly identify rare variants (those present in two or more family members), and reduce enormously the search space for disease traits (corresponding to an increase in statistical power from 0.15% to 33%) (Figure S1B).
Example 3: Assembly of Global Protein Signaling Networks
Another area in which filtering and integration are turning out to be key is assembly of protein networks. An excellent example of network assembly is provided by the recent work of Breitkreutz et al. (2010), in which mass spectrometric analysis was used to report a high-quality network of 1844 interactions centered on yeast kinases and phosphatases. Central to the task of network assembly was a signal detection system for quality control and interpretation of the raw data. The data consisted of a stream of more than 38,000 proteins that had been coimmunoprecipitated with a different kinase or phosphatase used as bait. Bait proteins can interact both specifically and nonspecifically with a wide variety of peptides, and the nonspecific interactions comprise a major source of noise. To remove nonspecific interactions, the authors introduced a method called significance analysis of interactome (SAINT), in which each putative interacting protein is assigned a likelihood of true interaction based on its number of peptide identifications (representing the amount of information per event or sample size) (Figure S2B).
After filtering, the remaining protein interactors are funneled to an integrator stage in which they are clustered into modules based on their overall pattern of interactions (Table S1). The resulting modular interaction network reveals an unprecedented level of crosstalk between kinase and phosphatase units during cell signaling. In this network, kinases and phosphatases are not mere cascades of proteins ordered in a linear fashion. Rather, they are more akin to the neurons of a vast neural network, in which each kinase integrates signals from myriad others, enabling the network to sense cell states, compute functions of these states, and drive an appropriate cellular response. It is likely that evolution tunes this network, such that some interactions dominate and others are minimized in a species-specific fashion. This might help explain two paradoxical effects seen pervasively in both signaling and regulation: (1) the same network across species can be used to control very different phenotypes (McGary et al., 2010); and (2) very different networks across species can be used to execute near identical responses (Erwin and Davidson, 2009).
Example 4: Filtering Gene Regulatory Networks using Prior Knowledge
One of the grand challenges of biology is to decipher the networks of transcription factors and other regulatory
components that drive gene expression, phenotypic traits, and complex behaviors (Bonneau et al., 2007). Toward this goal, probabilistic frameworks such as Bayesian networks have been extensively applied to learn gene regulatory relationships from mRNA expression data gathered over multiple time points and/or experimental conditions (Friedman, 2004). However, due to a limited sample size, large space of possible networks, and probabilistic equivalence of many alternative models, these approaches are often unable to find the underlying causal gene relationships.
Recently, Zhu et al. (2008) showed that supplementing gene expression profiles with complementary information on genotypes may help to overcome some of these problems (Figure S2C). These authors sought to assemble a gene regulatory network for the yeast Saccharomyces cerevisiae using previously published mRNA expression profiles gathered for 112 yeast segregants. Rather than assemble a Bayesian network from expression data alone, the data were first supplemented with the genotypes of each segregant. The combined dataset was then analyzed to identify expression quantitative trait loci (eQTL), that is, genetic loci for which different mutant alleles associate with differences in expression for genes at the same locus (cis-eQTL) or for genes located elsewhere in the genome (trans-eQTL). The eQTLs were used as a filter to prioritize some gene relations and demote others. Any candidate cause-effect relations in which the effect gene is near an eQTL were removed, as the cis-eQTL already explains the gene expression changes at that locus. Conversely, cause-effect relations that were supported by trans-eQTLs and passed a formal causality test were prioritized. Supplementing gene expression profiles with genetic information significantly enhanced the power to identify bona fide causal gene relationships.
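The logic of this eQTL-based filter can be outlined in a short Python sketch. It follows only the verbal description above and is not the procedure of Zhu et al. (2008): the gene labels, the precomputed sets `cis_eqtl_genes` and `trans_supported`, and the function `filter_edges` are hypothetical, and the sketch assumes that removal of an edge takes precedence over prioritization.

```python
def filter_edges(candidate_edges, cis_eqtl_genes, trans_supported):
    """Apply the eQTL filter to candidate (cause, effect) gene relations.

    candidate_edges: iterable of (cause, effect) pairs.
    cis_eqtl_genes:  genes whose expression is already explained by a nearby eQTL.
    trans_supported: (cause, effect) pairs backed by a trans-eQTL that also
                     passed a causality test (assumed to be computed elsewhere).
    Returns a list of (cause, effect, status) for the surviving relations.
    """
    kept = []
    for cause, effect in candidate_edges:
        if effect in cis_eqtl_genes:
            continue  # removed: the cis-eQTL already explains this effect gene
        status = "prioritized" if (cause, effect) in trans_supported else "neutral"
        kept.append((cause, effect, status))
    return kept


# Hypothetical inputs for illustration only.
edges = [("G1", "G2"), ("G1", "G3"), ("G4", "G3")]
cis = {"G2"}
trans_ok = {("G1", "G3")}

if __name__ == "__main__":
    for cause, effect, status in filter_edges(edges, cis, trans_ok):
        print(f"{cause} -> {effect}: {status}")
```

In a real analysis the surviving relations and their statuses would then act as priors for network learning rather than as final calls.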
Further improvement was achieved by introducing a second filter that prioritized cause-effect relations that correspond to measured physical interactions, including data from the many genome-wide chromatin immunoprecipitation experiments published for yeast that document physical interactions between transcription factors and gene promoters.
Summary
Biology is expanding enormously in its ability to decipher complex systems. This ability derives from the expanded power to incorporate diverse and complementary data types and to inject prior understanding of biological principles. Signal detection systems such as those discussed here, along with their filters, integrators, and other components, are leading to fundamental new biological discoveries and models, some of which will ultimately transform our understanding of disease and therapeutics. It is also likely that many of the strategies, technologies, and computational tools developed for healthcare can be
applied to problems of complexity inherent in other scientific domains, including energy, agriculture, and the environment. Healthcare and energy will demand significant societal resources moving forward and hence offer unique opportunities to push the development and application of approaches for attacking complexity.
SUPPLEMENTAL INFORMATION
Supplemental Information includes two figures and one table and can be found with this article online at doi:10.1016/j.cell.2011.03.007.
ACKNOWLEDGMENTS
We gratefully acknowledge G. Hannum, S. Choi, I. Shmulevich, D. Galas, J. Roach, and N. Price for helpful comments and feedback. This work was funded by grants from the National Center for Research Resources (RR031228, T.I., J.D.), the National Institute of General Medical Sciences (GM076547, L.H.; GM070743 and GM085764, T.I.), the Department of Defense (W911SR-07-C-0101, L.H.), and the Luxembourg strategic partnership (L.H.).
REFERENCES
Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). Cell 131, 1354–1365.
Breitkreutz, A., Choi, H., Sharom, J.R., Boucher, L., Neduva, V., Larsen, B., Lin, Z.Y., Breitkreutz, B.J., Stark, C., Liu, G., et al. (2010). Science 328, 1043–1046.
Erwin, D.H., and Davidson, E.H. (2009). Nat. Rev. Genet. 10, 141–148.
Friedman, N. (2004). Science 303, 799–805.
Hartwell, L.H., Hopfield, J.J., Leibler, S., and Murray, A.W. (1999). Nature 402, C47–C52.
Hudson, T.J., Anderson, W., Artez, A., Barker, A.D., Bell, C., Bernabe, R.R., Bhan, M.K., Calvo, F., Eerola, I., Gerhard, D.S., et al. (2010). Nature 464, 993–998.
Hwang, D., Lee, I.Y., Yoo, H., Gehlenborg, N., Cho, J.H., Petritis, B., Baxter, D., Pitstick, R., Young, R., Spicer, D., et al. (2009). Mol. Syst. Biol. 5, 252.
McGary, K.L., Park, T.J., Woods, J.O., Cha, H.J., Wallingford, J.B., and Marcotte, E.M. (2010). Proc. Natl. Acad. Sci. USA 107, 6544–6549.
Metzker, M.L. (2010). Nat. Rev. Genet. 11, 31–46.
Park, M.Y., Hastie, T., and Tibshirani, R. (2007). Biostatistics 8, 212–227.
Roach, J.C., Glusman, G., Smit, A.F., Huff, C.D., Hubley, R., Shannon, P.T., Rowen, L., Pant, K.P., Goodman, N., Bamshad, M., et al. (2010). Science 328, 636–639.
Segrè, A.V., Groop, L., Mootha, V.K., Daly, M.J., and Altshuler, D. (2010). PLoS Genet. 6, e1001058.
Wang, K., Li, M., and Hakonarson, H. (2010). Nat. Rev. Genet. 11, 843–854.
Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W., et al. (2010). Nat. Genet. 42, 565–569.
Zhu, J., Zhang, B., Smith, E.N., Drees, B., Brem, R.B., Kruglyak, L., Bumgarner, R.E., and Schadt, E.E. (2008). Nat. Genet. 40, 854–861.
Leading Edge
Perspective
Principles and Strategies for Developing Network Models in Cancer
Dana Pe’er1,2,* and Nir Hacohen3,4,5
1Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY 10027, USA
2Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA
3Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
4Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, 149 13th Street, Charlestown, MA 02129, USA
5Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.001
The flood of genome-wide data generated by high-throughput technologies currently provides biologists with an unprecedented opportunity: to manipulate, query, and reconstruct functional molecular networks of cells. Here, we outline three underlying principles and six strategies to infer network models from genomic data. Then, using cancer as an example, we describe experimental and computational approaches to infer “differential” networks that can identify genes and processes driving disease phenotypes. In conclusion, we discuss how a network-level understanding of cancer can be used to predict drug response and guide therapeutics.
Cells contain a vast array of molecular structures that come together to form complex, dynamic, and plastic networks. The recent development of high-throughput, massively parallel technologies has provided biologists with an extensive, although still incomplete, list of these cellular parts. The emerging challenge over the next decade is to systematically assemble these components into functional molecular and cellular networks and then to use these networks to answer fundamental questions about cellular processes and how diseases derail them. For example, how do these cellular components come together to robustly maintain homeostasis, process exogenous and endogenous signals, and then coordinate responses? How do genetic aberrations disrupt the regulatory network and manifest in disease, such as cancer? In this Perspective, we reason that, even with a partial understanding of molecular networks, biologists are currently poised to understand how networks are deregulated in cancer cells and then predict how these networks might respond to drugs.
Quantitative biophysical network models encompassing a small number of components have made enormous contributions to our understanding of cellular networks.
However, in this Perspective, we focus on deriving network models at a large systems scale from high-throughput data, using “data-driven network inference.” In this process, a set of modeling assumptions are defined, such as “genetic aberrations alter normal cellular regulation and drive tumor proliferation.” Then, data are used to derive a specific model, such as specifying for each tumor, which typically harbors many aberrant genes, which particular genes drive proliferation. In the end, a “good” model of biological networks should be able to predict the behavior of the network under different conditions and perturbations and, ideally, even help us to engineer a desired response. For example, where in the molecular network of a tumor should we perturb with drug to reduce tumor proliferation or metastasis?
Such a global understanding of networks can have transformative value, allowing biologists to dissect out the pathways that go awry in disease and then identify optimal therapeutic strategies for controlling them. To illustrate the potential impact of global models, we note that the effect of a cancer drug is often hard to predict because crosstalk and feedback are still poorly mapped in most signaling pathways. For example, the mammalian target of rapamycin (mTOR) is critical for cell growth, and its activity is aberrant in most cancers; hence, it was expected to be a good therapeutic target. Nevertheless, it shows poor results in clinical trials. This deviation from our expectations may be due to feedback and crosstalk between the Akt/mTOR and the extracellular signal-regulated kinase (ERK) pathways (Carracedo et al., 2008). Inhibition of mTOR releases feedback inhibition of the receptor tyrosine kinases, which can activate both ERK and Akt (O’Reilly et al., 2006) and subsequently increase cell proliferation. For targeted therapy to succeed, a global view of the interconnectivity of signaling proteins and their influences is critical. In this Perspective, we consider the current state and potential future of data-driven computational approaches to network inference, with an emphasis on applications to cancer. We will describe three principles underlying molecular networks and inferring these from data. These principles are matched to current experimental capabilities and will need revamping as technological leaps produce new types of data (e.g., more quantitative data and with real-time dynamics). We then consider six promising experimental-computational strategies for constructing network-level models. Though not exhaustive, these principles and strategies illustrate fruitful directions in network biology and will hopefully stimulate discussion and experimentation among computational and experimental biologists.
Principle 1: Molecular Influences Generate Statistical Relations in Data
Network biology has been empowered by genomics technologies that enable the simultaneous measurement of thousands of molecular species. Such data offer a global unbiased view of the entire system, which in turn necessitates computation and statistics. The key underlying assumption frequently used for inferring networks from genomic data is that influences and interactions between biological entities generate statistical relations in the observed data. For example, if protein A induces expression of protein B, then we expect to see high levels of protein B whenever levels or specific molecular states of its activator A are high. The reverse of this logic is that statistical correlation between protein states indicates a potential interaction between them. In a data-driven manner, a computer can comprehensively test millions of such hypotheses in seconds and provide a statistical score for each candidate molecular interaction or influence. For example, one can test the statistical association between the DNA copy number of a candidate regulator and gene expression of a target for each locus and gene in the genome (see Strategy 4).
Various statistical frameworks have been successfully applied to network inference (Basso et al., 2005; Bonneau et al., 2007; Friedman et al., 2000); the commonality between the frameworks is that they model a target’s behavior as a function of its regulators and search for the most predictive regulator set. For example, Bayesian networks were used to reconstruct detailed signaling pathway structures in human T cells using only the concentration of phosphoproteins simultaneously measured in individual cells (Sachs et al., 2005). Based solely on this data, this network analysis discovered the majority of known influences between the measured signaling components without prior knowledge of any pathways.
Moreover, the analysis uncovered a new point of crosstalk, which was confirmed experimentally. The same computational approach and mathematical formulae correctly reconstructed yeast metabolic networks from gene expression data (Pe’er et al., 2001). Together, these studies demonstrate the universal nature of statistical dependencies; the same formalism can be used to reconstruct yeast metabolic networks from gene expression data and mammalian signaling networks from phosphoprotein abundances.
Mathematical models of molecular networks have been derived from basic biochemical principles for decades, combining chemical reaction equations into a quantitative model. For example, Michaelis-Menten equations are frequently used to model transcription factor binding to DNA. Nevertheless, most contemporary data sets lack the quantitative and statistical power to resolve such models, even for small networks. Data-driven approaches typically necessitate hundreds of samples to gain the statistical power to resolve even a partial qualitative map of molecular interactions. Data requirements are highly dependent on the number of components modeled, the mathematical complexity of the equations representing the molecular interactions, and the effect size of the influences themselves. Thus, at the heart of data-driven modeling is finding the sweet spot in the tradeoff between more realistic (e.g., chemical reaction equations) and simpler models that can be inferred more robustly from data (e.g., linear regression).
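The hypothesis-scoring idea of this principle can be illustrated with a minimal sketch. It assumes the simplest possible score, a Pearson correlation between each candidate regulator and one target, which is a stand-in for illustration and not the Bayesian network or mutual information methods cited above. The gene names and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no association signal
    return cov / math.sqrt(var_x * var_y)

def rank_regulators(regulators, target):
    """Score every candidate regulator against one target, strongest first."""
    scores = {name: pearson(profile, target) for name, profile in regulators.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical expression levels across six samples (made-up numbers).
regulators = {
    "regA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # rises with the target
    "regB": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # falls as the target rises
    "regC": [2.0, 5.0, 1.0, 4.0, 6.0, 3.0],  # weakly related to the target
}
target = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]

ranking = rank_regulators(regulators, target)
for name, score in ranking:
    print(f"{name}: r = {score:+.3f}")
```

Ranking by absolute value keeps candidate repressors (negative scores) alongside candidate activators. With only six samples such scores are far from statistically reliable, which is the data requirement the text describes.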
One option is to build qualitative, rather than quantitative, models. These models can identify qualitative features such as “Mek (mitogen-activated protein kinase kinase) activates Erk” or that “Met4 and Met28 are required together to induce sulfur metabolism.” If quantitative modeling is important for the problem at hand, linear regression models provide a robust alternative to nonlinear models (e.g., target gene expression is a linear combination of its transcription factors). Although nonlinear relations frequently occur in biology, linear regression models are more robust, and thus they often give better results, even when the underlying model is nonlinear. A detailed molecular model that is exhaustive in its molecular species and in the modeling of their interactions remains beyond our reach for the near future. A powerful strategy in systems biology is to abstract and simplify models. In the “module-network” approach (Segal et al., 2003), genes are grouped into modules that are assumed to share a regulatory program. The rationale for this grouping is based on numerous examples in which the same regulatory circuits coordinate activation or repression of groups of genes that are involved in the same process (e.g., the entire ribosome complex is regulated by common transcription factors). By pooling many similar genes together, the module-network framework significantly increases the statistical power to identify regulatory influences (Litvin et al., 2009).
Principle 2: Networks Are Not Fixed: The Role of Context and Dynamics
Molecular networks are not static; rather, they exhibit dynamic adaptations in response to both internal states and external signals. Influences that determine network context can be divided into four categories. (1) Genetic background strongly determines network behavior and gives rise to significant differences across individuals (and even cells in the special case of cancer).
(2) Cell lineages have dramatically different network structures because of epigenetic changes and differential expression of genes. (3) Tissue milieu can reprogram networks and their behaviors, as stromal cells do for tumors. (4) Exogenous signals, such as nutrients and other chemicals, affect networks (Figure 1). Ultimately, health or disease emerges from an individual’s integration of internal and external cues. In cancer, context can have a profound impact on how patients respond to therapies. For example, in recent clinical trials of a new generation of rationally targeted therapies (e.g., Gleevec, Herceptin, and BRAF inhibitors for chronic myelogenous leukemia, breast cancer, and melanoma, respectively), even patients that share the targeted mutation and tumor type displayed substantially variable responses to the drugs (Sharma et al., 2010a). In addition, in another recent trial (i.e., phase II), a therapy was extremely effective at reversing tumors in metastatic melanoma patients carrying the oncogenic BRAF mutation (Flaherty et al., 2010), in which this drug effectively shuts down the ERK pathway that is critical for this cancer. Strikingly, however, the same drug leads to the activation of the ERK pathway in cells with wild-type BRAF (Poulikakos et al., 2010), potentially promoting tumors in these cells. To gauge such network activity, response, and potential, experiments must deliberately perturb the cell. For example, blood cells from acute myeloid leukemia patients could not be differentiated from healthy cells when only the basal levels of phosphorylation of key signaling molecules were measured. Only when the samples were interrogated with growth factors and cytokines did the resulting signaling profiles correlate with tumor genetics, drug response, and disease outcome (Irish et al., 2004). The importance of interrogation with stimuli comes into play because many important signaling responses, such as ERK2 activation in response to epidermal growth factor receptor (EGFR), depend only on fold change, rather than basal protein levels that exhibit a high degree of variance (Cohen-Saidon et al., 2009). Cellular responses often involve multiple feedback loops and additional complexities (see Review by Yosef and Regev on page 886 of this issue). For example, the transcriptional response to EGF stimulation induces feedback attenuation factors, such as dual-specific phosphatases (DUSPs), which shut down the same pathways that activate EGF signaling (Amit et al., 2007). Therefore, to understand tumor network function, drug response, and the emergence of drug resistance, tumors must be systematically interrogated with different stimuli and drugs, followed by time series measurements. These measurements can then be used to derive a model describing the quantitative temporal sequence of events from the initial detection of an input to the tumor’s response. The goal would be to generate a model that has a reasonable chance of being able to predict responses to new, previously unmeasured inputs, such as new drugs or combinations of drugs.

Figure 1. Differential Networks Explain Phenotypic Variation across Contexts
The function of a molecular network is determined by context: genetics, tissue type, environment (e.g., nutrients), cell-cell communication, and small molecules. These influences combine to determine the phenotypic response. The “differential network” (colored nodes and edges) models the essential components that determine how and why a phenotypic response will vary between contexts.
Principle 3: Extracting “Differential” Networks
Given the importance of context, a central challenge for the field will be to collect data across multiple environments, cell types, and genetic backgrounds using genome-wide profiling to infer network connectivity and function in each context. Rather than explicitly modeling all of the moving parts of a network, we propose that it is feasible to derive models that focus on key components by capturing the essential differences in network wiring, function, and response between contexts (Figure 1). A “differential-network” model is designed to elucidate the following: How do a small number of changes to the network (e.g., genetic, epigenetic) alter the function of the network? At the center of such a model are the altered nodes (i.e., genes or proteins), and data-driven computation can be used to: (1) identify additional components that interact with these altered nodes; (2) qualify and quantify how these interactions are perturbed; and (3) model how these network perturbations continue to propagate through additional components to generate the phenotype of interest, such as proliferation, invasion, or drug response. For example, Carro et al. (2010) identify C/EBPβ and STAT3 as “master” transcription factors whose overexpression synergistically activates expression of mesenchymal genes and subsequent tumor aggressiveness in malignant glioma (see Strategy 3). The network model can be significantly simplified because only the components that play a role in the modeled response need identification and inclusion. Importantly, the differential network strategy does not apply only to disease. It can be used in any context to address questions such as what is the difference between two cell types or how does nutrient status affect cellular behavior? Here, we present six strategies that combine experimental and computational approaches to generate network inference models.
Strategies 1 and 2 focus on identifying key components; Strategies 3 and 4 focus on deriving key network components concurrently with their regulatory influences; and Strategies 5 and 6 advance toward increasingly detailed quantitative models of network influences.
Strategy 1: Discovery of Inherited Alleles and Somatic Mutations
Chromosomal aberrations and mutations are a central characteristic of tumor cells. Multiple genetic aberrations collectively influence the expression of thousands of genes, altering the pathways and processes underlying malignant behaviors. The emergence of high-resolution copy number assays and massively parallel sequencing technologies opens the possibility of tracing phenotypic differences back to their genetic source. Large-scale initiatives are currently sequencing thousands of tumor genomes to comprehensively catalog the prevalent sequence mutations and chromosomal aberrations underlying each cancer type. Indeed, entire cancer genomes have already been sequenced in dozens of tumors, revealing a surprising degree of mutations and chromosomal aberrations in each
individual cancer (Stephens et al., 2009). On the other hand, exon capture techniques, called exome sequencing (Ng et al., 2010), concentrate on the 1% of coding sequence in the human genome. This technique enables a more economical cataloging of coding mutations in cohorts of hundreds of tumors per cancer type. Finally, transcriptome (or RNA) sequencing identifies expressed coding and noncoding RNA mutations. Transcriptome sequencing also reveals fusion genes created by intronic translocations, which are therefore undetected by exon sequencing techniques (Maher et al., 2009). These large-scale sequencing projects have uncovered a staggering diversity of genetic aberrations across tumors. Although each individual tumor typically harbors a large number of aberrations, only a few play a role in pathogenesis. Therefore, distinguishing between genetic changes that promote cancer progression (i.e., driver mutations) and neutral mutations (i.e., passenger) is like finding needles in haystacks. Recurrence was a rule of thumb for copy number aberrations (Weir et al., 2007). Thus, it was unforeseen that only a handful of genes would recurrently be targeted by sequence mutations in each cancer type. The current presumption is that the majority of the driver mutations are unique to each tumor. A key unresolved computational challenge is, therefore, to identify the driver mutations associated with each cancer genome. Indeed, the identification of these drivers is required before a differential-network approach can model how the pathogenic behavior emerges. Computational methods addressing this task are still under development (Akavia et al., 2010; Beroukhim et al., 2010; Carter et al., 2009). Although recurrence may not occur at the gene level, significant recurrence does occur at the level of pathways. 
For example, in glioblastoma, the majority of tumors have mutations in each of three signaling pathways: P53, retinoblastoma protein 1 (RB1), and rat sarcoma (RAS)/PI3K (Cancer Genome Atlas Research Network, 2008). Because these findings define pathways, rather than genes, as unifying explanations for tumor progression, it is clear that finding drivers will rely on knowledge of molecular networks. Unfortunately, there is currently insufficient information on pathways in existing databases. First, the majority of signaling proteins are not associated with any known pathway. Second, existing databases include only a small part of what is known and typically do not take context (e.g., cell type) into account. More sophisticated experimental and computational methods will be needed to define and catalog the components involved in each pathway. A promising direction is the use of systematic experimental and computational approaches to build interaction maps (Amit et al., 2009; Bandyopadhyay et al., 2010), which can subsequently be used to identify key aberrant genes. For example, an algorithm known as interactome dysregulation enrichment analysis (IDEA) (Mani et al., 2008) uses a specially derived context-specific molecular network to identify key aberrant genes in lymphoma.
Strategy 2: Discovering Key Network Components Using RNAi
Although naturally occurring genetic alterations help to nominate causal genes in cancer and other diseases, deliberate perturbation greatly facilitates causal gene identification. Taking advantage of sequenced genomes, mammalian RNA interference (RNAi) libraries have emerged as a central tool for systematic perturbation of any gene. Indeed, RNAi-based screens have proven to be a major tool in cancer research in which cell lines are readily available and cell proliferation and survival provide surrogates of tumorigenesis. In one strategy, unbiased genome-wide RNAi screens in vitro and in vivo are used to identify candidate causative oncogenes and tumor suppressors that affect cell proliferation or survival. Typically, candidate genes that are found to have an aberrant sequence mutation, copy number alteration, or expression change in tumors are selected for deeper mechanistic characterization (Boehm et al., 2007; Ngo et al., 2010). However, one must always keep in mind that candidate genes that are not aberrant may be equally important to study and target therapeutically. In a second strategy, candidate genes are first selected from cancer genomic data sets and then validated with small-scale RNAi screens. For example, this strategy was recently used to identify critical genes within tumor chromosomal deletions (Ebert et al., 2008) and to find the small subset of genes that affect metastasis among hundreds selectively expressed in metastatic tumors (Bos et al., 2009). Finally, unbiased screens can also shed light on the susceptibility or resistance of specific tumors to treatment (Hölzel et al., 2010) and reveal ways to enhance the effects of current therapies, such as taxanes (Whitehurst et al., 2007). Indeed, these types of findings can rapidly influence clinical research and practice. In all cases, RNAi serves as a “functional filter” to pinpoint or annotate genes that affect proliferation, death, metastasis, or any cellular processes. Combining computationally guided experiment design with RNAi screens has enormous untapped potential.
Although genome-wide data sets are the most comprehensive, they are also expensive to perform at the large scale that is required to cover all contexts. A more economical approach is to refine our understanding with iterative cycles of experimentation and computation. Computational hypotheses derived from one data set are used to design the experiments for collecting the next data set (Figure 2). For example, protein interaction maps and microarray expression data were used to nominate high likelihood genes for characterization in an RNAi screen that dissects interactions between influenza and its host (Shapira et al., 2009). This approach deepened our understanding of how the virus manipulates or is controlled by key host defenses through direct and indirect interactions with four major host pathways. In the cancer setting, a good network model combined with computational inferences can suggest which gene combinations, genetic background, and cell assay (e.g., proliferation, invasion, metabolism) should be matched in searching for new components. For example, multiple mutations must occur together to produce a tumor (Land et al., 1983), necessitating a combinatorial RNAi approach. However, because a large-scale combinatorial RNAi screen is not feasible, computational selection of likely combinations renders the experiments feasible. Additionally, although most screens are performed in a single genetic background, in reality, the functional impact of perturbation is highly dependent on genetic background: disrupting the expression of a gene can cause death in one cell line and have no effect in another cell line (Luo et al., 2008). Thus, it would be useful to select cell lines with informative genetic backgrounds. Finally, a good model can link genes with specific biological processes (Akavia et al., 2010) and help us efficiently extend RNAi studies to problems of invasion, metabolism, cell-cell interactions, and other cancer hallmarks that are poorly understood (Hanahan and Weinberg, 2011).

Figure 2. Experimental Design for Network Inference
(A) To comprehensively characterize tumor response to a drug, we suggest profiling a cohort of genetically characterized tumors using multiple technologies, following perturbation with small molecules and RNAi. Then, data-driven algorithms can infer differential network models from these data. The inferred models subsequently guide the design of experiments for the next iteration of data collection. (B) This figure illustrates how different genetic backgrounds and experiments can help to identify driver mutations and network structure. Each identified mutation recurs in a subset of samples, and driver targets are identified by knockdown using RNAi or drug.

Strategy 3: Statistical Identification of Dysregulated Genes and Their Regulators
After discovering key network components, the next step is to decipher the wiring of the network. The majority of the computational work in this area has been through the analysis of tumor gene expression profiles that have accumulated on the order of tens of thousands of microarrays over the past decade. Unlike the top-down strategies described above, here, the approach is bottom up: first identify the differentially expressed genes relevant to a tumor phenotype of interest, and then use these genes to pinpoint the master regulator that brings about their dysregulation. Data-driven approaches (Principle 1) have been particularly powerful at locating the dysregulated genes and regulatory relations within tumor-related pathways. Analysis of glioblastoma gene expression profiles using ARACNE (algorithm for the reconstruction of accurate cellular networks) (Basso et al., 2005)
revealed two master regulators of mesenchymal transformation in malignant glioma (Carro et al., 2010): the gene module that corresponds to the mesenchymal transformation and the transcription factors most likely regulating this module (based on mutual information between regulator and targets). Both transcription factors were then confirmed experimentally. By extending this statistical reasoning to higher dimensions, the MINDY (modulator inference by network dynamics) algorithm (Wang et al., 2009) could cleverly identify posttranslational activators and inhibitors of master regulators. Based on the assumption that high (or low) expression of such activators (or inhibitors) would lead to increased (or reduced) coregulation of MYC with its known targets, MINDY uncovered new posttranslational modifiers of MYC in human B lymphocytes, and four of them were validated using RNAi. Demonstrating the generality of the statistical approach, the identified modifiers were found to act by diverse mechanisms, including protein turnover, transcription complex formation, and selective enzyme recruitment. As we wait for the development of experimental technologies that detect most posttranslational changes in high throughput, thousands of existing mRNA expression data sets can benefit from this powerful statistical approach to predict key modulators of regulatory activity by any biochemical mechanism. We have thus only begun to tap into the potential of these approaches to uncover the regulatory mechanisms that lead to tumors and other pathogenic phenotypes. Moreover, once profiles of cancer proteomes and their posttranslational modifications become more readily available, these methods will be dramatically empowered.
Strategy 4: Integrating Genotype and Gene Expression into Causal Models
Current analysis has only scratched the surface of existing data sets, and there is critical need for powerful computational approaches to expose the wealth of hidden information.
A promising approach is ‘‘data integration’’ that builds a model from diverse data types (e.g., gene sequence, gene expression profiles, and protein-protein interactions), which each shed a different light on the underlying biology. The resulting combination is more than the sum of the parts (see the MiniReview by Ideker et al. on page 860 of this issue). A natural integration that captures the essence of differential networks is sequence and expression. For example, the CONEXIC (copy number and expression in cancer) algorithm (Akavia et al., 2010) combines DNA copy number with gene expression levels to identify driver mutations and predict the processes that they alter. The modeling assumptions underlying the data integration are: (1) A driver mutation should co-vary with a gene module involved in tumorigenesis (i.e., it assumes that the module’s expression is ‘‘modulated’’ by the driver); and (2) Expression levels of the driver control the malignant phenotype rather than copy number (because other mechanisms may lead to similar dysregulated expression of the driver gene). This approach predicted two new tumor dependencies in melanoma and the processes that they alter. Moreover, these predictions were then confirmed using RNAi. CONEXIC thus uses gene expression as an intermediary to connect genotype
to phenotype, building a cascade of events from DNA, through modulated gene expression, to tumorigenic phenotype. Anchoring the model at the DNA provided support for causality of influence between driver and module, although this influence can still be indirect by a cascade of unknown mechanisms. Though such modeling approaches have only recently taken hold in cancer genomics, these have been developing in genetic association for a few years. Chen and colleagues identified gene networks that are perturbed by quantitative trait loci (QTL), which in turn lead to metabolic disease (Chen et al., 2008). A single comprehensive computation locates the QTL, identifies how it perturbs the molecular network, and in turn leads to variation in disease traits. As more data types that capture the “state” of the network are collected (e.g., metabolite concentrations using mass spectrometry), these differential-network (Principle 3) approaches will lead to increasingly mechanistic and causal models of disease. Although this strategy can be applied to any process or disease, cancer is particularly suited for these approaches because somatic mutations driving tumorigenesis typically have a large impact on multiple genes and cellular processes, and thus their effect is more easily detected. Disease genes based on germline mutations that persist through the powerful evolutionary filters are typically more subtle and harder to detect; indeed, disease is frequently invoked only by the combinatorial interaction of many genes. As proof of concept of “personalized medicine” and using yeast as a model system, CAMELOT (causal modeling with expression linkage for complex traits) (Chen et al., 2009) integrated genotype and gene expression levels (measured prior to drug exposure) to quantitatively predict drug sensitivity.
Applying a differential network approach, a small number of causative genes are identified and then used to build regression models to predict drug response for each yeast strain. The algorithm faithfully predicted both the causal genes (24/24 predictions validated) and drug response. Although epistatic relations existed between genes, the statistical simplicity of linear models led to more robust and accurate models from data. We anticipate that a comparable data set from patient tumors (including genotype, basal gene expression, and quantitative drug response) could be used to rationally select each individual patient’s drug treatment, essentially customizing and optimizing patient care.
Strategy 5: Integration of Single Cell Data to Account for Cell-to-Cell Heterogeneity
Whereas the measurements discussed thus far were taken over population aggregates using bulk assays, most signal processing occurs at the level of the individual cell. Over the past decade, studies have repeatedly demonstrated a large degree of heterogeneity between individual cells, even within clonal populations. This variation arises from differences in protein concentrations and stochastic fluctuations in biochemical reactions involving molecules with low copy numbers. A common finding is that a response appears dose dependent in bulk assays but is actually an “all or nothing” response in single cells. That is, the intensity of the single cell response remains constant under dose, but the fraction of the cells that respond increases
with dose (e.g., NF-κB in response to TNFα) (Tay et al., 2010). In these cases, there are a number of distinct subpopulations, and no individual cell behaves in accordance with the population average. Such subpopulations confound network inference algorithms when two molecules exhibit statistical dependency at the population level but actually reside in mutually exclusive cells. Heterogeneity of molecules at the single cell level can have crucial functional impact. Even clonal cell lines treated with drugs under carefully controlled conditions exhibit a large, previously unappreciated degree of variation in cell survival and other parameters (Cohen et al., 2008). A bulk growth assay can mask a small subpopulation of drug-resistant cells, which can later form a drug-resistant tumor. Though much debate still exists regarding the origins and emergence of these subpopulations, it is clear that such populations often exist in tumors. For example, Sharma and colleagues identified a drug-tolerant state that can be transiently acquired and relinquished through reversible epigenetic changes that occur at low frequency (Sharma et al., 2010b). Therefore, to model drug response in tumors, it is vital to observe the system at the single cell level and take heterogeneity (stochastic, genetic, and microenvironment) into account. A unique and beneficial feature of single cell data is the simultaneous observation of multiple signaling proteins in each individual cell. The stochastic variation observed across individual cells can be harnessed as a data-rich source for network inference, in which each of many thousands of cells can be treated as an individual sample (Sachs et al., 2005). This strategy provides significantly more samples than are available in bulk assays (e.g., each microarray is only a single sample). Nevertheless, this amount of data comes with a technical tradeoff.
To identify interactions and their function, the participating signaling proteins need to be measured simultaneously in the same sample. Typically, single cell measurement technologies are limited to a small number of simultaneous channels (approximately four to ten channels for flow cytometry and approximately three channels for microscopy), with microscopy having the unique advantage of real-time tracking across space and time. A promising emerging technology is mass spectrometry-based single cell cytometry (Ornatsky et al., 2008), which currently can measure up to 35 antibodies in a single cell, with the potential to scale up to 100. This approach will likely break new ground by enabling the study of midscale networks in individual cells. We hope and must rely on clever chemists, engineers, and physicists to take on this important challenge of measuring many molecular states in live, single cells over time and space. In the meantime, computational approaches can help bridge the gap by: (1) pointing to a small number of key components in a differential network, which would be valuable to analyze at the single cell level, and (2) stitching together small, overlapping subnetworks into larger network models (Sachs et al., 2009). But there remains a need to develop methods for integrating genomic data sets at the population level with single cell measurements over small subsets of components at critical network junctures, leading to a more accurate model of the underlying cellular computations.
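The “all or nothing” behavior described in this strategy can be made concrete with a small sketch. It assumes an idealized population in which every cell is either fully on or fully off, and the dose-to-fraction mapping, signal levels, and threshold are hypothetical values chosen only for illustration.

```python
def simulate_population(fraction_responding, n_cells=1000,
                        on_level=10.0, off_level=0.0):
    """Per-cell signal levels when a fixed fraction of cells respond fully
    and the remaining cells do not respond at all (idealized assumption)."""
    n_on = round(fraction_responding * n_cells)
    return [on_level] * n_on + [off_level] * (n_cells - n_on)

def bulk_mean(cells):
    """What a bulk assay reports: the average over the whole population."""
    return sum(cells) / len(cells)

def responder_mean(cells, threshold=5.0):
    """Average signal among responding cells only (single cell view)."""
    responders = [c for c in cells if c > threshold]
    return sum(responders) / len(responders) if responders else 0.0

# Hypothetical mapping from dose to the fraction of cells that respond.
dose_to_fraction = {1: 0.1, 10: 0.5, 100: 0.9}

summary = {}
for dose, fraction in dose_to_fraction.items():
    cells = simulate_population(fraction)
    summary[dose] = (bulk_mean(cells), responder_mean(cells))
    print(f"dose {dose}: bulk mean = {summary[dose][0]:.1f}, "
          f"responder mean = {summary[dose][1]:.1f}")
```

In this toy population the bulk mean rises with dose while the signal per responding cell stays fixed, so a graded bulk curve alone cannot distinguish a graded response from a changing fraction of responders.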
Strategy 6: Using Perturbations to Reveal Network Wiring
To infer network models that describe how a network responds to stimuli, as well as through what molecular interactions and mechanisms this sensing and response occurs, comprehensive profiles must be measured following perturbations. We consider three methods to perturb the system: RNAi, drugs, and natural variation. As this strategy is still under development, this section is more speculative. Measuring network behavior following an RNAi perturbation uncovers the functions of a gene and provides definitive causal links between network components. A key strength of RNAi is that it can be used effectively to target any desired gene. However, RNAi also has limitations due to its slow kinetics and potential nonspecific cellular responses (e.g., innate immune response to double-stranded RNA, overloading of the RNAi machinery, and off-target effects). Using RNAi-based perturbations followed by comprehensive measurements, Amit et al. (2009) recently developed a network model of transcriptional regulation in the pathogen-sensing response. Candidate regulators and a reduced signature response were first selected from microarray data of cells stimulated with pathogens. Each candidate was then knocked down with RNAi, and the effect on the signature was quantified. This strategy uncovered many new factors involved in pathogen sensing and generated an informative network wiring diagram that revealed new crosstalk and feedback in these pathways. This strategy and its variations should succeed in reconstructing medium-size molecular networks in other systems. A second perturbation to consider is small molecules, which often have unique and valuable properties for network modeling and direct relevance to patient care.
First, in contrast to RNAi kinetics, the instantaneous action of small molecules allows for accurate control of both dose and timing, leading to simpler interpretations of its effects, without the need to consider network adaptation. Second, small molecules can have specific biochemical effects on proteins, leading to elimination of edges in the network, rather than entire nodes as RNAi does. By comprehensive monitoring of the resulting changes in the network upon drug perturbation, we can refine network models and, importantly, discover how pathway activation, crosstalk, and feedback differ across individual tumors with variable levels of drug sensitivity. Third, variation in the DNA across individuals is a powerful resource for studying the effects of perturbation on network function. It is also effective for detecting regulatory interactions, uncovering complex phenotypes, and inferring networks (Lee et al., 2006). In contrast to deliberate and somewhat dramatic disruption of a gene’s function through RNAi or drugs, more subtle effects, such as the attenuation or alteration of function, can be observed in genetically divergent individuals. Natural variation provides us with numerous genetic alterations in various combinations, as selected by evolution to produce functional pathways. By monitoring functional pathways in action, we can infer how network components work together under different conditions. Each individual’s genetic variation provides distinct information linking genotype and phenotype and helps to explain network behavior.
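The contrast drawn above between removing a node (RNAi) and removing an edge (a small molecule) can be sketched on a toy directed network. The node names, the wiring, and the use of simple reachability as a proxy for signal flow are all assumptions made for illustration, not a model from the studies cited.

```python
from collections import deque

def reachable(network, source):
    """Breadth-first search: every node a signal from `source` can reach."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in network.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def knock_down_node(network, node):
    """RNAi-like perturbation: drop a node and every edge touching it."""
    return {src: [dst for dst in dsts if dst != node]
            for src, dsts in network.items() if src != node}

def block_edge(network, src, dst):
    """Small-molecule-like perturbation: drop a single interaction only."""
    return {s: [d for d in dsts if not (s == src and d == dst)]
            for s, dsts in network.items()}

# Hypothetical signaling network: source node -> list of downstream targets.
network = {
    "Receptor": ["KinaseA", "KinaseB"],
    "KinaseA": ["TF1", "TF2"],
    "KinaseB": ["TF2"],
    "TF1": [],
    "TF2": [],
}

after_rnai = reachable(knock_down_node(network, "KinaseA"), "Receptor")
after_drug = reachable(block_edge(network, "KinaseA", "TF1"), "Receptor")
print("after node knockdown:", sorted(after_rnai))
print("after edge block:   ", sorted(after_drug))
```

In this toy wiring both perturbations cut TF1 off from the receptor, but only the edge block leaves KinaseA itself active, and TF2 stays reachable in both cases through the parallel route via KinaseB.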
What still needs to be developed is an integrated experimental-computational strategy that combines stimulations and perturbations with functional measurements from the same cells to build network models. Variation in stimuli and environment allows us to derive what the network is computing, and perturbations to its components elucidate how the network is computing. This suggests expanding the framework set forth by Amit and colleagues (Amit et al., 2009) to additional dimensions, including a time series of gene expression and proteomic measurements, following each combination of stimuli and perturbations. Natural variation between individuals and tumors combined with targeted perturbations using RNAi or drugs will provide particularly powerful data for deriving tumor network models. Executing the experimental design proposed above requires technological developments. Much of the dynamics occurs at the level of proteins and their modifications, raising the need for high-throughput proteomics to measure protein abundances and activity states. Importantly, the proposed design requires assaying a prohibitively large number of samples. To make significant progress in the understanding of molecular networks, there is a critical need for the development of more economical multiplex functional assays that can measure thousands of molecular species per sample at low sample cost. An iterative approach, in which computational modeling with existing data guides the selection of the next set of experiments, will provide the most cost-effective design (Figure 2). New experimental technologies are rapidly progressing, with computational efforts lagging behind. For example, generating transcriptome sequence reads is easy, but their assembly remains challenging. To utilize the enormous potential of the data types delineated above, significant advances in computational modeling are required. 
Specifically, there is a need for a transition from static and qualitative models to temporal and quantitative models.

Future: Personalized Cancer Medicine

Networks govern fundamental processes, such as the development of a multicellular organism from a single cell and communication between immune cells in response to a pathogen. Fueled by technology and computation, research in the coming decade is expected to unravel the details and principles behind diverse molecular networks and how they compute life’s functions. For example, the ongoing revolution that has enabled the sequencing of individuals provides the first opportunities to systematically study and explain how DNA variation results in our phenotypic diversity. Reaching these goals, however, will also necessitate a deeper understanding of the biophysical principles underlying signal processing in small biological circuits and how these come together in systems of increasing size and complexity.

Within cancer research, systems biology is dramatically advancing our mechanistic understanding of tumor progression and the design of personalized therapeutics. Continued success, however, will depend on critical advances in both experimental and computational methods. Improvements in tools for measurement (especially mass spectrometry and cost-effective multiplex detection) and perturbation (especially RNAi and small molecules) will fill in our understanding of the many molecular
layers that underlie network function. On the computational end, the key bottleneck is the development of validated computational methods that integrate heterogeneous data and build differential-network models on a per tumor basis. These methods are required to: (1) identify the genetic aberrations and the master regulators that drive proliferation, survival, metastasis, and drug resistance; (2) model the adaptive/feedback mechanisms that thwart the efficacy of potent drugs; and (3) predict additional target pathways for combinatorial drug treatment. Based on these predictions, more data can be collected to refine the models in iterative rounds of computation and experiments. As three-dimensional models of cancer (Ridky et al., 2010) continue to develop, we can also profile multiple cell types in a tumor environment and model the interactions between these. In short, these studies should teach us what drives cancers and what part of the networks we should target, both initially and after the network adapts and mutates. Many of us believe that the ultimate solutions to minimizing cancer reside in the regime of combinatorial patient-specific drug therapy, immunotherapy, and gene therapy. Accurate quantitative models of tumor networks should predict the effects of drug perturbations and thus enable sophisticated rational therapy with optimized dosage, timing, and drug combination for each individual tumor. Drug combinations can address feedback and network adaptation, ensuring shutdown of the necessary pathways. Additionally, drug combinations can target distinct subpopulations within a tumor. Tumor networks are armed with the ability to adapt and rapidly evolve and, thus, are a powerful adversary. These need to be met with equally sophisticated and flexible therapy regimes that can track these adaptations and dynamically adapt over time, placing us several moves ahead of the tumor. 
Studying the emergence of drug resistance both in vitro (Johannessen et al., 2010) and in vivo can better inform methods to anticipate potential paths of resistance. The ultimate therapies would involve sending “networks” in vivo to track tumor behavior and control the dosage and timing of drug release in response to tumor behavior. This long-term goal should become feasible as the fields of network biology, synthetic biology, and appropriate drug delivery methods mature. In the immediate future, however, our goal should be to anticipate and monitor real-time changes in the tumor’s network and adapt our therapies accordingly.

ACKNOWLEDGMENTS

The authors would like to thank Arnon Arazi, Andrea Califano, William Hahn, Andreja Jovic, Oren Litvin, Neal Rosen, Sagi Shapira, and Cathy Wu for valuable comments. The authors would like to thank Oren Litvin for help with the illustrations. This research was supported by the NIH Director’s New Innovator Award Program through grant numbers DP2-OD002414-01 (D.P.) and DP2 OD002230 (N.H.), as well as NIAID U54 AI057159 (N.H.). D.P. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and a Packard Fellowship for Science and Engineering.
REFERENCES

Akavia, U.D., Litvin, O., Kim, J., Sanchez-Garcia, F., Kotliar, D., Causton, H.C., Pochanard, P., Mozes, E., Garraway, L.A., and Pe’er, D. (2010). An integrated approach to uncover drivers of cancer. Cell 143, 1005–1017.
Cell 144, March 18, 2011 © 2011 Elsevier Inc. 871
Amit, I., Citri, A., Shay, T., Lu, Y., Katz, M., Zhang, F., Tarcic, G., Siwak, D., Lahad, J., Jacob-Hirsch, J., et al. (2007). A module of negative feedback regulators defines growth factor signaling. Nat. Genet. 39, 503–512.
Amit, I., Garber, M., Chevrier, N., Leite, A.P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J.K., Li, W., Zuk, O., et al. (2009). Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257–263.
Bandyopadhyay, S., Chiang, C.Y., Srivastava, J., Gersten, M., White, S., Bell, R., Kurschner, C., Martin, C.H., Smoot, M., Sahasrabudhe, S., et al. (2010). A human MAP kinase interactome. Nat. Methods 7, 801–805.
Basso, K., Margolin, A.A., Stolovitzky, G., Klein, U., Dalla-Favera, R., and Califano, A. (2005). Reverse engineering of regulatory networks in human B cells. Nat. Genet. 37, 382–390.
Beroukhim, R., Mermel, C.H., Porter, D., Wei, G., Raychaudhuri, S., Donovan, J., Barretina, J., Boehm, J.S., Dobson, J., Urashima, M., et al. (2010). The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905.
Boehm, J.S., Zhao, J.J., Yao, J., Kim, S.Y., Firestein, R., Dunn, I.F., Sjostrom, S.K., Garraway, L.A., Weremowicz, S., Richardson, A.L., et al. (2007). Integrative genomic approaches identify IKBKE as a breast cancer oncogene. Cell 129, 1065–1079.
Bonneau, R., Facciotti, M.T., Reiss, D.J., Schmid, A.K., Pan, M., Kaur, A., Thorsson, V., Shannon, P., Johnson, M.H., Bare, J.C., et al. (2007). A predictive model for transcriptional control of physiology in a free living cell. Cell 131, 1354–1365.
Bos, P.D., Zhang, X.H., Nadal, C., Shu, W., Gomis, R.R., Nguyen, D.X., Minn, A.J., van de Vijver, M.J., Gerald, W.L., Foekens, J.A., and Massagué, J. (2009). Genes that mediate breast cancer metastasis to the brain. Nature 459, 1005–1009.
Cancer Genome Atlas Research Network. (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068.
Carracedo, A., Ma, L., Teruya-Feldstein, J., Rojo, F., Salmena, L., Alimonti, A., Egia, A., Sasaki, A.T., Thomas, G., Kozma, S.C., et al. (2008). Inhibition of mTORC1 leads to MAPK pathway activation through a PI3K-dependent feedback loop in human cancer. J. Clin. Invest. 118, 3065–3074.
Carro, M.S., Lim, W.K., Alvarez, M.J., Bollo, R.J., Zhao, X., Snyder, E.Y., Sulman, E.P., Anne, S.L., Doetsch, F., Colman, H., et al. (2010). The transcriptional network for mesenchymal transformation of brain tumours. Nature 463, 318–325.
Carter, H., Chen, S., Isik, L., Tyekucheva, S., Velculescu, V.E., Kinzler, K.W., Vogelstein, B., and Karchin, R. (2009). Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res. 69, 6660–6667.
Chen, Y., Zhu, J., Lum, P.Y., Yang, X., Pinto, S., MacNeil, D.J., Zhang, C., Lamb, J., Edwards, S., Sieberts, S.K., et al. (2008). Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435.
Chen, B.J., Causton, H.C., Mancenido, D., Goddard, N.L., Perlstein, E.O., and Pe’er, D. (2009). Harnessing gene expression to identify the genetic basis of drug resistance. Mol. Syst. Biol. 5, 310.
Cohen, A.A., Geva-Zatorsky, N., Eden, E., Frenkel-Morgenstern, M., Issaeva, I., Sigal, A., Milo, R., Cohen-Saidon, C., Liron, Y., Kam, Z., et al. (2008). Dynamic proteomics of individual cancer cells in response to a drug. Science 322, 1511–1516.
Cohen-Saidon, C., Cohen, A.A., Sigal, A., Liron, Y., and Alon, U. (2009). Dynamics and variability of ERK2 response to EGF in individual living cells. Mol. Cell 36, 885–893.
Ebert, B.L., Pretz, J., Bosco, J., Chang, C.Y., Tamayo, P., Galili, N., Raza, A., Root, D.E., Attar, E., Ellis, S.R., and Golub, T.R. (2008). Identification of RPS14 as a 5q- syndrome gene by RNA interference screen. Nature 451, 335–339.
Flaherty, K.T., Puzanov, I., Kim, K.B., Ribas, A., McArthur, G.A., Sosman, J.A., O’Dwyer, P.J., Lee, R.J., Grippo, J.F., Nolop, K., and Chapman, P.B. (2010). Inhibition of mutated, activated BRAF in metastatic melanoma. N. Engl. J. Med. 363, 809–819.
Friedman, N., Linial, M., Nachman, I., and Pe’er, D. (2000). Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620.
Hanahan, D., and Weinberg, R.A. (2011). Hallmarks of cancer: The next generation. Cell 144, 646–674.
Hölzel, M., Huang, S., Koster, J., Ora, I., Lakeman, A., Caron, H., Nijkamp, W., Xie, J., Callens, T., Asgharzadeh, S., et al. (2010). NF1 is a tumor suppressor in neuroblastoma that determines retinoic acid response and disease outcome. Cell 142, 218–229.
Irish, J.M., Hovland, R., Krutzik, P.O., Perez, O.D., Bruserud, O., Gjertsen, B.T., and Nolan, G.P. (2004). Single cell profiling of potentiated phospho-protein networks in cancer cells. Cell 118, 217–228.
Johannessen, C.M., Boehm, J.S., Kim, S.Y., Thomas, S.R., Wardwell, L., Johnson, L.A., Emery, C.M., Stransky, N., Cogdill, A.P., Barretina, J., et al. (2010). COT drives resistance to RAF inhibition through MAP kinase pathway reactivation. Nature 468, 968–972.
Land, H., Parada, L.F., and Weinberg, R.A. (1983). Tumorigenic conversion of primary embryo fibroblasts requires at least two cooperating oncogenes. Nature 304, 596–602.
Lee, S.I., Pe’er, D., Dudley, A.M., Church, G.M., and Koller, D. (2006). Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proc. Natl. Acad. Sci. USA 103, 14062–14067.
Litvin, O., Causton, H.C., Chen, B.J., and Pe’er, D. (2009). Modularity and interactions in the genetics of gene expression. Proc. Natl. Acad. Sci. USA 106, 6441–6446.
Luo, B., Cheung, H.W., Subramanian, A., Sharifnia, T., Okamoto, M., Yang, X., Hinkle, G., Boehm, J.S., Beroukhim, R., Weir, B.A., et al. (2008). Highly parallel identification of essential genes in cancer cells. Proc. Natl. Acad. Sci. USA 105, 20380–20385.
Maher, C.A., Kumar-Sinha, C., Cao, X., Kalyana-Sundaram, S., Han, B., Jing, X., Sam, L., Barrette, T., Palanisamy, N., and Chinnaiyan, A.M. (2009). Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101.
Mani, K.M., Lefebvre, C., Wang, K., Lim, W.K., Basso, K., Dalla-Favera, R., and Califano, A. (2008). A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol. Syst. Biol. 4, 169.
Ng, S.B., Buckingham, K.J., Lee, C., Bigham, A.W., Tabor, H.K., Dent, K.M., Huff, C.D., Shannon, P.T., Jabs, E.W., Nickerson, D.A., et al. (2010). Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42, 30–35.
Ngo, V.N., Young, R.M., Schmitz, R., Jhavar, S., Xiao, W., Lim, K.H., Kohlhammer, H., Xu, W., Yang, Y., Zhao, H., et al. (2010). Oncogenically active MYD88 mutations in human lymphoma. Nature 470, 115–119.
O’Reilly, K.E., Rojo, F., She, Q.B., Solit, D., Mills, G.B., Smith, D., Lane, H., Hofmann, F., Hicklin, D.J., Ludwig, D.L., et al. (2006). mTOR inhibition induces upstream receptor tyrosine kinase signaling and activates Akt. Cancer Res. 66, 1500–1508.
Ornatsky, O.I., Lou, X., Nitz, M., Schäfer, S., Sheldrick, W.S., Baranov, V.I., Bandura, D.R., and Tanner, S.D. (2008). Study of cell antigens and intracellular DNA by identification of element-containing labels and metallointercalators using inductively coupled plasma mass spectrometry. Anal. Chem. 80, 2539–2547.
Pe’er, D., Regev, A., Elidan, G., and Friedman, N. (2001). Inferring subnetworks from perturbed expression profiles. Bioinformatics 17 (Suppl 1), S215–S224.
Poulikakos, P.I., Zhang, C., Bollag, G., Shokat, K.M., and Rosen, N. (2010). RAF inhibitors transactivate RAF dimers and ERK signalling in cells with wild-type BRAF. Nature 464, 427–430.
Ridky, T.W., Chow, J.M., Wong, D.J., and Khavari, P.A. (2010). Invasive three-dimensional organotypic neoplasia from multiple normal human epithelia. Nat. Med. 16, 1450–1455.
Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D.A., and Nolan, G.P. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529.
Sachs, K., Itani, S., Carlisle, J., Nolan, G.P., Pe’er, D., and Lauffenburger, D.A. (2009). Learning signaling network structures with sparsely distributed data. J. Comput. Biol. 16, 201–212.
Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176.
Shapira, S.D., Gat-Viks, I., Shum, B.O., Dricot, A., de Grace, M.M., Wu, L., Gupta, P.B., Hao, T., Silver, S.J., Root, D.E., et al. (2009). A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255–1267.
Sharma, S.V., Haber, D.A., and Settleman, J. (2010a). Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents. Nat. Rev. Cancer 10, 241–253.
Sharma, S.V., Lee, D.Y., Li, B., Quinlan, M.P., Takahashi, F., Maheswaran, S., McDermott, U., Azizian, N., Zou, L., Fischbach, M.A., et al. (2010b). A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell 141, 69–80.
Stephens, P.J., McBride, D.J., Lin, M.L., Varela, I., Pleasance, E.D., Simpson, J.T., Stebbings, L.A., Leroy, C., Edkins, S., Mudie, L.J., et al. (2009). Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010.
Tay, S., Hughey, J.J., Lee, T.K., Lipniacki, T., Quake, S.R., and Covert, M.W. (2010). Single-cell NF-kappaB dynamics reveal digital activation and analogue information processing. Nature 466, 267–271.
Wang, K., Saito, M., Bisikirska, B.C., Alvarez, M.J., Lim, W.K., Rajbhandari, P., Shen, Q., Nemenman, I., Basso, K., Margolin, A.A., et al. (2009). Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat. Biotechnol. 27, 829–839.
Weir, B.A., Woo, M.S., Getz, G., Perner, S., Ding, L., Beroukhim, R., Lin, W.M., Province, M.A., Kraja, A., Johnson, L.A., et al. (2007). Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893–898.
Whitehurst, A.W., Bodemann, B.O., Cardenas, J., Ferguson, D., Girard, L., Peyton, M., Minna, J.D., Michnoff, C., Hao, W., Roth, M.G., et al. (2007). Synthetic lethal screen identification of chemosensitizer loci in cancer cells. Nature 446, 815–819.
Leading Edge
Primer

Modeling the Cell Cycle: Why Do Certain Circuits Oscillate?

James E. Ferrell, Jr.,1,2,* Tony Yu-Chen Tsai,1,2 and Qiong Yang1
1Department of Chemical and Systems Biology
2Department of Biochemistry
Stanford University School of Medicine, Stanford, CA 94305-5174, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.006
Computational modeling and the theory of nonlinear dynamical systems allow one to not simply describe the events of the cell cycle, but also to understand why these events occur, just as the theory of gravitation allows one to understand why cannonballs fly in parabolic arcs. The simplest examples of the eukaryotic cell cycle operate like autonomous oscillators. Here, we present the basic theory of oscillatory biochemical circuits in the context of the Xenopus embryonic cell cycle. We examine Boolean models, delay differential equation models, and especially ordinary differential equation (ODE) models. For ODE models, we explore what it takes to get oscillations out of two simple types of circuits (negative feedback loops and coupled positive and negative feedback loops). Finally, we review the procedures of linear stability analysis, which allow one to determine whether a given ODE model and a particular set of kinetic parameters will produce oscillations.

In many eukaryotic cells, the cell cycle proceeds as a sequence of contingent events. A new cell must first grow to a sufficient size before it can begin DNA replication. Then, the cell must complete DNA replication before it can begin mitosis. Finally, the cell must successfully organize a metaphase spindle before it can complete mitosis and begin the cycle again. If cell growth, DNA replication, or spindle assembly is slowed down, the entire cell cycle slows. Thus, this type of cell cycle is like an “assembly line” or “succession of dominoes” (Hartwell and Weinert, 1989; Murray and Kirschner, 1989b).

However, some cell cycles are qualitatively different in terms of their dynamics. Most notable of these exceptions is the early embryonic cell cycle in the amphibian Xenopus laevis. DNA replication is not contingent upon cell growth, probably because the frog egg is so big to start with. Mitotic entry is not contingent upon completion of DNA replication, and mitotic exit is not contingent upon the successful assembly of a metaphase spindle because the relevant checkpoints are ineffective in the context of the embryo’s high cytoplasm:nucleus ratio (Dasso and Newport, 1990; Minshull et al., 1994). Lacking these contingencies, the early embryo simply pulses once every 25 min, irrespective of whether the endpoints of the cell cycle (DNA replication and mitosis) have been completed (Hara et al., 1980). Thus, this cell cycle is clock-like (Murray and Kirschner, 1989b); it behaves as if it is being driven by an autonomous biochemical oscillator.

Although many biological processes seem almost unfathomably complex and incomprehensible, oscillators and clocks are the types of processes that we might have a good chance of not just describing, but also understanding. Accordingly, much effort has gone into understanding how simple cell cycles work in model systems like Xenopus embryos and the fungi S. pombe
and S. cerevisiae. This requires the identification of the proteins and genes needed for the embryonic cell cycle and the elucidation of the regulatory processes that connect these proteins and genes. Over the past three decades, enormous progress has been made toward these ends. In each case, the cell cycle is driven by a protein circuit centered on the cyclin-dependent protein kinase CDK1 and the anaphase-promoting complex (APC) (Figure 1A). The activation of CDK1 drives the cell into mitosis, whereas the activation of APC, which generally lags behind CDK1, drives the cell back out (Figure 1B). There are still some missing components and poorly understood connections, but overall, the cell-cycle network is fairly well mapped out.

But a satisfying understanding of why the CDK1/APC system oscillates requires more than a description of components and connections; it requires an understanding of why any regulatory circuit would oscillate instead of simply settling down into a stable steady state. What types of biochemical circuits can oscillate, and what is required of the individual components of the circuit to permit oscillations? Such insights are provided by the theory of nonlinear dynamics and by computational modeling.

Indeed, cell-cycle modeling has become a very popular pursuit. Hundreds of models have been published (Table 1), beginning with Kauffman, Wille, and Tyson’s prescient proposal that the cell cycle of the yellow slime mold Physarum polycephalum is driven by a relaxation oscillator (Kauffman and Wille, 1975; Tyson and Kauffman, 1975). Many of the early models, and a few of the more recent models, were simple, as models in physics typically are. They consisted of a small number of ordinary differential equations relating a few time-dependent variables (e.g., protein concentrations or activities) to each other and to a few time-independent kinetic parameters. The purpose of this type of modeling is to understand in simpler, albeit more abstract, terms how and why the cell cycle works. Through time, many of the models have become more complicated and more like chemical engineering models, consisting of dozens of variables and regulatory processes. The purpose of this type of modeling is to account for and test our understanding of specific details of the system that, because of the complexity of the system, cannot always be understood through intuition. This type of detailed model has successfully accounted for the phenotypes of dozens of budding yeast mutants (Chen et al., 2004). Both types of modeling have their place in understanding cell-cycle regulation, and both have their adherents.

Modeling approaches range from simple Boolean modeling to stochastic modeling and partial differential equation modeling. However, to date, the majority of effort has focused on ordinary differential equation (ODE) modeling (Table 1), which gets at the basic solution phase biochemistry of cell-cycle regulation.

Here, we address the question of what it takes to make a simple protein circuit like the CDK1/APC system oscillate. We will start with Boolean modeling, which provides intuition into the logic of biochemical oscillators. We then move on to ODE models, which translate this logic into chemical terms. The basic methods for analyzing ODE models of oscillators are well known in the field of nonlinear dynamics but are not so well known among biologists. We believe that it is high time that they were; after all, we biologists are studying what are probably the world’s most interesting nonlinear dynamical systems. We will emphasize the basic concepts of oscillator function and, to the extent possible, keep the algebra to a minimum. For further information, the reader is directed to lucid reviews by Goldbeter (Goldbeter, 2002) and Novák and Tyson (Novák and Tyson, 2008; Tyson et al., 2003), as well as Strogatz’s outstanding textbook (Strogatz, 1994).

Figure 1. Simplified Depiction of the Embryonic Cell Cycle, Highlighting the Main Regulatory Loops
(A) Cyclin-CDK1 is the master regulator of mitosis. APC-Cdc20 is an E3 ubiquityl ligase, which marks mitotic cyclins for degradation by the proteasome. Wee1 is a protein kinase that inactivates cyclin-CDK1. Cdc25 is a phosphoprotein phosphatase that activates cyclin-CDK1. Not shown here is Plk1, which cooperates with cyclin-CDK1 in the activation of APC-Cdc20.
(B) In the Xenopus embryo, the activation of CDK1 drives the cell into mitosis, whereas the activation of APC, which generally lags behind CDK1, drives the cell back out of mitosis.
Boolean Models

We begin by paring the cell cycle down to a simple two-component model in which CDK1 activates APC and APC inactivates CDK1 (Figure 2B). This is the essential negative feedback loop upon which the cell-cycle oscillator is built (Murray et al., 1989). Perhaps the simplest way to think about the dynamics of a system like this is through Boolean or logical analysis (Glass and Kauffman, 1973). Suppose that both CDK1 and APC are perfectly switch-like in their regulation; that is, they are either completely on or completely off. Then, together, the system of CDK1 plus APC has four possible discrete states (APCon/CDK1on, APCon/CDK1off, APCoff/CDK1on, and APCoff/CDK1off) (Figure 2E).

Now suppose the system starts in an interphase-like state, with APCoff/CDK1off. In the first increment of time, what will happen? If the APC is off, then CDK1 turns on. Thus, we define a rule: state 1, with APCoff/CDK1off, goes to state 2 with APCoff/CDK1on. Next, the active CDK1 activates APC; thus, state 2 goes to state 3. The active APC then inactivates CDK1, and state 3 goes to state 4. Finally, in the absence of active CDK1, the APC becomes inactive, and state 4 goes to state 1. This completes the cycle.

We can depict the dynamics of this oscillator as a diagram in “state space” (Figure 2E). The model goes through a never-ending cycle, and all of the possible states of the system are visited during each run through the cycle.

If we add one more component to the system, for example, a protein like Polo-like kinase 1 (Plk1), which here we assume is activated by CDK1 and, in turn, contributes to the activation of APC (Figure 2C), then there are eight (2 × 2 × 2) possible states for the system. If we start with all of the proteins off and assume six biologically reasonable rules (active CDK1 activates Plk1, active Plk1 activates APC, active APC inactivates CDK1, etc.), once again we get a never-ending cycle of states (Figure 2F). But this time, only some of the possible states (states 1–6 in Figure 2F) lie on the cycle. The other two states (7 and 8) feed into the cycle in a manner determined by the rules we assume. Thus, no matter where the system starts, it will converge to the cycle sooner or later. The behavior of this Boolean model is analogous to “limit cycle oscillations,” which we will encounter again in the next section.

With Boolean models, it is easy to obtain oscillations. Indeed, one can even get oscillations from a model with a single species (CDK1) that flips on when it is off and flips off when it is on (Figures 2A and 2D), a discrete representation of a protein that negatively regulates itself.
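The Boolean loops described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a synchronous update scheme (every species is recomputed at once from the previous state), which the text does not specify, and the function names `one_component`, `two_component`, `three_component`, and `orbit` are hypothetical.

```python
# Boolean negative feedback loops, updated synchronously (an assumption).
# States are tuples of booleans; True means "on", False means "off".

def one_component(state):
    """CDK1 negatively regulates itself: it flips at every step."""
    (cdk1,) = state
    return (not cdk1,)

def two_component(state):
    """CDK1 turns on when APC is off; APC follows CDK1."""
    cdk1, apc = state
    return (not apc, cdk1)

def three_component(state):
    """CDK1 activates Plk1, Plk1 activates APC, APC inactivates CDK1."""
    cdk1, plk1, apc = state
    return (not apc, cdk1, plk1)

def orbit(update, start):
    """Iterate `update` from `start` until a state repeats.

    Returns the list of distinct states visited and the period of the
    cycle that the trajectory ends up on.
    """
    seen = {}
    states = []
    state = start
    while state not in seen:
        seen[state] = len(states)
        states.append(state)
        state = update(state)
    return states, len(states) - seen[state]

if __name__ == "__main__":
    for name, update, start in [
        ("one species (CDK1)", one_component, (False,)),
        ("two species (CDK1, APC)", two_component, (False, False)),
        ("three species (CDK1, Plk1, APC)", three_component, (False, False, False)),
    ]:
        states, period = orbit(update, start)
        print(name, "period:", period)
        for s in states:
            print("   ", tuple(int(v) for v in s))
```

Under this synchronous assumption, each loop started from the all-off state returns to it: after two steps for one species, four for two species, and six for three species. How the two remaining states of the three-species model behave depends on the update rules chosen, and this sketch does not attempt to reproduce the particular rules behind Figure 2F.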
Table 1. Some Mathematical Models of the Eukaryotic Cell Cycle

| Year | Organism/Cell Type | Type of Model | Reference |
| --- | --- | --- | --- |
| 1970 | No specific organism | ODE | (Sel’kov, 1970) |
| 1974 | No specific organism | ODE | (Gilbert, 1974) |
| 1975 | Physarum polycephalum | ODE | (Kauffman and Wille, 1975) |
| 1975 | Physarum polycephalum | ODE | (Tyson and Kauffman, 1975) |
| 1991 | Xenopus laevis embryos | ODE | (Goldbeter, 1991) |
| 1991 | Xenopus embryos | ODE | (Norel and Agur, 1991) |
| 1991 | Xenopus embryos, somatic cells | ODE | (Tyson, 1991) |
| 1992 | Xenopus embryos | ODE | (Obeyesekere et al., 1992) |
| 1993 | Xenopus embryos | ODE | (Novak and Tyson, 1993a) |
| 1993 | Xenopus embryos | ODE | (Novak and Tyson, 1993b) |
| 1994 | Xenopus embryos | ODE, delay differential equations | (Busenberg and Tang, 1994) |
| 1996 | Xenopus embryos | ODE | (Goldbeter and Guilmot, 1996) |
| 1997 | S. pombe | ODE | (Novak and Tyson, 1997) |
| 1998 | S. pombe | ODE | (Novak et al., 1998) |
| 1998 | Xenopus embryos | ODE | (Borisuk and Tyson, 1998) |
| 1999 | Mammalian somatic cells | ODE | (Aguda and Tang, 1999) |
| 2003 | Xenopus embryos | ODE | (Pomerening et al., 2003) |
| 2003 | S. cerevisiae | ODE | (Ciliberto et al., 2003) |
| 2004 | S. cerevisiae | ODE | (Chen et al., 2004) |
| 2004 | S. cerevisiae | Boolean | (Li et al., 2004) |
| 2004 | S. pombe | Stochastic | (Steuer, 2004) |
| 2005 | Xenopus embryos | ODE | (Pomerening et al., 2005) |
| 2006 | Mammalian somatic cells | Delay differential equations | (Srividhya and Gopinathan, 2006) |
| 2006 | S. cerevisiae | Stochastic | (Zhang et al., 2006) |
| 2007 | S. cerevisiae | Stochastic | (Braunewell and Bornholdt, 2007) |
| 2007 | S. cerevisiae | Stochastic | (Okabe and Sasai, 2007) |
| 2007 | S. cerevisiae | Hybrid | (Barberis et al., 2007) |
| 2008 | Xenopus embryos | ODE | (Tsai et al., 2008) |
| 2008 | S. cerevisiae | Stochastic | (Ge et al., 2008) |
| 2008 | S. cerevisiae | Stochastic | (Mura and Csikász-Nagy, 2008) |
| 2008 | S. pombe | Boolean | (Davidich and Bornholdt, 2008) |
| 2008 | Mammalian somatic cells | ODE | (Yao et al., 2008) |
| 2009 | Mammalian somatic cells | ODE | (Alfieri et al., 2009) |
| 2010 | S. cerevisiae | ODE | (Charvin et al., 2010) |
| 2010 | S. cerevisiae, S. pombe | Boolean | (Mangla et al., 2010) |
| 2010 | S. pombe | ODE | (Li et al., 2010) |
Figure 2. Boolean Models of CDK1 Regulation
(A–C) Schematic representation of negative feedback loops composed of one (A), two (B), or three (C) species.
(D–F) Trajectories in state space for Boolean models of these three negative feedback systems. Solid lines represent limit cycles; dashed lines (in F) connect the states off the limit cycle to the limit cycle.

ODE Models of the CDK1/APC System

Although Boolean analysis is simple and appealing, it is not completely realistic. First, all three Boolean models with negative feedback loops (Figures 2A–2C) yielded oscillations even though we know that real negative feedback loops do not always oscillate. The problem is the simplifying assumptions that underpin Boolean analysis: the discrete activity states and time steps. Even if individual CDK1 and APC molecules actually flip between discrete on/off states, a cell contains a number of CDK1 and APC molecules, and they would not be expected to all flip simultaneously.

The framework for describing the dynamics of such a system is chemical kinetic theory, and, assuming that the numbers of CDK1 and APC molecules are large, the activation and inactivation of CDK1 and APC can be described by a set of differential equations. Here, we will build up an ODE model of the system, starting with a one-ODE model, which fails to produce oscillations. We then add additional complexity to the ODEs until the model succeeds in producing sustained, limit cycle oscillations.

A One-ODE Model

By definition, the rate of change of active CDK1 (denoted CDK1*) is the rate of CDK1 activation minus the rate of CDK1 inactivation. For simplicity, we will assume that CDK1 is activated by the rapid, high-affinity binding of cyclin, which is being synthesized at a constant rate of a1 (Equation 1, blue). For CDK1 inactivation, we will assume mass action kinetics (Equation 1, pink). This gives us the first-order differential equation:
$$\frac{d\,\mathrm{CDK1}^*}{dt} = a_1 - b_1\,\mathrm{CDK1}^*\,\mathrm{APC}^* \qquad \text{[Equation 1]}$$

There are two time-dependent variables, CDK1* and APC*. To allow the system to be described by an ODE with a single time-dependent variable (Figure 3A), we assume that the activity of APC is regulated rapidly enough by CDK1* so that it can be considered an instantaneous function of CDK1*. What functional form should we use for APC’s response function? Here, we will assume that APC’s response to CDK1* is ultrasensitive (sigmoidal in shape, like the response of a cooperative enzyme) and that the response is described by a Hill function. This assumption is reasonable because APC activation is a multistep process; multistep processes often yield ultrasensitive, sigmoidal responses; and, for our purposes, the Hill equation with a Hill coefficient (n) greater than 1 can be thought of as a generic sigmoidal function. Substituting a Hill function for APC* in Equation 1, we get a one-ODE model of a negative feedback loop:

$$\frac{d\,\mathrm{CDK1}^*}{dt} = a_1 - b_1\,\mathrm{CDK1}^*\,\frac{(\mathrm{CDK1}^*)^{n_1}}{K_1^{n_1} + (\mathrm{CDK1}^*)^{n_1}} \qquad \text{[Equation 2]}$$
We now choose, somewhat arbitrarily, values for the model's parameters (a1 = 0.1, b1 = 1, K1 = 0.5, n1 = 8) and initial conditions (CDK1*[0] = 0). We can then numerically integrate Equation 2 over time and see how the concentration of activated CDK1* evolves. As shown in Figure 3C, the system moves monotonically from its initial state toward a steady state; there is no hint of oscillation. This monotonic approach to steady state is observed no matter what we assume for the parameters and initial conditions. Thus, we have not yet built an oscillator model. Even though we were able to produce sustained oscillations with a one-variable Boolean model of a negative feedback loop (Figures 2A and 2D), translating the model into a differential equation eliminated the oscillations.

Another way of representing the system's behavior is through a phase plot, which shows all possible activities of the system. This is similar to the state-space plots that we used for the Boolean analysis, but instead of having a few discrete states, the phase plot displays a continuum, showing that the system moves smoothly between states (as we would expect, given that the numerous CDK1 molecules do not all activate simultaneously but turn on "smoothly"). The phase plot contains one dimension for each time-dependent variable. Therefore, in this one-variable model, the phase plot possesses one axis, representing the concentration of activated CDK1* (Figure 3B). In addition, the system's phase plot shows one stable steady state with CDK1* ≈ 0.43. If the system starts off with CDK1 activity less or greater than 0.43, the system will move along a trajectory back to 0.43. In other words, any initial condition to the left or right of the steady state yields a trajectory moving to the right or left, respectively.

Figure 3. A Model of CDK1 Regulation with One Differential Equation (A) Schematic of the model. The parameters chosen for the model were a1 = 0.1, b1 = 1, K1 = 0.5, and n1 = 8. (B) Trajectories in one-dimensional phase space, approaching a stable steady state (designated by the filled circle) at CDK1* ≈ 0.43. (C) Time course of the system, starting with CDK1*[0] = 0 and evolving toward the steady state.

Figure 4. A Two-ODE Model of CDK1 and APC Regulation (A) Schematic of the model. The parameters chosen for the model were a1 = 0.1, a2 = 3, b1 = 3, b2 = 1, K1 = 0.5, K2 = 0.5, n1 = 8, and n2 = 8. (B) Phase space depiction of the system. The red and green curves are the two nullclines of the system, which can be thought of as the steady-state response curves for the two individual legs of the feedback loop. The filled black circle at the intersection of the nullclines (with CDK1* ≈ 0.42 and APC* ≈ 0.37) represents a stable steady state. One trajectory is shown, starting at CDK1*[0] = 0, APC*[0] = 0, and spiraling in toward the stable steady state. (C) Time course of the system, showing damped oscillations approaching the steady state.
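The numerical integration of the one-ODE model described above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes a fixed-step fourth-order Runge-Kutta scheme, and the function names (`hill`, `dcdk1_dt`, `integrate`), the step size, and the end time are hypothetical choices made for this example. The parameters are those listed for Figure 3.

```python
def hill(x, K, n):
    """Hill function x^n / (K^n + x^n)."""
    return x**n / (K**n + x**n)


def dcdk1_dt(cdk1, a1=0.1, b1=1.0, K1=0.5, n1=8):
    """Right-hand side of the one-ODE model (Figure 3 parameters)."""
    return a1 - b1 * cdk1 * hill(cdk1, K1, n1)


def integrate(cdk1_0, t_end=25.0, dt=0.01):
    """Fixed-step fourth-order Runge-Kutta integration (an assumed scheme)."""
    x = cdk1_0
    trajectory = [x]
    for _ in range(int(round(t_end / dt))):
        k1 = dcdk1_dt(x)
        k2 = dcdk1_dt(x + 0.5 * dt * k1)
        k3 = dcdk1_dt(x + 0.5 * dt * k2)
        k4 = dcdk1_dt(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        trajectory.append(x)
    return trajectory


from_below = integrate(0.0)  # start to the left of the steady state
from_above = integrate(1.0)  # start to the right of the steady state
print(round(from_below[-1], 3), round(from_above[-1], 3))
```

Both trajectories should approach the same steady state near 0.43 without overshooting it, which is the monotonic behavior described in the text.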
A Two-ODE Model

Why did the one-variable Boolean model produce oscillations (Figures 2A and 2D), whereas the one-ODE model (Equation 2) did not (Figure 3)? The discrete time steps of the Boolean model help to segregate CDK1 activation from inactivation in time. Thus, perhaps adding another ODE (Figure 4A), which acknowledges the fact that APC regulation is not instantaneous, might allow us to generate oscillations. First, we write an ODE for the activation and inactivation of CDK1 (Equation 3). We once again assume that CDK1 is activated by a constant rate of cyclin synthesis (a1). We assume that the multistep process through which APC* inactivates CDK1* is described by a Hill function. The inactivation rate is therefore proportional to the concentration of CDK1* (the substrate being inactivated) times a Hill function of APC*.
Now for APC (Equation 4), we assume that the rate of its activation by CDK1* is proportional to the concentration of inactive APC (which, assuming the total concentration of active and inactive APC to be constant, we take to be 1 − APC*) times a Hill function of CDK1*, and the rate of inactivation of APC* is described by simple mass action kinetics. The resulting two-ODE model is:

$$\frac{d\,\mathrm{CDK1}^{*}}{dt} = a_1 - b_1\,\mathrm{CDK1}^{*}\,\frac{(\mathrm{APC}^{*})^{n_1}}{K_1^{n_1} + (\mathrm{APC}^{*})^{n_1}}$$ [Equation 3]

$$\frac{d\,\mathrm{APC}^{*}}{dt} = a_2\,(1 - \mathrm{APC}^{*})\,\frac{(\mathrm{CDK1}^{*})^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^{*})^{n_2}} - b_2\,\mathrm{APC}^{*}$$ [Equation 4]

Again, we choose kinetic parameters and initial conditions (as described in the caption to Figure 4) and integrate the ODEs numerically. The results are shown in Figures 4B and 4C. The CDK1 activity initially rises as the system moves from interphase (low CDK1 activity) toward M phase (high CDK1 activity) (Figure 4C). After a lag, the APC activity begins to rise too. Then, the rate of CDK1 inactivation (driven by APC activation) exceeds the rate of CDK1 activation (driven by cyclin synthesis), and the CDK1 activity starts to fall. After a few wiggles up and down, the system approaches a steady state with intermediate levels of both CDK1 and APC activities. Thus, we have generated damped oscillations, but not sustained oscillations.

Figure 4B shows the phase space view of these damped oscillations. The phase space is now two dimensional because there are two time-dependent variables. There is a stable steady state that sits at the intersection of two curves called the nullclines (green and red curves, Figure 4B). These two nullclines can be thought of as stimulus-response curves for the two individual legs of the CDK1/APC system. The red nullcline (defined by the equation dCDK1*/dt = 0) represents what the steady-state response of CDK1* to constant levels of APC activity would be if there were no feedback from CDK1* to APC* (Figure 4B). The green nullcline (defined by dAPC*/dt = 0) represents what the steady-state response of APC* to CDK1* would be if there were no feedback from APC* to CDK1* (Figure 4B). For the whole system to be in steady state, both time derivatives must be zero. Thus, the steady state for the entire system lies where the two nullclines intersect. The steady state is stable, and the trajectory of the system (black curve) spirals in from the initial values of CDK1* and APC* toward the stable steady state (Figure 4B).

A Three-ODE Model

Perhaps we can improve the oscillations by adding a third species to the model, which increases the lag between CDK1 activation and APC activation (Figure 4C). Here, we will add Plk1 back into the model, as we did in the three-component Boolean model (Figure 2C), with Plk1 assumed to act as an intermediary between CDK1 and APC. We now have three ODEs (Equations 5–7). The equation for the activation and inactivation of CDK1 stays the same (Equation 5). The activation of Plk1 by CDK1* is proportional to the concentration of inactive Plk1 (1 − Plk1*) times a Hill function of CDK1*, and the inactivation is proportional to Plk1* (Equation 6). A similar logic for the activation and inactivation of APC gives Equation 7.

$$\frac{d\,\mathrm{CDK1}^{*}}{dt} = a_1 - b_1\,\mathrm{CDK1}^{*}\,\frac{(\mathrm{APC}^{*})^{n_1}}{K_1^{n_1} + (\mathrm{APC}^{*})^{n_1}}$$ [Equation 5]

$$\frac{d\,\mathrm{Plk1}^{*}}{dt} = a_2\,(1 - \mathrm{Plk1}^{*})\,\frac{(\mathrm{CDK1}^{*})^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^{*})^{n_2}} - b_2\,\mathrm{Plk1}^{*}$$ [Equation 6]

$$\frac{d\,\mathrm{APC}^{*}}{dt} = a_3\,(1 - \mathrm{APC}^{*})\,\frac{(\mathrm{Plk1}^{*})^{n_3}}{K_3^{n_3} + (\mathrm{Plk1}^{*})^{n_3}} - b_3\,\mathrm{APC}^{*}$$ [Equation 7]

Figure 5. A Three-ODE Model of CDK1, Plk1, and APC Regulation (A) Schematic of the model. The parameters chosen for the model were a1 = 0.1, a2 = 3, a3 = 3, b1 = 3, b2 = 1, b3 = 1, K1 = 0.5, K2 = 0.5, K3 = 0.5, n1 = 8, n2 = 8, and n3 = 8. (B) Phase space depiction of the system. The two colored surfaces are two of the three null surfaces of the system. For clarity, we have omitted the third. The open circle at the intersection of the null surfaces (with CDK1* ≈ 0.43, Plk1* ≈ 0.42, and APC* ≈ 0.37) represents an unstable steady state (or unstable spiral). One trajectory is shown, starting at CDK1*[0] = 0, Plk1*[0] = 0, APC*[0] = 0, and spiraling in toward the limit cycle. (C) Time course of the system, showing sustained limit cycle oscillations.
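The two-ODE and three-ODE models can be integrated side by side to compare their long-term behavior. The sketch below is illustrative only: the Runge-Kutta integrator, the helper names (`two_ode`, `three_ode`, `rk4`, `late_amplitude`), the step size, and the choice of the last quarter of the run as the "late" window are all assumptions made for this example. The parameters are those listed in the captions of Figures 4 and 5.

```python
def hill(x, K=0.5, n=8):
    """Hill function with the K and n values used in Figures 4 and 5."""
    return x**n / (K**n + x**n)


def two_ode(state):
    """Two-ODE model: CDK1* inactivated by APC*, APC* activated by CDK1*."""
    cdk1, apc = state
    return (0.1 - 3.0 * cdk1 * hill(apc),
            3.0 * (1.0 - apc) * hill(cdk1) - 1.0 * apc)


def three_ode(state):
    """Three-ODE model: Plk1* relays the signal from CDK1* to APC*."""
    cdk1, plk1, apc = state
    return (0.1 - 3.0 * cdk1 * hill(apc),
            3.0 * (1.0 - plk1) * hill(cdk1) - 1.0 * plk1,
            3.0 * (1.0 - apc) * hill(plk1) - 1.0 * apc)


def rk4(rhs, state, t_end=60.0, dt=0.01):
    """Fixed-step fourth-order Runge-Kutta for a tuple-valued state."""
    s = tuple(state)
    trajectory = [s]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(s)
        k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(s, k1)))
        k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(s, k2)))
        k4 = rhs(tuple(x + dt * k for x, k in zip(s, k3)))
        s = tuple(x + dt * (a + 2 * b + 2 * c + d) / 6.0
                  for x, a, b, c, d in zip(s, k1, k2, k3, k4))
        trajectory.append(s)
    return trajectory


def late_amplitude(trajectory, fraction=0.25):
    """Peak-to-trough range of CDK1* over the final part of the run."""
    start = int(len(trajectory) * (1.0 - fraction))
    tail = [s[0] for s in trajectory[start:]]
    return max(tail) - min(tail)


amp_two = late_amplitude(rk4(two_ode, (0.0, 0.0)))
amp_three = late_amplitude(rk4(three_ode, (0.0, 0.0, 0.0)))
print("late CDK1* amplitude, two-ODE:  ", amp_two)
print("late CDK1* amplitude, three-ODE:", amp_three)
```

A late amplitude that shrinks toward zero indicates damped oscillations, whereas one that stays finite indicates sustained oscillations.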
We arbitrarily choose parameters and initial conditions, and eureka! We now have sustained oscillations (Figures 5B and 5C). Moreover, no matter what the initial conditions are, the system eventually approaches the same pattern of oscillations, with CDK1 activity peaking first, followed by Plk1 activity and then APC activity (Figure 5C). In the phase space view, this pattern of oscillations is a limit cycle, a closed loop of states that all trajectories spiral in or out toward (black curve, Figure 5B). With Equations 5–7, we finally have an ODE model of the Xenopus embryonic cell cycle that exhibits sustained limit cycle oscillations. The key features of this model include the presence of negative feedback, the fact that there are more than two components to the negative feedback loop, and the presence of ultrasensitivity in the individual steps of the loop. These last two features
help to generate a time delay in the negative feedback, which helps to keep the system from settling into a stable steady state.

Linear Stability Analysis

So far, we have confined ourselves to analyzing ODE models through simulations. This provides an intuitive feel for the behavior of a system, but of course, it is never possible to choose all possible values for the kinetic parameters or all possible initial conditions. Is there a way to explain theoretically, rather than computationally, why the one-ODE model failed to oscillate at all, the two-ODE model at best yielded damped oscillations, and the three-ODE model finally yielded sustained oscillations? The answer is yes, and probably the most straightforward approach is "linear stability analysis." Linear stability analysis is quite remarkable. It assesses the stability of the steady states of the system, and, almost magically, allows the dynamics of the system to be characterized even when the system is far from steady state. To get started with linear stability analysis, we will analyze the steady state of the one-ODE model described in Equation 2.

Linear Stability Analysis of the One-ODE Model

For notational simplicity, we will refer to the rate of change of CDK1* (dCDK1*/dt) as f. This function f can be thought of as a function of CDK1*, which in turn is a function of time. In terms of f, Equation 2 becomes:

$$f = a_1 - b_1\,\mathrm{CDK1}^{*}\,\frac{(\mathrm{CDK1}^{*})^{n_1}}{K_1^{n_1} + (\mathrm{CDK1}^{*})^{n_1}}$$ [Equation 8]

The system will have a steady state when the derivative dCDK1*/dt equals zero (that is, CDK1* is not changing with respect to time), which means that:

$$f = 0$$ [Equation 9]
We can calculate the value of CDK1* for which Equation 9 is true either numerically or algebraically. For the parameters used in Figure 3, CDK1*ss ≈ 0.43. To the left of the steady state, f is positive (Figure 6); thus, if CDK1* is less than its steady-state value, it will increase with time. Similarly, if CDK1* is greater than its steady-state value, it will decrease with time. This immediately shows that the steady state is stable. With linear stability analysis, we can push this further and determine how stable the steady state is, in quantitative terms.

Imagine that we perturb the system away from the steady state by some small increment δCDK1*. At what rate will CDK1* move back toward CDK1*ss (and δCDK1* move back toward zero)? In other words, how quickly does the system return to equilibrium? This question can be addressed algebraically with a Taylor series expansion, but perhaps it is easier to approach graphically. This is set up in Figure 6. The x axis represents the concentration of active CDK1*; the y axis represents the rate of change of CDK1*, f; and the blue curve depicts how f varies with CDK1*. When CDK1* = CDK1*ss ≈ 0.43, the system is at steady state and f = 0. To the left of the steady state, the value of f is positive, and the blue curve lies above the x axis. To the right of the steady state, the value of f is negative, and the blue curve lies below the axis. If the system is perturbed from the steady state by δCDK1*, the rate at which it will return toward the steady state is given by the value of f at CDK1*ss + δCDK1*. For small values of δCDK1*, we can approximate f(CDK1*ss + δCDK1*) by δCDK1* times the slope of the dashed red line (Figure 6), which is the tangent to the blue curve at the steady state. The slope of the dashed red line is defined to be the value of df/dCDK1* at CDK1*ss. Therefore, the rate at which CDK1* goes toward the steady state, which equals the rate at which δCDK1* goes toward zero, is given by:

$$\frac{d\,\delta\mathrm{CDK1}^{*}}{dt} = \text{slope}\cdot\delta\mathrm{CDK1}^{*} = \left.\frac{df}{d\,\mathrm{CDK1}^{*}}\right|_{\mathrm{CDK1}^{*}_{ss}}\cdot\delta\mathrm{CDK1}^{*}$$ [Equation 10]

For notational convenience, we will represent this slope by λ. Thus, Equation 10 becomes:

$$\frac{d\,\delta\mathrm{CDK1}^{*}}{dt} = \lambda\cdot\delta\mathrm{CDK1}^{*}$$ [Equation 11]

ODEs like Equation 11 show up over and over again in quantitative biology. And, fortunately, it is a particularly simple ODE, probably the only one that most biologists will ever need to solve analytically. Its solution is an exponential function and describes an exponential approach to steady state:

$$\delta\mathrm{CDK1}^{*}(t) = \delta\mathrm{CDK1}^{*}(0)\,e^{\lambda t}$$ [Equation 12]

Thus, to determine the stability of the steady state, one simply needs to determine the value of λ by evaluating the derivative df/dCDK1* at the steady state. If λ is negative, the steady state is stable and a small perturbation of the system will return exponentially toward the steady state with a half-time of ln 2/|λ|. The bigger the absolute value of λ, the faster the system approaches the steady state and, in a sense, the more stable the steady state is. We can now apply linear stability analysis to our one-ODE model (Equation 8). First, we differentiate the right side of the ODE with respect to CDK1*:

$$\frac{df}{d\,\mathrm{CDK1}^{*}} = -\,\frac{b_1\,(\mathrm{CDK1}^{*})^{n_1}\left[(\mathrm{CDK1}^{*})^{n_1} + K_1^{n_1}(1 + n_1)\right]}{\left[(\mathrm{CDK1}^{*})^{n_1} + K_1^{n_1}\right]^{2}}$$ [Equation 13]

Figure 6. Linear Stability Analysis for the One-ODE Model The blue curve represents f as a function of CDK1*. The dashed red line approximates f for small values of δCDK1*.
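The one-ODE stability calculation can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it finds the steady state by bisection (a method chosen here for simplicity), evaluates the analytical slope of f with respect to CDK1* at that point, and compares it against a central finite difference. The names `f`, `dfdx`, and `steady_state` are hypothetical; the parameters are those of Figure 3.

```python
import math

A1, B1, K1, N1 = 0.1, 1.0, 0.5, 8  # parameters from Figure 3


def f(x):
    """Rate of change of CDK1* in the one-ODE model."""
    return A1 - B1 * x * x**N1 / (K1**N1 + x**N1)


def dfdx(x):
    """Analytical derivative of f with respect to CDK1*."""
    xn, kn = x**N1, K1**N1
    return -B1 * xn * (xn + kn * (1 + N1)) / (xn + kn) ** 2


def steady_state(lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on f, which is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_ss = steady_state()
lam = dfdx(x_ss)                      # slope of f at the steady state
half_time = math.log(2) / abs(lam)    # time for a perturbation to halve

h = 1e-6                              # step for the finite-difference check
lam_fd = (f(x_ss + h) - f(x_ss - h)) / (2 * h)

print("steady state:", round(x_ss, 3))
print("lambda:", round(lam, 2), " finite difference:", round(lam_fd, 2))
print("half-time:", round(half_time, 3))
```

Because the slope is negative, a small perturbation decays exponentially, and the half-time gives a concrete sense of how quickly the system relaxes.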
Next, we evaluate this derivative at CDK1* = CDK1*ss. Because all kinetic parameters are positive numbers and CDK1*ss is always nonnegative, this derivative always evaluates to a negative number (because of the leading negative sign) and the steady state is stable. For the particular choice of parameters given in Figure 3, λ ≈ −1.66.

Stability in the Two-ODE Model and Three-ODE Model

The logic behind linear stability analysis for a two-ODE model is similar; the algebra, though, is more complicated. We start by rewriting Equations 3 and 4, using the shorthand of f and g to represent the rates of change of CDK1* and APC*, respectively.

$$\frac{d\,\mathrm{CDK1}^{*}}{dt} = f = a_1 - b_1\,\mathrm{CDK1}^{*}\,\frac{(\mathrm{APC}^{*})^{n_1}}{K_1^{n_1} + (\mathrm{APC}^{*})^{n_1}}$$ [Equation 14]

$$\frac{d\,\mathrm{APC}^{*}}{dt} = g = a_2\,(1 - \mathrm{APC}^{*})\,\frac{(\mathrm{CDK1}^{*})^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^{*})^{n_2}} - b_2\,\mathrm{APC}^{*}$$ [Equation 15]

Again, we identify the steady states of the system and consider small perturbations of the system from the steady state. At this point, the procedure becomes more complicated. To quantitatively analyze the stability of the system, we cannot simply calculate one scalar value λ at the steady-state values (CDK1*ss, APC*ss) because the two equations are interdependent. Instead, we need to calculate eigenvalues of the system at the steady state. Eigenvalues are coefficients (real or complex numbers) that yield the same information about stability that we got from the value of λ in the one-dimensional analysis. For present purposes, we will consider them simply as numbers that can be calculated through a straightforward procedure (see Box 1). The eigenvalues for the two-ODE model turn out to be complex numbers (Box 1). What does that mean? Remember that:

$$e^{\lambda t} = e^{(x + iy)t} = e^{xt}\,e^{iyt} = e^{xt}\,(\cos yt + i \sin yt)$$
Thus, the real part of λ (x in Equation 18) determines whether the amplitude of oscillations increases or decreases (i.e.,
Box 1. Obtaining Eigenvalues for the Two-ODE Model

First, we set up the system's Jacobian matrix A, which is a table of the two partial derivatives of f and the two partial derivatives of g:

$$A = \begin{pmatrix} \dfrac{\partial f}{\partial\,\mathrm{CDK1}^{*}} & \dfrac{\partial f}{\partial\,\mathrm{APC}^{*}} \\[2ex] \dfrac{\partial g}{\partial\,\mathrm{CDK1}^{*}} & \dfrac{\partial g}{\partial\,\mathrm{APC}^{*}} \end{pmatrix}$$ [Equation 16]

Next, we evaluate these four partial derivatives at the steady state, yielding a matrix of four numbers, $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Finally, we use these four numbers to calculate the eigenvalues. For a two-ODE system, the eigenvalues are given by:

$$\lambda_1 = \frac{\tau + \sqrt{\tau^{2} - 4\Delta}}{2}, \qquad \lambda_2 = \frac{\tau - \sqrt{\tau^{2} - 4\Delta}}{2}$$ [Equation 17]

where τ = trace(A) = a + d and Δ = det(A) = ad − bc. For our two-ODE model, the eigenvalues turn out to be λ1,2 ≈ −0.91 ± 3.30i.
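The procedure in Box 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the steady state is located by bisection, the four partial derivatives are written out by hand from the two-ODE model, and the eigenvalues come from the trace and determinant formula. The helper names (`hill`, `dhill`, `apc_at_steady_state`, `steady_state`, `jacobian`, `eigenvalues_2x2`) are hypothetical; the parameters are those of Figure 4.

```python
import cmath

A1, A2, B1, B2 = 0.1, 3.0, 3.0, 1.0  # parameters from Figure 4
K, N = 0.5, 8                        # K1 = K2 and n1 = n2 in Figure 4


def hill(x):
    return x**N / (K**N + x**N)


def dhill(x):
    """Derivative of the Hill function with respect to x."""
    return N * K**N * x**(N - 1) / (K**N + x**N) ** 2


def apc_at_steady_state(cdk1):
    """APC* that makes dAPC*/dt zero for a given CDK1* (the APC nullcline)."""
    h = A2 * hill(cdk1)
    return h / (h + B2)


def steady_state():
    """Bisection along the APC nullcline until dCDK1*/dt is also zero."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        cdk1 = 0.5 * (lo + hi)
        if A1 - B1 * cdk1 * hill(apc_at_steady_state(cdk1)) > 0:
            lo = cdk1
        else:
            hi = cdk1
    cdk1 = 0.5 * (lo + hi)
    return cdk1, apc_at_steady_state(cdk1)


def jacobian(cdk1, apc):
    """The four partial derivatives a, b, c, d of f and g."""
    a = -B1 * hill(apc)                    # df/dCDK1*
    b = -B1 * cdk1 * dhill(apc)            # df/dAPC*
    c = A2 * (1.0 - apc) * dhill(cdk1)     # dg/dCDK1*
    d = -A2 * hill(cdk1) - B2              # dg/dAPC*
    return a, b, c, d


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues from the trace and determinant."""
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


cdk1_ss, apc_ss = steady_state()
lam1, lam2 = eigenvalues_2x2(*jacobian(cdk1_ss, apc_ss))
print("steady state:", round(cdk1_ss, 2), round(apc_ss, 2))
print("eigenvalues:", lam1, lam2)
```

Using `cmath.sqrt` rather than `math.sqrt` lets the same formula return complex eigenvalues when the quantity under the square root is negative.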
"dampens") over time: if the real part of λ is negative, the amplitude of the oscillations will decay exponentially; and if the real part of λ is positive, the oscillations will grow exponentially over time. The imaginary part of λ (y in Equation 18) makes the perturbation oscillate up and down (as sine and cosine functions do). For the parameters that we have chosen for our two-ODE system, we have damped oscillations (the real parts of the eigenvalues are negative, and the imaginary parts are nonzero). And one can show algebraically that, for any choice of parameters, the real parts of the eigenvalues will be negative, and the oscillations will be damped.

So what about our three-ODE model, which did exhibit sustained oscillations? Again, we carry out a linear stability analysis at the steady state of the system (for details, see Box 2). For the choice of parameters that we made above, the eigenvalues are −5.29, 0.88 + 3.47i, and 0.88 − 3.47i. Therefore, the steady state is unstable (two of the eigenvalues have positive real parts), and the system exhibits sustained limit cycle oscillations (Figures 5B and 5C).

Summary: Oscillations in ODE Models of Simple Negative Feedback Loops

Using examples motivated by the cell cycle, we have shown that a one-ODE model of a simple negative feedback loop cannot oscillate; a two-ODE model can exhibit damped, but not sustained, oscillations; and a three-ODE model can exhibit sustained limit cycle oscillations. Linear stability analysis of the systems' steady states gave us an explanation for why these behaviors are found. From this analysis, we conclude that a simple three-ODE negative feedback model seems like a reasonable starting point for describing oscillations in CDK1 activity like those seen in Xenopus embryos. Indeed, some of the earliest models of the
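Box 2. Obtaining Eigenvalues for the Three-ODE Model

We write the three ODEs as:

$$\frac{d\,\mathrm{CDK1}^{*}}{dt} = f = a_1 - b_1\,\mathrm{CDK1}^{*}\,\frac{(\mathrm{APC}^{*})^{n_1}}{K_1^{n_1} + (\mathrm{APC}^{*})^{n_1}}$$ [Equation 19]

$$\frac{d\,\mathrm{Plk1}^{*}}{dt} = g = a_2\,(1 - \mathrm{Plk1}^{*})\,\frac{(\mathrm{CDK1}^{*})^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^{*})^{n_2}} - b_2\,\mathrm{Plk1}^{*}$$ [Equation 20]

$$\frac{d\,\mathrm{APC}^{*}}{dt} = h = a_3\,(1 - \mathrm{APC}^{*})\,\frac{(\mathrm{Plk1}^{*})^{n_3}}{K_3^{n_3} + (\mathrm{Plk1}^{*})^{n_3}} - b_3\,\mathrm{APC}^{*}$$ [Equation 21]

Next, we set up the Jacobian matrix and calculate the partial derivatives:

$$A = \begin{pmatrix} \dfrac{\partial f}{\partial\,\mathrm{CDK1}^{*}} & \dfrac{\partial f}{\partial\,\mathrm{Plk1}^{*}} & \dfrac{\partial f}{\partial\,\mathrm{APC}^{*}} \\[2ex] \dfrac{\partial g}{\partial\,\mathrm{CDK1}^{*}} & \dfrac{\partial g}{\partial\,\mathrm{Plk1}^{*}} & \dfrac{\partial g}{\partial\,\mathrm{APC}^{*}} \\[2ex] \dfrac{\partial h}{\partial\,\mathrm{CDK1}^{*}} & \dfrac{\partial h}{\partial\,\mathrm{Plk1}^{*}} & \dfrac{\partial h}{\partial\,\mathrm{APC}^{*}} \end{pmatrix}$$ [Equation 22]

Finally, we calculate the three eigenvalues. For the choice of parameters we made in Figure 5, the eigenvalues are −5.29, 0.88 + 3.47i, and 0.88 − 3.47i.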
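The steps in Box 2 can also be carried out with a short script. The sketch below is an assumption-laden illustration, not the authors' method: it finds the steady state by bisection, fills in the Jacobian by hand, and obtains the eigenvalues from the characteristic polynomial by locating its one real root and solving the remaining quadratic. That root-finding route is a choice made here to keep the example free of external libraries, and all function names are hypothetical. The parameters are those of Figure 5.

```python
import cmath

A1, A2, A3 = 0.1, 3.0, 3.0   # parameters from Figure 5
B1, B2, B3 = 3.0, 1.0, 1.0
K, N = 0.5, 8                # all K values are 0.5 and all n values are 8


def hill(x):
    return x**N / (K**N + x**N)


def dhill(x):
    """Derivative of the Hill function with respect to x."""
    return N * K**N * x**(N - 1) / (K**N + x**N) ** 2


def activated(upstream, a, b):
    """Active fraction y solving a*(1 - y)*hill(upstream) - b*y = 0."""
    h = a * hill(upstream)
    return h / (h + b)


def steady_state():
    """Bisection on CDK1*, with Plk1* and APC* slaved to their steady states."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        cdk1 = 0.5 * (lo + hi)
        apc = activated(activated(cdk1, A2, B2), A3, B3)
        if A1 - B1 * cdk1 * hill(apc) > 0:
            lo = cdk1
        else:
            hi = cdk1
    cdk1 = 0.5 * (lo + hi)
    plk1 = activated(cdk1, A2, B2)
    return cdk1, plk1, activated(plk1, A3, B3)


def jacobian(cdk1, plk1, apc):
    """Partial derivatives of f, g, h (rows) with respect to CDK1*, Plk1*, APC*."""
    return [[-B1 * hill(apc), 0.0, -B1 * cdk1 * dhill(apc)],
            [A2 * (1.0 - plk1) * dhill(cdk1), -A2 * hill(cdk1) - B2, 0.0],
            [0.0, A3 * (1.0 - apc) * dhill(plk1), -A3 * hill(plk1) - B3]]


def eigenvalues_3x3(m):
    """Roots of the characteristic polynomial l^3 - tr*l^2 + c1*l - det."""
    tr = m[0][0] + m[1][1] + m[2][2]
    c1 = (m[0][0] * m[1][1] - m[0][1] * m[1][0]
          + m[0][0] * m[2][2] - m[0][2] * m[2][0]
          + m[1][1] * m[2][2] - m[1][2] * m[2][1])
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    def poly(l):
        return ((l - tr) * l + c1) * l - det

    bound = 1.0 + max(abs(tr), abs(c1), abs(det))  # all roots lie inside
    lo, hi = -bound, bound
    for _ in range(200):                           # bisection for a real root
        mid = 0.5 * (lo + hi)
        if poly(mid) < 0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    p = r - tr                                     # deflated quadratic:
    q = c1 + p * r                                 # l^2 + p*l + q = 0
    root = cmath.sqrt(p * p - 4.0 * q)
    return r, (-p + root) / 2.0, (-p - root) / 2.0


lam_real, lam_plus, lam_minus = eigenvalues_3x3(jacobian(*steady_state()))
print(lam_real, lam_plus, lam_minus)
```

A pair of complex eigenvalues with positive real part signals an unstable spiral, which is consistent with the sustained oscillations seen in the simulations.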
cell cycle were simple three-ODE negative feedback loops (Goldbeter, 1991). The ability of a model like this to generate sustained oscillations depends upon the length of the negative feedback loop and the amount of ultrasensitivity assumed for the regulatory interactions within the loop. The longer the loop and the more switch-like the interactions, the easier it is to produce oscillations.

Negative Feedback with a Time Delay

As mentioned above, the mechanism through which CDK1 activates APC is incompletely understood, but it is probably a multistep mechanism with many intermediate species and ample possibility for the introduction of time delays. The same is true for the inactivation of CDK1 by active APC. Given the vagaries of the exact mechanisms, perhaps a reasonable approach would be to leave the formalism of ODEs and make use instead of delay differential equations, in which an explicit time delay relates the change in activity of APC to an earlier activity of CDK1, and vice versa. Consider our two-ODE model (Equations 3 and 4), modified to include two explicit delays, τ1 and τ2:

$$\frac{d\,\mathrm{CDK1}^{*}[t]}{dt} = a_1 - b_1\,\mathrm{CDK1}^{*}[t]\,\frac{\mathrm{APC}^{*}[t - \tau_1]^{n_1}}{K_1^{n_1} + \mathrm{APC}^{*}[t - \tau_1]^{n_1}}$$ [Equation 23]

$$\frac{d\,\mathrm{APC}^{*}[t]}{dt} = a_2\,(1 - \mathrm{APC}^{*}[t])\,\frac{\mathrm{CDK1}^{*}[t - \tau_2]^{n_2}}{K_2^{n_2} + \mathrm{CDK1}^{*}[t - \tau_2]^{n_2}} - b_2\,\mathrm{APC}^{*}[t]$$ [Equation 24]
Here, the rate of change of CDK1 activity at time t depends on APC activity at time t − τ1, and the rate of change of APC activity at time t depends on CDK1 activity at time t − τ2. This two-equation model now yields sustained limit cycle oscillations (Figure 7) once the time delays exceed a fairly small critical value. Even a model of a negative feedback loop with only one delay differential equation can be made to oscillate. The explicit time delays, like the discrete time steps in the Boolean model (Figure 2), help to keep the activities of CDK1 and APC from settling into a stable steady state. Delay differential equation models have been used to rationalize the robust oscillations seen in some synthetic biochemical oscillators based on negative feedback loops (Stricker et al., 2008) and have been proposed to model the embryonic cell cycle in Xenopus as well (Busenberg and Tang, 1994).

Figure 7. A Delay Differential Equation Model of CDK1 and APC Regulation (A) Schematic of the model. The parameters chosen for the model were a1 = 0.1, a2 = 3, b1 = 3, b2 = 1, K1 = 0.5, K2 = 0.5, n1 = 8, n2 = 8, τ1 = 0.5, and τ2 = 0.5. (B) Phase space depiction of the system. The red and green lines are the nullclines. One trajectory is shown. The initial history for this trajectory was CDK1*[t ≤ 0] = 0, APC*[t ≤ 0] = 0. The trajectory spirals in toward a limit cycle. (C) Time course of the system, showing sustained limit cycle oscillations.
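A delay differential equation model of this kind can be simulated by storing the history of each variable and looking back a fixed number of steps. The sketch below is a deliberately simple illustration, not the authors' method: it assumes forward Euler integration with a small step, a constant zero history for t ≤ 0 as in the Figure 7 caption, and hypothetical names (`simulate_dde`, `late_amplitude`). The parameters are those of Figure 7.

```python
def hill(x, K=0.5, n=8):
    return x**n / (K**n + x**n)


def simulate_dde(tau1=0.5, tau2=0.5, t_end=60.0, dt=0.001):
    """Forward Euler for the delayed two-equation model (an assumed scheme).

    Delays are handled by indexing back into the stored history; before
    enough history exists, the delayed values are taken to be zero.
    """
    a1, a2, b1, b2 = 0.1, 3.0, 3.0, 1.0
    lag1 = int(round(tau1 / dt))   # steps back for the APC* delay
    lag2 = int(round(tau2 / dt))   # steps back for the CDK1* delay
    cdk1, apc = [0.0], [0.0]
    for i in range(int(round(t_end / dt))):
        apc_delayed = apc[i - lag1] if i >= lag1 else 0.0
        cdk1_delayed = cdk1[i - lag2] if i >= lag2 else 0.0
        d_cdk1 = a1 - b1 * cdk1[i] * hill(apc_delayed)
        d_apc = a2 * (1.0 - apc[i]) * hill(cdk1_delayed) - b2 * apc[i]
        cdk1.append(cdk1[i] + dt * d_cdk1)
        apc.append(apc[i] + dt * d_apc)
    return cdk1, apc


def late_amplitude(series, fraction=0.25):
    """Peak-to-trough range over the final part of the run."""
    tail = series[int(len(series) * (1.0 - fraction)):]
    return max(tail) - min(tail)


amp_with_delay = late_amplitude(simulate_dde(0.5, 0.5)[0])
amp_without_delay = late_amplitude(simulate_dde(0.0, 0.0)[0])
print("late CDK1* amplitude with delays:   ", amp_with_delay)
print("late CDK1* amplitude without delays:", amp_without_delay)
```

Setting both delays to zero recovers the plain two-ODE model, so the two runs give a direct comparison of damped and sustained behavior. A dedicated delay differential equation solver would be a better choice for quantitative work.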
Adding a Bistable Trigger

To this point, we have ignored an important part of the scheme shown in Figure 1, the positive feedback loop (CDK1 activates Cdc25, which in turn activates CDK1) and the double-negative feedback loop (CDK1 inhibits Wee1, which in turn inhibits CDK1). Nevertheless, this is probably a critical part of the network; every eukaryotic species examined so far has at least one identifiable Wee1 homolog, and all eukaryotic species except higher plants have at least one Cdc25 homolog. In addition, genetic studies in S. pombe identified these genes as critical for cell-cycle oscillations (Russell and Nurse, 1986, 1987), although, surprisingly, they become less important in S. pombe strains engineered to run off a single cyclin/Cdk fusion protein (Coudreuse and Nurse, 2010). Biochemical studies in Xenopus egg extracts and gene disruption studies in human HeLa cells also provide evidence that these feedback loops are important for the cell cycle (Pomerening et al., 2005, 2008).

What do these positive and double-negative feedback loops add to the oscillator? On their own, positive or double-negative feedback loops can accomplish several things. For example, they can amplify the magnitude of a signal and can amplify the system's sensitivity to a change in a signal. However, these feedback loops are probably best known for their potential to function as bistable, hysteretic toggle switches (Ferrell, 2002; Ferrell and Xiong, 2001; Gardner et al., 2000; Tyson et al., 2003). The term "bistable" means that the system can be in either of two alternative, stable steady states, depending upon its history, and the term "hysteretic" means that, once the system has been switched from one state to the other, it tends to stay there. Indeed, experimental studies have shown that, in Xenopus egg extracts, the CDK1/Wee1/Cdc25 system does respond to cyclin in a hysteretic fashion; it is easier to maintain an extract in M phase than it is to push an interphase extract into M phase (Pomerening et al., 2003; Sha et al., 2003). Thus, mitosis is driven by a bistable trigger.

How would a bistable trigger alter our simple model of the cell cycle? Let us begin again with our two-ODE model (Equations 3 and 4) but now add an additional positive feedback term (Equation 25, yellow) to the first equation, accounting for the fact that active CDK1 promotes the formation of more active CDK1 in a highly nonlinear way:

$$\frac{d\,\mathrm{CDK1}^{*}}{dt} = a_1 - b_1\,\mathrm{CDK1}^{*}\,\frac{(\mathrm{APC}^{*})^{n_1}}{K_1^{n_1} + (\mathrm{APC}^{*})^{n_1}} + a_3\,(1 - \mathrm{CDK1}^{*})\,\frac{(\mathrm{CDK1}^{*})^{n_3}}{K_3^{n_3} + (\mathrm{CDK1}^{*})^{n_3}}$$ [Equation 25]
$$\frac{d\,\mathrm{APC}^{*}}{dt} = a_2\,(1 - \mathrm{APC}^{*})\,\frac{(\mathrm{CDK1}^{*})^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^{*})^{n_2}} - b_2\,\mathrm{APC}^{*}$$ [Equation 26]

Moreover, we assume that the basal rate of CDK1 activation, a1 (essentially the cyclin synthesis rate), is slow compared to the other activation and inactivation rates. Now, let us examine the system one leg at a time. First, we look at how the steady-state APC* activity would vary with CDK1* activity if there were no feedback from APC to CDK1. This dependency is given by the solution of the equation:

$$a_2\,(1 - \mathrm{APC}^{*})\,\frac{(\mathrm{CDK1}^{*})^{n_2}}{K_2^{n_2} + (\mathrm{CDK1}^{*})^{n_2}} - b_2\,\mathrm{APC}^{*} = 0$$ [Equation 27]
Figure 8. Interlinked Positive and Negative Feedback Loops in a Two-Component Model of CDK1 and APC Regulation (A) Schematic of the model. The parameters chosen for the model were a1 = 0.02, a2 = 3, a3 = 3, b1 = 3, b2 = 1, K1 = 0.5, K2 = 0.5, K3 = 0.5, n1 = 8, n2 = 8, and n3 = 8. (B) Phase space depiction of the system. The red and green lines are the nullclines. They intersect at an unstable steady state designated by the open circle. All trajectories spiral in or out toward a stable limit cycle, denoted by the closed black loop. (C) Time course of the system, showing sustained limit cycle oscillations.

Equation 27 defines one of the nullclines for the two-ODE system (shown in green in Figure 8B). This nullcline is a monotonic, sigmoidal curve. When CDK1* is low, APC* is low; when CDK1* is high, APC* is high; and in between, APC is intermediate in activity. However, the other nullcline (shown in red in Figure 8B), which describes how the steady-state activity of CDK1 would vary with APC* in the absence of feedback from CDK1 to APC, is qualitatively different. It is not just sigmoidal, it is S shaped. This means that there are three possible steady-state values of CDK1* for a given APC activity when APC* is within a certain range (APC* ≈ 0.35 to 0.5, shown by pink shading in Figure 8B). By applying linear stability analysis to this one-dimensional system (or rate balance analysis, which is an easier way to analyze the stability of steady states in one-dimensional systems [Ferrell and Xiong, 2001]), one can show that the left and right steady states are stable, and the middle one is an unstable threshold. Thus, we have chosen parameters such that one leg of the CDK1/APC system functions like a hysteretic, bistable toggle switch. As APC* increases, CDK1* decreases toward the edge of a cliff (at APC* ≈ 0.5) and then falls precipitously to a very low level. Then, as APC* decreases, CDK1* rises only slightly until APC* ≈ 0.35, whereupon it shoots sky-high.

When this toggle switch is coupled to a negative feedback loop, the result can be stable limit cycle oscillations, and for the parameters chosen here, that is what we get (Figures 8B and 8C). CDK1 activity rises slowly at first and then explodes upward toward high mitotic levels. This is followed closely by a rapid rise in APC*, which changes the rapid rise in CDK1* to a similarly precipitous fall. Once CDK1* has fallen enough to turn APC back off, the system begins to slowly ramp up toward its next spike. The oscillations in CDK1 activity shown in Figure 8C look qualitatively similar to those seen in cycling Xenopus egg extracts (Murray and Kirschner, 1989a; Murray et al., 1989; Pomerening et al., 2003) and in HeLa cells in culture (Gavet and Pines, 2010). Accordingly, models that combine positive and negative feedback loops have dominated the cell-cycle modeling field since its beginning (Novak and Tyson, 1993a; Tyson and Kauffman, 1975). However, the oscillations shown in Figure 8C are qualitatively quite different from the oscillations that we observed from our ODE models of simple negative feedback loops (Figure 4 and Figure 5).
The oscillations for the positive-plus-negative feedback model are spiky, not smooth (Figure 8C), and there are distinct slow and fast phases. This is the type of oscillation that is exhibited by pacemaker cells in a beating heart and by dripping water faucets, and it is termed a ‘‘relaxation oscillation.’’ Biological oscillator circuits often do include positive feedback loops, arguing that relaxation oscillators may be particularly easy
to evolve or may have particular performance advantages that make them especially suitable for biological applications (Holt et al., 2008; Pomerening et al., 2003; Skotheim et al., 2008; Tsai et al., 2008).

Why can this two-ODE system oscillate, whereas the straight negative feedback two-ODE system could not? It is because positive feedback adds a type of time delay to the system, making the ODE model behave more like a delay differential equation model. The typical response of a system without positive feedback is a gradual, progressively slowing approach to a steady state. In contrast, a system with positive feedback first simmers and then explodes. This simmering phase is essentially a time lag, and it facilitates the generation of oscillations. Accordingly, we expect that the stable steady state seen in the straight negative feedback two-ODE system (Figure 4) must be destabilized in the positive-plus-negative feedback system (Figure 8). Indeed, this is the case. Linear stability analysis yielded eigenvalues of λ1,2 ≈ −0.91 ± 3.30i for the negative feedback-only system; now, with positive feedback added, the eigenvalues are 1.12 ± 4.77i. The real part of the eigenvalues is positive, so the steady state is unstable; the imaginary part of the eigenvalues is nonzero, so there are oscillations.

At this point, we have an oscillator model composed of two ODEs, representing two interlinked feedback loops. By adding more ODEs, the model can be made more realistic. For example, one could divide the process of CDK1 activation into its two most critical steps: the production of cyclin-CDK1 complexes through the synthesis of cyclin and regulation of the complexes' activity through phosphorylation and dephosphorylation. This additional realism comes at the cost of additional complexity; the more ODEs, the harder it is to understand why the system behaves the way that it does.
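Both the S-shaped CDK1 nullcline and the destabilized steady state can be explored numerically. The sketch below rests on an explicit assumption: the positive feedback term is taken to have the form a3 (1 − CDK1*) times a Hill function of CDK1* with constants K3 and n3, added to the two-ODE rate for CDK1*, with a3 = 3 alongside the other Figure 8 parameters. The root-counting grid, the bisection search, and all function names are hypothetical choices for this example.

```python
import cmath

A1, A2, A3 = 0.02, 3.0, 3.0   # a3 = 3 is assumed (see lead-in)
B1, B2 = 3.0, 1.0
K, N = 0.5, 8


def hill(x):
    return x**N / (K**N + x**N)


def dhill(x):
    """Derivative of the Hill function with respect to x."""
    return N * K**N * x**(N - 1) / (K**N + x**N) ** 2


def f(cdk1, apc):
    """Assumed CDK1* rate: negative feedback plus positive feedback term."""
    return (A1 - B1 * cdk1 * hill(apc)
            + A3 * (1.0 - cdk1) * hill(cdk1))


def count_cdk1_steady_states(apc, steps=3000, x_max=1.5):
    """Count sign changes of f along the CDK1* axis at a fixed APC*."""
    count, previous = 0, f(0.0, apc)
    for i in range(1, steps + 1):
        current = f(x_max * i / steps, apc)
        if previous * current < 0:
            count += 1
        previous = current
    return count


def apc_at_steady_state(cdk1):
    """APC* on the APC nullcline for a given CDK1*."""
    h = A2 * hill(cdk1)
    return h / (h + B2)


def steady_state():
    """Bisection along the APC nullcline for the full-system steady state."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        cdk1 = 0.5 * (lo + hi)
        if f(cdk1, apc_at_steady_state(cdk1)) > 0:
            lo = cdk1
        else:
            hi = cdk1
    cdk1 = 0.5 * (lo + hi)
    return cdk1, apc_at_steady_state(cdk1)


def eigenvalues():
    """Eigenvalues of the 2x2 Jacobian at the steady state."""
    x, y = steady_state()
    a = -B1 * hill(y) - A3 * hill(x) + A3 * (1.0 - x) * dhill(x)
    b = -B1 * x * dhill(y)
    c = A2 * (1.0 - y) * dhill(x)
    d = -A2 * hill(x) - B2
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


roots_low = count_cdk1_steady_states(0.2)    # APC* below the bistable range
roots_mid = count_cdk1_steady_states(0.45)   # APC* inside the bistable range
roots_high = count_cdk1_steady_states(0.7)   # APC* above the bistable range
lam1, lam2 = eigenvalues()
print(roots_low, roots_mid, roots_high)
print(lam1, lam2)
```

Three steady-state values of CDK1* at an intermediate APC* and one on either side is the signature of the S-shaped nullcline, and a positive real part in the eigenvalues indicates that the steady state of the coupled system has become unstable.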
Conclusion

The Xenopus embryonic cell cycle is driven by a protein circuit that acts like an autonomous oscillator. In this Primer, we set out to explore how oscillations can arise from a protein circuit. We examined three types of models of simple oscillator circuits based on the CDK1/APC system: Boolean models, ordinary differential equation models, and delay differential equation models. The discrete character of Boolean models and the time lags introduced into
delay differential equation models make it relatively easy to generate oscillations. For ODE models, it is more difficult to keep the model from settling into a stable steady state. With everything else equal, longer negative feedback loops are easier to get oscillating than shorter ones, and switch-like, ultrasensitive response functions within the negative feedback loop promote oscillations, as well. Adding a positive feedback loop to a negative feedback loop tends to promote oscillations, and oscillators with this bistable trigger have distinct characteristics that might make them particularly suitable for biological systems. Linear stability analysis addresses why one ODE model oscillates and another one does not. Accordingly, we have presented several examples of stability analysis for simple oscillator circuits. For one-ODE systems, linear stability analysis is fairly simple. For two or more ODEs, however, one must make use of matrix algebra manipulations, calculating the eigenvalues of the system at the steady state(s). This takes some effort, but the effort is worth it—it provides us with an understanding of why a circuit does or does not oscillate. In many eukaryotic cells, the cell cycle is driven by a CDK1/ APC circuit that behaves more like a succession of decisions or contingent events rather than an autonomous oscillator. Nevertheless, simple models of the Xenopus oscillator, such as the ones discussed here, provide insight that informs the understanding of more complex cell-cycle circuits. Just as positive feedback loops can provide an oscillator circuit with robustness, positive feedback loops can be used to build a succession of reliable switches. We suspect that the link between clock-like cell cycles (like the Xenopus embryonic cycle) and domino-like cell cycles (like the somatic cell cycle) is that they are both constructed out of bistable switches. It is clear that the Xenopus embryonic cell cycle can operate in the absence of transcription. 
Therefore, we have regarded the cell-cycle oscillator as only a protein circuit: proteins regulate other proteins, but not gene expression. Nonetheless, many cell-cycle regulators in many cell types undergo periodic transcription (Spellman et al., 1998). Indeed, in budding yeast, transcriptional oscillations persist in the absence of CDK1 oscillations (Haase and Reed, 1999; Orlando et al., 2008). Transcriptional regulation undoubtedly contributes to the overall functioning of the cell-cycle oscillator, with the protein oscillator acting as a basic core circuit upon which additional controls have been layered.

The same may be true of another well-studied biological oscillator, the circadian clock. The slow pace of the circadian clock makes it natural to think of the clock as arising from a transcriptional gene circuit. Nevertheless, in cyanobacteria (Nakajima et al., 2005; Rust et al., 2007; Tomita et al., 2005), Ostreococcus (O’Neill et al., 2011), and human red blood cells (O’Neill and Reddy, 2011), circadian oscillations can proceed in the absence of transcription. Perhaps core protein circuits constitute the basic circadian clock, with transcriptional circuits reinforcing and refining the clock’s behavior (Zwicker et al., 2010).

In any case, whether one is interested in gene circuits or protein circuits, and in cell-cycle oscillations or circadian oscillations, the basic concepts and tools that we have reviewed here (negative feedback loops, bistable triggers, time lags, and linear stability analysis) should prove helpful. Our hope is that the detailed analysis of particular oscillator circuits, coupled with
the comparative analysis of different biological oscillators, will allow us to gain insight into the basic design principles of all of these fascinating clocks.

ACKNOWLEDGMENTS

We thank Markus Covert for the idea of starting this tutorial with a Boolean analysis instead of plunging right into ODEs and David Dill for helpful discussions. This work was supported by NIH GM077544.

REFERENCES

Aguda, B.D., and Tang, Y. (1999). The kinetic origins of the restriction point in the mammalian cell cycle. Cell Prolif. 32, 321–335. Alfieri, R., Barberis, M., Chiaradonna, F., Gaglio, D., Milanesi, L., Vanoni, M., Klipp, E., and Alberghina, L. (2009). Towards a systems biology approach to mammalian cell cycle: modeling the entrance into S phase of quiescent fibroblasts after serum stimulation. BMC Bioinformatics 10 (Suppl 12), S16. Barberis, M., Klipp, E., Vanoni, M., and Alberghina, L. (2007). Cell size at S phase initiation: an emergent property of the G1/S network. PLoS Comput. Biol. 3, e64. Borisuk, M.T., and Tyson, J.J. (1998). Bifurcation analysis of a model of mitotic control in frog eggs. J. Theor. Biol. 195, 69–85. Braunewell, S., and Bornholdt, S. (2007). Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity. J. Theor. Biol. 245, 638–643. Busenberg, S., and Tang, B. (1994). Mathematical models of the early embryonic cell cycle: the role of MPF activation and cyclin degradation. J. Math. Biol. 32, 573–596. Charvin, G., Oikonomou, C., Siggia, E.D., and Cross, F.R. (2010). Origin of irreversibility of cell cycle start in budding yeast. PLoS Biol. 8, e1000284. Chen, K.C., Calzone, L., Csikasz-Nagy, A., Cross, F.R., Novak, B., and Tyson, J.J. (2004). Integrative analysis of cell cycle control in budding yeast. Mol. Biol. Cell 15, 3841–3862. Ciliberto, A., Novak, B., and Tyson, J.J. (2003). Mathematical model of the morphogenesis checkpoint in budding yeast. J. Cell Biol. 163, 1243–1254. Coudreuse, D., and Nurse, P.
(2010). Driving the cell cycle with a minimal CDK control network. Nature 468, 1074–1079. Dasso, M., and Newport, J.W. (1990). Completion of DNA replication is monitored by a feedback system that controls the initiation of mitosis in vitro: studies in Xenopus. Cell 61, 811–823. Davidich, M.I., and Bornholdt, S. (2008). Boolean network model predicts cell cycle sequence of fission yeast. PLoS One 3, e1672. Ferrell, J.E., Jr. (2002). Self-perpetuating states in signal transduction: positive feedback, double-negative feedback and bistability. Curr. Opin. Cell Biol. 14, 140–148. Ferrell, J.E., Jr., and Xiong, W. (2001). Bistability in cell signaling: How to make continuous processes discontinuous, and reversible processes irreversible. Chaos 11, 227–236. Gardner, T.S., Cantor, C.R., and Collins, J.J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342. Gavet, O., and Pines, J. (2010). Progressive activation of CyclinB1-Cdk1 coordinates entry to mitosis. Dev. Cell 18, 533–543. Ge, H., Qian, H., and Qian, M. (2008). Synchronized dynamics and nonequilibrium steady states in a stochastic yeast cell-cycle network. Math. Biosci. 211, 132–152. Gilbert, D.A. (1974). The nature of the cell cycle and the control of cell proliferation. Curr. Mod. Biol. 5, 197–206. Glass, L., and Kauffman, S.A. (1973). The logical analysis of continuous, nonlinear biochemical control networks. J. Theor. Biol. 39, 103–129. Goldbeter, A. (1991). A minimal cascade model for the mitotic oscillator involving cyclin and cdc2 kinase. Proc. Natl. Acad. Sci. USA 88, 9107–9111.
Goldbeter, A. (2002). Computational approaches to cellular rhythms. Nature 420, 238–245. Goldbeter, A., and Guilmot, J.M. (1996). Arresting the mitotic oscillator and the control of cell proliferation: insights from a cascade model for cdc2 kinase activation. Experientia 52, 212–216. Haase, S.B., and Reed, S.I. (1999). Evidence that a free-running oscillator drives G1 events in the budding yeast cell cycle. Nature 401, 394–397. Hara, K., Tydeman, P., and Kirschner, M. (1980). A cytoplasmic clock with the same period as the division cycle in Xenopus eggs. Proc. Natl. Acad. Sci. USA 77, 462–466.
Orlando, D.A., Lin, C.Y., Bernard, A., Wang, J.Y., Socolar, J.E., Iversen, E.S., Hartemink, A.J., and Haase, S.B. (2008). Global control of cell-cycle transcription by coupled CDK and network oscillators. Nature 453, 944–947. Pomerening, J.R., Kim, S.Y., and Ferrell, J.E., Jr. (2005). Systems-level dissection of the cell-cycle oscillator: bypassing positive feedback produces damped oscillations. Cell 122, 565–578. Pomerening, J.R., Sontag, E.D., and Ferrell, J.E., Jr. (2003). Building a cell cycle oscillator: hysteresis and bistability in the activation of Cdc2. Nat. Cell Biol. 5, 346–351.
Hartwell, L.H., and Weinert, T.A. (1989). Checkpoints: controls that ensure the order of cell cycle events. Science 246, 629–634.
Pomerening, J.R., Ubersax, J.A., and Ferrell, J.E., Jr. (2008). Rapid cycling and precocious termination of G1 phase in cells expressing CDK1AF. Mol. Biol. Cell 19, 3426–3441.
Holt, L.J., Krutchinsky, A.N., and Morgan, D.O. (2008). Positive feedback sharpens the anaphase switch. Nature 454, 353–357.
Russell, P., and Nurse, P. (1986). cdc25+ functions as an inducer in the mitotic control of fission yeast. Cell 45, 145–153.
Kauffman, S., and Wille, J.J. (1975). The mitotic oscillator in Physarum polycephalum. J. Theor. Biol. 55, 47–93.
Russell, P., and Nurse, P. (1987). Negative regulation of mitosis by wee1+, a gene encoding a protein kinase homolog. Cell 49, 559–567.
Li, B., Shao, B., Yu, C., Ouyang, Q., and Wang, H. (2010). A mathematical model for cell size control in fission yeast. J. Theor. Biol. 264, 771–781.
Rust, M.J., Markson, J.S., Lane, W.S., Fisher, D.S., and O’Shea, E.K. (2007). Ordered phosphorylation governs oscillation of a three-protein circadian clock. Science 318, 809–812.
Li, F., Long, T., Lu, Y., Ouyang, Q., and Tang, C. (2004). The yeast cell-cycle network is robustly designed. Proc. Natl. Acad. Sci. USA 101, 4781–4786. Mangla, K., Dill, D.L., and Horowitz, M.A. (2010). Timing robustness in the budding and fission yeast cell cycles. PLoS ONE 5, e8906. Minshull, J., Sun, H., Tonks, N.K., and Murray, A.W. (1994). A MAP kinase-dependent spindle assembly checkpoint in Xenopus egg extracts. Cell 79, 475–486. Mura, I., and Csikász-Nagy, A. (2008). Stochastic Petri Net extension of a yeast cell cycle model. J. Theor. Biol. 254, 850–860. Murray, A.W., and Kirschner, M.W. (1989a). Cyclin synthesis drives the early embryonic cell cycle. Nature 339, 275–280. Murray, A.W., and Kirschner, M.W. (1989b). Dominoes and clocks: the union of two views of the cell cycle. Science 246, 614–621. Murray, A.W., Solomon, M.J., and Kirschner, M.W. (1989). The role of cyclin synthesis and degradation in the control of maturation promoting factor activity. Nature 339, 280–286. Nakajima, M., Imai, K., Ito, H., Nishiwaki, T., Murayama, Y., Iwasaki, H., Oyama, T., and Kondo, T. (2005). Reconstitution of circadian oscillation of cyanobacterial KaiC phosphorylation in vitro. Science 308, 414–415. Norel, R., and Agur, Z. (1991). A model for the adjustment of the mitotic clock by cyclin and MPF levels. Science 251, 1076–1078. Novak, B., Csikasz-Nagy, A., Gyorffy, B., Chen, K., and Tyson, J.J. (1998). Mathematical model of the fission yeast cell cycle with checkpoint controls at the G1/S, G2/M and metaphase/anaphase transitions. Biophys. Chem. 72, 185–200. Novak, B., and Tyson, J.J. (1993a). Modeling the cell division cycle: M-phase trigger, oscillations, and size control. J. Theor. Biol. 165, 101–134. Novak, B., and Tyson, J.J. (1993b). Numerical analysis of a comprehensive model of M-phase control in Xenopus oocyte extracts and intact embryos. J. Cell Sci. 106, 1153–1168. Novak, B., and Tyson, J.J. (1997). Modeling the control of DNA replication in fission yeast.
Proc. Natl. Acad. Sci. USA 94, 9147–9152. Novák, B., and Tyson, J.J. (2008). Design principles of biochemical oscillators. Nat. Rev. Mol. Cell Biol. 9, 981–991. O’Neill, J.S., and Reddy, A.B. (2011). Circadian clocks in human red blood cells. Nature 469, 498–503. O’Neill, J.S., van Ooijen, G., Dixon, L.E., Troein, C., Corellou, F., Bouget, F.Y., Reddy, A.B., and Millar, A.J. (2011). Circadian rhythms persist without transcription in a eukaryote. Nature 469, 554–558. Obeyesekere, M.N., Tucker, S.L., and Zimmerman, S.O. (1992). Mathematical models for the cellular concentrations of cyclin and MPF. Biochem. Biophys. Res. Commun. 184, 782–789. Okabe, Y., and Sasai, M. (2007). Stable stochastic dynamics in yeast cell cycle. Biophys. J. 93, 3451–3459.
Sel’kov, E.E. (1970). [2 alternative autooscillatory stationary states in thiol metabolism—2 alternative types of cell multiplication: normal and neoplastic]. Biofizika 15, 1065–1073. Sha, W., Moore, J., Chen, K., Lassaletta, A.D., Yi, C.S., Tyson, J.J., and Sible, J.C. (2003). Hysteresis drives cell-cycle transitions in Xenopus laevis egg extracts. Proc. Natl. Acad. Sci. USA 100, 975–980. Skotheim, J.M., Di Talia, S., Siggia, E.D., and Cross, F.R. (2008). Positive feedback of G1 cyclins ensures coherent cell cycle entry. Nature 454, 291–296. Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297. Srividhya, J., and Gopinathan, M.S. (2006). A simple time delay model for eukaryotic cell cycle. J. Theor. Biol. 241, 617–627. Steuer, R. (2004). Effects of stochasticity in models of the cell cycle: from quantized cycle times to noise-induced oscillations. J. Theor. Biol. 228, 293–301. Stricker, J., Cookson, S., Bennett, M.R., Mather, W.H., Tsimring, L.S., and Hasty, J. (2008). A fast, robust and tunable synthetic gene oscillator. Nature 456, 516–519. Strogatz, S.H. (1994). Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering (Cambridge, MA: Westview Press). Tomita, J., Nakajima, M., Kondo, T., and Iwasaki, H. (2005). No transcriptiontranslation feedback in circadian rhythm of KaiC phosphorylation. Science 307, 251–254. Tsai, T.Y., Choi, Y.S., Ma, W., Pomerening, J.R., Tang, C., and Ferrell, J.E., Jr. (2008). Robust, tunable biological oscillations from interlinked positive and negative feedback loops. Science 321, 126–129. Tyson, J.J. (1991). Modeling the cell division cycle: cdc2 and cyclin interactions. Proc. Natl. Acad. Sci. USA 88, 7328–7332. Tyson, J., and Kauffman, S. (1975). 
Control of mitosis by a continuous biochemical oscillation: Synchronization; spatially inhomogeneous oscillations. J. Math. Biol. 1, 289–310. Tyson, J.J., Chen, K.C., and Novak, B. (2003). Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Curr. Opin. Cell Biol. 15, 221–231. Yao, G., Lee, T.J., Mori, S., Nevins, J.R., and You, L. (2008). A bistable Rb-E2F switch underlies the restriction point. Nat. Cell Biol. 10, 476–482. Zhang, Y., Qian, M., Ouyang, Q., Deng, M., Li, F., and Tang, C. (2006). A stochastic model of the yeast cell cycle network. Physica. D 219, 35–39. Zwicker, D., Lubensky, D.K., and ten Wolde, P.R. (2010). Robust circadian clocks from coupled protein-modification and transcription-translation cycles. Proc. Natl. Acad. Sci. USA 107, 22540–22545.
Cell 144, March 18, 2011 ©2011 Elsevier Inc. 885
Leading Edge
Review

Impulse Control: Temporal Dynamics in Gene Transcription

Nir Yosef1,2 and Aviv Regev1,3,*
1Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
2Center for Neurologic Diseases, Brigham & Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
3Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.02.015
Regulatory circuits controlling gene expression constantly rewire to adapt to environmental stimuli, differentiation cues, and disease. We review our current understanding of the temporal dynamics of gene expression in eukaryotes and prokaryotes and the molecular mechanisms that shape them. We delineate several prototypical temporal patterns, including ‘‘impulse’’ (or single-pulse) patterns in response to transient environmental stimuli, sustained (or state-transitioning) patterns in response to developmental cues, and oscillating patterns. We focus on impulse responses and their higher-order temporal organization in regulons and cascades and describe how core protein circuits and cis-regulatory sequences in promoters integrate with chromatin architecture to generate these responses. Introduction The transcriptional program that controls gene expression in cells and organisms is remarkably flexible, constantly reconfiguring itself to respond and adapt to perturbations. These changes are apparent across a broad range of timescales, from rapid responses to environmental signals (i.e., minutes to hours) to slower events during development and pathogenesis (i.e., hours to days) (Lopez-Maury et al., 2008). Dissecting these dynamic changes, both functionally and mechanistically, is a fundamental challenge in biology and raises several key questions. What is the scope of temporal patterns of gene expression in biological systems? What functions do different patterns serve? What molecular mechanisms underlie the formation of each pattern, and what is their capacity to process the temporal signal into a specific change in gene expression over time? Finally, are any principles, either functional or mechanistic, shared among temporal responses in distinct timescales? Recent parallel advances in genomics and cell biology provide an unprecedented opportunity to map dynamic gene expression and decipher its underlying mechanisms. 
At the same time, live-cell imaging of fluorescent reporter proteins (Locke and Elowitz, 2009) allows us to study gene expression at fine temporal resolution and at the single-cell level. Such studies, when coupled with molecular manipulations and quantitative modeling, can identify basic mechanisms of temporal patterning. Further, genomic technologies provide global insights on the regulation of gene expression by allowing us to measure and perturb many aspects of the regulatory system, such as mRNA levels, protein-promoter interactions (Badis et al., 2009; Lee et al., 2002), or chromatin modification states (Wei et al., 2009; Whitehouse et al., 2007). Finally, emerging methods in synthetic biology, robotics, and microfluidics (Szita et al., 2010) are poised
to transform our ability to manipulate cellular inputs and components at unparalleled temporal resolution. Here, we review recent advances in our understanding of transcriptional dynamics, including the prototypical patterns of temporal mRNA expression and their underlying molecular mechanisms. We identify a small number of prominent temporal patterns, such as single pulse responses (“impulses”), sustained state-transitioning patterns, and oscillations. Focusing on impulse responses, we then present the molecular circuits that generate these patterns, highlighting the prominent role that transcription factor localization, integration of multiple inputs through cis-regulatory elements, and nucleosome occupancy play in tuning the response to a given stimulus. Finally, we discuss the prospect for a unified view of regulatory dynamics across timescales and systems, emphasizing critical directions for further research.

Prototypical Patterns of Temporal Dynamics

What capacity does a cell or organism have to generate temporal patterns of gene expression? Recent studies reveal several key classes of patterns (Figure 1). The first one, indefinite oscillators (Figure 1A), plays integral roles in homeostasis, such as the execution of the cell cycle or circadian rhythm. Other classes of temporal patterns follow an external stimulus. These include impulse (or single-pulse) patterns in response to environmental stimuli (Figures 1B–1D) and sustained (or state-transitioning) patterns in response to developmental stimuli (Figure 1E). Each of these patterns serves a set of interrelated functional goals, including optimizing the investment of cellular responses, temporally compartmentalizing antagonistic processes, and imposing order on the biogenesis of complex biological systems. On a systems-wide scale, the regulation of individual genes is commonly organized at a higher order into
regulons, in which a group of genes are controlled by the same transcription factors and, thus, share the same gene expression patterns. In addition, genes can be organized into transcriptional cascades and other patterns in which expression is ordered sequentially. Here, we focus on the impulse-like pattern, specifically its function and integration within transcriptional programs.

Figure 1. Prototypical Patterns of Temporal Dynamics of Gene Expression
Schematic views of gene expression levels (y axis; arbitrary units) over time (x axis) commonly found in cells in steady state or during a response to environmental, developmental, or pathogenic stimuli. Blue and red plots show possible profiles for different genes under each category. Common functions for these gene expression patterns are listed.
Impulse (Single-Pulse) Responses to Environmental Signals Changes in gene expression in response to perturbations of the surrounding environment, such as heat, salinity, or osmotic pressure, typically follow a characteristic ‘‘impulse’’-like pattern (Chechik and Koller, 2009; Chechik et al., 2008). Transcript levels spike up or down abruptly following the environmental cue, sustain a new level for a certain period of time (which may or may not depend on the continuation of the cue), and then transition to a new steady state, often similar to the original levels (Figure 1B). Impulse patterns are prevalent in responses to environmental changes in all organisms, from bacteria to mammals (Braun and Brenner, 2004; Gasch et al., 2000; Litvak et al., 2009; Lopez-Maury et al., 2008; Murray et al., 2004). One of the most extensively studied impulse systems is the environmental stress response (ESR) program in yeast. The ESR consists of 900 genes that exhibit short-term changes in transcription levels in response to various environmental stresses (Gasch et al., 2000). The transient impulse pattern of the ESR likely represents an adaptation phase, during which the cell optimizes its internal protein milieu before resuming growth (Gasch et al., 2000). Indeed, many of the downregulated genes in the ESR are associated with protein synthesis, reflecting the characteristic transient suppression in translation initiation and growth (Gasch et al., 2000). The ESR is also associated with the brief induction of genes involved in specific response mechanisms, such as DNA-damage repair, carbohydrate metabolism, and metabolite transport (Capaldi et al., 2008; Gasch et al., 2000). A notable exception to the impulse-like stress response in yeast is the case of starvation, in which the cells initiate more sustained programs, such as quiescence, filamentation, or sporulation (Lopez-Maury et al., 2008). 
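The impulse shape described above (an abrupt onset, a plateau, and a later transition to a new steady state) can be written as a small parametric function. The sketch below multiplies a rising and a falling sigmoid, which is one simple way to parameterize such a profile. The function names, the logistic form, and all parameter values are illustrative assumptions and are not taken from the studies cited here.

```python
import math


def sigmoid(t, midpoint, slope):
    """Logistic step centered on `midpoint`; rises if slope > 0, falls if slope < 0."""
    return 1.0 / (1.0 + math.exp(-slope * (t - midpoint)))


def impulse(t, h0, h1, h2, t_on, t_off, slope):
    """Illustrative impulse profile (assumed form, not a fitted model).

    h0: initial level, h1: plateau level (must be nonzero),
    h2: final steady-state level, t_on / t_off: onset and offset times.
    """
    rise = h0 + (h1 - h0) * sigmoid(t, t_on, slope)    # h0 -> h1 around t_on
    fall = h2 + (h1 - h2) * sigmoid(t, t_off, -slope)  # h1 -> h2 around t_off
    return rise * fall / h1


# Hypothetical transcript: baseline 1.0, plateau 4.0, new steady state 1.5.
profile = [impulse(t, 1.0, 4.0, 1.5, t_on=20, t_off=60, slope=0.5)
           for t in range(0, 121, 10)]
```

Setting h2 equal to h0 gives a response that returns to its original level, whereas setting h2 equal to h1 removes the offset and yields a sustained profile, so the same function can describe both classes of response.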
Transient impulse patterns are also prevalent in mammalian cells (Foster et al., 2007; Litvak et al., 2009; Murray et al., 2004), extending beyond environmental stimuli. For example, when innate immune cells, such as macrophages (Gilchrist et al., 2006; Ramsey et al., 2008) or dendritic cells (Amit et al., 2009), respond to pathogens, expression changes in individual genes follow a clear impulse pattern. These patterns, however, are often coupled to each other, forming multistep transcriptional cascades, in which the products of genes that are induced early in a response affect the expression of downstream targets. These targets, in turn, may exhibit either an impulse pattern or a more sustained one that initiates a long-term change in the cell’s state (Amit et al., 2007a; Murray et al., 2004).

Sign-Sensitive Delay and Persistence Detection in Impulse Responses

Impulse patterns can respond distinctly to the introduction of a signal versus its withdrawal. This differential response results in a “sign-sensitive delay” (Figure 1C), in which the speed of the cell’s response to one “sign-shift” (e.g., from the presence to the absence of a nutrient) is different from that of the complementary shift (e.g., from the absence to the presence of a nutrient).
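One circuit known to produce this kind of asymmetry is a coherent feedforward loop with AND logic, the arrangement analyzed by Mangan et al. (2003): an input X activates an intermediate Y, and the target Z is transcribed only when X is present and Y has accumulated. The sketch below is a minimal Euler simulation of that idea. The rate constants, the threshold, the step size, and the function names are assumptions chosen for illustration, not measured values.

```python
def simulate_and_ffl(signal, dt=0.01, beta=1.0, alpha=1.0, threshold=0.5):
    """Return, for each time step, whether the Z promoter is active.

    signal: list of 0/1 values for the input X.
    Y obeys dY/dt = beta * X - alpha * Y; Z is on only if X is on AND Y > threshold.
    All parameter values are illustrative assumptions.
    """
    y = 0.0
    z_active = []
    for x in signal:
        z_active.append(bool(x) and y > threshold)
        y += dt * (beta * x - alpha * y)  # forward Euler update of Y
    return z_active


def on_off_delays(signal, z_active, dt):
    """Delay between X switching and the Z promoter following, for ON and OFF steps."""
    x_on = signal.index(1)
    x_off = x_on + signal[x_on:].index(0)
    z_on = z_active.index(True)
    z_off = z_on + z_active[z_on:].index(False)
    return (z_on - x_on) * dt, (z_off - x_off) * dt


dt = 0.01
signal = [0] * 100 + [1] * 500 + [0] * 300  # X on at t = 1, off at t = 6
on_delay, off_delay = on_off_delays(signal, simulate_and_ffl(signal, dt=dt), dt)
print(f"ON delay: {on_delay:.2f}, OFF delay: {off_delay:.2f}")
```

In this toy model the ON step is delayed by the time Y needs to cross its threshold, while the OFF step is immediate because removing X alone breaks the AND condition. A pulse of X shorter than the ON delay never activates Z, which is the noise-filtering behavior that such a delay can provide.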
Sign-sensitive delays are common in responses of microorganisms to changes in nutrients. For example, consider the arabinose-utilization system of E. coli, in which cyclic adenosine monophosphate (cAMP) regulates transcription from the L-arabinose operon. The transcriptional response to an increase in cAMP (i.e., ‘‘on’’ sign) is much slower than to a cAMP decrease (i.e., ‘‘off’’ sign) (Mangan et al., 2003). One possible reason for this asymmetry is that, at least inside a mammalian host, the ‘‘on’’ state is common whereas the ‘‘off’’ state is maintained only during short and rare pulses of glucose. Consequently, although the cell can halt the production of L-arabinose genes soon after the introduction of glucose, it can tolerate slower commencement of their production when glucose levels decrease and cAMP is produced (Mangan et al., 2003). Alternatively, a sign-sensitive delay may reflect noise filtering; the cell refrains from activation of response pathways following spurious or transient signals. For the arabinose system, the ‘‘on’’ switch delay is approximately 20 min, comparable to the timescale of spurious pulses of cAMP in other natural settings (Alon, 2007). Conversely, a delayed response to the ‘‘off’’ switch can prolong the effect of a transient stimulus. For example, the expression of flagella motor genes in E. coli persists for 1 hr after the biogenesis input signal is turned off, but no delay occurs during the on switch. Indeed, this delay time in shutting down is comparable to the time needed for the biogenesis of a complete flagella motor (Kalir et al., 2005). Similar principles of signal processing in impulse responses have also been observed in mammalian systems. 
For instance, a small regulatory circuit that controls the expression of the gene encoding the proinflammatory cytokine interleukin-6 (IL-6) in mouse macrophages exhibits a delayed response to lipopolysaccharide (LPS) stimulation (the on switch) and discriminates between transient and persistent signals in the innate immune system (Litvak et al., 2009). Other “persistence detection” mechanisms have also been observed in transcriptional responses to DNA damage (Loewer et al., 2010), to epidermal growth factor (EGF) (Amit et al., 2007a), and to extracellular signal-regulated kinase (ERK) signaling (Murphy et al., 2002).

Transcriptional Anticipation as an Adaptation to Dynamic or Noisy Environments

Most studies of environmental stimuli in the lab focus on one sustained signal at a time, but the natural environment to which cells are adapted is substantially more complex, noisy, and irregular (Lopez-Maury et al., 2008; Wilkinson, 2009). Impulse-like transcriptional programs reflect some strategies that cells employ to handle such temporally fluctuating environments. Random fluctuations are optimally handled by sensing environmental changes and specifically responding by transcriptional changes in relevant genes, as described above (e.g., Capaldi et al., 2008; Gasch et al., 2000). In certain cases, a population of cells may respond stochastically; they activate different changes in gene expression in different cells of the same population, thus “hedging” their adaptive bets (Lopez-Maury et al., 2008). When fluctuations are stable and predictable, bacteria and yeast cells may use an anticipatory strategy for gene regulation (Mitchell et al., 2009; Tagkopoulos et al., 2008). For example, when exposed to heat shock, yeasts induce an impulse
response of genes needed for oxidative stress, although these genes are not directly necessary for adaptation to heat shock. Interestingly, yeast do not induce heat shock genes in response to oxidative stress (Mitchell et al., 2009). This asymmetry (Figure 1D) may reflect the predictable order of the two stresses under natural circumstances: oxidative respiration and accumulation of oxidative radicals follow a temperature increase during fermentation. Notably, this anticipation strategy differs from symmetrical cross-protection (Kultz, 2005) through shared stress-response pathways (Gasch et al., 2000). Rather, it indicates that any optimization of transcriptional programs during evolution occurred in a complex adaptive landscape. Thus, a strategy that may appear ‘‘suboptimal’’ when considering only one stimulus in the lab may indeed be optimal in the presence of multiple simultaneous or sequential stimuli. Higher-Order Temporal Coordination of Impulse Responses A functional temporal program of gene expression requires appropriate temporal coordination between genes (Figure 1F). Studies reveal two main classes of temporal coordination: regulatory modules and timing motifs. A regulatory module consists of genes that are coexpressed with the same temporal pattern or amplitude (FANTOM consortium et al., 2009; Gasch et al., 2000; Spellman et al., 1998). Regulatory modules serve to coordinate the production of proteins that are needed to perform relevant cellular functions in the given response. Regulatory modules are a hallmark of all known transcriptional programs and all known temporal patterns (Figure 1), including oscillatory patterns (e.g., Spellman et al., 1998), sustained responses (e.g., FANTOM consortium et al., 2009), and impulse responses (e.g., Chechik et al., 2008). 
Complementing the tight temporal coincidence within regulons, timing motifs reflect a particular order of transcriptional events among genes or modules, such as a linear cascade of genes with sequentially ordered expression (Alon, 2007; Chechik et al., 2008; Ihmels et al., 2004). In microorganisms, such ordering is commonly observed among genes encoding metabolic and biosynthetic enzymes, and therefore, it can play an important role in achieving metabolic efficiency or avoiding toxic intermediates (Chechik et al., 2008; Ihmels et al., 2004; Zaslaver et al., 2004). For example, following deprivation of amino acids, E. coli induces the expression of amino acid metabolic genes in the same order that their encoded enzymes are present in the relevant amino acid biosynthetic pathway (Zaslaver et al., 2004). This ‘‘just-in-time’’ pattern (Zaslaver et al., 2004), which may optimize resource utilization, has also been observed in other bacterial processes, most notably flagellar biogenesis (Kalir et al., 2005). A broader range of ordered patterns of expression onset, typically in impulse responses, is found in metabolic enzymes in yeast (Chechik et al., 2008; Ihmels et al., 2004). These include timing motifs with gene expression in the same order as the metabolic pathway (i.e., a just-in-time induction or shutoff of a pathway), as well as in the reverse order to the metabolic pathway. These reversed directions possibly contribute to the fast removal of an end metabolite that is either toxic or otherwise disruptive under the new condition (Chechik et al., 2008).
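The forward and reverse timing motifs described above can be checked on time-course data by estimating an onset time for each gene and comparing the onset order with the order of the enzymes in the pathway. The sketch below is one minimal way to do this. The half-maximal-change definition of onset, the gene names, and the toy expression values are all hypothetical and serve only to illustrate the comparison.

```python
def onset_time(times, levels, fraction=0.5):
    """First sampled time at which expression reaches `fraction` of its largest change."""
    base = levels[0]
    extreme = max(levels, key=lambda v: abs(v - base))  # largest deviation from baseline
    target = base + fraction * (extreme - base)
    rising = extreme >= base
    for t, v in zip(times, levels):
        if (v >= target) if rising else (v <= target):
            return t
    return None


def classify_order(pathway, onsets):
    """Compare onset order with pathway order: 'forward', 'reverse', or 'unordered'."""
    ordered = [onsets[gene] for gene in pathway]
    pairs = list(zip(ordered, ordered[1:]))
    if all(a < b for a, b in pairs):
        return "forward"
    if all(a > b for a, b in pairs):
        return "reverse"
    return "unordered"


# Hypothetical three-step pathway with invented expression values (arbitrary units).
times = [0, 10, 20, 30, 40, 50, 60]
expression = {
    "enzA": [1, 1, 3, 5, 5, 5, 5],
    "enzB": [1, 1, 1, 3, 5, 5, 5],
    "enzC": [1, 1, 1, 1, 3, 5, 5],
}
onsets = {gene: onset_time(times, levels) for gene, levels in expression.items()}
print(classify_order(["enzA", "enzB", "enzC"], onsets))  # prints "forward"
```

With real data, onset estimates depend on sampling density and noise, so a tolerance on ties or a fitted onset parameter would likely be more robust than the strict inequalities used here.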
Coordinated timing motifs are also found at metabolic branch points (Chechik et al., 2008; Ihmels et al., 2004). For example, consider a metabolic funnel, where two enzymes (A, B) produce complementary metabolites that are together consumed by a third reaction (catalyzed by C). In the “funnel-same-time” motif, the genes that encode the three enzymes (A, B, and C) are often expressed simultaneously, thus optimizing metabolite use by coordinating the production or consumption of metabolites along codependent branches. Similar temporal coordination was found for the genes encoding enzymes in “forks,” involving one enzyme producing two metabolites, which are then consumed by two separate reactions.

Ordered Impulse Responses within State-Transitioning Systems

Cell-fate decisions are typically associated with stable changes in gene expression that transition the regulatory system from one steady state to the next (Figure 1E). Such cell-fate decisions are prevalent in development (Basma et al., 2009; Nachman et al., 2007; Oliveri et al., 2008), pathogenesis (Iliopoulos et al., 2009), and immune responses (Amit et al., 2007a, 2009; Ramsey et al., 2008; Wei et al., 2009). State transitioning in cells involves sustained induction or repression of gene expression, stabilizing the cell on a new characteristic expression program, and disassociating it from its precursors. Nevertheless, processes that lead to such stable changes often involve a succession of impulse responses that promote transient effects necessary for achieving the transition. Such a combination of transient and stable changes in transcription was observed during PMA (phorbol myristate acetate)-induced differentiation of myelomonocytic leukemia (THP-1) cells (FANTOM consortium et al., 2009). Sustained responses included repression of genes required for cell-cycle progression and DNA synthesis, which is consistent with the growth arrest associated with PMA-induced differentiation.
In addition, genes that characterize the differentiated phenotype (e.g., immune response) were persistently induced. Conversely, transient, impulse-like changes were associated with various transcription factors that play an important role early in the transition, promoting the differentiation program prior to repression of the factors that maintain the undifferentiated state. A similar pattern, specifically immediate early impulse responses of key regulators followed by stable changes of downstream genes, has been observed in many other mammalian systems, including responses to growth (Amit et al., 2007a), pathogens (Amit et al., 2009), and stress (Murray et al., 2004) signals. Impulse responses are not limited to the immediate wave of transcription at the beginning of the state transition. Rather, a succession of impulses, forming a series of transcriptional “waves,” has been observed in various state-transitioning responses (Amit et al., 2007a, 2009; Ramsey et al., 2008; Shapira et al., 2009). For instance, the response of immune dendritic cells to pathogens involves several waves of induction in which coregulated genes follow a simple impulse profile with distinct onset and offset times (Amit et al., 2007a). As in PMA-induced differentiation, the first wave is an immediate-early response enriched for genes that encode proteins with roles in transcriptional regulation. Then a subsequent transcriptional wave is enriched for genes that are required for extracellular signaling (e.g., interferon-beta 1 or IFNB1) and motility (e.g., chemokine ligand 3 or CCL3) during that time interval (2–4 hr post-stimulus) in the in vivo innate immune response. This temporal organization allows innate immune cells to activate the CCL3 ligand at the appropriate time, favoring the migration of activated cells to the draining lymph node to activate the adaptive immune response. Long transcriptional cascades of ordered sequential regulation are also at the basis of many complex developmental processes (Davidson, 2010). For instance, in the sea urchin embryo, the transcriptional program of skeletogenic cell development in endomesoderm specification includes several layers of regulation that correspond to developmental phases (Oliveri et al., 2008). Progression through the phases is facilitated by a regulatory cascade in which transcription factors that are active during one phase (e.g., early micromere specification) activate genes in the next phase (e.g., late specification). Notably, transcriptional changes in genes encoding regulatory factors can also feed back and regulate the expression of their temporal “predecessors” (Amit et al., 2007a). Such mechanisms are used to shape both impulse responses and sustained responses, as we discuss below.

Mechanism of Temporal Control of Impulse Responses

What is the cell’s capacity to “compute” a temporal pattern of mRNA expression? Are there canonical molecular mechanisms that underlie distinct types of patterns? Can a single mechanistic unit generate more than one pattern depending on the incoming signal or its downstream target? In this section, we focus on the molecular mechanisms that generate impulse responses at single genes, gene modules, and temporal motifs.
Network Architecture Can Be Decomposed into Characteristic Topological Motifs
Regulatory systems that control gene expression are often represented as networks (i.e., directed graphs) with the nodes corresponding to regulatory proteins (e.g., transcription factors) and the edges linking a DNA-binding protein to proteins encoded by genes it binds to and regulates (e.g., Hu et al., 2007; Lachmann et al., 2010; Shen-Orr et al., 2002). Such graphs have been assembled from many small-scale studies on regulation of individual genes and operons (Shen-Orr et al., 2002) or by systematic chromatin immunoprecipitation (ChIP), in vitro assays, and computational analysis of cis-regulatory sequence elements (Badis et al., 2009; Harbison et al., 2004; Hu et al., 2007; Lachmann et al., 2010; Lee et al., 2002). Although network graphs appear highly complex, they can be effectively decomposed to putative functional units based on recurring topological patterns (Figure 2). These “network motifs” (Shen-Orr et al., 2002) are small subnetworks consisting of only a few nodes and edges with a topological pattern that is significantly overrepresented in the transcriptional graph. Although the patterns themselves are static, they can be associated, analytically (Bolouri and Davidson, 2003; Goentoro et al., 2009; Kittisopikul and Suel, 2010; Mangan et al., 2003; Shen-Orr et al., 2002; Tyson et al., 2003) or experimentally (Basu et al., 2004; Cantone et al., 2009; Kaplan et al., 2008; Mangan et al., 2006; Rosenfeld et al., 2002), with different dynamic interpretations, thus relating the architecture of these network components with a functional capacity for generating temporal
Figure 2. General Network Motifs in Transcriptional Regulatory Networks General motifs found in transcriptional regulatory networks are shown. Nodes represent proteins; edges are directed from a DNA-binding protein to a protein encoded by a gene to which it binds and regulates. Arrows and blunt-arrows represent activation and repression, respectively; circle-ending arrows are either activation (+) or repression (−). Relevant functions for these motifs are listed.
responses (Figure 2). These responses include rapid or slowed responses (Figures 2A and 2B), feedback control (Figure 2C), sign-sensitive delays (Figure 2D), temporal ordering (Figure 2E), and temporal coordination in modules (Figure 2F). Notably, the relation between the topology of a motif and its induced temporal pattern is far from unique and depends on the characteristics of the incoming signal and of the interacting molecules (Macia et al., 2009). For instance, protein production rate, protein degradation rate, or activation thresholds of regulators can each alter the dynamic transcriptional pattern generated by the motif (Lahav et al., 2004). Moreover, different motifs or combinations of motifs (Geva-Zatorsky et al., 2006) can induce similar behaviors. For a more thorough discussion of network motifs, we refer the reader to other extensive reviews (Alon, 2007; Davidson, 2009, 2010; Tyson et al., 2003).
Combinatorial Logic in the Feedforward Loop Generates Sign-Sensitive Delays
The feedforward loop (Figure 2D) is a major building block of combinatorial regulation (Amit et al., 2009). A feedforward loop has a unidirectional structure consisting of three nodes: an upstream regulator X that regulates a downstream regulator Y, which in turn regulates a downstream target Z (which is not necessarily a regulator). An additional edge is directed from X to Z, thus closing a unidirectional “loop.” Each interaction can be suppressing or activating, resulting in eight distinct feedforward loop structures.
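The count of eight structures follows from assigning one of two signs to each of the three edges. The short Python sketch below enumerates them; it is an illustration only, not code from any cited study. The function names are hypothetical, and the coherent/incoherent labelling uses an assumed sign-product convention (a loop is coherent when the indirect path X to Y to Z has the same overall sign as the direct edge X to Z).

```python
from itertools import product

ACTIVATION, REPRESSION = +1, -1


def classify_ffl(x_to_y, y_to_z, x_to_z):
    """Label a feedforward loop from the signs (+1 or -1) of its three edges.

    Assumed convention: the loop is "coherent" when the indirect path
    X -> Y -> Z has the same overall sign as the direct edge X -> Z.
    """
    indirect_sign = x_to_y * y_to_z
    return "coherent" if indirect_sign == x_to_z else "incoherent"


def enumerate_ffls():
    """Return every sign assignment for (X->Y, Y->Z, X->Z) with its label."""
    signs = (ACTIVATION, REPRESSION)
    return [(edges, classify_ffl(*edges)) for edges in product(signs, repeat=3)]


if __name__ == "__main__":
    for edges, label in enumerate_ffls():
        print(edges, label)
```

Under this convention, the all-activating loop is coherent, and replacing the Y to Z activation with repression gives an incoherent loop.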
One commonly found structure in transcriptional networks (Alon, 2007) is the Type-1 coherent feedforward loop, in which all of the interactions are activating. This feedforward loop can generate a sign-sensitive time delay. The length of the delay and whether it occurs during the off or the on switch depends on the specific molecular parameters of the loop. The particular logic mediated by the loop largely depends on the organization of cis-regulatory elements in the promoter of the target gene (“Z”). For instance, when the two transcription factors in a coherent feedforward loop exhibit an “or” logic at the promoter of the downstream gene (i.e., only one transcription factor suffices to activate the gene), the resulting dynamics is usually a sign-sensitive delay with faster response to the on switch and a prolonged transcriptional response, as in flagellar biogenesis (Kalir et al., 2005). Conversely, an “and” logic for the two transcription factors (i.e., both factors are needed to activate the gene) is associated with a faster response to the off switch, as in the L-arabinose operon. This feedforward loop structure facilitates persistence detection (Mangan et al., 2003).
Another prevalent form of the feedforward loop is the incoherent variant (Figure 2D), in which Y acts as a repressor rather than an activator. Depending on its parameters, this motif can induce pulse-like responses (Basu et al., 2004), lead to a rapid (Mangan et al., 2006) or nonmonotone (Kaplan et al., 2008) response of the downstream target Z, or provide a mechanism for detecting fold-change (e.g., that a component’s level changed by 2-fold rather than an absolute value) (Goentoro et al., 2009).
Single-Input Modules and Chromatin Architecture Coordinate Responses in Modules and in Just-in-Time Motifs
The single-input module (Figure 2F) motif occurs when a single regulator has multiple targets (Alon, 2007; Lee et al., 2002).
This architecture, often associated with regulatory hubs (“master regulators”), can facilitate a temporally coordinated response of multiple genes in a module. However, the activation of the downstream genes in a single-input module is not necessarily concurrent, and differences in their promoter properties can lead to ordered activation (Figure 3). Specifically, a transcription factor’s affinity for a specific cis-regulatory sequence affects the fraction of time that it occupies a binding site (Bruce et al., 2009; Tanay, 2006). The stronger the binding affinity, the higher the probability that the transcription factor remains bound to a site and recruits the transcriptional machinery (Hager et al., 2009). Differential recruitment at different promoters results in a range of induction thresholds, allowing a single transcription factor with a temporally fluctuating level to generate an ordering of its target genes. This principle was demonstrated in a recent study using a series of genetically modified promoters of the Pho5 gene during the response to phosphate starvation in yeast (Lam et al., 2008). In this system (Figure 3), promoters with high-affinity sites for the transcription factor Pho4 that are “open” (i.e., not occluded by nucleosomes) responded to weaker signals of slight phosphate deprivation (Figure 3B) and had a shorter response time (Figure 3A) to phosphate starvation compared to those with lower-affinity sites. Similar behavior was observed for synthetic promoter variants and for different targets of Pho4 that had similar promoter architecture.
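The link between induction thresholds and activation order can be illustrated with a toy calculation. The Python sketch below is not a model of the Pho4 system. It assumes, purely for illustration, that the regulator level rises as tf_max * (1 - exp(-t / tau)) and that each target switches on when the level first crosses its own threshold, with a lower threshold standing in for a higher-affinity, accessible site. All names and numbers are hypothetical.

```python
import math


def onset_time(threshold, tf_max=1.0, tau=1.0):
    """Time at which a rising regulator level first reaches a promoter threshold.

    Assumed kinetics: level(t) = tf_max * (1 - exp(-t / tau)).
    Returns math.inf if the threshold is never reached.
    """
    if threshold <= 0:
        return 0.0
    if threshold >= tf_max:
        return math.inf
    return -tau * math.log(1.0 - threshold / tf_max)


def activation_order(thresholds, tf_max=1.0, tau=1.0):
    """Sort target names by onset time, earliest first."""
    return sorted(
        thresholds,
        key=lambda name: onset_time(thresholds[name], tf_max, tau),
    )


# Hypothetical promoters: a lower threshold stands in for a higher-affinity site.
example_thresholds = {
    "high_affinity": 0.1,
    "medium_affinity": 0.4,
    "low_affinity": 0.8,
}

if __name__ == "__main__":
    for name in activation_order(example_thresholds):
        print(name, round(onset_time(example_thresholds[name]), 3))
```

In this sketch a single rising input is enough to order the targets, and a target whose threshold exceeds the maximal regulator level is never induced, which mirrors the idea that weak signals activate only the most sensitive promoters.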
Figure 3. Promoter Regions and Nucleosome Positioning as Temporal Signal Processors (A) The transcription factor Pho4 (orange oval) targets different variants of the Pho5 promoter following phosphate starvation in yeast cells (left). The purple (upper) promoter contains the wild-type Pho5 promoter sequence, whereas the green and red promoters (denoted as H1 and H3, respectively) are synthetic variants. Each target exhibits a different response time (right), depending on the affinity of Pho4 for its binding site when the site is unoccluded by nucleosomes (depicted in panel B). The y axis corresponds to median fluorescence levels, across separate measurements, scaled between the promoter-specific expression minimum at 0 hr and maximum at 7 hr after induction. (B) Suggested mechanism for decoupling promoter induction threshold from dynamic range. These cartoons show occupancies of Pho4 and nucleosome at the three Pho4 promoter variants under mild (left) and acute (right) phosphate starvation. Gray-blue and yellow ovals represent nucleosomes and Pho4, respectively; dark blue circles and red triangles correspond to low-affinity and high-affinity binding sites, respectively; and X marks ablation of the Pho4-binding motif. Darker blue ovals represent more highly occupied nucleosomes (across a cell population). Under intermediate levels of phosphate (left), substantial Pho4 occupancy and subsequent transcriptional activity occurs only at promoters with exposed high-affinity sites. The plot at the bottom left shows the respective expression levels, divided for each variant by the maximum level at full starvation in arbitrary units (a.u.). In the absence of phosphate (right), Pho4 activity is saturated, resulting in nucleosome eviction and maximum expression at all promoters. The plot at the bottom right shows the respective maximal induction levels (a.u.). Reproduced from Lam et al. (2008), with permission from the authors.
Thus, graded binding affinities complement the single-input module motif in which a single transcription factor induces temporal ordering among its targets through differential binding affinity (Figure 3A). In the phosphate starvation responses, this results in tuning of the responding genes to the severity and duration of phosphate depletion. At intermediate phosphate levels (with intermediate levels of nuclear Pho4), first-response genes with exposed high-affinity sites like PHO84 and PHM4 allow the cell to take up environmental phosphate and mobilize internal reserves. Under starvation conditions, this initial response is followed by a second-order response such as upregulation of PHO5 and other phosphate-scavenging components (Springer et al., 2003). More generally, such graded affinities may explain the ordered timing of an impulse-like response of genes within metabolic pathways, in timing motifs such as just-in-time. In yeast, the timing of ordered activation in a timing motif was found to correlate with the affinity of the regulating transcription factor for the respective gene (Chechik and Koller, 2009; Chechik et al., 2008). Similar principles were also observed in E. coli (Zaslaver et al., 2004).
Nucleosome Positioning Contributes to Activation Timing
The position of nucleosomes in a gene promoter impacts the accessibility of DNA-binding sites to transcription factors. Therefore, nucleosome positioning also affects the order of activations across several genes regulated by the same transcription factor. This effect was convincingly demonstrated in the Pho5 system (Lam et al., 2008). Most Pho4-binding sites are occluded by nucleosomes in normal conditions, but they become exposed when chromatin is dynamically remodeled in response to phosphate starvation (Figure 3B). The threshold of response, and hence a gene’s onset time, is thus also affected by the chromatin architecture of the repressed state.
Conversely, the dynamic range of the response is determined by the active state’s architecture. Maximum transcriptional outputs of the Pho5 variants differed by up to 7-fold and correlated with the number, affinity, and placement of Pho4 sites, irrespective of their accessibility in the initial (pre-starvation) chromatin state. These results suggest a mechanism by which the cell decouples the determinants of promoter activation timing (site affinity and nucleosome positions) from the determinants of expression capacity (site affinity alone). Global studies on changes in nucleosome positions in response to environmental signals (Deal et al., 2010) support the generality of the Pho model, at least in yeast (Shivaswamy et al., 2008).
Protein Oscillators Generate Coordinated Impulse Responses across Regulons
Recent studies suggest that oscillations in the localization or activity of trans-regulators that control single-input modules play a substantial role in governing (nonoscillating) impulse transcriptional patterns. Most notably, coordinated impulse patterns across a regulon may often stem from limited oscillations in the nuclear localization of a regulatory factor controlling the target genes (Ashall et al., 2009; Cai et al., 2008). This has been suggested for the transcription factor Crz1 in yeast, which uses a “pulsing” mechanism to encode information about extracellular calcium levels (Figure 4) (Cai et al., 2008). When extracellular
Figure 4. Coordinated Impulse Response Generated by Protein Oscillators (A) In response to extracellular calcium, yeast cells initiate bursts of nuclear localization of the transcription factor Crz1. Bottom left: A single-cell time trace of the amount of phosphorylated Crz1 in the nucleus; the arrow indicates introduction of extracellular calcium. Bottom right: The frequency of bursts (y axis) rises with calcium levels (x axis). Error bars calculated by using different thresholds for burst determination (see Cai et al., 2008). Inset: A histogram of burst duration times under high (red) and low (blue) calcium levels indicates that burst duration is independent of calcium concentration. (B) Expression levels of three synthetic Crz1-dependent promoters increase proportionally to extracellular calcium concentration (x axis). On the y axis, data are divided, for each variant, by the expression at maximum calcium level. The synthetic promoters have 1 (red), 2 (green), or 4 (blue) calcineurin-dependent response elements. Inset: A bar chart showing the fold-change of the different targets, following Crz1 overexpression. The targets exhibit different responses, probably due to their different numbers of Crz1-binding sites. Reproduced from Cai et al. (2008) with permission from the authors.
calcium increases, Crz1 is dephosphorylated and exhibits short bursts of translocation to the nucleus. At higher levels of calcium, the cells respond, not by increasing the amount of nuclear Crz1 in each translocation burst but rather by increasing the frequency of the bursts (Figure 4A). Such “frequency modulation” may be important because of the nonlinearity (Yuh et al., 2001) and diversity (Kim et al., 2009) of the input functions associated with different target promoters. Because distinct Crz1 target promoters (Figure 4A) probably respond differently to changing levels of Crz1, amplitude modulation of Crz1 would not maintain their relative ratios
(Figure 4B, inset). In contrast, modulation of the frequency of Crz1’s nuclear localization can control the expression of multiple target genes in a more proportional manner and thus maintain more stable ratios of gene expression, regardless of the shapes of their input functions (Figure 4B, main graph). This behavior might be explained by the fact that a strong nonlinear component (i.e., dependence on Crz1 magnitude) is now kept relatively constant for different calcium levels, and the variable part is the amount of time the promoters are exposed to a fixed amount of nuclear Crz1.
Oscillations in the level or localization of transcription factors have been observed in diverse environmental responses, such as those involving NF-κB (nuclear factor κ-light-chain-enhancer of activated B cells) (Ashall et al., 2009; Covert et al., 2005; Friedrichsen et al., 2006; Nelson et al., 2004; Tay et al., 2010) and the tumor suppressor p53 (Geva-Zatorsky et al., 2006; Loewer et al., 2010) in mammals and the SOS response to DNA damage in bacteria (Friedman et al., 2005). In the p53 and SOS systems, monitoring with high temporal resolution revealed tightly regulated oscillations in the nuclear levels of the key regulators (e.g., p53) with variable amplitude but more precise timing. Oscillations in regulatory proteins, which are driven by external stimuli, often lead to nonoscillatory, impulse transcriptional patterns. For example, the expression of p21, a p53-target gene, is induced in a nonoscillatory manner during DNA damage (Loewer et al., 2010). Similarly, oscillations in NF-κB localization and activity following TNF-α stimulation are coupled to impulse-like patterns in a host of early response genes, such as the NF-κB inhibitor IκBα, even when assessed at the single-cell level (Tay et al., 2010).
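The contrast drawn above between amplitude and frequency modulation of Crz1 can be made concrete with a toy calculation. The Python sketch below is an illustration under stated assumptions, not the analysis of Cai et al. (2008). It assumes Hill-type promoter input functions, an instantaneous promoter response, and no expression between bursts. The two promoters and their parameters are hypothetical.

```python
def hill(level, k, n):
    """Hill-type promoter input function (an assumed form, for illustration)."""
    return level**n / (k**n + level**n)


def am_output(level, k, n):
    """Amplitude modulation: the signal sets a steady nuclear regulator level."""
    return hill(level, k, n)


def fm_output(fraction_nuclear, k, n, burst_level=1.0):
    """Frequency modulation: bursts have a fixed size, and the signal sets only
    the fraction of time the regulator spends in the nucleus."""
    return fraction_nuclear * hill(burst_level, k, n)


# Two hypothetical promoters with different input-function shapes (k, n).
PROMOTER_A = (0.2, 1)
PROMOTER_B = (0.5, 4)


def output_ratio(output_fn, signal):
    """Ratio of promoter A to promoter B expression at a given signal strength."""
    return output_fn(signal, *PROMOTER_A) / output_fn(signal, *PROMOTER_B)


if __name__ == "__main__":
    for signal in (0.1, 0.5):
        print(signal, output_ratio(am_output, signal), output_ratio(fm_output, signal))
```

With these assumptions the expression ratio of the two promoters changes strongly with signal strength under amplitude modulation but stays fixed under frequency modulation, because each promoter only ever sees the same burst level and the signal scales both outputs by the same time fraction.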
Thus, protein oscillations in environmental response systems may play a general mechanistic role in regulating downstream impulse transcriptional changes. First, oscillation of transcription factor levels can maintain a steady response as long as the damage signal is present and constitutive supply of the downstream gene products is needed (as in the p53 response). Second, oscillations in transcription factor localization can underlie the induction of proportional responses through frequency modulation (as with Crz1). Finally, combinations of protein oscillators can generate various transcriptional kinetic patterns. For instance, activation of NF-κB in mouse embryo fibroblasts treated with LPS depends on two pathways, MyD88-dependent and MyD88-independent (Covert et al., 2005). Perturbing either one of these pathways and leaving the other one intact leads, in both cases, to oscillatory NF-κB activity. However, when both pathways are intact, both oscillators act upon LPS stimulation but with a relative phase shift of 30 min, resulting in a stable, nonoscillatory pattern of NF-κB activity. It is likely that other combinations, as well as modulation of both amplitude and frequency, will play a role in encoding other complex patterns of transcriptional regulation at single genes and gene modules.
Attenuation and Ordering of Impulse Responses through Feedback and Cascades
Impulse patterns can be attenuated and ordered in more complex programs and through more elaborate regulatory architectures, most notably within developmental programs. In
particular, in the cascade motif (Figure 2E), regulators are ordered in layers, and proteins from one layer control ones in subsequent layers (Hooshangi et al., 2005; Rappaport et al., 2005). This pattern was observed in transcriptional networks during sea urchin development (Bolouri and Davidson, 2003; Davidson, 2009, 2010; Oliveri et al., 2008), state-transitioning systems in microorganisms (Chu et al., 1998), and environmental responses in mammalian cells (Amit et al., 2007a, 2009; Ramsey et al., 2008; Shapira et al., 2009). A cascade-like network topology entails an inherent temporal order of regulation events (Hooshangi et al., 2005). It was postulated to enable context-specific responses (Davidson, 2009) and to provide robustness both to spurious input signals (Hooshangi et al., 2005) and to noise in the rates of protein production (Rappaport et al., 2005).
Regulatory interactions between different layers in a cascade can form multicomponent loops in which genes in a late transcriptional wave regulate genes from earlier waves (Figure 2C). The ensuing feedback effect can contribute to the ultimate attenuation of impulse responses, even under a sustained signal (Amit et al., 2007b). For example, stimulation of human cell lines with EGF induces several ordered impulse responses (Amit et al., 2007a), including the induction of “delayed early” genes. Delayed early genes are primarily induced by transcription factors that were themselves induced as “immediate early” genes. Delayed early genes encode a large number of signaling proteins and RNA-binding proteins that attenuate RNA levels and protein activity of the initial response pathways. Such negative transcriptional feedback mediated through a transcriptional cascade is common in environmental responses in yeast as well (Segal et al., 2003).
A more basic form of feedback is the autoregulatory loop by which a transcription factor regulates its own gene.
Negative autoregulation (Figure 2A) facilitates a rapid transcriptional response of the autoregulating gene. It has been associated with the induction of a rapid impulse response to EGF stimulation in human cells (Amit et al., 2007a) and to DNA damage in E. coli (Camas et al., 2006). Conversely, the positive autoregulatory loop (Figure 2B) is associated with the opposite effect because it results in a slow response time (Alon, 2007). Positive loops, with either one or more components (Figure 2B or Figure 2C, respectively), can lead to substantial variation between isogenic cells, due to stochastic effects, and can play an important role in maintaining stability after state transitioning (Davidson, 2009; Kim et al., 2008; Macarthur et al., 2009; Oliveri et al., 2008).
Perspective
Diverse mechanisms drive impulse-like changes in mRNA levels, which can occur on a broad range of timescales, from rapid environmental stress responses to slower and more elaborate developmental processes. What can we learn by comparing these processes across timescales? The emerging picture supports a few basic principles. Just-in-time responses and sign sensitivity optimize process efficiency, whereas the organization of the impulse responses in functional waves and cascades provides temporal compartmentalization and order to gene expression. Although in this Review we have made convenient distinctions between impulse responses, state transitions, and oscillators, most biological systems intertwine these temporal patterns. For example, oscillations in protein levels or localization can also lead to impulse responses, and ordered impulses are important in generating sustained responses through cascades. Furthermore, many of the underlying molecular mechanisms driving these temporal patterns can be intimately linked. For example, a gene may be poised for transcription with a preinitiation complex in anticipation of both developmental and environmental stimuli. Similarly, the mechanistic regulatory building blocks surveyed here are typically embedded within a wider network context. First, many responses, especially in metazoans, involve a large number of inputs into a single promoter during both environmental and developmental responses (Amit et al., 2009; FANTOM consortium et al., 2009). In addition, transcriptional cascades are often combined with other motifs, such as negative feedback (Amit et al., 2007a), feedforward loops (Basu et al., 2004; Shen-Orr et al., 2002), and single-input modules (Shen-Orr et al., 2002). Such elaborate loops (Figure 2C) and cascades (Figure 2E) are essential to generate temporal order and stable cell states in developmental systems (Davidson, 2009; Hooshangi et al., 2005; Kim et al., 2008; Lee et al., 2002; Li et al., 2007; Macarthur et al., 2009; Oliveri et al., 2008; Rappaport et al., 2005). Furthermore, multiple cis-regulatory elements and sequences affecting nucleosome positions are integrated within more complex cis-regulatory functions in both yeasts (Gertz et al., 2009; Raveh-Sadka et al., 2009) and metazoans (Kaplan et al., 2009; Yuh et al., 2001; Zinzen et al., 2009). Both computational studies and synthetic molecular circuits (Cantone et al., 2009) have provided additional insights into the crosstalk between motifs (Ishihara et al., 2005; Ma et al., 2004) and into the dynamics of complex networks (Walczak et al., 2010) that integrate multiple motifs.
Nevertheless, the correspondence between simple subnetworks and motifs and the observed temporal patterns of mRNA levels (Alon, 2007; Davidson, 2009, 2010) suggests a substantial degree of modularity in the operation of regulatory systems. Most of the mechanisms driving mRNA concentrations that are described in this Review, and that have been deciphered in detail so far, are transcriptional, but other pathways also affect mRNA levels, including mRNA processing, transport, and degradation. Although recent studies (Shalem et al., 2008) suggest that such mechanisms can play a substantial role in shaping temporal profiles of mRNA levels, these mechanisms are still far less understood than transcriptional regulation. Indeed, the scarcity of experimental methods to monitor these processes has hampered progress in this area. However, we anticipate that recent advances in massively parallel cDNA sequencing (RNA-Seq) (Mortazavi et al., 2008) will help advance this front. More generally, deciphering circuitry and understanding the capacity of molecular mechanisms to encode complex signals and decode them into specific responses will require tight integration between experiments, analysis, and computation, in particular for temporal responses. First, there is a substantial need for direct manipulation of both signals, for example using microfluidic devices, and of individual components, by manipulation of either trans-components (Amit et al., 2009; Costanzo et al., 2010; FANTOM consortium et al., 2009) or cis-sequences
(Gertz et al., 2009; Patwardhan et al., 2009). Monitoring temporal responses in segregating populations (Eng et al., 2010) can provide a complementary means for testing the effect of many simultaneous genetic perturbations. Analytical methods and computational models can guide the design of these perturbations to a search space that is maximally informative and biologically relevant. For example, sequence models of gene regulation (Gertz et al., 2009; Raveh-Sadka et al., 2009) can help investigators make relevant promoter variants to test, whereas provisional models of trans-regulation (Amit et al., 2009) can help narrow down targets for gene silencing or disruption. Improving the ability to monitor a larger number of circuit components over time in living cells is important for broadening the scope of single-cell studies and for deepening our understanding of population-level phenomena observed with genomics profiling technologies. Recent advances in simultaneously monitoring multiple types of RNA (Kern et al., 1996; Muzzey and van Oudenaarden, 2009) or proteins (Bandura et al., 2009) in vivo are promising. Notably, although the difference between a single-cell and population view is a recurring theme of recent studies, reconciling the two is important for a functional understanding of a response, especially in multicellular organisms (Simon et al., 2005). For example, a recent study of the NF-κB response to TNF-α stimulation showed that the observed cellular heterogeneity may be optimal for achieving a functional population (or mean) response for paracrine cytokine signaling (Paszek et al., 2010). Computational analysis of time course data presents several challenging problems. These include, among others, identifying differentially expressed genes, grouping them into clusters of similar temporal patterns, and inferring their regulatory interactions.
Recent studies have shown that a useful algorithmic starting point is to derive a continuous representation of transcriptional profiles by fitting to a particular mathematical function (Chechik and Koller, 2009; Storey et al., 2005). Specifically, impulse responses fit well to a certain class of sigmoid “impulse-like” functions, which have a small number of biologically interpretable parameters (e.g., onset time) (Chechik and Koller, 2009; Chechik et al., 2008). The fitted continuous representations can then be used in conjunction with the original expression values, aiming to provide a more robust analysis, particularly for differential expression (Storey et al., 2005) and clustering (Chechik and Koller, 2009; Chechik et al., 2008). Despite these advances and the vast amount of research on the more advanced task of regulatory network inference (Bansal et al., 2007; Karlebach and Shamir, 2008), there is still much to be accomplished. The emerging complexity of regulatory mechanisms and the expected availability of more diverse and refined temporal data leave substantial room for developing more refined mechanistic models of gene regulation, which account for both cis and trans elements and their integration in time. Finally, advances in synthetic biology promise the ability not only to manipulate biological entities but also to design systems to aid the development and interpretation of analytical models with increasing complexity. This would be particularly critical to decipher the complex web of interactions and the multiplicity of inputs that determine temporal changes in gene regulation in living cells.
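To illustrate what a sigmoid “impulse-like” function with interpretable parameters might look like, the Python sketch below builds one as the product of a rising and a falling logistic curve. The specific functional form, the function names, and the parameter names are assumptions made for illustration and are not taken from the cited studies. The parameters are an initial level h0, a peak level h1, a final level h2, onset and offset times, and a steepness beta.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def impulse(t, h0, h1, h2, t_on, t_off, beta):
    """Impulse-like curve built as a product of a rising and a falling sigmoid.

    h0: initial level, h1: peak level (must be nonzero), h2: final level,
    t_on / t_off: onset and offset times, beta: steepness (> 0).
    """
    rise = h0 + (h1 - h0) * sigmoid(beta * (t - t_on))
    fall = h2 + (h1 - h2) * sigmoid(-beta * (t - t_off))
    return rise * fall / h1


if __name__ == "__main__":
    # Hypothetical profile: baseline 1, peak 5, new steady state 2.
    for t in range(0, 11):
        print(t, round(impulse(t, 1.0, 5.0, 2.0, 2.0, 6.0, 4.0), 3))
```

A curve of this kind starts near h0, approaches h1 between the onset and offset times, and settles near h2, so a sustained state transition (h2 different from h0) and a pure transient (h2 equal to h0) are both covered by the same small parameter set. In practice the parameters would be estimated from time course data with a nonlinear least-squares routine.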
REFERENCES
Alon, U. (2007). Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450–461.
Amit, I., Citri, A., Shay, T., Lu, Y., Katz, M., Zhang, F., Tarcic, G., Siwak, D., Lahad, J., Jacob-Hirsch, J., et al. (2007a). A module of negative feedback regulators defines growth factor signaling. Nat. Genet. 39, 503–512.
Amit, I., Wides, R., and Yarden, Y. (2007b). Evolvable signaling networks of receptor tyrosine kinases: relevance of robustness to malignancy and to cancer therapy. Mol. Syst. Biol. 3, 151.
Amit, I., Garber, M., Chevrier, N., Leite, A.P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J.K., Li, W., Zuk, O., et al. (2009). Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257–263.
Ashall, L., Horton, C.A., Nelson, D.E., Paszek, P., Harper, C.V., Sillitoe, K., Ryan, S., Spiller, D.G., Unitt, J.F., Broomhead, D.S., et al. (2009). Pulsatile stimulation determines timing and specificity of NF-kappaB-dependent transcription. Science 324, 242–246.
Badis, G., Berger, M.F., Philippakis, A.A., Talukder, S., Gehrke, A.R., Jaeger, S.A., Chan, E.T., Metzler, G., Vedenko, A., Chen, X., et al. (2009). Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723.
Bandura, D.R., Baranov, V.I., Ornatsky, O.I., Antonov, A., Kinach, R., Lou, X., Pavlov, S., Vorobiev, S., Dick, J.E., and Tanner, S.D. (2009). Mass cytometry: technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry. Anal. Chem. 81, 6813–6822.
Bansal, M., Belcastro, V., Ambesi-Impiombato, A., and di Bernardo, D. (2007). How to infer gene networks from expression profiles. Mol. Syst. Biol. 3, 78.
Basma, H., Soto-Gutierrez, A., Yannam, G.R., Liu, L., Ito, R., Yamamoto, T., Ellis, E., Carson, S.D., Sato, S., Chen, Y., et al. (2009). Differentiation and transplantation of human embryonic stem cell-derived hepatocytes. Gastroenterology 136, 990–999.
Basu, S., Mehreja, R., Thiberge, S., Chen, M.T., and Weiss, R. (2004). Spatiotemporal control of gene expression with pulse-generating networks. Proc. Natl. Acad. Sci. USA 101, 6355–6360.
Bolouri, H., and Davidson, E.H. (2003). Transcriptional regulatory cascades in development: initial rates, not steady state, determine network kinetics. Proc. Natl. Acad. Sci. USA 100, 9371–9376.
Braun, E., and Brenner, N. (2004). Transient responses and adaptation to steady state in a eukaryotic gene regulation system. Phys. Biol. 1, 67–76.
Bruce, A.W., Lopez-Contreras, A.J., Flicek, P., Down, T.A., Dhami, P., Dillon, S.C., Koch, C.M., Langford, C.F., Dunham, I., Andrews, R.M., et al. (2009). Functional diversity for REST (NRSF) is defined by in vivo binding affinity hierarchies at the DNA sequence level. Genome Res. 19, 994–1005.
Cai, L., Dalal, C.K., and Elowitz, M.B. (2008). Frequency-modulated nuclear localization bursts coordinate gene regulation. Nature 455, 485–490.
Camas, F.M., Blazquez, J., and Poyatos, J.F. (2006). Autogenous and nonautogenous control of response in a genetic network. Proc. Natl. Acad. Sci. USA 103, 12718–12723.
Cantone, I., Marucci, L., Iorio, F., Ricci, M.A., Belcastro, V., Bansal, M., Santini, S., di Bernardo, M., di Bernardo, D., and Cosma, M.P. (2009). A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137, 172–181.
Capaldi, A.P., Kaplan, T., Liu, Y., Habib, N., Regev, A., Friedman, N., and O’Shea, E.K. (2008). Structure and function of a transcriptional network activated by the MAPK Hog1. Nat. Genet. 40, 1300–1306.
Chechik, G., and Koller, D. (2009). Timing of gene expression responses to environmental changes. J. Comput. Biol. 16, 279–290.
Chechik, G., Oh, E., Rando, O., Weissman, J., Regev, A., and Koller, D. (2008). Activity motifs reveal principles of timing in transcriptional control of the yeast metabolic network. Nat. Biotechnol. 26, 1251–1259.
Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P.O., and Herskowitz, I. (1998). The transcriptional program of sporulation in budding yeast. Science 282, 699–705.
Iliopoulos, D., Hirsch, H.A., and Struhl, K. (2009). An epigenetic switch involving NF-kappaB, Lin28, Let-7 MicroRNA, and IL6 links inflammation to cell transformation. Cell 139, 693–706.
Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh, J.L., Toufighi, K., Mostafavi, S., et al. (2010). The genetic landscape of a cell. Science 327, 425–431.
Ishihara, S., Fujimoto, K., and Shibata, T. (2005). Cross talking of network motifs in gene regulation that generates temporal pulses and spatial stripes. Genes Cells 10, 1025–1038.
Covert, M.W., Leung, T.H., Gaston, J.E., and Baltimore, D. (2005). Achieving stability of lipopolysaccharide-induced NF-kappaB activation. Science 309, 1854–1857.
Kalir, S., Mangan, S., and Alon, U. (2005). A coherent feed-forward loop with a SUM input function prolongs flagella expression in Escherichia coli. Mol. Syst. Biol. 1, 0006.
Davidson, E.H. (2009). Network design principles from the sea urchin embryo. Curr. Opin. Genet. Dev. 19, 535–540. Davidson, E.H. (2010). Emerging properties of animal gene regulatory networks. Nature 468, 911–920.
Kaplan, N., Moore, I.K., Fondufe-Mittendorf, Y., Gossett, A.J., Tillo, D., Field, Y., LeProust, E.M., Hughes, T.R., Lieb, J.D., Widom, J., et al. (2009). The DNA-encoded nucleosome organization of a eukaryotic genome. Nature 458, 362–366.
Deal, R.B., Henikoff, J.G., and Henikoff, S. (2010). Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science 328, 1161–1164.
Kaplan, S., Bren, A., Dekel, E., and Alon, U. (2008). The incoherent feedforward loop can generate non-monotonic input functions for genes. Mol. Syst. Biol. 4, 203.
Eng, K.H., Kvitek, D.J., Keles, S., and Gasch, A.P. (2010). Transient genotype-by-environment interactions following environmental shock provide a source of expression variation for essential genes. Genetics 184, 587–593.
Karlebach, G., and Shamir, R. (2008). Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780.
FANTOM consortium, Suzuki, H., Forrest, A.R., van Nimwegen, E., Daub, C.O., Balwierz, P.J., Irvine, K.M., Lassmann, T., Ravasi, T., Hasegawa, Y., et al. (2009). The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nat. Genet. 41, 553–562. Foster, S.L., Hargreaves, D.C., and Medzhitov, R. (2007). Gene-specific control of inflammation by TLR-induced chromatin modifications. Nature 447, 972–978. Friedman, N., Vardi, S., Ronen, M., Alon, U., and Stavans, J. (2005). Precise temporal modulation in the response of the SOS DNA repair network in individual bacteria. PLoS Biol. 3, e238. Friedrichsen, S., Harper, C.V., Semprini, S., Wilding, M., Adamson, A.D., Spiller, D.G., Nelson, G., Mullins, J.J., White, M.R., and Davis, J.R. (2006). Tumor necrosis factor-alpha activates the human prolactin gene promoter via nuclear factor-kappaB signaling. Endocrinology 147, 773–781.
Kern, D., Collins, M., Fultz, T., Detmer, J., Hamren, S., Peterkin, J.J., Sheridan, P., Urdea, M., White, R., Yeghiazarian, T., et al. (1996). An enhanced-sensitivity branched-DNA assay for quantification of human immunodeficiency virus type 1 RNA in plasma. J. Clin. Microbiol. 34, 3196–3202. Kim, H.D., Shay, T., O’Shea, E.K., and Regev, A. (2009). Transcriptional regulatory circuits: predicting numbers from alphabets. Science 325, 429–432. Kim, J., Chu, J., Shen, X., Wang, J., and Orkin, S.H. (2008). An extended transcriptional network for pluripotency of embryonic stem cells. Cell 132, 1049–1061. Kittisopikul, M., and Suel, G.M. (2010). Biological role of noise encoded in a genetic network motif. Proc. Natl. Acad. Sci. USA 107, 13300–13305. Kultz, D. (2005). Molecular and evolutionary basis of the cellular stress response. Annu. Rev. Physiol. 67, 225–257. Lachmann, A., Xu, H., Krishnan, J., Berger, S.I., Mazloom, A.R., and Ma’ayan, A. (2010). ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 26, 2438–2444.
Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., and Brown, P.O. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257.
Lahav, G., Rosenfeld, N., Sigal, A., Geva-Zatorsky, N., Levine, A.J., Elowitz, M.B., and Alon, U. (2004). Dynamics of the p53-Mdm2 feedback loop in individual cells. Nat. Genet. 36, 147–150.
Gertz, J., Siggia, E.D., and Cohen, B.A. (2009). Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457, 215–218.
Lam, F.H., Steger, D.J., and O’Shea, E.K. (2008). Chromatin decouples promoter threshold from dynamic range. Nature 453, 246–250.
Geva-Zatorsky, N., Rosenfeld, N., Itzkovitz, S., Milo, R., Sigal, A., Dekel, E., Yarnitzky, T., Liron, Y., Polak, P., Lahav, G., et al. (2006). Oscillations and variability in the p53 system. Mol. Syst. Biol. 2, 0033.
Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Gerber, G.K., Hannett, N.M., Harbison, C.T., Thompson, C.M., Simon, I., et al. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804.
Gilchrist, M., Thorsson, V., Li, B., Rust, A.G., Korb, M., Roach, J.C., Kennedy, K., Hai, T., Bolouri, H., and Aderem, A. (2006). Systems biology approaches identify ATF3 as a negative regulator of Toll-like receptor 4. Nature 441, 173–178. Goentoro, L., Shoval, O., Kirschner, M.W., and Alon, U. (2009). The incoherent feedforward loop can provide fold-change detection in gene regulation. Mol. Cell 36, 894–899. Hager, G.L., McNally, J.G., and Misteli, T. (2009). Transcription dynamics. Mol. Cell 35, 741–753. Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N.J., Macisaac, K.D., Danford, T.W., Hannett, N.M., Tagne, J.B., Reynolds, D.B., Yoo, J., et al. (2004). Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104. Hooshangi, S., Thiberge, S., and Weiss, R. (2005). Ultrasensitivity and noise propagation in a synthetic transcriptional cascade. Proc. Natl. Acad. Sci. USA 102, 3581–3586. Hu, Z., Killion, P.J., and Iyer, V.R. (2007). Genetic reconstruction of a functional transcriptional regulatory network. Nat. Genet. 39, 683–687. Ihmels, J., Levy, R., and Barkai, N. (2004). Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat. Biotechnol. 22, 86–92.
Li, J., Liu, Z.J., Pan, Y.C., Liu, Q., Fu, X., Cooper, N.G., Li, Y., Qiu, M., and Shi, T. (2007). Regulatory module network of basic/helix-loop-helix transcription factors in mouse brain. Genome Biol. 8, R244. Litvak, V., Ramsey, S.A., Rust, A.G., Zak, D.E., Kennedy, K.A., Lampano, A.E., Nykter, M., Shmulevich, I., and Aderem, A. (2009). Function of C/EBPdelta in a regulatory circuit that discriminates between transient and persistent TLR4-induced signals. Nat. Immunol. 10, 437–443. Locke, J.C., and Elowitz, M.B. (2009). Using movies to analyse gene circuit dynamics in single cells. Nat. Rev. Microbiol. 7, 383–392. Loewer, A., Batchelor, E., Gaglia, G., and Lahav, G. (2010). Basal dynamics of p53 reveal transcriptionally attenuated pulses in cycling cells. Cell 142, 89–100. Lopez-Maury, L., Marguerat, S., and Bahler, J. (2008). Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nat. Rev. Genet. 9, 583–593. Ma, H.W., Kumar, B., Ditges, U., Gunzer, F., Buer, J., and Zeng, A.P. (2004). An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res. 32, 6643–6649.
Macarthur, B.D., Ma’ayan, A., and Lemischka, I.R. (2009). Systems biology of stem cell fate and cellular reprogramming. Nat. Rev. Mol. Cell Biol. 10, 672–681.
Macia, J., Widder, S., and Sole, R. (2009). Specialized or flexible feed-forward loop motifs: a question of topology. BMC Syst. Biol. 3, 84.
Mangan, S., Zaslaver, A., and Alon, U. (2003). The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J. Mol. Biol. 334, 197–204.
Mangan, S., Itzkovitz, S., Zaslaver, A., and Alon, U. (2006). The incoherent feed-forward loop accelerates the response-time of the gal system of Escherichia coli. J. Mol. Biol. 356, 1073–1081.
Mitchell, A., Romano, G.H., Groisman, B., Yona, A., Dekel, E., Kupiec, M., Dahan, O., and Pilpel, Y. (2009). Adaptive prediction of environmental changes by microorganisms. Nature 460, 220–224.
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628.
Murphy, L.O., Smith, S., Chen, R.H., Fingar, D.C., and Blenis, J. (2002). Molecular interpretation of ERK signal duration by immediate early gene products. Nat. Cell Biol. 4, 556–564.
Murray, J.I., Whitfield, M.L., Trinklein, N.D., Myers, R.M., Brown, P.O., and Botstein, D. (2004). Diverse and specific gene expression responses to stresses in cultured human cells. Mol. Biol. Cell 15, 2361–2374.
Muzzey, D., and van Oudenaarden, A. (2009). Quantitative time-lapse fluorescence microscopy in single cells. Annu. Rev. Cell Dev. Biol. 25, 301–327.
Nachman, I., Regev, A., and Ramanathan, S. (2007). Dissecting timing variability in yeast meiosis. Cell 131, 544–556.
Nelson, D.E., Ihekwaba, A.E., Elliott, M., Johnson, J.R., Gibney, C.A., Foreman, B.E., Nelson, G., See, V., Horton, C.A., Spiller, D.G., et al. (2004). Oscillations in NF-kappaB signaling control the dynamics of gene expression. Science 306, 704–708.
Oliveri, P., Tu, Q., and Davidson, E.H. (2008). Global regulatory logic for specification of an embryonic cell lineage. Proc. Natl. Acad. Sci. USA 105, 5955–5962.
Paszek, P., Ryan, S., Ashall, L., Sillitoe, K., Harper, C.V., Spiller, D.G., Rand, D.A., and White, M.R. (2010). Population robustness arising from cellular heterogeneity. Proc. Natl. Acad. Sci. USA 107, 11644–11649.
Patwardhan, R.P., Lee, C., Litvin, O., Young, D.L., Pe’er, D., and Shendure, J. (2009). High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat. Biotechnol. 27, 1173–1175.
Ramsey, S.A., Klemm, S.L., Zak, D.E., Kennedy, K.A., Thorsson, V., Li, B., Gilchrist, M., Gold, E.S., Johnson, C.D., Litvak, V., et al. (2008). Uncovering a macrophage transcriptional program by integrating evidence from motif scanning and expression dynamics. PLoS Comput. Biol. 4, e1000021.
Rappaport, N., Winter, S., and Barkai, N. (2005). The ups and downs of biological timers. Theor. Biol. Med. Model. 2, 22.
Raveh-Sadka, T., Levo, M., and Segal, E. (2009). Incorporating nucleosomes into thermodynamic models of transcription regulation. Genome Res. 19, 1480–1496.
Rosenfeld, N., Elowitz, M.B., and Alon, U. (2002). Negative autoregulation speeds the response times of transcription networks. J. Mol. Biol. 323, 785–793.
Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176.
Shalem, O., Dahan, O., Levo, M., Martinez, M.R., Furman, I., Segal, E., and Pilpel, Y. (2008). Transient transcriptional responses to stress are generated by opposing effects of mRNA production and degradation. Mol. Syst. Biol. 4, 223.
Shapira, S.D., Gat-Viks, I., Shum, B.O., Dricot, A., de Grace, M.M., Wu, L., Gupta, P.B., Hao, T., Silver, S.J., Root, D.E., et al. (2009). A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255–1267.
Shen-Orr, S.S., Milo, R., Mangan, S., and Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68.
Shivaswamy, S., Bhinge, A., Zhao, Y., Jones, S., Hirst, M., and Iyer, V.R. (2008). Dynamic remodeling of individual nucleosomes across a eukaryotic genome in response to transcriptional perturbation. PLoS Biol. 6, e65.
Simon, I., Siegfried, Z., Ernst, J., and Bar-Joseph, Z. (2005). Combined static and dynamic analysis for determining the quality of time-series expression profiles. Nat. Biotechnol. 23, 1503–1508.
Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297.
Springer, M., Wykoff, D.D., Miller, N., and O’Shea, E.K. (2003). Partially phosphorylated Pho4 activates transcription of a subset of phosphate-responsive genes. PLoS Biol. 1, E28.
Storey, J.D., Xiao, W., Leek, J.T., Tompkins, R.G., and Davis, R.W. (2005). Significance analysis of time course microarray experiments. Proc. Natl. Acad. Sci. USA 102, 12837–12842.
Szita, N., Polizzi, K., Jaccard, N., and Baganz, F. (2010). Microfluidic approaches for systems and synthetic biology. Curr. Opin. Biotechnol. 21, 517–523.
Tagkopoulos, I., Liu, Y.C., and Tavazoie, S. (2008). Predictive behavior within microbial genetic networks. Science 320, 1313–1317.
Tanay, A. (2006). Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res. 16, 962–972.
Tay, S., Hughey, J.J., Lee, T.K., Lipniacki, T., Quake, S.R., and Covert, M.W. (2010). Single-cell NF-kappaB dynamics reveal digital activation and analogue information processing. Nature 466, 267–271.
Tyson, J.J., Chen, K.C., and Novak, B. (2003). Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Curr. Opin. Cell Biol. 15, 221–231.
Walczak, A.M., Tkacik, G., and Bialek, W. (2010). Optimizing information flow in small genetic networks. II. Feed-forward interactions. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 81, 041905.
Wei, G., Wei, L., Zhu, J., Zang, C., Hu-Li, J., Yao, Z., Cui, K., Kanno, Y., Roh, T.Y., Watford, W.T., et al. (2009). Global mapping of H3K4me3 and H3K27me3 reveals specificity and plasticity in lineage fate determination of differentiating CD4+ T cells. Immunity 30, 155–167.
Whitehouse, I., Rando, O.J., Delrow, J., and Tsukiyama, T. (2007). Chromatin remodelling at promoters suppresses antisense transcription. Nature 450, 1031–1035.
Wilkinson, D.J. (2009). Stochastic modelling for quantitative description of heterogeneous biological systems. Nat. Rev. Genet. 10, 122–133.
Yuh, C.H., Bolouri, H., and Davidson, E.H. (2001). Cis-regulatory logic in the endo16 gene: switching from a specification to a differentiation mode of control. Development 128, 617–629.
Zaslaver, A., Mayo, A.E., Rosenberg, R., Bashkin, P., Sberro, H., Tsalyuk, M., Surette, M.G., and Alon, U. (2004). Just-in-time transcription program in metabolic pathways. Nat. Genet. 36, 486–491.
Zinzen, R.P., Girardot, C., Gagneur, J., Braun, M., and Furlong, E.E. (2009). Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65–70.
Leading Edge
Review
Signaling from the Living Plasma Membrane
Hernán E. Grecco,1 Malte Schmick,1 and Philippe I.H. Bastiaens1,2,*
1Max Planck Institute for Molecular Physiology, Department of Systemic Cell Biology, Otto-Hahn-Str. 11, D-44227 Dortmund, Germany
2Chemical Biology Department, Faculty of Chemistry, TU-Dortmund, D-44227 Dortmund, Germany
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.01.029
Our understanding of the plasma membrane, once viewed simply as a static barrier, has been revolutionized to encompass a complex, dynamic organelle that integrates the cell with its extracellular environment. Here, we discuss how bidirectional signaling across the plasma membrane is achieved by striking a delicate balance between restriction and propagation of information over different scales of time and space and how underlying dynamic mechanisms give rise to rich, context-dependent signaling responses. In this Review, we show how computer simulations can generate counterintuitive predictions about the spatial organization of these complex processes.

The Plasma Membrane: A Dynamic Barrier

Biological systems operate within a carefully tailored balance of opposing tendencies, favoring one or the other in response to internal and external cues. Such duality between robustness and adaptability or between exploration of possibilities and commitment to a decision, for example, permeates every level of organization. The function of the plasma membrane is an excellent example of this duality, as it defines the cell by isolating it from the extracellular environment while at the same time integrating the cell with its surroundings by transferring messenger molecules or initiating reaction cascades within it. Isolation versus communication is therefore the precarious balance that the plasma membrane must continuously maintain, separating the outside from the inside while presenting each a representation of the other. To the inside, the plasma membrane summarizes the cell’s “social” context while projecting the cell’s state to the outside. For this reason, plasma membrane function is fundamental not only to keep a single cell alive, but also to maintain its proper behavior in the organismal collective.
The plasma membrane is composed of a bilayer of lipids and incorporated proteins, whose interactions as an ensemble enable it to receive, remember, process, and relay information along and across it. These interactions form a signal transduction hierarchy of interconnected time- and lengthscales bridging more than three orders of magnitude, from nanometer-sized proteins to the micrometer scale of the cell. Within each level of the hierarchy, the lengthscales (how far information will spread) are coupled with the timescales (how fast information will spread) through underlying physical-chemical mechanisms such as free diffusion, reaction-diffusion, or active transport. In its most primitive state, the plasma membrane forms a spherical shell, 5 nm thick, and is permeable only to small nonpolar molecules such as oxygen and nitrogen. In a water-based environment, lipids shield their hydrophobic tails from the surrounding polar fluid, exposing their more hydrophilic heads. This arrangement minimizes the free energy of the
water-lipid system and therefore occurs spontaneously. This property of self-assembly provided a convenient evolutionary path for the generation of a relatively stable supramolecular structure that shields its contents from the dissipative effects of diffusion (Griffiths, 2007). However, the plasma membrane of the modern cell is not a static, self-assembled system but is continuously renewed to preserve its nonequilibrium state. For example, its lipid composition is dynamically maintained by a combination of lipid synthesis and chemical conversion, vesicular fusion and fission events that tie into intracellular transport and sorting processes (van Meer et al., 2008). The lipids, which were previously thought to serve only a structural function, are themselves subject to chemical modification and can thereby relay signals. The resulting axial and lateral asymmetry of the membrane can be rapidly modulated to allow for bidirectional information transfer. Lipids also provide a fluid matrix in which proteins reside and diffuse laterally (Zimmerberg and Gawrisch, 2006). These membrane proteins, which represent more than 50% of the cross-sectional area of the membrane in some cell types (Janmey and Kinnunen, 2006), provide the machinery for most of the plasma membrane’s dynamic properties. In addition to structural and sensory functions, they mediate matter exchange with the environment, enabling the membrane to actively and passively regulate transport of substances across it, even against a concentration gradient. This, for example, can generate ion gradients across the membrane that have important physiological functions such as water homeostasis and electrical excitability. Experimental work with model lipid membranes has nevertheless been one of the main sources of quantitative information about the physical properties of lipid bilayers (Janmey and Kinnunen, 2006). 
Biophysical parameters such as rigidity, tension, spontaneous curvature, and elastic moduli have thus been determined as a function of temperature, hydration, and lipid composition. Such model membranes clearly lack the dynamic
features of their real-life equivalents. For example, lipid composition is generally symmetrical in model membranes (Devaux and Morris, 2004), and phenomena such as membrane coupling to the cytoskeleton (Kwik et al., 2003) are difficult to reproduce in vitro.

Almost 40 years have passed since Singer and Nicolson wrote their seminal work detailing the fluid mosaic model of the plasma membrane (Singer and Nicolson, 1972). In this work, which elegantly integrated the experimental and theoretical knowledge of the time, the authors stated:

“Biological membranes play a crucial role in almost all cellular phenomena, yet our understanding of the molecular organization of membranes is still rudimentary. Experience has taught us, however, that in order to achieve a satisfactory understanding of how any biological system functions, the detailed molecular composition and structure of that system must be known.”

In spite of the enormous amount of knowledge about the structure and composition of the plasma membrane gathered in recent decades, our understanding of it can still be considered “rudimentary” in light of the complexity of its dynamics that have become apparent since then. A major challenge will therefore be to animate our rather static view of the plasma membrane by bringing our model membrane systems to life in the test tube. Here, we will discuss the impact that the dynamic, “living” membrane has on cellular information processing. From the extensive range of research available, we focus on examples that represent canonical mechanisms to constrain information within the cell, relying on the plasma membrane as a dynamically maintained supramolecular structure.

Signaling across a Dynamic Barrier: The Lateral Dimension

The bidirectional transduction of signals by the plasma membrane is modulated by its state, and the cell’s historical context therefore determines its response to incoming signals.
However, incoming signals also modify the state of the membrane, and in this way, the transducing medium becomes the message. How fast and to what extent this signal is propagated across the membrane must be tightly regulated to allow information to spread on a scale that is relevant to the biological process while preventing spurious responses. This requires a balance between responding to an actual signal and resisting spurious events induced by noise, a feat that is achieved by partitioning the plasma membrane into domains that span several time- and lengthscales, corresponding to the dimensions at which the biological processes operate. The largest partitions of the plasma membrane occur at a micrometer scale. For example, the partitioning of epithelial cells into apical and basolateral domains generates a cellular polarity that enables transcytotic vectorial transport between two distinct extracellular environments (Mellman and Nelson, 2008; Rodriguez-Boulan et al., 2005; van der Wouden et al., 2003). This polarity is established and maintained from yeast to mammals by the PAR (partitioning defective)/protein kinase C system that amplifies a RHO family G protein (CDC42)-mediated polarity cue (Suzuki and Ohno, 2006). The resulting apical and
basolateral domains maintain their different lipid and protein compositions dynamically through the life cycle of the cell (Muth and Caplan, 2003; Simons and van Meer, 1988). The living plasma membrane, together with PAR proteins, forms a self-referencing system that establishes polarity by mechanochemically restructuring the cell. Though such a coarse-grained partitioning provides the cell with a stable polarized structure on a timescale of days, it does not supply a sufficiently rapid response mechanism that is appropriate for localized cues such as those that occur during chemotaxis, for which a short-term, fine-grained spatial memory is needed. In model membranes, protein diffusion is fast enough (D ≈ 5 µm²/s) to equilibrate across microns within seconds (Ramadurai et al., 2010). Under these conditions, localized signals such as activated receptors would redistribute across an area equivalent to the cell surface in a few minutes. In such homogeneous membrane systems, the diffusion coefficient scales logarithmically with the inverse of the diffusant radius (Saffman and Delbrück, 1975). This implies that slowing down diffusion through oligomerization of receptors is not enough to constrain mobility. For example, an oligomer of a hundred monomers will diffuse at only half the speed of a monomer. Therefore, diffusion of proteins must be contained in order to maintain spatial memory with micrometer precision over a timescale of minutes. Single-molecule experiments have shown that the plasma membrane is partitioned into 50–300 nm wide domains by the combined action of actin-based membrane skeleton “fences” and anchored-transmembrane protein “pickets” (Kusumi and Sako, 1996). Within these membrane domains, proteins and lipids are highly mobile, with nanoscopic diffusion coefficients in the order of those measured for model membranes (Kusumi et al., 2005).
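The orders of magnitude quoted above can be checked with a short calculation. The sketch below is illustrative only: it uses the two-dimensional relation ⟨r²⟩ = 4Dt for free diffusion and the Saffman-Delbrück expression for the diffusion coefficient of a membrane inclusion. The membrane viscosity, water viscosity, temperature, monomer radius, and the assumption that a 100-mer has ten times the monomer radius (disk-like packing) are all assumed values, not parameters given in the text; only the 5 nm bilayer thickness and D ≈ 5 µm²/s are taken from it.

```python
import math

EULER_GAMMA = 0.5772156649
K_BOLTZMANN = 1.380649e-23  # J/K


def diffusion_time(length_um: float, d_um2_per_s: float) -> float:
    """Time (s) for the 2D mean squared displacement 4*D*t to reach length**2."""
    return length_um ** 2 / (4.0 * d_um2_per_s)


def saffman_delbruck(radius_m: float,
                     mu_membrane: float = 0.1,   # Pa*s, assumed
                     mu_water: float = 1e-3,     # Pa*s, assumed
                     thickness_m: float = 5e-9,  # bilayer thickness from the text
                     temperature_k: float = 298.0) -> float:
    """Saffman-Delbrueck diffusion coefficient (m^2/s) of a membrane inclusion.

    Valid only when radius_m is much smaller than mu_membrane*thickness_m/mu_water.
    """
    sd_length = mu_membrane * thickness_m / mu_water
    prefactor = K_BOLTZMANN * temperature_k / (4.0 * math.pi * mu_membrane * thickness_m)
    return prefactor * (math.log(sd_length / radius_m) - EULER_GAMMA)


# Spreading times at D = 5 um^2/s over a domain, a few microns, and a cell-sized distance.
for length_um in (0.05, 5.0, 50.0):
    print(f"{length_um:6.2f} um -> {diffusion_time(length_um, 5.0):.3g} s")

# Monomer (assumed 2 nm radius) versus a 100-mer (assumed 20 nm radius).
d_monomer = saffman_delbruck(2e-9)
d_oligomer = saffman_delbruck(20e-9)
print(f"D(oligomer)/D(monomer) = {d_oligomer / d_monomer:.2f}")
```

With these assumed values the ratio comes out close to one half, which is the point made in the text: because the dependence on radius is only logarithmic, a hundredfold increase in mass barely slows free lateral diffusion.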
The hopping of signaling proteins across the fences occurs with low probability and thereby becomes the rate-limiting factor in lateral information transfer. In contrast to diffusion, the hopping rate is strongly dependent on the size of proteins, and therefore ligand-induced oligomerization of activated receptors traps the signal within these domains (Nelson et al., 1999). Such oligomerization-induced trapping thus provides a mechanism for the maintenance of spatial memory. Conversely, the confinement of monomers due to their low hopping rate facilitates oligomerization within domains. These domains can be considered as well-mixed protein reaction vessels because the time needed to diffuse through them is two orders of magnitude smaller (150 µs) than the residence time within them (15 ms) (Kusumi et al., 2005). Compartmentalization therefore increases the rate of interaction between receptors. This has important implications for proteins such as the epidermal growth factor receptor (EGFR), which can oligomerize even in the absence of ligand. In cells expressing moderate numbers of EGFR molecules such as BAF/3 or COS7 (5 × 10⁴ receptors/cell), the number of receptors per membrane domain, and therefore the degree of receptor clustering, will be low (2–3 receptors per cluster, consistent with the observations of Clayton et al. [2005]). However, in cancer cell lines that express abnormally high levels of EGFR such as A431 (2 × 10⁶ receptors/cell), the average size of transient clusters will be much higher (10–15 receptors per cluster, consistent with the observation of Zidovetzki et al. [1981]) (Figure 1A). These preformed clusters
Figure 1. Mechanisms Regulating Lateral Signaling across the Plasma Membrane (A) The reactivity of the plasma membrane is modulated by spatial constraints. Schematic depiction of receptor tyrosine kinase density in cytoskeleton-mediated membrane domains (top) and corresponding distributions (bottom) for different amounts of receptor per cell (N). (B) A bistable system generated by a receptor tyrosine kinase that inhibits its own inhibitory protein tyrosine phosphatase. Two-dimensional time evolution of membrane-bound receptor activation after an initial local point activation shows spreading of the signal. (C) A Turing system generated by a receptor tyrosine kinase that activates its own inhibitory protein tyrosine phosphatase. Two-dimensional time evolution after global stimulation shows the generation of kinase activity hot spots in the plasma membrane.
have a profound impact on the propagation of receptor signals, as they spatially modulate the basal activity and reactivity of the plasma membrane. On a mechanistic level, the transmission of signals by receptor tyrosine kinases (RTKs) such as EGFR is relatively well understood. Binding of the cognate ligand to receptors promotes their dimerization and thereby enables their phosphorylation in trans via their intrinsic tyrosine kinase activity (Lemmon and Schlessinger, 2010). The resulting phosphorylated tyrosine residues, exposed to the cytoplasm, act as docking sites for proteins
that contain specialized domains, such as SH2 or PTB domains (Lim and Pawson, 2010). Their recruitment induces allosteric changes in enzymatic activity or binding affinity on another module of the docked molecule, conveying signals deeper into the cytoplasm (Deribe et al., 2010). However, though these sequences of reaction events provide insight into how signals are transferred across the membrane into the cell, they do not provide information on how these signals are regulated in space and time. To achieve this, we must consider the collective behavior of the ensemble of signaling molecules in the plasma membrane that have an influence on receptor phosphorylation state. Even at a low hopping rate of clustered receptors, the basal kinase activity of RTKs will eventually result in their full phosphorylation in the absence of a countering phosphatase activity (Reynolds et al., 2003). The degree of receptor phosphorylation within the plasma membrane is therefore determined by a continuous cycle of phosphorylation and dephosphorylation. Growth factor binding increases the amount of phosphorylated receptors by shifting the kinase-phosphatase balance in favor of the kinase. Though membrane-tethered and cytosolic proteins that are activated by receptors but not confined to the domains might propagate signals, their rather slow microscopic diffusion is incompatible with the timescales of minutes observed for such phenomena (Tischer and Bastiaens, 2003). Small molecule second messengers, such as calcium or reactive oxygen species (ROS), like hydrogen peroxide, have much larger diffusion coefficients, thereby propagating information via diffusion more quickly. Previously seen as a reaction by-product that causes oxidative stress, hydrogen peroxide has gained increasing interest as a mediator in signaling (Rhee, 2006). Hydrogen peroxide is produced from the dismutation of superoxide generated by enzyme systems such as NADPH oxidase (NOX) (Brown and Griendling, 2009).
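The kinase-phosphatase cycle described above can be made concrete with a minimal first-order model. This is a sketch under stated assumptions, not a model taken from the cited work: $p$ denotes the phosphorylated fraction of receptors, and $k_K$ and $k_P$ are hypothetical effective rate constants for the kinase and phosphatase activities.

```latex
\begin{align}
  \frac{dp}{dt} &= k_K\,(1 - p) - k_P\,p, \\
  p^{*} &= \frac{k_K}{k_K + k_P}, \qquad
  p(t) = p^{*} + \bigl(p(0) - p^{*}\bigr)\,e^{-(k_K + k_P)\,t}.
\end{align}
```

Setting $k_P = 0$ gives $p^{*} = 1$ for any $k_K > 0$, which corresponds to the statement that even a small basal kinase activity eventually yields full phosphorylation when no phosphatase opposes it. In this picture, growth factor binding raises $k_K$ and thereby shifts the steady state $p^{*}$ toward the phosphorylated form.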
Seven NOX catalytic subunits have been identified (NOX1, NOX2, NOX3, NOX4, NOX5, DUOX1, and DUOX2) that generate superoxide by transferring an electron from NADPH to molecular oxygen. The best-characterized NADPH oxidase, phagocytic NOX2, is a multisubunit enzyme complex with both transmembrane and cytosolic components. Upon stimulation, the cytosolic subunits are translocated to the membrane to bind the membrane-associated components, leading to activation of the NADPH oxidase complex. This activation process is triggered by growth factor receptor activation through the phosphorylation of two cytoplasmic subunits, P47PHOX and P67PHOX, and the conversion of GDP-bound RAC1 into its GTP-bound form through the activation of a RAC guanine nucleotide exchange factor (GEF) (Finkel, 2006). RAC GEFs such as βPIX are recruited via their pleckstrin homology domain, and the resulting increase in RAC activity is presumed to stimulate NOX directly (Finkel, 2006). Importantly, NOX enzymes produce superoxide on the outer leaflet of the plasma membrane, after which it dismutates to hydrogen peroxide and diffuses back into the cell (Rhee, 2006). Hydrogen peroxide has been shown to inactivate PTPs such as PTP1B by reversible oxidation of a reactive cysteine in the catalytic cleft (Janssen-Heininger et al., 2008). The hydrogen peroxide-mediated coupling of RTK activation with PTP inhibition (Lee et al., 1998) therefore exemplifies a double-negative
feedback loop, which together with the autocatalytic kinase activity of the receptor, results in a bistable system (Reynolds et al., 2003; Tischer and Bastiaens, 2003). This reaction network effectively operates in a nanoenvironment that is local to the activated receptor due to the short half-life of intracellular ROS, which are the target of very efficient antioxidant enzymes such as peroxiredoxin I (PRXI) (Woo et al., 2010). Although spatially constrained, the presence of ROS can still lower the excitation threshold of neighboring, inactive receptors. Receptor density then becomes the key to triggering a domino-like rapid propagation of activity at long range, whereby the RTK/PTP/H2O2 system acts as an excitable medium (Figure 1B) (Reynolds et al., 2003). This global activation initiated by a local source is only possible due to the tight coupling between reaction components that have opposing activities. Insight into the nature of such coupling is required to predict the spatial outcome of reaction-diffusion systems. Consider the case of activated, phosphorylated RTKs that can activate their own inhibitors such as the PTP SHP1. Here, phosphotyrosines on the activated RTK bind SHP1 via its SH2 domain, which allosterically activates the phosphatase. Phosphorylation of SHP1 by the RTK then locks it into the active state, irrespective of binding to the RTK (Frank et al., 2004; Uchida et al., 1994). In spite of the high similarity with the excitable media network structure discussed above, the spatial outcome is exactly the opposite, as it focuses global activation into local hot spots (Figure 1C). The theoretical basis for the emergence of such large-scale patterns from random local fluctuations was proposed by Turing (Turing, 1952).
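The bistability described above can be illustrated with a deliberately minimal, dimensionless sketch. It is not the model of Reynolds et al. (2003): the function names, the rate constants (`k_basal`, `k_auto`, `k_ptp`, `K_ox`), and the quasi-steady-state treatment of hydrogen peroxide and PTP oxidation are assumptions chosen only to show how autocatalytic kinase activity combined with phosphatase inhibition can yield two stable phosphorylation states separated by a threshold.

```python
def dp_dt(p, k_basal=0.01, k_auto=1.0, k_ptp=5.0, K_ox=0.1):
    """Net phosphorylation rate for the phosphorylated receptor fraction p (0..1).

    All parameters are illustrative assumptions, not literature values.
    """
    # Basal plus autocatalytic (in trans) kinase activity acting on
    # the unphosphorylated receptor fraction.
    kinase = (k_basal + k_auto * p) * (1.0 - p)
    # Assumed quasi-steady state: H2O2 scales with p and oxidizes (inactivates)
    # PTPs, so the active PTP fraction falls as p rises (double-negative feedback).
    active_ptp = 1.0 / (1.0 + (p / K_ox) ** 2)
    phosphatase = k_ptp * active_ptp * p
    return kinase - phosphatase


def steady_state(p0, dt=0.01, t_end=100.0):
    """Integrate dp/dt with forward Euler from p0 and return the final value."""
    p = p0
    for _ in range(int(t_end / dt)):
        p += dt * dp_dt(p)
    return p


if __name__ == "__main__":
    for p0 in (0.05, 0.4):
        print(f"start {p0:.2f} -> settles near {steady_state(p0):.3f}")
```

With these assumed values, a starting fraction below the threshold (which lies between 0.2 and 0.3) relaxes to a low-phosphorylation state, whereas a start above it switches to a high-phosphorylation state. Lowering `k_ptp` or raising `k_auto` in this sketch lowers the threshold, which is the toy analog of a reduced excitation threshold.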
In addition to cell-wide and submicroscopic domains with lifetimes ranging from days to seconds, even smaller (nanometer scale), shorter-lived (subsecond) domains have been proposed to transiently confine membrane proteins (Simons and Ikonen, 1997). These high-viscosity patches composed of cholesterol and glycosphingolipid are known as lipid rafts and have been shown to have an important role as labile platforms to which signaling components are recruited, favoring their interaction (Harding and Hancock, 2008). For more information about this extensive topic, which goes beyond the scope of this Review, we refer the reader to some excellent recent reviews (Lingwood and Simons, 2010).
Signaling across a Dynamic Barrier: The Axial Dimension
Axial signal propagation into the cytoplasm by phosphorylation of soluble substrates, like lateral signal propagation, is tightly controlled by reaction-diffusion systems that generate a local environment of activated substrates. For example, transfer of growth factor signals from RTKs in the plasma membrane to soluble substrates in the cytoplasm also depends on cyclic reaction-diffusion systems of opposing tyrosine kinase/phosphatase activities. However, the catalytic activity of fully active PTPs is up to three orders of magnitude higher than that of tyrosine kinases (Fischer et al., 1991), which would preclude the effective transfer of growth factor signals via phosphorylation in the cytoplasm. On the other hand, the absence of PTP activity near the plasma membrane would allow spurious signals to be transmitted in the cell. The solution to this dilemma is the
membrane-proximal, partial inactivation of PTPs by oxidation of the catalytic cysteine with hydrogen peroxide that is produced by NOX as outlined above. The reducing activity of the cytoplasm (sink) together with the source of hydrogen peroxide production at the plasma membrane generates a hydrogen peroxide gradient in the cytoplasm in which PTP activity is strongly reduced near the membrane. Thus, signal penetration via tyrosine phosphorylation is ultimately a self-referencing system in which tyrosine phosphorylation depends on the magnitude of the hydrogen peroxide gradient, which in turn depends on the balance between RTK and PTP activities. The extent of feedback in this system became even more apparent with the recent identification of PRXI as a major reducing agent that controls hydrogen peroxide levels in the cytoplasm (Woo et al., 2010). Importantly, its activity is inhibited by phosphorylation mediated by membrane-bound Src on tyrosine 194, thereby generating a local positive feedback loop around activated RTKs. We might therefore expect that the resulting, more extended downregulation of PTP activity in the cytoplasm allows more efficient penetration of signals via soluble phosphorylated substrates of the RTKs into the cytoplasm. In order to verify this intuition, we performed cellular automata simulations (Markus et al., 1999) of this reaction diffusion system using realistic parameters (Figure 2). In this simulation, we tracked the spatial and temporal evolution of the reaction of the RTK network described above following a ligand-binding event. The outcome of the simulation showed that coupling of PRXI activity to RTK activity has only a marginal effect on signal penetration in the cytoplasm. The surprise was that the major effect of this coupling was on the excitability of the receptor system in the membrane, in that the excitation threshold is lowered. 
This demonstrates one of the underestimated values of simulations, namely to guide our sometimes faulty intuitions about dynamic processes.
Figure 2. H2O2-Dependent Regulation of Signal Penetration in the Cytoplasm
(A) Schematic representation of a double-negative kinase phosphatase feedback system that regulates substrate phosphorylation. Receptor tyrosine kinases (RTKs), their phosphatases (PTPs), and their substrates (S), such as PRXI, are present in two states. Conversion between states is denoted by curved arrows; mediation of conversion, straight arrows. Phosphorylated species are denoted by subscript p; active species, subscript a; inactive, subscript i. In case of PRXI, the phosphorylated state is the inactive state. Phosphorylation of the receptor mediates extracellular H2O2 production via recruited NOX, and active PRXI reduces H2O2.
(B) Computer simulation of diffusion and reaction of substances as detailed in (A) with a cellular automaton approach. Simulations were performed in the presence (left) or absence (right) of PRXI regulation. Lateral and axial concentration profile 50 ms (first row) and 170 ms (second row) after ligand binding to a receptor tyrosine kinase at the membrane, separating the cytosol from the extracellular domain (black). (Third row) Lateral membrane receptor phosphorylation levels. Coloring indicates the propagation in time of the high levels of phosphorylated receptor across the membrane. The penetration of phosphorylated substrate Sp is only marginally affected when PRXI is regulated. However, the lateral speed of membrane signal propagation is increased due to increased receptor reactivity. Parameters were chosen in accordance with the literature, as available, and to preserve bistability of activation of RTK in the membrane (Reynolds et al., 2003). Specifically: D_receptor = 0.2 μm²/s; D_substrate = 10 μm²/s; D_ROS = 100 μm²/s; k_PTP = 25 s⁻¹; k_RTK = 1 s⁻¹; k_NOX = 10 s⁻¹; k_PRXIa = 100 s⁻¹; k_ROS = 5 s⁻¹.
(C) Axial concentration profile of Sp, H2O2, PTPp, and PRXIp. Blue curves depict regulation of PRXI by RTK/PTP; green curves are in absence of PRXI regulation, as in (B); red curves are in absence of PTP inactivation via H2O2 to demonstrate the high impact of PTP regulation on the extent of substrate phosphorylation.
Signaling is often perceived as the linear transfer of information from the plasma membrane to the nucleus, where it regulates the global state of the cell by modulating gene expression. However, signaling from the plasma membrane can also generate actively maintained local cytoplasmic states that can act as morphogenetic cues through their effect on the cytoskeleton and membranes (Dehmelt and Bastiaens, 2010). How can these local cytoplasmic states be generated? From a biochemical perspective, transfer of information in the cytoplasm mostly occurs via reversible enzymatic posttranslational modification or induced conformational changes of soluble proteins (Deribe et al., 2010). Both types of changes are generally counteracted by spontaneous or enzymatic reversion to the original state while diffusion spreads the “activity” from its source. This can lead to the formation of activity gradients if the source of information transfer is localized to a supramolecular structure that is surrounded by a sink where the reverse reaction occurs. For example, as discussed above, a plasma membrane-bound RTK as a source, together with a soluble cytosolic PTP as a sink, can form a cyclic reaction by acting on a phosphorylatable, cytosolic substrate. This system induces a local “cytoplasmic state” (Niethammer et al., 2007) by generating a membrane-proximal gradient of phosphorylated substrate that extends into the cytoplasm. Such a phosphorylation gradient emanating from the plasma membrane was, for example, observed for the microtubule regulator stathmin/OP18, locally switching off its microtubule destabilizing activity and therefore enhancing microtubule density in the lamellipodia of migrating cells (Niethammer et al., 2004). The duality of containing and propagating signals from the plasma membrane also becomes apparent in the gradient of mitogen-activated protein kinase (MAPK) activity emanating from the shmoo: a mating projection that occurs in response to pheromone in budding yeast cells (Maeder et al., 2007). Here, the MAPK kinase (STE7), which phosphorylates MAPK (called FUS3 in budding yeast), is localized to the plasma membrane via interaction with the scaffold protein STE5. The sink in the cytoplasm is provided by the homogeneously distributed phosphatases PTP3 and MSG5. In order for this system to generate a gradient, phosphorylation of FUS3 must decrease its affinity for STE5 (van Drogen et al., 2001).
This scaffold is itself recruited to the membrane by interaction with the liberated βγ subunits (STE4/STE18) of a trimeric G protein after activation of the G protein-coupled pheromone receptor STE2/3. Local plasma membrane composition also regulates STE5 localization in that phosphatidylinositol 4,5-bisphosphate is required in the shmoo membrane for its targeting (Garrenton et al., 2010). The ensuing local cytoplasmic state that contains active FUS3 proximal to the plasma membrane has been suggested to maintain the structure of the shmoo by local phosphorylation of the actin cable-regulating formin, BNI1 (Matheos et al., 2004). However, the dimension of the FUS3 gradient is extensive enough to reach the nucleus of the small yeast cell, where active import causes an enrichment of phosphorylated, active FUS3 (Maeder et al., 2007). This gradient/nuclear import system thereby generates two functional compartments with high levels of active FUS3. In the shmoo, active FUS3 maintains the structure of the cytoskeleton, whereas in the nucleus, it affects gene expression. To achieve this kind of dual purpose signaling in much larger mammalian cells, the distance from the plasma membrane to the nucleus must be bridged while retaining membrane-proximal concentrations of phosphorylated substrates. Longer distances can be covered by a cascade of coupled reaction cycles, which generate secondary shallower gradients from primary steep ones (Stelling and Kholodenko, 2009). The extent of this secondary gradient system resolves the problem of signal penetration but fails to provide an effective means of independently generating both global (transcriptional) and local (morphogenetic) signals.
The latter can be achieved by scaffold proteins recruiting source activities directly to the plasma membrane, thereby locally constraining signaling. For example, by associating the source kinase MEK with its substrate ERK, the scaffold KSR1 maintains a primary gradient of double-phosphorylated ERK proximal to the plasma membrane, just as STE5 does in yeast. How then is the duality between local and global signaling resolved? By taking into account that scaffolds usually have a lower concentration in the cell than their clients (Lee et al., 2003; Maeder et al., 2007), it becomes clear that scaffold-substrate coupling does not represent the totality of signaling activity. Instead, it is part of a functional module that uses the soluble building blocks of the canonical signaling pathway. The localized functionality of this signaling subset is therefore distinct from the canonical function of the soluble MAPK-signaling pathway. Which functions are then associated with the soluble part of the pathway, and which are associated with the scaffold? One possibility is that proliferative growth factor signals are transmitted to the nucleus via a short-lived, secondary gradient in the “soluble” MAPK module. Negative feedback loops observed in the MAPK module could only occur in the soluble cytoplasmic state and might account for the pulse-like MAPK response typically observed after proliferative growth factor signals such as EGF (Marshall, 1995). One such well-known negative feedback loop is the inhibitory phosphorylation of SOS by ERK (Buday et al., 1995). The resulting pulse of MAPK activity is sensed in the nucleus by mechanisms such as incoherent feed-forward loops that operate as fold change detectors, thereby translating relative activity changes into modified patterns of gene expression (Goentoro et al., 2009).
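Fold-change detection by an incoherent feed-forward loop can be sketched with a two-variable toy model. The equations below are an assumed minimal form (an input X drives both an output Z and a repressor Y, and Z production depends on the ratio X/Y); the function name, unit rate constants, and step inputs are illustrative and are not parameters taken from the cited work.

```python
def simulate_iffl(x_before, x_after, dt=0.01, t_end=20.0):
    """Response of output Z to a step in input X, integrated with forward Euler.

    Assumed toy dynamics:
        dY/dt = X - Y        (repressor slowly tracks the input)
        dZ/dt = X / Y - Z    (output senses the input relative to the repressor)
    """
    y = x_before  # pre-step steady state: Y = X
    z = 1.0       # pre-step steady state: Z = X / Y = 1
    trace = [z]
    for _ in range(int(t_end / dt)):
        dy = x_after - y
        dz = x_after / y - z
        y += dt * dy
        z += dt * dz
        trace.append(z)
    return trace


if __name__ == "__main__":
    # Two steps with the same fold change (2x) but different absolute levels.
    for start, end in ((1.0, 2.0), (2.0, 4.0)):
        trace = simulate_iffl(start, end)
        print(f"X: {start} -> {end}: peak Z = {max(trace):.3f}, final Z = {trace[-1]:.3f}")
```

In this sketch, both steps produce the same transient pulse in Z, which then adapts back to its baseline. The output therefore reports the relative change of the input rather than its absolute level.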
When MAPK signaling is restrained by a scaffold at the plasma membrane, this negative feedback might be absent or changed to positive feedback to allow for prolonged but membrane-proximal signaling via ERK gradients. Such persistent cytoplasmic signaling might affect the local state of the cytoskeleton to induce and maintain specific morphologies. In agreement with this notion, it was found that proliferative EGF stimulation leads to transient ERK activity in both the nucleus and the cytoplasm of MCF7 cells, whereas differentiating Heregulin stimulation leads to sustained ERK activity in the cytoplasm and transient activity in the nucleus (Nakakuki et al., 2010).
Intracellular Membranes: The Extended Axial Dimension
Based on the previous section, one could argue that cytoplasmic signals that affect the morphology of the cell via the cytoskeleton are mostly maintained in the proximity of the plasma membrane by gradient-generating mechanisms, whereas nuclear signals that propagate through the cytoplasm and affect gene expression are based on a temporal code. The endocytic system can effectively extend the axial reach of plasma membrane signaling deep into the cytoplasm by transferring the source of gradient-generating systems to vesicles within the cytoplasm (Birtwistle and Kholodenko, 2009). This allows signals to penetrate deeper into the cytoplasm without necessarily affecting gene expression. The lateral and axial propagation of RTK signals emanating from the plasma membrane also needs to be acutely terminated by removing and deactivating the RTK source of the signal in
order to avoid uncontrolled signal spread and to resensitize the membrane for extracellular signals. Endocytosis of activated receptors is one of the mechanisms for such a task (Wiley and Burke, 2001). Though a comprehensive description of the endocytic machinery is outside the scope of this Review, we would like to discuss two functionalities of this system that are relevant to our discussion of the regulation of signaling by the living plasma membrane: the endosomal pathway that leads to lysosomal receptor degradation and endosomal-plasma membrane recycling. Depending on their type, activation state, and cellular context (Le Roy and Wrana, 2005), activated growth factor receptors are trapped in clathrin-coated pits, caveolae, or both. For example, EGFR internalization occurs mostly through clathrin-mediated endocytosis activated by the ubiquitin ligase CBL within minutes of ligand stimulation. A fraction of the activated RTKs (e.g., 30% in case of EGFR) is targeted for degradation using the RAB7-dependent degradative route from late endosomes through multivesicular bodies (MVB) to lysosomes. The SNARE system (van den Bogaart et al., 2010) is important in this route for the fusion of vesicular systems and the ubiquitin-regulated ESCRT complexes to generate vesicles inside the MVB. Receptor recycling occurs through the slow (t1/2 ≈ 20 min through RAB8 and RAB11) and fast (t1/2 ≈ 5 min through RAB4) recycling pathways for clathrin-mediated endocytosis (Sorkin et al., 1991). The G protein ARF6 is responsible for recycling of nonclathrin-mediated endocytosed receptors. For more detail on the workings of the endocytic machinery, we refer the reader to some recent excellent reviews (Miaczynska et al., 2004; Scita and Di Fiore, 2010; Sorkin and von Zastrow, 2009) and the references therein.
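As a rough worked example, the recycling half-times quoted above can be converted into first-order rate constants. Treating each pathway as a single exponential process is an assumption made here for illustration only, and the function names are hypothetical.

```python
import math


def rate_constant(half_time_min):
    """First-order rate constant (per minute) for a given half-time in minutes."""
    return math.log(2) / half_time_min


def fraction_not_yet_recycled(t_min, half_time_min):
    """Fraction of an internalized receptor pool still inside the cell after
    t_min minutes, assuming single-exponential recycling."""
    return 0.5 ** (t_min / half_time_min)


if __name__ == "__main__":
    # Half-times as quoted in the text for the fast and slow recycling routes.
    for name, t_half in (("fast (RAB4)", 5.0), ("slow (RAB8/RAB11)", 20.0)):
        k = rate_constant(t_half)
        left = fraction_not_yet_recycled(10.0, t_half)
        print(f"{name}: k = {k:.3f} per min, {left:.0%} remaining after 10 min")
```

Under this assumption, the fourfold difference in half-time translates directly into a fourfold difference in rate constant, so the fast route returns most of its cargo to the surface while the slow route has returned only a minority.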
Recycling of activated receptors through the endocytic machinery constitutes a simple mechanism to reset the state of the plasma membrane in order to allow further stimuli after a refractory period. The endocytic system effectively regulates the response properties of the plasma membrane as a signaling entity by controlling the availability of receptors at the cell surface. As we have already discussed, the concentration of receptors in the plasma membrane can change the qualitative behavior of the signaling response, as shown for COS7 cells in which lateral propagation of EGF local activation was only possible upon EGFR overexpression. However, in this work, it was also shown that the same result could be achieved by blocking endocytosis, which augments the poststimulus concentration of receptors in the plasma membrane (Sawano et al., 2002). In the case of the PDGFβ receptor, which is usually not recycled but degraded, recycling can be induced by the loss of T cell PTP activity. This results in an enrichment of the phosphorylated receptor and an increase in PLCγ activity at the plasma membrane that stimulates cell migration (Karlsson et al., 2006). Both examples show that receptor recycling can change the response of the cell by altering the qualitative behavior of the ensemble of receptors at the plasma membrane in response to a stimulus. However, continuous endosomal cycling may also play a role in maintaining and propagating a signal. For example, transforming growth factor (TGF)-β stimulation activates the SMAD system, which is shuttled from cytosol to nucleus and back (Inman et al., 2002). It was shown that, by cycling, the signal transducers continuously monitor receptor activity. In a similar
fashion, endosomes act as a constitutive sensing mechanism that could relay information between the plasma membrane and cytosol. Another key aspect of the endocytic system is that it generates a mobile local cytoplasmic state by moving the signaling source through the inactivating cytoplasm. The concept of signaling endosomes as entities in which selective and regulated RTK signal transduction occurs was first described in the mid-1990s (Baass et al., 1995; Grimes et al., 1996). Functional microscopy experiments provided further evidence that cytosolic signaling proteins are recruited by activated receptors in endosomes, strengthening the idea that endosomes are not just a path to degradation and recycling (Sorkin et al., 2000; Wouters and Bastiaens, 1999). The endocytic machinery, positioned both temporally and physically between the plasma membrane and the lysosomal compartment, thus provides a mechanism not only for signal downregulation at the plasma membrane, but also for signal propagation in the cytosol. Moreover, endosomes provide a reaction platform on which protein assemblies can generate new functionality, as has been shown for APPL. These proteins are multifunctional adaptors and effectors of RAB5, which localize to a subpopulation of early endosomes but are also capable of nucleocytoplasmic shuttling. They have as many cytoplasmic targets as they have nuclear targets (Rashid et al., 2009; Schenck et al., 2008). APPL-harboring endosomes are therefore an intermediate in signaling between the plasma membrane and the nucleus (Rashid et al., 2009). Endosomes also provide a more long-range transport signal, as shown for TRKA in neurons (Cosker et al., 2008; Ehlers et al., 1995). Motor proteins such as dyneins can transport endosomes along microtubule tracts, allowing for rapid signal propagation into the cytoplasm. Kinesins can effectively transport vesicles back to the plasma membrane. 
For example, NGF-activated TRKA-containing endosomes contribute to retrograde transport of survival signals from the tip of the axon to the soma of the neuron. This type of signal transport is of special relevance to cells that have extended cytoplasmic structures like neurons, where growth cone signals have to travel over millimeter lengths in order to reach the cell soma. This would take days if mediated by passive diffusion alone (Howe, 2005). The geometrical properties of endosomes, such as their reduced size and closed surface, also facilitate signal amplification. The elevated ligand concentration in the endosomal lumen makes full activation of contained receptors possible. In a similar way to membrane domains in the plasma membrane that are contained by the cytoskeleton, their reduced size increases the chance of amplifying growth factor signals in encounter-driven activation processes. In such processes, a higher chance of second encounters compensates for the fact that not all encounters lead to activation. This effect is especially significant in receptors that must undergo multiple activation steps to attain full kinase activity (Lemmon and Schlessinger, 2010) and leads to faster activation of these receptor populations (Figure 3). Examples of such RTKs are the insulin and IGF1 receptors.
Figure 3. Constraining Receptor Diffusion Accelerates Activation
(A) Example of Brownian motion simulations for a receptor with a single (top) or double (bottom) step activation mechanism. A plot shows the time evolution (x axis) for 20 receptors (y axis) for a particular realization of the simulation in which a single receptor is activated at time 0. The receptor state is color coded. Three key frames depicting the receptor’s trajectory for highlighted particles are shown.
(B) Average (solid line) evolution of the fraction of receptor in each state for 100 realizations of the simulation. The standard error of the mean is indicated (dashed lines). t1 and t2 indicate the times at which the activated population reaches 50%.
(C) The simulation was repeated for different domain sizes and numbers of particles. The difference between t2 and t1 is reduced as the size of the domain is reduced.
Endosomes can also be an effective way to deliver hydrogen peroxide gradients deep inside of the cell to enhance local signaling in the cytoplasm around them. Here, a gradient that is generated proximal to the plasma membrane is transported with the endosome (Oakley et al., 2009). For example, RAC-mediated NOX association with IL-1 and subsequent internalization in endosomes result in superoxide production and conversion to hydrogen peroxide in the lumen of the endosome (Li et al., 2006). The diffusion of hydrogen peroxide out of the lumen of the endosome to the surrounding cytosol results in efficient coupling to its targets in the cytoplasm. At the plasma membrane, this coupling is less efficient, as half of the hydrogen peroxide is lost to the extracellular milieu. The aforementioned examples show how endocytosis modulates signaling, but the inverse is also true. For example, the routing and timing of receptors in the endosomal system are regulated by cargo-mediated G protein signaling, mostly through RAB family G proteins. An example is the early-to-late endosomal conversion characterized by the RAB5-RAB7 switch (Spang, 2009). The timing of this switch involves coordination of two opposing behaviors: a fast positive and a slow negative feedback loop. In the early endosome, the recruitment of RAB5 by its GEF (RABX5) initiates a positive feedback loop whereby RAB5-GTP activates RABX5. Subsequent binding and activation of RAB5 effector molecules such as the phosphatidylinositol 3-kinase (PI3K) VPS34 result in the local synthesis of phosphatidylinositol 3-phosphate (PI3P). On the one hand, the accumulation of PI3P recruits SAND-1, which displaces RABX5 and thereby disrupts the RAB5-RABX5 positive feedback loop (Del Conte-Zerial et al., 2008; Poteryaev et al., 2010). On the other hand, it has been shown that the SAND-1 yeast homolog MON1p interacts with the HOPS complex. A subunit of this complex, VPS39p, acts as a GEF for yeast RAB7 and therefore activates it (Bohdanowicz and Grinstein, 2010).
In essence, this system behaves like a transistor in gain amplification mode wherein the cargo-mediated activation of PI3K (the signal) generates the base current of PI3P that, upon accumulation, triggers the RAB5-RAB7 switch (Vartak and Bastiaens, 2010). The cargo therefore influences its own endocytic routing in a time-dependent manner. Just as the primordial plasma membrane provided a surface within which reactions were facilitated through containment of diffusion (Griffiths, 2007), the endocytic system does the same for the cytosol of the eukaryotic cell. It also constitutes a shuttling service, providing bidirectional communication between the plasma membrane and the interior of the cell. Evolution has thus tinkered with the endocytic machinery to generate information-processing functions (Jacob, 1977). This tight coupling between the membrane and signaling systems is becoming increasingly important not only to our understanding of the function of both systems, but also to our ability to steer them away from pathological behaviors.
Signal Propagation between Membrane Compartments
In the previous section, we discussed how the endocytic machinery transports the reactive properties of the plasma membrane into the cytoplasm to propagate signals. We now discuss the spatial organization of the small G protein RAS as an example of a mechanism that transports the reactive properties of the plasma membrane to another membrane compartment within the cytoplasm. Here, the association between the signal carrier and membranes is achieved through the addition of lipid anchors to the protein. Such lipid modifications are required for the membrane targeting of many proteins and for the enrichment of target proteins in specific microdomains on organelles (Hancock, 2003; Resh, 1999).
Figure 4. Localization of RAS through the Acylation Cycle
(A) To create a full three-dimensional (3D) simulation of the reaction-diffusion system that underlies the acylation cycle, a 3D stack of confocal images (left) is registered to identify three compartments (right): plasma membrane (white), cytosol/endoplasmic reticulum (gray), and Golgi (red). Computer simulations were performed with a cellular automaton approach to reflect: palmitoylated versus unpalmitoylated RAS; localized PAT-activity at the Golgi; ubiquitous thioesterase activity; unidirectional transport of palmitoylated species from Golgi to the plasma membrane; high inter- and intracompartmental mobility of unpalmitoylated versus low mobility of palmitoylated species.
(B) Localization of the two species of palmitoylated (green) versus unpalmitoylated RAS (red) in presence of ubiquitous thioesterase activity (top row) and blocking thioesterase activity (bottom row) shown as an overlay of both species colored according to the 2D color map (right), wherein white denotes oversaturation of palmitoylated RAS (otherwise green). Starting from the initial condition of 100% unpalmitoylated RAS (left), the distribution of palmitoylation evolves toward enrichment of palmitoylated RAS at the plasma membrane and Golgi (upper-right) versus unspecific distribution over all membranes in case of thioesterase inhibition (lower-right).
Small G proteins exist in either a GTP-bound (activated) or GDP-bound (inactivated) state within a catalytic GTPase cycle operated through the intervention of guanine nucleotide exchange factors (GEFs) and GTPase-activating proteins (GAPs). The GTP-binding proteins of the RAS family, which are involved in a wide range of signal transduction processes, undergo various lipid modifications at the C terminus. The H-RAS and N-RAS isoforms undergo two types of lipid modification: an irreversible farnesylation at the cysteine residue of the CAAX box, followed by reversible palmitoylation at specific cysteine residues in the C-terminal hypervariable region (HVR). This S-acylation is unique in that it is the only known reversible lipid modification (Linder and Deschenes, 2003; Smotrys and Linder, 2004). Farnesylation conveys a membrane affinity to RAS but still allows high intercompartmental mobility. Additional palmitoylation further increases its affinity to membranes without conferring specificity for any membrane compartment. How then can spatial organization arise from these posttranslational lipidations? Consider unpalmitoylated but farnesylated RAS (Figure 4B, first column), which is distributed homogeneously among membrane compartments in the cell. Localizing palmitoyl transferase (PAT) activity to the Golgi apparatus (Rocks et al., 2005) enables this membrane compartment to trap newly palmitoylated RAS. This trapping occurs because palmitoylation enhances the stability of the interaction of RAS with the membrane, thereby slowing its diffusion (Silvius and l’Heureux, 1994). However, if this were the end of the story, all RAS molecules would eventually be trapped at the Golgi and would then
slowly redistribute over all membranes to reach a homogeneous equilibrium distribution. Before this can occur, the nonequilibrium state of RAS enrichment at the Golgi is transferred to the plasma membrane via the secretory pathway (Choy et al., 1999; Rocks et al., 2010). Away from the high PAT activity at the Golgi, the palmitoyl group is removed by ubiquitous thioesterase activity. This depalmitoylated, farnesylated N/H-RAS rapidly redistributes over all endomembranes, enhancing the chance of re-encounter and trapping at the Golgi. Repeated cycles of de/repalmitoylation together with Golgi trapping by palmitoylation and the directionality of the secretory pathway thus constitute a spatially organizing system that counters the entropy-driven re-equilibration of lipidated RAS throughout the membranes of the cell (Rocks et al., 2010; Rocks et al., 2005). Simulation of this reaction-diffusion system using realistic reaction and diffusion parameters shows that our intuition that the proposed dynamic mechanism can generate the observed asymmetric spatial organization of RAS is correct (Figure 4). We can also predict that any interference with the dynamics of the acylation cycle will cause RAS to lose its specific localization, irrespective of its lipidation state (Figure 4B, lower row). This insight leads to a counterintuitive target for affecting RAS localization and thereby its signaling capacity: thioesterase activity. Palmostatin-B, an inhibitor of the cellular thioesterase APT1, has indeed been shown to cause fully palmitoylated RAS to redistribute more equally between membrane compartments and thereby lower oncogenic H-RAS-G12V signaling activity (Dekker et al., 2010). Let us now consider how the spatial organization of RAS affects information processing and transfer across membrane compartments. G proteins are active in the GTP-bound form because it is in this state that they can interact with effectors.
The GTP-binding state of RAS is almost exclusively determined by the relative local abundance of GEFs and GAPs and has no influence on the acylation cycle that regulates its spatial organization. Differential localization of GAPs and GEFs to specific membrane compartments can therefore generate a local RAS activity state. For example, upon binding of ligand to RTKs at the plasma membrane, an increase in GEF activity increases the local concentration of active GTP-RAS. The Son of Sevenless (SOS) canonical RAS-GEF not only interacts with activated phosphorylated receptors through binding adaptor proteins, such as GRB2 (Innocenti et al., 2002; Jang et al., 2010), but also contains an allosteric site that increases its activity upon RAS-GTP binding (Freedman et al., 2006). In solution, only low-affinity binding of SOS to RAS-GTP is observed. Within the cell, however, the effective concentration of SOS is enhanced by its sequestration within the two-dimensional plasma membrane, where RAS is also enriched. This reduction in dimensionality increases the effective concentration of both SOS and RAS and hence facilitates activation of the system through positive feedback. RAS activation is countered by increasing RAS-GAP concentrations at the plasma membrane. A slower or delayed increase in RAS-GAP concentration thus generates a pulse of RAS activity at the plasma membrane (Augsten et al., 2006). Subsequently, this membrane-proximal activation pulse is subjected to the acylation cycle, transforming the spatial organization of RAS into a temporal response spanning the cell. The acylation cycle can therefore be considered as a carrier wave that links the intracellular membrane compartments and is modulated by the state of the plasma membrane. It was recently shown that the Golgi lacks RAS-specific GEF/GAP activity in certain cells and appears to act as a passive receiver of the RAS signal from the plasma membrane (Lorentzen et al., 2010).
In this case, a diffusion-broadened echo of the original pulse of RAS activity is observed at the Golgi. The plasma membrane, because of its high degree of GEF/GAP regulation, is, in contrast, effectively decoupled from the activity of RAS at the Golgi. The endoplasmic reticulum (ER), however, acting as a platform for fast trafficking of depalmitoylated RAS between the PM and the Golgi, offers a stage for further regulation of RAS activity before it becomes trapped at the Golgi. Indeed, growth factor-induced upregulation of GEF activity at the ER, either by removal of GAPs or increase of GEFs, causes sustained RAS activity at the Golgi (Lorentzen et al., 2010). This system is capable of generating a biphasic response at the Golgi: a broadened echo from the plasma membrane pulse convoluted with sustained activity from the ER. Given that there are now many known examples of signaling networks in which the gene expression machinery is sensitive to the temporal properties of upstream signals, the capacity to transfer reaction properties across membrane surfaces is likely to have a fundamental impact on the regulation of cellular state (Goentoro et al., 2009; Kholodenko and Kolch, 2008; Murphy et al., 2004). This is especially important when considering cell fate in the presence of oncogenic, constitutively active forms of RAS that “short-circuit” the above signaling network into constant activity. Modulating the localization of such signaling molecules can be a means of restoring the switching functionality of those
networks, in effect giving the cell back its decision-making capacity. For example, the epithelial cell line MDCK-f3, which expresses oncogenic H-RAS-G12V, has undergone epithelial-to-mesenchymal transition (Thiery et al., 2009), thereby losing cell contact inhibition (Chen et al., 2011). This phenotype can be reset to the overall cell shape and contact inhibition level of untransformed cells by the thioesterase inhibitor palmostatin-B (Dekker et al., 2010), which results in a decreased concentration of oncogenic H-RAS-G12V at the plasma membrane and Golgi. Though this will reduce interactions between RAS and its effectors such as RAF at the plasma membrane, effector activation is not the only point that needs to be considered. An oncogenic RAS mutation encoded at a single allele has the potential to activate wild-type RAS (encoded by the other allele) via the SOS feedback system described above in a dose-dependent manner. If the dose of oncogenic RAS at the plasma membrane is sufficiently low, the system can still respond to growth factor binding to cognate receptors through wild-type RAS. The oncogenic RAS generates an offset in the downstream signaling amplitude (for example in activated ERK) that is filtered out by the fold change detection mechanism in the gene expression machinery (Goentoro et al., 2009). However, if the dose of oncogenic RAS at the plasma membrane is high enough to overcome a threshold to activate wild-type RAS by the SOS feedback loop, all RAS in the cell is activated and can no longer switch in response to extracellular stimulation. Lowering the amount of palmitoylatable RAS proximal to the plasma membrane by thioesterase inhibition using palmostatin-B thus reduces the effective dose of oncogenic RAS at the plasma membrane below the threshold that is needed to trigger SOS feedback.
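The dose threshold described above can be sketched with a one-variable rate equation. This is a minimal illustration under stated assumptions, not a model taken from the cited work: `x` is the GTP-bound fraction of wild-type RAS at the plasma membrane, oncogenic RAS is treated as always GTP-bound and enters only through `onc_dose` (in units of the wild-type pool), and allosteric SOS feedback is represented by a Hill function whose exponent, half-point, and rate constants are all hypothetical values chosen to give a clear threshold.

```python
# Minimal sketch of SOS positive feedback with an oncogenic RAS "dose".
# Every parameter value below is an assumption chosen for illustration.

def hill(a, half_point=0.5, exponent=4):
    """Assumed sigmoidal dependence of SOS activity on total RAS-GTP."""
    return a**exponent / (half_point**exponent + a**exponent)


def wildtype_ras_gtp(onc_dose, x0=0.0, k_basal=0.02, k_fb=1.6, k_gap=1.0,
                     t_end=100.0, dt=0.01):
    """Return the GTP-bound fraction of wild-type RAS after t_end.

    dx/dt = (k_basal + k_fb * hill(x + onc_dose)) * (1 - x) - k_gap * x
    integrated with forward Euler from the initial fraction x0.
    """
    x = x0
    for _ in range(int(t_end / dt)):
        ras_gtp = x + onc_dose                   # oncogenic RAS counts as RAS-GTP
        k_gef = k_basal + k_fb * hill(ras_gtp)   # basal plus feedback GEF activity
        x += dt * (k_gef * (1.0 - x) - k_gap * x)
    return x


if __name__ == "__main__":
    low = wildtype_ras_gtp(onc_dose=0.02)    # dose below the threshold
    high = wildtype_ras_gtp(onc_dose=0.4)    # dose above the threshold
    # Mimic lowering the membrane dose after the feedback loop has latched on.
    reset = wildtype_ras_gtp(onc_dose=0.02, x0=high)
    print(f"low dose: {low:.3f}  high dose: {high:.3f}  after lowering: {reset:.3f}")
```

In this sketch, a small oncogenic dose leaves wild-type RAS mostly GDP-bound and a large dose drives it into a persistently active state. The third call starts from that active state with the dose lowered, which is the situation the text attributes to thioesterase inhibition. Whether a real cell returns to the low state depends on parameters that are not specified here.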
Lowering the dose in this way effectively inactivates the wild-type RAS population, such that it can respond again to growth factor activation by nucleotide exchange and therefore reacquire its decision-making capacity.
Perspectives
The living plasma membrane and its extension inside of the cell as endocytic vesicles constitute a system to receive, integrate, and distribute external and internal signals. True understanding of the living plasma membrane and its intricate connection to signaling requires novel experimental and theoretical methods that can take full account of the spatiotemporal asymmetries and confinement of its components, both of which provide the cell with its unique ability to regulate signal propagation. Bottom-up approaches such as the generation of reconstituted living membranes that mimic the dynamics of biological systems will be indispensable to move away from equilibrium biophysics. Here, the molecular machinery for membrane fusion and fission needs to be introduced in reconstituted membrane systems (Wollert and Hurley, 2010) to mimic the effects of vesicle transport dynamics on the information processing ability of a membrane. Moreover, the introduction of an endomembrane system in prokaryotes together with further research into bacterial organisms with an endocytic machinery (Lonhienne et al., 2010) will provide insight into the development of cell compartmentalization and its effect on signal processing. Top-down approaches such as spatiotemporal monitoring of the endocytic system in eukaryotes will be vital to understand
signal integration. The importance of different modes of spatial propagation in signaling is now clear. However, more and better genetically encoded fluorescent protein biosensors that report on the local activity of proteins need to be developed in order for us to better verify our hypotheses. This should go hand in hand with the further development of new functional imaging approaches such as fluorescence correlation spectroscopy, with sufficient spatial resolution to take advantage of the full potential of these biosensors (Maeder et al., 2007). Experimental approaches also need to be complemented by computational models and simulations not only to accurately interpret the experimental results, but also to place them in a correct spatiotemporal frame. We have taken as an example the properties arising from localization-dependent interaction in a realistic experimentally acquired cellular geometry. Such models can be expected to show emergent and often counterintuitive behaviors that are not readily understood. The convergence of bottom-up and top-down experimental approaches, together with computational models and simulations that feed back into the experimental design, will eventually allow us to move away from a static picture of the plasma membrane to a movie, featuring a dynamic and living entity that generates shapes and makes decisions.
REFERENCES
Augsten, M., Pusch, R., Biskup, C., Rennert, K., Wittig, U., Beyer, K., Blume, A., Wetzker, R., Friedrich, K., and Rubio, I. (2006). Live-cell imaging of endogenous Ras-GTP illustrates predominant Ras activation at the plasma membrane. EMBO Rep. 7, 46–51.
Baass, P.C., Di Guglielmo, G.M., Authier, F., Posner, B.I., and Bergeron, J.J. (1995). Compartmentalized signal transduction by receptor tyrosine kinases. Trends Cell Biol. 5, 465–470.
Birtwistle, M.R., and Kholodenko, B.N. (2009). Endocytosis and signalling: a meeting with mathematics. Mol. Oncol. 3, 308–320.
Dekker, F.J., Rocks, O., Vartak, N., Menninger, S., Hedberg, C., Balamurugan, R., Wetzel, S., Renner, S., Gerauer, M., Schölermann, B., et al. (2010). Small-molecule inhibition of APT1 affects Ras localization and signaling. Nat. Chem. Biol. 6, 449–456.
Del Conte-Zerial, P., Brusch, L., Rink, J.C., Collinet, C., Kalaidzidis, Y., Zerial, M., and Deutsch, A. (2008). Membrane identity and GTPase cascades regulated by toggle and cut-out switches. Mol. Syst. Biol. 4, 206.
Deribe, Y.L., Pawson, T., and Dikic, I. (2010). Post-translational modifications in signal integration. Nat. Struct. Mol. Biol. 17, 666–672.
Devaux, P.F., and Morris, R. (2004). Transmembrane asymmetry and lateral domains in biological membranes. Traffic 5, 241–246.
Ehlers, M.D., Kaplan, D.R., Price, D.L., and Koliatsos, V.E. (1995). NGF-stimulated retrograde transport of trkA in the mammalian nervous system. J. Cell Biol. 130, 149–156.
Finkel, T. (2006). Intracellular redox regulation by the family of small GTPases. Antioxid. Redox Signal. 8, 1857–1863.
Fischer, E.H., Charbonneau, H., and Tonks, N.K. (1991). Protein tyrosine phosphatases: a diverse family of intracellular and transmembrane enzymes. Science 253, 401–406.
Frank, C., Burkhardt, C., Imhof, D., Ringel, J., Zschörnig, O., Wieligmann, K., Zacharias, M., and Böhmer, F.D. (2004). Effective dephosphorylation of Src substrates by SHP-1. J. Biol. Chem. 279, 11375–11383.
Freedman, T.S., Sondermann, H., Friedland, G.D., Kortemme, T., Bar-Sagi, D., Marqusee, S., and Kuriyan, J. (2006). A Ras-induced conformational switch in the Ras activator Son of sevenless. Proc. Natl. Acad. Sci. USA 103, 16692–16697.
Garrenton, L.S., Stefan, C.J., McMurray, M.A., Emr, S.D., and Thorner, J. (2010). Pheromone-induced anisotropy in yeast plasma membrane phosphatidylinositol-4,5-bisphosphate distribution is required for MAPK signaling. Proc. Natl. Acad. Sci. USA 107, 11805–11810.
Goentoro, L., Shoval, O., Kirschner, M.W., and Alon, U. (2009). The incoherent feedforward loop can provide fold-change detection in gene regulation. Mol. Cell 36, 894–899.
Griffiths, G. (2007). Cell evolution and the problem of membrane topology. Nat. Rev. Mol. Cell Biol. 8, 1018–1024.
Bohdanowicz, M., and Grinstein, S. (2010). Vesicular traffic: a Rab SANDwich. Curr. Biol. 20, R311–R314.
Grimes, M.L., Zhou, J., Beattie, E.C., Yuen, E.C., Hall, D.E., Valletta, J.S., Topp, K.S., LaVail, J.H., Bunnett, N.W., and Mobley, W.C. (1996). Endocytosis of activated TrkA: evidence that nerve growth factor induces formation of signaling endosomes. J. Neurosci. 16, 7950–7964.
Brown, D.I., and Griendling, K.K. (2009). Nox proteins in signal transduction. Free Radic. Biol. Med. 47, 1239–1253.
Hancock, J.F. (2003). Ras proteins: different signals from different locations. Nat. Rev. Mol. Cell Biol. 4, 373–384.
Buday, L., Warne, P.H., and Downward, J. (1995). Downregulation of the Ras activation pathway by MAP kinase phosphorylation of Sos. Oncogene 11, 1327–1331.
Harding, A.S., and Hancock, J.F. (2008). Using plasma membrane nanoclusters to build better signaling circuits. Trends Cell Biol. 18, 364–371.
Chen, Y.S., Mathias, R.A., Mathivanan, S., Kapp, E.A., Moritz, R.L., Zhu, H.J., and Simpson, R.J. (2011). Proteomics profiling of Madin-Darby canine kidney plasma membranes reveals Wnt-5a involvement during oncogenic H-Ras/TGF-β-mediated epithelial-mesenchymal transition. Mol. Cell. Proteomics 10, M110.001131.
Choy, E., Chiu, V.K., Silletti, J., Feoktistov, M., Morimoto, T., Michaelson, D., Ivanov, I.E., and Philips, M.R. (1999). Endomembrane trafficking of ras: the CAAX motif targets proteins to the ER and Golgi. Cell 98, 69–80.
Clayton, A.H., Walker, F., Orchard, S.G., Henderson, C., Fuchs, D., Rothacker, J., Nice, E.C., and Burgess, A.W. (2005). Ligand-induced dimer-tetramer transition during the activation of the cell surface epidermal growth factor receptor-A multidimensional microscopy analysis. J. Biol. Chem. 280, 30392–30399.
Cosker, K.E., Courchesne, S.L., and Segal, R.A. (2008). Action in the axon: generation and transport of signaling endosomes. Curr. Opin. Neurobiol. 18, 270–275.
Dehmelt, L., and Bastiaens, P.I. (2010). Spatial organization of intracellular communication: insights from imaging. Nat. Rev. Mol. Cell Biol. 11, 440–452.
Howe, C.L. (2005). Modeling the signaling endosome hypothesis: why a drive to the nucleus is better than a (random) walk. Theor. Biol. Med. Model. 2, 43.
Inman, G.J., Nicolás, F.J., and Hill, C.S. (2002). Nucleocytoplasmic shuttling of Smads 2, 3, and 4 permits sensing of TGF-beta receptor activity. Mol. Cell 10, 283–294.
Innocenti, M., Tenca, P., Frittoli, E., Faretta, M., Tocchetti, A., Di Fiore, P.P., and Scita, G. (2002). Mechanisms through which Sos-1 coordinates the activation of Ras and Rac. J. Cell Biol. 156, 125–136.
Jacob, F. (1977). Evolution and tinkering. Science 196, 1161–1166.
Jang, I.K., Zhang, J., Chiang, Y.J., Kole, H.K., Cronshaw, D.G., Zou, Y., and Gu, H. (2010). Grb2 functions at the top of the T-cell antigen receptor-induced tyrosine kinase cascade to control thymic selection. Proc. Natl. Acad. Sci. USA 107, 10620–10625.
Janmey, P.A., and Kinnunen, P.K. (2006). Biophysical properties of lipids and dynamic membranes. Trends Cell Biol. 16, 538–546.
Janssen-Heininger, Y.M., Mossman, B.T., Heintz, N.H., Forman, H.J., Kalyanaraman, B., Finkel, T., Stamler, J.S., Rhee, S.G., and van der Vliet, A. (2008). Redox-based regulation of signal transduction: principles, pitfalls, and promises. Free Radic. Biol. Med. 45, 1–17.
Karlsson, S., Kowanetz, K., Sandin, A., Persson, C., Ostman, A., Heldin, C.H., and Hellberg, C. (2006). Loss of T-cell protein tyrosine phosphatase induces recycling of the platelet-derived growth factor (PDGF) beta-receptor but not the PDGF alpha-receptor. Mol. Biol. Cell 17, 4846–4855.
Kholodenko, B.N., and Kolch, W. (2008). Giving space to cell signaling. Cell 133, 566–567.
Kusumi, A., Nakada, C., Ritchie, K., Murase, K., Suzuki, K., Murakoshi, H., Kasai, R.S., Kondo, J., and Fujiwara, T. (2005). Paradigm shift of the plasma membrane concept from the two-dimensional continuum fluid to the partitioned fluid: high-speed single-molecule tracking of membrane molecules. Annu. Rev. Biophys. Biomol. Struct. 34, 351–378.
Kusumi, A., and Sako, Y. (1996). Cell surface organization by the membrane skeleton. Curr. Opin. Cell Biol. 8, 566–574.
Kwik, J., Boyle, S., Fooksman, D., Margolis, L., Sheetz, M.P., and Edidin, M. (2003). Membrane cholesterol, lateral mobility, and the phosphatidylinositol 4,5-bisphosphate-dependent organization of cell actin. Proc. Natl. Acad. Sci. USA 100, 13964–13969.
Le Roy, C., and Wrana, J.L. (2005). Clathrin- and non-clathrin-mediated endocytic regulation of cell signalling. Nat. Rev. Mol. Cell Biol. 6, 112–126.
Lee, E., Salic, A., Krüger, R., Heinrich, R., and Kirschner, M.W. (2003). The roles of APC and Axin derived from experimental and theoretical analysis of the Wnt pathway. PLoS Biol. 1, E10.
Lee, S.R., Kwon, K.S., Kim, S.R., and Rhee, S.G. (1998). Reversible inactivation of protein-tyrosine phosphatase 1B in A431 cells stimulated with epidermal growth factor. J. Biol. Chem. 273, 15366–15372.
Lemmon, M.A., and Schlessinger, J. (2010). Cell signaling by receptor tyrosine kinases. Cell 141, 1117–1134.
Li, Q., Harraz, M.M., Zhou, W., Zhang, L.N., Ding, W., Zhang, Y., Eggleston, T., Yeaman, C., Banfi, B., and Engelhardt, J.F. (2006). Nox2 and Rac1 regulate H2O2-dependent recruitment of TRAF6 to endosomal interleukin-1 receptor complexes. Mol. Cell. Biol. 26, 140–154.
Lim, W.A., and Pawson, T. (2010). Phosphotyrosine signaling: evolving a new cellular communication system. Cell 142, 661–667.
Murphy, L.O., MacKeigan, J.P., and Blenis, J. (2004). A network of immediate early gene products propagates subtle differences in mitogen-activated protein kinase signal amplitude and duration. Mol. Cell. Biol. 24, 144–153.
Muth, T.R., and Caplan, M.J. (2003). Transport protein trafficking in polarized cells. Annu. Rev. Cell Dev. Biol. 19, 333–366.
Nakakuki, T., Birtwistle, M.R., Saeki, Y., Yumoto, N., Ide, K., Nagashima, T., Brusch, L., Ogunnaike, B.A., Okada-Hatakeyama, M., and Kholodenko, B.N. (2010). Ligand-specific c-Fos expression emerges from the spatiotemporal control of ErbB network dynamics. Cell 141, 884–896.
Nelson, S., Horvat, R.D., Malvey, J., Roess, D.A., Barisas, B.G., and Clay, C.M. (1999). Characterization of an intrinsically fluorescent gonadotropin-releasing hormone receptor and effects of ligand binding on receptor lateral diffusion. Endocrinology 140, 950–957.
Niethammer, P., Bastiaens, P., and Karsenti, E. (2004). Stathmin-tubulin interaction gradients in motile and mitotic cells. Science 303, 1862–1866.
Niethammer, P., Kronja, I., Kandels-Lewis, S., Rybina, S., Bastiaens, P., and Karsenti, E. (2007). Discrete states of a protein interaction network govern interphase and mitotic microtubule dynamics. PLoS Biol. 5, e29.
Oakley, F.D., Abbott, D., Li, Q., and Engelhardt, J.F. (2009). Signaling components of redox active endosomes: the redoxosomes. Antioxid. Redox Signal. 11, 1313–1333.
Poteryaev, D., Datta, S., Ackema, K., Zerial, M., and Spang, A. (2010). Identification of the switch in early-to-late endosome transition. Cell 141, 497–508.
Ramadurai, S., Duurkens, R., Krasnikov, V.V., and Poolman, B. (2010). Lateral diffusion of membrane proteins: consequences of hydrophobic mismatch and lipid composition. Biophys. J. 99, 1482–1489.
Rashid, S., Pilecka, I., Torun, A., Olchowik, M., Bielinska, B., and Miaczynska, M. (2009). Endosomal adaptor proteins APPL1 and APPL2 are novel activators of beta-catenin/TCF-mediated transcription. J. Biol. Chem. 284, 18115–18128.
Resh, M.D. (1999). Fatty acylation of proteins: new insights into membrane targeting of myristoylated and palmitoylated proteins. Biochim. Biophys. Acta 1451, 1–16.
Linder, M.E., and Deschenes, R.J. (2003). New insights into the mechanisms of protein palmitoylation. Biochemistry 42, 4311–4320.
Reynolds, A.R., Tischer, C., Verveer, P.J., Rocks, O., and Bastiaens, P.I. (2003). EGFR activation coupled to inhibition of tyrosine phosphatases causes lateral signal propagation. Nat. Cell Biol. 5, 447–453.
Lingwood, D., and Simons, K. (2010). Lipid rafts as a membrane-organizing principle. Science 327, 46–50.
Rhee, S.G. (2006). Cell signaling. H2O2, a necessary evil for cell signaling. Science 312, 1882–1883.
Lonhienne, T.G., Sagulenko, E., Webb, R.I., Lee, K.C., Franke, J., Devos, D.P., Nouwens, A., Carroll, B.J., and Fuerst, J.A. (2010). Endocytosis-like protein uptake in the bacterium Gemmata obscuriglobus. Proc. Natl. Acad. Sci. USA 107, 12883–12888.
Lorentzen, A., Kinkhabwala, A., Rocks, O., Vartak, N., and Bastiaens, P.I.H. (2010). Regulation of Ras localization by acylation enables a mode of intracellular signal propagation. Sci. Signal. 3, ra68.
Maeder, C.I., Hink, M.A., Kinkhabwala, A., Mayr, R., Bastiaens, P.I., and Knop, M. (2007). Spatial regulation of Fus3 MAP kinase activity through a reaction-diffusion mechanism in yeast pheromone signalling. Nat. Cell Biol. 9, 1319–1326.
Markus, M., Böhm, D., and Schmick, M. (1999). Simulation of vessel morphogenesis using cellular automata. Math. Biosci. 156, 191–206.
Marshall, C.J. (1995). Specificity of receptor tyrosine kinase signaling: transient versus sustained extracellular signal-regulated kinase activation. Cell 80, 179–185.
Matheos, D., Metodiev, M., Muller, E., Stone, D., and Rose, M.D. (2004). Pheromone-induced polarization is dependent on the Fus3p MAPK acting through the formin Bni1p. J. Cell Biol. 165, 99–109.
Rocks, O., Gerauer, M., Vartak, N., Koch, S., Huang, Z.P., Pechlivanis, M., Kuhlmann, J., Brunsveld, L., Chandra, A., Ellinger, B., et al. (2010). The palmitoylation machinery is a spatially organizing system for peripheral membrane proteins. Cell 141, 458–471.
Rocks, O., Peyker, A., Kahms, M., Verveer, P.J., Koerner, C., Lumbierres, M., Kuhlmann, J., Waldmann, H., Wittinghofer, A., and Bastiaens, P.I. (2005). An acylation cycle regulates localization and activity of palmitoylated Ras isoforms. Science 307, 1746–1752.
Rodriguez-Boulan, E., Kreitzer, G., and Müsch, A. (2005). Organization of vesicular trafficking in epithelia. Nat. Rev. Mol. Cell Biol. 6, 233–247.
Saffman, P.G., and Delbrück, M. (1975). Brownian motion in biological membranes. Proc. Natl. Acad. Sci. USA 72, 3111–3113.
Sawano, A., Takayama, S., Matsuda, M., and Miyawaki, A. (2002). Lateral propagation of EGF signaling after local stimulation is dependent on receptor density. Dev. Cell 3, 245–257.
Schenck, A., Goto-Silva, L., Collinet, C., Rhinn, M., Giner, A., Habermann, B., Brand, M., and Zerial, M. (2008). The endosomal protein Appl1 mediates Akt substrate specificity and cell survival in vertebrate development. Cell 133, 486–497.
Scita, G., and Di Fiore, P.P. (2010). The endocytic matrix. Nature 463, 464–473.
Mellman, I., and Nelson, W.J. (2008). Coordinated protein sorting, targeting and distribution in polarized cells. Nat. Rev. Mol. Cell Biol. 9, 833–845.
Silvius, J.R., and l’Heureux, F. (1994). Fluorimetric evaluation of the affinities of isoprenylated peptides for lipid bilayers. Biochemistry 33, 3014–3022.
Miaczynska, M., Pelkmans, L., and Zerial, M. (2004). Not just a sink: endosomes in control of signal transduction. Curr. Opin. Cell Biol. 16, 400–406.
Simons, K., and Ikonen, E. (1997). Functional rafts in cell membranes. Nature 387, 569–572.
Simons, K., and van Meer, G. (1988). Lipid sorting in epithelial cells. Biochemistry 27, 6197–6202.
Singer, S.J., and Nicolson, G.L. (1972). The fluid mosaic model of the structure of cell membranes. Science 175, 720–731.
Smotrys, J.E., and Linder, M.E. (2004). Palmitoylation of intracellular signaling proteins: regulation and function. Annu. Rev. Biochem. 73, 559–587.
Sorkin, A., Krolenko, S., Kudrjavtceva, N., Lazebnik, J., Teslenko, L., Soderquist, A.M., and Nikolsky, N. (1991). Recycling of epidermal growth factor-receptor complexes in A431 cells: identification of dual pathways. J. Cell Biol. 112, 55–63.
Sorkin, A., McClure, M., Huang, F., and Carter, R. (2000). Interaction of EGF receptor and grb2 in living cells visualized by fluorescence resonance energy transfer (FRET) microscopy. Curr. Biol. 10, 1395–1398.
Sorkin, A., and von Zastrow, M. (2009). Endocytosis and signalling: intertwining molecular networks. Nat. Rev. Mol. Cell Biol. 10, 609–622.
van den Bogaart, G., Holt, M.G., Bunt, G., Riedel, D., Wouters, F.S., and Jahn, R. (2010). One SNARE complex is sufficient for membrane fusion. Nat. Struct. Mol. Biol. 17, 358–364.
van der Wouden, J.M., Maier, O., van IJzendoorn, S.C., and Hoekstra, D. (2003). Membrane dynamics and the regulation of epithelial cell polarity. Int. Rev. Cytol. 226, 127–164.
van Drogen, F., Stucke, V.M., Jorritsma, G., and Peter, M. (2001). MAP kinase dynamics in response to pheromones in budding yeast. Nat. Cell Biol. 3, 1051–1059.
van Meer, G., Voelker, D.R., and Feigenson, G.W. (2008). Membrane lipids: where they are and how they behave. Nat. Rev. Mol. Cell Biol. 9, 112–124.
Vartak, N., and Bastiaens, P. (2010). Spatial cycles in G-protein crowd control. EMBO J. 29, 2689–2699.
Spang, A. (2009). On the fate of early endosomes. Biol. Chem. 390, 753–759.
Wiley, H.S., and Burke, P.M. (2001). Regulation of receptor tyrosine kinase signaling by endocytic trafficking. Traffic 2, 12–18.
Stelling, J., and Kholodenko, B.N. (2009). Signaling cascades as cellular devices for spatial computations. J. Math. Biol. 58, 35–55.
Wollert, T., and Hurley, J.H. (2010). Molecular mechanism of multivesicular body biogenesis by ESCRT complexes. Nature 464, 864–869.
Suzuki, A., and Ohno, S. (2006). The PAR-aPKC system: lessons in polarity. J. Cell Sci. 119, 979–987.
Thiery, J.P., Acloque, H., Huang, R.Y., and Nieto, M.A. (2009). Epithelial-mesenchymal transitions in development and disease. Cell 139, 871–890.
Tischer, C., and Bastiaens, P.I. (2003). Lateral phosphorylation propagation: an aspect of feedback signalling? Nat. Rev. Mol. Cell Biol. 4, 971–974.
Turing, A.M. (1952). The Chemical Basis of Morphogenesis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 237, 37–72.
Uchida, T., Matozaki, T., Noguchi, T., Yamao, T., Horita, K., Suzuki, T., Fujioka, Y., Sakamoto, C., and Kasuga, M. (1994). Insulin stimulates the phosphorylation of Tyr538 and the catalytic activity of PTP1C, a protein tyrosine phosphatase with Src homology-2 domains. J. Biol. Chem. 269, 12220–12228.
Woo, H.A., Yim, S.H., Shin, D.H., Kang, D., Yu, D.Y., and Rhee, S.G. (2010). Inactivation of peroxiredoxin I by phosphorylation allows localized H(2)O(2) accumulation for cell signaling. Cell 140, 517–528.
Wouters, F.S., and Bastiaens, P.I. (1999). Fluorescence lifetime imaging of receptor tyrosine kinase activity in cells. Curr. Biol. 9, 1127–1130.
Zidovetzki, R., Yarden, Y., Schlessinger, J., and Jovin, T.M. (1981). Rotational diffusion of epidermal growth factor complexed to cell surface receptors reflects rapid microaggregation and endocytosis of occupied receptors. Proc. Natl. Acad. Sci. USA 78, 6981–6985.
Zimmerberg, J., and Gawrisch, K. (2006). The physical chemistry of biological membranes. Nat. Chem. Biol. 2, 564–567.
Leading Edge
Review
Cellular Decision Making and Biological Noise: From Microbes to Mammals
Gábor Balázsi,1 Alexander van Oudenaarden,2 and James J. Collins3,4,*
1Department of Systems Biology–Unit 950, The University of Texas MD Anderson Cancer Center, 7435 Fannin Street, Houston, TX 77054, USA
2Departments of Physics and Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
3Howard Hughes Medical Institute, Department of Biomedical Engineering and Center for BioDynamics, Boston University, Boston, MA 02215, USA
4Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02215, USA
*Correspondence:
[email protected] DOI 10.1016/j.cell.2011.01.030
Cellular decision making is the process whereby cells assume different, functionally important and heritable fates without an associated genetic or environmental difference. Such stochastic cell fate decisions generate nongenetic cellular diversity, which may be critical for metazoan development as well as optimized microbial resource utilization and survival in a fluctuating, frequently stressful environment. Here, we review several examples of cellular decision making from viruses, bacteria, yeast, lower metazoans, and mammals, highlighting the role of regulatory network structure and molecular noise. We propose that cellular decision making is one of at least three key processes underlying development at various scales of biological organization.
Introduction
If we, humans, want to control living cells, two strategies are typically available: modifying their genome or changing the environment in which they reside. Does this mean that cells with identical genomes exposed to the same (possibly time-dependent) environment will necessarily have identical phenotypes? Not at all, for reasons that are still not entirely clear. When cells assume different, functionally important and heritable fates without an associated genetic or environmental difference, cellular decision making occurs. This includes asymmetric cell divisions as well as spontaneous differentiation of isogenic cells exposed to the same environment. Specific environmental or genetic cues may bias the process, causing certain cellular fates to be more frequently chosen (as when tossing identically biased coins). Still, the outcome of cellular decision making for individual cells is a priori unknown. A growing number of cell types are being described as capable of decision making under various circumstances, suggesting that such cellular choices are widespread in all organisms. What are the molecular mechanisms underlying the decisions of various cell types, and why are such decisions so common?
We hope to suggest answers to these questions here by considering examples at increasing levels of biological complexity, from viruses to mammals. Such a comparative overview may reveal common themes across different domains of life and may offer clues about the significance of cellular decision making at increasing levels of biological complexity (Maynard Smith and Szathmáry, 1995). Balls rolling down a slanted landscape with bifurcating valleys (Waddington and Kacser, 1957) have been widely and
repeatedly used for several decades as a pictorial illustration of differentiation in multicellular development. Despite its suggestive qualities and repeated use, it has been largely unclear what the valleys and peaks represent in the illustration called “Waddington’s epigenetic landscape.” The increasingly quantitative characterization of gene regulation at the single-cell level is now enabling the computation of Waddington’s landscape (Figure 1), which can serve as a general illustration of an emerging theoretical framework for cellular decision making. Assuming for a moment that cellular states can be represented by the concentration of a single molecule, the horizontal axes in Figure 1 will correspond to the concentration of this molecule and a time-dependent environmental factor, respectively, whereas the vertical dimension corresponds to a potential that governs cellular dynamics. Cells illustrated as spheres will tend to slide down along the concentration axis (pointing from left to right) toward local minima (stable cell states) on this landscape while they also progress toward the observer in time, as a time-dependent environmental factor continuously reshapes the geography of the landscape. Based solely on these considerations, identical cells released from the same point on Waddington’s landscape will follow indistinguishable trajectories, precluding cellular decision making and differentiation. On the other hand, cells released from distinct but nearby points can move to different minima as the bifurcating valleys amplify pre-existing positional differences. According to this deterministic interpretation, cellular decision making and differentiation are completely explained by pre-existing phenotypic differences within isogenic cell populations. Extensive theoretical and experimental work has started to seriously challenge this simplistic deterministic view, as it is
Figure 1. Illustration of Cellular Decision Making on a Molecular Potential Landscape
The landscape (projected onto the concentration of a specific molecule) is reshaped as the environment changes in time. The blue ball represents a cell that, under the influence of a changing environment, can assume three different fates at the proximal edge of the landscape (white balls at the end of the time course). Even in a constant environment, cells can transition between local minima due to random perturbations to the landscape (intrinsic molecular noise).
becoming clear that at least four critical revisions to Waddington’s picture are needed to properly describe cellular dynamics. First, in reality, the landscape is high-dimensional, defined by all intracellular molecular concentrations and multiple relevant environmental factors, and is not a potential in the usual sense. For this reason, cyclic flows (eddies) may exist that move cells around on closed trajectories in concentration space, even if the local geography is completely even (Wang et al., 2008). Second, the landscape is under the constant influence of omnipresent molecular noise (Kaern et al., 2005; Maheshri and O’Shea, 2007; Rao et al., 2002): stochastic ‘‘seismic vibrations’’ of varying amplitudes and spectra, specific to each location on the landscape (Figure 1). Third, the landscape is not rigid: cells themselves may reshape the geography due to cell-cell interactions (Waters and Bassler, 2005) and the growth rate dependence of protein concentrations (Klumpp et al., 2009; Tan et al., 2009). Last, but not least, growth rate differences between various cellular states reshape the landscape, lowering locations of high fitness and elevating points with reduced fitness as fast growth ‘‘overpopulates’’ certain locations and thereby deepens the landscape. Therefore, Waddington’s landscape must be integrated with a nongenetic version of Sewall Wright’s fitness landscape for genetically identical individuals (Pál and Miklós, 1999). For the purposes of this review, intrinsic gene expression noise (Blake et al., 2003; Elowitz et al., 2002; Ozbudak et al., 2002) is the most critical component missing from Waddington’s picture. The reason is that even identical cells released from the same location in Figure 1 will feel the perturbing effects of omnipresent random fluctuations at every point on their way. Random noise will shake them apart, modifying their trajectories and forcing them to cross barriers, diffuse along plateaus, and find new local minima. 
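The contrast between the deterministic picture and the noisy one can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional double-well potential, U(x) = x^4/4 - x^2/2, standing in for a landscape with two valleys; the potential, the noise amplitude, and the function names are illustrative choices and are not taken from the studies cited above.

```python
import math
import random


def simulate_cell(x0, noise, steps=2000, dt=0.01, seed=0):
    """Euler-Maruyama integration of dx = -U'(x) dt + noise dW for the
    toy double-well potential U(x) = x**4 / 4 - x**2 / 2 (valleys at x = -1, +1)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        drift = x - x**3  # equals -dU/dx
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def count_fates(n_cells, x0, noise):
    """Release n_cells identical cells at x0; count how many end in each valley."""
    finals = [simulate_cell(x0, noise, seed=i) for i in range(n_cells)]
    left = sum(1 for x in finals if x < 0)
    return left, n_cells - left


# Identical starting point, with and without noise (assumed amplitude 0.5).
deterministic = count_fates(50, x0=0.05, noise=0.0)
noisy = count_fates(50, x0=0.05, noise=0.5)
print("no noise   (left, right):", deterministic)
print("with noise (left, right):", noisy)
```

Without noise, every cell released from the same point follows the same trajectory into the same valley. With noise, the same starting point populates both valleys, which is the diversification of identical cells described above.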
Thus, intrinsic noise enables the phenotypic diversification of completely identical cells exposed to the same environment and further facilitates cellular decision making for cells already slightly different when released onto Waddington’s landscape. Moreover, according to the concepts
of escape rate theory (Hänggi et al., 1984; Mehta et al., 2008; Walczak et al., 2005), even cells maintained in a constant environment will have limited residence time around each local minimum (valley) on the landscape, as noise can induce repeated transitions between various cellular states. Why is cellular decision making so widespread, and when could it confer advantages compared to more deterministic cell fate scenarios? Considering that noise is unavoidable whenever a few copies of a certain molecule react with others inside small volumes (as is the case of DNA inside cells), nongenetic diversity should be very common in the cellular world. Because noise reduction requires high intracellular concentrations or costly negative feedback loops, it should be more surprising if a cellular process is not noisy than if it is. However, not all phenotypic diversity is functionally important and heritable across cell divisions and may therefore not classify as cellular decision making. Still, some cellular processes are noisier than expected based on Poissonian protein synthesis and degradation (Newman et al., 2006), or the resulting cellular states are heritable across several cell cycles (see below), arguing for functionality as the reason for their existence. The need for stochastic differentiation appears when individual cells are unable to fully adapt to their environment. For example, photosynthesis and nitrogen fixation are essential but mutually exclusive functions in many cell types. To resolve this dilemma, many cyanobacteria dedicate a subpopulation of cells entirely to nitrogen fixation while the rest of the cells remain photosynthetic (Wolk, 1996), thereby ensuring that the cell population can simultaneously fix carbon and nitrogen. The segregation of somatic cells from germ cells is another classic example in which the tasks of locomotion and replication are allocated to different subpopulations (Kirk, 2005). 
Stochastic differentiation into a growth-arrested but stress-resistant state (such as a spore) may optimize survival in an uncertain, frequently stressful environment by segregating two essential tasks: growth in the absence of stress and survival in the presence of stress. Theoretical work has demonstrated the advantage of phenotypic specialization in a cell population when the added benefits from two vital tasks are smaller than the cost for one cell to perform both tasks (Wahl, 2002). Theory has also shown that a population of cells capable of random phenotypic switching can have an advantage in a fluctuating environment (Kussell
and Leibler, 2005; Thattai and van Oudenaarden, 2004; Wolf et al., 2005). Recent experiments confirmed these predictions, showing that noise can aid survival in severe stress (Blake et al., 2006), can optimize the efficiency of resource uptake during starvation (Çağatay et al., 2009), and can optimize survival in specific fluctuating environments (Acar et al., 2008). Still, the optimality of stochastic cellular decision making in a well-defined environment does not guarantee that this behavior can evolve. This is because, usually, one of the stochastically chosen cellular states has lower direct fitness (West et al., 2007), rendering the switching strategy vulnerable to invasion by mutants that never switch into the less fit state but nevertheless reap the benefits of cohabitation with faithful switchers. This can be prevented by cheater control (West et al., 2006) or by the regular recurrence of detrimental environmental conditions that suppress or eliminate such mutants. Once stochastic switching became an evolutionarily stable strategy, such task-sharing decisions in clonal microbial populations (Bonner, 2003; Veening et al., 2008a) may have formed the bases of multicellular development. Therefore, the need for optimal resource utilization and survival in a changing environment may have been important driving forces behind the evolution and maintenance of cellular decision making across various domains of life, as suggested by the recent laboratory evolution of bet hedging (Beaumont et al., 2009). In the following, we will describe how approaches from molecular biology, nonlinear dynamics, and synthetic biology have been used to gain insight into the role of biological noise in cellular decision making, effectuated by a variety of molecular network structures in organisms of increasing biological complexity, including viruses, bacteria, yeast, lower metazoans, and mammals. 
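The advantage of random phenotypic switching in a fluctuating environment can be sketched with a toy calculation. This is a deterministic bookkeeping model written for illustration only, not the model of any of the cited theoretical papers; the two phenotypes ("grower" and "dormant"), the fitness values, the switching rate, and the environment schedule are all assumptions.

```python
import math


def growth_rate(switch_rate, environments, fitness, start):
    """Mean log growth per step for (grower, dormant) subpopulations."""
    grower, dormant = start
    log_growth = 0.0
    for env in environments:
        f_grower, f_dormant = fitness[env]
        grower, dormant = grower * f_grower, dormant * f_dormant
        # A fraction switch_rate of each phenotype flips to the other one.
        grower, dormant = (
            (1 - switch_rate) * grower + switch_rate * dormant,
            (1 - switch_rate) * dormant + switch_rate * grower,
        )
        total = grower + dormant
        log_growth += math.log(total)
        grower, dormant = grower / total, dormant / total  # renormalise
    return log_growth / len(environments)


# Assumed fitness per step as (grower, dormant) in each environment.
fitness = {"good": (2.0, 1.0), "stress": (0.0005, 1.0)}
# Assumed schedule: ten benign steps followed by one severe stress, repeated.
environments = (["good"] * 10 + ["stress"]) * 50

pure_grower = growth_rate(0.0, environments, fitness, start=(1.0, 0.0))
pure_dormant = growth_rate(0.0, environments, fitness, start=(0.0, 1.0))
switcher = growth_rate(0.05, environments, fitness, start=(1.0, 0.0))
print(pure_grower, pure_dormant, switcher)
```

Under these assumed numbers, a population that never switches either collapses during stress (pure growers) or never grows (pure dormant cells), whereas the switching population maintains a small dormant reserve and achieves a positive long-term growth rate.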
Viruses One of the earliest molecular choices made during the evolution of life on Earth may have been the environment-dependent decision to arrest replication. As the first replicators appeared in the primordial soup (Dawkins, 2006), it may have been advantageous to copy themselves rapidly only in favorable conditions, including an appropriate level of basic building blocks, temperature, acidity, radiation, and preferably no fellow competitors. Moreover, alliances between replicators and sensor molecules may have formed to ensure that replication occurred efficiently and accurately under the appropriate circumstances. Though we may never be certain about the specific events that took place as life began on our planet, viral infections probably offer some clues (Koonin et al., 2006). Viruses are among the simplest nucleic acid-based replicating entities, which presently can only multiply inside of the cells they parasitize. Nevertheless, viral decisions taking place in host cells are in every aspect similar to the bacterial, fungal, and metazoan cellular fate choices described in the subsequent sections, indicating that cellular decision making is a misnomer. In fact, ‘‘cellular’’ decisions are taken by more or less autonomous replicating systems that reside inside and manipulate the behavior of carrier cells to maximize the chance of their own propagation (Dawkins, 2006). A particularly well-studied virus is bacteriophage lambda (Ptashne, 2004), which preys on the bacterium Escherichia coli
and has served as a model for virology for more than half of a century. The infection cycle of this ‘‘coliphage’’ virus begins with attachment to the bacterial cell wall, followed by the injection of viral DNA into the host and the initiation of transcript synthesis. From this moment, two outcomes are possible. The infection either culminates with replication and virus assembly that causes host lysis, or it concludes with the integration of viral DNA into the bacterial chromosome followed by a prolonged period of lysogeny. This is a typical example of decision making at the subcellular level, as viruses with identical genomes infecting isogenic cells can either become lytic or lysogenic. Despite the apparent simplicity of the viral genome, the story of lambda phage decision making is still not completely written and may hold many surprises. In a series of papers starting in the 1960s (Ptashne, 1967), Mark Ptashne described two repressors (CI and Cro) that proved to be essential for the lysis-lysogeny decision of phage lambda (Figure 2A). CI and Cro are encoded from two divergent promoters (PRM and PR, respectively) and are controlled by three shared operator sites (OR1, OR2, and OR3) to which either CI or Cro dimers can bind but with different affinities (OR1 > OR2 > OR3 for CI and OR3 > OR2 > OR1 for Cro). CI repressor binding to OR1 has negligible effect on cI transcription, whereas CI binding to OR2 activates and CI binding to OR3 represses cI transcription. CI binding to any operator site represses cro transcription (Figure 2A). Cro represses both its own and cI expression but has a stronger effect on cI. Consequently, CI and Cro mutually repress each other, operating as a natural toggle switch (Gardner et al., 2000) with bistable dynamics (Figures 2B and 2C), augmented with autoregulatory loops. This regulatory structure inspired the first mathematical models of the lambda switch (Shea and Ackers, 1985). 
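How mutual repression plus molecular noise can send identical starting conditions to different stable states can be sketched with a stochastic simulation. The example below uses Gillespie's direct method on a generic two-gene toggle; it is not the lambda switch model of Shea and Ackers or any other published model. The species A and B merely stand in for two mutually repressing factors, and the Hill-type repression and all parameter values are assumptions.

```python
import random


def toggle_gillespie(seed, t_end=50.0, alpha=50.0, K=10.0, n=2):
    """Gillespie direct-method simulation of a symmetric toggle switch.

    A and B repress each other's synthesis; both are degraded with
    rate constant 1. Returns the copy numbers (a, b) at time t_end.
    """
    rng = random.Random(seed)
    a = b = 0
    t = 0.0
    changes = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    while True:
        rates = [
            alpha / (1.0 + (b / K) ** n),  # synthesis of A, repressed by B
            float(a),                      # degradation of A
            alpha / (1.0 + (a / K) ** n),  # synthesis of B, repressed by A
            float(b),                      # degradation of B
        ]
        t += rng.expovariate(sum(rates))   # waiting time to the next reaction
        if t > t_end:
            return a, b
        da, db = rng.choices(changes, weights=rates)[0]  # which reaction fires
        a, b = a + da, b + db


# Twenty runs that differ only in the random number stream.
outcomes = [toggle_gillespie(seed) for seed in range(20)]
a_high = sum(1 for a, b in outcomes if a > b)
b_high = len(outcomes) - a_high
print("A-high runs:", a_high, "B-high runs:", b_high)
```

Every run starts from the same empty state, yet some end with A dominant and others with B dominant, because early random reaction events decide which repressor gains the upper hand.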
Though the CI-Cro module is most commonly known as the core of the ‘‘lambda switch,’’ it is only the tip of the iceberg of regulatory interactions involved in the lysis/lysogeny decision (Oppenheim et al., 2005; Ptashne, 2004). Additional mechanisms include DNA loop formation that reinforces cro repression, regulation of cI expression by CII and CIII, and antitermination of cro transcript synthesis (Figure 2A). These components were included into a comprehensive stochastic model of the lambda switch (Arkin et al., 1998), which was also the first study to apply the Gillespie algorithm (Gillespie, 1977) for modeling a natural gene network. This seminal work pointed out how stochastic molecular events, originating from the random movement of cellular contents, can trigger decisions on a much larger scale, leading to divergent cellular fates. Stochastic decision making starts as soon as the first viral gene products appear in the cytoplasm. Cro gets a head start, but CI catches up soon, and both fluctuate due to random transcription-regulatory events. The race continues until the abundance of one of these molecules overwhelms the other, terminally flipping the lambda switch into one of two possible stable states (cro-on/cI-off or cI-on/cro-off). In addition to early stochastic events, many environmental factors can bias stochastic decision making and influence the outcome of infection, including the nutritional state and DNA damage response of the host cell, as well as the number of phages coinfecting the host cell (multiplicity of infection). Therefore, the lambda phage
Figure 2. Viral Decision Making (A) Gene regulatory network controlling the lambda phage lysis/lysogeny decision consists of the core repressor pair CI and Cro and a number of additional regulators, such as N and CII. Cro and CI mutually repress each other, and CI also activates itself from the OR2 operator site, which results in a structure of nested positive and negative feedback loops. The mutual regulatory effects of CI and Cro are annotated with the number of the OR site corresponding to each particular interaction. (B) Nullclines for CI and Cro, based on the model from Weitz and colleagues (Weitz et al., 2008), at a multiplicity of infection MOI = 2. Along the CI nullcline, there is no change in CI, and along the Cro nullcline, there is no change in Cro. Neither CI nor Cro changes in the points where the nullclines intersect, which represent steady states. The nullclines intersect in three distinct points, indicating that there are three steady states. (C) Potential calculated along the Cro nullcline, based on the Fokker-Planck approximation, $\phi = -2\int \frac{f - g}{f + g}\, d[\mathrm{CI}]$, wherein f and g represent CI synthesis and degradation along the Cro nullcline, respectively. Filled circles indicate stable nodes. The gray circle indicates that the middle state is a saddle (unstable along the Cro nullcline but stable along the CI nullcline). Molecular noise will force the system to transition between the two valleys, especially in the beginning of infection when transcripts and proteins are rare and noise is high. (D) The autoregulation of the Tat transcription factor from HIV was reconstituted by expressing both GFP and Tat from the LTR promoter, which is naturally activated by Tat. The internal ribosomal entry site (IRES) (Pelletier and Sonenberg, 1988) between the two coding regions ensures that GFP and Tat are cotranslated from the same mRNA template. 
(E) After being sorted based on their expression level as Off, Dim, Mid, and Bright, the cells followed different relaxation patterns: Off remained Off; Dim first trifurcated into Off, Dim, and Bright, and then the Dim peak gradually disappeared; Mid relaxed to Bright; and most of Bright remained Bright, with a small subpopulation relaxing to Low. (F) Control synthetic gene circuit without feedback. (G) After sorting, the control gene circuit had a much simpler relaxation pattern. Most cells were Low, which remained Low after sorting, whereas Dim cells mostly remained Dim, with a few of them relaxing to Off. These patterns were interpreted as the hallmarks of excitable dynamics.
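The potential in panel (C) has the form of an integral of (f - g)/(f + g) and can be evaluated numerically for any one-dimensional synthesis/degradation description. The sketch below does this for a hypothetical positively autoregulated gene; the Hill-type synthesis function, the linear degradation, and all parameter values are illustrative assumptions and are not those of the Weitz et al. model.

```python
def synthesis(x, basal=1.0, vmax=9.0, K=4.0, n=4):
    """Assumed synthesis rate f(x): basal expression plus Hill-type self-activation."""
    return basal + vmax * x**n / (K**n + x**n)


def degradation(x):
    """Assumed degradation rate g(x): first-order decay with rate constant 1."""
    return x


def integrand(x):
    f, g = synthesis(x), degradation(x)
    return (f - g) / (f + g)


def potential(xs):
    """phi(x) = -2 * integral of (f - g)/(f + g) dx, by the trapezoidal rule."""
    phi = [0.0]
    for x0, x1 in zip(xs, xs[1:]):
        area = 0.5 * (integrand(x0) + integrand(x1)) * (x1 - x0)
        phi.append(phi[-1] - 2.0 * area)
    return phi


xs = [i * 0.01 for i in range(1201)]  # concentrations from 0 to 12
phi = potential(xs)

# Interior local minima of phi are the valleys (stable steady states, f = g).
valleys = [
    xs[i] for i in range(1, len(xs) - 1)
    if phi[i] < phi[i - 1] and phi[i] < phi[i + 1]
]
print("valleys near:", valleys)
```

For these assumed parameters the potential has two valleys, one at low and one at high concentration, separated by a crest at the unstable steady state, mirroring the two-valley shape described in the caption.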
has a stochastic switch that is capable of hedging bets in a ‘‘smart,’’ environment-dependent manner, investing in both immediate and future expansions. The importance of intrinsic noise in the lambda switch was recently questioned by a number of research groups. First, it was shown that the host cell volume plays an important role in the decision, with larger cells being more likely to lyse (St-Pierre and Endy, 2008). This pointed to the concentration of infecting phages (rather than their absolute number) as the critical factor in the outcome of infection. Following theoretical predictions by
Weitz et al. (Weitz et al., 2008), Zeng and colleagues explained away even more stochasticity (Zeng et al., 2010), showing that the predictability of infection outcome improves if each phage is assumed to cast its own lysis/lysogeny vote, a unanimous vote being necessary for lysogeny. Importantly, stochasticity was reduced, but not eliminated, in this study, suggesting that, although further details of the phage-host system may be discovered that make the outcome of infection more predictable, intrinsic stochasticity stemming from the random nature of gene expression will remain an important factor to consider.
So, is noise a general factor in viral choices between lysis and dormancy? This seems to be the case, as suggested by recent work on the latency of human immunodeficiency virus (HIV) in CD4+ T cells (Weinberger et al., 2005). After HIV integrates into the host genome, active HIV infections almost always culminate in lysis. However, the site of HIV integration is highly variable and has a strong effect on the resulting expression dynamics. In rare occasions, the integrated virus becomes latent, creating an incurable reservoir that is the main obstacle preventing the elimination of the disease (Han et al., 2007). To determine the mechanism of HIV latency, Weinberger and colleagues focused on the positive autoregulatory loop of the Tat transcription factor as the key component in HIV decision making (Weinberger et al., 2005). The authors built two synthetic gene constructs, the first of which coexpressed the green fluorescent protein (GFP) with Tat from the long terminal repeat (LTR) promoter (positive feedback, Figure 2D), whereas the second consisted of GFP alone transcribed from the same promoter (no feedback, Figure 2F). After integrating these constructs into the genome, the authors monitored the dynamics of GFP expression over several weeks after sorting CD4+ T cells by their fluorescence as either Off, Dim, Mid, or Bright. The relaxation of these sorted cell populations over time (Figures 2E and 2G) was interpreted as a signature of excitable dynamics, when cells perturbed from the stable Off state undergo transient excursions into the Bright regime, from which they return to the Off state. Remarkably, this behavior depended on the site of HIV integration (because most clonal populations initiated from a Bright cell remained Bright, and all Off clones remained Off). Only a small subset of clones exhibited excitable dynamics, suggesting that excitability requires weak basal LTR promoter activity. 
These findings were in agreement with a simple mathematical model that captured the experimentally observed behavior of these constructs and identified preintegration transcription as the stochastic perturbation that causes the spikes in Tat expression. Further work showed a lack of cooperativity in the response of the LTR promoter to Tat and a rightward shift in the autocorrelation function of GFP expression due to positive feedback (Weinberger et al., 2008), confirming the earlier conclusions that Tat autoregulation does not induce bistability (Weinberger et al., 2005). Instead, futile cycles of acetylation/deacetylation of Tat en route to the LTR promoter act as a dissipative ‘‘resistor,’’ weakening autoregulation and reducing Tat expression to basal levels. The fact that excitable HIV integration clones readily respond to a number of immune response-related external factors suggests that these exceptional integrants may provide the pool of latent HIV infection in resting memory T cells. When highly active antiretroviral therapy eliminates the productive HIV pool, these latent but excitable viruses wait for their chance to reappear as a new infection. In conclusion, these studies on lambda phage and HIV suggest that viral choices between replication and latency may, in general, be stochastic, driven by random molecular noise within networks characterized by bistable or excitable dynamics. This hints at the possibility that some of the most studied cellular processes such as DNA replication may be based on stochastic decision making inherited from ancient biomolecular circuits, e.g., that autonomously dictate the length of the G1 phase before cell cycle Start (Di Talia et al., 2007). Moreover, these studies on
viruses support the idea that ‘‘cellular’’ decisions actually occur at the level of intracellular molecular networks. The outcome of these stochastic decisions is an environment-dependent balance between lysis and lysogeny within viral populations, faithfully encoded by the viral genome and the host environment. Therefore, cellular decision making constitutes a very simple mechanism for pattern formation that does not require cell-cell interactions or intercell communication and can therefore operate from the lowest to the highest levels of biological complexity (from viruses to multicellular eukaryotes), as discussed below. Bacteria Among microbes used in the study of unicellular development, Bacillus subtilis leads the pack. Besides its easy genetic manipulation, the main reason for the popularity of this soil bacterium is the variety of developmental choices that it assumes during starvation (Lopez et al., 2009). As nutrients become limiting, B. subtilis gears up to differentiate into spores—nongrowing capsules that are highly resistant to a variety of stresses and starvation—but without a rush. In fact, these bacteria take every possible opportunity to delay sporulation of the entire clonal cell population by exploring a number of alternative options, including extracellular matrix production, motility, cannibalism, nutrient release through cell lysis, cell growth arrest, and DNA uptake (competence). Cells uncommitted to sporulation start growing as soon as one of these alternative strategies enables them to do so or as soon as nutrients become available. One particular B. subtilis cell fate decision that has attracted much attention recently is the transition to competence, when cells take up extracellular DNA and use it as food or perhaps to integrate it into the genome as a mechanism of increased evolvability under stress (Galhardo et al., 2007). 
During starvation, only a limited percent of the clonal bacterial population becomes competent, a decision dictated by the master regulator ComK that activates the genes involved in this developmental program, including itself (Figure 3A). ComK levels are controlled by the protease complex MecA/ClpC/ClpP, which also binds ComS, a factor that is capable of preventing ComK degradation through competitive binding to the protease. Because comS is repressed during competence, these interactions form a negative feedback loop around comK. Süel and colleagues developed a mathematical model, showing that the nested positive and negative feedback loops enable excitable dynamics (Figures 3B and 3C), generating pulses of ComK protein expression and episodes of competence (Süel et al., 2006). Each of these episodes starts with a transient increase in ComK levels that is amplified through autoregulation, leading to a quick rise to maximal ComK protein expression and transition to competence. This, in turn, leads to comS repression, enabling the protease complex to degrade ComK, terminating the ComK pulse and the episode of competence. If ComK controls entry into and ComS controls exit from competence, then they should affect different aspects of these transient differentiation events. This was indeed the case, as found by controlled ComK and ComS protein overexpression (Süel et al., 2007). High basal comK expression increased the frequency of competence epochs until the point in which the cells remained permanently competent. On the other hand,
Figure 3. Competence Initiation in B. subtilis (A) The gene regulatory network controlling entry into competence consists of the master regulator ComK and its indirect activator, ComS. ComK activates its own expression, and ComS is downregulated during competence, which results in a structure of nested positive and negative feedback loops. Regulatory interactions mediating positive and negative feedback are shown in red and blue, respectively. Arrowheads indicate activation; blunt arrows indicate repression. (B) Nullclines for ComK and ComS, based on the model from Süel et al. (2006). The nullclines intersect in three distinct points, indicating that there are three steady states. (C) Potential calculated along the nullcline d[ComS]/dt = 0, based on the Fokker-Planck approximation, $\phi = -2\int \frac{f - g}{f + g}\, d[\mathrm{ComK}]$, wherein f and g represent comK synthesis and degradation, respectively, along the ComS nullcline. The filled circle on the left indicates a stable steady state. The gray circles in the middle and on the right indicate saddle points: the middle one is unstable along the ComS nullcline (it is sitting on a ‘‘crest’’ in the potential), whereas the one on the right is unstable along the ComK nullcline. A small perturbation (due to molecular noise) will drive ComK expression from the stable steady state near the other two steady states, initiating transient differentiation into competence, after which the system returns to the steady state on the left.
high comS basal expression prolonged the time spent in competence. To further establish the mechanism of competence initiation, the authors ingeniously inhibited cell division while DNA replication continued unaltered. This caused the cell volume to increase, leaving the average ComK concentrations unaffected
while lowering the noise in ComK protein expression. In cells of increasing length (and consequently, decreasing ComK noise), the rate of competence initiation dropped substantially. Consequently, ComK noise plays a crucial role in competence initiation by elevating subthreshold levels of ComK toward a critical point at which positive feedback takes effect to initiate periods of competence, in a manner similar to stochastic resonance (Wiesenfeld and Moss, 1995). Likewise, ComS protein expression noise was found crucial for controlling not just the length, but also the variability of competence episodes. A synthetic gene circuit with equivalent average dynamics to the natural one had much lower variability of competence episodes, which severely compromised the DNA uptake capacity of the cell population (Çağatay et al., 2009). The crucial role of noise in competence initiation was independently confirmed by another group (Maamar et al., 2007) after successfully decoupling ComK protein expression noise and mean. Although they studied ComK dynamics over a shorter time, Maamar et al. found that entry into competence occurred predominantly during a transient rise in ComK expression around the time of entry into stationary phase. Competence is a bacterial attempt to delay complete sporulation of the entire clonal cell population. However, if no cells decide to sporulate while the environment continues to worsen, the population will have a decreased chance of survival. Therefore, all bacteria must eventually sporulate, which they do, but only gradually over several days. A recent account of cell fate decision in sporulation conditions reported on cell population size and individual cell length in growing B. subtilis microcolonies (Veening et al., 2008b). 
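Why a larger cell volume lowers noise without changing the mean concentration can be seen from a short worked calculation. It assumes, purely for illustration, that the copy number of a protein is Poisson-distributed; the source states only that noise was lowered, and real ComK statistics need not be Poissonian. The symbols n, V, and c are introduced here and do not appear in the original.

```latex
% Assumption: copy number n in a cell of volume V is Poisson-distributed
% with mean <n> = cV, where c is the (fixed) mean concentration.
\begin{align}
  \left\langle \frac{n}{V} \right\rangle &= \frac{cV}{V} = c, \\
  \mathrm{CV} &= \frac{\sqrt{\mathrm{Var}(n)}}{\langle n \rangle}
               = \frac{\sqrt{cV}}{cV}
               = \frac{1}{\sqrt{cV}} .
\end{align}
```

Under this assumption, doubling the cell volume at constant concentration reduces the coefficient of variation by a factor of the square root of 2, so longer cells sample the subthreshold ComK fluctuations less vigorously.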
After the initial exponential phase, the authors observed a period of slow bacterial growth (diauxic shift), later followed by complete growth arrest for approximately half of a day. By measuring the growth rate and morphology of individual cells, three distinct cell fates were identified: spores, vegetative cells, and lysing cells. Interestingly, only the vegetative cells grew during the diauxic phase, accounting alone for all of the growth observed during this period. Cells that later formed spores or lysed did not grow, indicating that their cellular fates bifurcated much before their terminal phenotypes could be determined. This phenotypic bifurcation was independent of cell age but was consistent within ‘‘cell families’’ defined as a cell and all its descendants, implying ‘‘transgenerational epigenetic inheritance’’ (Jablonka and Raz, 2009) of this decision. These heritable cell fate decisions correlated with transcription from a sporulation promoter and were eliminated when the phosphorelay feedback through the master sensor kinase for sporulation was disrupted, demonstrating the importance of posttranslational (rather than transcriptional) positive feedback in the inheritance of cellular fate. Observing the frequency of stochastic cellular decisions in clonal bacterial populations brings up the interesting question: is there a role for cellular decision making as bacteria join forces in a population-level effort such as in quorum sensing, the ability of bacteria to detect their density and thereby orchestrate population-level behaviors such as luminescence or virulence? This question is currently being addressed using Vibrio harveyi as a model organism. As V. harveyi cells divide and their density exceeds a threshold, they undergo a remarkable transition and
become bioluminescent, which is made possible by their ability to synthesize and detect specific small signaling molecules (autoinducers) through quorum sensing (Waters and Bassler, 2005). Growing cell populations produce more and more autoinducer, which becomes concentrated and turns on bioluminescence, in addition to a number of other functions related to multicellular behavior. Whether all or only some individual cells undergo the decision triggered by quorum sensing remains an important open question that will soon be answered thanks to recent efforts to measure quorum sensing-related gene expression at the single-cell level in newly engineered V. harveyi strains. So far, gene expression measurements for the master quorum-sensing regulator LuxR (Teng et al., 2010) and a small RNA controlling LuxR expression revealed relatively low but autoinducer-dependent noise (Long et al., 2009), which may imply that the V. harveyi quorum-sensing circuit has evolved to reduce noise and bacterial individuality while transitioning to population-level behavior. Indeed, multiple nested negative feedback loops have been identified along the signaling cascade connecting autoinducer receptors to LuxR (Tu et al., 2010), which are network structures capable of noise reduction (Becskei and Serrano, 2000; Nevozhay et al., 2009). Other examples of cellular decision making in bacteria are the activation of the lactose operon in E. coli and bacterial persistence (phenotypic switching of bacteria to an antibiotic-tolerant state). The first of these has a history of more than five decades (Novick and Weiner, 1957) and will not be discussed here. Regarding bacterial persistence, some critical information is still missing. Persistence of E. coli cells has been observed at the single-cell level (Balaban et al., 2004), but the underlying network and molecular mechanisms may be highly complex and are currently unknown. 
Conversely, a bistable stress response network has been proposed to underlie persistence in Mycobacteria (Sureka et al., 2008; Tiwari et al., 2010), but the measurements to observe persistent cells and link them to this network have yet to be performed. In summary, bacteria are masters of cellular decision making, which enables them to hedge bets in a fluctuating, often stressful environment. This may explain their presence in the most extreme and unpredictable environments. Unlike viruses, which typically decide between lysis and lysogeny, genetically identical bacteria can select their fates randomly from a spectrum of multiple options. Fates with lowest direct fitness (such as the spore state) are entered gradually, with a delay, while a variety of alternative options are explored. Bacterial cell decisions involve noisy networks with feedback loops that are capable of bistable or excitable dynamics. Unlike viruses, bacteria can combine cellular decision making with other mechanisms (such as cell-cell communication) to achieve more complex population-level behaviors. Cellular decision making appears suppressed when cell-cell communication becomes prominent (as in quorum sensing), suggesting that microbial individuality is undesired when genetically identical bacteria assume multicellular behaviors. The above examples indicate that many bacterial species are capable of population-level behaviors. Moreover, these examples suggest that the simplest forms of multicellular behavior do not require physical contact or communication between cells.
Yeast The budding yeast Saccharomyces cerevisiae was the first organism for which the noise of thousands of fluorescently tagged proteins expressed from their native promoters was measured (Bar-Even et al., 2006; Newman et al., 2006). Many yeast genes were found to be significantly noisier than expected based on Poissonian protein synthesis and degradation, suggesting that gene expression bursts may cause the elevated noise of certain genes, which may be beneficial and under selection. These noisy genes had a tendency to be associated with stress responses (Gasch et al., 2000) and often contained a TATA box in their core promoter. Accordingly, TATA box mutations were found to diminish gene expression noise, which lowered the chance of survival in severe stress from which the gene’s protein product offered protection (Blake et al., 2006). Taken together, these results suggested that yeast cells carry an arsenal of genes with unexpectedly noisy expression, supplying the noise needed for phenotypic diversification, which can benefit the population in a fluctuating, often stressful environment. The galactose uptake system is a relatively well-studied example of a noisy environmental response network. Yeast cells show bimodal expression of galactose uptake (GAL) genes when exposed to a mixture of low glucose and high galactose, indicating that cells decide stochastically between utilizing either the limited amount of glucose or growing on galactose (Biggar and Crabtree, 2001). On the other hand, the expression of GAL genes is, in general, more uniform in the absence of glucose when grown on galactose alone or on galactose mixed with raffinose and glycerol. How is it possible for a gene network to generate uniform or bimodal (noisy) expression across the cell population, depending on the stimulus? Essentially, the GAL molecular circuitry consists of three feedback loops. 
Two of these feedback loops are positive and involve the galactose permease Gal2p and the signaling protein Gal3p. The third feedback loop is negative and involves the inhibitor Gal80p. All three molecules (Gal2p, Gal3p, and Gal80p) are under the control of the activator Gal4p, and they also regulate Gal4p activity and galactose uptake (Figure 4). To understand how this network structure affects cellular decision making, each of these feedback loops was individually disrupted (Acar et al., 2005), and the pattern of GAL gene expression across the cell population was examined after transferring the cells from no galactose- or high galactose-containing medium to various intermediate galactose concentrations. The wild-type strain, with all three feedback loops intact, had history-dependent gene expression a day after transfer, depending on the original growth condition. Specifically, wildtype cells transferred from high galactose had unimodal GAL expression tracking the galactose concentration, whereas those transferred from low galactose had bimodal expression, indicating that only a subpopulation of cells made the choice to take up galactose. GAL2 deletion had a minimal effect on the GAL expression pattern compared to wild-type cells. On the other hand, disruption of the Gal3p-based positive feedback loop resulted in unimodal GAL gene expression regardless of the conditions prior to transfer, indicating that the cells lost their capacity of decision making. Finally, disruption of the
Figure 4. The Galactose Uptake Network in S. cerevisiae (A) Regulatory network controlling galactose uptake. Regulatory interactions mediating positive and negative feedback are shown in red and blue, respectively, and the regulatory interaction that participates in both positive and negative feedback loops is shown in light blue. Solid lines indicate transcriptional regulation; dashed lines indicate nontranscriptional regulation (for example, Gal80p binds to Gal4p and represses Gal4p activator function on GAL promoters). Arrowheads indicate activation; blunt arrows indicate repression. (B) Gal3p synthesis (blue lines) and degradation (red line) rates as functions of Gal3p concentration, for three different galactose concentrations. (C) Potential based on the Fokker-Planck approximation, $\phi = -2\int \frac{f - g}{f + g}\, d[\mathrm{Gal3p}]$, wherein f and g represent Gal3p synthesis and degradation, respectively. There is a stable steady state on the left side of the surface at all galactose concentrations. At sufficiently high galactose concentrations, an
Gal80p-based negative feedback loop resulted in unimodal, low GAL expression for cells transferred from no galactose, whereas cells transferred from high galactose had a bimodal distribution. Overall, these results indicate that the Gal3p- and Gal80p-based feedback loops play critical roles in cellular decision making and history dependence of GAL expression. The gene expression patterns observed by Acar and colleagues (Acar et al., 2005) bring up an important concept: cellular memory. Considering that cells make stochastic decisions, how long do they stick to their choices? This question can be reformulated in terms of escape rates and addressed theoretically, as follows: given that a cell resides in a potential well on Waddington’s landscape (Figure 1), how long does it take for it to escape under the influence of noise to a nearby well? Theory predicts that the chance of escape depends on two factors: noise strength and the height of the barrier that needs to be surpassed in order to escape (Hänggi et al., 1984). Noise facilitates escape, whereas a tall barrier hinders it. Based on the noise strength and the “geography” of the potential shown in Figure 4, the authors predicted that, by controlling GAL80 expression, they could prolong or shorten the maintenance of high and low GAL expression states in cells with disrupted negative feedback. This was then confirmed experimentally (Acar et al., 2005). Another remarkable case of yeast cell decision making was described by Paliwal and colleagues, who used clever microfluidic chip design to study the response of individual a mating-type yeast cells to the α pheromone (Paliwal et al., 2007). Pheromone was supplied artificially so as to establish a spatial gradient in which a high number of cells exposed to various pheromone concentrations could be observed. Normally, the pheromone serves as a cue to direct a cell elongation (shmooing) toward a mating partner of opposite type (α). 
Cells exposed to no pheromone or high pheromone behaved in a uniform fashion (all cells budding and shmooing, respectively). However, a very different scenario emerged for cells that were exposed to identical intermediate pheromone concentrations: a mixture of budding, cell cycle arrested, and shmooing phenotypes were observed, demonstrating cellular decision making. Shmooing cells had significantly higher expression of the transcription factor Fus1p, indicating that at least one observed phenotype was attributable to bimodal gene expression. The network that is responsible for Fus1p activation consists of a mitogen-activated protein kinase (MAPK) pathway that encompasses multiple positive feedback loops, prime candidates for inducing bimodal FUS1 expression. Indeed, disruption of these feedback loops made FUS1 expression and the response to pheromone more uniform across yeast cell colonies, supporting the idea that positive autoregulation can induce cellular decision making. These examples indicate that cellular decision making is widely utilized by yeast cells to maximize the propagation of their genome in a changing environment. A prominent role of feedback regulation in cellular decision making is emerging from
additional steady state appears (deep well on the right). As galactose concentration is slowly increased, cells can end up in either potential well (cellular decision making). Moreover, molecular noise can move cells from one potential well to the other, even in constant galactose concentration.
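The potential described in (C) can be sketched numerically. The minimal Python example below assumes a hypothetical Hill-type synthesis rate f and a linear degradation rate g; the functional forms and all parameter values are illustrative assumptions, not the measured Gal3p rates. It integrates φ = −2∫[(f − g)/(f + g)] d[Gal3p] on a grid and locates the potential wells and the barrier between them.

```python
import numpy as np

# Hypothetical rate laws (illustrative assumptions, not fitted to GAL data).
def synthesis(x, basal=0.1, vmax=1.0, K=0.5, n=4):
    """Basal production plus cooperative positive feedback."""
    return basal + vmax * x**n / (K**n + x**n)

def degradation(x, rate=1.0):
    """First-order degradation/dilution."""
    return rate * x

def potential(x, f, g):
    """phi(x) = -2 * integral of (f - g)/(f + g), via a cumulative trapezoid rule."""
    drift = (f(x) - g(x)) / (f(x) + g(x))
    steps = 0.5 * (drift[1:] + drift[:-1]) * np.diff(x)
    return -2.0 * np.concatenate(([0.0], np.cumsum(steps)))

def interior_extrema(x, phi):
    """Return (wells, barriers): grid positions of local minima and maxima."""
    mid, left, right = phi[1:-1], phi[:-2], phi[2:]
    wells = x[1:-1][(mid < left) & (mid < right)]
    barriers = x[1:-1][(mid > left) & (mid > right)]
    return wells, barriers

x = np.linspace(0.0, 2.0, 2001)
phi = potential(x, synthesis, degradation)
wells, barriers = interior_extrema(x, phi)
print("wells:", wells, "barrier top:", barriers)
```

With these assumed parameters the potential has two wells separated by one barrier. In escape-rate theory, a higher barrier relative to the noise strength implies a longer residence time in a well, which is the intuition behind noise-driven transitions between expression states.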
these examples, although other regulatory mechanisms (such as epigenetic regulation) can also play a role (Octavio et al., 2009). As many genes are noisy when yeast cells grow in suspension, it is interesting to ask how noise and cellular decision making are regulated and exploited during the transition to population-level behaviors such as flocculation due to quorum sensing in yeast cell populations (Smukalla et al., 2008). Yeast cells carry a primitive version of the molecular arsenal utilized during metazoan development, such as homeodomain proteins, morphogens, and the apoptosis pathway. Is noise in these pathways suppressed or elevated in yeast compared to higher eukaryotes? Answering these questions may yield important insights into the regulation of cellular decision making in metazoan development.

Lower Metazoans

Animals are compact multicellular organisms that grow out from a single zygote cell following a complex embryonic developmental program. During development, increasingly differentiated cell types emerge through sequential rounds of cell division, giving rise from about one thousand (Caenorhabditis elegans) to millions (Drosophila melanogaster) or tens of trillions (humans) of isogenic cells in a fully developed animal. Moreover, these expanding and diversifying cell subpopulations perform remarkably well-defined movements in space and time, such that they arrive at appropriate locations relative to each other, ready to perform their function in the adult animal (Goldstein and Nagy, 2008). Importantly, a few cells embed themselves into specific niches and remain partially undifferentiated, thereby becoming adult stem cells that are capable of replacing differentiated cells that are lost during adult life. 
The tremendous population expansion that cells undergo during embryonic development poses a serious danger of error amplification, implying that stochastic cellular decision making should be less common than in unicellular organisms, and control mechanisms should exist to suppress it during development (Arias and Hayward, 2006). Without proper control, a random switch to an incorrect cell fate in the wrong place or at the wrong time could have detrimental consequences for the developing embryo. For this reason, highly stochastic cell fate choices may be restricted to specific cell types and developmental stages, such as the differentiation of adult and embryonic stem cells or the differentiation of cells whose precise location is unimportant (such as retinal patterning and hematopoiesis). Given the omnipresence of noise, how precise can animal development be, and what noise control mechanisms are utilized? These questions were addressed recently by monitoring the spatial expression pattern of the gap gene hunchback in single D. melanogaster nuclei in response to the morphogen Bicoid (Figures 5A and 5B), which is asymmetrically deposited by the mother to the anterior pole of the egg (Gregor et al., 2007). The fertilized fruit fly zygote initially does not separate into individual cells, allowing Bicoid to freely diffuse away from this pole and create an exponential anterior-posterior gradient along the dividing nuclei. Consequently, single nucleus-wide sections perpendicular to the anterior-posterior axis in the developing embryo will have practically identical, exponentially decreasing morphogen concentrations (Figure 5B), with a 10%
Figure 5. Cell Fate Specification during Lower Metazoan Development (A) The morphogen Bicoid regulates hunchback expression during fruit fly development, setting up the scene for subsequent patterning of the embryo. (B) Bicoid and Hunchback concentrations along the anterior-posterior axis of the fruit fly embryo (length: 500 μm), according to the measurements by Gregor and colleagues (Gregor et al., 2007). The Bicoid concentration (red) is exponentially decreasing toward the posterior end, with a length constant of 500 μm, and is “read out” by Hunchback (blue) with a 10% relative error rate according to the average dose-response relationship $\mathrm{Hb}/\mathrm{Hb}_{\max} = (\mathrm{Bcd}/\mathrm{Bcd}_{1/2})^{5} / [1 + (\mathrm{Bcd}/\mathrm{Bcd}_{1/2})^{5}]$. (C) Gene regulatory network controlling intestinal cell fate specification during Caenorhabditis elegans development.
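As a rough illustration of how the dose-response relationship in (B) links Hunchback noise to the precision of the Bicoid readout, the sketch below evaluates the Hill function with exponent 5 and propagates a small error through it to first order. The function names and the first-order error propagation are assumptions made for illustration; this is not the analysis procedure of Gregor and colleagues.

```python
def hunchback_response(bcd_rel, hill=5):
    """Hb/Hbmax as a function of Bcd/Bcd_half (Hill function from Figure 5B)."""
    r = bcd_rel ** hill
    return r / (1.0 + r)

def log_sensitivity(bcd_rel, hill=5):
    """d(Hb/Hbmax) / d ln(Bcd), which equals hill * h * (1 - h) for a Hill function."""
    h = hunchback_response(bcd_rel, hill)
    return hill * h * (1.0 - h)

def inferred_bicoid_error(sigma_hb, bcd_rel, hill=5):
    """First-order estimate of sigma_Bcd/Bcd from Hb noise given in units of Hbmax."""
    return sigma_hb / log_sensitivity(bcd_rel, hill)

# At the midpoint (Bcd = Bcd_half) the response is half-maximal and steepest.
print(hunchback_response(1.0))            # 0.5
print(log_sensitivity(1.0))               # 1.25
print(inferred_bicoid_error(0.125, 1.0))  # 0.1
```

Under these assumptions, Hunchback noise of 12.5% of the maximal level at the midpoint would correspond to a 10% relative error in Bicoid. Away from the midpoint the sensitivity falls, so the same absolute Hunchback noise would imply a larger inferred Bicoid error.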
drop between neighboring sections, regardless of their location in the embryo (Gregor et al., 2007). This concentration change is successfully and reliably detected by neighboring nuclei, as indicated by their gene expression pattern (Holloway et al., 2006). How is it possible to achieve this precision? Among other genes, hunchback expression represents a critical readout of Bicoid concentration (Figure 5A), restricting future segments in the larva, and later the adult fly, to their appropriate locations. Hunchback expression levels showed sigmoidal morphogen dependence, indicating highly cooperative activation by Bicoid (Figure 5B). More importantly, Hunchback had remarkably low noise levels in sets of nuclei exposed to identical morphogen concentrations, with a noise peak corresponding to the steepest region of the Hunchback dose response, in which the coefficient of variation was about 20%. Assuming that hunchback expression noise was originating from Bicoid fluctuations, the authors used the Bicoid-Hunchback dose-response data to infer the noise in Bicoid concentration, as perceived by individual nuclei, and found a U-shaped error profile along the anterior-posterior axis, with a minimum coefficient of variation of 10%, consistent with earlier work (Holloway et al., 2006). This indicates that cellular decision making is strongly suppressed while setting up hunchback expression along the embryo in response to Bicoid. Individual nuclei have merely
10% autonomy in deciding what Bicoid concentration is in their surroundings and setting up the appropriate response. Seeking to understand how neighboring nuclei could reliably detect a 10% drop in Bicoid concentration, Gregor and coworkers estimated the averaging time necessary to reduce the error that individual nuclei make in estimating Bicoid concentrations, relying solely on stochastic Bicoid binding/dissociation events to/from its DNA-binding sites. The results were strikingly inconsistent with the temporal averaging hypothesis, requiring nearly 2 hr of averaging to reach 10% relative error. Looking for alternatives, the authors asked whether spatial averaging could also contribute to noise reduction. Measuring the spatial autocorrelation of Hunchback concentration fluctuations around the mean revealed that nuclear communication indeed occurs over approximately five nuclear distances, reducing the averaging time to a single nuclear cycle (3 min). In summary, sets of neighboring nuclei talk to each other and jointly accomplish quick and accurate estimates of the local Bicoid gradient. The identity of the mediator for this nuclear communication remains elusive. To study spatiotemporal patterns of expression for several genes during a later developmental stage (mesodermal patterning), another group applied quantitative in situ hybridization followed by automated image processing in hundreds of fruit fly embryos (Boettiger and Levine, 2009). Contrary to the high precision of Hunchback response to Bicoid (Gregor et al., 2007), several genes had variable, ‘‘dotted’’ expression across the developing premesodermal surface, indicating that gene expression can be noisy even during multicellular development. 
This noise was, however, transient, as by the end of the mesodermal patterning phase, all cells expressed these genes at maximal level, indicating that cells can choose autonomously the time of their activation during mesodermal patterning but have no freedom to choose their final expression level at the end of this period. Importantly, another subset of genes behaved differently from their noisy peers and reached their full expression in concert, over a relatively short timescale. Seeking to identify mechanisms underlying this type of ‘‘synchrony’’ for this second subclass of genes, the authors found that their expression was typically regulated through a stalled polymerase. Moreover, one of the low-noise genes, dorsal, had to be present in two copies for maintaining the synchrony and low noise of other genes from the second subclass. The few genes that still maintained low noise after deleting one dorsal copy were found to have shadow enhancers—distal sequences involved in gene activation, which apparently ensure the robustness and reliability of expression for a few highly critical developmental genes. These findings indicate that noisy gene expression and stochastic cell fate decisions would be the default even during metazoan development if intricate regulatory mechanisms did not exist to suppress these variations, ensuring reliable patterning. One developmental process that fully exploits cellular decision making is the patterning of the fly’s eye. Compound fly eyes consist of hundreds of ommatidia, each of which harbor eight photoreceptors, two of which (R7 and R8) are responsible for color vision. Based on rhodopsin (Rh) expression in these photoreceptors, the corresponding ommatidia can become pale or
yellow. The pale/yellow choice occurs in the photoreceptor R7 of each ommatidium: if R7 expresses Rh3, then the ommatidium becomes pale, whereas if it expresses Rh4, the ommatidium becomes yellow. R7’s choice is then transferred to R8 and stabilized through a positive feedback loop between the regulators warts and melted. Pale and yellow ommatidia are randomly localized and make up 30% and 70% of the fly eye, respectively, suggesting that their positioning results from stochastic cell fate choices. This random patterning can be abolished by the deletion or overexpression of the transcription factor spineless, which changes the retinal mosaic into uniformly pale and yellow, respectively (Wernet et al., 2006). Fruit fly development suggests that gene expression noise and stochastic cell fate choices are carefully controlled and often suppressed, except when they are not disruptive for developmental patterning (Boettiger and Levine, 2009) or when they are exploited to assign random cell fates with desired probabilities (Wernet et al., 2006). What happens if noise suppression fails and fluctuations escape from control? This was examined by monitoring mRNA expression in single cells during C. elegans development (Raj et al., 2010) in a regulatory cascade composed of multiple feed-forward loops controlling the expression of elt-2, a self-activating transcription factor that is critical for intestinal cell fate specification (Figure 5C). After the 65-cell stage, elt-2 expression was high in all cells of all wild-type worm embryos. However, this uniform expression pattern became variable from embryo to embryo and bimodal within individual embryos after mutation of the transcription factor skn-1, which sits at the top of the regulatory hierarchy in Figure 5C, and caused lack of intestinal cells in some, but not all, embryos. 
Such phenomena, in which genetically identical individuals carrying the same mutation show either a disrupted or a wild-type phenotype, are called partial penetrance. Counting individual mRNAs in all cells of hundreds of embryos, Raj et al. observed sequential activation of the genes in Figure 5C during development from the top toward the bottom of the hierarchy, with med-1/2 exhibiting an early spike of expression, accompanied by a wider end-3 spike and a prolonged but still transient high expression period of end-1. The outcome of these gene expression events was high and stable elt-2 expression and proper intestinal cell fate specification. By contrast, in the skn-1 mutant, the expression of all genes was diminished or absent, and the majority of embryos had practically no elt-2 expression. Moreover, end-1 expression was highly variable within individual embryos, indicating that skn-1 mutations relieve pre-existing noise suppression, thereby allowing stochastic cell fate decisions to occur. Downregulation of the histone deacetylase hda-1 partially rescued the skn-1 mutant phenotype, indicating that chromatin remodeling was one source of end-1 noise unveiled in skn-1 mutant embryos. However, deletion of upstream transcription factors other than skn-1 (i.e., med-1/2, end-3) did not cause a comparably detrimental reduction of end-1 levels. Taken together, these data suggest that these intermediate transcription factors act in a redundant fashion, buffering noise in the system and ensuring sufficiently high end-1 expression, which can then switch the elt-2 positive feedback loop to the high expression state, ensuring reliable intestinal cell fate specification.
Figure 6. Embryonic Stem Cell Decision Making in Mammals (A) The Nanog-Oct4 gene regulatory network primes ESC differentiation. Regulatory interactions mediating positive and negative feedback are shown in red and blue, respectively. Regulatory interactions that participate in both positive and negative feedback loops are shown in light blue. Arrowheads indicate activation; blunt arrows indicate repression. (B) Nullclines for Nanog and Oct4, based on the model from Kalmar et al. (Kalmar et al., 2009). The nullclines intersect only once, corresponding to a single stable steady state. (C) Potential calculated along the nullcline d[Nanog]/dt = 0, based on the Fokker-Planck approximation. The filled circle on the right indicates the only stable steady state. The gray shaded area is inaccessible because it corresponds to nonphysical solutions. The system undergoes transient excursions to the left (low Nanog concentrations) under the influence of molecular noise. This will prime the ESCs for differentiation if appropriate signals are present.
These examples together indicate that the noise of certain genes is suppressed and buffered by a variety of mechanisms (such as spatial and temporal averaging, stalled polymerases, and redundant regulation) during the development of lower metazoans. Consequently, cellular decision making is generally suppressed unless specifically required for developmental patterning (as for the ommatidia of the compound fly eye) or unless it is harmless (does not interfere with the execution of the overall developmental program). Disruption of the noise control mechanisms unmasks noise and can have detrimental effects on the development of the organism. Noise control during development may resemble the apparent suppression of cellular individuality during quorum sensing, which triggers population-wide behavior in microbes. These and similar open questions can be properly addressed in the context of social evolution theory (West et al., 2006). On the experimental side, much remains to be discovered about the consequences of “letting noise loose” during development. For example, once the factor that is responsible for spatial averaging across fruit fly nuclei (Gregor et al., 2007) is identified, it would be interesting to examine how fly development tolerates the inhibition of this internuclear communication.

Mammals

Embryonic development is highly conserved among mammals: after a few divisions of the fertilized egg, the resulting cells quickly advance to the blastocyst stage, which manifests as a spherical trophectoderm surrounding the inner cell mass. The inner cell mass consists of pluripotent embryonic stem (ES) cells that are capable of differentiating into any cell type in the future organism. Therefore, efficiently isolating and maintaining ES cells in laboratory conditions holds exceptional potential for future medical applications. 
However, to truly exploit the pluripotency of stem cells, it is essential to understand and control the processes underlying their differentiation into various tissues. Moreover, the recent success of reverting differentiated cells into induced pluripotent stem (iPS) cells (Takahashi and Yamanaka, 2006) poses further
questions about the efficiency and stability of this reversal. Is differentiation into specific cell types solely the result of cellular decision making, or is it somewhat controllable? To what degree is differentiation reversible, and can the rate of induced pluripotency be increased? And what is the role of noise in pluripotency? Nanog is a critical pluripotency marker whose expression is lost during ES cell differentiation, and it is maintained at a high level only in pluripotent cells. Following in the footsteps of Chambers et al., who showed stochastic Nanog expression corresponding to attempts of ES cell differentiation (Chambers et al., 2007), Kalmar et al. monitored single ES cells and embryonal carcinoma (EC) cells to better understand Nanog dynamics (Kalmar et al., 2009). Both cell lines had a surprisingly strong, bimodal heterogeneity of Nanog expression that involved transitions between the high and low expression states. Consistent with Nanog’s function, cells with low expression responded better to differentiation signals. Analyzing the dynamics of the gene regulatory network controlling Nanog expression (Figure 6A), the authors suggested that the system was excitable rather than bistable, giving rise to a small ES cell subpopulation with low Nanog expression through occasional random excursions from the high to the low expression state (Figures 6B and 6C). Though this expression pattern is opposite to ComK dynamics during competence initiation in B. subtilis, it relies on a gene regulatory network of similar structure, involving nested positive and negative feedback loops, namely: mutual Oct4 and Nanog activation, Oct4 and Nanog autoregulation, and Nanog repression by Oct4. 
However, because the network underlying ES cell pluripotency is not completely known, it cannot yet be excluded that the high- and low-Nanog subpopulations result from noise-induced transitions in a bistable system (Chickarmane et al., 2006; Glauche et al., 2010; Kalmar et al., 2009). Indeed, the source of noise driving Nanog excursions into the low expression state remains elusive, especially considering that high molecular levels are often associated with low noise. Gene expression bursts (Raj et al., 2006) may offer a solution, as highly expressed proteins can be noisy provided that they are expressed in bursts (Newman et al., 2006).
Differentiation is accompanied by loss of Nanog expression, in addition to downregulation of Oct4 and Sox2, the other transcription factors responsible for the maintenance of Nanog expression and pluripotency. Contrary to the early belief that differentiated cells cannot return to the pluripotent state, Takahashi and Yamanaka (Takahashi and Yamanaka, 2006) found that controlled upregulation of Oct4, Sox2, Klf4, and c-Myc can convert fully differentiated cells into iPS cells. However, such iPS cells were remarkably difficult to obtain and appeared as only a minuscule percentage in large differentiated cell populations exposed to identical genetic and environmental perturbations. Trying to understand the enigmatic source of iPS cells, two possible scenarios for their generation were proposed (Yamanaka, 2009): the elite model assumed pre-existing differences responsible for reversal to the iPS cell state, whereas the stochastic model assumed that reversal occurred by random chance, even without any pre-existing differences. The dichotomy of these models is analogous to the contrasting views of deterministic versus stochastic dynamics on Waddington’s landscape, as well as the recent controversy on the predictability of the lambda switch (St-Pierre and Endy, 2008; Zeng et al., 2010). A recent study set out to test experimentally the validity of the elite versus the stochastic model in iPS cell induction (Hanna et al., 2009). Differentiated murine B cells were identically prepared to harbor inducible copies of Oct4, Sox2, Klf4, and c-Myc and to express Nanog-GFP once reversal to the iPS state occurred. A large number of clonal populations established from such B cells were maintained in constant conditions continuously for several months, and the appearance of iPS cells was monitored over time. The first iPS cells appeared after 2–3 weeks, followed by other iPS reversals as time progressed. 
Toward the end of the experiment, nearly every clonal population (93%) had a significant number of iPS cells, demonstrating that obtaining the iPS state is just a matter of time and patience, as some descendants of every B cell were capable of returning to the pluripotent state (also confirmed by their ability to generate teratomas and chimaeras). These findings strongly support the stochastic model of induced pluripotency. The authors also studied the influence of overexpressing p53, p21, Lin28, or Nanog (in combination with all of the iPS-inducing factors Oct4, Sox2, Klf4, and c-Myc) on the speed of reversal to the iPS state. All of these additional perturbations were found to increase the rate of reversals to the iPS state but for different reasons. Whereas p53, p21, and Lin28 increased the cell division rate and had an effect by raising the B cell population size while leaving the reversal rate per individual B cell unaffected, Nanog overexpression had a significant effect even after adjusting for growth rate differences. Considering these studies demonstrating the role of noise in ES cell differentiation and the induction of pluripotency, it is intriguing to ask whether there is a role for cellular decision making in adult mammals. One of the first studies to address this question focused on adult progenitor cells (a multipotent hematopoietic stem cell line) (Chang et al., 2008), observing that the expression of the stem cell marker Sca-1 varied over three orders of magnitude across this cell population. Sorting the cells into distinct subpopulations based on their expression revealed that the variability in Sca-1 levels was dynamic: all
sorted populations relaxed to the original distribution in 9 days. The variability was found to reflect predisposition for certain cell fates because cells with low Sca-1 expression had relatively high expression of the erythroid differentiation factor Gata1 and lower expression of the myeloid differentiation factor PU.1. Accordingly, upon stimulation with erythropoietin, low Sca-1-expressing cells differentiated much faster into erythrocytes than their peers with high Sca-1 expression. Moreover, the differences among the original pluripotent stem cells were not restricted to these two differentiation factors: microarray analysis revealed additional genome-wide differences in gene expression between three subpopulations sorted by their Sca-1 expression (Sca-1low, Sca-1mid, and Sca-1high). In addition to cell differentiation, one of the most important processes recently shown to rely on cellular decision making is apoptosis (Spencer et al., 2009). These authors followed by microscopy the fate of sister cell lineages exposed to a “mortal” agent: tumor necrosis factor-related apoptosis inducing ligand (TRAIL) in two clonal cell lines (HeLa and MCF10A). A striking heterogeneity in cell fate was observed. Some cells never died, and those that died showed a highly variable time between TRAIL exposure and commitment to programmed cell death (indicated by caspase activation or mitochondrial outer membrane permeabilization). Moreover, sister cells that died soon after TRAIL exposure showed synchronous commitment to apoptosis, whereas those that died later showed gradually decreasing correlation between their times of death, indicating that these suicidal decisions depended on factors inherited from the mother cell that gradually and stochastically diverged as daughter cells divided over time. 
Measuring the concentrations of five apoptosis-related proteins in single cells, together with a mathematical model of TRAIL-induced apoptosis, allowed the authors to conclude that most stochastic variation in the commitment to cell death was due to initiator procaspase activity that cleaves the apoptotic regulator BH3 interacting domain death agonist (BID) into the truncated form tBID. When tBID hits a threshold, this sets off an irreversible avalanche of molecular interactions that culminate in apoptosis. In summary, these examples from mammalian cells indicate that cellular decision making underlies the most basic cellular processes in some of the most complex organisms, relying on regulatory networks with dynamics similar to those found in lower metazoans and microbes. However, the exact structure of the regulatory mechanisms controlling mammalian cell decisions is much less understood than for lower organisms and may involve cytoskeleton dynamics (Ambravaneswaran et al., 2010), subcellular localization, posttranslational modification, microRNA-based regulation, or other yet unknown mechanisms. Moreover, the studies discussed above were conducted in cell lines, and not actual mammals, and very little is known about mammalian cell fate choices in vivo. To start overcoming this gap, it will be important to compare and analyze cellular decision making from microbes and lower metazoans from an evolutionary perspective, hoping to learn lessons applicable to mammals.

Conclusions, Challenges, and Open Questions

Here, we reviewed several examples of cellular decision making at multiple levels of biological organization. The generality of this
phenomenon suggests that we are dealing with a fundamental biological property, which many organisms evolved to utilize due to the benefits of task allocation in isogenic cell populations. Cellular decision making combined with environmental sensing and cell-cell communication are three key processes underlying pattern formation and development from microbes to mammals. Moreover, viral decision making suggests that some form of random diversification may have been present even before cells existed. In fact, the phrase ‘‘cellular decision making’’ is an oxymoron because these decisions actually occur at the level of gene regulatory networks such as the ones highlighted in this Review. Cells only provide microscopic meeting places for the real key players: genes connected into regulatory networks (Dawkins, 2006). Several conclusions can be drawn from the examples discussed above. First, cellular decision making is frequently based on networks with multiple nested feedback loops, at least one of which is positive. The role of these feedback loops in various decision-making circuits remains to be determined, but it appears that positive feedback makes cellular decisions stable, whereas negative feedback makes them more easily reversible. Studying the dynamics of multiple feedback loops and their role in differentiation and development has much insight to offer (Brandman et al., 2005; Ray and Igoshin, 2010; Tiwari et al., 2010). Second, these networks appear to operate in parameter regimes enabling either bistable or excitable dynamics. Third, cellular decision making relies on intrinsic molecular noise, which induces transitions between steady states in bistable systems and transient excursions of gene expression in excitable systems. 
Fourth, as a consequence of the above, all cellular decisions are reversible from a theoretical point of view, although, in practice, this may not occur due to the irreversibility of secondary effects triggered by cellular decision making (such as cell lysis or apoptosis). The importance of intrinsic noise in cellular decision making has been questioned in a number of recent papers, which found that pre-existing differences in cell size, virus copy number, microenvironments, etc., may explain cell fate decisions to a significant degree (St-Pierre and Endy, 2008; Weitz et al., 2008). However, whereas the variability in cell-fate choices was somewhat reduced after accounting for certain newly identified factors, viral decisions were far from entirely deterministic (Zeng et al., 2010). Though it may be tempting to expect that increasingly detailed measurements of the structure and properties of single cells may enable the exact prediction of cell fate, this hope is unlikely to be fully realized. Imagine for a moment that we could find two cells of exactly the same size and molecular composition and place them into the same environment. These cells could then theoretically have the same fate if all of their corresponding molecules were in identical positions and had identical velocities at a given time. However, this condition can never be satisfied in practice because the probability of finding all of the molecules in the same state (position, velocity, etc.) is infinitesimally small. Therefore, noise is inherent to gene networks confined to small compartments, such as cells or artificial microscopic compartments (Doktycz and Simpson, 2007), and cannot be eliminated. Instead, researchers should strive to understand and control noise with increasing precision in order to control cell fate decisions.
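To make the link between small molecule numbers and unavoidable noise concrete, the following is a minimal sketch of an exact stochastic simulation (in the style of Gillespie, 1977) of a single molecular species that is produced at a constant rate and degraded in proportion to its copy number. The birth-death scheme, the function name `birth_death_stats`, and all rate values are illustrative assumptions and are not taken from any study discussed above. For such a process the steady-state copy number is expected to be Poisson distributed, so the relative fluctuation should shrink roughly as one over the square root of the mean but never reach zero.

```python
import math
import random


def birth_death_stats(k_prod, k_deg, t_end, seed=0):
    """Simulate 0 -> X (rate k_prod) and X -> 0 (rate k_deg * n).

    Returns the time-averaged mean copy number and the coefficient of
    variation (standard deviation / mean). All parameters are hypothetical.
    """
    rng = random.Random(seed)
    n = int(round(k_prod / k_deg))  # start at the deterministic steady state
    t = 0.0
    weighted_n = 0.0   # running integral of n over time
    weighted_n2 = 0.0  # running integral of n^2 over time
    while True:
        birth = k_prod
        death = k_deg * n
        total = birth + death
        wait = rng.expovariate(total)  # exponential waiting time to next event
        if t + wait >= t_end:
            # No further event before the end: account for the last interval.
            last = t_end - t
            weighted_n += n * last
            weighted_n2 += n * n * last
            break
        weighted_n += n * wait
        weighted_n2 += n * n * wait
        t += wait
        # Pick which reaction fires, with probability proportional to its rate.
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1
    mean = weighted_n / t_end
    variance = max(weighted_n2 / t_end - mean * mean, 0.0)
    return mean, math.sqrt(variance) / mean
```

Under these assumptions, calling `birth_death_stats(10.0, 1.0, 500.0)` and `birth_death_stats(400.0, 1.0, 500.0)` should show markedly larger relative fluctuations for the species present at about 10 copies than for the one present at about 400 copies, even though both obey the same rules.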
Whereas noise makes individual cells somewhat uncontrollable, the same may not be the case for large clonal cell populations, which can develop reliable patterns from unreliable elements due to the sheer power of statistics. For example, repeatedly tossing 100 fair coins will very likely result in nearly equal numbers of heads and tails, even though the fate of the individual coins is unpredictable. In the same way, the fly eye will reliably consist of 30% pale and 70% yellow ommatidia, even though the fate of individual ommatidia prior to patterning is uncertain. Synthetic gene networks capable of controlling gene expression noise (Murphy et al., 2010), the rate of random phenotypic switching (Acar et al., 2005), or the duration variability of transient differentiation episodes (Çağatay et al., 2009) may be useful in the future for adjusting the rate and outcome of cellular differentiation. Finally, a major challenge is to understand how cellular decision making evolves under well-defined conditions. As discussed above, stochastic cellular fate choices lead to cell population diversity, the simplest possible developmental pattern within isogenic cell populations. Such population-level characteristics are, however, conferred by gene networks carried by every individual cell in these populations, and stochastic diversification may ultimately serve the propagation of their constituent genes (Dawkins, 2006). Phenotypic diversity implies that some individual phenotypic variants will have low direct fitness and will be at a disadvantage without stress, whereas others will perish when the environment becomes stressful. However, in specific cases, this type of sacrifice can be justified by Hamilton’s rule (Hamilton, 1964), considering that the relatedness between clonal individual cells is maximal, and the survival of any individual will propagate the same genome. This may allow for kin selection, as suggested by recent theoretical work (Gardner et al., 2007).
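For readers less familiar with Hamilton's rule, its usual textbook form is written out below, together with what it reduces to for a clonal population. The symbols r, b, and c are the conventional ones and do not appear in the text above; the reduction simply assumes that relatedness among clonal cells takes its maximal value of one.

```latex
% Hamilton's rule: a costly, self-sacrificing trait can be favored when
%   r = genetic relatedness between actor and recipient (0 <= r <= 1)
%   b = fitness benefit to the recipient
%   c = fitness cost to the actor
\[
  r\,b > c
\]
% Assumption: for clonal cells relatedness is maximal, r = 1,
% so the condition reduces to benefit exceeding cost:
\[
  b > c
\]
```

Read this way, a variant that sacrifices its own fitness can still be favored whenever the benefit to its clonemates exceeds its own cost.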
On the experimental side, laboratory evolution of microbes in fluctuating environments may offer exciting opportunities to address these questions (Cooper and Lenski, 2010), as exemplified by the recent experimental evolution of random phenotypic switching (Beaumont et al., 2009). More generally, it will be interesting to examine from the perspective of social evolution (West et al., 2006) the formation of complex biological patterns, which may involve altruism (Lee et al., 2010), selfishness, spite, and various forms of cooperation in addition to stochastic cell fate choices. Observation of patterns in growing microbial colonies (Ben-Jacob et al., 1998) has led to the proposal of considering microbes as multicellular organisms (Shapiro, 1998). Though criticized by researchers from the field of social evolution (West et al., 2006), this proposal brings up an interesting question: which microbial patterns are functional, and when can patterns evolve? Because patterns form readily in nonliving systems due to purely physical reasons, it will be interesting to examine, in the context of sociobiology (West et al., 2007), the conditions when a cell population becomes a multicellular organism (Queller and Strassmann, 2009) and whether specific biological patterns have biological function subject to population-level selection.
ACKNOWLEDGMENTS
We would like to thank D. Nevozhay, R.M. Adams, G.B. Mills, O.A. Igoshin, R. Azevedo, J.E. Strassmann, and two anonymous reviewers for their helpful
comments on the manuscript. G.B. was supported by the NIH Director’s New Innovator Award Program (grant 1DP2 OD006481-01) and by NSF grant IOS 1021675. A.v.O. was supported by the NIH/NCI Physical Sciences Oncology Center at MIT (U54CA143874) and the NIH Director’s Pioneer Award Program (grant 1DP1OD003936). J.J.C. was supported by the NIH Director’s Pioneer Award Program (grant DP1 OD00344), NIH grants RC2 HL102815 and RL1 DE019021, the Ellison Medical Foundation, and the Howard Hughes Medical Institute.
REFERENCES
Acar, M., Becskei, A., and van Oudenaarden, A. (2005). Enhancement of cellular memory by reducing stochastic transitions. Nature 435, 228–232. Acar, M., Mettetal, J.T., and van Oudenaarden, A. (2008). Stochastic switching as a survival strategy in fluctuating environments. Nat. Genet. 40, 471–475. Ambravaneswaran, V., Wong, I.Y., Aranyosi, A.J., Toner, M., and Irimia, D. (2010). Directional decisions during neutrophil chemotaxis inside bifurcating channels. Integr. Biol. (Camb.) 2, 639–647. Arias, A.M., and Hayward, P. (2006). Filtering transcriptional noise during development: concepts and mechanisms. Nat. Rev. Genet. 7, 34–44. Arkin, A., Ross, J., and McAdams, H.H. (1998). Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected Escherichia coli cells. Genetics 149, 1633–1648. Balaban, N.Q., Merrin, J., Chait, R., Kowalik, L., and Leibler, S. (2004). Bacterial persistence as a phenotypic switch. Science 305, 1622–1625. Bar-Even, A., Paulsson, J., Maheshri, N., Carmi, M., O’Shea, E., Pilpel, Y., and Barkai, N. (2006). Noise in protein expression scales with natural protein abundance. Nat. Genet. 38, 636–643.
Cooper, T.F., and Lenski, R.E. (2010). Experimental evolution with E. coli in diverse resource environments. I. Fluctuating environments promote divergence of replicate populations. BMC Evol. Biol. 10, 11. Dawkins, R. (2006). The Selfish Gene: 30th Anniversary Edition, 30th anniversary edn (New York: Oxford University Press). Di Talia, S., Skotheim, J.M., Bean, J.M., Siggia, E.D., and Cross, F.R. (2007). The effects of molecular noise and size control on variability in the budding yeast cell cycle. Nature 448, 947–951. Doktycz, M.J., and Simpson, M.L. (2007). Nano-enabled synthetic biology. Mol. Syst. Biol. 3, 125. Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. (2002). Stochastic gene expression in a single cell. Science 297, 1183–1186. Galhardo, R.S., Hastings, P.J., and Rosenberg, S.M. (2007). Mutation as a stress response and the regulation of evolvability. Crit. Rev. Biochem. Mol. Biol. 42, 399–435. Gardner, A., West, S.A., and Griffin, A.S. (2007). Is bacterial persistence a social trait? PLoS ONE 2, e752. Gardner, T.S., Cantor, C.R., and Collins, J.J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342. Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., and Brown, P.O. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257. Gillespie, D.T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361. Glauche, I., Herberg, M., and Roeder, I. (2010). Nanog variability and pluripotency regulation of embryonic stem cells—insights from a mathematical model analysis. PLoS ONE 5, e11238.
Beaumont, H.J., Gallie, J., Kost, C., Ferguson, G.C., and Rainey, P.B. (2009). Experimental evolution of bet hedging. Nature 462, 90–93.
Goldstein, A.M., and Nagy, N. (2008). A bird’s eye view of enteric nervous system development: lessons from the avian embryo. Pediatr. Res. 64, 326–333.
Becskei, A., and Serrano, L. (2000). Engineering stability in gene networks by autoregulation. Nature 405, 590–593.
Gregor, T., Tank, D.W., Wieschaus, E.F., and Bialek, W. (2007). Probing the limits to positional information. Cell 130, 153–164.
Ben-Jacob, E., Cohen, I., and Gutnick, D.L. (1998). Cooperative organization of bacterial colonies: from genotype to morphotype. Annu. Rev. Microbiol. 52, 779–806.
Hamilton, W.D. (1964). The genetical evolution of social behaviour. I. J. Theor. Biol. 7, 1–16.
Biggar, S.R., and Crabtree, G.R. (2001). Cell signaling can direct either binary or graded transcriptional responses. EMBO J. 20, 3167–3176. Blake, W.J., Balázsi, G., Kohanski, M.A., Isaacs, F.J., Murphy, K.F., Kuang, Y., Cantor, C.R., Walt, D.R., and Collins, J.J. (2006). Phenotypic consequences of promoter-mediated transcriptional noise. Mol. Cell 24, 853–865. Blake, W.J., Kaern, M., Cantor, C.R., and Collins, J.J. (2003). Noise in eukaryotic gene expression. Nature 422, 633–637.
Han, Y., Wind-Rotolo, M., Yang, H.C., Siliciano, J.D., and Siliciano, R.F. (2007). Experimental approaches to the study of HIV-1 latency. Nat. Rev. Microbiol. 5, 95–106. Hänggi, P., Grabert, H., Talkner, P., and Thomas, H. (1984). Bistable systems: Master equation versus Fokker-Planck modeling. Phys. Rev. A 29, 371–378. Hanna, J., Saha, K., Pando, B., van Zon, J., Lengner, C.J., Creyghton, M.P., van Oudenaarden, A., and Jaenisch, R. (2009). Direct cell reprogramming is a stochastic process amenable to acceleration. Nature 462, 595–601.
Bonner, J.T. (2003). On the origin of differentiation. J. Biosci. 28, 523–528.
Holloway, D.M., Harrison, L.G., Kosman, D., Vanario-Alonso, C.E., and Spirov, A.V. (2006). Analysis of pattern precision shows that Drosophila segmentation develops substantial independence from gradients of maternal gene products. Dev. Dyn. 235, 2949–2960.
Brandman, O., Ferrell, J.E., Jr., Li, R., and Meyer, T. (2005). Interlinked fast and slow positive feedback loops drive reliable cell decisions. Science 310, 496–498.
Jablonka, E., and Raz, G. (2009). Transgenerational epigenetic inheritance: prevalence, mechanisms, and implications for the study of heredity and evolution. Q. Rev. Biol. 84, 131–176.
Çağatay, T., Turcotte, M., Elowitz, M.B., Garcia-Ojalvo, J., and Süel, G.M. (2009). Architecture-dependent noise discriminates functionally analogous differentiation circuits. Cell 139, 512–522.
Kaern, M., Elston, T.C., Blake, W.J., and Collins, J.J. (2005). Stochasticity in gene expression: from theories to phenotypes. Nat. Rev. Genet. 6, 451–464.
Boettiger, A.N., and Levine, M. (2009). Synchronous and stochastic patterns of gene activation in the Drosophila embryo. Science 325, 471–473.
Chambers, I., Silva, J., Colby, D., Nichols, J., Nijmeijer, B., Robertson, M., Vrana, J., Jones, K., Grotewold, L., and Smith, A. (2007). Nanog safeguards pluripotency and mediates germline development. Nature 450, 1230–1234. Chang, H.H., Hemberg, M., Barahona, M., Ingber, D.E., and Huang, S. (2008). Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 453, 544–547. Chickarmane, V., Troein, C., Nuber, U.A., Sauro, H.M., and Peterson, C. (2006). Transcriptional dynamics of the embryonic stem cell switch. PLoS Comput. Biol. 2, e123.
Kalmar, T., Lim, C., Hayward, P., Muñoz-Descalzo, S., Nichols, J., Garcia-Ojalvo, J., and Martinez Arias, A. (2009). Regulated fluctuations in nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol. 7, e1000149. Kirk, D.L. (2005). A twelve-step program for evolving multicellularity and a division of labor. Bioessays 27, 299–310. Klumpp, S., Zhang, Z., and Hwa, T. (2009). Growth rate-dependent global effects on gene expression in bacteria. Cell 139, 1366–1375. Koonin, E.V., Senkevich, T.G., and Dolja, V.V. (2006). The ancient Virus World and evolution of cells. Biol. Direct 1, 29.
Kussell, E., and Leibler, S. (2005). Phenotypic diversity, population growth, and information in fluctuating environments. Science 309, 2075–2078.
Shapiro, J.A. (1998). Thinking about bacterial populations as multicellular organisms. Annu. Rev. Microbiol. 52, 81–104.
Lee, H.H., Molla, M.N., Cantor, C.R., and Collins, J.J. (2010). Bacterial charity work leads to population-wide resistance. Nature 467, 82–85.
Shea, M.A., and Ackers, G.K. (1985). The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation. J. Mol. Biol. 181, 211–230.
Long, T., Tu, K.C., Wang, Y., Mehta, P., Ong, N.P., Bassler, B.L., and Wingreen, N.S. (2009). Quantifying the integration of quorum-sensing signals with single-cell resolution. PLoS Biol. 7, e68. Lopez, D., Vlamakis, H., and Kolter, R. (2009). Generation of multiple cell types in Bacillus subtilis. FEMS Microbiol. Rev. 33, 152–163. Maamar, H., Raj, A., and Dubnau, D. (2007). Noise in gene expression determines cell fate in Bacillus subtilis. Science 317, 526–529. Maheshri, N., and O’Shea, E.K. (2007). Living with noisy genes: how cells function reliably with inherent variability in gene expression. Annu. Rev. Biophys. Biomol. Struct. 36, 413–434. Maynard Smith, J., and Szathmáry, E. (1995). The Major Transitions in Evolution (Oxford: Oxford University Press). Mehta, P., Mukhopadhyay, R., and Wingreen, N.S. (2008). Exponential sensitivity of noise-driven switching in genetic networks. Phys. Biol. 5, 026005. Murphy, K.F., Adams, R.M., Wang, X., Balázsi, G., and Collins, J.J. (2010). Tuning and controlling gene expression noise in synthetic gene networks. Nucleic Acids Res. 38, 2712–2726. Nevozhay, D., Adams, R.M., Murphy, K.F., Josic, K., and Balázsi, G. (2009). Negative autoregulation linearizes the dose-response and suppresses the heterogeneity of gene expression. Proc. Natl. Acad. Sci. USA 106, 5123–5128. Newman, J.R., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi, J.L., and Weissman, J.S. (2006). Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840–846. Novick, A., and Weiner, M. (1957). Enzyme induction as an all-or-none phenomenon. Proc. Natl. Acad. Sci. USA 43, 553–566. Octavio, L.M., Gedeon, K., and Maheshri, N. (2009). Epigenetic and conventional regulation is distributed among activators of FLO11 allowing tuning of population-level heterogeneity in its expression. PLoS Genet. 5, e1000673. Oppenheim, A.B., Kobiler, O., Stavans, J., Court, D.L., and Adhya, S. (2005). Switches in bacteriophage lambda development. Annu. Rev. Genet. 39, 409–429. Ozbudak, E.M., Thattai, M., Kurtser, I., Grossman, A.D., and van Oudenaarden, A. (2002). Regulation of noise in the expression of a single gene. Nat. Genet. 31, 69–73. Pál, C., and Miklós, I. (1999). Epigenetic inheritance, genetic assimilation and speciation. J. Theor. Biol. 200, 19–37. Paliwal, S., Iglesias, P.A., Campbell, K., Hilioti, Z., Groisman, A., and Levchenko, A. (2007). MAPK-mediated bimodal gene expression and adaptive gradient sensing in yeast. Nature 446, 46–51. Pelletier, J., and Sonenberg, N. (1988). Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from poliovirus RNA. Nature 334, 320–325. Ptashne, M. (1967). Specific binding of the lambda phage repressor to lambda DNA. Nature 214, 232–234. Ptashne, M. (2004). A Genetic Switch: Phage Lambda Revisited, Third Edition (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press). Queller, D.C., and Strassmann, J.E. (2009). Beyond society: the evolution of organismality. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364, 3143–3155.
Smukalla, S., Caldara, M., Pochet, N., Beauvais, A., Guadagnini, S., Yan, C., Vinces, M.D., Jansen, A., Prevost, M.C., Latgé, J.P., et al. (2008). FLO1 is a variable green beard gene that drives biofilm-like cooperation in budding yeast. Cell 135, 726–737. Spencer, S.L., Gaudet, S., Albeck, J.G., Burke, J.M., and Sorger, P.K. (2009). Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature 459, 428–432. St-Pierre, F., and Endy, D. (2008). Determination of cell fate selection during phage lambda infection. Proc. Natl. Acad. Sci. USA 105, 20705–20710. Süel, G.M., Garcia-Ojalvo, J., Liberman, L.M., and Elowitz, M.B. (2006). An excitable gene regulatory circuit induces transient cellular differentiation. Nature 440, 545–550. Süel, G.M., Kulkarni, R.P., Dworkin, J., Garcia-Ojalvo, J., and Elowitz, M.B. (2007). Tunability and noise dependence in differentiation dynamics. Science 315, 1716–1719. Sureka, K., Ghosh, B., Dasgupta, A., Basu, J., Kundu, M., and Bose, I. (2008). Positive feedback and noise activate the stringent response regulator rel in mycobacteria. PLoS One 3, e1771. Takahashi, K., and Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676. Tan, C., Marguet, P., and You, L. (2009). Emergent bistability by a growth-modulating positive feedback circuit. Nat. Chem. Biol. 5, 842–848. Teng, S.W., Wang, Y., Tu, K.C., Long, T., Mehta, P., Wingreen, N.S., Bassler, B.L., and Ong, N.P. (2010). Measurement of the copy number of the master quorum-sensing regulator of a bacterial cell. Biophys. J. 98, 2024–2031. Thattai, M., and van Oudenaarden, A. (2004). Stochastic gene expression in fluctuating environments. Genetics 167, 523–530. Tiwari, A., Balázsi, G., Gennaro, M.L., and Igoshin, O.A. (2010). The interplay of multiple feedback loops with post-translational kinetics results in bistability of mycobacterial stress response. Phys. Biol. 7, 036005.
Tu, K.C., Long, T., Svenningsen, S.L., Wingreen, N.S., and Bassler, B.L. (2010). Negative feedback loops involving small regulatory RNAs precisely control the Vibrio harveyi quorum-sensing response. Mol. Cell 37, 567–579. Veening, J.W., Igoshin, O.A., Eijlander, R.T., Nijland, R., Hamoen, L.W., and Kuipers, O.P. (2008a). Transient heterogeneity in extracellular protease production by Bacillus subtilis. Mol. Syst. Biol. 4, 184. Veening, J.W., Stewart, E.J., Berngruber, T.W., Taddei, F., Kuipers, O.P., and Hamoen, L.W. (2008b). Bet-hedging and epigenetic inheritance in bacterial cell development. Proc. Natl. Acad. Sci. USA 105, 4393–4398. Waddington, C.H., and Kacser, H. (1957). The Strategy of the Genes: A Discussion of Some Aspects of Theoretical Biology (London, UK: George Allen & Unwin). Wahl, L.M. (2002). Evolving the division of labour: generalists, specialists and task allocation. J. Theor. Biol. 219, 371–388. Walczak, A.M., Onuchic, J.N., and Wolynes, P.G. (2005). Absolute rate theories of epigenetic stability. Proc. Natl. Acad. Sci. USA 102, 18926–18931.
Raj, A., Peskin, C.S., Tranchina, D., Vargas, D.Y., and Tyagi, S. (2006). Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309.
Wang, J., Xu, L., and Wang, E. (2008). Potential landscape and flux framework of nonequilibrium networks: robustness, dissipation, and coherence of biochemical oscillations. Proc. Natl. Acad. Sci. USA 105, 12271–12276.
Raj, A., Rifkin, S.A., Andersen, E., and van Oudenaarden, A. (2010). Variability in gene expression underlies incomplete penetrance. Nature 463, 913–918.
Waters, C.M., and Bassler, B.L. (2005). Quorum sensing: cell-to-cell communication in bacteria. Annu. Rev. Cell Dev. Biol. 21, 319–346.
Rao, C.V., Wolf, D.M., and Arkin, A.P. (2002). Control, exploitation and tolerance of intracellular noise. Nature 420, 231–237.
Weinberger, L.S., Burnett, J.C., Toettcher, J.E., Arkin, A.P., and Schaffer, D.V. (2005). Stochastic gene expression in a lentiviral positive-feedback loop: HIV-1 Tat fluctuations drive phenotypic diversity. Cell 122, 169–182.
Ray, J.C., and Igoshin, O.A. (2010). Adaptable functionality of transcriptional feedback in bacterial two-component systems. PLoS Comput. Biol. 6, e1000676.
Weinberger, L.S., Dar, R.D., and Simpson, M.L. (2008). Transient-mediated fate determination in a transcriptional circuit of HIV. Nat. Genet. 40, 466–470.
Weitz, J.S., Mileyko, Y., Joh, R.I., and Voit, E.O. (2008). Collective decision making in bacterial viruses. Biophys. J. 95, 2673–2680. Wernet, M.F., Mazzoni, E.O., Celik, A., Duncan, D.M., Duncan, I., and Desplan, C. (2006). Stochastic spineless expression creates the retinal mosaic for colour vision. Nature 440, 174–180.
Wiesenfeld, K., and Moss, F. (1995). Stochastic resonance and the benefits of noise: from ice ages to crayfish and SQUIDs. Nature 373, 33–36. Wolf, D.M., Vazirani, V.V., and Arkin, A.P. (2005). Diversity in times of adversity: probabilistic strategies in microbial survival games. J. Theor. Biol. 234, 227–253. Wolk, C.P. (1996). Heterocyst formation. Annu. Rev. Genet. 30, 59–78.
West, S.A., Griffin, A.S., and Gardner, A. (2007). Social semantics: altruism, cooperation, mutualism, strong reciprocity and group selection. J. Evol. Biol. 20, 415–432. West, S.A., Griffin, A.S., Gardner, A., and Diggle, S.P. (2006). Social evolution theory for microorganisms. Nat. Rev. Microbiol. 4, 597–607.
Yamanaka, S. (2009). Elite and stochastic models for induced pluripotent stem cell generation. Nature 460, 49–52. Zeng, L., Skinner, S.O., Zong, C., Sippy, J., Feiss, M., and Golding, I. (2010). Decision making at a subcellular level determines the outcome of bacteriophage infection. Cell 141, 682–691.
Leading Edge
Review
Measuring and Modeling Apoptosis in Single Cells
Sabrina L. Spencer1,2 and Peter K. Sorger1,*
1Center for Cell Decision Processes, Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
2Present address: Department of Chemical and Systems Biology, Stanford University, Stanford, CA 94305, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.002
Cell death plays an essential role in the development of tissues and organisms, the etiology of disease, and the responses of cells to therapeutic drugs. Here we review progress made over the last decade in using mathematical models and quantitative, often single-cell, data to study apoptosis. We discuss the delay that follows exposure of cells to prodeath stimuli, control of mitochondrial outer membrane permeabilization, switch-like activation of effector caspases, and variability in the timing and probability of death from one cell to the next. Finally, we discuss challenges facing the fields of biochemical modeling and systems pharmacology.
Introduction
Apoptosis is a form of programmed cell death involving caspases, specialized cysteine proteases found in animal cells as inactive proenzymes (Fuentes-Prior and Salvesen, 2004). Dramatic progress has been made in recent years in identifying and determining the biochemical activities and cellular functions of biomolecules that regulate apoptosis and carry out its proteolytic program. However, current knowledge is largely qualitative and descriptive, and the complex circuits that integrate prosurvival and prodeath signals to control the fates of normal and diseased cells remain poorly understood. Successful creation of quantitative and predictive computational models of apoptosis would be significant from both basic research and clinical perspectives. From the standpoint of basic research, apoptosis is a stereotypical systems-level problem in which complex circuits involving graded and competing molecular signals determine binary life-death decisions at a single-cell level. Progress in modeling such decisions has had a significant impact on the small but growing field of mammalian systems biology.
From a clinical perspective, diseases such as cancer involve disruption of the normal balance between cell proliferation and cell death, and anticancer drugs are thought to achieve their therapeutic effects by inducing apoptosis in cancer cells (Fadeel et al., 1999). However, it is difficult to anticipate whether a tumor cell will or will not be sensitive to a proapoptotic stimulus or drug based on general knowledge of apoptosis biochemistry because the importance of specific processes varies dramatically from one cell type to the next. Predictive, multifactorial, and context-sensitive computational models relevant to disease states will impact drug discovery and clinical care. Apoptosis can be triggered by intrinsic and extrinsic stimuli. In intrinsic apoptosis, the death-inducing stimulus involves cellular damage or malfunction brought about by stress, ultraviolet (UV) or ionizing radiation, oncogene activation, toxin exposure, etc. (Kaufmann and Earnshaw, 2000). Extrinsic apoptosis is triggered by binding of extracellular ligands to specific transmembrane
receptors, primarily members of the tumor necrosis factor receptor (TNFR) family (Kaufmann and Earnshaw, 2000). Receptor binding by TNF family ligands activates caspase-dependent pathways that are quite well understood in molecular terms. In general, extrinsic apoptosis has received more attention than intrinsic apoptosis from investigators seeking to develop mathematical models, but extrinsic and intrinsic apoptosis share many components and regulatory mechanisms. The best studied inducers of extrinsic apoptosis are TNF-α, Fas ligand (FasL, also known as Apo-1/CD95 ligand), and TRAIL (TNF-related apoptosis-inducing ligand, also known as Apo2L; Figure 1A). Binding of these ligands to trimers of cognate receptors causes a conformational change that promotes assembly of death-inducing signaling complexes (DISCs) on receptor cytoplasmic tails (Gonzalvez and Ashkenazi, 2010). DISCs contain multiple adaptor proteins, such as TRADD and FADD, which recruit and promote the activation of initiator procaspases. The composition of the DISC differs from one type of death receptor to the next and also changes upon receptor internalization (Schutze et al., 2008). A remarkable feature of TNF-family receptors is that they activate both proapoptotic and prosurvival signaling cascades and the extent of cell death is determined in part by the balance between these competing signals. Prodeath processes are triggered by activation of initiator procaspases-8 and -10 at the DISC, a process that can be modulated by the catalytically inactive procaspase-8 homolog FLIP (Fuentes-Prior and Salvesen, 2004). Prosurvival processes are generally ascribed to activation of the NF-κB transcription factor, but other less well-understood processes are also involved, such as induction of the mitogen-activated protein kinase (MAPK) and Akt (protein kinase B) cascades (Falschlehner et al., 2007).
Initiator caspases recruited to the DISC directly cleave effector procaspases-3 and -7, generating active proteases (Fuentes-Prior and Salvesen, 2004). Effector caspases cleave essential structural proteins such as cytokeratins and nuclear lamins and also inhibitor of caspase-activated DNase (iCAD), which liberates the DNase (CAD) to digest chromosomal DNA and cause cell death. So-called “type I” apoptosis, which comprises a direct pathway of receptor → initiator caspases → effector caspases → death, is thought to be sufficient for death in certain cell types, but in most cell types apoptosis occurs by a “type II” pathway in which mitochondrial outer membrane permeabilization (MOMP) is a necessary precursor to effector caspase activation (Scaffidi et al., 1998). MOMP is triggered by the formation of pores in the mitochondrial membrane. Pore formation is controlled by the 20 members of the Bcl-2 protein family, which can be roughly divided into four types: the “effectors” Bax and Bak whose oligomerization creates pores; “inhibitors” of Bax and Bak association such as Bcl-2, Mcl1, and BclxL; “activators” of Bax and Bak such as Bid and Bim; and “sensitizers” such as Bad, Bik, and Noxa that antagonize antiapoptotic Bcl-2-like proteins (Letai, 2008). In extrinsic apoptosis, initiator caspases that have been activated at the DISC cleave Bid into tBid, which in turn promotes a conformational change in Bax and Bak leading to oligomerization. Bax or Bak oligomers create pores in the mitochondrial outer membrane and promote cytoplasmic translocation of critical apoptosis regulators such as cytochrome c and Smac/Diablo, which normally reside in the space between the outer and inner mitochondrial membranes. MOMP does not occur until proapoptotic pore-forming proteins overwhelm antiapoptotic Bcl-2-like proteins (the so-called rheostat model) (Korsmeyer et al., 1993). Under most circumstances, MOMP is a sudden process that lasts a few minutes and marks the point of no return in the commitment to cell death (Chipuk et al., 2006; Tait et al., 2010). Once translocated to the cytosol, cytochrome c combines with Apaf-1 and caspase-9 to form the apoptosome, which cleaves and activates effector procaspases (Fuentes-Prior and Salvesen, 2004).
Figure 1. Modeling Receptor-Mediated Apoptosis (A) Simplified schematic of receptor-mediated apoptosis signaling, with fluorescent reporters for initiator caspases (IC FRET) and effector caspases (EC FRET) indicated. The MOMP reporter measures mitochondrial outer membrane permeabilization. (B) Steps involved in converting a biochemical cartoon into a reaction diagram and ordinary differential equations. C8* indicates active caspase-8. Lower panels show a model-based 12 hr simulation of the increase in tBid relative to the time of MOMP and analysis of the sensitivity of MOMP time to Bid levels. The simulation in (B) was adapted from Albeck et al. (2008b).
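As one concrete reading of the Bid cleavage step described above, which Figure 1B uses as its worked example, the sketch below writes the reactions C8* + Bid ⇌ C8*:Bid → C8* + tBid as mass-action ordinary differential equations and integrates them with a fixed-step fourth-order Runge-Kutta scheme. The enzyme-substrate reaction scheme is a common modeling assumption rather than something stated in the text, and every rate constant and copy number here is a hypothetical placeholder, not a value from Albeck et al. (2008b) or any other cited model.

```python
def bid_cleavage_rates(state, kf, kr, kc):
    """Mass-action rates for C8* + Bid <-> C8*:Bid -> C8* + tBid."""
    c8, bid, cplx, tbid = state
    bind = kf * c8 * bid   # complex formation
    unbind = kr * cplx     # complex dissociation
    cleave = kc * cplx     # catalytic cleavage, releases C8* and tBid
    return (
        -bind + unbind + cleave,  # d[C8*]/dt
        -bind + unbind,           # d[Bid]/dt
        bind - unbind - cleave,   # d[C8*:Bid]/dt
        cleave,                   # d[tBid]/dt
    )


def rk4_step(state, dt, kf, kr, kc):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = bid_cleavage_rates(state, kf, kr, kc)
    k2 = bid_cleavage_rates(shifted(state, k1, dt / 2.0), kf, kr, kc)
    k3 = bid_cleavage_rates(shifted(state, k2, dt / 2.0), kf, kr, kc)
    k4 = bid_cleavage_rates(shifted(state, k3, dt), kf, kr, kc)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_bid_cleavage(c8_active=1.0e3, bid=1.0e5, kf=1.0e-6, kr=1.0e-3,
                          kc=1.0, t_end=3600.0, dt=0.5):
    """Return [(time_s, (C8*, Bid, C8*:Bid, tBid)), ...] in molecules per cell.

    All default values are hypothetical and chosen only for illustration.
    """
    state = (c8_active, bid, 0.0, 0.0)
    trajectory = [(0.0, state)]
    n_steps = int(round(t_end / dt))
    for i in range(1, n_steps + 1):
        state = rk4_step(state, dt, kf, kr, kc)
        trajectory.append((i * dt, state))
    return trajectory
```

Calling `simulate_bid_cleavage()` returns the full time course. Plotting the last component against time would give a tBid accumulation curve for these placeholder parameters, and changing the `bid` argument is one simple way to explore how the timing of tBid accumulation depends on the initial Bid level.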
XIAP associates with the catalytic pocket of active effector caspases-3 and -7, blocking protease activity and promoting their ubiquitin-dependent degradation. Binding of Smac to XIAP relieves this inhibition, allowing effector caspases to cleave their substrates and cause cell death (Fuentes-Prior and Salvesen, 2004). In this Review, we describe how combining theoretical and computational approaches with live-cell imaging and quantitative biochemical analysis has provided new insight into mechanisms controlling the dynamics of extrinsic apoptosis. We start with a brief description of modeling concepts and methods relevant to apoptosis research. Next, we survey the recent literature. Modeling apoptosis, like quantitative analysis of mammalian signal transduction in general, is a field in its infancy fraught with many technical and conceptual challenges. Thus, only a subset of the known biochemistry of extrinsic apoptosis has been subjected to computational analysis, and this analysis has been performed only in a few cell lines. Key questions, such as differences between normal and transformed cells, have not yet been addressed in terms amenable to modeling. This Review, therefore, focuses on the subset of questions for which modeling has provided new insight (Figure 2). These include: (1) How is all-or-none control over effector caspase activity achieved? (2) How are activated effector caspases inhibited during the pre-MOMP delay while initiator caspase activity rises? (3) How do prosurvival and prodeath signals interact to determine if and when MOMP occurs? (4) What
causes cell-to-cell variation in the timing and probability of apoptosis? We close this Review with an evaluation of current and emerging methods and future prospects. Readers interested in a more thorough description of the biology of extrinsic apoptosis are referred to several excellent reviews (Fuentes-Prior and Salvesen, 2004; Gonzalvez and Ashkenazi, 2010; Hengartner, 2000) and to Douglas Green’s new book Means to an End: Apoptosis and Other Cell Death Mechanisms (Green, 2011).
Modeling Concepts Relevant for Apoptosis
The term ‘‘model’’ is used in a variety of fields in the natural and applied sciences to describe a mathematical or computational representation of a physical system. In molecular biology, the term usually refers to a ‘‘word model’’ or narrative description accompanied by a diagram, although it can also refer to a cell line or genetically engineered mouse that recapitulates aspects of a human disease. In this Review, we restrict use of the term ‘‘model’’ to describe an executable set of rules or equations in mathematical form. We are primarily interested in models that are built and tested using detailed cellular or biochemical experiments. Models of cellular biochemistry can be based on different mathematical formalisms, from Boolean logic to differential equations, depending on the degree of detail and the scope of the modeling effort. Most models of apoptosis have been encoded using ordinary differential equations (ODEs), which describe the evolution of a system in continuous time. ODEs are the mathematical representation of mass action kinetics, the familiar biochemical approximation in which rates of reaction are proportional to the concentrations of reactants (Figure 1B) (Chen et al., 2010). Diffusion, spatial gradients, or transport can be modeled explicitly using partial differential equations (PDEs), which represent biochemical systems in continuous time and space. For example, Rehm et al.
(2009) used PDEs to model the spread of mitochondrial permeabilization through a cell following an initial, localized MOMP event. Using sets of differential equations it is possible to encode a complex network of interacting biochemical reactions and then study network dynamics under the assumption that protein concentrations and reaction rates can be estimated from experimental data. Differential equation models often increase rapidly in complexity as species are added, as each new protein can give rise to a large number of model species differing in location, binding state, and degree of posttranslational modification. This problem has effectively limited data-dependent ODE/PDE models to fewer than 20 gene products (and on the order of 50–100 model species), although efforts are underway to increase this limit.
In addition to differential equations, several other formalisms have been used to model apoptosis. Stochastic models make it possible to represent reactions as processes that are discrete and random, rather than continuous and deterministic. Stochastic models are advantageous when the number of individual reactants of any species is small (typically fewer than 100) or reaction rates very slow (Zheng and Ross, 1991). In these cases, a Monte Carlo procedure is used to represent the probabilistic nature of collisions and reactions among individual molecules (Gillespie, 1977). For example, stochastic cellular automata have been used to model the movement of molecules on the mitochondrial outer membrane (Chen et al., 2007). When sufficient time-resolved quantitative data are lacking, a less precise modeling framework is usually advantageous, and logic-based models have proven particularly popular. Boolean models, for example, are discrete two-state logical models in which each node in a network is represented as a simple on/off switch. Boolean models have been used to represent the interplay among survival, necrosis, and apoptosis pathways and to predict the likelihood that each phenotype would result following changes in the levels of regulatory proteins (Calzone et al., 2010). However, the more qualitative and phenomenological the modeling framework, the less mechanistic the insight.
Regardless of modeling framework, a trade-off exists between model tractability and model detail or scope. The inclusion of more species makes it possible to analyze biochemical processes in greater detail or to represent the operation of large networks involving many gene products, but larger models are more difficult to constrain with experimental data, and excess detail can mask underlying regulatory mechanisms. A Jorge Luis Borges story comes to mind in which the art of cartography achieved such a perfection of detail that cartographers built a map of their empire with 1:1 correspondence to the empire itself, rendering the map useless (Borges and Hurley, 1999). On the other hand, although small models have the advantage of relative simplicity and even analytical tractability (i.e., capable of being solved exactly without simulation), they run the risk of grossly simplifying the underlying biochemistry and of including an insufficient number of regulatory processes. As yet, no clear principles exist to guide decisions about model scope and complexity, and most studies remain constrained by the relative immaturity of modeling software and a paucity of experimental data.
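As a rough illustration of the Monte Carlo procedure used in stochastic models (Gillespie, 1977), the sketch below simulates a single reversible binding reaction, A + B <-> C, at low copy number. The species, rate constants, and molecule counts are illustrative assumptions and are not taken from any published apoptosis model.

```python
import math
import random


def gillespie_binding(n_a, n_b, n_c, k_on, k_off, t_end, seed=0):
    """Simulate A + B <-> C with the Gillespie direct method.

    Returns a list of (time, n_a, n_b, n_c) tuples, one per reaction event.
    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, n_a, n_b, n_c)]
    while True:
        # Propensities: probability per unit time that each reaction fires.
        a_bind = k_on * n_a * n_b
        a_unbind = k_off * n_c
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break  # no reaction can occur
        # The waiting time to the next event is exponentially distributed.
        t += -math.log(1.0 - rng.random()) / a_total
        if t > t_end:
            break
        # Pick which reaction fires in proportion to its propensity.
        if rng.random() * a_total < a_bind:
            n_a, n_b, n_c = n_a - 1, n_b - 1, n_c + 1
        else:
            n_a, n_b, n_c = n_a + 1, n_b + 1, n_c - 1
        trajectory.append((t, n_a, n_b, n_c))
    return trajectory


traj = gillespie_binding(n_a=50, n_b=30, n_c=0, k_on=0.01, k_off=0.1, t_end=100.0)
_, final_a, final_b, final_c = traj[-1]
print(f"{len(traj) - 1} events; final counts A={final_a}, B={final_b}, C={final_c}")
```

Because molecule numbers change in integer steps, repeated runs with different seeds give different trajectories, which is the behavior a deterministic ODE cannot represent.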
Estimating values for rate constants and initial protein concentrations (the parameters in differential equation models) remains extremely challenging both computationally and experimentally. Each reaction in an ODE model is associated with one or more ‘‘initial conditions’’ (the concentrations of reactants at time zero) and rate constants, usually a forward and reverse rate constant. Some of these parameters are available in the literature, typically from in vitro biochemical experiments, and these values may hold true in the context of a cell. In many other cases, however, no estimates of rate constants are available and parameters must be estimated directly from experiments (Chen et al., 2010). In addition, protein concentrations vary from cell type to cell type and should be measured directly in the cell type under investigation, although this is often not done because it is time consuming. The estimation of unknown parameter values based on data (typically, time-dependent changes in the abundance or localization of proteins in the model) is called model calibration, model training, or model fitting. Almost all realistic models of biological systems are too large for all parameters to be fully constrained by experimental data, and the models are therefore ‘‘nonidentifiable.’’ Thus far, the process of model calibration has been approached rather informally, but more rigorous approaches are in development (e.g., Kim et al., 2010). Careful analysis is expected to confirm the common-sense view that solid conclusions can be reached even in the case of partial knowledge.
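The roles of initial conditions and forward and reverse rate constants can be made concrete with a minimal sketch. It assumes a single reversible binding reaction, A + B <-> C, written with mass-action kinetics and integrated with a fixed-step Runge-Kutta scheme; the concentrations and rate constants are hypothetical and in arbitrary units.

```python
def mass_action_rhs(state, k_f, k_r):
    """Time derivatives for A + B <-> C under mass-action kinetics."""
    a, b, c = state
    flux = k_f * a * b - k_r * c  # net forward reaction rate
    return (-flux, -flux, flux)


def rk4_step(state, dt, k_f, k_r):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(x + h * s for x, s in zip(base, slope))

    k1 = mass_action_rhs(state, k_f, k_r)
    k2 = mass_action_rhs(shifted(state, k1, dt / 2), k_f, k_r)
    k3 = mass_action_rhs(shifted(state, k2, dt / 2), k_f, k_r)
    k4 = mass_action_rhs(shifted(state, k3, dt), k_f, k_r)
    return tuple(
        x + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
        for x, s1, s2, s3, s4 in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, k_f, k_r, t_end, dt=0.01):
    """Integrate from the initial conditions to t_end and return the final state."""
    state = tuple(initial)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, k_f, k_r)
    return state


# Hypothetical initial conditions (A, B, C) and rate constants.
K_F, K_R = 2.0, 0.5
a_end, b_end, c_end = simulate(initial=(1.0, 0.5, 0.0), k_f=K_F, k_r=K_R, t_end=50.0)
print(f"A={a_end:.4f} B={b_end:.4f} C={c_end:.4f}")
```

At long times the forward and reverse fluxes balance, so the final state depends only on the ratio of the two rate constants and on the initial conditions. This is one reason why time-course data, rather than endpoint data alone, are needed to constrain individual rate constants.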
Figure 2. Questions Addressed in This Review (A and B) Composite plot of effector caspase substrate cleavage measured using a CFP-DEVDR-YFP reporter (A) or initiator caspase substrate cleavage measured using CFP-IETDGGIETD-YFP (B) for >50 HeLa cells treated with 50 ng/ml TRAIL in the presence of cycloheximide and aligned by the average time of MOMP (red line). (C) Fitted trajectories for initiator caspase substrate cleavage (assayed using CFP-IETDGGIETD-YFP) in single HeLa cells treated with 10 ng/ml TRAIL in the presence of cycloheximide (fits are based on sampling at 3 min intervals). Concomitant expression of a reporter for MOMP permits a determination of the time at which mitochondria permeabilize and thus an estimation of the height of the MOMP threshold (yellow circles) and the rate of approach to the threshold (the ‘‘slope’’ of the green lines). (D) Histograms of time of death in HeLa cells treated with various death ligands in the presence of cycloheximide, as determined by live-cell microscopy. (A), (B), and (D) were adapted from Albeck et al. (2008b); (C) was adapted from Spencer et al. (2009).
Modeling biological processes requires the collection and analysis of quantitative experimental data. An ODE model, which assumes that each compartment is well mixed, necessarily represents a single cell, and calibrating and testing ODE models therefore require collecting data on single cells over time. However, live-cell imaging experiments usually rely on genetically modified cell lines carrying fluorescent reporters. Creating these lines is relatively time-consuming, and the extent of multiplexing is limited by phototoxicity and the availability of noninterfering fluorophores. It is not always clear that an engineered reporter correctly represents the activity or state of modification of endogenous proteins (see, for example, discrepancies regarding initiator caspase activity reporters, discussed below; Albeck et al., 2008a; Hellwig et al., 2008; Hellwig et al., 2010). Flow cytometry, immunofluorescence, and single-cell PCR are also effective means to assay single cells, and biochemical experiments (immunoblotting or ELISAs for example) performed on populations of cells remain essential for quantitative biology. Although rarely addressed, effective integration of data arising from multiple measurement methods is an area in which computational models are likely to play a key role (Albeck et al., 2006). The construction and parameterization of even a well designed model do not lead directly to a better understanding of the system—model analysis is required. The dependence of the system on parameter values is of particular interest and can be approached using sensitivity analysis. Sensitivity analysis involves systematically varying parameters (initial conditions or rate constants) while monitoring the consequences for model output (the time at which a cell undergoes apoptosis, for example). Sensitivity analysis reveals which outputs are sensitive to variation in which parameters and can be viewed as the
computational equivalent of experiments that knock down or overexpress proteins while monitoring phenotype. For example, Hua et al. (2005) created an ODE model of Fas signaling and performed sensitivity analysis by varying the initial concentration of each protein species 10- or 100-fold above or below a baseline value. Using the half-time of caspase-3 activation as an output, they predicted (and confirmed experimentally) that increases but not decreases in Bcl-2 levels would alter sensitivity to FasL. From a practical perspective, sensitive parameters must be estimated with particular care if a model is to be reliable, but from a biological perspective, they represent possible means of regulation. Points in a network that exhibit extreme sensitivity to small perturbations are often referred to as ‘‘fragile’’ (the converse of ‘‘robust’’), and considerable interest exists in the idea that fragility analysis, a concept borrowed from control theory, might be applied to biological pathways. In this view, fragile points might identify processes frequently mutated in disease or potentially modifiable using therapeutic drugs (Luan et al., 2007). Stability analysis is another commonly used method of model analysis. Some models of biochemical networks have the interesting property of converging at equilibrium to a small set of stable states known as fixed points, where the rate of change in the concentrations of all model species is zero. Identification and characterization of fixed points can provide valuable insight into the dynamics of a system, its responses to perturbation, and the nature of regulatory mechanisms. Of particular interest in biology is bistability, a property in which a system of equations has two stable fixed points separated by an unstable fixed point. Bistability has obvious appeal in the case of apoptosis, in which cells are either alive or dead, and has been proposed to underlie
a variety of binary fate decisions such as maturation of Xenopus oocytes (Ferrell and Machleder, 1998) and lactose utilization in E. coli (Ozbudak et al., 2004). From the perspective of control, many bistable systems have two valuable properties: (1) they are insensitive to minor perturbations because the system is ‘‘attracted’’ to the nearest stable state (in apoptosis, a bistable system would be resistant to spontaneous activation of proapoptotic proteins, for example), and (2) they exhibit ‘‘all-or-none’’ transitions from one stable state to another in response to small changes in the level of a key regulatory input (a property known in biochemistry as ‘‘ultrasensitivity’’). Bistable processes often exhibit hysteresis (path dependence): once in the on state, they do not readily slip back to off. It is often assumed that the regulatory machinery for apoptosis must be bistable in the mathematical sense with one equilibrium state corresponding to caspases off and ‘‘alive’’ and the other to caspases on and ‘‘dead’’ (Figure 3A). Although bistability remains the favorite framework for thinking about the switch between life and death, bistability is not strictly necessary for a switch-like transition between two distinct states (Albeck et al., 2008b). A monostable system in which the landscape changes through time can create a temporal switch between two states; in this case, the change in the landscape involves the creation, destruction, or translocation of precisely those proteins (caspases, cytochrome c, etc.) that are known to regulate apoptosis. In this regard, it should be noted that the ‘‘sharpness’’ of a switch in a conventional bistable system refers to the steepness of the dose-response curve (to a change in the concentration of a regulatory protein, for example), not necessarily sharpness in time.
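The notion of two stable fixed points separated by an unstable one can be illustrated with a one-variable toy system. The sketch assumes a generic positive-feedback equation, dx/dt = basal + x^n / (K^n + x^n) - decay * x, which is a textbook form and not a model of any specific apoptotic protein; all parameter values are hypothetical. Fixed points are located by scanning for sign changes of dx/dt and refining by bisection.

```python
def rate(x, basal=0.02, hill_n=4, k_half=1.0, decay=0.5):
    """dx/dt for a hypothetical species that promotes its own production."""
    feedback = x**hill_n / (k_half**hill_n + x**hill_n)
    return basal + feedback - decay * x


def find_fixed_points(f, lo=0.0, hi=3.0, n_grid=3000, tol=1e-10):
    """Return (x, kind) pairs for each sign change of f on [lo, hi].

    A fixed point is stable when f changes from positive to negative
    (the system is pushed back toward it) and unstable otherwise.
    For brevity, exact zeros that fall on a grid point are not handled.
    """
    step = (hi - lo) / n_grid
    found = []
    for i in range(n_grid):
        left, right = lo + i * step, lo + (i + 1) * step
        f_left = f(left)
        if f_left * f(right) >= 0.0:
            continue  # no sign change in this interval
        kind = "stable" if f_left > 0.0 else "unstable"
        # Bisection: keep the sub-interval that still brackets the root.
        while right - left > tol:
            mid = 0.5 * (left + right)
            if f(mid) * f_left > 0.0:
                left = mid
            else:
                right = mid
        found.append((0.5 * (left + right), kind))
    return found


fixed_points = find_fixed_points(rate)
for x_star, kind in fixed_points:
    print(f"x* = {x_star:.3f} ({kind})")
```

With these parameters the scan finds a low stable state, a high stable state, and an unstable point between them that acts as the threshold separating their basins of attraction.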
In contrast, the ‘‘all-or-nothing’’ switch observed by time-lapse microscopy of cells undergoing apoptosis refers to a switch from alive to dead that is sharp in a temporal sense. These considerations do not imply that the biochemical pathways controlling apoptosis are not bistable systems, but rather that bistability is not necessary a priori.
Modeling and Measuring Receptor-Mediated Apoptosis
The first model of extrinsic apoptosis was published a decade ago and set the stage for subsequent work in the field. Fussenegger et al. (2000) used emerging understanding of MOMP and caspase activation by death receptors to assemble a simple ODE model. By increasing or decreasing the levels of pairs of proteins in the model, the authors determined which combinations promoted or blocked effector caspase activation, thereby providing insight into ratiometric control over cell death by caspase-3 and XIAP (Fussenegger et al., 2000). At the same time, the development of fluorescent reporters for MOMP and caspase substrate cleavage allowed several groups to collect data on the dynamics of apoptosis in single cells. These data showed that following exposure to inducers of either intrinsic or extrinsic apoptosis (UV light, actinomycin D, staurosporine, or TNF), cells wait for several hours before initiating a rapid chain of events that triggers MOMP and activates effector caspases (Goldstein et al., 2000a, 2000b; Tyas et al., 2000). This contrasts with data obtained by western blotting and other population-average biochemical assays that suggested that MOMP and caspase activation occur gradually over a period of several hours. The two types of data can be reconciled by noting that apoptosis is
sudden and switch-like in individual cells, but that it takes place at different times in different cells (Figure 3B) (Goldstein et al., 2000a; Goldstein et al., 2000b; Tyas et al., 2000).
All-or-None Control over Effector Caspase Activity
Goldstein et al. (2000b) used time-lapse imaging of cytochrome c translocation to obtain the first data on the kinetics of MOMP. They observed the time between proapoptotic insult and MOMP to vary depending on the type and strength of the stimulus (ranging from 4–20 hr following exposure to the pan-specific kinase inhibitor staurosporine and 9–17 hr following exposure to UV light), but the rate and extent of cytochrome c release were constant, taking 5 min to reach completion. Further understanding of the link between MOMP and caspase activation was made possible by the development of intramolecular Förster resonance energy transfer (FRET) reporters for caspase-mediated proteolysis. The first FRET reporters for monitoring caspase activity by time-lapse microscopy linked cyan fluorescent protein (CFP) to yellow fluorescent protein (YFP) using a polypeptide linker containing the amino acid sequence DEVD, a substrate for caspase-3 (CFP-DEVD-YFP) (Rehm et al., 2002; Tyas et al., 2000). Prior to reporter cleavage, CFP lies in close proximity to YFP, causing FRET between the two fluorescent proteins and reducing CFP emission. Following cleavage of the DEVD-containing linker, the efficiency of FRET drops dramatically, increasing the CFP to YFP fluorescence ratio. Time-lapse imaging of cells expressing CFP-DEVD-YFP revealed that caspase-3 is also activated rapidly, taking […] 100-fold molar excess of XIAP over caspase-3 would be required to ensure effective
inhibition of caspase-3 proteolytic activity over the course of a typical 2–6 hr pre-MOMP delay. The requirement for such a large excess of XIAP over caspase-3 arises because competitive inhibition is reversible whereas substrate cleavage is not and because substrates, which are abundant, are in competition with XIAP for access to the caspase catalytic site. As XIAP and caspase-3 are present at roughly equal concentrations in HeLa cells, simple competitive inhibition cannot be the sole inhibitory mechanism. XIAP is an E3 ligase able to promote ubiquitination and degradation of caspase-3, and simulation suggests that a combination of competitive inhibition and caspase degradation would constitute an effective means of regulation (Albeck et al., 2008a). Confirming these predictions, depletion of XIAP by RNAi or pharmacological inhibition of the proteasome was observed to cause
effector caspase activation prior to MOMP (Albeck et al., 2008a). Deletion of XIAP in the mouse or truncation of the ubiquitination-promoting RING domain also caused elevated caspase-3 activity and sensitivity to apoptosis (Schile et al., 2008), demonstrating a critical role for XIAP-mediated ubiquitination of caspase-3 in vivo. The pre-MOMP delay evidently constitutes a ‘‘latent’’ death state in which effector procaspases are actively processed by initiators but are held in check by XIAP until Smac is released during MOMP. The reasoning that led to this conclusion illustrates the value of making models explicit and analyzing them computationally: a biochemical mechanism that seems adequate on its face (Bcl-2-Bax binding in the rheostat model or competitive inhibition of caspase-3 by XIAP during the pre-MOMP delay) proves insufficient when actual protein levels and rates of reaction are taken into account. In this sense, quantitative analysis can fundamentally change our qualitative understanding of a regulatory mechanism. It should be noted, however, that current models of receptor-mediated apoptosis in type II cells cannot completely restrain pre-MOMP caspase-3 activity when experimentally measured procaspase-3 and XIAP concentrations are used. Although XIAP-mediated degradation of active caspase-3 is necessary, raising this degradation rate too much compromises the switch-like activation of effector caspase substrate cleavage post-MOMP. Reconciliation of all experimental observations awaits the development of more sophisticated and complete models. If XIAP is partially depleted by RNAi and MOMP is blocked by overexpression of Bcl-2, a sublethal level of effector caspase activity is generated and effector caspase substrates are only partially processed; moreover, incomplete cleavage of caspase-3 substrates does not necessarily cause cell death (at least in HeLa cells) (Albeck et al., 2008a).
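The argument that reversible competitive inhibition alone is insufficient, whereas inhibition combined with degradation can restrain caspase-3, can be sketched with a deliberately small toy model. This is not the model of Albeck et al. (2008a): it assumes constant generation of active caspase-3, reversible binding to XIAP present at a comparable level, irreversible substrate cleavage, and optional degradation of XIAP-bound caspase-3 with recycling of XIAP. All rate constants and concentrations are hypothetical and dimensionless, and integration uses a simple forward Euler scheme.

```python
def cleaved_fraction(k_deg, t_end=200.0, dt=0.01):
    """Fraction of substrate cleaved by t_end in a toy caspase-3/XIAP model.

    k_deg is the rate at which XIAP-bound caspase-3 is degraded
    (XIAP itself is recycled); k_deg = 0 means inhibition only.
    """
    k_act = 0.01              # generation of active caspase-3 by initiators
    k_on, k_off = 10.0, 0.1   # reversible caspase-3/XIAP binding
    k_cleave = 1.0            # irreversible substrate cleavage
    c3, xiap, bound, substrate = 0.0, 1.0, 0.0, 1.0
    for _ in range(int(round(t_end / dt))):
        bind = k_on * c3 * xiap - k_off * bound  # net complex formation
        degrade = k_deg * bound                  # loss of bound caspase-3
        cleave = k_cleave * c3 * substrate       # only free caspase-3 is active
        c3 += dt * (k_act - bind)
        xiap += dt * (degrade - bind)
        bound += dt * (bind - degrade)
        substrate -= dt * cleave
    return 1.0 - substrate


inhibition_only = cleaved_fraction(k_deg=0.0)
with_degradation = cleaved_fraction(k_deg=0.5)
print(f"inhibition only:  {inhibition_only:.2f} of substrate cleaved")
print(f"with degradation: {with_degradation:.2f} of substrate cleaved")
```

In the inhibition-only case, caspase-3 eventually exceeds the fixed pool of XIAP and the substrate is consumed. Adding degradation lets the same amount of XIAP act repeatedly, so free caspase-3 stays low throughout the simulated delay.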
Modeling and experiments with XIAP overexpression suggest three possible outcomes depending on XIAP levels: with [XIAP] < 0.15 μM, effector caspase substrate cleavage is complete; at [XIAP] > 0.30 μM, cleavage is fully inhibited; and at intermediate XIAP concentrations, slow submaximal effector caspase substrate cleavage occurs (Figure 3E) (Rehm et al., 2006). Thus, alteration of XIAP levels disrupts normal switch-like control over effector caspase activation and interferes with the normal link between caspase activation and cell killing. Activation of CAD in the absence of cell death is expected to be particularly problematic since it has the potential to cause genomic instability (Lovric and Hawkins, 2010) and has been proposed to be the trigger of the chromosomal translocations observed in some leukemias (Betti et al., 2005; Vaughan et al., 2002, 2005; Villalobos et al., 2006). The role of XIAP in restraining caspase-3 in the absence of MOMP makes it a central factor in controlling type I versus type II apoptosis. Jost et al. (2009) observed that inhibition of XIAP function by gene targeting or a Smac mimetic drug caused type II cells to adopt a type I phenotype. Bid deficiency protected hepatocytes and pancreatic β cells from FasL-induced apoptosis (fulfilling the definition of mitochondria-dependent type II death), but concomitant loss of XIAP (in Bid−/− XIAP−/− mice) restored FasL sensitivity, thereby demonstrating a switch to type I behavior (Jost et al., 2009). Type I cells are defined as not requiring MOMP for apoptosis, but blockade of the mitochondrial pathway via Bid depletion or Bcl-2 overexpression in
type I cells has been observed to reduce effector caspase activity and to increase the number of cells surviving TRAIL exposure (Maas et al., 2010). Both type I and type II pathways, therefore, appear to depend to a greater or lesser extent on the mitochondrial pathway, either for regulating XIAP and activating effector caspases or for killing cells by disrupting essential mitochondrial functions.
Determinants of the Timing and Probability of MOMP
Apoptosis proceeds at different rates in different cells, even among members of a clonal population. Some cells die within 45 min of exposure to FasL or TRAIL, whereas other cells in the same dish wait 12 hr or more. A simple way to conceptualize control over the timing of apoptosis in single cells is that the level of active receptor determines the amount of active caspase-8/10, which sets the rate of tBid cleavage and, thus, the rate of approach to a threshold that must be overcome for MOMP to occur (Figure 2C). The height of this threshold is set by the relative levels of competing pro- and antiapoptotic Bcl-2-family proteins (Chipuk and Green, 2008). We discuss below recent advances in our understanding of the MOMP threshold and return later to the determinants of the rate of approach to the threshold. Using fluorescent measurements in a purified in vitro system, Lovell et al. (2008) simultaneously measured the rates of three reactions leading to pore formation and determined the following order of events. First tBid binds rapidly to mitochondrial membranes where tBid and Bax interact, promoting insertion of Bax into the membrane, a rate-limiting step. Bax then oligomerizes to form pores, and membranes become permeable. In vitro, Bax oligomerization continues even after membranes are permeabilized (Lovell et al., 2008).
In cell culture, Bax multimerization is first detected immediately prior to MOMP and then continues for at least 30 min, ultimately generating many more Bax puncta or pores than the number required for MOMP (Albeck et al., 2008b; Dussmann et al., 2010). Formation of the first observable Bax (or Bak) puncta correlates temporally and spatially with the first subset of mitochondria to undergo MOMP. Pore formation and MOMP then spread through the cell as a wave with a velocity of 0.6 μm/s, a process that has been modeled using a PDE network (Rehm et al., 2009). The process of pore formation proceeds more rapidly at higher doses of TRAIL, presumably due to an increased rate of procaspase-8 activation (Rehm et al., 2009). However, it has recently been observed that in a subset of HeLa cells, MCF-7 cells, and murine embryonic fibroblasts, some mitochondria fail to undergo MOMP in response to diverse proapoptotic stimuli (actinomycin D, UV, staurosporine, or TNF). The subset of mitochondria that remain intact fail to accumulate GFP-Bax puncta but undergo complete MOMP when treated with the Bcl-2 antagonist and investigational therapeutic ABT-737, suggesting that resistance of mitochondria to MOMP lies at the point of Bax/Bak activation (Tait et al., 2010). These findings suggest that mitochondria in a single cell differ from each other with respect to their sensitivities to proapoptotic stimuli and that MOMP might not always be an all-or-none event at the single-cell level (Tait et al., 2010). Time-lapse imaging of initiator caspase and MOMP reporters shows that the height of the MOMP threshold varies from cell to cell. MOMP is triggered following cleavage of 10% of a reporter carrying one IETD recognition site (Hellwig et al., 2008, 2010) or
30%–60% of a reporter carrying two recognition sites (Albeck et al., 2008a, 2008b; Spencer et al., 2009). Variation in the height of the MOMP threshold from cell to cell (presumably arising from variation in the levels of Bcl-2-family proteins, see below) can most easily be resolved using the sensitized dual-IETD reporter (Figure 2C) and contributes 20% of the total variability in the time of death among HeLa cells in a clonal population exposed to 10 ng/ml TRAIL (Spencer et al., 2009). The remaining 80% of the variability appears to reflect differences in the rate of Bid cleavage, although these percentages are expected to change with stimulus and cell type. However, the precise dynamics of Bid cleavage have recently been thrown into some doubt: a FRET reporter containing full-length Bid rather than an artificial IETD caspase recognition site exhibits minimal cleavage prior to MOMP (Hellwig et al., 2010). One explanation for this discrepancy is that IETD-only reporters might be overly sensitive and not reflect the kinetics of endogenous substrate cleavage. In this view, cleavage of Bid by initiator caspases is subject to additional forms of regulation so that tBid does not accumulate until just before MOMP (Hellwig et al., 2010). Conversely, the Bid-containing FRET reporter might simply be insufficiently sensitive, and levels of tBid required for MOMP (estimated to be […] 2.5× higher protein expression compared to cells in the bottom 5th percentile (Niepel et al., 2009). The question then arises of whether such modest variation in protein levels is sufficient to explain the observed variation in the timing of cell death. Model-based simulation suggested that it is: when the distribution of cell death times was computed for TRAIL-induced apoptosis assuming log-normally distributed protein concentrations, a close match was observed between the variability in simulation and experiment (Spencer et al., 2009).
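The idea that modest, log-normally distributed variation can generate a broad distribution of death times can be sketched without a full ODE model. The example below replaces the mechanistic model with a much simpler assumption: each cell approaches a MOMP threshold at a constant rate, so its time to MOMP is threshold divided by rate. Both quantities are drawn from log-normal distributions whose medians and spreads are hypothetical. The spreads are chosen so that threshold height accounts for one fifth of the variance in log time, loosely echoing the split quoted above; they are not fitted values.

```python
import math
import random
import statistics

RATE_MEDIAN, RATE_SIGMA = 0.2, 0.30            # reporter fraction cleaved per hour
THRESHOLD_MEDIAN, THRESHOLD_SIGMA = 0.4, 0.15  # fraction cleaved at MOMP


def simulate_momp_times(n_cells, seed=1):
    """Draw a hypothetical time to MOMP (hours) for each cell in a population."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_cells):
        rate = rng.lognormvariate(math.log(RATE_MEDIAN), RATE_SIGMA)
        threshold = rng.lognormvariate(math.log(THRESHOLD_MEDIAN), THRESHOLD_SIGMA)
        times.append(threshold / rate)  # time needed to reach the threshold
    return times


momp_times = simulate_momp_times(n_cells=5000)
ordered = sorted(momp_times)
median_time = statistics.median(momp_times)
early, late = ordered[len(ordered) // 20], ordered[-len(ordered) // 20]
print(f"median time to MOMP: {median_time:.2f} hr")
print(f"5th to 95th percentile: {early:.2f} to {late:.2f} hr")

# Share of the variance in log(time) contributed by threshold height.
threshold_share = THRESHOLD_SIGMA**2 / (THRESHOLD_SIGMA**2 + RATE_SIGMA**2)
print(f"threshold share of log-time variance: {threshold_share:.0%}")
```

Because the ratio of two independent log-normal variables is itself log-normal, the resulting distribution of times is right-skewed, with a tail of late-dying cells.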
In the absence of any simple experimental test, the match between simulation and measurement increases our confidence in the hypothesis that natural variation in the levels of apoptotic regulators is responsible for variability in the time and probability of cell death. Is it possible to establish a direct link between the levels of any single protein and the probability and timing of apoptosis? In principle such a measurement could be made by fluorescently tagging proteins of interest at the endogenous locus and then relating their levels to time of death using live-cell microscopy. However, mathematical modeling suggests that achieving reasonable predictability over cell fate would require single-cell measurement of many protein levels (as well as some posttranslational modifications), a difficult task. Alternatively, simulation suggests that predictability can be achieved by measuring the rates of critical reactions, such as the processing of caspase-8 substrates. Because this rate depends on the levels of multiple upstream proteins, measuring it is much more informative than simply knowing protein levels (Spencer et al., 2009). This conclusion implies a fundamental limit to our ability to predict cell fate based on single-cell proteomics.
Conclusions and Future Prospects
Key goals for a combined model- and experiment-driven analysis of apoptosis are to understand how multiple cooperating and competing signals are integrated to effectively execute a binary death-survival decision, to determine why some processes and proteins are important in one cell type and not in another, and to predict the responses of cells to death ligands and chemotherapy drugs. A review of the literature thus far suggests that these goals remain largely unfulfilled. Skeptics will argue that quantitative analysis can only add details to
existing conceptual frameworks or that mathematical models are too theoretical and too dependent on assumptions to be useful (although drawing a pathway diagram may involve just as many assumptions). A more generous and realistic assessment would be that mechanistic modeling of apoptosis has had an impact in motivating the collection and analysis of quantitative single-cell data, critically evaluating potential regulatory mechanisms, and investigating the origins of cell-to-cell variability.
Technical Challenges
Addressing the long-term goals of quantitative, model-driven biology will require major conceptual and technical advances. Most computational tools currently in use have been adapted from other fields, but understanding a biological system is nothing like fixing a radio. Cells are not well-mixed systems as encountered in chemistry, nor are they easily understood in terms of fundamental physical laws or obviously subject to the design principles (such as modularity) encountered in engineered systems. They resemble all of these to some extent, but systems biology is currently immersed in the uncharted process of working out which concepts from chemistry, physics, and engineering are most useful in understanding cells and tissues. It is already evident that different research groups will continue to build models differing in scope and level of detail and customized to the biological questions being addressed. Current approaches to model building typically involve de novo creation of complex sets of equations in each paper. A lack of transparency in the underlying assumptions makes it difficult for practitioners, never mind the general research community, to understand how models differ from each other. Fortunately, ‘‘rules-based’’ modeling methods now in development promise to address the issue of model reusability and intelligibility (Faeder et al., 2009; Hlavacek et al., 2006).
More rigorous means for linking models to experimental data and for understanding which aspects of a model are supported by data are required. Progress in this area is slow, but the basic principles are understood in the context of engineering and the physical sciences (Jaqaman and Danuser, 2006). Finally, we must work to ensure basic familiarity with dynamical systems among trainees. It is widely accepted that a working knowledge of statistical methods such as clustering and regression is essential in contemporary biomedicine, but it is unfortunate that few students are taught that familiar Michaelis-Menten equations are simply approximations to a mass-action formalism written as networks of ODEs (Chen et al., 2010).
Biological Challenges
Cancer pharmacology is the area of translational medicine in which models of apoptosis are most obviously of value. Critical questions in the development of rational and personalized treatment of cancer involve understanding precisely how anticancer drugs induce apoptosis, why the extent of cell killing varies so dramatically from one tumor to the next, and how we can predict response to chemotherapy, both ‘‘targeted’’ and cytotoxic. As yet no quantitative, model-based studies of these issues have been reported, but it seems almost certain that sensitivity and resistance will be controlled in a multifactorial manner. Genes and proteins that are important in one cellular setting
will not be significant in another. In the case of TRAIL, for example, conventional molecular approaches have implicated the levels of O-glycosylation enzymes (GALNT3, GALNT14; Wagner et al., 2007), TRAIL decoy receptors (DcR1, DcR2, and osteoprotegerin), c-FLIP, Bcl-xL, and inhibitor of apoptosis proteins (IAPs) in TRAIL resistance in different cell lines (reviewed in Zhang and Fang, 2005). It is likely that all of these explanations are correct to some degree, and the key task therefore becomes understanding the role of context. This is precisely where models hold great promise, as they are able to quantify and weigh the contributions of multiple factors. Such context sensitivity could be implemented by using a model in which the topology and rate constants remain the same for all cell types but protein concentrations (initial conditions) are altered to match experimentally measured protein levels. Ultimately, we need to understand the regulation of apoptosis in the context of real human tissues and tumors. Because mechanistic modeling is dependent on quantitative, multiplex data, this will not be straightforward, even in model organisms. New in vivo caspase activity probes (Edgington et al., 2009) and high-resolution intravital microscopy (Condeelis and Weissleder, 2010) will play an important role in data acquisition in vivo, but it also seems probable that the development of mechanistic models able to store, simulate, and rationalize results obtained across a panel of cancer cell lines will be essential. Such context-sensitive modeling might uncover a multifactorial measurement that could be made on real human tumors. Expression profiling and cancer genome sequencing also aspire to personalize cancer therapy, but the framework we envision is complementary in focusing on biochemical mechanism.
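The context-sensitive strategy described above (one topology and one set of rate constants, with only the initial protein concentrations changing between cell types) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy reaction scheme (an enzyme E converting a substrate S to a product P while being trapped by an inhibitor I), the rate constants, and the three hypothetical "cell lines" are placeholders rather than measured values or a published apoptosis model.

```python
# Toy mass-action model. All species, rate constants, and cell lines here are
# hypothetical placeholders chosen for illustration, not measured values.

RATES = {"kf": 1.0, "kr": 0.1, "kcat": 0.5, "ki": 1.0}  # shared by every cell type


def derivatives(state, k):
    """Mass-action rates for E + S <-> ES -> E + P and E + I -> EI."""
    E, S, ES, P, I = state
    bind = k["kf"] * E * S
    unbind = k["kr"] * ES
    cat = k["kcat"] * ES
    inhibit = k["ki"] * E * I
    return (
        -bind + unbind + cat - inhibit,  # dE/dt
        -bind + unbind,                  # dS/dt
        bind - unbind - cat,             # dES/dt
        cat,                             # dP/dt
        -inhibit,                        # dI/dt
    )


def rk4_step(state, k, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = derivatives(state, k)
    k2 = derivatives(shifted(state, k1, dt / 2), k)
    k3 = derivatives(shifted(state, k2, dt / 2), k)
    k4 = derivatives(shifted(state, k3, dt), k)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, t_end=20.0, dt=0.01):
    """Integrate the shared model from cell-type-specific initial conditions."""
    state = (initial["E"], initial["S"], 0.0, 0.0, initial["I"])
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, RATES, dt)
    return dict(zip(("E", "S", "ES", "P", "I"), state))


# Only the initial protein levels differ between the hypothetical cell lines.
CELL_LINES = {
    "line_A": {"E": 0.1, "S": 1.0, "I": 0.0},
    "line_B": {"E": 0.1, "S": 1.0, "I": 0.5},
    "line_C": {"E": 0.1, "S": 1.0, "I": 5.0},
}

if __name__ == "__main__":
    for name, init in CELL_LINES.items():
        final = simulate(init)
        print(name, "fraction of substrate converted:", round(final["P"] / init["S"], 3))
```

Because the rate constants live in one shared dictionary, any difference in outcome between the hypothetical lines can be traced to their initial conditions alone, which is the property that makes such a model comparable across cell types.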
A multiplex measurement method (BH3 profiling) already exists to estimate the propensity of cells to undergo apoptosis; it involves permeabilizing cells and then monitoring their responses to diverse BH3-only peptides (Deng et al., 2007). BH3 profiling can predict sensitivity to conventional chemotherapies and to the Bcl-2/Bcl-xL antagonist ABT-737 (Deng et al., 2007). It would be valuable to construct a predictive mathematical framework for BH3 profiling and thereby generate precise mechanistic understanding of drug sensitivity and resistance that could be translated clinically. Single-cell analysis of cellular responses to FasL and TRAIL has highlighted the dramatic impact of cell-to-cell variability in determining the timing and probability of response. That cells surviving exposure to a death ligand or cytotoxic drug can resume normal proliferation is a testament to the ‘‘stiff trigger,’’ ‘‘all-or-nothing’’ nature of the apoptotic switch. Cells that cross the threshold for MOMP are normally fully committed to die, whereas cells that remain below it can recover and continue to proliferate. In the case of receptor-mediated apoptosis, the presence of a dose-dependent variable delay preceding MOMP followed by a dose-independent and nearly invariant post-MOMP period likely reflects the evolutionary advantages of such a system. Variability in the timing and probability of apoptosis makes it possible for a uniform population of cells to respond to a prodeath stimulus in a graded manner, even though the response is binary at the single-cell level. In contrast, by undergoing MOMP and effector caspase activation in a rapid and invariant way, cells avoid the highly deleterious effects of
initiating but not completing apoptosis; these effects include formation of ‘‘undead’’ cells with damaged genomes. Variability in response appears to be universal across diverse cell lines and proapoptotic stimuli (Cohen et al., 2008; Gascoigne and Taylor, 2008; Geva-Zatorsky et al., 2006; Orth et al., 2008; Sharma et al., 2010; Spencer et al., 2009; Huang et al., 2010). For example, Gascoigne and Taylor (2008) characterized the response of 15 cell lines to three different classes of antimitotic drugs and found significant inter- and intra-cell line variation, with cells exhibiting multiple distinct phenotypes in response to the same treatment. Cohen et al. (2008) correlated variability in the levels of two proteins with the life-or-death response to the cancer drug camptothecin. Most recently, Sharma et al. (2010) detected a small subpopulation of reversibly ‘‘drug-tolerant’’ cells following treatment with cisplatin or the epidermal growth factor receptor inhibitor erlotinib. The significance of these findings is that cancer therapy is beset by the problem of fractional, or incomplete, killing of tumor cells. Multiple explanations have been proposed for fractional killing, including drug insensitivity during certain phases of the cell cycle, genetic heterogeneity, incomplete access of tumor to drug (Chabner and Longo, 2006; Skeel, 2003), and the existence of drug-resistant cancer stem cells (Reya et al., 2001). Single-cell imaging and computational modeling of apoptosis have added to this list cell-to-cell variability in protein levels arising from stochasticity in protein expression (Spencer et al., 2009). A critical task for the future will be to ascertain the relative importance of these processes in determining the extent of fractional killing with real tumors and therapeutic protocols. 
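The idea that a population can respond in a graded way while each cell responds in an all-or-nothing way, and that variability in protein levels alone can produce fractional killing, can be made concrete with a small sketch. The assumptions are ours and are deliberately crude: each cell is given a hypothetical MOMP threshold drawn from a log-normal distribution (standing in for cell-to-cell differences in protein levels), the delay before MOMP is taken to be the threshold divided by the dose, and a cell dies only if MOMP occurs within the observation window.

```python
import random


def sample_thresholds(n_cells, sigma=0.5, seed=0):
    """Draw one hypothetical MOMP threshold per cell (log-normal, median 1)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n_cells)]


def time_to_momp(threshold, dose):
    """Toy dose-dependent delay: a stronger stimulus reaches the threshold sooner."""
    return threshold / dose


def cell_dies(threshold, dose, window):
    """Binary single-cell outcome: death only if MOMP occurs within the window."""
    return time_to_momp(threshold, dose) <= window


def fraction_killed(thresholds, dose, window=1.0):
    """Graded population outcome: the fraction of cells that die at this dose."""
    dead = sum(cell_dies(th, dose, window) for th in thresholds)
    return dead / len(thresholds)


if __name__ == "__main__":
    cells = sample_thresholds(2000)
    for dose in (0.25, 0.5, 1.0, 2.0, 4.0):
        print("dose", dose, "fraction killed", round(fraction_killed(cells, dose), 3))
```

In this sketch, incomplete killing at intermediate doses arises purely from the spread of thresholds. The other proposed explanations (cell-cycle phase, genetic heterogeneity, drug access, resistant stem cells) are not represented, so it says nothing about their relative importance.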
Because a wide variety of biochemical processes are involved, all operating on different timescales, developing an appropriate quantitative framework will be a key step to better understanding.
ACKNOWLEDGMENTS
The authors thank J. Albeck, J. Bachman, D. Flusberg, S. Gaudet, T. Letai, and C. Lopez for their help and acknowledge NIH grants GM68762 and CA139980 for support.
REFERENCES
Betti, C.J., Villalobos, M.J., Jiang, Q., Cline, E., Diaz, M.O., Loredo, G., and Vaughan, A.T. (2005). Cleavage of the MLL gene by activators of apoptosis is independent of topoisomerase II activity. Leukemia 19, 2289–2295. Bhola, P.D., and Simon, S.M. (2009). Determinism and divergence of apoptosis susceptibility in mammalian cells. J. Cell Sci. 122, 4296–4302. Borges, J.L., and Hurley, A. (1999). Collected Fictions (London: Allen Lane The Penguin Press). Calzone, L., Tournier, L., Fourquet, S., Thieffry, D., Zhivotovsky, B., Barillot, E., and Zinovyev, A. (2010). Mathematical modelling of cell-fate decision in response to death receptor engagement. PLoS Comput. Biol. 6, e1000702. Chabner, B., and Longo, D.L. (2006). Cancer Chemotherapy and Biotherapy: Principles and Practice, Fourth Edition (Philadelphia: Lippincott Williams & Wilkins). Chen, C., Cui, J., Lu, H., Wang, R., Zhang, S., and Shen, P. (2007). Modeling of the role of a Bax-activation switch in the mitochondrial apoptosis decision. Biophys. J. 92, 4304–4315. Chen, W.W., Niepel, M., and Sorger, P.K. (2010). Classic and contemporary approaches to modeling biochemical reactions. Genes Dev. 24, 1861–1875. Chipuk, J.E., Bouchier-Hayes, L., and Green, D.R. (2006). Mitochondrial outer membrane permeabilization during apoptosis: the innocent bystander scenario. Cell Death Differ. 13, 1396–1402. Chipuk, J.E., and Green, D.R. (2008). How do BCL-2 proteins induce mitochondrial outer membrane permeabilization? Trends Cell Biol. 18, 157–164. Cohen, A.A., Geva-Zatorsky, N., Eden, E., Frenkel-Morgenstern, M., Issaeva, I., Sigal, A., Milo, R., Cohen-Saidon, C., Liron, Y., Kam, Z., et al. (2008). Dynamic proteomics of individual cancer cells in response to a drug. Science 322, 1511–1516. Condeelis, J., and Weissleder, R. (2010). In vivo imaging in cancer. Cold Spring Harb. Perspect. Biol. 2, a003848. Deng, J., Carlson, N., Takeyama, K., Dal Cin, P., Shipp, M., and Letai, A. (2007).
BH3 profiling identifies three distinct classes of apoptotic blocks to predict response to ABT-737 and conventional chemotherapeutic agents. Cancer Cell 12, 171–185. Dussmann, H., Rehm, M., Concannon, C.G., Anguissola, S., Wurstle, M., Kacmar, S., Voller, P., Huber, H.J., and Prehn, J.H. (2010). Single-cell quantification of Bax activation and mathematical modelling suggest pore formation on minimal mitochondrial Bax accumulation. Cell Death Differ. 17, 278–290. Edgington, L.E., Berger, A.B., Blum, G., Albrow, V.E., Paulick, M.G., Lineberry, N., and Bogyo, M. (2009). Noninvasive optical imaging of apoptosis by caspase-targeted activity-based probes. Nat. Med. 15, 967–973. Eissing, T., Conzelmann, H., Gilles, E.D., Allgower, F., Bullinger, E., and Scheurich, P. (2004). Bistability analyses of a caspase activation model for receptor-induced apoptosis. J. Biol. Chem. 279, 36892–36897.
Albeck, J., Macbeath, G., White, F., Sorger, P., Lauffenburger, D., and Gaudet, S. (2006). Collecting and organizing systematic sets of protein data. Nat. Rev. Mol. Cell Biol. 7, 803–812.
Eissing, T., Allgower, F., and Bullinger, E. (2005). Robustness properties of apoptosis models with respect to parameter variations and intrinsic noise. Syst. Biol. (Stevenage) 152, 221–228.
Albeck, J.G., Burke, J.M., Aldridge, B.B., Zhang, M., Lauffenburger, D.A., and Sorger, P.K. (2008a). Quantitative analysis of pathways controlling extrinsic apoptosis in single cells. Mol. Cell 30, 11–25.
Fadeel, B., Orrenius, S., and Zhivotovsky, B. (1999). Apoptosis in human disease: a new skin for the old ceremony? Biochem. Biophys. Res. Commun. 266, 699–717.
Albeck, J.G., Burke, J.M., Spencer, S.L., Lauffenburger, D.A., and Sorger, P.K. (2008b). Modeling a snap-action, variable-delay switch controlling extrinsic cell death. PLoS Biol. 6, 2831–2852.
Faeder, J.R., Blinov, M.L., and Hlavacek, W.S. (2009). Rule-based modeling of biochemical systems with BioNetGen. Methods Mol. Biol. 500, 113–167.
Ashall, L., Horton, C.A., Nelson, D.E., Paszek, P., Harper, C.V., Sillitoe, K., Ryan, S., Spiller, D.G., Unitt, J.F., Broomhead, D.S., et al. (2009). Pulsatile stimulation determines timing and specificity of NF-kappaB-dependent transcription. Science 324, 242–246.
Falschlehner, C., Emmerich, C.H., Gerlach, B., and Walczak, H. (2007). TRAIL signalling: decisions between life and death. Int. J. Biochem. Cell Biol. 39, 1462–1475. Ferrell, J.E., Jr., and Machleder, E.M. (1998). The biochemical basis of an all-or-none cell fate switch in Xenopus oocytes. Science 280, 895–898.
Bagci, E.Z., Vodovotz, Y., Billiar, T.R., Ermentrout, G.B., and Bahar, I. (2006). Bistability in apoptosis: roles of bax, bcl-2, and mitochondrial permeability transition pores. Biophys. J. 90, 1546–1559.
Fricker, N., Beaudouin, J., Richter, P., Eils, R., Krammer, P.H., and Lavrik, I.N. (2010). Model-based dissection of CD95 signaling dynamics reveals both a pro- and antiapoptotic role of c-FLIPL. J. Cell Biol. 190, 377–389.
Bentele, M., Lavrik, I., Ulrich, M., Stosser, S., Heermann, D.W., Kalthoff, H., Krammer, P.H., and Eils, R. (2004). Mathematical modeling reveals threshold mechanism in CD95-induced apoptosis. J. Cell Biol. 166, 839–851.
Friedman, N., Cai, L., and Xie, X.S. (2006). Linking stochastic dynamics to population distribution: an analytical framework of gene expression. Phys. Rev. Lett. 97, 168302.
Fuentes-Prior, P., and Salvesen, G.S. (2004). The protein structures that shape caspase activity, specificity, activation and inhibition. Biochem. J. 384, 201–232.
Krishna, S., Banerjee, B., Ramakrishnan, T.V., and Shivashankar, G.V. (2005). Stochastic simulations of the origins and implications of long-tailed distributions in gene expression. Proc. Natl. Acad. Sci. USA 102, 4771–4776.
Fussenegger, M., Bailey, J.E., and Varner, J. (2000). A mathematical model of caspase function in apoptosis. Nat. Biotechnol. 18, 768–774.
Lavrik, I.N., Golks, A., Riess, D., Bentele, M., Eils, R., and Krammer, P.H. (2007). Analysis of CD95 threshold signaling: triggering of CD95 (FAS/ APO-1) at low concentrations primarily results in survival signaling. J. Biol. Chem. 282, 13664–13671.
Gascoigne, K.E., and Taylor, S.S. (2008). Cancer cells display profound intra- and interline variation following prolonged exposure to antimitotic drugs. Cancer Cell 14, 111–122. Geva-Zatorsky, N., Rosenfeld, N., Itzkovitz, S., Milo, R., Sigal, A., Dekel, E., Yarnitzky, T., Liron, Y., Polak, P., Lahav, G., et al. (2006). Oscillations and variability in the p53 system. Mol. Syst. Biol. 2, 2006.0033. Gillespie, D.T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361. Goldstein, J.C., Kluck, R.M., and Green, D.R. (2000a). A single cell analysis of apoptosis. Ordering the apoptotic phenotype. Ann. N Y Acad. Sci. 926, 132–141. Goldstein, J.C., Waterhouse, N.J., Juin, P., Evan, G.I., and Green, D.R. (2000b). The coordinate release of cytochrome c during apoptosis is rapid, complete and kinetically invariant. Nat. Cell Biol. 2, 156–162. Goldstein, J.C., Munoz-Pinedo, C., Ricci, J.E., Adams, S.R., Kelekar, A., Schuler, M., Tsien, R.Y., and Green, D.R. (2005). Cytochrome c is released in a single step during apoptosis. Cell Death Differ. 12, 453–462. Gonzalvez, F., and Ashkenazi, A. (2010). New insights into apoptosis signaling by Apo2L/TRAIL. Oncogene 29, 4752–4765. Green, D.R. (2011). Means to an End: Apoptosis and Other Cell Death Mechanisms (Cold Spring Harbor, N.Y: Cold Spring Harbor Laboratory Press). Hellwig, C.T., Kohler, B.F., Lehtivarjo, A.K., Dussmann, H., Courtney, M.J., Prehn, J.H., and Rehm, M. (2008). Real time analysis of tumor necrosis factor-related apoptosis-inducing ligand/cycloheximide-induced caspase activities during apoptosis initiation. J. Biol. Chem. 283, 21676–21685. Hellwig, C.T., Ludwig-Galezowska, A.H., Concannon, C.G., Litchfield, D.W., Prehn, J.H., and Rehm, M. (2010). Activity of protein kinase CK2 uncouples Bid cleavage from caspase-8 activation. J. Cell Sci. 123, 1401–1406. Hengartner, M.O. (2000). The biochemistry of apoptosis. Nature 407, 770–776.
Hlavacek, W.S., Faeder, J.R., Blinov, M.L., Posner, R.G., Hucka, M., and Fontana, W. (2006). Rules for modeling signal-transduction systems. Sci. STKE 2006, re6. Ho, K.L., and Harrington, H.A. (2010). Bistability in apoptosis by receptor clustering. PLoS Comput. Biol. 6, e1000956. Hoffmann, A., Levchenko, A., Scott, M.L., and Baltimore, D. (2002). The IkappaB-NF-kappaB signaling module: temporal control and selective gene activation. Science 298, 1241–1245. Hua, F., Cornejo, M.G., Cardone, M.H., Stokes, C.L., and Lauffenburger, D.A. (2005). Effects of Bcl-2 levels on Fas signaling-induced caspase-3 activation: molecular genetic tests of computational model predictions. J. Immunol. 175, 985–995. Huang, H.C., Mitchison, T.J., and Shi, J. (2010). Stochastic competition between mechanistically independent slippage and death pathways determines cell fate during mitotic arrest. PLoS One 5, e15724. Jaqaman, K., and Danuser, G. (2006). Linking data to models: data regression. Nat. Rev. Mol. Cell Biol. 7, 813–819. Jost, P.J., Grabow, S., Gray, D., McKenzie, M.D., Nachbur, U., Huang, D.C., Bouillet, P., Thomas, H.E., Borner, C., Silke, J., et al. (2009). XIAP discriminates between type I and type II FAS-induced apoptosis. Nature 460, 1035–1039. Kaufmann, S.H., and Earnshaw, W.C. (2000). Induction of apoptosis by cancer chemotherapy. Exp. Cell Res. 256, 42–49.
Lee, T.K., Denny, E.M., Sanghvi, J.C., Gaston, J.E., Maynard, N.D., Hughey, J.J., and Covert, M.W. (2009). A noisy paracrine signal determines the cellular NF-kappaB response to lipopolysaccharide. Sci. Signal. 2, ra65. Legewie, S., Bluthgen, N., and Herzel, H. (2006). Mathematical modeling identifies inhibitors of apoptosis as mediators of positive feedback and bistability. PLoS Comput. Biol. 2, e120. Letai, A.G. (2008). Diagnosing and exploiting cancer’s addiction to blocks in apoptosis. Nat. Rev. Cancer 8, 121–132. Lovell, J.F., Billen, L.P., Bindner, S., Shamas-Din, A., Fradin, C., Leber, B., and Andrews, D.W. (2008). Membrane binding by tBid initiates an ordered series of events culminating in membrane permeabilization by Bax. Cell 135, 1074–1084. Lovric, M.M., and Hawkins, C.J. (2010). TRAIL treatment provokes mutations in surviving cells. Oncogene 29, 5048–5060. Luan, D., Zai, M., and Varner, J.D. (2007). Computationally derived points of fragility of a human cascade are consistent with current therapeutic strategies. PLoS Comput. Biol. 3, e142. Luo, K.Q., Yu, V.C., Pu, Y., and Chang, D.C. (2003). Measuring dynamics of caspase-8 activation in a single living HeLa cell during TNFalpha-induced apoptosis. Biochem. Biophys. Res. Commun. 304, 217–222. Maas, C., Verbrugge, I., de Vries, E., Savich, G., van de Kooij, L.W., Tait, S.W., and Borst, J. (2010). Smac/DIABLO release from mitochondria and XIAP inhibition are essential to limit clonogenicity of type I tumor cells after TRAIL receptor stimulation. Cell Death Differ. 17, 1613–1623. Madesh, M., Antonsson, B., Srinivasula, S.M., Alnemri, E.S., and Hajnoczky, G. (2002). Rapid kinetics of tBid-induced cytochrome c and Smac/DIABLO release and mitochondrial depolarization. J. Biol. Chem. 277, 5651–5659. Nelson, D.E., Ihekwaba, A.E., Elliott, M., Johnson, J.R., Gibney, C.A., Foreman, B.E., Nelson, G., See, V., Horton, C.A., Spiller, D.G., et al. (2004).
Oscillations in NF-kappaB signaling control the dynamics of gene expression. Science 306, 704–708. Neumann, L., Pforr, C., Beaudouin, J., Pappa, A., Fricker, N., Krammer, P.H., Lavrik, I.N., and Eils, R. (2010). Dynamics within the CD95 death-inducing signaling complex decide life and death of cells. Mol. Syst. Biol. 6, 352. Niepel, M., Spencer, S.L., and Sorger, P.K. (2009). Non-genetic cell-to-cell variability and the consequences for pharmacology. Curr. Opin. Chem. Biol. 13, 556–561. Orth, J.D., Tang, Y., Shi, J., Loy, C.T., Amendt, C., Wilm, C., Zenke, F.T., and Mitchison, T.J. (2008). Quantitative live imaging of cancer and normal cells treated with Kinesin-5 inhibitors indicates significant differences in phenotypic responses and cell fate. Mol. Cancer Ther. 7, 3480–3489. Ozbudak, E.M., Thattai, M., Lim, H.N., Shraiman, B.I., and Van Oudenaarden, A. (2004). Multistability in the lactose utilization network of Escherichia coli. Nature 427, 737–740. Raj, A., and van Oudenaarden, A. (2008). Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216–226. Rehm, M., Dussmann, H., Janicke, R.U., Tavare, J.M., Kogel, D., and Prehn, J.H. (2002). Single-cell fluorescence resonance energy transfer analysis demonstrates that caspase activation during apoptosis is a rapid process. Role of caspase-3. J. Biol. Chem. 277, 24506–24514.
Kim, K.A., Spencer, S.L., Albeck, J.G., Burke, J.M., Sorger, P.K., Gaudet, S., and Kim do, H. (2010). Systematic calibration of a cell signaling network model. BMC Bioinformatics 11, 202.
Rehm, M., Dussmann, H., and Prehn, J.H. (2003). Real-time single cell analysis of Smac/DIABLO release during apoptosis. J. Cell Biol. 162, 1031–1043.
Korsmeyer, S.J., Shutter, J.R., Veis, D.J., Merry, D.E., and Oltvai, Z.N. (1993). Bcl-2/Bax: a rheostat that regulates an anti-oxidant pathway and cell death. Semin. Cancer Biol. 4, 327–332.
Rehm, M., Huber, H.J., Dussmann, H., and Prehn, J.H. (2006). Systems analysis of effector caspase activation and its control by X-linked inhibitor of apoptosis protein. EMBO J. 25, 4338–4349.
Reya, T., Morrison, S.J., Clarke, M.F., and Weissman, I.L. (2001). Stem cells, cancer, and cancer stem cells. Nature 414, 105–111.
Thornberry, N.A., Rano, T.A., Peterson, E.P., Rasper, D.M., Timkey, T., Garcia-Calvo, M., Houtzager, V.M., Nordstrom, P.A., Roy, S., Vaillancourt, J.P., et al. (1997). A combinatorial approach defines specificities of members of the caspase family and granzyme B. Functional relationships established for key mediators of apoptosis. J. Biol. Chem. 272, 17907–17911.
Scaffidi, C., Fulda, S., Srinivasan, A., Friesen, C., Li, F., Tomaselli, K.J., Debatin, K.M., Krammer, P.H., and Peter, M.E. (1998). Two CD95 (APO-1/Fas) signaling pathways. EMBO J. 17, 1675–1687.
Tyas, L., Brophy, V.A., Pope, A., Rivett, A.J., and Tavare, J.M. (2000). Rapid caspase-3 activation during apoptosis revealed using fluorescence-resonance energy transfer. EMBO Rep. 1, 266–270.
Rehm, M., Huber, H.J., Hellwig, C.T., Anguissola, S., Dussmann, H., and Prehn, J.H. (2009). Dynamics of outer mitochondrial membrane permeabilization during apoptosis. Cell Death Differ. 16, 613–623.
Schile, A.J., Garcia-Fernandez, M., and Steller, H. (2008). Regulation of apoptosis by XIAP ubiquitin-ligase activity. Genes Dev. 22, 2256–2266. Schutze, S., Tchikov, V., and Schneider-Brachert, W. (2008). Regulation of TNFR1 and CD95 signalling by receptor compartmentalization. Nat. Rev. Mol. Cell Biol. 9, 655–662. Sharma, S.V., Lee, D.Y., Li, B., Quinlan, M.P., Takahashi, F., Maheswaran, S., McDermott, U., Azizian, N., Zou, L., Fischbach, M.A., et al. (2010). A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell 141, 69–80. Sigal, A., Milo, R., Cohen, A., Geva-Zatorsky, N., Klein, Y., Liron, Y., Rosenfeld, N., Danon, T., Perzov, N., and Alon, U. (2006). Variability and memory of protein levels in human cells. Nature 444, 643–646. Skeel, R.T. (2003). Handbook of Cancer Chemotherapy, Sixth Edition (Philadelphia: Lippincott Williams & Wilkins).
Vaughan, A.T., Betti, C.J., and Villalobos, M.J. (2002). Surviving apoptosis. Apoptosis 7, 173–177. Vaughan, A.T., Betti, C.J., Villalobos, M.J., Premkumar, K., Cline, E., Jiang, Q., and Diaz, M.O. (2005). Surviving apoptosis: a possible mechanism of benzene-induced leukemia. Chem. Biol. Interact. 153-154, 179–185. Villalobos, M.J., Betti, C.J., and Vaughan, A.T. (2006). Detection of DNA double-strand breaks and chromosome translocations using ligation-mediated PCR and inverse PCR. Methods Mol. Biol. 314, 109–121. von Ahsen, O., Renken, C., Perkins, G., Kluck, R.M., Bossy-Wetzel, E., and Newmeyer, D.D. (2000). Preservation of mitochondrial structure and function after Bid- or Bax-mediated cytochrome c release. J. Cell Biol. 150, 1027–1036.
Spencer, S.L., Gaudet, S., Albeck, J.G., Burke, J.M., and Sorger, P.K. (2009). Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature 459, 428–432.
Wagner, K.W., Punnoose, E.A., Januario, T., Lawrence, D.A., Pitti, R.M., Lancaster, K., Lee, D., von Goetz, M., Yee, S.F., Totpal, K., et al. (2007). Death-receptor O-glycosylation controls tumor-cell sensitivity to the proapoptotic ligand Apo2L/TRAIL. Nat. Med. 13, 1070–1077.
Stennicke, H.R., Renatus, M., Meldal, M., and Salvesen, G.S. (2000). Internally quenched fluorescent peptide substrates disclose the subsite preferences of human caspases 1, 3, 6, 7 and 8. Biochem. J. 350, 563–568.
Wurstle, M.L., Laussmann, M.A., and Rehm, M. (2010). The caspase-8 dimerization/dissociation balance is a highly potent regulator of caspase-8, -3, -6 signaling. J. Biol. Chem. 285, 33209–33218.
Sun, T., Lin, X., Wei, Y., Xu, Y., and Shen, P. (2010). Evaluating bistability of Bax activation switch. FEBS Lett. 584, 954–960. Tait, S.W., Parsons, M.J., Llambi, F., Bouchier-Hayes, L., Connell, S., Munoz-Pinedo, C., and Green, D.R. (2010). Resistance to caspase-independent cell death requires persistence of intact mitochondria. Dev. Cell 18, 802–813.
Zhang, L., and Fang, B. (2005). Mechanisms of resistance to TRAIL-induced apoptosis in cancer. Cancer Gene Ther. 12, 228–237. Zheng, Q., and Ross, J. (1991). Comparison of deterministic and stochastic kinetics for nonlinear systems. J. Chem. Phys. 94, 3644–3648.
Leading Edge
Review
Control of the Embryonic Stem Cell State
Richard A. Young1,2,*
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.01.032
Embryonic stem cells and induced pluripotent stem cells hold great promise for regenerative medicine. These cells can be propagated in culture in an undifferentiated state but can be induced to differentiate into specialized cell types. Moreover, these cells provide a powerful model system for studies of cellular identity and early mammalian development. Recent studies have provided insights into the transcriptional control of embryonic stem cell state, including the regulatory circuitry underlying pluripotency. These studies have, as a consequence, uncovered fundamental mechanisms that control mammalian gene expression, connect gene expression to chromosome structure, and contribute to human disease.
Introduction
Embryonic stem cells (ESCs) are pluripotent, self-renewing cells that are derived from the inner cell mass (ICM) of the developing blastocyst. Pluripotency is the capacity of a single cell to generate all cell lineages of the developing and adult organism. Self-renewal is the ability of a cell to proliferate in the same state. The molecular mechanisms that control ESC pluripotency and self-renewal are important to discover because they are key to understanding development. Because defects in development cause many different diseases, improved understanding of control mechanisms in pluripotent cells may lead to new therapies for these diseases. ESCs have a gene expression program that allows them to self-renew yet remain poised to differentiate into essentially all cell types in response to developmental cues.
Recent reviews have discussed ESCs and developmental potency (Rossant, 2008), the nature of the pluripotent ground state of ESCs (Silva and Smith, 2008), ESC transcriptional regulatory circuitry (Chen et al., 2008a; Jaenisch and Young, 2008; Macarthur et al., 2009; Orkin et al., 2008), the influence of extrinsic factors on pluripotency (Pera and Tam, 2010), and cellular reprogramming into ESC-like states (Hanna et al., 2010; Stadtfeld and Hochedlinger, 2010; Yamanaka and Blau, 2010). This Review provides a synthesis of key concepts that explain how pluripotency and self-renewal are controlled transcriptionally. These concepts have emerged from genetic, biochemical, and molecular studies of the transcription factors, cofactors, chromatin regulators, and noncoding RNAs (ncRNAs) that control the ESC gene expression program. The regulators of gene expression programs can participate in gene activation, establish a poised state for gene activation in response to developmental cues, or contribute to gene silencing (Figure 1). The molecular mechanisms by which these regulators generally participate in control of gene expression are the subject of other reviews (Bartel, 2009; Bonasio et al., 2010; Fuda et al., 2009; Ho and Crabtree, 2010; Li et al., 2007; Roeder, 2005; Surface et al., 2010; Taatjes, 2010). I describe here the
regulators that have been implicated in control of ESC state and discuss how they contribute to the gene expression program of pluripotency and self-renewal.
The Stem Cell State
ESCs are perhaps unique in that they have been the subject of virtually every scale of investigation, from large-scale genomics and protein-DNA interaction studies to highly focused mechanistic studies of individual regulatory factors. The combined results of these systems-level and molecular approaches offer a definition of embryonic stem cell state in terms of global gene regulation, which serves both as a baseline for understanding the changes that occur as cells differentiate and develop and as a means to understand the basic biology of these cells. For the purposes of this Review, this ‘‘state’’ is the product of all the regulatory inputs that produce the gene expression program of pluripotent, self-renewing cells. The most important regulatory inputs in ESCs appear to come from a small number of ‘‘core’’ transcription factors acting in concert with other transcription factors, some of which are terminal components of developmental signaling pathways.
Transcription Factors
Transcription factors recognize specific DNA sequences and either activate or prevent transcription. Early studies into the transcriptional control of the E. coli lac operon created the framework for understanding gene control (Jacob and Monod, 1961). In the absence of lactose, the lac operon is repressed by the Lac repressor, which binds the lac operator and inhibits transcription by RNA polymerase. In the presence of lactose, the Lac repressor is lost and gene expression is activated by a transcription-activating factor that binds a nearby site and recruits RNA polymerase.
The fundamental concept that emerged from these studies—that gene control relies on specific repressors and activators and the DNA sequence elements they recognize—continues to provide the foundation for understanding control of gene expression in all organisms.
Figure 1. Models for Transcriptionally Active, Poised, and Silent Genes Transcription factors, cofactors, chromatin regulators, and ncRNA regulators can be found at active, poised, and silent genes. At active genes, enhancers are typically bound by multiple transcription factors, which recruit cofactors that can interact with RNA polymerase II at the core promoter. RNA polymerase II generates a short transcript and pauses until pause-release factors and elongation factors allow further transcription. Chromatin regulators, which include nucleosome-remodeling complexes such as Swi/Snf complexes and histone-modifying enzymes such as TrxG, Dot1, and Set2, are recruited by transcription factors or the transcription apparatus and mobilize or modify local nucleosomes. Poised genes are rapidly activated when ESCs are stimulated to differentiate. At poised genes, transcription initiation and recruitment of TrxG can occur, but pause release, elongation, and recruitment of Dot1 and Set2 do not occur. The PcG and SetDB1 chromatin regulators can contribute to this repression, and these can be recruited by some transcription factors and by ncRNAs. The RNA polymerase II ‘‘ghost’’ in this model of poised genes reflects the low levels of the enzyme that are detected under steady-state conditions. Silent genes show little or no evidence of transcription initiation or elongation and are often occupied by chromatin regulators that methylate histone H3K9 and other residues. Some of these silent genes are probably silenced by mechanisms that depend on transcription of at least a portion of the gene (Buhler and Moazed, 2007; Grewal and Elgin, 2007; Zaratiegui et al., 2007).
In mammals, transcription factors make up the largest single class of proteins encoded in the genome, representing approximately 10% of all protein-coding genes (Levine and Tjian, 2003; Vaquerizas et al., 2009). Transcription factors bind both to promoter-proximal DNA elements and to more distal regions that can be nearby or 100s of kb away. The elements that are involved in positive gene regulation are called enhancers, and these elements are generally bound by multiple transcription
factors. Transcription factors can activate gene expression by recruiting the transcription apparatus and/or by stimulating release of RNA polymerase II from pause sites (Fuda et al., 2009). They can also recruit various chromatin regulators to promoter regions to modify and mobilize nucleosomes in order to increase access to local DNA sequences (Li et al., 2007). In ESCs, the pluripotent state is largely governed by the core transcription factors Oct4, Sox2, and Nanog (Table 1) (Chambers and Smith, 2004; Niwa, 2007; Silva and Smith, 2008). Oct4 and Nanog were identified as key regulators based on their relatively unique expression pattern in ESCs and genetic experiments showing that they are essential for establishing or maintaining a robust pluripotent state (Chambers et al., 2003; Chambers and Smith, 2004; Mitsui et al., 2003; Nichols et al., 1998; Niwa et al., 2000). Oct4 functions as a heterodimer with Sox2 in ESCs, thus placing Sox2 among the key regulators (Ambrosetti et al., 2000; Avilion et al., 2003; Masui et al., 2007). Reprogramming of somatic cells into induced pluripotent stem (iPS) cells generally requires forced expression of Oct4 and Sox2, unless endogenous Sox2 is expressed in the somatic cell, consistent with the view that Oct4/Sox2 are key to establishing the ESC state (Hanna et al., 2010; Stadtfeld and Hochedlinger, 2010; Yamanaka and Blau, 2010). Although ESCs can be propagated in the absence of Nanog (Chambers et al., 2007), Nanog promotes a stable undifferentiated ESC state (Chambers et al., 2007), is necessary for pluripotency to develop in ICM cells (Silva et al., 2009), and co-occupies most sites with Oct4 and Sox2 throughout the ESC genome (Marson et al., 2008b), so it is included here as a component of the core regulatory circuitry. 
Core Regulatory Circuitry
Two key concepts dominate our understanding of the function of the core transcription factors Oct4, Sox2, and Nanog in control of ESC state (Figure 2): (1) The core transcription factors function together to positively regulate their own promoters, forming an interconnected autoregulatory loop. (2) The core factors co-occupy and activate expression of genes necessary to maintain ESC state, while contributing to repression of genes encoding lineage-specific transcription factors whose absence helps prevent exit from the pluripotent state. The interconnected autoregulatory loop formed by Oct4, Sox2, and Nanog generates a bistable state for ESCs: residence in a positive-feedback-controlled gene expression program when the factors are expressed at appropriate levels, versus entrance into a differentiation program when any one of the master transcription factors is no longer functionally available (Boyer et al., 2005; Loh et al., 2006). This regulatory circuit likely explains the ability to jump-start the ESC gene expression program during reprogramming by forced expression of reprogramming factors (Jaenisch and Young, 2008). Thus, the ectopically expressed factors activate transcription of the endogenous Pou5f1 (Oct4), Sox2, and Nanog genes and thereby initiate the positive-feedback loop that sustains ongoing production of these factors from the endogenous genes in the absence of further input from the ectopically expressed factors. Some factors present in reprogramming cocktails, such as c-Myc, appear to facilitate activation of this interconnected autoregulatory circuitry by stimulating gene expression and proliferation more generally (Rahl et al., 2010).
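The bistable behavior described above can be illustrated with a deliberately simplified toy model. The sketch below is not taken from the cited studies: it collapses Oct4, Sox2, and Nanog into a single variable x, assumes Hill-type positive feedback with arbitrary parameters (k, n, and a degradation rate of 1), and treats forced expression as a transient input. All names and values are illustrative assumptions.

```python
def hill(x, k=0.5, n=4):
    """Cooperative self-activation term (assumed Hill form, arbitrary parameters)."""
    return x**n / (k**n + x**n)


def simulate(x0, ectopic=0.0, pulse_end=0.0, t_end=30.0, dt=0.01, feedback=1.0):
    """Euler integration of dx/dt = u(t) + feedback * hill(x) - x.

    x        : lumped level of the core factors (toy variable, not measured data)
    ectopic  : strength of the forced-expression input, applied while t < pulse_end
    feedback : set to 0.0 to mimic loss of the autoregulatory loop
    """
    x = x0
    for step in range(int(t_end / dt)):
        t = step * dt
        u = ectopic if t < pulse_end else 0.0
        x += dt * (u + feedback * hill(x) - x)
    return x


# Below the threshold (x = 0.5 for these parameters) the loop collapses.
low = simulate(0.3)
# Above the threshold the loop sustains itself at a high level.
high = simulate(0.6)
# A transient ectopic pulse is enough to switch the system on permanently.
jump_started = simulate(0.0, ectopic=0.6, pulse_end=5.0)
# Removing the feedback term abolishes the high state.
lost = simulate(0.92, feedback=0.0)
```

In this toy setting the high state persists after the ectopic input is withdrawn, which mirrors the idea that endogenous expression can be maintained without further input from the exogenous factors. The numbers carry no biological meaning.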
Table 1. Transcriptional Regulators Implicated in Control of ESC State

Type of Regulator       Function                                          References

Transcription Factors
  Oct4                  Core circuitry                                    1
  Sox2                  Core circuitry                                    2
  Nanog                 Core circuitry                                    3
  Tcf3                  Wnt signaling to core circuitry                   4
  Stat3                 Lif signaling to core circuitry                   5
  Smad1                 BMP signaling to core circuitry                   6
  Smad2/3               TGF-β/Activin/Nodal signaling                     7
  c-Myc                 Proliferation                                     8
  Esrrb                 Steroid hormone receptor                          9
  Sall4                 Embryonic regulator                               10
  Tbx3                  Mediates LIF signaling                            11
  Zfx                   Self-renewal                                      12
  Ronin                 Metabolism                                        13
  Klf4                  LIF signaling                                     14
  Prdm14                ESC identity                                      15

Cofactors
  Mediator              Core circuitry                                    16
  Cohesin               Core circuitry                                    17
  Paf1 complex          Couples transcription with histone modification   18
  Dax1                  Oct4 inhibitor                                    19
  Cnot3                 Myc/Zfx cofactor                                  20
  Trim28                Myc/Zfx cofactor                                  21

Chromatin Regulators
  Polycomb group        Silencing of lineage-specific regulators          22
  SetDB1 (ESET)         Silencing of lineage-specific regulators          23
  esBAF                 Nucleosome mobilization                           24
  Chd1                  Nucleosome mobilization                           25
  Chd7                  Nucleosome mobilization                           26
  Tip60-p400            Histone acetylation                               27

ncRNA Regulators
  miRNAs                Fine-tuning of pluripotency transcripts           28
  GC-rich ncRNAs        PcG complex recruitment                           29
The vast majority of these regulators were identified in murine ES cells, but most appear to play similar roles in human ES cells. LIF-Stat3 signaling is important for maintenance of murine ESCs and Activin-Smad2/3 signaling has been demonstrated to be important for human ESCs. References: 1 (Chambers et al., 2003; Chambers and Smith, 2004; Hart et al., 2004; Mitsui et al., 2003; Nichols et al., 1998; Niwa et al., 2000; Scholer et al., 1990); 2 (Chambers and Smith, 2004; Masui et al., 2007); 3 (Chambers et al., 2003; Hart et al., 2004; Mitsui et al., 2003); 4 (Cole et al., 2008; Marson et al., 2008b); 5 (Niwa et al., 1998); 6 (Ying et al., 2003); 7 (Beattie et al., 2005; James et al., 2005; Vallier et al., 2005); 8 (Cartwright et al., 2005); 9 (Ivanova et al., 2006; Zhang et al., 2008); 10 (Wu et al., 2006; Zhang et al., 2006); 11 (Han et al.; Ivanova et al., 2006; Niwa et al., 2009); 12 (Galan-Caridad et al., 2007); 13 (Dejosez et al., 2008; Dejosez et al., 2010); 14 (Jiang et al., 2008; Niwa et al., 2009); 15 (Chia et al., 2010); 16 (Hu et al., 2009; Kagey et al., 2010); 17 (Fazzio et al., 2008; Hu et al., 2009; Kagey et al., 2010); 18 (Ding et al., 2009); 19 (Kim et al., 2008; Niakan et al., 2006; Sun et al., 2009); 20 (Hu et al., 2009); 21 (Fazzio et al., 2008; Hu et al., 2009); 22 (Azuara et al., 2006; Bernstein et al., 2006; Boyer et al., 2006; Bracken et al., 2006; Lee et al., 2006; Leeb et al., 2010; Li et al., 2010; Pasini et al.; Peng et al., 2009; Shen et al., 2009; Stock et al., 2007; van der Stoop et al., 2008); 23 (Bilodeau et al., 2009; Yeap et al., 2009; Yuan et al., 2009); 24 (Ho and Crabtree, 2010; Schnetz et al., 2010); 25 (Gaspar-Maia et al., 2009); 26 (Schnetz et al., 2010); 27 (Fazzio et al., 2008); 28 (Marson et al., 2008b); 29 (Guenther and Young, 2010; Surface et al., 2010).
Figure 2. Core Regulatory Circuitry Oct4, Sox2, and Nanog collaborate to regulate their own promoters, forming an interconnected autoregulatory loop. The Pou5f1 (Oct4), Sox2, and Nanog genes are represented as blue boxes and proteins as red balloons. These core transcription factors (O/S/N) function to activate expression of protein-coding and miRNA genes necessary to maintain ESC state, but they also occupy poised genes encoding lineage-specific protein and miRNA regulators whose repression is essential to maintaining that state. Additional transcription factors, such as the c-Myc/Max heterodimer (M/M), cause pause release at actively transcribed genes. A subset of the cofactors and chromatin regulators implicated in control of ES cell state (Table 1) are shown.
Oct4, Sox2, and Nanog collaborate to activate a substantial fraction of the actively transcribed protein-coding and miRNA genes in ESCs (Figure 3A) (Chen et al., 2008b; Marson et al., 2008b). Sites co-occupied by the three core regulators generally have enhancer activity, and transcription of genes adjacent to such sites often depends on at least one of the trio (Chen et al., 2008b; Chew et al., 2005; Matoba et al., 2006). Oct4 and Nanog can bind and recruit multiple coactivators, as described below, accounting for their ability to activate genes. Oct4, Sox2, and Nanog also occupy repressed genes encoding cell-lineage-specific regulators, and the repression of these genes is essential for ESCs to maintain a stable pluripotent state and to undergo normal differentiation (Bilodeau et al., 2009; Boyer et al., 2005, 2006; Lee et al., 2006; Loh et al., 2006; Marson et al., 2008b; Pasini et al., 2008; Pasini et al., 2004). The loss of these core regulators leads to rapid induction of a wide spectrum of genes encoding lineage-specific regulators, indicating that these genes are poised for activation. How might Oct4/Sox2 and Nanog act to repress these genes? The SetDB1 and Polycomb group (PcG) chromatin regulators have both been implicated in repression of these lineage-specific regulatory genes. Oct4 can bind sumoylated SetDB1, which catalyzes the repressive histone modification H3K9me3 at many of these genes (Bilodeau et al., 2009; Yeap et al., 2009; Yuan et al., 2009). PcG complexes can associate with nucleosomes with histone H3K9me3 (Margueron et al., 2009) and further contribute to repression through mechanisms described below. It is also possible that Oct4, Sox2, and Nanog activate some level of transcription initiation in the extensive GC-rich promoter regions of these genes. The corresponding GC-rich RNA species produced from these regions might then recruit or stabilize PcG complexes (Guenther and Young, 2010; Zhao et al., 2010).
Thus, Oct4 and its partners may recruit SetDB1 through protein-protein interactions and PcG complexes via interactions with both histone H3K9me3 and transcripts produced as a consequence of local transcription activation. The ability of Oct4, Sox2, and Nanog to positively regulate genes necessary to maintain ESC state while repressing genes that would enable egress from this state explains, in part, the ability of ESCs to self-renew in an undifferentiated state yet remain poised to differentiate into all cell types of the body in response to developmental cues. Additional regulators of gene expression are known to collaborate with Oct4, Sox2, and Nanog to control the ESC gene expression program (Table 1). Many of these regulators have emerged from systems-level genetic and proteomic screens (Bilodeau et al., 2009; Chia et al., 2010; Fazzio et al., 2008; Hu et al., 2009; Ivanova et al., 2006; Kagey et al., 2010; Liang et al., 2008; Pardo et al., 2010; van den Berg et al., 2010; Wang et al., 2006; Zhao et al., 2010). Although the roles of these regulators are not yet fully understood, they ultimately exert their effects by regulating RNA polymerase II at various steps in transcription.
Control of RNA Polymerase II
Transcription factors control at least two major steps in gene expression (Fuda et al., 2009; Peterlin and Price, 2006; Rahl et al., 2010). Some transcription factors recruit RNA polymerase II to promoters, where the enzyme typically transcribes a short distance (approximately 35 bp) and then pauses or terminates. Other transcription factors recruit a cyclin-dependent kinase (Cdk9/cyclinT) called p-TEFb, which phosphorylates the polymerase and its associated pause control factors, allowing the enzyme to be released from pause sites and fully transcribe the gene. Oct4, Sox2, and Nanog interact with coactivators that bind to RNA polymerase II (Kagey et al., 2010), so the core regulators are involved in RNA polymerase II recruitment. In contrast, c-Myc, which plays important roles in ESC proliferation and self-renewal (Cartwright et al., 2005), does not appear to play an important role in RNA polymerase II recruitment but rather binds to E box sequences at core promoter sites and recruits p-TEFb, thus stimulating pause release (Rahl et al., 2010). A large proportion of the actively transcribed genes in ESCs are bound and regulated by both the core transcription factors and c-Myc (Figure 3A). Thus, Oct4/Sox2/Nanog apparently play dominant roles in selecting the set of ESC genes that will be actively transcribed and recruiting RNA polymerase II to these genes, while c-Myc regulates the efficiency with which these selected genes are fully transcribed.
This likely explains why forced expression of c-Myc can enhance reprogramming efficiency and why this transcription factor plays such a potent role in proliferation of many cancer cells (Jaenisch and Young, 2008; Rahl et al., 2010).
Multiple Enhancers and Enhanceosomes
Enhancers are generally bound by multiple transcription factors, forming large nucleoprotein complexes called enhanceosomes, which permit cooperative binding between transcription factors and allow for synergistic and combinatorial effects on gene regulation (Maniatis et al., 1998). The cooperative interactions among transcription factors binding to adjacent DNA sites and to cofactor complexes explain why multiple transcription factors are found together in the genome and why transcription factors bind stably to only a small subset of the millions of DNA sequence motifs present in the vertebrate genome. Many genes have multiple enhancers and thus multiple enhanceosomes (Levine and Tjian, 2003). In Drosophila, these multiple, seemingly redundant enhancers have been shown to contribute to phenotypic robustness during embryonic development (Frankel et al., 2010; Hong et al., 2008). That is, normal levels of gene expression are obtained despite environmental and genetic variability so long as genes are equipped with multiple enhancers. In addition to Oct4, Sox2, Nanog, and c-Myc, the transcription factors Tcf3, Smad1, Stat3, Esrrb, Sall4, Tbx3, Zfx, Ronin, Klf2, Klf4, Klf5, and Prdm14 have been shown to play important roles in control of ESC state (Table 1). The ChIP-Seq data that have been obtained for these transcription factors indicate that they can bind to loci occupied by Oct4/Sox2/Nanog as well as other loci (Figure 3B), forming sites that have been called multiple transcription factor-binding loci (MTL) (Chen et al., 2008b; Kim et al., 2008). Several lines of evidence indicate that most MTL are enhancers. Most MTL are occupied by the p300 cofactor, and the subset of MTL that are occupied by Oct4/Sox2/Nanog are also occupied by the mediator cofactor (Chen et al., 2008b; Kagey et al., 2010). All Oct4/Sox2/Nanog-containing MTL tested to date have been shown to exhibit enhancer activity (Chen et al., 2008b). It is therefore likely that functional enhanceosomes are formed at most MTL. The evidence obtained thus far suggests that most Oct4/Sox2/Nanog-regulated genes are co-occupied by one or more of the other transcription factors implicated in control of ESCs (Figures 3A–3C).
Examination of the Max gene reveals a typical pattern, where the promoter region contains a site bound by Oct4, Sox2, Nanog, Tcf3, and Esrrb and various other sites occupied by c-Myc, Zfx, Ronin, and Klf4 (Figure 3D). Thus the functions of the core regulators (Oct4, Sox2, and Nanog) are augmented by the functions of many of the other transcription factors implicated in control of ESC state at actively transcribed target genes.
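The co-occupancy analysis summarized in Figures 3B and 3C can be sketched in a few lines. The example below is only an illustration under stated assumptions: peaks are reduced to single (chromosome, position) summits, "within a 50 bp window" is interpreted as a distance of at most 50 bp, and the loci and peak coordinates are invented toy values rather than data from the GEO series named in the figure legends. The function names are hypothetical.

```python
from collections import Counter


def count_cobound(core_loci, peaks_by_factor, window=50):
    """For each core (O/S/N) locus, count how many other factors have a peak nearby.

    core_loci       : list of (chrom, position) tuples (toy representation)
    peaks_by_factor : dict mapping factor name -> list of (chrom, position) peaks
    window          : maximum distance in bp to call co-occupancy (assumed meaning)
    """
    counts = []
    for chrom, pos in core_loci:
        n_factors = sum(
            any(c == chrom and abs(p - pos) <= window for c, p in peaks)
            for peaks in peaks_by_factor.values()
        )
        counts.append(n_factors)
    return counts


def frequency_distribution(counts):
    """Fraction of loci bound by 0, 1, 2, ... additional factors."""
    tally = Counter(counts)
    return {k: tally[k] / len(counts) for k in sorted(tally)}


# Invented toy coordinates, for illustration only.
core_loci = [("chr1", 1000), ("chr1", 5000), ("chr2", 300), ("chr2", 9000)]
peaks_by_factor = {
    "Tcf3": [("chr1", 1020), ("chr2", 310)],
    "Stat3": [("chr1", 990)],
    "Klf4": [("chr1", 5200), ("chr2", 280)],
}

counts = count_cobound(core_loci, peaks_by_factor)
print(counts)                          # [2, 0, 2, 0]
print(frequency_distribution(counts))  # {0: 0.5, 2: 0.5}
```

A real analysis would work on peak intervals with an interval index rather than this quadratic scan; the sketch only shows how a stringent window turns peak calls into the binned distribution plotted in the figure.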
Signaling to the Core Regulatory Circuitry
Cells sense and respond to their cellular and biochemical environment through signal transduction pathways, which can deliver information to the genome in the form of activated transcription factors or cofactors. For ESCs, maintenance of the pluripotent state is dependent on the absence or inhibition of signals that stimulate differentiation (Pera and Tam, 2010; Silva and Smith, 2008). ESCs were initially cultured on a layer of irradiated fibroblasts in order to obtain the necessary factors for self-renewal and pluripotency (Smith, 2001; Smith and Hooper, 1983). LIF, Wnt, and ligands of the TGF-β/BMP signaling pathway were among factors supplied by the fibroblasts and found to influence the murine ESC state (Okita and Yamanaka, 2006; Sato et al., 2004; Smith et al., 1998; Williams et al., 1988; Ying et al., 2003). Remarkably, the transcription factors associated with the LIF, Wnt, and BMP4 signaling pathways (Stat3, Tcf3, and Smad1) tend to co-occupy enhancers bound by Oct4, Sox2, and Nanog, thereby allowing direct control of genes within the core circuitry by these signaling pathways (Figure 4) (Chen et al., 2008a, 2008b; Cole et al., 2008; Tam et al., 2008; Wu et al., 2006; Zhang et al., 2006). Loss of Oct4 leads to a loss of these signaling transcription factors at Oct4-bound enhancers. Thus, signals mediated by these pathways are delivered directly to the enhancers of genes within the core regulatory circuitry and can thereby have profound effects on pluripotency and self-renewal. This likely explains why manipulation of the Wnt signaling pathway can enhance reprogramming (Lluis et al., 2008; Marson et al., 2008a).
Transcriptional Cofactors
Cofactors are protein complexes that contribute to activation (coactivators) and repression (corepressors) but do not have DNA-binding properties of their own. Some cofactors mobilize or modify nucleosomes, and in these cases they are also considered chromatin regulators.
Cofactors are generally expressed in most cell types, but ESCs are more sensitive than somatic cells to reduced levels of certain cofactors and chromatin regulators, such as mediator and cohesin (Fazzio and Panning, 2010; Kagey et al., 2010).
Figure 3. Relationships between Core and Other Transcription Factors in Regulatory Circuitry and Gene Control (A) Overlap between actively transcribed genes occupied by core transcription factors (TFs) (union of Oct4-, Sox2-, and Nanog-bound genes) and those occupied by c-Myc. Active genes (9355) were defined as the set of genes occupied by both RNA polymerase II and nucleosomes with histone H3K79me3. (B) Frequency distribution showing how often c-Myc, Tcf3, Smad1, Stat3, Esrrb, Tbx3, Zfx, Ronin, and Klf4 are associated with Oct4/Sox2/Nanog-occupied loci. Oct4, Sox2, and Nanog are the three transcription factors in the first bin, which indicates that 23% of O/S/N-bound loci are not occupied by any of the other transcription factors included in the analysis. Binding was called at a high confidence (p < 10⁻⁹) threshold within a 50 bp window, so the actual number of factors bound to Oct4/Sox2/Nanog-occupied loci is somewhat higher than indicated in this graph. (C) Frequency distribution showing how often c-Myc, Tcf3, Smad1, Stat3, Esrrb, Tbx3, Zfx, Ronin, and Klf4 are associated with Oct4/Sox2/Nanog-occupied genes (p < 10⁻⁹). Oct4, Sox2, and Nanog are the three transcription factors in the first bin, which shows that only 2% of O/S/N-bound genes lack binding to any of the other transcription factors. (D) Gene tracks showing an example of an actively transcribed gene (Max) occupied by an Oct4/Sox2/Nanog enhancer and other transcription factors implicated in ESC control. At this gene, Tcf3 and Esrrb occupy the Oct4/Sox2/Nanog enhancer and Zfx, Ronin, Klf4, and c-Myc bind loci closer to the transcription start site. ChIP-Seq data were obtained from GSE11431, GSE11724, GSE12680, and GSE22557.
Figure 4. Signaling to Core Regulatory Circuitry (A) Model of an enhancer where transcription factors associated with LIF, Wnt, and BMP4 signaling (Stat3, Tcf3, and Smad1) occupy sites near the core regulators. (B) Oct4 distal enhancer provides an example of a DNA element that is bound by the core regulators and signaling transcription factors and contains sequence motifs for each of these factors. (C) Frequency distribution showing how often signaling transcription factors (Stat3, Tcf3, and Smad1) are associated with Oct4/Sox2/Nanog-bound loci throughout the genome. Binding was called at a high confidence (p < 10⁻⁹) threshold within a 50 bp window, so the actual percent of Oct4/Sox2/Nanog-bound loci that are occupied by signaling transcription factors is somewhat higher. ChIP-Seq data were obtained from GSE11431, GSE11724, GSE12680, and GSE22557.
Transcription factors that occupy active enhancers bind coactivators such as p300 and mediator, which in turn bind and control the activity of the transcription initiation apparatus (Conaway et al., 2005; Malik and Roeder, 2005; Roeder, 1998; Taatjes, 2010). The p300 and mediator coactivators are very large multisubunit complexes that can accommodate simultaneous interactions with many transcription factors. The p300 cofactor occupies most active promoters in ESCs (Chen et al., 2008b). Reduced levels of p300 do not appear to adversely affect ESCs but rather have a profound effect on ESC differentiation (Chen et al., 2008b; Zhong and Jin, 2009). Recent studies have shown that mediator physically links Oct4/Sox2/Nanog-bound enhancers to the promoters of active genes in the core regulatory circuitry of ESCs (Figure 5) (Kagey et al., 2010). The mediator recruited to the Oct4/Sox2/Nanog-regulated promoters associates with the cohesin-loading factor Nipbl, which provides a mechanism for cohesin loading at these sites. The mediator/cohesin complex forms a looped chromosome architecture between enhancers and core promoters that is necessary for normal gene activity. Mediator and cohesin co-occupy different promoters in different cell types, thus generating cell-type-specific DNA loops associated with the gene expression program of each cell. Mediator plays an important role in the transcriptional response to signaling. The CDK8 kinase subunit of the mediator complex can influence the activity of signaling transcription factors (Alarcon et al., 2009; Fryer et al., 2004; Gao et al., 2009; Taatjes, 2010). For example, CDK8-mediated phosphorylation of the linker region within Smad1/5 or Smad2/3 complexes can activate these transcription factor complexes, but it also targets them for proteasomal degradation.
A dynamic cycle of transcription factor activation and destruction ensures that continuous pathway activation is necessary for continuous gene activation and may facilitate rapid changes in cell state when signaling is altered. Cohesin and condensin complexes mediate essential changes in chromosome morphology associated with expression and maintenance of the genome (Nasmyth and Haering, 2009; Wood et al., 2010), and ESCs are highly sensitive to reduced levels of these key structural components of chromatin (Fazzio and Panning, 2010; Kagey et al., 2010). The association of cohesin with mediator and its contribution to both gene activity and DNA looping in ESCs make it both an essential transcriptional cofactor and a key chromatin regulator (Kagey et al., 2010). The presence of similar cohesin/condensin complexes in prokaryotes suggests that these proteins existed before histones and may thus have more ancient roles in structuring DNA than nucleosomes. ESCs are also sensitive to changes in the levels of the Paf1 complex, which is associated with RNA polymerase II at active genes (Ding et al., 2009). Based on studies in yeast, the Paf1 complex couples transcription initiation and elongation with histone H3K4 and H3K36 methylation (Krogan et al., 2003). In ESCs, the Paf1 complex may also play this role, as knockdowns lead to reduced levels of histone H3K4me3 at actively transcribed genes (Ding et al., 2009). Corepressors that have been implicated in control of ES cell state include Dax1, Cnot3, and Trim28 (Fazzio et al., 2008; Hu et al., 2009; Sun et al., 2009). Overexpression of Dax1 causes ESC differentiation, likely due to an inhibitory interaction with Oct4 (Sun et al., 2009). Cnot3 and Trim28 co-occupy many promoters with c-Myc and Zfx and probably contribute to control of proliferation and self-renewal. They differ somewhat in the additional promoters they occupy, which might explain why loss of Cnot3 causes ESCs to differentiate into trophectoderm, whereas loss of Trim28 causes cells to differentiate into the primitive ectoderm lineage (Hu et al., 2009). The mechanisms involved in ESC gene regulation by Cnot3 and Trim28 are not yet well understood, but Trim28 can interact with HP1 and SetDB1 to facilitate formation of repressive chromatin (Cammas et al., 2004; Schultz et al., 2002). In summary, ESCs are especially sensitive to reduced levels of certain cofactors, such as the mediator and Paf1 complexes, possibly because a large portion of the ESC genome is transcriptionally active and these cofactors are limiting.
ESCs are also sensitive to the loss of specific corepressors, which apparently exert their control by acting on Oct4 directly or through repressive chromatin-modifying activities.
Figure 5. Mediator and Cohesin Contribute to Gene Control in Core Circuitry (A) ChIP-Seq data at the Pou5f1 gene for transcription factors, mediator and cohesin, and the transcription apparatus (Pol2 and TBP). Note evidence for crosslinking of most components to both enhancer elements and core promoter. The numbers on the y axis are reads/million. ChIP-Seq data were obtained from GSE11431, GSE11724, GSE12680, and GSE22557. (B) Model for DNA looping by mediator and cohesin. Oct4, Sox2, and Nanog bind mediator, which binds RNA polymerase II at the core promoter, thus forming a loop between the enhancer and the core promoter. The transcription activator-bound form of mediator binds the cohesin-loading factor Nipbl, which provides a means to load cohesin. Both mediator and cohesin are necessary for normal gene activity. This model contains a single DNA loop, but multiple enhancers may be bound simultaneously, generating multiple loops.
Chromatin Regulators in ESC Gene Activity and Silencing
Eukaryotic genomes are packaged into nucleosomes (Kornberg and Thomas, 1974; Olins and Olins, 1974), which provide a means to compact the genome and to influence gene expression. Early studies showed that nucleosomes can affect transcription in vitro (Knezetic and Luse, 1986; Lorch et al., 1987) and in vivo (Han and Grunstein, 1988; Kayne et al., 1988). Subsequent studies revealed that gene expression can be influenced by proteins that modify histones (Brownell et al., 1996) or mobilize nucleosomes (Cote et al., 1994; Imbalzano et al., 1994; Kwon et al., 1994) and these have come to be known as chromatin regulators. Chromatin regulators are generally recruited to genes by DNA-binding transcription factors, the transcription apparatus, or specific RNA species (Guenther and Young, 2010; Li et al., 2007; Roeder, 2005; Surface et al., 2010). Some chromatin regulators are essential for ESC viability, including SetDB1 and the cohesin/condensin protein complexes (Dodge et al., 2004; Fazzio and Panning, 2010; Kagey et al., 2010), whereas others contribute to the stability of ESCs or establish a state that is essential for differentiation (Leeb et al., 2010; Meissner, 2010; Niwa, 2007). The chromatin regulators
that contribute to these states fall into four classes: cohesin/condensin protein complexes (discussed above), histone-modifying enzymes, ATP-dependent chromatin-remodeling complexes, and DNA methyltransferases.
Histone-Modifying Enzymes
The chromatin regulators known to have the most profound impact on ESC state are histone-modifying enzymes that repress genes encoding lineage-specific developmental regulators. These include the PcG protein complexes, SetDB1, and Tip60-p400. PcG and Trithorax group (TrxG) genes were discovered in Drosophila melanogaster as repressors and activators of Hox genes (Schuettengruber et al., 2007). TrxG proteins catalyze trimethylation of histone H3 lysine 4 (H3K4me3) at the promoters of active genes and facilitate maintenance of active gene states during development, in part by antagonizing the functions of PcG proteins. PcG protein complexes catalyze ubiquitylation of histone H2A lysine 119 (H2AK119u) and trimethylation of histone H3 lysine 27 (H3K27me3) and function in ESCs to help silence genes encoding key regulators of development yet allow them to remain in a state that is "poised" for activation during differentiation (Azuara et al., 2006; Bernstein et al., 2006; Boyer et al., 2006; Bracken et al., 2006; Endoh et al., 2008; Landeira et al., 2010; Lee et al., 2006; Li et al., 2010; Pan et al., 2007; Pasini et al., 2010; Peng et al., 2009; Shen et al., 2009; van der Stoop et al., 2008). PcG proteins are thought to inhibit transcription, at least in part, by restraining poised RNA polymerase molecules (Stock et al., 2007; Zhou et al., 2008b). ESCs lacking PcG protein complexes can be established but are unstable and tend to differentiate; when they do differentiate, they fail to execute differentiation programs appropriately (Leeb et al., 2010). Multiple histone H3 lysine 9 methyltransferases have been implicated in control of ESC state (Bilodeau et al., 2009; Yeap et al., 2009; Yuan et al., 2009).
A subset of the silent genes that encode lineage-specific developmental regulators, including those involved in generating the extraembryonic trophoblast lineage, are occupied and repressed by SetDB1, which catalyzes methylation of histone H3 lysine 9. Thus, multiple repressive mechanisms, involving methylation of H3K27 and H3K9 and ubiquitylation of histone H2A, are used to silence genes encoding lineage-specific developmental regulators. The Tip60-p400 complex has multiple activities, among which is histone acetylation, and loss of this complex affects ESC morphology and state (Fazzio et al., 2008). It is found associated with active promoters in ESCs and appears to be recruited in two ways, directly by the H3K4me3 mark and indirectly by Nanog. Interestingly, the complex is also associated with nucleosomes with H3K4me3 at PcG-occupied genes encoding lineage-specific regulators, where it apparently facilitates repression of these poised genes. Because Tip60-p400 is generally found associated with active genes, its repressive function may derive from its potential role in facilitating transcription of ncRNAs that recruit or stabilize PcG complexes, as described below.
ATP-Dependent Nucleosome Remodeling
ATP-dependent nucleosome-remodeling complexes can be recruited by transcription factors and modified histones to the promoters of genes, where they enhance or reduce the access of transcriptional components to DNA sequences with resulting positive or negative effects on gene activity (Clapier and Cairns,
2009; Ho and Crabtree, 2010). Components of multiple ATP-dependent nucleosome-remodeling complexes have been implicated in control of ESC state (Table 1) (Bilodeau et al., 2009; Gaspar-Maia et al., 2009; Ho and Crabtree, 2010; Klochendler-Yeivin et al., 2000; Schnetz et al., 2010). A complex purified from ESCs called esBAF has been shown to be associated with the promoters of genes under the control of Oct4, Sox2, and Nanog, and core subunits of this complex are essential for ESC maintenance (Ho and Crabtree, 2010). Chd1, a member of the chromodomain helicase DNA-binding (CHD) family of ATP-dependent chromatin remodelers, is associated with the promoters of active genes, and Chd1-deficient ESCs are incapable of giving rise to primitive endoderm (Gaspar-Maia et al., 2009). Another member of the CHD family, Chd7, is associated with active Oct4/Sox2/Nanog-bound enhancers in ES cells, where it is thought to fine-tune the expression levels of ESC-specific genes (Schnetz et al., 2010). Unlike mutations in esBAF and Chd1, which affect ESC state, the effects of changing Chd7 dosage are subtle and do not appear to affect pluripotency or self-renewal. Thus, multiple ATP-dependent nucleosome-remodeling complexes are present at many key ESC genes.
DNA Methylation
DNA methylation is essential for mammalian development and is required in most somatic tissues. Although five DNA methyltransferases (Dnmt1, 2, 3a, 3b, and 3l) are expressed in ES cells and 60%–80% of all CpG dinucleotides are methylated (Meissner, 2010), ESCs can be established and maintained in the absence of Dnmts and DNA methylation. However, Dnmt-deficient ESCs are markedly deficient in differentiation (Jackson et al., 2004), which is likely due, at least in part, to their inability to completely silence genes encoding Oct4 and Nanog during differentiation (Feldman et al., 2006).
Noncoding RNAs in ESC Regulatory Circuitry
The idea that ncRNA might regulate genes was proposed at the dawn of studies on regulation of gene expression (Britten and Davidson, 1969; Jacob and Monod, 1961). It is now clear that ncRNA is involved in regulation of many important biological processes, including X inactivation, dosage compensation, imprinting, polycomb repression, and silencing of repeated elements, as described in several recent reviews (Lee, 2009; Surface et al., 2010; Wilusz et al., 2009; Zaratiegui et al., 2007). Indeed, a variety of ncRNA species have been implicated in control of ESC state (Table 1). These include miRNAs, which can regulate the stability and translatability of mRNAs and, acting in this fashion, play essential roles in normal ESC self-renewal and cellular differentiation. They also include longer ncRNAs of various types, which have been implicated in recruitment of chromatin regulators such as the PcG complexes (Bracken and Helin, 2009; Guenther and Young, 2010; Surface et al., 2010; Wilusz et al., 2009; Zhao et al., 2010).
miRNAs and Control of ESC Identity
Multiple lines of evidence indicate that miRNAs contribute to the control of early development. ESCs deficient in miRNA-processing enzymes such as Dicer and DGCR8 show defects in differentiation and proliferation (Kanellopoulou et al., 2005; Murchison et al., 2005; Wang et al., 2007). Two key themes have emerged from studying the regulation of miRNA genes in ESCs (Marson
et al., 2008b). First, the core regulators Oct4/Sox2/Nanog activate genes for miRNAs that are preferentially expressed in ESCs, and these miRNAs contribute to cell state maintenance and cell state transitions by fine-tuning the expression of key ESC genes and by promoting the rapid clearance of ESC transcripts during differentiation. Second, the core regulators co-occupy repressed lineage-specific miRNA genes with SetDB1 and PcG complexes, thus poising them for expression during differentiation. The core circuitry controls the expression of miRNAs that fine-tune the expression of key transcripts and promote the rapid clearance of ESC-specific transcripts during differentiation (Figure 6). Several miRNA polycistrons that specify the most abundant miRNAs in ESCs and that are silenced during early differentiation are positively regulated by Oct4/Sox2/Nanog (Marson et al., 2008b). These include the mir-290-295 cluster, and miRNAs with seed sequences in this family have been implicated in cell proliferation (He et al., 2005; O’Donnell et al., 2005; Wang et al., 2008) and have been shown to rescue the proliferation defects observed in miRNA-deficient ES cells (Kanellopoulou et al., 2005; Murchison et al., 2005; Wang et al., 2007, 2008). Furthermore, the zebrafish homolog of this miRNA family, miR-430, contributes to the rapid degradation of maternal transcripts in early zygotic development (Giraldez et al., 2006), and this miRNA family also promotes the clearance of transcripts in early mammalian development (Farh et al., 2005). The core transcription factors and PcG complexes co-occupy genes for miRNAs that are repressed in ESCs but become selectively expressed in cells of the immune system (mir-155), pancreatic islets (mir-375), neural cells (mir-124 and mir-9), and differentiating ESCs (mir-296) (Figure 6A) (Marson et al., 2008b).
This set of miRNA genes is thus poised to contribute to cell-fate decisions during development in the same fashion as genes encoding lineage-specific transcription factors that are co-occupied by the core regulators and PcG complexes. Some of these miRNA genes are rapidly induced upon ESC differentiation and facilitate loss of ESC state; for example, mir-296 targets Nanog mRNA (Tay et al., 2008) (Figure 6B). Other poised miRNA genes, such as those specifying mir-155, mir-375, mir-124, and mir-9, are induced in a tissue-specific manner during development.
ncRNAs and Polycomb-Mediated Silencing
Recent studies indicate that a broad spectrum of ncRNA molecules recruit or stabilize PcG complexes at specific sites in the ESC genome. Specific ncRNA molecules have been shown to recruit PcG complexes to the X-inactivation center (Xic), the Kcnq1 domain, the INK4b/ARF/INK4a locus, the HOXD locus, and many other genomic loci in ESCs and other cell types (Gupta et al., 2010; Pandey et al., 2008; Rinn et al., 2007; Tsai et al., 2010; Yap et al., 2010; Zhao et al., 2010; Zhao et al., 2008). Noncoding RNAs of various lengths are transcribed bidirectionally by RNA polymerase II from a majority of promoters (Core et al., 2008; Guenther et al., 2007; He et al., 2008; Seila et al., 2008); some of these are able to bind PcG complexes, which tend to occupy genes near promoter sites (Guenther and Young, 2010; Kanhere et al., 2010; Surface et al., 2010; Zhao et al., 2010). Polycomb Repressive Complex 2 (PRC2), one of the PcG complexes, has been shown to bind RNA species of 200–1200 nucleotides that
originate from approximately 20% of the sites in the ESC genome that are occupied by PRC2 (Zhao et al., 2010). PRC2 can also bind to RNA species of 50–200 nucleotides (Kanhere et al., 2010). Polycomb Repressive Complex 1 (PRC1), another of the PcG complexes, can also bind specific RNA species (Yap et al., 2010). Thus, PcG complexes may generally be recruited, stabilized, and thus regulated by binding to ncRNAs that are transcribed from a large number of promoter regions in ESCs. This may also help to explain why transcripts are occasionally observed from genes occupied by PcG complexes.
ESCs and iPSCs
Induced pluripotent stem cells (iPSCs) have been generated from a broad range of murine and human somatic cells by using forced expression of Oct4, Sox2, and other transcription factors (Hanna et al., 2010; Stadtfeld and Hochedlinger, 2010; Yamanaka and Blau, 2010). Fully reprogrammed murine iPSCs are apparently equivalent to ESCs in developmental potency and gene expression, although some iPSCs can retain a memory of their somatic program. A few murine iPSCs have been shown to be capable of generating “all-iPSC” mice and thus have a developmental potency equivalent to ESCs (Boland et al., 2009; Kang et al., 2009; Stadtfeld et al., 2010; Zhao et al., 2009). In one study with genetically matched murine ESCs and iPSCs, no consistent gene expression differences were observed, except for transcripts within the imprinted Dlk1–Dio3 gene cluster (Stadtfeld et al., 2010). Similarly, few differences were observed in a comparison of gene expression and histone modifications in human ESCs and iPSCs (Guenther et al., 2010). However, some iPSCs do retain an epigenetic memory of their donor cells (Kim et al., 2010b; Polo et al., 2010; Lister et al., 2011). These results indicate that ESC state and thus ESC regulatory circuitry is re-established in fully reprogrammed iPSCs, but that a limited memory of the gene expression program of the cell of origin can be observed in some iPSCs.
Figure 6. Selected Components of ESC Core Regulatory Circuitry and Its Disruption during Differentiation
(A) This model of core regulatory circuitry incorporates selected protein-coding and miRNA target genes. Oct4, Sox2, and Nanog directly activate transcription of genes whose products include the spectrum of transcription factors, cofactors, chromatin regulators, and miRNAs that are known to contribute to ESC state. Oct4, Sox2, and Nanog are also associated with SetDB1- and PcG-repressed protein-coding and miRNA genes that are poised for differentiation.
(B) The loss of ESC state during differentiation involves the silencing of the Pou5f1 gene, the proteolytic destruction of Nanog by caspase-3, and miRNA-mediated reduction in Oct4, Nanog, and Sox2 mRNA levels.
Transitioning from ES to Specialized States
The ESC regulatory circuitry is reconfigured when cells are stimulated to differentiate (Figure 6B). ESC differentiation involves the loss of Oct4 and Nanog through transcriptional and posttranscriptional mechanisms, activation of lineage-specific transcription factors and miRNAs, and changes to the subunit composition of cofactors. Silencing of the Pou5f1 gene is mediated by trans-acting repressors such as ARP-1, COUP-TF1, and GCNF, nucleosome modification by the G9a H3K9 methyltransferase, and promoter DNA methylation by Dnmt3a/3b (Ben-Shushan et al., 1995; Feldman et al., 2006; Fuhrmann et al., 2001). Nanog undergoes proteolytic destruction by caspase-3 (Fujita et al., 2008). Specific miRNAs (mir-134, mir-296, and mir-470, for example) help to reduce the levels of Oct4, Nanog, and Sox2 mRNAs (Tay et al., 2008). Loss of the key ESC transcription factors leads to downregulation of the miRNA regulator Lin28, with consequent maturation of the Let-7 miRNAs, which are highly expressed in somatic tissues where they inhibit self-renewal genes (Melton et al., 2010; Viswanathan et al., 2008). Differentiation is also accompanied by modifications in the subunit composition of mediator, BAF, and TFIID complexes (Deato et al., 2008; Deato and Tjian, 2007; Ho and Crabtree, 2010; Taatjes, 2010). In summary, the signals that stimulate ESCs to differentiate cause changes in all classes of regulators discussed here: transcription factors, cofactors, chromatin regulators, and ncRNAs.
Insights into Disease Mechanisms
The study of ESC control has provided new insights into mechanisms that are involved in several human diseases. For
example, improved understanding of the functions of transcription factors such as c-Myc, cofactor complexes such as mediator and cohesin, and chromatin regulators such as TrxG and PcG has provided new insights into the molecular pathways affected by mutations in these regulators. Key aspects of the ESC gene expression program are recapitulated in cancer cells (Ben-Porath et al., 2008), and it has been argued that this is largely a consequence of c-Myc (Kim et al., 2010a). c-Myc amplification is the most frequent somatic copy-number amplification in tumor cells (Beroukhim et al., 2010). Tumor cells that overexpress c-Myc have enhanced expression of proliferation genes, and this is likely due to the role of c-Myc in recruiting P-TEFb to effect RNA polymerase II pause release at these genes (Rahl et al., 2010). This insight suggests that therapeutic agents that target control of transcription elongation may be valuable for treating tumors that overexpress c-Myc. Mutations in the genes encoding mediator, cohesin, and the cohesin-loading factor Nipbl can cause an array of human developmental syndromes and diseases. Mediator mutations have been associated with Opitz-Kaveggia (FG) syndrome, Lujan syndrome, schizophrenia, Transposition of the Great Arteries (TGA) syndrome, and colon cancer progression (Ding et al., 2008; Firestein et al., 2008; Muncke et al., 2003; Philibert and Madan, 2007; Risheg et al., 2007; Schwartz et al., 2007). Mutations in Nipbl and cohesin are responsible for most cases of Cornelia de Lange syndrome, which is characterized by developmental defects and mental retardation and appears to be the result of misregulation of gene expression rather than chromosome cohesion or mitotic abnormalities (Krantz et al., 2004; Strachan, 2005; Tonkin et al., 2004). Knowledge that mediator, Nipbl, and cohesin are linked at active promoters suggests therapies that might compensate for partial loss of transcriptional activity. 
The CDK8 kinase resides within a subcomplex of mediator that has repressive activities (Knuesel et al., 2009; Taatjes, 2010), so it is conceivable that small-molecule antagonists of CDK8 would lead to an increase in transcriptionally active mediator/cohesin assemblies. Mutations that affect the functions or levels of TrxG and PcG chromatin regulators have been implicated in a variety of cancers (Bracken and Helin, 2009; Krivtsov and Armstrong, 2007). The study of these regulators in ESCs and in cancer cells has revealed how repression of lineage-specific transcription factors and cell-cycle regulators may contribute to cancer phenotypes. Chromatin regulators with enzymatic activities are a new class of targets for small-molecule drug discovery, and we can expect new developments in this field in the near future.
Summary and Outlook
How do regulators of the ESC gene expression program produce a self-renewing cell capable of differentiating into all the cells of the adult? Part of the answer is that the core transcription factors positively autoregulate their own expression, activate transcription of a large fraction of the active genes, and contribute to the poised state of lineage-specific genes. The core transcription factors frequently share enhancers with signaling transcription factors, so signal transduction pathways can deliver signals directly to the genes regulated by the core factors. At actively
transcribed genes, additional transcription factors implicated in proliferation and other aspects of self-renewal bind to sites that can be separate from the core enhancers and modulate RNA expression levels through mechanisms that include release of paused polymerases. The core factors help create a poised state by recruiting repressive chromatin regulators to genes encoding lineage-specific factors. Many of the regulatory features of ESCs probably operate to control cell identity in other cell types. Reprogramming and transdifferentiation experiments support the idea that a small number of master transcription factors can control cell state in various cell types (Graf and Enver, 2009; Vierbuchen et al., 2010; Zhou et al., 2008a). If this model holds true for most cell types, then identification of the master transcription factors of all cell types would significantly improve our understanding of cell identity. The concept that some transcription factors control transcription initiation and others transcription elongation is almost certainly operative in all cell types, suggesting that improved models of global transcriptional control will depend on ascertaining which of these functions applies to each transcription factor. The ability of signaling pathways to transmit information about the cellular environment to enhancers bound by master regulators seems likely to be general, and if this is the case, better understanding of the cell-type-specific effects of certain signaling pathways will be at hand. The emerging evidence that repression of ESC genes by PcG complexes can involve RNA species transcribed in the vicinity of the repressed genes makes it important to determine whether PcG complexes are generally recruited or stabilized by local transcription in other cell types, and if so, to learn what controls such transcription. 
ESCs will continue to provide a powerful system for discovering the molecules and mechanisms that regulate mammalian cell state and a resource for understanding the changes that occur as cells differentiate. There are, however, many interesting challenges that must be met in order to more fully understand the basic regulation of these cells, the process of mammalian development, and how regulation goes awry in disease. These include, but are not limited to, determining the dynamic changes that occur as cells migrate through the cell cycle to self-renew or to differentiate, ascertaining the influence of natural cellular environments, and understanding the impact of genetic variation.
ACKNOWLEDGMENTS
Many ideas discussed in this Review emerged from conversations with Steve Bilodeau, Laurie Boyer, Megan Cole, Joan and Ron Conaway, Jerry Crabtree, Job Dekker, David Gifford, Amanda Fisher, Garrett Frampton, Matthew Guenther, Kristian Helin, Rudolf Jaenisch, Richard Jenner, Michael Kagey, Tony Lee, Stuart Levine, Charles Lin, John Lis, Alexander Marson, Alan Mullen, Jamie Newman, Huck Ng, Stuart Orkin, David Orlando, Renato Paro, Peter Rahl, Peter Reddien, Robert Roeder, Phillip Sharp, Dylan Taatjes, Ken Zaret, Len Zon, Robert Weinberg, and Thomas Zwaka. I am also grateful to David Orlando and Steve Bilodeau for help with data analysis and figures.
REFERENCES
Alarcon, C., Zaromytidou, A.I., Xi, Q., Gao, S., Yu, J., Fujisawa, S., Barlas, A., Miller, A.N., Manova-Todorova, K., Macias, M.J., et al. (2009). Nuclear CDKs drive Smad transcriptional activation and turnover in BMP and TGF-beta pathways. Cell 139, 757–769.
Ambrosetti, D.C., Scholer, H.R., Dailey, L., and Basilico, C. (2000). Modulation of the activity of multiple transcriptional activation domains by the DNA binding domains mediates the synergistic action of Sox2 and Oct-3 on the fibroblast growth factor-4 enhancer. J. Biol. Chem. 275, 23387–23397. Avilion, A.A., Nicolis, S.K., Pevny, L.H., Perez, L., Vivian, N., and Lovell-Badge, R. (2003). Multipotent cell lineages in early mouse development depend on SOX2 function. Genes Dev. 17, 126–140. Azuara, V., Perry, P., Sauer, S., Spivakov, M., Jorgensen, H.F., John, R.M., Gouti, M., Casanova, M., Warnes, G., Merkenschlager, M., et al. (2006). Chromatin signatures of pluripotent cell lines. Nat. Cell Biol. 8, 532–538. Bartel, D.P. (2009). MicroRNAs: Target recognition and regulatory functions. Cell 136, 215–233. Beattie, G.M., Lopez, A.D., Bucay, N., Hinton, A., Firpo, M.T., King, C.C., and Hayek, A. (2005). Activin A maintains pluripotency of human embryonic stem cells in the absence of feeder layers. Stem Cells 23, 489–495. Ben-Porath, I., Thomson, M.W., Carey, V.J., Ge, R., Bell, G.W., Regev, A., and Weinberg, R.A. (2008). An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat. Genet. 40, 499–507. Ben-Shushan, E., Sharir, H., Pikarsky, E., and Bergman, Y. (1995). A dynamic balance between ARP-1/COUP-TFII, EAR-3/COUP-TFI, and retinoic acid receptor:retinoid X receptor heterodimers regulates Oct-3/4 expression in embryonal carcinoma cells. Mol. Cell. Biol. 15, 1034–1048. Bernstein, B.E., Mikkelsen, T.S., Xie, X., Kamal, M., Huebert, D.J., Cuff, J., Fry, B., Meissner, A., Wernig, M., Plath, K., et al. (2006). A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326. Beroukhim, R., Mermel, C.H., Porter, D., Wei, G., Raychaudhuri, S., Donovan, J., Barretina, J., Boehm, J.S., Dobson, J., Urashima, M., et al. (2010). 
The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905. Bilodeau, S., Kagey, M.H., Frampton, G.M., Rahl, P.B., and Young, R.A. (2009). SetDB1 contributes to repression of genes encoding developmental regulators and maintenance of ES cell state. Genes Dev. 23, 2484–2489. Boland, M.J., Hazen, J.L., Nazor, K.L., Rodriguez, A.R., Gifford, W., Martin, G., Kupriyanov, S., and Baldwin, K.K. (2009). Adult mice generated from induced pluripotent stem cells. Nature 461, 91–94.
Cartwright, P., McLean, C., Sheppard, A., Rivett, D., Jones, K., and Dalton, S. (2005). LIF/STAT3 controls ES cell self-renewal and pluripotency by a Myc-dependent mechanism. Development 132, 885–896. Chambers, I., and Smith, A. (2004). Self-renewal of teratocarcinoma and embryonic stem cells. Oncogene 23, 7150–7160. Chambers, I., Colby, D., Robertson, M., Nichols, J., Lee, S., Tweedie, S., and Smith, A. (2003). Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell 113, 643–655. Chambers, I., Silva, J., Colby, D., Nichols, J., Nijmeijer, B., Robertson, M., Vrana, J., Jones, K., Grotewold, L., and Smith, A. (2007). Nanog safeguards pluripotency and mediates germline development. Nature 450, 1230–1234. Chen, X., Vega, V.B., and Ng, H.H. (2008a). Transcriptional regulatory networks in embryonic stem cells. Cold Spring Harb. Symp. Quant. Biol. 73, 203–209. Chen, X., Xu, H., Yuan, P., Fang, F., Huss, M., Vega, V.B., Wong, E., Orlov, Y.L., Zhang, W., Jiang, J., et al. (2008b). Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117. Chew, J.L., Loh, Y.H., Zhang, W., Chen, X., Tam, W.L., Yeap, L.S., Li, P., Ang, Y.S., Lim, B., Robson, P., et al. (2005). Reciprocal transcriptional regulation of Pou5f1 and Sox2 via the Oct4/Sox2 complex in embryonic stem cells. Mol. Cell. Biol. 25, 6031–6046. Chia, N.Y., Chan, Y.S., Feng, B., Lu, X., Orlov, Y.L., Moreau, D., Kumar, P., Yang, L., Jiang, J., Lau, M.S., et al. (2010). A genome-wide RNAi screen reveals determinants of human embryonic stem cell identity. Nature 468, 316–320. Clapier, C.R., and Cairns, B.R. (2009). The biology of chromatin remodeling complexes. Annu. Rev. Biochem. 78, 273–304. Cole, M.F., Johnstone, S.E., Newman, J.J., Kagey, M.H., and Young, R.A. (2008). Tcf3 is an integral component of the core regulatory circuitry of embryonic stem cells. Genes Dev. 22, 746–755. 
Conaway, R.C., Sato, S., Tomomori-Sato, C., Yao, T., and Conaway, J.W. (2005). The mammalian Mediator complex and its role in transcriptional regulation. Trends Biochem. Sci. 30, 250–255.
Bonasio, R., Tu, S., and Reinberg, D. (2010). Molecular signals of epigenetic states. Science 330, 612–616.
Core, L.J., Waterfall, J.J., and Lis, J.T. (2008). Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848.
Boyer, L.A., Lee, T.I., Cole, M.F., Johnstone, S.E., Levine, S.S., Zucker, J.P., Guenther, M.G., Kumar, R.M., Murray, H.L., Jenner, R.G., et al. (2005). Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956.
Cote, J., Quinn, J., Workman, J.L., and Peterson, C.L. (1994). Stimulation of GAL4 derivative binding to nucleosomal DNA by the yeast SWI/SNF complex. Science 265, 53–60.
Boyer, L.A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L.A., Lee, T.I., Levine, S.S., Wernig, M., Tajonar, A., Ray, M.K., et al. (2006). Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441, 349–353. Bracken, A.P., and Helin, K. (2009). Polycomb group proteins: navigators of lineage pathways led astray in cancer. Nat. Rev. Cancer 9, 773–784. Bracken, A.P., Dietrich, N., Pasini, D., Hansen, K.H., and Helin, K. (2006). Genome-wide mapping of Polycomb target genes unravels their roles in cell fate transitions. Genes Dev. 20, 1123–1136. Britten, R.J., and Davidson, E.H. (1969). Gene regulation for higher cells: a theory. Science 165, 349–357. Brownell, J.E., Zhou, J., Ranalli, T., Kobayashi, R., Edmondson, D.G., Roth, S.Y., and Allis, C.D. (1996). Tetrahymena histone acetyltransferase A: A homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell 84, 843–851.
Deato, M.D., and Tjian, R. (2007). Switching of the core transcription machinery during myogenesis. Genes Dev. 21, 2137–2149. Deato, M.D., Marr, M.T., Sottero, T., Inouye, C., Hu, P., and Tjian, R. (2008). MyoD targets TAF3/TRF3 to activate myogenin transcription. Mol. Cell 32, 96–105. Dejosez, M., Krumenacker, J.S., Zitur, L.J., Passeri, M., Chu, L.F., Songyang, Z., Thomson, J.A., and Zwaka, T.P. (2008). Ronin is essential for embryogenesis and the pluripotency of mouse embryonic stem cells. Cell 133, 1162–1174. Dejosez, M., Levine, S.S., Frampton, G.M., Whyte, W.A., Stratton, S.A., Barton, M.C., Gunaratne, P.H., Young, R.A., and Zwaka, T.P. (2010). Ronin/Hcf-1 binds to a hyperconserved enhancer element and regulates genes involved in the growth of embryonic stem cells. Genes Dev. 24, 1479–1484.
Buhler, M., and Moazed, D. (2007). Transcription and RNAi in heterochromatic gene silencing. Nat. Struct. Mol. Biol. 14, 1041–1048.
Ding, L., Paszkowski-Rogacz, M., Nitzsche, A., Slabicki, M.M., Heninger, A.K., de Vries, I., Kittler, R., Junqueira, M., Shevchenko, A., Schulz, H., et al. (2009). A genome-scale RNAi screen for Oct4 modulators defines a role of the Paf1 complex for embryonic stem cell identity. Cell Stem Cell 4, 403–415.
Cammas, F., Herzog, M., Lerouge, T., Chambon, P., and Losson, R. (2004). Association of the transcriptional corepressor TIF1beta with heterochromatin protein 1 (HP1): an essential role for progression through differentiation. Genes Dev. 18, 2147–2160.
Ding, N., Zhou, H., Esteve, P.O., Chin, H.G., Kim, S., Xu, X., Joseph, S.M., Friez, M.J., Schwartz, C.E., Pradhan, S., et al. (2008). Mediator links epigenetic silencing of neuronal gene expression with x-linked mental retardation. Mol. Cell 31, 347–359.
Dodge, J.E., Kang, Y.K., Beppu, H., Lei, H., and Li, E. (2004). Histone H3-K9 methyltransferase ESET is essential for early development. Mol. Cell. Biol. 24, 2478–2486. Endoh, M., Endo, T.A., Endoh, T., Fujimura, Y., Ohara, O., Toyoda, T., Otte, A.P., Okano, M., Brockdorff, N., Vidal, M., et al. (2008). Polycomb group proteins Ring1A/B are functionally linked to the core transcriptional regulatory circuitry to maintain ES cell identity. Development 135, 1513–1524. Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., and Bartel, D.P. (2005). The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817–1821. Fazzio, T.G., and Panning, B. (2010). Condensin complexes regulate mitotic progression and interphase chromatin structure in embryonic stem cells. J. Cell Biol. 188, 491–503. Fazzio, T.G., Huff, J.T., and Panning, B. (2008). An RNAi screen of chromatin proteins identifies Tip60-p400 as a regulator of embryonic stem cell identity. Cell 134, 162–174. Feldman, N., Gerson, A., Fang, J., Li, E., Zhang, Y., Shinkai, Y., Cedar, H., and Bergman, Y. (2006). G9a-mediated irreversible epigenetic inactivation of Oct-3/4 during early embryogenesis. Nat. Cell Biol. 8, 188–194. Firestein, R., Bass, A.J., Kim, S.Y., Dunn, I.F., Silver, S.J., Guney, I., Freed, E., Ligon, A.H., Vena, N., Ogino, S., et al. (2008). CDK8 is a colorectal cancer oncogene that regulates beta-catenin activity. Nature 455, 547–551. Frankel, N., Davis, G.K., Vargas, D., Wang, S., Payre, F., and Stern, D.L. (2010). Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490–493. Fryer, C.J., White, J.B., and Jones, K.A. (2004). Mastermind recruits CycC:CDK8 to phosphorylate the Notch ICD and coordinate activation with turnover. Mol. Cell 16, 509–520. Fuda, N.J., Ardehali, M.B., and Lis, J.T. (2009). Defining mechanisms that regulate RNA polymerase II transcription in vivo. Nature 461, 186–192. 
Fuhrmann, G., Chung, A.C., Jackson, K.J., Hummelke, G., Baniahmad, A., Sutter, J., Sylvester, I., Scholer, H.R., and Cooney, A.J. (2001). Mouse germline restriction of Oct4 expression by germ cell nuclear factor. Dev. Cell 1, 377–387. Fujita, J., Crane, A.M., Souza, M.K., Dejosez, M., Kyba, M., Flavell, R.A., Thomson, J.A., and Zwaka, T.P. (2008). Caspase activity mediates the differentiation of embryonic stem cells. Cell Stem Cell 2, 595–601. Galan-Caridad, J.M., Harel, S., Arenzana, T.L., Hou, Z.E., Doetsch, F.K., Mirny, L.A., and Reizis, B. (2007). Zfx controls the self-renewal of embryonic and hematopoietic stem cells. Cell 129, 345–357. Gao, S., Alarcon, C., Sapkota, G., Rahman, S., Chen, P.Y., Goerner, N., Macias, M.J., Erdjument-Bromage, H., Tempst, P., and Massague, J. (2009). Ubiquitin ligase Nedd4L targets activated Smad2/3 to limit TGF-beta signaling. Mol. Cell 36, 457–468.
Gaspar-Maia, A., Alajem, A., Polesso, F., Sridharan, R., Mason, M.J., Heidersbach, A., Ramalho-Santos, J., McManus, M.T., Plath, K., Meshorer, E., et al. (2009). Chd1 regulates open chromatin and pluripotency of embryonic stem cells. Nature 460, 863–868.
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75–79.
Graf, T., and Enver, T. (2009). Forcing cells to change lineages. Nature 462, 587–594.
Grewal, S.I., and Elgin, S.C. (2007). Transcription and RNA interference in the formation of heterochromatin. Nature 447, 399–406.
Guenther, M.G., and Young, R.A. (2010). Transcription. Repressive transcription. Science 329, 150–151.
Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R., and Young, R.A. (2007). A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130, 77–88.
Guenther, M.G., Frampton, G.M., Soldner, F., Hockemeyer, D., Mitalipova, M., Jaenisch, R., and Young, R.A. (2010). Chromatin structure and gene expression programs of human embryonic and induced pluripotent stem cells. Cell Stem Cell 7, 249–257.
Gupta, R.A., Shah, N., Wang, K.C., Kim, J., Horlings, H.M., Wong, D.J., Tsai, M.C., Hung, T., Argani, P., Rinn, J.L., et al. (2010). Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076.
Han, M., and Grunstein, M. (1988). Nucleosome loss activates yeast downstream promoters in vivo. Cell 55, 1137–1145.
Hanna, J.H., Saha, K., and Jaenisch, R. (2010). Pluripotency and cellular reprogramming: Facts, hypotheses, unresolved issues. Cell 143, 508–525.
Hart, A.H., Hartley, L., Ibrahim, M., and Robb, L. (2004). Identification, cloning and expression analysis of the pluripotency promoting Nanog genes in mouse and human. Dev. Dyn. 230, 187–198.
He, L., Thomson, J.M., Hemann, M.T., Hernando-Monge, E., Mu, D., Goodson, S., Powers, S., Cordon-Cardo, C., Lowe, S.W., Hannon, G.J., et al. (2005). A microRNA polycistron as a potential human oncogene. Nature 435, 828–833.
He, Y., Vogelstein, B., Velculescu, V.E., Papadopoulos, N., and Kinzler, K.W. (2008). The antisense transcriptomes of human cells. Science 322, 1855–1857.
Ho, L., and Crabtree, G.R. (2010). Chromatin remodelling during development. Nature 463, 474–484.
Hong, J.W., Hendrix, D.A., and Levine, M.S. (2008). Shadow enhancers as a source of evolutionary novelty. Science 321, 1314.
Hu, G., Kim, J., Xu, Q., Leng, Y., Orkin, S.H., and Elledge, S.J. (2009). A genome-wide RNAi screen identifies a new transcriptional module required for self-renewal. Genes Dev. 23, 837–848.
Imbalzano, A.N., Kwon, H., Green, M.R., and Kingston, R.E. (1994). Facilitated binding of TATA-binding protein to nucleosomal DNA. Nature 370, 481–485.
Ivanova, N., Dobrin, R., Lu, R., Kotenko, I., Levorse, J., DeCoste, C., Schafer, X., Lun, Y., and Lemischka, I.R. (2006). Dissecting self-renewal in stem cells with RNA interference. Nature 442, 533–538.
Jackson, M., Krassowska, A., Gilbert, N., Chevassut, T., Forrester, L., Ansell, J., and Ramsahoye, B. (2004). Severe global DNA hypomethylation blocks differentiation and induces histone hyperacetylation in embryonic stem cells. Mol. Cell. Biol. 24, 8862–8871.
Jacob, F., and Monod, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356.
Jaenisch, R., and Young, R. (2008). Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming. Cell 132, 567–582.
James, D., Levine, A.J., Besser, D., and Hemmati-Brivanlou, A. (2005). TGFbeta/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells. Development 132, 1273–1282.
Jiang, J., Chan, Y.S., Loh, Y.H., Cai, J., Tong, G.Q., Lim, C.A., Robson, P., Zhong, S., and Ng, H.H. (2008). A core Klf circuitry regulates self-renewal of embryonic stem cells. Nat. Cell Biol. 10, 353–360.
Kagey, M.H., Newman, J.J., Bilodeau, S., Zhan, Y., Orlando, D.A., van Berkum, N.L., Ebmeier, C.C., Goossens, J., Rahl, P.B., Levine, S.S., et al. (2010). Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T., Livingston, D.M., and Rajewsky, K. (2005). Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing. Genes Dev. 19, 489–501. Kang, L., Wang, J., Zhang, Y., Kou, Z., and Gao, S. (2009). iPS cells can support full-term development of tetraploid blastocyst-complemented embryos. Cell Stem Cell 5, 135–138. Kanhere, A., Viiri, K., Araujo, C.C., Rasaiyaah, J., Bouwman, R.D., Whyte, W.A., Pereira, C.F., Brookes, E., Walker, K., Bell, G.W., et al. (2010). Short RNAs are transcribed from repressed polycomb target genes and interact with polycomb repressive complex-2. Mol. Cell 38, 675–688.
Kayne, P.S., Kim, U.J., Han, M., Mullen, J.R., Yoshizaki, F., and Grunstein, M. (1988). Extremely conserved histone H4 N terminus is dispensable for growth but essential for repressing the silent mating loci in yeast. Cell 55, 27–39. Kim, J., Chu, J., Shen, X., Wang, J., and Orkin, S.H. (2008). An extended transcriptional network for pluripotency of embryonic stem cells. Cell 132, 1049–1061. Kim, J., Woo, A.J., Chu, J., Snow, J.W., Fujiwara, Y., Kim, C.G., Cantor, A.B., and Orkin, S.H. (2010a). A Myc network accounts for similarities between embryonic stem and cancer cell transcription programs. Cell 143, 313–324. Kim, K., Doi, A., Wen, B., Ng, K., Zhao, R., Cahan, P., Kim, J., Aryee, M.J., Ji, H., Ehrlich, L.I., et al. (2010b). Epigenetic memory in induced pluripotent stem cells. Nature 467, 285–290. Klochendler-Yeivin, A., Fiette, L., Barra, J., Muchardt, C., Babinet, C., and Yaniv, M. (2000). The murine SNF5/INI1 chromatin remodeling factor is essential for embryonic development and tumor suppression. EMBO Rep. 1, 500–506. Knezetic, J.A., and Luse, D.S. (1986). The presence of nucleosomes on a DNA template prevents initiation by RNA polymerase II in vitro. Cell 45, 95–104. Knuesel, M.T., Meyer, K.D., Bernecky, C., and Taatjes, D.J. (2009). The human CDK8 subcomplex is a molecular switch that controls Mediator coactivator function. Genes Dev. 23, 439–451. Kornberg, R.D., and Thomas, J.O. (1974). Chromatin structure; oligomers of the histones. Science 184, 865–868. Krantz, I.D., McCallum, J., DeScipio, C., Kaur, M., Gillis, L.A., Yaeger, D., Jukofsky, L., Wasserman, N., Bottani, A., Morris, C.A., et al. (2004). Cornelia de Lange syndrome is caused by mutations in NIPBL, the human homolog of Drosophila melanogaster Nipped-B. Nat. Genet. 36, 631–635. Krivtsov, A.V., and Armstrong, S.A. (2007). MLL translocations, histone modifications and leukaemia stem-cell development. Nat. Rev. Cancer 7, 823–833. 
Krogan, N.J., Dover, J., Wood, A., Schneider, J., Heidt, J., Boateng, M.A., Dean, K., Ryan, O.W., Golshani, A., Johnston, M., et al. (2003). The Paf1 complex is required for histone H3 methylation by COMPASS and Dot1p: linking transcriptional elongation to histone methylation. Mol. Cell 11, 721–729. Kwon, H., Imbalzano, A.N., Khavari, P.A., Kingston, R.E., and Green, M.R. (1994). Nucleosome disruption and enhancement of activator binding by a human SW1/SNF complex. Nature 370, 477–481. Landeira, D., Sauer, S., Poot, R., Dvorkina, M., Mazzarella, L., Jorgensen, H.F., Pereira, C.F., Leleu, M., Piccolo, F.M., Spivakov, M., et al. (2010). Jarid2 is a PRC2 component in embryonic stem cells required for multi-lineage differentiation and recruitment of PRC1 and RNA Polymerase II to developmental regulators. Nat. Cell Biol. 12, 618–624. Lee, J.T. (2009). Lessons from X-chromosome inactivation: long ncRNA as guides and tethers to the epigenome. Genes Dev. 23, 1831–1842. Lee, T.I., Jenner, R.G., Boyer, L.A., Guenther, M.G., Levine, S.S., Kumar, R.M., Chevalier, B., Johnstone, S.E., Cole, M.F., Isono, K., et al. (2006). Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125, 301–313. Leeb, M., Pasini, D., Novatchkova, M., Jaritz, M., Helin, K., and Wutz, A. (2010). Polycomb complexes act redundantly to repress genomic repeats and genes. Genes Dev. 24, 265–276. Levine, M., and Tjian, R. (2003). Transcription regulation and animal diversity. Nature 424, 147–151. Li, B., Carey, M., and Workman, J.L. (2007). The role of chromatin during transcription. Cell 128, 707–719. Li, G., Margueron, R., Ku, M., Chambon, P., Bernstein, B.E., and Reinberg, D. (2010). Jarid2 and PRC2, partners in regulating gene expression. Genes Dev. 24, 368–380. Liang, J., Wan, M., Zhang, Y., Gu, P., Xin, H., Jung, S.Y., Qin, J., Wong, J., Cooney, A.J., Liu, D., et al. (2008). Nanog and Oct4 associate with unique transcriptional repression complexes in embryonic stem cells. 
Nat. Cell Biol. 10, 731–739.
Lister, R., Pelizzola, M., Kida, Y.S., Hawkins, R.D., Nery, J.R., Hon, G., Antosiewicz-Bourget, J., O’Malley, R., Castanon, R., Klugman, S., et al. (2011). Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature. Published online February 2, 2011. 10.1038/nature09798. Lluis, F., Pedone, E., Pepe, S., and Cosma, M.P. (2008). Periodic activation of Wnt/beta-catenin signaling enhances somatic cell reprogramming mediated by cell fusion. Cell Stem Cell 3, 493–507. Loh, Y.H., Wu, Q., Chew, J.L., Vega, V.B., Zhang, W., Chen, X., Bourque, G., George, J., Leong, B., Liu, J., et al. (2006). The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat. Genet. 38, 431–440. Lorch, Y., LaPointe, J.W., and Kornberg, R.D. (1987). Nucleosomes inhibit the initiation of transcription but allow chain elongation with the displacement of histones. Cell 49, 203–210. Macarthur, B.D., Ma’ayan, A., and Lemischka, I.R. (2009). Systems biology of stem cell fate and cellular reprogramming. Nat. Rev. Mol. Cell Biol. 10, 672–681. Malik, S., and Roeder, R.G. (2005). Dynamic regulation of pol II transcription by the mammalian Mediator complex. Trends Biochem. Sci. 30, 256–263. Maniatis, T., Falvo, J.V., Kim, T.H., Kim, T.K., Lin, C.H., Parekh, B.S., and Wathelet, M.G. (1998). Structure and function of the interferon-beta enhanceosome. Cold Spring Harb. Symp. Quant. Biol. 63, 609–620. Margueron, R., Justin, N., Ohno, K., Sharpe, M.L., Son, J., Drury, W.J., 3rd, Voigt, P., Martin, S.R., Taylor, W.R., De Marco, V., et al. (2009). Role of the polycomb protein EED in the propagation of repressive histone marks. Nature 461, 762–767. Marson, A., Foreman, R., Chevalier, B., Bilodeau, S., Kahn, M., Young, R.A., and Jaenisch, R. (2008a). Wnt signaling promotes reprogramming of somatic cells to pluripotency. Cell Stem Cell 3, 132–135. 
Marson, A., Levine, S.S., Cole, M.F., Frampton, G.M., Brambrink, T., Johnstone, S., Guenther, M.G., Johnston, W.K., Wernig, M., Newman, J., et al. (2008b). Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134, 521–533. Masui, S., Nakatake, Y., Toyooka, Y., Shimosato, D., Yagi, R., Takahashi, K., Okochi, H., Okuda, A., Matoba, R., Sharov, A.A., et al. (2007). Pluripotency governed by Sox2 via regulation of Oct3/4 expression in mouse embryonic stem cells. Nat. Cell Biol. 9, 625–635. Matoba, R., Niwa, H., Masui, S., Ohtsuka, S., Carter, M.G., Sharov, A.A., and Ko, M.S. (2006). Dissecting Oct3/4-regulated gene networks in embryonic stem cells by expression profiling. PLoS ONE 1, e26. Meissner, A. (2010). Epigenetic modifications in pluripotent and differentiated cells. Nat. Biotechnol. 28, 1079–1088. Melton, C., Judson, R.L., and Blelloch, R. (2010). Opposing microRNA families regulate self-renewal in mouse embryonic stem cells. Nature 463, 621–626. Mitsui, K., Tokuzawa, Y., Itoh, H., Segawa, K., Murakami, M., Takahashi, K., Maruyama, M., Maeda, M., and Yamanaka, S. (2003). The homeoprotein Nanog is required for maintenance of pluripotency in mouse epiblast and ES cells. Cell 113, 631–642. Muncke, N., Jung, C., Rudiger, H., Ulmer, H., Roeth, R., Hubert, A., Goldmuntz, E., Driscoll, D., Goodship, J., Schon, K., et al. (2003). Missense mutations and gene interruption in PROSIT240, a novel TRAP240-like gene, in patients with congenital heart defect (transposition of the great arteries). Circulation 108, 2843–2850. Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. (2005). Characterization of Dicer-deficient murine embryonic stem cells. Proc. Natl. Acad. Sci. USA 102, 12135–12140. Nasmyth, K., and Haering, C.H. (2009). Cohesin: its roles and mechanisms. Annu. Rev. Genet. 43, 525–558. Niakan, K.K., Davis, E.C., Clipsham, R.C., Jiang, M., Dehart, D.B., Sulik, K.K., and McCabe, E.R. (2006). 
Novel role for the orphan nuclear receptor Dax1 in embryogenesis, different from steroidogenesis. Mol. Genet. Metab. 88, 261–271.
Nichols, J., Zevnik, B., Anastassiadis, K., Niwa, H., Klewe-Nebenius, D., Chambers, I., Scholer, H., and Smith, A. (1998). Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell 95, 379–391. Niwa, H. (2007). How is pluripotency determined and maintained? Development 134, 635–646. Niwa, H., Burdon, T., Chambers, I., and Smith, A. (1998). Self-renewal of pluripotent embryonic stem cells is mediated via activation of STAT3. Genes Dev. 12, 2048–2060. Niwa, H., Miyazaki, J., and Smith, A.G. (2000). Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat. Genet. 24, 372–376. Niwa, H., Ogawa, K., Shimosato, D., and Adachi, K. (2009). A parallel circuit of LIF signalling pathways maintains pluripotency of mouse ES cells. Nature 460, 118–122. O’Donnell, K.A., Wentzel, E.A., Zeller, K.I., Dang, C.V., and Mendell, J.T. (2005). c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839–843. Okita, K., and Yamanaka, S. (2006). Intracellular signaling pathways regulating pluripotency of embryonic stem cells. Curr. Stem Cell Res. Ther. 1, 103–111. Olins, A.L., and Olins, D.E. (1974). Spheroid chromatin units (v bodies). Science 183, 330–332. Orkin, S.H., Wang, J., Kim, J., Chu, J., Rao, S., Theunissen, T.W., Shen, X., and Levasseur, D.N. (2008). The transcriptional network controlling pluripotency in ES cells. Cold Spring Harb. Symp. Quant. Biol. 73, 195–202. Pan, G., Tian, S., Nie, J., Yang, C., Ruotti, V., Wei, H., Jonsdottir, G.A., Stewart, R., and Thomson, J.A. (2007). Whole-genome analysis of histone H3 lysine 4 and lysine 27 methylation in human embryonic stem cells. Cell Stem Cell 1, 299–312. Pandey, R.R., Mondal, T., Mohammad, F., Enroth, S., Redrup, L., Komorowski, J., Nagano, T., Mancini-Dinardo, D., and Kanduri, C. (2008). Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol. 
Cell 32, 232–246. Pardo, M., Lang, B., Yu, L., Prosser, H., Bradley, A., Babu, M.M., and Choudhary, J. (2010). An expanded Oct4 interaction network: Implications for stem cell biology, development, and disease. Cell Stem Cell 6, 382–395. Pasini, D., Bracken, A.P., Jensen, M.R., Lazzerini Denchi, E., and Helin, K. (2004). Suz12 is essential for mouse development and for EZH2 histone methyltransferase activity. EMBO J. 23, 4061–4071. Pasini, D., Bracken, A.P., Agger, K., Christensen, J., Hansen, K., Cloos, P.A., and Helin, K. (2008). Regulation of stem cell differentiation by histone methyltransferases and demethylases. Cold Spring Harb. Symp. Quant. Biol. 73, 253–263. Pasini, D., Cloos, P.A., Walfridsson, J., Olsson, L., Bukowski, J.P., Johansen, J.V., Bak, M., Tommerup, N., Rappsilber, J., and Helin, K. (2010). JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature 464, 306–310. Peng, J.C., Valouev, A., Swigut, T., Zhang, J., Zhao, Y., Sidow, A., and Wysocka, J. (2009). Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell 139, 1290–1302. Pera, M.F., and Tam, P.P. (2010). Extrinsic regulation of pluripotent stem cells. Nature 465, 713–720. Peterlin, B.M., and Price, D.H. (2006). Controlling the elongation phase of transcription with P-TEFb. Mol. Cell 23, 297–305. Philibert, R.A., and Madan, A. (2007). Role of MED12 in transcription and human behavior. Pharmacogenomics 8, 909–916. Polo, J.M., Liu, S., Figueroa, M.E., Kulalert, W., Eminli, S., Tan, K.Y., Apostolou, E., Stadtfeld, M., Li, Y., Shioda, T., et al. (2010). Cell type of origin influences the molecular and functional properties of mouse induced pluripotent stem cells. Nat. Biotechnol. 28, 848–855.
Rahl, P.B., Lin, C.Y., Seila, A.C., Flynn, R.A., McCuine, S., Burge, C.B., Sharp, P.A., and Young, R.A. (2010). c-Myc regulates transcriptional pause release. Cell 141, 432–445. Rinn, J.L., Kertesz, M., Wang, J.K., Squazzo, S.L., Xu, X., Brugmann, S.A., Goodnough, L.H., Helms, J.A., Farnham, P.J., Segal, E., et al. (2007). Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323. Risheg, H., Graham, J.M., Jr., Clark, R.D., Rogers, R.C., Opitz, J.M., Moeschler, J.B., Peiffer, A.P., May, M., Joseph, S.M., Jones, J.R., et al. (2007). A recurrent mutation in MED12 leading to R961W causes Opitz-Kaveggia syndrome. Nat. Genet. 39, 451–453. Roeder, R.G. (1998). Role of general and gene-specific cofactors in the regulation of eukaryotic transcription. Cold Spring Harb. Symp. Quant. Biol. 63, 201–218. Roeder, R.G. (2005). Transcriptional regulation and the role of diverse coactivators in animal cells. FEBS Lett. 579, 909–915. Rossant, J. (2008). Stem cells and early lineage development. Cell 132, 527–531. Sato, N., Meijer, L., Skaltsounis, L., Greengard, P., and Brivanlou, A.H. (2004). Maintenance of pluripotency in human and mouse embryonic stem cells through activation of Wnt signaling by a pharmacological GSK-3-specific inhibitor. Nat. Med. 10, 55–63. Schnetz, M.P., Handoko, L., Akhtar-Zaidi, B., Bartels, C.F., Pereira, C.F., Fisher, A.G., Adams, D.J., Flicek, P., Crawford, G.E., Laframboise, T., et al. (2010). CHD7 targets active gene enhancer elements to modulate ES cellspecific gene expression. PLoS Genet. 6, e1001023. Schuettengruber, B., Chourrout, D., Vervoort, M., Leblanc, B., and Cavalli, G. (2007). Genome regulation by polycomb and trithorax proteins. Cell 128, 735–745. Scholer, H.R., Ruppert, S., Suzuki, N., Chowdhury, K., and Gruss, P. (1990). New type of POU domain in germ line-specific protein Oct-4. Nature 344, 435–439. 
Schultz, D.C., Ayyanathan, K., Negorev, D., Maul, G.G., and Rauscher, F.J., 3rd. (2002). SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes Dev. 16, 919–932. Schwartz, C.E., Tarpey, P.S., Lubs, H.A., Verloes, A., May, M.M., Risheg, H., Friez, M.J., Futreal, P.A., Edkins, S., Teague, J., et al. (2007). The original Lujan syndrome family has a novel missense mutation (p.N1007S) in the MED12 gene. J. Med. Genet. 44, 472–477. Seila, A.C., Calabrese, J.M., Levine, S.S., Yeo, G.W., Rahl, P.B., Flynn, R.A., Young, R.A., and Sharp, P.A. (2008). Divergent transcription from active promoters. Science 322, 1849–1851. Shen, X., Kim, W., Fujiwara, Y., Simon, M.D., Liu, Y., Mysliwiec, M.R., Yuan, G.C., Lee, Y., and Orkin, S.H. (2009). Jumonji modulates polycomb activity and self-renewal versus differentiation of stem cells. Cell 139, 1303–1314. Silva, J., and Smith, A. (2008). Capturing pluripotency. Cell 132, 532–536. Silva, J., Nichols, J., Theunissen, T.W., Guo, G., van Oosten, A.L., Barrandon, O., Wray, J., Yamanaka, S., Chambers, I., and Smith, A. (2009). Nanog is the gateway to the pluripotent ground state. Cell 138, 722–737. Smith, A.G. (2001). Embryo-derived stem cells: of mice and men. Annu. Rev. Cell Dev. Biol. 17, 435–462. Smith, S.K., Charnock-Jones, D.S., and Sharkey, A.M. (1998). The role of leukemia inhibitory factor and interleukin-6 in human reproduction. Hum. Reprod. 13 (Suppl 3), 237–243, discussion 244–236. Smith, T.A., and Hooper, M.L. (1983). Medium conditioned by feeder cells inhibits the differentiation of embryonal carcinoma cultures. Exp. Cell Res. 145, 458–462. Stadtfeld, M., and Hochedlinger, K. (2010). Induced pluripotency: history, mechanisms, and applications. Genes Dev. 24, 2239–2263. Stadtfeld, M., Apostolou, E., Akutsu, H., Fukuda, A., Follett, P., Natesan, S., Kono, T., Shioda, T., and Hochedlinger, K. (2010). 
Aberrant silencing of
imprinted genes on chromosome 12qF1 in mouse induced pluripotent stem cells. Nature 465, 175–181. Stock, J.K., Giadrossi, S., Casanova, M., Brookes, E., Vidal, M., Koseki, H., Brockdorff, N., Fisher, A.G., and Pombo, A. (2007). Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nat. Cell Biol. 9, 1428–1435. Strachan, T. (2005). Cornelia de Lange Syndrome and the link between chromosomal function, DNA repair and developmental gene regulation. Curr. Opin. Genet. Dev. 15, 258–264. Sun, C., Nakatake, Y., Akagi, T., Ura, H., Matsuda, T., Nishiyama, A., Koide, H., Ko, M.S., Niwa, H., and Yokota, T. (2009). Dax1 binds to Oct3/4 and inhibits its transcriptional activity in embryonic stem cells. Mol. Cell. Biol. 29, 4574–4583. Surface, L.E., Thornton, S.R., and Boyer, L.A. (2010). Polycomb group proteins set the stage for early lineage commitment. Cell Stem Cell 7, 288–298. Taatjes, D.J. (2010). The human Mediator complex: a versatile, genome-wide regulator of transcription. Trends Biochem. Sci. 35, 315–322. Tam, W.L., Lim, C.Y., Han, J., Zhang, J., Ang, Y.S., Ng, H.H., Yang, H., and Lim, B. (2008). T-cell factor 3 regulates embryonic stem cell pluripotency and self-renewal by the transcriptional control of multiple lineage pathways. Stem Cells 26, 2019–2031. Tay, Y., Zhang, J., Thomson, A.M., Lim, B., and Rigoutsos, I. (2008). MicroRNAs to Nanog, Oct4 and Sox2 coding regions modulate embryonic stem cell differentiation. Nature 455, 1124–1128. Tonkin, E.T., Wang, T.J., Lisgo, S., Bamshad, M.J., and Strachan, T. (2004). NIPBL, encoding a homolog of fungal Scc2-type sister chromatid cohesion proteins and fly Nipped-B, is mutated in Cornelia de Lange syndrome. Nat. Genet. 36, 636–641. Tsai, M.C., Manor, O., Wan, Y., Mosammaparast, N., Wang, J.K., Lan, F., Shi, Y., Segal, E., and Chang, H.Y. (2010). Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693. 
Vallier, L., Alexander, M., and Pedersen, R.A. (2005). Activin/Nodal and FGF pathways cooperate to maintain pluripotency of human embryonic stem cells. J. Cell Sci. 118, 4495–4509. van den Berg, D.L., Snoek, T., Mullin, N.P., Yates, A., Bezstarosti, K., Demmers, J., Chambers, I., and Poot, R.A. (2010). An Oct4-centered protein interaction network in embryonic stem cells. Cell Stem Cell 6, 369–381. van der Stoop, P., Boutsma, E.A., Hulsman, D., Noback, S., Heimerikx, M., Kerkhoven, R.M., Voncken, J.W., Wessels, L.F., and van Lohuizen, M. (2008). Ubiquitin E3 ligase Ring1b/Rnf2 of polycomb repressive complex 1 contributes to stable maintenance of mouse embryonic stem cells. PLoS ONE 3, e2235. Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A., and Luscombe, N.M. (2009). A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–263. Vierbuchen, T., Ostermeier, A., Pang, Z.P., Kokubu, Y., Sudhof, T.C., and Wernig, M. (2010). Direct conversion of fibroblasts to functional neurons by defined factors. Nature 463, 1035–1041. Viswanathan, S.R., Daley, G.Q., and Gregory, R.I. (2008). Selective blockade of microRNA processing by Lin28. Science 320, 97–100. Wang, J., Rao, S., Chu, J., Shen, X., Levasseur, D.N., Theunissen, T.W., and Orkin, S.H. (2006). A protein interaction network for pluripotency of embryonic stem cells. Nature 444, 364–368. Wang, Y., Medvid, R., Melton, C., Jaenisch, R., and Blelloch, R. (2007). DGCR8 is essential for microRNA biogenesis and silencing of embryonic stem cell selfrenewal. Nat. Genet. 39, 380–385. Wang, Y., Baskerville, S., Shenoy, A., Babiarz, J.E., Baehner, L., and Blelloch, R. (2008). Embryonic stem cell-specific microRNAs regulate the G1-S transition and promote rapid proliferation. Nat. Genet. 40, 1478–1483.
Williams, R.L., Hilton, D.J., Pease, S., Willson, T.A., Stewart, C.L., Gearing, D.P., Wagner, E.F., Metcalf, D., Nicola, N.A., and Gough, N.M. (1988). Myeloid leukaemia inhibitory factor maintains the developmental potential of embryonic stem cells. Nature 336, 684–687. Wilusz, J.E., Sunwoo, H., and Spector, D.L. (2009). Long noncoding RNAs: functional surprises from the RNA world. Genes Dev. 23, 1494–1504. Wood, A.J., Severson, A.F., and Meyer, B.J. (2010). Condensin and cohesin complexity: the expanding repertoire of functions. Nat. Rev. Genet. 11, 391–404. Wu, Q., Chen, X., Zhang, J., Loh, Y.H., Low, T.Y., Zhang, W., Sze, S.K., Lim, B., and Ng, H.H. (2006). Sall4 interacts with Nanog and co-occupies Nanog genomic sites in embryonic stem cells. J. Biol. Chem. 281, 24090–24094. Yamanaka, S., and Blau, H.M. (2010). Nuclear reprogramming to a pluripotent state by three approaches. Nature 465, 704–712. Yap, K.L., Li, S., Munoz-Cabello, A.M., Raguz, S., Zeng, L., Mujtaba, S., Gil, J., Walsh, M.J., and Zhou, M.M. (2010). Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol. Cell 38, 662–674. Yeap, L.S., Hayashi, K., and Surani, M.A. (2009). ERG-associated protein with SET domain (ESET)-Oct4 interaction regulates pluripotency and represses the trophectoderm lineage. Epigenetics Chromatin 2, 12. Ying, Q.L., Nichols, J., Chambers, I., and Smith, A. (2003). BMP induction of Id proteins suppresses differentiation and sustains embryonic stem cell selfrenewal in collaboration with STAT3. Cell 115, 281–292. Yuan, P., Han, J., Guo, G., Orlov, Y.L., Huss, M., Loh, Y.H., Yaw, L.P., Robson, P., Lim, B., and Ng, H.H. (2009). Eset partners with Oct4 to restrict extraembryonic trophoblast lineage potential in embryonic stem cells. Genes Dev. 23, 2507–2520. Zaratiegui, M., Irvine, D.V., and Martienssen, R.A. (2007). Noncoding RNAs and gene silencing. Cell 128, 763–776. 
Zhang, J., Tam, W.L., Tong, G.Q., Wu, Q., Chan, H.Y., Soh, B.S., Lou, Y., Yang, J., Ma, Y., Chai, L., et al. (2006). Sall4 modulates embryonic stem cell pluripotency and early embryonic development by the transcriptional regulation of Pou5f1. Nat. Cell Biol. 8, 1114–1123. Zhang, X., Zhang, J., Wang, T., Esteban, M.A., and Pei, D. (2008). Esrrb activates Oct4 transcription and sustains self-renewal and pluripotency in embryonic stem cells. J. Biol. Chem. 283, 35825–35833. Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J., and Lee, J.T. (2008). Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756. Zhao, J., Ohsumi, T.K., Kung, J.T., Ogawa, Y., Grau, D.J., Sarma, K., Song, J.J., Kingston, R.E., Borowsky, M., and Lee, J.T. (2010). Genome-wide identification of Polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939–953. Zhao, X.Y., Li, W., Lv, Z., Liu, L., Tong, M., Hai, T., Hao, J., Guo, C.L., Ma, Q.W., Wang, L., et al. (2009). iPS cells produce viable mice through tetraploid complementation. Nature 461, 86–90. Zhong, X., and Jin, Y. (2009). Critical roles of coactivator p300 in mouse embryonic stem cell differentiation and Nanog expression. J. Biol. Chem. 284, 9168–9175. Zhou, Q., Brown, J., Kanarek, A., Rajagopal, J., and Melton, D.A. (2008a). In vivo reprogramming of adult pancreatic exocrine cells to beta-cells. Nature 455, 627–632. Zhou, W., Zhu, P., Wang, J., Pascual, G., Ohgi, K.A., Lozach, J., Glass, C.K., and Rosenfeld, M.G. (2008b). Histone H2A monoubiquitination represses transcription by inhibiting RNA polymerase II transcriptional elongation. Mol. Cell 29, 69–80.
Leading Edge
Review
Pattern, Growth, and Control
Arthur D. Lander1,*
1Department of Developmental and Cell Biology and Department of Biomedical Engineering, Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697-2300, USA
*Correspondence: [email protected]
DOI 10.1016/j.cell.2011.03.009
Systems biology seeks not only to discover the machinery of life but to understand how such machinery is used for control, i.e., for regulation that achieves or maintains a desired, useful end. This sort of goal-directed, engineering-centered approach also has deep historical roots in developmental biology. Not surprisingly, developmental biology is currently enjoying an influx of ideas and methods from systems biology. This Review highlights current efforts to elucidate design principles underlying the engineering objectives of robustness, precision, and scaling as they relate to the developmental control of growth and pattern formation. Examples from vertebrate and invertebrate development are used to illustrate general lessons, including the value of integral feedback in achieving set-point control; the usefulness of self-organizing behavior; the importance of recognizing and appropriately handling noise; and the absence of “free lunch.” By illuminating such principles, systems biology is helping to create a functional framework within which to make sense of the mechanistic complexity of organismal development.
Introduction
The practice of developmental biology is much like re-reading a good book. Even when the ending is well known, much can be learned from exploring the plot twists and character development that bring it about. The tendency to see developmental events as being inevitably directed toward fixed, predetermined ends is deeply ingrained in developmental biology, a habit that is understandable, given the remarkable abilities of embryos to come out normally after drastic manipulations. “Embryonic regulation” has fascinated scientists since the 19th century, when Driesch derived normally patterned sea urchin larvae from single embryo blastomeres.
Extending this concept to genetic, as opposed to surgical, manipulation is Waddington’s notion of canalization, the idea that the normal phenotype has been selected to be especially insensitive to genetic variation. In the modern developmental biology literature, terms like robustness and precision are finding increasing use. Robustness is the further generalization of canalization to include insensitivity to all kinds of perturbations, environmental and genetic. Precision, the magnitude of natural variation in developmental outcomes, is a measure of robustness with respect to natural perturbations (e.g., standing genetic variation, normal environmental fluctuations, and the randomness of biochemical processes). The frequency and degree with which embryonic regulation, canalization, robustness, and precision are encountered in development raise many questions. Is there a common principle underlying all such phenomena? Are there conserved mechanisms? Can we explain how (and why) such processes evolved? These questions have, of course, been around for a very long time. What’s new these days is an influx of ideas and concepts from methodologies outside of traditional biology, including control theory, information theory, and network analysis. These methods are being applied to biological systems with the aim of elucidating underlying “design principles,” a goal well suited to the investigation of processes that must achieve desired ends. This Review focuses on current progress in understanding developmental control and the influence that systems biology is having on such work. Studies on a variety of animal species (Figure 1) are discussed below; however, it should be noted that systems biology approaches to plant morphogenesis are also currently bearing (if the pun may be forgiven) considerable fruit (e.g., Jiao and Meyerowitz, 2010; Sahlin et al., 2011).
Complexity, Performance, and Control
A complex system can be defined as any system in which large enough numbers of elements interact in simple ways to produce nonobvious behavior. There are two types of such systems: those that are complex by chance, and those that are complex by necessity. The former are often studied by physicists and typically involve situations in which orderly properties of matter at one level of description emerge out of collective chaos at lower ones. Such emergent properties are often summarized in terms of physical laws like the Universal Gas Law or Fick’s Laws of Diffusion. The second type of complex system is encountered by engineers, who design systems to meet specific performance objectives. When engineered systems are dynamic (changing in time), and the performance goals require control (steering behavior toward desired goals), the numbers and types of interacting components can quickly reach the point at which system behavior is sufficiently nonobvious that sophisticated mathematics or computer simulation is required to understand and predict it. There is clearly a strong affinity between this second type of complexity, deriving from dynamics and control, and the
Figure 1. Control Objectives in Morphogenesis The figure compares some of the experimental systems, discussed in this Review, that are being used to study developmental regulation, canalization, robustness, and precision. The Drosophila wing imaginal disc (A) is an excellent model for both pattern formation (through the action of long-range morphogens, such as Hedgehog, Decapentaplegic [Dpp], and Wingless) and growth control. The wing disc demonstrates both scaling of pattern to size and scaling of size to pattern. The early Xenopus embryo (B) provides an excellent system for studying pattern formation in the absence of growth, as well as the scaling of pattern to size. Pattern formation has been extensively studied along the anteroposterior axis (C) and the dorsoventral axis (D) of the Drosophila embryo. Anteroposterior patterning is initiated by the transcription factor Bicoid, which acts as a long-range morphogen within the cytoplasm of the syncytial early embryo, controlling a cascade of long-range and self-organizing events that segment the embryo into specific regions (stripes). Dorsoventral patterning utilizes the long-range morphogen Dpp to trigger, among other things, a self-organizing process at the dorsal midline. Self-organization also characterizes the mechanism by which narrow, straight veins are positioned on the Drosophila wing (E) during the pupal stage. The development of pigment stripes in teleost fish, such as the zebrafish (F), provides another opportunity to investigate self-organizing patterns, especially in the context of regeneration, the experimental investigation of which has shed new light on mechanism. Mammalian brain (G) and muscle (H) are good models of organ size control; in the case of muscle, genetic studies have revealed a critical role for feedback from chalones. Feedback regulation of growth has long been known from studies on regeneration of the mammalian liver (I).
More recently, studies in the mouse olfactory epithelium (J) have shed light on mechanisms underlying feedback control of both size and regenerative speed. Analogous mechanisms appear to be at work in mammalian hematopoiesis (K). Other excellent experimental systems, not shown here, include the early vertebrate spinal cord and hindbrain (pattern formation); vertebrate limb buds (pattern formation, growth control); vertebrate and invertebrate retinas (growth control); and plant shoot apical meristems (pattern formation). Figure 1J courtesy of Kim Gokoffski and Anne Calof.
picture developmental biologists have of embryos: dynamic, yet reliably achieving prespecified ends. Indeed, more than 60 years ago, developmental biologists were already suggesting that “the complex engineering performances of technology are a much more pertinent model of the nature of morphogenesis than are the more elementary phenomena dealt with in basic physics and chemistry” (Weiss, 1950). Yet in those days, conditions were not right for exploiting the natural affinity between engineering and morphogenesis. As we shall see, there are essentially two types of engineering: forward engineering, which involves knowing a set of performance objectives and building a system that fulfills them, and reverse engineering, which involves knowing how a system is built and inferring the performance objectives that necessitated it being built that way. Throughout most of the 20th century, developmental biologists were not ready to do either. Now they are increasingly doing both.
Forward Engineering Pattern
If we understand “performance objectives” in biology as corresponding to whatever natural selection selects for (i.e., what evolutionary biologists call “fitness”), then we see that one
problem with forward engineering is that biologists rarely have a thorough understanding of what contributes to fitness (except perhaps for unicellular organisms in simple environments). Moreover, even if the performance objectives that drove the evolution of an individual biological system were known, there is no guarantee that a forward engineering approach would come up with the same solution as nature. Ask an engineer to build a bridge, and it may not look like any other bridge. This point is illustrated by the history of Turing patterns in developmental biology. The name Turing pattern derives from Alan Turing’s seminal paper, which also introduced the word morphogen (Turing, 1952). It describes a solution to the general problem of creating repeating patterns in space. Through further elaboration of Turing’s work (Gierer and Meinhardt, 1972; Meinhardt and Gierer, 2000), we know that such patterns tend to arise in systems of spatially arrayed, equivalent components (e.g., cells) when they produce both ‘‘activating’’ and ‘‘inhibiting’’ signals that spread at different rates. Depending upon the details, steady states may be reached in which peaks and troughs of signal production occur in repeated patterns of spots or stripes (Figure 2A). Turing patterns exemplify a class of mechanisms
Figure 2. Two Modes of Organization in the Control of Pattern The performance objectives of patterning systems include both controlling the locations of events relative to each other and controlling them relative to prespecified landmarks. Turing patterns are one example of self-organizing patterns (A). Repeated patterns form spontaneously and exhibit spacings that depend primarily upon the details of local signal activation, inhibition, and spread, with relatively little influence from events outside the system. In contrast, long-range morphogen gradients typify boundary-driven organization (B). They inform cells of their location relative to fixed landmarks. In both cases, morphogens establish a characteristic ‘‘length scale’’ or ‘‘wavelength.’’ In the first case (A), pattern is a direct reflection of that scale, such that elements (spots or stripes) occur once per length scale. In the second case (B), the length scale simply determines how gradually ‘‘positional information’’ decays over space; where pattern elements occur (blue, red, green blocks) depends upon how cells interpret the positional information they receive.
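The requirement that activating and inhibiting signals spread at different rates (Figure 2A) can be illustrated with a minimal linear-stability sketch. It is not drawn from Turing (1952) or from Gierer and Meinhardt; it assumes a generic two-component system linearized around a uniform steady state. The interaction matrix `JAC`, the diffusion coefficients, and the function names are hypothetical choices made only for illustration. A spatial perturbation with squared wavenumber q grows if the matrix J - qD has an eigenvalue with positive real part.

```python
import cmath

def growth_rate(q, jac, d_act, d_inh):
    """Largest real part of the eigenvalues of (J - q*D), where q = k**2 is
    the squared spatial wavenumber of a small perturbation."""
    (a, b), (c, d) = jac
    trace = (a - d_act * q) + (d - d_inh * q)
    det = (a - d_act * q) * (d - d_inh * q) - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(((trace + root) / 2.0).real, ((trace - root) / 2.0).real)

def fastest_growing_mode(jac, d_act, d_inh, q_max=5.0, steps=500):
    """Scan q over [0, q_max] and return (q, rate) for the largest rate."""
    qs = [q_max * i / steps for i in range(steps + 1)]
    return max(((q, growth_rate(q, jac, d_act, d_inh)) for q in qs),
               key=lambda pair: pair[1])

# Hypothetical linearized kinetics (illustrative numbers, not measured values):
# the activator promotes itself and the inhibitor; the inhibitor suppresses
# the activator and decays. Without diffusion this steady state is stable.
JAC = ((1.0, -1.0),
       (2.0, -1.5))

# Equal spreading rates: every spatial mode decays, so no pattern forms.
q_equal, rate_equal = fastest_growing_mode(JAC, d_act=1.0, d_inh=1.0)

# Inhibitor spreads 10x faster than activator: a band of modes grows.
q_fast, rate_fast = fastest_growing_mode(JAC, d_act=1.0, d_inh=10.0)

print("equal diffusion, max growth rate:", rate_equal)
print("fast inhibitor,  max growth rate:", rate_fast, "at q =", q_fast)
```

Under these assumptions the maximal growth rate is negative when both signals spread equally and positive when the inhibitor spreads faster. The fastest-growing mode sets a preferred spacing of roughly 2π/√q, which plays the role of the characteristic "wavelength" described in the caption.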
termed ‘‘self-organizing’’ because the location and spacing of elements emerge out of local interactions, and not through instructions that come from elsewhere. Initial hopes that Turing processes would provide a simple explanation for all the periodic patterns of development—from skin markings to seashell patterns to embryonic segmentation—have not been realized. Particularly with respect to early, high-precision events, such as the specification of embryonic segments, 30 years of intensive experimental genetics has failed to produce simple, diffusible activator/inhibitor pairs for such cases. Instead, such work has tended to support the view that pattern is organized by morphogens that form long-range gradients from which cells learn their positions. Such systems are boundary organized (Figure 2B), meaning that positional information is encoded in one or more boundaries, with morphogens passively conveying that information across a field of cells. Interest in self-organizing patterns is, however, very much on the rise today, partly because of recent evidence for the involvement of Turing processes in left-right axis specification, skeletal
patterning in the vertebrate limb, the patterning of mammalian and avian ectodermal organs, skin pigmentation patterns, branching morphogenesis in the lung, and hydra head regeneration (reviewed by Kondo and Miura, 2010). Recent work on skin pigmentation patterns in fish (Yamaguchi et al., 2007) has been particularly instructive because it takes advantage of the fact that self-organizing mechanisms are inherently regulative, i.e., they can locally repair themselves. Moreover, the precise way in which pigment stripes respond to surgical manipulation in the fish is strongly indicative of a Turing process. As we shall see later, the regulative nature of self-organizing pattern can both help and hinder robust patterning, a fact that may explain why boundary-organized mechanisms are also needed in pattern formation. Work on fish pigmentation patterns also emphasizes the fact that the creation of Turing patterns does not necessarily require secreted, diffusible activators and inhibitors (Kondo and Miura, 2010). The Turing process is a mathematical abstraction that invokes the production and destruction of interacting, moving signals. No restrictions are imposed on the molecular details of the signals, how they move, or how they interact. It is possible that the true prevalence of Turing patterns in development has been underestimated because biologists have been too focused on looking for particular kinds of molecules, rather than general design principles. From this we can see both the strength and weakness of the forward engineering approach in biology: it provides a direct route to design principles, but it cannot tell us how those principles are implemented in real biological systems.
Reverse Engineering Growth
The classic definition of a reverse engineer is the industrial spy who, using only stolen blueprints, figures out what a competitor’s product does.
Unlike forward engineering, which progresses from performance objectives to design, reverse engineering starts with design and seeks to learn performance objectives. To do this, the engineer must either use pre-existing knowledge of design principles or use modeling and/or simulation to explore the sorts of behaviors a system is capable of, in the hope of recognizing performance that might be useful or desirable. Reverse engineering requires extensive knowledge of a system’s “wiring diagram,” which is one reason why opportunities to do it were rare in biology until the advent of comprehensive data-gathering methodologies such as genomics, proteomics, saturation mutagenesis, et cetera. Yet this is only half the reason why reverse engineering is such a prominent activity in systems biology. The other is that the goal of reverse engineering, to learn performance objectives, fills in just the kind of information that traditional molecular genetics cannot: what the components of a system are for, as opposed to merely what they do. The more massive the biological system, the more important such insight is. Among the processes that systems biologists have reverse engineered are metabolism, cell-cycle control, stress responses, and bacterial chemotaxis (Alon et al., 1999; Csikasz-Nagy et al., 2008; Khammash, 2008; Sauro and Kholodenko, 2004). The first explicit attempts to reverse engineer complex developmental systems date to Odell’s work on the network of signaling and
gene regulation that establishes segment polarity in the Drosophila embryo, and on Notch signaling in insect neurogenesis (Meir et al., 2002; von Dassow et al., 2000). In both cases, it was proposed that system design was influenced by a need for robustness to parameter uncertainty and internal noise. This work has been followed by many studies from other groups exploring ways in which other known mechanisms of pattern formation can also be robust (reviewed by Barkai and Shilo, 2009; Eldar et al., 2004; Lander et al., 2009b). Patterning is only one of two fundamental processes in morphogenesis, the other being growth. As growth is often a consequence of cell proliferation, this Review equates growth control with control of proliferation, aware, of course, that proliferation can occur without growth (e.g., in early embryos) or with much delayed growth. That growth is under tight control is supported by the precision observed in the sizes of organisms and their parts. For example, when genetic variability is controlled for, adult mouse brains vary only about 5% in size and cell number (Williams, 2000). For bilaterally symmetric organs (such as limbs), left-to-right variance in size is similarly very small (Wolpert, 2010). Such precision is impressive in light of the fact that proliferation, being an exponential process, compounds its errors. A mere 2% decrease in cell-cycle length will, over 30 cell cycles, cause a >50% increase in the size of a growing population. It is unlikely that the necessary cell-cycle precision to achieve normal organ and body size control can be achieved without some sort of feedback process. Indeed, the idea that negative feedback is involved in organ size control received early support from studies of liver regeneration and from in vitro studies showing that many cell types produce substances that suppress their own proliferation (reviewed by Elgjo and Reichelt, 2004). 
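The arithmetic behind the 2% example can be checked in a few lines. The sketch below assumes idealized, synchronous exponential growth and compares a perturbed and an unperturbed population at the time the unperturbed one completes its 30th cycle; the function name and its arguments are illustrative and do not come from the cited studies.

```python
def relative_population_size(cycle_shortening, n_cycles):
    """Ratio of perturbed to unperturbed population size.

    Assumes every cell cycle is shortened by the fraction
    `cycle_shortening`, and that both populations are compared at the
    time the unperturbed population finishes `n_cycles` doublings.
    """
    # Elapsed time, in units of the normal cycle length.
    elapsed = float(n_cycles)
    # Number of shortened cycles that fit into the same elapsed time.
    cycles_completed = elapsed / (1.0 - cycle_shortening)
    # Each extra cycle doubles the population once more.
    return 2.0 ** (cycles_completed - n_cycles)


ratio = relative_population_size(0.02, 30)
print(f"population is {100 * (ratio - 1):.0f}% larger than expected")
```

Under these assumptions the perturbed population completes about 0.6 extra doublings, which is why a small error in cycle length becomes a large error in size.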
Work on such substances, chalones, did not significantly take off until the late 1990s, when it was found that mice deficient in the TGF-β family member GDF8 (myostatin) produced an excess of skeletal muscle. GDF8 is made by muscle and acts on muscle progenitors, thus fulfilling the requirements of a chalone. Subsequently, GDF11, a close homolog of GDF8, was found to exhibit analogous effects in a self-renewing neural tissue, the mouse olfactory epithelium (Wu et al., 2003). Other molecules have recently been suggested to act as chalones in a variety of tissues (reviewed by Lander et al., 2009a). The basic chalone model, in which chalones slow the proliferation of progenitors by an amount directly related to organ or tissue size, is too generic for reverse engineering. What’s needed is an actual wiring diagram of how such feedback is implemented in a real organ. Progress toward this end was made in studies of the olfactory epithelium, where progenitors pass through distinct lineage stages and GDF11 acts only at a very specific stage to influence the behavior of an apparent transit-amplifying cell located between a stem cell and a differentiated neuron (Wu et al., 2003). It was subsequently found that activin B, another TGF-β family member, is also expressed in the olfactory epithelium and also has a negative effect on proliferation but acts uniquely on the stem cell stage, and not the transit-amplifying cell. Reverse engineering of this system (Lander et al., 2009a) entailed mathematically exploring what performance objectives could potentially be met by a multiplicity of progenitor
cell stages, a multiplicity of feedback factors, and the specificity of factors for single lineage stages. This analysis produced several useful results, and one of the most important was negative: A chalone that acts by slowing the divisions of an intermediate cell in the lineage of a self-renewing tissue should not be able to have any effect on steady-state tissue size. This suggested that the mechanism of action of GDF11, which targets just such a lineage stage, must involve more than just suppressing cell divisions. This in turn led to experiments showing that GDF11 also controls the renewal probability of its target cell, i.e., the probability that the target cell’s progeny remain of the same type instead of progressing to the next lineage stage (Lander et al., 2009a). Once this additional mechanism was taken into account, calculations showed that not only could GDF11 influence the tissue’s steady state, it could control it with near-perfect robustness. For instance, the steady state became robust to cell-cycle speeds, initial numbers of stem cells, and rates of cell death. Moreover, this feedback arrangement also creates a mechanism for triggering extremely rapid regeneration after injury. However, both performance objectives could not be met under the same conditions, unless additional feedback (in this case from activin) onto the stem cell was also included. Thus, reverse engineering suggests that the detailed interaction of lineage, feedback, and regulation of self-renewal found in the olfactory epithelium constitutes a system for simultaneous robust size control and rapid regeneration (Lander et al., 2009b; Lo et al., 2009). 
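The robustness that follows from feedback on renewal probability can be illustrated with a deliberately simplified model. The sketch below is not the model of Lander et al. (2009a); it assumes a two-stage lineage in which progenitors divide at rate v and self-renew with probability p, terminal cells die at rate d, and terminal cells lower p through an assumed saturating form p = p_max / (1 + g·x1). All names, the functional form, and the parameter values are hypothetical. In this toy model the steady state requires p = 1/2, so the terminal cell number settles at (2·p_max − 1)/g whatever the values of v, d, or the initial progenitor number.

```python
def simulate_lineage(v, d, x0_init, p_max=0.6, g=0.002, t_end=400.0, dt=0.01):
    """Euler integration of an assumed two-stage lineage with feedback.

    x0: progenitors, dividing at rate v, self-renewing with probability p
    x1: terminal cells, dying at rate d
    Feedback (assumed form): p = p_max / (1 + g * x1)
    Returns the final (x0, x1).
    """
    x0, x1 = float(x0_init), 0.0
    for _ in range(int(t_end / dt)):
        p = p_max / (1.0 + g * x1)
        # Each division yields two daughters; a fraction p stay progenitors.
        dx0 = (2.0 * p - 1.0) * v * x0
        # The remaining daughters differentiate; terminal cells are lost at rate d.
        dx1 = 2.0 * (1.0 - p) * v * x0 - d * x1
        x0 += dt * dx0
        x1 += dt * dx1
    return x0, x1


predicted = (2 * 0.6 - 1) / 0.002  # analytic steady state for x1
for v, d, x0_init in [(1.0, 1.0, 1), (2.0, 1.0, 1), (1.0, 0.5, 1), (1.0, 1.0, 50)]:
    _, x1 = simulate_lineage(v, d, x0_init)
    print(f"v={v}, d={d}, initial progenitors={x0_init}: "
          f"terminal cells={x1:.1f} (predicted {predicted:.1f})")
```

The progenitor pool itself does change with v and d in this sketch; only the terminal cell number, which is the quantity being fed back, is held fixed.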
The Value of Integral Feedback Around the same time as the above studies on the olfactory epithelium, two groups independently concluded that feedback control of cell number in hematopoiesis also occurs primarily through the regulation of self-renewal, i.e., through control of lineage progression, as opposed to control of cell-cycle speed. In one case, the conclusion was supported by the dynamics of regenerative responses following bone marrow transplantation (Marciniak-Czochra et al., 2009). The other (Kirouac et al., 2009) derived the result from a combination of model exploration (a systematic approach to reverse engineering) and model fitting (using computational algorithms to extract parameter values from in vitro and in vivo data). The evidence that feedback specifically targets progenitor self-renewal in multiple systems suggests that there is some generically useful feature associated with this mechanism. Indeed, inspection shows that it is a straightforward implementation of an engineering strategy known as integral feedback control. Essentially, integral feedback control describes the strategy of feeding back into a system a signal that is proportional to the time integral of the difference between the system’s current behavior and its desired behavior (Figure 3). Integral control is observed in other biological systems (such as bacterial chemotaxis) (Figure 3A) and appears to be generically essential whenever feedback must maintain a desired output exactly (set-point control); this explains why it can robustly maintain self-renewing tissues at a predetermined size. In contrast, feedback regulation of the rate of progenitor progression through the cell cycle amounts to what engineers call proportional control (feedback regulation of cell death also amounts to
proportional control). Although proportional control can provide some compensation for disturbances, it generically does not restore a perturbed system to a set-point. The same integral control mechanism that achieves robust maintenance of a steady state in constantly renewing lineages, as in the olfactory epithelium or in hematopoiesis, can also provide for robust final size specification in nonrenewing tissues such as the brain. Consistent with this, the pattern of gradual progenitor pool expansion, contraction, and extinction that occurs in the developing brain closely follows the expected consequences of negative feedback control of progenitor self-renewal (Lander et al., 2009a). Indeed, measurements of progenitor self-renewal probabilities in the cerebral cortex show just the predicted steady decline that negative feedback control should produce (Nowakowski et al., 2002). Precisely what negative feedback factors are responsible for such behavior in the brain remains unknown. Factors such as GDF8, GDF11, and activin are present in many locations throughout the nervous system. In the retina, however, loss of gdf11 leads not to a change in tissue size but to marked alterations in the proportions of neuronal cell types produced, with some expanding at the expense of others (Kim et al., 2005). In the neural retina, a single progenitor cell type is thought to give rise to all the differentiated cells, suggesting that GDF11’s effects extend not just to whether progenitor cells renew or differentiate but also to their choice of what cell type to differentiate into.
Figure 3. Versatility of Integral Feedback Control Integral feedback is particularly useful for achieving set-point control, in which a system achieves a prespecified steady-state behavior independent of external (and often many internal) perturbations. The essence of integral control is to feed back a signal that reflects the time integral of error (the difference between the actual and desired states of the system). Biological systems often use this type of control to achieve robust, perfect adaptation, i.e., to return to a zero-activity state even after sustained perturbations. For example, in bacterial chemotaxis (A), integral feedback adaptively modulates signaling to maximize sensitivity to changes in chemoattractant levels (Alon et al., 1999; Yi et al., 2000). Integral feedback in the control of cell growth has been described for two distinct systems. Production of chalones, such as GDF11, by differentiated cells in the olfactory epithelium inhibits progenitor self-renewal (B), providing a feedback signal that increases (decreases) in time as long as the probability of progenitor cell renewal is greater (lesser) than 50% (Lander et al., 2009a). Mechanical compression within the Drosophila wing disc increases with disc size (C), potentially providing a growth inhibitory signal that increases in time as long as cells are proliferating (Shraiman, 2005). Integral feedback can also be used to make a morphogen gradient scale to fit the territory between its source of production and a distant boundary (D). In this case, the morphogen inhibits the production of a molecule that acts at long range to expand the range (length scale) of the morphogen (Ben-Zvi and Barkai, 2010). In such a scenario, buildup of the expander over time provides a time-integrated error signal, which only vanishes when the morphogen gradient expands all the way to the distant boundary.
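The difference between proportional and integral feedback discussed in this section can be made concrete with a generic control example. The sketch below is an abstract first-order system, not a model of any particular tissue: an output y decays at a fixed rate, is driven by a controller input, and is pushed off target by a constant disturbance. The function name, gains, and parameter values are all assumptions chosen for illustration.

```python
def settled_output(mode, disturbance, gain=5.0, setpoint=1.0, decay=1.0,
                   t_end=60.0, dt=0.001):
    """Simulate dy/dt = -decay*y + u + disturbance and return the final y.

    mode == "proportional": u = u0 + gain * error
    mode == "integral":     u = u0 + gain * (time integral of error)
    where error = setpoint - y and u0 holds the undisturbed system at
    the set-point.
    """
    if mode not in ("proportional", "integral"):
        raise ValueError(f"unknown mode: {mode}")
    y = setpoint
    accumulated_error = 0.0
    u0 = decay * setpoint  # nominal input for the undisturbed system
    for _ in range(int(t_end / dt)):
        error = setpoint - y
        accumulated_error += dt * error
        if mode == "proportional":
            u = u0 + gain * error
        else:
            u = u0 + gain * accumulated_error
        y += dt * (-decay * y + u + disturbance)
    return y


for mode in ("proportional", "integral"):
    y = settled_output(mode, disturbance=0.5)
    print(f"{mode:>12}: settled output = {y:.4f} (set-point 1.0)")
```

With proportional feedback the output settles at set-point + disturbance/(decay + gain), a reduced but nonzero offset. With integral feedback the accumulated error keeps changing until the error itself is zero, so the output returns to the set-point for any constant disturbance.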
The Range of Control Notwithstanding their likely importance in regulating tissue growth and cellular composition, secreted negative feedback factors can, at best, be only part of the picture. Notch signaling, for example, also influences cell proliferation and controls the fate choices of progenitors (Artavanis-Tsakonas et al., 1999). Through lateral inhibition, Notch can ensure that precisely one progenitor can arise within a particular region of space, a sort of short-range set-point control. At the opposite extreme of range of action are circulating feedback inhibitors, which parabiosis experiments long ago implicated in liver size control (Moolten and Bucher, 1967). Indeed, every use of feedback for control in development has a characteristic spatial range. For example, secreted polypeptide growth factors (e.g., chalones) are thought to act within epithelial tissues at ranges up to a few hundred microns, due to the depleting effects of receptor-mediated uptake (e.g., Lander et al., 2009a; Shvartsman et al., 2001). How could such molecules integrate size information over the much larger scale of macroscopic organs? One possibility is that they act at an early stage of development, when dimensions are smaller. As organ growth proceeds, the control provided by feedback would become more and more locally autonomous. This would still allow for an accurate global response to perturbations that affect all locations equally (e.g., genetic variability, changes in body temperature, nutritional status), but not to local disruptions (e.g., physical damage to a part of the growing tissue would not elicit compensatory growth elsewhere). This seems a good framework for thinking about the specification of limb size, which is remarkably precise yet created out of the actions of parts that exhibit considerable growth autonomy (discussed by Pan, 2007; Wolpert, 2010). 
Such observations do not imply that growth control is achieved without global feedback, but simply that global feedback may occur early (e.g., in the limb bud instead of in the limb). This makes an important general point about developmental precision: the machinery for control is needed only at times when relevant perturbations tend to happen. One possible solution to control growth on many spatial scales is to combine strategies. This seems to happen in the olfactory epithelium because tissue size along the apicobasal dimension of the epithelium (70% of specific Drosophila melanogaster eve stripe 2 sites are not conserved in some other Drosophilidae, even though these modules produce identical output patterns. They all, however, respond to the same qualitative inputs. Furthermore, four different eve cis-regulatory modules (three pair-rule stripe modules and a heart expression module) isolated from flies perhaps 100 million years removed from their last common ancestor with Drosophila were shown to function identically when introduced into D. melanogaster despite extremely different site order, number, and spacing. Another example is found in a comparison of orthologous otx cis-regulatory modules in distantly related ascidians, which again revealed extremely different module organization despite identical spatial regulatory function (Oda-Ishii et al., 2005). These results indicate great freedom of cis-regulatory design, given only the constraint on input identity and of course the requirement that all the relevant sites lie within functional interaction range, in practice usually the several hundred base pairs of the module sequence. 
There is, however, one notable exception, namely when there is a high conservation of arrangement of sites found very closely apposed, presumably because the proteins bound to them interact directly with each other or with third parties, for instance Dorsal and Twist sites in multiple Drosophila neurogenic ectoderm genes (Hong et al., 2008), or Otx and Gatae sites in orthologous cis-regulatory modules of echinoderm otx genes (Hinman and Davidson, 2007). Also, many vertebrate cis-regulatory modules are known in which the order of closely packed target sites is conserved, resulting in high levels of sequence identity across cis-regulatory modules that have been evolving separately for 350–450 million years (Elgar and Vavouri, 2008; Pennacchio et al., 2006; Rastegar et al., 2008; Siepel et al., 2005; Vavouri et al., 2007; Wang et al., 2009). Because of this exception to the general rule of relaxed cis-regulatory design, in Table 1 site spacing is considered as a possible cause of input gain or loss. As Table 1 indicates, only those intramodular cis-regulatory sequence changes that produce qualitative gain or loss of target sites can result in the co-option of the respective network node to a new temporal/spatial expression domain and thus in the alteration of functional GRN topology. An important implication of Table 1 is that contextual (external) cis-regulatory changes of several kinds may be a major source of evolutionary GRN redesign. Co-optive redeployment of cis-regulatory modules can be due to translocation by mobile elements; spatial repression functions can disappear by deletion of whole modules; cis-regulatory recruitment can be altered by functions that tether them to different promoters. In some branches of evolution, duplication of regulatory genes followed by subfunctionalization has been a major source of evolutionary novelty (Jimenez-Delgado et al., 2009; Ohno, 1970). Although it is possible to estimate computationally the rate of single target site sequence appearance and disappearance, or for specific cases observe it, we have virtually no fix on the rates of processes that move cis-regulatory modules into new genomic contexts. Because cis-regulatory modules may be carried around by transposing mobile elements, and because the transposition of mobile elements is the most rapid type of large-scale genomic sequence change in animal genomes, this is likely to be a major mechanism of GRN evolution. In human, mouse, and Drosophila, estimates suggest insertion rates for certain types of mobile elements on the order of 10⁻¹ per genome per generation (Garza et al., 1991; Ostertag and Kazazian, 2001), and it is clear that there have been great bursts of mobile element insertion in the evolutionary history of many animal lineages including our own (e.g., Ohshima et al., 2003; Ostertag and Kazazian, 2001).
DNA transposons, long-terminal repeat (LTR)-containing retrotransposons, and non-LTR-containing retrotransposons, both autonomous and nonautonomous (the latter meaning that enzymatic machinery from another retrotransposon is required for mobility), are all capable of altering genomic sequence. Their various excision, copy, and integration mechanisms lie beyond the scope of this paper (for reviews, see Gogvadze and Buzdin, 2009; Kazazian, 2004); suffice it to say that the diverse types of rearrangements they cause may directly affect transcriptional processes, positively or negatively. The LTRs of retrotransposons have intrinsic cis-regulatory activity and, when transposed into the vicinity of a gene, may cause its transcription (Gogvadze and Buzdin, 2009). In mammals, non-LTR retrotransposons (such as L1 in humans) have the ability to mobilize nonautonomous mobile elements (such as Alu repeats in humans), and these frequently carry with them adjacent sequence elements. Thus Alu repeats have apparently picked up cis-regulatory apparatus during their nonautonomous transpositions and moved them to the locations of new genes, and in addition their own sequence may mutate to produce cis-active transcription factor target sites, as shown in a number of specific examples (for review, Britten, 1997). A very important aspect of this mode of cis-regulatory target site insertion has recently been emphasized with the observation that, on a genome-wide basis, many such sites are species (or genus or order) specific (e.g., Odom et al., 2007). An excellent case in point is a recent study of sites recognizing the neural repressor REST (Johnson et al., 2009) where it is clear from comparison among mammalian genomes that primate-specific
Table 1. Evolutionary Alterations in cis-Regulatory Modules and Their Possible Functional Consequences Effect of Change at Sequence Level
Loss of Function
Quantitative Output Change
Input Gain/Loss within GRN
Gain of Function; Co-optive Redeployment to a New GRN
X
X
X
X
X
X
X
X
Change in site spacingᵃ
X
X
X
Change in site arrangementᵃ
X
X
X
Translocation of module to new geneᵇ
X
Module deletionᵇ
X
Appearance of new target site(s)ᵃ
Loss of old target site(s)ᵃ
Change in site numberᵃ
X
New tethering functionᵇ
Duplication, subfunctionalizationᵇ
X
X
X
X
X X
GRN, gene regulatory network. ᵃInternal change in cis-regulatory module sequence. ᵇChange affecting genomic context of cis-regulatory.
sites have been inserted in recent evolutionary time all over the genome by Alu and L1 transposition, though most of the primate-specific sites are (as yet) probably functionless. In another case, a non-LTR retrotransposon has inserted an auto- and cross-regulatory site into a duplicate copy of the dmrt sex control gene in Medaka within the last 10 million years, which generates a functional species-specific control circuit determining developmental interplay between these two genes (Herpin et al., 2010). In summary, as previously speculated (Britten and Davidson, 1971), mobile elements could have provided a major mechanism of GRN evolution. They have the potential to produce exactly the kinds of genomic cis-regulatory change that a priori might be the most potent mechanisms for GRN change, that is, gain-of-function co-options of regulatory gene expression (Table 1). Evolution by cis-Regulatory Gain of Function Evolutionary change in GRN structure may follow directly from qualitative gain of cis-regulatory linkages among regulatory and/or signaling genes. If the phenotypic functionality of this type of evolutionary process were to require the homozygosity of the underlying DNA alteration, as in classic microevolutionary theory, GRN evolution would be essentially inconceivable. But in fact phenotypic functionality of a co-optive change in regulatory gene expression will not depend on homozygosity. As initially pointed out by Ruvkun et al. (1991) and further discussed by Davidson and Erwin (2010), gain-of-function cis-regulatory co-options that produce regulatory gene expression in new domains act dominantly, and this has fundamental consequences for evolutionary process. Thousands of routine lab experiments in which regulatory systems are systematically redesigned to produce ectopic expression show that for most regulatory genes, particularly in early development, a single copy of the gain-of-function allele produces the regulatory effect. 
The potency of a cis-regulatory gain-of-function co-option for altering GRN structure is easily imagined, as in the cartoon of Figure 1. Here we see how the co-option could have occurred by addition of sites to a pre-existing cis-regulatory module, or by insertion of a new cis-regulatory module, and then how this
co-option could alter function downstream of the GRN. In the examples of Figure 1, the regulatory gene newly incorporated in the GRN might control the deployment of a signal system, or of a differentiation gene battery, but of course it could have many different effects. There is an intrinsically high possibility of evolutionary reorganization of GRN structure by cis-regulatory gain-of-function co-options, given the general rapidity of cis-regulatory evolution and the haplodominance of gain-of-function changes in regulatory gene expression. Any organism in which such a change had occurred in either the maternal or paternal germline would, if viable, become a clonal founder (Davidson and Erwin, 2010). A cis-regulatory gain-of-function event of any of the kinds listed in Table 1 could have an immediate operational effect on a GRN, if a newly incident addition to a regulatory state caused additional GRN subcircuits to be deployed (as in Figure 1B). Or it could perhaps result in a regulatory gene expression that is for the moment functionless, though harmless, but which could later become functional when additional co-optive events add to the regulatory state other factors with which the first can cooperate combinatorially, or when additional cis-regulatory changes provide new functional targets. An almost revolutionary revision emerges from the realization that GRN function can change in creative ways by mechanisms that are likely rather than unlikely to occur; that will be dominant and haplosufficient when they do occur; and that may be driven by a plethora of diverse processes at the cis-regulatory DNA level, some of which continuously or stochastically alter genomes with relatively high frequency. Periods of rapid evolutionary change may be thought of in these terms, but this also raises the obverse question: we now need an explanation for the paleontological demonstration of very long periods of evolutionary stasis in the basic body plans of many animal lineages. 
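The logic of a dominant gain-of-function co-option can be sketched as a toy Boolean model. Everything in this sketch is an assumption made for illustration: a cis-regulatory module is treated as active only when all of its required factors are present (strict AND logic, which real modules need not follow), a gene is expressed if any module on either allele is active, and the factor names and domains are hypothetical.

```python
def module_active(required_factors, regulatory_state):
    """A module fires only if every factor it needs is present (assumed AND logic)."""
    return required_factors <= regulatory_state  # subset test


def allele_expressed(modules, regulatory_state):
    """An allele is transcribed if any one of its modules is active."""
    return any(module_active(m, regulatory_state) for m in modules)


def gene_expressed(genotype, regulatory_state):
    """A diploid genotype expresses the gene if either allele is transcribed."""
    return any(allele_expressed(allele, regulatory_state) for allele in genotype)


# Hypothetical regulatory states of two spatial domains.
domain_1 = {"f1", "f2", "f3"}
domain_2 = {"f3", "f4", "f5"}  # shares only f3 with Domain 1

# Ancestral allele of Gene A: one module needing f1, f2, and f3.
ancestral = [frozenset({"f1", "f2", "f3"})]
# Derived allele: the same module plus a newly inserted module reading f4 and f5.
derived = ancestral + [frozenset({"f4", "f5"})]

for name, genotype in [("ancestral/ancestral", (ancestral, ancestral)),
                       ("ancestral/derived", (ancestral, derived))]:
    print(name,
          "| Domain 1:", gene_expressed(genotype, domain_1),
          "| Domain 2:", gene_expressed(genotype, domain_2))
```

In this sketch a single copy of the derived allele is enough to switch Gene A on in Domain 2 while leaving its Domain 1 expression unchanged, which is the sense in which such co-options act dominantly.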
The Hierarchical Organization of Developmental GRNs Knowing that the basic events causing GRN evolution are cis-regulatory alterations, particularly those resulting in qualitative additions to or subtractions from the developmental regulatory
Figure 1. Regulatory Gene Co-option and Possible Consequences The diagram shows cis-regulatory mutations that could result in co-optive change in the domain of expression of a regulatory gene and consequences at the level of gene regulatory networks (GRNs). (A) Co-option event: The gene regulatory networks operating in spatial Domains 1 and 2 produce different regulatory states (colored balls, representing diverse transcription factors). A cis-regulatory module of Gene A, a regulatory gene, has target sites for factors present in the Domain 1 regulatory state and so Gene A and its downstream targets are expressed in Domain 1, but not in Domain 2 where only one of the three sites can be occupied. Two alternative types of cis-regulatory mutations are portrayed: appearance of new sites within the module by internal nucleotide sequence change; and transposition into the DNA near the gene of a module from elsewhere in the genome bearing new sites. Although these gain-of-function changes do not affect the occupancy of the cis-regulatory sites of Gene A in Domain 1, the new sites allow Gene A to respond to the regulatory state of Domain 2, resulting in a co-optive change in expression so that Gene A is now active in Domain 2 (modified from Davidson and Erwin, 2010). (B) Gain-of-function changes in Domain 2 GRN architecture caused by co-option of Gene A: Gene A might control expression of an inductive signaling ligand, which could alter the fate/function of adjacent cells now receiving the signal from Domain 2 (left); Gene A might control expression of Gene B, another regulatory gene, and together with it cause expression of a differentiation (D) gene battery, which in consequence of the co-option is now expressed in Domain 2 (right).
state, we can sharpen the question we are asking: how do the structural properties of GRNs affect the developmental consequences of such cis-regulatory alterations? The Consequences of Hierarchical GRN Structure As discussed above, the GRNs controlling embryonic development of the body plan are intrinsically hierarchical, essentially because of the number of successive spatial regulatory states that must be installed in the course of pattern formation, cell-type specification, and differentiation. This property of GRNs fundamentally affects the way we need to consider the question just put. The consequences of any given cis-regulatory mutation will depend entirely on where in the GRN hierarchy the affected cis-regulatory node lies. As Figure 2 shows, changes that occur in the cis-regulatory control apparatus of a given differentiation gene could cause redeployment of that gene; changes in the cis-regulatory system determining expression of a controller of the battery could cause redeployment of the whole battery; changes upstream of that could affect redeployment of whole regulatory states, or of many other features. The circuitry drawn in Figure 2 is of course arbitrary but its import is general. So in order to understand predictively the effect of a given cis-regulatory change, the GRN architecture and the position of the mutation therein must be known. This may seem a demanding requirement, but from the point of view of understanding evolution mechanistically, it places a powerful lever in our hands. First, it should enable a rational interpretation of evolutionary differences in development between related animals in terms of GRN structure (we consider examples below); second, in principle it could enable predicted effects to be tested experimentally by inserting the cis-regulatory change into a related form expressing the plesiomorphic GRN, termed “synthetic experimental evolution” (Erwin and Davidson, 2009). Another direct evolutionary consequence of GRN hierarchy has also been discussed (Davidson and Erwin, 2006, 2009), and this is the phenomenon of canalization. In developmental terms the establishment of a spatial regulatory state constrains subsequent processes: like a decrease in entropy, the number of possible regulatory states downstream is now decreased. If the regulatory state defines a progenitor field for a given organ, then all the subsequent stages in the development of that organ must take place within that domain. As in development so in evolution, and thus a co-optive mutation leading to qualitative evolutionary reorganization at cis-regulatory nodes of an upper-level GRN subcircuit is much more likely to entail numerous deleterious problems downstream than if the change were to occur further down in the hierarchy. Therefore upper levels of GRN hierarchy are much less likely to change once a hierarchical GRN has evolved than are more peripheral levels, and this is the empirical mark of the classical canalization phenomenon. Currently, no GRN is analyzed to a degree that we know its linkages and functions from its upstream to downstream
Figure 2. Evolutionary Consequences of cis-Regulatory Mutations Functional evolutionary consequences of cis-regulatory mutations depend on their location in gene regulatory network (GRN) architecture. A GRN circuit encoding the control system of a differentiation gene battery (bottom tiers) activated in response to a signal from adjacent cells (top tier); linkages are in blue, red, and green. The double arrow indicates signal reception and transduction causing gene expression in the recipient cells. Note that the middle tier of circuitry consists of a dynamic feedback stabilization subcircuit. The numbered red ‘‘x’’ symbols denote mutational changes in the cis-regulatory modules controlling expression of these genes, keyed by number to the functional consequences listed in the box below. Loss-of-function mutations (1 and 2) are indicated in green, and co-optive gain-of-function mutations (3 and 4) resulting in expression of the affected gene in a new domain, as in Figure 1A, are indicated in blue (modified from Erwin and Davidson, 2009).
peripheries, that is, from the beginning of the developmental process to the terminal differentiated state. We do know, however, that the GRN output is observable as individual gene expression patterns and, ultimately, as the developmental process. We can use these outputs to infer a framework within which to position individual regulatory subcircuits or evolutionary changes within the hierarchical GRN. To facilitate the discussion on GRN evolution we now define GRN parts according to the developmental functions they control and then go on to consider abstractly the impact of evolutionary changes occurring in each of these parts. As shown in Figure 3, we can distinguish four causally connected developmental functions that are encoded by sections of the GRN represented by Boxes I–IV. The most upstream part of the GRN indicated in Box I controls postgastrular pattern formation. It is animated by pregastrular spatial and signaling inputs (maternal anisotropies, maternal factors, early interblastomere signals, all used as directional cues, and then by the outputs of the initial zygotic GRNs). The functions of the GRNs set up in this phase of development, including their signaling interactions, are to establish broad domains that section the organism with respect to the major body axes. The immediate output of the GRNs of Box I is to set up regulatory state domains within spatially defined areas of the organism. These domains, such as the neuraxis or mesodermal layers, constrain the position of future body parts and also now provide initial regulatory
inputs that will be utilized in subsequent derivatives of their territories. The fate patterns they produce are often broadly conserved within clades (the early postembryonic ‘‘phylotype’’). In Box II, progenitor fields for specific body parts (for example, the heart progenitor field or the limb bud) are defined within these early domains. These are sets of cells each expressing the specific GRNs indicated at the level of Box II. The progenitor field then must be subdivided into regions that give rise to the future constituent pieces of the body part, each of which is foreshadowed by a new GRN (for example, the aorta or ventricle of the heart or the autopod of the limb). Within Box III thus lie the GRNs that control both the identity and the spatial boundaries of these subparts. This patterning GRN thus implements a coordinate system within the progenitor domain that is crucial for morphology and function of the body part. Both patterning GRNs (e.g., Box I and Box III) are oriented along the same axes, and the downstream body-part-specific patterning GRN therefore depends at least indirectly on the upper-level postgastrular patterning GRNs. Depending on the complexity of the body part, multiple rounds of spatial regulatory state subdivision and installation of further regional GRNs may be required. Thus, the progression from Box II- and Box III-type GRNs may be reiterated (backward arrow in Figure 3). Only following these patterning processes, the terminal cell-fate specification GRNs (Box IV) become activated in spatially restricted domains within the body part progenitor field. At the lower periphery of developmental GRNs are the differentiation gene batteries, that is, the protein-coding effector genes plus their immediate transcriptional regulatory drivers. What kinds of subcircuit topologies are found at these different levels of GRN hierarchy? 
So far, a number of GRNs have been elaborated that indicate the recurrent use of subcircuits in given developmental contexts (Peter and Davidson, 2009). One such subcircuit, the positive feedback subcircuit, links two or more regulatory genes by multiple activating regulatory interactions and acts to stabilize regulatory states. This is necessary in body-part-specific GRNs (Box II) or cell-fate GRNs (Box IV), given that pattern formation processes usually occur only in a limited temporal window. Recurrent activating linkages keep the genes expressed even when the initial activating regulatory input fades. A positive intercellular feedback subcircuit can result in a ‘‘community effect’’ (Bolouri and Davidson, 2010), the stabilizing activation of similar regulatory states within a field of cells. Here a gene encoding an intercellular signaling ligand is expressed under the control of the same signal transduction system it activates. The pattern-forming GRNs of Box I and Box III in Figure 3, in contrast, operate largely by means of transient signal inputs as well as repressive exclusion functions that control spatial subdivision. Patterning processes are not concerned with stabilization or homogenization of regulatory states, and they contain few positive feedback loops. The biological function of an individual subcircuit topology predicts the probability of its occurrence at specific positions within the GRN hierarchy. If one had to predict the GRN parts most likely modified in the evolution of body plans, a place to begin would be to define where in the developmental process and therefore in the GRN hierarchy differences occur. Morphological differences between species of
different phyla affect the basic body plan, the overall organization of the organism. During development, the body plan is established mainly by the upstream embryonic patterning mechanisms and the individual body-part specification programs that they activate in given positions. Phylum-level morphological differences are therefore expected to occur in the GRNs underlying Boxes I and II. Among classes within the same phylum, the position with respect to the body axes or the internal structures of individual body parts may differ. Differences in the positions of body parts relative to each other could occur even when embryonic patterning GRNs and body-part specification GRNs are conserved, simply by rewiring the connections between these functions (such as the linkages connecting Box I and Box II; see also the discussion of hox gene functions below). This could result in alterations in the positions of given body parts. Morphological differences within body parts are more likely to be caused by differences in the spatial assignment of cell-fate domains determined by the body-part patterning GRNs of Box III. Based on these arguments one would expect that mutations in regulatory linkages within the patterning functions are more likely to be the cause of morphological changes, whereas specification GRNs active within given cell types or body-part progenitor fields are more likely to be conserved. Given the predicted prevalence of specific network topologies for given biological functions, there might be a direct correlation between regional network topology and rate of evolutionary change. Regulatory linkages used for patterning embryos or body parts frequently rely on inductive signals that connect GRNs underlying specification in different domains and ensure orchestrated progression of development. 
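The stabilizing role attributed above to positive feedback subcircuits, in which recurrent activating linkages keep genes expressed after the initial input fades, can be illustrated with a deliberately simplified Boolean sketch. The two genes `a` and `b`, the synchronous update rule, and the step counts are all hypothetical choices made for illustration and are not taken from any GRN model in the text.

```python
def simulate(feedback, input_steps=3, total_steps=8):
    """Synchronous Boolean update of two hypothetical regulatory genes, a and b.

    a is activated by a transient upstream input (and by b when feedback is on);
    b is activated by a. Returns the list of (a, b) states over time.
    """
    a = b = False
    history = []
    for t in range(total_steps):
        signal = t < input_steps              # transient activating input
        next_a = signal or (feedback and b)   # recurrent linkage b -> a
        next_b = a                            # linkage a -> b
        a, b = next_a, next_b
        history.append((a, b))
    return history


with_feedback = simulate(feedback=True)
without_feedback = simulate(feedback=False)
print(with_feedback[-1])     # state long after the input has faded
print(without_feedback[-1])
```

In this toy model, once the input has lasted long enough for both genes to switch on, the feedback version holds the regulatory state after the signal is withdrawn, whereas the version without the recurrent linkage returns to the off state. Real subcircuits involve graded kinetics and more than two genes, so this is only a qualitative picture.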
In organisms of different spatial geometry, inductive signaling relationships will differ, and thus, inductive signaling interactions are likely to show a higher rate of evolutionary change. Indeed they do, as discussed elsewhere (Davidson and Erwin, 2006; Erwin and Davidson, 2009). The high level of conservation of positive feedback subcircuits has been previously proposed in the Kernel theory of Davidson and Erwin (2006). These Kernels consist of a few regulatory genes linked by recursive positive regulatory interactions; they are usually used upstream in GRNs that control the specification of progenitor fields for particular body parts, and they are conserved at large evolutionary distances. In summary, evolution of GRNs to produce new developmental outcomes must involve new subcircuit deployments. This places a premium on co-optive change at the switches, signals, and inter-subcircuit inputs that encode subcircuit deployment. Evolution of new developmental GRN features must thus proceed to some extent as a process in which diverse subcircuits are combined, recombined, activated, and inactivated in given spatial domains of the embryo.
Evolution by Regulatory Changes in Single Genes
Though the jobs of development require the outputs of multigene subcircuits of given topologies, we see from the above that there are points of ‘‘flexibility’’ in developmental GRNs, where co-optive gain-of-function, or loss-of-function, regulatory changes may have large effects. By focusing on naturally occurring variations between closely related animals where visible evolutionary change has occurred recently, the most evolutionarily flexible
aspects of the regulatory system are revealed. In the examples that follow, in which single genes are responsible for the changes observed, it has furthermore been possible to obtain experimental evidence for the evolutionary mechanism underlying the phenotypic variation in form.
Genomic Basis of Rapid Evolutionary Trait Loss
A canonical example, recently elaborated at the sequence level, and causally confirmed by experiment, is reduction of pelvic spines in stickleback fish. Following the end of the last Ice Age, marine stickleback fish were marooned in multiple lakes formed as the glaciers melted, and during the last 10,000–20,000 years independent populations of two different genera of these fish have repeatedly lost external pelvic spines. The exact selective advantages of pelvic reduction and spine loss are not defined, but as it has happened many times independently, there clearly are some (Shapiro et al., 2006 and references therein). Genetic complementation tests show that diverse isolates bear the same or overlapping genetic lesions, and this is so even in crosses of species from different genera displaying the same spine reduction phenotype. The underlying genomic event turns out to be deletion of a cis-regulatory module that controls expression of the pitx1 regulatory gene in the pelvic buds during larval development (Chan et al., 2010). Most significantly, when this cis-regulatory module was cloned upstream of a sequence encoding the Pitx1 protein and introduced into reduced spine fish, it rescued the spineless phenotype. The cis-regulatory module lies in an unstable, repetitive sequence-filled genomic region, possibly accounting for its repeated deletion (Chan et al., 2010). The pitx1 gene is clearly involved in pattern formation functions upstream of pelvic girdle specification, and in spineless fish there is no pitx1 expression in the pelvic buds even though the coding region of the gene is intact (Cole et al., 2003; Shapiro et al., 2006).
In amniotes pitx1 operates in the patterning system that organizes the subparts of the appendages developing from the hindlimb buds, and forced expression in forelimb buds transforms them into hindlimbs (Logan and Tabin, 1999; Szeto et al., 1999). Thus this gene operates upstream in a portion of the GRN, the function of which is to generate the spatial regulatory states that presage the parts of the amniote hindlimb, and also of the pelvis, which is rudimentary in mice deficient in pitx1 (Szeto et al., 1999). Though pitx1 could execute more downstream roles in pelvic skeletal formation as well, its expression prior to the terminal phases of pelvic skeletogenesis indicates that it also functions in a Box III body-part-specific patterning GRN in stickleback fish. However, rapidly evolving, reduced, or regressive phenotypes can be due to gain-of-function as well as loss-of-function mutations. The Mexican cave fish Astyanax exists both in riverine surface waters and in various cave populations that became isolated about 10,000 years ago, and the regressively evolved traits of the cave populations have been studied for over a half century. A recurrent change in cave Astyanax is degeneration of eyes during larval development. During embryogenesis of cavefish, the eyes initially develop similarly to those of surface conspecifics, including expression of many regulatory genes (Jeffery, 2005, 2009). But then many things go wrong in eye development including apoptotic degeneration of lens and retina. A cause is ectopic spatial expression of sonic hedgehog
(shh) from the normal medial interocular region across the top of the ocular fields in cave fish. As shown experimentally by introduction of shh mRNA in surface Astyanax, excess Shh causes expression of transcription repressors (vax1 and pax2a), which interfere with pax6 expression and thus the downstream pax6 ocular patterning subcircuit (Jeffery, 2009; Yamamoto et al., 2004; Baumer et al., 2002). Also, excess Shh indirectly promotes apoptosis in lens and retina. Though as yet undefined at the sequence level, in cave Astyanax, regulatory changes have evidently caused a spatial gain of function in shh transcription resulting in regression of the eyes. The simplest cases of evolutionary trait loss are deleterious mutations in far downstream differentiation genes. Pigmentation is among the regressive traits in cave Astyanax. Two pigmentation phenotypes have been shown to be due to mutations in the protein-coding sequences of receptors directly involved in pigmentation, oca2 (Protas et al., 2006) and mc1r (Gross et al., 2009). However, in stickleback fishes where there is also loss of pigmentation in lacustrine forms, cis-regulatory changes rather than coding region mutations are responsible (Miller et al., 2007). Here the gene responsible encodes kit ligand (Steel factor) and this gene has pleiotropic effects, so that total loss of function would be severely deleterious. Loss of function in a single cis-regulatory module, on the other hand, has specific effects that under certain conditions are adaptive. Because this is a general feature of cis-regulatory versus coding sequence mutations, it predicts that evolutionary changes in any pleiotropically active gene, as are most regulatory genes, will generally target specific cis-regulatory modules (as discussed, for example, by Chan et al., 2010; Miller et al., 2007; Prud’homme et al., 2006).
Inverting this argument, we see a powerful evolutionary explanation for the modularity generally typical of the cis-regulatory systems controlling expression of regulatory and signaling genes in animal genomes (Davidson, 2001, 2006). GRN evolution by regulatory gain and loss of function of expression of these genes would be utterly impossible were these control systems not in general modular, given that almost all such genes function in multiple time-space compartments, and in multiple GRNs during development. Physical and functional modularity in the control systems of regulatory genes is thus among the fundamental characteristics of animal genomes that permit and, indeed, that produce evolution of development by GRN reorganization.
Morphological Variation due to Single-Gene Regulatory Changes
Whereas the foregoing concerns rapidly occurring evolutionary changes in single-gene functions that are of adaptive significance, we now face a conundrum. How do we extrapolate from recent evolutionary events to the much more ancient processes by which order- and class-level differences in body plan arose, let alone phylum-level differences? Recent studies focusing on the adaptive evolution of external traits in and among Drosophila species have revealed processes of cis-regulatory sequence microevolution. Such processes account for variation in pigmentation patterns due to regulatory changes affecting expression of the yellow differentiation gene (Gompel et al., 2005; Rokas and Carroll, 2006) and the ebony differentiation gene (Rebeiz et al., 2009). Similarly, cis-regulatory
evolution in the shavenbaby (ovo) regulatory gene, which controls the differentiation and morphogenesis of trichomes (short hair-like surface appendages), determines where this gene is expressed, and thereby the minute pattern differences in trichome distribution distinguishing Drosophila species (McGregor et al., 2007). These studies afford multiple real examples of cis-regulatory site addition, and quantitative as well as qualitative cis-regulatory gain and loss of function due to internal DNA sequence change (see Table 1). They provide general and specific indication of the flexibility and changeability of cis-regulatory modules in local evolution, at the level of function and deployment of differentiation gene batteries, the lowest level in the hierarchy of Figure 3. Mechanistic studies of intra- and interspecific evolutionary variation illuminate the next level up as well, that is, evolutionary changes (other than simple loss of function) in the Box III-type pattern formation GRNs that determine the morphological characteristics of given body parts. The results have thus far often resolved into demonstration of alterations in the deployment of signal systems in the development of these parts; that is, the underlying evolutionary change is in the cis-regulatory apparatus controlling time and place of inductive signaling, just as predicted earlier. The causal developmental mechanism underlying the adaptively diverse beak morphologies of Darwin’s classic series of Galapagos finch species was solved in these terms by Abzhanov et al. (2006, 2004). Species with heavy beaks displayed earlier and higher expression of bone morphogenetic protein 4 (BMP4) in pre-beak neural crest mesenchyme, and species with elongated, pointed beaks expressed Ca2+/calmodulin at higher levels, indicating that beak length depends on extent of Ca2+ signaling. 
Remarkably, experimental overexpression of BMP4 by retroviral gene transfer into developing frontonasal tissues of chicken embryos produced robust beaks, and experimental overexpression of the downstream mediator of Ca2+ signaling, CaMKII, produced elongated beaks, confirming the causality. To take another example, a recent study shows that the short legs of dog breeds such as dachshunds and basset hounds are due to a retrogene encoding fibroblast growth factor 4 (FGF4), inserted and evidently controlled by cis-regulatory elements carried in non-LTR transposons (Parker et al., 2009). Changes in upstream patterning apparatus can account for differences in body plan at inter-ordinal to inter-class levels, and such changes are not found in comparing organisms that diverged only a few million or a few thousand years ago or less. For example, one of the characters distinguishing bats and rodents, which are of different mammalian orders and in fact belong to different super-orders, is the much longer relative length of the forearm skeleton in bats. A candidate regulatory gene known to affect limb skeletal elongation is prx1 (mhox), and in bats this gene is upregulated after the early limb bud stage compared to mice (Cretekos et al., 2008). The (indirect) causality of this change was then demonstrated by inserting the bat prx1 limb enhancer into the mouse gene, with the result that the forelimbs of the recipient mouse now develop with relatively longer dimensions. In an essentially similar case, the tbx5 gene, deeply embedded in the vertebrate heart formation GRN (for review, Davidson, 2006), turns out to be regulated differently during heart formation in reptiles than in birds and mammals,
a class-level difference. Expression of this gene is confined to the left ventricle in the developing amniote heart but is expressed across the common ventricle in the three-chambered reptile heart (Koshiba-Takeuchi et al., 2009). When uniform tbx5 expression is forced in the mouse heart, or left ventricle tbx5 expression is prevented, that is, if a reptilian tbx5 spatial regulatory expression is imposed, the mouse develops a three-chambered heart lacking an interventricular septum. Understanding of developmental GRN structure tells us that these examples differ from the foregoing in that they imply the existence of Box III GRN subcircuits in which the targeted genes participate. In contrast, in the peripheral gene examples above, the phenotype is wholly encompassed by changes in a single cis-regulatory system.
Hox Gene Functions in Upper-Level GRN Patterning Systems
Genes of the trans-bilaterian hox complexes have been the subject of a vast amount of phenomenological research, which has revealed the many and various effects on developmental morphology of hox gene knockouts or ectopic hox gene expression. The variety of effects precludes any simple interpretation of the functions of these genes in terms of developmental GRN structure, for the simple reason that they work at diverse levels. Studies of direct hox gene targets reveal both other regulatory genes and far downstream genes encoding proteins active in apoptosis, cell-cycle control, cell adhesion, cell polarity, noncanonical signaling, and cytoskeletal functions (Cobb and Duboule, 2005; Hueber and Lohmann, 2008; Pearson et al., 2005). However, hox genes are most famous for their developmental effects on the placement and the internal organization of body parts.
Figure 3. Hierarchy in Developmental Gene Regulatory Networks The diagram shows a symbolic representation of hierarchy in developmental gene regulatory networks. The developmental process begins with the onset of embryogenesis at top. The outputs of the initial (i.e., pregastrular) embryonic gene regulatory networks (GRNs) are used after gastrulation to set up the GRNs, which establish regulatory states throughout the embryo, organized spatially with respect to the embryonic axes (axial organization and spatial subdivision are symbolized by orthogonal arrows and colored patterns). These spatial domains divide the embryonic space into broad domains occupied by pluripotent cell populations already specified as mesoderm, endoderm, future brain, future axial neuroectoderm, non-neural ectoderm, etc. The GRNs establishing this initial mosaic of postgastrular regulatory states, including the signaling interactions that help to establish domain boundaries, are symbolized as Box I. Within Box I domains the progenitor fields for the future adult body parts are later demarcated by signals plus local regulatory spatial information formulated in Box I, and given regulatory states are established in each such field by the earliest body-part-specific GRNs. Many such progenitor fields are thus set up during postgastrular embryogenesis, and a GRN defining one of these is here symbolized as Box II. Each progenitor field is then divided up into the subparts that will together constitute the body part, where the subdivisions are initially defined by installation of unique GRNs producing unique regulatory states. These ‘‘sub-body part’’ GRNs are symbolized by the oriented patterns of Box III. Because some body parts are ultimately of great complexity, the process of patterned subdivision and installation of successively more confined GRNs may be iterated, like a ‘‘do-loop,’’ symbolized here by the upwards arrow from Box III to Box II, labeled n ≥ 1. Toward the termination of the developmental process in each region of the late embryo, the GRNs specifying the several individual cell types and deployed in each subpart of each body part, are symbolized here as Box IV. Postembryonic generation of specific cell types (from stem cells) is a Box IV process as well. At the bottom of the diagram are indicated several differentiation gene batteries (‘‘DGB1, 2, 3’’), the final outputs of each cell type. Morphogenetic functions are also programmed in each cell type (not shown). For discussion and background, see text and Davidson, 2001, 2006.
The most important evolutionary and developmental attributes of hox gene complex function can be reduced to two statements: first, in organisms in which coherent hox complexes exist they are expressed in a vectorial or sequential fashion with respect to the coordinates of the body plan or the body part; and second, they can act as switches that allow (or activate) GRN patterning subcircuits in given locations of the body plan or body part, or alternately they prohibit (or repress) these subcircuits in given locations. The genomic organization of hox gene clusters indicates that distinct mechanisms account for the locations in the body plan where individual hox genes are expressed in development. In Drosophila a plethora of cis-regulatory modules control each aspect of expression of each gene. Particularly well-known at the cis-regulatory level is the bithorax region (Ho et al., 2009; Maeda and Karch, 2009; Simon et al., 1990). Each specific hox gene enhancer responds to local upstream regulatory states that are the product of earlier developmental GRNs, just as in
any other developmental process. Similarly, many very well-characterized cis-regulatory modules that control very specific spatial and temporal aspects of anterior hox gene expression are known in mammals, and often conserved to fish (Tumpel et al., 2009). The prevalence of local cis-regulatory hox gene control modules explains how these genes can function in animals that lack large hox gene clusters. It is interesting that hox genes are not required for embryonic development of organisms that utilize fixed cell lineages for specification (Davidson, 1990), for instance in C. elegans, which lacks both a coherent hox complex and many hox genes (Aboobaker and Blaxter, 2010); in sea urchins (Martinez et al., 1999); or in Ciona, which also lacks a coherent hox complex (Ikuta et al., 2010). However, in addition to control by local enhancers, another entirely different mechanism that speaks directly to both the evolutionary maintenance of the hox gene cluster(s) and the vectorial expression of hox genes relative to one another has come to light in mammals and other tetrapods. Over the last decade, transcriptional control of the mouse hoxd complex has been extensively examined by deletions, rearrangements, and insertions of reporter transgenes, including ectopically positioned hox genes at various locations in the complex (Herault et al., 1998, 1999; Kmita et al., 2000; Spitz et al., 2003; Tarchini and Duboule, 2006). To summarize very briefly, early expression in the tetrapod limb bud is controlled not only by local enhancers but also by distant regulatory regions located outside the hox gene clusters. One of these operates from the 3′ (anterior) end of the cluster and causes the progressive expression of first anterior and then middle hox genes in the limb bud region that will give rise to the forearm.
Meanwhile the posterior hox genes are repressed by a counteracting locus control region operating from beyond the 5′ end of the complex in the anterior cells of the early limb bud, allowing expression of these genes only in the posterior limb bud cells. A second phase of hoxd expression is controlled by other complex distant enhancers located 200 kb away from the 5′ end of the cluster, which are required to pattern the autopod region of the tetrapod limb where the digits form (Tarchini and Duboule, 2006). This ‘‘global control region’’ (GCR) is responsible for a graded expression of the five posterior hox genes across the anterior/posterior (A/P) dimension of the autopod. The GCR probably had an ancient role in controlling colinear expression in the central nervous system, a basal axial organization function that in terms of our Figure 3 would reside somewhere in Box I; part of the active GCR elements are conserved from fish to mammals. However some limb-specific elements of the GCR likely evolved in tetrapods, particularly the autopod control device and its patterning GRN, which would make the autopod a novel evolutionary invention with respect to the fish antecedents (Gonzalez et al., 2007; Woltering and Duboule, 2010). More generally, it is an interesting speculation that distant hox complex control regions were superimposed during chordate evolution (they are absent from Drosophila), and control by local hox gene enhancers was the primal regulatory mode (Spitz et al., 2001). However, because the regulatory landscape to which the local enhancers must respond can be very different in different organisms, they themselves must have evolved in clade-specific ways.
Given these systems, deeply conserved and otherwise, by which hox gene expression is regionally controlled, we come to their mode of interaction with the GRNs that control development of specific body parts. Sometimes individual hox genes act by participating, like any other regulatory genes, in patterning GRNs, for example in early hindbrain specification, a Box I function. Together with other important regulatory genes such as krox and Kreisler, the anterior group hoxa and hoxb genes establish recursively wired, extremely conserved, rhombomere-specific GRNs (Tumpel et al., 2007, 2009). But more often they operate in another, evolutionarily flexible way, such that change in their functions has been directly correlated, in many comparative observations, with evolutionary change in both the positioning and organization of body parts. Not all body parts require the vectorial patterning function of the hox gene complex; for example, hox genes are not expressed in the midbrain or forebrain of vertebrates, and they have nothing to do with the specification of the extremely complex regional regulatory states installed during midbrain or forebrain development. Where vectorial inputs are required, hox genes intervene in local, mid-development, patterning functions (Box III). Here we can rely on a number of specific examples. These are of immediate evolutionary significance in that the developmental outcomes that they control vary sharply among related clades. For example, the tetrapod limb bud is a ‘‘new’’ evolutionary invention, dating to the emergence of vertebrate forms onto land. Development of the limb depends directly on deployment of hox gene expression at several levels of the underlying GRN. The early expression of 5′ hox genes at the posterior margin of the bud causes expression of the shh gene in these cells, ultimately setting up anterior and posterior regulatory states in the limb bud (Zakany et al., 2004).
Posterior 5′ hox gene expression can be thought of as a switch activating the responsible circuitry. Later, during the autopod expression phase, the GCR responds in turn to graded levels of Shh contributing to the nested pattern of hox gene expression in the autopod, and the GCR can be thought of as a node in the patterning network. Another example concerns the axial skeleton in vertebrates, which vary greatly in the distribution of vertebral morphologies, again a developmental function of hox gene expression patterns. It is now possible to state just which sets of vertebrae require hox5PG (paralog group), hox6PG, hox9PG, hox10PG, and hox11PG (for review, Wellik, 2009). These relationships can all be interpreted in one simple way. For each type of vertebra (cervical, rib-bearing thoracic, lumbar, etc.), there is a specific patterning GRN operating at the Box III level, and the products of (often) two adjacent PGs allow it to be activated in the right place along the axis or may cause it to be activated ectopically when these hox genes are activated ectopically. That is to say, these hox genes act as regionally active switches that we can imagine sitting on the outside of the Boxes containing the morphogenetic patterning GRNs. Switch behavior is particularly easy to perceive when the switch acts negatively: thus the PG10 hox genes prevent rib formation, normally used to preclude ribs on the lumbar vertebrae; if expressed ectopically no ribs form, and in complete loss of function ribs form almost everywhere (Carapuco et al., 2005; Vinagre et al., 2010; Wellik and Capecchi,
2003). On the other hand, hox6PG genes promote rib formation. The autonomy of these hox-driven switches, as shown by the complete ectopic production of one or another vertebral type in gain-of-function experiments, implies a useful evolutionary mechanism for variation in axial skeletal proportions. Indeed, comparative observations show that different vertebrate classes have hox spatial expression domains that correlate with the axial morphology (examples reviewed in Davidson, 2006). However, the most severe axial changes in tetrapod evolution, those responsible for the body plans of snakes and reptiles, have involved more than merely upstream regulatory changes affecting hox gene expression domains (Di-Poi et al., 2010; Woltering et al., 2009): in addition, the sequences of some of the genes themselves have changed, regulatory linkages between gene expression and effects such as the hox10 inhibition of rib formation have been broken, and numerous transposon insertions have altered the genomic structure of the posterior hox cluster possibly affecting their spatial regulation. The mechanism by which the Drosophila ubx gene represses wing formation in the third thoracic (T3) segment provides the most explicit possible illustration of what it means for a hox gene to intervene negatively and switch off a local patterning GRN. In the absence of Ubx function in T3, what should be the haltere imaginal disc produces a wing, hence Ed Lewis’ famous 4-winged fly (Bender et al., 1983). Thus Ubx function is repressive with respect to the wing patterning GRN in the late T3 imaginal disc. The way this works is repression by Ubx and its cofactors of several genes of the wing GRN, as shown by analyses of Ubx clones in the haltere disc and of Ubx+ clones in the wing disc (Galant et al., 2002; Weatherbee et al., 1998). These are direct cis-regulatory repressions. 
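The negative switch behavior just described for Ubx in the third thoracic segment can be written as a minimal Boolean sketch. It is an illustrative abstraction only: the function name, the dictionary layout, and the reduction of the wing patterning GRN to a single on/off state are assumptions made here, whereas the actual mechanism involves direct repression of several wing GRN genes by Ubx and its cofactors.

```python
def dorsal_appendage(segment, ubx_expression):
    """Toy rule: the wing patterning GRN runs unless Ubx switches it off.

    ubx_expression maps a thoracic segment name to True/False (hypothetical
    encoding of whether Ubx is active in that segment's dorsal disc).
    """
    wing_grn_active = not ubx_expression.get(segment, False)
    return "wing" if wing_grn_active else "haltere"


wild_type = {"T2": False, "T3": True}   # Ubx normally active in T3 only
ubx_loss = {"T2": False, "T3": False}   # loss of Ubx function in T3

for label, genotype in (("wild type", wild_type), ("Ubx loss", ubx_loss)):
    print(label, [dorsal_appendage(s, genotype) for s in ("T2", "T3")])
```

The point of the sketch is that the switch sits outside the patterning circuitry: changing only the Ubx input changes which appendage the T3 disc produces, without any change to the wing GRN itself.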
There are many arthropod examples not yet examined at the GRN level where the mechanisms of hox gene function must be similar. In arthropods the anterior boundaries of expression of the Ubx/Abd-A genes vary from class to class and sometimes among orders of the same class, e.g., among crustaceans, and this boundary is correlated with the type of appendage present on the segment; from these correlations, Ubx evidently represses execution of the patterning GRN underlying development of feeding appendages (maxillipeds) and permits development of locomotory thoracic appendages (Averof and Patel, 1997). This inference has been demonstrated, by experimentally decreasing or increasing Ubx expression in a shrimp that normally produces one pair of maxillipeds, with the result of producing additional pairs of these appendages or instead only thoracic legs, respectively (Liubicich et al., 2009; Pavlopoulos et al., 2009). Drosophila affords many further examples of hox gene switches that permit or preclude regional morphogenetic GRN function in body part formation, among the most convincing of which is in heart development (Lo et al., 2002). Further examples of regional hox gene control of specific body part identity by cis-regulatory intervention are in somatic muscle pair specification. Each muscle develops from founder cells expressing specific transcription factors, i.e., a specific regulatory state (Baylies et al., 1998). There are direct hox gene inputs into this process, for example, the alary muscles that connect the aorta of the heart and that require Ubx and AbdA for their development (Dubois et al.,
2007; LaBeau et al., 2009). Throughout the body plan, hox genes control clade-specific deployment of organs and structures. So in summary, the common statement that hox genes ‘‘pattern’’ this or that body part means that they provide negative or positive cis-regulatory inputs into genes that are engaged in the GRN circuits, which actually do the work of spatial patterning and body-part morphogenesis. Sometimes the hox gene inputs form part of the subcircuit itself as when there are feedback linkages between them and other regulatory genes, as in the later limb bud or rhombomere specification circuitry cited above. But in many more cases than those mentioned here the function of these regionally expressed genes is rather to provide a one-way switch that provides ‘‘go’’ or ‘‘no go’’ instruction to body-part-specific GRN patterning circuitry. In evolution the deployment of these switches, and the linkages between them and the body-part-specific subcircuits, are far more flexible than is the internal structure of these subcircuits. Some of these body-part-specific GRN structures are in evolutionary terms very ancient indeed.
Conservation and Change in Developmental GRNs
The self-described field of ‘‘evo-devo’’ has generated enormous masses of descriptive spatial gene expression data, a frequent object of which is to show evolutionary ‘‘conservation’’ of developmental gene use. Developmental gene use cannot truly be regarded as conserved unless the regulatory linkages surrounding the genes in the GRN are conserved. Thus gene expression data by themselves are a poor index of evolutionary conservation. Because negative results are uninformative, we learn little of what has changed by looking only at what has not. Unless all forms were ‘‘sprung forth fully blown’’ like Athena from the head of Zeus, the evolution of the diverse body plans of animals requires large-scale processes of change in ancestral developmental GRN architecture. 
Furthermore, what is it that is conserved: is it use of a given gene in a given developmental process? Is it use of a given gene in a given subcircuit in a given process? Here we consider evolutionary conservation and evolutionary change, not of specific individual gene use, but of specific GRN circuitry.
Conservation
The hierarchical Linnean classification system we use, including modern corrections based on molecular phylogenetics, essentially arranges animal body plans on the basis of their evolutionarily shared and derived characters (avoiding convergent associations). Shared body plan characters of given clades ultimately imply conserved developmental regulatory circuitry (Davidson and Erwin, 2009). But other apparently older characters are shared over huge phylogenetic distances across cladistic boundaries, being represented in multiple bilaterian phyla and in diverse body plans. These are particular body parts, such as hearts, and the major domains of brains, and particular cell types, such as muscle and neurons. Because of their very widespread distribution, some differentiation gene batteries are probably among the oldest features of modern developmental GRNs (Davidson, 2006; Davidson and Erwin, 2009). But just as a cell type is not the same thing as a body part, so a differentiation gene battery is not the same thing as a cell type. During evolution the identity of the effector
genes can change radically, whereas the biological function of the cell type remains the same; and in addition, the cell type often has cell-biological or -morphological characteristics that are not encoded the same way as is activation of sets of effector genes. So we have to consider what GRN structures actually lie at the root of trans-phyletic cell-type conservation. A few examples may clarify this issue. We know many cell types that are present in many types of animals, the specific properties of which depend on conserved differentiation gene batteries including both conserved downstream regulatory states and effector genes. For example, everyone is familiar with pan-eumetazoan (cnidarian plus bilaterian) conservation of striated and smooth muscle. Here the distinctive cellular morphology, the function, and, underlying these, the regulatory state consisting of myogenic bHLH factors and MEF2, plus downstream effector genes exemplified by the myosin heavy chain contractile protein, are all conserved (Seipel and Schmid, 2005). The same is true of neuronal cell types (e.g., Hayakawa et al., 2004). There are many additional examples in which both regulatory state and effector genes are evidently conserved. A comparison between vertebrate and annelid light-sensitive nonocular neurosecretory cell types that produce vasotocin (vasopressin-neurophysin) as well as opsin provides a striking case (Tessmar-Raible et al., 2007). This cell type is located in the forebrains of both a polychaete annelid and zebrafish, as are also very similar chemosensory neurosecretory cell types that produce RF-amide. The vasotocinergic cells of both vertebrate and annelid express similar (Box IV) regulatory states, generated by the nk2.1, rx, and otp genes, as well as a gene producing the miR-7 microRNA that is also, in both organisms, expressed in the RF-amidonergic cells. 
Vasotocinergic neurosecretory cells were probably pan-bilaterian cell types, though genes encoding vasotocin have been lost in (sequenced) ecdysozoan lineages. Ocular photoreceptor cells provide another example of a pan-bilaterian cell type in which the Box IV GRN controlling the various subtypes of receptors (rhabdomeric receptors in insects, and rods and cones in vertebrates) operates downstream of regulatory genes of the K50 homeodomain family (Mishra et al., 2010; Ranade et al., 2008). These genes are otx2 and crx in mammals (Corbo et al., 2010; Hennig et al., 2008) and otd in Drosophila (where a paired class regulatory gene, pph3, which binds to the same sites as does Pax6, is also utilized in regulation of the same target genes). The transcription factors encoded by otd, or crx and otx2, directly activate the cis-regulatory control systems of the genes encoding the photoreceptor pigments in flies and mice. In addition, the targets of these regulatory genes, in both flies and mice, include phototransduction genes (rhodopsins, transducins, phosphodiesterase genes, arrestins) and cell morphogenesis genes (Ranade et al., 2008). The mammalian Box IV crx/otx2 GRN includes a canonical set of six other regulatory genes, interactions among which in mammals determine the photoreceptor subtype (Hennig et al., 2008; Swaroop et al., 2010). That is, in these cell types both downstream effector genes and their immediate regulatory apparatus are deployed in a manner that is widely conserved.
But there is another, profoundly interesting pattern of conservation displayed by pan-bilaterian cell types, in which the downstream effector genes are clade specific, whereas the definitive upstream regulatory states are conserved across clades. Immune cells provide the most evidence, for as knowledge of the diverse strategies for immune response, both adaptive and nonadaptive, extends beyond mammals, an amazing variety of effector genes is revealed but the same familiar sets of regulatory genes are found to control their expression. Lampreys, for example, have the equivalent of T cells and B cells but instead of somatically reassembled T- and B-immunoglobulin receptors, they express somatically reassembled variable leucine-rich repeat receptors (Guo et al., 2009; Herrin and Cooper, 2010). Yet the T cell-like lamprey cell regulatory state includes factors encoded by familiar T cell genes, such as bcl11b, gata2/3, and c-rel; and like T cells, their development depends on Notch signaling. In Drosophila, the pathogen-activated innate immune response, which deploys a number of antimicrobial effector molecules, depends, as does much of our very different innate immune response, on inducible regulatory factors of the NF-κB family (Hoffmann, 2003). And sea urchins, which employ a surprising and unique repertoire of hundreds of receptors of several different classes in their dedicated immune cells (Hibino et al., 2006; Messier-Solek et al., 2010), express in these cells a regulatory state very familiar to students of mammalian hematopoietic systems, such as the factors encoded by the scl, e2a, gata1/2/3, ikaros, and runx genes, and even a pu.1-like ets family gene. Another system in which a conserved cell-type-specific regulatory state controls entirely different effector genes, which nonetheless execute the same function, is found in the cells that in development create the outer epidermal barrier against the external world, and which recreate this barrier in wound repair. 
In vertebrates the barrier is composed of a mixture of crosslinked keratins of diverse kinds, matrix proteins, lipids, special cornified membrane proteins, etc.; in insects it is composed of crosslinked chitins, plus other proteins and lipids. The structures are entirely nonhomologous in molecular identity. In mammals wound repair requires expression (among other proteins) of a crosslinking transglutaminase, whereas in Drosophila it requires expression of dopa decarboxylase and tyrosine hydroxylase, which generate quinones that crosslink chitin and cuticle proteins (Pearson et al., 2009; Ting et al., 2005). But in both flies and mice these functions are directly regulated by genes of the grainyhead family, which encodes transcription factors that utilize a unique DNA-binding domain (also found in fungi), plus other factors of the jun/fos family. The Box IV cell-type GRNs are conserved, but the effector genes are entirely diverse. So we see that ancient cell-type-specific functions, which were utilized in the lineage ancestral to all bilaterians, are essentially defined by specific regulatory states, that is to say by genomically encoded GRN cassettes that produce cell-type-specific regulatory states. Sometimes the effector gene sets that these regulatory states animate are at least partially conserved, sometimes not. In evolutionary terms, the genomic repository of basic bilaterian (or eumetazoan) functions such as immunity, wound repair, contraction, and photoreception was built into these cell-type-specific regulatory cassettes, and they have ever since retained their identity.
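The contrast drawn here, a conserved regulatory state driving nonhomologous effector genes, can be made concrete with a simple set comparison. The sketch below is a toy under stated assumptions: it uses only the wound-repair regulators and effectors named in the text as labels (these are not complete gene sets), and it uses the Jaccard index as an arbitrary, illustrative overlap measure rather than anything applied in the cited studies.

```python
def jaccard(a: set, b: set) -> float:
    """Shared fraction of two label sets (1.0 = identical, 0.0 = disjoint)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Labels taken from the wound-repair example in the text; illustrative only.
wound_repair = {
    "mouse": {
        "regulators": {"grainyhead family", "jun/fos family"},
        "effectors": {"transglutaminase"},
    },
    "fly": {
        "regulators": {"grainyhead family", "jun/fos family"},
        "effectors": {"dopa decarboxylase", "tyrosine hydroxylase"},
    },
}

regulator_overlap = jaccard(
    wound_repair["mouse"]["regulators"], wound_repair["fly"]["regulators"]
)
effector_overlap = jaccard(
    wound_repair["mouse"]["effectors"], wound_repair["fly"]["effectors"]
)

if __name__ == "__main__":
    print("regulatory state overlap:", regulator_overlap)
    print("effector gene overlap:", effector_overlap)
```

With these labels the regulatory overlap is complete and the effector overlap is zero, which is the pattern the text describes; a comparison based on effector genes alone would miss the conservation entirely.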
Some body parts are also conserved across the cladistic boundaries. This implies that there is something in the genetic programs for development of these body parts that is also conserved. However, in cases where the final structures are diverse, and develop via very diverse pattern formation and morphogenetic mechanisms, it may be that only the Box II GRN circuits encoding the initial establishment of the progenitor field from which the body part will be built are conserved, plus the final deployment of conserved cell types. Comparative GRN analysis is beginning to reveal ‘‘kernels’’ (Davidson and Erwin, 2006), in which regulatory genes wired together in certain conserved linkages execute upstream regulatory functions in development of given body parts. These circuits are characterized by extensive feedback wiring, and where tested, interference with expression of any of their genes results in developmental catastrophe. These features, and developmental canalization due to the upstream position of such kernels in the body-part GRN, explain their exceptional evolutionary conservation. Examples include what may be a pan-bilaterian (i.e., from flies to mice) kernel for heart specification (Davidson, 2006) and an (at least) pan-echinoderm kernel underlying mesoderm specification in both sea urchin and sea star development (McCauley et al., 2010) (these lineages have not shared a common ancestor since the end of the Cambrian). Similarly, a fundamental Box II subcircuit may underlie mesoderm specification in vertebrate embryogenesis (Swiers et al., 2010). A recursively wired triple feedback circuit has been proposed as a kernel underlying the pluripotent state of endothelial/hematopoietic precursors that arise in vertebrate development (Pimanda et al., 2007). 
There are also many less coherent observations, not yet at the level of an explicit GRN, in which detailed patterning similarities plus some gene interaction data strongly suggest the existence of GRN kernels that yet await elucidation. One convincing example is the brain, where a large amount of work has illuminated striking similarities in both A/P and mediolateral patterns of regulatory gene expression as well as homologous gene interactions between Drosophila and mouse (Davidson, 2006; Denes et al., 2007; Lowe et al., 2003; Seibert and Urbach, 2010; Tessmar-Raible et al., 2007).
Evolutionary Change in GRN Architecture
Evolutionary rewiring of GRN architecture by means of cis-regulatory co-optive change of given linkages among regulatory genes is the most common upper-level evolutionary mechanism by which developmental process is altered. That is, the GRN of a common ancestor is the source structure for diverse alterations in that structure in the derived descendants. But of course, not all parts of the structure are equally accessible to change, for the reasons we have tried here to point out. Sometimes the contrast between the conserved and nonconserved parts of a given GRN is quite dramatic, as in a comparison between sea urchin and sea star endomesodermal GRNs that revealed an extremely conserved, five-gene subcircuit, surrounded by linkages not one of which had survived the half-billion years since divergence without change (Hinman and Davidson, 2007). This was an inter-class comparison; for visualization of the process of evolutionary GRN rewiring the inter-ordinal comparison between developmental GRNs in Drosophila and Tribolium is illuminating.
Remarkable examples of architectural GRN rewiring have come to light in comparisons of the segmentation GRNs of various insects. Given that the short germ band mode of development appears to be plesiomorphic for insects and their sister group the crustaceans, the linkages seen in the early A/P patterning GRN of Tribolium, for example, may be closer to the ancestral linkages than the derived linkages of Drosophila GRNs. This is supported by a vast literature on many other insects and crustaceans as well (see following citations for references to work on other species). Every major aspect of A/P patterning analyzed at the gene interaction level appears to include some different linkages in Drosophila compared to Tribolium. For example, it had been thought that the absence of the bicoid gene outside of higher Diptera was compensated in other groups by a similar function of the anteriorly expressed otd gene, which encodes a regulator with a Bcd-like homeodomain and target specificity. But recent work shows that in Tribolium otd functions very differently from bcd in Drosophila, in that it operates through different downstream linkages (Kotkamp et al., 2010). Unlike bcd in Drosophila, it controls dorsal/ventral (D/V) patterning, by repressing sog expression; it affects zen expression; and it contributes no spatial A/P input to the patterning process. Similarly, it is clear that some of the GRN wiring downstream of the hunchback gene differs, for in Tribolium hb apparently does not directly control primary pair-rule genes and does not repress but rather activates giant (Choe et al., 2006; Marques-Souza et al., 2008). On the other hand, in both species hb sets the anterior boundary of Ubx expression and provides an activating input into the kruppel regulatory system (Marques-Souza et al., 2008). 
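The hunchback comparison above amounts to comparing two sets of signed regulatory linkages. A minimal sketch of that bookkeeping follows; it encodes only the hb linkages mentioned in the text, the Drosophila entries for giant and the primary pair-rule genes are inferred from the contrast the text draws with Tribolium, and the labels, data layout, and function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Linkage = Tuple[str, str]  # (regulator, target)

# Signed linkages downstream of hb, as described in the text. The Drosophila
# entries for giant and pair-rule genes are inferred from the stated contrast.
drosophila: Dict[Linkage, str] = {
    ("hb", "giant"): "represses",
    ("hb", "kruppel"): "activates",
    ("hb", "primary pair-rule genes"): "direct input",
}
tribolium: Dict[Linkage, str] = {
    ("hb", "giant"): "activates",
    ("hb", "kruppel"): "activates",
}


def compare_linkages(
    first: Dict[Linkage, str], second: Dict[Linkage, str]
) -> Dict[str, List[Linkage]]:
    """Sort linkages into conserved, sign-changed, and species-specific groups."""
    shared = set(first) & set(second)
    return {
        "conserved": sorted(k for k in shared if first[k] == second[k]),
        "sign_changed": sorted(k for k in shared if first[k] != second[k]),
        "only_in_first": sorted(set(first) - set(second)),
        "only_in_second": sorted(set(second) - set(first)),
    }


result = compare_linkages(drosophila, tribolium)

if __name__ == "__main__":
    for category, linkages in result.items():
        print(category, linkages)
```

The same genes appear as nodes in both species, so a comparison of nodes alone would report full conservation; only the edge-level comparison exposes the changed sign on giant and the missing direct pair-rule input.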
The architecture of pair-rule GRNs in Drosophila and Tribolium, which are composed of largely the same genes, is very different (Choe and Brown, 2009; Jaynes and Fujioka, 2004), but, amazingly, they generate the same downstream outcomes, the expression of wg and en across each parasegment border. Some linkages in the pair-rule GRNs are the same, but many are entirely rewired: for instance, eve directly represses wg in Tribolium whereas eve indirectly represses en in Drosophila (via slp; Choe and Brown, 2009). Upstream of this, in Tribolium, is a possibly plesiomorphic kernel-like segmentation subcircuit, consisting of mutual interconnections among eve, runt, and odd, which runs sequentially to pattern the forming segments (Choe et al., 2006). Another example of extensive rewiring among the same genes engaged in the same developmental process since divergence between Coleoptera and Diptera is in eye specification. A comparison of the relatively well-known eye specification GRN of Drosophila (for reviews, Friedrich, 2006; Kumar, 2009) with that governing larval and adult eye specification in Tribolium (Yang et al., 2009) displays remarkable differences. The genes at the top of the Drosophila hierarchy, e.g., toy and ey (pax6 orthologs), are not even needed for adult eye development in Tribolium, where another gene in the same network, dachshund, operates redundantly with pax6, rather than being located downstream of the pax6 genes in the network.
Subcircuit Co-option
Considering what we know of how developmental GRNs are constructed, it is not surprising that successful developmental
programs are used repeatedly, plugged into various positions in the GRN hierarchy. One example of a subcircuit-level co-option event has been discovered in sea urchins, in which a class-specific evolutionary modification has caused the acquisition of an embryonic skeleton not present in other echinoderms. The shared feature of echinoderms is an endoskeleton in the adult organism. A comparison of regulatory gene expression in embryonic and adult skeletogenic precursor cells revealed a large overlap at least at the nodes of these GRNs, which very likely extends also to the linkages between them (Gao and Davidson, 2008; Oliveri et al., 2008). Regulatory genes exclusive to the embryonic GRN are those determining the embryonic location in which the skeletogenic GRN is activated. Thus, by modifying probably only a small number of cis-regulatory sequences, the skeletogenic GRN subcircuit is redeployed such that it is activated both in the embryo and in the adult, most likely by use of the exact same genomic sequences. A similar interpretation has been applied to the apparent conservation of proximodistal patterning mechanisms in entirely nonhomologous bilaterian appendages (Lemons et al., 2010). Thus, Drosophila and vertebrate leg progenitor fields express the same set of regulatory genes in the same sequential order along the proximodistal axis, although as a result of different regulatory interactions. Interestingly, this very same sequence of regulatory gene expressions is observed also along the anteroposterior axis in the head neuroectoderm of Drosophila and Saccoglossus, a hemichordate lacking appendages. McGinnis et al. therefore propose that the similarity of patterning observed in these nonhomologous body parts might be the result of independent co-options of a subcircuit with conserved function (Lemons et al., 2010). 
A relatively recent co-option of an entire body part has occurred in teleost fish, resulting in the formation of a secondary jaw in the same location where ancient pharyngeal teeth developed (Fraser et al., 2009). Malawi cichlids, which possess both oral and pharyngeal jaws and teeth, show very similar expression of signaling molecules and transcription factors in tooth-forming cells in both locations, supporting the hypothesis that tooth development in oral and pharyngeal jaws is driven by the same tooth GRN. However, one substantial difference exists, which is the expression of a set of hox genes in the pharyngeal but not the oral jaw. In mouse pharyngeal arches, hox genes are expressed in all but the first pharyngeal arch, which gives rise to the oral jaw. Mutation of genes in the hoxa cluster results in formation of ectopic jaw-like skeletal structures, and hox genes are therefore thought to prevent the development of jaws in caudal pharyngeal arches (Minoux et al., 2009). In other words, the jaw-forming patterning GRN would be expressed in more posterior pharyngeal arches were it not for the repressive hox gene switch. Therefore, if the same was true in the teleost ancestor, a prerequisite for the evolutionary co-option of the jaw GRN to a caudal pharyngeal arch in cichlids would have been the uncoupling of this developmental program from the repressive hox input. These examples may display the co-optive redeployment of developmental GRNs and the switch-like function of hox genes in controlling spatial utilization of GRNs.
Conclusions
To make sense of the physical mechanisms that underlie the origin of animal body plans (Davidson and Erwin, 2009), we must consider how change in DNA sequence can affect development of the body plan at the system level. For development of the body plan is a heritable regulatory system process, which we can represent and manipulate and comprehend only in terms of genomically encoded GRN architecture. The evidence that comes to us from evo-devo comparisons of gene expression patterns, from detailed studies of regulatory changes in single genes, from direct comparative GRN analysis, from evolutionary conservation and evolutionary innovation, and from the fossil record can only be integrated in a mechanistic way by resolving the meaning of this evidence in terms of its import for developmental GRN architecture. This is the path to demystification of body-plan evolution. This project cannot be approached, except in an indirect exemplary sense, by looking at change in single cis-regulatory modules or single proteins, nor in ignorance of the regulatory gene interactions that constitute the architecture of developmental GRNs. The theory of evolution by change in GRN architecture also generates the path to experimental validation of evolutionary process by synthetic changes in developmental GRNs. This approach is already beginning to be applied, as we review above. The genomic control of the developmental process itself can only be understood in terms of the genomic regulatory system, and so must time-based change in that regulatory system, the basis of body-plan evolution.
ACKNOWLEDGMENTS
We gratefully acknowledge support for this work from NIH Grant HD-037105 and from NSF Grant IOS-0641398.
REFERENCES Aboobaker, A., and Blaxter, M. (2010). The nematode story: Hox gene loss and rapid evolution. Adv. Exp. Med. Biol. 689, 101–110. Abzhanov, A., Protas, M., Grant, B.R., Grant, P.R., and Tabin, C.J. (2004). Bmp4 and morphological variation of beaks in Darwin’s finches. Science 305, 1462–1465. Abzhanov, A., Kuo, W.P., Hartmann, C., Grant, B.R., Grant, P.R., and Tabin, C.J. (2006). The calmodulin pathway and evolution of elongated beak morphology in Darwin’s finches. Nature 442, 563–567. Averof, M., and Patel, N.H. (1997). Crustacean appendage evolution associated with changes in Hox gene expression. Nature 388, 682–686. Balhoff, J.P., and Wray, G.A. (2005). Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites. Proc. Natl. Acad. Sci. USA 102, 8591–8596. Baumer, N., Marquardt, T., Stoykova, A., Ashery-Padan, R., Chowdhury, K., and Gruss, P. (2002). Pax6 is required for establishing naso-temporal and dorsal characteristics of the optic vesicle. Development 129, 4535–4545. Baylies, M.K., Bate, M., and Ruiz Gomez, M. (1998). Myogenesis: A view from Drosophila. Cell 93, 921–927. Bender, W., Akam, M., Karch, F., Beachy, P.A., Peifer, M., Spierer, P., Lewis, E.B., and Hogness, D.S. (1983). Molecular genetics of the bithorax complex in Drosophila melanogaster. Science 221, 23–29. Bolouri, H., and Davidson, E.H. (2010). The gene regulatory network basis of the ‘‘community effect,’’ and analysis of a sea urchin embryo example. Dev. Biol. 340, 170–178.
Britten, R.J. (1997). Mobile elements inserted in the distant past have taken on important functions. Gene 205, 177–182.
Erwin, D.H., and Davidson, E.H. (2009). The evolution of hierarchical gene regulatory networks. Nat. Rev. Genet. 10, 141–148.
Britten, R.J., and Davidson, E.H. (1971). Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. Q. Rev. Biol. 46, 111–138.
Fraser, G.J., Hulsey, C.D., Bloomquist, R.F., Uyesugi, K., Manley, N.R., and Streelman, J.T. (2009). An ancient gene network is co-opted for teeth on old and new jaws. PLoS Biol. 7, e31.
Cameron, R.A., and Davidson, E.H. (2009). Flexibility of transcription factor target site position in conserved cis-regulatory modules. Dev. Biol. 336, 122–135.
Friedrich, M. (2006). Ancient mechanisms of visual sense organ development based on comparison of the gene networks controlling larval eye, ocellus, and compound eye specification in Drosophila. Arthropod Struct. Dev. 35, 357–378.
Carapuco, M., Novoa, A., Bobola, N., and Mallo, M. (2005). Hox genes specify vertebral types in the presomitic mesoderm. Genes Dev. 19, 2116–2121. Chan, Y.F., Marks, M.E., Jones, F.C., Villarreal, G., Jr., Shapiro, M.D., Brady, S.D., Southwick, A.M., Absher, D.M., Grimwood, J., Schmutz, J., et al. (2010). Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science 327, 302–305. Choe, C.P., and Brown, S.J. (2009). Genetic regulation of engrailed and wingless in Tribolium segmentation and the evolution of pair-rule segmentation. Dev. Biol. 325, 482–491.
Galant, R., Walsh, C.M., and Carroll, S.B. (2002). Hox repression of a target gene: extradenticle-independent, additive action through multiple monomer binding sites. Development 129, 3115–3126. Gao, F., and Davidson, E.H. (2008). Transfer of a large gene regulatory apparatus to a new developmental address in echinoid evolution. Proc. Natl. Acad. Sci. USA 105, 6091–6096. Garza, D., Medhora, M., Koga, A., and Hartl, D.L. (1991). Introduction of the transposable element mariner into the germline of Drosophila melanogaster. Genetics 128, 303–310.
Cobb, J., and Duboule, D. (2005). Comparative analysis of genes downstream of the Hoxd cluster in developing digits and external genitalia. Development 132, 3055–3067.
Gogvadze, E., and Buzdin, A. (2009). Retroelements and their impact on genome evolution and functioning. Cell. Mol. Life Sci. 66, 3727–3742.
Choe, C.P., Miller, S.C., and Brown, S.J. (2006). A pair-rule gene circuit defines segments sequentially in the short-germ insect Tribolium castaneum. Proc. Natl. Acad. Sci. USA 103, 6560–6564.
Gompel, N., Prud’homme, B., Wittkopp, P.J., Kassner, V.A., and Carroll, S.B. (2005). Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature 433, 481–487.
Cole, N.J., Tanaka, M., Prescott, A., and Tickle, C. (2003). Expression of limb initiation genes and clues to the morphological diversification of threespine stickleback. Curr. Biol. 13, R951–R952.
Gonzalez, F., Duboule, D., and Spitz, F. (2007). Transgenic analysis of Hoxd gene regulation during digit development. Dev. Biol. 306, 847–859.
Corbo, J.C., Lawrence, K.A., Karlstetter, M., Myers, C.A., Abdelaziz, M., Dirkes, W., Weigelt, K., Seifert, M., Benes, V., Fritsche, L.G., et al. (2010). CRX ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors. Genome Res. 20, 1512–1525. Cretekos, C.J., Wang, Y., Green, E.D., Martin, J.F., Rasweiler, J.J.t., and Behringer, R.R. (2008). Regulatory divergence modifies limb length between mammals. Genes Dev. 22, 141–151. Davidson, E.H. (1990). How embryos work: a comparative view of diverse modes of cell fate specification. Development 108, 365–389. Davidson, E.H. (2001). Genomic Regulatory Systems: Development and Evolution (San Diego, CA: Academic Press). Davidson, E.H. (2006). The Regulatory Genome. Gene Regulatory Networks in Development and Evolution (San Diego, CA: Academic Press/Elsevier). Davidson, E.H., and Erwin, D.H. (2006). Gene regulatory networks and the evolution of animal body plans. Science 311, 796–800. Davidson, E.H., and Erwin, D.H. (2009). An integrated view of precambrian eumetazoan evolution. Cold Spring Harb. Symp. Quant. Biol. 74, 65–80. Davidson, E.H., and Erwin, D.H. (2010). Evolutionary innovation and stability in animal gene networks. J. Exp. Zoolog. B Mol. Dev. Evol. 314, 182–186. Denes, A.S., Jekely, G., Steinmetz, P.R., Raible, F., Snyman, H., Prud’homme, B., Ferrier, D.E., Balavoine, G., and Arendt, D. (2007). Molecular architecture of annelid nerve cord supports common origin of nervous system centralization in bilateria. Cell 129, 277–288. Dermitzakis, E.T., Bergman, C.M., and Clark, A.G. (2003). Tracing the evolutionary history of Drosophila regulatory regions with models that identify transcription factor binding sites. Mol. Biol. Evol. 20, 703–714. Di-Poi, N., Montoya-Burgos, J.I., Miller, H., Pourquie, O., Milinkovitch, M.C., and Duboule, D. (2010). Changes in Hox genes’ structure and function during the evolution of the squamate body plan. Nature 464, 99–103. 
Dubois, L., Enriquez, J., Daburon, V., Crozet, F., Lebreton, G., Crozatier, M., and Vincent, A. (2007). Collier transcription in a single Drosophila muscle lineage: the combinatorial control of muscle identity. Development 134, 4347–4355. Elgar, G., and Vavouri, T. (2008). Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. Trends Genet. 24, 344–352.
Gross, J.B., Borowsky, R., and Tabin, C.J. (2009). A novel role for Mc1r in the parallel evolution of depigmentation in independent populations of the cavefish Astyanax mexicanus. PLoS Genet. 5, e1000326. Guo, P., Hirano, M., Herrin, B.R., Li, J., Yu, C., Sadlonova, A., and Cooper, M.D. (2009). Dual nature of the adaptive immune system in lampreys. Nature 459, 796–801. Hare, E.E., Peterson, B.K., Iyer, V.N., Meier, R., and Eisen, M.B. (2008). Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet. 4, e1000106. Hayakawa, E., Fujisawa, C., and Fujisawa, T. (2004). Involvement of Hydra achaete-scute gene CnASH in the differentiation pathway of sensory neurons in the tentacles. Dev. Genes Evol. 214, 486–492. Hennig, A.K., Peng, G.H., and Chen, S. (2008). Regulation of photoreceptor gene expression by Crx-associated transcription factor network. Brain Res. 1192, 114–133. Herault, Y., Beckers, J., Kondo, T., Fraudeau, N., and Duboule, D. (1998). Genetic analysis of a Hoxd-12 regulatory element reveals global versus local modes of controls in the HoxD complex. Development 125, 1669–1677. Herault, Y., Beckers, J., Gerard, M., and Duboule, D. (1999). Hox gene expression in limbs: colinearity by opposite regulatory controls. Dev. Biol. 208, 157–165. Herpin, A., Braasch, I., Kraeussling, M., Schmidt, C., Thoma, E.C., Nakamura, S., Tanaka, M., and Schartl, M. (2010). Transcriptional rewiring of the sex determining dmrt1 gene duplicate by transposable elements. PLoS Genet. 6, e1000844. Herrin, B.R., and Cooper, M.D. (2010). Alternative adaptive immunity in jawless vertebrates. J. Immunol. 185, 1367–1374. Hibino, T., Loza-Coll, M., Messier, C., Majeske, A.J., Cohen, A.H., Terwilliger, D.P., Buckley, K.M., Brockton, V., Nair, S.V., Berney, K., et al. (2006). The immune gene repertoire encoded in the purple sea urchin genome. Dev. Biol. 300, 349–365. Hinman, V.F., and Davidson, E.H. (2007). 
Evolutionary plasticity of developmental gene regulatory network architecture. Proc. Natl. Acad. Sci. USA 104, 19404–19409. Ho, M.C., Johnsen, H., Goetz, S.E., Schiller, B.J., Bae, E., Tran, D.A., Shur, A.S., Allen, J.M., Rau, C., Bender, W., et al. (2009). Functional evolution of
Cell 144, March 18, 2011 ª2011 Elsevier Inc. 983
cis-regulatory modules at a homeotic gene in Drosophila. PLoS Genet. 5, e1000709.
in hemichordates and the origins of the chordate nervous system. Cell 113, 853–865.
Hoffmann, J.A. (2003). The immune response of Drosophila. Nature 426, 33–38.
Ludwig, M.Z., Bergman, C., Patel, N.H., and Kreitman, M. (2000). Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403, 564–567.
Hong, J.W., Hendrix, D.A., Papatsenko, D., and Levine, M.S. (2008). How the Dorsal gradient works: insights from postgenome technologies. Proc. Natl. Acad. Sci. USA 105, 20072–20076.
Maeda, R.K., and Karch, F. (2009). The bithorax complex of Drosophila an exceptional Hox cluster. Curr. Top. Dev. Biol. 88, 1–33.
Hueber, S.D., and Lohmann, I. (2008). Shaping segments: Hox gene function in the genomic age. Bioessays 30, 965–979.
Marques-Souza, H., Aranda, M., and Tautz, D. (2008). Delimiting the conserved features of hunchback function for the trunk organization of insects. Development 135, 881–888.
Ikuta, T., Satoh, N., and Saiga, H. (2010). Limited functions of Hox genes in the larval development of the ascidian Ciona intestinalis. Development 137, 1505– 1513.
Martinez, P., Rast, J.P., Arenas-Mena, C., and Davidson, E.H. (1999). Organization of an echinoderm Hox gene cluster. Proc. Natl. Acad. Sci. USA 96, 1469–1474.
Jaynes, J.B., and Fujioka, M. (2004). Drawing lines in the sand: even skipped et al. and parasegment boundaries. Dev. Biol. 269, 609–622.
McCauley, B.S., Weideman, E.P., and Hinman, V.F. (2010). A conserved gene regulatory network subcircuit drives different developmental fates in the vegetal pole of highly divergent echinoderm embryos. Dev. Biol. 340, 200–208.
Jeffery, W.R. (2005). Adaptive evolution of eye degeneration in the Mexican blind cavefish. J. Hered. 96, 185–196. Jeffery, W.R. (2009). Regressive evolution in Astyanax cavefish. Annu. Rev. Genet. 43, 25–47. Jimenez-Delgado, S., Pascual-Anaya, J., and Garcia-Fernandez, J. (2009). Implications of duplicated cis-regulatory elements in the evolution of metazoans: the DDI model or how simplicity begets novelty. Brief. Funct. Genomics Proteomics 8, 266–275. Johnson, R., Samuel, J., Ng, C.K., Jauch, R., Stanton, L.W., and Wood, I.C. (2009). Evolution of the vertebrate gene regulatory network controlled by the transcriptional repressor REST. Mol. Biol. Evol. 26, 1491–1507. Kazazian, H.H., Jr. (2004). Mobile elements: drivers of genome evolution. Science 303, 1626–1632. Kmita, M., van Der Hoeven, F., Zakany, J., Krumlauf, R., and Duboule, D. (2000). Mechanisms of Hox gene colinearity: transposition of the anterior Hoxb1 gene into the posterior HoxD complex. Genes Dev. 14, 198–211. Koshiba-Takeuchi, K., Mori, A.D., Kaynak, B.L., Cebra-Thomas, J., Sukonnik, T., Georges, R.O., Latham, S., Beck, L., Henkelman, R.M., Black, B.L., et al. (2009). Reptilian heart development and the molecular basis of cardiac chamber evolution. Nature 461, 95–98. Kotkamp, K., Klingler, M., and Schoppmeier, M. (2010). Apparent role of Tribolium orthodenticle in anteroposterior blastoderm patterning largely reflects novel functions in dorsoventral axis formation and cell survival. Development 137, 1853–1862. Kumar, J.P. (2009). The molecular circuitry governing retinal determination. Biochim. Biophys. Acta 1789, 306–314. LaBeau, E.M., Trujillo, D.L., and Cripps, R.M. (2009). Bithorax complex genes control alary muscle patterning along the cardiac tube of Drosophila. Mech. Dev. 126, 478–486. Lemons, D., Fritzenwanker, J.H., Gerhart, J., Lowe, C.J., and McGinnis, W. (2010). 
Co-option of an anteroposterior head axis patterning system for proximodistal patterning of appendages in early bilaterian evolution. Dev. Biol. 344, 358–362. Liberman, L.M., and Stathopoulos, A. (2009). Design flexibility in cis-regulatory control of gene expression: synthetic and comparative evidence. Dev. Biol. 327, 578–589. Liubicich, D.M., Serano, J.M., Pavlopoulos, A., Kontarakis, Z., Protas, M.E., Kwan, E., Chatterjee, S., Tran, K.D., Averof, M., and Patel, N.H. (2009). Knockdown of Parhyale Ultrabithorax recapitulates evolutionary changes in crustacean appendage morphology. Proc. Natl. Acad. Sci. USA 106, 13892–13896. Lo, P.C., Skeath, J.B., Gajewski, K., Schulz, R.A., and Frasch, M. (2002). Homeotic genes autonomously specify the anteroposterior subdivision of the Drosophila dorsal vessel into aorta and heart. Dev. Biol. 251, 307–319.
McGregor, A.P., Orgogozo, V., Delon, I., Zanet, J., Srinivasan, D.G., Payre, F., and Stern, D.L. (2007). Morphological evolution through multiple cis-regulatory mutations at a single gene. Nature 448, 587–590. Messier-Solek, C., Buckley, K.M., and Rast, J.P. (2010). Highly diversified innate receptor systems and new forms of animal immunity. Semin. Immunol. 22, 39–47. Miller, C.T., Beleza, S., Pollen, A.A., Schluter, D., Kittles, R.A., Shriver, M.D., and Kingsley, D.M. (2007). cis-Regulatory changes in Kit ligand expression and parallel evolution of pigmentation in sticklebacks and humans. Cell 131, 1179–1189. Minoux, M., Antonarakis, G.S., Kmita, M., Duboule, D., and Rijli, F.M. (2009). Rostral and caudal pharyngeal arches share a common neural crest ground pattern. Development 136, 637–645. Mishra, M., Oke, A., Lebel, C., McDonald, E.C., Plummer, Z., Cook, T.A., and Zelhof, A.C. (2010). Pph13 and orthodenticle define a dual regulatory pathway for photoreceptor cell morphogenesis and function. Development 137, 2895– 2904. Oda-Ishii, I., Bertrand, V., Matsuo, I., Lemaire, P., and Saiga, H. (2005). Making very similar embryos with divergent genomes: conservation of regulatory mechanisms of Otx between the ascidians Halocynthia roretzi and Ciona intestinalis. Development 132, 1663–1674. Odom, D.T., Dowell, R.D., Jacobsen, E.S., Gordon, W., Danford, T.W., MacIsaac, K.D., Rolfe, P.A., Conboy, C.M., Gifford, D.K., and Fraenkel, E. (2007). Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat. Genet. 39, 730–732. Ohno S., ed. (1970). Evolution by Gene Duplication (New York: Springer-Verlag). Ohshima, K., Hattori, M., Yada, T., Gojobori, T., Sakaki, Y., and Okada, N. (2003). Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol. 4, R74. Oliveri, P., Tu, Q., and Davidson, E.H. (2008). 
Global regulatory logic for specification of an embryonic cell lineage. Proc. Natl. Acad. Sci. USA 105, 5955– 5962. Ostertag, E.M., and Kazazian, H.H., Jr. (2001). Biology of mammalian L1 retrotransposons. Annu. Rev. Genet. 35, 501–538. Parker, H.G., VonHoldt, B.M., Quignon, P., Margulies, E.H., Shao, S., Mosher, D.S., Spady, T.C., Elkahloun, A., Cargill, M., Jones, P.G., et al. (2009). An expressed fgf4 retrogene is associated with breed-defining chondrodysplasia in domestic dogs. Science 325, 995–998.
Logan, M., and Tabin, C.J. (1999). Role of Pitx1 upstream of Tbx4 in specification of hindlimb identity. Science 283, 1736–1739.
Pavlopoulos, A., Kontarakis, Z., Liubicich, D.M., Serano, J.M., Akam, M., Patel, N.H., and Averof, M. (2009). Probing the evolution of appendage specialization by Hox gene misexpression in an emerging model crustacean. Proc. Natl. Acad. Sci. USA 106, 13897–13902.
Lowe, C.J., Wu, M., Salic, A., Evans, L., Lander, E., Stange-Thomann, N., Gruber, C.E., Gerhart, J., and Kirschner, M. (2003). Anteroposterior patterning
Pearson, J.C., Lemons, D., and McGinnis, W. (2005). Modulating Hox gene functions during animal body patterning. Nat. Rev. Genet. 6, 893–904.
984 Cell 144, March 18, 2011 ª2011 Elsevier Inc.
Pearson, J.C., Juarez, M.T., Kim, M., Drivenes, O., and McGinnis, W. (2009). Multiple transcription factor codes activate epidermal wound-response genes in Drosophila. Proc. Natl. Acad. Sci. USA 106, 2224–2229.
Swaroop, A., Kim, D., and Forrest, D. (2010). Transcriptional regulation of photoreceptor development and homeostasis in the mammalian retina. Nat. Rev. Neurosci. 11, 563–576.
Pennacchio, L.A., Ahituv, N., Moses, A.M., Prabhakar, S., Nobrega, M.A., Shoukry, M., Minovitsky, S., Dubchak, I., Holt, A., Lewis, K.D., et al. (2006). In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502.
Swiers, G., Chen, Y.H., Johnson, A.D., and Loose, M. (2010). A conserved mechanism for vertebrate mesoderm specification in urodele amphibians and mammals. Dev. Biol. 343, 138–152.
Peter, I.S., and Davidson, E.H. (2009). Modularity and design principles in the sea urchin embryo gene regulatory network. FEBS Lett. 583, 3948–3958. Pimanda, J.E., Ottersbach, K., Knezevic, K., Kinston, S., Chan, W.Y., Wilson, N.K., Landry, J.R., Wood, A.D., Kolb-Kokocinski, A., Green, A.R., et al. (2007). Gata2, Fli1, and Scl form a recursively wired gene-regulatory circuit during early hematopoietic development. Proc. Natl. Acad. Sci. USA 104, 17692– 17697. Protas, M.E., Hersey, C., Kochanek, D., Zhou, Y., Wilkens, H., Jeffery, W.R., Zon, L.I., Borowsky, R., and Tabin, C.J. (2006). Genetic analysis of cavefish reveals molecular convergence in the evolution of albinism. Nat. Genet. 38, 107–111. Prud’homme, B., Gompel, N., Rokas, A., Kassner, V.A., Williams, T.M., Yeh, S.D., True, J.R., and Carroll, S.B. (2006). Repeated morphological evolution through cis-regulatory changes in a pleiotropic gene. Nature 440, 1050–1053. Ranade, S.S., Yang-Zhou, D., Kong, S.W., McDonald, E.C., Cook, T.A., and Pignoni, F. (2008). Analysis of the Otd-dependent transcriptome supports the evolutionary conservation of CRX/OTX/OTD functions in flies and vertebrates. Dev. Biol. 315, 521–534. Rastegar, S., Hess, I., Dickmeis, T., Nicod, J.C., Ertzer, R., Hadzhiev, Y., Thies, W.G., Scherer, G., and Strahle, U. (2008). The words of the regulatory code are arranged in a variable manner in highly conserved enhancers. Dev. Biol. 318, 366–377. Rebeiz, M., Pool, J.E., Kassner, V.A., Aquadro, C.F., and Carroll, S.B. (2009). Stepwise modification of a modular enhancer underlies adaptation in a Drosophila population. Science 326, 1663–1667. Rokas, A., and Carroll, S.B. (2006). Bushes in the tree of life. PLoS Biol. 4, e352. Ruvkun, G., Wightman, B., Burglin, T., and Arasu, P. (1991). Dominant gain-offunction mutations that lead to misregulation of the C. elegans heterochronic gene lin-14, and the evolutionary implications of dominant mutations in pattern-formation genes. Dev. Suppl. 1, 47–54. 
Seibert, J., and Urbach, R. (2010). Role of en and novel interactions between msh, ind, and vnd in dorsoventral patterning of the Drosophila brain and ventral nerve cord. Dev. Biol. 346, 332–345. Seipel, K., and Schmid, V. (2005). Evolution of striated muscle: jellyfish and the origin of triploblasty. Dev. Biol. 282, 14–26. Shapiro, M.D., Bell, M.A., and Kingsley, D.M. (2006). Parallel genetic origins of pelvic reduction in vertebrates. Proc. Natl. Acad. Sci. USA 103, 13753–13758. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050. Simon, J., Peifer, M., Bender, W., and O’Connor, M. (1990). Regulatory elements of the bithorax complex that control expression along the anteriorposterior axis. EMBO J. 9, 3945–3956. Spitz, F., Gonzalez, F., Peichel, C., Vogt, T.F., Duboule, D., and Zakany, J. (2001). Large scale transgenic and cluster deletion analysis of the HoxD complex separate an ancestral regulatory module from evolutionary innovations. Genes Dev. 15, 2209–2214. Spitz, F., Gonzalez, F., and Duboule, D. (2003). A global control region defines a chromosomal regulatory landscape containing the HoxD cluster. Cell 113, 405–417.
Szeto, D.P., Rodriguez-Esteban, C., Ryan, A.K., O’Connell, S.M., Liu, F., Kioussi, C., Gleiberman, A.S., Izpisua-Belmonte, J.C., and Rosenfeld, M.G. (1999). Role of the Bicoid-related homeodomain factor Pitx1 in specifying hindlimb morphogenesis and pituitary development. Genes Dev. 13, 484–494. Tarchini, B., and Duboule, D. (2006). Control of Hoxd genes’ collinearity during early limb development. Dev. Cell 10, 93–103. Tessmar-Raible, K., Raible, F., Christodoulou, F., Guy, K., Rembold, M., Hausen, H., and Arendt, D. (2007). Conserved sensory-neurosecretory cell types in annelid and fish forebrain: insights into hypothalamus evolution. Cell 129, 1389–1400. Ting, S.B., Caddy, J., Hislop, N., Wilanowski, T., Auden, A., Zhao, L.L., Ellis, S., Kaur, P., Uchida, Y., Holleran, W.M., et al. (2005). A homolog of Drosophila grainy head is essential for epidermal integrity in mice. Science 308, 411–413. Tumpel, S., Cambronero, F., Ferretti, E., Blasi, F., Wiedemann, L.M., and Krumlauf, R. (2007). Expression of Hoxa2 in rhombomere 4 is regulated by a conserved cross-regulatory mechanism dependent upon Hoxb1. Dev. Biol. 302, 646–660. Tumpel, S., Wiedemann, L.M., and Krumlauf, R. (2009). Hox genes and segmentation of the vertebrate hindbrain. Curr. Top. Dev. Biol. 88, 103–137. Vavouri, T., Walter, K., Gilks, W.R., Lehner, B., and Elgar, G. (2007). Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans. Genome Biol. 8, R15. Vinagre, T., Moncaut, N., Carapuco, M., Novoa, A., Bom, J., and Mallo, M. (2010). Evidence for a myotomal Hox/Myf cascade governing nonautonomous control of rib specification within global vertebral domains. Dev. Cell 18, 655–661. Walters, J., Binkley, E., Haygood, R., and Romano, L.A. (2008). Evolutionary analysis of the cis-regulatory region of the spicule matrix gene SM50 in strongylocentrotid sea urchins. Dev. Biol. 315, 567–578. 
Wang, J., Lee, A.P., Kodzius, R., Brenner, S., and Venkatesh, B. (2009). Large number of ultraconserved elements were already present in the jawed vertebrate ancestor. Mol. Biol. Evol. 26, 487–490. Weatherbee, S.D., Halder, G., Kim, J., Hudson, A., and Carroll, S. (1998). Ultrabithorax regulates genes at several levels of the wing-patterning hierarchy to shape the development of the Drosophila haltere. Genes Dev. 12, 1474–1482. Wellik, D.M. (2009). Hox genes and vertebrate axial pattern. Curr. Top. Dev. Biol. 88, 257–278. Wellik, D.M., and Capecchi, M.R. (2003). Hox10 and Hox11 genes are required to globally pattern the mammalian skeleton. Science 301, 363–367. Woltering, J.M., and Duboule, D. (2010). The origin of digits: Expression patterns versus regulatory mechanisms. Dev. Cell 18, 526–532. Woltering, J.M., Vonk, F.J., Muller, H., Bardine, N., Tuduce, I.L., de Bakker, M.A., Knochel, W., Sirbu, I.O., Durston, A.J., and Richardson, M.K. (2009). Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code. Dev. Biol. 332, 82–89. Yamamoto, Y., Stock, D.W., and Jeffery, W.R. (2004). Hedgehog signalling controls eye degeneration in blind cavefish. Nature 431, 844–847. Yang, X., Zarinkamar, N., Bao, R., and Friedrich, M. (2009). Probing the Drosophila retinal determination gene network in Tribolium (I): The early retinal genes dachshund, eyes absent and sine oculis. Dev. Biol. 333, 202–214. Zakany, J., Kmita, M., and Duboule, D. (2004). A dual role for Hox genes in limb anterior-posterior asymmetry. Science 304, 1669–1672.
Cell 144, March 18, 2011 ª2011 Elsevier Inc. 985
Leading Edge
Review
Interactome Networks and Human Disease
Marc Vidal,1,2,* Michael E. Cusick,1,2 and Albert-László Barabási1,3,4,*
1Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
2Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
3Center for Complex Network Research (CCNR) and Departments of Physics, Biology and Computer Science, Northeastern University, Boston, MA 02115, USA
4Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
*Correspondence: [email protected] (M.V.), [email protected] (A.-L.B.)
DOI 10.1016/j.cell.2011.02.016
Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.

Introduction

Since the advent of molecular biology, considerable progress has been made in the quest to understand the mechanisms that underlie human disease, particularly for genetically inherited disorders. Genotype-phenotype relationships, as summarized in the Online Mendelian Inheritance in Man (OMIM) database (Amberger et al., 2009), include mutations in more than 3000 human genes known to be associated with one or more of over 2000 human disorders. This is a truly astounding number of genotype-phenotype relationships considering that a mere three decades have passed since the initial description of Restriction Fragment Length Polymorphisms (RFLPs) as molecular markers to map genetic loci of interest (Botstein et al., 1980), only two decades since the announcement of the first positional cloning experiments of disease-associated genes using RFLPs (Amberger et al., 2009), and just one decade since the release of the first reference sequences of the human genome (Lander et al., 2001; Venter et al., 2001). For complex traits, the information gathered by recent genome-wide association studies suggests high-confidence genotype-phenotype associations between close to 1000 genomic loci and one or more of over one hundred diseases, including diabetes, obesity, Crohn’s disease, and hypertension (Altshuler et al., 2008).
The discovery of genomic variations involved in cancer, inherited in the germline or acquired somatically, is equally striking, with hundreds of human genes found linked to cancer (Stratton et al., 2009). In light of new powerful technological developments such as next-generation sequencing, it is easily imaginable that a catalog of nearly all human genomic variations, whether deleterious, advantageous, or neutral, will be available within our lifetime.

Despite the natural excitement emerging from such a huge body of information, daunting challenges remain. Practically, the genomic revolution has, thus far, seldom translated directly into the development of new therapeutic strategies, and the mechanisms underlying genotype-phenotype relationships remain only partially explained. Assuming that, with time, most human genotypic variations will be described together with phenotypic associations, major problems would still remain in fully understanding and modeling human genetic variations and their impact on diseases.

To understand why, consider the “one-gene/one-enzyme/one-function” concept originally framed by Beadle and Tatum (Beadle and Tatum, 1941), which holds that simple, linear connections are expected between the genotype of an organism and its phenotype. But the reality is that most genotype-phenotype relationships arise from a much higher underlying complexity. Combinations of identical genotypes and nearly identical environments do not always give rise to identical phenotypes. The very coining of the words “genotype” and “phenotype” by Johannsen more than a century ago derived from observations that inbred isogenic lines of bean plants grown in well-controlled environments give rise to pods of different size (Johannsen, 1909). Identical twins, although strikingly similar, nevertheless often exhibit many differences (Raser and O’Shea, 2005). Likewise, genotypically indistinguishable bacterial or yeast cells grown side by side can express different subsets of transcripts and gene products at any given moment (Elowitz et al., 2002; Blake et al., 2003; Taniguchi et al., 2010). Even straightforward Mendelian traits are not immune to complex genotype-phenotype relationships. Incomplete penetrance, variable expressivity, differences in age of onset, and modifier mutations are more frequent than generally appreciated (Perlis et al., 2010).

We, along with others, argue that the way beyond these challenges is to decipher the properties of biological systems, and in particular, those of molecular networks taking place within cells. As is becoming increasingly clear, biological systems and cellular networks are governed by specific laws and principles, the understanding of which will be essential for a deeper comprehension of biology (Nurse, 2003; Vidal, 2009).
Accordingly, our goal is to review key aspects of how complex systems operate inside cells. In particular, we will review how, by interacting with each other, genes and their products form complex networks within cells. Empirically determining and modeling cellular networks for a few model organisms and for human has provided a necessary scaffold toward understanding the functional, logical and dynamical aspects of cellular systems. Importantly, we will discuss the possibility that phenotypes result from perturbations of the properties of cellular systems and networks. The link between network properties and phenotypes, including susceptibility to human disease, appears to be at least as important as that between genotypes and phenotypes (Figure 1).

Figure 1. Perturbations in Biological Systems and Cellular Networks May Underlie Genotype-Phenotype Relationships
By interacting with each other, genes and their products form complex cellular networks. The link between perturbations in network and systems properties and phenotypes, such as Mendelian disorders, complex traits, and cancer, might be as important as that between genotypes and phenotypes.

Cells as Interactome Networks

Systems biology can be said to have originated more than half a century ago, when a few pioneers initially formulated a theoretical framework according to which multiscale dynamic complex systems formed by interacting macromolecules could underlie cellular behavior (Vidal, 2009). These theoretical systems biology ideas were elaborated upon at a time when there was little knowledge of the exact nature of the molecular components of biology, let alone any detailed information on functional and biophysical interactions between them. While greatly inspirational to a few specialists, systems concepts remained largely ignored by most molecular biologists, at least until empirical observations could be gathered to validate them. Meanwhile, theoretical representations of cellular organization evolved steadily, closely following the development of ever improving molecular technologies. The organizational view of the cell changed from being merely a “bag of enzymes” to a web of highly interrelated and interconnected organelles (Robinson et al., 2007). Cells can accordingly be envisioned as complex webs of macromolecular interactions, the full complement of which constitutes the “interactome” network. At the dawn of the 21st century, with most components of cellular networks having been identified, the basic ideas of systems and network biology are ready to be experimentally tested and applied to relevant biological problems.
Mapping Interactome Networks

Network science deals with complexity by “simplifying” complex systems, summarizing them merely as components (nodes) and interactions (edges) between them. In this simplified approach, the functional richness of each node is lost. Despite or even perhaps because of such simplifications, useful discoveries can be made. As regards cellular systems, the nodes are metabolites and macromolecules such as proteins, RNA molecules and gene sequences, while the edges are physical, biochemical and functional interactions that can be identified with a plethora of technologies. One challenge of network biology is to provide maps of such interactions using systematic and standardized approaches and assays that are as unbiased as possible. The resulting “interactome” networks, the networks of interactions between cellular components, can serve as scaffold information to extract global or local graph theory properties. Once shown to be statistically different from randomized networks, such properties can then be related back to a better understanding of biological processes. Potentially powerful details of each interaction in the network are left aside, including functional, dynamic and logical features, as well as biochemical and structural aspects such as protein post-translational modifications or allosteric changes. The power of the approach resides precisely in such simplification of molecular detail, which allows modeling at the scale of whole cells.

Early attempts at experimental proteome-scale interactome network mapping in the mid-1990s (Finley and Brent, 1994; Bartel et al., 1996; Fromont-Racine et al., 1997; Vidal, 1997) were inspired by several conceptual advances in biology. The biochemistry of metabolic pathways had already given rise to cellular scale representations of metabolic networks.
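As an illustration of the node-and-edge abstraction described in this section, the following minimal Python sketch represents a toy interaction map as an edge list, computes one graph property (the average clustering coefficient), and compares it with degree-preserving randomized versions of the same network. Everything here is an assumption made for illustration: the protein names P1 to P8, the toy edge list, the choice of clustering as the property, and the edge-swap null model are not taken from the article, which does not specify a particular statistic or randomization scheme.

```python
import random
from itertools import combinations

# Hypothetical toy interactome: two fully connected groups of four
# proteins joined by a single bridging interaction (P4-P5).
TOY_EDGES = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P2", "P4"), ("P3", "P4"),
    ("P5", "P6"), ("P5", "P7"), ("P5", "P8"),
    ("P6", "P7"), ("P6", "P8"), ("P7", "P8"),
    ("P4", "P5"),
]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors map."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_clustering(adj):
    """Mean over all nodes of the fraction of neighbor pairs that are linked.

    Nodes with fewer than two neighbors contribute zero.
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize edges by double-edge swaps; every node keeps its degree."""
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-interaction
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def empirical_p_value(edges, n_random=200, seed=0):
    """Fraction of randomized networks whose clustering is at least as high
    as the observed value (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = average_clustering(build_adjacency(edges))
    at_least = 0
    for _ in range(n_random):
        shuffled = degree_preserving_shuffle(edges, 10 * len(edges), rng)
        if average_clustering(build_adjacency(shuffled)) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_random + 1)


if __name__ == "__main__":
    observed, p = empirical_p_value(TOY_EDGES)
    print(f"observed clustering = {observed:.3f}, empirical p = {p:.3f}")
```

The edge-swap null model is used here because it holds each node's number of partners fixed, so a difference from the randomized networks cannot be attributed to the degree sequence alone. Other null models would be equally legitimate for other questions.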
The discovery of signaling pathways and cross-talk between them, as well as large molecular complexes such as RNA polymerases, all involving innumerable physical protein-protein interactions, suggested the existence of highly connected webs of interactions. Finally, the rapidly growing identification of many individual interactions between transcription factors and specific DNA regulatory sequences involved in the regulation of gene expression raised the question of how transcriptional regulation is globally organized within cells.

Three distinct approaches have been used since to capture interactome networks: (1) compilation or curation of already existing data available in the literature, usually obtained from one or just a few types of physical or biochemical interactions (Roberts, 2006); (2) computational predictions based on available “orthogonal” information apart from physical or biochemical interactions, such as sequence similarities, gene-order conservation, copresence and coabsence of genes in completely sequenced genomes and protein structural information (Marcotte and Date, 2001); and (3) systematic, unbiased high-throughput experimental mapping strategies applied at the scale of whole genomes or proteomes (Walhout and Vidal, 2001). These approaches, though complementary, differ greatly in the possible interpretations of the resulting maps.

Literature-curated maps present the advantage of using already available information, but are limited by the inherently variable quality of the published data, the lack of systematization, and the absence of reporting of negative data (Cusick et al., 2009; Turinsky et al., 2010). Computational prediction maps are fast and efficient to implement, and usually include satisfyingly large numbers of nodes and edges, but are necessarily imperfect because they use indirect information (Plewczynski and Ginalski, 2009). While high-throughput maps attempt to describe unbiased, systematic, and well-controlled data, they were initially more difficult to establish, although recent technological advances suggest that near completion can be reached within a few years for highly reliable, comprehensive protein-protein interaction and gene regulatory network maps for human (Venkatesan et al., 2009).

Figure 2. Networks in Cellular Systems
To date, cellular networks are most available for the “super-model” organisms (Davis, 2004) yeast, worm, fly, and plant. High-throughput interactome mapping relies upon genome-scale resources such as ORFeome resources. Several types of interactome networks discussed are depicted. In a protein interaction network, nodes represent proteins and edges represent physical interactions. In a transcriptional regulatory network, nodes represent transcription factors (circular nodes) or putative DNA regulatory elements (diamond nodes), and edges represent physical binding between the two. In a disease network, nodes represent diseases, and edges represent genes, mutations of which are associated with the linked diseases. In a virus-host network, nodes represent viral proteins (square nodes) or host proteins (round nodes), and edges represent physical interactions between the two. In a metabolic network, nodes represent enzymes, and edges represent metabolites that are products or substrates of the enzymes. The network depictions seem dense, but they represent only small portions of available interactome network maps, which themselves constitute only a few percent of the complete interactomes within cells.

The mapping and analysis of interactome networks for model organisms was instrumental in getting to this point. Such efforts provided, and will continue to provide, both necessary pioneering technologies and crucial conceptual insights. As with other aspects of biology, advancements in mapping of interactome networks would have been minimal without a focus on model organisms (Davis, 2004).

The field of interactome mapping has been helped by developments in several model organisms, primarily the yeast, Saccharomyces cerevisiae, the fly, Drosophila melanogaster, and the worm, Caenorhabditis elegans (Figure 2). For instance, genome-wide resources such as collections of all, or nearly all, open reading frames (ORFeomes) were first generated for these model organisms, both because their genomes are the best annotated and because there are fewer complications, such as the high number of splice variants in human and other mammals. ORFeome resources allow efficient transfer of large numbers of ORFs into vectors suitable for diverse interactome mapping technologies (Hartley et al., 2000; Walhout et al., 2000b). Moreover, gene ablation technologies, knockouts (for yeast) and knockdowns by RNAi (for worms and flies) and transposon insertions (for plants),
were discovered in and are being applied genome-wide for these model organisms (Mohr et al., 2010).

Metabolic Networks

Metabolic network maps attempt to comprehensively describe all possible biochemical reactions for a particular cell or organism (Schuster et al., 2000; Edwards et al., 2001). In many representations of metabolic networks, nodes are biochemical metabolites and edges are either the reactions that convert one metabolite into another or the enzymes that catalyze these reactions (Jeong et al., 2000; Schuster et al., 2000) (Figure 2). Edges can be directed or undirected, depending on whether a given reaction is reversible or not. In specific cases of metabolic network modeling, the converse situation can be used, with nodes representing enzymes and edges pointing to adjacent pairs of enzymes for which the product of one is the substrate of the other (Lee et al., 2008).

Although large metabolic pathway charts have existed for decades (Kanehisa et al., 2008), nearly complete metabolic network maps required the completion of full genome sequencing together with accurate gene annotation tools (Oberhardt et al., 2009). Network construction is manual with computational assistance, involving: (1) the meticulous curation of large numbers of publications, each describing experimental results regarding one or several metabolic reactions characterized from purified or reconstituted enzymes, and (2) when necessary, the compilation of predicted reactions from studies of orthologous enzymes experimentally characterized in other species. Assembly of the union of all experimentally demonstrated and predicted reactions gives rise to proteome-scale network maps (Mo and Palsson, 2009). Such maps have been compiled for numerous species, predominantly prokaryotes
and unicellular eukaryotes (Oberhardt et al., 2009), and full-scale metabolic reconstructions are now underway for human as well (Ma et al., 2007). Metabolic network maps are likely the most comprehensive of all biological networks, although considerable gaps will remain to be filled in by direct experimental investigations.

Protein-Protein Interaction Networks

In protein-protein interaction network maps, nodes represent proteins and edges represent a physical interaction between two proteins. The edges are nondirected, as it cannot be said which protein binds the other, that is, which partner functionally influences the other (Figure 2). Of the many methodologies that can map protein-protein interactions, two are currently in wide use for large-scale mapping. Mapping of binary interactions is primarily carried out by ever-improving variations of the yeast two-hybrid (Y2H) system (Fields and Song, 1989; Dreze et al., 2010). Mapping of membership in protein complexes, providing indirect associations between proteins, is carried out by affinity- or immunopurification to isolate protein complexes, followed by some form of mass spectrometry (AP/MS) to identify protein constituents of these complexes (Rigaut et al., 1999; Charbonnier et al., 2008). While Y2H datasets contain mostly direct binary interactions, AP/MS cocomplex datasets are composed of direct interactions mixed with a preponderance of indirect associations. Accordingly, the graphs generated by these two approaches exhibit different global properties (Seebacher and Gavin, 2011), such as the relationships between gene essentiality and the number of interacting proteins (Yu et al., 2008). In the past decade, significant steps have been taken toward the generation of comprehensive protein-protein interaction network maps. Comprehensive efforts using Y2H technologies to generate interactome maps began with the model organisms S. cerevisiae, C. elegans, and D.
melanogaster (Ito et al., 2000, 2001; Uetz et al., 2000; Walhout et al., 2000a; Giot et al., 2003; Reboul et al., 2003; Li et al., 2004), and eventually included human (Colland et al., 2004; Rual et al., 2005; Stelzl et al., 2005; Venkatesan et al., 2009). Comprehensive mapping of cocomplex membership by high-throughput AP/MS was initially undertaken in yeast (Gavin et al., 2002; Ho et al., 2002), rapidly progressing to ever-improving completeness and quality thereafter (Gavin et al., 2006; Krogan et al., 2006). For technical reasons, future comprehensive AP/MS efforts will stay focused on unicellular organisms such as yeast (Collins et al., 2007) and mycoplasma (Kuhner et al., 2009), whereas Y2H efforts are more readily implemented for complex multicellular organisms (Seebacher and Gavin, 2011). In their early implementations, systematic and comprehensive interaction network mapping efforts met with skepticism regarding their accuracy (von Mering et al., 2002), analogous to the original concerns over whether automated high-throughput genome sequencing efforts might have considerably lower accuracy than dedicated efforts carried out cumulatively in many laboratories. Only after the emergence of rigorous statistical tests to estimate sequencing accuracy could high-throughput sequencing efforts reach their full potential (Ewing et al., 1998). Analogously, an empirical framework recently propagated for protein interaction mapping (Venkatesan et al., 2009)
now allows the estimation of overall accuracy and sensitivity for maps obtained using high-throughput mapping approaches. Four critical parameters need to be estimated: (1) completeness: the number of physical protein pairs actually tested in a given search space; (2) assay sensitivity: which interactions can and cannot be detected by a particular assay; (3) sampling sensitivity: the fraction of all detectable interactions found by a single implementation of any interaction assay; and (4) precision: the proportion of true biophysical interactors. Careful consideration of these parameters offers a quantitative idea of the completeness and accuracy of a particular high-throughput interaction map (Yu et al., 2008; Simonis et al., 2009; Venkatesan et al., 2009), and allows comparison of multiple maps as long as standardized framework parameters are used. In contrast, comparing the results of small-scale experiments available in literature-curated databases is not possible, as there is little way to control for accuracy, reproducibility, and sensitivity. The binary interactome empirical framework offers a way to estimate the size of interactome networks, which in turn is essential to define a roadmap to reach completion for the interactome mapping efforts of any species of interest. While originally established for protein-protein interaction mapping, similar empirical frameworks can be applied more broadly to mapping of other types of interactome networks (Costanzo et al., 2010).

Gene Regulatory Networks

In most gene regulatory network maps, nodes are either a transcription factor or a putative DNA regulatory element, and directed edges represent the physical binding of transcription factors to such regulatory elements (Zhu et al., 2009). Edges can be said to be incoming (transcription factor binds a regulatory DNA element) or outgoing (regulatory DNA element bound by a transcription factor) (Figure 2).
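As a minimal sketch of the representation just described, a gene regulatory network can be stored as a list of directed edges running from each transcription factor to the regulatory elements it binds. The factor and element names below are hypothetical placeholders, not data from any of the cited maps, and the helper `degree_tables` is introduced here only for illustration.

```python
from collections import defaultdict

# Hypothetical directed edges: (transcription factor, bound regulatory element).
edges = [
    ("TF_A", "promoter_1"),
    ("TF_A", "promoter_2"),
    ("TF_A", "promoter_3"),
    ("TF_B", "promoter_2"),
]

def degree_tables(edge_list):
    """Return (out_degree, in_degree) dictionaries for a directed edge list."""
    out_degree = defaultdict(int)  # number of elements bound by each factor
    in_degree = defaultdict(int)   # number of factors binding each element
    for factor, element in set(edge_list):  # duplicates count only once
        out_degree[factor] += 1
        in_degree[element] += 1
    return dict(out_degree), dict(in_degree)

out_degree, in_degree = degree_tables(edges)
print("out-degree per factor:", out_degree)
print("in-degree per element:", in_degree)
```

Keeping the two degree tables separate matters because, in directed networks of this kind, the outgoing and incoming degrees describe different biological quantities.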
Currently, two general approaches are amenable to large-scale mapping of gene regulatory networks. In yeast one-hybrid (Y1H) approaches, a putative cis-regulatory DNA sequence, commonly a suspected promoter region, is used as bait to capture transcription factors that bind to that sequence (Deplancke et al., 2004). In chromatin immunoprecipitation (ChIP) approaches, antibodies raised against transcription factors of interest, or against a peptide tag used in fusion with potential transcription factors, are used to immunoprecipitate potentially interacting cross-linked DNA fragments (Lee et al., 2002). As Y1H proceeds from genes and captures associated proteins, it is said to be “gene-centric,” whereas ChIP strategies are “protein-centric” in that they proceed from transcription factors and attempt to capture associated gene regions (Walhout, 2006). The two approaches are complementary. The Y1H system can discover novel transcription factors but relies on having known, or at least suspected, regulatory regions; ChIP methods can discover novel regulatory motifs but rely on the availability of reagents specific to transcription factors of interest, which themselves depend on accurate predictions of transcription factors (Reece-Hoyes et al., 2005; Vaquerizas et al., 2009). Large-scale Y1H networks have been produced for C. elegans (Vermeirssen et al., 2007; Grove et al., 2009). Large-scale ChIP-based networks have been produced for yeast (Lee et al., 2002) and have been carried out for mammalian tissue culture cells as well (Cawley et al., 2004).
Figure 3. Integrated Networks Coexpression and phenotypic profiling can be thought of as matrices comprising all genes of an organism against all conditions that this organism has been exposed to within a given expression compendium and all phenotypes tested, respectively. For any correlation measurement, Pearson correlation coefficients (PCCs) being commonly used, the threshold between what is considered coexpressed and noncoexpressed needs to be set using appropriate titration procedures. Pairs of genes whose expression or phenotype profiles are above the determined threshold are then linked. The resulting integrated networks have powerful predictive value. Adapted from (Gunsalus et al., 2005).
In addition to transcription factor activities, overall gene transcript levels are also regulated post-transcriptionally by microRNAs (miRNAs), short noncoding RNAs that bind to complementary cis-regulatory RNA sequences usually located in 3′ untranslated regions (UTRs) of target mRNAs (Lee et al., 2004; Ruvkun et al., 2004). miRNAs are not expected to act as master regulators, but rather act post-transcriptionally to fine-tune gene expression by modulating the levels of target mRNAs. Complex networks are formed by miRNAs interacting with their targets. In such networks, nodes are either a miRNA or a target 3′ UTR, and edges represent the complementary annealing of the miRNA to the target RNA. Edges can be said to be incoming (miRNA binds a 3′ UTR element) or outgoing (3′ UTR element bound by a miRNA) (Martinez et al., 2008). The targets of miRNAs are generally computationally predicted, as experimental methodologies to map miRNA/3′ UTR interactions at high-throughput are just coming online (Karginov et al., 2007; Guo et al., 2010; Hafner et al., 2010). Since transcription factors regulate the expression of miRNAs, it is, however, possible to combine Y1H methods with computationally predicted miRNA/3′ UTR interactions, a strategy which was used to derive a large-scale miRNA network in C. elegans (Martinez et al., 2008) and which could be extended to other genomes.

Integrating Interactome Networks with other Cellular Networks

The three interactome network types considered so far, metabolic, protein-protein interaction, and gene regulatory networks, are composed of physical or biochemical interactions between macromolecules. The corresponding network maps provide crucial “scaffold” information about cellular systems, on top of which additional layers of functional links can be added to fine-tune the representation of biological reality (Figure 3) (Vidal, 2001).
Networks composed of functional links, although strikingly different in terms of what the edges represent, can nevertheless complement what can be learned from interactome network maps in powerful ways, and vice versa. Networks of functional links represent a category of cellular networks that can be derived from indirect, or “conceptual,” interactions where links between genes and gene products are reported
based upon functional relationships or similarities, independently of physical macromolecular interactions. We consider three types of functional networks that have been mapped thus far at the scale of whole genomes and used together with physical interactome networks to interrogate the complexities of genotype-to-phenotype relationships.

Transcriptional Profiling Networks

Gene products that function together in common signaling cascades or protein complexes are expected to show greater similarities in their expression patterns than random sets of gene products. But how does this expectation translate at the level of whole proteomes and transcriptomes? How do transcriptome states correlate globally with interactome networks? Since the original description of microarray and DNA chip techniques and, more recently, de novo RNA sequencing using next-generation sequencing technologies, vast compendiums of gene expression datasets have been generated for many different species across a multitude of diverse genetic and environmental conditions. This type of information can be thought of as matrices comprising all genes of an organism against all conditions that this organism has been exposed to within a given expression compendium (Vidal, 2001). In the resulting coexpression networks, nodes represent genes, and edges link pairs of genes that show correlated coexpression above a set threshold (Kim et al., 2001; Stuart et al., 2003). For any correlation measurement, Pearson correlation coefficients (PCCs) being commonly used, the threshold between what is considered coexpressed and not coexpressed needs to be set using appropriate titration procedures (Stuart et al., 2003; Gunsalus et al., 2005).
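The construction of a coexpression network from an expression matrix can be sketched as follows. This is only an illustration under stated assumptions: the gene names and expression values are invented, the threshold of 0.9 is arbitrary rather than the result of any titration procedure, and the functions `pearson` and `coexpression_edges` are hypothetical helpers written for this example.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def coexpression_edges(profiles, threshold):
    """Link every pair of genes whose PCC is at or above the threshold."""
    edges = []
    for gene_1, gene_2 in combinations(sorted(profiles), 2):
        if pearson(profiles[gene_1], profiles[gene_2]) >= threshold:
            edges.append((gene_1, gene_2))
    return edges

# Invented expression matrix: one row per gene, one column per condition.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0],
    "gene_c": [4.0, 1.0, 3.5, 0.5],
}

edges = coexpression_edges(profiles, threshold=0.9)
print(edges)
```

In practice the threshold would be chosen by the kind of titration the text describes, for example by scanning a range of values, rather than fixed in advance.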
Integration attempts in yeast, combining physical protein-protein interaction maps with coexpression profiles, revealed that interacting proteins are more likely to be encoded by genes with similar expression profiles than noninteracting proteins (Ge et al., 2001; Grigoriev, 2001; Jansen et al., 2002; Kemmeren et al., 2002). These observations were subsequently confirmed in many other organisms (Ge et al., 2003). Beyond the fundamental aspect of finding significant overlaps between interaction edges in interactome networks and coexpression edges in transcription profiling networks, these observations have been used to estimate the overall biological significance of interactome datasets. While correlations can be statistically significant over
huge datasets, many valid, biologically relevant protein-protein interactions still correspond to pairs of genes whose expression is uncorrelated or even anticorrelated. Coexpression similarity links need not be perfectly overlapping with physical interactions of the corresponding gene products and vice versa. In another example of what coexpression networks can be used for, preliminary steps have been taken to delineate gene regulatory networks from coexpression profiles (Segal et al., 2003; Amit et al., 2009). Such network constructions provide verifiable hypotheses about how regulatory pathways operate.

Phenotypic Profiling Networks

Perturbations of genes that encode functionally related products often confer similar phenotypes. Systematic use of gene knockout strategies developed for yeast (Giaever et al., 2002) and knock-down approaches using RNA interference (RNAi) for C. elegans, Drosophila, and, recently, human (Mohr et al., 2010), are amenable to the perturbation of (nearly) all genes and subsequent testing of a wide variety of standardized phenotypes. As with transcriptional profiling networks, this type of information can be thought of as matrices comprising all genes of an organism and all phenotypes tested within a given phenotypic profiling compendium. In the resulting phenotypic similarity or “phenome” network, nodes represent genes, and edges link pairs of genes that show correlated phenotypic profiles above a set threshold. Here again, titration is needed to decide on the threshold between what is considered phenotypically similar and what is not (Gunsalus et al., 2005). The earliest evidence that phenotypic profiling or “phenome” networks might help in interpreting protein-protein interactome networks was obtained in studies of the C. elegans DNA damage response and hermaphrodite germline (Boulton et al., 2002; Piano et al., 2002; Walhout et al., 2002).
The physical basis of phenome networks is not yet completely defined, though there are strong overlaps between correlated phenotypic profiles and physical protein-protein interactions (Walhout et al., 2002; Gunsalus et al., 2005). Overlapping three network types, binary interactions, coexpression, and phenotype profiling, produces integrated networks with high predictive power, as demonstrated for C. elegans early embryogenesis (Walhout et al., 2002; Gunsalus et al., 2005). Integration of transcriptional regulatory networks with these other network types has also been undertaken in worm (Grove et al., 2009). Comprehensive genome-wide phenome networks are now a reality for the yeast S. cerevisiae (Giaever et al., 2002), and are expected to be further developed for C. elegans (Sönnichsen et al., 2005) and Drosophila (Mohr et al., 2010). Now that RNAi reagents are available for nearly all genes of mouse and human (Root et al., 2006), phenome maps for cell lines of these organisms should soon follow.

Genetic Interaction Networks

Pairs of functionally related genes tend to exhibit genetic interactions, defined by comparing the phenotype conferred by mutations in pairs of genes (double mutants) to the phenotype conferred by either one of these mutations alone (single mutants). Genetic interactions are classified as negative, i.e. aggravating, synthetic sick or lethal, when the phenotype of double mutants is significantly worse than expected from that of single mutants, and positive, i.e. alleviating or suppressive,
when the phenotype of double mutants is significantly better than that expected from the single mutants (Mani et al., 2008). Though finding genetic interactions has been crucial to geneticists for decades (Sturtevant, 1956; Novick et al., 1989), only in the last ten years has functional genomics advanced sufficiently to allow systematic high-throughput mapping of genetic interactions to give rise to large-scale networks (Boone et al., 2007). Two general strategies have been followed for the systematic mapping of genetic interactions in yeast. Synthetic genetic arrays (SGA) and derivative methodologies use high-density arrays of double mutants by mating pairs from an available set of single mutants (Tong et al., 2001; Boone et al., 2007). Alternative strategies take advantage of sequence barcodes embedded in a set of yeast deletion mutants (Giaever et al., 2002; Beltrao et al., 2010) to measure the relative growth rate in a population of double mutants by hybridization to anti-barcode microarrays (Pan et al., 2004; Boone et al., 2007). These two approaches seem to capture similar aspects of genetic interactions, as the overlap between the two types of datasets is significant (Costanzo et al., 2010). Patterns of genetic interactions can be used to define a kind of network that is similar to phenotypic profiling or phenome networks. As with transcriptional and phenotypic profiling networks, this type of information can be thought of as matrices comprising all genes of an organism and the genes with which they exhibit a genetic interaction. In such “genetic interaction profiling” networks, edges functionally link two genes based on high similarities of genetic interaction profiles. Here again, predictive models of biological processes can be obtained when such networks are combined with other types of interactome networks. Integration of genetic interaction networks with other types of interactome network maps provides potentially powerful models.
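The negative/positive classification of genetic interactions defined earlier in this section can be illustrated with a short sketch. The text does not specify how the expected double-mutant phenotype is computed; the code below assumes a multiplicative model of single-mutant fitness, which is one common choice, and uses a fixed tolerance as a stand-in for a proper statistical significance test. The function name and all numbers are hypothetical.

```python
def classify_genetic_interaction(fitness_a, fitness_b, fitness_ab, tolerance=0.1):
    """Classify a gene pair from single- and double-mutant fitness values.

    Fitness is expressed relative to wild type (1.0 = wild-type growth).
    ASSUMPTION: the expected double-mutant fitness is the product of the
    single-mutant values; `tolerance` replaces a real significance test.
    """
    expected = fitness_a * fitness_b
    deviation = fitness_ab - expected
    if deviation < -tolerance:
        return "negative"  # aggravating, synthetic sick or lethal
    if deviation > tolerance:
        return "positive"  # alleviating or suppressive
    return "none"

# Invented measurements for three gene pairs.
print(classify_genetic_interaction(0.9, 0.8, 0.2))  # double mutant far worse than expected
print(classify_genetic_interaction(0.5, 0.5, 0.5))  # double mutant no worse than either single
print(classify_genetic_interaction(0.9, 0.8, 0.7))  # close to the expected 0.72
```

Applying such a rule to every gene pair yields the gene-by-gene matrix from which genetic interaction profiles are compared.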
While genetic interactions do not necessarily correspond to physical interactions between the corresponding gene products (Mani et al., 2008), interesting patterns emerge between the different datasets. Because they tend to reveal pairs of genes involved in parallel pathways or in different molecular machines, negative genetic interactions tend not to correlate with either protein associations in protein complexes or with binary protein-protein interactions (Beltrao et al., 2010; Costanzo et al., 2010). In contrast, positive genetic interactions tend to point to pairs of gene products physically associated with each other. This trend is usually explained by loss of either one or two gene products working together in a molecular complex resulting in similar effects (Beltrao et al., 2010).

Graph Properties of Networks

A critical realization over the past decade is that the structure and evolution of networks appearing in natural, technological, and social systems over time follows a series of basic and reproducible organizing principles. Theoretical advances in network science (Albert and Barabasi, 2002), paralleling advances in high-throughput efforts to map biological networks, have provided a conceptual framework with which to interpret large interactome network maps. Full understanding of the internal organization of a cell requires awareness of the constraints and laws that biological networks follow. We summarize several
principles of network theory that have immediate applications to biology.

Degree Distribution and Hubs

Any empirical investigation starts with the same question: could the investigated phenomena have emerged by chance, or could random effects account for them? The earliest network models assumed that complex networks are wired randomly, such that any two nodes are connected by a link with the same probability p. This Erdős-Rényi model generates a network with a Poisson degree distribution, which implies that most nodes have approximately the same degree, that is, the same number of links, while nodes that have significantly more or fewer links than any average node are exceedingly rare or altogether absent. In contrast, many real networks, from the World Wide Web to social networks, are scale-free (Barabási and Albert, 1999), which means that their degree distribution follows a power law rather than the expected Poisson distribution. In a scale-free network most nodes have only a few interactions, and these coexist with a few highly connected nodes, the hubs, that hold the whole network together. This scale-free property has been found in all organisms for which protein-protein interaction and metabolic network maps exist, from yeast to human (Barabási and Oltvai, 2004; Seebacher and Gavin, 2011). Regulatory networks, however, show a mixed behavior. The outgoing degree distribution, corresponding to how many different genes a transcription factor can regulate, is scale-free, meaning that some master regulators can regulate hundreds of genes. In contrast, the incoming degree distribution, corresponding to how many transcription factors regulate a specific gene, best fits an exponential model (Deplancke et al., 2006), indicating that genes that are simultaneously regulated by large numbers of transcription factors are exponentially rare.
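The contrast drawn above can be made concrete with a small simulation of the random model. The sketch below wires a network with the same probability p for every pair of nodes and compares the observed degree frequencies with the Poisson expectation; the network size, probability, and seed are arbitrary choices made for illustration, and the function names are hypothetical.

```python
import random
from collections import Counter
from math import exp, factorial

def erdos_renyi_degrees(n, p, seed=0):
    """Degrees of a random graph in which each node pair is linked with probability p."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same graph
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

def poisson_pmf(k, mean):
    """Poisson probability of observing exactly k links."""
    return exp(-mean) * mean ** k / factorial(k)

n, p = 500, 0.01
degrees = erdos_renyi_degrees(n, p)
observed = Counter(degrees)
mean_degree = p * (n - 1)  # expected number of links per node

# The exact distribution is binomial; Poisson is its large-n approximation.
for k in range(0, 13, 2):
    print(k, round(observed.get(k, 0) / n, 3), round(poisson_pmf(k, mean_degree), 3))
print("largest degree:", max(degrees), "mean degree:", sum(degrees) / n)
```

In a network wired this way the largest degree stays close to the mean, which is why the presence of hubs in measured interactome maps is taken as evidence against purely random wiring.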
Gene Duplication as the Origin of the Scale-Free Property

The scale-free topology of biological networks likely originates from gene duplication. While the principle applies from metabolic to regulatory networks, it is best illustrated in protein-protein interaction networks, where it was first proposed (Pastor-Satorras et al., 2003; Vázquez et al., 2003). When cells divide and the genome replicates, occasionally an extra copy of one or several genes or chromosomes gets produced. Immediately following a duplication event, both the original protein and the new extra copy have the same structure, so both interact with the same set of partners. Consequently, each of the protein partners that interacted with the ancestor gains a new interaction. This process results in a “rich-get-richer” phenomenon (Barabási and Albert, 1999), where proteins with a large number of interactions tend to gain links more often, as it is more likely that they interact with a duplicated protein. This mechanism has been shown to generate hubs (Pastor-Satorras et al., 2003; Vázquez et al., 2003), and so could be responsible for the scale-free property of protein-protein interaction networks.

The Role of Hubs

Network biology attempts to identify global properties in interactome network graphs, and subsequently relate such properties to biological reality by integrating various functional datasets. One of the best examples where this approach was successful is in defining the role of hubs. In the model organisms S. cerevisiae
and C. elegans, hub proteins were found to: (1) correspond to essential genes (Jeong et al., 2001), (2) be older and have evolved more slowly (Fraser et al., 2002), (3) have a tendency to be more abundant (Ivanic et al., 2009), and (4) have a larger diversity of phenotypic outcomes resulting from their deletion compared to the deletion of less connected proteins (Yu et al., 2008). While the evidence attributed to some of these findings has been debated (Jordan et al., 2003; Yu et al., 2008; Ivanic et al., 2009), the special role of hub proteins in model organisms led to the expectation that, in humans, hubs should preferentially be encoded by disease-related genes. Indeed, upregulated genes in lung squamous cell carcinoma tissues tend to have a high degree in protein-protein interaction networks (Wachi et al., 2005), and cancer-related proteins have, on average, twice as many interaction partners as noncancer proteins in protein-protein interaction networks (Jonsson and Bates, 2006). A cautionary note is necessary: since disease-related proteins tend to be more avidly studied, their higher connectivity may be partly rooted in investigative biases. Therefore, this type of finding needs to be appropriately controlled using systematic proteome-wide interactome network maps. Understanding the role of hubs in human disease requires distinguishing between essential genes and disease-related genes (Goh et al., 2007). Some human genes are essential for early development, such that mutations in them often lead to spontaneous abortions. The protein products of mouse in utero essential genes show a strong tendency to be associated with hubs and to be expressed in multiple tissues (Goh et al., 2007). Nonessential disease genes tend not to encode hubs and tend to be tissue specific. These differences can be best appreciated from an evolutionary perspective.
Mutations that disrupt hubs may have difficulty propagating in the population, as the host may not survive long enough to have offspring. Only mutations that impair functionally and topologically peripheral genes can persist, becoming responsible for heritable diseases, particularly those that manifest in adulthood.

Date and Party Hubs

Another success in uncovering the functional consequences of the topology of interactome networks was provided by the discovery of date and party hubs (Han et al., 2004). Upon integrating protein-protein interaction network data with transcriptional profiling networks for yeast, at least two classes of hubs can be discriminated. Party hubs are highly coexpressed with their interacting partners while date hubs appear to be more dynamically regulated relative to their partners (Han et al., 2004). In other words, date hubs interact with their partners at different times and/or different conditions, whereas party hubs seem to interact with their partners at all times or conditions tested (Seebacher and Gavin, 2011). Despite the preponderance of evidence in its favor, the date and party hubs concept remains a subject of debate (Agarwal et al., 2010), attributable to the necessity to appropriately calibrate coexpression and protein-protein interaction hub thresholds when analyzing new transcriptome and interactome datasets (Bertin et al., 2007). Fundamentally, date hubs preferentially connect functional modules to each other, whereas party hubs preferentially act inside functional modules, hence they are occasionally called inter-module and intra-module hubs, respectively (Han et al.,
2004; Taylor et al., 2009). Date hubs are less evolutionarily constrained than party hubs (Fraser, 2005; Ekman et al., 2006; Bertin et al., 2007). Party hubs contain fewer and shorter regions of intrinsic disorder than do date hubs (Ekman et al., 2006; Singh et al., 2006; Kahali et al., 2009) and contain fewer linear motifs (short binding motifs and post-translational modification sites) than do date hubs (Taylor et al., 2009). Initially explored in a yeast interactome (Han et al., 2004; Ekman et al., 2006), the distinction between date and party hubs can be recapitulated in human interactomes as well (Taylor et al., 2009).

Motifs

There has been considerable attention paid in recent years to network motifs, which are characteristic network patterns, or subgraphs, in biological networks that appear more frequently than expected given the degree distribution of the network (Milo et al., 2002). Such subgraphs have been found to be associated with desirable (or undesirable) biological function (or dysfunction). Hence identification and classification of motifs can offer information about the various network subgraphs needed for biological processes. It is now commonly understood that motifs constitute the basic building blocks of cellular networks (Milo et al., 2002; Yeger-Lotem et al., 2004). Originally identified in transcriptional regulatory networks of several model organisms (Milo et al., 2002; Shen-Orr et al., 2002), motifs have been subsequently identified in interactome networks and in integrated composite networks (Yeger-Lotem et al., 2004; Zhang et al., 2005). Different types of networks exhibit different motif profiles, suggesting a means for network classification (Milo et al., 2004; Zhang et al., 2005).
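Counting occurrences of a given subgraph is the first step in any motif analysis. The sketch below counts one three-node pattern widely discussed in the motif literature, the feed-forward loop, in which X regulates Y and both X and Y regulate Z. It is an illustration only: the edge list is invented, the function name is hypothetical, and the code does not perform the comparison against degree-preserving randomized networks that is needed to decide whether a pattern is over-represented.

```python
def count_feed_forward_loops(edges):
    """Count triples (x, y, z) with directed edges x->y, y->z, and x->z."""
    targets = {}
    for source, target in edges:
        if source != target:  # self-loops are ignored
            targets.setdefault(source, set()).add(target)
    count = 0
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                # z must be a third node that x also regulates directly
                if z != x and z in x_targets:
                    count += 1
    return count

# Invented regulatory edges: one feed-forward loop plus an unrelated edge.
regulatory_edges = [
    ("X", "Y"),
    ("Y", "Z"),
    ("X", "Z"),
    ("Z", "W"),
]
print(count_feed_forward_loops(regulatory_edges))
```

A full motif analysis would repeat the count on many randomized networks with the same degree sequence and compare the observed value with that null distribution.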
The high degree of evolutionary conservation of motif constituents within interaction networks (Wuchty et al., 2003), combined with the convergent evolution that is seen in the transcription regulatory networks of diverse species toward the same motif types (Barabási and Oltvai, 2004), makes a strong argument that motifs are of direct biological relevance. Classification of several highly significant motifs of two, three, and four nodes, with descriptors like coherent feed forward loop or single-input module, has shown that specific types of motifs carry out specific dynamic functions within cells (Alon, 2007; Shoval and Alon, 2010).

Topological, Functional, and Disease Modules

Most biological networks have a rather uneven organization. Many nodes are part of locally dense neighborhoods, or topological modules, where nodes have a higher tendency to link to nodes within the same local neighborhood than to nodes outside of it (Ravasz et al., 2002). A region of the global network diagram that corresponds to a potential topological module can be identified by network clustering algorithms which are blind to the function of individual nodes. These topological modules are often believed to carry specific cellular functions, hence leading to the concept of a functional module, an aggregation of nodes of similar or related function in the same network neighborhood. Interest is increasing in disease modules, which represent groups of network components whose disruption results in a particular disease phenotype in humans (Barabási et al., 2010). There is a tacit assumption, based on evidence in the biological literature, that cellular components forming topological modules have closely related functions, thus corresponding to functional modules. New potentially powerful methods to identify topological and functional clusterings continue to be described (Ahn et al., 2010). Such modules can serve as hypothesis building tools to identify regions of the interactome likely involved in particular cellular functions or disease (Barabási et al., 2010).

Networks and Human Diseases

Having reviewed why biological networks are important to consider, how they can be mapped and integrated with each other, and what global properties are starting to emerge from such models, we next return to our original question: to what extent do biological systems and cellular networks underlie genotype-phenotype relationships in human disease? We attempt to provide answers by covering four recent advances in network biology: (1) studies of global relationships between human disorders, associated genes and interactome networks, (2) predictions of new human disease-associated genes using interactome models, (3) analyses of network perturbations by pathogens, and (4) emergence of node removal versus edge-specific or “edgetic” models to explain genotype-phenotype relationships.

Global Disease Networks

One of the main predictions derived from the hypothesis that human disorders should be viewed as perturbations of highly interlinked cellular networks is that diseases should not be independent from each other, but should instead be themselves highly interconnected. Such potential cellular network-based dependencies between human diseases have led to the generation of various global disease network maps, which link disease phenotypes together if some molecular or phenotypic relationships exist between them. Such a map was built using known gene-disease associations collected in the OMIM database (Goh et al., 2007), where nodes are diseases and two diseases are linked by an edge if they share at least one common gene, mutations in which are associated with these diseases.
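The construction rule just stated, linking two diseases whenever they share at least one associated gene, can be sketched in a few lines. The disease and gene identifiers below are hypothetical placeholders rather than OMIM entries, and both function names are introduced only for this illustration.

```python
from collections import defaultdict
from itertools import combinations

def disease_network(associations):
    """Build an undirected disease network from (disease, gene) pairs.

    Two diseases are neighbors if at least one gene is associated with both.
    """
    diseases_by_gene = defaultdict(set)
    for disease, gene in associations:
        diseases_by_gene[gene].add(disease)
    # Every disease is a node, even if it ends up with no neighbors.
    neighbors = {disease: set() for disease, _ in associations}
    for diseases in diseases_by_gene.values():
        for d1, d2 in combinations(sorted(diseases), 2):
            neighbors[d1].add(d2)
            neighbors[d2].add(d1)
    return neighbors

def largest_component(neighbors):
    """Return the set of nodes in the largest connected component."""
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            for other in neighbors[node]:
                if other not in component:
                    component.add(other)
                    stack.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical gene-disease associations.
associations = [
    ("disease_1", "GENE_X"),
    ("disease_2", "GENE_X"),
    ("disease_2", "GENE_Y"),
    ("disease_3", "GENE_Y"),
    ("disease_4", "GENE_Z"),
]
network = disease_network(associations)
print(sorted(largest_component(network)))
```

Swapping the roles of diseases and genes in `disease_network` gives the complementary network of disease-associated genes linked by shared disorders.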
In the obtained disease network more than 500 human genetic disorders belong to a single interconnected main giant component, consistent with the idea that human diseases are much more connected to each other than anticipated. The flipside of this representation of connectivity is a network of disease-associated genes linked together if mutations in these genes are known to be responsible for at least one common disorder. Providing support for our general hypothesis that perturbations in cellular networks underlie genotype-phenotype relationships, such disease-associated gene networks overlap significantly with human protein-protein interactome network maps (Goh et al., 2007). Additional types of connectivity between large numbers of human diseases can be found in “comorbidity” networks where diseases are linked to each other when individuals who were diagnosed for one particular disease are more likely to have also been diagnosed for the other (Rzhetsky et al., 2007; Hidalgo et al., 2009). Diabetes and obesity represent probably the best known disease pair with such significant comorbidity. While comorbidity can have multiple origins, ranging from environmental factors to treatment side effects, its potential molecular origin has attracted considerable attention. A network biology interpretation would suggest that the molecular defects responsible for
one of a pair of diseases can “spread along” the edges in cellular networks, affecting the activity of related gene products and causing or affecting the outcome of the other disease (Park et al., 2009).
Predicting Disease-Related Genes by Using Interactome Networks
If cellular networks underlie genotype-phenotype relationships, then network properties should be predictive of novel, yet to be identified human disease-associated genes. In an early example, it was shown that the products of a few dozen ataxia-associated genes occupy particular locations in the human interactome network, in that the number of edges separating them is on average much lower than for random sets of gene products (Lim et al., 2006). Physical protein-protein interactome network maps can indeed generate lists of genes potentially enriched for new candidate disease genes or modifier genes of known disease genes (Lim et al., 2006; Oti et al., 2006; Fraser and Plotkin, 2007). Integration of various interactome and functional relationship networks has also been applied to reveal genes potentially involved in cancer (Pujana et al., 2007). Integrating a coexpression network, seeded with four well-known breast cancer-associated genes, together with genetic and physical interactions, yielded a breast cancer network model out of which candidate cancer susceptibility and modifier genes could be predicted (Pujana et al., 2007). Integrative network modeling strategies are applicable to other types of cancer and other types of disease (Ergun et al., 2007; Wu et al., 2008; Lee et al., 2010).
Network Perturbations by Pathogens
Pathogens, particularly viruses, have evolved sophisticated mechanisms to perturb the intracellular networks of their hosts to their advantage. As obligate intracellular pathogens, viruses must intimately rewire cellular pathways to their own ends to maintain infectivity.
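Returning briefly to the ataxia observation in the section on predicting disease-related genes: the claim that a gene set is separated by fewer edges than random sets can be checked with a simple randomization test. The sketch below is an illustration under stated assumptions, not the analysis performed by Lim et al.: the function names are hypothetical, the network is a toy chain, unreachable pairs are simply ignored, and the random sets are not matched for degree, which a careful analysis would likely want.

```python
import random
from collections import deque
from itertools import combinations


def shortest_path_length(adjacency, source, target):
    """Breadth-first search; returns None when target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def mean_separation(adjacency, proteins):
    """Mean number of edges between all reachable pairs in a protein set."""
    lengths = [shortest_path_length(adjacency, a, b)
               for a, b in combinations(proteins, 2)]
    reachable = [length for length in lengths if length is not None]
    return sum(reachable) / len(reachable) if reachable else None


def randomization_test(adjacency, proteins, n_random=1000, seed=0):
    """Compare the observed mean separation with same-size random sets.

    Returns (observed, empirical p-value for 'as close or closer')."""
    observed = mean_separation(adjacency, proteins)
    if observed is None:
        raise ValueError("no reachable pairs in the query set")
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(nodes, len(proteins))
        value = mean_separation(adjacency, sample)
        if value is not None and value <= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


# Toy network (assumption): a chain p0 - p1 - ... - p9, not real interactome data.
toy_network = {f"p{i}": set() for i in range(10)}
for i in range(9):
    toy_network[f"p{i}"].add(f"p{i + 1}")
    toy_network[f"p{i + 1}"].add(f"p{i}")

observed, p_value = randomization_test(toy_network, ["p3", "p4", "p5"])
print(observed, p_value)
```

A small p-value here only says that the query set is unusually compact relative to uniformly sampled sets in the same map; it does not account for the study biases that affect literature-curated interaction data.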
Since many virus-host interactions happen at the level of physical protein-protein interactions, systematic maps capturing viral-host physical protein-protein interactions, or “virhostome” maps, have been obtained using Y2H for Epstein-Barr virus (Calderwood et al., 2007), hepatitis C virus (de Chassey et al., 2008), several herpesviruses (Uetz et al., 2006), influenza virus (Shapira et al., 2009), and others (Mendez-Rios and Uetz, 2010), and by co-AP/MS methodologies for HIV (Jäger et al., 2010). A central goal is to find perturbations in network properties of the host network, properties that would not be made evident by small-scale investigations focused on one or a handful of viral proteins. For instance, it has been found several times now that viral proteins preferentially target hubs in host interactome networks (Calderwood et al., 2007; Shapira et al., 2009). The many host targets identified in virhostome screens are now being biologically validated by RNAi knock-down and transcriptional profiling, leading to detailed maps of the interactions underlying viral-host relationships (Shapira et al., 2009). Another impetus for mapping virhostome networks is that virus protein interactions can act as surrogates for human genetic variations, inducing disease states by influencing local and global properties of cellular networks. The inspiration for this concept emerged from classical observations such as the binding of Adenovirus E1A, HPV E7, and SV40 Large T antigen to the human retinoblastoma protein, which is the product of a gene in which mutations lead to a predisposition to retinoblastoma and other types of cancers (DeCaprio, 2009). This hypothesis will soon be tested globally by systematic investigations of how host networks, including physical interaction, gene regulatory, and genetic interaction networks, are perturbed upon viral infection. Pathogen-host interaction mapping projects are also in their first iterations, with similar goals of identifying emergent global properties and disease surrogates. As microbial pathogens can have thousands of gene products relative to much smaller numbers for most viruses, such projects will require considerably more effort and time.
Edgetics
Our underlying premise throughout has been that phenotypic variations of an organism, particularly those that result in human disease, arise from perturbations of cellular interactome networks. These alterations range from the complete loss of a gene product, through the loss of some but not all interactions, to the specific perturbation of a single molecular interaction while retaining all others. In interactome networks these alterations range from node removal at one end to edge-specific, “edgetic” perturbations at the other (Zhong et al., 2009). The consequences on network structure and function are expected to be radically dissimilar for node removal versus edgetic perturbation. Node removal not only disables the function of a node but also disables all the interactions of that node with other nodes, disrupting in some way the function of all of the neighboring nodes. An edgetic disruption, removing one or a few interactions but leaving the rest intact and functioning, has subtler effects on the network, though not necessarily on the resulting phenotype (Madhani et al., 1997). The distinction between node removal and edgetic perturbation models can provide new clues on mechanisms underlying human disease, such as the different classes of mutations that lead to dominant versus recessive modes of inheritance (Zhong et al., 2009).
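The structural difference between the two perturbation models can be made concrete on a small graph. The following is a minimal sketch under stated assumptions: the network is a toy example with placeholder protein labels, and the helper names `remove_node`, `remove_edge`, and `edge_count` are hypothetical rather than taken from Zhong et al.

```python
def copy_network(adjacency):
    """Copy a {node: set_of_neighbors} mapping so the input stays untouched."""
    return {node: set(neighbors) for node, neighbors in adjacency.items()}


def remove_node(adjacency, node):
    """Node removal: the gene product and all of its interactions are lost."""
    perturbed = copy_network(adjacency)
    for neighbor in perturbed.pop(node):
        if neighbor != node:  # guard against a self-interaction
            perturbed[neighbor].discard(node)
    return perturbed


def remove_edge(adjacency, a, b):
    """Edgetic perturbation: one interaction is lost, the others remain."""
    perturbed = copy_network(adjacency)
    perturbed[a].discard(b)
    perturbed[b].discard(a)
    return perturbed


def edge_count(adjacency):
    """Number of undirected edges (assumes no self-interactions)."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


# Toy network (assumption): protein P interacts with A, B, and C; A also binds B.
toy_network = {
    "P": {"A", "B", "C"},
    "A": {"P", "B"},
    "B": {"P", "A"},
    "C": {"P"},
}

knockout = remove_node(toy_network, "P")
edgetic = remove_edge(toy_network, "P", "C")
print(edge_count(toy_network), edge_count(knockout), edge_count(edgetic))
print(sorted(edgetic["P"]), sorted(knockout["C"]))
```

In this toy case, removing node P eliminates three of the four edges and leaves C with no partners, whereas the edgetic perturbation of the P-C edge leaves P still connected to A and B. Which of the two has the stronger phenotypic consequence is a biological question that the graph operation alone does not answer.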
The idea that the disruption of specific protein interactions can lead to human disease (Schuster-Bockler and Bateman, 2008) complements canonical gene loss/perturbation models (Botstein and Risch, 2003), and is poised to explain confounding genetic phenomena such as genetic heterogeneity. Matching the edgetic hypothesis to inherited human diseases, approximately half of the 50,000 Mendelian alleles available in the Human Gene Mutation Database can be modeled as potentially edgetic if one considers deletions and truncating mutations as node removal, and in-frame point mutations leading to single amino-acid changes and small insertions and deletions as edgetic perturbations (Zhong et al., 2009). This number is probably a good approximation, since thus far disease-associated genes predicted to bear edgetic alleles using this model have been experimentally confirmed (Zhong et al., 2009). For genes associated with multiple disorders and for which predicted protein interaction domains are available, it was shown that putative edgetic alleles responsible for different disorders tend to be located in different interaction domains, consistent with different edgetic perturbations conferring strikingly different phenotypes.
Conclusion
A comprehensive catalog of sequence variations among the 7 billion human genomes present on earth might soon become
available. This information will continue to revolutionize biology in general and medicine in particular for many decades and perhaps centuries to come. The prospects of predictive and personalized medicine are enormous. However, it should be kept in mind that genome variations merely constitute variations in the parts list and often fail to provide a description of the mechanistic consequences on cellular functions. Here, we have summarized why considering perturbations of biological networks within cells is crucial to help interpret how genome variations relate to phenotypic differences. Given their high levels of complexity, it is no surprise that interactome networks have not yet been mapped completely. The data and models accumulated in the last decade point to clear directions for the next decade. We envision that with more interactome datasets of increasingly high quality, the trends reviewed here will be fine-tuned. The global properties observed so far and those yet to be uncovered should help “make sense” of the enormous body of information encompassed in the human genome.
ACKNOWLEDGMENTS
We thank David E. Hill, Matija Dreze, Anne-Ruxandra Carvunis, Benoit Charloteaux, Quan Zhong, Balaji Santhanam, Sam Pevzner, Song Yi, Nidhi Sahni, Jean Vandenhaute, and Roseann Vidal for careful reading of the manuscript. Interactome mapping efforts at CCSB have been supported mainly by National Institutes of Health grant R01-HG001715. M.V. is grateful to Nadia Rosenthal for the peaceful Suttonian environment. We apologize to those in the field whose important work was not cited here due to space limitations.
REFERENCES
Agarwal, S., Deane, C.M., Porter, M.A., and Jones, N.S. (2010). Revisiting date and party hubs: novel approaches to role assignment in protein interaction networks. PLoS Comput. Biol. 6, e1000817.
Ahn, Y.Y., Bagrow, J.P., and Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature 466, 761–764.
Albert, R., and Barabási, A.L. (2002). Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97.
Alon, U. (2007). Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450–461.
Altshuler, D., Daly, M.J., and Lander, E.S. (2008). Genetic mapping in human disease. Science 322, 881–888.
Amberger, J., Bocchini, C.A., Scott, A.F., and Hamosh, A. (2009). McKusick’s Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 37, D793–D796.
Amit, I., Garber, M., Chevrier, N., Leite, A.P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J.K., Li, W., Zuk, O., et al. (2009). Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257–263.
Barabási, A.L., and Albert, R. (1999). Emergence of scaling in random networks. Science 286, 509–512.
Barabási, A.L., Gulbahce, N., and Loscalzo, J. (2010). Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68.
Barabási, A.L., and Oltvai, Z.N. (2004). Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113.
Bartel, P.L., Roecklein, J.A., SenGupta, D., and Fields, S. (1996). A protein linkage map of Escherichia coli bacteriophage T7. Nat. Genet. 12, 72–77.
Beadle, G.W., and Tatum, E.L. (1941). Genetic control of biochemical reactions in Neurospora. Proc. Natl. Acad. Sci. USA 27, 499–506.
Beltrao, P., Cagney, G., and Krogan, N.J. (2010). Quantitative genetic interactions reveal biological modularity. Cell 141, 739–745.
Bertin, N., Simonis, N., Dupuy, D., Cusick, M.E., Han, J.D., Fraser, H.B., Roth, F.P., and Vidal, M. (2007). Confirmation of organized modularity in the yeast interactome. PLoS Biol. 5, e153.
Blake, W.J., Kaern, M., Cantor, C.R., and Collins, J.J. (2003). Noise in eukaryotic gene expression. Nature 422, 633–637.
Boone, C., Bussey, H., and Andrews, B.J. (2007). Exploring genetic interactions and networks with yeast. Nat. Rev. Genet. 8, 437–449.
Botstein, D., and Risch, N. (2003). Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. Suppl. 33, 228–237.
Botstein, D., White, R.L., Skolnick, M., and Davis, R.W. (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32, 314–331.
Boulton, S.J., Gartner, A., Reboul, J., Vaglio, P., Dyson, N., Hill, D.E., and Vidal, M. (2002). Combined functional genomic maps of the C. elegans DNA damage response. Science 295, 127–131.
Calderwood, M.A., Venkatesan, K., Xing, L., Chase, M.R., Vazquez, A., Holthaus, A.M., Ewence, A.E., Li, N., Hirozane-Kishikawa, T., Hill, D.E., et al. (2007). Epstein-Barr virus and virus human protein interaction maps. Proc. Natl. Acad. Sci. USA 104, 7606–7611.
Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J., et al. (2004). Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116, 499–509.
Charbonnier, S., Gallego, O., and Gavin, A.C. (2008). The social network of a cell: recent advances in interactome mapping. Biotechnol. Annu. Rev. 14, 1–28.
Colland, F., Jacq, X., Trouplin, V., Mougin, C., Groizeleau, C., Hamburger, A., Meil, A., Wojcik, J., Legrain, P., and Gauthier, J.M. (2004). Functional proteomics mapping of a human signaling pathway. Genome Res. 14, 1324–1332.
Collins, S.R., Kemmeren, P., Zhao, X.C., Greenblatt, J.F., Spencer, F., Holstege, F.C., Weissman, J.S., and Krogan, N.J. (2007). Towards a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics 6, 439–450.
Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh, J.L., Toufighi, K., Mostafavi, S., et al. (2010). The genetic landscape of a cell. Science 327, 425–431.
Cusick, M.E., Yu, H., Smolyar, A., Venkatesan, K., Carvunis, A.R., Simonis, N., Rual, J.F., Borick, H., Braun, P., Dreze, M., et al. (2009). Literature-curated protein interaction datasets. Nat. Methods 6, 39–46.
de Chassey, B., Navratil, V., Tafforeau, L., Hiet, M.S., Aublin-Gex, A., Agaugue, S., Meiffren, G., Pradezynski, F., Faria, B.F., Chantier, T., et al. (2008). Hepatitis C virus infection protein network. Mol. Syst. Biol. 4, 230.
Davis, R.H. (2004). The age of model organisms. Nat. Rev. Genet. 5, 69–76.
DeCaprio, J.A. (2009). How the Rb tumor suppressor structure and function was revealed by the study of Adenovirus and SV40. Virology 384, 274–284.
Deplancke, B., Dupuy, D., Vidal, M., and Walhout, A.J. (2004). A Gateway-compatible yeast one-hybrid system. Genome Res. 14, 2093–2101.
Deplancke, B., Mukhopadhyay, A., Ao, W., Elewa, A.M., Grove, C.A., Martinez, N.J., Sequerra, R., Doucette-Stamm, L., Reece-Hoyes, J.S., Hope, I.A., et al. (2006). A gene-centered C. elegans protein-DNA interaction network. Cell 125, 1193–1205.
Dreze, M., Monachello, D., Lurin, C., Cusick, M.E., Hill, D.E., Vidal, M., and Braun, P. (2010). High-quality binary interactome mapping. Methods Enzymol. 470, 281–315.
Edwards, J.S., Ibarra, R.U., and Palsson, B.O. (2001). In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol. 19, 125–130.
Ekman, D., Light, S., Bjorklund, A.K., and Elofsson, A. (2006). What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 7, R45.
Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. (2002). Stochastic gene expression in a single cell. Science 297, 1183–1186.
Ergun, A., Lawrence, C.A., Kohanski, M.A., Brennan, T.A., and Collins, J.J. (2007). A network biology approach to prostate cancer. Mol. Syst. Biol. 3, 82.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. (1998). Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
Fields, S., and Song, O. (1989). A novel genetic system to detect protein-protein interactions. Nature 340, 245–246.
Finley, R.L., Jr., and Brent, R. (1994). Interaction mating reveals binary and ternary connections between Drosophila cell cycle regulators. Proc. Natl. Acad. Sci. USA 91, 12980–12984.
Fraser, H.B. (2005). Modularity and evolutionary constraint on proteins. Nat. Genet. 37, 351–352.
Fraser, H.B., Hirsh, A.E., Steinmetz, L.M., Scharfe, C., and Feldman, M.W. (2002). Evolutionary rate in the protein interaction network. Science 296, 750–752.
Fraser, H.B., and Plotkin, J.B. (2007). Using protein complexes to predict phenotypic effects of gene mutation. Genome Biol. 8, R252.
Fromont-Racine, M., Rain, J.C., and Legrain, P. (1997). Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens. Nat. Genet. 16, 277–282.
Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dumpelfeld, B., et al. (2006). Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636.
Gavin, A.C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.M., Cruciat, C.M., et al. (2002). Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147.
Ge, H., Liu, Z., Church, G.M., and Vidal, M. (2001). Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet. 29, 482–486.
Ge, H., Walhout, A.J., and Vidal, M. (2003). Integrating ‘omic’ information: a bridge between genomics and systems biology. Trends Genet. 19, 551–560.
Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., et al. (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391.
Giot, L., Bader, J.S., Brouwer, C., Chaudhuri, A., Kuang, B., Li, Y., Hao, Y.L., Ooi, C.E., Godwin, B., Vitols, E., et al. (2003). A protein interaction map of Drosophila melanogaster. Science 302, 1727–1736.
Goh, K.I., Cusick, M.E., Valle, D., Childs, B., Vidal, M., and Barabási, A.L. (2007). The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685–8690.
Grigoriev, A. (2001). A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae. Nucleic Acids Res. 29, 3513–3519.
Grove, C.A., De Masi, F., Barrasa, M.I., Newburger, D.E., Alkema, M.J., Bulyk, M.L., and Walhout, A.J. (2009). A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327.
Gunsalus, K.C., Ge, H., Schetter, A.J., Goldberg, D.S., Han, J.D., Hao, T., Berriz, G.F., Bertin, N., Huang, J., Chuang, L.S., et al. (2005). Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature 436, 861–865.
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835–840.
Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., Rothballer, A., Ascano, M., Jr., Jungkamp, A.C., Munschauer, M., et al. (2010). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141.
Han, J.D., Bertin, N., Hao, T., Goldberg, D.S., Berriz, G.F., Zhang, L.V., Dupuy, D., Walhout, A.J., Cusick, M.E., Roth, F.P., et al. (2004). Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88–93.
Hartley, J.L., Temple, G.F., and Brasch, M.A. (2000). DNA cloning using in vitro site-specific recombination. Genome Res. 10, 1788–1795.
Hidalgo, C.A., Blumm, N., Barabási, A.L., and Christakis, N.A. (2009). A dynamic network approach for the study of human phenotypes. PLoS Comput. Biol. 5, e1000353.
Ho, Y., Gruhler, A., Heilbut, A., Bader, G.D., Moore, L., Adams, S.L., Millar, A., Taylor, P., Bennett, K., Boutilier, K., et al. (2002). Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183.
Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. (2001). A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 98, 4569–4574.
Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., and Sakaki, Y. (2000). Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc. Natl. Acad. Sci. USA 97, 1143–1147.
Ivanic, J., Yu, X., Wallqvist, A., and Reifman, J. (2009). Influence of protein abundance on high-throughput protein-protein interaction detection. PLoS ONE 4, e5815.
Jäger, S., Gulbahce, N., Cimermancic, P., Kane, J., He, N., Chou, S., D’Orso, I., Fernandes, J., Jang, G., Frankel, A.D., et al. (2010). Purification and characterization of HIV-human protein complexes. Methods 53, 13–19.
Jansen, R., Greenbaum, D., and Gerstein, M. (2002). Relating whole-genome expression data with protein-protein interactions. Genome Res. 12, 37–46.
Jeong, H., Mason, S.P., Barabási, A.L., and Oltvai, Z.N. (2001). Lethality and centrality in protein networks. Nature 411, 41–42.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabási, A.L. (2000). The large-scale organization of metabolic networks. Nature 407, 651–654.
Johannsen, W. (1909). Elemente der exakten Erblichkeitslehre (Jena: Gustav Fischer).
Jonsson, P.F., and Bates, P.A. (2006). Global topological features of cancer proteins in the human interactome. Bioinformatics 22, 2291–2297.
Jordan, I.K., Wolf, Y.I., and Koonin, E.V. (2003). No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol. Biol. 3, 1.
Kahali, B., Ahmad, S., and Ghosh, T.C. (2009). Exploring the evolutionary rate differences of party hub and date hub proteins in Saccharomyces cerevisiae protein-protein interaction network. Gene 429, 18–22.
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., et al. (2008). KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, D480–D484.
Karginov, F.V., Conaco, C., Xuan, Z., Schmidt, B.H., Parker, J.S., Mandel, G., and Hannon, G.J. (2007). A biochemical approach to identifying microRNA targets. Proc. Natl. Acad. Sci. USA 104, 19291–19296.
Kemmeren, P., van Berkum, N.L., Vilo, J., Bijma, T., Donders, R., Brazma, A., and Holstege, F.C. (2002). Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol. Cell 9, 1133–1143.
Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J.M., Eizinger, A., Wylie, B.N., and Davidson, G.S. (2001). A gene expression map for Caenorhabditis elegans. Science 293, 2087–2092.
Krogan, N.J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A.P., et al. (2006). Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643.
Kuhner, S., van Noort, V., Betts, M.J., Leo-Macias, A., Batisse, C., Rode, M., Yamada, T., Maier, T., Bader, S., Beltran-Alvarez, P., et al. (2009). Proteome organization in a genome-reduced bacterium. Science 326, 1235–1240.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Lee, D.S., Park, J., Kay, K.A., Christakis, N.A., Oltvai, Z.N., and Barabási, A.L. (2008). The implications of human metabolic network topology for disease comorbidity. Proc. Natl. Acad. Sci. USA 105, 9880–9885.
Lee, I., Lehner, B., Vavouri, T., Shin, J., Fraser, A.G., and Marcotte, E.M. (2010). Predicting genetic modifier loci using functional gene networks. Genome Res. 20, 1143–1153.
Lee, R., Feinbaum, R., and Ambros, V. (2004). A short history of a short RNA. Cell 116, S89–S92.
Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Gerber, G.K., Hannett, N.M., Harbison, C.T., Thompson, C.M., Simon, I., et al. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804.
Li, S., Armstrong, C.M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P.O., Han, J.D., Chesneau, A., Hao, T., et al. (2004). A map of the interactome network of the metazoan C. elegans. Science 303, 540–543.
Lim, J., Hao, T., Shaw, C., Patel, A.J., Szabo, G., Rual, J.F., Fisk, C.J., Li, N., Smolyar, A., Hill, D.E., et al. (2006). A protein-protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell 125, 801–814.
Ma, H., Sorokin, A., Mazein, A., Selkov, A., Selkov, E., Demin, O., and Goryanin, I. (2007). The Edinburgh human metabolic network reconstruction and its functional analysis. Mol. Syst. Biol. 3, 135.
Madhani, H.D., Styles, C.A., and Fink, G.R. (1997). MAP kinases with distinct inhibitory functions impart signaling specificity during yeast differentiation. Cell 91, 673–684.
Mani, R., St Onge, R.P., Hartman, J.L., 4th, Giaever, G., and Roth, F.P. (2008). Defining genetic interaction. Proc. Natl. Acad. Sci. USA 105, 3461–3466.
Marcotte, E., and Date, S. (2001). Exploiting big biology: integrating large-scale biological data for function inference. Brief. Bioinform. 2, 363–374.
Martinez, N.J., Ow, M.C., Barrasa, M.I., Hammell, M., Sequerra, R., Doucette-Stamm, L., Roth, F.P., Ambros, V.R., and Walhout, A.J. (2008). A C. elegans genome-scale microRNA network contains composite feedback motifs with high flux capacity. Genes Dev. 22, 2535–2549.
Mendez-Rios, J., and Uetz, P. (2010). Global approaches to study protein-protein interactions among viruses and hosts. Future Microbiol. 5, 289–301.
Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M., and Alon, U. (2004). Superfamilies of evolved and designed networks. Science 303, 1538–1542.
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002). Network motifs: simple building blocks of complex networks. Science 298, 824–827.
Mo, M.L., and Palsson, B.O. (2009). Understanding human metabolic physiology: a genome-to-systems approach. Trends Biotechnol. 27, 37–44.
Mohr, S., Bakal, C., and Perrimon, N. (2010). Genomic screening with RNAi: results and challenges. Annu. Rev. Biochem. 79, 37–64.
Novick, P., Osmond, B.C., and Botstein, D. (1989). Suppressors of yeast actin mutations. Genetics 121, 659–674.
Nurse, P. (2003). The great ideas of biology. Clin. Med. 3, 560–568.
Oberhardt, M.A., Palsson, B.O., and Papin, J.A. (2009). Applications of genome-scale metabolic reconstructions. Mol. Syst. Biol. 5, 320.
Oti, M., Snel, B., Huynen, M.A., and Brunner, H.G. (2006). Predicting disease genes using protein-protein interactions. J. Med. Genet. 43, 691–698.
Pan, X., Yuan, D.S., Xiang, D., Wang, X., Sookhai-Mahadeo, S., Bader, J.S., Hieter, P., Spencer, F., and Boeke, J.D. (2004). A robust toolkit for functional profiling of the yeast genome. Mol. Cell 16, 487–496.
Park, J., Lee, D.S., Christakis, N.A., and Barabási, A.L. (2009). The impact of cellular networks on disease comorbidity. Mol. Syst. Biol. 5, 262.
Pastor-Satorras, R., Smith, E., and Sole, R.V. (2003). Evolving protein interaction networks through gene duplication. J. Theor. Biol. 222, 199–210.
Perlis, R.H., Smoller, J.W., Mysore, J., Sun, M., Gillis, T., Purcell, S., Rietschel, M., Nothen, M.M., Witt, S., Maier, W., et al. (2010). Prevalence of incompletely penetrant Huntington’s disease alleles among individuals with major depressive disorder. Am. J. Psychiatry 167, 574–579.
Piano, F., Schetter, A.J., Morton, D.G., Gunsalus, K.C., Reinke, V., Kim, S.K., and Kemphues, K.J. (2002). Gene clustering based on RNAi phenotypes of ovary-enriched genes in C. elegans. Curr. Biol. 12, 1959–1964.
Plewczynski, D., and Ginalski, K. (2009). The interactome: predicting the protein-protein interactions in cells. Cell. Mol. Biol. Lett. 14, 1–22.
Pujana, M.A., Han, J.-D.J., Starita, L.M., Stevens, K.N., Tewari, M., Ahn, J.S., Rennert, G., Moreno, V., Kirchhoff, T., Gold, B., et al. (2007). Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat. Genet. 39, 1338–1349.
Raser, J.M., and O’Shea, E.K. (2005). Noise in gene expression: origins, consequences, and control. Science 309, 2010–2013.
Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., and Barabási, A.L. (2002). Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555.
Reboul, J., Vaglio, P., Rual, J.F., Lamesch, P., Martinez, M., Armstrong, C.M., Li, S., Jacotot, L., Bertin, N., Janky, R., et al. (2003). C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nat. Genet. 34, 35–41.
Reece-Hoyes, J.S., Deplancke, B., Shingles, J., Grove, C.A., Hope, I.A., and Walhout, A.J. (2005). A compendium of Caenorhabditis elegans regulatory transcription factors: a resource for mapping transcription regulatory networks. Genome Biol. 6, R110.
Rigaut, G., Shevchenko, A., Rutz, B., Wilm, M., Mann, M., and Seraphin, B. (1999). A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol. 17, 1030–1032.
Roberts, P.M. (2006). Mining literature for systems biology. Brief. Bioinform. 7, 399–406.
Robinson, C.V., Sali, A., and Baumeister, W. (2007). The molecular sociology of the cell. Nature 450, 973–982.
Root, D.E., Hacohen, N., Hahn, W.C., Lander, E.S., and Sabatini, D.M. (2006). Genome-scale loss-of-function screening with a lentiviral RNAi library. Nat. Methods 3, 715–719.
Rual, J.F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G.F., Gibbons, F.D., Dreze, M., Ayivi-Guedehoussou, N., et al. (2005). Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173–1178.
Ruvkun, G., Wightman, B., and Ha, I. (2004). The 20 years it took to recognize the importance of tiny RNAs. Cell 116, S93–S96.
Rzhetsky, A., Wajngurt, D., Park, N., and Zheng, T. (2007). Probing genetic overlap among complex human phenotypes. Proc. Natl. Acad. Sci. USA 104, 11694–11699.
Schuster-Bockler, B., and Bateman, A. (2008). Protein interactions in human genetic diseases. Genome Biol. 9, R9.
Schuster, S., Fell, D.A., and Dandekar, T. (2000). A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat. Biotechnol. 18, 326–332.
Seebacher, J., and Gavin, A.-C. (2011). SnapShot: Protein-protein interaction networks. Cell 144, this issue, 1000.
Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176.
Shapira, S.D., Gat-Viks, I., Shum, B.O.V., Dricot, A., de Grace, M.M., Wu, L., Gupta, P.B., Hao, T., Silver, S.J., Root, D.E., et al. (2009). A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255–1267.
Shen-Orr, S.S., Milo, R., Mangan, S., and Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68.
Shoval, O., and Alon, U. (2010). SnapShot: network motifs. Cell 143, 326–326.e1. Simonis, N., Rual, J.F., Carvunis, A.R., Tasan, M., Lemmons, I., HirozaneKishikawa, T., Hao, T., Sahalie, J.M., Venkatesan, K., Gebreab, F., et al. (2009). Empirically controlled mapping of the Caenorhabditis elegans protein-protein interactome network. Nat. Methods 6, 47–54. Singh, G.P., Ganapathi, M., and Dash, D. (2006). Role of intrinsic disorder in transient interactions of hub proteins. Proteins 66, 761–765. So¨nnichsen, B., Koski, L.B., Walsh, A., Marschall, P., Neumann, B., Brehm, M., Alleaume, A.M., Artelt, J., Bettencourt, P., Cassin, E., et al. (2005). Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans. Nature 434, 462–469.
Vermeirssen, V., Barrasa, M.I., Hidalgo, C.A., Babon, J.A., Sequerra, R., Doucette-Stamm, L., Barabasi, A.L., and Walhout, A.J. (2007). Transcription factor modularity in a gene-centered C. elegans core neuronal protein-DNA interaction network. Genome Res. 17, 1061–1071. Vidal, M. (1997). The reverse two-hybrid system. In The Yeast Two-Hybrid System, P. Bartels and S. Fields, eds. (New York: Oxford University Press), pp. 109–147. Vidal, M. (2001). A biological atlas of functional maps. Cell 104, 333–339. Vidal, M. (2009). A unifying view of 21st century systems biology. FEBS Lett. 583, 3891–3894. von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., and Bork, P. (2002). Comparative assessment of large-scale data sets of proteinprotein interactions. Nature 417, 399–403.
Stelzl, U., Worm, U., Lalowski, M., Haenig, C., Brembeck, F.H., Goehler, H., Stroedicke, M., Zenkner, M., Schoenherr, A., Koeppen, S., et al. (2005). A human protein-protein interaction network: a resource for annotating the proteome. Cell 122, 957–968.
Wachi, S., Yoneda, K., and Wu, R. (2005). Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics 21, 4205–4208.
Stratton, M.R., Campbell, P.J., and Futreal, P.A. (2009). The cancer genome. Nature 458, 719–724.
Walhout, A.J. (2006). Unraveling transcription regulatory networks by protein-DNA and protein-protein interaction mapping. Genome Res. 16, 1445–1454.
Stuart, J.M., Segal, E., Koller, D., and Kim, S.K. (2003). A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255.
Walhout, A.J., Reboul, J., Shtanko, O., Bertin, N., Vaglio, P., Ge, H., Lee, H., Doucette-Stamm, L., Gunsalus, K.C., Schetter, A.J., et al. (2002). Integrating interactome, phenome, and transcriptome mapping data for the C. elegans germline. Curr. Biol. 12, 1952–1958.
Sturtevant, A.H. (1956). A highly specific complementary lethal system in Drosophila melanogaster. Genetics 41, 118–123.
Taniguchi, Y., Choi, P.J., Li, G.W., Chen, H., Babu, M., Hearn, J., Emili, A., and Xie, X.S. (2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538.
Walhout, A.J., Sordella, R., Lu, X., Hartley, J.L., Temple, G.F., Brasch, M.A., Thierry-Mieg, N., and Vidal, M. (2000a). Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 287, 116–122.
Taylor, I.W., Linding, R., Warde-Farley, D., Liu, Y., Pesquita, C., Faria, D., Bull, S., Pawson, T., Morris, Q., and Wrana, J.L. (2009). Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat. Biotechnol. 27, 199–204.
Walhout, A.J., Temple, G.F., Brasch, M.A., Hartley, J.L., Lorson, M.A., van den Heuvel, S., and Vidal, M. (2000b). GATEWAY recombinational cloning: application to the cloning of large numbers of open reading frames or ORFeomes. Methods Enzymol. 328, 575–592.
Tong, A.H., Evangelista, M., Parsons, A.B., Xu, H., Bader, G.D., Page, N., Robinson, M., Raghibizadeh, S., Hogue, C.W., Bussey, H., et al. (2001). Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368.
Walhout, A.J., and Vidal, M. (2001). Protein interaction maps for model organisms. Nat. Rev. Mol. Cell Biol. 2, 55–62.
Wu, X., Jiang, R., Zhang, M.Q., and Li, S. (2008). Network-based global inference of human disease genes. Mol. Syst. Biol. 4, 189.
Turinsky, A.L., Razick, S., Turner, B., Donaldson, I.M., and Wodak, S.J. (2010). Literature curation of protein interactions: measuring agreement across major public databases. Database (Oxford), 2010, baq026.
Wuchty, S., Oltvai, Z.N., and Barabasi, A.L. (2003). Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat. Genet. 35, 176–179.
Uetz, P., Dong, Y.A., Zeretzke, C., Atzler, C., Baiker, A., Berger, B., Rajagopala, S.V., Roupelieva, M., Rose, D., Fossum, E., et al. (2006). Herpesviral protein networks and their interaction with the human proteome. Science 311, 239–242.
Yeger-Lotem, E., Sattath, S., Kashtan, N., Itzkovitz, S., Milo, R., Pinter, R.Y., Alon, U., and Margalit, H. (2004). Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc. Natl. Acad. Sci. USA 101, 5934–5939.
Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al. (2000). A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627.
Yu, H., Braun, P., Yildirim, M.A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110.
Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A., and Luscombe, N.M. (2009). A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–263.
Zhang, L.V., King, O.D., Wong, S.L., Goldberg, D.S., Tong, A.H.Y., Lesage, G., Andrews, B., Bussey, H., Boone, C., and Roth, F.P. (2005). Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J. Biol. 4, 6.
Vázquez, A., Flammini, A., Maritan, A., and Vespignani, A. (2003). Modeling of protein interaction networks. Complexus 1, 38–44.
Venkatesan, K., Rual, J.F., Vazquez, A., Stelzl, U., Lemmens, I., Hirozane-Kishikawa, T., Hao, T., Zenkner, M., Xin, X., Goh, K.I., et al. (2009). An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. (2001). The sequence of the human genome. Science 291, 1304–1351.
Zhong, Q., Simonis, N., Li, Q.R., Charloteaux, B., Heuze, F., Klitgord, N., Tam, S., Yu, H., Venkatesan, K., Mou, D., et al. (2009). Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol. 5, 321.
Zhu, C., Byers, K.J., McCord, R.P., Shi, Z., Berger, M.F., Newburger, D.E., Saulrieta, K., Smith, Z., Shah, M.V., Radhakrishnan, M., et al. (2009). High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res. 19, 556–566.
Scientific Editor, Cell Cell is seeking a scientist to join its editorial team. The minimum qualification for this position is a PhD in a relevant area of biomedical research, although additional postdoctoral or editorial experience is preferred. This is a superb opportunity for a talented individual to play a critical role in promoting science by helping researchers disseminate their findings to the wider community. As an editor, you would be responsible for assessing submitted research papers and overseeing the refereeing process, and you would commission and edit material for Cell's Leading Edge. You would also travel frequently to scientific conferences to follow developments in research and to establish and maintain close ties with the scientific community. You would have opportunities to pioneer and contribute to new trends in scientific publishing. The key qualities we look for are breadth of scientific interest and the ability to think critically about a wide range of scientific issues. The successful candidate will be highly motivated and creative and able to work independently as well as in a team. This is a full-time, in-house editorial position, based at the Cell Press office in Kendall Square near MIT in Cambridge, Massachusetts. Cell Press offers an attractive salary and benefits package and a stimulating working environment that encourages innovation.
To apply, please visit http://reedelsevier.taleo.net/careersection/51/jobdetail.ftl?lang=en&job=SCI0005W.
Scientific Editor, Molecular Cell Molecular Cell is seeking a full-time scientific editor to join its editorial team. We will consider qualified candidates with scientific expertise in any area that the journal covers. The minimum qualification for this position is a PhD in a relevant area of biomedical research, although additional experience is preferred. This is a superb opportunity for a talented individual to play a critical role in the research community away from the bench. As a scientific editor, you would be responsible for assessing submitted research papers, overseeing the refereeing process, and choosing and commissioning review material. You would also travel frequently to scientific conferences to follow developments in research and establish and maintain close ties with the scientific community. The key qualities we look for are breadth of scientific interest and the ability to think critically about a wide range of scientific issues. The successful candidate will also be highly motivated and creative and able to work independently as well as in a team. This is a full-time in-house editorial position, based at the Cell Press office in Cambridge, Massachusetts. Cell Press offers an attractive salary and benefits package and a stimulating working environment. Applications will be held in the strictest of confidence and will be considered on an ongoing basis until the position is filled.
To apply, please submit a CV and cover letter describing your qualifications, research interests, and reasons for pursuing a career in scientific publishing, as soon as possible, to our online jobs site: http://www.elsevier.com/wps/find/job_search.careers. Click on “search for US jobs” and select “Massachusetts.” Alternatively, apply at http://reedelsevier.taleo.net/careersection/51/jobdetail.ftl?lang=en&job=SCI0005X.
No phone inquiries, please. Cell Press is an equal opportunity/affirmative action employer, M/F/D/V.
Scientific Editor, Cancer Cell Cancer Cell is seeking an additional full-time scientific editor to join its editorial team based in Cambridge, Massachusetts. Cancer Cell publishes studies across broad areas of cancer research with an emphasis on translational research. As a scientific editor, you would be responsible for assessing submitted manuscripts, overseeing the review process, and commissioning and editing review material for the journal. You would travel frequently to scientific conferences and research institutions to follow developments in cancer research and to establish and maintain close ties with the scientific community. This position will also work closely with other aspects of the business, including production, business development, marketing, and commercial sales, and, therefore, provide an excellent entry opportunity to scientific publishing. The minimum qualification for this position is a PhD in a relevant area of cancer research; additional experience at the postdoctoral level is preferred. Previous editorial experience is beneficial but is not required. This is a superb opportunity for a talented individual to play a critical role in the research community away from the bench. The key qualities we look for are breadth of scientific interest and the ability to think critically about a wide range of scientific issues. The successful candidate will also be highly motivated and creative, possess strong communication skills, and be able to work in a team as well as independently. This is a full-time, in-house editorial position, based at the Cell Press office in Cambridge, Massachusetts. Cell Press offers an attractive salary and benefits package and a stimulating working environment. Applications will be held in the strictest of confidence and will be considered on an ongoing basis.
To apply, please submit a cover letter and CV to our online jobs site: http://reedelsevier.taleo.net/careersection/51/jobdetail.ftl?lang=en&job=SCI00065. Please, no phone inquiries. Cell Press is an equal opportunity/affirmative action employer, M/F/D/V.
Editor, Trends in Biotechnology We are seeking to appoint a new Editor for Trends in Biotechnology, to be based in the Cell Press offices in Cambridge, Massachusetts. As Editor of Trends in Biotechnology, you will be responsible for the strategic development and content management of the journal. You will be acquiring and developing the very best editorial content, making use of a network of contacts in academia plus information gathered at international conferences, to ensure that Trends in Biotechnology maintains its market-leading position. This is an exciting and challenging role that provides an opportunity to stay close to the cutting edge of scientific advances while developing a new career away from the bench. You will work in a highly dynamic and collaborative publishing environment that includes 14 Trends titles and 12 Cell Press titles. You will also collaborate with your Cell Press colleagues to maximize quality and efficiency of content commissioning and participate in exciting new non-journal-based initiatives. The minimum qualification is a doctoral degree in a relevant discipline, and postdoctoral training is an advantage. Previous publishing experience is not necessary—we will make sure you get the training and development you need. Good interpersonal skills are essential because the role involves networking in the wider scientific community and collaboration with other parts of the business.
To apply, please visit http://reedelsevier.taleo.net/careersection/51/jobdetail.ftl?lang=en&job=SCI0007Q and submit a CV and cover letter describing your qualifications, research interests, and reasons for pursuing a career in publishing. No phone inquiries, please. Cell Press is an Equal Opportunity Employer. Cell Press offers an attractive salary and benefits package and a stimulating working environment. Applications will be held in the strictest of confidence and considered on an ongoing basis.
Editor: Trends in Endocrinology and Metabolism We are seeking to appoint a new Editor for Trends in Endocrinology and Metabolism, to be based in the Cell Press offices in Cambridge, Massachusetts. As Editor of Trends in Endocrinology and Metabolism, you will be responsible for the strategic development and content management of the journal. You will be acquiring and developing the very best editorial content, making use of a network of contacts in academia plus information gathered at international conferences, to ensure that Trends in Endocrinology and Metabolism maintains its market-leading position. This is an exciting and challenging role that provides an opportunity to stay close to the cutting edge of scientific advances while developing a new career away from the bench. You will work in a highly dynamic and collaborative publishing environment that includes 14 Trends titles and 12 Cell Press titles. You will also collaborate with your Cell Press colleagues to maximize quality and efficiency of content commissioning and participate in exciting new non-journal-based initiatives. The minimum qualification is a doctoral degree in a relevant discipline, and postdoctoral training is an advantage. Previous publishing experience is not necessary—we will make sure you get the training and development you need. Good interpersonal skills are essential because the role involves networking in the wider scientific community and collaboration with other parts of the business.
To apply, please submit a CV and cover letter describing your qualifications, research interests, current salary, and reasons for pursuing a career in publishing at http://reedelsevier.taleo.net/careersection/51/jobdetail.ftl?lang=en&job=SCI0008R. No phone inquiries, please. Cell Press is an equal opportunity employer. Applications will be considered on an ongoing basis until the closing date of Friday, April 1st.
EDITOR-IN-CHIEF
F.E. Bloom, La Jolla, CA, USA
SENIOR EDITORS
J.F. Baker, Chicago, IL, USA
P.R. Hof, New York, NY, USA
G.R. Mangun, Davis, CA, USA
J.I. Morgan, Memphis, TN, USA
F.R. Sharp, Sacramento, CA, USA
R.J. Smeyne, Memphis, TN, USA
A.F. Sved, Pittsburgh, PA, USA
ASSOCIATE EDITORS
G. Aston-Jones, Charleston, SC, USA
J.S. Baizer, Buffalo, NY, USA
J.D. Cohen, Princeton, NJ, USA
B.M. Davis, Pittsburgh, PA, USA
J. De Felipe, Madrid, Spain
M.A. Dyer, Memphis, TN, USA
M.S. Gold, Pittsburgh, PA, USA
G.F. Koob, La Jolla, CA, USA
T.A. Milner, New York, NY, USA
S.D. Moore, Durham, NC, USA
T.H. Moran, Baltimore, MD, USA
T.F. Münte, Magdeburg, Germany
K-C. Sonntag, Belmont, MA, USA
R.J. Valentino, Philadelphia, PA, USA
C.L. Williams, Durham, NC, USA
Twenty-three to the Power of One.
One re-unified journal, nine specialist sections, 23 receiving Editors
Authors receive first editorial decision within 30 days of submission
“Young Investigator Awards” for innovative work by a new generation of researchers
Brain Research take another look www.elsevier.com/locate/brainres
Announcements/Positions Available
Postdoctoral position in acupuncture research
University of California, Irvine, Susan Samueli Center for Integrative Medicine
A postdoctoral position is available in the area of autonomic control, specifically in research on acupuncture’s effect in hypertension, hypotension and cardiac ischemia. Please check http://www.sscim.uci.edu/ (under faculty, Dr. Longhurst) for more information on our research program. The optimal applicant must have good knowledge of physiology, molecular biology, acupuncture, traditional Chinese medicine and clinical cardiovascular disease. More specifically, applicants should have hands-on experience in molecular biology techniques (e.g., molecular cloning, transfection and RNAi techniques) and small animal surgery, including hemodynamic recording, central nuclei microinjection and electrophysiology. Preferred candidates are individuals with a recent Ph.D. degree, good publications and an excellent command of English. Annual salary: $45,000. Please e-mail CV and three reference letters to: John C. Longhurst, M.D., Ph.D., Professor of Medicine, Director, Susan Samueli Center for Integrative Medicine, University of California, Irvine. E-mail: [email protected]
Positions Available
University of Notre Dame Faculty Positions in Cancer Research As part of a campus-wide initiative associated with the newly formed Harper Cancer Research Institute, multiple open-rank tenure-track faculty positions including endowed chairs are available at the University of Notre Dame in the broad area of cancer research. Individuals engaged in basic and translational cancer research using molecular, cellular, organotypic and/or animal models of cancer are encouraged to apply. A demonstrated record of collaborative and interdisciplinary research is preferred. Successful candidates will be appointed in either the Department of Biological Sciences or the Department of Chemistry and Biochemistry and will be expected to establish a vigorous extramurally supported research program in cancer biology, to participate in collaborations within and across disciplines, and to contribute to excellence in graduate and undergraduate education. Senior applicants should have a record of national and international distinction. Key research facilities include the AAALAC-accredited Freimann Animal Facility, Notre Dame Integrated Imaging Facility, Center for Zebrafish Research, Eck Institute for Global Health, Keck Center for Transgene Research, Center for Rare and Neglected Diseases, Lizzadro Magnetic Resonance Research Center, Center for Research Computing, and core facilities for Proteomics, Genomics and Bioinformatics. Additional information can be found at http://science.nd.edu. The Harper Cancer Research Institute represents a new joint venture between the University of Notre Dame and Indiana University School of Medicine-South Bend (http://medicine.iu.edu/southbend) and is housed in 55,000 square feet of new research space. The University is also engaged in the Indiana Clinical and Translational Sciences Institute (I-CTSI) partnership with Indiana University School of Medicine, Indiana University, IUPUI and Purdue University. 
The positions include competitive salary, start-up funding and laboratory space. Applicants should upload a cover letter, curriculum vitae, a detailed research plan, and a statement of teaching interests directed to M. Sharon Stack, Scientific Director of the Harper Cancer Research Institute (https://info.chem.nd.edu/apply). Candidates must also arrange to have at least three letters of recommendation sent directly to the search committee via the application website, although senior applicants can apply in confidence. Review of applications will commence immediately and will continue until suitable candidates are identified. The University of Notre Dame, an equal opportunity employer with a strong institutional and academic commitment to diversity, endeavors to foster a vibrant learning community animated by the Catholic intellectual tradition.
Positions Available
Faculty Positions Assistant and Associate Professors Center for Cell Biology & Cancer Research Albany Medical College The Center for Cell Biology & Cancer Research at the Albany Medical College announces the availability of tenure-track faculty positions at the Assistant and Associate Professor levels. The successful candidates’ research will interface with scientific programs within the Center which has a strong focus on tissue remodeling, tumor microenvironment, inflammation and fibrosis, tumor growth/metastasis and gene regulation. Candidates will have the opportunity to be affiliated with the Cancer Genomics Center at the State University of New York’s East Campus. The Albany area is also home to Taconic Farms, Rensselaer Polytechnic Institute, the College of Nanoscale Science & Engineering at the University of Albany (SUNY) and General Electric Health Care Research, all of which contribute significantly to the collaborative environment in the Capital District. Studies by Center faculty concentrate on molecular mechanisms regulating cell adhesion and motility, angiogenesis, growth factor- and matrix-dependent signal transduction, transcriptional control of cell fate, tumor-stroma interactions and targeted therapy of cancer. The applicant will be expected to develop an extramurally-funded research program emphasizing molecular and/or genetic approaches to problems relating to these focus areas. Candidates using animal models that mimic stages in human cancer progression are particularly encouraged to apply. Qualifications include a Ph.D. degree and a demonstrated track record of excellence in research. Associate Professor candidates are expected to have an externally-funded research program. All faculty in the Center participate in the teaching missions of the College Graduate and Medical School curricula. Full consideration will be given to those applications received by May 1, 2011.
Curriculum vitae, description of research interests, and at least three letters of recommendation are required; providing copies of published papers is strongly encouraged. All materials should be submitted to: Paul J. Higgins, Ph.D. Center for Cell Biology & Cancer Research (MC-165) Albany Medical College 47 New Scotland Avenue Albany, New York 12208 The Albany Medical College is an Equal Opportunity, Affirmative Action Employer
Look Again
Discover More
- Access to the Cell Press primary research journals and Trends reviews titles, all on the same platform
- Improved, more robust article and author search
- Video, animations and sound files
- Easy to navigate home page, articles pages and archive
www.cell.com
NEW
Find Your Ideal Job!
- Search jobs by keyword, location & type
- Post your resume anonymously
- Create a Job Alert and let your ideal job find you
careers.cell.com
SnapShot: Protein-Protein Interaction Networks Jan Seebacher and Anne-Claude Gavin Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
EXPERIMENTAL METHODS FOR CHARTING PROTEIN-PROTEIN INTERACTION NETWORKS
Binary interactions
Methods: yeast two-hybrid; protein fragment complementation assay
Split proteins: transcription factor, ubiquitin; dihydrofolate reductase; GFP or YFP
Assay readout: transcription; antibiotic resistance; fluorescence
Molecular machines/protein complexes comembership
Methods: affinity purification/mass spectrometry (biochemical purification of affinity-tagged baits followed by MS identification of copurifying preys)
Figure labels: bait; prey; complex 1; complex 2; socio-affinity; protein pairs scored + (interacting) or - (noninteracting); false negatives; true negatives
Interaction examples: allosteric; chaperone-assisted
Interaction strength: transient to stable
Interaction examples: signaling; enzyme-substrate
Interaction strength: stable
INTERACTION NETWORK DATA QUALITY / BENCHMARK PARAMETERS
Figure labels: “gold standard” (PRS, positive reference list); experimental PPI data set; “negative set” (RRS, random reference set, or set of proteins unlikely to interact); coverage; novel interactions & false positives; false positives
PREDICTION OF PROTEIN COMPLEX TOPOLOGY
Figure labels: spoke model; matrix model; socio-affinity model
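The spoke and matrix models and the reference-set benchmarks named above can be illustrated together. The following is a minimal sketch, not the authors' procedure: it assumes the usual reading of the spoke model (each bait is linked only to its copurifying preys) and of the matrix model (every pair of proteins in one purification is linked), and it scores the resulting edge lists against a positive and a negative reference set. The function names, the purification data, and both reference sets are hypothetical.

```python
from itertools import combinations


def spoke_model(purifications):
    """Spoke model: connect each bait only to the preys it copurified."""
    edges = set()
    for bait, preys in purifications.items():
        for prey in preys:
            if prey != bait:
                edges.add(frozenset((bait, prey)))
    return edges


def matrix_model(purifications):
    """Matrix model: connect every pair of proteins found in one purification."""
    edges = set()
    for bait, preys in purifications.items():
        members = sorted({bait, *preys})
        edges.update(frozenset(pair) for pair in combinations(members, 2))
    return edges


def benchmark(edges, prs, rrs):
    """Return (fraction of PRS pairs recovered, fraction of RRS pairs reported)."""
    coverage = len(edges & prs) / len(prs)
    rrs_rate = len(edges & rrs) / len(rrs)
    return coverage, rrs_rate


def as_pairs(pairs):
    """Store undirected pairs as frozensets so (A, B) equals (B, A)."""
    return {frozenset(p) for p in pairs}


# Hypothetical data: bait A copurifies preys B, C and D.
PURIFICATIONS = {"A": ["B", "C", "D"]}
PRS = as_pairs([("A", "B"), ("B", "C"), ("E", "F")])  # assumed true interactions
RRS = as_pairs([("C", "D"), ("A", "G")])              # assumed noninteracting pairs

if __name__ == "__main__":
    for name, model in (("spoke", spoke_model), ("matrix", matrix_model)):
        coverage, rrs_rate = benchmark(model(PURIFICATIONS), PRS, RRS)
        print(f"{name}: coverage={coverage:.2f}, RRS pairs reported={rrs_rate:.2f}")
```

On this toy input the matrix model recovers more of the positive set than the spoke model but also reports a pair from the negative set, which is the trade-off the two models are usually taken to represent.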
NETWORK COMPONENTS
Node (protein)
Edge: link between two nodes (interaction)
Hub: node with high degree
Party hubs: same time and space
Date hubs: different time and/or space
Expression profiles and/or localization
NETWORK TOPOLOGIES
Random network
-Degrees follow Poisson (or peaked) distribution
-Vulnerability to failure
Scale-free network (many types of real networks)
-Degrees follow power-law distributions
-Robustness against random failure
-Vulnerability to targeted attacks
Hierarchical network (biological/cellular networks)
-Degrees follow power-law distributions
-Account for modularity, local clustering, and scale-free topology
-High clustering coefficient (C)
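The contrast drawn above between random networks (peaked degree distribution) and scale-free networks (power-law degree distribution) can be explored with a small simulation. The sketch below is only illustrative. It assumes a uniform random-pair generator for the random network and a Barabási-Albert-style preferential-attachment rule for the scale-free one; neither generator is named in the SnapShot, and `random_network`, `preferential_attachment`, and `degree_list` are hypothetical helper names.

```python
import random
from collections import Counter


def random_network(n, n_edges, rng):
    """Random network: n_edges distinct node pairs drawn uniformly at random."""
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges


def preferential_attachment(n, m, rng):
    """Growth model: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    edges = set()
    endpoints = []  # each node appears once per edge end, so choice() is degree-weighted
    for u in range(m + 1):  # fully connected seed of m + 1 nodes
        for v in range(u + 1, m + 1):
            edges.add((u, v))
            endpoints += [u, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            edges.add((old, new))
            endpoints += [old, new]
    return edges


def degree_list(edges, n):
    """Degree of every node 0..n-1 (0 for isolated nodes)."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return [counts[i] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same networks
    grown = preferential_attachment(200, 2, rng)
    uniform = random_network(200, len(grown), rng)
    for name, edges in (("preferential attachment", grown), ("random", uniform)):
        degrees = degree_list(edges, 200)
        print(name, "max degree:", max(degrees), "mean degree:", sum(degrees) / 200)
```

Both networks have the same number of nodes and edges, so any difference in the largest degree reflects how the edges are distributed rather than how many there are.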
NETWORK MEASURES
Degree/connectivity (k): $k_A$ = number of edges through A = 5
Clustering coefficient/interconnectivity (C): $C_A$ = (actual links between A's neighbors)/(possible links between A's neighbors); $C_A = n_A/[k_A(k_A-1)/2] = 2/[4\times(4-1)/2] = 0.333$
Assortativity/average nearest neighbor's connectivity (NC): $NC_A = (k_B+k_C+k_D+k_E+k_J)/5 = (5+2+2+3+1)/5 = 2.6$
Shortest path (SP) between two nodes: $SP_{FH} = (F,D,A,B,H) = 4$
Betweenness/centrality (B): $B_A$ = fraction of SPs passing through A = 0.090
Cell 144, March 18, 2011 ©2011 Elsevier Inc. DOI 10.1016/j.cell.2011.02.025
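The network measures defined above can be computed on a small graph as follows. This is a sketch under stated assumptions: the figure's actual network is not recoverable from the text, so `EDGES` is a hypothetical 11-node graph chosen only so that the degrees of A and its neighbors match the worked numbers. The betweenness function uses one possible normalization (the average, over all pairs of other nodes, of the fraction of their shortest paths that pass through the node) and is not expected to reproduce the value 0.090.

```python
from collections import deque

# Hypothetical edges (an assumption, not a transcription of the figure).
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "J"),
    ("B", "C"), ("B", "G"), ("B", "H"), ("B", "I"),
    ("D", "F"), ("E", "F"), ("E", "K"),
]


def build_graph(edges):
    """Return an undirected adjacency map {node: set of neighbors}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Degree k: number of edges through the node."""
    return len(graph[node])


def clustering(graph, node):
    """C = n / [k(k-1)/2], where n counts links among the node's neighbors."""
    nbrs = sorted(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:] if v in graph[u])
    return links / (k * (k - 1) / 2)


def neighbor_connectivity(graph, node):
    """NC: mean degree of the node's nearest neighbors."""
    nbrs = graph[node]
    return sum(len(graph[n]) for n in nbrs) / len(nbrs)


def bfs(graph, source):
    """Breadth-first search: distances and shortest-path counts from source."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count


def betweenness(graph, node):
    """Average fraction of shortest paths between other node pairs through node."""
    dist_v, count_v = bfs(graph, node)
    others = sorted(n for n in graph if n != node)
    through, pairs = 0.0, 0
    for i, s in enumerate(others):
        dist_s, count_s = bfs(graph, s)
        for t in others[i + 1:]:
            if t not in dist_s:
                continue  # s and t are not connected
            pairs += 1
            if node in dist_s and dist_s[node] + dist_v[t] == dist_s[t]:
                through += count_s[node] * count_v[t] / count_s[t]
    return through / pairs if pairs else 0.0


if __name__ == "__main__":
    g = build_graph(EDGES)
    print("k_A =", degree(g, "A"))
    print("NC_A =", neighbor_connectivity(g, "A"))
    print("C_A =", clustering(g, "A"))
    print("SP_FH =", bfs(g, "F")[0]["H"])
    print("B_A =", betweenness(g, "A"))
```

With these hypothetical edges, A has degree 5, its neighbors have degrees 5, 2, 2, 3 and 1 (so NC for A is 2.6), and the shortest path from F to H has length 4, matching the worked numbers above. The clustering example in the text uses a different neighborhood (four neighbors with two links among them), which `clustering` reproduces on a graph built that way.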
See online version for legend and references.
Metabolism & Aging March 27-29, 2011 Cape Cod, Massachusetts, USA
Conference Organizers
Prof. David A. Sinclair, Harvard Medical School, Boston, USA
Dr. Nir Barzilai, M.D., Albert Einstein College of Medicine, New York, USA
Dr. C. Ronald Kahn, Joslin Diabetes Center at Harvard Medical School, Boston, USA
Speakers
Domenico Accilli, Columbia University, NY, USA
Adam Antebi, Max Planck Institute for Biology of Ageing, Germany
Dongsheng Cai, Albert Einstein College of Medicine, NY, USA
Hassy Cohen, UC Los Angeles, CA, USA
Jill Crandall, Albert Einstein College of Medicine, NY, USA
Rafael de Cabo, National Institute of Health, MD, USA
Andy Dillin, Salk Institute For Biological Studies, CA, USA
David J. Glass, Novartis Institutes for BioMedical Research, MA, USA
Leonard Guarente, Massachusetts Institute of Technology, MA, USA
Pankaj Kapahi, The Buck Institute for Age Research, CA, USA
Brian Kennedy, University of Washington, WA, USA
James Kirkland, Mayo Clinic, MN, USA
Valter D. Longo, UC San Francisco, CA, USA
Jim Nelson, UT Health Science Center, TX, USA
Eric Ravussin, Pennington Biomedical Research Center, LA, USA
Arlan Richardson, UT Health Science Center, TX, USA
Randy Strong, UT Health Science Center, TX, USA
Marc Tatar, Brown University, RI, USA
Heidi Tissenbaum, University of Massachusetts Medical School, MA, USA
Eric Verdin, UC San Francisco, CA, USA
The first Cell Symposia meeting of 2011, Metabolism & Aging, takes place on March 27–29 on the beautiful Cape Cod peninsula at the southern tip of Massachusetts, USA. This meeting aims to bring together scientists with interests in aging and metabolism to further explore how these fields intersect and to identify the most promising future directions. We will hear the latest data from leaders in the field about the key pathways at the level of the cell and the organ, across a range of contexts including model organisms, mammalian systems, and translational studies in primates and humans. Topics will also include how the signalling networks of metabolism and aging connect and communicate and how we can best make use of these connections to improve medicine and society.
Visit the Metabolism & Aging website to:
Register
Submit your poster abstract
Study the final programme
View our speakers' biographies
Supporting publications
www.cell-symposia-metabolism-aging.com
the difference between discover and do over
Gibco®. Every little thing matters. Fact: your research will only be as accurate, as efficient, as groundbreaking as the cell culture you’re working with. That’s why you’re settling for nothing less than Gibco® media. Considering it could be the difference between publishing your results and reading someone else’s, why experiment with your experiment?
Go to invitrogen.com/discovergibco Download the free mobile app at http://gettag.mobi Scan the barcode to instantly access more information about Gibco®.
Life Technologies offers a breadth of products
DNA | RNA | PROTEIN | CELL CULTURE | INSTRUMENTS
FOR RESEARCH USE ONLY. NOT INTENDED FOR ANY ANIMAL OR HUMAN THERAPEUTIC OR DIAGNOSTIC USE. © 2011 Life Technologies Corporation. All rights reserved. The trademarks mentioned herein are the property of Life Technologies Corporation or their respective owners. Printed in the USA. CO17587 0211