Enzyme Functionality Design, Engineering, and Screening edited by
Allan Svendsen
Novozymes A/S, Bagsværd, Denmark
MARCEL DEKKER, INC.
NEW YORK · BASEL
Although great care has been taken to provide accurate and current information, neither the author(s) nor the publisher, nor anyone else associated with this publication, shall be liable for any loss, damage, or liability directly or indirectly caused or alleged to be caused by this book. The material contained herein is not intended to provide specific advice or recommendations for any specific situation.

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

ISBN: 0-8247-4709-7

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016, U.S.A.
tel: 212-696-9000; fax: 212-685-4540

Distribution and Customer Service
Marcel Dekker, Inc., Cimarron Road, Monticello, New York 12701, U.S.A.
tel: 800-228-1160; fax: 845-796-1772

Eastern Hemisphere Distribution
Marcel Dekker AG, Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-260-6300; fax: 41-61-260-6333

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 2004 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA
Preface
This book focuses on understanding enzyme functionality and on protein design aimed at creating altered characteristics. Knowledge and understanding within the areas of enzyme function and engineering are discussed, as is the subsequent screening of new enzyme variants. The book presents some background concepts required for analyzing enzymes and proteins, in order to increase our understanding of their function. The following chapters are on selected topics that have been part of my daily work in protein design, which includes the use of structural knowledge to guide random mutagenesis to selected areas of the protein, as well as methods to create and select variant proteins with the desired characteristics.

I hope that the many stories from the many specialists who contributed to this volume will generate novel ideas for further development in the field of protein engineering. The diverse backgrounds of the contributors, from theoretical to purely experimental science and from the biological to the biophysical sciences, are intended to meld into a novel understanding of enzyme functionality and of variant enzyme design and production.

In order to engineer enzyme functionality, some basic understanding is needed of the enzyme function and the specific enzyme characteristics, both in general and under the conditions in which the enzymes are intended to be used. To understand the functionality of an enzyme we also need to understand the substrate, as the substrate is often much larger and more complicated than the enzyme. This makes the interaction of the enzyme with the substrate much more complicated than the interactions of a small substrate with a bigger enzyme that are usually discussed in textbooks. While the issue of the substrate is not the subject of a separate chapter, the problem is touched on by G. H. Peters in Chapter 6.

The increasing insight into sequence families and the structural relationship among different functionalities is breaking up the old enzyme classification system (the EC classes). There is discussion of the possibilities of combining knowledge of these relationships with the techniques developed within protein engineering, design, and directed evolution, for the development of new functionalities in common scaffolds, the so-called “promiscuity area.”

Screening is emphasized as the most important issue in protein engineering. We can to a certain extent direct our design to limit the overwhelming number of possibilities, and we can do it in the DNA, but we cannot easily sample what we expect to find because our assays are not always good enough: “you get what you select for.”

This book is divided into parts on enzyme design, enzyme diversity generation, and screening. Within these areas the book focuses on ideas, function, and results rather than on precise method description. Many methods are mentioned, but a separate reference may be consulted to obtain more detailed information. Each part begins with a chapter that provides a general overview or a discussion of ideas and concepts, and continues with chapters that present scientific studies, in the sense of presenting the writers’ ideas and experiences of the subject.

Part I, on design, covers protein engineering concepts, library design methods, and computer design methods. The chapters focus on how to design or suggest directions of mutational work, and discuss computational methods. The simulation data establish a new level of complexity and allow us to extrapolate our ideas further into a more complex understanding.
Parts IIA and IIB, on engineering, cover site-directed methods including combinations and redesign, and evolutionary methods and phage display, and provide examples of how to make the variants. Part III, on screening, covers chemical-based assays, fluorescence-based assays, and in vivo assays, and discusses the important area of selecting the correct variants within the large variability it is possible to create. Also in Part III, two further issues, assaying the characteristics of the variant enzymes and expressing them, are briefly discussed. All scientific studies present research containing examples of the issues covered in that section.

Part I: Enzyme Design

Chapter 1 introduces enzyme engineering concepts, mainly using phytases as examples. Chapter 2 discusses the classification system for enzymes, including the EC system and a new classification system so far used mainly for carbohydrate-degrading enzymes. The new classification puts different activity types together in one family, basically connected by a similarity in the 3D structure. Chapter 3 examines the variability in variant 3D structures based on X-ray crystallography, and hence begins discussing “subtle” changes in these structures. This chapter gives examples from subtilisin proteases and T4 lysozyme. Chapter 4 discusses another predictive computational tool, applied to lipase enantioselectivity. Chapter 5 discusses a computational prediction tool called “COMBINE,” which has implications for the guidance of protein engineering activities. The enzyme is haloalkane dehalogenase. Chapter 6 thoroughly describes molecular simulation, using lipases and PTPase as examples. Chapter 7 discusses some of the issues within theoretical electrostatics analysis of enzymes, mainly focusing on some titration characteristics with examples from xylanase, lysozyme, and alcohol dehydrogenase. Chapter 8 presents the theoretical numbers of combinations of variant sequences and the necessary limitation of the diversity in experimental settings.

Part IIA: Enzyme Diversity Generation: Site-Directed and Redesign

Chapter 9 describes the activity of a variant enzyme using bacterial alpha-amylases as the main example, and includes electrostatics calculations. Chapter 10 discusses the understanding of catalysis of chitinase, including electrostatics, molecular dynamics, and variants as tools. Chapter 11 discusses the mutational development of a phosphotriesterase enzyme; changes in specific activities and stereospecificity are also discussed. Chapter 12 explores the engineering of glucose dehydrogenases and their potential use in biosensors. Chapter 13 provides a thorough review of one well-researched concept for stabilization of proteins, the so-called “Proline rule,” using oligo-1,6-glucosidase as an example. Chapter 14 discusses changing the DNA enzymes themselves for better functionality, using the homing endonucleases as examples.
Part IIB: Enzyme Diversity Generation: Evolutionary Methods

Chapter 15 discusses evolutionary methods, mentioning pathway engineering and genome shuffling. Chapter 16 gives examples of random mutagenesis methods and describes the error-prone PCR methods and results. Chapter 17 discusses phage display of enzymes, including examples of glutathione transferase, beta-lactamase, subtilisins, lipases, PenG, and metalloenzymes, as well as suicide substrates. Chapter 18 reviews in vivo directed evolution in yeast, with a lipase as an example. Chapter 19 discusses shuffling of the catechol 2,3-dioxygenase and Chapter 20 the shuffling and chimeras of glutathione transferases. Chapter 21 discusses chimeras of β-glucosidases.

Part III: Screening

Chapter 22 reviews screening methods. Chapter 23 focuses on methods for screening for thermostability and examples of thermostable variants. Chapter 24 discusses screening methods and high-throughput screening (HTS), including fluorescence methods and digital imaging, as well as library design and combinatorial algorithms. Chapter 25 describes bottlenecks in screening setup, pricing, and HTS, covering primary and secondary screening. Chapter 26 discusses HTS of variants of Pseudomonas lipases with a changed enantiomeric ratio. Chapter 27 discusses display in cells and fluorescence-activated cell sorting (FACS). Chapter 28 gives an example of the importance of expression, which is well known but not discussed at length elsewhere; this is a very important issue for utilization of the variant enzymes, and also for understanding the screening results. Chapter 29 provides examples of protein modifications and assays.

Although subtilisins and amylases have been the most famous engineered enzymes, they are discussed only in Chapters 3 and 17. Some of the earliest work on protein engineering of enzymes was on alpha-amylases. Amylases are addressed in Chapters 2, 9, and 10. Lipases are the most represented enzyme type discussed in this book, in Chapters 4, 6, 16–18, 22, 26, and 28, of which five discuss Pseudomonas sp. lipases, and three discuss enantiomer selectivity. Several chapters are within my own present field of molecular modeling and computational biochemistry (Chapters 4–7) and theoretical considerations of DNA and mutant combinations (Chapter 8), as well as on library design (Chapter 24). Several other enzymes are discussed: phytase (Chapter 1), triesterases (Chapter 11), and glucose dehydrogenases (Chapter 12), among others.

A large number of engineering methods are mentioned, for example, phage display (Chapter 17), in vivo recombination in yeast (Chapter 18), cell surface display (Chapter 27), screening assays in petri dishes (Chapters 18, 23, and 25), and assays in solution and in microtiter plates (Chapters 23–26 and 29). Assays for activity, specificity, and stability together with expression are very important parts of enzyme engineering. Details of reaction mechanisms are given in Chapters 10 and 11, and are also touched on in other chapters.

Around 1980, researchers started to change protein sequences on purpose.
This led to the “old protein engineering cycle” (1983–1990), based on the understanding of the protein structure and its relationship to function. The protein engineering loop, “structure–theory–design–mutation–purification–analysis,” was applied; the mutation was often based on some protein concept, and each variant was taken all the way through to analysis before a new variant was designed. Around 1990–1994 the “medium cycle” arose, still largely based on the rational method but faster (“make many and test”), and still needing pure samples and feedback from the results. Later, “the evolutionary period” (1994–2002?) introduced random methods and variant libraries with many combinations; high-throughput screening started, testing the conformational “space.” Increases in computer speed opened up possibilities for a new understanding, and electrostatics and molecular dynamics simulations became an integrated part of the design process.
At present, we may want to combine some understanding and structural information with the more random methodologies. The uncertainty, and the need to narrow down the possibilities, has directed work toward a restricted use of structural information and more sophisticated screening methods, and has made it important to combine library design and screening. Terms like “directed random evolution” (doped oligonucleotide methods in addition to directed evolution), which began to be used in 1993–1994, have become more common. In the future, we probably will have to think “out of the protein,” about the external interactions, the electric field, the dynamical behavior, the water structure, the surroundings, and so on.

In the early days of protein engineering, each mutant was followed from (structure-based) design to testing, often before the next mutant was made; this experience led to today's belief in high throughput. In Chapter 8, G. L. Moore and C. D. Maranas make it clear that a limitation is necessary (and possible). A drawback in directed evolution has been the fact that the variant enzyme is judged in a nonpurified condition. Also, the error-prone PCR method is found not to be a random choice but rather a strongly directed one, which, when analyzed, covers only a certain number of possibilities (see Chapter 16). An interesting possibility today is the changing of the polymerases and DNA-acting enzymes for alternative mutagenesis reactions (see Chapter 14).

My experience tells me that to make a final choice among the variant enzymes, the variant enzyme has to be purified in order to secure the correct characteristics of the protein. Chapter 29, the last chapter, focuses on protein stability measurements and modified enzymes, in this case chemically modified enzymes. This chapter, together with Chapter 28 on expression of variant enzymes, emphasizes the importance of testing and expressing reasonable amounts, and of making measurements on the final purified protein.
The contributors to this book come from all over the world, young scientists and older ones, and present different opinions on which method is the best for obtaining a certain characteristic of an enzyme. As a structural chemist, I personally prefer a structurally derived background for the design, whereas others prefer more random methods. The reasoning behind the choices and each contributor’s personal experiences are discussed.

Enzyme Functionality is mainly for scientific professionals, Ph.D. candidates, and post-doctoral students in the field of enzymes and the many related special areas such as molecular dynamics simulation, electrostatics, and genetic methods for variant engineering, as well as the always very important area of screening assay development. The chapters are based mainly on the scientists’ own experiences and are written at a high scientific level, with thorough discussions of ideas and methods and a wide range of references to original articles.
I would like to mention that the second chapter of this book is dedicated to Martin Schülein, a Novozymes researcher, colleague, and friend, who died much too early in his always-energetic work on carbohydrate-degrading enzymes.

The chapter authors have written based on their own experience and expertise, and present a wide variety of ideas. I hope this diverse range of ideas gives readers inspiration for their own choice of design, and that they can contribute to strengthening the important discussion in this strongly developing field of protein engineering.

Allan Svendsen
Contents

Preface
Contributors

Part I: Enzyme Design

1. Concepts for Protein Engineering
   Martin Lehmann
2. Sequence Families and Modular Organization of Carbohydrate-Active Enzymes
   Bernard Henrissat, Pedro M. Coutinho, Emeline Deleury, and Gideon Davies
3. Analyzing Three-Dimensional Structures of Variant Enzymes
   Richard Bott
4. Quantitative Modeling of Lipase Enantioselectivity
   Jürgen Pleiss
5. Rational Redesign of Haloalkane Dehalogenases Guided by Comparative Binding Energy Analysis
   Jiří Damborský, Jan Kmuníček, Tomáš Jedlička, Santos Luengo, Federico Gago, Angel R. Ortiz, and Rebecca C. Wade
6. Computer Simulations: A Tool for Investigating the Function of Complex Biological Macromolecules
   Günther H. Peters
7. Calculations of Ionization Equilibria in Proteins
   Andrey Karshikoff
8. Modeling and Optimization of Directed Evolution Protocols
   Gregory L. Moore and Costas D. Maranas

Part IIA: Enzyme Diversity Generation: Site-Directed and Redesign

9. Rational Redesign of Enzymes
   Jens Erik Nielsen
10. Details in the Reaction Mechanism of Chitinases
   Vincent G. H. Eijsink, Gustav Kolstad, Sigrid Gåseidnes, Bjørnar Synstad, Martin G. Peter, Jens Erik Nielsen, David Komander, Douglas Houston, and Daan M. F. van Aalten
11. Kinetic Evolution to the Catalytic Core of the Bacterial Phosphotriesterase
   Frank M. Raushel
12. Protein Engineering of PQQ Glucose Dehydrogenase
   Satoshi Igarashi and Koji Sode
13. The Proline Rule: A Concept for Engineering Protein Stability
   Yuzuru Suzuki
14. Homing Endonucleases: Tools and Targets for Protein Engineering
   Alfred Pingoud, Ann-Josée Noël, Vera Pingoud, Shawn Steuer, and Wolfgang Wende

Part IIB: Enzyme Diversity Generation: Evolutionary Methods

15. Evolutionary Methods for Protein Engineering
   Huimin Zhao and Wenjuan Zha
16. Directed Evolution by Random Mutagenesis: A Critical Evaluation
   Thorsten Eggert, Manfred T. Reetz, and Karl-Erich Jaeger
17. Enzyme Engineering by Phage Display
   Patrice Soumillion, Daniel Legendre, and Jacques Fastrez
18. In Vivo Gene Shuffling in Yeast: A Fast and Easy Method for Directed Evolution of Enzymes
   Jens Sigurd Okkels
19. Effective DNA Shuffling Methods for Enzyme Evolution
   Osamu Kagami, Sang-Ho Baik, and Shigeaki Harayama
20. Exploring the Functional Space of Combinatorial Mutant Libraries for the Directed Evolution of Novel Enzyme Activities
   Bengt Mannervik, Lars O. Hansson, and William G. Bardsley
21. Modifying the Character of an Enzyme by Producing Chimeric Enzymes: Chimeric β-glucosidases as an Illustration
   Kiyoshi Hayashi, Bong Jo Kim, Kshamata Goyal, Satya Singh, Jong-Deog Kim, Yeon-Kye Kim, Satoru Nirasawa, and Motomitsu Kitaoka

Part III: Screening

22. Assay Systems for Screening or Selection of Biocatalysts
   Uwe T. Bornscheuer
23. Screening of Enzyme Variants for Thermostability
   Shigenori Kanaya
24. Combinatorial Mutagenesis Algorithms, Digital Imaging Spectroscopy, and Solid-Phase Assays for Directed Evolution
   Simon Delagrave, Edward J. Bylina, William J. Coleman, Steven J. Robles, Mary M. Yang, Christin L. McConnell, and Douglas C. Youvan
25. Screen Automation and Robotics
   Michael H. Lamsa, Nils Buchberg Jensen, and Steen Krogsgaard
26. Screening for Enantioselective Enzymes
   Manfred T. Reetz
27. Enzyme Engineering by Microbial Cell Surface Display
   Thorsten M. Adams and Harald Kolmar
28. Overexpression and Secretion of Biocatalysts in Pseudomonas
   Frank Rosenau and Karl-Erich Jaeger
29. Analysis of Catalytic and Structural Stability of Native and Covalently Modified Enzymes
   P. V. Sundaram and S. Srimathi

Index
Contributors

Thorsten M. Adams  Abteilung für Molekulare Genetik und Präparative Molekularbiologie, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Göttingen, Germany
Sang-Ho Baik  Kamaishi Laboratories, Marine Biotechnology Institute Co., Ltd., Kamaishi, Japan
William G. Bardsley  Department of Biochemistry, Uppsala University, Uppsala, Sweden
Uwe T. Bornscheuer  Institute of Chemistry and Biochemistry, Department of Technical Chemistry and Biotechnology, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany
Richard Bott  Genencor International, Palo Alto, California, U.S.A.
Edward J. Bylina, Ph.D.  KAIROS Scientific Inc., San Diego, California, U.S.A.
William J. Coleman, Ph.D.  KAIROS Scientific Inc., San Diego, California, U.S.A.
Pedro M. Coutinho, Ph.D.  Centre for Biological and Chemical Engineering, Instituto Superior Técnico, Lisbon, Portugal (current affiliation: Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Universités d’Aix-Marseille I and II, Marseille, France)
Jiří Damborský, Ph.D.  National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
Gideon J. Davies  York Structural Biology Laboratory, Department of Chemistry, University of York, York, England
Simon Delagrave, B.Sc., Ph.D.  BioTech Studio, Newark, Delaware, U.S.A.
Emeline Deleury  Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Universités d’Aix-Marseille I and II, Marseille, France
Thorsten Eggert, Ph.D.  Institut für Molekulare Enzymtechnologie, Heinrich-Heine Universität Düsseldorf, Forschungszentrum Jülich, Jülich, Germany
Vincent G. H. Eijsink, Ph.D.  Department of Chemistry and Biotechnology, Agricultural University of Norway, Ås, Norway
Jacques Fastrez  Laboratoire de Biochimie Physique et des Biopolymères, Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Federico Gago, Ph.D.  Department of Pharmacology, University of Alcala, Madrid, Spain
Sigrid Gåseidnes, M.Sc.  Department of Chemistry and Biotechnology, Agricultural University of Norway, Ås, Norway
Kshamata Goyal  Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan
Lars O. Hansson  Department of Biochemistry, Uppsala University, Uppsala, Sweden
Shigeaki Harayama  National Institute of Technology and Evaluation, Tokyo, Japan
Kiyoshi Hayashi  Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan
Bernard Henrissat, D.Sc.  Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Universités d’Aix-Marseille I and II, Marseille, France
Douglas Houston  Division of Molecular Microbiology and Biological Chemistry, Wellcome Trust Biocentre, University of Dundee, Dundee, Scotland
Satoshi Igarashi  Department of Biotechnology, Tokyo University of Agriculture and Technology, Tokyo, Japan
Karl-Erich Jaeger, Ph.D.  Institut für Molekulare Enzymtechnologie, Heinrich-Heine Universität Düsseldorf, Forschungszentrum Jülich, Jülich, Germany
Tomáš Jedlička, M.Sc.  National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
Nils Buchberg Jensen, M.Sc. (Chem. Eng.)  Laboratory Technology, Novo Nordisk Engineering A/S, Bagsværd, Denmark
Osamu Kagami, Ph.D.  Kamaishi Laboratories, Marine Biotechnology Institute Co., Ltd., Kamaishi, Japan
Shigenori Kanaya, Ph.D.  Department of Material and Life Science, Graduate School of Engineering, Osaka University, Osaka, Japan
Andrey Karshikoff, Ph.D.  Department of Biosciences at Novum, Karolinska Institutet, Huddinge, Sweden
Bong Jo Kim  Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan
Jong-Deog Kim  Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan
Yeon-Kye Kim  Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan
Motomitsu Kitaoka  Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan
Jan Kmuníček, M.Sc.  National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
Harald Kolmar  Abteilung für Molekulare Genetik und Präparative Molekularbiologie, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Göttingen, Germany
Gustav Kolstad, M.Sc.  Department of Chemistry and Biotechnology, Agricultural University of Norway, Ås, Norway
David Komander  Division of Molecular Microbiology and Biological Chemistry, Wellcome Trust Biocentre, University of Dundee, Dundee, Scotland
Steen Krogsgaard, Ph.D.  Strain Development, Molecular Biotechnology, Novozymes A/S, Bagsværd, Denmark
Michael H. Lamsa, B.S.  HTS-Core Robotics, Novozymes Biotech, Inc., Davis, California, U.S.A.
Daniel Legendre  Laboratoire de Biochimie Physique et des Biopolymères, Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Martin Lehmann  Biotechnology Research Department, Roche Vitamins AG, Basel, Switzerland
Santos Luengo  Department of Pharmacology, University of Alcala, Madrid, Spain
Bengt Mannervik  Department of Biochemistry, Uppsala University, Uppsala, Sweden
Costas D. Maranas, Ph.D.  Department of Chemical Engineering, Pennsylvania State University, University Park, Pennsylvania, U.S.A.
Christin L. McConnell  Department of Earth and Planetary Sciences, Environmental Science and Public Policy Program, Harvard University, Cambridge, Massachusetts, U.S.A.
Gregory L. Moore, B.S.  Department of Chemical Engineering, Pennsylvania State University, University Park, Pennsylvania, U.S.A.
Jens Erik Nielsen, Ph.D.  Howard Hughes Medical Institute and Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California, U.S.A. (current affiliation: Department of Biochemistry, University College Dublin, Dublin, Ireland)
Satoru Nirasawa  Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan
Ann-Josée Noël  Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany
Jens Sigurd Okkels, Ph.D.  Novozymes A/S, Bagsværd, Denmark (current affiliation: Molecular Biology, Maxygen Aps, Horsholm, Denmark)
Angel R. Ortiz, Ph.D.  Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York, U.S.A. (current affiliation: Centro de Biologia Molecular, Universidad Autonoma de Madrid, Madrid, Spain)
Martin G. Peter, Dr.  Institute of Chemistry, University of Potsdam, Golm, Germany
Günther H. Peters, Ph.D.  Department of Chemistry, MEMPHYS-Center for Biomembrane Physics, Technical University of Denmark, Lyngby, Denmark
Alfred Pingoud  Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany
Vera Pingoud  Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany
Jürgen Pleiss, Ph.D.  Institute of Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
Frank M. Raushel, Ph.D.  Department of Chemistry, Texas A&M University, College Station, Texas, U.S.A.
Manfred T. Reetz  Max-Planck-Institut für Kohlenforschung, Mülheim an der Ruhr, Germany
Steven J. Robles, Ph.D.  KAIROS Scientific Inc., San Diego, California, U.S.A.
Frank Rosenau, Ph.D.  Institut für Molekulare Enzymtechnologie, Heinrich-Heine Universität Düsseldorf, Forschungszentrum Jülich, Jülich, Germany
Satya Singh  Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan
Koji Sode  Department of Biotechnology, Tokyo University of Agriculture and Technology, Tokyo, Japan
Patrice Soumillion  Laboratoire de Biochimie Physique et des Biopolymères, Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
S. Srimathi  Centre for Protein Engineering and Biomedical Research, The Voluntary Health Services, Madras, India
Shawn Steuer  Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany
P. V. Sundaram  Centre for Protein Engineering and Biomedical Research, The Voluntary Health Services, Madras, India
Yuzuru Suzuki, Ph.D.  Department of Applied Biochemistry, Kyoto Prefectural University, Kyoto, Japan
Allan Svendsen, Ph.D.  Protein Design, Novozymes A/S, Bagsværd, Denmark
Bjørnar Synstad, Ph.D.  Department of Chemistry and Biotechnology, Agricultural University of Norway, Ås, Norway
Daan M. F. van Aalten, Ph.D.  Division of Molecular Microbiology and Biological Chemistry, Wellcome Trust Biocentre, University of Dundee, Dundee, Scotland
Rebecca C. Wade, D.Phil.  EML Research, Heidelberg, Germany
Wolfgang Wende  Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany
Mary M. Yang, Ph.D.  KAIROS Scientific Inc., San Diego, California, U.S.A.
Douglas C. Youvan, Ph.D.  KAIROS Scientific Inc., San Diego, California, U.S.A. (current affiliation: Foundation for the Biological Manhattan Project, Frontenac, Kansas, U.S.A.)
Wenjuan Zha, B.S.  Department of Chemical and Biomolecular Engineering, Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, U.S.A.
Huimin Zhao, Ph.D.  Department of Chemical and Biomolecular Engineering, Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, U.S.A.
1
Concepts for Protein Engineering

Martin Lehmann
Roche Vitamins AG, Basel, Switzerland

1 INTRODUCTION
Increasingly, proteins and especially enzymes come into focus as therapeutic and commercial targets, not only as isolated products but also as targets for metabolic engineering inside a cell. However, most of the time, enzymes as isolated from nature do not fulfil the specific demands they have to meet for their industrial or medical application, or in a re-engineered cell tailored for special tasks such as overproduction of natural compounds. To improve proteins and enzymes for given purposes, protein-engineering concepts have emerged over the last decades.

Protein engineering describes the alteration or improvement of certain properties of a protein by changing its building blocks and, therefore, its structure and properties. This can be done by chemically modifying the amino acid residues of a protein or by altering its primary structure, which has become possible now that modern DNA manipulation tools are available. Typical chemical modifications are the crosslinking of amino acids at the protein surface or the derivatization of particular, reactive amino acid residues in the active center of enzymes, such as the catalytically active serines in serine proteases. One advantage of chemical modification is that changes of the 3-D structure remain localized and are therefore much more predictable. However, the desired effects are generally weak and are not achievable in each case and for every protein. In addition, the protein must be extracted from the cells and sometimes even purified. Chemical modification is not applicable to proteins that should be active intracellularly. Finally, it adds costs to a possible product. For these reasons, it is much more promising nowadays to directly change the properties of a protein by altering its primary structure. In order to make the most efficient use of rational protein engineering, it is still desirable to understand better the links between the primary structure of a protein and its 3-D structure on the one hand, and between its 3-D structure and properties on the other hand (Table 1).
Table 1  Protein Engineering Concepts

Stability
  Homology approaches: Comparison with thermostable homologues (Ref. 31), amino acid residue exchange between homologous proteins of same stability (Refs. 2,3), consensus concept (Refs. 4,9,11–13)
  Reducing entropy of the unfolded state: Proline rule (Ref. 26), additional disulfide bonds (Ref. 23)
  Increasing strength and number of interactions: H-bonds (Ref. 20), salt bridges, hydrophobic interactions (Refs. 21,22), stabilization of secondary structure elements (Refs. 33,36)
  Protein design algorithms (PDA): (Ref. 40)

Activity
  Comparison of homologous enzymes: Detection of amino acid residues critical for catalysis, transfer of residues or introduction of new amino acid residues at those positions (Refs. 14,15)
  3-D structure alone or in combination with docking algorithms further analyzing catalysis of the enzyme: Alteration of the electrostatic and steric environment around amino acid residues crucial for catalysis to influence substrate specificity, specific activity, and enantioselectivity (Refs. 15,37–39); pKa shifts of a titratable group by alteration of the local environment to shift the pH activity profile (Ref. 37)
Concepts for Protein Engineering
2 HOMOLOGY APPROACHES
Homology-based approaches try to use the information that is inherent in a collection of homologous wild-type enzymes. The amino acid sequences and the enzyme properties of a series of homologous wild-type enzymes are collected and compared in order to relate differences at the amino acid sequence level to differing enzyme properties. This is helpful for identifying amino acids that have a modulating effect on catalysis or stability.
2.1 Stability
One strategy for identifying thermostabilizing mutations is to compare homologous proteins that differ markedly in their thermostability and to identify the amino acid residues responsible for the difference. In one example, Perl et al. (1) compared the cold shock proteins (67 amino acids long) from the thermophile Bacillus caldolyticus and the mesophile Bacillus subtilis, which differ from each other in only 12 amino acid residues but show a remarkable difference in stability. Site-directed mutagenesis of all 12 differing residues in the B. caldolyticus background revealed that only two are responsible for the lower thermostability of the B. subtilis protein. However, the lower the sequence homology of two proteins, the more difficult it is to identify the residues that account for a difference in stability. Serrano et al. (2) showed that it is also possible to use the information of highly homologous enzymes of similar stability to create a new enzyme with improved stability. The work-intensive strategy is to determine, for every non-conserved position of two or more homologous proteins, which of the occurring amino acid residues is most stabilizing. As with the cold shock proteins discussed above, the majority of the differing amino acids will show no or only a slight difference in their impact on the overall protein stability. However, combining the "most stabilizing" residues in one protein results in a mutant that, surprisingly, is more thermostable than any of the parents used for the design. Using the sequence information of barnase and binase, Serrano et al. were able to construct a new microbial RNase that was 3.3 kcal/mol more stable than barnase. In a similar approach in which structural information was also taken into account, Jiang et al. (3) increased the stability of the WW domain by 2.5 kcal/mol and increased the Tm by 28°C.
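The position-by-position comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences and the stability values are invented placeholders (not data from Refs. 1–3), the sequences are assumed to be already aligned without gaps, and the effects of individual residues are assumed to be independent of each other.

```python
# Hypothetical, pre-aligned homologous sequences of equal length.
parent_a = "MKVLAGTWE"
parent_b = "MRVLSGTYE"


def differing_positions(seq_a, seq_b):
    """Return the 0-based positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical measured stability contributions (kcal/mol, positive means
# stabilizing) of each residue observed at each non-conserved position.
measured_effect = {
    (1, "K"): 0.0, (1, "R"): 0.4,
    (4, "A"): 0.6, (4, "S"): 0.0,
    (7, "W"): 0.1, (7, "Y"): 0.0,
}


def combine_most_stabilizing(seq_a, seq_b, effects):
    """Build a sequence carrying the more stabilizing residue at each
    non-conserved position; unmeasured residues count as neutral (0.0)."""
    combined = list(seq_a)
    for pos in differing_positions(seq_a, seq_b):
        candidates = (seq_a[pos], seq_b[pos])
        combined[pos] = max(candidates,
                            key=lambda res: effects.get((pos, res), 0.0))
    return "".join(combined)


positions = differing_positions(parent_a, parent_b)
combined = combine_most_stabilizing(parent_a, parent_b, measured_effect)
print(positions)
print(combined)
```

In practice the table of effects is the expensive part, since each entry corresponds to a constructed and characterized mutant; the code only records the bookkeeping.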
The central prerequisite of this approach is a homology among the group of proteins used that is high enough that a given amino acid replacement can be expected to have the same effect in all proteins compared. Steipe et al. (4) went one step further. They compared the variable VL domains of different immunoglobulins and raised the idea that an amino acid that occurs more frequently at a given position of a sequence alignment is more stabilizing than an amino acid occurring less frequently. Using this idea, they predicted 10 individual stabilizing mutations, of which six indeed had a stabilizing effect. Experiments on the VL and VH domains (5,6,7) and the design of a highly stable functional GroEL minichaperone (8) further supported this hypothesis. This approach greatly reduces the number of single mutations that have to be tested to achieve a stabilized mutant. A further step ahead was the idea to take the entire sequence alignment of a group of homologous proteins, to calculate the consensus sequence of this alignment, and to generate a synthetic gene coding for the consensus sequence obtained. This approach was tested on a subset of fungal phytases. The resulting consensus phytase, which is based on 13 wild-type sequences, was, surprisingly, 15–26°C more thermostable than all of its parents (9). It differed in at least 80 amino acid residues from any of its parents. Incorporation of additional wild-type sequences in the alignment yielded an improved consensus phytase that was 7.4°C more thermostable than the first one (10). Examination of the effects on protein stability of most of the newly introduced residues revealed that 10 were stabilizing, 8 had no pronounced effect, and 10 showed a destabilizing effect; 4 were not tested. Back mutation of the most destabilizing mutations and introduction of another stabilizing mutation increased the melting temperature by another 5°C to 90°C (11,12; see also Fig. 1). Although the 3-D structure of one of the wild-type phytases was known, none of this information went into the design process. The same consensus approach was used to generate a more stable consensus ankyrin repeat protein (13).
2.2 Activity
The information inherent in a sequence alignment of homologous proteins can also be used for activity engineering when the compared enzymes show differences in their catalytic properties. The crucial part of such an attempt is to correlate the differences in the amino acid sequence with the difference in a catalytic property. If there is a manageable number of amino acid differences, each residue can be tested separately. However, when the number of differences is too high, additional information is usually required to determine the critical amino acid differences. A known 3-D structure of one or more of the homologous proteins is of great help in further reducing the number of amino acid residues that might be responsible for the difference in enzymatic properties. In this direction, much work has been done on the engineering of fungal phytases. Two wild-type phytases from the Aspergillus niger strains NRRL 3135 and T213 display a threefold difference in specific activity, although they have only 12 amino acid differences distributed over the entire protein. Among the 12 divergent positions, three are located in or close to the substrate binding site. Testing of these three differing positions by site-directed mutagenesis revealed that a single amino acid difference is responsible for the threefold difference in specific activity (14). Position 27 of the same enzyme, which is located in the active site cleft, was also found to have a profound effect on the specific activity. Usually, glutamine is found at this position, except for two phytases from Aspergillus terreus strains, which have a leucine. Remarkably, the latter two phytases have a much higher specific activity with phytic acid (up to 196 U/mg). Replacing the glutamine residue with leucine in A. fumigatus phytase increased the specific activity with phytic acid from 26.5 to 92.1 U/mg at pH 5.0. However, this amino acid exchange had a negative effect on protein stability. Therefore a series of additional amino acids was tested at position 27 for their effects on the enzymatic properties. Threonine was the most favorable of the tested amino acids and had a positive effect on the specific activity without negatively impacting protein stability (15). Effects comparable to those for A. fumigatus phytase were observed in consensus phytase (Fig. 2). Threonine has not yet been found in a wild-type phytase amino acid sequence at this position. In conclusion, comparison of the sequences and properties of homologous enzymes is quite useful for detecting amino acid residues that have an influence on catalysis. Amino acid substitutions at the identified critical residues should not be restricted to amino acids occurring in homologous wild-type sequences; rather, saturation mutagenesis of those residues should be envisaged, as the example above shows.

Figure 1 "Evolution" of consensus phytases. (From Ref. 11. For further details see Refs. 9 and 10.)

Figure 2 pH-Dependent activity profile of consensus phytase and two mutants, in which glutamine at position 50 was replaced by leucine or threonine (Refs. 9,10). This critical position was identified in a homology approach under the additional use of an available 3-D structure.

3 STRUCTURE-BASED APPROACHES
The attempt to explain from the structure of a protein alone how it works, how it folds, and how it maintains its structure is legitimate, but it is very challenging and, at the moment, too ambitious. There are too many interactions that govern the folding of a protein or the catalysis of a chemical reaction. However, a 3-D structure, alone or in combination with additional information, can help generate ideas on how to improve the selected properties of a protein. Such additional information includes the biochemical and biophysical characterization of the protein, structures of homologous proteins, the structure of the same protein in a complex with an inhibitor, a substrate, or a product of the enzyme reaction, and general concepts of protein folding and of the way enzymes catalyze a reaction. First of all, amino acid residues have to be identified together with their function in the activity or stability of the enzyme. A time-consuming but straightforward approach using structural information for picking the protein positions of interest is called alanine scanning (16). Here a series of amino acid residues is changed to alanine and the effect on enzyme activity or stability is analyzed. For improvement, saturation mutagenesis can be applied to the identified critical protein positions.
3.1 Stability
The first question that arises when a protein engineer is confronted with the task of stabilizing an enzyme is: under which conditions, why, and how is an enzyme inactivated? There are many ways to inactivate an enzyme and, accordingly, a number of strategies can be chosen to avoid inactivation. Generally, enzymes are sensitive to high temperature, extreme pH values, oxygen stress, proteases, deamidation of glutamine and asparagine residues, and chelating agents (if they depend on a metal ion for stability or activity), just to mention the most important factors that come into play when an enzyme is used in an industrial environment. The amino acids most susceptible to oxidation are the sulfur-containing amino acids cysteine and methionine. Replacing cysteines and methionines on the surface of the protein, where they are particularly susceptible to oxidation, can greatly reduce oxidative inactivation (17). Similarly, surface-exposed glutamine and asparagine residues can be replaced to avoid deamidation. If a protein is susceptible to proteases, it can be made more resistant by carefully altering the sequence around the preferred cleavage site(s) (18). Engineering a protein to withstand extreme pH can be a more challenging task involving more than one or two residues. Here phenomena such as the spontaneous formation of peptide succinimides in Asp–Gly and Asn–Gly sequences, the burial of ionized groups, and the repulsive electrostatic forces caused by the large net charge that many proteins acquire at extreme pH values are important factors. If a problematic site is known, one can attempt to replace the residue in question. However, it can be quite difficult to identify the responsible residues, in particular if more than one or two residues are involved. Still, the most promising way to preserve stability against most of the destabilizing factors is to increase the general stability of a protein molecule.
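As a first, purely sequence-based pass over the liabilities listed above, candidate sites can be flagged with simple pattern matching. The sketch below is an illustration, not an established tool: the sequence is a made-up example, the class labels are chosen here for readability, and the scan cannot tell whether a residue is surface-exposed, which the text identifies as decisive and which requires a 3-D structure.

```python
import re

# Sequence patterns named in the text as potential stability liabilities.
LIABILITY_PATTERNS = {
    "succinimide-prone (Asp-Gly / Asn-Gly)": re.compile(r"[DN]G"),
    "oxidation-prone (Cys / Met)": re.compile(r"[CM]"),
    "deamidation-prone (Asn / Gln)": re.compile(r"[NQ]"),
}


def find_liabilities(sequence):
    """Map each liability class to the 1-based positions where a match starts."""
    hits = {}
    for label, pattern in LIABILITY_PATTERNS.items():
        hits[label] = [m.start() + 1 for m in pattern.finditer(sequence)]
    return hits


toy_sequence = "MADGKLNGTCQW"  # hypothetical sequence, one-letter code
report = find_liabilities(toy_sequence)
for label, sites in report.items():
    print(label, sites)
```

Each flagged position is only a candidate; whether replacing it helps depends on its exposure and on the conditions under which the enzyme is inactivated.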
A large number of weak interactions (hydrophobic interactions, salt bridges, H-bonds) of the amino acid side chains, together with the backbone of peptide bonds, keep a protein in its active 3-D state. Additionally, disulfide bonds and the incorporation of prolines help to stabilize the native state by reducing the entropy of the unfolded state. Therefore every mutation that helps to increase the number and strength of weak interactions in the native state, or that is able to reduce the entropy of the unfolded state, increases the stability of a protein. Several concepts have been developed to achieve exactly that. Researchers have attempted to fill cavities in a protein (19). This should help increase the number and strength of mostly hydrophobic interactions. They have engineered new salt bridges or H-bonds (20), improved hydrophobic interactions at the surface or in the protein core (21,22), engineered new disulfide bridges (23), and introduced additional prolines (24) or fewer glycines into the amino acid sequence (25). For every amino acid replacement in a protein, it has to be considered that the effect of the replacement can reach far beyond the localized area it has been planned for. Sometimes, this leads to reorganization of parts of, or the entire, structure of a protein, accompanied by a much stronger and often negative effect on the stability than the desired localized improvement would bring. As it is not possible at the moment to predict these far-reaching effects of an amino acid replacement on the protein structure, the results of rational protein engineering are sometimes quite unexpected. Nevertheless, there have been some impressive examples using the strategies described above. Suzuki et al. (26) proposed that the proline content of a protein is correlated with its stability, because prolines have the unique property of decreasing the entropy of the unfolded state of a protein. This strategy was successfully applied to different proteins such as the bacteriophage T4 lysozyme (27), an oligo-1,6-glucosidase of Bacillus cereus, the neutral protease of Bacillus stearothermophilus (24), and the glucoamylase of Aspergillus awamori (28). The results obtained show that the positions in the amino acid sequence chosen for proline introduction have to be carefully selected; otherwise, strong negative effects on stability could be the result of such replacements. Also, the introduction of new disulfide bonds has led to impressive results (23). Again, the success rate is rather low because of the unpredictable long-range effects of the amino acid replacements.
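The proline and glycine arguments above suggest a simple composition check when two homologues are compared. The following sketch uses two invented sequences (they are not real mesophilic or thermophilic proteins) and reports the proline and glycine fractions, plus the glycine positions that might be inspected as replacement candidates. As the text stresses, position matters far more than raw content, so the numbers are only a starting point.

```python
def residue_fraction(sequence, residue):
    """Fraction of positions in the sequence occupied by the given residue."""
    return sequence.count(residue) / len(sequence)


def proline_glycine_profile(sequence):
    """Proline and glycine fractions plus 1-based glycine positions."""
    return {
        "Pro": residue_fraction(sequence, "P"),
        "Gly": residue_fraction(sequence, "G"),
        "Gly positions": [i + 1 for i, aa in enumerate(sequence) if aa == "G"],
    }


# Hypothetical homologues in one-letter code.
mesophile = "MGKGLAGSTE"
thermophile = "MPKGLAPSTE"

meso_profile = proline_glycine_profile(mesophile)
thermo_profile = proline_glycine_profile(thermophile)
print(meso_profile)
print(thermo_profile)
```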
An article about protein stability would not be complete without mentioning one of the most impressive examples of thermostabilization of a protein. Eijsink's group has worked for several years on the stabilization of the thermolysin-like protease from B. stearothermophilus (29). They finally obtained a hyperstable mutant that was still active at 100°C. The temperature optimum of the eightfold mutant was 21°C higher than that of the wild-type enzyme, while the mutant maintained normal activity at 37°C. Five amino acid substitutions were derived from the 3°C more thermostable homologue thermolysin (30). The three remaining, rationally designed mutations included an additional proline and a new disulfide bridge. These impressive results indicate that the combination of different strategies can result in larger jumps in thermostabilization. There has also been an attempt to increase the number of interactions by changing the quaternary structure from a monomer to a multimer, because it is often observed that enzymes occurring in a mesophilic organism as a monomer are found in a thermophilic organism as a multimer (31). However, while engineering of active monomers from a multimer has already been achieved by replacing amino acids that are critical for multimerization and by subsequent stabilization of the monomeric structure (32), it is difficult to favor multimerization of a protein that is found as a monomer in its wild-type form. It is also possible to improve the stability of secondary structure elements such as α-helices. α-Helices have a net positive charge at their N-terminal end and a net negative charge at their C-terminal end. An additional stabilizing interaction is generated when an oppositely charged amino acid residue is located at the right distance from the ends of the α-helix. This feature is called helix capping (33,34). When an α-helix shows no helix capping or only suboptimal capping, the responsible amino acid can be replaced by a better one (35). Besides this, it has been found that not every amino acid occurs with the same probability in an α-helix or a β-sheet. By replacing amino acids that have a low propensity for occurring in such secondary structure elements with amino acids of higher propensity, the stability of a protein can be improved (36). Furthermore, it has been very popular to compare the structure of a mesophilic enzyme to its homologous thermophilic counterpart(s) to find out which are the most common ways used by nature to evolve highly stable proteins. However, no general rules have been observed so far that can be applied to a larger number of proteins. Most proteins have found their own way of stabilization, mostly a combination of all the approaches mentioned above. Because of the considerable number of different promising approaches that can be used in a more rational way or that make use of more "irrational" methods such as directed evolution, protein stabilization is seen more and more as a solvable task that is done routinely in biochemical laboratories.
3.2 Activity
Compared to stability engineering, engineering of the catalytic properties of an enzyme is still more of a challenge. The pH-activity profile, the specific activity, the substrate specificity, and the enantioselectivity are typical properties a protein engineer is interested in changing. To improve any of those properties in a rational way, three things are required: a high-resolution 3-D structure (preferably of the enzyme complexed with its substrate, product, or a substrate analog), a good idea of the way the enzyme catalyses the reaction and, as a result of this, the identity of the responsible amino acid residues in the active center together with their function during catalysis. Computer programs are already able to model a substrate into an active site cleft to generate some ideas about the amino acid residues that are important for substrate binding, stabilization of the transition state of the reaction, and catalysis of the reaction itself. Having collected all this information, one can try to develop educated ideas about the way the activity profile of an enzyme can be changed. The pH-activity profile, for example, can be changed by altering the pKa value of either a nucleophile or a proton donor of a reaction. This, again, is possible by altering the electrostatic field around a titratable group of an enzyme, which is influenced by the local hydrogen bonding network, by solvent accessibility, and by the neighboring charged groups. Mutations introduced to influence one of those factors should have an impact on the pH-activity profile of the enzyme. It is thought that the introduction of a negative charge in the surroundings of a titratable group produces an upward shift of the pKa value of a proton donor, while the introduction of a positive charge should shift its pKa value downwards. Even this rather simple model is sometimes contradicted by experiment. Wind et al. (40) inserted an additional positive or negative charge in the active site of cyclodextrin glycosyl transferase. Contrary to expectations, both mutants showed a downward shift of the pH-activity profile. Knowing or guessing the amino acid residues interacting with the substrate during catalysis enables one to specifically increase or decrease the fit of a substrate into its active site and to attempt to favor one substrate over another by specifically altering the electrostatic and steric environment inside the active center. Among other things, this can lead to an increase in the specific activity of an enzyme. Sometimes, facilitation of the release of the product increases the specific activity; however, this is accompanied by a higher Km value most of the time (15). Steric and electrostatic interactions between the substrate and the enzyme are also the engineering targets for improving the enantioselectivity of an enzyme. A successful example was described by Rotticci et al. (41), who were able to double the enantioselectivity of Candida antarctica lipase B toward halohydrins using a modeling algorithm that predicted the structural changes in the active center of possible mutants. Only those mutants were constructed that displayed better interactions with the substrate in energy contour maps. One single-point mutation doubled the enantioselectivity, while another mutation abolished the enantioselectivity toward the target substrate. However, as already mentioned for rational stability engineering, every amino acid substitution can cause adaptations of the entire protein structure to the introduction of a new amino acid residue. This can perturb or even reverse a predicted effect on the enzyme.
4 OUTLOOK
Every amino acid substitution has a more or less pronounced effect on the entire structure of a protein, which means that most of the time its effect reaches far beyond the actual site of mutation. Therefore it is very difficult to predict accurately the effect a mutation has on an enzyme property of interest. The accurate prediction of such an effect has to take the interactions of the peptide bond backbone and all amino acid residues with each other and with the solvent into account. This huge number of interactions can only be handled by complex computer programs. Yet, protein design algorithms are improving rapidly. Special programs are already able to predict stabilizing point mutations with an accuracy of 1 kcal/mol (42,43). These force field calculations or similar algorithms are also able to predict the interactions between a substrate and an active site, which makes them attractive for activity engineering. However, as long as these programs cannot meet the accuracy required, directed evolution (44), an approach of mutagenesis and screening, is so far the most successful approach, in particular for stability but also for activity engineering; at least as long as the target is to obtain an improved mutant rather than an explanation of how proteins fold or how an enzyme catalyzes a chemical reaction.
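The "mutagenesis and screening" cycle of directed evolution mentioned above can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration: the screening function simply counts matches to a hidden target string and stands in for a laboratory assay, the sequences are invented, and real experiments use far larger libraries and recombination as well as point mutation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(sequence, rng):
    """Return a copy of the sequence carrying one random point mutation."""
    pos = rng.randrange(len(sequence))
    choices = [aa for aa in AMINO_ACIDS if aa != sequence[pos]]
    return sequence[:pos] + rng.choice(choices) + sequence[pos + 1:]


def toy_screen(sequence, optimum="MKTAYIAKQR"):
    """Stand-in for an assay: number of positions matching a hidden optimum."""
    return sum(a == b for a, b in zip(sequence, optimum))


def directed_evolution(parent, rounds, library_size, screen, rng):
    """Repeat mutagenesis and screening, keeping a variant only if it
    scores better than the current best sequence."""
    best = parent
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        top = max(library, key=screen)
        if screen(top) > screen(best):
            best = top
    return best


rng = random.Random(0)  # fixed seed so that the run is repeatable
start = "MKTGYLAKQA"    # hypothetical starting sequence
evolved = directed_evolution(start, rounds=10, library_size=50,
                             screen=toy_screen, rng=rng)
print(start, toy_screen(start))
print(evolved, toy_screen(evolved))
```

The loop needs no model of why a mutation helps, which is exactly the point made in the text: it yields an improved mutant, not an explanation.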
ACKNOWLEDGMENT
The author wishes to thank Dr. M. Wyss for his critical and fruitful comments and discussions regarding the manuscript.
REFERENCES
1. D Perl, U Müller, U Heinemann, FX Schmid. Two exposed amino acid residues confer thermostability on a cold shock protein. Nat Struct Biol 7:380–383, 2000.
2. L Serrano, AG Day, AR Fersht. Step-wise mutation of barnase to binase. J Mol Biol 233:305–312, 1993.
3. X Jiang, J Kowalski, JW Kelly. Increasing protein stability using a rational approach combining sequence homology and structural alignment: Stabilizing the WW domain. Protein Sci 10:1454–1465, 2001.
4. B Steipe, B Schiller, A Plückthun, S Steinbacher. Sequence statistics reliably predict stabilizing mutations in a protein domain. J Mol Biol 240:188–192, 1994.
5. E Ohage, B Steipe. Intrabody construction and expression: I. The critical role of VL domain stability. J Mol Biol 291:1119–1128, 1999.
6. S Ewert, A Honegger, A Plückthun. Structure-based improvement of the biophysical properties of immunoglobulin VH domains with a generalizable approach. Biochemistry 42:1517–1528, 2003.
7. A Knappik, L Ge, A Honegger, P Pack, M Fischer, G Wellnhofer, A Hoess, J Wölle, A Plückthun, B Virnekäs. Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides. J Mol Biol 296:57–86, 2000.
8. Q Wang, AM Buckle, NW Foster, CM Johnson, AR Fersht. Design of highly stable functional GroEL minichaperones. Protein Sci 8:2186–2193, 1999.
9. M Lehmann, D Kostrewa, M Wyss, R Brugger, A D’Arcy, L Pasamontes, APGM van Loon. From DNA sequence to improved functionality: using protein sequence comparisons to rapidly design a thermostable consensus phytase. Protein Eng 13:49–57, 2000.
10. M Lehmann, C Loch, A Middendorf, D Studer, SF Lassen, L Pasamontes, APGM van Loon, M Wyss. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng 15:403–411, 2002.
11. M Lehmann, L Pasamontes, SF Lassen, M Wyss. The consensus concept for thermostability engineering of proteins. Biochim Biophys Acta 1543:408–415, 2000.
12. M Lehmann, M Wyss. Engineering proteins for thermostability: the use of sequence alignments versus rational design and directed evolution. Curr Opin Biotechnol 12:371–375, 2001.
13. A Kohl, HK Binz, MT Stumpp, A Plückthun, MG Grütter. Designed to be stable: crystal structure of a consensus ankyrin repeat protein. Proc Natl Acad Sci USA 100:1700–1705, 2003.
14. A Tomschy, M Wyss, D Kostrewa, K Vogel, M Tessier, S Höfer, H Bürgin, A Kronenberger, R Rémy, APGM van Loon, L Pasamontes. Active site residue 297 of Aspergillus niger phytase critically affects the catalytic properties. FEBS Lett 472:169–172, 2000.
15. A Tomschy, M Tessier, M Wyss, R Brugger, C Broger, L Schnoebelen, APGM van Loon, L Pasamontes. Optimization of the catalytic properties of Aspergillus fumigatus phytase based on the three-dimensional structure. Protein Sci 9:1304–1311, 2000.
16. C Cunningham, JA Wells. High-resolution epitope mapping of hGH-receptor interactions by alanine-scanning mutagenesis. Science 244:1081–1085, 1989.
17. TV Borchert, SF Lassen, A Svendsen, HB Frantzen. Oxidation stable amylases for detergents. In: SB Petersen, B Svensson, S Petersen, eds. Carbohydrate Bioengineering. Amsterdam: Elsevier Science, 1995, pp 175–179.
18. M Wyss, L Pasamontes, A Friedlein, R Rémy, M Tessier, A Kronenberger, A Middendorf, M Lehmann, L Schnoebelen, U Röthlisberger, E Kusznir, G Wahl, F Müller, H-W Lahm, K Vogel, APGM van Loon. Biophysical characterization of fungal phytases (myo-inositol hexakisphosphate phosphohydrolases): molecular size, glycosylation pattern, and engineering of proteolytic resistance. Appl Environ Microbiol 65:367–373, 1999.
19. M Karpusas, WA Baase, M Matsumura, BW Matthews. Hydrophobic packing in T4 lysozyme probed by cavity-filling mutants. Proc Natl Acad Sci USA 86:8237–8241, 1989.
20. VGH Eijsink, G Vriend, JR van der Zee, B van den Burg, G Venema. Increasing the thermostability of the neutral proteinase of Bacillus stearothermophilus by improvement of internal hydrogen-bonding. Biochem J 285:625–628, 1992.
21. B van den Burg, BW Dijkstra, G Vriend, B van der Vinne, G Venema, VGH Eijsink. Protein stabilization by hydrophobic interactions at the surface. Eur J Biochem 220:981–985, 1994.
22. K Ishikawa, H Nakamura, K Morikawa, S Kanaya. Stabilisation of Escherichia coli ribonuclease HI by cavity-filling mutations within a hydrophobic core. Biochemistry 32:6171–6178, 1993.
23. M Matsumura, G Signor, BW Matthews. Substantial increase of protein stability by multiple disulfide bonds. Nature 342:291–293, 1989.
24. K Watanabe, T Masuda, H Ohashi, H Mihara, Y Suzuki. Multiple proline substitutions cumulatively thermostabilize Bacillus cereus ATCC 7064 oligo-1,6-glucosidase. Irrefragable proof supporting the proline rule. Eur J Biochem 226:277–283, 1994.
25. I Margarit, S Campagnoli, F Frigerio, G Grandi, V De Filippis, A Fontana. Cumulative stabilizing effects of glycine to alanine substitutions in Bacillus subtilis neutral protease. Protein Eng 5:543–550, 1992.
26. Y Suzuki, K Hatagaki, H Oda. A hyperthermostable pullulanase produced by an extreme thermophile, Bacillus flavocaldarius KP 1228, and evidence for the proline theory of increasing protein thermostability. Appl Microbiol Biotechnol 34:707–714, 1991.
27. BW Matthews, H Nicholson, WJ Becktel. Enhanced protein thermostability from site-directed mutations that decrease the entropy of unfolding. Proc Natl Acad Sci USA 84:6663–6667, 1987.
28. MJ Allen, PM Coutinho, CF Ford. Stabilization of Aspergillus awamori glucoamylase by proline substitution and combining stabilizing mutations. Protein Eng 11:783–788, 1998.
29. B van den Burg, G Vriend, OR Veltman, G Venema, VGH Eijsink. Engineering an enzyme to resist boiling. Proc Natl Acad Sci USA 95:2056–2060, 1998.
30. VGH Eijsink, OR Veltman, W Aukema, G Vriend, G Venema. Structural determinants of the stability of thermolysin-like proteinases. Nat Struct Biol 2:374–379, 1995.
31. B Dalhus, M Saarinen, UH Sauer, P Eklund, K Johansson, A Karlsson, S Ramaswamy, A Bjork, B Synstad, K Naterstad, R Sirevag, H Eklund. Structural basis for thermophilic protein stability: structures of thermophilic and mesophilic malate dehydrogenase. J Mol Biol 318:707–721, 2002.
32. G Saab-Rincón, VR Juárez, J Osuna, F Sánchez, X Soberón. Different strategies to recover the activity of monomeric triosephosphate isomerase by directed evolution. Protein Eng 14:149–155, 2001.
33. H Nicholson, WJ Becktel, BW Matthews. Enhanced protein thermostability from designed mutations that interact with α-helix dipoles. Nature 336:651–656, 1988.
34. L Serrano, AR Fersht. Capping and α-helix stability. Nature 342:296–299, 1989.
35. S Walter, B Hubner, U Hahn, FX Schmid. Destabilization of a protein helix by electrostatic interactions. J Mol Biol 252:133–143, 1995.
36. X-J Zang, WA Baase, BW Matthews. Multiple alanine replacements within α-helix. Protein Sci 1:761–776, 1992.
37. G Jones, P Willett, RC Glen, AR Leach, R Taylor. Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727–748, 1997.
38. DS Goodsell, GM Morris, AJ Olson. Automated docking of flexible ligands: applications of AutoDock. J Mol Recognit 9:1–5, 1996.
39. M Rarey, B Kramer, T Lengauer, G Klebe. A fast flexible docking method using an incremental construction algorithm. J Mol Biol 261:470–489, 1996.
40. RD Wind, JC Uitdehaag, RM Buitelaar, BW Dijkstra, L Dijkhuizen. Engineering of cyclodextrin product specificity and pH optima of the thermostable cyclodextrin glycosyltransferase from Thermoanaerobacterium thermosulfurigenes EM1. J Biol Chem 273:5771–5779, 1998.
41. D Rotticci, JC Rotticci-Mulder, S Denman, T Norin, K Hult. Improved enantioselectivity of a lipase by rational protein engineering. Chem Biol Chem 2:766–770, 2001.
42. BI Dahiyat. In silico design for protein stabilization. Curr Opin Biotechnol 10:387–390, 1999.
43. SM Malakauskas, SL Mayo. Design, structure and stability of a hyperthermophilic protein variant. Nat Struct Biol 5:470–475, 1998.
44. JC Moore, FH Arnold. Directed evolution of a para-nitrobenzyl esterase for aqueous-organic solvents. Nat Biotechnol 14:458–467, 1996.
2 Sequence Families and Modular Organization of Carbohydrate-Active Enzymes*
Bernard Henrissat and Emeline Deleury
Centre National de la Recherche Scientifique (CNRS), Universités d'Aix-Marseille I and II, Marseille, France
Pedro M. Coutinho†
Instituto Superior Técnico, Lisbon, Portugal
Gideon J. Davies
University of York, York, England
1 INTRODUCTION
Carbohydrates, in the form of oligo- and polysaccharides, are universally found in nature. These compounds are elaborated from simple sugars by glycosyltransferases (GTs) and are degraded by glycoside hydrolases (GHs) and polysaccharide lyases (PLs). In this work, these biosynthetic and degradative enzymes are referred to as "carbohydrate-active enzymes." Large amounts of polysaccharide-based compounds are biosynthesized each year on Earth (for cellulose alone this amounts to over 10^9 t/year), mostly from photosynthesis. Because of this abundance, carbohydrate-based materials have long found applications as raw materials. Today, they are used in various industries, such as food, feed, paper, detergent, and textile, where there is a large scope for application of degradative enzymes to improve the properties of carbohydrate-based materials or to achieve their degradation into simple and fermentable sugars. In other words, carbohydrate-active enzymes are key enzymes for the clean processing of abundant and useful renewable resources. Because of the many possible isomers of a simple monosaccharide, there is an enormous chemical diversity of structures in oligo- and polysaccharides (1). In addition, carbohydrate structures are sometimes "decorated" by noncarbohydrate substituents such as various esters. Finally, chemically simple homopolysaccharides, such as cellulose, chitin, or starch, display extensive physical diversity and heterogeneity depending on the source, due to differences in aggregation state or crystallinity. The physical state of the substrate has severe consequences: for example, some enzymes are able to degrade the noncrystalline part only, while others can affect the crystalline part of the substrate (2). This immense chemical and physical diversity requires a corresponding level of diversity in enzymes for the selective biosynthesis and biodegradation of carbohydrate structures. The diversity of these enzymes has long been a source of problems in their classification.

* Dedicated to the memory of Martin Schülein.
† Current affiliation: Centre National de la Recherche Scientifique (CNRS), Universités d'Aix-Marseille I and II, Marseille, France

2 THE CLASSIFICATION OF CARBOHYDRATE-ACTIVE ENZYMES
Several criteria can be envisioned for the classification of carbohydrate-active enzymes. The simplest form of classification is based on their substrate specificities. Such a classification is the basis of the recommendations of the International Union of Biochemistry and Molecular Biology (IUBMB) (3) and is expressed in the EC number for a given enzyme. O-Glycoside hydrolases are given the code EC 3.2.1.x, where the last number represents the substrate specificity. For example, β-glucosidase is EC 3.2.1.21, while β-galactosidase is EC 3.2.1.23. The advantage of this system is its simplicity, which has led to its widespread usage. The intrinsic problem with a classification based on substrate (or product) specificity is that it does not appropriately accommodate enzymes that act on several substrates. This is particularly relevant to glycosidases,
which work on the highly complex polysaccharides and which frequently display broad, overlapping specificities. For instance, endoglucanases, while typically considered to be cellulases, are also active to various degrees on xylan, xyloglucan, β-glucan, and various artificial substrates. Recently, there have been many structural results on glycoside hydrolases, but the classification based on substrate specificity fails to reflect the three-dimensional structural features of these enzymes, which are now becoming apparent. In another example, the IUBMB classification assigns cyclodextrin glucanotransferases (EC 2.4.1.19) and starch branching enzymes (EC 2.4.1.18) to the transferase class and does not reflect the clear structural, evolutionary, and mechanistic relationship of these enzymes with a large family of starch-hydrolysing enzymes (4). Similarly, myrosinase, an enzyme hydrolysing a particular series of S-glucosides, is classified as EC 3.2.3.1 (thio-glucosidase) and yet has a sequence, molecular mechanism, and 3-D structure strikingly similar to those of an O-β-glucosidase (EC 3.2.1.21) (5,6). Conversely, many structurally unrelated enzymes, such as endoglucanases, display a similar substrate specificity and hence an identical IUBMB classification (for a review, see Ref. 7). Finally, with the flood of sequence data originating from molecular biology and recently from genome sequencing projects, it is now common to discover open reading frames which show similarities to known glycoside hydrolase sequences, but without knowledge of (or the means to readily determine) the substrate specificity. To circumvent the problems with the EC classification, we proposed a novel system in 1991 for the glycosidases (8). This system is based on a direct relationship between sequence and folding similarities (9). Consequently, a classification solely based on amino acid sequence similarities was proposed.
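To make the idea of a purely sequence-based grouping concrete, the following is a minimal Python sketch. It assumes that significant pairwise similarity has already been established by some alignment method and is supplied as a list of pairs. The function name, the toy identifiers, and the single-linkage rule (any chain of similarities joins two sequences) are illustrative assumptions, not the procedure used to build the published families.

```python
from collections import defaultdict


def group_into_families(sequence_ids, similar_pairs):
    """Group sequences so that any two linked by a chain of significant
    similarities fall in the same family (single linkage, union-find)."""
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent pointers to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    families = defaultdict(set)
    for s in sequence_ids:
        families[find(s)].add(s)
    return sorted(sorted(members) for members in families.values())


# Hypothetical toy input: five sequences and three significant similarities.
ids = ["seqA", "seqB", "seqC", "seqD", "seqE"]
pairs = [("seqA", "seqB"), ("seqB", "seqC"), ("seqD", "seqE")]
families = group_into_families(ids, pairs)
print(families)
```

Note that substrate specificity plays no role in the grouping: seqA and seqC land in the same family through seqB even though they were never compared directly.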
It was anticipated that the system would prove useful with the fast-growing number of glycosidase genes being sequenced and with the increasing number of 3-D structures being solved. The basic principle behind this new classification system is simple: regardless of activity and substrate specificity, sequences which would display similarity would be grouped in the same family, while sequences displaying no apparent similarity would be assigned to different families (8). The 300 sequences of glycosidases and related enzymes available in 1991 were observed to form 35 families (8). The earliest feature that appeared from the sequence-based families is that many were "polyspecific," i.e., they contained enzymes of different substrate specificity (e.g., containing several EC numbers). The family that groups the largest number of EC numbers is family GH13 (also known as the α-amylase superfamily), which contains almost 30 enzyme specificities (for a review, see Ref. 10). A number of other families contain more than two distinct EC numbers, such as family GH1 (grouping eight EC numbers), family GH16 (seven EC numbers),
families GH3 and GH32 (six EC numbers), etc. Several of the "monospecific" families (i.e., containing only one EC number) could turn out to be "polyspecific" when all members are characterized at the biochemical level. The existence of a number of polyspecific families indicates: (1) that the acquisition of new specificities by glycosidases is a common evolutionary event, (2) that the substrate specificity of glycosidases could be engineered for application purposes, and (3) that the substrate (or product) specificity of a glycosidase is defined by details of the 3-D structure, not by the global fold.
Several consequences emerged from the new classification system. First, because sequence similarity is a strong indication of folding similarity, members of a given family were predicted to share the same fold. This would facilitate the homology modeling of other family members when the 3-D structure of one member is determined. This may also help the structural biologist to choose a target for structural investigation, because there is more potential for discovery in solving a potentially new structure than one that could be predicted. Second, because the catalytic apparatus is expected to be conserved within each family, an important outcome of the new classification system is the ability to locate the potential active site residues within a family based on the identification of appropriate invariant residues or based on the prior experimental identification of a catalytic residue. Equally important was the observation of Gebler et al. that the molecular mechanism is strictly conserved within a given glycosidase family (11). Fig. 1 shows the two primary mechanisms of glycosidases. With no exception to date, all the active members of a given family operate by using the same mechanism.
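The search for invariant residues mentioned above can be illustrated with a short Python sketch. It assumes that the family members have already been aligned (equal-length strings with "-" for gaps). The function name and the toy alignment are hypothetical, and the final filter on Asp/Glu reflects the general observation that glycosidase catalytic residues are usually carboxylates, which is an assumption added here rather than a statement from the text.

```python
def invariant_positions(aligned_seqs, gap="-"):
    """Return (column_index, residue) for every alignment column in which
    all sequences carry the same non-gap residue."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(length):
        column = {s[i] for s in aligned_seqs}
        if len(column) == 1 and gap not in column:
            result.append((i, aligned_seqs[0][i]))
    return result


# Hypothetical toy alignment of three family members.
alignment = [
    "MKE-LNEP",
    "MRETLNEP",
    "MKEALDEP",
]
invariant = invariant_positions(alignment)

# Assumption: candidate catalytic residues are invariant Asp (D) or Glu (E).
candidates = [(i, aa) for i, aa in invariant if aa in "DE"]
print(invariant)
print(candidates)
```

Invariance alone only nominates candidates; as the text notes, experimental identification of a catalytic residue in one member is what anchors the prediction for the rest of the family.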
It is worth mentioning here that some family members lost their catalytic machinery and assumed new roles (amino acid transporters, lectins, signaling sensors, inhibitors, etc.), pointing to the versatility of the glycosidase scaffolds for the development of new functionalities. The number of glycosidase families (35 in 1991), being necessarily a mere consequence of the size of the initial sequence sample, was predicted to increase as more sequences became available (8). The exponential growth in sequences in public databases indeed led to a steady growth in the number of GH families, with 300 sequences and 35 families in 1991 (8), 480 sequences and 45 families in 1993 (12), 2800 sequences in 74 families in April 1999 (13), and 6500 sequences in 87 families as of May 2002. Given the success of the new classification system of glycosidases, which eventually became a standard description of this category of enzymes, a similar strategy was applied to other carbohydrate-active enzymes, such as the glycosyltransferases (14). Again, most features of the glycosidase classification were found. In particular, several families were found to contain
Figure 1 The two canonical mechanisms of glycosidases (shown here for the hydrolysis of an equatorial glycosidic bond): (a) the retaining mechanism, (b) the inverting mechanism.
enzymes of varying donor and acceptor specificity. For instance, families GT1 and GT2 each contain eight characterized enzyme specificities. Because the biochemical characterization of glycosyltransferases is notoriously difficult, only a very small number of them are characterized, and it is more than likely that each of these families actually contains dozens of other enzyme specificities. Here the power of the sequence-based classification is maximal, considering that some glycosyltransferase families such as family GT2 contain over 1200 members, of which fewer than 10% have been characterized. More recently, the sequence-based classification was extended to the polysaccharide lyases and the carbohydrate esterases. In theory, other carbohydrate-active enzymes, such as the epimerases, could be subjected to a similar classification system.
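The growth figures quoted earlier in this section for the GH families (sequences and families in 1991, 1993, April 1999, and May 2002) can be put side by side with a few lines of Python. The numbers are those given in the text; the variable and function names are illustrative only.

```python
# (date, number of sequences, number of GH families), as quoted in the text.
growth = [
    ("1991", 300, 35),
    ("1993", 480, 45),
    ("April 1999", 2800, 74),
    ("May 2002", 6500, 87),
]


def sequences_per_family(records):
    """Average number of sequences per family at each time point."""
    return [(label, round(n_seq / n_fam, 1)) for label, n_seq, n_fam in records]


averages = sequences_per_family(growth)

# Fold increase between the first and last time points.
sequence_fold = round(growth[-1][1] / growth[0][1], 1)
family_fold = round(growth[-1][2] / growth[0][2], 1)

for label, avg in averages:
    print(f"{label}: {avg} sequences per family")
print(f"sequences grew {sequence_fold}-fold, families {family_fold}-fold")
```

The contrast between the two fold increases shows that most new sequences fall into existing families, which is what gives family membership its predictive value.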
3 THE CARBOHYDRATE-ACTIVE ENZYMES SERVER
The main issues in maintaining carbohydrate-active family classifications are as follows: (1) how to make them available to the scientific community, (2) how to disclose the new families and new family members, and (3) how to keep up with the ever-increasing number of sequences. The World Wide Web is obviously the best medium for the distribution of family classifications. For carbohydrate-active enzymes, the first milestone was achieved in 1996 with the availability of the GH classification on ExPASy (http://www.expasy.ch/cgi-bin/lists?glycosid.txt) (15). However, this useful document contained annotated SwissProt entries only, thereby missing a large number of entries already available in GenBank as well as the information on three-dimensional structures in the Protein Data Bank (PDB). Other drawbacks were as follows: (1) the irregular updates, (2) the impossibility of performing family-by-family browsing, and (3) the unavailability of family classifications of other carbohydrate-active enzymes such as glycosyltransferases. To overcome some of these problems, we created the Carbohydrate Active enZYmes server (CAZy, http://afmb.cnrs-mrs.fr/CAZY/index.html) to provide access to the classifications of GHs, GTs, and PLs in families based on sequence similarities (13). CAZy grants access to the various families of carbohydrate-active enzymes. Each family is annotated with information regarding all the enzyme activities that have been characterized and with the known catalytic and structural features. This summary is followed by a list of proteins and open reading frames (ORFs) belonging to the family with links to sequence and structural information available in public databases. Links to complementary relevant resources available on the Internet are also provided. An example is shown in Fig. 2.
Because information on the repertoire of carbohydrate-active enzymes present in a given organism can provide interesting insights into its carbohydrate metabolism (16), a new feature was recently added where the user can access CAZy via organism (for organisms whose genome has been completely sequenced; see the example in Fig. 3). Based on a relational database, CAZy provides curated nonredundant sequence and structural information on carbohydrate-active enzyme families to the academic and commercial research communities. As of May 2002, the CAZy database contained over 12,500 proteins and ORFs belonging to more than 3200 organisms. The proteins are arranged in 200 families and cover 180 EC numbers (note that many enzyme specificities are not covered by the EC numbers). The CAZy web site features 280 HTML pages with almost 49,000 external links. The carbohydrate-active enzyme content of 53 complete genomes is available. Over half a million pages have been downloaded from CAZy externally since its launch in September 1998. The server is regularly updated, generally at least once a month. During this period, the number of single entries covered by CAZy increased fourfold!
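How a relational layout supports both family-by-family and organism-by-organism access can be sketched with Python's built-in sqlite3 module. The table names, columns, organism name, and ORF labels below are invented for illustration and are not the actual CAZy schema or data, which the text does not describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE family   (name TEXT PRIMARY KEY, enzyme_class TEXT);
CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE protein  (
    id          INTEGER PRIMARY KEY,
    label       TEXT,
    organism_id INTEGER REFERENCES organism(id),
    family_name TEXT REFERENCES family(name)
);
""")

# Hypothetical toy rows; real entries would link to sequence and structure databases.
conn.executemany(
    "INSERT INTO family VALUES (?, ?)",
    [("GH13", "GH"), ("GH5", "GH"), ("GT2", "GT")],
)
conn.execute("INSERT INTO organism (id, name) VALUES (1, 'Organism X')")
conn.executemany(
    "INSERT INTO protein (label, organism_id, family_name) VALUES (?, ?, ?)",
    [("orf_0001", 1, "GH13"), ("orf_0002", 1, "GH13"),
     ("orf_0003", 1, "GH5"), ("orf_0004", 1, "GT2")],
)


def families_for_organism(connection, organism_name):
    """Count the proteins of one organism in each family (access by organism)."""
    return connection.execute(
        """
        SELECT p.family_name, COUNT(*)
        FROM protein AS p
        JOIN organism AS o ON o.id = p.organism_id
        WHERE o.name = ?
        GROUP BY p.family_name
        ORDER BY p.family_name
        """,
        (organism_name,),
    ).fetchall()


rows = families_for_organism(conn, "Organism X")
print(rows)
```

Because each protein row points to exactly one family and one organism, the same table answers both kinds of query, which is one plausible reason for choosing a relational design over static lists.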
Figure 2 Example of a CAZy page: family GH13 of the glycosidases. The header is a summary of what is known in this family. The "known activities" field shows that no fewer than 19 different enzyme activities have been experimentally identified in this family. Other fields indicate, for example, that the molecular mechanism leads to overall retention of the anomeric configuration and that the catalytic residues have been identified. The "statistics" field shows, among other data, that there were 942 members in the family (as of 13 May 2002) and that 34 have had their 3-D structure solved (resulting in a total of 119 PDB files). After the header, a listing (as complete as possible) of the proteins and ORFs assigned to this family is given with links to protein, nucleotide, enzyme classification, and structure databases.
4 THE PREDICTIVE POWER OF THE NEW CLASSIFICATION SYSTEM
As stated earlier, the mechanism, catalytic residues, and fold are conserved within each family. In consequence, it is now only necessary to determine the stereochemical outcome or to identify the catalytic apparatus of one family member, as this information can be readily extended to all members of the
Figure 4 Molecular mechanism in the families of glycosidases (May 2002). The families which operate with a mechanism leading to overall retention of the anomeric configuration are indicated in black on a gray background. The families which act with an inverting mechanism are indicated in white on a black background. Those families for which the mechanism remains to be established are presented in gray on a white background.
family, and those missing the catalytic machinery can be identified (and corrected if sequencing errors are the cause (17)). The molecular mechanism is now known for 64 of the 87 families of glycosidases (Fig. 4). When the classification was introduced in 1991, only a handful of families had a structural representative. As expected from the relationship between sequence and structure, the recent accumulation of structural data for glycosidases (over 1600 PDB entries are listed in CAZy as of May 2002) confirmed that enzymes belonging to the same family indeed had a similar fold. More unexpected (and exciting) was the astonishing number of different
Figure 3 Access to carbohydrate-active enzymes by completely sequenced organism. (Top) The 62 organisms available from CAZy as of 13 May 2002. (Bottom) Results page for Thermotoga maritima.
folds displayed by glycosidases (from all α to all β, (β/α)8-, (α/α)6- and (α/α)7-barrels, jelly-rolls, β-propellers, β-barrels, β-helix, etc.) (7,18,19), a diversity largely exceeding the level known for esterases or peptidases, for example. Fig. 5 shows that about half of the glycosidase families have at least one structural representative (May 2002). It is possible that other folds are yet to be discovered in the families whose structure is still unsolved. The clans of glycosidases (7,15,18) group together families sharing a common ancestor. When two proteins have related sequences, their 3-D structures are related. However, the converse is not true, because 3-D structures are better conserved than sequences. Sometimes, one can predict that several families will fold similarly, usually by increasing the sensitivity of sequence comparison methods (see, e.g., Ref. 20). Because this implies detecting relatedness at the borderline of significance, structure determination is clearly the method of choice to unambiguously establish that some of the sequence-based families are related. By analogy to the
Figure 5 3-D structures in the families of glycosidases (May 2002). The families for which at least one 3-D structure has been deposited in the Protein Data Bank appear in white on a black background; those for which crystallization notes have been published appear in black on a pale gray background. Those for which there is no 3D structural data available are shown in gray on a white background.
proteinase work (21) and to avoid the confusion associated with the term "superfamily," it was proposed that these groupings of related structures be referred to as "clans" (7,15,18). Thus far, 12 such clans have been described (an updated list of these clans can be found in the CAZy server). The largest of these, glycoside hydrolase clan GH-A, is composed of families 1, 2, 5, 10, 17, 26, 35, 39, 42, 51, 53, 59, 72, 79, and 86. What are the characteristics of a clan? Besides a common fold, the clans group families of enzymes sharing an identical catalytic machinery (identical residues on equivalent secondary structure elements) and hence an identical molecular mechanism. There is residual sequence similarity, sometimes detectable but too low to produce a global alignment (only two residues are invariant in clan GH-A, for instance). Finally, there is a topological resemblance of the substrates (orientation of the glycosidic bond; see Fig. 6). These features make the clans different from the "folding/structural superfamilies," which group together proteins sharing the same fold and whose common origin is hard to demonstrate (22). A useful classification must have predictive power. For instance, membership of a folding superfamily does not necessarily predict the details of the substrate, the mechanism, the catalytic amino acids, or the possibility of side reactions. The power of the sequence-based classification stems from the fact that family or clan membership predicts the enzyme structure and the configuration of both substrate and product. By contrast, folding superfamilies sometimes group enzymes operating with different mechanisms, or sometimes even enzymes performing totally unrelated chemical reactions (22).
Figure 6 Topological resemblance of the substrates hydrolyzed by glycosidase clan GH-A members. β-D- and α-L-hexosides and pentosides all share an identical reactive center (traced in black) with an equatorial glycosidic bond. An identical catalytic machinery can cleave these apparently dissimilar substrates.
5 CARBOHYDRATE-ACTIVE ENZYMES IN THE ERA OF GENOMICS
A total of 85 organisms have had their genome fully sequenced (May 2002), and the sequencing of over 350 genomes is currently in progress (Genomes OnLine Database; http://wit.integratedgenomics.com/GOLD/). The predictive power of the sequence-based families of carbohydrate-active enzymes provides an efficient tool for the competent annotation (e.g., prediction of the function, fold, and mechanism) of ORFs found during genome sequencing. With the availability of a number of completely sequenced genomes, one can also search and make a census of all carbohydrate-active enzymes contained in a genome. As soon as this is performed for several genomes, the complement of carbohydrate-active enzymes within different genomes can be compared. However, before entering into these considerations, we must examine
Figure 7 Schematic structure of selected proteins containing a carbohydrate-binding module of family CBM2. GHX, module belonging to glycosidase family X; PLX, module belonging to polysaccharide lyase family X; CEX, module belonging to carbohydrate esterase family X; CBMX, module belonging to carbohydrate-binding module family X; FN3, modules distantly related to eukaryotic fibronectin type III modules; X, modules of unknown function with homologues in the databases; unlabeled gray boxes represent regions not yet assigned; TM, membrane-spanning region. (a) endo-1,4-glucanase (Acidothermus cellulolyticus); (b) endo-1,4-glucanase (Streptomyces lividans); (c) cellulase Cel6B (Cellulomonas fimi); (d) cellulase Cel6A (C. fimi); (e) cellulase Cel9A (C. fimi); (f) ORF Slr0897 (Synechocystis sp.); (g) xylanase Xyn10A (C. fimi); (h) xylanase Xyn10A (Pseudomonas cellulosa); (i) xylanase B (S. lividans); (j) endo-1,4-glucanase (S. lividans); (k) chitinase (Bacillus thuringiensis); (l) chitinase C (Streptomyces coelicolor); (m) cellulase Cel45A (P. cellulosa); (n) cellulase Cel48A (C. fimi); (o) cellulase Cel48A (Thermobifida fusca); (p) arabinofuranosidase C (P. cellulosa); (q) ORF SC5C7.30c (S. coelicolor); (r) ORF Rv1987 (Mycobacterium tuberculosis); (s) pectate lyase Pel10A (P. cellulosa); (t) rhamnogalacturonan lyase Rgl11A (P. cellulosa); (u) esterase D (P. cellulosa); (v) acetyl xylan esterase STX-III (Streptomyces thermoviolaceus); (w) xylanase Xyn11A (C. fimi); (x) chitin-binding protein celS2 (Streptomyces viridosporus).
one particular feature of crucial importance for the genomic analysis of carbohydrate-active enzymes: the modularity of these proteins. Many carbohydrate-active enzymes are modular, consisting of one or more catalytic domains carrying one or several noncatalytic domains. The noncatalytic modules often have a function in carbohydrate binding (23), but sometimes their function is to promote protein–protein interaction, such as the dockerin modules implicated in the assembly of the multisubunit cellulosomes (24). In many cases, additional modules have been inferred by sequence analysis only, and their function remains to be studied and described (23). The noncatalytic modules also form distinct families, and those whose function has been shown to be carbohydrate binding can now also be readily accessed through the CAZy server (http://afmb.cnrs-mrs.fr/CAZY/CBM.html). A particular feature of noncatalytic modules (whether carbohydrate-binding or not) is that they can be attached to many different types of catalytic domains (Figs. 7, 8, and 9). It is essential to understand and dissect the modularity of any given ORF prior to annotation, classification, or exploitation. Failure to appreciate this modularity is the cause of many incorrect genome annotations, such as the labeling of certain Arabidopsis ORFs as "β-1,3-glucanase-like" when they are merely small noncatalytic modules that most likely bind β-1,3-glucans (25). It is also interesting to note that protein modularity is not restricted to glycoside hydrolases, and that a number of modular glycosyltransferases, carbohydrate esterases, and polysaccharide lyases have been identified (Figs. 7, 8, and 9). In summary, two major problems associated with carbohydrate-active enzymes must be appropriately dealt with to avoid erroneous genome annotations: (1) the modularity of carbohydrate-active enzymes and (2) the polyspecificity of the families.
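A module-aware annotation step of the kind argued for above might look like the following Python sketch. It assumes that an ORF has already been dissected into an ordered list of module labels using the GHX, GTX, PLX, CEX, and CBMX notation of Figs. 7 to 9. The function name, the output wording, and the example architectures are hypothetical.

```python
# Two-letter prefixes of the catalytic module classes and their names.
CLASS_NAMES = {
    "GH": "glycosidase",
    "GT": "glycosyltransferase",
    "PL": "polysaccharide lyase",
    "CE": "carbohydrate esterase",
}


def annotate_orf(modules):
    """Build a cautious annotation from the module architecture of an ORF.

    Catalytic modules are reported by family membership only (no substrate
    is guessed); an ORF without any catalytic module is never called an enzyme.
    """
    catalytic = [m for m in modules if m[:2] in CLASS_NAMES]
    binding = [m for m in modules if m.startswith("CBM")]
    if catalytic:
        parts = [f"member of {CLASS_NAMES[m[:2]]} family {m}" for m in catalytic]
    else:
        parts = ["noncatalytic protein"]
    if binding:
        parts.append("carbohydrate-binding module(s): " + ", ".join(binding))
    return "; ".join(parts)


# Hypothetical architectures, written N- to C-terminus.
print(annotate_orf(["GH5", "CBM2"]))
print(annotate_orf(["CBM13"]))
print(annotate_orf(["GH10", "X", "CE1"]))
```

Reporting family membership instead of a specific substrate addresses polyspecificity, and checking for a catalytic module before calling something an enzyme addresses modularity.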
Failure to take these aspects into consideration can lead to:
Wrong assignment. This frequently happens in the case of modular proteins (see above).
Overprediction. For instance, only a very few residues sometimes switch the specificity of a glycosyltransferase. In an extreme case, it has been shown that a mutation of a single residue could change an α-1,3-GalNAc transferase into an α-1,3-galactosyltransferase (26). When an ORF is distantly related to a largely polyspecific family, or to a family where only a very few members have been characterized, the precise substrate specificity cannot be reliably predicted. Conversely, a good fit between an ORF and many members of a large monospecific family (where many members have been characterized) allows a more confident prediction of the specificity. Whenever possible, additional features should be examined, for instance, the presence of the catalytic residues. In a small but significant number of cases,
Figure 8 Schematic structure of selected proteins containing a carbohydrate-binding module of family CBM13. GHX, module belonging to glycosidase family X; GTX, module belonging to glycosyltransferase family X; PLX, module belonging to polysaccharide lyase family X; CEX, module belonging to carbohydrate esterase family X; CBMX, module belonging to carbohydrate-binding module family X; unlabeled gray boxes represent regions not yet assigned; RIP, ribosome inactivating protein; KIN, kinase; S2A and M12A, peptidases belonging to families S2A and M12A. (a) cinnamomin (Cinnamomum camphora); (b) polypeptide GalNAc transferase T1 (human); (c) ORF Rv1419 (Mycobacterium tuberculosis); (d) β-1,3-glucanase I (Oerskovia xanthineolytica); (e) β-1,3-glucanase II (O. xanthineolytica); (f) ORF 6 (Polyangium cellulosum); (g) pectate lyase B (Pseudoalteromonas haloplanktis); (h) ORF CAP0120 (Clostridium acetobutylicum); (i) ORF CAC0706 (C. acetobutylicum); (j) ORF CAP0071 (C. acetobutylicum); (k) α-galactosidase (Aspergillus niger); (l) xylanase (Streptomyces olivaceoviridis); (m) ORF SCD69.08 (Streptomyces coelicolor); (n) chitinase Chi35 (Streptomyces thermoviolaceus); (o) arabinofuranosidase B (S. coelicolor); (p) serine protease (Rarobacter faecitabidus); (q) protease (Chryseobacterium meningosepticum).
Figure 9 Schematic structure of selected proteins containing a bacterial dockerin module. DOC1, bacterial dockerin module; COH, bacterial cohesin module; GHX, module belonging to glycosidase family X; PLX, module belonging to polysaccharide lyase family X; CEX, module belonging to carbohydrate esterase family X; CBMX, module belonging to carbohydrate-binding module family X; X, modules of unknown function with homologues in the databases; unlabeled gray boxes represent regions not yet assigned. (a) ORF CAC0912 (Clostridium acetobutylicum); (b) mannanase ManK (Clostridium cellulolyticum); (c) endo-1,4-glucanase C (C. cellulolyticum); (d) scaffoldin CipV (Acetivibrio cellulolyticus); (e) cellulase CelE (C. cellulolyticum); (f) xylanase C (Clostridium thermocellum); (g) xylanase Y/feruloyl esterase (C. thermocellum); (h) xylanase A (C. thermocellum); (i) lichenase B (C. thermocellum); (j) xylanase/lichenase D (Ruminococcus flavefaciens); (k) chitinase (C. thermocellum); (l) β-mannanase Man26B (C. thermocellum); (m) cellulase/mannanase Cel26A-Cel5E (C. thermocellum); (n) α-galactosidase (Clostridium josuii); (o) cellulase CelF (C. cellulolyticum); (p) cellulase CelJ (C. thermocellum); (q) ORF CAC0919 (C. acetobutylicum); (r) pectate lyase A (Clostridium cellulovorans); (s) ORF Y-P (C. cellulolyticum); (t) xylanase B (R. flavefaciens).
members of a glycosidase family appear to lack the catalytic residues identified in other members. Aside from the always possible sequencing error, other reasons exist, such as the evolution of noncatalytic proteins from enzymes by loss of the catalytic machinery (see, for instance, Ref. 25).
Underprediction. Because of the problems cited above, annotators sometimes take the opposite extreme, and it is not infrequent to find uninformative annotations such as "putative sugar hydrolase." Here the inherent characteristics of the sequence-based families could improve annotation because, e.g., one could predict whether this is a noncatalytic protein or an enzyme, and if so, whether it operates with retention or inversion of the anomeric configuration or whether it would hydrolyze an axial or an equatorial glycosidic bond. Again, if the ORF to annotate is strongly related to a large monospecific and well-characterized family, then a precise annotation becomes possible. In doubtful cases, an annotation such as "member of glycosidase family GH5" (for example) would be much more informative than "putative sugar hydrolase."
In our efforts to update and maintain the CAZy server (where the family assignments are based on catalytic modules), we started to examine the carbohydrate-active enzyme content of genomes. Some global results are given as follows.
5.1 Eukaryotes
As of January 2002, five eukaryotic genomes were available: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, and human. Of these organisms, Arabidopsis has, by far, the largest number of carbohydrate-active enzymes, with 386 GHs and 411 GTs (16). The total (almost 800) exceeds 3% of the coding regions of the genome. In comparison, S. cerevisiae, D. melanogaster, and C. elegans have only ~90, ~230, and ~320 ORFs, respectively, coding for proteins related to glycosidases and glycosyltransferases. The human genome, with about 350 of these proteins, is not much more impressive than that of the nematode C. elegans.
5.2 Archaea
Twelve archaeal genomes were available at the time of writing. One surprising finding was that the genomes of three of them (Aeropyrum pernix, Archaeoglobus fulgidus, and Methanobacterium thermoautotrophicum) appear to completely lack glycosidases (27). This puzzling observation suggests any of the following possibilities: (1) that the metabolism of these organisms does not involve the degradation of glycosides, (2) that these organisms have developed an alternative chemistry to perform this reaction, (3) that these organisms have glycosidases which are so different from the known ones that
they have not been identified, or (4) perhaps that these organisms rely on other organisms for the hydrolysis of glycosidic bonds. The last three possibilities are unlikely because these three Archaea do not grow on sugars as a carbon source. The nine other Archaea examined do have glycosidases, but these were clearly acquired from hyperthermophilic bacteria by horizontal transfer. Therefore, it is tempting to speculate that early Archaea developed before the emergence of metabolic pathways involving the degradation of glycosidic bonds.
5.3 Bacteria
Regardless of the size of their genomes, all free-living bacteria have about 1–2% of their coding regions dedicated to glycosidases and glycosyltransferases. The only outlier is Thermotoga maritima, whose genome contains about 3% glycosidases and glycosyltransferases. It is interesting to note that a large number of the glycosidases of T. maritima are involved in plant cell wall degradation. Bacteria which only grow as parasites or pathogens of eukaryotic cells have a much reduced content of glycosidases (Helicobacter pylori, Mycobacterium leprae, Neisseria meningitidis) or sometimes show no glycosidase at all (for instance, Campylobacter jejuni), illustrating the loss of complete metabolic pathways in parasitic organisms. The discovery potential of genomic research in the search for enzymes is considerable, and one may even find useful enzymes in organisms which apparently do not express the desired activity. A striking example is the presence of the complete operon for making cellulose in the genomes of Escherichia coli and Salmonella typhimurium. These bacteria were long regarded as unable to biosynthesize cellulose, and it is only recently that researchers have found that they can indeed synthesize cellulose under appropriate conditions (28).
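The genome-level shares quoted throughout this section (about 3% for Arabidopsis, 1–2% for free-living bacteria) follow from a simple census calculation, sketched below in Python. The GH and GT counts for Arabidopsis are those given in Section 5.1. The total number of predicted genes (25,500) is an assumed round figure that does not appear in the text, and the function name is illustrative.

```python
def census_share(n_gh, n_gt, n_orfs):
    """Return the combined GH + GT count and its percentage of all ORFs."""
    total = n_gh + n_gt
    return total, 100.0 * total / n_orfs


# Counts from the text (Ref. 16); the ORF total is an assumption, not from the text.
ARABIDOPSIS_GH = 386
ARABIDOPSIS_GT = 411
ARABIDOPSIS_ORFS_ASSUMED = 25_500

total, percent = census_share(ARABIDOPSIS_GH, ARABIDOPSIS_GT, ARABIDOPSIS_ORFS_ASSUMED)
print(f"{total} GHs + GTs, {percent:.2f}% of predicted genes")
```

With that assumed gene count the share comes out just above 3%, consistent with the statement in Section 5.1; the same function applies unchanged to any genome for which family assignments have been counted.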
6 CONCLUSION
The increasing insight provided by the sequence families of carbohydrate-active enzymes is breaking up the traditional EC class system. Because the sequence-based system allows the inference of structural and mechanistic relationships between enzymes of differing substrate specificity, it paves the way for protein engineering, directed evolution, and the development of new functionalities on common and stable scaffolds. It is clear that other enzyme systems can benefit from a similar approach, and an excellent example is provided by the proteolytic enzymes. A catalogue and a structure-based classification of these enzymes is readily available from the MEROPS database (29). In the genomic era, such structure-based classification systems provide the best possible tools for the appropriate annotation of genome data.
ACKNOWLEDGMENTS
The authors are particularly grateful to Amos Bairoch (Switzerland), James A. Campbell (Australia), and R. Antony Warren (Canada) for many useful discussions throughout the years.
REFERENCES
1. RA Laine. A calculation of all possible oligosaccharide isomers both branched and linear yields 1.05 × 10^12 structures for a reducing hexasaccharide: the Isomer Barrier to development of single-method saccharide sequencing or synthesis systems. Glycobiology 4:759–767, 1994.
2. TT Teeri. Crystalline cellulose degradation: new insight into the function of cellobiohydrolases. Trends Biotechnol 15:160–167, 1997.
3. Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes. San Diego, CA: Academic Press, 1992.
4. HM Jespersen, EA MacGregor, B Henrissat, MR Sierks, B Svensson. Starch- and glycogen-debranching and branching enzymes: prediction of structural features of the catalytic (beta/alpha)8-barrel domain and evolutionary relationship to other amylolytic enzymes. J Protein Chem 12:791–805, 1993.
5. WP Burmeister, S Cottaz, H Driguez, R Iori, S Palmieri, B Henrissat. The crystal structures of Sinapis alba myrosinase and a covalent glycosyl-enzyme intermediate provide insights into the substrate recognition and active-site machinery of an S-glycosidase. Structure 5:663–675, 1997.
6. WP Burmeister, S Cottaz, P Rollin, A Vasella, B Henrissat. High resolution X-ray crystallography shows that ascorbate is a cofactor for myrosinase and substitutes for the function of the catalytic base. J Biol Chem 275:39385–39393, 2000.
7. B Henrissat, G Davies. Structural and sequence-based classification of glycoside hydrolases. Curr Opin Struct Biol 7:637–644, 1997.
8. B Henrissat. A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem J 280:309–316, 1991.
9. C Chothia, AM Lesk. The relation between the divergence of sequence and the structure in proteins. EMBO J 5:823–826, 1986.
10. EA MacGregor, S Janecek, B Svensson. Relationship of sequence and structure to specificity in the α-amylase family of enzymes. Biochim Biophys Acta 1546:1–20, 2001.
11. J Gebler, NR Gilkes, M Claeyssens, DB Wilson, P Béguin, WW Wakarchuk, DG Kilburn, RC Miller Jr, RA Warren, SG Withers. Stereoselective hydrolysis catalyzed by related β-1,4-glucanases and β-1,4-xylanases. J Biol Chem 267:12559–12561, 1992.
12. B Henrissat, A Bairoch. New families in the classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem J 293:781–788, 1993.
13. PM Coutinho, B Henrissat. Carbohydrate-active enzymes: an integrated database approach. In: H Gilbert, G Davies, B Henrissat, B Svensson, eds. Recent Advances in Carbohydrate Bioengineering. Cambridge: The Royal Society of Chemistry, 1999, pp 3–12.
14. JA Campbell, GJ Davies, V Bulone, B Henrissat. A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem J 326:929–939, 1997.
15. B Henrissat, A Bairoch. Updating the sequence-based classification of glycosyl hydrolases. Biochem J 316:695–696, 1996.
16. B Henrissat, PM Coutinho, GJ Davies. A census of carbohydrate-active enzymes in the genome of Arabidopsis thaliana. Plant Mol Biol 47:55–72, 2001.
17. B Henrissat, PM Coutinho, PJ Reilly. Reading-frame shift in Saccharomyces glucoamylases restores catalytic base, extends sequence and improves alignment with other glucoamylases. Protein Eng 7:1281–1282, 1994.
18. G Davies, B Henrissat. Structures and mechanisms of glycosyl hydrolases. Structure 3:853–859, 1995.
19. Y Bourne, B Henrissat. Glycoside hydrolases and glycosyltransferases: families and functional modules. Curr Opin Struct Biol 11:593–600, 2001.
20. B Henrissat, I Callebaut, S Fabrega, P Lehn, JP Mornon, G Davies. Conserved catalytic machinery and the prediction of a common fold for several families of glycosyl hydrolases. Proc Natl Acad Sci USA 92:7090–7094, 1995.
21. ND Rawlings, AJ Barrett. Classification of peptidases. Methods Enzymol 244:1–15, 1994.
22. N Nagano, CT Porter, JM Thornton. The (β/α)8 glycosidases: sequence and structure analyses suggest distant evolutionary relationships. Protein Eng 14:845–855, 2001.
23. AB Boraston, BW McLean, JM Kormos, M Alam, NR Gilkes, CA Haynes, P Tomme, DG Kilburn, RAJ Warren. Carbohydrate-binding modules: diversity of structure and function. In: HJ Gilbert, GJ Davies, B Henrissat, B Svensson, eds. Recent Advances in Carbohydrate Bioengineering. Cambridge: The Royal Society of Chemistry, 1999, pp 202–211.
24. EA Bayer, H Chanzy, R Lamed, Y Shoham. Cellulose, cellulases and cellulosomes. Curr Opin Struct Biol 8:548–557, 1998.
25. B Henrissat, GJ Davies.
Glycoside hydrolases and glycosyltransferases : families, modules and implications for genomics. Plant Physiol 124:1515–1519, 2000. NO Seto, CA Compston, SV Evans, DR Bundle, SA Narang, MM Palcic. Donor substrate specificity of recombinant human blood group A, B and hybrid A/B glycosyltransferases expressed in Escherichia coli. Eur J Biochem 259:770–775, 1999. PM Coutinho, B Henrissat. Life with no sugars? J Mol Microbiol Biotechnol 1:307–308, 1999. X Zogaj, M Nimtz, M Rohde, W Bokranz, U Romling. The multicellular morphotypes of Salmonella typhimurium and Escherichia coli produce cellulose as the second component of the extracellular matrix. Mol Microbiol 39:1452– 1463, 2001. ND Rawlings, E O’Brien, AJ Barrett. MEROPS: the protease database. Nucleic Acids Res 30:343–346, 2002.
3
Analyzing Three-Dimensional Structures of Variant Enzymes
Richard Bott
Genencor International, Palo Alto, California, U.S.A.
1
INTRODUCTION
Proteins provide diverse functional capability to sustain life. It is assumed that every protein that is actively expressed within a cell has at least one functional and/or structural purpose such that it increases the ability of the organism to survive and reproduce. At a molecular level, proteins are polymers of different sequences of 20 naturally occurring amino acids. The specific sequence of a protein is determined by the nucleotide sequence of the gene encoding that protein in the genome of the organism. Proteins consist of one or more polypeptide chains that reproducibly fold into a specific tertiary structure. In the tertiary structure, amino acid side chains that are widely separated in their linear sequence are brought into close proximity. In functionally related proteins having a common ancestor and a similar function, the tertiary structure brings together the side chains of amino acids in highly conserved spatial juxtapositions (e.g., the catalytic triad found in serine proteases).
Proteins having such a conserved structural motif often share patterns of conserved amino acid sequence throughout their structures. The pattern of conserved sequence can range from nearly complete identity (>95%) to much more restricted conservation, limited to regions forming the catalytic site and substrate and/or cofactor-binding sites, which may represent as little as 20% of the overall sequence. These shared patterns of conservation for functionally related proteins seem to diverge in a manner that mirrors the presumed evolutionary relationships of the parent organisms. Trypsin-like serine proteases from mammals are more closely and more extensively conserved with one another than with the related serine proteases from bacteria. Although these structurally and functionally homologous proteins share a common mechanism of action, they have diverse specificity and stability profiles, which are the result of differences in the amino acid sequences encoded by their DNA. These differences somehow better adapt the proteins for functioning in particular organisms, each of which is in its own unique environment. Because the genetic information necessary to maintain the organism encodes the amino acid sequence, the amino acid sequence must in turn somehow determine the overall tertiary folding of the protein and thereby its function; it follows that modifying the DNA sequence encoding a particular protein will alter its structure, and hence its function. The development of recombinant DNA technology allowed the manipulation of the genetic sequence to alter the coding for one or more amino acids in a site-specific manner. This enabled altering the amino acid sequence of a protein with the aim of probing the specific functional roles of particular amino acid side chains, or of engineering the protein toward a commercial function. This technique has been highly fruitful for both pursuits. 
The selective replacement of purported functionally important residues with an accompanying loss of function is regarded as the definitive proof of the role of a particular side chain. There are now numerous examples of engineered enzymes, such as proteases, amylases, cellulases, and lipases, where one to a few amino acid changes have resulted in enzymes that are superior to naturally occurring enzymes. Such engineered proteins have been used in commercial applications. There are also numerous cases of protein engineering where the substituted side chains did not result in the desired effect. Indeed, it is very often the case that site-specific substitutions are neither beneficial nor deleterious, resulting in little or no change in the properties being measured for a particular protein. Such changes are regarded as neutral substitutions. Now that the entire genomes of several species are known, it is also possible to follow evolutionary drift in the sequences of related proteins performing a similar function. The class of trypsin-like serine proteases provides a well-documented case study. The three-dimensional structure of serine proteinases of the trypsin class is conserved even in the presence of deletions or
insertions of large segments of amino acid sequence. Structural studies have shown that in regions where these proteins share a common overall tertiary fold, the sequences can diverge by more than 50%. From this observation, it became clear that the overall tertiary fold is much more highly conserved than the amino acid sequence. Given this insight, it follows that the tertiary fold of the protein is tolerant of single amino acid substitutions, provided that these substitutions do not replace one of the critical side chains involved in the catalytic machinery, or a residue that is crucial for a specific recognition process. This would, of course, be a necessary corollary to any postulate that proteins evolved by the slow accumulation of amino acid substitutions that gradually allowed function to diverge to the point of having two enzymes with different functionality. Functionality must have been sufficiently maintained during the accumulation of numerous substitutions that gradually altered the function through a series of compensating and synergistic changes. Natural proteins that evolved by this process would have tertiary folds that are, to a large extent, forgiving and, at the same time, robust. From this line of reasoning, protein engineering would have an inherent potential for success. A rationale based on the best available understanding of the relationship between tertiary structure and function would provide a means to select sites that would produce immediate and, hopefully, beneficial changes toward the desired function or property (e.g., enzyme activity or stability). This process, which focuses on productive rather than random accumulation of changes, should result in accelerated evolution of a desired function in a protein. Such a process of rational or “directed” mutagenesis should be superior to the random walk accumulation of changes over time occurring in natural evolution. 
In both cases, these changes would most likely be tolerated without a major conformational change. Thus, it is possible to model probable changes based on the native structure. Several functional characteristics have emerged as not only being desirable, but also quite achievable in protein engineering. It has been demonstrated that for numerous proteins, it is possible to alter substrate specificity and overall catalytic efficiency. In the case of subtilisin, relative specificity has been altered by as much as 1000-fold, and relative catalytic efficiency, as measured by kcat/Km, has been altered by as much as 100-fold (1). The pH and temperature performance profiles have also been readily shifted by at least one pH unit, and the temperature for optimum performance has been raised by 10°C (1). Stability has also been manipulated by site-directed and rational mutagenesis, as exemplified by the extensive work with T4 lysozyme (2) and bacterial amylase (3). Clearly, knowledge of the structure of the starting protein is central to this strategy. X-ray crystallographic determination has been the most extensively used approach to obtain three-dimensional structures of the parent and
variant enzymes, and nuclear magnetic resonance (NMR) has also been employed to determine three-dimensional structures of proteins up to 30 kDa (4). This chapter will review some of the standard approaches that are used in determining variant structures, along with some of the general guiding principles that appear to be recurring themes in structures of engineered proteins. It would be impossible to cover the extensive work that has been done in a number of protein systems, so this chapter will focus on a few representative examples. These illustrate how knowledge of the three-dimensional structures of native and variant enzymes has been used to elucidate the underlying principles of stability, enzyme function, and energy conversion. This chapter will also focus on emerging techniques to further quantitate significant changes in structure with regard to coordinate shifts, flexibility, and conformational change. Finally, the chapter will look at what new structural insights may be forthcoming as site-specific variants continue to be analyzed in the future.
2
STRUCTURE DETERMINATION
There are numerous texts that cover the general principles of x-ray crystallography: one for the nonexpert wishing to understand only the underlying principles (5), as well as excellent in-depth textbooks for detailed studies (6,7). It all begins with a protein crystal. Crystals are regular arrays of protein molecules that scatter x-rays from the electron clouds of individual atoms to form a coherent diffraction pattern. These crystals are formed by solvated protein molecules, with the crystal comprising between 40% and 60% solvent. The determination of the parent protein structure by x-ray crystallography requires an experimental collection of diffraction data, taken as intensities of diffracted x-rays scattered from a crystal of the protein, and the “phases” for combining the observed intensities in a Fourier summation. The combination of these data produces a three-dimensional visualization of the scattering matter (the electrons of atoms within the protein molecule), which is expressed as an electron density map. There are several highly successful strategies for determining sufficiently accurate phases, which include multiple isomorphous replacement, multiple wavelength anomalous dispersion, and, finally, molecular replacement. The latter technique has been extensively used in the determination of variant structures. A model is then constructed, usually consisting of coordinates of all nonhydrogen atoms, by fitting the expected atoms residue by residue into the electron density map. There is every reason to expect that the three-dimensional structure obtained from x-ray crystallography is a good representation of the active protein structure in solution. The structures of proteins determined in
different crystal forms are in close agreement (8), and there is general agreement between the structures determined by x-ray crystallography and those determined independently by NMR (4). The general predictions made on the basis of these x-ray structures, particularly the identification of active site residues, have been subsequently confirmed by site-directed mutagenesis.
3
THREE-DIMENSIONAL STRUCTURE OF PROTEINS
In solution, the amino acid sequence of proteins is presumed to dictate the reproducible folding of the molecules into stable three-dimensional structures. Within these structures, there are recognizable substructures or features of secondary structure such as loops, helices, and sheets. The geometry of these features of secondary structure, expressed as torsion angles of the main chain atoms, is largely a consequence of the stereochemical restraints imposed by the interacting atoms of the peptide bond, which links the amino acids in the polypeptide chain, with the Cα atom and its substituents (9). The peptide bond is planar and rigid, so that there are only two bonds that are free to rotate for each peptide residue. Rotation about these bonds is limited to specific ranges that result in periodic structures such as the α helix. The steric constraints are used as a criterion for evaluating the quality of protein structures by plotting the two torsion angles for each residue to give the Ramachandran plot (9). At the time this chapter was written, there were more than 18,000 structures deposited in the Protein Data Bank (10). The vast majority of torsion angles for these proteins conform to the ranges recognized for the helical, sheet, and turn segments upon which the overall fold is based. Because glycine lacks a side chain, its restraints are looser, so the only outliers in a Ramachandran plot are usually glycine residues. The distribution of side chains in the three-dimensional structure of soluble proteins is described by the “oil-drop” model. From the first protein structures of myoglobin (11), hemoglobin (12), lysozyme (13), and ribonuclease (14), it was noticed that hydrophobic side chains were found in the interior of the protein whereas hydrophilic side chains were found on the surface. Within the interior, these side chains were found to be in a close packed arrangement, leaving only a few cavities that were occupied by solvent. 
Occasionally, hydrophilic residues were found in the interior; however, they were usually observed to be situated as pairs of oppositely charged side chains. By being situated in a shielded environment, they would interact with each other more strongly than in an aqueous environment. Such structures are termed salt bridges and are considered to have a stabilizing influence on the overall fold of the enzyme. In general, enzymes tend to be roughly spherical objects with relatively smooth surfaces. Any crevices are usually filled with ordered solvent. In enzymes, there is usually a surface feature where the reaction occurs, which
includes the side chains responsible for catalytic activity. The active site is surrounded by residues to create a unique binding surface for the substrate molecule. These features often are the sites with the highest probability for altering the function and/or specificity of the enzyme.
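The main-chain torsion angles that populate a Ramachandran plot can be computed directly from atomic coordinates. The following is a minimal Python sketch and is not taken from the chapter: the function names `dihedral` and `phi_psi` are hypothetical, the usual definitions φ = C(i−1)–N–Cα–C and ψ = N–Cα–C–N(i+1) are assumed, and coordinates are plain (x, y, z) tuples in Å.

```python
import math


def _sub(a, b):
    """Vector a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    """Dot product a . b."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    Returns 0 for a cis (eclipsed) arrangement and +/-180 for trans.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal to the plane p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Return (phi, psi) in degrees for one residue.

    prev_c is the carbonyl carbon of the preceding residue and next_n is
    the amide nitrogen of the following residue (assumed definitions).
    """
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)
```

Applying `phi_psi` to every interior residue of a coordinate set and plotting φ against ψ gives the Ramachandran plot described above; the first and last residues lack one of the two neighbors and are skipped.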
4
DIRECTED EVOLUTION OF A PROTEIN
To perform directed evolution of a protein, a hypothesis is formulated as to which property of the protein will enhance its performance. This is often largely based on a biochemical analysis of the enzymatic reaction, the activity of the enzyme toward a substrate of interest, and an analysis of the optimal conditions for use of the enzyme in a given application. This can further be enhanced by comparing the performance of different enzymes in the application. The different enzymes will most often come from libraries of natural isolates taken from environments that most closely match the conditions of the application for which the enzyme is intended. In most cases, the engineering goals will be to alter the substrate specificity, increase the overall catalytic efficiency under a specific set of environmental conditions such as pH and temperature, and/or alter the stability of the enzyme. These parameters will then lead to the selection of sites and regions to be systematically explored using recombinant DNA technology. The resulting variant enzymes would then be screened for improved performance and ranked. The probable structures of the variants will then be evaluated with regard to the altered properties that resulted from the change. In certain instances, anomalies between the overall pattern of altered performance and a specific variant will benefit from the determination of the actual structure as opposed to the “probable” structure derived from modeling on the basis of the parent enzyme. In these instances, the knowledge of the three-dimensional structure of the variant protein becomes crucial.
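The bookkeeping behind this cycle (select sites, enumerate substitutions, screen, rank) can be sketched in a few lines of Python. This is an illustration only, not the author's procedure: the function names, the toy parent sequence, and the screening values are all hypothetical, and variants are named in the conventional wild-type/position/replacement form (e.g., Y217L).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids, one-letter codes


def single_site_variants(parent, sites):
    """Yield (name, sequence) for each of the 19 substitutions at every site.

    `sites` holds 1-based positions in the parent sequence.
    """
    for pos in sites:
        wild_type = parent[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                name = f"{wild_type}{pos}{aa}"
                yield name, parent[:pos - 1] + aa + parent[pos:]


def rank_variants(scores, parent_score):
    """Return (name, fold change relative to parent), best first.

    `scores` maps variant name to a measured property such as activity.
    """
    ranked = [(name, value / parent_score) for name, value in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    parent = "MKTAYIAK"                       # hypothetical toy sequence
    library = dict(single_site_variants(parent, sites=[5]))
    print(len(library), "variants at position 5")
    # Hypothetical screening results, in the same units as parent_score.
    measured = {"Y5L": 3.0, "Y5A": 0.5, "Y5F": 1.0}
    for name, fold in rank_variants(measured, parent_score=1.0):
        print(name, fold)
```

Variants whose rank does not fit the overall pattern are the ones for which, as noted above, an experimentally determined structure is most informative.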
5
DETERMINING X-RAY STRUCTURES OF VARIANT PROTEINS
Knowledge of the native enzyme structure is of considerable advantage in determining the structure of the variant enzyme. First, the crystallization conditions giving crystals of the parent enzyme will, in most cases, give crystals of the variant that are suitable for diffraction studies. If crystals of the parent protein are available, small crystals can be used as seed crystals for the variant enzyme. The seed crystals serve as a nucleation site for crystal growth of the variant and predispose the variant to crystallize in an isomorphous form that facilitates the comparison of the three-dimensional structures of the
parent and variant proteins. Another advantage when the variant crystallizes in an isomorphous form is that the coordinates of the parent enzyme can immediately serve as a phasing model. In this most favorable instance, the time needed to begin an analysis of the variant structure is limited to the time needed to obtain a suitably diffracting crystal and to collect the diffraction data from the variant crystal. The most immediate visualization of differences between the parent and the variant protein comes from the difference electron density map. In the case of isomorphous crystals, the $|F_o^{\mathrm{variant}}| - |F_o^{\mathrm{parent}}|$ difference electron density can be examined, where $F_o$ corresponds to the observed structure factors of equivalent reflections from the variant and parent enzyme crystals. These maps can be regarded as being essentially the result of subtracting the electron density of the parent from the electron density of the variant at each sampling point throughout the electron density map. Where the structures of the variant and parent enzymes are unchanged, the electron density of the parent and variant will be the same and cancel out. In regions where atoms have moved, there will be positive electron density at the new positions of atoms that have shifted or have been added by the substitution in the variant, and negative density at the positions that the shifted or replaced atoms occupied in the parent protein. The positive and negative features would be fully separated only if atoms shifted by distances exceeding their van der Waals radius, so that no overlap would be expected between the old and new positions. However, atoms rarely shift that far; as illustrated by the example in Fig. 1, the new and old positions usually do overlap. There are also instances when a new crystal form is obtained. New crystal forms have been linked to substitutions involving crystal contacts (15). 
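The effect of overlapping old and new positions on a difference map can be illustrated with a one-dimensional toy calculation. The Python sketch below is a deliberate simplification and not the crystallographic procedure itself: it subtracts the density of a single Gaussian "atom" at the parent position from the same atom at a shifted position, point by point, rather than computing a Fourier synthesis from observed amplitudes and parent phases. The Gaussian width `sigma` and the function names are assumptions made for the illustration.

```python
import math


def atom_density(x, center, sigma=1.0):
    """Density of one Gaussian atom of unit peak height at position x."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def difference_profile(shift, sigma=1.0, half_width=8.0, step=0.05):
    """Variant-minus-parent density along a line.

    The parent atom sits at 0 and the variant atom at `shift`
    (same arbitrary length units as sigma).
    """
    n = int(round(half_width / step))
    profile = []
    for i in range(-n, n + 1):
        x = i * step
        rho = atom_density(x, shift, sigma) - atom_density(x, 0.0, sigma)
        profile.append((x, rho))
    return profile


def extrema(profile):
    """Return (x_max, rho_max, x_min, rho_min) of a difference profile."""
    x_max, rho_max = max(profile, key=lambda point: point[1])
    x_min, rho_min = min(profile, key=lambda point: point[1])
    return x_max, rho_max, x_min, rho_min


if __name__ == "__main__":
    for shift in (4.0, 0.5):
        x_max, rho_max, x_min, rho_min = extrema(difference_profile(shift))
        print(f"shift {shift}: positive peak {rho_max:.2f} at {x_max:.2f}, "
              f"negative peak {rho_min:.2f} at {x_min:.2f}")
```

In this toy model, a shift much larger than the atomic width gives a full-height positive peak at the new position and a full-height negative peak at the old one. A shift smaller than the atomic width gives a weaker pair of peaks whose maxima lie outside the true positions, which is one reason small shifts are hard to quantify from difference density alone.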
When a new crystal form is obtained, it is necessary to use the techniques of molecular replacement to obtain a starting phase model to generate an electron density map of the variant structure. There are several highly successful program packages available, including AMoRe (16) and CNS (17), which determine the correct orientation and position of the reference molecule in an automatic or semiautomatic manner. However, in these instances, it is not possible to directly compare the structures of the variant and parent molecules as before. Instead, $F_o - F_c$ difference electron density maps are examined, where $F_o$ is the observed structure factor amplitude from the variant protein (derived from the measured diffraction intensity) and $F_c$ is the structure factor amplitude calculated from a model of the parent enzyme aligned and positioned to serve as a phasing model for the variant. This difference map is the result of subtracting features present in the model of the parent protein from the electron density of the variant enzyme. In this case, the difference density includes all features not included in the model of the parent enzyme. Solvent molecules, salts, and ligands not present in the coordinate set of the parent protein will appear as positive electron density, as will differences between the structures of the parent and variant enzymes. So
Figure 1 Structural perturbations arising from site-specific mutations. Superposition of native subtilisin BPN′ and a variant having the Y217L substitution (dark gray). The side chain of Leu217 is seen to closely resemble the conformation of Tyr217 in the native enzyme. Residues forming the catalytic triad (Asp32, His64, and Ser221) were not altered.
instead of a few difference density peaks that immediately highlight the structural differences, there may be many more features of the electron density that must be surveyed. As will be noted in specific examples below, this can often be a rewarding exercise in cases where the solvent molecules serve either as indicators of structural changes such as side chain shifts or directly as mediators of altered function. In general, the differences arising from single amino acid substitutions are very subtle and usually result in limited local perturbations in the structure. The phenomenon of subtle changes is reinforced by what is found in naturally occurring variants. There are numerous examples within the Protein Data Bank of related proteins that share close homology in amino acid sequence and function. One of the most extensively engineered enzymes, subtilisin, is a representative case. Subtilisins belong to the S8 family of serine proteinases. The three-dimensional structures of subtilisins from several different species of Bacillus have been characterized. Three in particular (from
Bacillus amyloliquefaciens, B. licheniformis, and B. lentus), which have been commercialized for use as detergent additives, have been extensively studied in several laboratories (18–20). The sequences of these enzymes differ at between 87 and 103 of a possible 275 positions (Fig. 2). It should be clear from this picture that if enzymes from different species having 83–103 substitutions have a similar overall folding pattern, then one would certainly expect that variants having 1–10 substitutions would also have a similar folding pattern. The finding that very subtle shifts occur represents one of the major challenges of analyzing the
Figure 2 A comparison of main chain folding of three subtilisin enzymes. Subtilisin BPN′ from B. amyloliquefaciens (gray), subtilisin Carlsberg from B. licheniformis (black), and subtilisin from B. lentus (dark gray). Although the sequence can differ at 40% of the positions, these enzymes share an identical overall tertiary folding pattern.
structure of site-specific changes, which is to discern real differences from random fluctuations in structure. Some attempts to address this issue are described below.
6
STRUCTURAL ANALYSIS OF SITE-SPECIFIC VARIANTS
It has been possible to alter stability, substrate specificity, pH activity profile, and electrostatic interactions of many proteins. X-ray crystallography has been used to determine structures that illustrate the structural basis for the alteration of protein properties in a number of proteins. Even a dozen such chapters could not do justice to the breadth of crystallographic analysis of site-directed mutagenesis in all the protein systems developed over more than two decades. Therefore, it is not the intent of this chapter to attempt encyclopedic coverage, but rather to select samples over a range of the last 20 years. We will discuss the analysis of two enzymes, T4 lysozyme and subtilisin, both of which have been extensively studied; a redox protein, cytochrome f; and a light transducing protein, bacteriorhodopsin. T4 lysozyme has been extensively studied as a model system to understand the structural basis of protein stability. T4 lysozyme consists of 164 amino acids that fold into two domains linked by a long central α helix. Through extensive mutagenesis, it has been possible to elucidate certain principles governing the stability of helices. In one study, the three-dimensional structures were determined for variants in which 13 of the possible 19 natural amino acid replacements (Ala, Arg, Asn, Glu, Gly, Ile, Leu, Lys, Phe, Pro, Thr, Trp, and Val) were introduced to replace serine in the middle of the α helix (21). All amino acids were accommodated without a major distortion of the helix main chain, a pattern that is repeated at other sites in T4 lysozyme and other proteins. Based on the conservation of main chain conformation and the helix-stabilizing hydrogen bonds, it was possible to identify the structural basis for the high helix propensity of alanine as well as the low helix propensity of glycine and proline. 
Alanine was proposed to provide an energetic compromise: it increases hydrophobic stabilization without incurring the entropy cost associated with the conformational restriction of the additional side chain atoms present in residues with larger side chains. Proline, while restricting conformational freedom, has the obvious enthalpic cost of losing a main chain hydrogen bond and also introduces some steric interactions that, although not sufficient to disrupt the helix backbone, give it a lower helix propensity. Glycine was proposed to have low helix propensity due to the entropy cost that accompanies its additional conformational flexibility. These structures were determined from multiple crystal forms, but when the helix residues 40–49 were aligned, the root mean square (rms) deviation
for Cα atoms ranged from 0.10 to 0.14 Å for isomorphous crystal forms and from 0.19 to 0.33 Å for variants determined from nonisomorphous crystal forms. A similar variation was reported in a survey of structures of T4 lysozyme determined from 25 crystal forms (22). These crystals were grown under diverse conditions and varying pH values, belonged to different space groups, and had one to five molecules in the asymmetric unit. Variation between equivalent Cα atoms was again reported to be in the range of 0.25–0.4 Å. It was noted that these values were well above the estimated error of 0.1–0.2 Å. This study reinforced the pattern seen for the helix substitutions above, such that in general, the folding pattern of T4 lysozyme was sufficiently robust to tolerate between 1 and 11 substitutions distributed over 16 sites. Several of these sites altered the domain-to-domain juxtaposition, resulting in an altered hinge angle. Here the determination of structures of numerous variants was required to decipher whether the change in hinge angle was a consequence of the substitutions at the domain interface, thereby resulting in different crystal forms, or whether flexibility in the hinge angle was an intrinsic property of the enzyme that the different crystal forms provided the opportunity to map. Numerous variants that involved substitutions far removed from the domain interface resulted in different crystal forms, which also manifested altered hinge angles between the two domains of T4 lysozyme. Thus, the altered hinge angle was interpreted to be the result of intrinsic flexibility in the molecule. Altered crystallization patterns were attributed to substitutions at sites involved in crystal contacts. This has also been reported for subtilisin crystals (15), which are discussed below. 
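The rms deviations quoted here are computed after superimposing equivalent Cα atoms. A common way to obtain the optimal rigid-body superposition is the Kabsch algorithm; the Python sketch below uses NumPy and is offered as an assumption about how such a number can be calculated, not as the method used in the cited studies. The function names are hypothetical, and the idealized helix (2.3 Å radius, 1.5 Å rise, and 100° turn per residue) is synthetic test data rather than T4 lysozyme coordinates.

```python
import numpy as np


def superposed_rmsd(mobile, reference):
    """Rms deviation (same units as input) after optimal superposition.

    Both inputs are (N, 3) arrays of equivalent atoms in the same order.
    The rotation is found with the Kabsch algorithm.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("inputs must both have shape (N, 3)")
    Pc = P - P.mean(axis=0)              # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                        # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])           # guard against an improper rotation
    R = Vt.T @ D @ U.T                   # rotation taking Pc onto Qc
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def ideal_helix_ca(n_residues, radius=2.3, rise=1.5, turn_deg=100.0):
    """Synthetic C-alpha trace of an idealized alpha helix (in angstroms)."""
    i = np.arange(n_residues)
    angle = np.radians(turn_deg) * i
    return np.column_stack([radius * np.cos(angle),
                            radius * np.sin(angle),
                            rise * i])


if __name__ == "__main__":
    helix = ideal_helix_ca(10)
    perturbed = helix.copy()
    perturbed[4] += np.array([1.0, 0.0, 0.0])   # displace one atom by 1 A
    print(round(superposed_rmsd(perturbed, helix), 3))
```

Because the superposition removes any overall rotation and translation, values obtained this way reflect internal differences between the two atom sets only, which is what allows structures from different crystal forms to be compared.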
The substitutions gave different crystal forms, allowing the observation of hinge angle variability and resulting in the conclusion that flexibility of the hinge was an intrinsic property of the enzyme structure. In these cases, individual domains showed high conservation that allowed the definition of the hinge-bending axis of the molecule. A careful analysis showed that the motion was more than a simple opening and closing of the cleft; it combined rotation with a substantial side-to-side component akin to “the chewing action of a camel.” Such an illustrative description would not have been possible without the detailed structural analysis of numerous mutants, which in turn gave rise to numerous crystal forms of the enzyme. Our understanding of the relative importance of internal hydrogen bonds for stability has been derived, in part, from x-ray crystallography of variants of T4 lysozyme. In studies characterizing the presence or absence of internal solvent in the structures of T4 lysozyme variants, differences in relative thermal stability were observed. From the analysis of numerous variants constructed to introduce or replace internal solvent, it was concluded that hydrogen bonds
are energy-neutral. By relating structures that had either lost or gained solvent to the availability of hydrogen bonds, it was possible to conclude that the hydrogen bonds formed in the folded state are offset by hydrogen bonds with solvent in the unfolded state. It was also possible to define rules for creating internal solvent sites; the requirements included the proximity of three or four potential hydrogen bond donors/acceptors, as otherwise the resulting variant would be expected to be less stable. These conclusions depended on the structural analysis that determined which of the variants had additional solvent molecules present and how many were introduced. Often, when a cavity that could accommodate two water molecules was created, such as when a methionine or another large residue was replaced with alanine (Met6→Ala), the two water molecules each satisfied one hydrogen-bonding requirement of the other. These studies and related ones in other enzymes such as amylase (3) have provided examples of structurally based strategies that can be successfully employed to stabilize proteins. There are also examples of variants that manifest dramatic changes in properties, such as synergy between substitutions for stability, where the substitutions are far removed from each other. In these structures, the phenomenon of long-range interactions has been invoked to explain the consequences of these changes. However, although the phenomenon is well documented, the basis of long-range interactions and the means to evaluate them have remained elusive.
7
ENZYME SPECIFICITY AND CATALYTIC ACTIVITY
Structural studies have also played a role in the engineering of altered specificity and increased catalytic function of proteolytic enzymes. The subtilisins, proteolytic enzymes originally isolated from various Bacillus species, display broad specificity and relatively high stability to denaturants such as detergents. They have therefore been incorporated into detergent powders as additives to dissolve proteinaceous stains. Like surfactants, subtilisins help solubilize stains, and they are cost-effective, which has led to their incorporation into detergent formulations. One obvious strategy for improving these enzymes was to alter their specificity with the aim of targeting specific soils. Toward this end, numerous studies were undertaken to identify the specific residues that serve as determinants of specificity along the binding sites.

Subtilisin can be inhibited by the product resulting from the hydrolysis of a polypeptide chain or artificial substrate. When cleaved, the artificial substrate succinyl–Ala–Ala–Pro–phenylalaninyl–para-nitroanilide yields a product, succinyl–Ala–Ala–Pro–phenylalanine, that can inhibit subtilisin. We have determined the structures of the product-inhibited native enzyme, which has served as a basis
for a model of an enzyme–substrate complex (Fig. 3). Examining the interactions of the phenylalanine side chain of the substrate, we can see that it makes van der Waals contacts with the main chain of residues 126–129 and with the side chains at positions 155 and 156. It was also noticed that glycine 166 occupies a position analogous to position 189 in chymotrypsin and trypsin, the site thought to determine P1 specificity in those enzymes. Because a glycine residue lacks a side chain, the site is open rather than closed as in chymotrypsin and trypsin. We use the nomenclature proposed by Berger and Schechter (23) to
Figure 3 Model of binding of the synthetic substrate succinyl–Ala–Ala–Pro–phenylalaninyl–para-nitroanilide. The model was deduced from substrate and product complexes with numerous subtilisin BPN′ variants. The location of the residues forming the catalytic triad is indicated.
designate the specific subsites for binding polypeptide substrates. Modeling experiments showed that a side chain at position 166 would close this pocket and increase the potential contacts available for interacting with the P1 side chain. This was borne out when all 19 substitutions were made and one of them (asparagine) gave an enzyme with increased catalytic activity. Another substitution, lysine for glycine, resulted in a 1000-fold increase relative to the parent enzyme for substrates having glutamic acid at the P1 position. Thus, a change at even a single position can radically alter the specificity at the P1 position. Knowledge of the three-dimensional structure, coupled with the analysis of site-specific substitutions, suggests that position 166 can accommodate many different side chains without perturbing the tertiary structure, and hence the function, of the enzyme.

The subtilisins from B. amyloliquefaciens (subtilisin BPN′) and B. licheniformis (subtilisin Carlsberg), which differ at 89 of a possible 275 amino acids in their sequences, display a number of different kinetic properties. In addition to differences in specificity for negatively charged amino acids, the two enzymes display a 10-fold difference in kcat, the turnover number, for a synthetic substrate. For example, with the substrate succinyl–alanine–alanine–proline–phenylalanine–para-nitroanilide, subtilisin BPN′ has a kcat of 50 turnovers/s, whereas subtilisin Carlsberg has a kcat of 510 turnovers/s (24). Although subtilisin Carlsberg differs from subtilisin BPN′ at 89 positions, few of these differences are near the active site or the substrate-binding site. The initial attempt to recruit the substrate specificity and turnover properties by replacing three of the amino acid differences found in the substrate-binding site was highly successful (24). The three substitutions were Glu156→Ser, Gly169→Ala, and Tyr217→Leu.
These three changes were shown to recruit both the substrate specificity and the turnover rate (kcat) of subtilisin Carlsberg into subtilisin BPN′. A crystallographic analysis confirmed that the structure was highly conserved, except that the side chains at positions 156 and 217 adopted conformations identical to those found for the same side chains in subtilisin Carlsberg. The side chain introduced at position 169 was also accommodated without any conformational change. The increase in turnover number was later found to result largely from a single amino acid substitution, Tyr217→Leu. Thus, with a single amino acid change, a 10-fold increase in turnover number was introduced into subtilisin BPN′. Analysis of the structure of the enzyme in complex with reaction products revealed a binding pattern identical to that seen for the native enzyme. Based on the structural data, the increase in turnover number appeared to result not from altered substrate binding, but rather from the removal of steric hindrance to the reactions involved in the rate-limiting acylation step. These studies showed that the net removal of only
four nonhydrogen atoms in a molecule containing 1880 atoms can have very dramatic effects.

Another example of how such subtle changes can influence the performance of a variant was seen in a different subtilisin, from B. lentus. This enzyme differs from subtilisin BPN′ at 103 positions, of which six are deletions, resulting in a molecule of 269 amino acid residues. Nevertheless, the subtilisin from B. lentus shares a common, highly conserved folding pattern with subtilisin BPN′. B. lentus subtilisin already has a leucine at position 217 and has an even higher turnover number for equivalent synthetic substrates than subtilisin Carlsberg or the Tyr217→Leu variant of subtilisin BPN′. An engineered variant having three substitutions, Asn76→Asp (which also occurs in subtilisin Carlsberg), Ser103→Ala, and Val104→Ile, was found to be more than twofold more effective in detergent applications than the native enzyme. Here the differences involved replacing a nitrogen with an oxygen, removing a hydroxyl oxygen, and adding a methyl carbon. As might by now be expected, the three-dimensional structure showed very subtle changes, resulting from the resculpting of the substrate-binding surface by two atoms and from another atomic replacement near the tight calcium site shared by all three subtilisins: subtilisin BPN′, subtilisin Carlsberg, and B. lentus subtilisin. An analysis of significant changes showed that few arise from the very subtle structural differences between the variant and the parent enzymes. However, the variant displayed a significant difference in the overall flexibility of a segment involved in the substrate-binding site, which became more flexible (25). The method for determining this will be discussed below. The increased flexibility has been verified independently in the NMR structures of the native and variant enzymes (26).
The relaxation rates of amide nitrogens indicate increased flexibility in several regions, including one side of the substrate-binding site, as predicted by the variation in average temperature factors from the crystallographic structure. Because very few positional differences of any significance arise from the very subtle structural changes described above, it appears that the increase in flexibility may be a factor in the increased turnover number and the variant’s increased performance.

Sometimes the largest changes that arise as a consequence of site-specific substitutions do not affect the conformation at the site of substitution but rather the contiguous side chains or molecules such as the solvent. Often the solvent molecules fill what would otherwise be cavities within the molecule or crevices along the surface. In some cases, solvent molecules form channels that either contribute to the function of the protein or serve as space holders, that is, surrogates for products or substrates that must pass through a channel or enter a cavity. Two examples of the former are found in bacteriorhodopsin (27) and cytochrome b6f (28).
Bacteriorhodopsin couples the photoisomerization of the all-trans retinal chromophore to the 13-cis,15-anti isomer with unidirectional proton transfer. Extensive mutagenesis studies identified a series of mutants that block this process and made it possible to link the protonation of specific side chains with particular spectroscopic states (29). Structures of two site-specific mutants, E204Q and D96N, both of which interrupt the photocycle at roughly the same state (either the early or the late M state) (27), were determined in both the resting state and the M state, in which the retinal is still photoisomerized, and compared. The two mutations lie at opposite ends of the solvent channel leading to the retinal, which is covalently linked to Lys216: E204Q is in the extracellular region and D96N is in the cytoplasmic region. Comparison of the structures of these variants showed nearly identical ground states but highlighted subtle conformational changes that were consequences of the site-specific mutations at either the cytoplasmic or the extracellular face of the molecule. Comparing the cytoplasmic region of E204Q in the resting and M states with that of the D96N mutant (30) made it possible to separate changes in the cytoplasmic region that result from the difference between the resting and M states from the consequences of D96N, which would be overlaid on the differences seen for E204Q. Likewise, comparing the extracellular region of the D96N variant in the resting and M states with that of E204Q made it possible to separate shifts in the extracellular region due to the difference between the resting and M states from the structural consequences of E204Q, which would be overlaid on them.
Thus, in this study, different site-specific mutations, each contributing subtle local perturbations, were used to filter out the consequences of the site-specific substitutions themselves and to obtain an unbiased comparison of the structural changes between the resting and M states. Each substitution prevented a key protonation event necessary for the photocycle to continue to the next relaxation step, which involves a coordinated transfer of a proton along the channel. It was noted that in the M state of E204Q, there appeared to be a nascent, partially formed solvent channel that would facilitate this event.

Cytochrome b6f from Chlamydomonas reinhardtii was modified to remove residues hydrogen-bonded to the internal solvent channel. These mutants displayed similar phenotypes, all manifesting a decreased rate of reduction, and with the most impaired mutant, N168F, the organism could not grow phototrophically (28). The three-dimensional structures of the three mutants (N168F, Q158L, and N153Q) were determined and compared to that of the native protein (28). N168F was determined in a different crystal form (P2₁ instead of P2₁2₁2₁) and produced the highest resolution data (1.6 Å). Structural analysis showed that the N168F mutant had lost two of the
five internal solvent channels, which correlated with the pronounced decrease in the reduction rate and the loss of phototrophic growth. Smaller disruptions were seen in the Q158L and N153Q mutants, which had shifts in one of the five internal solvent atoms. In summary, the effect of a site-specific mutation can also be indirect, affecting either neighboring side chains or ordered water rather than producing an immediate shift in the residue itself.

8
RELATING STRUCTURE TO FUNCTION: NEW STRATEGIES AND TECHNIQUES
In all of the instances cited above, there is very close agreement in the overall structure of the variant and parent proteins. In most instances, very subtle structural changes, or the absence of structural changes, have been linked to the changes in performance of the variant with respect to the parent protein. This, in turn, raises the question: Are other subtle changes within the variant that might contribute to the altered function being missed? Intuitively, we expect that the remainder of the protein must affect the functioning of the enzyme by stabilizing, shielding, or otherwise modulating the interaction of particular amino acid residues. To explore this concern, one must be able to determine and measure what significant changes have occurred in the variant structure. The problem is that these may be subtle shifts within highly ordered structure that are smaller than insignificant shifts in the more variable parts of the structure, and they may also vary with the resolution range of the data collected. As such, they are not likely to appear in difference electron density maps, or might be dismissed as noise peaks if some density did appear.

The first step would be to evaluate the empirical error between coordinate sets. A number of relatively trivial metrics have been and are being used to estimate error in protein structures. One of the most favored is to take the diagonal terms from the last cycle of refinement as a measure of the residual “error.” These terms reflect what the shift for a particular atom would be in the next cycle of refinement and, as such, are a better measure of the convergence of the refinement than of the error. Often these values appear very low, ranging below 0.1 Å for coordinates derived from an electron density map at 2.0 Å resolution. Such estimates are also restricted to internal error, measuring how well the model agrees with a single data set.
This is to be contrasted with external error, which is the variation in coordinates determined from independent data sets. Put another way, would the structure be the same if it were determined a second time from an independent data set? A second approach, which does address external error, is to determine the rms deviation between sets of coordinates, usually for the most well-ordered atoms, such as the main chain atoms or sometimes only the Cα
atoms. This gives an estimate of one standard deviation, provided we assume that the mean experimental error is precisely zero. However, this is not the case, nor is it a particularly useful outcome, because one standard deviation does not correspond to the statistical 95% confidence level. Instead, it would be more useful to know the actual mean error along with the variation of the error about the mean. In this way, one could establish confidence criteria based on an analysis of variance.

9
IDENTIFYING STATISTICALLY SIGNIFICANT DIFFERENCES
Such a method (31) has been applied to variants of subtilisin. The method is empirical, relying on the distances between equivalent atoms taken as a function of the average temperature factor. When the logs of the distances between equivalent atoms are plotted against the temperature factors of those atoms, a linear distribution is found (Fig. 4). The temperature factor is a
Figure 4 A plot of the log of the distance between equivalent atoms after molecular superposition as a function of the refined crystallographic temperature factor. A linear regression fit of the mean error is drawn as a solid line, with the variance at 2σ plotted as dashed lines.
refined parameter that models the relative flexibility of individual atoms within a protein molecule. Atoms in the interior of the molecule are generally less flexible and will have lower temperature factors than atoms of residues located on the surface. Linear regression is used to determine the mean error, which is not zero, and the variance about the mean. The equations for the mean and variance are plotted in Fig. 4 as solid and dashed lines. What is obtained is a linear function for the mean error as a function of the temperature factor that can be applied to all atoms. We find that in all instances so far examined, the atoms with low temperature factors, coming from the more ordered parts of the structure, tend to have a lower overall mean error, whereas atoms with high temperature factors have higher mean errors, compounded by higher variance as well. So what might be a significant difference between well-ordered atoms in the parent and variant proteins might not be significant for the more disordered segments, and these cases can be differentiated as a function of the crystallographic temperature factor.

In practice, a residue Z-score can be computed using the equations for the mean error and variance as a function of temperature factor, and those residues having a Z-score of ≥3.0 are taken to be significantly different. In this way, differences in residues neighboring the substituted side chains have been detected. Moreover, one can have more confidence in the identification of significant structural differences that are potentially important determinants of altered function in the variant protein. The advantage of this approach is greatest for variants in which the substituted side chain lies within the well-ordered regions of the protein structure. Significant differences detected for well-ordered segments would otherwise be overshadowed by nonsignificant variation in disordered segments and may well fall below the overall rms for all atoms in the structure.
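The Z-score test described above can be illustrated with a minimal sketch. It assumes a single straight-line fit of ln(distance) against temperature factor and a constant scatter about that line in log space; the published method (31) may model the variance differently. The function names, the per-atom (rather than per-residue) scoring, and the synthetic numbers are illustrative assumptions and are not taken from the chapter.

```python
import math


def fit_log_error(b_factors, distances):
    """Least-squares fit of ln(distance) = intercept + slope * B.

    Returns (intercept, slope, sigma), where sigma is the residual standard
    deviation in log space (assumed constant here, a simplification).
    """
    logs = [math.log(d) for d in distances]
    n = len(logs)
    mean_b = sum(b_factors) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_factors)
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_factors, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_b
    residuals = [l - (intercept + slope * b) for b, l in zip(b_factors, logs)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept, slope, sigma


def z_score(b_factor, distance, fit):
    """How many sigmas ln(distance) lies above the mean error expected at B."""
    intercept, slope, sigma = fit
    return (math.log(distance) - (intercept + slope * b_factor)) / sigma


# Synthetic "background" atoms (hypothetical values): the typical coordinate
# difference grows with temperature factor, with a small alternating scatter.
background_b = [5.0 + i for i in range(36)]
background_d = [
    math.exp(-3.0 + 0.05 * b + (0.1 if i % 2 == 0 else -0.1))
    for i, b in enumerate(background_b)
]
fit = fit_log_error(background_b, background_d)

# The same 0.4 A displacement is judged at two different temperature factors.
z_ordered = z_score(8.0, 0.4, fit)      # well-ordered atom, low B
z_disordered = z_score(40.0, 0.4, fit)  # disordered atom, high B

for label, z in (("ordered", z_ordered), ("disordered", z_disordered)):
    verdict = "significant" if z >= 3.0 else "not significant"
    print(f"{label}: Z = {z:.1f} ({verdict})")
```

In this toy data set the same displacement counts as significant for the well-ordered atom but not for the disordered one, which is the behavior the temperature-factor-dependent error model is meant to capture.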
Using this approach, one can systematically identify significant changes arising from the substituted side chain(s) throughout the entire molecule.

10
MEASUREMENT OF ALTERED FLEXIBILITY
The strategy employed above can easily be extended to the analysis of other structural properties such as flexibility. The central idea of the previous approach was to construct a means of performing an analysis of variance between closely related observations. In the case of Bott and Frane (31), the plot of the log of the coordinate difference versus the temperature factor gave a linear distribution. All data in this plot could be subjected to a linear regression analysis to obtain a function, based on all atoms, that could be applied to any pair. To analyze flexibility, we realized that the main chain atoms would, by virtue of the covalent linkage and restraints of the peptide bond, have similar flexibility. Thus, by computing a rolling average, the variation of flexibility,
as measured by the crystallographic temperature factor, would be a gradually varying value, and this average could again be compared between the parent and variant proteins. This generates a sufficiently large set of observations to construct a mean value of the error between any pair. The variance about the mean error for all the pairwise comparisons of rolling averages between corresponding residue sets in the parent and variant structures can also be constructed. The validation of this approach came from the comparison of the same variant and parent proteins by x-ray crystallography and NMR (26). The same segment was identified by both techniques as being more flexible in the subtilisin variant N76D/N87S/S103A/V104I (DSAI) described above than in the parent enzyme.

11
MEASURING CONFORMATIONAL CHANGES
The initial presumption set out at the outset of this chapter was that structure is determined by juxtaposition of disparate atoms into a functionally important conformational change between a variant and the parent protein. The extension of this presumption was that an analysis of variant structures would identify conformational changes arising from the introduction of specific substitutions. In many regards, the quantitation of error and flexibility is relatively trivial compared to the quantitation of conformational change. Both involve simple two-parameter relationships: the degree of variation versus a particular temperature factor, or versus the linear position of a residue in the polypeptide chain. In the analysis of conformational change, by contrast, one looks for the simultaneous displacement of atoms in amino acid residues that are not contiguous. Although each individual displacement may fall below the threshold of statistical significance with regard to the coordinates, the conformational change as a whole is significant. Knowledge of what conformational changes occur, and of their relative magnitude, should be considered in relating structural and functional changes.

This problem is not unique to the analysis of site-specific mutations. The question of what occurs upon substrate binding to facilitate catalysis has been an equally vexing problem. One solution was put forward by Bystroff and Kraut (32) to analyze the ligand-induced changes in dihydrofolate reductase. Their strategy was to examine “distance difference,” or DD, plots. These were based on the distance plot developed by Ooi and Nishikawa (33), which displays on a two-dimensional grid the Cα–Cα distances between all pairs of residues (1–1, 1–2, 1–3, 1–4, etc.; 2–1, 2–2, 2–3, etc.) within a protein molecule. Such a plot is symmetric about the diagonal, with the diagonal values (1–1, 2–2, 3–3, . . . , n–n) being zero.
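The distance plot described above, and the difference between two such plots, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and the four "Cα" positions are toy coordinates invented for the example rather than taken from any structure.

```python
import math


def distance_matrix(ca_coords):
    """All pairwise distances between alpha-carbon positions (n x n grid)."""
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) for j in range(n)]
        for i in range(n)
    ]


def distance_difference(coords_a, coords_b):
    """Element-wise difference of two distance matrices (B minus A)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    d_a = distance_matrix(coords_a)
    d_b = distance_matrix(coords_b)
    n = len(d_a)
    return [[d_b[i][j] - d_a[i][j] for j in range(n)] for i in range(n)]


# Toy four-residue chain (hypothetical coordinates, in angstroms).
parent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

# "Variant": only the last residue has moved.
variant = parent[:3] + [(11.4, 2.0, 0.0)]

# The parent again, rotated 90 degrees about z and translated as a rigid body.
moved_parent = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in parent]

dd_variant = distance_difference(parent, variant)
dd_rigid = distance_difference(parent, moved_parent)

for row in dd_variant:
    print("  ".join(f"{value:6.3f}" for value in row))
```

Because only intramolecular distances enter the calculation, the rigidly moved copy gives a difference matrix that is zero to within rounding, whereas the toy variant shows nonzero entries only in the row and column of the displaced residue.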
In such a plot, residues involved in tertiary interactions appear in close proximity, and certain secondary structure features can easily be recognized as patterns of short contacts. Bystroff and Kraut recognized that if the differences in these distances were plotted instead, then such a map would
immediately identify conformational changes occurring between two crystallographic structures, such as an enzyme alone versus one complexed with a substrate or substrate analog. Using the DD plot, it was possible to identify and differentiate shifts of a contiguous series of residues relative to the molecule as a whole and the movement of whole domains.

An extension of this strategy has been employed in the analysis of conformational changes arising from site-specific mutagenesis (34). One very attractive feature of the DD plot is that it is internally consistent and does not depend on the alignment of the variant and parent proteins, because only intramolecular contacts are being compared. For a protein of 250 amino acids, for example, there will be 250 × 250 intramolecular differences, of which 250 would be trivial self-contacts. These could be used to establish a mean deviation and variance. This, in turn, could be used in an analogous manner to determine significant conformational shifts within the molecule. These shifts, taken together with the analysis of significant changes in positional coordinates and altered flexibility, provide the means of pinpointing the probable structural changes that underlie the altered performance of the variant protein.

12
WHAT DOES THE FUTURE HOLD?
We have seen in these limited examples a consistent theme for site-specific mutation: the changes are usually localized and often subtle. Although numerous techniques have been and continue to be developed, one of the most pressing needs is to obtain the most precise structural information possible. We have seen that there have been efforts to link the results of x-ray crystallography and NMR. NMR can provide very precise information if sufficient spectra can be acquired for the particular residue in question. The combination of NMR and crystallographic results should expand the ability to probe detailed interactions and provide additional insights into the nature of subtle changes, altered charge interactions, and molecular motion.

Similar developments are occurring in x-ray crystallography, particularly in the preparation of crystals for data collection. Cryocooling has become an increasingly routine way to extend crystal lifetime in the x-ray beam, allowing diffraction data to be collected from smaller crystals. In the past few years, it has also proven to be a useful tool for extending the resolution of larger crystals (35). Crystals of subtilisin from B. lentus, when data are collected at room temperature, routinely diffract to resolutions ranging between 1.8 and 1.6 Å (18–20). When cryocooled to 100 K, the same crystals have been found to diffract to a resolution of 0.78 Å (36). At 0.78 Å resolution, it is possible to distinguish the different elements on the basis of electron density so that, for example, the correct orientation of histidine can be determined rather than inferred, and charge can be inferred from the presence of doubly and singly bonded carboxyl C–O
bonds of aspartic and glutamic acid side chains. Hydrogen atoms can be visualized for well-ordered atoms, and their presence or absence can be used to identify charged atoms. These advances will no doubt bring us closer to understanding not only the consequences of particular site-specific substitutions but also the overall relationship between the structure and function of the molecules responsible for and regulating metabolism.

REFERENCES
1. JA Wells, DA Estell. Subtilisin—an enzyme designed to be engineered. Trends Biochem Sci 13:291–297, 1988.
2. BW Matthews. Studies on protein stability with T4 lysozyme. Advances in Protein Chemistry on “Protein Stability”. New York: Academic Press, 1995, pp 249–278.
3. A Shaw, R Bott, AG Day. Protein engineering of alpha-amylase for low pH performance. Curr Opin Biotechnol 10:349–352, 1999.
4. JR Martin, FAA Mulder, Y Karimi-Nejad, J van der Zwan, M Mariani, D Schipper, R Boelens. The solution structure of serine protease PB92 from Bacillus alcalophilus presents a rigid fold and flexible substrate-binding site. Structure 5:521–532, 1997.
5. G Rhodes. Crystallography Made Crystal Clear: A Guide for Users of Macromolecular Structure. San Diego, CA: Academic Press, 1993.
6. TL Blundell, LN Johnson. Protein Crystallography. New York: Academic Press, 1976.
7. D Blow. Outline of Crystallography for Biologists. Oxford: Oxford University Press, 2002.
8. T Gallagher, J Oliver, R Bott, C Betzel, GL Gilliland. Subtilisin BPN′ at 1.6 Å resolution: analysis of discrete disorder and comparison of crystal forms. Acta Crystallogr, D Biol Crystallogr 52:1125–1135, 1996.
9. GN Ramachandran, C Ramakrishnan, V Sasisekharan. Stereochemistry of polypeptide chain conformations. J Mol Biol 7:95–99, 1963.
10. HM Berman, J Westbrook, Z Feng, G Gilliland, TN Bhat, H Weissig, IN Shindyalov, PE Bourne. The protein data bank. Nucleic Acids Res 28:235–242, 2000.
11. CL Nobbs, HC Watson, JC Kendrew. Structure of deoxymyoglobin: a crystallographic study. Nature 209:339–341, 1966.
12. MF Perutz. X-ray analysis of hemoglobin. Science 140:863–869, 1963.
13. CCF Blake, DF Koenig, GA Mair, ACT North, DC Phillips, VR Sarma. Structure of hen egg-white lysozyme: a three-dimensional Fourier synthesis at 2.0 Å resolution. Nature 206:757, 1965.
14. HW Wyckoff, KD Hardman, NM Allewell, T Inagami, LN Johnson, FM Richards. The structure of ribonuclease-S at 3.5 Å resolution. J Biol Chem 242:3984–3988, 1967.
15. JL Dauberman, G Ganshaw, C Simpson, TP Graycar, S McGinnis, R Bott. Packing selection of Bacillus lentus subtilisin and a site-specific variant. Acta Crystallogr, D Biol Crystallogr 40:650–656, 1994.
16. J Navaza. AmoRe: an automated package for molecular replacement. Acta Crystallogr A 50:157–163, 1994.
17. AT Brunger, PD Adams, GM Clore, WL DeLano, P Gros, RW Grosse-Kunstleve, J-S Jiang, J Kuszewski, M Nilges, NS Pannu, RJ Read, LM Rice, T Simonson, GL Warren. Crystallography and NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr, D Biol Crystallogr 54:905–921, 1998.
18. C Betzel, S Klupsch, G Papendorf, S Hastrup, S Branner, KS Wilson. Crystal structure of the alkaline protease savinase from Bacillus lentus at 1.4 Å resolution. J Mol Biol 223:427–445, 1992.
19. JM van der Laan, AV Teplyakov, H Kelders, KH Kalk, O Misset, LJSM Mulleners, BW Dijkstra. Crystal structure of the high alkaline serine protease PB92 from Bacillus alcalophilus. Protein Eng 5:405–411, 1992.
20. R Bott, J Dauberman, R Caldwell, C Mitchinson, L Wilson, B Schmidt, C Simpson, S Power, R Lad, IH Sagar, T Graycar, D Estell. Using structural comparison as a guide in protein engineering. Ann NY Acad Sci 672:10–19, 1992.
21. M Blaber, X Zhang, BW Matthews. Structural basis of amino acid α helix propensity. Science 260:1637–1640, 1993.
22. X Zhang, JA Wozniak, BW Matthews. Protein flexibility and adaptability seen in 25 crystal forms of T4 lysozyme. J Mol Biol 250:527–552, 1995.
23. A Berger, I Schechter. Mapping the active site of papain with the aid of peptide substrate and inhibitors. Philos Trans R Soc Lond, B 257:249–264, 1970.
24. JA Wells, BC Cunningham, TP Graycar, DA Estell. Recruitment of substrate-specificity properties from one enzyme into a related one by protein engineering. Proc Natl Acad Sci USA 84:5167–5171, 1987.
25. T Graycar, M Knapp, G Ganshaw, J Dauberman, R Bott. Engineered Bacillus lentus subtilisin having altered flexibility. J Mol Biol 292:97–109, 1999.
26. FA Mulder, D Schipper, R Bott, R Boelens. Altered flexibility in the substrate-binding site of related native and engineered high-alkaline Bacillus subtilisins. J Mol Biol 292:111–123, 1999.
27. H Luecke, B Schobert, J-P Cartailler, H-T Richter, A Rosengarth, R Needleman, JK Lanyi. Coupling photoisomerization of retinal to directional transport in bacteriorhodopsin. J Mol Biol 300:1237–1255, 2000.
28. G Sainz, CJ Carrell, MV Ponamarev, GM Soriano, WA Cramer, JL Smith. Interruption of the internal water chain of cytochrome f impairs photosynthetic function. Biochemistry 39:9164–9173, 2000.
29. LS Brown. Proton transport mechanism of bacteriorhodopsin as revealed by site-specific mutagenesis and protein sequence variability. Biochemistry (Moscow) 66:1249–1255, 2001.
30. H Luecke, B Schobert, J-P Cartailler, H-T Richter, JK Lanyi. Structural changes in bacteriorhodopsin during ion transport at 2.0 Å resolution. Science 286:255–260, 1999.
31. R Bott, J Frane. Incorporation of crystallographic temperature factors in the statistical analysis of protein tertiary structures. Protein Eng 3:649–657, 1990.
32. C Bystroff, J Kraut. Crystal structure of unliganded Escherichia coli dihydrofolate reductase. Ligand-induced conformational changes and cooperativity in binding. Biochemistry 30:2227–2239, 1991.
33. T Ooi, K Nishikawa. Conformation of Biological Macromolecules and Polymers. New York: Academic Press, 1973.
34. R Bott. Unpublished results.
35. UK Genick, SM Soltis, P Kuhn, IL Canestrelli, ED Getzoff. Structure at 0.85 Å of an early protein photocycle intermediate. Nature 392:206–209, 1998.
36. P Kuhn, M Knapp, SM Soltis, G Ganshaw, M Thoene, R Bott. The 0.78 Å structure of a serine protease: Bacillus lentus subtilisin. Biochemistry 39:13446–13452, 1998.
4
Quantitative Modeling of Lipase Enantioselectivity
Jürgen Pleiss
University of Stuttgart, Stuttgart, Germany
1
INTRODUCTION
Lipases are versatile tools in the hands of organic chemists (1,2). They are used to hydrolyze ester bonds of a variety of nonpolar substrates with high activity, regioselectivity, and stereoselectivity. Moreover, they are used to catalyze the reverse reaction in nonpolar solvents. The reaction can be optimized by changing substrate structure, solvent, additives, water activity, pressure, temperature, immobilization methods, and, since recombinant lipases became available, the biocatalyst itself (3–5). Optimization of reaction conditions is still a trial-and-error process of screening a high-dimensional parameter space. The role of the sequence and structure of the biocatalyst has been studied by x-ray analysis, site-directed mutagenesis, and, more recently, random mutagenesis in directed evolution experiments. Based on these experimental studies, empirical rules that predict the fast-reacting enantiomer were derived for secondary alcohols (6), primary alcohols (7), and carboxylic acids (8). These rules are highly useful to organic chemists, but they cannot predict quantitative properties such as the E value, nor the effect of changes in the reaction conditions or the biocatalyst itself.

Since the first x-ray structures of lipases became available (9,10), lipase–substrate interactions have been studied at the molecular level. Structure data confirmed the catalytic mechanism, which is similar to that of serine proteases, and identified the catalytic machinery: a catalytic triad (serine, histidine, and aspartic or glutamic acid) and an oxyanion hole. It soon became evident that lipases may crystallize in two conformations, a closed form and an open form (9,11–13). These conformations differ in the position of the lid and the geometry of the oxyanion hole. In aqueous solution, the equilibrium is shifted toward the closed, inactive form, while near a hydrophobic substrate interface, the open, active form is stabilized (14). In this open form, the binding site is fully exposed. Several hydrophobic binding patches were observed: a patch that binds the acid moiety, mostly a medium- or long-chain fatty acid, and at least two more patches that bind the alcohol moiety.

Currently, the sequence information of several thousand lipases is deposited, but only 30 lipases and serine esterases have known structures. Although lipases have no global sequence similarity, they have a similar architecture, the α/β hydrolase fold (15), and they follow the same catalytic mechanism. Many lipases show enantiorecognition toward chiral alcohols or carboxylic acids. For chiral secondary alcohols, x-ray data have revealed the structural basis of enantiorecognition by Candida rugosa lipase (16), which supports an empirical rule for the prediction of enantiopreference (6). For Pseudomonas cepacia lipase, the structural basis of stereoselectivity toward triacylglycerol analogs was investigated (17). Based on these structural data, computer-aided modeling is a promising method to establish quantitative predictions of enantioselectivity that apply to a broad range of lipases and substrates.
Lipase enantioselectivity is a promising target for modeling because 1) lipases are highly active toward a broad range of substrates under a variety of reaction conditions, act as monomers, and need no cofactors; 2) many sequence and structure data on lipases are available, many of the structures in complex with substrate-analogous inhibitors; 3) interactions between protein and substrates are expected to be dominated by shape complementarity, with hydrophobic substrates binding to a hydrophobic binding site, and induced-fit effects upon binding of an inhibitor to the open form of a lipase are limited to side-chain movements; 4) because enantioselectivity measures the ratio of kcat/Km toward the two enantiomers, only the transition state complexes have to be compared; in contrast to substrate specificity, differences in properties such as size, solubility, diffusion, or interactions of the Michaelis complexes play no role in enantioselectivity (18–20).
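Point 4 rests on the standard definition of the enantiomeric ratio E as the ratio of the specificity constants kcat/Km of the fast- and slow-reacting enantiomers, which transition state theory links to the difference in activation free energy by ΔΔG‡ = −RT ln E. The following Python sketch illustrates this relation. The function names and the default temperature of 298.15 K are assumptions made for illustration and are not taken from the chapter.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def enantiomeric_ratio(kcat_km_fast, kcat_km_slow):
    """E value: ratio of the specificity constants of the two enantiomers."""
    return kcat_km_fast / kcat_km_slow


def ddg_activation(e_value, temperature_k=298.15):
    """Difference in activation free energy (fast minus slow) in kJ/mol.

    Uses ddG = -R * T * ln(E); the result is negative when the fast
    enantiomer has the lower barrier (E > 1).
    """
    return -R_KJ_PER_MOL_K * temperature_k * math.log(e_value)


if __name__ == "__main__":
    for e in (20.0, 100.0):
        print(f"E = {e:5.0f}  ->  ddG = {ddg_activation(e):6.2f} kJ/mol")
```

Under these assumptions, E = 100 at room temperature corresponds to a difference of only about 11.4 kJ/mol, which indicates how small the energy differences are that a quantitative model has to resolve.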
2 SEQUENCE AND STRUCTURE SIMILARITIES: LIPASE ENGINEERING DATABASE
2.1 Annotation of the Catalytic Machinery
The Lipase Engineering Database (http://www.led.uni-stuttgart.de) (21) was established as a repository that integrates information on sequence, structure, and function of lipases, and makes it available for protein engineering studies. Currently, it includes 92 sequences from the Swiss-Prot sequence database (22). Based on sequence similarity, each sequence is assigned to one of the 32 homologous families, which are grouped into 15 superfamilies. For each family, multisequence alignments have been performed. These are annotated with information on amino acids that are relevant to function (catalytic triad, oxyanion hole, lid, substrate binding site), information on structure (secondary structure, disulfide bridges), and information on the effect of mutations. Nine of the thirty-two homologous families include a member with a known 3-D structure. Fifty-two structures of twenty different lipases are superposed and consistently annotated. The [G,A,T]-x-S-x-G motif near the catalytic serine is the only sequence motif common to all lipases and esterases. Therefore the other residues of the catalytic machinery, the catalytic H–D/E pair and the residues of the oxyanion hole, can only be identified by their three-dimensional structure. For all nine superfamilies in which one member has a known structure, this assignment can be performed with high reliability (21). Thus for 91% of all sequences in the Lipase Engineering Database, the catalytic machinery is completely annotated, compared to 48% in the Swiss-Prot database (November 1999); in addition, in four homologous families, the annotation of the catalytic histidine had to be corrected.
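Because the [G,A,T]-x-S-x-G motif is a plain sequence pattern, candidate catalytic serines can be located with a regular expression. The sketch below is only an illustration of such a scan; the function name and the example sequences are hypothetical, and a match is a necessary hint rather than proof, since the remaining catalytic residues have to be assigned from the three-dimensional structure.

```python
import re

# Lookahead so that overlapping occurrences of the motif are all reported.
GXSXG_MOTIF = re.compile(r"(?=([GAT].S.G))")


def catalytic_serine_candidates(sequence):
    """Return 1-based positions of the serine in every [G,A,T]-x-S-x-G match.

    `sequence` is a protein sequence in one-letter code.
    """
    seq = sequence.upper()
    # The serine is the third residue of the motif: 0-based start + 2,
    # i.e. start + 3 in 1-based numbering.
    return [match.start() + 3 for match in GXSXG_MOTIF.finditer(seq)]


if __name__ == "__main__":
    toy_sequence = "AAGHSLGAA"  # hypothetical fragment, not a real lipase
    print(catalytic_serine_candidates(toy_sequence))
```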
2.2 Shape of the Binding Site and Chain-Length Specificity
In all lipases, the binding site is a deep, hydrophobic pocket with varying shape. Lipases can be classified into three groups: 1) lipases that bind the scissile fatty acid in a long hydrophobic crevice near the surface (lipases from filamentous fungi); 2) lipases with a deep, funnel-type binding site (pancreatic lipases, Pseudomonas lipases, and lipase B from Candida antarctica); and 3) lipases that bind the scissile fatty acid in a tunnel deep in the protein and the alcohol moiety in a flat region near the protein surface (lipases from C. rugosa and Geotrichum candidum). The shapes of the binding sites can be used to interpret the biochemical properties of the enzymes (23). While the alcohol binding site of C. antarctica lipase B is located at the wall of a deep and narrow funnel, it is readily accessible in the lipase from C. rugosa. This may explain why C. antarctica lipase B is frequently used to resolve racemic mixtures of small secondary alcohols with high enantioselectivity, while C. rugosa lipase is used for bulky secondary alcohols with ring structures (24). The shape and size of the scissile fatty acid binding site determine the chain-length profile of the lipase: C. antarctica lipase B, which prefers short- and medium-chain fatty acids, binds the scissile fatty acid at the wall of its narrow, 6-Å-long funnel, while the long-chain-specific Rhizomucor miehei lipase has a 10-Å-long hydrophobic crevice (23). The latter has been blocked in the homologous Rhizopus lipases by point mutations, thus shifting the specificity profile toward short-chain fatty acids (25,26).
2.3 Classification by Conserved Structural Elements
The systematic comparison of sequence and structure of all microbial lipases demonstrated that, despite their variability in sequence and structure, they can be assigned to either of two classes derived from the structure of the oxyanion hole (21): the GGGX type (with G binding to the oxyanion), which includes mostly carboxylesterases, and the GX type, which includes all other lipases that bind the oxyanion via the backbone nitrogen of a hydrophobic or hydrophilic residue (denoted X). This structure-based classification seems to have direct implications for substrate specificity: whereas GX-type lipases do not accept esters of tertiary alcohols, most GGGX-type lipases hydrolyze these substrates with moderate enantioselectivity (26a).
3 A QUANTITATIVE MODEL OF ENANTIOSELECTIVITY
Secondary alcohols are industrially important optically active intermediates. Racemic resolution or enantioselective acylation catalyzed by P. cepacia lipase has been well studied because of the high enantioselectivity of the enzyme toward a broad range of substrates. However, for some substrates, enantioselectivity is low despite optimization of reaction conditions.

The catalytic machinery of P. cepacia lipase (catalytic triad S87–H286–D264 and oxyanion hole L17–Q88) is located at the bottom of the funnel-like substrate binding site. The acid and the alcohol moieties of the substrate bind to the wall of the funnel: 1) the acid moiety binds to the hydrophobic crevice (23); 2) the large substituent of the alcohol moiety binds to the hydrophobic dent (27) (side chains of L248, L287, V266, and backbone atoms of the catalytic H286); 3) the medium-sized substituent of the alcohol moiety binds to the entrance to the hydrophilic trench (side chains of T18, Y29, H86, L287, I290); 4) the side chains of Y29 and L287 open toward the hydrophilic trench, which consists of hydrophilic and hydrophobic side chains (T18, L27, Y29, H86, L287, Q292, I290, L293) and the backbone atoms of Y29; 5) two rigid structures restrict the placement of the substrate: the oxyanion stop (backbone atoms of L17 and T18) near the oxyanion hole and the His stop (side chain of H86 and backbone of the catalytic H286) near the hydrophobic dent.

The two enantiomers of 30 chiral secondary alcohol substrates for which experimental E values have been published (24) were manually placed in the substrate binding site of P. cepacia lipase and relaxed by molecular dynamics simulation (Fig. 1). Secondary alcohols can bind in two binding modes (27): 1) in a productive binding mode, where the distance d(HNε–Oalc) between the HNε of the catalytic H286 and the alcohol oxygen Oalc of the substrate is less than 2.5 Å, thus allowing formation of a hydrogen bond; the fast-reacting enantiomer binds optimally in this mode, while the slow-reacting enantiomer is repelled by the oxyanion stop; 2) in a nonproductive binding mode, where the distance d(HNε–Oalc) is greater than 2.5 Å. While the slow-reacting enantiomer binds optimally in this mode, the fast-reacting enantiomer is blocked by the His stop. Thus enantiopreference can be explained by the fast- and slow-reacting enantiomers preferentially binding in a productive and a nonproductive mode, respectively. This is in accordance with x-ray data on binding of D- and L-menthol to C. rugosa lipase (16).

Figure 1 Surface of the substrate binding site of Pseudomonas cepacia lipase (hydrophilic and hydrophobic side chains) in complex with the fast-reacting enantiomer of substrate in productive binding mode; the alcohol moiety is a chiral secondary alcohol with hydrogen (light gray) and two substituents L and M at the stereo center: the large substituent (L) binds to the hydrophobic dent, the medium-sized substituent (M) near the entrance to the hydrophilic trench; the scissile fatty acid (R) binds to the hydrophobic crevice (Ref. 27).

Both enantiomers of the 30 substrates were docked in the productive mode and d(HNε–Oalc) was determined. The distance d(HNε–Oalc) of the slow-reacting enantiomer correlated best with the experimentally determined E values. Three regions were assigned (Fig. 2): For substrates with low E values (E < 20), distances d(HNε–Oalc) are smaller than 2.0 Å. The resulting high activity toward the slow-reacting enantiomer is consistent with the observed low enantioselectivity. For substrates with high E values (E > 100), distances d(HNε–Oalc) are larger than 2.2 Å. Substrates in a twilight zone between 2.0 and 2.2 Å have unpredictable enantioselectivity. This in silico assay of enantioselectivity was also successfully applied to explain the enantioselectivity of C. rugosa lipase toward secondary alcohols with two stereo centers (28) and of P. cepacia lipase toward γ- and δ-lactones (29).

Figure 2 Correlation of d(HNε–Oalc) for the slow-reacting enantiomer in a productive binding mode with experimental E values for 30 substrates (E values > 100 were displayed at E = 100); three zones are indicated. Low and high E values are separated by a twilight zone (Ref. 27).

As an alternative to analyzing the geometry of enzyme–substrate complexes, the difference in free energy has been determined by molecular modeling (30). By docking both enantiomers of chiral secondary alcohols to Candida antarctica lipase B, the two binding modes and the enantiopreference can be predicted by comparing the potential energy of the complexes with both enantiomers (30,31). When entropy was included in the evaluation of free energy differences, the changes in enantioselectivity could also be reproduced (32,33).
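The distance-based in silico assay of this section reduces to two threshold tests on d(HNε–Oalc). The following Python sketch encodes the cutoffs stated in the text (2.5 Å for the productive mode, 2.0 and 2.2 Å for the three zones); the function and constant names are assumptions chosen for illustration, and the distances passed in would have to come from a docked and relaxed model of the slow-reacting enantiomer.

```python
PRODUCTIVE_CUTOFF = 2.5  # Angstrom; below this a hydrogen bond to H286 can form
LOW_E_CUTOFF = 2.0       # Angstrom; below this, low E (E < 20) was observed
HIGH_E_CUTOFF = 2.2      # Angstrom; above this, high E (E > 100) was observed


def is_productive(d_hne_oalc):
    """True if the distance allows the productive binding mode."""
    return d_hne_oalc < PRODUCTIVE_CUTOFF


def predict_e_zone(d_slow):
    """Classify a substrate by d(HNe-Oalc) of its slow-reacting enantiomer.

    Returns 'low', 'high', or 'twilight' (no prediction possible).
    """
    if d_slow < LOW_E_CUTOFF:
        return "low"
    if d_slow > HIGH_E_CUTOFF:
        return "high"
    return "twilight"


if __name__ == "__main__":
    for d in (1.9, 2.1, 2.4):  # hypothetical model distances
        print(d, predict_e_zone(d), is_productive(d))
```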
4 MUTANTS WITH CHANGED ENANTIOSELECTIVITY
4.1 Stereoselectivity Toward Triacylglycerols and sn-2 Substituted Analogs
Triacylglycerols are the natural substrates of lipases. Lipases from filamentous fungi of the genus Rhizopus and other Mucorales were shown to hydrolyze predominantly the sn-1 and sn-3 groups (Fig. 3) and to show slightly different stereoselectivity (34). In an effort to find the structural determinants of stereoselectivity, structural analogs of triacylglycerols were investigated with their functional sn-2 ester group exchanged for an ether, amide, or phenyl group (35–37). Modifying the structure near the stereo center had an influence not only on the enantiomeric excess, but also on the stereopreference of Rhizopus lipase: While substrates with a flexible sn-2 group (ether, ester) were preferentially hydrolyzed in the sn-1 position, Rhizopus lipase had sn-3 preference toward substrates with a rigid sn-2 group (amide, phenyl). In contrast, the homologous Rhizomucor miehei lipase did not show this switch in stereopreference: For all four substrates, it preferentially hydrolyzed the sn-1 group, although the lipases from Rhizopus and R. miehei have a similar structure and their sequences are 56% identical. Thus stereopreference seems to depend both on the structure of the substrate and on the details of sequence and structure of the biocatalyst. An empirical rule to predict enantiopreference toward primary alcohols (7) could not be applied to explain these experimental results: It only covers the lipase from Pseudomonas cepacia and excludes substrates with an oxygen next to the stereo center.

Figure 3 Left: Stereoselective hydrolysis of triradylglycerol to form sn-1,2- or sn-2,3-diradylglycerols; right: flexible (ether, benzylether, ester) and rigid (amide, phenyl) sn-2 substituents (Ref. 40).

To explain this puzzling observation, the interactions of Rhizopus and Rhizomucor lipases with triacylglycerols and their sn-2 substituted analogs were modeled (37–39), and mutants with modified stereoselectivity were designed (40). In both lipases, the scissile fatty acid binds to a hydrophobic crevice (T83, A89, I93, F95, F112, L146, P178, V206, V209, P210, F216 in Rhizopus lipase) (23). The binding site of the diacylglycerol moiety consists of two orthogonal α-helices (D204–V209 and S253–S259), the G elbow loop (39,41), and a hydrophobic patch, the hydrophobic dent (in Rhizopus lipase: I205, T252, L254, L258). Triacylglycerol substrates and their sn-2-substituted analogs were docked to the binding sites in two orientations (Fig. 4): in the sn-1 orientation, with the scissile sn-1 chain bound to the hydrophobic crevice, or in the sn-3 orientation, with the scissile sn-3 chain bound to the hydrophobic crevice. In both orientations, the sn-2 chain was positioned in the hydrophobic dent. The lipase–substrate complexes in both orientations of the substrates were relaxed by energy minimization and molecular dynamics simulation. The geometry of the averaged substrate structure was analyzed and correlated with the experimentally determined stereoselectivity (37,38). For both lipases and all substrates, the torsion angle AO3–C3 of the substrates in the sn-3 orientation was an appropriate probe of stereoselectivity: For AO3–C3 > 150°, both lipases preferentially hydrolyzed the substrate in the sn-1 position; for AO3–C3 < 150°, the sn-3 position was preferred (Table 1).

Figure 4 Triacylglycerol (ester substrate) in sn-1 (left) and sn-3 (right) orientation. Side chains of the catalytic S145 and the two mutated amino acids L258 and L254 are displayed. AO3–C3 describes the torsion of the bond between C3 of the glycerol backbone and an alcohol oxygen (Ref. 40).
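The torsion-angle criterion can be written as a one-line rule and checked against the modeled angles and experimental preferences reported in Table 1. The Python sketch below does this; the function name is an assumption, and the text does not say how an angle of exactly 150° should be treated, so that case is reported as undetermined here.

```python
THRESHOLD_DEG = 150.0

# (lipase, substrate, modeled torsion angle in degrees, experimental preference),
# values as listed in Table 1 of this chapter.
TABLE_1 = [
    ("ROL", "ether", 164, "sn-1"),
    ("ROL", "ester", 170, "sn-1"),
    ("ROL", "amide", 117, "sn-3"),
    ("ROL", "phenyl", 118, "sn-3"),
    ("RML", "ether", 163, "sn-1"),
    ("RML", "ester", 169, "sn-1"),
    ("RML", "amide", 166, "sn-1"),
    ("RML", "phenyl", 173, "sn-1"),
]


def predict_stereopreference(torsion_deg):
    """Predict the preferred position from the O3-C3 torsion angle."""
    if torsion_deg > THRESHOLD_DEG:
        return "sn-1"
    if torsion_deg < THRESHOLD_DEG:
        return "sn-3"
    return "undetermined"  # boundary case is not defined in the text


if __name__ == "__main__":
    for lipase, substrate, torsion, observed in TABLE_1:
        predicted = predict_stereopreference(torsion)
        print(lipase, substrate, torsion, predicted, predicted == observed)
```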
Table 1 Stereoselectivity of Lipases from Rhizopus (ROL) and Rhizomucor (RML) Toward Flexible (Ether, Ester) and Rigid (Amide, Phenyl) Triradylglycerols

                     Experimental                               Model
Lipase  Substrate    Preference  ee value [%] (a)  E (b)        Preference  AO3–C3
ROL     Ether        sn-1        61 (±2)           4            sn-1        164°
ROL     Ester        sn-1        19 (±5)           1            sn-1        170°
ROL     Amide        sn-3        63 (±6)           5            sn-3        117°
ROL     Phenyl       sn-3        77 (±3)           8            sn-3        118°
RML     Ether        sn-1        69 (±4)           6            sn-1        163°
RML     Ester        sn-1        73 (±3)           7            sn-1        169°
RML     Amide        sn-1        56 (±2)           4            sn-1        166°
RML     Phenyl       sn-1        68 (±2)           6            sn-1        173°

(a) $ee = \dfrac{[A] - [B]}{[A] + [B]} \times 100$
(b) $E = \dfrac{\ln\left(1 - c\,(1 + ee_p)\right)}{\ln\left(1 - c\,(1 - ee_p)\right)}$, at conversion $c = 10\%$
Source: Ref. 39.
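The two footnote formulas of Table 1 can be evaluated directly. The Python sketch below computes ee from the amounts of the two products and E from the product ee at a given conversion; the function names are assumptions, and ee is passed in percent and converted to a fraction before it enters the logarithms.

```python
import math


def enantiomeric_excess(amount_a, amount_b):
    """ee in percent from the amounts of the two enantiomeric products."""
    return (amount_a - amount_b) / (amount_a + amount_b) * 100.0


def e_from_product_ee(ee_percent, conversion=0.10):
    """E value from the product ee (in percent) at the given conversion.

    Implements E = ln(1 - c(1 + ee_p)) / ln(1 - c(1 - ee_p)), with ee_p
    as a fraction and c = 0.10 as in the table footnote.
    """
    ee = ee_percent / 100.0
    return math.log(1.0 - conversion * (1.0 + ee)) / math.log(1.0 - conversion * (1.0 - ee))


if __name__ == "__main__":
    for ee in (61, 63, 77):  # ROL ether, amide, phenyl from Table 1
        print(ee, round(e_from_product_ee(ee), 2))
```

With the tabulated ee values of 61%, 63%, and 77%, this gives E values of about 4.4, 4.7, and 8.4, consistent with the rounded entries 4, 5, and 8 in the table.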
Comparing the two orientations, the side chain of L258 distinguished between them through its interaction with the substrate. This seemed to be the major determinant of stereoselectivity. In the sn-1 orientation, the functional group of the sn-2 chain binds deep in the His gap, a cleft between the catalytic H257 and its neighbor L258 (Fig. 5); in the sn-3 orientation, the sn-2 chain is near the entrance to the His gap. As a consequence, the more rigid and bulky the sn-2 substituent, the more unfavorable its sn-1 orientation and the more favorable its sn-3 orientation. Thus the structure of the substrate's sn-2 group and the shape of the lipase's His gap determine stereoselectivity.
4.2 Reversal of Stereopreference for Rigid Substrates
To validate the role of the His gap in stereoselectivity, its size was modified by replacing L258 by a bulky phenylalanine, a small and hydrophobic alanine, or a hydrophilic serine. Mutants with an enlarged His gap were expected to display increased sn-1 selectivity or decreased sn-3 selectivity, while a decrease of the His gap size should shift stereoselectivity toward sn-3. The modeled torsion angles and the experimentally determined stereoselectivity followed this prediction: For the rigid amide and phenyl substrates, the mutant L258F had increased sn-3 selectivity (E = 14 and E = 22, compared to wild type E = 5 and E = 8 for amide and phenyl substrates, respectively), while the mutants L258A and L258S were less sn-3 selective toward the amide substrate (40). The most interesting effect was observed for the bulky phenyl substrate: While the mutant L258F had higher sn-3 selectivity than the wild-type enzyme (E = 22 and E = 8, respectively), the stereopreference of the mutants L258A and L258S switched to sn-1 (E = 5 and E = 3, respectively). Thus exchanging a single residue led to a reversal of the apparent handedness of the biocatalyst. However, this effect was highly specific: It occurred only toward the phenyl substrate but not toward the amide or other substrates. The effect of mutations toward flexible substrates (ether, ester) was less pronounced. Because L258A and L258S had similar selectivity, the shape of the His gap, rather than its physicochemical properties, mediates stereoselectivity. For all mutants and substrates, the torsion angle AO3–C3 predicts stereopreference (AO3–C3 > 150°: sn-1 selective; AO3–C3 < 150°: sn-3 selective), and even the ranking of substrates and mutants by stereoselectivity (Fig. 6).

Figure 5 Model of a complex of Rhizomucor lipase and trioctanoin in sn-1 orientation; the functional ester group of the sn-2 fatty acid points toward the His gap (side chains of H257 and L258) (Ref. 41).
Figure 6 Correlation of experimentally determined E values and torsion angle AO3–C3 for all mutants of Rhizopus lipase and substrates; sn-1 preference: E > 1, AO3–C3 > 150°; sn-3 preference: E < 1, AO3–C3 < 150°.
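To place variants with opposite stereopreference on one scale, as Figure 6 does with E > 1 for sn-1 and E < 1 for sn-3 preference, the E values quoted in the text can be given a sign. The Python sketch below assumes that an sn-3 preference with a reported value E corresponds to 1/E on the sn-1 scale, so that ln E changes sign; this reading of the figure convention is an assumption, as are the function and variable names. The data are the phenyl-substrate values quoted above.

```python
import math

# Variant -> (preferred position, reported E) for the phenyl substrate.
PHENYL_SUBSTRATE = {
    "wild type": ("sn-3", 8.0),
    "L258F": ("sn-3", 22.0),
    "L258A": ("sn-1", 5.0),
    "L258S": ("sn-1", 3.0),
}


def signed_log_e(preference, e_value):
    """ln(E) on a common scale: positive for sn-1, negative for sn-3."""
    if preference == "sn-1":
        return math.log(e_value)
    if preference == "sn-3":
        return -math.log(e_value)
    raise ValueError(f"unknown preference: {preference!r}")


def rank_by_sn1_selectivity(variants):
    """Variant names sorted from most sn-1 selective to most sn-3 selective."""
    return sorted(variants, key=lambda name: signed_log_e(*variants[name]), reverse=True)


if __name__ == "__main__":
    print(rank_by_sn1_selectivity(PHENYL_SUBSTRATE))
```

On this scale the two small-residue mutants fall on the opposite side of zero from the wild type and L258F, which is the reversal of stereopreference described in the text.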
5 TUNING ENANTIOSELECTIVITY BY REACTION CONDITIONS
5.1 Solvent Effects
To establish a quantitative model of the enantioselectivity of Pseudomonas lipase toward secondary alcohols (cf. Section 3), experimentally determined E values were collected from the literature. The selected data had been determined under optimized reaction conditions. For most of the substrates, enantioselectivity could be increased by choosing the appropriate solvent. One of the investigated secondary alcohols (medium-sized and large substituents CF3 and naphthyl, respectively, according to Fig. 1) had low enantioselectivity (E = 22) in t-butyl methyl ether. Changing the solvent led to moderate (E = 60–70 in diethyl ether, toluene, dodecane, and hexane) and even high enantioselectivity (E > 100 in tetrahydrofuran, acetone, and benzene) (42). However, for other substrates, solvent engineering failed to improve low enantioselectivity (43). The in silico assay offers a structural interpretation of this observation (27) by assuming an upper limit of enantioselectivity, which is determined by the structure of lipase and substrate and can be probed by modeling. For large distances d(HNε–Oalc), high enantioselectivity is expected if the optimal solvent is used; for small distances, low enantioselectivity is expected. In a suboptimal solvent, enantioselectivity falls below this structure-based optimum. Thus for all substrates for which the experiment shows low enantioselectivity in a given solvent, but modeling results in large distances d(HNε–Oalc), it may be worthwhile to try to increase enantioselectivity by solvent engineering. For small distances, however, the model predicts low enantioselectivity in all solvents.
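The reasoning of this section amounts to a small decision rule that combines the modeled distance with the measured E value. The Python sketch below is one possible reading of it, not a procedure given in the chapter: it assumes that the distance cutoffs of 2.0 and 2.2 Å and the E > 100 criterion from Section 3 carry over, and the function name and returned labels are hypothetical.

```python
LOW_DISTANCE = 2.0   # Angstrom, cutoff taken from Section 3
HIGH_DISTANCE = 2.2  # Angstrom, cutoff taken from Section 3
HIGH_E = 100.0       # E values above this are called high in Section 3


def solvent_screening_outlook(d_slow, e_observed):
    """Suggest whether solvent engineering is worth trying.

    d_slow: modeled d(HNe-Oalc) of the slow-reacting enantiomer (Angstrom).
    e_observed: E value measured in the current solvent.
    """
    if d_slow < LOW_DISTANCE:
        return "structure-limited: low E expected in all solvents"
    if d_slow <= HIGH_DISTANCE:
        return "twilight zone: no prediction"
    if e_observed > HIGH_E:
        return "already near the structure-based optimum"
    return "solvent screening may raise E"


if __name__ == "__main__":
    # Hypothetical model distance combined with the E = 22 example from the text.
    print(solvent_screening_outlook(2.4, 22.0))
```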
5.2 Pressure Dependence
It has long been suggested that enantioselectivity could be tuned by pressure (44). Recently, it was found for a supercritical CO2 system that increasing the pressure led to a decrease in the enantioselectivity of acylation catalyzed by Candida antarctica lipase B (45). This effect was also observed for the Candida rugosa lipase-catalyzed transesterification of esters of racemic menthol in chloroform under different pressures (46). In order to rationalize these experimental findings, a fully solvated system of Candida rugosa lipase in chloroform at 7% water content was investigated by molecular dynamics simulations at various pressures (46). A water-filled cavity was identified, which leads from the protein surface through the center of the protein toward the catalytic H449. At increasing pressures, it gradually filled with water molecules, thus increasing its volume and displacing the catalytic histidine side chain (Fig. 7). The difference ΔdNε–O = dNε–O(+) − dNε–O(−) of the distances between the H449–Nε and the menthyl oxygen for the fast- and slow-reacting enantiomers was analyzed; as the H449 side chain was displaced, ΔdNε–O decreased, which can explain the decreasing enantioselectivity (Fig. 8).

Figure 7 Pressure-induced displacement of the H449 side chain in the active site of Candida rugosa lipase. The lipase structure was averaged over the last 50 ps of the 100 bar simulation with the coordinates of the 13 water molecules in the water channel taken from the snapshot at 250 ps. The (+)-menthyl ester was docked as tetrahedral intermediate to the averaged structure and energy was minimized. In comparison, the crystal structure (1LPM) contains only 6 water molecules in the water channel (Ref. 46).
6 OUTLOOK: FROM QUALITATIVE TO QUANTITATIVE INFORMATION
For a broad range of lipases and substrates (secondary and primary alcohols and carboxylic acids), the molecular basis of enantiorecognition was modeled by flexible docking using molecular dynamics simulations. As the binding sites and the substrates are hydrophobic, the enzyme–substrate interaction is predominantly steric. Upon docking, the protein side chains and the substrate change their conformation. The resulting conformation of protein side chains and substrate differs for the two enantiomers. For each substrate class, a geometrical parameter that predicts the fast-reacting enantiomer could be identified. In addition, a semiquantitative correlation was observed between a geometrical parameter derived from the model and the experimentally determined enantioselectivity as measured by the E value. With few exceptions, this correlation holds when the substrate structure is changed and also when the shape of the binding site is changed by site-directed mutagenesis. Thus the in silico assay could be applied to predict the ranking of mutants by enantioselectivity toward a single substrate, and also to predict the ranking of substrates.

Figure 8 Correlation between experimentally determined enantioselectivity E and the difference of distances between H449–Nε and menthyl-alcohol-O of the (+)- and (−)-enantiomer, ΔdNε–O = dNε–O(+) − dNε–O(−) [dNε–O(+), dNε–O(−): distance between H449–Nε and (+)- and (−)-menthyl-alcohol-O, respectively] (46).

Enzyme–substrate pairs can only be ranked by comparing identical reaction conditions. Changing external parameters such as hydrostatic pressure or solvent type may have dramatic effects on enantioselectivity: Increasing pressure leads to diffusion of water molecules into internal cavities of Candida rugosa lipase, thus changing the shape of the binding site and, consequently, its enantioselectivity. The effect of solvent can be estimated from the following observation: There seems to be a maximum enantioselectivity for each enzyme–substrate pair that is determined by the structure of both components and that can be predicted by modeling. Enzyme–substrate pairs with low or high enantioselectivity are predicted by small or large distances d(HNε–Oalc), respectively, in the model. This structure-limited maximum enantioselectivity can be attained by using the optimal solvent, whereas a nonoptimal solvent decreases enantioselectivity. Thus enzyme–substrate pairs with large distances d(HNε–Oalc) but with low experimental enantioselectivity are expected to be optimizable by solvent engineering.

Because it has long been known that lipase selectivity may be strongly influenced by the reaction medium, several suggestions have been put forward to explain these effects (5). Because the two enantiomers bind in different orientations, the solvent-exposed surface might differ. Exchanging the solvent could lead to changes in the difference in free energy of binding, ΔΔG, of the two enantiomers. Alternatively, solvent molecules might compete with the two enantiomers for the binding site; exchanging the solvent would shift the equilibrium between the two enantiomers. As a third explanation, the solvent might change the average structure and the dynamics of the lipase, which could result in a change of enantioselectivity.

Mutations in the lipase are another factor that mediates enantioselectivity. For mutations in the binding site with direct contact to the substrates, the short-range interaction with the substrate can be modeled fairly well by analyzing local geometry (40) or by evaluating activation enthalpy and entropy (32,33). However, there is growing evidence that amino acids located far from the binding site can also mediate the enantioselectivity of lipases, as shown by directed evolution experiments (47) or by chemical modification (48). How can such long-range interactions be rationalized? For other proteins, it has been shown that the specificity of a receptor or the interaction of an enzyme with reaction intermediates is linked to the dynamics of the complex (49–51). In a few cases, it has been demonstrated that mutations or chemical modifications may indeed change the dynamics of the protein (52,53). In principle, molecular dynamics simulations of solvated systems are appropriate to study these effects.
With current hardware and software, simulations of medium-sized proteins are performed on the 10–100 nsec time scale (54), which is not far from the μsec to msec time scale of the slow hinge-bending motions that are suspected to play a dominant role in binding (55).

Understanding the relationship of sequence, structure, dynamics, and function will open new routes of analyzing sequence information. Today, qualitative information is derived from sequence data: assignment to a protein family or enzyme class, and, in the best case, annotation of functionally relevant amino acids. However, if we want to understand the properties of enzyme mutants or if we want to apply protein engineering to optimize biochemical properties, we need methods for the quantitative prediction of how enantioselectivity depends on protein sequence, solvent effects, or substrate structure. A mutant is not simply "better than wild type": it is better toward one given substrate, but may be worse toward another. Thus for a reaction to be optimized, mutants have to be tailored to match the reaction conditions and the structure of the substrate. Understanding the interplay of sequence, structure, and dynamics of an enzyme and its interaction with substrate and solvent on a quantitative level will make it possible to understand the metabolic function of enzymes, to direct engineering, and to control the quality of biocatalytic experiments.
REFERENCES
1. RD Schmid, R Verger. Lipases: interfacial enzymes with attractive applications. Angew Chem Int Ed Engl 37:1608–1633, 1998.
2. UT Bornscheuer, RJ Kazlauskas. Hydrolases in Organic Synthesis: Regio- and Stereoselective Biotransformations. Weinheim: Wiley-VCH, 1999.
3. A Svendsen. Lipase protein engineering. Biochim Biophys Acta 1543:223–238, 2000.
4. F Theil. Enhancement of selectivity and reactivity of lipases by additives. Tetrahedron 56:2905–2919, 2000.
5. P Berglund. Controlling lipase enantioselectivity for organic synthesis. Biomol Eng 18:13–22, 2001.
6. RJ Kazlauskas, ANE Weissfloch, AT Rappaport, LA Cuccia. A rule to predict which enantiomer of a secondary alcohol reacts faster in reactions catalyzed by cholesterol esterase, lipase from Pseudomonas cepacia, and lipase from Candida rugosa. J Org Chem 56:2656–2665, 1991.
7. ANE Weissfloch, RJ Kazlauskas. Enantiopreference of lipase from Pseudomonas cepacia toward primary alcohols. J Org Chem 60:6959–6969, 1995.
8. SN Ahmed, RJ Kazlauskas, AH Morinville, P Grochulski, JD Schrag, M Cygler. Enantioselectivity of Candida rugosa Lipase toward Carboxylic Acids: a Predictive Rule from Substrate Mapping and X-ray Crystallography. Biocatalysis 9:209–225, 1994.
9. L Brady, AM Brzozowski, ZS Derewenda, E Dodson, G Dodson, S Tolley, JP Turkenburg, L Christiansen, B Huge-Jensen, L Norskov. A serine protease triad forms the catalytic centre of a triacylglycerol lipase. Nature 343:767–770, 1990.
10. FK Winkler, A D'Arcy, W Hunziker. Structure of human pancreatic lipase. Nature 343:771–774, 1990.
11. U Derewenda, AM Brzozowski, DM Lawson, ZS Derewenda. Catalysis at the interface: the anatomy of a conformational change in a triglyceride lipase. Biochemistry 31:1532–1541, 1992.
12. P Grochulski, Y Li, JD Schrag, M Cygler. Two conformational states of Candida rugosa lipase. Protein Sci 3:82–91, 1994.
13. JD Schrag, Y Li, M Cygler, D Lang, T Burgdorf, HJ Hecht, R Schmid, D Schomburg, TJ Rydel, JD Oliver, LC Strickland, CM Dunaway, SB Larson, J Day, A McPherson. The open conformation of a Pseudomonas lipase. Structure 5:187–202, 1997.
14. R Verger. "Interfacial activation" of lipases: facts and artefacts. Trends Biotechnol 15:32–38, 1997.
15. DL Ollis, E Cheah, M Cygler, B Dijkstra, F Frolow, SM Franken, M Harel, SJ Remington, I Silman, J Schrag. The alpha/beta hydrolase fold. Protein Eng 5:197–211, 1992.
16. M Cygler, P Grochulski, RJ Kazlauskas, JD Schrag, F Bouthillier, B Rubin, AN Serreqi, AK Gupta. A structural basis for the chiral preferences of lipases. J Am Chem Soc 116:3180–3186, 1994.
17. DA Lang, MLM Mannesse, GH DeHaas, HM Verheij, BW Dijkstra. Structural basis of the chiral selectivity of Pseudomonas cepacia lipase. Eur J Biochem 254:333–340, 1998.
18. RS Phillips. Temperature effects on stereochemistry of enzymatic reactions. Enzyme Microb Technol 14:417–419, 1992.
19. PLA Overbeeke, SC Orrenius, JA Jongejan, JA Duine. Enthalpic and entropic contributions to lipase enantioselectivity. Chem Phys Lipids 93:81–93, 1998.
20. PLA Overbeeke, J Ottosson, K Hult, JA Jongejan, JA Duine. The temperature dependence of enzyme kinetic resolutions reveals the relative importance of enthalpy and entropy to enzyme enantioselectivity. Biocatal Biotransform 17:61–79, 1999.
21. J Pleiss, M Fischer, M Peiker, C Thiele, RD Schmid. Lipase Engineering Database: understanding and exploiting sequence–structure–function relationships. J Mol Catal B 10:491–508, 2000.
22. A Bairoch, R Apweiler. The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res 25:31–36, 1997.
23. J Pleiss, M Fischer, RD Schmid. Anatomy of lipase binding sites: the scissile fatty acid binding site. Chem Phys Lipids 93:67–80, 1998.
24. RJ Kazlauskas, UT Bornscheuer. Biotransformations with lipases. In: H-J Rehm, G Reed, Eds. Biotechnology. Weinheim, New York: Wiley-VCH, 1998, pp 37–191.
25. RD Joerger, MJ Haas. Alteration of chain length selectivity of a Rhizopus delemar lipase through site-directed mutagenesis. Lipids 29:377–384, 1994.
26. RR Klein, G King, RA Moreau, MJ Haas. Altered acyl chain length specificity of Rhizopus delemar lipase through mutagenesis and molecular modeling. Lipids 32:123–130, 1997.
26a. E Henke, J Pleiss, UT Bornscheuer. Activity of lipases and esterases towards tertiary alcohols: Insights into structure-function relationships. Angew Chem Int Ed 41:3211–3213, 2002.
27. T Schulz, J Pleiss, RD Schmid. Stereoselectivity of Pseudomonas cepacia lipase toward secondary alcohols: a quantitative model. Protein Sci 9:1053–1062, 2000.
28. T Schulz, RD Schmid, J Pleiss. Structural basis of stereoselectivity in Candida rugosa lipase catalyzed hydrolysis of secondary alcohols. J Mol Model 7:265–270, 2001.
29. B-Y Hwang, H Scheib, J Pleiss, B-G Kim, RD Schmid. Computer-aided molecular modeling of the enantioselectivity of Pseudomonas cepacia lipase toward γ- and δ-lactones. J Mol Catal B 10:223–231, 2000.
30. F Haeffner, T Norin, K Hult. Molecular modeling of the enantioselectivity in lipase-catalyzed transesterification reactions. Biophys J 74:1251–1262, 1998.
76
Pleiss
31. S Raza, L Fransson, K Hult. Enantioselectivity in Candida antarctica lipase B: a molecular dynamics study. Protein Sci 10:329–338, 2001. 32. J Ottosson, JC Rotticci-Mulder, D Rotticci, K Hult. Rational design of enantioselective enzymes requires considerations of entropy. Protein Sci 10:1769– 1774, 2001. 33. D Rotticci, JC Rotticci-Mulder, S Denman, T Norin, K Hult. Improved enantioselectivity of a lipase by rational protein engineering. Chembiochem 2:766– 770, 2001. 34. E Rogalska, C Cudrey, F Ferrato, R Verger. Stereoselective hydrolysis of triglycerides by animal and microbial lipases. Chirality 5:24–30, 1993. 35. P Stadler, A Kovac, L Haalck, F Spener, F Paltauf. Stereoselectivity of microbial lipases. The substitution at position sn-2 of triacylglycerol analogs influences the stereoselectivity of different microbial lipases. Eur J Biochem 227: 335–343, 1995. 36. A Kovac, P Stadler, L Haalck, F Spener, F Paltauf. Hydrolysis and esterification of acylglycerols and analogs in aqueous medium catalyzed by microbial lipases. Biochim Biophys Acta 1301:57–66, 1996. 37. L Haalck, F Paltauf, J Pleiss, RD Schmid, F Spener, P Stadler. Stereoselectivity of lipase from Rhizopus oryzae towards triacylglycerols and analogs: computer aided modeling and experimental validation. Methods Enzymol: Lipases 284: 353–376, 1997. 38. H-C Holzwarth, J Pleiss, RD Schmid. Computer aided modelling of Rhizopus oryzae lipase catalyzed stereoselective hydrolysis of triglycerides. J Mol Catal B 3:73–82, 1997. 39. H Scheib, J Pleiss, A Kovac, F Paltauf, RD Schmid. Stereoselectivity of Mucorales lipases toward triradylglycerols—a simple solution to a complex problem. Protein Sci 8:215–221, 1999. 40. H Scheib, J Pleiss, P Stadler, A Kovac, AP Potthoff, L Haalck, F Spener, F Paltauf, RD Schmid. Rational design of Rhizopus oryzae lipase with modified stereoselectivity toward triradylglycerols. Protein Eng 11:675–682, 1998. 41. J Pleiss, H Scheib, RD Schmid. 
The His gap motif in microbial lipases: a determinant of stereoselectivity toward triacylglycerols and analogs. Biochimie 82:1043–1052, 2000. 42. J Gaspar, A Guerrero. Lipase-catalysed enantioselective synthesis of napthyl trifluoromethyl carbinols and their corresponding non-fluorinated counterparts. Tetrahedron: Asymmetry 6:231–238, 1995. 43. I Petschen, EA Malo, MP Bosch, A Guerrero. Highly enantioselective synthesis of long chain alkyl trifluoromethyl carbinols and h-thiotrifluoromethyl carbinols through lipases. Tetrahedron: Asymmetry 7:2135–2143, 1996. 44. SV Kamat, B Iwaskewycz, EJ Beckman, AJ Russell. Biocatalytic synthesis of acrylates in supercritical fluids: tuning enzyme activity by changing pressure. Proc Natl Acad Sci U S A 90:2940–2944, 1993. 45. T Matsuda, R Kanamaru, K Watanabe, T Harada, K Nakamura. Control on enantioselectivity with pressure for lipase-catalyzed esterification in supercritical carbon dioxide. Tetrahedron Lett. 42:8319–8321, 2001.
Quantitative Modeling of Lipase Enantioselectivity 46.
47.
48.
49.
50. 51.
52.
53. 54. 55.
77
UHM Kahlow, RD Schmid, J Pleiss. A model of the pressure dependence of the enantioselectivity of Candida rugosa lipase towards (+/)-menthol. Protein Sci 10:1942–1952, 2001. DX Zha, S Wilensek, M Hermes, KE Jaeger, MT Reetz. Complete reversal of enantioselectivity of an enzyme-catalyzed reaction by directed evolution. Chem Commun 2664–2665, 2001. M Basri, BL Th’ng, CN Razak, AB Salleh. Effect of reductive alkylation of Candida rugosa lipase on its enantioselective esterification reaction. Ann N Y Acad Sci 864:192–197, 1998. L Zidek, MV Novotny, MJ Stone. Increased protein backbone conformational entropy upon hydrophobic ligand binding. Nat Struct Biol 6:1118–1121, 1999. JL Radkiewicz, CL Brooks. Protein dynamics in enzymatic catalysis: exploration of dihydrofolate reductase. J Am Chem Soc 122:225–231, 2000. MJ Osborne, J Schnell, SJ Benkovic, HJ Dyson, PE Wright. Backbone dynamics in dihydrofolate reductase complexes: role of loop flexibility in the catalytic mechanism. Biochemistry 40:9846–9859, 2001. RB Rose, CS Craik, RM Stroud. Domain flexibility in retroviral proteases: structural implications for drug resistant mutations. Biochemistry 37:2607– 2621, 1998. BF Volkman, D Lipson, DE Wemmer, D Kern. Two-state allosteric behavior in a single-domain signaling protein. Science 291:2429–2433, 2001. V Daggett. Long timescale simulations. Curr Opin Struct Biol 10:160–164, 2000. B Ma, M Shatsky, HJ Wolfson, R Nussinov. Multiple diverse ligands binding at a single protein site: a matter of pre-existing populations. Protein Sci 11:184– 197, 2002.
5 Rational Redesign of Haloalkane Dehalogenases Guided by Comparative Binding Energy Analysis
Jiří Damborský, Jan Kmuníček, and Tomáš Jedlička Masaryk University Brno, Czech Republic
Santos Luengo and Federico Gago University of Alcala Madrid, Spain
Angel R. Ortiz Mount Sinai School of Medicine New York, New York, U.S.A.
Rebecca C. Wade EML Research Heidelberg, Germany
1 COMPARATIVE BINDING ENERGY ANALYSIS
Comparative binding energy (COMBINE) analysis is a computational method for deducing quantitative structure–activity relationships using structural
data from ligand–macromolecule complexes (1). It can be applied to the formation of macromolecule–small molecule complexes and macromolecule–macromolecule complexes; in this article, these complexes will be referred to generically as macromolecule–ligand complexes. The "COMBINE" acronym refers to two aspects of the technique (2): (i) macromolecule–ligand structural data are combined with experimental binding data and (ii) empirical molecular mechanics energy calculations are combined with Partial Least-Squares Projection to Latent Structures (PLS) chemometric analysis. COMBINE analysis systematically explores the relationships between experimental binding affinities for a set of ligands and selected interaction energies with the macromolecule. COMBINE analysis is formally similar to CoMFA (comparative molecular field analysis) (3) inasmuch as both methods deal with data matrices containing a large number of energy descriptors that are subjected to chemometric analysis. The energy descriptors differ, however: in CoMFA they are interaction fields calculated for the ligand alone, whereas in COMBINE analysis they represent residue-based ligand–receptor interactions. Compared to classical molecular mechanics calculations of binding energies, subjecting ligand–macromolecule interaction energies to statistical analysis has two advantages: the noise due to inaccuracies in the potential energy functions and molecular models can be reduced, and mechanistically important interactions can be identified. Compared to classical Quantitative Structure–Activity Relationships (QSAR) analysis, COMBINE is expected to be more predictive as it incorporates more physically relevant information about the energetics of ligand–receptor interactions (1).
To estimate the total binding energy for each ligand–macromolecule complex, $\Delta U$, a molecular mechanics force field is used to calculate the following terms: (i) the sum, $E_{INTER}^{LM}$, of intermolecular interaction energies ($\Delta u_i$) between the ligand and each macromolecule residue, each of which consists of van der Waals and electrostatic contributions; (ii) the change in intramolecular energy of the ligand upon binding to the macromolecule, $\Delta E^{L}$; and (iii) the change in intramolecular energy of the macromolecule upon ligand binding, $\Delta E^{M}$. In addition, a measure of the cost in electrostatic free energy of desolvating the apposing surfaces of both interacting partners upon complex formation (4) is estimated using a continuum electrostatics method that provides two extra terms: (iv) the desolvation energy of the ligand, $E_{DESOLV}^{L}$, and (v) the desolvation energy of the macromolecule, $E_{DESOLV}^{M}$.
$$\Delta U = E_{INTER}^{LM} + \Delta E^{L} + \Delta E^{M} + E_{DESOLV}^{L} + E_{DESOLV}^{M} \qquad (1)$$
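As a minimal sketch of how Eq. 1 is assembled, the snippet below sums per-residue van der Waals and electrostatic terms into the intermolecular energy and then adds the remaining four terms. The function name `total_binding_energy` and all numerical values are illustrative assumptions, not data from the studies cited in this chapter.

```python
def total_binding_energy(vdw, ele, d_e_ligand, d_e_macro, desolv_ligand, desolv_macro):
    """Assemble Delta U (Eq. 1) from its five terms, all in the same energy unit."""
    if len(vdw) != len(ele):
        raise ValueError("vdw and ele must list one value per residue")
    # (i) E_INTER^LM: van der Waals plus electrostatic term of every residue
    e_inter = sum(v + e for v, e in zip(vdw, ele))
    # (ii)-(v): intramolecular energy changes and desolvation energies
    return e_inter + d_e_ligand + d_e_macro + desolv_ligand + desolv_macro

# Hypothetical values (kcal/mol) for a ligand in contact with three residues
vdw = [-1.5, -0.5, -2.0]
ele = [-3.0, 0.5, -0.5]
delta_u = total_binding_energy(vdw, ele, d_e_ligand=1.0, d_e_macro=2.0,
                               desolv_ligand=1.5, desolv_macro=0.5)
print(delta_u)
```

In COMBINE analysis the per-residue terms are not simply summed in this way but are kept as separate descriptors and weighted by the statistical model; the sum is shown only to make the bookkeeping of Eq. 1 explicit.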
The COMBINE analysis methodology is schematized in Fig. 1. The energy descriptors obtained from the set of experimentally determined or modeled ligand–macromolecule complexes are used to construct a matrix in which the rows represent the different ligands and the columns contain the two blocks of residue-based molecular mechanics energy information (van der Waals and electrostatic) plus the additional desolvation energy terms and a last column containing the experimental binding affinities/activities. This matrix is then projected to a small number of latent variables using the PLS method (5), and the original energy terms are given weights, $w_i$, according to their importance in the model.
Figure 1 Scheme of the COMBINE analysis methodology. L1...Ln: ligands; M: macromolecule; L1M...LnM: ligand–macromolecule complexes; R1...Rn: residues of the macromolecule; EVDW: van der Waals interaction energy; EELE: electrostatic interaction energy; EDESOLV: substrate desolvation energy; Ki: log (experimental binding affinity); PLS: Partial Least Squares Projection to Latent Structures analysis.
2 APPLICATION OF COMBINE ANALYSIS IN DRUG DESIGN
COMBINE analysis was initially used for the study of protein–inhibitor complexes. Ortiz et al. (1) applied COMBINE analysis to a series of 26 inhibitors of the human synovial fluid phospholipase A2. The COMBINE model explained 92% (82% cross-validated) of the quantitative variability of binding constants and provided insight into the mechanism of phospholipase inhibition. Only 2% of the energy terms were required for explaining the differences in activity. The model indicated that the calcium ion present in the enzyme active site is important for inhibitory activity as is the steric accommodation of the inhibitors in the binding site of the enzyme. Perez et al. (4) conducted COMBINE analysis with a set of 33 HIV-1 protease inhibitors and externally validated their models using an additional 16 inhibitors.
Incorporation of electrostatic desolvation effects in the model resulted in significant improvement of its predictive ability. The model constructed for a merged set of 49 inhibitors explained 91% (81% cross-validated) of the quantitative variability of the experimental data. This study was further extended by Pastor et al. (6) who incorporated the two possible binding modes of the HIV-1 inhibitors into the COMBINE model. This was achieved by manipulation of the data matrix used to describe the interaction energies and provided a model with improved external predictive ability and simplified interpretability. Tomic et al. (7) developed a COMBINE model for the binding specificity of transcription factors of the nuclear receptor family to DNA. They analyzed experimental data for the interaction of 20 mutant glucocorticoid receptor DNA-binding domains with 16 different response elements in a total of 320 complexes. The analysis revealed that specificity of binding of the transcription factor to DNA is largely determined by the energy cost of DNA desolvation and is tuned by intermolecular electrostatic interactions and conformational changes. Lozano et al. (8) applied in parallel COMBINE and GRID/GOLPE analyses to a series of 12 heterocyclic amines and human cytochrome P450 1A2. The resultant COMBINE model explained 90% (74% cross-validated) of the quantitative variability of the activity data and corresponded well with the GRID/GOLPE model explaining 96% (79% cross-validated) of the quantitative variability of the activity data. The study showed that the combined use of two 3D-QSAR approaches for model construction acts as a mutual validation procedure and allows a more reliable and detailed interpretation of the results. Cuevas et al. (9) studied 40 complexes of human neutrophil elastase with the N3-substituted trifluoromethylketone-based pyridone inhibitors. 
The authors carried out Poisson–Boltzmann computations and derived two additional descriptors representing the electrostatic energy contributions to the partial desolvation of both the receptor and the ligands, and solvent-screened electrostatic interactions. Incorporation of these descriptors into the model improved its statistical parameters. Most recently, Wang and Wade (10) constructed a COMBINE model for two subtypes and one mutant of neuraminidase from influenza virus complexed with 43 inhibitors. The model highlighted 12 protein residues and 1 bound water molecule as particularly important for inhibitory activity and indicated the potential for using COMBINE analysis to investigate species specificity and resistant mutants.
3 APPLICATION OF COMBINE ANALYSIS IN PROTEIN ENGINEERING
A primary goal of protein engineering is to alter the physico-chemical and functional properties of proteins by modification of their structures. Protein
structures can be engineered either by directed evolutionary approaches (11,12), which do not require any a priori knowledge of protein–function relationships, or by rational design which is based on the knowledge of these relationships. Protein structures and structure–function relationships are often so complex that it is difficult to study them without the use of computer graphics and computer modeling. COMBINE analysis quantitatively explores residue-based protein–ligand interactions and provides quantitative information about the importance of every residue in a macromolecule for the binding of different substrates. Mutagenesis of the residues with the highest importance in a COMBINE model should lead to the most significant changes in substrate specificity. The molecular models of mutant structures can be constructed in silico and the effects of substitution on substrate binding can be predicted prior to experiment using the COMBINE model. The application of COMBINE analysis to the study of structure– function relationships and engineering of haloalkane dehalogenase DhlA has been recently reported by Kmunicek et al. (13) and is further extended in this contribution.
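The idea of ranking residues by their importance in a COMBINE model can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a textbook single-response PLS (PLS1, NIPALS algorithm) written in plain Python rather than the chemometric software used in the cited studies, the descriptor names, energies, and activities are hypothetical, and descriptors are ranked simply by the magnitude of their regression coefficients, whereas real applications typically scale and pre-select variables first.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.

    X: rows are ligands, columns are energy descriptors; y: activities.
    Returns (descriptor means, activity mean, list of (w, p, q) per component).
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered descriptors
    f = [v - y_mean for v in y]                                 # centered activities
    components = []
    for _ in range(n_components):
        # Weight vector: descriptor direction with maximal covariance with y
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no covariance left to model
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]       # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate X and y before extracting the next latent variable
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict the activity of one ligand from its descriptor vector x."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

def pls1_coefficients(model):
    """Regression coefficient per descriptor (the prediction is linear in x)."""
    x_mean = model[0]
    base = pls1_predict(model, x_mean)
    coefs = []
    for j in range(len(x_mean)):
        shifted = list(x_mean)
        shifted[j] += 1.0  # unit change in descriptor j
        coefs.append(pls1_predict(model, shifted) - base)
    return coefs

# Hypothetical descriptors (kcal/mol) for five ligands and three energy terms
names = ["R1_vdw", "R2_vdw", "R1_ele"]
X = [[-1.0, -0.2, -3.0],
     [-2.0, -0.1, -1.0],
     [-0.5, -0.4, -2.0],
     [-1.5, -0.3, -0.5],
     [-3.0, -0.2, -2.5]]
# Hypothetical activities, constructed here as 2*R1_vdw - R1_ele
y = [1.0, -3.0, 1.0, -2.5, -3.5]

model = pls1_fit(X, y, n_components=3)
coefs = pls1_coefficients(model)
ranking = sorted(zip(names, coefs), key=lambda item: abs(item[1]), reverse=True)
for name, c in ranking:
    print(f"{name}: {c:+.3f}")
```

The same `pls1_predict` call could be applied to descriptors computed for a modeled mutant complex, which is the sense in which the effect of a substitution can be estimated before any experiment.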
4 PROTEIN ENGINEERING OF HALOALKANE DEHALOGENASES
Haloalkane dehalogenases are microbial enzymes that catalyze the cleavage of a carbon–halogen bond by a hydrolytic mechanism (Fig. 2). They require water as the only co-factor, and the reaction they catalyze is considered to be a critical step in the biological degradation of various haloalkanes (14). Haloalkanes are widely used as solvents, degreasing agents, intermediates in chemical synthesis, and pesticides. Haloalkane dehalogenases could therefore find application in bioremediation technologies and chemical syntheses (15–17). Different haloalkane dehalogenases have been isolated from various bacteria (18–25), but none of them shows sufficient activity toward some of the technologically interesting compounds, such as 1,2-dichloropropane, 2-chloropropane, 2-chlorobutane, and 1,2,3-trichloropropane, although these substances have the potential to be good substrates for haloalkane dehalogenases from the reaction mechanism standpoint.
Figure 2 Reaction scheme of hydrolytic dehalogenation catalyzed by haloalkane dehalogenases. Enz: enzyme.
Site-directed mutagenesis experiments were initiated to study structure–function relationships and to redesign haloalkane dehalogenases (26–41). These studies identified some functional residues, such as the catalytic triad or pairs of transition-state and product stabilizing residues, but to our knowledge none of them provided enzymes with significantly improved activities toward target substances. Structural studies have been conducted to determine the 3-D structures of the wild-type (42–51) and mutant proteins (31,35,52). The haloalkane dehalogenases are composed of two domains. The core of the main domain consists of an eight-stranded β-pleated sheet with seven parallel strands and one antiparallel strand (Fig. 3). This β-sheet is surrounded by α-helices. The cap domain lies on top of the main domain and consists of five α-helices. A buried, mainly hydrophobic cavity is located between these two domains.
Figure 3 Three-dimensional models of the haloalkane dehalogenases DhlA (A) and LinB (B), and topological arrangement of secondary elements in DhlA (C) and LinB (D). The structures were determined by protein crystallography (from Refs. 42,50). Numbering of the secondary elements respects the evolutionary changes in the cap domains (from Ref. 63). The triangles indicate the positions of the catalytic triad residues.
Three-dimensional structures provide not only a good starting point for the rational design of site-directed mutations and for the interpretation of results from mutagenesis experiments, but also essential data for computer-modeling studies. Molecular docking (51,53), quantitative structure–function relationships (54), quantum-mechanical calculations (55–60), and molecular dynamics simulations (61–63) have brought insights into the binding of substrates to the enzyme active site, the mechanism of the dehalogenation reaction, and the conformational behavior of several dehalogenase enzymes at atomic resolution. Although the haloalkane dehalogenases are currently being intensively studied and engineered, an effective catalyst for some target compounds has not been obtained yet. Another approach to improve catalytic performance is to modify the reaction conditions. Grey et al. reported construction of a thermostable haloalkane dehalogenase DhaA (64) suitable for dehalogenation at elevated temperatures.
5 COMBINE MODEL FOR THE HALOALKANE DEHALOGENASE DhlA
COMBINE analysis was conducted to identify the protein residues responsible for the differences in binding affinities of 18 chlorinated and brominated aliphatic substrates of haloalkane dehalogenase DhlA from Xanthobacter autotrophicus GJ10 (13). Experimental data for the following compounds were extracted from the literature (65): 1-chlorobutane, 1-chlorohexane, 1-bromobutane, 1-bromohexane, 1,1-dichloromethane, 1,2-dichloroethane, 1,1-dibromomethane, 1,2-dibromoethane, 1,2-dichloropropane, 1,2-dibromopropane, 2-chloroethanol, 2-bromoethanol, epichlorohydrin, epibromohydrin, 2-chloroacetonitrile, 2-bromoacetonitrile, 2-chloroacetamide, and 2-bromoacetamide. The values of apparent dissociation constants (Km) varied by three orders of magnitude. The substrate molecules were positioned in the active site of DhlA in such a way that their C–X bonds aligned with the corresponding bond as found in the experimental structure of 1,2-dichloroethane in the Michaelis–Menten complex with DhlA (43). Manually prepared enzyme–substrate complexes were energy minimized, and van der Waals and electrostatic interaction energies between the protein and the substrates were calculated and decomposed on a per-residue basis using the program AMBER 5.0 (66). The data matrix composed of these intermolecular interaction energies, together with the desolvation energies calculated using an electrostatic continuum method, was correlated with Km values using the PLS method. A four-component model explained 91% (73% cross-validated) of the quantitative variance in Km (Fig. 4A). The first dimension mainly projected out the electrostatic term of Asp124, which contributes substantially to the energy variance but contributes little to the correlation with Km. Asp124 is the nucleophile that initiates the dehalogenation reaction and is in very close contact with the electrophilic carbon of each substrate (Fig. 4B).
Figure 4 Plots of observed vs. predicted Km values and the models of Michaelis complexes for the structure-based model of DhlA (A,B), the docking-based model of DhlA (C,D), and the docking-based model of LinB (E,F). The nucleophile (Asp) and the halide-stabilizing residue (Trp) are shown in stick representation.
Analysis of the second, third, and fourth principal components showed that only a few energy variables, involving only a few protein residues, are important for explaining the differences in binding among substrates (1% of the enzyme's amino acids explained 91% of the variance in Km). These residues can be divided into two classes with respect to their interaction with the substrates. The first class is formed by residues separating chlorinated from brominated derivatives: Trp125, Trp175, and Pro223. These residues form the halogen binding site in the enzyme. Mutations affecting these residues should be primarily used to modulate the halogen specificity of the enzyme. Phe222 also contributes to the separation of chlorinated derivatives from brominated derivatives, together with Leu179 (Fig. 5). The second set of residues discriminates substrates by their interactions with the substrate alkyl chains. These are mainly Phe172, Phe222, and Phe164, with a contribution from Asp124 as well. Mutations affecting these residues can be used to tune the activity of the enzyme for different chain specificity. α-Helix 4 has the largest concentration of residues involved in explaining the Km differences: Phe172, Trp175, Lys176, and Leu179 (see Fig. 3 for the position of α-helix 4). This finding is in good agreement with experimental observations by Priest et al. (27), who isolated 12 in vivo mutants of DhlA with improved activity toward 1-chlorohexane; 9 of them carried modifications in α-helix 4 or its close surroundings. Priest et al. suggested that α-helix 4 is critical for the specificity of DhlA.
Figure 5 Stereo view of the active site of haloalkane dehalogenase DhlA with bound ligands. Residues separating chlorinated from brominated derivatives are shown as dark sticks: Trp125, Trp175, Leu179, Phe222, and Pro223. Residues separating substrates according to the size and shape of their carbon chains are shown as light sticks: Phe164 and Phe172. The van der Waals surface of the protein atoms in direct contact with the halogen atom is represented by dots.
The applicability of the COMBINE models to predictions was validated using two mutants of DhlA for which the crystal structures had been determined (33,35). Four substrate molecules with available experimental binding constants were modeled in the active sites of the mutant proteins and their Km values were predicted using the COMBINE model. The trends in changes of binding affinity due to mutation were predicted correctly without exception (13).
The main disadvantage of the methodology described above is the need for at least one experimental structure of an enzyme–substrate complex and the assumption that all substrates bind to the active site in the same mode. There are probably many enzymes for which the structural information on the enzyme–substrate complex is missing or which bind their substrates in different orientations, e.g., broad-specificity enzymes. An additional study was therefore conducted with DhlA in which all substrate molecules were automatically and independently positioned inside the active site using a computational method. The remaining part of the COMBINE analysis procedure was the same as described above. The automated molecular docking program AutoDock 3.0 (67) was used for positioning 18 halogenated substrates into the active site of DhlA.
The docking calculations provided suitable orientations for 15 out of these 18 substrate molecules, as no suitable orientations were found for dihaloacetamides and 2-bromoacetonitrile, i.e., they could not be docked with an orientation close to that necessary for catalysis. Multiple orientations were found for several substrates: 1,2-dibromopropane, halobutanes, and halohexanes. The orientations for the subsequent COMBINE analysis were selected using quantum mechanical calculations, which discriminated between binding modes on the basis of their suitability for the ensuing SN2 dehalogenation reaction. Dehalogenation reactions were simulated inside a reduced model of the active site of DhlA composed of 20 amino acids (Fig. 6) using the semi-empirical quantum mechanics program MOPAC 6.0 (68) interfaced by TRITON 2.0 (69). The selected orientations were further optimized by energy minimization and were found to be in very good agreement with the expected reaction mechanism of DhlA (Fig. 4D). Furthermore, bound substrates resembled the reactive conformation of 1,2-dichloroethane described by Lau et al. (62).
Figure 6 Three-dimensional model of the active site of haloalkane dehalogenase DhlA used for quantum mechanical calculations as displayed in the main window of the TRITON program. This program is used for the preparation of the input data for calculation of reaction coordinates, for monitoring of the progress of calculation, and for analysis of output data. The software is freely available at http://ncbr.chemi.muni.cz/triton/triton.html.
The goodness of the fit to the experimental data in the COMBINE model constructed from these selected docked orientations was comparable to that previously obtained with the structure-based model (compare Fig. 4A and C). The model explained 96% (67% cross-validated) of the quantitative variance in Km; two outliers (dihalopropanes) had to be removed from the model. The composition of the latent variables extracted and the importance of amino acid residues for explaining Km values, however, were similar in both models, leading to a similar biochemical interpretation. Interestingly, the automatic docking-based model employed more electrostatic contributions than the structure-based alignment model. The largest difference in van der Waals interactions was noted for the residues in direct contact with the halogenated hexanes (Phe222 and Leu263) due to the different orientations and conformations adopted by these long substrates in the small active site.
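The paired figures quoted for these models, such as 96% (67% cross-validated), distinguish the variance explained by a model fitted to all data from the variance explained when each compound is predicted by a model that has not seen it. The sketch below illustrates that distinction. It assumes leave-one-out cross-validation (the text does not state which scheme was used), replaces PLS with ordinary least squares on a single hypothetical descriptor to stay short, and uses made-up energies and log Km values.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y):
    """Fraction of the variance in y explained by a model fitted to all data."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def q_squared_loo(x, y):
    """Leave-one-out counterpart: each point is predicted by a model
    fitted to the remaining points only."""
    my = sum(y) / len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot

# Hypothetical interaction energies (kcal/mol) and log Km values for six substrates
energy = [-6.0, -5.0, -4.5, -3.0, -2.5, -1.0]
log_km = [-3.1, -2.4, -2.6, -1.5, -1.7, -0.6]

r2 = r_squared(energy, log_km)
q2 = q_squared_loo(energy, log_km)
print(f"fitted R2 = {r2:.2f}, cross-validated Q2 = {q2:.2f}")
```

A large gap between the two values, as in the docking-based DhlA model, suggests that part of the fit depends on individual compounds, which is one reason outliers are examined separately.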
6 COMBINE MODEL FOR THE HALOALKANE DEHALOGENASE LinB
Haloalkane dehalogenase LinB from Sphingomonas paucimobilis UT26 belongs to the same protein family as DhlA. The two proteins differ in both their structures and their catalytic properties. The catalytic triad of LinB is composed of Asp108–His272–Glu132 (38), while the catalytic triad of DhlA consists of Asp124–His289–Asp260 (45). The catalytic acid is positioned after β-strand 6 in LinB (Fig. 3D) and after β-strand 7 in DhlA (Fig. 3C). Bound substrate, transition states, and product structures are primarily stabilized by hydrogen bonds from the Trp109–Asn38 pair in LinB and the Trp125–Trp175 pair in DhlA. The active site of LinB is 2.5 times larger than the active site of DhlA and is less buried inside the protein core (70). There are at least three tunnels leading to the active site of LinB, but only one tunnel in DhlA (63). LinB shows broader substrate specificity than DhlA, i.e., it is more active toward larger and β-substituted haloalkanes, and therefore it should be more suitable for the design of efficient catalysts for the target compounds carrying a halogen in the β-position. Currently, there is no 3-D structure of a Michaelis complex for LinB. Furthermore, it is not safe to assume that all substrates bind to the large active site in a similar way. An automated docking-based methodology was therefore used for the construction of a COMBINE model for LinB. Experimental data (Km values) were determined for 25 substrates: 1-chloropropane, 1-chlorobutane, 1-chlorohexane, 1-chloroheptane, 1-chlorooctane, 1-bromopropane, 1-bromobutane, 1-bromohexane, 1-iodopropane, 1-iodohexane, 1,3-dichloropropane, 1,5-dichloropentane, 1,2-dibromoethane, 1,3-dibromopropane, 1-bromo-3-chloropropane, 1,2-dibromopropane, 2-bromo-1-chloropropane, 1-bromo-2-methylpropane, bis(2-chloroethyl)ether, chlorocyclohexane, bromocyclohexane, 4-bromobutyronitrile, 3-chloro-2-methylpropene, 3-chloro-2-(chloromethyl)-1-propene, and 2,3-dichloropropene.
A preliminary model, consisting of only one principal component, explained 91% (87% cross-validated) of the quantitative variability in Km values (Fig. 4E). Two outliers, bis(2-chloroethyl)ether and chlorocyclohexane, had to be removed from the model (Fig. 4F). Extreme kcat/Km values were repeatedly measured with these substrates. The model explained the variability in Km values resulting from the different lengths of the substrate molecules but could not deal properly with the variability originating from the different halogens. In DhlA, halogen substituents are tightly bound between two opposing tryptophans and the COMBINE model constructed for this protein could distinguish between chlorinated and brominated substrates. More research is needed to refine the model for LinB, e.g., by investigating the contribution of electrostatic desolvation energies, the effects of the energy minimization on the conformation of substrates in the Michaelis complexes, or the effects of explicit inclusion of water molecules in the Michaelis complexes. In fact, active exchange of water molecules between the active site of LinB and the bulk solvent was observed in a nanosecond-scale molecular dynamics simulation (63). The lesson learned so far from the comparison of the DhlA and LinB COMBINE models is that exactly the same methodology for generating structures of the complexes cannot necessarily be applied, even to closely related proteins; the modeling protocol must be adjusted with respect to the proteins' distinguishing structural and biochemical features.
7 CONCLUSION
COMBINE analysis quantitatively explores macromolecule–ligand interactions on a residue basis and provides quantitative information about the importance of every residue in a macromolecule for the binding of different substrates. COMBINE analysis identified a number of specificity-determining amino acid residues in the haloalkane dehalogenase DhlA. Trp125, Trp175, Leu179, Phe222, and Pro223 are important in distinguishing chlorinated and brominated derivatives. Mutations affecting these residues should modulate the halogen specificity of the enzyme. A second set of residues (Phe164, Phe172, and Phe222) was found to discriminate substrates by their interactions with the carbon chain. The predictive ability of the COMBINE model derived for DhlA was confirmed with two site-directed point mutants and four novel substrates. Modeling the specificity of the haloalkane dehalogenase LinB using the same methodology is more difficult because of its larger active site and the less specific binding of its substrates. Our current COMBINE model differentiates between molecules of different chain length but cannot properly distinguish substrates bearing a different halogen atom. To achieve this goal we are currently tailoring our modeling protocol for the LinB enzyme.
ACKNOWLEDGMENTS
This work was supported by the NATO Linkage Grant MTECH. LG. 974701 and grants from the Czech Ministry of Education J07/98:143100005 and ME551 (JD).
92
Damborsky´ et al.
REFERENCES
1. AR Ortiz, MT Pisabarro, F Gago, RC Wade. Prediction of drug binding affinities by comparative binding energy analysis. J Med Chem 38:2681–2691, 1995.
2. RC Wade. Derivation of QSARs using 3D structural models of protein–ligand complexes by COMBINE analysis. In: H-D Holtje, W Sippl, eds. Rational Approaches to Drug Design: 13th European Symposium on Quantitative Structure–Activity Relationships. Barcelona: Prous Science, 2001, pp 23–28.
3. RD Cramer, DE Patterson, JD Bunce. Comparative Molecular Field Analysis (CoMFA): 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967, 1988.
4. C Perez, M Pastor, AR Ortiz, F Gago. Comparative binding energy analysis of HIV-1 protease inhibitors: incorporation of solvent effects and validation as a powerful tool in receptor-based drug design. J Med Chem 41:836–852, 1998.
5. S Wold, E Johansson, M Cocchi. PLS: Partial least-squares projections to latent structures. In: H Kubinyi, ed. 3D QSAR in Drug Design: Theory, Methods and Application. Leiden: ESCOM, 1993, pp 523–550.
6. M Pastor, C Perez, F Gago. Simulation of alternative binding modes in a structure-based QSAR study of HIV-1 protease inhibitors. J Mol Graph Model 15:364–371, 1997.
7. S Tomic, L Nilsson, RC Wade. Nuclear receptor-DNA binding specificity: a COMBINE and Free-Wilson QSAR analysis. J Med Chem 43:1780–1792, 2000.
8. JJ Lozano, M Pastor, G Cruciani, K Gaedt, NB Centeno, F Gago, F Sanz. 3D-QSAR methods on the basis of ligand–receptor complexes. Application of COMBINE and GRID/GOLPE methodologies to a series of CYP1A2 ligands. J Comput-Aid Mol Des 14:341–353, 2000.
9. C Cuevas, M Pastor, C Perez, F Gago. Comparative binding energy (COMBINE) analysis of human neutrophil elastase inhibition by pyridone-containing trifluormethylketones. Comb Chem High Throughput Screen 4:627–642, 2001.
10. T Wang, RC Wade. Comparative binding energy (COMBINE) analysis of influenza neuraminidase–inhibitor complexes. J Med Chem 44:961–971, 2001.
11. Kuchner, FH Arnold. Directed evolution of enzyme catalysts. Trends Biotechnol 15:523–530, 1997.
12. FH Arnold. Design by directed evolution. Acc Chem Res 31:125–131, 1998.
13. J Kmunicek, S Luengo, F Gago, AR Ortiz, RC Wade, J Damborsky. Comparative binding energy analysis of the substrate specificity of haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. Biochemistry 40:8905–8917, 2001.
14. DB Janssen, F Pries, JR Van der Ploeg. Genetics and biochemistry of dehalogenating enzymes. Ann Rev Microbiol 48:163–191, 1994.
15. DB Janssen, JP Schanstra. Engineering proteins for environmental applications. Curr Opin Biotech 5:253–259, 1994.
16. G Stucki, M Thuer. Experiences of a large-scale application of 1,2-dichloroethane degrading microorganisms for groundwater treatment. Environ Sci Technol 29:2339–2345, 1995.
17. PE Swanson. Dehalogenases applied to industrial-scale biocatalysis. Curr Opin Biotechnol 10:365–369, 1999.
18. S Keuning, DB Janssen, B Witholt. Purification and characterization of hydrolytic haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. J Bacteriol 163:635–639, 1985.
19. T Yokota, T Omori, T Kodama. Purification and properties of haloalkane dehalogenase from Corynebacterium sp. strain m15-3. J Bacteriol 169:4049–4054, 1987.
20. R Scholtz, T Leisinger, F Suter, AM Cook. Characterization of 1-chlorohexane halidohydrolase, a dehalogenase of wide substrate range from an Arthrobacter sp. J Bacteriol 169:5016–5021, 1987.
21. DB Janssen, J Gerritse, J Brackman, C Kalk, D Jager, B Witholt. Purification and characterization of a bacterial dehalogenase with activity toward halogenated alkanes, alcohols and ethers. Eur J Biochem 171:67–92, 1988.
22. PJ Sallis, SJ Armfield, AT Bull, DJ Hardman. Isolation and characterization of a haloalkane halidohydrolase from Rhodococcus erythropolis Y2. J Gen Microbiol 136:115–120, 1990.
23. Y Nagata, K Miyauchi, J Damborsky, K Manova, A Ansorgova, M Takagi. Purification and characterization of haloalkane dehalogenase of a new substrate class from a γ-hexachlorocyclohexane-degrading bacterium, Sphingomonas paucimobilis UT26. Appl Environ Microbiol 63:3707–3710, 1997.
24. GJ Poelarends, M Wilkens, MJ Larkin, JD van Elsas, DB Janssen. Degradation of 1,3-dichloropropene by Pseudomonas cichorii 170. Appl Environ Microbiol 64:2931–2936, 1998.
25. A Jesenska, M Bartos, V Czernekova, I Rychlik, I Pavlik, J Damborsky. Cloning and expression of haloalkane dehalogenase gene dhmA from Mycobacterium avium N85 and preliminary characterization of DhmA. Appl Environ Microbiol 68:3724–3730, 2002.
26. F Pries, J Kingma, M Pentenga, G Van Pouderoyen, CM Jeronimus-Stratingh, AP Bruins, DB Janssen. Site-directed mutagenesis and oxygen isotope incorporation studies of the nucleophilic aspartate of haloalkane dehalogenase. Biochemistry 33:1242–1247, 1994.
27. F Pries, AJ Van den Wijngaard, R Bos, M Pentenga, DB Janssen. The role of spontaneous cap domain mutations in haloalkane dehalogenase specificity and evolution. J Biol Chem 269:17490–17494, 1994.
28. F Pries, J Kingma, DB Janssen. Activation of an Asp-124→Asn mutant of haloalkane dehalogenase by hydrolytic deamidation of asparagine. FEBS Lett 358:171–174, 1995.
29. F Pries, J Kingma, GH Krooshof, CM Jeronimus-Stratingh, AP Bruins, DB Janssen. Histidine 289 is essential for hydrolysis of the alkyl-enzyme intermediate of haloalkane dehalogenase. J Biol Chem 270:10405–10411, 1995.
30. C Kennes, F Pries, GH Krooshof, E Bokma, J Kingma, DB Janssen. Replacement of tryptophan residues in haloalkane dehalogenase reduces halide binding and catalytic activity. Eur J Biochem 228:403–407, 1995.
31. JP Schanstra, IS Ridder, GJ Heimeriks, R Rink, GJ Poelarends, KH Kalk, BW Dijkstra, DB Janssen. Kinetic characterization and X-ray structure of a mutant of haloalkane dehalogenase with higher catalytic activity and modified substrate range. Biochemistry 35:13186–13195, 1996.
32. JP Schanstra, A Ridder, J Kingma, DB Janssen. Influence of mutations of Val226 on the catalytic rate of haloalkane dehalogenase. Protein Eng 10:53–61, 1997.
33. GH Krooshof, EM Kwant, J Damborsky, J Koca, DB Janssen. Repositioning the catalytic triad acid of haloalkane dehalogenase: effects on activity and kinetics. Biochemistry 36:9571–9580, 1997.
34. P Holloway, KL Knoke, JT Trevors, H Lee. Alteration of the substrate range of haloalkane dehalogenase by site-directed mutagenesis. Biotechnol Bioeng 59:520–523, 1998.
35. GH Krooshof, IS Ridder, AWJW Tepper, GJ Vos, HJ Rozeboom, KH Kalk, BW Dijkstra, DB Janssen. Kinetic analysis and X-ray structure of haloalkane dehalogenase with a modified halide-binding site. Biochemistry 37:15013–15023, 1998.
36. K Hynkova, Y Nagata, M Takagi, J Damborsky. Identification of the catalytic triad in the haloalkane dehalogenase from Sphingomonas paucimobilis UT26. FEBS Lett 446:177–181, 1999.
37. JF Schindler, PA Naranjo, DA Honaberger, C-H Chang, JR Brainard, LA Vanderberg, CJ Unkefer. Haloalkane dehalogenases: steady-state kinetics and halide inhibition. Biochemistry 38:5772–5778, 1999.
38. Y Nagata, K Hynkova, J Damborsky, M Takagi. Construction and characterization of histidine-tagged haloalkane dehalogenase (LinB) of a new substrate class from a γ-hexachlorocyclohexane-degrading bacterium, Sphingomonas paucimobilis UT26. Protein Expr Purif 17:299–304, 1999.
39. S Marvanova, Y Nagata, M Wimmerova, J Sykorova, K Hynkova, J Damborsky. Biochemical characterization of broad-specificity enzymes using multivariate experimental design and a colorimetric microplate assay: characterization of the haloalkane dehalogenase mutants. J Microbiol Methods 44:149–157, 2001.
40. Y Nagata, Z Prokop, S Marvanova, J Sykorova, M Monincova, M Tsuda, J Damborsky. Re-construction of mycobacterial dehalogenase Rv2579 by cumulative mutagenesis of haloalkane dehalogenase LinB. Appl Environ Microbiol 69:2349–2355, 2003.
41. R Chaloupkova, J Sykorova, Z Prokop, A Jesenska, M Monincova, M Pavlova, M Tsuda, Y Nagata, J Damborsky. Modification of activity and specificity of haloalkane dehalogenase from Sphingomonas paucimobilis UT26 by engineering of its entrance tunnel. Submitted.
42. SM Franken, HJ Rozeboom, KH Kalk, BW Dijkstra. Crystal structure of haloalkane dehalogenase: an enzyme to detoxify halogenated alkanes. EMBO J 10:1297–1302, 1991.
43. KHG Verschueren, F Seljee, HJ Rozeboom, KH Kalk, BW Dijkstra. Crystallographic analysis of the catalytic mechanism of haloalkane dehalogenase. Nature 363:693–698, 1993.
44. KHG Verschueren, J Kingma, HJ Rozeboom, KH Kalk, DB Janssen, BW Dijkstra. Crystallographic and fluorescence studies of the interaction of haloalkane dehalogenase with halide ions. Studies with halide compounds reveal a halide binding site in the active site. Biochemistry 32:9031–9037, 1993.
45. KHG Verschueren, SM Franken, HJ Rozeboom, KH Kalk, BW Dijkstra. Noncovalent binding of the heavy atom compound [Au(CN)2] at the halide binding site of haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. FEBS Lett 323:267–270, 1993.
46. KHG Verschueren, SM Franken, HJ Rozeboom, KH Kalk, BW Dijkstra. Refined X-ray structures of haloalkane dehalogenase at pH 6.2 and pH 8.2 and implications for the reaction mechanism. J Mol Biol 232:856–872, 1993.
47. HJ Rozeboom, J Kingma, DB Janssen, BW Dijkstra. Crystallization of haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. J Mol Biol 200:611–612, 1988.
48. IS Ridder, HJ Rozeboom, BW Dijkstra. Haloalkane dehalogenase from Xanthobacter autotrophicus GJ10 refined at 1.15 Å resolution. Biol Crystallogr 55:1273–1290, 1999.
49. J Newman, TS Peat, R Richard, L Kan, PE Swanson, JA Affholter, IH Holmes, JF Schindler, CJ Unkefer, TC Terwilliger. Haloalkane dehalogenase: structure of a Rhodococcus enzyme. Biochemistry 38:16105–16114, 1999.
50. J Marek, J Vevodova, I Kuta-Smatanova, Y Nagata, LA Svensson, J Newman, M Takagi, J Damborsky. Crystal structure of the haloalkane dehalogenase from Sphingomonas paucimobilis UT26. Biochemistry 39:14082–14086, 2000.
51. AJ Oakley, Z Prokop, M Bohac, J Kmunicek, T Jedlicka, M Monincova, I Kuta-Smatanova, Y Nagata, J Damborsky, MCJ Wilce. Exploring the structure and activity of haloalkane dehalogenase from Sphingomonas paucimobilis UT26: evidence for product and water mediated inhibition. Biochemistry 41:4847–4855, 2002.
52. MG Pikkemaat, IS Ridder, HJ Rozeboom, KH Kalk, BW Dijkstra, DB Janssen. Crystallographic and kinetic evidence of a collision complex formed during halide import in haloalkane dehalogenase. Biochemistry 38:12052–12061, 1999.
53. J Damborsky, M Kuty, M Nemec, J Koca. Molecular modelling to understand the mechanisms of microbial degradation - Application to hydrolytic dehalogenation with haloalkane dehalogenases. In: F Chen, G Schüürmann, eds. Quantitative Structure–Activity Relationships in Environmental Sciences-VII. Pensacola: SETAC Press, 1997, pp 5–20.
54. J Damborsky. Quantitative structure–function relationships of the single-point mutants of haloalkane dehalogenase: a multivariate approach. Quant Struct-Act Relat 16:126–135, 1997.
55. J Damborsky, M Kuty, M Nemec, J Koca. A molecular modeling study of the catalytic mechanism of haloalkane dehalogenase: 1. Quantum chemical study of the first reaction step. J Chem Inf Comput Sci 37:562–568, 1997.
56. FC Lightstone, Y-J Zheng, AH Maulitz, TC Bruice. Non-enzymatic and enzymatic hydrolysis of alkyl halides: a haloalkane dehalogenation enzyme evolved to stabilize the gas-phase transition state of an SN2 displacement reaction. Proc Natl Acad Sci USA 94:8417–8420, 1997.
57. AH Maulitz, FC Lightstone, YJ Zheng, TC Bruice. Nonenzymatic and enzymatic hydrolysis of alkyl halides: a theoretical study of the S(N)2 reactions of acetate and hydroxide ions with alkyl chlorides. Proc Natl Acad Sci USA 94:6591–6595, 1997.
58. J Damborsky, M Bohac, M Prokop, M Kuty, J Koca. Computational site-directed mutagenesis of haloalkane dehalogenase in position 172. Protein Eng 11:901–907, 1998.
59. M Kuty, J Damborsky, M Prokop, J Koca. A molecular modeling study of the catalytic mechanism of haloalkane dehalogenase: 2. Quantum chemical study of complete reaction mechanism. J Chem Inf Comput Sci 38:736–741, 1998.
60. M Prokop, J Damborsky, J Koca. TRITON: in silico construction of protein mutants and prediction of their activities. Bioinformatics 16:845–846, 2000.
61. FC Lightstone, YJ Zheng, TC Bruice. Molecular dynamics simulations of ground and transition states for the S(N)2 displacement of Cl from 1,2-dichloroethane at the active site of Xanthobacter autotrophicus haloalkane dehalogenase. J Am Chem Soc 120:5611–5621, 1998.
62. EY Lau, K Kahn, P Bash, TC Bruice. The importance of reactant positioning in enzyme catalysis: a hybrid quantum mechanics/molecular mechanics study of a haloalkane dehalogenase. Proc Natl Acad Sci USA 97:9937–9942, 2000.
63. M Otyepka, J Damborsky. Functionally relevant motions of haloalkane dehalogenases occur in the specificity-modulating cap domains. Protein Sci 11:1206–1217, 2002.
64. KA Gray, TH Richardson, K Kretz, JM Short, F Bartnek, R Knowles, L Kan, PE Swanson, DE Robertson. Rapid evolution of reversible denaturation and elevated melting temperature in a microbial haloalkane dehalogenase. Adv Synth Catal 343:607–617, 2001.
65. JP Schanstra, J Kingma, DB Janssen. Specificity and kinetics of haloalkane dehalogenase. J Biol Chem 271:14747–14753, 1996.
66. DA Case, DA Pearlman, JW Caldwell, TE Cheatham III, WS Ross, CL Simmerling, TA Darden, KM Merz, RV Stanton, AL Cheng, JJ Vincent, M Crowley, DM Ferguson, RJ Radmer, GL Seibel, UC Singh, PK Weiner, PA Kollman. AMBER 5.0. San Francisco: University of California, 1997.
67. GM Morris, DS Goodsell, RS Halliday, R Huey, WE Hart, RK Belew, AJ Olson. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19:1639–1662, 1998.
68. JJP Stewart. MOPAC: A semiempirical molecular-orbital program. J Comput-Aid Mol Des 4:1–45, 1990.
69. J Damborsky, M Prokop, J Koca. TRITON: graphic software for rational engineering of enzymes. Trends Biochem Sci 26:71–73, 2001.
70. J Damborsky, J Koca. Analysis of the reaction mechanism and substrate specificity of haloalkane dehalogenases by sequential and structural comparisons. Protein Eng 12:989–998, 1999.
6 Computer Simulations: A Tool for Investigating the Function of Complex Biological Macromolecules
Günther H. Peters
Technical University of Denmark, Lyngby, Denmark
1 COMPUTER SIMULATIONS
Computer simulations have become one of the principal tools in theoretical studies of physical, chemical, and biological systems, and with the fast advancement of computational resources, simulation techniques have emerged as indispensable scientific and engineering tools. In particular, two simulation techniques established in the 1960s are now commonly applied to many aspects of medicinal chemistry and biophysics, where computer analyses and simulations can augment or explain experimental observations. These techniques are the Monte Carlo algorithms (1–9) and the molecular dynamics simulation techniques (10–32).

Monte Carlo calculations represent an entirely different type of simulation from those based on molecular dynamics. The name “Monte Carlo” comes from the random-chance nature of the simulations, akin to the games of chance at Monaco's gambling resort. Monte Carlo simulations are stochastic and use random numbers to sample from a probability distribution, usually the classical Boltzmann distribution, to obtain, for instance, thermodynamic properties, minimum-energy structures, and/or rate coefficients, or to sample conformers as part of a global conformer search algorithm. The molecular dynamics (MD) simulation technique, on the other hand, is a deterministic method, where the time evolution of a system is determined by Newton's equations of motion (i.e., positions, velocities, and accelerations of atoms). Hence, MD simulations provide information not only on a molecular level (i.e., in space) but also on the dynamics of a system (i.e., behavior in time). Experiments often do not provide the molecular information available from simulations. Therefore, as shown schematically in Fig. 1, theoreticians, computational scientists, and experimentalists can have a synergistic interaction, leading to new insights into complex biological systems. Chemical and physical intuition and experience, of course, will always be necessary, but computers add depth and new dimensions in understanding complex biological systems.

Figure 1 Synergistic interaction between experimentalists, computational chemists, and theoreticians. [Flow diagram; box labels: biological system; modeling approach; developing model system; experiment; computer simulations; theory and approximation; experimental results; "exact" model results; theoretical predictions; comparison/adjustment; insight on molecular level.]

2 MONTE CARLO METHOD
Numerical methods that are known as Monte Carlo (MC) methods can be broadly described as statistical simulation methods. Here the term statistical simulation is used in a general sense to include any method that utilizes sequences of random numbers to perform a simulation. The name “Monte Carlo” was coined by Metropolis during the Manhattan Project of World War II, because of the similarity of statistical simulations to games of chance, and because Monte Carlo in Monaco was a center for gambling. Monte Carlo methods have been used for centuries, but only in the past several decades has this technique been applied to complex problems. One of the first MC calculations was performed by Enrico Fermi in the 1930s. He used MC in the calculation of neutron diffusion and later designed the Fermiac, a mechanical Monte Carlo device used in the calculation of criticality in nuclear reactors. In the 1940s, von Neumann developed a formal foundation for the MC method by establishing the mathematical basis for probability density functions, inverse cumulative distribution functions, and pseudorandom number generators. In many applications of MC, the physical process is simulated directly without the need to write down the differential equations that describe the behavior of the system. This is in contrast to conventional numerical discretization methods, which typically are applied to ordinary or partial differential equations that describe some underlying physical or mathematical system. The only requirement for utilizing MC is that the physical (or mathematical) system can be described by a probability density function. Once the distribution function is known, MC simulation can proceed by random sampling from that distribution. By the 1960s, the method was used in a variety of engineering fields. However, at that time, calculations were limited by the computational power available.
Many complex problems remained intractable throughout the 1970s. With the development of high-speed supercomputers, and in particular of parallel algorithms with much higher execution rates, Monte Carlo applications have received increased attention. The Monte Carlo technique is now used routinely in many diverse fields including nuclear reactor design, traffic flow, economics, environmental problems (e.g., air pollution), and biological systems. In the latter area, simulations are carried out to understand the collective behavior of many-particle systems leading, for instance, to protein folding, phase separation, or spatial segregation in membranes (33–36).
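The idea of random sampling from a known probability density function, using a pseudorandom number generator and an inverse cumulative distribution function, can be illustrated with a minimal Python sketch. The exponential density, the function names, and the sample size below are assumptions chosen for illustration only; they do not come from the chapter.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one sample from p(x) = rate * exp(-rate * x) by inverting its CDF.

    The CDF is F(x) = 1 - exp(-rate * x), so x = -ln(1 - u) / rate
    for a uniform random number u in [0, 1).
    """
    u = rng.random()
    return -math.log(1.0 - u) / rate


def estimate_mean(rate, n_samples, seed=0):
    """Monte Carlo estimate of the mean of the exponential density."""
    rng = random.Random(seed)  # seeded pseudorandom number generator
    total = sum(sample_exponential(rate, rng) for _ in range(n_samples))
    return total / n_samples


if __name__ == "__main__":
    # The exact mean is 1 / rate; the estimate approaches it as n_samples grows.
    print(estimate_mean(rate=2.0, n_samples=100_000))
```

No differential equation is written down here: the average is obtained purely by sampling, and its statistical error shrinks roughly as the inverse square root of the number of samples.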
A major advance in Monte Carlo simulations was made by Metropolis and co-workers (37), who developed a new algorithm in which the new conformation is generated from the current one by a “move” accepted with the probability $P_{acc}$,

$$P_{acc} = 1, \qquad \Delta E < 0 \tag{1}$$

$$P_{acc} = \min\left(1, \exp(-\Delta E / k_B T)\right), \qquad \Delta E > 0 \tag{2}$$

which depends on the corresponding change in energy, $\Delta E$, and on the externally adjustable parameter $k_B T$, where $k_B$ is the Boltzmann constant and $T$ is the temperature (38). As the Metropolis algorithm satisfies the “detailed balance” condition and each configuration can be reached in a finite number of steps (ergodicity), the resulting Markov process (chain) will converge to the canonical distribution. That means the probability (frequency) that a particular conformation occurs will be proportional to its Boltzmann factor $\exp(-E / k_B T)$. Thermodynamic quantities can then be simply calculated by computing averages over the sampled conformations. In many practical applications, one can predict the statistical error (the “variance”) in the average result, and hence estimate the number of Monte Carlo trials needed to achieve a given error and convergence of the thermodynamic quantities of interest. The MC method is also an efficient method for sampling the configurational space, which frequently is explored in protein folding problems. Because MC is stochastic in nature and the method is based on probability distributions, there is no dynamic component involved in the simulations (i.e., time-dependent quantities such as transport coefficients cannot be calculated from MC simulations). Molecular dynamics simulations, in contrast, are deterministic and follow the time evolution of particles in phase space. In other words, MC explores the phase space by jumping from one configuration to another configuration weighted by the Boltzmann factor, while MD follows the trajectory (time evolution) of single atoms through the phase space. An obvious advantage of molecular dynamics over Monte Carlo is that MD simulations follow the classical trajectory of the system, whereas the dynamics in MC is artificial. MD is the method of choice when investigating the kinetics of a given process.
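The Metropolis acceptance rule of Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in any of the cited studies: the one-dimensional double-well energy function, the step size, and all function names are hypothetical choices made for this example.

```python
import math
import random

K_B = 0.0083145  # Boltzmann constant in kJ/(mol K)


def metropolis_accept(delta_e, temperature, rng):
    """Return True if a move with energy change delta_e (kJ/mol) is accepted."""
    if delta_e < 0.0:
        return True  # Eq. (1): downhill moves are always accepted
    # Eq. (2): uphill moves are accepted with probability exp(-dE / kB T)
    return rng.random() < math.exp(-delta_e / (K_B * temperature))


def double_well_energy(x):
    """Hypothetical 1D energy surface (kJ/mol) with minima at x = -1 and x = +1."""
    return 5.0 * (x * x - 1.0) ** 2


def sample_double_well(n_steps, temperature=300.0, step=0.5, seed=1):
    """Generate a Markov chain of positions with the Metropolis algorithm."""
    rng = random.Random(seed)
    x = 1.0
    e = double_well_energy(x)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # trial "move"
        e_new = double_well_energy(x_new)
        if metropolis_accept(e_new - e, temperature, rng):
            x, e = x_new, e_new
        samples.append(x)  # a rejected move counts the old state again
    return samples


if __name__ == "__main__":
    chain = sample_double_well(50_000)
    mean_energy = sum(double_well_energy(x) for x in chain) / len(chain)
    print(f"average energy: {mean_energy:.2f} kJ/mol")
```

Averaging a property over the chain, as in the last lines, corresponds to the computation of thermodynamic quantities as averages over the sampled conformations. Note that the order of the samples carries no physical time information.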
One of the major shortcomings of MD simulations is that as the complexity of the system increases, these simulations become computationally demanding, and in many problems only a limited part of the phase space can be explored. Furthermore, if the system moves along a rough energy surface with a relatively large number of local minima, then MD simulations tend to get trapped in these energy minima and as a consequence only a limited part of the phase space is sampled. The problem of multiple minima and their detection is well known in computational chemistry. In many instances, one cannot explore all possible states of a system by performing an exhaustive search of all its degrees of freedom. Here MC simulations provide an efficient method to explore the energy landscape and have been successfully applied in many biophysical areas such as structure-based drug design, docking of molecules to a receptor, and alignment of molecules by optimization of molecular similarity indices.

3 MOLECULAR DYNAMICS METHOD
3.1 Historical Background
The molecular dynamics technique was first introduced by Alder and Wainwright in the late 1950s (32,39). The authors studied the interactions of hard spheres, resulting in many important insights into the behavior of simple liquids (39). In the early 1960s, pioneering work on the development of consistent force fields based on experimental data (such as spectroscopic data, heat of formation, structures of small molecules, quantum-mechanical information, etc.) was independently carried out by Lifson at the Weizmann Institute of Science, Scheraga at Cornell University, and Allinger at Wayne State University. This research laid the foundation for developing force field parameters for various chemical compounds by fitting computationally obtained results to experimental observations such as structure and energetics. In 1964, Rahman performed the first simulation using a realistic potential for liquid argon (40). Rahman and Stillinger later carried out the first molecular dynamics simulation of a polar molecule (liquid water) (41,42). The first protein simulation was published in 1977 by McCammon et al., who conducted simulations of the bovine pancreatic trypsin inhibitor (BPTI) (43). Today molecular dynamics simulations are well established in the scientific community and this technique is applied to a wide range of chemical, biophysical, and medicinal problems such as enzyme catalysis, protein–protein interactions, and protein/ligand design (44). Moreover, molecular dynamics simulation techniques are also used in experimental procedures such as x-ray crystallography and NMR structure determinations. The number of simulation techniques has greatly expanded, and techniques have been developed for particular problems, including mixed quantum mechanical–classical simulations that are applied to the study, for instance, of charge transfer in enzymatic reactions.

3.2 Classical Mechanics
Molecular dynamics simulation is a method that inherently introduces the concept of time and is based on Newton's classical mechanics. Building on the work of Galileo Galilei (1564–1642), Nicolas Copernicus (1473–1543), Tycho Brahe (1546–1601), and Johannes Kepler (1571–1630), Isaac Newton (1642–1727) formulated in 1687 the second law of motion, stating that a body's acceleration, $\vec{a}$, is equal to the net force, $\vec{F}$, divided by its mass, $m$:

$$\vec{a} = \vec{F} / m \tag{3}$$
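To illustrate how Eq. (3) generates a trajectory in time, the following Python sketch integrates Newton's second law for a single particle in one dimension. The velocity Verlet scheme is used here because it is a common choice in MD programs; the chapter itself does not specify an integrator, and the harmonic force, the reduced units, and the function names are assumptions made for this example.

```python
import math


def harmonic_force(x, k=1.0):
    """Hypothetical restoring force F = -k x of a one-dimensional spring."""
    return -k * x


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v using a = F / m (Eq. 3).

    Returns the trajectory as a list of (position, velocity) pairs,
    including the initial state.
    """
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance the velocity
        a = a_new
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    # Reduced units: mass = 1, k = 1, so the exact solution is x(t) = cos(t).
    traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                           mass=1.0, dt=0.01, n_steps=1000)
    x_end, v_end = traj[-1]
    print(f"x(10) = {x_end:.4f}, exact = {math.cos(10.0):.4f}")
```

Unlike a Monte Carlo chain, consecutive entries of the trajectory are separated by a physical time step `dt`, which is why time-dependent properties can be computed from MD.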
Throughout the text, an arrow above a variable indicates that the quantity is a vector. In the late 1800s and early 1900s a number of experimental observations indicated that Newtonian physics has its limitations because it only considers the nuclear motion of many-body systems. It became increasingly clear that electromagnetic radiation had particle-like properties in addition to its wave-like properties such as interference and diffraction. This initiated intense research in that area and several major breakthroughs were achieved, as summarized in Table 1. Planck demonstrated in 1900 that electromagnetic radiation was emitted and absorbed by a black body in discrete quanta, each having an energy proportional to the frequency of radiation. Einstein invoked this concept (discrete quanta) to explain the photoelectric effect in 1905. In 1924, de Broglie asserted that matter has a dual nature, i.e., that particles can behave as waves. This led to the formulation of Schrödinger's time-dependent wave equation of matter (45–47):

$$\hat{H}\psi(\vec{r}, t) = i\hbar \frac{\partial \psi(\vec{r}, t)}{\partial t} \tag{4}$$
$\hat{H}$ is the so-called Hamiltonian operator incorporating all the relevant forces exerted on the particles of the system, and $\vec{r}$, given by $\vec{r} = r_x \hat{e}_x + r_y \hat{e}_y + r_z \hat{e}_z$, is the position of the particle in Cartesian coordinates. The solution of Eq. (4) yields discrete (quantized) values (or eigenvalues) of energy $E_n$ and, for each $E_n$, its corresponding wave function. Hence each particle is represented by a wave function $\psi$(position, time) such that the quantity $\psi\psi^*$ is the probability of finding a particle at that position at that time.

Table 1 Historical Overview in Quantum Mechanics
Scientist | Year | Achievement
Max Planck | 1900 | Explained blackbody radiation by applying the concept of discrete-energy quanta in physics.
Albert Einstein | 1905 | Treated radiation as independent particles of energy, where quantum theories of both matter and radiation were needed to describe these systems.
Niels Bohr | 1913 | Demonstrated that the frequencies of atomic spectral lines are independent of the frequencies of electronic motions within the atom.
Louis de Broglie | 1924 | Established the concept of wave-particle duality for matter.
Werner Heisenberg | 1925 | Developed matrix mechanics, a consistent (but arbitrary) quantum theory applying quantum rules to problems of atomic structure and atomic spectra.
Erwin Schrödinger | 1926 | Proved that matrix mechanics is equivalent to his own wave mechanics.
Copenhagen Interpretation | 1927 | Proposed by Heisenberg and Bohr, this interpretation of quantum mechanics was based on Born's statistical interpretation of the wave function and Dirac's more comprehensive theory of quantum mechanics.

However, the theory of quantum mechanics is by no means equivalent to Newton's laws. There are some major differences between classical and quantum mechanics, and these differences form a limitation on their exact application. In classical mechanics a particle can have any energy and any speed, whereas in quantum mechanics these quantities are quantized. As a consequence a particle in a quantum system can only have certain values for its energy and its speed (or momentum). These special values are called the energy or momentum eigenvalues of the quantum system. Associated with each eigenvalue is a special state called an eigenstate. The eigenvalues and eigenstates of a quantum system are the most important features for characterizing that system's behavior. There are no eigenvalues or eigenstates in classical mechanics. Newton's laws allow, in principle, the exact location and velocity of a particle to be determined at some future time. Quantum mechanics, on the other hand, only determines the probability for a particle to be in a certain location with a certain velocity at some future time. The probabilistic nature of quantum mechanics makes it very different from classical mechanics. Quantum mechanics incorporates what is known as the “Heisenberg Uncertainty Principle”. This principle states that the location and velocity of a quantum particle cannot both be known to infinite accuracy. If one determines the particle's location precisely, then the exact velocity is uncertain and vice versa. In practice, the level of “uncertainty” is so small that it is only noticeable when dealing with matter having atomic dimensions. Quantum mechanics permits what is called “superposition of states”. This means that a quantum particle can be in two different states at the same time, which is certainly not possible in classical mechanics. Quantum mechanics describes a system in terms of probabilities. It forces us to abandon the notion of precisely defined trajectories of atoms through time and space. In classical mechanics electronic motions are not considered, and quantum effects are generally ignored. The classical description is excellent for a wide range of systems but of course fails for reactions involving electron transport such as bond formation and cleavage, or polarization. To study these kinds of problems, quantum dynamical approaches are used, which combine quantum-mechanical calculations with classical mechanics.

3.3 Characteristic Time Scales
Many complex phenomena, which one encounters in science and technology, are consequences of collective or cooperative behavior of many interacting particles (‘‘many-body problem’’). Liquid–solid phase transitions, nematic–isotropic transitions of liquid crystal molecules, viscoelasticity of polymer melts, self-organization in biological systems, and enzymatic reactions are only a few examples for such many-body problems. An exact treatment of these systems would require a quantummechanical approach. However, quantum-mechanical computations are expensive, and at the present time, these calculations are not feasible for complex systems with processes occurring on relatively large time and length scales. As shown in Fig. 2, quantum-mechanical calculations involve time and length scales of 1012 sec and 1010 m, respectively. Systems involving larger time or length scales require other approaches such molecular dynamics, Monte Carlo, or continuum theory (Fig. 2). As mentioned previously, classical mechanics cannot be applied when, for instance, charge transfer occurs in a process. For other processes, the question arises as to which motions can be reasonably approximated by classical mechanics. In the classical mechanical description, atoms may possess any energy, and as a consequence atoms move along continuous energy surfaces. Contrarily, in the quantum-mechanical description, the energy is quantized, and the atoms can only occupy certain discrete (separated) energy levels. This ‘‘discreteness’’ of the energy landscape will be more pronounced at temperatures where the gaps between the energy levels are much larger than the thermal energy. With increasing temperature more energy levels become thermally accessible and the atoms approach the limit of classical behavior. To determine this limit, let us consider a harmonic oscillator. 
Figure 2 Time and length scales encountered in simulations.

For the harmonic oscillator, the energy levels are separated by $\Delta E = hf$, where $f$ is the frequency of the harmonic vibration and $h$ is Planck's constant. Classical behavior is approached at temperatures for which $k_BT \gg hf$,
where $k_B$ is the Boltzmann constant and $T$ the absolute temperature. At $T = 300$ K, $k_BT$ is 2.5 kJ/mol, and the frequency at which $hf = k_BT$ is 6.25 psec$^{-1}$, which indicates that classical mechanics is a good approximation for motions with a characteristic time scale of picoseconds or longer at room temperature. Depending on the size and complexity of the system, molecular dynamics simulations are usually performed on a time scale of $10^{-9}$ sec (Fig. 2). This raises the question of how well the configurational space can be sampled (48,49). The many-body problem causes correlated motion between the particles, i.e., the motion of individual particles is coupled to the motion of other particles. Each dynamic process (motion) has a characteristic time scale, amplitude, and energy range. Macromolecules in general, and proteins in particular, display a broad range of characteristic motions ranging from motions that are very fast and often localized (e.g., atomic fluctuations) to slow motions that occur on the scale of the whole molecule (e.g., motion of domains). Many of these motions have an important role in the biochemical function of proteins and might be coupled to one another. Specifically, large-scale dynamic transitions involve medium-scale motions, which in turn can involve localized motions. As summarized in Table 2, these motions are in the range of picoseconds to hours;
i.e., these motions span 20 orders of magnitude in characteristic time.

Table 2 Characteristic Motions in Proteins

Type of motions
Local motions:
• Atomic fluctuation
• Side chain motion
• Loop motions
Medium scale motions:
• Rigid-body motion (helices)
• Loop motion
• N- or C-terminal motion
Large scale motions:
• Domain motions
• Subunit motions
Global motions:
• Helix-coil transition
• Folding/unfolding
• Subunit association

Functionality examples
• Substrate recognition
• Ligand docking
• Temporal diffusion pathway

Time and amplitude scales
• $10^{-15}$–$10^{-12}$ sec (fsec–psec), 5 Å

3.4 Potential Functions
Biological systems involving macromolecules are inherently complex and normally consist of so many atoms that a complete quantum-mechanical description of these systems is not yet feasible. Here classical mechanics and the use of empirical potential energy functions are presently the only approach to study these systems. Potential energy functions (i.e., force fields) provide a reasonably good compromise between accuracy and computational efficiency. The parameters contained in the force fields are often calibrated to experimental results and quantum-mechanical calculations of small model compounds. The force fields are tested by computing physical properties that are measurable by experiment. Normally, structural data obtained from x-ray crystallography and NMR, dynamic properties obtained from spectroscopy and inelastic neutron scattering, and thermodynamic data are used in the evaluation of the accuracy of the force field (50–53). The development of a force field is an iterative process, and depending on the complexity of the system, it could require extensive optimization. Several research groups are focusing on deriving functional forms and parameters for potential energy functions that are generally applicable to biological molecules. Among the most commonly used potential energy functions are the CHARMM (http://www.scripps.edu/brooks/charmm_docs/charmm.html), GROMOS (http://www.gromacs.org), AMBER (http://www.amber.ucsf.edu/amber/amber.html), and OPLS/AMBER force fields. These force fields are continuously being improved to be suitable for applications in both fundamental and applied research. Complete potential functions are now available for macromolecular simulations involving nucleic acids (54), proteins (55), lipids (56), and carbohydrates (57). These force fields are functions of the atomic positions, $\mathbf{r}$, which are usually expressed in terms of Cartesian coordinates. The total potential energy is then computed as a sum of intramolecular (or bonded) energies, $U_{\text{bonded}}$, and intermolecular (or nonbonded) energies, $U_{\text{nonbonded}}$. As schematically shown in Fig. 3, $U_{\text{nonbonded}}$ accounts for interactions between nonbonded atoms (i.e., between molecules) or atoms separated by three or more covalent bonds in the same molecule. The intramolecular term describes the bond stretching, valence angle bending, and bond rotations (torsion) in a molecule.
Figure 3 (A) Illustration of the types of interactions encountered in macromolecules (bond stretch, valence angle bend, torsional angle, intramolecular nonbonded interactions, and intermolecular interactions). (B) Interactions included in the potential energy function for molecular dynamics simulations.
$$U_{\text{bonded}} = U_{\text{bonded-stretch}} + U_{\text{angle-bend}} + U_{\text{torsion}} + U_{\text{improper}} \tag{5}$$

$U_{\text{bonded-stretch}}$ in Eq. (5) is a harmonic potential describing the covalent bond between atom pairs, i.e., 1,2 pairs.

$$U_{\text{bonded-stretch}} = \frac{1}{2} \sum_{1,2\ \text{pairs}} K_b (r - r_0)^2 \tag{6}$$

This potential is an approximation of the energy of a bond as it is stretched from its equilibrium bond length, $r_0$ (Fig. 3B). The force constant, $K_b$, determines the strength of the bond and, like $r_0$, depends on the chemical type of the atoms connected. The equilibrium bond length and the force constant are usually inferred from high-resolution crystal structures and microwave spectroscopy data, respectively. $U_{\text{angle-bend}}$ in Eq. (5) is associated with an alteration of the bond angle $\theta$ from its equilibrium value $\theta_0$ (Fig. 3B). This function is also expressed as a harmonic potential.

$$U_{\text{angle-bend}} = \frac{1}{2} \sum_{i,j,k} K_\theta (\theta - \theta_0)^2 \tag{7}$$
Again, $\theta_0$ and the force constant $K_\theta$ depend on the chemical type of the atoms forming the angle. $U_{\text{bonded-stretch}}$ (Eq. (6)) and $U_{\text{angle-bend}}$ (Eq. (7)) describe the deviation from an ideal geometry. These potentials are effectively penalty functions, and the sum of the potentials should be close to zero in a perfectly optimized structure. The third term in Eq. (5) represents the torsion angle potential function, which takes into account the presence of steric barriers between atoms separated by three covalent bonds (i.e., 1,4 pairs). Different functional forms of the potential are employed in the literature, and one form frequently encountered is a periodic cosine function

$$U_{\text{torsion}} = \sum_{1,4\ \text{pairs}} K_\phi \left(1 - \cos(n\phi)\right) \tag{8}$$

The rotation around the middle bond is described by a dihedral angle, $\phi$ (Fig. 3B), and a coefficient of symmetry ($n = 1, 2, 3$). The last term (the so-called improper dihedrals) in Eq. (5) also has the form of a harmonic potential and is used to maintain the chirality or planarity of chemical groups (e.g., sp2-hybridization in a carboxylate group). The potential is given by

$$U_{\text{improper}} = \frac{1}{2} \sum_{\text{improper}} K_\omega (\omega - \omega_0)^2 \tag{9}$$

where $\omega_0$ is the equilibrium angle as displayed in Fig. 3B. The force constants $K_b$, $K_\theta$, and $K_\phi$ are obtained from studies of small model compounds by using structural information (geometry) and vibrational spectra usually recorded in the gas phase (IR and Raman spectroscopy), supplemented with ab initio quantum calculations.

The nonbonded interactions consist of two components, the van der Waals (vdW) and electrostatic (elec) interaction energies (see also Fig. 3B).

$$U_{\text{nonbonded}} = U_{\text{vdW}} + U_{\text{elec}} \tag{10}$$
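The bonded terms of Eqs. (5)–(9) translate almost line by line into code. The following is a minimal sketch in Python, not the implementation of any particular force field: the function names, the tuple layout of the inputs, and all numerical parameters are hypothetical and chosen only for illustration. Angles are assumed to be in radians.

```python
import math

def bond_stretch(r, r0, k_b):
    """Eq. (6) for a single 1,2 pair: harmonic penalty for bond stretching."""
    return 0.5 * k_b * (r - r0) ** 2

def angle_bend(theta, theta0, k_theta):
    """Eq. (7) for a single valence angle (radians)."""
    return 0.5 * k_theta * (theta - theta0) ** 2

def torsion(phi, k_phi, n):
    """Eq. (8) for a single 1,4 pair: periodic cosine term with symmetry n."""
    return k_phi * (1.0 - math.cos(n * phi))

def improper(omega, omega0, k_omega):
    """Eq. (9) for a single improper dihedral (radians)."""
    return 0.5 * k_omega * (omega - omega0) ** 2

def u_bonded(bonds, angles, torsions, impropers):
    """Eq. (5): sum of the four bonded contributions.

    Each argument is a list of tuples holding the current internal
    coordinate followed by its parameters, in the order used above.
    """
    return (sum(bond_stretch(*b) for b in bonds)
            + sum(angle_bend(*a) for a in angles)
            + sum(torsion(*t) for t in torsions)
            + sum(improper(*i) for i in impropers))

# Hypothetical internal coordinates and parameters (illustration only).
example_energy = u_bonded(
    bonds=[(1.54, 1.53, 1000.0)],
    angles=[(math.radians(112.0), math.radians(109.5), 200.0)],
    torsions=[(math.radians(60.0), 5.0, 3)],
    impropers=[(0.0, 0.0, 100.0)],
)
```

Because the stretch, bend, and improper terms are penalty functions, each of them vanishes at its reference geometry, which is a convenient sanity check when parameters are entered by hand.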
The functional form of Eq. (10) does not include an explicit hydrogen bond term; hydrogen bonds are frequently accounted for through an appropriate parameterization of the van der Waals and Coulomb interactions. The van der Waals interactions are described by a Lennard-Jones (LJ) 6–12 potential that includes (i) repulsive forces arising at short distances, where the electron–electron interaction is strong, and (ii) attractive forces (so-called dispersion forces) originating from fluctuations in the charge distribution in the electron clouds. The LJ potential given by Eq. (11) has a minimum in the energy, where atom pairs are located at the optimal distance ($r_{ij}^{\min}$), stabilizing the structure. The minimum energy ($\varepsilon_{ij}$) and the optimal separation of the atoms (approximately equal to the sum of the van der Waals radii of the atoms) depend on the chemical type of these atoms.

$$U_{\text{LJ}} = \sum_{i<j} \varepsilon_{ij} \left[ \left( \frac{r_{ij}^{\min}}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{ij}^{\min}}{r_{ij}} \right)^{6} \right] \tag{11}$$

The electrostatic interaction between a pair of atoms is represented by the Coulomb potential.

$$U_{\text{elec}} = \sum_{\text{nonbonded pairs}} \frac{q_i q_j}{4\pi \varepsilon_r \varepsilon_0 r_{ij}} \tag{12}$$

$\varepsilon_r$ is the effective dielectric constant for the medium, and $r_{ij}$ is the distance between two atoms $i$ and $j$ having charges $q_i$ and $q_j$, respectively.

The empirical potential functions have some limitations. One such limitation originates from the fixed set of atom types employed when determining the parameters for the force field. Atom types are defined to describe, for instance, a particular bonding situation. For example, an aliphatic carbon atom in an sp3 bonding situation has different properties than a carbon atom found in the His ring. Instead of representing each atom by a unique set of parameters, a certain amount of grouping is used to minimize the number of atom types. This could result in type-specific errors. The properties of certain atoms (e.g., aliphatic carbon or hydrogen atoms) are less sensitive to their surroundings and a single set of parameters may be applicable, while other atoms such as oxygens and nitrogens are much more influenced by their neighboring atoms. These atoms require specific parameters to account for the different bonding environments.

An approximation introduced to decrease the computational burden is the pair-wise additive approximation. The interaction energy between one atom and the remaining part of the system is calculated as a sum of pair-wise interactions. Hence, the simultaneous interaction between three or more atoms is not calculated and, consequently, any quantity that depends on multiple interactions (e.g., polarization effects) is poorly described, giving rise to subtle differences between calculated and experimental results.

Finally, it is noteworthy that the potential energy function does not include entropic effects. Thus a minimum in the sum of the different potential energy terms might not correspond to the equilibrium configuration (i.e., to the minimum of the free energy). However, this limitation applies only to simple ("static") energy calculations, whereas in molecular dynamics simulations entropic effects are implicitly included.
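To make the pair-wise additive form of Eqs. (10)–(12) concrete, the sketch below sums the Lennard-Jones and Coulomb contributions over all atom pairs. It is an illustration under stated assumptions rather than a production routine: a single $\varepsilon$ and $r^{\min}$ is used for every pair, no bonded-neighbor exclusions or cutoffs are applied, and the unit system (nm, kJ/mol, elementary charges) with the approximate conversion constant below is a choice made here, not one taken from the text.

```python
import math

# Approximate value of 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (assumed unit system).
COULOMB_CONSTANT = 138.935458

def lj_pair(r, eps, r_min):
    """Eq. (11) for one pair: the minimum is -eps at r = r_min."""
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)

def coulomb_pair(r, q_i, q_j, eps_r=1.0):
    """Eq. (12) for one pair, with effective dielectric constant eps_r."""
    return COULOMB_CONSTANT * q_i * q_j / (eps_r * r)

def u_nonbonded(coords, charges, eps, r_min, eps_r=1.0):
    """Eq. (10): pair-wise additive sum over all pairs i < j.

    coords is a list of (x, y, z) tuples; one eps and r_min is used for
    all pairs, which is a simplification made for this sketch.
    """
    total = 0.0
    n = len(coords)
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            total += lj_pair(r, eps, r_min)
            total += coulomb_pair(r, charges[i], charges[j], eps_r)
    return total
```

The double loop visits each pair once, so three-body and higher-order effects such as polarization are absent by construction, which is exactly the limitation of the pair-wise additive approximation described above.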
3.5 Short-Range and Long-Range Forces

3.5.1 Truncating the Potential
Truncating the interactions introduces a discontinuity in the potential function itself and in its derivatives. This corresponds to an infinite (impulsive) force acting between atoms that cross the discontinuity. Molecular dynamics simulations using soft potentials and carried out under these conditions will not, strictly speaking, conserve energy. However, the extent of the effect depends on the functional form of the potential and the chosen cutoff. To deal with this problem, different methodologies have been implemented that shift either the potential or the force at the truncation point. In the former methodology, the potential is shifted by the amount of the discontinuity, thereby bringing the energy to zero at exactly the truncation point.

$$U_S(r) = \begin{cases} U(r) - U(r_c) & r \le r_c \\ 0 & r > r_c \end{cases} \tag{13}$$
$r_c$ is the truncation (cutoff) distance. The disadvantage is that the entire potential is shifted by the same amount, $U(r_c)$, and that the forces are still discontinuous at the truncation point, so that the force behaves like a Heaviside step function at this point. To remedy this, a switching function (a third-order spline) can be used to smoothly switch the potential to zero. The potential is multiplied by the switching function, which has the form

$$S(r; r_{\text{on}}, r_{\text{off}}) = \begin{cases} 1 & r \le r_{\text{on}} \\[4pt] \dfrac{(r_{\text{off}} - r)^2 (r_{\text{off}} + 2r - 3r_{\text{on}})}{(r_{\text{off}} - r_{\text{on}})^3} & r_{\text{on}} < r \le r_{\text{off}} \\[4pt] 0 & r > r_{\text{off}} \end{cases} \tag{14}$$
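The shifted potential of Eq. (13) and the switching function of Eq. (14) can be sketched in a few lines. The Lennard-Jones form with unit parameters is used here only as a stand-in for a generic pair potential $U(r)$; the function names and the numerical cutoffs in the checks are hypothetical.

```python
def lennard_jones(r, eps=1.0, r_min=1.0):
    """Stand-in pair potential U(r) in reduced units (illustrative choice)."""
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)

def shifted_potential(u, r, r_c):
    """Eq. (13): subtract U(r_c) inside the cutoff, zero outside."""
    return u(r) - u(r_c) if r <= r_c else 0.0

def switching_function(r, r_on, r_off):
    """Eq. (14): third-order spline going from 1 at r_on to 0 at r_off."""
    if r <= r_on:
        return 1.0
    if r > r_off:
        return 0.0
    return ((r_off - r) ** 2 * (r_off + 2.0 * r - 3.0 * r_on)
            / (r_off - r_on) ** 3)

def switched_potential(u, r, r_on, r_off):
    """Potential multiplied by the switching function."""
    return u(r) * switching_function(r, r_on, r_off)
```

A quick consistency check is that the switching function equals 1 at `r_on`, 0 at `r_off`, and exactly 1/2 at the midpoint of the interval, while the shifted potential is zero at the cutoff itself.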
Between the distances $r_{\text{on}}$ and $r_{\text{off}}$, the potential smoothly approaches zero. The subscripts on (start) and off (end) refer to the interval used for smoothing the potential. Another approach is to shift the force to zero by introducing an additional term that is linear in distance; thus

$$U_{sf}(r) = \begin{cases} U(r) - U(r_c) - \left( \dfrac{dU}{dr} \right)_{r = r_c} (r - r_c) & r \le r_c \\[4pt] 0 & r > r_c \end{cases} \tag{15}$$

The shifted-force potential represents a larger perturbation on the overall potential. Depending on the density, the interactions neglected beyond a cutoff of about 2.5 Lennard-Jones diameters make up about 5–10% of the total energy and pressure. Sometimes these corrections can be added to the simulation averages upon completion of the simulation ("tail corrections"), but at other times it is important to include the corrections during the course of the simulation. For example, the long-range correction to the energy depends on the density, and therefore the tail correction must be included during the simulation when, for instance, it is carried out in the NPT ensemble [i.e., at constant number of particles (N), constant pressure (P), and constant temperature (T)].

3.5.2 Force Calculations
The equations of motion in the Hamiltonian formulation of mechanics can be written as a set of first-order differential equations (10,16)

$$\frac{d\mathbf{r}_i}{dt} = \frac{\mathbf{p}_i}{m_i} \tag{16}$$

$$\frac{d\mathbf{p}_i}{dt} = \mathbf{F}_i \tag{17}$$
The forces needed to integrate Eqs. (16) and (17) are derived from the potential model and are the negative gradient of the potential energy. Because all the empirical potential energy functions mentioned in Section 3.4 are analytical, it is straightforward to derive an expression for the force. In the simplest case, the potential is pair-wise additive and spherically symmetric. For this situation, the force between two atoms, $i$ and $j$, acts along the separation vector $\mathbf{r}_{ij}$, where the magnitude of the vector is given by $|\mathbf{r}_{ij}| = \sqrt{x_{ij}^2 + y_{ij}^2 + z_{ij}^2}$. The force that atom $j$ exerts on atom $i$ is simply $\mathbf{F}_{ji} = -\nabla_{\mathbf{r}_i} U(r_{ij})$, and according to Newton's third law, the force that atom $i$ exerts on atom $j$ is $\mathbf{F}_{ij} = -\mathbf{F}_{ji}$.

3.5.3 Coulombic Interactions
There are numerous examples in nature indicating that the biological function of some proteins is driven by electrostatic interactions. Therefore the treatment of the electrostatic contributions becomes an important issue in simulating charged systems (58–63). An insufficient treatment of these interactions, which can introduce artifacts when the truncation method is used, has been reported in the literature and has been frequently (re)examined. The problem in calculating the electrostatic forces lies in their nature. These forces are long-range and have to be treated in a more sophisticated way than van der Waals interactions. The Coulomb potential decays as $1/r$, which is much slower than the $1/r^6$ decay of the dispersion interaction in the Lennard-Jones model potential (Eq. (11)). The task of developing a correct and efficient treatment of these long-range electrostatic interactions in macromolecular simulations has drawn significant attention, as reflected throughout the literature. Generally, two approaches are applied: continuum models (i.e., implicit representation of the solvent) and models using explicit representations of the solvent in periodic boundary conditions. It is in the latter situation that the treatment of electrostatic interactions requires some consideration. There are two widely used techniques. The first one is the Ewald summation, which is based on the full lattice sum, using special mathematical techniques to enhance the convergence. Alternatively, the reaction field technique can be used. This method uses full three-dimensional periodic boundary conditions and treats interactions with the nearby solvent molecules explicitly, whereas interactions with the distant solvent molecules are handled through a continuum description. The Ewald summation was developed in 1921 to compute the electrostatic contribution in a system with periodic boundary conditions (pbc).
In such a system, the central simulation cell is surrounded by its images, repeated infinitely in all directions. The total electrostatic energy of a system of $N$ particles in a cubic box of size $L$ with pbc is represented by

$$U_{\text{elec}} = \frac{1}{2} \sum_{\mathbf{n}}{}^{\prime} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{r_{ij,\mathbf{n}}} \tag{18}$$

where $\mathbf{n} = (n_1 L_x, n_2 L_y, n_3 L_z)$ is the cell basis vector and $N$ is the number of atoms. The prime symbol at the first summation indicates the exclusion of all $i = j$ interactions inside the central simulation cell. This sum is conditionally convergent, as the terms decay as $1/n$ (i.e., the result depends on the order of summation). The Ewald technique separates the electrostatic interactions into two parts: a short-range term handled in direct (real) space and a long-range, smoothly varying contribution handled approximately in reciprocal space using Fourier transforms. This splitting changes the potential energy from a slowly and conditionally convergent series to the sum of two rapidly converging series, calculated in direct and reciprocal space, and a constant term, $U_{\text{elec}} = U_{\text{dir}} + U_{\text{rec}} + U_{\text{self}}$. The last term, $U_{\text{self}}$, cancels the self-energy term introduced in the calculation of $U_{\text{rec}}$. Because the derivation of these potential functions is lengthy and beyond the scope of this chapter, only the final equations are given:

$$U_{\text{dir}} = \frac{1}{2} \sum_{\mathbf{n}}{}^{\prime} \sum_{i,j=1}^{N} q_i q_j \frac{\operatorname{erfc}\left( \alpha \left| \mathbf{r}_j - \mathbf{r}_i + \mathbf{n} \right| \right)}{\left| \mathbf{r}_j - \mathbf{r}_i + \mathbf{n} \right|} \tag{19}$$

$$U_{\text{rec}} = \frac{1}{2\pi V} \sum_{\mathbf{m} \neq 0} \frac{\exp\left( -\pi^2 \mathbf{m}^2 / \alpha^2 \right)}{\mathbf{m}^2} \left| S(\mathbf{m}) \right|^2 \tag{20}$$

$$U_{\text{self}} = -\frac{\alpha}{\sqrt{\pi}} \sum_{i=1}^{N} q_i^2 \tag{21}$$
where $V$ is the volume of the simulation cell with sides of length $L_x$, $L_y$, and $L_z$, $\mathbf{m}$ is a reciprocal-space vector, erfc is the complementary error function, and $S(\mathbf{m})$ is the structure factor defined as $S(\mathbf{m}) = \sum_{j=1}^{N} q_j \exp\left( 2\pi i \left[ m_1 x_j / L_x + m_2 y_j / L_y + m_3 z_j / L_z \right] \right)$. The transformation treats each point charge in the system as if it were surrounded by a Gaussian distribution (i.e., normal distribution) of charge of equal magnitude but opposite sign, producing an exponentially decaying function. The Gaussian charge distribution essentially screens the interactions between neighboring point charges, and the interactions become short-range. As a result, the sum over all charges, including their images, converges rapidly in direct space. To counteract the Gaussian distribution introduced "artificially," another Gaussian distribution of the same sign and magnitude as the point charge is added for each point charge. This sum is performed in reciprocal space using Fourier transforms and transformed back to direct space. The $\alpha$ parameter occurring in Eqs. (19)–(21) sets the width of the Gaussian distribution; in the form used here the width is inversely proportional to $\alpha$. $\alpha$ affects the accuracy and determines the relative rates of convergence of the direct and reciprocal sums, and it is chosen to optimize computational time. When $\alpha$ is large, the direct sum converges faster than the reciprocal sum. Similarly, when $\alpha$ is small, the reciprocal sum converges faster than the direct sum. This is simply because $\alpha$ multiplies the distance in the argument of erfc in the direct sum (Eq. (19)), whereas it occurs in the denominator of the exponent in the reciprocal sum (Eq. (20)). It has been shown that $\alpha$ should vary as $\alpha = c \sqrt{\pi} \left( N / V^2 \right)^{1/6}$, where the constant $c$ determines the ratio of the execution times of the real and the reciprocal term, which may vary from one platform to another. The implementation of the Ewald sum has been further developed to supplement the conventional and also the refined truncated list methods (64). Efficient computational schemes such as the Particle Mesh Ewald (PME) method (65,66), the somewhat related Particle–Particle Particle–Mesh method (67), and the Fast Multipole Algorithm (or Method) (FMA/FMM), originally formulated nonperiodically (68,69) and subsequently generalized to periodic systems (70,71), have been used, refined, and rigorously compared in the literature (72–76). The PME method is currently the preferred method besides the conventional truncated list methods.

3.6 Algorithm for Solving Newton's Equations of Motion
All common algorithms used to numerically solve the equations of motion are based on a Taylor series expansion of the coordinates, velocities, or higher order time derivatives of the coordinates. An overview of the most commonly used algorithms is provided in Table 3A. The simplest method is the Euler formulation, but this algorithm does not produce a stable trajectory, and an NVE [i.e., constant number of particles (N), constant volume (V), and constant energy (E)] simulation suffers from severe energy drift. This is because the positions and velocities are advanced without considering information from the previous time step. This shortcoming is overcome by more advanced algorithms such as the Gear predictor–corrector algorithms, the Verlet algorithm, or the Verlet-like algorithms: Leapfrog and Velocity–Verlet (see Table 3B). The Verlet and the Verlet-like algorithms are based on a Taylor series expansion of only the coordinates and velocities, whereas the predictor–corrector algorithms use a number of time derivatives of the coordinates to advance each derivative forward in time. The predictor–corrector algorithms frequently applied in the literature differ in the number of time derivatives used in the numerical solution of the equations of motion. This number is referred to as the "value" of the algorithm. For instance, the predictor–corrector algorithm of "value 5" includes terms up to the fourth-order time derivative. As the name indicates, the predictor–corrector algorithm consists of a predictor step and a corrector step. A Taylor series expansion of the different time derivatives is used to advance each derivative forward in time from $t$ to $(t + \delta t)$, producing a series of predicted coordinates and their time derivatives. The predicted coordinates
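Table 3A Algorithm of the Different Integration Schemes

Euler
Algorithm:
$$\mathbf{r}_i(t + \delta t) = \mathbf{r}_i(t) + \dot{\mathbf{r}}_i(t)\,\delta t + \tfrac{1}{2} \ddot{\mathbf{r}}_i(t)\,\delta t^2$$
$$\dot{\mathbf{r}}_i(t + \delta t) = \dot{\mathbf{r}}_i(t) + \ddot{\mathbf{r}}_i(t)\,\delta t$$
Taylor series expansion: $\mathbf{r}_i(t)$ and $\dot{\mathbf{r}}_i(t)$ to second and first order, respectively.

Predictor-corrector
Algorithm:
$$\mathbf{r}_i^{p}(t + \delta t) = \mathbf{r}_i(t) + \dot{\mathbf{r}}_i(t)\,\delta t + \tfrac{1}{2} \ddot{\mathbf{r}}_i(t)\,\delta t^2 + \tfrac{1}{6} \dddot{\mathbf{r}}_i(t)\,\delta t^3 + \cdots$$
$$\dot{\mathbf{r}}_i^{p}(t + \delta t) = \dot{\mathbf{r}}_i(t) + \ddot{\mathbf{r}}_i(t)\,\delta t + \tfrac{1}{2} \dddot{\mathbf{r}}_i(t)\,\delta t^2 + \cdots$$
$$\ddot{\mathbf{r}}_i^{p}(t + \delta t) = \ddot{\mathbf{r}}_i(t) + \dddot{\mathbf{r}}_i(t)\,\delta t + \cdots$$
$$\dddot{\mathbf{r}}_i^{p}(t + \delta t) = \dddot{\mathbf{r}}_i(t) + \cdots$$
$$\Delta \ddot{\mathbf{r}}_i(t + \delta t) = \ddot{\mathbf{r}}_i^{c}(t + \delta t) - \ddot{\mathbf{r}}_i^{p}(t + \delta t)$$
$$\mathbf{r}_i^{c}(t + \delta t) = \mathbf{r}_i^{p}(t + \delta t) + c_0\,\Delta \ddot{\mathbf{r}}_i(t + \delta t)$$
$$\dot{\mathbf{r}}_i^{c}(t + \delta t) = \dot{\mathbf{r}}_i^{p}(t + \delta t) + c_1\,\Delta \ddot{\mathbf{r}}_i(t + \delta t)$$
$$\ddot{\mathbf{r}}_i^{c}(t + \delta t) = \ddot{\mathbf{r}}_i^{p}(t + \delta t) + c_2\,\Delta \ddot{\mathbf{r}}_i(t + \delta t)$$
$$\dddot{\mathbf{r}}_i^{c}(t + \delta t) = \dddot{\mathbf{r}}_i^{p}(t + \delta t) + c_3\,\Delta \ddot{\mathbf{r}}_i(t + \delta t)$$
Taylor series expansion: $\mathbf{r}_i(t)$, $\dot{\mathbf{r}}_i(t)$, $\ddot{\mathbf{r}}_i(t)$, and higher order time derivatives.
Comments: superscript "p": predicted value; superscript "c": corrected value; $c_0$, $c_1$, $c_2$, and $c_3$ are the Gear coefficients (10).

Verlet
Algorithm:
$$\mathbf{r}_i(t + \delta t) = 2\mathbf{r}_i(t) - \mathbf{r}_i(t - \delta t) + \ddot{\mathbf{r}}_i(t)\,\delta t^2$$
$$\dot{\mathbf{r}}_i(t) = \frac{1}{2\delta t} \left[ \mathbf{r}_i(t + \delta t) - \mathbf{r}_i(t - \delta t) \right]$$
Taylor series expansion: $\mathbf{r}_i(t)$ forward to $\mathbf{r}_i(t + \delta t)$ and backward to $\mathbf{r}_i(t - \delta t)$, both to fourth order.

Leapfrog
Algorithm:
$$\mathbf{r}_i(t + \delta t) = \mathbf{r}_i(t) + \dot{\mathbf{r}}_i(t + \delta t/2)\,\delta t$$
$$\dot{\mathbf{r}}_i(t + \delta t/2) = \dot{\mathbf{r}}_i(t - \delta t/2) + \ddot{\mathbf{r}}_i(t)\,\delta t$$
$$\dot{\mathbf{r}}_i(t) = \tfrac{1}{2} \left[ \dot{\mathbf{r}}_i(t + \delta t/2) + \dot{\mathbf{r}}_i(t - \delta t/2) \right]$$
Taylor series expansion: $\mathbf{r}_i(t)$ forward to $\mathbf{r}_i(t + \delta t)$ (to third order); $\dot{\mathbf{r}}_i(t)$ forward to $\dot{\mathbf{r}}_i(t + \delta t/2)$ and backward to $\dot{\mathbf{r}}_i(t - \delta t/2)$ (to third order).

Velocity-Verlet
Algorithm:
$$\mathbf{r}_i(t + \delta t) = \mathbf{r}_i(t) + \dot{\mathbf{r}}_i(t)\,\delta t + \tfrac{1}{2} \ddot{\mathbf{r}}_i(t)\,\delta t^2$$
$$\dot{\mathbf{r}}_i(t + \delta t/2) = \dot{\mathbf{r}}_i(t) + \tfrac{1}{2} \ddot{\mathbf{r}}_i(t)\,\delta t$$
$$\dot{\mathbf{r}}_i(t + \delta t) = \dot{\mathbf{r}}_i(t + \delta t/2) + \tfrac{1}{2} \ddot{\mathbf{r}}_i(t + \delta t)\,\delta t$$
Taylor series expansion: $\mathbf{r}_i(t)$ forward to $\mathbf{r}_i(t + \delta t)$ (to third order); $\dot{\mathbf{r}}_i$ around $t$ using $\dot{\mathbf{r}}_i(t + \delta t)$ and moving backward by $\delta t$, as well as $\dot{\mathbf{r}}_i$ around $(t + \delta t)$ using $\dot{\mathbf{r}}_i(t)$ and moving forward by $\delta t$ (to third order).

Source: Refs. 10 and 16.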
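The Verlet, leapfrog, and velocity-Verlet entries of Table 3A can be compared directly in code. The sketch below applies them to a one-dimensional harmonic oscillator, which is a test system assumed here for illustration (unit mass and force constant), with hypothetical function names. The starting values $x(-\delta t)$ and $\dot{x}(-\delta t/2)$ needed by Verlet and leapfrog are back-extrapolated from the initial position and velocity, so that all three schemes start from the same state.

```python
def accel(x, k=1.0, m=1.0):
    """Acceleration of a 1D harmonic oscillator (assumed test system)."""
    return -k * x / m

def verlet(x0, v0, dt, nsteps):
    """Verlet: positions only; x(-dt) is back-extrapolated from x0 and v0."""
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt * dt
    x = x0
    xs = [x0]
    for _ in range(nsteps):
        x_next = 2.0 * x - x_prev + accel(x) * dt * dt
        x_prev, x = x, x_next
        xs.append(x)
    return xs

def leapfrog(x0, v0, dt, nsteps):
    """Leapfrog: velocities live at half steps; v(-dt/2) is back-extrapolated."""
    x = x0
    v_half = v0 - 0.5 * accel(x0) * dt
    xs = [x0]
    for _ in range(nsteps):
        v_half = v_half + accel(x) * dt   # v(t - dt/2) -> v(t + dt/2)
        x = x + v_half * dt               # x(t) -> x(t + dt)
        xs.append(x)
    return xs

def velocity_verlet(x0, v0, dt, nsteps):
    """Velocity-Verlet: positions and velocities at the same time points."""
    x, v = x0, v0
    a = accel(x)
    xs = [x0]
    for _ in range(nsteps):
        x = x + v * dt + 0.5 * a * dt * dt
        v_half = v + 0.5 * a * dt
        a = accel(x)
        v = v_half + 0.5 * a * dt
        xs.append(x)
    return xs

# Same initial state for all three schemes (illustrative values).
xs_verlet = verlet(1.0, 0.0, 0.01, 200)
xs_leapfrog = leapfrog(1.0, 0.0, 0.01, 200)
xs_vv = velocity_verlet(1.0, 0.0, 0.01, 200)
```

With consistent starting values the three schemes generate the same sequence of positions up to rounding; they differ in how, and at which times, the velocities are made available.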
Table 3B Advantages and Disadvantages of the Different Integration Schemes

Euler
• Simplest integration scheme
• Not time-reversible
• Advancing coordinates and velocities without information from the previous time step
• Severe energy drift in an NVE simulation

Predictor-corrector
• Not time-reversible
• Requires higher order time-derivatives
• Storage requirements due to the number of time-derivatives

Verlet
• Time-reversible
• Numerical imprecision due to the addition of relatively large and small numbers
• Coordinates and velocities are determined to different orders; $\mathbf{r} \sim O(\delta t^4)$, $\delta\mathbf{r}/\delta t \sim O(\delta t^2)$
• Algorithm does not involve the velocities; temperature regulation by scaling is not possible

Leapfrog
• Coordinates and velocities are determined to the same order; $\mathbf{r} \sim O(\delta t^3)$, $\delta\mathbf{r}/\delta t \sim O(\delta t^3)$
• Coordinates and velocities are determined at different times

Velocity-Verlet
• Coordinates and velocities are determined to the same order; $\mathbf{r} \sim O(\delta t^3)$, $\delta\mathbf{r}/\delta t \sim O(\delta t^3)$
• Coordinates and velocities are determined at the same time
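The energy drift attributed to the Euler scheme in Table 3B is easy to reproduce. The following sketch integrates a one-dimensional harmonic oscillator (an assumed test system with unit mass and force constant) with the Euler and velocity-Verlet equations of Table 3A and compares the relative change in total energy. The step size and number of steps are arbitrary illustrative choices.

```python
def energy(x, v, k=1.0, m=1.0):
    """Total energy of the 1D harmonic oscillator."""
    return 0.5 * m * v * v + 0.5 * k * x * x

def euler(x, v, dt, nsteps, k=1.0, m=1.0):
    """Euler scheme of Table 3A: both updates use only information at time t."""
    for _ in range(nsteps):
        a = -k * x / m
        x, v = x + v * dt + 0.5 * a * dt * dt, v + a * dt
    return x, v

def velocity_verlet(x, v, dt, nsteps, k=1.0, m=1.0):
    """Velocity-Verlet scheme of Table 3A."""
    a = -k * x / m
    for _ in range(nsteps):
        x = x + v * dt + 0.5 * a * dt * dt
        v_half = v + 0.5 * a * dt
        a = -k * x / m
        v = v_half + 0.5 * a * dt
    return x, v

def relative_energy_drift(integrator, dt=0.05, nsteps=2000):
    """|E(end) - E(0)| / E(0) for the initial state x = 1, v = 0."""
    x0, v0 = 1.0, 0.0
    e0 = energy(x0, v0)
    x, v = integrator(x0, v0, dt, nsteps)
    return abs(energy(x, v) - e0) / e0

drift_euler = relative_energy_drift(euler)
drift_vv = relative_energy_drift(velocity_verlet)
```

For this system the Euler energy grows steadily with the number of steps, whereas the velocity-Verlet energy only fluctuates within a narrow band around its initial value.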
are then used to determine the corrected velocities or accelerations (depending on the order of the equations of motion). Because the equations of motion were not involved in the predictor step, which was simply a Taylor series expansion, the corrected velocities or accelerations based on the equations of motion generally differ from the predicted values. The difference between the corrected and the predicted velocities or accelerations is used with a set of Gear coefficients to correct all of the predicted time derivatives. The Gear coefficients are chosen to optimize the stability and accuracy of the trajectories. The coefficients depend on the order of the differential equation and the number of derivatives used in the Taylor series expansion. The predictor–corrector algorithms usually store a significant number of time derivatives of the coordinates at a particular time step. The Verlet and Verlet-like algorithms, by contrast, require much less storage, as they use information from the current and previous time steps to advance the coordinates in time. An overview of the advantages and disadvantages of the different algorithms is given in Table 3B. The derivation of the governing equations for these integration schemes is straightforward, and a summary of the final equations is provided in Table 3A. An excellent discussion of different algorithms can be found in the literature (10,16).

3.7 Implementation of the Algorithms

3.7.1 Predictor–Corrector Algorithm
It is straightforward to implement the predictor–corrector algorithm for any of the ensembles considered. For each variable, i.e., particle coordinate, heat bath coordinate, or volume coordinate, predictor equations are formulated. The corrector coefficients (Gear coefficients) are chosen according to the order of the equations of motion and according to the chosen "value" of the predictor–corrector algorithm, i.e., the number of terms in the predictor equations (10). In a simulation of an NVE ensemble, where the accelerations depend only on the position coordinates and not on the velocities, one may use the Verlet equation directly to integrate the equations of motion (40). However, for an NVT [i.e., constant number of particles (N), constant volume (V), and constant temperature (T)] or an NPT [i.e., constant number of particles (N), constant pressure (P), and constant temperature (T)] simulation, one cannot use the Verlet algorithm because the accelerations also depend on the velocities at time $t$, which do not enter the Verlet scheme. Instead, one has to use another Verlet-like algorithm such as the leapfrog algorithm, where the velocities are part of the integration scheme.

3.7.2 Leapfrog Algorithm
The leapfrog algorithm can be used directly when the accelerations depend only on the particle position coordinates, as in the simulation of the NVE ensemble. In situations where the accelerations also depend on the velocities at time $t$, the calculations are not straightforward because the velocities at half-times [i.e., $\dot{\mathbf{r}}_i(t + \delta t/2)$ and $\dot{\mathbf{r}}_i(t - \delta t/2)$] are involved directly in the algorithm. The problem is that the velocity $\dot{\mathbf{r}}_i(t + \delta t/2)$ is unknown and is needed to determine $\dot{\mathbf{r}}_i(t)$. Hence one may have to "customize" the equations to the particular situation encountered. Two situations may arise, in which the accelerations depend either linearly on the velocities or on the square of the velocities. In the former case, it is easy to solve the equation for the unknown velocity $\dot{\mathbf{r}}_i(t + \delta t/2)$, whereas in the latter case a quadratic equation in the unknown velocity has to be solved. Usually, this is unattractive and, as demonstrated below, an iterative approach is used instead.
NVT Ensemble. For a simulation of an NVT ensemble, the equations of motion for the particle coordinates $\mathbf{r}_i$ and the friction coefficient $\zeta$ are (77–80)

$$\ddot{\mathbf{r}}_i(t) = \frac{\mathbf{F}_i(t)}{m_i} - \zeta(t)\,\dot{\mathbf{r}}_i(t) \tag{22}$$

$$\dot{\zeta}(t) = \frac{1}{M_S} \left( \sum_{i=1}^{N} m_i \dot{\mathbf{r}}_i(t)^2 - 3Nk_BT \right) \tag{23}$$

where $k_B$ is the Boltzmann constant. The parameter $M_S$ is associated with the heat bath coordinate and regulates the changes in the friction coefficient $\zeta$. In the limit where $M_S$ approaches infinity, the conventional MD simulation that samples a microcanonical ensemble (NVE) is recovered. $M_S$ is proportional to the relaxation time, $\tau_s$, and is defined as $M_S = 3Nk_BT\tau_s$. A relatively large value of $\tau_s$ means that the variation of $\zeta(t)$ is relatively slow, corresponding to a slow heat flow between the system and the heat bath. The accelerations, $\ddot{\mathbf{r}}_i(t)$, depend linearly on the velocity, and the expression for the half-time velocities becomes

$$\dot{\mathbf{r}}_i(t + \delta t/2) = \frac{\left( \mathbf{F}_i(t)/m_i \right) \delta t + \left[ 1 - \tfrac{1}{2} \zeta(t)\,\delta t \right] \dot{\mathbf{r}}_i(t - \delta t/2)}{1 + \tfrac{1}{2} \zeta(t)\,\delta t} \tag{24}$$

The position coordinates are advanced by the ordinary leapfrog algorithm (see Table 3A). The friction coefficient $\zeta(t)$ is expanded to third order in $\delta t$ like the coordinates, i.e.,

$$\zeta(t + \delta t) = \zeta(t) + \dot{\zeta}(t)\,\delta t + \tfrac{1}{2} \ddot{\zeta}(t)\,\delta t^2 + O(\delta t^3) \tag{25}$$

$\dot{\zeta}(t)$ is given by Eq. (23), and the acceleration of the friction variable, $\ddot{\zeta}(t)$, is found by differentiation of Eq. (23) with respect to time. $\zeta(t + \delta t)$ is then given by

$$\zeta(t + \delta t) = \zeta(t) + \frac{1}{M_S} \left( \sum_{i=1}^{N} m_i \dot{\mathbf{r}}_i(t)^2 - 3Nk_BT \right) \delta t + \frac{1}{M_S} \left( \sum_{i=1}^{N} m_i\, \dot{\mathbf{r}}_i(t) \cdot \ddot{\mathbf{r}}_i(t) \right) \delta t^2 \tag{26}$$

and $\dot{\mathbf{r}}_i(t)$ is calculated as the average of the half-time velocities at $(t + \delta t/2)$ and $(t - \delta t/2)$ (see Table 3A). In summary, the leapfrog algorithm for the simulation of an NVT ensemble has the following scheme: (i) Eq. (24) is used to advance the velocities, where the forces are determined from the potential energy function and the atomic positions at time $t$; (ii) the positions are advanced using the usual equations (Table 3A); (iii) the velocities at time $t$ are calculated from the velocities at $(t + \delta t/2)$ and $(t - \delta t/2)$; and (iv) the results are finally used in Eq. (26) to advance the friction coefficient, $\zeta$. In this scheme, the stored quantities are $\mathbf{r}_i(t)$, $\dot{\mathbf{r}}_i(t - \delta t/2)$, and $\zeta(t)$.

NPT Ensemble. The equations of motion for this ensemble are (81)

$$\ddot{\mathbf{r}}_i(t) = \frac{1}{m_i} \left( \mathbf{F}_i(t) - \zeta(t)\,\mathbf{p}_i(t) \right) + \left[ \frac{\ddot{V}(t)}{3V(t)} - 2 \left( \frac{\dot{V}(t)}{3V(t)} \right)^2 \right] \mathbf{r}_i(t) \tag{27}$$

$$\ddot{V}(t) = \frac{1}{M_V} \left( P(t) - P_{\text{ext}} \right) - \zeta(t)\,\dot{V}(t) \tag{28}$$

$$\dot{\zeta}(t) = \frac{1}{M_S} \left( \sum_{i=1}^{N} \frac{\mathbf{p}_i(t)^2}{m_i} - (3N + 1)\,k_BT \right) + \frac{M_V}{M_S} \dot{V}(t)^2 \tag{29}$$

with

$$\frac{\dot{\mathbf{p}}_i(t)}{m_i} = \frac{\mathbf{F}_i(t)}{m_i} - \left( \zeta(t) + \frac{\dot{V}(t)}{3V(t)} \right) \frac{\mathbf{p}_i(t)}{m_i} \tag{30}$$

$$\frac{\mathbf{p}_i(t)}{m_i} = \dot{\mathbf{r}}_i(t) - \frac{\dot{V}(t)}{3V(t)}\,\mathbf{r}_i(t) \tag{31}$$

$$P(t) = \frac{1}{3V(t)} \sum_{i=1}^{N} \left( m_i \left( \frac{\mathbf{p}_i(t)}{m_i} \right)^2 + \mathbf{r}_i(t) \cdot \mathbf{F}_i(t) \right) \tag{32}$$
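Returning briefly to the NVT case, steps (i)–(iv) of the leapfrog scheme built on Eqs. (22)–(26) can be sketched as a single function. This is a minimal illustration under stated assumptions, not a reference implementation: coordinates are stored as flat lists with one entry per degree of freedom, so the list length plays the role of $3N$; `force` is a hypothetical user-supplied callable; and `m_s` is passed in directly as the heat-bath parameter $M_S$.

```python
def nvt_leapfrog_step(x, v_half_prev, zeta, force, masses, k_t, m_s, dt):
    """One NVT leapfrog step; returns x(t+dt), v(t+dt/2), zeta(t+dt).

    x, v_half_prev and masses are flat lists with one entry per degree of
    freedom (an assumption of this sketch); k_t is k_B * T.
    """
    n_dof = len(x)  # plays the role of 3N in Eqs. (23) and (26)
    f = force(x)

    # (i) Eq. (24): advance the half-step velocities with the friction term.
    damp_minus = 1.0 - 0.5 * zeta * dt
    damp_plus = 1.0 + 0.5 * zeta * dt
    v_half = [(fi / mi * dt + damp_minus * vi) / damp_plus
              for fi, mi, vi in zip(f, masses, v_half_prev)]

    # (ii) Ordinary leapfrog position update (Table 3A).
    x_new = [xi + vi * dt for xi, vi in zip(x, v_half)]

    # (iii) Velocity at time t as the average of the half-step velocities.
    v_t = [0.5 * (vn + vo) for vn, vo in zip(v_half, v_half_prev)]

    # (iv) Eq. (26): advance the friction coefficient; the accelerations
    # at time t follow from Eq. (22).
    a_t = [fi / mi - zeta * vi for fi, mi, vi in zip(f, masses, v_t)]
    kinetic_term = sum(mi * vi * vi for mi, vi in zip(masses, v_t))
    power_term = sum(mi * vi * ai for mi, vi, ai in zip(masses, v_t, a_t))
    zeta_new = (zeta
                + (kinetic_term - n_dof * k_t) / m_s * dt
                + power_term / m_s * dt * dt)
    return x_new, v_half, zeta_new

def harmonic_force(x):
    """Hypothetical test force: independent unit harmonic springs."""
    return [-xi for xi in x]

# A "hot" start (kinetic term above n_dof * k_B * T) drives zeta positive.
x1, v1, zeta1 = nvt_leapfrog_step([0.0], [2.0], 0.0, harmonic_force,
                                  [1.0], 1.0, 1.0, 0.01)
```

When the kinetic term exceeds its target value the friction coefficient increases and subsequently damps the velocities; when it falls below the target the coefficient becomes negative and heats the system. For very large `m_s` the coefficient stays near zero and the ordinary NVE leapfrog step is recovered.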
There are two parameters introduced in the equations of motion. $M_S$ and $M_V$ are associated with the heat bath and the barostat, respectively. $M_V$ is related to the relaxation time $\tau_V$ by $M_V = P_{ext}\,\tau_V^{2}/r_{ref}^{3}$. The variable $r_{ref}$ is a reference length, here arbitrarily chosen to be the atom size parameter of the Lennard–Jones potential. $P_{ext}$ is the external pressure. Both accelerations, $\ddot{\vec{r}}_i(t)$ and $\ddot{V}(t)$, depend on the squared velocities. Hence it is not feasible to solve the coupled quadratic equations for the advanced velocities directly. Instead, an iterative approach has to be used. The friction coefficient $\zeta(t+\delta t)$ is determined, as for the NVT ensemble, by a Taylor series expansion (Eq. (25)). $\dot{\zeta}(t)$ is given by Eq. (29), and the acceleration $\ddot{\zeta}(t)$ is determined by differentiation of $\dot{\zeta}(t)$ (Eq. (29)) with respect to time:

\[
\ddot{\zeta}(t) = \frac{2}{M_S}\sum_{i=1}^{N} m_i\,\frac{\vec{p}_i(t)}{m_i}\cdot\frac{\dot{\vec{p}}_i(t)}{m_i} + \frac{2M_V}{M_S}\,\dot{V}(t)\,\ddot{V}(t) \qquad (33)
\]

where $\dot{\vec{p}}_i(t)/m_i$ and $\vec{p}_i(t)/m_i$ are given by Eqs. (30) and (31), respectively. The iterative determination of the advanced velocities requires an initial guess of
the velocities $\dot{\vec{r}}_i(t)$ and $\dot{V}(t)$. A natural choice would be $\dot{\vec{r}}_i(t) = \dot{\vec{r}}_i(t-\delta t/2)$ and $\dot{V}(t) = \dot{V}(t-\delta t/2)$. Additionally, $\vec{r}_i(t)$, $V(t)$, $\dot{\vec{r}}_i(t-\delta t/2)$, $\dot{V}(t-\delta t/2)$, and $\zeta(t)$ have to be stored. These are used to determine the accelerations $\ddot{\vec{r}}_i(t)$ and $\ddot{V}(t)$, which provides an avenue for determining the advanced velocities $\dot{\vec{r}}_i(t+\delta t/2)$ and $\dot{V}(t+\delta t/2)$, and subsequently the new values for the velocities $\dot{\vec{r}}_i(t)$ and $\dot{V}(t)$ using the half-time velocities (Table 3A). These enter a new calculation of the accelerations and the advanced velocities. This process is continued until convergence, and finally the coordinates are advanced before continuing on to the next time step. Hence both the particle velocities and the volume velocity have to be determined by iteration because the variables are coupled, preventing an analytical solution.

4 CASE STUDIES
There are many aspects of medicinal chemistry and biophysics where computer analysis and simulation can augment or explain experimental observations. Computers and sophisticated software packages have become an integral part of modern science. Hence there exists a vast number of theoretical and computational contributions in the literature focusing on the understanding of enzyme structure and function. These studies cover a wide range of areas such as protein–lipid interactions (e.g., membrane-associated enzymes), ion transport through channels, solvent effects on enzyme function, substrate binding, catalysis (enzyme kinetics), and inhibitor design (molecular recognition). In the following we will not attempt to provide a comprehensive review of the literature, but we will discuss two projects performed in our research group, MEMPHYS (82), which highlight some possible applications of molecular dynamics simulations.

4.1 Lipase–Lipid Interactions: Implications for Hydrolysis
The research activities presented here include both computer simulations and experimental techniques (83). Depending on the question asked and on the complexity of the problem, simulations and experimental techniques are applied in a complementary fashion. Both approaches are aimed at elucidating important lipase–lipid interactions on an atomic level as well as at understanding how the lipid interface affects the performance of lipases.

4.1.1 Background
The essential role of lipases in many biological and industrial processes has stimulated interest in elucidating the molecular details determining the function of triglyceride lipases. Triglyceride lipases catalyze the hydrolysis of glycerides into free fatty acids and monoglycerols (84). Biologically, lipases
are essential for transporting fats into cells for storage or for conversion into energy. Triglycerides in the diet, in lipoprotein particles, and in fat storage cells cannot be directly absorbed into cells. They must be hydrolyzed into free fatty acids and monoglycerols before they are able to cross cellular membranes. Commercially, lipases are used in various applications ranging from removal of oils or fats from fabrics to stereospecific synthesis of compounds including precursors for biologically active therapeutics, herbicides, or pesticides (85–88). Lipases from different organisms vary greatly in size, with molecular masses ranging from 20–25 to 60–65 kDa. Although the amino acid sequences of lipases of different origin are diverse, they all share the characteristic structural α/β hydrolase fold (89). Three-dimensional crystal structures of several lipases covering the full range of sizes have provided the basis for understanding the activation process and catalytic mechanism (90,91). Several crystal structures revealed that the active site of the lipase is covered by a helical loop (‘‘lid’’), and that the activation process involves the displacement of the active-site lid. It should be noted, however, that the lid is not a ubiquitous feature, and it has been found that some lipolytic enzymes have solvent-accessible active sites. In aqueous solution, the activity of lipases is very low, and only when the substrate concentration exceeds the critical micelle concentration is a sharp increase in enzyme activity observed. Hence the lipid interface triggers a conformational change in the enzyme (i.e., the displacement of the lid), by which the active site becomes accessible to the substrate. To illustrate this movement, the closed (‘‘inactive’’) and open (‘‘active’’) conformations of the Rhizomucor miehei lipase are shown in Fig. 4.
The displacement of the active-site lid exposes a hydrophobic site of the lid and thereby facilitates binding of the lipase to the interface (91). Hydrophobic interactions between residues of the lid and the lipid interface contribute to the stabilization of the open conformation. The observed conformational rearrangements correlate well with the phenomenon of interfacial activation. The mechanism for catalysis is analogous to that proposed for serine proteases (92,93), where the active site region possesses a [Ser/His/acidic residue] active site triad (as shown in Fig. 4) and a neighboring oxyanion hole. The triad is involved in the catalysis, where the serine forms the nucleophilic center of the sequence G–X–S–X–G, and the His residue serves as a general acid/base. The oxyanion hole stabilizes the negative charge that develops on the carbonyl oxygen of the ester group during turnover. The catalytic reaction is initiated by forming a Michaelis–Menten complex between the substrate and the enzyme. The reactive carbonyl carbon atom of the ester bond is then attacked by the oxygen of the serine side chain, leading to the formation of a tetrahedral intermediate (close to the transition state). During the formation of the intermediate, the hydrogen atom of the hydroxyl group of the serine is transferred to the histidine, thereby
Figure 4 Secondary structures of the closed and open conformations of Rhizomucor miehei lipase are shown on the left. The top view shown on the right displays the three residues (His–Ser–Asp) forming the catalytic triad. As indicated, the active site serine is covered by the lid in the closed conformation.
causing the histidine imidazole ring to become protonated (i.e., positively charged). The positive charge is stabilized by the negatively charged acidic residue in the triad. The tetrahedral intermediate, which subsequently breaks down to release alcohol and to form an acyl enzyme, is stabilized by two hydrogen bonds formed with amide bonds of residues belonging to the oxyanion hole. The protonated imidazole ring donates a proton to the leaving alcohol group. The acyl enzyme is then hydrolyzed by water or cleaved by a competing nucleophile, whereby one proton is transferred from the nucleophile through the imidazole group to the active-site serine residue (94).

4.1.2 Overview
As schematically shown in Fig. 5, the enzymatic reaction of lipases involves at least four processes: (1) binding to the lipid surface, (2) penetration into the lipid phase, (3) activation of the enzyme (i.e., displacement of the active-site lid), and (4) catalytic hydrolysis (including the formation of the transition state). Although the lipid interface is essential for efficient catalysis, the exact role of the lipid–water interface is not well understood. However, there is increasing evidence that the properties of the interfacial plane are important for lipase action (95,96), and it has been shown that lipases are sensitive to many external factors including surface pressure and lipid composition. In order to reveal the structural and interfacial properties of the lipid film at different surface pressures, we have studied the structure and phase transitions of 1,2-sn-dipalmitoylglycerol monolayers by applying x-ray
Figure 5 Schematic illustration of the enzymatic reaction of lipases that involves at least four processes: (1) binding to the lipid surface, (2) penetration into the lipid phase, (3) activation of the enzyme, and (4) catalytic hydrolysis.
diffraction (96,97), pressure–area (π–A) isotherms (98), and computer simulations (99,100). As displayed in Fig. 6, the interfacial quality (measured by the hydrophilicity of the interface) depends on the surface pressure of the film and hence may influence the activation and/or adsorption of the enzyme at the interface. Activation of the enzyme involves the displacement of the active-site lid in order to allow access of the substrate to the active site, suggesting that the displacement of the lid might be triggered by interactions between residues located in the lid and the head groups of the lipid interface. It is exactly this part of the activation pathway that, at the present time, is difficult to probe by experimental means but where computer simulations might provide essential information. To investigate possible activation pathways and to elucidate the effect of a hydrophobic environment (as it would be provided by a lipid surface) on the lid opening, we have applied molecular dynamics (MD) (101) and Brownian dynamics (BD) (102) techniques. Molecular dynamics simulations were performed to investigate the effect of a hydrophobic environment on the activation of lipases, whereas the BD technique was applied to study the dynamics of the activating loop. Our results, which agree well with experimental observations, suggest that the activation of lipases is enhanced in a hydrophobic environment. An example is shown in Fig. 7 for Rhizomucor miehei lipase. At a dielectric constant of 4 (corresponding approximately to a lipid
Figure 6 Calculated hydrophilicity of the lipid interface as a function of area per lipid molecule extracted from a molecular dynamics trajectory. The hydrophilicity is expressed as the difference between the accessible areas of hydrophilic and hydrophobic atoms at the lipid interface.
Figure 7 Total mean energy difference, E_active − E_inactive, as a function of dielectric constant. The energies shown are averages calculated from the energy differences obtained after opening of the active site lid and subsequent closing of the lid, i.e., total mean energy difference = [(E_active − E_inactive)_opening + (E_active − E_inactive)_closing]/2.
environment), the energy gain is approximately 30 kcal/mol. Additionally, the BD simulations revealed that the active-site lid exhibits some gating motion, suggesting that the enzyme molecule may exist in a partially active form prior to the catalytic reaction, as also suggested by recent x-ray crystallographic studies (103). Our findings indicate that the conformational and interfacial properties of the lipid may have considerable influence on the enzyme’s catalytic activity. Indeed, using fluorescence microscopy, surface potential, and activity measurements, we obtained a detailed understanding of the inhibitory effect of the additives, fatty acid and fatty alcohol, on the lipolytic activity of the bacterial lipase from Pseudomonas cepacia, the yeast lipase from Candida rugosa, and the fungal lipases from Rhizomucor miehei and Rhizopus delemar (95). Measurements were performed for 1,2-didecanoyl-glycerol/eicosanoic acid and 1,2-didecanoyl-glycerol/1-octadecanol mixed monolayers. As shown in Fig. 8, small amounts of fatty acid have a significant inhibitory effect on the R. miehei lipase activity. These results indicate that lipase activity is strongly influenced by the lateral distribution of additives in the diglyceride matrix. Furthermore, the studies showed that the level of inhibition might be correlated to the isoelectric point (pI) of the enzymes (95). Initially, we concluded that repulsive charge–charge interactions between fatty acid moieties and
Figure 8 Lipase activities as a function of mole fraction of eicosanoic acid. Data are shown for R. miehei lipase (Rml) (1 unit). Subphase is 10 mM TRIS buffer with 0.1 mM EDTA; pH = 8. Error bars are calculated from at least three independent experiments.
charged residues at the enzyme surface are responsible for this inhibition. However, as shown in Fig. 9, Humicola lanuginosa lipase shows relatively high binding affinity to acidic phosphatidylglycerol liposomes. It should be noted that a direct comparison of the results is difficult because the systems (monolayer vs. liposomes) and lipids (diglyceride/fatty acid vs. acidic phospholipids) are different (104,105). With the increasing amount of information, it is also important to understand the physics and the chemistry that relate the structural fold of the protein and the structure of the binding site to the function and action of the enzyme. Substrate binding, enzymatic processes, and product release are often associated with conformational changes in the structure, and these structural changes require a certain flexibility of the protein (106–108). Several studies have used molecular dynamics simulations to study the flexibility of proteins and its relation to the biological function of the protein. In these studies, protein flexibility was extracted using principal component analysis (109–111), which allows the separation of the internal protein motion into relatively large collective motions and small thermal fluctuations (112–114). This technique provides an avenue for extracting functionally relevant motions in the protein and for understanding the physical nature of the protein energy landscape (115–118). Therefore an essential aspect of protein function is the dynamic response of the protein upon substrate binding and product release in the presence of a lipid patch. To gain further insight into the structure–function relation of lipases, we have performed
Figure 9 Quenching of the pyrene monomer fluorescence as a function of lipid concentrations. Liposomes are composed of 97 mol% DMPG and 3 mol% PDA. Excitation and emission wavelengths are 290 and 340 nm, respectively.
molecular dynamics simulations of R. miehei lipase in complex with a substrate or a product molecule in the presence of a lipid patch (Fig. 10). These simulations indicate that the dynamic responses of the substrate or product molecules depend on the environment (119). Entry and departure of substrate molecules could be observed in the presence of the lipid patch, as shown in Fig. 10. Here, snapshots of the initial configuration (Fig. 10a) and of a configuration taken after 1000 psec (Fig. 10b) display the entry (solid circle) and departure (dashed circle) of a substrate molecule. The simulation with a product molecule reveals a different picture. Analysis of the hydrogen bond pattern between the product (fatty acid) and residues in the binding cleft along the trajectory revealed that two serine residues form stable hydrogen bonds with the product and hence might be involved in the mechanism of product inhibition (Fig. 11) (119). Important questions remain regarding the exact orientation of lipase molecules at the interfacial plane and the mobility of the active site lid. Using x-ray reflectivity measurements supplemented by molecular dynamics simulations, we have gained insight into the orientation of the lipase molecules as well as their conformation (i.e., closed or open conformation). Fig. 12 displays an example of the calculated electron density profiles (ρ(z)) across the enzyme, which were extracted from a molecular dynamics trajectory and calculated for different orientations. Clearly, the calculated profiles depend characteristically on the orientation of the enzyme. Hence this approach provides a route for determining the orientation of the lipase molecule at an air–water interface by fitting the calculated profiles to the corresponding experimental data recorded from synchrotron x-ray reflectivity measurements (120). The computational scheme developed would provide an avenue for determining the lipase orientation at the interface of different lipid monolayers (alkane, alcohols, diglyceride, etc.) and for elucidating the effect of different lipid headgroups on the orientation of lipases. We have considered an alkane/water interface to study the orientation of lipases on such a hydrophobic surface. Initial synchrotron x-ray reflectivity measurements of the alkane/water system suggested that the water molecules behave very differently when compared to bulk water (121). Therefore we first proceeded to elucidate the structure of the water at that interface, as this could have an important effect on the adsorption and orientation of lipase molecules. The measurements could not reveal the exact water structure, and we therefore performed molecular dynamics simulations on the alkane/water and alcohol/water systems. The simulations revealed that water molecules are oriented characteristically at these interfaces. Calculated electron density profiles are shown in Fig. 13, which compare well with the experimentally
Figure 10 Secondary structure of R. miehei lipase (Rml) complexed with a substrate molecule and in the presence of the lipid patch consisting of substrate molecules. The active site lid is shown as a rod, whereas atoms of the active serine are displayed in van der Waals representation. The substrate molecules are displayed as sticks. Images (a) and (b) are snapshots of the simulation taken at the start and after 1 nsec. See text for more details.
Figure 11 Snapshots taken along the trajectory from the Rml-product-patch simulation showing the hydrogen bond pattern between the carboxylate group of the product molecule and residues in the binding pocket. Hydrogen bonds are shown as solid bonds. The snapshots are taken at (a) 0, (b) 600, and (c) 1940 psec.
Figure 12 Orientation-specific electron density profiles for selected orientations of the open (dashed) and closed (solid lines) conformation of Thermomyces lanuginosa lipase. The Euler angles of rotation are indicated in each panel. For example, (0,0,0) means that the active site lid is aligned with the surface normal nz. Snapshots of the closed and open conformers taken after 2 nsec of molecular dynamics simulations are shown to the right of the profiles and are in orientations referring to the particular electron density profile. The active site lid is displayed as a solid tube.
determined profiles (122). There are subtle differences in the interfacial water density when comparing the water electron density profile determined at the alkane monolayer with the profile obtained in the presence of the alcohol monolayer.

4.2 Regulation and Substrate Specificity of Protein Tyrosine Phosphatases
Protein tyrosine phosphatases (PTPs) are critical elements in the regulation of signal transduction pathways in living organisms, and their unregulated activities are related to diverse pathological events. Several kinds of diseases have been related to an increase in PTP transcription. For instance, the receptor-like phosphatase, PTPα, has been found to possess oncogenic activity. Another example is the cytosolic phosphatase, PTP1B, which might be involved in the development of diabetes. Consequently, phosphatases may
Figure 13 Normalized electron density profiles for n-alkane C36H74 and n-alcohol C35H71OH. The profiles are extracted from molecular dynamics simulations of crystalline monolayers of these amphiphilic molecules at a water surface.
represent a potential therapeutic target. To determine the biological function and to design inhibitory agents, it would be of enormous value to have a detailed structural understanding of the regulation of these enzymes and of how phosphatases distinguish the different substrates that they encounter in the cell. To elucidate the molecular mechanisms underlying substrate specificity and inhibitor selectivity, we have used a multidisciplinary approach, which combines theoretical approaches, computer simulations, and experimental techniques in a complementary fashion (123).

4.2.1 Background
The phosphorylation/dephosphorylation of tyrosine residues in proteins is one of several key molecular mechanisms by which living organisms regulate cell growth, proliferation, and differentiation (124). The phosphorylation state of proteins is remarkably dynamic, which enables cells to respond rapidly to discrete changes in environmental conditions (125). This dynamic behavior is governed by the opposing actions of protein-tyrosine kinases and protein-tyrosine phosphatases (Fig. 14), which are integrated within an elaborate signal-transducing network, an enzyme-based system that converts external environmental stimuli into internal cellular action. The defective or inappropriate operation of this network is at the root of a variety of diseases in humans and animals. Consequently, the characterization of the individual components and the delineation of the circuitry of this regulatory network have emerged as one of the most active fields in biological research. The critical roles played by these phosphatases in pathological events indicate that these signaling enzymes are suitable targets for pharmacological intervention and that this may be achievable in a selective manner. Clearly, a selective control of the biological function of PTPs is a challenging task, and an inhibitor must not only bind tenaciously to the specific target enzyme, but must do so without impeding the catalytic behavior of closely related enzymes. Many PTPs show strong substrate selectivity, and it is generally believed that the activity of PTPs is regulated by specific substrates containing the requisite structural recognition sites and/or by confining individual PTPs and protein-tyrosine kinases to specific cellular microenvironments.
Figure 14 Illustration of the interplay between protein phosphatases and protein kinases.

4.2.2 Overview
The PTPs are characterized by having a common active site sequence: the (H/V)CX5R(S/T) motif (X denotes any amino acid residue). This conserved motif, which in the literature is referred to as the ‘‘P-loop’’, plays an important role in the binding of the substrate and the subsequent catalytic reaction. It defines the binding site for the tyrosyl phosphate substrate and contains a nucleophilic cysteinyl residue as well as a conserved arginine residue separated by five residues. The backbone nitrogens of the P-loop and three of the arginine nitrogens coordinate the oxygens of the phosphate group. Binding of the substrate triggers the displacement of a loop (referred to as the ‘‘WPD-loop’’ in the literature) toward the phosphate moiety, which causes a tight binding of the tyrosyl phosphate group and brings a catalytically active aspartate (general acid/base) into position for the catalytic reaction (Fig. 15). The cysteine thiolate in the consensus sequence functions as a nucleophile and
Figure 15 Secondary structure of PTP1B complexed with the hexa-peptide DADEpYL-NH2 (pY stands for phosphorylated tyrosine) is displayed on the right. The peptide structure is shown on the left. The active site Cys, Gln (coordinates a water molecule that participates in the hydrolysis), Asp (general acid/base), and substrate are shown in stick representation. Labels in the figure: conformation (open, closed), substrate, Asp, Cys, Gln, and the peptide H2N-L-pY-EDAD (‘‘influence of this site on binding’’).
Figure 16 Illustration of the electrical field surrounding the negatively charged active site cysteine. Labels in the figure: electrical field, electrical field from N-H bond, helix, Arg, Cys.
attacks the phosphorus atom in the substrate, resulting in the formation of the thiophosphate enzyme intermediate and product release (dephosphorylated substrate). The thiophosphate intermediate is then hydrolyzed by the attack of a water molecule, yielding inorganic phosphate. The first PTP structure solved was PTP1B, which has initiated much interest in this field, as it has been suggested that PTP1B is a negative regulator of insulin signaling. The secondary structure of PTP1B complexed with the hexa-peptide, DADEpYL-NH2 (pY stands for phosphorylated tyrosine), is displayed in Fig. 15. The active Cys, the conserved Gln, the essential general acid/base (Asp), and the substrate are shown in stick representation. The active site Cys (residue 215 in PTP1B), which facilitates the catalysis, is negatively charged at physiological pH (i.e., the pKa of the Cys residue in the protein is lower than the pKa of a free cysteine residue). It has
Figure 17 (A) Titration results for PTP1B. pKa values computed with modified backbone charges as a function of the pKa values in PTP1B. ‘‘Loop’’ indicates the pKa shift when the backbone charges of all residues in the loop are zeroed. ‘‘Helix’’ marks the pKa shift when the backbone charges of all amino acids in the central α-helix are zeroed. ‘‘Helix+loop’’ indicates that backbone charges of all residues in the loop and central α-helix are zeroed. These values should be seen in relation to the pKa of 8.3 for a free cysteine amino acid residue. (B) Time evolution of the active site cysteine’s pKa in the presence or absence of the substrate. Solid line indicates the pKa value (8.3) of a free cysteine residue.
been suggested from the crystal structure that the hydrogens in the backbone of the P-loop stabilize the negative charge at the active site Cys (see Fig. 16). To further study the origin of the low pKa value of the cysteine on an atomic level, we performed macroscopic electrostatic calculations using the so-called single-site titration method, which is based on the Poisson–Boltzmann methodology (126 and references therein). The methodology calculates the electrostatic field around the titratable residues, iteratively accounting for the influence of neighboring charges. From the electrostatic field the apparent pKa can be computed, which is shown as a function of the pKa values in PTP1B in Fig. 17A. The insert ‘‘loop’’ indicates the pKa shift when the backbone charges of all residues in the loop are zeroed. ‘‘Helix’’ marks the pKa shift when the backbone charges of all amino acids in the central α-helix are zeroed. ‘‘Helix+loop’’ indicates that the backbone charges of all residues in the loop and in the central α-helix are zeroed. The analysis of the charges contributing to the pKa shift shows that the net influence of titratable charges is negligible. The major contribution stems from the electrostatic microdipoles created by the backbone charges of the consensus sequence (H/V)CX5R(S/T). The peculiar loop structure of this motif shows that these microdipoles are directed toward the sulfur atom of the active site cysteine, thereby giving rise to a pKa shift. In a subsequent study, we could show that the pKa shift is independent of protein flexibility (Fig. 17B) and stems entirely from the architecture of the binding pocket (127). One of the surprising observations is the preference of PTP1B for negatively charged peptide substrates. Experimentally, it has been observed that PTP1B has a high catalytic efficiency for the phosphotyrosine-containing peptide DADEpYL, whose sequence is derived from the autophosphorylation site of the epidermal growth factor receptor (EGFR, residues 988–998).
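Returning to the pKa shift of the active site cysteine discussed above, its consequence for the charge state can be estimated with the Henderson–Hasselbalch relation, which gives the deprotonated (thiolate) fraction at a given pH. The sketch below is only illustrative: the reference pKa of 8.3 for a free cysteine is taken from the text, whereas the shifted pKa of 5.0 and the pH of 7.0 are hypothetical values chosen for the example and are not results of the titration calculations.

```python
def fraction_deprotonated(pKa, pH):
    """Fraction of an acid group present in its deprotonated form.

    Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


PH_ASSUMED = 7.0          # hypothetical near-physiological pH
PKA_FREE_CYS = 8.3        # free cysteine, value quoted in the text
PKA_SHIFTED = 5.0         # hypothetical lowered pKa, for illustration only

free = fraction_deprotonated(PKA_FREE_CYS, PH_ASSUMED)
shifted = fraction_deprotonated(PKA_SHIFTED, PH_ASSUMED)
print(f"thiolate fraction, free cysteine:  {free:.3f}")
print(f"thiolate fraction, shifted pKa:    {shifted:.3f}")
```

Under these assumed numbers a free cysteine would be almost entirely protonated, whereas a residue with the lowered pKa would be almost entirely in the thiolate form, which is the qualitative point made in the text.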
The high binding affinity is particularly surprising because the total charge of all titratable residues in PTP1B is −6. To determine the origin of the preference of PTP1B for negatively charged peptides, we performed macroscopic electrostatic calculations, again using the single-site titration method discussed above. These calculations reveal that there is a positive electrostatic field surrounding the active site pocket, which may serve as a trap for negatively charged peptides (128) and as a possible diffusion path for substrates. In contrast to PTP1B, such a field is not observed for PTPα, suggesting that DADEpYL is a poor substrate for PTPα. Indeed, kcat/Km for the hydrolysis of the hexa-peptide by PTPα is four times lower than the value determined for PTP1B (Fig. 18). To further study the origin of the observed substrate specificities, we tested various peptides based on the phosphorylation sites of different receptors. In all cases, the different peptides were very efficiently hydrolyzed by PTP1B but were only poorly recognized by PTPα, suggesting that the inherent substrate specificity of these PTPs resides
Figure 18 Catalytic efficiency of the hydrolysis of the hexa-peptide Ac-DADEpYL-NH2 (pY stands for phosphorylated tyrosine) with PTP1B, PTP1B_G259Q, PTPα, and PTPα_Q259G. The subscripts indicate the mutation; G259Q, for instance, means that Gly259 in PTP1B has been mutated to Gln.
in their active sites (data not shown). To identify the areas in PTPs that could determine the substrate specificity, we have performed a detailed structural analysis of the variability and conservation of amino acid residues in the vicinity of the active site. Based on Cα regiovariation analyses and primary sequence alignments, we could identify regions in the binding cleft that might confer substrate specificity between PTPs. Among those, residues 47, 48, 258, and 259 (PTP1B numbering) are of particular interest (129). We were able to show that, in particular, residues 48 and 259 are involved in substrate specificity and are potential targets for inhibitor design. Position 48 is occupied by an Asp residue in PTP1B but by an Asn in PTPα. Here selectivity toward substrates or potential inhibitors is governed by electrostatic interactions (i.e., salt bridge formation) (130,131). On the other hand, the selectivity imparted by residues 258 and 259 might be due to steric hindrance. In PTP1B, residue 259 is a glycine, and we hypothesized that the lack of a side chain would allow easy access to the active site, whereas bulky residues in this position as found in PTPα (259 is a
glutamine) and other PTPs might cause steric hindrance. Thus it appeared that Gly259 and Cys258 in PTP1B form the bottom of an open cleft, a gateway, which leads to the active site. To elucidate the mechanism of substrate recognition for this particular substrate site, we performed a detailed enzyme kinetic analysis of PTP1B, PTPα, and single mutants of these enzymes using, among other peptides, Ac-DADEpYL-NH2. As shown by the catalytic efficiencies in Fig. 18, replacing Gly259 in PTP1B with a Gln (PTP1B_G259Q mutant) caused steric hindrance and concomitant restricted substrate recognition. In contrast, by substituting Gln259 with a glycine in PTPα (PTPα_Q259G mutant), we obtained an enzyme with broad substrate recognition capacity, i.e., an enzyme similar to PTP1B (132). Our studies, however, also pointed to a more complex picture regarding the involvement of residue 259 in substrate recognition and hydrolysis. Thus, using the above mutational approach, we noted that bulky residues in position 259, in addition to the described steric hindrance, might indirectly influence the catalytic activity of PTPs. We hypothesized that this effect was mediated by an interference with the rotational freedom of residue 262, which is a conserved glutamine in most PTPs and critical for the positioning of a water molecule in the second step of catalysis. Thus it seems likely that bulky residues in position 259 would negatively influence substrate hydrolysis both directly, through reduced substrate binding, and indirectly, by impairing hydrolysis. To evaluate the influence of residue 259 on Gln262, we are currently performing molecular dynamics simulations using PTP1B and a mutant of PTP1B in which a defined set of four residues has been introduced as a model for PTPα. As mentioned above, PTP1B has a glycine in position 259, whereas PTPα has a bulky glutamine in that position.
In particular, we are interested in monitoring the flexibility of Gln262, which during the catalytic reaction swings into the binding pocket and coordinates a water molecule that is essential in the catalytic reaction. For each enzyme, two cases are modeled: the Michaelis–Menten complex with the substrate analogue p-nitrophenyl phosphate bound to the active site, and the cysteine–phosphor complex. Preliminary results for the wild-type PTP1B and the mutant show significantly different behavior of Gln262 when the cysteine–phosphor complex is formed. In Fig. 19, the distances between Gln262 (Q262(CD)) and the phosphorus atom of the cysteine–phosphor complex (indicated by P) are shown. For PTP1B (top), Gln262 can freely swing toward the phosphate group, whereas in the mutant structure (bottom), Gln262 does not approach the phosphate moiety. This suggests that Gln262 has a higher flexibility in the wild-type structure than in the mutant structure. Furthermore, the simulation results identified several key interactions between Gln262 and surrounding residues, providing a way to explain the difference in the experimentally observed catalytic efficiency for the enzymes on an atomic level (133).
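The side-chain mobility described above is followed through a single interatomic distance evaluated for every trajectory frame. A minimal sketch of such an analysis is given below, assuming the trajectory is already available as a NumPy array of shape (frames, atoms, 3). The function names, the atom indices, and the cutoff are hypothetical and chosen only for illustration, and periodic boundary conditions are ignored.

```python
import numpy as np


def distance_series(traj, idx_a, idx_b):
    """Distance between atoms idx_a and idx_b in every frame.

    traj : array of shape (n_frames, n_atoms, 3) with Cartesian coordinates.
    """
    diff = traj[:, idx_a, :] - traj[:, idx_b, :]
    return np.linalg.norm(diff, axis=1)


def fraction_within(distances, cutoff):
    """Fraction of frames in which the distance is at or below the cutoff."""
    return float(np.mean(distances <= cutoff))


# Toy trajectory with two frames and two atoms (hypothetical coordinates).
traj = np.array([
    [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]],   # frame 0: atoms 5 units apart
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],   # frame 1: atoms 1 unit apart
])
d = distance_series(traj, idx_a=0, idx_b=1)
print(d, fraction_within(d, cutoff=2.0))
```

Applied to the wild-type and mutant trajectories with the indices of the Gln262 CD atom and the phosphorus atom, the same two functions would yield the kind of distance traces compared in the text.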
Figure 19 Flexibility of the Gln262 side chain in PTP1B (top) and in the quadruple mutant R47V/D48N/M258C/G259Q (bottom), as extracted from molecular dynamics simulations. As an indication of the mobility of the side chain, the distance between Gln262 (Q262(CD)) and the phosphorus atom of the cysteine–phosphor complex was monitored. The right panels show schematically the interactions of Gln262 with the water molecule (top) and with the water molecule and Gln259 (bottom). See the text for more details.
ACKNOWLEDGMENTS
The author would like to acknowledge financial support from the Danish National Research Foundation via a grant to MEMPHYS, Center for Biomembrane Physics, from the Danish Cancer Research Foundation, and from the Danish Natural Science Research Council.
REFERENCES
1. GF Fishman. Monte Carlo: Concepts, Algorithms, and Applications. New York: Springer, 1996.
2. MEJ Newman, GT Barkema. Monte Carlo Methods in Statistical Physics. New York: Oxford University Press, 1999.
3. K Binder. The Monte Carlo Method in Condensed Matter Physics. Berlin: Springer, 1990.
4. K Binder, DW Heermann. Monte Carlo Simulation in Statistical Physics. Berlin: Springer, 1988.
5. K Binder. Monte Carlo Methods in Statistical Physics. Berlin: Springer, 1986.
6. K Binder. Applications of the Monte Carlo Method in Statistical Physics. Berlin: Springer, 1984.
7. OG Mouritsen. Computer Studies of Phase Transitions and Critical Phenomena. Berlin: Springer, 1984.
8. H Gould, J Tobochnik. An Introduction to Computer Simulation Methods: Applications to Physical Systems. Part 2. Reading, MA: Addison-Wesley, 1988.
9. S Jain. Monte Carlo Simulations of Disordered Systems. Singapore: World Scientific, 1992.
10. MP Allen, DJ Tildesley. Computer Simulation of Liquids. Oxford: Oxford University Press, 1989, pp 71–108.
11. C Branden, J Tooze. Introduction to Protein Structure. 2nd ed. New York: Garland Publishing Inc., 1999.
12. P Bratley, BL Fox, LE Schrage. A Guide to Simulation. New York: Springer Verlag, 1987.
13. CL Brooks III, M Karplus, BM Pettitt. A Theoretical Perspective of Dynamics, Structure, and Thermodynamics. New York: Wiley Interscience, 1988.
14. NR Cohen. Guidebook on Molecular Modeling in Drug Design. San Diego, CA: Academic Press, 1996, pp 1–26.
15. A Fersht. Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding. New York: WH Freeman and Company, 1999.
16. D Frenkel, B Smit. Understanding Molecular Simulations. From Algorithms to Applications. San Diego, CA: Academic Press, 1996.
17. H Gould, J Tobochnik. An Introduction to Computer Simulation Methods: Applications to Physical Systems. Part 1. Reading, MA: Addison-Wesley, 1988.
18. JM Haile. Molecular Dynamics Simulations: Elementary Methods. New York: Wiley, 1992.
19. M Kalos, PA Whitlock. Monte Carlo Methods. New York: John Wiley and Sons, 1986.
20. AR Leach. Molecular Modelling. Principles and Applications. Essex, England: Addison-Wesley Longman, 1996.
21. DC Rapaport. The Art of Molecular Dynamics Simulation. Cambridge, England: Cambridge University Press, 1995.
22. W van Gunsteren, P Weiner, AT Wilkinson. Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications. Leiden, The Netherlands: ESCOM, 1996.
23. JM Thijssen. Computational Physics. Cambridge: Cambridge University Press, 1999.
24. H Gould, L Spornick, J Tobochnik. Thermal and Statistical Physics Simulations: The Consortium for Upper-level Physics Software. New York: Wiley, 1995.
25. DW Heermann. Computer Simulation Methods. Berlin: Springer, 1990.
26. K Binder. Monte Carlo and Molecular Dynamics Simulations in Polymer Science. New York: Oxford University Press, 1995.
27. MP Allen, DJ Tildesley. Computer Simulation in Chemical Physics. NATO ASI Series C: Mathematical and Physical Sciences. Dordrecht: Kluwer Academic Press, 1993, Vol 397.
28. D Raabe. Computational Materials Science. Weinheim: Wiley-VCH, 1998.
29. FJ Vesely. Computational Physics: An Introduction. New York: Plenum Press, 1994.
30. FF Abraham. Computational statistical mechanics: methodology, applications, and supercomputing. Adv Phys 35:1–111, 1986.
31. P Stoltze. Simulation methods in atomic-scale materials physics. Lyngby: World Scientific, 1992.
32. BJ Alder, TE Wainwright. Phase transition for a hard sphere system. J Chem Phys 27:1208, 1957.
33. MW Maddox, ML Longo. A Monte Carlo study of peptide insertion into lipid bilayers: equilibrium conformations and insertion mechanisms. Biophys J 82:244–263, 2002.
34. H Berry. Monte Carlo simulations of enzyme reactions in two dimensions: fractal kinetics and spatial segregation. Biophys J 83:1891–1901, 2002.
35. EI Michonova-Alexova, IP Sugar. Component and state separation in DMPC/DSPC lipid bilayers: A Monte Carlo simulation study. Biophys J 83:1820–1833, 2002.
36. AFP de Araujo, TC Pochapsky. Monte Carlo simulations of protein folding using inexact potentials: how accurate must parameters be in order to preserve the essential features of the energy landscape? Fold Des 1:299–314, 1996.
37. N Metropolis, A Rosenbluth, M Rosenbluth, A Teller. Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092, 1953.
38. J Skolnick, A Kolinski. Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J Mol Biol 221:499–532, 1991.
39. BJ Alder, TE Wainwright. Studies in molecular dynamics. I. General method. J Chem Phys 31:459, 1959.
40. AJ Rahman. Correlations in the motion of atoms in liquid argon. Phys Rev A 136:405, 1964.
41. AJ Rahman, FH Stillinger. Molecular dynamics study of liquid water. J Chem Phys 55:3336–3359, 1971.
42. AJ Rahman, FH Stillinger. Improved simulation of liquid water by molecular dynamics. Chem Phys 60:1545–1557, 1974.
43. JA McCammon, BR Gelin, M Karplus. Dynamics of folded proteins. Nature 267:585–590, 1977.
44. W Wang, O Donini, CM Reyes, PA Kollman. Biomolecular simulations: Recent developments in force fields, simulations of enzyme catalysis, protein–ligand, protein–protein, and protein–nucleic acid noncovalent interactions. Ann Rev Biophys Biomol Struc 30:211–243, 2001.
45. E Schrödinger. The relation between the quantum mechanics of Heisenberg, Born and Jordan and that of Schrödinger. Ann Phys 79:734–756, 1926.
46. E Schrödinger. Quantisation as a problem of characteristic values, the perturbation theory and its application to the Stark-Effect of the H Balmer Lines. Ann Phys 80:437–490, 1926.
47. E Schrödinger. An undulatory theory of the mechanics of atoms and molecules. Phys Rev 28:1049–1070, 1926.
48. M Braxenthaler, R Unger, D Auerbach, JA Given, J Moult. Chaos in protein dynamics. Proteins 29:417–425, 1997.
49. JB Clarage, T Romo, BK Andrews, BM Pettitt. A sampling problem in molecular dynamics simulations of macromolecules. Proc Natl Acad Sci USA 92:3288–3292, 1995.
50. M Levitt, M Hirshberg, R Sharon, V Daggett. Potential energy function and parameters for simulations of the molecular dynamics of proteins and nucleic acids in solution. Comp Phys Comm 91:215–231, 1995.
51. SW Bunte, H Sun. Molecular modeling of energetic materials: the parameterization and validation of nitrate esters in the COMPASS force field. J Phys Chem B 104:2477–2489, 2000.
52. J Wang, P Cieplak, PA Kollman. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem 21:1049–1074, 2000.
53. P Cieplak, J Caldwell, P Kollman. Molecular mechanical models for organic and biological systems going beyond the atom centered two body additive approximation: aqueous solution free energies of methanol and N-methyl acetamide, nucleic acid base, and amide hydrogen bonding and chloroform/water partition coefficients of the nucleic acid bases. J Comput Chem 22:1048–1057, 2001.
54. AD MacKerell Jr, J Wiórkiewicz-Kuczera, M Karplus. An all-atom empirical energy function for the simulation of nucleic acids. J Am Chem Soc 117:11946–11975, 1995.
55. AD MacKerell Jr, D Bashford, M Bellott, RL Dunbrack Jr, JD Evanseck, MJ Field, S Fischer, J Gao, H Guo, S Ha, D Joseph-McCarthy, L Kuchnir, K Kuczera, FTK Lau, C Mattos, S Michnick, T Ngo, DT Nguyen, B Prodhom, WE Reiher III, B Roux, M Schlenkrich, JC Smith, R Stote, J Straub, M Watanabe, J Wiórkiewicz-Kuczera, D Yin, M Karplus. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem 102:3586–3616, 1998.
56. M Schlenkrich, J Brickmann, AD MacKerell Jr, M Karplus. An empirical potential energy function for phospholipids: criteria for parameter optimization and applications. In: KM Merz Jr, B Roux, eds. Biological Membranes: A Molecular Perspective from Computation and Experiment. Birkhauser, 1996, pp 31–81.
57. SN Ha, A Giammona, M Field, JW Brady. A revised potential-energy surface for molecular mechanics studies of carbohydrates. Carbohydr Res 180:207–221, 1988.
58. R Garemyr, A Elofsson. Study of the electrostatics treatment in molecular dynamics simulations. Proteins 37:417–428, 1999.
59. A Warshel, ST Russell. Calculations of electrostatic interactions in biological systems and in solutions. Q Rev Biophys 17:283–422, 1984.
60. A Warshel, J Åqvist. Electrostatic energy and macromolecular function. Ann Rev Biophys Chem 20:267–298, 1991.
61. RJ Loncharich, BR Brooks. The effects of truncating long-range forces on protein dynamics. Proteins 6:32–45, 1989.
62. B Honig, A Nicholls. Classical electrostatics in biology and chemistry. Science 268:1144–1149, 1995.
63. Z Radic, PD Kirchhoff, DM Quinn, JA McCammon, P Taylor. Electrostatic influence on the kinetics of ligand binding to acetylcholinesterase. J Biol Chem 272:23265–23277, 1997.
64. PJ Steinbach, BR Brooks. New spherical-cutoff methods for long-range forces in macromolecular simulation. J Comput Chem 15:667–683, 1994.
65. T Darden, D York, L Pedersen. Particle mesh Ewald: An n log(n) method for Ewald sums in large systems. J Chem Phys 98, 1993.
66. U Essmann, L Perera, ML Berkowitz, T Darden, L Hsing, LG Pedersen. A smooth particle mesh Ewald method. J Chem Phys 103:8577–8593, 1995.
67. RW Hockney, JW Eastwood. Computer Simulation Using Particles. New York: McGraw-Hill, 1981.
68. V Rokhlin. Rapid solution of integral equations of classical potential theory. J Comput Phys 60:187–207, 1985.
69. JA Board Jr, JW Causey, JF Leathrum Jr, A Windemuth, K Schulten. Accelerated molecular dynamics simulations with the parallel fast multipole algorithm. Chem Phys Lett 198:89–94, 1992.
70. E Pollock, J Glosli. Comments on pppm, fmm and the Ewald method for large periodic coulombic systems. Comp Phys Comm 95:93–110, 1996.
71. F Figuerido, R Levy, R Zhou, B Berne. Large scale simulation of molecules in solution: combining the periodic fast multipole method with multiple time step integrators. J Chem Phys, 9835–9849, 1997.
72. C Sagui, TA Darden. Molecular dynamics simulations of biomolecules: Long-range electrostatic effects. Annu Rev Biophys Biomol Struct 28:155–179, 1999.
73. T Schlick, RD Skeel, AT Brunger, LV Kale, J Board, J Hermans, K Schulten. Algorithmic challenges in computational molecular biophysics. J Comput Phys 151:9–48, 1999.
74. DM York, TA Darden, LG Pedersen. The effect of long-range electrostatic interactions in simulations of macromolecular crystals: A comparison of the Ewald and truncated list methods. J Chem Phys 99:8345–8348, 1993.
75. DM York, W Yang, H Lee, T Darden, LG Pedersen. Toward the accurate modelling of DNA: The importance of long-range electrostatics. J Am Chem Soc 117:5001–5002, 1995.
76. T Fox, PA Kollman. The application of different solvation and electrostatic models in molecular dynamics simulations of ubiquitin: how well is the x-ray structure "maintained"? Proteins: Struct Func Gen 25:315–334, 1996.
77. S Nose. Mol Phys 52:255–268, 1984.
78. S Nose. A unified formulation of the constant temperature molecular dynamics methods. J Chem Phys 81:511–519, 1984.
79. WG Hoover. Canonical dynamics: equilibrium phase-space distributions. Phys Rev A 31:1695–1697, 1985.
80. WG Hoover. Molecular Dynamics. Berlin: Springer, 1987.
81. M Parrinello, A Rahman. Crystal structure and pair potentials: a molecular-dynamics study. Phys Rev Lett 45:1196–1199, 1980.
82. The author is a member of the MEMPHYS Group, Center for Biomembrane Physics.
83. This work has been performed in collaboration with several research groups involving Risø National Laboratory (K Kjaer, DK), Centre for Interdisciplinary Studies of Molecular Interactions (CISMI) (T Bjørnholm, DK), Max Planck Institute Berlin (G Brezesinski, H Möhwald, D), European Molecular Biology Laboratory (EMBL) (R Wade, D), University of Helsinki (PKJ Kinnunen, FIN), Novo Nordisk A/S (RP Bywater, DK) and Novozymes A/S (A Svendsen, DK).
84. P Woolley, SB Petersen. Lipases: Their Structure, Biochemistry and Applications. Cambridge: Cambridge University Press, 1994, pp 271–288.
85. AR Macrae. Lipase catalyzed interesterification of oils and fats. J Am Oil Chem Soc 60:291–294, 1983.
86. W Boland, C Frobel, M Lorentz. Esterolytic and lipolytic enzymes in organic synthesis. J Synthetic Org Chem 12:1049–1072, 1991.
87. BA van Kuiken, WD Behnke. The activation of porcine pancreatic lipase by cis-unsaturated fatty acid. BBA 214:148–160, 1994.
88. AR Macrae, RC Hammond. Present and future applications of lipases. Biotechnol Gen Eng Rev 3:193–217, 1985.
89. M Cygler, JD Schrag, JL Sussman, M Harel, I Silman, MK Gentry, BP Doctor. Relationship between sequence conservation and three-dimensional structure in a large family of esterases, lipases, and related proteins. Protein Sci 2:366–382, 1993.
90. ZS Derewenda. Structure and function of lipases. Adv Protein Chem 45:1–52, 1994.
91. GG Dodson, DM Lawson, FK Winkler. Structural and evolutionary relationships in lipase mechanism and activation. Faraday Discuss 93:95–105, 1992.
92. M Norin, F Haeffner, A Achour, T Norin, K Hult. Computer modeling of substrate binding to lipases from Rhizomucor miehei, Humicola lanuginosa, and Candida rugosa. Protein Sci 3:1493–1503, 1994.
93. AT Yagnik, JA Littlechild. Molecular modelling studies of substrate binding to the lipase from Rhizomucor miehei. Comput Aided Mol Design 11:256–264, 1997.
94. L Brady, AM Brzozowski, ZS Derewenda, E Dodson, S Tolley, JP Turkenburg, L Christiansen, B Huge-Jensen, L Nørskov, L Thim, U Menge. A serine protease triad forms the catalytic centre of a triacylglycerol lipase. Nature 343:767–770, 1990.
95. GH Peters, U Dahmen-Levison, K de Meijere, G Brezesinski, S Toxvaerd, H Möhwald, A Svendsen, PKJ Kinnunen. Influence of surface properties of mixed monolayers on lipolytic hydrolysis. Langmuir 16:2779, 2000.
96. GH Peters, S Toxvaerd, NB Larsen, T Bjørnholm, K Schaumburg, K Kjaer. Structure and Dynamics of Lipid Monolayers: Implications for enzyme catalysed lipolysis. Nat Struct Biol 2:401, 1995.
97. GH Peters, NB Larsen, T Bjørnholm, S Toxvaerd, K Schaumburg, K Kjaer. X-ray diffraction and molecular dynamics studies: structural analysis of phases in diglyceride monolayers. Phys Rev E 57:3153, 1998.
98. GH Peters, S Toxvaerd, NB Larsen, T Bjørnholm, K Schaumburg, K Kjaer. Phase transitions in di-glyceride monolayers studied by computer simulations, pressure-area isotherms and x-ray diffraction. Nuovo Cim 16:1479, 1994.
99. GH Peters, S Toxvaerd, A Svendsen, OH Olsen. Modeling of complex biological systems: I. Molecular dynamics studies of di-glyceride monolayers. J Chem Phys 100:5998, 1994.
100. GH Peters, S Toxvaerd, OH Olsen, A Svendsen. Modeling of complex biological systems: II. Effect of chainlength on the phase transitions observed in diglyceride monolayers. Langmuir 11:4072, 1995.
101. GH Peters, S Toxvaerd, O Olsen, A Svendsen. Computational studies of activation of lipases and the effect of a hydrophobic environment. Protein Eng 10:137, 1997.
102. GH Peters, OH Olsen, A Svendsen, R Wade. Theoretical investigation of the dynamics of the active site lid in Rhizomucor miehei lipase. Biophys J 71:119, 1996.
103. AM Brzozowski, H Savage, CS Verma, JP Turkenburg, DM Lawson, A Svendsen, S Patkar. Structural origins of the interfacial activation in Thermomyces (Humicola) lanuginosa lipase. Biochemistry 39:15071, 2000.
104. GH Peters, A Svendsen, H Langberg, J Vind, SA Patkar, S Toxvaerd, PKJ Kinnunen. Active serine involved in the stabilization of the active site loop in the Humicola lanuginosa lipase. Biochemistry 37:12375, 1998.
105. GH Peters, A Svendsen, H Langberg, J Vind, SA Patkar, PKJ Kinnunen. Glycosylation of Thermomyces lanuginosa lipase enhances surface binding, but does not significantly influence the catalytic activity. Colloids Surf Sci B 26:125–134, 2002.
106. GH Peters, R Bywater. Computational analysis of chain flexibility and fluctuations in Rhizomucor miehei lipase. Protein Eng 12:747, 1999.
107. GH Peters, MØ Jensen, RP Bywater. Dynamics of the substrate binding pocket in the presence of a covalently attached inhibitor. J Biomol Struct Dyn 19:1–13, 2001.
108. GH Peters. The dynamic response of a fungal lipase in the presence of charged surfactants. Colloids Surf Sci B 26:84–101, 2002.
109. A Amadei, ABM Linssen, HJC Berendsen. Essential dynamics of proteins. Proteins 17:412–425, 1993.
110. T Ichiye, M Karplus. Collective motions in proteins; a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins 11:205–217, 1991.
111. M Karplus, T Ichiye. Comment on a fluctuation and cross correlation analysis of protein motions observed in nanosecond molecular dynamics simulation. J Mol Biol 263:120–122, 1996.
112. HJC Berendsen, S Hayward. Collective protein dynamics in relation to function. Curr Opin Struct Biol 10:165–169, 2000.
113. BL De Groot, DMF van Aalten, A Amadei, HJC Berendsen. The consistency of large concerted motions in proteins in molecular dynamics simulations. Biophys J 71:1707–1713, 1996.
114. A Kitao, N Go. Investigating protein dynamics in collective coordinate space. Curr Opin Struct Biol 9:164–169, 1999.
115. DW Miller, DA Agard. Enzyme specificity under dynamic control: a normal mode analysis of alpha-lytic protease. J Mol Biol 286:267–278, 1999.
116. TM Frimurer, GH Peters, MD Sørensen, JJ Led, OH Olsen. Assignment of side-chain conformation using adiabatic energy mapping, free energy perturbation, and molecular dynamics simulations. Protein Sci 8:25, 1999.
117. BK Andrews, T Romo, JB Clarage, BM Pettitt, GN Phillips Jr. Characterizing global substates of myoglobin. Structure 6:587–594, 1998.
118. LSD Caves, JD Evanseck, M Karplus. Locally accessible conformations of proteins: multiple molecular dynamics simulations of crambin. Protein Sci 7:649–666, 1998.
119. GH Peters, RP Bywater. Influence of a lipid interface on protein dynamics in a fungal lipase. Biophys J 81:3052–3065, 2001.
120. MØ Jensen, TR Jensen, K Kjaer, T Bjørnholm, OG Mouritsen, GH Peters. Orientation and conformation of a lipase at an air–water interface studied by molecular dynamics simulations. Biophys J 83:98–111, 2002.
121. TR Jensen, MØ Jensen, N Reitzel, K Balashev, GH Peters, K Kjaer, T Bjørnholm. Water in contact with extended hydrophobic surfaces: Direct evidence of weak dewetting. Phys Rev Lett 90:086101–086400, 2003.
122. MØ Jensen, OG Mouritsen, GH Peters. Interfacial water structure at an alkane and alcohol monolayer studied by molecular dynamics and x-ray scattering. Submitted.
123. This project is carried out in collaboration with NPH Møller and OH Olsen from Novo Nordisk A/S.
124. EH Fischer, H Charbonneau, NK Tonks. Protein tyrosine phosphatases. Science 253:401, 1991.
125. T Hunter. Protein kinases and phosphatases: the yin and yang of protein phosphorylation and signalling. Cell 80:225, 1995.
126. GH Peters, TM Frimurer, OH Olsen. Electrostatic evaluation of the signature motif (H/V)CX5R(S/T) in protein-tyrosine phosphatases. Biochemistry 37:5383, 1998.
127. GH Peters, TM Frimurer, JN Andersen, OH Olsen. Molecular dynamics simulations of protein-tyrosine phosphatase 1B: I. Ligand-induced changes in the protein motions. Biophys J 77:505, 1999.
128. GH Peters, TM Frimurer, JN Andersen, OH Olsen. Molecular dynamics simulations of protein-tyrosine phosphatase 1B: II. Substrate–enzyme interactions and dynamics. Biophys J 78:2191, 2000.
129. JN Andersen, OH Mortensen, GH Peters, PG Drake, LF Iversen, OH Olsen, HS Andersen, NK Tonks, NPH Møller. Structural and evolutionary relationships among protein tyrosine phosphatase domains. Mol Cell Biol 21:7117–7136, 2001.
130. LF Iversen, HS Andersen, S Branner, SB Mortensen, GH Peters, K Norris, OH Olsen, CB Jeppesen, BF Lundt, W Ripka, KB Møller, NPH Møller. Structure-based design of a low molecular weight, nonphosphorus, nonpeptide, and highly selective inhibitor of protein-tyrosine phosphatase 1B. J Biol Chem 275:10300, 2000.
131. LF Iversen, HS Andersen, KB Møller, OH Olsen, GH Peters, S Branner, SM Mortensen, TK Hansen, J Lau, Y Ge, DD Holsworth, MJ Newman, NPH Møller. Steric hindrance as basis for structure-based design of selective inhibitors of protein-tyrosine phosphatases. Biochemistry 40:14812–14820, 2001.
132. GH Peters, LF Iversen, S Branner, HS Andersen, SB Mortensen, OH Olsen, KB Møller, NPH Møller. Residue 259 is a key determinant of substrate specificity of protein-tyrosine phosphatases 1B and α. J Biol Chem 275:18201, 2000.
133. GH Peters, LF Iversen, HS Andersen, OH Olsen, NPH Møller. Molecular modelling of wild-type and mutant protein tyrosine phosphatases: residue 259 determines the flexibility of glutamine 262. Submitted.
7
Calculations of Ionization Equilibria in Proteins
Andrey Karshikoff
Karolinska Institutet
Huddinge, Sweden
Functional properties of proteins result from a delicate balance of different types of interactions. Among them, electrostatic interactions are a factor whose importance becomes evident in any pH-dependent property, such as pH regulation of enzyme activity and substrate/inhibitor binding, pH dependence of protein stability, and many others. Electrostatic interactions in proteins cannot be measured directly. That is why their correct theoretical description is of key importance for studies of structure–function relationships in proteins. There is no doubt that engineering studies are a powerful instrument for a better understanding of the role of electrostatic interactions in the functional properties of proteins. On the other hand, a correct understanding of electrostatic interactions and their interplay with all other interactions in proteins is needed for an adequate design of molecules with desired properties. A good example of this need is charge reversal mutations, which are expected to stimulate the binding of charged substrates. Failures to increase the binding of, say, a negatively charged substrate by mutating a negative group to a positive one have been clearly explained on the basis of theoretical calculations
(1). There are also other questions that can be answered by the combined efforts of experimental protein engineering and theoretical modeling. For instance, do salt bridges stabilize (2) or destabilize (3) the native structure of proteins? Increasing the thermal stability of enzymes is also an interesting objective of engineering studies. Good natural sources for understanding the factors regulating the thermal stability of proteins are the thermophilic and hyperthermophilic organisms (4). Electrostatic interactions and salt bridge formation seem to play a dominant role in the enormous thermal stability of enzymes from thermophiles, some of which are active at the temperature of boiling water or even higher (5). Again, experimental studies based on engineered proteins, together with theoretical predictions, can provide the body of knowledge needed to understand the origin of thermal stability (6). A variety of other examples can be given in which electrostatic interactions and protonation/deprotonation equilibria in proteins can be examined by a combined effort of theory and protein engineering. Two main issues of electrostatic interactions and protonation/deprotonation equilibria in proteins will be considered below. The first is that ionizable groups in proteins may not follow the Henderson–Hasselbalch equation; the titration curves of such groups are irregular. The analysis of experimental data that indicate a pH dependence of a certain observable usually begins with efforts to identify the titratable group or groups whose ionization regulates or is responsible for this dependence. It will be illustrated that the formal assignment of a pK value to the midpoint of an observed dependence may be misleading if irregular ionization occurs. The second issue concerns electrostatic interactions in the denatured (unfolded) state of proteins.
Not much work has been done on the ionization properties of denatured proteins, probably because proteins in this state are considered "dead" molecules, deprived of biological activity. Often, electrostatic interactions in denatured proteins are set to zero, i.e., considered irrelevant, which is an oversimplification applicable to very few cases. It will be shown that in order to predict the stability of a native protein as a function of pH, electrostatic interactions in the denatured state have to be taken into account.

1 PROTONATION/DEPROTONATION EQUILIBRIA IN PROTEINS
Most often, theoretical prediction of electrostatic interactions is focused on the calculation of measurable quantities, whose values can be directly correlated to electrostatic interactions. Such quantities are, for instance, the ionization equilibrium constants (or their equivalents, the pK values) of the individual titratable groups in proteins.
The degree of deprotonation, $\theta$, of a titratable amino acid side chain in solution at standard conditions is given by the Henderson–Hasselbalch equation:

$$\theta = \frac{10^{(\mathrm{pH} - \mathrm{p}K)}}{1 + 10^{(\mathrm{pH} - \mathrm{p}K)}} \tag{1}$$
where pK is the negative logarithm of the dissociation constant. The pH dependence of $\theta$ is sigmoidal, with an inflection point at $\theta = 0.5$, where pH equals pK. This simple feature of Eq. (1) is widely used for the determination of the pK of titratable groups in proteins, simply by fitting the experimental data to Eq. (1). The standard free energy of deprotonation is related to the dissociation constant by $\Delta G^{0}_{p\to d} = 2.3RT\,\mathrm{p}K$. If the protonation is coupled with interactions with the environment that differ from the standard conditions, one writes

$$\Delta G_{p\to d} = \Delta G^{0}_{p\to d} - 2.3RT\,\mathrm{pH} + \Delta G_{\mathrm{env}} \tag{2}$$

where $\Delta G_{\mathrm{env}}$ corresponds to the free energy change due to these interactions. The straightforward interpretation of Eq. (1) holds as long as $\Delta G_{\mathrm{env}}$ is a linear function of pH. Otherwise, the ionization curve $\theta(\mathrm{pH})$ may have a nontrivial shape and, in general, pK is no longer equal to the pH at which $\theta = 0.5$. In such cases, fitting experimental data to Eq. (1) is inappropriate. In proteins, $\Delta G_{\mathrm{env}}(\mathrm{pH})$ for the individual titratable groups can be far from linear.
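Eqs. (1) and (2) can be illustrated with a minimal Python sketch. The function names, the temperature, and the pK value of 4.0 are assumptions chosen for illustration; the factor 2.3 is kept as the text writes it (it approximates ln 10).

```python
R = 8.314462618e-3  # gas constant in kJ/(mol K)

def deprotonation_degree(pH, pK):
    """Eq. (1): degree of deprotonation of an isolated titratable group."""
    x = 10.0 ** (pH - pK)
    return x / (1.0 + x)

def deprotonation_free_energy(pH, pK, dG_env=0.0, T=298.15):
    """Eq. (2) in kJ/mol, with the standard term written as 2.3*R*T*pK."""
    return 2.3 * R * T * pK - 2.3 * R * T * pH + dG_env

def midpoint_pH(theta_of_pH, lo=0.0, hi=14.0, tol=1e-9):
    """pH at which theta = 0.5, found by bisection.

    Assumes theta increases monotonically with pH on [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if theta_of_pH(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

pK = 4.0  # hypothetical value for a carboxylic group
for pH in (2.0, 3.0, 4.0, 5.0, 6.0):
    print(pH, round(deprotonation_degree(pH, pK), 4),
          round(deprotonation_free_energy(pH, pK), 3))

# For an isolated group the midpoint of the curve coincides with pK.
print(midpoint_pH(lambda pH: deprotonation_degree(pH, pK)))
```

For the isolated group the free energy of deprotonation changes sign exactly at pH = pK, which is also where the curve passes through 0.5; an additional pH-dependent `dG_env` is what breaks this coincidence.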
1.1 Factors Regulating Ionization Equilibria in Proteins
The fundamental assumption of the theory of protonation/deprotonation equilibria in proteins is that these equilibria are regulated only by the electrostatic environment created by the protein molecule and the surrounding solvent, i.e., $\Delta G_{\mathrm{env}} = \Delta G_{\mathrm{el}}$. Several factors determining $\Delta G_{\mathrm{el}}$ can be distinguished. The first arises from charge–charge interactions between the titratable groups themselves. That is, the protonation of a given group takes place in the electric field created by the other titratable groups of the molecule. The magnitude of these interactions depends on the protonation state of the interacting groups and is therefore pH-dependent. The electrostatic interaction of the titratable groups with the permanent charges of the protein is the second factor contributing to $\Delta G_{\mathrm{el}}$. The charges that can be considered permanent are the polypeptide backbone dipoles, the partial charges of the polar groups, and metal ions bound to the protein molecule. These interactions are pH-independent. The separation of the charge–charge interactions into pH-dependent and pH-independent parts is formal and is intended to simplify the theoretical formulation and analysis.

The desolvation effect is the third factor that essentially influences the protonation/deprotonation equilibria in proteins. This is the part of $\Delta G_{\mathrm{el}}$ that corresponds to the energy of transfer of a titratable group, or of a model compound with an experimentally known pK value, from the solvent (standard conditions) to its location in the protein. The model compound of a given titratable side chain is the corresponding amino acid with the alpha amino and carboxyl groups substituted by blocking groups. This energy is always positive (unfavorable) and is often called the "desolvation penalty."

The ability of native proteins to adopt different conformations, in other words the conformational flexibility of proteins, is the fourth factor regulating the ionization behavior of titratable groups. This factor is not electrostatic in nature, but it has a significant influence on the other three factors, and vice versa: electrostatic interactions control the conformational flexibility by stabilizing one or another conformation under given conditions. The Bohr effect in hemoglobin is an example of a conformational change controlled by electrostatic interactions.

The factors determining protonation/deprotonation equilibria in proteins are mutually dependent, and none of them can be described outside the context of all the others. To simplify the theoretical analysis, the factors depending on electrostatic interactions will be considered separately from the conformational flexibility. Without loss of generality, the theoretical considerations will be made on the basis of a fixed, nonflexible structure. Then the interplay between electrostatic factors and conformational flexibility will be analyzed.

1.2 Ionization Curves of Titratable Groups in Proteins
The change of the protonation/deprotonation equilibrium of a given titratable group can be analyzed by means of a thermodynamic cycle, as shown in Fig. 1A. Each titratable group is considered as an appropriate model compound, which is transferred from solution to its place in the protein (S → P). When the group is protonated, the energy of transfer is $\Delta G_{p}^{S\to P}$, while for the deprotonated form of the group, the energy of transfer is $\Delta G_{d}^{S\to P}$. The protonation/deprotonation equilibrium of the model compound in solution is characterized by its standard free energy $\Delta G^{S}_{p\to d} = 2.3RT\,\mathrm{p}K_{\mathrm{mod}}$. The equilibrium constant, $\mathrm{p}K_{\mathrm{mod}}$, can be calculated or determined experimentally. According to the thermodynamic cycle, the free energy of deprotonation in the protein molecule is related to $\Delta G^{S}_{p\to d}$ as follows:

$$\Delta G^{P}_{p\to d} = \Delta G^{S}_{p\to d} + \left(\Delta G_{d}^{S\to P} - \Delta G_{p}^{S\to P}\right) \tag{3}$$

Figure 1 A: Thermodynamic cycle used for the calculation of the pK values of titratable groups in proteins. B: Lattice representation of the protein as a dielectric material, $\varepsilon_p$, immersed in the medium of the solvent with dielectric constant $\varepsilon_s$. Dielectric constant, charge value, and ionic strength are assigned to each node of the lattice.
The difference between the transfer energies, $\Delta G_{\mathrm{sol}} = \Delta G_{d}^{S\to P} - \Delta G_{p}^{S\to P}$, is the desolvation penalty responsible for the shift of the protonation/deprotonation equilibrium of the titratable group towards stabilization of its uncharged form. Taking into account the influence of the charge–charge interactions, $\Delta G_{\mathrm{tc}}$, and the influence of the protein permanent charges, $\Delta G_{\mathrm{pc}}$, one obtains

$$\Delta G^{P}_{p\to d} = 2.3RT\,(\mathrm{p}K_{\mathrm{mod}} - \mathrm{pH}) + \Delta G_{\mathrm{sol}} + \Delta G_{\mathrm{pc}} + \Delta G_{\mathrm{tc}} \tag{4}$$
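For a fixed protein structure, the terms $\Delta G_{\mathrm{sol}}$ and $\Delta G_{\mathrm{pc}}$ depend only on the permittivity of the surrounding medium and on the distribution of the protein permanent charges, i.e., they are pH-independent. Being pH-independent, these two terms can be considered as pK corrections: $\Delta\mathrm{p}K_{\mathrm{sol}} = \Delta G_{\mathrm{sol}}/2.3RT$ and $\Delta\mathrm{p}K_{\mathrm{pc}} = \Delta G_{\mathrm{pc}}/2.3RT$. Combining the pH-independent parts of Eq. (4), one obtains

$$\Delta G^{P}_{p\to d} = 2.3RT\,(\mathrm{p}K_{\mathrm{int}} - \mathrm{pH}) + \Delta G_{\mathrm{tc}} \tag{5}$$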
where $pK_{\mathrm{int}} = pK_{\mathrm{mod}} + \Delta pK_{\mathrm{sol}} + \Delta pK_{\mathrm{pc}}$. The intrinsic pK, $pK_{\mathrm{int}}$, as defined by Tanford and Kirkwood (7), is the pK that a given group would have if all other titratable groups in the protein were in their neutral form. The term $\Delta G_{\mathrm{tc}}$ depends on the protonation state of all other titratable groups, so it is pH-dependent. It is convenient to express the degree of deprotonation, $\theta$, in terms of a statistical sum:

$$\theta = \frac{e^{-\Delta G^{P}_{p \to d}/RT}}{1 + e^{-\Delta G^{P}_{p \to d}/RT}} \tag{6}$$
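where the denominator represents the partition function of a system with two states (protonated and deprotonated). After substituting $\Delta G^{P}_{p \to d}$ from Eq. (5), the above equation can be written as:

$$\theta = \frac{e^{2.3(\mathrm{pH} - pK_{\mathrm{int}}) - \Delta G_{\mathrm{tc}}/RT}}{1 + e^{2.3(\mathrm{pH} - pK_{\mathrm{int}}) - \Delta G_{\mathrm{tc}}/RT}} = \frac{10^{(\mathrm{pH} - pK_{\mathrm{int}}) - \Delta G_{\mathrm{tc}}/2.3RT}}{1 + 10^{(\mathrm{pH} - pK_{\mathrm{int}}) - \Delta G_{\mathrm{tc}}/2.3RT}} \tag{7}$$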
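As a numerical illustration of Eq. (7), the following minimal sketch evaluates the degree of deprotonation of a single group. It assumes a temperature of 298.15 K and energies in kcal/mol, and it uses ln 10 instead of the rounded factor 2.3. The function name `theta` and the example values (a group with an intrinsic pK of 4.0 and a constant charge–charge term of 1.36 kcal/mol) are hypothetical and serve only as an illustration.

```python
import math

R = 1.987204e-3       # gas constant in kcal/(mol K)
T = 298.15            # assumed temperature in K
LN10 = math.log(10)   # the factor written as 2.3 in Eq. (7)

def theta(ph, pk_int, dg_tc=0.0):
    """Degree of deprotonation according to Eq. (7); dg_tc in kcal/mol."""
    exponent = (ph - pk_int) - dg_tc / (LN10 * R * T)
    # 10**e / (1 + 10**e) written in a numerically convenient form
    return 1.0 / (1.0 + 10.0 ** (-exponent))

# Hypothetical group: intrinsic pK 4.0, constant charge-charge term 1.36 kcal/mol
pk_int = 4.0
dg_tc = 1.36

# With a constant dg_tc the half-deprotonation point moves by dg_tc / (2.3 RT)
pk_half = pk_int + dg_tc / (LN10 * R * T)

for ph in (3.0, 4.0, 5.0, 6.0):
    print(ph, round(theta(ph, pk_int), 3), round(theta(ph, pk_int, dg_tc), 3))
print("pK1/2 with constant dg_tc:", round(pk_half, 2))
```

With a constant charge–charge term the curve keeps its sigmoidal shape and is only displaced along the pH axis; a pH-dependent term would have to be passed in separately for every pH value.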
The purpose of this formal derivation of Eq. (7) is to illustrate that $\theta$ may depend on pH in a more complicated manner than that predicted by Eq. (1). Obviously, if $\Delta G_{\mathrm{tc}} = 0$, Eq. (7) becomes identical to Eq. (1). If $\Delta G_{\mathrm{tc}}$ is constant, the inflection point of $\theta(\mathrm{pH})$ is shifted by $\Delta G_{\mathrm{tc}}/2.3RT$ pH units, while if $\Delta G_{\mathrm{tc}}$ depends linearly on pH, the titration curve also changes its slope. In all these cases, half-protonation ($\theta = 0.5$) occurs at the pH corresponding to the inflection point of the titration curve. This point is defined as $pK_{1/2}$. In cases of cooperative ionization, $\Delta G_{\mathrm{tc}}(\mathrm{pH})$ becomes a nonlinear function of pH and the above simple rules cannot be applied. The calculation of $\Delta G_{\mathrm{tc}}(\mathrm{pH})$, and hence of $\theta(\mathrm{pH})$, for a given titratable group is a complex task because it depends on, and at the same time influences, the protonation/deprotonation equilibria of all other titratable groups in the protein.

1.3 Calculation of Protonation/Deprotonation Equilibria in Proteins
Usually, individual titratable groups are assumed to have two states: protonated and deprotonated. However, the deprotonated state of histidine has two tautomers: Nε2–H and Nδ1–H. Also, the hydrogen atom in the protonated form of glutamic and aspartic acids can be bound to either of the two carboxyl oxygens. In addition, titratable groups are usually involved in hydrogen bonds with polar groups from the protein environment. Upon deprotonation (or protonation), local but important changes may occur. Hydrogen bonds may be broken, donor–acceptor partnerships can change, and hence hydrogen bond networks may be rearranged. These effects go together with the reorientation of the surrounding polar groups and in this way may influence the protonation/deprotonation equilibria of other titratable groups.

These effects can be taken into account in different ways. The simplest way is to introduce alternative proton positions, say, by using stereochemical criteria only. Since the occupancy of the alternative proton positions depends on pH for both titratable and nontitratable polar groups, it is convenient to formally distinguish pH-sensitive sites (polar groups, such as threonines, water molecules participating in hydrogen bond networks, etc.) and titratable sites (Asp, Glu, His, etc.). The introduction of the alternative proton locations means that the individual sites (titratable or pH-sensitive) may have more than two states. A general theoretical approach that treats sites with multiple states has been elaborated earlier by Spassov and Bashford (8) (see also Refs. 9 and 10 for review).

1.3.1 Microscopic Protonation/Deprotonation Equilibria
Instead of two states (protonated and deprotonated), a given site (titratable or pH-sensitive) can be described by a set of $n$ microstates $S_\alpha$ ($\alpha = 0, 1, \ldots, n-1$). The term microstate is formally introduced to distinguish it from the protonated or deprotonated state. Different microstates can be different rotamers or tautomers. Each microstate, $\alpha$, is characterized by a certain number of titratable hydrogens, $\nu_\alpha$. Usually, $\nu_\alpha = 1$, but for histidines, $\nu_\alpha = 2$. Since there is no rule for ordering the states, the choice of the reference state is in fact arbitrary and does not affect the final results or the validity of the derived equations. In the following, $S_0$ will be used as the reference state. The equilibrium of the microstates within a single group is determined by the microscopic equilibrium constants, $K^{\mu}_{\alpha}$, or equivalently by $pK^{\mu}_{\alpha}$:

$$pK^{\mu}_{\alpha} = -\log K^{\mu}_{\alpha} = -\log\frac{[S_\alpha][\mathrm{H}^+]^{\Delta\nu_\alpha}}{[S_0]} = -\log\frac{p_\alpha}{p_0} + \Delta\nu_\alpha\,\mathrm{pH}, \tag{8}$$
where $p_\alpha$ is the population of state $S_\alpha$ ($\sum_\alpha p_\alpha = 1$) and $\Delta\nu_\alpha = \nu_0 - \nu_\alpha$. For transitions between states with equal proton content, $\Delta\nu_\alpha = 0$.

Consider a single titratable group, for instance tyrosine, as a model compound free in solution. Assume also that its protonated form has two microstates corresponding to the two most populated orientations of the hydroxyl group. These states are experimentally indistinguishable and a single macroscopic $pK_{\mathrm{mod}}$ is observed. The two microstates may have different populations when this titratable group is in a protein molecule. This difference may arise from electrostatic interactions with the protein environment or from participation of the hydroxyl group in hydrogen bonds. Therefore the microscopic equilibrium constants, or the $pK^{\mu}_{\alpha,\mathrm{mod}}$ values, should be used in the thermodynamic cycle (Fig. 1A) rather than the experimentally observed $pK_{\mathrm{mod}}$. The relation between macroscopic and microscopic pK is given in detail in Ref. 11 (see also Refs. 12 and 13).

It is reasonable to assume that all microstates of the protonated species of a model compound are equally populated. For instance, each of the carboxyl oxygens is protonated with a probability of 0.5. In this case, the transition from one protonated microstate, say $S_0$, to another protonated microstate, $S_\alpha$, is characterized by $\Delta\nu_\alpha = 0$ and $[S_\alpha]/[S_0] = 1$. According to Eq. (8), $pK^{\mu}_{\alpha,\mathrm{mod}} = 0$ for this transition. The same is valid for the deprotonated states. It should be noted that histidine tautomers are not equally populated.

The introduction of microstates requires the reconsideration of Eq. (4) and a certain adjustment of the terminology. Eq. (4) gives an expression for the free energy of the transition from the protonated (reference) state to the deprotonated state. Some groups may have more than one protonated (or deprotonated) microstate. Moreover, polar groups have only protonated states. Eq. (4) holds in all these cases; however, it must be rewritten as follows:

$$\Delta G_{i\alpha} = 2.3RT\,(pK^{\mu}_{i\alpha,\mathrm{mod}} - \Delta\nu_{i\alpha}\,\mathrm{pH}) + \Delta G_{i\alpha,\mathrm{sol}} + \Delta G_{i\alpha,\mathrm{pc}} + \Delta G_{i\alpha,\mathrm{tc}}. \tag{9}$$
Here, $\Delta G_{i\alpha}$ is the free energy of the transition from the reference state, $S_{i0}$, to state $S_{i\alpha}$ of a titratable or pH-sensitive site $i$ in the protein molecule. The multiplier $\Delta\nu_{i\alpha}$ indicates whether deprotonation occurs ($\Delta\nu_{i\alpha} = 1$) or does not occur ($\Delta\nu_{i\alpha} = 0$) during the transition $S_{i0} \to S_{i\alpha}$. Consider a transition of a polar group from $S_{i0}$ to an arbitrary state $S_{i\alpha}$. From the equality of microstate populations, it follows from Eq. (8) that $pK^{\mu}_{i\alpha,\mathrm{mod}} = 0$. Taking into account that $\Delta\nu_{i\alpha} = 0$ (no change of the protonation state of a polar group takes place), Eq. (9) becomes

$$\Delta G_{i\alpha} = \Delta G_{i\alpha,\mathrm{sol}} + \Delta G_{i\alpha,\mathrm{pc}} + \Delta G_{i\alpha,\mathrm{tc}}. \tag{10}$$
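The transition $S_{i0} \to S_{i\alpha}$ is regulated by the changes of desolvation energy, $\Delta G_{i\alpha,\mathrm{sol}}$, and of the electrostatic interactions with the permanent charges, $\Delta G_{i\alpha,\mathrm{pc}}$, and with the titratable sites, $\Delta G_{i\alpha,\mathrm{tc}}$. $\Delta G_{i\alpha}$ depends on pH via $\Delta G_{i\alpha,\mathrm{tc}}$.

1.3.2 Electrostatic Interactions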
Each atom, $k$, of a certain titratable or pH-sensitive group, $i$, is characterized by a partial charge, $q_{i\alpha}(k)$, which depends on the chemical nature of the group. At each microstate, $S_{i\alpha}$, of this group, there is a distribution of charges $\rho_{i\alpha}$. The work needed to place all charges on the atoms comprising the group is given by the self energy of the distribution $\rho_{i\alpha}$:

$$G_{i\alpha,\mathrm{self}} = \frac{1}{2}\sum_{k}^{m_i} \phi(\rho_{i\alpha}, k)\, q_{i\alpha}(k).$$
The sum in the above expression is over all partial atomic charges, $m_i$, of site $i$. $\phi(\rho_{i\alpha}, k)$ is the electrostatic potential at location $k$ created by all charges within $\rho_{i\alpha}$. By definition, the desolvation energy is the energy of transfer of a titratable or pH-sensitive site from standard conditions (model compound in solution) to its place in the protein molecule. Since only electrostatic interactions are considered, the desolvation energy is the difference between the self energies of the site at its location in the protein molecule (P) and as a model compound in solution (S):

$$\Delta G^{S \to P}_{i\alpha,\mathrm{sol}} = \frac{1}{2}\sum_{k} \left(\phi^{P}(\rho_{i\alpha}, k) - \phi^{S}(\rho_{i\alpha}, k)\right) q_{i\alpha}(k)$$
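The contribution of the desolvation energy to the transition $S_{i0} \to S_{i\alpha}$ in the protein can be obtained from the thermodynamic cycle shown in Fig. 1:

$$\Delta G_{i\alpha,\mathrm{sol}} = \frac{1}{2}\sum_{k}\left[\left(\phi^{P}(\rho_{i\alpha}, k) - \phi^{S}(\rho_{i\alpha}, k)\right) q_{i\alpha}(k) - \left(\phi^{P}(\rho_{i0}, k) - \phi^{S}(\rho_{i0}, k)\right) q_{i0}(k)\right] \tag{11}$$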
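The contribution of the permanent charges, $\Delta G_{i\alpha,\mathrm{pc}}$, is calculated as:

$$\Delta G_{i\alpha,\mathrm{pc}} = \sum_{k \in \{\mathrm{pc}\}} \left(\phi(\rho_{i\alpha}, k) - \phi(\rho_{i0}, k)\right) q_{\mathrm{pc}}(k), \tag{12}$$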
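where the summation is over all permanent charges of the protein, $\{\mathrm{pc}\}$. It is convenient to introduce a microscopic intrinsic $pK^{\mu}_{i\alpha,\mathrm{int}}$ of the transition $S_{i0} \to S_{i\alpha}$ analogously to that used in Eq. (5):

$$pK^{\mu}_{i\alpha,\mathrm{int}} = pK^{\mu}_{i\alpha,\mathrm{mod}} + \left(\Delta G_{i\alpha,\mathrm{sol}} + \Delta G_{i\alpha,\mathrm{pc}}\right)/2.3RT \tag{13}$$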
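The conversion between energies and pK units in Eq. (13) can be illustrated with a minimal sketch. It assumes energies in kcal/mol and a temperature of 298.15 K, and it uses ln 10 in place of the rounded factor 2.3. The function name `pk_intrinsic` and the numerical values of the desolvation and permanent-charge terms are hypothetical.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)

def pk_intrinsic(pk_mod, dg_sol, dg_pc, temperature=298.15):
    """Eq. (13): intrinsic pK from the model pK and two energy terms (kcal/mol)."""
    rt_ln10 = math.log(10) * R * temperature  # about 1.36 kcal/mol at 298 K
    return pk_mod + (dg_sol + dg_pc) / rt_ln10

# Hypothetical buried acidic group: the desolvation term raises the pK,
# while the interaction with permanent charges partly compensates it.
print(round(pk_intrinsic(4.0, dg_sol=4.0, dg_pc=-2.5), 2))
```

At room temperature one pK unit corresponds to roughly 1.36 kcal/mol, which is a convenient rule of thumb when comparing energy terms with pK shifts.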
The value of $pK^{\mu}_{i\alpha,\mathrm{int}}$ at given conditions (temperature and ionic strength) depends only on the protein structure (i.e., on how the $i$-th group is situated in the protein) but not on the charge–charge interactions with the titratable sites. The transitions $\{S_{i0} \to S_{i\alpha}\}$ of a given group $i$ in the protein molecule occur under the influence of the electrostatic field of all other titratable and pH-sensitive sites. This influence is accounted for by $\Delta G_{i\alpha,\mathrm{tc}}$ (the last term on the right-hand side of Eqs. (9) and (10)). The electrostatic interaction between site $i$ in microstate $\alpha$ and site $j \neq i$ in microstate $\beta$ is given by

$$W_{i\alpha,j\beta} = \sum_{k}^{m_i} \phi(\rho_{j\beta}, k)\, q_{i\alpha}(k) \tag{14}$$
where the sum is taken over all atoms, $m_i$, of site $i$ with partial charges, $q_{i\alpha}(k)$. $\phi(\rho_{j\beta}, k)$ is the electrostatic potential at the location of atom $k$ created by the charge distribution, $\rho_{j\beta}$, of site $j$. The microstate $S_\beta$ of group $j$ is pH-dependent, and its population depends on the other titratable and pH-sensitive sites in the very same way as the population of $S_\alpha$ of group $i$. Thus to calculate electrostatic interactions between sites $i$ and $j$, one needs to know the populations of the microstates $S_\alpha$ and $S_\beta$ at a given pH. In fact, according to Eq. (8), determining the equilibrium populations of the microstates of the titratable and pH-sensitive sites includes the calculation of pK values. Therefore it is more convenient to transform the task of calculating pK values into the task of determining the microstate populations as a function of pH.

1.3.3 Populations of Microstates of Titratable and pH-Sensitive Sites
The population of the microstates of the individual titratable and pH-sensitive sites is calculated in terms of statistical physics. The solution of the problem of multiple-site titration in proteins has been given by Bashford and Karplus (14). Later, it was extended to the more general case including redox sites and other properties (8). Here, a modified expression for the population of the individual microstates will be given, which is more convenient for the problem considered. The probability of a certain site $i$ being in a microstate $S_\alpha$ is given by the Boltzmann weighted sum

$$p_{i\alpha} = \frac{\sum_{\{x\}} \delta(x_i, \alpha)\,\exp(-\Delta G(x)/RT)}{\sum_{\{x\}} \exp(-\Delta G(x)/RT)}. \tag{15}$$
The sums in Eq. (15) are taken over all possible states $\{x\}$ that the protein molecule can adopt. One state of the protein molecule is described by the vector $x = (x_1, \ldots, x_i, \ldots, x_M)$, which contains $M$ elements. The number of elements corresponds to the number of sites (titratable and pH-sensitive). Each element $x_i$ indicates the microstate of the $i$th site, i.e., $x_i = 0, 1, \ldots, n_i - 1$, if site $i$ has $n_i$ microstates. The function $\delta(x_i, \alpha)$ is defined so that $\delta(x_i, \alpha) = 1$ if $x_i = \alpha$ and $\delta(x_i, \alpha) = 0$ if $x_i \neq \alpha$. If one considers only titratable sites with two states each, the element $x_i$ will have values 1 or 0 depending on whether the site $i$ is in the protonated or deprotonated state. In this case, $\delta(x_i, \alpha) = x_i$ and Eq. (15) becomes identical to that introduced by Bashford and Karplus (14). The energy, $\Delta G(x)$, of the system in state $x$ is given by:

$$\Delta G(x) = 2.3RT \sum_{i} \left(pK^{\mu}_{x_i,\mathrm{int}} - \Delta\nu_{x_i}\,\mathrm{pH}\right) + \frac{1}{2}\sum_{i}\sum_{j \neq i} W_{x_i,x_j} \tag{16}$$
where indices $i$ and $j$ enumerate all titratable and pH-sensitive sites. In the above expression, $pK^{\mu}_{x_i,\mathrm{int}}$ and $W_{x_i,x_j}$ are defined by Eqs. (13) and (14), respectively. After substituting $\Delta G(x)$ from Eq. (16) into Eq. (15), one obtains the final expression for the probability of site $i$ being in state $S_\alpha$ as a function of pH. It can be shown that if the system has only two states (the vector $x$ has only one element, $x = 1$ or 0), Eq. (15) reduces to Eq. (7).

1.4 Continuum Dielectric Model
The factors regulating protonation/deprotonation equilibria in proteins are defined by Eqs. (11), (12), and (14). The solution of all of these equations requires calculation of the electrostatic potential, $\phi(\rho_{i\alpha}, k)$, created by a set of atomic charges, $\rho_{i\alpha}$, at the position of atom $k$. In order to calculate $\phi(\rho_{i\alpha}, k)$, one assumes that both the three-dimensional structure of the protein of interest and the values of the partial charges are known. There are different methods for the calculation of $\phi(\rho_{i\alpha}, k)$. Among them, the continuum dielectric model is probably the most frequently used approach. It is attractive because of its simplicity and the few parameters needed to perform the calculations. In this model, the protein molecule and the surrounding solvent are treated as two homogeneous media characterized by macroscopic quantities such as permittivity and charge density. The protein is represented as a rigid body with a low dielectric constant ($\varepsilon_p = 2$ to 20) and a fixed charge distribution, $\rho_p(\mathbf{r})$, which is immersed in a high dielectric medium ($\varepsilon_s \approx 80$, assuming aqueous solution). The linearized Poisson–Boltzmann equation is solved for this system:

$$\nabla\cdot\left(\varepsilon(\mathbf{r})\nabla\phi(\mathbf{r})\right) - \kappa^2\phi(\mathbf{r}) + 4\pi\rho_p(\mathbf{r}) = 0. \tag{17}$$
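For an irregular dielectric boundary, Eq. (17) is usually relaxed on a cubic grid: the potential at every grid point is recomputed repeatedly from the potentials at its six neighbors until the values stop changing. The following is a minimal sketch of such a relaxation using NumPy, not the implementation of any particular program. The function name `relax_potential`, the zero-potential boundary of the box, the averaging of the dielectric constant between neighboring grid points, and the cubic low-dielectric "protein" in the example are all assumptions made for illustration.

```python
import numpy as np

def relax_potential(eps, charge, kappa2, h, n_iter=300):
    """Jacobi relaxation of the finite difference form of Eq. (17).

    eps, charge, kappa2: 3-D arrays with the dielectric constant, the charge
    and the squared Debye parameter assigned to every grid point.
    h: grid spacing. The 4*pi factor follows the unit convention of Eq. (17).
    The potential is kept at zero on the faces of the box (an assumption).
    """
    c = slice(1, -1)  # interior grid points
    neighbours = [(slice(2, None), c, c), (slice(None, -2), c, c),
                  (c, slice(2, None), c), (c, slice(None, -2), c),
                  (c, c, slice(2, None)), (c, c, slice(None, -2))]
    phi = np.zeros(eps.shape)
    for _ in range(n_iter):
        num = 4.0 * np.pi * charge[c, c, c] / h
        den = kappa2[c, c, c] * h * h
        for s in neighbours:
            # dielectric constant on the link to a neighbour, taken here as
            # the mean of the two grid-point values (an assumption)
            eps_link = 0.5 * (eps[c, c, c] + eps[s])
            num = num + eps_link * phi[s]
            den = den + eps_link
        # all new values are computed from the old potential before assignment
        phi[c, c, c] = num / den
    return phi

# Hypothetical example: a low-dielectric cube in water with one central charge
n = 21
eps = np.full((n, n, n), 80.0)
eps[6:15, 6:15, 6:15] = 4.0
charge = np.zeros((n, n, n))
charge[10, 10, 10] = 1.0
kappa2 = np.zeros((n, n, n))  # zero ionic strength
phi = relax_potential(eps, charge, kappa2, h=1.0)
print(phi[10, 10, 10], phi[10, 10, 13])
```

Production codes refine this basic scheme, for example with better boundary conditions and faster iteration schemes; the sketch only shows the structure of the update.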
The ionic strength of the solution enters Eq. (17) through the Debye parameter $\kappa$. A detailed derivation of the above equation can be found in Ref. 15. The nonlinear form of the Poisson–Boltzmann equation can also be used; however, it has been shown that for physiological ionic strength, the two forms of the equation give practically equal results (16).

For an arbitrary, nonanalytical shape of the dielectric boundary (the protein–solvent interface), Eq. (17) is solved numerically. The most popular and widely used routine is the finite difference method, first proposed for the calculation of electrostatic interactions in proteins by Warwicker and Watson (17). The protein is placed in a box with a three-dimensional grid forming a cubic lattice. Values of dielectric constant ($\varepsilon_p$ or $\varepsilon_s$), charge, and ionic strength are assigned to each grid point (Fig. 1B). The finite difference formula for the calculation of the potential at position $k$ is

$$\phi_k = \frac{\sum_i \varepsilon_i \phi_i + 4\pi q_k/h}{\sum_i \varepsilon_i + \kappa^2 h^2}.$$

The sums in the above expression run over the six neighboring grid points $i$ (four in the planar representation in Fig. 1B), $h$ is the grid spacing, and $q_k$ is the charge in the volume belonging to grid point $k$. As can be seen, the potential $\phi_k$ depends on the potentials at the neighboring grid points, $\phi_i$, which are also unknown. Therefore the finite difference formula is solved iteratively. A comprehensive theoretical background of the computational procedure is given in a number of works (18,19). The principal scheme of the method has been further elaborated by the introduction of the focusing technique (18) and by a multigrid technique (20,21).

Alternative models can also be used for the calculation of electrostatic interactions in proteins. The microscopic model proposed by Warshel et al. (22,23) considers the protein molecule on an atomic level, which makes it the most rigorous method. All atomic partial charges and polarizabilities are taken explicitly into account. In this way, the introduction of a dielectric constant for the protein molecule is avoided. This approach has been extended by the introduction of Langevin dipoles to account for the reaction of the surrounding solvent molecules (24).

Recently, another approach has been successfully introduced, namely, the generalized Born model (25–27). Each atom in the protein molecule is represented as a sphere with a given radius and charge. The interior of the atom is considered as a uniform dielectric material. Similarly to the continuum dielectric model, the protein molecule is surrounded by the high dielectric medium of the solvent. The electrostatic interactions are calculated as the work needed to create a given charge distribution. Onufriev et al. (27) modified this model by introducing an additional function, which depends on the atomic radii and distances. They demonstrated that, with the exception of some deeply buried titratable sites, this modification gives practically identical results in comparison with the dielectric continuum model, but the calculations are considerably faster.

1.5 Protein Dielectric Constant
The key parameter of the continuum dielectric model is the protein dielectric constant. While the dielectric constant of the solvent can be measured, its value in the vicinity of and inside the protein molecule can only be assumed. Lamm and Pack (28) have shown that the dielectric constant at the protein–solvent interface can be reduced to a value of about 30. The dielectric constant inside proteins is usually assumed to be homogeneous with a value between 2.5 and 4 (29). Values between 10 and 20 have also been proposed (30–32). This large difference in the evaluated protein dielectric constants illustrates the fact that the problem of its determination is far from being solved. An inhomogeneous dielectric constant has been considered as a possible solution of the problem. For instance, a high dielectric constant can be attributed to regions in proteins containing polar side chains (33). Sharp et al. (34) have proposed a calculation of the local dielectric constant based on the Clausius–Mossotti equation. Other equations (Debye, Onsager, and Kirkwood) that relate microscopic properties, such as polarizability and dipole moment, to the macroscopic dielectric constant are also known. However, they all treat homogeneous matter, while a protein is inhomogeneous. An attempt to treat the protein molecule as an inhomogeneous dielectric medium has also been made (35).

1.6 Computational Strategies
A direct application of the statistical mechanical calculations (Eq. (15)) is limited because the CPU time grows exponentially with the number of titratable and pH-sensitive sites in the protein molecule. Nowadays, computational facilities allow the summations in Eq. (15) to be performed in a reasonable time for about 25 to 30 sites. Evidently, the use of Eq. (15) easily becomes unrealistically time consuming even for small proteins. There are methods, however, that can be used to overcome this obstacle by approximating the rigorous treatment.
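The exhaustive summation of Eq. (15), and the reason for its exponential cost, can be sketched for the simplest case of two-state sites. The sketch below makes several assumptions that are not part of the text: all sites are acidic, $x_i = 1$ denotes the deprotonated (charged) form, the interaction $w_{ij}$ acts only when both sites are deprotonated, and all energies are expressed in pK units, i.e., divided by $2.3RT$. The function name `deprotonation_exact` and the example values are hypothetical.

```python
import itertools
import numpy as np

def deprotonation_exact(pk_int, w, ph):
    """Average deprotonation of each site by summing over all 2**M states.

    pk_int: intrinsic pK of each site.
    w: symmetric matrix of pair interactions in pK units, zero diagonal.
    """
    pk_int = np.asarray(pk_int, dtype=float)
    w = np.asarray(w, dtype=float)
    n_sites = len(pk_int)
    partition = 0.0
    weighted = np.zeros(n_sites)
    for state in itertools.product((0, 1), repeat=n_sites):
        x = np.array(state, dtype=float)
        # energy of the state in pK units, a two-state analogue of Eq. (16)
        g = x @ (pk_int - ph) + 0.5 * x @ w @ x
        boltzmann = 10.0 ** (-g)
        partition += boltzmann
        weighted += boltzmann * x
    return weighted / partition

# Hypothetical three-site example
pk = [4.0, 4.5, 6.0]
w = [[0.0, 1.5, 0.3],
     [1.5, 0.0, 0.2],
     [0.3, 0.2, 0.0]]
print(np.round(deprotonation_exact(pk, w, ph=5.0), 3))

# The number of states doubles with every additional two-state site
print(2 ** 30)
```

For 30 two-state sites the loop would visit about $10^9$ states for every pH value, which illustrates why the direct summation is restricted to small systems.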
A large part of the titratable groups in proteins do not participate in cooperative titration. The pK shift of such a group caused by charge–charge interactions can be estimated from the mean electrostatic potential created by the rest of the titratable groups. The protonation state of all groups is determined iteratively via $pK_{1/2}$ calculations (36). At each iteration step, $pK_{1/2}$ of a given group is obtained by Eq. (7), where $\Delta G_{\mathrm{tc}}$ is calculated as a function of the average charges (degree of deprotonation) of other groups determined from their $pK_{1/2}$. These $pK_{1/2}$ values are taken from the previous iteration. At the first step, the $pK_{\mathrm{mod}}$ values are used. This approach (mean-field approximation) is very effective, but it is inappropriate for sites participating in cooperative interactions (37). For those sites, the iterations converge slowly or do not converge. A combination of the mean-field approximation and statistical mechanical calculations has been proposed to reduce the complexity of the task (38).

Groups that have $pK_{\mathrm{int}}$ far from the pH region of interest can be considered as being fixed in the appropriate protonation state and can be excluded from statistical calculations (37). This stripping facilitates the calculations (especially at extreme pH values) but often does not reduce the number of sites enough for a direct application of Eq. (15).

Monte Carlo simulation is a powerful approach that can be used for pK calculations (39). The accuracy of Monte Carlo methods depends on the length of the simulation and the specificity of the system. If the protein contains pairs or clusters of strongly interacting groups with cooperative ionization, a very long simulation is needed to achieve reliable estimates of the protonation states of those groups. This can be avoided by introducing some modifications in the standard algorithm (39,40). Computations can be sped up substantially without loss of accuracy by applying clustering methods (8,41,42).
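The Monte Carlo sampling of protonation states mentioned above can be sketched with a plain Metropolis scheme. This is a minimal illustration rather than the algorithm of Ref. 39: it assumes two-state acidic sites, with $x_i = 1$ for the deprotonated form, interactions $w_{ij}$ that act only between deprotonated sites, and energies expressed in pK units (divided by $2.3RT$). The function name `deprotonation_mc`, the single-site flip move, and the example values are hypothetical.

```python
import numpy as np

def deprotonation_mc(pk_int, w, ph, n_sweeps=5000, seed=0):
    """Metropolis estimate of the average deprotonation of each site.

    pk_int: intrinsic pK values; w: symmetric interaction matrix in pK
    units with zero diagonal; one sweep is one trial flip per site on average.
    """
    rng = np.random.default_rng(seed)
    pk_int = np.asarray(pk_int, dtype=float)
    w = np.asarray(w, dtype=float)
    n_sites = len(pk_int)
    x = np.zeros(n_sites)      # start with all sites protonated
    total = np.zeros(n_sites)
    for _ in range(n_sweeps):
        for _ in range(n_sites):
            i = rng.integers(n_sites)
            dx = 1.0 - 2.0 * x[i]  # +1 for deprotonation, -1 for protonation
            # energy change of the flip in pK units
            dg = dx * (pk_int[i] - ph + w[i] @ x)
            if dg <= 0.0 or rng.random() < 10.0 ** (-dg):
                x[i] += dx
        total += x
    return total / n_sweeps

# Hypothetical pair of identical, strongly interacting acidic sites
w_pair = [[0.0, 2.0], [2.0, 0.0]]
print(np.round(deprotonation_mc([4.0, 4.0], w_pair, ph=5.0), 2))
```

For strongly coupled sites the single-site flips of this sketch move slowly between the two singly deprotonated states, which is the sampling problem that modified move sets are designed to address.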
In all of these clustering methods, the strongly interacting (or closely situated) sites are grouped in clusters. The degree of deprotonation of these groups is calculated either by Boltzmann statistics or by Monte Carlo simulation over the groups included in one cluster, while the influence of the rest of the groups is accounted for by the mean-field approximation. A computational strategy that combines rigorous application of Eq. (15), Monte Carlo calculations, and a clustering technique is described in detail in Ref. 11.

2 PROTONATION/DEPROTONATION EQUILIBRIA AND CONFORMATIONAL FLEXIBILITY
All considerations made above were based on a single protein structure. The sensitivity of the protonation/deprotonation equilibria of titratable groups to conformational changes is one of the major problems in the accurate prediction of ionization equilibria in proteins. A number of approaches have been proposed to account for conformational flexibility (12,41,43,44). Because of the complexity of the task, all methods are based on approximations aimed at reducing its prohibitively large computational demands.

A possible reduction of the problem is to collect an ensemble of conformations, which presumably represent the conformational variety of a protein molecule in solution. Antosiewicz et al. (45) have used sets of NMR structures for this purpose. An overall agreement of the calculated pK values with the experimental data was achieved. Moreover, the pK values averaged over the NMR structures were more accurate than those calculated from a single crystal structure. On the other hand, Khare et al. (46) demonstrated that in the regions where NMR and x-ray structures differ significantly, the pK values calculated on the basis of the x-ray structures are in better agreement with the experimental data. For solvent-exposed residues, however, NMR structures provide better agreement with the experimental data. These results suggest that the crystal contacts are one of the main sources of discrepancy between the calculated and observed pK values in general. A disadvantage of the calculations based on NMR models is that the side chain conformations are usually not a result of experimental observations, and that the assumption of equal weight of the individual models is too strong.

An original method for analyzing the interplay of conformational flexibility and pK calculations has recently been proposed by Georgescu et al. (47). In this method, continuum electrostatic and molecular mechanics force field calculations are combined in a Monte Carlo sampling procedure. Another technique for collecting protein conformations is molecular dynamics (MD) simulation (43,48).
A general result of combining MD and pK calculations is the overall improvement of the theoretically predicted pK values. However, discrepancies between experimental and calculated pK values remain, most often for groups buried in the protein interior. One possible reason is the relatively short time of conformational sampling (32). This assumption has partially been confirmed by 1-ns MD simulations combined with pK calculations for the structures of xylanase (49,50). It has been concluded that a 500-ps simulation time can be considered as a lower limit when the goal is the prediction of the ionization behavior of proteins by means of trajectory averaging. This is illustrated in the upper panel of Fig. 2, where the time evolution of the pK value of Asp121 from Bacillus circulans xylanase is shown. It must be pointed out that the pK values calculated from the x-ray structure (3.9) and after 1-ns MD simulation (3.6) are fairly close to each other and are both close to the experimental value. The excellent agreement with the experimental results in this case is in fact a lucky hit. However, it illustrates some typical relations between the factors contributing to the protonation/deprotonation equilibria in proteins. In Fig. 2 (lower panel), the change of pK due to desolvation, $\Delta pK_{\mathrm{sol}}$, and due to the interactions with the protein permanent charges (in this case, peptide dipoles only), $\Delta pK_{\mathrm{pc}}$, is plotted. After 500 ps, Asp121 undergoes a transition at which the contribution of the desolvation increases, stabilizing its neutral form (increasing pK). At the same time, the energy of interactions with the peptide dipoles tends to compensate this effect by stabilizing the charged form (reducing pK). This compensatory effect is typical and reflects two features of proteins. The first one arises from the chemical nature of proteins as polypeptides (51). The second one results from the fact that buried titratable groups are usually surrounded by an appropriate polar environment.

Figure 2 Xylanase from B. circulans. Upper panel: snapshot pK values of Asp121 taken every 5 ps. The time evolution of the average pK is given as a continuous line. Time is measured after 50-ps relaxation. The dashed line corresponds to the experimental pK value of 3.6 (Ref. 75). Lower panel: the time evolution of the average $\Delta pK_{\mathrm{sol}}$ due to desolvation (snapshot values are given with solid circles) and of the average $\Delta pK_{\mathrm{pc}}$ due to electrostatic interactions with the peptide dipoles (snapshot values are given with open circles).

Another typical relationship between the conformational flexibility and pK values is illustrated in Fig. 3. The pK of Lys52 (upper panel) from Bacillus agaradhaerens xylanase fluctuates between two average values: around 14 and around 10.5. The occupancy of the protein conformers providing these pK values is approximately equal within the time period of 1 ns, which results in a pK of about 12, an expected value for lysines. No experimental data are available to validate this result; however, an interesting, yet speculative, conclusion can be drawn. In spite of the extreme sensitivity of the pK values to conformational flexibility, the conformers can form a limited number of sets, each providing a single average pK value. In the lower panel of Fig. 3, pK snapshots of Asp21 from two molecules (A and B) forming the crystallographic asymmetric unit of B. agaradhaerens xylanase are shown. The molecules A and B should not differ in solution, so that one expects identity of the final pK values.
In the case of molecule A, the protonation/deprotonation equilibrium of Asp21 is relatively stable during the time of simulation, providing a final average pK value of 3.4. As seen in Fig. 3, the pK values of Asp21 calculated for the two molecules collapsed after 500 ps to average (over the second half of the simulation time) pK values of 4.2 and 3.8 for molecules A and B, respectively. The tendency toward reduction of the difference between these two values suggests that in a longer MD simulation, they may converge to a single value. It must be pointed out that the above examples are illustrative and highlight the importance of conformational flexibility in protonation/deprotonation equilibria in proteins. Other results could be shown, where the prediction of pK fails even for 1-ns MD simulation. One reason for such failures might be that the MD simulation is performed for a fixed protonation state of the protein, for instance, with all titratable groups in their charged forms. Changes in the protonation state of the protein will change the explored area of the conformational space.
Figure 3 Xylanase from B. agaradhaerens. Upper panel: snapshot pK values of Lys52, molecule A from the crystallographic asymmetric unit. Lower panel: snapshot pK values of Asp21, molecule A (solid circles) and molecule B (open circles), from the crystallographic asymmetric unit, respectively. In all cases, snapshots are taken at each 5 ps.
3 IRREGULAR TITRATION IN PROTEINS
The relation between conformational flexibility and ionization equilibria of the individual titratable groups in proteins was considered for cases of noncooperative ionization. In regions where the protein structure is rigid enough to prohibit essential changes of the conformation upon change of the ionization state of the molecule, cooperative titration of the groups belonging to this region may occur. If the deprotonation (or protonation) of two or more sites is cooperative, $\Delta G_{\mathrm{tc}}$ becomes a relevant factor determining the deprotonation function, $\theta(\mathrm{pH})$. As will be shown below, $\theta(\mathrm{pH})$ can differ essentially from the well-known sigmoidal character given by Eq. (1). The nonsigmoidal pH dependence of $\theta$ is named irregular titration to distinguish it from the familiar $\theta(\mathrm{pH})$ that follows Eq. (1).

3.1 Conditions for Irregular Titration
The desolvation of the titratable groups causes a significant shift of their pK values. The importance of this factor was first pointed out by Warshel and Russell (22). The effect of burial of the titratable sites is manifested by stabilization of their neutral form (increasing the pK values of acidic groups and decreasing those of basic groups). Due to the desolvation energy, a group completely buried in the protein interior may shift its pK value by up to 25 pH units (23). A well-known example of such an "unusual" pK shift is that of the hen egg white lysozyme active site Glu35, which has a pK value of 6.2. As already mentioned, a buried titratable group is usually surrounded by a polar environment, so that the desolvation penalty tends to be compensated. This compensation is not necessarily complete. An example of incomplete compensation is given in Fig. 2.

The interplay between desolvation penalty and charge–charge interactions between titratable groups is of particular interest. First, in salt bridges, these two factors have opposite effects on the protonation/deprotonation equilibria. Second, in contrast to the influence of protein permanent charges, charge–charge interactions are pH-dependent. Thus clusters of titratable groups can be involved in strong pH-dependent charge–charge interactions, which can result in cooperative ionization behavior. Cooperative and nonsigmoidal titration was first obtained theoretically by Bashford and Gerwert (52). Another example of irregular (nonsigmoidal) titration is the ionization behavior of the lysine cluster in the constriction zone of some porins (53) and in bacteriorhodopsin (54). In addition, Alexov and Gunner (12) have theoretically shown that tautomerization leads to an irregular titration. It must be noted that these results should not be considered as "theoretical exercises" without connection to experimental observations (52).
A theoretical description of this phenomenon has been given by Yang et al. (41) for the case of two acidic groups. It has been extended for arbitrary pairs of groups by Koumanov et al. (11). Fig. 4 illustrates the ionization curves of two interacting groups [acidic–acidic (aa) and basic– basic (bb) pair]. If the groups do not interact (electrostatic interaction energies Waa=0 and Wbb=0), the ionization of the groups follows Henderson– Hasselbalch equation (Eq. (1) or Eq. (7)) with the pK values equal to pKint. For nonzero electrostatic interactions between the groups, the Henderson– Hasselbalch titration is violated which is manifested by the formation of a plateau due to the buffering effect of the synchronous ionization of the pair. When the groups in the pair have equal pKint values, their ionization curves coincide. With the increase of Waa (Wbb), the length of the plateau increases but its midpoint remains at the pH where the groups are half-protonated. Hence the group remains half-deprotonated in a certain pH region. The separation of the inflection points of the two sigmoidal segments (pKV and pKW) is proportional to the energy of interactions between the groups. If the pKint values of the interacting groups differ, the plateau is shifted to the level of higher degree of deprotonation for the group with lower pKint and vice
Figure 4 Titration of two interacting sites (acidic–acidic or basic–basic pair). Continuous line: the pKint values of the groups are equal. In this case, the titration curves coincide, so only one line is drawn. Dashed lines: the interacting groups have different pKint values. pK′ and pK″ indicate the pH of the inflection points of the sigmoidal segments.
versa for the other group. In such cases, the titration curves of the partners are not identical (Fig. 4, dashed lines). Consider a pair of an acidic and a basic group. The specific environment of the pair may induce shifts of the protonation/deprotonation equilibria of the groups, so that the pKint of the acidic group is higher than that of the basic group. This can occur if the acidic group is deeply buried in the protein interior. In such a case, irregular titration is also observed. The titration curves of the groups have a two-step sigmoidal form similar to that shown in Fig. 4. Unlike aa and bb pairs, for an acid–base pair, the separation between the inflection points and the level of the plateau depend on both ΔpKint and W. The conditions for irregular titration are the following. First, pKint of the acidic group must be higher than pKint of the basic group. Second, the absolute value of the difference between ΔpKint and the pK shift due to the charge–charge interactions within the pair should not exceed 1.3 pH units. Third, the magnitude of the charge–charge interactions should correspond to a pK shift larger than 1.3 pH units. A detailed derivation of these conditions is given in Ref. 11. The conditions for irregular titration are mild. The energy involved in this effect is about 2 kcal/mol, which is less than the charge–charge interaction energy in a salt bridge. The condition related to the rigidity of the protein structure is stronger. It requires that the acidic group remain buried in the protein interior during its deprotonation (ionization). If the protein environment allows conformational changes that diminish the desolvation penalty of the acidic group upon ionization, the conditions for irregular titration can be broken. Two-step sigmoidal dependencies are often observed when the ionization properties of proteins are investigated, for instance, by the pH dependence of the NMR chemical shifts of individual titratable groups (55).
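The plateau described above for an acidic–acidic pair can be reproduced with a minimal two-site statistical model. The sketch below is an illustration under stated assumptions, not the formalism of Refs. 11 or 41: the interaction is expressed in pK units, w = W/(2.303RT), with w ≥ 0 for repulsion between the two deprotonated groups, and the function names are hypothetical. The second function factorizes the partition function of the pair, which yields the two pK values that a sum of two Henderson–Hasselbalch curves would report for the total titration.

```python
import math

def pair_titration(ph, pk1, pk2, w):
    """Degrees of deprotonation (theta1, theta2) of two interacting acidic groups.

    w is the charge-charge interaction in pK units (assumed convention:
    w = W / (2.303 RT), w >= 0 for repulsion between the deprotonated groups).
    """
    a1 = 10.0 ** (ph - pk1)        # weight of "only group 1 deprotonated"
    a2 = 10.0 ** (ph - pk2)        # weight of "only group 2 deprotonated"
    both = a1 * a2 * 10.0 ** (-w)  # both deprotonated, penalized by w
    z = 1.0 + a1 + a2 + both       # fully protonated state has weight 1
    return (a1 + both) / z, (a2 + both) / z

def apparent_pks(pk1, pk2, w):
    """pK values of the two Henderson-Hasselbalch terms whose sum equals
    the total titration curve of the pair (requires w >= 0)."""
    k1, k2 = 10.0 ** (-pk1), 10.0 ** (-pk2)
    s = k1 + k2
    p = k1 * k2 * 10.0 ** (-w)
    disc = max(0.0, s * s - 4.0 * p)   # guard against round-off at w = 0
    r_large = 0.5 * (s + math.sqrt(disc))
    r_small = p / r_large              # avoids cancellation for large w
    return -math.log10(r_large), -math.log10(r_small)

if __name__ == "__main__":
    pk_int, w = 4.0, 2.0               # illustrative values only
    pk_lo, pk_hi = apparent_pks(pk_int, pk_int, w)
    midpoint = 0.5 * (pk_lo + pk_hi)
    for ph in (pk_lo, midpoint, pk_hi):
        theta, _ = pair_titration(ph, pk_int, pk_int, w)
        print(f"pH {ph:5.2f}: theta per group = {theta:.3f}")
```

For identical groups, each group is half-deprotonated only at the midpoint of the plateau, while at the two apparent pK values the degree of deprotonation per group is close to one quarter and three quarters.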
Consider again the system of two interacting acidic (or basic) groups with identical pKint (Fig. 4). At the pH corresponding to the midpoint of the plateau, the total charge of the couple indicates the departure of one proton from the system. Fitting to an appropriate sum of Henderson–Hasselbalch equations would give the same result at the midpoint; however, two pK values will appear (pK′ and pK″). According to this fit, the groups should be half-protonated at pH values equal to pK′ and pK″, respectively. However, neither of the groups is half-protonated at these pH values. Moreover, half-protonation occurs not at a given pH value, but rather within a pH range, the width of which depends on the electrostatic interactions between the groups.
3.2 Irregular Titration in Enzyme Active Site: An Example
Fitting to the Henderson–Hasselbalch equation is widely used for the identification of the groups responsible for pH dependence of enzymatic activity,
substrate, or inhibitor binding. As just illustrated, this approach is misleading if irregular titration occurs. An interesting example that challenges the traditional interpretation of experimental results based on fitting to the Henderson–Hasselbalch equation, and that shows the effects of irregular titration, is the substrate binding and proton abstraction from the alcohol substrate of Drosophila lebanonensis alcohol dehydrogenase. Studies on the pH dependencies of the different steps of the enzymatic reaction have shown that a group in the active site undergoes deprotonation with a pK of 6.8 to 7.5, depending on temperature (56,57). As seen in Fig. 5, the experimental observations fit the Henderson–Hasselbalch equation excellently, revealing deprotonation of a single group with a pK of 7.3. There are three groups in the active site of Drosophila alcohol dehydrogenase that are suspected of having such a pK value: Tyr151, Lys155, and Ser138. The hydroxyl groups of Tyr151 and Ser138 interact via hydrogen bonds with the hydroxyl group of the Cα carbon of the substrate. Lys155 interacts with the O2′ hydroxyl group of the NAD+ ribose and is in the vicinity of Tyr151. After a profound analysis of a large number of kinetic data, it was concluded that the most likely candidate for the group with a pK of 7.3 is Ser138 (57).
Figure 5 Alcohol dehydrogenase from D. lebanonensis. pH dependence of the function 1/f2 (open circles), scanned from the publication of Winberg et al. (57). The continuous curve was obtained by fitting Winberg’s data to Eq. (1) using the Origin program (Copyright 1997, Microcal Software Inc.). The pK value obtained from the fit is 7.3 (with a Hill coefficient of 1.0).
Electrostatic calculations reveal a completely different situation (58). Two residues in the active site, Tyr151 and Lys155, show irregular titration (Fig. 6). Lys155 is inaccessible to the solvent and, due to the large desolvation penalty, ΔpKsol for this residue is about 9. The effect of the polar environment, ΔGpc, is opposite, so that, in terms of Eq. (13), the pKint value of Lys155 is 6.4 on average. Tyr151 is also inaccessible to the solvent; however, its polar environment completely compensates for the desolvation penalty. The pKint value of this residue remains at about 10. As can be seen, the couple Tyr151–Lys155 satisfies the conditions for irregular titration. The strong electrostatic influence of the positive charge of Lys155 tends to stabilize the charged form of Tyr151: at acidic pH, the degree of deprotonation of Tyr151 is between 0.2 and 0.3. At high pH, the influence of Lys155 diminishes because of the increase of its degree of deprotonation. This stabilizes the neutral (protonated) form of Tyr151 and, as a result, a reduction of θ is observed at pH > 9 (Fig. 6). Such a “reversal” of θ(pH) has also been obtained in other theoretical investigations (53,54).
Figure 6 Alcohol dehydrogenase from D. lebanonensis. Degree of deprotonation of Lys155 (solid line) and Tyr151 (dashed line). The curves are averages of the results for the two subunits of this enzyme.
The irregular titration of these two residues suggests a different understanding of the pH dependence of the catalytic reaction of Drosophila alcohol dehydrogenase. In Fig. 7, the net deprotonation of the active site of the enzyme is presented together with the experimental data of Winberg et al. (57). As can be seen, the theoretical curve follows the experimental points relatively well. The question arises: which interpretation of the experimental data is more reliable? The one illustrated in Fig. 5 is deduced from a fit to the Henderson–Hasselbalch equation and suggests ionization of a single group with a change of the charge from 1 to 0. However, it ignores any possible coupling of the deprotonation of this putative group with other ionization processes in the protein molecule. The electrostatic calculations (Fig. 7) suggest cooperative ionization of two groups and a change of the net charge from +1 to 0.
Figure 7 Alcohol dehydrogenase from D. lebanonensis. Total degree of deprotonation of the active site groups Tyr151 and Lys155. The experimental data (open circles), scanned from the work of Winberg et al. (Ref. 57), are superimposed for comparison.
The theoretical calculations also suggest a molecular mechanism for proton abstraction from the alcohol substrate through a proton relay chain (58). Without going into the details of the molecular mechanism, it is worth noting that it involves the pH dependence of the rotamers of a pH-sensitive site (the O2′ ribose hydroxyl of NAD+). The population of the rotamers of the O2′ ribose hydroxyl as a function of pH is shown in Fig. 8. At neutral pH, all rotamers are approximately equally populated. With increasing pH, the populations of the rotamers that interact with Tyr151 and Lys155 also increase and become practically equal, ensuring proton transfer from the alcohol substrate via Tyr151 to Lys155. At low pH, the proton relay chain
Figure 8 Alcohol dehydrogenase from D. lebanonensis. Rotamer population of the NAD+ ribose O2′ hydroxyl group as a function of pH: the hydroxyl group donates its hydrogen to neither Tyr151 nor Lys155 (line with dots), is a proton donor to Lys155 (solid line), or is a proton donor to Tyr151 (dashed line). The horizontal line indicates a population of 1/3.
breaks because the O2′ ribose hydroxyl adopts an orientation in which its hydrogen interacts with neither Tyr151 nor Lys155. The example of Drosophila alcohol dehydrogenase can be taken as a caution against formal fitting of experimental results to the Henderson–Hasselbalch equation. Another caution should also be made. The electrostatic calculations have been performed for two structures only, i.e., conformational flexibility has practically been ignored. There are, however, experimental data (59) showing the invariance of the structure of the active site upon substrate binding. This can be considered as evidence that conformational flexibility is not relevant in this case. In cases for which such evidence is not available, irregular titration obtained on the basis of a single protein structure can be as misleading as the formal fitting of experimental data to the Henderson–Hasselbalch equation.
4 ELECTROSTATIC INTERACTIONS IN DENATURED PROTEINS
Electrostatic interactions in denatured proteins are often considered as irrelevant for functional properties of proteins, such as enzyme catalysis. Indeed, substrate binding or the catalytic reaction is realized when the enzyme
is in its native state. On the other hand, the structural stability of native proteins is determined by noncovalent interactions, including electrostatic interactions, in both the native and denatured states. In the case of pH-induced denaturation, electrostatic interactions play a prime role. The works of Oliveberg et al. (60–62) have given an experimental evaluation of the importance of electrostatic interactions in the denatured state, showing, for instance, that the pK values of the acidic groups in unfolded barnase are on average 0.4 pH units lower than those of model compounds. Other researchers have also noted that electrostatic interactions in the denatured state influence protein stability (63).
4.1 Models
The easiest and most often used way of handling electrostatic interactions in denatured proteins is to ignore them. This null approximation can be justified only if electrostatic interactions are screened, for instance, by a denaturing agent such as GdmCl. Otherwise, the assumption of zero electrostatic interactions is inapplicable for the prediction of quantities such as the electrostatic term of the unfolding energy (63). Schaefer et al. (31) have used the extended backbone and side chain conformation for the calculation of electrostatic interactions in the denatured state. In this model, the titratable groups are characterized by maximum solvent accessibility, reflecting the fact that they are fully hydrated in the denatured state. On the other hand, the charge–charge distances are maximized in the extended conformation, which may lead to an underestimation of electrostatic interactions. A similar model has been proposed by Warwicker (64) and successfully applied to the calculation of the pH-induced denaturation of a synthetic leucine zipper (65). A hybrid approach has also been proposed to analyze the pH and ionic strength effects on sperm whale apomyoglobin (66). A model based on molecular mechanics and electrostatic calculations has been proposed by Elcock (67). The key point of the model is the artificial “swelling” of the protein molecule by increasing the atom–atom distances corresponding to the minimum of the van der Waals interactions. Recently, a more general model of the denatured state has been proposed by Zhou (68–70). In this model, the denatured protein molecule is treated as a Gaussian chain immersed in a dielectric medium, whereas electrostatic interactions are calculated based on the Debye–Hückel theory. An approach based on the continuum dielectric model and conceptually very close to that of Zhou (68,69) has been proposed independently (71).
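The idea behind a Gaussian-chain treatment with Debye–Hückel screening, as mentioned above, can be sketched numerically: the screened Coulomb energy of two charges is averaged over a Gaussian distribution of their distance. The code below is a rough illustration under assumptions that are not taken from the text: the form d = b·√n + s for the root-mean-square distance, the values of b and s, the solvent dielectric constant, and the Debye length of about 3.04/√I Å for a 1:1 electrolyte in water at 25 °C are all illustrative choices, and the function name is hypothetical.

```python
import math

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2), Coulomb constant in these units

def mean_interaction(n_sep, ionic_strength, b=7.5, s=5.0, eps=78.5, n_steps=4000):
    """Mean screened Coulomb energy (kcal/mol) between two like unit charges
    separated by n_sep residues in a Gaussian chain (illustrative parameters)."""
    d = b * math.sqrt(n_sep) + s              # rms distance in Angstrom (assumed form)
    kappa = math.sqrt(ionic_strength) / 3.04  # inverse Debye length, 1/Angstrom
    norm = 4.0 * math.pi * (3.0 / (2.0 * math.pi * d * d)) ** 1.5
    r_max = 8.0 * d                           # Gaussian tail is negligible beyond this
    h = r_max / n_steps
    total = 0.0
    for i in range(n_steps):                  # midpoint rule
        r = (i + 0.5) * h
        # p(r) * exp(-kappa*r) / r with p(r) = norm * r^2 * exp(-3 r^2 / (2 d^2))
        total += norm * r * math.exp(-1.5 * r * r / (d * d) - kappa * r)
    return COULOMB / eps * total * h

if __name__ == "__main__":
    for n_sep in (2, 10, 40):
        for ionic in (0.0, 0.1):
            w = mean_interaction(n_sep, ionic)
            print(f"separation {n_sep:3d} residues, I = {ionic:.1f} M: {w:.3f} kcal/mol")
```

In this sketch the mean interaction weakens with increasing sequence separation and with increasing ionic strength, which is the qualitative behavior expected for residual charge–charge interactions in an unfolded chain.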
The unfolded protein molecule is represented as a material with low dielectric constant, εp between 30 and 40, immersed in the high-permittivity medium of the solvent, εs > εp. The shape of the dielectric cavity can be considered as an average over all possible conformations of a flexible chain, which results in a sphere inside which most of the protein atoms reside. The radius of this sphere can be the radius of gyration (71) or the Stokes radius (70) of an unfolded protein. It is known that, due to differences in desolvation energies, charges tend to be expelled from a medium with low ε (the dielectric cavity) towards a medium with higher dielectric constant (the solvent) (23). Since the charges of titratable groups belong to the protein moiety, and due to the flexibility of the polypeptide chain, it is reasonable to assume that the titratable sites of a denatured protein in equilibrium are located on the surface of the molecule, i.e., on the surface of the dielectric cavity. The variety of conformers that an unfolded protein can adopt is reflected by different configurations of titratable sites on the surface of the dielectric cavity. As a first approximation, one can assume random distributions of the titratable sites. An additional, but very important, constraint can be introduced. Because titratable groups have fixed positions in the protein sequence, the distances between them cannot be arbitrary. For instance, the distance between two sites that are far apart along the polypeptide chain can be larger than that between two titratable sites that are adjacent in the sequence. An algorithm for the generation of quasirandom distributions that takes into account the influence of the protein sequence is described in detail in Ref. 71. The strategy for pK calculations does not differ from that described for the calculation of protonation/deprotonation equilibria in native proteins. As long as the shape of the dielectric cavity is a sphere, the Poisson–Boltzmann equation (Eq. (17)) can be solved analytically (72). A variant of the analytical solution of Eq. (17) adapted for proteins has been given by Tanford and Kirkwood (7).
4.2 Protonation/Deprotonation Equilibria and ΔG(pH) of Denaturation
The stability of proteins under given conditions is determined by the difference in the Gibbs free energies, ΔGu, between their folded and unfolded states. Following the concept that pH-induced changes of protein properties are predominantly due to changes of electrostatic interactions, one can derive an expression for the electrostatic term of the free energy:
$$\Delta G_u(\mathrm{pH}) = 2.3RT \int_{\mathrm{pH}_0}^{\mathrm{pH}} \left[ Q_u(\mathrm{pH}') - Q_n(\mathrm{pH}') \right] d\mathrm{pH}' + \Delta G_0 \qquad (18)$$
where Qn(pH) and Qu(pH) are the protein net charges in the native and unfolded states, respectively, and ΔG0 is the free energy at pH0. The net charge is the sum of the charge values of the individual titratable groups at a given pH, which can
be obtained from the degree of deprotonation: qi = −θ for acidic and qi = 1 − θ for basic groups. Thus, to calculate ΔGu(pH), one needs to know the protonation/deprotonation equilibria in both the native and denatured states.
4.2.1 Protonation/Deprotonation Equilibria in Denatured Proteins
Cooperative ionization of titratable groups in the denatured state is not expected because of the flexibility of the structure. Therefore it is more convenient to consider pK values rather than θ(pH). The calculated and experimental pK values of some individual groups of barnase are compared in Table 1. The agreement between theory and experiment is fairly good. As noted by Oliveberg et al. (61), the pK values in the denatured state are shifted from the standard values (pKmod), indicating nonzero electrostatic interactions. This comparison illustrates that the null approximation is not valid. An interesting result is that there are small but detectable differences between the calculated pK values for a given type of group (see Table 1). These differences reflect the influence of the protein sequence on electrostatic interactions (71). This does not seem to be an artifact of the calculations since very
Table 1 Comparison of the calculated pK values (Ref. 71) with the experimental data (Ref. 61) for barnase. The experimental value for His18 is taken from Ref. 74.
Group      Calculated   Experiment   pKmod
Asp8       3.31
Asp12      3.20
Asp22      3.17
Asp44      3.33
Asp54      3.28
Asp75      3.09
Asp86      3.27
Asp93      3.15
Asp101     3.28
Average    3.23         3.50         4.0
Glu29      3.78
Glu60      3.64
Glu73      3.62
Average    3.68         3.70         4.4
His18      6.42         6.59         6.3
His102     6.35                      6.3
Figure 9 Free energy of denaturation as a function of pH for barnase (upper panel) and for NTL9 (lower panel). Open circles: experimental data scanned from Refs. 61 and 76 for barnase and NTL9, respectively. Continuous line: ΔGu(pH) calculated with the model of Kundrotas and Karshikoff (Refs. 71,77). Dashed line: ΔGu(pH) calculated with the null model. The constants ΔG0 (Eq. (18)) are chosen so that ΔG(pH 2.1) = 0 for barnase and ΔG(pH 0) = 2 kcal/mol for NTL9.
similar deviations of the pK values in the denatured state of another protein have been observed experimentally (73).
4.2.2 Energy of Denaturation as a Function of pH
The null approximation means that the net charge in the denatured state is calculated by Eq. (1) using standard pK values. The experimental observations (61,73) show that the ionization equilibria of the titratable groups in the denatured state differ from those given by the standard values. This gives a difference in Qu(pH) and hence in ΔGu(pH). The difference in the prediction of ΔGu(pH), when calculated with the null approximation and with the model accounting for electrostatic interactions, is illustrated in Fig. 9. In the upper panel of Fig. 9, ΔGu(pH) for barnase is compared with the experimental data. The change of ΔGu, experimentally measured and theoretically predicted, in the interval pH 1 to pH 6 is about 13 kcal/mol. Calculations based on the null approximation gave 23 kcal/mol, which is almost twice the experimental value. A similar overestimation of the free energy change is obtained for the N-terminal domain of the ribosomal protein L9 from Bacillus stearothermophilus (NTL9) when the null approximation is used: 4.4 kcal/mol vs. an experimental value of 2.4 kcal/mol (Fig. 9, lower panel). The theoretical calculations somewhat underestimate ΔGu in the neutral pH region. On the other hand, the experimentally observed reduction of the protein stability at pH > 8 is successfully predicted. In this pH region, the null approximation fails to predict either the value or the pH dependence of ΔGu. The above examples illustrate the importance of electrostatic interactions in the denatured state. Neglecting their role may lead to incorrect predictions of protein stability. This is especially important for protein engineering studies aiming at the design of proteins with enhanced stability.
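The effect of the null approximation on Eq. (18) can be illustrated with a toy calculation. The sketch below integrates Eq. (18) numerically for a single acidic group treated as an independent Henderson–Hasselbalch site in each state. All pK values in the example are hypothetical and chosen only for illustration (native 2.0, denatured 3.5, model compound 4.0), and the function names are not from the source. For one acidic group, the integral over the full titration range tends to 2.303RT times the difference between the denatured-state and native-state pK values, so a denatured-state pK that is too high overestimates the change in ΔGu.

```python
import math

RT_LN10 = math.log(10.0) * 1.987204e-3 * 298.15  # 2.303*RT in kcal/mol at 25 C

def net_charge(ph, acid_pks, base_pks):
    """Net charge of a set of independent Henderson-Hasselbalch sites."""
    q = 0.0
    for pk in acid_pks:
        theta = 1.0 / (1.0 + 10.0 ** (pk - ph))  # degree of deprotonation
        q -= theta                                # acidic group: q = -theta
    for pk in base_pks:
        theta = 1.0 / (1.0 + 10.0 ** (pk - ph))
        q += 1.0 - theta                          # basic group: q = 1 - theta
    return q

def delta_g_unfolding(ph, ph0, native, unfolded, dg0=0.0, n_steps=3000):
    """Eq. (18) by the trapezoidal rule; native and unfolded are
    (acid_pks, base_pks) tuples."""
    def integrand(x):
        return net_charge(x, *unfolded) - net_charge(x, *native)
    h = (ph - ph0) / n_steps
    area = 0.5 * (integrand(ph0) + integrand(ph))
    for i in range(1, n_steps):
        area += integrand(ph0 + i * h)
    return RT_LN10 * area * h + dg0

if __name__ == "__main__":
    native = ([2.0], [])          # hypothetical native-state pK
    unfolded_model = ([3.5], [])  # hypothetical denatured-state pK
    unfolded_null = ([4.0], [])   # hypothetical model-compound pK (null approximation)
    for label, state in (("model", unfolded_model), ("null", unfolded_null)):
        dg = delta_g_unfolding(8.0, 0.0, native, state)
        print(f"{label:5s}: change of dGu from pH 0 to pH 8 = {dg:.2f} kcal/mol")
```

With these illustrative numbers the null approximation yields the larger free energy change, in the same direction as the overestimation reported above for barnase and NTL9.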
REFERENCES
1. J-K Hwang, A Warshel. Why ion reversal by protein engineering is unlikely to succeed. Nature 334:270–272, 1988.
2. AC Tissot, S Vuilleumier, AR Fersht. Importance of two buried bridges in the stability and folding pathway of barnase. Biochemistry 35:6786–6794, 1996.
3. S Pao-pin, U Sauer, H Nicholson, BW Matthews. Contributions of engineering surface salt bridges to the stability of T4 lysozyme determined by direct mutagenesis. Biochemistry 30:7142–7153, 1991.
4. F Niehaus, C Bertoldo, M Kahler, G Antranikian. Extremophiles as a source of novel enzymes for industrial application. Appl Microbiol Biotechnol 51:711–729, 1999.
5. A Karshikoff, R Ladenstein. Ion pairs and the thermotolerance of proteins from hyperthermophiles: a “traffic rule” for hot roads. Trends Biochem Sci 26:550–556, 2001.
6. JHG Lebbink, V Consalvi, R Chiaraluce, KD Berndt, R Ladenstein. Structural and thermodynamic studies on a salt bridge triad in the NADP-binding domain of glutamate dehydrogenase from Thermotoga maritima: cooperativity and electrostatic contribution to stability. Biochemistry 41:15524–15535, 2002.
7. C Tanford, JG Kirkwood. Theory of titration curves. I. General equations for impenetrable spheres. J Am Chem Soc 79:5333–5339, 1957.
8. VZ Spassov, D Bashford. Multiple-site ligand binding to flexible macromolecules: separation of global and local conformational change and an iterative mobile clustering approach. J Comput Chem 20:1091–1111, 1999.
9. GM Ullmann, EW Knapp. Electrostatic models for computing protonation and redox equilibria in proteins [review]. Eur Biophys J 28:533–551, 1999.
10. MR Gunner, E Alexov. A pragmatic approach to structure based calculation of coupled proton and electron transfer in proteins. Biochim Biophys Acta 1458:63–87, 2000.
11. A Koumanov, H Rüterjans, A Karshikoff. Continuum electrostatic analysis of irregular ionization and proton allocation in proteins. Proteins: Str Func Gen 46:85–96, 2002.
12. E Alexov, MR Gunner. Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophys J 72:2075–2093, 1997.
13. EG Alexov, MR Gunner. Calculated protein and proton motion coupled to electron transfer: electron transfer from QA–QB to QB in bacterial photosynthetic reaction centers. Biochemistry 38:8253–8270, 1999.
14. D Bashford, M Karplus. pKa’s of ionizable groups in proteins: atomic detail from a continuum electrostatic model. Biochemistry 29:10219–10225, 1990.
15. C Tanford. Physical chemistry of macromolecules. NY, London, Sydney: John Wiley & Sons, 1961.
16. H-X Zhou. Macromolecular electrostatic energy within the nonlinear Poisson–Boltzmann equation. J Chem Phys 100:3152–3162, 1994.
17. J Warwicker, NC Watson. Calculation of the electric field potential in the active site cleft due to alpha-helix dipoles. J Mol Biol 157:671–679, 1982.
18. I Klapper, R Hagstrom, R Fine, K Sharp, B Honig. Focusing of electric fields in the active site of Cu–Zn superoxide dismutase: effects of ionic strength and amino-acid modification. Proteins: Str Func Gen 1:47–59, 1986.
19. A Nicholls, B Honig. A rapid finite difference algorithm, utilizing successive over-relaxation to solve the Poisson–Boltzmann equation. J Comput Chem 12:435–445, 1991.
20. H Oberoi, NM Allewell. Multigrid solution of the nonlinear Poisson–Boltzmann equation and calculation of titration curves. Biophys J 65:48–55, 1993.
21. M Holst, RE Kozack, F Saied, S Subramaniam. Treatment of electrostatic effects in proteins: multigrid-based Newton iterative method for solution of the full nonlinear Poisson–Boltzmann equation. Proteins 18:231–245, 1994.
22. A Warshel, ST Russell. Calculations of electrostatic interactions in biological systems and in solutions. Q Rev Biophys 17:283–422, 1984.
23. A Warshel, ST Russell, AK Churg. Macroscopic models for studies of electrostatic interactions in proteins: limitations and applicability. Proc Natl Acad Sci USA 81:4785–4789, 1984.
24. J Florián, A Warshel. Langevin dipoles model for ab initio calculations of chemical processes in solution: parametrization and application to hydration free energy of neutral and ionic solutes and conformational analysis in aqueous solution. J Phys Chem B 101:5583–5595, 1997.
25. B Jayaram, Y Liu, DL Beveridge. A modification of the generalized Born theory for improved of solvation energies and pK shifts. J Chem Phys 109:1465–1471, 1998.
26. J Srinivasan, MW Trevathan, P Beroza, DA Case. Application of a pairwise generalized Born model to proteins and nucleic acids: inclusion of salt effects. Theor Chem Acc 101:426–434, 1999.
27. A Onufriev, D Bashford, DA Case. Modification of the generalized Born model suitable for macromolecules. J Phys Chem B 104:3712–3720, 2000.
28. G Lamm, GR Pack. Calculation of dielectric constant near polyelectrolytes in solution. J Phys Chem B 101:959–965, 1997.
29. M Gilson, B Honig. The dielectric constant of a folded protein. Biopolymers 25:2097–2191, 1986.
30. J Antosiewicz, JA McCammon, MK Gilson. Prediction of pH-dependent properties of proteins. J Mol Biol 238:415–436, 1994.
31. M Schaefer, M Sommer, M Karplus. pH-dependence of protein stability: absolute electrostatic free energy difference between conformations. J Phys Chem B 101:1663–1683, 1997.
32. HWT van Vlijmen, M Schaefer, M Karplus. Improving the accuracy of protein pKa calculations-conformational averaging versus the average structure. Proteins 33:145–158, 1998.
33. G King, FS Lee, A Warshel. Microscopic simulations of macroscopic dielectric constant of solvated proteins. J Chem Phys 95:4366–4377, 1991.
34. K Sharp, A Jean-Charles, B Honig. A local dielectric constant for model solvation free energies which accounts for solute polarizability. J Phys Chem 96:3822–3828, 1992.
35. D Voges, A Karshikoff. A model for a local static dielectric constant in macromolecules. J Phys Chem 108:2219–2227, 1998.
36. C Tanford, R Roxby. Interpretation of protein titration curves. Application to lysozyme. Biochemistry 11:2192–2198, 1972.
37. D Bashford, M Karplus. Multiple-site titration curves of proteins: an analysis of exact and approximate methods for their calculation. J Phys Chem 95:9556–9561, 1991.
38. A Karshikoff. A simple algorithm for calculation of multiple site titration curves. Protein Eng 8:243–248, 1995.
39. P Beroza, MY Fredkin, MY Okamura, G Feher. Protonation of interacting residues in a protein by a Monte Carlo method: application to lysozyme and the photosynthetic reaction center of Rhodobacter sphaeroides. Proc Natl Acad Sci USA 88:5804–5808, 1991.
40. M Miteva, PA Demirev, AD Karshikoff. Multiply-protonated protein ions in the gas phase: calculation of the electrostatic interactions between charged sites. J Phys Chem B 101:9645–9650, 1997.
41. A-S Yang, MR Gunner, R Sampogna, K Sharp, B Honig. On the calculation of pKas in proteins. Proteins 15:252–265, 1993.
42. M Gilson. Multiple-site titration and molecular modelling: two rapid methods for computing energies and forces for ionizable groups in proteins. Proteins: Str Func Gen 15:266–282, 1993.
43. A-S Yang, B Honig. On the pH dependence of protein stability. J Mol Biol 231:459–474, 1993.
44. TJ You, D Bashford. Conformation and hydrogen ion titration of proteins: a continuum electrostatic model with conformational flexibility. Biophys J 69:1721–1733, 1995.
45. J Antosiewicz, JA McCammon, MK Gilson. The determination of pKas in proteins. Biochemistry 35:7819–7833, 1996.
46. D Khare, P Alexander, J Antosiewich, P Bryan, M Gilson, J Orban. pKa measurements from nuclear magnetic resonance for B1 and B2 immunoglobin G-binding domain of protein G: comparison with calculated values for nuclear magnetic resonance and x-ray structures. Biochemistry 36:3580–3589, 1997.
47. RE Georgescu, E Alexov, MR Gunner. Combining conformational flexibility and continuum electrostatics for calculating pKas in proteins. Biophys J 83:1731–1748, 2002.
48. E Alexov. Role of the protein side-chain fluctuations on the strength of pairwise electrostatic interactions: comparing experimental with computed pKas. Proteins 50:94–103, 2003.
49. AA Gorfe, P Ferrara, A Caflisch, DN Marti, HR Bosshard, I Jelesarov. Calculation of protein ionization equilibria with conformational sampling pKa of a model leucine zipper, GCN4 and barnase. Proteins 46:41–60, 2002.
50. A Koumanov, A Karshikoff, EP Friis, TV Borchert. Conformational averaging in pK calculations. Improvement and limitations in prediction of ionization properties of proteins. J Phys Chem B 105:9339–9344, 2001.
51. VZ Spassov, R Ladenstein, A Karshikoff. Optimization of the electrostatic interactions between ionized groups and peptide dipoles in proteins. Protein Sci 6:1190–1195, 1997.
52. D Bashford, K Gerwert. Electrostatic calculations of the pKa values of ionizable groups in bacteriorhodopsin. J Mol Biol 224:473–486, 1992.
53. A Karshikoff, V Spassov, RQ Cowan, R Ladenstein, T Schirmer. Electrostatic analysis of two porin channels from E. coli. J Mol Biol 240:372–384, 1994.
54. VZ Spassov, H Luecke, K Gerwert, D Bashford. pKa calculations suggest storage of an excess proton in a hydrogen-bonded network in bacteriorhodopsin. J Mol Biol 321:203–219, 2001.
55. N Spitzner, L Frank, S Pfeiffer, A Koumanov, A Karshikoff, H Rüterjans. Ionization properties of titratable groups in ribonuclease T1. I. pKa values in the native state determined by two-dimensional heteronuclear NMR spectroscopy. Eur Biophys J 30:186–197, 2001.
56. MK Brendskag, JS McKinley-McKee, JO Winberg. Drosophila lebanonensis alcohol dehydrogenase: pH dependence of the kinetic coefficients. Biochim Biophys Acta 1431:74–86, 1999.
57. JO Winberg, MK Brendskag, I Sylte, RI Lindstad, JS McKinley-McKee. The catalytic triad in Drosophila alcohol dehydrogenase: pH, temperature and molecular modelling studies. J Mol Biol 294:601–616, 1999.
58. A Koumanov, J Benach, S Atrian, R Gonzàlez-Duarte, A Karshikoff, R Ladenstein. The catalytic mechanism of Drosophila alcohol dehydrogenase: evidence for a proton relay modulated by the coupled ionization of the active site lysine/tyrosine pair and a NAD+ ribose OH switch. Proteins 51:289–298, 2003.
59. J Benach, S Atrian, R Gonzalez-Duarte, R Ladenstein. The catalytic reaction and inhibition mechanism of Drosophila alcohol dehydrogenase: observation of an enzyme-bound NAD-ketone adduct at 1.4 Å resolution by x-ray crystallography. J Mol Biol 289:335–355, 1999.
60. M Oliveberg, S Vuilleumier, A Fersht. Thermodynamic study of the acid denaturation of barnase and its dependence on ionic strength: evidence for residual electrostatic interactions in the acid/thermal denatured state. Biochemistry 33:8826–8832, 1994.
61. M Oliveberg, VL Arcus, AR Fersht. pKa values of carboxyl groups in the native and denatured state of barnase: the pKa values of the denatured state are on average 0.4 units lower than those of model compounds. Biochemistry 34:9424–9433, 1995.
62. Y-J Tan, M Oliveberg, B Davis, AR Fersht. Perturbed pKa-values in the denatured states of proteins. J Mol Biol 254:980–992, 1995.
63. CN Pace, RW Alston, KL Shaw. Charge–charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci 9:1395–1398, 2000.
64. J Warwicker. Simplified methods for pK(a) and acid pH-dependent stability estimation in proteins: removing dielectric and counterion boundaries. Protein Sci 8:418–425, 1999.
65. P Phelan, AA Gorfe, I Jelesarov, DN Marti, J Warwicker, HR Bosshard. Salt bridges destabilize a leucine zipper designed for maximized ion pairing between helices. Biochemistry 41:2998–3008, 2002.
66. A-S Yang, B Honig. Structural origin of pH and ionic strength effects on protein stability. Acid denaturation of sperm whale apomyoglobin. J Mol Biol 237:602–614, 1994.
67. AH Elcock. Realistic modeling of the denatured states of proteins allows accurate calculations of the pH dependence of protein stability. J Mol Biol 294:1051–1062, 1999.
68. H-X Zhou. A Gaussian-chain model for treating residual charge–charge interactions in the unfolded state of proteins. Proc Natl Acad Sci USA 99:3569–3574, 2002.
69. H-X Zhou. Residual electrostatic effects in the unfolded state of the N-terminal domain of L9 can be attributed to nonspecific nonlocal charge–charge interactions. Biochemistry 41:6533–6538, 2002.
70. HX Zhou. Dimensions of denatured protein chains from hydrodynamic data. J Phys Chem B 106:5769–5775, 2002.
71. PJ Kundrotas, A Karshikoff. Model for calculations of electrostatic interactions in unfolded proteins. Phys Rev E 65:11901–11909, 2002.
72. JG Kirkwood. Theory of solutions of molecules containing widely separated charges with special application to zwitterions. J Chem Phys 2:351–361, 1934.
73. M Tollinger, JD Forman-Kay, LE Kay. Measurement of side-chain carboxyl pKa values of glutamate and aspartate residues in an unfolded protein by multinuclear NMR spectroscopy. J Am Chem Soc 124:5714–5717, 2002.
74. D Sali, M Bycroft, AR Fersht. Stabilization of protein-structure by interaction of alpha-helix dipole with a charged side-chain. Nature 335:740–743, 1988.
75. M Joshi, A Hedberg, L McIntosh. Complete measurement of the pK(a) values of the carboxyl and imidazole groups in Bacillus circulans xylanase. Protein Sci 6:2667–2670, 1997.
76. DL Luisi, DP Raleigh. pH-Dependent interactions and the stability and folding kinetics of the N-terminal domain of L9. Electrostatic interactions are only weakly formed in the transition state of folding. J Mol Biol 299:1091–1100, 2000.
77. PJ Kundrotas, A Karshikoff. Modeling of denatured state for calculations of electrostatic contribution to protein stability. Protein Sci 11:1681–1686, 2002.
8 Modeling and Optimization of Directed Evolution Protocols

Gregory L. Moore and Costas D. Maranas
Pennsylvania State University, University Park, Pennsylvania, U.S.A.

1 INTRODUCTION
This chapter summarizes research activities by the Maranas group (http://fenske.che.psu.edu/faculty/cmaranas/) toward modeling the statistics of combinatorial DNA libraries generated through directed evolution methods. Directed evolution methods utilize the process of natural selection to combinatorially evolve enzymes, proteins, or even entire metabolic pathways with improved properties. These methods typically begin with the infusion of diversity into a small set of parental nucleotide sequences through DNA recombination and/or mutagenesis. The resulting combinatorial DNA library is then subjected to a high-throughput selection or screening procedure, and the most improved variants are isolated for another round of recombination or mutagenesis. The cycles of recombination/mutagenesis, screening, and isolation continue until a protein or enzyme with the desired level of improvement is found. In the last few years, remarkable success stories of directed evolution have been reported (1–3), ranging from manifold improvements in enzyme activity and thermostability (4), enhanced bioremediation (5–8), and design of vaccines (9–11), to viral vectors for gene delivery (12,13).

A key challenge in directed evolution is that only an infinitesimally small fraction of the diversity afforded by DNA sequences can be characterized regardless of the efficiency of the screening procedure employed. For example, a 500-bp gene implies $4^{500} \approx 10^{301}$ alternatives, but even the most efficient screening methods are restricted to $10^7$–$10^8$ alternatives. Therefore, it is important to know how diversity is generated and allocated in the combinatorial DNA library and which regions are the most promising. This chapter addresses the first question in the context of the DNA shuffling (14,15) and SCRATCHY (16) protocols and examines how fragmentation length, annealing temperature, sequence identity, and number of shuffled parental sequences affect the number, type, and distribution of crossovers along the length of reassembled sequences. The predictive frameworks presented here [eShuffle (17) and eSCRATCHY (16)] provide a step toward optimizing directed evolution protocols in response to an enzyme or protein design challenge. In these modeling frameworks, annealing events during reassembly are modeled as a network of reactions, and equilibrium thermodynamics is employed to quantify their conversions and selectivities. The development of the modeling frameworks was assisted by the experimental and practical expertise of Professors Stephen Benkovic and Stefan Lutz. Figures and tables are reprinted from Refs. 16 and 17 (copyright © 2001 National Academy of Sciences, USA).

2 MODELING DNA SHUFFLING

2.1 Background
DNA shuffling (14,15), along with its variants, is one of the earliest and most commonly used DNA recombination protocols. It consists of random fragmentation of parent nucleotide sequences with DNase I and subsequent fragment reassembly through primerless PCR. Library diversity is generated during reassembly when two fragments originating from different parent sequences anneal and subsequently extend. This gives rise to a crossover, the junction point in a reassembled sequence where a template switch takes place from one parent sequence to another. The key advantage of DNA shuffling is that many parent sequences can be recombined simultaneously [i.e., family DNA shuffling (18,19)], generating multiple crossovers per reassembled sequence. However, crossovers tend to aggregate in regions of high sequence identity due to the annealing-based reassembly.

2.2 Modeling of Annealing Events
During annealing, fragments compete to anneal with a growing template. This competition is quantified by utilizing equilibrium thermodynamics to infer (a) what fraction of these fragments will anneal at a given temperature, (b) how these annealing events will be distributed between those involving high or low overlap lengths, and (c) what portion of these annealing events will involve mismatches. An annealing event between fragments originating from the same parent sequence yields a homoduplex (assuming in-frame annealing), whereas the annealing of two fragments from different parents gives a heteroduplex. Mismatches at exactly the 3′ end will lead to less efficient extension and thus are not counted.

The thermodynamics of duplex formation can be analyzed using nearest-neighbor parameters that describe the enthalpic and entropic contributions of specific nucleotide pairs in the overlapping region (20–25). The change in free energy ΔG associated with an annealing event can be approximated by summing the free energy gains associated with all 2-nt matches and the free energy penalties associated with the mismatches. Additional corrections are also included for the duplex initiation free energy cost, salt concentration, and dangling end stabilization (26). Enthalpic and entropic parameters at 37°C for the contribution of pairs of matches and mismatches are summarized in a table found in the supplemental material of Ref. 17.

Given this free energy predictive capability, the extent of duplex formation can be tracked at different temperatures. Specifically, consider the reaction associated with the annealing of a fragment F with a template A, forming a duplex AF:

\[ A + F \rightleftharpoons AF. \]

Assuming equilibrium, the equilibrium constant K(T) links the mole fractions of the template, fragment, and duplex at different temperatures:

\[ K(T) = \exp\left(-\frac{\Delta G(T)}{RT}\right) = \frac{x_{AF}}{x_A\, x_F}. \]

Here x denotes mole fractions and 0 denotes initial values of the species in the reaction mixture, so that $x_A = x_A^0 - x_{AF}$ and $x_F = x_F^0 - x_{AF}$. Let a(T) be the annealing curve, defined as the fraction of templates that have annealed at temperature T [$a(T) = x_{AF}/x_A^0 = 1 - x_A/x_A^0$]. Upon rearrangement, these equations can be solved for $x_F$, $x_A$, $x_{AF}$, and a(T). The temperature at which half of the templates have hybridized to form duplexes [i.e., a(T) = 1/2] is defined as the melting temperature $T_m$. Predictions obtained with the described free energy modeling framework are in good agreement with those of an empirical formula commonly used for hybridization experiments (27) (see Table 1). Plots of a(T) vs. T reveal that there is a relatively narrow temperature range, centered around $T_m$, where the majority of annealing events take place (sigmoidal curve). In general, longer overlaps imply higher melting temperatures, whereas shorter overlaps, mismatches, and low guanine/cytosine (GC) content depress $T_m$.
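The rearrangement described in this paragraph reduces to a quadratic in $x_{AF}$. The Python sketch below solves that quadratic and locates $T_m$ by bisection. It is only an illustration under stated assumptions: ΔH and ΔS are treated as temperature independent so that ΔG(T) = ΔH − TΔS, the function names are hypothetical, and the ΔH and ΔS values are placeholders rather than nearest-neighbor values from Refs. 20–26. Only the mole fraction 2.7 × 10^-8 is taken from Table 1.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def annealed_fraction(temp_k, dH, dS, xA0, xF0):
    """Return a(T) = x_AF / x_A0 for the single equilibrium A + F <-> AF.

    dH (kcal/mol) and dS (kcal/(mol K)) are assumed temperature independent.
    """
    K = math.exp(-(dH - temp_k * dS) / (R * temp_k))
    # K = x / ((xA0 - x)(xF0 - x))  =>  K x^2 - b x + c = 0
    b = K * (xA0 + xF0) + 1.0
    c = K * xA0 * xF0
    # Discriminant b^2 - 4 K c, written in a cancellation-free form.
    disc = (K * (xA0 - xF0)) ** 2 + 2.0 * K * (xA0 + xF0) + 1.0
    x_af = 2.0 * c / (b + math.sqrt(disc))  # smaller, physically meaningful root
    return x_af / xA0


def melting_temperature(dH, dS, xA0, xF0, lo=273.15, hi=373.15):
    """Bisect for the temperature (K) at which a(T) = 1/2.

    Assumes a(T) decreases with T on [lo, hi] (true for dH < 0) and that
    the bracket contains the melting temperature.
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if annealed_fraction(mid, dH, dS, xA0, xF0) > 0.5:
            lo = mid  # still mostly annealed: Tm lies at a higher temperature
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Placeholder thermodynamic values, not taken from the chapter.
    tm = melting_temperature(dH=-75.0, dS=-0.2, xA0=2.7e-8, xF0=2.7e-8)
    print(f"Tm = {tm - 273.15:.1f} C")
```

For equal initial mole fractions the condition a(T) = 1/2 gives K = 2/x_A^0, so the bisection result can be checked against the closed form T_m = ΔH / (ΔS − R ln(2/x_A^0)).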
Table 1 Comparison of Melting Temperature Predictions for Different Duplexes of Fragmented Subtilisin E Gene Between the Proposed Model and Tm = 81.5 + 0.41(% GC) − 500/L + 16.6 log[Na+]

                                                Melting temperature (°C)
Sequence positions  Overlap length  Percent GC  Annealing model  Howley et al. (27)
819–828             10              50          26               30
1013–1022           10              30          17               22
529–538             10              60          32               35
804–828             25              52          61               61
779–828             50              50          72               71
729–828             100             55          81               78

Data shown are for [Na+] = 0.05 M and an initial template mole fraction $x_A^0 = 2.7 \times 10^{-8}$ that corresponds to a DNA concentration of 10 mg/L, typical for DNA shuffling.
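The empirical column of Table 1 can be reproduced directly from the formula in the table title. The short Python sketch below does so; the function name is a hypothetical choice, and the small residual differences (under 1°C) presumably arise because the tabulated GC percentages are rounded.

```python
import math


def empirical_tm(percent_gc, overlap_length, na_molar):
    """Empirical melting temperature (deg C) from the Table 1 formula."""
    return (81.5 + 0.41 * percent_gc - 500.0 / overlap_length
            + 16.6 * math.log10(na_molar))


# (percent GC, overlap length, tabulated empirical value) from Table 1
TABLE_1 = [(50, 10, 30), (30, 10, 22), (60, 10, 35),
           (52, 25, 61), (50, 50, 71), (55, 100, 78)]

if __name__ == "__main__":
    for gc, length, tabulated in TABLE_1:
        computed = empirical_tm(gc, length, na_molar=0.05)
        print(gc, length, round(computed, 1), tabulated)
```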
During the annealing step of DNA shuffling, not a single, but many different fragments with varying lengths, overlaps, and mismatches are competing for a given template:

\[ A + F_{mv} \rightleftharpoons AF_{mv}. \]

Here m refers to a fragment originating from parent sequence m and v implies an overlap length of v nucleotides with the template upon annealing. After adjusting the expression for a(T) to reflect the multiplicity of annealing choices and resolving the system of equations, the temperature-dependent selectivity

\[ s_{mv}(T) = x_{AF_{mv}} \Big/ \left( \sum_{m',v'} x_{AF_{m'v'}} \right) \]

for a particular fragment and overlap choice mv is estimated. The presence of multiple fragment and overlap choices "spreads" the melting curve over a wider range of temperatures, implying that annealing events occur over the entire temperature range (typically 94–55°C). The free energy differences between annealing choices and relative fragment concentrations determine which annealing choice dominates at a given temperature. For instance, at high temperatures, fragments with large overlaps that match perfectly with the template dominate all others because of the large enthalpic gains that they provide on annealing. As the temperature is lowered, the melting temperatures of fragments with progressively smaller overlaps and even one or two mismatches are reached, resulting in selectivities that are much more uniform.
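One way to resolve the coupled equilibria numerically is sketched below in Python. It is not the eShuffle implementation: it assumes each annealing choice j has its own temperature-independent ΔH and ΔS (placeholder values, hypothetical function name), writes each duplex mole fraction in terms of the free template, and finds the free template by bisection on the template mass balance.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def competitive_annealing(temp_k, choices, xA0):
    """Equilibrium of A + F_j <-> AF_j for several competing choices j.

    choices: list of (dH, dS, xF0) tuples, one per fragment/overlap choice.
    Returns (a, s): annealed template fraction and selectivities s_j.
    """
    Ks = [math.exp(-(dH - temp_k * dS) / (R * temp_k)) for dH, dS, _ in choices]
    xF0s = [xF0 for _, _, xF0 in choices]

    def duplexes(xA):
        # From K_j = x_AFj / (xA * (xF0_j - x_AFj)), solved for x_AFj.
        return [K * xA * xF0 / (1.0 + K * xA) for K, xF0 in zip(Ks, xF0s)]

    # Free template xA solves xA + sum_j x_AFj(xA) = xA0; the left side is
    # increasing in xA, so bisection on [0, xA0] is sufficient.
    lo, hi = 0.0, xA0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid + sum(duplexes(mid)) > xA0:
            hi = mid
        else:
            lo = mid
    x_af = duplexes(0.5 * (lo + hi))
    total = sum(x_af)
    return total / xA0, [x / total for x in x_af]


if __name__ == "__main__":
    # Placeholder values: a strongly binding and a weakly binding choice.
    choices = [(-75.0, -0.2, 1e-8), (-60.0, -0.165, 1e-8)]
    for temp_k in (330.0, 300.0):
        a, s = competitive_annealing(temp_k, choices, xA0=1e-8)
        print(temp_k, a, s)
```

With these placeholder numbers the stronger choice dominates at the higher temperature and loses a little of its share at the lower one, which is the qualitative trend described in the text.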
Because annealing selectivities are temperature-dependent, duplex formation must be assessed cumulatively over the entire annealing temperature range. To this end, the annealing step is modeled as a sequence of pseudoequilibrium states progressively contributing duplexes as the temperature is lowered from 94°C to 55°C. Mathematically, this implies an integration of the temperature-dependent selectivities $s_{mv}(T)$ times the annealing rate da(T)/dT over the annealing temperature schedule:

\[ S_{mv} = \int_{T_{\mathrm{denature}}}^{T_{\mathrm{anneal}}} s_{mv}(T)\, \frac{da(T)}{dT}\, dT. \]
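Because (da/dT) dT = da, the integral is a sum of selectivities weighted by the increments of annealed template gained at each temperature step. A minimal numerical sketch in Python follows, assuming a(T) and s(T) have already been evaluated on a grid of temperatures ordered from the denaturation temperature down to the annealing temperature; the function name and the sample numbers are illustrative only.

```python
def cumulative_selectivity(a_values, s_values):
    """Approximate S = integral of s(T) da(T) with the trapezoidal rule.

    a_values[i] and s_values[i] are the annealed fraction and the selectivity
    at the i-th temperature, listed from T_denature down to T_anneal.
    """
    if len(a_values) != len(s_values):
        raise ValueError("a_values and s_values must have the same length")
    total = 0.0
    for i in range(len(a_values) - 1):
        da = a_values[i + 1] - a_values[i]  # template annealed in this step
        total += 0.5 * (s_values[i] + s_values[i + 1]) * da
    return total


if __name__ == "__main__":
    a = [0.0, 0.2, 0.6, 0.9]   # illustrative annealing curve on cooling
    s = [1.0, 0.8, 0.5, 0.5]   # illustrative selectivity of one choice
    print(cumulative_selectivity(a, s))
```

Summed over all annealing choices, these values add up to the total annealed fraction gained over the schedule, so dividing by that total would give normalized selectivities; whether eShuffle normalizes in this way is not stated here.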
Given a pool of fragments competing for a template and an annealing temperature schedule, $S_{mv}$ quantifies the overall annealing selectivities. The effect of the length of overlap and number/severity of mismatches is illustrated in Fig. 1. The first plot (Fig. 1a) addresses the case when there are no mismatches. It clearly shows that there is a strong preference toward annealing events involving the maximum overlap. However, a nonnegligible portion of annealing events involves shorter overlaps. The second plot (Fig. 1b) considers the effect of the number and type of mismatches on annealing selectivities for a given overlap length. Although the great majority of annealing events involve no mismatches, some mismatch-bearing annealing events that cannot be ignored also occur. Note that, in the present implementation, the type of a mismatch affects its selectivity whereas its distance from the 3′ end does not. Next, the individual annealing statistics are utilized to infer crossover generation in the reassembled sequences.

Figure 1 Selectivity vs. overlap lengths (a) and selectivity for different degrees, types, and locations of mismatches (b). Both charts utilize the subtilisin E gene (positions 760–784), and mismatches are evenly distributed in the overlapping region.

2.3 Fragment Reassembly
The reassembly process is modeled as a successive sequence of annealing events. Specifically, the selectivity of an annealing event is assumed to depend only on the identity of the fragment added immediately before. For clarity of presentation, only fragments of a unique length L will be used in the reassembly analysis. Nevertheless, fragments with varying lengths can be incorporated in a straightforward manner as described (28,29). The key idea of the reassembly procedure is to postulate a set of recursive relations that resolves the question of what is the probability $C^x$ that a full-length reassembled sequence of B nucleotides has x crossovers. To this end, we define $P^x_{ik}$, denoting the probability that reassembly from position i to the end B of the DNA sequence will yield exactly x crossovers, given that the fragment ending at position i−1 originated from parent sequence k. The selectivities $S_{mv}$, defined earlier, can then be calculated for different annealing choices. When a fragment from parent sequence m anneals with a fragment from sequence k, either a homoduplex (m = k) or a heteroduplex (m ≠ k) is formed. Homoduplex formation implies that no crossover is generated and the recursion must still track x crossovers over the remainder of the reassembly. However, heteroduplex formation implies that only x−1 remaining crossovers must be subsequently tracked. The annealing of a fragment of length L with an overlap v implies the addition of L−v nucleotides, extending the template to position (i−1)+(L−v). This position becomes the new reassembly point completing the recursion. Summation over all parent sequences m and overlap lengths v encompasses all possible reassembly pathways:

\[ P^{x}_{ik} = \sum_{v=1}^{L-1} S_{kv}\, P^{x}_{i+L-v,\,k} + \sum_{m \neq k} \sum_{v=1}^{L-1} S_{mv}\, P^{x-1}_{i+L-v,\,m}, \quad \forall\, x > 0,\ \forall\, i > L, \text{ and } \forall\, k. \]
The resolution of this recursion requires boundary conditions at the start and end of the gene or gene fragment under consideration. At the onset of reassembly, the initial fragment covers the range i = 1 to i = L, implying that subsequent annealing events add nucleotides starting from position i = L+1. This initial fragment comes from parent m with a probability equal to the relative concentration $C_m$ of parent m in the reaction mixture. This implies that the probability $C^x$ that a reassembled sequence contains x crossovers is the probability of having x crossovers past position L+1, averaged over the relative parent concentrations:

\[ C^{x} = \sum_{m} C_m\, P^{x}_{L+1,\,m}, \quad \forall\, x = 0, 1, \ldots \]
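The recursion and the concentration average can be sketched in a few lines of Python. This is an illustration, not the eShuffle code: the selectivity table S[k][m][v-1] is a hypothetical, position-independent simplification indexed by the parent k of the last template fragment, the function name is invented here, and reassembly is taken to stop once the reassembly point passes the last position B, where only x = 0 has probability 1. For x = 0 the heteroduplex term is simply dropped.

```python
from functools import lru_cache


def crossover_number_distribution(S, conc, L, B, x_max):
    """Return [C^0, ..., C^x_max] for a toy selectivity table.

    S[k][m][v - 1]: selectivity of annealing a fragment from parent m with
    overlap v onto a template whose last fragment came from parent k
    (hypothetical layout; for each k the entries should sum to 1).
    conc[m]: relative concentration C_m of parent m.
    """
    parents = range(len(conc))

    @lru_cache(maxsize=None)
    def P(x, i, k):
        # Boundary conditions: past the end B no further crossovers can occur.
        if i > B:
            return 1.0 if x == 0 else 0.0
        total = 0.0
        for v in range(1, L):
            nxt = i + L - v  # new reassembly point after adding L - v nucleotides
            total += S[k][k][v - 1] * P(x, nxt, k)  # homoduplex: still x to go
            if x > 0:
                for m in parents:
                    if m != k:  # heteroduplex: one crossover is used up
                        total += S[k][m][v - 1] * P(x - 1, nxt, m)
        return total

    return [sum(conc[m] * P(x, L + 1, m) for m in parents)
            for x in range(x_max + 1)]


if __name__ == "__main__":
    # Two parents, L = 2 (so v = 1 only), every annealing choice equally likely.
    S = [[[0.5], [0.5]], [[0.5], [0.5]]]
    print(crossover_number_distribution(S, [0.5, 0.5], L=2, B=6, x_max=4))
```

In this toy case each of the four annealing steps is a crossover with probability 1/2, so the result should follow a binomial distribution, which gives a quick check of the recursion.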
The boundary conditions for the end position B ensure that no crossovers occur beyond position i = B:

\[ P^{0}_{ik} = 1, \quad \forall\, i > B \text{ and } \forall\, k, \]
\[ P^{x}_{ik} = 0, \quad \forall\, x > 0,\ \forall\, i > B, \text{ and } \forall\, k. \]

Because reassembly is a bidirectional process, the reassembly algorithm is also executed in the reverse direction with the complementary DNA sequences, and the results are combined. A flowchart outlining the proposed reassembly procedure is shown in Fig. 2.

Figure 2 A flowchart of the eShuffle reassembly algorithm.

Interestingly, the original application of the reassembly algorithm overestimated the total number of crossovers, especially for shuffling sequences that share very high sequence identity. Closer inspection revealed that this was due to the formation of heteroduplexes with fragments involving perfect sequence identity with the growing template. Even though they are indeed crossovers, according to the formal crossover definition, they are completely undetectable experimentally and, more importantly, they do not contribute any diversity. Therefore, the term silent crossover was proposed for them, and the reassembly algorithm was revised to exclude them. Specifically, if the annealing of a fragment from parent m to a growing template ending with a fragment from parent k is equivalent to the continuation of the template with nucleotides from parent k, no crossover is counted.

The proposed reassembly procedure allows the estimation of the fraction of the reassembled sequences containing x = 0, 1, … crossovers. By redefining what constitutes a desirable crossover, different types of crossovers can be assessed separately. For example, in the family DNA shuffling of sequences A, B, and C, the statistics of all six possible types of crossovers AB, BA, AC, CA, BC, and CB can be tracked independently. In addition, one could even track homoduplex extension events such as AA, BB, or CC.

Next, the statistics of the distribution of these crossovers along the reassembled sequences is examined. Specifically, the question addressed is, "what is the probability that a given position i in a reassembled sequence is the site of a crossover (i.e., end point of a heteroduplex annealing event)?" This probability depends on the parent origin of the fragment ending at position i−1. Thus, the probability that a fragment from parent k ends exactly at position i−1 is defined as $T_{ik}$. A recursion is then established in a similar manner as before. A fragment from parent m ends at position i−1 if and only if it was added to a fragment from parent k ending at position i−L+v with an overlap v. The probability for this particular duplex formation event can be quantified by multiplying the selectivity $S_{mv}$ times the probability $T_{i-L+v,\,k}$ that the template is positioned appropriately:

\[ T_{im} = \sum_{k} \sum_{v=1}^{L-1} T_{i-L+v,\,k}\, S_{mv}, \quad \forall\, i > L+1 \text{ and } \forall\, m. \]
Boundary conditions ensure that the first nucleotide added to the original fragment comes from a parent sequence k with a probability proportional to its relative concentration. Furthermore, no fragment may end before position i = L:

\[ T_{L+1,\,k} = C_k, \quad \forall\, k, \]
\[ T_{ik} = 0, \quad \forall\, i \le L \text{ and } \forall\, k. \]

Once the probability $T_{ik}$ that a particular type of template k ends immediately before position i is known, it can be multiplied by the selectivity of a crossover-generating annealing event $S_{mv}$ and summed over all possible annealing choices to infer the probability $P_i^{\mathrm{cross}}$ that position i is the site of a crossover:
\[ P_i^{\mathrm{cross}} = \sum_{k} \sum_{v=1}^{L-1} \sum_{m \neq k} T_{ik}\, S_{mv}. \]
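The forward recursion for $T_{ik}$ and the resulting crossover probabilities can be sketched as follows in Python. As a simplifying assumption made only for this illustration, the selectivities are stored in a hypothetical position-independent table S[k][m][v-1] indexed by the template parent k, the incoming fragment parent m, and the overlap v; the function name is likewise invented.

```python
def crossover_position_probabilities(S, conc, L, B):
    """Return {i: P_i^cross} for positions i = L+1, ..., B.

    S[k][m][v - 1]: selectivity of annealing a fragment from parent m with
    overlap v onto a template ending in a fragment from parent k
    (hypothetical layout; for each k the entries should sum to 1).
    conc[k]: relative concentration C_k of parent k.
    """
    n = len(conc)
    # T[i][k]: probability that a fragment from parent k ends at position i - 1.
    # Rows i <= L stay zero, which encodes the boundary condition T_ik = 0.
    T = [[0.0] * n for _ in range(B + 2)]
    for k in range(n):
        T[L + 1][k] = conc[k]
    for i in range(L + 2, B + 1):
        for m in range(n):
            T[i][m] = sum(T[i - L + v][k] * S[k][m][v - 1]
                          for k in range(n) for v in range(1, L))
    p_cross = {}
    for i in range(L + 1, B + 1):
        p_cross[i] = sum(T[i][k] * S[k][m][v - 1]
                         for k in range(n)
                         for m in range(n) if m != k
                         for v in range(1, L))
    return p_cross


if __name__ == "__main__":
    # Two parents, L = 2 (so v = 1 only), every annealing choice equally likely.
    S = [[[0.5], [0.5]], [[0.5], [0.5]]]
    p = crossover_position_probabilities(S, [0.5, 0.5], L=2, B=6)
    print(p, sum(p.values()))
```

In this toy case every position from L+1 to B is a reassembly point and carries a crossover with probability 1/2, so the summed probabilities equal the mean of the corresponding binomial crossover-number distribution.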
Again, by tailoring the definition of a crossover, the distribution of different types of crossovers (i.e., AB, BC, or AC) along the sequence can be assessed separately. A consistency check reveals that the average number of crossovers calculated based on the probabilities $P_i^{\mathrm{cross}}$ quantifying crossover density along the DNA sequence ($\sum_i P_i^{\mathrm{cross}}$) is identical to the one obtained based on the crossover number distribution calculated earlier ($\sum_x x\, C^x$). Given this versatile algorithmic framework, the statistics of any type of crossover can be quantified both in terms of variability among the reassembled sequences and along the length of the gene. Predictions obtained based on the analysis described above are next contrasted against experimental data from DNA shuffling experiments reported in the literature.

2.4 Comparisons with Experimental Results
Although directed evolution studies are being reported in the literature at an accelerating pace, only a few studies report DNA sequencing results for naive (i.e., unselected) DNA libraries. Partial DNA sequencing results allowing for the estimation of the number of crossovers in a small subset of the reassembled sequences are available for the following five studies. Computer simulation of DNA shuffling of these systems provides the basis for the comparisons. Every effort was made to ensure that the fragment length, annealing temperature, and salt and DNA concentrations matched the ones in the experimental study. When no information was provided, default values from the original DNA shuffling protocol (14,15) were adopted.

The first system considered is composed of two 465-bp IL-1β genes (human and murine) (15) with a sequence identity of only 75%. An extremely low annealing temperature of 25°C was used to boost the generation of crossovers. Nine colonies were sequenced for a total of 17 crossovers, implying an average of 1.9 per sequence. Simulation results are in close agreement with the experiment, predicting an average of 1.5 crossovers.

The next system involved the family DNA shuffling of four class C cephalosporinase genes, 1.2 kb in length with pairwise sequence identities ranging from 58% to 82% (18). It was reported that neither of the two active clones sequenced contained any fragments from the Yersinia enterocolitica gene (third gene). The question is whether this occurred because fragments originating from this gene had a detrimental effect on activity, or simply because pieces from this gene were disproportionately underrepresented in the naive library due to the lack of sufficiently long stretches of near-perfect sequence identity with the other three genes. The average sequence identities of each one of the four genes against the remaining three are 70%, 70%, 65%, and 59%, respectively. Simulation results predict that 36% of the naive sequences contain at least one crossover. The fraction of crossover-bearing sequences containing at least one piece from each one of the four genes is 85%, 95%, 7%, and 19%, respectively. This indicates that the Y. enterocolitica gene (the third one) is, by far, the least represented even though it is not the one with the lowest sequence identity. This suggests a possible explanation for the absence of any piece of Y. enterocolitica in the most active clones.

The next system studied involved two genes for glycinamide ribonucleotide transformylase, Escherichia coli (purN) and human (hGART) (30), with a very low sequence identity of 49%. Here the following staggered portions of the two genes were shuffled (E. coli positions 1–434 and human positions 164–611), implying that crossovers could only be formed in the 271-bp shared region (47% sequence identity). This arrangement requires that all reassembled genes of full length start with the E. coli gene and end with the human gene, yielding an odd number of crossovers. In the experimental study, only single-crossover clones were observed among the 10 sequenced clones. This is consistent with the simulation prediction that the ratio of the number of reassembled sequences with three or more crossovers to the number of sequences with a single crossover is less than $10^{-9}$.

A system with a relatively high sequence identity is analyzed next. It involves the DNA shuffling of two biphenyl oxygenases sharing a sequence identity of 87% (31). For this system, an average of 3.3 crossovers per sequence is observed experimentally (six sequenced clones), whereas the simulation suggests a slightly smaller average of 2.8.

The last study is the only one where the simulation results deviated from the experimentally observed crossover averages.
It involved the DNA shuffling of a 1.3-kb gene for wild-type (wt) subtilisin E and that of a clone (1E2A) differing by only 10 point mutations (32). Slightly larger fragments in the range of 20–50 bases were used in place of the default fragment length range of 10–50 bases. One would expect that a large average number of crossovers would be generated in this system because only 10 point mutations are present, implying a sequence identity of 99.2%. However, this is not observed experimentally as only an average of 1.9 crossovers per sequence is reported (32). The simulation results, on the other hand, are consistent with the intuitive expectation, predicting an average of 3.6 crossovers per reassembled sequence. The randomly chosen sequences may not have been representative of the entire DNA library. For instance, recombinations between mutations at positions 520 and 732 in clone 1E2A must be occurring independently because the stretch of perfect identity is much wider than even the maximum fragment size. However, a crossover occurs in only 10% of the reported sequences instead of the 50% frequency expected for independent reassembly. With the exception of this last example, simulation predictions are in good agreement with the published experimental results without adjustable model parameters.

2.5 Subtilase Case Study
Subtilases are serine proteases (33) extensively engineered with directed evolution experiments (19,34,35). A set of 12 subtilases including subtilisins E, BPN′, Carlsberg, 147, ALP I, PB92, and Sendai; serine proteases C and D; proteinases K and R; and thermitase is next considered to highlight the effect of fragmentation length, annealing temperature, sequence identity, and number of shuffled sequences on the number, type, and distribution of crossovers. We chose to mirror recent subtilase-directed evolution experiments (19) by analyzing the shuffling of only a 500-bp subgenomic region. The average pairwise sequence identity is 58%, ranging from 44% to 90%. First, a pair with a high sequence identity of 80% (subtilisin E and subtilisin BPN′) is considered. As shown in Fig. 3a, for a fragmentation length of L = 50 bases, 44% of the reassembled sequences involve no crossovers, 37% involve one crossover, and 15% involve two crossovers, with diminishing percentages for sequences with more than two crossovers. As the fragment length is reduced, a nonlinear increase of crossovers is observed. This nonlinear increase in the average number of crossovers as a function of L is more clearly depicted in Fig. 3b. Interestingly, the same plot (dashed line) reveals a dramatic increase of silent crossovers for very small fragment lengths (i.e., L ≤ 20). Fig. 4 illustrates the distribution of crossovers superimposed against the sequence identity along the sequence. It shows that crossovers are preferentially aggregated in regions of near-perfect sequence identity, forming a characteristic double peak. The double peak implies that annealing events make full use of the available sequence identity, giving rise to two distinct peaks at the two flanking positions of the sequence identity stretch. Larger fragments afford a wider range of overlaps, flattening the two peaks, whereas smaller fragments are capable of generating crossovers in relatively narrow regions of high sequence identity.
However, in DNA shuffling, not a single fragmentation length L is employed but rather a distribution of fragment sizes, typically in the range of 10–50 bases, with a size distribution described by an exponentially decaying function (28,29). When a range of fragment sizes is employed for the above example, computational results reveal that the crossover statistics are almost identical with the case of utilizing a single "effective" fragment size which, for the 10- to 50-base range, is 25 bases.

Next, the effect of annealing temperature on crossover generation is studied. Annealing temperature was found to affect the crossover statistics through two underlying mechanisms (see Fig. 5). Specifically, for medium to large fragments, lower annealing temperatures imply that the melting temperatures of more annealing choices containing mismatches (i.e., heteroduplexes) are encountered, yielding more crossovers upon extension. However, for very small fragments at high temperatures, the entropic contribution to the free energy of annealing dominates, blurring the distinction between homoduplexes and heteroduplexes, causing a sharp increase in the total number of crossovers. Clearly, as in the case of fragment length, the annealing temperature cannot be arbitrarily reduced because at some point, fragments cease to exhibit strong affinity for annealing in-frame, and out-of-frame additions start to overwhelm the reassembly process.

Figure 3 (a) Crossover number distribution for DNA shuffling of subtilisin E and subtilisin BPN′ for L = 15, 25, and 50 bases. (b) Average number of crossovers per sequence for the same system plotted vs. fragment length in bases. The dotted line includes silent crossovers.
Figure 4 Probability of generating a crossover along the length of the sequence for the (subtilisin E and subtilisin BPN′) system for L = 15, 25, and 50 bases along the subregion 485–979. Black columns in the bottom strip chart denote identical nucleotides for both sequences, and white lines denote mismatches.
The limits of DNA shuffling are explored by choosing the low sequence identity pair (serine protease D and proteinase K) that has a 46% sequence identity. As expected, very few crossovers are predicted (see Table 2), with only a single narrow region at the end of the sequence coinciding with a short stretch of high sequence identity. Subsequently, the high sequence identity pair (subtilisin E and subtilisin BPN′) is shuffled in silico together with the low sequence identity pair (serine protease D and proteinase K) in equal ratios. The key question is whether the low sequence identity pair will simply dilute the fragment pool that can form heteroduplexes, depressing crossover generation by a factor of 2, or whether synergism in the reassembly will dominate. Even though the average pairwise sequence identity for the four subtilase system is as low as 58%, a number of crossovers comparable to that of the (subtilisin E and subtilisin BPN′) single-pair case is found (see Table 2). This implies that synergistic reassembly is taking place, alluding to the contribution of "bridging" crossovers by the low sequence identity pairs.

The full power of synergistic reassembly is revealed when all 12 subtilases are included, providing a computational verification of what is seen experimentally with family DNA shuffling, especially for smaller fragments. Even though the average pairwise sequence identity is only 58%, at least as many crossovers are generated (see Table 2) as for the pair with a high sequence identity of 80%. More importantly, these crossovers span the entire sequence range (see Fig. 6). Admittedly though, the distribution is still multimodal, with peaks tracking the location of high sequence identity, a signature of the annealing-based reassembly characteristic of DNA shuffling. In Sec. 3, we examine the SCRATCHY protocol, which is capable of generating crossovers in nonhomologous regions and reducing the bias seen in Figs. 4 and 6.

Figure 5 Effect of annealing temperature on the number of crossovers produced for the high sequence identity subtilase pair (subtilisin E and subtilisin BPN′).

Table 2 Average Numbers of Crossovers per Sequence Calculated for Various Fragment Lengths L and Parental Sets

L (bases)  High sequence identity pair  Low sequence identity pair  Set of four subtilases  Set of 12 subtilases
15         2.9                          0.5                         2.3                     4.8
25         1.3                          0.1                         0.8                     1.4
50         0.8                          0.0                         0.5                     0.8
Figure 6 Crossover probability distributions for in silico family DNA shuffling of all 12 subtilases (L = 15).
3 MODELING SCRATCHY

3.1 Background
As mentioned above, sequence homology-dependent methods for recombining genes have been successful at evolving improved proteins (1–13). An inherent limitation of these methods is their dependence on DNA sequence identity for generating diversity. This precludes the creation of crossovers between genes at loci of low homology, biasing crossover positions toward regions of highest homology. In general, a severe bias toward parental recombination is observed when sequences with less than 70% sequence identity are DNA-shuffled. Because protein structure is more frequently conserved than DNA homology, homology-dependent methods for recombining genes may potentially exclude solutions to protein engineering problems. The need for a recombination protocol capable of freely exchanging genetic diversity without sequence identity limitations motivated the development of incremental truncation for the creation of hybrid enzymes (ITCHY). ITCHY allows one to create comprehensive fusion libraries between fragments of genes without any sequence dependency (30,36,37). However, the main drawback of this method, as well as of similar techniques (38), is that members of these libraries contain only one crossover per gene. As suggested earlier (39), the DNA shuffling of ITCHY libraries could potentially introduce multiple crossovers between the genes of interest by preserving ITCHY crossovers (prepositioned crossovers) in the starting material and by recombining regions of homology between genes (Fig. 7). This combination of ITCHY and DNA shuffling has been named SCRATCHY.

3.2 eSCRATCHY Modeling Framework
An in silico modeling framework for crossover statistics prediction, named eSCRATCHY, was developed in conjunction with experimental work on SCRATCHY. The modeling framework builds on the eShuffle program presented above for assessing the generation of crossovers in the context of DNA shuffling (17). SCRATCHY can be abstracted as the family DNA shuffling of an artificially created superfamily containing all single crossover hybrids between the two genes of interest. The presence of fragments that contain prepositioned crossovers during reassembly extends the sequence space accessed by SCRATCHY compared to the one available to traditional DNA shuffling. Therefore, when fragment–fragment hybridization is considered in the reassembly algorithm of eSCRATCHY, it is necessary to keep track of, not only the overlapping region, but also whether one (or both) fragments
Figure 7 Schematic overview of SCRATCHY. Initially, individual incremental truncation (ITCHY) libraries of the two complementary constructs are created (a). Following functional selection (b) to recover in-frame hybrids of parental size, the libraries are mixed and submitted to DNA shuffling (c).
contains a prepositioned crossover and whether this crossover is located within or outside the overlapping region (Fig. 8). These considerations give rise to three hypothetical, yet distinct, mechanisms for generating crossovers in contrast to the single mechanism (i.e., the extension of a heteroduplex) encountered in eShuffle, namely, (a) the extension of a heteroduplex as in eShuffle, (b) the incorporation of a prepositioned crossover, or (c) the extension of a hybrid duplex, which occurs when a fragment already
Figure 8 The three mechanisms for generating crossovers that are tracked in silico.
containing a prepositioned crossover anneals with another fragment with the crossover positioned in the duplex. Hybrid duplexes are part stabilizing homoduplexes and part crossover-generating heteroduplexes, presumably enabling the SCRATCHY protocol to generate crossovers within narrower sequence identity stretches than DNA shuffling. It is important to note that these three hypothesized mechanisms reflect, and thus depend upon, the abstraction of the proposed reassembly algorithm as a recursive sequence of annealing events. Clearly, the sequence of actual hybridization events occurring in the reacting mixture over multiple cycles defines a process much more complex than the level of detail captured within eSCRATCHY. Specifically, hybrid duplexes may also occur in DNA shuffling but only after the first reassembly cycle and only between fragments arising from heteroduplex extension in regions of near-perfect sequence identity that are largely absent in low-sequence identity systems. Annealing choices from all three mechanisms are handled in a straightforward manner within the free energy-based scoring system (24). In addition, the reassembly algorithm is modified to check for each of the three crossover types for every fragment annealing event. Additional modifications were made to improve computational performance. The family of single crossover sequences generated in the ITCHY step is much larger than that typically used for molecular breeding, so the original eShuffle program (whose CPU time scales as the square of the number of parental sequences) was customized. Specifically, fragments with identical sequences from different ITCHY parents were pooled because they do not change the outcome of fragment–fragment extensions considered by the reassembly algorithm. By aggregating their concentrations instead of considering them separately, CPU times were reduced to scale linearly with the number of parental sequences.
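The pooling step described above can be sketched in a few lines. This is a hedged illustration rather than the eSCRATCHY implementation: the function names, the uniform fragmentation into all windows of a fixed length, the even split of each parent's concentration over its windows, and the choice to key pooled fragments by start position and sequence are all assumptions made for the example.

```python
from collections import defaultdict


def enumerate_fragments(parents, length):
    """Yield (start, fragment, concentration) for every window of `length` nt.

    `parents` maps a parent name to (sequence, concentration). Each parent's
    concentration is split evenly over its windows (illustrative assumption).
    """
    for sequence, concentration in parents.values():
        n_windows = len(sequence) - length + 1
        for start in range(n_windows):
            yield start, sequence[start:start + length], concentration / n_windows


def pool_fragments(fragments):
    """Merge fragments with the same position and sequence, summing concentrations."""
    pooled = defaultdict(float)
    for start, fragment, concentration in fragments:
        pooled[(start, fragment)] += concentration
    return dict(pooled)
```

Because single crossover hybrids of the same two genes share most of their sequence, many windows are identical across parents; after pooling, the reassembly step only needs to visit each distinct fragment once, weighted by its summed concentration.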
In addition, we found that for fragmentation lengths longer than 40 nt, approximating individual duplex melting curves as step functions at the duplex’s melting temperature provided a tractable and accurate approximation of the annealing thermodynamics because melting
temperatures for larger fragments are significantly above the applied annealing temperature. A 40-nt fragment reassembly confirmed that predictions vary by less than 5% when this approximation is utilized. eSCRATCHY was next used to address questions concerning the preservation of prepositioned crossovers in reassembled sequences, as well as their contribution toward multiple crossover sequences in comparison with those that also occur in homology-based reassembly. In particular, the effects of fragmentation length and pairwise sequence identity on the number and positioning of crossovers produced and the relative contribution of each of the three postulated crossover mechanisms are examined. The purN/hGART system mentioned above (also see Ref. 30) is first examined in detail. In this case study, both in-frame and parental size selection are ‘‘idealized’’ so that the crossovers present in the ITCHY library are not biased in any manner. Predictions from eSCRATCHY indicate that 52% of the reassembled sequences have multiple crossovers for a fragmentation length of 60 nt even though the nucleotide sequence identity is only 49% in the overlapping region. Note that even for fragments as short as 20 nt, predictions by eShuffle indicate that almost 99.9% of sequences reassembled by DNA shuffling alone will be wild type. Interestingly, in contrast to DNA shuffling, eSCRATCHY predicts that fragmentation length has little, if any, effect on the average number of crossovers produced per sequence (Fig. 9). Smaller fragments imply that more annealing choices are available during reassembly and thus more opportunities to generate crossovers, but at the same time, a
Figure 9 Probability that a hybrid sequence contains a given number of crossovers after the “idealized” SCRATCHY of PurN and hGART for fragmentation sizes of 20, 40, 60, and 80 nucleotides (54°C annealing temperature). Note that the distributions are similar for each of the sizes.
smaller portion of fragments contains prepositioned crossovers. These two effects appear to cancel each other for systems with low sequence identity. Thus, relatively large fragments can be utilized in SCRATCHY without reducing the number of crossovers, allowing for easier purification, isolation, and reassembly. In addition, predictions suggest that neglecting hybrid duplex crossovers in eSCRATCHY would produce drastically different results, as these crossovers contribute 47% of the total number of crossovers. This “emergent” mechanism, not present in eShuffle, is almost as frequent as the prepositioned crossover mechanism. Heteroduplex crossovers are negligible, as expected, for a system with 49% sequence identity. The distribution of crossovers along the sequence is shown in Fig. 10. Prepositioned crossovers are present almost uniformly along the entire sequence, showing that the unbiased nature of the ITCHY library is retained. In contrast, hybrid duplex-based crossovers track regions of high sequence identity and are less evenly distributed. In contrast to homology-based methods, the sum of all types of crossovers fills the entire sequence length with an average frequency of 0.65% per position. The “signature” of DNA shuffling can still be detected in the form of peaks tracking regions of high sequence identity. Next, we examined the effect of pairwise sequence identity on crossover frequencies for the recombination of the following six sequences with purN using eSCRATCHY and eShuffle (sequence identity with purN in the overlapping region in parentheses): GAR transformylases from human (49%),
Figure 10 Distribution of the different types of crossovers along the sequence after the “idealized” SCRATCHY of PurN and hGART (20-nt fragments, 54°C annealing temperature). Note that no gaps appear along the entire crossover range when the crossover types are totaled. Heteroduplex crossovers are negligible and are not pictured.
Figure 11 A comparison of the numbers of crossovers predicted for “idealized” SCRATCHY and DNA shuffling for sequence pairs of various sequence identities (20-nt fragments, 54°C annealing temperature). White bars represent contributions to SCRATCHY from prepositioned crossovers; black bars represent hybrid duplex crossovers; and crosshatched bars represent heteroduplex crossovers.
Pseudomonas aeruginosa (54%), Pasteurella multocida (60%), Vibrio cholerae (64%), Salmonella typhimurium (79%); and methionyl-tRNA formyltransferase from E. coli (33%). As seen in Fig. 11, predictions suggest that SCRATCHY is capable of generating crossovers for all sequence pairs, regardless of sequence identity. On the other hand, DNA shuffling requires an approximate “threshold” sequence identity of 60% before any appreciable crossover generation occurs. Even for high sequence identities, we predict that SCRATCHY outperforms DNA shuffling by an average of 1.5 crossovers per sequence. Both prepositioned and hybrid duplex crossover mechanisms remain prevalent for the entire range of sequence identities, and the heteroduplex mechanism begins to contribute at identities greater than 60% (Fig. 11). Using parameters that reflect the specifics of the actual experimental library, we then reexamined eSCRATCHY’s predictions for the naive purN/hGART SCRATCHY library and compared them to the experimental data.
3.3 Comparisons with Experimental Results
3.3.1 Experimental SCRATCHY
Two ITCHY libraries encoding either the PurN/hGART (PGX) or the hGART/PurN (GPX) hybrid pairs were constructed (Fig. 7a), with members
distributed over the entire sample space, comparable to data from previous libraries (30, 37). Functional selection (Fig. 7b) was used to select for in-frame members of parental size for DNA shuffling. Although the profile of representative sequences in such a library is biased as shown in Fig. 12, the distribution of the two directional libraries allows for multiple crossovers to occur in the overlapping region. Equal amounts of both selected libraries (PGX and GPX) were DNA-shuffled (Fig. 7c), and the resulting reassembled sequences were amplified. The primer pair used for amplification anneals to outside portions on either side of the gene, yielding a comprehensive library of possible combinations
Figure 12 Profiles of crossover positions for the (a) PGX and (b) GPX libraries, including experimental counts (bars) and smooth fitted functions of crossover probability (lines).
including wild-type constructs. From this naive library, the hybrid genes of over 50 individual colonies were analyzed by DNA sequencing, and the results are summarized in Fig. 13. For further information on the SCRATCHY protocol, see Ref. 17. Analysis of the library revealed several interesting characteristics. Most importantly, a significant portion of the sample sequences had multiple crossovers. When considering the location and number of the crossover points in the sequences, an important experimental bias emerges. The majority of sequences (70%) in the library are reassembled duplicates of GPX
Figure 13 Sequence data for the naive SCRATCHY library. The dotted lines indicate the borders of the overlapping region between amino acid positions 54 and 144.
library members, as if the library were present at a higher concentration than the PGX library during DNA shuffling. Further examination of the sequencing data reveals a number of additional interesting features. The reassembly of parental wt sequences in SCRATCHY, in contrast to DNA shuffling of low homology sequences, is not dominant. Although a few wt-PurN sequences are identified in the naive library, wt-hGART is absent. The deficiency of wt-hGART in the recombination mixture is explained by the paucity of a contiguous bridge of hGART fragments traversing the entire gene length due to the uneven distribution of fusion points in the two ITCHY libraries (Fig. 12). The same bias, amplified by the higher effective concentration of the GPX library, is also responsible for the preponderance of hGART/PurN/hGART double crossover sequences over PurN/hGART/PurN hybrids. Reassembly of a PurN/hGART/PurN hybrid requires both a PurN-to-hGART crossover at the beginning of the overlapping region and a hGART-to-PurN crossover near the end of the overlapping region. However, both of these crossovers occur infrequently in the starting material, explaining the absence of such hybrids. In summary, the data show that the characteristics of the ITCHY libraries are inherited by the SCRATCHY library.
3.3.2 eSCRATCHY Comparisons
Accurate in silico analysis required the integration of two experimental presets: the crossover distribution of the employed ITCHY libraries and the fragment reassembly-based bias toward hGART/PurN library members. First, the uneven crossover distributions caused by the functional selection of the ITCHY libraries were accounted for in the eSCRATCHY program by fitting the observed crossover data with a smooth function (Fig. 12), thus customizing the relative concentration of each of the ITCHY library members. Second, as seen in the naive library, hGART/PurN library members dominate the reassembly process. This effect was accounted for by adjusting the concentration ratio of the two libraries to 86% GPX:14% PGX. This ratio was calculated by examining the 5′- and 3′-termini of the library members. The relative effective concentration of the GPX library was estimated by counting the number of sequences beginning with hGART (47 sequences) and ending with PurN (39 sequences), a total of 86. Similarly, the PGX library estimate totaled 14 (3 + 11), resulting in the 86:14 ratio. Together, these two modifications result in crossover predictions that are in good agreement with the experimental sequence data for the naive library. The predicted distribution matches well with what is found experimentally (Fig. 14a). The discrepancy between the numbers of multiple crossovers predicted in the “idealized” case (Fig. 9) and those found in the experiment can be attributed to the bias in the starting
Figure 14 Comparing eSCRATCHY predictions (70 nt fragmentation length, 54°C annealing temperature) for (a) the number of crossovers per naive library member and (b) naive library crossover positions against experimental data. In (b), data are grouped in histogram form with each bar representing a range of 10 nt.
material. In addition, predictions for crossover position statistics (Fig. 14b) capture the uneven nature of crossovers found in the reassembled sequences as a result of the same bias, which also leads to an increased 3.6:1 ratio of prepositioned/hybrid duplex crossovers compared to the ‘‘idealized’’ case. Another interesting aspect is the contribution of crossovers originating from incremental truncation or homology-based recombination. Experimentally, all fusion points observed in the SCRATCHY libraries have counterparts at locations corresponding to prepositioned crossovers, originating from the ITCHY libraries. However, the origin of the crossovers in the
homologous region between amino acids 100 and 110 could not be attributed conclusively to ITCHY or DNA shuffling. In the eSCRATCHY model, heteroduplex crossovers are rare across the entire sequence.
4 SUMMARY
eShuffle provided for the first time a quantitative framework for the in silico exploration of many “what if” scenarios in terms of fragmentation length, annealing temperature, and parental choices in the context of DNA shuffling. Comparisons of the eShuffle predictions against experimental data revealed good agreement, particularly in light of the fact that there are no adjustable parameters in the modeling framework. The only parameters are the free energy contributions, used unchanged from literature sources (24). Therefore, no reparameterization is needed when either the experimental conditions or the sequences to be shuffled change, providing a versatile framework for comparing different protocol choices and setups. In the context of family DNA shuffling (18,19), the eShuffle program enabled the estimation of the relative contribution of fragments from different parental sequences to the combinatorial DNA library. Results revealed that the pairwise sequence identities between the parental sequences do not always explain the observed parental crossover frequencies in the libraries. eShuffle also quantified synergistic reassembly in family DNA shuffling and revealed the swapping of identical fragments between high-sequence identity parental pairs (silent crossovers). The eSCRATCHY framework (a) led to a newly hypothesized mechanism for the generation of crossovers based on the extension of hybrid duplexes, (b) revealed that fragmentation length has little effect on crossover statistics, and (c) verified complete coverage of the gene length with potential crossover sites. An in silico case study of six pairs of parental sequences ranging in sequence identity from 33% to 79% revealed that SCRATCHY outperforms DNA shuffling by approximately 1.5 crossovers per sequence for all six sequence pairs.
eSCRATCHY statistics were in good agreement with experimental naive library sequence data after adjusting the concentration ratio of the incremental truncation libraries. Both eSCRATCHY and experimental results confirmed that the crossover distributions of the incremental truncation libraries are inherited by the SCRATCHY library.
ACKNOWLEDGMENTS
The authors would like to thank Professor Stephen Benkovic and Dr. Stefan Lutz for developing the SCRATCHY protocol and for other helpful discussions, and Dr. Shankar Vaidyaraman for help with the model implementation. Financial support by National Science Foundation Award BES0120277, National Science Foundation Career Award CTS9701771, and the Life Science Consortium at Penn State is gratefully acknowledged along with hardware support by the IBM-SUR program.
REFERENCES
1. S Brakmann. Discovery of superior enzymes by directed molecular evolution. ChemBioChem 2:865–871, 2001.
2. IP Petrounia, FH Arnold. Directed evolution of enzymatic properties. Curr Opin Biotechnol 11:325–330, 2000.
3. C Schmidt-Dannert. Directed evolution of single proteins, metabolic pathways, and viruses. Biochemistry 40:13125–13136, 2001.
4. K Miyazaki, PL Wintrode, RA Grayling, DN Rubingh, FH Arnold. Directed evolution study of temperature adaptation in a psychrophilic enzyme. J Mol Biol 297:1015–1026, 2000.
5. A Crameri, G Dawes, E Rodriguez, S Silver, WPC Stemmer. Molecular evolution of an arsenate detoxification pathway by DNA shuffling. Nat Biotechnol 15:436–438, 1997.
6. K Furukawa. Engineering dioxygenases for efficient degradation of environmental pollutants. Curr Opin Biotechnol 11:244–249, 2000.
7. F Bruhlmann, W Chen. Tuning biphenyl dioxygenase for extended substrate specificity. Biotechnol Bioeng 63:544–551, 1999.
8. LP Wackett. Directed evolution of new enzymes and pathways for environmental biocatalysis. Ann NY Acad Sci 864:142–152, 1998.
9. PA Patten, RJ Howard, WPC Stemmer. Applications of DNA shuffling to pharmaceuticals and vaccines. Curr Opin Biotechnol 8:724–733, 1997.
10. RG Whalen, R Kaiwar, NW Soong, J Punnonen. DNA shuffling and vaccines. Curr Opin Mol Ther 3:31–36, 2001.
11. G Marzio, K Verhoef, M Vink, B Berkhout. In vitro evolution of a highly replicating, doxycycline-dependent HIV for applications in vaccine studies. Proc Natl Acad Sci USA 98:6342–6347, 2001.
12. NW Soong, L Nomura, K Pekrun, M Reed, L Sheppard, G Dawes, WPC Stemmer. Molecular breeding of viruses. Nat Genet 25:436–439, 2000.
13. SK Powell, MA Kaloss, A Pinskstaff, R McKee, I Burimski, M Pensiero, E Otto, WP Stemmer, NW Soong. Breeding of retroviruses by DNA shuffling for improved stability and processing yield. Nat Biotechnol 18:1279–1282, 2000.
14. WPC Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature (London) 370:389–391, 1994.
15. WPC Stemmer. DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution. Proc Natl Acad Sci USA 91:10747–10751, 1994.
16. S Lutz, M Ostermeier, GL Moore, CD Maranas, SJ Benkovic. Creating multiple crossover libraries independent of sequence identity. Proc Natl Acad Sci USA 98:11248–11253, 2001.
17. GL Moore, CD Maranas, S Lutz, SJ Benkovic. Predicting crossover generation in DNA shuffling. Proc Natl Acad Sci USA 98:3226–3231, 2001.
18. A Crameri, S Raillard, E Bermudez, WPC Stemmer. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature (London) 391:288–291, 1998.
19. JE Ness, M Welch, L Giver, M Bueno, JR Cherry, TV Borchert, WPC Stemmer, L Minshull. DNA shuffling of subgenomic sequences of subtilisin. Nat Biotechnol 17:893–896, 1999.
20. HT Allawi, J SantaLucia. Nearest-neighbor thermodynamics of internal A–C mismatches in DNA: sequence dependence and pH effects. Biochemistry 37:9435–9444, 1998.
21. HT Allawi, J SantaLucia. Nearest neighbor thermodynamic parameters for internal G–A mismatches in DNA. Biochemistry 37:2170–2179, 1998.
22. HT Allawi, J SantaLucia. Thermodynamics of internal C–T mismatches in DNA. Nucleic Acids Res 26:2694–2701, 1998.
23. HT Allawi, J SantaLucia. Thermodynamics and NMR of internal G–T mismatches in DNA. Biochemistry 36:10581–10594, 1997.
24. J SantaLucia. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest neighbor thermodynamics. Proc Natl Acad Sci USA 95:1460–1465, 1998.
25. N Peyret, PA Seneviratne, HT Allawi, J SantaLucia. Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A–A, C–C, G–G, and T–T mismatches. Biochemistry 38:3468–3477, 1999.
26. S Bommarito, N Peyret, J SantaLucia. Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res 28:1929–1934, 2000.
27. PM Howley, MF Israel, M Law, MA Martin. A rapid method for detecting and mapping homology between heterologous DNAs. Evaluation of polyomavirus genomes. J Biol Chem 254:4876–4883, 1979.
28. GL Moore, CD Maranas. Modeling DNA mutation and recombination for directed evolution experiments. J Theor Biol 205:483–503, 2000.
29. GL Moore, CD Maranas, KR Gutshall, JE Brenchley. Modeling and optimization of DNA recombination. Comp Chem Eng 24:693–699, 2000.
30. M Ostermeier, JH Shim, SJ Benkovic. A combinatorial approach to hybrid enzymes independent of DNA homology. Nat Biotechnol 17:1205–1209, 1999.
31. T Kumamaru, H Suenaga, M Mitsuoka, T Watanabe, K Furukawa. Enhanced degradation of polychlorinated biphenyls by directed evolution of biphenyl dioxygenase. Nat Biotechnol 16:663–666, 1998.
32. H Zhao, FH Arnold. Functional and nonfunctional mutations distinguished by random recombination of homologous genes. Proc Natl Acad Sci USA 94:7997–8000, 1997.
33. RJ Siezen, JA Leunissen. Subtilases: The superfamily of subtilisin-like serine proteases. Protein Sci 6:501–523, 1997.
34. K Chen, FH Arnold. Enzyme engineering for nonaqueous solvents: Random mutagenesis to enhance activity of subtilisin E in polar organic media. Bio/Technology 9:1073–1077, 1991.
35. K Chen, AC Robinson, MEV Dam, P Martinez, C Economou, FH Arnold. Enzyme engineering for nonaqueous solvents: II. Additive effects of mutations on the stability and activity of subtilisin E in polar organic media. Biotechnol Prog 7:125–129, 1991.
36. M Ostermeier, SJ Benkovic. Construction of hybrid gene libraries involving the circular permutation of DNA. Biotechnol Lett 23:303–310, 2001.
37. S Lutz, M Ostermeier, SJ Benkovic. Rapid generation of incremental truncation libraries for protein engineering using alpha-phosphothioate nucleotides. Nucleic Acids Res 29:E16, 2001.
38. V Sieber, CA Martinez, FH Arnold. Libraries of hybrid proteins from distantly related sequences. Nat Biotechnol 19:456–460, 2001.
39. M Ostermeier, AE Nixon, SJ Benkovic. Incremental truncation as a strategy in the engineering of novel biocatalysts. Bioorg Med Chem 7:2139–2144, 1999.
9 Rational Redesign of Enzymes
Jens Erik Nielsen*
University of California, San Diego
La Jolla, California, U.S.A.
1 INTRODUCTION
The enormous acceleration of reaction rates that enzymes provide is one of the foundations of life as we know it. For many years, the study of metabolic enzymes, their reactions, and the way that ATP is generated has been the main focus of enzymology (1). However, classic metabolism is only the most well-known function that enzymes perform. The enzymes that participate in DNA repair, transcription, and replication, and the enzymes that play roles in signal transduction and cell morphology control, are examples of often bigger and more complicated enzymes and enzyme complexes that are currently being studied intensely (2–4). Therefore, it is not hard to find a good reason to study how enzymes work, and, consequently, there is a steady flow of information on new enzymes, their substrate specificities, catalytic mechanisms, and cellular roles in current biological journals. The aim of the current review is not to provide a comprehensive description of different classes of enzymes and their functional diversity,
*Current affiliation: University College Dublin, Dublin, Ireland.
but rather to provide a description of how well we understand enzymes, and how good we are at using our knowledge to manipulate enzymes. The manipulation of enzymes is desirable not only from a scientific point of view, but also because enzyme-catalyzed reactions often present a more environmentally sound alternative to industrial processes (5,6). The conditions of industrial processes are often very different from the conditions for similar reactions in nature, and it is therefore sometimes necessary to improve the thermostability, to change the substrate specificity, or to change another characteristic of an enzyme to make it perform adequately in an industrial process (6). The ability to change a certain characteristic of an enzyme is promoted by a good understanding of the characteristic that is to be changed. Thus, if one wants to increase the catalytic performance of an enzyme at acidic pH and high temperature, it is generally a good idea to know how and why high temperature and acidic pH influence the enzymatic rate enhancement. Recent years have seen the development of several methods that rely on selection or screening to pick out optimized enzyme mutants from artificially generated enzyme populations [the so-called directed evolution techniques (7)]. These methods do not depend on a detailed understanding of the enzyme in question, and the relative success of these methods compared to classic protein engineering techniques in designing novel and more efficient enzymes (8–10) provides a somewhat sobering measure of our understanding of enzymes. However, while directed evolution techniques may to a large extent be able to provide enzymes with almost any set of characteristics, they have so far failed to give us detailed insights into the principles of enzyme catalysis.
Although this shortcoming matters little for industrial purposes, it is scientifically unsatisfactory to employ black-box methods that simply provide solutions without explanations. In the following sections, I will focus mainly on what we know about enzymes, on how they work, and on what rational enzyme engineering studies have told us about enzymes.
2 HOW DO ENZYMES WORK?
The classical view of enzyme catalysis is that enzymes are optimized to stabilize the transition state of the reactants, and in that way accelerate the rate of catalysis (11,12). This may give the impression that designing an efficient enzyme is simply about making a protein that binds the transition state of a given reaction as strongly as possible. Fig. 1 shows the energy diagram of a hypothetical reaction for both an uncatalyzed pathway in water and an enzyme-catalyzed pathway. It is seen that the rate enhancement of enzymes is not solely dependent on a strong binding of the transition state, but also on the ability to bind the substrate and product ground states much less tightly than the transition state as compared to the energy of these three
Figure 1 Energy diagram for a hypothetical reaction showing both the uncatalyzed and enzyme-catalyzed pathways. It is seen that the true rate enhancement provided by the enzyme stems not only from a high stabilization of the transition state, but also from a relatively low stabilization of the substrate and product ground states.
states in the reference state (13,14). If an enzyme equally reduces the energy of all three states (i.e., in the figure ΔG‡ = ΔG_S = ΔG_P), then the enzyme does not enhance the rate of the reaction at all compared to the rate in the reference state. Thus, it is the differential stabilization of the transition state as compared to the stabilization of the ground state of both the substrate and the product that is the foundation of the rate acceleration of enzymes. In recent years, much work has been devoted to understanding how enzymes achieve this remarkable stabilization of the transition state relative to the two ground states. Several specific effects such as the low-barrier hydrogen bond (15), solvent reorganization (14), and electrostatics (13) have been proposed to account for the differential stabilization of the transition state. The references above provide a comprehensive discussion of these effects and
their proposed importance, and as such these theories provide several logical routes to understanding enzymes. However, the application of these theories in redesigning enzymes is still very limited. So although it is clear that low-barrier hydrogen bonds, solvent reorganization, and the detailed electrostatics play important roles in enzyme catalysis, it is not possible to translate this information into a set of point mutations that will make a given enzyme more efficient (16). This is partly because we do not have a good understanding of exactly how the above effects play a role, and also because we, in most cases, do not have a sufficiently detailed understanding of the catalytic mechanism of the enzyme in question. Without a very detailed understanding of the reaction mechanism, it is difficult to rationally improve the way the enzyme differentially binds the two ground states and the transition state.
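The differential-stabilization argument of this section can be made concrete with a small numerical sketch. It assumes simple transition state theory with identical prefactors for the catalyzed and uncatalyzed reactions, measures both stabilizations as positive energy lowerings relative to the reference state, and considers only the substrate ground state; the function name and the example energies are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def rate_enhancement(ts_stabilization, ground_stabilization, temperature=298.15):
    """Ratio k_enzyme / k_reference under simple transition state theory.

    Both arguments are energy lowerings in kJ/mol relative to the reference
    state. Only their difference changes the activation barrier.
    """
    barrier_reduction = ts_stabilization - ground_stabilization
    return math.exp(barrier_reduction / (R_KJ * temperature))


# Equal stabilization of transition state and ground state: no rate gain.
no_gain = rate_enhancement(50.0, 50.0)
# Transition state stabilized 40 kJ/mol more than the ground state.
large_gain = rate_enhancement(60.0, 20.0)
```

In this picture an enzyme that binds the substrate more tightly than the transition state would give a ratio below 1, which is why strong transition state binding alone is not a sufficient design target.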
3 CATALYTIC MECHANISMS
Understanding the catalytic mechanism is an essential step in improving the performance of an enzyme. Classic biochemistry experiments for determining catalytic mechanisms involve the mutation of a set of residues in the active site to identify residues that are essential for the catalytic activity. Once these residues are identified and once a 3-D structure or a homology-built model of the enzyme is available, the next logical step is to obtain structural information on the binding mode of proposed transition state inhibitors, and optionally suicide inhibitors, in an attempt to elucidate the details of the catalytic mechanism. All of these methods provide valuable information on the catalytic mechanism, but they also leave ample room for different interpretations of the experimental results, as will be exemplified in the following.
3.1 Alpha-Amylases
In the α-amylases (17) and the related cyclodextrin glycosyltransferases (CGTases) (18), the active site includes three highly conserved acidic residues that can all potentially act as an acid/base catalyst or as the nucleophile in the catalytic mechanism. Mutation of any one of the three acids to the corresponding amide leads to a significant loss of activity. However, while some research groups reported a complete loss of activity when knocking out any one of the three acids (19,20), other groups reported that traces of activity were present when mutating one or more of the active-site residues (21,22). Consequently, it was unclear which two protein residues were the true active residues. Only recent x-ray structural work (23) has been able to resolve the issue, although controversy still exists on the level of activity of the active-site
Rational Redesign of Enzymes
217
Figure 2 The two possible catalytic mechanisms for HEWL. Path A, the currently accepted view. Path B, the previously accepted mechanism. R, oligosaccharide chain, RV, peptidyl side chain. (From Ref. 28.)
218
Nielsen
mutants (23,24). Clear information on the catalytic mechanism is very important in, e.g., the attempts to reengineer alpha-amylase and CGTase pH–activity profiles (25,26), which is of importance in the industrial application of these enzymes (27). 3.2
Lysozyme and Thermolysin
For other enzymes, accepted catalytic mechanisms have been contested in recent years. In the case of hen egg-white lysozyme (HEWL), this meant that one of the textbook examples of a catalytic mechanism had to be revised. It had long been accepted that the catalytic mechanism of HEWL proceeds via a carbocation intermediate (Fig. 2), created by protonation of the glycosidic oxygen followed by an SN1 elimination of the aglycon part of the substrate. However, recent experimental work by Vocadlo et al. (28) using suicide inhibitors showed that the catalytic reaction proceeds via a covalent substrate–enzyme intermediate, although the previously accepted mechanism cannot be completely ruled out.
In the case of thermolysin, the experimental data are less clear-cut, and there are currently two different views of the catalytic mechanism. The original mechanism was proposed by Kester and Matthews (29) based on x-ray data of thermolysin in complex with inhibitors. However, the pH-dependent characteristics from kinetic studies carried out by Mock et al. (30,31) are seemingly inconsistent with the mechanism originally proposed by Matthews, and have led Mock et al. to propose an alternative reverse protonation mechanism for the catalytic reaction. This has spurred new interest in the catalytic mechanism of thermolysin (32,33), which has led to new insights into its details.
Examining the details of the arguments in favor of each of the two proposed mechanisms for thermolysin is beyond the scope of this review, but it is clear from the three examples above that it is nontrivial to be absolutely certain about the details of the catalytic mechanisms of even extremely well-studied enzymes. Similar controversies are likely to appear for a large number of enzymes once their catalytic mechanisms are investigated in detail. For the field of enzyme design, the consequences are far-reaching because this introduces an extra layer of uncertainty when designing point mutations and interpreting their results.
4 REDESIGN OF ENZYMES
Redesigning an enzyme normally has a very specific goal, and most redesigned enzymes have had their stability improved for use in specific industrial processes. The highly stabilized enzymes used in washing powder and in starch liquefaction processes are well-known examples representing this class of redesigned enzymes (24,34,35). Examples of other types of reengineering come from the cyclodextrin glycosyltransferases, where the product specificity has been changed (Ref. 18 and references therein); from a butyrylcholinesterase, which has been reengineered as a cocaine hydrolase (36); and from Bacillus circulans xylanase, where Joshi et al. (37) engineered a remarkable shift in the pH–activity profile of the enzyme. Briefly, enzyme engineering can be conceptually divided into three categories depending on the objective of the engineering project: activity engineering (changing kcat), substrate specificity engineering (changing the substrate specificity), and stability engineering (improving the resistance of the enzyme structure to temperature, pH, or other effects).
4.1 Stability Engineering
Several concepts for engineering protein stability have been established and have been found to be applicable to a wide range of proteins. Stabilization by the insertion of prolines (38), by cavity filling (39), and by helix capping (40) are all well-known concepts and will not be reviewed here. Recent years have seen the development of protein design algorithms (PDAs), which combine knowledge of the determinants of protein stability with in silico mutations of a protein. In this way, it is possible to predict point mutations that will stabilize a given protein (41). PDAs contain a force field for evaluating the stability of a given protein and a search algorithm that searches sequence/structure space in an attempt to optimize the stability given by the force field. In recent years, significant progress has been made both in constructing better force fields and in constructing search algorithms that search sequence/structure space faster and more efficiently. The most accurate force fields for predicting protein stability are currently statistically based methods (42–44) and semiempirical methods (45–48). Both approaches predict changes in protein stability with an accuracy of around 1 kcal/mol for a single point mutation. Current state-of-the-art protein search algorithms are based on the mean-field method and on the dead-end elimination theorem (Ref. 49 and references therein). It is presently feasible to perform quite extensive searches of sequence/structure space even for quite large proteins. However, most PDAs still use relatively simple force fields, which provide only a rough description of steric effects and desolvation energies and a simple treatment of hydrogen bonds. The coming years will undoubtedly see the incorporation of more accurate force fields in PDAs to improve the prediction accuracy.
4.2 Engineering Substrate Specificity
Engineering of enzyme substrate specificity is closely related to the inverse problem of structure-based drug design (50,51), where numerous tools have been developed for the construction and docking of small ligands in enzyme active sites (52–55). Fewer tools are available for designing novel substrate specificities for enzymes, but several PDAs have been modified to optimize the stability of a given enzyme–substrate complex by performing point mutations in the enzyme. The program DEZYMER (56,57) has been successfully used to construct an iron superoxide dismutase (58) using Escherichia coli thioredoxin as a scaffold. Other examples of the engineering of novel substrate specificities/catalytic activities have also been reported (59,60).
4.3 Stability vs. Activity
An aspect of protein stability that is unique to enzymes is the correlation between thermostability and catalytic activity for naturally occurring enzymes. If enzymatic activity is measured at a given temperature, it is often found that an enzyme from a psychrophilic organism has a higher activity than the corresponding enzyme from a mesophilic organism, which in turn has a higher activity than the version of the enzyme from a thermophilic organism. The stability of the same enzymes is found to be in the reverse order, with the thermophilic enzyme being the most stable and the psychrophilic enzyme being the least stable. This inverse relationship between the stability of a given enzyme and its activity suggests that enzymes perform best at temperatures relatively close to their melting temperature (Tm). The reason for this behavior is that enzymes in nature have evolved to be only marginally stable at the temperature where they function, because an overly stable enzyme would be difficult for the cell to degrade once its activity was no longer beneficial (61). The lower activity of more stable enzymes is a result of decreased mobility in the active site region. It is therefore possible to engineer enzymes that have a high catalytic activity even at temperatures far from their Tm (62), provided that the mobility of the active site region is maintained.
4.4 Engineering Catalytic Activity
In contrast to the better-understood field of stability engineering, the reengineering of the catalytic mechanism itself is only emerging. Boundaries between engineering substrate specificity or stability and engineering catalytic activity are poorly defined, because the stabilization of an enzyme often influences its catalytic activity by changing the active site mobility, and engineering the substrate or product specificity of an enzyme almost certainly changes the kcat and kcat/Km values for certain substrates. Here I define engineering of catalytic activity as engineering that aims at modifying a property of the catalytic activity of the enzyme that is limited neither by the substrate/product specificity of the enzyme nor by its stability. Such an aim could be, e.g., to increase the activity of the enzyme at a certain temperature, to increase the activity of an enzyme at acidic pH, to make an enzyme independent of its natural cofactor, or to completely redesign the catalytic mechanism of the enzyme. Rational reengineering of the catalytic mechanism of an enzyme has not yet, to the author's knowledge, been achieved, and attempts both at changing the pH–activity profile of an enzyme and at constructing enzymes with higher activity have proven difficult. However, the interpretation of the results obtained in the engineering process often leads to surprising new insights into the catalytic mechanism, as illustrated by the example below.
4.5 Changing the pH–Activity Profile of B. circulans Xylanase
The family 11 xylanases [for classification nomenclature, see the CAZy database (63)] use a double-displacement mechanism similar to that of lysozyme (Fig. 2) for cleaving xylan polymers. In this type of mechanism, the acidic limb of the pH–activity profile is controlled by the pKa of the nucleophilic residue (Asp-52 in HEWL), and the alkaline limb is determined by the pKa of the acid/base catalyst (Glu-35 in HEWL). For B. circulans xylanase, it had been shown that the apparent pKa values from the pH–activity profile coincide with the pKa values of the nucleophile (Glu-78, pKa 4.6) and the acid/base catalyst (Glu-172, pKa 6.7) (64). In xylanases with an acidic pH optimum, an aspartic acid forms a hydrogen bond with Glu-172, while in xylanases with a neutral pH optimum, this residue is an asparagine (37). Joshi et al. (37) studied the effect of substituting this asparagine (Asn-35) in a neutral xylanase with an aspartic acid (Asp-35) and observed, in accordance with expectations, that the pH optimum of the enzyme was shifted to more acidic pH values. Measurement of the pKa values of the active-site acids surprisingly revealed that the pH dependence of catalysis in the mutant is determined by the pKa values of Asp-35 (pKa 3.7) and Glu-78 (pKa 5.7). Joshi et al. (37) confirmed that the catalytic mechanism was unchanged in the mutant and therefore concluded that the mutant xylanase operates by an inverse protonation mechanism, in which the enzyme is active only when Asp-35 is protonated and Glu-78 is charged. Because the pKa of Asp-35 is significantly lower than the pKa of Glu-78, the fraction of enzyme molecules in this protonation state is lower than 1% at all pH values.
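The population argument can be checked with a few lines of arithmetic. The sketch below is an illustration, not the analysis performed in Ref. 37. It assumes that the two groups titrate independently according to the Henderson-Hasselbalch equation, that observed activity equals intrinsic activity times the population of the competent protonation state, and that the mutant is about 20% more active than the wild type, as reported in Ref. 37. All function and variable names are hypothetical.

```python
"""Population of the catalytically competent protonation state vs. pH.

Assumption (not from the source): the two groups titrate independently,
each following the Henderson-Hasselbalch equation.
"""
from typing import Tuple


def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a titratable group that carries its proton at this pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def active_fraction(ph: float, pka_charged: float, pka_protonated: float) -> float:
    """Fraction of enzyme with one group deprotonated and the other protonated."""
    charged = 1.0 - fraction_protonated(ph, pka_charged)
    protonated = fraction_protonated(ph, pka_protonated)
    return charged * protonated


def max_active_fraction(pka_charged: float, pka_protonated: float) -> Tuple[float, float]:
    """Scan pH 0-14 in steps of 0.01; return (pH of optimum, fraction there)."""
    grid = [i / 100.0 for i in range(1401)]
    best_ph = max(grid, key=lambda ph: active_fraction(ph, pka_charged, pka_protonated))
    return best_ph, active_fraction(best_ph, pka_charged, pka_protonated)


# Wild type: Glu-78 (pKa 4.6) charged, Glu-172 (pKa 6.7) protonated.
WT = max_active_fraction(pka_charged=4.6, pka_protonated=6.7)
# Asn-35 -> Asp mutant: Glu-78 (pKa 5.7) charged, Asp-35 (pKa 3.7) protonated.
MUT = max_active_fraction(pka_charged=5.7, pka_protonated=3.7)

# Observed mutant/wild-type activity ratio (about 20% higher, Ref. 37).
OBSERVED_RATIO = 1.2
# Intrinsic ratio implied if observed activity = intrinsic activity * population.
INTRINSIC_RATIO = OBSERVED_RATIO * WT[1] / MUT[1]

if __name__ == "__main__":
    print(f"wild type: optimum pH {WT[0]:.2f}, competent fraction {WT[1]:.3f}")
    print(f"mutant:    optimum pH {MUT[0]:.2f}, competent fraction {MUT[1]:.4f}")
    print(f"implied intrinsic activity ratio: {INTRINSIC_RATIO:.0f}")
```

Under these assumptions the competent state of the mutant peaks at just under 1% of the molecules near pH 4.7, whereas most wild-type molecules are in the competent state at the wild-type optimum. A mutant that is nevertheless slightly more active must then have an intrinsic activity somewhat more than a hundred times that of the wild type.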
Because the mutant enzyme is approximately 20% more active than the wild type, the authors suggest that the inherent catalytic activity of the mutant enzyme must be at least a hundredfold higher than that of the wild type (37), thus providing a surprising explanation for the observed pH-optimum shift. Current methods for predicting pKa values of protein titratable groups (65) can aid in the interpretation of such results (66), but no tools or analytical methods exist that can reliably predict the proposed 100-fold increase in catalytic activity of the mutant protein.
5 DISCUSSION
Remarkable results in the redesign of enzymes for specific applications have been achieved in the last decade. Thanks to extensive research into the mechanisms that govern protein stability, it is now possible to improve the thermostability of almost any given protein by following a few simple rules. Further improvements in thermostability may be achieved by applying protein design algorithms, which in some cases will be able to produce even more thermostable variants of an enzyme by repacking parts of the hydrophobic core of the protein.
So far, the engineering of catalytic activity has proven to be the most difficult area of enzyme design. This is mainly because of our poor understanding of the specific effects that make a certain enzyme a highly efficient catalyst (high kcat) compared to other similar enzymes. Experimental and theoretical results (67–70) suggest that the dynamics of the enzyme play an important role in the catalytic mechanism, and because neither present theoretical methods nor present experimental techniques can give clear-cut answers on how important dynamics are and on which correlated motions are necessary for catalysis, it is very difficult to include this information in the enzyme design process.
A further drawback of the field of enzymology is that only mutational data that lead directly to publishable conclusions become available to the scientific community. The field of protein stability has greatly benefited from the large amount of data on stability changes resulting from point mutations, which have been produced in studies of two-state folding proteins. Many of these data have been compiled into the ProTherm database (71), which now provides an essential service to computational biologists who are trying to construct algorithms that can predict changes in protein stability. For enzymes, there is no mass production of experimental kcat, kcat/Km, or Km values for enzyme mutants. Data on kinetics for enzymes are sparse, often obtained under widely different conditions, and almost never deposited in electronic form on the World Wide Web. If we are to successfully understand how enzymes work and how we can manipulate their catalytic properties, it will be necessary to generate reproducible kinetic data for a large number of mutant enzymes and to store these data electronically in a publicly available database. With such data, the task of interpreting mutational data and understanding the principles of catalytic mechanisms will be much more feasible and will hopefully lead to theoretical models that can reproduce and predict experimental observations for a wide range of enzymes.
ACKNOWLEDGMENTS
The author wishes to thank Stewart Adcock for suggestions on the manuscript. This work was supported in part by the Danish Natural Science Research Council and by the Howard Hughes Medical Institute.
REFERENCES
1. CK Matthews, KE van Holde. Biochemistry. Redwood City, CA: Benjamin/Cummings Publishing Company, Inc., 1990, pp 339–538.
2. D Jeruzalmi, O Yurieva, Y Zhao, M Young, J Stewart, M Hingorani, M O'Donnell, J Kuriyan. Mechanism of processivity clamp opening by the delta subunit wrench of the clamp loader complex of E. coli DNA polymerase III. Cell 106:417–428, 2001.
3. M Huse, J Kuriyan. The conformational plasticity of protein kinases. Cell 109:275–282, 2002.
4. MJ Tyska, DM Warshaw. The myosin power stroke. Cell Motil Cytoskelet 51:1–15, 2002.
5. S Panke, MG Wubbolts. Enzyme technology and bioprocess engineering. Curr Opin Biotechnol 13:111–116, 2002.
6. ME Bruins, AE Janssen, RM Boom. Thermozymes and their applications: A review of recent literature and patents. Appl Biochem Biotechnol 90:155–186, 2001.
7. P Forrer, S Jung, A Pluckthun. Beyond binding: Using phage display to select for structure, folding and enzymatic activity in proteins. Curr Opin Struct Biol 9:514–520, 1999.
8. JR Cherry, MH Lamsa, P Schneider, J Vind, A Svendsen, A Jones, AH Pedersen. Directed evolution of a fungal peroxidase. Nat Biotechnol 17:379–384, 1999.
9. ET Farinas, T Bulter, FH Arnold. Directed enzyme evolution. Curr Opin Biotechnol 12:545–551, 2001.
10. T Sakamoto, JM Joern, A Arisawa, FH Arnold. Laboratory evolution of toluene dioxygenase to accept 4-picoline as a substrate. Appl Environ Microbiol 67:3882–3887, 2001.
11. L Pauling. Molecular architecture and biological reactions. Chem Eng News 24:1375–1377, 1946.
12. AJ Kirby. Enzyme mechanisms, models, and mimics. Angew Chem, Int Ed Engl 35:707–724, 1996.
13. A Warshel. Electrostatic origin of the catalytic power of enzymes and the role of preorganized active sites. J Biol Chem 273:27035–27038, 1998.
14. WR Cannon, SJ Benkovic. Solvation, reorganization energy, and biological catalysis. J Biol Chem 273:26257–26260, 1998.
15. WW Cleland, PA Frey, JA Gerlt. The low barrier hydrogen bond in enzymatic catalysis. J Biol Chem 273:25529–25532, 1998.
16. JR Knowles. Enzyme catalysis: Not different, just better. Nature 350:121–124, 1991.
17. EA MacGregor, S Janecek, B Svensson. Relationship of sequence and structure to specificity in the alpha-amylase family of enzymes. Biochim Biophys Acta 1546:1–20, 2001.
18. BA van der Veen, JC Uitdehaag, BW Dijkstra, L Dijkhuizen. Engineering of cyclodextrin glycosyltransferase reaction and product specificity. Biochim Biophys Acta 1543:336–360, 2000.
19. A Nakamura, K Haga, S Ogawa, K Kuwano, K Kimura, K Yamane. Functional relationships between cyclodextrin glucanotransferase from an alkalophilic Bacillus and alpha-amylases. Site-directed mutagenesis of the conserved two Asp and one Glu residues. FEBS Lett 296:37–40, 1992.
20. K Takase, T Matsumoto, H Mizuno, K Yamane. Site-directed mutagenesis of active site residues in Bacillus subtilis alpha-amylase. Biochim Biophys Acta 1120:281–288, 1992.
21. C Klein, J Hollender, H Bender, GE Schulz. Catalytic center of cyclodextrin glycosyltransferase derived from X-ray structure analysis combined with site-directed mutagenesis. Biochemistry 31:8740–8746, 1992.
22. RM Knegtel, B Strokopytov, D Penninga, OG Faber, HJ Rozeboom, KH Kalk, L Dijkhuizen, BW Dijkstra. Crystallographic studies of the interaction of cyclodextrin glycosyltransferase from Bacillus circulans strain 251 with natural substrates and products. J Biol Chem 270:29256–29264, 1995.
23. EH Rydberg, C Li, R Maurus, CM Overall, GD Brayer, SG Withers. Mechanistic analyses of catalysis in human pancreatic alpha-amylase: Detailed kinetic and structural studies of mutants of three conserved carboxylic acids. Biochemistry 41:4492–4502, 2002.
24. JE Nielsen, TV Borchert. Protein engineering of bacterial alpha-amylases. Biochim Biophys Acta 1543:253–274, 2000.
25. JE Nielsen, TV Borchert, G Vriend. The determinants of alpha-amylase pH–activity profiles. Protein Eng 14:505–512, 2001.
26. RD Wind, JC Uitdehaag, RM Buitelaar, BW Dijkstra, L Dijkhuizen. Engineering of cyclodextrin product specificity and pH optima of the thermostable cyclodextrin glycosyltransferase from Thermoanaerobacterium thermosulfurigenes EM1. J Biol Chem 273:5771–5779, 1998.
27. H Guzman-Maldonado, O Paredes-Lopez. Amylolytic enzymes and products derived from starch: A review. Crit Rev Food Sci Nutr 35:1730–1742, 1995.
28. DJ Vocadlo, GJ Davies, R Laine, SG Withers. Catalysis by hen egg-white lysozyme proceeds via a covalent intermediate. Nature 412:835–838, 2001.
29. WR Kester, BW Matthews. Crystallographic study of the binding of dipeptide inhibitors to thermolysin: Implications for the mechanism of catalysis. Biochemistry 16:2506–2516, 1977.
30. WL Mock, DJ Stanford. Arazoformyl dipeptide substrates for thermolysin. Confirmation of a reverse protonation catalytic mechanism. Biochemistry 35:7369–7377, 1996.
31. WL Mock, M Aksamawati. Binding to thermolysin of phenolate-containing inhibitors necessitates a revised mechanism of catalysis. Biochem J 302(pt 1):57–68, 1994.
32. V Pelmenschikov, MR Blomberg, PE Siegbahn. A theoretical study of the mechanism for peptide hydrolysis by thermolysin. J Biol Inorg Chem 7:284–298, 2002.
33. A Beaumont, MJ O'Donohue, N Paredes, N Rousselet, M Assicot, C Bohuon, MC Fournie-Zaluski, BP Roques. The role of histidine 231 in thermolysin-like enzymes. A site-directed mutagenesis study. J Biol Chem 270:16803–16808, 1995.
34. PN Bryan. Protein engineering of subtilisin. Biochim Biophys Acta 1543:203–222, 2000.
35. BS Hartley, N Hanlon, RJ Jackson, M Rangarajan. Glucose isomerase: Insights into protein engineering for increased thermostability. Biochim Biophys Acta 1543:294–335, 2000.
36. H Sun, YP Pang, O Lockridge, S Brimijoin. Re-engineering butyrylcholinesterase as a cocaine hydrolase. Mol Pharmacol 62:220–224, 2002.
37. MD Joshi, G Sidhu, I Pot, GD Brayer, SG Withers, LP McIntosh. Hydrogen bonding and catalysis: A novel explanation for how a single amino acid substitution can change the pH optimum of a glycosidase. J Mol Biol 299:255–279, 2000.
38. BW Matthews, H Nicholson, WJ Becktel. Enhanced protein thermostability from site-directed mutations that decrease the entropy of unfolding. Proc Natl Acad Sci U S A 84:6663–6667, 1987.
39. M Karpusas, WA Baase, M Matsumura, BW Matthews. Hydrophobic packing in T4 lysozyme probed by cavity-filling mutants. Proc Natl Acad Sci U S A 86:8237–8241, 1989.
40. R Aurora, GD Rose. Helix capping. Protein Sci 7:21–38, 1998.
41. SM Malakauskas, SL Mayo. Design, structure and stability of a hyperthermophilic protein variant. Nat Struct Biol 5:470–475, 1998.
42. D Gilis, M Rooman. PoPMuSiC, an algorithm for predicting protein mutant stability changes: Application to prion proteins. Protein Eng 13:849–856, 2000.
43. D Gilis, M Rooman. Predicting protein stability changes upon mutation using database-derived potentials: Solvent accessibility determines the importance of local versus non-local interactions along the sequence. J Mol Biol 272:276–290, 1997.
44. D Gilis, M Rooman. Stability changes upon mutation of solvent-accessible residues in proteins evaluated by database-derived potentials. J Mol Biol 257:1112–1126, 1996.
45. E Lacroix, AR Viguera, L Serrano. Elucidating the folding problem of alpha-helices: Local motifs, long-range electrostatics, ionic-strength dependence and prediction of NMR parameters. J Mol Biol 284:173–191, 1998.
46. V Munoz, L Serrano. Development of the multiple sequence approximation within the AGADIR model of alpha-helix formation: Comparison with Zimm–Bragg and Lifson–Roig formalisms. Biopolymers 41:495–509, 1997.
47. K Takano, M Ota, K Ogasahara, Y Yamagata, K Nishikawa, K Yutani. Experimental verification of the “stability profile of mutant protein” (SPMP) data using mutant human lysozymes. Protein Eng 12:663–672, 1999.
48. R Guerois, JE Nielsen, L Serrano. Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations. J Mol Biol 320(2):369–387, 2002.
49. LL Looger, HW Hellinga. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: Implications for protein design and structural genomics. J Mol Biol 307:429–445, 2001.
50. G Klebe. Recent developments in structure-based drug design. J Mol Med 78:269–281, 2000.
51. TL Blundell, H Jhoti, C Abell. High-throughput crystallography for lead discovery in drug design. Nat Rev Drug Discov 1:45–54, 2002.
52. F Osterberg, GM Morris, MF Sanner, AJ Olson, DS Goodsell. Automated docking to multiple target structures: Incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins 46:34–40, 2002.
53. HJ Bohm. A novel computational tool for automated structure-based drug design. J Mol Recognit 6:131–137, 1993.
54. H Claussen, C Buning, M Rarey, T Lengauer. FlexE: Efficient molecular docking considering protein structure variations. J Mol Biol 308:377–395, 2001.
55. M Rarey, B Kramer, T Lengauer. Time-efficient docking of flexible ligands into active sites of proteins. Proc Int Conf Intell Syst Mol Biol 3:300–308, 1995.
56. HW Hellinga, JP Caradonna, FM Richards. Construction of new ligand binding sites in proteins of known structure II. Grafting of a buried transition metal binding site into Escherichia coli thioredoxin. J Mol Biol 222:787–803, 1991.
57. HW Hellinga, FM Richards. Construction of new ligand binding sites in proteins of known structure I. Computer-aided modeling of sites with pre-defined geometry. J Mol Biol 222:763–785, 1991.
58. AL Pinto, HW Hellinga, JP Caradonna. Construction of a catalytically active iron superoxide dismutase by rational protein design. Proc Natl Acad Sci U S A 94:5562–5567, 1997.
59. DN Bolon, SL Mayo. Enzyme-like proteins by computational design. Proc Natl Acad Sci U S A 98:14274–14279, 2001.
60. DN Bolon, CA Voigt, SL Mayo. De novo design of biocatalysts. Curr Opin Chem Biol 6:125–129, 2002.
61. R Jaenicke. Stability and stabilization of globular proteins in solution. J Biotechnol 79:193–203, 2000.
62. B Van den Burg, G Vriend, OR Veltman, G Venema, VG Eijsink. Engineering an enzyme to resist boiling. Proc Natl Acad Sci U S A 95:2056–2060, 1998.
63. PM Coutinho, B Henrissat. Carbohydrate-active enzymes: An integrated database approach. In: HJ Gilbert, G Davies, B Henrissat, B Svensson, eds. Recent Advances in Carbohydrate Bioengineering. Cambridge: The Royal Society of Chemistry, 1999, pp 3–12.
64. LP McIntosh, G Hand, PE Johnson, MD Joshi, M Korner, LA Plesniak, L Ziser, WW Wakarchuk, SG Withers. The pKa of the general acid/base carboxyl group of a glycosidase cycles during catalysis: A 13C-NMR study of Bacillus circulans xylanase. Biochemistry 35:9958–9966, 1996.
65. JE Nielsen, G Vriend. Optimizing the hydrogen-bond network in Poisson–Boltzmann equation-based pK(a) calculations. Proteins 43:403–412, 2001.
66. MD Joshi, G Sidhu, JE Nielsen, GD Brayer, SG Withers, LP McIntosh. Dissecting the electrostatic interactions and pH-dependent activity of a family 11 glycosidase. Biochemistry 40:10115–10139, 2001.
67. D Vitkup, D Ringe, GA Petsko, M Karplus. Solvent mobility and the protein “glass” transition. Nat Struct Biol 7:34–38, 2000.
68. BF Rasmussen, AM Stock, D Ringe, GA Petsko. Crystalline ribonuclease A loses function below the dynamical transition at 220 K. Nature 357:423–424, 1992.
69. PK Agarwal, SR Billeter, PT Rajagopalan, SJ Benkovic, S Hammes-Schiffer. Network of coupled promoting motions in enzyme catalysis. Proc Natl Acad Sci U S A 99:2794–2799, 2002.
70. PT Rajagopalan, SJ Benkovic. Preorganization and protein dynamics in enzyme catalysis. Chem Rec 2:24–36, 2002.
71. MM Gromiha, H Uedaira, J An, S Selvaraj, P Prabakaran, A Sarai. ProTherm, thermodynamic database for proteins and mutants: Developments in version 3.0. Nucleic Acids Res 30:301–302, 2002.
10 Details in the Reaction Mechanism of Chitinases
Vincent G. H. Eijsink, Gustav Kolstad, Sigrid Gåseidnes, and Bjørnar Synstad
Agricultural University of Norway, Ås, Norway
Martin G. Peter
University of Potsdam, Golm, Germany
Jens Erik Nielsen*
University of California, San Diego, La Jolla, California, USA
David Komander, Douglas Houston, and Daan M. F. van Aalten
University of Dundee, Dundee, Scotland
*Current affiliation: University College Dublin, Dublin, Ireland.
1 INTRODUCTION
Chitin, a linear polymer of β(1,4)-linked N-acetyl glucosamine (NAG), is one of the most abundant polysaccharides in nature. It occurs in a large number of species, most prominently as a principal structural component of the exoskeleton of insects and crustaceans, as well as in the cell walls of a variety of fungi (1,2). Thus, it is not surprising that chitinolytic enzymes are abundant in nature, occurring in a large variety of organisms ranging from prokaryotes to man. Chitin-containing organisms need chitinases during their normal life cycles, whereas other organisms (mainly bacteria) use these enzymes to exploit chitin as an energy source. Plants produce chitinolytic enzymes as part of their defense against chitin-containing pathogens. Interestingly, the malaria-causing Plasmodium falciparum depends on chitinases during its life cycle (3,4).
Chitinases are of biotechnological interest for several reasons. First, these enzymes may be used for the conversion of chitin and chitosan (partly deacetylated chitin) into chito-oligosaccharides and NAG. The development of enzymatic methods for these conversions is of interest because the presently established chemical procedures for synthesis are highly laborious. Second, chitinases may be applied as inhibitors of chitin-containing pathogenic fungi and insects (5). Third, inhibitors of chitinases are of interest because they may be used to interfere with the life cycles of chitin-containing pathogens, or to interfere with chitinase-dependent transmission mechanisms of parasites (3,4,6).
The use of chitinolytic enzymes as industrial biocatalysts requires the availability of enzymes that are sufficiently stable and active under process conditions. The chito-oligosaccharides that are produced have several potential applications in agriculture and medicine, and there is evidence that the biological activity of these compounds depends on both oligomer length and the sequence of NAG and glucosamine residues (see Ref. 7 and references therein) (8). Thus, during the development of industrial chitinolytic biocatalysts, the substrate specificity and product specificity of the enzymes must also be considered.
Protein engineering studies of chitinases are scarce, but recently, several interesting studies have appeared (9–12). Almost all published studies concern work aimed at unraveling the catalytic mechanism, which is a basis for future efforts in rational redesign. The catalytic mechanism of one important class of chitinases has now been determined in great detail, providing fascinating insights into the complexity of catalysis.
2 CATALYSIS IN GLYCOSIDE HYDROLASES
Chitinases are just one member of an enormous collection of naturally occurring enzymes that are able to hydrolyze glycosidic bonds. These so-called glycoside hydrolases (or glycosidases) have been classified into more than 80 families by Couthinho and Henrissat (see Chapter 2 in this book and
Reaction Mechanism of Chitinases
231
http://afmb.cnrs-mrs.fr/CAZY/index.html). Glycosidases are often multidomain proteins, consisting of a catalytic domain and one or more relatively small domains that play a role in interactions with the substrates (13). Families 18 and 19 contain chitinases, whereas several other families contain enzymes called chitosanases. In terms of nomenclature, the difference between chitinases and chitosanases is rather arbitrary because many enzymes classi-
Figure 1 Catalytic mechanisms of glycoside hydrolases. The double displacement mechanism (left; residue numbering as in hen egg-white lysozyme) leads to retention of anomeric configuration, whereas direct displacement (right) leads to inversion. (From Refs. 15–19.)
232
Eijsink et al.
fied as chitinases also degrade chitosan. In fact, it has been suggested that family 19 chitinases and some of the chitosanases belong to the same enzyme family (14). In general, glycosidases act by two principally different catalytic mechanisms that lead to either retention or inversion of the configuration at the anomeric carbon (Fig. 1) (15–19). Retaining enzymes operate via a double displacement mechanism. In the first step, protonation of the glycosidic oxygen by one acidic residue (Glu35 in Fig. 1) and a concomitant nucleophilic attack on the anomeric carbon by another acidic residue (Asp52 in Fig. 1) lead to breakage of the scissile bond and formation of a covalent glycosyl– enzyme intermediate (19). In the second step, this intermediate is hydrolyzed by a water molecule that approaches the anomeric carbon from a position close to that of the original glycosidic oxygen. Inverting enzymes operate via
Figure 2 Catalytic mechanism of family 18 chitinases as proposed in Ref. 24 on the basis of results described in Refs. 22 and 23. Family 18 chitinases are retaining enzymes that, however, lack the second acidic residue observed in other retaining glycoside hydrolases (e.g., Asp52 in hen egg-white lysozyme; Fig. 1). The need for this acidic residue is alleviated by anchimeric assistance by the N-acetyl group of the sugar in the 1 position, which leads to formation of an oxazolinium ion intermediate. Note the distortion of the 1 sugar, which is essential for this mechanism to be possible.
a direct displacement mechanism. Protonation of the glycosidic oxygen and breakage of the scissile bond occur concomitantly with nucleophilic attack by an activated water molecule. This water molecule approaches the anomeric carbon from the other side of the sugar plane, explaining why this mechanism leads to an inversion of the configuration at the anomeric carbon. An important structural difference between retaining and inverting glycoside hydrolases is seen in the distance between the catalytic acid and the catalytic base/nucleophile (Fig. 1), which is approximately 4 Å larger in inverting enzymes. Thus, the active site of inverting enzymes provides sufficient space for a water molecule, which attacks the anomeric center with direct displacement of the glycosidic oxygen (Fig. 1).
It should be noted that the mechanisms shown in Fig. 1 are, in fact, gross simplifications. Additional amino acid residues besides the catalytic acid are needed for optimal activity, for example, because they help to tune the acidity of this proton donor during the catalytic cycle. The presence and the importance of larger interaction networks in the active sites of glycosidases are well illustrated by recent work on a retaining family 11 xylanase (20,21).
Family 19 chitinases are inverting enzymes, and their active sites resemble the active sites of other inverting glycoside hydrolases. However, in what is probably the most widespread family of chitinases, family 18, catalysis proceeds via a unique substrate-assisted mechanism (22–24) (Fig. 2), which has recently been unraveled in detail by protein engineering and crystallography.
3 DETAILS OF THE CATALYTIC MECHANISM OF FAMILY 18 CHITINASES
With respect to mechanism, chitinase B (ChiB) from the soil bacterium Serratia marcescens is, to date, one of the most intensively studied family 18 chitinases (10,25–27). ChiB is an exochitinase containing a catalytic domain with a (βα)8 (“TIM barrel”) fold and a small C-terminal chitin-binding domain, which is likely to be involved in interactions with the substrate (27,28) (Fig. 3A). The substrate-binding cleft contains six subsites, running from −3 (nonreducing end of the substrate) to +3 (reducing end of the substrate). The chitin-binding domain extends the substrate-binding surface in the “+” (reducing end) direction, which implies that ChiB degrades chitin from the nonreducing ends of the polysaccharide chains. The substrate-binding groove (Fig. 3B) is covered by two loops (Fig. 3A and B) that protrude from the catalytic TIM barrel, thus conferring a tunnel character to the substrate-binding site (see Ref. 17 for background information on active site architecture). The binding of the substrate results in closure of the roof of the tunnel, thus improving interactions with the substrate (10).
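The subsite numbering used above can be made concrete with a small sketch. It assumes the usual glycoside hydrolase convention that the scissile bond lies between subsites −1 and +1 and that there is no subsite 0; the function name and the example binding modes are illustrative only and are not taken from the cited work.

```python
def cleavage_products(first_subsite, last_subsite):
    """Return (nonreducing-end product length, reducing-end product length).

    The oligomer is assumed to occupy every subsite from first_subsite to
    last_subsite. Subsites are numbered ..., -3, -2, -1, +1, +2, +3, ...
    (no subsite 0), and cleavage is assumed to occur between -1 and +1.
    """
    if first_subsite >= 0 or last_subsite <= 0:
        raise ValueError("the substrate must span the -1/+1 cleavage point")
    glycon_length = -first_subsite   # sugars bound in subsites -n .. -1
    aglycon_length = last_subsite    # sugars bound in subsites +1 .. +m
    return glycon_length, aglycon_length


# A pentamer bound from -2 to +3 would yield a dimer from the nonreducing end.
print(cleavage_products(-2, 3))
# A hexamer bound from -3 to +3 would yield two trimers.
print(cleavage_products(-3, 3))
```

In this picture, an enzyme whose groove is blocked beyond −3 can release at most a trimer from the nonreducing end of a bound chain.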
Figure 3 The structure of ChiB (panels A and B) and ChiA (panel C) from S. marcescens (Refs. 27,51). Panel A: View into the substrate-binding groove of ChiB showing aromatic side chains that interact with the substrate (Refs. 10,27). The most clearly distinguishable subsites are numbered. The chitin-binding domain lies to the right of the catalytic core. The loop blocking the substrate-binding groove beyond the −3 subsite (arrow) and two loops that form the “roof” of the active site tunnel are drawn as sticks. The side chain of the catalytic acid (Glu144) is shown as a ball-and-stick model. Panel B: The active site groove/tunnel of ChiB viewed from the “minus” side (the side where the nonreducing end of the substrate binds; in this view, the chitin-binding domain lies behind the catalytic core). The side chain of Glu144 is shown as a ball-and-stick model. The side chains of Asp316 and Trp97 are shown as sticks; closure of the “roof” of the active site tunnel upon substrate binding involves interactions between these two side chains (Ref. 10). The eight β-strands that make up the core of the (βα)8 barrel are shown as arrows. For clarity, the chitin-binding domain and the loop blocking the substrate-binding groove in front of the −3 subsite were omitted from the figure. Panel C: View into the substrate-binding groove of ChiA, showing aromatic side chains that interact with the substrate (Refs. 12,32). The most clearly distinguishable subsites are numbered. The chitin-binding domain lies to the left of the catalytic core. The side chain of the catalytic acid (Glu315) is shown as a ball-and-stick model. Note that ChiA and ChiB differ in terms of the location of their chitin-binding domains. The substrate-binding groove of ChiA is open on both sides; in ChiB, there is no −4 subsite but an insertion (compared with ChiA; see panel A) that hampers substrate binding beyond the −3 subsite. The pictures were made using PyMOL. (From Ref. 52.)
For comparison, Fig. 3C shows the structure of another chitinase, ChiA, from S. marcescens. This enzyme has a more open active site groove than ChiB and its chitin-binding domain expands the groove on the nonreducing side of the catalytic center. Consequently, ChiA is able to exert some endoactivity (25,26) and, most importantly, its exoactivity results in degradation of the chitin chains from their reducing ends (12,26,27). The main product of chitin hydrolysis by ChiB is NAG2 (25), which, as explained above, is released from the nonreducing end of the polysaccharide
chains (27). Crystallography revealed that NAG5 preferably binds to the −2 to +3 sites (10). However, NAG6 binds to the −3 to +3 subsites (25). Longer (polymeric) substrates, covering the chitin-binding domain, apparently do not occupy the nonreducing end −3 subsite and are cleaved predominantly to yield the disaccharide.
In family 18 chitinases, the catalytic acid that protonates the glycosidic oxygen is generally a glutamic acid residue (9), which, in the case of ChiB, is Glu144. As illustrated in Fig. 4A, family 18 chitinases contain several other conserved acidic residues. Mutation of these residues resulted in severe decreases in activity (9,29,30), but their functions were, until recently, unknown. Farther away from the active site, near Asp140 in ChiB, family 18 chitinases contain two more conserved residues (Tyr10 and Ser93 in ChiB) whose roles had not been studied until recently. Fig. 4B illustrates that these conserved residues play important roles during catalysis, as explained below.
The mutation of Asp140, Asp142, Glu144, and Asp215 to their corresponding amides decreased the catalytic activity of ChiB (Table 1). Similar or even more profound negative effects were observed when these residues were replaced by alanine (30). The alanine and amide mutants displayed similar pH activity profiles, which, in most cases, differed considerably from the pH activity profile of the wild type (WT; see below). This shows that the residual activity found in the amide mutants is not due to deamidation of the introduced amide residue. Analysis of the basic arm of the pH activity profiles (Fig. 5) of ChiB variants yielded some important insights. Intuitively, one would think that loss of activity at alkaline pH would be caused by deprotonation of the catalytic acid. However, in contrast to all other mutants, the pH activity profile of the E144Q mutant was essentially the same as in the wild-type enzyme.
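To make the notion of the acidic and basic arms of a pH activity profile concrete, the following minimal sketch uses the textbook bell-shaped model in which only the singly protonated form of the enzyme–substrate complex is active. This is a generic illustration, not the fitting procedure used in the cited work; the function names and the pKa values (4 and 8) are hypothetical and are not measured values for ChiB.

```python
def kcat_at_ph(ph, kcat_lim, pka_acidic_arm, pka_basic_arm):
    """Bell-shaped pH dependence of kcat (generic two-pKa model).

    kcat_lim is the pH-independent limiting value; the two pKa values
    describe the groups that titrate on the acidic and basic arms.
    """
    return kcat_lim / (
        1.0 + 10.0 ** (pka_acidic_arm - ph) + 10.0 ** (ph - pka_basic_arm)
    )


def ph_optimum(pka_acidic_arm, pka_basic_arm):
    """In this model the optimum lies midway between the two pKa values."""
    return 0.5 * (pka_acidic_arm + pka_basic_arm)


# Hypothetical pKa values chosen only to show the shape of the curve.
for ph in (3.0, 4.0, 6.0, 8.0, 9.0):
    print(ph, round(kcat_at_ph(ph, 1.0, 4.0, 8.0), 3))
```

In such a model, raising the pKa of the group that governs the basic arm moves the optimum toward alkaline pH, and at a pH equal to that pKa the rate has fallen to about half its limiting value, provided the two pKa values are well separated.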
Two major types of shifts in pH activity profiles were observed: the D142N mutation yielded an alkaline shift of the pH optimum of at least two units (Fig. 5), whereas in the D140N and D215N mutants, the pH optimum was shifted to the acidic side by at least two units (not shown). The acidic shifts found in D215N and D140N were expected (although difficult to predict and explain quantitatively) because Asp215 and Asp140 are likely to increase the pKa of neighboring ionizable groups (e.g., Asp142 and Glu144). The basic shift in the pH activity profile upon the D142N mutation indicates that the basic arm of the pH activity profile in this mutant is determined by a group with a pKa ≥ 10. The shift also indicates that the basic arm of the pH activity profile in the wild-type enzyme reflects a titration of Asp142. The best candidate for the group determining the (neither visible nor measurable) basic arm of the pH activity profile in D142N is the catalytic acid, Glu144. It is conceivable that, in the enzyme–substrate complex, Glu144 has an exceptionally high pKa because it is shielded from the
Figure 4 Details of the active site of ChiB highlighting important conserved residues in family 18 chitinases. The pictures are taken from the crystal structures of the E144Q mutant (A) and the E144Q mutant in complex with NAG5 (B) (Ref. 10). Upon substrate binding, Asp142 rotates toward Glu/Gln144; the buried and charged side chain of Asp140 is stabilized because: (1) a hydrogen-bonding water molecule (sphere in panel A) is replaced by a better proton donor, namely the hydroxyl group of Tyr10 (which comes closer to Asp140), and (2) the side chain of Ser93 moves closer, thus strengthening the hydrogen bond between Ser93 and Asp140. In panel B, only three of the sugar moieties, occupying subsites +1 to −2, are shown. Dotted lines indicate hydrogen bonds. The arrow in panel B points from Glu144 toward the glycosidic oxygen that is to be protonated.
Table 1 Catalytic Activity of Active Site Mutants of ChiB
Variant      kcat (s−1)           Km (µM)
Wild type    16 ± 4               32 ± 12
D140N        0.051 ± 0.010        52 ± 16
D142N        0.33 ± 0.05          5 ± 3
E144Q        0.0037 ± 0.0005      20 ± 4
D215N        0.92 ± 0.30          45 ± 22
Reactions were conducted at pH 6.0 and 37°C. The substrate, 4-methylumbelliferyl-NAG2, was converted exclusively to (fluorescent) 4-methylumbelliferone and NAG2 (for experimental details, see Refs. 25 and 30).
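As a rough aid to reading Table 1, the sketch below computes the specificity constant kcat/Km for each variant and expresses it relative to the wild type. The numbers are the central values transcribed from Table 1, with Km taken to be in µM; the reported uncertainties are ignored, so the ratios are indicative only. The variable and function names are illustrative.

```python
# Central values from Table 1: variant -> (kcat in s^-1, Km in µM).
# Uncertainties are deliberately ignored in this sketch.
TABLE_1 = {
    "Wild type": (16.0, 32.0),
    "D140N": (0.051, 52.0),
    "D142N": (0.33, 5.0),
    "E144Q": (0.0037, 20.0),
    "D215N": (0.92, 45.0),
}


def specificity_constant(kcat, km_micromolar):
    """Return kcat/Km in s^-1 µM^-1."""
    return kcat / km_micromolar


wild_type = specificity_constant(*TABLE_1["Wild type"])
for variant, (kcat, km) in TABLE_1.items():
    value = specificity_constant(kcat, km)
    print(f"{variant:10s} kcat/Km = {value:.2e} s^-1 uM^-1 "
          f"({value / wild_type:.2%} of wild type)")
```

On these central values, D142N retains roughly 13% of the wild-type kcat/Km, largely because of its lower Km, whereas E144Q retains well below 0.1%.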
Figure 5 The pH activity profiles of WT ChiB (.) and the E144Q (E), D142N (D), S93A (o), and Y10F (n) variants. Because the plots are based on kcat, the curves reflect pKa values of residues in the enzyme–substrate complex. (From Ref. 10; Copyright © 2001 National Academy of Sciences, USA.)
solvent and because the glycosidic oxygen in its vicinity (3.5 Å) is likely to carry a partial negative charge due to close contacts between the latter and ionized Asp215 (the closest contact between Asp215–Oδ and the glycosidic oxygen is 3.4 Å).
Further insights into the roles of the acidic residues were obtained from structural studies of the E144Q mutant (Fig. 4A), the complex of E144Q with NAG5 (Fig. 4B), the wild type in complex with the pseudotrisaccharide allosamidin,* and a cryo-trapped reaction intermediate obtained by soaking WT crystals in a NAG5 solution (10). The structures of the EQ–NAG5 and WT–allosamidin complexes showed that Asp142 rotates toward Glu144 upon substrate binding (compare Fig. 4A with Fig. 4B). Rotation of Asp142 is accompanied by movements of residues in the core of the TIM barrel, which bring Tyr10 and Ser93 closer to Asp140 (Fig. 4). The importance of Ser93 and Tyr10 was confirmed by the observation that the Y10F and S93A mutations strongly decreased activity (Fig. 5). Most importantly, the pH activity profiles of the S93A and Y10F mutants looked very similar to that of D142N, suggesting that these three residues act in a concerted fashion during catalysis (Fig. 5).
The EQ–NAG5 complex also showed that the sugar in the −1 subsite is distorted (Fig. 4B), as had been suggested previously (Fig. 2). The sugar ring is in the boat conformation, whereas the N-acetyl group is frozen by hydrogen bonding with Asp142 in a conformation that places its oxygen atom only 3.0 Å from the anomeric carbon. This results in a nearly colinear orientation of the glycosidic oxygen, the anomeric carbon, and the N-acetyl oxygen, thus setting the stage for an SN2-type displacement. The structure of a cryo-trapped reaction intermediate revealed density in the −1 subsite, which showed a good overlap with the density observed for the allosamizoline group of allosamidin, thus providing evidence for an oxazolinium ion intermediate (Fig. 2).
* Allosamidin is the pseudotrisaccharide N,N′-diacetyl-allosaminobiosyl allosamizoline (50). It binds to subsites −3 to −1 in ChiB and in other family 18 chitinases. The allosamizoline moiety occupies subsite −1 and resembles the proposed oxazolinium ion intermediate shown in Figs. 2 and 6.
The structure also showed a well-ordered, displaced disaccharide, which occupied a position between the +1/+2 and +2/+3 subsites. Displacement of the product from the catalytic center is essential to provide sufficient space for the approach of a water toward the anomeric carbon from the β-face, which is necessary for completion of the reaction. Indeed, in the structure, a well-ordered water molecule was visible at 3.0 Å from the anomeric carbon (10). Interestingly, the displaced disaccharide interacted with the loops that constitute the “roof” of the active site tunnel (Fig. 3A and B). Thus, it is conceivable
that reversal of the roof closure that accompanies substrate binding actively contributes to substrate displacement.
Taken together, the results from mutagenesis, enzymological studies, and crystallographic studies lead to the conclusion that family 18 chitinases act by the mechanism displayed in Fig. 6. The crucial role of Asp142 in this mechanism has been confirmed recently by work on a hexosaminidase (31). In family 18 chitinases, rotation of the protonated Asp142 has three important consequences: (a) the conformation of the N-acetyl group of the NAG residue at the −1 subsite is frozen in a position optimal for nucleophilic attack at the anomeric carbon; (b) the hydrogen bond donated by the OH group of Asp142 increases the acidity of Glu144, thus promoting protonation of the glycosidic oxygen; and (c) the positive charge developing upon formation of the oxazolinium ion is stabilized by tight interactions with Asp142. The presence of a proton on rotated Asp142 is crucial, and the results indicate that the basic arm of the pH activity profile of the wild-type enzyme reflects titration of this residue. It is important to note that acid catalysis by Glu144 is not enhanced only by Asp142, as catalysis is also closely coupled to Asp140, Tyr10, Ser93, and Asp215 (the role of the latter residue is as yet relatively unclear). One of the most important conclusions of this work is that catalysis in family 18 chitinases involves the concerted action of many
Figure 6 Catalytic mechanism of ChiB. (A) Resting enzyme (note that Glu144 is deprotonated in the empty enzyme; protonation occurs as water is displaced by the substrate). (B) Binding of substrate (only the sugar binding to the −1 subsite is shown) causes distortion of the −1 pyranose ring to a boat conformation and rotation of Asp142 toward Glu144; the rotated Asp142 distorts the N-acetyl group, increases the acidity of Glu144, and stabilizes the developing positive charge. (C) The hydrolysis of the oxazolinium ion by an incoming water molecule completes the reaction. The structural location of the amino acid residues, as well as the mechanism employed to stabilize the buried negatively charged Asp140 after rotation of Asp142, are visualized in Fig. 4. See text for further details. (From Ref. 10; Copyright © 2001 National Academy of Sciences, USA.)
residues, not all of which are obvious elements of what one would call the “catalytic site” (see Ref. 21 for an analysis of a similarly complex network in another glycosidase).
4 SUBSTRATE SPECIFICITY AND PRODUCT SPECIFICITY
So far, there have been no reports in the literature regarding rational engineering of chitinase subsites with the aim of changing substrate specificity or product specificity. Such engineering efforts are likely to be needed to obtain biocatalysts that efficiently convert chitin or chitosan into defined oligosaccharides. Our own preliminary (unpublished) observations indicate that interesting results may be obtained, but that rationalization is difficult.
When attempting a rational redesign of the substrate specificity and product specificity of chitinases and other polysaccharide-degrading enzymes, it is important to note that naturally occurring enzymes display fundamental differences that may give important leads for such attempts. For example, ChiA and ChiB from S. marcescens (Fig. 3) share similar catalytic cores but act on chitin in rather different manners. As explained above, the two enzymes degrade chitin chains in different directions. This difference in directionality is likely to have profound effects on substrate affinities in at least some of the subsites because, for example, the −2 site is a “product site” in ChiB (binding one of the sugars in the dimer that is being cleaved off), whereas it is a “substrate site” in ChiA (binding to a sugar that is part of a long chitin chain). Indeed, potentially relevant differences in subsite architecture are readily detectable by a structural comparison of ChiA and ChiB, and these differences are amenable to further studies using site-directed mutagenesis. Although it is reasonable to assume so, there is no proof that ChiA and ChiB are processive enzymes (32). Processive action would render the structural basis of differences in directionality even more interesting.
S. marcescens produces at least one additional chitinase, ChiC, which is an example of another type of family 18 chitinase, presumably having a much more open and shallow active site cleft than ChiA and ChiB. This putative architecture is inferred from sequence comparisons with ChiB and ChiA, which show that major parts of the sequences that make up the “walls” of the active site groove in ChiA and ChiB are lacking in ChiC. Thus, the catalytic core of ChiC may be compared with the plant chitinase, hevamine, for which a crystal structure is known (33) and which is a clear and well-described example of a family 18 chitinase with a shallow active site groove. These chitinases do not interact with individual sugars as intimately as enzymes possessing deep substrate-binding site grooves or tunnels. They display little or no activity toward smaller substrates such as NAG3 and NAG4 (26,34) and are thus good candidates as biocatalysts for oligomer production. However, it may be difficult to further modify the specificity of these enzymes (e.g., preferences for acetylated versus nonacetylated sugars) because the limited number of interactions per subsite gives the protein engineer a limited number of residues to play with.
5 PERSPECTIVES FOR ENGINEERING CATALYTIC PROPERTIES
A rational redesign of the catalytic properties of enzymes is still a formidable challenge, especially if the goal is to engineer enzymes that are efficient industrial biocatalysts (see Ref. 35 and references therein). One important challenge is understanding the interplay between binding and catalysis (36), which severely complicates a successful redesign of enzyme active sites. Another major challenge lies in the fact that (long-range) electrostatic interactions are difficult to rationalize, although they play a major role during catalysis (37,38) (see also Chapter 9 by J. E. Nielsen). Finally, dynamics and flexibility, phenomena that are not easy to address experimentally, are extremely important for catalytic efficiency (36,39–43). The work on chitinases described in this chapter provides an example of the complexity of catalysis, illustrating the importance of electrostatic interactions (cf. the Tyr10–Ser93–Asp140–Asp142–Glu144 assembly), concerted substrate binding and distortion, and conformational flexibility (the above-mentioned movement of Tyr10 is achieved by backbone movements of up to 2 Å). In principle, the detailed knowledge of catalysis provides a basis for future efforts in a rational redesign of the catalytic properties of chitinases, but the major effect of this new knowledge is probably a strong sense that such redesign is quite a challenge. There is hope, though, as illustrated by a considerable number of examples of successful redesign in the literature (44–48). The complexity of catalysis and the lack of detailed knowledge of catalytic mechanisms indicate that, today, probably the fastest general route to the development of enzymes with improved catalytic properties includes the use of the combinatorial approaches discussed in other chapters of this volume (49).
Rational protein engineering studies are, however, essential for understanding how enzymes work, and will eventually create a knowledge base that leads to an increasing success rate in the rational redesign of catalytic properties.
REFERENCES
1. MG Peter. Chitin and chitosan from fungi. In: EJ Vandamme, S De Baets, A Steinbüchel, eds. Biopolymers Vol. 6. Weinheim: Wiley-VCH, 2002, pp 123–157.
2. MG Peter. Chitin and chitosan from animal sources. In: EJ Vandamme, S De Baets, A Steinbüchel, eds. Biopolymers Vol. 6. Weinheim: Wiley-VCH, 2002, pp 481–574.
3. RC Langer, JM Vinetz. Plasmodium ookinete-secreted chitinase and parasite penetration of the mosquito peritrophic matrix. Trends Parasitol 17:269–272, 2001.
4. YL Tsai, RE Hayward, RC Langer, DA Fidock, JM Vinetz. Disruption of Plasmodium falciparum chitinase markedly impairs parasite invasion of mosquito midgut. Infect Immun 69:4048–4054, 2001.
5. A Herrera-Estrella, I Chet. Chitinases in biological control. In: P Jollès, RAA Muzzarelli, eds. Chitin and Chitinases. Basel: Birkhäuser, 1999, pp 171–184.
6. DR Houston, K Shiomi, N Arai, S Omura, MG Peter, A Turberg, B Synstad, VGH Eijsink, DMF van Aalten. High-resolution structures of a chitinase complexed with natural product cyclopentapeptide inhibitors: mimicry of carbohydrate substrate. Proc Natl Acad Sci USA 99:9127–9132, 2002.
7. S Bahrke, JM Einarsson, J Gislason, S Haebel, MC Letzel, J Peter-Katalinic, MG Peter. Sequence analysis of chitooligosaccharides by matrix-assisted laser desorption ionization postsource decay mass spectrometry. Biomacromolecules 3:696–704, 2002.
8. MC Letzel, B Synstad, VGH Eijsink, J Peter-Katalinic, MG Peter. Libraries of chito-oligosaccharides of mixed acetylation patterns and their interactions with chitinases. In: MG Peter, A Domard, RAA Muzzarelli, eds. Advances in Chitin Science Vol. 4. Potsdam: Universität Potsdam, 2000, pp 545–557.
9. T Watanabe, K Kobori, K Miyashita, T Fujii, H Sakai, M Uchida, H Tanaka. Identification of glutamic acid-204 and aspartic acid-200 in chitinase-A1 of Bacillus circulans WL-12 as essential residues for chitinase activity. J Biol Chem 268:18567–18572, 1993.
10. DMF van Aalten, D Komander, B Synstad, S Gaseidnes, MG Peter, VGH Eijsink. Structural insights into the catalytic mechanism of a family 18 exochitinase. Proc Natl Acad Sci USA 98:8979–8984, 2001.
11. E Bokma, HJ Rozeboom, M Sibbald, BW Dijkstra, JJ Beintema. Expression and characterization of active site mutants of hevamine, a chitinase from the rubber tree Hevea brasiliensis. Eur J Biochem 269:893–901, 2002.
12. Y Papanikolau, G Prag, G Tavlas, CE Vorgias, AB Oppenheim, K Petratos. High resolution structural analyses of mutant chitinase A complexes with substrates provide new insight into the mechanism of catalysis. Biochemistry 40:11338–11343, 2001.
13. Y Bourne, B Henrissat. Glycoside hydrolases and glycosyltransferases: families and functional modules. Curr Opin Struct Biol 11:593–600, 2001.
14. AF Monzingo, EM Marcotte, PJ Hart, JD Robertus. Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core. Nat Struct Biol 3:133–140, 1996.
15. DE Koshland. Stereochemistry and the mechanism of enzymatic reactions. Biol Rev Camb Philos Soc 28:416–436, 1953.
16. JD McCarter, SG Withers. Mechanisms of enzymatic glycoside hydrolysis. Curr Opin Struct Biol 4:885–892, 1994.
17. G Davies, B Henrissat. Structures and mechanisms of glycosyl hydrolases. Structure 3:853–859, 1995.
18. JC Uitdehaag, R Mosi, KH Kalk, BA van der Veen, L Dijkhuizen, SG Withers, BW Dijkstra. X-ray structures along the reaction pathway of cyclodextrin glycosyltransferase elucidate catalysis in the alpha-amylase family. Nat Struct Biol 6:432–436, 1999.
19. DJ Vocadlo, GJ Davies, R Laine, SG Withers. Catalysis by hen egg-white lysozyme proceeds via a covalent intermediate. Nature 412:835–838, 2001.
20. LP McIntosh, G Hand, PE Johnson, MD Joshi, M Korner, LA Plesniak, L Ziser, WW Wakarchuk, SG Withers. The pKa of the general acid/base carboxyl group of a glycosidase cycles during catalysis: a 13C-NMR study of Bacillus circulans xylanase. Biochemistry 35:9958–9966, 1996.
21. MD Joshi, G Sidhu, JE Nielsen, GD Brayer, SG Withers, LP McIntosh. Dissecting the electrostatic interactions and pH-dependent activity of a family 11 glycosidase. Biochemistry 40:10115–10139, 2001.
22. AC Terwisscha van Scheltinga, A Armand, KH Kalk, A Isogai, B Henrissat, BW Dijkstra. Stereochemistry of chitin hydrolysis by a plant chitinase lysozyme and x-ray structure of a complex with allosamidin: evidence for substrate assisted catalysis. Biochemistry 34:15619–15623, 1995.
23. I Tews, A Perrakis, A Oppenheim, Z Dauter, KS Wilson, CE Vorgias. Bacterial chitobiase structure provides insight into catalytic mechanism and the basis of Tay–Sachs disease. Nat Struct Biol 3:638–648, 1996.
24. I Tews, AC Terwisscha van Scheltinga, A Perrakis, KS Wilson, BW Dijkstra. Substrate-assisted catalysis unifies two families of chitinolytic enzymes. J Am Chem Soc 119:7954–7959, 1997.
25. MB Brurberg, IF Nes, VGH Eijsink. Comparative studies of chitinases A and B from Serratia marcescens. Microbiology 142:1581–1589, 1996.
26. K Suzuki, N Sugawara, M Suzuki, T Uchiyama, F Katouno, N Nikaidou, T Watanabe. Chitinases A, B, and C1 of Serratia marcescens 2170 produced by recombinant Escherichia coli: enzymatic properties and synergism on chitin degradation. Biosci Biotechnol Biochem 66:1075–1083, 2002.
27. DMF van Aalten, B Synstad, MB Brurberg, E Hough, BW Riise, VGH Eijsink, RK Wierenga. Structure of a two-domain chitotriosidase from Serratia marcescens at 1.9-angstrom resolution. Proc Natl Acad Sci USA 97:5842–5847, 2000.
28. T Ikegami, T Okada, M Hashimoto, S Seino, T Watanabe, M Shirakawa. Solution structure of the chitin-binding domain of Bacillus circulans WL-12 chitinase A1. J Biol Chem 275:13654–13661, 2000.
29. T Watanabe, M Uchida, K Kobori, H Tanaka. Site-directed mutagenesis of the Asp-197 and Asp-202 residues in chitinase A1 of Bacillus circulans WL-12. Biosci Biotechnol Biochem 58:2283–2285, 1994.
30. B Synstad, S Gåseidnes, G Vriend, JE Nielsen, VGH Eijsink. On the contribution of conserved acidic residues to catalytic activity of chitinase B from Serratia marcescens. In: MG Peter, A Domard, RAA Muzzarelli, eds. Advances in Chitin Science Vol. 4. Potsdam: Universität Potsdam, 2000, pp 524–529.
31. SJ Williams, B Mark, DJ Vocadlo, MN James, SG Withers. Aspartate 313 in the Streptomyces plicatus hexosaminidase plays a critical role in substrate assisted catalysis by orienting the 2-acetamido group and stabilizing the transition state. J Biol Chem 277:40055–40065, 2002.
32. T Uchiyama, F Katouno, N Nikaidou, T Nonaka, J Sugiyama, T Watanabe. Roles of the exposed aromatic residues in crystalline chitin hydrolysis by chitinase A from Serratia marcescens 2170. J Biol Chem 276:41343–41349, 2001.
33. AC Terwisscha van Scheltinga, KH Kalk, JJ Beintema, BW Dijkstra. Crystal structures of hevamine, a plant defence protein with chitinase and lysozyme activity, and its complex with an inhibitor. Structure 2:1181–1189, 1994.
34. E Bokma, T Barends, AC Terwisscha van Scheltinga, BW Dijkstra, JJ Beintema. Enzyme kinetics of hevamine, a chitinase from the rubber tree Hevea brasiliensis. FEBS Lett 478:119–122, 2000.
35. S Gåseidnes, B Synstad, JE Nielsen, VGH Eijsink. Rational engineering of the stability and the catalytic performance of enzymes. J Mol Catal B Enzym 21:3–8, 2003.
36. AR Fersht. Structure and Mechanism in Protein Science. New York: WH Freeman, 1998.
37. SE Jackson, AR Fersht. Contribution of long-range electrostatic interactions to the stabilization of the catalytic transition state of the serine protease subtilisin BPN′. Biochemistry 32:13909–13916, 1993.
38. A de Kreij, B van den Burg, G Venema, G Vriend, VGH Eijsink, JE Nielsen. The effects of modifying the surface charge on the catalytic activity of a thermolysin-like protease. J Biol Chem 277:15432–15438, 2002.
39. HR Faber, BW Matthews. A mutant T4 lysozyme displays five different crystal conformations. Nature 348:263–266, 1990.
40. M Gerstein, AM Lesk, C Chothia. Structural mechanisms for domain movements in proteins. Biochemistry 33:6739–6749, 1994.
41. OR Veltman, VGH Eijsink, G Vriend, A deKreij, G Venema, B van den Burg. Probing catalytic hinge bending motions in thermolysin-like proteases by glycine→alanine mutations. Biochemistry 37:5305–5311, 1998.
42. MJ Osborne, J Schnell, SJ Benkovic, HJ Dyson, PE Wright. Backbone dynamics in dihydrofolate reductase complexes: role of loop flexibility in the catalytic mechanism. Biochemistry 40:9846–9859, 2001.
43. EZ Eisenmesser, DA Bosco, M Akke, D Kern. Enzyme dynamics during catalysis. Science 295:1520–1523, 2002.
44. HM Wilks, KW Hart, R Feeney, CR Dunn, H Muirhead, WN Chia, DA Barstow, T Atkinson, AR Clarke, JJ Holbrook. A specific, highly active malate dehydrogenase by redesign of a lactate dehydrogenase framework. Science 242:1541–1544, 1988.
45. JJ Perona, L Hedstrom, WJ Rutter, RJ Fletterick. Structural origins of substrate discrimination in trypsin and chymotrypsin. Biochemistry 34:1489–1499, 1995.
46. E Quemeneur, M Moutiez, JB Charbonnier, A Menez. Engineering cyclophilin into a proline-specific endopeptidase. Nature 391:301–304, 1998.
47. F Cedrone, A Menez, E Quemeneur. Tailoring new enzyme functions by rational redesign. Curr Opin Struct Biol 10:405–410, 2000.
48. D Becker, C Braet, H Brumer III, M Claeyssens, C Divne, BR Fagerstrom, M Harris, TA Jones, GJ Kleywegt, A Koivula, S Mahdi, K Piens, ML Sinnott, J Stahlberg, TT Teeri, M Underwood, G Wohlfahrt. Engineering of a glycosidase family 7 cellobiohydrolase to more alkaline pH optimum: the pH behaviour of Trichoderma reesei Cel7A and its E223S/A224H/L225V/T226A/D262G mutant. Biochem J 356:19–30, 2001.
49. FH Arnold. Combinatorial and computational challenges for biocatalyst design. Nature 409:253–257, 2001.
50. S Sakuda, A Isogai, S Matsumoto, A Suzuki. Search for microbial insect growth regulators: II. Allosamidin, a novel insect chitinase inhibitor. J Antibiot (Tokyo) 40:296–300, 1987.
51. A Perrakis, I Tews, Z Dauter, AB Oppenheim, I Chet, KS Wilson, CE Vorgias. Crystal-structure of a bacterial chitinase at 2.3-angstrom resolution. Structure 2:1169–1180, 1994.
52. WL DeLano. The PyMOL Molecular Graphics System. San Carlos, CA, USA: DeLano Scientific, 2002 (www.pymol.org).
11
Kinetic Evolution to the Catalytic Core of the Bacterial Phosphotriesterase
Frank M. Raushel
Texas A&M University College Station, Texas, U.S.A.
1 AMIDOHYDROLASE SUPERFAMILY
The amidohydrolase superfamily is a related group of enzymes that catalyze the hydrolysis of bonds to carbonyl and phosphoryl centers. The most prominent members of this family of proteins include urease (URE), phosphotriesterase (PTE), adenosine deaminase (ADA), dihydroorotase (DHO), and atrazine chlorohydrolase (1). The reactions catalyzed by some of these enzymes are illustrated in Sch. 1. Structurally, all of these proteins have been shown to fold into a typical (βα)8-barrel motif, although the level of overall sequence identity is rather low. The hallmark for this family of enzymes is a cluster of four histidine residues that come together in three-dimensional space to form a highly structured binding site for divalent metal ions (2–4). The most common arrangement is for a binuclear metal center, as observed in the x-ray crystal structures of URE, PTE, DHO, and the phosphotriesterase homology protein (PHP), although a mononuclear metal binding site has been observed with ADA (5). Within the binuclear metal ion clusters, there
248
Scheme 1
Raushel
Reactions catalyzed by members of the amidohydrolase super-family.
Figure 1 Representation of the binuclear metal center within the active site of phosphotriesterase. (From Ref. 10.)
Kinetic Evolution to the Catalytic Core of the Bacterial Phosphotriesterase
249
are always two ligands that bridge the two metal ions: a hydroxide from solvent and a carboxylate group. In the case of URE, PTE, and DHO, the bridging carboxylate originates from the side chain of a conserved lysine residue that has reacted with CO2 to form a carbamate functional group (2–4). In PHP, the bridging group is contributed from the side chain of a glutamate residue (6). A cartoon of the binuclear metal center in PTE is shown in Fig. 1. 2
CHEMICAL MECHANISM
The apparent role of the metal centers within the active sites of these enzymes is to activate the hydrolytic water molecule and the substrate for nucleophilic attack. The actual chemical transformation is best understood in the reaction catalyzed by DHO because an x-ray crystal structure was determined with the substrate and product bound to separate monomers within the dimeric protein (4). The proposed reaction mechanism is summarized in Sch. 2 for the hydrolytic cleavage of dihydroorotate. In this chemical mechanism, the binding of dihydroorotate to the active site polarizes the carbonyl group via ligation to the β-metal ion. This binding interaction weakens the coordination of the bridging hydroxide to the β-metal (as evidenced by the longer bond to the β-metal ion relative to the α-metal ion). The hydroxide attacks the polarized carbonyl group, with assistance from Asp250, to form a tetrahedral adduct that now bridges the two divalent cations. Proton transfer from the protonated form of Asp250 to the incipient amide nitrogen initiates the collapse of this intermediate. Carbamoyl aspartate departs the active site and the binuclear metal center is recharged with a hydroxide ion from solvent. Similar mechanisms have been proposed for other members of the amidohydrolase superfamily. It would appear that this family of enzymes has evolved as a “delivery device” for the nucleophilic attack of hydroxide on target substrates. The architecture of the metal center has remained remarkably intact, but the individual active sites have been tailored through molecular evolution to recognize a specific set of substrates and associated functional groups for binding and chemical cleavage. Therefore, the amidohydrolase superfamily of enzymes offered a rather attractive target with which to test the limits for a rational reconstruction of an active site. Modulation of the substrate selectivity and stereoselectivity of PTE through site-directed mutagenesis was utilized as a stringent test of this proposition.

Scheme 2 Catalytic reaction mechanism for dihydroorotase.

3
PHOSPHOTRIESTERASE
A bacterial version of phosphotriesterase (also known as organophosphate hydrolase or OPH) has been discovered in strains of Pseudomonas and Flavobacterium (7,8). The Flavobacterium isolate was originally identified from a rice paddy in the Philippines where bacterial soil samples had been tested for their ability to hydrolyze specific organophosphate insecticides (7). The gene coding for the enzyme was identified, cloned, and overexpressed in Escherichia coli, and the protein was purified to homogeneity (9). The three-dimensional x-ray structure of PTE has been determined by the Holden laboratory to very high resolution (10).
4
REACTION MECHANISM
Bacterial PTE hydrolyzes a variety of organophosphate triesters of the type shown in Sch. 1, where the insecticide paraoxon is used as an example. The substrate specificity is such that the substituent that functions as the leaving group is very much dependent on the pKa and, thus, with paraoxon, only the p-nitrophenol group is cleaved from the phosphorus center (11). The enzyme does not hydrolyze diesters at an appreciable rate and thus only a single substituent is subjected to cleavage (12). The substrate specificity of the native enzyme is reasonably broad in that the phosphoryl oxygen can be substituted with sulfur and the other three substituents can be replaced with a variety of other groups (13).

The native bacterial phosphotriesterase served as an ideal candidate for the directed reconstruction of a substrate-binding site. The breadth of substrates recognized by the amidohydrolase superfamily of enzymes convinced us that the structural fold of the (βα)8 barrel could accommodate a variety of perturbations to the specific interactions between proteins and substrates. Moreover, the binuclear metal center within these proteins demonstrated that hydroxide could be delivered to a variety of trigonal and tetrahedral reaction centers. The construction of mutant variants of PTE would be quite useful in the detoxification and detection of chemical warfare agents and agricultural insecticides. Our initial objective was to first identify the structural determinants of substrate specificity for the wild-type protein and then to utilize this information to construct mutant forms of PTE where substrate specificity would be enhanced for specific substrates.

The initial substrate library was a series of organophosphate triesters bearing a p-nitrophenol leaving group of the type presented in Sch. 3. The p-nitrophenyl group was chosen because of the ease with which the kinetics of hydrolysis could be monitored spectrophotometrically. The substituents X and Y could be varied with large and small alkyl groups through straightforward synthetic procedures, and chiral substrates could be constructed of either stereochemistry. Altogether, 16 such substrates were prepared using all possible combinations of methyl, ethyl, isopropyl, and phenyl groups. Shown in Fig. 2 are the relative kcat values for the four possible achiral variants of the target substrate (Y and X are the same substituent). These studies demonstrated that the wild-type protein accepted any of the four substituents in either the proS or proR position, but that not all of these substituents were accommodated in the same way by the protein (10).

Kinetic assays of the six pairs of racemic mixtures demonstrated that the wild-type enzyme exhibited a distinct preference for one stereoisomer over the other, as shown in Fig. 3. In every case, except for the pair of methyl and ethyl, there was a >20-fold preference for one isomer over the other and this catalytic preference rose to about 100:1 for the methyl phenyl substrate. Kinetic assays with the individual enantiomers demonstrated that the SP-stereoisomer, in every case examined, was preferred over the RP-enantiomer for this series of chiral substrates. If this preference is defined in terms of steric bulk, then the “large” substituent is preferred in the proS position whereas the “smaller” substituent is preferred in the proR position, as illustrated in Sch. 3.

Scheme 3 Generic substrate for phosphotriesterase.

Figure 2 Relative values for kcat for the wild-type phosphotriesterase with achiral substrate analogs. (From Ref. 11.)

Figure 3 Relative values for kcat for the wild-type phosphotriesterase with chiral substrate analogs. The values for SP enantiomers are shown in black whereas the RP enantiomers are shown in gray. (From Refs. 16,17.)

5
KINETIC ENGINEERING OBJECTIVES
Our immediate goals for the mutagenesis of PTE were focused on a rational rearrangement of the active site binding cavity such that the inherent stereoselectivity using the aforementioned library of 16 model substrates could be manipulated. Thus, we were interested in enhancing the stereoselectivity possessed by the wild-type enzyme such that the catalytic preference for the SP isomer would be even more pronounced. The construction of mutants of this type would be quite useful, in a practical sense, for the kinetic resolution of racemic mixtures through the hydrolysis of a single stereoisomer while leaving the other enantiomer intact (14). Second, we were interested in relaxing the stereoselectivity of the wild-type enzyme. The goal here was to make the initially slower RP isomer as fast as the SP isomer (rather than making the SP isomer as slow as the RP isomer). Such mutants would be appropriate for the detoxification of racemic mixtures of organophosphate triesters where both isomers are toxic. Mutants of this type would also be useful as an initial stepping stone for the final objective, which was to create mutants where the stereoselective preference was reversed. With such mutants, the RP isomer would be hydrolyzed in preference to the SP isomer. In order to accomplish this goal, the SP isomers must be made poorer substrates while, simultaneously, the RP isomers must be made much better.
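The library of 16 model substrates referred to above can be enumerated in a few lines. The following is a minimal sketch that assumes only what the text states: the substituents X and Y are each drawn from methyl, ethyl, isopropyl, and phenyl, which gives 4 achiral compounds (X = Y) and 6 racemic pairs (12 enantiomers). The function name `build_library` and the naming scheme for the compounds are illustrative choices and do not come from the original work.

```python
from itertools import combinations

# The four substituents used for X and Y in the generic substrate (Sch. 3).
SUBSTITUENTS = ["methyl", "ethyl", "isopropyl", "phenyl"]


def build_library(substituents):
    """Return (achiral, chiral) lists of p-nitrophenyl triester names.

    Achiral compounds carry two identical substituents; each unordered
    pair of different substituents gives one SP and one RP enantiomer.
    """
    achiral = [f"di{s} p-nitrophenyl phosphate" for s in substituents]
    chiral = [
        f"{label}-{x} {y} p-nitrophenyl phosphate"
        for x, y in combinations(substituents, 2)
        for label in ("SP", "RP")
    ]
    return achiral, chiral


achiral, chiral = build_library(SUBSTITUENTS)
print(len(achiral), "achiral compounds")
print(len(chiral) // 2, "racemic pairs")
print(len(achiral) + len(chiral), "substrates in total")
```

Counting the substrates this way makes the three engineering goals concrete: each of the six racemic pairs has its own SP/RP rate ratio that can be enhanced, relaxed, or reversed.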
Scheme 4 Cartoon of the substrate binding pocket for phosphotriesterase.

To accomplish these objectives, we set out to retool the active site in a semirational manner. Single-site mutants were constructed sequentially and then specific mutations were combined with one another to achieve the desired effect. Our starting premise for this endeavor was based on the assumption that alterations to substrate specificity could be accommodated by the expansion and contraction of the individual subsites for each of the three substituents attached to the phosphorus core. A cartoon showing these three subsites is illustrated in Sch. 4. An additional assumption for this endeavor was that only one of these three subsites would be properly oriented for the expulsion of the leaving group. Therefore, the remaining two subsites would define the substrate specificity and stereospecificity of PTE. The most obvious problem here is that substrate binding can occur in any one of the three possible orientations and thus there is the potential for nonproductive binding.
6
IDENTIFICATION OF SUBSITES
In order to identify those amino acid side chains that came together in three-dimensional space to form the individual subsites for the substrate, Vanhooke et al. (15) (University of Wisconsin) solved the structure of PTE bound to the nonhydrolyzable substrate analog shown in Sch. 4. From the x-ray structure, it was concluded that the proS ethoxy group was oriented in what we defined as the leaving group pocket. The remaining two substituents, the methylbenzyl group and the proR ethoxy group, were oriented within the large and small pockets, respectively. The designation of the large and small pockets was defined to acknowledge the stereoselective preference exhibited by the wild-type enzyme for the initial library of organophosphate esters (13). The residues that surrounded the leaving group pocket included W131, F132, F306, and Y309, whereas those that comprised the large pocket included H254, H257, L271, and M317. The small pocket was defined by the side chains of G60, I106, L303, and S308. However, it should be noted that many of these residues are actually localized between these subsites, and thus the assignments are to some extent arbitrary (15).
7
CONTRACTION OF SMALL SUBSITES
In order to construct mutants of PTE that were more stereoselective than the wild-type enzyme, we anticipated that the small subsite would have to be reduced in size. This reduction in the cavity size for the small subsite would likely obstruct or impair the positioning of substrates with bulky groups that would be required to bind within this region of the active site. Of the four residues that were probed in this manner, the mutation of Gly60 to an alanine proved to be the most effective. Shown in Fig. 4 is a direct comparison of the stereoselectivity (ratio of kcat/Km values for the SP and RP isomers) for the wild-type and G60A mutant (16). The results are extraordinary considering that only a single –CH2– group has been added to a sea of nearly 2000 carbon atoms. In every case, the G60A mutant is considerably more stereoselective than the wild-type enzyme. For example, the SP isomer of methyl ethyl p-nitrophenyl phosphate is hydrolyzed 10 times faster than the RP isomer, whereas no difference in the rate of hydrolysis for these enantiomers was observed with the wild-type enzyme. Moreover, the ratio of kcat/Km values for the two enantiomers of methyl phenyl p-nitrophenyl phosphate increased from 20:1 to 10,000:1 with the mutant G60A. This mutant has proven to be quite effective in the kinetic resolution of organophosphate triesters (14). Gram quantities of single RP isomers with ee values >98% have been obtained in a few minutes with this enzyme.

Figure 4 Ratios of kcat/Km for chiral substrate analogs with the wild-type and G60A mutant of phosphotriesterase. The ratios are given for SP/RP. (From Ref. 16.)

8
RELAXATION OF STEREOSELECTIVITY
In order to relax the stereoselectivity of the wild-type PTE, our approach was to enlarge the cavity space of the small subsite by mutation of residues within this site to either alanine or glycine. A simple alanine scan of residues C59, G60, S61, I106, W131, F132, H254, H257, L271, L303, F306, S308, Y309, and M317 showed that a significant increase in the overall rate of hydrolysis of the initially slower RP isomers could be realized when some of these residues were changed to alanine (16). In general, the initially slower RP isomer became faster in every case. Those residues that had the greatest overall impact on the improvement of the rate for the initially slower RP isomers were found to be I106, F132, and S308 (16). Further improvements in the relaxation of stereoselectivity were achieved by the construction of glycine mutants at the critical residue positions and through the combination of multiple alanine or glycine mutants at selected residue positions (17).
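The alanine scan can be written down as a small lookup table. The sketch below takes the list of scanned residues from the paragraph above and the subsite assignments from the section on the identification of subsites. The helper names `subsite_of` and `scan_mutants`, and the label "unassigned" for residues that the text does not place in a pocket (C59 and S61), are hypothetical and used only for illustration.

```python
# Residues included in the alanine scan, as listed in the text.
SCANNED = ["C59", "G60", "S61", "I106", "W131", "F132", "H254",
           "H257", "L271", "L303", "F306", "S308", "Y309", "M317"]

# Subsite assignments taken from the section on identification of subsites.
SUBSITES = {
    "leaving group": {"W131", "F132", "F306", "Y309"},
    "large": {"H254", "H257", "L271", "M317"},
    "small": {"G60", "I106", "L303", "S308"},
}


def subsite_of(residue):
    """Return the pocket a residue was assigned to, or 'unassigned'."""
    for name, members in SUBSITES.items():
        if residue in members:
            return name
    return "unassigned"


def scan_mutants(residues, replacement="A"):
    """Map each single-site mutant label (e.g. 'I106A') to its subsite."""
    return {f"{r}{replacement}": subsite_of(r) for r in residues}


for mutant, site in scan_mutants(SCANNED).items():
    print(f"{mutant:6s} {site}")
```

Calling `scan_mutants(SCANNED, replacement="G")` gives the corresponding glycine labels. By this assignment F132 falls in the leaving group pocket, even though it is later grouped with the residues that enlarge the small subsite; the chapter itself notes that many residues lie between subsites and that the assignments are somewhat arbitrary.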
9
REVERSAL IN SPECIFICITY
In order to reverse the stereoselectivity inherent within the wild-type PTE, two adjustments to the active site needed to be accomplished simultaneously. The small subsite must be expanded whereas the large subsite must be shrunk in size. The constriction of the large subsite was initiated in an attempt to make it more difficult for the larger and bulkier groups to properly fit within this portion of the active site. If effective, this would reduce the rate of hydrolysis of the SP stereoisomers, relative to the values exhibited by the wild-type enzyme. The overall dimensional space of the large subsite was reduced by replacing H254, H257, L271, and M317 with the larger aromatic residues tyrosine, phenylalanine, or tryptophan. Shown in Fig. 5 are the effects of these mutations on the kcat values for the hydrolysis of methyl phenyl p-nitrophenyl phosphate. The kcat value has been reduced from a value that exceeds 40,000 s⁻¹ for the wild-type enzyme to a value that is about 200 s⁻¹ for the H254F mutant. Overall, the most interesting mutant within this series of modified enzymes was H257Y (17). The kinetic constants for the SP isomers of the six chiral organophosphates were all reduced, relative to those of the wild-type enzyme. The largest reductions were observed for those compounds containing a single phenyl substituent. Therefore, the H257Y mutation at the large subsite was combined with the mutations previously made within the small subsite in the rational search for novel proteins where the stereoselectivity was the opposite to that of the wild-type enzyme.

We had demonstrated that enlargement of the small subsite with the substitution of glycine and/or alanine residues for I106, F132, and S308 resulted in significant improvements in the rates of hydrolysis for most of the initially slower RP enantiomers of the substrate library. However, the mutations made to the small subsite had much smaller effects on the rates of hydrolysis for the initially faster SP enantiomers. In contrast, reduction in the size of the large subsite with the mutant H257Y resulted in the diminution of the kinetic parameters for most of the faster SP enantiomers but relatively smaller effects on the kinetic parameters for the initially slower RP enantiomers. These results indicated that it should be possible to create variants of the native PTE with reversed stereoselectivity by modulating the sizes of the large and small subsites simultaneously, if the effects at the individual sites were additive. A total of 11 mutants were constructed in an attempt to reshape the structure of the small and large subsites simultaneously (17). Mutant enzymes were identified for the reversal of each pair of stereoisomers, with the single exception of ethyl isopropyl p-nitrophenyl phosphate. The most dramatic example is the case of the two stereoisomers of the substrate isopropyl phenyl p-nitrophenyl phosphate. The wild-type enzyme prefers the SP isomer by a factor of 35 whereas the mutant I106G/H257Y/S308G prefers the RP stereoisomer by a factor of 460. The enhancements in the rates of hydrolysis for the RP isomers caused by these mutants were very similar to those observed with the glycine and alanine mutants of I106, F132, and S308 that only enlarged the small subsite.

Figure 5 Diminution in the value of kcat for the SP enantiomer of methyl phenyl p-nitrophenyl phosphate when the indicated residues within the large subsite of phosphotriesterase are mutated. (From Ref. 17.)

10
SUMMARY
The investigation of the enantiomeric selectivity of PTE is of considerable practical significance. A variety of toxic pesticides and chemical warfare agents are phosphorus compounds that contain a chiral phosphorus center (18). Previous studies have shown that the more toxic isomers of these acetylcholinesterase inactivators are the poorer substrates of the wild-type PTE (19). This study has clearly demonstrated that the reactivity and stereoselectivity of PTE can be enhanced, relaxed, or reversed by the rational evolution of specific active site residues. The enhancement and reversal of stereoselectivity have made it possible to utilize variants of PTE for the kinetic resolution of racemic mixtures of chiral organophosphates and to obtain either isomer with substantial enantiomeric excess (14). The relaxation of stereoselectivity is desired for bioremediation when catalysts are needed to efficiently detoxify hazardous pesticides and chemical warfare agents. The overall success of this effort, directed at the modulation of the kinetic properties of the wild-type enzyme, is graphically presented in Fig. 6. The relative kinetic parameters for ethyl phenyl p-nitrophenyl phosphate are provided for the wild-type enzyme and for the best mutant enzymes in which the stereoselectivity has been enhanced, relaxed, or reversed. Enhancements in stereoselectivity for the preferred SP enantiomer of up to three orders of magnitude have been achieved by the mutant G60A for all substrates tested. Multiple mutations within the active site have led to a complete reversal of the original chiral selectivity. These results suggest that further mutations within the active site could be engineered to accommodate nearly any organophosphate.

Figure 6 Manipulation of the stereoselectivity of phosphotriesterase for the chiral forms of ethyl phenyl p-nitrophenyl phosphate. The ratios of kcat/Km for the wild-type and selected mutants where the stereoselectivity has been enhanced (G60A), relaxed (I106G/F132G/S308G), and reversed (I106G/F132G/H257Y/S308G) are presented.
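To relate a ratio of kcat/Km values to the enantiomeric excess that a kinetic resolution can deliver, the following sketch may be useful. It rests on a simplifying assumption that is not stated in the chapter: both enantiomers of a racemic mixture compete for the same enzyme and are hydrolyzed irreversibly, so that the fractions remaining satisfy ln(fast remaining) = E × ln(slow remaining), where E is the ratio of kcat/Km values. The function name `resolution_state` is hypothetical, and the two E values are the wild-type and G60A ratios quoted earlier for methyl phenyl p-nitrophenyl phosphate.

```python
def resolution_state(E, slow_remaining):
    """Return (ee, conversion) for a kinetic resolution of a racemate.

    E is the ratio of kcat/Km values (fast isomer over slow isomer).
    slow_remaining is the fraction of the slow isomer still unreacted.
    Assumes irreversible, competitive hydrolysis of both enantiomers.
    """
    # Competing substrates: ln(fast remaining) = E * ln(slow remaining).
    fast_remaining = slow_remaining ** E
    # Enantiomeric excess of the unreacted substrate pool.
    ee = (slow_remaining - fast_remaining) / (slow_remaining + fast_remaining)
    # Overall conversion of a mixture that started as 50:50.
    conversion = 1.0 - 0.5 * (slow_remaining + fast_remaining)
    return ee, conversion


for E in (20, 10_000):
    ee, conversion = resolution_state(E, slow_remaining=0.9)
    print(f"E = {E:>6}: ee = {ee:.3f} at {conversion:.1%} conversion")
```

Under this assumption, a selectivity of 20 leaves the unreacted material with only a moderate enantiomeric excess when 10% of the slow isomer has been consumed, whereas a selectivity of 10,000 leaves it essentially pure at the same point. This is consistent with the isolation of single RP isomers with ee values >98% using the G60A mutant.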
ACKNOWLEDGMENTS
Financial support for this project has been obtained from the National Institutes of Health, the Office of Naval Research, and the Advanced Technology Program from the state of Texas.
REFERENCES
1. L Holm, C Sander. An evolutionary treasure: Unification of a broad set of amidohydrolases related to urease. Proteins 28:72–82, 1997.
2. E Jabri, MB Carr, RP Hausinger, PA Karplus. The crystal structure of urease from Klebsiella aerogenes. Science 268:998–1004, 1995.
3. MM Benning, JM Kuo, FM Raushel, HM Holden. Three-dimensional structure of the binuclear metal center of phosphotriesterase. Biochemistry 34:7973–7978, 1995.
4. JB Thoden, GN Phillips, TM Neal, FM Raushel, HM Holden. Molecular structure of dihydroorotase: A paradigm for catalysis through the use of a binuclear metal center. Biochemistry 40(24):6989–6997, 2001.
5. DK Wilson, FA Quiocho. A pre-transition-state mimic of an enzyme: X-ray structure of adenosine deaminase with bound 1-deazaadenosine and zinc-activated water. Biochemistry 32(7):1689–1694, 1993.
6. JL Buchbinder, RC Stephenson, MJ Dresser, TS Scanlan, RJ Fletterick. Biochemical characterization and crystallographic structure of an Escherichia coli protein from the phosphotriesterase gene family. Biochemistry 37(15):17445–17450, 1998.
7. T Sethunathan, T Yoshida. A flavobacterium sp. that degrades diazinon and parathion. Can J Microbiol 19:873–875, 1973.
8. DM Munneke. Enzyme hydrolysis of organophosphate insecticides, a possible pesticide disposal method. Appl Environ Microbiol 32:7–13, 1976.
9. FM Raushel, HM Holden. Phosphotriesterase: An enzyme in search of its natural substrate. Adv Enzymol 74:51–93, 2000.
10. MM Benning, H Shim, FM Raushel, HM Holden. High resolution x-ray structures of different metal-substituted forms of phosphotriesterase from Pseudomonas diminuta. Biochemistry 40:2712–2722, 2001.
11. SB Hong, FM Raushel. Metal substrate interactions facilitate the catalytic activity of the bacterial phosphotriesterase. Biochemistry 35:10904–10912, 1996.
12. H Shim, SB Hong, FM Raushel. Hydrolysis of phosphodiesters through transformation of the bacterial phosphotriesterase. J Biol Chem 273:17445–17450, 1998.
13. SB Hong, FM Raushel. Stereochemical constraints on the substrate specificity of phosphotriesterase. Biochemistry 38:1159–1165, 1999.
14. F Wu, WS Li, M Chen-Goodspeed, M Sogorb, FM Raushel. Rationally engineered mutants of phosphotriesterase for preparative scale isolation of chiral organophosphates. J Am Chem Soc 122:10206–10207, 2000.
15. JL Vanhooke, MM Benning, FM Raushel, HM Holden. Three-dimensional structure of the zinc-containing phosphotriesterase with the bound substrate analog diethyl 4-methylbenzylphosphonate. Biochemistry 35:6020–6025, 1996.
16. M Chen-Goodspeed, MA Sogorb, F Wu, SB Hong, FM Raushel. Structural determinants of the substrate and stereochemical specificity of phosphotriesterase. Biochemistry 40:1325–1331, 2001.
17. M Chen-Goodspeed, MA Sogorb, F Wu, FM Raushel. Enhancement, relaxation, and reversal of the stereoselectivity for phosphotriesterase by rational evolution of active site residues. Biochemistry 40:1332–1339, 2001.
18. HL Boter, C Van Dijk. Stereospecificity of hydrolytic enzymes on reaction with asymmetric organophosphorus compounds. 3. The inhibition of acetylcholinesterase and butyrylcholinesterase by enantiomeric forms of sarin. Biochem Pharmacol 18:2403–2407, 1969.
12
Protein Engineering of PQQ Glucose Dehydrogenase
Satoshi Igarashi and Koji Sode
Tokyo University of Agriculture and Technology, Tokyo, Japan
1
INTRODUCTION
1.1
PQQ and PQQ-Harboring Enzymes
Pyrroloquinoline quinone (PQQ) was first proposed in the 1960s as the third major prosthetic group (along with pyridine nucleotides and flavins) for redox enzymes (1). After about two decades, the structure of PQQ (Fig. 1) was determined by two groups (2,3). PQQ is an ortho-quinone, with the carbonyl groups at the C4 and C5 positions of the quinone ring. The C5 carbonyl group in the oxidized form is very reactive towards nucleophiles such as alcohols, sugars, amines, ammonia, cyanide, and amino acids. The biology, biochemistry, and electrochemistry of PQQ have been studied and summarized in several reviews (4–12). To date, many PQQ-harboring proteins and PQQ- and heme-harboring proteins have been discovered, but only in Gram-negative bacteria (Table 1). Most of the PQQ-harboring enzymes are dehydrogenases (4–31): PQQ methanol dehydrogenases (PQQMDH), PQQ ethanol dehydrogenases (PQQEDH), and PQQ glucose dehydrogenases (PQQGDH).

Figure 1 The structure of pyrroloquinoline quinone (PQQ).
PQQMDHs oxidize methanol to formaldehyde during the growth of methylotrophic bacteria on methane or methanol (32–34). PQQMDH from Methylotroph sp. was the first PQQ enzyme for which a tertiary structure was elucidated. PQQMDH is a soluble periplasmic enzyme with an α2β2 heterotetrameric structure (35). The catalytic subunit, the α-subunit (about 60 kDa), possesses one PQQ molecule and one Ca2+ ion. The α-subunit was shown to be an 8-bladed β-propeller fold (36). Other PQQ enzymes also appear to be β-propeller proteins. The β-propeller structure is composed of a repetitive folding unit called the W-motif, which is arranged circularly like the blades of a propeller (Fig. 2). The W-motif is composed of four antiparallel β-strands. β-Propeller proteins having four to eight W-motifs have been reported (37). PQQ-dependent alcohol dehydrogenases, including PQQMDHs, can be categorized into three types. The first group, named ADH I, comprises soluble alcohol dehydrogenases, including PQQMDH. The difference between PQQMDH and PQQEDH is simply substrate specificity (34). Type I PQQEDHs are homodimers of identical subunits of 60 kDa each, and their structure is an 8-bladed β-propeller fold similar to that of PQQMDHs (38,39). ADH II is classified as heme-possessing PQQADH (15). The overall structure is composed of two domains: the N-terminal domain (1–566), an 8-bladed β-propeller fold containing one PQQ molecule and one calcium ion in its active site, and the C-terminal type I cytochrome domain (591–667) (40). ADH III is a membrane-bound alcohol dehydrogenase. ADH III is comprised of three subunits: α (catalytic), β (cytochrome), and a small subunit (41,42). The α subunit has one PQQ molecule and a single heme C, and the β subunit possesses three heme C’s. In electrochemical studies, direct electron transfer from PQQ to an electrode via heme C was observed (43). The substrate specificity profile of ADH III is relatively restricted compared with other ADHs (12).
Type III
Type II
Polypropylene glycol dehydrogenase Alcohol dehydrogenase
Tetrahydrofurfuryl alcohol dehydrogenase Polyvinyl alcohol dehydrogenase Polyethylene glycol dehydrogenase
Ethanol dehydrogenase Alcohol dehydrogenase
? ?
PQQ/ heme C PQQ/ heme C
PQQ/heme C/3 heme C’s
PQQ
?
PQQ/ heme C
Periplasm/ membrane-bound Membrane-bound
Periplasm/ membrane-bound
Periplasm
Periplasm
Periplasm
Location
PQQ/heme C
PQQ
PQQ
Prosthetic group
Table 1 The List of PQQ or PQQ-Heme-Harboring Proteins
Alcohol dehydrogenases Type I Methanol dehydrogenase
α/β/small
Homodimeric
Tetrameric
Monomeric
Monomeric
Monomeric
Homodimeric
α2β2
Component
21 21
20
20
19
18
16 17
15
12
4
Ref.
Pseudomonas sp. Stenotrophomonas maltophilia Gluconobacter sp. Acetobacter sp.
Pseudomonas sp. strain VM15C Rhodopseudomonas acidophila Flavobacterium sp.
Comamonas testosteroni P. putida HKS Ralstonia eutropha strain B0
Methylotrophs Paracoccus denitrificans Pseudomonas sp.
Organism
Glucose dehydrogenase (s) Cyclic alcohol dehydrogenase D-Arabitol dehydrogenase Formaldehyde dehydrogenase
Glucose dehydrogenases Glucose dehydrogenase (m)
Table 1
Periplasm Membrane-bound Membrane-bound Membrane-bound
PQQ PQQ PQQ
Membrane-bound
Location
PQQ
PQQ
Prosthetic group
Heterodimeric (α/β) Homotetrameric
Monomeric
Homodimeric
Monomeric
Component
G. suboxydans IFO3257 Methylococcus capsulatus
G. frateurii CHM9
12 12 12
Gluconobacter sp. Pseudomonas sp. Acinetobacter calcoaceticus Acetobacter sp. Pseudomonas sp. N11 Acinetobacter sp.
25
24
23
12 22 12
12
Ref. Enteric bacteria
Organism
Glycerol dehydrogenase
Sorbose/sorbosone dehydrogenase Quinate dehydrogenase 1-Butanol dehydrogenase
Lupanine hydroxylase Sorbitol dehydrogenase
Periplasm Particle-bound Periplasm Periplasm Membrane-bound
PQQ PQQ PQQ PQQ/heme C PQQ
Monomeric Monomeric
Monomeric
Monomeric
Heterodimeric
Oligomer?
Membrane-bound
P. butanovora G. industrius
A. calcoaceticus Gluconobacter sp. P. butanovora
G. suboxydans IFO3255 The strain DSM4025
Gluconobacter sp.
α/β/small
Membrane-bound
PQQ/heme C/3 heme C’s
P. putida
Monomeric
Periplasm
PQQ/heme C
30 31
29
16
28
27
26
Figure 2 Overall structure of water-soluble quinoprotein glucose dehydrogenase (PQQGDH-B). This model was completed by the addition of PQQ, Ca2+, and some loop regions, followed by energy minimization, based on a previously reported model (From Ref. 68) (PDB code: 1QBI).
PQQ-dependent glucose dehydrogenases (PQQGDHs) have also been studied extensively. PQQGDHs are described in the next section.
1.2
PQQ Glucose Dehydrogenases: The Basic Science and Industrial Application
There are two types of glucose dehydrogenases harboring PQQ as their prosthetic group (44,45). Membrane-bound type glucose dehydrogenase (PQQGDH-A) has been isolated from various Gram-negative bacteria such as Escherichia coli, Acinetobacter calcoaceticus, Pseudomonas sp., and acetic acid bacteria (12). PQQGDH-As are all single peptide with MWs of about 87 kDa containing one PQQ molecule (46,47). PQQGDH-As make a bioenergetic contribution via coupling of the oxidation of glucose to the respiratory chain through ubiquinone (48,49). The five genes encoding PQQGDH-A have been elucidated (46,47,50–52). The 3-D structure of PQQGDH-A is predicted to be a h-propeller composed of eight W-motifs, based on the homology modeling with PQQMDH (53) and also based on the CD spectroscopy of an enzyme from which the membrane spanning region was deleted (54). The N-terminal region was predicted to be the membrane spanning the anchoring region (55). The authors have reported the first site-directed
Protein Engineering
267
mutagenesis study on a PQQ enzyme, PQQGDH-A. Since then, several mutations were introduced in this enzyme, including the studies introduced in this review, to elucidate the enzyme mechanisms (56–66). Besides the membrane-bound glucose dehydrogenase, A. calcoaceticus possesses a completely different PQQGDH, the water-soluble glucose dehydrogenase (PQQGDH-B or s-GDH), which does not share any obvious homology with the primary structures of other PQQ enzymes (67). The BLAST search for PQQGDH-B homology identified two open reading frames from the E. coli K-12 strain MG1655 genome and Synechocystis sp. strain PCC6803 genome and two incomplete sequences from the genomes of Pseudomonas aeruginosa and Bordetella pertussis. The functions of these four deduced open reading frames are uncertain, and the predicted protein localization also differs using the prediction program (PSORT and Signal P) (68,69). PQQGDH-B is a homodimeric enzyme consisting of an identical subunit of approximately 50 kDa (67,70). The monomer has one PQQ molecule and three Ca2+ ions, two of which are located in the dimer interface and the third Ca2+ ion is near PQQ (68). The physiological roles of PQQGDH-B have not yet been elucidated. PQQGDH-B does not couple with the respiration chain of A. calcoaceticus. The substrate specificity profile of PQQGDH-B is broad compared with that of PQQGDH-A. This enzyme catalyzes the oxidation of glucose, allose, 3-O-methyl-glucose, and also the disaccharide lactose, cellobiose, and maltose (71). PQQGDH-B contains a 24amino acid signal peptide at its N-terminus and secreted in the periplasmic space after excision of the signal sequence. PQQGDH-B is also a h-propeller, but apparently forms a 6-bladed structure (68). PQQ resides in a deep, broad, positively charged cleft at the top of the propeller near the 6-fold pseudosymmetry axis (72). In this model, PQQ is directly exposed to the solvent. Ca2+ ion is bound to N6, O7A, and O5 atoms of PQQ. 
These interactions are similar to those in PQQMDH, which indicates that the Ca2+ ion near PQQ is required for catalysis as a cofactor. The active site of PQQGDH-B is composed of loop1D2A, loop2D3A, loop4BC, loop4D5A, and loop6BC. The substrate-binding residues have been reported and are located mainly in loop1D2A, loop2D3A, and loop4BC (72). Among them, His168 was identified as an important residue that abstracts a proton from the substrate, because His168 is the only base close to the glucose O1 atom and the glucose C1 atom is positioned directly above the PQQH2 C5 atom (72).
1.3 The Industrial Significance of PQQGDH: The Glucose Sensors
Diabetes mellitus is a serious metabolic disorder that places patients at increased risk of coronary and vascular disease, as well as debilitating
conditions such as retinopathy, nephropathy, and neuropathy. Therefore rapid and accurate blood glucose monitoring is essential for treating critically ill patients and managing diabetic patients. The glucose sensor is a traditional biosensor and was first reported by Clark in 1962 (73). Clark's sensor was based on glucose oxidase (GOD) as its sensor constituent, and GOD-based glucose sensors dominate the current market. GOD is categorized as a stable protein and may be easily produced and purified from Aspergillus sp. The GOD sensor is an electron mediator-type glucose sensor. The inherent property of GOD is that it utilizes oxygen as the electron acceptor. This limits further application in this field because enzyme activity is a function of oxygen partial pressure (74–76). Various glucose sensors employing PQQGDHs have been reported (77–81). The merits of using PQQGDHs as a glucose sensor component are as follows:
1. PQQGDHs show high catalytic efficiency compared with GOD. The high activity allows rapid glucose sensing.
2. PQQ is tightly bound to GDH; therefore it is not necessary to add an extra cofactor like NAD(P).
3. PQQGDHs do not utilize dissolved oxygen as their electron acceptor during glucose oxidation. This property enhances accurate measurement of glucose in human blood.
Owing to these merits, PQQGDH-B glucose sensors are already on the market. However, despite their superior features, further improvements are required. This is particularly true when PQQGDH is compared with GOD, which has better substrate specificity and operational stability. The establishment of an economical recombinant enzyme production system is also essential. The authors' research group initiated and is currently the only group engaged in the protein engineering of PQQGDHs to develop an optimized glucose sensor enzyme. This review summarizes the current status of PQQGDH protein engineering.
2 PROTEIN ENGINEERING OF PQQGDH-A
Highly homologous primary structures have been observed in PQQGDH-As cloned from various Gram-negative bacteria; however, their enzymatic characteristics depend on the bacterial source. Although their tertiary structure was hard to elucidate because of their hydrophobic properties, the highly homologous primary structure of this protein enabled us to initiate the protein engineering of this enzyme by using homologous recombination to construct a chimeric enzyme library (58,60,63).
Among various properties, we focused on the difference in cofactor binding stability as the marker for the chimeric enzyme library. PQQGDHs require a divalent ion for holoenzyme formation with PQQ; however, divalent ions such as Ca2+ are removed by chelating reagents such as EDTA, resulting in apoenzyme formation (82). Therefore EDTA tolerance can be interpreted as an indicator of cofactor binding stability. A. calcoaceticus PQQGDH-A is a representative EDTA-tolerant enzyme, whereas E. coli PQQGDH-A is a representative EDTA-sensitive enzyme (57,58,70). The high homology between the E. coli and A. calcoaceticus PQQGDH-A structural genes provided a strategy for constructing a chimeric enzyme library based on homologous recombination. Investigation of the chimeric PQQGDH-A library led to the identification of the region responsible for EDTA tolerance (58,60) (Table 2). One of the chimeric enzymes, designated E97A3, in which the N-terminal 97% is derived from E. coli PQQGDH-A and the remaining 3% from A. calcoaceticus PQQGDH-A, showed increased thermal stability (57). This observation suggested that the interaction between the C-terminal and N-terminal regions may play a crucial role in maintaining the overall structure of β-propeller proteins (63). We have also carried out the first site-directed mutagenesis studies on PQQ enzymes, particularly PQQGDH-A, focusing on the highly conserved C-terminal region previously postulated to be the PQQ binding site (83). Cleton-Jansen et al. (50) reported a mutant PQQGDH-A in Gluconobacter oxydans whose substrate specificity was broadened by a substitution in the conserved C-terminal His region. Based on this information, site-directed mutagenesis studies on the conserved C-terminal His residues of E. coli PQQGDH-A (His775) and of A. calcoaceticus PQQGDH-A (His781) were carried out (59,62). The substitution of E. coli His775 to Asn increased both the Km value (from 0.9 to 1.5 mM) and the Vmax/Km ratio (from 116 to 287 U/(mg protein·mM)) for glucose compared with wild type (Table 2). The substrate specificity of His775Asn changed drastically and was higher than that of wild-type E. coli PQQGDH-A: the Vmax/Km ratios for all substrates except glucose decreased compared with wild type; consequently, His775Asn scarcely oxidized sugars other than glucose. His775Asp also showed a significant increase in Km values for all the saccharides used in the study (Table 2) and, like His775Asn, showed improved substrate specificity compared with wild-type E. coli PQQGDH-A. Amino acid substitution at His781 in A. calcoaceticus also significantly affected substrate specificity. On the basis of the accumulated information from these studies, we constructed an enzyme composed of all the regions that showed improved
Table 2  Substrate Specificity Profiles of PQQGDH-A His775 Variants

Enzyme      Activity (U/mg)  D-Glucose  2-Deoxyglucose  D-Mannose  D-Allose  D-Galactose  D-Xylose  Maltose
Wild type   110              100        81              30         105       38           48        14
His775Asn   417              100        30              1          13        2            4         5
His775Asp   140              100        52              0          21        0            1         3
His775Glu   79               100        47              3          29        6            4         2
His775Lys   5.6              100        177             0          161       6            16        23
His775Ser   195              100        64              3          73        7            5         4
His775Leu   7.9              100        39              1          29        6            4         2
His775Tyr   3.1              100        46              2          105       12           3         1
His775Trp   1.1              100        49              2          58        3            9         4

Substrate concentration is 1 mM. Values for the individual substrates are activities relative to the activity toward glucose (glucose = 100).
4 93 7 10 0
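The kinetic constants quoted in the text for glucose (Km of 0.9 and 1.5 mM; Vmax/Km of 116 and 287 for wild type and His775Asn, respectively) can be cross-checked with a short calculation. The following Python sketch only multiplies the reported values to recover the implied Vmax; the function names are illustrative, and the 5 mM glucose concentration used at the end is a hypothetical example, not a condition taken from the study.

```python
# Minimal sketch: Vmax implied by the Km and Vmax/Km values quoted in the text.
# Function and variable names are illustrative, not taken from the source.

def implied_vmax(km_mM: float, vmax_over_km: float) -> float:
    """Vmax (U/mg protein) = (Vmax/Km) * Km."""
    return vmax_over_km * km_mM


def michaelis_menten_rate(vmax: float, km_mM: float, s_mM: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s_mM / (km_mM + s_mM)


# Glucose kinetics of E. coli PQQGDH-A as reported in the text.
WILD_TYPE = {"km_mM": 0.9, "vmax_over_km": 116.0}
HIS775ASN = {"km_mM": 1.5, "vmax_over_km": 287.0}

vmax_wt = implied_vmax(**WILD_TYPE)
vmax_asn = implied_vmax(**HIS775ASN)

if __name__ == "__main__":
    print(f"wild type Vmax ~ {vmax_wt:.1f} U/mg")
    print(f"His775Asn Vmax ~ {vmax_asn:.1f} U/mg")
    print(f"fold change    ~ {vmax_asn / vmax_wt:.2f}")
    # Rates at a hypothetical glucose concentration of 5 mM.
    for name, params, vmax in (
        ("wild type", WILD_TYPE, vmax_wt),
        ("His775Asn", HIS775ASN, vmax_asn),
    ):
        rate = michaelis_menten_rate(vmax, params["km_mM"], 5.0)
        print(f"{name}: v(5 mM) ~ {rate:.1f} U/mg")
```

Under these numbers the Vmax/Km ratio rises even though Km increases, because the implied Vmax increases roughly fourfold.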
80 kDa) may not satisfy these criteria. Cloning for display in lytic phages such as λ and T4 removes these limitations. Two enzymes, β-lactamase and β-galactosidase, have been cloned as fusions with the λ coat proteins gpD and gpV (30,31). These display formats have not been used extensively in enzyme engineering so far.
3 SELECTION FOR ACTIVITY
The selection of a phage enzyme with desirable properties from libraries of mutants based on their catalytic activity is far more demanding than the selection of protein variants for binding to a specific target. Indeed, it is necessary to find a way to couple the ability of an enzyme to catalyze a chemical transformation, which, in principle, leaves it unchanged, with an acquired binding ability. Several strategies of increasing sophistication have been conceived to achieve that goal.
3.1 Selection by Binding: Transition State Analogues vs. Substrate or Product Analogues
The first selection protocols were simply based on binding to substrate or product analogues. They were tested on libraries of mutants of staphylococcal nuclease (16) or glutathione-S-transferase (GST), an enzyme that catalyzes the conjugation of toxic compounds to glutathione to facilitate their elimination (17). The goal of isolating mutants with modified specificities was met with limited success (16,17). Binding to transition state analogues (TSAs) represented a step forward, as these are designed to mimic the geometry and charge distribution of the true transition state of a reaction. Indeed, as first suggested by Pauling (31a), enzymes are able to catalyze reactions because they are more complementary to transition states than to substrates. It is interesting to compare the results obtained from selections of GST mutant libraries by binding to a TSA vs. a product analogue. The A1-1-type GST is active on aromatic substrates activated for nucleophilic addition, such as chlorodinitrobenzene. In an effort to change its specificity toward substrates bearing a negative charge on the aromatic ring, Widersten and Mannervik (17) created libraries of phage-displayed GST in which 10 amino acid residues in the aromatic electrophile binding site were randomly mutated. Selection by binding to product-like affinity ligands (e.g., 1 in Fig. 3) allowed the extraction of novel GSTs with altered substrate specificity, but the specific activities of these enzymes were reduced 1000-fold compared with the wild-type enzyme. On the other hand, selection by binding to a transition state analogue was used to extract active enzymes from
Figure 3 Immobilized product analogue or transition state analogue used for the selection of glutathione-S-transferase mutants displayed on phage.
another library of mutants in which four residues in the same binding site had been randomly mutated. The σ-complex (2), which mimics the transition state of a nucleophilic aromatic substitution, was used as an affinity ligand (Fig. 3). Several mutants were characterized after four rounds of biopanning. The catalytic efficiency of one of them was determined on several substrates: it was 20-fold to 90-fold lower than that of the wild-type enzyme on chloronitrobenzene substrates (32). Although a definite conclusion cannot be reached from a comparison of these two sets of experiments because the libraries and specificities are different, the observation of a better catalytic activity after selection with a TSA suggests that TSAs are more appropriate for selecting active enzymes. The strategy of selection on phosphonate TSAs was also applied to libraries of catalytic antibodies with specific esterase activities in the hope of finding mutants with enhanced catalytic activity. This has led to conflicting results: in one case, mutants with higher TSA affinity but lower catalytic activity were selected (33), whereas in another, better binders were more active (34).
3.2 Selection with Suicide Substrates: Mechanism-Based Inhibitors
A suicide substrate is a relatively unreactive compound that can be transformed by an enzyme into a very reactive inhibitor. As this transformation arises through the normal enzymatic mechanism, suicide substrates are also called mechanism-based inhibitors (35). We have taken advantage of that property to design a selection strategy applicable to the extraction of active phage enzymes from libraries of mutants (Fig. 4). Mixtures or libraries of mutants are incubated under kinetic control with a limiting concentration of a biotinylated suicide inhibitor. Preferential reaction of the most active phage enzymes with the inhibitor leads to labelling of their active site and biotinylation of these phages. After a defined reaction period, excess nonbiotinylated inhibitor is added to stop further labelling. The labelled phages are then extracted from the mixture by adsorption on a streptavidin-coated support. As inhibition involves the formation of a covalent bond, the recovery of phages necessitates the cleavage either of a disulfide bond introduced in the connector between the suicide inhibitor head and biotin, or of the peptide
Figure 4 Selection of active phage-enzymes by labelling with a biotinylated suicide inhibitor followed by capture on streptavidin coated support. The phages are released by cleavage of a disulfide bond in the activity label or by cleavage of a peptide bond in the connector between the displayed enzyme and g3p (not represented).
connector between the displayed enzyme and g3p using a specific protease. In our experience, the second method is more efficient. The method has been tested on model mixtures of the phage-displayed TEM-1 β-lactamase and mutants of known activities, using penicillin sulfone as the biotinylated suicide inhibitor (3 in Fig. 5). The most active phage enzymes could be extracted from the libraries (5,36). However, in other model experiments designed to assess the potential of the method to select enzymes of increased activity and stability, we discovered that this simple protocol may lead to the selection of low-activity mutants. In selections run in the presence of denaturant on mixtures of phage enzymes displaying the wild-type β-lactamase and two mutants of different activities and stabilities [kcat vs. PenG and residual activity (r.a.) in 0.65 M GuHCl: wt, kcat = 1500 s⁻¹, r.a. = 40%; E104K, kcat = 2260 s⁻¹, r.a. = 29%; and G238S, kcat = 45 s⁻¹, r.a. = 6.7%], we observed a twofold enrichment in the less active and less stable G238S mutant. Analysis of the kinetics of reaction with the suicide substrate revealed that this mutant actually reacted faster with the suicide inhibitor (3) than predicted from its turnover rate with the substrate. Similarly, when the protocol was applied to libraries of mutants generated by error-prone PCR, an inactive mutant (E166V) was isolated as the dominant clone after three rounds of selection (I De Conninck and J Fastrez, unpublished results). These surprising results can be explained by the following kinetic scheme describing the interaction between a suicide inhibitor and its target enzyme when the suicide event originates from a covalent intermediate on the reaction pathway (Sch. 1). The probability of forming the irreversibly inhibited enzyme from the covalent intermediate (an acyl-enzyme in the case of a β-lactamase) is directly related to the ratio between the rate constants of the suicide event (k4) and turnover (k3). 
Figure 5 Biotinylated penicillin sulfone used for the selection of phage-displayed β-lactamases.
Scheme 1
If the acyl-enzyme becomes too stable and the enzyme does not turn over efficiently, the ratio k4/k3 becomes high and the enzyme is efficiently inhibited. This is the case for the E166V and G238S β-lactamase mutants mentioned above. Both mutations affect the activity mainly through a reduction in k3, the effect being more dramatic for E166 (37) than for G238 (38). Without correction, this simple protocol can lead to the selection of enzymes that do not turn over. The problem can be corrected by preincubating the mixture of phage enzymes with the substrate. Phage enzymes whose ratio of deacylation to acylation rates (k3/k2) is low will be blocked as acyl-enzymes and will not be available for labelling with the biotinylated suicide inhibitor. Consequently, they will not be selected. This protocol was applied to a library of β-lactamase mutants containing an unknown proportion of phage enzymes with low k3/k2 ratios, behaving like "penicillin-binding proteins (PBPs)," and other mutants with typical β-lactamase properties (high k3). The β-lactamases were efficiently selected under these conditions (39). To analyze the efficiency of this protocol involving counterselection by substrate, it was also tested on a mixture of phages displaying two enzymes of known properties: a low-activity β-lactamase mutant (kcat = 16 s⁻¹ ≈ 1% of the kcat of the wild-type enzyme) and the E. coli PBP4 [a DD-peptidase with a k3 of 7.2 × 10⁻⁵ s⁻¹ and a k3/k2 ratio of 6.3 × 10⁻⁶] (40). In the absence of counterselection, the PBP4 phage enzyme was selected. Preincubation with 10⁻⁵ M substrate (benzylpenicillin) reversed the enrichment factor to ≤0.1 (S Lavenne, J Fastrez, in preparation).
If the technique of selection with suicide substrates is to be used to select mutants with a modified specificity, it is essential that the rate of labelling with suicide substrates and that of turnover of true substrates be sensitive to the same structural features. Some information on this issue is available from results obtained with the serine protease subtilisin and a lipase. 
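The kinetic argument behind the counterselection step can be made concrete with a small calculation. The Python sketch below assumes a simple branching scheme (acylation with rate constant k2, followed by either turnover with k3 or the suicide event with k4), steady-state conditions, and saturating substrate; these are simplifying assumptions, not a reproduction of Scheme 1. The value of k4 and the second k3/k2 ratio are hypothetical, and the PBP4 ratio is taken as 6.3 × 10⁻⁶.

```python
# Sketch of the branching kinetics described in the text. The scheme is an
# assumption based on that description:
#   acylation (k2) -> acyl-enzyme -> turnover (k3) or suicide event (k4)

def inactivation_probability(k3: float, k4: float) -> float:
    """Chance that an acyl-enzyme is irreversibly inhibited instead of turning over."""
    return k4 / (k3 + k4)


def acyl_enzyme_fraction(k3_over_k2: float) -> float:
    """Steady-state fraction of enzyme held as acyl-enzyme at saturating substrate.

    For E.S --k2--> acyl-enzyme --k3--> E + P this is k2 / (k2 + k3),
    which equals 1 / (1 + k3/k2).
    """
    return 1.0 / (1.0 + k3_over_k2)


if __name__ == "__main__":
    # Hypothetical k4; only the ratio k4/k3 matters for the branching.
    k4 = 1.0
    for k3 in (1000.0, 1.0, 0.001):
        p = inactivation_probability(k3, k4)
        print(f"k3 = {k3:g} s^-1 -> P(inactivation) = {p:.4f}")

    # PBP4 ratio as read from the text; the second ratio is hypothetical.
    for label, ratio in (("PBP4", 6.3e-6), ("beta-lactamase-like", 10.0)):
        blocked = acyl_enzyme_fraction(ratio)
        print(f"{label}: fraction blocked as acyl-enzyme = {blocked:.6f}")
```

In this simplified picture, an enzyme with a very low k3/k2 ratio is almost entirely trapped as acyl-enzyme by the substrate and is therefore unavailable for labelling, whereas an enzyme that deacylates quickly remains largely available.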
Although, strictly speaking, esters of acyl-amino-phosphonic acids should be referred to as covalent transition state analogues, they behave like suicide substrates for serine proteases. They form a stable phosphonyl-enzyme analogous to the classical acyl-enzyme intermediate. Furthermore, the rates of phosphonylation by these inhibitors and of acylation by the corresponding ester or amide substrates are sensitive to the same structural features. Taking advantage of this characteristic, we have tested the possibility of selecting mutants of the Bacillus lentus subtilisin whose extended active site
would accept positively charged residues in the P4 site. Two residues lining the P4 binding pocket were randomized and the phage enzyme library was selected by labelling with a phosphonylating inhibitor (4b in Fig. 6) whose structure is as close as possible to that of a substrate of the target specificity (5b). With inhibitor (4b), mutants whose activity on the charged substrate was 4% of that of the wild-type phage enzyme on substrate (5b) could be isolated even from this small library. As a control, clones with a wild-type-like specificity were shown to be efficiently selected with inhibitor (4a) (8). The "detergent lipase" from Thermomyces lanuginosa, an enzyme active on emulsified lipids and responding to interfacial activation, has also been functionally displayed on phagemid. A library of mutants of high diversity has been created in which nine amino acid residues in two regions constituting the hinge controlling the opening of the active site were randomized. Selection by labelling, in the presence of detergent, with the triolein-emulsified biotinylated p-nitrophenyl-phosphonate (6 in Fig. 7) succeeded in extracting active clones from the library. However, none of the analyzed clones was more active than the wild-type enzyme on p-nitrophenyl-palmitate, particularly in the presence of detergent. In one region, a high degree of conservation of wild-type residues was observed in the active clones. In the most active ones (activity ≥50% of wt), a substitution known to alter the substrate chain length preference was selected. This may result from the fact that the long hydrophobic chain connecting the inhibitor head to biotin
Figure 6 Biotinylated phosphonylating inhibitors used for the selection of subtilisin mutants with wild-type-like activity (4a) or with a specificity modified to accept positively charged residues in the P4 site (4b), and structures of the corresponding substrates (5a and 5b).
Figure 7 Biotinylated phosphonylating inhibitor used for the selection of lipolase mutants.
interacts more favorably with the enzyme than the palmitate chain in the substrate and/or from the influence of biotin on the conformation of the inhibitor in the lipid layer during phage enzyme labelling (27). In conclusion, selection with suicide substrates appears to allow the extraction of interesting mutants from libraries. However, several limitations remain. The most obvious one is that suicide substrates are not available for all classes of enzymes. Furthermore, when available, the suicide substrates are not necessarily suitable for selection of the desired activity. For instance, nearly all the suicide substrates that have been designed to inhibit proteases are activated by cleavage of a lactone-like function; consequently, the suicide event requires the ability to cleave not an amide bond but an ester bond, which is an easier reaction. Finally, although our investigation of the selection of various β-lactamase mutants from model mixtures or libraries was relatively successful, the technique has not yet allowed us to select clones having acquired even a weak β-lactamase activity (increased rate of deacylation) from libraries of PBP mutants. Such a result would demonstrate that the technique is suitable for isolating mutants with weak activities (i.e., the kind of activities that are likely to be found initially when trying to engineer enzymatic activities de novo). Nevertheless, some success in that direction has been achieved in the selection of catalytic antibodies with β-lactamase or glycosidase activities (41,42).
3.3 Direct Selection with Substrates
In view of the potential limitations of the suicide inhibitor approach, efforts have been devoted to developing new selection techniques based directly on substrate transformation. In the first two reports, the substrates were attached to the phage enzymes. On reaction, occurring in an "intraphage" format, the active phage enzymes became labelled with the product. They were then captured on a support coated with a product-binding protein. This strategy was tested by Pedersen et al. on a model system with the phagemid-displayed staphylococcal nuclease (SNase), a nuclease requiring Ca2+ for
activity. Biotinylated double-stranded DNA was attached to the phagemid and immobilized on a streptavidin-coated support. The addition of Ca2+ activated the enzyme and allowed the release of phagemids from their support 100 times more effectively than with a control phage displaying a Fab fragment (43). Demartis et al. have also tested selection protocols based on substrate transformation with four different enzymes: two proteases, a GST, and a biotin ligase. Peptide substrates were attached to the phage enzymes through complex formation with a calmodulin module fused to both g3p and the displayed enzyme. The protease phage enzymes were selected after "intraphage" proteolytic cleavage by binding to a product-specific antibody. The GST was selected by streptavidin binding after conjugation of a glutathione-containing peptide substrate with a biotinylated electrophilic aromatic cosubstrate. The E. coli biotin ligase was similarly selected directly after biotinylation. On model systems, enrichment factors of active phage enzymes vs. control were adequate (up to 2000-fold). Unfortunately, application of the selection scheme to a library of trypsin mutants did not yield mutants with catalytic activity superior to that of the original H57A mutant (21,28). Being based on a single "intraphage" turnover, these strategies are likely to lead to the selection of low-activity enzymes. Jestin et al. have tried to avoid this shortcoming by coupling two independent reactions: the enzymatic reaction, presumed to occur intermolecularly, and a chemical reaction connecting the substrate/product to the phage. With this strategy, they were able to enrich model mixtures of phage-displayed active and inactive DNA polymerases in active ones (22). Recently, we have explored the possibility of selecting phages displaying active metalloenzymes by affinity chromatography associated with catalytic elution (Fig. 8). 
The selection protocol includes three steps: (a) the phage enzymes are first inactivated by extraction of the metal cofactor; (b) they are adsorbed on a support coated with a penicillin substrate; and (c) the phages displaying active enzymes are then selectively eluted by the addition of the cofactor. Active enzymes transform the substrate into a product for which they have normally a significantly lower affinity. The method requires that the apoenzymes are still able to bind their substrates. The selection process was
Figure 8 Selection of metallo-phage-enzymes by catalytic elution. Three kinds of phage-enzymes present in the library of mutants are represented: (1) top: inactive mutant retaining affinity for the substrate, (2) middle: active phage-enzyme (a star in the active site represents the metal cofactor), and (3) bottom: inactive mutant devoid of affinity. The selection operates in three steps: inactivation by metal extraction, binding on immobilised substrate, release of active phage-enzymes by catalytic elution on metal insertion in the active site.
first tested with the B. cereus metallo-β-lactamase phage enzyme (fd-βLII). The wild-type fd-βLII was shown to be preferentially extracted from model mixtures containing fd-βLII and either a dummy phage, a phage displaying an inactive mutant of the serine β-lactamase TEM-1, or inactive and low-activity mutants of βLII. Enrichment factors varying between 36-fold and 820-fold were observed. The selection was also applied to extract active phage enzymes from a library of mutants generated by mutagenic PCR. The activity of the library was shown to increase 60-fold after two rounds of selection. Eleven clones from the second round were randomly picked for characterization. They contained between two and four mutations. The kcat values of the phage enzymes varied between 30% and 160% and the Km values between 70% and 170% of the wild-type values. All the selected enzymes were less stable than the wild type toward thermal denaturation.
3.3 Construction of an Allosteric Binding Site by Hierarchical Selection and Screening
Allosteric regulation is prominent among the characteristic properties of enzymes. In a project whose final purpose is to design enzymes whose activity could be regulated by the binding of various ligands, we endeavored to construct an allosteric binding site in the vicinity of the active site (Fig. 9). To create the first-generation regulatable enzymes, we reasoned that it might
Figure 9 Schematic representation of a β-lactamase in which three contiguous loops have been extended by insertion of decapeptides into surface loops close to the active site. The essential Ser70 located at the bottom of the active site is represented in space-filling mode.
be easier to engineer a recognition site for a protein than for a small ligand because this could simply require the building of a protuberance that would fit into an existing cavity on the surface of the target protein, whereas building a binding site for a small ligand would probably require the more difficult creation of a new cavity. Because we cannot reliably predict the sequences or structures that would constitute adequate binding sites, the strategy is to insert random sequences in surface loops of the chosen enzyme and then to select potentially interesting clones on three criteria: (a) enzymatic activity, (b) affinity for the target ligand, and (c) if possible, modulation of activity on target ligand binding (Fig. 10). Selection for these properties is best organized hierarchically. The enzyme chosen for this project was the serine TEM-1 β-lactamase in the phage display format. Random peptides were genetically inserted in five different loops by replacement of one to three residues of the wild-type
Figure 10 Schematic representation of the effect of allosteric ligand binding on enzymatic activity.
sequence by five to nine random residues. In the case of a β-lactamase, active clones can be extracted from libraries either by in vivo selection or by in vitro selection using biotinylated suicide inhibitors as explained above. In vivo selection was chosen for its simplicity: the insertion libraries were plated on solid media containing a β-lactam antibiotic at a concentration chosen to exclude inactive or low-activity clones. The percentage of active clones reflects the insertion tolerance of the loops. The following results were found: insertions were relatively well accepted (≥5% of clones with ≥5% of wild-type activity) in replacements of G41-A42 (P Mathonet, unpublished results) or of T271 (8). These loops are symmetrically located, respectively, between the amino-terminal helix and the first β-strand, or between the last β-strand and the carboxy-terminal helix. A weak tolerance to insertion (100 plates per day), it may be necessary to invest in one or more large integrated system(s). The integrated systems can be divided into two groups: robot-centric systems and conveyor-based systems.
3.3.1 Robot-Centric Systems
Robot-centric systems, such as the Beckman/Sagian Core System, CRS HTS/UHTS, and Zymark Presto, have all equipment placed around an articulated robot. The robot may be fixed on the worktable, giving it a relatively small cylindrical working volume, or it may be placed on a track, giving it a larger working volume, especially in systems allowing equipment to be placed on both sides of a central track. Such systems are very flexible: equipment can be placed freely around the robot or track, allowing redundant resources or several types of microtiter plate readers in the same system. In very large and complex systems, however, the robot can become the limiting resource, as it is responsible for all transportation of plates between equipment. Even with a fast robot, the cycle time for a complex protocol involving many steps can reach several minutes. Adding redundant resources, such as more liquid handling, often does not solve the problem. Some companies have implemented dual-robot (or even triple-robot) systems to overcome this limitation (18), but the systems then become very complex and can be
recommended only for organizations with extensive experience and skills in the field of laboratory automation.
3.3.2 Conveyor-Based Systems
Conveyor-based systems, such as CCS Packard PlateTrak, CRS Dimension4, and Zymark Allegro, have all equipment aligned along a conveyor mechanism that transports plates between instrument positions. The conveyor can be at least as long as a robot track, allowing systems to be just as complex and flexible with regard to the amount of equipment. Relatively simple pick-and-place robots transport plates between the conveyor and the instruments. For a very high throughput system, consider a conveyor-based system in preference to a robot-centric system. With a conveyor-based system, several plates can be transported to and from instruments and between instrument positions at the same time, leading to cycle times of a minute or less. The industry tendency over the last few years seems to be a move away from robot-centric systems and toward conveyor-based systems. This trend is developing because in very complex and very high throughput systems, the robot quickly becomes the bottleneck. A conveyor-based system may be the best choice for methods involving these elements.
3.4 Schedulers
A scheduler is a software program that organizes and optimizes the workflow of an automated system, typically a fully integrated system. Schedulers have several advantages over user-programmed scheduling of tasks. They can find a processing pattern that is more efficient than a sequential processing pattern. They can utilize redundant resources; the more advanced schedulers can recover from loss of resources and dynamically reschedule the process. To some extent, they can also ensure similar timing for all plates in a given step, e.g., an incubation step. Schedulers are also very useful in allowing the user to test several configurations to optimize the process without having to take the time to run it. One disadvantage of using schedulers is that the process pattern is not always predictable. Plates within a run may be treated differently, and the derived process pattern is probably not the most efficient. (Frequently, programmers can see inefficiencies that they would like to fix, but may be unable to do so within such a system.) Also, if bar codes are not used, it can become difficult to track plates and the data associated with those plates without going through the method log and transportation schedule. Many systems with scheduler software perform the very same (and few) processes over long periods of time. In situations where protocols
534
Lamsa et al.
are not frequently changed, or the system does not have redundant resources, a better performing system may be obtained by carefully creating processing sequences without a scheduler. 3.5
Software Development Environments
In many cases, it is desirable for the user to write software for the entire system or to enhance aspects of a commercial system. Whether to attempt this task depends largely on the user’s programming background and the environment of the commercial system. Some older versions of systems, such as the ORCA robot from Beckman/Sagian, came with robot control software that could be used to control the entire system. The user could set this up or, more commonly, the control software was set up by a systems integrator. Microsoft Visual Basic is also commonly used to control automation equipment (27,28), often serving as the interface within the commercial software. Again, depending on user expertise, manipulations of common automation can be performed in VB or nearly any programming language. It is now more common, however, for the vendor-supplied software to be more than adequate and easy enough to use that many users can learn and manipulate their systems with relative ease.
3.5.1 Proprietary Environments
Proprietary environments with their own programming language, such as Beckman/Sagian MDS, Tecan Pegasus, and Zymark EasyLab, have several advantages. They get a system up and running quickly. They create a framework for most common applications. They cover all low- and mid-level details and contain all common procedures programmed beforehand. Among the disadvantages, it can be difficult to perform very complex tasks, and some of these must be done by separate programs or systems. The programming paradigm has been chosen by the vendor, making it difficult, if not impossible, to create applications outside the intended scope. The systems can be less flexible because of the paradigm constraints. Some vendors have done a good job of including nearly all the functionality needed for nearly every situation. However, the paradigms can often miss features important to some users, and it can be time-consuming to learn all the functionality of a program. The industry tendency does not seem to be clear. For example, Beckman/Sagian and, to some extent, Zymark left the proprietary route several years ago. In contrast, Tecan is marketing a brand new proprietary environment. In conclusion, proprietary environments are good for the applications the designers had in mind, but can be severely constraining when an application needs to go outside the intended scope.
3.5.2 Device Libraries
Device libraries, such as Beckman/Sagian ORCA NT and ActiveX components from various vendors, are designed to handle equipment control, leaving the application programming to a generic programming language, such as Visual Basic or C++. Device libraries have the following advantages: the low-level functionality is still provided by the vendor, they are more flexible than proprietary environments, and they give a more homogeneous run time environment, e.g., control of devices from several vendors can be integrated into a single software package. The disadvantages are that programming experience is required and that development is likely to be more time-consuming than application development in a proprietary environment. Device libraries are still somewhat limited in scope. When device libraries from several vendors are used in the same application, the program feel and the GUI can become somewhat inhomogeneous.
3.5.3 Custom-Written Code
Custom code written entirely from scratch can be made for very specific purposes, where the above-mentioned methodologies cannot fulfill the given set of requirements. This option is best suited to very experienced users who have the necessary resources for application development and maintenance. Experienced developers tend to prefer device libraries to proprietary environments because of the additional flexibility, but be aware that this flexibility comes at the cost of much more development effort.
3.5.4 Data Management
The volume of data generated by HTS systems can come as a surprise to new users. Very often, the instrumentation vendors do not prepare the users for the mountains of data their systems will produce, and typically the solutions are left to the users or to the company information management team. Fully integrated systems tend to have data logging, but that is usually the extent of it. A simple solution is to store the data files from the detection instruments and then import each file into a standard spreadsheet application. Although this method may work nicely for some applications, it has serious drawbacks if the objective is to have a historical archive of data that is easily accessible. For most enzyme screening, the data are sorted and only a subset is retained, relating to the selected features identified in the screen. In general, libraries exist as mixtures in test tubes and are not retained as individual, unique isolates that are screened repeatedly. In the past, the need for extensive, large databases has not been a high priority for enzyme screening. However, as more hits are generated and interest grows in retaining more of these as candidates for further manipulation in enzyme evolution, it will become more important to have a good database system. It is important to keep all primary data for data mining purposes. Key parameters of the improved mutants may be discovered in the course of retesting at larger scale. It is then good to be able to go back and reselect. Key traits of each run that help to determine how reliable the results are may also be discovered. These retrospective analyses are only possible if the data are kept in a suitably annotated form. A substantial part of the resources allocated for HTS should be spent on creating and maintaining a good data collection, storage, reduction, and analysis system. Inspiration for this could be gained from the software used in medicinal chemistry, where the use of data reduction and analysis tools is common practice. There are a number of specialized applications with very strong visualization capabilities, such as those produced by Spotfire™ and Partek™. Database companies such as IDBS (http://www.id-bs.com) and NuGenesis (http://www.nugenesis.com/) offer some of the solutions needed; however, these companies are geared more toward the pharmaceutical industry than toward industrial enzymes. In general, working within the capabilities already present within the organization is a good place to start. An excellent example of an in-house database that could be used as a design model for screening can be found in Ref. 29. Many of the features described could be applied to the design of a database for enzyme screening.
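As a rough illustration of keeping primary data in an annotated, queryable form, the following minimal sketch uses Python's built-in sqlite3 module. The table names, column names, and sample identifiers are hypothetical choices made for this example and are not taken from Ref. 29 or from any vendor product. Each raw reading stays linked to its plate bar code and to an annotated run, so a sample can be looked up again later.

```python
import sqlite3


def create_screen_db(path=":memory:"):
    """Create a small, hypothetical schema: runs contain plates, plates contain readings."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE run (
            run_id    INTEGER PRIMARY KEY,
            run_date  TEXT NOT NULL,
            method    TEXT NOT NULL,
            notes     TEXT               -- free-text annotation of the run
        );
        CREATE TABLE plate (
            plate_id  INTEGER PRIMARY KEY,
            run_id    INTEGER NOT NULL REFERENCES run(run_id),
            barcode   TEXT NOT NULL UNIQUE
        );
        CREATE TABLE reading (
            plate_id  INTEGER NOT NULL REFERENCES plate(plate_id),
            well      TEXT NOT NULL,     -- e.g. 'A1'
            sample    TEXT NOT NULL,     -- clone or control identifier
            raw_value REAL NOT NULL,     -- primary (unprocessed) instrument value
            PRIMARY KEY (plate_id, well)
        );
    """)
    return conn


def add_run(conn, run_date, method, notes=""):
    """Register a screening run and return its id."""
    cur = conn.execute(
        "INSERT INTO run (run_date, method, notes) VALUES (?, ?, ?)",
        (run_date, method, notes),
    )
    return cur.lastrowid


def add_plate(conn, run_id, barcode, readings):
    """Store one plate; readings is an iterable of (well, sample, raw_value)."""
    cur = conn.execute(
        "INSERT INTO plate (run_id, barcode) VALUES (?, ?)", (run_id, barcode)
    )
    plate_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO reading (plate_id, well, sample, raw_value) VALUES (?, ?, ?, ?)",
        [(plate_id, well, sample, value) for well, sample, value in readings],
    )
    return plate_id


def readings_for_sample(conn, sample):
    """Return every stored reading of one sample across all runs and plates."""
    rows = conn.execute(
        """
        SELECT run.run_date, plate.barcode, reading.well, reading.raw_value
        FROM reading
        JOIN plate USING (plate_id)
        JOIN run USING (run_id)
        WHERE reading.sample = ?
        ORDER BY run.run_date, plate.barcode, reading.well
        """,
        (sample,),
    )
    return rows.fetchall()
```

A production system would need more than this (users, instrument settings, derived values), but even a schema of this size allows a hit to be traced back to its run annotations.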
4 SCREENING AND AUTOMATION ISSUES
4.1 Enzyme Evolution in the Literature
The push toward automation has been driven largely by the technologies now available for making immense libraries of enzyme variants to be screened (5,30–38). In combination with this, many assays and methods that have been adapted to microtiter plate formats have been demonstrated in automated screens or miniaturized manual screens (20,39–43). Screening is recognized as a necessary evil in finding useful enzymes, as tying selection to a screen is not always possible or functional.
4.2 Automation Evolution
4.2.1 Older System Configurations
Automation has evolved significantly since it was first implemented in laboratories. Fig. 1 is a photograph of an integrated system in a screening laboratory. In this system, four separate PCs controlling the individual workstations (Biomek 1000, Beckman Instruments) were integrated with the PC controlling the robot arm (ORCA, at the time of purchase, produced by Hewlett-Packard, now owned by Beckman Instruments) to make this system function. When this was set up, in the early 1990s, the idea was to use a multifunctional workstation to allow methods to be adapted as screening research projects and priorities changed. Each workstation pictured has liquid handling (single-tip or eight-tip tools) and plate reading capability (optical density tool), in addition to being compatible with many vendor-supported capabilities such as PCR blocks, high-density replicating tools, and bulk dispensing. Workstations such as these were the early precursors to the larger, more capable workstations that are available today. This system was built around a nonrandom access, plate-stacking system that was actually very fast. Plates were often moved around in stacks to speed delivery and reduce transport times. A feature of this system is the ability to use each workstation as an easily accessible stand-alone unit. Over the years, various devices and upgrades have been added to the system to extend its functionality.
4.2.2 Newer System Configurations
An example of the newer type of system utilizing some of the same principles is illustrated in Fig. 2. It is a commercial system where all the components have been tested and integrated off-site by the vendor. The software that runs the system takes advantage of scheduling to ease programming and assay optimization. This particular system is designed to function much as the older system in Fig. 1. Each large, multifunctional workstation is situated to be human-accessible; the system can be run with both workstations being utilized, or with only one, leaving the other available for analytical development or other small-scale screening tasks. A design such as this leaves open the option of performing two completely different screens simultaneously. Redundant components also minimize downtime in the event of the failure of a major component. The major advantage of this type of system is that a motivated user with a minimal background in programming can learn to program it more easily.
4.3 Automated Screening for Enzyme Function
A very common method that has been available since the mid-1990s is the use of a colony-picking robot to select active clones for a liquid-based screen in microtiter plates. Fig. 3 illustrates a typical flow of a microtiter plate-based screen that begins with the utilization of a colony picker. This could be compatible with nearly any density format or microtiter plate, depending on the scale of the plates and robotics. Typically, if a solid phase screen was available to identify clones expressing active enzymes, that screen would be employed in conjunction with a colony-picking robot to only pick those active clones to the liquid phase. If no activity-based solid-phase screen was
Figure 3 This is a simplified view of the steps in a complete automated screening method for a directed evolution program to improve enzyme function. A library is transformed into a host; the cells of the microorganism are plated onto agar plates containing medium, or medium and a substrate, that can be detected by a CCD camera on a colony-picking robot. Colonies are picked to liquid medium and grown up to produce broth samples containing the enzyme being screened; an automated treatment and/or analytical method is then used to screen the library clones.
available, in order to have an efficient method, all clones would be picked into all wells in the microtiter plates. In the past, prior to the invention and commercialization of colony-picking devices, a common way to inoculate microtiter plates was by dilution methods utilizing the characteristics of Poisson distribution statistics. In Fig. 4, the graph shows the number of wells with a single colony as one line (diamonds), the number of wells with more than one colony as another line (triangles), and the sum of the two as the final line (squares). For instance, an average of 1 colony per well means that 36 wells contain single colonies, 24 wells contain multiple colonies (on average, 2.5 colonies/well), and the remaining 36 wells are empty. These same statistics apply to the higher density plates that are now being produced for screening. An example of this is the GigaMatrix™ (13). This plate and other approaches like it use these statistics for inoculation of the plates with clones. These are dip and test plates, currently useful for selection and, to a degree, for screening. However, by the
Figure 4 Poisson statistics graphed to illustrate the typical distribution of microbial colonies in a 96-well microtiter plate. The theoretical inoculation rate represents how many colonies are added to a given volume of medium to achieve up to 192 total colonies in the total volume of medium to be distributed into the microtiter plate in this example. The graph depicts the predicted distribution of colonies within individual wells of the 96-well microtiter plate. Multiple colonies per well represent two or more, and in this example, they represent a range of two to six colonies per well.
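The kind of distribution graphed in Fig. 4 can be approximated with a few lines of code. This is a minimal sketch that assumes colonies land in wells independently, so that the number per well follows a Poisson distribution; the function names are hypothetical. For a mean of 1 colony per well in a 96-well plate it gives roughly 35 single-colony wells, 25 multiple-colony wells, and 35 empty wells.

```python
import math


def poisson_well_counts(mean_per_well, wells=96):
    """Expected number of empty, single-colony, and multi-colony wells."""
    p_empty = math.exp(-mean_per_well)            # P(0 colonies)
    p_single = mean_per_well * p_empty            # P(exactly 1 colony)
    p_multiple = 1.0 - p_empty - p_single         # P(2 or more colonies)
    return {
        "empty": wells * p_empty,
        "single": wells * p_single,
        "multiple": wells * p_multiple,
    }


def mean_colonies_in_multiple_wells(mean_per_well):
    """Average colony count among wells that hold two or more colonies."""
    p_empty = math.exp(-mean_per_well)
    p_single = mean_per_well * p_empty
    p_multiple = 1.0 - p_empty - p_single
    # Total mean minus the share contributed by single-colony wells,
    # conditioned on the well holding at least two colonies.
    return (mean_per_well - p_single) / p_multiple


def single_fraction_of_occupied(mean_per_well):
    """Fraction of non-empty wells that contain exactly one colony."""
    p_empty = math.exp(-mean_per_well)
    p_single = mean_per_well * p_empty
    return p_single / (1.0 - p_empty)
```

Under the same assumption, lowering the inoculation to a mean of 0.1 colonies per well leaves about 9% of the wells with a single colony, and about 95% of the occupied wells are then unique isolates.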
current methods of filling and inoculation, if the goal of this design is to screen unique isolates, the plate must be inoculated such that only 1/10th of the total number of wells are utilized. This is still impressive; however, a workable solution is needed to make better use of all the wells and realize the plate’s full potential as a screening tool.
4.3.1 Defining Noise in a Screening Method
Understanding what is going on in the screening process is important in making the screen work optimally. The advantage (and curse) of automating the process is that it is now easier to track noise quantitatively in the screening system. A few examples of this are given. In Fig. 5, an enzyme assay is performed in 96-well v-bottomed polycarbonate plates in custom-designed heating blocks. In these graphs, the same concentration of enzyme is tested in all wells to map noise in the heating device. The temperature at which the assays are performed and the % CV data are given in the figure legend. To simplify visualization of the data, the absorbance reading obtained in the assay is normalized to the mean value of the assay, then the graph is positioned by rotating it (MS Excel®) such that 0% CV would yield a line that disappears when all data points lie at 100% on the z-axis. The x- and y-axes correspond to microtiter plate row and column locations, respectively. By setting up a template to view assay data in this manner, variations above and below the mean can easily be visualized. To further enhance the view, the graph can be rotated in MS Excel® to view the column or row data and see where the noise resides. This is a nice way to visualize % CV for an analytical method. It can quickly show trouble spots and trends that are not apparent when simply looking at the results reported as a % CV. Thermal inactivation using the custom-designed heating blocks is illustrated in Fig. 6. In contrast to Fig. 5, the data obtained from a heating block for heat killing of enzyme in polycarbonate plates (in a buffer without substrate) behave quite differently from activity assays (enzyme plus substrate) in the block. Heating blocks should be validated for the type of measurements being generated using the block. Enzyme without substrate is much more susceptible to noise issues at temperatures where the enzyme begins to denature. In Fig. 6, graphs A, B, C, and D are treatments at 0, 5, 10, and 15 min at a fixed temperature, respectively. As temperature or time is increased, the noise associated with this type of measurement generally increases, more so than in the case of the activity assay in Fig. 5. These figures illustrate why it is important to define and improve conditions that affect analytical methods, particularly those related to temperature effects (44).
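The normalization described above (each well expressed as a percent of the plate mean, together with the plate % CV) can also be computed outside a spreadsheet. The sketch below is one possible implementation, not the authors' template: it assumes the plate is supplied as 8 rows of 12 absorbance values, uses the sample standard deviation for % CV, and adds row and column averages as a simple way to locate where the noise resides. All function names are hypothetical.

```python
import statistics

ROW_LABELS = "ABCDEFGH"


def percent_of_mean(plate):
    """Express every well as a percent of the plate mean (100 = no deviation)."""
    values = [value for row in plate for value in row]
    mean = statistics.fmean(values)
    return [[100.0 * value / mean for value in row] for row in plate]


def percent_cv(plate):
    """Coefficient of variation of the whole plate, in percent (sample SD assumed)."""
    values = [value for row in plate for value in row]
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def row_and_column_means(normalized):
    """Average the normalized values by row (A-H) and by column (1-12)."""
    row_means = {
        ROW_LABELS[i]: statistics.fmean(row) for i, row in enumerate(normalized)
    }
    column_means = {
        j + 1: statistics.fmean(column) for j, column in enumerate(zip(*normalized))
    }
    return row_means, column_means
```

A column or row whose average sits well away from 100% points to a positional effect, such as an edge of the heating block, that a single % CV figure would hide.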
Figure 5 This is a graphical illustration of an activity assay of a hydrolytic enzyme performed in v-bottomed 96-well polycarbonate plates placed in a heating block. Heating block temperatures are A = ambient (20°C), B = 45°C, C = 60°C, and D = 75°C. The same amounts of enzyme and substrate are added to all of the wells; after treatment, the samples are transferred to a flat-bottomed 96-well plate for reading absorbance values. The mean assay result is calculated and individual results are divided by the mean and expressed as a percent of the mean (z-axis). The x-axis = rows (A–H), y-axis = columns 1–12 of the 96-well plate. Variation within the plate can be visualized by the percent above or below the mean; no visible line (a line exactly at 100%) would indicate no variation. In this example, variation is expressed as % CV, and for graphs A, B, C, and D, it is 0.9%, 2.4%, 2.8%, and 5.3%, respectively.
4.3.2 Optimization of a Screening Process
An early-stage screen result is illustrated in Fig. 7 as an example of results that can be obtained when a screening method is scaled up from bench scale to high throughput scale. This is an example of a directed evolution library screening where a treatment is applied and the effect is measured as % residual activity after heat challenge. Approximately 700 control data points were compared with 5500 data points from a mutant library under the conditions initially chosen for this screen. These data
Figure 6 This is a graphical illustration of the thermal treatment of a hydrolytic enzyme in a buffer followed by an activity assay at ambient temperature. The same volume and concentration of enzyme is added to each well of v-bottomed 96-well polycarbonate plates, which are then placed in heating blocks. Graph A = control (20°C); graphs B, C, and D are 5, 10, and 15 min, respectively, at 70°C. After treatment, the same sample volume is removed from each well and an ambient temperature activity assay is performed in a new, flat-bottomed 96-well plate. The mean assay result is calculated and individual results are divided by the mean and expressed as a percent of the mean (z-axis). The x-axis = rows (A–H), y-axis = columns 1–12 of the 96-well plate. Variation within the plate can be visualized by the percent above or below the mean; no visible line (a line exactly at 100%) would indicate no variation. In this example, variation is expressed as % CV, and for graphs A, B, C, and D, it is 1.3%, 8.5%, 8.8%, and 9.7%, respectively.
Figure 7 This is an example of an early-stage screen development of a hydrolase enzyme after adapting the method from a manual method to an automated method. It is an example that illustrates the variation of the screen (which could include both the analytical and mechanical aspects of automating a screen). The mutants are a unique set of mutants; the controls are all the same hydrolase. As was discovered in follow-up analysis of mutants selected from this screen, most of the hits were false positives or noise associated with analytical and mechanical aspects of the screen that required further attention (data not shown).
show that it was virtually impossible under the initial conditions chosen for the screen to clearly identify mutants that were improved compared to the controls. In the example depicted in Fig. 8, a hydrolase is being screened after conditions have been further optimized. Low and high % residual activity controls have been added to validate the method. In this screening method, improved variants are much easier to spot than in the previous example. A very useful statistic can be applied to a screen to determine the quality of the data. Table 1 summarizes the data graphed in Fig. 8 that were used to calculate the Z-statistic (45,46) for this particular screen, referred to as Z′ in this instance (46), where:
$$Z' = 1 - \frac{3\sigma_{\text{Control 1}} + 3\sigma_{\text{Control 2}}}{\left|\mu_{\text{Control 1}} - \mu_{\text{Control 2}}\right|}$$
In general, assays with Z′ values >0.5 are excellent, with large separation bands. This is illustrated in Fig. 9, a graphical representation of the data in Table 1, where the wild-type control and the high control distribution peaks are well separated from each other. This illustrates the value of applying Z-factors in evaluating screening method performance and shows the separation bands more clearly than the same data graphed as a scatter plot in Fig. 8. The wild-type and low controls are too close to each
Figure 8 This figure illustrates the effects of improvements made in the screening method for the hydrolase illustrated in Fig. 7. By adjusting mechanical and analytical aspects of the screen, a much improved, more reliable screening process was achieved. The wild-type control is now more distinct from the mutants being screened. These data are from directed evolution libraries of variants that were improvements on wild type. A high-control variant and a low-control variant of wild type, known to perform better and worse, respectively, than wild type with this particular treatment, were available for this comparison.
Table 1 Applying the Z-Statistic to Screen Development for an Improved Hydrolase
Description   Median   3SD     n      Z-factor relationship   Z-factor
WT            6.36     3.99    36     WT–high                 0.701
Low           0.70     3.30    65     WT–low                  0.181
High          74.09    18.90   89     Low–high                0.697
Mutants       11.49    28.23   7664   NA                      NA
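The Z-statistic used in this section is one minus the summed three-standard-deviation bands of two control populations, divided by the absolute difference between their means. The sketch below is a minimal implementation with hypothetical function names: one function works from raw control readings (using the sample standard deviation, which is an assumption), and the other from summary values. As a rough check, entering the low and high control rows of Table 1 as summary values, with the tabulated medians standing in for the means (also an assumption), gives about 0.70, in line with the tabulated low–high value.

```python
import statistics


def z_prime_from_summary(mean_1, sd_1, mean_2, sd_2):
    """Z' = 1 - (3*sd_1 + 3*sd_2) / |mean_1 - mean_2| for two control populations."""
    separation = abs(mean_1 - mean_2)
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - (3.0 * sd_1 + 3.0 * sd_2) / separation


def z_prime(control_1, control_2):
    """Compute Z' from two lists of raw control readings (sample SD assumed)."""
    return z_prime_from_summary(
        statistics.fmean(control_1),
        statistics.stdev(control_1),
        statistics.fmean(control_2),
        statistics.stdev(control_2),
    )


def describe(z):
    """Rough reading of the statistic, following the thresholds given in the text."""
    if z > 0.5:
        return "excellent assay, large separation band"
    if z > 0:
        return "small separation band, approaching a yes/no assay"
    return "control distributions overlap; screening is not meaningful"
```

Because the function only needs two sets of control readings, it can be applied equally to a single step of the process, such as the heating block treatment alone, to see which step contributes most to a poor overall value.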
Figure 9 This figure is the data for the hydrolase from Fig. 8 graphed as a distribution. It can help to visualize the Z-statistic described in the text and presented in Table 1. WT control is the wild-type control enzyme, low control and high control are the same as described in Fig. 8 legend. Mutants are all the variants screened in this particular screening experiment.
other to clearly distinguish improved variants. As the statistic goes toward 0, it is an indication that the quality of the assay is diminishing toward a yes/no type of assay, and values below 0 indicate that it is virtually impossible to screen in any meaningful way. Evaluating a method with this statistic can aid in adjusting the screening parameters to improve the separation bands of control populations. In the above examples, the Z-statistic calculation takes into consideration a mixture of analytical methods, any one of which could be evaluated by itself for Z′ factors. These include the enzyme assay itself or the heating block treatment effect. Nearly any aspect of the parameters used in a screen could be analyzed for Z′ factors to measure the quality of the process or devices used and to help pinpoint particularly troublesome aspects of the screening process. There are many interactions of biology, chemistry, and machinery going on during a screen. Nothing should be taken for granted, and careful
dissection of a screen is often necessary to define the parameters that affect the screen. Sometimes it is necessary to run the screen with the noise issues ‘‘intact’’ for a while to look at reproducibility. Often, in an automated system, this is more a job for the screeners than for the programmers or automation experts. Seemingly simple factors, such as the locations of air-conditioning ducts in the laboratory, reagents sticking to tubing or reservoirs, or heating block and incubator effects, show up when the number of test samples is increased. It can take patience and repeated testing to clearly identify problems in the process. Awareness of these issues when optimizing the process can pay big dividends in the quality of the data generated (47,48). Interpretation of screening data tends to be biased by previous experience and expectations. It is not always clear what statistical parameter to apply, and due to issues of undiscovered noise, the screening landscape can be a moving landscape. Within industrial enzyme screening, many programs do not have the longevity to allow the comfort of clear identification of all the parameters important to improving the quality of the data. Generic techniques, to a certain extent, can be optimized. In the changing landscape of expression system-specific issues, enzyme-specific issues, and even device-specific issues, optimization can be elusive.
4.3.3 Variants Selected From a Directed Evolution Screening Process
In Fig. 10, hydrolase variants isolated from the primary screening process depicted in Fig. 8 were followed up in a variety of screens with slightly different parameters to evaluate the differences in the selected variants. In this way, a fingerprint of the variant can be obtained, and this can also be used to show that two similar variants selected in the primary screen are most likely the result of different gene products. It can also help in evaluating which of a group would be better to test in an application if the follow-up test is more stringent than the original screen. In this figure, it is noteworthy to compare the movement of the Library A and Library B variants relative to the high control. Two isolates from Library B are clearly better than the rest (Graph B), while Library A variants retain function in the low-temperature destabilizing condition (Graph D). The final example, depicted in Fig. 11, is of a directed evolution screen that correlated with a baking application that would be difficult to automate. One of the mutants (from Library 9, the gray triangle in the upper left) was the best performer in the application. A large number of the variants depicted on this graph were tested in the application to clearly identify the improved variant. The application, the scale of the test required, and the market for the
Figure 10 This figure illustrates the fingerprinting of improved variants selected in the type of hydrolase screen graphed in Fig. 8. Graphs A, B, C, and D are combinations of temperature and treatment cocktails used to help identify and further differentiate selected variants of Libraries A and B from the controls, in particular the high control. Graph A shows the data where no treatment is applied other than temperature and buffer. Graph B is a harsh treatment where the wild-type enzyme is known to be unstable. Graph C uses a stabilizer known to protect the enzyme, combined with a challenge at a very high temperature. Graph D uses a treatment known to quickly destabilize the enzyme at a low temperature. The x-axis = relative activity, the y-axis = % residual activity, calculated by a comparison of initial samples and final samples that have been treated by the indicated conditions. The high control retains activity after all treatments; ranking of the variants changes depending upon the treatment. In Graph B, Library B has the most improved variants; in Graph D, Library A variants are the most improved. Key to symbols : x = wild type, n = high control, D = low control, . = Library A, w = Library B.
Figure 11 This figure is an example of a directed evolution screening program performed over a period of approximately a year, where wild type (WT) was a carbohydrase subjected to poison PCR mutagenesis. Successive rounds of shuffling of improved variants created these libraries. These were tested for improved thermal stability in a buffered screening system in 96-well plates. All of the improved mutants were tested against each other at this time in the follow-up screen. In this test, the cluster of mutants in the upper left corner provided a mutant with greatly improved performance in a baking process.
product often set the limits of the number of variants that can be tested from a screening program for industrial enzymes. The difficulty of scaling down and automating follow-up screens that correlate with applications is a severe limitation for improving industrial enzymes.
4.4 Automated Screening for Increased Enzyme Yield
4.4.1 Growth of Microorganisms in Microtiter Plates
Factors such as type of organism, media composition and strength, volume, shaking, temperature, and moisture all influence the amount of enzyme measured in a well. Also, the inoculation method and the chemistry of the assay will have a great influence on the amount of retesting that will be required. When the goal is the identification of mutants that produce higher enzyme titers, it is important to design the microtiter screening setup in a way that mimics the conditions that the strain will experience in a production environment. A mutant isolated from a microtiter plate screen should retain the increased performance when grown in production fermentors. Production processes are virtually impossible to replicate in microtiter plates. The biomass concentration in the production tank is much higher than what can be achieved in a microtiter plate, and it is not easy to add nutrients or to control pH in the wells during growth. In addition, the key parameters that are usually monitored in a production fermentor, such as pH, temperature, and oxygen tension, are virtually impossible to monitor in a microtiter plate. Methods for measuring many of the parameters can be too intrusive at this scale, and it usually makes sense to design the screen such that the composition of the fermentation medium keeps the key parameters stable. For instance, if the growth medium is sufficiently dilute, the growth rate and biomass yield of the microorganism, and thereby the oxygen consumption rate, will be low enough to avoid oxygen limitation of the culture. Additionally, by buffering the media, the pH can be maintained within an acceptable range. It is necessary to evaluate whether the chosen conditions correlate with the results that are obtained when growing the organisms in fermentation tanks. The total yield of enzyme per mass unit of carbohydrate is often a good figure to use in this correlation if the cultures are limited by available carbohydrate. Another good figure to correlate is the productivity, i.e., the production of enzyme per unit time.
Here, the success criterion would be that production of enzyme over time in the two systems has similar kinetics. Additionally, the correlation between the systems can be evaluated by growing strains that produce different amounts of enzyme. The ranking, and preferably also the relative difference, between the strains should be maintained at all scales. Screening of mutagenized cells or spores is further complicated by the fact that the mutagenized cells require a varying amount of time before they start to grow. This is probably dependent on the harshness of the mutagenesis and the nature of the genes being mutated in each cell. The problem arises when the cultures are assayed at a fixed time after inoculation. The difference in the timing of the initiation of growth between the cultures leads to a corresponding difference in the effective growth time at the moment the cultures are assayed. Therefore, in theory, a high-producing, slow-starting mutant could be lost because it did not have sufficient time to show its potential, whereas a low-producing, fast-starting mutant would be identified as an improved candidate.
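The effect of a fixed assay time on mutants with different lag times can be illustrated with a toy calculation. The sketch below is purely illustrative: the saturating exponential titer curve, the parameter values, and the strain names are all assumptions made for this example and do not come from measured data. It shows how the ranking of a slow-starting high producer and a fast-starting low producer can reverse depending on when the plate is assayed.

```python
import math


def titer(hours, lag_hours, max_titer, rate_per_hour):
    """Hypothetical enzyme titer: zero during the lag, then a saturating rise."""
    growing_time = hours - lag_hours
    if growing_time <= 0:
        return 0.0
    return max_titer * (1.0 - math.exp(-rate_per_hour * growing_time))


def rank_at(hours, strains):
    """Return strain names ordered from highest to lowest titer at the assay time."""
    return sorted(
        strains,
        key=lambda name: titer(hours, *strains[name]),
        reverse=True,
    )


# Assumed parameters: (lag in hours, maximum titer in arbitrary units, rate per hour).
STRAINS = {
    "slow high producer": (30.0, 150.0, 0.05),
    "fast low producer": (10.0, 100.0, 0.05),
}

if __name__ == "__main__":
    for assay_time in (40.0, 120.0):
        print(assay_time, rank_at(assay_time, STRAINS))
```

With these assumed values the fast-starting strain ranks first at 40 h, while the slow-starting strain ranks first at 120 h.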
One way to overcome some of this asynchrony between the wells is to extend the growth period so that all the cultures eventually reach the maximum expression level allowed by the media. However, the produced enzymes are not always very stable in the outgrown cultures, so it is possible that the measured enzyme activity will decline during prolonged incubation. Another way to overcome the asynchrony is to pre-grow the mutagenized cells before they are grown in the screening medium. With unicellular organisms, this could be solved by outgrowing the mutants individually in a duplicate set of microtiter plates, then inoculating the screening culture from these plates (e.g., a mother–daughter plate approach); however, this is at the expense of an additional step. Additionally, the entire population could be outgrown to get healthy cells (a fairly standard and sometimes necessary practice), with the cost being that many copies of each mutant are present when the screen is performed. With multicellular organisms such as fungi, this method is generally not performed, but it is feasible to similarly outgrow and harvest spores from plates after mutagenesis, again at the expense of an additional step, uneven dilution of the mutant population, and the need to screen more of the library. Any of these ‘‘outgrowth’’ methods can have unknown enrichment effects on the pool of mutants.
4.4.2 Expression of Enzymes
The enzyme expression level of a microorganism depends not only on the genetic makeup of the strain, but also on the regulation of the expression of the enzyme in question. When a given enzyme is to be produced, it can be made from the organism where it is initially identified, or the gene encoding the enzyme can be moved by gene technology techniques to another host that may be more suitable for the production environment or where the expression of the gene is higher. The industrially interesting enzymes are usually hydrolytic enzymes such as proteases, lipases, and carbohydrases. All these enzymes are inducible enzymes, meaning that they are only produced in significant amount when certain inducing nutrients are present. For instance, proteases are induced by the presence of polypeptides, and amylases are induced by the presence of starch or maltose. Additionally, many of these genes are also repressed when other nutrients are available. Amylase genes are usually repressed when glucose is present in significant concentrations, independent of the presence of the inducing agent (maltose). Furthermore, the details about these often quite complex regulation mechanisms are not always fully understood. The complexity of the regulatory mechanisms makes the design of screening medium more complicated. Whereas it is simple to determine the composition of the medium when the growth is initiated, it can be virtually
Screen Automation and Robotics
impossible to control how the medium composition changes during the fermentation. For instance, as starch is a potent inducer of amylases, one could design a screening medium in which the main carbon source is soluble starch. As the strain grows, it produces amylases that degrade the starch, making it accessible to the cells as a carbon source. However, since the industrially relevant strains produce very high amounts of amylase, the amylase concentration would soon be so high that all the starch would be degraded very quickly. This would release a large amount of glucose, leading to a high glucose concentration in the medium. The high glucose concentration would repress the expression of the amylase genes, effectively shutting down the synthesis of amylase. As the cells grow, they consume the glucose, so the glucose concentration would gradually fall, allowing amylase synthesis to resume. This multiphase behavior makes the expression pattern quite unpredictable and makes it difficult to determine when the cultures should be assayed for amylase activity. In the example above, the starch was soluble, so the growth medium was homogeneous. However, many inducing nutrients are insoluble, such as soy flour, potato protein, or cellulose. This adds an extra level of complexity to the medium, as the accessibility of these complex nutrients to the cells may vary during the growth phase. It is therefore always an advantage to design the screening medium using soluble, simple components if possible. In the amylase example above, it would be possible to use maltose as the carbon source in the medium instead of starch and still get the required induction of the amylase. As an added bonus, few amylases can degrade maltose into glucose, so the amylase production would not lead to a boost in glucose concentration, and the multiphase expression pattern would be avoided.
Interestingly, the enzyme expression issues discussed for yield improvement screening are virtually ignored when screening enzymes for improved function, such as in directed evolution programs. As a result, activity measurements reported in the literature may in many cases be misleading indicators of functional improvement. In general, activity measured in a functional enzyme screen should not be interpreted as specific activity: the measured activity is the specific activity multiplied by the enzyme concentration. Unless a method to determine the concentration of the specific enzyme protein is available, only rough estimates of specific activity can be made. This needs to be understood when designing screens that evaluate libraries of enzymes for functional characteristics.
4.4.3
Sampling and Assay
For yield improvement screens, a key requirement of sampling is that the sample is representative of the enzyme concentration in the well. Although
this looks trivial at first glance, it can be seriously complicated by several factors, such as the homogeneity of the culture, the nature and concentration of the biomass, and the evaporation in microtiter plates. A culture of a microorganism is seldom homogeneous. The medium and the biomass will always comprise two different phases no matter how uniformly the biomass is distributed within the well. Cultures of unicellular bacteria can often be regarded as homogeneous for all practical purposes, but even unicellular eukaryotes like yeast are a little more complicated, as the cells tend to sediment in the wells. Filamentous organisms such as actinomycetes and fungi are not dispersed in the wells but are often present as one or more pellets. This makes the sampling quite complicated, as the pellets tend to clog pipette tips. Even if the pellets are easily transferable, they will occupy a varying volume of the transferred sample and thus introduce an uncertainty in the volume of the sample. Furthermore, many enzymes tend to stick to the biomass, making the concentration of the enzyme very much dependent on the presence of biomass in the sample. Adherence of enzyme to the biomass should be avoided. Adding surface-active components such as detergents or emulsifiers to the growth medium can reduce adherence. Evaporation from microtiter plates is generally not uniform from well to well but can be greatly reduced by incubating the plates in a sealed, moist box during growth. The wells located at the edges of the plate have a much higher evaporation rate than the wells in the center, which concentrates the samples located at the edges. Performing enzyme assays directly in the growth plates can reduce this problem. In practice, this can be accomplished simply by adding buffers and reagents directly to the growth plate after incubation.
This method requires that the assay can function under these conditions, and that putatively improved strains isolated from the screening wells survive the assay. The assays that are used are usually scaled-down versions of standard enzyme assays. If the assays require addition of insoluble compounds or a filtration or centrifugation step, these unit operations are more difficult to automate, and alternative approaches or solutions may need to be devised. The enzyme assays can be divided into two categories: kinetic assays, in which the enzyme reaction is monitored over time, and endpoint assays, which employ one measurement made at the end of the assay. Kinetic assays are superior to endpoint assays in terms of robustness and precision; because the enzyme concentration in a kinetic assay is determined from differences in absorbance, the assay is self-blanking. If a kinetic assay is too slow, then it is feasible to use a multiple read method where batches of plates are processed with repeated visits to the plate reader over a period of time. Endpoint assays work well if the blank background is low or known
(and relatively reproducible), or if a large enough sample dilution is done such that the blank is negligible. In general, assays done at ambient temperature are most suitable and most easily controlled in a yield improvement screen setting.
4.4.4
Enzyme Yield Screen Throughput
Table 2 illustrates the results of a screening program to achieve an improvement in yield for an industrially relevant enzyme and host. Generally, when a screening program is started, the industrial host already produces enzymes at a high level, but further improvement is required to make the process economical. The data in the table are from a program undertaken before substantial improvements in assay and automation technology were implemented. Nevertheless, they illustrate the types of improvements that can be obtained with the numbers screened. In this program, a fivefold improvement in the yield of an industrially relevant enzyme was achieved. With the technology and improvements discussed above, it is generally possible to screen hundreds of thousands of mutants in a month if needed. The limitations are then the rounds of mutagenesis and the follow-up screening required, including fermentation time, to clearly identify the best candidate for another round of mutagenesis and screening.
4.4.5
Secondary Screening
In yield improvement screening projects, a considerable amount of time is spent on designing the screening setup and in retesting the candidates isolated from the primary screen. There is generally low correlation between the primary screen and the production process due to many of the factors discussed in the preceding sections. It is important to test the isolated candidates in systems of increasing scale, to eliminate the false positive hits at each scale, and to reduce the number of isolates that have to be tested at each scale. The isolated candidates can be retested in microtiter plates (384, 96, and 24 wells), with several parallel cultures of each candidate. After retesting in microtiter plates, the candidates can be tested in ordinary shake flasks or in fed-batch shake flasks where nutrients are added continuously during the fermentation. Finally, the candidates can be tested in laboratory-scale fermentors and ultimately in pilot plant scale. Laboratory fermentor scale systems correlate better with the production processes than the microtiter-based systems. This is largely because the physical conditions and other parameters can be strictly controlled in fermentation vessels. This is the final screen of the enzyme yield improvement screening process. Ultimately, it is also the final screen of any enzyme that goes through an improvement program for enzyme function.

Table 2 Results of a Screening Program for Increased Yield of an Industrial Enzyme

Parent    Mutagen  96-well  24-well  Shake flasks  24-well  Shake flasks  Fermentors
Initial   EMS      37,900   255      20            6        2             2
Round 1   UV       4,700    93       12            6        5             5
Round 2   NTG      26,800   490      41            16       16            5
Round 3   NTG      9,800    300      20            12       12            3
Round 4   NTG      14,900   368      44            22       22            1
Totals             94,100   1,506    137           62       57            16
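The attrition through the stages of increasing scale can be read directly from Table 2. The short sketch below recomputes the column totals from the per-round rows and then the fraction of isolates carried from each stage to the next; the numbers are those of the table, while the variable names and the interpretation of each column as one stage of a funnel are illustrative assumptions.

```python
# Rows of Table 2 in column order:
# 96-well, 24-well, shake flasks, 24-well, shake flasks, fermentors.
rounds = {
    "Initial (EMS)": [37_900, 255, 20, 6, 2, 2],
    "Round 1 (UV)": [4_700, 93, 12, 6, 5, 5],
    "Round 2 (NTG)": [26_800, 490, 41, 16, 16, 5],
    "Round 3 (NTG)": [9_800, 300, 20, 12, 12, 3],
    "Round 4 (NTG)": [14_900, 368, 44, 22, 22, 1],
}

# Column totals recomputed from the rows (compare with the "Totals" row).
totals = [sum(column) for column in zip(*rounds.values())]

# Fraction of isolates carried from each stage to the next one.
carried = [after / before for before, after in zip(totals, totals[1:])]

# Overall fraction of primary-screen isolates that reached fermentors.
overall = totals[-1] / totals[0]

print("totals:", totals)
for fraction in carried:
    print(f"carried forward: {fraction:.2%}")
print(f"overall: {overall:.4%}")
```

On these totals, about 1.6% of the cultures screened in 96-well plates were carried into 24-well plates, and roughly 0.017% of the isolates screened were eventually tested in fermentors.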
5
THE FUTURE
There are many new approaches being applied to screening and selection. Among these are the methods involving solid phase screening with digital imaging, single molecule detection assay technologies, phage display, protein display on microorganisms, fluorescence-activated cell sorting (FACS), nanoscale growth and selection in liquid phase, and man-made cell-like compartments (5–12). These new approaches can make the current automation approaches more necessary and efficient, rather than obsolete, as more and more candidates are generated for applications testing. A new emphasis of these robotic systems for applications-relevant or correlating screens is a likely scenario. As screening of larger and larger libraries increases, more and more hits will be obtained, further complicating the process of selecting the best variants, as has been seen in the drug discovery field (14). Like the explosion of information since the Internet, it will further increase the need for a variety of automation methods at nearly all microtiter plate scales, although some say these newer methods could eventually replace screening in microtiter plates (8). Keeping automation as flexible as possible will allow adapting existing systems to some of these new technologies (15,16,18,19). The scientists that are involved in screening will find it easier to use automation because there is now more access to off-the-shelf automation available at many different levels of complexity, with more variations constantly in development. Those who previously thought they could not use automation will find it easier to use, so more scientists will become involved. The timing is right; there is more screening-related work to be done at a faster pace due to the explosion of results from these new primary screening
methods mentioned above. Automation of screening will remain a multidisciplinary activity, however, largely because of the complexity of the different sciences that must come together to make up a screening program. It will become more important for all disciplines to share this function to make it as successful as it can be.
REFERENCES
1. M Divers. Point: screen development as a shared function. J Biomol Screen 3:263–266, 1998.
2. GE Nedwin. Green chemistry: using enzymes as benign substitutes for synthetic chemicals and harsh conditions in industrial processes. In: G Salyer, ed. Biotechnology in the Sustainable Environment. New York: Plenum Press, 1997, pp 13–32.
3. M Lamsa, P Bloebaum. Mutation and screening to increase chymosin yield in a genetically-engineered strain of Aspergillus awamori. J Ind Microbiol 5:229–238, 1990.
4. A Demain. Genetics and microbiology of industrial microorganisms. J Ind Microbiol Biotech 27:352–356, 2001.
5. M Olsen, B Iverson, G Georgiou. High-throughput screening of enzyme libraries. Curr Opin Biotechnol 11:331–337, 2000.
6. JM Joern, T Sakamoto, A Arisawa, FH Arnold. A versatile high throughput screen for dioxygenase activity using solid-phase digital imaging. J Biomol Screen 6:219–223, 2001.
7. S Delagrave, DJ Murphy, JL Rittenhouse Pruss, AM Maffia, BL Marrs, EJ Bylina, WJ Coleman, CL Grek, MR Dilworth, MM Yang, DC Youvan. Application of a very high throughput digital imaging screen to evolve the enzyme galactose oxidase. Protein Eng 14:261–267, 2001.
8. KJ Moore, S Turconi, S Ashman, M Ruediger, U Haupts, V Emerick, AJ Pope. Single molecule detection technologies in miniaturized high throughput screening: fluorescence correlation spectroscopy. J Biomol Screen 4:335–353, 1999.
9. H Joo, A Arisawa, Z Lin, FH Arnold. A high-throughput digital imaging screen for the discovery and directed evolution of oxygenases. Chem Biol 6:699–706, 1999.
10. DC Youvan, E Goldman, S Delagrave, MM Yang. Digital imaging spectroscopy for massively parallel screening of mutants. Methods in Enzymology. New York: Academic Press, 1995, pp 232–248.
11. DS Tawfik, AD Griffiths. Man-made cell-like compartments for molecular evolution. Nat Biotechnol 16:652–656, 1998.
12. AD Griffiths, DS Tawfik. Man-made enzymes: from design to in vitro compartmentalization. Curr Opin Biotechnol 11:338–353, 2000.
13. WM Lafferty. GigaMatrix™: 100,000-well Screening Platform. Podium Presentation: LabAutomation 2002, Palm Springs, 2002, p. 66.
13a. JM Perkel. Going Super-Duper Throughput. The Scientist 15(17):24, 2001.
14. DJ Ausman. Screening’s age of insecurity. Mod Drug Discov, 32–39, 2002.
15. G Karet. Transforming HTS. Drug Discov 5:20–26, 2002.
16. J Babiak. Transforming your robotics into an infrastructure of the future. J Biomol Screen 2:139–143, 1997.
17. M Banks, A Binnie, S Fogarty. Point: high throughput screening using fully integrated robotic screening. J Biomol Screen 2:133–135, 1997.
18. M Beggs, H Blok, A Diels. The high throughput screening infrastructure: the right tools for the task. J Biomol Screen 4:143–149, 1999.
19. J Major. Challenges and opportunities in high throughput screening: implications for new technologies. J Biomol Screen 3:13–17, 1998.
20. JR Cherry, MH Lamsa, P Schneider, J Vind, A Svendsen, A Jones, A Pedersen. Directed evolution of a fungal peroxidase. Nat Biotechnol 17:379–384, 1999.
21. JJ Burbaum. Point: the evolution of miniaturized well plates. J Biomol Screen 5:5–8, 2000.
22. KR Oldenburg. Point: automation basics: robotics vs workstations. J Biomol Screen 4:53–56, 1999.
23. G Karet. Options flood the liquid handler market. Drug Discov 5:29–32, 2002.
24. TA Bateman, RA Ayers, RB Greenway. An engineering evaluation of four fluid transfer devices for automated 384-well high throughput screening. Lab Robot Autom 11:250–259, 1999.
25. JW Armstrong, RA Gerren, SD Hamilton. A review of automation options to support plate preparation, cherry picking, and homogeneous assays. J Biomol Screen 3:271–275, 1998.
26. MA Sills. Counterpoint: integrated robotics vs. task-oriented automation. J Biomol Screen 2:137–138, 1997.
27. B Rasnow, K Kearns, P Grandsard. Open-Sourcing Laboratory Automation Control Software. Poster T068: LabAutomation 2002, Palm Springs, 2002.
28. MF Russo, MM Echols. Automating Science and Engineering Laboratories with Visual Basic. New York: John Wiley and Sons, 1999, pp 1–355.
29. TG Holt, C Dufresne, JM Liesch, GK Mallow. The design and development of an integrated natural products screening database. J Biomol Screen 5:421–433, 2000.
30. H Zhao, FH Arnold. Combinatorial protein design: strategies for screening protein libraries. Curr Opin Struct Biol 7:480–485, 1997.
31. JC Moore, HM Jin, O Kuchner, FH Arnold. Strategies for the in-vitro evolution of protein function: enzyme evolution by random recombination of improved sequences. J Mol Biol 272:336–347, 1997.
32. FH Arnold, JC Moore. Optimizing industrial enzymes by directed evolution. In: T Scheper, ed. Advances in Biochemical Engineering/Biotechnology. Berlin: Springer-Verlag, 1997, pp 1–14.
33. O Kuchner, FH Arnold. Directed evolution of enzyme catalysts. Tibtech 15:523–530, 1997.
34. KE Jaeger, T Eggert, A Eipper, MT Reetz. Directed evolution and the creation of enantioselective biocatalysts. Appl Microbiol Biotechnol 55:519–530, 2001.
35. FH Arnold, AA Volkov. Directed evolution of biocatalysts. Curr Opin Chem Biol 3:54–59, 1999.
36. H Zhao, FH Arnold. Functional and nonfunctional mutations distinguished by random recombination of homologous genes. Proc Natl Acad Sci 94:7997–8000, 1997.
37. A Zaks. Industrial biocatalysis. Curr Opin Chem Biol 5:130–136, 2001.
38. RR Chirumamilla, R Muralidhar, R Marchant, P Nigam. Improving the quality of industrially important enzymes by directed evolution. Mol Cell Biochem 224:159–168, 2001.
39. M Sivaraja, J Giordano, MG Peterson. High-throughput screening assay for helicase enzymes. Anal Biochem 265:22–27, 1998.
40. FC Christians, L Scapozza, A Crameri, G Folkers, WPC Stemmer. Directed evolution of thymidine kinase for AZT phosphorylation using DNA family shuffling. Nat Biotechnol 17:259–264, 1999.
41. L Giver, A Gershenson, PO Freskgard, FH Arnold. Directed evolution of a thermostable esterase. Proc Natl Acad Sci 95:12809–12813, 1998.
42. S Turconi, K Shea, S Ashman, K Fantom, DL Earnshaw, RP Bingham, UM Haupts, MJB Brown, AJ Pope. Real experiences of uHTS: a prototypic 1536-well fluorescence anisotropy-based uHTS screen and application of well-level quality control procedures. J Biomol Screen 6:275–290, 2001.
43. T Lanio, A Jeltsch, A Pingoud. Automated purification of His6-tagged proteins allows exhaustive screening of libraries generated by random mutagenesis. Biotech 29:338–342, 2000.
44. S Silberblatt, RA Felder, TE Mifflin. Optimizing reaction conditions of the NanoOrange protein quantitation method for use with microplate-based automation. JALA 6:83–87, 2001.
45. JH Zhang, TDY Chung, KR Oldenburg. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen 4:67–73, 1999.
46. P Lavery, MJB Brown, AJ Pope. Simple absorbance-based assays for ultra-high throughput screening. J Biomol Screen 6:3–9, 2001.
47. PB Taylor, FP Steward, DJ Dunnington, ST Quinn, CK Schulz, KS Vaidya, E Kurali, TR Lane, WC Xiong, TP Sherrill, JS Snider, ND Terpstra, RP Hertzberg. Automated assay optimization with integrated statistics and smart robotics. J Biomol Screen 5:213–225, 2000.
48. K Slinker. The statistics of synergism. J Mol Cell Cardiol 30:723–731, 1998.
26
Screening for Enantioselective Enzymes
Manfred T. Reetz
Max-Planck-Institut für Kohlenforschung, Mülheim an der Ruhr, Germany
1
INTRODUCTION
Enantiomerically pure or enriched organic compounds play a prominent role in pharmaceutical, agricultural, synthetic organic, and natural products chemistry (1). For example, the so-called chiral market of industrial products in 2000 amounted to $100 billion (1d,e). Many of these products can be prepared in the laboratories of organic chemists. Although conventional separation of enantiomers is still the preferred process in industry (1d), catalytic processes are likely to dominate in the future because asymmetric catalysis has the potential of constituting the economically and ecologically most attractive strategy. The two most important options available to organic chemists are synthetic chiral transition metal catalysts (2), on the one hand, and enzymes, on the other (3). Indeed, both areas are growing in importance. In the case of enantioselective biocatalysts, a formidable number of wild-type enzymes and/or whole cells are used in the industrial production of chiral organic compounds (3,4). Examples include carnitine dehydratase-catalyzed hydroxylation of γ-aminobutyric acid with formation of L-carnitine (Lonza), lipase-catalyzed kinetic resolution of chiral amines (BASF), lipase-catalyzed kinetic resolution of 3-(4-methoxyphenyl)glycidic acid ester in the production of diltiazem (DSM/Tanabe/Sepracor), and aminoacylase-catalyzed kinetic resolution of N-acyl-methionine (Degussa) (4), to mention only a few. It is certain that the traditional methods of isolating or harvesting such enzymes or whole cells will continue to be applied. Indeed, because only a very small fraction of all enzymes existing on earth have been identified, it is likely that many more useful ones will be found and applied industrially in the future. Because only a very small fraction of soil microorganisms can be readily cultured by standard techniques (0.1% to 1%), methods have recently been developed which allow access to the biodiversity available in uncultured microorganisms (5). This new approach is based on
Figure 1 Directed evolution of an enantioselective enzyme (Ref. 7c).
novel methods for collecting genes from the environment and expressing them in recombinant organisms. Thus metagenome libraries of environmental DNA are being established in companies, leading to huge numbers of hitherto unknown enzymes. This raises the interesting possibility of testing hundreds of thousands of enzymes in enantioselective transformations of interest to organic chemists. The parameter of interest is the enantiomeric excess (ee) and/or, in the case of kinetic resolution, the so-called selectivity factor, E, which reflects the relative rates of reaction of the (R) and the (S) substrates. Obviously, such a task can only be carried out if appropriate high-throughput screening systems are available. The need to develop high-throughput ee assays also arises for another reason. Recently, the first case in which the methods of directed evolution (6) were applied to the development of an enantioselective enzyme has been reported (7). Accordingly, the appropriate combination of molecular biological methods for random mutagenesis and gene expression coupled with an efficient high-throughput ee-screening system forms the basis of a new area of research (Fig. 1). The gene of the wild-type enzyme, which catalyzes a given reaction of interest A → B, but not with an acceptable degree of enantioselectivity, is first subjected to random mutagenesis [e.g., error-prone polymerase chain reaction (PCR), saturation mutagenesis, DNA shuffling]. Following expression in a suitable bacterial host, the bacterial colonies are plated out on agar plates and harvested by a colony picker (Fig. 2) (7). After being placed in the wells of microtiter plates (e.g., 96-well format) containing nutrient broth, arrays of thousands of spatially addressable catalysts become available. Following high-throughput screening, the most enantioselective enzyme variant is identified.
Then the corresponding mutant gene is subjected once more to mutagenesis, expression, and screening, a process which creates evolutionary pressure (6,7). Because full exploitation of natural diversity (4,5) as well as the evolution-based extension of diversity (7,8) provides huge numbers of new
Figure 2 Individual steps in the directed evolution of an enantioselective enzyme (Ref. 7c).
and potentially enantioselective enzymes (certainly thousands, probably millions), the importance of rapid ee assays cannot be overestimated (Fig. 3). This chapter summarizes the present status of research concerning high-throughput ee assays. As will be seen, some of them not only deliver information regarding the enantiopurity of samples, but are also time-resolved. Consequently, activity is also (crudely) measured. If this is not the case, the assay needs to be applied at given time intervals if information regarding activity is desired. In some cases, it may be necessary to add an internal standard. Alternatively, to exclude nonactive mutants, colony-based on-plate pretests can be applied. Examples include the tributyrin test for lipase activity (9a) and a colorimetric test for epoxide hydrolase activity (10). At this point, the difference between screening and selection, which are sometimes confused, needs to be reemphasized (6d). Screening is the process of identifying by some analytical tool a desired member of a library of enzyme variants, e.g., the most enantioselective variant as a catalyst in a given reaction of interest A → B. If selection is applied in evolutionary experiments, only the desired member(s) of a potential library appears, e.g., as a viable microbial clone. Thus far, no selection system for the directed evolution of enantioselective enzymes has been developed, although an ee screen based on differential cell growth was recently introduced (11). This chapter focuses on screening systems for assaying the enantioselectivity of enzyme-catalyzed reactions. Most of these developments arose from the need to apply directed evolution to the creation of enantioselective enzymes for use in organic chemistry. Others were developed by chemists active in the field of combinatorial transition metal catalysis. In most cases, a given assay can be applied to both types of catalysts (9).
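To make the two quantities used throughout this chapter concrete, the sketch below computes ee from the amounts of the two enantiomers and estimates E for a kinetic resolution from the conversion c and the ee of the unreacted substrate. The chapter itself gives no formula; the relation used here, E = ln[(1 − c)(1 − ee_s)] / ln[(1 − c)(1 + ee_s)], is the standard expression for an irreversible kinetic resolution and is supplied as an assumption, as are the function names and the simulated numbers.

```python
import math


def enantiomeric_excess(major, minor):
    """ee as a fraction (0 to 1) from the amounts of the two enantiomers."""
    return (major - minor) / (major + minor)


def selectivity_factor(conversion, ee_substrate):
    """Estimate E from conversion c and ee of the unreacted substrate.

    Both arguments are fractions. Assumes an irreversible kinetic
    resolution with 0 < c < 1 and 0 <= ee_substrate < 1.
    """
    c, ee = conversion, ee_substrate
    return math.log((1 - c) * (1 - ee)) / math.log((1 - c) * (1 + ee))


# Self-check on simulated data: a racemate whose fast enantiomer reacts
# 11 times faster than the slow one (first-order decay of each enantiomer).
true_E = 11.0
x = 0.1                                   # rate constant times time, slow enantiomer
fast_left = 0.5 * math.exp(-true_E * x)   # remaining fast-reacting enantiomer
slow_left = 0.5 * math.exp(-x)            # remaining slow-reacting enantiomer

c = 1.0 - (fast_left + slow_left)
ee_s = enantiomeric_excess(slow_left, fast_left)
E_est = selectivity_factor(c, ee_s)

print(f"c = {c:.3f}, ee_s = {ee_s:.3f}, E = {E_est:.2f}")
```

The self-check recovers the E value that was put into the simulation, which is a useful sanity test before applying the function to measured c and ee values.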
Figure 3 Two sources of large libraries of potentially enantioselective enzymes.
2
HIGH-THROUGHPUT ASSAYS FOR EVALUATING ENANTIOSELECTIVE ENZYMES
2.1
UV/VIS-Based Assays
A number of assays for screening the (approximate) activity of enzymes have been developed which do not involve enantioselectivity; color tests or fluorescence-based systems are generally used (6,9,10,12). They are often rather simple and practical and may in fact be used as rough prescreens to sort out “dead” mutants. Unfortunately, extension of these tests to enantioselectivity is not trivial. Indeed, prior to 1997, not a single high-throughput ee assay existed. Conventional ways to determine the ee of a reaction were based on gas chromatography (GC) or high-performance liquid chromatography (HPLC) using chiral columns; however, this allowed only a few dozen samples to be analyzed per day. In a seminal project designed to test directed evolution as a means to create enantioselective enzymes, the lipase-catalyzed hydrolytic kinetic resolution of the p-nitrophenol ester 1 was chosen as a model reaction (7). The wild-type lipase from Pseudomonas aeruginosa catalyzes this transformation with only marginal enantioselectivity in favor of (S)-2. The selectivity factor, E, which reflects the relative rates of reaction of the two enantiomers, is only 1.1. The p-nitrophenol ester, rather than the usual methyl or ethyl ester, was chosen because the hydrolysis product, p-nitrophenolate (3), can easily be detected by ultraviolet/visible (UV/VIS) spectroscopy using a standard plate reader which addresses microtiter plates in a high-throughput manner. However, if a racemate (rac-1) is used as in a normal kinetic resolution, only the overall activity can be ascertained. To solve this fundamental problem, enantiomerically pure (R)-1 and (S)-1 were used separately in pairs of wells, which means that 48 enzyme variants can be tested on a 96-well microtiter plate (7a).
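The pairwise scheme described above can be reduced to a simple calculation: fit the initial slope of absorbance against time for the (S) well and for the (R) well of one variant, and take the ratio of the two slopes as an apparent selectivity. The sketch below is a minimal illustration under that assumption; the function names, the `min_slope` cutoff, and the absorbance readings are hypothetical, and because the enantiomers are assayed separately the ratio is only an estimate of the true E.

```python
def slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def apparent_E(times, abs_S, abs_R, min_slope=1e-4):
    """Return (favored enantiomer, slope ratio), or None if inactive.

    min_slope is an arbitrary cutoff (absorbance units per minute)
    below which a well is treated as showing no activity.
    """
    s = slope(times, abs_S)
    r = slope(times, abs_R)
    if max(s, r) < min_slope:
        return None                      # no measurable activity in either well
    if min(s, r) < min_slope:
        return ("S" if s > r else "R", float("inf"))
    return ("S", s / r) if s >= r else ("R", r / s)


# Hypothetical plate-reader data for one variant (time in minutes).
times = [0, 2, 4, 6, 8, 10]
abs_S = [0.050 + 0.022 * t for t in times]   # well containing the (S)-ester
abs_R = [0.050 + 0.002 * t for t in times]   # well containing the (R)-ester

favored, e_app = apparent_E(times, abs_S, abs_R)
print(f"favored enantiomer: {favored}, apparent E: {e_app:.1f}")
```

Variants for which the function returns `None` correspond to flat lines in the plate-reader traces and would be discarded as inactive within the chosen time window.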
Two typical experimental plots are shown in Fig. 4. The top one shows the result of the wild-type lipase in which the slopes of the (S) and (R) lines are almost identical, indicating almost no enantioselectivity. The bottom plot displays the results using an enzyme variant in the first generation of random mutagenesis (library of 2000 members), signaling increased (S) selectivity. Such a hit is then studied in detail by running a lab-scale kinetic resolution on the racemate, chiral GC serving as the analytical tool. Notice
Figure 4 Course of the lipase-catalyzed hydrolysis of the (R) and (S) ester 1 as a function of time measured by a UV/VIS plate reader (Ref. 7a). a) Wild-type lipase from P. aeruginosa, b) improved mutant in the first generation.
that the plots in Fig. 4 also provide some information regarding enzyme activity. Whereas quantification of the reaction rate is not possible, experimental lines showing no slope indicate no activity in the time interval chosen. Thus enzyme variants showing such low activity are eliminated although some of them may actually be enantioselective. A total of four cycles of mutagenesis/expression/screening were performed, about 2000–4000 enzyme variants being screened in each generation. This led to the creation of an enzyme having an E value of 11 (7a). Later this was increased to E = 26 by applying epPCR and saturation mutagenesis (7b), and most recently a mutant showing even higher enantioselectivity (E > 51) was evolved from the same parent wild-type using recombinant methods (DNA shuffling) (7d). All in all about 40,000 enzyme mutants were screened using this UV/VIS assay. Moreover, it was possible to invert the direction of enantioselectivity (E = 30 in favor of (R)-2) (13), in which case another 40,000 variants were screened. Although this is the first high-throughput ee assay in the literature, allowing between 500 and 800 samples to be tested per day, it suffers from several drawbacks. The most serious disadvantage is that the process of evolution focuses on the p-nitrophenol ester (1), which will certainly not be used in real industrial applications. The methyl or ethyl ester would be industrially relevant, but these do not release a UV/VIS-active alcohol. Moreover, because the (S) and (R) esters are assayed separately, the two enantiomers are not allowed to compete for the enzyme, which may distort the results. That is why the hits in a library, once identified, need to be studied using the racemate in a lab-scale reaction; the exact ee (or E) is then determined by GC. On the practical side, it is useful to carry out the well-known tributyrin prescreening test to eliminate enzyme variants having no lipase activity whatsoever. Accordingly, the agar plates containing the bacterial colonies are charged with tributyrin. Because of its insolubility in the medium, the plates have a milky appearance. In the case of active lipases, hydrolysis occurs and clear spots appear (9a). Another colorimetric assay for testing the enantioselectivity of lipases or esterases in ester hydrolysis reactions is based on a different principle (14). To simulate competitive conditions in an enzymatic process, the so-called Quick-E-Test was developed, in which a mixture of the p-nitrophenol ester of one enantiomeric form of a chiral ester 4 and a resorufin ester 6 is subjected to enzyme-catalyzed hydrolysis, the latter taking on the “role” of the enantiomer. The two hydrolyses are monitored by recording the UV/VIS absorption of the two products 3 and 8 at two distinctly different wavelengths (410 vs. 570 nm). Although this makes a more precise determination of E values possible, the method suffers from the same disadvantage noted previously, namely, the necessity of employing the p-nitrophenol ester of the chiral acid. Nevertheless, appropriate automation should allow a throughput of a thousand or more samples per day.
Yet another UV/VIS test useful in determining the enantioselectivity of lipases or esterases is based on the notion that hydrolysis of an ester leads to a change in acidity which is measurable by an appropriate pH indicator (15). Upon using a buffer, N,N-bis(2-hydroxyethyl)-2-aminoethanesulfonic acid (BES), and a pH indicator (p-nitrophenol) having the same pKa value, a linear correlation between the acid generated and the protonation of the indicator was established. In this case, the two enantiomeric esters are studied separately pairwise, the color changes upon protonation in each case being monitored colorimetrically. Currently, it is not quite clear how general and how precise
this method actually is because it was later observed that appreciable discrepancies between the E value obtained and the E value measured conventionally in control experiments exist in some cases (16). Nevertheless, it is likely that the basic concept can be optimized. A related assay was later reported which makes use of a different and more convenient indicator (bromothymol blue) (17). This system seems to be very practical. However, it should be noted that all of the assays based on pH change reported so far refer to the use of isolated enzymes. In real applications, supernatants are likely to be used such as in directed evolution studies. In supernatants, however, pH variations may occur. Therefore an optimized assay was recently developed in which supernatants are employed (18). In doing so, the pH of the buffer is adjusted to the acidity of the medium. Then about 4000 samples in a kinetic resolution study can be roughly screened per day. The above screening systems are restricted to the hydrolytic kinetic resolution of esters catalyzed by lipases, esterases, or proteases. They are based on the original idea of testing (R) and (S) substrates separately pairwise on microtiter plates (7a). The same applies to an interesting version of this concept in the hydrolysis of chiral acetates (19). In this case, the liberated acetic acid is quantified by conversion into NADH which is monitored by a UV/VIS plate reader at 340 nm. The quantitative conversion of acetic acid into NADH occurs via a cascade of enzyme-catalyzed reactions using a
Figure 5 The hydrolase-catalyzed reaction releases acetic acid, which is converted by acetyl-CoA synthetase (ACS) to acetyl-CoA in the presence of adenosine triphosphate (ATP) and coenzyme A (CoA) (Ref. 19). Citrate synthase (CS) catalyzes the reaction between acetyl-CoA and oxaloacetate to give citrate. The oxaloacetate required for this reaction is formed from L-malate and NAD+ in the presence of L-malate dehydrogenase (L-MDH). Initial rates of acetic acid formation can thus be determined by the increase in absorption at 340 nm due to the increase in NADH concentration. Use of optically pure (R) or (S) acetates allows the determination of the apparent enantioselectivity, Eapp.
commercially available enzyme kit (Fig. 5). About 540 E values can be obtained within 1 h, which calculates to be ca. 13,000 determinations per day. Of course, because the enantiomers are tested separately, the hits need to be studied conventionally to ascertain real E values. Obviously, esters other than acetates cannot be used (19).
2.2 Fluorescence-Based Systems
The primary advantage of assays based on fluorescence is the high degree of sensitivity, which allows the use of very dilute substrate concentrations and extremely small amounts of catalysts (20). An elegant fluorogenic assay for the hydrolytic kinetic resolution of certain chiral acetates, e.g., 9, has been developed recently (Fig. 6) (21). It is based on a sequence of two coupled enzymatic steps that converts a pair of enantiomeric alcohols formed by the asymmetric hydrolysis under study [e.g., (R)- and (S)-10] to a fluorescent product (e.g., 12). In step 1, the (R) and (S) substrates 9 are subjected separately to hydrolysis in reactions catalyzed by a mutant enzyme (lipase or esterase), a catalytic antibody, or, in principle, a synthetic catalyst compatible with the system. The goal of the assay is to measure the enantioselectivity of this kinetic resolution. The relative amount of (R)- and (S)-10 produced after a given reaction time is a measure of enantioselectivity and
Figure 6 Fluorescence-based assay for enantioselectivity of ester hydrolysis (Ref. 21).
can be ascertained rapidly, but not directly. Two subsequent chemical transformations are necessary. In step 2, the enantiomeric alcohols (R)- and (S)-10 are oxidized separately to the ketone 11 by horse-liver alcohol dehydrogenase (HLDH), from which the fluorescent final product umbelliferone (12) is released in each case by the catalytic action of bovine serum albumin (BSA) (step 3). Thus by measuring the fluorescence of 12 for the (R) and the (S) substrate separately, the relative amounts of (R)- and (S)-10 can be determined. The authors tested 30 different esterases and lipases and followed the rate of release of 12 by fluorescence in the wells of standard microtiter plates (21). Control experiments ensured that the apparent rate of umbelliferone release is directly proportional to the rate of acetate hydrolysis. The predicted and observed E and ee values (as checked by standard chiral HPLC assay of a lab-scale kinetic resolution) were found to lie within ±20%. Only in one case was a larger discrepancy observed, a result that was believed to be caused by the occurrence of an unusually low KM for one of the enantiomers. Because the test can be carried out on 96-well microtiter plates, high throughput should be possible. Of course, the inherent disadvantage noted earlier for some of the colorimetric tests also applies here, namely, the fact that the optimization of a potential catalyst is focused on a specific substrate 9 modified by the incorporation of a probe, in this case, the fluorogenic moiety 12. A novel fluorescence-based method for assaying the activity of synthetic catalysts in acylation reactions of alcohols has been described (22a). The underlying idea is to use a molecular sensor which fluoresces upon formation of an acidic product (acetic acid). Protonation of an appropriate chemosensor leads to intense fluorescence. Chiral modification is possible (22b).
2.3 Assays Based on Gas Chromatography, HPLC, Thin-Layer Chromatography, or Capillary Array Electrophoresis
As already delineated, conventional GC or HPLC based on the use of chiral stationary phases can only handle a few dozen ee determinations per day (23,24). However, it was recently demonstrated that GC can be modified so that in certain cases, about 700 exact ee and E determinations are possible per day (25). The case study concerns the lipase-catalyzed kinetic resolution of the chiral alcohol (R)- and (S)-13 with formation of the acylated forms (R)- and (S)-14. Thousands of mutants of the lipase from P. aeruginosa were created by error-prone PCR for use as catalysts in the model reaction (26).
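For orientation, E values of the kind quoted in this section are commonly derived from the conversion and the enantiomeric excess of the remaining substrate. The following is a minimal sketch, assuming an irreversible kinetic resolution and the widely used relation E = ln[(1 − c)(1 − ee_s)] / ln[(1 − c)(1 + ee_s)]; whether the automated evaluation described in this chapter uses exactly this expression is not stated in the text, and the function name and sample values are illustrative.

```python
import math


def selectivity_factor(conversion, ee_substrate):
    """E from fractional conversion c and substrate ee (fractions, not percent).

    Assumes an irreversible kinetic resolution without product inhibition.
    A negative ee_substrate (reversed enantiopreference) gives E < 1.
    """
    if not 0.0 < conversion < 1.0:
        raise ValueError("conversion must lie strictly between 0 and 1")
    if not -1.0 < ee_substrate < 1.0:
        raise ValueError("ee_substrate must lie strictly between -1 and 1")
    numerator = math.log((1.0 - conversion) * (1.0 - ee_substrate))
    denominator = math.log((1.0 - conversion) * (1.0 + ee_substrate))
    if denominator >= 0.0:
        # ee at (or, through noise, beyond) the limit for perfect selectivity
        return math.inf
    return numerator / denominator


# Hypothetical well: 30% conversion, 20% ee of the remaining alcohol
print(round(selectivity_factor(0.30, 0.20), 2))
```

A well with measurable conversion but 0% ee gives E = 1, i.e., no selectivity, whereas wells with 0% conversion are rejected by the input check because no E value can be assigned to an inactive variant.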
The initial approach concerned the use of two columns in a single GC oven (25, 26). However, this turned out to have a number of disadvantages. The successful construction consists of two GC instruments (27), one prep-and-load sample manager (PAL) (28) and a PC (Fig. 7). The instruments are connected to the PC via a standardized data bus (HP-IB) (27), which controls pressure, temperature, etc., and handles other data such as that of the detector. A wash station as well as a drawer system with a maximum of eight microtiter plates were included. Using a special construction developed in-house, the sample manager was attached to the unit in such a way as to reach both injection ports. Because the sample manager can inject samples from 96- or 384-well microtiter plates, over 3000 samples can be handled without manual intervention. The software (Chemstation®) (29) enables additional programs (macros) to be applied before and after each analytical run. One such macro controls the sample manager, each position on the microtiter plate being labeled via the sequence table. Another macro ensures analysis following each sample run in a specified manner; that is, the peaks of the chiral compound 13 are analyzed quantitatively. The analytical data are transferred to an Excel® sheet via dynamic data exchange (DDE) (30) in table form or in microtiter format, allowing for a rapid overview. Finally, the setup includes H2 guards which monitor the hydrogen concentration in the ovens; at concentrations exceeding 1% (potentially explosive at >4% H2), the system responds and automatically switches to nitrogen as the carrier gas (25,26). Using a stationary phase based on a β-cyclodextrin derivative (β-CD), 2,3-di-O-ethyl-6-O-tert-butyldimethylsilyl-β-CD, complete separation of
Figure 7 Schematic representation of a GC-screening system comprising two GC instruments (Ref. 25).
(R)- and (S)-13 [but not of (R)/(S)-14] was achieved within 3.9 min (25,26). Because the configuration illustrated in Fig. 7 comprises two simultaneously operating GC units, about 700 exact ee determinations of (R)/(S)-13 are possible per day. Moreover, the corresponding values for the conversion and the selectivity factor, E (or s), are likewise automatically provided in microtiter format. A typical example is shown in Fig. 8 in which the data corresponding to the most selective mutant enzymes are shown in gray boxes (EV2.4) (25). Mutants displaying 0% conversion imply complete lack of enzyme activity within the predetermined time span. Negative values for ee indicate reversal of enantioselectivity. Contrary to common belief, it is thus possible to utilize GC in high-throughput screening of enantioselectivity in appropriate cases. This type of GC setup should also be useful in the screening of nonchiral transformations. Moreover, it is sometimes possible to increase throughput even further by injecting samples at proper times which are shorter than the total time span of the actual chromatogram, enabling maximum use of time between runs (interlocking chromatograms) (25,26). Major advantages relative to the employment of two totally separate GC units include the optimal use of laboratory space and the utilization of a single sampler and a
Figure 8 Excel® sheet of GC data in microtiter format showing values for percent conversion (c), percent ee, and selectivity factor (E) for mutant lipases catalyzing the hydrolytic kinetic resolution of alcohol 13 (Ref. 25).
computer system, resulting in high instrumental and economical efficiency. Although optimization needs to be performed for each new chiral compound to be tested, it can be anticipated that in appropriate cases 600–800 samples can easily be handled per day. It has recently been shown that HPLC can be developed analogously to suit the requirements of a given analytical problem (31). However, it is unlikely that truly high-throughput ee determinations, meaning many thousands of samples per day, can be achieved in a general way on the basis of GC or HPLC. Of course, depending upon the particular problem at hand, a throughput of 600–800 ee determinations per day may suffice. A related question concerns high-throughput screening of enantioselectivity based on thin-layer chromatography (TLC) (9a,26). It is easy to imagine that hundreds of TLC plates can be scanned rapidly using the appropriate computer image processing which “integrates” spots on a given surface. The real challenge is to find efficient chiral selectors which result in sufficient enantiomer separation. Although the above-mentioned chromatographic techniques may well serve as practical assays in special cases, truly high throughput encompassing thousands of ee values per day is outside of the realm of these assays. In sharp contrast, capillary array electrophoresis (CAE) has recently been modified to allow the high-throughput determination of enantioselectivity (32). It is well known that traditional capillary electrophoresis (CE) in which the electrolyte contains chiral selectors, such as β-cyclodextrin derivatives (β-CDs), can be used to determine the enantiomeric purity of a given sample (33). Unfortunately, the conventional forms of this analytical technique allow for only a few dozen ee determinations per day.
However, because of the analytical demands arising from the Human Genome Project, inter alia, CE has been revolutionized in recent years so that efficient techniques for instrumental miniaturization are now available, making super-high-throughput analysis of biomolecules possible for the first time (34,35). Two different approaches have emerged, namely, capillary array electrophoresis (CAE) (34) and CE on microchips (also called CAE on chips) (35). Both techniques can be used to carry out DNA sequence analyses and/or to analyze oligonucleotides, DNA restriction fragments, amino acids, or PCR products. Several hundred thousand or more analytical data points can be accumulated per day (34,35). In the case of CAE, commercially available instruments have been developed which contain a high number of capillaries in parallel, e.g., the 96-capillary unit MegaBACE®, which consists of 6 bundles of 16 capillaries (36). The system can, therefore, address a 96-well microtiter plate. Each capillary is about 50 cm long. This system was adapted as a super-high-throughput analytical tool for ee determination (32). In this study, chiral
amines of the type 17, which are of importance in the synthesis of pharmaceutical and agrochemical products (37), were used as the model substrates. They are potentially accessible by catalytic reductive amination of ketones 15, Markovnikov addition of ammonia to olefins 16 or enzymatic hydrolysis of acetamides 18 (the reverse reaction also is possible).
In exploratory experiments, the conditions for conventional CE assay of the amines 17 were first optimized using various α- and β-CD derivatives as chiral selectors (32). To enable a sensitive detection system, namely laser-induced fluorescence detection (LIF), the amines were first derivatized by conventional reaction with fluorescein isothiocyanate (19) leading to fluorescence-active compounds 20. Although extensive optimization was not carried out (only six CD derivatives were tested), in all cases, satisfactory baseline separation was accomplished.
The next step involved the use of compounds 17c/20c as the model substrates for CAE analysis using an instrument of the kind MegaBACE®. Known enantiomeric mixtures of the amine 17c were transformed into the
fluorescence-active derivative 20c. The latter samples were then analyzed by CAE. Unfortunately, the results of the conventional single-capillary system could not be reproduced in the CAE experiments because of unstable electrophoretic runs. The problem was solved by developing a special electrolyte having a higher viscosity. It is composed of 40 mM CHES pH 9.1/6.25 mM γ-CD 5:1 diluted with a buffer containing linear polyacrylamide. The MegaBACE® instrument was operated at a potential of 10 kV/8 µA and a sample injection potential of 2 kV/9 s. Under these conditions, baseline separation is excellent. The agreement between ee values of (R)/(S) mixtures of 20c determined by CAE and those of the corresponding (R)/(S) mixtures of 17c as measured by GC turned out to be excellent (32). The enantiomer separation of (R)/(S)-20c on the MegaBACE® instrument required about 19 min. This means that although the conditions are far from optimized, the automated 96-array system provides more than 7000 ee determinations in a single day (32). In related cases, optimization resulted in shorter analysis times for enantiomer separation so that a daily throughput of 15,000 to 30,000 ee determinations is realistic. Such super-high-throughput screening for enantioselectivity is not readily possible by any other currently available technology. In view of the possibility of chiral selector optimization and the fact that CAE has many advantages, such as extremely small amounts of samples, essentially no solvent consumption, absence of high pressure pumps and valves, as well as high durability of columns, this CAE assay is ideally suited for high-speed ee determination. A variation of the above method, namely, the possibility of high-throughput screening of the ee of chiral organic compounds by utilizing capillary electrophoresis on microchips, has been proposed (38).
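The daily throughput figures quoted for the parallel instruments follow from simple arithmetic. The short Python sketch below makes the estimate explicit; it assumes uninterrupted round-the-clock operation and ignores sample loading, washing, and data handling, so it gives an upper bound. The function name is introduced here for illustration only.

```python
MINUTES_PER_DAY = 24 * 60


def determinations_per_day(run_minutes, parallel_channels):
    """Upper bound on analyses per day: complete runs per day times parallel channels."""
    runs_per_day = int(MINUTES_PER_DAY // run_minutes)
    return runs_per_day * parallel_channels


# Run times and channel counts as quoted in the text
print(determinations_per_day(19, 96))   # 96-capillary array, about 19 min per separation
print(determinations_per_day(3.9, 2))   # two GC units, 3.9 min per separation
```

With these inputs the estimate is 7200 determinations per day for the capillary array and 738 for the two-instrument GC setup, in line with the "more than 7000" and "about 700" stated in the text.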
CE (or more specifically CAE) on microchips (typically 10 × 10 cm), in general, had previously been developed for the analysis of biomolecules (35). Traditional photolithographic techniques are thereby used to produce capillary arrays on plastic or glass microchips. However, the enantiomer separation of organic molecules on plastic microchips is not generally feasible due to the chemical instability of such systems. The situation is quite different in the case of glass chips (32). In such a modification, enantiomer separation, e.g., of compound 20c, is possible, the detection being based on laser-induced fluorescence (LIF). Optimization and automation using robotics still need to be carried out. Nevertheless, a cheap and efficient CAE-based assay for super-high-throughput ee determination may emerge in a few years (38b). In summary, the two forms of capillary array electrophoresis are emerging as powerful methods for the determination of enantiomeric purity of chiral compounds in a truly high-throughput manner. Of course, for a given analytical problem, derivatization and antipode separation need to be efficient, which means that universal generality cannot be claimed. Various
modifications are possible, e.g., detection systems based on UV/VIS, MS, or electrical conductivity. Moreover, chiral selectors in the CE electrolyte are not even necessary if the mixture of enantiomers is first converted into diastereomers, e.g., using chiral fluorescent-active derivatization agents (32).
2.4 Assays Based on Circular Dichroism
An alternative to HPLC employing chiral columns which separate the enantiomers of interest is the use of normal columns which simply separate the starting materials from the enantiomeric products, the enantiomeric excess (ee) of the mixture of enantiomers then being determined by circular dichroism (CD) spectroscopy. Indeed, this principle was first established in 1980 (39) and developed further in later research (40,41). Recently, it was shown that the method can be applied in the screening of combinatorially prepared enantioselective transition metal catalysts (42). However, it should be amenable to enzyme-catalyzed processes as well. The method is based on the use of sensitive detectors for HPLC which determine in a parallel manner both the circular dichroism (Δε) and the UV absorption (ε) of a sample at a fixed wavelength in a flow-through system (39–42). The CD signal depends only on the enantiomeric composition of the chiral products, whereas the absorption relates to their concentration. Thus only short HPLC columns are necessary (39,40,42). Upon normalizing the CD value with respect to absorption, the so-called anisotropy factor g is obtained (42):
$$g = \frac{\Delta\varepsilon}{\varepsilon}$$
For a mixture of enantiomers, it is thus possible to determine the ee value without recourse to complicated calibration. The fact that the method is theoretically valid only if the g factor is independent of concentration and if it is linear with respect to ee has been emphasized repeatedly (39–42). However, it needs to be pointed out that these conditions may not hold if the chiral compounds form dimers or aggregates because such enantiomeric or diastereomeric species would give rise to their own particular CD effects. Although such cases have yet to be reported, it is mandatory that this possibility be checked in each new system under study. This precaution was described in detail in the development of a CD-based ee assay for chiral alcohols (43). In work concerning the directed evolution of enantioselective enzymes, there was the need to develop fast and efficient ways to determine the enantiomeric purity of these compounds, which can be produced enzymatically either by reduction of the prochiral ketone (e.g., 21) using reductases or by kinetic resolution of rac acetates (e.g., 23) by lipases. In both systems, the CD approach is theoretically
possible. In the former case, an LC column would have to separate the educt 21 from the product (S)/(R)-22, whereas in the latter case, (S)/(R)-22 would have to be separated from (S)/(R)-23.
Because acetophenone (21) has a considerably higher extinction coefficient than 1-phenylethanol (22) at a similar wavelength (near 260 nm), the separation of starting material from product was absolutely necessary, which was accomplished using a relatively short HPLC column based on a reversed phase system. In preliminary experiments using enantiomerically pure product 22, the maximum value of the CD signal was determined (43). Mixtures of 22 having different enantiomer ratios (and, therefore, ee values) were prepared and analyzed precisely by chiral GC in control experiments. The same samples were studied by CD, resulting in the compilation of g values. Upon plotting the g against the ee values, a linear dependency was in fact observed with a correlation coefficient of r = 0.99995, which translates into the following simple equation for the ee value:
$$ee = 3176.4\,g - 8.0$$
The possible dependency of the g factors on concentration was then studied (43). A mixture of (S)- and (R)-22 corresponding to an enantiomeric excess of ee = 20% was prepared at a concentration of 10 µl ml⁻¹ in acetonitrile, which was then successively diluted. It was shown that no dependency of g on concentration (standard deviation = 2.6%) exists. Thus possible aggregation due to hydrogen bonding between two or more molecules of the product (S)- and (R)-22 in this medium, which could lead to artifacts, is not involved, making the system amenable to CD analysis and, therefore, to high-throughput analysis. Although complete optimization was not carried out, separation of 21 from (S)/(R)-22 was in fact accomplished using reversed phase silica as the column material and methanol/water (47/53) as the eluant. In view of the results concerning the dependency of the g factor on concentration (see above), aggregation can be excluded in this protic medium. Fig. 9 shows the
Figure 9 HPLC chromatogram of a mixture of 21 (peak 1) and (S)/(R)-22 (peak 2) (Ref. 25).
corresponding HPLC chromatogram in which the mixture is fully separated within less than 1.5 min. Thus using the JASCO-CD-1595 instrument in conjunction with a robotic autosampler, it is possible to perform about 700–900 exact ee determinations per day (26,43). In some cases, it is also possible to obtain reliable ee values using CD-based assays although no LC separation is performed whatsoever (26,43,44). A prerequisite is a prochiral substrate (e.g., a meso-compound) as well as a UV-active product (chromophore) which is formed as the enantioselective reaction proceeds. The absorption maximum of the prochiral compound has to differ considerably from that of the desired chiral product. This new principle is illustrated by the lipase-catalyzed enantioselective acylation of the meso-diol 24 by benzoic acid p-nitrophenyl ester (25) with formation of the chiral product 26 and the yellow-colored p-nitrophenolate (3) having a characteristic UV/VIS absorption at 410 nm. Upon measuring the g value at the absorption maximum of 26 and the additional UV absorption of 3, all information necessary to determine conversion and enantiopurity is available without the need to perform any LC separation. The advantage of this novel approach has to do with ease of performance and the obvious prospect of higher throughput (26,43,44).
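The CD-based evaluation described in this section amounts to a linear calibration of g against ee followed by a lookup for unknown samples. The Python sketch below illustrates that workflow under simplifying assumptions: the calibration standards and the detector readings are invented numbers, not data from Ref. 43, and the function names are introduced here. Because the CD signal and the absorbance both scale with concentration and path length, their ratio equals Δε/ε and is therefore independent of concentration as long as no aggregation occurs.

```python
def anisotropy_factor(cd_signal, absorbance):
    """g = CD signal divided by absorbance, both read at the same wavelength."""
    return cd_signal / absorbance


def fit_line(x_values, y_values):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration standards: g values and ee (%) determined by chiral GC
g_standards = [0.000, 0.005, 0.010, 0.015, 0.020]
ee_standards = [0.0, 25.0, 50.0, 75.0, 100.0]
slope, intercept = fit_line(g_standards, ee_standards)

# ee of an unknown sample from its CD and absorbance readings (made-up numbers)
g_unknown = anisotropy_factor(cd_signal=0.0036, absorbance=0.30)
ee_unknown = slope * g_unknown + intercept
print(round(ee_unknown, 1))
```

In practice the calibration line would be fitted once per compound and wavelength, and the concentration independence of g would be checked by dilution, as described for 22.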
On the basis of these and previous studies, HPLC–UV–CD or UV–CD alone may well constitute a viable high-throughput screening system for enantioselective enzymes in a given situation. Success will depend upon the particular substrate under study. Moreover, the precautions as delineated above need to be considered.
2.5 IR-Thermographic Assays
Modern photovoltaic IR cameras equipped with focal plane array detectors are capable of detecting infrared radiation (black body radiation) emitted by objects (45). The picture obtained thereby provides a two-dimensional thermal image which is nothing but a spatial map of the temperature and the emissivity distribution of all objects in the picture. It is customary to use different colors in the pictures to visualize different photon intensities of the detected infrared radiation, e.g., red areas indicate “hot spots”, blue areas denote “cold spots”. The technique was first used to monitor the dynamics of reactions on solid surfaces (46) and was extended to obtain temperature profiles of exothermic gas-phase reactions catalyzed by SiO2-supported platinum particles (47). The first cases of parallel testing of the activity of the members of a library of heterogeneous catalysts were reported later (48–50). An important conceptual advancement pertains to emissivity-corrected IR thermography of large libraries of heterogeneous catalysts, a technique that requires only very small amounts of catalysts (99% in favor of (R)-23 (53).
In conclusion, IR thermography is a viable tool in the high-throughput identification of highly active and enantioselective enzymes (or other catalysts) in exothermic processes. The method allows one to distinguish such ‘‘hits’’ from other members of a library of catalysts which are much less active or less enantioselective. However, quantification has yet to be achieved. This
Figure 10 Time-resolved IR thermographic imaging of the lipase-catalyzed enantioselective acylation of 22 after a) 0.5 min, b) 0.5 min, and c) 3.5 min. The control experiment without enzyme is given in the bottom row in each case. The bar on the far right is the temperature/color key of the temperature window used [°C] (Ref. 52).
means that small differences in enantioselectivity, as usually observed in sequential rounds of enzyme mutagenesis, cannot be picked up by IR-thermographic assays. Moreover, the fact that meaningful comparisons on microtiter plates can only be made if the same amount of enzyme is present in each well needs to be kept in mind when attempting to apply this technology in real situations.
2.6 Assays Based on Mass Spectrometry
Although several assays based on mass spectrometry (MS) have been developed for use in combinatorial catalysis (54,55), application to the screening of enantioselective catalysts is not obvious because the (R) and (S) forms of a chiral compound show identical mass spectra. However, certain chiral effects, such as enantioselective host/guest interactions, ion/molecule reactions with chiral reagents, and parallel reactions for enantiomeric quantification of peptides, have been studied by MS (56). Moreover, the absolute configuration of chiral alcohols can be determined using the Horeau method in which the substrate is derivatized by (R)- and (S)-configurated reagents, one of them being mass-tagged (e.g., deuterium) (57). The diastereomers can thus be distinguished by MS. To be able to measure the enantiomeric excess (ee) of a sample, a measurable degree of kinetic resolution is theoretically necessary. This principle forms the basis of a novel high-throughput ee assay (58). The technique makes use of an equimolar mixture of pseudo-enantiomeric mass-tagged chiral acylating agents that differ in a substituent remote from the stereogenic center (e.g., methyl vs. hydrogen) in a way that the mass of the molecule correlates with its absolute configuration. In principle, the reactions of enantiomers with chiral reagents can proceed with unequal rate constants (kf > ks; f = fast, s = slow). Chiral alcohols (R)-OH and (S)-OH and mass-tagged enantiomerically pure acylating agents are illustrated in Fig. 11 (58). The enantiomeric alcohols (R)-OH and (S)-OH are first reacted with the chiral mass-tagged acids A-CO2H and B-CO2H in the presence of 1,3-dicyclohexylcarbodiimide. The relative amounts of the diastereomeric product esters as measured by MS can then be used to determine the enantiomeric composition of the starting mixture (R)-OH/(S)-OH and, therefore, the ee value, provided two calibration measurements are performed.
Specifically, proline-derived mass-tagged acylating agents were used in great excess (Fig. 11) (58). The system fails if no measurable degree of kinetic resolution occurs. The sensitivity of the method was shown to be ±10% of the ee values. Moreover, the authors have pointed out the possibility of a robotic high-throughput screening system using microtiter plates (58). In principle, the method can be used to study (bio)catalytic desymmetrization of meso-compounds, kinetic
Figure 11 MS-based ee determination in the kinetic resolution of alcohols (R)-OH/ (S)-OH (58). I = peak intensity; q = correction factor for ionization; DCC = dicyclohexylcarbodiimide.
resolution of racemates, as well as transformation of prochiral compounds into chiral products. In a rather different MS-based approach, diastereomer formation by chiral derivatization is not necessary (59). In the original version, about 1000 catalyst evaluations are possible per day (59). Indeed, the method is currently in operation in the directed evolution of enantioselective lipases and epoxide hydrolases (60). Recently, throughput has been increased by a factor of about 8 to 10 (see discussion below) (61). Ionization can be accomplished by a number of standard methods including electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI). Two basically different stereochemical processes can be monitored by this
method, namely, kinetic resolution of racemates and asymmetric transformation of substrates which are prochiral due to the presence of enantiotopic groups (59,61). The underlying principle is based on the use of isotopically labeled substrates in the form of pseudo-enantiomers or pseudo-prochiral compounds (Fig. 12). The course of the asymmetric transformation, i.e., the relative amounts of reactants and/or products, is detected by ESI–MS. In the case of kinetic resolution, pseudo-enantiomers 27 and 28, differing in absolute configuration and in labeling at the functional group FG*, need to be prepared in enantiomerically pure form and then mixed in a 1:1 ratio to simulate a racemate (Fig. 12a). Following asymmetric functional group transformation (in an ideal kinetic resolution 50% conversion), true enantiomers 29 and 30 are formed in addition to nonlabeled and labeled achiral products 31a and 31b, respectively. The ratios of the total intensities of 27/28 and 31a/31b in the MS spectra (m/z intensities of the quasi-molecular ions) allow for the determination of enantiomeric purity and, therefore, enantioselectivity of a catalyst. In some cases, it may be advantageous to use an internal standard to determine the conversion (59,61). As a variation of this theme, kinetic resolution of the pseudo-enantiomers 27 and 32 in which labeling occurs at residue R2 affords a new pair of pseudo-enantiomers 29 and 33 (Fig. 12b). Based on the m/z intensities of the quasi-molecular ions of 27/32 and 29/33, the conversion, enantioselectivity, and selectivity factor (s or E value) can be obtained. An internal standard is not necessary (59,61). In the case of prochiral substrates having enantiotopic groups, e.g., meso-compounds (Fig. 12c), the synthesis of a single pseudo-meso-compound suffices, e.g., 34, because the stereodifferentiating reaction of interest delivers a mixture of two MS-detectable pseudo-enantiomers 35 and 36. The same applies to other pseudo-prochiral substrates of the type 37 (Fig. 12d).
Figure 12 a) Asymmetric transformation of a mixture of pseudo-enantiomers involving cleavage of the functional groups FG and labeled FG*. b) Asymmetric transformation of a mixture of pseudo-enantiomers involving either cleavage or bond formation at the functional group FG; isotopic labeling at R2 is indicated by the asterisk. c) Asymmetric transformation of a pseudo-meso-substrate involving cleavage of the functional groups FG and labeled FG*. d) Asymmetric transformation of a pseudo-prochiral substrate involving cleavage of the functional groups FG and labeled FG* (Ref. 59).
The first system to be tested concerns the kinetic resolution of racemic 1-phenylethyl acetate (23) (59). For this purpose, the pseudo-enantiomers (S)-23 and (R)-40 were prepared in enantiomerically pure form. To test the assay system, these two compounds were mixed in various ratios and the resulting mixtures were analyzed by GC to ascertain the exact pseudo-ee values as a control. Thereafter, the same samples were analyzed by ESI–MS. A typical ESI–mass spectrum is shown in Fig. 13. Because the sodium
Figure 13 ESI–mass spectrum of a sample containing (S)-23 and (R)-40 (Ref. 59).
adducts of (S)-23 and (R)-40 appear at different m/z values due to the deuterium labeling, integration is a simple matter. A total of 17 control samples were studied, and the correspondence between the ee values determined by GC and by ESI–MS is excellent (±5%) (59). Therefore, in real practice, as in the directed evolution of enantioselective lipases, 1:1 mixtures of (S)-23 and (R)-40 are used to simulate a racemate in the actual catalytic process. In contrast to a number of other methods, which suffer from the fact that (R)- and (S)-configured substrates are tested separately as pairs on microtiter plates, the present system utilizes 1:1 mixtures of pseudo-enantiomers in kinetic resolutions. Moreover, analogous reactions on the solid phase, if necessary, should pose no problems.
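The integration step described above reduces to simple arithmetic on the two sodium-adduct peaks. The following Python sketch is illustrative only: the function names are hypothetical, an equal ionization response of the labeled and nonlabeled compound is assumed, and the ±5% agreement with GC is interpreted here as percentage points of ee.

```python
def pseudo_ee(intensity_s_23: float, intensity_r_40: float) -> float:
    """Signed pseudo-ee in percent from two integrated MS peak intensities.

    Positive values mean an excess of the nonlabeled (S)-23 peak,
    negative values an excess of the deuterium-labeled (R)-40 peak.
    Assumes both pseudo-enantiomers ionize equally well.
    """
    total = intensity_s_23 + intensity_r_40
    if total <= 0:
        raise ValueError("summed peak intensity must be positive")
    return 100.0 * (intensity_s_23 - intensity_r_40) / total


def agrees_with_gc(ee_ms: float, ee_gc: float, tolerance: float = 5.0) -> bool:
    """Compare an MS-derived ee with the GC control value.

    The tolerance is given in percentage points of ee (an assumption
    about how the quoted +/-5% is to be read).
    """
    return abs(ee_ms - ee_gc) <= tolerance


if __name__ == "__main__":
    ee = pseudo_ee(75.0, 25.0)
    print(f"pseudo-ee = {ee:.1f}%")
    print("matches GC control:", agrees_with_gc(ee, 47.0))
```

A control series such as the 17 samples mentioned above could be checked by calling `agrees_with_gc` once per sample.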
An experimental setup capable of high-throughput screening of enantioselective reactions was then devised. This was achieved by combining an automated liquid sampler for microtiter plates (96-well format) with an ESI–MS system, both commercially available (Fig. 14) (59,61). This first-generation unit allows about 1000 rather precise determinations per day of the ee value and the conversion (and thus E) for transformations such as the above model reaction. The uncertainty in the ee value is only ±2%. As is apparent from Fig. 12, deuterium labeling can be performed at any position of the substrate. It is advisable to perform a quick kinetic study of labeled and nonlabeled substrates to exclude possible secondary isotope effects.
Figure 14 ESI–MS-based ee-screening system (Refs. 59,61).
As an example of asymmetric transformation of a prochiral substrate bearing reactive enantiotopic groups, the desymmetrization of cis-1,4-diacetoxy-cyclopentene was described (59). In this case, the pseudo-prochiral compound 43 was prepared. The products of asymmetric transformation are compounds 44 and 45, each having two stereogenic centers. Because they are pseudo-enantiomers differing in mass, they can easily be distinguished by ESI–MS. This assay has been shown to be well suited to the directed evolution of an enantioselective lipase from Bacillus subtilis, in which saturation mutagenesis was performed systematically at every position of the enzyme in order to evolve mutants that catalyze the enantioselective hydrolysis of 43 (60,61).
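For the kinetic resolutions discussed in this section, the conversion and the selectivity factor (E or s) follow from the ee of the remaining substrate and the ee of the product, for example as read from the 27/32 and 29/33 intensity ratios. The chapter does not spell out the formulas. The relations used below are the standard ones for an irreversible kinetic resolution and are an assumption of this sketch, as are the function names.

```python
import math


def conversion(ee_substrate: float, ee_product: float) -> float:
    """Fractional conversion c from substrate and product ee (fractions 0..1).

    Uses c = ee_s / (ee_s + ee_p), valid for an irreversible kinetic
    resolution of a 1:1 (pseudo-)racemate.
    """
    total = ee_substrate + ee_product
    if total <= 0:
        raise ValueError("at least one ee value must be positive")
    return ee_substrate / total


def selectivity_factor(c: float, ee_product: float) -> float:
    """Selectivity factor E from conversion and product ee (fractions 0..1)."""
    numerator = math.log(1.0 - c * (1.0 + ee_product))
    denominator = math.log(1.0 - c * (1.0 - ee_product))
    return numerator / denominator


if __name__ == "__main__":
    ee_s, ee_p = 0.50, 0.90  # illustrative values, not data from the chapter
    c = conversion(ee_s, ee_p)
    print(f"conversion = {c:.3f}, E = {selectivity_factor(c, ee_p):.1f}")
```

Because E depends logarithmically on quantities close to zero at high selectivity, small errors in ee translate into large errors in E, which is one reason the precision of the ee assay matters.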
More recently, a significant increase in throughput has been achieved on the basis of instrumental improvements (61). MS instruments equipped with eight-channel multiplex spray systems are now available. An appropriate second-generation modification to suit the demands of high throughput allows the determination of 8000 to 10,000 ee values per day, making this screening system one of the most powerful, precise, and practical ee assays currently available (61). It is being applied in the directed evolution of enantioselective lipases and epoxide hydrolases (60,61). Other types of isotopic labeling are also possible, e.g., 15N (59b). This has been applied in the directed evolution of an enantioselective nitrilase (62).
2.7 Assays Based on DNA Microarrays
A novel ee assay was recently reported in which DNA microarrays are used (63). This type of technology had previously been employed to determine relative gene expression levels on a genomewide basis as measured by the ratio of fluorescent reporters (64). In the case of the ee assay, the goal was to measure the enantiopurity of chiral amino acids (63). Mixtures of (R)/(S) amino acid were first subjected to acylation at the amino function with formation of N-Boc-protected derivatives. Samples were then covalently attached to amine-functionalized glass slides in a spatially arrayed manner (Fig. 15). In a second step, the uncoupled surface amino functions were
Figure 15 Reaction microarrays in high-throughput ee determination (Ref. 63). Reagents and conditions: step 1) BocHNCH(R)CO2H, PyAOP, iPr2NEt, N,N-dimethylformamide (DMF); step 2) Ac2O, pyridine; step 3) 10% CF3CO2H and 10% Et3SiH in CH2Cl2, then 3% Et3N in CH2Cl2; step 4) pentafluorophenyl diphenylphosphinate, iPr2NEt, 1:1 mixture of the two fluorescent proline derivatives, DMF, 20°C.
acylated exhaustively. The third step involved complete deprotection to afford the free amino function of the amino acid. Finally, in a fourth step, two pseudo-enantiomeric fluorescent probes were attached to the free amino groups on the surface of the array. An appreciable degree of parallel kinetic resolution in the process of amide coupling is a requirement for the success of the ee assay, as in the second of the MS-based systems described above (58) [Horeau principle (57)]. In the present case, the ee values are accessible by measuring the ratio of the relevant fluorescence intensities. It was reported that 8000 ee determinations are possible per day, with a precision of ±10% of the actual value. Although it was not explicitly demonstrated that this ee assay can be used to evaluate enzymes (e.g., proteases), this should in fact be possible. The question of whether other types of substrates (and enzymes) are amenable to this type of screening also needs to be addressed.
2.8 Enzymatic Method for Determining Enantiomeric Excess (EMDee)
Recently an enzymatic method for determining enantiomeric excess (EMDee) has been described (65). It is based on the idea that an appropriate enzyme can be used to selectively process one enantiomer of a product from a catalytic or a biocatalytic reaction. In the original paper, the well-known catalytic addition of diethylzinc (47) to benzaldehyde (46) was chosen as a test reaction for demonstrating EMDee. The reaction product, 1-phenylpropanol (48), can be oxidized to ethyl phenyl ketone (49) using the alcohol dehydrogenase from Thermoanaerobium sp., this process being completely (S)-selective (Fig. 16). The rate of this enzymatic oxidation can be measured by monitoring the formation of nicotinamide adenine dinucleotide phosphate (NADPH) by UV spectroscopy at 340 nm.
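The UV readout described above can be converted into a rate of NADPH formation with the Beer–Lambert law. The sketch below is a minimal illustration, not the procedure of Ref. 65: the function name is hypothetical, the molar absorptivity of 6220 M^-1 cm^-1 is the commonly used value for NADPH at 340 nm, and the optical path length of a microtiter well is an assumption that has to be determined for the plate actually used.

```python
NADPH_EPSILON_340 = 6220.0  # M^-1 cm^-1, commonly used value for NADPH at 340 nm


def nadph_rate_umol_per_min(
    delta_a340_per_min: float,
    volume_ul: float,
    path_length_cm: float = 1.0,
) -> float:
    """Rate of NADPH formation in a well, in micromoles per minute.

    delta_a340_per_min: slope of the absorbance trace at 340 nm (A per min).
    volume_ul: liquid volume in the well in microliters.
    path_length_cm: optical path length; 1.0 cm is only a placeholder.
    """
    if path_length_cm <= 0 or volume_ul <= 0:
        raise ValueError("path length and volume must be positive")
    # Beer-Lambert: dA/dt = epsilon * l * dc/dt  ->  dc/dt in mol L^-1 min^-1
    conc_rate_molar = delta_a340_per_min / (NADPH_EPSILON_340 * path_length_cm)
    # (mol per liter) x (microliters) gives micromoles
    return conc_rate_molar * volume_ul


if __name__ == "__main__":
    rate = nadph_rate_umol_per_min(0.0622, volume_ul=100.0)
    print(f"{rate * 1000:.2f} nmol NADPH per min")
```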
Figure 16 Scheme illustrating EMDee in the case of 1-phenylpropanol produced by asymmetric addition of diethylzinc to benzaldehyde (Ref. 65).
Decisive for the success of the assay is the finding that the rate of oxidation constitutes a direct measure of the ee (65). High throughput was demonstrated by analyzing 100 samples in a 384-well format using a UV/fluorescence plate reader. Each sample contained 1 µmol of 1-phenylpropanol (48) in a volume of 100 µL. The accuracy in the ee value amounts to ±10% as checked by independent GC determinations. About 100 samples could be processed within 30 min (65), which corresponds to 4800 ee determinations per day. It should be noted that EMDee does not distinguish between processes that proceed with low enantioselectivity but high conversion and those that proceed with high enantioselectivity but low conversion. Therefore EMDee was extended to provide information regarding both ee and conversion (64). In a second set of assays, the (R)-selective alcohol dehydrogenase from Lactobacillus kefir was used to quantify the amount of (R)-48 present in the mixture. Because the amounts of both (R)-48 and (S)-48 are then known, the conversion can be calculated. It is currently unclear how general the EMDee assay is in the case of other chiral alcohols which do not show such high enantioselectivity in the alcohol dehydrogenase-catalyzed oxidation. In such cases, a different and more selective alcohol dehydrogenase should be used. Indeed, a large number of such enzymes are commercially available. In summary, EMDee constitutes an interesting way to determine the ee of alcohols in a high-throughput manner using standard instrumentation. Of course, the assay has to be optimized for each new chiral alcohol under study.
2.9 Enzyme Immunoassays as a Means to Measure Enantiomeric Excess
Another recent development concerns high-throughput screening of enantioselective catalysts by enzyme immunoassays (66), a technology that is routinely applied in biology and medicine. As in the case of some of the other screening systems, this new assay was not developed specifically for enzyme-catalyzed processes. In fact, it was illustrated by analyzing (R)/(S) mixtures of mandelic acid generated by enantioselective Ru-catalyzed hydrogenation of benzoylformic acid (50) (Fig. 17). By employing an antibody that binds both enantiomers, it was possible to measure the concentration of the reaction product, thereby allowing the yield to be calculated. The use of an (S)-specific antibody then makes the determination of ee possible (Fig. 17). Of course, the success of this assay depends upon the availability of specific antibodies; indeed, these can be raised to almost any compound of interest. Moreover, simple automated equipment comprising a plate washer and a plate absorbance reader is all that is necessary. About 1000 ee determinations are possible per day, with a precision of ±9% (66).
Figure 17 Scheme illustrating high-throughput screening of enantioselective catalysts by competitive enzyme immunoassays (Ref. 66). The antibody marked blue recognizes both enantiomers, whereas the antibody marked red is (S)-specific, making the determination of yield and ee possible.
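The two antibody readouts described in this section combine into yield and ee by simple arithmetic. The following sketch assumes that both readouts have already been converted into concentrations by means of calibration curves. The function name and the parameter `theoretical_conc` (the product concentration expected at full conversion) are hypothetical and do not come from Ref. 66.

```python
def yield_and_ee(total_conc: float, s_conc: float, theoretical_conc: float):
    """Return (yield in %, ee in %) from two immunoassay concentrations.

    total_conc: mandelic acid concentration from the antibody that binds
        both enantiomers.
    s_conc: concentration of the (S)-enantiomer from the (S)-specific antibody.
    theoretical_conc: concentration expected at 100% yield (an assumed input).
    A positive ee means an excess of the (S)-enantiomer.
    """
    if total_conc <= 0 or theoretical_conc <= 0:
        raise ValueError("concentrations must be positive")
    if not 0 <= s_conc <= total_conc:
        raise ValueError("(S) concentration must lie between 0 and the total")
    r_conc = total_conc - s_conc  # (R)-enantiomer by difference
    yield_pct = 100.0 * total_conc / theoretical_conc
    ee_pct = 100.0 * (s_conc - r_conc) / total_conc
    return yield_pct, ee_pct


if __name__ == "__main__":
    y, ee = yield_and_ee(total_conc=8.0, s_conc=6.0, theoretical_conc=10.0)
    print(f"yield = {y:.0f}%, ee = {ee:.0f}% (S)")
```

Because the (R)-enantiomer is obtained by difference, errors in both measurements accumulate in the ee value, which is consistent with the moderate precision quoted above.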
2.10 NMR-Based Assays
Nuclear magnetic resonance (NMR) spectroscopy is traditionally viewed as a relatively slow analytical procedure and, therefore, may not appear to be amenable to high-throughput analyses. However, progress has in fact been made in the utilization of NMR methods in combinatorial drug discovery processes both on solid supports and in solution (67). Additional advancements pertaining to the miniaturization of probes and the development of cryo-probes are expected to stimulate future progress in the use of NMR spectroscopy in combinatorial chemistry. Because high throughput stands at the heart of combinatorial catalysis, NMR spectroscopy, if applied in this area, needs to be modified. One possible approach is magnetic resonance imaging (NMR tomography), which is used successfully in medicine to image tissues and organs. In high-throughput ee screening, the goal is to obtain tomograms of microtiter plates on which enantioselective reactions are occurring. In exploratory experiments, the principle was illustrated, but quantitative evaluation and implementation in a real system still need to be accomplished (9a,26). However, the current technological state of instrumentation makes real applications difficult. The test measurements were made with a sample head having a diameter of 5 mm, 8.5 min being needed. However, the length of a microtiter plate is 12 cm, and the recording time increases rapidly with increasing cavity size in the magnet. Nevertheless, this method has some potential in ee determinations (and other types of screening) if a combination of specially formed sample vessels (designed to utilize the round magnet more efficiently) and a small sample head were to be developed. A second and very different NMR-based approach promises to be truly practical (68). Indeed, organic chemists and biochemists are well versed in solution NMR spectroscopy and may thus prefer this method.
In one manifestation, the assay makes use of the concept of isotopically labeled pseudo-enantiomers and pseudo-meso-compounds. This is related to one of the MS systems described above (59,61). In this case, 1H NMR is the detection system. This again means that monitoring kinetic resolution of chiral compounds and/or desymmetrization of prochiral compounds bearing reactive enantiotopic groups is possible. In the present NMR system, isotope labeling of one enantiomer in a given pseudo-racemate is best performed with 13C. An example is the lipase-catalyzed kinetic resolution of the pseudo-enantiomeric pair (R)-23/(S)-51 in which the latter contains a 13C-labeled acetoxy moiety. In kinetic resolution, it is necessary to determine the ratio after a given period of time (e.g., at 50% conversion). The 1H NMR spectra of (R)-23 and (S)-51 are quite different due to the 13C–1H coupling in the methyl group of (S)-51. In the unlabeled case (R)-23, the methyl group appears as a singlet, whereas the 13C-labeled (S)-51 gives rise to a doublet, and these peaks are easily integrated.
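The integration of the methyl signals can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the labeled compound is taken to be fully 13C-enriched (so that it contributes nothing to the singlet), and both species are assumed to give the same integral per proton.

```python
from typing import Iterable


def ee_from_methyl_integrals(singlet: float, doublet_lines: Iterable[float]) -> float:
    """Signed pseudo-ee in percent from 1H NMR methyl integrals.

    singlet: integral of the methyl singlet of nonlabeled (R)-23.
    doublet_lines: integrals of the two lines of the 13C-coupled methyl
        doublet of labeled (S)-51; they are summed to give the labeled total.
    A positive result means an excess of (R)-23.
    """
    labeled = sum(doublet_lines)
    total = singlet + labeled
    if total <= 0:
        raise ValueError("summed integrals must be positive")
    return 100.0 * (singlet - labeled) / total


if __name__ == "__main__":
    ee = ee_from_methyl_integrals(3.0, (0.5, 0.5))
    print(f"pseudo-ee = {ee:.1f}% in favor of (R)-23")
```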
It was readily demonstrated that in various mixtures of (R)-23/(S)-51, integration of the methyl peaks allows for the determination of the relative amounts of the pseudo-enantiomers present. Thus ee values are accessible. These turned out to be remarkably accurate (±2%) as checked by chiral GC. Moreover, using flow-through systems now commercially available, it is possible to measure at least 1400 samples per day (68). With new NMR cell systems, it is likely that throughput can be increased by a factor of at least 8. In a second and more general manifestation of this NMR-based ee assay, the mixture of enantiomers is first derivatized by a chiral reagent using a robotic system (68). Using a flow-through cell system, integration of the signals of the diastereomers affords ee values. Precision in this embodiment is slightly lower (±5%). Currently, throughput amounts to about 1400 samples per day. Upon using parallel flow-through cells and chemical imaging (tomography), it should be possible to increase this by a factor of about 8.
2.11 IR-Based Assays
The concept of using isotopic labeling in order to distinguish enantiomers can also be applied to IR spectroscopy in appropriate cases, as in the kinetic resolution of acetates of the type 23 (69). In this case 13C labeling is introduced at the carbonyl C atom. The method is very cheap.
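Why 13C labeling at the carbonyl carbon makes the two pseudo-enantiomers distinguishable by IR can be estimated with a simple harmonic-oscillator model. The sketch below treats the C=O group as an isolated diatomic oscillator, which is a crude approximation, and the starting value of 1740 cm^-1 for an acetate carbonyl band is an illustrative assumption rather than a value taken from Ref. 69.

```python
import math


def reduced_mass(m1: float, m2: float) -> float:
    """Reduced mass of a two-body oscillator (same units as the inputs)."""
    return m1 * m2 / (m1 + m2)


def shifted_wavenumber(
    wavenumber_12c: float,
    m_c_light: float = 12.0,
    m_c_heavy: float = 13.0,
    m_o: float = 16.0,
) -> float:
    """Estimate the 13C=O stretching wavenumber from the 12C=O value.

    For a harmonic oscillator with unchanged force constant the wavenumber
    scales with 1/sqrt(reduced mass).
    """
    ratio = math.sqrt(reduced_mass(m_c_light, m_o) / reduced_mass(m_c_heavy, m_o))
    return wavenumber_12c * ratio


if __name__ == "__main__":
    nu_12 = 1740.0  # cm^-1, assumed typical ester C=O band
    nu_13 = shifted_wavenumber(nu_12)
    print(f"12C=O: {nu_12:.0f} cm^-1, 13C=O: {nu_13:.0f} cm^-1, "
          f"shift: {nu_12 - nu_13:.0f} cm^-1")
```

Under these assumptions the two carbonyl bands are separated by roughly 40 cm^-1, so the labeled and nonlabeled acetates can in principle be quantified side by side from their absorbances.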
3 CONCLUSION
A number of approaches have been described which make possible the high-throughput determination of ee values. Not all of them have been developed for assessing the enantioselectivity of enzymes. However, modifications to include biocatalysts are possible. On the practical side, it is useful to apply some kind of (achiral) prescreening test to eliminate nonactive (‘‘dead’’) enzyme mutants. This reduces the size of the library to be tested and, therefore, maximizes efficiency. Presently, several colorimetric ee assays are available for certain types of transformations, e.g., ester hydrolyses catalyzed by lipases, esterases, proteases, or catalytic antibodies. Such assays allow the screening of up to a few thousand samples per day. However, they are only semiquantitative; that is, they are accurate enough to identify hits, which then need to be analyzed by conventional analytical techniques. In some cases, instrumentally modified chiral GC or HPLC may suffice; in favorable cases, the exact analysis of about 700 samples is possible per day. Of the other approaches reported so far for the high-throughput screening of enantioselective enzymes, the MS-, IR-, and NMR-based assays using isotopically labeled substrates are among the most efficient and accurate systems currently known. They allow the precise determination of 1000 to 10,000 ee values per day. Capillary array electrophoresis is also worthy of mention because a throughput of 8000 to 20,000 samples per day at high precision is realistic in some cases. Other assays show lower precision in the ee value (±10%), which suffices in the early stages of directed evolution of an enantioselective enzyme. Hits are thereby easily identified; however, in the later stages of optimization, e.g., when going from ee = 90% to ee > 98%, problems arise. Then more precise ee assays are necessary. It needs to be emphasized that no single assay is universal. Rather, the best systems may turn out to be complementary.
REFERENCES
1a. AN Collins, GN Sheldrake, J Crosby. Chirality in Industry: The Commercial Manufacture and Applications of Optically Active Compounds. Chichester: Wiley, 1992, pp 409.
1b. AN Collins, GN Sheldrake, J Crosby, eds. Chirality in Industry II: Developments in the Commercial Manufacture and Applications of Optically Active Compounds. Chichester: Wiley, 1997, pp 411.
1c. RA Sheldon. Chirotechnology: Industrial Synthesis of Optically Active Compounds. New York: Dekker, 1993, pp 416.
1d. SC Stinson. Chiral drug interactions. Chem Eng News 77(41):101–120, 1999.
1e. SC Stinson. Chiral drugs. Chem Eng News 78(43):55–78, 2000.
1f. AM Rouhi. Chiral roundup as pharmaceutical companies face bleak prospects, their suppliers diligently tend the fertile field of chiral chemistry in varied ways. Chem Eng News 80(23):43–50, 2002.
2a. EN Jacobsen, A Pfaltz, H Yamamoto. Comprehensive Asymmetric Catalysis. Vol. I–III. Berlin: Springer, 1999, pp 1500.
2b. H Brunner, W Zettlmeier. Handbook of Enantioselective Catalysis with Transition Metal Compounds. Vol. I–II. Weinheim: VCH, 1993.
2c. R Noyori. Asymmetric Catalysis in Organic Synthesis. Wiley, 1994, pp 378.
2d. I Ojima, ed. Catalytic Asymmetric Synthesis. Weinheim: VCH, 1993, pp 476.
2e. DJ Berrisford, C Bolm, KB Sharpless. Ligand-accelerated catalysis. Angew Chem 107:1159–1171, 1995; Angew Chem, Int Ed Engl 34:1059–1070, 1995.
3a. HG Davies, RH Green, DR Kelly, SM Roberts. Biotransformations in Preparative Organic Chemistry: The Use of Isolated Enzymes and Whole Cell Systems in Synthesis. London: Academic Press, 1989, pp 268.
3b. CH Wong, GM Whitesides. Enzymes in Synthetic Organic Chemistry. Tetrahedron Organic Chemistry Series. Vol. 12. Oxford: Pergamon, 1994, pp 370.
3c. K Drauz, H Waldmann. Enzyme Catalysis in Organic Synthesis: A Comprehensive Handbook. Vol. I–II. Weinheim: VCH, 1995.
3d. K Faber. Biotransformations in Organic Chemistry. 3rd ed. Berlin: Springer, 1997, pp 402.
4. A Liese, K Seelbach, C Wandrey. Industrial Biotransformations. Weinheim: Wiley-VCH, 2000, pp 423.
5a. SF Brady, CJ Chao, J Handelsman, J Clardy. Cloning and heterologous expression of a natural product biosynthetic gene cluster from eDNA. Org Lett 3:1981–1984, 2001.
5b. P Hugenholtz, BM Goebel, NR Pace. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J Bacteriol 180:4765–4774, 1998.
5c. SB Bintrim, TJ Donohue, J Handelsman, GP Roberts, RM Goodman. Molecular phylogeny of archaea from soil. Proc Natl Acad Sci U S A 94:277–282, 1997.
5d. G DeSantis, Z Zhu, WA Greenberg, K Wong, J Chaplin, SR Hanson, B Farwell, LW Nicholson, CL Rand, DP Weiner, DE Robertson, MJ Burk. An enzyme library approach to biocatalysis: development of nitrilases for enantioselective production of carboxylic acid derivatives. J Am Chem Soc 124:9024–9025, 2002.
6a. FH Arnold. Combinatorial and computational challenges for biocatalyst design. Nature (London) 409:253–257, 2001.
6b. KA Powell, SW Ramer, SB del Cardayré, WPC Stemmer, MB Tobin, PF Longchamp, GW Huisman. Directed evolution and biocatalysis. Angew Chem 113:4068–4080; Angew Chem Int Ed 40:3948–3959, 2001.
6c. RC Cadwell, GF Joyce. Randomization of genes by PCR mutagenesis. PCR Methods Appl 2:28–33, 1992.
6d. MT Reetz, K-E Jaeger. Superior biocatalysts by directed evolution. Top Curr Chem 200:31–57, 1999.
6e. JD Sutherland. Evolutionary optimisation of enzymes. Curr Opin Chem Biol 4:263–269, 2000.
6f. MT Reetz. Directed evolution of selective enzymes and hybrid catalysts. Tetrahedron 58:6595–6602, 2002.
7a. MT Reetz, A Zonta, K Schimossek, K Liebeton, K-E Jaeger. Creation of enantioselective biocatalysts for organic chemistry by in vitro evolution. Angew Chem 109:2961–2963; Angew Chem, Int Ed Engl 36:2830–2832, 1997.
7b. K Liebeton, A Zonta, K Schimossek, M Nardini, D Lang, BW Dijkstra, MT Reetz, K-E Jaeger. Directed evolution of an enantioselective lipase. Chem Biol 7:709–718, 2000.
7c. MT Reetz, K-E Jaeger. Enantioselective enzymes for organic synthesis created by directed evolution. Chem Eur J 6:407–412, 2000.
7d. MT Reetz, S Wilensek, D Zha, K-E Jaeger. Directed evolution of an enantioselective enzyme through combinatorial multiple cassette mutagenesis. Angew Chem 113:3701–3703; Angew Chem Int Ed 40:3589–3591, 2001.
7e. D Zha, A Eipper, MT Reetz. Assembly of designed oligonucleotides as an efficient method for gene recombination: a new tool in directed evolution. ChemBioChem 4:34–39, 2003.
8a. UT Bornscheuer, J Altenbuchner, HH Meyer. Directed evolution of an esterase for the stereoselective resolution of a key intermediate in the synthesis of epothilones. Biotechnol Bioeng 58:554–559, 1998.
8b. O May, PT Nguyen, FH Arnold. Inverting enantioselectivity by directed evolution of hydantoinase for improved production of L-methionine. Nat Biotechnol 18:317–320, 2000.
8c. S Fong, TD Machajewski, CC Mak, C-H Wong. Directed evolution of D-2-keto-3-deoxy-6-phosphogluconate aldolase to new variants for the efficient synthesis of D- and L-sugars. Chem Biol 7:873–883, 2000.
9. Reviews of high-throughput ee-assays:
9a. MT Reetz. Combinatorial and evolution-based methods in the creation of enantioselective catalysts. Angew Chem 113:292–320; Angew Chem Int Ed 40:284–310, 2001.
9b. MT Reetz. New methods for the high-throughput screening of enantioselective catalysts and biocatalysts. Angew Chem 114:1391–1394; Angew Chem Int Ed 41:1335–1338, 2002.
9c. D Wahler, J-L Reymond. High-throughput screening for biocatalysts. Curr Opin Biotechnol 12:535–544, 2001.
10. F Zocher, MM Enzelberger, UT Bornscheuer, B Hauer, RD Schmid. A colorimetric assay suitable for screening epoxide hydrolase activity. Anal Chim Acta 391:345–351, 1999.
11. MT Reetz, CJ Rüggeberg. A screening system for enantioselective enzymes based on differential cell growth. Chem Commun (Cambridge) 1428–1429, 1996.
12. See, for example:
12a. H Zhao, FH Arnold. Combinatorial protein design: strategies for screening protein libraries. Curr Opin Struct Biol 7:480–485, 1997.
12b. G Gauglitz. Optical detection methods for combinatorial libraries. Curr Opin Chem Biol 4:351–355, 2000.
12c. I Venekei, L Hedstrom, WJ Rutter. A rapid and effective procedure for screening protease mutants. Protein Eng 9:85–93, 1996.
12d. G Xue, H Pang, ES Yeung. Multiplexed capillary zone electrophoresis and micellar electrokinetic chromatography with internal standardization. Anal Chem 71:2642–2649, 1999.
12e. H Fenniri. Rapid screening of biocatalysts. Chemtech 26:15–25, 1996.
12f. RP Hertzberg, AJ Pope. High-throughput screening: new technology for the 21st century. Curr Opin Chem Biol 4:445–451, 2000.
12g. RKC Knaust, P Nordlund. Screening for soluble expression of recombinant proteins. Anal Biochem 297:79–85, 2001.
12h. D Wahler, F Badalassi, P Crotti, J-L Reymond. Enzyme fingerprints by fluorogenic and chromogenic substrate assays. Angew Chem 113:4589–4592; Angew Chem Int Ed 40:4457–4460, 2001.
12i. KD Janda, L-C Lo, C-HL Lo, M-M Sim, R Wang, C-H Wong, RA Lerner. Chemical selection for catalysis in combinatorial antibody libraries. Science (Washington, DC) 275:945–948, 1997.
13. D Zha, S Wilensek, M Hermes, K-E Jaeger, MT Reetz. Complete reversal of enantioselectivity of an enzyme-catalyzed reaction by directed evolution. Chem Commun (Cambridge) 2664–2665, 2001.
14. LE Janes, RJ Kazlauskas. Quick E. A fast spectrophotometric method to measure the enantioselectivity of hydrolases. J Org Chem 62:4560–4561, 1997.
15. LE Janes, AC Löwendahl, RJ Kazlauskas. Quantitative screening of hydrolase libraries using pH indicators: identifying active and enantioselective hydrolases. Chem Eur J 4:2324–2331, 1998.
16. R Kazlauskas. Abstract of a lecture presented at the Enzyme Technologies 2000 Pre-Conference, Workshop on High-Throughput Screening, International Business Communications, Las Vegas, 2000.
17. F Morís-Varas, A Shah, J Aikens, NP Nadkarni, JD Rozzell, DC Demirjian. Visualization of enzyme-catalyzed reactions using pH indicators: rapid screening of hydrolase libraries and estimation of the enantioselectivity. Bioorg Med Chem 7:2183–2188, 1999.
18. C Rüggeberg. Beiträge zur gerichteten Evolution von Enzymen für die organische Synthese. PhD dissertation, Ruhr-Universität, Bochum, Germany, 2001.
19. M Baumann, R Stürmer, UT Bornscheuer. A high-throughput-screening method for the identification of active and enantioselective hydrolases. Angew Chem 113:4329–4333; Angew Chem Int Ed 40:4201–4204, 2001.
20a. AW Czarnik, ed. Fluorescent Chemosensors for Ion and Molecule Recognition. ACS Symp Ser 538. Washington, DC: American Chemical Society, 1993, pp 235.
20b. AP de Silva, HQN Gunaratne, T Gunnlaugsson, AJM Huxley, CP McCoy, JT Rademacher, TE Rice. Signaling recognition events with fluorescent sensors and switches. Chem Rev (Washington, DC) 97:1515–1566, 1997.
20c. G Zandonella, L Haalck, F Spener, K Faber, F Paltauf, A Hermetter. Enantiomeric perylene-glycerolipids as fluorogenic substrates for a dual wavelength assay of lipase activity and stereoselectivity. Chirality 8:481–489, 1996.
21. G Klein, J-L Reymond. Enantioselective fluorogenic assay of acetate hydrolysis for detecting lipase catalytic antibodies. Helv Chim Acta 82:400–406, 1999.
22a. GT Copeland, SJ Miller. A chemosensor-based approach to catalyst discovery in solution and on solid support. J Am Chem Soc 121:4306–4307, 1999.
22b. ER Jarvo, CA Evans, GT Copeland, SJ Miller. Fluorescence-based screening of asymmetric acylation catalysts through parallel enantiomer analysis. Identification of a catalyst for tertiary alcohol resolution. J Org Chem 66:5522–5527, 2001.
23. WA König. Gas Chromatographic Enantiomer Separation with Modified Cyclodextrins. Heidelberg: Hüthig, 1992, pp 163.
24. AM Krstulovic, ed. Chiral Separation by HPLC. Chichester: Ellis Horwood, 1989, pp 548.
25. MT Reetz, KM Kühling, S Wilensek, H Husmann, UW Häusig, M Hermes. A GC-based method for high-throughput screening of enantioselective catalysts. Catal Today 67:389–396, 2001.
26. KM Kühling. Beiträge zur Antibiotikaforschung. Naturstoffisolierung, enzymatische Racematspaltung und Screening Systeme. PhD dissertation, Ruhr-Universität, Bochum, Germany, 1999.
27. GC instruments and data bus (HP-IB) are commercially available from Hewlett-Packard, Waldbronn, Germany.
28. The sample manager PAL is commercially available from CTC, Schlieren, Switzerland.
29. Chemstation® is commercially available from Hewlett-Packard, Waldbronn, Germany.
30. Microsoft Excel is commercially available from Microsoft, Unterschleissheim, Germany.
31. MT Reetz, A Deege, F Daligauld, unpublished results.
32a. MT Reetz, KM Kühling, A Deege, H Hinrichs, D Belder. Super-high-throughput screening of enantioselective catalysts by using capillary array electrophoresis. Angew Chem 112:4049–4052; Angew Chem Int Ed 39:3891–3893, 2000.
32b. MT Reetz, KM Kühling, A Deege, H Hinrichs, D Belder. Studiengesellschaft Kohle mbH. Patent application DE-A 100 42 451.1, 2000.
33a. B Chankvetadze. Capillary Electrophoresis in Chiral Analysis. Chichester: Wiley, 1997.
33b. E Gassmann, JE Kuo, RN Zare. Electrokinetic separation of chiral compounds. Science (Washington, DC) 230:813–814, 1985.
33c. LG Blomberg, H Wan. Determination of enantiomeric excess by capillary electrophoresis. Electrophoresis 21:1940–1952, 2000.
33d. H Nishi, T Fukuyama, S Terabe. Chiral separation by cyclodextrin-modified micellar electrokinetic chromatography. J Chromatogr 553:503–516, 1991.
33e. S Fanali. Separation of optical isomers by capillary zone electrophoresis based on host guest complexation with cyclodextrins. J Chromatogr 474:441–446, 1989.
33f. A Guttman, A Paulus, AS Cohen, N Grinberg, BL Karger. Use of complexing agents for selective separation in high-performance capillary electrophoresis—chiral resolution via cyclodextrins incorporated within polyacrylamide-gel columns. J Chromatogr 448:41–53, 1988.
33g. D Belder, G Schomburg. Chiral separations of basic and acidic compounds in modified capillaries using cyclodextrin-modified capillary zone electrophoresis. J Chromatogr A 666:351–365, 1994.
33h. D Wistuba, V Schurig. Enantiomer separation of chiral pharmaceuticals by capillary electrochromatography. J Chromatogr A 875:255–276, 2000.
33i. G Blaschke, B Chankvetadze. Enantiomer separation of drugs by capillary electromigration techniques. J Chromatogr A 875:3–25, 2000.
34a. XC Huang, MA Quesada, RA Mathies. DNA sequencing using capillary array electrophoresis. Anal Chem 64:2149–2154, 1992.
34b. H Kambara, S Takahashi. Multi-sheathflow capillary array DNA analyser. Nature (London) 361:565–566, 1993.
34c. NJ Dovichi. DNA sequencing by capillary electrophoresis. Electrophoresis 18:2393–2399, 1997.
34d. G Xue, H Pang, ES Yeung. Multiplexed capillary zone electrophoresis and micellar electrokinetic chromatography with internal standardization. Anal Chem 71:2642–2649, 1999.
34e. S Behr, M Mätzig, A Levin, H Eickhoff, C Heller. A fully automated multicapillary electrophoresis device for DNA analysis. Electrophoresis 20:1492–1507, 1999.
35a. DJ Harrison, K Fluri, K Seiler, Z Fan, CS Effenhauser, A Manz. Micromachining miniaturized capillary electrophoresis-based chemical analysis system on a chip. Science (Washington, DC) 261:895–897, 1993.
35b. SC Jacobson, R Hergenroder, LB Koutny, RJ Warmack, JM Ramsey. Effects of injection schemes and column geometry on the performance of microchip electrophoresis devices. Anal Chem 66:1107–1113, 1994.
35c. LD Hutt, DP Glavin, JL Bada, RA Mathies. Microfabricated capillary electrophoresis amino acid chirality analyzer for extraterrestrial exploration. Anal Chem 71:4000–4006, 1999.
35d. D Schmalzing, L Koutny, A Adourian, P Belgrader, P Matsudaira, D Ehrlich. DNA typing in thirty seconds with a microfabricated device. Proc Natl Acad Sci U S A 94:10273–10278, 1997.
35e. SC Jacobson, CT Culbertson, JE Daler, JM Ramsey. Microchip structures for submillisecond electrophoresis. Anal Chem 70:3476–3480, 1998.
35f. S Liu, H Ren, Q Gao, DJ Roach, RT Loder Jr, TM Armstrong, Q Mao, I Blaga, DL Barker, SB Jovanovich. Automated parallel DNA sequencing on multiple channel microchips. Proc Natl Acad Sci U S A 97:5369–5374, 2000.
35g. SR Wallenborg, CG Bailey. Separation and detection of explosives on a microchip using micellar electrokinetic chromatography and indirect laser-induced fluorescence. Anal Chem 72:1872–1878, 2000.
35h. I Rodriguez, LJ Jin, SFY Li. High-speed chiral separations on microchip electrophoresis devices. Electrophoresis 21:211–219, 2000.
36. MegaBACE is commercially available from Amersham Pharmacia Biotech, Freiburg, Germany.
37. F Balkenhohl, K Ditrich, B Hauer, W Ladner. Optisch aktive Amine durch Lipase-katalysierte Methoxyacetylierung. J Prakt Chem/Chem-Ztg 339:381–384, 1997.
38. MT Reetz, A Zonta, K Schimossek, K Liebeton, K-E Jaeger. Studiengesellschaft Kohle mbH. Patent application DE-A 197 31 990.4, 1997.
39. AF Drake, JM Gould, SF Mason. Simultaneous monitoring of light-absorption and optical activity in the liquid chromatography of chiral substances. J Chromatogr 202:239–245, 1980.
40. P Salvadori, C Bertucci, C Rosini. Circular dichroism detection in HPLC. Chirality 3:376–385, 1991.
41. A Mannschreck. On-line measurement of circular dichroism spectra during enantioselective liquid chromatography. Trends Anal Chem 12:220–225, 1993.
42a. K Ding, A Ishii, K Mikami. Super high throughput screening (SHTS) of chiral ligands and activators: asymmetric activation of chiral diol-zinc catalysts by chiral nitrogen activators for the enantioselective addition of diethylzinc to aldehydes. Angew Chem 111:519–523; Angew Chem Int Ed 38:497–501, 1999.
42b. R Angelaud, Y Matsumoto, T Korenaga, K Kudo, M Senda, K Mikami. Optical rotation per refractive index unit, or enantiomeric (e) factor, for screening enantioselective catalysts through asymmetric activation of carbohydrates. Chirality 12:544–547, 2000.
43. MT Reetz, KM Kühling, H Hinrichs, A Deege. Circular dichroism as a detection method in the screening of enantioselective catalysts. Chirality 12:479–482, 2000.
44. MT Reetz, A Eipper, KM Kühling, unpublished results.
45. U Glückert. Erfassung und Messung von Wärmestrahlung: Eine praktische Einführung in die Pyrometrie und Thermographie. München: Franzis, 1992, pp 153.
46. PC Pawlicki, RA Schmitz. Spatial effects on supported catalysts: thermal infrared imaging is a useful tool for studying local rate variations on catalytic surfaces in situ. Chem Eng Prog 83:40–45, 1987.
47. G Georgiades, VA Self, PA Sermon. IR-emission analysis of temperature profiles of Pt/SiO2 catalysts in exothermic reactions. Angew Chem 99:1050–1052, 1987; Angew Chem, Int Ed Engl 26:1042–1043, 1987.
48. FC Moates, M Somani, J Annamalai, JT Richardson, D Luss, RC Willson. Infrared thermographic screening of combinatorial libraries of heterogeneous catalysts. Ind Eng Chem Res 35:4801–4803, 1996.
49a. SJ Taylor, JP Morken. Thermographic selection of effective catalysts from an encoded polymer-bound library. Science (Washington, DC) 280:267–270, 1998.
49b. DE Bergbreiter. Infrared thermographic screening of combinatorial libraries of heterogeneous catalysts. Chemtracts 10:683–686, 1997.
50. A Holzwarth, H-W Schmidt, WF Maier. Detection of catalytic activity in combinatorial libraries of heterogeneous catalysts by IR thermography. Angew Chem 110:2788–2792; Angew Chem Int Ed 37:2644–2647, 1998.
51. HM Becker. Neue Screening-Systeme für die enantioselektive Bio- und Metallkatalyse. PhD dissertation, Ruhr-Universität, Bochum, Germany, 2000.
52a. MT Reetz, MH Becker, KM Kühling, A Holzwarth. Time-resolved IR-thermographic detection and screening of enantioselectivity in catalytic reactions. Angew Chem 110:2792–2795; Angew Chem Int Ed 37:2647–2650, 1998.
52b. MT Reetz, M Hermes, MH Becker. Infrared-thermographic screening of the activity and enantioselectivity of enzymes. Appl Microbiol Biotechnol 55:531–536, 2001.
53. ALE Larsson, BA Persson, J-E Bäckvall. Enzymatic resolution of alcohols coupled with ruthenium-catalyzed racemization of the substrate alcohol. Angew Chem 109:1256–1258; Angew Chem, Int Ed Engl 36:1211–1212, 1997.
54a. Reviews of combinatorial methods in materials science and in catalysis: B Jandeleit, DJ Schaefer, TS Powers, HW Turner, WH Weinberg. Combinatorial materials science and catalysis. Angew Chem 111:2648–2689; Angew Chem Int Ed 38:2494–2532, 1999.
54b. S Senkan. Combinatorial heterogeneous catalysis—a new path in an old field. Angew Chem 113:322–341; Angew Chem Int Ed 40:312–329, 2001.
55a. WF Maier. Combinatorial chemistry—challenge and chance for the development of new catalysts and materials. Angew Chem 111:1294–1296; Angew Chem Int Ed 38:1216–1218, 1999.
55b. PP Pescarmona, JC van der Waal, IE Maxwell, T Maschmeyer. Combinatorial chemistry, high-speed screening and catalysis. Catal Lett 63:1–11, 1999.
55c. S Senkan, K Krantz, S Ozturk, V Zengin, I Onal. High-throughput testing of heterogeneous catalyst libraries using array microreactors and mass spectrometry. Angew Chem 111:2965–2971; Angew Chem Int Ed 38:2794–2799, 1999.
55d. C Hinderling, P Chen. Rapid screening of olefin polymerization catalyst libraries by electrospray ionization tandem mass spectrometry. Angew Chem 111:2393–2396; Angew Chem Int Ed 38:2253–2256, 1999.
56a. G Smith, JA Leary. Differentiation of diastereomeric nickel(II) N-glycoside complexes using tandem mass spectrometry and kinetic energy release measurements. J Am Chem Soc 118:3293–3294, 1996.
56b. J Ramirez, F He, CB Lebrilla. Gas-phase chiral differentiation of amino acid guests in cyclodextrin hosts. J Am Chem Soc 120:7387–7388, 1998.
56c. 56d. 56e. 56f.
M Sawada, Y Takai, H Yamada, J Nishida, T Kaneda, R Arakawa, M Okamoto, K Hirose, T Tanaka, K Naemura. Chiral amino acid recognition detected by electrospray ionization (ESI) and fast atom bombardment (FAB) mass spectrometry (MS) coupled with the enantiomer-labelled (EL) guest method. J Chem Soc, Perkin Trans 2(3):701–710, 1998. DV Dearden, C Dejsupa, Y Liang, JS Bradshaw, RM Izatt. Intrinsic contributions to chiral recognition: discrimination between enantiomeric amines by dimethyldiketopyridino-18-crown-6 in the gas phase. J Am Chem Soc 119:353– 359, 1997. S Piccirillo, C Bosman, D Toja, A Giardini-Guidoni, M Pierini, A Troiani, M Speranza. Gas-phase enantiodifferentation of chiral molecules: chiral recognition of 1-phenyl-1-propanol/2-butanol clusters by resonance enhanced multiphoton ionization spectroscopy. Angew Chem 119:1816–1818; Angew Chem, Int Ed Engl 36:1729–1731, 1997. EN Nikolaev, EV Denisov, MI Nikolaeva, JH Futrell, VS Rakov, FJ Winkler. Elucidation of influence of chirality on formation and decomposition of ion
598
56g. 56h. 56 i.
57.
58.
59a.
59b.
60. 61.
62. 63.
64. 65. 66.
67a. 67b.
68. 69.
Reetz molecular complexes in the dialkyltartrate class using mass spectrometry. Adv Mass Spectrom 14:279–313, 1998. A Filippi, A Giardini, S Piccirillo, M Speranza. Gas-phase enantioselectivity. Int J Mass Spectrom 198:137–163, 2000. M Sawada. Chiral recognition detected by fast atom bombardment mass spectrometry. Mass Spectrom Rev 16:73–90, 1997. WA Tao, RG Cooks. Parallel reactions for enantiomeric quantification of peptides by mass spectrometry. Angew Chem 113:779–782; Angew Chem Int Ed 40:757–760, 2001. A Horeau, A Nouaille. Micromethod for determination of the configuration of secondary alcohols by kinetic resolution. Use of mass spectroscopy. Tetrahedron Lett 31:2707–2710, 1990. J Guo, J Wu, G Siuzdak, MG Finn. Measurement of enantiomeric excess by kinetic resolution and mass spectrometry. Angew Chem 111:1868–1871; Angew Chem Int Ed 38:1755–1758, 1999. MT Reetz, MH Becker, H-W Klein. D Sto¨ckigt. A method for highthroughput screening of enantioselective catalysts. Angew Chem 111:1872– 1875; Angew Chem Int Ed 38:1758–1761, 1999. MT Reetz, MH Becker, D Sto¨ckigt, HW Klein. High throughput screening method for determining enantioselectivity. Patent application DE-A 199 13858.3 (26.3.1999). MT Reetz, A Eipper, H Krumm, M Hermes, A Funke, T Eggert, KE Jaeger, unpublished results. W Schrader, A Eipper, DJ Pugh, MT Reetz. Second generation MS-based high-throughput screening system for enantioselective catalysts and biocatalysts. Can J Chem 80:626–632, 2002. M Burk. Lecture at Chiral Europe, London, May 12, 2003. GA Korbel, G Lalic, MD Shair. Reaction microarrays: a method for rapidly determining the enantiomeric excess of thousands of samples. J Am Chem Soc 123:361–362, 2001. B Phimister. Going global. Nat Genet 21:1, 1999. P Abato, CT Seto. EMDee: an enzymatic method for determining enantiomeric excess. J Am Chem Soc 123:9206–9207, 2001. F Taran, C Gauchet, B Mohar, S Meunier, A Valleix, PY Renard, C Cre´minon, J Grassi, A Wagner, C Mioskowski. 
High-throughput screening of enantioselective catalysts by immunoassay. Angew Chem 114:132–135; Angew Chem Int Ed 41:124–127, 2002. MJ Shapiro, JS Gounarides. NMR methods utilized in combinatorial chemistry research. Prog Nucl Magn Reson Spectrosc 35:153–200, 1999. H Schro¨der, P Neidig, G Rosse´. High-throughput structure verification of a substituted 4-phenylbenzopyran library by using 2D NMR techniques. Angew Chem 112:3974–3977; Angew Chem Int Ed 39:3816–3819, 2000. MT Reetz, A Eipper, P Tielmann, R Mynott. Studiengesellschaft Kohle mbH. Patent application DE-A 102 09 177.3, 2002. P Tielmann, M Boese, M Luft, MT Reetz. A practical high-throughput screening system for enantioselectivity using FTIR spectroscopy. Chem Eur J, in press.
27 Enzyme Engineering by Microbial Cell Surface Display
Thorsten M. Adams and Harald Kolmar
Georg-August-Universität Göttingen, Göttingen, Germany
During recent years, structure-based protein design and directed evolution have been widely applied to engineer enzyme activity, specificity, or stability (1). Methodologies such as gene shuffling (2) and combinatorial mutagenesis (3) made it possible to generate diverse molecular repertoires of enzyme variants that were successfully screened for the desired improvements. Screening of large mutant libraries in the range of 10⁴ to 10⁷ different variants is a crucial step in the process and often becomes the limiting factor (4,5). In cases where the desired enzyme function can be coupled to microbial growth or survival, selection may be applicable. Otherwise, single bacterial cells are clonally expanded and each population is individually tested for the desired activity. Commonly, the bacterial cells are compartmentalized, e.g., by transfer into microtiter plates, followed by cell lysis and testing the lysate for the desired novel or improved function.

One of the most interesting advancements of recent molecular biotechnology is the ability to directly display peptides and proteins on the surface of host organisms. This breakthrough technology obviates cell lysis and allows functional screening in a defined environment. It opens new avenues for various applications such as the generation of bacterial live vaccines, whole cell biosorbents, cell-based diagnostics, and recombinant biocatalysts. Moreover, it allows one to apply very high throughput screening by fluorescence-activated cell sorting (FACS) of combinatorial peptide and enzyme libraries for the desired function, including expression level, stability, ligand binding, and catalysis. In this chapter, we will mainly focus on novel approaches in cellular display of enzymes and their applications in enzyme technology.

1 DISPLAY STRATEGIES

1.1 Escherichia coli Cell Surface Display
Numerous expression systems have been developed for the display of peptides and proteins on the surface of E. coli, which is the preferred host for the generation, propagation, and maintenance of large molecular repertoires that may be derived from over 10¹⁰ individual transformants. For microorganisms other than E. coli, the library size is limited by the transformation efficiency and, realistically, cannot be much larger than 10⁵ clones. To become exposed on the outer surface of an E. coli cell, the protein of interest, which is synthesized in the bacterial cytoplasm, has to pass two membranes, namely the cytoplasmic membrane and the outer membrane (Fig. 1). Surface exposure of a heterologous passenger protein is commonly achieved by genetic fusion of the passenger with a translocator protein that is completely or in part located on the outer surface of the microbial host cell.

Figure 1 E. coli cell surface display formats: (A) porins; (B) Lpp–OmpA fusion; (C) fimbriae; (D) autotransporters; (E) ice-nucleation protein; (F) intimin; (P) passenger protein.

1.1.1 Porins
Genetic insertion of a target sequence into the genes for outer membrane proteins is a frequently used strategy to enable membrane translocation and subsequent surface anchoring of the recombinant passenger gene products (6). Porins are abundant outer membrane proteins that form a β-barrel structure, in which the β-strands traverse the outer membrane with the connecting loops facing either the periplasm or the cell surface (7). Short peptides several dozen amino acids in length can be displayed on the cell surface via insertion into surface-exposed loops of porins such as OmpC and LamB (6). However, the position and the length of the target sequence play a critical role in efficient display because sequences exceeding approximately 50–60 residues interfere with the folding and membrane insertion of the carrier protein. Unfortunately, most if not all porins are inserted into the outer membrane in such a way that both termini face the bacterial periplasm. This renders them unsuitable as carboxy-terminal or amino-terminal fusion partners for cell surface exposure of the fused passenger protein.

To overcome this drawback, Francisco et al. (8) developed the Lpp–OmpA system, a sophisticated display format based on a tripartite fusion protein consisting of the first nine amino acids of the E. coli major outer membrane lipoprotein (Lpp), three membrane-spanning β-strands comprising residues 46–159 of the OmpA protein, and the protein of interest. In the shortest version of the targeting vehicle, a single membrane-spanning β-strand of OmpA is used together with the Lpp moiety (8,9). The amino-terminal cysteine residue of Lpp provides an outer membrane anchor, which consists of a cysteinyl–glycerol molecule to which two fatty acids are attached by ester linkages and one fatty acid by an amide linkage (10). With this expression system, enzymes such as β-lactamase (8), an organophosphorus hydrolase (11), and Cellulomonas fimi exoglucanase Cex together with its cellulose binding domain (12), as well as single-chain antibodies (scFv) (13) and a protease inhibitor (14), have been successfully displayed on the E. coli cell surface (Table 1).

1.1.2 Fimbriae
Fimbriae and flagella of gram-negative bacteria are complex filamentous structures on the cell surface that are composed of thousands of copies of the respective fimbrial or flagellar protein. Flagella display is based on the fact that large peptides can be fused into the variable domain of the flagellar major subunit FliC without loss of flagellar synthesis and function. By in-frame insertion of various bacterial adhesin gene fragments into a permissive site of the fliC gene of E. coli, display of a number of peptides ranging from 30 to over 300 amino acids in size in the E. coli flagellum was achieved (15). Flagella display allows presentation of large peptides in thousands of intimately associated copies on the outer surface of E. coli and has proven a valuable tool for a variety of applications such as epitope mapping, binding analyses, or molecular studies of adhesin–receptor interactions (16). Similarly, fimbriae displaying metal-binding motifs have been shown to work well for the sequestration of metals by recombinant E. coli cells (17,18). However, the passenger protein is inserted into the structural framework of the flagellar/fimbrial protein, which can hamper passenger protein folding or filament formation.

Table 1 Examples of Microbial Display of Enzymes

Displayed enzyme              Display format         Display host     Reference
β-Lactamase                   Lpp–OmpA fusion        E. coli          8
Organophosphorus hydrolase    Lpp–OmpA fusion        E. coli          11
C. fimi exoglucanase Cex      Lpp–OmpA fusion        E. coli          12
β-Lactamase                   AIDA                   E. coli          25
Levansucrase                  INP                    E. coli          27
Organophosphorus hydrolase    INP                    E. coli          11
Carboxymethylcellulase        INP                    E. coli          28
OmpT protease                 OmpT                   E. coli          62
Lipase                        α-Agglutinin fusion    S. cerevisiae    48
Glucoamylase                  α-Agglutinin fusion    S. cerevisiae    45,49
β-Glucosidase                 α-Agglutinin fusion    S. cerevisiae    50
Carboxymethylcellulase        α-Agglutinin fusion    S. cerevisiae    50

1.1.3 Autotransporters
The autotransporters form a family of secreted proteins from gram-negative bacteria. They share a unifying structure comprising three functional domains: the amino-terminal leader sequence, the secreted passenger domain, and a carboxy-terminal β-domain that forms a β-barrel pore for the secretion of the passenger protein (19). The prototype of autotransporters is the IgA protease (IgAβ) from Neisseria gonorrhoeae (20), where the amino-terminal protease domain is released into the culture medium after cell surface exposure and autoproteolytic cleavage (21). Autotransport and cell surface exposure of the amino-terminal domain also function when IgAβ is heterologously expressed in E. coli. IgAβ has been engineered by replacing the amino-terminal protease domain with the passenger polypeptide to be transported, as exemplified by the cell surface display of cholera toxin B subunit (22) and a protease inhibitor of the squash family (9). However, it was found that overexpression of the fusion protein is highly toxic for E. coli, which makes library screening very difficult (9). Recently, AIDA (adhesin involved in diffuse adherence), another autotransporter from E. coli, was used to expose the cholera toxin B subunit (23), small T-cell epitopes, and the 11.6-kDa B subunit of the E. coli heat labile toxin (LTB) (24) on the E. coli cell surface. Furthermore, it was possible to display an enzymatically active β-lactamase on the E. coli surface via fusion to AIDA (25).

1.1.4 Ice-Nucleation Protein
The ice-nucleation protein (INP) of Pseudomonas syringae, which is capable of catalyzing the formation of ice in supercooled water, is attached to the outer surface of the bacterial cell via a glycosyl-phosphatidylinositol (GPI) anchor (26). INP was found to remain surface exposed when expressed in E. coli. Several proteins were successfully displayed on the cell surface of E. coli via genetic fusion to INP, such as levansucrase (27), organophosphorus hydrolase (11), carboxymethylcellulase (28), HIV gp120 (29), hepatitis B virus surface antigen (30), and synthetic phytochelatins (31).

1.1.5 Intimin
Intimins are members of a family of bacterial adhesins of pathogenic gram-negative bacteria, which specifically interact with diverse eukaryotic cell surface receptors (32). They are integrated into the bacterial outer membrane with their amino-terminal region, while the carboxy-terminal 280–300 amino acids are surface exposed. The cell binding activity of the EaeA intimin from enterohaemorrhagic E. coli has been localized to its C-terminal 280 residues, and the structure of the carboxy-terminal domains has been determined (33,34). It is assumed that the amino-terminal 550 residues of intimin form a porin-like structure and are folded into an antiparallel β-barrel. The entire extracellular segment of intimin is an elongated and relatively rigid rod made up of three immunoglobulin-like domains and a C-terminal lectin-like domain that interacts with the receptor. The lectin-like domain resides on a rigid extracellular arm, which is most likely anchored to the amino-terminal transmembrane domain through a flexible hinge made by two glycine residues, allowing mechanical movement between the extracellular rod and the bacterial outer membrane (34). Intimin thus provides a structural scaffold well suited for the cell surface display of receptor binding domains remote from the bacterial cell surface.

Intimin variants have been constructed in which the two carboxy-terminal extracellular domains that mediate the adhesion of enteropathogenic and enterohaemorrhagic E. coli to target epithelia have been replaced by various passenger proteins. A derivative of the Ecballium elaterium trypsin inhibitor, the Bence–Jones protein REIv, human interleukin-4 (35), as well as calmodulin, ubiquitin, and β-lactamase inhibitor protein from Streptomyces clavuligerus, were efficiently targeted to the surface of E. coli cells (T. Adams, A. Wentzel, H. Kolmar, unpublished results). Approximately 30,000 passenger proteins were found to be surface exposed on a single E. coli cell (35).

1.2 Alternative Microbial Hosts
Surface display on gram-positive bacteria has also been considered, mainly for vaccine development. Approaches based on attenuated mycobacteria, nonpathogenic staphylococci, streptococci, and lactococci, as well as Bacillus subtilis, have been developed (36). Single-chain antibody fragments (37) and the cellulose binding domain from Trichoderma reesei cellulase were displayed on recombinant staphylococci (38), which could serve as inexpensive tools in diagnostic tests and as novel types of microbial biocatalysts. The protozoan Tetrahymena has also been shown to be capable of displaying fusion proteins on its surface (39). Furthermore, several investigators have developed mammalian cell surface display formats (40–42).

For the past several years, the expression of proteins on the surface of the yeast Saccharomyces cerevisiae has been very actively studied (for reviews, see Refs. 43,44). For some applications, exposure of proteins on the cell surface of S. cerevisiae offers advantages over bacterial display hosts: S. cerevisiae is widely used in the industrial production of proteins and chemicals, so enzyme-coated yeast cells could be used as whole-cell catalysts for many biotransformations. To fix heterologous enzymes to the cell wall of S. cerevisiae, Murai et al. (45) developed a yeast display system that relies on a tripartite fusion consisting of a secretion signal sequence, the passenger protein, and the glycosyl-phosphatidylinositol (GPI)-anchor attachment signal sequence of the native cell-wall-anchored protein α-agglutinin. Boder and Wittrup (43) used the small Aga2p-binding domain of the yeast a-agglutinin mating receptor as a cell wall anchor, which forms two disulfide bonds to the Aga1p cell-wall protein. α-Agglutinin is a mannoprotein involved in the mating of type α S. cerevisiae cells with mating type a cells. Examples of yeast cell surface display of heterologous proteins and enzymes are scFv antibody fragments (43), T cell receptors (46), human urokinase plasminogen activator epidermal growth factor-like domain (47), lipase (48), glucoamylase (45,49), β-glucosidase, and carboxymethylcellulase (50).

1.3 Choice of the Appropriate Display Format
At present, S. cerevisiae and E. coli are the preferred organisms for the display of populations of variant polypeptides. Yeast combines the advantages of a eukaryotic secretory pathway with the ease of manipulation of a single-celled microorganism, and E. coli, because of its high transformation efficiency, is generally the preferred host for the generation of combinatorial polypeptide libraries. The development of a robust and versatile E. coli cell surface display system was, for several years, hampered by the finding that overproduction of outer membrane fusions is often associated with severe reductions in cell viability (13,35). Very little is known about the mechanism by which outer membrane proteins find their way into the bacterial outer membrane, and the reason for the growth defects is not yet known. Several successful approaches have been taken to overcome that problem, including tight regulation of outer membrane fusion protein expression (13), careful adjustment of fusion protein net accumulation to a tolerable level, and utilization of well-tolerated autologous translocator proteins such as AIDA (23) or intimin (35).

For applications where bacterial cells are used as biosorbents or as microparticles for enzyme display, cell viability may not be a major concern. It might therefore be desirable to use E. coli cells carrying a maximum number of surface-exposed molecules per cell. High-level accumulation of passenger proteins on the E. coli cell surface exceeding 10,000 molecules per cell has been reported for several display formats (13,35).

Common to all E. coli display systems is the requirement that the passenger protein pass both the cytoplasmic and the outer membrane. As a consequence, these display systems are subject to the same restrictions as filamentous phage display, where phage assembly occurs within the periplasmic space and therefore requires secretion of the passenger/coat–protein fusion. As a rule, proteins that are secreted in their natural host are more likely to be amenable to surface display than cytoplasmic proteins. However, no rules or predictions can yet be made about which candidate passenger protein will become exposed on the surface of a particular host cell: several cytoplasmic proteins have been displayed, and some periplasmic proteins failed to be displayed. Nonetheless, proteins that are refractory to display can be optimized for cell surface display by random mutagenesis and flow cytometry selection. Kieke et al. (51) have successfully applied a strategy of random mutagenesis and selection for surface expression of T cell receptor (TCR) variants via labeling of the cells with a fluorescent anti-TCR antibody followed by flow cytometry screening.

For the isolation of ligand-binding proteins from molecular libraries, periplasmic expression of the protein of interest may be sufficient, as long as the ligand is able to pass the E. coli outer membrane. Chen et al. (52) have described the PECS system (periplasmic expression with cytometric screening), in which E. coli cells expressing a library of proteins that are secreted into the periplasmic space are incubated with a fluorescent conjugate of the ligand. Ligand molecules as large as about 10 kDa can enter the E. coli periplasm and equilibrate within the periplasmic space without compromising the cell's integrity or viability. The bacterial cell envelope effectively serves as a dialysis bag that selectively retains receptor–fluorescent probe complexes but not free ligand. Flow cytometry screening of a bacterial cell population expressing variant anti-digoxigenin scFv fragments was used to isolate cells with elevated fluorescence, which were shown to produce scFv antibodies with higher affinity to digoxigenin.
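The link between elevated fluorescence and higher affinity in such a screen can be illustrated with a simple equilibrium binding calculation. The Python sketch below is not taken from the PECS work: it assumes 1:1 binding at equilibrium, ligand in large excess over the binding protein, and a fluorescence signal proportional to the fraction of occupied binding sites. The function name and the example concentrations are hypothetical.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of binding sites occupied at equilibrium (1:1 binding).

    Assumes free ligand is approximately equal to total ligand, i.e. the
    ligand is in large excess over the binding protein (an assumption
    made for illustration, not a statement from the chapter).
    """
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand_nM must be >= 0 and kd_nM must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


if __name__ == "__main__":
    ligand_nM = 10.0  # hypothetical labeling concentration
    for kd_nM in (1000.0, 100.0, 10.0, 1.0):  # hypothetical variants
        occupancy = fraction_bound(ligand_nM, kd_nM)
        print(f"Kd = {kd_nM:7.1f} nM -> occupancy = {occupancy:.3f}")
```

Under these assumptions, labeling at a ligand concentration below the dissociation constant of the parent protein spreads variants of different affinity over a wide range of signal, which is what would make a sorting gate on elevated fluorescence selective for tighter binders.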
2 APPLICATIONS OF MICROBIAL CELL SURFACE DISPLAY

2.1 Microbial Cells as Self-Amplifying Solid Supports
Cell surfaces can be regarded as solid supports for the immobilization of proteins, similar to the immobilization of proteins on microbeads. Conceptually, one can display an expressed protein on the surface of producing cells and then handle the cells as if they were beads of an inert support matrix. Display of proteins provides a means to circumvent separate expression, purification, and immobilization of binding proteins and enzymes. An interesting aspect of bacterial cell surface display is the use of recombinant bacteria as bioadsorbents for heavy metals. Metallothioneins inserted into the permissive loop of LamB increased the Cd²⁺ sequestration of recombinant E. coli 20-fold (53). Even more intriguing is the possibility of engineering soil bacteria that are able to survive in polluted environments for an extended period of time. Valls et al. (54) fused the mouse metallothionein to the autotransporter domain of the IgA protease from N. gonorrhoeae and displayed it on the surface of Ralstonia metallidurans, which resulted in a threefold increase of Cd²⁺ binding. Enzymes have been displayed on cell surfaces for various applications. Yeast strains were constructed that display active lipase of Rhizopus oryzae (48), that display glucoamylase and can thereby utilize starch (45), or that utilize cellulose through coexpression of carboxymethylcellulase and β-glucosidase (50). Multivalent display in the context of adjuvant immune-stimulating components of the cell surface makes microbial display a promising avenue for vaccine development (55).
2.2 Functional Screening of Cell-Surface-Exposed Enzyme Libraries

2.2.1 Direct Positive Selection
Direct methods for screening or selection link improved enzyme activity to colony phenotypes or to the survival or growth rates of cells, respectively. Examples include colony screening on plates using chromogenic substrates (56), selection on plates containing increasing antibiotic concentrations (2), and complementation selection with auxotrophs (57). However, intracellular expression of the enzyme of interest requires that it does not interfere with cellular metabolism, that the enzymatic activity can be distinguished from the background of all other cell reactions, and that an externally added substrate readily enters the cytoplasm of the enzyme-producing cell. These restrictions can be overcome by displaying the protein of interest on the surface of the microbial cell. Recently, the ice-nucleation protein (INP)-based bacterial surface display system has been used to selectively screen enzyme libraries for improved catalytic activity of carboxymethyl cellulase (CMCase) (28). The substrate of this enzyme, carboxymethyl cellulose (CMC), is a high-molecular-weight polymer that is not transported into cells. As a result, only cells displaying CMCase on their surface are able to hydrolyze CMC in agar plates, and they can easily be identified because they are surrounded by a clear halo after Congo red staining. Furthermore, growth rates of E. coli cells displaying CMCase variants on minimal medium containing CMC as the sole carbon source were found to be correlated with the activities of the displayed CMCase variants. As a consequence, by selecting rapidly growing colonies, cells containing improved CMCase variants with fivefold increased activity could be isolated (28).

2.2.2 Screening of Microbial Populations by Flow Cytometry
One major advantage of microbial surface display over other display formats lies in the ability to use fluorescence-activated cell sorting (FACS) for very high throughput screening of polypeptide libraries. As more powerful combinatorial mutagenesis methods are now available, designing screening strategies becomes the most critical step in the successful exploitation of molecular diversity. In this respect, flow cytometry has been established over the recent years as a powerful tool for the screening of microbial populations for elevated gene expression, enhanced catalytic performance, or improved binding capabilities, as detailed in the following sections. To isolate proteins with enhanced binding capabilities to a particular ligand, the library of cells, where each cell individually displays numerous copies of a unique protein variant, is incubated with a fluorescently labeled ligand. After thorough washing, only cells displaying a protein variant with
608
Adams and Kolmar
affinity to the ligand remain fluorescent and are isolated using FACS (Fig. 2). Several methods have been described to introduce a fluorescent label into the protein of interest (Fig. 3). Fluorescent reporter groups can be directly introduced by chemical coupling. Protein ligands that are produced by heterologous gene expression can be engineered such that they contain an additional epitope sequence, which is recognized by a monoclonal antibody. Ligand binding can then be detected by consecutive incubation of the cell
Figure 2 Combinatorial library screening by FACS. A library of microbial cells displaying a protein of interest is incubated with a fluorescent ligand. Cells that are capable of binding the ligand are detected by the LASER optics of the fluorescence activated cell sorter (FACS). The flow cytometer nozzle is vibrated at a high frequency, which causes the microscopic fluid stream to break into discrete droplets. As a fluorescent cell enclosed in a droplet reaches the droplet break-off point, it receives a positive or negative charge. As the droplets individually pass through two vertical deflection plates, the electric field created by those plates directs them a collection vial. Uncharged droplets flow into a waste receptacle. Positive cells are cultivated overnight on agar plates to prevent loss of clones that grow more slowly than others. Colonies are scraped off and used to inoculate liquid medium, to which, at an appropriate optical density, inducer is added to induce expression of the translocator/passenger protein fusion. This procedure is repeated for several rounds until fluorescent cells are enriched.
Enzyme Engineering by Microbial Cell Surface Display
609
Figure 3 Procedures for fluorescent detection of protein/ligand interaction. (A) Fluorophore-coupled ligand. (B) Fluorophore-coupled antibody. (C) Quarternary complexes generated by consecutive rounds of incubation with ligand, primary antibody, biotinylated second antibody, and streptavidin, R-phycoerythrin conjugate. (D) Labeling with steptavidin-coated magnetic beads.
population with the ligand protein followed by incubation with fluorescencelabeled anti-epitope antibody. Cell labeling can also be achieved by application of consecutive rounds of incubation with ligand protein, anti-epitope antibody, biotinylated second antibody followed by incubation with streptavidin, R-phycoerythrin conjugate. All these compounds for indirect fluorescent labeling are commercially available. By using a polyclonal second antibody that is biotinylated at multiple sites in conjunction with a streptavidin, R-phycoerythrin conjugate, which contains 35 or more fluorophores depending on the organism of origin (58), a dramatic signal amplification can be achieved because numerous fluorophores are bound per cell surface exposed protein variant. As a consequence, less than 1000 surface displayed protein molecules per microbial cell are sufficient to achieve a suitable signalto-noise ratio for detecting ligand-binding cells (35). To demonstrate the feasibility of indirect cell labeling and FACS for isolation of rare binders, Wentzel et al. (35) have recently described a model experiment where 100 E. coli cells displaying a particular epitope sequence were mixed with 109 control cells. Epitope displaying E. coli variants could be isolated from the
Adams and Kolmar
1:10^7 mixture after only three consecutive rounds of cell labeling, sorting, and recultivation of the fraction of enriched cells. The major advantage of fluorescence-activated cell sorting as a tool for high-throughput screening lies in the ability to perform biological assays on large populations in solution with single-cell resolution. Flow cytometric analysis of cell surface binding of fluorescence-labeled ligands provides linear quantitation of binding constants and dissociation rates in situ, and of surface expression across several orders of magnitude. Analysis of typically more than 10,000 protein molecules per cell surface effectively eliminates the stochastic uncertainty inherent in scaffolds displaying only a few protein molecules. FACS screening of a library of single chain Fv antibody fragments displayed on S. cerevisiae allowed Boder et al. (59) to isolate variants with femtomolar antigen-binding affinity, the highest ligand-binding affinity yet reported for a monovalent protein. With modern flow cytometers, such as the FACSVantage from Becton Dickinson or the MoFlo from Cytomation Inc., it is possible to sort cell populations at rates of approximately 90,000 cells per second or more (60). Hence a total of approximately 3 × 10^8 cells can be screened per hour. Under the assumption that each clone of a library should be represented by at least three bacterial cells presenting a particular variant on their cell surface, molecular repertoires of up to 10^9 different variants can be screened in 1 day. Because the number of FACS-positive cells after one sorting round is usually less than 1/1000 of the initial population, a correspondingly smaller cell population has to be screened in the next sorting round, requiring only minutes of sorting time. If necessary, libraries exceeding 10^9 members can be processed in the first sorting round using magnetic cell sorting to pre-enrich target cells.
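The effect of repeated sorting rounds on a rare population can be sketched with a little arithmetic. The Python fragment below is only an illustrative model, not a procedure taken from the cited studies: it assumes that each round enriches target cells over background by a constant factor and it ignores cell loss and growth bias during recultivation. The function names `enrich` and `rounds_of_sorting` are hypothetical, and the 1000-fold factor is an assumed value.

```python
def enrich(fraction, factor):
    """Return the target-cell fraction after one sorting round.

    Simplifying assumption: a target cell is `factor` times more likely
    to be collected than a background cell.
    """
    target = fraction * factor
    background = 1.0 - fraction
    return target / (target + background)


def rounds_of_sorting(start_fraction, factor, rounds):
    """Target-cell fraction before sorting and after each round."""
    history = [start_fraction]
    for _ in range(rounds):
        history.append(enrich(history[-1], factor))
    return history


if __name__ == "__main__":
    # About 100 displaying cells among 10^9 control cells (roughly 1:10^7).
    start = 100 / 1e9
    # The enrichment factor per round is an assumption for illustration.
    for n, f in enumerate(rounds_of_sorting(start, factor=1000.0, rounds=3)):
        print(f"after round {n}: target fraction = {f:.3g}")
```

Under these assumptions the target cells go from a vanishing minority to the clear majority of the population within three rounds, which is at least qualitatively in keeping with the model experiment described above.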
To pre-enrich cells magnetically, streptavidin- or antibody-coated superparamagnetic microbeads are used instead of streptavidin, R-phycoerythrin conjugate in the labeling scheme described above, and cells are captured by passage of the cell population through a separation column, which is placed in a strong permanent magnet. The column matrix serves to create a high-gradient magnetic field. The magnetically labeled cells are retained in the column while nonlabeled cells pass through (14). After removal of the column from the magnetic field, the magnetically retained cells are eluted. A single-pass enrichment ratio of over 1000-fold has been reported (61). Over 10^11 bacterial cells can be handled in parallel in a single experiment, thus allowing one to screen large repertoires with reasonable library oversampling (T. Adams, unpublished results). Olsen et al. (62) recently developed a technology that allows one to screen a library of enzyme-displaying E. coli cells by flow cytometry for rare variants with enhanced catalytic turnover. This has been achieved by using a substrate molecule that is converted into a fluorescent product
upon catalytic turnover. Because this product attaches to the surface of the enzyme-displaying cell, a direct correlation between turnover rate of the cell-exposed enzyme variant and the cellular fluorescence is established. To obtain such a linkage, Olsen et al. (62) used a cell-surface-associated fluorescence resonance energy transfer (FRET) substrate, which consists of a fluorophore, a scissile bond to be cleaved by the desired enzyme, a quenching fluorophore, and a positively charged moiety to direct the substrate to the negatively charged cell envelope. Enzymatic cleavage of the scissile bond (Fig. 4) disrupts FRET quenching; the quencher is released from the cell and
Figure 4 Binding of FRET substrate to the cell surface of E. coli cells displaying a protein library. The positively charged FRET substrate is attached to the negatively charged polysaccharide matrix of the cell surface. Upon catalytic turnover, the FRET substrate displays FL fluorescence, which is otherwise quenched by Q. FL: BODIPY; Q: tetramethylrhodamine.
drifts away, while the fluorophore remains attached to the cell surface because of the overall positive charge. The research group isolated a mutant of OmpT protease with a 60-fold increase in catalytic activity toward a nonpreferred substrate in a single round of screening from a library of about 2 × 10^6 variants (62).

3 CONCLUSION
Bacteria and yeast displaying heterologous receptors or enzymes on their surface hold great potential as whole-cell adsorbents and biocatalysts. Microbes in which the target protein is covalently attached to the cell surface can be regarded as living, self-amplifying microbeads and are valuable matrices for various analytical and biotechnological applications. Numerous powerful methodologies are now available for the surface exposure of heterologous proteins in various microbial hosts, ranging from E. coli and gram-positive bacteria to yeasts. Upon microbial surface exposure, the protein of interest becomes directly accessible to potential interaction partners, enzyme substrates, or inhibitors. Substantial progress has been made in the past few years in the development of new tools for the generation of very large mutant enzyme libraries. Although considerable improvements have been made in parallel in the development of screening tools for the isolation of enzymes with enhanced catalytic performance, only a fraction of the generated library clones, which may exceed 10^8 different mutants, can be screened by conventional enzyme activity assays in a microplate format, even with sophisticated automated robotic systems. Current FACS technology allows the screening of approximately 10^9 variant cells per day. Promising examples of FACS screening of cell-based libraries for a desired enzyme function have recently been described. However, for many interesting enzyme-catalyzed reactions, further work must be invested in the development of strategies for linking enzyme performance to a corresponding fluorescence signal that can be read out by flow cytometry. Nevertheless, the ability to quantitatively screen libraries of very large size not only for folding stability, protein interaction, and inhibitor binding, but also for catalytic activity, opens new avenues to the directed evolution of enzymes.
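The throughput figures quoted in this chapter follow from simple arithmetic, which the short Python sketch below reproduces. It takes the sort rate of 90,000 cells per second and the threefold oversampling mentioned in the text; the assumption of 24 h of uninterrupted sorting is an idealization introduced here, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def cells_per_hour(sort_rate_per_s):
    """Cells analyzed in one hour at a constant sort rate."""
    return sort_rate_per_s * SECONDS_PER_HOUR


def variants_per_day(sort_rate_per_s, oversampling, hours=24):
    """Distinct variants covered when each is sampled `oversampling` times.

    Assumes uninterrupted sorting for `hours` hours (an idealization).
    """
    return cells_per_hour(sort_rate_per_s) * hours // oversampling


if __name__ == "__main__":
    rate = 90_000  # cells per second, the sort rate quoted in the text
    print(f"cells per hour: {cells_per_hour(rate):.2e}")
    print(f"variants per day at 3x oversampling: {variants_per_day(rate, 3):.2e}")
```

The hourly figure comes out at about 3 × 10^8 cells, and the idealized daily figure is of the order of 10^9 variants. Practical numbers will be lower because sorting is rarely continuous.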
REFERENCES
1. P Forrer, S Jung, A Pluckthun. Beyond binding: using phage display to select for structure, folding and enzymatic activity in proteins. Curr Opin Struct Biol 9:514–520, 1999.
2. WP Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.
3. JF Reidhaar-Olson, RT Sauer. Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241:53–57, 1988.
4. N Cohen, S Abramov, Y Dror, A Freeman. In vitro enzyme evolution: the screening challenge of isolating the one in a million. Trends Biotechnol 19:507–510, 2001.
5. M Olsen, B Iverson, G Georgiou. High-throughput screening of enzyme libraries. Curr Opin Biotechnol 11:331–337, 2000.
6. H Lang. Outer membrane proteins as surface display systems. Int J Med Microbiol 290:579–585, 2000.
7. SW Cowan, T Schirmer, G Rummel, M Steiert, R Ghosh, RA Pauptit, JN Jansonius, JP Rosenbusch. Crystal structures explain functional properties of two E. coli porins. Nature 358:727–733, 1992.
8. JA Francisco, CF Earhart, G Georgiou. Transport and anchoring of beta-lactamase to the external surface of Escherichia coli. Proc Natl Acad Sci U S A 89:2713–2717, 1992.
9. A Wentzel, A Christmann, R Kratzner, H Kolmar. Sequence requirements of the GPNG beta-turn of the Ecballium elaterium trypsin inhibitor II explored by combinatorial library screening. J Biol Chem 274:21037–21043, 1999.
10. J Ghrayeb, M Inouye. Nine amino acid residues at the NH2-terminal of lipoprotein are sufficient for its modification, processing, and localization in the outer membrane of Escherichia coli. J Biol Chem 259:463–467, 1984.
11. M Shimazu, A Mulchandani, W Chen. Cell surface display of organophosphorus hydrolase using ice nucleation protein. Biotechnol Prog 17:76–80, 2001.
12. JA Francisco, C Stathopoulos, RA Warren, DG Kilburn, G Georgiou. Specific adhesion and hydrolysis of cellulose by intact Escherichia coli expressing surface anchored cellulase or cellulose binding domains. Biotechnology (N Y) 11:491–495, 1993.
13. PS Daugherty, MJ Olsen, BL Iverson, G Georgiou. Development of an optimized expression system for the screening of antibody libraries displayed on the Escherichia coli surface. Protein Eng 12:613–621, 1999.
14. A Christmann, K Walter, A Wentzel, R Kratzner, H Kolmar. The cystine knot of a squash-type protease inhibitor as a structural scaffold for Escherichia coli cell surface display of conformationally constrained peptides. Protein Eng 12:797–806, 1999.
15. B Westerlund-Wikstrom, J Tanskanen, R Virkola, J Hacker, M Lindberg, M Skurnik, TK Korhonen. Functional expression of adhesive peptides as fusions to Escherichia coli flagellin. Protein Eng 10:1319–1326, 1997.
16. B Westerlund-Wikstrom. Peptide display on bacterial flagella: principles and applications. Int J Med Microbiol 290:223–230, 2000.
17. K Kjaergaard, MA Schembri, P Klemm. Novel Zn(2+)-chelating peptides selected from a fimbria-displayed random peptide library. Appl Environ Microbiol 67:5467–5473, 2001.
18. MA Schembri, K Kjaergaard, P Klemm. Bioaccumulation of heavy metals by fimbrial designer adhesins. FEMS Microbiol Lett 170:363–371, 1999.
19. IR Henderson, F Navarro-Garcia, JP Nataro. The great escape: structure and function of the autotransporter proteins. Trends Microbiol 6:370–378, 1998.
20. J Pohlner, R Halter, K Beyreuther, TF Meyer. Gene structure and extracellular secretion of Neisseria gonorrhoeae IgA protease. Nature 325:458–462, 1987.
21. J Jose, F Jahnig, TF Meyer. Common structural features of IgA1 protease-like outer membrane protein autotransporters. Mol Microbiol 18:378–380, 1995.
22. T Klauser, J Pohlner, TF Meyer. Selective extracellular release of cholera toxin B subunit by Escherichia coli: dissection of Neisseria Iga beta-mediated outer membrane transport. EMBO J 11:2327–2335, 1992.
23. J Maurer, J Jose, TF Meyer. Autodisplay: one-component system for efficient surface display and release of soluble recombinant proteins from Escherichia coli. J Bacteriol 179:794–804, 1997.
24. MP Konieczny, M Suhr, A Noll, IB Autenrieth, M Alexander Schmidt. Cell surface presentation of recombinant (poly-) peptides including functional T-cell epitopes by the AIDA autotransporter system. FEMS Immunol Med Microbiol 27:321–332, 2000.
25. CT Lattemann, J Maurer, E Gerland, TF Meyer. Autodisplay: functional display of active beta-lactamase on the surface of Escherichia coli by the AIDA-I autotransporter. J Bacteriol 182:3726–3733, 2000.
26. LM Kozloff, MA Turner, F Arellano. Formation of bacterial membrane ice-nucleating lipoglycoprotein complexes. J Bacteriol 173:6528–6536, 1991.
27. HC Jung, JM Lebeault, JG Pan. Surface display of Zymomonas mobilis levansucrase by using the ice-nucleation protein of Pseudomonas syringae. Nat Biotechnol 16:576–580, 1998.
28. YS Kim, HC Jung, JG Pan. Bacterial cell surface display of an enzyme library for selective screening of improved cellulase variants. Appl Environ Microbiol 66:788–793, 2000.
29. YD Kwak, SK Yoo, EJ Kim. Cell surface display of human immunodeficiency virus type 1 gp120 on Escherichia coli by using ice nucleation protein. Clin Diagn Lab Immunol 6:499–503, 1999.
30. EJ Kim, SK Yoo. Cell surface display of hepatitis B virus surface antigen by using Pseudomonas syringae ice nucleation protein. Lett Appl Microbiol 29:292–297, 1999.
31. W Bae, A Mulchandani, W Chen. Cell surface display of synthetic phytochelatins using ice nucleation protein for enhanced heavy metal bioaccumulation. J Inorg Biochem 88:223–227, 2002.
32. BA Vallance, BB Finlay. Exploitation of host cells by enteropathogenic Escherichia coli. Proc Natl Acad Sci U S A 97:8799–8806, 2000.
33. M Batchelor, S Prasannan, S Daniell, S Reece, I Connerton, G Bloomberg, G Dougan, G Frankel, S Matthews. Structural basis for recognition of the translocated intimin receptor (Tir) by intimin from enteropathogenic Escherichia coli. EMBO J 19:2452–2464, 2000.
34. Y Luo, EA Frey, RA Pfuetzner, AL Creagh, DG Knoechel, CA Haynes, BB Finlay, NC Strynadka. Crystal structure of enteropathogenic Escherichia coli intimin–receptor complex. Nature 405:1073–1077, 2000.
35. A Wentzel, A Christmann, T Adams, H Kolmar. Display of passenger proteins on the surface of Escherichia coli K-12 by the enterohemorrhagic E. coli intimin EaeA. J Bacteriol 183:7273–7284, 2001.
36. M Hansson, P Samuelson, E Gunneriusson, S Stahl. Surface display on gram positive bacteria. Comb Chem High Throughput Screen 4:171–184, 2001.
37. E Gunneriusson, P Samuelson, M Uhlen, PA Nygren, S Stahl. Surface display of a functional single-chain Fv antibody on staphylococci. J Bacteriol 178:1341–1346, 1996.
38. J Lehtio, H Wernerus, P Samuelson, TT Teeri, S Stahl. Directed immobilization of recombinant staphylococci on cotton fibers by functional display of a fungal cellulose-binding domain. FEMS Microbiol Lett 195:197–204, 2001.
39. J Gaertig, Y Gao, T Tishgarten, TG Clark, HW Dickerson. Surface display of a parasite antigen in the ciliate Tetrahymena thermophila. Nat Biotechnol 17:462–465, 1999.
40. P Holmes, M Al-Rubeai. Improved cell line development by a high throughput affinity capture surface display technique to select for high secretors. J Immunol Methods 230:141–147, 1999.
41. JD Chesnut, AR Baytan, M Russell, MP Chang, A Bernard, IH Maxwell, JP Hoeffler. Selective isolation of transiently transfected cells from a mammalian cell population with vectors expressing a membrane anchored single-chain antibody. J Immunol Methods 193:17–27, 1996.
42. WC Chou, KW Liao, YC Lo, SY Jiang, MY Yeh, SR Roffler. Expression of chimeric monomer and dimer proteins on the plasma membrane of mammalian cells. Biotechnol Bioeng 65:160–169, 1999.
43. ET Boder, KD Wittrup. Yeast surface display for screening combinatorial polypeptide libraries. Nat Biotechnol 15:553–557, 1997.
44. ET Boder, KD Wittrup. Yeast surface display for directed evolution of protein expression, affinity, and stability. Methods Enzymol 328:430–444, 2000.
45. T Murai, M Ueda, M Yamamura, H Atomi, Y Shibasaki, N Kamasawa, M Osumi, T Amachi, A Tanaka. Construction of a starch-utilizing yeast by cell surface engineering. Appl Environ Microbiol 63:1362–1366, 1997.
46. PD Holler, PO Holman, EV Shusta, S O'Herrin, KD Wittrup, DM Kranz. In vitro evolution of a T cell receptor with high affinity for peptide/MHC. Proc Natl Acad Sci U S A 97:5387–5392, 2000.
47. JR Stratton-Thomas, HY Min, SE Kaufman, CY Chiu, GT Mullenbach, S Rosenberg. Yeast expression and phagemid display of the human urokinase plasminogen activator epidermal growth factor-like domain. Protein Eng 8:463–470, 1995.
48. M Washida, S Takahashi, M Ueda, A Tanaka. Spacer-mediated display of active lipase on the yeast cell surface. Appl Microbiol Biotechnol 56:681–686, 2001.
49. Y Shibasaki, N Kamasawa, S Shibasaki, W Zou, T Murai, M Ueda, A Tanaka, M Osumi. Cytochemical evaluation of localization and secretion of a heterologous enzyme displayed on yeast cell surface. FEMS Microbiol Lett 192:243–248, 2000.
50. T Murai, M Ueda, T Kawaguchi, M Arai, A Tanaka. Assimilation of cellooligosaccharides by a cell surface-engineered yeast expressing beta-glucosidase and carboxymethylcellulase from Aspergillus aculeatus. Appl Environ Microbiol 64:4857–4861, 1998.
51. MC Kieke, EV Shusta, ET Boder, L Teyton, KD Wittrup, DM Kranz. Selection of functional T cell receptor mutants from a yeast surface-display library. Proc Natl Acad Sci U S A 96:5651–5656, 1999.
52. G Chen, A Hayhurst, JG Thomas, BR Harvey, BL Iverson, G Georgiou. Isolation of high-affinity ligand-binding proteins by periplasmic expression with cytometric screening (PECS). Nat Biotechnol 19:537–542, 2001.
53. C Sousa, A Cebolla, V de Lorenzo. Enhanced metalloadsorption of bacterial cells displaying poly-His peptides. Nat Biotechnol 14:1017–1020, 1996.
54. M Valls, S Atrian, V de Lorenzo, LA Fernandez. Engineering a mouse metallothionein on the cell surface of Ralstonia eutropha CH34 for immobilization of heavy metals in soil. Nat Biotechnol 18:661–665, 2000.
55. JS Lee, KS Shin, JG Pan, CJ Kim. Surface-displayed viral antigens on Salmonella carrier vaccine. Nat Biotechnol 18:645–648, 2000.
56. D Wahler, JL Reymond. Novel methods for biocatalyst screening. Curr Opin Chem Biol 5:152–158, 2001.
57. JA Smiley, SJ Benkovic. Selection of catalytic antibodies for a biosynthetic reaction from a combinatorial cDNA library by complementation of an auxotrophic Escherichia coli: antibodies for orotate decarboxylation. Proc Natl Acad Sci U S A 91:8319–8323, 1994.
58. S Ritter, RG Hiller, PM Wrench, W Welte, K Diederichs. Crystal structure of a phycourobilin-containing phycoerythrin at 1.90-Å resolution. J Struct Biol 126:86–97, 1999.
59. ET Boder, KS Midelfort, KD Wittrup. Directed evolution of antibody fragments with monovalent femtomolar antigen-binding affinity. Proc Natl Acad Sci U S A 97:10701–10705, 2000.
60. RG Ashcroft, PA Lopez. Commercial high speed machines open new opportunities in high throughput flow cytometry (HTFC). J Immunol Methods 243:13–24, 2000.
61. YA Yeung, KD Wittrup. Quantitative screening of yeast surface-displayed polypeptide libraries by magnetic bead capture. Biotechnol Prog 18:212–220, 2002.
62. MJ Olsen, D Stephens, D Griffiths, P Daugherty, G Georgiou, BL Iverson. Function-based isolation of novel enzymes from a large library. Nat Biotechnol 18:1071–1074, 2000.
28 Overexpression and Secretion of Biocatalysts in Pseudomonas
Frank Rosenau and Karl-Erich Jaeger
Heinrich-Heine-Universität Düsseldorf
Jülich, Germany
1 INTRODUCTION
Enzymes are naturally occurring biocatalysts operating in living cells. Over billions of years of biological evolution, they have been optimized to catalyze a given reaction with high activity and substrate specificity. Nowadays, several weeks are sufficient to mimic natural evolution in a test tube by using "directed" or in vitro evolution, during which enzyme variants with desired properties are identified in large libraries of mutated genes. In principle, this technique does not require knowledge of the enzyme's structure, its catalytic mechanism, or its biosynthesis, making it a powerful novel tool for enzyme optimization (1–5). The directed evolution of a given biocatalyst is a multistep process comprising (a) identification of a candidate enzyme, which preferably should have catalytic activity toward the substrate of interest; (b) cloning of the respective enzyme gene; (c) generation of a large number of mutant genes; (d) expression of these genes to generate large libraries of enzyme variants; (e) identification of better-performing biocatalysts in the libraries by high-throughput screening or selection; and, finally, (f) production of the best-performing biocatalyst at a large scale. Therefore, the construction of a potent overexpression system for a gene of interest constitutes a major part of devising an efficient directed evolution strategy.

2 HOW TO DEVISE AN EFFICIENT OVEREXPRESSION SYSTEM
An efficient overexpression system consists of a vector harboring the gene(s) of interest behind strong and inducible promoters, which may be under the control of regulatory elements allowing controlled gene expression in prokaryotic or eukaryotic cells. In bacteria, tightly regulated promoters, such as the λPL and λPR promoters derived from the Escherichia coli bacteriophage λ or the E. coli lac operon-based promoters Plac, Ptac, and Ptrc, are used (6). A suitable host strain allows easy handling, has a short generation time when grown in a variety of different media, and does not require extreme conditions such as high temperatures or an anoxic environment. Additionally, such a strain should allow DNA transformation with high efficiency. The expression level to be achieved should be high enough to allow high-throughput screening, which is usually performed in microtiter plates (culture volume: 10–100 µL) after growth of the cultures for several hours. Among other factors, the throughput of a screening method per unit of time is determined by the amount of biocatalyst produced per cell. Furthermore, a suitable expression system should offer the possibility to produce the optimized biocatalyst at a larger scale, preferably also allowing for cost-effective downstream processing. Therefore, secretion of the biocatalyst into the culture supernatant should also be envisaged. The best known bacterial overexpression system uses E. coli host strains and the pET vector series (commercialized by Novagen, Madison), in which genes are expressed from a strong promoter derived from the bacteriophage T7. Another popular example is the pBAD system marketed by Invitrogen (Carlsbad), which involves a promoter that can transiently be induced by the addition of arabinose to the culture medium. However, a considerable number of biocatalyst proteins cannot be expressed using one of these systems because their production requires several accessory cellular functions.
Some enzymes may need essential cofactors for activity and unique chaperones for folding. Moreover, secretion may also be essential to achieve an enzymatically active state of the enzyme. Therefore, the development of an efficient system allowing for overexpression and secretion is not trivial, and requires a detailed knowledge of the biocatalyst protein, its biochemical properties, and the cellular pathway involved in its biosynthesis, folding, and secretion. Bacterial lipases originating from the genera
Pseudomonas and Burkholderia represent prototypic examples of such biocatalysts. They have an exceptionally high potential for numerous biotechnological applications (1,5,7–12), but a complex pathway of folding and secretion is required to obtain enzymatically active protein (13).

3 BOTTLENECKS FOR OVEREXPRESSION OF LIPASES
Many bacterial lipases are secreted into the environment. Therefore, transcription of their genes as well as their folding and secretion represent potential bottlenecks for lipase production (see Fig. 1). Lipases from the Gram-negative bacterial genera Pseudomonas and Burkholderia belong to three distinct groups of lipase family I, which is further divided into six subfamilies (14). Prototype lipases of subfamilies I.1 and I.2 are those from P. aeruginosa and Burkholderia glumae, which not only share a high degree of sequence homology but also have several physiological features in common, including their biogenesis and secretion. The lipase LipA produced by P. aeruginosa is encoded in a bicistronic operon together with its cognate foldase, Lif (15). Under physiological growth conditions, the transcription and translation of this lipase are downregulated to a level that results in the production of only about 200 lipase molecules per cell (16). The physiological regulation of transcription can be circumvented by placing the lipase operon behind an artificial promoter; however, it has been demonstrated that the ratio of lipase to foldase molecules also determines the yield of extracellular lipase. Recently, we have shown for both P. aeruginosa and B. glumae that overproduction of the Lif foldases in relation to the corresponding lipases resulted in an increase in enzymatically active extracellular lipase by at least a factor of 20, indicating that the ratio of lipase to foldase indeed represents an important bottleneck for lipase overexpression (M El Khattabi, F Rosenau, W Bitter, K-E Jaeger, J Tommassen, submitted for publication). The secretion of P. aeruginosa lipase is a two-step process involving translocation across both the inner membrane and the outer membrane (13). The signal peptide-dependent translocation into the periplasm is mediated by the Sec machinery (17).
The existing experimental data suggest that the capacity of the Sec machinery does not represent a limiting factor during lipase overexpression. Once the lipase has reached the periplasm, its proper folding, like that of several other biocatalysts, requires the correct formation of disulphide bonds mediated by the Dsb proteins (reviewed in Ref. 18). In P. aeruginosa, DsbA and DsbC are absolutely required for lipase to reach a secretion-competent conformation (19,20), thereby demonstrating that periplasmic folding represents another bottleneck for overexpression. Finally, secretion through the bacterial outer membrane occurs via a multisubunit protein
Figure 1 Bottlenecks (A–D) for overexpression of P. aeruginosa lipase. The lipase structural gene lip is located in an operon together with a second gene lif encoding a lipase-specific foldase. Biosynthesis requires the efficient transcription of the operon (A) for which at least two different promoters (P1 and P2) have been identified. Both the amount and the stability of the mRNA influence the efficiency of translation (B). Newly synthesized lipase is translocated into the periplasm (p) via the Sec-machinery located in the inner membrane (i.m.), where the signal sequence (ss) is removed by a specific signal peptidase. Periplasmic folding of the lipase (C) requires the action of Lif and of additional folding catalysts including Dsb proteins, which catalyze the formation of disulphide bonds. Rapid degradation by periplasmic proteases is avoided by correct folding of lipase molecules, which are subsequently recognized by the Xcp-machinery (D) and translocated across the outer membrane (o.m.) into the extracellular medium.
complex named the type II secretion machinery (21,22). Evidently, only correctly folded proteins are recognized by certain components of this machinery, whereas others are rapidly degraded by periplasmic proteases (19,20), defining this recognition process as another potential bottleneck during overexpression.

4 OVEREXPRESSION OF LIPASES IN PSEUDOMONAS
First of all, overexpression requires efficient transcription of the target gene. This has been generally accepted since the outstanding work of Tabor and Richardson, who established an overexpression system in E. coli that uses the RNA polymerase encoded by the E. coli bacteriophage T7. The concept is based on the high processivity of this enzyme and its exceptionally high specificity for T7-derived promoters (23,24). Because of this unusual processivity, T7-RNA polymerase requires tight regulation to ensure a low basal level of expression under noninducing conditions, since background expression of target proteins may turn out to be harmful or deleterious to the host cells. Modern plasmids such as the pET series (Novagen), which are designed for T7-RNA polymerase-dependent expression in E. coli, therefore contain additional lac operator sequences preceding the promoter site and also encode extra copies of the lac repressor gene to reduce basal gene expression. Upon induction of target gene expression, high amounts of mRNA and subsequently of protein are produced. However, a high transcript level alone does not always guarantee high-level production of the respective protein, because it is often accompanied by misfolding and intracellular deposition of the protein in the form of insoluble inclusion bodies. The molecular mechanisms resulting in the formation of inclusion bodies are still largely unknown (25). Despite these drawbacks, E. coli-based systems using inducible T7-RNA polymerase have proven their potential for the overexpression of target genes from various sources and are commercially available in a number of variants. Moreover, the unique features of T7-RNA polymerase have been used to increase the expression levels of several target genes in yeast, plant, or mammalian cells (26–28). The T7 overexpression system has also been adapted to construct P. aeruginosa overexpression strains (6,29–33).
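Why two layers of lac control reduce background expression can be illustrated with a toy calculation. The Python sketch below is a deliberately crude model and not a description of measured behavior: it assumes that each repressed layer (the promoter driving the T7-RNA polymerase gene, and the lac operator at the T7 promoter on the plasmid) lets through a fixed fraction of full expression, and that these fractions multiply. The function name and the leak values of 0.01 are hypothetical placeholders.

```python
def relative_expression(iptg_added,
                        polymerase_leak=0.01,
                        operator_leak=0.01,
                        has_lac_operator=True):
    """Target-gene expression relative to the fully induced level.

    Toy model: under noninducing conditions each repressed control
    layer passes a fixed fraction ("leak") of full expression, and the
    layers are treated as independent. Leak values are assumptions.
    """
    if iptg_added:
        # Inducer inactivates the Lac repressor at both layers.
        return 1.0
    level = polymerase_leak          # leaky synthesis of T7-RNA polymerase
    if has_lac_operator:
        level *= operator_leak       # repressor also blocks the T7 promoter
    return level


if __name__ == "__main__":
    print("induced:", relative_expression(True))
    print("noninduced, with lac operator:", relative_expression(False))
    print("noninduced, without lac operator:",
          relative_expression(False, has_lac_operator=False))
```

In this picture the additional operator lowers the noninduced level by a further factor equal to its own leak, while the induced level is unchanged. Real systems will deviate from such a simple multiplicative scheme.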
Table 1 lists several currently available plasmids, all of which harbor an inducible T7 promoter. The novel expression vector, pBBR22b (Fig. 2), is based on the mobilizable broad host range vector, pBBR1MCS (30). An AseI/PvuI fragment containing the multiple cloning site of pBBR1MCS, the promoter region, and the lacZα reporter gene allowing for blue-white selection in E. coli was exchanged for a PshAI/PpuMI fragment derived from the commercially
Table 1 Broad Host Range Plasmids for T7-RNA Polymerase-Dependent Protein Expression in P. aeruginosa

Plasmids             Selectable markers   Additional features                                        References
pBBR22b              cmr                  lac operator, lacIq gene, pelB signal sequence, His tag    This chapter
pBSPIIKS/pBSPIISK    ampr                 Blue-white selection                                       (12)
pEB12                ampr                 Multiple terminators                                       (29)
pEB14                ampr                 lac operator                                               (29)
pBBR1MCS             cmr                  Blue-white selection                                       (30)
pUCPKS/pUCPSK        ampr                 Blue-white selection                                       (31)
available expression vector, pET22b (Novagen). Unlike pET22b, the resulting vector, pBBR22b, can replicate in a variety of gram-negative bacteria and is therefore suitable for the overexpression of different target genes in strains other than E. coli. It harbors a T7 promoter (PT7) in combination with a lac operator (lacO), a feature typical of vectors of the pET series. In expression strains that provide the Lac repressor for transcriptional control of the T7-RNA polymerase gene, this leads to a significant reduction of background expression under noninducing conditions. This effect is further enhanced by a constitutively expressed additional copy of the Lac repressor gene (lacIq) encoded on the vector. A strong ribosomal binding site (SD) enables efficient translation initiation of target genes cloned into the polylinker region. Furthermore, in-frame fusions with the pelB signal sequence allow for Sec-dependent translocation of overexpressed target proteins into the periplasm of several different host cells. Moreover, in-frame fusions with a His-tag coding sequence yield target proteins that carry a carboxy-terminal affinity tag enabling easy one-step affinity purification. Brunschwig and Darzins have constructed the P. aeruginosa strain, PADD 1976, which contains a cassette composed of the T7-RNA polymerase structural gene under the control of a lacUV5 promoter and, in addition, the lacIq gene encoding the E. coli Lac repressor (29). As lacIq is constitutively expressed in P. aeruginosa, the transcription of the T7-RNA polymerase gene from the lacUV5 promoter is repressed under noninducing conditions. Addition of the synthetic inducer, isopropyl-β-D-thiogalactoside (IPTG), induces the synthesis of T7-RNA polymerase, which itself transcribes plasmid-encoded target gene(s) starting from the T7 promoter. Initial experiments demonstrated that P. aeruginosa PADD 1976 was a
Figure 2 The broad host range expression vector, pBBR22b. This vector was constructed by insertion of a PshAI/PpuMI-fragment derived from the commercially available expression vector, pET22b (Novagen, Madison, Wisconsin, USA), into the mobilizable broad host range vector, pBBR1MCS (Ref. 30). It harbors the combination of a T7 promoter (PT7) and a lac-operator (lacO), a strong ribosomal binding site (SD), and additionally allows the construction of in-frame fusions with the pelB signal sequence (ss) and a His-tag coding sequence. The chloramphenicol resistance gene (Cmr) and the elements needed for mobilization (MOB) and replication (REP) in gram-negative bacteria originating from pBBR1MCS are indicated as black arrows (Ref. 30).
suitable host strain for the overexpression of P. aeruginosa lipase (32). However, its use as an expression host for mutant lipases was limited because it also harbored the wild-type gene and therefore produced a significant background lipase activity. In order to exclude this effect, we have constructed different strains in which the chromosomal lipase gene was inactivated. P. aeruginosa PABST7.1 was based on mutant strain, P. aeruginosa PABS1, which carries a large deletion in the lipase operon covering about 600 bp of the lipase structural gene lipA and about 300 bp of the
Figure 3 Construction of the overexpression strain P. aeruginosa PAFRT7.7. The T7-expression cassette obtained from phagemid pEB1 (Ref. 29), harboring the T7-RNA polymerase gene and the gene encoding the Lac repressor, was cloned into the lipA gene of pLip3-S, a pBluescript derivative carrying the P. aeruginosa lipase operon. The resulting gene disruption construct was subcloned into the mobilizable suicide vector, pME3087 (kindly provided by Dieter Haas, University of Lausanne, Switzerland), giving pMELipT7, which was then inserted into the chromosome of the wild-type strain P. aeruginosa PAO1 by triparental conjugation and recombination, replacing the wild-type lipA gene. For the resulting strain P. aeruginosa PAFRT7.7, the allelic exchange was confirmed by Southern blotting and the lipase-deficient phenotype was determined by lipase activity assays and Western blotting.
foldase gene lif. However, in this strain, the expression cassette containing the IPTG-inducible T7-RNA polymerase gene was inserted into the chromosome at an unknown position by random integration of the phagemid pEB1. Therefore, strain P. aeruginosa PAFRT7.7 was constructed by site-specific integration of the expression cassette into the lipase operon (32). This strain is a lipase-negative mutant suitable for overexpression of lipase (Fig. 3) and can also be used for the background-free expression of mutant lipase genes (e.g., derived from libraries constructed by directed evolution experiments). Both P. aeruginosa strains yielded a lipase overexpression level that exceeded that of the P. aeruginosa wild type by at least five orders of magnitude. Standard T7 promoter-based overexpression protocols recommend inducing expression during the logarithmic growth phase. On the other hand, extracellular lipase is produced and secreted by wild-type P. aeruginosa only when the cells reach the stationary growth phase. Furthermore, it is known that the type II secretion machinery needed to transport lipase into the culture medium is subject to growth phase regulation, being expressed only at high cell densities (34). Therefore, we determined the optimal time point for the induction of T7-RNA polymerase-dependent lipase gene expression. The most efficient lipase production was obtained when T7-RNA polymerase expression was induced at the beginning of the stationary growth phase, followed by a lipase production phase of 24 h (unpublished results). Extracellular lipase production was increased to 150 mg/L culture supernatant without any further optimization of media and growth conditions (32).
5 OVEREXPRESSION USING HETEROLOGOUS BACTERIAL HOST STRAINS
In several countries including Germany, containment regulations place severe restrictions on the use of potentially pathogenic bacteria for large-scale biotechnological applications (e.g., industrial-scale fermentations and downstream processes). In order to circumvent these restrictions, it is desirable to use expression systems derived from nonpathogenic strains. Mainly for this reason, much effort has been put into attempts to overexpress Pseudomonas lipases in E. coli as a nonpathogenic host strain (35–38). However, due to the complexity of the folding and secretion pathway used by these lipases, these attempts resulted in the production of enzymatically inactive lipase protein, which subsequently had to be subjected to time-consuming in vitro refolding procedures to obtain active enzymes (37,38). More recently, the co-overexpression of different general cellular chaperones has become a general strategy to improve the folding capacity of heterologous expression strains,
Rosenau and Jaeger
thereby enhancing the production level of those proteins that tend to form insoluble aggregates or misfolded and inactive intermediates (39–44). The same concept has successfully been applied to optimize the folding process of heterologous proteins in the bacterial periplasm (45). However, the successful heterologous overexpression of a Pseudomonas lipase has not yet been reported. Recently, we have developed a heterologous lipase overexpression system that allows both the expression and the secretion of P. aeruginosa lipase into the culture supernatant of the nonpathogenic strain, Pseudomonas putida, which has been characterized as an extremely versatile bacterium with a high potential in bioremediation and biocontrol applications. In contrast to other pseudomonads, it is classified as a GRAS (generally regarded as safe) organism, which qualifies this strain as a production host for the large-scale industrial production of biocatalysts (46). Moreover, P. putida offers the practical advantage of having a lipase-negative phenotype. For overexpression, the P. aeruginosa lipase operon was cloned into plasmid pBBR1MCS to give pBBL7, which resulted in the production of several milligrams per liter of lipase protein upon induction with IPTG. However, periplasmic folding and Xcp-mediated secretion through the bacterial outer membrane were also needed as outlined above. There are good reasons to assume that at least some periplasmic foldases such as Dsb proteins, rotamases, or general chaperones are present in P. putida. Secretion across the outer membrane may also be a limiting step, as demonstrated in an elegant approach carried out to identify cellular factors that could increase extracellular lipase production by P. alcaligenes: by screening a genomic cosmid library, a plasmid that harbored the genes for a type II secretion machinery and greatly enhanced the production of lipase was identified (47). In P. putida, xcp homologous genes have also been identified (48,49). However, a functional Xcp secretion machinery has not been described. If it exists at all, it cannot mediate the secretion of heterologous proteins, as demonstrated for P. aeruginosa elastase (50) and lipase (F Rosenau and K-E Jaeger, unpublished results). In order to reconstitute a functional Xcp machinery, we identified the xcp gene cluster in a P. aeruginosa genomic cosmid library and cloned it in its entirety. Upon transformation of P. putida with both pXCP7 containing 12 P. aeruginosa xcp genes and pBBL7 harboring the lipase operon, P. aeruginosa lipase was efficiently produced and secreted by P. putida (Fig. 4). The level of secreted and enzymatically active lipase was comparable to that observed for P. aeruginosa wild type, thereby demonstrating that the P. aeruginosa Xcp machinery was also fully functional in the heterologous host, P. putida. This result suggests that P. putida can be developed as a heterologous nonpathogenic expression and secretion host for biocatalysts produced by related Pseudomonas and Burkholderia species.
Figure 4 P. putida as a nonpathogenic host for heterologous expression and secretion of P. aeruginosa lipase. The production of extracellular P. aeruginosa lipase in P. putida requires expression of both the lipase protein and the Xcp secretion machinery from P. aeruginosa. (A) Production of lipase was analyzed on agar plates containing tributyrin as the substrate. The formation of clear halos around the colonies indicates the secretion of enzymatically active lipase. Significant lipase production can only be observed in P. putida when the bacteria contain both the lipase operon on plasmid pBBL7 and the Xcp machinery encoded on plasmid pXCP7 originating from pLAFR3, which served as a control. (B) This result was confirmed by quantitative analysis of cell-free culture supernatants from P. putida grown in LB medium. Lipase activity was measured spectrophotometrically using p-nitrophenyl palmitate as the substrate and expressed as enzyme activity (µkat/mL) per unit of bacterial cell density, determined spectrophotometrically as the optical density at 580 nm (O.D. 580 nm). (C) Lipase protein was detected by Western immunoblotting using a lipase-specific antiserum.
6 CONCLUSION
A rapidly increasing number of genes encoding novel biocatalysts are currently being isolated. Environmental DNA (the so-called metagenome) is screened for functional open reading frames, new genes are detected by determination of the complete genome sequences of many prokaryotic and eukaryotic organisms, and modern protein engineering as well as directed evolution techniques generate large libraries consisting of billions of mutants derived from appropriate wild-type genes. However, to assess the functionality of all these novel genes, they have to be expressed in a way that results in the production of enzymatically active enzyme protein in amounts high enough to allow the determination of at least the most important
biochemical properties including enzymatic activity, substrate specificity, or enantioselectivity. Therefore, efficient systems are needed for the production and isolation of biocatalyst proteins from various sources. Recently, the genus Pseudomonas has emerged as a reasonable alternative to the well-known overexpression systems operating in E. coli. Several P. aeruginosa strains have been adapted to allow specific and effective overexpression from the T7 promoter driven by an inducible T7-RNA polymerase. In addition, efficient secretion of overexpressed enzymes into the bacterial culture supernatant is also possible as was demonstrated for P. aeruginosa lipase. Furthermore, heterologous overexpression and secretion in P. putida of a P. aeruginosa lipase were achieved by co-expression of both the biocatalyst gene itself and the corresponding homologous type II secretion machinery. Undoubtedly, Pseudomonas bacteria will belong to the tool box of novel and efficient biocatalyst overexpression systems that molecular biologists need to develop in the near future.
ACKNOWLEDGMENT
The work reported here was funded by the European Union project Nanofoldex.
REFERENCES
1. KE Jaeger, MT Reetz. Directed evolution of enantioselective enzymes for organic chemistry. Curr Opin Chem Biol 4:68–73, 2000.
2. IP Petrounia, FH Arnold. Designed evolution of enzymatic properties. Curr Opin Biotechnol 11:325–330, 2000.
3. UT Bornscheuer, M Pohl. Improved biocatalysts by directed evolution and rational protein design. Curr Opin Chem Biol 5:137–143, 2001.
4. ET Farinas, T Bulter, FH Arnold. Directed enzyme evolution. Curr Opin Biotechnol 12:545–551, 2001.
5. KE Jaeger, T Eggert, A Eipper, MT Reetz. Directed evolution and the creation of enantioselective biocatalysts. Appl Microbiol Biotechnol 55:519–530, 2001.
6. HP Schweizer. Vectors to express foreign genes and techniques to monitor gene expression in pseudomonads. Curr Opin Biotechnol 12:439–445, 2001.
7. A Liese, K Seelbach, C Wandrey. Industrial Biotransformations. Weinheim: Wiley-VCH, 2000.
8. KE Jaeger, T Eggert. Lipases for biotechnology. Curr Opin Biotechnol 13:390–397, 2002.
9. MT Reetz. Lipases as practical biocatalysts. Curr Opin Chem Biol 6:145–150, 2002.
10. KE Jaeger, K Liebeton, A Zonta, K Schimossek, MT Reetz. Biotechnological application of Pseudomonas aeruginosa lipase: efficient kinetic resolution of amine and alcohols. Appl Microbiol Biotechnol 46:99–105, 1996.
11. KE Jaeger, MT Reetz. Microbial lipases form versatile tools for biotechnology. Trends Biotechnol 16:396–403, 1998.
12. K Liebeton, A Zonta, K Schimossek, M Nardini, D Lang, BW Dijkstra, MT Reetz, KE Jaeger. Directed evolution of an enantioselective lipase. Chem Biol 7:709–718, 2000.
13. F Rosenau, KE Jaeger. Bacterial lipases from Pseudomonas: regulation of gene expression and mechanisms of secretion. Biochimie 82:1023–1032, 2000.
14. JL Arpigny, KE Jaeger. Bacterial lipolytic enzymes: classification and properties. Biochem J 343:177–183, 1999.
15. KE Jaeger, BW Dijkstra, MT Reetz. Bacterial biocatalysts: molecular biology, three-dimensional structures, and biotechnological applications of lipases. Annu Rev Microbiol 53:315–351, 1999.
16. W Stuer, KE Jaeger, UK Winkler. Purification of extracellular lipase from Pseudomonas aeruginosa. J Bacteriol 168:1070–1074, 1986.
17. EH Manting, AJ Driessen. Escherichia coli translocase: the unravelling of a molecular machine. Mol Microbiol 37:226–238, 2000.
18. JF Collet, JC Bardwell. Oxidative protein folding in bacteria. Mol Microbiol 44:1–8, 2002.
19. K Liebeton, A Zacharias, KE Jaeger. Disulfide bond in Pseudomonas aeruginosa lipase stabilizes the structure but is not required for interaction with its foldase. J Bacteriol 183:597–603, 2001.
20. A Urban, M Leipelt, T Eggert, KE Jaeger. DsbA and DsbC affect extracellular enzyme formation in Pseudomonas aeruginosa. J Bacteriol 183:587–596, 2001.
21. M Koster, W Bitter, J Tommassen. Protein secretion mechanisms in Gram-negative bacteria. Int J Med Microbiol 290:325–331, 2000.
22. M Sandkvist. Biology of type II secretion. Mol Microbiol 40:271–283, 2001.
23. S Tabor, CC Richardson. A bacteriophage T7 RNA polymerase/promoter system for controlled exclusive expression of specific genes. Proc Natl Acad Sci USA 82:1074–1078, 1985.
24. FW Studier, BA Moffatt. Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol 189:113–130, 1986.
25. C Schlieker, B Bukau, A Mogk. Prevention and reversion of protein aggregation by molecular chaperones in the E. coli cytosol: implications for their applicability in biotechnology. J Biotechnol 96:13–21, 2002.
26. BM Benton, WK Eng, JJ Dunn, FW Studier, R Sternglanz, PA Fisher. Signal-mediated import of bacteriophage T7 RNA polymerase into the Saccharomyces cerevisiae nucleus and specific transcription of target genes. Mol Cell Biol 10:353–360, 1990.
27. MW Lassner, A Jones, S Daubert, L Comai. Targeting of T7 RNA polymerase to tobacco nuclei mediated by an SV40 nuclear location signal. Plant Mol Biol 17:229–234, 1991.
28. A Lieber, U Kiessling, M Strauss. High level gene expression in mammalian cells by a nuclear T7-phase RNA polymerase. Nucleic Acids Res 17:8485–8493, 1989.
29. E Brunschwig, A Darzins. A two-component T7 system for the overexpression of genes in Pseudomonas aeruginosa. Gene 111:35–41, 1992.
30. ME Kovach, RW Phillips, PH Elzer, RM Roop II, KM Peterson. pBBR1MCS: a broad-host-range cloning vector. Biotechniques 16:800–802, 1994.
31. AA Watson, RA Alm, JS Mattick. Construction of improved vectors for protein production in Pseudomonas aeruginosa. Gene 172:163–164, 1996.
32. KE Jaeger, B Schneidinger, F Rosenau, M Werner, D Lang, BW Dijkstra, A Zonta, MT Reetz. Bacterial lipases for biotechnological applications. J Mol Catal B Enzym 3:3–12, 1997.
33. TT Hoang, AJ Kutchma, A Becher, HP Schweizer. Integration-proficient plasmids for Pseudomonas aeruginosa: site-specific integration and use for engineering of reporter and expression strains. Plasmid 43:59–72, 2000.
34. M Akrim, M Bally, G Ball, J Tommassen, H Teerink, A Filloux, A Lazdunski. Xcp-mediated protein secretion in Pseudomonas aeruginosa: identification of two additional genes and evidence for regulation of xcp gene expression. Mol Microbiol 10:431–443, 1993.
35. JL Aamand, AH Hobson, CM Buckley, ST Jorgensen, B Diderichsen, DJ McConnell. Chaperone-mediated activation in vivo of a Pseudomonas cepacia lipase. Mol Gen Genet 245:556–564, 1994.
36. F Ihara, I Okamoto, K Akao, T Nihira, Y Yamada. Lipase modulator protein (LimL) of Pseudomonas sp. strain 109. J Bacteriol 177:1254–1258, 1995.
37. DT Quyen, C Schmidt-Dannert, RD Schmid. High-level formation of active Pseudomonas cepacia lipase after heterologous expression of the encoding gene and its modified chaperone in Escherichia coli and rapid in vitro refolding. Appl Environ Microbiol 65:787–794, 1999.
38. PC Traub, C Schmidt-Dannert, J Schmitt, RD Schmid. Gene synthesis, expression in E. coli, and in vitro refolding of Pseudomonas sp. KWI 56 and Chromobacterium viscosum lipases and their chaperones. Appl Microbiol Biotechnol 55:198–204, 2001.
39. E Ailor, MJ Betenbaugh. Overexpression of a cytosolic chaperone to improve solubility and secretion of a recombinant IgG protein in insect cells. Biotechnol Bioeng 58:196–203, 1998.
40. K Nishihara, M Kanemori, M Kitagawa, H Yanagi, T Yura. Chaperone coexpression plasmids: differential and synergistic roles of DnaK–DnaJ–GrpE and GroEL–GroES in assisting folding of an allergen of Japanese cedar pollen, Cryj2, in Escherichia coli. Appl Environ Microbiol 64:1694–1699, 1998.
41. C Vonrhein, U Schmidt, GA Ziegler, S Schweiger, I Hanukoglu, GE Schulz. Chaperone-assisted expression of authentic bovine adrenodoxin reductase in Escherichia coli. FEBS Lett 443:167–169, 1999.
42. K Nishihara, M Kanemori, H Yanagi, T Yura. Overexpression of trigger factor prevents aggregation of recombinant proteins in Escherichia coli. Appl Environ Microbiol 66:884–889, 2000.
43. K Ikura, T Kokubu, S Natsuka, A Ichikawa, M Adachi, K Nishihara, H Yanagi, S Utsumi. Co-overexpression of folding modulators improves the solubility of the recombinant guinea pig liver transglutaminase expressed in Escherichia coli. Prep Biochem Biotechnol 32:189–205, 2002.
44. KH Lee, HS Kim, HS Jeong, YS Lee. Chaperonin GroESL mediates the protein folding of human liver mitochondrial aldehyde dehydrogenase in Escherichia coli. Biochem Biophys Res Commun 298:216–224, 2002.
45. Z Zhang, ZH Li, F Wang, M Fang, CC Yin, ZY Zhou, Q Lin, HL Huang. Overexpression of DsbC and DsbG markedly improves soluble and functional expression of single-chain Fv antibodies in Escherichia coli. Protein Expr Purif 26:218–228, 2002.
46. MC Ronchel, JL Ramos. Dual system to reinforce biological containment of recombinant bacteria designed for rhizoremediation. Appl Environ Microbiol 67:2649–2656, 2001.
47. G Gerritse, R Ure, F Bizoullier, WJ Quax. The phenotype enhancement method identifies the Xcp outer membrane secretion machinery from Pseudomonas alcaligenes as a bottleneck for lipase production. J Biotechnol 64:23–38, 1998.
48. A de Groot, JJ Krijger, A Filloux, J Tommassen. Characterization of type II protein secretion (xcp) genes in the plant growth-stimulating Pseudomonas putida, strain WCS358. Mol Gen Genet 250:491–504, 1996.
49. A de Groot, G Gerritse, J Tommassen, A Lazdunski, A Filloux. Molecular organization of the xcp gene cluster in Pseudomonas putida: absence of an xcpX (gspK) homologue. Gene 226:35–40, 1999.
50. P Braun, W Bitter, J Tommassen. Activation of Pseudomonas aeruginosa elastase in Pseudomonas putida by triggering dissociation of the propeptide–enzyme complex. Microbiology 146:2565–2572, 2000.
29
Analysis of Catalytic and Structural Stability of Native and Covalently Modified Enzymes
P.V. Sundaram and S. Srimathi
Centre for Protein Engineering and Biomedical Research, The Voluntary Health Services, Madras, India
1 INTRODUCTION
Although the demand for enzymes is largest in the industrial markets, new growth areas are emerging in many other fields of biotechnology. Novel uses for countless new enzymes have been a major source of growth in recent years. Industrial processes including drug design in the pharmaceutical industry, chiral synthesis, therapeutic applications, food processing, detergents, animal feeds, and fuel alcohol production are the areas that are bound to demand increased attention and investment in biotechnology. This demand has spurred new approaches to the large-scale production of stable enzymes. In addition, countless new proteins and enzymes are expected to appear in the public domain in the new era of proteomics. Techniques have been established for the large-scale production of enzymes whose structures have been changed by protein engineering techniques such as site-directed mutagenesis (SDM), gene shuffling, or directed evolution of proteins (1).
Sundaram and Srimathi
To achieve gross shifts in properties, most R&D efforts rely on protein engineering techniques for structure modification. In this chapter, we demonstrate that a chemical approach to structure modification of an already folded protein is a viable alternative method to produce stable enzymes. Unlike the genetic approach, it is also cost-effective. How does one measure enzyme stability? Catalytic stability (CS) is distinct from structural or thermodynamic stability (SS), and it is the former that is more relevant in practice: for most of those interested in applications, the retention of catalytic activity of the enzyme under varying conditions is the prime concern. Structural stability (SS) is of course an important property, but, as we see from our experience with a variety of enzymes, SS need not resemble CS in its response to temperature changes or to the denaturing effect of chaotropic agents. In other words, catalytic activity may very often be lost without any perceptible change in structure, and in some cases there may be noticeable changes in structure without much alteration in activity. Although the role of the naturally inherited structure of a protein may be understood in qualitative terms, it is not always clear what the quantitative contributions of the various forces are. Stability may depend on (1) the primary sequence as well as the tertiary structure of a fully folded protein, the latter, for example, producing S–S bonds and ionic interactions at critical places, and (2) factors independent of the sequence such as water structure, charge distribution on the protein, and the dipole moment of the medium.
For a long time, efforts to preserve catalytic activity have relied mainly on adding so-called cosolvents such as sugars, salts, or suitable buffer ions to enzyme solutions, or additives such as glycerol or ammonium sulfate. Similarly, monosaccharides and disaccharides are often added to lyophilized enzyme preparations. More recently, the concept of immobilizing enzymes on insoluble carriers to conserve activity emerged, followed by genetic engineering techniques that alter protein structures by changing primary sequences using substitutions. This last approach relies on intervention at the DNA level, commonly known as site-directed mutagenesis (SDM). It has been followed by gene shuffling and guided evolution, the latter now being considered the best approach for structural changes aimed at optimizing performance and stability. One of the primary objectives of enzyme engineering is to produce stable enzymes useful for various biotechnological applications including organic synthesis. The native enzymes can be structurally engineered to suppress their inactivation under different conditions. This can be achieved by mutagenesis, immobilization, or chemical modification. In our approach of
Analysis of Stability of Enzymes
stabilizing the enzymes by chemical modification, we concentrate on both the catalytic and the structural stabilities.
2 ANALYSIS OF CATALYTIC STABILITY
2.1 Thermal Inactivation
The catalytic stability of an enzyme is measured by progressive inactivation of the enzyme at various temperatures and pH. The half-life of inactivation, $t_{1/2}$, is obtained by exposing the enzyme to various temperatures for a definite period before measuring its activity at a temperature at which it is stable. The slope of the plot of log percentage residual activity against time gives the inactivation constant, $k_i$, at that temperature. The half-life is calculated using the following relationship.
$$t_{1/2} = \frac{\ln 2}{k_i} \tag{1}$$
The inactivation constant calculated at various temperatures can be used to make an Arrhenius plot of log $k_i$ against $1/T$ from which the energy of inactivation, $E_{ai}$, is calculated.
$$E_{ai} = -2.303\,R \times \text{slope} \tag{2}$$
The free energy ($\Delta G^{\ddagger}$), enthalpy ($\Delta H^{\ddagger}$), and entropy ($\Delta S^{\ddagger}$) associated with inactivation can be calculated using the following relationships
$$k_{cat} = \frac{k_B T}{h}\exp\left(-\frac{\Delta G^{\ddagger}}{RT}\right) \tag{3}$$
$$\Delta H^{\ddagger} = E_{ai} - RT \tag{4}$$
$$\Delta S^{\ddagger} = \frac{\Delta H^{\ddagger} - \Delta G^{\ddagger}}{T} \tag{5}$$
where $k_i$ is the first-order inactivation rate constant (h$^{-1}$), $k_B$ is Boltzmann's constant, and $h$ is Planck's constant.
2.2 Thermal Activation
In order to calculate the energies and entropies of activation, the use of the theory of absolute reaction rates is necessary. The influence of temperature on the rate constant depends on the equilibrium between the activated complex and the unactivated reactants. The rate of breakdown of these complexes is given by
$$k_{cat} = \frac{k_B T}{h}\,K \tag{6}$$
where $K$ is the equilibrium constant. Applying the thermodynamic equations to this equilibrium
$$\Delta G^{\ddagger} = -RT \ln K \tag{7}$$
Substituting Eq. (7) in Eq. (6)
$$k_{cat} = \frac{k_B T}{h}\exp\left(-\frac{\Delta G^{\ddagger}}{RT}\right) \tag{8}$$
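The substitution step can be written out explicitly. The sketch below also takes logarithms of Eq. (8), which gives a form in which the free energy of activation is computed directly from a measured $k_{cat}$. The numerical value of $\ln(k_B/h)$ assumes $k_B = 1.381 \times 10^{-23}$ J K$^{-1}$ and $h = 6.626 \times 10^{-34}$ J s, with $k_{cat}$ expressed in s$^{-1}$; these constants are standard values and are not quoted in the chapter itself.

```latex
\begin{align*}
\Delta G^{\ddagger} = -RT \ln K
  &\;\Rightarrow\; K = \exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right) \\
k_{cat} = \frac{k_B T}{h}\,K
  &\;\Rightarrow\; k_{cat} = \frac{k_B T}{h}\exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right) \\
\ln k_{cat} &= \ln\frac{k_B}{h} + \ln T - \frac{\Delta G^{\ddagger}}{RT} \\
\Delta G^{\ddagger} &= RT\left(\ln\frac{k_B}{h} + \ln T - \ln k_{cat}\right),
  \qquad \ln\frac{k_B}{h} \approx 23.76
\end{align*}
```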
The rate constant measured at various temperatures can be used to make an Arrhenius plot to get the activation energy, $E_a$. The slope of this plot (ln $k_{cat}$ against $1/T$) is equal to $-E_a/R$. The enthalpy and entropy of activation are calculated using Eqs. (4) and (5). Lonhienne et al. (2) have pointed out in their review that the contribution of $k_{cat}$ to the free energy should be taken into consideration while calculating $\Delta G^{\ddagger}$ in Eq. (9). According to them, the determination of $k_{cat}$ at various temperatures eliminates this error in the $\Delta G^{\ddagger}$ values estimated using Eq. (3).
$$\Delta G^{\ddagger} = RT\,(23.76 + \ln T - \ln k_{cat}) \tag{9}$$
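The calculations of Sections 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, R is taken as 8.314 J mol$^{-1}$ K$^{-1}$, the Arrhenius fit uses natural logarithms (slope = $-E_a/R$), and the temperature of 308 K is an assumed value within the 35°C to 40°C range used for porcine trypsin in this section. The two $k_{cat}$ inputs (182 and 1820 s$^{-1}$) are the trypsin values discussed in the text.

```python
import math

R = 8.314             # gas constant, J mol^-1 K^-1 (assumed value)
LN_KB_OVER_H = 23.76  # ln(k_B / h), the constant appearing in Eq. (9)


def half_life(k_i):
    """Eq. (1): half-life from a first-order inactivation constant."""
    return math.log(2) / k_i


def arrhenius_energy(temps_k, rate_constants):
    """Least-squares slope of ln k against 1/T; Ea = -R * slope (J/mol)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rate_constants]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -R * (sxy / sxx)


def delta_g(temp_k, k_cat):
    """Eq. (9): free energy of activation in J/mol (k_cat in s^-1)."""
    return R * temp_k * (LN_KB_OVER_H + math.log(temp_k) - math.log(k_cat))


def delta_h(e_a, temp_k):
    """Eq. (4): enthalpy of activation in J/mol."""
    return e_a - R * temp_k


def delta_s(d_h, d_g, temp_k):
    """Eq. (5): entropy of activation in J mol^-1 K^-1."""
    return (d_h - d_g) / temp_k


if __name__ == "__main__":
    T = 308.0  # K, assumed temperature
    for k_cat in (182.0, 1820.0):
        print(f"k_cat = {k_cat:7.1f} s^-1 -> dG = {delta_g(T, k_cat) / 1000:.2f} kJ/mol")
```

A 10-fold change in $k_{cat}$ shifts the result of Eq. (9) by $RT \ln 10$, which is about 5.9 kJ/mol at this temperature. The same Arrhenius fit applies to the inactivation constants $k_i$ of Section 2.1 when they are supplied in place of $k_{cat}$.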
In order to see how sensitive the $\Delta G^{\ddagger}$ values are to alterations in $k_{cat}$ values, we examined the reactions catalyzed by porcine trypsin using BAPNA as the substrate. In the temperature range 35°C to 40°C, $k_{cat}$ values were around 182 s$^{-1}$; on increasing this 10-fold, i.e., to 1820 s$^{-1}$, the $\Delta G^{\ddagger}$ value dropped from 62.18 to 56.29 kJ/mol, a clear 5.9 kJ/mol decrease. It is worth pointing out that in many cases, such as the porcine trypsin mentioned here, the $K_m$ values remain within the same order of magnitude and do not vary dramatically in the temperature range 35°C to 70°C.
3 ANALYSIS OF STRUCTURAL STABILITY
The primary sequence of the native proteins folds into a stable three-dimensional conformation called the native conformation. There is always a balance between the conformation assumed and catalytic activity achieved. There are specific regions in a protein which contribute to its structural and catalytic stabilities, and it may be often difficult to correlate one with the other. To know the effect of mutation or chemical modification, it is essential to know the conformational stability of native and denatured forms of wild type and mutant or native and modified protein. Since it is impossible to determine the absolute conformational stability, it is calculated as the difference in free energy for the native and the unfolded form relative to the transition state. Protein unfolding is normally induced by pH, temperature, or chaotropic agents such as urea and guanidine hydrochloride. The technique chosen will depend on the property to be measured to follow the unfolding process. UV difference spectroscopy, fluorescence spectroscopy, and circular dichroic spectroscopy are the most useful techniques. Unfolding curves obtained from these techniques can be analyzed further to get the essential information about the denaturation (4).
3.1 The Mathematics of Protein Unfolding
A typical unfolding curve can be divided into three regions, where y is the optical property measured to monitor the unfolding process: (i) the pretransition region, which shows the variation of y for the native, folded protein with denaturant concentration, pH, or temperature, measured as $y_N$; (ii) the transition region, where y varies as the protein unfolds; and (iii) the posttransition region, which shows the variation of y for the denatured, unfolded protein, measured as $y_D$. We use equilibrium methods to examine the effect of modification on the relative energies of the stable native and unfolded forms. Measuring the conformational stability requires the determination of the equilibrium constant for the denaturation process. It is assumed that there exist at least two stable states, the native, folded state (N) and the denatured, unfolded state (D).
$$N \rightleftharpoons D \tag{10}$$
Unfolding of many globular proteins has been found to approach a two-state folding mechanism as shown above. Determination of whether protein unfolding is a single-step transition between two states or a multistep conformational transition with intermediate states necessitates the use of high-precision instrumentation. The smooth single-step transition involves subtransitions, detection of which is purely dependent on the sensitivity of measurement. At any point during unfolding, only the folded and unfolded conformations are present at significant concentrations. It can also be said that even if there are intermediates involved, they can be classified as native-like and denatured-like intermediates. Therefore
$$f_N + f_D = 1 \tag{11}$$
where $f_N$ is the fraction of folded conformation and $f_D$ is the fraction unfolded. Values of y characteristic of the native state, $y_N$, and of the denatured state, $y_D$, can be obtained by the extrapolation of the pretransition and posttransition regions, respectively. For a two-state mechanism, since $f_N + f_D = 1$,
$$y = y_N f_N + y_D f_D \tag{12}$$
Combining Eqs. (11) and (12)
$$f_D = \frac{y - y_N}{y_D - y_N} \tag{13}$$
$$f_N = \frac{y_D - y}{y_D - y_N} \tag{14}$$
Thus the equilibrium constant, $K_D$, and the free energy of unfolding, $\Delta G_D$, can be calculated using the following relations
$$K_D = \frac{f_D}{f_N} = \frac{y - y_N}{y_D - y} \tag{15}$$
$$\Delta G_D = -RT \ln K_D \tag{16}$$
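Equations (13), (15), and (16) translate directly into code. The following is a minimal sketch, not a full analysis routine: the function names are hypothetical, the gas constant is taken in kcal mol$^{-1}$ K$^{-1}$ (an assumption chosen to match the kcal/mol units used later in this section), and the baseline values $y_N$ and $y_D$ are assumed to have already been extrapolated to the denaturant concentration of the data point.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1 (assumed units)


def fraction_unfolded(y, y_n, y_d):
    """Eq. (13): fraction unfolded from the signal and the two baselines."""
    return (y - y_n) / (y_d - y_n)


def unfolding_free_energy(y, y_n, y_d, temp_k=298.15):
    """Eqs. (15) and (16): K_D = f_D / f_N and dG_D = -RT ln K_D (kcal/mol)."""
    f_d = fraction_unfolded(y, y_n, y_d)
    if not 0.0 < f_d < 1.0:
        # K_D is undefined (or its logarithm diverges) outside the transition
        raise ValueError("signal lies outside the transition region")
    f_n = 1.0 - f_d  # Eq. (11)
    k_d = f_d / f_n
    return -R_KCAL * temp_k * math.log(k_d)
```

With a signal one quarter of the way from $y_N$ to $y_D$, $f_D$ is 0.25 and $K_D$ is 1/3, so $\Delta G_D$ is positive (the native state is favored). In practice only points well inside the transition region give reliable values, because $K_D$ becomes very sensitive to baseline error when $f_D$ is close to 0 or 1.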
Three methods are currently used to estimate $\Delta G_D^{H_2O}$, which is $\Delta G_D$ at zero concentration of denaturant. They are Tanford's model, the denaturant binding model, and the linear extrapolation model (LEM), discussed in detail by Pace (4). The LEM is the simplest and the most widely used method. It is based on the linear dependence of $\Delta G_D$ on the denaturant concentration in the transition region and assumes that this linearity continues to zero concentration of the denaturant. The data are fitted to the equation
$$\Delta G_D = \Delta G_D^{H_2O} - m\,[\text{denaturant}] \tag{17}$$
where m is the measure of the dependence of $\Delta G_D$ on denaturant concentration and is equal in magnitude to the slope of the plot of $\Delta G_D$ vs. [D]. Also, at the midpoint of the unfolding curve
$$[D]_{1/2} = \frac{\Delta G_D^{H_2O}}{m} \tag{18}$$
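The linear extrapolation of Eqs. (17) and (18) amounts to an ordinary least-squares line through the transition-region points. The sketch below assumes that $\Delta G_D$ has already been computed at each denaturant concentration; the function name and the sample data are hypothetical and serve only to show the arithmetic.

```python
def linear_extrapolation(denaturant, delta_g):
    """Fit dG_D = dG_H2O - m*[D]; return (dG_H2O, m, D_half)."""
    n = len(denaturant)
    x_mean = sum(denaturant) / n
    y_mean = sum(delta_g) / n
    sxx = sum((x - x_mean) ** 2 for x in denaturant)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(denaturant, delta_g))
    slope = sxy / sxx
    dg_h2o = y_mean - slope * x_mean  # intercept at zero denaturant
    m = -slope                        # Eq. (17) carries m with a minus sign
    d_half = dg_h2o / m               # Eq. (18)
    return dg_h2o, m, d_half


if __name__ == "__main__":
    # Hypothetical transition-region data: [D] in M, dG_D in kcal/mol
    conc = [2.0, 2.5, 3.0, 3.5, 4.0]
    dg = [2.0, 1.25, 0.5, -0.25, -1.0]
    dg_h2o, m, d_half = linear_extrapolation(conc, dg)
    print(f"dG_H2O = {dg_h2o:.2f} kcal/mol, m = {m:.2f} kcal/(mol M), [D]1/2 = {d_half:.2f} M")
```

Because the extrapolation to zero denaturant can be long, small errors in the slope m translate into larger errors in $\Delta G_D^{H_2O}$, whereas $[D]_{1/2}$ is usually determined more precisely.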
where $[D]_{1/2}$ is the denaturant concentration at which there is 50% unfolding, so that $\Delta G_D = 0$ at $[D]_{1/2}$. We are interested in determining the conformational stability of proteins with modified structures. Homologous series of proteins, mutants, and chemically modified proteins may be compared, as their structures may show minor variations. The difference in conformational stability, $\Delta(\Delta G)$, can be determined either from the $\Delta G_D^{H_2O}$ values of the proteins or from the $[D]_{1/2}$ and m values as follows:
$$\Delta(\Delta G) = \Delta[D]_{1/2} \times \frac{\sum m}{n}\ \text{kcal/mol} \tag{19}$$
where $\Delta[D]_{1/2}$ is the difference in $[D]_{1/2}$ values and $\sum m/n$ is the average value of m for n proteins.
3.2 Techniques Used to Follow Unfolding
It is well known that intrinsic properties of the protein structure and of its amino acid residues are exploited as probes to follow the unfolding path. The change in molar extinction coefficient upon unfolding can be monitored by UV difference spectroscopy, selecting 287 nm for tyrosine and 292 nm for tryptophan as the absorption maxima. When the intrinsic fluorescence of tyrosine and tryptophan is used to trace unfolding, the protein is excited and the emission spectra are recorded by a spectrofluorophotometer in the presence of the denaturant of interest. Excitation at 281 nm gives both tyrosine and tryptophan fluorescence. However, when the excitation wavelength is 295 nm, only tryptophan is excited. The emission maximum is observed between 345 and 350 nm when the tryptophan residues are solvent-exposed, and it occurs between 320 and 330 nm when they are in the interior of the globular protein. Both UV difference spectroscopy and fluorescence spectroscopy reflect the changes in the tertiary structure governed by the aromatic side chains. With circular dichroism (CD) spectroscopy, however, both secondary and tertiary structural changes can be monitored. The peptide backbone of globular proteins forms the basic secondary structure as a result of protein folding. The three major secondary structures, α-helix, β-sheet, and random coil, change with denaturant concentration. The far-UV CD (190–250 nm) shows the changes in secondary structure (5). The ellipticity changes at 222 and 208–210 nm show the changes in helix, and those between 216 and 218 nm estimate the changes in β-sheets. The unordered or random coil shows a strong negative band around 200 nm. The near-UV CD (250–300 nm) measures the tertiary structural changes characterized by aromatic side chains. Ellipticity changes at 270 and 296 nm show the changes in tertiary structure during unfolding. Noncoincidence of the transition curves obtained from the ellipticity changes in the far-UV (222 nm) and near-UV (270 and 296 nm) regions shows that at least one intermediate is involved. The results obtained by the various spectroscopic techniques can be analyzed using the same two-state model and equations discussed earlier. It is also possible to analyze the thermal denaturation of proteins by CD. When analyzed similarly, it gives an additional parameter, Tm, the melting temperature. At Tm, $\Delta G_D = 0$.
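The two-state analysis and linear extrapolation described above can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes flat native and unfolded baselines, the standard two-state relations (fraction unfolded from the signal, K = f/(1 − f), ΔG_D = −RT ln K), and a synthetic, noise-free curve. The function name `lem_fit`, the 10–90% transition window, and all numerical values are hypothetical choices made for the example.

```python
import numpy as np

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def lem_fit(denaturant_molar, signal, y_native, y_unfolded, temperature_k=298.15):
    """Return (dG_water, m, D_half) from a two-state unfolding curve.

    Assumes flat baselines y_native and y_unfolded (an illustrative
    simplification) and uses only points inside the transition region.
    """
    d = np.asarray(denaturant_molar, dtype=float)
    y = np.asarray(signal, dtype=float)
    # Fraction unfolded from the observed signal.
    f_unfolded = (y - y_native) / (y_unfolded - y_native)
    # Keep the transition region, where dG_D can be measured reliably.
    in_transition = (f_unfolded > 0.1) & (f_unfolded < 0.9)
    k_unfold = f_unfolded[in_transition] / (1.0 - f_unfolded[in_transition])
    dg = -R_KCAL * temperature_k * np.log(k_unfold)
    # Eq. (17): dG_D = dG_water - m [D], so the fitted slope is -m.
    slope, intercept = np.polyfit(d[in_transition], dg, 1)
    m = -slope
    # Eq. (18): [D]_1/2 = dG_water / m.
    return intercept, m, intercept / m


# Synthetic curve generated from hypothetical parameters.
true_dg_water, true_m = 5.0, 1.25            # kcal/mol and kcal/(mol M)
conc = np.linspace(0.0, 8.0, 81)             # denaturant concentration, M
dg_true = true_dg_water - true_m * conc
frac = 1.0 / (1.0 + np.exp(dg_true / (R_KCAL * 298.15)))
signal = 10.0 + frac * (2.0 - 10.0)          # arbitrary signal units

dg_water, m_value, d_half = lem_fit(conc, signal, 10.0, 2.0)
```

With real data the baselines usually slope with denaturant concentration and the points carry noise, so the extrapolated value is only as good as the baseline treatment and the linearity assumption.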
However, comparing thermal denaturation results can be difficult, because denaturation of a protein under identical conditions but monitored by different techniques, such as fluorescence or CD, need not show similar trends. For example, Tm values obtained through fluorescence increase for trypsins modified with a variety of modifiers, whereas the opposite trend is seen in the Tm values of the same samples when the thermal denaturation is followed by CD.

4 ALTERNATIVES IN STRUCTURE MODIFICATION AND THEIR RELATIVE MERITS
Protein engineering, including the various approaches mentioned earlier, has been the most popular approach used in improving and preserving the catalytic activity of enzymes. However, protein structures may also be modified by in vitro covalent modification using a variety of reagents (not necessarily composed of amino acids) of different physicochemical properties such as aldehydes, sugars, carbohydrate polymers, and PEG derivatives to mention a few.
4.1 General Approaches to In Vitro Modification
Several factors bear on the final strategy employed in the in vitro covalent modification of enzymes:
a) Molecular size and subunit structure of the enzyme.
b) Availability of the primary sequence information and crystal structure data of the enzyme, which is the ideal situation.
c) Knowledge of the interlysine and inter-COOH group distances calculated from the crystal data in (b), which is useful in selecting the modifier.
d) General information on the pH and temperature dependence, and any metal ion effects that deserve attention.
e) The molar ratio of the enzyme to modifier used during modification.
f) The choice of the enzyme to modifier molar ratio, which is likely to influence the production of a heterogeneous mixture of enzyme adducts having different degrees of modification.

4.2 Consequences of Modification
Chemical modification of an enzyme could induce changes in conformation which, in turn, may affect the catalytic activity and structural stability. Product yield, homogeneity or heterogeneity of the products formed, percentage of the targeted protein groups being modified, and percentage activity retained by the structurally modified protein are features that are of primary concern before further characterization of the products is undertaken. The extent to which the catalytic and structural parameters are altered depends on the degree of modification since the chemical modification procedures are group-specific and not site-specific, forming a heterogeneous mixture of modified enzymes. In this context, the term percentage modification can be misleading because the same percentage modification can be produced involving different set of target groups. Two situations are possible, i.e., (1) activity may be the same whereas the percentage of the groups modified may be different or (2) catalytic activity retained may vary though the enzyme may have been modified to the same extent. One may or may not see a trend in the degree of modification and the activity retained. A generic procedure has been developed to modify the structure of enzymes by in vitro covalent procedures (6–8). The approach takes into account the inherent properties of the various enzymes so that there is room to design the physicochemical properties of the modifier molecules, with the primary concern being that, while modifying the enzymes, their specificity,
and in broad terms their native properties, are retained while their thermotolerances and chemotolerances improve. Having achieved this, the aim is to characterize the various parameters connected with the efficient performance of the catalysts, including their thermotolerance and chemotolerance. This detailed analysis should indicate which parameters depend on the primary sequence of the enzymic protein and which are the sequence-independent factors that affect stability. In other words, the analysis should make clearer how complex and how predictable the molecular mechanism of stabilization is.

4.2.1 Effect of Product Heterogeneity
The number of specific groups modified in a protein during chemical modification can introduce a catalytic activity-independent heterogeneity. This kind of variation will be more pronounced when a relatively large polymeric modifier with multiple reactive sites is used. However, if all the reactive sites are not blocked simultaneously due to steric factors of the reactants (enzyme or polymeric modifier), the initial states of the available binding sites on the modifier molecule might influence the succeeding steps. The effective stoichiometry in reactions involving macromolecular reactants is subject to changes depending on (a) the conformation of the macromolecules, (b) the orientation of the molecules, and (c) the relative reactivities of the potential reactive sites, which may begin to vary once the initial sites have reacted. Heterogeneously modified enzymes separated by gel filtration chromatography show varying degrees of catalytic and structural stability. Chemical modification of proteins targeting specific amino acids can only be group-specific and not site-specific. For this reason, since a given protein has only a finite number of the targeted amino acids, it is possible to modify the protein to the same extent (% modification) but with a different set of amino acids, e.g., the ε-NH2 groups of lysine when one is targeting lysines. Under identical conditions, the extent of modification achieved may be reproducible, but the activities retained need not be equal and reproducible. In general, it has been found that high degrees of modification result in reduced activity of an otherwise more stable conformer. Under identical conditions, a product of predictable conformation can be obtained. Since solvents can also affect protein conformation, the choice of solvent composition during modification becomes another variable which can be exploited.

4.2.2 Michaelis Parameters and Their Significance
It is important to know what the catalytic efficiency (kcat/Km) and the specific activity of a modified enzyme are before more detailed characterization is carried out. Catalytic efficiency will also be affected by the molecular size of the substrates. Proteases are a good example to test this phenomenon, since these enzymes may be studied with small molecular weight synthetic substrates such as ATEE or BAPNA or with natural substrates such as casein or BSA. Deepthi et al. (9) demonstrated how steric limitations to large substrates affect the Km, kcat, and kcat/Km of several proteases such as chymotrypsin, trypsin, and papain. Similarly, in the case of α-amylases, steric problems surface when starch is used as a substrate instead of the synthetic substrate p-nitrophenyl maltopentoside.

5 MODEL ENZYMES STUDIED

5.1 Papain
An independent study of papain covalently modified with an oxidized sucrose polymer (OSP 400) of molecular weight 400 kDa by Rajalakshmi and Sundaram (6) was made. Among the three preparations bearing a molar ratio of enzyme to modifier of 16:1 (P1), 4:1 (P2), and 2:1 (P3), P3 showed a decrease of 20% in specific activity. Catalytic efficiency of P3 also decreased the most. Km values did not change, whereas kcat decreased with increase in OSP 400 in the molar ratio. Table 1 shows the results of covalently modified papain using oxidized sucrose polymer (OSP 400) at different molar ratios wherein it may be seen that Km values do not change much whereas kcat does. Preparations containing more of the modifying sucrose polymer show reduced catalytic activity, thus decreasing the kcat/Km value. Thus P3 showed the lowest catalytic efficiency, although its catalytic stability was the highest. Topt and T50 are the parameters that tell us about the heat resistance of the enzyme, and as may be seen in Table 2, both of these parameters show
Table 1 Specific Activity and Affinity Constants of Native and Modified Papain

| Papain form | Specific activity (μmol pNA/mg protein/h) | Km (mM) | kcat (min⁻¹) | kcat/Km (M⁻¹ s⁻¹) |
|---|---|---|---|---|
| Native | 1.76 ± 0.06 | 1.00 ± 0.02 | 3.82 ± 0.09 | 63.70 |
| P1 | 1.82 ± 0.08 | 0.94 ± 0.08 | 4.01 ± 0.07 | 71.06 |
| P2 | 1.70 ± 0.10 | 1.09 ± 0.06 | 3.52 ± 0.09 | 53.76 |
| P3 | 1.76 ± 0.06 | 1.00 ± 0.03 | 2.98 ± 0.10 | 49.70 |

P1, P2, and P3 refer to PS-papain, modified in ratios of 16:1, 4:1, and 2:1, respectively. The Km and kcat values were determined from Lineweaver–Burk plots. The values given are the mean ± standard deviation from at least three independent determinations.
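As a unit check on Table 1, the catalytic efficiency column can be recomputed from the tabulated kcat (min⁻¹) and Km (mM) values. The sketch below only performs the unit conversion to M⁻¹ s⁻¹; the function and variable names are illustrative and not from the source.

```python
def catalytic_efficiency(kcat_per_min, km_millimolar):
    """Return kcat/Km in M^-1 s^-1 from kcat in min^-1 and Km in mM."""
    kcat_per_s = kcat_per_min / 60.0      # min^-1 -> s^-1
    km_molar = km_millimolar * 1e-3       # mM -> M
    return kcat_per_s / km_molar


# (kcat in min^-1, Km in mM) as listed in Table 1.
table1 = {
    "Native": (3.82, 1.00),
    "P1": (4.01, 0.94),
    "P2": (3.52, 1.09),
    "P3": (2.98, 1.00),
}

# kcat/Km values as listed in Table 1, for comparison.
reported = {"Native": 63.70, "P1": 71.06, "P2": 53.76, "P3": 49.70}

efficiency = {name: catalytic_efficiency(kcat, km) for name, (kcat, km) in table1.items()}
for name, value in efficiency.items():
    print(f"{name}: computed {value:.2f}, reported {reported[name]:.2f} M^-1 s^-1")
```

The recomputed values agree with the tabulated column to within rounding, which confirms that the efficiency is driven by the fall in kcat while Km stays essentially constant.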
Table 2 Temperature Optima and Stability of Native and Modified Papain

| Papain type | Unheated Topt (°C) | Unheated Ea (kJ/mol) | Preheated Topt (°C) | Preheated Ea (kJ/mol) | T50 (°C) without urea | T50 (°C) with 8 M urea |
|---|---|---|---|---|---|---|
| Native | 62.6 ± 0.21 | 33.4 ± 0.38 | 59.8 ± 0.18 | 33.3 ± 0.18 | 68 | 29 |
| P1 | 72.4 ± 0.41 | 32.5 ± 0.32 | 66.0 ± 0.25 | 32.0 ± 0.27 | 75 | 53 |
| P2 | 71.2 ± 0.16 | 33.4 ± 0.22 | 70.6 ± 0.12 | 32.2 ± 0.19 | 74 | 56 |
| P3 | 73.6 ± 0.10 | 36.3 ± 0.25 | 71.2 ± 0.12 | 35.1 ± 0.22 | 78 | 60 |

Topt: optimum temperature for activity; Ea: energy of activation for thermal activation. All Topt and Ea values are given as mean ± SD from at least four independent determinations. T50 is the temperature where 50% of initial activity is retained; values represent the average of at least four independent assays, and standard errors in T50 values are not more than ±1°C.
an increase in value indicating improved thermotolerance after modification, with the best among the lot being the preparation P3. In contrast to Topt and T50, the activation energy Ea increases only slightly for the modified enzymes. This correlates with the fact that the specific activity of the enzyme (Table 1) is lowered after modification. Slopes of the lines obtained from plots of log % residual activity (RA) of papain against time at different temperatures (Fig. 1) give ki, the rate constant of the inactivation reaction. Plotting log ki against 1/T (K) yields Eai (Fig. 2), the activation energy of inactivation for native and modified papain, by using Eq. (2). Table 3 contains values of the stabilization factor (SF) of the three papain adducts P1, P2, and P3 in the temperature range 60°C to 90°C. For P1, the SF value remains around 2.5 from 60°C to 90°C, whereas for P2 and P3, maximum stabilization is obtained at 70°C. Table 4 contains the data on t1/2 and kinetic activation parameters of native and modified papains. The values for t1/2 are calculated from Eq. (1) (t1/2 = ln 2/k), and the interpretation of these values loses its significance when the half-life of a modified enzyme such as P3 is very large, especially at lower temperatures. A more adequate measure of stability is obtained from Eai, the activation energy of inactivation (see Eq. (2), which uses the Arrhenius equation). A larger value of Eai implies that more energy is required to inactivate the protein. The t1/2 values of both the native and the modified enzymes P1, P2, and P3 (Table 4) drop steeply with the increase in temperature above 60°C. The half-life of the OSP 400 papain preparations (P1 to P3) shows a vast improvement depending on the amount of OSP used. Thus the t1/2 values were
Figure 1 First-order plots of thermal inactivation of papain (A) native, (B) P1, (C) P2, and (D) P3 at 37°C (.), 50°C (o), 60°C (z), 70°C (n), 80°C (j), and 90°C (5). P1, P2, and P3 are modified preparations of papain.
2- to 20-fold higher at 60°C, 3- to 30-fold higher at 70°C, 4- to 8-fold higher at 80°C, and 2-fold higher at 90°C. The free energy of inactivation ΔG‡ and the corresponding enthalpy (ΔH‡) and entropy (ΔS‡) values calculated from Eqs. (3)–(5) are also found in Table 4. These values did not change markedly in the temperature range 60°C to 90°C, although, compared to the native enzyme, the adducts
Figure 2 Arrhenius plots of thermal inactivation for native (.) and modified papain P1 (j), P2 (z), and P3 (o). Slopes of the plots in Fig. 1 yield ki, the first-order inactivation rate constants.
showed larger changes in ΔG‡, ΔH‡, and ΔS‡. ΔG‡ for native papain ranged between 105.1 and 114 kJ/mol, its average enthalpy was 162 kJ/mol, and the entropy values were between 144.5 and 156.5 J mol⁻¹ K⁻¹. P1, which has the lowest content of OSP, showed a decrease of 17 J mol⁻¹ K⁻¹ in its ΔS‡ relative to the native enzyme and a decrease in ΔH‡ of 3.3 kJ/mol, although ΔG‡ was higher than that of the native enzyme by 2.1 to 3.1

Table 3 Temperature-Dependent Stabilization Factor (SF) of Papain Modified with OSP 400

| Papain type | 60°C | 70°C | 80°C | 90°C |
|---|---|---|---|---|
| P1 | 2.15 | 3.00 | 2.5 | 2.5 |
| P2 | 4.66 | 11.00 | 2.68 | 1.43 |
| P3 | 20.50 | 32.00 | 5.00 | 2.14 |

SF = t1/2 modified/t1/2 native. Molar ratios of enzyme to OSP 400 are 16:1 (P1), 4:1 (P2), and 2:1 (P3).
Table 4 Half-Life and Kinetic Activation Parameters for Thermal Inactivation of Native and Modified Papains

| Parameter | 60°C | 70°C | 80°C | 90°C |
|---|---|---|---|---|
| (A) Native papain | | | | |
| t1/2 (h) | 48.00 ± 1.1 | 6.00 ± 0.20 | 1.60 ± 0.06 | 0.28 ± 0.02 |
| ΔG‡ (kJ/mol) | 114.0 ± 0.20 | 110.1 ± 0.08 | 108.3 ± 0.02 | 105.1 ± 0.02 |
| ΔH‡ (kJ/mol) | 162.1 ± 0.22 | 162.1 ± 0.22 | 161.9 ± 0.22 | 161.9 ± 0.22 |
| ΔS‡ (J mol⁻¹ K⁻¹) | 144.5 ± 0.30 | 151.5 ± 0.25 | 151.8 ± 0.28 | 156.5 ± 0.28 |
| (B) Papain P1 | | | | |
| t1/2 (h) | 103.00 ± 3 | 18 ± 0.9 | 4 ± 0.1 | 0.7 ± 0.04 |
| Δt1/2 (a) | 55.00 ± 1.90 | 12 ± 0.70 | 2.4 ± 0.04 | 0.47 ± 0.02 |
| ΔG‡ (kJ/mol) | 116.1 ± 0.15 | 113.2 ± 0.13 | 111.1 ± 0.17 | 107.8 ± 0.21 |
| Δ(ΔG‡) (a) | 2.1 ± 0.05 | 3.1 ± 0.05 | 2.8 ± 0.15 | 2.7 ± 0.19 |
| ΔH‡ (kJ/mol) | 158.8 ± 0.21 | 158.7 ± 0.21 | 158.6 ± 0.21 | 158.5 ± 0.21 |
| ΔS‡ (J mol⁻¹ K⁻¹) | 128.2 ± 0.09 | 132.7 ± 0.12 | 134.6 ± 0.06 | 140.0 ± 0.0 |
| (C) Papain P2 | | | | |
| t1/2 (h) | 224 ± 2.5 | 66 ± 1.1 | 4.3 ± 0.2 | 0.4 ± 0.04 |
| Δt1/2 (a) | 176 ± 1.40 | 60 ± 0.90 | 2.7 ± 0.14 | 0.12 ± 0.00 |
| ΔG‡ (kJ/mol) | 118.2 ± 0.15 | 118.0 ± 0.18 | 111.3 ± 0.13 | 106.2 ± 0.23 |
| Δ(ΔG‡) (a) | 4.2 ± 0.05 | 7.9 ± 0.10 | 3.0 ± 0.11 | 1.1 ± 0.21 |
| ΔH‡ (kJ/mol) | 179.6 ± 0.24 | 179.5 ± 0.24 | 179.4 ± 0.24 | 179.3 ± 0.24 |
| ΔS‡ (J mol⁻¹ K⁻¹) | 184.1 ± 0.13 | 179.3 ± 0.09 | 192.9 ± 0.16 | 201.4 ± 0.15 |
| (D) Papain P3 | | | | |
| t1/2 (h) | 988 ± 4 | 192 ± 3 | 8 ± 0.6 | 0.6 ± 0.05 |
| Δt1/2 (a) | 940 ± 2.90 | 186 ± 2.80 | 6.4 ± 0.54 | 0.32 ± 0.02 |
| ΔG‡ (kJ/mol) | 122.3 ± 0.18 | 120.1 ± 0.17 | 113.1 ± 0.19 | 107.4 ± 0.27 |
| Δ(ΔG‡) (a) | 8.3 ± 0.02 | 10.0 ± 0.09 | 4.8 ± 0.17 | 2.3 ± 0.25 |
| ΔH‡ (kJ/mol) | 236.6 ± 0.27 | 240.4 ± 0.27 | 240.4 ± 0.27 | 240.3 ± 0.27 |
| ΔS‡ (J mol⁻¹ K⁻¹) | 342.6 ± 0.14 | 351.0 ± 0.15 | 360.6 ± 0.12 | 366.1 ± 0.0 |

(a) Difference between modified and native papain. Standard deviations represent deviations in values calculated from the maximum and minimum values of the first-order rate constants.
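The quantities in Tables 3 and 4 are linked by simple relations, which can be checked numerically. The sketch below uses the half-lives listed in Table 4 to recompute the stabilization factor (SF = t1/2 modified/t1/2 native), checks the native 60°C entry against ΔG‡ = ΔH‡ − TΔS‡, and estimates an Arrhenius activation energy of inactivation from ln k versus 1/T with k = ln 2/t1/2. The fit is an illustration under the assumption of simple first-order, single-phase inactivation; the chapter's own Eqs. (2)–(5) are not reproduced in this section, and the function names are hypothetical.

```python
import numpy as np

R = 8.314462618  # gas constant, J/(mol K)

temps_c = [60.0, 70.0, 80.0, 90.0]
# Half-lives in hours, as listed in Table 4.
half_life_h = {
    "native": [48.0, 6.0, 1.6, 0.28],
    "P1": [103.0, 18.0, 4.0, 0.7],
    "P2": [224.0, 66.0, 4.3, 0.4],
    "P3": [988.0, 192.0, 8.0, 0.6],
}


def stabilization_factors(name):
    """SF = t1/2(modified) / t1/2(native) at each temperature."""
    return [m / n for m, n in zip(half_life_h[name], half_life_h["native"])]


def gibbs_from_enthalpy_entropy(dh_kj, ds_j_per_k, temp_c):
    """dG = dH - T dS, returned in kJ/mol."""
    return dh_kj - (temp_c + 273.15) * ds_j_per_k / 1000.0


def arrhenius_ea_kj(half_lives_h, temperatures_c):
    """Activation energy of inactivation from the slope of ln k vs 1/T."""
    k = np.log(2.0) / np.asarray(half_lives_h, dtype=float)   # h^-1
    inv_t = 1.0 / (np.asarray(temperatures_c, dtype=float) + 273.15)
    slope, _intercept = np.polyfit(inv_t, np.log(k), 1)       # slope = -Ea/R
    return -slope * R / 1000.0


sf_p3 = stabilization_factors("P3")
dg_native_60 = gibbs_from_enthalpy_entropy(162.1, 144.5, 60.0)
ea_native = arrhenius_ea_kj(half_life_h["native"], temps_c)
```

The recomputed SF values for P3 reproduce the Table 3 row, and the native 60°C free energy comes out close to the tabulated 114.0 kJ/mol. The Arrhenius estimate rests on only four temperatures, so it should be read as an order-of-magnitude cross-check rather than a replacement for the published Eai values.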
kJ/mol. P2 and P3 showed an increase in ΔS‡, ΔH‡, and ΔG‡. Δ(ΔG‡) for all the modified papains was the highest at 70°C, which indicates maximum stability at this temperature.

5.1.1 Urea Denaturation
The urea denaturation pattern of papain before and after modification is rather unusual. The native enzyme and sample P1, which has the least OSP (E/M = 16:1) in relation to the enzymic protein, are inhibited by urea (0 to 8 M), whereas P2 and P3, which are made with enzyme to modifier molar ratios of 4:1 and 2:1, show an increase in activity to 120% and 170%, respectively, in the initial 4 h, after which a loss in activity was observed. This initial increase in activity is attributed to the active site becoming more flexible due to urea. Further rupture of the H-bonds leads to denaturation and a gradual loss of activity. t1/2 (time for 50% inactivation when exposed to urea) for the native enzyme was 33 min, which increased to 130 min for P1 due to this effect. t1/2 could not be calculated for P2 and P3 because of the initial activation observed.

5.2 Chymotrypsin
Similar to papain, bovine α-chymotrypsin was also modified with OSP 70, OSP 400, CMC (12 kDa), and dextran (73 and 250 kDa), all by the reductive alkylation procedure. Table 5 shows the data on residual activities of native and modified chymotrypsin after heat, urea, and SDS treatment. The pH optimum for this enzyme was virtually the same after modification. The Km value for the CMC adduct moves up from 0.49 to 0.66 mM. The kcat/Km was 122.44 M⁻¹ s⁻¹ for the native enzyme and 81.82 M⁻¹ s⁻¹ for the CMC adduct.

5.2.1 Activity Against BSA
Using BSA as natural substrate (molecular weight taken as 64 000, extinction coefficient after total digestion as 6000, and setting the native enzyme activity as 100%), the following were the percentage activities shown by the various modified chymotrypsin adducts: native (100%) > CMC-C (80%) > OSP 400-C (35%) > Dextran-C (28.9%) > OSP 70-C (26.2%). The kcat/Km value for CMC-C was the highest among the modified enzymes at 6.00 × 10⁴ M⁻¹ s⁻¹, as against 7.2 × 10⁴ M⁻¹ s⁻¹ for the native enzyme, while the other modified products ranged between 2.04 × 10⁴ and 2.2 × 10⁴ M⁻¹ s⁻¹. Topt values increased by only 0.5°C to 10°C from 56.5°C for the native chymotrypsin. Table 5 contains data on the residual activities (% RA) of chymotrypsin before and after modification, the extent of modification, T50 and
Table 5 Residual Activities of Native and Modified α-Chymotrypsin After Heat, Urea, and SDS Treatment

| Type of enzyme | Enzyme/polymer molar ratio | Modification (%) | % RA (a) | T50 (°C) (b) | U50 (M) (c) | % RA in SDS (d) |
|---|---|---|---|---|---|---|
| OSP 400-C | 1:0.5 (M1) | 58 | 64 | 64 | 8.6 | 100 |
| OSP 400-C | 1:0.5 (M2) | 80 | 55 | 63 | – | – |
| OSP 70-C | 1:0.5 (O1) | 60 | 75 | 60 | – | – |
| OSP 70-C | 1:1.0 (O2) | 67 | 68 | 61 | – | – |
| OSP 70-C | 1:2.0 (O3) | 75 | 65 | 61 | 7.9 | 100 |
| CMC-C | 1:333 (C1) | 82 | 80 | 58 | 4.5 | 45 |
| CMC-C | 1:500 (C2) | 90 | 77 | 57 | – | – |
| Dextran 250-C | 1:0.2 (D1) | 90 | 53 | 55 | 4.9 | 82 |
| Dextran 250-C | 1:0.5 (D2) | 98 | 50 | 54 | – | – |
| Native-C | 1:0.0 | 0.00 | 100 | 50 | 3.6 | 5.6 |

(a) % RA refers to the residual activity remaining after modification as compared with the native enzyme, whose activity is taken as 100%. (b) T50 is the temperature where 50% of the initial activity is retained. (c) U50 is the concentration of urea at which 50% of the initial activity is retained. (d) Incubation with 0.3% SDS for 15 min.
U50 values (temperature and urea concentration at which 50% of initial activity is retained), and % RA after SDS treatment. Modification of 58% to 98% was observed among the four modifiers used. It is worth noticing how the % modification affects % RA, T50 (°C), U50 (M), and the denaturing effect of SDS. The OSP 400-modified enzyme M1 yields a 58% modified adduct that is more active and stable in all respects than M2, which is 80% modified. A similar trend is seen with O1, O2, and O3, the three OSP 70-modified enzymes. An increase of 4°C to 14°C in the T50 values of the modified enzymes indicates their improved thermotolerance, with M1, the OSP 400-modified enzyme, being the best of the lot. Variation in the molar ratio of enzyme to polymer does not change T50 values noticeably, although other parameters appear to change. Table 6 contains t1/2, Eai, and other thermal inactivation parameters such as ΔG‡, ΔH‡, and ΔS‡ for native and variously modified chymotrypsins. Although the native and modified enzymes showed a steep decrease in t1/2 values with increasing temperature, the OSP 400-modified enzyme showed a 60- to 90-fold higher t1/2 than the native, whereas for the dextran-, CMC-, and OSP 70-modified enzymes, they were 8-, 32-, and 80-fold higher, respectively. Correspondingly, Eai values also rose, indicating stabilization. ΔG‡ values of the modified enzymes rise by 2.3 to 12.4 kJ/mol.
Table 6 Half-Life and Kinetic Activation Parameters for the First Phase (k1) of the Thermal Inactivation of Native and Modified Chymotrypsin

| α-Chymotrypsin (α-CT) | (t1/2)₁ (h) | (Eai)₁ (kJ/mol) | ΔG₁‡ (kJ/mol) | ΔH₁‡ (kJ/mol) | ΔS₁‡ (J mol⁻¹ K⁻¹) |
|---|---|---|---|---|---|
| Native α-CT | 5.5 ± 0.115 to 0.033 ± 0.001 | 183.3 ± 0.42 | 106.9 ± 0.213 to 97.6 ± 0.02 | 182.4 ± 0.255 to 182.3 ± 0.255 | 233.7 ± 0.514 to 250.52 ± 0.44 |
| OSP 400-α-CT (M1) | 362.0 ± 5.43 to 1.3 ± 0.037 | 338.0 ± 0.9 | 118.11 ± 0.15 to 108.1 ± 0.21 | 337.5 ± 0.5 to 336.74 ± 0.49 | 677 ± 0.54 to 676.46 ± 0.2 |
| OSP 70-α-CT (a) (O3) | 160 ± 1.92 to 0.9 ± 0.027 | 302 ± 0.7 | 115.08 ± 0.15 to 107 ± 0.32 | 300 ± 0.54 to 299.7 ± 0.54 | 581.2 ± 0.46 to 570 ± 0.42 |
| CMC-α-CT (C1) | 87.5 ± 1.31 to 0.4 ± 0.01 | 285.0 ± 0.6 | 114.32 ± 0.16 to 104.35 ± 0.22 | 284.44 ± 0.43 to 284.31 ± 0.43 | 527 ± 0.38 to 532.42 ± 0.43 |
| Dextran-α-CT (D1) | 9 ± 0.19 to 0.086 ± 0 | 227.6 ± 0.4 | 109.2 ± 0.19 to 100.3 ± 0.134 | 227 ± 0.32 to 226.9 ± 0.32 | 365 ± 0.36 to 375 ± 0.28 |

Experimental data collected at 50–65°C. The values are given as ± SD from triplicates. Ea is the activation energy of inactivation. It is obtained by plotting log ki, the first-order inactivation constant, against the reciprocal of temperature as per the Arrhenius equation; ki is obtained from the slopes of plots of log % residual activity against time in hours. The enzymes in assay buffer were incubated at various temperatures. At regular intervals, aliquots were removed to measure the residual activity, expressed relative to that of the unheated control.
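The way ki is read from a plot of log % residual activity against time can be illustrated with a short sketch. The time course below is synthetic and noise-free, generated from a hypothetical half-life of 5.5 h, and a single first-order phase is assumed; the table itself refers to the first phase of a multiphase inactivation, which would need the early time points to be fitted separately. Function names are illustrative.

```python
import numpy as np


def first_order_rate_constant(time_h, residual_activity_pct):
    """Return ki (h^-1) from the slope of log10(% residual activity) vs time."""
    slope, _intercept = np.polyfit(time_h, np.log10(residual_activity_pct), 1)
    # log10(RA) = log10(RA0) - (ki / ln 10) * t, so ki = -ln(10) * slope.
    return -np.log(10.0) * slope


def half_life(k_per_h):
    """t1/2 = ln 2 / k for a first-order process."""
    return np.log(2.0) / k_per_h


# Hypothetical time course: activity relative to the unheated control (100%).
k_true = np.log(2.0) / 5.5                      # h^-1, illustrative value only
time_h = np.array([0.0, 1.0, 2.0, 4.0, 6.0, 8.0])
activity_pct = 100.0 * np.exp(-k_true * time_h)

k_fit = first_order_rate_constant(time_h, activity_pct)
t_half = half_life(k_fit)
```

Repeating the fit at each incubation temperature gives the set of ki values that go into the Arrhenius plot.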
Similarly, it may be seen in Table 5 how U50, the concentration of urea at which 50% of the initial activity is retained, also increases noticeably in the case of OSP-modified chymotrypsin. Fluorescence spectral measurements of the native and modified enzymes at several temperatures between 30°C and 70°C revealed that the native chymotrypsin is denatured by heat at 60°C, resulting in the unfolding of the enzyme. This is borne out by a distinct redshift in the wavelength maximum and a corresponding decrease in the fluorescence intensity (Fig. 3A). Under similar conditions, an OSP 70-modified enzyme (preparation O3 in Table 5) showed a smaller redshift and a smaller loss of fluorescence intensity (Fig. 3B). These experiments show that the modified enzyme retains its conformational stability as well as its catalytic activity.

5.3 β-Glucosidase

β-Glucosidase from sweet almonds modified by conjugation with oxidized sucrose polymer (OSP), carboxymethylated sucrose polymer (CMOSP), or CM cellulose (CMC) showed the effect of structural changes mainly seen in
Figure 3 Fluorescence emission spectra of (A) native and (B) OSP 70-modified (O3 in Table 5) chymotrypsin. Enzyme samples were incubated for 1 h at 25°C (.) and 60°C (o) and their fluorescence emission spectra recorded.
improved thermotolerance with a 2- to 6-fold increase in t1/2 at 50°C (in a temperature range tested at 40°C to 70°C, Table 7). Free energy of thermoinactivation (ΔG‡) increased by 4.97 kJ/mol for the CMC-modified enzyme and 7.5 kJ/mol for a sucrose polymer-modified enzyme (9). CMC-modified β-glucosidase shows a 3.3-fold increase in thermostability (preparation C in Table 8A), whereas the OSP-modified enzyme enhances thermotolerance (preparation B in Table 8B) by 18.9 times. Ea, the activation energy of the modified enzymes, decreased from 4.89 kcal/mol for the native enzyme to 1.46 kcal/mol for the CMC-enzyme and to 4.31 and 3.64 kcal/mol for the OSP-enzyme and the CMOSP-enzyme, respectively. This indicates that the CMC-modified enzyme becomes the most efficient catalyst in this group. Table 7 contains the data on t1/2, ΔG‡, ΔH‡, and ΔS‡ of this modified β-glucosidase. The footnotes contain the details. Molar ratios of enzyme to modifier (E/M) may be varied to obtain optimum stabilization. Table 8A shows that the Δ(ΔG‡) value changes depending on the E/M ratio. OSP-modified β-glucosidase at an E/M of 1:0.89 produced the best results, with Δ(ΔG‡) varying between 7.5 and 5.89 kJ/mol in the temperature range 55°C to 70°C. Similarly, the stabilization factor (SF) is also the best for the same adduct (Table 8).

5.3.1 Stabilization in Nonaqueous Media
Most of the chemically modified enzymes that we have studied are found to tolerate high concentrations of polar solvents. This finding is considered very useful when enzymes are considered for use in the synthesis of esters, peptides, or carbohydrate polymers. Here we discuss the observations made on the effect of solvents on β-glucosidase activity. The effect of increased negative charges on the enzyme adduct, as in the case of ECMC or ECMOSP, could affect the polarity and the dipole moment of the enzyme, which might either stabilize or destabilize the enzyme in water-miscible solvents (8). The stabilities of the native enzyme (E) and the CMC-modified adduct (ECMC) in 60% v/v solvents such as acetone, CH3CN, dioxane, DMF, DMSO, and ethanol were compared (Fig. 4). ECMC was more stable than E in all cases except dioxane, in the following order: DMF > DMSO = CH3CN > acetone > ethanol, the actual order of their dipole moments in pure form being DMSO = CH3CN > DMF > acetone > ethanol. Pure dioxane has a dipole moment of zero. This suggests that DMF is ideal for the stabilization of ECMC, and a shift in either direction reduces the extent of stabilization. An important finding was that in all these solvents except dioxane, the modified enzyme was more stable than the native enzyme. EOSP and
Table 7 Half-Life (t1/2) and Kinetic Activation Parameters for Thermal Inactivation of Native and Modified β-Glucosidase

| Parameter | 303 K | 323 K | 333 K | 343 K |
|---|---|---|---|---|
| Native | | | | |
| t1/2 (h) | 56.9 | 12 | 4.25 | 0.48 |
| ΔG‡ (kJ/mol) | – | 108.7 | 109.15 | 106.69 |
| ΔH‡ (kJ/mol) | – | 142 | 142.92 | 142.83 |
| ΔS‡ (J mol⁻¹ K⁻¹) | – | 106.2 | 101.41 | 105.62 |
| ECMC | | | | |
| t1/2 (h) | ND | 74.5 | 6 | 1.03 |
| ΔG‡ (kJ/mol) | – | 113.67 | 110.43 | 109.1 |
| Δ(ΔG‡) (a) | – | 4.97 | 1.26 | 2.41 |
| ΔH‡ (kJ/mol) | – | 187.53 | 187.44 | 187.36 |
| ΔS‡ (J mol⁻¹ K⁻¹) | – | 229.1 | 231.26 | 228.16 |
| EPS | | | | |
| t1/2 (h) | 75.98 | 27.13 | 5.36 | 0.6 |
| ΔG‡ (kJ/mol) | – | 111.04 | 110.09 | 107.27 |
| Δ(ΔG‡) (a) | – | 2.34 | 0.94 | 0.56 |
| ΔH‡ (kJ/mol) | – | 166.27 | 166.19 | 166.1 |
| ΔS‡ (J mol⁻¹ K⁻¹) | – | 170.99 | 168.46 | 171.51 |
| ECMOSP | | | | |
| t1/2 (h) | 83.9 | 22.2 | 5.6 | 0.57 |
| ΔG‡ (kJ/mol) | – | 110.51 | 110.19 | 107.12 |
| Δ(ΔG‡) (a) | – | 1.81 | 1.04 | 0.43 |
| ΔH‡ (kJ/mol) | – | 166.27 | 166.19 | 166.10 |
| ΔS‡ (J mol⁻¹ K⁻¹) | – | 172.63 | 168.16 | 171.95 |

ND: not determined. The experiments were done in triplicate and rate measurement data varied within ±2%. The data presented are the averages of the triplicates. All the parameters, i.e., Eai, t1/2, ΔG‡, ΔH‡, and ΔS‡, are obtained from the ki values obtained in the temperature-dependent inhibition measurements using Eqs. (2)–(4). After regression analysis, the correlation coefficient of the data was found to range between 0.9837 and 0.9967. Ea for the native enzyme (E) was 4.89 kcal/mol, and for ECMC, EPS, and ECMOSP, the values were 1.46, 4.31, and 3.64 kcal/mol, respectively. Thus ΔEa (M–N) was 3.43, 0.58, and 1.25 kcal/mol for ECMC, EPS, and ECMOSP, respectively, where M and N (Ea M – Ea N) denote modified and native enzymes. The half-life (t1/2) of the enzyme at different temperatures was estimated by incubation at 30°C, 40°C, 50°C, 60°C, and 70°C. Aliquots were removed at definite time intervals over a period of 2 h and assayed for activity at 28°C. The molar ratio (enzyme/modifier) for making ECMC is 1:16, and for EPS and ECMOSP, the ratios are 1:0.5 and 1:1, respectively. (a) Difference in ΔG‡ between native and modified enzyme.
Table 8 (A) Standard Free Energy Changes Δ(ΔG‡) and (B) Stabilization Factor (SF) of Modified β-Glucosidase

| E:M ratio | Δ(ΔG‡) (kJ/mol) 55°C | 60°C | 65°C | 70°C | SF 55°C | 60°C | 65°C | 70°C |
|---|---|---|---|---|---|---|---|---|
| (A) CMC-β-glucosidase | | | | | | | | |
| 1:0.112 | 2.47 | 2.12 | 1.38 | 2.05 | 1.81 | 2.16 | 1.64 | 2.15 |
| 1:0.561 | 3.25 | 3.39 | 2.02 | 2.52 | 2.28 | 3.42 | 2.07 | 2.51 |
| 1:1.01 | 3.79 | 3.69 | 2.8 | 2.89 | 2.78 | 3.2 | 2.72 | 2.87 |
| (B) OSP-β-glucosidase | | | | | | | | |
| 1:0.112 | 4.46 | 2.76 | 3.62 | 3.31 | 5.12 | 2.69 | 4.04 | 3.21 |
| 1:0.89 | 7.5 | 6.07 | 4.44 | 5.89 | 18.3 | 10.8 | 5.09 | 9.6 |
| 1:1.6 | 6.07 | 4.69 | 4.33 | 4.36 | 11.25 | 6.58 | 5.65 | 5.7 |

ΔG‡ values are obtained using Eq. (2). M and N denote modified and native enzyme, respectively. Experiments were done in triplicate and averages were taken. Correlation coefficients after regression analysis for the values of Δ(ΔG‡) for the three CMC preparations were around 0.9941 to 0.9988 and were in the range 0.9537 to 0.9953 for the OSP β-glucosidase. SF is t1/2 M/t1/2 N.
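The two halves of Table 8 can be related to each other if the inactivation rate constant is assumed to follow transition-state (Eyring) theory, k = (k_B T/h) exp(−ΔG‡/RT). This is an assumption made here for illustration, since the chapter's own equations are not reproduced in this section. Under it, the pre-exponential factor cancels in the ratio of native to modified rate constants, so SF = t1/2 M/t1/2 N = exp(Δ(ΔG‡)/RT). The sketch below applies this to two entries of the table; the function name is hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def sf_from_ddg(ddg_kj_per_mol, temp_c):
    """Stabilization factor implied by ddG (kJ/mol) at temp_c, assuming Eyring kinetics."""
    temperature_k = temp_c + 273.15
    return math.exp(ddg_kj_per_mol * 1000.0 / (R * temperature_k))


# Entries taken from Table 8: (ddG in kJ/mol, temperature in deg C, tabulated SF).
cmc_low_ratio_60 = (2.12, 60.0, 2.16)    # CMC, E:M 1:0.112
osp_low_ratio_55 = (4.46, 55.0, 5.12)    # OSP, E:M 1:0.112

sf_cmc = sf_from_ddg(cmc_low_ratio_60[0], cmc_low_ratio_60[1])
sf_osp = sf_from_ddg(osp_low_ratio_55[0], osp_low_ratio_55[1])
```

These two entries agree with the tabulated SF values to within about 0.02. Other entries in the table deviate more from the relation, so it is best treated as a rough cross-check on the data rather than as an exact identity.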
ECMOSP were less stable than ECMC, although ECMOSP showed slightly better activity than EOSP in these solvents (Table 9).

5.4 Subtilisin A
Subtilisin A (or Subtilisin Carlsberg, isolated from Bacillus licheniformis and supplied by Novozymes A/S), an alkaline protease, did not show any improvement in stability upon cross-linking with monoglutaraldehyde (MGA). In fact, the enzyme became a poor catalyst, displaying lower values for Topt and t1/2 and a dramatic 4-fold increase in Ea. The reaction with MGA must have altered the structure drastically, such that the Km decreases dramatically from 0.282 mM for the native enzyme to 0.005 mM for the MGA-Subtilisin. kcat decreases from 7.76 to 0.137 s⁻¹ for the modified enzyme, although the net result is that the efficiency of the enzyme as a catalyst remains unaltered at 27.5 M⁻¹ s⁻¹. These data imply that the enzyme with the modified structure binds the substrate too strongly and does not release the product readily enough. However, modification with OSP 400 produced an enzyme that retained 85% of the original activity with casein as the substrate. This reduction in activity could be due to a steric hindrance created for the macromolecular substrate by the enzyme, which is already attached to OSP
Figure 4 Effect of water-miscible organic solvents on β-glucosidase activity. E and ECMC were incubated for 24 h in 60% (v/v) solvents at room temperature (28–30°C). An aliquot of enzyme was assayed for activity in buffer.
Table 9 Correlation Between Solvent Polarity and Stability of β-Glucosidase After Modification

| Solvent 60% (v/v) | Dipole moment of pure solvent, μ | Relative activity (modified/native): ECMC | ECMOSP | EPS |
|---|---|---|---|---|
| Dioxane | 0 | 0.8 | 0.56 | 0.59 |
| Ethanol | 1.69 | 1.0 | 0.78 | 0.69 |
| Acetone | 2.88 | 1.2 | 0.89 | 0.64 |
| DMF | 3.86 | 1.59 | 0.84 | 0.85 |
| DMSO | 3.90 | 1.35 | 1.2 | 1.1 |
| Acetonitrile | 3.92 | 1.33 | 1.88 | 1.4 |

% Residual activity was calculated by comparing the absolute activity in solvent with that without solvent after 24-h incubation at room temperature for each enzyme. The relative activity is a comparison of the residual activities of the modified enzyme with the native enzyme in each solvent.
Table 10 Stabilization Factor (SF) for OSP 400-Modified Subtilisin A

| Temperature (°C) | t1/2 (h), modified | t1/2 (h), native | SF |
|---|---|---|---|
| 50 | 17 | 3.2 | 5.3 |
| 55 | 10.6 | 0.57 | 18.6 |
| 60 | 5 | 0.47 | 10.6 |
400, a large polymer. Ea, the Arrhenius activation energy, decreases by 1.66 kJ/mol after modification, whereas Topt remained unchanged at 70°C and T50 increased from 53°C to 61°C. Maximum stabilization occurs at 55°C, with the stabilization factor SF reaching 18.6 (Table 10). The kinetic parameters do not change much in the temperature range 30°C to 70°C. After exposure to 1% SDS, nearly 50% activity is retained by OSP 400 subtilisin as against around 15% for the native enzyme.

6 DISCUSSION
In proteins, the primary sequence determines structure, and structure influences function. In enzymes, function implies catalytic activity, and how the latter may be persuaded to remain stable has been discussed in this review with some specific enzymes as examples. In general, protein structures depend on amino acid composition, and given a linear sequence, they would fold into structures that segregate hydrophobic amino acids, although this may not be valid in all cases. In extreme cases, proteins may not fold properly but tend to aggregate and form precipitates, which is why inclusion bodies are very often formed during production when scaling up is attempted.

6.1 Changes in the Kinetic Parameters and Their Significance
The estimation of parameters such as Topt, t1/2, U50, ΔGp, ΔHp, m, and Eai has been described; an increase in the values of all these parameters indicates that the catalytic stability of the enzyme has also increased. A decrease in Ea, the Arrhenius activation energy, implies that the enzyme has become a better catalyst.
6.2 Effect of Modification on Protein Structure
Altering structure by in vitro covalent methods could also lead to proteins with improved properties including catalytic performance and stability. However, there are elements that are a natural part of the in vitro approach
Sundaram and Srimathi
which arise from the fact that the modifiers used for changing the structures can be varied in their physicochemical properties. There are two ways in which protein modification manifests itself: (a) directly visible changes in structure, e.g., the change of one amino acid for another as in engineered proteins, or (b) in the in vitro method, the attachment of a modifier molecule, such as an acid or a carbohydrate, to an amino acid target on the protein, for example by reaction with a lysine ε-NH2 group. Dealing mainly with the in vitro method, the consequences that may be foreseen are the following.
6.2.1 Water Structure
One consequence is a change in water structure resulting from the cluster of OH groups present in the carbohydrate used as a modifier, e.g., a disaccharide or a polymer. This will “rigidify” the protein and stabilize it considerably.
6.2.2 Charges on the Protein
The charge on the protein can be neutralized, reversed, or increased considerably, leading to a change in the pI of the protein. Ultimately, the activity and stability pattern of the enzyme can change due to such structural changes.
6.2.3 Solvent Effects
The thermodynamics of water–alcohol (30:70) systems has been investigated at a molecular level using neutron diffraction with hydrogen isotope labeling by Maurel (10). Even in a seemingly simple system such as this, the H-bonding pattern appears quite complex. We draw attention to this phenomenon only to emphasize that the orientation of (a) the hydrophobic –CH3 head groups of methanol in a 70% aqueous mixture of the solvent, (b) the oxygens in ether, or (c) the CN in CH3CN, and so on, probably produces sizable effects on enzyme molecules when we try to study their function after a long exposure to predominantly organic media, which is important when enzyme-mediated synthesis is a concern. The H-bond network in a medium is characteristic of the composition of the medium, owing to the complex thermodynamics of aqueous mixtures of organic solvents. Given this situation, it is not difficult to see how different solvents influence a given enzyme in different ways. It is conceivable that a fine layer of the “solvent cage” orients itself on the enzyme molecule and may affect the enzyme functionally. For example, we have pointed out (Fig. 4) the role played by the dipole moment and the dielectric constant of the solvents in affecting the catalytic efficiency of β-glucosidase in our studies. When looking at the effect of solvents on the properties of an enzyme, one is conscious of the water structure in the medium. A carbohydrate-modified enzyme in an aqueous medium already strongly influences the water structure because the modifier introduces a sizable cloud of –OH groups, which in turn forms a network of H-bonds. The effect of organic solvents on such a system will be different from that found in a purely aqueous phase. Apart from the intramolecular and intermolecular H-bonds found in the protein molecules, one encounters H-bond formation by water molecules and the –OH groups of the carbohydrate in the medium.
6.2.4 Temperature Coefficient and Enzyme Stability
It must be pointed out that the catalytic rate is a combination of the thermal stability and the temperature coefficient, and that in the descending limb of the “temperature vs. enzyme activity” plot, denaturation becomes significant. Because of this, the denaturation effect increases as the assay time increases, and also with increasing temperature. The overall effect is that the real Topt, the optimum temperature for enzymatic activity, could shift to lower values as the ratio of (Einact) to (Eact) in the equilibrium mixture increases with increased exposure time and also at high temperatures, as suggested by Daniel et al. (3). In our experiments with covalently modified enzymes, which show greater stability, the t1/2 and Topt values increase noticeably. This implies that the onset of denaturation is delayed towards a higher temperature.
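The dependence of the apparent Topt on assay time can be illustrated with a deliberately simple model: an Arrhenius catalytic rate combined with irreversible first-order thermal inactivation. This is the classical two-process picture, not the equilibrium model of Daniel et al. (3), and every parameter value below is hypothetical, chosen only to make the effect visible.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

# Hypothetical parameters for illustration only (not taken from the chapter).
EA_CAT = 50_000.0     # activation energy of catalysis, J/mol
EA_INACT = 250_000.0  # activation energy of thermal inactivation, J/mol
T_REF = 333.15        # reference temperature (60 °C), K
K_INACT_REF = math.log(2) / 3600.0  # inactivation rate at T_REF (t1/2 = 1 h), s^-1


def arrhenius_ratio(ea, temp_k):
    """Rate constant at temp_k relative to its value at T_REF."""
    return math.exp(-ea / R * (1.0 / temp_k - 1.0 / T_REF))


def mean_rate(temp_k, assay_s):
    """Average rate over the assay, relative to the initial rate at T_REF."""
    k_cat = arrhenius_ratio(EA_CAT, temp_k)
    x = K_INACT_REF * arrhenius_ratio(EA_INACT, temp_k) * assay_s
    surviving = -math.expm1(-x) / x  # time-averaged fraction of active enzyme
    return k_cat * surviving


def apparent_topt_c(assay_s):
    """Temperature (°C) of maximal mean rate on a 0.5 °C grid from 20 to 90 °C."""
    temps_c = [20.0 + 0.5 * i for i in range(141)]
    return max(temps_c, key=lambda t: mean_rate(t + 273.15, assay_s))


for assay_s in (60, 600, 3600):
    print(f"assay {assay_s:>4d} s: apparent Topt = {apparent_topt_c(assay_s):.1f} °C")
```

In this sketch a longer assay gives inactivation more time to act, so the maximum of the activity curve moves to lower temperature. A stabilized enzyme (smaller `K_INACT_REF`) would show the opposite shift, which matches the direction of the Topt increase reported above for the modified enzymes.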
7 EPILOGUE
Anfinsen’s paradigm that the primary sequence of a protein defines its folded structure and consequently its properties is still accepted. Based on this, the primary sequence of enzymes is modified by protein engineering techniques to change or improve their properties. Introduction of disulfide bridges at critical points in the protein could also lead to stabilization, provided that the S–S bridge has the right stereochemistry (12). We have shown that a viable alternative to this expensive and labor-intensive procedure is an in vitro method for altering protein structures by covalent chemical methods. Methods to analyze the catalytic and structural stabilities have been discussed with choice examples of three proteases and a glycosidase which have been modified with a variety of modifier molecules. The results reinforce the contention that covalently produced structural changes often lead to enzymes possessing increased resistance to thermal and chemical stress, including a spectrum of solvent effects.
7.1 Influence of Carbohydrate Structure on Stabilization
Among the hydrophilic modifiers such as carbohydrates, it is in some instances not yet clear why one sugar, a disaccharide, or a polysaccharide is a better stabilizer than the others. We have tried coupling several disaccharides such as sucrose, maltose, lactose, or trehalose, after oxidation with periodate, to proteins like papain, chymotrypsin, and trypsin. Usually, sucrose has been found to be the best (Venkatesh et al., unpublished), with maltose the next best stabilizer. Although trehalose, which is used as an additive or a cosolvent, has been considered very efficient in maintaining the stability of enzymes, it is not a suitable modifier for covalent coupling. We have tried both periodate-oxidized and carboxymethylated trehalose in modifying subtilisin (unpublished). Among the carbohydrate polymers, dextran is a linear molecule, whereas the sucrose polymer used in our studies is synthetically made and is branched. The branched nature probably makes it a more efficient stabilizer. In conclusion, it may be summarized that:
1. Enzyme structure may be modified by protein engineering techniques or in vitro covalent procedures.
2. We have established several procedures to activate a variety of modifier molecules that will react with proteins, causing a permanent change in their structures.
3. Procedures optimized for assessing catalytic (functional) and structural stabilities, such as ΔG(H2O), have been used in our studies.
4. In making the protein more hydrophilic, its solubility increases, or if its hydrophobicity is increased, it may alter its behavior in nonaqueous solvents.
5. We have shown how disaccharides and natural and synthetic polysaccharides may be attached to proteins covalently.
6. Our studies showed that the physical addition of carbohydrates did not improve enzymatic behavior as compared with covalent coupling.
7. The variability in the bulk, as in the case of carbohydrate polymers added to the protein molecule, can contribute to the change in the microenvironment of the protein. This could offset some of the conformational stability parameters such as ΔG(H2O) and m, the midpoint of unfolding transition.
8. Our studies show that there is sufficient scope to synthesize new modifier molecules to further enlarge our approach to enzyme stabilization using the in vitro covalent coupling procedures.
9. It is also clear that catalysis in organic media may be made more facile using in vitro modification of enzymes.
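The free energy of unfolding in water mentioned in the summary is commonly estimated by the linear extrapolation method described by Pace (4): free energies measured in the transition region of a denaturation curve are fitted to a straight line and extrapolated to zero denaturant. The sketch below illustrates the idea on synthetic data. Whether the authors applied exactly this procedure is an assumption, the numerical values are hypothetical, and `m_slope` here denotes the slope of the line in Pace's sense, which may differ from how m is defined in this chapter.

```python
import math

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1
T = 298.15    # temperature, K


def fraction_unfolded(denaturant_m, dg_water, m_slope):
    """Two-state model: dG = dG(H2O) - m * [denaturant], K = exp(-dG / RT)."""
    dg = dg_water - m_slope * denaturant_m
    k_unfold = math.exp(-dg / (R * T))
    return k_unfold / (1.0 + k_unfold)


def fit_line(x, y):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


# Synthetic "measurements" generated from hypothetical true values.
TRUE_DG_WATER = 20.0  # kJ/mol
TRUE_M = 5.0          # kJ mol^-1 M^-1
urea_m = [3.0, 3.5, 4.0, 4.5, 5.0]  # transition region only
f_unfolded = [fraction_unfolded(c, TRUE_DG_WATER, TRUE_M) for c in urea_m]

# Convert each fraction back to a free energy and extrapolate to 0 M denaturant.
dg_values = [-R * T * math.log(f / (1.0 - f)) for f in f_unfolded]
intercept, slope = fit_line(urea_m, dg_values)
dg_water_est = intercept
m_est = -slope
midpoint_est = dg_water_est / m_est  # denaturant concentration where dG = 0

print(f"dG(H2O) = {dg_water_est:.2f} kJ/mol, m = {m_est:.2f} kJ/mol/M, "
      f"midpoint = {midpoint_est:.2f} M")
```

With real data only points inside the transition region, where both folded and unfolded forms are measurably populated, should enter the fit, because the logarithm becomes unreliable as the fraction unfolded approaches 0 or 1.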
REFERENCES
1. R Rudolph. Successful protein folding on an industrial scale. Protein Engineering Principles and Practice. New York: Wiley-Liss, 1996, pp 283–298.
2. T Lonhienne, C Gerday, G Feller. Psychrophilic enzymes: revisiting the thermodynamic parameters of activation may explain local flexibility. Biochim Biophys Acta 1543:1–10, 2000.
3. RM Daniel, MJ Danson, R Eisenthal. The temperature optima of enzymes: a new perspective on an old phenomenon. Trends Biochem Sci 26:223–225, 2001.
4. CN Pace. Determination and analysis of urea and guanidine hydrochloride denaturation curves. In: CHW Hirs, SN Timasheff, eds. Methods in Enzymology. Vol. 131. New York: Academic Press, 1986, pp 266–280.
5. JT Yang, C-SC Wu, HM Martinez. Calculation of protein conformation from circular dichroism. In: CHW Hirs, SN Timasheff, eds. Methods in Enzymology. Vol. 130. New York: Academic Press, 1986, pp 208–269.
6. N Rajalakshmi, PV Sundaram. Stability of native and modified papain. Protein Eng 8:1039–1049, 1995.
7. R Venkatesh, PV Sundaram. Modulation of stability properties of bovine trypsin after in vitro structural changes with a variety of chemical modifiers. Protein Eng 11:691–698, 1998.
8. PV Sundaram, R Venkatesh. Retardation of thermal and urea induced inactivation of α-chymotrypsin by modification with carbohydrate polymers. Protein Eng 11:699–705, 1998.
9. S Deepthi, R Venkatesh, PV Sundaram. Catalytic efficiency of covalently modified proteases against proteinaceous substrates. Ann NY Acad Sci 864:521–523, 1998.
10. P Maurel. Relevance of dielectric constant and solvent hydrophobicity to the organic solvent effect in enzymology. J Biol Chem 193:1677–1683, 1978.
11. L Subramaniam, PV Sundaram. Kinetics of thermal inactivation of β-glucosidase stabilized by covalent modification with soluble carbohydrate polymers. Submitted for publication.
12. A Fersht. Protein stability. Structure and Mechanism in Protein Science, A Guide to Enzyme Catalysis and Protein Folding. New York: W.H. Freeman and Company, 1999, pp 534–535.