Methods
in
Molecular Biology™
Series Editor John M. Walker School of Life Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK
For other titles published in this series, go to www.springer.com/series/7651
Chemogenomics Methods and Applications
Edited by
Edgar Jacoby Novartis Institute for Biomedical Research, Basel, Switzerland
ISSN 1064-3745 e-ISSN 1940-6029 ISBN 978-1-60761-273-5 e-ISBN 978-1-60761-274-2 DOI 10.1007/978-1-60761-274-2 Springer Dordrecht Heidelberg London New York Library of Congress Control Number: 2009932720 © Humana Press, a part of Springer Science+Business Media, LLC 2009 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Cover illustration: Background art derived from Figure 3 in Chapter 11 Printed on acid-free paper Humana Press is part of Springer Science+Business Media (www.springer.com)
Preface
Chemogenomics aims toward the systematic identification of small molecules that interact with the products of the genome and modulate their biological function. The establishment, analysis, prediction, and expansion of a comprehensive ligand–target SAR (structure–activity relationship) matrix has followed the elucidation of the human genome and presents a key scientific challenge for the twenty-first century. The annotation and knowledge-based exploration of the ligand–target SAR matrix is then expected to impact science greatly. Progress alongside this challenge without a doubt will contribute to further the fundamental understanding of the biological function of the individual proteins and ultimately provide a basis for the discovery of new and better therapies for diseases. While historically the chemogenomics approach is based on efforts that systematically explore target gene families, today broader in vitro and in silico approaches are available to encompass wider genomes.
In this book, experts from academia and industry outline relevant aspects of chemistry, biology, and molecular informatics which are the cornerstones of chemogenomics. General introductory chapters are combined with chapters describing methods and protocols, which are the gold standard of the Methods in Molecular Biology book series. In Chapter 1, Dr. Hans-Peter Nestler from Sanofi-Aventis outlines the concept and advantages of organizing drug discovery in target families. In Chapter 2, Dr. Konstantin Balakin from ChemDiv describes the methods and application of target family-oriented compound library design. In Chapters 3 and 4, Dr. Dirksen Bussiere from Novartis and Prof. Andrea Mozzarelli from the University of Parma, respectively, outline the concepts of drug discovery targeting the purinome and cofactor binding sites in general. In Chapter 5, Prof. 
Garland Marshall from the Washington University School of Medicine outlines chemogenomics with peptide secondary structure mimetics. In Chapter 6, Dr. Sarma Jagarlapudi from GVK Biosciences describes molecular information systems for knowledge-based discovery. In Chapter 7, our group demonstrates knowledge-based virtual screening approaches with the application to the P53-MDM4 protein–protein interaction. In Chapters 8 and 9, Dr. Michael Keiser from the University of California at San Francisco and Dr. Josef Scheiber from Novartis, respectively, describe the analysis and prediction of the SAR matrix of the comprehensive chemogenomics knowledge space and of the safety profiling knowledge space. In Chapter 10, Dr. Richard Brennan shows the mining of biological pathway data with emphasis on compound protein networks. In Chapters 11 and 12, Prof. Ruben Abagyan from the Scripps Institute and Dr. Bernard Pirard from Novartis, respectively, describe the pocketome engine and molecular interaction field approaches which aim to analyze the target binding sites and the binding modes of the ligands. Finally, in Chapter 13, Dr. Ulrich Schopfer from Novartis describes the logistics of compound management and screening in action, which are essential elements for the experimental generation of the SAR data. While there are many obvious advantages to the herein described approaches, like the emphasis on systematization, a key limitation to the success of chemogenomics approaches is
recognized by the fact that drug discovery is not a totally structured and predictable science. Diversity and serendipity, both in chemistry and in biology, are drivers of many important discoveries. Conversely, the chemogenomics approaches described in this book are today part of the portfolio of the major pharmaceutical companies, and important contributions to drug discovery have been reached in the design of protein kinase inhibitors. All chapter authors are very much acknowledged for their excellent scientific contributions and their willingness to share their insights and strategic viewpoints on chemogenomics which make this book especially interesting to read. I also thank Prof. John M. Walker, the Series Editor, and David Casey from Humana Press, for the invitation to edit this book and for their commitment to dealing with the production work. I am convinced that the content of this book will help to broaden the practical application of chemogenomics concepts in drug discovery, and I hope that you, the reader, will find the book both informative and enjoyable. Basel, Switzerland
Edgar Jacoby
Contents
Preface
Contributors
1 Organizing Bioactive Compound Discovery in Target Families
H. Peter Nestler
2 Compound Library Design for Target Families
K.V. Balakin, Y.A. Ivanenkov, and N.P. Savchuk
3 Targeting the Purinome
Jeremy M. Murray and Dirksen E. Bussiere
4 Cofactor Chemogenomics
Ratna Singh and Andrea Mozzarelli
5 Chemogenomics with Protein Secondary-Structure Mimetics
Garland R. Marshall, Daniel J. Kuster, and Ye Che
6 Database Systems for Knowledge-Based Discovery
Sarma A.R.P. Jagarlapudi and K.V. Radha Kishan
7 Knowledge-Based Virtual Screening: Application to the MDM4/p53 Protein–Protein Interaction
Edgar Jacoby, Andreas Boettcher, Lorenz M. Mayr, Nathan Brown, Jeremy L. Jenkins, Joerg Kallen, Caroline Engeloch, Ulrich Schopfer, Pascal Furet, Keiichi Masuya, and Joanna Lisztwan
8 Off-Target Networks Derived from Ligand Set Similarity
Michael J. Keiser and Jérôme Hert
9 Chemogenomic Analysis of Safety Profiling Data
Josef Scheiber and Jeremy L. Jenkins
10 Network and Pathway Analysis of Compound–Protein Interactions
Richard J. Brennan, Tatiana Nikolskya, and Svetlana Bureeva
11 The Flexible Pocketome Engine for Structural Chemogenomics
Ruben Abagyan and Irina Kufareva
12 Structure-Based Chemogenomics: Analysis of Protein Family Landscapes
Bernard Pirard
13 Hypothesis-Driven Screening
Ulrich Schopfer, Caroline Engeloch, Frank Höhn, Hervé Mees, Jennifer Leeds, Fraser Glickman, Günther Scheel, Sandrine Ferrand, Peter Fekkes, and Martin Pfeifer
Index
Contributors
Ruben Abagyan • Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
K.V. Balakin • ChemDiv, Inc., San Diego, CA, USA; Institute of Physiologically Active Compounds, Russian Academy of Sciences, Chernogolovka, Noginsk Area, Moscow Region, Russia
Andreas Boettcher • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Richard J. Brennan • GeneGo Inc., Encinitas, CA, USA
Nathan Brown • The Institute of Cancer Research, Royal Cancer Hospital, Belmont, Sutton, Surrey, UK
Svetlana Bureeva • GeneGo Inc., Encinitas, CA, USA
Dirksen E. Bussiere • Global Discovery Chemistry, Novartis Institutes for BioMedical Research, Emeryville, CA, USA
Ye Che • Exploratory Medicinal Sciences, Pfizer Global Research & Development, Groton, CT, USA
Caroline Engeloch • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Peter Fekkes • Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
Sandrine Ferrand • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Pascal Furet • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Fraser Glickman • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Jérôme Hert • Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
Frank Höhn • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Y.A. Ivanenkov • ChemDiv, Inc., San Diego, CA, USA; Chemical Diversity Research Institute (IIHR), Khimki, Moscow Region, Russia
Edgar Jacoby • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Sarma A.R.P. Jagarlapudi • GVK Biosciences Pvt. Ltd. S-1, Phase-1, TIE, Balanagar, Hyderabad, India
Jeremy L. Jenkins • Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
Joerg Kallen • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Michael J. Keiser • Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
K.V. Radha Kishan • GVK Biosciences Pvt. Ltd. S-1, Phase-1, TIE, Balanagar, Hyderabad, India
Irina Kufareva • Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
Daniel J. Kuster • Biomedical Engineering, Washington University, St. Louis, MO, USA
Jennifer Leeds • Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
Joanna Lisztwan • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Garland R. Marshall • Departments of Biochemistry and Molecular Biophysics and Biomedical Engineering, Washington University, St. Louis, MO, USA
Keiichi Masuya • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Lorenz M. Mayr • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Hervé Mees • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Andrea Mozzarelli • Department of Biochemistry and Molecular Biology, University of Parma, Parma, Italy; Italian Institute of Biostructures and Biosystems, Parma, Italy
Jeremy M. Murray • Department of Protein Engineering, Genentech, Inc., South San Francisco, CA, USA
H. Peter Nestler • Sanofi-Aventis Combinatorial Technologies Center, Tucson, AZ, USA
Martin Pfeifer • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Bernard Pirard • Computer-Aided Drug Discovery, Global Discovery Chemistry, Novartis Institute for Biomedical Research, Basel, Switzerland
N.P. Savchuk • ChemDiv, Inc., San Diego, CA, USA
Günther Scheel • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Josef Scheiber • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Ulrich Schopfer • Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
Ratna Singh • Department of Biochemistry and Molecular Biology, University of Parma, Parma, Italy; Italian Institute of Biostructures and Biosystems, Parma, Italy
Chapter 1
Organizing Bioactive Compound Discovery in Target Families
H. Peter Nestler
Summary
The sequencing of genomes gave access to the complete set of building blocks for organisms of various species. A plethora of “-omics” technologies has been developed to investigate the dynamic interactions of the building blocks in order to understand the functioning of living organisms. This has given rise to the clustering of proteins into target families based on phylogenetic and structural similarities. In this chapter we will discuss how the concept of target families enables the investigation and modulation of biochemical function in the quest to chart Chemical and Biological Spaces.
Key words: Chemical space, Drug discovery technologies, Privileged motifs, Target families, Fragment screening
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575, DOI 10.1007/978-1-60761-274-2_1, © Humana Press, a part of Springer Science + Business Media, LLC 2009
1. Introduction
Over the last three decades, pharmaceutical compound discovery has undergone a dramatic evolution with several paradigm shifts. Originally, drug discovery and development were based on pharmacological observation for a small number of compounds. While this process was slow and tedious and limited the number of substances that could be evaluated, the relevance of the studies allowed a fast transfer of efficacious compounds to the clinic. The increased understanding of molecular biology and the access to purified proteins triggered the paradigm of target-based drug discovery. The approach took biological reductionism to the extreme in stipulating that the modulation of individual protein activities would allow controlling physiological events and thus exerting a therapeutic effect. The focus on individual proteins
simplified the initial testing of compounds and allowed the high-throughput screening of vast compound collections. It was soon realized that the success rate per tested compound was lower than in the original paradigm, as these high-throughput screening assays are highly abstracted from the physiological setting. To feed the screening machinery and to respond to increased crowding of intellectual property space, chemists devised combinatorial synthesis methods to quickly populate diverse chemical spaces. Even though the size of the screening collections grew quickly, the number of compounds entering clinical development stagnated. It was soon discovered that the majority of chemical structures present in the screening collections were incompatible with the requirements for successful physiological applications. Studies focused on the property analysis of successful drugs to define the “drug-like” chemical space and thus guide library and compound design (1, 2).
At the beginning of the millennium, the completion of the sequencing of the human genome constituted a landmark achievement in biological sciences (3). For the first time the complete set of protein building blocks for the human organism was available for pharmacological studies. Early analysis focused on establishing the number of druggable proteins, based on the correlation of binding sites with the guidelines for drug-like chemical space (4–6). While the actual number of druggable targets may not be clearly defined, with estimates ranging from 1,500 to 10,000 out of 30,000 human genes, the studies revealed the clustering of targets within various protein families (7–9). These insights triggered the concept of organizing drug discovery along the boundaries of these protein families to find chemical structures that address the structural space offered by biology on a more rational basis. 
From a managerial standpoint, corporations chose various models for implementation ranging from virtual knowledge networks to dedicated departments to improve the efficiency of their drug discovery efforts (10, 11). In this chapter, the scientific concepts of target-family-oriented drug discovery, also called Chemogenomics, will be discussed and exemplified, focusing more on their general technological applicability than on individual chemical structures.
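The notion of a “drug-like” chemical space mentioned in this introduction is often made operational with simple property filters. The following is a minimal sketch in Python, assuming Lipinski's rule of five as the filter (the chapter itself does not name a specific rule) and assuming that the descriptors have already been computed elsewhere. The `Compound` class, its field names, the tolerance of one violation, and the example values are hypothetical and serve illustration only.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record holding precomputed descriptors."""
    name: str
    mol_weight: float       # g/mol
    logp: float             # calculated octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(c: Compound, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most one violation."""
    return rule_of_five_violations(c) <= max_violations


# Invented example library, not real compounds.
library = [
    Compound("cmpd_A", 320.4, 2.1, 2, 5),
    Compound("cmpd_B", 812.9, 6.3, 7, 14),
]
drug_like = [c.name for c in library if is_drug_like(c)]
```

Such a filter only narrows a collection to a property range; it says nothing about activity on a given target family.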
2. Understanding Biological Space
After the sequencing of the human genome, many questions started to be addressed: Which proteins cluster into families and are related on a structural level to each other? Which genes are expressed under which physiological setting and how do their
levels respond to challenges of the system? Are the expressed proteins functionally active, on which ligands do they exert their actions, and is their functioning dependent on their subcellular distribution? Having the blueprint of human physiology available allowed using the repertoire of bioinformatic tools to classify and cluster proteins based on their sequence homology (3). Although this comfortable straightforward picture is complicated by the fact that genes can be expressed in various forms, the target family classifications hold up in a first approximation especially if searching for structural information that helps to address the binding sites crucial for the protein activity. In the initial phase of mining the human proteome for new targets, the bioinformatic efforts took a rather static look at the genomic information. The analysis was based on sequence and structural homology, thereby ignoring the dynamics of biological systems. Proteins do not act as isolated entities but as complexes in an almost overcrowded environment. To exert their biological effect, the individual entities enter dynamic physical interactions with each other, and our textbook knowledge about kinetics and thermodynamics does not necessarily stand up to the task because of the high concentration and high viscosity of the cytoplasmic space. Furthermore, the monitoring of gene expression and protein analysis does not reveal the complete picture about their respective binding partners. Finally, protein actions are coupled and networked with each other in pathways that can either lead to an avalanche-like amplification of signals or feedback-triggered attenuation. One of the prototype cascades is the coagulation of blood, where one protease cleaves multiple molecules of the next downstream protease thus leading to an exponential increase in proteolytic potential and fibrin formation. “Systems Biology” makes an attempt at investigating and modeling these protein pathways and networks. 
Nonetheless, we are still ignorant about the intricate details of many protein/protein and protein/ligand complexes, such as GPCR-agonist, ion-channel-modulator, or protease-substrate pairs, that associate and dissociate in a cell and are responsible for biological activity. Even in cases where we know the respective binding partners, we are a long way from understanding the structural basis and dynamics of these interactions. Structural biology methods such as crystallography and NMR have taught us a lot about soluble proteins such as kinases and proteases, but gaining structural insights into membrane-bound proteins such as GPCRs and ion channels remains difficult, and the first structures in these protein families could only be determined at the turn of this millennium (12–14). Many of the pathway-related challenges are addressed by “Systems Biology,” a concerted effort of informatics, chemistry, and biology. Although the dynamic aspects are crucial for the understanding of biological function, we will focus in this section on the approaches
to identify physiological and artificial ligands for proteins and to gain structural knowledge about their interactions within the target families.
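The grouping of proteins into families by sequence homology, as described in this section, can be illustrated with a toy example. The sketch below assumes pre-aligned sequences of equal length, uses plain percent identity in place of a proper alignment score, and merges sequences by single linkage above a threshold. The sequences, their names, and the 60% cut-off are invented for illustration and do not correspond to real proteins or to the tools used in the cited work.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_by_identity(seqs, threshold):
    """Single-linkage clustering: link any pair at or above the threshold."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Invented 12-residue fragments, hypothetical names.
toy_sequences = {
    "kinase_1": "GEGSFGKVYLAR",
    "kinase_2": "GEGSFGRVYLAK",
    "protease_1": "CQGDSGGPLVCN",
    "protease_2": "CQGDSGGPVVCN",
}
families = cluster_by_identity(toy_sequences, threshold=60.0)
```

With these inputs the two kinase-like fragments fall into one group and the two protease-like fragments into another; real family assignments rest on full alignments and structural information.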
3. Exploring Chemical Space
One of the key aspects of Chemogenomics is to discover small bioactive molecules that allow the selective modulation of protein activity. The expedition through biological space with small molecules has gone through several stages, swiveling between post- and presynthesis selection of chemical structures (15). From a purely empirical level led by phenomenological studies without guidance from structural information, through a phase of strong desire to rationally design drugs, via the high-number trial-and-error games of high-throughput screening and combinatorial chemistry, we have today reached a stage where we believe that organizing the biological space in target families will help us to integrate knowledge and technologies to chart the Chemical Space in an effective manner.
One of today’s cornerstones of the synthetic strategies is the preparation of compounds in a library setting. Historically, the art of chemical synthesis demanded that individual molecules be prepared efficiently with high yield and extensively characterized. Over decades, more and more complex molecules were prepared by dissecting the target molecule at key reactive bonds, devising strategic options for the assembly, and developing novel reactions to address the synthetic needs (16, 17). While this “Logic of Synthesis” laid the foundation for today’s capabilities to prepare a wide variety of conceivable target structures, it did not provide the numbers of molecules necessary for an efficient investigation of the “biological space.” The resolution of this conundrum required a conceptual rethinking by chemists: Instead of preparing individual “hypothesis-driven” structures with the hope of hitting the jackpot, libraries are assumed to be “biologically naïve” but to represent sufficient diversity to allow for the selection of compounds with the desired activity. 
Combinatorial chemistry had raised expectations of solving the challenge of making the complete chemical space available for testing. Yet it was quickly realized that this hope was futile. Calculations of the number of possible chemical structures that would be considered drug-like, e.g., based on carbon, hydrogen, nitrogen, oxygen, sulfur, and phosphorus with molecular weights below 500, reached ballpark figures of 10^60 (18). Even if we assume that we could represent this space through 1% of the structures, an estimate that is often made for representative
selections from compound sets, we are still looking at 10^58 structures. The material requirements for a single representation of each structure go beyond the resources available in the known universe. Besides being disillusioned by the numbers, chemists soon recognized that compounds from combinatorial libraries were often inactive or poorly active on biological molecules unless they were derived from known active compounds. The structures were based on chemical feasibility and therefore densely populated the regions of chemical space offered by the scaffolds. With the insight that combinatorial libraries would not be capable of addressing the biological space and would fall way short of filling the chemical space even within the boundary of molecular weights below 500, the utilization of combinatorial chemistry and parallel synthesis shifted from a diversity approach to densely populating chemical space around proven starting points, that is, compounds with documented biological activity.
One approach to solving this conundrum is to use existing knowledge of bioactive structural motifs. Literature and databases on marketed drugs provide many of these starting points. Analysis of drugs on the market and in development revealed that a limited set of 32 frameworks forms the basis of more than 50% of marketed drugs (19). Although this analysis, like all retrospective studies, may be biased toward GPCR activity modulators, which represent a significant fraction of drugs on the market, the study underlines two aspects. First, through one and a half centuries of organic synthesis we have explored only a very limited subset of chemical space in our drug discovery efforts, but remaining within this space makes us quite successful. Second, nature may not be as structurally creative and tolerant as could have been assumed, and therefore biological space may not be as diverse as envisioned. 
Beyond these points, the bias toward GPCR ligands may not be as limiting as it seems, as GPCRs, through their subfamilies, bind a variety of structural motifs, such as nucleotides, lipids, peptides, and small molecule ligands like nicotinic acid or dopamine (20). These ligand types are actually shared with other target families, and therefore the structural motifs from drugs on the market can be transferred to drug discovery for other target families that may seem unrelated at first glance, such as nucleotide mimics for kinases and peptide mimics for proteases. Thus molecular frameworks may be the uniting concept between and within target families, and recent studies have reported the clustering of proteins by their small molecule ligands (9, 21). The clustering along binding sites is not unique to small molecule ligands. Throughout evolution substrate core motifs have been conserved and have thus driven the development of the respective protein family (22).
To establish a more detailed analysis of ligand types and target families, so-called biospectra have gained importance. To derive biospectra, a defined set of compounds is tested against a set of
targets in a “full-matrix” layout. These biospectra allow several types of analysis, as demonstrated by Fliri et al. on a set of 1,500 compounds and 92 assays. First, proteins can be clustered on the similarity of their responses to the compounds (23). As mentioned earlier, this may lead to new insights on how to group proteins for accelerating the discovery of bioactive compounds (vide infra). Second, the reactivity patterns can be used to establish similarity relationships between compounds (24). The outcome of the studies underlines today’s paradigm that similar ligands bind to proteins with similar binding sites. Taking the data from these experiments opens the way to predicting or designing new ligands for predefined proteins (25, 26).
In the following part of this chapter, several examples of target-family-oriented studies are highlighted in an attempt to emphasize the particular challenges of each target family in the discovery and profiling of bioactive compounds. As the field of Chemogenomics has expanded over the last 5 years, an exhaustive overview of research is beyond the scope of this contribution.
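The two readings of a full-matrix biospectrum described above, comparing targets by their response to the compounds and comparing compounds by their activity pattern across the targets, can be sketched in a few lines of Python. This is a minimal illustration that assumes Pearson correlation as the similarity measure, which may differ from the metrics used in the cited studies. The compound names, target names, and percent-inhibition values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Full matrix: one row per compound, one column per target.
# Values are hypothetical percent-inhibition readings.
targets = ["target_1", "target_2", "target_3"]
matrix = {
    "cmpd_A": [90, 85, 5],
    "cmpd_B": [80, 75, 10],
    "cmpd_C": [10, 15, 95],
    "cmpd_D": [5, 20, 88],
}


def target_profile(column):
    """Response of one target to all compounds (a column of the matrix)."""
    return [row[column] for row in matrix.values()]


# Compound similarity: compare rows (activity across targets).
compound_similarity = pearson(matrix["cmpd_A"], matrix["cmpd_B"])

# Target similarity: compare columns (response to the compound set).
target_similarity = pearson(target_profile(0), target_profile(1))
target_dissimilarity = pearson(target_profile(0), target_profile(2))
```

In this toy matrix, target_1 and target_2 respond alike and would cluster together, while target_3 shows the opposite pattern. A real analysis would use far larger matrices and a clustering step on top of the pairwise similarities.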
4. Examples of Target Families
4.1. G-Protein-Coupled Receptors (GPCRs)
GPCRs are probably the target family most frequently targeted by drugs on the market, albeit historically based on pharmacological observations without target knowledge. The human genome features several hundred GPCRs, thus making them today favorite targets for drug discovery. Two aspects distinguish GPCRs from other target families, especially the kinases or proteases. First, GPCRs do not possess enzymatic activity by themselves and the availability of structural information is limited. Second, unlike proteases and kinases, which exert their enzymatic actions on proteins, many GPCRs bind ligands of small molecular weight. Because of the limited availability of crystal structures of membrane-bound proteins such as GPCRs, the rational design of GPCR-targeting agents relies heavily on design derived from known ligands (12, 13). Structure-based design is dependent on deducing structural information through sequence homologies. For families of soluble protein targets the situation for gaining structural knowledge is quite comfortable. A wealth of crystal structures in free as well as inhibitor-bound forms is available for proteases and kinases, very often for a variety of ligands to each protein. This information is used intensely for inhibitor optimization purposes and also allows for structural comparisons on a target family level. These analyses are based on structural overlays of the protein structures within the target families or, for proteases, within the subfamilies, and on probing the affinity and repulsion
of various small molecular probes, such as water or methanol, with the active-site surface. The studies provide “target family landscapes” that show the relationships of the target family members on a structural level (21, 27–30). The landscapes provide the tools necessary to understand cross-reactivities of inhibitors with closely related proteins or to assess the likelihood of success for transforming an inhibitor for a particular target into an inhibitor for another family target. Furthermore, they allow selecting closely related proteins as structural surrogates for those family members where crystallographic information is not available. This so-called homology modeling is of crucial importance for understanding the structural space covered by membrane-bound proteins, such as GPCRs or ion channels (31). Using the rhodopsin GPCR structure (12) as a template and combining it with target family homology knowledge, it has been possible to obtain topological information about the binding sites of many GPCRs to foster understanding of the binding modes of ligands (12). At a resolution of about 3.5 Å, which can usually be achieved, it is possible to understand differential binding of ligands to the receptors and to rationalize their activation, as demonstrated by Goddard et al. on a homologous series of ketones activating the olfactory receptor 912–93. Furthermore, the differences in activation between mouse and human orthologs could be assigned to a Ser105Gly mutation (32). This study also points to an instrumental aspect of the structural modeling of membrane proteins. In addition to sequence homologies, ligand-binding strengths are used to refine the topologies and interactions. 
If combined with molecular dynamics refinements of the loops connecting the transmembrane helices, as demonstrated by the program PREDICT (33), the accuracy of the models becomes sufficient to perform virtual screening and to discriminate between ligands and their binding modes (29, 30, 34). Recently the high-resolution crystal structure of the β2-adrenergic GPCR was reported (35–37). As the receptor is complexed with the partial inverse agonist carazolol, it gives new insights into possible binding interactions and modes of function, thus enabling a more refined design of compound libraries targeted against GPCRs (29, 38).
Orphan GPCRs are receptors without known agonistic or antagonistic ligands. As GPCRs are usually identified based on sequence homologies, most GPCRs have no pharmacologic function or ligands associated with them at the time of their identification. To find such ligands and later on elicit a biological response, GPCRs are cloned and overexpressed with linkage to easily detectable reporter genes and screened against collections of known signal transmitters or dedicated libraries. Especially with the evolution of screening technology for GPCRs, it is possible today to deorphan many GPCRs, either with their endogenous ligands or synthetic analogues. The identified ligands give insights
into the structural requirements for binding, information that can be used to refine the aforementioned homology models, and can serve as tools to elucidate the biological functions. The endeavor of deorphaning GPCRs is supported by the existence of many GPCR-targeted drugs, and the deorphaning efforts often lead to the discovery of GPCRs that are the targets of drugs with a hitherto unknown mode of action (39–41). As mentioned earlier, many of the GPCR-targeted compound discovery efforts rely on cell-based assays with the target GPCR coupled to a gene-expression readout system. Unfortunately, the readout of these systems can be influenced by many factors that are independent of the target action. Consequently, efforts to develop direct affinity screens for GPCRs have attracted some interest. One of the most elegant approaches has focused on size-exclusion chromatography. GPCR-micelle preparations are equilibrated with compound libraries and passed through a size-exclusion column. As the removal of unbound compounds relies on the size difference between small molecules and proteins, this approach was assumed to be quite powerful for screening membrane-bound proteins that are captured in micelles. The GPCR/ligand complexes retain a high molecular size during separation from the small unbound ligands, which allowed ligands to the M2 receptor to be identified (42). Despite the intellectual appeal of this affinity screen, the elegance of selecting compounds based on a cellular response still tilts the scale toward gene-expression-linked cellular screens, which allow selected GPCR ligands to progress comparatively quickly through the early stages of drug development.
4.2. Kinases
The insights into structural relationships within target families have reshaped our thinking about library synthesis and high-throughput screening and led to the concept of focused target family libraries to improve screening efficiency. The concept has been especially valuable in the family of kinases, which show a high structural homogeneity in their ATP-binding pocket. The use of focused screening sets provides, assuming they are properly constructed or selected, multiple advantages. First, the lower numbers of compounds reduce the cost and effort of screening campaigns and address the throughput limitations of some assay types. Second, high-quality activity data can be gathered from the beginning, as the smaller compound numbers allow multiple data points to be measured per compound, thus reducing the occurrence of false positives and false negatives. Third, they provide higher hit rates and therefore SAR from the initial screening, which can give immediate guidance for chemical programs. However, a delicate balance between focused screening and the chance for serendipity has to be maintained, especially to address the challenge of discovering novel chemotypes to secure an intellectual property position and to explore novel interfaces of chemical and biological spaces.
Furthermore, the heavy use of privileged scaffolds may lead to an incestuous reinvestigation of established structures. While these privileged scaffolds may be advantageous for the efficiency of optimizing lead structures toward drug candidates, as we are moving on known terrain, they also limit our ability to resolve old issues or to find new activities. The paradigm of target families stipulates that similar chemical structures elicit similar biological responses, and we base our optimization strategies on this concept (43). On the flip side of the coin, we have to take into account that the reverse assumption will also hold true: if two similar molecules cause a similar response on one target, then we have to assume that two structurally similar targets respond to a molecule in a similar way. Especially for kinases, the prototypic target family, we observe this phenomenon as significant activities of one compound on several kinases. Most known kinase inhibitors act as competitors of ATP, the universal cosubstrate of all kinases, and therefore frequent hitters are quite common in high-throughput screening of kinase inhibitors (44, 45). In a recent investigation, the binding affinities of 20 structurally diverse kinase inhibitors that are in clinical trials or marketed as drugs were measured against a panel of 113 kinases distributed across the kinome. The study highlights that even “selectivity”-optimized kinase inhibitors are a long way from being selective and hit targets across the kinome (46). The kinome maps are phylogenetic trees based on sequence similarities, and we discussed the shortcomings of phylogenetic analysis for high-resolution structural grouping before. Inhibition profiles of series of compounds can give us guidance for the structural clustering of kinases that is necessary to devise selective and potent inhibitors (21).
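As a minimal sketch of how inhibition profiles might be used to group kinases, the following Python example correlates the activity profiles of four placeholder kinases across a five-compound series and merges kinases whose profiles correlate strongly. The kinase names, the pIC50 values, the choice of the Pearson correlation, and the 0.8 threshold are all illustrative assumptions and are not taken from the studies cited above; the grouping is a simple single-linkage merge, not a published clustering protocol.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long activity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def group_by_profile(profiles, threshold=0.8):
    """Merge kinases whose profiles correlate at or above the threshold."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if pearson(profiles[first], profiles[second]) >= threshold:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical pIC50 values of five compounds (columns) on four kinases (rows).
profiles = {
    "kinase_A": [8.1, 6.0, 5.2, 7.4, 4.9],
    "kinase_B": [7.9, 6.2, 5.0, 7.6, 5.1],
    "kinase_C": [5.0, 7.8, 8.2, 5.1, 6.9],
    "kinase_D": [5.3, 7.5, 8.0, 4.8, 7.2],
}

groups = group_by_profile(profiles)
print(groups)
```

With these toy values, kinases A and B fall into one group and kinases C and D into another, although nothing in the calculation uses sequence information. A real analysis would have to handle missing measurements and would probably use a more robust clustering method.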
Because of their prototypic characteristics and structural homogeneity, kinases also serve as the workhorses for developing or revisiting predictive approaches. In a recent example, 700 compounds from two libraries with different chemotypes were tested against a panel of 45 kinases. Neither the combinatorial matrices in the libraries nor the testing on the kinase panel was exhaustive. Nonetheless, the available data allowed refined increment systems to be built that predict activities and specificities on each of the kinases, including those of the missing compounds, using Free-Wilson analysis (48), a methodology developed 40 years ago for the QSAR of small medicinal chemistry series (47). Beyond the problems that may arise through the polyenzymology of kinase inhibitors, we also have to take into consideration that privileged motifs may lead to “privileged problems” for drug development in pharmacological and toxicological studies. As the compounds derived from focused screening sets share high structural similarities, the different substitutions may not sufficiently modulate the metabolic lability of the scaffolds or suppress unwanted interactions with proteins, even those from unrelated families. For kinases, the ensemble of active sites provides an enveloping shape for the ATP-binding pocket that can be addressed through
screening (27). To address the differences of the binding pockets in various kinases, affinity screens with small molecular probes have gained considerable attention. Because of the very compact binding site in kinases, fragment-based assays have been quite successful in exploiting the differences between the binding sites. As described for the GPCRs earlier, one of the promising concepts couples the equilibration of molecular fragments and proteins with a size-exclusion filtration to remove unbound library members before the ligands bound to the target proteins are determined by mass spectrometry (49). A family experiment using several JNK kinases thus provided selective inhibitors with nanomolar activities and molecular weights starting at around 350 Da (50). Unlike the GPCRs, kinases are soluble proteins and can usually be cocrystallized with ligands. Even low-affinity molecules may yield crystals of the kinase/ligand complex. As the information content of such cocrystal structures is very high, they have been commonly used in the optimization of inhibitors and have recently found a new application in the screening for kinase inhibitors. Starting from privileged scaffolds for the ATP pocket of kinases, fragments binding to p38 MAP kinase and cyclin-dependent kinase 2 (“CDK2”) were discovered that can serve as novel central building blocks for kinase inhibitors (51). As the throughput of crystallography is still limited compared to biochemical screening, collection sizes have to be small or, as in the previous example, mixtures of fragments have to be screened. To expand the size of collections that can be screened by crystallography, Congreve et al. devised a dynamic combinatorial library system using CDK2. Protein crystals act as selectors for the tightest-binding ligands, which are formed from the condensation of isatin and hydrazines.
Instead of equilibrating with a large amount of template protein, the reaction mixture is exposed to individual crystals of CDK2, which guide the selective formation of imino-indolones. The structures of the selected reaction products are determined by crystallography, immediately establishing a binding mode for the nanomolar inhibitors of CDK2 (52). Using the information gained on the fragments, they can be extended from a central scaffold (usually one that has shown its suitability to yield kinase inhibitors) into the side pockets of the ATP-binding site to improve selectivity and activity. The growth of inhibitors has been demonstrated by growing nonnucleotide binders into the nucleotide-binding pocket of adenosine kinase (53, 54). Because the low molecular weight of the fragments allows the addition of structural elements to tune activity and specificity, fragment-based screenings have found their way into many areas of bioactive compound discovery across target families (55–57).
4.3. Proteases
Tracking protease activity remains one of the major challenges as gene expression levels do not correlate tightly with the activity of a protease. Even monitoring tools like in situ hybridization
cannot elucidate the protease activity in tissues or cellular systems, because the antibodies employed for the studies often do not discriminate between the proenzyme and the active protease. Efforts to image protease activity in cells have yielded activity-labeling probes, which act as suicide substrates and provide a fluorescent tag at the active site of active proteases (58, 59). Originally it was believed that this technology would be limited to proteases that allow for covalent attachment of the probes, i.e., serine and cysteine proteases that act through a nucleophilic substitution, because covalent attachment limits the diffusion of the fluorescent molecule in the cells or organisms and thereby provides a sufficient signal for imaging (60). Some recent papers indicate, however, that covalent attachment to the protease is not needed and that linking the fluorophore to a carrier may give sufficient signal intensity for in vivo imaging purposes (61–63). Most recently, Watzke et al. showed that even small diffusible fluorophores provide sufficient intensity for detecting protease activity in living systems (64). In an interesting twist on the traditional design of probes starting from the substrate peptide, they utilize a “reverse design” that derives the probes from known, highly selective cathepsin inhibitors by introducing a cleavable bond in the center of the scaffold to achieve higher specificity of detection. Unfortunately, none of these probes addresses the other problem of protease-directed pharmacology: revealing the proteins that are cleaved by the protease. Straightforward labeling approaches, as utilized to discover kinase substrates, are not feasible because no additional moieties are introduced. Therefore, alternative approaches based on two-dimensional gel electrophoresis have been devised that either allow the differential labeling of substrates or utilize the differential mobility of substrates and cleavage products after digestion.
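A differential-labeling readout of this kind could, in principle, be evaluated along the following lines. The sketch assumes that every gel spot carries two dye intensities, one from an untreated and one from a protease-treated portion of the same extract; the spot names, the intensities, the pseudocount, and the twofold cutoff are hypothetical and serve only to illustrate the comparison.

```python
from math import log2

def flag_changed_spots(spots, min_abs_log2_ratio=1.0, pseudocount=1.0):
    """Return the spots whose treated/untreated intensity ratio changes strongly.

    spots maps a spot name to (untreated_intensity, treated_intensity).
    The pseudocount keeps the ratio finite when a spot vanishes completely
    in one channel, which is exactly the case of a fully cleaved substrate.
    """
    flagged = []
    for name, (untreated, treated) in spots.items():
        ratio = (treated + pseudocount) / (untreated + pseudocount)
        if abs(log2(ratio)) >= min_abs_log2_ratio:
            flagged.append(name)
    return sorted(flagged)

# Hypothetical spot intensities: (untreated channel, treated channel).
spots = {
    "spot_1": (1200.0, 1150.0),  # similar in both channels, probably not cleaved
    "spot_2": (900.0, 200.0),    # depleted after digestion, candidate substrate
    "spot_3": (50.0, 400.0),     # enriched after digestion, candidate cleavage product
}

candidates = flag_changed_spots(spots)
print(candidates)
```

Spots that lose intensity in the treated channel would be candidate substrates, and spots that gain intensity would be candidate cleavage products; either way, the identity of the protein still has to be established separately.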
The identities of the proteins are determined by mass spectrometry and sequence analyses, although these technologies do not reach a resolution that would allow the characteristics of the protease selectivity pockets to be determined. For the first approach, cell extracts are divided into two portions, and the proteins in each portion are labeled with a different fluorescent dye. One portion is subjected to proteolytic digestion, while the other remains untreated. After mixing of the portions and electrophoretic separation, substrates can be identified through the varying color of the spots (65). In the second approach, a cellular extract is separated by electrophoresis in one dimension. After proteolytic digestion in the gel, the protein mixture is separated in the second dimension, where the cleavage products show a mobility different from that of the parent proteins (66). While the first approach allows for analysis on a proteomic level and under various conditions, the latter allows a direct correlation of the cleavage peptides with the parent proteins. We use the insights into the biological space
of the target families to select screening collections as well as to define specificity requirements for target family members to build appropriate profiling panels. To gain a more detailed insight into the structural parameters controlling substrate selection, peptide libraries have been used intensively. Proteolytic digestion of such libraries, which commonly contain hexa- to octapeptides, returns ensembles of peptide substrates (67, 68). These substrate ensembles carry pharmacophoric information about the substrate pockets as well as about the specificity of these pockets. Together with the knowledge about the preferred β-strand geometry of protease inhibitors (69) and the ensuing privileged scaffolds, this information can guide protease inhibitor design (70). Peptidomimetic approaches are heavily used to build protease inhibitor scaffolds. Selective protease inhibitors are quite straightforward to obtain because of the substrate variety and specificity of the proteases. However, the concept of privileged scaffolds does not carry far for proteases. The unifying element in protease substrates is the extended β-strand conformation that allows interactions with four to six subpockets in the protease active site (69). Mimics of this conformation have been developed, but they still lack universal applicability for the transfer into clinical application (70). Unlike the scaffolds of kinase or GPCR ligands, the cores of protease inhibitors, like the peptidic backbone in the substrate, do not contribute the majority of the binding energy and are therefore not crucial for affinity to the target (although they may severely affect the pharmacokinetic properties of the inhibitors). The energetic drivers of protease/inhibitor binding are the interactions in the subpockets, which determine activity and selectivity of the inhibitors. Recently, these pockets have been probed directly with molecular fragments that are conjugated to each other once they show affinity to the targets.
The structure-focused concept for fragment screening tries to extract as much structural information as possible from the initial interaction experiment and relies on either NMR or crystallography, paying for the increased information content with a limitation in throughput. The door to these experiments was pushed wide open when Fesik and colleagues reported the successful screening of small molecular fragments against the S1′ pocket of stromelysin (matrix metalloprotease 3). Using biaryl systems, they could show that the resonances in NMR spectra shifted when the molecular fragments bound to the protein. Conjugating these fragments with hydroxamic acid, a potent zinc chelator, provided compounds with nanomolar affinities (71). The elegance and potential of fragment-based screening approaches were underlined by a detailed investigation of the thermodynamics of the interactions (72). Using NMR fragment screening with another fragment set, high-affinity inhibitors with novel structural motifs were discovered from a small set of fragments for urokinase, wherein the deep S1 specificity pocket served as the anchoring point for the ligands (73).
Starting from these anchors, the ligands were grown into the S2, S3, and S4 pockets of the enzyme. Biochemical data as well as structural information from NMR experiments guided the optimization toward selective and nanomolar inhibitors (74). Structure-based fragment screenings are a powerful tool for the design of compounds and the evaluation of small compound sets. High-throughput screening of large collections depends on faster, less information-rich readouts. Although the affinity of small fragments is sufficient for detection and quantification, the interaction strengths are usually too weak to be reliably detected in high-throughput enzymatic assays, which depend on the competition of inhibitors with the substrate. Therefore, a high-throughput affinity detection technology is required for the screening of larger fragment collections. Surface plasmon resonance (“SPR”) offers the advantage of fast and label-free affinity detection, meaning that no additional modification of the analyte is needed. Attaching the fragments to an SPR array allows the simultaneous testing of thousands of fragments, and the ability to combinatorially synthesize small ligands conjugated to surface attachment tags allows the convenient preparation of such arrays (75). As the strength of the SPR response depends on the change of molecular weight on the resonance surface, capturing a protein with a small immobilized ligand provides a favorable signal-to-background ratio. Recently, the search for fragments binding to the S1 specificity pocket of the serine protease Factor VIIa yielded haloaromatic moieties that can substitute for the well-known but undesired benzamidine as anchor. Haloaromatic moieties had been known as ligands to the benzamidine-binding S1 pockets of the S1-clan serine proteases, such as Factor Xa or thrombin. This knowledge guided the design of a library of approximately 1,500 small-size fragments, which were immobilized on a microarray.
Affinity screening with Factor VIIa identified several small ligands, and their interaction in the S1 pocket could be confirmed by crystallography, using trypsin as a surrogate for faster crystallographic screening followed by reconfirmation of the binding in Factor VIIa (76).
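The mass dependence of the SPR response mentioned above can be made concrete with a short calculation. The sketch assumes, as a simplification, that the response per occupied binding site scales linearly with the mass captured there, and it uses hypothetical masses of 250 Da for a fragment and 50 kDa for a protease; neither value is taken from the cited work.

```python
def relative_mass_gain(captured_mass_da, immobilized_mass_da):
    """Mass captured per occupied site, relative to the immobilized partner."""
    return captured_mass_da / immobilized_mass_da

fragment_da = 250.0     # hypothetical fragment mass
protein_da = 50_000.0   # hypothetical protease mass

# Fragment immobilized on the array, protein captured from solution.
fragment_on_surface = relative_mass_gain(protein_da, fragment_da)

# Reverse orientation: protein immobilized, fragment captured from solution.
protein_on_surface = relative_mass_gain(fragment_da, protein_da)

print(fragment_on_surface, protein_on_surface)
```

Under these assumptions, each binding event on a fragment array adds about 200 times the mass of the immobilized fragment, whereas a fragment binding to an immobilized protein adds only about 0.5% of the mass already present, which illustrates why the array orientation gives the more favorable signal.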
5. Beyond the Original Target Families: Addressing Toxicology and ADME
The discovery and optimization of compounds against therapeutic targets is the obvious application of target family knowledge. Nonetheless, the discovery and development of pharmacological agents are more often than not challenged by unwanted side effects of the compounds. The methods described earlier can also be utilized to optimize compounds to be inactive against unwanted targets. One of the prime “antitargets” is the hERG channel. The hERG channel, a voltage-gated potassium ion channel located in cardiac
tissue, is responsible for membrane repolarization and is of general pharmacological interest because blocking this channel can induce potentially fatal ventricular arrhythmias through prolongation of the cardiac QT interval (77, 78). In the ion channel field, homology models can be based on three crystal structures of various potassium channels, two of which show the channel in the open state (13, 14) and one in the closed state (79). Although ion channels are multimeric proteins and structurally more diverse than GPCRs, good models have become available using the three structures and ligand-activity information, as highlighted by the possibility of predicting the hERG-blocking activity of ligands (77, 80). Because of the importance of the hERG channel, a lot of biological data are available, and the homology-based models, even though they are built on the bacterial MthK channel (13), have meanwhile reached the same accuracy as models derived from structure–activity relationship data (81) and can guide chemical optimization to achieve specificity of ligands. Beyond the prediction of ligand binding, homology models help in the functional analysis of ion channels. In a recent example, the gating of the Kir6.2 channel by ATP could be explained at the atomic level (82) utilizing the structures of the open Kir3.1 channel (14) and the closed KirBac1.1 channel (79). Another major class of undesired targets is the cytochrome P450 family (“CYP”). CYPs are membrane-bound proteins responsible for the oxidative metabolism of compounds. Their activity depends on a heme molecule bound in the active site. Even though the human genome contains about 60 P450s, about 90% of all drugs are metabolized via five main CYP isoforms: CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4, the latter often being the main route of initial oxidation.
Inhibition and/or induction of CYP isoforms by one drug can cause major changes in the metabolism rates of other drugs, thus causing various adverse drug interactions, e.g., loss of therapeutic efficacy through faster degradation or drug toxicity through unexpected accumulation of drugs. Based on crystal structures of the isoforms complexed with various drugs, models have been developed that allow the rationalization of drug interactions with CYPs (83, 84). Starting from these insights, efforts are underway to develop general structural descriptors to predict metabolism as well as those parameters of pharmacokinetics that we cannot yet explain on a structural level (85).
6. Epilogue
Over the last decade, Chemogenomics and its subaspect of target families have reshaped our ways of doing drug discovery and medicinal chemistry (86–88). The rationalization of small molecules acting
as the keys to unlock the activity of proteins served as a starting point into molecular biology (89, 90). The investigations of the structural characteristics of target families allow us today to take a more rational approach toward selecting appropriate compounds for synthesis and testing. Through the sequencing of the human genome, we have the blueprint of the building blocks of life, whose interactions we can modulate through therapeutics (3). Our learning about the molecular characteristics that provide favorable pharmacokinetic behavior of molecules in the human body has established guidelines and boundaries that help us to navigate chemical space toward regions that offer a higher population of structures that may be suitable as drugs (1). While many of the concepts of the target family approach may not be novel to a medicinal chemist if considered in isolation, their conscious combination adds another dimension: “Chemical Biology” is based on a thorough structural knowledge of similarities and differences within a target family. Today’s structural understanding also allows us to make more sophisticated choices about investigations to prevent side effects, and the increasing biological knowledge helps us to rationalize side effects of drugs and to modify affected drugs accordingly. However, we still run into the trap of building assay schemes for drug discovery that allow high throughput and are self-consistent. The high-throughput design sacrifices the biochemical mimicking of the cellular environment, such as the mentioned high concentration and viscosity, for technical feasibility. The self-consistency often carries the risk of losing relevance for the pathophysiological phenomenology and thus jeopardizes the predictivity for the therapeutic setting, being detached from reality like Hesse’s “Glass Bead Game” (91).
Eventually, “Systems Biology” will elucidate how the building blocks of life work together in networks and pathways and which results can be expected from tweaking one dial in the system, leading to novel and powerful assay setups. Once we have established the tools for high-content screening, we may be set to tackle the challenge of addressing complex organisms in a rational manner. To date, the interplay of various organs and cell types, combined with the ever-changing concentrations of compounds at the locus of action, is not understood. The challenge for drug discovery will be the discovery of bioactive compounds for application in the dynamic universe of Chemical and Biological Spaces and Networks.
References
1. Lipinski, C.A., Lombardo, F., Dominy, B.W., and Feeney, P.J. (1997) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25.
2. Veber, D.F., Johnson, S.R., Cheng, H.-Y., Smith, B.R., Ward, K.W., and Kopple, K.D. (2002) Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 45, 2615–2623.
3. Venter, J.C., Adams, M.D., Myers, E.W., et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
4. Drews, J. (2000) Drug discovery: A historical perspective. Science 287, 1960–1963.
5. Hopkins, A.L., and Groom, C.R. (2002) The druggable genome. Nat. Rev. Drug Discov. 1, 727–730.
6. Overington, J.P., Al-Lazikani, B., and Hopkins, A.L. (2006) How many drug targets are there? Nat. Rev. Drug Discov. 5, 993–996.
7. Russ, A.P., and Lampel, S. (2005) The druggable genome: An update. Drug Discov. Today 10, 1607–1610.
8. Hajduk, P.J., Huth, J.R., and Tse, C. (2005) Predicting protein druggability. Drug Discov. Today 10, 1675–1682.
9. Paolini, G.V., Shapland, R.H.B., van Hoorn, W.P., Mason, J.S., and Hopkins, A.L. (2006) Global mapping of pharmacological space. Nat. Biotechnol. 24, 805–815.
10. Narayanan, V.K., Douglas, F., Schirlin, D., Wess, G., and Giesing, D. (2004) Virtual communities as an organizational mechanism for embedding knowledge in drug discovery: The case of chemical biology platform. J. Business Chem. 1, 37–47.
11. Douglas, F.L. (2007) Managerial challenges in implementing chemical biology platforms. In: Schreiber, S.L., Kapoor, T.M., and Wess, G. (eds.) Chemical Biology: From Small Molecules to Systems Biology and Drug Design. Wiley-VCH, Weinheim, pp. 789–803.
12. Palczewski, K., Kumasaka, T., Hori, T., et al. (2000) Crystal structure of rhodopsin: A G protein-coupled receptor. Science 289, 739–745.
13. Jiang, Y., Lee, A., Chen, J., Cadene, M., Chait, B.T., and MacKinnon, R. (2002) Crystal structure and mechanism of a calcium-gated potassium channel. Nature 417, 515–522.
14. Nishida, M., and MacKinnon, R. (2002) Structural basis of inward rectification: Cytoplasmic pore of the G protein-gated inward rectifier GIRK1 at 1.8 Å resolution. Cell 111, 957–965.
15. Eschenmoser, A. (1994) One hundred years of the lock-and-key principle. Angew. Chem. Int. Ed. Engl. 33, 2363.
16. Woodward, R.B. (1972) Recent advances in the chemistry of natural products (Nobel Lecture, December 11, 1965). In: Nobel Foundation (ed.) Nobel Lectures, Chemistry 1963–1970. Elsevier, Amsterdam, pp. 100–121.
17. Corey, E.J. (1991) The logic of chemical synthesis: Multistep synthesis of complex carbogenic molecules (Nobel lecture). Angew. Chem. Int. Ed. Engl. 30, 455–465.
18. Bohacek, R.S., McMartin, C., and Guida, W.C. (1996) The art and practice of structure-based drug design: A molecular modeling perspective. Med. Res. Rev. 16, 3–50.
19. Bemis, G.W., and Murcko, M.A. (1996) The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893.
20. Bondensgaard, K., Ankersen, M., Thogersen, H., Hansen, B.S., Wulff, B.S., and Bywater, R.P. (2004) Recognition of privileged structures by G-protein coupled receptors. J. Med. Chem. 47, 888–899.
21. Vieth, M., Higgs, R.E., Robertson, D.H., Shapiro, M., Gragg, E.A., and Hemmerle, H. (2004) Kinomics-structural biology and chemogenomics of kinase inhibitors and targets. Biochim. Biophys. Acta 1697, 243–257.
22. Chiang, R.A., Sali, A., and Babbitt, P.C. (2008) Evolutionarily conserved substrate substructures for automated annotation of enzyme superfamilies. PLoS Comput. Biol. 4, e1000142.
23. Fliri, A.F., Loging, W.T., Thadeio, P.F., and Volkmann, R.A. (2005) Biospectra analysis: Model proteome characterizations for linking molecular structure and biological response. J. Med. Chem. 48, 6918–6925.
24. Fliri, A.F., Loging, W.T., Thadeio, P.F., and Volkmann, R.A. (2005) Biological spectra analysis: Linking biological activity profile to molecular structure. Proc. Natl. Acad. Sci. U.S.A. 102, 261–265.
25. Bender, A., Young, D.W., Jenkins, J.L., Serrano, M., Mikhailov, D., Clemons, P.A., and Davies, J.W. (2007) Chemogenomic data analysis: Prediction of small-molecule targets and the advent of biological fingerprints. Comb. Chem. High Throughput Screen. 10, 719–731.
26. Bajorath, J. (2008) Computational analysis of ligand relationships within target families. Curr. Opin. Chem. Biol. 12, 352–358.
27. Naumann, T., and Matter, H. (2002) Structural classification of protein kinases using 3D molecular interaction field analysis of their ligand binding sites: Target family landscapes. J. Med. Chem. 45, 2366–2378.
28. Matter, H., and Schwab, W. (1999) Affinity and selectivity of matrix metalloproteinase inhibitors: A chemometrical study from the perspective of ligands and proteins. J. Med. Chem. 42, 4506–4523.
29. Klabunde, T. (2007) Chemogenomic approaches to drug discovery: Similar receptors bind similar ligands. Br. J. Pharmacol. 152, 5–7.
30. Klabunde, T., and Hessler, G. (2002) Drug design strategies for targeting G-protein-coupled receptors. ChemBioChem 3, 928–944.
31. Radestock, S., Weil, T., and Renner, S. (2008) Homology model-based virtual screening for GPCR ligands using docking and target-biased scoring. J. Chem. Inf. Model. 48, 1104–1117.
32. Hummel, P., Vaidehi, N., Floriano, W.B., Hall, S.E., and Goddard, W.A. III. (2005) Test of the binding threshold hypothesis for olfactory receptors: Explanation of the differential binding of ketones to the mouse and human orthologs of olfactory receptor 912-93. Protein Sci. 14, 703–710.
33. Shacham, S., Marantz, Y., Bar-Haim, S., Kalid, O., Warshaviak, D., Avisar, N., Inbal, B., Heifetz, A., Fichman, M., Topf, M., Naor, Z., Noiman, S., and Becker, O.M. (2004) PREDICT modeling and in-silico screening for G-protein coupled receptors. Proteins 57, 51–86.
34. Becker, O.M., Marantz, Y., Shacham, S., Inbal, B., Heifetz, A., Kalid, O., Bar-Haim, S., Warshaviak, D., Fichman, M., and Noiman, S. (2004) G protein-coupled receptors: In silico drug discovery in 3D. Proc. Natl. Acad. Sci. U.S.A. 101, 11304–11309.
35. Rosenbaum, D.M., Cherezov, V., Hanson, M.A., Rasmussen, S.G., Thian, F.S., Kobilka, T.S., Choi, H.J., Yao, X.J., Weis, W.I., Stevens, R.C., and Kobilka, B.K. (2007) GPCR engineering yields high-resolution structural insights into beta 2-adrenergic receptor function. Science 318, 1266–1273.
36. Cherezov, V., Rosenbaum, D.M., Hanson, M.A., Rasmussen, S.G., Thian, F.S., Kobilka, T.S., Choi, H.J., Kuhn, P., Weis, W.I., Kobilka, B.K., and Stevens, R.C. (2007) High-resolution crystal structure of an engineered human beta 2-adrenergic G protein-coupled receptor. Science 318, 1258–1265.
37. Rasmussen, S.G., Choi, H.J., Rosenbaum, D.M., Kobilka, T.S., Thian, F.S., Edwards, P.C., Burghammer, M., Ratnala, V.R., Sanishvili, R., Fischetti, R.F., Schertler, G.F., Weis, W.I., and Kobilka, B.K. (2007) Crystal structure of the human beta 2 adrenergic G-protein-coupled receptor. Nature 450, 383–387.
38. Jacoby, E., Bouhelal, R., Gerspacher, M., and Seuwen, K. (2006) The 7 TM G-protein-coupled receptor target family. ChemMedChem 1, 760–782.
39. Civelli, O. (2005) GPCR deorphanizations: The novel, the known and the unexpected transmitters. Trends Pharmacol. Sci. 26, 15–19.
40. Chung, S., Funakoshi, T., and Civelli, O. (2007) Orphan GPCR research. Br. J. Pharmacol. 153(S1), S339–S346.
41. Levoye, A., and Jockers, R. (2008) Alternative drug discovery approaches for orphan GPCRs. Drug Discov. Today 13, 52–58.
42. Hou, Y., Felsch, J., Annis, A., et al. (2002) Identification of Small Molecule Ligands for G Protein Coupled Receptor Using Affinity Selection Screening. GPCR IBC Conference 2002.
43. Martin, Y.C., Kofron, J.L., and Traphagen, L.M. (2002) Do structurally similar molecules have similar biological activity? J. Med. Chem. 45, 4350–4358.
44. Aronov, A.M., McClain, B., Moody, C.S., and Murcko, M.A. (2008) Kinase-likeness and kinase-privileged fragments: Toward virtual polypharmacology. J. Med. Chem. 51, 1214–1222.
45. Aronov, A.M., and Murcko, M.A. (2004) Toward a pharmacophore for kinase frequent hitters. J. Med. Chem. 47, 5616–5619.
46. Fabian, M.A., Biggs, W.H., Treiber, D.K., et al. (2005) A small molecule-kinase interaction map for clinical kinase inhibitors. Nat. Biotechnol. 23, 329–336.
47. Free, S.M., and Wilson, J.W. (1964) A mathematical contribution to structure-activity relationships. J. Med. Chem. 7, 395–399.
48. Sciabola, S., Stanton, R.V., Wittkopp, S., et al. (2008) Predicting kinase selectivity profiles using Free-Wilson QSAR analysis. J. Chem. Inf. Model. 48, 1851–1867.
49. Dunayevskiy, Y.M., Vouros, P., Wintner, E.A., Shipps, G.W., Carell, T., and Rebek, J. Jr. (1996) Application of capillary electrophoresis-electrospray ionization mass spectrometry in the determination of molecular diversity. Proc. Natl. Acad. Sci. U.S.A. 93, 6152–6157.
50. Agnihotri, G., Scott, M.P., Alaoui-Ismaili, M.H., et al. (2004) Identification of Potent Inhibitors of c-Jun N-terminal Kinase-1 (JNK1) using Ultra High-Throughput Affinity Based Screening. 12th Symposium on Second Messengers and Phospho-proteins (SMP-2004).
51. Hartshorn, M.J., Murray, C.W., Cleasby, A., Frederickson, M., Tickle, I.J., and Jhoti, H. (2005) Fragment-based lead discovery using X-ray crystallography. J. Med. Chem. 48, 403–413.
52. Congreve, M.S., Davis, D.J., Devine, L., et al. (2003) Detection of ligands from a dynamic combinatorial library by X-ray crystallography. Angew. Chem. Int. Ed. Engl. 42, 4479–4482.
53. Hajduk, P.J., Bures, M., Praestgaard, J., and Fesik, S.W. (2000) Privileged molecules for protein binding identified from NMR-based screening. J. Med. Chem. 43, 3443–3447.
54. Hajduk, P.J., Gomtsyan, A., Didomenico, S., et al. (2000) Design of adenosine kinase inhibitors from the NMR-based screening of fragments. J. Med. Chem. 43, 4781–4786.
55. Nestler, H.P. (2005) Combinatorial chemistry and fragment screening - two unlike siblings? Curr. Drug Discov. Technol. 2, 1–12.
56. Congreve, M., Chessari, G., Tisi, D., and Woodhead, A.J. (2008) Recent developments in fragment-based drug discovery. J. Med. Chem. 51, 3661–3680.
57. Degen, J., Wegscheid-Gerlach, C., Zaliani, A., and Rarey, M. (2008) On the art of compiling and using ‘drug-like’ chemical fragment spaces. ChemMedChem 3, 1503–1507.
58. Greenbaum, D.C., Arnold, W.D., Lu, F., et al. (2002) Small molecule affinity fingerprinting: A tool for enzyme family subclassification, target identification, and inhibitor design. Chem. Biol. 9, 1085–1094.
59. Greenbaum, D., Baruch, A., Hayrapetian, L., Darula, Z., Burlingame, A., Medzihradszky, K.F., and Bogyo, M. (2002) Chemical approaches for functionally probing the proteome. Mol. Cell. Proteomics 1, 60–68.
60. Blum, G., Degenfeld, G.V., Merchant, M.J., Blau, H.M., and Bogyo, M. (2007) Noninvasive optical imaging of cysteine protease activity using fluorescently quenched activity-based probes. Nat. Chem. Biol. 3, 668–677.
61. Zacharakis, G., Kambara, H., Shih, H., Ripoll, J., Grimm, J., Saeki, Y., Weissleder, R., and Ntziachristos, V. (2005) Volumetric tomography of fluorescent proteins through small animals in vivo. Proc. Natl. Acad. Sci. U.S.A. 102, 18252–18257.
62. Jaffer, F.A., Tung, C.H., Gerszten, R.E., and Weissleder, R. (2002) In vivo imaging of thrombin activity in experimental thrombi with thrombin-sensitive near-infrared molecular probe. Arterioscler. Thromb. Vasc. Biol. 22, 1929–1935.
63. Mahmood, U., Tung, C.H., Bogdanov, A. Jr., and Weissleder, R. (1999) Near-infrared optical imaging of protease activity for tumor detection. Radiology 213, 866–870.
64.
Watzke, A., Kosec, G., Kindermann, M., et al. (2008) Selective activity-based probes for cysteine cathepsins. Angew. Chem. Int. Ed. Engl. 47, 406–409. Bredemeyer, A.J., Lewis, R.M., Malone, J.P., et al. (2004) A proteomic approach for the discovery of protease substrates. Proc. Natl. Acad. Sci. U.S.A. 101, 11785–11790.
66. Nestler, H.P. and Doseff, A. (1997) A two-dimensional, diagonal sodium dodecyl sulfate polyacrylamide gel electrophoresis technique to screen for protease substrates in protein mixtures. Anal. Biochem. 251, 122–125. 67. St. Hilaire, P.M., Willert, M., Juliano, M.A., Juliano, L., and Meldal, M. (1999) Fluorescence-quenched solid phase combinatorial libraries in the characterization of cysteine protease substrate specificity. J. Comb. Chem. 1, 509–523. 68. Meldal, M. (2002) The one-bead two-compound assay for solid phase screening of combinatorial libraries. Biopolymers 66, 93–100. 69. Tyndall, J.D.A., Nall, T., and Fairlie, D.P. (2005) Proteases universally recognize beta strands in their active sites. Chem. Rev. 105, 973–999. 70. Leung, D., Abbenante, G., and Fairlie, D.P. (2000) Protease inhibitors: Current status and future prospects. J. Med. Chem. 43, 305–341. 71. Hajduk, P.J., Sheppard, G., Nettesheim, D.G., et al. (1997) Discovery of potent nonpeptide inhibitors of stromelysin using SAR by NMR. J. Am. Chem. Soc. 119, 5818–5827. 72. Olejniczak, E.T., Hajduk, P.J., Marcotte, P.A., et al. (1997) Stromelysin inhibitors designed from weakly bound fragments: Effects of linking and cooperativity. J. Am. Chem. Soc. 119, 5828–5832. 73. Hajduk, P.J., Boyd, S., Nettesheim, D., et al. (2000) Identification of novel inhibitors of urokinase via NMR-based screening. J. Med. Chem. 43, 3862–3866. 74. Wendt, M.D., Rockway, T.W., Geyer, A., et al. (2004) Identification of novel binding interactions in the development of potent, selective 2-naphthamidine inhibitors of urokinase. Synthesis, structural analysis, and SAR of N-phenyl amide 6-substitution. J. Med. Chem. 47, 303–324. 75. Metz, G., Ottleben, H., and Vetter, D. (2003) Small molecule screening on chemical microarrays. In: Böhm, H.J., and Schneider, G. (eds.) Protein-Ligand Interactions, From Molecular Recognition to Drug Design. Wiley-VCH, Weinheim, pp. 213–236. 76. Dickopf, S., Frank, M., Junker, H.D., et al. 
(2004) Custom chemical microarray production and affinity fingerprinting for the S1 pocket of factor VIIa. Anal. Biochem. 335, 50–57. 77. Mitcheson, J.S., Chen, J., Lin, M., Culberson, C., and Sanguinetti, M.C. (2000) A structural basis for drug-induced long QTsyndrome. Proc. Natl. Acad. Sci. U.S.A. 97, 12329–12333.
Organizing Bioactive Compound Discovery in Target Families
78. Haverkamp, W., Breithardt, G., Camm, A.J., et al. (2000) The potential for QT prolongation and proarrhythmia by nonantiarrhythmic drugs: Clinical and regulatory implications. Report on a Policy Conference of the European Society of Cardiology. Eur. Heart J. 21, 1216–1231. 79. Kuo, A., Gulbis, J.M., Antcliff, J.F., et al. (2003) Crystal structure of the potassium channel KirBac1.1 in the closed state. Science 300, 1922–1926. 80. Pearlstein, R.A., Vaz, R.J., Kang, J., et al. (2003) Characterization of HERG potassium channel inhibition using CoMSiA 3D QSAR and homology modeling approaches. Bioorg. Med. Chem. Lett. 13, 1829–1835. 81. Aronov, A.M. (2005) Predictive in silico modeling for hERG channel blockers. Drug Discov. Today 10, 149–155. 82. Antcliff, J.F., Haider, S., Proks, P., Sansom, M.S.P., and Ashcroft, F.M. Functional analysis of a structural model of the ATP-binding site of the KATP channel Kir6.2 subunit. EMBO J. 24, 229–239. 83. Jensen, B.F., Vind, C., Padkjar, S.B., Brockhoff, P.B., and Refsgaard, H.H.F. In silico prediction of cytochrome P450 2D6 and 3A4 inhibition using gaussian kernel weighted k-nearest neighbor and extended connectivity fingerprints, including structural fragment analysis of inhibitors versus noninhibitors. J. Med. Chem. 50, 501–511.
19
84. Kontijevskis, A., Komorowski, J., and Wikberg, J.E. (2008) Generalized proteochemometric model of multiple cytochrome p450 enzymes and their inhibitors. J. Chem. Inf. Model. 48, 1840–1850; PMID: 18693719. 85. Gleeson, M.P. (2008) Generation of a set of simple, interpretable ADMET rules of thumb. J. Med. Chem. 51, 817–834. 86. Wess, G., Urmann, M., and Sickenberger, B. (2001) Medicinal chemistry: Challenges and opportunities. Angew. Chem. Int. Ed. Engl. 40, 3341–3350. 87. Mueller, G. (2003) Medicinal chemistry of target family-directed masterkeys. Drug Discov. Today 8, 681–691. 88. Nestler, H.P. (2007) The target family approach. In: Schreiber, S.L., Kapoor, T., Wess, G. (eds.) Chemical Biology: From Small Molecules to Systems Biology and Drug Design. 1. Wiley-VCH, Weinheim, pp. 825–851. 89. Fischer, E. (1894) Effekt der Zuckerkonfiguration auf die Enzymwirkung. Berichte 27, 2984–2993. 90. Koshland, D.E. Jr. (1994) The lock-and-key principle and the induced-fit theory. Angew. Chem. Int. Ed. Engl. 33, 2475–2478. 91. Horrobin, D.F. (2003) Opinion: Modern biomedical research: An internally selfconsistent universe with little contact with medical reality? Nat. Rev. Drug Discov. 2, 151–154.
Chapter 2
Compound Library Design for Target Families
K.V. Balakin, Y.A. Ivanenkov, and N.P. Savchuk

Summary
Chemogenomics is a modern approach to analyzing the biological effects of a wide array of small-molecule compounds on a large set of homologous receptors or other macromolecular drug targets. However, the relatively low productivity of the method and its extremely high cost together force the scientist to use additional computational tools for rational compound library design and selection. This chapter focuses on the application of a predictive mapping technology, in the context of the fundamental principles of the chemogenomic approach, to foster rational drug design and to derive information from the simultaneous biological evaluation of multiple compounds on a coherent set of biological targets.

Key words: Chemogenomics, Biological target, Neural modeling, Kohonen self-organized maps, GPCR, Compound library, Chemokine, Receptor
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575, DOI 10.1007/978-1-60761-274-2_2, © Humana Press, a part of Springer Science + Business Media, LLC 2009

1. Introduction
Among the methodological approaches currently described as promising strategies for drug design and development, chemogenomics is a discipline aimed at systematically studying the biological effects of a vast number of small-molecule compounds on a wide spectrum of principal biological targets. Given the extremely large quantity of existing data (compounds, targets, and assays) and the corresponding information flows (gene/protein expression levels and binding constants), high-throughput analysis of, and effective searching within, these data are too complex for manual manipulation, immediate representation, and a clear understanding of the phenomena observed. Information technologies and methods therefore play a crucial role in modern drug design and chemogenomics. In addition, it has become clear that mass random synthesis and screening do not necessarily provide a sufficiently large number of high-quality leads; chemogenomics, as an effective tool for rational drug design and development, and the related computational technologies are therefore in great industrial demand. Currently, insights from chemogenomics are increasingly used for the rational compilation of screening sets and for the rational design and synthesis of directed chemical libraries to accelerate drug discovery. In this chapter, we focus on the application of an advanced knowledge-based neural-net method belonging to a common class of unsupervised classification algorithms widely used in virtual screening. Compound classification methods that correlate molecular properties with specific activities play a significant role in modern virtual screening strategies. The most typical application of classification algorithms is the identification of compounds with a desired target-specific activity, which is an essential part of the virtual screening approach. Such methods can be applied to process the results of high-throughput screening or known literature data, to generate annotated and targeted libraries, and to develop neural models that predict a wide spectrum of pharmacologically relevant features, including biological activity (e.g., target-specific activity), pharmacokinetic and ADME/Tox profiles (e.g., blood-brain barrier (BBB) permeability, half-life and volume of distribution in human blood plasma, cell- and organ-specific toxicity), and various physicochemical properties (e.g., water and DMSO solubility or melting point). Thus, we have successfully applied self-organizing Kohonen maps (SOMs) to the analysis of the nontrivial space of clinically validated therapeutic agents and approved drug compounds.
In particular, the developed models can further be used to select screening candidates from virtual databases. The virtual screening technology applied here operates solely at the small-molecule level, as opposed to target structure-based design or docking methodology. The leitmotif of the method is a ligand-based strategy realized in the context of self-organizing mapping. In the framework of chemogenomics, this concept represents a consistent and valuable approach both to rational drug design and to gathering information from the simultaneous biological evaluation of many compounds on multiple biological targets.
2. Materials
2.1. Databases
First, a comprehensive knowledge database should be collected for subsequent neural-net modeling. As a rule, it is a set of compounds with experimentally determined features (such as activity against a target of interest, BBB permeability, toxicity, cytochrome P450 (CYP450) substrate affinity, etc.). Several commercially available pharmaceutical databases, such as the Prous Ensemble (1), WOMBAT (2), or Beilstein (3) databases, as well as proprietary knowledge databases, can be used separately or jointly as sources of information about structures and their specific activities. Among these sources, the Prous Ensemble database, a licensed database of known pharmaceutical agents compiled from the patent and scientific literature, seems to be one of the most convenient and experimentally validated. Based on this database, we have recently developed a model for the prediction of G-protein-coupled receptor (GPCR)-specific activity (4). The overall objective was to investigate differences between various groups of GPCR-specific ligands based on their physicochemical properties (for details see Subheading 4). The main characteristics of the Prous Ensemble database are as follows:
• More than 285,000 therapeutic agents and bioactive compounds, including macromolecules, peptidomimetics, small-molecule organic compounds, and many rare substances (e.g., unique metal-coordinated or metal-containing complexes)
• 3,128 launched drugs
• 146 preregistered compounds
• 8,355 agents in different clinical trials (Phases I–III)
• 25,300 compounds in preclinical evaluation
• 249,457 compounds currently evaluated in early biological tests
• 1,900 precedented targets with key signaling pathways
• 7,400 genomics records with various gene-related studies
• 90 biomarker records with uses/techniques
• 17,000 synthetic schemes, including more than 70,000 intermediates and reagents
• 2,400 companies and research organizations, including sales information on more than 300
It also includes more than:
• 650,000 data values from experimental pharmacology studies that delineate drug/receptor and enzyme/target cell interactions
• 340,000 data values on pharmacokinetics/metabolism for parent compounds and active metabolites
• 80,000 references to clinical trials of compounds currently under study and/or in use in humans
• 785,000 references to the current literature, abstracts and proceedings from congresses and symposia, and company communications
• 100,000 patent families from 11 leading sources
Principal information sources are the patent literature, biomedical literature and congresses, company communications, and Internet monitoring. The Ensemble database covers more than 1,500 journals and 300 conferences annually in the areas of medicinal chemistry, organic synthesis, experimental pharmacology, clinical pharmacology, biomarkers, and genomics, and tracks the activity of 2,500 pharmaceutical and biotechnology companies. The main focus is on products in discovery or development from approximately 1988 to the present.
2.2. Experimental Dataset
All structures were selected from the Ensemble database. In addition to approved therapeutic agents, the selection includes lead compounds that have entered advanced clinical or preclinical trials (a total of 16,540 compounds). Structures were extracted according to the assigned activity class, where the class indicates a common target-specific group, such as GPCRs, kinases and proteases, nuclear receptors, and ion channels, as well as more than 150 subclasses (for example, serotonin, tachykinin, and dopamine receptors; tyrosine, Abl, Aurora, and serine/threonine kinases; cysteine and serine proteases). Prior to the statistical experiments, the molecular structures should be filtered and normalized in order to fulfill certain criteria (see Subheadings 3.1.5 and 4).
2.3. Software
Over the past decade, the amount of data arising from various medicinal chemistry disciplines, especially from biological screening and combinatorial chemistry, has exploded and continues to grow at a staggering pace. Scientists are constantly inundated with all types of chemical and biological data. However, the computational tools for integrating and analyzing such data have largely failed to keep pace with these advances. Most computer programs for dimensionality reduction, mapping, and high-throughput data analysis have been implemented in a simple, often inconvenient console format, or as external modules installed under host platforms such as Microsoft Excel and MATLAB; these include, but are not limited to, SOM_PAK (console DOS-based version) (5), Ex-SOM (Excel-based interface) (6), and SOM Toolbox (MATLAB-based interface) (7). Several powerful Windows-based programs for dimensionality reduction and multiparametric data analysis are now available from commercial and academic sources. Among these, InformaGenesis (8), NeuroSolutions (9), and NeurOK (10) are typical examples of neural network-based programs that run under Windows and are targeted particularly at Kohonen and Sammon mapping. However, relatively few of these programs are specialized for chemical data analysis. Among them, SmartMining (8), which was originally developed and scientifically validated as a calculation module fully integrated into the master computing engine of the InformaGenesis platform, is a powerful program based primarily on the Kohonen SOM algorithm (see Subheading 3) and supported by many advanced modifications and task-specific modalities. The program has been designed specifically to work under the Windows operating system. In addition to the basic Kohonen settings and learning parameters, it includes significant algorithmic improvements, such as “Neural Gas,” “Duane DeSieno,” “Noise Technique,” “Two Learning Stages and 3D-architecture,” as well as several unique algorithms and specific methods, for instance “Corners,” “Gradient,” and “Automatic Descriptor Selection Algorithm (ADSA)” (8). SmartMining has been adapted for the analysis of large sets of chemical data of different types and dimensionality. It calculates more than a hundred fundamental molecular descriptors, which are divided into several logical and functional categories: basic physicochemical features, such as LogP and the numbers of H-bond donors, H-bond acceptors, and rotatable bonds; topological and electrotopological descriptors, such as the Zagreb, Wiener, and E-state indexes; and quasi-3D descriptors, such as Van der Waals volume and surface. All descriptors are calculated directly using well-known models and approximations taken from the scientific literature (11). In addition, several algorithms have been modified to obtain more exact feature prediction and/or calculation; for example, Van der Waals parameters are calculated fairly accurately by taking the overlapped volumes and/or surfaces into account. Consequently, this program is a practical tool for enhancing the informational content of virtual compound selections and can be used effectively to develop a neural model describing a target-specific activity.
3. Methods
3.1. SOM
3.1.1. The Main Concept of SOM
One of the difficult challenges in data analysis is to represent whatever complexities might be intrinsic to the data in a simple and intuitive form. Traditional methods of data visualization, such as property distribution histograms, are often inadequate for the extremely large, high-dimensional datasets common in statistical analyses of large virtual databases. To minimize the complexity and reduce the number of individual plots needed to visualize such data, one must reduce the dimensionality of the representation. Several techniques have been proposed to achieve dimensionality reduction while preserving the topology of the original space; that is, points near each other in the high-dimensional space are also near each other in the low-dimensional space. Among these algorithms, Kohonen SOMs and Sammon maps are the most widely used; they rest on different conceptual bases. SOMs (12) belong to a class of neural networks known as competitive learning or self-organizing networks, which in turn are based on an unsupervised learning rule (see Subheading 4). They were originally developed to model the ability of the brain to store complex information as a reduced set of salient facts without loss of information about their interrelationships. High-dimensional data are mapped onto a two-dimensional rectangular or hexagonal lattice of neurons in such a way as to preserve the topology of the original space. Each object, or molecule, is represented as a vector whose components are variables with a definite meaning (molecular descriptors). The Kohonen neural network automatically adapts itself to the input data in such a way that similar input objects are associated with topologically close neurons in the network. In the Kohonen approach, the network learns to locate, for each input vector, the neuron that is most “similar” to it. This means that objects located close to each other on the 2-D map have similar properties. Kohonen SOMs can be very effective for visualizing, comparing, and filtering chemical libraries. In the last decade, Kohonen maps have become popular for comparative analysis and visualization of datasets (13). Benzodiazepine and dopamine datasets were compared using an implementation of a Kohonen network (14). In another study, a dataset of 31 steroids binding to the corticosteroid binding globulin (CBG) receptor was modeled (15). SOMs were also used to distinguish between drugs and nondrugs with a set of descriptors derived from semi-empirical molecular orbital calculations (16).
A Kohonen map-based algorithm was recently used for clustering and visualization of the National Cancer Institute’s publicly available antitumor drug-screening data (17). This analysis identified relationships between chemotypes of screened agents and their effect on four major classes of cellular activities: mitosis, nucleic acid synthesis, membrane transport and integrity, and phosphatase- and kinase-mediated cell cycle regulation. In our laboratories, we developed an effective classification scheme for separating human CYP450 substrates from nonsubstrates using Kohonen SOMs (18). The same methodology was applied to discriminate between drugs with good and poor BBB permeability (19). Several important preparatory procedures are required before any knowledge-based statistical data mining experiment. Two main steps usually precede the development of a classification model: (1) collection and processing of the knowledge database to generate high-quality training datasets; (2) calculation of molecular descriptors and selection of an appropriate subset of descriptors for further modeling.
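For orientation, the competitive learning described in this subsection is usually written in the following textbook form. This is the standard Kohonen formulation, not something specific to the software used in this chapter. The symbols are assumptions introduced here for illustration: x is a descriptor vector, w_i the weight vector of neuron i, r_i its position on the lattice, α(t) the learning rate, and σ(t) the adjustment radius. The Gaussian neighborhood function is one common choice among several.

```latex
c = \arg\min_{i} \lVert x - w_i(t) \rVert, \qquad
w_i(t+1) = w_i(t) + \alpha(t)\, h_{ci}(t)\, \bigl[ x - w_i(t) \bigr], \qquad
h_{ci}(t) = \exp\!\left( -\frac{\lVert r_c - r_i \rVert^{2}}{2\,\sigma(t)^{2}} \right)
```

Here c is the winning neuron for the input x. Because neighbors of the winner on the lattice are moved in the same direction, similar inputs end up on nearby nodes, which is the topology preservation referred to above. Both α(t) and σ(t) are typically decreased as training proceeds.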
The detailed procedure for developing the target neural model is described step by step in the following subsections.
3.1.2. Import From Database
In the first step, the selected dataset of 16,540 structures (see Subheading 2.2) was exported from the Ensemble database into an SDF file, which was then imported into the SmartMining software. The internal database was thus fully formed, except for 36 records whose structures could not be recognized. The import procedure was completed in less than 2 min.
3.1.3. Descriptor Calculation
In the second step, the physicochemical and topological properties (molecular descriptors) of the compounds should be calculated and analyzed, and a minimal set of key parameters that adequately describes the selected dataset should be determined. Physicochemical properties have long been used to develop structure-activity relationships. They quantify a large number of molecular features known to determine the pharmacokinetic and pharmacodynamic properties of compounds. Among them are molecular weight (MW), octanol-water partition coefficient (logP), molar refractivity, Van der Waals volume (VDWvol) and surface area (VDWsurf), the number of filled orbitals, HOMO and LUMO energies, partial atomic charges and electron densities, dipole moment, ionization potential, and many others. Molecular connectivity or topological indices are numerical values calculated from certain invariants (characteristics) of a molecular graph (20), which encodes features such as branching, ring positions, and bond order. They are attractive for quantifying molecular diversity because they are inexpensive to compute and have been validated through years of use in the field of structure-activity correlation. The available programs calculate indices based on connectivity, shape, subgraph counts, topological equivalence, and electrotopological state (E-state indexes). Several other descriptors that encode specific molecular properties, such as the number of freely rotatable bonds (RBN), radius of gyration, and overall flexibility, can also be included in the same category. Finally, there is a large class of structural descriptors consisting of several minor subclasses, including formal, simple structure determinants, such as the number of atoms (heteroatoms), rings, or different functionalities (amide fragment, carbonyl group, etc.), and key functional motifs, such as the number of structural fragments with known physicochemical properties (H-bond acceptors and donors).
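To make the idea of a topological index concrete, the following minimal sketch computes two classical indices from a hydrogen-suppressed molecular graph. It is an illustration only, not the routine used by any particular package. The adjacency-dictionary representation and the function names are assumptions introduced here, and "Zagreb index" is taken to mean the first Zagreb index (sum of squared vertex degrees).

```python
from collections import deque

def first_zagreb_index(adjacency):
    """Sum of squared vertex degrees of a hydrogen-suppressed molecular graph."""
    return sum(len(neighbors) ** 2 for neighbors in adjacency.values())

def wiener_index(adjacency):
    """Sum of shortest-path (bond-count) distances over all unordered atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search yields topological distances from `start`.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    return total // 2  # every pair was counted once from each end

# Carbon skeletons as adjacency dictionaries (atom index -> bonded atom indices).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(first_zagreb_index(n_butane), wiener_index(n_butane))    # 10 10
print(first_zagreb_index(isobutane), wiener_index(isobutane))  # 12 9
```

The two isomers have the same atom count but different index values, which shows how such indices encode branching.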
The applicability and use of various topological indices are discussed in several recent papers (21, 22). A thorough overview of molecular descriptors can be found in a comprehensive reference work that reviews the field from its origins to the present day (11). Molecular property descriptors have also been covered extensively in several excellent reviews (23, 24). Using the descriptor calculation module integrated in the SmartMining software, more than 100 molecular descriptors were calculated, including physicochemical properties, such as LogP, VDWvol/surf, and MW; topological descriptors, such as the Zagreb, Wiener, and E-state indexes; and various structural descriptors, such as the number of H-bond acceptors/donors and RBN.
3.1.4. Descriptor Selection
After the calculation, a feature reduction stage should be performed (for details see Subheading 4). In the modeling studies described here, we reduced the number of input variables using a unique algorithm, Automatic Descriptors Selection (ADS), implemented in the SmartMining software. The main concept of ADS is briefly described in Subheading 4. In addition to ADS, several well-known and fully validated methods for dimensionality reduction can also be used, for example Principal Component Analysis (PCA). The principles of PCA have been described many times in the scientific literature (25, 26) and are not described here. The selection procedure yielded an experimental set of seven molecular descriptors: the Zagreb index; E-state indexes for the structural fragments >C–, –CH2–, and –CH3; the number of H-bond donors; HB2 (a structural descriptor that encodes the strength of H-bond acceptors following an empirical rule); and LogP. After all the preparatory procedures were complete, the reference database with the selected molecular descriptors was used to develop an in silico model with the most appropriate architecture and learning strategy (see Subheading 3.1.6). The key examples cited in Subheading 4 represent real computational filtering technologies developed at ChemDiv, Inc. to enhance the knowledge-based content of exploratory chemical libraries for biological screening at the stage of combinatorial synthesis planning. In particular, several methodological pathways toward the design of target-specific combinatorial libraries using different computational approaches are also described in Subheading 4. Prior to the statistical experiments, the molecular structures should be filtered and normalized in order to fulfill certain criteria.
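As a brief illustration of the PCA alternative mentioned in this subsection, the sketch below extracts the first principal component of a small descriptor table by power iteration on the covariance matrix. It is a generic textbook procedure, not the ADS algorithm and not the implementation of any cited package. The function name, the start vector, and the toy table are assumptions made for the example.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (unit direction, explained-variance fraction) of the first PC."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    # Sample covariance matrix of the descriptor columns.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    # Power iteration: repeated multiplication converges to the dominant
    # eigenvector, provided the start vector is not orthogonal to it.
    vec = [1.0 / (j + 1) for j in range(dims)]
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                     for i in range(dims) for j in range(dims))
    total_variance = sum(cov[i][i] for i in range(dims))
    return vec, eigenvalue / total_variance

# Toy table: columns 0 and 1 are perfectly correlated, column 2 is small noise.
table = [[1.0, 1.0, 0.1], [2.0, 2.0, -0.1], [3.0, 3.0, 0.1], [4.0, 4.0, -0.1]]
direction, explained = first_principal_component(table)
print([round(v, 3) for v in direction], round(explained, 3))
```

In this toy case nearly all of the variance lies along a single direction shared equally by the two correlated columns, so one of them carries almost no additional information. In practice the descriptor columns would be scaled first, since PCA is sensitive to units.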
3.1.5. Database Preprocessing
The filters described here are simple structural and descriptor-based limits intended mainly to exclude obviously erroneous and outlier structures, as well as structures with descriptor values lying far outside a normal distribution criterion, for example a classical Gaussian distribution with a single or double dispersion threshold. The preprocessing was performed according to the following scheme:
1. Remove structures with obvious errors.
2. Remove counterions and solvent molecules in order to obtain single-compound records.
3. Where possible, neutralize charges at acidic and basic groups by adding or removing protons.
4. Remove duplicates within the reference database.
5. Remove redundant tautomeric forms of compounds.
6. Remove compounds that do not pass a rejection filter (specific for each particular task).
7. Remove compounds with extremely low or high values of MW (MW < 60 or MW > 700).
8. Remove compounds with very low or high values of LogP (LogP < −6 or LogP > 14).
Point 6 needs some additional explanation. For many practical tasks, the compounds are required to pass a rejection filter that removes chemically reactive or otherwise unsuitable compounds. The filtering criteria depend on the particular task. For example, compounds can be filtered based on the presence of chemically reactive or toxicophoric fragments, such as alkylating or acylating groups, strong electrophiles, etc. Care should be taken when using a standard set of reactive fragments to exclude hypothetically hazardous compounds. For some specific classes of pharmaceutical chemicals, such as oncolytic or protease-active agents, chemically reactive “warheads” can be necessary elements of the structure. The rejection filter can also include a number of criteria that ensure the general druglikeness (27) or leadlikeness (28) of compounds. In addition, it is usually necessary to filter molecules based on atom type content, because many standard programs for calculating molecular descriptors work correctly only with structures containing C, N, O, H, S, P, F, Cl, Br, and I atoms. As a result of all the preprocessing procedures listed, a final dataset of 16,000 structures was prepared for computational modeling. The following procedure should be carried out for Kohonen network training and map generation.
3.1.6. Kohonen Map Generation
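Before turning to map generation, the purely rule-based part of the preprocessing scheme in Subheading 3.1.5 (duplicate removal, the atom-type check, and the MW and LogP limits) can be sketched as follows. This is a minimal illustration under stated assumptions. The record layout, field names, and example values are hypothetical. Structure standardization (steps 1–3 and 5) and the task-specific rejection filter (step 6) require a cheminformatics toolkit and are assumed to have been done upstream.

```python
# Limits taken from the preprocessing scheme in the text.
ALLOWED_ELEMENTS = {"C", "N", "O", "H", "S", "P", "F", "Cl", "Br", "I"}
MW_RANGE = (60.0, 700.0)
LOGP_RANGE = (-6.0, 14.0)

def passes_filters(record):
    """Apply the atom-type, MW, and LogP limits to one compound record."""
    if not set(record["elements"]) <= ALLOWED_ELEMENTS:
        return False
    if not MW_RANGE[0] <= record["mw"] <= MW_RANGE[1]:
        return False
    return LOGP_RANGE[0] <= record["logp"] <= LOGP_RANGE[1]

def preprocess(records):
    """Drop duplicate structures, then keep records that pass all limits."""
    seen_keys = set()
    kept = []
    for record in records:
        # `structure_key` stands for any canonical structure identifier
        # (hypothetical field; assumed to be computed elsewhere).
        key = record["structure_key"]
        if key in seen_keys:
            continue
        seen_keys.add(key)
        if passes_filters(record):
            kept.append(record)
    return kept

# Illustrative records only; the values are made up for the example.
records = [
    {"id": "cpd-1", "structure_key": "key-A", "elements": {"C", "H", "O"},
     "mw": 180.2, "logp": 1.2},
    {"id": "cpd-2", "structure_key": "key-A", "elements": {"C", "H", "O"},
     "mw": 180.2, "logp": 1.2},   # duplicate of cpd-1
    {"id": "cpd-3", "structure_key": "key-B", "elements": {"H", "O"},
     "mw": 18.0, "logp": -1.4},   # MW below 60
    {"id": "cpd-4", "structure_key": "key-C", "elements": {"N", "H", "Cl", "Pt"},
     "mw": 300.0, "logp": -2.2},  # unsupported atom type
    {"id": "cpd-5", "structure_key": "key-D", "elements": {"C", "H", "N", "O"},
     "mw": 850.4, "logp": 4.1},   # MW above 700
    {"id": "cpd-6", "structure_key": "key-E", "elements": {"C", "H", "N", "O"},
     "mw": 314.5, "logp": 3.4},
]
kept = preprocess(records)
print([record["id"] for record in kept])  # ['cpd-1', 'cpd-6']
```

Keeping the limits as named constants makes it easy to relax them for tasks in which, for example, reactive warheads or unusual atom types are acceptable.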
The formal procedure for Kohonen SOM’s generation includes the following stages: 1. Identify and label molecular descriptors (seven descriptors) and training set compounds, which will be used for modeling. 2. Set learning parameters: the number of learning iterations = 3,000, the starting adjustment radius = 8, the initial learning rate = 0.3. In many cases, a unique algorithm, named “Corners,” distributed as additional optionality to the parental SmartMining package can be effectively used to achieve the best results of sample separation. Thus, the algorithm was applied to the training dataset (number of corners = 4).
Balakin, Ivanenkov, and Savchuk
3. Normalize descriptor values.
4. Set a proper map size to provide the studied molecules with a sufficient distribution space. For most applications, an optimal map contains 10–100 objects (molecules) per node. A map resolution of 14 × 14 was chosen, which was expected to provide a satisfactory distribution within the constructed Kohonen map (random threshold = 16,000/196 ≈ 82 compounds per node).
5. Determine cell architecture (tetragonal cell) and map dimension (2D).
6. Run the learning process.

After the training is accomplished, the resulting Kohonen map can be readily formed, and the model can be saved and further used for testing and visualizing other objects on the same map. The whole SOM of 16,000 pharmaceutical leads and drugs generated as a result of the unsupervised learning procedure is depicted in Fig. 1. It shows that the studied compounds occupy a wide area on the map, which can be characterized as the area of druglikeness. The distribution of various target-specific groups of ligands in the Kohonen map demonstrates that most of these groups have distinct locations in specific regions of the map (Fig. 2a–l).
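The stages above can be illustrated with a minimal, self-contained sketch of Kohonen map training. It is not the SmartMining implementation: the function names, the random weight initialization, the Gaussian neighbourhood, the linear decay of learning rate and radius, and the reading of one "iteration" as a single-sample update are all assumptions made for illustration. Only the default settings (14 × 14 map, 3,000 iterations, starting radius 8, initial learning rate 0.3) are taken from the text, and the "Corners" option is not modeled.

```python
import math
import random


def min_max_normalize(table):
    """Scale every descriptor column to the range [0, 1] (stage 3)."""
    columns = list(zip(*table))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in table
    ]


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wk - xk) ** 2 for wk, xk in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows=14, cols=14, n_iter=3000,
              radius0=8.0, lr0=0.3, seed=0):
    """Train a rectangular 2D Kohonen map by online (one-sample) updates."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Hypothetical initialization: random weights in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for step in range(n_iter):
        remaining = 1.0 - step / n_iter
        lr = lr0 * remaining                    # assumed linear decay
        radius = max(radius0 * remaining, 0.5)  # assumed linear shrinkage
        x = data[rng.randrange(len(data))]
        br, bc = best_matching_unit(weights, x)
        for r in range(rows):
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                if d2 > radius ** 2:
                    continue  # node lies outside the adjustment radius
                h = math.exp(-d2 / (2.0 * radius ** 2))
                w = weights[r][c]
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights
```

Calling `train_som(min_max_normalize(descriptor_table))` on a table with one row of seven descriptor values per compound would reproduce the settings quoted above; in pure Python this is slow for large datasets, so the sketch is meant for reading rather than production use.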
Fig. 1. Property space of 16,000 pharmaceutical leads and drugs visualized using the Kohonen map. The data have been smoothed.
Compound Library Design for Target Families
Fig. 2. Distribution of twelve large target-specific groups of pharmaceutical agents within the Kohonen map: (a) G-protein-coupled receptor (GPCR) agonists/antagonists (5,432 compounds); (b) matrix metalloproteinase inhibitors (120 compounds); (c) tyrosine kinase inhibitors (175 compounds); (d) caspase inhibitors (50 compounds); (e) NMDA receptor agonists/antagonists (150 compounds); (f) potassium channel blockers/activators (302 compounds); (g) reverse transcriptase inhibitors (160 compounds); (h) serine protease inhibitors (531 compounds); (i) p38 MAPK inhibitors (100 compounds); (j) histamine receptor antagonists (168 compounds); (k) lipoxygenase inhibitors (114 compounds); (l) serine/threonine kinase inhibitors (120 compounds).
A possible explanation of these differences is that, as a rule, receptors of one type share a structurally conserved ligand-binding site. The structure of this site determines the molecular properties that a receptor-selective ligand should possess to bind the site properly. These properties include specific spatial, lipophilic, and hydrogen-bonding parameters, as well as other features influencing the pharmacodynamic characteristics. Therefore, every group of active ligand molecules can be characterized by a unique combination of physicochemical parameters differentiating it from other target-specific groups of ligands. Another explanation of the observed phenomenon may be related to the different pharmacokinetic requirements for drugs acting on different biotargets. The described algorithm represents an effective procedure for selection of target-biased compound subsets compatible with high-throughput in silico evaluation of large virtual chemical space. Whenever a large enough set of active ligands is available for a particular receptor, a quantitative discrimination function can be generated, allowing selection of a series of compounds to be assayed against the target. Once a Kohonen network is trained and the specific locations of the target-activity groups of interest are identified, the model can be used for testing any available chemical database with the same calculated descriptors. The Kohonen mapping procedure is computationally inexpensive and permits real-time calculations with moderate hardware requirements. Thus, for a training database consisting of 16,000 molecules with seven descriptors and 3,000 iterations, approximately 1 h is required to train the network on a standard PC (Pentium 3-GHz processor) on a Windows 2000/XP platform. The time increases almost linearly with the size of the database. After the Kohonen network is trained, the 2D map can be created in a short time.
It is important to note that focusing on physicochemical rather than structural features makes this approach complementary to any available ligand structure similarity technique. Our own experience and literature data demonstrate that SOMs are an efficient clustering, quantization, classification, and visualization tool very useful in the design of chemical libraries. Possible limitations of this approach are related to the fact that the SOM algorithm is designed to preserve the topology between the input and grid spaces; in other words, two closely related input objects will be projected on the same or on close nodes. At the same time, the SOM algorithm does not preserve distances: there is no relation between the distance between two points in the input space and the distance between the corresponding nodes. The latter fact sometimes makes the training procedure unstable, when the minor changes in the input
parameters lead to serious perturbations in the output picture. As a result, it is often difficult to find the optimal training conditions for better classification. Another potential problem is associated with the quantization of the output space: the resolution of small maps can be insufficient for effective visualization of differences between the studied compound categories.

3.2. Focus on GPCR Target Classes
Because of the specific cluster structure determined by the internal activation (neuron) function originally implemented in Kohonen-based SOMs, it is very difficult to obtain a fully scalable and, at the same time, adequate map that preserves all the distances observed among the input samples. Fortunately, using the constructed map it is possible to “zoom” into the GPCR area. Thus, compounds acting specifically on different GPCR subclasses, including α/β-adrenoceptors, dopamine D1–D4 receptors, tachykinin NK1/NK2, serotonin, and chemokine receptors, can also be successfully separated within the same map (Fig. 3a–i). As shown in Fig. 3, small-molecule compounds targeted against different GPCR subclasses are commonly located in different areas of the Kohonen map with minor overlap. Since a key objective of our research is to analyze the chemokine receptor superfamily in the context of a chemogenomic approach adapted specifically for compound library design, we have also studied the distribution within the Kohonen map of compounds acting against different chemokine receptor subclasses (Fig. 4a–f). Fig. 4 shows that, in general, the studied pharmaceutical agents occupy different areas within the obtained map; for example, the location of CXCR4 antagonists differs significantly from that of CCR1/2 antagonists, while it is similar to that of CCR3/5 antagonists.
3.3. Similarity Across the Chemokine Receptor Superfamily
During the past decade, the main paradigm in medicinal chemistry has been shifting gradually from traditional receptor-specific studies and biological assays to a novel cross-receptor vision. Currently, this approach is increasingly applied throughout pharmaceutical research to enhance the efficiency of modern drug discovery. Following the fundamental principle of chemogenomics, receptors are no longer viewed as single entities but are grouped into sets of related proteins or receptor families that are explored in a systematic manner. This interdisciplinary approach aims primarily to find the links between the chemical structures of bioactive molecules and the receptors with which these molecules interact. Chemogenomics is based on a paradigm originally introduced by Klabunde (29), who formulated the basic principle: “similar receptors bind similar ligands.” In other words, for a receptor as drug target of interest, known drugs and ligands of similar
Fig. 3. Distribution of several G-protein-coupled receptor (GPCR)-specific groups of pharmaceutical agents on the Kohonen map: (a) chemokine receptor agonists/antagonists (1,212 compounds); (b) dopamine D1–D4 agonists/antagonists (320 compounds); (c) endothelin ETA/ETB antagonists (121 compounds); (d) CCKA/B agonists/antagonists (77 compounds); (e) bradykinin agonists/antagonists (12 compounds); (f) muscarinic M1 agonists (121 compounds); (g) serotonin receptor agonists/antagonists (477 compounds); (h) tachykinin NK1/NK2 antagonists (103 compounds); (i) β-adrenoceptor agonists/antagonists (556 compounds).
receptors, as well as compounds similar to these ligands, can serve as a convenient starting point for drug discovery. However, the obvious question is: “How can receptor similarity be defined?” A recent review by Rognan (30) provides a comprehensive overview
Fig. 4. Distribution of six chemokine-specific groups of pharmaceutical agents within the Kohonen map: (a) CXCR4 antagonists (80 compounds); (b) CXCR1/2 antagonists (442 compounds); (c) CCR5 antagonists (238 compounds); (d) CCR3 antagonists (182 compounds); (e) CCR1/2 antagonists (180 compounds); (f) CCR4 antagonists (13 compounds).
on how chemogenomic approaches define receptor and/or ligand similarity and presents case studies on how this knowledge has been applied to rational drug design. The main receptor similarity principles formulated by Rognan are listed as follows:
• Receptor class (e.g., GPCRs)
• Receptor subclass (e.g., chemokine receptors)
• Overall sequence homology (phylogenetic tree)
• Similarity of active binding site (3D structure or 1D sequence motifs)
It should be particularly noted that numerous chemogenomic approaches merely apply the classification of target families (such as ion channels, kinases, proteases, nuclear receptors, GPCRs, etc.) or protein subfamilies (such as tyrosine kinases, chemokine receptors, serine proteases, etc.) without taking into account similarities of the determined or assumed ligand-binding sites. However, there is strong evidence that the complex analysis of receptors, which includes a formal receptor classification, sequence homology, 3D similarity, and active binding site construction,
provides a more relevant and adequate strategy for the modern concept of drug design and discovery. The chemokine superfamily includes a large number of ligands that bind to a smaller number of receptors (31, 32); moreover, multiple chemokine ligands can bind to the same receptor and vice versa. Whereas the perceived complexity and promiscuity of receptor binding introduce an additional challenge in understanding the common mechanism of chemokine ligand binding, with respect to chemogenomics they provide a valuable starting point for investigating key interrelationships across the chemokine receptor subfamily. Chemokines (chemotactic/chemoattractant cytokines) are highly basic, small secreted proteins consisting on average of 70–125 amino acids, with molecular masses ranging from 6 to 14 kDa, which mediate their effects by binding to a specific family of seven-transmembrane-domain (7-TMS) GPCRs located on the target cell membrane. Since chemokine receptors are members of the common GPCR family, the first two similarity criteria (see above) are fulfilled. Currently, there are more than 20 functionally signaling chemokine receptors and more than 45 corresponding chemokine ligands in humans (33). The chemokine ligands and receptors have been divided into several major groups based on their expression patterns and functions. In addition, their genomic organization provides an alternative chemokine classification. This is apparent from the phylogenetic trees presented in Fig. 5a, b. The chemokine receptor CXCR4 has multiple fundamental functions in both normal and pathologic physiology. CXCR4 is a GPCR that transduces signals of its endogenous ligand, the chemokine CXCL12 (stromal cell-derived factor-1, SDF-1, previously SDF-1α). The interaction between CXCL12 and CXCR4 plays a critical role in the migration of progenitors during embryologic development of the cardiovascular, hemopoietic, and central nervous systems, among others.
This interaction is also known to be involved in several intractable disease processes, including HIV infection, cancer cell metastasis, leukemia cell progression, rheumatoid arthritis (RA), asthma, and pulmonary fibrosis. Unlike other chemokine receptors, CXCR4 is expressed in many normal tissues, including those of the central nervous system, while it is also commonly expressed by over 25 different types of tumor cells, including cancers of epithelial, mesenchymal, and hematopoietic origin, among others (35). Since CXCR4 is the most actively studied chemokine receptor and a number of small-molecule compounds are currently known to modulate its basic functions, it is reasonable to investigate similarity links between this and other chemokine receptors based on the formulated criteria (see above). As shown in the phylogenetic dendrogram, CXCR4 is located close to CXCR1, CXCR2, and CXCR3. This means primarily that these receptors possess a similar genotype, and based on this observation they
Fig. 5. Sequence relationship analysis of the human (h) and mouse (m) (a) chemokines and (b) chemokine receptors (34). In (a), the GRO and IP10 groups of CXC chemokines and the MCP and MIP groups of CC chemokines are circled.
can be logically grouped into the common CXCR family, which differs genetically, although not significantly, from the CCR subclass. At first sight it seems perfectly reasonable to investigate the small-molecule space around the whole CXCR subclass; however, the last similarity criterion has still not been considered. The composition of the binding site and the shape of the corresponding cavity jointly play a key role in the ligand binding process. Furthermore, the majority of ligand-receptor complexes are not static structures; they can change dynamically upon ligand binding. In addition, induced conformational changes across the active binding site can also arise from partial ligand binding followed by the formation of an internal cavity suited to deeper embedding of the ligand. Several scientific reports have highlighted the partial sequence homology (25–30%) and high binding-site similarity between CCR5 and CXCR4 (36, 37). For instance, using MembStruk methods to develop 3D protein structures for CXCR4 and CCR5, and the HierDock protocol to define the binding site for both receptors, it was shown that the two binding sites, even though they lie on different sides of their receptors, have similar characteristics (37). In both cases, the CCR5 and CXCR4 MembStruk structures were also used to correctly identify the binding site regions in agreement with mutational studies. In addition, a high degree of similarity was also determined for CCR5 and CCR3 (38). Therefore, from the chemogenomics point of view, it is of practical relevance to test the agents acting against CXCR4 for activity toward CXCR1–3 and CCR3/5 as well. Thus, compounds are profiled against a set of receptors rather than tested against single targets. Returning to Fig. 4, it is noteworthy that the ligands that bind to CCR3 and CCR5 are located close to those of CXCR4 on the Kohonen map, with significant overlap.
Combining the results of our computational modeling and theoretical analysis, it can reasonably be concluded that the applied mapping technique represents a useful approach to filtering combinatorial libraries for the selection of target-specific subsets, including those acting against the chemokine receptor superfamily. It allows the size of the initial multidimensional chemistry space to be reduced by up to two orders of magnitude and can be recommended as an efficient classification and visualization tool for practical combinatorial design. It is important that this property-based method is complementary to other target- and ligand structure-based approaches to virtual screening. In addition, Kohonen-based SOMs are fully compatible with both high-throughput virtual screening protocols and the analysis of small-to-medium-sized combinatorial libraries. The principal limitation of this statistics-oriented technique is that, prior to the experiment, one should have a large enough dataset of compounds active against the target of interest.

3.4. Internal Database Analysis
The described algorithm is quite an effective tool for synthesis planning of de novo chemical libraries. Due to a series of specific filters, the properties of a virtual chemical space to be synthesized
can be modulated in a wide range of possibilities in order to optimize them according to the purposes of a particular bioscreening program. Usually, the practical design of target-specific combinatorial libraries also includes elements of other virtual screening approaches, such as selection by structural similarity to known selective ligands (including bioisosteric, topologic, heterocyclic, and substructure similarity), 3D pharmacophore search, flexible docking, etc. After synthetic feasibility assessment, the combinatorial libraries focused toward particular biotargets are synthesized and used in primary screenings. This general strategy is applicable for generating the focused libraries toward several protein target classes, such as GPCRs, protein kinases, nuclear receptors, and ion channels. Thus, using the constructed Kohonen map, we have tested an internal set of diverse representative compounds obtained from ChemDiv chemical database (39) (these libraries are available as commercial products at ChemDiv, Inc.). The corresponding procedure is described in the following subsections.

3.4.1. Import from Database
Initially, a set of compounds consisting of 20,000 structures of high diversity (see Subheading 4) was exported from the ChemDiv database as an SDF file with a unique ID number for each structure. Subsequently, it was imported into the SmartMining software to form the experimental internal database. The import procedure was completed in less than 3 min.
3.4.2. Descriptor Calculation and Mapping
Just after the import stage was finished, the previously saved neural model was loaded and the appropriate descriptors were automatically calculated. After the descriptor calculation procedure was completed, the location of the tested structures was determined using the Kohonen algorithm. The corresponding maps are shown in Fig. 6a–c.
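The mapping step can be sketched as follows: each tested compound is assigned to the node with the closest weight vector, and compounds landing on nodes previously labeled as a target area are retained. This is only an illustration under stated assumptions; the function names, the toy 2 × 2 map, the descriptor values, and the compound IDs are hypothetical and do not come from the SmartMining software or the ChemDiv database.

```python
def nearest_node(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    nodes = ((r, c) for r in range(len(weights))
             for c in range(len(weights[0])))
    return min(
        nodes,
        key=lambda rc: sum((w - v) ** 2
                           for w, v in zip(weights[rc[0]][rc[1]], x)),
    )


def select_in_area(weights, compounds, area_nodes):
    """Map compound IDs to nodes and keep those landing in area_nodes."""
    hits = []
    for compound_id, descriptors in compounds.items():
        node = nearest_node(weights, descriptors)
        if node in area_nodes:
            hits.append((compound_id, node))
    return hits


# Toy 2 x 2 map with two normalized descriptors per node (hypothetical).
toy_weights = [[[0.1, 0.1], [0.1, 0.9]],
               [[0.9, 0.1], [0.9, 0.9]]]
# Hypothetical compound IDs with precomputed, normalized descriptors.
toy_compounds = {"ID-1": [0.2, 0.8], "ID-2": [0.8, 0.2], "ID-3": [0.95, 0.9]}
# Nodes assumed to have been labeled as the target area after training.
target_area = {(0, 1), (1, 1)}

selected = select_in_area(toy_weights, toy_compounds, target_area)
print(selected)  # [('ID-1', (0, 1)), ('ID-3', (1, 1))]
```

The descriptors of the tested compounds must be calculated and normalized in exactly the same way as for the training set, otherwise the node assignment is not meaningful.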
Fig. 6. Distribution of the tested compounds (dotted line) within the Kohonen map: (a) the overlap with G-protein-coupled receptors (GPCR) agonists/antagonists area; (b) the overlap with chemokine receptor antagonist areas; (c) the selection of compounds which can be regarded as potential agents acting against CXCR4 chemokine receptor (shaded area).
3.4.3. Export Results
The compounds that were formally classified by the algorithm as potential CXCR4 agents (5,130 structures) were then exported into an external SDF file containing two principal fields: a unique ID number and the corresponding score value, which depended solely on the “privileged” factor calculated individually for each Kohonen neuron except inactive units. Across the neurons occupied by CXCR4 agonists/antagonists and directly classified as the CXCR4 area, the maximum “privileged” factor was 2.5, the minimum was 1.1, and the average was 1.3. The latter value means that, among the neurons assigned to the CXCR4 area, the average “privileged” index, calculated as (CXCR4 agents in % per neuron)/(other pharmaceutical agents in % per neuron), is equal to 1.3. Therefore, the developed in silico model can reasonably be regarded as a predictive system for chemogenomics-based compound library design.
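The “privileged” factor defined above can be written out as a short calculation. The sketch assumes that "in % per neuron" means the percentage of each group's total population that falls on the given neuron; this reading, the function name, and the per-neuron counts in the example are assumptions. The group sizes (80 CXCR4 antagonists, 16,000 compounds overall) are the figures quoted in this chapter.

```python
def privileged_factor(target_in_node, target_total,
                      other_in_node, other_total):
    """Ratio of the target group's share in a node to the share of all others.

    Shares are percentages of each group's total population (assumed
    interpretation of "in % per neuron").
    """
    target_pct = 100.0 * target_in_node / target_total
    other_pct = 100.0 * other_in_node / other_total
    if other_pct == 0.0:
        return float("inf")  # node occupied only by the target group
    return target_pct / other_pct


# Hypothetical neuron holding 2 of 80 CXCR4 antagonists (2.5%) and
# 199 of the remaining 15,920 compounds (1.25%).
factor = privileged_factor(2, 80, 199, 16000 - 80)
print(round(factor, 2))  # 2.0
```

A factor above 1 indicates that the neuron is enriched in the target group relative to the rest of the dataset, which is how the values of 1.1 to 2.5 reported for the CXCR4 area can be read.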
3.5. Conclusion
At present, drug discovery technologies are undergoing radical changes due to both remarkable progress in genomic research and the massive advent of combinatorial synthesis and high-throughput biological screening. Among these approaches, chemogenomics, as an alternative route to innovative drug discovery, provides novel insights into receptor–ligand interaction and molecular recognition through the analysis of large biological activity datasets. Furthermore, rational drug design based primarily on the fundamental concept of this approach often complements high-throughput screening in finding chemical starting points for novel drug discovery projects. The greatest impact of chemogenomic approaches can be expected for targets with no or sparse ligand information as well as for targets lacking 3D structural data. For these targets, classical drug design strategies such as ligand-based and structure-based virtual screening and/or de novo design cannot be applied. Considering the poor productivity and high cost of the methods described, the key question remains: how can rational compound selection be achieved? One solution lies in the integration of advanced computational approaches, such as neural modeling, into drug discovery. In combination, these approaches provide excellent results in both rational “in silico” drug design and “new-age” bioscreening technologies.
4. Notes

1. Databases. The key to success of any statistics-oriented predictive modeling is the availability of a large set of quality data used as a training set. Usually, at least several tens of structures
belonging to a defined activity category are required for generation of a robust classification model, though this number can vary widely depending on the task. Sometimes, an optimal statistical experiment requires up to hundreds of thousands of compounds in the training set.
2. Dimensionality reduction (descriptor selection). The high-dimensional data representations that are commonplace in statistical modeling applications pose a number of problems. Firstly, as the number of variables used to describe data increases, the likelihood that some of the variables are correlated increases dramatically. While certain applications are more sensitive to correlation than others, in general, redundant variables tend to bias the results. Secondly, the computational effort needed to perform the analysis increases in proportion to the number of dimensions. The latter fact is particularly significant in virtual screening programs, where the possibility of real-time calculations is crucial. Therefore, to simplify the analysis and representation of the data, it is often desirable to reduce the dimensionality of the space by eliminating dimensions that add little to the overall picture. Several techniques have been developed to perform dimensionality reduction, such as PCA (25, 26), multidimensional scaling (40, 41), genetic algorithms (42), and sensitivity analysis (43). It should be noted, however, that none of the existing methods is guaranteed to extract the optimal set of important features for the application at hand. Moreover, the underlying molecular features that influence the biological activities of drugs are usually unknown, and this fact makes a priori feature selection problematic in many cases. It can be concluded that selection of variables for QSAR applications is a difficult and, despite many different statistical criteria for the evaluation of the resulting models, highly subjective and ambiguous procedure (44).
This is why in many practical experiments, particularly those associated with the use of nonlinear learning approaches, computational chemists have to make intuitive decisions upon selecting molecular descriptors. In our study we have successfully applied a unique algorithm named “Automatic Descriptor Selection Algorithm” to reduce the dimensionality of the original input space consisting of more than 100 molecular descriptors. The fundamental concept of this method lies generally in the preorganization of Kohonen neurons and assigned weight coefficients based on several common principles. Conceptually, the method resembles a sensitivity analysis widely used in computational modeling. Gradually adding the next descriptor it painstakingly attempts to find the optimal positions of input objects with a maximum degree of dissimilarity between each other following the corresponding metric distances. Starting from any corner
of the Kohonen map, each subsequent vector of descriptor values passing straight through the map walks step by step across the perimeter until the best separation among the input objects is achieved. The algorithm can be performed using both supervised and unsupervised logic. During each cycle it can also be supplemented by a minor learning procedure to estimate the total sensitivity of the temporarily fixed vector net. As a rule, the selection procedure is continued until the predefined number of descriptors is reached.
3. Learning strategy. In this study, we have successfully applied the unsupervised learning strategy to design a target-specific library. At the same time, most of the reported classification strategies in drug discovery have used an alternative, supervised learning strategy. Generally, the choice between the supervised and the unsupervised learning approach depends directly on the problem and the available data. In both cases, objects with known answers are needed. In supervised learning, the answers are directly used to influence the learning system; in unsupervised learning, the answers are needed to identify and label the output neurons. Whereas with supervised learning the system adapts itself to a selected representation of classes, an unsupervised learning method is more flexible due to its many possible outputs. In supervised learning, the multivariate objects should be split into three sets (the training, the control, and the test set). In unsupervised learning, the control set is not required, since the learning continues until the network stabilizes. It should also be emphasized that Kohonen or Sammon map-based classification methods do not depend on the definition of a negative set, and, therefore, the virtual search for compounds belonging to a particular category of interest can be conducted more objectively.
This feature of unsupervised learning strategies is particularly important in many practical tasks when it is hard to define the negative training set correctly.
4. Learning parameters. There are several key program settings that should be manually determined before Kohonen-based learning begins. The first setting is the map size or map resolution. Following an empirical rule, it should be chosen so that a normal, statistically relevant distribution of the studied objects across the whole map is likely. With respect to the prediction accuracy of unsupervised neural algorithms such as Kohonen or Sammon mapping, there are methodologies that allow users to calculate an index of separation among the tested objects located on the map using a simple criterion, a mathematical equation, or advanced techniques such as the Support Vector Machine. Whereas the last technique can be effectively applied with the Sammon algorithm, a simple statistical criterion is more suitable for Kohonen maps.
Usually, an optimal map contains 10–100 objects (molecules) per node. To achieve a satisfactory statistical outcome, it is of utmost importance to consider a random threshold: the assumed average occupancy rate among all the Kohonen neurons should be not less than that of a random distribution, i.e., in our case, the random threshold = 16,000/196 ≈ 82 compounds per node. The obtained results have shown that the average occupancy rate significantly exceeds the random threshold in 90% of the active Kohonen neurons. The number of learning epochs (iterations) should be selected considering the initial learning rate. As a rule, a small number of epochs (100–500) forces users to choose a large value of the initial learning rate (0.5–0.7), and vice versa, to achieve an appropriate and, at the same time, fast convergence of the algorithm. It is important to note that in the first case the learning step can be too large to avoid distortions, with many weight coefficients changing discontinuously and yielding an inadequate separation, while in the second case global convergence of the algorithm may not be achieved either. In turn, the starting adjustment radius should be chosen considering the map size selected previously. Usually, during the initial iterations this setting should allow all neurons to be active; it then gradually decreases over subsequent learning epochs, finally tending to a value close to zero. Thus, in our study we have selected the following algorithm settings: the number of learning iterations = 3,000, the starting adjustment radius = 8, and the initial learning rate = 0.3.
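The arithmetic behind the random threshold and the decay of the learning settings can be sketched as below. The linear decay law is an assumption made for illustration (the text only states that the radius tends to a value close to zero), and the function names and the occupancy list are hypothetical; the numbers 16,000, 14 × 14, 3,000, 8, and 0.3 are those given in this note.

```python
def random_threshold(n_compounds, rows, cols):
    """Average number of compounds per node under a uniform distribution."""
    return n_compounds / (rows * cols)


def linear_decay(start, step, n_steps, floor=0.0):
    """Value of a setting after `step` of `n_steps` updates (assumed linear)."""
    return max(start * (1.0 - step / n_steps), floor)


def share_above_threshold(occupancy, threshold):
    """Fraction of active (non-empty) nodes whose occupancy exceeds threshold."""
    active = [n for n in occupancy if n > 0]
    return sum(n > threshold for n in active) / len(active)


threshold = random_threshold(16000, 14, 14)
print(round(threshold))                 # 82
print(linear_decay(8.0, 1500, 3000))    # 4.0, radius halfway through training
print(linear_decay(0.3, 3000, 3000))    # 0.0, learning rate at the end

# Hypothetical occupancies for five nodes, one of them empty (inactive).
print(share_above_threshold([120, 95, 0, 60, 150], threshold))  # 0.75
```

Other decay laws (for example exponential) are equally common for Kohonen maps; the linear form is used here only because it makes the start and end values easy to check by hand.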
References 1. http://www.prous.com. 2. Olah, M., Mracec, M., Ostopovici, L., Rad, R., Bora, A., Hadaruga, N., Olah, I., Banda, M., Simon, Z., Mracec, M., and Oprea, T. I. (2004) WOMBAT: World of molecular bioactivity. In: Oprea, T. I. (ed.) Cheminformatics in Drug Discovery. Wiley-VCH, Weinheim, pp. 223–239. 3. Vanco, J. (2003) The Beilstein CrossFire Information System and its use in pharmaceutical chemistry. Ceska Slov. Farm. 52, 68–72. 4. Ivanenkov, Y. A., Balakin, K. V., Skorenko, A. V., Tkachenko, S. E., Savchuk, N. P., Ivachtchenko, A. A., and Nikolsky Y. (2003) Application of advanced machine learning algorithm for profiling specific GPCR-active compounds. Chem. Today 21, 72–75. 5. http://www.cis.hut.fi. 6. http://www.geocities.com. 7. http://www.cis.hut.fi.
8. 9. 10. 11.
http://www.informagenesis.com. http://www.nd.com. http://www.neurok.ru. Todeschini, R., Consonni, V., Mannhold, R., Kubinyi, H., and Timmerman, H. (2000) Handbook of Molecular Descriptors. Wiley, New York. 12. Kohonen, T. (1990) The self-organizing map. Proceedings of the IEEE 78, 1464–1480. 13. Anzali, S., Gasteiger, J., Holzgrabe, U., Polanski, J., Sadowski, J., Teckentrup, A., and Wagener, M. (1998) The use of selforganizing neural networks in drug design. In: Kubinyi, H., Folkers, G., and Martin, Y. C. (eds.) 3D QSAR in Drug Design. Kluwer/ESCOM, Dordrecht, pp. 273–99. 14. Bauknecht, H., Zell, A., Bayer, H., Levi, P., Wagener, M., Sadowski, J., and Gasteiger, J. (1996) Locating biologically active compounds in medium-sized heterogeneous datasets by topological autocorrelation
15.
16.
17.
18.
19. 20. 21.
22.
23.
24.
vectors: Dopamine and benzodiazepine agonists. J. Chem. Inf. Comput. Sci. 36, 1205–1213.
15. Anzali, S., Barnickel, G., Krug, M., Sadowski, J., Wagener, M., Gasteiger, J., and Polanski, J. (1996) The comparison of geometric and electronic properties of molecular surfaces by neural networks: Application to the analysis of corticosteroid binding globulin activity of steroids. J. Comput. Aided Mol. Des. 10, 521–534.
16. Brüstle, M., Beck, B., Schindler, T., King, W., Mitchell, T., and Clark, T. (2002) Descriptors, physical properties, and drug-likeness. J. Med. Chem. 45, 3345–3355.
17. Rabow, A. A., Shoemaker, R. H., Sausville, E. A., and Covell, D. G. (2002) Mining the National Cancer Institute’s tumor-screening database: Identification of compounds with similar cellular activities. J. Med. Chem. 45, 818–840.
18. Korolev, D., Balakin, K. V., Nikolsky, Y., Kirillov, E., Ivanenkov, Y. A., Savchuk, N. P., Ivashchenko, A. A., and Nikolskaya, T. (2003) Modeling of human cytochrome P450-mediated drug metabolism using unsupervised machine learning approach. J. Med. Chem. 46, 3631–3643.
19. Savchuk, N. P. (2003) In silico ADME-Tox as part of an optimization strategy. Curr. Drug Discov. 4, 17–22.
20. Kier, L. B., and Hall, L. H. (1986) Molecular Connectivity in Structure-Activity Analysis. Wiley, New York.
21. Basak, S. C., Balaban, A. T., Grunwald, G. D., and Gute, B. D. (2000) Topological indices: Their nature and mutual relatedness. J. Chem. Inf. Comput. Sci. 40, 891–898.
22. Bonchev, D. (2000) Overall connectivities/topological complexities: A new powerful tool for QSPR/QSAR. J. Chem. Inf. Comput. Sci. 40, 934–941.
23. Kubinyi, H. (1993) QSAR. Hansch Analysis and Related Approaches. In: Mannhold, R., Krogsgaard-Larsen, P., and Timmermann, H. (eds.) Methods and Principles in Medicinal Chemistry, vol. 1. VCH, Weinheim, pp. 21–36.
24. Livingstone, D. J. (2000) The characterization of chemical structures using molecular properties. A survey. J. Chem. Inf. Comput. Sci. 40, 195–209.
25. Jolliffe, I. T. (1986) Principal Component Analysis. Springer-Verlag, New York.
26. Cooley, W., and Lohnes, P. (1971) Multivariate Data Analysis. Wiley, New York.
27. Clark, D. E., and Pickett, S. D. (2000) Computational methods for the prediction of ‘druglikeness’. Drug Discov. Today 5, 49–58.
28. Oprea, T. I., Davis, A. M., Teague, S. J., and Leeson, P. D. (2001) Is there a difference between leads and drugs? A historical perspective. J. Chem. Inf. Comput. Sci. 41, 1308–1315.
29. Klabunde, T. (2006) Chemogenomic approaches to ligand design. In: Rognan, D. (ed.) Ligand Design for G-protein-Coupled Receptors. Wiley-VCH, Weinheim, pp. 115–135.
30. Rognan, D. (2007) Chemogenomic approaches to rational drug design. Br. J. Pharmacol. 152, 38–52.
31. Zlotnik, A., and Yoshie, O. (2000) Chemokines: A new classification system and their role in immunity. Immunity 12, 121–127.
32. Yoshie, O., Imai, T., and Nomiyama, H. (2001) Chemokines in immunity. Adv. Immunol. 78, 57–110.
33. Balakin, K. V., Ivanenkov, Y. A., Tkachenko, S. E., Kiselyov, A. S., and Ivachtchenko, A. V. (2008) Regulators of chemokine receptor activity as promising anticancer therapeutics. Curr. Cancer Drug Targets 8, 299–340.
34. Zlotnik, A., Yoshie, O., and Nomiyama, H. (2006) The chemokine and chemokine receptor superfamilies and their molecular evolution. Genome Biol. 7, 243.
35. Allavena, P., Marchesi, F., and Mantovani, A. (2005) The role of chemokines and their receptors in tumor progression and invasion: Potential new targets of biological therapy. Curr. Cancer Ther. Rev. 1, 81–92.
36. Pérez-Nueno, V. I., Ritchie, D. W., Rabal, O., Pascual, R., Borrell, J. I., and Teixidó, J. (2008) Comparison of ligand-based and receptor-based virtual screening of HIV entry inhibitors for the CXCR4 and CCR5 receptors using 3D ligand shape matching and ligand-receptor docking. J. Chem. Inf. Model. 48, 509–533.
37. Spencer, E. H. (2005) Development of a Structure Prediction Method for G-Protein Coupled Receptors, Thesis. California Institute of Technology, Pasadena, CA.
38. Efremov, R., Truong, M. J., Darcissac, E. C., Zeng, J., Grau, O., Vergoten, G., Debard, C., Capron, A., and Bahr, G. M. (2001) Human chemokine receptors CCR5, CCR3 and CCR2B share common polarity motif in the first extracellular loop with other human G-protein coupled receptors. Eur. J. Biochem. 263, 746–756.
39. http://www.ChemDiv.com.
40. Torgerson, W. S. (1952) Multi-dimensional scaling: I. Theory and method. Psychometrika 17, 401–419.
41. Kruskal, J. B. (1964) Non-metric multidimensional scaling: A numerical method. Psychometrika 29, 115–129.
Balakin, Ivanenkov, and Savchuk
42. Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading.
43. Bigus, J. P. (1996) Data Mining with Neural Networks. McGraw-Hill, New York.
44. Kubinyi, H. (1994) Variable selection in QSAR studies. I. An evolutionary algorithm. Quant. Struct.-Act. Relat. 13, 285–294.
1. Lipinski, C. A., Lombardo, F., Dominy, B. W., and Feeney, P. J. (1997) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug. Deliv. Rev. 23, 3–25.
Chapter 3
Targeting the Purinome
Jeremy M. Murray and Dirksen E. Bussiere
Summary
Purines are critical cofactors in the enzymatic reactions that create and maintain living organisms. In humans, there are approximately 3,266 proteins that utilize purine cofactors and these proteins constitute the so-called purinome. The human purinome encompasses a wide-ranging functional repertoire and many of these proteins are attractive drug targets. For example, it is estimated that 30% of modern drug discovery projects target protein kinases and that modulators of small G-proteins comprise more than 50% of currently marketed drugs. Given the importance of purine-binding proteins to drug discovery, the following review will discuss the forces that mediate protein:purine recognition, the factors that determine druggability of a protein target, and the process of structure-based drug design. A review of purine recognition in representatives of the various purine-binding protein families, as well as the challenges faced in targeting members of the purinome in drug discovery campaigns, will also be given.
Key words: ATP-binding proteins, Drug design, Drug discovery, Purinome
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575 DOI 10.1007/978-1-60761-274-2_3, © Humana Press, a part of Springer Science + Business Media, LLC 2009
1. Introduction
The discovery of adenosine triphosphate (ATP) was first reported by Karl Lohmann at the Kaiser Wilhelm Institute of Biology in Berlin, as well as by Yellapragada SubbaRow and Cyrus Fiske at Harvard Medical School in Boston, in 1929 (1, 2). Both groups isolated the molecule from muscle as adenosine monophosphate (AMP) and inorganic pyrophosphate (PPi) due to the protocols utilized and the general reactivity of the molecule. At the time, neither group had elucidated the chemical structure, instead referring to ATP as “adenylpyrophosphate.” The energetic potential contained within ATP was revealed in later work by Lohmann and Otto Meyerhof, which showed that a considerable amount of heat was generated by the hydrolysis of ATP to AMP (ΔH),
Fig. 1. Chemical structures for (a) ATP, (b) GTP, (c) cAMP, and (d) cGMP.
approximately 25 kcal/mol by their measurements at the time (3). The determination of the proper chemical structure would take until 1935, when both Katashi Makino, an investigator at the Dailen Hospital in Manchuria, and Karl Lohmann, separately proposed the correct structure (Fig. 1) (4, 5). The work of these and other investigators revealed the molecule that serves as the common currency in energy conversions within all organisms on the planet. Hydrolysis of ATP to ADP and inorganic phosphate (Pi) produces a free energy of hydrolysis of approximately −7.7 kcal/mol, equivalent to the free energy of hydrolysis of ATP to AMP and pyrophosphate. The hydrolysis of ADP to AMP and Pi produces a slightly lower free energy of hydrolysis of −6.6 kcal/mol. The phosphoanhydride bonds in ATP are high energy due to the electrostatic repulsion between the phosphate groups, which are negatively charged at physiological pH; cleavage of the bond between the α- and β-phosphate or the β- and γ-phosphate relieves this strain.
In 1958, Ted Rall and Earl Sutherland at Washington University in St. Louis discovered that a chemical permutation of AMP was the intermediary in many hormonal functions (6, 7). This intermediary, or “second messenger,” was cyclic adenosine monophosphate (cAMP; Fig. 1). They discovered that cAMP, unlike ATP, is heat stable because it lacks the high-energy bonds presented by the β- and γ-phosphates. This discovery was of critical importance for understanding the mechanism of action of many hormones (8). Further investigation into the role of cAMP as a second messenger led to the discovery of another essential purine. In a series of experiments between 1969 and 1971, Lutz Birnbaumer, Martin Rodbell, and colleagues at the National Institute of Arthritis and Metabolic Disease in Bethesda showed that hormone-stimulated cAMP production requires guanosine
triphosphate (GTP; Fig. 1), which led to the postulation of a transducer that links the receptor to the cyclase that subsequently produces cAMP (9–22). We now know that G proteins play the role of this transducer. Like ATP, GTP contains two high-energy bonds with free energies of hydrolysis similar to those of ATP. As is the case with ATP and cAMP, a cyclic analog of GTP also exists: cyclic guanosine monophosphate (cGMP; Fig. 1). cGMP was discovered in 1963 by Donald Ashman and T. Price at Columbia University in New York as one of the main phosphate-containing molecules in animal urine (23). In the 1980s, it was discovered that nitric oxide, a readily diffusible gas, was the hormone responsible for triggering cGMP production in smooth muscle, which leads to activation of cGMP-dependent protein kinase and smooth muscle relaxation (24–26). It is highly likely that cGMP plays additional, as yet unidentified roles in vivo. In addition, as shown by Phoebus Levene and coworkers at the Rockefeller University in New York in a series of studies in the early decades of the twentieth century, the two purine bases also comprise half of the genetic code (27, 28, see Note 1). Given the roles and importance of these purines to all living organisms as the barometer of the energy content of both the cell and the organism as a whole, as well as comprising half of the genetic code, it is not surprising that a significant percentage of the human genome codes for purine-binding proteins (29). As shown in Fig. 2, this “purinome” is estimated to consist of a total of 3,266 genes. Assuming that the human genome comprises approximately 25,000 genes, approximately 13% of the human genome is used to code for proteins that use purines as either cofactors or substrates (30).
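The 13% figure follows directly from the two gene counts quoted above. A minimal Python sketch of the arithmetic is given here as a quick check; the function and constant names are illustrative only, and the 25,000-gene genome size is the working assumption stated in the text, not a measured value.

```python
# Gene counts quoted in the text; the genome size is the text's working assumption.
PURINOME_GENES = 3_266
HUMAN_GENOME_GENES = 25_000


def genome_percent(n_genes: int, genome_size: int = HUMAN_GENOME_GENES) -> float:
    """Return n_genes as a percentage of genome_size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return 100.0 * n_genes / genome_size


# Share of the genome that encodes purine-binding proteins under this assumption.
print(f"Purinome share of the genome: {genome_percent(PURINOME_GENES):.1f}%")
```

Because the percentage scales inversely with the assumed genome size, a different gene-count estimate for the human genome would shift the figure proportionally.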
The two predominant purine-binding protein classes are involved in cell signaling: the small G-proteins (comprising 3% of the genome), which use GTP as a substrate, and protein kinases (2.1% of the genome), in which ATP acts as a phosphate donor. The remaining purine-binding proteins are divided between 12 classes, from dehydrogenases (1.8% of the genome) to purinergic receptors (10 mM, and 10 mM, respectively (90). Examination of costructures of IBMX bound to both PDE4D2 (a cAMP-specific PDE) and PDE5A1 (a cGMP-specific PDE) demonstrates that in both enzymes, IBMX binds in the active site pocket, mainly interacting with the conserved residues. In PDE5, the xanthine ring stacks against Phe820 on one side, and with Val782 and Phe786 on the other side. A bidentate hydrogen bond interaction is observed between the O6 and N7 of IBMX and the sidechain of Gln817 (91). Interestingly, the orientation of the pyrimidinone core in IBMX observed in the costructure with PDE5 is flipped relative to the binding mode of the pyrimidinone core of 5′-GMP as observed in the PDE5 5′-GMP cocrystal structure (84). The binding mode of IBMX orients the five-membered imidazole ring on top of the six-membered pyrimidinone ring of 5′-GMP (Fig. 6). This binding mode provides the rationale for increased PDE5 potency of IBMX analogs substituted with hydrophobic substituents at the C8 position. For example, 8-(norbornylmethyl)-IBMX has an IC50 of 1.5 nM against PDE5 and improved selectivity against PDE1 (IC50 of 30 nM) and PDE4 (IC50 of 10 µM) (92, 93). Among the first-generation PDE5 inhibitors to make it to the clinic are purine isosteres represented by sildenafil and vardenafil. Comparison of the chemical structures of sildenafil and vardenafil illustrates that these retain the bicyclic guanine core, but differ in the arrangement of nitrogen atoms (Fig. 7).
Fig. 6. Flipped binding mode of IBMX compared to 5′-GMP. 5′-GMP is represented as sticks (PDB ID 1T9S) and IBMX is represented as ball and sticks (PDB ID 1RKP). The interaction with the conserved Gln817 is shown.
Fig. 7. Comparison of the chemical structures of (a) sildenafil (5-[2-ethoxy-5-(4-methylpiperazin-1-yl)sulfonylphenyl]-1-methyl-3-propyl-4H-pyrazolo[5,4-e]pyrimidin-7-one) and (b) vardenafil (2-[2-ethoxy-5-(4-ethylpiperazin-1-yl)sulfonylphenyl]-5-methyl-7-propyl-1H-imidazo[5,1-f][1,2,4]triazin-4-one) illustrates that these retain the bicyclic guanine core, but differ in the arrangement of nitrogen atoms. Comparison of the cocrystal structures of PDE5 in complex with sildenafil (PDB ID 2H42) and vardenafil (PDB ID 1UHO) shows that the binding mode of both sildenafil and vardenafil is similar to the binding mode of the guanine ring in the 5′-GMP complex. The interaction with the conserved Gln817 is shown.
Comparison of the cocrystal structures of PDE5 in complex with sildenafil and vardenafil shows that the binding mode of both sildenafil and vardenafil is similar to the binding mode of the guanine ring in the 5′-GMP complex. Thus, the C8 position of IBMX maps onto the C5 position of sildenafil (Fig. 7). The cocrystal structure of sildenafil in complex with PDE5 demonstrates that the pyrazolopyrimidine core and the substituted ethoxyphenyl bind similarly in all PDE5 costructures, with Gln817 making a bidentate hydrogen-bond interaction with the N6 and O7 atoms of the pyrazolopyrimidinone core of sildenafil (77, 82, 84, 94). The ethoxyphenyl group extends into a hydrophobic pocket made up of Val782, Ala783, Phe786, Leu804, and Ile813. There are intriguing conformational differences in the sulfonyl and the methylpiperazine groups observed in the three different PDE5 sildenafil costructures (PDB ID 1UDT, 2H42, and 1TBF). This
is thought to be due to the conformational change in the H-loop that moves up to 24 Å compared to the conformation observed in unliganded PDE5 (Fig. 4) (94). In contrast to these purine isosteres, tadalafil uses a novel tetracyclic hexahydro-2-methyl-diketopiperazine-indole core. The cocrystal structure of tadalafil with PDE5 indicates that the indole nitrogen is involved in a single hydrogen bond interaction with the ε-oxygen of Gln817 (Fig. 8) (77). This interaction orients the rest of the diketopiperazine core toward the mobile H-loop where it forms a pi-stacking interaction above and below the plane of the ring with Phe786 and Phe820, respectively. The interaction with Gln817 orients the benzodioxol substituent into the same hydrophobic pocket utilized by the ethoxyphenyl of sildenafil and vardenafil. A recent review of the SAR of a large number of PDE5 inhibitors from
Fig. 8. Chemical structure of tadalafil (6-benzo[1,3]dioxol-5-yl-2-methyl-2,3,6,7,12,12a-hexahydro-pyrazino[1′,2′:1,6]pyrido[3,4-b]indole-1,4-dione). The cocrystal structure of PDE5 with tadalafil (PDB ID 1UDU) is shown, including the interaction with the conserved Gln817.
various scaffolds suggests that drug discovery campaigns against this target class will benefit from the groundwork done in the development of PDE5 inhibitors (95).
6.2. Protein Kinases – Achieving Selectivity in a Family with High Homology
Protein kinases are essential regulators of cellular signaling cascades. As was outlined in the introduction, ATP is the energy currency of the cell. However, it also serves as a substrate in post-translational modification by phosphorylation. The phosphorylation of proteins has an important role in virtually all cellular and metabolic signaling pathways (57) and protein kinases are ubiquitous in the cell. Eukaryotic protein kinases can be split into two broad groups: conventional protein kinases (ePKs) and atypical protein kinases (aPKs). Using the sequence similarity within the kinase domain, the presence of accessory domains and the mechanism of regulation, 518 individual ePKs have been identified representing approximately 1.7% of the human genome. These can be further clustered into eight families (96). The aPKs constitute a smaller set of protein kinases that do not display sequence similarity with the ePKs.
The catalytic domain of an ePK is constructed from approximately 250 amino acids, comprising two domains connected by a flexible linker or hinge. The smaller N-terminal domain has a mainly beta-sheet secondary structure (β1–β5) with one conserved helix, the αC helix. The secondary structure of the C-terminal domain is composed of two beta-strands (β7 and β8) and seven conserved helices (αD, αE, αEF, and αF–αI). The active site is located at the interface between these two domains. Within this active site, the adenine ring of ATP interacts with the hinge region, donating a hydrogen bond from the exocyclic amine to the backbone carbonyl of residue n (the numbering of this residue will depend on the kinase under consideration) in the hinge and the N1 nitrogen accepting a hydrogen bond from the backbone amide of hinge residue n + 2 (Fig. 9). Hydrophobic residues lie above and below the plane of the adenine ring, thereby enforcing the strong tendency of this site to bind flat aromatic rings.
The ribose-binding site is located close to the start of the αD helix. The triphosphate group is held in place by the glycine-rich phosphate-loop (P-loop) that contains the Walker-A motif (–GXXGXGK[T/S]–). The Asp from the conserved DFG motif generally coordinates to a Mg2+ ion which in turn contacts the β- and γ-phosphate oxygen atoms. In regard to substrates, approximately 80% of protein kinases are Ser/Thr kinases, and despite possessing highly similar folds, protein kinases recognize and phosphorylate a wide range of protein substrates. The ePKs catalyze the transfer of the γ-phosphate of ATP to the hydroxyl group of Ser, Thr, or Tyr residues. Phosphotransfer is metal-ion dependent and proposed to occur by an inline attack of the lone-pair electrons of the substrate hydroxyl group on
Fig. 9. Adenine ring of ATP interacts with the hinge region, donating a hydrogen bond from the exocyclic amine to the backbone carbonyl of residue n (the numbering of this residue will depend on the kinase under consideration) in the hinge and the N1 nitrogen accepting a hydrogen bond from the backbone amide of hinge residue n + 2. The side chain of the gatekeeper is shown (in this picture Met). The Mg2+ ions are shown as small spheres; the catalytic residues and the XDFG motif are shown as sticks. This illustration is based on the high-resolution crystal structure of the ternary complex of PKA (PDB ID 1RDQ).
the γ-phosphorus of the ATP, with the Mg2+ ion reducing the electrostatic repulsion of the incoming nucleophile (97). The phosphorylation site specificity of a kinase is largely determined by the substrate’s localization, unique surface charge, and hydrophobicity. In general, four residues to either side of the residue to be phosphorylated (the P-site residue) provide the first level of substrate specificity (98). Protein kinases are highly validated drug targets with over 30 distinct kinase targets being developed to the stage of Phase I clinical trials (99). The importance of kinase inhibitors is further underlined by the finding that in oncology, the success rate for kinase inhibitors is higher than that for all other drugs (100). The identification of potent ATP-competitive kinase inhibitors is not usually a problem in a kinase drug discovery campaign; instead, achieving the desired selectivity profile against other kinases, particularly those within the same family, remains one of the biggest challenges. As explained in the introduction, there is often a correlation between the selectivity of an inhibitor for the kinase of interest and the sequence similarity of the kinase antitargets (101, 102). Large-scale selectivity profiling of available kinase inhibitors provides insight into not only selectivity of inhibitors, but also which kinases are easy or difficult to inhibit by the currently available chemical matter. However, there may be potential biases in large-scale selectivity analyses, introduced
by, for example, kinases that are difficult to express and assay. Nevertheless, knowledge about compound selectivity at the kinome level also provides an important indication of whether certain kinases and/or compounds are inherently more promiscuous than others. When the selectivity profile is coupled with cocrystal structures, it facilitates the application of structure-based design techniques to rationally probe the interactions that provide selectivity, or lack thereof, for a specific compound and kinase. Furthermore, it facilitates the identification of new interactions in the context of different scaffolds, enabling scaffold hopping, scaffold morphing, and target hopping techniques to be employed. Sequence analyses in concert with the wealth of structural information that is available for protein kinases and their inhibitors suggest there are key features of the active sites of protein kinases that can be exploited to gain more selectivity for the kinase of interest versus kinase antitarget(s). The choice of which features are exploited is generally scaffold and kinase specific. This section will describe five such features that can be rationally targeted using structure-guided drug design techniques to improve the potency and/or increase the selectivity of ATP-competitive inhibitors against a kinase antitarget(s). These features are as follows.
6.2.1. The Gatekeeper Residue
The gatekeeper residue is so named for its ability to restrict access to the hydrophobic DFG pocket as it is situated midway in the hinge region connecting the N- and C-terminal domains (103–105). This gatekeeper residue is constrained to be a threonine or larger residue and is an important determinant of inhibitor selectivity in kinases (106, 107). Sequence comparison coupled with data on inhibitor selectivity indicates that there is a fairly good correlation between the size of the gatekeeper residue and the SAR seen between particular kinases (101). Furthermore, ATP-competitive kinase inhibitors can be classified according to whether any substituents access the DFG pocket and impose an inactive, “DFG-out” conformation on the kinase, as do the Type II kinase inhibitors, or whether the compound binds solely in the ATP-binding pocket and does not require a change in the conformation of the kinase, as do the Type I kinase inhibitors (108). In addition, mutational data from Gleevec-resistant forms of Bcr–Abl, together with structural understanding of the binding mode of Gleevec, suggest that the mutation of the gatekeeper residue plays an important role in acquired resistance to Gleevec (109). Drug discovery campaigns directed against a specific kinase are often able to exploit a single gatekeeper residue difference to achieve the desired selectivity for the target and decrease activity against antitargets. An example of the importance of the gatekeeper residue on selectivity has been demonstrated through work on p38 isoforms. The triarylimidazole inhibitor, SB203580 (Fig. 10), is a type I p38 inhibitor that is selective for the p38α
Fig. 10. Chemical structures of kinase inhibitors. (a) p38 inhibitor SB203580 (4-[5-(4-fluorophenyl)-2-(4-methylsulfinylphenyl)-3H-imidazol-4-yl]pyridine). (b) EGFR inhibitor (N-(3-(6-(3-(trifluoromethyl) phenylamino) pyrimidin-4-ylamino) phenyl) cyclopropanecarboxamide). The dashed line indicates hydrogen-bonding interactions with the hinge.
and p38β isoforms of p38 (IC50 of 48 and 50 nM, respectively) in comparison to the closely related p38γ and p38δ isoforms (IC50 of greater than 10 µM for both isoforms) (110). The selectivity can be attributed to a single difference in the gatekeeper residue, which is a threonine in p38α and p38β, and a methionine in p38γ and p38δ (110–112). The crystal structure of p38α in complex with SB203580 (PDB ID 1A9U) demonstrates that the pyridine N accepts a hydrogen bond from the backbone amide of the hinge, and this orients the trisubstituted imidazole so that the fluorophenyl group projects toward the hydrophobic pocket located beyond the Thr gatekeeper residue in p38α (112). The presence of a methionine would most likely result in a steric clash and therefore a loss in potency in p38γ and p38δ. Another example of how the gatekeeper residue can be exploited to enhance the selectivity of compounds is provided by the rational design of EGFR inhibitors (113). Researchers at the Genomics Institute of the Novartis Research Foundation were able to convert the generally promiscuous diaminopyrimidine scaffold to a selective EGFR inhibitor by the design of compounds that hydrogen bond with the sidechain of the Thr766 gatekeeper residue in EGFR. The selectivity of the resultant 4,6-dianilinopyrimidines was rationalized by docking, SAR, and mutagenesis of the gatekeeper. The hydrogen bond interaction with the gatekeeper of EGFR likely contributed to the selectivity observed for these compounds (113).
6.2.2. The XDFG Motif
As stated previously, kinase inhibitors can be classified as type I and type II binders based on their ability to access the hydrophobic DFG
pocket (108). Although the DFG motif is generally well conserved across ePKs, the residue preceding this motif is generally not conserved. Instead, a range of amino acids with differing physicochemical properties is tolerated in the position N-terminal to the DFG, hereafter referred to as the X position (Fig. 9). Analysis of protein structures that display an inactive, DFG-out conformation suggests that the X residue in the XDFG motif is most frequently a small side chain such as Ala or Gly (114). Furthermore, no kinase crystal structures with a DFG-out conformation have been reported for kinases that contain large residues in the X position of the XDFG motif and the gatekeeper position (114, 115).
6.2.3. The Selectivity Region on the αD Helix
The αD helix is a conserved structural element that is located at the mouth of the ATP-binding site. Many kinases possess a conserved acidic residue at the N-terminus of this helix that interacts with the hydroxyl groups of the ribose sugar. In numerous crystal structures of the kinase domain, one face of the helix is solvent exposed. Despite this, residue differences on the solvent exposed face of the helix have been exploited to achieve selectivity against kinase antitargets. An excellent example of how residue differences on the αD helix can be used to achieve selectivity against antitargets is illustrated by the rational design of selective CDK4 inhibitors. In this study, structure-based drug design techniques were used to identify residue differences close to the active site that could be exploited to convert a pyrroloisoindol-9-urea core that had equipotency for CDK2 and CDK4 into a selective CDK4 inhibitor (116). In this particular case, a threonine residue on the αD helix that is specific to CDK4 was targeted. The equivalent residue in CDK2 (the antitarget) is a lysine and therefore substitution with bulky groups directed toward this residue difference provided the required steric hindrance to favor CDK4 binding over CDK2 binding by approximately 190-fold; the IC50s against CDK4 and CDK2 are 2.3 nM and 440 nM, respectively (116).
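The fold selectivity quoted above is simply the ratio of the antitarget IC50 to the target IC50. The short Python sketch below reproduces that ratio from the values given in the text; the function name and argument names are hypothetical and chosen only for illustration.

```python
def fold_selectivity(ic50_antitarget_nm: float, ic50_target_nm: float) -> float:
    """Ratio of antitarget IC50 to target IC50 (both in nM).

    Values greater than 1 mean the compound is more potent against the target.
    """
    if ic50_antitarget_nm <= 0 or ic50_target_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_antitarget_nm / ic50_target_nm


# IC50 values quoted in the text for the CDK4-selective compound (116).
CDK4_IC50_NM = 2.3    # target
CDK2_IC50_NM = 440.0  # antitarget

print(f"CDK4 over CDK2: {fold_selectivity(CDK2_IC50_NM, CDK4_IC50_NM):.0f}-fold")
```

The ratio comes out at roughly 191, consistent with the "approximately 190-fold" stated above. Such ratios are only meaningful when both IC50 values are measured under comparable assay conditions.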
6.2.4. The Span of Residues Between the Gatekeeper and the Ribose-Binding Site
The hinge region of ePKs can be subdivided into an upper and a lower hinge region. The upper hinge region contains the gatekeeper and the lower hinge region contains the ribose-binding site. There is low sequence conservation among kinases in the hinge region, and approximately 60% of kinases contain a glycine insertion in this region. For example, CDKs have a span of seven residues from the gatekeeper residue (Phe80) to the ribose-binding residue (Asp86) present at the N-terminus of the αD helix, counting both end residues. In contrast, AKT2 kinase has a span of eight residues from the gatekeeper (Met229) to the ribose-binding residue (Glu236). Depending on the specific antitarget for the compound of interest and the vectors provided for substitution from the compound core, introduction of bulkier chemical groups adjacent to this region can provide selectivity through steric occlusion.
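The residue counts quoted for CDKs and AKT2 are reproduced only if both the gatekeeper and the ribose-binding residue are included in the count. The sketch below makes that convention explicit; the inclusive counting rule is inferred from the residue numbers in the text, and the function name is hypothetical.

```python
def hinge_span(gatekeeper_pos: int, ribose_pos: int) -> int:
    """Number of residues from the gatekeeper through the ribose-binding
    residue, counting both end residues (inclusive convention)."""
    if ribose_pos < gatekeeper_pos:
        raise ValueError("ribose-binding residue must follow the gatekeeper")
    return ribose_pos - gatekeeper_pos + 1


# Residue numbers quoted in the text.
spans = {
    "CDK (Phe80 to Asp86)": hinge_span(80, 86),
    "AKT2 (Met229 to Glu236)": hinge_span(229, 236),
}

for kinase, length in spans.items():
    print(f"{kinase}: {length} residues")
```

With this convention the CDK span is seven residues and the AKT2 span is eight, so AKT2 carries one additional residue in this stretch of the hinge.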
6.2.5. Inhibitor-Specific Conformations
A confounding feature of many drug discovery campaigns directed against a specific target is the ability of the target to change conformation in an unpredictable manner in the presence of certain compounds (117). Conformational flexibility is a well-documented feature of certain kinases (118). However, there are examples in which an inhibitor selects for a conformation beyond the DFG-in and DFG-out conformations observed for type I and type II kinase inhibitors (119–121). The binding of lapatinib to EGFR provides a good example of this phenomenon. In contrast to the crystal structure of the EGFR:OSI-774 complex, which resembles the active conformation (122), the crystal structure of the EGFR:lapatinib complex reveals that the binding results in the rotation of the N-terminal lobe by ~12° and the movement of the N-terminal end of the αC-helix by over 9 Å. Furthermore, it seems probable that this inhibitor-specific conformation may also contribute to increased selectivity against other members of the kinome (102, 119). The design and development of compounds in this class is generally through empirical observation of the phenomenon; however, computational approaches are also being developed (123). A common theme of each of these selectivity determinants is the exploitation of favorable sterics in the kinase target of interest. As was described in the introduction, steric clash provides a large energetic penalty to binding and therefore should prevent the ligand from binding optimally to the antitarget.
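To give a sense of scale for such energetic penalties, a fold difference in affinity can be converted to a free-energy difference using ΔΔG = RT ln(fold). The sketch below applies this standard relation to the CDK4/CDK2 IC50 values quoted in Section 6.2.3. Treating an IC50 ratio as an affinity ratio is an assumption made here for illustration and is not a claim from the cited study; the function name and the default temperature of 298.15 K are likewise illustrative choices.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a fold difference in affinity.

    Uses ddG = R * T * ln(fold). Assumes the fold value approximates a ratio of
    binding constants, which holds only for comparable assay conditions.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)


# IC50 values quoted in the text: 440 nM (CDK2) versus 2.3 nM (CDK4).
fold = 440.0 / 2.3
print(f"{fold:.0f}-fold corresponds to about {ddg_from_fold(fold):.1f} kcal/mol")
```

Under these assumptions each 10-fold gain in selectivity corresponds to roughly 1.4 kcal/mol at room temperature, so a selectivity of about 190-fold amounts to a difference of roughly 3 kcal/mol in binding free energy.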
6.3. Eg5 (Motor Protein and Mitotic Kinesin) – Accessible Active Site, More Amenable Cryptic Binding Site
Biological functions such as chromosome segregation, vesicular transport, and organelle transport require the generation of motile force. The cell’s ability to execute these processes relies heavily on a class of proteins known as motor proteins. The motor proteins utilize the energy derived from ATP hydrolysis to generate and transduce motile force in an efficient manner (124, 125). It is estimated that these molecular motors operate at an efficiency that varies from 70 to 100% depending on the motor protein involved (126). To put this in perspective, a high-efficiency engine from a racing car has an efficiency of only approximately 35%. Each motor protein consists of an N-terminal motor domain, or head; a connective stalk or tether, which is often nanometers long and comprises a coiled-coil structure; and a C-terminal tail domain (126, 127). The motor proteins comprise three superfamilies: the kinesin superfamily, the dynein superfamily, and the myosin superfamily. Members of the kinesin superfamily and the myosin superfamily share structural homology as well as a moderate sequence homology of between 20 and 50% amino acid identity; the dynein superfamily shares no homology with the other two superfamilies (128). The kinesin superfamily is diverse, with hundreds of members that can be grouped into 14 families based on sequence similarity. A subset of the kinesin superfamily,
the mitotic kinesins, mediate a wide range of functions including mitotic spindle assembly and chromosome movement. The proposed mechanism for kinesin function is as follows. At rest, a kinesin exists in an ADP-bound state. Upon the binding of ATP, the affinity of the kinesin motor domain for microtubules increases, leading to the docking of the kinesin on the microtubule. The kinesin undergoes a directed series of conformational changes that causes the motor domain to “walk” along the microtubule. This “step” positions the kinesin onto the next suitable binding spot of the microtubule. During the motion, the ATP hydrolyzes to ADP through a currently unknown mechanism. At this point, the kinesin either binds another molecule of ATP, releasing the ADP, and undergoes the cycle again, or reverts to its ADP-bound state that has a low affinity for microtubules, and is released from the microtubule. Kinesin-related motor protein Eg5, which is also known as Kinesin Spindle Protein (KSP) or Kinesin Family Member 11, is a member of the kinesin superfamily. Eg5 is localized to the mitotic spindle and is only found in cells undergoing mitosis; it is not found in other cells and appears to have no role outside of mitosis. It is necessary for both the formation of the bipolar mitotic spindle and for the proper segregation of the replicated DNA in the resulting daughter cells. Eg5 is considered a prime target for the development of antimitotic therapeutics. Current antimitotic agents disrupt mitosis by interfering with microtubule function and stability, which leads to severe toxicity and onerous side effects, such as peripheral neuropathy and other neurotoxicities. A therapeutic targeting Eg5 would not affect microtubules, avoiding these toxicities. However, as an Eg5 antimitotic would target rapidly dividing cells whether or not they are malignant, there would still be an effect on rapidly dividing healthy cells such as neutrophils (see Note 4). 
The structure of Eg5 comprises a central fold possessing an eight-stranded β-sheet, which is flanked by three α-helices on each side (129). The α2 helix, which is positioned on the opposite side from the microtubule-binding site, has a loop insertion of unknown function. This loop is present in all kinesins and varies in length depending on the particular family. The ATP-binding pocket is well conserved within the kinesin superfamily and has four major structural components: the P-loop, the Switch I region, the Switch II region, and the adenosine selective domain (Fig. 11). In kinesins, the P-loop shares the signature motif observed in other P-loop-containing folds (–GXXXXGK[T/S]–) and donates hydrogen bonds to the β-phosphate from the backbone amide groups of Gly108, Gly110, Lys111, and Thr112. The last amino acid of the P-loop (a serine or threonine) also coordinates the divalent Mg2+ cation. The two switch regions
Murray and Bussiere
Fig. 11. Conformational changes in Eg5 upon binding of monastrol leading to the formation of the allosteric pocket. (a) KSP:ADP cocrystal structure (PDB ID 1II6) with key secondary structure marked. (b) KSP:ADP:monastrol costructure (PDB ID 1Q0B) showing the movement of Loop 5 and the translocation of Trp127 by approximately 10 Å; the monastrol molecule is circled.
form hydrogen bonds with the γ-phosphate of ATP and also with each other (130, 131). Because of this, it is hypothesized that significant structural changes occur when the bound ATP is hydrolyzed to ADP. Within the adenosine selective domain, pi-stacking forms the basis for recognition of the adenine ring. The adenine ring stacks against the conserved Pro27 on one face and forms a pi-stacking interaction with Phe113, which immediately follows the C-terminal residue of the P-loop (Fig. 12). Because of this limited number of interactions, kinesins have only a moderate affinity for ATP: Eg5 has a Km of approximately 21 µM for ATP (132). While the ATP-binding pocket has a reasonable volume of approximately 1,980 Å³ and significant chemical and structural features to define it as a druggable binding site, empirical evidence suggests that it is not the most amenable site on the macromolecule for drug discovery. In 1999, a virtual screening campaign using the structure of kinesin K560 and targeting the ATP-binding site and the putative microtubule-binding site led to the identification of microtubule-competitive compounds, but no ATP-competitive compounds (133). Building on this work, in 1999, a small-molecule phenotypic screen for compounds that affected mitosis was executed. This screen identified a compound, monastrol, that arrested mitosis in mammalian cells and was specific for Eg5 (134). Further characterization revealed that the (S)-enantiomer was the
Fig. 12. Location of the allosteric pocket in relation to the ATP-binding pocket using the KSP:ADP:monastrol costructure (PDB ID 1Q0B). Key interactions between the ADP and the protein and between the monastrol and the protein are shown with dashed lines. Due to the complexity of the figure, distances are omitted.
active species and that the compound was a weak binder, with an IC50 of 22 µM (Fig. 13) (135). Kinetic studies revealed that monastrol was non-ATP-competitive and suggested that it was an allosteric inhibitor that bound to a cryptic binding site separate and distinct from the ATP-binding pocket (135, 136). The determination of the cocrystal structure of Eg5 with monastrol confirmed this (137). Upon the binding of monastrol, a series of conformational changes occurs that leads to the formation of a novel binding pocket approximately 12 Å from the ATP-binding pocket. This cryptic binding site is situated between the α3 helix and the α2 insertion loop. Upon compound binding, Trp127 and the insertion loop (Loop 5) flip in toward the protein, leading to the translocation of Trp127 by approximately 10 Å. Trp127 creates the “top” of the pocket along with the sidechains of Arg119 from α2 and Tyr211 from the α3 helix. The α3 helix unwinds partially to allow entry of an aromatic ring from the compound. Thus, the pocket displays considerable plasticity, which can be both a boon and a curse for drug discovery (see Note 5). The pocket has a volume of approximately 2,250 Å³ and is mostly hydrophobic in nature, as its surface is formed from aliphatic and aromatic sidechains. However, the pocket provides key opportunities for hydrogen bonding and electrostatic interactions through available main-chain amide and carbonyl groups, as well as proximal glutamic acid (Glu116) and arginine (Arg221) residues (Fig. 12). Concurrent with the discovery of this pocket, numerous pharmaceutical companies executed drug discovery and design programs targeting Eg5 and other kinesins. In 2004, the first paper
Fig. 13. Chemical structures for allosteric and ATP-competitive Eg5 inhibitors. (a) Monastrol ((S)-ethyl 1,2,3,4-tetrahydro-4-(3-hydroxyphenyl)-6-methyl-2-thioxopyrimidine-5-carboxylate), the first compound discovered which bound to the allosteric pocket; the compound has an IC50 of 22 µM. (b) CK-106023 (N-((R)-1-(3-benzyl-7-chloro-3,4-dihydro-4-oxoquinazolin-2-yl)propyl)-4-bromo-N-(3-(dimethylamino)propyl)benzamide), which binds to the same allosteric pocket as monastrol and has a Ki of 12 nM. (c) CK-238273 (ispinesib; N-(3-aminopropyl)-N-((R)-1-(3-benzyl-7-chloro-3,4-dihydro-4-oxoquinazolin-2-yl)-2-methylpropyl)-4-methylbenzamide), an optimized analog of CK-106023 which is in phase II and has a Ki of 1.7 nM. (d) 4-(2-(1-phenylethyl)thiazol-4-yl)pyridine, an ATP-competitive thiazole compound which was an initial hit from the Merck compound collection; the compound has an IC50 of 11 µM. (e) 4-(2-(1-(4-chlorophenyl)cyclopropyl)thiazol-4-yl)pyridine, an optimized ATP-competitive thiazole compound with an IC50 of 290 nM.
announcing the development of a novel, non-ATP-competitive inhibitor with a quinazoline core, CK-106023 (Fig. 13), was published by Sakowicz and coworkers (125). This compound has a Ki of 12 nM and is 200-fold specific for Eg5 over other members of the kinesin superfamily. It was also shown to be efficacious against two ovarian cancer cell lines (SKOV3 and A2780), with 50% growth inhibition (GI50) values of 126 nM and 191 nM, respectively. A similar compound, CK-238273, also known as ispinesib, has a Ki of 1.7 nM and has progressed to clinical trials (132). Detailed kinetic experiments have shown that both monastrol and ispinesib slow the release of ADP in the absence and in the presence of microtubules, in essence stabilizing Eg5 in the resting state (132). In addition, the binding of ispinesib also
favors transition to the ADP-bound state by accelerating the rate of Pi release (an indirect measure of ATP hydrolysis). The costructure of CK-238273 demonstrates that the compound binds to the same allosteric pocket as monastrol (138). In comparison to monastrol, CK-238273 has a 65% increase in hydrophobic interaction area and a superior fit in the allosteric pocket. In addition, a key hydrogen bond is made between the primary amine of the compound and Glu116, and the 7-chloro group on the quinazoline ring forms a stabilizing interaction with Arg221 (138). The additional interactions and superior fit lead to the nanomolar Ki. Beyond these compounds, the number of published non-ATP-competitive compounds targeting Eg5 has increased dramatically. The available chemotypes include, but are not limited to, monastrol analogs, dihydropyrroles, dihydropyrazoles, tetrahydrocarbolines, tetrahydroisoquinolines, and thiophenes (139). A recent survey of 25 available chemical structures of non-ATP-competitive inhibitors has led to the development of a three-dimensional pharmacophore model, which distills the common features of these compounds. The resulting model consists of one hydrogen-bond acceptor, one hydrogen-bond donor, one aromatic ring, and one hydrophobic group suitably arranged in space (140). Like any binding pocket, the allosteric pocket of Eg5 can acquire resistance mutations under the selective pressure generated by exposure to compounds. It has been possible to generate a colorectal tumor cell line resistant to quinazolinone Eg5 inhibitors; the resistance was conferred by a D130V mutation in the Loop 5 region (141, 142). It is possible that an allosteric pocket might be more or less susceptible to the selective pressure mediated by compound exposure. If the pocket has a critical biological role or is immutable due to structural constraints, it will be less susceptible to selective pressure. 
If the pocket is fortuitous in nature, it will be more susceptible to selective pressure. Thus, the resistance profile of a compound targeting an allosteric pocket is unlikely to be superior to that of an ATP-competitive inhibitor, but it will likely be different. The biological role of the allosteric pocket in Eg5, if any, is currently unknown. The discovery of ATP-competitive inhibitors has lagged significantly behind the discovery and development of compounds targeting the allosteric binding site, but there have been publications discussing compounds that have all the hallmarks of ATP-competitive inhibitors. For example, the thiazole-containing compound shown in Fig. 13d was identified in a high-throughput screen of the Merck compound collection, has a modest IC50 of 11 µM, and has been shown to be competitive with ATP and uncompetitive with microtubules in steady-state ATPase assays (143). Further elaboration of this thiazole scaffold by substituting the methyl group on the benzylic methylene with a
1,1-substituted cyclopropane, as well as introduction of a chloro group at the 4-position of the benzyl ring, yielded the most potent analog, which reached an IC50 of 290 nM (Fig. 13e). Despite the enticing kinetic data, the exact binding site of these compounds has yet to be determined and no costructure is available. Other ATP-competitive compounds have also been reported (144). Why is it so difficult to discover and optimize ATP-competitive inhibitors of kinesins and yet relatively straightforward to discover inhibitors that target an allosteric site? There is no simple answer, but consider the following. The ATP-binding site of Eg5 is modest in terms of both its size and its affinity for ATP. In fact, Eg5 recognizes the purine base only through the most basic of chemical features (the phosphates and the shape and charge of the aromatic ring) and does not make any direct hydrogen bonds with the purine base itself. The high concentration of ATP in the cell obviates the need for this. Thus, the ATP-binding site may not be highly featured enough or large enough to bind a compound significantly different chemically from ATP. In addition, Eg5 exists at “rest” in its ADP-bound form. As shown, the binding of compounds to the allosteric site stabilizes the ADP-bound form of Eg5. Thus, enzymatic screens will tend to favor the discovery of allosteric inhibitors, as they are typically run at ATP levels below the saturation level, where the ADP-bound form will be more prevalent. This affinity for ADP will also interfere with subsequent costructure determination, hindering a structure-based drug design campaign. Alternatively, the allosteric site may simply be more highly featured in terms of shape and available interactions and therefore more likely to yield attractive, drug-like hits.
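One way to see why the assay ATP concentration matters for the two inhibitor classes is the Cheng-Prusoff relationship for a purely competitive inhibitor, IC50 = Ki(1 + [ATP]/Km). The sketch below is illustrative only. It takes the Km of Eg5 for ATP as roughly 21 µM (the units are an assumption here), while the inhibitor Ki of 100 nM and the ATP concentrations are hypothetical values, not data from the studies cited above. An ideal noncompetitive (allosteric) inhibitor is modeled as having an IC50 equal to its Ki regardless of ATP, which is a deliberate simplification.

```python
"""Illustrative sketch: apparent IC50 versus assay ATP concentration."""

KM_ATP_M = 21e-6  # assumed Km of Eg5 for ATP (about 21 uM)
KI_M = 100e-9     # hypothetical inhibitor Ki (100 nM), for illustration only


def ic50_competitive(ki_m: float, atp_m: float, km_m: float) -> float:
    """Cheng-Prusoff relationship for a purely ATP-competitive inhibitor."""
    return ki_m * (1.0 + atp_m / km_m)


def ic50_noncompetitive(ki_m: float) -> float:
    """Idealized noncompetitive inhibitor: IC50 is independent of ATP."""
    return ki_m


if __name__ == "__main__":
    # Hypothetical ATP levels: below Km, at Km, and a cell-like 1 mM.
    for atp_m in (5e-6, 21e-6, 1e-3):
        comp = ic50_competitive(KI_M, atp_m, KM_ATP_M)
        nonc = ic50_noncompetitive(KI_M)
        print(f"[ATP] = {atp_m:.1e} M: competitive IC50 = {comp:.2e} M, "
              f"noncompetitive IC50 = {nonc:.2e} M")
```

Under these assumptions, the apparent potency of the competitive compound falls by nearly 50-fold between a Km-free limit and 1 mM ATP, whereas the idealized allosteric compound is unaffected. Real allosteric inhibitors can show mixed behavior, so this should be read as a limiting case rather than a prediction.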
Regardless of the explanation, Eg5 serves as an example of a target where the most amenable site for drug discovery is separate and distinct from the ATP-binding pocket, which would serve as the “traditional” active site. Such allosteric sites have also been discovered in other members of the purinome, such as kinases (145, 146).
6.4. Ras (Small G Protein) – Theoretically Druggable, but Difficult in Practice
Ras proteins were originally identified as the gene products of oncogenes capable of inducing cell transformation. Ras is the most frequently mutated oncogene in both solid and soft tumors and is arguably the most highly validated oncology drug target available today. The human genome codes for three RAS genes that give rise to four proteins: H-Ras, K-Ras4A, K-Ras4B, and N-Ras; the K-Ras4A and K-Ras4B isoforms arise from differential splicing of the KRAS gene (147, 148). K-Ras is most frequently mutated in solid tumors, while N-Ras is most frequently mutated in leukemias (149–151). The Ras proteins are part of a superfamily of proteins known as small G proteins. The superfamily can be further divided into the Ras, Rab, Rho, Sar1/Arf, and Ran
families. The Ras and Rho families regulate gene expression, the Rab and Sar1/Arf families help regulate vesicular transport, and the Ran family regulates the cell cycle as well as nuclear transport. The members of the small G-protein superfamily are monomeric, with molecular weights spanning from 20 to 40 kDa and primary sequence conservation between members running from approximately 30 to 55% homology. Within the Ras family, the homology is much higher at approximately 50%. The cellular localization and biological function of small G-proteins depend on posttranslational modification, which can include attachment of lipid groups, proteolytic cleavage, or methylation. H-Ras and N-Ras are palmitoylated and farnesylated and their C-terminal Cys residue is carboxymethylated, K-Ras is farnesylated, and all Ras isoforms have a small C-terminal peptide removed via proteolysis by an endopeptidase. These modifications ensure their localization to the lipid membrane. Small G proteins are classified as enzymes, functioning as GTP hydrolases. Their enzymatic function is quite poor, with kcat values in the range of 0.03/min (Ras family) to 0.003/min (152). This low level of activity underlies their true cellular function, which is to serve as a readily diffusible switch that can exist in two and only two possible states: ON (or active; GTP-bound form) and OFF (inactive; GDP-bound form). The hydrolase activity exists to ensure that the “switch” does not become trapped in a permanent ON state, but instead cycles to OFF automatically at a regular interval. To ensure this binary (ON/OFF) mode of function, small G-proteins have an extremely high affinity for their cofactor. For example, the affinity of Ras for GTP (Kd) is estimated to be 10⁻¹¹ M (153, 154). The affinity for GDP is approximately ten-fold lower (153). This high affinity, together with high cellular levels of GTP, ensures that Ras practically never exists in a purine-free form. 
It also has another consequence, in that it has allowed for the evolution of mechanisms for exquisitely precise positive and negative regulation of small G-proteins. An example of a typical and simplified activation cycle for Ras is as follows (note that this cycle varies slightly between family members): In response to a signal, guanine nucleotide exchange factor (GEF) proteins interact with Ras and stimulate the exchange of GDP for GTP, thus activating Ras (148, 155). Ras activation is reversed by GTPase-activating proteins (GAPs), which stimulate the intrinsic GTPase activity of Ras, thereby completing the cycle (148, 155). In regard to tertiary structure, Ras shares a unique 20-kDa catalytic domain with the entire small G-protein superfamily. This domain comprises five α-helices (denoted α1–α5), six β-strands (β1–β6), and five loops (G1–G5) (153). Typically, within a particular protein fold, primary sequence is most highly conserved within the secondary structure elements. The opposite is true for this protein fold, where the loops are more highly conserved
between family members (152, 156). The reason for this conservation is functional in nature. The G1 loop (also known as the P-loop), which connects β1 and α1, interacts with the α- and β-phosphates of the GTP through mainchain interactions as well as with the sidechain of Lys16 (numbering as per Ras structures). The G2 loop connects α1 and β2 and contains a critical Thr residue (Thr35) responsible for coordinating the Mg2+, which is directly coordinated to the GTP via the α- and γ-phosphates; it also presents Phe28, which forms a distant pi–pi interaction with the guanine base. The G2 region, also known as the effector loop, is highly conserved in GTPases. The G3 loop, which connects β3 and α2, interacts with both the γ-phosphate of GTP and the Mg2+, via a water-mediated interaction with Asp57. This region is conserved in all GTPases. The purine base is recognized by two loops, G4 and G5. The G4 loop, which connects β5 and α4, contains a consensus sequence -NKXD- in which Lys117 forms a hydrogen bond to the ribose sugar and Asp119 forms two hydrogen bonds with the guanine base. The G5 loop, which connects β6 and α5, also serves as a recognition sequence for the guanine base by interacting with the exocyclic oxygen via the mainchain amide of Ala146; this interaction would be lacking with ATP, and its loss weakens the affinity by approximately three orders of magnitude, as one would expect from the loss of a strong hydrogen bond (see Note 6). This, together with the pi–pi interaction with Phe28 (an interaction specific to Ras) and additional mainchain interactions, allows for the exclusion of ATP binding (see Fig. 14): G-proteins bind GTP with seven orders of magnitude higher affinity than ATP (157). In fact, it is possible to alter this specificity via judicious point mutations, with serious biological consequences (158, 159).
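The affinity differences quoted above can be converted into free-energy differences with the standard relation ΔΔG = RT ln(fold change). The following is a minimal sketch of that arithmetic, assuming a temperature of 37 °C (310.15 K); the function and constant names are illustrative and are not taken from the chapter.

```python
"""Illustrative sketch: converting a fold change in affinity to kcal/mol."""
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)
BODY_TEMP_K = 310.15         # 37 degrees C; an assumed temperature


def ddg_from_fold_change(fold: float, temp_k: float = BODY_TEMP_K) -> float:
    """Free-energy difference (kcal/mol) for a given fold change in Kd."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold)


if __name__ == "__main__":
    for label, fold in (("three orders of magnitude", 1e3),
                        ("seven orders of magnitude", 1e7)):
        print(f"{label}: {ddg_from_fold_change(fold):.2f} kcal/mol")
```

With these assumptions, a 1000-fold change corresponds to about 4.3 kcal/mol, in line with the figure given in Note 6, and the seven-order-of-magnitude preference for GTP over ATP corresponds to roughly 10 kcal/mol.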
Small G proteins exploit virtually every enthalpic interaction possible with GTP and GDP, which explains their high affinity for these cofactors. This level of interaction between protein and cofactor is atypical for purine-binding proteins. GTP hydrolysis occurs via an SN2 mechanism in which a water molecule is activated by Gln61, which serves as the catalytic base (160–162). GAPs stimulate this activity by positioning an arginine residue into the GTP/GDP-binding site, placing the positively charged guanidino group proximal to the γ-phosphate, thereby stabilizing the negatively charged transition state and promoting the hydrolysis reaction (163). Removal of the γ-phosphate leads to pronounced conformational changes in a few distinct regions, revealing the difference between the active and inactive forms. These conformational changes occur in the switch I region (G2; also known as the effector loop because of its role in effector protein binding) and the switch II region (comprising α2 and G3) (164–167). These changes are shown in Fig. 15.
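To give a feel for how slowly the intrinsic “timer” runs, the kcat values quoted earlier in this section (0.03/min to 0.003/min) can be converted into half-lives of the GTP-bound state. This is a rough sketch that assumes hydrolysis behaves as a single first-order step with no GAP stimulation and no nucleotide exchange; the function names are illustrative.

```python
"""Illustrative sketch: lifetime of the GTP-bound state from intrinsic kcat."""
import math


def half_life_min(k_per_min: float) -> float:
    """Half-life (min) of a first-order decay with rate constant k (1/min)."""
    if k_per_min <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / k_per_min


def fraction_gtp_bound(k_per_min: float, t_min: float) -> float:
    """Fraction still GTP-bound after t minutes, assuming no re-loading."""
    return math.exp(-k_per_min * t_min)


if __name__ == "__main__":
    for kcat in (0.03, 0.003):  # per minute, range quoted in the text
        print(f"kcat = {kcat}/min: half-life = {half_life_min(kcat):.0f} min")
```

Under this simplification the unassisted ON state persists for tens of minutes to hours, which is consistent with the picture of GAPs being needed to switch Ras off on a useful timescale.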
Fig. 14. The Ras guanine nucleotide-binding pocket. The figure is in approximately the same orientation as the figure that follows. Important mainchain and sidechain interactions are indicated by dotted lines and distances (in Å) and the pertinent residues are labeled. The Ras:GTP complex is approximated by the cocrystallization of a nonhydrolyzable GTP analog containing a methylene linkage between the β- and γ-phosphates (PDB ID 5P21).
The binding of GEFs to Ras requires the P-loop, switch I, and switch II regions to be in the inactive or the GTP-free conformations. Likewise, the active conformation of the switch I and II regions is required for binding to effector proteins and GAPs. The most prevalent activating mutations in Ras are G12D and G12V. The activation occurs primarily through two mechanisms. First, the mutation of the small glycine residue to a bulkier residue shields the γ-phosphate from the nucleophilic water. Second, the mutation prevents proper placement of the catalytic arginine residue from GAPs (168). The combined effect is a drastic decrease in GTP hydrolysis. The complexities of developing a GTP-competitive therapeutic for Ras do not lie in its druggability. The active site of Ras is suitable for the isolation and optimization of such a compound. The first barrier is structural in nature: a GTP/GDP-competitive inhibitor of Ras must necessarily promote the formation of the inactive conformation of the protein or another conformation that is not recognizable to proteins that interact with Ras. A compound that stabilizes the active form of Ras would likely be highly oncogenic. The second barrier is thermodynamic: the Kd of Ras for GTP is 10⁻¹¹ M and the concentration of GTP in the cell is in the mM range. This ensures that Ras will virtually
Fig. 15. Conformational differences between Ras:GTP:Mg2+ and Ras:GDP:Mg2+ (PDB ID 4Q21) complexes, shown as the GTP-bound (‘On’) conformation and the GDP-bound (‘Off’) conformation. The α-helices are denoted A1–A5, the β-strands are denoted B1–B6, and the loops are denoted G1–G5. The Ras:GTP complex is approximated by the cocrystallization of a nonhydrolyzable GTP analog containing a methylene linkage between the β- and γ-phosphates (PDB ID 5P21). The GTPase activity of Ras prevents crystallization of the complex with GTP. Note the significant conformational differences within the Switch I and Switch II regions between the two structures. The transition between these two states occurs upon GTP hydrolysis.
always be present in the GTP- or GDP-bound forms. To compete successfully with this concentration of cellular GTP, a compound would need to bind to Ras with picomolar affinity and be capable of reaching equivalent (or higher) cellular concentrations. While the affinity requirement can be met with difficulty, the concentration requirement likely cannot. For example, it has been possible to synthesize GTP analogs that show higher affinity for Ras than either GTP or GDP (190); see Fig. 16. However, these compounds are unlikely to be specific inhibitors and are likely to have unsuitable pharmacokinetic and physicochemical properties, given that they are nucleoside triphosphates (189, 190); the triphosphate moiety, or at the very least a diphosphate, is critical for binding. Thus, Ras serves as an example that an amenable binding site is necessary, but not sufficient, for drug discovery. The biochemistry of the system is also of paramount importance. Because of this, considerable work has been undertaken to develop compounds that inhibit effectors (189, 191). Another possible strategy would be to directly disrupt the interaction of Ras with effectors through a small-molecule therapeutic or to target a yet-unidentified cryptic binding site on Ras to inhibit it allosterically. Given the existing thermodynamic constraints, these two strategies will likely prove more successful than trying to develop a GTP/GDP-competitive drug. For example, there
Fig. 16. Chemical structures for Ras inhibitors. (a) Base-modified GTP analogs (R = p-FC6H4, p-BrC6H4, or p-n-BuC6H4) bind to Ras with a Krel (IC50 of analog divided by the IC50 of GDP) of up to 3.30. (b) Sulindac sulfide ((3Z)-3-(4-(methylthio)benzylidene)-6-fluoro-3,3a-dihydro-2-methyl-2H-indene-1-carboxylic acid) disrupts the Ras–Raf interaction with an IC50 of approximately 210 µM. (c) An optimized sulindac analog ((3Z)-6-fluoro-3-((furan-3-yl)methylene)-3,3a-dihydro-2-methyl-2H-indene-1-carboxylic acid) has an IC50 of approximately 30 µM. The molecular basis of the compounds’ mechanism of action is unknown, but both (b) and (c) are hypothesized to operate by disrupting Ras–Raf interactions.
is experimental evidence that the nonsteroidal anti-inflammatory drug sulindac sulfide inhibits Ras signaling by binding to Ras in a noncovalent manner and inhibiting its interaction with other proteins that contain Ras-binding domains, such as Raf (169). The IC50 for the inhibition of the Ras–Raf interaction is approximately 210 µM, and analogs have been synthesized that disrupt the Ras–Raf interaction with an IC50 of 30 µM (169, 170) (Fig. 16). As the compound has no effect on the GTPase activity of Ras, binding must occur at a site distinct from the active site, although no structural information is available on the nature of the interaction.
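The thermodynamic barrier to a GTP-competitive Ras inhibitor described earlier in this section can be made concrete with a simple equilibrium competition model. The sketch below assumes a single binding site, free ligand concentrations, a cellular GTP concentration of 1 mM (the text says only “mM range”), and a hypothetical inhibitor with a Ki of 1 pM. It ignores GDP, GEFs, GAPs, and binding kinetics, so it is a back-of-the-envelope estimate rather than a model of Ras in cells.

```python
"""Illustrative sketch: competing with cellular GTP for the Ras active site."""

KD_GTP_M = 1e-11  # Kd of Ras for GTP quoted in the text
GTP_M = 1e-3      # assumed cellular GTP concentration (text: "mM range")


def inhibitor_occupancy(inh_m: float, ki_m: float,
                        gtp_m: float = GTP_M, kd_m: float = KD_GTP_M) -> float:
    """Fraction of Ras bound to inhibitor at equilibrium (one shared site)."""
    x = inh_m / ki_m
    g = gtp_m / kd_m
    return x / (1.0 + x + g)


def inhibitor_needed(target: float, ki_m: float,
                     gtp_m: float = GTP_M, kd_m: float = KD_GTP_M) -> float:
    """Free inhibitor concentration (M) required for a target occupancy."""
    if not 0.0 < target < 1.0:
        raise ValueError("target occupancy must be between 0 and 1")
    g = gtp_m / kd_m
    return ki_m * target * (1.0 + g) / (1.0 - target)


if __name__ == "__main__":
    ki_m = 1e-12  # hypothetical picomolar inhibitor
    needed = inhibitor_needed(0.5, ki_m)
    print(f"Free inhibitor needed for 50% occupancy: {needed:.2e} M")
```

Under these assumptions, even a picomolar inhibitor would need a free intracellular concentration on the order of 100 µM to occupy half of the Ras molecules, which illustrates why the concentration requirement is the harder of the two to meet.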
7. Conclusion
In this chapter, we have strived to expose the reader to the complexities of the purinome, as well as to the methodologies and technologies that can be employed to help in the discovery of new drugs targeting these proteins. As demonstrated, the purinome is a significant and complex component of the human genome. While there are common themes across the purinome, such as positioning of the P-loop to recognize the triphosphate group of ATP and GTP and other such sequence motifs, there is also incredible variation in the exact interactions involved in purine recognition, as well as variation in overall protein fold, function, and mechanism. As shown in each of the preceding sections, the druggability and the methodologies used to discover drugs will vary from protein to protein, although it will likely be possible to use common approaches within protein families. Many targets will yield to traditional approaches. Others, such as the small G-proteins represented by Ras, will require novel approaches and perhaps even the development of novel chemistries to produce compounds occupying a new segment of chemical “space.” In some, the purine-binding pocket and/or active site will be suitable for drug discovery. In others, it will not, or there will be a biological or thermodynamic impediment, and novel binding pockets will need to be identified. Given this, the work done on targeting the purinome will certainly yield significant scientific discoveries, as well as novel pharmaceutical compounds, over the coming years.
8. Notes
1. The authors recognize that many biologically active purines are not reviewed herein, for example, nicotinamide adenine dinucleotide (NAD) and nicotinamide adenine dinucleotide phosphate (NADP), as well as enzymatically modified analogs of ATP and GTP. These omissions are purposeful and have been instituted to limit the scope of the review. The reader is encouraged to peruse Chapter 4 within this volume for more examples.
2. These complexes can range from homomultimers to supramolecular complexes of multiple proteins such as a virus capsid. In this review, we will focus on single proteins for the sake of simplicity, but the reader should be aware that druggable pockets could exist within these larger
complexes as well. In addition, they can occur at the interfaces between protein and other types of chemical matter, such as RNA or DNA.
3. For example, recent clinical research has been focused on the use of PDE5 inhibitors in the treatment of pulmonary hypertension. See http://clinicaltrials.gov for more information.
4. As one would predict, neutropenia is a demonstrated clinical consequence of treatment with inhibitors of Eg5.
5. In the authors’ experience, a static, immutable binding site makes drug design difficult because opportunities for picking up key contacts for selectivity are limited. In contrast, an active site with high plasticity is likewise difficult to work with as it undergoes some level of structural rearrangement each time it is perturbed, for example, by attaching a novel functional group to a lead compound.
6. Recall from the prior section that a 1000-fold difference in selectivity requires approximately 4.3 kcal/mol difference in binding energy, which is equivalent to a strong hydrogen bond.
References
1. Fiske, C.H., and Subbarow, Y. (1929) Phosphorus compounds of muscle and liver. Science 70, 381–2.
2. Lohmann, K. (1929) Über die Pyrophosphatfraktion im Muskel. Naturwissenschaften 17, 624–5.
3. Meyerhof, O., and Lohmann, K. (1932) Über energetische Wechselbeziehungen zwischen dem Umsatz der Phosphorsäureester im Muskelextrakt. Biochem. Z. 253, 431–61.
4. Lohmann, K. (1935) Konstitution der Adenopyrophosporsäure und Adenosindphosphosäure. Biochem. Z. 282, 120–3.
5. Makino, K. (1935) Über die Konstitution der Adenosintriphosposäure. Biochem. Z. 278, 161–3.
6. Rall, T.W., and Sutherland, E.W. (1958) Formation of a cyclic adenine ribonucleotide by tissue particles. J. Biol. Chem. 232, 1065–76.
7. Sutherland, E.W., and Rall, T.W. (1958) Fractionation and characterization of a cyclic adenine ribonucleotide formed by tissue particles. J. Biol. Chem. 232, 1077–91.
8. Haynes, R.C. Jr., Sutherland, E.W., and Rall, T.W. (1960) The role of cyclic adenylic acid in hormone action. Recent Prog. Horm. Res. 16, 121–38.
9. Birnbaumer, L., Pohl, S.L., Michiel, H., Krans, M.J., and Rodbell, M. (1979) The actions of hormones on the adenyl cyclase system. Adv. Biochem. Psychopharmacol. 3, 185–208.
10. Birnbaumer, L., Pohl, S.L., and Rodbell, M. (1969) Adenyl cyclase in fat cells. 1. Properties and the effects of adrenocorticotropin and fluoride. J. Biol. Chem. 244, 3468–76.
11. Birnbaumer, L., Pohl, S.L., and Rodbell, M. (1971) The glucagon-sensitive adenyl cyclase system in plasma membranes of rat liver. II. Comparison between glucagon- and fluoride-stimulated activities. J. Biol. Chem. 246, 1857–60.
12. Birnbaumer, L., and Rodbell, M. (1969) Adenyl cyclase in fat cells. II. Hormone receptors. J. Biol. Chem. 244, 3477–82.
13. Pohl, S.L., Birnbaumer, L., and Rodbell, M. (1969) Glucagon-sensitive adenyl cyclase in plasma membrane of hepatic parenchymal cells. Science 164, 566–7.
14. Pohl, S.L., Birnbaumer, L., and Rodbell, M. (1971) The glucagon-sensitive adenyl cyclase system in plasma membranes of rat liver. I. Properties. J. Biol. Chem. 246, 1849–56.
15. Pohl, S.L., Krans, H.M., Kozyreff, V., Birnbaumer, L., and Rodbell, M. (1971) The glucagon-sensitive adenyl cyclase system in plasma membranes of rat liver. VI. Evidence for a role of membrane lipids. J. Biol. Chem. 246, 4447–54.
16. Rodbell, M., Birnbaumer, L., Pohl, S.L., and Krans, H.M. (1979) Properties of the adenyl cyclase systems in liver and adipose cells: The mode of action of hormones. Acta Diabetol. Lat. 7(Suppl 1), 9–63.
17. Rodbell, M., Birnbaumer, L., and Pohl, S.L. (1971) Characteristics of glucagon action on the hepatic adenylate cyclase system. Biochem. J. 125, 58P–9P.
18. Rodbell, M., Birnbaumer, L., and Pohl, S.L. (1970) Adenyl cyclase in fat cells. 3. Stimulation by secretin and the effects of trypsin on the receptors for lipolytic hormones. J. Biol. Chem. 245, 718–22.
19. Rodbell, M., Birnbaumer, L., Pohl, S.L., and Sundby, F. (1971) The reaction of glucagon with its receptor: Evidence for discrete regions of activity and binding in the glucagon molecule. Proc. Natl. Acad. Sci. U.S.A. 68, 909–13.
20. Rodbell, M., Krans, H.M., Pohl, S.L., and Birnbaumer, L. (1971) The glucagon-sensitive adenyl cyclase system in plasma membranes of rat liver. 3. Binding of glucagon: Method of assay and specificity. J. Biol. Chem. 246, 1861–71.
21. Rodbell, M., Krans, H.M., Pohl, S.L., and Birnbaumer, L. (1971) The glucagon-sensitive adenyl cyclase system in plasma membranes of rat liver. IV. Effects of guanylnucleotides on binding of 125I-glucagon. J. Biol. Chem. 246, 1872–6.
22. Rodbell, M., Birnbaumer, L., Pohl, S.L., and Krans, H.M. (1971) The glucagon-sensitive adenyl cyclase system in plasma membranes of rat liver. V. An obligatory role of guanylnucleotides in glucagon action. J. Biol. Chem. 246, 1877–82.
23. Ashman, D.F., Lipton, R., Melicow, M.M., and Price, T.D. (1963) Isolation of adenosine 3′,5′-monophosphate and guanosine 3′,5′-monophosphate from rat urine. Biochem. Biophys. Res. Commun. 11, 330–4.
24. Furchgott, R.F., and Zawadzki, J.V. (1980) The obligatory role of endothelial cells in the relaxation of arterial smooth muscle by acetylcholine. Nature 288, 373–6.
25. Ignarro, L.J., Buga, G.M., Wood, K.S., Byrns, R.E., and Chaudhuri, G. (1987) Endothelium-derived relaxing factor produced and released from artery and vein is nitric oxide. Proc. Natl. Acad. Sci. U.S.A. 84, 9265–9.
26. Ignarro, L.J., Byrns, R.E., Buga, G.M., and Wood, K.S. (1987) Endothelium-derived relaxing factor from pulmonary artery and vein possesses pharmacologic and chemical properties identical to those of nitric oxide radical. Circ. Res. 61, 866–79.
27. Levene, P.A. (1909) Über die Hefenucleinsäure. Biochem. Z. 17, 120–31.
28. Levene, P.A. (1919) The structure of yeast nucleic acid. IV. Ammonia hydrolysis. J. Biol. Chem. 40, 415–24.
29. Haystead, T.A. (2006) The purinome, a complex mix of drug and toxicity targets. Curr. Top. Med. Chem. 6, 1117–27.
30. International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431, 931–45.
31. Cohen, P. (2002) Protein kinases – the major drug targets of the twenty-first century? Nat. Rev. Drug Discov. 1, 309–15.
32. Ja, W.W., and Roberts, R.W. (2005) G-protein-directed ligand discovery with peptide combinatorial libraries. Trends Biochem. Sci. 30, 318–24.
33. Marshall, G.R., Head, R.D., and Ragno, R. (2000) Affinity prediction: The sine qua non. In: Di Cera, E. (ed.) Thermodynamics in Biology. Oxford University Press, Oxford, pp. 87–111.
34. Holdgate, G.A., and Ward, W.H. (2005) Measurements of binding thermodynamics in drug discovery. Drug Discov. Today 10, 1543–50.
35. Knapp, M., Bellamacina, C., Murray, J.M., and Bussiere, D.E. (2006) Targeting cancer: The challenges and successes of structure-based drug design against the human purinome. Curr. Top. Med. Chem. 6, 1129–59.
36. Traut, T.W. (1994) Physiological concentrations of purines and pyrimidines. Mol. Cell. Biochem. 140, 1–22.
37. Cheng, Y., and Prusoff, W.H. (1973) Relationship between the inhibition constant (K1) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction. Biochem. Pharmacol. 22, 3099–108.
38. Ghaemmaghami, S., Huh, W.K., Bower, K., et al. (2003) Global analysis of protein expression in yeast. Nature 425, 737–41.
39. Knight, Z.A., and Shokat, K.M. (2005) Features of selective kinase inhibitors. Chem. Biol. 12, 621–37.
40. Arooz, T., Yam, C.H., Siu, W.Y., Lau, A., Li, K.K., and Poon, R.Y.
(2000) On the concentrations of cyclins and cyclin-dependent kinases in extracts of cultured human cells. Biochemistry 39, 9494–501.
41. Bhatt, R.R., and Ferrell, J.E. Jr. (2000) Cloning and characterization of Xenopus Rsk2, the predominant p90 Rsk isozyme in oocytes and eggs. J. Biol. Chem. 275, 32983–90. 42. Hopkins, A.L., and Groom, C.R. (2002) The druggable genome. Nat. Rev. Drug Discov. 1, 727–30. 43. Keller, T.H., Pichota, A., and Yin, Z. (2006) A practical view of ‘druggability’. Curr. Opin. Chem. Biol. 10, 357–61. 44. Walke, D.W., Han, C., Shaw, J., Wann, E., Zambrowicz, B., and Sands, A. (2001) In vivo drug target discovery: Identifying the best targets from the genome. Curr. Opin. Biotechnol. 12, 626–31. 45. Schneider, M. (2004) A rational approach to maximize success rate in target discovery. Arch. Pharm. 337, 625–33. 46. Peters, K.P., Fauck, J., and Frommel, C. (1996) The automatic search for ligand binding sites in proteins of known threedimensional structure using only geometric criteria. J. Mol. Biol. 256, 201–13. 47. Liang, J., Edelsbrunner, H., and Woodward, C. (1998) Anatomy of protein pockets and cavities: Measurement of binding site geometry and implications for ligand design. Protein Sci. 7, 1884–97. 48. Hendlich, M., Rippmann, F., and Barnickel, G. (1997) LIGSITE: Automatic and efficient detection of potential small moleculebinding sites in proteins. J. Mol. Graph. Model. 15, 359–63. 49. Brady, G.P. Jr., and Stouten, P.F. (2000) Fast prediction and visualization of protein binding pockets with PASS. J. Comput. Aided Mol. Des. 14, 383–401. 50. An, J., Totrov, M., and Abagyan, R. (2005) Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol. Cell Proteomics 4, 752–61. 51. Laskowski, R.A. (1995) SURFNET: A program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph. 13, 323–30. 52. Goodford, P.J. (1985) A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 28, 849–57. 53. 
Kortvelyesi, T., Dennis, S., Silberstein, M., Brown, L. III, and Vajda, S. (2003) Algorithms for computational solvent mapping of proteins. Proteins 51, 340–51. 54. Silberstein, M., Dennis, S., Brown, L., Kortvelyesi, T., Clodfelter, K., and Vajda, S. (2003) Identification of substrate binding
Targeting the Purinome
55.
56.
57. 58. 59.
60. 61.
62. 63.
64.
65.
66.
67.
68.
87
sites in enzymes by computational solvent mapping. J. Mol. Biol. 332, 1095–113. Hajduk, P.J., Huth, J.R., and Fesik, S.W. (2005) Druggability indices for protein targets derived from NMR-based screening data. J. Med. Chem. 48, 2518–25. Brunger, A.T. (1997) X-ray crystallography and NMR reveal complementary views of structure and dynamics. Nat. Struct. Biol. 4 (Suppl), 862–5. Krebs, E.G., and Beavo, J.A. (1979) Phosphorylation-dephosphorylation of enzymes. Annu. Rev. Biochem. 48, 923–59. Nestler, E.J., and Greengard, P (1983). Protein phosphorylation in the brain. Nature 305, 583–8. Lohmann, S.M., Walter, U., and Greengard, P. (1980) Identification of endogenous substrate proteins for cAMP-dependent protein kinase in bovine brain. J. Biol. Chem. 255, 9985–92. Langan, T.A. (1968) Histone phosphorylation: Stimulation by adenosine 3¢,5¢-mono phosphate. Science 162, 579–80. Mayer, S.E., and Krebs, E.G. (1970) Studies on the phosphorylation and activation of skeletal muscle phosphorylase and phosphorylase kinase in vivo. J. Biol. Chem. 245, 3153–60. Krebs, E.G., and Stull, J.T. (1975) Protein phosphorylation and metabolic control. Ciba Found. Symp. 31, 355–67. Cobbs, W.H., Barkdoll, A.E. III, and Pugh, E.N. Jr. (1985) Cyclic GMP increases photocurrent and light sensitivity of retinal cones. Nature 317, 64–6. Kurkin, S.A., Kislov, A.N., and Fesenko, E.E. (1982) Conductance of cytoplasmic membrane of photoreceptors by the method of intracellular dialysis. Biofizika 27, 1053–6. Yau, K.W., and Nakatani, K. (1985) Lightsuppressible, cyclic GMP-sensitive conductance in the plasma membrane of a truncated rod outer segment. Nature 317, 252–5. Bruch, R.C., and Kalinoski, D.L. (1987) Interaction of GTP-binding regulatory proteins with chemosensory receptors. J. Biol. Chem. 262, 2401–4. Striem, B.J., Pace, U., Zehavi, U., Naim, M., and Lancet, D. (1989) Sweet tastants stimulate adenylate cyclase coupled to GTPbinding protein in rat tongue membranes. Biochem. J. 260, 121–6. 
Butcher, R.W., and Sutherland, E.W. (1962) Adenosine 3¢,5¢-phosphate in biological materials. I. Purification and properties of cyclic 3¢,5¢-nucleotide phosphodiesterase
88
69. 70.
71.
72.
73.
74.
75.
76.
77.
78.
79.
80.
Murray and Bussiere and use of this enzyme to characterize adenosine 3¢,5¢-phosphate in human urine. J. Biol. Chem. 237, 1244–50. Beavo, J.A., Conti, M., and Heaslip, R.J. (1994) Multiple cyclic nucleotide phosphodiesterases. Mol. Pharmacol. 46, 399–405. Wang, H., Liu, Y., Hou, J., Zheng, M., Robinson, H., and Ke, H. (2007) Structural insight into substrate specificity of phosphodiesterase 10. Proc. Natl. Acad. Sci. U.S.A. 104, 5782–7. Wang, H., Liu, Y., Chen, Y., Robinson, H., and Ke, H. (2005) Multiple elements jointly determine inhibitor selectivity of cyclic nucleotide phosphodiesterases 4 and 7. J. Biol. Chem. 280, 30949–55. Iffland, A., Kohls, D., Low, S., et al. (2005) Structural determinants for inhibitor specificity and selectivity in PDE2A using the wheat germ in vitro translation system. Biochemistry 44, 8312–25. Huai, Q., Colicelli, J., and Ke, H. (2003) The crystal structure of AMP-bound PDE4 suggests a mechanism for phosphodiesterase catalysis. Biochemistry 42, 13220–6. Huai, Q., Wang, H., Sun, Y., Kim, H.Y., Liu, Y., and Ke, H. (2003) Three-dimensional structures of PDE4D in complex with roliprams and implication on inhibitor selectivity. Structure 11, 865–73. Xu, R.X., Hassell, A.M., Vanderwall, D., et al. (2000) Atomic structure of PDE4: Insights into phosphodiesterase mechanism and specificity. Science 288, 1822–5. Lee, M.E., Markowitz, J., Lee, J.O., and Lee, H. (2002) Crystal structure of phosphodiesterase 4D and inhibitor complex(1). FEBS Lett. 530, 53–8. Sung, B.J., Hwang, K.Y., Jeon, Y.H., et al. (2003) Structure of the catalytic domain of human phosphodiesterase 5 with bound drug molecules. Nature 425, 98–102. Huai, Q., Wang, H., Zhang, W., Colman, R.W., Robinson, H., and Ke, H. (2004) Crystal structure of phosphodiesterase 9 shows orientation variation of inhibitor 3-isobutyl-1-methylxanthine binding. Proc. Natl. Acad. Sci. U.S.A. 101, 9624–9. Xu, R.X., Rocque, W.J., Lambert, M.H., Vanderwall, D.E., Luther, M.A., and Nolte, R.T. 
(2004) Crystal structures of the catalytic domain of phosphodiesterase 4B complexed with AMP, 8-Br-AMP, and rolipram. J. Mol. Biol. 337, 355–65. Scapin, G., Patel, S.B., Chung, C., et al. (2004) Crystal structure of human phosphodiesterase 3B: Atomic basis for substrate
81.
82.
83. 84.
85.
86.
87.
88.
89.
90.
91.
92.
and inhibitor specificity. Biochemistry 43, 6091–100. Liu, S., Mansour, M.N., Dillman, K.S., et al. (2008) Structural basis for the catalytic mechanism of human phosphodiesterase 9. Proc. Natl. Acad. Sci. U.S.A. 105, 13309–14. Card, G.L., England, B.P., Suzuki, Y., et al. (2004) Structural basis for the activity of drugs that inhibit phosphodiesterases. Structure 12, 2233–47. Beavo, J.A. (1995) Cyclic nucleotide phosphodiesterases: Functional implications of multiple isoforms. Physiol. Rev. 75, 725–48. Zhang, K.Y., Card, G.L., Suzuki, Y., et al. (2004) A glutamine switch mechanism for nucleotide selectivity by phosphodiesterases. Mol. Cell. 15, 279–86. Francis, S.H., Colbran, J.L., McAllisterLucas, L.M., and Corbin, J.D. (1994) Zinc interactions and conserved motifs of the cGMP-binding cGMP-specific phosphodiesterase suggest that it is a zinc hydrolase. J. Biol. Chem. 269, 22477–80. Rotella, D.P. (2002) Phosphodiesterase 5 inhibitors: Current status and potential applications. Nat. Rev. Drug Discov. 1, 674–82. Haning, H., Niewohner, U., Schenke, T., Lampe, T., Hillisch, A., and Bischoff, E. (2005) Comparison of different heterocyclic scaffolds as substrate analog PDE5 inhibitors. Bioorg. Med. Chem. Lett. 15, 3900–7. Rotella, D.P. (2006) Phosphodiesterases. In: Taylor, J.D., and Triggle, D.J. (eds.) Comprehensive Medicinal Chemistry II. Elsevier, Oxford, pp. 919–57. Haning, H., Niewoehner, U., Bischoff, E. (2003) Phosphodiesterase Type 5 (PDE5) Inhibitors. Progress in Medicinal Chemistry 41, 246–306. Sekhar, K.R., Grondin, P., Francis, S.H., and Corbin, J.D. (1996) Design and synthesis of xanthines and cyclic GMP analogues as potent inhibitors of PDE5. In: Schudt, C. (ed.) The Handbook of Immunopharmacology. Academic Press, New York, pp. 135–46. Huai, Q., Liu, Y., Francis, S.H., Corbin, J.D., and Ke, H. 
(2004) Crystal structures of phosphodiesterases 4 and 5 in complex with inhibitor 3-isobutyl-1-methylxanthine suggest a conformation determinant of inhibitor selectivity. J. Biol. Chem. 279, 13095–101. Kramer, G.L., and Wells, J.N. (1979) Effects of phosphodiesterase inhibitors on cyclic nucleotide levels and relaxation of pig coronary arteries. Mol. Pharmacol. 16, 813–22.
93. Garst, J.E., Kramer, G.L., Wu, Y.J., and Wells, J.N. (1976) Inhibition of separated forms of phosphodiesterases from pig coronary arteries by uracils and by 7-substituted derivatives of 1-methyl-3-isobutylxanthine. J. Med. Chem. 19, 499–503. 94. Wang, H., Liu, Y., Huai, Q., et al. (2006) Multiple conformations of phosphodiesterase-5: Implications for enzyme function and drug development. J. Biol. Chem. 281, 21469–79. 95. Eros, D., Szantai-Kis, C., Kiss, R., et al. (2008) Structure -activity relationships of PDE5 inhibitors. Curr. Med. Chem. 15, 1570–85. 96. Manning, G., Whyte, D.B., Martinez, R., Hunter, T., and Sudarsanam, S. (2002) The protein kinase complement of the human genome. Science 298, 1912–34. 97. Adams, JA. (2001) Kinetic and catalytic mechanisms of protein kinases. Chem Rev 101, 2271–90. 98. Ubersax, J.A., and Ferrell, J.E. Jr. (2007) Mechanisms of specificity in protein phosphorylation. Nat. Rev. Mol. Cell. Biol. 8, 530–41. 99. Benson, J.D., Chen, Y.N., Cornell-Kennon, S.A., et al. (2006) Validating cancer drug targets. Nature 441, 451–6. 100. Walker, I., and Newell, H. (2009) Do molecularly targeted agents in oncology have reduced attrition rates? Nat. Rev. Drug Discov. 8, 15–6. 101. Vieth, M., Higgs, R.E., Robertson, D.H., Shapiro, M., Gragg, E.A., and Hemmerle, H. (2004) Kinomics-structural biology and chemogenomics of kinase inhibitors and targets. Biochim. Biophys. Acta 1697, 243–57. 102. Karaman, M.W., Herrgard, S., Treiber, D.K., et al. (2008) A quantitative analysis of kinase inhibitor selectivity. Nat. Biotechnol. 26, 127–32. 103. Liu, Y., Shah, K., Yang, F., Witucki, L., and Shokat, K.M. (1998) A molecular gate which controls unnatural ATP analogue recognition by the tyrosine kinase v-Src. Bioorg. Med. Chem. 6, 1219–26. 104. Liu, Y., Shah, K., Yang, F., Witucki, L., and Shokat, K.M. (1998) Engineering Src family protein kinases with unnatural nucleotide specificity. Chem. Biol. 5, 91–101. 105. Bishop, A.C., and Shokat, K.M. 
(1999) Acquisition of inhibitor-sensitive protein kinases through protein design. Pharmacol. Ther. 82, 337–46. 106. Blencke, S., Zech, B., Engkvist, O., et al. (2004) Characterization of a conserved structural determinant controlling protein
Targeting the Purinome
107.
108.
109.
110.
111.
112.
113.
114.
115.
116.
117. 118.
89
kinase sensitivity to selective inhibitors. Chem. Biol. 11, 691–701. Alaimo, P.J., Knight, Z.A., and Shokat, K.M. (2005) Targeting the gatekeeper residue in phosphoinositide 3-kinases. Bioorg. Med. Chem. 13, 2825–36. Liu, Y., and Gray, N.S. (2006) Rational design of inhibitors that bind to inactive kinase conformations. Nat. Chem. Biol. 2, 358–64. Manley, P.W., Cowan-Jacob, S.W., and Mestan, J. (2005) Advances in the structural biology, design and clinical development of Bcr-Abl kinase inhibitors for the treatment of chronic myeloid leukaemia. Biochim. Biophys. Acta. 1754, 3–13. Lee, J.C., Kassis, S., Kumar, S., Badger, A., and Adams, J.L. (1999) p38 mitogen-activated protein kinase inhibitors – mechanisms and therapeutic potentials. Pharmacol. Ther. 82, 389–97. Wilson, K.P., McCaffrey, P.G., Hsiao, K., et al. (1997) The structural basis for the specificity of pyridinylimidazole inhibitors of p38 MAP kinase. Chem. Biol. 4, 423–31. Wang, Z., Canagarajah, B.J., Boehm, J.C., et al. (1998) Structural basis of inhibitor selectivity in MAP kinases. Structure 6, 1117–28. Zhang, Q., Liu, Y., Gao, F., et al. (2006) Discovery of EGFR selective 4,6-disubstituted pyrimidines from a combinatorial kinasedirected heterocycle library. J. Am. Chem. Soc. 128, 2182–3. Ghose, A.K., Herbertz, T., Pippin, D.A., Salvino, J.M., and Mallamo, J.P. (2008) Knowledge based prediction of ligand binding modes and rational inhibitor design for kinase drug discovery. J.Med. Chem. 51, 5149–71. Bohmer, F.D., Karagyozov, L., Uecker, A., et al. (2003) A single amino acid exchange inverts susceptibility of related receptor tyrosine kinases for the ATP site inhibitor STI-571. J. Biol. Chem. 278, 5148–55. Honma, T., Yoshizumi, T., Hashimoto, N., et al. (2001) A novel approach for the development of selective Cdk4 inhibitors: Library design based on locations of Cdk4 specific amino acid residues. J. Med. Chem. 44, 4628–40. Teague, S.J. (2003) Implications of protein flexibility for drug discovery. Nat. 
Rev. Drug Discov. 2, 527–41. Huse, M., and Kuriyan, J. (2002) The conformational plasticity of protein kinases. Cell 109, 275–82.
90
Murray and Bussiere
119. Wood, E.R., Truesdale, A.T., McDonald, O.B., et al. (2004) A unique structure for epidermal growth factor receptor bound to GW572016 (Lapatinib): Relationships among protein conformation, inhibitor offrate, and receptor activity in tumor cells. Cancer Res. 64, 6652–9. 120. Heron, N.M., Anderson, M., Blowers, D.P., et al. (2005) SAR and inhibitor complex structure determination of a novel class of potent and specific Aurora kinase inhibitors. Bioorg. Med. Chem. Lett. 16, 1320–3. 121. Bellon, S.F., Kaplan-Lefko, P., Yang, Y., et al (2008) c-Met inhibitors with novel binding mode show activity against several hereditary papillary renal cell carcinoma-related mutations. J. Biol. Chem. 283, 2675–83. 122. Stamos, J., Sliwkowski, M.X., and Eigenbrot, C. (2002) Structure of the epidermal growth factor receptor kinase domain alone and in complex with a 4-anilinoquinazoline inhibitor. J. Biol. Chem. 277, 46265–72. 123. Kufareva, I., and Abagyan, R. (2008) Type-II kinase inhibitor docking, screening, and profiling using modified structures of active kinase states. J. Med. Chem. 51, 7921–32. 124. Vale, R.D., and Milligan, R.A. (2000) The way things move: Looking under the hood of molecular motor proteins. Science 288, 88–95. 125. Sakowicz, R., Finer, J.T., Beraud, C., et al. (2004) Antitumor activity of a kinesin inhibitor. Cancer Res. 64, 3276–80. 126. Kolomeisky, A.B., and Fisher, M.E. (2007) Molecular motors: A theorist’s perspective. Annu. Rev. Phys. Chem. 58, 675–95. 127. Vale, R.D., and Fletterick, R.J. (1997) The design plan of kinesin motors. Annu. Rev. Cell Dev. Biol. 13, 745–77. 128. Vetter, I.R., and Wittinghofer, A. (1999) Nucleoside triphosphate-binding proteins: Different scaffolds to achieve phosphoryl transfer. Q. Rev. Biophys. 32, 1–56. 129. Turner, J., Anderson, R., Guo, J., Beraud, C., Fletterick, R., and Sakowicz, R. (2001) Crystal structure of the mitotic spindle kinesin Eg5 reveals a novel conformation of the necklinker. J. Biol. Chem. 
276, 25496–502. 130. Sablin, E.P., Kull, F.J., Cooke, R., Vale, R.D., and Fletterick, R.J. (1996) Crystal structure of the motor domain of the kinesin-related motor ncd. Nature 380, 555–9. 131. Kull, F.J., Sablin, E.P., Lau, R., Fletterick, R.J., and Vale, R.D. (1996) Crystal structure of the kinesin motor domain reveals a structural similarity to myosin. Nature 380, 550–5.
132. Lad, L., Luo, L., Carson, J.D., et al. (2008) Mechanism of inhibition of human KSP by ispinesib. Biochemistry 47, 3576–85. 133. Hopkins, S.C., Vale, R.D., and Kuntz, I.D. (2000) Inhibitors of kinesin activity from structure-based computer screening. Biochemistry 39, 2805–14. 134. Mayer, T.U., Kapoor, T.M., Haggarty, S.J., King, R.W., Schreiber, S.L., and Mitchison, T.J. (1999) Small molecule inhibitor of mitotic spindle bipolarity identified in a phenotype-based screen. Science 286, 971–4. 135. Maliga, Z., Kapoor, T.M., and Mitchison, T.J. (2002) Evidence that monastrol is an allosteric inhibitor of the mitotic kinesin Eg5. Chem. Biol. 9, 989–96. 136. Luo, L., Carson, J.D., Dhanak, D., et al. (2004) Mechanism of inhibition of human KSP by monastrol: Insights from kinetic analysis and the effect of ionic strength on KSP inhibition. Biochemistry 43, 15258–66. 137. Yan, Y.W., Sardana, V., Xu, B., et al. (2004) Inhibition of a mitotic motor protein: Where, how, and conformational consequences. J. Mol. Biol. 335, 547–54. 138. Zhang, B., Liu, J.F., Xu, Y., and Ng, S.C. (2008) Crystal structure of HsEg5 in complex with clinical candidate CK0238273 provides insight into inhibitory mechanism, potency, and specificity. Biochem. Biophys. Res. Commun. 372, 565–70. 139. Zhang, Y., and Xu, W. (2008) Progress on kinesin spindle protein inhibitors as anticancer agents. Anticancer Agents Med. Chem. 8, 698–704. 140. Liu, F., You, Q.D., and Chen, Y.D. (2006) Pharmacophore identification of KSP inhibitors. Bioorg. Med. Chem. Lett. 17, 722–6. 141. Maliga, Z., and Mitchison, T.J. (2006) Small-molecule and mutational analysis of allosteric Eg5 inhibition by monastrol. BMC Chem. Biol. 6, 2. 142. Brier, S., Lemaire, D., DeBonis, S., Forest, E., and Kozielski, F. (2006) Molecular dissection of the inhibitor binding pocket of mitotic kinesin Eg5 reveals mutants that confer resistance to antimitotic agents. J. Mol. Biol. 360, 360–76. 143. Rickert, K.W., Schaber, M., Torrent, M., et al. 
(2008) Discovery and biochemical characterization of selective ATP competitive inhibitors of the human mitotic kinesin KSP. Arch. Biochem. Biophys. 469, 220–31. 144. Parrish, C.A., Adams, N.D., Auger, K.R., et al. (2007) Novel ATP-competitive kinesin spindle protein inhibitors. J. Med. Chem. 50, 4939–52.
145. Tecle, H., Shao, J., Li, Y., et al. (2009) Beyond the MEK-pocket: Can current MEK kinase inhibitors be utilized to synthesize novel type III NCKIs? Does the MEKpocket exist in kinases other than MEK? Bioorg. Med. Chem. Lett. 19, 226–9. 146. Lewis, J.A., Lebois, E.P., and Lindsley, C.W. (2008) Allosteric modulation of kinases and GPCRs: Design principles and structural diversity. Curr. Opin. Chem. Biol. 12, 269–80. 147. Downward, J. (1998) Ras signalling and apoptosis. Curr. Opin. Genet. Dev. 8, 49–54. 148. Buday, L., and Downward, J. (2008) Many faces of Ras activation. Biochim. Biophys. Acta. 1786, 178–87. 149. Nakao, M., Janssen, J.W., Seriu, T., and Bartram, C.R. (2000) Rapid and reliable detection of N-ras mutations in acute lymphoblastic leukemia by melting curve analysis using LightCycler technology. Leukemia 14, 312–5. 150. Burmer, G.C., and Loeb, L.A. (1989) Mutations in the KRAS2 oncogene during progressive stages of human colon carcinoma. Proc. Natl. Acad. Sci. U.S.A. 86, 2403–7. 151. Almoguera, C., Shibata, D., Forrester, K., Martin, J., Arnheim, N., and Perucho, M. (1988) Most human carcinomas of the exocrine pancreas contain mutant c-K-ras genes. Cell 53, 549–54. 152. Bourne, H.R., Sanders, D.A., and McCormick, F. (1991) The GTPase superfamily: Conserved structure and molecular mechanism. Nature 349, 117–27. 153. Paduch, M., Jelen, F., and Otlewski, J. (2001) Structure of small G proteins and their regulators. Acta Biochim. Pol. 48, 829–50. 154. Feuerstein, J., Kalbitzer, H.R., John, J., Goody, R.S., and Wittinghofer, A. (1987) Characterisation of the metal-ion-GDP complex at the active sites of transforming and nontransforming p21 proteins by observation of the 17O-Mn superhyperfine coupling and by kinetic methods. Eur. J. Biochem. 162, 49–55. 155. Ahmadian, M.R. (2002) Prospects for antiras drugs. Br. J. Haematol. 116, 511–8. 156. Dever, T.E., Glynias, M.J., and Merrick, W.C. 
(1987) GTP-binding domain: Three consensus sequence elements with distinct spacing. Proc Natl Acad Sci U.S.A. 84, 1814–8. 157. Scheidig, A.J., Franken, S.M., Corrie, J.E., et al. (1995) X-ray crystal structure analysis of the catalytic domain of the oncogene
Targeting the Purinome
158.
159.
160. 161.
162.
163.
164.
165.
166.
167.
168.
91
product p21H-ras complexed with caged GTP and mant dGppNHp. J. Mol. Biol. 253, 132–50. Zhong, J.M., Chen-Hwang, M.C., and Hwang, Y.W. (1995) Switching nucleotide specificity of Ha-Ras p21 by a single amino acid substitution at aspartate 119. J. Biol. Chem. 270, 10002–7. Schmidt, G., Lenzen, C., Simon, I., et al. (1996) Biochemical and biological consequences of changing the specificity of p21ras from guanosine to xanthosine nucleotides. Oncogene 12, 87–96. Kosloff, M., and Selinger, Z. (2001) Substrate assisted catalysis – application to G proteins. Trends Biochem. Sci. 26, 161–6. Kosloff, M., and Selinger, Z. (2003) GTPase catalysis by Ras and other G-proteins: Insights from substrate directed superimposition. J. Mol. Biol. 331, 1157–70. Frech, M., Darden, T.A., Pedersen, L.G., et al. (1994) Role of glutamine-61 in the hydrolysis of GTP by p21H-ras: An experimental and theoretical study. Biochemistry 33, 3237–44. Nassar, N., Horn, G., Herrmann, C., Scherer, A., McCormick, F., and Wittinghofer, A. (1995) The 2.2 A crystal structure of the Ras-binding domain of the serine/ threonine kinase c-Raf1 in complex with Rap1A and a GTP analogue. Nature 375, 554–60. Zhang, J., and Matthews, C.R. (1998) Ligand binding is the principal determinant of stability for the p21(H)-ras protein. Biochemistry 37, 14881–90. Zhang, J., and Matthews, C.R. (1998) The role of ligand binding in the kinetic folding mechanism of human p21(H-ras) protein. Biochemistry 37, 14891–9. Spoerner, M., Herrmann, C., Vetter, I.R., Kalbitzer, H.R., and Wittinghofer, A. (2001) Dynamic properties of the Ras switch I region and its importance for binding to effectors. Proc. Natl. Acad. Sci. U.S.A. 98, 4944–9. Geyer, M., Schweins, T., Herrmann, C., Prisner, T., Wittinghofer, A., and Kalbitzer, H.R. (1996) Conformational transitions in p21ras and in its complexes with the effector protein Raf-RBD and the GTPase activating protein GAP. Biochemistry 35, 10308–20. 
Al-Mulla, F., Milner-White, E.J., Going, J.J., and Birnie, G.D. (1999) Structural differences between valine-12 and aspartate-12 Ras proteins may modify carcinoma aggression. J. Pathol. 187, 433–8.
92
Murray and Bussiere
169. Herrmann, C., Block, C., Geisen, C., et al. (1998) Sulindac sulfide inhibits Ras signaling. Oncogene 17, 1769–76. 170. Waldmann, H., Karaguni, M.I., Carpintero, M., et al (2004) Sulindac-derived Ras pathway inhibitors target the Ras-Raf interaction and downstream effectors in the Ras pathway. Angew. Chem. Int. Ed. Engl. 43, 454–8. 171. Kyte, J. (2003) The basis of the hydrophobic effect. Biophys. Chem. 100, 193–203. 172. Ruelle, P., and Kesselring, U.W. (1998) The hydrophobic effect. 1. A consequence of the mobile order in H-bonded liquids. J. Pharm. Sci. 87, 987–97. 173. Ruelle, P., and Kesselring, U.W. (1998) The hydrophobic effect. 2. Relative importance of the hydrophobic effect on the solubility of hydrophobes and pharmaceuticals in H-bonded solvents. J. Pharm. Sci. 87, 998–1014. 174. Ruelle, P., and Kesselring, U.W. (1998) The hydrophobic effect. 3. A key ingredient in predicting n-octanol-water partition coefficients. J. Pharm. Sci. 87, 1015–24. 175. Sharp, K.A., Nicholls, A., Fine, R.F., and Honig, B. (1991) Reconciling the magnitude of the microscopic and macroscopic hydrophobic effects. Science 252, 106–9. 176. Honig, B., Sharp, K., and Gilson, M. (1989) Electrostatic interactions in proteins. Prog. Clin. Biol. Res. 289, 65–74. 177. Sharp, K.A., and Honig, B. (1990) Electrostatic interactions in macromolecules: Theory and applications. Annu. Rev. Biophys. Biophys. Chem. 19, 301–32. 178. Sinha, N, and Smith-Gill, S.J. (2002) Electrostatics in protein binding and function. Curr. Protein Pept. Sci. 3, 601–14. 179. Paulini, R., Muller, K., and Diederich, F. (2005) Orthogonal multipolar interactions in structural chemistry and biology. Angew. Chem. Int. Ed. Engl. 44, 1788–805.
180. Israelachvili, J.N. (1973) Van der Waals forces in biological systems. Q. Rev. Biophys. 16, 341–87. 181. Fleming, P.J., and Rose, G.D. (2005) Do all backbone polar groups in proteins form hydrogen bonds? Protein Sci. 14, 1911–7. 182. Perrin, C.L., and Nielson, J.B. (1997) “Strong” hydrogen bonds in chemistry and biology. Annu. Rev. Phys. Chem. 48, 511–44. 183. Kim, K.S., Tarakeshwar, P., and Lee, J.Y. (2000) Molecular clusters of pi-systems: Theoretical studies of structures, spectra, and origin of interaction energies. Chem Rev 100, 4145–86. 184. McGaughey, G.B., Gagne, M., and Rappe, A.K. (1998) pi-Stacking interactions. Alive and well in proteins. J. Biol. Chem. 273, 15458–63. 185. Sinnokrot, M.O., and Sherrill, C.D. (2004) Substituent effects in pi-pi interactions: Sandwich and T-shaped configurations. J. Am. Chem. Soc. 126, 7690–7. 186. Crowley, P.B., and Golovin, A. (2005) Cation–pi interactions in protein–protein interfaces. Proteins 59, 231–9. 187. Cubero, E., Luque, F.J., and Orozco, M. (1998) Is polarization important in cationpi interactions? Proc. Natl. Acad. Sci. U.S.A. 95, 5976–80. 188. Gallivan, J.P, and Dougherty, D.A. (1999) Cation– pi interactions in structural biology. Proc. Natl. Acad. Sci. U.S.A. 96, 9459–64. 189. Wittinghofer, A., and Waldmann, H. Ras – A molecular switch involved in tumor formation. Angew. Chem. Int. Ed. Engl. 39, 4192–214 190. Noonan, T., Brown, N., Dudycz, L., and Wright, G. (1991) Interaction of GTP derivatives with cellular and oncogenic rasp21 proteins. J. Med. Chem. 34, 1302–7. 191. Ahmadian, MR. (2002) Prospects for antiras drugs. Br. J. Haematol. 116, 511–8.
Chapter 4
Cofactor Chemogenomics
Ratna Singh and Andrea Mozzarelli

Summary
Cofactors are organic molecules, most of them derived from vitamins, that bind to enzymes and enable them to catalyze defined reactions. A cofactor-based chemogenomics approach exploits the presence of a cofactor-binding domain to develop compound scaffolds tailored to mimic the cofactor and to replace it within target enzyme classes. As a result, a loss of function is observed. Expanding the cofactor scaffold to include structural/chemical features derived from the substrate, which usually binds at sites adjacent to the cofactor, increases the specificity of the enzyme fishing. This approach has so far been applied only to NAD(P)+-dependent enzymes. However, it is suitable for all other cofactors, although very tight binding creates difficulties for some of them. In the case of cofactors covalently bound to the enzyme, the competition between the natural cofactor and the cofactor scaffold mimic can only occur during enzyme folding.

Key words: Cofactor, Enzyme, Coenzyme, Chemogenomics, Enzyme inhibitor, Coenzyme domain, Catalysis, Protein structure
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575, DOI 10.1007/978-1-60761-274-2_4, © Humana Press, a part of Springer Science + Business Media, LLC 2009

1. Introduction
Chemogenomics aims at mining the chemical space for the identification of interactions between chemical compounds and proteins that, in turn, affect protein function (1). This approach may lead either to the identification of new proteins or to the discovery of compounds that interfere with biological processes, possibly associated with diseases. A key feature of the chemogenomics approach is the observation that a class of molecules can bind "similar" proteins, suggesting that the identification of ligands for a target can be helpful in the determination of ligands for targets that share some structural or functional feature (2). A compound well profiled for a particular target provides an excellent starting point for the development of its neighbors in genome space and, possibly, for the subsequent development of drug candidates for related proteins. As a result, the classification of binding pockets based on recognition patterns, active site descriptors, and ligand pharmacophores links the genome/proteome space to the chemical space (3).

The chemogenomics approach is particularly powerful when applied to classes of proteins that share the same ligands. Therefore, it is not surprising that chemogenomics has been applied to kinases that bind ATP (4), proteases that cleave polypeptide chains (2), GPCRs that bind hormones (5, 6), and oxidoreductases that bind redox coenzymes (7). The first three protein classes are discussed in other chapters. In particular, ATP and ATP analogs have been extensively exploited given the biological involvement of kinases in many diseases and their pharmacological relevance (4).

In the present chapter, we will focus on cofactor-binding proteins. A cofactor is a chemical species bound to an enzyme that assists it in performing catalysis. An enzyme without the cofactor is called an apoenzyme and is inactive, whereas an enzyme with bound cofactor is called a holoenzyme and is fully active. A cofactor can be a water molecule, a metal ion, or an organic compound. We will only report on chemogenomics studies on organic cofactors, such as coenzymes and prosthetic groups. Coenzymes are the biologically active form of hydrophilic vitamins that bind to the corresponding enzyme with affinities that may vary significantly during the catalytic cycle, even leading to dissociation, whereas prosthetic groups usually are tightly bound to the enzyme. A representative example of a coenzyme-bound enzyme is NAD+-lactate dehydrogenase, and a representative example of a prosthetic group-bound enzyme is the heme-containing cyclooxygenase 1. The list of coenzymes and prosthetic groups is reported in Table 1.
Table 1 Cofactors
Vitamin               | Coenzyme (abbreviation)                                           | Function                                     | Structure
Thiamin (B1)          | Thiamin pyrophosphate (TPP)                                       | Aldehyde two-carbon group transfer           | Fig. 1m
Riboflavin (B2)       | Flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD) | Redox with hydride and proton transfer       | Fig. 1c,d
Niacin (B3)           | Nicotinamide adenine dinucleotide (NAD+)                          | Redox with hydride transfer                  | Fig. 1a,b
Pantothenic acid (B5) | Coenzyme A (CoA)                                                  | Acyl carrier moiety                          | Fig. 1h
Pyridoxine (B6)       | Pyridoxal 5′-phosphate (PLP)                                      | Amino acid, amine, and ketoacid modification | Fig. 1i–l
Biotin (B7)           |                                                                   | CO2 transfer                                 | Fig. 1s
Folate (B9)           | Tetrahydrofolate                                                  | One-carbon transfer                          | Fig. 1e
Cobalamin (B12)       | Adenosylcobalamin and methylcobalamin                             | Alkyl/hydride group transfer                 | Fig. 1u
Prosthetic groups
                      | Heme                                                              | Redox                                        | Fig. 1n,o
                      | Lipoamide                                                         | Acyl transfer                                | Fig. 1q
                      | S-adenosylmethionine                                              | Methyl transfer                              | Fig. 1p
The rationale of cofactor chemogenomics rests on the existence, for a given class of enzymes, of a common binding site for the cofactor, for example, the NAD-binding domain in dehydrogenases. This site can be exploited to design a library of compounds that, by replacing the natural cofactor in the cofactor domain of dehydrogenases, lead to enzyme inhibition. Ligand-binding specificity can arise from the conjugation of a scaffold bearing structural/chemical features of the cofactor with a second moiety bearing structural/chemical features defined by the substrate active site, which is usually adjacent to the cofactor-binding site. These two structural/functional features will guide the resulting compounds to dock in the enzyme active site. This general strategy might be difficult to apply when the cofactor is tightly bound; in that case, the designed compounds should exhibit affinity in the nanomolar or subnanomolar range in order to affect enzyme function. Alternatively, the designed compound can bind to the enzyme during folding. As a result, chemogenomics and chemogenomics-oriented studies have been applied to cofactors that are loosely bound to the enzyme at some stage of the catalytic cycle and can, therefore, be displaced by a properly designed compound. NAD(P)+ chemogenomics is reported first because, up to now, it is the only cofactor for which this approach has been successfully applied.
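The requirement that a designed compound reach nanomolar or subnanomolar affinity when the cofactor is tightly bound can be illustrated with a simple equilibrium competition model. The sketch below is only an illustration under stated assumptions: a single shared binding site, free concentrations equal to total concentrations, and hypothetical values for the concentrations, the cofactor dissociation constant, and the inhibitor Ki. None of these numbers come from the studies discussed in this chapter.

```python
def inhibitor_occupancy(inhibitor_conc, k_i, cofactor_conc, k_d):
    """Fraction of enzyme sites holding the inhibitor at equilibrium.

    Assumes one shared site, free concentrations equal to total
    concentrations, and all quantities in the same unit (mol/L here).
    """
    inhibitor_term = inhibitor_conc / k_i
    cofactor_term = cofactor_conc / k_d
    return inhibitor_term / (1.0 + inhibitor_term + cofactor_term)


# Hypothetical conditions, chosen only for illustration.
COFACTOR_CONC = 100e-6   # 100 uM free cofactor (assumed)
INHIBITOR_CONC = 1e-6    # 1 uM inhibitor (assumed)

# Loosely bound cofactor (Kd = 10 uM) versus tightly bound cofactor (Kd = 1 nM),
# each challenged by a 10 nM and by a 10 uM inhibitor.
for label, k_d in [("loosely bound", 10e-6), ("tightly bound", 1e-9)]:
    for k_i in (10e-9, 10e-6):
        theta = inhibitor_occupancy(INHIBITOR_CONC, k_i, COFACTOR_CONC, k_d)
        print(f"{label} cofactor, Ki = {k_i:.0e} M: occupancy = {theta:.4f}")
```

In this toy model a 10 nM compound occupies most sites when the cofactor is loosely bound, but almost none when the cofactor binds with nanomolar affinity. A purely thermodynamic picture of this kind ignores slow cofactor exchange, which may make displacement of a tightly bound cofactor even harder in practice.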
2. NAD(P)+
NAD(P)+ (Fig. 1a) and the reduced form NAD(P)H (Fig. 1b) are the coenzymes of oxidoreductases, enzymes that catalyze the oxidation/reduction of their substrates. The general reaction scheme is:
$$\mathrm{NAD(P)^{+}} + \text{reduced substrate} \overset{E}{\rightleftharpoons} \mathrm{NAD(P)H} + \text{oxidized product} + \mathrm{H^{+}}$$
Fig. 1. Chemical structures of cofactors: (a) NAD(P)+ and (b) NAD(P)H, where R = H gives NAD+ or NADH and R = P gives NADP+ or NADPH, (c) flavin mononucleotide (FMN), (d) flavin adenine dinucleotide (FAD), (e) tetrahydrofolate, (f) 10-formyltetrahydrofolic acid, (g) 5,10-methylenetetrahydrofolic acid, (h) pantothenate, (i) pyridoxine, (j) pyridoxal 5′-phosphate, (k) pyridoxamine 5′-phosphate, (l) Schiff base between PLP and lysine in the active site forming the internal aldimine, (m) thiamin pyrophosphate, (n) iron-porphyrin, (o) protoporphyrin IX, (p) S-adenosylmethionine, (q) lipoamide, (r) lipoamide covalently bound to an enzyme via a carboxamide bond, (s) biotin, (t) biotin bound to an enzyme via a carboxamide bond, (u) cobalamin derivatives 5′-deoxyadenosylcobalamin and methylcobalamin.
The reaction involves hydride transfer from the substrate to the C-4 position of the pyridine ring of NAD(P)+. This transfer is usually stereospecific, the oxidoreductase being either anti or syn, depending on the rotation by 180° of the nicotinamide ring with respect to the ribose moiety. During the catalytic cycle, the NAD(P)H formed dissociates and is replaced by an incoming NAD(P)+, indicating that NAD(P)H exhibits a weaker enzyme affinity. An oxidoreductase usually consists of two domains, a coenzyme-binding domain, like the Rossmann fold (8, 9), and an adjacent catalytic domain where the substrate binds (Fig. 2). Sem and colleagues used a system-based design approach to identify specific inhibitors for multiple oxidoreductases (7, 10–12). First, a cluster analysis was performed on cofactors extracted from 288 oxidoreductase crystal structures. Oxidoreductases were classified into 11 subfamilies, termed pharmacofamilies, that were related by cofactor geometry, protein sequence, and protein fold. This implies some common features in the cofactor-binding site within each pharmacofamily. As a first attempt to generate both general and specific ligands for oxidoreductases, pharmacofamilies 1 and 2 were selected because they are the most populated; both possess a Rossmann-fold domain, with the nicotinamide ring in the anti and syn geometry, respectively.
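The idea of grouping enzymes by the geometry of their bound cofactor can be sketched with a very small clustering example. This is not the procedure of Sem and colleagues, whose pharmacofamilies also reflect protein sequence and fold. It is a minimal single-linkage grouping on one hypothetical descriptor, a nicotinamide ring torsion angle, with invented structure names and invented angle values.

```python
from itertools import combinations


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)


def cluster_by_threshold(values, cutoff_deg):
    """Single-linkage grouping of labelled angles.

    Two structures end up in the same family when a chain of pairwise
    angular differences, each <= cutoff_deg, connects them.
    """
    parent = {label: label for label in values}

    def find(label):
        # Follow parent links to the family representative.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for a, b in combinations(values, 2):
        if angular_difference(values[a], values[b]) <= cutoff_deg:
            parent[find(a)] = find(b)

    families = {}
    for label in values:
        families.setdefault(find(label), []).append(label)
    return sorted(sorted(members) for members in families.values())


# Hypothetical torsion angles (degrees) for five invented structures.
torsions = {"enzA": 10, "enzB": 25, "enzC": 185, "enzD": 200, "enzE": 15}
families = cluster_by_threshold(torsions, cutoff_deg=30)
print(families)
```

With these made-up values the structures fall into two groups whose ring orientations differ by roughly 180°, which is the kind of separation the anti/syn distinction describes. A realistic analysis would use full three-dimensional cofactor conformations rather than a single angle.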
Fig. 2. Three-dimensional structure of M. tuberculosis dihydrodipicolinate reductase, consisting of the Rossmann-fold domain with bound NAD (stick representation) and the catalytic domain with the bound substrate pyridine-2,6-dicarboxylic acid (stick representation) (100).
The inhibitor design strategy parallels the modular design of the oxidoreductase gene family (11) and produces inhibitors across a pharmacofamily (7). The procedure starts by identifying a small molecule that binds to the common ligand site (a common ligand mimic, or CLM) for this class of enzymes. CLM candidates are selected computationally by matching the pharmacophore properties of the nicotinamide mononucleotide portion of the NADH cofactor bound to dihydrodipicolinate reductase (DHPR), an oxidoreductase in pharmacofamily 1 that is essential for cell wall synthesis in Mycobacterium tuberculosis (13). This ligand-based search employed the icosahedral matching algorithm (14), contained within the THREEDOM software package, to identify potential inhibitors. These compounds were tested for binding potency using steady-state kinetic inhibition of DHPR and other dehydrogenases belonging to the same pharmacofamily. A privileged cofactor scaffold CLM is shown in Fig. 3, with the linker that allows the compound to be expanded and the specificity ligand (SL) to be introduced. The expansion was carried out using an NMR-based binding site method (NMR SOLVE) to determine where to insert a combinatorial library in order to direct diversity elements into the adjacent substrate site. The NMR SOLVE method (11, 15) maps a binding site relative to a reference ligand, such as the cofactor, and then characterizes the binding mode of a novel ligand, such as a CLM, relative to the reference ligand. The key information obtained with NMR SOLVE is where a linker should be placed and which chemical diversity elements can be attached and directed into an adjacent specificity pocket. Although the geometric relationship of the CLM and SL sites is conserved in oxidoreductases, the actual size and electrostatic properties of the SL-binding site vary in a way that parallels the diversity of substrates used by oxidoreductases.
The diversity elements attached to the conserved CLM scaffold were matched to the properties of substrates used throughout the oxidoreductase gene family. Three hundred diversity elements were selected and chemically joined to the CLM-linker construct. The resulting biligand library was then screened against three Rossmann-fold enzymes in pharmacofamilies 1 and 2: DHPR, lactate dehydrogenase, and 1-deoxy-D-xylulose-5-phosphate reductoisomerase. It was found that expanding the CLM to include the SL increases the affinity, in some cases, by about 1,000-fold (from 55 µM to 42 nM) (7). This study demonstrates that a biligand collection of sufficient size and diversity, built with an appropriately chosen CLM and well-placed linkers, is able to produce nM inhibitors for most members of a pharmacofamily.
In a very recent paper (12), targets in M. tuberculosis and antitargets in human liver cells were fished using a catechol-rhodamine privileged biligand scaffold generated from the aforementioned studies. This work consists of three parts: (a) assessment of ligand uptake in E. coli cells, as a test of the capability to cross bacterial cell walls, and in vivo protein labeling with a fluorescent biligand conjugated to N-hydroxysuccinimide (NHS) to introduce a moiety that reacts with amine residues; (b) isolation of targets and antitargets by catechol-rhodamine functionalized affinity chromatography; and (c) identification of targets and antitargets by SDS gel electrophoresis and nanospray LC/MS/MS. The collected MS data were searched against a subset of the UniProt database. For the human liver cells, this analysis identified five proteins, of which three are known to bind NAD(P): cytoplasmic isocitrate dehydrogenase, the mitochondrial precursor of aldehyde dehydrogenase, and the mitochondrial precursor of malate dehydrogenase. For the M. tuberculosis cells, the analysis identified seven proteins, of which two are known to bind NAD(P): riboflavin biosynthesis protein ribD and a possible oxidoreductase. This result was almost completely confirmed by the MS analysis of proteins fished by the affinity column.
Fig. 3. CLM (common ligand mimic) with linker to the SL (specificity ligand), prepared by computationally matching the pharmacophore properties of the nicotinamide ring of NADH bound to dihydrodipicolinate reductase (DHPR) (11).
2.1. Other NAD(P)+-Dependent Oxidoreductase Targets
In the following section, representative examples of NAD(P)+ oxidoreductases that might be targets for a chemogenomics analysis are presented. Malaria, arthritis, and lupus can be treated with quinolines. In the search for novel proteins targeted by quinolines, which also exhibit toxic effects, the human red blood cell purine-binding proteome and the Plasmodium falciparum purine-binding proteome were screened with several quinoline compounds. The screening identified in human, but not in P. falciparum, aldehyde dehydrogenase and quinone reductase, both enzymes known to bind NAD(P)+. It seems that quinolines target the adenine subsite of the NAD cofactor. Inactivation of aldehyde dehydrogenase might be responsible for the retinopathy associated with prolonged treatment with chloroquine (CQ), whereas inactivation of quinone reductase has been associated with an anti-inflammatory action (16). A similar proteome mining approach led to the same results. However, with a set of new CQ-related analogs, alcohol dehydrogenase was also identified as a target (17).
A more classical structure-based drug design approach was applied by Read et al. to the identification of inhibitors of P. falciparum lactate dehydrogenase (pfLDH) (18). The structure of this enzyme has revealed a unique cleft adjacent to the active site, ideally suited as a target for the rational design of inhibitors. Another feature of the structure is a significant difference in the NADH-binding mode relative to other forms of LDH, indicating that the P. falciparum enzyme has a unique mode of association with the cofactor and, hence, a distinctive NADH-binding pocket. These features suggest that pfLDH might be a target for structure-based design of novel antimalarials. The crystal structure of the complex formed between CQ and pfLDH has been determined to assess the mode and the likely significance of this interaction. The crystallographic structure shows that CQ is partially buried within the cofactor-binding site of pfLDH (Fig. 4), occupying a position similar to that of the adenyl ring of the cofactor in the ternary complex structure. Although some evidence of CQ binding to pfLDH was reported in a previous work (19), CQ was not found to inhibit pfLDH. The overlap of the CQ- and NADH-binding sites observed in the crystal structures suggested that CQ might be a competitive inhibitor of the enzyme. This expectation was confirmed by enzymatic activity assays. The measured Ki values of 1.3 and 3.5 mM for inhibition of malarial and pig LDH (18), respectively, are high, partially explaining previous findings. A chemogenomics approach might help in detecting inhibitors with higher affinity.
Fig. 4. Superimposition of the structure of pfLDH with chloroquine (18) and pfLDH with bound NADH (101), showing that the quinoline ring occupies a position similar to that of the adenyl ring of NAD+.
Similarly, the chemogenomic approach can be used to identify inhibitors of enoyl-ACP reductase (InhA) for the treatment of tuberculosis. For this disease, the therapy is predominantly based on isoniazid (isonicotinic acid hydrazide, INH). The drug binds to the cofactor and forms a binary complex that, in turn, acts as an inhibitor. Indeed, INH is a prodrug that undergoes metabolic oxidation by the M. tuberculosis enzyme catalase-peroxidase katG (20) to form an isonicotinoyl radical, which binds covalently to the C4 position of the NAD(P)+ cofactor. The INH-NAD(P) adduct inhibits two enzymes involved in the fatty acid biosynthetic pathway of M. tuberculosis, NAD+-dependent enoyl-acyl carrier protein reductase (enoyl-ACP reductase, InhA) and NAD(P)+-dependent β-keto-ACP reductase (mycolic acid biosynthesis A, MabA). Addition of the isonicotinoyl radical to the C4 of the nicotinamide ring can result in two stereoisomers. However, only the 4(S) isomers of INH-NAD and INH-NAD(P) are potent inhibitors of InhA (Ki = 0.75 nM) and MabA (Ki = 2.2 mM). In contrast, the 4(R) isomer of INH-NAD(P) inhibits M. tuberculosis dihydrofolate reductase, which is essential for nucleic acid synthesis. Using this approach, other compounds, like benzoic acid hydrazide, were identified and demonstrated to act like INH and to be potent inhibitors of InhA (21). 4-Phenoxybenzamide adenine dinucleotide was synthesized as an NAD analog with inhibitory activity against enoyl-ACP reductase (InhA) of M. tuberculosis. An expansion of this NAD-mimic scaffold to occupy the catalytic site might result in more potent compounds.
Another enzyme target that may take advantage of a chemogenomics approach to improve inhibitor affinity is glyceraldehyde-3-phosphate dehydrogenase. A number of adenosine analogs were synthesized by exploiting the small differences between the NAD+-binding pockets of glyceraldehyde-3-phosphate dehydrogenase from Trypanosoma brucei and Leishmania mexicana and that of the human enzyme (22).
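The inhibition constants quoted in this section span several orders of magnitude, and it can help to put them on a common energy scale. The sketch below converts a Ki into a standard binding free energy using ΔG° = RT ln(Ki) and computes a simple fold selectivity. The temperature of 298.15 K and the 1 M standard state are assumptions made here for illustration. The Ki values are those given in the text for CQ against malarial and pig LDH and for the INH-NAD adduct against InhA, and the function names are hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def binding_free_energy(k_i_molar, temperature_k=298.15):
    """Standard free energy of binding in kJ/mol from a Ki in mol/L.

    Uses dG = RT ln(Ki) relative to a 1 M standard state; the default
    temperature is an assumption, not a value from the cited studies.
    """
    return R_KJ_PER_MOL_K * temperature_k * math.log(k_i_molar)


def fold_selectivity(k_i_offtarget, k_i_target):
    """How many times weaker the off-target binding is than the target binding."""
    return k_i_offtarget / k_i_target


# Ki values as quoted in the text.
KI_CQ_PFLDH = 1.3e-3       # chloroquine vs malarial LDH, mol/L
KI_CQ_PIG_LDH = 3.5e-3     # chloroquine vs pig LDH, mol/L
KI_INH_NAD_INHA = 0.75e-9  # INH-NAD adduct vs InhA, mol/L

for name, k_i in [
    ("CQ / pfLDH", KI_CQ_PFLDH),
    ("CQ / pig LDH", KI_CQ_PIG_LDH),
    ("INH-NAD / InhA", KI_INH_NAD_INHA),
]:
    print(f"{name}: dG = {binding_free_energy(k_i):.1f} kJ/mol")

print(f"CQ selectivity, pig vs malarial LDH: "
      f"{fold_selectivity(KI_CQ_PIG_LDH, KI_CQ_PFLDH):.1f}-fold")
```

On this scale the millimolar binding of CQ to LDH is both weak and only marginally selective for the parasite enzyme, which is consistent with the suggestion in the text that higher-affinity inhibitors are needed.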
NAD+ plays a biological role not only as a coenzyme in oxidoreductases but also as a substrate, with its nicotinamide moiety acting as a leaving group, in enzymes such as poly(ADP-ribose) polymerase and sirtuins (23). This expands the set of potential enzyme targets well beyond oxidoreductases and, thus, points to a potentially more general application of NAD(P)+ chemogenomics.
3. Riboflavin
Vitamin B2 forms the basis of two coenzymes, flavin mononucleotide (FMN) (Fig. 1c) and flavin adenine dinucleotide (FAD) (Fig. 1d). Both coenzymes are involved in redox reactions. The general scheme is:
$$\mathrm{FMN(FAD)} + \text{reduced substrate} \overset{E}{\rightleftharpoons} \mathrm{FMNH_{2}(FADH_{2})} + \text{oxidized product}$$
where the substrate transfers a hydride and a proton to FMN (FAD). As in the case of NAD+, FMN and FAD bind to common domains that are adjacent to specific substrate-binding sites. There are two types of FMN (FAD) enzymes: enzymes that depend on the coenzyme for activity, and enzymes in which FMN (FAD) acts as a substrate. In the latter case FMN (FAD) binding is weaker, whereas in the former case FMN (FAD) binding is tight and the coenzyme is never displaced during catalysis. Therefore, the development of chemical compounds affecting FMN (FAD)-dependent enzymes has been directed only to enzymes that use FMN or FAD as a substrate. The biosynthesis of riboflavin is carried out by several enzymes (24) that represent potential targets for antibiotic agents because they are not present in humans. Moreover, bacteria lack the proteins for the uptake of riboflavin from the external medium. A genomic analysis has suggested that the riboflavin biosynthesis pathway is essential for both M. tuberculosis and M. leprae (25). Thus, inhibitors of lumazine synthase (LS) and riboflavin synthase (RS), the enzymes that catalyze the final steps of riboflavin biosynthesis, are proposed to be efficient drugs against mycobacterial infection. The structure of M. tuberculosis LS was determined and compared with previously known structures of LSs from different species. A novel class of inhibitors was developed based on a ribityl purinetrione scaffold that mimics a reaction intermediate of LS and binds to the active site of both LS and RS, with a binding affinity for the former enzyme in the nanomolar range (24). For two of these inhibitors the structure of the complex with LS was determined, defining the mode of binding and opening the way to further improvements in inhibitor-active site complementarity. A similar approach was used for the development of inhibitors that target LS from Candida albicans (26).
An NMR-based screening of RS was carried out against a small library of 19F-containing ligands. The library contains about 400 ligands with either an aromatic fluorine or a trifluoromethyl moiety (27). Two compounds were found to bind with affinity in the micromolar range and were used for further development.
Monoamine oxidases A and B (MAO A and MAO B) are flavin-containing enzymes that catalyze the oxidative deamination of neuroactive and vasoactive amines. Both enzymes are targets for psychiatric and neurological drugs. A mutagenesis study indicated that the residue Glu34 plays a major role in the active site of the B form, interacting with the 2′-hydroxyl group of the adenine moiety of the FAD cofactor (28). The coenzyme first binds to the dinucleotide-binding site and then forms a covalent bond with Cys397, a residue that is located at a significant distance from Glu34. In order to inhibit the enzyme, flavin deoxyadenine dinucleotide (dFAD) and dideoxyadenine dinucleotide (ddFAD) were synthesized (29). These compounds are proposed to substitute for the natural cofactor during the folding process of MAO, thus occupying the active site with an inert coenzyme. The resulting MAO A and B were catalytically inactive.
4. Tetrahydrofolate
Tetrahydrofolate (Fig. 1e) and its derivatives (Fig. 1f, g) are the biologically active forms of folate. This cofactor is involved in many distinct enzymatic reactions, ranging from amino acid metabolism, as in serine hydroxymethyltransferase (SHMT) (Fig. 5a), to nucleotide biosynthesis, as in thymidylate synthase (TS) (Fig. 5b) and dihydrofolate reductase (DHFR) (Fig. 5c). These enzymes are targets for anticancer drugs because they participate in the formation of thymidylate, the only nucleotide that cannot be obtained via the salvage reactions (30). Whereas the search for inhibitors of SHMT has only recently begun, favored by the determination of the three-dimensional structure of the human enzyme (31), extensive studies have been carried out on TS and DHFR in the search for inhibitors that mimic the deoxyuridylate monophosphate substrate as well as the folate moiety (32). Indeed, aminopterin and methotrexate, inhibitors of DHFR, were the first antimetabolites used in cancer therapy to treat childhood acute lymphoblastic leukemia (33). TS catalyzes the conversion of deoxyuridylate monophosphate to thymidylate monophosphate (dTMP) via a methylation reaction in which 5,10-methylene tetrahydrofolate is involved as a cofactor. Inhibitors of TS have been shown to possess broad-spectrum activity as antiproliferative and antitumor agents and, thus, have become attractive candidates for cancer chemotherapy. Several potent inhibitors of TS were designed based on modifications of 5,10-methylene tetrahydrofolate, as shown in Fig. 6 (34), and on the cofactor-binding site by using the crystal structure of E. coli TS complexed with 5-fluoro-2′-deoxyuridylate (Fig. 7) (35).
Fig. 5. Three-dimensional structures of (a) serine hydroxymethyltransferase (SHMT) with bound PLP and 5-formyltetrahydrofolate (102), (b) thymidylate synthase with bound 5-hydroxymethylene-6-hydrofolic acid and 5-fluoro-2′-deoxyuridine-5′-monophosphate (103), and (c) dihydrofolate reductase with bound folate (104).
Fig. 6. Chemical structures of inhibitors of thymidylate synthase designed based on modifications of 5,10-methylene tetrahydrofolate: (a) ZD1694, (b) OSI 1843U89, (c) CB3717 (34).
Fig. 7. Thymidylate synthase inhibitors designed to bind in the cofactor-binding site using the crystal structure of the E. coli enzyme complexed with 5-fluoro-2′-deoxyuridylate (35).
Since the determination of the M. tuberculosis genome sequence, various groups have used the genomic information to identify and validate targets as the basis for the development of new antituberculosis agents (36). Validation includes many components: characterization of the biochemical activity of the enzyme, determination of its crystal structure in complex with an inhibitor or a substrate, confirmation of essentiality, and identification of potent growth inhibitors either in vitro or in an infection model. If novel target validation and subsequent inhibition are matched by an improved understanding of disease biology, then new antibiotics could have the potential to shorten the duration of therapy, prevent resistance development, and eliminate latent disease. Whereas bacteria synthesize folate de novo, mammals must assimilate preformed folate derivatives through an active transport system. Two enzymes of the folate biosynthesis pathway, dihydropteroate synthase and dihydrofolate reductase, are the validated targets of the widely used antibacterial sulfonamide drugs and trimethoprim, respectively (37). In spite of the pharmaceutical relevance of folate-containing enzymes, the chemogenomics approach has not yet been applied to this enzyme class.
5. Pantothenate
Pantothenate (vitamin B5) (Fig. 1h) is a component of CoA and is also attached to acyl carrier proteins coordinating acyl groups during fatty acid biosynthesis. CoA is an essential cofactor in lipid metabolism, cell signaling, and the synthesis of polyketides and nonribosomal peptides. The four enzymes in pantothenate biosynthesis (PanB–E) are validated targets for the development of effective antibiotics (38). Moreover, pantothenate kinase (PanK) is essential for growth and catalyzes the first of five steps leading to CoA biosynthesis. The strong correlation between structural plasticity, evolutionary conservation, and variability between prokaryotic and eukaryotic PanK has been exploited for the design of species-specific enzyme inhibitors. Three series of inhibitors were designed, the first targeting the substrate-binding site, the second targeting the pantothenate-binding site and an adjacent phenylalanine apolar pocket, and the third combining pantothenate-like high-affinity scaffolds with substrate-like substituents (39). The inhibitor series were screened against PanK from E. coli, Aspergillus nidulans, Staphylococcus aureus, and mouse. The third series was found to be very effective in the inhibition of PanK from S. aureus and mouse. PanK from M. tuberculosis in complex with a derivative of the feedback inhibitor CoA has been crystallized and its structure solved (40). This information can serve as a basis for the development of enzyme inhibitors.
By a classical screening procedure, a series of 5-tert-butyl-N-pyrazol-4-yl-4,5,6,7-tetrahydrobenzo[d]isoxazole-3-carboxamide derivatives were identified as novel potent M. tuberculosis pantothenate synthase inhibitors. This enzyme catalyzes amide bond formation of pantothenate from D-pantoate and beta-alanine, accompanied by hydrolysis of MgATP into AMP and Mg-PPi. The best inhibitors exhibit IC50 values lower than 100 nM (41). Ten analogs of the reaction intermediate pantoyl adenylate, obtained by replacing the phosphodiester bond with either an ester or a sulfamoyl moiety, were designed and tested. The latter modification was found to lead to more potent inhibitors (42). By automated high-throughput screening, 4,080 compounds were evaluated for their inhibitory action against PanK from M. tuberculosis, identifying a novel compound (43).
6. Pyridoxal 5′-Phosphate
Vitamin B6 is pyridoxine (Fig. 1i), which is transformed into pyridoxal 5′-phosphate (PLP) (Fig. 1j), its biologically active form, via a series of reactions. PLP is the coenzyme of enzymes catalyzing a wide range of chemical reactions on amino acids, amines, and ketoacids (44). PLP is well known for its catalytic versatility, which depends on the interaction with the protein matrix. PLP enzymes have been classified into different functional families, depending on the modified carbon atom, and into fold types. A representative example of the beta family and fold type II is O-acetylserine sulfhydrylase, which catalyzes the formation of L-cysteine (Fig. 8). Several PLP enzymes are validated targets for human diseases (45). Representative examples are dopa decarboxylase (DDC) (Fig. 9a) and GABA aminotransferase (GABA-AT) (Fig. 9b) (45). DDC catalyzes the PLP-dependent decarboxylation of L-DOPA to dopamine, as well as of L-5-hydroxytryptophan to serotonin, both products being potent neurotransmitters. DDC is irreversibly inactivated by either carbidopa (Fig. 9a) or benserazide, drugs used in Parkinson therapy. These compounds, or derivatives formed within the cell, react with PLP, leading to a stable, covalent complex. GABA-AT catalyzes the PLP-dependent transamination of GABA. GABA-AT is irreversibly inactivated by vigabatrin (Fig. 9b), a drug used in the therapy of seizures, via a complex reaction with the coenzyme to form a stable complex (45). Most inhibitors of PLP-dependent enzymes are analogs of the substrate, of catalytic intermediates, or of the transition state and exploit coenzyme reactivity. Recent examples are the irreversible inhibition of 8-amino-7-oxononanoate synthase (46), the inhibition by 1-amino-oxy-3-aminopropane of human and Leishmania donovani ornithine decarboxylase (47), the inhibition of Haemophilus influenzae (48) and M. tuberculosis O-acetylserine sulfhydrylase by the C-terminal peptide of serine acetyltransferase (49), the inhibition of N-acetylornithine aminotransferase from Salmonella typhimurium by gabaculine (50), and the inhibition of Escherichia coli L-aspartate aminotransferase by (S)-4-amino-4,5-dihydro-2-thiophenecarboxylic acid (51).
Fig. 8. Three-dimensional structure of the PLP-dependent O-acetylserine sulfhydrylase (105).
Fig. 9. Three-dimensional structures of (a) dopa decarboxylase with the drug carbidopa covalently bound to PLP (106) and (b) GABA aminotransferase with the drug vigabatrin covalently bound to PLP (107).
Only in a few cases have cofactor analogs been used to inhibit PLP-dependent enzymes. This is due to the relatively tight binding of PLP to the enzyme, as the coenzyme has several points of interaction with active site residues. However, PLP affinity can vary at different stages of catalysis, being higher at the level of the internal aldimine (Fig. 1l) and lower when the external aldimine or pyridoxamine 5′-phosphate (PMP) (Fig. 1k) is formed. As a consequence, PLP analogs should bind strongly in order to compete favorably with the coenzyme. In particular, to increase inhibitor affinity, PLP derivatives resembling an amino acid-PLP external aldimine were synthesized and tested (52). For example, in the search for drugs against tumor cells, N-(5′-phosphopyridoxyl)-ornithine and N-(4′-pyridoxyl)-ornithine were used to inhibit ornithine decarboxylase (53), and pyridoxyl-histidine methyl ester to inhibit histidine decarboxylase (54). In principle, pyridoxyl amino acids and other PLP derivatives can be used to treat bacteria, protozoans, and cancer cells, evaluating their efficacy in depressing growth. This may lead to the identification of novel target enzymes via a proteomic analysis. The treatment can be repeated with several pyridoxyl derivatives in order to identify the most potent against a set of targets. In some cases, the pyridoxyl derivatives might be proinhibitors because they are chemically modified by cellular enzymes, such as pyridoxine oxidase or pyridoxal kinase. Some examples of inhibition of isolated PLP-dependent enzymes by pyridoxyl analogs aimed at replacing the cofactor are reported below.
Serine dehydratase is a PLP-dependent enzyme that catalyzes the conversion of D-serine to pyruvate and ammonia. Spectral studies in which the natural cofactor was substituted with pyridoxal 5′-sulfate, pyridoxal 5-deoxymethylene phosphonate, and pyridoxal 5′-phosphate monomethyl ester were carried out to gain insight into the structural basis for the binding of cofactor and substrate analogs (55). From reconstitution experiments of D-serine apodehydratase with various PLP analogs, it was deduced that substitutions at positions 2′ and 6′ of the coenzyme are not critical for the catalytic activity of the dehydratase, whereas the phosphate group in the 5′-position of the cofactor, in its dianionic form, is essential. A similar approach using PLP analogs was applied in the investigation of O-acetylserine sulfhydrylase (56). A new cofactor analog, 6-Br-PLP, was synthesized by organic and enzymatic methods and demonstrated to bind to the catalytic site of resolved or reduced GABA-AT, restoring its catalytic activity (57). These results indicate that 6-Br-PLP remains covalently attached to the amino acid residue of the catalytic site and acts like the natural cofactor.
PLP is not only the cofactor of many enzymes but also a molecule that can easily react with amine residues. Pyridine analogs were found to inhibit glucosyltransferase from Streptococcus mutans (58). S. mutans is a group of oral bacteria that are involved in cariogenesis. The bacteria exhibit the ability to adhere to saliva-coated smooth surfaces. The adherence is an accumulation process in which bacteria bind to bacteria. This process depends on the presence of sucrose and is mediated by glucans (dextrans) produced from the disaccharide by glucosyltransferase(s) (commonly referred to as dextransucrase) from oral streptococci.
A significant reduction of dextransucrase activity might lead to a decrease in the incidence of caries. Several strategies have been employed to depress dextransucrase activities in animal models. PLP and several other pyridine derivatives inhibit S. mutans dextransucrase (55). Dextransucrase has specific sites for the binding of PLP and structurally related compounds. PLP or its analogs may have several advantages as agents to reduce the incidence of dental caries: (a) they inhibit glucan formation from sucrose, inactivating dextransucrase from S. mutans; (b) they prevent acid production, probably through inhibition of glycolysis carried out by S. mutans in the presence of sucrose, glucose, or fructose; (c) they markedly reduce the ability of S. mutans to transport D-glucose; (d) they prevent the sucrose-dependent adherence of S. mutans to saliva-coated hydroxylapatite; (e) the PLP precursor, pyridoxine, is a required nutrient for humans and is largely converted to PLP; (f) there is no clear evidence for toxicity of PLP. “Excess” pyridoxine or analogs are not absorbed and are thereby eliminated (57).
Other examples of enzymes covalently inhibited by reaction with PLP include rabbit muscle aldolase (59), bovine pancreatic RNase (60), bovine liver glutamate dehydrogenase (61), rabbit muscle phosphorylase (62), snake venom phospholipase A2 (63), and acetyl-CoA carboxylase (64). PLP has also been exploited as a novel weapon against autoimmunity and transplant rejection. Activation of CD4 T cells by antigen-presenting cells is required for the full expression of most autoimmune diseases and allogeneic transplant rejection. The extracellular part of the CD4 molecule is composed of four domains, D1-D4. CD4 binds to MHC class II via its D1 and D2 domains. The elegant studies by Salhany et al. (65) led to the discovery that PLP binds tightly to the D1 domain of the CD4 molecule. Therefore, it could interfere with proper CD4-MHC II interaction, preventing HIV entry into CD4+ cells, with potential implications for AIDS therapy. Considering that the D1 domain of CD4 plays a pivotal role in the CD4-MHC II interaction and that PLP binds tightly to this domain, Namazi (66) proposed that PLP can prevent the D1–β2 interaction. PLP might be a useful addition to the antiautoimmunity and antitransplant rejection armamentarium. PLP is a cheap compound and could be administered to humans intravenously in relatively high doses. By preventing CD4 from being properly incorporated into the activation complex, PLP may induce apoptosis of T cells and tolerance to autoantigen(s). Occupancy of D1 by PLP may prevent the proper protein-protein interactions of the CD4 molecule itself, which are considered important in T cell activation (67). There is also evidence that CD4 molecules dimerize during T cell activation (68). D1 occupancy by PLP may interfere with the dimerization process or with the interaction of this dimer with other molecules on the surface of T cells, such as CD45. This could lead to a failure to generate activation signals, and the responding CD4 T cells may then be anergic or driven to apoptosis. Thus, PLP may prove of special value in the treatment of patients affected by both autoimmunity and HIV, for whom the routine immunosuppressive treatments are contraindicated.
7. Thiamin Pyrophosphate
Thiamin in its biologically active form, thiamin pyrophosphate (TPP) (Fig. 1m), is the coenzyme of enzymes catalyzing the cleavage of the carbon-carbon bond adjacent to a carbonyl moiety, such as pyruvate dehydrogenase and transketolase (69, 70). The center of reactivity is the thiazolium ring. However, recent studies also suggest a role for the 4′-iminopyrimidine ring (71).
Singh and Mozzarelli
Moreover, TPP has recently been discovered to bind to bacterial riboswitches that regulate thiamin biosynthesis (72). The antimicrobial compound pyrithiamin, an isosteric pyridine analog of thiamin (73) that inhibits eukaryotic thiamin pyrophosphorylase, has been found to bind to the TPP riboswitch (74). Moreover, the TPP-sensing riboswitch is the only riboswitch class found so far in eukaryotes, and it is inhibited by TPP analogs (75). Indeed, it has been suggested that coenzymes can also act as coribozymes, reciprocally modulating their reactivity (76). The pentose phosphate pathway is very active in tumor cells for the production of ribose. Because TPP-dependent transketolase is a key enzyme in this pathway, several thiamin analogs have been designed and tested for their inhibitory action (77–81). Moreover, benfotiamine (S-benzoylthiamine O-monophosphate) is an amphiphilic S-acyl TPP analog that prevents the progression of diabetic complications (79). The structures of these TPP analogs, as well as of TPP-substrate analogs, may serve as the structural basis for a chemogenomics study aimed at identifying TPP-dependent enzyme targets.
8. Heme
Heme is the cofactor of several enzymes catalyzing redox reactions (Fig. 1n, o). Whereas in hemoglobin and myoglobin the iron coordinated to the four pyrrole rings remains reduced in order to bind oxygen, in most other heme-carrying proteins the iron changes its redox state to fulfill its function. The heme can be either covalently bound to the protein, as in cytochrome c, or interact with amino acid residues in a hydrophobic pocket of the protein, with a strength that depends on the protein and on the redox state. Heme can also act as a regulatory molecule, as in cystathionine beta-synthase (82). As for other cofactors, no chemogenomic approach has yet been applied to heme proteins. We report a few examples of enzyme inhibition using heme derivatives that target either enzymes acting on heme or enzymes that are affected by heme binding. An innovative immuno-chemo-proteomics method was developed that allowed the identification of an inhibitor of the heme-binding protein (83). This method is based on the coupling of a small molecule, bis-indolylmaleimide-III, a known inhibitor of protein kinase C-α, to the FLAG peptide epitope. This allows the isolation of binding proteins from a cell lysate via the reaction of the FLAG epitope with anti-FLAG antibody beads.
Bilirubin is a normal product of heme degradation, and the rate-limiting step in its formation is controlled by heme oxygenase. Heme oxygenase is also the enzyme responsible in vivo for the production of CO. Newborns produce bilirubin faster than they can dispose of it and thus experience a mild, transient jaundice after birth. Kappas (84) has developed a method for effectively controlling the production of bilirubin in newborns. The method involves the use of an inhibitor that targets heme oxygenase. Thus, physicians can rapidly and predictably interdict hyperbilirubinemia at any point in the progression of jaundice. The inhibitor, Sn-mesoporphyrin, is a structural analog of heme and blocks the site of heme oxygenase where heme conversion to bilirubin is initiated. Other compounds that have been tested for the inhibition of heme oxygenase are zinc protoporphyrin IX (85) and imidazole-dioxolane derivatives (86). Human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) is a major target for chemotherapeutic agents used in the treatment of AIDS. It has been shown that heme and synthetic heme analogs significantly inhibit HIV-1 RT activity in a noncompetitive manner with respect to deoxythymidine triphosphate and markedly enhance the inhibitory effect of AZT-TP on HIV-1 RT. Argyris et al. (87) demonstrated that metalloporphyrins are potent non-nucleoside inhibitors of HIV-1 RT and of viral replication. It was shown that metalloporphyrins, in combination with the well-characterized non-nucleoside inhibitor (NNI) BHAP, inhibit the enzyme in a noncompetitive and nonexclusive manner. Moreover, metalloporphyrins enhance the inhibitory effect of BHAP by yielding an additive effect, indicating that heme and its analogs inhibit HIV-1 RT by binding to a site distinct from the common binding site of NNIs.
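The kinetic terms used above (noncompetitive, mutually nonexclusive) can be made concrete with a minimal sketch. It assumes textbook steady-state models only: pure noncompetitive inhibition of a Michaelis-Menten enzyme, and two inhibitors that are either mutually exclusive or bind independently at distinct sites (interaction constant taken as 1). The function names and all numerical values are hypothetical illustrations, not parameters reported for HIV-1 RT, metalloporphyrins, or BHAP.

```python
def rate_noncompetitive(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Michaelis-Menten rate with a pure noncompetitive inhibitor.

    The inhibitor lowers the apparent Vmax by (1 + i/ki) and leaves Km unchanged.
    All parameter values are hypothetical and in arbitrary units.
    """
    return vmax * s / ((km + s) * (1.0 + i / ki))


def fractional_activity(i, j, ki, kj, exclusive):
    """Remaining activity (v / v0) with two noncompetitive inhibitors I and J.

    exclusive=True : I and J compete for one site (only EI or EJ can form).
    exclusive=False: I and J bind independently at distinct sites, so the
                     ternary complex EIJ can also form (assumed alpha = 1).
    """
    if exclusive:
        return 1.0 / (1.0 + i / ki + j / kj)
    return 1.0 / ((1.0 + i / ki) * (1.0 + j / kj))


if __name__ == "__main__":
    # Same inhibitor pair, both at a concentration equal to their Ki.
    print("exclusive   :", fractional_activity(1.0, 1.0, 1.0, 1.0, True))
    print("nonexclusive:", fractional_activity(1.0, 1.0, 1.0, 1.0, False))
```

Under these assumptions, a nonexclusive pair leaves less residual activity than an exclusive pair at the same concentrations, which is the kind of signature used to argue for distinct binding sites.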
9. S-Adenosyl Methionine
S-adenosyl methionine (SAM) (Fig. 1p) is the main donor of activated methyl groups to biological compounds, including metabolites and the N-, C-, and S-nucleophiles contained in DNA, RNA, and proteins (88). SAM is the cofactor with the second-highest cellular concentration, after ATP. The SAM-dependent enzymes represent about 3% of cellular proteins (89). The catalyzed enzyme reaction is:
S-adenosyl methionine + substrate ⇒ methylated substrate + S-adenosyl homocysteine
Moreover, SAM has been found to act as a regulatory molecule in some enzymes, such as cystathionine beta-synthase (90). Recently, two classes of SAM analogs have been used to covalently label specific DNA sequences (91). Other SAM analogs have been developed over the years to act as inhibitors of SAM-dependent methyltransferases (92, 93), including cyclopropane fatty acid synthase (94). SAM synthetase was identified as a potential target for the development of new antiparasitic drugs via a combined bioinformatics and chemoinformatics strategy (95). It should also be mentioned that SAM is the selective ligand of bacterial riboswitches, activating genes involved in coenzyme recycling (96). A chemogenomic approach using SAM analogs is expected to unveil several target proteins, possibly essential in bacteria, parasites, or tumor cells. Because SAM does not bind very tightly to proteins, it might be feasible to develop a library of SAM analogs that bind and inactivate SAM-dependent or SAM-regulated enzymes.
10. Lipoamide, Biotin, and Cobalamin
Lipoamide (Fig. 1q) is the cofactor of enzymes involved in energy metabolism, such as pyruvate dehydrogenase and ketoglutarate dehydrogenase, and in amino acid metabolism, such as branched-chain ketoacid dehydrogenase. The cofactor is linked to the enzyme via a stable covalent bond (Fig. 1r). During the catalytic reaction, lipoamide is reduced. The reduced cofactor is reoxidized by a specific FAD-dependent lipoamide dehydrogenase that utilizes NAD+ as the oxidizing substrate. Biotin (Fig. 1s) is the cofactor of enzymes that add bicarbonate to metabolites, such as pyruvate carboxylase and acetyl-CoA carboxylase. Like lipoamide, the cofactor is linked to the enzyme via a stable covalent bond (Fig. 1t). Because biotinylation is a post-translational modification that takes place on histones via the action of biotinidase, using biotin as a substrate, several biotin analogs were synthesized and tested for their ability to inhibit human biotinidase. Biotinyl-methyl 4-(amidomethyl)benzoate was demonstrated to be the most potent competitive inhibitor (97). 5′-Deoxyadenosylcobalamin (Fig. 1u) is the cofactor of methylmalonyl-CoA mutase, an enzyme catalyzing the conversion of methylmalonyl-CoA to succinyl-CoA, and methylcobalamin is the cofactor of 5-methyltetrahydrofolate-homocysteine methyltransferase (methionine synthase), an enzyme catalyzing the conversion of homocysteine to methionine. Both reactions take place via formation of radical species.
The chemogenomics approach has not yet been applied to the identification of lipoamide-, biotin-, and cobalamin-dependent enzymes. Given the stable, covalent bond that lipoamide and biotin form with their enzymes, only the replacement with inactive cofactor analogs during protein folding can lead to the identification of enzymes associated with loss of function.
Acknowledgments
This work was supported by a COFIN 2007–2008 grant from the Italian Ministry of University and Research to A.M.
References
1. Bredel, M., and Jacoby, E. (2004) Chemogenomics: An emerging strategy for rapid target and drug discovery. Nat. Rev. Genet. 5, 262–275.
2. Caron, P.R., Mullican, M.D., Mashal, R.D., Michael, S.S., and Murcko, M.A. (2001) Chemogenomic approaches to drug discovery. Curr. Opin. Chem. Biol. 5, 464–470.
3. Hall, S.E. (2006) Chemoproteomics-driven drug discovery: Addressing high attrition rates. Drug Discov. Today 11, 495–502.
4. Stahura, F.L., Xue, L., Godden, J.W., and Bajorath, J. (1999) Molecular scaffold-based design and comparison of combinatorial libraries focused on the ATP-binding site of protein kinases. J. Mol. Graph. Model. 17, 1–9.
5. Jacoby, E. (2001) A novel chemogenomics knowledge-based ligand design strategy – application to G protein-coupled receptors. Quant. Struct.-Act. Relat. 20, 115–123.
6. Bleicher, K.H. (2002) Chemogenomics: Bridging a drug discovery gap. Curr. Med. Chem. 9, 2077–2084.
7. Sem, D.S., Bertolaet, B., Baker, B., et al. (2004) Systems-based design of bi-ligand inhibitors of oxidoreductases: Filling the chemical proteomic toolbox. Chem. Biol. 11, 185–194.
8. Lesk, A.M. (1995) NAD-binding domains of dehydrogenases. Curr. Opin. Struct. Biol. 5, 775–783.
9. Carugo, O., and Argos, P. (1997) NADP-dependent enzymes. II: Evolution of the mono- and dinucleotide binding domains. Proteins 28, 29–40.
10. Kho, R.B.B., Newman, J.V., Jack, R.M., Sem, D.S., Villar, H.O., and Hansen, M.R. (2003) A path from primary protein sequence to ligand recognition. Proteins 50, 589–599.
11. Sem, D.S., Yu, L., Coutts, S.M., and Jack, R. (2001) Object-oriented approach to drug design enabled by NMR SOLVE: First real-time structural tool for characterizing protein-ligand interactions. J. Cell. Biochem. Suppl. 37, 99–105.
12. Ge, X., Wakim, B., and Sem, D.S. (2008) Chemical proteomics-based drug design: Target and antitarget fishing with a catechol-rhodanine privileged scaffold for NAD(P)(H) binding proteins. J. Med. Chem. 15, 4571–4580.
13. Cirillo, J.D., Weisbrod, T.R., Banerjee, A., Bloom, B.R., and Jacobs, W.R. Jr. (1994) Genetic determination of the meso-diaminopimelate biosynthetic pathway of mycobacteria. J. Bacteriol. 176, 4424–4429.
14. Bladon, P. (1989) A rapid method for comparing and matching the spherical parameter surfaces of molecules and other irregular objects. J. Mol. Graph. 7, 130–137.
15. Pellecchia, M., Meininger, D., Dong, Q., Chang, E., Jack, R., and Sem, D.S. (2002) NMR-based structural characterization of large protein-ligand interactions. J. Biomol. NMR 22, 165–173.
16. Graves, P.R., Kwiek, J.J., Fadden, P., et al. (2002) Discovery of novel targets of quinoline drugs in the human purine binding proteome. Mol. Pharmacol. 62, 1364–1372.
17. Hall, S.E. (2006) Chemoproteomics-driven drug discovery: Addressing high attrition rates. Drug Discov. Today 11, 495–502.
18. Read, J.A., Wilkinson, K.W., Tranter, R., Sessions, R.B., and Brady, R.L. (1999) Chloroquine binds in the cofactor binding site of Plasmodium falciparum lactate dehydrogenase. J. Biol. Chem. 274, 10213–10218.
19. Menting, J.G., Tilley, L., Deady, L.W., et al. (1997) The antimalarial drug, chloroquine, interacts with lactate dehydrogenase from Plasmodium falciparum. Mol. Biochem. Parasitol. 88, 215–224.
20. Wei, C.J., Lei, B., Musser, J.M., and Tu, S.C. (2003) Isoniazid activation defects in recombinant Mycobacterium tuberculosis catalase-peroxidase (KatG) mutants evident in InhA inhibitor production. Antimicrob. Agents Chemother. 47, 670–675.
21. Bonnac, L., Gao, G.Y., Chen, L., et al. (2007) Synthesis of 4-phenoxybenzamide adenine dinucleotide as NAD analogue with inhibitory activity against enoyl-ACP reductase (InhA) of Mycobacterium tuberculosis. Bioorg. Med. Chem. Lett. 17, 4588–4591.
22. Aronov, A.M., Suresh, S., Buckner, F.S., et al. (1999) Structure-based design of submicromolar, biologically active inhibitors of trypanosomatid glyceraldehyde-3-phosphate dehydrogenase. Proc. Natl. Acad. Sci. U.S.A. 96, 4273–4278.
23. Chen, L., Petrelli, R., Felczak, K., et al. (2008) Nicotinamide adenine dinucleotide based therapeutics. Curr. Med. Chem. 15, 650–670.
24. Morgunova, E., Meining, W., Illarionov, B., et al. (2005) Crystal structure of lumazine synthase from Mycobacterium tuberculosis as a target for rational drug design: Binding mode of a new class of purinetrione inhibitors. Biochemistry 44, 2746–2758.
25. Cole, S.T., Eiglmeier, K., Parkhill, J., et al. (2001) Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011.
26. Morgunova, E., Saller, S., Haase, I., et al. (2007) Lumazine synthase from Candida albicans as an anti-fungal target enzyme: Structural and biochemical basis for drug design. J. Biol. Chem. 282, 17231–17241.
27. Klages, J., Coles, M., and Kessler, H. (2007) NMR-based screening: A powerful tool in fragment-based drug discovery. Analyst 132, 693–705.
28. Zhou, B.P., Wu, B., Kwan, S.W., and Abell, C.W. (1998) Characterization of a highly conserved FAD-binding site in human monoamine oxidase B. J. Biol. Chem. 273, 14862–14868.
29. Abell, C.W., Kwan, S.-W., Zhou, B., Mamiya, B.M., and Lewis, D.A. (1998) Flavin adenine dinucleotide analog inhibitors of monoamine oxidase. U.S. 25 pp. Cont.-in-part of U.S. Ser. No. 365,782, abandoned. CODEN: USXXAM US 5756479 A 19980526 CAN 129:36465 AN 1998:331562.
30. Assaraf, Y.G. (2007) Molecular basis of antifolate resistance. Cancer Metastasis Rev. 26, 153–181.
31. Renwick, S.B., Snell, K., and Baumann, U. (1998) The crystal structure of human cytosolic serine hydroxymethyltransferase: A target for cancer chemotherapy. Structure 6, 1105–1116.
32. Gmeiner, W.H. (2005) Novel chemical strategies for thymidylate synthase inhibition. Curr. Med. Chem. 12, 191–202.
33. Farber, S., Diamond, L.K., Mercer, R.D., Sylvester, R.F., and Wolff, V.A. (1948) Temporary remission in leukemia in children produced by folic antagonist 4-aminopteroylglutamic acid (aminopterin). N. Engl. J. Med. 238, 787–793.
34. Anderson, A.C. (2003) The process of structure-based drug design. Chem. Biol. 10, 787–797.
35. Varney, M.D., Marzoni, G.P., Palmer, C.L., et al. (1992) Crystal-structure-based design and synthesis of benz[cd]indole-containing inhibitors of thymidylate synthase. J. Med. Chem. 35, 663–676.
36. Mdluli, K., and Spigelman, M. (2006) Novel targets for tuberculosis drug discovery. Curr. Opin. Pharmacol. 6, 459–467.
37. Huovinen, P., Sundstrom, L., Swedberg, G., and Skold, O. (1995) Trimethoprim and sulfonamide resistance. Antimicrob. Agents Chemother. 39, 279–289.
38. Spry, C., Kirk, K., and Saliba, K.J. (2008) Coenzyme A biosynthesis: An antimicrobial drug target. FEMS Microbiol. Rev. 32, 56–106.
39. Virga, K.G., Zhang, Y.M., Leonardi, R., et al. (2006) Structure-activity relationships and enzyme inhibition of pantothenamide-type pantothenate kinase inhibitors. Bioorg. Med. Chem. 14, 1007–1020.
40. Das, S., Kumar, P., Bhor, V., Surolia, A., and Vijayan, M. (2006) Invariance and variability in bacterial PanK: A study based on the crystal structure of Mycobacterium tuberculosis PanK. Acta Crystallogr. D Biol. Crystallogr. 62, 628–638.
41. Velaparthi, S., Brunsteiner, M., Uddin, R., Wan, B., Franzblau, S.G., and Petukhov, P.A. (2008) 5-tert-butyl-N-pyrazol-4-yl-4,5,6,7-tetrahydrobenzo[d]isoxazole-3-carboxamide derivatives as novel potent inhibitors of Mycobacterium tuberculosis pantothenate synthetase: Initiating a quest for new antitubercular drugs. J. Med. Chem. 51, 1999–2002.
42. Tuck, K.L., Saldanha, S.A., Birch, L.M., Smith, A.G., and Abell, C. (2006) The design and synthesis of inhibitors of pantothenate synthetase. Org. Biomol. Chem. 4, 3598–3610.
43. White, E.L., Southworth, K., Ross, L., et al. (2007) A novel inhibitor of Mycobacterium tuberculosis pantothenate synthetase. J. Biomol. Screen. 12, 100–105.
44. John, R.A. (1995) Pyridoxal phosphate-dependent enzymes. Biochim. Biophys. Acta 1248, 81–96.
45. Amadasi, A., Bertoldi, M., Contestabile, R., et al. (2007) Pyridoxal 5′-phosphate enzymes as targets for therapeutic agents. Curr. Med. Chem. 14, 1291–1324.
46. Alexeev, D., Baxter, R.L., Campopiano, D.J., et al. (2006) Suicide inhibition of alpha-oxamine synthases: Structures of the covalent adducts of 8-amino-7-oxononanoate synthase with trifluoroalanine. Org. Biomol. Chem. 4, 1209–1212.
47. Dufe, V.T., Ingner, D., Heby, O., Khomutov, A.R., Persson, L., and Al-Karadaghi, S. (2007) A structural insight into the inhibition of human and Leishmania donovani ornithine decarboxylases by 1-amino-oxy-3-aminopropane. Biochem. J. 405, 261–268.
48. Huang, B., Vetting, M.W., and Roderick, S.L. (2005) The active site of O-acetylserine sulfhydrylase is the anchor point for bienzyme complex formation with serine acetyltransferase. J. Bacteriol. 187, 3201–3205.
49. Schnell, R., Oehlmann, W., Singh, M., and Schneider, G. (2007) Structural insights into catalysis and inhibition of O-acetylserine sulfhydrylase from Mycobacterium tuberculosis. Crystal structures of the enzyme alpha-aminoacrylate intermediate and an enzyme-inhibitor complex. J. Biol. Chem. 282, 23473–23481.
50. Rajaram, V., Ratna Prasuna, P., Savithri, H.S., and Murthy, M.R. (2008) Structure of biosynthetic N-acetylornithine aminotransferase from Salmonella typhimurium: Studies on substrate specificity and inhibitor binding. Proteins 70, 429–441.
51. Liu, D., Pozharski, E., Lepore, B.W., et al. (2007) Inactivation of Escherichia coli L-aspartate aminotransferase by (S)-4-amino-4,5-dihydro-2-thiophenecarboxylic acid reveals “a tale of two mechanisms”. Biochemistry 46, 10517–10527.
52. Heller, J.S., Canellakis, E.S., Bussolotti, D.L., and Coward, J.K. (1975) Stable multisubstrate adducts as enzyme inhibitors. Potent inhibition of ornithine decarboxylase by N-(5′-phosphopyridoxyl)-ornithine. Biochim. Biophys. Acta 403, 197–207.
53. Wu, F., Grossenbacher, D., and Gehring, H. (2007) New transition state-based inhibitor for human ornithine decarboxylase inhibits growth of tumor cells. Mol. Cancer Ther. 6, 1831–1839.
54. Wu, F., Yu, J., and Gehring, H. (2008) Inhibitory and structural studies of novel coenzyme-substrate analogs of human histidine decarboxylase. FASEB J. 22, 890–897.
55. Schnackerz, K.D., Tai, C.H., Potsch, R.K., and Cook, P.F. (1999) Substitution of pyridoxal 5′-phosphate in D-serine dehydratase from Escherichia coli by cofactor analogues provides information on cofactor binding and catalysis. J. Biol. Chem. 274, 36935–36943.
56. Cook, P.F., Tai, C.H., Hwang, C.C., Woehl, E.U., Dunn, M.F., and Schnackerz, K.D. (1996) Substitution of pyridoxal 5′-phosphate in the O-acetylserine sulfhydrylase from Salmonella typhimurium by cofactor analogs provides a test of the mechanism proposed for formation of the alpha-aminoacrylate intermediate. J. Biol. Chem. 271, 25842–25849.
57. Choi, S.Y., Wee, S., and Kim, D.S. (1989) 6-Br-pyridoxal 5-phosphate a new cofactor analog of GABA transaminase. Han’guk Saenghwa Hakhoechi 22, 227–232.
58. Thaniyavarn, S., Taylor, K.G., Singh, S., and Doyle, R.J. (1982) Pyridine analogs inhibit the glucosyltransferase of Streptococcus mutans. Infect. Immun. 37, 1101–1111.
59. Anai, M., Lai, C.Y., and Horecker, B.L. (1973) The pyridoxal phosphate-binding site of rabbit muscle aldolase. Arch. Biochem. Biophys. 156, 712–719.
60. Raetz, C.R., and Auld, D.S. (1972) Schiff bases of pyridoxal phosphate with active center lysines of ribonuclease A. Biochemistry 11, 2229–2236.
61. Chen, S.S., and Engel, P.C. (1975) The equilibrium position of the reaction of bovine liver glutamate dehydrogenase with pyridoxal 5′-phosphate. A demonstration that covalent modification with this reagent completely abolishes catalytic activity. Biochem. J. 147, 351–358.
62. Forrey, A.W., Sevilla, C.L., Saari, J.C., and Fischer, E.H. (1971) Sequence of a segment of muscle glycogen phosphorylase containing the pyridoxal 5′-phosphate binding site. Biochemistry 10, 3132–3140.
63. Viljoen, C.C., Visser, L., and Botes, D.P. (1977) Histidine and lysine residues and the activity of phospholipase A2 from the venom of Bitis gabonica. Biochim. Biophys. Acta 483, 107–120.
64. Lee, W.M., Elliott, J.E., and Brownsey, R.W. (2005) Inhibition of acetyl-CoA carboxylase isoforms by pyridoxal phosphate. J. Biol. Chem. 280, 41835–41843.
65. Salhany, J.M., and Schopfer, L.M. (1993) Pyridoxal 5′-phosphate binds specifically to soluble CD4 protein, the HIV-1 receptor. Implications for AIDS therapy. J. Biol. Chem. 268, 7643–7645.
66. Namazi, M.R. (2003) Pyridoxal 5′-phosphate as a novel weapon against autoimmunity and transplant rejection. FASEB J. 17, 2184–2186.
67. Zerbib, A.C., Reske-Kunz, A.B., Lock, P., and Sekaly, R.P. (1994) CD4-mediated enhancement or inhibition of T cell activation does not require the CD4:p56lck association. J. Exp. Med. 179, 1973–1983.
68. Marini, J.C., Jameson, B.A., Lublin, F.D., and Korngold, R. (1996) A CD4-CDR3 peptide analog inhibits both primary and secondary autoreactive CD4+ T cell responses in experimental allergic encephalomyelitis. J. Immunol. 157, 3706–3715.
69. Jordan, F. (2004) Biochemistry. How active sites communicate in thiamine enzymes. Science 306, 818–820.
70. Frank, R.A., Titman, C.M., Pratap, J.V., Luisi, B.F., and Perham, R.N. (2004) A molecular switch and proton wire synchronize the active sites in thiamine enzymes. Science 306, 872–876.
71. Nemeria, N., Chakraborty, S., Baykal, A., Korotchkina, L.G., Patel, M.S., and Jordan, F. (2007) The 1′,4′-iminopyrimidine tautomer of thiamin diphosphate is poised for catalysis in asymmetric active centers on enzymes. Proc. Natl. Acad. Sci. U.S.A. 104, 78–82.
72. Cochrane, J.C., and Strobel, S.A. (2008) Riboswitch effectors as protein enzyme cofactors. RNA 14, 993–1002.
73. Heinrich, P.C., Steffen, H., Janser, P., and Wiss, O. (1972) Studies on the reconstitution of apotransketolase with thiamine pyrophosphate and analogs of the coenzyme. Eur. J. Biochem. 30, 533–541.
74. Sudarsan, N., Cohen-Chalamish, S., Nakamura, S., Emilsson, G.M., and Breaker, R.R. (2005) Thiamine pyrophosphate riboswitches are targets for the antimicrobial compound pyrithiamine. Chem. Biol. 12, 1325–1335.
75. Thore, S., Frick, C., and Ban, N. (2008) Structural basis of thiamine pyrophosphate analogues binding to the eukaryotic riboswitch. J. Am. Chem. Soc. 130, 8116–8117.
76. Jadhav, V.R., and Yarus, M. (2002) Coenzymes as coribozymes. Biochimie 84, 877–888.
77. Arjunan, P., Chandrasekhar, K., Sax, M., et al. (2004) Structural determinants of enzyme binding affinity: The E1 component of pyruvate dehydrogenase from Escherichia coli in complex with the inhibitor thiamin thiazolone diphosphate. Biochemistry 43, 2405–2411.
78. Thomas, A.A., Le Huerou, Y., De Meese, J., et al. (2008) Synthesis, in vitro and in vivo activity of thiamine antagonist transketolase inhibitors. Bioorg. Med. Chem. Lett. 18, 2206–2210.
79. Volvert, M.L., Seyen, S., Piette, M., et al. (2008) Benfotiamine, a synthetic S-acyl thiamine derivative, has different mechanisms of action and a different pharmacological profile than lipid-soluble thiamine disulfide derivatives. BMC Pharmacol. 8, 10.
80. Strumilo, S., Czygier, M., and Markiewicz, J. (1996) Different extent of inhibition of pyruvate dehydrogenase and 2-oxoglutarate dehydrogenase both containing endogenous thiamine pyrophosphate, by some anticoenzyme analogues. J. Enzyme Inhib. 10, 65–72.
81. Le Huerou, Y., Gunawardana, I., Thomas, A.A., et al. (2008) Prodrug thiamine analogs as inhibitors of the enzyme transketolase. Bioorg. Med. Chem. Lett. 18, 505–508.
82. Kery, V., Bukovska, G., and Kraus, J.P. (1994) Transsulfuration depends on heme in addition to pyridoxal 5′-phosphate. Cystathionine beta-synthase is a heme protein. J. Biol. Chem. 269, 25283–25288.
83. Saxena, C., Zhen, E., Higgs, R.E., and Hale, J.E. (2008) An immuno-chemo-proteomics method for drug target deconvolution. J. Proteome Res. 7, 3490–3497.
84. Kappas, A. (2004) A method for interdicting the development of severe jaundice in newborns by inhibiting the production of bilirubin. Pediatrics 113, 119–123.
85. Tozer, G.M., Prise, V.E., Motterlini, R., Poole, B.A., Wilson, J., and Chaplin, D.J. (1998) The comparative effects of the NOS inhibitor, Nomega-nitro-L-arginine, and the haemoxygenase inhibitor, zinc protoporphyrin IX, on tumour blood flow. Int. J. Radiat. Oncol. Biol. Phys. 42, 849–853.
86. Kinobe, R.T., Ji, Y., Vlahakis, J.Z., et al. (2007) Effectiveness of novel imidazole-dioxolane heme oxygenase inhibitors in renal proximal tubule epithelial cells. J. Pharmacol. Exp. Ther. 323, 763–770.
87. Argyris, E.G., Vanderkooi, J.M., Venkateswaran, P.S., Kay, B.K., and Paterson, Y. (1999) The connection domain is implicated in metalloporphyrin binding and inhibition of HIV reverse transcriptase. J. Biol. Chem. 274, 1549–1556.
88. Fontecave, M., Atta, M., and Mulliez, E. (2004) S-adenosylmethionine: Nothing goes to waste. Trends Biochem. Sci. 29, 243–249.
89. Kagan, R.M., and Clarke, S. (1994) Widespread occurrence of three sequence motifs in diverse S-adenosylmethionine-dependent methyltransferases suggests a common structure for these enzymes. Arch. Biochem. Biophys. 310, 417–427.
90. Janosik, M., Kery, V., Gaustadnes, M., Maclean, K.N., and Kraus, J.P. (2001) Regulation of human cystathionine beta-synthase by S-adenosyl-L-methionine: Evidence for two catalytically active conformations involving an autoinhibitory domain in the C-terminal region. Biochemistry 40, 10625–10633.
91. Klimasauskas, S., and Weinhold, E. (2007) A new tool for biotechnology: AdoMet-dependent methyltransferases. Trends Biotechnol. 25, 99–104.
92. Borchardt, R.T., Wu, Y.S., and Wu, B.S. (1978) Potential inhibitors of S-adenosylmethionine-dependent methyltransferases. 7. Role of the ribosyl moiety in enzymatic binding of S-adenosyl-L-homocysteine and S-adenosyl-L-methionine. J. Med. Chem. 21, 1307–1310.
93. Thompson, M.J.M., Hornby, D.P., and Blackburn, G.M. (1999) Synthesis of two stable nitrogen analogues of S-adenosyl-L-methionine. J. Org. Chem. 64, 7467–7473.
94. Guerard, C., Breard, M., Courtois, F., Drujon, T., and Ploux, O. (2004) Synthesis and evaluation of analogues of S-adenosyl-L-methionine, as inhibitors of the E. coli cyclopropane fatty acid synthase. Bioorg. Med. Chem. Lett. 14, 1661–1664.
95. Krasky, A., Rohwer, A., Schroeder, J., and Selzer, P.M. (2007) A combined bioinformatics and chemoinformatics approach for the development of new antiparasitic drugs. Genomics 89, 36–43.
96. Wang, J.X., Lee, E.R., Morales, D.R., Lim, J., and Breaker, R.R. (2008) Riboswitches that sense S-adenosylhomocysteine and activate genes involved in coenzyme recycling. Mol. Cell 29, 691–702.
97. Kobza, K.A., Chaiseeda, K., Sarath, G., Takacs, J.M., and Zempleni, J. (2008) Biotinyl-methyl 4-(amidomethyl)benzoate is a competitive inhibitor of human biotinidase. J. Nutr. Biochem. 19, 826–832.
98. Coquelle, N., Fioravanti, E., Weik, M., Vellieux, F., and Madern, D. (2007) Activity, stability and structural studies of lactate dehydrogenases adapted to extreme thermal environments. J. Mol. Biol. 374, 547–562.
99. Harman, C.A., Turman, M.V., Kozak, K.R., Marnett, L.J., Smith, W.L., and Garavito, R.M. (2007) Structural basis of enantioselective inhibition of cyclooxygenase-1 by S-alpha-substituted indomethacin ethanolamides. J. Biol. Chem. 282, 28096–28105.
100. Cirilli, M., Zheng, R., Scapin, G., and Blanchard, J.S. (2003) The three-dimensional structures of the Mycobacterium tuberculosis dihydrodipicolinate reductase-NADH-2,6-PDC and -NADPH-2,6-PDC complexes. Structural and mutagenic analysis of relaxed nucleotide specificity. Biochemistry 42, 10644–10650.
101. Cameron, A., Read, J., Tranter, R., et al. (2004) Identification and activity of a series of azole-based compounds with lactate dehydrogenase-directed anti-malarial activity. J. Biol. Chem. 279, 31429–31439.
102. Trivedi, V., Gupta, A., Jala, V.R., et al. (2002) Crystal structure of binary and ternary complexes of serine hydroxymethyltransferase from Bacillus stearothermophilus: Insights into the catalytic mechanism. J. Biol. Chem. 277, 17161–17169.
103. Perry, K.M., Carreras, C.W., Chang, L.C., Santi, D.V., and Stroud, R.M. (1993) Structures of thymidylate synthase with a C-terminal deletion: Role of the C-terminus in alignment of 2′-deoxyuridine 5′-monophosphate and 5,10-methylenetetrahydrofolate. Biochemistry 32, 7116–7125.
104. Sawaya, M.R., and Kraut, J. (1997) Loop and subdomain movements in the mechanism of Escherichia coli dihydrofolate reductase: Crystallographic evidence. Biochemistry 36, 586–603.
105. Burkhard, P., Rao, G.S., Hohenester, E., Schnackerz, K.D., Cook, P.F., and Jansonius, J.N. (1998) Three-dimensional structure of O-acetylserine sulfhydrylase from Salmonella typhimurium. J. Mol. Biol. 283, 121–133.
106. Burkhard, P., Dominici, P., Borri-Voltattorni, C., Jansonius, J.N., and Malashkevich, V.N. (2001) Structural insight into Parkinson’s disease treatment from drug-inhibited DOPA decarboxylase. Nat. Struct. Biol. 8, 963–967.
107. Storici, P., De Biase, D., Bossa, F., et al. (2004) Structures of gamma-aminobutyric acid (GABA) aminotransferase, a pyridoxal 5′-phosphate, and [2Fe-2S] cluster-containing enzyme, complexed with gamma-ethynyl-GABA and with the antiepilepsy drug vigabatrin. J. Biol. Chem. 279, 363–373.
Chapter 5
Chemogenomics with Protein Secondary-Structure Mimetics
Garland R. Marshall, Daniel J. Kuster, and Ye Che
Summary
During molecular recognition of proteins in biological systems, helices, reverse turns, and β-sheets are dominant motifs. Often there are therapeutic reasons for blocking such recognition sites, and significant progress has been made by medicinal chemists in the design and synthesis of semirigid molecular scaffolds on which to display amino acid side chains. The basic premise is that preorganization of the competing ligand enhances the binding affinity and potential selectivity of the inhibitor. In this chapter, current progress in these efforts is reviewed.
Key words: Peptidomimetic, α-helix, β-sheet, β-turn, Reverse turn, Privileged scaffold, Recognition motif, Protein/protein interactions
1. Introduction
1.1. Molecular Recognition
Specificity and dynamics, the sine quibus non of biological systems, are determined by the relative affinities of interacting molecular surfaces and their rates of association. One guiding principle has emerged: biological systems optimize rates of processes and not affinities. Elucidating the functional significance of the protein interactions that control dynamic biological processes requires development of a second generation of chemical probes. The normal genetics approach to ascertaining the functional role of a protein is to eliminate or modulate its production in the cell. As most intracellular proteins serve as nodes in a complex and dynamic network of protein–protein interactions, elimination or modulation of an individual node can manifest itself in a wide variety of phenotypes. Much more insight comes from the ability to specifically
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575 DOI 10.1007/978-1-60761-274-2_5, © Humana Press, a part of Springer Science + Business Media, LLC 2009
sever a particular protein-protein connection in the network with a reversible molecular scalpel and observe any time-dependent perturbations that result in the biological system, versus complete knockout of the functional node. Data from such experiments provide insight into the feedback loops, the magnitude of their interactions, and their dynamic responses, all of which are required for the development of a systems biology model of the complex network under study.
1.2. Premise of Preorganization
A basic assumption underlying much of the research in drug design and protein engineering is the concept that preorganization will improve the affinity/stability of the analog compared with the parent peptide/protein. Preorganization as a guiding principle in synthetic chemistry of complexes was enunciated by Charles Pedersen (1) in his studies on crown ethers. This principle dominated the research of Donald J. Cram and Jean-Marie Lehn on supramolecular hosts/guests as outlined in their Nobel Lectures of 1987 (2, 3). This premise is based on assumed differences in free energy between the complex with the parent versus that of the preorganized analog, where the subscripts a and p denote the preorganized analog and the parent, respectively:
$$\Delta\Delta G = (\Delta H_a - T\Delta S_a) - (\Delta H_p - T\Delta S_p)$$
It is assumed that $\Delta H_a$ and $\Delta H_p$ are approximately the same if the surface interactions of the preorganized analog skillfully mimic those of the parent within the binding site (4). Therefore,
$$\Delta\Delta G \approx -T\Delta S_a + T\Delta S_p$$
The preorganized analog has already reduced its entropy of binding by virtue of chemical incorporation of constraints, i.e., for unfavorable (negative) conformational entropies of binding it loses less entropy than the parent, $|T\Delta S_a| < |T\Delta S_p|$, so that $\Delta\Delta G < 0$. Thus, the preorganized analog must have a higher affinity for the receptor (Fig. 1). While this logic is clear and numerous
Fig. 1. Theoretical behavior of conformational entropy in the case of binding a flexible ligand (bottom) versus a corresponding constrained analog (top) to the same molecular-recognition site.
Chemogenomics with Protein Secondary-Structure Mimetics
examples from the literature support this interpretation, there are examples where this paradigm has been challenged with experimental thermodynamic data. Cases of entropy–enthalpy compensation probably arise when the assumption that $\Delta H_a = \Delta H_p$ is not justified. For example, the estimated change in binding entropy (expressed as $T\Delta S$) upon constraining a single rotatable bond varies between 1.6 and 4.5 kJ/mol (5–9); such large changes in affinities upon introduction of conformational constraints in an analog are rather rare in the medicinal chemistry literature. One possible example is the cyclic α-MSH analog of Sawyer et al. (10); unfortunately, no measurements of changes in the enthalpy and entropy of binding were made upon introduction of the cyclic constraint. It is, therefore, impossible to attribute the 10,000-fold increase in affinity to changes in binding entropy alone. Advances in molecular simulations combined with adequate computing resources have led to methods for calculation of binding affinities of ligands to their protein receptors. Such methods can also yield estimates of binding entropies and enthalpies. While these approaches continue to advance, detailed discussion is far beyond the scope of this review; the reader is referred to papers from the groups of Gilson (11, 12), Jorgensen (13, 14), Roux (15), and others (16, 17) for more complete analyses of the problems addressed and approximations used. Freire and his colleagues have utilized isothermal titration calorimetry (ITC) to characterize the thermodynamics of ligand binding and resolve the enthalpic and entropic components of affinity (18–23). Their ITC studies on inhibitors of HIV-1 protease, used to guide optimization of affinity and to avoid resistance, were very insightful (24–30). ITC allows direct measurement of the changes in entropies and enthalpies of binding of ligands and their constrained analogs to protein recognition sites.
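The magnitudes quoted above can be put on a common scale with the standard relation between a free-energy difference and a ratio of binding constants. The following Python sketch is illustrative only: it assumes a temperature of 298.15 K, treats the entire free-energy difference as arising from the constraint, and uses function names invented here.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def fold_change_in_affinity(ddg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Return K(analog)/K(parent) for a binding free-energy difference ddG in kJ/mol.

    A negative ddG (analog binds more favorably) gives a ratio greater than 1.
    """
    return math.exp(-ddg_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))


def ddg_from_fold_change(fold: float, temperature_k: float = 298.15) -> float:
    """Return ddG in kJ/mol corresponding to a given fold increase in affinity."""
    return -R_KJ_PER_MOL_K * temperature_k * math.log(fold)


if __name__ == "__main__":
    # Per-bond estimates quoted in the text (1.6 and 4.5 kJ/mol), taken as favorable ddG.
    for per_bond in (1.6, 4.5):
        print(per_bond, "kJ/mol ->", round(fold_change_in_affinity(-per_bond), 1), "fold")
    # Free-energy difference equivalent to a 10,000-fold gain in affinity.
    print("10,000-fold ->", round(ddg_from_fold_change(1.0e4), 1), "kJ/mol")
```

Under these assumptions, freezing one rotatable bond corresponds to roughly a 2- to 6-fold gain in affinity, whereas a 10,000-fold gain corresponds to about 23 kJ/mol, i.e., the equivalent of about five rotors even at the upper per-bond estimate.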
A database of thermodynamic binding data for over 400 complexes is available on the internet (31). One case from the Martin group (32) examined the binding of preorganized pseudopeptides to the Grb2-SH2 domain where thermodynamic data from ITC were compared with crystal structures of the complexes. They compared binding of Ac-pTyr-Val-Asn-NH2 with an analog in which rotation about the χ1 bond was constrained by a cyclopropane ring. The analog bound twice as tightly as the unconstrained peptide, but the enhanced affinity was due to a more favorable enthalpy of binding rather than the anticipated decrease in the entropic penalty of binding. The crystal structures of the two peptides bound to monomeric Grb2 SH2 domains were determined; both peptides bound in similar β-turn-like conformations. The major difference between the two complexes was the position of the BC loop that makes up part of the phosphate-binding pocket, with movement of loop backbone atoms of up to 2.0 Å. Clearly, the mimicry of the interacting surfaces of the two peptides must differ significantly to induce such a large change in the binding pocket of the recognition site.
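To put the twofold gain in perspective, the corresponding free-energy difference can be worked out directly. The calculation below assumes a temperature of 298 K (an assumption made here for illustration) and writes $K_a$ and $K_p$ for the association constants of the constrained analog and the parent peptide:

```latex
\begin{aligned}
\Delta\Delta G &= -RT \ln\frac{K_a}{K_p} \\
               &= -(8.314\times10^{-3}\ \mathrm{kJ\,mol^{-1}\,K^{-1}})(298\ \mathrm{K})\,\ln 2 \\
               &\approx -1.7\ \mathrm{kJ\,mol^{-1}}
\end{aligned}
```

A difference of this size is comparable to the lower per-bond entropic estimate of 1.6 kJ/mol, which illustrates how small the net effect is relative to the individual enthalpic and entropic terms resolved by ITC.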
Thus, these observations point out the difficulties in design of peptidomimetics with correct surface similarity without challenging the validity of the premises of preorganization. The Spaller group (33) has questioned any simplistic interpretation of the impact of preorganization, based on an ITC study. Three peptides cyclized through their side chains, analogs of the hexapeptide Tyr-Lys-Lys-Thr-Lys-Val, and their linear controls were compared for binding to a PDZ domain. A potential flaw in their experimental design is the tacit assumption that cleaving a macrocycle yields a linear molecule capable of an identical bound conformation in the complex. By definition, a covalent bond holds two atoms closer than the sum of their van der Waals radii, and the interatomic distances between bonded atoms must be increased on bond cleavage. Bastros et al. have also utilized ITC to analyze the thermodynamics of binding of a disulfide-constrained S-peptide compared with S-peptide itself to ribonuclease S (34). It is the purpose of this review to provide a guide to the current status of efforts to generalize technology for specifically inhibiting protein–protein interactions through preorganization as well as the limitations and bottlenecks that have been encountered.
1.3. Recognition “Hot Spots”
Because recognition surfaces at protein interfaces are large, important recognition residues are often not contiguous and can be spread out over the interface. Some of the side chains within a protein/protein interface play a more significant role than others in the energetics of binding and in the determination of the relative orientation of the two proteins. Experimentally, this information on “hot spots” has been obtained by systematic mutagenesis of side chains within the interface to alanine and determination of the changes in binding affinity. Bogan and Thorn (35) collected a database of 2,325 alanine mutants for which the change in free energy of binding upon mutation to alanine had been measured (an updated database ASEdb is available at http://www.asedb.org). Analysis of the database by Bogan and Thorn (35) generated several observations: amino acid side chains in hot spots are located near the center of protein/protein interfaces, are generally solvent inaccessible, and are self-complementary across protein/protein interfaces, i.e., they align and pack against one another. Out of 31 contact residues involved in the interaction of growth hormone with its receptor, for example, two tryptophan residues of the receptor accounted for over 75% of the free energy of binding as determined by mutation to alanine (36). In the case of growth hormone itself, eight of the 31 side chains involved in the interface accounted for approximately 85% of the binding energy; thus, the genesis of the “hot-spot” theory (37) as a basis for inhibitor design and drug discovery. A recent review by Ma
and Nussinov emphasized Trp/Met/Phe hot spots in protein/protein interactions (38). To emphasize the dominance of side chains that interact with hot spots in molecular recognition, in the case of recognition of the peptide hormone somatostatin by its G-protein-coupled receptors (GPCRs), many of the amide bonds can be reduced (39), the direction of amide bonds reversed (40, 41), or even the whole peptide backbone replaced by a saccharide with recognition retained at the receptor (42, 43).
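The way alanine-scanning data are reduced to a list of hot spots can be sketched in a few lines of Python. This is a minimal illustration, not the analysis of Bogan and Thorn: the 2.0 kcal/mol cutoff is a commonly used threshold adopted here as an assumption, the residue labels and ΔΔG values are hypothetical, and summing single-mutant ΔΔG values presumes additivity, which real interfaces need not obey.

```python
from typing import Dict, Iterable, List, Tuple

HOT_SPOT_THRESHOLD_KCAL = 2.0  # assumed cutoff for calling a residue a hot spot


def rank_hot_spots(ddg_by_residue: Dict[str, float],
                   threshold: float = HOT_SPOT_THRESHOLD_KCAL) -> List[Tuple[str, float]]:
    """Return residues whose alanine mutation costs at least `threshold` kcal/mol, largest first."""
    hits = [(res, ddg) for res, ddg in ddg_by_residue.items() if ddg >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def fraction_of_total(ddg_by_residue: Dict[str, float], residues: Iterable[str]) -> float:
    """Fraction of the summed unfavorable ddG contributed by the given residues.

    Negative values (mutations that improve binding) are counted as zero.
    """
    total = sum(max(value, 0.0) for value in ddg_by_residue.values())
    part = sum(max(ddg_by_residue[res], 0.0) for res in residues)
    return part / total if total > 0.0 else 0.0


# Hypothetical alanine-scan results in kcal/mol, for illustration only (not measured data).
example_scan = {
    "Trp_1": 4.5, "Trp_2": 4.0, "Arg_3": 2.1, "Ile_4": 1.2,
    "Glu_5": 0.4, "Asp_6": 0.3, "Lys_7": -0.2,
}

if __name__ == "__main__":
    print(rank_hot_spots(example_scan))
    print(round(fraction_of_total(example_scan, ["Trp_1", "Trp_2"]), 2))
```

In this made-up example, three of seven interface residues exceed the cutoff and the two largest contributors account for about two-thirds of the summed ΔΔG, which mirrors the qualitative picture of a few dominant side chains described above.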
2. Secondary-Structural Mimetics
2.1. Definition
For the purposes of this discussion, a semirigid secondary-structural mimetic has limited thermodynamically accessible conformations, one of the most stable of which has a surface that mimics the interacting surface at a protein–protein interface, either intra- or intermolecular. This can be accomplished by incorporation of conformational constraints into the peptide itself, or by design and synthesis of novel organic structures that do not use the amide scaffold as a backbone. It is obvious from examination of multiple protein-ligand complexes that multiple classes of chemical compounds can bind to any given secondary-structure recognition site of a protein surface with diverse sets of interactions. As these compounds do not necessarily mimic the interactions seen at the interface of the protein-protein complex in a general and transferable way, they do not qualify as true mimetics, only as competitive binding ligands. An excellent example is the diversity of chemical structures that have been found to inhibit the p53/MDM2 complex with no apparent relation to the structure of the peptide helix of p53 (44).
2.2. Privileged Scaffold
A privileged scaffold is a nonpeptide semirigid organic scaffold on which to append amino acid-like side chains to mimic the interacting surfaces of peptides and proteins. The organic scaffolds that first generated the use of this term, by Evans et al. (45), were the benzodiazepines. Benzodiazepines had been recognized much earlier by Ripka et al. (46, 47) as reverse-turn mimetics. A subsequent more detailed conformational analysis by Hata and Marshall (48) confirmed the earlier insights of Ripka et al. (46, 47) and provided explicit rationale for the ability of these ring systems to mimic reverse turns. In particular, benzodiazepines have proven to be an excellent privileged scaffold for GPCRs; a recent example is the melanocortin receptor (49). Excellent reviews of a variety of assumed privileged scaffolds and their synthetic chemistry have been published by Horton et al. 
(50) and by Patchett and Nargund (51). Waldmann and his colleagues have generalized this concept to
natural products as obviously having recognition sites on proteins (52, 53) and have classified such natural product scaffolds in a hierarchical manner (54).
2.3. Secondary Structures as Recognition Elements
The plethora of crystal structures of protein–protein complexes now available in the Protein Data Bank (PDB) provides detailed examples of the dominant role that helices, sheets, and reverse turns play as recognition motifs. In the case of recognition of helices and reverse turns, the surface formed by the orientation of side chains provides the three-dimensional motif. In most of these cases, the peptide backbone only serves as a scaffold on which to display the interacting side chains. Obviously, sheet recognition combines backbone intermolecular interactions with side-chain recognition elements (55). A prime example of sheet recognition occurs with proteolytic enzymes where the peptide backbone hydrogen bonds to the enzyme with the correct sequence alignment provided by specific side-chain pockets in the enzyme active site. Tyndall et al. (56) reviewed molecular recognition of biological peptides by GPCRs and concluded that reverse-turn recognition was the dominant motif in over 100 cases of peptide-GPCR activation. A similar conclusion had been reached previously by Marshall (57). In the case of protein/DNA recognition, helix motifs are a common theme (58, 59), while RNAs can bind to the surfaces of β-sheets (Fig. 2) as exemplified by the U1A-UTR complex (60). It should not be surprising, therefore, that mimicry of peptide secondary-structural motifs to block protein/protein interactions has dominated a large segment of the bioorganic community for decades (see reviews by Ripka and Rich (61), Fairlie et al. (4), and Hershberger et al. (62)).
Fig. 2. Induced fit of both free RNA and U1A. In the complex, the RNA recognition occurs on the faces of the β-sheet.
2.4. Receptor-Bound Conformation of Peptides
Past approaches to determine the conformation of biologically active peptides when bound to receptors of unknown three-dimensional structure have achieved limited success. In part, this
Fig. 3. Conversion of peptide TRH, Pca-His-Pro-NH2, into pentacyclic compound to mimic the deduced receptor-bound conformation.
was due to the serial manner in which the research was done as well as the synthetic difficulty in removing the rotational degrees of freedom from a peptide without concomitant loss of receptor recognition. Little evidence supports direct interaction of the amide bonds of the peptide backbone in receptor recognition, in contrast to recognition by proteolytic enzymes. To emphasize the importance of side chains, in the case of somatostatin, many of the amide bonds can be reduced (39), the direction of amide bonds in the backbone can be reversed (40), or even the whole peptide backbone can be replaced by a saccharide with recognition retained at the receptor (63, 64). Similar studies on the tripeptide hormone thyrotropin-releasing hormone (TRH), Glp-His-Pro-NH2, led to a CNS-active analog with the backbone amides replaced by a cyclohexyl scaffold (65) that did not release TSH. Studies by Marshall and Moeller (66–69) determining the bioactive conformation of TRH have led to polycyclic analogs (Fig. 3) that retain activity at the endocrine receptor responsible for TSH release, but do not show the increase in binding affinity that would be expected if the receptor-bound conformation were mimicked correctly. One concern with this approach is the fact that replacing two nonbonded atoms in close proximity by two bonded atoms changes the conformation of the mimetic relative to the receptor-bound conformation because the bond length is shorter than the sum of the van der Waals radii. Hanessian and Auzzas have reviewed (70) the use of ring constraints in peptidomimetics, focusing on their own work on the syntheses of azacycles containing proline and pipecolic acid as an integral part of the polycyclic structure.
3. Reverse-Turn Mimetics 3.1. Reverse-Turn Motifs
Classic reverse turns (71, 72) have been defined by the two sets of φ, ψ torsional angles associated with the middle i + 1 and i + 2 residues of the turn. These variables determine whether a
hydrogen bond is feasible between the carbonyl of residue i and the α-amino group of residue i + 3, the original definition of a classic β-turn (71). Such a limitation to four descriptors does not specify the relative orientation of the side chains of residues i and i + 3 with respect to the side chains of residues i + 1 and i + 2. This requires specification of two additional backbone torsional variables, namely ψ of residue i and φ of residue i + 3.
3.2. Criteria for Reverse-Turn Mimicry
Tran et al. (73) have developed a system that classifies reverse turns by the relative orientation of the four i to i + 3 side chains. Since this relative orientation of side chains dramatically constrains the recognition surface that a turn could present, with only the χ torsional angles of the side chains themselves remaining as variables, this transformation provides a more direct means of visualizing potential surface mimicry that is relevant to motif recognition. This approach has been used to classify contiguous four-residue loop segments from the PDB into 39 clusters representing 89% of the 23,331 loops in the dataset. Of even more interest was the fact that peptidomimetic scaffolds with reported biological activities matched loop scaffold geometries while those scaffolds without reported biological activities did not (74). These observations strongly support the premise of privileged scaffolds as well as the ability of peptidomimetics to mimic the geometrical molecular-recognition requirements of protein-protein interactions. Arbor and Marshall (75) have generated a virtual library of cyclic tetrapeptides (CTPs) and used the canonical reverse-turn Cα–Cβ clustering of Tran et al. (73) of the current PDB to show that 54% of the reverse turns present in the PDB had eight or more CTPs in the virtual library that mimicked the orientation of all four of the Cα–Cβ bonds in the reverse turn. This validates the utility of CTPs as generic reverse-turn mimetics.
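As a rough illustration of the side-chain-orientation criterion described above, the following Python sketch compares the directions of the four Cα–Cβ bond vectors of a turn with those of a candidate scaffold. It is not the clustering procedure of Tran et al. (73): it assumes that both sets of coordinates have already been superimposed in a common frame, it compares bond directions only (not positions), and the 30° tolerance, the coordinates, and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Bond = Tuple[Point, Point]  # (C-alpha coordinates, C-beta coordinates)


def unit_vector(start: Point, end: Point) -> Point:
    """Unit vector pointing from start to end."""
    dx, dy, dz = end[0] - start[0], end[1] - start[1], end[2] - start[2]
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("C-alpha and C-beta coordinates coincide")
    return (dx / norm, dy / norm, dz / norm)


def angle_between_deg(u: Point, v: Point) -> float:
    """Angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(dot))


def bond_vector_deviations(turn: Sequence[Bond], scaffold: Sequence[Bond]) -> List[float]:
    """Angular deviation for each pair of corresponding C-alpha/C-beta bonds."""
    if len(turn) != len(scaffold):
        raise ValueError("turn and scaffold must contain the same number of bonds")
    return [angle_between_deg(unit_vector(*t), unit_vector(*s))
            for t, s in zip(turn, scaffold)]


def mimics_turn(turn: Sequence[Bond], scaffold: Sequence[Bond],
                tolerance_deg: float = 30.0) -> bool:
    """True if every scaffold bond lies within the tolerance of its turn counterpart."""
    return all(dev <= tolerance_deg for dev in bond_vector_deviations(turn, scaffold))


# Hypothetical, pre-aligned coordinates (arbitrary units) for illustration only.
turn_bonds = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((3.0, 0.0, 0.0), (3.0, 1.0, 0.0)),
    ((3.0, 3.0, 0.0), (3.0, 3.0, 1.0)),
    ((0.0, 3.0, 0.0), (-1.0, 3.0, 0.0)),
]
tilted_scaffold = [((0.0, 0.0, 0.0), (1.0, 0.2, 0.0))] + turn_bonds[1:]
flipped_scaffold = [((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0))] + turn_bonds[1:]

if __name__ == "__main__":
    print(mimics_turn(turn_bonds, tilted_scaffold))
    print(mimics_turn(turn_bonds, flipped_scaffold))
```

A scaffold whose first bond is tilted by about 11° passes the test, whereas one with that bond reversed fails; a practical comparison would first superimpose the structures and would normally consider bond positions as well as directions.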
3.3. Examples of Turn Mimetics
One of the first examples of constraining a small peptide through cyclization to generate a model β-turn was the use of ε-aminocaproic acid by the Scheraga lab to bridge the N- and C-termini of the dipeptide Ala-Gly in 1979 (76). Experimental data were consistent with a type II β-turn with a stabilizing hydrogen bond between the amide hydrogen and carbonyl groups of the ε-aminocaproic acid moiety.
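For orientation, the turn types named in this section are conventionally distinguished by idealized φ, ψ values for residues i + 1 and i + 2. The Python sketch below assigns a set of four backbone angles to the closest of the four commonly tabulated types (I, I′, II, II′). The uniform 30° tolerance and the function names are assumptions made here for illustration, and type VI turns, which are defined by a cis-amide bond rather than by φ, ψ alone, are not covered.

```python
from typing import Dict, Optional, Tuple

# Idealized angles in degrees: (phi(i+1), psi(i+1), phi(i+2), psi(i+2)).
IDEAL_TURNS: Dict[str, Tuple[float, float, float, float]] = {
    "I": (-60.0, -30.0, -90.0, 0.0),
    "I'": (60.0, 30.0, 90.0, 0.0),
    "II": (-60.0, 120.0, 80.0, 0.0),
    "II'": (60.0, -120.0, -80.0, 0.0),
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def classify_beta_turn(phi1: float, psi1: float, phi2: float, psi2: float,
                       tolerance: float = 30.0) -> Optional[str]:
    """Return the turn type whose ideal angles all lie within `tolerance`, or None.

    If more than one type qualifies, the one with the smallest worst-case deviation wins.
    """
    observed = (phi1, psi1, phi2, psi2)
    best_name: Optional[str] = None
    best_worst = float("inf")
    for name, ideal in IDEAL_TURNS.items():
        worst = max(angular_difference(o, i) for o, i in zip(observed, ideal))
        if worst <= tolerance and worst < best_worst:
            best_name, best_worst = name, worst
    return best_name


if __name__ == "__main__":
    print(classify_beta_turn(-60.0, 120.0, 80.0, 0.0))
    print(classify_beta_turn(-65.0, -25.0, -95.0, 5.0))
    print(classify_beta_turn(180.0, 180.0, 180.0, 180.0))
```

The four-angle description is exactly the limitation noted earlier: it identifies the backbone type but says nothing about how the side chains of residues i and i + 3 are oriented.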
3.3.1. α,β-Dehydroamino Acids
Chauhan and colleagues have studied extensively the impact of incorporation of dehydroamino acids, such as dehydrophenylalanine (ΔPhe), on the backbone conformation of peptides (77–80). In general, reverse-turn conformations are preferred in shorter peptides (81, 82), while longer peptides adopt 3₁₀-helices (83). ΔPhe is planar with no chiral α-carbon and can be incorporated into both right- and left-handed helices. The most favorable conformations of ΔPhe residues are (φ, ψ) approximately (−60°, −30°), (−60°, 150°), (80°, 0°), or their mirror images (80).
3.3.2. Proline and Heterochiral N-methyl Amino Acids as Turn Inducers
Proline has long been known to help a peptide adopt a reverse-turn conformation (71). For example, the classic type VI turn was defined as having a cis-amide bond between residues i + 1 and i + 2, which proline facilitates in the i + 2 position due to its disubstituted amide nitrogen. The sequence D-Pro-L-Pro in particular has been found to adopt a reverse-turn conformation (84–86), and alternating D/L amino acid sequences facilitate cyclization of small peptides as discussed in a review of cyclization of peptides and depsipeptides by Davies (87). Particular difficulties in cyclization of linear tetrapeptides containing residues of the same chirality have been found (88). Durani has discussed designing small folded proteins based on an alphabet of D- and L-amino acids (89). Introduction of an N-methyl amino acid places a significant limitation on the backbone conformations of the peptide chain (90). By replacing the amide hydrogen with a methyl group, the propensity for a turn to continue to grow into a helix is diminished by forcing a discontinuity in hydrogen bonding. Proline is found within helices in proteins, but characteristically forces a kink in the helix to accommodate the steric bulk of two methylene groups attached to the amide nitrogen (91). In addition, amide methylation often offers improved properties of peptide analogs related to bioavailability and enhanced stability to proteolytic enzymes when used in biological systems (92).
3.3.3. Impact of Unusual Amino Acids and Amide Surrogates on Turn Formation
A number of analogs of proline have been investigated for their ability to stabilize an N-terminal cis-amide bond. Che and Marshall (93) compared the impact of different analogs on amide cis-trans isomerism and peptide conformation (Fig. 4). Conformational preferences for the cis-amide bond and the type VI turn were investigated at the MP2/6-31+G** level of quantum theory with a polarizable continuum water model. Proline analogs that stabilized the cis-amide did so through different mechanisms: (1) 5-alkylproline, with a bulky hydrocarbon substituent on the Cδ of proline, increased the cis-amide population through steric hindrance between the alkyl substituent and the N-terminal residues; (2) oxaproline or thioproline, the oxazolidine- or thiazolidine-derived proline analog, favored interactions between the dipole of the heterocyclic ring and the preceding carbonyl oxygen; and (3) azaproline, containing a nitrogen atom in place of the Cα of proline, preferred the cis-amide bond by lone-pair repulsion between the α-nitrogen and the preceding carbonyl oxygen (94).
3.3.4. Peptide Backbone Modifications
Numerous modifications (Fig. 5) of the peptide backbone have been explored in order to improve metabolic stability, incorporate transition-state analogs for inhibition of enzymes, chelate metals, and enhance receptor affinity.
Fig. 4. Different chemical approaches to stabilizing cis-amide bonds.
3.3.4.1. Amide-Bond Surrogates
The role of amide bonds in molecular recognition, enzyme inhibition, and protein stability has been probed with a variety of amide-bond surrogates that introduce different chemical modifications to the peptide bond. Early work in this area on pseudopeptides, or amide-bond surrogates, has been reviewed by Spatola (95, 96).
3.3.4.2. Reduced Amide Bonds and Other Transition-State Analogs
Because of the wide occurrence of proteolytic enzymes involved in various diseases, peptidomimetic inhibitors based on incorporation of a transition-state analog with a tetrahedral carbon to replace the carbonyl carbon of the scissile amide bond have been a dominant strategy. The simplest analog replaces the amide with a methylene amine, or a reduced amide bond.
Fig. 5. Selected types of peptide-backbone surrogates in common usage.
3.3.4.3. Ester Surrogates
Replacing amide bonds with ester linkages has been explored in synthetic proteins to quantitate the energetics of hydrogen bonding of the amide bond in secondary structures (97–100).
3.3.4.4. Tetrazole Surrogates
Zabrocki et al. replaced the amide bond of dipeptides with a 1,5-disubstituted tetrazole ring to impose a cis-amide constraint and help stabilize a type VI reverse turn (68, 101–105). Molecular modeling based on the crystal structure of a tetrazole-containing diketopiperazine allowed direct comparison (Fig. 6) of the effects of the substitution on the conformations accessible to the cis-amide bond (101). While this was first reported over 20 years ago, the conversion of a peptide bond in a dipeptide to the tetrazole ring requires rather harsh conditions and a strict protocol to avoid racemization. Thus, incorporation of this constraint has seen limited applications in bioactive peptides (106–108). Beusen et al. incorporated the tetrazole as an amide-bond mimetic in a cyclic hexapeptide analog of somatostatin (109). With the tetrazole constraining the cyclic peptide to a type VI turn at the Phe11-ψ[CN4]-Ala6 position and the expected β-II′ turn
Fig. 6. (Top) Ramachandran plots for torsional variables φ1, ψ1 and φ2, ψ2 for acetyl-L-Ala-ψ(CN4)-L-Ala-methylamide and acetyl-L-Ala-[cis-amide]-L-Ala-methylamide (bottom) at 10° increments. Dots represent sterically allowed values. (Bottom) Vector plots of α-carbon–β-carbon bond vectors (arrows) for acetyl-L-Ala-ψ(CN4)-L-Ala-methylamide and acetyl-L-Ala-[cis-amide]-L-Ala-methylamide (bottom) at 10° increments. Dashed area denotes fixed coordinates of methylamide portions used as common frame of reference.
at D-Trp8-Lys9 as determined by NMR, 83% of the activity of somatostatin was retained.
3.3.4.5. Triazole Surrogates
The development of click chemistry (110–113) has impacted the ease with which both 1,4- and 1,5-disubstituted-1,2,3-triazoles can be used as amide-bond mimetics. In particular, the 1,5-disubstituted triazole (113) is as good a cis-amide-bond mimetic (Taylor et al., unpublished) as the 1,5-tetrazole and much easier to prepare without fear of racemization. Numerous examples of the application of click chemistry have appeared in the peptidomimetic literature. The Meldal group prepared libraries of 1,4-disubstituted-1,2,3-triazoles in 2002 (111) by solid-phase synthesis and used them to identify inhibitors of a cysteine protease (114). The Schultz group has even added azide- and alkyne-labeled amino acids to their library of noncoded analogs that can be incorporated by modified protein synthesis (115). Hitotsuyanagi et al. have developed a synthetic route via thiopeptides to 1,5-disubstituted-1,2,4-triazole peptidomimetics (116) and applied this approach to a constrained bicyclic hexapeptide (117). Bock et al. have prepared 1,4-disubstituted-1,2,3-triazole analogs replacing the amide bonds of the naturally occurring tyrosinase inhibitor c[Pro-Val-Pro-Tyr] that retained inhibitory activity (118). Taylor and Marshall (unpublished) have investigated the ability of 1,4-disubstituted-1,2,3-triazoles to mimic trans-amide bonds as well as the ability of 1,5-disubstituted-1,2,3-triazoles to mimic cis-amide bonds based on a protocol similar to that of Zabrocki et al. in the investigation of cis-amide mimicry by tetrazole analogs (101).
3.3.5. Cyclization of Peptides
Patgiri (123) has recently reviewed replacement of hydrogen bonds within a peptide conformer with covalent bonds to stabilize a particular conformation of the peptide (50, 121–123). Application of this approach has targeted both reverse-turn and helix mimetics. Numerous bicyclic and tricyclic dipeptide derivatives have been generated and incorporated into biologically active peptides (119, 120).
3.3.6. Constrained Cyclic Tetrapeptides
Cyclization of peptides to help deduce their receptor-bound conformation has evolved as one strategy for logically making the transition from peptide ligand to nonpeptide drug while probing receptors for reverse-turn recognition (124). Alternative strategies to induce a reverse turn and stabilize a β-hairpin in a peptide chain include use of heterochiral N-methyl amino acids, such as L-Pro-D-pro or D-pro-L-Pro. Combining the two strategies by Che and Marshall (124) led to recognition of c(L-Pro-D-pro-L-Pro-D-pro) as a template for displaying amino acid side chains and probing for reverse-turn recognition. The choice of a cyclic tetrapeptide scaffold was strongly influenced by a survey of the crystal structures of cyclic peptides in the Cambridge
Structural Database (Fig. 7). Only CTPs contained both cis- and trans-amide bonds. Larger cyclic peptides strongly favored all trans-amide bonds, while cyclic di- and tripeptides required all cis-amide bonds for closure. Rothe and coworkers’ extensive research (125–129) showed that c(Pro-pro-Pro-pro) preferred a mixture of mirror-image, amide-bond conformers (tctc and ctct, Fig. 8) that interconverted upon heating. The Gilbertson group determined the solution conformation of cyclo[pro-Pro(4-OH)-pro-Pro(4-OH)]; only one cycloenantiomer (tctc) was produced upon cyclization of the linear tetrapeptide (130). The substitution of the two chiral hydroxyl groups renders the two possible conformations of the cyclic compound diastereomeric rather than enantiomeric, providing a basis for the single conformer seen in solution. Interconversion to the other cycloenantiomer was not observed upon heating. Derivatives of this cyclic tetrapeptide
Fig. 7. Survey of amide-bond torsion angles (ω) of cyclic peptides in the Cambridge Structural Database; C is the number of residues in the cyclic peptide. ω = 0° is a cis-amide bond, ω = 180° is a trans-amide bond.
Fig. 8. Interconversion of cyclo-(D-Pro-L-Pro-D-Pro-L-Pro) between the ctct and tctc amide-bond conformers.
scaffold have been prepared readily by solid-phase synthesis (126) or by a convergent solution route (130) with yields of 85% during cyclization. Extensive mapping of the conformational surface of c[Pro-pro-Pro-pro] with DFT calculations by Che and Marshall (124) showed that a set of eight distinct amide-bond conformers and their interconversions (Fig. 9) based on the number and location of cis- or trans-amide bonds (tttt, or all trans; cccc, or all cis; tctc; ctct, tccc, cttt, ttcc, and cctt) were accessible. Individual conformers could be selectively stabilized by use of azaproline in which the alpha carbon of proline was replaced by nitrogen and other proline analogs. As all seven chiral hydrogens on the proline ring have been successfully substituted in the literature with side-chain functionality, this CTP was a strong candidate for library generation using chimeric analogs of proline and pipecolic acid to probe for reverse-turn recognition. Considering the synthetic difficulty (estimated at 1 man month) of preparing a single chirally substituted proline analog by published procedures, this was clearly beyond the synthetic capability of any single academic lab. Fortunately, the Moeller group has developed a rapid, novel synthetic route (131) to substituted proline and pipecolic acid analogs to generate specific building blocks. All four five-membered proline ring constraints in the CTP are, however, essential for stabilizing any cis-amide-bond conformers (132). A single substitution of N-methyl-l-alanine, or the six-membered homolog, l-pipecolic acid, destabilizes all the cis-amide conformers of c[Pro-pro-Pro-pro] and only the all-trans-amide tttt conformers were populated (see Fig. 9). 3.3.7. Metal Complexes of Peptides
Efforts have extended conventional cyclization by disulfide, amide, or carbon–carbon bonds through the use of metals and the introduction of specific metal-binding sites in the peptide itself (133–135). The use of a metal template as a strategy for controlling the conformation of a short peptide to mimic a reverse-turn motif was clearly enunciated and demonstrated by Tian and Bartlett (136). Peptide complexes of the Cu(II) ion were used to adopt the appropriate conformation to mimic the Trp-Arg-Tyr segment of tendamistat, a protein inhibitor of α-amylase. The metal complexes oriented the triad around a β-turn in a fashion similar to tendamistat, for which these residues are central to binding interactions with α-amylase. These mimetics were based on the structure of the complex of Cu(II) with pentaglycine where the N-terminal amino group and the next three amide nitrogens showed square-planar coordination to the metal. Three tetrapeptides containing Trp, Arg, and Tyr residues showed ~100-fold increases in inhibition in the presence of Cu(II). One complicating factor in this study was the dissociation
Fig. 9. Orthogonal views of DFT-optimized structures of cyclo-(D-Pro-L-Pro-D-Pro-L-Pro): (a) tttt, all-trans amides, (b) cttt, (c) ctct, (d) tccc, and (e) cccc, all-cis amide conformations. Mirror image (tctc) and rotational isomers of ctct are not shown. Hydrogen atoms are omitted for clarity. Note that each conformer has unique geometrical relative orientations of proline rings for attachment of side-chain functionality.
of copper from the complex with its inherent amylase-inhibitor activity. It is most desirable that any metal complex be stable in the relevant biological milieu to reduce ambiguity in its mechanism of action and to reduce possible toxicity. Shi and Sharma (137, 138) have developed a combinatorial approach entitled metal-ion induced distinctive array of structures (MIDAS) in which the amide nitrogens of the N-terminal two amino acids of a peptide preceding a cysteine residue react with a rhenium reagent, leading to formation of a stable rhenium complex. This leads to stable complexes with similar geometry to the Cu(II) complexes of Tian and Bartlett (136). A selective inhibitor of human neutrophil elastase (137) and a highly selective agonist of the melanocortin-1 receptor (138) were discovered with the MIDAS approach. Marshall and coworkers (134, 139) explored the use of metal complexes of chiral azacrowns (MACs) derived from amino acid synthons as a strategy for controlling the conformation and fixing chiral side chains in orientations comparable with those of reverse turns. Reduction of the amide bonds of a cyclic peptide precursor to secondary amines leads to a flexible azacrown, and the flexibility can be limited by complexation with a metal to fix the side-chain orientations into a manageable set. Proof of concept of MACs providing a novel approach to peptidomimetics came from two examples, where the receptor-bound conformations had been previously determined by X-ray crystallography of peptide–receptor complexes. One MAC was designed to mimic the proposed receptor-bound conformation of the Arg-Gly-Asp motif of the cyclic pentapeptide, c[RGDfMeV], complexed with the αVβ3 integrin receptor. The other MAC was designed to mimic the α-amylase-bound conformation of a Trp-Arg-Tyr β-turn motif from tendamistat. 
The metal center is buried in the middle of a MAC complex, acting like glue to keep the pharmacophoric groups oriented in their desired directions. One must design a complex that affords the proper geometrical orientations, but it is essential that the metal be bound tightly so that no redox-active metal can dissociate from the complex in vivo and complicate bioassays with potentially toxic side effects. Riley et al. (140, 141) have demonstrated catalytic superoxide dismutase activity in a wide range of MAC analogs when complexed with manganese. These metal complexes showed reasonable thermodynamic stability and excellent kinetic stability: they remained completely intact under physiologic conditions, with no metal dissociation for many hours even in the presence of EDTA. Several other groups have also used amino acid side chains (e.g., cysteine, histidine, lysine, aspartic acid) or a chemically modified backbone to participate in specific metal ligation.
Marshall, Kuster, and Che
A few examples serve to further illustrate this approach. Tamamura et al. (142, 143) have shown that three peptides with significantly different cyclic constraints, including a Zn(II) complex, bind to C-X-C motif receptor 4. T22, a precursor of T134, has four Cys residues forming two disulfide bonds and adopts a β-hairpin conformation in solution. T22 (Zn), a derivative of T22 in which the four sulfurs of the Cys residues are bonded to Zn(II), has fourfold the activity of T22. T134 has a characteristic turn motif (D-amino acid-Pro) and a disulfide bridge constraint that impose a β-hairpin structure in solution. The Marshall group (133, 134, 144) developed synthetic routes to modify the amide backbone to a hydroxamate, or phosphinic acid, group to provide multiple metal-binding sites. Similarly, Akiyama et al. (145) had previously replaced the amide bond with a hydroxamate in enkephalin to generate a metal-binding site. These peptides mimic the naturally occurring hydroxamate-containing siderophores involved in iron transport. Combinations of these approaches and complexation of the resulting compounds with different metals should provide useful probes of conformational preorganization with novel constraints for reverse-turn recognition.
3.4. Criteria for Selection of Reverse-Turn Mimetics
Several criteria become important depending on the function desired from incorporation of a turn mimetic. If nucleation of a β-hairpin is desired, then the geometrical ability of the mimetic to initiate the first hydrogen bond seen in classical β-turns, which zippers the hairpin, becomes significant. The Marshall group evaluated hairpin nucleation for a variety of reverse-turn mimetics using the frequency of hydrogen-bond formation (84, 85). If surface mimicry is essential, then the ability of the mimetic to orient side chains correctly dominates (146). Many of the bi- and tricyclic dipeptide mimetics present synthetic difficulties in stereospecifically appending side chains for surface mimicry. The Kihlberg group (147, 148) replaced the presumed i to i + 3 hydrogen bond with a covalent ethylene linker that allowed retention of the four chiral side chains by the turn mimetic. Hanessian (149) and Lubell (150–152) have addressed the problem of pendant side chains for several of the bicyclic dipeptide mimetics.
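As a rough illustration of the hydrogen-bond frequency criterion mentioned above, the following Python sketch counts how often an i to i + 3 hydrogen bond is formed across a set of sampled conformers. It is not the protocol of refs. 84 and 85: the function name `hbond_frequency`, the input layout (one carbonyl-oxygen/amide-hydrogen coordinate pair per conformer), and the 2.5 Å distance cutoff are all assumptions made for this example.

```python
import math


def hbond_frequency(conformers, cutoff=2.5):
    """Fraction of conformers with an O(i)...H(i+3) distance within `cutoff` (in Å).

    `conformers` is a list of (o_xyz, h_xyz) coordinate pairs, one per sampled
    conformer. The purely distance-based criterion and the default cutoff are
    illustrative assumptions, not values taken from the chapter.
    """
    if not conformers:
        raise ValueError("no conformers supplied")
    hits = sum(1 for o_xyz, h_xyz in conformers if math.dist(o_xyz, h_xyz) <= cutoff)
    return hits / len(conformers)


# Hypothetical coordinates for four sampled conformers of one turn mimetic.
sampled = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 2.0)),  # 2.0 Å: counted as hydrogen bonded
    ((0.0, 0.0, 0.0), (0.0, 0.0, 3.0)),  # 3.0 Å: not counted
    ((0.0, 0.0, 0.0), (1.5, 1.5, 0.0)),  # about 2.1 Å: counted
    ((0.0, 0.0, 0.0), (4.0, 0.0, 0.0)),  # 4.0 Å: not counted
]
frequency = hbond_frequency(sampled)
```

Comparing this frequency across candidate mimetics gives a simple ranking of their tendency to nucleate a hairpin; an angular criterion could be added if a stricter hydrogen-bond definition is preferred.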
4. Helix Mimetics
4.1. Helical Motif
The most common element of secondary structure in proteins is the helix. Helices are enriched at protein/protein interfaces, where a helix:cleft motif is often employed to recognize “hot spot” residues at the protein/protein interface. Helices are also enriched in protein/nucleic acid interactions, where the helical motif facilitates molecular recognition by projecting residues into the grooves of nucleic acid helices.
A well-studied prototype for helix recognition by proteins is the ribonuclease S system. Ribonuclease S is a modification of ribonuclease A, produced by cleavage of the amide bond between residues 20 and 21 with the enzyme subtilisin (153). The cleaved portions can recombine at a 1:1 ratio into a catalytically productive complex similar to wild-type ribonuclease A. The energetics of recognition (154) have been studied by calorimetry: replacing Met13 with glycine eliminates half the binding energy, presumably because glycine introduces conformational flexibility in the unbound peptide (an entropic penalty) and eliminates enthalpic contributions from side-chain recognition. Met13 is evidently a “hot spot” for helical recognition by ribonuclease S.
A system of significant therapeutic interest in oncology involves the protein/protein interface between MDM2 (or the human homolog, HDM2) and the trans-activation domain of p53, which is a functional node in the biological network for apoptosis and cancer cell survival. Here, p53 binds to MDM2 via a helix:cleft motif, where a hot spot of three residues (Leu, Phe, Trp) on the p53 helical segment is necessary to bind the cleft in MDM2 (155). Vassilev et al. (156) have described nutlins, low-molecular-weight drug candidates that inhibit this interaction by placing different chemical moieties in an orientation similar to that of the native Leu-Phe-Trp triad, occluding the binding site from the natural ligand. The diversity of chemotypes that inhibit binding of p53 to MDM2, many of which arose by screening (157), has been reviewed by Dudkina and Lindsley (44). This diversity of small molecules immediately calls into question the concept of a helix mimetic. These compounds have surface complementarity for the MDM2 cleft, but no obvious derivation from the parent helical peptide.
In contrast, an octapeptide had previously been described (158, 159) as a nanomolar inhibitor that contained two α,α-dialkyl amino acids to stabilize the receptor-bound helical conformation (160) and correctly orient the three side chains of Leu, Phe, and Trp. Unless one can trace the derivation of a nonpeptide inhibitor of a protein/protein interface, the designation as a peptidomimetic is one of function rather than mechanism. A binding site can be complexed with a multiplicity of chemical scaffolds, especially if the structure of a complex provided a basis for structure-based design. Patgiri et al. have reviewed a general strategy of constraining short peptides in helical conformations by replacing their internal hydrogen bonds with covalent bonds through ring-closing metathesis reactions (123).
4.2. Protein Helices: Data Versus Dogma
The description by Pauling and his colleagues, Corey and Branson, in 1951 (161) of the protein α-helix (3.6₁₃-helix), with multiple thirteen-membered hydrogen-bonded rings between the carbonyls of residues i and the amide hydrogens of residues i + 4, remains a milestone in the history of structural biology. Donohue proposed a similar protein helix with ten-membered hydrogen bonding, the 3₁₀-helix, in 1953 with relatively minor changes to the backbone torsional angles of the α-helix (162). When one considers replacing a helix with a semirigid organic mimetic, it is important to project side chains precisely along similar Cα–Cβ vectors. This is a necessary condition for determining a mimetic surface topology at the recognition motif. In an effort to design helix mimetics that could accommodate the observed variation in backbone torsional angles of helices, Kuster et al. (163) analyzed high-resolution protein crystal structures in the PDB (Fig. 10). Unexpectedly, the large majority of torsional angles of protein helices lie between those of the α-helix and the 3₁₀-helix. The unexpected population of intermediate helices is due to the stability of shared three-centered (bifurcated) hydrogen bonds not considered by either Pauling et al. (161) or Donohue (162). Such bifurcated hydrogen bonds are unrealistically penalized by molecular mechanics force fields utilizing monopole electrostatics, in which linear hydrogen bonds are considered optimal. As can be seen in Fig. 10a, the distribution of φ,ψ torsional angles for helical residues in the high-resolution structures in the PDB does not center on either a classical α- or 3₁₀-helix, but rather on an intermediate value (φ = −60°, ψ = −41°); the distribution is monodisperse rather than bimodal.
Fig. 10. (a) Frequency of helical backbone torsional angles in PDB structures with resolutions between 1 and 1.49 Å. (b) Difference plot between the frequencies of helical backbone torsional angles in PDB structures with resolutions between 1 and 1.49 Å and those with resolutions between 2 and 2.49 Å. Note the loss of the observations at φ,ψ = −61°, −40° seen in the high-resolution data and the enhanced observations at the classical (Pauling) α-helical torsion angles (φ,ψ = −57°, −47°) with lower-resolution data. The implication of this plot is that data from lower-resolution (>2 Å) crystal structures of proteins are biased toward the classical α-helix because the protein backbone is not adequately resolved by the electron density. In such cases, electron density fitting is usually supplemented by force field calculations that utilize monopole electrostatics.
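The kind of comparison summarized in Fig. 10 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis of Kuster et al. (163). The 1° bin width, the sign convention (high-resolution minus low-resolution frequency), the function names, and the toy data are all hypothetical. The α-helix reference angles are the classical values quoted in the caption, while the 3₁₀-helix angles (−49°, −26°) are a commonly quoted textbook value that does not appear in the text.

```python
import math
from collections import Counter

# Reference (phi, psi) values in degrees. ALPHA is the classical Pauling value
# quoted in the Fig. 10 caption. THREE_TEN is a commonly quoted textbook value
# and is an assumption here; it does not come from the text.
ALPHA = (-57.0, -47.0)
THREE_TEN = (-49.0, -26.0)


def angular_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_distance(p, q):
    """Euclidean distance between two (phi, psi) points on the periodic map."""
    return math.hypot(angular_diff(p[0], q[0]), angular_diff(p[1], q[1]))


def bin_frequencies(pairs, width=1.0):
    """Normalized frequency of (phi, psi) pairs on a square grid of bins."""
    counts = Counter(
        (math.floor(phi / width), math.floor(psi / width)) for phi, psi in pairs
    )
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def difference_map(high_res, low_res, width=1.0):
    """Per-bin frequency difference, high-resolution minus low-resolution."""
    f_high = bin_frequencies(high_res, width)
    f_low = bin_frequencies(low_res, width)
    return {k: f_high.get(k, 0.0) - f_low.get(k, 0.0) for k in set(f_high) | set(f_low)}


# Toy data (hypothetical, for illustration only).
high = [(-61.0, -40.0)] * 3 + [(-57.0, -47.0)]
low = [(-61.0, -40.0)] + [(-57.0, -47.0)] * 3
diff = difference_map(high, low)
```

With real data, `high` and `low` would hold the φ,ψ pairs of helical residues from the two resolution classes. `torsion_distance` can then be used to ask how far the most populated bin lies from each reference helix.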
Fig. 10a implies that shared (bifurcated) hydrogen bonds dominate the statistical picture of helical conformation in proteins and must be considered in helical mimetic design. The implications of this observation are manifold in terms of protein dynamics, transformation from the molten globule to a packed structure, etc. Further analyses showed that such bifurcated backbone hydrogen bonds are consistent with high-level quantum calculations by Wieczorek and Dannenberg (164) as well as with second-generation molecular mechanics force fields such as AMOEBA (164–168) that include multipole electrostatics and polarizability. A recent critical comparison of twelve different force fields that utilize monopole electrostatics, based on molecular dynamics simulations of alanine pentamers in water, indicated that most, if not all, overpopulate the α-helical region (169). In our opinion, the representation of hydrogen bonding by molecular mechanics requires the use of next-generation force fields such as AMOEBA that include multipole electrostatics and polarizability. This view is supported by studies using AMOEBA on the geometry of water clusters and ion solvation by the Ponder group (165, 168).
4.3. Stabilization of Helical Conformations
The selection criteria for inclusion of different chemical approaches are: introduction of conformational constraints that help preorganize the unbound structure into a helix mimetic (reduction of ΔS upon complex formation), demonstrated mimicry of a helix-recognition site, and synthetic accessibility (Fig. 11). There are many examples of compounds that inhibit protein-protein interactions, discovered, for example, by screening of compound libraries, without any evidence that they actually bind in a similar fashion by mimicking the surface topography of the helix within its recognition site. Such inhibitors are often referred to in the literature as helix mimetics with little supporting structural data. As an example, a substituted 1,4-benzodiazepine-2,5-dione antagonist of the HDM2-p53 protein-protein interaction blocked complex formation (170), but structural data supporting helix mimicry per se were sparse.
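The entropic argument behind the first criterion can be made concrete with standard thermodynamics (ΔG = ΔH − TΔS and K = exp(−ΔG/RT)). The sketch below assumes, purely for illustration, that a constraint changes only the entropy of binding and leaves the enthalpy untouched, which is rarely exactly true in practice. The function names and the example value of 4.0 cal/(mol K) are hypothetical.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol K)


def binding_free_energy_change(delta_delta_s, temperature=298.15):
    """Change in binding free energy (kcal/mol) when the entropy of binding
    becomes less unfavorable by `delta_delta_s` (cal/(mol K)).

    Assumes the binding enthalpy is unchanged by the constraint.
    """
    return -temperature * delta_delta_s / 1000.0


def affinity_ratio(delta_delta_s):
    """Ratio of association constants (constrained / flexible ligand).

    Follows from K = exp(-dG/RT) with ddG = -T * ddS, so the temperature
    cancels and the ratio is exp(ddS / R).
    """
    return math.exp(delta_delta_s / R_CAL)


# Hypothetical example: preorganization recovers 4.0 cal/(mol K) of entropy.
gain = affinity_ratio(4.0)
ddg = binding_free_energy_change(4.0)
```

Because the ratio depends exponentially on the recovered entropy, even modest preorganization can, under these assumptions, translate into a several-fold gain in affinity.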
4.3.1. α-Methyl Amino Acids
Peptides containing amino acids in which the α-proton has been replaced by a methyl or other alkyl group (α,α-dialkyl amino acids) serve as the simplest example of helix mimetics. The severe restriction of φ,ψ space by an α-methyl amino acid, which limits backbone conformations to values associated with α- and 3₁₀-helices, was discovered independently by Marshall and Bosshard (90) and by Burgess and Leach (171). These early predictions from molecular modeling have been confirmed by multiple studies, both computational (172–175) and experimental (176–183), in the subsequent 35 years. Numerous examples exist of the introduction of α-methyl amino acids into biologically active peptides, providing evidence for turn or helical conformations at that residue when bound to their protein receptors (90, 184). Inclusion of several of these α-alkyl amino acid analogs within a peptide enhances the probability that even short peptides will assume helical conformations.
Fig. 11. Representative approaches to stabilize helices in peptidomimetics as suggested in the literature.
4.3.2. Constraints on Side-Chain Torsional Angles
Creamer et al. have shown a strong correlation between decreased helix propensity of amino acid residues and the entropic effects of holding a valine or isoleucine residue in an α-helical conformation (185). This results from the limitation on χ1 values of the β-branched amino acids imposed by the helical backbone conformation.
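The size of such a side-chain entropy effect can be estimated from rotamer populations with the Gibbs expression S = −R Σ p ln p. The sketch below is only an illustration of the bookkeeping, not the calculation reported in ref. 185. The rotamer populations (three equally populated χ1 rotamers in the unfolded state, one in the helix) are hypothetical round numbers, and the function name is an assumption.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol K)


def conformational_entropy(populations):
    """Gibbs entropy, S = -R * sum(p ln p), in cal/(mol K).

    `populations` are relative rotamer weights; they are normalized here and
    zero-weight rotamers contribute nothing.
    """
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")
    probs = [p / total for p in populations if p > 0]
    return -R_CAL * sum(p * math.log(p) for p in probs)


# Hypothetical chi1 populations for a beta-branched residue.
s_coil = conformational_entropy([1.0, 1.0, 1.0])   # three rotamers, equal weight
s_helix = conformational_entropy([1.0, 0.0, 0.0])  # one rotamer allowed in the helix
delta_s = s_helix - s_coil                         # equals -R ln 3
penalty_kcal = -298.15 * delta_s / 1000.0          # -T * dS at 298.15 K
```

For these assumed populations the entropy loss is R ln 3, about 2.2 cal/(mol K), which corresponds to a free-energy penalty of roughly 0.65 kcal/mol at room temperature.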
4.3.3. Helix Nucleation
Kemp et al. suggested that cyclic peptide-like compounds could be used to nucleate helix formation by providing the correct geometrical arrangement of three carbonyl oxygens to hydrogen bond to the three unsatisfied amide hydrogens at the N-terminus of an α-helix (186).
4.3.4. Oligoaromatic Mimetics
Jacoby (187) and the Hamilton group (188–191) suggested that bis- or tris-aromatic residues could serve as scaffolds for helical mimetics. Che et al. have examined a variety of aromatic-based scaffolds as potential helix mimetics (192). Rebek and coworkers have suggested a central pyridazine ring (193) as well as a heterocyclic piperazine-based scaffold (194). Ahn and Han developed a facile synthesis of benzamides as potential helix mimetics (195). To illustrate the synthetic effort required to generate libraries of aromatic semirigid helix mimetics, one approach is outlined. Bourne et al. (manuscript in preparation) developed a synthetic scheme for substituted phenyl-pyridyl and phenyl-dipyridyl scaffolds to generate small libraries of candidate helical mimetics. Their synthesis of helical peptidomimetics involved independent syntheses of the aromatic rings (Ring A, Ring B, and Ring C), followed by a Pd-based coupling reaction to combine the fragments convergently (Fig. 12). Modeling studies suggested a 2-methyl benzyl analog and pyridine analogs as core heterocycles. The methyl group and the steric pocket left by the lone pair of electrons of the nitrogen stabilized a twist in the core aromatic rings consistent with an α-helix. Pyridine substitution has the added benefit of increasing solubility.
[Fig. 12 scheme: Ring A, Ring B, and Ring C fragments with Sn(Bu)3 and Br coupling handles. R1, R2, R3 = amino acid side chains; X = H, CH3; Y = OPG, H.]
Fig. 12. Retrosynthetic approach to target substituted phenyl-dipyridyl libraries.
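The combinatorial size of a library built on such a three-ring scaffold can be sketched as follows. This is an illustration only, not the design of Bourne et al.: the function name is hypothetical, the Leu/Phe/Trp side-chain set is borrowed from the p53 hot spot discussed earlier, and treating X and Y as single, independent two-way choices is a simplifying assumption about the scheme in Fig. 12.

```python
from itertools import product


def enumerate_library(side_chains, x_options=("H", "CH3"), y_options=("OPG", "H")):
    """Enumerate every R1/R2/R3/X/Y combination for a three-ring scaffold.

    Each library member is returned as a dict of substituent labels. X and Y
    are treated as one independent choice each, which is an assumption.
    """
    return [
        {"R1": r1, "R2": r2, "R3": r3, "X": x, "Y": y}
        for r1, r2, r3, x, y in product(
            side_chains, side_chains, side_chains, x_options, y_options
        )
    ]


# Hypothetical side-chain set taken from the p53 hot-spot residues.
library = enumerate_library(["Leu", "Phe", "Trp"])
library_size = len(library)  # 3 * 3 * 3 side-chain choices * 2 * 2 = 108
```

Because Rings A, B, and C are prepared independently and coupled convergently, the number of ring fragments to synthesize grows only linearly with the number of side chains, while the number of accessible library members grows with the product.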
5. Sheet Mimetics
5.1. β-Strand Motifs
A significant motif of peptide recognition involves interactions with the peptide backbone, as commonly seen in proteolytic enzymes, major histocompatibility complex (MHC) proteins, and PDZ protein-interaction domains. The linear recognition motif in proteolytic enzymes allows alignment of the peptide backbone within the active site, giving a precise orientation relative to the enzymatic functional groups responsible for amide-bond hydrolysis. In protein-protein recognition, β-strand recognition usually manifests itself as β-sheet assembly in complex formation. As an example, HIV protease is a homodimer in which each monomer contributes one of the two active-site aspartyl residues. A four-stranded β-sheet structurally stabilizes the dimer and is responsible for over 80% of the stabilization energy (196). A key common pathological event in many neurodegenerative disorders, such as Alzheimer’s disease, is the misfolding and aggregation of a normal, soluble peptide into β-sheet-rich oligomeric structures that are neurotoxic and can form insoluble amyloid deposits that accumulate in the brain. Several attempts have been made to inhibit dimer formation of HIV protease (197–199) and the accumulation of amyloid deposits found in Alzheimer’s disease with β-strand/sheet mimetics (197–199).
5.2. Protease Inhibitors
Proteolytic enzymes recognize their peptide substrates by the linear array of peptide groups, generally in a β-sheet conformation. Incorporation of stable transition-state analogs at the scissile bond has been a highly productive strategy, as exemplified by peptidomimetic inhibitors of HIV protease (200). These inhibitors evolved from structure-activity studies of substrates and optimization of leads after incorporating various transition-state analogs. The first six clinically approved HIV-protease inhibitors were all derived by this approach (Fig. 13). In contrast, nonpeptidic inhibitors (201) arose from screening or structure-based design utilizing the structure of an HIV protease/inhibitor complex (202, 203).
5.3. Oligopyrrolinones
A single β-strand mimic was introduced by Hirschmann, Smith, and their coworkers based on the nitrogen-displaced and carbonyl-displaced pyrrolinone scaffolds (Fig. 14), which mimic both the hydrogen-bonding pattern and the side-chain orientation of the β-strand/β-sheet conformation of peptides (204).
[Fig. 13 structures: tetrahedral intermediate (TS) of amide hydrolysis; saquinavir, amprenavir, nelfinavir, indinavir, ritonavir, and lopinavir.]
Fig. 13. Mechanism of proteolysis by an aspartyl protease, e.g., HIV-1 protease, and the first six FDA-approved HIV-1 protease inhibitors all developed for mimicking the tetrahedral intermediate of amide hydrolysis.
The pyrrolinone rings fix the dihedral angles analogous to ψ and ω in a peptide, and gauche steric interactions of the side chains with the neighboring pyrrolinone rings constrain rotations corresponding to φ. The β-strand mimicry has been demonstrated by X-ray crystallography. In addition, the intrastrand hydrogen bonding of the NH proton with the carbonyl of the neighboring pyrrolinone ring would both stabilize the β-strand and reduce solvation, thereby improving the pharmacokinetic properties of pyrrolinone-based inhibitors. The success of this design led to the development of potent pyrrolinone-based inhibitors of several proteolytic enzymes including HIV-1 protease (204), renin (204), and matrix metalloproteases (205).
[Fig. 14 structures: pyrrolinone scaffold and @-tide scaffold.]
Fig. 14. Pyrrolinone- and dihydropyridinone-based scaffolding to mimic the conformation of a β-strand.
5.4. @-Tides: Dihydropyridinones
Bartlett and coworkers (206, 207) designed modular β-strand mimics, nicknamed “@-tides,” based on dihydropyridinone derivatives. These @-tides are composed of alternating amino acids and cyclic amino acid surrogates, dihydropyridinones (“@-unit”) or dihydropyrazinones (“aza-@-unit”), which together stabilize the extended conformation of a peptide unit and provide hydrogen bond donor and acceptor functions conducive to β-sheet formation (Fig. 14). As a consequence of this conformational control, @-tides readily associate in organic solvent as two-stranded antiparallel β-sheet-like homodimers, and the @-unit is exceptionally effective as a stabilizing template for β-hairpin structures in aqueous solution (206). Recently, they demonstrated that substituted @-tides acting as β-strand mimics were protease-resistant ligands of a PDZ protein-interaction domain and significantly more potent than the corresponding peptide sequences (208).
Acknowledgments
This research was supported in part by an NIH research grant (GM 68460) to GRM. YC also acknowledges the Lenfant Biomedical Fellowship from the National Heart, Lung and Blood Institute. DJK acknowledges financial support from the Ewing Marion Kauffman Foundation and the Computational Biology Training Grant at Washington University in St. Louis. The authors acknowledge continued support from the Department of Biochemistry and Molecular Biophysics and the Center for Computational Biology at Washington University Medical School.
References
1. Pedersen, C. J. (1988) The discovery of crown ethers. Angew. Chem. Int. Ed. Engl. 27, 1021–1025.
2. Cram, D. J. (1988) The design of molecular hosts, guests, and their complexes. Angew. Chem. Int. Ed. Engl. 27, 1009–1020.
3. Lehn, J.-M. (1988) Supramolecular chemistry – scope and perspectives molecules, supermolecules, and molecular devices. Angew. Chem. Int. Ed. Engl. 27, 89–112.
4. Fairlie, D. P., West, M. L., and Wong, A. K. (1998) Towards protein surface mimetics. Curr. Med. Chem. 5, 29–62.
5. Page, M. I., and Jencks, W. P. (1971) Entropic contributions to rate accelerations in enzymic and intramolecular reactions and the chelate effect. Proc. Natl. Acad. Sci. U.S.A. 68, 1678–1683.
6. Smith, W. W., and Bartlett, P. A. (1998) Macrocyclic inhibitors of penicillopepsin. 3. Design, synthesis, and evaluation of an inhibitor bridged between P2 and P1′. J. Am. Chem. Soc. 120, 4622–4628.
7. Murray, C. W., and Verdonk, M. L. (2002) The consequences of translational and rotational entropy lost by small molecules on binding to proteins. J. Comput. Aided Mol. Des. 16, 741–753.
8. Searle, M. S., and Williams, D. H. (1992) The cost of conformational order: Entropy changes in molecular associations. J. Am. Chem. Soc. 114, 10690–10697.
9. Searle, M. S., Williams, D. H., and Gerhard, U. (1992) Partitioning of free energy contributions in the estimation of binding constants: Residual motions and consequences for amide-amide hydrogen bond strengths. J. Am. Chem. Soc. 114, 10697–10704.
10. Sawyer, T. K., Hruby, V. J., Darman, P. S., and Hadley, M. E. (1982) [half-Cys4,half-Cys10]-alpha-Melanocyte-stimulating hormone: A cyclic alpha-melanotropin exhibiting superagonist biological activity. Proc. Natl. Acad. Sci. U.S.A. 79, 1751–1755.
11. Chang, C. E., Chen, W., and Gilson, M. K. (2007) Ligand configurational entropy and protein binding. Proc. Natl. Acad. Sci. U.S.A. 104, 1534–1539.
12. Chang, C. E., and Gilson, M. K. (2004) Free energy, entropy, and induced fit in host-guest recognition: Calculations with the second-generation mining minima algorithm. J. Am. Chem. Soc. 126, 13156–13164.
13. Zeevaart, J. G., Wang, L., Thakur, V. V., Leung, C. S., Tirado-Rives, J., Bailey, C. M., Domaoal, R. A., Anderson, K. S., and Jorgensen, W. L. (2008) Optimization of azoles as anti-human immunodeficiency virus agents guided by free-energy calculations. J. Am. Chem. Soc. 130, 9492–9499.
14. Jorgensen, W. L., and Thomas, L. L. (2008) Perspective on free-energy perturbation calculations for chemical equilibria. J. Chem. Theory Comput. 4, 869–876.
15. Wang, J., Deng, Y., and Roux, B. (2006) Absolute binding free energy calculations using molecular dynamics simulations with restraining potentials. Biophys. J. 91, 2798–2814.
16. Lee, M. S., and Olson, M. A. (2006) Calculation of absolute protein-ligand binding affinity using path and endpoint approaches. Biophys. J. 90, 864–877.
17. Lu, B., and Wong, C. F. (2005) Direct estimation of entropy loss due to reduced translational and rotational motions upon molecular binding. Biopolymers 79, 277–285.
18. Leavitt, S., and Freire, E. (2001) Direct measurement of protein binding energetics by isothermal titration calorimetry. Curr. Opin. Struct. Biol. 11, 560–566.
19. Velazquez-Campoy, A., Luque, I., and Freire, E. (2001) The application of thermodynamic methods in drug design. Thermochimica Acta 380, 217–227.
20. Velazquez-Campoy, A., Leavitt, S. A., and Freire, E. (2004) Characterization of protein-protein interactions by isothermal titration calorimetry. Methods Mol. Biol. 261, 35–54.
21. Luque, I., and Freire, E. (2002) Structural parameterization of the binding enthalpy of small ligands. Proteins 49, 181–190.
22. Freire, E. (2002) Designing drugs against heterogeneous targets. Nat. Biotechnol. 20, 15–16.
23. Velazquez-Campoy, A., and Freire, E. (2006) Isothermal titration calorimetry to determine association constants for high-affinity ligands. Nat. Protoc. 1, 186–191.
24. Todd, M. J., Luque, I., Velazquez-Campoy, A., and Freire, E. (2000) Thermodynamic basis of resistance to HIV-1 protease inhibition: Calorimetric analysis of the V82F/I84V active site resistant mutant. Biochemistry 39, 11876–11883.
25. Velazquez-Campoy, A., Todd, M. J., and Freire, E. (2000) HIV-1 protease inhibitors: Enthalpic versus entropic optimization of the binding affinity. Biochemistry 39, 2201–2207.
26. Ohtaka, H., Velazquez-Campoy, A., Xie, D., and Freire, E. (2002) Overcoming drug resistance in HIV-1 chemotherapy: The binding thermodynamics of Amprenavir and TMC-126 to wild-type and drug-resistant mutants of the HIV-1 protease. Protein Sci. 11, 1908–1916.
27. Velazquez-Campoy, A., Muzammil, S., Ohtaka, H., Schon, A., Vega, S., and Freire, E. (2003) Structural and thermodynamic basis of resistance to HIV-1 protease inhibition: Implications for inhibitor design. Curr. Drug Targets Infect. Disord. 3, 311–328.
28. Muzammil, S., Ross, P., and Freire, E. (2003) A major role for a set of non-active site mutations in the development of HIV-1 protease drug resistance. Biochemistry 42, 631–638.
29. Schon, A., del Mar Ingaramo, M., and Freire, E. (2003) The binding of HIV-1 protease inhibitors to human serum proteins. Biophys. Chem. 105, 221–230.
30. Muzammil, S., Armstrong, A. A., Kang, L. W., Jakalian, A., Bonneau, P. R., Schmelmer, V., Amzel, L. M., and Freire, E. (2007) Unique thermodynamic response of tipranavir to human immunodeficiency virus type 1 protease drug resistance mutations. J. Virol. 81, 5144–5554.
31. Li, L., Dantzer, J. J., Nowacki, J., O’Callaghan, B. J., and Meroueh, S. O. (2008) PDBcal: A comprehensive dataset for receptor-ligand interactions with three-dimensional structures and binding thermodynamics from isothermal titration calorimetry. Chem. Biol. Drug Des. 71, 529–532.
32. Benfield, A. P., Teresk, M. G., Plake, H. R., DeLorbe, J. E., Millspaugh, L. E., and Martin, S. F. (2006) Ligand preorganization may be accompanied by entropic penalties in protein-ligand interactions. Angew. Chem. Int. Ed. Engl. 45, 6830–6835.
33. Udugamasooriya, D. G., and Spaller, M. R. (2008) Conformational constraint in protein ligand design and the inconsistency of binding entropy. Biopolymers 89, 653–667.
34. Bastos, M., Pease, J. H., Wemmer, D. E., Murphy, K. P., and Connelly, P. R. (2001) Thermodynamics of the helix-coil transition: Binding of S15 and a hybrid sequence, disulfide stabilized peptide to the S-protein. Proteins 42, 523–530.
35. Bogan, A. A., and Thorn, K. S. (1998) Anatomy of hot spots in protein interfaces. J. Mol. Biol. 280, 1–9.
36. Cunningham, B. C., and Wells, J. A. (1993) Comparison of a structural and a functional epitope. J. Mol. Biol. 234, 554–563.
37. Clackson, T., and Wells, J. A. (1995) A hot spot of binding energy in a hormone-receptor interface. Science 267, 383–386.
38. Ma, B., and Nussinov, R. (2007) Trp/Met/Phe hot spots in protein-protein interactions: Potential targets in drug design. Curr. Top. Med. Chem. 7, 999–1005.
39. Sasaki, Y., Murphy, W. A., Heiman, M. L., Lance, V. A., and Coy, D. H. (1987) Solid-phase synthesis and biological properties of psi [CH2NH] pseudopeptide analogues of a highly potent somatostatin octapeptide. J. Med. Chem. 30, 1162–1166.
40. Pallai, P. V., Struthers, R. S., Goodman, M., Moroder, L., Wunsch, E., and Vale, W. (1985) Partial retro-inverso analogues of somatostatin: Pairwise modifications at residues 7 and 8 and at residues 8 and 9. Biochemistry 24, 1933–1941.
41. Huang, Z., Probstl, A., Spencer, J. R., Yamazaki, T., and Goodman, M. (1993) Cyclic hexapeptide analogs of somatostatin containing bridge modifications. Syntheses and conformational analyses. Int. J. Pept. Protein Res. 42, 352–365.
42. Boer, J., Gottschling, D., Schuster, A., Semmrich, M., Holzmann, B., and Kessler, H. (2001) Design and synthesis of potent and selective alpha(4)beta(7) integrin antagonists. J. Med. Chem. 44, 2586–2592.
43. Hirschmann, R., Hynes, J. Jr., Cichy-Knight, M. A., van Rijn, R. D., Sprengeler, P. A., Spoors, P. G., Shakespeare, W. C., Pietranico-Cole, S., Barbosa, J., Liu, J., Yao, W., Rohrer, S., and Smith, A. B. III (1998) Modulation of receptor and receptor subtype affinities using diastereomeric and enantiomeric monosaccharide scaffolds as a means to structural and biological diversity. A new route to ether synthesis. J. Med. Chem. 41, 1382–1391.
44. Dudkina, A. S., and Lindsley, C. W. (2007) Small molecule protein-protein inhibitors of the p53-MDM2 interaction. Curr. Top. Med. Chem. 7, 952–960.
45. Evans, B. E., Rittle, K. E., Bock, M. G., Dipardo, R. M., Freidinger, R. M., Whitter, W. L., Lundell, G. F., Veber, D. F., Anderson, P. S., Chang, R. S. L., Lotti, V. J., Cerino, D. J., Chen, T. B., Kling, P. J., Kunkel, K. A., Springer, J. P., and Hirshfield, J. (1988) Methods for drug discovery - development of potent, selective, orally effective cholecystokinin antagonists. J. Med. Chem. 31, 2235–2246.
46. Ripka, W. C., DeLucca, G. V., Bach, A. C. II, Pottorf, R. S., and Blaney, J. M. (1993) Protein β-turn mimetics I. Design, synthesis, and evaluation in model cyclic peptides. Tetrahedron 49, 3593–3608.
47. Ripka, W. C., Lucca, G. V. D., Bach, A. C. II, Pottorf, R. S., and Blaney, J. M. (1993) Protein β-turn mimetics II: Design, synthesis, and evaluation in the cyclic peptide Gramicidin S. Tetrahedron 49, 3609–3628.
48. Hata, M., and Marshall, G. R. (2006) Do benzodiazepines mimic reverse-turn structures? J. Comput. Aided Mol. Des. 20, 321–331.
49. Joseph, C. G., Wilson, K. R., Wood, M. S., Sorenson, N. B., Phan, D. V., Xiang, Z., Witek, R. M., and Haskell-Luevano, C. (2008) The 1,4-benzodiazepine-2,5-dione small molecule template results in melanocortin receptor agonists with nanomolar potencies. J. Med. Chem. 51, 1423–1431.
50. Horton, D. A., Bourne, G. T., and Smythe, M. L. (2003) The combinatorial synthesis of bicyclic privileged structures or privileged substructures. Chem. Rev. 103, 893–930.
51. Patchett, A. A., and Nargund, R. P. (2000) Privileged structures - An update. Annu. Rep. Med. Chem. 35, 289–298.
52. Koch, M. A., Schuffenhauer, A., Scheck, M., Wetzel, S., Casaulta, M., Odermatt, A., Ertl, P., and Waldmann, H. (2005) Charting biologically relevant chemical space: A structural classification of natural products (SCONP). Proc. Natl. Acad. Sci. U.S.A. 102, 17272–17277.
53. Balamurugan, R., Dekker, F. J., and Waldmann, H. (2005) Design of compound libraries based on natural product scaffolds and protein structure similarity clustering (PSSC). Mol. Biosyst. 1, 36–45.
54. Schuffenhauer, A., Ertl, P., Roggo, S., Wetzel, S., Koch, M. A., and Waldmann, H. (2007) The scaffold tree - visualization of the scaffold universe by hierarchical scaffold classification. J. Chem. Inf. Model. 47, 47–58.
55. Loughlin, W. A., Tyndall, J. D., Glenn, M. P., and Fairlie, D. P. (2004) Beta-strand mimetics. Chem. Rev. 104, 6085–6117.
56. Tyndall, J. D. A., Pfeiffer, B., Abbenante, G., and Fairlie, D. P. (2005) Over one hundred peptide-activated G protein-coupled receptors recognize ligands with turn structure. Chem. Rev. 105, 793–826.
57. Marshall, G. R. (2001) Peptide interactions with G-protein coupled receptors. Biopolymers 60, 246–277.
58. Harrison, S. C., and Aggarwal, A. K. (1990) DNA recognition by proteins with the helix-turn-helix motif. Annu. Rev. Biochem. 59, 933–969.
59. Suzuki, M. (1994) A framework for the DNA-protein recognition code of the probe helix in transcription factors: The chemical and stereochemical rules. Structure 2, 317–326.
60. Williamson, J. R. (2000) Induced fit in RNA-protein recognition. Nat. Struct. Biol. 7, 834–837.
61. Ripka, A. S., and Rich, D. H. (1998) Peptidomimetic design. Curr. Opin. Chem. Biol. 2, 441–452.
62. Hershberger, S. J., Lee, S. G., and Chmielewski, J. (2007) Scaffolds for blocking protein-protein interactions. Curr. Top. Med. Chem. 7, 928–942.
63. Hirschmann, R., Nicolaou, K. C., Pietranico, S., Leahy, E. M., Salvino, J., Arison, B., Cichy, M. A., Spoors, P. G., Shakespeare, W. C., Sprengeler, P. A., Hamley, P., Smith, A. B., Reisine, T., Raynor, K., Maechler, L., Donaldson, C., Vale, W., Freidinger, R. M., Cascieri, M. R., and Strader, C. D. (1993) De-Novo design and synthesis of somatostatin nonpeptide peptidomimetics utilizing beta-D-glucose as a novel scaffolding. J. Am. Chem. Soc. 115, 12550–12568.
64. Hirschmann, R., Hynes, J., Cichy-Knight, M. A., van Rijn, R. D., Sprengeler, P. A., Spoors, P. G., Shakespeare, W. C., Pietranico-Cole, S., Barbosa, J., Liu, J., Yao, W. Q., Rohrer, S., and Smith, A. B. (1998) Modulation of receptor and receptor subtype affinities using diastereomeric and enantiomeric monosaccharide scaffolds as a means to structural and biological diversity. A new route to ether synthesis. J. Med. Chem. 41, 1382–1391.
65. Olson, G. L., Cheung, H. C., Chiang, E., Madison, V. S., Sepinwall, J., Vincent, G. P., Winokur, A., and Gary, K. A. (1995) Peptide mimetics of thyrotropin-releasing hormone based on a cyclohexane framework: Design, synthesis, and cognition-enhancing properties. J. Med. Chem. 38, 2866–2879.
66. Rutledge, L. D., Perlman, J. H., Gershengorn, M. C., Marshall, G. R., and Moeller, K. D. (1996) Conformationally restricted TRH analogs - a probe for the pyroglutamate region. J. Med. Chem. 39, 1571–1574.
67. Chu, W., Perlman, J. H., Gershengorn, M. C., and Moeller, K. D. (1998) Thyrotropin releasing hormone analogs: A building block approach to the construction of tetracyclic peptidomimetics. Bioorg. Med. Chem. Lett. 8, 3093–3096.
68. Tong, Y., Olczak, J., Zabrocki, J., Gershengorn, M. C., Marshall, G. R., and Moeller, K. D. (2000) Constrained peptidomimetics for TRH: Cis-peptide bond analogs. Tetrahedron 56, 9791–9800.
69. Simpson, J. C., Ho, C., Shands, E. F., Gershengorn, M. C., Marshall, G. R., and Moeller, K. D. (2002) Conformationally restricted TRH analogues: Constraining the pyroglutamate region. Bioorg. Med. Chem. 10, 291–302.
70. Hanessian, S., and Auzzas, L. (2008) The practice of ring constraint in peptidomimetics using bicyclic and polycyclic amino acids. Acc. Chem. Res. 41, 1241–1251.
71. Rose, G. D., Gierasch, L. M., and Smith, J. A. (1985) Turns in peptides and proteins. Adv. Protein Chem. 37, 1–109.
72. Frieden, C., Huang, E. S., and Ponder, J. W. (2001) Turn scanning. Experimental and theoretical approaches to the role of turns. Methods Mol. Biol. 168, 133–158.
73. Tran, T. T., McKie, J., Meutermans, W. D. F., Bourne, G. T., Andrews, P. R., and Smythe, M. L. (2005) Topological side-chain classification of β-turns: Ideal motifs for peptidomimetic development. J. Comput. Aided Mol. Des. 19, 551–566.
74. Gillespie, P., Cicariello, J., and Olson, G. L. (1997) Conformational analysis of dipeptide mimetics. Biopolymers 43, 191–217.
75. Arbor, S., and Marshall, G. R. (2008) A virtual library of constrained cyclic tetrapeptides that mimics all four side-chain orientations for over half the reverse turns in the protein data bank. J. Comput.
Aided Mol. Des. 23, 87–95. Deslauriers, R., Leach, S. J., Maxfield, F. R., Minasian, E., McQuie, J. R., Meinwald, Y. C., Némethy, G., Pottle, M. S., Rae, I. D., Scheraga, H. A., Stimson, E. R., and
77.
78.
79.
80.
81.
82.
83.
84.
85.
86.
Nispen, J. W. V. (1979) Cyclized dipeptide model for a beta-bend. Proc. Natl. Acad. Sci. U.S.A. 76, 2512–2514. Sharma, A. K., and Chauhan, V. S. (1988) Analogues of luteinizing hormone-releasing hormone (LH-RH) containing dehydroalanine in 6th position. Int. J. Pept. Protein Res. 31, 225–230. Fabian, P., Chauhan, V. S., and Pongor, S. (1994) Predicted conformation of poly(dehydroalanine): A preference for turns. Biochim. Biophys. Acta 1208, 89–93. Pieroni, O., Fissi, A., Jain, R. M., and Chauhan, V. S. (1996) Solution structure of dehydropeptides: A CD investigation. Biopolymers 38, 97–108. Mathur, P., Ramakumar, S., and Chauhan, V. S. (2004) Peptide design using alpha,beta-dehydro amino acids: From beta-turns to helical hairpins. Biopolymers 76, 150–161. Patel, H. C., Singh, T. P., Chauhan, V. S., and Kaur, P. (1990) Synthesis, crystal structure, and molecular conformation of peptide N-Boc-L-Pro-dehydro-Phe-L-GlyOH. Biopolymers 29, 509–515. Rajashankar, K. R., Chauhan, V. S., and Ramakumar, S. (1995) Crystal and molecular structure of Boc-Phe-Val-OMe; comparison of the peptide conformation with its dehydro analogue. Int. J. Pept. Protein Res. 46, 487–493. Rajashankar, K. R., Ramakumar, S., Jain, R. M., and Chauhan, V. S. (1996) Helix termination and chain reversal: Crystal and molecular structure of the alpha, betadehydrooctapeptide Boc-Val-DeltaPhe-PheAla-Leu-Ala-DeltaPhe-Leu-OH. J. Biomol. Struct. Dyn. 13, 641–647. Chalmers, D. K., and Marshall, G. R. (1995) Pro-D-NMe-amino acid and D-Pro-NMe-amino acid: Simple, efficient reverse-turn constraints. J. Am. Chem. Soc. 117, 5927–5937. Takeuchi, Y., and Marshall, G. R. (1998) Conformational analysis of reverseturn constraints by N-methylation and N-hydroxylation of amide bonds in peptides and non-peptide mimetics. J. Am. Chem. Soc. 120, 5363–5372. Spath, J., Stuart, F., Jiang, L., and Robinson, J. A. 
(1998) Stabilization of a b-hairpin conformation in a cyclic peptide using the templating effect of a heterochiral diproline unit. Helv. Chim. Acta 81, 1726–1738.
Chemogenomics with Protein Secondary-Structure Mimetics
87. Davies, J. S. (2003) The cyclization of peptides and depsipeptides. J. Pept. Sci. 9, 471– 501. 88. El Haddadi, M., Cavelier, F., Vives, E., Azmani, A., Verducci, J., and Martinez, J. (2000) All-L-Leu-Pro-Leu-Pro: A challenging cyclization. J. Pept. Sci. 6, 560–570. 89. Durani, S. (2008) Protein design with Land D-amino acid structures as the alphabet. Acc. Chem. Res. 41, 1301–1308. 90. Marshall, G. R., and Bosshard, H. E. (1972) Angiotensin II. Studies on the biologically active conformation. Circ. Res. 31(Suppl 2), 143–150. 91. Yun, R. H., Anderson, A., and Hermans, J. (1991) Proline in alpha-helix: Stability and conformation studied by dynamics simulation. Proteins 10, 219–228. 92. Chatterjee, J., Gilon, C., Hoffman, A., and Kessler, H. (2008) N-Methylation of peptides: A new perspective in medicinal chemistry. Acc. Chem. Res. 41, 1331–1342. 93. Che, Y., and Marshall, G. R. (2006) Impact of cis-proline analogs on peptide conformation. Biopolymers 81, 392–406. 94. Che, Y., and Marshall, G. R. (2004) Impact of azaproline on peptide conformation. J. Org. Chem. 69, 9030–9042. 95. Spatola, A. F. (1983) In: Weinstein, B. (ed.) Chemistry and Biochemistry of Amino Acids, Peptides, and Proteins. Marcel Dekker, New York, pp. 267–357. 96. Spatola, A. F. (1993) Synthesis of pseudopeptides. Meth. Neurosci. 13, 19–42. 97. Deechongkit, S., Dawson, P. E., and Kelly, J. W. (2004) Toward assessing the position-dependent contributions of backbone hydrogen bonding to beta-sheet folding thermodynamics employing amide-to-ester perturbations. J. Am. Chem. Soc. 126, 16762–16771. 98. Deechongkit, S., Nguyen, H., Powers, E. T., Dawson, P. E., Gruebele, M., and Kelly, J. W. (2004) Context-dependent contributions of backbone hydrogen bonding to beta-sheet folding energetics. Nature 430, 101–105. 99. Blankenship, J. W., Balambika, R., and Dawson, P. E. (2002) Probing backbone hydrogen bonds in the hydrophobic core of GCN4. Biochemistry 41, 15676–15684. 100. Beligere, G. 
S., and Dawson, P. E. (2000) Design, synthesis, and characterization of 4-ester CI2, a model for the backbone hydrogen bonding in protein a-helices. J. Am. Chem. Soc. 122, 12079–12082.
153
101. Zabrocki, J., Smith, G. D., Dunbar J. B. Jr., Iijima, H., and Marshall, G. R. (1988) Conformational mimicry. 1. 1,5-disubstituted tetrazole ring as a surrogate for the cis amide bond. J. Am. Chem. Soc. 110, 5875–5880. 102. Smith, G. D., Zabrocki, J., Flak, T. A., and Marshall, G. R. (1991) Conformational mimicry. II. An obligatory cis amide bond in a small linear peptide. Int. J. Pept. Protein Res. 37, 191–197. 103. Zabrocki, J., Dunbar, J. B. Jr., Marshall, K. W., Toth, M. V., and Marshall, G. R. (1992) Conformational mimicry. Part III. Synthesis and incorporation of 1,5-disubstituted tetrazole dipeptide analogs into peptides with preservation of chiral integrity: Bradykinin. J. Org. Chem. 57, 202–209. 104. Beusen, D. D., Zabrocki, J., Slomczynska, U., Head, R. D., Kao, J., and Marshall, G. R. (1995) Conformational mimicry: Synthesis and solution conformation of a cyclic somatostatin hexapeptide containing a tetrazole cis-amide bond surrogate. Biopolymers 36, 181–200. 105. Zabrocki, J., and Marshall, G. R. (1999) In: Kazmierski, W. (ed.) Peptidomimetics Protocols. Humana Press, Totowa, pp. 417–436. 106. Kaczmarek, K., Jankowski, S., Siemion, I. Z., Wieczorek, Z., Benedetti, E., Di Lello, P., Isernia, C., Saviano, M., and Zabrocki, J. (2002) Tetrazole analogues of cyclolinopeptide A: Synthesis, conformation, and biology. Biopolymers 63, 343–357. 107. Nachman, R. J., Zabrocki, J., Olczak, J., Williams, H. J., Moyna, G., Ian Scott, A., and Coast, G. M. (2002) cis-peptide bond mimetic tetrazole analogs of the insect kinins identify the active conformation. Peptides 23, 709–716. 108. Taneja-Bageshwar, S., Strey, A., Kaczmarek, K., Zabrocki, J., Pietrantonio, P. V., and Nachman, R. J. (2008) Comparison of insect kinin analogs with cis-peptide bond, type VI-turn motifs identifies optimal stereochemistry for interaction with a recombinant arthropod kinin receptor from the southern cattle tick Boophilus microplus. Peptides 29, 295–301. 109. Beusen, D. 
D., Zabrocki, J., Slomczynska, U., Head, R. D., Kao, J. L., and Marshall, G. R. (1995) Conformational mimicry: Synthesis and solution conformation of a cyclic somatostatin hexapeptide containing a tetrazole cis amide bond surrogate. Biopolymers 36, 181–200.
154
Marshall, Kuster, and Che
110. Lewis, W. G., Green, L. G., Grynszpan, F., Radic, Z., Carlier, P. R., Taylor, P., Finn, M. G., and Sharpless, K. B. (2002) Click chemistry in situ: Acetylcholinesterase as a reaction vessel for the selective assembly of a femtomolar inhibitor from an array of building blocks. Angew. Chem. Int. Ed. Engl. 41, 1053–1057. 111. Tornoe, C. W., Christensen, C., and Meldal, M. (2002) Peptidotriazoles on solid phase: [1,2,3]-triazoles by regiospecific copper(i)catalyzed 1,3-dipolar cycloadditions of terminal alkynes to azides. J. Org. Chem. 67, 3057–3064. 112. Kolb, H. C., and Sharpless, K. B. (2003) The growing impact of click chemistry on drug discovery. Drug Discov. Today 8, 1128– 1137. 113. Zhang, L., Chen, X., Xue, P., Sun, H. H., Williams, I. D., Sharpless, K. B., Fokin, V. V., and Jia, G. (2005) Ruthenium-catalyzed cycloaddition of alkynes and organic azides. J. Am. Chem. Soc. 127, 15998–15999. 114. Tornoe, C. W., Sanderson, S. J., Mottram, J. C., Coombs, G. H., and Meldal, M. (2004) Combinatorial library of peptidotriazoles: Identification of [1,2,3]-triazole inhibitors against a recombinant Leishmania mexicana cysteine protease. J. Comb. Chem. 6, 312–324. 115. Deiters, A., Cropp, T. A., Mukherji, M., Chin, J. W., Anderson, J. C., and Schultz, P. G. (2003) Adding amino acids with novel reactivity to the genetic code of Saccromyces Cerevisiae. J. Am. Chem. Soc. 125, 11782–11783. 116. Hitotsuyanagi, Y., Motegi, S., Fukaya, H., and Takeya, K. (2002) A cis amide bond surrogate incorporating 1,2,4-triazole. J. Org. Chem. 67, 3266–3271. 117. Hitotsuyanagi, Y., Motegi, S., Hasuda, T., and Takeya, K. (2004) Semisynthesis of an analogue of antitumor bicyclic hexapeptide RA-VII by fixing the Ala-2/Tyr-3 bond to Cis by incorporating a triazole cis-amide bond surrogate. Org. Lett. 6, 1111–1114. 118. Bock, V. D., Speijer, D., Hiemstra, H., and van Maarseveen, J. H. 
(2007) 1,2,3-Triazoles as peptide bond isosteres: Synthesis and biological evaluation of cyclotetrapeptide mimics. Org. Biomol. Chem. 5, 971–975. 119. Cornille, F., Slomczynska, U., Smythe, M. L., Beusen, D. D., Moeller, K. D., and Marshall, G. R. (1995) Electrochemical cyclization of dipeptides toward novel bicyclic, reverse-turn peptidomimetics.1. Synthesis and conformational analysis of 7,5-bicyclic systems. J. Am. Chem. Soc. 117, 909–917.
120. Slomczynska, U., Chalmers, D. K., Cornille, F., Smythe, M. L., Beusen, D. D., Moeller, K. D., and Marshall, G. R. (1996) Electrochemical cyclization of dipeptides to form novel bicyclic, reverse-turn peptidomimetics.2. Synthesis and conformational analysis of 6,5-bicyclic systems. J. Org. Chem. 61, 1198–1204. 121. Bourne, G. T., Golding, S. W., McGeary, R. P., Meutermans, W. D., Jones, A., Marshall, G. R., Alewood, P. F., and Smythe, M. L. (2001) The development and application of a novel safety-catch linker for BOC-based assembly of libraries of cyclic peptides. J. Org. Chem. 66, 7706–7713. 122. Horton, D. A., Bourne, G. T., and Smythe, M. L. (2002) Exploring privileged structures: The combinatorial synthesis of cyclic peptides. Mol. Divers. 5, 289–304. 123. Patgiri, A., Jochim, A. L., and Arora, P. S. (2008) A hydrogen bond surrogate approach for stabilization of short peptide sequences in helical conformation. Acc Chem. Res. 41, 1289–1300. 124. Che, Y., and Marshall, G. R. (2006) Engineering cyclic tetrapeptides containing chimeric amino acids as preferred reverse-turn scaffolds. J. Med. Chem. 49, 111–124. 125. Mastle, W., Link, U., Witschel, W., Thewalt, U., Weber, T., and Rothe, M. (1991) Conformation and formation tendency of the cyclotetrapeptide cyclo (D-Pro-D-ProL-Pro-L-Pro): Experimental results and molecular modeling studies. Biopolymers 31, 735–744. 126. Mastle, W., Weber, T., Thewalt, U., and Rothe, M. (1989) Cyclo(D-Pro-L-Pro-DPro-L-Pro): Structural properties and cis/ trans isomerization of the cyclotetrapeptide backbone. Biopolymers 28, 161–174. 127. Link, U., Mastle, W., and Rothe, M. (1993) Conformations and conformational interconversions of diastereomeric cyclic tetraprolines. Int. J. Pept. Protein Res. 42, 475–483. 128. Bats, J. W., Friedrich, A., Fuess, H., Kessler, H., Mastle, W., and Rothe, M. (1979) Boat conformation of cyclo-[LPro2-D-Pro]. Angew. Chem. Int. Ed. Engl. 18, 538–539. 129. Rothe, M., and Mastle, W. 
(1982) Macrocycles of D- and L-proline residues. Angew. Chem. Int. Ed. Engl. 21, 220–221. 130. Gilbertson, S. R., and Pawlick, R. V. (1995) The synthesis and conformation of dihydroxy-cyclo(D-Pro-L-Pro-D-Pro-L-Pro). Tetrahedron Lett. 36, 1229–1232.
Chemogenomics with Protein Secondary-Structure Mimetics
131. Xu, H.-C., and Moeller, K. D. (2008) Intramolecular anodic olefin coupling reactions: The use of a nitrogen trapping group. J. Am. Chem. Soc. 130, 13542–13543. 132. Arbor, S., Kao, J., Wu, Y., and Marshall, G. R. (2008) c[D-pro-Pro-D-pro-N-methylAla] adopts a rigid conformation that serves as a scaffold to mimic reverse-turns. Biopolymers 90, 384–393. 133. Ye, Y., Liu, M., Kao, J. L., and Marshall, G. R. (2006) Novel trihydroxamate-containing peptides: Design, synthesis, and metal coordination. Biopolymers 84, 472–489. 134. Che, Y., Brooks, B. R., Riley, D. P., Reaka, A. J., and Marshall, G. R. (2007) Engineering metal complexes of chiral pentaazacrowns as privileged reverse-turn scaffolds. Chem. Biol. Drug Des. 69, 99–110. 135. Ye, Y., Liu, M., Kao, J. L., and Marshall, G. R. (2008) Design, synthesis, and metal binding of novel pseudo-oligopeptides containing two phosphinic acid groups. Biopolymers 89, 72–85. 136. Tian, Z. Q., and Bartlett, P. A. (1996) Metal coordination as a method for templating peptide conformation. J. Am. Chem. Soc. 118, 943–949. 137. Shi, Y., and Sharma, S. (1999) Metallopeptide approach to the design of biologically active ligands: Design of specific human neutrophil elastase inhibitors. Bioorg. Med. Chem. Lett. 9, 1469–1474. 138. Shi, Y., Cai, H.-Z., Yang, W. H., Blood, C., Shadiack, A., and Sharma, S. (1999) in “218th ACS National Meeting”, American Chemical Society, New Orleans, LA. 139. Reaka, A. J., Ho, C. M., and Marshall, G. R. (2002) Metal complexes of chiral pentaazacrowns as conformational templates for beta-turn recognition. J. Comput. Aided Mol. Des. 16, 585–600. 140. Riley, D. P. (2000) Rational design of synthetic enzymes and their potential utility as human pharmaceuticals: Development of manganese(II)-based superoxide dismutase mimics. Adv. Supramolecular Chem. 6, 217–244. 141. Riley, D. P., Henke, S. L., Lennon, P. J., and Aston, K. 
(1999) Computer-Aided Design (CAD) of synzymes: Use of molecular mechanics (MM) for the rational design of superoxide dismutase mimics. Inorg. Chem. 38, 1908–1917. 142. Tamamura, H., Otaka, A., Murakami, T., Ibuka, T., Sakano, K., Waki, M., Matsumoto, A., Yamamoto, N., and Fujii, N. (1996) An anti-HIV peptide,
143.
144.
145.
146.
147.
148.
149.
150.
151.
155
T22, forms a highly active complex with Zn(II). Biochem. Biophys. Res. Commun. 229, 648–652. Tamamura, H., Xu, Y., Hattori, T., Zhang, X., Arakaki, R., Kanbara, K., Omagari, A., Otaka, A., Ibuka, T., Yamamoto, N., Nakashima, H., and Fujii, N. (1998) A low-molecular-weight inhibitor against the chemokine receptor CXCR4: A strong anti-HIV peptide T140. Biochem. Biophys. Res. Commun. 253, 877–882. Ye, Y., Liu, M., Kao, J. L., and Marshall, G. R. (2003) Peptide-bond modification for metal coordination: Peptides containing two hydroxamate groups. Biopolymers 71, 489–515. Akiyama, M., Iesaki, K., Katoh, A., and Shimizu, K. (1986) N-hydroxy amides.5. Synthesis and properties Of N-hydroxypeptides having leucine enkephalin sequences. J. Chem. Soc. Perkin. 1, 851–855. Tran, T. T., McKie, J., Meutermans, W. D., Bourne, G. T., Andrews, P. R., and Smythe, M. L. (2005) Topological side-chain classification of beta-turns: Ideal motifs for peptidomimetic development. J. Comput. Aided Mol. Des. 19, 551–566. Blomberg, D., Kreye, P., Fowler, C., Brickmann, K., and Kihlberg, J. (2006) Synthesis and biological evaluation of leucine enkephalin turn mimetics. Org Biomol Chem 4, 416–423. Blomberg, D., Hedenstrom, M., Kreye, P., Sethson, I., Brickmann, K., and Kihlberg, J. (2004) Synthesis and conformational studies of a beta-turn mimetic incorporated in Leuenkephalin. J. Org. Chem. 69, 3500–3508. Hanessian, S., McNaughtonSmith, G., Lombart, H. G., and Lubell, W. D. (1997) Design and synthesis of conformationally constrained amino acids as versatile scaffolds and peptide mimetics. Tetrahedron 53, 12789–12854. Polyak, F., and Lubell, W. D. (1998) Rigid dipeptide mimics - Synthesis of enantiopure 5- and 7-benzyl and 5,7-dibenzyl indolizidinone amino acids via enolization and alkylation of delta-oxo alpha,omegadi-[N-(9-(9-phenylfluorenyl))amino] azelate esters. J. Org. Chem. 63, 5937– 5949. Gosselin, F., and Lubell, W. D. 
(2000) Rigid dipeptide surrogates: Syntheses of enantiopure quinolizidinone and pyrroloazepinone amino acids from a common diaminodicarboxylate precursor. J. Org. Chem. 65, 2163–2171.
156
Marshall, Kuster, and Che
152. Halab, L., Gosselin, F., and Lubell, W. D. (2000) Design, synthesis, and conformational analysis of azacycloalkane amino acids as conformationally constrained probes for mimicry of peptide secondary structures. Biopolymers 55, 101–122. 153. Finn, F. M., and Hofmann, K. (1973) The S-peptide S-protein System: A Model for hormone-receptor interaction. Acc. Chem. Res. 6, 169–176. 154. Varadarajan, R., Connelly, P. R., Sturtevant, J. M., and Richards, F. M. (1992) Heat capacity changes for protein-peptide interactions in the ribonuclease S System. Biochemistry 31, 1421–1426. 155. Kussie, P. H., Gorina, S., Marechal, V., Elenbaas, B., Moreau, J., Levine, A. J., and Pavletich, N. P. (1996) Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 274, 948–953. 156. Vassilev, L. T., Vu, B. T., Graves, B., Carvajal, D., Podlaski, F., Filipovic, Z., Kong, N., Kammlott, U., Lukacs, C., Klein, C., Fotouhi, N., and Liu, E. A. (2004) In vivo activation of the p53 pathway by smallmolecule antagonists of MDM2. Science 303, 844–848. 157. Fletcher, S., and Hamilton, A. D. (2007) Protein-protein interaction inhibitors: Small molecules from screening techniques. Curr. Topics Med. Chem. 7, 922–927. 158. Garcia-Echeverria, C., Chene, P., Blommers, M. J., and Furet, P. (2000) Discovery of potent antagonists of the interaction between human double minute 2 and tumor suppressor p53. J. Med. Chem. 43, 3205–3208. 159. Chene, P., Fuchs, J., Bohn, J., GarciaEcheverria, C., Furet, P., and Fabbro, D. (2000) A small synthetic peptide, which inhibits the p53-hdm2 interaction, stimulates the p53 pathway in tumour cell lines. J. Mol. Biol. 299, 245–253. 160. Hodgkin, E. E., Clark, J. D., Miller, K. R., and Marshall, G. R. (1990) Conformational analysis and helical preferences of normal and a,a-dialkyl amino acids. Biopolymers 30, 533–546. 161. Pauling, L., Corey, R. B., and Branson, H. R. 
(1951) The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. U.S.A. 37, 205–211. 162. Donohue, J. (1953) Hydrogen bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. U.S.A. 39, 470–478.
163. Kuster, D. J., Urahata, S., Ponder, J. W., and Marshall, G. R. (2007) In: Abst 51st Biophysical Soc. Meeting, Poster 1791. The Biophysical Society, Baltimore. 164. Wieczorek, R., and Dannenberg, J. J. (2004) Comparison of fully optimized aand 3.10-Helices with extended b-strands. An ONIOM density functional theory study. J. Am. Chem. Soc. 126, 14198–14205. 165. Ren, P., and Ponder, J. W. (2002) Polarizable atomic multipole water model for molecular mechanics simulation. J. Phys. Chem. B 107, 5933–5947. 166. Ren, P., and Ponder, J. W. (2002) Consistent treatment of inter- and intramolecular polarization in molecular mechanics calculations. J. Comput. Chem. 23, 1497–1506. 167. Ponder, J. W., and Case, D. A. (2003) Force fields for protein simulations. Adv. Protein Chem. 66, 27–85. 168. Grossfield, A., Ren, P., and Ponder, J. W. (2003) Ion solvation thermodynamics from simulation with a polarizable force field. J. Am. Chem. Soc. 125, 15671–15682. 169. Best, R. B., Buchete, N.-V., and Hummer, G. (2008) Are current molecular dynamics force fields too helical? Biophys. J. 95, L07–L09. 170. Cummings, M. D., Schubert, C., Parks, D. J., Calvo, R. R., LaFrance, L. V., Lattanze, J., Milkiewicz, K. L., and Lu, T. (2006) Substituted 1,4-benzodiazepine-2,5-diones as alpha-helix mimetic antagonists of the HDM2-p53 protein-protein interaction. Chem. Biol. Drug Des. 67, 201–205. 171. Burgess, A. W., and Leach, S. J. (1973) An obligatory alpha-helical amino acid residue. Biopolymers 12, 2599–2605. 172. Marshall, G. R., Hodgkin, E. E., Langs, D. A., Smith, G. D., Zabrocki, J., and Leplawy, M. T. (1990) Factors governing helical preference of peptides containing multiple alpha,alpha-dialkyl amino acids. Proc. Natl. Acad. Sci. U.S.A. 87, 487–491. 173. Smythe, M. L., Huston, S. E., and Marshall, G. R. (1993) Free-energy profile of A 3(10)-helical to alpha-helical transition of an oligopeptide in various solvents. J. Am. Chem. Soc. 115, 11594–11595. 174. Bisetty, K., Catalan, J. 
G., Kruger, H. G., and Perez, J. J. (2005) Conformational analysis of small peptides of the type Ac-XNHMe, where X=Gly, Ala, Aib and Cage. J. Mol. Struc. (Theochem) 731, 127–137. 175. Huston, S. E., and Marshall, G. R. (1994) Alpha/3(10)-helix transitions in alphamethylalanine homopeptides: Conforma-
176.
177.
178.
179.
180.
181.
182.
183. 184.
185.
Chemogenomics with Protein Secondary-Structure Mimetics tional transition pathway and potential of mean force. Biopolymers 34, 75–90. Van Roey, P., Smith, G. D., Balasubramanian, T. M., Czerwinski, E. W., Marshall, G. R., and Mathews, F. S. (1983) Crystal and molecular structure of tert.-butyloxycarbonyl-L-prolyl-alpha-aminoisobutyryl-L-alanyl-alpha- aminoisobutyrate methyl ester. Int. J. Pept. Protein Res. 22, 404–409. Prasad, B. V., and Balaram, P. (1984) The stereochemistry of peptides containing alpha-aminoisobutyric acid. CRC Crit. Rev. Biochem. 16, 307–348. Karle, I. L., Flippen-Anderson, J. L., Uma, K., and Balaram, P. (1993) Unfolding of an alpha-helix in peptide crystals by solvation: Conformational fragility in a heptapeptide. Biopolymers 33, 827–837. Toniolo, C., Crisma, M., Formaggio, F., Valle, G., Cavicchioni, G., Precigoux, G., Aubry, A., and Kamphuis, J. (1993) Structures of peptides from alpha-amino acids methylated at the alpha-carbon. Biopolymers 33, 1061–1072. Andersen, N. H., Liu, Z., and Prickett, K. S. (1996) Efforts toward deriving the CD spectrum of a 3(10) helix in aqueous medium. FEBS Lett. 399, 47–52. Monaco, V., Formaggio, F., Crisma, M., Toniolo, C., Hanson, P., Millhauser, G., George, C., Deschamps, J. R., and Flippen-Anderson, J. L. (1999) Determining the occurrence of a 3(10)-helix and an alpha-helix in two different segments of a lipopeptaibol antibiotic using TOAC, a nitroxide spin-labeled C(alpha)-tetrasubstituted alpha-aminoacid. Bioorg. Med. Chem. 7, 119–131. Higashimoto, Y., Kodama, H., JelokhaniNiaraki, M., Kato, F., and Kondo, M. (1999) Structure-function relationship of model Aib-containing peptides as ion transfer intermembrane templates. J. Biochem. (Tokyo) 125, 705–712. Karle, I. L. (2001) Controls exerted by the Aib residue: Helix formation and helix reversal. Biopolymers 60, 351–365. Howl, J., Prochazka, Z., Wheatley, M., and Slaninova, J. 
(1999) Novel strategies for the design of receptor-selective vasopressin analogues: Aib-substitution and retroinverso transformation. Br. J. Pharmacol. 128, 647–652. Creamer, T. P., and Rose, G. D. (1992) Sidechain entropy opposes alpha-helix formation but rationalizes experimentally determined helix-forming propensities. Proc. Natl. Acad. Sci. U.S.A. 89, 5937–5941.
157
186. Maison, W., Arce, E., Renold, P., Kennedy, R. J., and Kemp, D. S. (2001) Optimal N-caps for N-terminal helical templates: Effects of changes in H-bonding efficiency and charge. J. Am. Chem. Soc. 123, 10245– 10254. 187. Jacoby, E. (2002) Biphenyls as potential mimetics of protein alpha-helix. Bioorg. Med. Chem. Lett. 12, 891–893. 188. Orner, B. P., Ernst, J. T., and Hamilton, A. D. (2001) Toward proteomimetics: Terphenyl derivatives as structural and functional mimics of extended regions of an alphahelix. J. Am. Chem. Soc. 123, 5382–5383. 189. Davis, J. M., Truong, A., and Hamilton, A. D. (2005) Synthesis of a 2,3¢;6¢,3″-terpyridine scaffold as an alpha-helix mimetic. Org. Lett. 7, 5405–5408. 190. Ernst, J. T., Kutzki, O., Debnath, A. K., Jiang, S., Lu, H., and Hamilton, A. D. (2002) Design of a protein surface antagonist based on alpha-helix mimicry: Inhibition of gp41 assembly and viral fusion. Angew. Chem. Int. Ed. Engl. 41, 278–281. 191. Ernst, J. T., Becerril, J., Park, H. S., Yin, H., and Hamilton, A. D. (2003) Design and application of an alpha-helix-mimetic scaffold based on an oligoamide-foldamer strategy: Antagonism of the Bak BH3/BclxL complex. Angew. Chem. Int. Ed. Engl. 42, 535–539. 192. Che, Y., Brooks, B. R., and Marshall, G. R. (2007) Protein recognition motifs: Design of peptidomimetics of helix surfaces. Biopolymers 86, 288–297. 193. Biros, S. M., Moisan, L., Mann, E., Carella, A., Zhai, D., Reed, J. C., and Rebek, J. Jr. (2007) Heterocyclic alpha-helix mimetics for targeting protein-protein interactions. Bioorg. Med. Chem. Lett. 17, 4641–4645. 194. Restorp, P., and Rebek, J. (2008) Synthesis of a-helix mimetics with four side-chains. Bioorg. Med. Chem. Lett. 18, 5905–5911. 195. Ahn, J.-M., and Han, S.-Y. (2007) Facile synthesis of benzamides to mimic an a-helix. Tetrahedron Lett. 48, 3543–3547. 196. Todd, M. J., Semo, N., and Freire, E. (1998) The structural stability of the HIV-1 protease. J. Mol. Biol. 283, 475–488. 197. Shultz, M. 
D., and Chmielewski, J. (1999) Probing the role of interfacial residues in a dimerization inhibitor of HIV-1 protease. Bioorg. Med. Chem. Lett. 9, 2431–2436. 198. Zutshi, R., Brickner, M., and Chmielewski, J. (1998) Inhibiting the assembly of protein-protein interfaces. Curr. Opin. Chem. Biol. 2, 62–66.
158
Marshall, Kuster, and Che
199. Bowman, M. J., and Chmielewski, J. (2002) Novel strategies for targeting the dimerization interface of HIV protease with crosslinked interfacial peptides. Biopolymers 66, 126–133. 200. Randolph, J. T., and DeGoey, D. A. (2004) Peptidomimetic inhibitors of HIV protease. Curr. Top. Med. Chem. 4, 1079–1095. 201. Chrusciel, R. A., and Strohbach, J. W. (2004) Non-peptidic HIV protease inhibitors. Curr. Top. Med. Chem. 4, 1097–1114. 202. Miller, M., Schneider, J., Sathyanarayana, B. K., Toth, M. V., Marshall, G. R., Clawson, L., Selk, L., Kent, S. B., and Wlodawer, A. (1989) Structure of complex of synthetic HIV-1 protease with a substrate-based inhibitor at 2.3 A resolution. Science 246, 1149–1152. 203. Miller, M., Geller, M., Gribskov, M., and Kent, S. B. (1997) Analysis of the structure of chemically synthesized HIV-1 protease complexed with a hexapeptide inhibitor. Part I: Crystallographic refinement of 2 A data. Proteins 27, 184–194. 204. Smith, A. B. III, Hirschmann, R., Pasternak, A., Akaishi, R., Guzman, M. C., Jones, D.
205.
206.
207.
208.
R., Keenan, T. P., Sprengeler, P. A., Darke, P. L., and Emini, E. A. (1994) Design and synthesis of peptidomimetic inhibitors of HIV-1 protease and renin. Evidence for improved transport. J. Med. Chem. 37, 215– 218. Smith, A. B. III, Nittoli, T., Sprengeler, P. A., Duan, J. J., Liu, R. Q., and Hirschmann, R. F. (2000) Design, synthesis, and evaluation of a pyrrolinone-based matrix metalloprotease inhibitor. Org. Lett. 2, 3809–3812. Phillips, S. T., Blasdel, L. K., and Bartlett, P. A. (2005) @-tides as reporters for molecular associations. J. Am. Chem. Soc. 127, 4193– 4198. Phillips, S. T., Rezac, M., Abel, U., Kossenjans, M., and Bartlett, P. A. (2002) “@-Tides”: The 1,2-dihydro-3(6H)-pyridinone unit as a beta-strand mimic. J. Am. Chem. Soc. 124, 58–66. Hammond, M. C., Harris, B. Z., Lim, W. A., and Bartlett, P. A. (2006) Beta strand peptidomimetics as potent PDZ domain ligands. Chem. Biol. 13, 1247–1251.
Chapter 6
Database Systems for Knowledge-Based Discovery
Sarma A.R.P. Jagarlapudi and K.V. Radha Kishan

Summary
Several database systems have been developed to provide valuable information in a structured format to users ranging from the bench chemist to the biologist and from the medical practitioner to the pharmaceutical scientist. Advances in information technology and computational power have enhanced the ability to access large volumes of data in the form of a database in which one can compile, search, archive, and analyze data and, finally, derive knowledge. Although the data are of variable types, the tools used for database creation, searching, and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas such as medicinal chemistry, clinical research, and mechanism-based toxicity, so that the structured databases containing vast data can be used in several areas of research. These databases are classified as reference centric or compound centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery.

Key words: GVK BIO databases, Medchem database, Toxicity database, Clinical candidate database, GPCR database, Kinase database, GOSTAR, Biomarker database
1. Introduction Over the years large amount of scientific information in chemical sciences, bio and pharmaceutical sciences has been accumulated due to efforts by researchers from academics, pharmaceutical and biotechnology companies. Although much of the information is being published in scientific journals by various publishing houses, the information embedded in each publication remains standalone and relational search over the entire data is not viable in the current format. The aim of many pharmaceutical companies is to develop a drug which is novel, unique, and protected
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575 DOI 10.1007/978-1-60761-274-2_6, © Humana Press, a part of Springer Science + Business Media, LLC 2009
159
160
Jagarlapudi and Radha Kishan
through intellectual property rights. Prior art information on drug molecules that are currently in clinical trials, or the molecules that were in trials but failed at some stage or the other, is not easily available and searching for such information takes unaffordable timelines and become expensive for pharmaceutical industry. Available molecular information on a particular series of chemical scaffolds, particular protein/enzyme targeted inhibitors, or various biological activities of particular molecules is discretely available over a plethora of journals, but difficult to access in an automated way. Knowledge derived out of an individual chemical molecule may be limited, but collectively many chemical molecules display a variety of properties which are very much useful in a pharmaceutical industry. Therefore, besides GVK Biosciences, many publicly or privately funded organizations tried to build databases of chemical molecules with a variety of known published properties and made them available to the pharmaceutical industry. For example, PubChem, ChemBank, NIST chemical kinetics database, KEGG database, and Ingenuity systems, to name a few, have been developed and made available to a large group of research and commercial organizations (1–4). The biggest challenge is to identify the requirement of information across a wide range of pharmaceutical industry and designing the databases accordingly. At GVK Biosciences we identified certain areas of interest in the industry. Accordingly databases are designed and created with unique features compared to many other databases (5, 6). Some of the advantages of these databases are that one can obtain readily available biochemical information on a list of compounds supposed to be active on a target molecule. This information could be used to generate toxicity models by comparing various molecules in the database. These models could be used for designing drug-like molecules. 
One of the key advantages is the ability to generate diverse compound libraries with optimum physicochemical properties. Attempts are being made to integrate the various databases and to draw relationships among them in order to extract knowledge.
2. Methods
The interests of the pharmaceutical industry today are very wide, spanning the whole drug discovery sphere and encompassing data related to small molecules, structure–activity relationships, preclinical candidates, clinical candidates, launched drugs, biomarkers, toxic molecules, and other target protein inhibitors. While no single database can address all these areas, attempts were made to address them at least individually. We have developed databases
Database Systems for Knowledge-Based Discovery
based on two broad criteria: reference centric and molecule centric. Reference-centric databases such as the Medicinal Chemistry Database (MCD) contain data on available chemical molecules with associated biological, pharmacological, and toxicological data extracted from journals. Other target-specific databases contain information on compounds active on kinases, proteases, phosphatases, G-protein-coupled receptors (GPCR), transporters, nuclear hormone receptors (NHR), and ion channels. Molecule-centric databases comprise the preclinical candidate database (PCD), clinical candidate database (CCD), drug database (DD), and mechanism-based toxicity (MBT) database. The total number of records from journals and patents exceeds 3.2 million, and the number of structure–activity relationship points exceeds 7.6 million. Fig. 1 shows the data enrichment information for all the GVK BIO databases. Specialized databases are usually specific to a research area, such as the biomarker database, which has attracted wide attention recently. The methods used to develop these databases are described below.
2.1. Reference-Centric Databases
Creating a typical database requires considerable design work and leaves scope for improving on the earlier design. Data mining started with the classification list of proteins/targets of interest. Standard names of proteins, aliases, trivial names, official names, and gene identifications were included. Keywords were selected for patent searching on public and commercial search sites to extract patents from 1971 onward. The results were exported into a text file or a spreadsheet for further processing. Extraction, sorting, and redundancy testing of the data were carried out using a patent mining tool developed in-house. In the case of journals, about 100 subscribed journals were searched with standard keywords for target inhibitors. The extracted journal articles were subjected to a manual curation and review process. Data curation involves capturing data from journals and patents and entering them into an ISIS-base template designed for a specific database. The data resources here were journal articles, patents, publicly available resources, conference abstracts, and industry brochures. When a data resource contains individually specified or characterized compounds, it is curated according to one of three criteria: quantitative activity data, qualitative activity data, or no activity information. Curation of quantitative activity data was restricted by the number of scaffolds and the kind of activity. When a distinct activity was reported for each compound, all the compounds reported in a data source were curated, whereas curation was restricted to a maximum of 500 compounds covering all the scaffolds in the data source when a range of activities was reported. Curation of qualitative data varies from a maximum of 200 compounds to a minimum of 100 compounds, depending upon the availability of scaffolds. A maximum of 200 compounds would
Fig. 1. Content information of number of data points for all the databases of GVK BIO.
be curated if the data resource has two or more scaffolds, irrespective of common or distinct qualitative activity, and a minimum of 100 compounds would be curated if the data resource has only one scaffold, irrespective of common or distinct qualitative activity. If the data resource contains no activity data, a maximum of 100 compounds would be curated covering all the scaffolds, with an upper limit of 25 compounds from each scaffold, or a maximum of 25 compounds with maximum diversity among them if there was only one scaffold in the data resource. If the data resource contains compounds synthesized using chemical libraries, a maximum of 500 compounds would be curated, irrespective of whether the activity is reported as a range or as fixed values. The two rounds of the data review process involve consolidation of the curated databases and reallocation for the final review, carried out in accordance with the quality standard specifications in order to achieve an error 64 µM. The coding of the in silico methods is as in Fig. 6.
Jacoby et al.
Table 1 Key parameters for HTS and VS approaches at different concentration thresholds of the MDM4 and MDM2 assays
Assay/Threshold | HTS | VS | HTS/VS | HR HTS % | HR VS % | Enrichment | FN HTS %
MDM4 < 64 µM | 4,062 | 455 | 326 | 0.366 | 1.679 | 5 | 9
MDM4 ≤ 60 µM | 3,876 | 336 | 318 | 0.350 | 1.406 | 4 | 7
MDM4 ≤ 40 µM | 1,974 | 79 | 196 | 0.181 | 0.591 | 3 | 4
MDM4 ≤ 20 µM | 327 | 13 | 26 | 0.029 | 0.084 | 3 | 4
MDM2 < 64 µM | 3,945 | 538 | 316 | 0.355 | 1.836 | 5 | 11
MDM2 ≤ 60 µM | 3,768 | 446 | 309 | 0.340 | 1.623 | 5 | 10
MDM2 ≤ 40 µM | 2,034 | 186 | 207 | 0.187 | 0.845 | 5 | 8
MDM2 ≤ 20 µM | 444 | 84 | 78 | 0.044 | 0.348 | 8 | 14
HTS and VS are, respectively, the number of hits found exclusively in the HTS and VS experiments; HTS/VS is the number of hits found in the overlap. The hit rate (HR) is defined by the number of hits divided by the total number of compounds screened. The enrichment factor is defined by the VS hit rate divided by the HTS hit rate. The false negative hit rate (FN) is defined by the number of VS hits divided by the total number of hits.
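The two derived columns of Table 1 can be recomputed from the others. The following is a minimal sketch, assuming that "total number of hits" in the footnote means the sum of the HTS-only, VS-only, and overlapping hits; the function name `screening_metrics` is hypothetical and chosen only for illustration.

```python
def screening_metrics(hts_only, vs_only, overlap, hr_hts_pct, hr_vs_pct):
    """Return (enrichment factor, HTS false-negative rate in percent).

    Assumption: the total number of hits is HTS-only + VS-only + overlap.
    """
    enrichment = hr_vs_pct / hr_hts_pct
    total_hits = hts_only + vs_only + overlap
    false_negative_pct = 100.0 * vs_only / total_hits
    return enrichment, false_negative_pct


# Two rows copied from Table 1 (loosest MDM4 threshold, tightest MDM2 threshold).
rows = {
    "MDM4 < 64": (4062, 455, 326, 0.366, 1.679),
    "MDM2 <= 20": (444, 84, 78, 0.044, 0.348),
}
for label, values in rows.items():
    ef, fn = screening_metrics(*values)
    print(f"{label}: enrichment ~ {ef:.1f}, false negatives ~ {fn:.1f}%")
```

Rounded to whole numbers, both rows agree with the Enrichment and FN HTS % columns of the table, which suggests the assumed definition of the total is the intended one.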
et al. (27) that compound affinities for MDM4 should in general be weaker than for MDM2. Depending on the activity threshold, the hit rates for the VS experiment are between 0.084 and 1.836% and for the HTS experiment between 0.029 and 0.366%. Accordingly, the enrichment factor of the VS experiment over the HTS experiment ranges from 3- to 8-fold. The comparison of the VS and HTS experiments allows the false negative hit rate of the HTS experiment to be estimated; it ranges between 4 and 14%, depending on the assay and the threshold. In order to assess the structural diversity and complementarity, the validated HTS and virtual screening hit lists were each clustered using the scaffold-tree approach (61). This clustering resulted in 60 main chemical series for the HTS and 43 main chemical series for the VS approach, where a main series is defined as having at least four members and at least one compound with an activity ≤20 µM in the MDM4 or MDM2 assays. Thus, the HTS approach produces a higher diversity than the virtual screening approach. Only 11 main chemical series were common to both hit lists, which demonstrates the structural complementarity of the two approaches. Regarding the in vivo significance of the results obtained, one has to keep in mind that full-length proteins exhibit higher binding affinities in the MDM4-p53 and MDM2-p53 complexes compared to the N-terminal
Knowledge-Based Virtual Screening
domains of MDM4 and MDM2 and p53 peptide segments used herein (20, 62).
4. Conclusion
In this chapter we have outlined the potential of a knowledge-based virtual screening approach by applying it to the MDM4–p53 interaction and using the MDM2–p53 system as the basis reference set. Key to the success of the approach were the combination of a variety of conceptually different and complementary ligand-based and structure-based VS methods and the feasibility of a large VS set enabled by modern HTCP technology. Compared to previous studies, 3D-based VS methods performed better than computationally cheaper 2D methods, although one should consider that the compound sets generated by the different methods were not normalized. We showed that the VS approach is, in terms of structural diversity, highly complementary to the full HTS experiment. HTS, however, identifies a larger number of hits and more potent ones. The VS approach has its strength in identifying weak actives, which can be lost during the HTS confirmation process. Both approaches identified novel selective and dual MDM4 and MDM2 inhibitors. The conserved, but distinctive, molecular recognition between the MDM4 and MDM2 systems is the basis for the feasibility of the knowledge-based VS approach. More generally, a successful application of the approach can be expected when the conservation is high, both in terms of the molecular structure and the dynamics of the ligand binding site (63). In addition to NMR and X-ray structures, the building of homology models is of value here to assess the conservation between the related systems and to exclude unfeasible systems where, for instance, access to the binding site is occluded by unfavorable mutations. In summary, we recommend the application of knowledge-based VS approaches whenever the ligand- and structure-based knowledge base is sufficiently large, and in a manner complementary to a full HTS.
Acknowledgments
Drs. P. Chene and P. Fuerst (all NIBR associates) are acknowledged for various support and discussions.
References
1. Caron, P.R., Mullican, M.D., Mashal, R.D., Wilson, K.P., Su, M.S., and Murcko, M.A. (2001) Chemogenomic approaches to drug discovery. Curr. Opin. Chem. Biol. 5, 464–470.
2. Mestres, J. (2004) Computational chemogenomics approaches to systematic knowledge-based drug discovery. Curr. Opin. Drug Discov. Devel. 7, 304–314.
3. Kubinyi, H., and Müller, G. (eds.) (2004) Chemogenomics in Drug Discovery – A Medicinal Chemistry Perspective. Wiley-VCH, Weinheim.
4. Jacoby, E. (2006) Chemogenomics: drug discovery’s panacea? Mol. Biosyst. 2, 218–220.
5. Jacoby, E. (ed.) (2006) Chemogenomics – Knowledge-Based Approaches to Drug Discovery. Imperial College Press, London.
6. Rognan, D. (2007) Chemogenomic approaches to rational drug design. Br. J. Pharmacol. 152, 38–52.
7. Klabunde, T. (2007) Chemogenomic approaches to drug discovery: similar receptors bind similar ligands. Br. J. Pharmacol. 152, 5–7.
8. Savchuk, N.P., Balakin, K.V., and Tkachenko, S.E. (2004) Exploring the chemogenomic knowledge space with annotated chemical libraries. Curr. Opin. Chem. Biol. 8, 412–417.
9. Vieth, M., Higgs, R.E., Robertson, D.H., Shapiro, M., Gragg, E.A., and Hemmerle, H. (2004) Kinomics-structural biology and chemogenomics of kinase inhibitors and targets. Biochim. Biophys. Acta. 1697, 243–257.
10. ter Haar, E., Walters, W.P., Pazhanisamy, S., Taslimi, P., Pierce, A.C., Bemis, G.W., Salituro, F.G., and Harbeson, S.L. (2004) Kinase chemogenomics: targeting the human kinome for target validation and drug discovery. Mini. Rev. Med. Chem. 4, 235–253.
11. Savchuk, N.P., Tkachenko, S.E., and Balakin, K.V. (2006) Strategies for the design of pGPCR-targeted libraries. In: Rognan, D. (ed.) Ligand Design for G protein-coupled receptors. Wiley-VCH, Weinheim, pp. 137–164.
12. Jacoby, E., Bouhelal, R., Gerspacher, M., and Seuwen, K. (2006) The 7 TM G-protein-coupled receptor target family. ChemMedChem. 1, 760–782.
13. Koch, M.A., Breinbauer, R., and Waldmann, H. (2003) Protein structure similarity as guiding principle for combinatorial library design. Biol. Chem. 384, 1265–1272.
14. Bajorath, J. (2002) Integration of virtual and high-throughput screening. Nat. Rev. Drug Discov. 1, 882–894.
15. Bleicher, K.H., Böhm, H.J., Müller, K., and Alanine, A.I. (2003) Hit and lead generation: beyond high-throughput screening. Nat. Rev. Drug Discov. 5, 369–378.
16. Jacoby, E., Schuffenhauer, A., Popov, M., Azzaoui, K., Havill, B., Schopfer, U., Engeloch, C., Stanek, J., Acklin, P., Rigollier, P., Stoll, F., Koch, G., Meier, P., Orain, D., Giger, R., Hinrichs, J., Malagu, K., Zimmermann, J., and Roth, H.J. (2005) Key aspects of the Novartis compound collection enhancement project for the compilation of a comprehensive chemogenomics drug discovery screening collection. Curr. Top. Med. Chem. 5, 397–411.
17. Jacoby, E., Schuffenhauer, A., Azzaoui, K., Popov, M., Dressler, S., Glick, M., Jenkins, J., Davies, J., and Roggo, S. (2006) Small molecules for chemogenomics-based drug discovery. In: Jacoby, E. (ed.) Chemogenomics – Knowledge-Based Approaches to Drug Discovery. Imperial College Press, London, pp. 1–38.
18. Toledo, F. and Wahl, G.M. (2006) Regulating the p53 pathway: in vitro hypotheses, in vivo veritas. Nat. Rev. Cancer 6, 909–923.
19. Toledo, F. and Wahl, G.M. (2007) MDM2 and MDM4: p53 regulators as targets in anticancer therapy. Int. J. Biochem. Cell. Biol. 39, 1476–1482.
20. Hu, B., Gilkes, D.M., and Chen, J. (2007) Efficient p53 activation and apoptosis by simultaneous disruption of binding to MDM2 and MDMX. Cancer Res. 67, 8810–8817.
21. Olah, M., Mracec, M., Ostopovici, L., Rad, R., Bora, A., Hadaruga, N., Olah, I., Banda, M., Simon, Z., Mracec, M., and Oprea, T.I. (2005) WOMBAT: World of molecular bioactivity. In: T.I. Oprea (ed.) Cheminformatics in Drug Discovery. Wiley-VCH, Weinheim, pp. 233–240.
22. http://www.prous.com
23. http://www.mdl.com
24. Ekins, S., Mestres, J., and Testa, B. (2007) In silico pharmacology for drug discovery: methods for virtual ligand screening and profiling. Br. J. Pharmacol. 152, 9–20.
25. Ekins, S., Mestres, J., and Testa, B. (2007) In silico pharmacology for drug discovery: applications to targets and beyond. Br. J. Pharmacol. 152, 21–37.
26. Schuffenhauer, A., Floersheim, P., Acklin, P., and Jacoby, E. (2003) Similarity metrics for ligands reflecting the similarity of the target proteins. J. Chem. Inf. Comput. Sci. 43, 391–405.
27. Popowicz, G.M., Czarna, A., Rothweiler, U., Szwagierczak, A., Krajewski, M., Weber, L., and Holak, T.A. (2007) Molecular basis for the inhibition of p53 by Mdmx. Cell Cycle 6, 2386–2392.
28. Dudkina, A.S. and Lindsley, C.W. (2007) Small molecule protein-protein inhibitors for the p53-MDM2 interaction. Curr. Top. Med. Chem. 7, 952–960.
29. Vassilev, L.T. (2007) MDM2 inhibitors for cancer therapy. Trends Mol. Med. 13, 23–31.
30. Kussie, P.H., Gorina, S., Marechal, V., Elenbaas, B., Moreau, J., Levine, A., and Pavletich, N.P. (1996) Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 274, 948–953.
31. Vassilev, L.T., Vu, B.T., Graves, B., Carvajal, D., Podlaski, F., Filipovic, Z., Kong, N., Kammlott, U., Lukacs, C., Klein, C., Fotouhi, N., and Liu, E.A. (2004) In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 303, 844–848.
32. Ding, K., Lu, Y., Nikolovska-Coleska, Z., Wang, G., Qiu, S., Shangary, S., Gao, W., Qin, D., Stuckey, J., Krajewski, K., Roller, P.P., and Wang, S. (2006) Structure-based design of spiro-oxindoles as potent, specific small-molecule inhibitors of the MDM2-p53 interaction. J. Med. Chem. 49, 3432–3435.
33. Grasberger, B.L., Lu, T., Schubert, C., Parks, D.J., Carver, T.E., Koblish, H.K., Cummings, M.D., LaFrance, L.V., Milkiewicz, K.L., Calvo, R.R., Maguire, D., Lattanze, J., Franks, C.F., Zhao, S., Ramachandren, K., Bylebyl, G.R., Zhang, M., Manthey, C.L., Petrella, E.C., Pantoliano, M.W., Deckman, I.C., Spurlino, J.C., Maroney, A.C., Tomczuk, B.E., Molloy, C.J., and Bone, R.F. (2005) Discovery and cocrystal structure of benzodiazepinedione HDM2 antagonists that activate p53 in cells. J. Med. Chem. 48, 909–912.
34. Sakurai, K., Schubert, C., and Kahne, D. (2006) Crystallographic analysis of an 8-mer p53 peptide analogue complexed with MDM2. J. Am. Chem. Soc. 128, 11000–11001.
35. Fasan, R., Dias, R.L.A., Moehle, K., Zerbe, O., Obrecht, D., Mittl, P.R.E., Grutter, M.G., and Robinson, J.A. (2006) Structure–activity studies in a family of β-hairpin protein epitope mimetic inhibitors of the p53-HDM2 protein–protein interaction. ChemBioChem. 7, 515–526.
36. Schon, O., Friedler, A., Freund, S., and Fersht, A.R. (2004) Binding of p53-derived ligands to MDM2 induces a variety of long range conformational changes. J. Mol. Biol. 336, 197–202.
37. Fry, D.C., Emerson, S.D., Palme, S., Vu, B.T., Liu, C.M., and Podlaski, F. (2004) NMR structure of a complex between MDM2 and a small molecule inhibitor. J. Biomol. NMR. 30, 163–173.
38. Uhrinova, S., Uhrin, D., Powers, H., Watt, K., Zheleva, D., Fischer, P., McInnes, C., and Barlow, P.N. (2005) Structure of free MDM2 N-terminal domain reveals conformational adjustments that accompany p53-binding. J. Mol. Biol. 350, 587–598.
39. Chene, P. (2006) Drugs targeting protein-protein interactions. ChemMedChem. 1, 400–411.
40. Garcia-Echeverria, C., Chene, P., Blommers, M.J.J., and Furet, P. (2000) Discovery of potent antagonists of the interaction between human double minute 2 and tumor suppressor p53. J. Med. Chem. 43, 3205–3208.
41. Espinoza-Fonseca, L.M. and Trujillo-Ferrara, J.G. (2006) Conformational changes of the p53-binding cleft of MDM2 revealed by molecular dynamics simulations. Biopolymers 83, 365–373.
42. Böttger, V., Böttger, A., Garcia-Echeverria, C., Ramos, Y.F., van der Eb, A.J., Jochemsen, A.G., and Lane, D.P. (1999) Comparative study of the p53-mdm2 and p53-MDMX interfaces. Oncogene 18, 189–199.
43. http://www.schrödinger.com
44. http://www.scitegic.com
45. Hert, J., Willett, P., Wilton, D., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2004) Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures. J. Chem. Inf. Comput. Sci. 44, 1177–1185.
46. Xia, X., Maliski, E.G., Gallant, P., and Rogers, D. (2004) Classification of kinase inhibitors using a Bayesian model. J. Med. Chem. 47, 4463–4470.
47. Jenkins, J.L., Glick, M., and Davies, J.W. (2004) A 3D similarity method for scaffold hopping from known drugs or natural ligands to new chemotypes. J. Med. Chem. 47, 6144–6159.
48. Brown, N. and Jacoby, E. (2006) On scaffolds and hopping in medicinal chemistry. Mini Rev. Med. Chem. 6, 1217–1229.
49. Brown, N., et al. (2009) Manuscript in preparation.
50. Dubois, J.-É. (1976) Ordered chromatic graphs and limited environment concepts. In: Balaban, A.T. (ed.) Chemical Applications of Graph Theory. Academic Press, London, pp. 335–370.
51. Bremser, W. (1978) HOSE – A Novel Substructure Code. Anal. Chim. Acta 103, 355–365.
52. Rogers, D., Brown, R.D., and Hahn, M. (2005) Using extended-connectivity fingerprints with Laplacian-modified Bayesian analysis in high-throughput screening follow-up. J. Biomol. Screen. 10, 682–686.
53. http://www.umetrics.com
54. Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A., Klicic, J.J., Mainz, D.T., Repasky, M.P., Knoll, E.H., Shelley, M., Perry, J.K., Shaw, D.E., Francis, P., and Shenkin, P.S. (2004) Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749.
55. Halgren, T.A., Murphy, R.B., Friesner, R.A., Beard, H.S., Frye, L.L., Pollard, W.T., and Banks, J.L. (2004) Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 47, 1750–1759.
56. Ferrara, P. and Jacoby, E. (2007) Evaluation of the utility of homology models in high throughput docking. J. Mol. Model. 13, 897–905.
57. Martin, Y.C. (1992) 3D database searching in drug design. J. Med. Chem. 35, 2145–2154.
58. Hurst, T. (1994) Flexible 3D searching: the directed tweak technique. J. Chem. Inf. Comput. Sci. 34, 190–196.
59. Boettcher, A., et al. (2009) Manuscript in preparation.
60. Sheridan, R.P. and Kearsley, S.K. (2002) Why do we need so many chemical similarity search methods? Drug Discov. Today. 7, 903–911.
61. Schuffenhauer, A., Ertl, P., Roggo, S., Wetzel, S., Koch, M.A., and Waldmann, H. (2007) The scaffold tree – visualization of the scaffold universe by hierarchical scaffold classification. J. Chem. Inf. Model. 47, 47–58.
62. Dawson, R., Müller, L., Dehner, A., Klein, C., Kessler, H., and Buchner, J. (2003) The N-terminal domain of p53 is natively unfolded. J. Mol. Biol. 332, 1131–1141.
63. Jacoby, E., Schuffenhauer, A., and Acklin, P. (2004) The contribution of molecular informatics to chemogenomics. Knowledge-based discovery of biological targets and chemical lead compounds. In: Kubinyi, H. and Müller, G. (eds.) Chemogenomics in Drug Discovery – A Medicinal Chemistry Perspective. Wiley-VCH, Weinheim, pp. 139–166.
Chapter 8 Off-Target Networks Derived from Ligand Set Similarity
Michael J. Keiser and Jérôme Hert
Summary
Chemically similar drugs often bind biologically diverse protein targets, and proteins with similar sequences or structures do not always recognize the same ligands. How can we uncover the pharmacological relationships among proteins, when drugs may bind them in defiance of bioinformatic criteria? Here we consider a technique that quantitatively relates proteins based on the chemical similarity of their ligands. Starting with tens of thousands of ligands organized into sets for hundreds of drug targets, we calculated the similarity among sets using ligand topology. We developed a statistical model to rank the resulting scores, which were then expressed in minimum spanning trees. We have shown that biologically sensible groups of targets emerged from these maps, as well as experimentally validated predictions of drug off-target effects.
Key words: SEA, Expectation value, Target network, Polypharmacology, Off-targets
1. Introduction
How similar are two proteins? Typically, proteins are compared using bioinformatics approaches based on sequence or structure. While these methods quantify historical protein divergence, drugs and other small molecules often bind to targets that are unrelated from an evolutionary standpoint (1, 2). For example, the enzymes thymidylate synthase, dihydrofolate reductase, and glycinamide ribonucleotide formyltransferase have no substantial sequence identity or structural similarity, but they all recognize folic acid derivatives and are inhibited by antifolates. Similarly, the drug methadone binds both the μ-opioid receptor, a GPCR, and the structurally unrelated N-methyl-D-aspartate receptor, an ion channel. Polypharmacology, the ability of chemically similar
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575 DOI 10.1007/978-1-60761-274-2_8, © Humana Press, a part of Springer Science + Business Media, LLC 2009
drugs to bind biologically diverse proteins, has inspired recent efforts to find protein relationships by means other than their sequence or structure (3–5). The Similarity Ensemble Approach (SEA) considers proteins from a chemocentric point of view, relating them through the chemical similarity of their ligands (6). The idea is that similar molecules have similar biological profiles (7) and bind similar targets (8, 9). This technique links hundreds of ligand sets, and correspondingly their protein targets, together in minimum spanning trees where biologically related proteins cluster together as an emergent property (see Fig. 1). These networks are robust (10) and may be used to predict off-target effects (6). The similarities among ligand sets may reveal the pharmacological relationships of the targets whose actions they modulate.
How does SEA work? An overview of the different stages is given in Fig. 2. The similarity between two ligand sets is first approximated by summing the similarity scores of molecule pairs across the sets (see Fig. 2b). In itself, the resulting raw score is not a good estimate of the overall similarity of the sets, as it does not discriminate relevant similarities from random ones and depends on the number of ligands in each set. SEA corrects for these shortcomings in three steps. First, a statistically determined threshold is applied: pairs of molecules that score below it are discarded and do not contribute to the overall set similarity. Second, the raw score is converted to a size-bias-free z-score using the mean and standard deviation of raw scores modeled from sets of random molecules. Finally, the similarity between two sets is expressed as an E-value, which reflects the probability of observing a z-score at least that high from random data. Small E-values, then, reflect relationships between ligand sets that are stronger than would be expected by random chance alone.
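The thresholded raw score described above is a simple sum, and a small sketch may make the role of the threshold concrete. The function name `raw_score` and the similarity values are hypothetical; the only assumption about the method is the one stated in the text, namely that pairs scoring at or below the threshold contribute nothing.

```python
def raw_score(pair_similarities, threshold):
    """Sum the pairwise similarities that exceed the threshold.

    pair_similarities holds one value per (ligand in set a, ligand in set b)
    pair; how those values are computed is left to the caller.
    """
    return sum(s for s in pair_similarities if s > threshold)


# Hypothetical similarities for every pair across a 2-ligand and a 3-ligand set.
similarities = [0.91, 0.12, 0.30, 0.64, 0.08, 0.58]

print(raw_score(similarities, threshold=0.0))   # every pair contributes
print(raw_score(similarities, threshold=0.57))  # only the three strongest pairs
```

Without a threshold the many weak, essentially random pairs add up, and their number grows with the product of the set sizes, which is why the raw score alone is size dependent.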
2. Materials
1. A reference database of chemical structures, annotated by therapeutic indication or mechanism of action. For the purpose of illustration, we used the MDL Drug Data Report (MDDR) (11), which contains 65,367 molecules organized in 249 sets (see Note 1).
2. A molecular descriptor generator to encode the structural information of the compounds. We obtained the best results with two-dimensional fingerprints based on the topology of the molecules, such as the 2,048-bit default Daylight or 1,024-bit folded Scitegic ECFP_4 descriptors (see Note 2).
3. A similarity coefficient, such as the Tanimoto coefficient (see Note 3).
Fig. 1. Pharmacological network of the MDDR drug targets. Each vertex represents a ligand set and hence a protein target. The vertices are linked together by their SEA E-values (edges) and organized into a minimum spanning tree. Several protein families are highlighted to emphasize the natural clustering that emerges.
Fig. 2. Method overview: Ligand sets derived from existing databases (a) are used in set-wise comparisons (b) against a query set, the result of which is quantified by the statistical model inferred from that reference database (c). The generated probabilistic data can be used to construct chemical mappings of the ligand sets and correspondingly the biological targets (d).
4. A fitter program for nonlinear regression, required to calculate the parameters of the reference database (see Note 4).
5. A graph visualization program, such as Cytoscape (12), required to build a similarity network.
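For item 3, the Tanimoto coefficient of two bit fingerprints is the number of shared on-bits divided by the number of bits set in either fingerprint. The sketch below assumes fingerprints are available as collections of on-bit indices; it does not reproduce Daylight or ECFP_4 descriptors, and the function names and example bits are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pairwise_similarities(set_a, set_b):
    """Similarity of every ligand in set_a against every ligand in set_b."""
    return [tanimoto(fp_a, fp_b) for fp_a in set_a for fp_b in set_b]


# Hypothetical fingerprints: 3 shared bits out of 5 distinct bits.
print(tanimoto({1, 5, 9, 12}, {1, 5, 9, 40}))
```

Representing fingerprints as sets keeps the sketch independent of any particular cheminformatics toolkit; a production implementation would more likely operate on packed bit vectors.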
3. Methods
SEA quantifies the similarity among sets of compounds, which may be organized by the targets they modulate, the therapeutic indications they address, their activity in a high-throughput screening campaign, or a variety of other criteria. So far, we have focused on sets organized by targets, but SEA can be used with other annotations. Before comparing any sets with SEA, the parameters of the background database (generally the one containing the sets one wishes to compare against) need to be calculated. While this step is computationally intensive, it is only required once for a given database, molecular descriptor, and similarity coefficient (see Subheading 3.1). Once the optimal threshold $t_i$ and the formulae
of the mean $y_\mu$ and standard deviation $y_\sigma$ as a function of the product of the sets’ sizes ($|a| \times |b|$) have been determined, SEA can be applied to quantify set similarity (see Subheading 3.2).
3.1. Calculating the Parameters of the Reference Database
In this section, we generate thousands of randomly populated pairs of ligand sets and determine the uncorrected similarity among them. We use these “random” similarities to build an empirical model of background chemical similarity. The particular choice of chemical database will determine the type of background: KEGG molecules will yield a metabolic background, whereas ZINC molecules will produce drug- or lead-like backgrounds (depending on the exact subset used). It is preferable to choose as large a database as possible; those in excess of 100,000 molecules are often ideal.
1. Choose minimum and maximum set sizes $s_{min}$ and $s_{max}$ for sampling, such that they will be representative of molecule sets annotated in the database (see Note 5).
2. Sample at least 1,000 integers $s_i$ from the range $(s_{min} \times s_{min})$ to $(s_{max} \times s_{max})$ (see Note 6).
3. For each product of sets’ sizes $s_i$, calculate all its integer factors $f_i$ such that $s_{min} \le f_i \le s_{max}$.
4. For each $s_i$, choose 30 of its $f_i$ at random and construct two sets $a$ and $b$, consisting of $f_i$ and $s_i/f_i$ molecules, respectively, randomly selected from the background molecule database (see Note 7).
5. For each pair of sets $a$ and $b$, calculate standard chemical similarities $c_{a,b}$ for each pair of ligands across the sets using your previously chosen chemical similarity descriptor and coefficient.
6. For $t_i$, where $0 \le t_i < 1$ with step size 0.01, calculate a “raw score” $r_{a,b}(t_i)$ equal to the sum of all $c_{a,b}$ where $c_{a,b} > t_i$. Store all calculated $r_{a,b}(t_i)$, along with the sizes of sets $a$ and $b$ (see Note 8).
7. For each $t_i$, plot all $r_{a,b}(t_i)$ scores vs. the product of the sizes of sets $a$ and $b$, e.g., plot all points $(|a| \times |b|, r_{a,b})$. There should be 100 plots (see Note 9), each corresponding to a particular choice of $t_i$.
8. For each plot, use the nonlinear fitter to determine the mean expected random chemical similarity (see Fig. 2c and Fig. 3a). Typically, an equation of the form $y_\mu = m x^n + p$ will be appropriate (see Note 10).
9. For each plot, bin the data by the x-axis values, such that each bin ideally has no fewer than five data points. Given the previously fitted $y_\mu$, calculate the standard deviation of each bin with Laplacian correction, and fit the resulting standard deviation points nonlinearly (see Fig. 2c and Fig. 3b). Again, $y_\sigma = q x^r + s$ will typically be appropriate.
10. For each plot, use the fitted $y_\mu$ and $y_\sigma$ to transform all original points $(|a| \times |b|, r_{a,b})$ to their z-scores $z_{a,b} = (r_{a,b} - y_\mu(|a| \times |b|)) / y_\sigma(|a| \times |b|)$ (see Note 11). Construct a histogram of these z-scores.
11. For each histogram, nonlinearly fit the data to Gaussian and extreme value type I (EVD) distributions (see Note 12, Fig. 2c, and Fig. 3c).
12. Based on goodness of fit, such as each fit’s observed-vs.-expected $\chi^2$ value, select the threshold choice $t_i$ such that the histogram best fits an EVD instead of a Gaussian distribution (see Note 13).
13. Record the chosen $t_i$ and that $t_i$’s formulae for $y_\mu$ and $y_\sigma$. These values comprise the random background model. All other plots, histograms, and formulae may be discarded at this point.
3.2. Calculating Set-Wise Similarity Ensembles
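The set-wise calculation in this subsection relies on the z-score transformation of Subheading 3.1, step 10, so a minimal sketch of that transformation is given here. It assumes the mean and standard deviation of the raw score are both modeled as power laws of the product of set sizes, as in steps 8 and 9. All parameter values below are invented for illustration and are not fitted to any database; the function names are likewise hypothetical.

```python
def power_law(x, scale, exponent, offset):
    """Evaluate scale * x**exponent + offset."""
    return scale * x ** exponent + offset


def z_score(raw, size_a, size_b, mean_params, sd_params):
    """Convert a raw score to a z-score using the background model.

    mean_params and sd_params are (scale, exponent, offset) tuples for the
    fitted mean and standard deviation curves, respectively.
    """
    x = size_a * size_b
    mean = power_law(x, *mean_params)
    sd = power_law(x, *sd_params)
    return (raw - mean) / sd


# Hypothetical background model: mean exponent of 1 and a standard
# deviation exponent between 0.6 and 0.7, as Fig. 3 suggests is typical.
MEAN_PARAMS = (0.0004, 1.0, 0.0)
SD_PARAMS = (0.005, 0.65, 0.0)

z = z_score(raw=2.13, size_a=5, size_b=30,
            mean_params=MEAN_PARAMS, sd_params=SD_PARAMS)
print(f"z-score: {z:.2f}")
```

Because both curves depend only on the product of set sizes, the same model can be reused for any pair of sets drawn against the same background, which is what makes the one-time fitting cost worthwhile.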
To calculate the set-wise similarity among sets of ligands, we reuse much of the machinery developed to calculate background models and extend it to calculate E-values. By exhaustively comparing all pairs of sets across two collections (databases), we can then rank the top hits for any particular ligand set. In practice, a ligand set should not comprise fewer than ten ligands, unless you intend to compare it against large sets only. For instance, it would not be statistically reliable to compare two sets of five ligands each, but a set of five ligands compared against a set of thirty should be acceptable. Although the particular choice of set size should depend on the diversity of ligands within a set, a good rule of thumb is to build sets such that the product of the set sizes will be no less than 100 (e.g., the product of set sizes is 25 for the five-by-five case, and 150 for the five-by-thirty case mentioned earlier).
1. To calculate similarity ensembles, choose two collections of sets $C_a$ and $C_b$ to compare (see Note 14).
2. For each set $a$ and $b$ from collections $C_a$ and $C_b$, respectively, calculate $r_{a,b}(t_i)$ as previously described using only the optimal threshold $t_i$ from the background model. Be sure to use the actual molecule structures annotated for each set.
3. Transform each $r_{a,b}(t_i)$ to z-score $z_{a,b}$ as described in Subheading 3.1, step 10.
Fig. 3. Statistical models: (a) Correlation between the product of the sets’ sizes and the mean of the raw score. The fitted function typically corresponds to an equation of the form $y_\mu = m x^n + p$ with $n = 1$. (b) Correlation between the product of the sets’ sizes and the standard deviation of the raw score. The fitted function typically corresponds to an equation of the form $y_\sigma = q x^r + s$, with $0.6 < r < 0.7$. (c) Distribution of the z-scores obtained from random data using ECFP_4 fingerprints, with a similarity score threshold ($t_i$) of 0.57 and fitted to an extreme value distribution.
Keiser and Hert
4. Transform each $z = z_{a,b}$ to a p-value $P(Z > z) = 1 - \exp\left(-e^{-z\pi/\sqrt{6} - \gamma}\right)$, where $\gamma = -\Gamma'(1)$ is the Euler–Mascheroni constant (≈0.577215665) (see Note 15).
5. Optionally, the p-value may be transformed to a BLAST-like E-value by calculating $E(z) = P(Z > z) \times n_{db}$, where $n_{db}$ is the number of set-vs.-set comparisons made when comparing all sets from collection $C_a$ against all sets from collection $C_b$. Typically, $n_{db} = |C_a| \times |C_b|$.
6. For each set $a$, rank all sets $b_i$ from $C_b$ by their E-value, where values approaching zero are the best scores (see Note 16).

3.3. Building a Similarity Network
A similarity network is a graphical view of the E-value relationships among all ligand sets in a particular database (see Note 17). If these ligand sets represent particular drug targets, for instance, it is a visualization of the significant chemical similarity present among these targets (see Fig. 1).
1. Calculate the similarity ensemble E-values between all sets $a_i$ and $a_j$ from $C_a$ versus itself (see Note 18), as previously described.
2. The resulting matrix of E-values defines a fully connected weighted graph, where each node corresponds to a molecule set and each edge to the E-value between two sets (see Note 19).
3. We use Kruskal's algorithm (13) to construct a minimum spanning tree (MST):
   a. Create a set $S_{tree}$ that initially contains all individual nodes, unconnected. We refer to elements of $S_{tree}$ as "trees."
   b. Create a set $S_e$ that contains all possible edges $e_i$ (E-values).
   c. While $S_e$ is not empty:
      i. Remove the minimum-weighted (best) edge $e_{min}$ from $S_e$.
      ii. If $e_{min}$ connects two distinct existing trees $t_a$ and $t_b$ in $S_{tree}$, remove $t_a$ and $t_b$ from $S_{tree}$, connect them into a single new tree $t_{ab}$ using $e_{min}$, and add $t_{ab}$ back into $S_{tree}$.
      iii. Else, discard $e_{min}$.
   d. When the algorithm finishes, $S_{tree}$ will contain only one tree, which is the graph's MST.
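The MST construction in step 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it tracks the trees with a union-find (disjoint-set) structure instead of explicit tree objects, and the function name `kruskal_mst`, the set labels, and the E-values are hypothetical.

```python
def kruskal_mst(nodes, edges):
    """Return the MST edges of an undirected weighted graph.

    nodes: iterable of hashable labels (ligand sets).
    edges: iterable of (weight, u, v) tuples; here the weight is an E-value,
           so smaller means better.
    """
    # Initially every node is its own tree (step a).
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative of n's tree.
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    mst = []
    # Visit edges from best (smallest E-value) to worst (steps b and c.i).
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two distinct trees: merge them (step c.ii).
            parent[root_u] = root_v
            mst.append((weight, u, v))
        # Otherwise the edge would close a cycle and is discarded (step c.iii).
    return mst


# Hypothetical E-values between four illustrative ligand sets.
evalues = [
    (1e-50, "A", "B"),
    (1e-3, "A", "C"),
    (1e-20, "B", "C"),
    (0.5, "A", "D"),
    (1e-2, "C", "D"),
    (2.0, "B", "D"),
]
mst = kruskal_mst("ABCD", evalues)
```

Because E-values can span many orders of magnitude, it may be more convenient in practice to sort on their logarithms; the ordering of the edges, and hence the tree, is unchanged.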
4. Notes
1. Examples of other freely or commercially available annotated chemogenomics databases include WOMBAT, KEGG, and DrugBank. Note, however, that SEA can be used with any kind of annotation and is not limited to ligand-target association.
Off-Target Networks Derived from Ligand Set Similarity
2. The steps in Methods will run faster if fingerprints are precalculated and stored for each molecule.
3. While it is not technically necessary, we assume that the similarity coefficient is normalized from 0.0 to 1.0. If not, choose appropriate bounds for the range of $t_i$ thresholds discussed in Subheading 3.1, step 6.
4. The open-source Scientific Python (SciPy) package (14) provides a least-squares optimizer that can be used for fitting nonlinear regressions.
5. If you are unsure of appropriate values, use $s_{min} = 10$ and $s_{max} = 300$.
6. More than 1,000 points may be sampled, but in our experience this does not yield a substantial difference in the final model.
7. If there are fewer than 30 distinct factors $f_i$ for a particular integer $s_i$, randomly sample from the available $f_i$ 30 times. Sampling more than 30 points is also acceptable, depending on the diversity of the background database and computational resources.
8. These raw scores are the "random" similarities that form the background model. Besides the choice of similarity descriptor and coefficient, the threshold $t_i$ is the only settable major SEA parameter. By sampling across the range of $t_i$ choices, we will be able to determine an optimal choice of $t_i$ in later steps.
9. For the steps plotting these data (and later, the histograms), you need not actually draw out the full plots. All that is strictly necessary is that your data are formatted appropriately for input into your chosen fitter. Using SciPy, for instance, it is enough to store these data points in internal arrays.
10. In our experience, the mean raw score fit $y_m$ has always been linear.
11. The z-score is the number of standard deviations by which a particular raw score exceeds the expected mean.
12. You may use the "norm" and "gumbel_r" distributions in SciPy (scipy.stats) for the Gaussian and extreme value type I distributions, respectively.
13. There is currently no formal justification for choosing the $t_i$ threshold, but this approach is consistent and enriches for a BLAST-like background probability distribution. Some experiments also suggest that this choice is reasonable, as thresholds derived from retrospective cross-fold analysis are identical or close to the threshold $t_i$ (unpublished).
14. One such collection may be built from the annotated molecular structure database. The second may be the exact same collection (for symmetric comparisons), or derived from a different database of annotated molecules.
15. This formula converts EVD z-scores to their p-values, where the p-value expresses the probability of finding a z-score that strong or better by random chance alone.
16. An E-value of 1 or higher is not statistically significant. The similarity between two sets becomes significant when its E-value is at least one order of magnitude smaller than expected by random chance alone, i.e., $\le 10^{-1}$. Sets that are highly similar have E-values $\ll 10^{-50}$, although there is no single cutoff for E-value significance. The SEA Search tool at http://sea.docking.org may also be used to check the accuracy of the z-scores and E-values calculated in Subheading 3.2.
17. While there are many appropriate graph-theoretic approaches, we have chosen an MST. An MST is a selection over all graph edges (E-values) such that the resulting tree links all nodes (ligand sets) at lowest "cost" to the network as a whole. For example, an edge with an E-value approaching zero has a lower cost to the tree than one with an E-value of 1. The resulting MST will preferentially include only those edges with the smallest E-values. It may be interpreted as a simplified view of higher-dimensional chemical similarity space.
18. These instructions apply only to symmetric collection comparisons, e.g., $C_a = C_b$.
19. You may either (a) use Cytoscape to filter out all edges above an E-value threshold of your choice, or (b) construct a global MST.
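The chain from raw score to E-value (Subheading 3.1, step 10, and Subheading 3.2, steps 3 to 5; see Notes 11 and 15) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the fit parameters are passed in as plain tuples following the functional forms given in Fig. 3, and any numbers you supply must come from your own background model.

```python
import math

# Euler-Mascheroni constant, as quoted in Subheading 3.2, step 4.
EULER_GAMMA = 0.577215665


def z_score(raw, size_product, mean_fit, sd_fit):
    """z = (r - y_m(x)) / y_s(x), with x the product of the two set sizes.

    mean_fit = (m, n, p) for y_m = m * x**n + p
    sd_fit   = (q, r, s) for y_s = q * x**r + s
    """
    m, n, p = mean_fit
    q, r, s = sd_fit
    y_m = m * size_product ** n + p
    y_s = q * size_product ** r + s
    return (raw - y_m) / y_s


def evd_p_value(z):
    """P(Z > z) = 1 - exp(-exp(-z * pi / sqrt(6) - gamma))."""
    exponent = -z * math.pi / math.sqrt(6) - EULER_GAMMA
    if exponent > 700:  # avoid overflow for very negative z-scores
        return 1.0
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-math.exp(exponent))


def e_value(p, n_db):
    """BLAST-like E-value: p-value times the number of set-vs.-set comparisons."""
    return p * n_db
```

For example, with hypothetical fits `mean_fit=(0.01, 1.0, 0.0)` and `sd_fit=(0.1, 0.5, 0.0)`, a raw score of 3.0 for two sets whose sizes multiply to 100 gives a z-score of 2; `e_value(evd_p_value(2.0), n_db)` then scales the resulting p-value by the number of comparisons made.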
Acknowledgments
M.J.K. is supported by a National Science Foundation graduate fellowship. J.H. is supported by the sixth Framework Program of the European Commission. We are grateful to MDL Information Systems Inc. for the MDDR database; Daylight Chemical Information Systems Inc.; and OpenEye Scientific Software for software support. We thank John J. Irwin for reading the manuscript and Brian K. Shoichet for mentoring.
References
1. Roth, B., Sheffler, D., and Kroeze, W. (2004) Magic shotguns versus magic bullets: Selectively non-selective drugs for mood disorders and schizophrenia. Nat. Rev. Drug Discov. 3, 353–359.
2. Paolini, G., Shapland, R., van Hoorn, W., Mason, J., and Hopkins, A. (2006) Global mapping of pharmacological space. Nat. Biotechnol. 24, 805–815.
3. Nidhi, Glick, M., Davies, J.W., and Jenkins, J.L. (2006) Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases. J. Chem. Inf. Model. 46, 1124–1133.
4. Izrailev, S., and Farnum, M.A. (2004) Enzyme classification by ligand binding. Proteins 57, 711–724.
5. Campillos, M., Kuhn, M., Gavin, A.C., Jensen, L.J., and Bork, P. (2008) Drug target identification using side-effect similarity. Science 321, 263–266.
6. Keiser, M., Roth, B., Armbruster, B., Ernsberger, P., Irwin, J., and Shoichet, B. (2007) Relating protein pharmacology by ligand chemistry. Nat. Biotechnol. 25, 197–206.
7. Johnson, M.A., and Maggiora, G.M. (1990) Concepts and Applications of Molecular Similarity. John Wiley, New York.
8. Frye, S.V. (1999) Structure–activity relationship homology (SARAH): A conceptual framework for drug discovery in the genomic era. Chem. Biol. 6, R3–R7.
9. Jacoby, E., Schuffenhauer, A., and Floersheim, P. (2003) Chemogenomics knowledge-based strategies in drug discovery. Drug News Perspect. 16, 93–102.
10. Hert, J., Keiser, M., Irwin, J., Oprea, T., and Shoichet, B. (2008) Quantifying the relationships among drug classes. J. Chem. Inf. Model. 48, 755–765.
11. The MDL Drug Data Report Database is available from MDL Information Systems, Inc. (Accessed at http://www.mdl.com.)
12. Shannon, P., Markiel, A., Ozier, O., et al. (2003) Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504.
13. Kruskal, J. (1956) On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Amer. Math. Soc. 7, 48–50.
14. SciPy: Open Source Scientific Tools for Python. (2001) (Accessed at http://www.scipy.org.)
Chapter 9
Chemogenomic Analysis of Safety Profiling Data
Josef Scheiber and Jeremy L. Jenkins

Summary
Understanding the safety of newly developed compounds is a key task in each early drug discovery project. In early stages, pharmaceutical companies address this task by using so-called preclinical safety profiling, in which compounds are screened in inexpensive large-scale assays to understand possible liabilities. This process generates a large amount of binding data on various compounds against a panel of targets, usually thousands or tens of thousands of compounds profiled against ~100 different targets. This data matrix is highly valuable and invites further analysis. After briefly introducing the nature of safety profiling data, we describe several computational methods used internally at Novartis to analyze it. We showcase protocols that can be used to understand compound promiscuity on a chemical structure level and protocols to evaluate the promiscuity of targets used in safety profiling. We also describe a method to quickly determine the chemical similarity of compounds active against different targets. Next, we show which protocols can be used to evaluate the global chemical similarity of targets. The above approaches can be used either to optimize the composition of a panel of targets or to better understand certain toxicities. Finally, we explain a simple method to elucidate hidden patterns in safety profiling data.

Key words: Safety profiling, Drug Safety, Chemogenomics, Data Mining
1. Introduction
1.1. Safety Profiling Data
Broad-scale in vitro pharmacology profiling of new chemical entities during early phases of drug discovery is an essential approach to predict clinical adverse effects. Modern, relatively inexpensive assay technologies and rapidly expanding knowledge about G-protein-coupled receptors (GPCRs), nuclear receptors, ion channels, and enzymes have made it possible to implement a large number of assays addressing possible clinical liabilities early on.
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575 DOI 10.1007/978-1-60761-274-2_9, © Humana Press, a part of Springer Science + Business Media, LLC 2009
Together with other in vitro assays focusing on toxicology and bioavailability, safety profiling provides a powerful tool to aid drug development. Preclinical safety profiling is used to test for off-target activity of compounds against proteins that are understood to cause adverse clinical effects when modulated. Such profiling can be carried out at relatively low cost, especially if it prevents high attrition rates during clinical trials. For a more extensive overview of safety profiling data and how it is generated, the reader is referred to an excellent review by Whitebread et al. (1). A critical first decision is how to construct the safety panel, i.e., which targets in which assay types will be used for routine screening (1). Importantly, the decision about the targets to be maintained in the panel should be re-evaluated regularly. The profiling process can generate invaluable data that, in aggregate, contains more useful information than individual assay results; in particular, the differences in results across assays with related readouts may be informative. Accordingly, companies increasingly embark on a global analysis of multiple data points, which can be carried out across targets (i.e., "polypharmacology") and across target classes (chemogenomics). Key questions in this analysis are the following: How well can we predict off-target activity to drive safety decision making for a chemical series? Are targets related by chemistry? How do we translate highly complex profiling data into something easily interpretable? How can we use it prospectively and proactively? Several methods have been used in recent years to evaluate profiling data. We will introduce protocols that describe the most common methods and their underlying theory. In particular, this review focuses on ligand-based methods and powerful, modern statistical approaches developed in recent years in the cheminformatics field.
2. Methods
2.1. Descriptors
A key factor in chemical data analysis is the choice of chemical descriptors. Recent analyses have shown that extended connectivity fingerprint (ECFP) descriptors, an implementation of circular substructural fingerprints native to Pipeline Pilot software (SciTegic, Accelrys, Inc.) based on the Morgan algorithm (2), are superior for many tasks. Consequently, we tend to employ them in the analyses herein. An ECFP feature represents an exact structure with limited and specified attachment points. ECFPs are generated in an iterative fashion. Initially, each atom is assigned a code that derives from the number of atomic connections, element type, charge, and mass within an atomic neighborhood size of "0." In the first iteration, information about each atom's immediate neighbors is collected and a new code representing the atom and its immediate neighbors is generated. In each iteration, the neighborhood size becomes larger, and the updated codes of the atoms from previous iterations are used for assigning new codes. When the desired neighborhood size is reached, the set of all features is returned as a fingerprint. A neighborhood size of four or six is typically optimal in terms of information content and descriptor space size. We use either ECFP_4 or ECFP_6 as structural descriptors to train statistical models, such as multiple-category or "multicategory" Laplacian-modified naive Bayesian classifiers.

2.2. Multicategory Bayesian Models Pipeline Pilot Implementation
The following description is derived from Nidhi et al. (3), in which the authors employed multicategory Bayesian models for compound target prediction. In the present work, the same approach is applied to modeling safety profiling data. Naive Bayes is a statistical classification method based on Bayes' rule of conditional probability, which states that, given two events A and B, the probability of event A occurring, given that B has already occurred, P(A|B), is given by
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$
where P(A) and P(B) are the probabilities of events A and B, respectively. The Bayesian classifier is called naive because it naively assumes that the features are independent. Under this assumption, it is valid to multiply the probabilities of the individual events. As described below, the "naive" assumption will be employed: the probabilities of the individual events will be multiplied; however, the probabilities themselves will not be calculated by the above equation but instead using a Laplacian-corrected estimator (4, 5). The estimator adjusts the uncorrected probability estimate of a feature to account for the different sampling frequencies of different features. In the present context, the Laplacian-corrected estimator for a compound being active given a feature $F_i$ is calculated according to the following equations, where $A$ of the $T$ total compounds are active, feature $F_i$ is contained in $T_{F_i}$ samples, and $A_{F_i}$ of the samples containing feature $F_i$ are active. We start with the baseline probability of a compound being active:
$$P(\text{active}) = \frac{A}{T} \tag{2}$$

If the molecule contains feature $F_i$, the uncorrected estimate of activity should be:

$$P(\text{active} \mid F_i) = \frac{A_{F_i}}{T_{F_i}} \tag{3}$$
But if the number of compounds in the dataset containing feature $F_i$ is small, this estimate may be overconfident. More sampling of a feature is desirable to increase confidence. If a feature carried no information about activity and we sampled it $K$ additional times, we would expect $K(A/T)$ of those samples to be active. We can therefore correct every $F_i$ encountered by adding $K$ such virtual samples:

$$\frac{A_{F_i} + (A/T)K}{T_{F_i} + K} \tag{4}$$

If we have few samplings of the feature, the estimate of $P(\text{active} \mid F_i)$ should approach $P(\text{active})$. The Laplacian correction substitutes $K = 1/P(\text{active}) = T/A$:

$$\frac{A_{F_i} + 1}{T_{F_i} + T/A} \tag{5}$$

Rearrange by multiplying the numerator and denominator by $A/T$ to yield:

$$\frac{(A_{F_i} + 1)(A/T)}{T_{F_i}(A/T) + 1} \tag{6}$$

To get the relative estimator, we divide Eq. 6 by $P(\text{active})$, i.e., $A/T$:

$$P_{\text{final}}(\text{active} \mid F_i) = \frac{A_{F_i} + 1}{T_{F_i}(A/T) + 1} \tag{7}$$
This is the same as Eq. 8 in the appendix of Hert et al. (6). Because several features are normally required to characterize a compound, the multiple features $F_i$ in a sample have to be combined. To make it easier to combine multiple features into a Bayesian score, the SciTegic implementation uses the sum of the log values of the individual feature probabilities. Given $n$ features for a compound, the combined estimate $P_{\text{combined}}$ is calculated as follows:
$$P_{\text{combined}} = \sum_{i=1}^{n} \log\left[P_{\text{final}}(\text{active} \mid F_i)\right] \tag{8}$$
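Equations 7 and 8 translate directly into code. The following is a minimal sketch, not the SciTegic implementation: the function names are hypothetical, each feature is represented only by its counts $(A_{F_i}, T_{F_i})$, and the natural logarithm is assumed because the text does not state the base.

```python
import math


def relative_estimator(a_fi, t_fi, a, t):
    """Eq. 7: (A_Fi + 1) / (T_Fi * (A / T) + 1).

    a_fi: active compounds containing the feature
    t_fi: all compounds containing the feature
    a, t: active and total compounds in the training set
    """
    return (a_fi + 1) / (t_fi * (a / t) + 1)


def bayesian_score(feature_counts, a, t):
    """Eq. 8: sum of the log relative estimators over a compound's features.

    feature_counts: iterable of (A_Fi, T_Fi) pairs, one per feature present
    in the compound. The natural log is an assumption of this sketch.
    """
    return sum(
        math.log(relative_estimator(a_fi, t_fi, a, t))
        for a_fi, t_fi in feature_counts
    )
```

With 10 actives among 100 compounds, a feature seen in 10 compounds of which 9 are active gets a relative estimator of 5 (enriched in actives), a feature seen in 10 compounds of which none is active gets 0.5, and a feature never seen before gets 1, so it contributes log 1 = 0 to the score.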
In the present case involving a chemogenomics database, the objective is to build a Laplacian-modified naive Bayesian model for each target class in the database. Because the number of targets is on the order of hundreds or even thousands, automating the process is desirable. The Learn Molecular Categories component in Pipeline Pilot automatically builds multiple Laplacian-modified Bayesian models. The user specifies a "CategoryProperty" (for example, the profiling target name property), which is subsequently used to create a model for each of the categories. In the creation of each model, the compounds in the other activity categories are defined as the inactive set. Models are internally validated during training using the following workflow: Each sample is left out one at a time, a model is built using the rest of the samples, and that model is used to predict the left-out sample. Once all the samples have predictions, a receiver operating characteristic (ROC) plot is generated, and the area under the curve (XV ROC AUC) is calculated. The Best Split is calculated by picking the split that minimizes the sum of the percent misclassified for category members and for category nonmembers, using the cross-validated score for each sample. Using that split, a contingency table is constructed, containing the number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN). From the generated models one can extract the chemical features that are relevant, i.e., "good" or "bad," for the categories.

To prospectively predict the target of a test or query compound, the compound is passed through the Laplacian-modified naive Bayesian model of each target class, and the relative estimator score for each of the target classes is calculated. The target with the highest score is assumed to be the most probable target for that compound. Similarly, the next highest score for a target can be assigned as the second most likely target, and so on. Importantly for analysis, the ECFP features influencing the prediction of a target can be back-projected onto test compounds to understand which features of the molecule are likely to affect target binding. Once validated, multicategory Bayesian models enable the prospective application or reuse of safety profiling data collected over time for disparate projects. Ideally, they are employed to prioritize or flag compounds for safety profiling at the lead discovery and optimization stages. In addition to single target predictions, we may assume that compounds often have multiple off-targets, or an off-target profile. Indeed, some adverse clinical events appear to result from a compound's off-target profile, a phenomenon that is particularly true for the side effects of antipsychotic drugs that are promiscuous inhibitors of GPCRs.
In lieu of testing all compounds against all safety targets to generate a complete data matrix, one can employ statistical models trained on safety profiling data to “fill in the holes” in the data matrix. The power of the multicategory Bayes approach is that the entire set of Bayesian scores for all target categories in the model may be used in and of itself as a fingerprint (i.e., “Bayes affinity fingerprints”), to serve as a virtual safety biomarker.
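The Best Split selection and the resulting contingency table described above can be sketched as follows. This is an illustration under stated assumptions, not the Pipeline Pilot code: the function name `best_split` is hypothetical, a sample is predicted to be a category member when its cross-validated score is at or above the cutoff, and only the observed scores are tried as candidate cutoffs.

```python
def best_split(scores, is_member):
    """Pick the cutoff that minimizes the summed misclassification rates.

    scores: cross-validated Bayesian scores, one per sample.
    is_member: booleans, True if the sample belongs to the category.
    Both classes must be present. Returns (cutoff, contingency_table).
    """
    n_pos = sum(1 for m in is_member if m)
    n_neg = len(is_member) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        # Samples scoring at or above the cutoff are predicted as members.
        tp = sum(1 for s, m in zip(scores, is_member) if m and s >= cutoff)
        fp = sum(1 for s, m in zip(scores, is_member) if not m and s >= cutoff)
        fn = n_pos - tp
        tn = n_neg - fp
        # Fraction of members missed plus fraction of nonmembers flagged.
        error = fn / n_pos + fp / n_neg
        if best is None or error < best[0]:
            best = (error, cutoff, {"TP": tp, "FN": fn, "FP": fp, "TN": tn})
    return best[1], best[2]


# Hypothetical cross-validated scores for eight compounds.
cv_scores = [2.1, 1.7, 0.4, 0.9, -0.3, -1.2, 0.1, -0.8]
members = [True, True, True, False, False, False, False, False]
cutoff, table = best_split(cv_scores, members)
```

In this toy example the chosen cutoff recovers all three members at the price of one false positive.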
3. Evaluating Safety Profiling Data
3.1. Promiscuity of Compounds: Finding Promiscuous Compounds
The correlation between promiscuity in a safety profiling panel and side effect outcome for a particular drug is very high (7). Until a rational design of promiscuous drugs with desired features becomes possible (8), the main paradigm for drug discovery remains to create compounds that are as selective as possible
(i.e., "magic bullets"). A recent analysis by Azzaoui et al. (9) shows an approach to quickly analyze the results of screening a large number of compounds in a safety profiling panel. The protocol is as follows:
1. Compute the target hit rate (THR) in order to assess each compound for its selectivity or promiscuity across the whole panel of target assays. THR is defined as the ratio of the number of targets hit by a compound (>50% inhibition at a given concentration) to the number of targets tested at that concentration.
2. Use THR to flag compounds according to their promiscuity at 10 µM (THR10):
   (a) Compounds with THR10 ≥ 20% are flagged as promiscuous (P);
   (b) Compounds with THR10 ≤ 5% are flagged as selective (S);
   (c) Other compounds, with THR10 values between 5 and 20%, are flagged as medium promiscuous (MP).
3. Overall, a considerable number of compounds lacking specificity ("promiscuous compounds") are usually found, even if the vast majority of these were at the lead optimization stage. Azzaoui et al. found that 21% of compounds had IC50 values below 5 µM toward at least eight different targets. This number is biased, however, because the projects that encounter pharmacological promiscuity submit more compounds than others. Nevertheless, these data show the importance of assessing the pharmacological profile of the compounds well before the last steps of the drug discovery process.

3.2. Promiscuity of Compounds: Elucidating "Dirty" Chemical Features
The identification of chemical features that cause compounds to be either selective or promiscuous is essential for predictive safety pharmacology methods to inform medicinal chemistry decisions. To this end, Bayesian models are used together with ECFP descriptors. By using the compound annotation from the previous step, a multicategory Bayes model can be trained. The protocol is as follows:
1. Annotate compounds with THR categories.
2. Use the annotation as the category for Bayesian model training.
3. Train the Bayes model using ECFP descriptors for the compounds.
4. Extract the most prevalent substructures for each of the THR categories from the model.
5. "Promiscuous" features are to be avoided, whereas "selective" features can be used to modify promiscuous compounds. Such information needs to be fed back into projects that submit compounds for safety profiling. A sample result from Azzaoui's analysis is shown in Table 1.
6. Repeat the analysis regularly, as newer data could change the outcome considerably.
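The THR categories used as annotation in step 1 follow from the definitions in Subheading 3.1 and can be computed as sketched below. The function names and the input layout (a list of percent-inhibition values for one compound at one concentration) are assumptions made for illustration.

```python
def target_hit_rate(inhibitions, hit_threshold=50.0):
    """THR in percent: targets hit (>50% inhibition) / targets tested.

    inhibitions: percent-inhibition values for one compound, one value per
    target tested at the same concentration.
    """
    hits = sum(1 for value in inhibitions if value > hit_threshold)
    return 100.0 * hits / len(inhibitions)


def promiscuity_flag(thr10):
    """Map THR at 10 uM onto the P / MP / S categories of Subheading 3.1."""
    if thr10 >= 20.0:
        return "P"   # promiscuous
    if thr10 <= 5.0:
        return "S"   # selective
    return "MP"      # medium promiscuous


# Hypothetical compound tested against ten targets at 10 uM.
profile = [80, 10, 55, 5, 0, 20, 30, 60, 12, 3]
flag = promiscuity_flag(target_hit_rate(profile))
```

The resulting flag can then serve as the category property when training the multicategory Bayes model.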
Table 1. An example analysis of promiscuous chemical features, taken from Azzaoui et al. (9)

Functional group            Present in P  Present in MP  Present in S  % P  % MP  % S  Difference between % S and % P
Benzoic acid                           0              8            50    0    14   86  86
Aliphatic carboxylic acid              8             48           208    3    18   79  76
Nitro                                  2              6            21    7    21   72  66
Sulfone                                6              7            36   12    14   73  61
Nitrile                               20             37            99   13    24   63  51
Sulfonamide                           53             91           216   15    25   60  45
Ester                                 25             39            78   18    27   55  37
Primary amine                         38             39            95   22    23   55  33
Imidazole                             52             98           134   18    35   47  29
Pyrazole                              31             44            71   21    30   49  27
Hydroxyl in R-OH                      94            142           216   21    31   48  27
Amide                                345            413           701   24    28   48  24
Tetrazole                              7             25            15   15    53   32  17
Secondary amine                      488            538           726   28    31   41  14
Tertiary amine                       451            458           530   31    32   37  5
Piperidine                           155            131           135   37    31   32  −5
Piperazine                           148            122            79   42    35   23  −20
Indole                               174            114            72   48    32   20  −28
Furan                                 56             20            15   62    22   16  −45

P promiscuous, MP medium promiscuous, S selective
3.3. Assess Promiscuity of Targets
In addition to compound promiscuity, it is important to understand the promiscuity of the safety profiling targets themselves. It is well established that some targets bind a huge number of quite diverse molecules (10). If one of these targets is linked to an undesired phenotypic effect, it should be avoided in the process of drug discovery. This analysis aims to pinpoint the targets in safety profiling data that interact with the highest number of molecules. The workflow shown here needs to be performed for every single target to ultimately obtain a visualization like Fig. 1.
Fig. 1. The most promiscuous targets according to the analysis carried out using Novartis safety profiling data as input.
1. Retrieve all compounds from the safety profiling data that have been tested against the target under investigation.
2. Bin the compound activities into active vs. inactive (cutoff, e.g., 10 µM) to get categories.
3. Determine the percentage of actives: 100 × n(active)/(n(active) + n(inactive)).
4. Which targets are hit more often by compounds? What is the reason for this?
5. How does this match with literature evidence?
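Steps 1 to 3 amount to a per-target count, which can be sketched as follows. The sketch assumes that activities are stored as IC50 values in µM, that a compound counts as active when its IC50 is at or below the cutoff, and that compounds with no measurable response are stored as infinity; the function names and target labels are hypothetical.

```python
def percent_active(ic50_by_target, cutoff_um=10.0):
    """Percentage of tested compounds that are active, per target.

    ic50_by_target: {target: [IC50 in uM for every compound tested]}.
    A compound is counted as active if its IC50 is <= cutoff_um.
    """
    result = {}
    for target, ic50s in ic50_by_target.items():
        n_active = sum(1 for value in ic50s if value <= cutoff_um)
        result[target] = 100.0 * n_active / len(ic50s)
    return result


def rank_targets(ic50_by_target, cutoff_um=10.0):
    """Targets ordered from most to least promiscuous."""
    rates = percent_active(ic50_by_target, cutoff_um)
    return sorted(rates, key=rates.get, reverse=True)


# Hypothetical profiling results; float("inf") marks "no response".
data = {
    "target_1": [0.5, 3.0, 50.0, 100.0],
    "target_2": [20.0, 30.0, 1.0, float("inf")],
}
rates = percent_active(data)
ranking = rank_targets(data)
```

Plotting the ranked percentages as a bar chart gives a view comparable to Fig. 1.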
4. Evaluating Target Similarity
4.1. Principal Component Analysis (PCA) of Chemical Space Similarity
Many targets are hit by chemically similar compounds, i.e., they share a large common activity space. In this section, we examine how overlaps in chemical space for any two targets can be determined. This is important for safety target panel design; if two targets have similar chemical space, one of the targets can often be moved from a primary safety profiling panel into a panel used less often, as its information content is redundant. Also, targets can be identified that carry very similar activity information, which adds understanding about possible off-targets of compounds. The following steps guide the reader through an analysis that evaluates the similarities and differences between safety panel targets in a quick and easy-to-understand way (for the outcome, see Figs. 2 and 3):
1. Compile a comprehensive list of all compounds tested in safety profiling assays where a dose–response curve was valid (actives) or no response occurred (inactives).
2. Canonicalize chemical structures, i.e., make all chemical structures quickly comparable by a computer. For example, canonical SMILES or InChIs can be used.
3. Encode all chemical structures using ECFP descriptors, with an atomic neighborhood size of four or six.
4. Run a PCA using the R-PCA component in Pipeline Pilot.
5. Plot the first two principal components (Fig. 2).
6. Annotate compounds with color coding corresponding to active, inactive, or not tested.
7. Determine the geometric center (mean) of the active compounds per target in PCA score space.
8. Plot the centers in a single plot (Fig. 3).
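Steps 7 and 8 can be sketched once the PCA scores are available. The example below assumes that the first two principal component scores have already been computed elsewhere (e.g., by the PCA component mentioned in step 4) and are stored per compound; the function name, compound identifiers, and coordinates are hypothetical.

```python
import math


def active_centroids(scores, actives_by_target):
    """Mean position of each target's active compounds in PCA score space.

    scores: {compound_id: (pc1, pc2)} from a previously run PCA.
    actives_by_target: {target: [compound_id, ...]} listing the actives.
    """
    centroids = {}
    for target, compounds in actives_by_target.items():
        points = [scores[c] for c in compounds]
        # Average each principal component separately.
        centroids[target] = tuple(
            sum(axis) / len(points) for axis in zip(*points)
        )
    return centroids


# Hypothetical PCA scores and activity annotations.
pca_scores = {"c1": (0.0, 0.0), "c2": (2.0, 0.0), "c3": (2.0, 2.0), "c4": (4.0, 4.0)}
actives = {"target_1": ["c1", "c2"], "target_2": ["c3", "c4"]}
centers = active_centroids(pca_scores, actives)

# Distance between two centers: small values suggest similar chemical space.
separation = math.dist(centers["target_1"], centers["target_2"])
```

Plotting one point per target at its center reproduces the kind of view shown in Fig. 3; the distance between two centers is one simple, assumed way to quantify how close two targets lie.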
Fig. 2. Principal component analysis (PCA) of target chemical space. First, a PCA on all compounds tested against a particular target is performed and the first two principal components are plotted. From there, only the actives against the target under investigation are extracted and their center of mass is determined.
Fig. 3. The first two principal components of the safety profiling panel analysis (dark dots are primary panel). Note that each dot describes exactly one target. The targets located near each other are related in terms of the extended connectivity fingerprints of their chemical modulators.
4.2. Venn Diagram of Active Compound Overlaps
Venn diagrams (invented in 1881 by John Venn) are illustrations used in the branch of mathematics known as set theory. They show all the possible mathematical or logical relationships between sets, normally as overlapping circles. For our analysis, we use the safety profiling targets as sets and compare them with each other by computing a Venn diagram of the compounds active against them. This analysis examines the results of the PCA in more detail and focuses on comparing two targets side by side.
1. Canonicalize the structures of actives against target A (Set 1), target B (Set 2), and all other compounds ever tested in safety profiling (Set 3).
2. "Merge" the canonical SMILES to find out where actives overlap between two targets. Similar targets will have a high overlap of circles.
3. Plot this information in a Venn diagram (using, e.g., the Pipeline Pilot component).
4. Visually analyze two targets side by side to see the extent of their overlap in actives. Importantly, this type of plot is also a visual measure of the promiscuity of a target (the bigger the circle, the higher the promiscuity; see Fig. 4).

4.3. Holistic Data Integration: PCA and Venn Diagrams
The power of the approaches described in Subheadings 4.1 and 4.2 can be better leveraged if they are combined. Initially, two targets are chosen where the chemical space of their actives is very similar. Then, the Venn diagram for these two targets is investigated in more detail. This means that one can quickly drill down from a global similarity of actives to specific compounds.
1. Select closely located targets, as this reflects the chemical similarity of their active ligands.
2. Evaluate the actives tested against both of the targets by using the corresponding Venn diagram of actives (see Fig. 5): Which one of the receptors is the more promiscuous one?
3. How many compounds are only active against one of the receptors? What is the percentage of compounds active against both? Are the actives of one receptor a subset of the actives of the other receptor?
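The questions in steps 2 and 3 reduce to set operations on canonical structure strings. The sketch below assumes that the actives of each target are given as collections of canonical SMILES, that both targets were tested against the same compounds (so that the number of actives reflects promiscuity), and that the percentage of shared actives is taken relative to the union; the function name and the SMILES are illustrative only.

```python
def overlap_summary(actives_a, actives_b):
    """Summarize the Venn-diagram regions for two targets' active compounds."""
    a, b = set(actives_a), set(actives_b)
    both = a & b
    union = a | b
    if len(a) > len(b):
        more_promiscuous = "A"
    elif len(b) > len(a):
        more_promiscuous = "B"
    else:
        more_promiscuous = "tie"
    return {
        "only_a": len(a - b),            # active against A only
        "only_b": len(b - a),            # active against B only
        "both": len(both),               # active against both targets
        "percent_both": 100.0 * len(both) / len(union) if union else 0.0,
        "a_subset_of_b": a <= b,
        "b_subset_of_a": b <= a,
        "more_promiscuous": more_promiscuous,
    }


# Illustrative canonical SMILES for the actives of two hypothetical targets.
target_a = {"CCO", "CCN", "c1ccccc1"}
target_b = {"CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCCl"}
summary = overlap_summary(target_a, target_b)
```

Here every active of target A is also active against target B, so A's actives form a subset of B's and B is the more promiscuous of the two.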
4.4. Target Similarity in Chemical Feature Space
Similarity by “feature space” takes the analysis one step further: instead of simply using chemical identity, the chemical similarity of the ligands of one target is compared to those of another target by using Bayesian modeling to determine the relevant ligand features (i.e., substructures) for each target. This analysis is analogous to the one presented by Bender et al. (8, 11). Feature space
Fig. 4. Sample Venn diagrams generated based on safety profiling data.
Fig. 5. Combining the principal component analysis (PCA) of chemical similarity with a Venn diagram analysis. This view enables the researcher to rapidly analyze which targets are the most similar ones in a safety profiling panel and how promiscuous they are (black primary panel, grey secondary).
comparisons are important in cases where a full data matrix of compounds to targets has not been obtained; in these cases, chemical substructures of inhibitors may be compared for targets that share no ligands in common from safety profiling analyses. For any pairing of preclinical safety pharmacology (PSP) targets, the similarity between the two can be established by computing the Pearson correlation between the normalized feature probabilities from the individual Bayesian models (Fig. 6). To this end, a multicategory Bayesian model, as described earlier, is established for all targets in the safety profiling panel. Then, for any two models being compared, the 10,000 most frequent features of each individual model that are common to the two models are compared. The comparison is achieved by computing a Pearson correlation for the mathematical weights placed on each feature in the respective Bayesian models. This step was found to improve the overlap of chemical substructures between the different PSP datasets, which otherwise would cover very different areas of chemical space. Correlations were normalized per target. In other words, every target was assigned the same overall probability, with different distributions of correlations
Fig. 6. Chemical feature similarities between whole activity classes for the targets used in preclinical safety profiling. Black colors indicate high class similarities, whereas grey colors show low similarities. We compute similarity by using the Pearson correlation of normalized feature probabilities in each Bayesian model. Notably, receptor families share ligand similarity, as would be expected; however, non-family pairings also share features, such as antihistamines (H1 receptor) and hERG blockers (in agreement with the arrhythmia that can be caused by this class of compounds).
over the PSP targets. In contrast to the approach of comparing targets on the basis of their overlap of small-molecule inhibitors (10), determining similarity via statistically correlated features allows one to determine target–target similarity even when no exact chemical structures are in common between datasets. In other words, only important substructures of compounds need to be shared between two targets to find similarity. This is important because data from pooled sources do not contain a complete experimental matrix of all compounds tested against all targets, as mentioned previously.
4.5. Emergent Information from “Indirect Target Similarity”
Another type of pattern that can occur in a safety profiling panel is the indirect correlation between targets. For example, compounds that are active against target A are always inactive against target B. These patterns can be elucidated on a broad scale by using the following approach.
1. Determine the activity difference between compounds tested in any two safety target assays.
2. Bin the difference into classes (e.g., five classes in Fig. 7).
3. Assign compounds to the various bins based on their assay result differences.
4. Compute a pie chart that shows how each bin is populated for the two targets under comparison.
[Fig. 7 class labels (5 classes): Assay 1 much more active; Assay 1 more active; Equally active in both assays; Assay 2 more active; Assay 2 much more active]
Fig. 7. Annotating compounds by binning them into classes based on assay result differences, followed by visualization.
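The four steps of the binning approach can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that activities are expressed as pIC50 values (higher means more active), and the cut points in `EDGES` as well as the function names are hypothetical, since the thresholds behind Fig. 7 are not stated in the text.

```python
from collections import Counter

# The five class labels shown in Fig. 7, ordered from negative to positive
# activity difference (assay 1 minus assay 2).
BIN_LABELS = [
    "Assay 2 much more active",
    "Assay 2 more active",
    "Equally active in both assays",
    "Assay 1 more active",
    "Assay 1 much more active",
]
# Assumed cut points in log units; the chapter does not give the real ones.
EDGES = [-2.0, -0.5, 0.5, 2.0]


def bin_difference(p_assay1, p_assay2):
    """Return the class label for one compound tested in both assays."""
    diff = p_assay1 - p_assay2                    # step 1: activity difference
    for edge, label in zip(EDGES, BIN_LABELS):    # step 2: five classes
        if diff < edge:
            return label
    return BIN_LABELS[-1]


def bin_populations(results):
    """results maps compound id -> (pIC50 in assay 1, pIC50 in assay 2).

    Returns the fraction of compounds falling into each class, i.e. the
    slice sizes of the pie chart for this pair of targets.
    """
    counts = Counter(bin_difference(a, b) for a, b in results.values())  # step 3
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in BIN_LABELS}
    return {label: counts.get(label, 0) / total for label in BIN_LABELS}  # step 4
```

A pair of targets whose compounds fall mostly into the two outer classes would indicate the kind of indirect (anti-correlated) relationship described in this section.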
Scheiber and Jenkins
Fig. 8. Feature clustering based on profiles of assay result difference bins. Each line describes one feature, each column one activity bin (1: inactive – IC50 > 10 µM; 2: moderately active – IC50 > 1 µM and […]
[…] 670,000 structures), molecular interactions (drug–protein, protein–protein, protein–enzymatic reaction, compound–enzymatic reaction) and biological, toxicological and disease associations for genes, proteins and compounds in the global interactome network. Molecular interaction data are derived from full-text literature curation of small-scale experiment data (1, 2), which permits interactome reconstruction from high-confidence interactions annotated with directionality (a→b), mechanism (e.g., binding, phosphorylation, transcriptional regulation) and nature of the interaction (positive or negative effect). Alongside these data, MetaDiscovery includes a suite of tools that leverage the database to translate chemical structure or high-content data
Network and Pathway Analysis of Compound–Protein Interactions
Table 1
Noncomprehensive list of tools and databases for predicting metabolic fate, linking chemical structures to biological targets, and performing functional analysis of molecular targets

Resource                        Supplier

Metabolite prediction software
META                            Multicase (http://www.multicase.com)
MetaDrug                        GeneGo (http://www.genego.com)
MetaSite                        Molecular Discovery (http://www.moldiscovery.com)
METEOR                          LHASA (http://www.lhasalimited.org)

Databases of structural and pharmacological data
DrugMatrix                      Entelos (http://www.entelos.com)
MDDR                            MDL (http://www.mdli.com)
Medchem (a)                     GVK (http://www.gvkbio.com)
MetaBase (b)                    GeneGo (http://www.genego.com)
PubChem                         NCBI (http://pubchem.ncbi.nlm.nih.gov)

QSAR modeling software
ChemTree (c)                    Golden helix (http://www.goldenhelix.com)
Discovery Studio QSAR           Accelrys (http://accelrys.com)
HQSAR                           Tripos (http://www.tripos.com)
MC4PC                           Multicase (http://www.multicase.com)
MDL-QSAR                        MDL (http://www.mdli.com)

Functional analysis and network and pathway tools
Gene ontology                   Gene ontology consortium (http://www.geneontology.org)
Ingenuity pathway analysis      Ingenuity (http://www.ingenuity.com)
MetaCore/MetaDrug               GeneGo (http://www.genego.com)
PathArt                         Jubilant biosystems (http://www.jubilantbiosystems.com)
Pathway assist                  Ariadne genomics (http://www.ariadnegenomics.com)

(a) Medchem data is also included in MetaBase
(b) Content is accessible through MetaDrug and MetaCore
(c) ChemTree QSAR analysis is also available in MetaDrug
(including transcriptomic, proteomic, metabolomic and other systems biology data types) into biological understanding. These tools include structural analysis tools (MetaDrug™): chemical rule-based prediction of human metabolites, chemical reactivity
Brennan, Nikolskya, and Bureeva
and “drug-like” properties, quantitative structure–activity relationship (QSAR) modeling for ADMET properties and biological interactions, and structure similarity and substructure (pharmacophore, toxicophore) searching to identify compounds that may have similar pharmacological or toxicological properties and that can be used for “read-across” (3–5) to infer activities of uncharacterized compounds of similar structure. These structural analysis tools feed into systems biology network reconstruction algorithms and functional ontology enrichment calculation tools (MetaDrug and MetaCore™). These take as input the lists of compounds and proteins identified by the structural analysis tools in MetaDrug and reconstruct networks that connect network components or “nodes” (small molecules, proteins and enzymatic activities) to reveal functional biological units and to identify biological processes, toxicities, diseases, etc. that are statistically associated with the identified nodes (6, 7).
2. Materials
MetaCore and MetaDrug (licenses available from GeneGo Inc., Encinitas, CA) run under a client-server model. Server-side software runs on an Intel-based 32-bit server running RedHat Linux Enterprise 4 (RedHat, Raleigh, NC) and a web server running Apache 1.3.x/mod_perl (http://perl.apache.org/start/index.html). For in-house installations, a server with two or more P4/XEON CPUs (3.2 GHz), 4 GB or more of RAM, and a SCSI HDD with a minimum of 36 GB of storage is required. The operating system must be RedHat Enterprise Linux 3 or 4, SuSE 9.2, or CentOS 4.4; the X development package, JDK 1.5.0, and Oracle 9.2 or 10.2 DBMS and client tools also need to be installed. MetaCore client software runs within a web browser and requires a PC or Macintosh computer (P4 equivalent CPU or better and 1 GB of RAM) with Internet Explorer 6.0, Mozilla Firefox 2.0, or Safari 3.0.4 or higher. Macromedia Flash Player 8 or higher, JRE 1.5.0, and ChemDraw ActiveX/Plugin Net 9.0 Download Edition are also required. MetaDrug requires a PC with IE 6.0 or higher.
3. Methods
3.1. Metabolite Prediction
Drugs and other compounds ingested orally are absorbed through the intestines and pass through the liver before wider systemic exposure. Xenobiotics are extensively metabolized in the liver,
which may lead to a significant drop in their concentrations prior to systemic circulation and result in the presence of distinct drug metabolites in circulation, a phenomenon called first-pass metabolism. Compounds may also be metabolized in other tissues and can undergo further metabolism (second pass) in the liver. An understanding of drug metabolism and the properties of drug metabolites is essential to understanding chemical pharmacology since, in addition to the role of metabolism in deactivation and excretion of xenobiotics, drug metabolites may themselves be pharmacologically active. In many cases, for example the drugs leflunomide (8), sulindac (9) and azathioprine (10), a major metabolite may, in fact, be the active drug component, the parent molecule being a prodrug. Metabolism may also affect the target profile. Activity of metabolites against targets different from those of the parent molecule has implications for drug safety. For example, the antihistamine drug terfenadine was withdrawn from the market due to a risk of sudden cardiac death. Drug–drug interactions with concomitant medications can inhibit the metabolism of terfenadine to its active carboxylate metabolite, a potent histamine H1 receptor inhibitor, and lead to accumulation of the parent drug, which is a potent inhibitor of the human hERG potassium channel (an unintended activity). The hERG channel is a critical component of electrophysiological function in the heart (11) and its inhibition can lead to potentially fatal cardiac arrhythmias. While instances of metabolites having potent activity against targets not hit by the parent molecule are few (with the exception of prodrug metabolites), this potential should also be considered, especially when the drug is targeted against a particular member of a closely related family of proteins with different biological functions (e.g., PPARs, tyrosine kinases) (12).
MetaDrug includes >80 metabolic rules describing common metabolic reactions, categorized according to the particular type of chemical transformation (e.g., aromatic hydroxylation or ester hydrolysis). Phase 1 metabolic rules determine the metabolites due to oxidation, reduction, hydrolysis, cyclization, and decyclization reactions, typically catalyzed by cytochrome P450 enzymes. Phase 2 metabolic rules determine the metabolites of conjugation reactions (e.g., with glucuronic acid, sulfonates, glutathione or amino acids). MetaDrug metabolic rules were derived from the analysis of a manually-annotated database of human drug metabolism (MetaBase™, GeneGo Inc.) including more than 10,000 xenobiotic reactions, more than 1,500 enzyme substrates and 1,000 enzyme inhibitors with kinetic data. MetaDrug also includes 89 rules to predict likely reactive metabolites such as quinones, aromatic and hydroxyl amines, acyl glucuronides, acyl halides, epoxides, thiophenes, furans, phenoxyl radicals, phenols, and aniline radicals. Molecules with reactive groups are marked and highlighted.
Prediction of metabolites is performed during initial compound upload to MetaDrug. The user specifies which metabolic rules should be applied using the Upload Structures wizard. First-pass metabolites are predicted by default; however, access to all rules is provided and individual rules can be selected or deselected by the user (Fig. 1). Predicted metabolites are prioritized into major, conjugated, and minor metabolite categories. An option to include second-pass metabolism for sequential modification of major metabolites is also provided. Metabolites for the query compound can be visualized within the software in the Metabolite Properties viewer (Fig. 2), accessible from within the compound report for the uploaded molecule.
1. Open the My Structures folder in Data Manager
2. Open the compound report for the query compound
3. Open the Metabolites section of the report
4. Click on Metabolite Properties
The structures and their predicted properties may also be exported to a text, Microsoft Excel, or MDL SD file.
Fig. 1. MetaDrug Upload Structures wizard, step 1, showing structure preview window and selection of metabolite generation rules.
Fig. 2. MetaDrug metabolite properties display showing predicted major first-pass metabolites for clozapine. Also shown are the chemical properties of formula, molecular weight, compliance with Lipinski’s “Rule of five” (20), and chemical reactivity of the compound.
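The “Rule of five” compliance flag mentioned in the caption of Fig. 2 can be illustrated with a short check. The thresholds are the commonly cited ones (molecular weight not above 500, logP not above 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors). The function name and the example property values are hypothetical and are not taken from MetaDrug output; how many violations are tolerated is a convention that this chapter does not specify.

```python
def rule_of_five_violations(mol_weight, log_p, h_bond_donors, h_bond_acceptors):
    """Return the list of rule-of-five criteria that a compound violates."""
    checks = [
        ("molecular weight > 500", mol_weight > 500),
        ("logP > 5", log_p > 5),
        ("H-bond donors > 5", h_bond_donors > 5),
        ("H-bond acceptors > 10", h_bond_acceptors > 10),
    ]
    return [name for name, violated in checks if violated]


# Illustrative property values only (not measured or predicted data).
small_molecule = rule_of_five_violations(320.0, 3.0, 1, 4)
large_molecule = rule_of_five_violations(650.0, 6.1, 2, 11)
print(small_molecule)
print(large_molecule)
```

Returning the list of violated criteria, rather than a single yes/no flag, leaves the choice of tolerance to the caller.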
3.2. ADMET Properties and Compound–Protein Target Associations
MetaDrug uses three basic methods to associate compounds with protein targets, which can subsequently be subjected to functional analysis. The first method relies on the extensive database of compound properties held within MetaBase. The database contains over 670,000 compound structures, including >2,900 known drugs and >4,600 endogenous compounds, and information, gathered by curation of full-text articles from the scientific literature, on over 100,000 compound–protein interactions. The compound–protein interactions form “edges” in biological networks (Fig. 3) which, as part of the biological interactome data in MetaBase, are used to reconstruct biological networks linking together proteins, enzymatic reactions and compounds into biologically meaningful functional units (6, 7). This database directly allows compounds with known biological activities to be incorporated into networks, and their pharmacological properties to be further investigated.
Fig. 3. Simple interaction network for the drug clozapine, three P450 enzymes for which clozapine is a substrate, the enzymatic clozapine metabolic reaction catalyzed by the enzymes, and the clozapine metabolite resulting from the reaction. Compound–protein and compound–reaction interactions are shown. Compounds are represented by hexagons, proteins by different solid shapes representing different classes of compound, and enzymatic reactions by rectangles. Edges (interactions) between nodes (network objects) are shown as unidirectional arrows with a mechanism of interaction represented by letters in hexagonal boxes over the arrows (B binding, Z catalysis).
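The network in Fig. 3 can be thought of as a small directed graph of typed nodes and labeled edges. The sketch below is one possible in-memory representation, offered only as an illustration: the class and variable names are hypothetical, the three enzymes are given placeholder names because the caption does not identify them, and the edge directions and the unlabeled compound–reaction edges are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    mechanism: Optional[str]  # "B" = binding, "Z" = catalysis, None = not stated


# Node name -> node type. Enzyme names are placeholders (assumption).
NODES = {
    "clozapine": "compound",
    "P450 enzyme 1": "protein",
    "P450 enzyme 2": "protein",
    "P450 enzyme 3": "protein",
    "clozapine metabolic reaction": "reaction",
    "clozapine metabolite": "compound",
}
ENZYMES = [name for name, kind in NODES.items() if kind == "protein"]
REACTION = "clozapine metabolic reaction"

EDGES = (
    # Assumed direction: the substrate binds each enzyme.
    [Edge("clozapine", enzyme, "B") for enzyme in ENZYMES]
    # Each enzyme catalyzes the metabolic reaction.
    + [Edge(enzyme, REACTION, "Z") for enzyme in ENZYMES]
    # Compound-reaction edges: substrate in, metabolite out.
    + [Edge("clozapine", REACTION, None),
       Edge(REACTION, "clozapine metabolite", None)]
)


def successors(node):
    """Nodes reached from `node` by following one unidirectional edge."""
    return sorted(edge.target for edge in EDGES if edge.source == node)


print(successors("clozapine"))
```

Keeping the mechanism on the edge rather than on the node mirrors the figure, where the letters sit over the arrows.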
Secondly, MetaDrug includes a suite of QSAR models that predict a number of ADME properties for the query compound, such as substrate affinity for, and inhibition of, metabolic enzymes and transporters, water solubility, blood-brain barrier penetration and plasma protein binding. Models are also available for common off-target effects such as hERG channel inhibition and pregnane X receptor (PXR) activation, and for toxicological endpoints such as cytotoxicity and bacterial mutagenicity. QSAR predictions of protein target affinity from these models define a limited number of potential targets for novel molecules and/or their metabolites submitted for analysis. Importantly, MetaDrug allows the user to build custom QSAR models from published or proprietary structure–activity data, permitting the user to include additional targets of interest as well as to address novel chemical space which may be poorly represented in the existing MetaDrug models. The third method for defining compound–target interactions is to perform a similarity search for the uploaded structure and its major metabolites against the database of existing structures and their targets. Potential targets for novel molecules are inferred through “guilt by association” from the known targets of structurally similar compounds in the database.
3.2.1. QSAR Modeling
MetaDrug™ incorporates ChemTree 3.1.1™ (GoldenHelix). In generating QSAR models a simple descriptor is used, composed of two atoms separated by the minimal topological distance between them. The local environment of these atoms is characterized by three values: the atomic number, the number of nonhydrogen
connections, and one half of all associated pi-electrons. Thus, for each structure, n(n − 1)/2 atom pairs are generated (where n is the number of nonhydrogen atoms in the structure) (13). From a dataset of about 100 drug-like structures, approximately 1,000 unique types of atom pairs are generated. ChemTree uses decision tree methodology and a recursive partitioning algorithm to correlate molecular descriptors with the activity/property of the training set to build a model (14). The algorithm recursively splits a heterogeneous training set of molecules into homogeneous sets on the basis of quantitative data and molecular descriptors. Initially, the software finds the molecular descriptor with the highest occurrence in the training set and divides the molecules into several nodes (two or three, as a rule). Taking into account descriptors and activity values, each node is further divided into leaves until no further splitting criteria exist. The resulting trees differ in the number of nodes and leaves. The user can vary the number of descriptors generated based on the extent of recurrence in the training set, the number of leaves in the node, and the Bonferroni-adjusted p-value. The parameters depend on the size of the training set but, as a rule, models are successful with 20 trees, three leaves and a Bonferroni-adjusted p-value of 0.99 for small training sets (hundreds of molecules) and 0.01 for large sets (thousands of molecules). These parameters are the default parameters for custom model building in MetaDrug. Custom QSAR models are saved within the MetaDrug software and can be applied alongside the included models during the initial compound upload (step 2 of the Upload Structures wizard, Fig. 4). Available models include both binary (“yes” or “no” predictions, reported as probability of “yes”) and quantitative value models (e.g., for inhibition constants in µM).
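The atom-pair descriptor and the n(n − 1)/2 count described in this subheading can be made concrete with a small, dependency-free sketch. It is not the ChemTree implementation. It assumes that the topological distance is counted in bonds, that the molecule is connected, and that the caller supplies, for each nonhydrogen atom, its atomic number and its number of associated pi-electrons; all function names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def topological_distances(n_atoms, bonds):
    """Shortest path length in bonds between all atoms (breadth-first search)."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = {}
    for start in range(n_atoms):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if nxt not in seen:
                    seen[nxt] = seen[cur] + 1
                    queue.append(nxt)
        dist[start] = seen
    return dist


def atom_pairs(atoms, bonds):
    """Count atom-pair types for one structure.

    atoms: list of (atomic_number, pi_electrons) for nonhydrogen atoms.
    bonds: list of (i, j) index pairs between nonhydrogen atoms.
    Each atom environment is (atomic number, nonhydrogen connections,
    half of the pi-electrons); each pair type is (env, distance, env).
    """
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    env = [(z, degree[i], pi / 2) for i, (z, pi) in enumerate(atoms)]
    dist = topological_distances(len(atoms), bonds)
    pairs = Counter()
    for i, j in combinations(range(len(atoms)), 2):
        first, second = sorted((env[i], env[j]))  # order-independent pair type
        pairs[(first, dist[i].get(j), second)] += 1
    return pairs


# Ethanol heavy atoms C-C-O: n = 3, so 3 * 2 / 2 = 3 atom pairs.
ethanol = atom_pairs([(6, 0), (6, 0), (8, 0)], [(0, 1), (1, 2)])
print(sum(ethanol.values()))
```

Because pair types are stored in a `Counter`, summing such counters over a training set gives the pool of unique atom-pair types from which descriptors are selected.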
The range of values considered as a “hit” for the QSAR model (and therefore defining a compound–protein association for binding type models) is user selectable (Fig. 4). The default values are 0.5–1 for binary models (i.e., >50% probability of a positive match to the model) and 50 µM, the lowest limit of the training set activity, for binding activity models. Other model types (e.g., % serum protein binding) have default values appropriate for the model and the training set (see Note 1). In order to establish whether a query chemical compound fits within the applicability domain of the QSAR model (i.e., that the training set for the model contained molecules chemically similar to the query molecule), the similarity of the uploaded structure (and its predicted metabolites if the metabolite QSAR prediction option is selected) to the structures used in the training set for the QSAR model can be calculated (maximal Tanimoto coefficient (15)). The higher the similarity value, the greater the applicability of the model. Results of QSAR modeling for
Fig. 4. Structure similarity cutoff and QSAR model selection in the MetaDrug Upload Structures wizard. The Transferase QSAR models selection is expanded, showing individual model names and cutoff values for determining positive matches (values for substrate models are shown as pKm (µM)).
the query compound are available in the compound report for the uploaded structure.
1. Open the My Structures folder in Data Manager
2. Open the compound report for the query compound
3. Select the Models section of the report
Results for QSAR models applied to the predicted metabolites (if this option was selected during compound upload) are available in the individual metabolite reports accessed by clicking on the name of the metabolite in the Metabolites section of the compound report for the query compound.
3.3. Structural Similarity Searching
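Both the applicability-domain check in Subheading 3.2.1 and the similarity search that is the subject of this subheading rest on the Tanimoto coefficient (15). As a minimal sketch, assuming that a fingerprint can be represented as a Python set of fragment identifiers (a simplification of the hashed binary fingerprints used by the software), the coefficient and a cutoff-based search might look as follows. The function names and the toy database are hypothetical; the default cutoff of 0.7 is the one quoted in this chapter.

```python
def tanimoto(fp_a, fp_b):
    """Common fragments divided by the fragments present in either molecule."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(union)


def similar_compounds(query_fp, database, cutoff=0.7):
    """Return (name, similarity) for entries at or above the cutoff,
    most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [item for item in scored if item[1] >= cutoff]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def max_similarity(query_fp, training_fps):
    """Maximal Tanimoto coefficient of the query against a training set,
    usable as a simple applicability-domain indicator."""
    return max((tanimoto(query_fp, fp) for fp in training_fps), default=0.0)


# Toy fragment sets: arbitrary integers standing in for hashed substructures.
query = {1, 2, 3, 4}
database = {"cmpd_A": {1, 2, 3, 4}, "cmpd_B": {1, 2, 3, 5}, "cmpd_C": {7, 8}}
print(similar_compounds(query, database))
```

Lowering the cutoff widens the list of database compounds, and hence the list of inferred targets, at the cost of weaker structural support for each inference.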
The third method of defining compound–target associations is based on the premise that structurally similar compounds have similar biological function. In MetaDrug, structural similarity is based on a 2D fingerprint method as implemented in the Accord Chemistry Cartridge (Accelrys Software, Inc., http://accelrys.com/). Fingerprints are arrays generated for each molecule and contain, as elements, binary hashes representing particular substructures (patterns) within that molecule. Patterns are generated for each atom, each atom and its nearest neighbors (including the bonds that join them), and each group of atoms and bonds connected by paths up to 2, 3, 4, 5, 6, and 7 bonds long. The accuracy of the substructure search depends on the length of the fingerprint and the maximum pattern length. The fingerprint of a query molecule is compared with the fingerprint of each molecule from the database, and the number of common fragments is determined. The similarity between two fingerprints is quantified with the Tanimoto coefficient (15), which is the ratio of the number of common fragments to the total number of distinct fragments present in the two molecules. The Tanimoto coefficient ranges from zero to one, and may be expressed as a percent similarity, where a value of 100% indicates that the compounds are identical. A Tanimoto coefficient of 0.85 (85% similarity) is generally considered stringent enough to identify chemicals with similar biological activities (16); however, compounds targeted against the same receptor often have lower scores. The similarity cutoff used for similarity searching during compound upload is user definable, with a default value of 0.7 (Fig. 4, see Note 2). The list of database compounds matching or exceeding the similarity cutoff is available in the compound report for the uploaded structure.
1. Open the My Structures folder in Data Manager
2. Open the compound report for the query compound
3. Select the ‘Similar Compounds’ section of the report
An expansion of structural similarity searching to allow searching by chemical substructures is also available as a feature of the Search Compound By Structure function (Advanced Search) of
MetaDrug. In this case a structure file or sketched structure (see Note 3) is treated as a component substructure and all compounds containing this structure are returned by the search. To search compounds by substructure:
1. Select the Search tab in the EZ-Start window
2. Select Search Compound by Structure
3. Select Substructure search under Search type
4. Upload the MOL file containing the structure of interest or sketch the molecule in the Structure preview window (see Note 3)
5. Click on Search
6. The search function returns a list of up to 500 database molecules containing the substructure, and any protein targets for those molecules
3.4. Functional Analysis
The end result of applying the above approaches is a list of compounds (from similarity searching of the query compound and its metabolites against the database) and possible protein targets for the query compound, both from QSAR model hits and from known targets of compounds similar to the query compound and its metabolites (Fig. 5; see Subheading 3.5 for a stepwise workflow integrating all of these steps). The list of targets is available in the compound report for the uploaded structure in the Possible Targets section. This list of objects can now be subjected to functional analysis by evaluation of the list for enrichment in several functional ontologies. The list can also be used to reconstruct biological networks linking together the compounds and proteins through edges in the global interactome database in MetaBase using one or more of the variety of network reconstruction algorithms available in MetaDrug and MetaCore.
3.4.1. Enrichment Analysis
Enrichment analysis is a commonly-applied data-mining technique in genomics that uses statistical analysis to evaluate whether a particular collection of genes is “enriched” with genes from a particular class (17). Alongside extensive molecular interaction data, MetaBase also contains public and proprietary functional ontologies that collect genes/proteins into biologically-meaningful categories. MetaDrug contains three public ontologies from the Gene Ontology (GO) consortium (http://www.geneontology.org). These are GO biological processes (GO Processes), GO molecular function (GO Molecular Function), and GO cellular component (GO Localization). Additionally, MetaDrug includes the proprietary ontologies GeneGo Pathway Maps, GeneGo Biological Processes, GeneGo Disease Biomarker Networks, GeneGo Drug Target Networks, GeneGo Toxicity Networks, and GeneGo Metabolic Networks. MetaCore includes the additional ontologies GeneGo Diseases (by Biomarkers) and GeneGo Metabolic Networks (Endogenous).
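To make the idea of enrichment concrete, the sketch below computes one common enrichment statistic, the upper tail of the hypergeometric distribution: the probability of observing at least the actual overlap between a target list and an ontology category if the same number of proteins were drawn at random from the annotated universe. It illustrates the principle only and is not the MetaDrug code; the function names and the toy gene sets are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, category, targets):
    """Upper-tail hypergeometric probability of the observed overlap.

    universe: all proteins annotated within the ontology
    category: proteins belonging to one ontology category
    targets:  the list of possible targets under analysis
    """
    universe = set(universe)
    category = set(category) & universe
    targets = set(targets) & universe
    n_universe, n_category, n_targets = len(universe), len(category), len(targets)
    overlap = len(category & targets)
    # Count the random draws of n_targets proteins that hit the category
    # at least `overlap` times.
    favorable = sum(
        comb(n_category, i) * comb(n_universe - n_category, n_targets - i)
        for i in range(overlap, min(n_category, n_targets) + 1)
    )
    return favorable / comb(n_universe, n_targets)


def rank_categories(universe, ontology, targets):
    """Sort ontology categories by ascending p-value (most significant first)."""
    scored = [(name, enrichment_p_value(universe, genes, targets))
              for name, genes in ontology.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy example: 10 annotated proteins, two categories, three targets.
universe = range(10)
ontology = {"category_1": {0, 1, 2, 3}, "category_2": {4, 5, 6, 7}}
print(rank_categories(universe, ontology, {0, 1, 9}))
```

Exact integer arithmetic with `math.comb` avoids rounding problems for small lists; for genome-scale universes a statistics library would normally be preferred.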
Fig. 5. Subset of possible targets for clozapine (only targets classed as enzymes are shown) showing the predicted target, the database compound driving the prediction, the Tanimoto coefficient percentage of its similarity to the query compound (in this case clozapine was found in the database, hence similarity score is 100%), the effect on the target (where known) and the PubMed ID or patent ID from which the compound–target association was drawn. Targets predicted by QSAR models are also indicated.
GeneGo pathway maps comprise almost 700 pictorial representations of human and rodent signaling and metabolic pathways. ‘GeneGo disease biomarker networks’ is a proprietary ontology of more than 90 manually created networks of genes genetically linked to major human diseases. The disease biomarker ontology contains over 8,000 genes with their known links to over 500 human diseases. The GeneGo drug target ontology includes over 80 manually created networks built around protein targets for all marketed drugs (therapeutic drug targets). The GeneGo toxicity ontology consists of about 400 manually created networks describing major toxic processes. The enrichment calculation uses Fisher’s exact test or the hypergeometric distribution to calculate the probability that the degree of overlap between the list of possible protein targets generated from the query compound analysis and the proteins represented in the functional ontology category can happen by chance given an identical number of proteins selected at random from the universe of proteins annotated within the ontology (17). The p-value generated is used to rank order the categories within each ontology by their significance to the list of targets, thereby identifying (e.g.) maps or biological processes likely to be
affected by compound exposure, revealing predicted pharmacological or toxicological effects of the compound (Fig. 6). Categories showing enrichment can be further explored from the enrichment results by clicking on the name of the category or the enrichment bar graph. For GeneGo Pathway Maps, the particular map representing the enriched process will be displayed (Fig. 7). Information on tissue or species specificity of component map nodes, and on their involvement in diseases or as known targets of therapeutic drugs, can also be displayed on maps to provide additional biological knowledge (Fig. 7). On clicking
Fig. 6. Enrichment analysis results for clozapine targets against GeneGo Pathway Maps and GeneGo Biological Processes. Horizontal bars indicate the negative log of the p-value from the hypergeometric distribution calculation.
Fig. 7. (a) GeneGo pathway map G-protein signaling_ACM regulation of cAMP level identified through enrichment analysis of the possible targets of clozapine in MetaDrug. Compounds are represented by hexagons, proteins by solid shapes representing different classes of compound, and enzymatic reactions by gray rectangles. Protein–protein, compound–protein and compound–reaction interactions are shown as unidirectional arrows, and a mechanism of interaction represented by letters in hexagonal boxes over the arrows. (b) Visualization options panel for pathway maps showing the selection of clozapine targets and genes involved in schizophrenia to be highlighted on the map. Targets of clozapine are identified on the map with hexagons containing a P and genes involved in the pathophysiology of schizophrenia are identified with hexagons containing the letter D. Gene expression data values associated with network objects are displayed as vertical thermometers. Downward-pointing thermometers represent down-regulation of the gene and upward-pointing thermometers represent up-regulation. Maps are in full color in software, and expression values are colored red (for up-regulation) and blue (for down-regulation) for easier visualization.
a prebuilt network, such as GeneGo Biological Processes, the network will be displayed. For ontologies consisting simply of gene content (e.g., GOs), the user will be taken to the Network options page from where they can use the various network algorithms available in MetaCore to link together the genes within the category using the interaction data in MetaBase (see Subheading 3.4.2). The results of enrichment analysis calculations for possible targets can be accessed directly from the compound report for the uploaded molecule:
1. Open the My Structures folder in Data Manager
2. Open the compound report for the query compound
3. In the Possible Targets section of the report, select the targets identified for the query compound and its metabolites to be included in the enrichment analysis
4. Open the Enrichment Analysis section of the report and review the results for the ontologies of interest
5. Click on the enrichment chart or the category name to view the details of maps and prebuilt networks or to build de novo networks from simple gene list categories
3.4.2. Network Reconstruction
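Network reconstruction, the subject of this subheading, links a list of targets through known interactions. As a rough illustration of the two simplest strategies discussed here (direct interactions and shortest paths), the sketch below runs a plain breadth-first search over a directed interaction graph. It is not the MetaCore implementation, whose details are not given in this chapter; the toy graph, the node names and the function names are hypothetical.

```python
from collections import deque
from itertools import permutations


def shortest_path(graph, start, goal):
    """Breadth-first search on a directed graph {node: [successors]}.
    Returns the list of nodes on one shortest path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def direct_interactions(graph, targets):
    """Edges that connect two members of the target list directly."""
    targets = set(targets)
    return sorted((a, b) for a in targets for b in graph.get(a, ()) if b in targets)


def shortest_path_network(graph, targets):
    """Nodes lying on shortest directed paths between ordered target pairs.
    Targets that cannot be connected to any other target are left out."""
    nodes = set()
    for a, b in permutations(targets, 2):
        path = shortest_path(graph, a, b)
        if path:
            nodes.update(path)
    return nodes


# Toy interactome: T1 and T2 are linked only through the hidden node H.
graph = {"T1": ["H"], "H": ["T2"], "T2": [], "T3": []}
targets = ["T1", "T2", "T3"]
print(direct_interactions(graph, targets))
print(sorted(shortest_path_network(graph, targets) - set(targets)))
```

In the toy example the direct-interaction approach finds no edges at all, whereas the shortest-path approach recovers the connecting node, which is the behavior the text attributes to sparse target lists.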
By using the molecular interaction data of known protein–protein and compound–protein interactions in MetaBase, biological associations between possible compound targets, or between compounds similar to the query compound, can be mapped. The resulting biological networks reveal valuable information on the broader biological space of the predicted compound targets, such as hidden nodes that may act as hubs linking together targets in coordinated biological processes, signaling and metabolic pathways in which the targets participate, and diseases and pathological processes which may be impacted by compound interactions with the targets. MetaCore and MetaDrug use a variety of different algorithms for generating networks from input lists of genes and proteins. The simplest approach is to connect the target proteins in the list through all Direct interactions between them. In most cases, however, a list of targets will not form a coherent, connected cluster, as the lists are inherently sparse and lack critical hubs through which the individual targets may elicit their effects. A more rigorous Shortest path algorithm can be applied to reconstruct network modules by bringing in additional nodes (proteins, compounds, enzymatic activities) that connect the targets. For larger lists of targets, the Shortest path algorithm may generate large, complex and difficult-to-interpret networks. In this case the Analyze networks algorithm can be applied. Analyze networks generates overlapping subnetworks of up to
Fig. 8. (a) Subnetwork generated using the Analyze networks algorithm on the list of possible targets identified by MetaDrug for clozapine. Edges are faded and mechanisms hidden to aid visualization. (b) Disease filter page showing disease associations for which proteins in the network are enriched. Root nodes on the network (proteins from the original list of targets) are circled and network nodes not associated with schizophrenia are faded, showing the enrichment of this network both with biological targets of the drug clozapine and with proteins associated with the development of schizophrenia. Gene expression data values associated with network objects are displayed as shaded circles adjacent to the nodes. Networks are in full color in software, and expression values are colored red (for up-regulation) and blue (for down-regulation) for easier visualization.
50 nodes by expanding out from the objects in the supplied list, and calculates enrichment of the networks with input data, canonical pathways and biological processes (18, 19) (Fig. 8). Other advanced algorithms in MetaCore provide the option to weight the network building around transcription factors, receptors or canonical pathways. These algorithms are useful for building networks with a particular focus in mind. To generate a network from similar compounds to, or possible targets of, the query compound:
1. Open the My Structures folder in Data Manager
2. Open the compound report for the query compound
3. In the Possible Targets section of the report, select targets identified for the query compound and its metabolites to be included in the network building OR in the Similar Compounds section of the report, select the compounds to be used for network building. Click on Build network
4. On the Network options page, select the desired network building algorithm. Click on Build network
5. If Analyze networks was employed, a list of overlapping subnetworks is presented (default number of nodes = 50, see Note 4) named by their key nodes and displaying the number of root nodes (objects from the list of possible targets used to generate the network), and GO processes
enriched within the subnetwork (giving insight into the key biological functions represented by the network). Networks are listed by ascending p-values from the hypergeometric distribution calculation. Z-score (which ranks the subnetworks according to saturation with the objects from the initial list of targets) and G-score (a modification of the Z-score based on the number of canonical pathways used to build the network) are also shown. Subnetworks can be visualized by clicking on the network name.
Once the network algorithm has generated and displayed the network, there are multiple tools available for enhanced visualization, manipulation and interpretation of the networks. All network objects are interactive. Objects can be selected, moved, removed or hidden. “Mousing” over a node or edge brings up a descriptive “pop-up” window for the object. Clicking on an edge (or interaction mechanism) opens a page with more detailed information and literature references. Right clicking on a node reveals a contextual menu with several options for manipulating the object (hide, remove, expand etc.) and a link to the detailed information page for the node. A button bar in the network window contains directly accessible functions for manipulating the network, visualization tools and image export. Additional network objects can be added to the network via a search tool accessible via the Add tab at the top left of the network window. Other tabs provide access to advanced filtering tools to allow further manipulation and interpretation of the network. For example, clicking the Diseases tab brings up a list of diseases with protein associations to the disease (“disease biomarkers”) represented in the network. From this list, proteins involved in a particular disease can be “traced” (highlighted against a faded background) (Fig. 8) or “marked” (selected for further manipulation such as hiding all nonselected – or non-disease-associated – nodes so that a disease-specific network is generated). Similarly, networks can be filtered for tissue, particular species orthologs, GO biological processes etc.
3.5. Automated Analysis Workflows
To simplify the user interface and enhance the ease of use for the end user, common tasks such as structure upload and experimental analysis are automated with workflows and wizards.
3.5.1. Compound Upload and Analysis
1. In the EZ Start window, select the Upload Data tab and select Upload Structures (alternatively select Upload Structures from the file menu in Data Manager).
2. In the Data Analysis Wizard (MetaDrug™) window (step 1, Fig. 1) click on Browse under Upload CDX, SDF or MOL file and open an appropriate chemical structure file. If multiple structures are to be batch-uploaded from an SDF file, individual compound reports will be generated for each structure
unless the Include all compounds in SDF file into one Report box is checked. Alternatively select From sketch under Data source and sketch the structure in the Preview window (see Note 3).
3. Enter a name for the report and select metabolite prediction preferences for prioritization (ranking of metabolites by likelihood of occurrence), generation of second pass metabolites if desired and for individual phase 1 and 2 metabolism rules to be applied.
4. At step 2 of the Data Analysis Wizard (MetaDrug™) (Fig. 4) select the similarity cutoff to be applied to generate the list of structurally similar database compounds and the molecular properties and QSAR models to be calculated. Default activity values for defining matches to individual QSAR models can also be changed here.
5. If QSAR predictions and similarity searching are to be applied to the predicted metabolites of the query compound, select these options under Advanced options. By default the structural similarity search returns only compounds with one or more associated protein targets. There is also an option here to return similar molecules that do not have any associated targets.
6. The next step of the wizard opens a self-refreshing progress monitor. Once processing progress has reached 100%, the upload and analysis are complete and the compound report can be accessed in the My Structures folder in Data Manager (see Note 5).
3.5.2. Comparing Compound Effects
The predicted biological activities of different compounds, based on possible targets identified through MetaDrug analysis, can be compared using the Compare Compounds workflow. The workflow retrieves lists of targets for similar compounds, calculates common, similar and unique subsets of proteins between these lists, and then performs enrichment analysis across the different protein subsets to compare pathways and processes differentially affected by these compounds and reveal common and individual pharmacological and toxicological properties.
1. Open the My Structures folder in Data Manager, select two or more compound reports to compare and click on Activate selected.
2. In the EZ Start window, select the Analyze Data tab and click on Compare Compounds. Alternatively select Compare Compounds Workflow from the Tools menu in Data Manager.
3. In the Compare Compounds window, select the Tanimoto similarity cutoff for defining similar compounds (and therefore possible targets). In the Effects box, choose whether to
compare only targets with a positive effect (activation), only targets with a negative effect (inhibition), only targets with an unspecified effect (unspecified), or to use all effect types. The Operation option determines what targets are included in the comparison. If Intersection is chosen, only the possible targets for the uploaded structures will be used. If Close Neighbors is selected, proteins that are in direct association with possible targets of compared compounds will also be included to expand the number of proteins used in enrichment analyses (see Note 6).
4. Overlap in possible targets (common) and unique targets for each compound are calculated, as well as similar targets (targets predicted for n − 1 compounds). The results are displayed graphically. The lists can be visualized or exported through contextual menus activated by right clicking on the graphical display.
5. Results of enrichment analyses for the various gene sets are calculated and displayed so that similarities and differences between compounds in their predicted activities against GeneGo pathway maps, GO biological processes and toxicity networks can be investigated. Enrichment results are ranked by the lowest p-values for the common target set by default. The ranking for each enrichment ontology can be changed using the Sorting method option, and results can also be ranked by the lowest p-value for the unique or similar target lists.
6. Following completion of the enrichment analyses, export the results as a Compare Compounds Workflow Report. If Create report is chosen, three optional files may be generated. Target Analysis exports an Excel file with two worksheets, Compounds for Input Molecules and Targets for Input Molecules. Enrichment Analysis exports an Excel file containing results of the functional ontology enrichment analyses (ten top-scoring categories in each ontology with p-values) with user-defined sorting parameters. Summary Report is an in-depth summary containing compound property calculations, enrichment results and pathway map images. If the Create report on networks statistics and topology option is selected, the report also gives the most relevant (most highly enriched) networks for each compound, obtained by applying the Analyze Network algorithm with default settings to the target list of each compound. This analysis is also accessible in the workflow window via the Network statistics button.
3.6. Visualizing Data on Maps and Networks
Although this chapter deals with in silico predictive analysis of compounds based on their structures, and mapping of predicted targets to biological maps, networks and functional ontologies
to identify potential pharmacological and toxicological consequences of compound exposure, MetaCore and MetaDrug are also powerful platforms for analysis of empirical experimental data from data-rich systems biology experiments such as microarray gene expression analysis, proteomics, metabolomics, and other high data content experiments. The ability to integrate prospective in silico analysis with analysis of empirical data is a powerful feature that links the various capabilities across different stages of the drug or chemical development process. In silico analysis may be performed very early in the process, perhaps before compounds are even synthesized, to prioritize structures or scaffolds for follow-on development or to direct medicinal chemistry efforts. As programs progress and empirical data are generated using the same compounds, those data can be added to the earlier analysis, and the hypotheses developed by the in silico approach can be further investigated and refined. To display experimental results on maps and networks generated during MetaDrug structural analysis:
1. Navigate to the Experiments folder in Data Manager, select experimental datasets of interest (e.g., gene expression or protein or metabolite level data) and click on Activate selected to activate the datasets for analysis.
2. Select Reference list and threshold under the Tools menu and apply value and/or statistical thresholds to define differentially affected genes, proteins or metabolites.
3. Also in the Reference list and threshold menu, optionally select a Background List such as a particular microarray, a custom gene list or a network object list, to define the “universe” of possible network nodes for the most accurate statistical calculations. Click OK.
4. While experimental datasets are active, networks generated or pathway maps displayed will show experimental data values for any nodes for which experimental values exist in the dataset and which pass the applied thresholds. Multiple datasets, representing (e.g.) time-course or dose-response experiments, can be displayed consecutively.
On GeneGo pathway maps, data values are displayed as vertical thermometers; blue represents down-regulation of the node and red represents up-regulation (Fig. 7). On networks, data values are displayed as colored circles; red circles represent up-regulation and blue circles represent down-regulation (Fig. 8). Clicking on expression values in maps or networks opens up a more detailed display showing the values and p-values (if available) for individual experiments. Experimental effects of a compound on biological processes identified by in silico analysis can thereby be investigated.
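The thresholding and Background List steps above can be illustrated outside the software. The following is a minimal sketch, not MetaCore's implementation: it assumes a dataset given as log2 fold changes with p-values, applies a value and a statistical threshold, and then computes a hypergeometric enrichment p-value for one pathway against the chosen background "universe". The gene identifiers, threshold values, and function names are hypothetical.

```python
from math import comb

def select_changed(dataset, min_abs_log2_fc=1.0, max_p=0.05):
    """Return {gene: 'up' | 'down'} for genes passing both thresholds."""
    changed = {}
    for gene, (log2_fc, p_value) in dataset.items():
        if abs(log2_fc) >= min_abs_log2_fc and p_value <= max_p:
            changed[gene] = "up" if log2_fc > 0 else "down"
    return changed

def hypergeom_pvalue(universe_size, category_size, list_size, overlap):
    """P(X >= overlap) for list_size objects drawn without replacement from
    a universe in which category_size objects belong to the category."""
    upper = min(category_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(universe_size - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe_size, list_size)

# Hypothetical toy data: gene -> (log2 fold change, p-value).
dataset = {
    "G1": (2.1, 0.001), "G2": (-1.6, 0.010), "G3": (0.3, 0.400),
    "G4": (1.2, 0.200), "G5": (-2.4, 0.003), "G6": (1.8, 0.020),
}
background = {f"G{i}" for i in range(1, 21)}   # the "universe": 20 measured genes
pathway = {"G1", "G2", "G5", "G9"}             # one hypothetical pathway map

changed = select_changed(dataset)              # G1 up, G2 down, G5 down, G6 up
hits = set(changed) & background
overlap = len(hits & pathway)
p = hypergeom_pvalue(len(background), len(pathway & background), len(hits), overlap)
print(changed, overlap, round(p, 4))
```

With these toy numbers, three of the four differentially affected genes fall on the pathway. Enlarging the background from 20 genes to a whole-genome list would make the same overlap look far more significant, which is why the choice of Background List matters for the statistics.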
4. Notes
1. The default binding affinity value for defining the lowest affinity accepted as a hit in quantitative binding type QSAR models is 50 µM. This value represents a fairly nonstringent cutoff. The user can adjust this value to suit the expected maximum biological concentrations of the compound (if known). The high affinity limit is also user-selectable; however, values exceeding the lower limit of values represented by the training set for the model may be less precise than those falling within the range of training values.
2. A higher Tanimoto coefficient cutoff (0.8 and above) will return compounds likely to be pharmacologically related to the query structure; however, too stringent a cutoff may result in few or no matches being returned, particularly when novel chemical scaffolds are being investigated. We recommend beginning with a high stringency search and reducing the stringency as necessary if too few matches are found. If no or few similar molecules are identified by similarity searching, a substructure search for a known pharmacophore may identify additional molecules with related pharmacology.
3. MetaCore utilizes the ChemDraw browser plug-in to allow the user to draw and edit chemical structures directly in the MetaCore interface. To enable structure drawing and editing for searches, ChemDraw ActiveX/Plugin Pro version 9.0 is required (CambridgeSoft Corp., http://www.cambridgesoft.com).
4. The default number of nodes for overlapping subnetworks generated by the Analyze network algorithm is 50. This generally gives an optimal number of nodes and edges for visualization and manipulation on a typical computer monitor. This default value can be increased or decreased under Advanced options on the Network options page.
5. When first uploading a new compound, it is necessary to refresh the browser window before the compound report becomes visible in Data Manager.
6. The Close neighbors option is useful to expand the number of targets used in enrichment analysis in cases where similarity searching returns only a small number of possible targets, in which case the results of enrichment analyses may not be accurate.
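The search strategy recommended in Note 2 (start with a stringent Tanimoto cutoff and relax it only if too few matches are found) can be sketched as follows. This is an illustration under stated assumptions rather than the MetaDrug implementation: fingerprints are represented as sets of "on" bit indices, and the compound names, cutoff series, and minimum hit count are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def similar_compounds(query_fp, database, cutoffs=(0.9, 0.8, 0.7, 0.6), min_hits=2):
    """Try cutoffs from most to least stringent; stop at the first one that
    returns at least min_hits compounds. Returns (cutoff used, hits)."""
    scored = sorted(
        ((tanimoto(query_fp, fp), name) for name, fp in database.items()),
        reverse=True,
    )
    hits = []
    used = None
    for cutoff in cutoffs:
        used = cutoff
        hits = [(name, round(score, 3)) for score, name in scored if score >= cutoff]
        if len(hits) >= min_hits:
            break
    return used, hits

# Hypothetical toy fingerprints.
query = {1, 2, 3, 4, 5, 6, 7, 8}
database = {
    "cmpd_A": {1, 2, 3, 4, 5, 6, 7, 9},
    "cmpd_B": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "cmpd_C": {1, 2, 3, 10, 11, 12},
    "cmpd_D": {1, 2, 3, 4, 5, 6, 10, 11},
}

cutoff, hits = similar_compounds(query, database)
print(cutoff, hits)
```

In this toy case the 0.9 and 0.8 cutoffs return too few compounds, so the search settles at 0.7. If even the lowest cutoff returns too few hits, the function simply reports what it found, at which point a substructure search, as suggested in Note 2, would be the next step.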
References
1. Ekins, S., Bugrim, A., Nikolsky, Y., and Nikolskaya, T. (2005) Systems biology: applications in drug discovery. In: Gad, S. C. (ed.) Drug Discovery Handbook. Wiley-Interscience, Hoboken, pp. 123–183.
2. Nikolsky, Y., Nikolskaya, T., and Bugrim, A. (2005) Biological networks and analysis of experimental data in drug discovery. Drug Discov. Today 10, 653–662.
3. Richard, A. M., Yang, C., and Judson, R. S. (2008) Toxicity data informatics: supporting a new paradigm for toxicity prediction. Toxicol. Mech. Methods 18, 103–118.
4. Bender, A., Young, D. W., Jenkins, J. L., et al. (2007) Chemogenomic data analysis: prediction of small-molecule targets and the advent of biological fingerprint. Comb. Chem. High Throughput Screen. 10, 719–731.
5. Young, D. W., Bender, A., Hoyt, J., et al. (2008) Integrating high-content screening and ligand-target prediction to identify mechanism of action. Nat. Chem. Biol. 4, 59–68.
6. Ekins, S., Andreyev, S., Ryabov, A., et al. (2006) A combined approach to drug metabolism and toxicity assessment. Drug Metab. Dispos. 34, 495–503.
7. Ekins, S., Bugrim, A., Brovold, L., et al. (2006) Algorithms for network analysis in systems-ADME/Tox using the MetaCore and MetaDrug platforms. Xenobiotica 36, 877–901.
8. Herrmann, M. L., Schleyerbach, R., and Kirschbaum, B. J. (2000) Leflunomide: an immunomodulatory drug for the treatment of rheumatoid arthritis and other autoimmune diseases. Immunopharmacology 47, 273–289.
9. Etienne, F., Resnick, L., Sagher, D., Brot, N., and Weissbach, H. (2003) Reduction of sulindac to its active metabolite, sulindac sulfide: assay and role of the methionine sulfoxide reductase system. Biochem. Biophys. Res. Commun. 312, 1005–1010.
10. Murrell, G. A. and Rapeport, W. G. (1986) Clinical pharmacokinetics of allopurinol. Clin. Pharmacokinet. 11, 343–353.
11. Smith, S. J. (1994) Cardiovascular toxicity of antihistamines. Otolaryngol. Head Neck Surg. 111, 348–354.
12. Humphreys, W. G. (2007) Drug metabolism research as an integral part of the drug discovery process. In: Zhang, D., Zhu, M., and Humphreys, W. G. (eds.) Drug Metabolism in Drug Design and Development. Wiley-Interscience, Hoboken, pp. 239–260.
13. Rusinko, A. III, Farmen, M. W., Lambert, C. G., Brown, P. L., and Young, S. S. (1999) Analysis of a large structure/biological activity data set using recursive partitioning. J. Chem. Inf. Comput. Sci. 39, 1017–1026.
14. Blower, P. E. and Cross, K. P. (2006) Decision tree methods in pharmaceutical research. Curr. Top. Med. Chem. 6, 31–39.
15. Willett, P., Winterman, V., and Bawden, D. (1986) Implementation of nearest-neighbor searching in an online chemical structure search system. J. Chem. Inf. Comput. Sci. 26, 36–41.
16. Matter, H. (1997) Selecting optimally diverse compounds from structure databases: a validation study of two-dimensional and three-dimensional molecular descriptors. J. Med. Chem. 40, 1219–1229.
17. Khatri, P. and Draghici, S. (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21, 3587–3595.
18. Ekins, S., Nikolsky, Y., Bugrim, A., Kirillov, E., and Nikolskaya, T. (2007) Pathway mapping tools for analysis of high content data. Methods Mol. Biol. 356, 319–350.
19. Shipitsin, M., Campbell, L. L., Argani, P., et al. (2007) Molecular definition of breast tumor heterogeneity. Cancer Cell 11, 259–273.
20. Lipinski, C. A., Lombardo, F., Dominy, B. W., and Feeney, P. J. (2001) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26.
Chapter 11
The Flexible Pocketome Engine for Structural Chemogenomics
Ruben Abagyan and Irina Kufareva
Summary
Biological metabolites, substrates, cofactors, chemical probes, and drugs bind to flexible pockets in multiple biological macromolecules to exert their biological effect. The rapid growth of the structural databases and sequence data, including SNPs and disease-related genome modifications, complemented by the new cutting-edge 3D docking, scoring, and profiling methods has created a unique opportunity to develop a comprehensive structural map of interactions between any small molecule and biopolymers. Here we demonstrate that a comprehensive structural genomics engine can be built using multiple pocket conformations, experimentally determined or generated with a variety of modeling methods, and new efficient ensemble docking algorithms. In contrast to traditional ligand-activity-based engines trained on known chemical structures and their activities, the structural pocketome and docking engine will allow prediction of poses and activities for new, previously unknown, protein binding sites, and new, previously uncharacterized, chemical scaffolds. This de novo structure-based activity prediction engine may dramatically accelerate the discovery of potent and specific therapeutics with reduced side effects.
Key words: Pocketome, Chemical biology, Flexible docking, Ensemble docking, Drug screening, Activity prediction, SCARE algorithm, Binding site, Virtual ligand screening
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575, DOI 10.1007/978-1-60761-274-2_11, © Humana Press, a part of Springer Science+Business Media, LLC 2009
1. Introduction
Understanding the interactions of all possible chemicals with all possible proteins represents the ultimate goal of chemogenomics. Identification of the subset of protein targets along with detailed binding geometry enables the rational design and optimization of novel agents with desired genomic profiles. The collection of proteins and their folds is defined by the size of a particular genome and its sequence variations. One can imagine a perfect chemogenomics world in which all protein structures are determined by high-resolution crystallography,
in an apo form and in complexes with diverse ligands; and all allosteric and transient ligand-binding pockets are identified in these structures. This set can be converted to a finite collection of binding pockets P1, P2, …, each in several possible conformations (Fig. 1). In contrast to the limited Pocket dimension, the list of ligands is open-ended and includes biological substrates, cofactors, metabolites, therapeutic candidates, and drugs as well as a practically unlimited list of virtual chemical compounds. Cleverly combined with binding data on known ligands and efficient algorithms, our pocket collection can be redesigned to become a series of powerful “recognition devices,” enabling identification of novel chemicals that bind to each pocket, prediction of their binding geometry, and evaluation of their binding affinity – the predictive flexible Pocketome engine. In this chapter, we will describe the progress toward the implementation and the gradual improvement of such an engine, the arising challenges, and the approaches to address them. The collection of experimentally determined ligand pockets has been
Fig. 1. A general representation of complete chemogenomics matrix. Each column, P1, P2, … represents a conformational ensemble of a protein pocket. Different functional states (e.g., agonist bound and antagonist bound) and different locations on the same protein are considered separate pockets. SNPs and mutations may lead to variations of the same pocket. Each row represents a chemical compound. The chemicals are metabolic compounds, drug candidates, and other chemical substances that are relevant for a biological system, including virtual compounds that have never been synthesized. The goal of this structural chemogenomics engine is to report, if experimental data is available, or predict the following: (1) the binding geometry of each compound to the pockets it can bind, and (2) an estimate of the binding free energy eij. While the screening application searching for potential binders among virtual or available chemicals is widely used, comparing eij for the same compound with different pockets (or proteins), a.k.a. specificity profiling, requires new approaches.
previously used to analyze ligand-protein interactions (1), compare binding sites with each other (2), or develop algorithms to predict locations of uncharacterized druggable pockets (3). We will show how these concepts can be expanded to allow (1) using both experimental and predicted pockets, (2) modeling the pocket flexibility, (3) predicting binding geometry and critical atomic interactions, and (4) predicting specificity for compounds based on a new chemical scaffold. The Pocketome structures come from two principal sources: (1) high-resolution experimental structures determined by crystallography or NMR; and (2) computational structure prediction, modeling by homology and/or conformer generation. The 54,000 structures (as of November 2, 2008) deposited in the Protein Data Bank (PDB) (4) represent about 11K proteins (clustered at the 95% identity level) and 13.6K nonoverlapping protein domains. About 3K proteins (of the 11K) are from human or closely related mammals. Of those structures, 85% are solved by X-ray crystallography, with about half of them at better than 2.2 Å resolution. Most of the remaining 15% are determined by NMR. For some proteins, the pocket variation has already been captured by multiple experimental structures with or without ligands, e.g., over a hundred PDB entries for the human CDK2 kinase. For others, a delicate and laborious process of building and validating initial models needs to be performed. Every second protein domain in the PDB is represented by more than one PDB entry: ~20% of proteins have two structures, and the remaining 30% have more than two structures. Some of them are mutants (e.g., ~400 T4 lysozyme structures from Brian Matthews’s laboratory), but in most cases these multiple structures represent snapshots of the pocket conformational diversity. Furthermore, many entries contain more than one chain in an asymmetric unit.
These protein structures related by noncrystallographic symmetry can also be used as a source of multiple pocket conformations. The noncrystallographic symmetry-related subunits increase the number of domains already represented by multiple experimental conformations from 50% to the overall level of 75% (Fig. 2). About 5% of the domains are represented by more than 30 copies. The abundance of experimentally determined protein structures should not, however, obscure the fact that for the majority of protein domains, no structural information is available at all. The coverage of the mammalian proteome by experimentally determined structures is still only about 10–15% and varies between the protein families. Structures of only four G-protein coupled receptors (GPCR), out of about 900, have been determined by crystallography, and only about one-third of the human kinases and the same fraction of the nuclear receptors. For many of those proteins, models by homology can be built (5), although the
Fig. 2. A histogram of experimental structural variability of the 11,168 protein domains in the Protein Data Bank (PDB). Twenty-five percent of protein domains are represented by a single structure, and 5% are represented by more than 30 structures. Three-quarters of the domains are represented by more than one conformation. The additional conformers are found in either different PDB entries or noncrystallographic symmetry-related domains of the same entry.
quality of those models and their usefulness as ligand recognition devices may vary widely. Whenever some high-affinity ligands for a given pocket are known, this quality may be improved through so-called ligand-guided modeling (6). Even with experimentally determined pockets, the availability of two or more structures does not guarantee sufficient coverage of the pocket conformational space. Similarly, homology modeling provides only a starting conformation that may or may not be sufficiently accurate to explain the binding of any ligand. To turn these models into powerful ligand recognition devices, one needs to complement them with additional tools for modeling pocket conformational variability. The strength of our proposed Pocketome engine is best revealed in cases when the pocket models are accurate and cover the essential conformational space. For such cases, the Pocketome can
provide answers and explanations to many essential chemogenomics questions, including the effect of single nucleotide polymorphisms (SNPs) and mutations and the inter-species differences. It can also help predict the binding pose and binding affinity of new chemicals to existing pockets, as well as the activity of compounds against new proteins including orphan receptors. Indeed, combined with an accurate docking and scoring method, a good three-dimensional pocket model has unmatched specificity for the right ligand. In (7) we demonstrated that the high-resolution structure of bacteriorhodopsin ranks retinal first among 7,000 metabolites and bio-substrates. The pocket structure of malarial enoyl-acyl carrier protein reductase (ENR) recognized its cognate ligand as the top-scoring compound out of 200,000 drug-like molecules (C. Smith et al., unpublished). Structure inaccuracies and induced fit effects represent the major challenges on the way to achieving this high predictive power. The rest of this chapter is organized as follows. Subheading 2.1 is focused on algorithms, approaches, and challenges of the Pocketome compilation. Subheadings 2.2–2.4 concentrate on the three major types of chemogenomics applications: ligand binding pose prediction, screening, and activity profiling.
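The compound-by-pocket matrix of Fig. 1 and the two ways of reading it (screening down a column, profiling along a row) can be made concrete with a small sketch. It is only an illustration: the per-conformer scores are invented numbers, the compound and pocket names are hypothetical, and taking the best (lowest) score over a pocket's conformers is an assumed, commonly used way of combining an ensemble, not necessarily the scheme used by the authors.

```python
def ensemble_score(conformer_scores):
    """Assumption: the ensemble score is the best (lowest) score obtained
    over all conformers of one pocket."""
    return min(conformer_scores)

def build_matrix(raw):
    """raw[compound][pocket] is a list of per-conformer docking scores;
    the result holds one estimate e_ij per compound-pocket pair."""
    return {
        compound: {pocket: ensemble_score(scores) for pocket, scores in pockets.items()}
        for compound, pockets in raw.items()
    }

def screen(matrix, pocket):
    """Screening: rank compounds for one pocket (one column), best first."""
    return sorted(matrix, key=lambda compound: matrix[compound][pocket])

def profile(matrix, compound):
    """Profiling: rank pockets for one compound (one row), best first."""
    return sorted(matrix[compound], key=lambda pocket: matrix[compound][pocket])

# Hypothetical scores (lower is better); P1 has three conformers, P2 has two.
raw = {
    "ligand_A": {"P1": [-9.5, -12.0, -8.0], "P2": [-5.0, -6.5]},
    "decoy_1":  {"P1": [-7.0, -6.0, -7.5],  "P2": [-8.2, -7.9]},
    "decoy_2":  {"P1": [-4.0, -5.5, -3.0],  "P2": [-6.0, -9.1]},
}

matrix = build_matrix(raw)
print(screen(matrix, "P1"))
print(profile(matrix, "ligand_A"))
```

Ranking within a column only requires scores that are comparable across compounds for one pocket. Ranking along a row compares scores across different pockets, which, as the caption of Fig. 1 notes, requires new approaches; the naive `profile` function above should therefore be read as a placeholder for that step.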
2. Methods
2.1. Compiling the Flexible Pocketome
The 2008 release of the PDB (4) contains more than 10,000 unique protein domains, many of them crystallized with relevant small molecules, and many represented by more than one high-quality structure. This presents a unique possibility for assembling a Pocketome core of purely experimental data. The structures need to be collected, superimposed, and clustered into multiconformational pockets by a procedure allowing minimal manual intervention with the possibility of timely updates. Advanced structure quality control techniques should be employed to mark the unreliable structures and optimize the ligand recognition potential of the collected pockets. An effort should be made to represent each pocket by a set of conformers sufficiently diverse to accommodate various ligand chemotypes. The latter task may require additional pocket “induced-fit” modeling.
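One way to picture the "collect, superimpose, and cluster" step is a greedy selection of diverse pocket conformers. The sketch below is an assumption-laden illustration, not the procedure used to build the Pocketome: it assumes that all conformers of a pocket are already superimposed and list the same pocket atoms in the same order, and it uses simple leader clustering with a hypothetical RMSD cutoff. Entry names and coordinates are invented.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples that are
    assumed to be already superimposed and in the same atom order."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must contain the same atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return sqrt(squared / len(coords_a))

def cluster_conformers(conformers, cutoff):
    """Greedy leader clustering: a conformer joins the first representative
    within `cutoff` RMSD, otherwise it becomes a new representative."""
    clusters = {}
    for name, coords in conformers.items():
        for representative in clusters:
            if rmsd(coords, conformers[representative]) <= cutoff:
                clusters[representative].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters

# Hypothetical three-atom "pockets" from four entries of the same protein.
conformers = {
    "entry_a": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "entry_b": [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (0.1, 1.0, 0.0)],
    "entry_c": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)],
    "entry_d": [(0.0, 0.0, 0.2), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)],
}

clusters = cluster_conformers(conformers, cutoff=0.5)
print(clusters)
```

The representatives (the dictionary keys) form a reduced set of distinct conformers. Because leader clustering depends on input order, a production procedure would likely need a more careful choice of representatives, for example after the quality control described in the next subheading.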
2.1.1. Quality Control of Crystallographically Determined Pockets
Experimentally determined atomic models are often incomplete and have method-specific uncertainties and ambiguities. Many models have errors or “fantasy” atoms not supported by the experimental data. Finally, the protonation and tautomerization states of the atoms of the pocket and the ligand are not even defined in PDB entries and must be predicted separately.
2.1.1.1. Incomplete and Ambiguous Structural Pocket Models
X-ray crystallography, the best technique of structure determination available at the moment, provides the electron density map for the structured parts of a crystallized protein construct. As a result, the following issues can be observed in even high-quality structures:
1. Protein-related uncertainties
(a) Missing or ambiguous fragments of the protein, ranging in size from a single side-chain atom to multi-residue loops;
(b) Ambiguous orientation of polar side chains. At an average resolution of 2.2 Å, 180° rotations of terminal functional groups of Asn, Gln, and His are difficult to distinguish. When placing these side chains in the density, crystallographers often rely on so-called chemical intuition, which is not applied consistently and optimally. However, the majority (86%) of the Asn, Gln ambiguities can be resolved if special energy-based methods are applied (8).
2. Ligand-related uncertainties
(a) Missing or ambiguous electron density for a part of the ligand, or the existence of alternative ligand poses with comparable or better fit to the electron density and to the binding pocket;
(b) Ambiguous ligand identity. Water molecules are frequently placed in “unrecognized” electron density, either intentionally or following the PDB recommendation. For example, PDB entry 2gwx contains erroneous water molecules in place of a fatty acid that is revealed in another structure of the same protein, 2baw (9). In a number of structures, one can find the so-called UNrecognized Ligand descriptions, or UNL. Note also that, previously, unrecognized density was forcibly filled with water molecules and not annotated in the ATOM records.
The ligand-related problems are more frequent than the protein side-chain-related problems because of high chemical diversity of the ligands and (most often) the absence of the positional/orientational constraints in the form of covalent bonds. The ligands frequently have a quasisymmetrical shape allowing multiple placements. The first three panels of Fig. 3 show a progressively worsening electron density for a bound ligand leading to a progressively higher ambiguity in ligand placement. The chemical intuition of the experimentalist is often insufficient for correct ligand placement, but proper (energy based) docking tools are rarely used.
Fig. 3. Four levels of reliability of ligand positioning into the electron density. The coordinates are taken from the PDB, and the density obtained from the Uppsala EDS server. Only the pose of the first (leftmost) ligand is unambiguously defined by the electron density. The last ligand represents a complete fantasy. More than a third of the ligands in pockets in the PDB need to be either ignored or repositioned.
The ligand and protein ambiguities can have a large negative impact on model refinement, studies of ligand–pocket interactions, and the testing of docking algorithms unless they are re-evaluated.
2.1.1.2. Fantasy Atoms and Misleading Atom Annotations
Sometimes, atoms or structure fragments not supported by any experimental electron density are introduced into a deposited model. We will call them fantasy atoms. While completely wrong structures are rare (10, 11), local errors affecting our ability to build a credible structural pocketome are ubiquitous. The question has been raised in the scientific literature several times, and a systematic analysis of the consequences of crystallographic structure defects for small-molecule binding has been performed (9, 12), but until recently, it was never properly addressed by the crystallographic community. As a control tool, the electron density maps can be rebuilt from the deposited atomic coordinates and the experimental structure factors (13, 14); however, the deposition of the structure factors in the PDB became compulsory only in February 2008 (35 years too late). Even with those structure factors, the reconstruction of the most interpretable electron density requires the final values of phases, which are most often unavailable. There are three legitimate ways of annotating the uncertainties in an atomic model due to low local resolution or gaps in the electron density map:
1. Not depositing atoms that are not visible in electron density (or creating “alternative” records for atoms ambiguously defined by the density);
2. Assigning an occupancy of zero to the atoms introduced for the sake of chemical completeness, but not supported by the electron density;
3. Assigning conspicuously high temperature factors (the so-called B-factors) to the atoms with the made-up coordinates.
Unfortunately, “fantasy” atoms are often found in crystallographic structures with full occupancy values and low or medium B-factors, and without being marked as multiple alternatives. Combined with the ambiguities
Abagyan and Kufareva
of the ligand placement, this creates a relatively high occurrence of “heavy atom” placement errors for ligand pockets or ligand poses. Our estimate for the fraction of unreliable pocket or small-molecule complex structures, with deposited experimental structure factors, is around 35% (I. Kufareva et al., unpublished). Many of the unreliable ligand–pocket models can be rescued by an energy-based refinement and more rigorous sampling of conformational alternatives.
2.1.1.3. Predicting Hydrogen Positions, Formal Charges, and Tautomerization
The protonation state assignment problem falls into several subproblems (15–19). Having accurate heavy atom positions often simplifies the task of adding protons but does not solve it. What needs to be established is the following:
1. The charged state of all His, Asp, Glu, Arg, Lys, and Cys residues around the binding site;
2. The formal charges of the ligand atoms;
3. The ε or δ tautomers of uncharged histidines;
4. The tautomeric form of the ligand;
5. The orientation of all movable hydrogen atoms (e.g., in rotatable hydroxyl groups);
6. The orientation of all essential water molecules included in the core pocket definition.
The most rigorous approach to the first two problems requires a pH-dependent calculation of electrostatic effects to predict the pK values of individual titration sites (17). However, in most cases the following simple set of intuitive rules may resolve the uncertainty: (1) never allow an uncompensated buried formal charge; (2) consider compensating charges for buried ionizable groups and metals to maintain electroneutrality outside the buried cluster of charges; (3) take the most likely state at a given pH for the exposed groups. The problem of orienting rotatable hydrogen atoms and water molecules can be solved by restrained global optimization in the relevant subspace of internal coordinates (20, 21).
2.1.1.4. Energy-Based Refinement of Initial X-Ray Pocket Models
While tight packing and a cooperative hydrogen-bonding network are defining characteristics of ligand–pocket interactions, the initial atom positions in a PDB entry frequently do not provide the optimal interactions. The errors may affect the ligand itself or be related to the suboptimal or incorrect placement of side chains and hydrogen atoms around the ligand. In a recent review (13), G. Kleywegt pointed out several pitfalls that can, however, be avoided to produce plausible models. In particular, a pocket model can be subjected to an energy-based refinement by sampling positions of the movable hydrogen atoms (especially the polar hydrogens), the heavy atoms not clearly defined in the electron density, and the density-ambiguous rotations of polar side chains.
The Flexible Pocketome Engine for Structural Chemogenomics
Fig. 4. (a) Energy-based optimization of the ligand and the pocket side chains often leads to a more energetically favorable conformation and improved electron density fit. Unrestrained sampling of the hydroxyl groups of β2AR Ser203 and Ser204 in the recently solved X-ray structure (PDB ID 2rh1) led to improved energetics while preserving the electron density fit. (b) Effect of local heavy atom energy refinement/redocking on the pose and interactions of the pregnane X receptor bound to SRL (PDB ID 1nrl). Performed without any influence of the electron density, the ICM optimization shifted the ligand by 1.3 Å and found a pose with better binding interactions and better fit to the electron density.
One example in which an atomic energy-based refinement of hydrogen atoms and undefined heavy atoms produced a more realistic pocket model is shown in Fig. 4a. The pocket model in a recently solved structure of the β2 adrenergic receptor (β2AR) was improved by automatic reorientation of the hydroxyl groups of Ser203 and Ser204, which are not clearly defined in the density (22, 23). Since the χ1 angles of the serine side chains are also sampled, this procedure goes beyond the optimal placement of the hydrogen atoms. Another example is presented in Fig. 4b. A simple energy-based refinement of a ligand in the structure of the pregnane X receptor (PXR) displaces it by 1.3 Å from the deposited crystallographic position, which leads not only to an improvement of the intermolecular contacts, but also to a better placement of the ligand in the experimental electron density, even though the density fit term was not used during the refinement. It is clear that each experimental pocket in the Pocketome needs to be validated by density analysis, subjected to restrained energy-based refinement for hydrogen atoms and heavy-atom ambiguities, and further evaluated by the calculated binding energy. If the calculated binding energy is indicative of nonbinding, the pocket conformer must be dismissed or set aside for a more detailed consideration.
2.1.2. Clustering and Analysis of the Flexible Experimental Pocketome
Additional difficulties in the way of compiling the experimental flexible Pocketome core stem from the artificial crystallographic constructs representing a particular protein, crystal packing, and other artifacts that present a substantial challenge for both manual
and automatic identification of true biological interactions. We here present a fully automatic protocol for the multiconformational experimental Pocketome collection that uses a number of filters to overcome these difficulties. Initial characterization of this set in terms of the observed induced fit changes is also provided.
2.1.2.1. Clustering the PDB Pockets
We have proposed a fully automatic procedure for the Pocketome generation. The procedure clusters all available PDB structures with drug-like ligands into the Pocketome ensembles (45) (24). The main steps of the procedure are as follows:
1. The amino acid sequences of all experimental constructs in the PDB are extracted and cleared of 85 common protein expression tag definitions (e.g., five consecutive histidines), and a nonredundant set of “tag-purified” sequences is produced.
2. The full Swissprot sequences (25, 26) are searched against the purified unique set of the PDB sequences. Three-dimensional (3D) domains are annotated based on PDB sequence boundaries, and their structures are clustered to 95% sequence identity.
3. A comprehensive collection of ~3,000 nontrivial drug-like molecules from the PDB is built by (a) excluding ubiquitous substrates, e.g., ATP, and (b) applying filters to exclude pockets with non-drug-like molecules (e.g., too large or too small) in the entire PDB Chemical Component Dictionary. That collection is merged with the above protein domain ensemble set to obtain multiple structure ensembles cocrystallized with at least one relevant compound.
4. All protein structures in the ensemble are superimposed using only the backbone atoms in the immediate vicinity of the ligands. The superimposition algorithm is based on an iterative procedure that, through an unbiased weight assignment to different atomic subsets, gradually finds the better superimposable core of atom pairs between the template and the other structures, and includes the following steps:
(a) The atomic equivalences are established between the two structures, and a vector of per-atom weights {W1, W2, …, Wn} is set to {1, 1, …, 1}.
(b) The weighted superimposition is performed (27) and the root mean-square deviation (RMSD) is evaluated.
(c) The deviations {D1, D2, …, Dn} are calculated for all atom pairs, and their 50-percentile (D50) is determined.
(d) The new weights are calculated according to the formula
$$W_i = \exp\left(-D_i^2 / D_{50}^2\right)$$
While the well-superimposed atoms are assigned weights close to 1, the weights associated with strongly deviating atom pairs get progressively smaller.
(e) Steps (b) to (d) are iterated until the RMSD value stops improving or the maximum number of iterations (set equal to ten for this case) is reached.
(f) The final superposition is performed with weights smaller than exp(−1) set to zero. The use of this algorithm guarantees that the overall quality of the superimposition is not compromised by the presence of a minority of strongly deviating atoms.
5. The obtained optimally superimposed complexes are automatically annotated in terms of the complex composition: homo- and hetero-multimeric receptors, catalytic metal ions as well as cofactors and their analogs are automatically identified based on the consistency of each of these features throughout the ensemble. Compositional and conformational differences between the individual ensemble structures are recorded. Where applicable, symmetry neighbors are generated and taken into account.
6. The ligands are analyzed for correctness of their covalent geometry, and their crystallographic quality is checked as described above.
Application of this procedure to the PDB release of 2008 produced a set of more than 800 structure ensembles. This set serves as an experimental core of the Pocketome.
2.1.2.2. Analyzing the Induced Fit in the Experimental Pocketome
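As an illustration only, the iterative weighting scheme of steps (a) to (f) in the superimposition procedure of Subheading 2.1.2.1 can be sketched in Python with NumPy. The function names, the tolerance `tol`, the use of the weighted RMSD as the stopping criterion, and the small floor on D50 are assumptions of this sketch rather than details of the published protocol; the weighted fit itself uses a standard SVD-based (Kabsch) superposition standing in for the method of ref. (27).

```python
import numpy as np

def weighted_fit(P, Q, w):
    """Weighted Kabsch fit: rotation R and shift t minimising sum_i w_i |R P_i + t - Q_i|^2."""
    w = w / w.sum()
    pc, qc = w @ P, w @ Q                        # weighted centroids
    H = (P - pc).T @ (w[:, None] * (Q - qc))     # 3x3 weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - R @ pc

def iterative_superpose(P, Q, max_iter=10, tol=1e-6):
    """Superimpose equivalent atoms P onto Q, down-weighting strongly deviating pairs."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    w = np.ones(len(P))                                   # step (a): unit weights
    best = np.inf
    for _ in range(max_iter):                             # step (e): at most ten rounds
        R, t = weighted_fit(P, Q, w)                      # step (b): weighted fit
        dev = np.linalg.norm(P @ R.T + t - Q, axis=1)     # step (c): per-atom deviations
        rmsd = np.sqrt((w * dev**2).sum() / w.sum())      # weighted RMSD (assumption)
        if best - rmsd < tol:                             # RMSD stopped improving
            break
        best = rmsd
        d50 = max(np.median(dev), 1e-12)                  # D50, floored to avoid 0/0
        w = np.exp(-dev**2 / d50**2)                      # step (d): new weights
    w = np.where(w < np.exp(-1.0), 0.0, w)                # step (f): drop weak pairs
    R, t = weighted_fit(P, Q, w)
    return R, t, w
```

Calling `iterative_superpose(template_xyz, other_xyz)` on two equal-length arrays of equivalent backbone atoms returns the rotation, the translation, and the final per-atom weights; atom pairs with zero weight are those excluded from the superimposable core.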
The collected Pocketome core provides a fairly comprehensive representation of transient protein–ligand interactions in PDB and allows characterization of the protein and induced conformational changes. Given a family of complexes formed by a particular protein domain, we compared each complex with all other complexes of the same composition, complexes of other compositions, and unbound structures. The unbound structures were also compared to one another to assess the degree of changes stemming from natural protein flexibility rather than induced by binding partners. The obtained data are presented in Fig. 5. In the majority of the cases (77%), comparison of a ligand-bound form of a protein to its unbound form or to a complex of a different composition shows a strong deviation (>1.5 Å) of at least one ligand–pocket interface residue side chain. On average, about 18% of the ligand interface residues deviate above that threshold (1–2 side chains per interface). Moreover, in a number of cases, significant backbone deviations are observed as well. The corresponding values observed between complexes of the same composition due to natural protein flexibility (white bars), or even between unbound structures (gray bars), are significantly lower.
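The per-interface statistics quoted above (whether any interface side chain moves by more than 1.5 Å, and what fraction does) can be computed as in the following minimal sketch. It assumes that per-residue side-chain deviations between two already superimposed structures are available as a list of numbers in Å; the function name and the returned dictionary keys are hypothetical.

```python
import numpy as np

def induced_fit_summary(sidechain_dev, threshold=1.5):
    """Summarise side-chain deviations (in Angstrom) for one pair of pocket structures."""
    dev = np.asarray(sidechain_dev, dtype=float)
    moved = dev > threshold                      # residues counted as strongly deviating
    return {
        "any_moved": bool(moved.any()),          # at least one strongly deviating residue
        "n_moved": int(moved.sum()),             # number of such residues
        "fraction_moved": float(moved.mean()),   # share of the interface residues
    }
```

Averaging `fraction_moved` and `any_moved` over all structure pairs of an ensemble gives numbers of the kind reported in Fig. 5.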
2.1.3. Predicting Unknown or Allosteric Pockets with ICM PocketFinder
Of the large variety of protein pockets capable of binding small molecules with appreciable affinity, only a small fraction has been cocrystallized with at least one ligand. The majority of pockets of
Fig. 5. Flexibility of small-molecule binding interfaces and induced fit. About one-fifth of interface side chains are displaced by more than 1.5 Å when compared between different complex compositions. At least one interface residue backbone deviates by more than 1.5 Å in 33% of the cases, at least one side chain in 77% of the cases.
the full Pocketome remain either completely unknown or only approximately defined. That includes (1) allosteric pockets distant from a well-known “main” pocket (e.g., the CK2β binding interface of the protein kinase CK2α (28)), (2) pockets in apo structures of nonenzymes (e.g., the “hydrophobic pocket” of α1-antitrypsin (29)), and (3) pockets in orphan receptors (e.g., orphan nuclear receptors or GPCRs). In its most complete version, the Pocketome must be supplemented with likely cavities and allosteric pockets even if no small molecules have ever been observed crystallographically in these cavities. We designed a pocket prediction algorithm based on a physical field, yet very general and relatively independent of the chemical nature of the ligand. This method, called ICM PocketFinder, performs the Gaussian convolution of the Lennard-Jones
potential around a protein (3). The value of the potential in the 3D grid point r is calculated by
$$P_0(\mathbf{r}) = \sum_a \left( \frac{A_a}{d_{ar}^{12}} - \frac{B_a}{d_{ar}^{6}} \right)$$
where the sum is taken over all atoms a in the system, $d_{ar}$ is the distance from atom a to the grid point r, and the atom-dependent parameters $A_a$ and $B_a$ are taken from the Empirical Conformational Energy Program for Peptides (ECEPP/3) molecular mechanics force field. The obtained $P_0(\mathbf{r})$ values were truncated at 0.8 kcal/mol to retain only the attractive regions. The Gaussian convolution of the potential in point r is given by
$$P(\mathbf{r}) = \int \exp\left(-\frac{|\mathbf{x}-\mathbf{r}|^2}{\lambda^2}\right) P_0(\mathbf{x})\, d\mathbf{x}, \qquad \lambda = 2.6\ \text{Å}$$
The resulting field, calculated as a 3D grid map with a 0.5 Å grid step, is contoured using an in-house algorithm to produce envelopes whose location, shape, and volume are indicative of the ligand binding pockets. The ICM PocketFinder algorithm was validated on a large collection of experimentally characterized pockets and performed well (3). It is able to provide an initial localization of novel (e.g., allosteric) ligand binding sites. Clearly, every newly predicted pocket needs to be experimentally validated in the context of the Pocketome project. The size and character of the predicted pockets also help to estimate the druggability of a pocket. For example, running the ICM PocketFinder calculation on multiple structures of kelch-like ECH-associated protein 1 (KEAP1) resulted in a pocket volume not exceeding 175 Å³, while a typical small-molecule ligand binding pocket has a volume >200 Å³. That largely explains the failure of multiple attempts to develop a small molecule binding to this site with appreciable affinity, even though it is known to be a site of peptide interaction. A predicted pocket may often have borderline characteristics just below a safe druggability threshold. In this case, exploring the conformational plasticity of the pocket (see Subheading 2.1.4) may help to find conformations more relevant for small-molecule binding. In the context of the Pocketome project, the predicted pocket envelopes may define the bounding box for conformer generation and pocket refinement.
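The two field calculations described above can be illustrated with a small NumPy sketch; it is not the ICM implementation. It assumes that the truncation caps P0 from above at 0.8 kcal/mol, that the Gaussian exponent is |x − r|²/λ², that the integral is approximated by a sum over the grid itself (the field outside the grid is ignored), and that the Lennard-Jones parameters are supplied by the caller; the function name and argument layout are hypothetical. Contouring into envelopes is not shown.

```python
import numpy as np

def pocketfinder_field(coords, A, B, axes, lam=2.6, cap=0.8):
    """Return (P0, P): the capped Lennard-Jones field and its Gaussian convolution on a grid.

    coords: (n_atoms, 3) atom positions; A, B: (n_atoms,) LJ parameters;
    axes: three 1D arrays of equally spaced grid coordinates (x, y, z).
    """
    coords = np.asarray(coords, float)
    x, y, z = (np.asarray(a, float) for a in axes)
    gx, gy, gz = np.meshgrid(x, y, z, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)                        # (nx, ny, nz, 3)
    d = np.linalg.norm(grid[..., None, :] - coords, axis=-1)      # atom-to-grid distances
    d = np.maximum(d, 1e-3)                                       # avoid division by zero
    p0 = (A / d**12 - B / d**6).sum(axis=-1)                      # sum over atoms
    p0 = np.minimum(p0, cap)                                      # cap repulsive values
    # The 3D Gaussian kernel factorises into one 1D kernel per axis.
    kern = [np.exp(-(a[:, None] - a[None, :])**2 / lam**2) for a in (x, y, z)]
    dv = (x[1] - x[0]) * (y[1] - y[0]) * (z[1] - z[0])            # volume element dx
    p = np.einsum("ia,jb,kc,abc->ijk", kern[0], kern[1], kern[2], p0) * dv
    return p0, p
```

With a 0.5 Å grid step the `axes` arrays would be built as, for example, `np.arange(x_min, x_max, 0.5)`; strongly negative regions of the returned `p` correspond to candidate pocket envelopes.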
2.1.4. Generating Theoretical Pocket Models and Conformers
2.1.4.1. Generating Initial Pocket Models for Unsolved Proteins by Homology
Theoretical pocket models will complement the validated and refined experimental pocketome in two ways:
1. Initial models can be built for proteins not represented in the PDB at all;
2. Additional diverse pocket models can be generated around one or several initial models or experimental structures to provide a realistic coverage of the potential ligand-induced pocket conformations.
As we mentioned in the introduction, experimental 3D pocket models are not available for the majority of the Pocketome in any particular organism. However, accurate models of a significant fraction of these pockets can still be built by homology with existing structures and refined. A practical homology modeling procedure for pockets is relatively simple (30) once a reliable alignment of the query sequence to its single homologous template is established. In the context of the Pocketome, we only need a local alignment in the vicinity of the pocket. This facilitates the task because local sequence similarity is frequently higher than the overall sequence identity (22, 23). The homology model-building recipe from a single template includes (1) inheriting the backbone conformation of the template, (2) replacing the nonidentical side chains while retaining as many torsion variables from the template as possible, and (3) disregarding large insertions in the modeled sequence. Though the missing loops and ambiguous side chains can be rebuilt at this stage, it is much more practical to postpone this until the ligand-guided refinement stage, as apo refinement often results in ligand-incompatible conformations. If several homologous templates are known, a model must be built from every template. The multiple models can be ranked by a combined scoring function involving local sequence similarity and structure resolution. It is important to note that switching from a resolution of 2.6 Å to 2.1 Å is always worth losing 10% or even 20% in sequence identity.
These multiple pocket models can be further refined using the information about a few strong ligands, either to generate better pocket models or to select models with better discrimination between binders and nonbinders known for the pocket of interest. We refer to the latter method as ligand-guided (or ligand-steered) modeling. Finally, as homologous proteins often share similar patterns of flexibility, a model of a particular ligand-compatible conformation of a protein may be built based on its ligand-incompatible structure, as long as the former was observed in its homolog. This strategy may be applied, for example, to build the so-called DFG-out conformations of multiple protein kinases for which
Fig. 6. Pocketome entry for the kelch-like ECH-associated protein 1 (KEAP1). Four superimposed X-ray structures and the ICM PocketFinder envelope are shown. This protein was unsuccessfully targeted by a small-molecule inhibitor at Merck. The Pocketome analysis demonstrates that the pocket is too small and too flexible for a strong small molecule binder.
only DFG-in structures are available in the structural kinome (31) or initial antagonist-bound conformations for the androgen receptor (6).
2.1.4.2. Guiding Homology Modeling or Conformer Generation by Ligands
Ligand guidance can be provided in two main forms:
1. Generating multiple low-energy conformers by restrained cosimulation of the pocket complex with one or several strong and/or diverse ligands (further referred to as the seed ligands);
2. Selecting from a set of experimental or generated conformers by testing the ability of each pocket conformer to discriminate between a test set of known ligands of this pocket and a set of nonbinders. A variety of discrimination measures are available.
The second form can also exist in a weaker version in which the selection is based on the ability of a model to reproduce some experimental restraints (e.g., atomic contacts) rather than on its ability to select by the predicted binding score. Therefore, given one or several pocket models built by homology, the main steps of their ligand-guided selection and refinement are as follows (6, 22, 23):
1. Compilation of a discrimination benchmark consisting of known binders to the pocket and known nonbinders. A challenging surrogate for nonbinders can be a large set of molecules known to bind to this class of proteins (e.g., all kinase inhibitors) but not necessarily to the pocket of interest;
2. Generation of multiple pocket models by a conformational generator with or without an active “seed” ligand;
3. Screening of the discrimination benchmark against each of the models to build a list of compounds ordered by their predicted binding score;
4. Evaluation of the selectivity of each model by one of the discrimination measures, e.g., area under the receiver operating characteristic (ROC) curve (AUC);
5. Selection of one or several best models.
Steps 2 through 5 can be iterated until a satisfactory level of discrimination between binders and nonbinders is achieved.
2.1.4.3. Energy-Based Torsional Sampling: Fumigation
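Steps 3 to 5 of the ligand-guided selection protocol in Subheading 2.1.4.2 can be illustrated with a short sketch. It assumes that each pocket model has already produced one predicted binding score per benchmark compound and that lower scores mean stronger predicted binding; the function names and the dictionary layout are hypothetical. The AUC is computed as the fraction of binder/nonbinder pairs that are ranked correctly, which is equivalent to the area under the ROC curve.

```python
import numpy as np

def roc_auc(binder_scores, nonbinder_scores):
    """AUC as the probability that a binder scores better (lower) than a nonbinder."""
    b = np.asarray(binder_scores, float)[:, None]
    n = np.asarray(nonbinder_scores, float)[None, :]
    wins = (b < n).sum() + 0.5 * (b == n).sum()      # ties count as half a win
    return float(wins) / (b.size * n.size)

def rank_models(scores_by_model, is_binder):
    """Order pocket models by their binder/nonbinder discrimination (best first)."""
    mask = np.asarray(is_binder, bool)
    aucs = {}
    for name, scores in scores_by_model.items():
        s = np.asarray(scores, float)
        aucs[name] = roc_auc(s[mask], s[~mask])
    return sorted(aucs.items(), key=lambda kv: kv[1], reverse=True)
```

The first entries of the list returned by `rank_models` are the candidates for step 5; an AUC near 0.5 indicates a model that does not discriminate better than random.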
Energy-based torsional sampling is often used to generate additional pocket conformations. It is important to understand, however, that most ligand-compatible conformations are nonoptimal for an unbound protein. Conversely, the optimal conformations in the absence of ligand are usually characterized by protein side chains collapsing into the pocket, and therefore are irrelevant for ligand binding. We recently presented a new computational technique, called fumigation, aimed at generating more “druggable” conformations of apo small-molecule ligand binding pockets. This technique is based on torsional sampling of the receptor side chains in the presence of a repulsive density representing a generic ligand. The density is calculated as follows: (1) simultaneous conversion of pocket side chains (except Ala, Gly, and Cys) to Ala; (2) construction of an atom density grid map for the obtained “shaved” protein; (3) repeated spatial averaging of the map in order to obtain a smoothed density map, which fills the cavities of the original protein; and (4) taking the difference of the smoothed and the original maps. Next, the internal variables controlling the pocket shape are sampled using the ICM biased probability Monte Carlo sampling procedure, with the generated density included as a penalty term in the combined energy function (Fig. 7). This technique was successfully applied to the discovery of small molecules disrupting the subunit interaction of the protein kinase CK2 (32, 33). Starting from apo (closed) structures of the pocket, we predicted conformations that were compatible with the binding of either a small molecule or the C-terminal fragment of the regulatory protein CK2β. These conformations were then used for docking and virtual ligand screening.
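Steps (3) and (4) of the density calculation can be sketched on a voxel grid as follows. This is a toy illustration under stated assumptions: the atom density map of the “shaved” protein is taken as given, spatial averaging is implemented as a 3×3×3 box average with edge padding, and the number of passes is arbitrary; none of these choices, nor the function names, come from the published method.

```python
import numpy as np

def box_average(rho):
    """One pass of 3x3x3 neighbourhood averaging with edge-replicated borders."""
    padded = np.pad(rho, 1, mode="edge")
    nx, ny, nz = rho.shape
    total = np.zeros(rho.shape, dtype=float)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                total += padded[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return total / 27.0

def generic_ligand_density(rho_shaved, n_passes=2):
    """Smoothed map minus original map: large where a cavity was filled by smoothing."""
    rho = np.asarray(rho_shaved, dtype=float)
    smoothed = rho.copy()
    for _ in range(n_passes):            # step (3): repeated spatial averaging
        smoothed = box_average(smoothed)
    return smoothed - rho                # step (4): difference of the two maps
```

In a sampling run, the returned map would be interpolated at side-chain atom positions and added to the energy as a penalty, so that side chains are pushed out of the region a generic ligand would occupy.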
Fig. 7. Pocket fumigation is a modeling technique based on torsional sampling in the presence of a repulsive density representing a generic ligand. (a) the original X-ray structure; (b) the result of Ala conversion: the “largest pocket” density is generated; (c) a “druggable” pocket conformation obtained by Monte Carlo simulation in the presence of the density.
2.1.4.4. Fragment Omission
Omission modeling represents a reasonable alternative to the conformational ensemble approach. Using this approach, one may predict the correct ligand docking pose and the induced pocket conformational changes based on a single structure of the protein, which, most often, is incompatible with that ligand. The omission approach relies on the generation of a “gapped” model of the binding pocket in which parts of its structure are removed. The expectation is that, in one of the gapped models, the main obstacle for the correct ligand docking is eliminated, while the remaining intact parts of the pocket are still sufficient for proper positioning of the ligand. The omitted fragments are later rebuilt in the context of the complex at the refinement stage. In a general form of the algorithm, the gaps may include single side chains, multiple side chains, complete loops, domains, and other parts of the backbone. The induced fit changes upon ligand binding may or may not be limited to side-chain displacements. One should therefore consider different cases of structure fragment flexibility. While compiling a flexible Pocketome, the known or predicted highly deviating fragments may be omitted in a systematic way to yield a corresponding number of “gapped” models. As its first step, used to identify likely ligand poses, the SCARE (SCan Alanines and Refine) algorithm generates gapped pockets in which pairs of adjacent-in-space side chains are systematically omitted (21). In contrast, the Deletion-Of-Loop from PHe-IN (DOLPHIN) algorithm performs a more specific deletion of the so-called DFG motif from the activated structures of the protein kinases (31). We also studied the effect of extracellular loop 2 (XL2) deletion on the ability of the β2 adrenergic receptor to identify its known antagonists, to establish whether XL2-deleted models can be used for other GPCR models (22).
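The systematic omission of pairs of adjacent-in-space side chains can be enumerated as in the sketch below. It is only a schematic of the first SCARE step: the use of one representative point per side chain, the 6 Å adjacency cutoff, the residue labels, and the function name are all assumptions made for illustration, and the subsequent docking and refinement are not shown.

```python
from itertools import combinations
import math

def gapped_models(sidechain_centers, cutoff=6.0):
    """List residue pairs whose side-chain centres lie within `cutoff` Angstrom.

    Each returned pair defines one gapped pocket model in which both
    side chains would be omitted (e.g., truncated to alanine).
    """
    names = sorted(sidechain_centers)
    pairs = []
    for a, b in combinations(names, 2):
        if math.dist(sidechain_centers[a], sidechain_centers[b]) <= cutoff:
            pairs.append((a, b))
    return pairs
```

For example, `gapped_models({"F123": (0, 0, 0), "L45": (4, 0, 0), "Y200": (20, 0, 0)})` proposes a single gapped model omitting F123 and L45, since Y200 is not adjacent to either.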
2.2. Ligand Binding Pose Prediction with the Flexible Pocketome
A correct or a nearly correct, i.e., reproducing all essential intermolecular atomic contacts, prediction of the ligand binding geometry is an important task because it can (1) help in understanding the basics of drug–receptor interaction, (2) guide ligand optimization, and (3) elucidate the consequences of residue variation in the receptor. Moreover, it is a necessary, although often not sufficient, condition for accurate binding energy calculations. Correct ligand pose prediction, which depends on the quality of the pocket ensembles and the docking algorithm, enables more advanced Pocketome applications such as ligand screening and ligand selectivity profiling. This section presents ICM ligand docking as an efficient tool for binding geometry prediction and describes its challenges and limitations.
2.2.1. ICM Ligand Docking
In its pure and general form, computational ligand docking (i.e., binding pose prediction) represents a problem of global minimization of the local estimate of the Gibbs free energy of binding in a multidimensional conformational space of the interacting partners (34). Due to the properties of the energy function, the problem is impossible to solve analytically; moreover, the huge dimensionality of the conformational space makes exhaustive conformational sampling unfeasible. To tackle the conformational space problem, several steps must be made. First, the molecular objects can be represented in internal coordinates, naturally reflecting their covalent bond geometry (35). Unlike simple Cartesian coordinates, internal coordinates consist of covalent bond lengths and angles, dihedral angles (i.e., torsion and phase angles), and six positional variables of a molecular object. Because of chemical bond rigidity, most molecular objects can be accurately represented by free torsion variables, while keeping covalent bonds, angles, and phase angles fixed (36). This dramatically reduces the number of free variables in the system without sacrificing accuracy, while improving convergence time and radius for conformational optimizations by orders of magnitude. The ligand docking procedure in internal coordinates using grid potentials was described in (37). Let us describe the main steps. First, a diverse set of conformers is generated by ligand sampling in vacuo. The generated conformers are then placed into the binding pocket in four principal orientations and used as starting points for Monte Carlo optimization. In simple cases when the binding pocket undergoes only minor conformational changes upon binding, we can further limit the search space by excluding the receptor from the explicit sampling. Instead, the binding pocket can be represented as a set of rigid precalculated grid potential maps. 
The energy function to be optimized is then the ligand internal strain and a weighted sum of the grid map values in ligand atom centers. While being
a much less accurate Gibbs energy approximation, this function allows fast computation and analytical local minimization. Moreover, the potentials can be modified so that some degree of molecule interpenetration is allowed, providing means to model minor induced conformational changes in the pocket.
2.2.2. ICM Full-Atom Ligand–Receptor Complex Refinement and Scoring
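The grid-based energy function of Subheading 2.2.1 (ligand internal strain plus a weighted sum of grid map values at the ligand atom centers) can be illustrated as follows. The sketch assumes trilinear interpolation of each map, one shared grid origin and step for all maps, and that every ligand atom samples every map; the map names, weights, and function names are hypothetical and do not reproduce the actual ICM maps.

```python
import numpy as np
from itertools import product

def trilinear(grid, origin, step, xyz):
    """Interpolate a 3D grid map at Cartesian position xyz."""
    f = (np.asarray(xyz, float) - np.asarray(origin, float)) / step
    i0 = np.clip(np.floor(f).astype(int), 0, np.array(grid.shape) - 2)
    t = f - i0                                           # fractional position in the cell
    value = 0.0
    for corner in product((0, 1), repeat=3):             # eight corners of the cell
        weight = np.prod([t[k] if corner[k] else 1.0 - t[k] for k in range(3)])
        value += weight * grid[i0[0] + corner[0], i0[1] + corner[1], i0[2] + corner[2]]
    return float(value)

def grid_energy(atom_xyz, maps, weights, origin, step, strain=0.0):
    """Ligand strain plus the weighted sum of map values at the ligand atom centres."""
    total = strain
    for name, grid in maps.items():
        total += weights[name] * sum(trilinear(grid, origin, step, xyz) for xyz in atom_xyz)
    return total
```

Because the interpolated value is a smooth function of the atom position inside each grid cell, its gradient is available in closed form, which is what makes analytical local minimization on the maps possible.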
At the output of the ligand docking procedure, a limited set of ligand conformations compatible with the receptor at the grid potential map approximation level is obtained. These conformations can be further scored using a more accurate full-atom-based scoring function. The ICM scoring function has been previously derived from a multireceptor screening benchmark as a compromise between approximated Gibbs free energy of binding and numerical errors (38, 39). The score is calculated by:
$$S_{bind} = E_{int} + T\Delta S_{Tor} + E_{vw} + \alpha_1 E_{el} + \alpha_2 E_{hb} + \alpha_3 E_{hp} + \alpha_4 E_{sf}$$
where $E_{vw}$, $E_{el}$, $E_{hb}$, $E_{hp}$, and $E_{sf}$ are Van der Waals, electrostatic, hydrogen-bonding, nonpolar, and polar atom solvation energy differences between bound and unbound states; $E_{int}$ is the ligand internal strain; $\Delta S_{Tor}$ is its conformational entropy loss upon binding; T = 300 K; and $\alpha_i$ are ligand- and receptor-independent constants. Because of a higher sensitivity of this function, it often downscores the slightly imperfect complex geometries tolerated at the level of the potential grid maps. To avoid this, the full-atom models of the pocket–ligand docking complexes may be refined prior to the scoring stage. The most realistic scenario of a full-atom refinement includes local gradient minimization of the ligand and the surrounding pocket side chains, and global Monte Carlo optimization of rotatable hydrogen atoms. During the refinement, the ligand heavy atoms are tethered to their docking positions with a harmonic restraint whose weight is iteratively decreased.
2.2.3. Expected Accuracy of Ligand Docking to a Single Pocket Conformer
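For reference, the scoring expression of Subheading 2.2.2 translates directly into code. The sketch below only evaluates the weighted sum; the individual energy terms are assumed to be computed elsewhere, the class and function names are hypothetical, and the α values passed in are placeholders chosen by the caller, not the fitted ICM constants.

```python
from dataclasses import dataclass

@dataclass
class ScoreTerms:
    e_int: float    # ligand internal strain
    ds_tor: float   # conformational entropy loss upon binding
    e_vw: float     # Van der Waals term
    e_el: float     # electrostatic term
    e_hb: float     # hydrogen-bonding term
    e_hp: float     # nonpolar solvation term
    e_sf: float     # polar solvation term

def binding_score(terms, alphas, temperature=300.0):
    """S_bind = E_int + T*dS_Tor + E_vw + a1*E_el + a2*E_hb + a3*E_hp + a4*E_sf."""
    a1, a2, a3, a4 = alphas
    return (terms.e_int + temperature * terms.ds_tor + terms.e_vw
            + a1 * terms.e_el + a2 * terms.e_hb + a3 * terms.e_hp + a4 * terms.e_sf)
```

Keeping the terms in a separate record makes it easy to inspect which contribution dominates the score of a given refined pose.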
Early docking algorithms were relying on various assumptions about the ligand and its binding pocket that made the problem less realistic but more computationally tractable. In the easiest formulation of the problem, the ligand was considered a rigid molecule, which needed to be placed in a rigid binding pocket in the most energetically favorable orientation. Clearly, this dramatically reduced the search space and greatly improved chances of finding the optimal solution. The second, more realistic problem formulation assumes flexibility of the smaller molecule (ligand), while the binding pocket is still considered rigid. These simplified methods give excellent results in an artificial setting when the receptor and the ligand represent separated components of a cocrystal complex (the so-called self-docking). It is important to realize, however, that in real-life applications,
flexible small molecules bind to flexible binding pockets, and both are expected to change their configuration upon the transition from the unbound to the bound state. A good docking method must be capable of handling both flexibility aspects, and therefore needs to be developed and benchmarked on the so-called cross-docking examples. The combination of (1) optimal ligand sampling strategies, (2) efficient representation of the rigid receptor as a set of softened potential grid maps, and (3) accurate full-atom scoring functions allows the ICM rigid receptor docking procedure to successfully predict correct ligand binding geometry even in cross-docking examples, when the induced fit changes are restricted to minor side-chain and backbone readjustments (40, 41). As described in Subheading 2.1.2, weak plasticity characterizes a substantial fraction of the Pocketome. To evaluate the expected success rate of a straightforward ligand docking procedure in cross-docking applications, we applied ICM docking to a subset of the PDB ensembles described in Subheading 2.1.2. This subset contained as many as 99 therapeutically relevant proteins, each of them cocrystallized with various ligands in at least three different conformations. A total of 1,113 structures made up 107 conformational ensembles (some of the 99 proteins were associated with more than one ensemble), and included 300 drug-like ligands. Each ligand was docked in all structures of its receptor ensemble except its cognate (cocrystal) structure using the ICM rigid receptor docking protocol described above. We found that only in 46.6% of the cases the binding geometry prediction was correct ( S2, S3, S3′ > S1, S2′. In agreement with a previous analysis (45), the S1′ appeared as the primary site for achieving selective interactions. However, our results suggested that there was no unique solution to
4
6
10
8
13, 8
3
3
3
2
13
Cytochrome P450 2C
Kinases
MMP
Bile acid transport system
Ephrin kinase and ephrin
NOS
PPAR
NOS
Kinases
MMP
5
10
5
10
4
5
10
X-ray and homology modeling 5
X-ray
X-ray and homology modeling 3
X-ray
X-ray
Homology modeling
X-ray and NMR
X-ray and homology modeling 10
X-ray
Homology modeling
10
DRY, C3, N1
NM3
DRY, N1
DRY, C3
DRY, N1+, NM3
DRY, C3, O
C3, OH
DRY, several polar probes
DRY, N1, O
DRY, C3
DRY, C3, N1+, NM3, O, OS
No. GRID probesd Discriminating probese
(48)
(47)
(46)
(45)
(44)
(43)
(42)
(41)
(40)
(39)
(35)
References
a Number of target family members included in the corresponding GRID/CPCA study
b Total number of protein structures included in the corresponding GRID/CPCA study
c Approach used to obtain the protein structures included in the corresponding GRID/CPCA study
d Number of GRID probes in the corresponding GRID/CPCA study
e GRID probes showing the largest contributions to the discrimination between the protein structures included in the corresponding GRID/CPCA study. DRY hydrophobic probe, C3 methyl probe, N1 neutral flat NH (e.g., amide), N1+ sp3 amine cation, NM3 trimethylammonium cation, O carbonyl oxygen, OS oxygen of sulfone or sulfoxide, OH phenolic OH
57
10
13
13
9
13, 8
26
10
26
4
X-ray
3
Serine protease
9
No. membersᵃ   No. structuresᵇ   Originᶜ
Target family
Table 2. List of published GRID/CPCA studies
Structure-Based Chemogenomics: Analysis of Protein Family Landscapes 289
Pirard
this problem, by exploring only one subpocket. Our work also highlighted the importance of steric, hydrophobic and nonpolar interactions to achieve selectivity. Finally, the GRID/CPCA and DrugScore CPCA models served to rationalize experimental binding affinity differences for nine series of inhibitors from the medicinal chemistry literature. The binding mode of these inhibitors had been inferred from crystal structures or from docking.

3.3.2. Variants of GRID/CPCA
Sheridan et al. (50) developed the FLOGTV method, which is based on FLOG maps and on the trend vector paradigm. FLOG maps are conceptually similar to those produced by the GRID force field (51). The trend vector paradigm comes from the QSAR field (52, 53). In the FLOGTV approach, a trend vector captures the differences in the map space between desirable proteins (targets) and undesirable ones (antitargets). Researchers at Merck applied FLOGTV, which is mathematically simpler than GRID/CPCA, to four biological systems: aspartyl proteases, serine proteases, dihydrofolate reductases, and kinases (50). In each case, the differences revealed by FLOGTV appeared to be consistent with the conclusions of a visual inspection of the crystal structures of the different targets. For serine proteases, similar discriminating features were identified by FLOGTV (50) and GRID/CPCA (35). Another group (54) performed hierarchical clustering on the interaction fields of hydrophobic or polar probes, computed for a set of aligned proteins. These interaction fields are knowledge-based potentials, encoding the probability of interaction between a binding site and a polar or hydrophobic probe. Hierarchical clustering on knowledge-based potentials was applied to classify and compare the binding sites of three target families, namely the ligand binding domain of nuclear hormone receptors, the ATP site of protein kinases and the substrate binding site of proteases. The results were in good accordance with those of GRID/CPCA on those target families. Both GRID/CPCA and its variants depend on a structural alignment of proteins. Typically, the width of the minima in a MIF map is only about 1 Å. As a consequence, a difference in coordinate placement of 1 Å becomes significant. Although sequence homology between related proteins can help to suggest a structural alignment, aligning protein structures is not always a trivial task.
In addition, selectivity issues can be encountered with proteins adopting different folds (55). It is difficult to obtain a reasonable structural alignment of these proteins. Therefore, alignment-independent descriptors have been developed (see Note 4).
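The trend vector idea behind FLOGTV can be illustrated with a short Python sketch. It assumes a simple covariance-style definition (for each grid point, the average of the map values weighted by the mean-centered class label), which is one common reading of the trend vector and not necessarily the exact formulation used in ref. 50. The function name `trend_vector`, the +1/-1 labels for targets and antitargets, and the toy energies are hypothetical.

```python
from typing import List, Sequence


def trend_vector(maps: Sequence[Sequence[float]],
                 labels: Sequence[float]) -> List[float]:
    """Return one weight per grid point.

    maps   : one flattened, aligned interaction map per protein
             (all maps must use the same grid points).
    labels : +1.0 for targets, -1.0 for antitargets (assumed coding).

    Weight j is sum_i (y_i - mean(y)) * x_ij / n, i.e. a covariance-like
    measure of how map values at grid point j track the class label.
    """
    if not maps or len(maps) != len(labels):
        raise ValueError("need one label per map and at least one map")
    n_points = len(maps[0])
    if any(len(m) != n_points for m in maps):
        raise ValueError("all maps must share the same grid")

    n = len(maps)
    mean_label = sum(labels) / n
    weights = [0.0] * n_points
    for grid_values, label in zip(maps, labels):
        centered = label - mean_label
        for j, value in enumerate(grid_values):
            weights[j] += centered * value
    return [w / n for w in weights]


# Toy example: two targets, two antitargets, three grid points.
toy_maps = [
    [-2.0, -1.0, 0.0],   # target 1
    [-2.0, -1.0, 0.0],   # target 2
    [0.0, -1.0, 0.0],    # antitarget 1
    [0.0, -1.0, -2.0],   # antitarget 2
]
toy_labels = [1.0, 1.0, -1.0, -1.0]
print(trend_vector(toy_maps, toy_labels))
```

With these toy values the weights are -1.0, 0.0 and 0.5. Because favorable interaction energies are negative, the negative weight at the first grid point marks a region that is more favorable in the targets than in the antitargets, the zero weight marks a region that does not discriminate, and the positive weight marks a region that is more favorable in the antitargets.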
4. Notes
1. Scientists analyzing macromolecular crystallographic data often overlook the uncertainties, the assumptions, the biases or even the mistakes introduced during the derivation of an atomic model from an electron density map. This process entails some subjective interpretation of an electron density map by a crystallographer (56). Uncertainties involve the identity of protein or ligand atoms that appear isoelectronic in an electron density map. For the same reason, it can be difficult to distinguish water molecules from sodium and ammonium ions, which are common constituents of crystallization media. Uncertainties also occur at the level of a whole residue or even of protein regions, which diffract poorly because of their greater mobility. The same ambiguities affect mobile groups of a bound ligand. As a consequence, a crystallographer needs to incorporate into the structure the missing parts of a residue or of a bound ligand, using dictionaries of standard geometries. In addition, the tautomeric state of a histidine or of a bound ligand with tautomeric groups is inferred, since the hydrogen atoms diffract too weakly to be observed in an electron density map. Likewise, the ionization state of a protein or its ligand is assumed. The resolution of the data, expressed in Å, is an important parameter to assess the accuracy and precision of a crystal structure. For application of GRID/CPCA (Table 2), we considered only structures with a resolution of 2.5 Å or better (the lower the number, the higher the resolution). As a general rule of thumb, the error on the atomic coordinates amounts to 1/6 of the resolution (57). Recently, Hartshorn and colleagues described a procedure to select high resolution protein ligand complexes that can be further assessed by reconstructing the electron density map (58).
This procedure involves the calculation of a density correlation index, which measures how well the observed electron density correlates with the positions of the ligand atoms in a given structure. They applied this procedure to build a high quality test set for validating the performance of docking algorithms. It would also be worth using this protocol to select protein ligand complexes when deriving a protein-based pharmacophore.
2. Using several crystal structures of the same target reduces the chance of obtaining spurious results originating from insignificant differences in crystal structures. Also, a careful selection of one representative crystal structure (see Note 1) does not eliminate the risk of selecting a “nonrelevant” structure, since crystallization conditions (which are not necessarily the same as in a biological medium) might strongly affect the 3D
structure (56). Considering multiple structures of the same target also offers the possibility to account for protein flexibility and to sample different conformations, which are accessible in the presence of a bound ligand. We found the following protocol useful to select a representative set of conformations for a given target:
(a) Align the 3D structures in a common reference frame;
(b) Characterize their binding sites using the GRID C3 probe (steric interactions);
(c) Perform a PCA on the repulsive GRID interaction energy values;
(d) Inspect the PCA score plot and the differential loading plots for pairs of structures belonging to different clusters in the score plot;
(e) Select at least one representative from each cluster. Each cluster corresponds to a different conformation and hence size and shape of the binding site.
3. The structure of the score plot is an indicator of the relevance of a GRID/CPCA model. Because of the noise associated with protein structures, the amount of variance explained by PC1–PC3 is lower (typically between 20 and 40%) than usually observed for a good PCA model.
4. Some alignment-independent descriptors can be derived from a set of MIFs computed by the GRID program. Pastor et al. developed the GRIND approach (59). Briefly, this method involves two steps:
(a) Selection of a fixed number of nodes by optimizing a scoring function, which takes into account the intensity of the field at a node (more negative values are favored) as well as the distance between the nodes (to be maximized).
(b) Application of an autocorrelation transform to encode the selected nodes into alignment-independent variables. This procedure computes the product of the interaction energy for each pair of nodes and handles the results according to the distance between the nodes.
The results of the analysis can be displayed as correlograms, where the products of the node–node energies are reported versus the distance separating the nodes. Vulpetti et al.
(47) generated GRIND for the ATP sites of CDK2/Cyclin A and GSK3β crystal structures. The GRIND served as input for a CPCA. Interpretation of the CPCA model resulted in the identification of the same discriminating features as those identified by GRID/CPCA. GRIND/CPCA does not require any alignment of the protein structures. However, the alignment-independent nature of the GRIND makes their interpretation more difficult. The same kind of interactions happening at similar node–node distances might occur in different areas of the proteins being compared. Baroni et al. (60) developed the FLAP algorithm to describe protein and ligand structures, using four-point pharmacophore
fingerprints. Application of FLAP to compare binding sites involves six steps:
1. Use GRID to compute the MIFs for protein binding sites.
2. Condense the information contained in the GRID map into fewer target-based pharmacophoric points. These pharmacophoric points are selected to encompass the minimum energy regions for the different probes and to cover the different regions of the binding site.
3. Generate all possible energetically favorable arrangements of four pharmacophore points in the regions chosen to map the binding site.
4. Generate a protein fingerprint for each binding site. A protein fingerprint encodes the four-point pharmacophores generated in step 3.
5. Compare the protein fingerprints generated for different binding sites. This comparison, which does not require any alignment of the proteins, produces similarity metrics.
6. Use the similarity metrics as input for multivariate statistical methods like PCA.
Baroni et al. applied this protocol to compare the ATP sites of 23 kinase crystal structures, covering four kinase subfamilies (60). The resulting model could separate the four kinase subfamilies. Cavbase is another alignment-independent approach to compare protein binding sites (61, 62). A protein cavity is described by a set of surface-exposed physicochemical properties (aliphatic, H bond donor, H bond acceptor, H bond donor–acceptor, pi). The cavity shape, the set of assigned descriptors of exposed recognition properties and the corresponding surface patches, together with information on the individual cavity occupants, are stored in a database, known as Cavbase. Cavbase can be searched using a clique algorithm to detect common subgraphs, generated by the nodes corresponding to pairs of pseudocenters of equivalent properties and similar mutual distances. Appropriate tolerances have been included to account for structural variability.
The solutions are ranked by scoring their corresponding matches in terms of the assigned surface-exposed physicochemical properties. This approach is entirely based on physicochemical properties of accessible surface patches and is independent of sequence or fold homology. The classification of protein binding sites is the main application of Cavbase (63, 64). Cavbase similarity values can serve as input for clustering or multivariate analysis like PCA. Detecting similarities between the binding sites of unrelated proteins is one of the strengths of Cavbase (55). A Cavbase similarity search also returns the chemical structures of the ligands occupying the cavities under comparison. Hence a Cavbase search can also generate ideas for de novo design (65).
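The autocorrelation step of the GRIND approach outlined in Note 4 can be sketched in Python as follows. The sketch takes a handful of already selected nodes, forms the product of the interaction energies for every pair, and bins the products by node–node distance to give a correlogram. Keeping only the largest product in each distance bin is an assumption made here for illustration, as are the function name `correlogram`, the node tuple layout, and the bin width; none of these is taken from the GRIND software itself.

```python
import math
from typing import Dict, Sequence, Tuple

# A node is (x, y, z, interaction_energy); the layout is hypothetical.
Node = Tuple[float, float, float, float]


def correlogram(nodes: Sequence[Node], bin_width: float = 1.0) -> Dict[int, float]:
    """Map each distance bin to the largest energy product found in it.

    Only inter-node distances and energies are used, so the result does
    not depend on how the structure is positioned or oriented.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")

    best: Dict[int, float] = {}
    for i in range(len(nodes)):
        xi, yi, zi, ei = nodes[i]
        for j in range(i + 1, len(nodes)):
            xj, yj, zj, ej = nodes[j]
            distance = math.dist((xi, yi, zi), (xj, yj, zj))
            bin_index = int(distance // bin_width)
            product = ei * ej
            # Assumed handling: retain the maximum product per bin.
            if bin_index not in best or product > best[bin_index]:
                best[bin_index] = product
    return best


# Toy example: three favorable (negative energy) nodes.
toy_nodes = [
    (0.0, 0.0, 0.0, -2.0),
    (3.0, 0.0, 0.0, -1.5),
    (0.0, 4.0, 0.0, -1.0),
]
print(correlogram(toy_nodes, bin_width=2.0))
```

With a bin width of 2 Å, the pair separated by 3 Å falls in bin 1 (product 3.0), while the pairs separated by 4 Å and 5 Å both fall in bin 2, where the larger product (2.0) is kept. Translating or rotating all nodes together leaves the correlogram unchanged, which is the sense in which such descriptors are alignment-independent, and also why the location of an interaction within the protein is no longer directly readable from them.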
References
1. Caron, P. R., Mullican, M. D., Mashal, R. D., Wilson, K. P., Su, M. S., and Murcko, M. A. (2001) Chemogenomic approaches to drug discovery. Curr. Opin. Chem. Biol. 5, 464–470.
2. Bleicher, K. H. (2002) Chemogenomics: bridging a drug discovery gap. Curr. Med. Chem. 9, 2077–2084.
3. Bredel, M. and Jacoby, E. (2004) Chemogenomics: an emerging strategy for rapid target and drug discovery. Nat. Rev. Genet. 5, 262–275.
4. Shuttleworth, S. J., Connors, R. V., Fu, J., Liu, J., Lizarzaburu, M. E., Qiu, W., Sharma, R., Wanska, M., Malgorzata, Z., and Alex, J. (2005) Design and synthesis of protein superfamily-targeted chemical libraries for lead identification and optimization. Curr. Med. Chem. 12, 1239–1281.
5. Klabunde, T. (2007) Chemogenomic approaches to drug discovery: similar receptors bind similar ligands. Br. J. Pharmacol. 152, 5–7.
6. Rognan, D. (2007) Chemogenomic approaches to rational drug design. Br. J. Pharmacol. 152, 38–52.
7. Bergner, A. and Günther, J. (2004) Structural aspects of binding site similarity: a 3D upgrade for chemogenomics. Methods Principles Med. Chem. 22, 97–135.
8. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., and Bourne, P. E. (2000) The protein data bank. Nucleic Acids Res. 28, 235–242.
9. Hendlich, M., Bergner, A., Günther, J., and Klebe, G. (2003) Relibase: design and development of a database for comprehensive analysis of protein-ligand interactions. J. Mol. Biol. 326, 607–620.
10. Chalk, A. J., Worth, C. L., Overington, J. P., and Chan, A. W. (2004) PDBLIG: classification of small molecular protein binding in the protein data bank. J. Med. Chem. 47, 3807–3816.
11. Wang, R., Fang, X., Lu, Y., Yang, C. Y., and Wang, S. (2005) The PDBbind database: methodologies and updates. J. Med. Chem. 48, 4111–4119.
12. Wang, R., Fang, X., Lu, Y., and Wang, S. (2004) The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980.
13. Kellenberger, E., Muller, P., Schalon, C., Bret, G., Foata, N., and Rognan, D. (2006) sc-PDB: an annotated database of druggable binding sites from the protein databank. J. Chem. Inf. Model. 46, 717–727.
14. Gold, N. D. and Jackson, R. M. (2006) A searchable database for comparing protein-ligand binding sites for the analysis of structure-function relationships. J. Chem. Inf. Model. 46, 736–742.
15. Günther, J., Bergner, A., Hendlich, M., and Klebe, G. (2003) Utilising structural knowledge in drug design strategies: applications using Relibase. J. Mol. Biol. 326, 621–636.
16. Nicolotti, O., Miscioscia, T. F., Leonetti, F., Muncipinto, G., and Carotti, A. (2007) Screening of matrix metalloproteinases available from the protein data bank: insights into biological functions, domain organization, and zinc binding groups. J. Chem. Inf. Model. 46, 2439–2448.
17. Jacobs, M. D., Caron, P. R., and Hare, B. J. (2008) Classifying protein kinase structures guides use of ligand-selectivity profiles to predict inactive conformations: structure of lck/imatinib complex. Proteins 70, 1451–1460.
18. Wolber, G., Seidel, T., Bendix, F., and Langer, T. (2008) Molecule-pharmacophore superpositioning and pattern matching in computational drug design. Drug Discov. Today 13, 23–29.
19. Wolber, G. and Langer, T. (2005) LigandScout: 3D-pharmacophores derived from protein bound ligands and their use as virtual screening filters. J. Chem. Inf. Model. 45, 160–169.
20. Liao, J. J. L. (2007) Molecular recognition of protein kinase binding pockets for designing potent and selective kinase inhibitors. J. Med. Chem. 50, 409–424.
21. Aronov, A. M. and Murcko, M. A. (2004) Toward a pharmacophore for kinase frequent hitters. J. Med. Chem. 47, 5616–5619.
22. McGregor, M. J. (2007) A pharmacophore map for small molecule protein kinase inhibitors. J. Chem. Inf. Model. 47, 2374–2382.
23. Martin, E. J. and Sullivan, D. C. (2008) Surrogate AutoShim: predocking into a universal ensemble kinase receptor for three dimensional activity prediction, very quickly without a crystal structure. J. Chem. Inf. Model. 48, 873–881.
24. Martin, E. J. and Sullivan, D. C. (2008) AutoShim: empirically corrected scoring functions for quantitative docking with a crystal structure and IC50 training data. J. Chem. Inf. Model. 48, 861–872.
25. Liu, Y. and Gray, N. S. (2006) Rational design of inhibitors that bind to inactive kinase conformations. Nat. Chem. Biol. 7, 358–364.
26. Okram, B., Nagle, A., Adrian, F. J., Lee, C., Ren, P., Wang, X., Sim, T., Xie, Y., Wang, X., Xia, G., Spraggon, G., Warmuth, M., Liu, Y., and Gray, N. S. (2006) A general strategy for creating “inactive-conformation” abl inhibitors. Chem. Biol. 13, 779–786.
27. Sotriffer, C. and Klebe, G. (2002) Identification and mapping of small-molecule binding sites in proteins: computational tools for structure-based drug design. Farmaco 57, 243–251.
28. Goodford, P. (2006) The basic principles of GRID. Methods Principles Med. Chem. 27, 3–26.
29. Gohlke, H., Hendlich, M., and Klebe, G. (2000) Knowledge-based scoring function to predict protein-ligand interactions. J. Mol. Biol. 295, 337–356.
30. Gohlke, H., Hendlich, M., and Klebe, G. (2000) Predicting binding modes, binding affinities and “hot spots” for protein-ligand complexes using a knowledge-based scoring function. Perspect. Drug Discov. Design 20, 115–144.
31. Cruciani, G. and Goodford, P. J. (1994) A search for specificity in DNA-drug interactions. J. Mol. Graph. 12, 116–129.
32. Pastor, M. and Cruciani, G. (1995) A novel strategy for improving ligand selectivity in receptor-based drug design. J. Med. Chem. 38, 4637–4647.
33. Matter, H. and Schwab, W. (1999) Affinity and selectivity of matrix metalloproteinase inhibitors: a chemometrical study from the perspective of ligands and proteins. J. Med. Chem. 42, 4506–4523.
34. Filipponi, E., Cecchetti, V., Tabarrini, O., Bonelli, D., and Fravolini, A. (2000) Chemometric rationalization of the structural and physicochemical basis for selective cyclooxygenase-2 inhibition: toward more specific ligands. J. Comput. Aided Mol. Des. 14, 277–291.
35. Kastenholz, M. A., Pastor, M., Cruciani, G., Haaksma, E. E. J., and Fox, T. (2000) GRID/CPCA: a new computational tool to design selective ligands. J. Med. Chem. 43, 3033–3044.
36. Wold, S., Esbensen, K., and Geladi, P. (1987) Principal component analysis. Chemom. Intell. Lab. Syst. 2, 37–52.
37. Westerhuis, J. A., Kourti, T., and Macgregor, J. F. (1998) Analysis of multiblock and hierarchical PCA and PLS models. J. Chemom. 12, 301–321.
38. Baroni, M., Costantino, G., Cruciani, G., Riganelli, D., Valigi, R., and Clementi, S. (1993) Generating optimal linear PLS estimations (GOLPE): an advanced chemometric tool for handling 3D-QSAR problems. Quant. Struct.-Act. Relat. 12, 9–20.
39. Ridderström, M., Zamora, I., Fjellström, O., and Andersson, T. B. (2001) Analysis of selective regions in the active sites of human cytochromes P450, 2C8, 2C9, 2C18, and 2C19 homology models using GRID/CPCA. J. Med. Chem. 44, 4072–4081.
40. Naumann, T. and Matter, H. (2002) Structural classification of protein kinases using 3D molecular interaction field analysis of their ligand binding sites: target family landscapes. J. Med. Chem. 45, 2366–2378.
41. Terp, G. E., Cruciani, G., Christensen, I. T., and Jørgensen, F. S. (2002) Structural differences of matrix metalloproteinases with potential implications for inhibitor selectivity examined by the GRID/CPCA approach. J. Med. Chem. 45, 2675–2684.
42. Kurz, M., Brachvogel, V., Matter, H., Stengelin, S., Thuring, H., and Kramer, W. (2003) Insights into the bile acid transportation system: the human ileal lipid-binding protein-cholyltaurine complex and its comparison with homologous structures. Proteins 50, 312–328.
43. Myshkin, E. and Wang, B. (2003) Chemometrical classification of ephrin ligands and Eph kinases using GRID/CPCA approach. J. Chem. Inf. Comput. Sci. 43, 1004–1010.
44. Ji, H., Li, H., Flinspach, M., Poulos, T., and Silverman, R. B. (2003) Computer modeling of selective regions in the active site of nitric oxide synthases: implications for the design of isoform selective inhibitors. J. Med. Chem. 46, 5700–5711.
45. Pirard, B. (2003) Peroxisome proliferator-activated receptors target family landscape: a chemometrical approach to ligand selectivity based on protein binding site analysis. J. Comput. Aided Mol. Des. 17, 785–796.
46. Matter, H., Kumar, H. S. A., Fedorov, R., Frey, A., Kotsonis, P., Hartmann, E., Froehlich, L. G., Reif, A., Pfleiderer, W., Scheurer, P., Ghosh, D. K., Schlichting, I., and Schmidt, H. H. (2005) Structural analysis of isoform-specific inhibitors targeting the tetrahydrobiopterin binding site of human nitric oxide synthases. J. Med. Chem. 48, 4783–4792.
47. Vulpetti, A., Crivori, P., Cameron, A., Bertrand, J., Brasca, M. G., D’Alessio, R., and Pevarello, P. (2005) Structure-based approaches to improve selectivity: CDK2-GSK3β binding site analysis. J. Chem. Inf. Model. 45, 1282–1290.
48. Pirard, B. and Matter, H. (2006) Matrix metalloproteinase target family landscape: a chemometrical approach to ligand selectivity based on protein binding site analysis. J. Med. Chem. 49, 51–69.
49. Fox, T. (2006) Protein selectivity studies using GRID-MIFs. Methods Principles Med. Chem. 27, 45–82.
50. Sheridan, R. P., Holloway, M. K., McGaughey, G., Mosley, R. T., and Singh, S. B. (2002) A simple method for visualizing the differences between related receptor sites. J. Mol. Graph. Model. 21, 217–225.
51. Miller, M. D., Kearsley, S. K., Underwood, D. J., and Sheridan, R. P. (1994) FLOG: a system to select quasi-flexible ligands complementary to a receptor of known three-dimensional structure. J. Comput. Aided Mol. Des. 8, 153–174.
52. Carhart, R. E., Smith, D. H., and Venkataraghavan, R. (1985) Atom pairs as molecular features in structure–activity studies: definition and application. J. Chem. Inf. Comput. Sci. 25, 64–73.
53. Sheridan, R. P., Nachbar, R. B., and Bush, B. L. (1994) Extending the trend vector: the trend matrix and sample-based partial-least-squares. J. Comput. Aided Mol. Des. 8, 323–340.
54. Hoppe, C., Steinbeck, C. K., and Wohlfahrt, G. (2006) Classification and comparison of ligand-binding sites derived from grid-mapped knowledge-based potentials. J. Mol. Graph. Model. 24, 328–340.
55. Weber, A., Casini, A., Heine, A., Kuhn, D., Supuran, C. T., Scozzafava, A., and Klebe, G. (2004) Unexpected nanomolar inhibition of carbonic anhydrase by COX-2 selective celecoxib: new pharmacological opportunities due to related binding site recognition. J. Med. Chem. 47, 550–557.
56. Davis, A. M., Teague, S. J., and Kleywegt, G. (2003) Application and limitations of X-ray crystallographic data in structure-based ligand and drug design. Angew. Chem. Int. Ed. Engl. 42, 2718–2736.
57. Boehm, H. J., and Klebe, G. (1996) What can we learn from molecular recognition in protein-ligand complexes for the design of new drugs? Angew. Chem. Int. Ed. Engl. 35, 2588–2614.
58. Hartshorn, M. J., Verdonk, M. L., Chessari, G., Brewerton, S. C., Mooij, W. T. M., Mortenson, P. N., and Murray, C. W. (2007) Diverse, high-quality test set for the validation of protein-ligand docking performance. J. Med. Chem. 50, 726–741.
59. Pastor, M., Cruciani, G., McLay, I., Pickett, S., and Clementi, S. (2000) Grid-Independent descriptors (GRIND): a novel class of alignment-independent three-dimensional molecular descriptors. J. Med. Chem. 43, 3233–3243.
60. Baroni, M., Cruciani, G., Sciabola, S., Perruccio, F., and Mason, J. S. (2007) A common reference framework for analyzing/comparing proteins and ligands. Fingerprints for ligands and proteins (FLAP): theory and application. J. Chem. Inf. Model. 47, 279–294.
61. Schmitt, S., Hendlich, M., and Klebe, G. (2001) From structure to function: a new approach to detect functional similarity among proteins independent from sequence and fold homology. Angew. Chem. Int. Ed. Engl. 40, 3141–3144.
62. Schmitt, S., Kuhn, D., and Klebe, G. (2002) A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol. 323, 387–406.
63. Kuhn, D., Weskamp, N., Schmitt, S., Hüllermeier, E., and Klebe, G. (2006) From the similarity analysis of protein cavities to the functional classification of protein families using cavbase. J. Mol. Biol. 359, 1023–1044.
64. Kuhn, D., Weskamp, N., Hüllermeier, E., and Klebe, G. (2007) Functional classification of protein kinase binding sites using cavbase. ChemMedChem. 2, 1432–1447.
65. Gerlach, C., Münzel, C., Baum, B., Gerber, H. D., Craan, T., Diederich, W. E., and Klebe, G. (2007) KNOBLE: a knowledge-based approach for the design and synthesis of readily accessible small-molecule chemical probes to test protein binding. Angew. Chem. Int. Ed. Engl. 46, 9105–9109.
Chapter 13
Hypothesis-Driven Screening
Ulrich Schopfer, Caroline Engeloch, Frank Höhn, Hervé Mees, Jennifer Leeds, Fraser Glickman, Günther Scheel, Sandrine Ferrand, Peter Fekkes, and Martin Pfeifer
Summary
Phenotypic chemogenomics studies require screening strategies that account for the complex nature of the experimental system. Unknown mechanisms of action and a high frequency of false positives and false negatives necessitate iterative experiments based on hypotheses formed from the results of the previous step. Process-driven High Throughput Screening (HTS), which aims to “industrialize” lead finding and was developed to maximize throughput, rarely affords sufficient flexibility to design hypothesis-based experiments. In this contribution, we describe a High Throughput Cherry Picking (HTCP) system based on acoustic dispensing technology that was developed to support a new screening paradigm. We demonstrate the power of hypothesis-based screening in three chemogenomics studies that were recently conducted.
Key words: High throughput screening, Compound management, Hypothesis-driven screening, High throughput cherry picking, Bacterial growth inhibition, Biosynthetic pathway, Yeast growth modulation
1. Introduction
Edgar Jacoby (ed.), Chemogenomics, Methods in Molecular Biology, vol. 575, DOI 10.1007/978-1-60761-274-2_13, © Humana Press, a part of Springer Science + Business Media, LLC 2009
High Throughput Screening (HTS) is often regarded as the main culprit for the decline in research productivity at large pharmaceutical companies (1). It is portrayed as mindless screening of large compound collections with unsuitable properties. Disappointingly, this impression overlooks the positive developmental trajectory that the young science of screening has taken over the last 15 years. However, the negative view does highlight the two most critical factors for a successful HTS: a carefully planned and executed screening strategy and a high-quality, diverse compound library. Therefore, focusing on the investments in automation as intellectually inferior attempts to replace the scientific method with massive screening overlooks an important distinction: automation increases the speed and throughput of experimentation, but it is not a replacement for a well-validated experimental design, which ultimately determines the usefulness of the results. Similar to previous technological developments, for example, combinatorial chemistry, it is the initial euphoria that, when followed by disappointment, leads to frustration. Only later, when the technology is applied to commensurate problems and when it is combined with other approaches, does the technology show its true potential. One of the most exciting new themes in this context is the positioning of HTS in the repertoire of chemogenomic approaches (2, 3). The goal of chemogenomics is to elucidate gene function by perturbation of the gene product or associated pathway with a small molecule. The ability of HTS to employ over a million effector molecules to investigate individual components of a larger cellular system provides science with a powerful, rational strategy to discover target–ligand pairs. A phenotypic approach that screens for perturbations of cellular systems, e.g., in antibacterial discovery, can be defined as a forward chemogenomic strategy (function → gene). Likewise, a biochemical screen for compounds that bind to a purified protein, e.g., a nuclear receptor, constitutes a reverse chemogenomic approach (gene → function) (4). Forward chemogenomic approaches are utilized when little is known about a potential target, and approaches such as docking and virtual screening cannot be used to identify ligands. HTS methods are ideally suited to tackle such problems since they can start, e.g., in the case of a cellular phenotypic screen, with very little information about targets.
Once a target has been identified, the chemistry space can be explored in greater depth through subsequent rounds of screening to discover viable small molecule leads for chemical optimization. Further rounds of screening will determine SAR around the previously identified chemotype. The development of HTS was recently reviewed by Hertzberg et al. and Macarron et al. (5, 6). The early days of HTS were characterized by a focus on quantity and speed. Compound collections were assembled mainly from internal sources, e.g., from agricultural or dye programs, without giving too much consideration to the suitability of compounds for drug discovery. Combinatorial chemistry was in its infancy, using mainly amide formation to produce large libraries of compounds. However, this period also saw the development of some of the essential tools of today. Under the auspices of the Society for BioMolecular Screening (7), the standards for high-density microplates were developed. This triggered rapid development of laboratory automation, with the most impressive results in multiparallel liquid
handling. Homogeneous assays were developed that supported the “mix-and-read” mode, which enables throughput in excess of 100,000 wells per day. Since the 1990s, an intense focus on quality has eliminated many of the earlier weaknesses in HTS. A strong emphasis on physicochemical properties of compounds led to the development of concepts such as drug-like and lead-like properties (8–10), which had a large impact on the direction of compound libraries. The statistical analysis of HTS data allowed a better interpretation of primary data, and quality parameters emerged that guided assay development and execution (9, 10). Efforts to “industrialize” HTS were directed at reducing variability at all stages of the process (11). Not only did the practice of HTS improve, but the interfaces to target biology and chemistry were also optimized. Upstream, target validation and “druggability” were assessed, and downstream, a new hit-to-lead process (12) emerged that attempted to generate a few high quality lead compounds from large hit lists. Looking back over the last 20 years, a powerful set of technologies has been developed that allows massive, parallel experimentation to produce high quality data. While there continues to be further development of automation and screening, HTS can be considered to be a mature technology today. The focus on performance and standardization has led to a process-driven screening paradigm that is mainly governed by the capabilities of large, automated screening factories. While the process-driven HTS approach is typically described as a linear sequence of steps, future lead finding strategies will incorporate a rational selection of parallel and iterative experiments. HTS practitioners will shift their focus from technology development to skillful application of screening technologies to investigate biological problems.
A well-defined scientific question leads to the formation of a robust experimental strategy which is designed to answer that specific question. The results, regardless of whether they support or reject the hypothesis, in turn help to define the next round of experiments. This iterative strategy is very different from the historical HTS paradigm, which was largely process-driven and guided by the capabilities of the automation and logistics processes of a screening operation (Fig. 1).
Fig. 1. Process-driven versus hypothesis-driven screening. [Figure labels: Today: process-driven, sequential, filtering. Future: hypothesis-driven, parallel, iterative.]
Fig. 2. Hypothesis-driven screening combines iterative and parallel elements. H2L Hit-to-Lead. [Figure labels: Hypothesis/Constraint (target focus, biophysical properties, chemistry space; time/cost, project phase, cell type, assay format); HTD; Enriched set; Focussed/Diversity Set; HTS; Compounds screened: 10^6, 10^4, 10^3, 10^2; Confirmation/Triaging (CRC, Counterscreen, Orthogonal Assay, Secondary Assay; IC50); Hit-to-Lead Profiling and Chemistry (IC50).]
Increasingly, full-deck screens will be followed, preceded or even replaced by Medium Throughput Screens (MTS) of preselected compound sets (Fig. 2). Cheminformatics methods will be used to a much larger extent to customize compound selections for more diverse, or more focused, sets. The scope of cheminformatics will expand beyond the current focus on reduction of hitlists (“triaging”) to a broader panel of activities that guide hitlist follow-up, including prediction of pharmacophores, expansion of chemotypes, and prediction of false positives. High Content Screening (HCS) or Fragment Based Screening (FBS) will be used routinely in lead discovery projects alongside HTS activities. Technical feasibility and financial constraints, e.g., in assays with primary cells or imaging/HCS, limit the number of compounds that can realistically be screened. In these cases, iterative or focused strategies can offer a useful alternative to whole-library HTS. Pilot screens that are geared towards helping to develop and validate follow-up assays, or focused screens that deliver tools for target validation and assay development, are useful approaches to increase the likelihood of a successful downstream HTS. Using High Throughput Docking (HTD) to predict targeted libraries and combining screens with multiple readouts are other ways to enhance the information content of screening programs. Logistics, automation and data handling systems will have to be adapted to support the new paradigm, which necessitates follow-up such as orthogonal screens, counter screens and other iterative interrogations of a primary hitlist. HTS systems have to be designed to support experimentation, such that one addresses
Hypothesis-Driven Screening
a scientific hypothesis, rather than simply conforming to an industrialized process. As a prototype for next generation automation which supports this paradigm, we discuss the design of a High Throughput Cherry Picking (HTCP) system based on acoustic dispensing. In the examples illustrated below, we demonstrate how hypothesis-based screening combines input from virtual screening, structural biology and other approaches to support a powerful strategy for both target and lead discovery via chemogenomics.
2. Methods
2.1. High Throughput Cherry Picking: System Design
Traditional plate logistics systems produce copies of preformatted 384-well or 1536-well master plates. This allows the efficient supply of standardized sets in excess of one million compounds. These systems also have a “cherry-picking” capability, allowing random access to individual samples that are typically stored in individual tubes (13) and providing 1,000–10,000 samples per day. However, these systems offer only limited flexibility for the assembly of focused compound sets. Hypothesis-based screening requires logistics that allow large compound sets to be selected and assembled with a flexible plate layout. By increasing the number of samples that can be picked, large hitlists from HTD or from primary screening can be tested. Flexible plate layouts allowing for repeats and multiple concentrations of compounds enable the researcher to tailor the experiment to the scientific question. For the design of an HTCP system the following key requirements were identified: (1) storage of approximately 1,000 source plates in 1536-well format, (2) maintenance of sample integrity for at least 6 months, (3) 40,000 picks per 24 h in 20–1,000 nL volumes, (4) dispensing quality with CVs ≤10% and a bias ≤10%. In collaboration with Velocity11 (14), a concept was developed having as key components a 1536-well plate store and the innovative acoustic dispensing technology. A full copy of the Novartis screening deck, as 2 mM DMSO stock solutions in 1536-well source plates, is housed in an integrated cold storage unit. The major technological advance in this new system is the ECHO device (Labcyte), an acoustic dispenser that is capable of ejecting solution droplets. Volumes from 2.5 nL up to 1 µL can be transferred with high accuracy over repeated dispensing. The HTCP system is built on the BioCel 1800 platform, integrating a number of specialized modules designed for specific tasks (Table 1).
Table 1 High throughput cherry picking, main system components

Instrument    Provider     Description
BioCel 1800   Velocity11   Automation platform
VSpin         Velocity11   Microplate centrifuge
VStack        Velocity11   Labware stacker
Nanodrop I    Innovadyne   Automated liquid pipettor for low volume back-fill dispensing
ECHO 550      Labcyte      Compound reformatter for low volume compound transfer and low volume back-fill DMSO dispensing
Cytomat 44    Thermo       Controlled climate automated storage system for microplates
2.2. Acoustic Dispensing
The ECHO device is a reliable liquid handler designed to dispense in the nanoliter range. The solution transfer is based on a contact-free technology in which droplet ejection is elicited by acoustic energy. An ultrasonic transducer, surrounded by a temperature-controlled immersion fluid, is placed below the bottom of the source plate. An ultrasonic wave is transmitted through the immersion fluid, across the bottom of the source plate, into the solution in the well, and is reflected at the meniscus of the solution (Fig. 3). The time of flight of the wave echo, recorded by the transducer, gives exact information regarding the solution level in the well and the DMSO content of that solution. This calculation defines the exact amount of energy that is necessary to eject a droplet of, e.g., 2.5 nL from the liquid surface and have it reach the bottom of a well in the inverted destination plate. The desired dispense volume is ultimately achieved by repeatedly ejecting droplets at a frequency of close to 500 Hz.
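The relationship between requested volume, droplet count and ejection time can be illustrated with a short calculation. The following Python sketch assumes a fixed 2.5 nL droplet and treats the "close to 500 Hz" repetition rate as exactly 500 Hz; the function names are illustrative only and do not correspond to any instrument software.

```python
import math

DROPLET_NL = 2.5          # droplet volume quoted in the text (nL)
EJECTION_RATE_HZ = 500.0  # "close to 500 Hz"; assumed exact for this sketch


def droplets_needed(volume_nl, droplet_nl=DROPLET_NL):
    """Number of droplets required to deliver at least volume_nl."""
    if volume_nl <= 0:
        raise ValueError("volume must be positive")
    # Round first so that floating-point noise does not add a spurious droplet.
    return math.ceil(round(volume_nl / droplet_nl, 9))


def ejection_time_ms(volume_nl, droplet_nl=DROPLET_NL, rate_hz=EJECTION_RATE_HZ):
    """Pure ejection time in milliseconds, ignoring plate movement and surveys."""
    return droplets_needed(volume_nl, droplet_nl) / rate_hz * 1000.0


if __name__ == "__main__":
    for volume in (2.5, 10, 50, 200, 1000):
        print(f"{volume:>6} nL: {droplets_needed(volume):>4} droplets, "
              f"{ejection_time_ms(volume):.0f} ms")
```

Under these assumptions the ejection time grows linearly with the requested volume, which is why larger transfers take measurably longer per well than small ones.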
2.3. Compound Storage
For the storage of low volumes (typically 6 µL per well) over extended periods of time, evaporation poses a significant challenge. A 1536-well COC plate (15), optimized to minimize evaporation by reducing laminar flows between lid and plate, was developed jointly by Novartis and Greiner. The advantage of this optimized plate design was shown in a long term storage study (Fig. 4). The volume of the source plates stored in a Cytomat 44 (16) incubator was monitored using the volume audit functionality of the ECHO 550. These audits report the DMSO concentration and the liquid level per source plate well. The source plates were filled with 5 µL of 90% DMSO solution and stored at 4°C and at 15% relative humidity. The plates were loaded and unloaded from the store at daily intervals in order to simulate frequent plate access. After 10 weeks the average volume loss in the corner wells of these plates was less than 1 µL.
Fig. 3. Principle of acoustic dispensing. An ultrasonic transducer ejects droplets from the source well by the application of focussed sound energy. Figure reproduced with permission of Labcyte.
Fig. 4. Compound storage. (a) Scheme of the Novartis/Greiner 1536w COC storage plate. (b) Evaporation profiles of a source plate audited after loading and (c) after 3 months storage. Less than 1 µL volume loss was detected in the corner wells. Each point represents a well volume at the time of the audit.
The Novartis approach of storing stock solutions at 2 mM concentration in a 90:10 DMSO:water mixture (15, 17) provides a critical advantage for the HTCP system. With the addition of 10% water to DMSO, the freezing point of the mixture drops from 18°C to below 4°C. This freezing point depression avoids the potential for repeated freeze/thaw cycles which could impact the solubility of some compounds. Since no melting step is required, sample access is rapid. At a storage temperature of 4°C and a relative humidity of 15–20% the water content of a 90:10 DMSO:water mixture is close to equilibrium so that very little further water is absorbed from the atmosphere.
2.4. Accuracy and Precision
Source plates (1536-well format) filled with 5 µL of a freshly prepared 10 µM fluorescein solution (obtained from a 10 mM stock solution) were loaded onto the ECHO device. Protocols with single volume dispense commands (50 nL, 40 nL, 10 nL) were then generated in order to produce three 1536-well black plates per test volume. The requested volume was dispensed to all wells except columns 45–48, which were reserved for calibration and blank samples (1,408 data points per plate). Five microliters of diluent (buffer solution of PBS with 0.1% CHAPS) were then transferred to columns 1–44. After the addition of calibration solution to columns 45–48 the plate was measured on an Envision plate reader (18) (Table 2). The accuracy and precision values are excellent and in accordance with specifications reported by Labcyte (19).
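The two quality figures used here, CV and bias, can be computed from the per-well volumes of a test plate as in the following Python sketch. It assumes the common definitions (CV as sample standard deviation over mean, bias as relative deviation of the mean from the requested volume); the text does not state which exact formulas were used, and the readings in the example are hypothetical.

```python
from statistics import mean, stdev

ROWS = 32             # rows of a 1536-well plate
SAMPLE_COLUMNS = 44   # columns 1-44 receive the test dispense
DATA_POINTS = ROWS * SAMPLE_COLUMNS  # wells evaluated per plate


def cv_percent(volumes):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return stdev(volumes) / mean(volumes) * 100.0


def bias_percent(volumes, target):
    """Relative deviation of the mean from the requested volume, in percent."""
    return (mean(volumes) - target) / target * 100.0


def meets_spec(volumes, target, limit=10.0):
    """Check the HTCP requirement of CV <= 10% and |bias| <= 10%."""
    return cv_percent(volumes) <= limit and abs(bias_percent(volumes, target)) <= limit


if __name__ == "__main__":
    # Hypothetical back-calculated volumes (nL) for a 50 nL dispense.
    readings = [48.9, 51.2, 50.4, 49.1, 52.0, 50.6, 47.8, 51.5]
    print(f"data points per plate: {DATA_POINTS}")
    print(f"CV   = {cv_percent(readings):.1f}%")
    print(f"bias = {bias_percent(readings, 50.0):.1f}%")
    print(f"within specification: {meets_spec(readings, 50.0)}")
```

The constant `DATA_POINTS` reproduces the 1,408 wells per plate mentioned above (32 rows times 44 sample columns).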
Table 2 Accuracy and precision of liquid transfers from a 1536-well source plate to a 1536-well destination plate on the ECHO 550 acoustic dispenser

Transfer volume (nL)   CV (%)   Bias (%)
2.5                    6.2      1.1
10                     5.3      1.6
40                     5.2      1.4
50                     4.5      1.1

2.5. System Throughput
The throughput of the HTCP system was found to depend on a number of parameters. The picking rate of the ECHO 550 acoustic dispenser determines the maximum picking rate that can be achieved. However, with a cycle time of 2 min for each source plate that needs to be accessed, the retrieval of source plates from the store can become a bottleneck. Therefore, the number of compounds picked from the same source plate and the number of replicates and concentrations derived from one source well can have a dramatic influence on the pick rate (Fig. 5). The amount of stock solution transferred from one source well is also an important parameter. The time needed to transfer volumes greater than 200 nL can have a measurable impact on the overall pick rate.
Fig. 5. Dependency of HTCP pick rates on source plate density. The pick rate (number of produced wells per minute) depends on the number of picks that can be taken from the same source plate. [Plot: picks (destination wells) per min, 0–120, versus picks (destination wells) per source plate, 0–700.]
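The dependence of the pick rate on the number of picks per source plate can be approximated with a simple serial model. The Python sketch below is an illustration only: it assumes that plate retrieval (2 min, from the text) and dispensing happen strictly one after the other, and it uses a hypothetical peak dispenser rate of 150 picks per minute, a value that is not given in the text.

```python
RETRIEVAL_MIN = 2.0              # source-plate cycle time stated in the text (min)
DISPENSER_PICKS_PER_MIN = 150.0  # hypothetical peak dispenser rate (assumption)
REQUIRED_PICKS_PER_DAY = 40_000  # design requirement from Section 2.1


def pick_rate(picks_per_plate, retrieval_min=RETRIEVAL_MIN,
              dispenser_rate=DISPENSER_PICKS_PER_MIN):
    """Overall picks per minute when retrieval and dispensing run in series."""
    if picks_per_plate <= 0:
        raise ValueError("picks_per_plate must be positive")
    minutes_per_plate = retrieval_min + picks_per_plate / dispenser_rate
    return picks_per_plate / minutes_per_plate


def min_picks_per_plate(target_rate, retrieval_min=RETRIEVAL_MIN,
                        dispenser_rate=DISPENSER_PICKS_PER_MIN):
    """Smallest number of picks per source plate that reaches target_rate."""
    if target_rate >= dispenser_rate:
        raise ValueError("target rate exceeds the dispenser's peak rate")
    picks = 1
    while pick_rate(picks, retrieval_min, dispenser_rate) < target_rate:
        picks += 1
    return picks


if __name__ == "__main__":
    required_rate = REQUIRED_PICKS_PER_DAY / (24 * 60)
    print(f"required average rate: {required_rate:.1f} picks/min")
    for n in (1, 10, 100, 300, 700):
        print(f"{n:>4} picks/plate -> {pick_rate(n):6.1f} picks/min")
    print("picks per plate needed:", min_picks_per_plate(required_rate))
```

In this model the rate saturates towards the dispenser's peak rate as more wells are taken from each source plate, which is the qualitative behavior described for Fig. 5; the actual system may overlap retrieval and dispensing and therefore behave differently.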
3. Results and Discussion
In this section, we exemplify the power of hypothesis-driven screening. Three chemogenomics projects and the impact of flexible experimental design enabled by HTCP capabilities are discussed. Two of these examples are drawn from the area of infectious diseases; the third example describes a neuroscience project. Infectious diseases continue to be one of the major causes of global morbidity and mortality. Nevertheless, only two truly novel antibiotic chemical classes have been approved by the FDA for human use in over 25 years: the oxazolidinones, including linezolid, and the lipopeptide class exemplified by daptomycin. Notably, both of these “new” drug classes were identified by forward-chemogenomics, cell-based screening against bacterial pathogens during the “golden era” of antimicrobial discovery (20). The “new era” of drug discovery emphasized the screening of enormous synthetic chemical libraries against purified target enzymes. The challenge was to achieve potent antibacterial activity from inhibitors obtained via in vitro enzymatic screens, largely through empirical medicinal chemistry programs. The in vitro target/synthetic chemical entity screening approach has, thus far, not led to new market introductions. Most compounds currently in clinical trials are derivatives of known chemical classes for which antimicrobial resistance mechanisms already exist in the clinical setting (21). To improve our chances of discovering quality leads, we complemented the target-based antimicrobial discovery programs with cell-based high throughput antibacterial screens. This allowed us to harness the power of chemogenomics, a time-honored tool in the study of bacterial genetics and microbial physiology, to discover new targets and inhibitors. By combining large scale screening with rapid follow-up studies that were guided by cheminformatics to assist in chemical triaging, target identification, and liability assessment, we were able to take a more systematic approach to the otherwise “blind” process that characterized early cell-based antimicrobial drug discovery (22).
3.1. Bacterial Growth Inhibition
Two cell-based antibacterial HTS programs were run at Novartis, separated by a 3-year interval (Fig. 6). In the first HTS, 600,000 compounds were tested, and 700,000 additional compounds were assayed in the second HTS. In both HTS, growth of the Gram-positive pathogen Staphylococcus aureus was assessed at a single compound concentration (40 µM). Bacterial growth inhibition was monitored by measuring the reduction of the nonfluorescent substrate resazurin to the fluorescent product resorufin (23). A comparison of the throughput and timelines following each of the primary screens illustrates the impact of the HTCP system as an enabling technology for rapid, hypothesis-driven iterations of testing.
Fig. 6. Timelines and compound throughput for two Novartis HTS for bacterial growth inhibitors. The top and bottom portions depict the compound flows in two HTS campaigns against S. aureus in a five-year interval.
The first HTS yielded approximately 50,000 hits capable of imparting >95% inhibition of S. aureus growth at 40 µM. Since cherry-picking capacities were limited at this time, a strategy was designed to select the most promising candidates from this hit list. The hypothesis was that the application of strict in silico filters could achieve an enrichment of novel compounds with favorable “drug-like” properties and the least potential for promiscuity (24, 25). Broad clustering algorithms were then applied to capture the largest diversity from the smallest number of compounds. Compounds that contained overtly toxic pharmacophores, were known to be unstable or impure, contained features considered unsuitable for further development, or were similar to marketed antibacterial scaffolds were removed from further consideration. After over 90% of the compounds had been removed from the primary hit list, only 3,375 compounds were selected for confirmation screening. Of those, 2,465 were confirmed to inhibit bacterial growth by at least 50% at 40 µM, and were selected for dose-response assays. At the completion of the 4-month HTS campaign, 1,995 validated compounds were advanced into the downstream flowchart. While potent compounds emerged from this strategy, and lead optimization programs ensued, it also became apparent that the initial selection criteria were too stringent. Many antibacterials fall outside of the “drug like” profile because, for example, they are large, lipophilic molecules that either do not have to enter the cell cytoplasm, or have dedicated transporters (24). In an effort to cast a wider net for attractive antibacterial starting points, the hit list was submitted to a second iteration of testing. Approximately 24,000 compounds, after removal of the most undesirable scaffolds, were investigated. By cherry-picking 2,000 compounds per month, antibacterial confirmation screens against six organisms were performed at a single compound concentration (4 µM). A total of 2,790 compounds inhibited growth of at least one test strain by 50% or greater. Over the course of 1 year, these antibacterial compounds were assayed for mammalian cytotoxicity. Compounds that had a CC50 greater than 24 µM (the top concentration achievable in the MTS cytotoxicity assays) were then tested for antibacterial IC50 via semiautomated methods. When the entire cell-based data set was available, cheminformatics tools were applied, and downstream flowchart activities on the 227 selected hits were initiated. Several additional novel compound:target pairs were discovered from this phase of the screening follow-up, which validated the expanded approach.
To summarize, hit follow-up from the first antibacterial HTS began by looking first at compounds that fulfilled traditional druglike criteria. In a second iteration, the remainder, and majority, of the hits were tested in a time-consuming effort that took nearly 2 years. Despite the lengthy timelines, the efforts were rewarding, and multiple novel compound:target pairs were identified and progressed through the discovery pipeline.
A second antibacterial HTS was performed 3 years after the first HTS (Fig. 6, bottom). The second HTS benefited from the lessons learned from the first program, and, most critically, from the dramatic increase in cherry-picking capacity. With the introduction of HTCP, nearly the entire set of HTS hits (35,000 compounds) was quickly assessed at 4 µM against two bacterial species. Immediately following the confirmation screen, dose-response assays on the 2,250 active compounds were performed in parallel against five bacterial strains and three mammalian cell lines. The consolidation of bacterial and mammalian cell assays into a single process shortened the dose-response screen timeline from 1 year, following the first HTS, to 1 month following the second HTS. HTCP afforded the rapid processing of compounds through multiple biological tests, thereby enabling the selection of biologically interesting hits while reducing the reliance on criteria based on calculated properties to reduce hit lists to manageable sizes.
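The degree of filtering in the first iteration of the first HTS is easy to lose track of in prose. The short Python sketch below simply tabulates the stage-to-stage retention using the compound counts quoted in the text (the hit count is "approximately 50,000"); the stage labels and function name are our own.

```python
# Compound counts of the first HTS, first iteration, as quoted in the text.
FIRST_HTS_FUNNEL = [
    ("compounds screened", 600_000),
    ("primary hits", 50_000),              # "approximately 50,000"
    ("selected for confirmation", 3_375),
    ("confirmed at 40 uM", 2_465),
    ("validated after dose response", 1_995),
]


def retention(funnel):
    """Percentage of compounds carried over from each stage to the next."""
    return [
        (later_name, 100.0 * later_count / earlier_count)
        for (_, earlier_count), (later_name, later_count) in zip(funnel, funnel[1:])
    ]


if __name__ == "__main__":
    for stage, percent in retention(FIRST_HTS_FUNNEL):
        print(f"{stage:<32} {percent:6.2f}% of previous stage")
```

The in silico filters retained fewer than 7% of the primary hits, consistent with the statement that over 90% of the hit list was removed before confirmation screening.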
Significant improvements in the timeline and throughput of the HTS, and in the quality and quantity of data, were achieved with the HTCP capacity. Because of the enhanced screening capacity, the compounds moved quickly into downstream triaging and liability assessment, which led to rapid decision making. One outcome from the first screening effort which supported the HTCP approach in the second screen was the observation that antibacterial IC50 against S. aureus was predictive of the Minimum Inhibitory Concentrations (MIC) measured by standard low-throughput, manual-read methods (26) (Fig. 7). These data provided us with the confidence to use HT methods to rapidly and accurately assess new antibacterial compounds, and to plan “top up” screens at regular intervals with minimal startup effort. The comprehensive dataset resulting from high-throughput antibacterial screens provides investigators with an easily retrievable catalog of all antibacterial compounds in the collection. Aside from driving cell-based antibacterial programs and new target/mechanism of action identification, this information is powerful when triaging a hit list from an in vitro target-based screen. One can now use the antibacterials database in combination with cheminformatics to form hypotheses about cell or target based activity, and HTCP to iteratively test these. Some examples where this process has been instrumental are target validation, mechanism of antibacterial action, synergy screening, antibacterial resistance profiling, and specificity and selectivity assays.
Fig. 7. Correlation between anti-S. aureus IC50s generated in high-throughput and MICs generated by standard low-throughput methods. The graph demonstrates that the HTS antibacterial IC50 data generated using compound solutions from the HTCP system are a robust predictor of antibacterial MICs of compounds from the powder samples. The false negative rate is low, indicating that few truly antibacterial compounds would fail to be detected via the high throughput method. [Plot: MIC S. aureus (µg/ml) versus IC50 S. aureus (µg/ml).]
3.2. Biosynthetic Pathway Screen
Whole-cell, bacterial growth inhibition screens can be challenging because the hit lists are typically very large and are composed of nonspecific toxic compounds, compounds which interfere with the detection steps, compounds with unknown mechanisms of action, and compounds that inhibit well-described targets. Hit lists are often too large for simultaneous liability identification and target deconvolution via mechanistic assays (24). On the other hand, target-based HTS approaches have often failed to identify novel molecules with potent, broad-spectrum antibacterial activity. The paucity of enzyme inhibitors with antibacterial activity is due to several factors, including the limited permeability of compounds through the cell membrane, the efflux of compounds back out of the cell, the metabolism of the compounds by the bacteria, or the presence or induction of compensatory pathways which restore bacterial metabolism even in the presence of a specific enzyme inhibitor. In order to develop a lead-finding approach that extracts the most value from the downstream flowchart, we employed a hybrid between target-based screening and cell-based screening, using “sensitized” strains of bacteria. Such sensitized strains are useful for screening because the bacteria are more sensitive to growth inhibition by a compound that inhibits the down-regulated enzyme or potentially other enzymes in the same pathway. The expression of the target enzyme can be regulated by cloning a suitable promoter/operator sequence upstream of the coding sequence, and then modulating the expression by adjusting the level of inducer. When the wild-type copy of the essential gene is replaced with a recombinant, inducible copy, the strains cannot grow in the absence of the exogenously supplied inducer. By measuring growth as a function of inducer concentration, it is possible to identify the minimal amount of inducer (and, therefore, enzyme concentration) that supports growth (Fig. 8a). Under this condition, the cell line can become highly sensitized to compounds that would not otherwise be identified as growth inhibitors. By comparing the IC50 of compounds against a highly induced cell line versus the sensitized cell line, one can select hits that may specifically inhibit the down-regulated enzyme or pathway. Figure 8b depicts the IC50 of a compound against an S. aureus strain sensitized for an essential, biosynthetic enzyme. At a low concentration of the inducer IPTG (isopropyl-β-D-thiogalactopyranoside), low enzyme expression renders the bacteria more sensitive to an enzyme inhibitor, whereas under high IPTG concentration, high enzyme expression renders the bacteria less sensitive to the inhibitor.
Fig. 8. (a) IPTG-dependent growth of the S. aureus sensitized strain (OD 595 nm versus time in min). Inoculum ≈7,000 bacteria/µL, temperature 37°C, incubation time 0–10 h, assay volume 30 µL, n = 4. [IPTG] = 1 mM, 500 µM, 50 µM, 5 µM, 2.5 µM, 0. (b) IPTG-dependent inhibition of sensitized S. aureus growth by a specific enzyme inhibitor (% of uninhibited control versus log [inhibitor] in M). Inoculum ≈7,000 bacteria/µL, temperature 37°C, incubation time 8 h, assay volume 30 µL, n = 4. [IPTG] = 1 mM, 5 µM, 0.75 µM and 0 gave IC50 values of 332, 16, 5 and 2.5 nM, respectively.
A 1.5 million compound HTS was performed using this recombinant, inducible S. aureus strain cultured in clear bottom 384-well microtiter plates containing compounds at a final concentration of 10 µM. Growth was assessed by measuring culture density (absorbance at 595 nm) following 8 h incubation. The growth phenotype allowed the compounds to be sorted into two distinct populations (Fig. 9), identifying nearly 25,000 hits. Using HTCP, the active compounds were cherry picked and diluted serially to four concentrations in two separate 384-well microtiter plates. These compounds were then tested in two parallel microbial assays. In the first assay, the cells were grown in the presence of a high concentration of IPTG (1 mM), where the growth rate is not limited by enzyme concentration. In the parallel assay, the cells were grown in the presence of a low concentration of IPTG (10 µM), where the growth rate was limited by the low concentration of the enzyme. When the data were fitted to a sigmoidal dose-response curve, an IC50 value could be obtained for each compound under conditions of high and low IPTG. In this experiment, 1,300 of the 25,000 cherry-picked and serially diluted compounds were at least fivefold more potent against the sensitized strain. These compounds were selected as potential enzyme inhibitors with demonstrated antibacterial activity. In the end, more than 50% of these compounds were shown to inhibit the purified enzyme. The hit list represented a variety of chemical structural classes that were capable of inhibiting growth not only of wild-type S. aureus, but also of other Gram-positive bacteria that depend on the same biosynthetic pathway for survival.
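The selection criterion described above, an IC50 at least fivefold lower under low inducer than under high inducer, can be sketched in a few lines of Python. The Hill equation below is a generic sigmoidal model and not necessarily the curve-fitting model used in the study; the first compound uses the IC50 values from Fig. 8b (332 nM at 1 mM IPTG, 16 nM at 5 µM IPTG), and the second compound is hypothetical.

```python
def hill_response(conc_nm, ic50_nm, hill_slope=1.0):
    """Percent of uninhibited growth for a simple sigmoidal inhibition curve."""
    return 100.0 / (1.0 + (conc_nm / ic50_nm) ** hill_slope)


def fold_shift(ic50_high_inducer_nm, ic50_low_inducer_nm):
    """How much more potent a compound is against the sensitized condition."""
    return ic50_high_inducer_nm / ic50_low_inducer_nm


def select_sensitized_hits(ic50_pairs, min_fold=5.0):
    """Names of compounds at least min_fold more potent at low inducer."""
    return [
        name
        for name, (high_nm, low_nm) in ic50_pairs.items()
        if fold_shift(high_nm, low_nm) >= min_fold
    ]


# (IC50 at high IPTG, IC50 at low IPTG), both in nM.
IC50_PAIRS = {
    "Fig. 8b inhibitor": (332.0, 16.0),
    "hypothetical nonspecific compound": (900.0, 750.0),
}

if __name__ == "__main__":
    for name, (high_nm, low_nm) in IC50_PAIRS.items():
        print(f"{name}: {fold_shift(high_nm, low_nm):.1f}-fold shift")
    print("selected:", select_sensitized_hits(IC50_PAIRS))
```

A compound acting through general toxicity should inhibit both conditions at similar concentrations and therefore shows little or no shift, whereas a target-specific inhibitor becomes markedly more potent when the enzyme is scarce.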
Fig. 9. Histogram of compound activities. Compound number distribution versus their activity in the S. aureus growth assay. Populations of active and inactive compounds can clearly be distinguished.
The HTCP system allowed us to rapidly array four-point dose-response curve (DRC) libraries for a list of 25,000 primary hits, allowing for a complete retesting of primary hits in cell-based selectivity measurements. Prior to the institution of the hybrid target:cell-based screening system, this rapid deployment of the downstream flowchart would have been impossible due to the unmanageably large number of hits that are identified by a cell-based screening approach. The HTCP system provided the flexibility to explore large numbers of compounds to identify compounds of interest, and opens further possibilities for a less biased approach towards selecting hits for further testing in selectivity and dose-response assays. The ultimate success of the approach was based on four features: (1) direct label-free measurement of bacterial growth in a 384-well format using optical density at 595 nm, (2) the use of a sensitized bacterial strain, in which growth was dependent upon IPTG-inducible expression of an essential enzyme in a biosynthetic pathway, (3) the development of a panel of high throughput enzyme and cell-based assays to rapidly profile the hits for specificity and selectivity, (4) the ability of the HTCP system to rapidly generate focused libraries at multiple concentrations to support the selectivity and mechanism of action profiling of the compounds.
3.3. Yeast Growth Modulation Screen
As mentioned in the previous section, potential cytotoxic effects are an inherent challenge with cell-based screening assays, especially at higher compound concentrations in the micromolar range. Hitlists of assays for inhibitors of a cellular response will also contain compounds that reduce the measured signal due to an unspecific toxic effect on the cells. During subsequent follow-up screening these false positives have to be discriminated from hits acting on the target under investigation. Usually this discrimination is achieved by retesting the hits in a range of concentrations against the specific target readout and in parallel against a general cytotoxicity measure. Such counter screens detect, e.g., cellular protein or DNA content, cellular ATP levels, activity of a cellular housekeeping enzyme, or cell count if the assay format allows for cell proliferation to be observed. An appropriate concentration window between target readout and cytotoxicity measurement indicates a target-specific mode of action. If the cellular system is sensitive and the primary screening concentration is in the micromolar range, unspecific cytotoxicity mechanisms may raise primary hit rates up to several percent of the screening library. In this case the hit confirmation against a cytotoxicity counter screen can become a very demanding task that cannot be accomplished without an HTCP system.
In screening projects that are looking for stimulatory compounds this problem is even more pronounced, although this may not be obvious at first glance. Cytotoxic compounds suppress cellular responses and should not give rise to false positives in a stimulation setting. However, stimulation assays can lead to false negative results when the stimulatory target-specific effect of an active compound turns into unspecific cytotoxic (and thus inhibitory) activity at higher concentrations. Depending on the width of such biphasic, bell-shaped concentration response curves and the choice of screening concentration relative to the ascending and descending parts of the concentration response curve (Fig. 10), a target-specific hit may appear as stimulatory (A), inactive (B) or inhibitory (C). Unfortunately, the most potent hits tend to be most prone to this effect and will get lost unless they exhibit a large window against cytotoxicity. Furthermore, low hit rates in stimulatory assays tempt scientists to increase the primary screening concentration, thereby increasing the false negative rate even further. Ideally, stimulatory screens should be performed over a broad range of compound concentrations, which is usually not affordable. Here, another solution is described for minimizing false negative rates in stimulatory assays, which relies heavily on HTCP during hit follow-up and on retesting all active samples irrespective of whether they are stimulatory or inhibitory.
Fig. 10. Yeast growth modulation screen. Biphasic concentration response curve for target-specific stimulatory compounds that are cytotoxic at higher concentrations. A target-specific hit may appear as stimulatory (A), inactive (B) or inhibitory (C). [Plot: response versus concentration, with screening concentrations A, B and C marked.]
This is best illustrated with a recent screening project based on a genetically engineered S. cerevisiae yeast cell line that expresses a toxic protein under the transcriptional control of the Gal1–10 promoter. This toxic protein is implicated in a number of diseases, and inhibitors of this protein are potential therapeutic modalities. Upon switching from a glucose-containing to a galactose-containing medium the expression of the toxic protein is turned on and the yeast cells stop growing. The primary screening assay quantifies yeast growth via luciferase-mediated determination of ATP content of the yeast lysate. This method is very sensitive and runs robustly in the 1536-well format. During the downstream flowchart an independent orthogonal readout, which quantifies yeast growth directly by culture density (absorbance at 850 nm) in 384-well format, was used for confirmation of compound activity. Active compounds with the desired profile would interfere with the toxic mechanism of the protein, reverse the growth arrest and lead to an increase in yeast cell number over the incubation period of 20 h, irrespective of the readout method used. Primary screening of 1.2 million compounds at a concentration of 20 µM led to 2,000 stimulatory hits (Fig. 10, type A) based on a threshold of 10% increase in cell number as measured by ATP content. Another 5,900 compounds proved to be inhibitory (Fig. 10, type C) during primary screening based on a threshold of 10% reduction of cell number. Compounds of type B (Fig. 10), where a balance of stimulatory and cytotoxic activities leads to an inactive readout, cannot be discriminated from true inactives. However, a screen for general inhibitors of yeast growth had been run with the parental S. cerevisiae strain. This screen had identified 25,000 inhibitory hits, 17,500 of which did not overlap with the 2,000 stimulatory or the 5,900 inhibitory hits from the target-based yeast growth modulation screen. The HTCP made it feasible to combine these three hit lists and retest all 25,400 hits as four-point concentration response curves, covering compound concentrations from 33 to 1.2 µM. A small subset of compounds exhibited only inhibitory or neutral activity in this concentration range and was retested at a lower concentration range from 1.3 µM to 11 nM.
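The way a single-concentration screen can misclassify a biphasic compound (Fig. 10) can be made concrete with a toy model. The Python sketch below is purely illustrative and is not the model used in the study: it multiplies a Hill-type stimulation term by a Hill-type survival term, and the parameter values (EC50 of 0.1, half-toxic concentration of 10, arbitrary concentration units) are hypothetical. Only the 10% activity threshold is taken from the text.

```python
def biphasic_response(conc, ec50, tox50, max_stim=100.0, slope=1.0):
    """Net signal change (%) relative to untreated cells.

    stimulation rises towards max_stim around ec50 (target-specific effect);
    survival falls from 1 to 0 around tox50 (unspecific cytotoxicity).
    """
    stimulation = max_stim * conc ** slope / (ec50 ** slope + conc ** slope)
    survival = tox50 ** slope / (tox50 ** slope + conc ** slope)
    return (100.0 + stimulation) * survival - 100.0


def classify(conc, ec50, tox50, threshold=10.0):
    """Classify a single-point readout with the +/-10% thresholds of the screen."""
    response = biphasic_response(conc, ec50, tox50)
    if response >= threshold:
        return "stimulatory"
    if response <= -threshold:
        return "inhibitory"
    return "inactive"


if __name__ == "__main__":
    ec50, tox50 = 0.1, 10.0  # hypothetical compound, arbitrary units
    for conc in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(f"conc {conc:>6}: {biphasic_response(conc, ec50, tox50):7.1f}% "
              f"-> {classify(conc, ec50, tox50)}")
```

With these hypothetical parameters the same compound reads as stimulatory at a concentration of 1, inactive at 10 and inhibitory at 100, which corresponds to cases A, B and C of Fig. 10 and motivates retesting inhibitory and inactive-looking hits over a concentration range.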
All compounds that stimulated growth by at least 10% anywhere in the concentration range (1,743 compounds) were tested once more as eight-point dilution series in the concentration range of 20 µM–6 nM in four replicates. Based on the higher statistical significance of these data, a more stringent selection threshold of at least 35% stimulation was now applied, which was met by 118 compound solutions. These 118 compounds were freshly dissolved from powder samples and validated in the orthogonal culture density assay (turbidity of the culture at 850 nm), again as eight-point dilution series and in four replicates. In parallel, compound solutions were checked by LC-MS and UV absorption for their chemical integrity and purity. Finally, 26 compounds remained that increased the ATP content of the yeast culture by at least 35%, increased culture turbidity by at least 20%, had the correct mass and a UV purity of at least 50%, and exhibited a concentration window of at least tenfold between the ascending and descending parts of the concentration response curve.
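The final selection combines several independent criteria, which can be expressed as a simple conjunctive filter. The following is a minimal sketch, not the software used in the screen: the record layout, the field names, and the example compounds are all hypothetical, and the concentration window is assumed to have been derived from the curve beforehand.

```python
from dataclasses import dataclass

@dataclass
class RetestResult:
    """Summary of one compound after the final retest (hypothetical record layout)."""
    name: str
    atp_stimulation_pct: float     # maximal increase of the ATP readout vs. control
    turbidity_increase_pct: float  # maximal increase of absorbance at 850 nm
    mass_confirmed: bool           # LC-MS shows the expected mass
    uv_purity_pct: float           # purity by UV absorption
    window_fold: float             # fold window between ascending and descending part of the curve

def passes_final_selection(r: RetestResult) -> bool:
    """Apply the thresholds quoted in the text; every criterion must be met."""
    return (
        r.atp_stimulation_pct >= 35.0
        and r.turbidity_increase_pct >= 20.0
        and r.mass_confirmed
        and r.uv_purity_pct >= 50.0
        and r.window_fold >= 10.0
    )

# Made-up example records, not data from the screen
examples = [
    RetestResult("cpd-1", 48.0, 27.0, True, 92.0, 30.0),
    RetestResult("cpd-2", 41.0, 12.0, True, 88.0, 30.0),  # fails the turbidity threshold
    RetestResult("cpd-3", 60.0, 35.0, True, 95.0, 3.0),   # concentration window too narrow
]
validated = [r.name for r in examples if passes_final_selection(r)]
print(validated)
```

Requiring all criteria at once means that a compound strong in one readout is still rejected if it fails the orthogonal readout or the integrity check.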
A retrospective analysis was performed for the 26 finally validated hits to determine from which primary hit list they originated. While 18 compounds were found as stimulatory hits in the yeast growth modulation screen, an additional five compounds could be rescued from the inhibitory hits in this screen. An additional five compounds were found in the screen of the parental yeast strain. If only the stimulatory hits from the primary target screen had been followed up, eight (31%) of the final hits would have been missed. In conclusion, the HTCP enabled us to minimize the false negative rate by adopting a broad and inclusive hit follow-up strategy at early screening stages, when data power and quality are relatively poor (e.g., single-point determinations in primary screening), and becoming increasingly stringent towards later screening stages, when data power has increased (e.g., concentration response curves, replicates, orthogonal target readout).
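The share of final hits that a stimulatory-only follow-up would have missed, and the narrowing of the funnel from stage to stage, follow directly from the counts given in the text. The sketch below only restates that arithmetic; the stage labels are informal descriptions chosen for illustration.

```python
# Counts quoted in the text
final_hits = 26
from_stimulatory = 18  # final hits that were stimulatory in the primary target screen

missed = final_hits - from_stimulatory
missed_pct = round(100 * missed / final_hits)
print(f"missed with a stimulatory-only follow-up: {missed} of {final_hits} ({missed_pct}%)")

# Compounds remaining after each follow-up stage (labels are informal)
funnel = [
    ("combined retest list", 25_400),
    ("at least 10% stimulation", 1_743),
    ("at least 35% stimulation", 118),
    ("validated hits", 26),
]

# Fraction of compounds carried over from each stage to the next
retention = [
    (label, count / previous_count)
    for (_, previous_count), (label, count) in zip(funnel, funnel[1:])
]
for label, fraction in retention:
    print(f"{label}: {fraction:.1%} of the previous stage")
```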
Acknowledgments
Drs E. Jacoby and P. Fürst (both NIBR associates) are acknowledged for support and discussions.