PHARMACOCHEMISTRY LIBRARY - VOLUME 23
QSAR AND DRUG DESIGN: NEW DEVELOPMENTS AND APPLICATIONS
PHARMACOCHEMISTRY LIBRARY, edited by H. Timmerman
Other titles in this series
Volume 9
Innovative Approaches in Drug Research. Proceedings of the Third Noordwijkerhout Symposium on Medicinal Chemistry, Noordwijkerhout (The Netherlands), September 3-6, 1985 edited by A.F. Harms
Volume 10
QSAR in Drug Design and Toxicology, Proceedings of the Sixth European Symposium on Quantitative Structure-Activity Relationships, Portorož-Portorose (Yugoslavia), September 22-26, 1986 edited by D. Hadži and B. Jerman-Blažič
Volume 11
Recent Advances in Receptor Chemistry. Proceedings of the Sixth Camerino-Noordwijkerhout Symposium, Camerino (Italy), September 6-10, 1987 edited by C. Melchiorre and M. Giannella
Volume 12
Trends in Medicinal Chemistry '88. Proceedings of the Xth International Symposium on Medicinal Chemistry, Budapest, 15-19 August, 1988 edited by H. van der Goot, G. Domany, L. Pallos and H. Timmerman
Volume 13
Trends in Drug Research. Proceedings of the Seventh Noordwijkerhout-Camerino Symposium, Noordwijkerhout (The Netherlands), 5-8 September, 1989 edited by V. Claassen
Volume 14
Design of Anti-AIDS Drugs edited by E. De Clercq
Volume 15
Medicinal Chemistry of Steroids
by F.J. Zeelen
Volume 16
QSAR: Rational Approaches to the Design of Bioactive Compounds. Proceedings of the Eighth European Symposium on Quantitative Structure-Activity Relationships, Sorrento (Italy), 9-13 September, 1990 edited by C. Silipo and A. Vittoria
Volume 17
Antilipidemic Drugs - Medicinal, Chemical and Biochemical Aspects edited by D.T. Witiak, H.A.I. Newman and D.R. Feller
Volume 18
Trends in Receptor Research. Proceedings of the Eighth Camerino-Noordwijkerhout Symposium, Camerino (Italy), September 8-12, 1991 edited by P. Angeli, U. Gulini and W. Quaglia
Volume 19
Small Peptides. Chemistry, Biology and Clinical Studies edited by A.S. Dutta
Volume 20
Trends in Drug Research. Proceedings of the 9th Noordwijkerhout-Camerino Symposium, Noordwijkerhout (The Netherlands), 23-27 May, 1993 edited by V. Claassen
Volume 21
Medicinal Chemistry of the Renin-Angiotensin System edited by P.B.M.W.M. Timmermans and R.R. Wexler
Volume 22
The Chemistry and Pharmacology of Taxol® and its Derivatives edited by V. Farina
PHARMACOCHEMISTRY LIBRARY
Editor: H. Timmerman
Volume 23
QSAR AND DRUG DESIGN: NEW DEVELOPMENTS AND APPLICATIONS
Based on Topics presented at the Annual Japanese (Quantitative) Structure-Activity Relationship Symposium and the Biennial China-Japan Drug Design and Development Conference
EDITED BY:
TOSHIO FUJITA Department of Agricultural Chemistry, Kyoto University, Kyoto, and EMIL PROJECT, Fujitsu Kansai Systems Laboratory, Osaka, Japan
ELSEVIER
Amsterdam - Lausanne - New York - Oxford - Shannon - Tokyo
1995
ELSEVIER SCIENCE B.V., P.O. Box 1527, 1000 BM Amsterdam, The Netherlands
ISBN 0-444-88615-X
© 1995 Elsevier Science B.V. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands. Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the publisher. No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. This book is printed on acid-free paper. Printed in The Netherlands
Dedicated to
Professor Corwin Hansch
Without his heartfelt encouragement, the editing of this volume would never have been completed.
PHARMACOCHEMISTRY LIBRARY ADVISORY BOARD
T. Fujita, Department of Agricultural Chemistry, Kyoto University, Kyoto, Japan
E. Mutschler, Department of Pharmacology, University of Frankfurt, F.R.G.
N.J. de Souza, Research Centre, Hoechst India Ltd., Bombay, India
D.T. Witiak, College of Pharmacy, The Ohio State University, Columbus, OH, U.S.A.
F.J. Zeelen, Organon Research Centre, Oss, The Netherlands
PREFACE
In the Pharmacochemistry Library series, the preceding volume dealing with QSAR methodology and related topics is Vol. 16, QSAR: Rational Approaches to the Design of Bioactive Compounds, edited by Carlo Silipo and Antonio Vittoria, both of whom unfortunately passed away recently. Volume 16 was published as the Proceedings of the 8th European Symposium on Quantitative Structure-Activity Relationships, held in 1990 in Sorrento, Italy. Like the European Symposium, the Japanese Symposium on Structure-Activity Relationships has been organised annually since 1975. A bilateral symposium with Chinese scientists, the "China-Japan Drug Design and Development Conference", has been held biennially since 1989. Instead of taking the form of proceedings, this volume is an edited volume based on topics selected from those presented at these symposia. Each chapter is thus more complete than the original presentation, and some chapters combine a series of presentations on the same topic that were originally given separately. Structure-activity relationship (SAR) studies of bioactive compounds seem to have at least two objectives: one is to gain insight into pharmacological modes of action, and the other is to deduce guiding principles for designing analogues with better bioactivity profiles. The quantitative approach to SAR (QSAR), initiated by Corwin Hansch and his co-workers some 35 years ago, opened up new possibilities in the SAR discipline. Because the Hansch QSAR extended the Hammett-Taft paradigm of physical organic chemistry to biomedicinal (re)activity, modes of action have in many cases been illustrated at the (sub)molecular level. The approach also revealed the critical importance of the hydrophobicity of bioactive molecules. Before the advent of QSAR, discussions of the mode of action had remained mostly at the level of the "lock-and-key" hypothesis.
Because QSAR represents the relationships as mathematical correlation equations with physicochemical parameter terms (electronic, steric, hydrophobic and others when necessary), the bioactivity of analogues not yet tested has sometimes been predicted by extrapolating the significant parameters and then confirmed after synthesis and biological testing. This can be regarded as the beginning of quantitative drug design. Perhaps stimulated by the success of the traditional Hansch QSAR, and supported by the tremendous progress in computer technology in recent years, a number of newer software-based methodologies have been published in the SAR and drug design disciplines. Among them are methods based on theoretical physicochemical and/or molecular orbital calculations, those utilizing molecular modelling and graphics, and those managing sophisticated statistical operations and database-oriented procedures. Some theoretical calculation software deals not only with the stereo-electronic energy of ligands but also extends its scope to protein molecules. Thus, successful drug design starting from receptor protein structures no longer seems entirely impossible.
The topics in this volume cover almost every procedure and subdiscipline described above. They are divided into three sections. Section I includes topics illustrating newer methodologies relating to ligand-receptor interactions, molecular graphics and receptor modelling, as well as three-dimensional (Q)SAR examples using the active analogue approach and comparative molecular field analysis. Note that the last two chapters also use the traditional QSAR to cross-validate the results obtained with the newer procedures. In Section II, the hydrophobicity parameters, log P (1-octanol/water), of compound series of medicinal-chemical interest are analysed from a physical organic chemical viewpoint. New procedures for lead generation using databases of amino acid sequences and of structural evolution patterns, as well as a newer statistical QSAR modification applicable when bioactivity potency is expressed by ratings, are also placed in this Section. Section III contains examples based on the traditional Hansch QSAR approach. Two contributions from China illustrate how to identify lead structures in folk medicine and how to optimize them for clinical applications. The others in this Section are instructive examples of the Hansch approach applied to various series of bioactive compounds to rationalize potency variations, to design actual clinical candidates and to reveal the (sub)molecular mechanism of action. Since a variety of methodologies and procedures are presented in this single volume, readers are encouraged to regard each methodology as complementary to the others. It must be confessed that editing this volume required a much longer period than I had originally expected. Apologies are due to those authors whose chapters may have become out of date, because progress in this field is very fast.
If anything mitigates this, it is the fact that most of the chapters dealing with rapidly growing topics describe their methodological philosophy in some detail. With an understanding of this underlying way of thinking, readers should be able to follow further developments without difficulty. Last but not least, the editor expresses his sincere thanks to Mrs. A. Elizabeth Ichihara for critical correction of the English in most of the original manuscripts.
August 1, 1995
Toshio Fujita, at Fujitsu Kansai Systems Laboratory
LIST OF CONTRIBUTORS
Dr. G. Appendino, Dipartimento di Scienza e Tecnologia del Farmaco, via P. Giuria 9, 10125 Torino, ITALY
Dr. S.H. Chen, Bristol-Myers Squibb Pharmaceutical Research Institute, P.O. Box 5100, Wallingford, CT 06492-7660, U.S.A.
Dr. L. Landino, Chemistry Department, University of Virginia, Charlottesville, VA 22901, U.S.A.
Dr. T. MacDonald, Chemistry Department, University of Virginia, Charlottesville, VA 22901, U.S.A.
Dr. T. Cresteil, INSERM U75, Université René Descartes, 75730 Paris Cedex 15, FRANCE
Dr. B. Monsarrat, Laboratoire de Pharmacologie et Toxicologie Fondamentales, CNRS, 205 Route de Narbonne, 31400 Toulouse, FRANCE
Dr. R.C. Donehower, Division of Pharmacology and Experimental Therapeutics, Johns Hopkins Oncology Center, Baltimore, MD 21287, U.S.A.
Dr. E.K. Rowinsky, Division of Pharmacology and Experimental Therapeutics, Johns Hopkins Oncology Center, Baltimore, MD 21287, U.S.A.
Dr. V. Farina, Department of Medicinal Chemistry, Boehringer Ingelheim Pharmaceuticals, 900 Ridgebury Road, Ridgefield, CT 06877, U.S.A.
Dr. I. Royer, Laboratoire de Pharmacologie et Toxicologie Fondamentales, CNRS, 205 Route de Narbonne, 31400 Toulouse, FRANCE
Dr. D. Guénard, Institut de Chimie des Substances Naturelles, CNRS, 91190 Gif-sur-Yvette, FRANCE
Dr. J. Kant, Bristol-Myers Squibb Pharmaceutical Research Institute, P.O. Box 5100, Wallingford, CT 06492-7660, U.S.A.
Dr. D.M. Vyas, Bristol-Myers Squibb Pharmaceutical Research Institute, 5 Research Parkway, Wallingford, CT 06492-7660, U.S.A.
Dr. M. Wright, Laboratoire de Pharmacologie et Toxicologie Fondamentales, CNRS, 205 Route de Narbonne, 31400 Toulouse, FRANCE
CONTENTS
T. Fujita: Preface

SECTION I: Three-Dimensional Structure-Based Drug Design, Molecular Modelling and Three-Dimensional QSAR
1. A. Itai, N. Tomioka, Y. Kato: Rational Approaches to Computer Drug Design Based on Drug-Receptor Interactions
2. K. Akahane, H. Umeyama: Drug Design Based on Receptor Modeling Using a System "BIOCES(E)"
3. T. Matsuzaki, H. Umeyama, R. Kikumoto: Mechanisms of the Selective Inhibition of Thrombin, Factor Xa, Plasmin and Trypsin
4. H. Koga, M. Ohta: Three-Dimensional Structure-Activity Relationships and Receptor Mapping of Quinolone Antibacterials
5. M. Yamakawa, K. Ezumi, K. Takeda, T. Suzuki, I. Horibe, G. Kato, T. Fujita: Classical and Three-Dimensional Quantitative Structure-Activity Analyses of Steroid Hormones: Structure-Receptor Binding Patterns of Anti-hormonal Drug Candidates

SECTION II: Quantitative Structure-Parameter Analyses and Database-Oriented and Newer Statistical (Q)SAR Procedures and Drug Design
6. C. Yamagami, N. Takao, T. Fujita: Analysis and Prediction of 1-Octanol/Water Partition Coefficients of Substituted Diazines with Substituent and Structural Parameters
7. M. Akamatsu, T. Fujita: Hydrophobicities of Di- to Pentapeptides Having Unionizable Side Chains and Correlation with Substituent and Structural Parameters
8. T. Nishioka, J. Oda: Analysis of Amino Acid Sequence-Function Relationships in Proteins
9. T. Fujita, M. Adachi, M. Akamatsu, M. Asao, H. Fukami, Y. Inoue, I. Iwataki, M. Kido, H. Koga, T. Kobayashi, I. Kumita, K. Makino, K. Oda, A. Ogino, M. Ohta, F. Sakamoto, T. Sekiya, R. Shimizu, C. Takayama, Y. Tada, I. Ueda, Y. Umeda, M. Yamakawa, Y. Yamaura, H. Yoshioka, M. Yoshida, M. Yoshimoto, K. Wakabayashi: Background and Features of EMIL, A System for Database-Aided Bioanalogous Structural Transformation of Bioactive Compounds
10. I. Moriguchi, S. Hirono: Fuzzy Adaptive Least Squares and its Use in Quantitative Structure-Activity Relationships

SECTION III: Traditional QSAR and Drug Design
11. Z.-r. Guo: Structure-Activity Relationships in Medicinal Chemistry: Development of Drug Candidates from Lead Compounds
12. R.-I. Li, S.-y. Wang: Chemical Modification and Structure-Activity Relationship Studies of Piperine and its Analogs: An Example of Drug Development from Folk Medicine
13. H. Terada, S. Goto, H. Hori, Z. Taira: Structural Requirements of Leukotriene Antagonists
14. K. Mitani: Quantitative Structure-Activity Relationships of a New Class of Ca2+-Antagonistic and α1-Blocking Phenoxyalkylamine Derivatives
15. H. Ohtaka: Applications of Quantitative Structure-Activity Relationships to Drug Design of Piperazine Derivatives
16. K. Hashimoto, H. Tanii, A. Harada, T. Fujita: Quantitative Structure-Activity Studies of Neurotoxic Acrylamide Analogs

Subject index
SECTION I: Three-Dimensional Structure-Based Drug Design, Molecular Modelling and Three-Dimensional QSAR.
QSAR and Drug Design - New Developments and Applications
T. Fujita, editor
© 1995 Elsevier Science B.V. All rights reserved
RATIONAL APPROACHES TO COMPUTER DRUG DESIGN BASED ON DRUG-RECEPTOR INTERACTIONS
Akiko Itai*, Nobuo Tomioka* and Yuichi Kato
Faculty of Pharmaceutical Sciences, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan

ABSTRACT
We have developed two novel methods and computer programs for rational drug design on the basis of drug-receptor interactions. The program GREEN performs docking studies efficiently and rationally when the receptor structure is known. Its main features are real-time estimation of the intermolecular interaction energy and informative visualization of the drug binding site. In addition, many functions help to find approximate stable positions and conformations of a drug molecule inside the receptor cavity. The other program, RECEPS, is for rational superposition of molecules and for receptor mapping when the receptor structure is not known. The superposition is performed through the use of spatial grid points and is monitored by several goodness-of-fit indices indicating similarities in physical and chemical properties. Based on the superposed structures, a three-dimensional receptor image can be constructed, which reveals cavity shapes, the expected locations and characters of hydrogen-bonding groups, electrostatic potentials of the surface, and other features.

1. INTRODUCTION
For the development of new drugs, a tremendous number of compounds must be synthesized and assayed for biological activity. As compounds have become easier to synthesize with the technical advances of organic synthesis, the efficient design of bioactive molecules has become increasingly important. Drug development usually starts with the selection of a lead compound, whose structure is then modified to obtain better biological response profiles. Starting from an appropriate lead compound is the key to success. How to find an appropriate lead compound and how to optimize the lead structure efficiently are the central problems of drug development. As yet, however, no general methods for solving these problems are available. Indeed, finding new lead compounds is so much more difficult than optimizing existing ones that lead compounds have never been generated artificially. It has long been desired to design active structures on the basis of logic and calculation rather than relying on chance or trial and error. Computers have been introduced into drug design for that purpose, and with the remarkable progress of computer technology over the past thirty years, they have become widely used in drug research for maintaining databases, statistical processing, molecular modeling, theoretical chemical calculations, and so on. Since computer-based analyses of the relationships between structures and activities began more than twenty years ago (1), various approaches have been reported by many researchers. Some of them, however, have fallen by the wayside as our understanding of drug-receptor interactions has deepened.
*Present address: Institute of Medicinal Molecular Design, 4-1-11 Hongo, Bunkyo-ku, Tokyo, Japan
Drug-Receptor Interactions
It is now well known that a drug molecule exerts its biological activity by binding specifically to a target macromolecule, or receptor, in the body. Dozens of receptor molecules for various hormones and neurotransmitters have been isolated and characterized, and their amino acid sequences have been determined. None of the three-dimensional structures of such receptors has yet been elucidated, whereas the structures of hundreds of other proteins have already been determined to atomic resolution by X-ray crystallographic analysis. Structures have also been solved for a number of complexes of proteins with ligand molecules. These results have provided us with details of molecular recognition by the macromolecule as well as the three-dimensional structure of the macromolecule itself. Such concrete molecular images have validated the lock-and-key model of drug-receptor interaction, which had been only vaguely understood for a long time. In most of these complexes, the ligand molecules are non-covalently bound to the proteins. The complexes are stabilized by intermolecular forces such as hydrogen bonds, electrostatic interactions, van der Waals forces, and hydrophobic interactions. The strength of binding, which is represented experimentally by equilibrium constants of binding or dissociation, can be estimated by empirical energy calculations. The sum of the intramolecular and intermolecular energy values is taken as an index for showing
the binding affinity, although molecular recognition actually results from the decrease in free energy upon complexation. Accordingly, the more energetically favorable the interaction of the ligand molecule with the receptor, the more strongly and specifically the ligand can bind to the target receptor. There are many examples in which agonist and antagonist molecules with quite different chemical structures bind strongly to the same site of the same receptor as the natural bioactive compounds, as is well evidenced by a number of crystallographic studies on protein-ligand and enzyme-inhibitor complexes. It is therefore not the skeletal structure itself but the three-dimensional array of submolecular physical and chemical properties of the ligand molecule that is recognized by proteins. As receptors consist mainly of proteins and their main functions seem to depend on the protein constituents, the molecular recognition between a receptor and a drug is presumably very similar to that between an enzyme and its substrate. The only difference is that reactions proceed in the case of enzymes, whereas signals are transduced between cells in the case of receptors. Many enzyme inhibitors are used as clinical drugs, either to maintain biological homeostasis by controlling biochemical reactions or to prevent pathogenic microorganisms from proliferating. In this article, we use the term "receptor" in a broad sense, including not only the pharmacological receptors for hormones and neurotransmitters but also enzymes, other globular proteins, and nucleic acids.
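The experimental equilibrium constants mentioned above connect to the free energy of complexation through the standard thermodynamic relation ΔG° = RT ln Kd (= -RT ln Ka), with Kd expressed relative to a 1 M standard state. The sketch below is a minimal illustration of that conversion only. The 298.15 K default temperature and the example Kd values are assumptions chosen for illustration and are not taken from the text. It is not the empirical energy index used in this chapter.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)
J_PER_KCAL = 4184.0  # thermochemical calorie


def binding_free_energy_kcal(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT ln(Kd / 1 M); negative values indicate favorable binding.
    """
    return R * temp_k * math.log(kd_molar) / J_PER_KCAL


def kcal_per_decade(temp_k=298.15):
    """Change in dG (kcal/mol) produced by a ten-fold change in Kd."""
    return R * temp_k * math.log(10.0) / J_PER_KCAL


if __name__ == "__main__":
    for kd in (1e-6, 1e-9):  # illustrative micromolar and nanomolar binders
        print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy_kcal(kd):.2f} kcal/mol")
    print(f"ten-fold change in Kd = {kcal_per_decade():.2f} kcal/mol")
```

The logarithmic relation is the reason why modest differences in interaction energy, on the order of 1.4 kcal/mol at room temperature, already correspond to a ten-fold difference in affinity.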
Methods for Analysis of Structure-Activity Relationships
Various approaches have been proposed for analyzing structure-activity relationships using computers. In some of them, the chemical structural formula is split up into component units. The individual substructural components are regarded as being significant to various extents for the biological activity, and the structure-activity relationships are analyzed assuming that the activity is controlled by combinations of activity indices assigned to the individual structural units contained in each structural formula. The activities of a series of compounds are expressed as functions of these indices by linear or non-linear combination methods. These approaches seem to be
useful only for the analyses themselves, and not effective for understanding molecular recognition by biological macromolecules. Some of the substructures may indeed play important roles in interaction with the receptor, but they can often be replaced by other groups with similar physical and chemical properties. As stated before, it is not the mere existence of particular structural units but the spatial alignment of their physical and chemical properties that is important. It therefore seems quite difficult to reassemble the separated pieces of structural formulas into new molecules in the hope that they will have the same biological activity as the original molecule. Among approaches based on the physicochemical properties of molecules, Hansch and Fujita's method (2) is excellent. They developed a method whereby the relationships between structures and activities can be analyzed quantitatively. In this method, biological activities are correlated with various physicochemical properties of substituent groups at specified positions in a series of derivatives with the same skeletal structure. By regression analysis, the activities of dozens of compounds can be represented by an equation consisting of a linear combination of several physicochemical variables. Usually, the physicochemical properties of substituent groups, such as inductive, resonance, hydrophobic, and other effects, and those of whole molecules, such as the partition coefficient and molar refractivity, are chosen as variables (3), since they make significant contributions to the activity. From the coefficient of each variable term in the equation, we can determine quantitatively the extent to which each property contributes to the activity. This method is a powerful tool for indicating quantitatively the direction of subsequent structural modifications to improve the biological activity.
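The regression step described above can be sketched with ordinary least squares. The equation form used here is the parabolic Hansch type, log(1/C) = a·π² + b·π + ρ·σ + c. Choosing that form is an assumption made for this illustration. The π and σp values are approximate tabulated constants for common para substituents, included only for illustration. The "activities" are synthetic values generated from assumed coefficients, not measured data. The sketch simply checks that least squares recovers those coefficients.

```python
import numpy as np

# Approximate tabulated substituent constants for para substituents:
# (Hansch hydrophobic constant pi, Hammett sigma_para). Illustrative only.
SUBSTITUENTS = {
    "H": (0.00, 0.00),
    "CH3": (0.56, -0.17),
    "Cl": (0.71, 0.23),
    "Br": (0.86, 0.23),
    "NO2": (-0.28, 0.78),
    "OCH3": (-0.02, -0.27),
    "F": (0.14, 0.06),
}


def design_matrix(names):
    """One row [pi^2, pi, sigma, 1] per compound (parabolic Hansch form)."""
    return np.array([[pi * pi, pi, sigma, 1.0]
                     for pi, sigma in (SUBSTITUENTS[n] for n in names)])


def fit_hansch(names, log_inv_c):
    """Least-squares fit of log(1/C) = a*pi^2 + b*pi + rho*sigma + c."""
    y = np.asarray(log_inv_c, dtype=float)
    coef, *_ = np.linalg.lstsq(design_matrix(names), y, rcond=None)
    return coef  # [a, b, rho, c]


def predict(coef, names):
    """Calculated log(1/C) for the named substituents."""
    return design_matrix(names) @ coef


def optimal_pi(coef):
    """Vertex of the parabola in pi; meaningful only when a < 0."""
    a, b = coef[0], coef[1]
    return -b / (2.0 * a)


if __name__ == "__main__":
    names = list(SUBSTITUENTS)
    # Synthetic activities from assumed coefficients, NOT experimental data.
    assumed = np.array([-0.5, 0.9, 1.1, 3.0])
    coef = fit_hansch(names, predict(assumed, names))
    print("fitted [a, b, rho, c]:", np.round(coef, 3))
    print("optimum pi:", round(optimal_pi(coef), 3))
```

In a real analysis, the activities would carry experimental error. The fit would then be judged by statistics such as the correlation coefficient, the standard deviation and the F value, and the signs and magnitudes of the coefficients would be interpreted as described in the paragraph above.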
Although the physical meaning of the variables is not always clear, the equation covers a number of interactions between drugs and biological systems. The method has proved useful for rational lead optimization and is used worldwide. However, different methods are needed for interpreting structure-activity relationships among molecules with different skeletal structures and for designing new molecules with different skeletons. For these purposes, efficient methods based on three-dimensional structures and on new concepts seem to be essential.
Three-Dimensional Structures of Molecules
The three-dimensional structure is the most realistic description of an existing molecule. The chemical structural formula, though an excellent graphic means of describing chemical bonding, cannot be directly related to the biological activities and functions of a molecule. All the features of a molecule, such as physical properties, chemical reactivity, dynamic behavior and molecular interactions, should be interpretable in terms of its three-dimensional structure. With the remarkable advances in techniques for solving crystal structures, it has become increasingly easy to obtain the three-dimensional structures of molecules. In the last three decades, techniques and equipment for measuring diffraction from crystals, as well as algorithms for solving the phase problem and refining structures, have made remarkable progress. For small molecules, structure analyses can now be performed routinely. Even for macromolecules, methods for structure analysis have been established (4) and structure elucidation has become progressively easier, although crystallization still remains a difficult problem. The analyses can now be applied to larger, more unstable, and more complicated molecules, and can be done with smaller amounts of sample, with less labor, and in a shorter time than before. The results of these crystallographic analyses have been deposited in generally available databases. The atomic coordinates and accompanying crystallographic data of small molecules are available in the Cambridge Crystallographic Database (5), and those of macromolecules in the Protein Data Bank (6) (Brookhaven National Laboratory). These databases have deepened our understanding of the three-dimensional structures of molecules and of molecular interactions.
In particular, the crystal structures of protein-ligand and DNA-ligand complexes have clarified the details of molecular recognition by macromolecules, both in general and in individual cases.
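Atomic coordinates from the Protein Data Bank are distributed in a fixed-column text format. The orthogonal x, y and z coordinates in ångströms occupy columns 31-38, 39-46 and 47-54 of each ATOM or HETATM record. A minimal reader for those records might look like the sketch below. The `Atom` class and `parse_pdb_atoms` function are hypothetical names introduced for illustration. The sample record is made up. Alternate locations, insertion codes and the newer mmCIF format are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str    # "ATOM" (standard residue) or "HETATM" (ligand, water, ...)
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "ALA"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float       # orthogonal coordinates in angstroms
    y: float
    z: float


def parse_pdb_atoms(lines):
    """Extract ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()           # columns 1-6: record name
        if record not in ("ATOM", "HETATM"):
            continue                          # skip HEADER, REMARK, etc.
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),         # columns 13-16
            res_name=line[17:20].strip(),     # columns 18-20
            chain=line[21],                   # column 22
            res_seq=int(line[22:26]),         # columns 23-26
            x=float(line[30:38]),             # columns 31-38
            y=float(line[38:46]),             # columns 39-46
            z=float(line[46:54]),             # columns 47-54
        ))
    return atoms


if __name__ == "__main__":
    sample = [
        "HEADER    EXAMPLE",
        "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    ]
    for atom in parse_pdb_atoms(sample):
        print(atom)
```

Separating ATOM from HETATM records is often the first step when a bound ligand has to be isolated from its protein for the kinds of analyses discussed below.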
Three-Dimensional Computer Graphics
Three-dimensional structures and interactions of protein-ligand and DNA-ligand complexes can be better understood by using three-dimensional computer graphics devices (hereafter abbreviated as "3D-CG"), which can store images of three-dimensional objects in the
memory and apply three-dimensional transformations to the image, such as rotation, translation and scaling, in real time (7). In the past decade, 3D-CG has become an essential tool for computer molecular modeling. Three-dimensional structures from crystallographic databases or private data files can be displayed directly on 3D-CG, and the molecules can be manipulated interactively (rotation, translation, and bond rotation) with input devices such as dials, a joystick, keys, and a mouse connected to the display. After a molecule has been manipulated or modeled, its new atomic coordinates can immediately be stored in files and readily used for computation, and the picture can be reproduced at any time. In addition to various representations of molecular structures, such as wire-frame, ball-and-stick and space-filling models, physical and chemical properties and virtual features of molecules, such as electrostatic potentials, molecular orbitals, and expected sites of hydrogen-bonding partners, can be displayed on 3D-CG and compared visually with those of other molecules. Recently, high-performance 3D-CG workstations have become available in place of the combination of 3D-CG terminals with a host computer. Dozens of well-developed software packages for computer-assisted molecular design based on 3D-CG are commercially available and now widely used (8). Their main functions are molecular modeling and theoretical calculations. Various procedures for constructing three-dimensional structures are provided with these packages and are usually performed interactively on graphic displays. Crystallographic databases or private structure files are referenced if necessary, and the structures are subjected to further modification, such as addition or deletion of substituent groups, replacement of atomic elements, and conformational changes. Theoretical calculations are then applied to refine the geometries and to obtain stable conformations.
A serious problem, however, is that non-rigid molecules can adopt a large number of possible three-dimensional structures.
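The interactive "bond rotation" mentioned above, and the reason why non-rigid molecules have so many possible structures, both come down to changing torsion angles. The minimal NumPy sketch below rotates the atoms on one side of a bond about the bond axis using Rodrigues' rotation formula and measures the resulting dihedral angle. The function names, atom indices and coordinates are hypothetical and chosen for illustration. Real modeling software also has to decide which atoms move, usually everything on one side of the bond in the molecular graph. Here that set is simply passed in.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees (range -180 to 180)."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    v = b0 - np.dot(b0, b1) * b1  # part of b0 perpendicular to the bond
    w = b2 - np.dot(b2, b1) * b1  # part of b2 perpendicular to the bond
    return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))


def rotate_about_bond(coords, j, k, moving, angle_deg):
    """Rotate atoms `moving` about the bond axis j->k by angle_deg.

    Uses Rodrigues' rotation formula, so bond lengths and bond angles of
    the moved fragment are preserved; only the torsion about j-k changes.
    """
    coords = np.asarray(coords, dtype=float)
    axis = coords[k] - coords[j]
    axis /= np.linalg.norm(axis)
    theta = np.radians(angle_deg)
    idx = np.asarray(moving)
    v = coords[idx] - coords[k]  # vectors from the pivot atom k
    v_rot = (v * np.cos(theta)
             + np.cross(axis, v) * np.sin(theta)
             + np.outer(v @ axis, axis) * (1.0 - np.cos(theta)))
    new = coords.copy()
    new[idx] = coords[k] + v_rot
    return new


if __name__ == "__main__":
    # Hypothetical four-atom chain with the central bond along z.
    coords = np.array([[1.0, 0.0, -0.5], [0.0, 0.0, 0.0],
                       [0.0, 0.0, 1.5], [1.0, 0.0, 2.0]])
    for angle in (0, 60, 120, 180):
        new = rotate_about_bond(coords, 1, 2, [3], angle)
        print(angle, round(float(dihedral(*new)), 1))
```

With n rotatable bonds each sampled at m angles, this operation generates m**n candidate conformations. That combinatorial growth is the practical side of the problem stated above.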
Theoretical Calculations
The progress of theoretical calculations in chemistry, such as molecular mechanics (9), molecular orbital (10,11), and molecular dynamics (12) calculations, has been remarkable. The methods are used
for estimating energetic stabilities, electronic properties, and molecular interactions. A characteristic of computational methods is that they are applicable not only to actually existing molecules but also to imaginary structures. They are useful not only for interpreting various chemical phenomena but also for predicting them without experiments. Molecular mechanics and molecular orbital calculations can give the minimum-energy structure together with its energy value, although, because of the limitations of energy minimization algorithms, this may be only a local minimum near the starting structure rather than the global minimum. These methods are very useful for refining structures in molecular modeling. Molecular dynamics calculations simulate molecular motions by treating each atom as a particle and applying Newton's equation of motion with forces derived from a potential energy function (the force field). By solving the equation for each short time step over a certain period, a trajectory is obtained as a series of atomic positions and velocities in the system. The dynamic behavior of molecules can thus be followed along the time course, together with energy values and other structural features. Unlike molecular mechanics calculations, molecular dynamics calculations can overcome the energy barriers between local minima. They are still limited in crossing high energy barriers, however, and a global minimum search is not easy even with this technique. Nevertheless, the method has come to be used for finding the stable structures of highly flexible molecules, including solvated states, and for estimating free energy differences between two similar states.
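The contrast drawn above between energy minimization and molecular dynamics can be seen even in one dimension. This sketch uses an assumed toy "tilted double well" potential, not a molecular force field. It has a shallower minimum near x = +0.96 and a deeper one near x = -1.04. Steepest descent, used here as a stand-in for a minimizer, only moves downhill, so starting at x = 0.9 it stops in the nearby shallower minimum. A velocity Verlet integration of Newton's equation from the same point, given an assumed initial kinetic energy above the barrier, crosses into the deeper well. All parameter values are illustrative assumptions.

```python
import numpy as np


def potential(x):
    """Toy tilted double well (an assumption, not a real force field)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def force(x):
    """F = -dV/dx for the toy potential."""
    return -(4.0 * x * (x * x - 1.0) + 0.3)


def minimize(x, step=0.01, n_steps=1000):
    """Steepest descent: moves only downhill, so it stops in the nearest minimum."""
    for _ in range(n_steps):
        x = x + step * force(x)
    return x


def velocity_verlet(x, v, mass=1.0, dt=0.01, n_steps=2000):
    """Integrate m * d2x/dt2 = F(x); return the trajectory and final velocity."""
    traj = [x]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        traj.append(x)
    return np.array(traj), v


if __name__ == "__main__":
    x_start = 0.9
    print("minimized from", x_start, "->", round(minimize(x_start), 3))
    traj, _ = velocity_verlet(x_start, v=-1.5)
    print("MD trajectory reaches x =", round(float(traj.min()), 3))
```

If the initial kinetic energy were smaller than the barrier height, the dynamics would also stay in the starting well. This is the one-dimensional analogue of the limitation with high barriers noted above.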
Active Conformation of Drugs
The calculations described above have become indispensable tools not only in structural organic chemistry but also in the analysis of structure-activity relationships in computer-aided drug design. They are, of course, useful for interpreting chemical reactivity. For drug design or the analysis of structure-activity relationships, however, attention has to be paid to the fact that, in general, chemical reactions start from the most stable three-dimensional structures of the molecules involved, whereas biological activities arise from the stable interaction of drug molecules with receptor macromolecules. For drug activities, we must consider the stability of the drug-receptor complex in place of the stability of the drug itself. Therefore, when the three-dimensional structures of receptor macromolecules are not known, we cannot estimate the stability and the stable structure of the drug-receptor complex computationally. Even if the receptor structure is known, it is not easy to find the stable binding mode of the two molecules, because of the vast number of possibilities arising from the six degrees of freedom of rotation and translation. A "carpet bombing" search for the global energy minimum by varying all degrees of freedom is not realistic in such a multidimensional system. A blind molecular mechanics or molecular dynamics calculation does not yield stably docked structures, owing to the energy barriers. Therefore, before starting the calculation, we must prepare appropriate starting structures in order to avoid being trapped in unexpected local minima. The global energy minimum structure is often assumed to be the most stable structure among them, although this assumption is not necessarily correct. In the case of flexible molecules with a number of rotatable single bonds, it is especially difficult to find the most stable structure of the complex because of the additional degrees of freedom for bond rotation. The conformation that a drug molecule or a natural substrate molecule adopts on its receptor is called the "active conformation". The active conformation of a bioactive molecule is not necessarily the most stable conformation of the molecule itself. The active conformation can be determined most straightforwardly by X-ray crystallography of a crystal of the drug-receptor complex. The active conformations of other drug molecules known to interact with the same receptor can then be estimated based on the structure of the drug binding site. The main problems in docking calculations are those mentioned above.
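To give a feeling for why an exhaustive search is unrealistic, the following sketch counts the poses of a naive systematic search over the six rigid-body degrees of freedom plus one dihedral per rotatable bond. The box size, step sizes and the cost per energy evaluation are hypothetical values chosen only for illustration, and the three rotation angles are sampled crudely over 360 degrees each.

```python
"""Rough size of an exhaustive ("carpet bombing") docking search."""


def systematic_pose_count(n_torsions: int,
                          box_edge: float = 10.0,    # Å, assumed search box edge
                          trans_step: float = 1.0,   # Å, assumed translation step
                          angle_step: float = 30.0,  # degrees, assumed angular step
                          ) -> int:
    # Three translations sampled on a cubic grid inside the box.
    n_trans = int(box_edge / trans_step) + 1
    # Three rotation angles and every torsion sampled over 360 degrees
    # (a crude parameterization that over-counts some orientations).
    n_angle = int(round(360.0 / angle_step))
    return n_trans ** 3 * n_angle ** 3 * n_angle ** n_torsions


if __name__ == "__main__":
    seconds_per_pose = 1e-3  # assumed cost of one energy evaluation
    for n in (0, 3, 6, 9):
        poses = systematic_pose_count(n)
        days = poses * seconds_per_pose / 86400
        print(f"{n} rotatable bonds: {poses:.2e} poses, ~{days:.1e} days")
```

Even a rigid ligand gives about 2.3 million poses at this coarse resolution, and each additional rotatable bond multiplies the count by 12.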
Knowledge of active conformations is quite useful for evaluating structure-activity relationships and designing new structures, especially when the receptor structure is not known. However, it is very difficult to determine the active conformation of a highly flexible molecule without knowledge of the receptor structure, and theoretical calculations alone are of limited use for this purpose.
2. STRATEGIES OF OUR APPROACHES
Background
Because the background is extremely complicated and full of unelucidated factors in spite of recent advances in molecular biology, establishing novel strategies for drug design seems to be most challenging. First of all, it is important to explore a rational way of drug design in general, rather than in individual cases. To develop new concepts and new methodologies, effective and efficient utilization of computers seems to be an essential prerequisite, rather than classic procedures that simply mimic the processes or way of thinking of the synthetic chemists who previously carried out drug development. As it is receptors that hold the keys to biological activities, the most logical approach in drug design is to make use of receptor structures. Even if the receptor structure is unknown, provided that two or more active molecules are known, approaches based on an assumed common receptor are more rational than those based on simple similarities of their structures. We have been developing several program systems based on the receptor, as we describe later.
Fundamental Concepts
The key assumptions underlying our concepts are as follows.
1) It is not the chemical structures or atomic positions that are recognized by macromolecules in biological systems. Recognition of a ligand molecule involves the overall intermolecular forces. It is the spatial arrangement of submolecular physical and chemical properties that is important for proper interaction between two molecules. These properties, along with the contact surfaces, should be complementary between the two molecules. Among the various intermolecular forces, the hydrogen bond is very important for discrimination between molecules. Hydrogen bonding works within a limited distance and direction, whereas the electrostatic interaction works in all directions and over a long distance. In many crystal structures of protein-ligand complexes, ligand molecules have been found to be fixed firmly to the proteins through a number of hydrogen bonds, as shown by the example in Fig. 1.
Fig. 1 Hydrogen bonds (dotted lines) between L. casei dihydrofolate reductase and the potent inhibitor methotrexate (filled bonds) in the crystal structure. (Drawn with the atomic coordinates from the Protein Data Bank entry 3DFR (13).)
2) Molecules with quite different chemical structures can bind to the same site of a receptor.
Many examples are known of competitive inhibition between molecules belonging to different structural classes, as found by receptor binding assays with a radiolabeled ligand. Such pairs of molecules, for example those shown in Fig. 2, might share a common three-dimensional shape and common physical and chemical properties such as hydrogen bonding, electrostatic, and hydrophobic interactions. The shape and the properties of these molecules must be complementary to those of the receptor. Furthermore, it is not the mere existence of the individual properties but their spatial arrangement on the molecule that is important for specific binding to the receptor site. Flexible molecules must be able to adopt stable conformations that satisfy these requirements.
Fig. 2 Structure-pairs of natural and synthetic ligands (14,15,16) that bind to the same receptor sites. Binding to the same receptor site has been demonstrated by receptor binding assay. [Structures shown: Natural and Synthetic Estrogens: estradiol and diethylstilbestrol (14); Natural and Synthetic Retinoids: retinoic acid and AM80 (15); Substrate and Inhibitor of Cyclooxygenase: arachidonic acid and indomethacin (16).]
3) The whole structure of the drug molecule is not necessarily required for receptor binding. Inspection of the crystal structures of enzyme-inhibitor complexes elucidated by X-ray crystallography indicates that not all the atoms of an inhibitor molecule are necessarily involved in its interaction with the protein, as can be seen, for example, in Fig. 3.
Fig. 3 Three-dimensional structure of L. casei dihydrofolate reductase (thin lines) and the bound inhibitor methotrexate (thick lines) in the crystal. Some atoms of methotrexate at the opening of the binding site may contact molecules outside the protein. (Drawn with the atomic coordinates from the Protein Data Bank entry 3DFR (13).)
Because ligand molecules that fill the cavity of the ligand binding site are usually not totally buried in the protein, an opening cleft exists as an entrance into or an exit from the cavity. Even when most of the atoms of a ligand directly contact protein atoms, the back surface of the ligand might be exposed to the outside. The structure of the exposed portion may be nonspecific, although the functional groups on that portion, together with those in the buried portion, would contribute to dissolution, partition, transport, and permeability through membranes. On the other hand, the buried portion of a ligand strongly bound to the receptor should have a specific structure corresponding to the target receptor. Therefore, if we can distinguish between the two portions, structural modification for lead optimization should be applied to the exposed portion. The apparent molecular shapes of drugs known to bind to the same receptor site often seem dissimilar because of the nonspecific portion. Consequently, conventional shape analysis methods that use the whole three-dimensional structure of drug molecules would be of little significance. Likewise, comparing surface electrostatic potentials between molecules with the same biological activity seems to have little significance, unless the comparison is limited to the buried surface that is directly involved in receptor binding.
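If a complex structure is available, one simple way to separate the buried and exposed portions of a ligand is a distance criterion. The sketch below assumes that a ligand atom counts as buried when any protein atom lies within 4.0 Å; both the cutoff and the function name are hypothetical, the coordinates are made up, and a solvent-accessible-surface calculation would be a more rigorous test.

```python
import math


def classify_ligand_atoms(ligand, protein, cutoff=4.0):
    """Label each ligand atom 'buried' if any protein atom lies within
    `cutoff` Å of it, otherwise 'exposed' (assumed distance criterion)."""
    labels = []
    for atom in ligand:
        in_contact = any(math.dist(atom, p) <= cutoff for p in protein)
        labels.append("buried" if in_contact else "exposed")
    return labels


# Hypothetical coordinates: a ligand pointing out of a small pocket.
protein = [(0.0, 0.0, -3.5), (3.5, 0.0, 0.0), (-3.5, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5), (0.0, 0.0, 6.0)]
print(classify_ligand_atoms(ligand, protein))  # ['buried', 'buried', 'exposed']
```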
Structure-Activity Relationships and Designing New Structures
Establishing a correct model of structure-activity relationships is the starting point for designing new structures. For optimization within a definite skeletal structure, quantitative structure-activity relationships based on the two-dimensional structures of molecules (2) are useful for indicating an appropriate course of structural modification of substituents. For molecules with different skeletal structures, however, methods based on the three-dimensional structures of molecules are essential. Several methods have been proposed so far, although at present they are not powerful enough to guarantee success in rational drug design. When the receptor structure is known, examining the relationships between three-dimensional structure and activity seems to be rather easy (8), and new molecules can be designed by structural modification without difficulty. Even in these cases, however, new molecules with different skeletal structures cannot easily be designed. When the receptor structure is not known, both the examination of structure-activity relationships and the design of new molecules become much more difficult. The constructed model of structure-activity relationships is necessarily less certain and less reliable because of insufficient information. Each drug molecule may not be wholly complementary to the receptor cavity, reflecting only part of the chemical and physical properties of the drug binding site. Using information from multiple molecules with different skeletal structures can give a better image of the receptor cavity. The deduced receptor cavity, or the structural requirements for binding to the receptor, would give a useful hypothetical basis for structure-activity relationships and contribute to the design of new structures, although each must be refined or modified repeatedly through synthetic trials. In any case, the design of new structures with different skeletons, so-called "lead generation", is so difficult that it can rarely be attained at present, either by human work or by computer. To make lead generation possible, it is necessary to develop special methodologies in which the human brain and the computer each give full play to their particular abilities.
Common Features of the GREEN and RECEPS Programs
Based on the principles of drug-receptor interaction described above, we have developed new methods and computer programs for drug design. Among several systems developed for various purposes, we describe here two program systems for evaluating structure-activity relationships using the three-dimensional structures of molecules. One is the program system GREEN, for efficient docking studies when the receptor structures are known (17,18); the other is the program system RECEPS, for rational superposition of molecules and receptor mapping when the receptor structures are not known (19). The GREEN program is based on the three-dimensional structures of receptor proteins. It enables real-time estimation of the intermolecular interaction energy between protein and ligand molecules throughout the docking process, describing the physical and chemical environment of the ligand binding site of the protein. It should be helpful in finding the stable relative geometry of protein and ligand molecules when explaining the mechanisms of biochemical reactions and the structure-activity relationships of drugs. Without information on receptor structures, the RECEPS program is based on the three-dimensional structures of multiple molecules that are supposed to bind specifically to the same receptor. In RECEPS, molecules are superposed in terms of submolecular physical and chemical properties, not in terms of atomic positions or partial chemical structures as has conventionally been done. A three-dimensional receptor model can be constructed according to the superposed structures. The model provides the size and shape of the binding-site cavity, hydrogen bonding sites, the electrostatic character of the surface, and other structural indices.
The common features of these two programs are that they
(1) are based on the specific interactions between drugs and a target receptor;
(2) make use of a three-dimensional grid to describe the physical and chemical properties spatially;
(3) utilize 3D computer graphics interactively, as an interface between the human brain and the computer;
(4) yield numerical indices indicating the validity of docking or superposition in real time; and
(5) are useful not only for interpreting structure-activity relationships, but also for designing new structures.
3. APPROACHES BASED ON RECEPTOR STRUCTURE
Docking Studies
Techniques for the isolation and identification of proteins have made remarkable progress in recent years, and a number of protein structures have been or are being elucidated at the atomic level. Some of these proteins are bound with small molecules such as inhibitors and cofactors in the crystal. Based on the three-dimensional structure of the protein in such protein-ligand complexes, we can simulate stable interaction modes of ligand molecules with the protein with the aid of computers (20). We can estimate the stability of a ligand molecule with an arbitrary conformation at an arbitrary relative position, search for the minimum energy binding mode, and determine its stability. Such approaches have often been called "docking studies" (21). Docking studies are used not only for investigating natural biochemical processes but also for examining the mode and stability of drug binding to the target receptor in drug design. The interaction and/or reaction of natural substrates may be difficult to study by crystallographic or other experimental methods because enzymatic reactions proceed rapidly. Substrate specificity, site-specific or stereospecific reactivity, and the stability of possible intermediates can be evaluated by docking simulation. Furthermore, because the binding affinity and the binding mode can be predicted for molecules that have not yet been synthesized, such simulation is useful for designing molecules with enhanced affinity for a target receptor and for selecting candidate molecules for synthesis. A ligand molecule that can bind strongly to the target receptor should have energetically favorable interactions with the receptor in an appropriate relative geometry. In docking simulation, finding such a geometry between ligand and target molecules is too difficult to accomplish by computational methods alone. Besides conformational freedom, the six degrees of freedom for rotation and translation of the ligand may give rise to innumerable local minima, from which the global minimum cannot easily be discriminated. Therefore, for the time being, likely stable geometries usually have to be selected by visual judgment on the 3D-CG display before starting the computation. To find a likely stable geometry and conformation, the ligand molecule is subjected to a series of interactive three-dimensional manipulations (rotation, translation, and bond rotation) inside the ligand binding site of the protein on the 3D-CG display.
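The three kinds of manipulation named above (rotation, translation, and bond rotation) are simple coordinate transformations. The following Python sketch uses Rodrigues' rotation formula; the function names and the plain-tuple coordinate representation are illustrative assumptions, not the interface of GREEN or of any graphics system.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(p, origin, axis, angle_deg):
    """Rodrigues' rotation of point p by angle_deg about the line
    through `origin` with direction `axis`."""
    norm = math.sqrt(_dot(axis, axis))
    k = tuple(c / norm for c in axis)
    v = _sub(p, origin)
    th = math.radians(angle_deg)
    c, s = math.cos(th), math.sin(th)
    kxv, kdv = _cross(k, v), _dot(k, v)
    rotated = tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c)
                    for i in range(3))
    return _add(rotated, origin)


def translate(coords, shift):
    """Rigid translation of every atom by the vector `shift`."""
    return [_add(p, shift) for p in coords]


def rotate_rigid(coords, axis, angle_deg):
    """Rigid rotation of the whole ligand about an axis through its centroid."""
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    return [rotate_about_axis(p, centroid, axis, angle_deg) for p in coords]


def rotate_torsion(coords, j, k, moving, angle_deg):
    """Bond rotation: turn the atoms whose indices are in `moving` about the
    bond from atom j to atom k; all other atoms stay where they are."""
    axis = _sub(coords[k], coords[j])
    return [rotate_about_axis(p, coords[j], axis, angle_deg) if idx in moving else p
            for idx, p in enumerate(coords)]


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]  # hypothetical
    print(rotate_torsion(ligand, 0, 1, {2}, 180.0))
```

Rigid rotations and translations preserve every interatomic distance, while a bond rotation changes only distances between atoms on opposite sides of the rotated bond.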
During the last ten years, many docking simulation studies for various purposes have been published, based on the known structures of proteins or nucleic acids.
Approaches by Other Research Groups
In 1981, Connolly developed an algorithm for rapidly calculating the positions of a group of dots representing a molecular surface (22), based on the definitions made by Richards (23). Electrostatic properties can be represented by dots color-coded according to the electrostatic potentials calculated at the molecular surface from all the atomic charges in the molecule. Using these techniques, Weiner et al. showed that there is good complementarity in shape as well as in electrostatic properties between the partners in several protein-ligand complexes whose structures had been elucidated by X-ray crystal analysis (24). The representation is not only beautiful but also useful for understanding molecular recognition. Without numerical indices evaluating the goodness of fit, however, this method is of limited practical use for finding stable ligand geometries. The protein-ligand interaction energy is a good indicator for selecting or modeling ligand molecules with strong affinity for the target protein. Empirical energy functions and force field parameters are usually used for estimating the intermolecular and intramolecular energetic stability of macromolecules. In order to find a stable geometry and conformation of the ligand molecule rapidly and effectively, the energy should be estimated at every manipulation of the molecule, to guide the direction and amplitude of the subsequent manipulation. However, because of the large number of atoms in proteins, calculating the energies with the conventional atom-pair algorithm takes rather a long time, even on an efficient present-day workstation. In addition to the six degrees of freedom of rotation and translation, the conformational freedom of non-rigid molecules makes the problem very difficult and time-consuming. Therefore, most docking processes on 3D-CG are performed without energy estimation, monitoring only interatomic distances so that atoms do not come too close to each other. In 1985, Goodford presented a new method to show favored sites, inside the ligand binding cavity of a protein, for functional groups such as amino, hydroxy, and carboxyl groups and for water (25).
The favorable sites for each functional group and for water are contoured at a certain energy level from a map of the total interaction energy, consisting of van der Waals, electrostatic, and hydrogen bonding interactions, and are shown on graphic displays as bird cage models. The method seems very useful for designing new structures by adding or modifying functional groups that are expected to enhance binding. However, it is not suitable for interactive docking studies aimed at finding stable relative geometries of the ligand molecule.
Pattabiraman et al. have presented another approximation method for real-time estimation of the interaction energy between a protein and a ligand (26). They approximated the Lennard-Jones parameters of each interacting atom pair by the square root of the product of the corresponding parameters of the two atoms. At each grid point defined in the ligand binding site, they precalculated two sets of data corresponding to the attractive and repulsive terms of the potential function. Although their method enables real-time estimation of the intermolecular van der Waals interaction energy, it is of limited practical use because other energies, such as those of electrostatic and hydrogen-bonding interactions, are ignored.
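A geometric-mean combining rule allows precalculated grids because the probe's own parameter factors out of the sum over protein atoms: with A_ij = sqrt(A_ii·A_jj), the repulsive sum becomes sqrt(A_ii) times a probe-independent quantity, and likewise for B. The sketch below is our own illustration of that idea, not Pattabiraman et al.'s code; the names and all numerical parameters are hypothetical. It checks that the grid-based value equals the explicit pair sum at a grid point.

```python
import math

# Protein atoms: (xyz, A_jj, B_jj), each atom's "self" Lennard-Jones
# coefficients.  All numbers used below are made up for illustration.


def precompute_grids(grid, protein):
    """Per grid point, store one repulsive and one attractive sum.
    With A_ij = sqrt(A_ii*A_jj) and B_ij = sqrt(B_ii*B_jj), the probe factors
    sqrt(A_ii) and sqrt(B_ii) can be pulled out of the sums."""
    rep, att = [], []
    for g in grid:
        r12 = r6 = 0.0
        for xyz, a_jj, b_jj in protein:
            r = math.dist(g, xyz)
            r12 += math.sqrt(a_jj) / r ** 12
            r6 += math.sqrt(b_jj) / r ** 6
        rep.append(r12)
        att.append(r6)
    return rep, att


def vdw_from_grids(idx, a_ii, b_ii, rep, att):
    """van der Waals energy of any probe type at grid point idx."""
    return math.sqrt(a_ii) * rep[idx] - math.sqrt(b_ii) * att[idx]


def vdw_direct(point, a_ii, b_ii, protein):
    """Reference value: explicit pair sum with the same combining rule."""
    e = 0.0
    for xyz, a_jj, b_jj in protein:
        r = math.dist(point, xyz)
        e += math.sqrt(a_ii * a_jj) / r ** 12 - math.sqrt(b_ii * b_jj) / r ** 6
    return e


if __name__ == "__main__":
    protein = [((0.0, 0.0, 0.0), 9.0e5, 600.0), ((3.8, 0.0, 0.0), 6.0e5, 500.0)]
    grid = [(1.9, 3.5, 0.0), (0.0, 4.0, 1.0)]
    rep, att = precompute_grids(grid, protein)
    for idx, g in enumerate(grid):
        print(vdw_from_grids(idx, 7.0e5, 550.0, rep, att),
              vdw_direct(g, 7.0e5, 550.0, protein))
```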
Details of the Program GREEN
The intermolecular interaction energy between a protein and a ligand molecule is usually thought to consist mainly of van der Waals, electrostatic, and hydrogen-bonding interactions. It can be calculated by the conventional empirical method using Eq. 1, where A and B are the Lennard-Jones parameters, C and D are the hydrogen-bond parameters, rij is the distance between interacting atoms i and j, q is the atomic charge, ε is the dielectric constant of the medium, and Nnb and Nhb are the numbers of atom pairs included in the calculation of each energy term.

$$
E_{\mathrm{intermolecular}} = E_{\mathrm{van\,der\,Waals}} + E_{\mathrm{electrostatic}} + E_{\mathrm{H\text{-}bond}}
= \sum_{i,j}^{N_{nb}} \left( A_{ij} r_{ij}^{-12} - B_{ij} r_{ij}^{-6} \right)
+ \sum_{i,j}^{N_{nb}} \frac{q_i q_j}{\varepsilon r_{ij}}
+ \sum_{i,j}^{N_{hb}} \left( C_{ij} r_{ij}^{-12} - D_{ij} r_{ij}^{-10} \right) \tag{1}
$$
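For reference, Eq. 1 can be evaluated directly by the conventional atom-pair double loop, whose cost grows with the product of the numbers of protein and ligand atoms. In this sketch the data layout, the parameter dictionaries and the default distance-dependent dielectric (ε = 4rij) are assumptions; the unit-conversion constant K, which Eq. 1 leaves implicit, appears as `coulomb_k`.

```python
import math


def intermolecular_energy(protein, ligand, lj, hb=None,
                          eps=lambda r: 4.0 * r, coulomb_k=332.06):
    """Eq. 1 by explicit atom pairs.

    protein, ligand: lists of (xyz, charge, atom_type)
    lj:  {(protein_type, ligand_type): (A, B)} Lennard-Jones coefficients
    hb:  {(i, j): (C, D)} 12-10 coefficients for hydrogen-bonded pairs
         (protein atom index i, ligand atom index j)
    eps: dielectric as a function of r (default: distance-dependent 4r)
    Returns (E_vdw, E_elec, E_hbond)."""
    hb = hb or {}
    e_vdw = e_elec = e_hb = 0.0
    for i, (xi, qi, ti) in enumerate(protein):
        for j, (xj, qj, tj) in enumerate(ligand):
            r = math.dist(xi, xj)
            a, b = lj[(ti, tj)] if (ti, tj) in lj else lj[(tj, ti)]
            e_vdw += a / r ** 12 - b / r ** 6
            e_elec += coulomb_k * qi * qj / (eps(r) * r)
            if (i, j) in hb:
                c, d = hb[(i, j)]
                e_hb += c / r ** 12 - d / r ** 10
    return e_vdw, e_elec, e_hb


if __name__ == "__main__":
    # Hypothetical two-atom protein fragment and one ligand atom.
    protein = [((0.0, 0.0, 0.0), -0.5, "O"), ((1.2, 0.0, 0.0), 0.5, "C")]
    ligand = [((0.0, 3.0, 0.0), 0.3, "N")]
    lj = {("O", "N"): (6.0e5, 500.0), ("C", "N"): (8.0e5, 550.0)}
    print(len(protein) * len(ligand), "atom pairs")
    print(intermolecular_energy(protein, ligand, lj))
```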
The calculation takes a rather long computational time because of the large number of atoms in a protein and, consequently, the large number of atom pairs between the protein and the ligand. We have developed an approximation that greatly speeds up the calculation of the intermolecular interaction energy for real-time use in docking studies. In our approximation, the energy calculation is performed in two phases: calculation of grid point data from the protein structure, and calculation of the energy from the grid point data and the ligand structure. Once the grid point data have been calculated and stored in memory or in files, the second phase can be performed repeatedly for various ligand structures using the tabulated data.
At each grid point in the ligand binding site, we calculate and store the van der Waals energy term for various probe atoms, the electrostatic potential term, the expected sites and characters of hydrogen bond partners in the ligand, a surface code, and other items.
Calculation of the Grid Point Data
The grid point data are calculated as follows. A three-dimensional grid with a regular interval (typically 0.4-1.0 Å) is generated inside the binding pocket of the protein molecule (Fig. 4). At each grid point, the van der Waals interaction energy between a probe atom and the whole protein molecule is calculated using the empirical potential function. Several atom types are used as probes, and the energy is calculated and stored separately for each probe atom type. Every atom species present in the ligand molecules to be studied is adopted as a probe atom (e.g. carbon, hydrogen, nitrogen, and oxygen). For the van der Waals energy term Gvdw, the Lennard-Jones type potential function shown in Eq. 2 is used. In Eq. 2, rij is the distance between the probe position on the i-th grid point and the j-th protein atom. The empirical potential parameters Aij and Bij are currently taken from Weiner et al. (27,28).

$$
G_{\mathrm{vdw},i} = \sum_{j}^{\text{protein atoms}} \left( A_{ij} r_{ij}^{-12} - B_{ij} r_{ij}^{-6} \right) \tag{2}
$$
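The first phase, tabulating Eq. 2 for each probe atom type, might look like the following sketch. The grid layout, the parameter format and the floor on rij (to avoid division by zero when a grid point falls on a protein atom) are assumptions of this illustration; the Weiner et al. parameters are not reproduced, and the numbers in the usage section are made up.

```python
import math


def make_grid(center, half_width, spacing):
    """Regular cubic grid around the binding-site center (all in Å)."""
    n = int(round(2.0 * half_width / spacing)) + 1
    x0, y0, z0 = (c - half_width for c in center)
    return [(x0 + i * spacing, y0 + j * spacing, z0 + k * spacing)
            for i in range(n) for j in range(n) for k in range(n)]


def vdw_grid(grid, protein, lj, probe_types, r_min=0.5):
    """Eq. 2: for every probe type, G_vdw at each grid point summed over all
    protein atoms.  protein: list of (xyz, atom_type);
    lj: {(probe_type, protein_type): (A, B)}.
    r is floored at r_min to avoid division by zero (an assumption)."""
    table = {}
    for probe in probe_types:
        values = []
        for g in grid:
            e = 0.0
            for xyz, ptype in protein:
                a, b = lj[(probe, ptype)]
                r = max(math.dist(g, xyz), r_min)
                e += a / r ** 12 - b / r ** 6
            values.append(e)
        table[probe] = values
    return table


if __name__ == "__main__":
    protein = [((0.0, 0.0, 0.0), "C"), ((3.8, 0.0, 0.0), "O")]  # hypothetical
    lj = {("C", "C"): (9.0e5, 600.0), ("C", "O"): (7.0e5, 550.0),
          ("O", "C"): (7.0e5, 550.0), ("O", "O"): (5.0e5, 500.0)}
    grid = make_grid((1.9, 3.0, 0.0), 1.0, 0.5)
    table = vdw_grid(grid, protein, lj, ["C", "O"])
    print(len(grid), "grid points; lowest carbon-probe value:", min(table["C"]))
```

The protein loop runs once per grid point and probe type; afterwards the ligand never has to visit the protein atoms again.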
The electrostatic potential term Gelc is calculated using the Coulomb potential, as in Eq. 3. In Eq. 3, rij is defined as in Eq. 2, and qj is the atomic charge on the j-th protein atom. The value of this term equals the electrostatic interaction energy when the probe atom bears a unit positive charge. K is a constant that converts the energy unit to kcal/mol.

$$
G_{\mathrm{elc},i} = \sum_{j}^{\text{protein atoms}} \frac{K q_j}{\varepsilon r_{ij}} \tag{3}
$$
Determining the dielectric constant inside the protein molecule is a difficult but important problem. A constant value, often used for simplicity, is not very realistic. We usually use a distance-dependent approximation for the dielectric constant (i.e. ε = f·rij, where f varies from 1 to 4). This approximation may still be oversimplified, but it is better than a constant dielectric model when solvent molecules are not treated explicitly in the calculation, because it partially accounts for the shielding of electrostatic interactions by intervening atoms and ions.
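Eq. 3 with the distance-dependent dielectric ε = f·rij makes the potential fall off as 1/rij² instead of 1/rij. The sketch below tabulates both forms for comparison. The value of K (about 332 kcal·Å/(mol·e²) for charges in electron units and distances in Å; force fields use slightly different values), the choice f = 4, and the distance floor are assumptions.

```python
import math

COULOMB_K = 332.06  # K: kcal·Å/(mol·e²); approximate, force fields differ slightly


def elc_grid(grid, protein_charges, f=4.0, r_min=0.5):
    """Eq. 3 with eps = f * r_ij, i.e. G_elc,i = sum_j K q_j / (f r_ij**2).
    protein_charges: list of (xyz, q_j); distances floored at r_min (assumed)."""
    values = []
    for g in grid:
        phi = 0.0
        for xyz, q in protein_charges:
            r = max(math.dist(g, xyz), r_min)
            phi += COULOMB_K * q / (f * r * r)
        values.append(phi)
    return values


def elc_grid_constant(grid, protein_charges, eps=4.0, r_min=0.5):
    """Eq. 3 with a constant dielectric eps, for comparison."""
    values = []
    for g in grid:
        phi = 0.0
        for xyz, q in protein_charges:
            r = max(math.dist(g, xyz), r_min)
            phi += COULOMB_K * q / (eps * r)
        values.append(phi)
    return values


if __name__ == "__main__":
    charges = [((0.0, 0.0, 0.0), 1.0)]        # a single +1 e charge
    points = [(3.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
    print("distance-dependent:", elc_grid(points, charges))
    print("constant eps = 4:  ", elc_grid_constant(points, charges))
```

Going from 3 Å to 12 Å, the distance-dependent form drops by a factor of 16 and the constant-dielectric form by a factor of 4, which is how the distance-dependent model mimics screening by the intervening medium.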
Calculation of the Intermolecular Energy
When a ligand molecule is placed and manipulated in the gridded region, the interaction energy between the protein and the ligand molecule can be estimated using the three-dimensionally tabulated energy terms described above. The tabulated data on the grid point nearest to each ligand atom are used for the calculation. The interaction energy between protein and ligand (Einter) is calculated using Eq. 4.

$$
E_{\mathrm{inter}} = \sum_{k}^{\text{ligand atoms}} \left[ G_{\mathrm{vdw}}(k) + G_{\mathrm{elc}}(k)\, q_k \right] \tag{4}
$$
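Eq. 4 reduces the energy evaluation to table look-ups. The following sketch shows the nearest-grid-point look-up and, as an alternative, trilinear interpolation over the eight surrounding grid points, the refinement the authors mention in the text that follows. The flat storage order, the clamping at the grid edges, and all names are assumptions of this illustration.

```python
import math


class GridMap:
    """Values tabulated on a regular grid with origin `origin`, spacing `h`
    and n points per axis, stored flat at index (i*n + j)*n + k (x outermost)."""

    def __init__(self, origin, h, n, values):
        self.origin, self.h, self.n, self.values = origin, h, n, values

    def _at(self, i, j, k):
        return self.values[(i * self.n + j) * self.n + k]

    def nearest(self, p):
        """Value at the grid point nearest to p (clamped to the grid)."""
        idx = [min(max(math.floor((p[a] - self.origin[a]) / self.h + 0.5), 0),
                   self.n - 1)
               for a in range(3)]
        return self._at(*idx)

    def trilinear(self, p):
        """Weighted average over the eight grid points surrounding p."""
        base, frac = [], []
        for a in range(3):
            t = (p[a] - self.origin[a]) / self.h
            i = min(max(math.floor(t), 0), self.n - 2)
            base.append(i)
            frac.append(min(max(t - i, 0.0), 1.0))
        total = 0.0
        for di in (0, 1):
            for dj in (0, 1):
                for dk in (0, 1):
                    w = ((frac[0] if di else 1.0 - frac[0])
                         * (frac[1] if dj else 1.0 - frac[1])
                         * (frac[2] if dk else 1.0 - frac[2]))
                    total += w * self._at(base[0] + di, base[1] + dj, base[2] + dk)
        return total


def e_inter(ligand, vdw_maps, elc_map, use_interp=False):
    """Eq. 4: sum over ligand atoms of G_vdw(k), taken from the map for the
    atom's type, plus q_k * G_elc(k).  ligand: list of (xyz, q, atom_type)."""
    energy = 0.0
    for xyz, q, atom_type in ligand:
        vmap = vdw_maps[atom_type]
        if use_interp:
            g_vdw, g_elc = vmap.trilinear(xyz), elc_map.trilinear(xyz)
        else:
            g_vdw, g_elc = vmap.nearest(xyz), elc_map.nearest(xyz)
        energy += g_vdw + q * g_elc
    return energy
```

The cost per ligand atom is constant (one or eight look-ups), independent of the number of protein atoms, which is what makes real-time energy feedback during manual docking feasible.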
The van der Waals interaction energy is calculated simply by summing the van der Waals energy term Gvdw(k) on the grid point nearest to the k-th ligand atom. Among the van der Waals energy terms for the several probe atom types, the appropriate term is chosen according to the atom type of each ligand atom. The electrostatic interaction energy is calculated by summing the product of the electrostatic potential term Gelc(k) on the grid point nearest to the k-th ligand atom and the atomic charge qk on the k-th ligand atom. It would be better to use values interpolated from the eight neighboring grid points rather than those of the nearest grid point.
Fig. 4 Calculation of the grid point data.
Fig. 5 Calculation of the interaction energy by using the grid point data.
[Figure labels: probe atom (C, H, N, O, ...); protein atoms; atom acceptable region (defined by Gvdw); ligand molecule.]
Hydrogen Bonds
Hydrogen bonds play an important role in the specific recognition of molecules in biological systems. The hydrogen bonding force essentially originates from a combination of van der Waals and electrostatic interactions. For practical reasons, however, some empirical force-field methods include a hydrogen-bonding energy term in addition to the van der Waals and electrostatic energy terms. Several types of potential functions have been proposed to express the hydrogen bonding force, in which the hydrogen atom as well as the donor and acceptor heteroatoms are treated, taking into account the interatomic distances and angles among them (29,30,31). The hydrogen bonding energy in such functions could easily be calculated if the coordinates of all the atoms involved were known. However, the positions of hydrogen atoms in protein molecules usually cannot be determined by X-ray crystallography. In some functional groups, such as hydroxy and amino groups, the hydrogens cannot take definite positions because of free rotation. Moreover, it seems unnecessary to calculate this uncertain energy term elaborately in a docking study where the protein structure is assumed to be rigid as a first approximation; considering the flexibility actually allowed to protein atoms, an imprecise estimate of the hydrogen bonding energy is not thought to matter. In the GREEN system, therefore, we decided not to calculate the hydrogen bonding energy using potential functions, but to count the number of hydrogen bonds that can be formed at the current position of the ligand molecule during the docking process. The GREEN system provides a function to calculate the expected region of the hydrogen bonding partner for each hydrogen-bonding functional group, such as hydroxy groups, primary sp3 and secondary sp2 amines, aromatic ring nitrogens, and carbonyl groups, taking into account the directions of the lone pairs and of the hydrogens attached to the heteroatoms as well as the distances.
For all the functional groups in a protein molecule, the expected regions are calculated, and each grid point is examined to see whether it lies inside a region. A hydrogen bonding flag, which also expresses the hydrogen bond character (donor or acceptor), is assigned to each grid point inside a region and stored as part of the grid point data. During a docking study on the 3D-CG display, the hydrogen bonding flag in the grid point data is used to detect possible hydrogen bond formation between the protein and the ligand: for each functional group in the ligand molecule, the hydrogen bonding flag of the nearest grid point is referenced. To refine the ligand geometry to the precise minimum, energy minimization by the Simplex algorithm (32) can be performed, in which rotation, translation, and bond rotation of the ligand molecule are allowed. Optionally, the van der Waals and electrostatic energy terms can be calculated by the conventional atom-pair method during the minimization. More precise energy refinement that takes into account all degrees of freedom of the protein-ligand system should be done with an external molecular mechanics program such as AMBER (33) or CHARMm (34).
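The flag-based hydrogen bond count can be sketched as follows. The distance range (2.5-3.3 Å), the 30 degree cone around the X-H or lone-pair direction, the bit encoding of donor and acceptor characters, and all function names are illustrative assumptions; GREEN's actual expected-region definitions for each functional group are not given here. Each grid flag records the character of the protein group affecting the point, and a ligand group is counted when its own character is complementary.

```python
import math

DONOR, ACCEPTOR = 1, 2          # flag bits (an assumed encoding)
AMBIVALENT = DONOR | ACCEPTOR   # e.g. a freely rotating hydroxy group


def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def in_expected_region(point, heavy_atom, direction,
                       r_min=2.5, r_max=3.3, max_angle=30.0):
    """True if a partner heavy atom at `point` would lie within [r_min, r_max] Å
    of the protein heavy atom and within max_angle degrees of the X-H bond or
    lone-pair `direction`.  All thresholds are illustrative assumptions."""
    d = tuple(p - h for p, h in zip(point, heavy_atom))
    r = math.sqrt(sum(c * c for c in d))
    if not r_min <= r <= r_max:
        return False
    cos_angle = sum(a * b for a, b in zip(_unit(d), _unit(direction)))
    return cos_angle >= math.cos(math.radians(max_angle))


def hbond_flags(grid, protein_groups):
    """protein_groups: list of (heavy_xyz, direction, character), with DONOR for
    an X-H direction and ACCEPTOR for a lone-pair direction.  Each grid point
    gets the OR of the characters of all groups whose region contains it."""
    flags = [0] * len(grid)
    for idx, g in enumerate(grid):
        for heavy, direction, character in protein_groups:
            if in_expected_region(g, heavy, direction):
                flags[idx] |= character
    return flags


def _complement(character):
    out = 0
    if character & DONOR:
        out |= ACCEPTOR
    if character & ACCEPTOR:
        out |= DONOR
    return out


def count_hbonds(ligand_groups, grid, flags):
    """ligand_groups: list of (xyz, character).  A ligand group counts as
    hydrogen bonded if the flag at its nearest grid point is complementary:
    ligand acceptors pair with donor regions, and vice versa."""
    count = 0
    for xyz, character in ligand_groups:
        nearest = min(range(len(grid)), key=lambda i: math.dist(grid[i], xyz))
        if flags[nearest] & _complement(character):
            count += 1
    return count
```

Only the flags are consulted during docking, so the count is as fast as the energy look-up and needs no hydrogen positions on the protein side.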
Visualization
The tabulated data are used not only for energy calculation but also for visualizing the physical and chemical environment of the drug binding site of the protein on the 3D computer graphics display. This facilitates the initial introduction of a new ligand molecule into the ligand binding site. Using the van der Waals energy term in the tabulated data, an "atom acceptable region" can be displayed. The region is defined as the group of grid points whose van der Waals energy term Gvdw is below a certain level (usually 0.0 kcal/mol). On the 3D-CG display, the region is shown as a "bird cage" representation by three-dimensionally contouring the van der Waals energy. As van der Waals energy terms are prepared for several probe atom types, the region can be defined for each atom type. The cage is usually color-coded according to the level of the electrostatic term in the grid point data. Plate 1 shows horse liver alcohol dehydrogenase, whose structure was solved as a complex with the coenzyme NADH, the catalytic Zn2+ ion, and the inhibitor dimethylsulfoxide. Atomic coordinates were taken from the Protein Data Bank entry 6ADH (35). In Plate 1, the dimethylsulfoxide molecule at the active site was removed from the crystal structure, and grid point data were calculated on grid points generated in and around the region that the ligand molecule had occupied. The atom acceptable region is represented by a bird cage contoured at the energy level of 0.0 kcal/mol for the van der Waals term Gvdw of the carbon probe. The color of the cage indicates the electrostatic potential term Gelc arising from the charges of the protein atoms. The electrostatically most positive region (red to yellow) clearly extends near the catalytic zinc ion. In Plate 1, the substrate ethanol is fitted to the atom acceptable region (ball-and-stick model). With such a cage representation, one can dock molecules much more efficiently and rationally than with the conventional docking procedure shown in Plate 2. Furthermore, such a representation helps one to model new drug molecules that are highly complementary to the binding site cavity in shape as well as in electrostatic character. The atom acceptable region may appear similar to the conventional molecular surface representation. However, the molecular surface representation of the ligand binding site is based only on the van der Waals radii of the protein atoms, whereas the atom acceptable region also takes the radii of the ligand atoms into account to some extent. The region shows the spatial positions that the center of each ligand atom can occupy without severe contacts with protein atoms. The atom acceptable region is more useful than the molecular surface because it clearly shows the energetically favorable region for the binding of drug molecules. The hydrogen bonding flag in the grid point data is used to display the "hydrogen bonding region". The region is shown either as a "bird cage" picture surrounding the grid points where hydrogen bonding flags are set, or as groups of small symbols at the grid points. The cages or symbols are color-coded according to the type of protein functional group affecting the region.
The representation shows the region affected by each hydrogen-bonding functional group of the protein molecule. If a hydrogen bonding partner is located in this region, a strong interaction between the partner and the protein can be expected.
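Selecting and coloring the points of the atom acceptable region and of the hydrogen bonding region needs only the tabulated grid point data. In this sketch the electrostatic color thresholds, the flag encoding (1 donor, 2 acceptor, 3 ambivalent) and the function names are assumptions; the red, blue and yellow assignment follows the color scheme described for Plate 3. Contouring the selected points into a bird cage is left to the graphics layer.

```python
def electrostatic_color(phi, bins=((5.0, "red"), (1.0, "yellow"),
                                   (-1.0, "green"), (-5.0, "cyan"))):
    """Map G_elc (kcal/mol per unit charge) to a color, positive = red end.
    The thresholds are illustrative assumptions."""
    for lower, color in bins:
        if phi >= lower:
            return color
    return "blue"


def atom_acceptable_region(grid, g_vdw, g_elc, level=0.0):
    """Grid points whose G_vdw (for one probe type) is below `level`,
    each paired with a color derived from its G_elc value."""
    return [(p, electrostatic_color(e))
            for p, v, e in zip(grid, g_vdw, g_elc) if v < level]


# Assumed flag encoding: 1 = donor, 2 = acceptor, 3 = ambivalent.
HB_COLORS = {1: "red", 2: "blue", 3: "yellow"}


def hydrogen_bonding_region(grid, flags):
    """Group flagged grid points by the color of their hydrogen bond character."""
    region = {}
    for p, flag in zip(grid, flags):
        if flag:
            region.setdefault(HB_COLORS[flag], []).append(p)
    return region
```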
Plate 3 shows the "hydrogen bonding region" in part of the substrate binding site of E. coli dihydrofolate reductase (13). The colors of the cages indicate the hydrogen-bonding character expected from the protein functional groups affecting the region. The characters are divided into three types: hydrogen donor, hydrogen acceptor and ambivalent. Red: hydrogen donor region, affected by hydrogen-donating functional groups of the protein, such as arginine and lysine side chains and main-chain amide N-H. Blue: hydrogen acceptor region, affected by hydrogen-accepting functional groups, such as main-chain carbonyl oxygen and aspartate and glutamate side chains. Yellow: ambivalent region, arising from functional groups that work either as hydrogen donor or as hydrogen acceptor (free-rotating hydroxy groups and water molecules). The protein structure is shown as a pale-colored skeleton, and the inhibitor methotrexate, which is bound in the crystal, is shown as a yellow skeleton. The functional groups of methotrexate are easily seen to lie at positions complementary to the hydrogen bonding regions of the protein. The "hydrogen bonding region" representation is useful for locating the hydrogen bonding functional groups of drug molecules during the docking operation. Furthermore, it helps one to decide where to place complementary hydrogen-bonding functional groups when designing drug molecules with more specific hydrogen-bonding capability. Plate 4 simulates the position of another inhibitor, trimethoprim, in the atom acceptable region of dihydrofolate reductase; the position of methotrexate in the crystal structure is also shown for comparison.
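The color coding of the "hydrogen bonding region" amounts to a lookup from protein group type to hydrogen-bonding character and from character to color. The sketch below is illustrative only: the group-type names are hypothetical labels following the text's examples, the 2.5 to 3.1 Å flagging window is borrowed from the RECEPS criterion described later in this chapter rather than from GREEN, and treating a point touched by both donor and acceptor groups as ambivalent is our assumption.

```python
import math

# Hypothetical group-type labels and their hydrogen-bonding character (after the text's examples).
CHARACTER = {
    "Arg side chain": "donor",
    "Lys side chain": "donor",
    "main-chain N-H": "donor",
    "main-chain C=O": "acceptor",
    "Asp side chain": "acceptor",
    "Glu side chain": "acceptor",
    "free-rotating OH": "ambivalent",
    "water": "ambivalent",
}
COLOR = {"donor": "red", "acceptor": "blue", "ambivalent": "yellow"}


def hydrogen_bonding_region(grid, groups, r_lo=2.5, r_hi=3.1):
    """Map each flagged grid point to a display color.

    grid: iterable of (x, y, z); groups: list of (group_type, (x, y, z)).
    A point is flagged when it lies r_lo..r_hi Å from a protein group (assumed window);
    points affected by groups of different character are shown as ambivalent (assumption).
    """
    colors = {}
    for p in grid:
        chars = {CHARACTER[g] for g, pos in groups if r_lo <= math.dist(p, pos) <= r_hi}
        if chars:
            char = chars.pop() if len(chars) == 1 else "ambivalent"
            colors[p] = COLOR[char]
    return colors


groups = [("main-chain C=O", (0.0, 0.0, 0.0)), ("Lys side chain", (10.0, 0.0, 0.0))]
grid = [(2.8, 0.0, 0.0), (5.0, 0.0, 0.0), (7.2, 0.0, 0.0)]
print(hydrogen_bonding_region(grid, groups))
# {(2.8, 0.0, 0.0): 'blue', (7.2, 0.0, 0.0): 'red'}
```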
Designing New Structures Using the Program GREEN
The program GREEN is useful not only for docking studies but also for designing new structures directly based on the receptor structure. The program provides functions for model building, such as connecting fragment structures, adding or deleting atoms or groups, and replacing atomic elements. Starting from stable complex structures obtained by docking studies, or from crystal structures of drug-receptor complexes, it is possible to modify the drug structures by adding or replacing substructural fragments so as to obtain structures that interact more favorably with the receptor. The various energy calculations and visualizations provided in this program serve this purpose. In addition to lead optimization, the program is also useful for lead generation. One can construct new molecular structures interactively on 3D-CG so that they fit the shape and properties of the cavity well. Structures should be constructed so that their functional groups interact with those of the receptor as much as possible and their atoms fit well inside the cavity. At the same time, the structures should be intramolecularly stable (or at least not unstable) and should not come too close to receptor atoms. The validity of the constructed structure is monitored by real-time energy estimation at every step of the procedure. In addition to this interactive approach, we are developing methods for the automatic generation of new drug structures that satisfy the shape and various properties of the receptor cavity. By these methods, it should be possible to obtain structures with new skeletons and new functional groups, among which a new lead compound might be found.
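Real-time energy estimation is possible because protein-side quantities can be tabulated on the grid once, so that each manipulation step only needs a lookup per ligand atom instead of a full atom-pair sum. The sketch below contrasts the two, assuming (our choice, not stated in this section) trilinear interpolation of a probe van der Waals grid and an electrostatic potential grid, illustrative 12-6 parameters, Coulomb's law with dielectric constant 1, and hypothetical function names. The per-atom lists also show how each ligand atom's contribution to the total can be reported.

```python
import itertools
import math

K_COULOMB = 332.0       # kcal*Å/(mol*e^2); dielectric constant 1 assumed
EPS, R_MIN = 0.15, 3.8  # illustrative 12-6 parameters, not GREEN's


def _lj(r):
    ratio6 = (R_MIN / r) ** 6
    return EPS * (ratio6 * ratio6 - 2.0 * ratio6)


def atom_pair_energy(ligand, protein):
    """Exact atom-pair sum. ligand/protein: lists of ((x, y, z), charge).
    Returns (total, per-ligand-atom contributions)."""
    per_atom = []
    for xyz_l, q_l in ligand:
        e = 0.0
        for xyz_p, q_p in protein:
            r = math.dist(xyz_l, xyz_p)
            e += _lj(r) + K_COULOMB * q_l * q_p / r
        per_atom.append(e)
    return sum(per_atom), per_atom


def build_grids(protein, origin, step, shape):
    """Precompute the probe vdW energy and electrostatic potential at every grid node."""
    g_vdw, g_elc = {}, {}
    for idx in itertools.product(*(range(n) for n in shape)):
        p = tuple(origin[a] + idx[a] * step for a in range(3))
        rs = [(max(math.dist(p, xyz), 0.5), q) for xyz, q in protein]  # clamp r at 0.5 Å
        g_vdw[idx] = sum(_lj(r) for r, _ in rs)
        g_elc[idx] = sum(K_COULOMB * q / r for r, q in rs)
    return g_vdw, g_elc


def interpolate(table, origin, step, xyz):
    """Trilinear interpolation; xyz must not lie on or beyond the grid's upper faces."""
    f = [(xyz[a] - origin[a]) / step for a in range(3)]
    i0 = [int(math.floor(c)) for c in f]
    d = [f[a] - i0[a] for a in range(3)]
    total = 0.0
    for corner in itertools.product((0, 1), repeat=3):
        w = 1.0
        for a in range(3):
            w *= d[a] if corner[a] else 1.0 - d[a]
        total += w * table[tuple(i0[a] + corner[a] for a in range(3))]
    return total


def grid_energy(ligand, grids, origin, step):
    """Approximate energy: interpolated vdW term + charge * interpolated potential."""
    g_vdw, g_elc = grids
    per_atom = [interpolate(g_vdw, origin, step, xyz) + q * interpolate(g_elc, origin, step, xyz)
                for xyz, q in ligand]
    return sum(per_atom), per_atom


protein = [((0.0, 0.0, 0.0), 1.0)]
origin, step, shape = (2.0, -2.0, -2.0), 0.5, (9, 9, 9)
grids = build_grids(protein, origin, step, shape)
ligand = [((4.25, 0.25, 0.0), -0.5)]
print(atom_pair_energy(ligand, protein)[0], grid_energy(ligand, grids, origin, step)[0])
```

The grid is built once per receptor; afterwards the cost per step scales with the number of ligand atoms only, which is the property an interactive program needs.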
Summary of the Program GREEN
The program GREEN has been developed for rational docking simulation and also for the construction of new structures based on the receptor structure. As regards docking simulation, the program covers almost all the necessary functions. In addition to the functions commonly implemented in conventional programs for computer-aided drug design, the program GREEN provides the following features: (1) Real-time estimation of the intermolecular interaction energy by the approximation method, together with precise calculation of the energy by the conventional atom-pair-type calculation. (2) Representation of the "atom acceptable region" and of physical and chemical properties, such as electrostatic potentials and expected hydrogen bonding sites for ligands. These features facilitate the initial placement of new ligands at appropriate positions inside the receptor cavity on 3D-CG. (3) Real-time calculation of the intramolecular energy of the drug molecule, for every bond rotation, by using the AMBER force field.
(4) Memorization of trajectories of 3D manipulation. Stable geometries can easily be retrieved after a series of interactive docking studies by using the memorized geometries and energies. (5) Partial energy estimation, which enables head-to-tail fitting of flexible drug molecules. (6) Interactive optimization of the geometry and conformation of the drug molecule by the Simplex method. (7) Display of the contribution of each atom in the drug molecule to the total intermolecular interaction energy. (8) Display of the electron density map from crystallographic analyses of protein-ligand complexes. When determining the position and structure of a ligand, energetically stable candidates can be checked by superposing them on the ligand electron density. (9) Interactive molecular-modeling functions that enable us to design molecules fitting well to the shape and various properties of the cavity. These are expected to be useful not only for lead optimization but also for lead generation, as indicated before. In order to select the most probable structure of the protein-ligand complex, it is desirable to compare several possible structures of the complex. If necessary, they should be fully optimized by energy minimization, taking into account the flexibility of the protein molecule. In our method, structures are refined by calculations done outside the GREEN program, using AMBER or other molecular mechanics/dynamics packages developed for macromolecules. The GREEN program should provide an efficient tool not only for interpreting the structure-activity relationships of various drug molecules, but also for designing new structures based on a known receptor structure.
4. APPROACHES BASED ON MOLECULAR SUPERPOSITION
When the receptor structure is known, rational approaches seem to be feasible to some extent. However, it seems very difficult to find rational approaches when the receptor structure is unknown. Nevertheless, most drug development studies have to be carried out without any knowledge of the receptor structure, at least initially. Drug design is therefore based on comparison of the structures of a number of known active
and inactive compounds. In this situation, elucidating the structure-activity relationships is very important and is the starting point for designing new structures. The QSAR method has been developed mainly for this purpose. However, the method has the limitation that both the design of new molecules and the interpretation of structure-activity relationships must usually remain within the framework of derivatives sharing the same skeletal structure. Approaches based on three-dimensional molecular structures are needed in order to compare the structures and properties of known drugs with different skeletons. Three-dimensional structures have long been compared by inspecting molecular models made of bamboo, metal or plastic from appropriate directions. Superposition is one of the most efficient ways to compare the structures and properties of multiple molecules, but it is impossible with such physical models. On the other hand, molecules can be superposed on 3D-CG displays interactively, or superposed computationally with the results then visualized. Such computer-aided methods enable us to store the structures of the superposed molecules and to compare not only molecular structures but also physical properties with quantitative measures.
Methods for Superposing Molecules
Comparing the structures and properties of drug molecules is meaningless unless their biological activities are based on binding to the same receptor site, however similar the molecules may appear superficially. This is because drugs interacting with different receptors have different structural and property requirements. Molecules with apparently different chemical structures often exhibit the same kind of biological activity and pharmacological behavior, and in many such cases binding to the same receptor has been confirmed by receptor binding assays with radiolabeled ligands. There are many crystal structures in which a protein molecule stably binds ligand molecules whose structures are quite different from that of the natural substrate or the natural bioactive molecule. Such ligand molecules are tightly trapped inside the cavity or surface cleft through hydrogen bonding, electrostatic and van der Waals interactions, which act through space between the two molecules. This strongly suggests that, for these intermolecular interactions to be recognized by the receptor, physical and chemical properties are much more important than the chemical structure itself. Therefore, the ability of various molecules to bind to the same receptor is determined not only by similarities in molecular shape (not necessarily overall, but in part, as described before) but also, more importantly, by the relative arrangement of their submolecular physical and chemical properties in three-dimensional space. Accordingly, for the purpose of structure-activity relationships, molecules should be superposed in terms of their physicochemical properties rather than their atomic positions or chemical structures. The superposition methods conventionally used so far are (1) least-squares calculation specifying atom pairs between the molecules, and (2) 3D manipulation of individual molecules on 3D-CG with visual judgment of the goodness of fit. The least-squares method cannot easily be applied when large discrepancies between the chemical structures make it difficult to specify atom pairs. When it can be applied, it gives the least-squares residual as a measure of "goodness of fit". At least three atom pairs must be specified for this calculation. This superposition is routinely performed on the common skeletal part of two structures to reveal similarities and differences in the other parts, and the biological activities of a series of compounds are often discussed on the basis of the similarities and differences of the volumes occupied by the two molecules. In cases where the two structures look alike, the differences in structure and properties are so clear that superposing the molecules is not necessary.
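The conventional least-squares superposition with specified atom pairs can be sketched as follows. This is one standard technique (Horn's closed-form quaternion method, with a small Jacobi eigen-solver so the example needs only the standard library), not necessarily the implementation used by any program in this chapter; the function names are hypothetical. It returns the RMS residual that serves as the "goodness of fit" and refuses fewer than three pairs.

```python
import math


def _jacobi_eigen(a, sweeps=50, tol=1e-14):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def superpose(trial, template):
    """Least-squares fit of paired atoms of `trial` onto `template`.
    Returns (rms_residual, fitted trial coordinates)."""
    n = len(trial)
    if n != len(template) or n < 3:
        raise ValueError("need at least three matched atom pairs")
    ct = [sum(p[k] for p in trial) / n for k in range(3)]
    cr = [sum(p[k] for p in template) / n for k in range(3)]
    x = [[p[k] - ct[k] for k in range(3)] for p in trial]
    y = [[p[k] - cr[k] for k in range(3)] for p in template]
    # Correlation matrix S[a][b] = sum_i x_ia * y_ib
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = [
        [sum(xi[a] * yi[b] for xi, yi in zip(x, y)) for b in range(3)] for a in range(3)]
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    vals, vecs = _jacobi_eigen(nmat)
    best = max(range(4), key=lambda i: vals[i])
    q0, q1, q2, q3 = (vecs[k][best] for k in range(4))  # unit quaternion of the best rotation
    r = [
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ]
    fitted = [[sum(r[i][k] * xi[k] for k in range(3)) + cr[i] for i in range(3)] for xi in x]
    sq = sum((f[k] - t[k]) ** 2 for f, t in zip(fitted, template) for k in range(3))
    return math.sqrt(sq / n), fitted


template = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.5)]
trial = [(2.0, 3.0, 4.0), (2.0, 4.5, 4.0), (0.5, 3.0, 4.0), (2.0, 3.0, 5.5)]  # rotated + shifted copy
rms, fitted = superpose(trial, template)
print(f"RMS residual: {rms:.2e} Å")
```

Note that the atom pairing must be supplied by the user, which is exactly the step the text identifies as difficult for structurally dissimilar molecules.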
Superposition by the positions of heteroatoms is also often performed to examine biological equivalence when the two structures differ from each other. However, it is not always easy to assign the corresponding atoms in the two molecules. Moreover, most superposition methods ignore the properties of the heteroatoms and the direction of their interactions with possible partners in the receptor. Although an approximate superposition might give information on substructural correspondence in a set of structurally different molecules, a meaningful superposition of such molecules seems to be very difficult. Another problem with the superposition method is the conformation of flexible molecules. Usually, superposition has been performed assuming the conformation of each molecule to be the same as in the crystal structure, or as the energetically most stable structure obtained from molecular mechanics or molecular orbital calculations. However, it is doubtful whether the active conformation is the same as that found in the crystal or in solution, or as the stable state of the isolated molecule; the active conformation may not coincide with any of these local energy-minimum structures. It seems pointless to superpose molecules in conformations other than the active conformation. In the superposition of flexible molecules, the conformations of two molecules can be varied interactively by 3D manipulation so that they fit each other as well as possible by visual judgment. As pairs of corresponding atoms need not be specified, the method can be applied to very different structures. The disadvantage of such a superposition method, however, is that it gives no numerical index of the goodness of fit. Appropriate goodness-of-fit indices are necessary to obtain quantitative and reproducible results of superposition.
Receptor Models
Three-dimensional models of the receptor cavity can be made from the superposed structures. More accurate or more probable models are produced from multiple molecules that bind to the same receptor than from a single molecule; structure-activity relationships cannot be interpreted at all from a single active molecule. The greater the structural differences among the molecules used for the superposition, the more useful the information obtained. In the "Active Analog Approach", Marshall et al. proposed useful definitions for the volume occupied by the receptor, based on the superposition of active or inactive molecules (36,37): the receptor-excluded volume, defined as the union of the volumes of the active molecules, and the receptor-essential volume, defined as the union of the volumes of the inactive molecules minus the receptor-excluded volume. It also seems useful for drug designers to consider the common volume, the differences in volumes of molecules, and the volume occupied by at least one molecule. The validity of the receptor model depends entirely on the validity of the superposition. Therefore, superposition of molecules should be done as rationally and logically as possible. We have developed a rational method for superposing molecules based on the prerequisite of specific binding to a common receptor, and for three-dimensional receptor mapping to describe the environment of the receptor cavity.
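If each superposed molecule is represented by the set of grid points it occupies, the volume definitions above reduce to set operations. The sketch below follows the text's definitions; reading "common volume" as the points shared by every superposed molecule is our assumption, and the function name is hypothetical.

```python
def receptor_volumes(active, inactive):
    """active, inactive: lists of sets of occupied grid points, one set per superposed molecule."""
    everything = active + inactive
    excluded = set().union(*active)                 # receptor-excluded volume
    essential = set().union(*inactive) - excluded   # receptor-essential volume, as defined in the text
    common = set.intersection(*everything) if everything else set()  # shared by all molecules (assumed reading)
    at_least_one = set().union(*everything)         # occupied by at least one molecule
    return {"excluded": excluded, "essential": essential, "common": common, "any": at_least_one}


active = [{(0, 0, 0), (1, 0, 0)}, {(1, 0, 0), (2, 0, 0)}]
inactive = [{(1, 0, 0), (3, 0, 0)}]
vols = receptor_volumes(active, inactive)
print(sorted(vols["essential"]))  # [(3, 0, 0)]
```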
[Fig. 6 diagram. Program RECEPS: drug structures are superposed in terms of the spatial arrangement of physical and chemical properties (no structural correspondence required; numerical indices show "goodness of fit"). Conventional methods: drug structures are superposed in terms of atomic positions, either by the least-squares method specifying the atom pairs (structural correspondence required) or by manual superposition with visual judgement (no numerical index). Both routes yield the atomic coordinates of the superposed molecules.]
Fig. 6 Superposition of molecules.
Details of the Program System RECEPS
In our method, molecules are superposed in terms of physical and chemical properties by using a three-dimensional grid, whereas in the conventional methods they are superposed in terms of atomic positions. The specification of atom pairs is not necessary, although, as in other superposition methods, a template molecule to which the other molecules are superposed is required. First, a template molecule must be chosen whose structure is rigid or conformationally well defined (although this limitation has been relaxed to some extent by the development of functions for automatic superposition). On the 3D-CG, a rectangular box is set up in order to extract the region essential for specific binding to the receptor and to determine the range of the grid point calculation (Plate 5). The lengths of the three edges and the position of the box are determined interactively so as not only to cover the region required by the template molecule but also to leave sufficient reserve space for the subsequent superposition of other molecules. Then, a three-dimensional grid with a regular interval of 0.4-1.0 Å is generated inside the box. For each grid point, the following physical and chemical properties are calculated and stored: electrostatic potential, charge distribution, expected hydrogen-bonding character, a flag for occupancy by each molecule, and a flag for the molecular surface. New molecules (hereafter called trial molecules) are superposed on the graphic expression of these three-dimensionally tabulated data. Goodness-of-fit values are calculated from the tabulated data on the basis of the spatial similarity of the physical and chemical properties of the molecules. The values are displayed on the 3D-CG and updated during interactive manipulation (rotation, translation and bond rotation) of the trial molecule throughout the superposition process. The molecule is manipulated until satisfactory goodness-of-fit values are obtained. Trial molecules are superposed one after another, and the resulting atomic coordinates are stored successively in a file. Grid point data are calculated from the atomic coordinates of every superposed molecule, and united grid point data are obtained from them by applying weights for biological activities. These united grid point data describe the three-dimensional environment of the receptor pocket.
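The grid bookkeeping described above can be sketched with three small functions: fill the box with a regular grid, flag the grid points inside each molecule's van der Waals spheres, and combine the flags of all superposed molecules. The text does not specify how activity weights are applied, so the weighted average below (weights such as pKi values supplied by the user) is only an assumption, as are the function names.

```python
import itertools
import math


def box_grid(lo, hi, step):
    """Regular grid (the text uses 0.4-1.0 Å intervals) filling the box [lo, hi]."""
    counts = [int(math.floor((hi[a] - lo[a]) / step + 1e-9)) + 1 for a in range(3)]
    return [tuple(lo[a] + i[a] * step for a in range(3))
            for i in itertools.product(*(range(c) for c in counts))]


def occupancy_flags(grid, atoms):
    """1 if a grid point lies inside any atom's vdW sphere; atoms: [((x, y, z), radius)]."""
    return [1 if any(math.dist(p, xyz) <= rad for xyz, rad in atoms) else 0 for p in grid]


def united_occupancy(grid, molecules, weights):
    """Activity-weighted average of the per-molecule occupancy flags (weighting scheme assumed)."""
    flags = [occupancy_flags(grid, m) for m in molecules]
    total = sum(weights)
    return [sum(w * f[j] for w, f in zip(weights, flags)) / total for j in range(len(grid))]


grid = box_grid((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), 1.0)
mol_a = [((0.0, 0.0, 0.0), 1.5)]
mol_b = [((2.0, 0.0, 0.0), 0.5)]
print(united_occupancy(grid, [mol_a, mol_b], [3.0, 1.0]))  # [0.75, 0.75, 0.25]
```

The same pattern extends to the other tabulated properties (charge distribution, potential, hydrogen-bonding flags) by storing one value per grid point per molecule.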
A receptor cavity model, which provides information on cavity size and shape, surface electrostatic potentials, locations of hydrogen-bonding heteroatoms and other features, can be obtained from the united grid point data. The receptor cavity model can be presented on the 3D-CG in various ways and can be further modified (including enlargement) by superposing additional molecules. Correct superposition enables us not only to extract the structural and physicochemical requirements for biological activity but also to determine their required spatial arrangement. One of the major characteristics of our method is that the goodness-of-fit values can be estimated in real time throughout the interactive superposition process on the 3D-CG. These values provide a quantitative measure of the extent of superposition.
Goodness of Fit
The current version of the grid point data file tabulates, for each grid point, its address, the flag of occupancy by molecules, the charge distribution, the electrostatic potential and the hydrogen-bonding character. These data are used to represent the spatial arrangement of the properties of substructures in molecules and to calculate the goodness of fit of each molecule in real time. Goodness-of-fit values are calculated from the tabulated data for the template molecule and the atomic data for the trial molecule, which change with interactive manipulation. The goodness-of-fit terms that we currently use are summarized as follows:

$$F_{\mathrm{shape}} = \frac{\text{number of common occupied grid points}}{\text{number of occupied grid points of template mol.}}$$

$$F_{\mathrm{charge}} = \frac{\sum_i \left| c_j - q_i \right|^2}{\sum_i \left| c_j \right|^2}$$

where $j$ is the grid point nearest to atom $i$, $c_j$ is the charge distribution at grid point $j$, and $q_i$ is the charge of atom $i$.

$$F_{\mathrm{elpo}} = \frac{\sum_i V_{\mathrm{temp},i}\, V_{\mathrm{trial},i}}{\sqrt{\sum_i V_{\mathrm{temp},i}^2}\,\sqrt{\sum_i V_{\mathrm{trial},i}^2}}$$

where $V_{\mathrm{temp},i}$ and $V_{\mathrm{trial},i}$ are the electrostatic potentials at grid point $i$ of the template and trial molecules, respectively.

$$F_{\text{H-bond}} = \frac{\text{number of common H-bonding grid points}}{\text{number of H-bonding grid points of template mol.}}$$

Equations for the calculation of "goodness-of-fit" indices
The charge distributions, which we have tentatively defined by distributing the atomic charges over the grid points around each atom in a Gaussian form, are calculated inside the van der Waals volume of each molecule, whereas the electrostatic potentials are calculated outside it. To improve these goodness-of-fit indices, further modification of the equations, and replacement of terms or addition of new terms, may be required. For this purpose, the program has been designed so that users can easily make such alterations. Suitable terms and equations should be selected according to how effectively they distinguish the correct superposition from incorrect ones.
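The four goodness-of-fit indices can be computed directly from grid data. In this sketch occupied and H-bonding grid points are sets of grid addresses, the template charge distribution is a dictionary from address to c_j, and potentials are lists sampled at the same surface grid points. The charge term follows our reading of the scanned formula (sum of |c_j - q_i|^2 over trial atoms, normalised by the sum of |c_j|^2, with j the nearest grid point), so a smaller value means a better match; the other three approach 1 for a good fit. All function names are hypothetical.

```python
import math


def f_shape(template_occ, trial_occ):
    """Fraction of the template's occupied grid points also occupied by the trial molecule."""
    return len(template_occ & trial_occ) / len(template_occ)


def f_hbond(template_hb, trial_hb):
    """Fraction of the template's H-bonding grid points shared by the trial molecule."""
    return len(template_hb & trial_hb) / len(template_hb)


def nearest_grid_point(xyz, origin, step):
    """Integer address of the grid point nearest to xyz (Python's round() rounds halves to even)."""
    return tuple(int(round((xyz[a] - origin[a]) / step)) for a in range(3))


def f_charge(charge_grid, trial_atoms, origin, step):
    """Sum of (c_j - q_i)^2 over trial atoms i, divided by the sum of c_j^2.

    charge_grid maps grid addresses to the template's charge distribution c_j;
    trial_atoms is a list of ((x, y, z), q_i). Undefined if every c_j is zero.
    """
    num = den = 0.0
    for xyz, q in trial_atoms:
        c = charge_grid.get(nearest_grid_point(xyz, origin, step), 0.0)
        num += (c - q) ** 2
        den += c * c
    return num / den


def f_elpo(v_template, v_trial):
    """Correlation (normalised dot product) of potentials at the same surface grid points."""
    dot = sum(a * b for a, b in zip(v_template, v_trial))
    return dot / (math.sqrt(sum(a * a for a in v_template)) * math.sqrt(sum(b * b for b in v_trial)))


print(f_shape({1, 2, 3, 4}, {2, 3, 4, 5}))         # 0.75
print(f_elpo([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))    # close to 1.0: same pattern, different scale
```

Because F_elpo is a normalised correlation, it rewards the same spatial pattern of potential even when the magnitudes differ, which suits comparing molecules of different size.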
Hydrogen Bonds and Electrostatic Potential
Atomic charges should be calculated in advance by molecular orbital calculations. For a flexible molecule, the calculations are based on the crystal structure or the energetically most stable conformation of the molecule, as the active conformation cannot easily be identified. Hydrogen-bond category numbers are assigned in advance to all hydrogen-bonding heteroatoms in the molecule. The geometry of the attached hydrogen atoms and the ambiguity of their positions due to free rotation, as well as the hydrogen-bonding character (donor, acceptor or both), are judged from the category number. Each category number corresponds to a hydrogen-bonding functional group, such as hydroxy O, carbonyl O, ether O, carboxyl O, amino N, amide N, aromatic N and sulfhydryl S. For the formation of hydrogen bonds, the match between the expected locations and the character of the hydrogen-bonding partners of the two molecules is judged during the superposition process. Allowable partner locations are assumed to lie 2.5 to 3.1 Å from the heteroatom, and the allowable deviation from the orientation vector of X-H or of the Y lone-pair electrons (X, Y = N or O) is taken as 30°. For all hydrogen-bonding functional groups, the program provides functions for generating the positions of lone-pair electrons automatically and for predicting the possible locations of hydrogen-bonding partners, taking into account free rotation about the C-X bond in C-X-H and the C-Y bond in C-Y-lone-pair electrons. The correlation of electrostatic potentials between the template and trial molecules is always calculated at the surface grid points of the superposed molecules, as discussed later. The surface grid points change at every stage of manipulation of the trial molecule.
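The geometric criterion above (partner 2.5 to 3.1 Å away and within 30° of the X-H or lone-pair direction) can be checked as below. For a freely rotating C-X bond, the X-H vector sweeps a cone about the C→X axis, so the smallest achievable deviation is simply the difference between the partner's angle from that axis and the cone half-angle; that shortcut, the function names, and the 71.5° half-angle used in the usage lines are our illustrative choices, not values from the program.

```python
import math


def _angle_deg(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))


def hbond_allowed(heavy, direction, partner, d_lo=2.5, d_hi=3.1, max_dev=30.0):
    """Fixed X-H (or lone-pair) direction: distance window plus angular tolerance."""
    v = [partner[k] - heavy[k] for k in range(3)]
    d = math.sqrt(sum(c * c for c in v))
    return d_lo <= d <= d_hi and _angle_deg(v, direction) <= max_dev


def hbond_allowed_rotor(c_atom, heavy, partner, cone_half_angle,
                        d_lo=2.5, d_hi=3.1, max_dev=30.0):
    """Free rotation about C-X: X-H lies on a cone of half-angle `cone_half_angle` (degrees)
    about the C->X axis, so the best rotamer deviates from the partner direction by
    |beta - cone_half_angle|, beta being the partner's angle from that axis."""
    axis = [heavy[k] - c_atom[k] for k in range(3)]
    v = [partner[k] - heavy[k] for k in range(3)]
    d = math.sqrt(sum(c * c for c in v))
    if not d_lo <= d <= d_hi:
        return False
    beta = _angle_deg(v, axis)
    return abs(beta - cone_half_angle) <= max_dev


print(hbond_allowed((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.8, 0.0, 0.0)))   # True
print(hbond_allowed((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.8, 0.0)))   # False: 90° off
partner = (2.8 * math.cos(math.radians(70)), 2.8 * math.sin(math.radians(70)), 0.0)
print(hbond_allowed_rotor((-1.43, 0.0, 0.0), (0.0, 0.0, 0.0), partner, 71.5))  # True
```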
Application to Dihydrofolate-Methotrexate System
Methotrexate (MTX) is a potent inhibitor of the enzyme dihydrofolate reductase, which reduces dihydrofolic acid (DHF) to tetrahydrofolic acid with the aid of the coenzyme NADPH. The structures of MTX and DHF closely resemble each other, both having a pteridine ring.
[Structure drawings of dihydrofolate (DHF) and methotrexate (MTX)]
Fig. 7 Chemical structures of dihydrofolate (DHF) and methotrexate (MTX).
The enzyme has long been studied as an attractive target of rational drug design (38,39,40,41). The crystal structures of a number of isozymes from various sources and in various complexed states have been elucidated (13,42,43,44). The structure of dihydrofolate reductase from L. casei, determined by X-ray crystallography as a ternary complex with the inhibitor MTX and NADPH by Bolin et al. (13), is shown in Fig. 8. The atomic coordinates are taken from the Protein Data Bank. The active conformation of MTX is assumed to be the same as in the crystal. In order to verify the validity of the program RECEPS, we attempted the superposition of the DHF molecule on the active conformation of the MTX molecule (45). Although the active conformation of the natural substrate DHF could be simulated by a docking study using the known structure of the enzyme, here we examine it by the superposition method, using the MTX molecule, whose active conformation is known, and without using the enzyme structure. For the conformation of the DHF molecule trapped in the enzyme active site, two representative models have been proposed so far (13,40), as shown in Fig. 9 and Plate 6.

Fig. 8 Schematic picture of the ternary complex of dihydrofolate reductase from L. casei, the inhibitor methotrexate (MTX), and the cofactor NADPH. (Reproduced from (13) by permission of Prof. Joseph Kraut.)
[Fig. 9 drawing not recoverable; legible labels: Trp 21, N8]

[Diagram: prothrombin → thrombin (via factor Xa); fibrinogen → fibrin (via thrombin); fibrin → fibrin polymer (polymerization, coagulation); fibrin polymer → degradation products (fibrinolysis, via plasmin)]
Fig. 1. Blood coagulation and fibrinolysis cascades.
[...] bleeding, or [...] the treatment of thrombosis (1,2). Extensive inhibitor studies have been carried out [...]. In a search for thrombin inhibitors, Kikumoto et al. found that arginine derivatives exhibit selective inhibitory activities toward different serine proteases (3), though the active site structures of these enzymes seem to be similar to that of trypsin, judging from the homology of their amino acid sequences. The purpose of our study (4,5) was to elucidate the mechanisms of the selective inhibition based on the three-dimensional structures of trypsin-inhibitor complexes, and thereby to find useful suggestions for the design of enzyme-specific inhibitors.
2. X-RAY ANALYSES OF TRYPSIN-INHIBITOR COMPLEXES
Table 1 summarizes crystal data on the five complexes whose structures were determined by X-ray analysis. In this paper, the trypsin-(2R,4R)MQPA complex will be described in detail.

Table 1 Crystal data of bovine trypsin-inhibitor complexes.
Inhibitor      Space group  Z  a (Å)  b (Å)  c (Å)   Resolution (Å)  R-factor
MQPA (2R,4R)   P3121        6  55.34  -      109.51  2.5             0.172
PNPA           P3121        6  55.35  -      109.38  2.5             0.215
MQP            P212121      4  55.37  56.73  66.91   3.0             0.401
MQPA (2R,4S)   P212121      4  55.49  56.79  67.07   2.4             0.265
MQPA (2S,4R)   P212121      4  63.51  69.07  63.81   2.4             0.244

Fig. 2 shows the structural formula of the thrombin inhibitor (2R,4R)MQPA ((2R,4R)-4-methyl-1-[N2-[(R,S)-3-methyl-1,2,3,4-tetrahydro-8-quinolinesulfonyl]-L-arginyl]-2-piperidinecarboxylic acid) (3).
MQPA is composed of three portions: an arginine portion, a quinoline portion and a piperidine portion. There are three asymmetric carbons in addition to the α carbon of the L-arginine. The configuration at the 3 position of the quinoline ring is a mixture of R and S. Depending on the configurations at the 2 and 4 positions of the piperidine, there are four stereo-isomers. (2R,4R)MQPA is to be used for clinical purposes as an antithrombotic agent. Hereafter, MQPA is used to denote (2R,4R)MQPA, and the other stereo-isomers are always referred to with stereo notations.
[Structure drawing of MQPA; legible label: quinoline portion]
Fig. 2. Structural formula of (2R,4R)MQPA.
Fig. 3 shows the formulae of PNPA ((2R,4R)-4-phenyl-1-[N2-(7-methoxy-2-naphthalenesulfonyl)-L-arginyl]-2-piperidinecarboxylic acid) (6) and MQP (4-methyl-1-[N2-[(R,S)-3-methyl-1,2,3,4-tetrahydro-8-quinolinesulfonyl]-L-arginyl]piperidine) (3).

[Structure drawings of PNPA and MQP]
Fig. 3. Structural formulae of PNPA and MQP.

MQPA has strong and selective inhibitory activity toward bovine thrombin (Ki = 0.09 µM), but still has significant activity toward bovine trypsin (Ki = 5.0 µM). We therefore tried to crystallize the complex of bovine trypsin and MQPA. Table 2 shows the procedure for the X-ray analysis of the trypsin-MQPA complex. Crystallization was carried out by the hanging drop method. Starting with a 0.01 ml drop of solution containing 60 mg/ml trypsin (Sigma Type III), 2.5 mg/ml MQPA, 1.0 mg/ml CaCl2 and 0.26 M ammonium sulfate, kept over a reservoir of 0.97 M ammonium sulfate, crystals of 1 mm length grew in 2 weeks. The crystals are isomorphous with the native trypsin crystal registered in the Protein Data Bank (7) with the identification code 3PTN (8). The intensity data were collected on an Enraf-Nonius CAD4 diffractometer. Of the 5,967 reflections measured within 2.5 Å resolution, 5,400 had |Fo| > 1.0σ(|Fo|) and were used in the calculation. The phases were calculated with the coordinates of the native trypsin crystal.

Table 2 X-ray analysis of the trypsin-MQPA complex at 2.5 Å resolution.
(1) Crystallization by the hanging drop method with (NH4)2SO4: a = 55.34 ± 0.03 Å, c = 109.51 ± 0.20 Å, P3121, Z = 6; isomorphous with the native trypsin crystal, PDB 3PTN
(2) Electron density map calculated with the 3PTN parameters: R = 0.31 for 5400 reflections with |Fo| ≥ 1.0σ(|Fo|) (R = 0.24 for the 3PTN |Fo|s)
(3) Hendrickson-Konnert refinement:
    trypsin + MQPA's quinoline and arginine: R = 0.328 → 0.258
    trypsin + whole MQPA + 70 waters: R = 0.172
    RMS deviation of bond distances = 0.014 Å
    RMS deviation of angle distances = 0.033 Å

A reliability factor (R-factor) of 0.31 was obtained, while the corresponding value for the 3PTN data was 0.24. The small difference in the R-factors indicates that the 3PTN coordinates were a good approximation for the trypsin-MQPA complex crystal.
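The R-factors quoted above follow the conventional crystallographic definition R = Σ||Fo| - |Fc|| / Σ|Fo|, taken over the reflections that pass the 1.0σ(|Fo|) cutoff. The sketch below assumes that |Fc| has already been scaled to |Fo|; the function name and the toy reflection values are hypothetical.

```python
def r_factor(f_obs, f_calc, sigma, cutoff=1.0):
    """Conventional R = sum(||Fo| - |Fc||) / sum(|Fo|), using only reflections with
    |Fo| > cutoff * sigma(|Fo|). |Fc| is assumed to be on the same scale as |Fo|.
    Returns (R, number of reflections used)."""
    used = [(abs(fo), abs(fc)) for fo, fc, s in zip(f_obs, f_calc, sigma) if abs(fo) > cutoff * s]
    num = sum(abs(fo - fc) for fo, fc in used)
    den = sum(fo for fo, _ in used)
    return num / den, len(used)


# Toy data: the third reflection is weaker than 1.0 sigma and is skipped.
r, n_used = r_factor([10.0, 20.0, 5.0], [9.0, 22.0, 5.0], [1.0, 1.0, 10.0])
print(round(r, 3), n_used)  # 0.1 2
```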
Fig. 4 shows the electron density map of the trypsin active site of the complex calculated at this stage. The map quality was so good that the directions of the two sulfonyl oxygens and the 3-methyl group attached to the quinoline ring were easily identified.

Fig. 4. Electron density map of the trypsin-MQPA complex. The quinolinesulfonyl group is shown.
However, interpretation of the piperidine portion and the carbonyl group was not possible, so we first fitted the quinoline and arginine portions of MQPA without the piperidine group, and refined the three-dimensional coordinates and isotropic temperature factors of trypsin and the fitted part with the Hendrickson-Konnert program (9). The R-factor decreased from 0.328 to 0.258, and the second map showed clear electron density for the piperidine ring and the carboxyl group. After all non-hydrogen atoms of MQPA were identified and refined, 70 water molecules were included in the refinement. The present R-factor is 0.172. Fig. 5 shows the three-dimensional structure of the trypsin-MQPA complex. Similar interpretation is in progress for the other complexes. We could not see definite electron density for (2S,4S)MQPA, probably because of its weak activity (Ki > 500 µM).

[Fig. 5 drawing; labelled residues: Gly193, Ser217, Trp215]
Fig. 5. Three-dimensional structure of the trypsin-MQPA complex. Bold line, MQPA; thin line, trypsin.
3. THREE-DIMENSIONAL STRUCTURE OF THE TRYPSIN-MQPA COMPLEX
In Fig. 6, the trypsin-MQPA structure is compared with those of the trypsin-BPTI (bovine basic pancreatic trypsin inhibitor) (10) and trypsin-APP (p-amidinophenyl pyruvate) (10) complexes. BPTI and APP are bound to trypsin in a similar way, through specific hydrogen bonds at two locations: (1) at the bottom of the active site hole and (2) at the so-called "oxyanion hole". MQPA, on the other hand, shows a unique hydrogen-bonding scheme: (1) the hydrogen bonds at the bottom of the active site hole are preserved; (2) the hydrogen bonds at the oxyanion hole are formed between the carboxyl group and the Gly-193 and Ser-195 main chain nitrogens through a water molecule; and (3) the carbonyl oxygen of the MQPA arginine is hydrogen bonded to the Gly-216 main chain nitrogen, and the nitrogen of the MQPA arginine to the carbonyl oxygen of Gly-216. These hydrogen bonds are similar to those found in an anti-parallel β-pleated sheet. The carboxyl group at the piperidine might be thought to lead to this unique scheme. But this is not the case, since the structure of the trypsin-MQP complex, which does not have the carboxyl group, was shown by X-ray analysis to be similar to that of the trypsin-MQPA complex. The present scheme seems to be the result of the stable conformation of the MQPA molecule itself, because this conformation is similar to that found in MQPA single crystals.

[Fig. 6 panels: trypsin-APP, trypsin-BPTI, trypsin-MQPA]
Fig. 6. Comparison of three-dimensional structures. Bold lines indicate hydrogen bonds.
4. MECHANISMS OF SELECTIVE INHIBITION
The inhibitory activities of MQPA (3), MNP (4-methyl-1-[N2-(7-methoxy-2-naphthalenesulfonyl)-L-arginyl]piperidine) (6), PNP (4-phenyl-1-[N2-(7-methoxy-2-naphthalenesulfonyl)-L-arginyl]piperidine) (6) and BAP (4-benzyl-1-[N2-(2-anthraquinonesulfonyl)-L-lysyl]piperidine) (11) toward α-thrombin, factor Xa, plasmin and trypsin, all from bovine sources, are shown in Table 3. These data can be interpreted on the basis of the X-ray structure of the trypsin-MQPA complex shown in Fig. 7 and of the molecular models of thrombin, factor Xa and plasmin built according to the homology of the amino acid sequences shown in Table 4. The lower half of Table 3 describes the three binding sites responsible for the selective inhibition. These are indicated by arrows in Fig. 7 and
Table 3 Inhibitory activities and descriptions of binding sites.a
(Ki
Activity lnhibi tor
Thrombin
CH30@@SOZ-Arg-N3 MNP
or
Factor X a
0. 0 7 2
pM) toward
158'.
Plasmin
Trypsin
NA
NA
1. 4
NA
NA
0.
NA
33
23
NA
Description of binding site (residue or structural feature; width of the site)
Arginine binding site: thrombin, Ala (wide); factor Xa, Ala (wide); plasmin, Ser (narrow); trypsin, Ser190 (narrow)
Quinoline binding site: thrombin, Leu (medium); factor Xa, Tyr (narrow); plasmin, deletion of 6 residues (wide); trypsin, Leu99 (medium)
Piperidine binding site: thrombin, insertion of 10 residues (narrow); factor Xa, insertion of 2 residues (medium); plasmin, insertion of 5 residues (medium); trypsin, Ile63 (wide)
a NA: Data not available.
Table 4.
The arginine binding site is a region near Ser-190 of trypsin, the quinoline binding site is near Leu-99, and the piperidine binding site is near Ile-63 (and His-57). The activity data raise several questions concerning the selectivities, which can be answered as follows.
4.1 Why are arginine derivatives such as MQPA and MNP in general more active on thrombin than on trypsin?
The reason is that the arginine binding site in thrombin has Ala at the position corresponding to Ser-190 of trypsin. The arginine binding site of thrombin is consequently wider than that of trypsin. The arginine portion of the inhibitors can thus easily
Fig. 7. Three binding sites of the trypsin-MQPA complex responsible for selective inhibition (indicated by arrows).
Table 4 Comparison of amino acid sequences of bovine enzymes. Trypsin-like regions of thrombin, factor Xa and plasmin are shown. Numbering is that for chymotrypsin. "/" and "." denote a deletion and an insertion in the chymotrypsin sequence, respectively. Asterisks with arrows indicate sites responsible for selective inhibition.
TH, thrombin; FX, factor Xa; PL, plasmin; TR, trypsin. Each line gives the segments in the order TH / FX / PL / TR.
16-20: IVECQ / IVGCR / IVCGC / IVGGY
21-30: DAEVCLSPWQ / DCAEGECPWQ / VSKPHSWPWQ / TCCANTVPYQ
31-50: VHLFRK~PQELLCCASLISDR / ALLVNE.ENECFCCCTILNEF / VSL/RR.SSRHFCCCTLISPK / VSL/N/.SGYHFCCCSLINSQ
51-60: WVLTAAHCLL / YVLTAAHCLH / WVLTAAHCLD / WVVSAAHCYK
61-70: YPPWNKNFTVDDLLVRIICK / Q...AKRFT.....VRV/GD / NILALSFYK.....VIL/GA / S.....CIQ.....VRL/GE
71-80: HSRTRYERKV / RNTQ//EGDE / HNEKVREQSV / DNINVVECNE
81-90: EKISHLDKIYI / EHAHEVEMTVK / QEIP.VSRLFR / QFIS.ASKSIV
91-100: HPRYNWKENLO / HSRF/VKETYD / EP//////.SQ / HPSYNSNT.LN
101-110: RDIALLKLKR / FDIAVLRLKT / ADIALLKLSR / NDIHLIKLKS
111-120: PIELSDYIHP / PIRFR/NVAP / PAIITKEVIP / AASLNSRVAS
121-130: VCLPDKQTAAKLL / ACLPEKDWAAETL / ACLPP.P..NYHV / ISLPT./..SCAS
131-140: HAGFKCRVTG / /QTKTCIVSG / AARTECYITG / /AGTQCLISG
141-150: WGNRRETWTTSVAEV / FGRTHEK.....CRL / WGETQ//.....GTF / WCNTKSS.....GTS
151-160: QPSVLQVVNL / SS/TLKHLEV / GECLLKEAHL / YPDVLKCLKA
161-170: PLVERPVCKA / PYVDRSTCKL / PVIENKVCNR / PILSNSSCKS
171-180: S..TRIRITNDM / S..SSFTITPNR / NEYLDGRVKPTE / A..YPCQITSNM
181-190: FCACYKPCEGKRGDA / FCACY..DTQ.PEDA / LCAGH..LIG.GTDS / FCAGY..LEG.CKDS
191-200: CECDSGCPFV / CQCDSCCPHV / CQGDSCGPLV / CQCDSCCPVV
201-210: HKSPYNNRWYQN / TR..FKDTYFVT / CF..EKDKYILQ / CS..CK////LQ
211-220: CIVSWCE/CC / GIVSWCE/GC / GVTSWCL/GC / GIVSWCS/GC
221-230: DRNCKYGFYTH / ARKCKFGVYTK / ARPNKPCVYVR / AQKNKPCVYTK
231-245: VFRLKKWIQKVIDRLGS / VSNFLKWIDKIHKARACAACSR / VSPYVPWIEETHRRN / VCNYVSWIKQTIASN
enter the active site of thrombin, while the shorter contact with Ser-190 of trypsin results in lower inhibitory activity. The results of the present X-ray analysis support this explanation. As shown in Fig. 8, the Oγ atom of Ser-190 moved 0.94 Å in the direction away from the active site hole, leaving more space in the active site. This movement was achieved by a rotation of 52° around the Cα-Cβ bond. The resultant distance between Ser-190 Oγ and the arginine NZ is 3.74 Å; it would have been 2.91 Å without the movement. This is consistent with the explanation that Ser-190 Oγ acts to repel arginine inhibitors rather than to attract them as a hydrogen bond acceptor.
In the case of lysine inhibitors such as BAP, steric repulsion is not a critical factor. Instead, the ability to form a hydrogen bond with Ser-190 Oγ determines the selectivity. The lysine inhibitors can form this hydrogen bond in trypsin, while in thrombin, where Ala replaces Ser-190, they cannot form a similar hydrogen bond. Accordingly, lysine inhibitors have stronger activities on trypsin than on thrombin.
From this viewpoint, the four enzymes can be classified into two groups according to the amino acid at position 190. Thrombin and factor Xa have Ala; their pockets are wide, and arginine inhibitors show selectivity for them. Plasmin and trypsin have Ser; their pockets are narrow, and lysine inhibitors show selectivity for them.
The selectivities of the inhibitors can be explained in this way, but trypsin was found to have smaller Km values for arginine substrates than for lysine substrates (12). This specificity could be explained by supposing that arginine substrates have two binding sites (i.e., two amino groups) which can form hydrogen bonds directly with the carboxyl group of Asp-189 of trypsin, while lysine substrates have only one amino group, which can form a hydrogen bond with the carboxyl group through a water molecule. The strong hydrogen bonds formed by arginine seem to predominate over the steric repulsion between the amino group and Oγ of Ser-190. Thus, the Km for arginine is smaller than that for lysine, and, similarly, the inhibition of trypsin by arginine inhibitors will in general be stronger than that by lysine inhibitors.
Fig. 8. Movements of Ser-190 Oγ and Leu-99 Cδ1 upon binding with MQPA. Thin line, position in free trypsin; bold line, position in the complex; dotted line, steric repulsions leading to the movements.
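Section 4.1 describes the Ser-190 movement as a rotation of a side-chain atom about a bond, and Section 4.3 does the same for Leu-99. A minimal sketch of the geometry, assuming an idealized rigid rotation: the code below rotates an atom about a bond axis with Rodrigues' formula and measures how far it moves and how its distance to a ligand atom changes. All coordinates are hypothetical values chosen for illustration, not the trypsin-MQPA crystal coordinates. For a pure rotation by θ of an atom lying r Å from the axis, the shift is 2r·sin(θ/2), so the observed 0.94 Å need not follow exactly from an idealized geometry like this one.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(a: Vec, b: Vec) -> float:
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def rotate_about_bond(p: Vec, axis_start: Vec, axis_end: Vec, angle_deg: float) -> Vec:
    """Rotate atom p by angle_deg about the bond axis_start -> axis_end (Rodrigues' formula)."""
    k = _sub(axis_end, axis_start)
    length = math.sqrt(_dot(k, k))
    k = (k[0] / length, k[1] / length, k[2] / length)   # unit vector along the bond
    v = _sub(p, axis_start)                             # atom position relative to the axis
    t = math.radians(angle_deg)
    kxv = _cross(k, v)
    kdv = _dot(k, v)
    return tuple(axis_start[i] + v[i] * math.cos(t) + kxv[i] * math.sin(t)
                 + k[i] * kdv * (1.0 - math.cos(t)) for i in range(3))


# Hypothetical geometry in angstroms (illustration only).
ca = (0.0, 0.0, 0.0)              # C-alpha
cb = (0.0, 0.0, 1.53)             # C-beta; the rotation axis is Ca-Cb
og = (1.34, 0.0, 2.03)            # O-gamma, 1.34 A from the axis
ligand_n = (2.5, -1.5, 2.5)       # a ligand nitrogen near the side chain

og_new = rotate_about_bond(og, ca, cb, 52.0)
print("shift of O-gamma:", round(distance(og, og_new), 3))
print("distance to ligand N before/after:",
      round(distance(og, ligand_n), 2), round(distance(og_new, ligand_n), 2))
```

With these made-up coordinates the rotation carries O-gamma away from the ligand nitrogen, which is the qualitative effect the text attributes to the Ser-190 movement.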
4.2 Why are the activities of arginine derivatives reversed, becoming stronger on trypsin, when a benzene ring is introduced at position 4 of the piperidine as in PNP?
The reason for this lies in the piperidine binding site. Thrombin has an insertion of 10 amino acids near Ile-63 of trypsin. These inserted amino acids probably surround the piperidine ring and become a barrier against the benzene ring. In trypsin, there is a pocket for the benzene ring, and activity is increased by the introduction of the benzene ring. Fig. 9, based on the results of the X-ray analysis of a trypsin-PNPA complex, shows how the benzene ring of PNPA interacts with trypsin in the pocket near His-57.
Fig. 9. Three-dimensional structure of the trypsin-PNPA complex.
4.3 Why is MQPA not active on factor Xa?
The reason lies in the quinoline binding site, where Tyr replaces Leu at position 99 of trypsin. This Tyr prevents tight binding with the quinoline ring of MQPA. The present X-ray analysis supports this explanation, as shown in Fig. 8. The Cδ1 atom of Leu-99 moved 0.85 Å upon binding with MQPA by a rotation of 24° around the Cα-Cγ bond. The rotation increased the distance between Leu-99 Cδ1 and the 3-methyl carbon of the quinoline from 3.70 Å to 4.52 Å and eliminated the steric hindrance. A modeling study in which Leu-99 was replaced by Tyr showed that steric hindrance would occur between Tyr Cε and the quinoline C4, with a distance of 2.66 Å.
4.4 Why does the lysine derivative, BAP, inhibit plasmin?
BAP has stronger inhibitory activity on plasmin and trypsin than on thrombin and factor Xa. This is explained, as in section 4.1, by the hydrogen bond that the amino group of lysine can form in the narrow pocket but not in the wide pocket. The difference in the activities for plasmin and trypsin arises from a difference in the quinoline binding site. In plasmin, a deletion of 6 amino acids near position 99 of trypsin provides considerable room at the quinoline binding site and changes the surface structure from that of trypsin. The bulky anthraquinone can fill this space and fit itself to this surface, because it is connected to the sulfonyl group and can change the orientation of its ring plane by a rotation around the C-S bond. BAP will not inhibit thrombin, because the benzyl group hits the inserted amino acids of thrombin near Ile-63. It will also not inhibit factor Xa, because the anthraquinone hits Tyr at position 99.
Thus, the mechanisms of selective inhibition were clearly explained on the basis of the X-ray structures of the trypsin-inhibitor complexes and the differences in the amino acid sequences of the enzymes.
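The qualitative arguments of sections 4.1-4.4 reduce to a handful of structural rules. The sketch below encodes them as a lookup, assuming the residue assignments of Table 3; the dictionary layout, the boolean flags and the function name are hypothetical illustrations, not part of the original work. It only lists the unfavorable contacts named in the text and does not predict Ki values.

```python
from dataclasses import dataclass

# Residues at the three binding sites (Table 3). Only thrombin's large
# 10-residue insertion near Ile-63 is treated as a barrier (section 4.2).
ENZYMES = {
    "thrombin":  {"res190": "Ala", "res99": "Leu", "large_insertion_near_63": True},
    "factor Xa": {"res190": "Ala", "res99": "Tyr", "large_insertion_near_63": False},
    "plasmin":   {"res190": "Ser", "res99": None,  "large_insertion_near_63": False},  # 6-residue deletion near 99
    "trypsin":   {"res190": "Ser", "res99": "Leu", "large_insertion_near_63": False},
}


@dataclass(frozen=True)
class Inhibitor:
    name: str
    p1: str                    # side chain entering the S1 pocket: "Arg" or "Lys"
    ring_hits_tyr99: bool      # bulky ring at the quinoline site (MQPA quinoline, BAP anthraquinone)
    aryl_on_piperidine: bool   # phenyl or benzyl at position 4 of the piperidine


def unfavorable_contacts(enzyme: str, inh: Inhibitor) -> list[str]:
    """Return the unfavorable interactions described in sections 4.1-4.4."""
    site = ENZYMES[enzyme]
    problems = []
    if inh.p1 == "Arg" and site["res190"] == "Ser":
        problems.append("Arg side chain repelled by Ser-190 O-gamma (narrow pocket)")
    if inh.p1 == "Lys" and site["res190"] == "Ala":
        problems.append("Lys amino group cannot hydrogen-bond to residue 190 (Ala)")
    if inh.ring_hits_tyr99 and site["res99"] == "Tyr":
        problems.append("ring at the quinoline site clashes with Tyr-99")
    if inh.aryl_on_piperidine and site["large_insertion_near_63"]:
        problems.append("piperidine aryl group blocked by residues inserted near Ile-63")
    return problems


MQPA = Inhibitor("MQPA", "Arg", ring_hits_tyr99=True, aryl_on_piperidine=False)
BAP = Inhibitor("BAP", "Lys", ring_hits_tyr99=True, aryl_on_piperidine=True)

for enzyme in ENZYMES:
    print(f"{enzyme:10s} MQPA: {unfavorable_contacts(enzyme, MQPA)}")
    print(f"{enzyme:10s} BAP:  {unfavorable_contacts(enzyme, BAP)}")
```

An empty list corresponds to the enzymes the text says each inhibitor favors: thrombin for MQPA, and plasmin and trypsin for BAP.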
5. SUGGESTIONS FOR DRUG DESIGN AND PROTEIN ENGINEERING
The explanation for the selective inhibition is directly related to design strategies for enzyme-specific inhibitors. For example, factor Xa specific inhibitors will be obtained (1) by adapting the arginine backbone to fit the wide hole, (2) by introducing a phenyl or a benzyl piperidine at the carbonyl end to increase the van der Waals interaction in the active site, and (3) by introducing a benzene ring or an alkyl chain at the amino end to avoid the steric repulsion by Tyr. Direct proofs for the explanation could be obtained by protein engineering experiments. Table 5 summarizes proposed site-directed mutageneses and the expected results.
Table 5 Proposed site-directed mutageneses and expected results.
(1) (Ser190→Ala) trypsin mutant: Ki for MQPA ↓; Ki for BPTI ↑; Km for Arg substrates ↓; Km for Lys substrates ↑
(2) (Ala190→Ser) thrombin mutant: Ki for MQPA ↑; Km for Arg substrates ↑; Km for Lys substrates ↓
(3) (Ala190→Ser) factor Xa mutant: Ki for MQPA ↑; Km for Arg substrates ↑; Km for Lys substrates ↓
(4) (Tyr99→Leu) factor Xa mutant: Ki for MQPA ↓
The first mutation in Table 5 will increase the activity of MQPA by eliminating the steric repulsion between the MQPA arginine and Ser Oγ. But it will decrease the activity of BPTI by reducing the number of hydrogen bonds. A similar effect will be observed for substrates with arginine or lysine at the S1 site: arginine substrates will become stronger, and lysine substrates will become weaker. The second and third mutations are suggested for the same purpose. There will be steric repulsion between the MQPA arginine and the Oγ atom of the introduced Ser, which will decrease the activity of MQPA. In the case of lysine substrates, the binding will probably become stronger, because a new hydrogen bond will be formed between the Ser Oγ and the lysine side chain via a water molecule. The fourth mutation will reduce the degree of steric repulsion between the MQPA quinoline and Tyr by replacing the Tyr with the less bulky Leu, and will increase the activity of MQPA for the factor Xa mutant.
6. CONCLUSION
X-ray analyses of several trypsin-inhibitor complexes provided three lines of valuable information. First, they revealed a novel type of trypsin inhibition and led to the elucidation of the mechanisms of selective inhibition; thus, they indicated reliable strategies for the design of enzyme-specific inhibitors. Second, they provided atomic coordinates with which to examine the various simulation methods used to evaluate the free energy change of drug-protein complex formation. Third, with a valid simulation method, the activities of compounds could be predicted before their actual syntheses. With accumulating three-dimensional structures of drug-protein complexes, we may be able to find essential factors in drug-protein interaction and utilize them in drug design.
We thank our coworkers, C. Sasaki, C. Okumura, M. Miyagawa and Dr. H. Kubodera.
REFERENCES
1 R. Kikumoto, Y. Tamao, K. Ohkubo, T. Tezuka, S. Tonomura, S. Okamoto and A. Hijikata, J. Med. Chem., 23 (1980) 1293-1299.
2 J. Sturzebecher, F. Markwardt, B. Voigt, G. Wagner and P. Walsmann, Thromb. Res., 29 (1983) 635-642.
3 R. Kikumoto, Y. Tamao, T. Tezuka, S. Tonomura, H. Hara, K. Ninomiya, A. Hijikata and S. Okamoto, Biochemistry, 23 (1984) 85-90.
4 T. Matsuzaki, C. Sasaki and H. Umeyama, J. Biochem., 103 (1988) 537-543.
5 T. Matsuzaki, C. Sasaki, C. Okumura and H. Umeyama, J. Biochem., 105 (1989) 949-952.
6 R. Kikumoto and Y. Tamao, Mitsubishi Chem. R&D Rev., 1 (1987) 26-34.
7 Protein Data Bank, Brookhaven National Laboratory, Upton, New York.
8 J. Walter, W. Steigemann, T. P. Singh, H. Bartunik, W. Bode and R. Huber, (1981) 3PTN, Protein Data Bank, Brookhaven National Laboratory, Upton, New York.
9 W. A. Hendrickson and J. H. Konnert, Biomolecular Structure, Conformation, Function and Evolution, Pergamon, Oxford, Vol. 1, 1981, pp. 43-57.
10 M. Marquart, J. Walter, J. Deisenhofer, W. Bode and R. Huber, Acta Crystallogr., B39 (1983) 480-490.
11 T. Naito, personal communication.
12 C. S. Craik, C. Largman, T. Fletcher, S. Roczniak, P. J. Barr, R. Fletterick and W. J. Rutter, Science, 228 (1985) 291-297.
QSAR and Drug Design - New Developments and Applications, T. Fujita, editor. © 1995 Elsevier Science B.V. All rights reserved.
THREE-DIMENSIONAL STRUCTURE-ACTIVITY RELATIONSHIPS AND RECEPTOR MAPPING OF QUINOLONE ANTIBACTERIALS
HIROSHI KOGA and MASATERU OHTA
Fuji-Gotemba Research Laboratories, Chugai Pharmaceutical Co., Ltd., 135, 1-Chome Komakado, Gotemba-shi, Shizuoka 412, Japan.
ABSTRACT:
The quantitative structure-activity relationship (QSAR) correlation equation previously formulated for the antimicrobial activity of quinolone-3-carboxylic acids indicates that the steric features of substituents at the 1-N-, 6-, and 8-positions are important in governing antimicrobial activity. We reexamined the steric features of these substituents by analyzing their conformations by a molecular modeling method. The "active" conformation of each substituent at each of the positions was estimated with the conformational energy calculated by molecular orbital methods and the antimicrobial activity of the quinolone carboxylic acid molecule possessing that substituent. A model of the receptor supposed to accommodate substituents at the respective positions was constructed by superposing active conformations of substituents in highly active compounds. With this active volume or receptor model, the activities of compounds that were not predicted well by the previous QSAR were qualitatively and/or semi-quantitatively rationalized. We believe that the present model is useful for prediction of the activity of compounds to be synthesized in designing new quinolone antibacterials.
1. INTRODUCTION
Nalidixic acid (NA) is well known as the first member of the quinolone family of antibacterials, and since 1963 it has been used as one of the most effective drugs for the antibacterial therapy of urinary tract infections (1). Although it is effective against gram-negative bacteria, it is not effective against gram-positive bacteria and Pseudomonas aeruginosa. Thus, efforts have been made to modify it so as to expand its antibacterial spectrum, to enhance its antibacterial activity, and to overcome its metabolic instability and side effects. Norfloxacin (NFLX, 1), which was first synthesized in the late 1970's by Koga, one of the present authors (2), opened up new possibilities for the clinical use of quinolones. NFLX has potent activity against gram-negative bacteria, including P. aeruginosa, and against gram-positive bacteria, much better than that of nalidixic acid; it is effective against nalidixic acid-resistant bacteria, with a low incidence of cross-resistance; and it is metabolically stable and of relatively low systemic toxicity (2,4). However, NFLX is relatively poorly absorbed orally, which causes incomplete effectiveness when it is administered orally for systemic infection. For overcoming this drawback, a number of analogs of NFLX were synthesized, and enoxacin (2) (3) and AM-833 (4) (see Table 1), which are orally much better absorbed than NFLX, were developed. Subsequently, further quinolones have been developed (Table 1), and a few of them have been marketed. These quinolones are commonly called new quinolones or fluoroquinolones, because they share a common 6-fluoro-7-amino structure, and they have greatly expanded the clinical applications of quinolone antibacterials.
[Structures: nalidixic acid; norfloxacin (NFLX, 1), X = CH; enoxacin (2), X = N.]
Koga et al. (2b,c) previously examined the quantitative structure-activity relationships (QSAR's) of quinolone antibacterials. From the statistical point of view, one of the best correlations for the 71 compounds with the generic structure 3 was represented by equation [1]:
$$\log(1/\mathrm{MIC}) = -0.362\,(L_1)^2 + 3.036\,L_1 - 2.499\,(E_{s6})^2 - 3.345\,E_{s6} - 0.205\,(\Sigma\pi_{6,7,8})^2 - 0.485\,\Sigma\pi_{6,7,8} + 0.986\,I_7 - 0.734\,I_{7\,\mathrm{N{-}CO}} - 1.023\,(B_{4,8})^2 + 3.724\,B_{4,8} - 0.681\,\Sigma F_{6,7,8} - 4.571 \qquad [1]$$
$$n = 71,\quad r = 0.964,\quad r^2 = 0.929,\quad s = 0.274,\quad F = 70.22$$
In this equation, the minimum inhibitory concentration (MIC, in mol/l) against Escherichia coli NIHJ is used as the biological activity. L1 is Verloop's STERIMOL length parameter of the R1 substituent, and Es6 represents the Taft-Kutter-Hansch steric parameter (Es) of R6. Σπ6,7,8 is the sum of the hydrophobic substituent constants (π's) of the R6, R7 and R8 substituents; its quadratic terms seem to be related to the hydrophobic transport of the compounds to their site(s) of action. I7 is an indicator variable which takes the value of unity when the 7-substituent is an amino moiety and zero when it is not; I7N-CO is another indicator variable, equal to unity when a carbonyl group is attached to the 7-amino nitrogen and zero when it is not. B48 is the STERIMOL maximum width parameter (B4) of the R8 substituent in the direction opposite to the R7 side, almost perpendicular to its length direction. ΣF6,7,8 is the sum of the Swain-Lupton-Hansch inductive constants (F's) of the 6-, 7- and 8-substituents. Equations with the hydrophobicity of the whole molecule (log P) instead of Σπ6,7,8 were either equivalent or of slightly poorer statistical significance.
TABLE 1. Fluoroquinolones. [General structure: 6-fluoroquinolone-3-carboxylic acid bearing R1, R7 and R8. Entries: AM-833 (4), ofloxacin (5), amifloxacin (6), ciprofloxacin (7), CI-934 (8), AM-1091 (9), PD117558 (10) and difloxacin (11).]
Equation [1] indicates that the steric factors of the R1, R6 and R8 substituents are important for activity. It predicts that compounds having an R1 substituent with a length (L) of approximately 4.2 Å (e.g., ethyl, vinyl, fluoroethyl), an R6 substituent with an Es value of approximately -0.65 (e.g., fluoro), and an R8 substituent with a width (B4) of approximately 1.8 Å in the direction opposite to position 7 (e.g., fluoro or chloro) should be highly active. Equation [1] has been a good guide to the elaboration of quinolone antibacterials: compounds such as AM-833 (4), ofloxacin (5), amifloxacin (6), ciprofloxacin (7), CI-934 (8), AM-1091 (9), PD117558 (10) and difloxacin (11) (Table 1) have been developed by various industries, and the activities of most of them are predicted well by equation [1]. However, some compounds are deviants. For example, N1-aryl analogs such as difloxacin (11) and the 1-(p-fluorophenyl)fluoroquinolones of Chu et al. (5) exhibit high activity comparable to that of NFLX, whereas equation [1] predicts much lower activity for them.
In this chapter, we reexamined the steric effects of substituents at the 1-, 6- and 8-positions of quinolone antibacterials by systematically analyzing their conformations in more detail, and attempted to map a possible receptor region in terms of three-dimensional structure.
2. ANALYSIS OF THE STERIC EFFECT OF 1-SUBSTITUENTS
One reason for the mis-prediction of the activity of N1-arylquinolones could be that equation [1] was derived from compounds having only simple alkyl groups as the R1 substituent (6), not including any aryl group. The steric effect of the R1 substituent should be considered more complex than that expressed by the length parameter alone when alkyl and aryl groups are treated together.
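Equation [1] is a sum of parabolic terms, so each steric or hydrophobic descriptor has an optimum at the vertex x* = -b/(2a) of its term a·x² + b·x. The sketch below evaluates equation [1] as reconstructed above and computes those vertices; the descriptor values in the usage lines are hypothetical. The vertices come out close to the approximate optima quoted in the text (about 4.2 Å for L1, about -0.65 for Es6, about 1.8 Å for B48), and the L1 term shows why a long R1 group such as an aryl ring is penalized by this equation.

```python
def log_inv_mic(L1, Es6, sum_pi, I7, I7_NCO, B4_8, sum_F):
    """log(1/MIC), MIC in mol/l against E. coli NIHJ, from equation [1]."""
    return (-0.362 * L1 ** 2 + 3.036 * L1
            - 2.499 * Es6 ** 2 - 3.345 * Es6
            - 0.205 * sum_pi ** 2 - 0.485 * sum_pi
            + 0.986 * I7 - 0.734 * I7_NCO
            - 1.023 * B4_8 ** 2 + 3.724 * B4_8
            - 0.681 * sum_F - 4.571)


def parabola_optimum(a: float, b: float) -> float:
    """x that maximizes a*x**2 + b*x (requires a < 0)."""
    return -b / (2.0 * a)


optima = {
    "L1 (A)": parabola_optimum(-0.362, 3.036),
    "Es6": parabola_optimum(-2.499, -3.345),
    "sum_pi_6,7,8": parabola_optimum(-0.205, -0.485),
    "B4 of R8 (A)": parabola_optimum(-1.023, 3.724),
}
for name, value in optima.items():
    print(f"{name}: optimum {value:.2f}")

# Hypothetical descriptor values, held fixed while only L1 changes.
base = dict(Es6=-0.46, sum_pi=-1.0, I7=1, I7_NCO=0, B4_8=1.35, sum_F=0.9)
for L1 in (4.2, 6.0):
    print(f"L1 = {L1}: predicted log(1/MIC) = {log_inv_mic(L1=L1, **base):.2f}")
```

Because the L1 term is a downward parabola, any R1 substantially longer than about 4.2 Å is predicted to lose activity, regardless of its shape.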
2.1 Compounds and Biological Activity
The compounds analyzed are listed in Table 2. Their biological activities against the gram-negative bacterium E. coli were tested under almost comparable conditions (2,4), and the relative activity of each compound was calculated as the ratio of its activity to that of NFLX (1), which was chosen as the standard drug. E. coli was chosen as the representative bacterium because the activities against it roughly parallel the overall activities against gram-negative bacteria. The compounds were classified into five classes according to their relative activities. Class 1 compounds, with the highest activity, have relative activities ranging from 4 to 0.5, and class 5 compounds have the lowest activity; overall, the activities of these compounds span a range of approximately 8000-fold.
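The relative activities in Table 2 fall on twofold steps (4, 2, 1, 1/2, ..., 1/1000) as expected for MICs from serial twofold dilutions, and the class ranges of Table 2 cover three or four such steps each. A minimal sketch of the classification, assuming relative activity = MIC(NFLX)/MIC(compound), rounding to the nearest twofold step, and treating everything below about 1/1000 as class 5 (the class 5 boundary is an assumption, since it is not legible in this copy of Table 2):

```python
import math


def relative_activity(mic_compound: float, mic_nflx: float) -> float:
    """Activity relative to NFLX: values above 1 mean more active than NFLX."""
    return mic_nflx / mic_compound


def activity_class(rel: float) -> int:
    """Class 1-5 following the ranges of Table 2, on a twofold-dilution scale."""
    step = round(math.log2(rel))   # nearest twofold dilution step
    if step >= -1:                 # 4 ... 1/2
        return 1
    if step >= -4:                 # 1/4 ... 1/16
        return 2
    if step >= -7:                 # 1/32 ... 1/128
        return 3
    if step >= -10:                # about 1/250 ... 1/1000
        return 4
    return 5                       # below about 1/1000 (assumed boundary)


# Hypothetical MICs: a compound eight times less active than NFLX.
print(activity_class(relative_activity(mic_compound=0.8, mic_nflx=0.1)))
```

Working on the log2 scale keeps the class boundaries aligned with the dilution steps, so values such as 1/250 and 1/1000 (which are only approximately powers of two) land in the intended class.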
2.2 Conformational Analysis and Molecular Modeling
2.2.1 General Procedure: The three-dimensional structures of the compounds were built with the SYBYL molecular modeling system (12). The X-ray coordinates of oxolinic acid (13) (3: R1 = C2H5, R6-R7 = -OCH2O-, R8 = H) were used as the initial structure of the quinolone ring. The structures of the primary compounds, 7,8-unsubstituted 6-fluoroquinolones (3: R6 = F, R7 = R8 = H) with substituents such as ethyl, cyclopropyl and phenyl groups as R1, were constructed from the fragment library of SYBYL with standard bond lengths and angles, and were minimized by molecular mechanics with the Tripos force field. Conformations of each compound were preliminarily examined by rotating all rotatable bonds of the substituents with a 5° increment, and the minimum-energy conformations were used as the starting structures for further conformational analyses by molecular orbital (MO) methods. The AM1 (14), CNDO/2 (15) and Gaussian 82 (16) programs were used; the STO-3G basis set was used for the Gaussian 82 calculations. In the MO analyses, all rotatable bonds in the N1-substituents were rotated with a 15° increment, and the geometry optimization was continued until the energy minimum was reached.
TABLE 2. Relative Activity of Fluoroquinolones. [Entries: compound number, R1 and R8 substituents (including R8-R1 bridges such as -OCH2CH(CH3)- in the S and R forms), grouped by relative activity into class 1 (4 to 1/2), class 2 (1/4 to 1/16), class 3 (1/32 to 1/128), class 4 (1/250 to 1/1000) and class 5 (lowest), with references.]
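The conformational procedure of section 2.2.1 is a torsion scan: a rotatable bond is turned in fixed increments (5° in the preliminary scan, 15° in the MO analyses), the energy is recorded at each step, and the local minima are collected. A minimal sketch, assuming a hypothetical twofold cosine potential in place of the AM1, CNDO/2 or Gaussian 82 energies (the code calls none of those programs, and a real scan would also relax the rest of the geometry at each step). The dihedral function measures a torsion such as the 2-1-1'-3' angle used for compound 35 in Fig. 1, with the IUPAC sign convention.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n1, b1[1] / n1, b1[2] / n1)                   # unit vector of the central bond
    v = tuple(b0[i] - _dot(b0, b1) * b1[i] for i in range(3))   # b0 with the bond component removed
    w = tuple(b2[i] - _dot(b2, b1) * b1[i] for i in range(3))   # b2 with the bond component removed
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def scan_torsion(energy, step_deg: float):
    """Evaluate energy(theta) over a full turn with the given increment."""
    n = int(round(360.0 / step_deg))
    angles = [-180.0 + i * step_deg for i in range(n)]
    return angles, [energy(a) for a in angles]


def local_minima(angles, energies):
    """Grid angles whose energy is below both (periodic) neighbours."""
    n = len(energies)
    return [angles[i] for i in range(n)
            if energies[i] < energies[i - 1] and energies[i] < energies[(i + 1) % n]]


def toy_energy(theta_deg: float) -> float:
    # Hypothetical twofold potential with two equal minima at -90 and +90 degrees,
    # mimicking a substituent that is equally stable above or below the ring plane.
    return 2.0 * (1.0 + math.cos(math.radians(2.0 * theta_deg)))


angles, energies = scan_torsion(toy_energy, 15.0)
print("minima at", local_minima(angles, energies))
print("example dihedral:", dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

As with compound 35, two minima of identical energy cannot be told apart from the energy alone, which is why the text turns to comparison with highly active compounds.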
After determination of the minimum-energy conformation of the N1-substituent, a piperazinyl or N-methylpiperazinyl group was introduced at position 7 of each compound by using the SYBYL program.
2.2.2 Relationship between Conformation and Activity
The results of the conformational analysis of the N1-cyclopropyl compound 35 (Table 2) are shown in Fig. 1. There was no substantial difference among the energy curves calculated by the three MO methods. Two energy minima, identical in energy, were found: one where the cyclopropyl ring is above the quinolone ring plane and the other where it is below the plane. With information on the total energy alone, it is impossible to decide which of the two conformations corresponds to the active conformation.
Fig. 1. Rotational energy map of the N1-R1 bond of the 1-cyclopropyl compound 35 (energy against rotation angle θ, in degrees, from -120 to 120). The dihedral angle was defined as 2-1-1'-3'. A: energy curve calculated by Gaussian 82 (STO-3G); B: energy curve calculated by AM1; C: energy curve calculated by CNDO/2. (Reproduced from ref. 6b by permission of the American Chemical Society.)
We therefore assumed that the "active" conformation of an N1-substituent is a low-energy conformation, with a total energy close to the minimum, that is shared by the highly active compounds. The analysis started from ciprofloxacin (7), an N1-cyclopropyl compound of class 1 with the highest activity, and the active conformations of the N1-substituents of the other compounds were estimated by superposing their low-energy conformations on it. The "total" volume occupied by the N1-substituents of the compounds of each class in their active conformations was calculated as the van der Waals volume with the MVOLUME routine of SYBYL. The total volume of the class 1 compounds was regarded as the active volume, and the volume of the less active compounds in excess of it was obtained by subtracting the class 1 volume; such excess volumes were assumed to reflect regions of the receptor responsible for the decrease in activity.
The S isomer of ofloxacin (5) [3: R6 = F, R8-R1 = -OCH2CH(CH3)-], 5(S), is more active than the R isomer, 5(R), by about an order of magnitude, and 5(S) is a class 1 compound. The S isomer of S-25930 [3: R6 = F, R7 = H, R8-R1 = -CH2CH2CH(CH3)-] has also been reported to be more active than its R isomer (17). To obtain more insight into this difference, the optical isomers were analyzed. Conformational analysis showed that the S isomer has two fairly stable conformations without a significant energy difference. One is that where the branched methyl moiety is perpendicular to the quinolone ring plane, and the other is that where it is oblique to the plane. From the similarity of the overall shape of the N1-substituents, the latter was selected to match one of the conformers of the N1-cyclopropyl compound 35, in which the cyclopropyl group is located above the plane of the quinolone ring. Consequently, the matched conformers were regarded as having the "active conformation" for the N1-substituents of 5(S) and 7 (Fig. 2).
For the N1-ethyl group in compound 3, in which R1 = C2H5, R6 = F and R7 = R8 = H, as the model of norfloxacin (1), there are three energy minima, where the ethyl group is above, below and parallel to the quinolone ring. The bond-rotational barriers around the N1-R1 bond between these three conformers were not high. The model in which the ethyl group was above the ring was selected as the active conformation, because it matched those of 5(S) and 7.
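The "total" volume of superposed active conformers, and the excess volume of a less active set beyond it, were computed in the text with SYBYL's MVOLUME routine. A minimal sketch of the same two quantities, using a swapped-in technique (a cubic grid on which a cell counts as occupied when its centre lies inside any van der Waals sphere), not the MVOLUME algorithm itself. The atoms, radii and box below are hypothetical values chosen for illustration.

```python
import math
from itertools import product

# An atom is (x, y, z, van der Waals radius), all in angstroms.
Atom = tuple[float, float, float, float]


def occupied_cells(atoms: list[Atom], lo: float, hi: float, step: float) -> set[tuple[int, int, int]]:
    """Indices of grid cells in the cube [lo, hi]^3 whose centres lie inside at least one atom."""
    n = int(math.ceil((hi - lo) / step))
    cells = set()
    for i, j, k in product(range(n), repeat=3):
        x, y, z = (lo + (idx + 0.5) * step for idx in (i, j, k))
        if any((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2 <= r * r for ax, ay, az, r in atoms):
            cells.add((i, j, k))
    return cells


def total_volume(conformers: list[list[Atom]], lo: float, hi: float, step: float):
    """Union of superposed conformers; returns (cells, volume in cubic angstroms)."""
    union: set[tuple[int, int, int]] = set()
    for atoms in conformers:
        union |= occupied_cells(atoms, lo, hi, step)
    return union, len(union) * step ** 3


def excess_volume(test_set, reference_set, lo: float, hi: float, step: float) -> float:
    """Volume occupied by test_set but not by reference_set (same grid for both)."""
    test_cells, _ = total_volume(test_set, lo, hi, step)
    ref_cells, _ = total_volume(reference_set, lo, hi, step)
    return len(test_cells - ref_cells) * step ** 3


# Hypothetical example: two superposed "class 1" substituents and one longer "class 2" one.
class1 = [[(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)],
          [(0.0, 0.0, 0.0, 1.7), (0.0, 1.5, 0.0, 1.7)]]
class2 = [[(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7), (3.0, 0.0, 0.0, 1.7)]]
_, v_active = total_volume(class1, -2.5, 5.0, 0.25)
v_excess = excess_volume(class2, class1, -2.5, 5.0, 0.25)
print(f"active volume ~ {v_active:.1f} A^3; class 2 excess ~ {v_excess:.1f} A^3")
```

Using one shared grid for both sets makes the set difference of occupied cells a direct estimate of the excess region; a finer step trades run time for accuracy.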
For the N1-phenyl derivative (3: R1 = C6H5, R6 = F, R7 = R8 = H), the conformational search showed two energy minima, at which the angles between the quinolone and benzene rings are about 80° and 100°, respectively. The conformer that best matched the active conformations of compounds 1, 5(S) and 7 was selected as the active conformation of the N1-aryl compound 11 (Fig. 2). The active conformations of the N1-substituents of the other class 1 compounds were selected similarly, by comparison with those of the highly active compounds 1, 5(S), 7 and 11 (Figs. 2-4). The superposed active conformations of the class 1 compounds define the total volume shown in Fig. 5, which we regarded as a model of the receptor region accommodating the N1-substituents of highly active compounds.
For the less active compounds, such as those bearing N1-propyl, allyl, hydroxyethyl, dimethylaminoethyl and benzyl groups, the low-energy conformations that best fit this model were selected as their active conformations; for the N1-benzyl derivative, both extended and bent forms were examined (Figs. 6-9). In these compounds, the ends of the N1-substituents extend beyond the active volume of the class 1 compounds, suggesting that there is a region around the model that is unfavorable for activity. The quadratic term in the length L1 of equation [1] represents this effect as well. If a compound shows high activity whenever its N1-substituent fits within the "active volume", the model can be used to predict the activity of novel quinolone antibacterials; its verification is described later. We calculated the difference between the volumes occupied by
Fig. 2. Stereoview of the superposition of the proposed active conformers of 1 (green), 5(S) (yellow), 7 (blue), and 11 (orange) (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 3. Stereoview of the superposition of the proposed active conformers of 6 (yellow), 13 (green), 15 (orange), and 16 (cyan) (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 4. Stereoview of the superposition of the proposed active conformers of 12 (green), 14 (red), 17 (yellow), and 18 (violet) (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 5. Stereoview of the total volume (orange) of the N1-substituents of the class 1 compounds (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 6. Stereoview of the superposition of the proposed active conformers of 19 (green), 22 (yellow), 5(R) (orange), and 24 (violet) and the difference (orange) between the total volumes of the set of 19, 22, 5(R), and 24 and those of the class 1 compounds (Reproduced from ref. 6b by permission of the American Chemical Society).
N1-substituents of
class
2
occupied
in class
compounds
volumes
are
increases
where
repulsions
steric
8
shows
(28),
and
the
class
(24,
resulting
ends
of
25,
in
the
fact, seen
occupy
for
meta
R8-R 1
for
the
too
compound
quinolone
31.
moiety
ring,
as
receptor.
to
the
fit
the
are
in Fig.
and
compounds in the
of
of
are
19,
the
N l-
wall, The
and
the
meta
to be
21.
In
activity, methyl
the
6, d i s t u r b i n g
(21)
the
assumed
20,
branched
below
region b e l o w the plane
and
p-methyl
receptor
one
reduce
fixed
the
7,
hydroxyethyl
to
regions
Nl-phenyl
and
1 compounds.
and
compounds
6,
those
occupied
p-hydrogen
(20) ,
methylene
(29),
not
of class
These
of
wall
the
regions be
Figs.
Nl-phenyl
corresponding
5 (R)
shown The
the
in
The to
receptor
and
those
group.
The
of
of
(23)
(19) , allyl
activity
8. seem
activity.
regions
small
than
on
and
conformers
increase
m-oxymethylene
occupy
regions
substituents
cyclic
to the
are
activity
of the N l - p h e n y l
unfavorable
(26) ,
Nl-methyl
Nl-propyl
substituents
positions
26)
7,
volume the
the
substituents
The
the
6,
between
reducing
substituent
lower
Figs.
The a c t i v e
and
occupied
occur
(27)
(22)
1 compounds.
phenyl
the
o-methyl
p-chloro
and N l - b e n z y l
in
in
Nl-substituent
that
superposed,
shown
representing end of the
1 and 2 compounds.
were
as
in
plane
the
of
the p r o p e r
the
binding
of the q u i n o l o n e
ring
s h o u l d reduce the activity. The
difference
substituents together the
is
shown
These
at
the
regions
activities
para
In
of the
region
by
occupied compounds.
I0 in
shows the
The
Nl-phenyl
(33)
relevant
binding
is
of
to
Nl-phenyl
cause
are
is too
meta
fluorine
causes
seem
work
31 in class
difference 4
compound
region
occupied
probably the
I, 2, and 3 compounds.
more receptor
between and
the
by the
reduction
the
1 and the
and the of
the in
3. the
total
class
I,
p-methoxy the
in
simultaneously
significantly than
(32) .
(31) ,
small,
by
and the
of class
derivative
factors
the
NI-
occupied
(30)
reductions
to those
position to
by
2 compounds
substituents
further
Nl-(m-fluorophenyl)
of c o m p o u n d
with
1 and
regions
30 and 32 r e l a t i v e
the
class new
occupied
class
Additional
at the para
two
the a c t i v i t y
Fig.
class
thought
the
These
9.
volumes
and the
of the N l - d i m e t h y l a m i n o e t h y l
position
hydrogen
occupied
activity. lowering
in Fig.
groups
were
the
3 compounds
of c o m p o u n d s
compounds.
volume
between
class
two N - m e t h y l
bromine
2
of
volumes 2,
group
and
3
of the
unfavorable
Nl-substituents
for of
least
Finally,
the
active
derivative
(34) ,
examined methyl
(Fig.
inhibitory
below
substituents that
of
of
occupied
the
The the
i,
region
total
volumes
of
the
compounds
was
N I- (2, 6 - d i m e t h y l p h e n y l )
2,
3,
and
occupied
quinolone
ring
4
by
one
seemed
provide
to
for
by
of
the
exert
ortho
a marked
the
the
Nl-phenyl
We
of the q u i n o l o n e above
the
group
to the
of the Nl-phenyl.
We
into
one
above
fluorine also
that
NI-
there
corresponds
the
and
propose
the
for
propose
activity"
Nl-cyclopropyl other
insights
relationships
antibacterials.
increasing
and the
the plane
important
structure-activity
quinolone
ring,
position
below
class
analyses
regions
quinolone para
the
the
the
effect upon the activity. present
two
between
compound,
and
three-dimensional are
5
Ii) .
groups
The
difference
class
plane
of
hydroxyl that
to
the
at the
the
regions
ring and a r o u n d the m e t a p o s i t i o n
quinolone
ring
plane
prevent
proper
r e c e p t o r binding. Fig.
12
shows
a modified
volume
occupied
of
Nl-(p-hydroxy)-phenyl.
the
(length) QSAR
is best
equation
phenyl
toward
at
4.2
For
receptor allyl,
activity
as
used
optimum
they w o u l d
and
not
para p o s i t i o n
methyl further
Fig.
12
derive
volume reach
to
has
the
other
of the Nl-phenyl
as
the
the
of
NI-
has
cyclopropyl L
of
L
in
an and
compounds
value
changes
in these
compounds
in terms
receptor is
too
of the
The
group,
the
to
volumes
onto the L to
fit
that
the
but
n-propyl,
the
forbidden
region.
The
favorable
for
Nl-substituents
region
in
dimethylaminoethyl
forbidden
could
to
model
small
extrude
[i]
group.
[i]
group
does,
optimum
optimum
the
situation
groups
above.
why
The
group
equation
of
activity
cyclopropyl
corresponding
the
the
of the
into
total
activity
equation
length
value.
two
explain
the
in
that
group
benzyl
the
hydroxy
for N l - S u s t i t u e n t s
decreases
methyl
on
the
can
predict
variations
described
to
to
shows
projection a
penetrate in
to
based
and
model
parameter
optimum
activity
terminal
model
compounds
[i]
corresponding
The
substituent
L1
the
example,
wall
receptor
the
of
hydroxyethyl
region.
in
the
unable
substituents
side
one-dimensional
axis.
is The
model
group
parameter
corresponding
Nl-phenyl
explain
This
a steric
but
i
receptor
Nl-cyclopropyl
Equation
either
of the
the
as
[i]
groups.
without fact,
the
derivatives.
optimum ethyl
by
be of
of
accommodated cyclopropyl,
corresponding
the only but
to the
Fig. 7. Stereoview of the superposition of the proposed active conformers of 20 (orange), 23 (green), 25 (blue), and 27 (yellow) and the difference (orange) between the total volumes of the set of 20, 23, 25, and 27 and those of the class 1 compounds. Since the benzene rings of the N1-substituents of 25 and 27 overlap, this region appears white. The N1-methyl of 23 and the N1-allyl of 20 also overlap, and the N1-methyl appears white or yellowish-green (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 8. Stereoview of the superposition of the proposed active conformers of 21 (green), 26 (yellow), 28 (violet), and 29 (blue) and the difference (orange) between the total volumes of the set of 21, 26, 28, and 29 and class 1 compounds (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 9. Stereoview of the superposition of the proposed active conformers of 30 (green), 31 (yellow), and 32 (cyan) and the difference between the total volumes (orange) of 30, 31, and 32 and class 1 and class 2 compounds (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 10. Stereoview of the proposed active conformer of 33 and the difference between the volumes (orange) of compound 33 and class 1, 2, and 3 compounds (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 11. Stereoview of the proposed active conformer of compound 34 and the difference between the volume (orange) of compound 34 and the total of class 1, 2, 3, and 4 compounds (Reproduced from ref. 6b by permission of the American Chemical Society).
Fig. 12. Stereoview of the modified receptor model for the volume occupied by N1-substituents of quinolone antibacterials (Reproduced from ref. 6b by permission of the American Chemical Society).
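The receptor mapping in Figs. 5-12 relies on two set operations on superposed conformers: the total (union) volume occupied by the N1-substituents of a class, and the difference between that volume and the one occupied by the class 1 compounds. The original work did this on a molecular-graphics system; the following is only a minimal voxel-grid sketch of the same idea. It assumes atoms are given as (x, y, z, van der Waals radius) tuples already superposed in one frame; the spheres and the grid spacing are hypothetical.

```python
import itertools
import math

# Each atom is (x, y, z, r) in angstroms, assumed already superposed in a common
# frame (e.g. after fitting the quinolone cores onto each other).


def occupied_cells(atoms, spacing):
    """Grid cells whose centres fall inside at least one van der Waals sphere."""
    cells = set()
    for x, y, z, r in atoms:
        ranges = [range(math.floor((c - r) / spacing), math.ceil((c + r) / spacing) + 1)
                  for c in (x, y, z)]
        for i, j, k in itertools.product(*ranges):
            cx, cy, cz = ((n + 0.5) * spacing for n in (i, j, k))
            if (cx - x) ** 2 + (cy - y) ** 2 + (cz - z) ** 2 <= r * r:
                cells.add((i, j, k))
    return cells


def total_volume_cells(molecules, spacing):
    """Union of the cells occupied by several superposed substituents."""
    union = set()
    for atoms in molecules:
        union |= occupied_cells(atoms, spacing)
    return union


def volume(cells, spacing):
    """Convert a cell count into a volume in cubic angstroms."""
    return len(cells) * spacing ** 3


spacing = 0.1
class1 = [[(0.0, 0.0, 0.0, 1.5)]]   # hypothetical reference (class 1) substituent
other = [[(1.0, 0.0, 0.0, 1.5)]]    # hypothetical substituent from another class
ref = total_volume_cells(class1, spacing)
diff = total_volume_cells(other, spacing) - ref   # region occupied only by `other`
print(round(volume(ref, spacing), 2), round(volume(diff, spacing), 2))
```

Because both sets are built on the same lattice, the set difference directly gives the "extra" region that, in the text's terms, would be inspected for steric repulsion with the receptor.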
3. ANALYSIS OF THE STERIC EFFECT OF 6-SUBSTITUENTS (18)

According to equation [1], formulated for the entire series of quinolones 3, the effect of substituents at the 6-position on the activity is represented by the Taft-Kutter-Hansch Es parameter. Equation [1] reflects equation [2] for the subset of 6-monosubstituted compounds 36, the activity of which varies parabolically with the Es of the R6 substituent (Fig. 13A) (2).

$$\log(1/\mathrm{MIC}) = -3.318(\pm 0.59)(E_{s6})^2 - 4.371(\pm 0.85)E_{s6} + 3.924 \qquad [2]$$
n = 8, s = 0.108, r = 0.989, F = 112.29
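Because equation [2] is a parabola in Es6, the steric optimum follows from setting its derivative to zero, Es6* = -b/(2a). A short sketch with the coefficients printed in equation [2]; the function names are illustrative only.

```python
# Coefficients of equation [2]: log(1/MIC) = a*Es6**2 + b*Es6 + c
A2, B2, C2 = -3.318, -4.371, 3.924


def log_inv_mic_eq2(es6):
    """Predicted log(1/MIC) from equation [2]."""
    return A2 * es6 ** 2 + B2 * es6 + C2


def parabola_vertex(a, b, c):
    """Return the optimum x* = -b/(2a) and the predicted maximum c - b**2/(4a)."""
    x_opt = -b / (2 * a)
    return x_opt, c - b * b / (4 * a)


es_opt, peak = parabola_vertex(A2, B2, C2)
print(f"optimum Es6 = {es_opt:.3f}, predicted log(1/MIC) = {peak:.3f}")
```

With these coefficients the optimum lies at Es6 of about -0.66, where the predicted log(1/MIC) is about 5.36; substituents bulkier or smaller than this (in Es terms) are predicted to be less active.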
In equation [2], the Es value adopted for the nitro group is the one (-1.01) evaluated from its half-thickness, representing the steric effect in the perpendicular direction, and that of methoxy is approximated by the value of the ethyl group (2).

Fig. 13. Parabolic relationships for the effect of 6-substituents with the Es6 parameter (panels A and B; abscissa Es6).

For the corresponding
use
of
the
reasonable for
7-piperazinyl
same
E s value
(Fig.
its
13B) .
coplaner
significant significant
correlation
set
of
compounds
for
the
the
greatest
deviation,
observed
calculated
value
changes
of
the
vicinal
is
piperazinyl
relationship
between
0.61) .
Equation [3] was formulated for the 7-piperazinyl compounds 37, and equation [4] for the combined set of compounds 36 and 37, in which I7 is an indicator variable for the 7-piperazinyl group.

$$\log(1/\mathrm{MIC}) = -2.026(\pm 0.68)(E_{s6})^2 - 2.682(\pm 0.93)E_{s6} + 5.561 \qquad [3]$$
n = 6, s = 0.079, r = 0.984, F = 45.50

$$\log(1/\mathrm{MIC}) = -2.587(\pm 0.89)(E_{s6})^2 - 3.351(\pm 1.25)E_{s6} + 1.426(\pm 0.29)I_7 + 4.088 \qquad [4]$$
n = 15, s = 0.250, r = 0.971, F = 60.84
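In equation [4] the indicator I7 only shifts the intercept, so at any given Es6 the predicted activity of a 7-piperazinyl compound exceeds that of the corresponding compound 36 by the I7 coefficient (1.426), while the location of the optimum is set by the quadratic and linear terms alone. The sketch below assumes I7 = 1 for compounds 37 and 0 for compounds 36.

```python
def vertex(a, b):
    """Optimum x* = -b / (2a) of a parabola a*x**2 + b*x + c."""
    return -b / (2 * a)


def eq3(es6):
    """Equation [3] (7-piperazinyl compounds 37)."""
    return -2.026 * es6 ** 2 - 2.682 * es6 + 5.561


def eq4(es6, i7):
    """Equation [4]; i7 = 1 for compounds 37, 0 for compounds 36 (assumed coding)."""
    return -2.587 * es6 ** 2 - 3.351 * es6 + 1.426 * i7 + 4.088


opt3 = vertex(-2.026, -2.682)
opt4 = vertex(-2.587, -3.351)          # I7 does not enter the vertex
shift = eq4(-0.5, 1) - eq4(-0.5, 0)    # intercept offset between the two subsets
print(f"optimum Es6: eq [3] {opt3:.3f}, eq [4] {opt4:.3f}; I7 shift {shift:.3f}")
```

The optima of equations [3] and [4] (about -0.66 and -0.65) are close to that of equation [2], which is consistent with a common steric optimum for the 6-substituent in both subsets.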
3.1 Conformation and Steric Parameters

3.1.1 Conformational
receptor similar
mappings
As R 6 : N O 2)
ring
Thus,
analysis The
should
the
36 for
and
37
were
carried
l-substituents
6-nitro
almost
hand,
group
plane
6-substituents,
14, is
other
6-substituents
substituent. of
Fig.
nitro
used
analyses
and out
in 3.
by
The AM1
for the MO method.
energy
the
the
Conformational
compounds
to those
used in
low
On of
quinolone some
was
shown at
plane. angle
of
procedures
Hamiltonian
Analysis:
in
of
is about could
be
the
steric
low
compound 55 ~ .
of
with (37:
(36: ring
conformation,
the the
the
on
with
conformations by
of the
based
38
quinolone
R6=N02),
influenced
analysis
parameter
compound
the
energy 39
Likewise,
markedly
for q u a n t i t a t i v e a
group
coplaner
the
steric
effect
conformational
be used.
6-methoxy
compound
40
(36:
R6=OCH 3)
has
of
adjacent
two
conformers
with
energy
group
is
moiety shown of
minima.
almost
of
the
methoxy
in Fig.
the
14.
methoxy
m e t h o x y group
Fig. 14. right) , 39
The
One
corresponds
coplaner The
group
locates
with
group
locates is that
is
opposite
conformer
was
the
methylthio
conformation
at
group
is
as
the
and
the
the
and
methoxy
the
5-position only
methyl
the
methyl side
as
direction
moiety
of
the
side.
c o n f o r m a t i o n of c o m p o u n d s 38 ( u p p e r (lower right), and 41 (lower left) .
6-methylthio-7-piperazinyl upward,
in w h i c h plane
in w h i c h
at the 7 - p o s i t i o n
the
turning
to that
quinolone
other
Proposed active (upper left), 40
first
the
taken
almost
modeled
as
the
active
compound
41
coplaner
with
in
Fig.
(37"
structure
R6=SMe)
the
14,
since
in
which
quinolone
has
lower
plane energy
than the other.
3.1.2 Quantitative Structure-Activity Relationship Using Conformational Steric Parameters

conformations calculated. sphere the of
Each
with
the
quinolone the
along
New
steric p a r a m e t e r s
for the atom
van
ring
Waals
plane.
projection
substituents
in the
der
6-substituent
the
6-substituents
from of
The the
the
based
on the p r o p o s e d
of q u i n o l o n e s
6-substituent radius. length carbon
bond
The
P.
represented
plane
L is the atom
between
and the C6 onto the plane
was
at
P
is
farthest the
the
active
36 and 37 were a as
extension
6-position
(~ a t o m
as
defined
of
the
A box w h i c h t o u c h e s
(C6) 6-
the
van
der
Waals
through 15.
The
sides H2
the
values
of
are
the
the
tangentially
was
widths
defined
of
the
as
from
and
shown
passes in
substituent
respectively.
substituent
compound
H26
The
The the
that
reliable
36. [7]. of
n=8
situations
the
activity
the
6-NO2
plane
for
With
[6]
in
the
are
the
H 1 and
P
and
are
WI,
W2,
HI,
mainly
due
works
well
[5]
the
and
the
the
to
fact
[6] of
new
by
steric
variable,
quality
[4],
than
the the
the
steric
of
Figs.
16A of
the
correlation parameter
about
for
the
is gives
nature
at
give lower more of the
of R 6.
$$\log(1/\mathrm{MIC}) = -5.806(\pm 2.67)(H_{26})^2 + 17.67(\pm 8.42)H_{26} - 8.235 \qquad [5]$$
n = 8, s = 0.255, r = 0.937, F = 18.08
(half-
[2] to
R6
from
substitution
combined
R6 and
the
Es
equation
for
the
thickness
that
in
I7,
H2
effect
the
were
Es p a r a m e t e r
statistically
selected.
illustrated
with
L,
the
parameter
is
group
and
were
the
that
indicator
Although equation
37
show
accord
equations
information
effect
log(i/MIC)
Fig.
Fig.
to
values
plane
parameters,
36 and
represents
and
on
steric
respectively,
[5]
7-position,
equation
these
compounds
[6],
quinolone
thickness)
with for
[5] and
substituents
steric
are
the
equations,
Equations
than
of
examined
substituents.
the
W2
5-positions,
correlations
equations In
the
and
6-substituent 6-position
as H 2 _> H I.
The
16B.
the
the
W 1 and
7-
H2 w e r e
best
of at
thicknesses
defined and
radii
carbon
Fig. 15. Definition of the new steric parameters.
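The box-type parameters of Fig. 15 can be computed from the coordinates of a proposed active conformer once the substituent atoms are expressed in a frame tied to the quinolone plane. The sketch below is an illustration under stated assumptions, not the authors' program: it assumes C6 at the origin, x along the projection of the C6-substituent bond onto the ring plane, y in the ring plane and z normal to it, and it places each face of the box at the farthest van der Waals surface in that direction. Which in-plane side corresponds to W1 and which to W2 cannot be recovered from this text, so the two widths are returned under neutral, hypothetical names; H1 and H2 are ordered so that H2 >= H1, as stated for the new parameters.

```python
def steric_box(atoms):
    """Box-type steric parameters for one substituent (illustrative sketch).

    atoms: (x, y, z, r) tuples in the assumed local frame described above,
    with r the van der Waals radius of each substituent atom.
    """
    length = max(x + r for x, y, z, r in atoms)            # L along the bond axis
    width_neg_y = max(-y + r for x, y, z, r in atoms)      # in-plane width, -y side
    width_pos_y = max(y + r for x, y, z, r in atoms)       # in-plane width, +y side
    above = max(z + r for x, y, z, r in atoms)             # thickness above the plane
    below = max(-z + r for x, y, z, r in atoms)            # thickness below the plane
    h1, h2 = sorted((above, below))                        # label so that H2 >= H1
    return {"L": length, "W_neg_y": width_neg_y, "W_pos_y": width_pos_y,
            "H1": h1, "H2": h2}


# Hypothetical coordinates: a coplanar fluorine, and a two-atom group
# whose outer atom is twisted 1.1 angstroms out of the ring plane.
print(steric_box([(1.35, 0.0, 0.0, 1.47)]))
print(steric_box([(1.36, 0.0, 0.0, 1.52), (2.0, 0.0, 1.1, 1.70)]))
```

For the coplanar case H1 = H2, whereas twisting an atom out of the plane raises only H2; this asymmetry is what a single Es value cannot represent.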
Fig. 16. Parabolic relationships for the effect of 6-substituents with the newly defined H parameters (panels A and B; abscissa H26).

$$\log(1/\mathrm{MIC}) = -2.222(\pm 1.99)(H_{26})^2 + 6.959(\pm 6.52)H_{26} + 0.886 \qquad [6]$$
n = 7, s = 0.211, r = 0.863, F = 5.81

$$\log(1/\mathrm{MIC}) = -3.427(\pm 1.86)(H_{26})^2 + 10.572(\pm 6.00)H_{26} + 1.705(\pm 0.39)I_7 - 3.288 \qquad [7]$$
n = 15, s = 0.331, r = 0.949, F = 33.07
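The F values reported with equations [5]-[7] can be checked from r, n and the number of fitted terms k through F = (r²/k)/[(1 - r²)/(n - k - 1)]. In this sketch k counts H26 and (H26)² and, for equation [7], also I7; small differences from the printed F are expected because r is given to only three decimals.

```python
def f_from_r(r, n, k):
    """Overall F statistic of a least-squares fit with k terms, from r and n."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (equation, r, n, k, reported F)
reported = [("[5]", 0.937, 8, 2, 18.08),
            ("[6]", 0.863, 7, 2, 5.81),
            ("[7]", 0.949, 15, 3, 33.07)]

for label, r, n, k, f_rep in reported:
    print(label, round(f_from_r(r, n, k), 2), "reported:", f_rep)
```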
Various types of steric parameter sets have been employed for QSAR analyses. Although these parameter sets have been used successfully, depending upon the type of steric interactions involved, they sometimes do not reflect the situation of the biologically active form. The new steric parameters proposed above, in a way similar to the STERIMOL values, seem to be versatile in other examples as well, since they are based on the proposed "active" conformation obtained from conformational analysis and appropriately manipulated with computer graphics.
TABLE 3. Structure and activity of quinolones and fluoroquinolones having 8-substituents.

8-Substituted quinolones 42 (R1 = C2H5)

                  log 1/MIC (mol/l) against E. coli
No.     R8        obsd.    calcd.a)   dif.b)
43c)    H         3.939    4.489      -0.55
44c)    F         4.575    4.586      -0.01
45c)    Cl        4.606    4.449       0.16
46c)    Me        4.868    4.818       0.05
47c)    OMe       3.694    3.881      -0.19
48c)    Et        3.088    3.149      -0.06
49c)    OEt       2.514    2.386       0.13

Fluoroquinolones

                                       log 1/MIC (mol/l) against E. coli
No.    R8              R1    R     obsd.   calcd.a)   dif.b)   calcd.a)   dif.b)   ref.
1c)    H               Et    H     6.629   6.375       0.25                        2
50c)   F               Et    H     6.873   6.564       0.31                        2
51c)   Cl              Et    H     6.892   6.801       0.09                        2
52c)   -CH2CH(CH3)-          H     7.184   7.007d)     0.18    5.581e)     1.60    2
5      -OCH2CH(CH3)-         Me    6.859   6.699f)     0.16    5.798g)     1.06    2, 7
                                                               6.880h)    -0.04
53     OMe             Et    H     6.844   5.759       1.08                        20
54     Br              Et    H     6.600   6.746      -0.15                        21
55     CN              Et    H     6.236   6.506      -0.27                        21
56     NO2             Et    H     5.970   6.154      -0.18    6.532i)    -0.56    21

a) Calculated by equation [1].
b) Difference between observed and calculated values.
c) Included to derive equation [1].
d) Calculated with B1 of the ethyl group in place of B4 (2b).
e) Calculated using B4 of the ethyl group for B48.
f) Calculated using B1 of the 8-methoxy group in place of B4.
g) Calculated using B4 of the methoxy group for B48.
h) Calculated using B2 of the methoxy group for B48.
i) Calculated using B1 of the nitro group in place of B4.
3.2 Proposed Receptor Model

The
active
conformers
of
norfloxacin
droxacin,
tioxacin,
and DJ-6783
of
quinolone
rings.
their
substituents 17.
The
positions receptor these
and
total
should
compounds
HN~ . J
helpful
>--S
to the
are very including
volume
for
active
at
of the
against
E.
oxygen,
as
the
estimating
vicinity
fluorine,
occupied
calculated
compounds
oxolinic
acid,
by m a t c h i n g by
shown 5-,
the
6-,
shape
6-position,
coli
with
atoms the and of
a variety
C2H5
C2H5
C2H5
O
i
C2H 5
tioxacin
Fig. 17. Active volume (cyan) of quinolone antibacterials.
acid
the of
and nitrogen. O
oxolinic
7-
because
O
(I)
6-
in Fig.
O
norfloxacin
O
be
was
these
(1),
superposed
total
groups
of
corresponding
6-substituents
The
adjacent volumes
were
droxacin
O
I
C2H5
DJ-6783
of the
6-substituents
and vicinity
4. ANALYSIS OF THE STERIC EFFECT OF 8-SUBSTITUENTS (19)

The activity (MIC) of 8-substituted quinolones 42 (43-49 in Table 3) has previously been reported to be parabolically related to B48, one of the STERIMOL parameters, representing the maximum width of R8, as indicated by equation [8] and Fig. 18 (2). The B4 value as the steric parameter of R8 substituents also applies to the 1,6,7,8-tetrasubstituted quinolones 3 (50-52 in Table 3), since the activity of these compounds has been well predicted by equation [1] (2). The 8-substituent is thought to interact sterically with the 1-ethyl substituent in compound 42. Therefore, the maximum width of R8 expressed by B4 has been believed to lie in the direction opposite to the 1-substituent (the R7 side) and to be the dimension recognized by the receptor wall. Depending upon the structure, however, the 8-substituent may be directed above or below the quinolone ring plane because of steric repulsion from the substituents at positions 1 and 7.

$$\log(1/\mathrm{MIC}) = -1.016(\pm 0.46)(B_{48})^2 + 3.726(\pm 2.04)B_{48} + 1.301 \qquad [8]$$
n = 7, s = 0.221, r = 0.978, F = 44.05

Fig. 18. Parabolic relationship for the effect of 8-substituents with the STERIMOL B4 parameter (abscissa B48).
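Equation [8] also indicates how sharply activity falls away from the optimal width: solving a(B48 - B48*)² = -Δ gives the B48 interval within which the predicted log(1/MIC) stays within Δ of its maximum. The 0.5 log-unit threshold used below is an arbitrary illustrative choice, not a value from the text.

```python
import math

A8, B8, C8 = -1.016, 3.726, 1.301   # equation [8]: log(1/MIC) as a function of B48


def b4_window(a, b, drop):
    """B48 interval over which the parabola stays within `drop` log units of its maximum."""
    x_opt = -b / (2 * a)
    half = math.sqrt(drop / -a)     # solves a * (x - x_opt)**2 = -drop
    return x_opt - half, x_opt, x_opt + half


lo, opt, hi = b4_window(A8, B8, 0.5)
print(f"optimum B48 = {opt:.2f}; within 0.5 log units for {lo:.2f} <= B48 <= {hi:.2f}")
```

The window is fairly broad (roughly 1.1 to 2.5), because the quadratic coefficient of equation [8] is small in magnitude.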
Fig. 19. Stereoview of the proposed active conformers of 47 (pink), 48 (green), and 49 (blue).
Fig. 20. Stereoview of the proposed active conformers of 53 (yellow) and 56 (green).
Fig. 21. Stereoview of the active volume model of the 8-substituent of quinolone antibacterials.
Since structure and
equation 3,
50-52
in
quinolones the
[I]
including Table
3,
(5 and 5 3 - 5 6
activities
of
was
formulated
8-substituted some
symmetrical
top
8-substituents
a ring
the
l-substituent
[i],
with
that
group
was
of c o m p o u n d
53
not.
may
conformations
This of
the
were
be
due
by the
We
compounds
with 44-49
some
reported.
Although
spherical
5 with
the
well
by the
or
R 8 forming equation
unsymmetrical
methoxy
differences
between
the
1,8-disubstituted
(47-49)
and
compounds
8-substituents.
quinolones
as
having
predicted
to in
been 55
compound
substituted
8-substituents
8-substituents
and
and
71
such
i, 6, 7, 8 - t e t r a - s u b s t i t u t e d
3) have 54
i, 6, 7, 8 - t e t r a - s u b s t i t u t e d unsymmetrical
new
in Table
compounds
for
ones
such
analyzed
as
the
53
having
conformations
of
to examine this p o s s i b i l i t y .
4.1 Active Conformation and Activity

The
compounds
analysis
of
described
above
As ethoxy
the
analyzed
8-substituted
(2.2.1)
shown
in
direction
the
1-ethyl
19,
opposite
Obsd.            0.00   0.55   0.96   1.19   0.47   0.95   0.99   1.54   2.10   1.43   0.25   0.03   0.54   0.46   1.19   -0.24    0.21    0.23
Calcd. (Eq. 6)   0.01   0.52   1.06   1.19   0.51   1.08   0.92   1.43   2.12   1.40   0.23  -0.04   0.57   0.53   1.13   -0.35a) -0.26a) -0.03a)
Dev.            -0.01   0.03  -0.10   0.00  -0.04  -0.13   0.07   0.11  -0.02   0.03   0.02   0.07  -0.03  -0.07   0.06    0.11    0.47    0.26
Calcd. (Eq. 12)  0.02   0.49   1.00   1.12   0.49   1.03   0.97   1.46   2.11   1.40   0.20   0.00   0.58   0.55   1.25   -0.17    0.04    0.32
Dev.            -0.02   0.06  -0.04   0.07  -0.02  -0.08   0.02   0.08  -0.01   0.03   0.05   0.03  -0.04  -0.09  -0.06   -0.07    0.17   -0.09
a) Not included in the correlation but calculated by Eq. 6.
TABLE 5 Hydrophobicity Parameters of 2-Substituted Pyrimidines (II)

π2PM (obsd.)     0.00   0.46   0.80   0.94   0.39   0.67   1.18   1.45    0.13  -0.27    0.13    1.51   -0.76    0.24   -0.37
Calcd. (Eq. 7)   0.03   0.32   0.82   0.94   0.44   0.69   1.17   1.18a)  0.17  -0.03a)  0.51a)  0.90a) -0.32a) -0.31a) -0.10a)
Dev.            -0.03   0.14  -0.02   0.00  -0.05  -0.02   0.01   0.27   -0.04  -0.24   -0.38    0.61   -0.44    0.55   -0.27
a) Not included in the correlation but calculated by Eq. 7.
TABLE 6 Hydrophobicity Parameters of 4-Substituted Pyrimidines (III), π4PM

Substituent   Obsd.   Calcd. (Eq. 8)   Dev.    Calcd. (Eq. 13)   Dev.
H              0.00    0.06            -0.06    0.03             -0.03
Cl             0.91    1.04            -0.13    0.90              0.01
Me             0.39    0.51            -0.12    0.44             -0.05
OMe            0.98    0.81             0.17    0.87              0.11
OEt            1.41    1.38             0.03    1.38              0.03
CN             0.36    0.36             0.00    0.28              0.08
CO2Me          0.17    0.08             0.09    0.11              0.06
NMe2           1.02    1.02             0.00    1.18             -0.16
CONH2         -0.24   -0.25a)           0.01    0.01             -0.25
NH2            0.19   -0.42a)           0.61    0.06              0.13
NHAc           0.47   -0.11a)           0.58    0.43              0.04
a) Not included in the correlation but calculated by Eq. 8.
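Table 6 allows a direct check of how the amphiprotic substituents, which fall outside Eq. 8, are handled by the bidirectional Eq. 13. The sketch computes the root-mean-square deviation of CONH2, NH2 and NHAc under each equation, using only the tabulated values.

```python
import math

# Values transcribed from Table 6 (4-substituted pyrimidines, system III).
obsd = {"CONH2": -0.24, "NH2": 0.19, "NHAc": 0.47}
calcd_eq8 = {"CONH2": -0.25, "NH2": -0.42, "NHAc": -0.11}   # Eq. 8 (excluded from its fit)
calcd_eq13 = {"CONH2": 0.01, "NH2": 0.06, "NHAc": 0.43}     # Eq. 13


def rms_deviation(obs, calc):
    """Root-mean-square of observed minus calculated values over shared keys."""
    devs = [obs[k] - calc[k] for k in obs]
    return math.sqrt(sum(d * d for d in devs) / len(devs))


rms8 = rms_deviation(obsd, calcd_eq8)
rms13 = rms_deviation(obsd, calcd_eq13)
print(f"amphiprotic RMS deviation: Eq. 8 = {rms8:.2f}, Eq. 13 = {rms13:.2f}")
```

The RMS deviation drops from about 0.49 to about 0.16, driven mainly by NH2 and NHAc; CONH2 alone is actually reproduced slightly worse by Eq. 13.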
TABLE 7 Hydrophobicity Parameters of 5-Substituted Pyrimidines (IV), π5PM <log P(pyrimidine) = -0.44>

Substituent  Obsd.   Calcd. (Eq. 9)  Dev.   Calcd. (Eq. 14)  Dev.   Calcd. (Eq. 17)  Dev.
H             0.00    0.08          -0.08    0.03           -0.03    0.10           -0.10
F             0.41    0.36           0.05    0.37            0.04    0.42           -0.01
Cl            0.91    0.94          -0.03    0.91            0.00    0.97           -0.06
Br            1.10    1.19          -0.09    1.14           -0.04    1.11           -0.01
Me            0.45    0.61          -0.16    0.50           -0.05    0.59           -0.14
OMe           0.51    0.47           0.04    0.53           -0.02    0.47            0.04
OEt           1.00    0.89           0.11    0.91            0.09    0.81            0.19
CO2Me         0.47    0.41           0.06    0.47            0.00    0.45            0.02
CO2Et         0.96    0.96           0.00    0.98           -0.02    0.93            0.03
NMe2          0.90    0.80           0.10    0.86            0.04    0.78            0.12
CONH2        -0.48   -0.78a)         0.30   -0.52            0.04   -0.58            0.10
NHAc          0.22   -0.09a)         0.31    0.28           -0.06    0.40           -0.18
a) Not included in the correlation but calculated by Eq. 9.
TABLE 8 Hydrophobicity Parameters of 3-Substituted Pyridazines (V), π3PD

Calcd. (Eq. 10)   0.09   1.01   0.47   0.83   1.23   0.12   0.95  -0.14a)
Dev.             -0.09  -0.18  -0.09  -0.02   0.13   0.18   0.07   0.14
Calcd. (Eq. 15)   0.12   1.01   0.49   0.84   1.21   0.17   0.94  -0.08
Dev.             -0.12  -0.18  -0.11  -0.03   0.15   0.13   0.08   0.08
a) Not included in the correlation but calculated by Eq. 10.
TABLE 9 Hydrophobicity Parameters of 4-Substituted Pyridazines (VI), π4PD

Calcd. (Eq. 16)   0.05   0.42   0.46   0.19   0.42   0.81   0.70   0.28   0.08   0.43
Dev.             -0.05  -0.01  -0.04  -0.09   0.04   0.15  -0.06   0.05   0.12  -0.11
a) Not included in the correlation but calculated by Eq. 11.
two types of substituted pyridines sharing a common substituent, except for the symmetric 2PM and 5PM. For such 2-substituted diazines as 2PR(I), 4PM(III), and 3PD(V), the correlation with π2PY was always better than that with the alternative (π3PY or π4PY), as shown in Table 10. In particular, for the 2PR(I) system, the single correlation with π2PY is much better than that with π3PY, including such outliers as OR, SMe, Ac, CO2R, CONH2, NH2, and NMe2 in Eqs. 2 and 3 for the 2PY system. This indicates that the π value of 2-substituents in diazines ultimately shares components arising from complex proximity interactions with π2PY.
The above observations led us to use the π value for the pyridine series as the reference independent variable. Then, Eq. 1 can be rewritten as Eq. 4, considering the backward effect of the ring-N, expressed by the σ(p, o)ρX term, on hydrogen-bondable 2(ortho)-X-substituents. As a matter of fact, the use of the pyridine π value as the reference means that the substituent X plus -N= in the reference pyridine system is regarded as a "single" unit (N+X), and that this (N+X) unit and the second -N= function (Y) are bidirectionally interacting partners. In Eq. 4, however, the forward effect of X, σX, but not that of (N+X), is assumed to work on the second -N= function (Y), and the backward effect of Y only on X is considered, depending upon the relative positions of X and Y.
TABLE 10 S i n g l e Correlation C o e f f i c i e n t (r) between DiazineA Z P R ( I )
nppy ~ 3 p y
0.954 0. 814
0 . 940
-
-
0. 678 16
0. 680 15
4PY
~ p h x
ne'
nZPM(I1) -
n4PM(III)
x
a n d Reference x
r5PM(IV)
n3PD(V)
-
0. 923 0. 785
0. 881 -
0 . 956 -
0. 783 0 . 589 10
-
0. 840 12
0 . 556 8
Values.
n4pD(vI) -
0. 8 6 5 0. 8 8 5 0 . 710 10
a) Number of s u b s t i t u e n t s whose n v a l u e s are a v a i l a b l e i n common among m o n o s u b s t i t u t e d b e n z e n e and p y r i d i n e systems.
The analysis was made for each series (I-VI). For the reference π value, the choice was made according to which works better as the parameter set in the final correlations. For the 2-substituted diazines (2PR, 2PM, 4PM, and 3PD), π2PY was, of course, the parameter of choice. For the 5PM and 4PD series, π3PY and π4PY, respectively, were selected. In fact, the selected set of pyridine π values was always better correlated with the set of corresponding diazine π values, even in the single correlation (Table 10). For non-hydrogen-bonding substituents, where ρX = 0, and some hydrogen-accepting substituents, where ρX is not very large, the "backward" effect of Y on X (the σρX term) is not very significant in each diazine series. For such a substituent set, Eq. 4 can be simplified to Eq. 5.
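In the form used for the correlations of Table 11, Eq. 5 is a two-variable linear regression, π(diazine) = a·π(pyridine) + ρY·σ + c. A minimal pure-Python least-squares sketch (normal equations solved by Gaussian elimination) recovers a, ρY and c; the data below are hypothetical and noise-free, chosen only so that the known coefficients can be recovered exactly.

```python
def solve_linear(m, v):
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_eq5(pi_pyridine, sigma, pi_diazine):
    """Least-squares (a, rho_Y, c) in pi_diazine = a*pi_pyridine + rho_Y*sigma + c."""
    rows = [(p, s, 1.0) for p, s in zip(pi_pyridine, sigma)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, pi_diazine)) for i in range(3)]
    return solve_linear(xtx, xty)


# Hypothetical, noise-free illustration data (not taken from the tables).
pi_py = [0.00, 0.50, 1.00, -0.30, 0.80, 1.40]
sig = [0.00, 0.37, 0.23, -0.16, 0.12, 0.56]
pi_dz = [1.1 * p + 0.6 * s + 0.02 for p, s in zip(pi_py, sig)]
a, rho_y, c = fit_eq5(pi_py, sig, pi_dz)
print(f"a = {a:.3f}, rho_Y = {rho_y:.3f}, c = {c:.3f}")
```

With real data, a fitted a near unity and a significant ρY would correspond to the pattern described for Eqs. 6-11; the σ scale used (meta or para) depends on the relative positions of X and the second ring N.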
Thus, w e f i r s t used Eq. 5 f o r e a c h d i a z i n e series e x c l u d i n g t h e amphiprotic
NHz
substituents, NHAc.
examined t h e u s e of a s u b s t i t u e n t series the
u
and CONHz. possessing
large
values
p
4, i n c l u d i n g amphiprotic s u b s t i t u e n t s
p r o c e e d e d t o u s e Eq
I
and a
parameters i n p l a c e of
a
term was
we
in E q
5 f o r each
A s i n t h e case f o r s u b s t i t u e n t s i n t h e p y r i d i n e series.
p a r a m e t e r g a v e almost e q u i v a l e n t statistics with
I
Then,
W e preliminarily
insignificant
The
coefficient
of
the
x
a
but
Dyrldlne
term.
the
a
R
however.
deviated from unity significantly for some series. In Table 11, the correlation equations (Eqs. 6-11) formulated with use of Eq. 5 are shown. In Tables 4-9, the π values calculated according to these equations are listed for comparison with the observed values. From the 2PM series, NMe2, SMe, and CO2R had to be deleted, in addition to the amphiprotic substituents, to obtain an acceptable quality of correlation. Although the number of data relative to that of the independent-variable terms was not sufficient in some series, Eq. 5 seems to hold as far as the π values within each system are concerned. As expected, the coefficient of the pyridine-π term was close to unity, unity being covered almost completely by the 95% confidence interval in Eqs. 6-11. The ρY value, the susceptibility constant of the second aza function (-N=), varied, however, depending on the system. Although it was justified only at the 92% level in Eq. 10 for the 3PD system, the ρY value could be categorized into two groups: one located around 0.87 (Systems I, III, and V), and the other at about 0.53 (Systems II, IV, and VI). In systems I, III, and V, there were common structural features such that π2PY was used as the reference. For systems IV and VI, where π3PY and π4PY were used, the ρY value was lower. Although system II is one of those in which π2PY is the reference, the ρY value was lower in Eq. 7. No adequate explanation for this is available at present. It was also noted that these ρY values are much higher than the corresponding ρY values in Eqs. 2 and 3 for the substituted pyridine system. The variations in the ρY values with different systems indicate that the assumptions made to formulate Eqs. 4 and 5 apply
TABLE 11 Correlations of z ( d i a z i n e ) w i t h E q . 5 ” ’
I : 2PR
1I:ZPM
System I I1
III:4PM
IV:5PM
r
Correlation If
ZPR
=
1. 1 9 3 z ~
P
Y+
(0.0 9 3 ) 7C 2 P M
=
1. 0 4 8 n
ZPY
(0. 195)
I11
Z ~ P M =
IV
*SPM
V
~
VI
Z ~ P D=
0 . 8 2 6 u x” ( m )
(0. 241)
*
0 . 555U ; ( p ) (0. 303)
1. 2 4 5 x 2 ~+ ~0 . 8 7 2 u E ( p ) (0.544)
(0. 345) =
1. 0 0 5 n s p y
(0.2 4 1 ) S
P
D=
+
0. 4 7 5 u E ( m ) (0. 398)
0 . 9 5 3 z z ~+ ~0 . 8 8 3 u f ( m ) (0. 385) (1. 055)
0 . 9 9 4 ~ 4 p y+ 0 . 5 7 3 u f ( m ) (0. 331)
(0. 245)
VI : 4PD
V : 3PD
S
F
n”’
E q . No
+
0.014
0.994
0. 073
456. 0
15
6
+
0. 033
0. 988
0. 072
105. 8
8
7
0. 0 5 5 (0. 214)
0.979
0.119
57. 2
8
8
0.081 (0. 158)
0. 9 6 9
0.100
53. 0
10
9
0. 087 (0. 312)
0. 961
0. 1 6 0
23. 8
7
10
0.020 (0. 144)
0.985
0 . 069
63. 6
7
11
(0. 097) (0. 131)
+
+
+
-
a) Reproduced from r e f . 13 with permission o f t h e copyright owner, VCH Publishers, Inc. b) S u b s t i t u e n t s n o t included: I: CONHZ. NHz. NHAc. 11: SMe. COzR. NMez. CONHZ. NHz. NHAc. 111: CONH2, NHz. NHAc. IV: CONHZ. N H A c . NHz(not measured). V: CONHz.NHz(not measured),NHAc(not measured). VI: CONHZ, NHz. NHAc.
167 Eq. I a l s o differed from t h e o t h e r s in t h a t it w a s
only within each series.
unable t o accommodate w e l l non-amphiprotic
s u b s t i t u e n t s such a s NMez, SMe
and COZR. Table
Eq. 4 for
12 lists, t h e c o r r e l a t i o n substituents
including
Tables 4 and 6-9 a l s o show t h e x
equations Eqs.
12-16 formulated
amphiprotic ones having higher
P X
with
values.
values calculated according t o E q s . 12-16.
For t h e 2PM system, no reasonable c o r r e l a t i o n equation was derived.
In
addition t o o u t l i e r s u b s t i t u e n t s from Eq. I , no amphiprotic s u b s t i t u e n t s w e r e accommodated and so t h e c o r r e l a t i o n with t h e bidirectlonal procedure w a s n o t formulated f o r t h i s series.
For the 2PR, 4PM, and 5PM systems, however, NH2, CONH2, and NHAc were well accommodated in Eqs. 12, 13, and 14. Again, the number of data relative to that of the independent-variable terms was not sufficient in some systems. Since the correlation for 3PD was derived from a data set including only one amphiprotic substituent, the significance of the coefficient of the $\rho_X$ term in Eq. 15, which was justified only at the 90% level, is somewhat uncertain. The coefficient of the $\pi_{4PY}$ term in Eq. 16 for the 4PD system was considerably lower than that in the corresponding Eq. 11 in Table 11. In the substituted pyridazine series, the two vicinally situated ring N-atoms may strongly interact with each other. The electronic interactions between X and N in the reference X-substituted pyridines could be severely distorted between the corresponding X and N in the X-substituted pyridazines. Under such conditions, the regression coefficient of the $\pi_{4PY}$ term in Eq. 16 would be highly perturbed by the N-N interactions, especially when the X-substituent is amphiprotic. Since the numbers of compounds included in these pyridazine systems are not large enough, examination of this type of perturbation affecting the bidirectional model requires further study with additionally synthesized compounds. Except for these minor uncertainties, however, the general pattern of variations in the $\rho_Y$ value is similar in the sets of equations in Tables 11 and 12.

The sign of the regression coefficients of the bidirectional electronic terms in Eqs. 12-16 can be rationalized as being analogous to that in correlations for disubstituted benzenes (8). The forward effect of electron-withdrawing X substituents tends to decrease the basicity of -N=, which is unfavorable to solvation with the more acidic water and enhances partitioning toward the less acidic 1-octanol phase. This process explains why the $\rho_Y$ value (coefficient of $\sigma_X$) is positive in all the correlations. The positive sign of the $\sigma_Y$ value (coefficient of $\rho_X$) for the backward effect reflects the electron-withdrawing nature of the aza (-N=) group.

The coefficient of the $\rho_X$ term in Table 12 seems to fall into two groups: one is 0.6 in Eq. 13 for the 4PM series, and the other centers around 0.43 in Eqs. 12 and 14-16. In Eq. 13, the backward effect of the second ring-N atom is directed to "para" X substituents; in the other systems, it acts in the "meta" direction. The magnitudes of these two types of coefficient could correspond to those of the σ(p) and σ(m) values of the second -N= group, but the values, 0.6 and 0.4, are considerably lower than those of 0.99 and 0.93, respectively, observed for substituted pyridines.

TABLE 12 Correlations of π(diazine) with Eq. 4: $\pi(\text{diazine}) = a\,\pi(\text{pyridine}) + \rho_Y\sigma_X + \sigma_Y\rho_X + C$

I (2PR), Eq. 12:
$$\pi_{2PR} = 1.130(\pm 0.080)\,\pi_{2PY} + 0.756(\pm 0.242)\,\sigma_X(m) + 0.381(\pm 0.175)\,\rho_X + 0.016(\pm 0.101)$$
n = 18, r = 0.994, s = 0.078, F = 355.8

III (4PM), Eq. 13:
$$\pi_{4PM} = 1.115(\pm 0.267)\,\pi_{2PY} + 0.744(\pm 0.430)\,\sigma_X(p) + 0.599(\pm 0.501)\,\rho_X - 0.027(\pm 0.219)$$
n = 11, r = 0.973, s = 0.138, F = 42.1

IV (5PM), Eq. 14:
$$\pi_{5PM} = 0.931(\pm 0.078)\,\pi_{3PY} + 0.659(\pm 0.219)\,\sigma_X(m) + 0.416(\pm 0.158)\,\rho_X + 0.031(\pm 0.083)$$
n = 12, r = 0.995, s = 0.054, F = 278.0

V (3PD), Eq. 15:
$$\pi_{3PD} = 0.898(\pm 0.240)\,\pi_{2PY} + 0.941(\pm 0.774)\,\sigma_X(m) + 0.474(\pm 0.635)\,\rho_X + 0.026(\pm 0.239)$$
n = 8, r = 0.984, s = 0.117, F = 40.5

VI (4PD), Eq. 16:
$$\pi_{4PD} = 0.706(\pm 0.191)\,\pi_{4PY} + 0.447(\pm 0.422)\,\sigma_X(m) + 0.466(\pm 0.328)\,\rho_X - 0.046(\pm 0.176)$$
n = 10, r = 0.966, s = 0.107, F = 27.8

The above bidirectional procedure, with the corresponding $\pi_{2PY}$ value as the reference, worked very well for the $\pi_{2PR}$ and $\pi_{4PM}$ series, including "irregular" substituents, in the correlation with Eq. 4. This supports the anticipation mentioned above that the components attributable to the proximity effects between the α-substituent and the ring N-atom are very similar among these 2-substituted diazine-π and pyridine-π values. The procedure did not work, however, for the correlation between $\pi_{2PM}$ and $\pi_{2PY}$. Though the simple correlation between them was not very low (r = 0.94) if hydrogen-accepting and amphiprotic substituents were considered together, the shifts of the $\pi_{2PM}$ values from the pyridine-π values for these hydrogen-bonding substituents were too irregular to explain.
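The Table 12 coefficients can be collected into a small lookup to make the forward/backward reading of Eq. 4 concrete. This is an illustrative sketch only: the helper names (`predict_pi`, `mean_backward`) and the tuple layout are hypothetical, the coefficients are those listed in Table 12, and real $\sigma_X$ and $\rho_X$ inputs would have to come from Table 1, which is not reproduced here. It also recovers the two groupings of the $\rho_X$ coefficient discussed in the text.

```python
# Eq. 4: pi(diazine) = a*pi(pyridine) + rho_Y*sigma_X + sigma_Y*rho_X + C
# Hypothetical data structure holding the Table 12 coefficients.
# Each entry: (eq. no., reference pyridine, orientation of second -N= to X,
#              a, rho_Y [forward], sigma_Y [backward, coefficient of rho_X], C)
TABLE_12 = {
    "2PR": (12, "2PY", "meta", 1.130, 0.756, 0.381, 0.016),
    "4PM": (13, "2PY", "para", 1.115, 0.744, 0.599, -0.027),
    "5PM": (14, "3PY", "meta", 0.931, 0.659, 0.416, 0.031),
    "3PD": (15, "2PY", "meta", 0.898, 0.941, 0.474, 0.026),
    "4PD": (16, "4PY", "meta", 0.706, 0.447, 0.466, -0.046),
}


def predict_pi(system: str, pi_ref: float, sigma_x: float, rho_x: float) -> float:
    """Evaluate Eq. 4 for substituent X in one diazine system.

    pi_ref is the pi value of X in the reference pyridine (e.g. pi_2PY);
    sigma_x and rho_x are the substituent constants (Table 1 in the source).
    """
    _, _, _, a, rho_y, sigma_y, c = TABLE_12[system]
    return a * pi_ref + rho_y * sigma_x + sigma_y * rho_x + c


def mean_backward(orientation: str) -> float:
    """Average fitted sigma_Y over systems with the given -N= orientation."""
    values = [row[5] for row in TABLE_12.values() if row[2] == orientation]
    return sum(values) / len(values)


print(round(mean_backward("meta"), 2))  # 0.43 (Eqs. 12, 14-16)
print(round(mean_backward("para"), 2))  # 0.6  (Eq. 13)
# With sigma_x = rho_x = 0, Eq. 4 reduces to a*pi_ref + C:
print(round(predict_pi("2PR", 1.0, 0.0, 0.0), 3))  # 1.146
```

Because both $\rho_Y$ and $\sigma_Y$ are positive in every row, increasing either $\sigma_X$ or $\rho_X$ always raises the predicted π, which is the sign pattern rationalized in the text.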
3.1.2 Physicochemical Meaning of the Correlations: As mentioned above, the present model represented by Eq. 4, in which the pyridine-π, σ, and ρ values are used as independent variables, assumed that the (N+X) unit in the reference pyridine system and the second -N= (Y) interact bidirectionally with each other. The "bidirectional" interactions considered for the second -N= function were, however, actually only with the substituent X, but not with the (N+X) group. The interaction between the two N-atoms was taken as being unchanged, at least within each series of substituted diazines. These assumptions apparently include oversimplifications, as suggested above, in particular for the 2PM, 3PD, and 4PD systems.

We preliminarily attempted to analyze the diazine-π values using $\pi_{PhX}$ for each system. The 5PM series was the only one in which an acceptable correlation was found, as shown in Eq. 17, by regarding the diaza group (-N=(C)-N=) as the invariable "substituent Y". The calculated π values are also shown in Table 7.

$$\pi_{5PM} = 0.939(\pm 0.188)\,\pi_{PhX} + 0.562(\pm 0.510)\,\sigma_X(m) + 1.253(\pm 0.477)\,\rho_X + 0.095(\pm 0.188) \qquad [17]$$
n = 12, r = 0.974, s = 0.126, F = 48.4

The acceptable quality of Eq. 17 as a counterpart of Eq. 14 could be understood from the fact that the 5PM series is unique among the isomers in being symmetric: the substituent and the two N-atoms have "meta" relationships, perhaps without significant direct resonance effects among the three functions. The "forward" effect of X in terms of $\rho_Y\sigma_X(m)$ in Eq. 17 would, in fact, represent twice the effect on each of the two N-atoms if interactions between the two N-atoms are neglected. Likewise, the "backward" effect on substituent X in terms of $\sigma(m)\rho_X$ in Eq. 17 could account for approximately twice the effect of each of the N-atoms. When the $\rho_Y$ and σ(m) values in Eq. 17 are divided by two, the forward and backward effects between each pair of X and -N= substituents are 0.28 and 0.63, respectively. These values are closer to the corresponding values, 0.26 and 0.94, in Eq. 3 for the $\pi_X$ values in the monosubstituted pyridine series on the basis of $\pi_{PhX}$ than are those obtained when the analysis of $\pi_{5PM}$ was made on the basis of $\pi_{3PY}$ (Eq. 14), where the $\rho_Y$ and σ(m) values were 0.66 and 0.42, respectively. This seems to support the relevancy of Eq. 1 in analyzing the substituent π values of N-heterocycles.

The "enhanced" forward effect of X in Eq. 14, in terms of $\rho_Y\sigma_X(m)$, compared with $0.26\sigma$ in Eq. 3, could be due to the effect of the N-atom in the (N+X) unit on the second N-atom in augmenting the effect of X, so that the use of the regular σ would under-estimate the total effect of (N+X), leading to a higher $\rho_Y$ value. The reduction of the backward effect could be due to a competition between the two N atoms for the electrons of the hydrogen-bondable substituent X, apparently non-additive in nature, so that the $\rho_X$ value in Eq. 14 could be over-estimated, leading to a lower σ(m) value. Interactions of the above type among the X-substituent and the two N-atoms are certainly reflected in correlation equations other than Eq. 14 for the other diazine systems. The forward effect of X on the second N atom in terms of $\rho_Y$ is higher than that for 5PM in systems in which the reference (N+X) group was taken as 2PY, such as 2PR (Eq. 12), 4PM (Eq. 13), and 3PD (Eq. 15). The reverse is the case for the 4PD system (Eq. 16), where the (N+X) group was regarded as 4PY. This trend is understandable, since the effect of X on N within the (N+X) group itself is highest in systems with the 2PY reference and lowest in those with the 4PY reference. The higher the electronic effect of this type, the greater the extent of under-estimation of the forward effect by the use of the regular σ, leading to a higher $\rho_Y$ value. The backward effect of the second N atom on the (N+X) group, in terms of the coefficient of the $\rho_X$ term, is higher for the 4PM system (Eq. 13) than for the other systems. This is understandable because the σ(para) value is higher than the σ(meta) value for the -N= "substituent", although the "intrinsic" σ term for the effect of two N-atoms in diazine systems is not additive, as mentioned above.
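The halving argument for Eq. 17 is simple arithmetic, and a short sketch makes the comparison explicit. The sketch assumes, as the text does, that in the symmetric 5PM system each Eq. 17 coefficient is the sum of two equal X/-N= pair contributions; the comparison with Eqs. 3 and 14 is done component by component, which is our reading of "closer".

```python
# Eq. 17 (5PM analysed against pi_PhX, diaza group treated as one "substituent Y")
EQ17_FORWARD = 0.562   # rho_Y, coefficient of sigma_X(m)
EQ17_BACKWARD = 1.253  # sigma_Y(m), coefficient of rho_X

# (forward, backward) values quoted in the text for comparison
EQ3 = (0.26, 0.94)     # monosubstituted pyridines analysed against pi_PhX
EQ14 = (0.66, 0.42)    # 5PM analysed against pi_3PY

# Assumption: X is "meta" to both ring N atoms and the N-N interaction is
# neglected, so each Eq. 17 coefficient is twice a single pair effect.
per_pair = (EQ17_FORWARD / 2, EQ17_BACKWARD / 2)

# Is each per-pair value nearer to Eq. 3 than the Eq. 14 value is?
closer_than_eq14 = all(
    abs(p - ref) < abs(q - ref) for p, q, ref in zip(per_pair, EQ14, EQ3)
)

print([round(v, 2) for v in per_pair])  # [0.28, 0.63]
print(closer_than_eq14)                 # True
```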
In Eq. 7 for the 2PM series, OR substituents were well incorporated, but such groups as SMe, NMe2, and NH2 were positive outliers, and CONH2, NHAc, and CO2R were negative outliers. Possible proximity steric and inductive effects between X and the second N atom were examined by introducing the Taft-Kutter-Hansch $E_s$ and the Charton $\sigma_I$ terms for X, singly or together, but they did not rationalize the deviations. The OR substituents in the 2PM system would share proximity interactions with just one of the N atoms in common with those in the 2PY system. For the other outlying substituents in the 2PM series, the $\pi_{2PM}$ value seems to deviate from the calculated value in a manner similar to, but not necessarily the same as, that occurring with $\pi_{2PY}$. Parametrization of the proximity effects involved in such 2-substituted N-heterocyclic compounds as 2-substituted pyridines and pyrimidines requires further elaboration.
3.2 Di- and Poly-substituted Pyrazines

Analysis and prediction of the log P values would be more difficult for multi-substituted diazines, in which additional electronic and steric interactions between individual partners of substituents and ring-N atoms are generally involved. As a first trial, we investigated possible procedures similar to the above-mentioned bidirectional Hammett-Taft-type analyses for the log P values of disubstituted pyrazines and extended the analysis to polysubstituted pyrazines (14). The experimentally measured log P values of disubstituted pyrazines are listed in Tables 13-15, while those for polysubstituted pyrazines are given in Table 16. In di- and poly-substituted systems, steric effects may operate in such a manner that the substituent(s) next to a ring-N atom would hinder the solvent molecule from forming a hydrogen bond with the N lone-pair electrons. To test this possibility, we first examined various alkylpyrazines, in which the electronic contribution was thought to be very low. Log P values of mono- and poly-alkyl-substituted pyrazines were plotted against the total carbon number contained in the alkyl chains, as shown in Fig. 1. For the series of 2-n-alkyl-substituted pyrazines, from pyrazine to 2-Bu-pyrazine, the increment per C1 unit was very regular, estimated as 0.48 ± 0.01 (slope A). When a methyl group was introduced in their ortho position, the increment for each alkylpyrazine was 0.38 ± 0.03 (n = 4, slope B), which was lower than that for the monoalkylpyrazines. With successive methyl substitutions at the four available pyrazine positions, the log P value increased almost regularly from the mono-Me to the Me4 derivatives (omitting 2,5-Me2), with a slope (slope C, 0.36 ± 0.04) close to slope B. These results suggest that disubstitution with fairly bulky groups at the 2,3- and 2,6-positions, but not at the 2,5-position, has steric effects.

Fig. 1. Plot of log P against the total carbon number in alkylpyrazines (y-axis: log P, about -0.5 to 2.0; x-axis: total carbon number). Closed circles: 2-X-substituted pyrazines; the symbols represent the X-substituents. Open circles: polysubstituted pyrazines; the numerals indicate the substituent positions. Lines: slope A; slope B (-------); slope C (- -). (Reproduced from ref. 14 with permission of the copyright owner, the American Pharmaceutical Association.)
3.2.1 Correlation Analyses for Disubstituted Pyrazines: The analyses were made for the $\pi(\text{disubst})_{PR}$ value defined by Eq. 18. For comparison, the sums of the corresponding substituent π values derived from monosubstituted pyrazines ($\Sigma\pi_{2PR}$), monosubstituted benzenes ($\Sigma\pi_{PhX}$), and 2-substituted pyridines ($\Sigma\pi_{2PY}$) are presented in Tables 13-15. It is clearly seen that the prediction based on the additivity of $\pi_{PhX}$ values results in great under-estimation of the log P value for most compounds. Although the situation is considerably improved with $\Sigma\pi_{2PY}$, the deviations from the observed values are still large. The deviations are much decreased when the $\pi_{2PR}$ values are used, although the additivity tends to result in over-estimation. Thus, we selected the monosubstituted pyrazine π values as the reference parameter.

For applying Eq. 1 to analyze the $\pi(\text{disubst})_{PR}$ values, some modifications were required. In the use of $\pi(\text{disubst})_{PR}$ values relative to the log P value of the unsubstituted pyrazine, no distinction was made between the X and Y substituents as to which is fixed and which is variable. The analyses were made for individual types of disubstituted pyrazines. First, we tried to treat the sum of the bidirectional electronic terms, $\rho_Y\sigma_X + \sigma_Y\rho_X$, as an independent variable, as shown in Eq. 19, where $\Sigma\pi_{2PR}$ represents the sum of the $\pi_{2PR}$ values for the X and Y substituents. In this treatment, bidirectional interactions are assumed to be proportional to those in m- and p-disubstituted benzenes. The regression coefficient "b" could be an indication of the "transmitting efficiency" of the bidirectional interactions of substituents compared with the case of the correspondingly disubstituted benzenes.

Preliminary analyses of the π values of the 2,6- and 2,5-disubstituted series listed in Tables 13 and 14, with the σ and ρ constants for X and Y in Table 1, however, demonstrated that Eq. 19 was not sufficient to simulate the situation in "meta" and "para" disubstituted pyrazines. We next tried to represent the electronic effects by inductive and resonance components separately, using the $\rho_X\sigma_I(Y)$, $\rho_Y\sigma_I(X)$, $\rho_X\sigma_R(Y)$, and $\rho_Y\sigma_R(X)$ terms with the $\sigma_I$ and $\sigma_R$ electronic constants. With these terms, analyses were made by considering whether the substituents are amphiprotic (H-donor) or H-acceptor. We found that these terms were statistically significant only for H-donor X and Y substituents. In other words, the ρ values for H-acceptors could be regarded as zero, like those for non-hydrogen-bonders. Eq. 19 was then transformed to Eq. 20, where $\rho_{(AM)}$ is the originally reported ρ value for amphiprotic groups but is taken to be zero for H-acceptors and non-hydrogen-bonders (Table 1), and where, for amphiprotic X and Y substituent pairs,

$$\Sigma\rho_{(AM)}\sigma_{I\ \mathrm{or}\ R} = \rho_{(AM)}(X)\,\sigma_{I\ \mathrm{or}\ R}(Y) + \rho_{(AM)}(Y)\,\sigma_{I\ \mathrm{or}\ R}(X)$$

Analysis of all the 2,6-disubstituted $\pi(2,6\text{-PR})$ values in Table 13 yielded Eq. 21.

$$\pi(2,6\text{-PR}) = 1.008(\pm 0.063)\,\Sigma\pi_{2PR} + 0.615(\pm 0.484)\,\Sigma\rho_{(AM)}\sigma_I + 0.583(\pm 0.250)\,\Sigma\rho_{(AM)}\sigma_R - 0.103(\pm 0.108) \qquad [21]$$
n = 39, r = 0.987, s = 0.125, F(3,35) = 446.2
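The additivity comparison above can be checked directly against the deviation columns of Table 14 (2,5-disubstituted series). The sketch below uses only the Table 14 values; the mean absolute and mean signed deviations it reports are our own summary statistics, not quantities given in the source, and the list layout and function name are illustrative.

```python
# Table 14: (X, Y, pi_obs, sum pi_2PR, sum pi_PhX, sum pi_2PY)
TABLE_14 = [
    ("Cl", "Cl", 1.84, 1.92, 1.42, 1.24),
    ("Cl", "Me", 1.34, 1.43, 1.27, 1.08),
    ("Cl", "OMe", 1.78, 1.95, 0.69, 1.31),
    ("Cl", "OEt", 2.25, 2.50, 1.09, 1.78),
    ("Cl", "NH2", 0.93, 1.17, -0.52, 0.45),
    ("Cl", "NHAc", 0.82, 1.19, -0.26, 0.49),
    ("Cl", "CN", 1.18, 1.21, 0.14, 0.37),
    ("Cl", "NMe2", 1.96, 2.15, 0.89, 1.62),
    ("Me", "Me", 0.89, 0.94, 1.12, 0.92),
    ("Me", "CN", 0.52, 0.72, -0.01, 0.21),
    ("Me", "CO2Me", 0.43, 0.50, 0.55, 0.17),
    ("Me", "CO2Et", 0.87, 1.01, 1.07, 0.68),
    ("Me", "CONH2", 0.01, 0.23, -0.93, -0.04),
    ("OMe", "OMe", 1.40, 1.98, -0.04, 1.38),
    ("OMe", "NH2", 0.89, 1.20, -1.25, 0.52),
    ("OMe", "NMe2", 1.91, 2.18, 0.16, 1.69),
]
REFERENCES = {"sum pi_2PR": 3, "sum pi_PhX": 4, "sum pi_2PY": 5}


def deviation_stats(column: int) -> tuple[float, float]:
    """Mean |obs - sum| and mean (obs - sum) for one additive reference scale.

    A negative mean signed deviation means additivity over-estimates pi;
    a positive one means it under-estimates pi.
    """
    devs = [row[2] - row[column] for row in TABLE_14]
    mad = sum(abs(d) for d in devs) / len(devs)
    bias = sum(devs) / len(devs)
    return mad, bias


for name, col in REFERENCES.items():
    mad, bias = deviation_stats(col)
    print(f"{name}: mean |dev| = {mad:.2f}, mean dev = {bias:+.2f}")
# sum pi_2PR: mean |dev| = 0.20, mean dev = -0.20
# sum pi_PhX: mean |dev| = 0.92, mean dev = +0.85
# sum pi_2PY: mean |dev| = 0.33, mean dev = +0.32
```

The ordering (pyrazine reference smallest, benzene reference largest) and the signs (over-estimation with $\Sigma\pi_{2PR}$, under-estimation with $\Sigma\pi_{PhX}$) match the qualitative statements in the text.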
TABLE 13 Hydrophobicity Parameters of 2,6-Disubstituted Pyrazines (2,6-PR)

                       π(2,6-PR)                 Σπ2PR(b)        ΣπPhX(c)        Σπ2PY(d)
X, Y          log P    Obsd.  Calcd.(a)  dev.    value   dev.    value   dev.    value   dev.
Cl, F         1.15     1.41   1.47      -0.06    1.51   -0.10    0.85    0.56    0.81    0.60
Cl, Cl        1.53     1.79   1.82      -0.03    1.92   -0.13    1.42    0.37    1.35    0.44
Cl, Me        1.03     1.29   1.34      -0.05    1.43   -0.14    1.27    0.02    1.08    0.21
Cl, OMe       1.65     1.91   1.88       0.03    1.95   -0.04    0.69    1.22    1.31    0.60
Cl, OEt       2.22     2.48   2.40       0.08    2.50   -0.02    1.09    1.39    1.78    0.70
Cl, NH2       0.95     1.21   1.21       0.00    1.17    0.04   -0.52    1.73    0.45    0.76
Cl, NHAc      1.10     1.36   1.25       0.11    1.19    0.17   -0.26    1.62    0.49    0.87
Cl, CN        0.79     1.05   1.18      -0.13    1.21   -0.16    0.14    0.91    0.37    0.68
Cl, CO2Me     0.47     0.73   0.82      -0.09    0.99   -0.26    0.70    0.03    0.33    0.40
Cl, CONH2     0.28     0.54   0.61      -0.07    0.72   -0.18   -0.78    1.32    0.12    0.42
Cl, NMe2      1.95     2.21   2.07       0.14    2.15    0.06    0.89    1.32    1.62    0.59
Cl, OPr       2.71     2.97   2.93       0.04    3.06   -0.09    1.76    1.21    2.35    0.62
Me, Me        0.54     0.80   0.85      -0.05    0.94   -0.14    1.12   -0.32    0.92   -0.12
Me, OMe       1.29     1.55   1.41       0.14    1.46    0.09    0.54    1.01    1.15    0.40
Me, NH2       0.35     0.61   0.57       0.04    0.68   -0.07   -0.67    1.28    0.29    0.32
Me, NHAc      0.38     0.64   0.57       0.07    0.70   -0.06   -0.41    1.05    0.33    0.31
Me, CN        0.44     0.70   0.70       0.00    0.72   -0.02   -0.01    0.71    0.21    0.49
Me, CO2Me     0.10     0.36   0.30       0.06    0.50   -0.14    0.55   -0.19    0.17    0.19
Me, CO2Et     0.51     0.77   0.78      -0.01    1.01   -0.24    1.07   -0.30    0.68    0.09
Me, CONH2    -0.13     0.13  -0.01       0.14    0.23   -0.10   -0.93    1.06   -0.04    0.17
Me, NMe2      1.57     1.83   1.59       0.24    1.66    0.17    0.74    1.09    1.46    0.37
Me, OPr       2.38     2.64   2.46       0.18    2.57    0.07    1.61    1.03    2.19    0.45
OMe, OMe      1.58     1.84   1.93      -0.09    1.98   -0.14   -0.04    1.88    1.38    0.46
OMe, OEt      1.98     2.24   2.45      -0.21    2.53   -0.29    0.36    1.88    1.85    0.39
OMe, NH2      0.73     0.99   1.02      -0.03    1.20   -0.21   -1.25    2.24    0.52    0.47
OMe, NHAc     0.82     1.08   1.00       0.08    1.22   -0.14   -0.99    2.07    0.56    0.52
OMe, CN       0.95     1.21   1.23      -0.02    1.24   -0.03   -0.59    1.80    0.44    0.77
OMe, CO2Me    0.69     0.95   0.93       0.02    1.02   -0.07   -0.03    0.98    0.40    0.55
OMe, CO2Et    1.20     1.46   1.42       0.04    1.53   -0.07    0.49    0.97    0.91    0.55
OMe, CONH2    0.13     0.39   0.58      -0.19    0.75   -0.36   -1.51    1.90    0.19    0.20
OMe, NMe2     1.99     2.25   2.12       0.13    2.18    0.07    0.16    2.09    1.69    0.56
CN, NMe2      1.12     1.38   1.41      -0.03    1.44   -0.06   -0.39    1.77    0.75    0.63
NMe2, CO2Me   0.90     1.16   1.11       0.05    1.22   -0.06    0.17    0.99    0.71    0.45
NMe2, CO2Et   1.24     1.50   1.59      -0.09    1.73   -0.23    0.69    0.81    1.22    0.28
NMe2, CONH2   0.38     0.64   0.63       0.01    0.95   -0.31   -1.31    1.95    0.50    0.14
CONH2, OEt    0.61     0.87   1.10      -0.23    1.30   -0.43   -1.11    1.98    0.66    0.21
F, F          0.74     1.00   1.10      -0.10    1.10   -0.10    0.28    0.72    0.38    0.62
OEt, OEt      2.55     2.81   2.97      -0.16    3.08   -0.27    0.76    2.05    2.32    0.49
NH2, NH2     -0.45    -0.19  -0.22       0.03    0.42   -0.61   -2.46    2.27   -0.34    0.15

a) Calculated by Eq. 23. For example, the value for 2-Cl-6-NH2-pyrazine is calculated as follows: π(2-Cl-6-NH2) = 0.947{π2PR(Cl) + π2PR(NH2)} + 0.568{ρ(AM)(NH2) × σI(Cl) + ρ(AM)(Cl) × σI(NH2)} + 0.680{ρ(AM)(NH2) × σR(Cl) + ρ(AM)(Cl) × σR(NH2)} - 0.080 Es(Cl) × Es(NH2) + 0.078 = 0.947(0.96 + 0.21) + 0.568(0.74 × 0.47 + 0) + 0.680{0.74 × (-0.25) + 0} - 0.080(-0.97)(-0.61) + 0.078 = 1.21
b) For the π2PR values of the component substituents, see Table 4.
c) For the πPhX values of the component substituents, see Table 1.
d) For the π2PY values of the component substituents, see Table 1.
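Footnote a of Table 13 can be reproduced in a few lines, and doing so alongside Eq. 21 shows what the $E_s'$ cross-product contributes for one compound. The parameter values below are exactly those quoted in the footnote; the dictionaries and function names are illustrative, and the observed π is taken as log P + 0.26, the constant offset between the log P and π columns of Table 13 (i.e. π relative to pyrazine).

```python
# Parameter values for Cl and NH2 as quoted in footnote a of Table 13
PI_2PR = {"Cl": 0.96, "NH2": 0.21}
RHO_AM = {"Cl": 0.0, "NH2": 0.74}  # rho(AM) = 0 for non-H-donors such as Cl
SIGMA_I = {"Cl": 0.47}
SIGMA_R = {"Cl": -0.25}
E_S = {"Cl": -0.97, "NH2": -0.61}  # Taft-Kutter-Hansch steric constants


def bidirectional(x: str, y: str, sigma: dict) -> float:
    """rho(AM)(X)*sigma(Y) + rho(AM)(Y)*sigma(X); terms with rho(AM) = 0 vanish."""
    total = 0.0
    for donor, partner in ((x, y), (y, x)):
        if RHO_AM[donor] != 0.0:
            total += RHO_AM[donor] * sigma[partner]
    return total


def eq21(x: str, y: str) -> float:
    return (1.008 * (PI_2PR[x] + PI_2PR[y])
            + 0.615 * bidirectional(x, y, SIGMA_I)
            + 0.583 * bidirectional(x, y, SIGMA_R)
            - 0.103)


def eq23(x: str, y: str) -> float:
    return (0.947 * (PI_2PR[x] + PI_2PR[y])
            + 0.568 * bidirectional(x, y, SIGMA_I)
            + 0.680 * bidirectional(x, y, SIGMA_R)
            - 0.080 * E_S[x] * E_S[y]  # Es' cross-product term
            + 0.078)


observed = 0.95 + 0.26  # log P (Table 13) plus 0.26
print(f"Eq. 21: {eq21('Cl', 'NH2'):.2f}")  # Eq. 21: 1.18
print(f"Eq. 23: {eq23('Cl', 'NH2'):.2f}")  # Eq. 23: 1.21
print(f"observed: {observed:.2f}")         # observed: 1.21
```

Only the NH2-to-Cl direction survives in each bidirectional sum, because $\rho_{(AM)}$ of Cl is zero; this is why the footnote writes "+ 0" for the second products.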
Although Eq. 21 seems to be statistically acceptable, examination of the residuals showed that the π values of compounds with bulky substituents such as CO2R and CONH2 gave significant negative deviations. Moreover, the level of significance of the intercept was close to 95% in Eq. 21. These facts were thought to reflect a contribution of the steric effect of 2,6-disubstitution in lowering the π value, as described above for the 2,6-dialkyl derivatives. This contribution was examined with the use of the steric parameter $E_s$: Eq. 22, obtained with the $\Sigma E_s(2,6)$ value as an additional term, was slightly better than Eq. 21. For the $E_s$ values of planar π-bonded substituents, the values for the steric effect in the coplanar, but not the perpendicular, direction were used.

$$\pi(2,6\text{-PR}) = 0.950(\pm 0.070)\,\Sigma\pi_{2PR} + 0.569(\pm 0.440)\,\Sigma\rho_{(AM)}\sigma_I + 0.671(\pm 0.235)\,\Sigma\rho_{(AM)}\sigma_R + 0.072(\pm 0.050)\,\Sigma E_s + 0.140(\pm 0.193) \qquad [22]$$
n = 39, r = 0.990, s = 0.113, F(4,34) = 411.1

In Eq. 22, however, the intercept turned out to be positive, and its deviation from zero was increased from that in Eq. 21. This suggested that the steric effect of the 2,6-substituents, being overestimated for substituents of smaller size, did not operate exactly additively but proportionally with the bulk of the two substituents. To express this situation, the cross-product of $E_s(X)$ and $E_s(Y)$, $E_s'$, would be relevant, since it takes values close to zero when one or both of the substituents X and Y are small enough, but increases to a significantly large size when both X and Y are large. The addition of the $E_s'$ term produced Eq. 23. Although the quality of the correlation was almost the same as that of Eq. 22, the intercept of Eq. 23 was much closer to zero.

$$\pi(2,6\text{-PR}) = 0.947(\pm 0.072)\,\Sigma\pi_{2PR} + 0.568(\pm 0.443)\,\Sigma\rho_{(AM)}\sigma_I + 0.680(\pm 0.238)\,\Sigma\rho_{(AM)}\sigma_R - 0.080(\pm 0.057)\,E_s' + 0.078(\pm 0.161) \qquad [23]$$
n = 39, r = 0.990, s = 0.113, F(4,34) = 405.7
It is expected that the analysis for 2,5-disubstituted pyrazines could be done in a manner similar to Eq. 21, since no significant steric effect would be induced in the relative solvation of the ring-N atoms by introducing the second substituent in the "para" position. For the data in Table 14, Eqs. 24 and 25, of approximately equivalent quality, were formulated. The 2,5-dimethoxy derivative was omitted from the correlation analyses because it showed large deviations in all preliminary trials.

$$\pi(2,5\text{-PR}) = 0.962(\pm 0.070)\,\Sigma\pi_{2PR} - 0.475(\pm 0.320)\,\Sigma\rho_{(AM)}\sigma_I - 0.096(\pm 0.109) \qquad [24]$$
n = 15, r = 0.994, s = 0.079, F(2,12) = 473.7

$$\pi(2,5\text{-PR}) = 0.954(\pm 0.069)\,\Sigma\pi_{2PR} + 0.562(\pm 0.362)\,\Sigma\rho_{(AM)}\sigma_R - 0.082(\pm 0.109) \qquad [25]$$
n = 15, r = 0.994, s = 0.077, F(2,12) = 493.6

From the statistical point of view, it is difficult to choose between these two equations. There was a fairly high collinearity between $\Sigma\rho_{(AM)}\sigma_I$ and $\Sigma\rho_{(AM)}\sigma_R$ (r = -0.78) among these fifteen compounds. From the physicochemical point of view, however, we consider that Eq. 25, in which the coefficient of the $\Sigma\rho_{(AM)}\sigma_R$ term is positive, is preferable. The electron-withdrawing effect of substituents tends to increase the acidity of the second (amphiprotic) substituent, enhancing its solvation with 1-octanol, which is more basic than water, and leading to a higher $\pi(2,5\text{-PR})$ value.

The behavior of $\pi(2,3\text{-PR})$ was expected to be more complicated than that of $\pi(2,6\text{-PR})$ and $\pi(2,5\text{-PR})$, because it could involve components from steric interference as well as possible proximity electronic effects between substituents. Since the compounds in our 2,3-PR series do not include hydrogen-donor substituents, the correction for the electronic effects in terms of $\Sigma\rho_{(AM)}\sigma_I$ and $\Sigma\rho_{(AM)}\sigma_R$ was of no significance. The analysis of the $\pi(2,3\text{-PR})$ values in Table 15 yielded Eq. 26.

$$\pi(2,3\text{-PR}) = 1.060(\pm 0.126)\,\Sigma\pi_{2PR} - 0.125(\pm 0.214) \qquad [26]$$
n = 13, r = 0.984, s = 0.131, F(1,11) = 343.3

The correlation was improved by the addition of the $E_s'$ term, as done in the analysis of $\pi(2,6\text{-PR})$, and gave Eq. 27.

$$\pi(2,3\text{-PR}) = 0.992(\pm 0.104)\,\Sigma\pi_{2PR} - 0.050(\pm 0.034)\,E_s' + 0.082(\pm 0.213) \qquad [27]$$
n = 13, r = 0.992, s = 0.096, F(2,10) = 327.1

A $\rho\sigma_I$ term for hydrogen-accepting substituents was also added to Eq. 27 to test for participation of proximity electronic (inductive) effects between the 2- and 3-substituents, but no improvement was observed. Further measurements are needed to obtain a generalized correlation equation that could apply to a wider range of 2,3-disubstituted pyrazines.
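The gain from adding $E_s'$ in going from Eq. 26 to Eq. 27 can be seen on the worked example of footnote a in Table 15 (2-Me-3-OMe-pyrazine, observed π = 1.50). The parameter values are those quoted in that footnote; the function names are illustrative.

```python
# pi_2PR and Es values quoted in footnote a of Table 15
PI_2PR = {"Me": 0.47, "OMe": 0.99}
E_S = {"Me": -1.24, "OMe": -0.55}


def eq26(x: str, y: str) -> float:
    """Plain additivity with a scale factor (no steric term)."""
    return 1.060 * (PI_2PR[x] + PI_2PR[y]) - 0.125


def eq27(x: str, y: str) -> float:
    """Additivity plus the Es' = Es(X)*Es(Y) cross-product term."""
    es_prime = E_S[x] * E_S[y]
    return 0.992 * (PI_2PR[x] + PI_2PR[y]) - 0.050 * es_prime + 0.082


print(f"Eq. 26: {eq26('Me', 'OMe'):.2f}")  # Eq. 26: 1.42
print(f"Eq. 27: {eq27('Me', 'OMe'):.2f}")  # Eq. 27: 1.50
```

Because both $E_s$ values are negative, $E_s'$ is positive and the $-0.050\,E_s'$ term always lowers π, and it does so most when both neighbors are bulky, as argued for the 2,6 series.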
TABLE 14 Hydrophobicity Parameters of 2,5-Disubstituted Pyrazines (2,5-PR)

                       π(2,5-PR)                 Σπ2PR(b)        ΣπPhX(c)        Σπ2PY(d)
X, Y          log P    Obsd.  Calcd.(a)  dev.    value   dev.    value   dev.    value   dev.
Cl, Cl        1.58     1.84   1.75       0.09    1.92   -0.08    1.42    0.42    1.24    0.60
Cl, Me        1.08     1.34   1.28       0.06    1.43   -0.09    1.27    0.07    1.08    0.26
Cl, OMe       1.52     1.78   1.78       0.00    1.95   -0.17    0.69    1.09    1.31    0.47
Cl, OEt       1.99     2.25   2.30      -0.05    2.50   -0.25    1.09    1.16    1.78    0.47
Cl, NH2       0.67     0.93   0.93       0.00    1.17   -0.24   -0.52    1.45    0.45    0.48
Cl, NHAc      0.56     0.82   0.92      -0.10    1.19   -0.37   -0.26    1.08    0.49    0.33
Cl, CN        0.92     1.18   1.07       0.11    1.21   -0.03    0.14    1.04    0.37    0.81
Cl, NMe2      1.70     1.96   1.97      -0.01    2.15   -0.19    0.89    1.07    1.62    0.34
Me, Me        0.63(e)  0.89   0.82       0.07    0.94   -0.05    1.12   -0.23    0.92   -0.03
Me, CN        0.26     0.52   0.61      -0.09    0.72   -0.20   -0.01    0.53    0.21    0.31
Me, CO2Me     0.17     0.43   0.40       0.03    0.50   -0.07    0.55   -0.12    0.17    0.26
Me, CO2Et     0.61     0.87   0.88      -0.01    1.01   -0.14    1.07   -0.20    0.68    0.19
Me, CONH2    -0.25     0.01   0.10      -0.09    0.23   -0.22   -0.93    0.94   -0.04    0.05
OMe, OMe      1.14     1.40(f) 1.81     -0.41    1.98   -0.58   -0.04    1.44    1.38    0.02
OMe, NH2      0.63     0.89   0.82       0.07    1.20   -0.31   -1.25    2.14    0.52    0.37
OMe, NMe2     1.65     1.91   2.00      -0.09    2.18   -0.27    0.16    1.75    1.69    0.22

a) Calculated by Eq. 25. For example, the value for 2-OMe-5-NH2-pyrazine is calculated as follows: π(2-OMe-5-NH2) = 0.954{π2PR(OMe) + π2PR(NH2)} + 0.562{ρ(AM)(NH2) × σR(OMe) + ρ(AM)(OMe) × σR(NH2)} - 0.082 = 0.954(0.99 + 0.21) + 0.562{0.74 × (-0.57) + 0} - 0.082 = 0.82
b) For the π2PR values of the component substituents, see Table 4.
c) For the πPhX values of the component substituents, see Table 1.
d) For the π2PY values of the component substituents, see Table 1.
e) From ref. 6.
f) Omitted from the analysis but calculated by Eq. 25.
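Footnote a of Table 14 can likewise be reproduced. The values below are those quoted in the Table 13 and Table 14 footnotes ($\pi_{2PR}$, $\rho_{(AM)}$, $\sigma_R$); everything else (names, layout) is illustrative. With these rounded inputs Eq. 25 gives about 0.826 for 2-OMe-5-NH2-pyrazine, within 0.01 of the tabulated 0.82, and 0.930 for 2-Cl-5-NH2-pyrazine, matching the tabulated 0.93.

```python
# Values quoted in the footnotes of Tables 13 and 14
PI_2PR = {"Cl": 0.96, "NH2": 0.21, "OMe": 0.99}
RHO_AM = {"Cl": 0.0, "NH2": 0.74, "OMe": 0.0}  # only NH2 is an H-donor here
SIGMA_R = {"Cl": -0.25, "OMe": -0.57}


def sum_rho_sigma_r(x: str, y: str) -> float:
    """rho(AM)(X)*sigma_R(Y) + rho(AM)(Y)*sigma_R(X), skipping non-donors."""
    total = 0.0
    for donor, partner in ((x, y), (y, x)):
        if RHO_AM[donor] != 0.0:
            total += RHO_AM[donor] * SIGMA_R[partner]
    return total


def eq25(x: str, y: str) -> float:
    return 0.954 * (PI_2PR[x] + PI_2PR[y]) + 0.562 * sum_rho_sigma_r(x, y) - 0.082


print(f"{eq25('OMe', 'NH2'):.3f}")  # 0.826
print(f"{eq25('Cl', 'NH2'):.3f}")   # 0.930
```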
TABLE 15 Hydrophobicity Parameters of 2,3-Disubstituted Pyrazines (2,3-PR)

                            π(2,3-PR)
X, Y               log P     Obsd. Calcd.(a)      dev.  Σπ2PR(b)      dev.  ΣπPhX(c)      dev.  Σπ2Py(d)      dev.
Me, Me              0.54      0.80      0.94     -0.14      0.94     -0.14      1.12     -0.32      0.92     -0.12
Me, Et              1.07      1.33      1.41     -0.08      1.42     -0.09      1.58     -0.25      1.41     -0.08
Me, Pr              1.57      1.83      1.87     -0.04      1.90     -0.07      2.11     -0.28      -(e)         -
Me, n-Bu            2.10      2.36      2.38     -0.02      2.42     -0.06      2.69     -0.33      -(e)         -
Et, Et              1.51      1.77      1.88     -0.11      1.90     -0.13      2.04     -0.27      1.90     -0.13
Me, OMe             1.24      1.50      1.50      0.00      1.46      0.04      0.54      0.96      1.15      0.35
Me, OEt             1.82      2.08      2.04      0.04      2.01      0.07      0.94      1.14      1.62      0.46
Me, OCHMe2          2.24      2.50      2.54     -0.04      2.51     -0.01      1.41      1.09      -(e)         -
Et, OMe             1.80      2.06      1.97      0.09      1.94      0.12      1.00      1.06      1.64      0.42
Me, SMe             1.81      2.07      1.90      0.17      1.90      0.17      1.17      0.90      1.52      0.55
CN, CN              0.38      0.64      0.56      0.08      0.50      0.14     -1.14      1.78     -0.50      1.14
CO2Me, CO2Me       -0.02      0.24      0.26     -0.02      0.50     -0.26     -0.02      0.26     -0.58      0.82
CO2Et, CO2Et        0.65      0.91      0.84      0.07      1.08     -0.17      1.02     -0.11      0.44      0.47

a) Calculated by Eq. 27. For example, the value for 2-Me-3-OMe-pyrazine is calculated as follows:
$$\pi(\text{2-Me-3-OMe}) = 0.992\{\pi_{2PR}(\text{Me}) + \pi_{2PR}(\text{OMe})\} - 0.050\,E_s(\text{Me})\,E_s(\text{OMe}) + 0.082 = 0.992(0.47 + 0.99) - 0.050(-1.24)(-0.55) + 0.082 = 1.50$$
b) For the π2PR value of the component substituents, see Table 4. c) For the πPhX value of the component substituents, see Table 1. d) For the π2Py value of the component substituents, see Table 1. e) The component π2Py values are unknown.
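Footnote (a) of Table 15 spells out Eq. 27, so its example can be checked directly. In this minimal sketch the function name `pi_23pr` is ours, and the only inputs are the π2PR and Es values quoted in the footnote; the Me, Me case reuses the same two Me values.

```python
def pi_23pr(pi_x, pi_y, es_x, es_y):
    """pi of a 2,3-disubstituted pyrazine by Eq. 27, in the form used in footnote (a).

    The Es(X)*Es(Y) product grows as the vicinal pair becomes bulkier, so the
    negative coefficient lowers the predicted pi for crowded pairs.
    """
    return 0.992 * (pi_x + pi_y) - 0.050 * es_x * es_y + 0.082


# Inputs quoted in the footnote: pi_2PR(Me) = 0.47, pi_2PR(OMe) = 0.99,
# Es(Me) = -1.24, Es(OMe) = -0.55.
me_ome = pi_23pr(0.47, 0.99, -1.24, -0.55)
me_me = pi_23pr(0.47, 0.47, -1.24, -1.24)
print(f"{me_ome:.2f} {me_me:.2f}")  # 1.50 0.94
```

Both results match the Calcd. column of Table 15 for the Me, OMe and Me, Me rows.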
The calculated values according to Eqs. 23, 25 and 27 are listed in Tables 13-15.
3.2.2 Physicochemical Meaning of the Correlations: The results described above demonstrated that the log P value of disubstituted pyrazines having non-amphiprotic substituents could be approximately predicted by the additivity model of $\pi_{2PR}$ values, provided that their steric effects are not significant. The bidirectional electronic correction terms expressed by $\rho\sigma_I$ and/or $\rho\sigma_R$ are needed only for amphiprotic substituents (H-donors), possibly because the electron-withdrawing effect of the ring N-atoms in the pyrazine ring tends to enhance the hydrogen-donating ability but reduce the proton-accepting capability of the substituents. The net hydrogen-donating ability of certain amphiprotic groups could thus be higher than that in substituted benzene systems, so that the $\rho$ value is significant for amphiprotic groups. For hydrogen acceptors with lower original $\rho$ values, the situation could be different: under such conditions, hydrogen acceptors would behave as if they were non-hydrogen bonders, for which the "effective" $\rho$ is zero or very close to zero. Therefore, only the terms for the amphiprotic substituents participate in the correlation of the type of Eq. 20.
It should be noted that the $b_R$ value is comparable with $b_I$ in Eq. 23 for 2,6-PR. Considering that the resonance effect is generally of minor importance for meta-derivatives, the significance of the $\rho\sigma_R$ term, its effect being almost equivalent to that of the $\rho\sigma_I$ term, might indicate a resonance intervention through the ring-N atom sandwiched by the two substituents. For 2,5-PR, it is not unexpected that the $\rho\sigma_R$ term is significant, because the two substituents are "para" to each other. The reason why the $b_R$ value in Eq. 23 is close to that in Eq. 25 is not clear. In fact, however, examination of UV spectra in water showed that the absorption of most 2,6-PR derivatives, including those having two bulky groups, was almost identical with, but slightly more bathochromic than, that of the corresponding 2,5-PR compounds, suggesting no significant difference in the resonance structure between the two series. In any case, considering that $\sigma_p = \sigma_I + \sigma_R$ and $\sigma_m = \sigma_I + 0.4\sigma_R$ (10), the observation that the $b_I$ and $b_R$ coefficients of the $\rho\sigma_I$ and $\rho\sigma_R$ terms in all the equations obtained were considerably smaller than unity leads us to conclude that the partitioning behavior in the pyrazine system is perturbed by the X-Y electronic interactions to a "lesser" extent than in the benzenoid system.
This means that some of the electronic component attributable to the interaction between the ring-N atoms and the substituents has already been simulated in the $\pi(\text{disubst})_{PR}$ values by using as the reference the $\pi_{2PR}$ values, which include that interaction.
The introduction of bulky substituents decreased the log P value in the 2,3- and 2,6-PR series. Besides the fact mentioned above that the smaller water molecule is more easily accessible to the crowded ring-N atom, steric hindrance to the attainment of a planar conformation might be considered. This possibility could probably be eliminated, however, at least for 2,6-PR, judging from the above-mentioned UV spectral features.
2,5-Dimethoxypyrazine was the only outlier among the 68 compounds examined here. Interestingly, the summation of $\pi_{2Py}$ values for the two OMe groups gave a better prediction than that of $\pi_{2PR}$ values, as shown in Table 14. In the analysis of $\pi_{2Py}$ (Eq. 3), the 2-OMe group was shown to be one of the outliers. Our explanation is that 2-methoxypyridine undergoes a 1-to-1 chelating-type solvation in water, but not in octanol, that enhances the log P value irregularly. Since the effect of such solvation occurring at both of the two methoxy substituents could also be simulated on the basis of $\pi_{2PR}$, no reasonable explanation could be made of the behavior of this compound.
3.2.3 Polysubstituted Pyrazines: The empirical correlation equations for the π values of disubstituted pyrazines were applicable to predicting those of the polysubstituted pyrazines given in Table 16. Taking into account all types of interactions (2,6-, 2,5- and 2,3-interactions), the general form of the prediction equation is written as Eq. 28.
TABLE 16 Hydrophobicity Parameters of Polysubstituted Pyrazines (Poly-PR)

                                  π(poly-PR)
X2   X3   X5   X6        log P     Obsd. Calcd.(a)      dev.  Σπ2PR(b)      dev.
Me   Me   Me   H          0.95      1.21      1.19      0.02      1.41     -0.20
Me   Me   Me   Me         1.28      1.54      1.44      0.10      1.88     -0.34
Et   Et   Me   H          1.95      2.21      2.10      0.11      2.37     -0.16
Cl   Me   Me   H          1.50      1.76      1.69      0.07      1.90     -0.14
CN   CN   Cl   H          0.95      1.21      1.35     -0.14      1.46     -0.25
CN   CN   Me   Cl         1.29      1.55      1.70     -0.15      1.93     -0.38
CN   CN   Et   Cl         1.79      2.05      2.12     -0.07      2.41     -0.36
CN   CN   Me   Bu         2.30      2.56      2.51      0.05      2.92     -0.36
CN   CN   Bu   Cl         2.89      3.15      3.04      0.11      3.41     -0.25

a) Calculated by Eq. 28. For example, the value for 2,3-di-CN-5-Me-6-Cl-pyrazine is calculated as follows:
$$\begin{aligned}
\pi(\text{CN,CN,Me,Cl}) &= \pi(2,3\text{-PR})(\text{CN,CN}) + \pi(3,5\text{-PR})(\text{CN,Me}) + \pi(5,6\text{-PR})(\text{Me,Cl}) \\
&\quad + \pi(2,6\text{-PR})(\text{Cl,CN}) + \pi(2,5\text{-PR})(\text{CN,Me}) + \pi(3,6\text{-PR})(\text{CN,Cl}) \\
&\quad - 2\{2\pi_{2PR}(\text{CN}) + \pi_{2PR}(\text{Me}) + \pi_{2PR}(\text{Cl})\} \\
&= \text{Eq.27(CN,CN)} + \text{Eq.23(CN,Me)} + \text{Eq.27(Me,Cl)} + \text{Eq.23(Cl,CN)} + \text{Eq.25(CN,Me)} + \text{Eq.25(CN,Cl)} \\
&\quad - 2\{2\pi_{2PR}(\text{CN}) + \pi_{2PR}(\text{Me}) + \pi_{2PR}(\text{Cl})\} \\
&= 0.56 + 0.70 + 1.45 + 1.18 + 0.61 + 1.07 - 2(2 \times 0.25 + 0.47 + 0.96) = 1.70
\end{aligned}$$
in which Eq.N(X,Y) represents the π value calculated by Eq. N for X,Y-disubstituted pyrazines.
b) For the π2PR value of the component substituents, see Table 4.
$$\pi(\text{poly-PR}) = \sum \pi(i,j\text{-PR}) - n \sum \pi_{2PR}(X) \qquad [28]$$

where $\pi(\text{poly-PR})$ represents the difference in log P values between the given compound and unsubstituted pyrazine; $\pi(i,j\text{-PR})$ is the value calculated with one of the correlation Eqs. 23, 25 and 27, depending on the relative orientation of $X_i$ and $X_j$, the first sum running over all pairs of substituents; X is any substituent on the four available positions; $\sum\pi_{2PR}(X)$ is the sum of the $\pi_{2PR}$ values over all substituents; and n is 1 and 2 for tri- and tetrasubstituted derivatives, respectively.
By the addition of the $\pi(i,j\text{-PR})$ terms in Eq. 28, the contribution of each substituent is counted one or two times extra, depending on whether the compound is tri- or tetrasubstituted, so the negative correction term was needed.
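Eq. 28 is straightforward to mechanize once each pair of ring positions is mapped to its disubstituted series. This sketch assumes the pyrazine numbering N1, C2, C3, N4, C5, C6 and the pair-to-equation mapping used in the Table 16 footnote; `pi_poly`, `PAIR_SERIES` and `FOOTNOTE_PAIRS` are our own names. The pairwise π values are passed in through a callable rather than recomputed, because the full forms of Eqs. 23, 25 and 27 are not repeated here.

```python
from itertools import combinations

# Ring carbons of pyrazine (N1, C2, C3, N4, C5, C6) and the disubstituted series
# whose correlation equation applies to each pair, as in the Table 16 footnote.
PAIR_SERIES = {
    frozenset({2, 3}): "2,3-PR", frozenset({5, 6}): "2,3-PR",  # vicinal, Eq. 27
    frozenset({2, 5}): "2,5-PR", frozenset({3, 6}): "2,5-PR",  # "para", Eq. 25
    frozenset({2, 6}): "2,6-PR", frozenset({3, 5}): "2,6-PR",  # flanking N, Eq. 23
}


def pi_poly(subs, pi_2pr, pi_pair):
    """Eq. 28: sum of pairwise pi(i,j-PR) values minus n * sum of pi_2PR values.

    subs    -- {ring position: substituent} with three or four entries
    pi_2pr  -- {substituent: pi_2PR from the monosubstituted series}
    pi_pair -- callable (series, x, y) -> pi of the X,Y-disubstituted pyrazine
    """
    k = len(subs)
    if k not in (3, 4):
        raise ValueError("Eq. 28 is stated for tri- and tetrasubstituted pyrazines")
    n = k - 2  # each substituent appears in k - 1 pairs, i.e. n times too often
    pair_sum = sum(
        pi_pair(PAIR_SERIES[frozenset({p, q})], subs[p], subs[q])
        for p, q in combinations(sorted(subs), 2)
    )
    return pair_sum - n * sum(pi_2pr[x] for x in subs.values())


# Pairwise values quoted in the Table 16 footnote for 2,3-di-CN-5-Me-6-Cl-pyrazine,
# keyed by (series, alphabetically sorted substituent pair).
FOOTNOTE_PAIRS = {
    ("2,3-PR", ("CN", "CN")): 0.56, ("2,6-PR", ("CN", "Me")): 0.70,
    ("2,3-PR", ("Cl", "Me")): 1.45, ("2,6-PR", ("CN", "Cl")): 1.18,
    ("2,5-PR", ("CN", "Me")): 0.61, ("2,5-PR", ("CN", "Cl")): 1.07,
}

result = pi_poly(
    {2: "CN", 3: "CN", 5: "Me", 6: "Cl"},
    {"CN": 0.25, "Me": 0.47, "Cl": 0.96},
    lambda series, x, y: FOOTNOTE_PAIRS[(series, tuple(sorted((x, y))))],
)
print(f"{result:.2f}")  # 1.71
```

Summing the six rounded pairwise values gives 1.71, whereas the footnote reports 1.70; the 0.01 difference presumably comes from rounding of the intermediate values. A useful sanity check of the correction term is that perfectly additive pair values (each pair equal to the sum of its two π2PR values) return exactly the sum of the π2PR values.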
The calculated log P values simulated the observed values for the 9 compounds very well (r = 0.99) and are listed in Table 16. The good predictive power of Eq. 28 was to be expected, since the substituents included are either non-hydrogen bonders or "weak" hydrogen acceptors.
Although the reliability of Eq. 28 should be examined further with various pyrazines substituted by amphiprotic substituents, it indicates that Eqs. 23, 25 and 27 are predictive enough as far as this data set is concerned.

4. CONCLUSIONS
The above analyses are believed to indicate that bidirectional Hammett-Taft-type treatments can be applied to rationalize the variations in the π values of substituents, or the increment in log P with polysubstitution, in each of the various types of diazine systems, unless specific solvations and/or intramolecular interactions of the substituents are involved. For monosubstituted diazines, the pair consisting of one of the two ring-N atoms and the substituent was regarded as a "single" substituent, and the interactions between this "substituent set" and the other ring-N atom that regulate the relative solvation with the partitioning solvents were analyzed. The π value of the corresponding substituents in substituted pyridines was used as the reference. For the di- to polysubstituted pyrazines, the interactions between the substituent pairs were primarily considered, the sum of the substituent π values in the monosubstituted pyrazine system being the reference hydrophobicity. Although the factors governing the hydrophobicity of substituted diazines were separated into components as far as well-behaved compounds are concerned, they were rather complex, and the analyses should be done very carefully. These factors should be considered in constructing computer-aided automatic systems to predict the log P values of heteroaromatic compounds.
Because of restrictions relating to the stability of compounds, the number of compounds in some monosubstituted diazine systems was low. The regular σ value was the better descriptor of the electronic effect for the monosubstituted diazine systems, whereas for the disubstituted pyrazines the $\sigma_I$ and $\sigma_R$ values were better. This discrepancy may be attributable to the restrictions in accumulating reliable data. Since the use of regular σ-type parameters in heteroaromatic systems is sometimes subject to skepticism, a uniform treatment of the electronic effect on the relative solvation should be examined with compound sets including greater numbers of substituents having various degrees of electronic effect and hydrogen-bonding behavior. The outlying behavior of substituents in some of the 2-substituted series should also be clarified in terms of experimental physical-organic chemistry.
REFERENCES
1. T. Fujita, J. Iwasa, and C. Hansch, J. Am. Chem. Soc., 86 (1964) 5175.
2. A. Leo, C. Hansch, and D. Elkins, Chem. Rev., 71 (1971) 525.
3. C. Hansch and A. J. Leo, Substituent Constants for Correlation Analysis in Chemistry and Biology, John Wiley and Sons, New York, 1979.
4. J. Iwasa, T. Fujita, and C. Hansch, J. Med. Chem., 8 (1965) 150.
5. S. J. Lewis, M. S. Mirrlees, and P. J. Taylor, Quant. Struct.-Act. Relat., 2 (1983) 1.
6. S. J. Lewis, M. S. Mirrlees, and P. J. Taylor, Quant. Struct.-Act. Relat., 2 (1983) 100.
7. J. Bradshaw and P. J. Taylor, Quant. Struct.-Act. Relat., 8 (1989) 279.
8. a) T. Fujita, Prog. Phys. Org. Chem., 14 (1983) 75. b) Y. Nakagawa, K. Izumi, N. Oikawa, T. Sotomatsu, M. Shigemura, and T. Fujita, Environ. Toxicol. Chem., 11 (1992) 901.
9. T. Fujita and T. Nishioka, Prog. Phys. Org. Chem., 12 (1976) 49.
10. M. Charton, Prog. Phys. Org. Chem., 13 (1981) 119.
11. E. Kutter and C. Hansch, J. Med. Chem., 12 (1969) 647.
12. M. Charton, in: N. B. Chapman and J. Shorter (Eds.), Correlation Analysis in Chemistry, Plenum Press, New York, 1978, pp. 175-268.
13. C. Yamagami, N. Takao, and T. Fujita, Quant. Struct.-Act. Relat., 9 (1990) 313.
14. C. Yamagami, N. Takao, and T. Fujita, J. Pharm. Sci., 80 (1991) 772.
15. O. Exner, in: N. B. Chapman and J. Shorter (Eds.), Correlation Analysis in Chemistry, Plenum Press, New York, 1978, pp. 439-540.
QSAR and Drug Design: New Developments and Applications, T. Fujita, editor. © 1995 Elsevier Science B.V. All rights reserved.
HYDROPHOBICITIES OF DI- TO PENTAPEPTIDES HAVING UNIONIZABLE SIDE CHAINS AND CORRELATION WITH SUBSTITUENT AND STRUCTURAL PARAMETERS

MIKI AKAMATSU and TOSHIO FUJITA
Department of Agricultural Chemistry, Kyoto University, Kyoto 606-01, Japan

ABSTRACT: Under standardized experimental conditions, we measured the partition ratio P' of a large number of zwitterionic di- to pentapeptides composed of amino acids having unionizable side chains in a 1-octanol/pH 7.0 aqueous phosphate buffer system, as an approximation of the "molecular" partition coefficient P. The variations in the log P' values of the peptides were analyzed with free-energy-related physicochemical parameters for the side chain substituents and substructures. The side chain parameters representing the intrinsic hydrophobicity, the steric effect on the relative solvation of functional groups on the backbone, and the conformational potential index derived from the Chou-Fasman β-turn propensity parameters were shown to be significant. For polar side chains, specific indicator variables were additionally required, attributable to intramolecular hydrogen-bond formation and to the "polar proximity effect", the augmentation of hydrophobicity observed when polar groups are crowded together. The contribution of the proline residue to log P' was shown to depend not only upon its location on the backbone but also upon the total number of residues in the peptide.

1. INTRODUCTION
The hydrophobicity of component amino acids and peptide segments is believed to govern not only the three-dimensional structure of proteins, which determines their biological functions (1), but also the partitioning of their partial domains into hydrophobic biomembrane phases (2). In addition, peptides and their analogs have been attracting interest as potential drugs (3). The hydrophobicity of drugs is regarded as a highly important parameter controlling their transport from the site of administration to the site of action through a number of biomembranes, as well as their binding to hydrophobic receptor sites (4). The log P value, P being the partition coefficient of neutral molecules measured with the 1-octanol/water system, has been widely used to represent molecular hydrophobicity (3, 4). In this article, we show that the log P' values of a number of di- to pentapeptides having unionizable side chains, P' being the partition ratio in the 1-octanol/pH 7.0 aqueous buffer system at 25 °C, are analyzable by the regression technique with free-energy-related physicochemical parameters of the side chain substituents of the component amino acids (5, 6). The log P' value under these conditions is believed to be close to the log P of the zwitterionic "neutral" peptides. In tetra- and pentapeptides, conformational factors such as β-turn formation (7) are shown to contribute to the net molecular hydrophobicity in addition to the factors considered for di- and tripeptides. The correlation equation should be able to predict the log P' values of peptides, at least up to pentapeptides consisting of amino acids with unionizable side chains.

2. HYDROPHOBICITY OF PEPTIDES
2.1 Measurement of Partition Ratio
Each peptide dealt with here showed a pH-log P' profile with a "flat" parabolic form (5). The maximum log P' value, which is expected at the isoelectric point, should be the "true" hydrophobic parameter, log P, of the zwitterionic molecule, in which the electric charges cancel. Unfortunately, the isoelectric points of most peptides were difficult to measure because of their limited solubility. Because of the flat form, however, the pH profile of the log P' value is almost horizontal between pH 5 and 7. We ascertained for a subset of compounds that the log P' measured at pH 7 indeed parallels that measured at pH 6 (5). A theoretically reasonable pH-log P' profile with a "flat" parabolic form was not obtained when NaCl was used to adjust the ionic strength of the acidic to neutral buffer solutions (5); this was considered to be due to the partitioning of ion pairs with the chloride counter anions. The phosphate anion, probably existing as a mixture of mono- and divalent species under acidic to neutral pH conditions, is perhaps much less hydrophobic than chloride. Therefore, the buffer solution should be prepared from sodium hydrogen phosphate and sodium dihydrogen phosphate only.

2.2 Physicochemical Side-chain and Structural Parameters
In the course of preliminary analyses, we found that the variations in the log P'(pH 7) value are governed at least by the hydrophobic and steric effects of the side chain substituents of the component amino acids. As the hydrophobic parameter of side chain substituents, we used the π value of general utility for aliphatic substituents, evaluated under conditions free from such components as intramolecular stereoelectronic and hydrogen-bonding interactions (5, 6). For side chains with a polar group or heteroatom, the "intrinsic" aliphatic π value was evaluated from the log P value of related (but not peptidic) compounds in
which the polar group is separated by at least two methylene units from a chromophore (8). Our π value for alkyl side chains is equivalent to that proposed by Hansch and Leo (9) when the branching factor is taken into account, being close to that of Fauchère and Pliška (10). Our π value for the polar side chains of serine, threonine, methionine, tryptophan, glutamine, and asparagine is more negative than the corresponding value of Fauchère and Pliška, as will be shown later. For the steric effects of side chain substituents except that of proline, we used either the E's or the E's^c parameter, depending upon the situation. E's is the Dubois steric parameter (11), defined to improve the Taft Es parameter (12). E's^c is the "corrected" Dubois steric parameter, related to the original E's by Eq. 1, where n is the number of α-hydrogen atoms in the aliphatic substituent.

$$E_s^{\prime c} = E_s' - 0.306\,(3 - n) \qquad [1]$$
The "correction" term in Eq. 1 takes the same form as that for the Taft Es made by Hancock and coworkers (13) to eliminate possible hyperconjugation effects of alkyl substituents on the reference reaction rate from which Es is defined. As indicated previously (14), however, the E's^c (improved Es^c) value is not a parameter corrected for the hyperconjugation effect attributable to the α-hydrogen atoms of substituents, but one representing not only the steric bulk but also the effect of α-branching. The coefficient of the correction term is fixed as -0.306 in Eq. 1, but values between -0.25 and -0.35 were found to be equally good. The relevance of using E's^c for the steric effect of aliphatic substituents is discussed in detail in our previous analyses of the log P value of aliphatic amines and of the ion-pair formation-partition constant of aliphatic ammonium ions (14). By definition, the bulkier and the more α-branched the substituent, the more negative the E's^c value becomes. For most side chain substituents dealt with in this article, the E's value has been defined (11). The E's values of the indole-3-methyl group in tryptophan, the aminocarbonylmethyl group in asparagine, and its higher homolog in glutamine were estimated using a highly linear relationship (5) between the E's value and Charton's υ steric parameter (15, 16). That of the 4-hydroxybenzyl group in tyrosine was taken to be equivalent to that of the benzyl group in phenylalanine. The reference points of E's and E's^c were shifted so that E's(H) and E's^c(H) of the "side chain" in glycine were zero (5, 6, 17). To deal with the conformational effect arising from the possible β-turn structure in tetra- and pentapeptides, we adopted the β-turn "potential" index for
component amino acids proposed by Chou and Fasman (7). Their β-turn index is defined statistically for each amino acid in each of the four consecutive positions from the data for 457 β-turned backbone substructures found in 29 proteins of known sequence and crystallographic structure. As will be discussed later, the logarithm of the β-turn index f of an amino acid in each of the four consecutive positions (log f_i, log f_i+1, log f_i+2, log f_i+3) was regarded as a free-energy-related β-turn potential parameter of that amino acid. For the inductive electronic parameter of side chain substituents, the Charton σI value was used (16). The relevant parameter sets are listed in Table 1. Factors governing the log P'(pH 7) values shown in Table 2 were analyzed by the multiple regression technique in terms of the above-mentioned free-energy-related physicochemical parameters for the side chain substituents and indicator variables for particular substructures.

Table 1. Hydrophobicity Scale, Steric and Electronic Parameters, and β-Turn Potential Indices of Amino Acid Side Chains

Amino acid   π(a)  E's(b) E's^c(c)  σI(d)   log f_i log f_i+1 log f_i+2 log f_i+3   log f_t
Gly          0.00    0.00     0.00   0.00      0.09      0.00      0.31      0.25      0.19
Ala          0.32   -1.12    -0.20  -0.01     -0.19     -0.02     -0.37     -0.17     -0.19
Val          1.27   -1.60    -1.29   0.01     -0.19     -0.26     -0.46     -0.13     -0.28
Leu          1.81   -2.05    -1.44  -0.01     -0.18     -0.54     -0.40     -0.09     -0.24
Ile          1.81   -2.12    -1.81  -0.01     -0.17     -0.39     -0.57     -0.14     -0.27
Phe          1.95   -1.51    -0.90   0.03     -0.07     -0.37     -0.17     -0.07     -0.19
Tyr          1.20   -1.51    -0.90   0.03      0.03     -0.10      0.09      0.18      0.05
Trp          1.92  -1.47f    -0.86   0.00     -0.10     -0.89     -0.10      0.30      0.00
Met          0.61   -1.64    -1.03   0.04     -0.07     -0.07    -0.85g     -0.14     -0.19
Ser         -1.49   -1.09    -0.48   0.11      0.14      0.17      0.12      0.03      0.13
Thr         -1.18   -1.04    -0.73   0.04      0.02      0.09     -0.09     -0.03     -0.00
Asn         -1.95  -1.60f    -0.98   0.06      0.25     -0.02      0.33     0.004      0.18
Gln         -1.41  -1.43f    -0.82   0.05     -0.08      0.01     -0.35      0.09     -0.01
Pro          0.86       -        -      -      0.13      0.53     -0.23     -0.11      0.19

a) From refs. 5 and 17. b) From ref. 11, unless noted. The reference point is shifted so that E's(H) = 0. c) From refs. 5 and 14. The reference point is shifted so that E's^c(H) = 0. d) From ref. 16. e) Calculated from ref. 7 (all log f columns). f) See text. g) Not reliable. The corrected value -0.33 was used in Eq. 17.
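Eq. 1 can be checked against Table 1. Because both the E's and E's^c columns are shifted so that the value for H is zero, the sketch below applies Eq. 1 to a side chain and to H and takes the difference. It assumes that H counts as having n = 0 α-hydrogens; this is our inference from the tabulated values, not a statement in the text. The α-hydrogen counts in the hypothetical `N_ALPHA_H` table are read from the side-chain structures.

```python
# Table 1 E's values (H-referenced scale) and alpha-hydrogen counts of the side
# chains: CH3 (3), CH(CH3)2 (1), CH2CH(CH3)2 (2), CH(CH3)C2H5 (1), CH2OH (2),
# CH(OH)CH3 (1).
E_S_PRIME = {"Ala": -1.12, "Val": -1.60, "Leu": -2.05, "Ile": -2.12, "Ser": -1.09, "Thr": -1.04}
N_ALPHA_H = {"Ala": 3, "Val": 1, "Leu": 2, "Ile": 1, "Ser": 2, "Thr": 1}


def es_c(es_prime, n_alpha_h):
    """Eq. 1: E's^c = E's - 0.306 (3 - n)."""
    return es_prime - 0.306 * (3 - n_alpha_h)


def es_c_h_referenced(es_prime_h_ref, n_alpha_h):
    """Eq. 1 re-referenced so that the value for H is zero, as in Table 1.

    Assumption: H itself is treated as n = 0, with E's(H) = 0 on this scale.
    """
    return es_c(es_prime_h_ref, n_alpha_h) - es_c(0.0, 0)


for aa in E_S_PRIME:
    print(aa, f"{es_c_h_referenced(E_S_PRIME[aa], N_ALPHA_H[aa]):.2f}")
# Table 1 lists E's^c = -0.20, -1.29, -1.44, -1.81, -0.48, -0.73 for these residues.
```

On this scale the correction reduces to adding 0.306 per α-hydrogen, which is why the doubly α-branched-free methyl group of alanine moves furthest (by 0.92) from its E's value.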
2.3 Di- and Tripeptides
First, we examined the log P' values of di- and tripeptides composed of the nonpolar amino acids glycine, alanine, valine, leucine, isoleucine, and phenylalanine in terms of the sum of the side chain π values of the component amino acids, and derived Eq. 2 with an indicator variable Itri.
Table 2. Log P' and Physicochemical Parameters of Di- to Pentapeptides

No.  Peptide        Σπ   E's^c  ΣE's^c   E's^c  Σlog f   log f  log P'  Calcd.  Calcd.
                        (RN)    (RM)    (RC)           (i+2)   Obsd. (Eq.18) (Eq.19)
  1  FL           3.76   -0.90    0.00   -1.44    0.00    0.00   -1.17   -1.23   -1.24
  2  LF           3.76   -1.44    0.00   -0.90    0.00    0.00   -1.15   -1.36   -1.38
  3  FF           3.90   -0.90    0.00   -0.90    0.00    0.00   -0.85   -0.93   -0.95
  4  LL           3.62   -1.44    0.00   -1.44    0.00    0.00   -1.46   -1.66   -1.67
  5  LV           3.08   -1.44    0.00   -1.29    0.00    0.00   -2.05   -2.12   -2.13
  6  VL           3.08   -1.29    0.00   -1.44    0.00    0.00   -2.07   -2.09   -2.09
  7  AI           2.13   -0.20    0.00   -1.81    0.00    0.00   -2.60   -2.50   -2.50
  8  II           3.62   -1.81    0.00   -1.81    0.00    0.00   -1.82   -1.98   -1.98
  9  LI           3.62   -1.44    0.00   -1.81    0.00    0.00   -1.64   -1.77   -1.78
 10  VV           2.54   -1.29    0.00   -1.29    0.00    0.00   -2.82   -2.55   -2.56
 11  WW           3.84   -0.86    0.00   -0.86    0.00    0.00   -0.27   -0.21   -0.22
 12  WF           3.87   -0.86    0.00   -0.90    0.00    0.00   -0.47   -0.56   -0.58
 13  WA           2.24   -0.86    0.00   -0.20    0.00    0.00   -1.98   -1.89   -1.91
 14  WL           3.73   -0.86    0.00   -1.44    0.00    0.00   -0.73   -0.86   -0.87
 15  WY           3.12   -0.86    0.00   -0.90    0.00    0.00   -1.13   -1.14   -1.14
 16  LY           3.01   -1.44    0.00   -0.90    0.00    0.00   -1.94   -1.93   -1.94
 17  YL           3.01   -0.90    0.00   -1.44    0.00    0.00   -1.75   -1.80   -1.80
 18  VY           2.47   -1.29    0.00   -0.90    0.00    0.00   -2.52   -2.36   -2.37
 19  FY           3.15   -0.90    0.00   -0.90    0.00    0.00   -1.68   -1.51   -1.51
 20  YY           2.40   -0.90    0.00   -0.90    0.00    0.00   -1.87   -2.08   -2.08
 21  LM           3.12   -1.44    0.00   -1.02    0.00    0.00   -1.87   -2.01   -2.02
 22  ML           3.12   -1.02    0.00   -1.44    0.00    0.00   -1.84   -1.91   -1.91
 23  MV           2.58   -1.02    0.00   -1.29    0.00    0.00   -2.53   -2.37   -2.37
 24  FM           3.26   -0.90    0.00   -1.02    0.00    0.00   -1.59   -1.58   -1.59
 25  SL           0.32   -0.48    0.00   -1.44    0.00    0.00   -2.49   -2.66   -2.67
 26  PF           2.81       -    0.00   -0.90    0.00    0.00   -2.07       -   -1.95
 27  PL           2.67       -    0.00   -1.44    0.00    0.00   -2.41       -   -2.24
 28  PI           2.67       -    0.00   -1.81    0.00    0.00   -2.56       -   -2.35
 29  FP           2.81   -0.90    0.00       -    0.00    0.00   -1.36       -   -1.37
 30  LP           2.67   -1.44    0.00       -    0.00    0.00   -1.76       -   -1.79
 31  IP           2.67   -1.81    0.00       -    0.00    0.00   -1.79       -   -1.99
 32  FFF          5.85   -0.90   -0.90   -0.90    0.00    0.00   -0.02    0.05    0.04
 33  GFF          3.90    0.00   -0.90   -0.90    0.00    0.00   -1.33   -1.29   -1.31
 34  FVG          3.22   -0.90   -1.29    0.00    0.00    0.00   -2.33   -2.27   -2.29
 35  FVF          5.17   -0.90   -1.29   -0.90    0.00    0.00   -0.76   -0.71   -0.72
 36  FVA          3.54   -0.90   -1.29   -0.20    0.00    0.00   -2.19   -2.03   -2.05
 37  LVV          4.35   -1.44   -1.29   -1.29    0.00    0.00   -2.10   -1.90   -1.90
 38  LII          5.43   -1.44   -1.81   -1.81    0.00    0.00   -1.11   -1.20   -1.19
 39  LVL          4.89   -1.44   -1.29   -1.44    0.00    0.00   -1.57   -1.44   -1.43
 40  LAL          3.94   -1.44   -0.20   -1.44    0.00    0.00   -2.03   -2.00   -2.01
 41  LLL          5.43   -1.44   -1.44   -1.44    0.00    0.00   -0.94   -0.97   -0.97
 42  WGG          1.92   -0.86    0.00    0.00    0.00    0.00   -2.72   -2.71   -2.73
 43  WFA          4.19   -0.86   -0.90   -0.20    0.00    0.00   -1.00   -0.90   -0.92
 44  WWL          5.65   -0.86   -0.86   -1.44    0.00    0.00    0.36    0.48    0.48
 45  LLY          4.82   -1.44   -1.44   -0.90    0.00    0.00   -1.34   -1.25   -1.24
 46  VFY          4.42   -1.29   -0.90   -0.90    0.00    0.00   -1.50   -1.38   -1.38
 47  GFY          3.15    0.00   -0.90   -0.90    0.00    0.00   -1.96   -1.87   -1.87
 48  YLV          4.28   -0.90   -1.44   -1.29    0.00    0.00   -1.45   -1.58   -1.57
 49  YVF          4.42   -0.90   -1.29   -0.90    0.00    0.00   -1.37   -1.28   -1.28
 50  YGF          3.15   -0.90    0.00   -0.90    0.00    0.00   -1.86   -2.08   -2.09
 51  YYL          4.21   -0.90   -0.90   -1.44    0.00    0.00   -1.38   -1.39   -1.38
 52  AYI          3.33   -0.20   -0.90   -1.81    0.00    0.00   -2.04   -2.09   -2.08
 53  IYV          4.28   -1.81   -0.90   -1.29    0.00    0.00   -1.77   -1.92   -1.91
 54  MLF          5.07   -1.02   -1.44   -0.90    0.00    0.00   -1.03   -0.92   -0.92
 55  LSL          2.13   -1.44   -0.48   -1.44    0.00    0.00   -2.35   -2.21   -2.21
 56  ISL          2.13   -1.81   -0.48   -1.44    0.00    0.00   -2.28   -2.41   -2.41
 57  ISI          2.13   -1.81   -0.48   -1.81    0.00    0.00   -2.64   -2.52   -2.52
 58  SLI          2.13   -0.48   -1.44   -1.81    0.00    0.00   -1.99   -2.09   -2.08
 59  SLL          2.13   -0.48   -1.44   -1.44    0.00    0.00   -2.03   -1.97   -1.97
 60  FIT          2.58   -0.90   -1.81   -0.73    0.00    0.00   -1.95   -1.68   -1.68
 61  LIT          2.44   -1.44   -1.81   -0.73    0.00    0.00   -2.14   -2.10   -2.10
 62  IIT          2.44   -1.81   -1.81   -0.73    0.00    0.00   -2.23   -2.31   -2.31
 63  LTI          2.44   -1.44   -0.73   -1.81    0.00    0.00   -2.30   -2.10   -2.10
 64  TLI          2.44   -0.73   -1.44   -1.81    0.00    0.00   -1.66   -1.93   -1.93
 65  TVL          1.90   -0.73   -1.29   -1.44    0.00    0.00   -1.97   -2.28   -2.28
 66  PLL          4.48       -   -1.44   -1.44    0.00    0.00   -1.64       -   -1.89
 67  LPL          4.48   -1.44       -   -1.44    0.00    0.00   -1.56       -   -1.44
 68  LLP          4.48   -1.44   -1.44       -    0.00    0.00   -1.58       -   -1.44
 69  IPI          4.48   -1.81       -   -1.81    0.00    0.00   -1.65       -   -1.75
 70  FGGF         3.90   -0.90    0.00   -0.90    0.18    0.31   -1.51   -1.34   -1.36
 71  VAAF         3.86   -1.29   -0.40   -0.90   -0.65   -0.37   -1.91   -2.22   -2.25
 72  LLVF         6.84   -1.44   -2.73   -0.90   -1.25   -0.46   -0.25   -0.27   -0.28
 73  LLLV         6.70   -1.44   -2.88   -1.29   -1.24   -0.40   -0.51   -0.53   -0.52
 74  VGFF         5.17   -1.29   -0.90   -0.90   -0.43   -0.17   -0.51   -0.99   -1.01
 75  AVLL         5.21   -0.20   -2.73   -1.44   -0.94   -0.40   -1.74   -1.25   -1.25
 76  IAGF         4.08   -1.81   -0.20   -0.90    0.06    0.31   -1.78   -1.73   -1.75
 77  FFFF         7.80   -0.90   -1.80   -0.90   -0.68   -0.17    1.63    1.43    1.41
 78  LLGF         5.57   -1.44   -1.44   -0.90   -0.48    0.31   -0.42   -0.50   -0.51
 79  LLAF         5.89   -1.44   -1.64   -0.90   -1.16   -0.37   -1.00   -0.77   -0.78
 80  LLLF         7.38   -1.44   -2.88   -0.90   -1.19   -0.40    0.24    0.23    0.23
 81  IIVV         6.16   -1.81   -3.10   -1.29   -1.14   -0.46   -1.41   -1.35   -1.34
 82  IIGF         5.57   -1.81   -1.81   -0.90   -0.31    0.31   -0.99   -0.82   -0.82
 83  IAAI         4.26   -1.81   -0.40   -1.81   -0.69   -0.37   -2.82   -2.41   -2.42
 84  FFGF         5.85   -0.90   -0.90   -0.90   -0.20    0.31    0.17    0.23    0.21
 85  VLVL         6.16   -1.29   -2.73   -1.44   -1.28   -0.46   -1.23   -1.00   -1.00
 86  WLLV         6.81   -0.86   -2.88   -1.29   -1.16   -0.40    0.23    0.27    0.27
 87  WGLL         5.54   -0.86   -1.44   -1.44   -0.58   -0.40    0.06   -0.53   -0.54
 88  YILG         4.82   -0.90   -3.25    0.00   -0.51   -0.40   -1.49   -1.59   -1.58
 89  FVYF         6.37   -0.90   -2.19   -0.90   -0.31    0.09   -0.32    0.29    0.30
 90  IYIV         6.09   -1.81   -2.71   -1.29   -0.96   -0.57   -1.09   -1.25   -1.24
 91  VFLT         3.85   -1.29   -2.34   -0.73   -0.99   -0.40   -1.32   -1.21   -1.22
 92  MILI         6.04   -1.02   -3.25   -1.81   -0.99   -0.40   -0.49   -0.54   -0.52
 93  VMFI         5.64   -1.29   -1.92   -1.81   -0.58   -0.17   -0.63   -0.49   -0.48
 94  PLLL         6.29       -   -2.88   -1.44   -0.89   -0.40   -1.06       -   -1.32
 95  LPLL         6.29   -1.44       -   -1.44   -0.14   -0.40   -0.92       -   -0.88
 96  LLPL         6.29   -1.44       -   -1.44   -1.04   -0.23   -1.00       -   -0.75
 97  LLLP         6.29   -1.44   -2.88       -   0.00a   0.00a   -1.18       -   -1.09
 98  IPGI         4.48   -1.81       -   -1.81    0.54    0.31   -1.69       -   -1.92
 99  VPVL         5.21   -1.29       -   -1.44   -0.21   -0.46   -1.91       -   -1.81
100  VPGV         3.40   -1.29       -   -1.29    0.53    0.31   -2.83       -   -2.50
101  YPGW         3.98   -0.90       -   -0.86    1.17    0.31   -1.25       -   -1.10
102  YPGI         3.87   -0.90       -   -1.81    0.73    0.31   -1.65       -   -1.86
103  GGFVF        5.17    0.00   -2.19   -0.90   -0.21   -0.17   -1.40   -1.26   -1.27
104  VFVGL        6.30   -1.29   -2.19   -1.44   -0.11    0.31   -0.97   -0.70   -0.70
105  VGFVF        6.44   -1.29   -2.19   -0.90   -0.49   -0.17   -0.50   -0.78   -0.77
106  GAALL        4.26    0.00   -1.84   -1.44   -0.39   -0.37   -2.55   -2.33   -2.32
107  AFGVF        5.49   -0.20   -2.19   -0.90   -0.37    0.31   -0.59   -0.71   -0.70
108  AGFVF        5.49   -0.20   -2.19   -0.90   -0.48   -0.17   -1.10   -1.08   -1.07
109  LIIGA        5.75   -1.44   -3.62   -0.20   -0.42    0.31   -1.65   -1.36   -1.36
110  GLLGF        5.57    0.00   -2.88   -0.90   -0.48    0.31   -0.18   -0.73   -0.73
111  ALLGF        5.89   -0.20   -2.88   -0.90   -0.48    0.31   -0.63   -0.53   -0.54
112  IIIIG        7.24   -1.81   -5.43    0.00   -0.87   -0.57   -0.97   -1.30   -1.31
113  IVVVI        7.43   -1.81   -3.87   -1.81   -1.01   -0.46   -0.89   -1.11   -1.13
114  FGAGI        4.08   -0.90   -0.20   -1.81    0.24    0.31   -1.87   -2.10   -2.09
115  FAAAL        4.72   -0.90   -0.60   -1.44   -0.63   -0.37   -2.23   -2.02   -2.00
116  WGGFV        5.14   -0.86   -0.90   -1.29    0.15    0.31   -0.44   -0.75   -0.75
117  WLFAA        6.32   -0.86   -2.54   -0.20   -0.98   -0.17   -0.32   -0.18   -0.16
118  IAYWG        5.25   -1.81   -1.96    0.00    0.21    0.09   -1.47   -1.12   -1.12
119  GLSVL        3.40    0.00   -3.21   -1.44   -0.45    0.12   -1.64   -1.59   -1.60
120  SLAIV        3.72   -0.48   -3.45   -1.29   -0.89   -0.37   -1.94   -1.95   -1.96
121  YTGFL        3.78   -0.90   -1.63   -1.44    0.36    0.31   -1.18   -0.97   -0.98
122  LVGTF        3.85   -1.44   -2.02   -0.90   -0.16    0.31   -1.18   -1.30   -1.30
123  L-enk(b)     4.96   -0.90   -0.90   -1.44    0.28    0.31   -0.80   -1.22   -1.22
124  M-enk(c)     3.76   -0.90   -0.90   -1.02    0.28    0.31   -1.39   -1.57   -1.57

A dash means that no value is given. a) These parameter terms were not counted; see text. b) [Leu]enkephalin (YGGFL). c) [Met]enkephalin (YGGFM).
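The summed log f column of Table 2 for the tetrapeptides appears to be the sum of the Table 1 values log f_i, log f_i+1, log f_i+2 and log f_i+3 for residues 1 to 4. For the pentapeptides, the entries we checked match, to within about 0.02, the larger of the sums over the two four-residue windows; this window rule is our inference from the table, not a procedure stated in the text, and the proline entry marked a (LLLP) is not reproduced. Met is omitted because its log f_i+2 is flagged as unreliable in Table 1, and Asn and Gln because no peptides containing them appear in Table 2.

```python
# Position-specific log f values from Table 1: (log f_i, log f_i+1, log f_i+2, log f_i+3).
LOG_F = {
    "G": (0.09, 0.00, 0.31, 0.25),    "A": (-0.19, -0.02, -0.37, -0.17),
    "V": (-0.19, -0.26, -0.46, -0.13), "L": (-0.18, -0.54, -0.40, -0.09),
    "I": (-0.17, -0.39, -0.57, -0.14), "F": (-0.07, -0.37, -0.17, -0.07),
    "Y": (0.03, -0.10, 0.09, 0.18),    "W": (-0.10, -0.89, -0.10, 0.30),
    "S": (0.14, 0.17, 0.12, 0.03),     "T": (0.02, 0.09, -0.09, -0.03),
    "P": (0.13, 0.53, -0.23, -0.11),
}


def window_sum(four):
    """Sum of log f over one four-residue window; residue k takes position i+k."""
    return sum(LOG_F[aa][k] for k, aa in enumerate(four))


def beta_turn_sum(seq):
    """Summed log f of a tetrapeptide, or of the better window of a pentapeptide."""
    if len(seq) == 4:
        return window_sum(seq)
    if len(seq) == 5:
        # Assumption inferred from Table 2: take the larger of the two windows.
        return max(window_sum(seq[:4]), window_sum(seq[1:]))
    raise ValueError("only tetra- and pentapeptides are handled here")


for pep in ("FGGF", "VAAF", "LLVF", "GGFVF", "VFVGL"):
    print(pep, f"{beta_turn_sum(pep):.2f}")
# Table 2 lists 0.18, -0.65, -1.25, -0.21 and -0.11 for these peptides.
```

For VFVGL, for example, the first window (VFVG) sums to -0.77 and the second (FVGL) to -0.11, and Table 2 gives -0.11 together with log f_i+2 = 0.31, the Gly value that the second window places at position i+2.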
$$\log P' = \underset{(0.217)}{0.804}\,\Sigma\pi - \underset{(0.417)}{0.689}\,I_{tri} - \underset{(0.752)}{4.425} \qquad [2]$$

n = 20, s = 0.333, r = 0.892, F2,17 = 32.9
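As a consistency check on Eq. 2, the fit can be redone from the tabulated data. This sketch assumes that the 20 compounds are the dipeptides Nos. 1-10 and the tripeptides Nos. 32-41 of Table 2 (the only di- and tripeptides there built solely from Gly, Ala, Val, Leu, Ile and Phe), with I_tri = 0 for dipeptides and 1 for tripeptides. A small normal-equations solver is written out to keep the example dependency-free. Because the tabulated values are rounded, the refitted coefficients should agree with Eq. 2 only to about the second decimal place.

```python
# (peptide, sum of pi, I_tri, observed log P') for the 20 nonpolar peptides.
DATA = [
    ("FL", 3.76, 0, -1.17), ("LF", 3.76, 0, -1.15), ("FF", 3.90, 0, -0.85),
    ("LL", 3.62, 0, -1.46), ("LV", 3.08, 0, -2.05), ("VL", 3.08, 0, -2.07),
    ("AI", 2.13, 0, -2.60), ("II", 3.62, 0, -1.82), ("LI", 3.62, 0, -1.64),
    ("VV", 2.54, 0, -2.82),
    ("FFF", 5.85, 1, -0.02), ("GFF", 3.90, 1, -1.33), ("FVG", 3.22, 1, -2.33),
    ("FVF", 5.17, 1, -0.76), ("FVA", 3.54, 1, -2.19), ("LVV", 4.35, 1, -2.10),
    ("LII", 5.43, 1, -1.11), ("LVL", 4.89, 1, -1.57), ("LAL", 3.94, 1, -2.03),
    ("LLL", 5.43, 1, -0.94),
]


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting (fine for tiny, well-conditioned fits)."""
    k = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


X = [[sum_pi, i_tri, 1.0] for _, sum_pi, i_tri, _ in DATA]
y = [obs for *_, obs in DATA]
slope, c_tri, intercept = ols(X, y)
print(f"{slope:.3f} {c_tri:.3f} {intercept:.3f}")  # compare Eq. 2: 0.804 -0.689 -4.425
```

Because I_tri is a 0/1 dummy, the fitted Σπ slope is the pooled within-group slope of the dipeptide and tripeptide sets, and the I_tri coefficient is the vertical offset between two parallel lines.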
In Eq. 2 and the following correlation equations, n is the number of compounds, s is the standard deviation, r is the correlation coefficient, F is the ratio of regression to residual variance, and the figures in parentheses are the 95% confidence intervals. Itri is zero for dipeptides and unity for tripeptides. In trying to improve the correlation, we noticed that the log P' values of peptides containing β-branched amino acids with α-branching in the side chain, such as valine and isoleucine, were more negative than the values calculated by Eq. 2. Since the steric effect of the "crowded" structure of the branched side chain on the relative solvation of the NHCO moiety and the terminal NH3+ and COO- groups with the partitioning solvents was anticipated to contribute to these deviations, we introduced steric terms into Eq. 2. Among the various steric parameters examined (18), the E's^c parameter worked best, yielding Eq. 3.
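$$\begin{aligned}
\log P' ={}& \underset{(0.081)}{1.031}\,\Sigma\pi - \underset{(0.225)}{0.778}\,I_{tri} + \underset{(0.131)}{0.521}\,E_s^{\prime c}(R_N) + \underset{(0.188)}{0.337}\,E_s^{\prime c}(R_M) \\
&+ \underset{(0.128)}{0.335}\,E_s^{\prime c}(R_C) - \underset{(0.264)}{4.068} \qquad [3]
\end{aligned}$$

n = 20, s = 0.113, r = 0.990, F5,14 = 141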
RN and RC represent the side chains of the amino acids at the N- and C-termini, respectively, and RM the side chain of the central amino acid of tripeptides. For dipeptides, the E's^c(RM) term is not counted, i.e., E's^c(RM) = 0. This does not mean that the "phantom" central amino acid of dipeptides is regarded as glycine; any effect attributable to such a "phantom" glycine is compensated by the Itri term. Equation 3 indicates that the steric effects of the side chains on the relative solvation with the partitioning solvents depend upon their location in the molecule. When peptides containing polar amino acids were included, no satisfactory correlation equation could be derived unless indicator variables for the presence of the respective polar amino acids were introduced, giving Eq. 4 (Y, W, M, S, and T are the one-letter codes for tyrosine, tryptophan, methionine, serine, and threonine, respectively).

$$\begin{aligned}
\log P' ={}& \underset{(0.075)}{0.960}\,\Sigma\pi - \underset{(0.136)}{0.635}\,I_{tri} + \underset{(0.096)}{0.561}\,E_s^{\prime c}(R_N) + \underset{(0.123)}{0.337}\,E_s^{\prime c}(R_M) \\
&+ \underset{(0.097)}{0.255}\,E_s^{\prime c}(R_C) + \underset{(0.079)}{0.165}\,I_Y + \underset{(0.096)}{0.352}\,I_W + \underset{(0.149)}{0.637}\,I_M \\
&+ \underset{(0.219)}{1.665}\,(I_S + I_T) - \underset{(0.210)}{3.912} \qquad [4]
\end{aligned}$$

n = 59, s = 0.138, r = 0.982, F9,49 = 148
Since the slopes of the I_S and I_T terms were very close in a preliminary calculation, the two terms were combined in Eq. 4. Peptides containing proline were not included in Eq. 4, because the E's^c value for the cyclic "side chain" of proline is difficult to estimate. Peptides containing glutamine and asparagine were also excluded, because their log P' values were too low to measure accurately. The slope of the Σπ term, very close to unity, shows that once the factors attributable to other effects are separated, the intrinsic hydrophobicity of the side chains of the constituent amino acids contributes almost unchanged to the total hydrophobicity of the peptides. The positive sign of the E's^c terms indicates that, as the side-chain substituents become bulkier and more α-branched (more negative E's^c), solvation of the backbone functional groups by the bulkier 1-octanol becomes less favorable than solvation by the smaller water, resulting in lower log P' values. Equation 4 also shows that the steric effects of the side chains on the relative solvation depend upon their location in the molecule: the coefficient of the E's^c(R_N) term is the highest, that of the E's^c(R_C) term the lowest, and that of the E's^c(R_M) term intermediate. The NH3+ group, the strongest hydrogen donor in the molecule, is solvated more effectively than other sites, such as the CONH and COO- groups, by the more basic 1-octanol relative to the less basic water. Solvation of the NH3+ group by the bulkier 1-octanol, which raises log P', should therefore be the most sensitive to steric hindrance from the N-terminal side-chain substituent. The coefficient of the I_tri term (I_tri being unity for tripeptides) indicates that log P' decreases by about 0.64 when one more peptide unit is introduced into the dipeptide backbone, other things being unchanged; this also corresponds to the π value of the CH3CONH group. The value -0.64 is, however, considerably more positive than the value [-2.17 (9, 19)] expected in the absence of any intramolecular stereoelectronic effects. This could be rationalized by the "polar proximity factor" (9), the enhancement of hydrophobicity observed when polar groups are close to each other. Changing from a di- to a tripeptide backbone increases the proximity interaction between two CONH groups. This factor can be evaluated approximately by taking the interaction between two CONH groups separated by a CH2 group as about 1.73 (9), which is close to the observed increase of about 1.6 (-0.6 + 2.2).
Fig. 1. Intramolecular Bridging-Solvation of the Hydroxyl Group of a Serine Residue and Carbonyl Group on the Backbone. R is Either 1-Oct or H. Reproduced from ref. 5 by permission of VCH Verlagsgesellschaft mbH.
Indicator variable terms specific to polar amino acid side chains are always positive: the log P' values of peptides with these side chains are higher than those predicted from the hydrophobicity of the side chains and backbone and from the factors attributed to the steric effect on the relative solvation. For the side chains of serine and threonine, the coefficient is remarkably large. This is probably because the hydroxyl group in the side chain and the carbonyl group in the backbone are well positioned for intramolecular bridging-solvation, as shown in Fig. 1. Bridging hydration of this type has been observed for glutamine in the crystal structure of human deoxyhaemoglobin (20). Abraham and Leo (21) discussed the possibility that serine and threonine adopt this type of bridging-solvation structure to rationalize the side-chain π values of Fauchère and Pliška (10), which are significantly higher than the values usually used for aliphatic systems. This type of solvation was estimated to make the log P' value 0.6 to 0.9 unit higher than that of a structure without such "intramolecular" solvation (22, 23). Subtracting the value attributed to bridging solvation, the regression coefficient for Ser and Thr residues becomes about 0.8 to 1.1. As shown in Table 3, the "corrected" regression coefficient seems to decrease more regularly than the uncorrected value with the number of bonds separating the polar heteroatom of the side chain from the backbone. As the number of bonds increases, the net inductive electron-withdrawing effect of the side-chain polar group on the backbone functional groups is gradually reduced. Since electron-withdrawing substituents raise the partition coefficient within a series of substituted compounds, regardless of whether the functional group is a hydrogen donor or a hydrogen acceptor (22), the greater the number of bonds, the smaller the increment in log P' assigned as the polar proximity factor (9) should be.

Table 3. Regression Coefficients of Indicator Variable Terms

| Amino Acid | Side Chain | Regression Coefficient^a | n^b |
|---|---|---|---|
| Ser | -CH2OH | 1.665 (0.8–1.1)^c | 2 |
| Thr | -CH(CH3)OH | 1.665 (0.8–1.1)^c | 2 |
| Met | -CH2CH2SCH3 | 0.637 | 3 |
| Trp | -CH2-(3-indolyl) | 0.352 | 4 |
| Tyr | -CH2-(4-OH-phenyl) | 0.165 | 6 |
| Asn | -CH2CONH2 | 1.971^d (1.1–1.4)^c | 3 |
| Gln | -CH2CH2CONH2 | 1.337^d (0.5–0.8)^c | 4 |

^a Unless noted, the regression coefficient of the indicator variable terms in Eq. 4.
^b Number of bonds separating the polar heteroatom of the polar group from the α-carbon of the peptide.
^c Value in parentheses "corrected" by subtracting the intramolecular bridging-solvation factor.
^d Estimated from Eqs. 20 and 21.
The intercept of Eq. 4 should correspond to the log P' value of glycylglycine, for which every independent variable is zero. The very good quality of the correlation for Eq. 4 suggests that the component factors contributing to the variation of log P' have been almost completely separated from one another. The contributions of the asparagine and glutamine side chains to log P' were analyzed indirectly with protected peptides, as described later.
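As an illustration of how Eq. 4 is applied, the sketch below evaluates it for descriptor values supplied by the caller. It is a minimal sketch, not the authors' software: the function and argument names are hypothetical, the side-chain π and E's^c values must be taken from the chapter's tables (not reproduced in this section), and the indicator arguments are simply multiplied by the printed coefficients. With every descriptor set to zero it returns the intercept, the value the text assigns to glycylglycine.

```python
def log_p_eq4(sum_pi, i_tri, es_n, es_m, es_c,
              i_y=0, i_w=0, i_m=0, i_s=0, i_t=0):
    """Evaluate Eq. 4 (di- and tripeptides) with the printed coefficients.

    sum_pi      -- sum of the side-chain pi values of the constituent residues
    i_tri       -- 0 for dipeptides, 1 for tripeptides
    es_n, es_c  -- E's^c of the N- and C-terminal side chains
    es_m        -- E's^c of the central side chain (must be 0 for dipeptides)
    i_y ... i_t -- indicator variables for Tyr, Trp, Met, Ser and Thr
    """
    if i_tri == 0 and es_m != 0:
        raise ValueError("E's^c(R_M) is not counted for dipeptides")
    return (0.960 * sum_pi - 0.635 * i_tri
            + 0.561 * es_n + 0.337 * es_m + 0.255 * es_c
            + 0.165 * i_y + 0.352 * i_w + 0.637 * i_m
            + 1.665 * (i_s + i_t)
            - 3.912)


# Glycylglycine: every independent variable is zero, so log P' is the intercept.
print(log_p_eq4(0.0, 0, 0.0, 0.0, 0.0))
```

Keeping the coefficients inline, rather than in a lookup table, makes each term easy to compare against the printed equation.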
2.4 Tetra- and Pentapeptides
With the above results for di- and tripeptides in mind, we analyzed the log P' values of tetra- and pentapeptides using parameter terms corresponding to those in Eq. 4, and formulated Eq. 5.

$$
\begin{aligned}
\log P' = {} & \underset{(0.157)}{1.025}\,\Sigma\pi - \underset{(0.226)}{0.262}\,I_{pent} + \underset{(0.205)}{0.575}\,E_s^{\prime c}(R_N) + \underset{(0.137)}{0.491}\,\left[\Sigma E_s^{\prime c}(R_M) + E_s^{\prime c}(R_C)\right] \\
& + \underset{(0.335)}{0.329}\,I_W + \underset{(0.432)}{0.887}\,I_M + \underset{(0.476)}{1.772}\,(I_S + I_T) - \underset{(0.670)}{4.544}
\end{aligned}
\qquad [5]
$$

n = 46, s = 0.335, r = 0.926, F(7,38) = 32.6
I_pent is an indicator variable taking zero for tetrapeptides and unity for pentapeptides. ΣE's^c(R_M) is the sum of the E's^c parameters for the side chains other than those of the two terminal amino acids; preliminary examinations indicated that the steric effect of R_M substituents is almost position-independent, so their E's^c values were summed. The coefficients of the ΣE's^c(R_M) and E's^c(R_C) terms were also so close that these terms were combined. Equation 5 might be acceptable, but the quality of the correlation in terms of r and s is considerably poorer than that of Eq. 4. The I_Y term for the tyrosine side chain is not significant at the 95% level in Eq. 5, and the I_W term for tryptophan is justified only at the 94.5% level. Moreover, the coefficient of I_pent, which corresponds to Δlog P' on introducing one more peptide unit, is significantly more positive than that of I_tri in Eq. 4. The intercept is about 0.6 unit more negative than that of Eq. 4, reflecting the difference in the reference peptides of Eqs. 4 and 5 (dipeptides and tetrapeptides, respectively). Because the physicochemical factors governing the log P' value of the lower peptides should operate in tetra- and pentapeptides on the same basis, these discrepancies indicate that variables other than those used in Eqs. 3 and 4 are required for the log P' of tetra- and pentapeptides. We considered that a specific conformational feature such as β-turn formation could be such a factor, required for tetra- and pentapeptides but not for di- and tripeptides.
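Because F is the ratio of regression to residual variance, it can be recovered from r, the number of compounds n, and the number of regressors k through the standard identity F = (r²/k)/[(1 − r²)/(n − k − 1)]. The sketch below applies it to the printed statistics as a consistency check on the term counts of Eqs. 3, 4, 5, and 19. Since r is printed to only three decimals, agreement within a few percent is all that should be expected; the helper name is illustrative.

```python
def f_from_r(r, n, k):
    """F = (r^2 / k) / ((1 - r^2) / (n - k - 1)) for k regressors, n observations."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (equation, r, n, number of regressors k, printed F)
reported = [
    ("Eq. 3", 0.990, 20, 5, 141),
    ("Eq. 4", 0.982, 59, 9, 148),
    ("Eq. 5", 0.926, 46, 7, 32.6),
    ("Eq. 19", 0.967, 124, 12, 134),
]

for label, r, n, k, f_printed in reported:
    print(f"{label}: F from r = {f_from_r(r, n, k):.1f}, printed F = {f_printed}")
```

The degrees of freedom in the printed F subscripts (k and n − k − 1) give the same counts, so the check also confirms how many terms each equation contains.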
β-Turns, classified into at least three types, have been observed as regular conformational patterns in regions of backbone chain reversal in globular proteins (24). β-Turn substructures consist of four consecutive amino acid residues, mostly with a hydrogen bond between the CO oxygen of residue i and the NH hydrogen of residue i + 3. One of the β-turn structural types (named type I by Venkatachalam) is shown in Fig. 2 (25).
Fig. 2. One of the β-Turn Structures (Type I) of Peptides. Reproduced from ref. 25 by permission of the Journal of Biological Chemistry.
We assumed that tetra- and pentapeptides exist as an equilibrium mixture of random and β-turned structures, depending upon the β-turn potential of the component amino acids in the partitioning solvents, so that the partitioning of peptides can be depicted as shown in Fig. 3.
[Fig. 3 scheme: 1-octanol phase, [C_oct]_R ⇌ [C_oct]_β (K_oct); water phase, [C_w]_R ⇌ [C_w]_β (K_w); with partition between the two phases.]

Fig. 3. Partition and Conformational Equilibria of Peptides; [C] represents the concentration, and the suffixes R and β express the random and β-turn structures, respectively. Reproduced from ref. 6 by permission of the American Pharmaceutical Association.

The net P' value is expressible by Eq. 6.

$$
P' = \frac{[C_{oct}]_R + [C_{oct}]_\beta}{[C_w]_R + [C_w]_\beta} = \frac{[C_{oct}]_R}{[C_w]_R}\cdot\frac{1 + K_{oct}}{1 + K_w} \qquad [6]
$$
In Fig. 3 and Eq. 6, K_oct and K_w are the conformational equilibrium constants in the 1-octanol and water phases, respectively, and reflect the β-turn potential of four consecutive amino acids. Di- and tripeptides cannot adopt the β-turn structure, so K_oct = K_w = 0 in Eq. 6, and Eq. 7 holds for these lower peptides.
$$
\log P' = \log\frac{[C_{oct}]_R}{[C_w]_R} = \log P'_R \qquad [7]
$$
P'_R is the P' value for molecules with random structures. It has been shown that the more hydrophobic the environment, the easier intramolecular hydrogen-bond formation becomes (22). For oligopeptides, intramolecular hydrogen bonding could lead to conformationally fixed structures such as β-turns and α-helices. Consistent with CD spectra measured in aqueous buffer (pH 7) and 2,2,2-trifluoroethanol (6), the tetra- and pentapeptides studied here were considered to exist almost entirely as random conformers in the aqueous phase, but to adopt the β-turn structure in aliphatic alcohols to various extents according to the β-turn potential of their component amino acids. Thus, such conditions as 1>>Kw and l>l for Eq. 9, but the procedure was admissible at least as a first approximation. In Table 2, the log P' values calculated using Eq. 18 are shown for 105 peptides.
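Eq. 6 splits the net partition coefficient into the random-form term P'_R and a conformational factor (1 + K_oct)/(1 + K_w). The minimal sketch below evaluates both. The function names are illustrative, and the K values in the usage lines are arbitrary numbers chosen to show the arithmetic, not values measured or fitted in the chapter.

```python
import math


def net_partition(p_random, k_oct, k_w=0.0):
    """Eq. 6: P' = P'_R * (1 + K_oct) / (1 + K_w)."""
    return p_random * (1.0 + k_oct) / (1.0 + k_w)


def turn_increment(k_oct, k_w=0.0):
    """Base-10 log contribution of the beta-turn equilibria to log P'."""
    return math.log10((1.0 + k_oct) / (1.0 + k_w))


# Di- and tripeptides: K_oct = K_w = 0, so P' reduces to P'_R (Eq. 7).
print(net_partition(0.01, 0.0, 0.0))
# Illustrative only: equal random and turned populations in 1-octanol, none in water.
print(round(turn_increment(1.0), 3))
```

With K_w close to zero, as the CD data suggest for the aqueous phase, log P' = log P'_R + log(1 + K_oct), so in this illustrative case a peptide half in the turned form in 1-octanol would gain about 0.3 log unit.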
2.6 Peptides Containing Proline
Peptides containing proline were not included in the above correlations, since the E's^c value for the "side chain" of proline is not easily estimated. By substituting the values of the available parameters, such as Σπ, I_pep, log f(i+2), and I_turn, for proline-containing peptides into Eq. 18, we calculated the sum of these parameter terms and examined its difference, Δlog P', from the observed value. The Δlog P' value should correspond to the component of log P' attributable to the steric effect together with other effects specific to the Pro residue. As shown in Table 4, the effects seem to depend not only on the location of the Pro residue but also on the number of residues involved. When the Pro residue is at the N-terminus, Δlog P' is invariably negative, ranging from -0.5 to -0.9. At the C-terminus, however, the effect is reversed (positive) only in dipeptides. For tripeptides without N-terminal proline, Δlog P' is nearly zero, and for tetrapeptides it is always negative. We considered that a Pro residue at any position other than the N-terminus lowers log P' almost regularly as the total number of residues increases from the dipeptides, regardless of its location.

Table 4. Δlog P' and Indicator Variables of Peptides Containing Proline

| Compound | Δlog P' | I_P(N) | I_P(#pep) |
|---|---|---|---|
| PI | -0.683 | 1 | -1 |
| PL | -0.648 | 1 | -1 |
| PF | -0.605 | 1 | -1 |
| FP | 0.325 | 0 | -1 |
| IP | 0.531 | 0 | -1 |
| LP | 0.354 | 0 | -1 |
| IPI | 0.090 | 0 | 0 |
| PLL | -0.562 | 1 | 0 |
| LPL | -0.128 | 0 | 0 |
| LLP | -0.148 | 0 | 0 |
| PLLL | -0.888 | 1 | 1 |
| LPLL | -0.407 | 0 | 1 |
| LLPL | -0.607 | 0 | 1 |
| LLLP | -0.437 | 0 | 1 |
| IPGI | -0.123 | 0 | 1 |
| VPGV | -0.688 | 0 | 1 |
| VPVL | -0.457 | 0 | 1 |
| YPGW | -0.509 | 0 | 1 |
| YPGI | -0.140 | 0 | 1 |

Although the variation patterns of Δlog P' look rather complex, we assumed that they can be represented by two indicator variables: one, I_P(N), for the effect of a Pro residue located at the N-terminus, and the other, I_P(#pep), for the effect of the number of residues. The values of these indicator variables were set to zero for tripeptides without N-terminal proline, since their Δlog P' values are closest to zero; the values are shown in Table 4. With these two additional indicator variable terms for the Pro residue, Eq. 19 was finally formulated for 124 peptides without any significant decrease in correlation quality.

$$
\begin{aligned}
\log P' = {} & \underset{(0.064)}{0.942}\,\Sigma\pi - \underset{(0.096)}{0.582}\,I_{pep} + \underset{(0.089)}{0.546}\,E_s^{\prime c}(R_N) + \underset{(0.071)}{0.295}\,\left[\Sigma E_s^{\prime c}(R_M) + E_s^{\prime c}(R_C)\right] \\
& + \underset{(0.172)}{0.516}\,I_{turn} + \underset{(0.211)}{0.764}\,\log f_{i+2} + \underset{(0.089)}{0.144}\,I_Y + \underset{(0.106)}{0.378}\,I_W + \underset{(0.165)}{0.659}\,I_M \\
& + \underset{(0.197)}{1.581}\,(I_S + I_T) - \underset{(0.225)}{0.807}\,I_P(N) - \underset{(0.118)}{0.346}\,I_P(\#pep) - \underset{(0.190)}{3.866}
\end{aligned}
\qquad [19]
$$

n = 124, s = 0.209, r = 0.967, F(12,111) = 134
In Table 2, the log P' values calculated by Eq. 19 are also listed. For Leu-Leu-Leu-Pro, where no β-turn with intramolecular hydrogen bonding is possible, the I_turn and log f(i+2) terms were omitted in calculating log P'. At the protonated N-terminal amino group, which acts as a hydrogen donor, solvation by the more basic 1-octanol can compete effectively with solvation by the less basic water. Since a peptide with N-terminal proline has one fewer polarized N+-H bond than a peptide whose N-terminal residue is not cyclic, solvation by 1-octanol is less significant for it than for other peptides, leading to lower log P' values. The slope of the I_P(N) term, -0.81, is of the same order as that previously observed (-0.52) for the effect of a decrease in the number of N+-H bonds on the ion-pair formation/partition equilibrium of various aliphatic ammonium ions with picrate in the 1-octanol/water system (14). At positions other than the N-terminus, replacing a regular primary amino acid residue with proline removes one of the amide NH sites that act as hydrogen donors. By the same token as for the N-terminal N+-H sites, this reduction in NH sites should lower log P'. On the other hand, cyclization could lessen the steric inhibition by the proline "side chain" of hydrogen-bonding solvation of a neighboring CONH or COO- group. This reduced steric effect would favor solvation by the bulkier 1-octanol, raising log P'. For tripeptides, these two opposing factors may be balanced; the positive effect predominates for dipeptides, and the negative effect gradually becomes dominant for higher peptides as the number of residues increases. No theoretical rationalization of the changing balance between these two opposing factors is available at the moment. Measurements of the log P' values of more peptides containing proline at various positions are needed before definite conclusions can be drawn.
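The two-indicator description of Table 4 can be checked with a small least-squares fit of Δlog P' on I_P(N) and I_P(#pep) alone, with no intercept, since both indicators are zero for the tripeptide group whose Δlog P' is near zero. This is a simplified refit of the 19 Table 4 entries only, not the procedure behind Eq. 19, which fitted all 124 peptides jointly; the function name is illustrative. It shows that the indicator assignments in Table 4 already imply coefficients close to the -0.807 and -0.346 of Eq. 19.

```python
# Table 4: (peptide, delta log P', I_P(N), I_P(#pep))
TABLE4 = [
    ("PI", -0.683, 1, -1), ("PL", -0.648, 1, -1), ("PF", -0.605, 1, -1),
    ("FP", 0.325, 0, -1), ("IP", 0.531, 0, -1), ("LP", 0.354, 0, -1),
    ("IPI", 0.090, 0, 0), ("PLL", -0.562, 1, 0), ("LPL", -0.128, 0, 0),
    ("LLP", -0.148, 0, 0), ("PLLL", -0.888, 1, 1), ("LPLL", -0.407, 0, 1),
    ("LLPL", -0.607, 0, 1), ("LLLP", -0.437, 0, 1), ("IPGI", -0.123, 0, 1),
    ("VPGV", -0.688, 0, 1), ("VPVL", -0.457, 0, 1), ("YPGW", -0.509, 0, 1),
    ("YPGI", -0.140, 0, 1),
]


def fit_two_indicators(rows):
    """No-intercept least squares: dlogP ~ a * I_P(N) + b * I_P(#pep)."""
    sxx = sum(x1 * x1 for _, _, x1, _ in rows)
    szz = sum(x2 * x2 for _, _, _, x2 in rows)
    sxz = sum(x1 * x2 for _, _, x1, x2 in rows)
    sxy = sum(x1 * y for _, y, x1, _ in rows)
    szy = sum(x2 * y for _, y, _, x2 in rows)
    det = sxx * szz - sxz * sxz  # 2x2 normal equations solved by Cramer's rule
    a = (sxy * szz - sxz * szy) / det
    b = (sxx * szy - sxz * sxy) / det
    return a, b


a, b = fit_two_indicators(TABLE4)
print(f"I_P(N): {a:.3f}   I_P(#pep): {b:.3f}")
```

The refit gives roughly -0.81 and -0.34, so the two proline terms of Eq. 19 are largely determined by these 19 peptides.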
2.7 Peptides Containing Glutamine and Asparagine
Because their log P' values were very low, the values for zwitterionic peptides including Gln (Q) and Asn (N) were not always easy to measure. To understand the effects of these residues on the hydrophobicity of peptides, we measured the log P values of a number of N-acetylpeptide amides containing these residues under conditions equivalent to those used for the free peptides (data not shown), and formulated Eq. 20 as the counterpart of Eq. 4 (17).

$$
\begin{aligned}
\log P = {} & \underset{(0.047)}{1.044}\,\Sigma\pi - \underset{(0.054)}{0.570}\,I_{tri} + \underset{(0.046)}{0.237}\,\Sigma E_s^{\prime c} + \underset{(0.075)}{0.073}\,I_Y + \underset{(0.080)}{0.258}\,I_W \\
& + \underset{(0.106)}{1.476}\,(I_S + I_T) + \underset{(0.121)}{1.162}\,I_Q + \underset{(0.154)}{1.753}\,I_N - \underset{(0.074)}{2.375}
\end{aligned}
\qquad [20]
$$

n = 53, s = 0.072, r = 0.997, F(8,44) = 840
In Eq. 20, the E's^c terms for the side-chain substituents are combined into a single ΣE's^c term, because in N-acetylpeptide amides the side-chain substituents invariably exert their steric effect on the relative solvation of a CONH group. The intercept should correspond to the log P value of Ac-Gly-Gly-NH2. Apart from these differences, the corresponding terms in Eqs. 4 and 20 are very similar. Although the indicator variable terms for the side chains in Eq. 20 are slightly smaller than the corresponding terms in Eq. 4, the correspondence is very good; for the side chains of Ser, Thr, Trp, and Tyr, Eq. 21 was derived, in which RC is the regression coefficient of the indicator variable terms.

$$
RC(\text{Eq. 4}) = \underset{(0.077)}{1.010}\,RC(\text{Eq. 20}) + \underset{(0.082)}{0.092} \qquad [21]
$$

n = 4, s = 0.024, r = 0.9997, F(1,2) = 3146
The regression coefficients of the side-chain indicator variable terms for Asn and Gln in Eq. 20 were adjusted with Eq. 21 to conform to the scale for residues in free peptides; the adjusted values are given in Table 3. The indicator variable terms for the Asn (N) and Gln (Q) residues are very large. An intramolecular bridging-type solvation between the side-chain amide group and the backbone CONH, similar to that shown in Fig. 1, is likely to occur in peptides including these residues (20). In fact, even after correction for the intramolecular solvation, these terms are larger than expected from the simple relationship with the number of bonds between the side-chain heteroatom and the backbone. This indicates that the size of the indicator variable terms is also governed by such factors as the number of hydrogen-bonding sites and the electronic effect of the polar groups. In any case, by introducing these indicator variable terms into Eq. 18 or 19, the log P' value of free peptides including Asn and Gln should be estimable with considerable accuracy.

3. A NEW EFFECTIVE HYDROPHOBICITY SCALE OF SIDE CHAINS
3.1 Definition
From the results shown in Eqs. 19 and 20, we propose a new effective hydrophobicity scale, πα, for non-ionizable amino acid side chains, as shown in Eq. 22. The πα value is defined as the sum of the factors contributing to the "overall" hydrophobicity of each side-chain unit: the "intrinsic" hydrophobicity, steric effects on the relative solvation of backbone functional groups, intra-residue hydrogen-bond formation, and the polar proximity effect. In Eq. 22, δ is 0.55 for N-terminal residues and 0.30 for the others. Conformational factors are not included, since they depend not only on the types of amino acid residues but also on their locations in the sequence. Moreover, they do not apply to di- and tripeptides, or to peptides larger than pentapeptides, in which other conformational effects such as α-helix formation should be considered. For proline, the πα value varies depending on its position.

$$
\pi_\alpha = 0.94\,[\text{intrinsic } \pi] + \delta\,E_s^{\prime c} + [\text{coefficient of } I \text{ for each polar side chain and proline}] \qquad [22]
$$
The newly defined πα values are listed in Table 5; πα(N) and πα(MC) denote the πα values for the N-terminal residue and for the other residues, respectively. The value calculated by Eq. 23 from the πα values, which contain only the nonconformational components, is taken to be log P' for a hypothetical random form. Comparing log P'(random) with the experimentally observed log P' should therefore yield information on the component attributable to conformation.

$$
\log P'(\text{random}) = \Sigma\pi_\alpha - 0.58\,I_{pep} - 3.87 \qquad [23]
$$
Table 5. Hydrophobicity Scales for Amino Acids or Their Side Chains^a,b

| Amino Acid | πα(N) | πα(MC) | π(FP) | Δf(NT) (kcal/mol) | Σf(R) | ΔG(W) (kcal/mol) | ΔHS(C) | ΔHS(J) (kcal/mol) | ΔHP(KD) | ΔHS(E) | Δ[-Z(H)] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gly | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 |
| Ala | 0.19 | 0.24 | 0.31 | 0.5 | 0.53 | -0.45 | 0.05 | 0.0 | 2.2 | 0.09 | 2.16 |
| Val | 0.48 | 0.81 | 1.22 | 1.5 | 1.46 | -0.40 | 0.43 | 0.3 | 4.6 | 0.38 | 4.92 |
| Leu | 0.91 | 1.27 | 1.70 | 1.8 | 1.99 | -0.11 | 0.22 | 0.2 | 4.2 | 0.37 | 6.42 |
| Ile | 0.71 | 1.16 | 1.80 | 2.5^c | 1.99 | -0.24 | 0.58 | 0.4 | 4.9 | 0.57 | 6.67 |
| Phe | 1.34 | 1.56 | 1.79 | 2.5 | 2.24 | -3.15 | 0.34 | 0.2 | 3.2 | 0.45 | 7.15 |
| Tyr | 0.78 | 1.00 | 0.96 | 2.3 | 1.70 | -8.50 | -0.68 | -0.7 | -0.9 | -0.14 | 3.62 |
| Trp | 1.71 | 1.92 | 2.25 | 3.4 | 2.31 | -8.27 | -0.25 | 0.0 | -0.5 | 0.21 | 6.98 |
| Met | 0.67 | 0.92 | 1.23 | 1.3 | 1.08 | -3.87 | 0.10 | 0.1 | 2.3 | 0.10 | 4.72 |
| Ser | -0.08 | 0.04 | -0.04 | -0.3 | -0.56 | -7.45 | -0.41 | -0.4 | -0.4 | -0.42 | 0.27 |
| Thr | 0.07 | 0.25 | 0.26 | -0.4 | -0.26 | -7.27 | -0.37 | -0.5 | -0.3 | -0.34 | 1.31 |
| Asn | -0.51 | -0.26 | -0.60 | -0.8^c | -1.05 | -12.07 | -0.84 | -0.8 | -3.1 | -0.80 | -0.99 |
| Gln | -0.51 | -0.31 | -0.22 | -0.5^c | -1.09 | -11.77 | -1.19 | -1.0 | -3.1 | -0.85 | 0.05 |
| Pro^d | ^e | ^e | 0.72 | 0.8 | 1.01 | – | -0.56 | -0.6 | -1.2 | -0.23 | 3.45 |

^a The reference point is shifted so that each value for Gly is zero. The values for Gly are: G(W) = 2.39, HS(C) = -0.34, HS(J) = 0.3, HP(KD) = -0.4, HS(E) = 0.16, and -Z(H) = -2.23.
^b For symbols, see text.
^c Estimated from the value in ref. 31.
^d Not included in regression analysis.
^e πα(location, number of residues) of proline: πα(N, 2) = 0.35, πα(MC, 2) = 1.16, πα(N, 3) = 0.00, πα(MC, 3) = 0.81, πα(N, 4) = -0.34, πα(MC, 4) = 0.46.
For tetra- and pentapeptides, the conformational effect was represented by the I_turn (= 1) and log f(i+2) terms. Thus, the difference between the experimental log P' and the calculated log P'(random) should allow the β-turn potential parameter of any amino acid included in tetra- and pentapeptides to be estimated. Although it does not yet apply to peptides including proline, this procedure may be extended to higher peptides, in which the secondary structural factors differ from those in tetra- and pentapeptides. To estimate the log P'(random) value for partial domains of proteins, we recommend using δ = 0.30 for the R_M and R_C side chains when calculating each πα value by Eq. 22.
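Eqs. 22 and 23, and the comparison with experiment described above, can be written as three small functions. This is a hedged sketch with illustrative names: the intrinsic π and E's^c inputs come from tables not reproduced in this section, and I_pep is defined with Eq. 18 (also not reproduced here), so it is passed in by the caller rather than derived from the sequence.

```python
DELTA_N_TERMINAL = 0.55  # delta of Eq. 22 for the N-terminal residue
DELTA_OTHER = 0.30       # delta of Eq. 22 for all other residues


def pi_alpha(intrinsic_pi, es_c, polar_coef=0.0, n_terminal=False):
    """Eq. 22: 0.94 * intrinsic pi + delta * E's^c + indicator coefficient."""
    delta = DELTA_N_TERMINAL if n_terminal else DELTA_OTHER
    return 0.94 * intrinsic_pi + delta * es_c + polar_coef


def log_p_random(pi_alphas, i_pep):
    """Eq. 23: log P'(random) = sum(pi_alpha) - 0.58 * I_pep - 3.87."""
    return sum(pi_alphas) - 0.58 * i_pep - 3.87


def conformational_component(observed_log_p, pi_alphas, i_pep):
    """Observed log P' minus log P'(random): the part attributed to conformation."""
    return observed_log_p - log_p_random(pi_alphas, i_pep)
```

For example, using the Table 5 entries πα(N) = 0.91 for an N-terminal Leu and πα(MC) = 1.27 for each following Leu, `log_p_random([0.91, 1.27, 1.27, 1.27], i_pep)` gives the random-form estimate for Leu-Leu-Leu-Leu once the appropriate I_pep is supplied.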
3.2 Comparison with Various Hydrophobicity Scales for Amino Acids and Their Side Chains
Quite a few sets of parameters purporting to represent the "hydrophobicity" of amino acid residues have been proposed. Comprehensive lists of these parameters have been reported by Eisenberg (27), Charton (28), and Nakai and coworkers (29). These parameters are defined and/or estimated on various bases that are not always consistent with one another. They are broadly categorized into three groups. Parameters in the first group are defined from phase-transfer properties similar to those used in this study, but with individual amino acids and their derivatives or related compounds. The scales in the second group are based on the probability of finding a given amino acid residue in the interior of globular proteins relative to the probability of finding it on the surface. The third group consists of composites of the above two types of scales. The values are listed in Table 5, and their relationships with πα(MC) are plotted in Fig. 4.

Fauchère and Pliška (10) measured the log P' values of N-acetylamino acid amides in a 1-octanol/aqueous buffer (pH 7) system and defined the π value of each side chain as the difference from that of N-acetylglycinamide. Because their π value inherently includes factors such as steric effects on the solvation of the backbone CONH functions, the proximity polar effect between the side-chain polar group and the backbone CONH functions, and internal hydrogen bonding, in addition to the intrinsic hydrophobicity, our πα(MC) value for the 13 non-ionizable side chains was expected to correspond to theirs. Eq. 24 was formulated for this correspondence.
$$
\pi(\mathrm{FP}) = \underset{(0.175)}{1.254}\,\pi_\alpha(\mathrm{MC}) - \underset{(0.167)}{0.010} \qquad [24]
$$

n = 13, s = 0.198, r = 0.979, F(1,11) = 248
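The single-predictor fits in this section can be reproduced directly from Table 5. The sketch below refits Eq. 24 by ordinary least squares from the πα(MC) and π(FP) columns for the 13 non-proline residues and computes r and F(1, n − 2) with the standard formulas. It is an illustrative check, not the authors' code, and the helper name is hypothetical.

```python
import math

# pi_alpha(MC) and pi(FP) for the 13 non-proline residues of Table 5.
PI_ALPHA_MC = {"Gly": 0.00, "Ala": 0.24, "Val": 0.81, "Leu": 1.27, "Ile": 1.16,
               "Phe": 1.56, "Tyr": 1.00, "Trp": 1.92, "Met": 0.92, "Ser": 0.04,
               "Thr": 0.25, "Asn": -0.26, "Gln": -0.31}
PI_FP = {"Gly": 0.00, "Ala": 0.31, "Val": 1.22, "Leu": 1.70, "Ile": 1.80,
         "Phe": 1.79, "Tyr": 0.96, "Trp": 2.25, "Met": 1.23, "Ser": -0.04,
         "Thr": 0.26, "Asn": -0.60, "Gln": -0.22}


def simple_ols(xs, ys):
    """Return slope, intercept, r and F(1, n-2) for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    f = r * r * (n - 2) / (1.0 - r * r)
    return slope, intercept, r, f


residues = list(PI_ALPHA_MC)
slope, intercept, r, f = simple_ols([PI_ALPHA_MC[a] for a in residues],
                                    [PI_FP[a] for a in residues])
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} F={f:.0f}")
```

The refitted slope, intercept, r, and F agree with the printed Eq. 24 to rounding, which also confirms the column assignment used to rebuild Table 5.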
The well-known classic scale Δf of Nozaki and Tanford (30) is based on the free energy of transfer (kcal/mol) of amino acids from ethanol to water relative to that of glycine; for it, Eq. 25 was obtained.
$$
\Delta f(\mathrm{NT}) = \underset{(0.277)}{1.819}\,\pi_\alpha(\mathrm{MC}) - \underset{(0.265)}{0.080} \qquad [25]
$$

n = 13, s = 0.313, r = 0.975, F(1,11) = 208
In the original publication (30), the Δf values for Ile, Gln, and Asn are not given; in Table 5 and Eq. 25, the values for these residues were estimated from the work of Segrest and Feldman (31). Rekker (32) proposed a scale, f(R), called the hydrophobic fragment constant, for each structural fragment. It was derived statistically, on the basis of the additive-constitutive nature of log P, from the 1-octanol/water log P values of many organic compounds containing the substructures that appear in amino acid side chains. The sum of the fragment constants, Σf(R), for the constituent substructures of each amino acid side chain is related to πα as shown in Eq. 26.
[Fig. 4: scatter plots of Δf(NT) [kcal/mol], π(FP), Σf(R), ΔG(W) [kcal/mol], ΔHS(J) [kcal/mol], ΔHP(KD), ΔHS(E), and Δ[-Z(H)], each against πα(MC).]

Fig. 4. Relationships of Various Hydrophobicity Scales with the πα(MC) Parameter
$$
\Sigma f(\mathrm{R}) = \underset{(0.130)}{1.538}\,\pi_\alpha(\mathrm{MC}) - \underset{(0.179)}{0.673}\,I_p + \underset{(0.161)}{0.088} \qquad [26]
$$

n = 13, s = 0.137, r = 0.995, F(2,10) = 506
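Eq. 26 is an ordinary two-predictor regression with an indicator variable. The sketch below refits it from the πα(MC) and Σf(R) columns of Table 5, setting I_p to unity for the polar side chains the chapter names (Ser, Thr, Met, Trp, Asn, and Gln) and to zero otherwise. It solves the centred normal equations directly; the function name is illustrative, and the code is a check on the printed coefficients rather than the authors' procedure.

```python
# (residue, pi_alpha(MC), sum f(R), I_p) from Table 5; proline excluded.
DATA = [
    ("Gly", 0.00, 0.00, 0), ("Ala", 0.24, 0.53, 0), ("Val", 0.81, 1.46, 0),
    ("Leu", 1.27, 1.99, 0), ("Ile", 1.16, 1.99, 0), ("Phe", 1.56, 2.24, 0),
    ("Tyr", 1.00, 1.70, 0), ("Trp", 1.92, 2.31, 1), ("Met", 0.92, 1.08, 1),
    ("Ser", 0.04, -0.56, 1), ("Thr", 0.25, -0.26, 1), ("Asn", -0.26, -1.05, 1),
    ("Gln", -0.31, -1.09, 1),
]


def fit_two_predictors(xs, zs, ys):
    """OLS for y = b1 * x + b2 * z + b0 via the centred normal equations."""
    n = len(ys)
    mx, mz, my = sum(xs) / n, sum(zs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    det = sxx * szz - sxz ** 2
    b1 = (sxy * szz - sxz * szy) / det
    b2 = (sxx * szy - sxz * sxy) / det
    b0 = my - b1 * mx - b2 * mz
    return b1, b2, b0


b1, b2, b0 = fit_two_predictors([d[1] for d in DATA], [d[3] for d in DATA],
                                [d[2] for d in DATA])
print(f"sum f(R) = {b1:.3f} pi_alpha(MC) {b2:+.3f} I_p {b0:+.3f}")
```

The fitted coefficients match the printed 1.538, -0.673, and 0.088 to rounding, so Eq. 26 follows directly from the tabulated values.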
I_p is an indicator variable taking unity for the polar side chains of Ser, Thr, Met, Trp, Asn, and Gln. Because the Rekker fragment constants are estimated from the log P values of compounds lacking the structural characteristics of amino acids and peptides, they seem to underestimate the contribution of the factors that raise the molecular log P value, which are included in the regression coefficients of the indicator variable terms for polar side chains listed in Table 3. The Tyr residue did not require I_p = 1 in Eq. 26, in accord with the very low regression coefficient for the Tyr residue in Eq. 19. The slope of the πα term is considerably higher than unity, because the Rekker values neglect the steric effect of the side chains on the relative solvation, which lowers log P', and so overestimate the effective hydrophobicity.

A set of phase-transfer parameters somewhat special within this category has been proposed by Wolfenden and coworkers (33). Their parameter, G(W), is the free energy of transfer (kcal/mol) of RH, where R is the side-chain substituent of the amino acid H2NCH(R)COOH, from the gas phase to the aqueous phase. Again, this parameter is based on the properties of molecules without the characteristic features of peptides. Moreover, the phases used in estimating the parameter are drastically different from the systems in which the three types of parameters above are defined. Therefore, their parameter is not expected to be related to πα. Preliminary examinations showed that G(W) is correlated only with the number of hydrogen-bondable hydrogens, I_HD, in the polar groups of the side chain. Eq. 27 shows this relationship, with the reference point of G(W) shifted to that of Gly, so that ΔG(W) = G(W) - G(W)_Gly.
$$
\Delta G(\mathrm{W}) = -\underset{(1.146)}{5.661}\,I_{HD} - \underset{(1.101)}{1.405} \qquad [27]
$$

n = 13, s = 1.385, r = 0.957, F(1,11) = 118
Adding the πα term to Eq. 27 did not improve the correlation. Eq. 27 indicates that the water affinity, or hydration potential, of the side chains is governed most significantly by the number of hydrogens capable of hydrogen bonding: the higher the number, the less hydrophobic the residue. Recently, Radzicka and Wolfenden (34) suggested that the vapor phase resembles cyclohexane rather than octanol in its lack of polarity.
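Eq. 27 can be refitted the same way from the ΔG(W) column of Table 5. The chapter does not tabulate I_HD, so the counts below are an assumption read from its definition: one for the OH of Tyr, Ser, and Thr and for the indole NH of Trp, two for the CONH2 of Asn and Gln, and zero otherwise. With these counts the fit returns the printed slope and intercept, which supports that reading.

```python
# Delta G(W) from Table 5 (kcal/mol, Gly = 0) for the 13 non-proline residues.
DELTA_G_W = {"Gly": 0.00, "Ala": -0.45, "Val": -0.40, "Leu": -0.11, "Ile": -0.24,
             "Phe": -3.15, "Tyr": -8.50, "Trp": -8.27, "Met": -3.87,
             "Ser": -7.45, "Thr": -7.27, "Asn": -12.07, "Gln": -11.77}

# Assumed hydrogen-bondable hydrogen counts of the side-chain polar groups.
I_HD = {"Tyr": 1, "Trp": 1, "Ser": 1, "Thr": 1, "Asn": 2, "Gln": 2}

xs = [I_HD.get(res, 0) for res in DELTA_G_W]   # residues absent from I_HD count 0
ys = list(DELTA_G_W.values())                  # same order as xs (dict order)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
print(f"Delta G(W) = {slope:.3f} I_HD {intercept:+.3f}")
```

Because I_HD takes only the values 0, 1, and 2, the fit is essentially a line through the three group means, which is why a single integer descriptor captures so much of the variation in ΔG(W).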
As parameters of the second category, those proposed by Chothia (35) and Janin (36) are well known. Janin defined his parameter as the free energy of transfer (kcal/mol) from the interior to the surface, estimated from the ratio of the mole fractions of each residue in buried and accessible states in globular proteins. The original Chothia parameter (35), the proportion of each residue that is 95% buried in globular proteins, has been modified to place it on a free-energy-related basis (37) similar to that of the Janin parameter. These two parameters are, as expected, very well correlated, with a slope of 1.128, s = 0.115, and r = 0.978 when the Janin parameter is taken as the independent variable. Wolfenden and coworkers showed that their parameter G(W) and the Janin parameter are well correlated, with a correlation coefficient of r = 0.90 (33). This is not unexpected, because Eq. 28, formulated here for the Janin parameter HS(J), shows that it too depends heavily on I_HD.

$$
\Delta HS(\mathrm{J}) = \underset{(0.117)}{0.210}\,\pi_\alpha(\mathrm{MC}) - \underset{(0.110)}{0.422}\,I_{HD} - \underset{(0.140)}{0.012} \qquad [28]
$$

n = 12, s = 0.109, r = 0.975, F(2,9) = 88.0
In Eq. 28, the Tyr residue is not included, because its HS(J) value was significantly lower than expected. Even so, the interior/surface preference of amino acid residues tends to be governed in part by the phase-transfer energy between two liquid phases.

Because the general features of free-energy-of-transfer hydrophobicity scales differ considerably between scales based on two liquid phases and those based on the gas/aqueous system or on the interior/surface preference, and because it seems unrealistic to expect all aspects of the "hydrophobicity" of residues to be summarized by a single scale, quite a few parameter sets have been proposed that combine parameters of different categories for each amino acid residue. One of these third-category parameter sets is the hydropathy scale, HP, proposed by Kyte and Doolittle (2), who defined it by a somewhat arbitrary amalgamation of the Wolfenden G(W) and Chothia parameters. Because both the G(W) and Chothia parameters depend strongly on I_HD, the HP value is, as expected, related to I_HD as well as to bulkiness in terms of -E's^c, but not to πα, as shown in Eq. 29.

$$
\Delta HP(\mathrm{KD}) = -\underset{(0.878)}{2.271}\,E_s^{\prime c} - \underset{(0.552)}{3.039}\,I_{HD} + \underset{(0.965)}{0.882} \qquad [29]
$$

n = 13, s = 0.656, r = 0.976, F(2,10) = 100
Another set of parameters has been put forward by Eisenberg and coworkers (37) as the "consensus" hydrophobicity scale, HS(E). In this scale, not only the gas/aqueous parameter [G(W)] and the interior/surface parameters of Chothia [HS(C)] and Janin [HS(J)], but also a phase-transfer parameter between organic and aqueous liquids theoretically evaluated by von Heijne and Blomberg (38), are amalgamated by normalization and averaging. Because the consensus HS(E) scale includes a component for phase transfer between liquids, our πα appears as a significant component factor in Eq. 30. In fact, Eq. 30 is very similar to Eq. 28 for the Janin parameter. The amalgamation into the consensus parameter seems to correct the outlying behavior of the Tyr residue in Eq. 28.

$$
\Delta HS(\mathrm{E}) = \underset{(0.106)}{0.303}\,\pi_\alpha(\mathrm{MC}) - \underset{(0.099)}{0.393}\,I_{HD} + \underset{(0.130)}{0.012} \qquad [30]
$$

n = 13, s = 0.103, r = 0.979, F(2,10) = 113
Hellberg and coworkers (39) statistically examined a number of descriptors characterizing the chemical, spectral, phase-transfer, and chromatographic properties of amino acid residues by principal component analysis, and extracted a principal component thought to be related to hydrophobicity. For their scale, -Z(H), Eq. 31 was formulated.

$$
\Delta[-Z(\mathrm{H})] = \underset{(0.767)}{3.638}\,\pi_\alpha(\mathrm{MC}) - \underset{(1.139)}{1.166}\,E_s^{\prime c} - \underset{(0.986)}{0.103} \qquad [31]
$$

n = 13, s = 0.745, r = 0.974, F(2,10) = 92.9
Depending upon which original parameters are selected for amalgamation, the third-category scales are governed mainly by the hydration potential represented by I_HD and/or by the phase-transfer property represented by πα. In Eqs. 29 and 31, a negative E's^c term is significant: the more negative E's^c, the more "hydrophobic" the side chain. For the hydropathy scale (Eq. 29), this reflects the fact that the steric inhibition by side-chain substituents of the hydration of backbone CONH groups tends to make the side chains more buried inside globular proteins. For the -Z(H) scale (Eq. 31), the principal component analysis of amino acid descriptors probably extracted a scale equivalent to πα - δE's^c (δ = 0.30) in Eq. 22, because single amino acid residues have no backbone CONH function on which the steric effect of the side chains can be exerted.
4. CONCLUDING REMARKS
The above examinations are believed to show the following for peptides, at least up to pentapeptides. Their hydrophobicity, as estimated from partitioning in an alcohol/aqueous system such as 1-octanol/pH 7.0 buffer, can be analyzed and predicted by combinations of well-defined side-chain and substructural parameters. The composition of the hydrophobicity scale was rather complex. Nevertheless, each component was rationalized physicochemically very well, except for the component attributable to the Pro residue. Extending the present approach to peptides with ionizable side chains, and to higher peptides, should be a future project. The π (π_α) value is defined here as the "effective" hydrophobicity index of side chains or residues. It is unique in that it was estimated from the experimentally measured net "hydrophobicity" of oligopeptides existing as such in solution. Most hydrophobicity indices of amino acid side chains published so far fall into three groups, as indicated in the preceding section. Some are defined from partition or phase-transfer parameters of single amino acids or their analogs. Others are calculated from the solvent-accessible surface area of each residue in globular proteins. The rest are composites of these two types of indices. We examined the relationship between our π_α and each existing parameter in some detail. We did so because we would like to propose our π_α value as the standard hydrophobicity index of amino acid side chains as components of peptides. In this respect, a recent publication by Eisenberg and McLachlan (40) should be noted. It indicates that the solvation energy of globular proteins in water is well rationalized by two factors. One is the solvent-accessible surface area. The other is an "atomic solvation parameter" of each water-accessible atom in the amino acid side chains. The simple ratio of molecular fractions in buried and water-accessible states of amino acid side chains is obviously an oversimplification for estimating hydrophobicity.
The atomic solvation parameter assignable to each atom is very well estimated from phase-transfer free energies based on the 1-octanol/water system rather than on a gaseous/aqueous system. Eisenberg and McLachlan proposed that the interior environment of globular proteins is adequately modeled by nonaqueous but amphiprotic liquids. A more recent publication by Sharp et al. (41) examined the partition free energy of component amino acid residues in a 1-octanol/water system, corrected for solute-solvent size differences. Changes in this free energy were shown to agree well with the changes in unfolding free energy of a variety of mutant proteins. These publications seem to support our proposal that our π_α value could be used as the standard hydrophobicity scale.
REFERENCES
1. Kauzmann, W., Adv. Protein Chem. 14 (1959) 1-63.
2. Kyte, J. and Doolittle, R.F., J. Mol. Biol. 157 (1982) 105-132.
3. a: Hadzi, D. and Jerman-Blazic, B. (Eds.), QSAR in Drug Design and Toxicology, Elsevier Science Publishers, Amsterdam, 1987, pp. 221-297; b: Claassen, V. (Ed.), Trends in Drug Research, Elsevier Science Publishers, Amsterdam, 1990, pp. 73-108.
4. Hansch, C. and Fujita, T., J. Am. Chem. Soc. 86 (1964) 1616-1626.
5. Akamatsu, M., Yoshida, Y., Nakamura, H., Asao, M., Iwamura, H., and Fujita, T., Quant. Struct.-Act. Relat. 8 (1989) 195-203.
6. Akamatsu, M. and Fujita, T., J. Pharm. Sci. 81 (1992) 164-174.
7. Chou, P.Y. and Fasman, G.D., J. Mol. Biol. 115 (1977) 135-175.
8. Iwasa, J., Fujita, T., and Hansch, C., J. Med. Chem. 8 (1965) 150-153.
9. Hansch, C. and Leo, A.J., Substituent Constants for Correlation Analysis in Chemistry and Biology, John Wiley and Sons, Inc., New York, 1979, pp. 17-43.
10. Fauchère, J.-L. and Pliška, V., Eur. J. Med. Chem.-Chim. Ther. 18 (1983) 369-375.
11. MacPhee, J.A., Panaye, A., and Dubois, J.-E., Tetrahedron 34 (1978) 3553-3562.
12. Taft, R.W., Jr., in: Newman, M.S. (Ed.), Steric Effects in Organic Chemistry, John Wiley and Sons, Inc., New York, 1956, pp. 556-675.
13. Hancock, C.K., Meyers, E.A., and Yager, B.J., J. Am. Chem. Soc. 83 (1961) 4211-4213.
14. Takayama, C., Akamatsu, M., and Fujita, T., Quant. Struct.-Act. Relat. 4 (1985) 149-160.
15. Charton, M., Topics Curr. Chem. 114 (1983) 57-91.
16. Charton, M. and Charton, B.I., J. Theor. Biol. 102 (1983) 121-134.
17. Akamatsu, M., Okutani, S., Nakao, K., Hong, N.J., and Fujita, T., Quant. Struct.-Act. Relat. 9 (1990) 189-194.
18. Fujita, T. and Iwamura, H., Topics Curr. Chem. 114 (1983) 119-157.
19. Calculated from the hydrophobic fragmental constants: f(CH3CONH) - f(H) = -1.94 - 0.23. The f values were from ref. 9.
20. Fermi, G., Perutz, M.F., Shaanan, B., and Fourme, R., J. Mol. Biol. 175 (1984) 159-174.
21. Abraham, D.J. and Leo, A.J., Proteins: Structure, Function, and Genetics (1987) 130-152.
22. Fujita, T., Prog. Phys. Org. Chem. 14 (1983) 75-113.
23. Leo, A., J. Chem. Soc. Perkin Trans. II (1983) 825-838.
24. Venkatachalam, C.M., Biopolymers 6 (1968) 1425-1436.
25. Dickerson, R.E., Takano, T., Eisenberg, D., Kallai, O.B., Samson, L., Cooper, A., and Margoliash, E., J. Biol. Chem. 246 (1971) 1511-1535.
26. Lewis, P.N., Momany, F.A., and Scheraga, H.A., Proc. Nat. Acad. Sci. USA 68 (1971) 2293-2297.
27. Eisenberg, D., Ann. Rev. Biochem. 53 (1984) 595-623.
28. Charton, M., Progr. Phys. Org. Chem. 18 (1990) 163-284.
29. Nakai, K., Kidera, A., and Kanehisa, M., Prot. Eng. 2 (1988) 93-100.
30. Nozaki, Y. and Tanford, C., J. Biol. Chem. 246 (1971) 2211-2217.
31. Segrest, J.P. and Feldman, R.J., J. Mol. Biol. 87 (1974) 853-858.
32. Rekker, R.F., The Hydrophobic Fragmental Constant, Elsevier Science Publishers, Amsterdam, 1977.
33. Wolfenden, R., Andersson, L., Cullis, P.M., and Southgate, C.C.B., Biochemistry 20 (1981) 849-855.
34. Radzicka, A. and Wolfenden, R., Biochemistry 27 (1988) 1664-1670.
35. Chothia, C., J. Mol. Biol. 105 (1976) 1-14.
36. Janin, J., Nature 277 (1979) 491-492.
37. Eisenberg, D., Weiss, R.M., Terwilliger, T.C., and Wilcox, W., Faraday Symp. Chem. Soc. 17 (1982) 109-120.
38. von Heijne, G. and Blomberg, C., Eur. J. Biochem. 97 (1979) 175-181.
39. Hellberg, S., Sjöström, M., Skagerberg, B., and Wold, S., J. Med. Chem. 30 (1987) 1126-1135.
40. Eisenberg, D. and McLachlan, A.D., Nature 319 (1986) 199-203.
41. Sharp, K.A., Nicholls, A., Friedman, R., and Honig, B., Biochemistry 30 (1991) 9686-9697.
QSAR and Drug Design - New Developments and Applications
T. Fujita, editor
© 1995 Elsevier Science B.V. All rights reserved

ANALYSIS OF AMINO ACID SEQUENCE-FUNCTION RELATIONSHIPS IN PROTEINS

TAKAAKI NISHIOKA and JUN'ICHI ODA
Institute for Chemical Research, Kyoto University, Uji, Kyoto 611, Japan
ABSTRACT: A new strategy for drug design is proposed. In this strategy, relationships between the function and structure of proteins that are possible drug targets are analyzed and utilized. The ability of proteins to recognize molecules was examined in terms of their amino acid sequences rather than their three-dimensional structures. Target proteins recognize ligand molecules by functional amino acid sequences corresponding to chemical substructures of the ligands. The new procedure "Homology Graphing", in combination with the Enzyme-Reaction database, could detect sequence segments conserved among a set of sequences of functionally related proteins. Example analyses of amino acid sequence-ligand structure relationships showed great potential for the lead identification phase of drug design.
1. INTRODUCTION
Recently,
and-see
drug
complementary
model of
to the
of a target
the
binding.
design
approaches
protein For
of
target
protein
and
strategy
finding
the
instance,
site
have been calculated
advances,
de novo
hundred
be realized
design
in the near
by traditional screening.
picoseconds
This
structure-function molecular the
of
(i).
the
supported
Lead-structures
and/or
to
lack
especially
rotations
and
drug
of
by
drug
to
found
large-scale
information
amino
for
technological
is unlikely
are still
of proteins.
proteins
in
dihydrofolate
these
methods due
static
motions
of 1 femtosecond
structure
relationships of
of
look-
structures
involved
lead
probably
motions
translations
Even with
of a new
relationships,
recognition
simulations like
is
positions
at intervals
future.
beyond
of the molecular
processes
atomic
trial-and-error
chemical
in a three-dimensional
dynamic
the
progressed
with
to simulation
reductase several
has
drugs
acid
on
sequence-
In other words, molecules
of CPK models
are
of organic
reagents
to find out the
knowledge
of
reaction
relationships, state
"best" rules
no chemist
structures
of
complementary in
terms
In
this
functions
reactants
proteins,
several are
of other
relationships structures
chemical 2.
WHY
between
their
of
acid
transition
the
reaction
bases
sequences
must
"molecular of their
of the
acid
by these
ANALYZE
proteins:
the the
we are
sequences
that
RELATIONSHIPS,
STRUCTURE-FUNCTION
RELATIONSHIPS
(ligands)
searching
and is,
NOT
amino
acid sequences
their been
in
genes.
in the
Protein
Identification
Research
Foundation).
of interest
as targets
crystallographic Laboratory) were
data
rapidly: only
available
number
of
sequence-
THREE-
?
14,372
Protein
Resource
at
DNA
techniques,
More than
90% of the known
from the DNA sequences amino
acid
Sequence the
sequences
Database
National
of
had
(NBRF;
Biomedical
Sequence data are increasing
for proteins
of drugs and agrochemicals.
In contrast,
on proteins
are
still
in the Protein Data Bank
585
proteins
1989,
NBRF
coordinate
in October
have been registered
biology
have been deduced
By December
registered
increasing
molecular
of genes has b e c o m e easy.
for
chemical
2.1 Availability of sequence data
progress
as
signals.
to differentiate
sequences;
SEQUENCE-FUNCTION
DIMENSIONAL
by
have
such
of cell
and hormones
Therefore,
amino
Proteins
functions
receptors
substrates
chemicals.
between
be defined.
recognition"
and hormonal
recognized
With
and
sequences
the application
and transduction
structure relationships.
sequencing
for
three-dimensional
physiological
reactions
in
structures
from those
amino
We also discuss
kinds
of enzymes
chemical
biochemical
between
function,
of chemical
interested
ability
the
simulate
in drug design.
the term,
different
catalysis
or
Without
Then, we show how to analyze
relationships.
First,
show
not
and functions.
relationships
We
we
relationships
of
structures
to find
chapter,
the
fitting.
structure-function
can either predict possible
the
pathway by energy calculation. examining
of
1989.
entries
known
on
Moreover,
for several proteins with
limited
and
are not
(Brookhaven National crystal
at least
structures
two entries
in the database,
three-dimensional
so the
structures
is
actually
less
than
for
drug
available
120. design
interest
for drug
localized
in pathogens,
small
design,
quantities,
or
protein
sequences,
"target
(c) have
methods
from
its
the
to
need
for
deduce
sequence, sequence
optimization active
structures,
information
low as
predictions for
relationships
prediction
through
crystal
is
and main-chain
structure
Glutathione
coli enzyme
to
data
on
of
three-
model
of
At present,
the
have
energy
in
the
prediction
We
are
success of
interested
structure
reductase
to
only
-COOH,
catalyzes with
in
not
moiety,
the
from the sequence.
including
two
to
by
the
complex,
to
the
in
the
protein
hydrogen-bonding
misconception
spatial
the reduction
coenzymes,
only at the
where
of
of oxidized
NADPH
specificity
engineering
and
FAD.
of the E.
(6,7).
The
1/100 of that to
2'-OH position
a phosphate
that
orientation
and -NH 2 groups.
enzyme to NADH is only
and NADH differ
in
sequence
function,
bind
tried to change the coenzyme
adenosine-ribose
of
and estimated
The poor
and protein
leads
occurs
glutathione
of and
between the ligand and side-chain
This
such as -OH,
a
a tertiary
molecule,
observed
from NADPH to NADH by protein
NADPH
of
the availability
interactions
affinity of the wild-type NADPH.
as
predictions
drug
(4,5).
assumed
interactions
by proteins
Perham et al.
such
When no crystallo-
of a protein-ligand
generally
groups.
groups
glutathione
in
of a drug to fit into the
between
sequence
a
Even
design.
the
point-to-point
and charge-charge
functional
way.
limited
usually roles
structure is recognized by local sequence
molecule
recognition
structure
of the three-dimensional
2.2 Chemical In the
drug
between
has
(2,3).
and
% at most
of
of
of some other protein with
evaluation
protein
the match
50-60
use
are available,
reliable
the
than
prediction
involve
as a model
in a satisfactory
is as
structural
ligand
for
less
is increasing.
structure
of the chemical
site
secondary values
low
are
weights
a three-dimensional
too
between
far
accurate
however,
so
been
are
for drug design
for related proteins
far
proteins
(a)
larger molecular
graphic data interactions
structures
because
proteins",
data
such as the crystal
a closely related
less
centers and virus coat proteins.
structure
practical template
far
crystallographic
dimensional protein
three-dimensional
are
(b) play important physiological
those of photoreaction Since
The
group
of the
is p r e s e n t
in
NADPH, less

TABLE 1. Arginines conserved in NADPH-binding reductases.*)

Enzyme                           Source         First residue   Sequence
NADPH-binding reductases
  Glutathione reductase          E. coli        196             F V R K H A P L R S F D
                                 Human          216             M I R H D K V L R S F D
  Mercuric reductase             S. aureus      279             M Q R S E R L F K T Y D
                                 S. flexneri    298             L A R S T L F F R E - D
  Trypanothione reductase        T. congolense  221             C Y R N N P I L R G F D
NADH-binding reductases
  Dihydrolipoamide dehydrogenase E. coli        202             V E M F D Q V I P S S D
                                 Yeast          231             V E F Q P Q I G A S M D
                                 Human          242             V E F L G H V G G V G I

*) Modified from ref. 6 with permission of the original authors. Amino acid residues are represented by one-letter symbols. Numbers indicate the position of the first residue of each sequence.
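The arginine pattern in Table 1 can be tallied column by column. The Python sketch below counts arginines per aligned column and converts columns back to residue numbers using the first-residue positions. The dictionary and function names are illustrative, and the segments are copied from the table.

```python
# Twelve-residue segments transcribed from Table 1 ("-" is an alignment gap),
# keyed by (enzyme, source) -> (number of the first residue, segment).
NADPH_BINDING = {
    ("Glutathione reductase", "E. coli"): (196, "FVRKHAPLRSFD"),
    ("Glutathione reductase", "Human"): (216, "MIRHDKVLRSFD"),
    ("Mercuric reductase", "S. aureus"): (279, "MQRSERLFKTYD"),
    ("Mercuric reductase", "S. flexneri"): (298, "LARSTLFFRE-D"),
    ("Trypanothione reductase", "T. congolense"): (221, "CYRNNPILRGFD"),
}
NADH_BINDING = {
    ("Dihydrolipoamide dehydrogenase", "E. coli"): (202, "VEMFDQVIPSSD"),
    ("Dihydrolipoamide dehydrogenase", "Yeast"): (231, "VEFQPQIGASMD"),
    ("Dihydrolipoamide dehydrogenase", "Human"): (242, "VEFLGHVGGVGI"),
}


def residue_counts_by_column(table, residue="R"):
    """Return {column (1-based): count} for the columns containing `residue`."""
    counts = {}
    for _first, segment in table.values():
        for col, aa in enumerate(segment, start=1):
            if aa == residue:
                counts[col] = counts.get(col, 0) + 1
    return dict(sorted(counts.items()))


def residue_number(first, column, segment):
    """Sequence number of the residue in `column`, skipping gap symbols."""
    return first + column - 1 - segment[:column].count("-")


print("Arg per column, NADPH-binding:", residue_counts_by_column(NADPH_BINDING))
print("Arg per column, NADH-binding: ", residue_counts_by_column(NADH_BINDING))
first, seg = NADPH_BINDING[("Glutathione reductase", "E. coli")]
print("E. coli glutathione reductase Arg positions:",
      [residue_number(first, c, seg) for c, aa in enumerate(seg, 1) if aa == "R"])
```

For the transcribed rows, arginine appears in column 3 of all five NADPH-binding segments. It also appears in column 9 of four of them (S. aureus mercuric reductase has Lys there). None appears in the NADH-binding segments. In E. coli glutathione reductase these columns correspond to Arg198 and Arg204. Comparing the two E. coli rows shows something more: Val197Glu, Arg198Met, Lys199Phe, His200Asp and Arg204Pro install the residues of E. coli dihydrolipoamide dehydrogenase in columns 2-5 and 9. These are five of the seven substitutions in the engineered enzyme described in the text.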
but
at
not
the
the
human
ray
analysis
(8,9).
NADH
of
residues recognize residues
NADPH. which with
1).
might
be
Perham
Arg198
al.
the mutant
positions
was
showed Next,
mutant
less
only to
coenzyme-binding secondary
structures the
site
the
enzyme
of
around
13).
This
requiring
NADH
and
to
in
leucine
mutagenesis. at
the
As two
enzyme,
NADH.
of
NADH
those
site
with around of
reductase
the NADPH-binding type
enzyme
sequence
with
in that
affinity
coli
with
acid
with
arginine
the wild-type
glutathione
fold";
the
charges
activity
amino
2'-
to neutral
These
E.
no positive than
the
residues
by methionine
catalysis
by X-
residues
the
site-directed
to NADPH
human
be
a mutant
the
the
arginine
suppressing
catalytic
"dinucleotide
(12,
dehydrogenases
the
near
replaced
of
charged
dehydrogenases
are
and NADH.
replaced
increased
determined
located
to
compared
In
gy c a l l e d
for
with
catalytic
they
dehydrogenases.
beta-sheet
enzyme
improve
two
NADPH
by using
slightly
enzyme,
the
charges
strucutre
positively
In other
constructed were
negative
was
are
concluded
between
side-chains
two
the
residues
unnecessary
et
expected, but
Thus,
and A r g 2 0 4
neutral
NADPH.
were
the difference
that
residues
arginine
reductase
are
complex
showed
arginine
these
(Table
glutathione
there
three-dimensional
enzyme-NADPH
of the bound
as coenzyme,
is, The
Results
two
group
that
in NADH.
erythrocyte
side-chains phosphate
in NADH;
2'-OH
the the
other
(i0,ii),
form a topolo-
a beta-sheet-turn-alpha-helix of or
fold
is
NADPH.
found They
commonly showed
in that
dehydrogenases

TABLE 2. Alignment of sequences of NADPH- and NADH-enzymes around the dinucleotide-binding fold.*)

Enzyme                                    Source         First residue   Sequence
NADPH-binding reductases
  Adrenodoxin reductase                   Human          151             G Q G N V A L D V A R I
  Octopine synthase                       Agrobacterium  8               G A G N V A L T L A G D
  Malic enzyme                            Rat            300             G A G E A A L G I A H L
  Glutamate dehydrogenase                 Yeast          224             G S G N V A Q Y A A L K
  Mercuric reductase                      S. flexneri    276             G S S V V A L E L A Q A
  Glutathione reductase                   E. coli        174             G A G Y I A V E L A G V
                                          Human          194             G A G Y I A V E M A G I
  Thioredoxin reductase                   E. coli        152             G G G N T A V E E A L Y
NADH-binding reductases
  Dihydrolipoamide dehydrogenase          E. coli        180             G G G I L G L E M G T V
  Alcohol dehydrogenase                   Rat            15              G L G G V G L S V V I G
  Lactate dehydrogenase                   Mouse          25              G V G A V G M A C A I S
  Glyceraldehyde phosphate dehydrogenase  Yeast          7               G F G R I G R L V M R I

*) Modified from ref. 6 with permission of the authors. Amino acid residues are represented by one-letter symbols. The numbers indicate the position of the first residue of each sequence.
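Table 2 suggests a one-position summary. In every NADPH-binding segment listed, position 6 of the aligned Gly-X-Gly-X-X-(Gly/Ala) region is Ala. In every NADH-binding segment it is Gly. The Python sketch below encodes that observation as a hypothetical rule. It then applies the rule to the Ala179Gly change in E. coli glutathione reductase, whose position 6 is residue 174 + 5 = 179. Names are illustrative, and the segments are copied from the table.

```python
# Twelve-residue segments transcribed from Table 2,
# keyed by (enzyme, source) -> (number of the first residue, segment).
NADPH_ENZYMES = {
    ("Adrenodoxin reductase", "Human"): (151, "GQGNVALDVARI"),
    ("Octopine synthase", "Agrobacterium"): (8, "GAGNVALTLAGD"),
    ("Malic enzyme", "Rat"): (300, "GAGEAALGIAHL"),
    ("Glutamate dehydrogenase", "Yeast"): (224, "GSGNVAQYAALK"),
    ("Mercuric reductase", "S. flexneri"): (276, "GSSVVALELAQA"),
    ("Glutathione reductase", "E. coli"): (174, "GAGYIAVELAGV"),
    ("Glutathione reductase", "Human"): (194, "GAGYIAVEMAGI"),
    ("Thioredoxin reductase", "E. coli"): (152, "GGGNTAVEEALY"),
}
NADH_ENZYMES = {
    ("Dihydrolipoamide dehydrogenase", "E. coli"): (180, "GGGILGLEMGTV"),
    ("Alcohol dehydrogenase", "Rat"): (15, "GLGGVGLSVVIG"),
    ("Lactate dehydrogenase", "Mouse"): (25, "GVGAVGMACAIS"),
    ("Glyceraldehyde phosphate dehydrogenase", "Yeast"): (7, "GFGRIGRLVMRI"),
}


def classify_by_position_6(segment: str) -> str:
    """Hypothetical one-position rule read off Table 2:
    Ala at position 6 -> NADPH-type, Gly -> NADH-type."""
    return {"A": "NADPH-type", "G": "NADH-type"}.get(segment[5], "unassigned")


for group, table in (("NADPH", NADPH_ENZYMES), ("NADH", NADH_ENZYMES)):
    for (enzyme, source), (first, seg) in table.items():
        print(f"{group:5} {classify_by_position_6(seg):10} "
              f"{enzyme} ({source}); position 6 = residue {first + 5}")

# Ala179Gly: replace position 6 of the E. coli glutathione reductase segment.
first, seg = NADPH_ENZYMES[("Glutathione reductase", "E. coli")]
mutant = seg[:5] + "G" + seg[6:]
print("Ala179Gly segment:", mutant, "->", classify_by_position_6(mutant))
```

The rule merely restates the twelve rows of Table 2 and does not predict coenzyme specificity. The engineered E. coli enzyme described in the text needed seven substitutions, not Ala179Gly alone. Its other glycine substitution, Ala183Gly, falls in column 10, where E. coli dihydrolipoamide dehydrogenase also has Gly.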
requiring
Gly-X-Gly-X-X-Gly Gly
is
in the the
replaced E.
by Ala
coli
above
shown
at Ala179,
mutant
enzyme
have
X is a n y
a highly amino
in dehydrogenase
glutathione
alignments
mutations
NADH
(where
reductase
in Tables Ala183, and
2,
Val197,
finally
they
Lys199,
obtained
the
NADPH
2).
By
further and
a
sequence
while
requiring
)(Table
1 and
conserved
acid),
third
(Ala179
comparing introduced
His200
mutant
in the
enzyme,
Ala179Gly/Ala183Gly/Val197Glu/Arg198Met/Lys199Phe/His200Asp/Arg204Pro,
enzyme
with
activity
to NADPH.
This
example
phosphate
group,
NADH,
is
phosphate by
group
side-chains
the
structural on
the
interactions
environmental
comparable
illustrates the
recognized
charge-charge
to NADH
enzyme
interactions
and main-chain
important difference not
between
and positively of
to that
only
the
fact by
Arg
phosphate
the
2'-
NADPH
and
point-to-point,
negatively
of the
that
between
charged the
of the wild-type
charged
side-chains, group
dinucleotide-fold.
2'-
but also with The
the
helix
in the
fold
stabilizes
the positive
the negative
helix
by dipoles
tight
turn
that
(14,15). allows
the
The
first
fold
to make
contact with NADH by van der Waals
2.3 Loops are responsible
The peptide
segment
does not always the
a
determined antibody
loop
light
affinity the
of
DNA-binding that,
in
crystallographic
a strict
molecular
Gly
six hypervariable
as a structural
of
recognition
structure
proteins
(17),
cases,
analysis.
recognition
sites
loops
unit
six
(18,19).
for molecular
For example,
an
the
three-dimensional
structure
forming the loops.
loops
but
to
the
chemical
called the of the heavy
specificity
proteins
synthetase
are related
acid
of
identified
in ras
amino
and
structures
are also
recognition
so
be
of
and tryptophan
of the flexible
is
cannot
by the
Loops
such as
but
it
site,
The
are governed
(20), glycogen phosphorylase(21), functions
loops.
a
steric)
(16).
The antigen-binding
consists
form
(less
for molecular
some
by
side of the
residues
a close
region located in the variable domains
chains,
of the binding
The
two
interaction
responsible
of its antigen.
hypervariable and
coenzyme
for molecular recognition
structure
by X-ray
makes
structure
of the
take a fixed three-dimensional
helix-turn-helix
flexible
charges
charge that is induced at the N-terminal
not
(22).
to the
sequences
2.4 Sequence segments of functional importance are conserved
The
related locally
amino
to
each
similar
recognize
and
substitutions importance protein
acid
sequences
other
in
terms
in the regions bind
their
of
fatal
proteins
where
ligand
by random mutations
cause
of
their
that
polypeptide
is not inherited.
are
only
fold
Amino
at the positions
that are conserved among the proteins
closely
chains
molecules.
loss of the physiological
and this mutation
are
functions,
to
acid
of functional
function of the
Therefore,
are sequences
sequences
of functional
importance. Here,
we
"dinucleotide
briefly fold"
dehydrogenases. evolution
refer
to the biological
reasons
is conserved among the sequences
According
of proteins,
to a recent
the gene
theory
why
the
of different
of the molecular
of a new protein
evolves
not by
random mutations
of the gene of some other protein with different
function,
"exon-shuffling"
but by
(23-26).
In the exon-shuffling
theory, acid
exons
coding
residues
rearrangement ferred for
from
of
similarities separate
are
acid
genes,
identical
in
to
residues
function.
same
ancestral
each
other.
sequence, by
find
random
the
not
only
Then
proteins
after begin
mutations. of
the
importance
of molecular
30-50
exons
a new
exons
they
boundaries
of
are
gene
show
trans-
that
codes
whose
genes
local
sequence
divergence
from
amino
of evolutionary
form
exon would Just
but
of physiological
a unit
to
the two duplicated
substitutions
difficult are
with
composed
to be a unit
duplications,
and mixed
novel
the
segments
supposed
By gene
genes
with
inherited
are
genes.
other
a protein
have
sequence
in length
into
the ancestral
to
accumulate
With
time,
two exon
nucleic
it becomes
duplicated
exons.
But,
conserved.
Thus,
exons
are
evolution,
but
also
of
protein
function.
2.5 Motifs
are too small
The positional different
proteins
are
Gly-X-Gly-X-X-Gly (27,28)
and
motifs.
functionally
to build only
actions
a peptide
a structural
one
a bound
motif
is usually
tif.
Protein
beta-sheets of
five
lowing
supported
secondary
but by the
the motif
composed in the called
of
ras the
secondary
27 residues and
including
molecule,
not
of
the
loop"
(35,36).
by w h i c h
is too
Although
than
does the
of
by Chou
and
composed
and
preceding
Fasman
and
fol-
Gly-X-Gly-X-X-Gly
hand,
the
in a loop not
motif
a
the mo-
sequence
the motif
and
inter-
folding
"dinucleotide-binding is
short
alpha-helices
local
of
detected
direct
longer
supposed
kinase
Thus,
make
as
On the other
adenylate
(30),
peptide
far
the
in groups
a motif
sequences
For example,
and
recognition.
such
The
zipper
well-known
successfully
the
by a short of the
are
motifs
only
by a sequence
(15).
leucine
developed
are
among
patterns).
proteins,
in a motif
as originally
"glycine-rich structure
been
residues
find
for molecular
is a part
protein
have
to
residues
"context"
(34).
in dehydrogenases
DNA-binding
structures
are determined
conserved
(or fingers,
methods
ligand
are
serine-proteases
chain
or six residues
(32,33),
of
acid
acid
unit
that
dehydrogenases,
of
unit
or two amino with
(29)
proteins
three-amino
However,
motifs of
several
related
of
(31).
called
sequence
Recently,
patterns
of residues
sequence
zinc-finger
Gly-X-Ser-X-Gly
as a structural
patterns
make itself
fold"
same motif structure any
fixed
does
not
have
any
fixed
assigned
secondary
depending
is involved.
We have to search
than that of the motifs
3. HOMOLOGY
GRAPHING:
FUNCTIONAL
If
we
METHOD
could
find
of
molecules
containing
these
nition
common
paring
of a common
aligned
There
of
common
segments
similarity addition, of
low
computer
-
may
acid
3.1
amino
be as
(or substructure).
as conserved
acid
segments in
to align more
been
developed
available
by which among
for
a
set
are
than
usually and
three of
sequences
as
their
In
sequences
although pairs
we developed regions
of
such
sequences
in position.
alignment
conserved
acid
length,
(37,38),
Recently,
by com-
in detecting
30% identity
alignment).
(39,40).
Such
These
sequences
a set of amino
residues
20-
method
are
difficulties
within
low as
has
sequence
of
many
se-
a method
within are
a given
detected
Homology graphing Homology
lative (target
local
graphing
Window
of an amino
and
sequence
from the NH2-terminal along
eral
The
residues.
step is defined
3.1.2 search
acid
as segment-i
with
sequence
segments:
The
stepwise
segment (Figure
against
the cumu-
to be analyzed
sequences. target
sequence
with
is
a window.
at intervals
in the window
of sev-
at the
i-th
i).
of homology value:
is performed
graphically
to the COOH-terminal
the sequence
sequence
Calculation
is aligned
and shows
to a set of reference
The window moves
ment-i
calculates
similarity
sequence)
3.1.1 scanned
ity
the
recognize
structure
Graphing"
quantitatively
among
can
chemical
proteins.
programs
OF
present that
longer
(or s u b s t r u c -
related
(pairwise
are
for the recog-
segments
40
that
SEGMENTS
proteins
chemical
it
structure
no practical
"Homology
SEQUENCE
is
should be responsible
certain
similarity
quences amino
20
segments
role
with which
segments
however,
sequence
as
a common
functional
sequences with each other.
functionally
short
its
commonly
related
could be detected
are,
conserved
TO FIND
segments
functionally
and
of the sequences
sequence
only.
IMPORTANCE
sequences ture),
structure
on the context
For segment-i, similarity is searched against the reference sequence set. Segment-i is aligned with one of the reference sequences, sequence-j, by using the IDEAS system (42). When the best local alignment is found, the degree of similarity (score-ij) for the alignment is calculated from the amino acid mutation data (41). The degree of similarity thus calculated depends slightly, but significantly, on the amino acid composition and the length of segment-i. Therefore, the value is corrected and normalized for these two factors. If score-ij is higher than a given threshold value (a lower limit for detecting similarity), it is saved; if not, it is not saved. Next, segment-i is aligned with reference sequence-(j+1). Similarity is calculated again and saved if score-i(j+1) is higher than the threshold value. This process is repeated until all the reference sequences have been compared pairwise with segment-i. The sum of the saved score-ij (from j = 1 to n, where n is the total number of reference sequences) is defined as the homology value of segment-i [Equation 1].

[Figure 1: A window moves along the target sequence from the NH2-terminal to the COOH-terminal. The segment in the window, segment-i, is searched for similarity against reference sequences 1 to n, giving score-i,1 ... score-i,n. The homology value of segment-i is the sum of the score-ij values above the threshold.]
Fig. 1. Homology graphing.
$$\text{Homology value of segment-}i = \sum_{j=1}^{n} \text{score-}ij \quad \{\text{if score-}ij > \text{threshold}\} \qquad [1]$$

The homology value [1] increases as the degree of similarity increases and as more alignments show similarity higher than the threshold value. This process is repeated at each step of movement of the window until the COOH-terminal is reached. Thus, the homology value is calculated for each segment in the target sequence.

3.1.3 Graphing: To show the results graphically, the homology value of segment-i is plotted against the residue at the center of the segment. Three parameters can be varied: the window size, the step size of movement of the window, and the threshold for detecting similarity. By varying them, we can detect any sequence segments differing in length and similarity.

3.2 Homology graphing of glutathione reductase
Here, homology human
we
show
graphing.
glutathione
an example The
reductase,
under the entry name of RDHUU the crystal composed NADPH-
structure 293),
to 478) domains enzyme
FAD-
central-
(43,44).
of 1.54-2 Å
as a target Three sequence
sequence
registered
(from residue (294 to
analysis acid
364),
using
sequence
of
X-Ray analysis
of
in the
(478 residues).
(8-10).
NBRF
database
that this enzyme is 19 to residue and
157),
interface-
structures
sequences
(365
of the
includes
those
coenzyme;
19
sequence
are composed
of the FAD-related
reductase
could
detect
are
prepared
from
the
the sequences
NBRF of the
that require NADPH or NADH as a coenzyme; enzymes.
enzymes
of 14 FAD-related
of the sequences
for the control
sets
NAD(P)H-related
27 sequences
graphing
selected
for coenzyme binding.
The first one comprises
enzymes
of
The enzyme was therefore
to test how homology
reference
database.
NAD(P)H-related
enzymes
amino
The three-dimensional
the segments of importance
thione
sequence
the
complexed with FAD and NADPH have also been analyzed at a
resolution
30
the
is
of the enzyme revealed
of four domains:
(158 to
of
target
requiring experiment;
not requiring NADPH,
that
enzymes.
The
second
require
set
FAD as a
These two sets
functionally related to the gluta-
both NADPH and FAD. sequences NADH,
The third
set is
of nucleotide-nonrelated
or FAD.
This
set is to detect
[Figure 2: homology value plotted against residue number (0-500) of human glutathione reductase, in two panels (a) and (b), with the FAD-domain and NADPH-domain marked respectively.]
Fig. 2. Homology graphs of human glutathione reductase. Analytical conditions: window length = 50 residues, step size = 5 residues, and threshold = 45. In graph (a), the reference sequence sets are FAD-related enzymes (solid line) and nucleotide-nonrelated enzymes (dashed line). In graph (b), they are NAD(P)H-related enzymes (solid line) and nucleotide-nonrelated enzymes (dashed line). Modified from Ref. (39) with permission, Copyright 1989, American Chemical Society.
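The homology-graphing procedure of Section 3.1 (window, step, threshold, Equation [1]) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The original used the IDEAS system (42) with mutation-data scores (41), corrected for segment length and composition. This sketch instead uses a plain Smith-Waterman local alignment with a hypothetical identity-based scoring scheme (`MATCH`, `MISMATCH`, `GAP`). It also applies a hypothetical normalization to a 0-100 scale. The defaults for window length, step size and threshold are those quoted in the Fig. 2 caption. However, a threshold of 45 is only meaningful on the scoring scale it was chosen for.

```python
from typing import List, Sequence, Tuple

# Hypothetical scoring scheme for this sketch only; the original method
# used amino acid mutation data via the IDEAS system, not these values.
MATCH, MISMATCH, GAP = 5, -1, -4


def local_alignment_score(a: str, b: str) -> int:
    """Best Smith-Waterman local alignment score of a against b (linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if ca == cb else MISMATCH)
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_score(segment: str, reference: str) -> float:
    """Hypothetical normalization: raw score as a percentage of the
    segment's perfect self-match score (MATCH * len(segment))."""
    return 100.0 * local_alignment_score(segment, reference) / (MATCH * len(segment))


def homology_graph(target: str, references: Sequence[str], window: int = 50,
                   step: int = 5, threshold: float = 45.0) -> List[Tuple[int, float]]:
    """(centre residue, homology value) for each window position along target."""
    points = []
    for start in range(0, len(target) - window + 1, step):
        segment = target[start:start + window]                  # segment-i
        scores = [normalized_score(segment, ref) for ref in references]
        value = sum(s for s in scores if s > threshold)          # Equation [1]
        centre = start + (window + 1) // 2   # 1-based; lower middle for even windows
        points.append((centre, value))
    return points


if __name__ == "__main__":
    # Toy data built around the glutathione reductase segments of Table 2;
    # the flanking P, W, K and D runs are invented filler.
    target = "P" * 10 + "GAGYIAVELAGV" + "P" * 10
    references = ["WWWWGAGYIAVELAGVWWWW", "KKGAGYIAVEMAGIKK", "D" * 10]
    for centre, value in homology_graph(target, references, window=12, step=2):
        print(f"{centre:3d} {value:7.1f}")
```

In the toy run, the peak falls on the window covering the shared segment. Both references score above the threshold there, while the all-Asp reference contributes nothing. Plotting the returned pairs gives a graph of the kind shown in Fig. 2. Changing `window`, `step` and `threshold` mirrors, in miniature, the three-parameter tuning described in Section 3.1.3.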
the
regions
similar
binding.
A homology
with
a reference
major
peak
130-150,
when
by
graph
170-250,
the
66,
129,
130,
localized
the
other
331,
domains,
peak regions With
homology 245-330. 337,
339,
370
(8-9).
regions
interacting
tively,
as r e f e r e n c e for
and FAD.
tool to detect
cal structures.
4.1
enzyme
cal
combination
unit
of
homology
(10).
as
These
19 to
The
but
are
51,
two m a j o r peaks
primary 197,
contact
198,
201,
except
all
with
the
the
extracted
chemical
protein
sequence
ligand
at
and nicotinamide
NADPH
molecule
(substructures).
structure
using
respec-
are those
to p r o v i d e
of
a
and chemi-
OF SEQUENCE-CHEMICAL
glutathione
the
290,
structures
sequences
structure recognized
with
and
the bound
224,
enzymes,
is b e l i e v e d
FOR ANALYSIS
of human
the
of
the
370 are e x t r a c t e d
and F A D - r e l a t e d
graphing
on
in the
at 190-245
218,
not
usually
enzymes,
extracted
The regions
57,
are
in the homology graph.
NAD(P)H-related
acid
FAD in
spread
reactions
residues
at
reference
residues
157),
of catalytic
between
DATABASE
of moieties
chemical
467
recognition
phosphodiester, of
50,
the
with the bound
identified
successfully
RELATIONSHIPS
structure
31,
195,
relationships
interacts
phosphate,
amino
(residues
sequences.
the
complex
The
been
enzymes
Units of chemical In the
the
set.
for
appear
with the bound NADPH and FAD separately
Thus,
ENZYME-REACTION
STRUCTURE
sequence
in the graph.
graphs
sets of NAD(P)H-related
responsible
significant
graph
All these residues
regions
homology
as
2a) gave one
are
the
interactions
of
(Figure
These
that make
assigned
reductase
410-460.
2b) showed
residues
been
of glutathione
peaks
sites
set
to
small
in
segments)
(Figure
The
These
4.
and
coenzyme-
Other
of d o m a i n s .
a reference
graph
have
and
have
because
(conserved
as conserved
NADPH
enzyme
in the F A D - d o m a i n
are on the b o u n d a r i e s
NADPH
and
the
enzymes
80.
peaks
which make primarily complex
related
sequence
50 to
300-340,
with
FAD-enzyme
not
set of F A D - r e l a t e d
nucleotide-nonrelated the
of the
at r e s i d u e s
compared
residues
chance,
recognizable
by proteins
reductase adenine,
moieties.
is
NADPH,
ribose,
3'-
The chemi-
recognized
This by
with
suggests
proteins
as
that is
a a
a
substructure composed of several atoms (probably less than twenty). The size of substructures recognized by proteins would be limited by the length of the sequence segments coded by one or two exons. The conserved sequence regions detected in the homology graph of glutathione reductase are the sequence segments responsible for the recognition of the substructures contained in the NADPH molecule.

Fig. 3. Various possible ways of dividing the structure of NADPH into substructures.
To find the conserved sequence segments for the recognition of the phosphodiester moiety, we have to compile a reference sequence set including the sequences of not only dehydrogenases but also synthetases, kinases, and ligases. The phosphodiester moiety is commonly present in the structures of NADPH, NADH, FAD, ATP, and GTP, which are substrates and cofactors of these enzymes. Therefore, the sequence-chemical structure relationships which we are analysing are actually protein sequence-"substructure" relationships.

There are many possible ways of dividing the chemical structure of NADPH into substructures (Figure 3). They range from small substructures such as -OH and -NH2 to large ones including the adenosyl-phosphate moiety, and their combinations. Here problems arise as to which substructures are physiologically and evolutionally significant, and how many different substructures are recognized by proteins.

4.2 Enzyme-Reaction Database
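Each entry of the Enzyme-Reaction Database shown in Fig. 4 is a flat-file record. A field label (ENTRY, NAME, CLASS, SYSNAME, REACTION, SUBSTRATE, PRODUCT, INHIBITOR, NBRF-ENTRY) is followed by one or more value lines, and `///` separates records. The sketch below parses that layout into Python dictionaries and answers one simple query: which entries list a given compound under a given field. It assumes that labels start in column 1 and that continuation lines are indented, as in the reconstructed figure. The actual storage format of the database is not specified here, and `parse_entries` and `entries_with` are hypothetical, illustrative names.

```python
from typing import Dict, List

Record = Dict[str, List[str]]

# Abridged sample in the Fig. 4 layout (labels in column 1, indented continuation lines).
SAMPLE = """\
///
ENTRY       EC 6.3.1.2
NAME        Glutamate-ammonia ligase
            Glutamine synthetase
REACTION    ATP + L-Glutamate + NH3 = ADP + Orthophosphate + L-Glutamine
SUBSTRATE   ATP
            L-Glutamate
            NH3
PRODUCT     ADP
            Orthophosphate
            L-Glutamine
///
"""


def parse_entries(text: str) -> List[Record]:
    """Split '///'-delimited records into {LABEL: [value lines]} dictionaries."""
    records: List[Record] = []
    current: Record = {}
    label = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.strip() == "///":        # record delimiter
            if current:
                records.append(current)
            current, label = {}, None
        elif not line[0].isspace():      # new field: LABEL followed by a value
            parts = line.split(None, 1)
            label = parts[0]
            current.setdefault(label, [])
            if len(parts) > 1:
                current[label].append(parts[1].strip())
        elif label is not None:          # indented line continues the last field
            current[label].append(line.strip())
    if current:                          # tolerate a missing final '///'
        records.append(current)
    return records


def entries_with(records: List[Record], compound: str,
                 field: str = "SUBSTRATE") -> List[str]:
    """ENTRY values of the records that list `compound` under `field`."""
    return [entry for r in records if compound in r.get(field, [])
            for entry in r.get("ENTRY", [])]


if __name__ == "__main__":
    recs = parse_entries(SAMPLE)
    print(entries_with(recs, "ATP"))  # ['EC 6.3.1.2']
```

Keeping every value as a list of lines preserves multi-valued fields such as SUBSTRATE and PRODUCT without forcing a delimiter convention onto them.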
///
ENTRY       EC 6.3.1.2
NAME        Glutamate-ammonia ligase
            Glutamine synthetase
CLASS       Ligases
            Forming carbon-nitrogen bonds
            Acid-ammonia (or amine) ligases (amide synthases)
SYSNAME     L-Glutamate:ammonia ligase (ADP-forming)
REACTION    ATP + L-Glutamate + NH3 = ADP + Orthophosphate + L-Glutamine
SUBSTRATE   ATP
            L-Glutamate
            NH3
PRODUCT     ADP
            Orthophosphate
            L-Glutamine
INHIBITOR   L-Methionine sulfoximine
            L-2-Amino-4-(hydroxymethylphosphinyl)butanoate
NBRF-ENTRY  AJEBQT AJAIQ AJZJQ2 AJAAQ AJECQ A24714 A05079 A05097 A23970 AJFBO A22947
///

Fig. 4. Contents of the Enzyme-Reaction Database.

To study
database amino This
these
called
acid
types
problems,
we
contains
including
the
their
structure
common
as
classified
by
structures
of s u b s t r a t e s ,
NBRF
inhibitors sequence The
base of
collected
in the
entries
collected
enzymes
by July
1991.
with
each
of
known
2,477
version-up The
the
each
Union
products,
and
is
enzymes.
about
We
IUB
keep
entry
41.5 the
%
codes
the
datanumber
for
and
number
database
in the
The
5,864
a name the
effec-
in the N B R F
was
of
reaction
Databank.
Database.
gave
(46),
45).
the names
of B i o c h e m i s t r y ) ,
registered
1984
(40,
4):
activators,
Protein
database
the
in
of
and E C - n u m b e r s ,
cofactors,
our
a
analysis
1,027
EC-number of
enzymes
biochemically
updated
with
the
of the NBRF database.
total
Enzyme-Reaction with
Since
enzymes
sequences
characterized
in
construct
(Figure
Enzyme-Reaction
NBRF
for
names
the e n z y m e s
to the
relationships
and the B r o o k h a v e n
of all
for
items
(International
reaction
database
entries
are
and
IUB
started
Database
following
chemical tors,
database. have
Enzyme-Reaction
sequence-chemical
database
of e n z y m e s
of E n z y m e - R e a c t i o n
number
updating
compounds
of
Database are
of
the
stored
chemical was
compounds
1,554
database. by
molfile
in
July
The
registered 1991
chemical
format
and
in
the
increases
structures
(Molecular
of
Design
Ltd.,
San
MACCS
system
format
Chemical search
Leandro,
are
stored
Chem
A,
of
32
FAD,
Software
coordinates The
substructures.
ring
is
system,
form a n e w
as
compounds
substructures
database.
the
all
the of
into
found
hetero the
another This
now
substructures
to
System.
datafile by
atom
the
hetero
database. atom
are
result
only
connected,
rules
those
in a they
2,764
that
other
of
a
to a set of
project,
out of the
apply
and if
bonds,
(3)atoms
suggests
listed to
(2)
by multiple
these
in
their
substructures in
research
trying
CONCORD
Software
a
substructures
(49). are
using
three-dimensional
substructure,
can be a u t o m a t i c a l l y We
by
to
compounds
indexed
(i)
Pomona
substructure,
When we a p p l i e d in
the
possible
a
a Med-
started
of
registered
form
to the
substructure.
are
follows: it
have
a substructure
list
to
their
database
to
We
in the M e d C h e m
if two or more
were
store
using
Project,
3100. database
and
structures
is i n c l u d e d
different
reduce
have
connected
(4)
substructures 400
the
attached
atom
gives
from m o l f i l e
space
structures
construct
in
substructures
atom
Reaction
to
chemical
atoms
carbon
4,733
We
in the
define
carbon the
is
structures
Chemistry
on V A X s t a t i o n
at Austin)
step
for
substructure-
database
to save disk
Enzyme-Reaction
compounds
on a
Institute
acyl-derivatives
chemical
(Medicinal
CA)
the
next
hydrogen
the
(47,48)
into a THOR d a t a b a s e
the
included We
format
of Texas
the
including
three-dimensional
in
(University
in the
For e x a m p l e ,
against
molfile
and s u g a r - n u c l e o s i d e .
System
the
registered
which
compounds
Claremont,
generate
ester
in
related-enzymes
on F A C O M - 3 8 0
translating
SMILES
structures
of the
University.
NAD(P),
we are
into
College,
Kyoto
chemical
EC-numbers
pyrophosphate
list
Now,
The
the
installed
Research,
Coenzyme format
(MDL)
with
output
CA).
with
about
Enzyme-
rules
to
biological
significance. 5.
APPLICATION STRUCTURE
Previously, tures
of
drugs
OF S E Q U E N C E - S U B S T R U C T U R E
we
showed
supposed
sequence
similarity
segments
detected
substructure
RELATIONSHIPS
IDENTIFICATIONS
and in
to
our
strategy
interact
homology the
relationships
with of
be u s e d
identify target
graphing
analysis could
to
(39,
amino
TO
lead
proteins 50). acid
as f u n c t i o n a l
LEAD
strucusing
Sequence sequencetemplates
that
specifically
a sequence a
region matching
protein,
sequence
the
to
with
substructures. listed,
many
a high
be a b l e
chemical
that
combination
of
some
recognizes chemical of
constituting
structure
but
the
modifications
corresponding the
target
suggested
together three
for
on
together.
a
as
For
These
This
combinations
of
structure 5).
phosphate
to
called an
broad
is
more
binding
bind
substrate
be
of
which
"effector"
has
no
a
ligand
new
"modulator".
no
segment
by s c a n n i n g
on
accepts
and
oxidized
from
site
with
various
compound,
and
binding
site
by c y t i d i n e
tri-
to the binding
site
similarity
CTP binds
are
either
binding
A
the
the
structural
nicotineamide,
to
with
by the
structure
site
is i n h i b i t e d
from that of aspartic
or
of
separately
FAD,
of
structure
reductase
a broad
binding
structural
of the enzyme.
domain
to
so
lead
substructures
or
NADPH,
alloxan,
carbamyltransferase
(CTP),
When
All
using
strictly
of the substrates.
a new
by
of the
is d e t e c t e d
site
of
of
A protein
Part
drastic
the
the
than two c o m p o u n d s
us to construct
composed
may
part
glutathione
substrates,
for
interactions
design.
templates,
give
on the w a y
molecule.
by the protein.
substrates
the
chemical
latter
set
substructures
structures
somewhat
example,
substructures
the substrate
on a d i f f e r e n t
its
candidates
lead
structure
is
the
structures.
be r e q u i r e d
substructures the
a
substructures
is r e c o g n i z e d
recognizes
prompts
moieties,
Aspartate
acid,
with
single
binding-affinity
(Figure
lead
usually
compounds
cysteine
the
obtain
lead
of
ligand
The
accept
to c e r t a i n
sites.
glutathione.
whose
may
sequence
ligand
is not.
protein
are as f o l l o w s .
through
the
not to be recognized
An e n z y m e
different
of the
the rest
ligand
find
relationships
its ligand molecule
substructures
as
should
to
The
of
template
containing
of these
the
set
structures
strategies
sequence-substructure
protein,
to i d e n t i f y
of substructures.
Additional
could
combinations
constraint
the
of the t a r g e t p r o t e i n
we
a given
by
compounds
the s e q u e n c e
Among various
different
structure
to
When
in the sequence
by the p r o t e i n .
templates,
combinations
is found
of the leads.
characterized
affinity
By s c a n n i n g
various
we w o u l d
possible
a template
be r e c o g n i z a b l e
show
substructure.
drugs
substructures
substructure
would
expected
characterize
acid
to a s p a r t i c
(51,52).
Since proteins
of
CTP is
interest
O I
O
I
NH
HO
j....
-\---/----I
t/),
may
have
scanning
templates,
known The
a binding
the target
L
.
.
.
.
.
.
HN
I
for
sequences
cases,
O
research
Research
the M i n i s t r y
was
from the three substrates
not well
an u n k n o w n
with various
we may find new binding
present
sites
supported
on Priority Areas,
of Education,
oll
HS
, I I
ligands.
Scientific
0
O
in most
site
0
2
Fig. 5. Combination of substructures gives new lead structures.
are,
o
N ~NH
H3C H3C
for drug d e s i g n
o
characterized,
effector
molecule.
conserved
sequences
for compounds
by
a
"Genome
Science and Culture
they
as
other than
Grant-in-Aid
Informatics",
of Japan.
By
for
from
REFERENCES

1. U.C. Singh, in: The Third Alliant Chemistry Colloquium in Tokyo, 1989.
2. T.L. Blundell and M.J.E. Sternberg, Trends Biotech., 3 (1985) 228-235.
3. T.L. Blundell, B.L. Sibanda, M.J.E. Sternberg, and J.M. Thornton, Nature, 326 (1987) 347-352.
4. W. Kabsch and C. Sander, FEBS Lett., 155 (1983) 179-182.
5. K. Nishikawa and T. Ooi, Biochim. Biophys. Acta, 871 (1986) 45-54.
6. N.S. Scrutton, A. Berry, and R.N. Perham, Nature, 343 (1990) 38-43.
7. S. Greer and R.N. Perham, Biochemistry, 25 (1986) 2736-2742.
8. E.F. Pai, P.A. Karplus, and G.E. Schulz, Biochemistry, 27 (1988) 4465-4474.
9. P.A. Karplus and G.E. Schulz, J. Mol. Biol., 210 (1989) 163-180.
10. P.A. Karplus and G.E. Schulz, J. Mol. Biol., 195 (1987) 701-729.
11. P.A. Karplus, E.F. Pai, and G.E. Schulz, Eur. J. Biochem., 178 (1989) 693-703.
12. M.G. Rossmann, A. Liljas, C.I. Branden, and L.J. Banaszak, Enzymes, 11 (1975) 61-102.
13. C.I. Branden, Q. Rev. Biophys., 13 (1980) 317-338.
14. W.G.J. Hol, P.T. van Duijnen, and H.J.C. Berendsen, Nature, 273 (1978) 443-446.
15. R.K. Wierenga, M.C.H. de Maeyer, and W.G.J. Hol, Biochemistry, 24 (1985) 1346-1357.
16. R.K. Wierenga, P. Terpstra, and W.G.J. Hol, J. Mol. Biol., 187 (1987) 101-107.
17. R. Schleif, Science, 241 (1988) 1182-1187.
18. P.T. Jones, P.H. Dear, J. Foote, M.S. Neuberger, and G. Winter, Nature, 321 (1986) 522-525.
19. C. Chothia, A.M. Lesk, A. Tramontano, M. Levitt, S.J. Smith-Gill, G. Air, S. Sheriff, E.A. Padlan, D. Davies, W.R. Tulip, P.M. Colman, S. Spinelli, P.M. Alzari, and R.J. Poljak, Nature, 342 (1989) 877-883.
20. M.V. Milburn, L. Tong, A.M. deVos, A. Brunger, Z. Yamaizumi, S. Nishimura, and S.-H. Kim, Science, 247 (1990) 939-945.
21. E.J. Goldsmith, S.R. Sprang, R. Hamlin, N.-H. Xuong, and R.J. Fletterick, Science, 245 (1989) 528-532.
22. C.C. Hyde, S.A. Ahmed, E.A. Padlan, E.W. Miles, and D.R. Davies, J. Biol. Chem., 263 (1988) 17857-17871.
23. C.C.F. Blake, Nature, 273 (1978) 267.
24. J. Rogers, Nature, 315 (1984) 458-459.
25. M. Cornish-Bowden, Nature, 313 (1985) 434-435.
26. M. Marchionni and W. Gilbert, Cell, 46 (1986) 133-141.
27. W.H. Landschulz, P.F. Johnson, and S.L. McKnight, Science, 240 (1988) 1759-1764.
28. C.R. Vinson, P.B. Sigler, and S.L. McKnight, Science, 246 (1988) 911-916.
29. A. Klug and D. Rhodes, Trends Biochem. Sci., 12 (1987) 464.
30. R.F. Smith and T.F. Smith, Proc. Natl. Acad. Sci. USA, 87 (1990) 118-122.
31. H.O. Smith, T.M. Annau, and S. Chandrasegaran, Proc. Natl. Acad. Sci. USA, 87 (1990) 826-839.
32. P.Y. Chou and G.D. Fasman, Adv. Enzymol., 47 (1978) 45-148.
33. J. Garnier, D.J. Osguthorpe, and B. Robson, J. Mol. Biol., 88 (1978) 873-894.
34. W. Kabsch and C. Sander, Proc. Natl. Acad. Sci. USA, 81 (1984) 1075-1078.
35. E.F. Pai, W. Kabsch, U. Krengel, K.C. Holmes, J. John, and A. Wittinghofer, Nature, 341 (1989) 209-214.
36. E.F. Pai, W. Sachsenheimer, R.H. Schirmer, and G.E. Schulz, J. Mol. Biol., 114 (1977) 37.
37. M. Murata, J.S. Richardson, and J.L. Sussman, Proc. Natl. Acad. Sci. USA, 82 (1985) 7657-7661.
38. D.J. Lipman, S.F. Altschul, and J.D. Kececioglu, Proc. Natl. Acad. Sci. USA, 86 (1989) 4412-4415.
39. T. Nishioka, K. Sumi, and J. Oda, in: P.S. Magee, D.R. Henry, and J.H. Block (Eds), Probing Bioactive Mechanisms, ACS Symposium Series, No. 413, American Chemical Society, 1989, pp. 105-122.
40. K. Sumi, T. Nishioka, and J. Oda, Protein Eng., 4 (1991) 413-420.
41. W.B. Goad and M. Kanehisa, Nucleic Acids Res., 10 (1982) 247-263.
42. M.O. Dayhoff, R.M. Schwartz, and B.C. Orcutt, in: Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, National Biomedical Research Foundation, Washington, D.C., 1978, pp. 345-352.
43. G.E. Schulz, J. Mol. Biol., 138 (1980) 335-347.
44. R. Thieme, E.F. Pai, R.H. Schirmer, and G.E. Schulz, J. Mol. Biol., 152 (1981) 763-782.
45. M. Suyama, T. Nishioka, and J. Oda, unpublished.
46. International Union of Biochemistry, Nomenclature Committee, Enzyme Nomenclature, Academic Press, Orlando, FL, 1984.
47. D. Weininger, J. Chem. Inf. Comput. Sci., 28 (1988) 31-36.
48. D. Weininger, A. Weininger, and J.L. Weininger, J. Chem. Inf. Comput. Sci., 29 (1989) 97-101.
49. T. Nishioka and J. Oda, unpublished data.
50. H. Kato, M. Chihara, T. Nishioka, K. Murata, A. Kimura, and J. Oda, J. Biochem., 101 (1987) 207-215.
51. K.L. Krause, K.W. Voltz, and W.N. Lipscomb, J. Mol. Biol., 193 (1987) 527-553.
52. K.H. Kim, Z. Pan, R.B. Honzatko, H.-M. Ke, and W.N. Lipscomb, J. Mol. Biol., 196 (1987) 853-875.
QSAR and Drug Design - New Developments and Applications, T. Fujita, editor. © 1995 Elsevier Science B.V. All rights reserved.
BACKGROUND AND FEATURES OF EMIL, A SYSTEM FOR DATABASE-AIDED BIOANALOGOUS STRUCTURAL TRANSFORMATION OF BIOACTIVE COMPOUNDS

Toshio Fujita, Michihiro Adachi, Miki Akamatsu, Masaaki Asao, Harukazu Fukami, Yoshihisa Inoue, Isao Iwataki, Masaru Kido, Hiroshi Koga, Takamitsu Kobayashi, Izumi Kumita, Kenji Makino, Kengo Oda, Akio Ogino, Masateru Ohta, Fumio Sakamoto, Tetsuo Sekiya, Ryo Shimizu, Chiyozo Takayama, Yukio Tada, Ikuo Ueda, Yoshihisa Umeda, Masumi Yamakawa, Yasunari Yamaura, Hirosuke Yoshioka, Masanori Yoshida, Masafumi Yoshimoto, and Ko Wakabayashi

EMIL Working Group, Department of Agricultural Chemistry, Kyoto University, Kyoto 606-01, Japan*

ABSTRACT: The structural transformation processes observed in many past developmental examples of pharmaceuticals and agrochemicals are regarded as invaluable precedents for prospective analog design. In certain cases, (sub)structural transformation patterns are interchangeable among various compound series despite differences in their pharmacological category. Thus, these patterns, extracted in a computer-readable format, could be accumulated and integrated as a database of potential "rules" for bioanalogous molecular transformations. EMIL is a system that incorporates the database and a data-processing engine constructed to release "higher-ordered" candidate structures from a "lower-ordered" input structure "automatically". The conceptual background for the database construction and the procedure for the database collection are presented on the basis of some lead evolution examples among pharmaceutical and agrochemical series of compounds.

1. INTRODUCTION

There are numerous series of compounds exhibiting specific biological effects.
Examples exist among pharmaceuticals, such as those acting on the nervous, circulatory, respiratory, digestive, and immunoregulatory systems, and chemotherapeutics including antimicrobial and anticancer agents. They also exist among agrochemicals such as insecticides, herbicides, and fungicides. In each series, an ultimate prototype lead compound was identified or disclosed first. In certain cases, the lead compound originates from bioactive principles in natural products, including secondary metabolites of animals and plants and endogenous participants such as hormones and signal transmitters. In many instances, the lead is selected from organic compounds synthesized intentionally or unintentionally. The structure of the prototype lead compound is usually modified in various ways for three purposes: to improve the biological activity profile, to potentiate the target activity, and to eliminate undesirable side effects, including chronic toxicity and environmentally hazardous behavior.

There seem to be two aspects of the structural modification process. One is the optimization of the lead structure by systematic replacement of substituents while keeping the skeletal structure (almost) unchanged. This is often called "lead optimization" (1). The other is structural transformation, usually associated with more or less "drastic" variations in the skeletal structure. Structural transformation usually proceeds consecutively toward more elaborated or "higher-ordered" lead structures, one after another. Quite often these steps are made independently and/or competitively in different institutions. These consecutive structural transformations could be called "lead evolution" (2). Of course, lead optimization can start from the "intermediary" lead structure at each step of the consecutive lead evolution process. How to carry out lead evolution, i.e., the lead evolution strategy, is also called analog design (3). Disclosure or identification of the ultimate prototype structure is the prerequisite for structural modification. Even so, lead evolution is perhaps the most important part from the synthetic chemical point of view, because it is how patentable pharmaceuticals and agrochemicals having newer-generation skeletal structures are obtained. In a structural transformation or lead evolution series, a majority of individual steps may originally have been attempted on a trial-and-error basis.

*The corresponding author and the business addresses of the authors are listed at the end of this article.
However, the structural transformation patterns included in these steps have eventually been "utilized" in improving, or at least retaining, the bioactivity profile. They may therefore well be regarded as invaluable precedents for analog design or "bioanalogous" molecular transformation (4). These precedents could be integrated and organized as a database of bioanalogous transformation "rules", and the database incorporated into a system. Any prototype or "lower-ordered" lead structure introduced into such a system would be processed with the rules to release elaborated or "higher-ordered" candidate structures as output "automatically". Such a system could be of great benefit to synthetic medicinal and agricultural chemists. We have been working on a project to construct a computerized system for lead evolution or analog design, named EMIL: Example-Mediated-Innovation-for-Lead-Evolution (5, 6). In this article, we first show some lead evolution examples. We then demonstrate that certain (sub)structural transformation patterns are interchangeable among various series of bioactive compounds despite differences in pharmacological category. Finally, we illustrate how to collect the database and how to operate the EMIL system for analog design.
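The system described above can be pictured as rules applied repeatedly to an input structure. The following toy sketch is not EMIL's engine. It represents a structure as a tuple of named components rather than a connection table. Its rule set is hypothetical, loosely modeled on the lactam, acyclic amide, urea, and cyanoamidine/cyanoguanidine/triazolediamine pathway described for cromakalim in Section 2.1. A breadth-first search applies one-component replacements and records the shortest chain of rules behind each candidate. This is one plausible way to keep track of the precedent that justifies each output.

```python
from collections import deque
from typing import Dict, List, Tuple

Structure = Tuple[str, ...]   # toy model: a structure as an ordered tuple of component names
Rules = Dict[str, List[str]]  # component -> hypothetical bioanalogous replacements

# Hypothetical rule set, loosely modeled on the cromakalim pathway (not EMIL's database).
RULES: Rules = {
    "lactam": ["acyclic amide"],
    "acyclic amide": ["urea"],
    "urea": ["cyanoamidine", "cyanoguanidine", "triazolediamine"],
}


def transform(start: Structure, rules: Rules,
              max_steps: int = 2) -> Dict[Structure, List[str]]:
    """Breadth-first application of one-component replacement rules.

    Returns every candidate reachable in at most `max_steps` rule applications,
    mapped to the shortest list of 'old -> new' steps that produces it.
    """
    seen: Dict[Structure, List[str]] = {start: []}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        history = seen[current]
        if len(history) >= max_steps:
            continue
        for i, component in enumerate(current):
            for replacement in rules.get(component, []):
                candidate = current[:i] + (replacement,) + current[i + 1:]
                if candidate not in seen:  # BFS: first visit is a shortest derivation
                    seen[candidate] = history + [f"{component} -> {replacement}"]
                    queue.append(candidate)
    del seen[start]  # report only transformed candidates, not the input itself
    return seen


if __name__ == "__main__":
    for cand, steps in transform(("dihydrobenzopyran", "lactam"), RULES, 3).items():
        print(cand, steps)
```

Each candidate carries its rule chain, so every output can be traced back to the example-derived rules that produced it. A real system would work on full chemical structures and would need substructure matching, which this toy model deliberately omits.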
2. LEAD EVOLUTION EXAMPLES

From among a number of examples, we selected two examples each of pharmaceuticals and agrochemicals of current interest. In each example, the lead evolution processes were examined according to a "tree", in which structures are arranged somewhat concisely. The arrangement is not necessarily chronological but runs from the most primitive (though not always the simplest) structure toward the more elaborated (though not always the more complex) ones. Suppose that bioactive compounds before and after a certain structural transformation in a lead evolution process elicit analogous biological responses. The transformation could then be called bioisosteric, and the two compounds, or the two interchangeable substructures, could be called bioisosteres in a broader sense. Here, we adopt the terms "bioanalogous" and "bioanalog" instead of "bioisosteric" and "bioisostere", respectively, as proposed by Floersheim and coworkers (4). The term "bioanalogy" can be used more flexibly than "bioisosterism" because it is not restricted by the basic definition of isosterism, which includes isometricity in terms of various physicochemical parameters (7 - 9).

2.1 Cromakalim and Related Potassium Channel Activators.

Figure 1 is a simplified lead evolution tree of cromakalim analogs, which are potassium channel activators exhibiting smooth muscle relaxation effects such as antihypertensive and anti-bronchial asthmatic activities (10 - 12). The very prototype was synthesized at Beecham (now SmithKline Beecham) in the early 1980's. The idea was that cyclizing the side chain of β-adrenoceptor antagonists (β-blockers) such as alprenolol (1), to restrict its conformational freedom, might give compounds that retain the antihypertensive activity but lack the side effects associated with β-blockers (10). The ring-closed compound of structure 2 was indeed found to show antihypertensive activity without β-blocking effects.
The geminal dimethyl group at the 2-position and the nitro group at the 6-position of compound 2 were originally introduced to facilitate the cyclization that forms the dihydrobenzopyran skeleton, but they proved to be necessary for the activity (10). During the structural modification trials, the pyrrolidine compound 3 was shown to be highly active in vivo but only moderately active in vitro. Thus, cromakalim (4), with a lactam ring, was designed and synthesized as a possible metabolite of the pyrrolidine compound 3 and proved to be highly active (10). In the course of the lead evolution processes starting from cromakalim (4), the lactam structure was successively transformed via the acyclic amide (in 5) and urea (in 6) structures into the cyanoamidine (in 7), cyanoguanidine (in 8), and triazolediamine (in 9) structures. These transformation patterns are shared by quite a few series of compounds in different pharmacological categories, as will be shown later in Section 3.2.2.
Fig. 1. Simplified Structural Evolution Tree of Cromakalim Analogs (compounds 1 - 18; 1: alprenolol; 4: cromakalim (lemakalim); 7: KP 293; 10: NIP 121; 11: emakalim; 12: bimakalim; 13: Ro 31-6930; 14: TCV 295; 15: YM 099; 16: EMD 57283; 17: SR 44994; 18: KC 399).
One of the other pathways is an elaboration of the lactam moiety, leading to compounds 10, 11, 12, and 17 and to the pyridine N-oxides 13, 14, and 15. A recently reported acyclic thioamide, KC 399 (18) from Chugai (12e), is one of the compounds designed and synthesized (13) by combining structural features of two compounds. The first is bimakalim (12), in which the dihydropyranol structure of the preceding compounds is dehydrated into the benzopyran (11). The second is aprikalim (19), which belongs to an independent series of potassium channel activators (12a) and in which a thioamide structure is attached at the α-position to the aromatic system. Compound 18 was reported to be some 1000-fold more potent than cromakalim in relaxing precontracted rat aorta (12e).
2.2 Non-peptide Angiotensin II Receptor Antagonists.

This series of compounds has recently attracted enormous attention in the development of antihypertensive agents that are orally active with a prolonged duration (14). In the course of the structural transformations leading to increasingly potent antagonists, it has been shown that there are at least two subtypes of the receptor, AT1 and AT2 (15). The structures arranged in Fig. 2, which shows a summarized evolution tree, are mostly those of AT1 antagonists (16 - 25). The ultimate lead compound in this series is CV 2198 (20). It was synthesized by scientists at Takeda in the late 1970's in a series of projects for the derivatization and screening of 1-benzylimidazole-5-acetic acid analogs (16). This compound 20 and its close analogs were among the first non-peptide angiotensin II receptor antagonists, so a number of research groups around the world started projects to transform the structure of compound 20 as the lead (14). Among these intensive efforts, the great breakthrough is likely to be the disclosure of DUP 753 (23: losartan) at DuPont (now DuPont Merck), published in the late 1980's (17). Numerous analogs developed after losartan contain one of two indispensable moieties. Some share the 2'-tetrazolyl-biphenyl-4-yl-methyl structure (in 24 - 26, 30, 31, 36, and 37). Others have closely related biarylylmethyl structures carrying an acidic group bioanalogous to the tetrazolyl at the position corresponding to that in the biphenylyl structure (in 28, 29, 32 - 35, and 38). The imidazole moiety originally included in CV 2198 (20) has been variously transformed into spiro (in 30), oxy-aryl (in 26), and condensed bicyclic (in 31 - 38) systems, as well as into ring-fissioned structures (in 24 and 25). Candesartan cilexetil (31) is a prodrug: its ester moiety is metabolized in vivo into the free carboxylic acid, candesartan, which is the active form (21a).
One of the most recently reported compounds, L 162313 (35), has been revealed to be a partial antagonist that also acts as an agonist at the AT1 receptor (22). This compound is the first non-peptide agonist of peptide receptors outside the opiate system. Another, L 162393 (38), is one of the balanced angiotensin II antagonists capable of potent binding to both the AT1 and AT2 receptor subtypes (23). Its in vitro AT1 binding potency, at a subnanomolar level, is about 100 times higher than that of losartan. The structure of compound 26 is unique, as is that of eprosartan (27). In compound 26, the acidic biarylylmethyl group is attached to the heteroaromatic ring via oxygen. Eprosartan (27) has an acrylic acid side chain and a carboxyphenyl group instead of the acidic biarylyl. In arriving at these and related structures, three-dimensional superimposition patterns of the small-molecule antagonist candidates on a putative pharmacophore model of angiotensin II were examined iteratively (24, 25). The angiotensin II model was constructed from structure-activity studies of peptide analogs containing conformationally constrained replacements of key amino acid residues and from conformational analyses of active analogs.

The structural modification of this series of compounds is a typical example of lead evolution associated with lead optimization from the intermediary lead structures. The substituents at various positions in each structure shown in Fig. 2 are mostly those optimized by more or less systematic modification of the substituent structure. The optimization criteria were in vitro binding as well as oral activity and its duration. An activity potentiation of the order of 10- to 50-fold in the optimization phase is not unusual if the substituents have been selected appropriately.

Fig. 2. Simplified Structural Transformation Tree of Non-peptide Angiotensin II Receptor Antagonists (Tet: tetrazol-5-yl; 20: CV 2198; 21: EXP 6155; 22: EXP 6803; 23: losartan; 24: valsartan; 25: A 81988; 26: ICI D8731; 27: eprosartan; 28: saprisartan; 29: SC 52458; 30: irbesartan; 31: candesartan cilexetil; 32: TAK 536; 33: telmisartan; 34: MK 996; 35: L 162313; 36: tasosartan; 37: CL 329167; 38: L 162393).
2.3 Fungicidal β-Methoxyacrylates and Analogs.

α-Substituted-aryl-β-methoxyacrylates and their analogs, such as α-methoxyiminophenyl-acetates and -acetamides, are now being developed as agricultural fungicides with systemic as well as broad-spectrum activity. Figure 3 shows a simplified lead evolution scheme for this series of compounds (26, 27). The original lead compound, strobilurin A (39), is a fungicidal principle contained in small agarics belonging to species of Strobilurus and Oudemansiella, which grow on decaying wood. There are a number of analogs differing in substitution patterns on the conjugated polyene moiety and the benzene ring (28). The toxophoric structure of the compounds in Fig. 3 is likely to be the "β-methoxyacryloyl" or "methoxyiminoacetyl" moiety, but the corresponding free acids are known to exhibit only very low activity. The fungicidal activity is due to inhibition of the fungal respiratory chain (29). The target site is believed to be the cytochrome bc1 complex located in the inner membrane of fungal mitochondria.
Fig. 3. Structural Transformation Tree of β-Methoxyacrylates and Analogs (compounds 39 - 46; 39: strobilurin A; 43: SSF 126; 44: BAS 490F; 45: ICIA 5504).

The structural transformations from strobilurin A (39) to ICIA 5504 (45) had three aims: to increase the photostability, to decrease the phytotoxicity, and to increase systemic movement into plants suffering from fungal diseases by adjusting the molecular hydrophobicity (26). SSF 126 (43) was designed independently, from ring-fission trials of fungicidal carbamoyl isoxazoles (30). Even so, it is reasonable to place this compound after the ICIA compound 41 in the lead evolution tree. Currently (August 1994), besides ICIA 5504 (45) by Zeneca and SSF 126 (43) by Shionogi, BAS 490F (44) is undergoing extensive trials for commercialization by BASF (26).

2.4 Arylsulfonylureas and Related Herbicides.

The ultimate lead compound of this series, INU 3373 (47), was serendipitously found by Levitt and his coworkers at DuPont in the mid-1970's to show a modest plant-growth retardant activity (31). The sulfonylureas shown in Fig. 4 were discovered through extensive efforts by DuPont scientists (32). They include chlorsulfuron (48: a wheat/barley herbicide), metsulfuron methyl (49: a wheat/barley/rice herbicide), and thifensulfuron methyl (52: a wheat/barley herbicide). These and a number of analogous DuPont sulfonylureas are characterized by unprecedentedly low dose rates for eradicating various species of weeds: generally 5 to 50 g a.i./ha, with the lowest at 2 g a.i./ha (32). Depending upon structural
Structures shown in Fig. 4 include 47, chlorsulfuron (48), pyrazosulfuron ethyl (53), NC 330 (54), and imazosulfuron (55).
Fig. 7. Benzocycloalka(di)ene-1-carboxylic Acids as Antiinflammatory Agents (98 - 102) and Plant Growth Regulators (103 - 107). The symbols >>, ---, and > compare the potency of the two compounds on either side within each series.

We previously studied structure-activity relationships of the same type of cyclized arylalkanoic acids (103 - 107) as plant growth regulators (54); their structures are also shown in Fig. 7. 1,4-Dihydro-1-naphthoic acid (104) was the most potent among them. Among the antiinflammatory agents, the indane-1-carboxylic acid derivative (98) was the most potent, and compound 108, named clidanac, was selected as a clinical drug (52a, 55). Of course, the structure-potency patterns need not coincide completely between the two series of compounds. Among the partially hydrogenated 1-naphthoic acids, however, the coincidence in potency variations is remarkable. This suggests a similarity between the receptor sites of the two pharmacologically different series of compounds, at least in their substructural features.
3.2.2 Urea, Thiourea, Cyanoguanidine, Nitroethenediamine, and Related Structural Components in Various Bioactive Compound Series.
The bioanalogous relationship among the "polar hydrogen-bonding groups" of the title has been well known since most of them, together with other related groups, were shown to be "interchangeable" with each other in various series of histamine H2-antagonists (56). Their general structure, as indicated in Table 3, consists of an aromatic ring (R), a flexible chain (C), and a polar hydrogen-bonding group (H). Along with the thiourea, cyanoguanidine, and nitroethenediamine structures, some other polar hydrogen-bonding groups are arranged in Table 3 as representatives of the respective H2-antagonist series in which the aromatic ring (R) and flexible chain (C) are fixed (56, 57). Many of these polar hydrogen-bonding groups occur simultaneously in various R-C series. Although not every combination of the R-C and H moieties gives potent compounds, the H structures for the polar hydrogen-bonding group in Table 3 are regarded as being potentially interchangeable. Interestingly, a very similar bioanalogous set of structural components is found in Fig. 1 for the cromakalim series of potassium channel openers. In the consecutive steps from the ring-fissioned acetamino-compound (5) to the methyltriazolediamine
TABLE 3. Representative H2-Receptor Histamine Antagonists.
General structure: aromatic ring "R", flexible chain "C", polar H-bonding group "H".
[Structures 109–124. Named compounds: 110: cimetidine; 115: ranitidine; 117: tiotidine; 118: famotidine; 120: roxatidine; 121: lamtidine; 124: nizatidine. Polar H-bonding groups "H" shown include –NHC(=S)NHCH3, –NHC(=NCN)NHCH3, –NHC(=CHNO2)NHCH3, –NHC(=NNO2)NHCH3, –NHC(=O)NHCH3, –NHC(=O)CH2OC(=O)CH3, and –C(=NSO2NH2)NH2.]
(9), the structural components that are replaced one after another are those included in Table 3 as the polar hydrogen-bonding groups. A similar bioanalogous set, compounds 125–127, which exhibit various degrees of smooth muscle relaxant activity, has been explored in the synthetic project of compound 18 (12e, 13, 58).
[Structures 125–127]
Examples are also found in other series of potassium channel openers: pinacidil (128) and its analogs (129–132) (59), and nicorandil (133) and its analogs (134 and 135) (60).
[Structures 128–135: pinacidil (128) and analogs 129–132; nicorandil (133) and analogs 134 and 135]
Further examples exist in imidacloprid and related compounds (136–139), which are potent insecticides acting as agonists of the nicotinic acetylcholine receptor in the insect nervous system (61), and in artificial sweeteners of the cyanosuosan (140–142) and superaspartame (143–145) series (62).
[Structures 136: imidacloprid; 137; 138: nitenpyram; 139: acetamiprid; 140–142 (X = O, S, NCN); 143–145 (X = O, S, NCN)]
It should be noted that in compounds 5, 7, and 18 in Fig. 1, 118 and 120 in Table 3, 125–127, 133–135, and 139, the structural units that are interchangeable with (thio)urea, N-cyanoguanidine, nitroethenediamine, and related structures have either (thio)amide or N-substituted amidine structures, which lack one of the two N atoms of the (thio)urea-related structures. The bioanalogous relationship between amide and N-cyanoamidine structures was probably first disclosed in penicillins such as 146 and 147, which show antibacterial activity at comparable levels (63). The possibility that the cyanoamidine compound 147 is active only after hydrolysis to the amide was excluded: the cyanoamidine is chemically stable and resistant to enzymatic hydrolysis.
[Structures 146: penicillin G; 147: its N-cyanoamidine analog]
3.2.3 From "Amides" to Cyclic Dicarboximides and Related Structural Transformation Patterns in Agrochemicals, Anticancer Agents, and Anticonvulsants.
Compounds having the N-phenylamide moiety, such as anilides (148), N-phenylcarbamates (149), and N-phenylureas (150), are herbicidally active, exhibiting various degrees of inhibitory potency against the Hill reaction (a component of the photosynthetic system) (64). The most common substitution pattern on the benzene ring in these compound series, 148–150, is X = 3,4-Cl2. Propanil (148: X = 3,4-Cl2, R = Et), swep (149: X = 3,4-Cl2, R = Me), and diuron (150: X = 3,4-Cl2, R = R′ = Me) are representative. They are regarded as being bioanalogous to each other.
[Structures 148–150]
There is a family of agricultural fungicides whose common structural feature is that they are N-phenyl cyclic dicarboximides, such as procymidone (151: R1 = R4 = Me, R2–R3 = –CH2–), vinclozolin (152: R1 = Me, R2 = CH=CH2), and iprodione (153: R1 = CONHCHMe2, R2 = R3 = H), all sharing the 3,5-dichloro substitution on the benzene ring (65). They are particularly effective against Sclerotinia and Botrytis diseases in vineyards and greenhouses.
[Structures 151–153]
The cyclic imide moieties of the above fungicidal compounds, the pyrrolidinedione (in 151), oxazolidinedione (in 152), and imidazolidinedione (in 153), can be regarded as being generated by cyclization of the side chains of the Hill reaction-inhibiting anilides (148), carbamates (149), and ureas (150), respectively, with the insertion of another carbonyl component. Structures 151–153 are bioanalogous. Regardless of the type of atom next to the carbonyl function, the open-chain "amides" (148–150) are Hill reaction-inhibiting herbicides, and the ring-closed dicarboximides (151–153) are fungicides. N-Phenylcarbamates 154 and 155, which have structural features in common with the herbicides (149), are also fungicidal against gray mold diseases of vines, vegetables, and beans caused by Botrytis strains resistant to benzimidazole fungicides (66). Thus, in spite of some differences in the biological target and the optimum substitution pattern on the benzene ring, the open-chain "amides" and cyclic "dicarboximides" can be regarded as bioanalogous. Examples supporting this view are given below.
[Structures 154 and 155]
Among the anilides (148), chloranocryl (X = 3,4-Cl2, R = –C(Me)=CH2) and pentanochlor (X = 3-Cl, 4-Me, R = CH(Me)C3H7) have been used practically to control annual grass and broad-leaved weeds in various crop fields (67). They have 3,4-disubstitution patterns as X and branched alk(en)yl groups as R. Interestingly, a member of compound series 148 similar to the above herbicides, but having X = 3-CF3, 4-NO2 and R = CH(Me)2, named flutamide (Schering), is an antiandrogen (68) and has been used as an anti-prostate-cancer agent for some 15 years. Flutamide, having the 3,4-disubstitution pattern on the benzene ring and a branched alkyl group as R, could reasonably be expected to show some Hill reaction inhibitory activity. Although no description of its herbicidal activity has been found, some higher homologs of flutamide in the acyl moiety have been observed to show potent antibacterial activity (69). More interestingly still, compound 156, named nilutamide, from Roussel-UCLAF is also a potent and selective antiandrogen used as an anti-prostate-cancer agent (70). Thus, a bioanalogous relationship between anilides and N-phenyl cyclic dicarboximides, very similar to that described above for agrochemicals, is observed in an entirely different pharmacological category.
[Structures 156: nilutamide; 157: hydroxylated metabolite of flutamide; 158: bicalutamide]
The dicarboximide heterocycle of nilutamide (156) belongs to the imidazolidinediones (as in 153). Nilutamide (156) differs structurally from the fungicidal compound series 153 in the substitution patterns on the benzene and imidazolidinedione rings. Flutamide acts in vivo as its hydroxylated metabolite 157 (71). The hydroxy group in the metabolite 157 corresponds well with the NH group in nilutamide (156). Thus, nilutamide can also be regarded as a ring-closed bioanalog of the metabolite 157. Incidentally, bicalutamide (158), modified further from "hydroxyflutamide", is now being extensively investigated for clinical use by Zeneca (71).
[Structure 159: phenobarbital]
contribute to the toxicity regardless of their number in a molecule. The three kinds of variables are listed in Table 11. The FALS calculation derived a discriminant function with fairly good discriminant and predictive ability. As shown in Table 12, 40 variables are included in the function. From the sign of the discriminant coefficient for each variable, it is inferred that the numbers of N, O, S, and Cl atoms, benzene and naphthalene rings, hydrophobicity, etc. contribute to enhancing the toxicity, whereas the numbers of sp3 carbon atoms, carboxylic acids and esters, etc. probably contribute to lowering it.
TABLE 13
Results of recognition and prediction using 40 descriptors (N = 324)

Recognition: Nmis = 41(0); correct recognition = 87.3%; Rs = 0.859
Leave-one-out prediction: Nmis = 64(2); correct prediction = 80.2%; Rs = 0.802
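The percentages in Table 13 follow directly from N and Nmis, so they can be rechecked by simple arithmetic. The sketch below does that; in the leave-one-out run each compound is assumed to be classified by a function derived without it, which is the usual meaning of the term.

```python
def percent_correct(n_total: int, n_misclassified: int) -> float:
    """Percentage of compounds assigned to their observed class."""
    if n_total <= 0 or not 0 <= n_misclassified <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_misclassified <= n_total")
    return 100.0 * (n_total - n_misclassified) / n_total


N = 324
print(f"recognition: {percent_correct(N, 41):.1f}%")               # Nmis = 41
print(f"leave-one-out prediction: {percent_correct(N, 64):.1f}%")  # Nmis = 64
```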
$$\lambda_i = \frac{1}{m}\sum_{j=1}^{m}\left|\mathbf{u}_{ij}-\langle\mathbf{u}\rangle\right|^{2} \qquad (4)$$
In eq. 4, vector uij is the coordinate of the j-th atom of these chains on the moment of inertia, vector ⟨u⟩ is the coordinate of the gravity center of these chains (cf. Fig. 12), and m is the total number of O and C atoms in the chain. The distributions of alkoxyl chain lengths are shown in Fig. 13. The effective lengths (RL) of the alkyl and alkoxyl chains were then expressed relative to that of the ω-chain of LTE4, as defined by eq. 5.
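A minimal sketch of eqs. 4 and 5, assuming eq. 4 is the mean squared distance of the m chain atoms from their gravity center ⟨u⟩ and eq. 5 is the ratio of a chain's value to λLT of the LTE4 ω-chain. Both forms are assumptions reconstructed from the definitions in the text, the sketch uses full Cartesian coordinates, and the coordinates in the usage line are hypothetical.

```python
import math


def covariance_value(chain_atoms: list) -> float:
    """Assumed eq. 4: mean squared distance of the chain's O and C atoms
    (given as (x, y, z) tuples) from their gravity center <u>."""
    m = len(chain_atoms)
    if m == 0:
        raise ValueError("the chain must contain at least one atom")
    center = tuple(sum(atom[k] for atom in chain_atoms) / m for k in range(3))
    return sum(math.dist(atom, center) ** 2 for atom in chain_atoms) / m


def relative_length(lam: float, lam_lt: float) -> float:
    """Assumed eq. 5: RL = lambda_i / lambda_LT (RL = 1 means the same effective length)."""
    return lam / lam_lt


# Hypothetical zig-zag chain coordinates (angstroms), for illustration only.
chain = [(0.0, 0.0, 0.0), (1.25, 0.85, 0.0), (2.50, 0.0, 0.0), (3.75, 0.85, 0.0)]
print(covariance_value(chain))
```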
Fig. 12. Moments of inertia and gravity center ⟨u⟩ of alkoxyl chains.
$$R_L = \frac{\lambda_i}{\lambda_{LT}} \qquad (5)$$
In eq. 5, λLT is the covariance value of the ω-chain of LTE4 in its most stable conformation. We took the length of the ω-chain in its most stable conformation as the reference because this chain can be assumed to take the most extended conformation when it binds to its receptor (38). A value of RL = 1 indicates an effective length identical to that of the ω-chain at certain
Fig. 13. Distribution probability (Pi) of the i-th conformation of the alkoxyl chain (CnH2n+1–O) as a function of the covariance value (λi). Numbers beside the traces represent chain lengths n.
Fig. 15. Change of DL with length of the alkoxyl chain of BPOs. n: C number of the chain.
lengths with the ω-chain length: the highest similarity corresponds to Fs = 1.0 and DL = ∞, and 50% similarity to Fs = 0.5 and DL = 0. Values of DL are summarized in Table 1. The DL value of the alkyl group Cn was about the same as that of the alkoxyl group OCn−1, indicating that the feasibilities of these corresponding chains having an effective length similar to the ω-chain are very close. The highest DL values were observed with chains of C7–C8 and OC6–OC7. Fig. 15 shows the change in DL with the alkoxyl chain length n of BPOs. The DL value was maximal between n = 6 and 7, where maximal activity was observed. It is noteworthy that the curve was not symmetric: DL changed very steeply with n below n = 6 and relatively little above n = 7, indicating that the flexibility of an alkoxyl chain (and of an alkyl chain) becomes greater with increasing chain length.
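The defining equation for DL is not legible in this copy. One form that reproduces both anchor points quoted above (DL = ∞ at Fs = 1.0 and DL = 0 at Fs = 0.5) is the logit of Fs. The sketch below assumes DL = log10[Fs/(1 − Fs)]; both the logit form and the base 10 are assumptions, not the authors' stated definition.

```python
import math


def dl_from_fs(fs: float) -> float:
    """Assumed form: DL = log10(Fs / (1 - Fs)), the logit of Fs.

    Gives DL = 0 at Fs = 0.5 and DL -> +infinity as Fs -> 1, matching the
    anchor points quoted in the text; the base-10 logarithm is an assumption.
    """
    if not 0.0 <= fs <= 1.0:
        raise ValueError("Fs is a summed probability and must lie in [0, 1]")
    if fs == 1.0:
        return math.inf
    if fs == 0.0:
        return -math.inf
    return math.log10(fs / (1.0 - fs))


for fs in (0.25, 0.5, 0.9):
    print(fs, round(dl_from_fs(fs), 3))
```

Because the logit stretches the scale near Fs = 1, small gains in the fraction of favorable conformations translate into large changes in DL for the best-matching chains.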
EFFECTS OF EFFECTIVE LENGTHS OF ALKYL AND ALKOXYL CHAINS OF BENZAMIDES ON ANTI-LT ACTIVITIES
It was of interest to know whether antagonists with chains of effective lengths similar to that of the ω-chain exhibit potent anti-LT activities. Thus, we analyzed the antagonist activities of these benzamides in terms of DL and obtained the significant correlation shown by eq. 8.
$$\mathrm{p}I_{50} = 7.088 + 0.478\,D_L + 0.735\,I_O + 1.096\,I_{BP} \qquad (8)$$
(±0.240) (±0.215) (±0.261) (±…)
(n = 18, r = 0.957, s = 0.259)
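Equation 8 can be evaluated directly for a given chain. The sketch assumes IO and IBP are 0/1 indicator variables, which is the usual meaning of the I prefix in such equations; their exact definitions are not given in this excerpt, and the inputs in the usage line are hypothetical.

```python
import math


def pi50_eq8(dl: float, i_o: int, i_bp: int) -> float:
    """pI50 = 7.088 + 0.478*DL + 0.735*I_O + 1.096*I_BP (eq. 8)."""
    if not math.isfinite(dl):
        raise ValueError("DL must be finite (Fs strictly between 0 and 1)")
    if i_o not in (0, 1) or i_bp not in (0, 1):
        raise ValueError("indicator variables are assumed to be 0 or 1")
    return 7.088 + 0.478 * dl + 0.735 * i_o + 1.096 * i_bp


print(round(pi50_eq8(dl=1.0, i_o=1, i_bp=0), 3))
```

Since eq. 8 was fitted to 18 compounds, predictions are only meaningful within the range of DL values spanned by that set.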
Fig. 14. Conformation probability (Pi) of the alkyl chain (left, BPMs) and alkoxyl chain (right, BPOs) of BPs as a function of their effective lengths (RL) relative to that of the ω-chain of LTE4. Numbers beside the traces are C numbers.
conformations of alkyl and alkoxyl moieties. Some of the results with BPMs and BPOs are shown in Fig. 14. Binomial distributions of conformations were observed, the distributions being the same for BPMs and BPOs with the same chain lengths. To quantify the similarity of the effective lengths of alkyl and alkoxyl groups to that of the ω-chain of LTs, we estimated the feasibility of these chains taking lengths within a certain range similar to the length of the ω-chain of LTE4. The feasibility is expressed by summing the probabilities of occurrence (P) of conformations in the RL range of 0.8 to 1.2, shown by the shaded area in Fig. 14. The sum of the areas under the distribution curves in this range is referred to as Fs, as shown in eq. 6.
$$F_s = \sum_{i\,:\,0.8 \le R_{L,i} \le 1.2} P_i \qquad (6)$$
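Equation 6 amounts to summing the occurrence probabilities of the conformations whose RL falls inside the window. A minimal sketch, with the 0.8 to 1.2 window taken from the text and inclusive boundaries assumed; the conformer distribution in the usage line is hypothetical.

```python
def fs_value(conformations, low: float = 0.8, high: float = 1.2) -> float:
    """Eq. 6: sum of occurrence probabilities P_i of conformations with low <= RL_i <= high.

    conformations: iterable of (P_i, RL_i) pairs for one chain.
    """
    return sum(p for p, rl in conformations if low <= rl <= high)


# Hypothetical conformer distribution for one chain (probabilities sum to 1).
example = [(0.10, 0.70), (0.30, 0.90), (0.25, 1.10), (0.35, 1.30)]
print(round(fs_value(example), 2))
```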
molecule. For the sake of simplicity, these steric parameters were used as values relative to that of H:

$$\Delta MR(X) = MR(X) - MR(H), \qquad \Delta B_5(X) = B_5(X) - B_5(H).$$
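The relative steric parameters are simple differences from the hydrogen values. The sketch uses the commonly tabulated MR(H) = 1.03 and MR(CH3) = 5.65 and the STERIMOL widths B5(H) = 1.00 and B5(CH3) = 2.04, and applies the 0.1 scaling of MR noted in footnote b of Table 3; with these inputs it reproduces the Me entries of that table (ΔMR 0.46, ΔB5 1.04).

```python
def relative_to_h(value_x: float, value_h: float, scale: float = 1.0) -> float:
    """Delta parameter relative to hydrogen: scale * (P(X) - P(H))."""
    return scale * (value_x - value_h)


MR = {"H": 1.03, "CH3": 5.65}  # molar refractivity
B5 = {"H": 1.00, "CH3": 2.04}  # STERIMOL maximum width

delta_mr = relative_to_h(MR["CH3"], MR["H"], scale=0.1)  # MR scaled by 0.1
delta_b5 = relative_to_h(B5["CH3"], B5["H"])
print(f"dMR(Me) = {delta_mr:.2f}, dB5(Me) = {delta_b5:.2f}")
```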
Table 3. Ca-antagonistic activity and physicochemical parameters of R3-substituted compounds (II)
[General structure of compounds II with variable substituent R3]

Compd.  R3          π a)     ΔMR b)   ΔB5 c)  ΔB1 c)  pA2 Obsd. d)  Eq. 1 Calcd. (Δ) e)  Eq. 2 Calcd. (Δ) e)  Eq. 3 Calcd. (Δ) e)
II-5    H            0.00     0.00    0.00    0.00    5.56          6.28 (-0.72)         5.76 (-0.20)         5.83 (-0.27)
II-6    Me           0.54     0.46    1.04    0.52    6.76          6.91 (-0.15)         6.77 (-0.01)         6.61 (0.15)
II-7    Et           1.08     0.93    2.17    0.52    7.44          7.33 (0.11)          7.38 (0.06)          7.15 (0.29)
II-8    n-Pr         1.62     1.39    2.49    0.52    7.79          7.52 (0.27)          7.46 (0.33)          7.43 (0.36)
II-4    iso-Pr       1.49     1.40    2.17    0.90    8.05          7.49 (0.56)          7.38 (0.67)          7.43 (0.62)
II-9    n-Bu         2.16     1.86    3.54    0.52    7.21          7.50 (-0.29)         7.43 (-0.22)         7.47 (-0.26)
II-10   iso-Bu       2.03     1.86 f) 3.45    0.52    7.53          7.52 (0.01)          7.45 (0.08)          7.47 (0.06)
II-11   n-Hex        3.24     2.79 f) 4.96    0.52    7.46          6.79 (0.67)          6.68 (0.78)          6.79 (0.67)
II-12   n-Oct        4.32 f)  3.72 f) 6.39    0.52    5.06          5.21 (-0.15)         5.10 (-0.04)         5.11 (-0.05)
II-13 g) n-dodecyl   6.48 f)  5.58 f) 9.27    0.52    5.33          -0.80                -0.49                -1.28
II-14   benzyl       2.22     2.90    5.02    0.52    6.48          7.48 (-1.00)         6.63 (-0.15)         6.64 (-0.16)
II-15   (CH2)2OMe   -0.32 f)  1.57 f) 3.49    0.52    6.80          6.22 (0.58)          7.44 (-0.64)         7.48 (-0.68)
II-16   (CH2)2OEt    0.50 f)  2.03 f) 3.81    0.52    6.68          6.56 (0.12)          7.35 (-0.67)         7.42 (-0.74)

a) From ref. 11 unless otherwise noted. b) Scaled by 0.1 and from ref. 12 unless otherwise noted. c) Calculated from the values cited from a brochure given by Dr. A. Verloop. d) pA2 values in the KCl-depolarized guinea-pig taenia coli. e) Δ: the difference between observed and calculated values. f) Estimated from those of closely related substituents; see ref. 10 for details. g) Omitted from the correlation.
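Footnote e defines Δ as the observed minus the calculated pA2. The check below recomputes the Eq. 1 residuals from the Obsd. and Eq. 1 columns for the compounds used in the correlation (II-13 excluded, footnote g), confirms that they match the tabulated Δ values, and reports their root-mean-square value, a summary statistic that is not itself given in the table.

```python
import math

# (pA2 obsd, Eq. 1 calcd, tabulated delta) from Table 3; II-13 omitted (footnote g).
EQ1 = {
    "II-5": (5.56, 6.28, -0.72), "II-6": (6.76, 6.91, -0.15),
    "II-7": (7.44, 7.33, 0.11), "II-8": (7.79, 7.52, 0.27),
    "II-4": (8.05, 7.49, 0.56), "II-9": (7.21, 7.50, -0.29),
    "II-10": (7.53, 7.52, 0.01), "II-11": (7.46, 6.79, 0.67),
    "II-12": (5.06, 5.21, -0.15), "II-14": (6.48, 7.48, -1.00),
    "II-15": (6.80, 6.22, 0.58), "II-16": (6.68, 6.56, 0.12),
}


def residuals(table: dict) -> dict:
    """Delta = observed - calculated, rounded to the table's two decimals."""
    return {cpd: round(obsd - calcd, 2) for cpd, (obsd, calcd, _) in table.items()}


deltas = residuals(EQ1)
mismatches = [cpd for cpd, (_, _, tab) in EQ1.items() if deltas[cpd] != tab]
rms = math.sqrt(sum(d * d for d in deltas.values()) / len(deltas))
print("mismatches:", mismatches, f"RMS = {rms:.2f}")
```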
In Eqs. 1–3, compound II-13 was omitted from the calculation because of its especially large deviation. The reason was not clear, but the deviation might have arisen from an extra interaction with another receptor binding site due to the length of the R3 chain. The quality of the correlations of Eqs. 1–3 was by no means satisfactory. As shown in Table 3, the deviation of the standard alkoxyalkyl derivatives was particularly pronounced in Eqs. 2 and 3. We originally thought that
$$\mathrm{p}A_2 + 0.30\,\Delta L_{para} - 0.73\,\Delta L_{para}^{2} + 0.77\,\Delta MR_{meta} - 1.12\,\Delta MR_{meta}^{2}$$