Reviews in Computational Chemistry Volume III
Edited by
Kenny B. Lipkowitz and Donald B. Boyd
WILEY-VCH
New York, Chichester, Weinheim, Brisbane, Singapore, Toronto
Kenny B. Lipkowitz
Department of Chemistry
Indiana University-Purdue University at Indianapolis
1125 East 38th Street
Indianapolis, Indiana 46205, USA
[email protected]

Donald B. Boyd
Lilly Research Laboratories
Eli Lilly and Company
Lilly Corporate Center
Indianapolis, Indiana 46285, USA
boyd_donald_b@lilly.com
A NOTE TO THE READER
This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is reasonable demand for them. The content of this book is identical to previous printings.

Copyright © 1992 by John Wiley & Sons, Inc. All rights reserved. Originally published as ISBN 1-56081-619-8.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 and 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-mail:
[email protected].
For ordering and customer service, call 1-800-CALL-WILEY.

Library of Congress Cataloging-in-Publication Data
Reviews in computational chemistry / edited by Kenny B. Lipkowitz and Donald B. Boyd
p. cm.
Includes bibliographical references and index.
ISBN 0-471-18853-0
1. Chemistry--Data processing. 2. Chemistry--Mathematics. I. Lipkowitz, Kenny B. II. Boyd, Donald B.
QD39.3.E46KS 1993 92-30192 541'.2'2--dc20

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
Preface

Computational chemistry, as a discipline, is still growing steadily. Although it has historically been aligned with physical chemistry, computational chemistry has taken on a vibrant life of its own and spread into other domains of chemical research and education. In some disciplines such as organic chemistry, molecular modeling has almost become a part of the regular routine of research. Other disciplines, such as polymer chemistry, have only recently begun to integrate computational approaches to solving problems. At the 1992 Spring National American Chemical Society meeting, 20% of the 636 presentations in the polymer division were classified as modeling and computer simulation. At that same meeting, fully 70% of the exposition workshops involved computational chemistry. One may argue that this is because computers are easy to carry to conventions, but we maintain that there is great interest in computational chemistry because it is a "hot" field.

This ACS meeting is not exceptional. The number of meetings touching on aspects of computational chemistry is truly amazing. For instance, during a single two-month period in 1991, more than 30 conferences, symposia, and workshops were scheduled.* If these meetings had been back-to-back, a quick traveler could have spent 60 solid days learning about progress in computational chemistry! Although the intensity reached a crescendo in this two-month period, meetings on computational chemistry were held throughout 1991, and 1991 was not atypical of recent years. The field is creating a lot of ripples.

Computational chemistry is "hot" not only because of advances in computer hardware and algorithms, but also because there have been documented (and undocumented) successes, particularly in the pharmaceutical industry. Scientists in adjacent fields view these successes with great expectation. We are living in exciting times!

There are so many developments in the field of computational chemistry that it is difficult to keep track of them. For that reason we established this review series. As in previous volumes, we attempt to treat computational chemistry as broadly and evenhandedly as possible. It should be obvious that not all facets of computational chemistry can be covered in each and every volume. Eventually, however, the existing and future volumes of Reviews in Computational Chemistry, when taken in toto, should constitute a useful guide to the field.

We asked the authors of the chapters to begin from ground zero and provide for you a minitutorial on how to implement various computational methods to solve problems. Rather than create a traditional review article, that is, a compilation of data and references that sits in a library, we want you to be able to use this series to learn how to solve problems using computational methods and to be able to locate key references quickly. These chapters are not meant to be surrogate textbooks nor can they replace the original published papers; however, they are meant to be of interest to both experts and novices.

*The following were among the meetings held in May and June 1991: 1st Canadian Symposium on Computational Chemistry (Orford, Quebec); 2nd International Symposium on Chiral Discrimination (Rome, Italy); 3rd International Symposium on Molecular Aspects of Chemotherapy (Gdansk, Poland); 5th Molecular Modeling Workshop (Darmstadt, Germany); 11th International Meeting of the Molecular Graphics Society (Chapel Hill, North Carolina); 12th American Peptide Symposium (Cambridge, Massachusetts); 24th Midwest Theoretical Chemistry Conference (DeKalb, Illinois); 32nd American Chemical Society (ACS) National Organic Chemistry Symposium (Minneapolis, Minnesota); 46th Ohio State University International Symposium on Molecular Spectroscopy (Columbus, Ohio); Ab Initio Methods in Quantum Chemistry Symposium Honoring Klaus Ruedenberg (Ames, Iowa); ACS Workshop on Molecular Modeling: Methods and Techniques (Athens, Georgia); Computer Prediction of Polymer Properties and Structure (Atlanta, Georgia); Computers in Chemistry (Rochester, New York); Conference on Aspects of Drug Design (Urbana-Champaign, Illinois); Design of Antiviral Agents (Buffalo, New York); Florida School on Applied Molecular Orbital Theory Workshop (Gainesville, Florida); IBM/Polygen Polymer Modeling and Simulation Workshop (Lowell, Massachusetts); Innovative Applications of Computational Chemistry: RISC-Based Parallel Supercomputing (Skokie, Illinois); International Symposium on Computer Simulation of Biomolecular Systems and Mechanisms (Menton, France); Journees de Groupe de Graphism Moleculaire (La Croisic, France); MATH/CHEM/COMP 1991 (Dubrovnik, Croatia); Large Scale Computation for Quantum Physics and Chemistry (Namur, Belgium); Massively Parallel Computing: Applications and Techniques in Computational Chemistry (Somerset, New Jersey); Methods of Molecular Mechanics and Dynamics of Biopolymers Workshop (Pittsburgh, Pennsylvania); Recent Developments in Electronic Structure Algorithms (Ithaca, New York); Residential School on Medicinal Chemistry (Madison, New Jersey); St. Louis Regional Gathering on Computer-Aided Design and Computational Chemistry (St. Louis, Missouri); Static, Kinematic, and Dynamic Aspects of Crystal and Molecular Structure (Erice, Italy); Symposium on Computational Aspects of Inorganic Chemistry in Biological Systems at the Mid-Atlantic Regional ACS Meeting (Newark, Delaware); Symposium on Molecular Design and Modeling at the Joint Central-Great Lakes Regional ACS Meeting (Indianapolis, Indiana); Symposium on Molecular, Electrical, Optical and Magnetic Properties and Interactions at the Joint Central-Great Lakes Regional ACS Meeting (Indianapolis, Indiana); Symposium on Practical Aspects of Computational Chemistry at the 21st Northeast Regional ACS Meeting (Amherst, Massachusetts); Tripos Associates Users Meeting (St. Louis, Missouri). Each meeting lasted one to five days. It is interesting to note that computational chemistry is indeed somewhat of a cottage industry. Whereas a meeting in a huge discipline, such as organic chemistry, biochemistry, analytical chemistry, or molecular biology, may attract thousands of scientists, the computational chemistry gatherings have tended to bring together 20 to several hundred people at a time.
In this volume, several key concepts used by practicing computational chemists are brought into focus. The first chapter by Tamar Schlick is dedicated to the mathematics of optimization. After some mathematical preliminaries, approaches to large-scale optimization are described. The basic descent structure of local methods is highlighted, and then nonderivative, gradient, and Newton methods are explained.

Chapter 2 by Harold Scheraga follows up on the theme of optimization. This chapter on predicting the three-dimensional structures of oligopeptides begins by describing how to construct a model of an oligopeptide chain and then describes methods developed in his laboratory for solving the multiple-minima problem. This is followed by an extension to larger polypeptides and proteins. The multiple-minima problem, which used to look so knotty, is finally being unraveled.

The third chapter maintains the theme of determining conformations by describing how to generate initial structures of organic and bioorganic molecules and how to model experimental NMR data. Andrew Torda and Wilfred van Gunsteren also discuss refinement methods, force fields, systematic errors and biases, and the quality of predicted structures.

In Chapter 4, David Lewis introduces computer-assisted methods in the evaluation of chemical toxicology. He points out that any substance can be toxic, and thus it is the dose of the substance that determines a toxic response. How, then, does one predict toxicity? Lewis examines QSAR methods, pattern recognition techniques, computer modeling, and knowledge-based systems to answer this question. Ideally, one would like to assess toxicity of a structure before the compound is synthesized. To bring all this into focus, emphasis is placed on the cytochromes P450.

An updated, greatly enlarged compendium of software for molecular modeling appears as the Appendix. Programs that run on personal computers, minicomputers, workstations, mainframes, and supercomputers are listed together with some of their features. Telephone numbers and addresses of the vendors and/or developers are provided. To our knowledge, this is the most complete listing of sources of software for computational chemistry anywhere.

It may surprise some readers, but as editors we have essentially no control over the price of each volume, other than by setting an upper limit on the number of pages. We have worked diligently with VCH Publishers to keep the price of each volume as low as possible. This volume and Volume 4 represent an experiment of having fewer chapters, and hence pages, per volume. The last volume of Reviews in Computational Chemistry (Volume 2) grew to be very thick (527 pages) with a corresponding price. Suggestions have been made to have smaller books that students and others could acquire more readily. Also, by having smaller volumes, readers can pick those containing the topics of most interest to them. Hence, Volumes 3 and 4 are about half the thickness of Volume 2. We want the books in this series to be accessible, affordable, and useful. Your comments directly to us will be appreciated and will help us design the best format for future volumes.

We express our gratitude to the authors for their fine contributions. We encourage the readers of this review series to recommend other topics by writing to us.

Donald B. Boyd and Kenny B. Lipkowitz
Indianapolis
May 1992
Contents

1. Optimization Methods in Computational Chemistry
Tamar Schlick
Introduction
Mathematical Preliminaries
Notation
Problem Statement
Matrix Characteristics
Conditions at Minima
Analysis of Functions
Basic Approaches to Large-Scale Optimization
Size and Space Limitations
Search Techniques
Local and Global Methods
Basic Descent Structure of Local Methods
Descent Directions
Line Search and Trust Region Steps
Convergence Criteria
Convergence Characterization
Nonderivative Methods
Gradient Methods
Steepest Descent
Conjugate Gradient
Preconditioning
Nonlinear Conjugate Gradient
Newton Methods
Overview
Discrete Newton
Quasi-Newton
Truncated Newton
Perspective and Computational Examples
Comparisons
Numerical Example I: Rosenbrock Minimization
Numerical Example II: Deoxycytidine
Numerical Example III: Water Clusters
New Technologies
Acknowledgments
References

2. Predicting Three-Dimensional Structures of Oligopeptides
Harold A. Scheraga
Introduction
Theoretical Foundations
Generation of Oligopeptide Chain
Residue Geometry
End-Group Geometry
Constructing a Molecule
Ring Closure without Symmetry
Ring Closure with Symmetry
Early Use of Hard-Sphere Potential
More Realistic Potentials
Potential Functions
Optimization Methods
Ancillary Techniques
Application to Simple Systems
Multiple-Minima Problem
Build-up Methods
Optimization of Electrostatics (Self-consistent Electric Field)
Monte Carlo plus Minimization
Electrostatically Driven Monte Carlo
Adaptive Importance Sampling Monte Carlo
Increase in Dimensionality
Deformation of the Potential Energy Hypersurface
Mean-Field Theory
Simulated Annealing
Extension of Methodology to Large Polypeptides and Proteins
Build-up Method
Build-up with Limited Constraints
Calculations with Constraints
Use of Homology
Pattern-Recognition Importance Sampling Minimization (PRISM)
Outlook for the Future
Acknowledgments
References

3. Molecular Modeling Using Nuclear Magnetic Resonance Data
Andrew E. Torda and Wilfred F. van Gunsteren
Introduction
Scope and Definitions
Historical Perspective
Molecular Representation
Generating Initial Structures
Metric Matrix Method
Variable Target Function Method
Other Methods for Generating Initial Structures
Modeling of Experimental Data
Distance Restraints
Averaging over Discrete Conformations
Time-Averaged Distance Restraints
Direct NOE Refinement
Dihedral Angle Restraints
Refinement, Minimization, and Dynamics
Molecular Dynamics
Other Derivative-Based Dynamics Schemes
Other Non-Derivative-Based Schemes
Force Field
Force Field Parameters and Accuracy
Force Field Modifications
Systematic Errors and Biases
Quality of Structures
Future Directions
Acknowledgment
References

4. Computer-Assisted Methods in the Evaluation of Chemical Toxicity
David F. V. Lewis
Introduction
Computer-Based Methods for Toxicity Evaluation
Quantitative Structure-Activity Relationship
Pattern Recognition Techniques
Computer Modeling and Knowledge-Based Systems
The Cytochromes P450
COMPACT
Molecular Orbital Calculations and QSARs in Toxicity
Critical Assessment of Predictive Methods
Conclusions and Future Prospects
Information Sources for Software for Computer-Aided Prediction of Toxicity Methods
Acknowledgments
References

Appendix: Compendium of Software for Molecular Modeling
Donald B. Boyd
Introduction
References
Software for Personal Computers
General Purpose Molecular Modeling
Quantum Chemistry Calculations
Databases of Molecular Structures
Molecular Graphics and Other Applications
Software for Minicomputers, Superminicomputers, Workstations, and Supercomputers
General-Purpose Molecular Modeling
Quantum Chemistry Calculations
Databases of Molecular Structures
Molecular Graphics and Other Applications

Author Index
Subject Index
Contributors

Donald B. Boyd, Lilly Research Laboratories, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana 46285, U.S.A. (Electronic mail: boyd_donald_b@lilly.com)

David F. V. Lewis, Molecular Toxicology Research Group, Division of Toxicology, School of Biological Sciences, University of Surrey, Guildford, Surrey, GU2 5XH, United Kingdom

Harold A. Scheraga, Baker Laboratory of Chemistry, Cornell University, Ithaca, New York 14853-1301, U.S.A. (Electronic mail: [email protected])

Tamar Schlick, Courant Institute of Mathematical Sciences and Chemistry Department, New York University, 251 Mercer Street, New York, New York 10012, U.S.A. (Electronic mail: [email protected])

Andrew E. Torda, Physical Chemistry, ETH Zentrum, CH 8092, Zurich, Switzerland (Electronic mail: [email protected])

Wilfred F. van Gunsteren, Physical Chemistry, ETH Zentrum, CH 8092, Zurich, Switzerland (Electronic mail: [email protected])
Contributors to Previous Volumes

Volume I

David Feller and Ernest R. Davidson, Basis Sets for Ab Initio Molecular Orbital Calculations and Intermolecular Interactions.
James J. P. Stewart, Semiempirical Molecular Orbital Methods.
Clifford E. Dykstra, Joseph D. Augspurger, Bernard Kirtman, and David J. Malik, Properties of Molecules by Direct Calculation.
Ernest L. Plummer, The Application of Quantitative Design Strategies in Pesticide Design.
Peter C. Jurs, Chemometrics and Multivariate Analysis in Analytical Chemistry.
Yvonne C. Martin, Mark G. Bures, and Peter Willett, Searching Databases of Three-Dimensional Structures.
Paul G. Mezey, Molecular Surfaces.
Terry P. Lybrand,* Computer Simulation of Biomolecular Systems Using Molecular Dynamics and Free Energy Perturbation Methods.
Donald B. Boyd, Aspects of Molecular Modeling.
Donald B. Boyd, Successes of Computer-Assisted Molecular Design.
Ernest R. Davidson, Perspectives on Ab Initio Calculations.

*Current address: University of Washington, Seattle.

Volume II

Andrew R. Leach,† A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules.
John M. Troyer and Fred E. Cohen, Simplified Models for Understanding and Predicting Protein Structure.
J. Phillip Bowen and Norman L. Allinger, Molecular Mechanics: The Art and Science of Parameterization.
Uri Dinur and Arnold T. Hagler, New Approaches to Empirical Force Fields.
Steve Scheiner, Calculating the Properties of Hydrogen Bonds by Ab Initio Methods.
Donald E. Williams, Net Atomic Charge and Multipole Models for the Ab Initio Molecular Electric Potential.
Peter Politzer and Jane S. Murray, Molecular Electrostatic Potentials and Chemical Reactivity.
Michael C. Zerner, Semiempirical Molecular Orbital Methods.
Lowell H. Hall and Lemont B. Kier, The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling.
I. B. Bersuker and A. S. Dimoglo, The Electron-Topological Approach to the QSAR Problem.
Donald B. Boyd, The Computational Chemistry Literature.

†Current address: University of Southampton, U.K.
CHAPTER 1

Optimization Methods in Computational Chemistry

Tamar Schlick
Courant Institute of Mathematical Sciences and Chemistry Department, New York University, 251 Mercer Street, New York, New York 10012

INTRODUCTION

In his 1970 Numerical Methods, the witty numerical analyst Forman Acton wrote:
It is with a sense of reluctance that your author introduces this topic, for minimum-seeking methods are often used when a modicum of thought would disclose more appropriate techniques. They are the first refuge of the computational scoundrel, and one feels at times that the world would be a better place if they were quietly abandoned. But even if these techniques are frequently misused, it is equally true that there are problems for which no alternative solution method is known-and so we shall discuss them. A better title for this chapter might be "How to Find Minima-If You Must!"1
Although there may be some truth in this view, there is little doubt today that multivariate minimization algorithms are fundamental research tools for scientists in numerous disciplines. A common problem arises when a complex physical system is described by a collection of particles, or combinations of states, in a multidimensional phase space. An energy or cost function is associated with each different configuration, and the challenge is to find sets of points that minimize (or maximize) the objective function. Such applications arise frequently in molecular modeling, rational drug design, mathematical biology models, neural networks, combinatorial problems, financial investment planning, architectural design, electronics, meteorology, and computational geometry. I wonder: Would Acton still call us all scoundrels?

Reviews in Computational Chemistry, Volume III. Kenny B. Lipkowitz and Donald B. Boyd, Editors. VCH Publishers, Inc., New York, © 1992.

The purpose of this chapter is to provide computational chemists with a brief background on unconstrained nonlinear optimization methods that seek local minima. Emphasis is placed on methods that are most powerful today for large-scale problems (hundreds to thousands of independent variables) and suitable for potential energy minimization. Only a general practical taste of this very rich and theoretically interesting field is attempted here. For comprehensive treatments of optimization methods, interested readers are referred to a large selection of excellent books on a wide spectrum of levels, from introductory1,2 to comprehensive3-7 to specialized-topic volumes.8-12 The numerical linear algebra and multivariate calculus background can be found in most of the introductory optimization books as well as in standard books on numerical methods and matrix computations.13,14 Many general concepts mentioned throughout this chapter without explicit citations are discussed in these volumes; however, an attempt is made to make this review as self-contained as possible.

Another introductory note may provide further incentive for novice optimizers to read on. Despite extensive developments in optimization methods in the last decade, large-scale nonlinear optimization still remains an art that requires considerable computing experience, algorithm familiarity, and intuition.
In general, "black box" minimization implementations, even those using state-of-the-art algorithms, are only partially successful. "I have been burnt by too many black boxes," declares Acton in his 1990 preface.1 There are at least two reasons for our shared sentiment. First, many commercial software vendors, such as NAG and Harwell, have developed general-purpose programs that are, for the most part, easy to use but not always up-to-date with the latest minimization developments. Thus, successful new minimization approaches are often out of reach for nonspecialist mathematicians, let alone scientists in related application fields. Second, application tailoring of algorithms is often very difficult with ready-made software. Such modifications may be crucial in many applications. For example, a common thread in applications in meteorology, chemistry, or mathematical biology is a natural separability of the objective functions into components of differing complexity (e.g., local and nonlocal interactions). This composition may not only change the relative attractiveness or suitability among different optimization methods, but also lead to very powerful methods for the application at hand when this information is incorporated appropriately.
Such problem tailoring requires some familiarity with the algorithmic modules. It also demands knowledge of the theoretical and practical strengths and weaknesses of the different minimization methods. With rapidly growing improvements in high-performance super and massively parallel machines,15,16 application-tailored software may be even more important in combination with parallel architectures whose design is motivated by specific applications.
MATHEMATICAL PRELIMINARIES

Notation

We generally denote scalars by lowercase Greek letters (e.g., $\rho$), column vectors by boldface lowercase Roman letters (e.g., $\mathbf{x}$), and matrices by capital italic Roman letters (e.g., $H$). A superscript $T$ denotes a vector or matrix transpose. Thus $\mathbf{x}^T$ is a row vector, $\mathbf{x}^T\mathbf{y}$ is an inner product, and $A^T$ is the transpose of the matrix $A$. Unless stated otherwise, all vectors belong to $\mathbb{R}^n$, the n-dimensional vector space. Components of a vector are typically written as italic letters with subscripts (e.g., $x_1, x_2, \ldots, x_n$). The standard basis vectors in $\mathbb{R}^n$ are the n vectors $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$, where $\mathbf{e}_j$ has the entry 1 in the jth component and 0 in all others. Often, the associated vector norm is the standard Euclidean norm, $\|\cdot\|_2$, defined as

\[ \|\mathbf{x}\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}. \qquad [1] \]

We say that a term is of order n and write $O(n)$ to mean that it is proportional to n.
Problem Statement

We are interested in solving the optimization problem without constraints:

\[ \min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}), \qquad [2] \]

where $f$ is a real-valued function of n variables. We will assume that $f$ is sufficiently smooth, that is, possesses continuous derivatives at least up to second order. The gradient of $f$ at $\mathbf{x}$ is defined to be the first derivative vector $\nabla f(\mathbf{x})$, or $\mathbf{g}(\mathbf{x})$, whose n components are given by $g_i(\mathbf{x}) = \partial f(\mathbf{x})/\partial x_i$. The Hessian at $\mathbf{x}$, $H(\mathbf{x})$, is defined to be the $n \times n$ matrix of second partial derivatives with components $H_{ij}(\mathbf{x}) = \partial^2 f(\mathbf{x})/\partial x_i \partial x_j$. As $\partial^2 f(\mathbf{x})/\partial x_i \partial x_j = \partial^2 f(\mathbf{x})/\partial x_j \partial x_i$, the Hessian matrix is symmetric: $H_{ij}(\mathbf{x}) = H_{ji}(\mathbf{x})$.
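For example, for the following function of two variables,

\[ f(\mathbf{x}) = e^{x_1}\left(4x_1^2 + 2x_2^2 + 4x_1x_2\right), \qquad [3] \]

the gradient vector is

\[ \mathbf{g}(\mathbf{x}) = \begin{bmatrix} e^{x_1}\left(4x_1^2 + 2x_2^2 + 4x_1x_2 + 8x_1 + 4x_2\right) \\ e^{x_1}\left(4x_1 + 4x_2\right) \end{bmatrix}, \qquad [4] \]

and the Hessian is the matrix

\[ H(\mathbf{x}) = \begin{bmatrix} 2e^{x_1}\left(2x_1^2 + 2x_1x_2 + x_2^2 + 8x_1 + 4x_2 + 4\right) & 4e^{x_1}(1 + x_1 + x_2) \\ 4e^{x_1}(1 + x_1 + x_2) & 4e^{x_1} \end{bmatrix}. \qquad [5] \]

Just as in the one-dimensional case the derivative defines the slope of the tangent line to the curve $f(x)$, the gradient vector at $\mathbf{x}$ represents the normal to the tangent hyperplane at the point $\mathbf{x}$. The term hyperplane is an extension to $n > 3$ of a plane in three dimensions. For a fixed vector $\mathbf{x}$, all vectors $\mathbf{y}$ satisfying $\mathbf{x}^T\mathbf{y} = \gamma$ for some constant $\gamma$ lie in a hyperplane in higher dimensions. Thus, all vectors $\mathbf{y}$ satisfying $\mathbf{g}(\mathbf{x})^T\mathbf{y} = \gamma$, where $\gamma = \mathbf{g}(\mathbf{x})^T\mathbf{x}$, lie in the tangent hyperplane at $\mathbf{x}$.

A point $\mathbf{x}^* \in \mathbb{R}^n$ is said to be a strict local minimum of $f$ if there exists a number $\eta > 0$ such that $f(\mathbf{x}^*) < f(\mathbf{x})$ for all $\mathbf{x} \neq \mathbf{x}^*$ within a given distance $\eta$ from $\mathbf{x}^*$ (i.e., $\|\mathbf{x} - \mathbf{x}^*\| < \eta$). For a weak minimum, only $f(\mathbf{x}^*) \leq f(\mathbf{x})$ holds. The point $\mathbf{x}^*$ is a global minimum of $f$ if $f(\mathbf{x}^*) < f(\mathbf{x})$ for all $\mathbf{x} \neq \mathbf{x}^*$, $\mathbf{x} \in \mathbb{R}^n$. Figure 1 illustrates these possibilities for a one-dimensional function.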
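The gradient and Hessian of the two-variable example can be checked numerically. The sketch below is illustrative only: it assumes that Eq. [3] reads $f(\mathbf{x}) = e^{x_1}(4x_1^2 + 2x_2^2 + 4x_1x_2)$, the form consistent with the Hessian of Eq. [5], and the function names (`gradient`, `hessian`, `fd_gradient`, `fd_hessian`) are hypothetical helpers, not part of any program discussed in this chapter. It compares the analytic derivatives with central finite differences.

```python
import math


def f(x1, x2):
    """Example function; the form of Eq. [3] is assumed here."""
    return math.exp(x1) * (4 * x1**2 + 2 * x2**2 + 4 * x1 * x2)


def gradient(x1, x2):
    """Analytic gradient g(x), following Eq. [4] under the same assumption."""
    e = math.exp(x1)
    return [e * (4 * x1**2 + 2 * x2**2 + 4 * x1 * x2 + 8 * x1 + 4 * x2),
            e * (4 * x1 + 4 * x2)]


def hessian(x1, x2):
    """Analytic Hessian H(x), following Eq. [5]."""
    e = math.exp(x1)
    h11 = 2 * e * (2 * x1**2 + 2 * x1 * x2 + x2**2 + 8 * x1 + 4 * x2 + 4)
    h12 = 4 * e * (1 + x1 + x2)
    h22 = 4 * e
    return [[h11, h12], [h12, h22]]  # symmetric: H12 == H21


def fd_gradient(x1, x2, h=1e-6):
    """Central-difference approximation of the gradient from f values."""
    return [(f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
            (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)]


def fd_hessian(x1, x2, h=1e-5):
    """Central-difference approximation of the Hessian from gradient values."""
    gp1, gm1 = gradient(x1 + h, x2), gradient(x1 - h, x2)
    gp2, gm2 = gradient(x1, x2 + h), gradient(x1, x2 - h)
    return [[(gp1[0] - gm1[0]) / (2 * h), (gp2[0] - gm2[0]) / (2 * h)],
            [(gp1[1] - gm1[1]) / (2 * h), (gp2[1] - gm2[1]) / (2 * h)]]


if __name__ == "__main__":
    point = (0.3, -0.7)  # arbitrary test point
    print("analytic g:", gradient(*point))
    print("numeric  g:", fd_gradient(*point))
    print("analytic H:", hessian(*point))
    print("numeric  H:", fd_hessian(*point))
```

A finite-difference comparison of this kind is a common sanity check when derivatives of a potential energy function are coded by hand; agreement to several digits at a few arbitrary points usually exposes sign or index mistakes.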
Matrix Characteristics

There are several general characteristics of a matrix that are particularly useful for analysis of minimization algorithms. Density of a matrix is a measurement given by the ratio of the nonzero to zero matrix components. A matrix is said to be dense when this ratio is large and sparse when it is small. A sparse matrix may be structured (e.g., block diagonal, band) or unstructured (Figure 2). A symmetric matrix $A$ is said to be positive-definite if the quadratic form $\mathbf{u}^T A \mathbf{u} > 0$ for all nonzero vectors $\mathbf{u}$. Similarly, the symmetric matrix $A$ is positive-semidefinite if $\mathbf{u}^T A \mathbf{u} \geq 0$ for all nonzero vectors $\mathbf{u}$. Positive-definite matrices have strictly positive eigenvalues. We classify $A$ as negative-definite if $\mathbf{u}^T A \mathbf{u} < 0$ for all nonzero vectors $\mathbf{u}$. $A$ is indefinite if $\mathbf{u}^T A \mathbf{u}$ is positive for some $\mathbf{u}$ and negative for others. For example, the function $f(\mathbf{x})$ defined in Eq. [3] can be written as the product of $e^{x_1}$ and a generalized quadratic function: $f(x_1, x_2) = e^{x_1}\left[\mathbf{x}^T A \mathbf{x}\right]$. The matrix $A = \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix}$ is positive-definite.
The Hessian matrix is a generalization in $\mathbb{R}^n$ of the concept of curvature of a function. The positive-definiteness of the Hessian is a generalized notion of positive curvature. Thus, the properties of $H$ are very important in formulating minimum-seeking algorithms. Higher-order derivatives are rarely used in minimization methods; however, some recent approaches, termed tensor, have attempted to approximate higher-order information cheaply.17
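The definiteness classes above can be made concrete through the signs of the eigenvalues. The following is a minimal sketch restricted to symmetric 2 x 2 matrices, for which the eigenvalues have a closed form; the function names and the tolerance `tol` are assumptions made for illustration. The matrix entries (4, 2, 2) correspond to a quadratic form of $4x_1^2 + 4x_1x_2 + 2x_2^2$, which is the form assumed here for the example of Eq. [3].

```python
import math


def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues (smallest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


def classify(a, b, c, tol=1e-12):
    """Classify [[a, b], [b, c]] by the signs of its eigenvalues.

    Eigenvalues within `tol` of zero are treated as zero. The zero matrix,
    which is both positive- and negative-semidefinite, is reported as
    positive-semidefinite.
    """
    lo, hi = eigenvalues_sym_2x2(a, b, c)
    if lo > tol:
        return "positive-definite"
    if hi < -tol:
        return "negative-definite"
    if lo < -tol and hi > tol:
        return "indefinite"
    if lo >= -tol:
        return "positive-semidefinite"
    return "negative-semidefinite"


if __name__ == "__main__":
    print(classify(4.0, 2.0, 2.0))    # matrix of the assumed quadratic form
    print(classify(1.0, 1.0, 1.0))    # one zero eigenvalue
    print(classify(0.0, 4.0, 4.0))    # eigenvalues of both signs
    print(classify(-2.0, 0.0, -1.0))  # all eigenvalues negative
```

For the large matrices that arise in molecular applications, an explicit eigenvalue computation is rarely the method of choice; an attempted Cholesky factorization is a common and cheaper test of positive-definiteness.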
Figure 2 Sample matrix patterns for (a) block diagonal and (b-e) sparse unstructured. Pattern (b) corresponds to the Hessian approximation (preconditioner) for a potential energy model from the local energy terms (bond length, bond angle, and dihedral angle terms), and (c) is a reordered matrix pattern that reduces fill-in during the factorization. Pattern (d) comes from a molecular dynamics simulation of supercoiled DNA36 and describes pairs of points along a ribbonlike model of the duplex that come in close contact during the dynamics trajectory; pattern (e) is the associated reordered structure that reduces fill-in.

Conditions at Minima

What are the sufficient conditions that must hold at a solution point of problem [2]? These conditions are simply extensions to $\mathbb{R}^n$ of the well-known first and second derivative conditions for univariate functions. Let us assume that $f(\mathbf{x})$ is a smooth function with continuous first, second, and third derivatives defined for all $\mathbf{x}$. We further suppose that the following conditions hold for the gradient and Hessian of $f$ at $\mathbf{x}^* \in \mathbb{R}^n$:

(i) $\mathbf{g}(\mathbf{x}^*) = \mathbf{0}$,
(ii) $H(\mathbf{x}^*)$ is positive-definite.

Then it can be easily shown that $\mathbf{x}^*$ is a strict local minimum of $f$. The Taylor expansion of $f(\mathbf{x})$ about $\mathbf{x}^*$ along a perturbation vector $\mathbf{p} = (\Delta x_1, \Delta x_2, \ldots, \Delta x_n)^T$ produces

\[ f(\mathbf{x}^* + \mathbf{p}) = f(\mathbf{x}^*) + \sum_{i=1}^{n} \Delta x_i \left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{x}^*} + \frac{1}{2}\sum_{i=1}^{n} (\Delta x_i)^2 \left.\frac{\partial^2 f}{\partial x_i^2}\right|_{\mathbf{x}^*} + \sum_{i=1}^{n}\sum_{j>i} (\Delta x_i)(\Delta x_j) \left.\frac{\partial^2 f}{\partial x_i \partial x_j}\right|_{\mathbf{x}^*} + \text{higher order terms}. \qquad [6] \]

More compactly, this expansion can be written as

\[ f(\mathbf{x}^* + \mathbf{p}) = f(\mathbf{x}^*) + \mathbf{g}(\mathbf{x}^*)^T\mathbf{p} + \tfrac{1}{2}\mathbf{p}^T H(\mathbf{x}^*)\mathbf{p} + O(\|\mathbf{p}\|^3). \qquad [7] \]
As condition (i) holds, the gradient term vanishes, and condition (ii) implies that there exists some $\alpha > 0$ for which $\mathbf{p}^T H(\mathbf{x}^*)\mathbf{p} > \alpha\|\mathbf{p}\|^2$. It follows that we can write

\[ f(\mathbf{x}^* + \mathbf{p}) - f(\mathbf{x}^*) = \tfrac{1}{2}\mathbf{p}^T H(\mathbf{x}^*)\mathbf{p} + O(\|\mathbf{p}\|^3) > \tfrac{1}{2}\alpha\|\mathbf{p}\|^2 + O(\|\mathbf{p}\|^3). \qquad [8] \]

For arbitrarily small perturbation vectors $\mathbf{p}$, the first term on the right dominates the second; consequently, the right-hand side is positive. By definition, $\mathbf{x}^*$ is a strict local minimum of $f$. $\square$

A point $\mathbf{x}^*$ is called a stationary point of $f$ if $\mathbf{g}(\mathbf{x}^*) = \mathbf{0}$ but $H(\mathbf{x}^*)$ is not necessarily positive-definite. Thus, local and global minima are stationary points, but there are more general stationary points, such as saddle points, which are neither local nor global minima. Special techniques are needed for detection of saddle points, which are often related to structural transitions in molecular applications.
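The argument above can be illustrated numerically: near a point satisfying conditions (i) and (ii), the increase in $f$ is positive and is matched by the quadratic term up to a remainder that shrinks like the cube of the step length. The sketch below assumes the example function $f(\mathbf{x}) = e^{x_1}(4x_1^2 + 2x_2^2 + 4x_1x_2)$ and takes $\mathbf{x}^* = (0, 0)$, where the Hessian of Eq. [5] evaluates to [[8, 4], [4, 4]]; the helper names are hypothetical.

```python
import math


def f(x1, x2):
    """Example function; the form of Eq. [3] is assumed here."""
    return math.exp(x1) * (4 * x1**2 + 2 * x2**2 + 4 * x1 * x2)


# Hessian of Eq. [5] evaluated at the stationary point x* = (0, 0).
H_STAR = [[8.0, 4.0], [4.0, 4.0]]


def quadratic_term(p):
    """Second-order Taylor term 0.5 * p^T H(x*) p."""
    return 0.5 * (H_STAR[0][0] * p[0]**2
                  + 2.0 * H_STAR[0][1] * p[0] * p[1]
                  + H_STAR[1][1] * p[1]**2)


def increase_and_remainder_ratio(direction, t):
    """Return f(x* + p) - f(x*) and remainder / ||p||^3 for p = t * direction."""
    p = (t * direction[0], t * direction[1])
    increase = f(p[0], p[1]) - f(0.0, 0.0)
    remainder = increase - quadratic_term(p)
    return increase, remainder / math.hypot(p[0], p[1])**3


if __name__ == "__main__":
    for t in (1e-1, 1e-2, 1e-3):
        inc, ratio = increase_and_remainder_ratio((1.0, 0.0), t)
        print(f"t = {t:g}: increase = {inc:.6e}, remainder/||p||^3 = {ratio:.4f}")
```

The printed ratio stays bounded as the step shrinks, which is what the $O(\|\mathbf{p}\|^3)$ term in Eq. [7] expresses, while the increase itself remains positive in every direction tried.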
Figure 3 illustrates the different types of stationary points for one-dimensional functions. The minimum and maximum at the origin are shown, respectively, for a convex and concave quadratic function. Note that the traces in the x,z and y,z planes are parabolas, opening upward (above origin) in the case of the minimum and opening downward (below origin) for the maximum. Elliptical cross sections can be noted in the x,y planes. The saddle point at the origin exhibits parabolic traces in the x,z and y,z planes, opening upward and downward, respectively. The trace in the x,y plane is a pair of intersecting lines.
Analysis of Functions

The positive-definiteness of H is a useful concept in analysis of general functions. Smooth functions can be approximated by quadratic models within a sufficiently small neighborhood of a given point. The local behavior of f can then be analyzed in terms of the properties of H.
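To see this, consider the Taylor expansion of a quadratic function $q$ about a stationary point $x^*$:

$$q(x^* + p) = q(x^*) + \tfrac{1}{2}\, p^T H p. \qquad [9]$$

The symmetry of $H$ implies that the $n$ eigenvectors $\{v_i\}$ associated with the eigenvalues $\{\lambda_i\}$ can be chosen to be orthonormal (i.e., $v_i^T v_j = 0$ for $i \ne j$, and $v_i^T v_i = 1$ for all $i$). We can then express $p$ as the linear combination of eigenvectors

$$p = \sum_{i=1}^{n} \alpha_i v_i \qquad [10]$$

for a set of scalars $\{\alpha_i\}$. As $v_i^T H v_i = \lambda_i$ for each $i$, the difference in function value caused by a movement along $p$ is given by

$$q(x^* + p) - q(x^*) = \tfrac{1}{2} \sum_{i=1}^{n} \alpha_i^2 \lambda_i. \qquad [11]$$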
Clearly, the signs of the eigenvalues influence the change in $q$ along particular directions. For example, when $p = \alpha v_i$ for $\alpha > 0$ and some $i$, $q$ will be strictly increasing as $\alpha$ increases if $\lambda_i > 0$ and strictly decreasing if $\lambda_i < 0$. If $\lambda_i = 0$, $q$ is constant along directions parallel to $v_i$ (because $H v_i = 0$) and reduces to a linear function along this direction (the quadratic term in Eq. [9] vanishes). When all the eigenvalues are positive, $x^*$ is a unique global minimum of $q$. If $H$ is positive-semidefinite (i.e., has nonnegative eigenvalues), $x^*$ is a weak minimum. If $H$ is indefinite and nonsingular (i.e., no eigenvalues are zero), the critical point $x^*$ is a saddle point. For example, the two-dimensional function defined in Eq. [3] has a local minimum at (0,0) and a saddle point at (-2,2). Both points are stationary.

Figure 3 Types of stationary points.

At (0,0), $f = 0$ and the Hessian is

$$\begin{bmatrix} 8 & 4 \\ 4 & 4 \end{bmatrix},$$

with determinant (product of eigenvalues) of 16 and eigenvalues $6 \pm \sqrt{20}$, or about 10.5 and 1.5. At (-2,2), $f = 8e^{-2} \approx 1.1$, and the Hessian

$$\begin{bmatrix} 0 & 4e^{-2} \\ 4e^{-2} & 4e^{-2} \end{bmatrix}$$
has determinant $-16e^{-4} < 0$ and eigenvalues $2e^{-2}(1 \pm \sqrt{5})$, or about 0.88 and -0.34. The reader can verify that the function $f(x) = e^{x_1}(x_1 - 2x_2)^2$ has weak local minima for all points of the form $x_1 = 2x_2$, because the Hessian at those points,

$$\begin{bmatrix} 2e^{x_1} & -4e^{x_1} \\ -4e^{x_1} & 8e^{x_1} \end{bmatrix},$$

has eigenvalues 0 and $10e^{x_1} > 0$.

Contour plots of two-dimensional functions help illustrate these concepts. In general, the equation $f(x) = y$ defines a surface in $R^{n+1}$. When $n = 2$, the plane curves corresponding to various values of $y$ generate contour plots (or maps) of a function. Figure 4 shows the contour plots for the two-dimensional functions discussed before. Note, for the first, the two stationary points corresponding to a minimum and saddle point. For the second, note the region of weak local minima. The contour plots are shaded so that darker areas correspond to higher function values.

Figure 5 illustrates more generally various cases that can occur for simple quadratic functions of form $q(x) = \tfrac{1}{2} x^T H x$, for $n = 2$, where $H$ is a constant matrix. The contour plots display different characteristics when $H$ is (a) positive-definite (elliptical contours with lowest function value at the center), in which case $q$ is said to be a convex quadratic, (b) positive-semidefinite, (c) indefinite, or (d) negative-definite (elliptical contours with highest function value at the center), in which case $q$ is a concave quadratic. For this figure, the matrices $H$ used for those different functions have the following eigenvalues:

(a) two positive eigenvalues;
(b) one zero and one positive eigenvalue: $\lambda = 0, 5$;
(c) one negative and one positive eigenvalue: $\lambda = -1, 4$;
(d) two negative eigenvalues: $\lambda = -2, -1$.
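The eigenvalue analysis above can be checked numerically. The following is a minimal sketch, assuming the function of Figure 4a, $f(x_1,x_2) = e^{x_1}(4x_1^2 + 4x_1x_2 + 2x_2^2)$, with its Hessian differentiated by hand; the helper names (`hessian`, `eigenvalues_sym2`, `classify`) and the tolerance `tol` are illustrative choices, not part of the original text.

```python
import math

def hessian(x1, x2):
    """Analytic Hessian entries (h11, h12, h22) of
    f(x1, x2) = exp(x1) * (4*x1**2 + 4*x1*x2 + 2*x2**2)."""
    e = math.exp(x1)
    q = 4 * x1**2 + 4 * x1 * x2 + 2 * x2**2
    h11 = e * (q + 2 * (8 * x1 + 4 * x2) + 8)
    h12 = e * (4 * x1 + 4 * x2 + 4)
    h22 = e * 4.0
    return h11, h12, h22

def eigenvalues_sym2(h11, h12, h22):
    """Eigenvalues (smallest first) of the symmetric 2x2 matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius

def classify(h11, h12, h22, tol=1e-12):
    """Classify a stationary point from the signs of the Hessian eigenvalues."""
    lo, hi = eigenvalues_sym2(h11, h12, h22)
    if lo > tol:
        return "strict local minimum"
    if hi < -tol:
        return "strict local maximum"
    if lo < -tol and hi > tol:
        return "saddle point"
    return "degenerate (semidefinite Hessian)"

for point in [(0.0, 0.0), (-2.0, 2.0)]:
    h = hessian(*point)
    print(point, eigenvalues_sym2(*h), classify(*h))
```

At (0,0) the eigenvalues are $6 \pm \sqrt{20}$ and at (-2,2) they are $2e^{-2}(1 \pm \sqrt{5})$, in agreement with the values quoted in the text. Applying `classify(2.0, -4.0, 8.0)` to the Hessian of the second function at $x_1 = 0$ reports the degenerate (weak minimum) case.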
As Figure 4 shows, nonlinear functions produce contour maps that are complex composites of these pure quadratic cases. Contour plots for $n = 2$ also help illustrate paths toward minima specified by various minimization methods. The gradient vector is orthogonal to the contour lines. The familiar notion in one dimension that the negative tangent vector at a point $x$ points toward the minimum of a convex quadratic extends naturally to higher dimensions. Thus, if the contour plots are circular (e.g., $q(x) = x^T H x$, where $H$ is a positive multiple of the identity matrix), a movement along the negative gradient will find the minimum in one step. More generally, the shape of the elliptical contours near local minima depends on the eigenvalues $\{\lambda_i\}$ and eigenvectors $\{v_i\}$ of the Hessian in that neighborhood. The axes of the elliptical contours point in the direction of the orthogonal eigenvectors, and the length of each axis corresponding to the $i$th eigenvector is proportional to $1/\sqrt{\lambda_i}$. Thus for $f(x)$ defined in Eq. [3], $\lambda_1 \approx 7\lambda_2$ near the origin, and the elliptical axes are elongated accordingly (Figure 4). In Figure 5d, for example, the axes of the ellipses are along the basis vectors, the eigenvectors corresponding to the diagonal matrix.

Figure 4 Sample contour plots and partial three-dimensional views of nonlinear two-dimensional functions: (a) $f(x_1,x_2) = e^{x_1}(4x_1^2 + 4x_1x_2 + 2x_2^2)$; (b) $f(x_1,x_2) = e^{x_1}(x_1 - 2x_2)^2$.

Figure 5 Sample contour plots for quadratic two-dimensional functions $q(x) = \tfrac{1}{2} x^T H x$, where $H$ is (a) positive-definite, (b) positive-semidefinite, (c) indefinite, or (d) negative-definite.
BASIC APPROACHES TO LARGE-SCALE OPTIMIZATION
Size and Space Limitations

The task of minimizing potential energy functions arising in molecular mechanics is typical of optimization applications seeking favorable configurational states of a physical system.18-23 The sheer size of configuration space and complexity of the system introduce two major problems: extensive computational requirements and the multiple-minima problem.

Computational intensity often stems from the long-range interactions among the $N$ particles in the system (e.g., Coulombic forces). Their direct evaluation requires order of $N^2$ operations. Even if a cutoff radius is introduced, computation of the nonbonded terms dominates computation time. Thus, implementation of fast particle methods24,25 in molecular mechanics and dynamics calculations may be important for reducing the severity of this problem in the future. For now, if we assume order of $N$ iterations for a minimization algorithm, the computational complexity will be $O(N^3)$, a cost barely manageable for large biological systems, even on supercomputers. Rapid and reliable performance in practice is then a central focus in selection of suitable minimization algorithms.

The multiple-minima problem is a severe handicap of many large-scale optimization applications. The state of the art today is such that for reasonably small problems (30 variables or fewer) suitable algorithms exist for finding all local minima for linear and nonlinear functions. For larger problems, however, many trials are generally required to find local minima, and finding the global minimum cannot be ensured. These features have prompted research in conformational-search techniques independent of, or in combination with, minimization.26
Search Techniques

Consider, for example, the two different energy minima of n-butane, corresponding to the anti and gauche conformers, as a function of the dihedral angle about the central C-C bond (Figure 6). This representation is a simplification from the full space of 36 degrees of freedom ($3N - 6$, where $N = 14$ atoms) but suffices for partitioning the sampling space approximately. When we scale up the model to alkane chains of $m$ residues, the number of possible starting points obtained from combinations of rough partitions in monomer structure is $3^m$. For polypeptides and polynucleotides, the flexibility of the monomer configurations increases, producing a rough range of $10^m$ to $25^m$ reasonable starting points by coarse subdomain partition (e.g., combinations of typical side-chain, main-chain, backbone, or sugar dihedral angles). Exhaustive searches are clearly not feasible.

Figure 6 Potential energy of n-butane as a function of the dihedral angle about the central C-C bond.

The buildup technique is a related configurational search strategy, used mostly in protein studies27,28 and more recently in nucleic acids.29 Reasonable starting points are constructed by combining minima of conformational building blocks. This rational strategy has performed rather well in practice but provides no guarantee that all biologically important local minima, much less the global minimum, are revealed. As computational time tends to be large for biopolymers, the simplest stochastic global optimization method, simulated annealing,8,9 may be preferable for very large systems.

Molecular dynamics can also be viewed as a technique complementary to potential energy minimization for obtaining structural information.30 Although in theory information on all thermally accessible states should be observable, the restriction of the integration time step to a very small value with respect to time scales of collective biomolecular motions30,31 limits the scope of molecular dynamics in practice. Implicit numerical schemes32-36 or other special techniques37 may reduce the severity of this problem in the future. Typical molecular dynamics simulations today are effective for refining low-resolution crystal structures and for observing small fluctuations around equilibrium.30 They are often also effective in combination with minimization as "annealing"-like procedures for complex systems.38

Together, these general considerations of sampling and complexity have led to two broad classes of multivariate minimization algorithms: local and global (see Figure 7).
Local and Global Methods

Local methods, the focus of this chapter, are essentially descent methods. They are defined by an iterative procedure that generates a sequence $\{x_0, x_1, \ldots, x_k, \ldots\}$ and attempts to find one local minimum $x^*$ from a given $x_0$. In each step, a search vector $p_k$ is computed by a given strategy, and $f$ is minimized approximately along that direction so that "sufficient decrease" is obtained (see next section). Thus the structure of local methods is basically a "greedy" one: effort is directed toward a nearby local minimum, and steps that may increase the function are not permitted (Figure 8). Their performance is clearly sensitive to the choice of starting point in addition to search direction and algorithmic details.

Global optimization methods attempt to remedy the deficiency of local methods by exploring larger regions of function space. The global minimum of a function can be sought through two classes of approaches: deterministic and stochastic. Deterministic methods usually require the objective function to satisfy certain smoothness properties; they construct a sequence of points converging to lower and lower local minima. Ideally, they attempt to "tunnel" through local barriers, as shown in the schematic path in Figure 7. Computational effort tends to be very large, and a guarantee of success can be obtained only under specific assumptions. Local minimization methods are often required repeatedly in this framework. Although a variety of interesting algorithmic approaches are currently under investigation,11,26,39-42 the field of deterministic global optimization is still in its infancy in terms of general large-scale applicability. Nonetheless, by exploiting massively parallel environments for algorithm design,43,44 we can anticipate much progress in the coming years.
Stochastic global methods, on the other hand, involve systematic manipulation of randomly selected points (Figure 7).11,42,45,46 Success can be guaranteed only in an asymptotic, stochastic sense, although in practice many applications are very promising. The simulated annealing method is a popular and effective technique for small-to-medium molecular systems.8,9,47-50 Simulated annealing is very easy to implement and generally requires no derivative computations. It is often a useful technique when the energy formulation is complex, as in macroscopic models of supercoiled DNA.51

Figure 7 Structure of local and global minimization algorithms. Panel labels: stochastic global algorithm; deterministic global algorithm.

Local optimization methods have experienced far more extensive development in the last decade. Studies have produced a range of robust and reliable techniques tailored to problem size, smoothness, complexity, and memory considerations. Newton methods, for example, have produced many variants that effectively lift the original restriction of their application to small or sparse problems. In the case of potential energy functions, unconstrained optimization problems can generally be formulated for large, nonlinear, and smooth functions. Obtaining first and second derivatives may be tedious but is definitely feasible. This additional effort is in fact profitable overall, because a gain in performance will often be realized. As storage and cost considerations are directly related to the function complexity and derivative calculations, the choice of method for potential energy minimization should be based on the extent of possible derivative manipulations (e.g., first derivatives, first and second derivatives). After a brief discussion on the structure of descent methods, we highlight general concepts in the three categories of local methods for unconstrained nonlinear functions: nonderivative, first derivative (or gradient), and second derivative methods.

Figure 8 Descent structure of local minimization algorithms.
BASIC DESCENT STRUCTURE OF LOCAL METHODS

The fundamental structure of local iterative techniques for solving unconstrained minimization problems is simple. A starting point is chosen, a direction of movement is prescribed according to some algorithm, and a line search or trust region approach is performed to determine an appropriate next step. The process is repeated at the new point and the algorithm continues until a local minimum is found. Schematically, a model descent method can be written as follows:

Algorithm [A1]: Basic Descent

* Supply an initial guess $x_0$.
* For $k = 0, 1, 2, \ldots$, until convergence
  1. Test $x_k$ for convergence.
  2. Calculate a search direction $p_k$.
  3. Determine an appropriate step length $\lambda_k$ (or modified step $s_k$).
  4. Set $x_{k+1}$ to $x_k + \lambda_k p_k$ (or $x_k + s_k$).
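The four steps of Algorithm [A1] can be sketched in a few lines. The version below is only an illustration under stated assumptions: the search direction is taken as the negative gradient, the step length is found by simple step halving with a sufficient-decrease test (constant 1e-4), and convergence is judged by a scaled gradient norm. The function name `basic_descent` and all parameter defaults are hypothetical choices, not prescribed by the algorithm.

```python
import math

def basic_descent(f, grad, x0, eps_g=1e-8, max_iter=10000):
    """Illustrative instance of Algorithm [A1]; returns (x, iterations used)."""
    x = list(x0)
    for k in range(max_iter):
        fx, g = f(x), grad(x)
        # Step 1: test x_k for convergence (scaled gradient norm).
        if math.sqrt(sum(gi * gi for gi in g)) <= eps_g * (1.0 + abs(fx)):
            return x, k
        # Step 2: search direction (assumed here: negative gradient).
        p = [-gi for gi in g]
        slope = sum(gi * pi for gi, pi in zip(g, p))  # directional derivative, negative
        # Step 3: step length by halving until sufficient decrease is obtained.
        lam = 1.0
        while lam > 1e-16:
            trial = [xi + lam * pi for xi, pi in zip(x, p)]
            if f(trial) <= fx + 1e-4 * lam * slope:
                break
            lam *= 0.5
        # Step 4: set x_{k+1} = x_k + lam_k * p_k.
        x = [xi + lam * pi for xi, pi in zip(x, p)]
    return x, max_iter

# Usage on a simple convex quadratic with minimum at the origin.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 20.0 * x[1]]
x_min, n_iter = basic_descent(f, grad, [1.0, 1.0])
print(x_min, n_iter)
```

Other choices of search direction or step-length rule fit the same loop; only steps 2 and 3 would change.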
Descent Directions

It is reasonable to choose a search vector $p_k$ that will be a descent direction, that is, a direction leading to function reduction. A descent direction $p$ is defined as one along which the directional derivative is negative:

$$g(x)^T p < 0. \qquad [12]$$

When we write the approximation

$$f(x + \lambda p) \approx f(x) + \lambda\, g(x)^T p, \qquad [13]$$

we see that the negativity of the second term on the right-hand side guarantees that a lower function value can be found along $p$ for a sufficiently small $\lambda > 0$. Different descent methods are distinguished by their choice of search directions. The various strategies are discussed in the next sections.
Line Search and Trust Region Steps

Both line search and trust region methods are essential components of basic descent schemes for guaranteeing global convergence.3,6,52-55 Of course, only one of the two methods is needed for a given minimization algorithm. This notion of convergence refers to obtaining a strict local minimum $x^*$ from any given starting point $x_0$ and not the global minimum of a function. A line search consists of an approximate one-dimensional minimization of the objective function along the computed direction $p_k$. This produces an acceptable step $\lambda$ and a new iterate $x_k + \lambda p_k$. Function and gradient evaluations of the objective function are required in each line search iteration. In contrast, the trust region strategy minimizes approximately a local quadratic model of the function using current Hessian information. An optimal step that lies within the trust region of the quadratic model is calculated, and the new iterate becomes $x_k + s_k$. The trust radius in the trust region approach is estimated on the basis of the local Hessian's characteristics (positive-definite, positive-semidefinite, indefinite). The basic idea is to choose $s_k$ nearly in the current negative gradient direction ($-g_k$) when the trust radius is small, and approach the Newton step $-H_k^{-1} g_k$ as the trust region is increased. ($H_k$ and $g_k$ denote the Hessian and gradient, respectively, at $x_k$.) Note from condition [12] that these two choices correspond to the extremal cases ($M = I$ and $M = H$) of general descent directions of form $p = -M^{-1} g$, where $M$ is a positive-definite approximation to the Hessian.

Although line searches are typically easier to program, trust region methods may be effective when the procedure for determining the search direction $p_k$ is not necessarily one of descent. This may be the case for methods that use finite-difference approximations to the Hessian in the procedure for specifying $p_k$ (discussed in later sections). As we shall see later, in BFGS quasi-Newton or truncated Newton methods line searches may be preferable because descent directions are guaranteed. In general, there has been no clear evidence for superiority of one approach over another. Indeed, both techniques are incorporated into minimization software with approximately equal success. Below we sketch the line search procedure, which we find easier to grasp and simpler to implement in practice. For further perspective and algorithmic details, the reader is referred to the recent review of Dennis and Schnabel and references cited therein.54

The line search is essentially an approximate one-dimensional minimization problem. It is usually performed by safeguarded polynomial interpolation.5,6,54-56 That is, in a typical line search iteration, cubic interpolation is performed in a region of $\lambda$ that ensures that the minimum of $f$ along $p$ has been bracketed.
The minimum of that polynomial then provides a new candidate for $\lambda$. If the search directions are properly scaled, the initial trial point $\lambda_t = 1$ produces a first reasonable trial move from $x_k$. A simple illustration of such a first line search step is shown in Figure 9. The minimized one-dimensional function at the current point $x_k$ is defined by $\tilde f(\lambda) = f(x_k + \lambda p_k)$. The vectors corresponding to different values of $\lambda$ are set by $\tilde x(\lambda) = x_k + \lambda p_k$. At the first step, a cubic polynomial can be constructed from the two function values $\tilde f(0)$, $\tilde f(\lambda_t)$ and the two slopes $\tilde g(0)$, $\tilde g(\lambda_t)$. The slopes are the directional derivatives defined as $\tilde g(\lambda) = g(x_k + \lambda p_k)^T p_k$. Note in the figure a negative slope at $\lambda = 0$, as $p_k$ is a descent direction.

In general, precautions are needed to ensure that a minimum for $\tilde f(\lambda)$ has been bracketed in the trial interval. For example (see Figure 10, for the first step), if (a) the new slope is positive ($\tilde g(\lambda_t) > 0$) or (b) the new function value is greater than the old ($\tilde f(\lambda_t) > \tilde f(0)$), then there is a relative minimum along $p_k$ between $\tilde x(0)$ and $\tilde x(\lambda_t)$. We can proceed to find it by minimization of a cubic (or quadratic in special cases) interpolant; however, if neither of these conditions holds, the function has decreased and continues to decrease at $\tilde x(\lambda_t)$. The step may then be too small and should be increased until a minimum is bracketed. Alternatively, the step may be too large and can be decreased until the new slope is positive (see Figure 10c).

Figure 9 A one-dimensional line search for $\tilde f(\lambda) = f(x + \lambda p)$ by cubic interpolation produces an approximate minimum along $p$, at $(\lambda^*, \tilde f(\lambda^*))$.

Figure 10 Possible situations in line search algorithms between the current and trial points: (a) The new slope is positive. (b) The new slope is negative but function value is greater. (c) The new slope is negative and function value is lower.

More generally, for the bracketed interval $[\lambda_1, \lambda_2]$ and corresponding function values and slopes $f_1, f_2, g_1, g_2$, the cubic polynomial passing through $(\lambda_1, f_1)$ and $(\lambda_2, f_2)$ and having the specified slopes is given by

$$p(\lambda) = a(\lambda - \lambda_1)^3 + b(\lambda - \lambda_1)^2 + c(\lambda - \lambda_1) + d, \qquad [14]$$

where

$$a = \frac{-2(f_2 - f_1) + (g_1 + g_2)(\lambda_2 - \lambda_1)}{(\lambda_2 - \lambda_1)^3}, \quad b = \frac{3(f_2 - f_1) - (2g_1 + g_2)(\lambda_2 - \lambda_1)}{(\lambda_2 - \lambda_1)^2}, \quad c = g_1, \quad d = f_1.$$

A minimum of $p(\lambda)$ can be obtained by setting

$$\lambda = \lambda_1 + \frac{-b + \sqrt{b^2 - 3ac}}{3a} \qquad [15]$$

as long as $a \ne 0$ and $b^2 - 3ac \ge 0$. Otherwise, a quadratic interpolant fitted to $f_1$, $g_1$, and $f_2$ can be constructed with the same coefficients,

$$p(\lambda) = b(\lambda - \lambda_1)^2 + c(\lambda - \lambda_1) + d, \qquad [16]$$

and minimized to produce

$$\lambda = \lambda_1 - \frac{c}{2b}. \qquad [17]$$
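The interpolation step can be sketched as follows. This is a minimal illustration, not a safeguarded production routine: the cubic coefficients are those determined by the four interpolation conditions, and the minimizer of Eq. [15] is evaluated in the algebraically equivalent form $\lambda_1 - c/(b + \sqrt{b^2 - 3ac})$, an implementation choice made here because it avoids cancellation when $a$ is nearly zero. In the fallback branch the quadratic coefficient is refitted directly to $f_1$, $g_1$, and $f_2$. The function name `interpolation_step` is hypothetical, and clamping the result to the bracket is omitted.

```python
import math

def interpolation_step(lam1, f1, g1, lam2, f2, g2):
    """Candidate step from cubic interpolation on [lam1, lam2], with a quadratic fallback."""
    dl = lam2 - lam1
    df = f2 - f1
    # Coefficients of p(lam) = a*t**3 + b*t**2 + c*t + d with t = lam - lam1 (Eq. [14]).
    a = (-2.0 * df + (g1 + g2) * dl) / dl**3
    b = (3.0 * df - (2.0 * g1 + g2) * dl) / dl**2
    c = g1
    disc = b * b - 3.0 * a * c
    if disc >= 0.0:
        denom = b + math.sqrt(disc)
        if denom != 0.0:
            # Same root as Eq. [15], written to stay accurate when a is tiny or zero.
            return lam1 - c / denom
    # Quadratic interpolant through f1, g1, f2 (Eqs. [16] and [17]).
    b_quad = (df - g1 * dl) / dl**2
    if b_quad <= 0.0:
        raise ValueError("interpolant has no minimum; the interval is not a bracket")
    return lam1 - c / (2.0 * b_quad)

# Usage: phi(lam) = lam**3 - 3*lam has its minimum at lam = 1 inside [0, 2].
print(interpolation_step(0.0, 0.0, -3.0, 2.0, 2.0, 9.0))
```

Because the test function in the usage line is itself a cubic, the interpolant reproduces it and the step lands on the true minimizer; for general functions the result is only a new trial value to be tested for acceptance.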
The degenerate case of $b = 0$, corresponding to a linear function rather than a quadratic and redundancy among the three values $\{f_1, f_2, g_1\}$, is excluded by construction.

Once a new $\lambda$ and corresponding trial point $\tilde x(\lambda)$ have been determined in a line search iteration, conditions of sufficient progress with respect to the objective function are tested. If these conditions are not satisfied, a new value for $\lambda$ is sought in another line search step of interpolation, following a backtracking strategy (i.e., reduction of $\lambda_2$). The idea in these tests is to ensure "sufficient decrease" in the objective function for the new point relative to the step taken. These conditions balance the work in the line search procedure with the overall progress realized in the minimization method. The conditions often used in optimization algorithms are derived from the Armijo and Goldstein criteria.6 They require that the inequalities

$$f(x_k + \lambda p_k) \le f(x_k) + \alpha \lambda\, g(x_k)^T p_k \qquad [18a]$$

and

$$|g(x_k + \lambda p_k)^T p_k| \le \beta\, |g(x_k)^T p_k| \qquad [18b]$$

hold for two constants $\alpha, \beta$, where $0 < \alpha < \beta < 1$. The first condition prescribes an upper limit on acceptable new function values. Note that condition [18a] requires that $f(x_k + \lambda p_k)$ lie on or below the line $l(\alpha) = f(x_k) + \alpha \lambda\, g(x_k)^T p_k$ (Figure 11). The line $l(1)$ corresponds to the directional derivative at $x_k$, whereas the line $l(0)$ corresponds to $f(x_k)$. Thus, the larger $\alpha$ is, the more stringent will be the condition of satisfactory decrease. The second condition involves changes in the magnitude of the directional derivatives. The smaller $\beta$ is, the greater the accuracy in determining an optimal $\lambda$ for a stationary point of $f$ along $p_k$. Condition [18b] can also be interpreted as a lower bound on $\lambda$. In other words, the optimal $\lambda$ should not be too small (see Figure 11).

Figure 11 Line search acceptance conditions.

Typical values of $\alpha$ and $\beta$ in line search algorithms are $\alpha = 10^{-4}$ and $\beta = 0.9$. With this combination (small $\alpha$ and large $\beta$), usually all choices of $\lambda$ that satisfy condition [18a] satisfy condition [18b] as well. The combination selected within a framework of an optimization method is important for balancing the effort expended in each line search phase with the overall progress realized toward a minimum. More accurate line searches, for example, can be formulated with smaller values of $\beta$ (e.g., 0.5 or 0.2). As we will see, this control of accuracy is important for nonlinear conjugate gradient algorithms. In particular, if the computational effort during the line search is low relative to the work required for determining the search direction (e.g., function and gradient versus Hessian calculations), more accurate line searches may be beneficial; however, if the target problems are complex and the initial guess is not close to a minimum, accurate line searches may be unwarranted.
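The two acceptance conditions can be illustrated on a one-dimensional model. The sketch below assumes the curvature condition is applied to the magnitudes of the directional derivatives, as in condition [18b] above; `acceptable_step`, `phi`, and `dphi` are hypothetical names, with `phi(lam)` standing for $f(x_k + \lambda p_k)$ and `dphi(lam)` for its slope.

```python
def acceptable_step(phi, dphi, lam, alpha=1e-4, beta=0.9):
    """Return (decrease_ok, slope_ok) for a trial step lam along the search direction."""
    f0, g0 = phi(0.0), dphi(0.0)
    decrease_ok = phi(lam) <= f0 + alpha * lam * g0  # condition [18a]
    slope_ok = abs(dphi(lam)) <= beta * abs(g0)      # condition [18b]
    return decrease_ok, slope_ok

# Model problem: f(x) = x**2 at x_k = 1 with p_k = -1, so phi(lam) = (1 - lam)**2.
phi = lambda lam: (1.0 - lam) ** 2
dphi = lambda lam: -2.0 * (1.0 - lam)

for lam in (0.01, 1.0, 2.0):
    print(lam, acceptable_step(phi, dphi, lam))
```

In this example the step $\lambda = 1$ passes both tests, $\lambda = 2$ fails the decrease test (too long), and $\lambda = 0.01$ passes the decrease test but fails the slope test, which is the sense in which [18b] acts as a lower bound on the step.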
Convergence Criteria

In the basic descent algorithm [A1], we iterate "until convergence." How exactly do we evaluate the optimality of an approximate minimum $x^*$? Furthermore, how do we ensure that computations will not continue unnecessarily (a) when no further progress can be realized or (b) beyond attainable accuracy? The accuracy depends on machine precision and cumulative roundoff errors, in addition to algorithmic details. The simplest test for optimality of each $x_k$ involves the size of the corresponding gradient. One such test can be formulated as follows:

$$\|g_k\| \le \epsilon_g\, (1 + |f(x_k)|). \qquad [19]$$

The parameter $\epsilon_g$ is a small positive number such as $10^{-8}$. A reasonable choice is around the order of the square root of machine precision, $\epsilon_m$, defined as the smallest number $x$ such that the floating point value of $(1 + x)$ is greater than the floating representation of 1. This precision-dependent (i.e., double versus single) and machine-dependent quantity is approximately the value of the unit roundoff, or $2^{-(t+1)}$ for binary computer arithmetic involving $t$ binary digits (or bits) in the fractional part of the number. For example, for double-precision computations on a DEC VAX, $t = 52$ and $\epsilon_m \approx 10^{-16}$. As computational errors will enter from sources other than finite arithmetic, a suitable $\epsilon_g$ is then a number greater than or equal to $10^{-8}$. The function term is included in the right-hand side of condition [19] to make the convergence test approximately invariant under function scaling. The value 1 is added because the test may be too stringent in the event that the minimum function value is very small.

As problem size increases, the Euclidean norm of the gradient tends to increase also. Therefore for large-scale problems it is more appropriate to check the size of an "average" gradient element by dividing the Euclidean norm by $\sqrt{n}$, for example. Alternatively, the max norm,

$$\|g_k\|_\infty = \max_i |(g_k)_i|, \qquad [20]$$

may be used to replace $\|g_k\|$ or $\|g_k\|/\sqrt{n}$ in the left side of condition [19].

To obtain a measure of progress at each iteration (and possibly halt computations if necessary), a natural test involves the convergence of the sequence of iterates $\{x_k\}$ as well as the corresponding function values $\{f(x_k)\}$. A suitable combination can be formulated as

$$f(x_{k-1}) - f(x_k) < \epsilon_f\, (1 + |f(x_k)|), \qquad [21a]$$

$$\|x_{k-1} - x_k\| < \sqrt{\epsilon_f}\, (1 + \|x_k\|), \qquad [21b]$$

where $\epsilon_f > 0$ is a small number that specifies the desired accuracy in the function value. The square root of $\epsilon_f$ in condition [21b] is suggested by a Taylor expansion analysis.5 In addition to conditions [21a,b], another test involving $\epsilon_f$ may be imposed on the gradient norm to evaluate the optimality of the converging iterate:

$$\|g_k\| < \epsilon_f^{1/3}\, (1 + |f(x_k)|). \qquad [21c]$$

In both [21b] and [21c], a scaled Euclidean or the max norm may be more appropriate for large-scale functions. Test [21c] uses $(\epsilon_f)^{1/3}$ rather than $(\epsilon_f)^{1/2}$, as suggested theoretically, to reduce the severity of the triplet combination.5 Test [19] is useful in case an iterate falls very close to a solution or when no further progress can be made for $f$ and $x$.

Each step of the descent method [A1] may then check conditions [19] and [21a-c]. For $x_0$, only condition [19] is checked. If either the triplet [21a,b,c] or [19] holds, the iteration process may be halted.
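The stopping logic just described might be coded along the following lines. This is a sketch under the assumption that the tests take the scaled forms $\|g_k\| \le \epsilon_g(1 + |f(x_k)|)$ for [19] and the corresponding triplet for [21a-c] with the Euclidean norm; the function names and the default tolerances are illustrative only.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def gradient_test(g, fx, eps_g=1e-8):
    """Condition [19]: scaled gradient norm is small."""
    return norm(g) <= eps_g * (1.0 + abs(fx))

def progress_tests(x_old, x_new, f_old, f_new, g_new, eps_f=1e-10):
    """Conditions [21a], [21b], [21c] taken together."""
    test_a = (f_old - f_new) < eps_f * (1.0 + abs(f_new))
    step = [a - b for a, b in zip(x_old, x_new)]
    test_b = norm(step) < math.sqrt(eps_f) * (1.0 + norm(x_new))
    test_c = norm(g_new) < eps_f ** (1.0 / 3.0) * (1.0 + abs(f_new))
    return test_a and test_b and test_c

def converged(x_old, x_new, f_old, f_new, g_new, eps_g=1e-8, eps_f=1e-10):
    """Halt if either [19] or the triplet [21a-c] holds."""
    return gradient_test(g_new, f_new, eps_g) or progress_tests(
        x_old, x_new, f_old, f_new, g_new, eps_f)

print(converged([1.0], [1.0 + 1e-7], 2.0, 2.0 - 1e-12, [1e-5]))
```

For large $n$, `norm` could be replaced by a scaled norm or the max norm of Eq. [20] without changing the structure of the tests.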
Convergence Characterization

Different descent methods, distinguished by their choice of search directions and implementation details, have varying computational costs and convergence properties. Convergence properties of most minimization algorithms are analyzed through their application to convex quadratic functions. A general multivariate convex quadratic can be written as

$$q(x) = \tfrac{1}{2}\, x^T A x + b^T x + c, \qquad [22]$$

where $A$ is a positive-definite $n \times n$ matrix, $b$ a constant vector, $x$ an arbitrary vector, and $c$ a constant. The constant term may be set to zero, as it does not influence the position of the minimum. The gradient of this function is the vector $Ax + b$, and the Hessian of $q$ is the matrix $A$. Our optimality conditions imply that the minimum $x^*$ of $q(x)$ can be found by setting the gradient equal to zero:

$$A x^* = -b. \qquad [23]$$

As $A$ is positive-definite, this solution is unique, and $x^*$ is the global minimum of $q$. Because general functions may be approximated by a convex quadratic function in the neighborhood of their local minima, the convergence properties obtained for convex quadratic functions are usually applied locally to general functions. This does not, however, in any case guarantee good behavior in practice on complex, large-scale functions.

The convergence properties of an algorithm are described by two analytic quantities: convergence order and convergence ratio. A sequence $\{x_k\}$ is said to converge to $x^*$ if the following holds: $\lim_{k \to \infty} \|x_k - x^*\| = 0$. The sequence is said to converge to $x^*$ with order $p$ if $p$ is the largest nonnegative number for which a finite limit $\beta$ exists, where

$$0 \le \lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^p} = \beta < \infty. \qquad [24]$$

When $p = 1$ and $0 < \beta < 1$, the sequence is said to converge linearly (e.g., $x_k = 2^{-k}$ for $n = 1$); when $p = 1$ and $\beta = 0$, the sequence converges superlinearly (e.g., $x_k = k^{-k}$); and when $p = 2$, the convergence is quadratic (e.g., $x_k = 2^{-2^k}$). Thus, quadratic convergence is more rapid than superlinear, which in turn is faster than linear. The constant $\beta$ is the associated convergence ratio.
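The three example sequences can be examined numerically by forming the ratios that appear in the definition of convergence order. The short sketch below takes $x^* = 0$ for all three sequences; the helper name `error_ratios` is an illustrative choice.

```python
def error_ratios(seq, p):
    """Ratios |x_{k+1}| / |x_k|**p for a scalar sequence converging to x* = 0."""
    return [abs(seq[k + 1]) / abs(seq[k]) ** p for k in range(len(seq) - 1)]

linear = [2.0 ** -k for k in range(1, 9)]         # x_k = 2^(-k)
superlinear = [float(k) ** -k for k in range(1, 9)]  # x_k = k^(-k)
quadratic = [2.0 ** -(2 ** k) for k in range(0, 6)]  # x_k = 2^(-2^k)

print("linear, p = 1:     ", error_ratios(linear, 1))
print("superlinear, p = 1:", error_ratios(superlinear, 1))
print("quadratic, p = 2:  ", error_ratios(quadratic, 2))
```

The first list is constant at 0.5 (linear convergence with ratio 0.5), the second decreases toward zero (superlinear), and the third is constant at 1 when the errors are squared in the denominator (quadratic).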
NONDERIVATIVE METHODS

Minimization methods that incorporate only function values generally involve some systematic method to search the conformational space. In coordinate descent methods, the search directions are the standard basis vectors. A sweep through these $n$ search vectors produces a sequential modification of one function variable at a time. Through repeated sweeping of the $n$-dimensional space, a local minimum might ultimately be found. Unfortunately, this strategy is inefficient and not reliable.3,4

A well known variant of the basic coordinate descent scheme is Powell's method.57 Rather than specifying the search vectors a priori, the standard basis directions are modified as the algorithm progresses. The modification ensures that, when the procedure is applied to a convex quadratic function, $n$ mutually conjugate directions are generated after $n$ sweeps. A set of mutually conjugate directions $\{p_k\}$ with respect to the (positive-definite) Hessian $A$ of such a convex quadratic is defined by $p_i^T A p_j = 0$ for all $i \ne j$. This set possesses the important property that a successive search along each of these directions suffices to find the minimum solution.3,4 Powell's method thus guarantees that in exact arithmetic (i.e., in absence of roundoff error), the minimum of a convex quadratic function will be found after $n$ sweeps.

Nonderivative minimization methods are generally easy to implement and avoid derivative computations, but their realized convergence properties are rather poor. They may work well in special cases when the function is quite random in character or the variables are essentially uncorrelated. In general, the computational cost, dominated by the number of function evaluations, can be excessively high for functions of many variables and can far outweigh the benefit of avoiding derivative calculations. If obtaining the analytic derivatives is out of the question, viable alternatives remain. The gradient can be approximated by finite differences of function values, such as

$$g_i(x) \approx \frac{f(x + h_i e_i) - f(x)}{h_i}, \quad i = 1, \ldots, n, \qquad [25]$$

where $e_i$ denotes the $i$th standard basis vector, for suitably chosen intervals $\{h_i\}$.533 Alternatively, automatic differentiation, essentially a new algebraic construct,59-61 may be used. In any case, these calculated derivatives may then be used in a gradient or quasi-Newton method (described later). Such alternatives will generally provide significant improvement in computational cost and reliability.

Despite these drawbacks of nonderivative methods, their ease of application has made them the choice for several potential energy applications.27,62
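A forward-difference approximation of the gradient takes only a few lines. In the sketch below the default interval, the square root of machine precision scaled by $1 + |x_i|$, is a common heuristic adopted here as an assumption rather than a prescription from the text; `fd_gradient` is a hypothetical name, and the test function is the one shown in Figure 4a.

```python
import math
import sys

def fd_gradient(f, x, h=None):
    """Forward-difference gradient estimate: g_i ~ (f(x + h_i*e_i) - f(x)) / h_i."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        # Assumed default interval: sqrt(machine epsilon), scaled to the variable size.
        hi = h[i] if h is not None else math.sqrt(sys.float_info.epsilon) * (1.0 + abs(x[i]))
        shifted = list(x)
        shifted[i] += hi
        grad.append((f(shifted) - fx) / hi)
    return grad

def f(x):
    return math.exp(x[0]) * (4 * x[0] ** 2 + 4 * x[0] * x[1] + 2 * x[1] ** 2)

def analytic_gradient(x):
    e = math.exp(x[0])
    q = 4 * x[0] ** 2 + 4 * x[0] * x[1] + 2 * x[1] ** 2
    return [e * (q + 8 * x[0] + 4 * x[1]), e * (4 * x[0] + 4 * x[1])]

point = [1.0, -0.5]
print(fd_gradient(f, point))
print(analytic_gradient(point))
```

Each gradient estimate costs $n$ extra function evaluations, and its accuracy is limited by the competing truncation and roundoff errors, which is why the choice of intervals matters.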
GRADIENT METHODS

Methods that use analytic-derivative information clearly possess more information about the smooth objective function. Gradient methods can use the slope of a function, for example, as the direction of movement toward extremum points. Second derivative methods can also incorporate curvature information from the Hessian to find the regions where the function is convex. Two common gradient methods are steepest descent (SD)3-5,63 and conjugate gradient (CG).14,64-67
Steepest Descent

Steepest descent is one of the oldest and simplest methods. It is actually more important as a theoretical, rather than practical, reference by which to test other methods; however, "steepest descent" steps are often incorporated into other methods (e.g., conjugate gradient, Newton) when roundoff destroys some desirable theoretical properties, progress is slow, or regions of indefinite curvature are encountered.

At each iteration of SD, the search direction is taken as $-g_k$, the negative gradient of the objective function at $x_k$. Recall that a descent direction $p_k$ satisfies $g_k^T p_k < 0$. The simplest way to guarantee the negativity of this inner product is to choose $p_k = -g_k$. Up to normalization, this choice also minimizes the inner product $g_k^T p$ over unit-length vectors $p$ and thus gives rise to the name steepest descent.

Steepest descent is simple to implement and requires modest storage, O(n); however, progress toward a minimum may be very slow, especially near a solution. The convergence rate of SD when applied to a convex quadratic function, as in Eq. [22], is only linear. The associated convergence ratio is no greater than $[(\kappa - 1)/(\kappa + 1)]^2$, where $\kappa$, the condition number, is the ratio of the largest to smallest eigenvalues of A:

$$\kappa = \lambda_{\max}(A)/\lambda_{\min}(A). \qquad [26]$$

As the convergence ratio measures the reduction of the error at every step ($\|x_{k+1} - x^*\| \le \beta \|x_k - x^*\|$ for a linear rate), the relevant SD value can be arbitrarily close to 1 when $\kappa$ is large (Figure 12). In other words, because the n lengths of the elliptical axes belonging to the contours of the function are proportional to the inverse square roots of the eigenvalues, the convergence rate of SD is slowed as the contours of the objective function become more eccentric. Thus, the SD search vectors may in some cases exhibit very inefficient paths toward a solution (see final section for a numerical example).
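The effect of the condition number can be seen in a small numerical experiment. The sketch below is an illustration under stated assumptions, not code from the chapter: it applies SD with an exact line search to a two-variable convex quadratic with $\kappa = 100$ and compares the per-step reduction of the function-value error with the ratio $[(\kappa - 1)/(\kappa + 1)]^2$. The matrix, the starting point (chosen as the classic worst case), and the function name are assumptions.

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, iters):
    """SD with exact line search on q(x) = 0.5 x^T A x + b^T x.

    Returns the history of q(x_k) - q(x*), the function-value error.
    """
    x_star = np.linalg.solve(A, -b)            # exact minimizer, for reference only
    q = lambda v: 0.5 * v @ A @ v + b @ v
    x = np.asarray(x0, dtype=float)
    errors = [q(x) - q(x_star)]
    for _ in range(iters):
        g = A @ x + b                          # gradient of the quadratic
        if g @ g == 0.0:
            break
        lam = (g @ g) / (g @ A @ g)            # exact minimizer along -g
        x = x - lam * g
        errors.append(q(x) - q(x_star))
    return np.array(errors)

kappa = 100.0
A = np.diag([1.0, kappa])                      # eigenvalues 1 and kappa
b = np.zeros(2)
errors = steepest_descent_quadratic(A, b, [kappa, 1.0], 20)
ratios = errors[1:] / errors[:-1]
bound = ((kappa - 1.0) / (kappa + 1.0)) ** 2
print("observed ratio:", ratios[0], " bound:", bound)
```

For this starting point the observed ratio sits essentially at the bound, so each SD step removes only about 4% of the remaining error.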
Conjugate Gradient

The CG method was originally designed to minimize convex quadratic functions. Through several variations, it has also been extended to the general case.66-72
Figure 12 Steepest descent and conjugate gradient quantities that affect the convergence rate for quadratic functions (see text for the distinct context of these functions).
The first iteration in a CG method is the same as in SD, with a step along the current negative gradient vector. Successive directions are constructed differently so that they form a set of mutually conjugate vectors with respect to the (positive-definite) Hessian A of a general convex quadratic function. Whereas the rate of convergence for SD depends on the ratio of the extremal eigenvalues of A, the convergence properties of CG depend on the entire matrix spectrum. Faster convergence is expected when the eigenvalues are clustered. In exact arithmetic, convergence is obtained in at most n steps. In particular, if A has m distinct eigenvalues, convergence to a solution requires m iterations. One way to see how convergence depends on the entire spectrum of A is to consider the following bound on convergence, where the size of $x_k - x^*$ is measured with respect to the A-norm, defined as

$$\|x\|_A = (x^T A x)^{1/2}.$$

The bound is given by3,14

$$\|x_k - x^*\|_A \le 2\,\|x_0 - x^*\|_A \left[\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right]^k.$$
Clearly rapid convergence is expected when $\kappa \approx 1$, as for SD (see Figure 12). Further estimates of convergence bounds are more difficult but can be derived when certain properties about the eigenvalue distribution are known (e.g., m large eigenvalues and n - m small eigenvalues clustered in a region [a,b]).3,14 The general notion of convergence for CG methods has been a large area of research3,14,70-77 for both the convex quadratic and the nonlinear extension cases.

When one refers to the CG method, one often means the linear conjugate gradient, that is, the implementation for the convex quadratic form. In this case, minimizing $\tfrac{1}{2}x^T A x + b^T x$ is equivalent to solving the linear system $Ax = -b$. Consequently, the conjugate directions $p_k$, as well as the step lengths $\lambda_k$, can be computed in closed form. In describing the steps of a CG method to solve $Ax = -b$, the residual vector $Ax + b$ is useful. We define $r = -(Ax + b)$ and use the vectors $\{d_k\}$ below to denote the CG search vectors (for reasons that will become clear in the Newton Methods section). The solution $x^*$ can then be obtained by the following procedure, once a starting point $x_0$ is specified.78,79

Algorithm [A2]: CG Method to Solve Ax = -b

1. Set $r_0 = -(Ax_0 + b)$, $d_0 = r_0$.
2. For k = 0,1,2, . . . , until r is sufficiently small, compute

   $\lambda_k = r_k^T r_k / d_k^T A d_k$
   $x_{k+1} = x_k + \lambda_k d_k$
   $r_{k+1} = r_k - \lambda_k A d_k$
   $\beta_k = r_{k+1}^T r_{k+1} / r_k^T r_k$
   $d_{k+1} = r_{k+1} + \beta_k d_k$

Note here that only a few vectors are stored; the product Ad, but not knowledge (or storage) of A per se, is required; and the cost involves only several scalar and vector operations. The value of the step length $\lambda_k$ can be derived in the closed form above by minimizing $q(x_k + \lambda d_k)$ as a function of $\lambda$ (producing $\lambda_k = r_k^T d_k / d_k^T A d_k$) and then using the conjugacy relation $r_k^T d_j = 0$ for all j < k.3
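The steps of Algorithm [A2] translate almost line for line into code. The following Python sketch is one possible rendering, not the chapter's implementation: the function name, the absolute residual tolerance, and the iteration cap are assumptions. The matrix enters only through a user-supplied product `matvec(v)`, reflecting the remark that A itself need not be stored.

```python
import numpy as np

def conjugate_gradient(matvec, b, x0, tol=1e-10, max_iter=None):
    """Linear CG for A x = -b, with A symmetric positive-definite.

    matvec(v) must return A @ v. Returns (x, number of iterations used).
    """
    x = np.asarray(x0, dtype=float).copy()
    r = -(matvec(x) + b)                  # step 1: r_0 = -(A x_0 + b)
    d = r.copy()                          #         d_0 = r_0
    rr = r @ r
    if max_iter is None:
        max_iter = 10 * x.size            # assumed safety cap for roundoff
    for k in range(max_iter):
        if np.sqrt(rr) <= tol:            # residual sufficiently small
            return x, k
        Ad = matvec(d)
        lam = rr / (d @ Ad)               # lambda_k = r_k^T r_k / d_k^T A d_k
        x = x + lam * d
        r = r - lam * Ad
        rr_new = r @ r
        beta = rr_new / rr                # beta_k = r_{k+1}^T r_{k+1} / r_k^T r_k
        d = r + beta * d
        rr = rr_new
    return x, max_iter

# Illustrative system: n = 6 but only three distinct eigenvalues (1, 4, 9).
A = np.diag([1.0, 1.0, 4.0, 4.0, 9.0, 9.0])
b = np.ones(6)
x, iters = conjugate_gradient(lambda v: A @ v, b, np.zeros(6))
print("iterations:", iters)
print("residual norm:", np.linalg.norm(A @ x + b))
```

Because the test matrix has three distinct eigenvalues, the run illustrates the statement that the iteration count is governed by the number of distinct eigenvalues rather than by n.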
Preconditioning

Performance of the CG method is generally very sensitive to roundoff in the computations, which may destroy the mutual conjugacy property. The method was actually neglected for many years until it was realized that a preconditioning technique can be used to accelerate convergence significantly.14,80,81
Preconditioning involves modification of the target linear system $Ax = -b$ through application of a positive-definite preconditioner M that is closely related to A. The modified system can be written as

$$M^{-1/2} A M^{-1/2}\,(M^{1/2} x) = -M^{-1/2} b.$$

The new coefficient matrix $M^{-1/2} A M^{-1/2}$ is symmetric and has the same eigenvalues as $M^{-1}A$. Preconditioning aims to produce a more clustered eigenvalue structure for $M^{-1}A$ and/or a lower condition number than for A to improve the relevant convergence ratio; however, preconditioning also adds to the computational effort by requiring that a linear system involving M (namely, $Mz = r$) be solved at every step. Thus, it is essential for efficiency of the method that M be factored very rapidly in relation to the original A. This can be achieved, for example, if M is a sparse component of the dense A. Whereas the solution of an n x n dense linear system requires order of $n^3$ operations, the work for sparse systems can be as low as order n.13,14

The Hessians of potential energy functions, for example, separate naturally into local terms (among atom pairs involved in bonds, bond angles, and dihedral angles) and nonlocal terms (among nonbonded atom pairs). The number of local terms increases linearly with n, whereas the nonlocal terms increase as $n^2$. Thus, a preconditioner from the local terms is a good choice that has performed well in practice.23,82

The recurrence relations for the preconditioned conjugate gradient (PCG) method can be derived from Algorithm [A2] after substituting $x_k = M^{-1/2}\tilde{x}_k$ and $r_k = M^{1/2}\tilde{r}_k$. New search vectors $d_k = M^{-1/2}\tilde{d}_k$ can be used to derive the iteration process, and then the tilde modifiers dropped. The PCG method becomes the following iterative process.

Algorithm [A3]: PCG Method to Solve Ax = -b

1. Set $r_0 = -(Ax_0 + b)$, $d_0 = M^{-1} r_0$.
2. For k = 0,1,2, . . . , until r is sufficiently small, compute

   $\lambda_k = r_k^T z_k / d_k^T A d_k$, where $M z_k = r_k$
   $x_{k+1} = x_k + \lambda_k d_k$
   $r_{k+1} = r_k - \lambda_k A d_k$
   $\beta_k = r_{k+1}^T z_{k+1} / r_k^T z_k$, where $M z_{k+1} = r_{k+1}$
   $d_{k+1} = z_{k+1} + \beta_k d_k$

Note above that the system $M z_k = r_k$ must be solved repeatedly for $z_k$ and that the matrix/vector products $A d_k$ are required as before.
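A minimal Python sketch of Algorithm [A3] follows. It is an illustration only: the function name, the callable `solve_M` (which returns the solution z of Mz = r), the test matrix, and the choice of a diagonal preconditioner are all assumptions. The dense matrix here is a diagonal part plus a weak all-pairs coupling, loosely mimicking the local/nonlocal split described above, with M taken as the easily inverted diagonal part.

```python
import numpy as np

def preconditioned_cg(matvec, solve_M, b, x0, tol=1e-10, max_iter=None):
    """PCG for A x = -b; solve_M(r) must return z with M z = r."""
    x = np.asarray(x0, dtype=float).copy()
    r = -(matvec(x) + b)
    z = solve_M(r)
    d = z.copy()                          # d_0 = M^{-1} r_0
    rz = r @ z
    if max_iter is None:
        max_iter = 10 * x.size            # assumed safety cap
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol:
            return x, k
        Ad = matvec(d)
        lam = rz / (d @ Ad)               # lambda_k = r_k^T z_k / d_k^T A d_k
        x = x + lam * d
        r = r - lam * Ad
        z = solve_M(r)                    # one solve with M per iteration
        rz_new = r @ z
        d = z + (rz_new / rz) * d         # beta_k = r_{k+1}^T z_{k+1} / r_k^T z_k
        rz = rz_new
    return x, max_iter

n = 50
A = np.diag(np.arange(1.0, n + 1)) + 0.1 * np.ones((n, n))   # hypothetical SPD matrix
b = np.ones(n)
M_diag = np.diag(A).copy()                                   # diagonal preconditioner

x_pcg, it_pcg = preconditioned_cg(lambda v: A @ v, lambda r: r / M_diag, b, np.zeros(n))
x_cg, it_cg = preconditioned_cg(lambda v: A @ v, lambda r: r.copy(), b, np.zeros(n))
print("PCG iterations:", it_pcg, " unpreconditioned:", it_cg)
```

Passing the identity as `solve_M` recovers the unpreconditioned method, which makes it easy to compare iteration counts for a given choice of M.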
Nonlinear Conjugate Gradient

Extensions of the linear CG method to nonquadratic problems have been developed and extensively researched.66-78 In the several existing variants, the basic idea is to avoid matrix operations altogether and simply express the search directions recursively as

$$d_k = -g_k + \beta_k d_{k-1}$$

for k = 1,2, . . . , with $d_0 = -g_0$. The new iterates for the minimum point can then be set to

$$x_{k+1} = x_k + \lambda_k d_k,$$

where $\lambda_k$ is the step length. In comparing this iterative procedure with the linear CG of Algorithm [A2], we note that $r_k = -g_k$ for a quadratic function. The aforementioned parameter $\beta_k$ is chosen so that if f is a convex quadratic and $\lambda_k$ is the exact one-dimensional minimizer of f along $d_k$, the nonlinear CG reduces to the linear CG method and terminates in at most n steps (in exact arithmetic). Three of the best known settings for $\beta_k$ are titled the Fletcher-Reeves (FR), Polak-Ribière (PR), and Hestenes-Stiefel (HS) formulas.66-71,77,78 They are given by the formulas

$$\beta_k^{FR} = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}, \qquad \beta_k^{PR} = \frac{g_k^T (g_k - g_{k-1})}{g_{k-1}^T g_{k-1}}, \qquad \beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})}.$$

The last two formulas are generally preferred in practice, though the first has better theoretical global convergence properties. In fact, very recent research has focused on combining these practical and theoretical properties for construction of more efficient schemes.77,78 The simple modification of

$$\beta_k = \max\{\beta_k^{PR},\, 0\},$$

for example, can be used to prove global convergence of this nonlinear CG method, even with inexact line searches.77 A more general condition on $\beta_k$, including relaxation of its nonnegativity, has also been derived.78

The quality of line search in these nonlinear CG algorithms is crucial. (Typically, line searches are used rather than the trust region methods.) Adjustments must be made not only to preserve the mutual conjugacy of the search directions (a property critical for finite termination of the method) but also to ensure that each generated direction is one of descent. A technique known as restarting is typically used to preserve a linear convergence rate by resetting $d_k$ to the steepest descent direction, $-g_k$, after a given number of line searches (e.g., n).68 A restart vector may also be formulated as a linear combination of the steepest descent direction and some other vector. This generally results in a method that requires 2n to 5n iterations to achieve reasonable accuracy,70,83 though in some cases progress may be very slow. As for the convex quadratic version, the number of iterations is significantly reduced if the Hessian has a clustered eigenvalue structure. Thus, preconditioning may also be used as in the linear case. Often, however, if additional derivative information and memory are available, Newton methods are preferred.

In sum, the greatest virtues of CG methods are their modest storage and computational requirements (both order n) and their better convergence than the SD method. These properties have made them popular linear solvers and minimization choices in many applications18-20,84-8* and perhaps the only candidates for very large problems. The linear CG is often applied to systems arising from discretizations of partial differential equations,81,89,90 where the matrices are frequently positive-definite, sparse, and structured.
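The statement that the nonlinear CG variants reduce to linear CG on a convex quadratic with exact line searches can be checked numerically. The sketch below is a simplified illustration under that assumption (exact step lengths on a quadratic, no restarts, no general line search); the helper names and the test matrix are hypothetical. It implements the FR, PR, and HS choices of $\beta_k$, together with the nonnegative PR variant, and confirms termination in about n steps.

```python
import numpy as np

def beta_fr(g_new, g_old, d_old):
    return (g_new @ g_new) / (g_old @ g_old)

def beta_pr(g_new, g_old, d_old):
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

def beta_hs(g_new, g_old, d_old):
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)

def beta_pr_plus(g_new, g_old, d_old):
    # Nonnegative variant: max{beta_PR, 0}
    return max(beta_pr(g_new, g_old, d_old), 0.0)

def nonlinear_cg_on_quadratic(A, b, x0, beta_rule, tol=1e-10):
    """d_k = -g_k + beta_k d_{k-1}, with the exact step length for a quadratic."""
    x = np.asarray(x0, dtype=float).copy()
    g = A @ x + b                          # gradient of 0.5 x^T A x + b^T x
    d = -g                                 # d_0 = -g_0
    for k in range(1, 10 * x.size + 1):
        lam = -(g @ d) / (d @ A @ d)       # exact one-dimensional minimizer
        x = x + lam * d
        g_new = A @ x + b
        if np.linalg.norm(g_new) <= tol:
            return x, k
        d = -g_new + beta_rule(g_new, g, d) * d
        g = g_new
    return x, k

n = 5
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.5 * np.ones((n, n))   # hypothetical SPD Hessian
b = np.arange(1.0, n + 1)
results = {rule.__name__: nonlinear_cg_on_quadratic(A, b, np.zeros(n), rule)
           for rule in (beta_fr, beta_pr, beta_hs, beta_pr_plus)}
for name, (x, steps) in results.items():
    print(name, "steps:", steps, "gradient norm:", np.linalg.norm(A @ x + b))
```

On a general nonquadratic function the three formulas no longer coincide, and the exact step would be replaced by a line search with restarts as described above.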
NEWTON METHODS

Overview

Newton methods are the prototype of second derivative algorithms. (Raphson's name is often omitted when referring to this general class of methods, though both Newton and Raphson deserve credit for the quadratically convergent method for finding a root of an equation, and its variants.) Several classes of methods exist, including discrete Newton, quasi-Newton (QN) (also termed variable metric in the older literature), and truncated Newton (TN). Historically, the $O(n^2)$ memory requirements and $O(n^3)$ computation associated with solving a linear system directly have restricted Newton methods only (1) to small problems, (2) to problems with special sparsity patterns, or (3) near a solution, after a gradient method has been applied. Fortunately, advances in computing technology and software are making the Newton approach feasible for a wide range of problems. Indeed, effective strategies have been tailored to available storage and computation, exhibiting good performance in theory and practice. This trend undoubtedly will intensify. Very good detailed treatments of Newton methods can be found in the literature,3-6,52,54 and only general concepts are outlined here.

Two specific classes are emerging as the most powerful techniques for large-scale applications: limited-memory quasi-Newton (LMQN) and truncated Newton methods. LMQN methods attempt to combine the modest storage and computational requirements of CG methods with the superlinear convergence properties of standard (i.e., full memory) QN methods. Similarly, TN algorithms attempt to retain the rapid quadratic convergence rate of classic Newton methods while making computational requirements feasible for large-scale functions. With advances in automatic differentiation, the appeal of these methods will undoubtedly increase even further, as the reduction in the cost of evaluating Hessian/vector products and the preconditioner makes them very efficient.61

All Newton methods are based on approximating the objective function locally by a quadratic model and then minimizing that function approximately. The quadratic model of the objective function f at $x_k$ along p is given by the expansion

$$f(x_k + p) \approx f(x_k) + g_k^T p + \tfrac{1}{2}\, p^T H_k\, p.$$

The minimum of the right-hand side is achieved when $p_k$ is the minimum of the quadratic function:

$$q_k(p) = g_k^T p + \tfrac{1}{2}\, p^T H_k\, p.$$

Alternatively, such a Newton direction $p_k$ satisfies the linear system of n simultaneous equations, known as the Newton equation:

$$H_k\, p = -g_k.$$

In the "classic" Newton method, the Newton direction is used to update each previous iterate by the formula $x_{k+1} = x_k + p_k$, until convergence. The reader may recognize the one-dimensional version of Newton's method for solving a nonlinear equation $f(x) = 0$: $x_{k+1} = x_k - f(x_k)/f'(x_k)$. The analogous iteration process for minimizing $f(x)$ is $x_{k+1} = x_k - f'(x_k)/f''(x_k)$. Note that the one-dimensional search vector, $-f'(x_k)/f''(x_k)$, is replaced by the Newton direction $-H_k^{-1} g_k$ in the multivariate case. This direction is defined for nonsingular $H_k$. When $x_0$ is sufficiently close to a solution $x^*$, quadratic convergence can be proven for Newton's method.3-6 That is, a constant $\beta$ exists such that

$$\|x_{k+1} - x^*\| \le \beta\, \|x_k - x^*\|^2.$$

In practice, this means that the number of digits of accuracy in the solution is approximately doubled at every step. This can be seen from the program output for a simple one-dimensional application of Newton's method to finding the square root of a (equivalently, solving $f(x) = x^2 - a = 0$ or minimizing $f(x) = x^3/3 - ax$) (Figure 13). Unfortunately, there is a disparity between this theoretical convergence result and the practical behavior of the method in general. Thus, modifications of the classic Newton iteration are essential for guaranteeing global convergence, with quadratic convergence rate near the solution.
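The digit-doubling behavior is easy to reproduce. The short Python sketch below (a = 9 and the starting value 6 are taken from the program output of Figure 13; the function name is an assumption) applies the one-dimensional iteration $x_{k+1} = (x_k^2 + a)/(2x_k)$, which is Newton's method for $x^2 - a = 0$, and prints the relative error at each step.

```python
def newton_sqrt(a, x0, steps):
    """Newton's iteration for f(x) = x*x - a = 0: x_{k+1} = (x_k^2 + a) / (2 x_k)."""
    xs = [float(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x * x + a) / (2.0 * x))
    return xs

a, s = 9.0, 3.0                            # s is the exact solution, sqrt(a)
iterates = newton_sqrt(a, 6.0, 7)
rel_err = [abs(x - s) / s for x in iterates]
for x, e in zip(iterates, rel_err):
    print(f"{x:.15f}  {e:.16e}")
```

In double precision the relative error drops roughly from 1e-1 to 1e-2, 1e-4, 1e-8, and then to roundoff level, which is the quadratic convergence described in the text.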
Newton's Iteration for solving f(x) = x*x - a = 0:

    x(k+1) = ( x(k)*x(k) + a ) / ( 2. * x(k) )

Single Precision (x = iterate, s = solution)

    x            |x-s|/s
    6.000000     1.0000000E+00
    3.750000     2.5000000E-01
    3.075000     2.5000015E-02
    3.000915     3.0485788E-04
    3.000000     7.9472862E-08
    3.000000     0.0000000E+00
    3.000000     0.0000000E+00
    3.000000     0.0000000E+00

Double Precision (x = iterate, s = solution)

    x                    |x-s|/s
    6.000000000000000    1.0000000000000000E+00
    3.750000000000000    2.5000000000000000E-01
    3.075000000000000    2.5000000000000004E-02
    3.000914634146342    3.0487804878050658E-04
    3.000000139383442    4.6461147336825566E-08
    3.000000000000003    1.0732155904709847E-15
    3.000000000000000    1.8503717077085942E-17
    3.000000000000000    0.0000000000000000E+00
Figure 13 One-dimensional application of Newton's method for computing the square root of a number (computer output shown). Note in the double-precision version the roundoff in the last steps.

First, when $H_k$ is not positive-definite, the search direction may not exist or may not be a descent direction. Strategies to produce a related positive-definite matrix $\bar{H}_k$, or alternative search directions, become necessary. Second, far away from $x^*$, the quadratic approximation of expression [34] may be poor, and the Newton direction must be adjusted. A line search, for example, can dampen (scale) the Newton direction when it exists, ensuring sufficient decrease and guaranteeing uniform progress toward a solution. These adjustments lead to the following modified Newton framework (using a line search).

Algorithm [A4]: Modified Newton

For k = 0,1,2, . . . , until convergence, given $x_0$,

1. Test $x_k$ for convergence.
2. Compute a descent direction $p_k$ so that

   $\|\bar{H}_k p_k + g_k\| \le \eta_k \|g_k\|$,

   where $\eta_k$ controls the accuracy of the solution and some symmetric matrix $\bar{H}_k$ may approximate $H_k$.
3. Compute a step length $\lambda$ so that for $x_{k+1} = x_k + \lambda p_k$,

   $f(x_{k+1}) \le f(x_k) + \alpha \lambda\, g_k^T p_k$,
   $|g_{k+1}^T p_k| \le \beta\, |g_k^T p_k|$,

   with $0 < \alpha < \beta < 1$.
4. Set $x_{k+1} = x_k + \lambda p_k$.
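One simple instance of this framework can be sketched in Python as follows. It is a simplified illustration under stated assumptions, not the chapter's algorithm in full: the Hessian is made positive-definite by adding a multiple of the identity until a Cholesky factorization succeeds (one of many possible strategies), the Newton equation is solved exactly (so $\eta_k = 0$), and the line search is plain backtracking that enforces only the sufficient-decrease condition, omitting the curvature condition. The function names, tolerances, and starting point are assumptions; the two-variable Rosenbrock function serves as the test problem.

```python
import numpy as np

def rosen(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

def rosen_grad(x):
    g2 = 200.0 * (x[1] - x[0] ** 2)
    return np.array([-2.0 * (x[0] * g2 + (1.0 - x[0])), g2])

def rosen_hess(x):
    h11 = 2.0 - 400.0 * (x[1] - x[0] ** 2) + 800.0 * x[0] ** 2
    return np.array([[h11, -400.0 * x[0]], [-400.0 * x[0], 200.0]])

def modified_newton(f, grad, hess, x0, gtol=1e-8, alpha=1e-4, max_iter=200):
    """Newton iteration with a Hessian shift and a backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= gtol:              # step 1: convergence test
            return x, k
        H, tau = hess(x), 0.0
        while True:                                # step 2: positive-definite H_bar
            try:
                L = np.linalg.cholesky(H + tau * np.eye(x.size))
                break
            except np.linalg.LinAlgError:
                tau = max(2.0 * tau, 1e-3)         # assumed shift schedule
        p = -np.linalg.solve(L.T, np.linalg.solve(L, g))   # solves H_bar p = -g
        lam, fx = 1.0, f(x)
        # step 3 (sufficient decrease only): halve lambda until the test holds
        while f(x + lam * p) > fx + alpha * lam * (g @ p) and lam > 1e-12:
            lam *= 0.5
        x = x + lam * p                            # step 4
    return x, max_iter

x_min, n_iter = modified_newton(rosen, rosen_grad, rosen_hess, [-1.2, 1.0])
print("minimizer:", x_min, " iterations:", n_iter)
```

Because the shifted matrix is positive-definite, the computed direction is always a descent direction, so the backtracking loop terminates; near the solution the unit step is accepted and the fast local convergence of Newton's method is recovered.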
Newton variants are constructed by combining various strategies for the individual components above. These involve procedures for formulating $H_k$ or $\bar{H}_k$, dealing with structures of indefinite Hessians, and solving for the modified Newton search direction. For example, when $H_k$ is approximated by finite differences, the discrete Newton subclass emerges.5,91-94 When $H_k$, or its inverse, is approximated by some modification of the previously constructed matrix (see later), QN methods are formed.95-110 When $\eta_k$ is nonzero, TN methods result,111-123 because the solution of the Newton system is truncated before completion.
Discrete Newton

Standard discrete Newton methods require n gradient evaluations and $O(n^2)$ operations to compute and symmetrize every Hessian $\bar{H}_k$. Each column i of $\bar{H}_k$ can be approximated by the vector

$$\frac{1}{h_i}\,[\,g(x_k + h_i e_i) - g(x_k)\,],$$

where $h_i$ is a suitably chosen number.5,58 This value must balance the roundoff error, proportional to $(1/h_i)$, by formulation, with the truncation error, proportional to $h_i$. A simple estimate for a well-scaled problem to balance the two errors is O(141 (some of which involve unusually difficult characteristics of scaling, structure, etc., that are rarely encountered in practice) and a few real-life problems suggest that the LM-BFGS and TN methods are preferable overall for reliability and efficiency.10'~1*2~109~1~0 For highly nonlinear problems, the LM-BFGS method generally requires fewer function evaluations (the time-dominating operation) than CONMIN and performs better than the tested TN algorithm of Nash. The TN method was found to perform better than LM-BFGS for functions that are nearly quadratic. A reasonable explanation for this is that the approximate Newton step is a good search direction for such problem structure. In contrast, both Nash's algorithm and TNPACK were found to perform better on very large-scale problems in meteorology (15,000 variables), where the objective function is quasi-quadratic.110 Problem-tailored preconditioning improves performance of both LM-BFGS and TN methods substantially,R09'22 but this remains to be tested systematically for large-scale problems.

Several computational examples are shown in Figures 14 and 15 and Tables 2 to 6. Three minimization codes are examined: CONMIN (nonlinear CG and full-memory BFGS), LM-BFGS, and TNPACK (TN method). Preconditioning options, the number of stored updates (for LM-BFGS), starting points, and problem dimensions are varied for analysis.
Numerical Example I: Rosenbrock Minimization

Rosenbrock's function is often used as a minimization test problem, because its minimum lies at the base of a "banana-shaped valley" and can be difficult to locate. This function is defined for even integers n as the sum

$$f(x) = \sum_{j=1,3,5,\ldots,n-1} \left[(1 - x_j)^2 + 100\,(x_{j+1} - x_j^2)^2\right]. \qquad [63]$$

The contour plot of Rosenbrock's function for n = 2 is shown in Figure 14. The minimum point is (1,1), where f(x) = 0. The gradient components of this function are given by

$$g_{j+1} = 200\,(x_{j+1} - x_j^2), \qquad g_j = -2\,[\,x_j\, g_{j+1} + (1 - x_j)\,], \qquad j = 1,3,5,\ldots,n-1, \qquad [64]$$

and the Hessian is the 2 x 2 block diagonal matrix with entries

$$H_{j+1,j+1} = 200, \qquad H_{j+1,j} = -400\,x_j, \qquad H_{j,j} = -2\,(x_j H_{j+1,j} + g_{j+1} - 1), \qquad j = 1,3,5,\ldots,n-1. \qquad [65]$$
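The formulas [63] to [65] can be coded directly in the recursive form in which they are written. The Python sketch below is an illustration with hypothetical function names; it uses zero-based indexing, so the odd-numbered variables $x_j$ of the text sit at even array positions, and the symmetric entry $H_{j,j+1} = H_{j+1,j}$ is filled in explicitly. As a check, it evaluates the Hessian eigenvalues at the minimum for n = 2 and the function value at the starting point listed in the program output of Figure 15.

```python
import numpy as np

def rosenbrock(x):
    xo, xe = x[0::2], x[1::2]                  # x_j (j odd) and x_{j+1}
    return np.sum((1.0 - xo) ** 2 + 100.0 * (xe - xo ** 2) ** 2)

def rosenbrock_grad(x):
    xo, xe = x[0::2], x[1::2]
    g = np.empty_like(x)
    g[1::2] = 200.0 * (xe - xo ** 2)                  # g_{j+1}
    g[0::2] = -2.0 * (xo * g[1::2] + (1.0 - xo))      # g_j reuses g_{j+1}
    return g

def rosenbrock_hess(x):
    xo, xe = x[0::2], x[1::2]
    n = x.size
    H = np.zeros((n, n))
    idx = np.arange(0, n, 2)
    g_next = 200.0 * (xe - xo ** 2)                   # g_{j+1}
    off = -400.0 * xo                                 # H_{j+1,j}
    H[idx + 1, idx + 1] = 200.0
    H[idx + 1, idx] = H[idx, idx + 1] = off
    H[idx, idx] = -2.0 * (xo * off + g_next - 1.0)    # H_{j,j} reuses both
    return H

eigs = np.linalg.eigvalsh(rosenbrock_hess(np.ones(2)))
print("eigenvalues at the minimum:", eigs)
print("condition number:", eigs[-1] / eigs[0])

x_start = np.array([-1.25403023, 1.05403023])         # start listed in Figure 15
print("f at start:", rosenbrock(x_start))
print("gradient norm at start:", np.linalg.norm(rosenbrock_grad(x_start)))
```

Reusing $g_{j+1}$ and $H_{j+1,j}$ in the expressions for $g_j$ and $H_{j,j}$ saves a few multiplications per pair of variables, which is the sense in which the formulas are arranged for programming.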
Figure 14 Minimization paths by different methods for the two-dimensional Rosenbrock function $f(x_1,x_2) = (1 - x_1)^2 + 100(x_2 - x_1^2)^2$: (a) truncated Newton, (b) conjugate gradient, (c) BFGS quasi-Newton, (d) limited-memory BFGS, (e) steepest descent, first 150 iterations. See program output in Figure 15.
(Note that these formulas are given in a form most efficient for programming.) For n = 2, the two eigenvalues of the Hessian at the minimum are $\lambda_1 = 1001.6$ and $\lambda_2 = 0.4$ (thus $\kappa = 2.5 \times 10^3$), so the contours are quite elongated near the minimum. Table 2 shows results for minimization in the two-dimensional case. We use this example to display the various minimization paths on the function's
contour plots (Figure 14). TN uses a diagonal preconditioner, and, unless otherwise stated, LM-BFGS uses five stored updates. Progress is reported by the number of iterations (for TN, both outer and inner), the number of function and gradient evaluations (NFG), CPU time, and final gradient norm and function values. TN uses the convergence criteria described in [19] and [21a-c] with $\epsilon_f = \epsilon_g = 10^{-8}$, whereas the other methods use condition [19] with $\epsilon_g = 10^{-8}$.

We note from Table 2 that performance of all methods is very similar in terms of iteration and function cost, accuracy, and time. All are equally satisfactory. In TN, the residual norm is essentially zero after two PCG iterations per Newton step (as expected because n = 2), so the behavior of the method is essentially that of a nontruncated Newton method (criterion RT defined in [54] and [55] was used with $c_r = 0.5$). For the same reason (low dimensionality), the full and limited-memory QN methods are almost identical in performance (the number of updates in LM-BFGS exceeds n). The paths in the Newton methods are more systematic, following down the valley; the CG path is more arbitrary. The large step between the fourth and fifth iterate of CG results from a distant minimum detected in the line search. Nonetheless, CG performs well overall.

Partial output produced from these programs is shown in Figure 15. Note the quadratic convergence rate in TN and the slightly slower convergence in the other methods. CG requires more than two function evaluations per step, and its step lengths vary significantly in magnitude from iteration to iteration. In contrast, the Newton methods require around one function evaluation per Newton step, and the step lengths are often one.

For comparison, a steepest descent procedure was also examined. For this purpose, the TN code was modified so that only one inner iteration was performed, without preconditioning, and the exit search direction was the negative gradient. We note from Figure 14e and the output in Figure 15 that the SD progress is initially very rapid but then becomes excessively slow. Only 150 steps of minimization are shown in the contour figure, but the output indicates how slow progress is through several hundred iterations (the process was then stopped). This is typical of the SD method and explains why SD steps may often work well only initially, in regions far away from stationary points.
Table 2 Rosenbrock Minimization, n = 2 (a)

    Method     Iterations       NFG    CPU (s)    ||g||_2         f
               (Outer/Inner)
    TN         22/43            27     10         5.2 x 10^-11    1.7 x 10^-23
    CG         14               31     9          1.1 x 10^-12    7.0 x 10^-27
    BFGS       38               47     9          3.6 x 10^-10    1.3 x 10 l 2
    LM-BFGS    40               49     9          4.2 x 10^-10    2.3 x 10^-22

(a) The starting point (-1.25,1.05) has f = 3.2 x 10^1 and ||g||_2 = 2.8 x 10^2. See Figures 14 and 15.
Table 3 shows results for Rosenbrock minimization with n = 1000. An asymmetrically perturbed starting point was used (see footnote in table). Note that TN is very economical in terms of iteration cost and function evaluations, and the CG method is equally cheap and effective. It is likely that CG performs well because of the clustered eigenvalue structure. As the Hessian has a 2 x 2 block diagonal structure, multiple eigenvalues result. The full-memory BFGS method is inappropriate here because of the high dimensionality (CPU time is greater by three orders of magnitude than for other methods). LM-BFGS requires more iterations and function evaluations and consequently more time than TN and CG, but differences are still within the same order of magnitude. These results demonstrate how effective Newton variants for large dimensions can be and how, even in terms of time, they are similar to or better than the CG method. In reliability they typically exceed the best gradient methods.
Numerical Example II: Deoxycytidine

The molecular model of deoxycytidine is examined in Tables 2 and 3 to analyze the issues of preconditioning and number of LM-BFGS updates. Energy model details are described elsewhere,22,23 and here it suffices to note that several well-separated local minima correspond to different feasible combinations of the sugar's pseudorotation parameter and the glycosyl (sugar-to-base orientation) dihedral angle. A natural preconditioner consisting of the local potential energy terms has been found to be very effective here. Its nonzero structure is shown in Figure 2b: an overall block-diagonal clustered pattern is apparent from the separate bond coupling within the sugar and base units. The symmetry in the x, y, z components of the Cartesian variables also produces smaller 3 x 3 block repeating patterns. In the original labeling scheme, the x, y, z components of each atom are numbered in turn. After a YSMP reordering in TN, the pattern shown in Figure 2c results, producing a smaller factorization fill-in, as the first nonzeros in some rows are pushed further to the right. Our experience has
Figure 15 Minimization output by the different methods corresponding to Figure 14.

Performance for the Rosenbrock Function (n = 2)

(a) Truncated Newton

    ITN  NF   F               GNORM/sqrt(N)   STEPLEN   X(1)          X(2)
      0   1   31.9712644016   0.200976E+03    0.0       -1.25403023    1.05403023
      1   2    4.7190600539   0.110278E+02    0.0       -1.16031887    1.32351830
      2   4    4.1312457547   0.158624E+02    0.5       -0.98763945    0.93294226
      3   5    3.3543889611   0.136434E+02    1.0       -0.77836768    0.56206156
      4   6    2.6573313439   0.915315E+01    1.0       -0.59613799    0.32217285
      5   7    2.1148470328   0.913832E+01    1.0       -0.38726128    0.10634181
      6   8    1.5904843317   0.427944E+01    1.0       -0.24462545    0.03949664
      7  10    1.3252913900   0.429829E+01    1.0       -0.12393941   -0.00954918
      8  11    1.0008130364   0.503915E+01    1.0        0.06394650   -0.03121196
      9  12    0.6904597544   0.196500E+01    1.0        0.18007895    0.01894168
     10  14    0.5571407008   0.231509E+01    0.4        0.27158530    0.05746357
     11  15    0.3962408358   0.503692E+01    1.0        0.44261475    0.16665674
     12  16    0.2309761792   0.985671E+00    1.0        0.52398231    0.26793678
     13  18    0.1579123666   0.285678E+01    0.5        0.62920614    0.38160901
     14  19    0.0839849202   0.199371E+01    1.0        0.72530978    0.51683837
     15  20    0.0404234905   0.232090E+01    1.0        0.82178771    0.66602705
     16  21    0.0143451840   0.374807E+00    1.0        0.88406490    0.77769229
     17  22    0.0043820612   0.123077E+01    1.0        0.94935508    0.89701226
     18  23    0.0005990766   0.203154E+00    1.0        0.97669286    0.95318159
     19  24    0.0000260883   0.125880E+00    1.0        0.99696927    0.99353659
     20  25    0.0000000592   0.219316E-02    1.0        0.99976973    0.99953167
     21  26    0.0000000000   0.162610E-04    1.0        0.99999964    0.99999923
     22  27    0.0000000000   0.363783E-10               1.00000000    1.00000000

(b) CONMIN Conjugate Gradient

    ITN  NF   F               GNORM      STEPLEN   X(1)          X(2)
      0   1   31.9712644016   0.28D+03     0.0     -1.25403023    1.05403023
      1   3    4.3447824695   0.15D+02     0.0     -1.04681947    1.13524071
      2   5    4.2431104032   0.18D+01     1.3     -1.05851069    1.12795763
      3   8    3.5167814324   0.23D+02   448.6     -0.80080080    0.58894666
      4  10    2.9090812308   0.24D+02     1.7     -0.53169809    0.20767072
      5  12    4.8533295171   0.71D+02     0.5      0.63952750    0.62632901
      6  14    0.5540987318   0.32D+02     0.0      0.95243688    0.83285031
      7  16    0.0136801009   0.28D+00     0.5      0.88358200    0.77959043
      8  19    0.0027257658   0.16D+01   194.5      0.96370498    0.92497436
      9  21    0.0010528123   0.20D+00     1.4      0.96771111    0.93678481
     10  23    0.0001239198   0.47D+00    44.4      0.99651384    0.99199439
     11  25    0.0000001044   0.54D-02     0.3      1.00030472    1.00059877
     12  27    0.0000000000   0.60D-04     5.8      1.00000048    1.00000109
     13  29    0.0000000000   0.10D-06     0.2      0.99999999    0.99999999
     14  31    0.0000000000   0.11D-11              1.00000000    1.00000000

(c) CONMIN Quasi Newton (BFGS)

    ITN  NF   F               GNORM      STEPLEN   X(1)          X(2)
      0   1   31.9712644016   0.28D+03   0.0       -1.25403023    1.05403023
      1   3    4.3447824695   0.15D+02   0.0       -1.04681947    1.13524071
      2   4    4.2475141998   0.34D+01   1.0       -1.05608572    1.12946827
      3   5    4.2407855297   0.18D+01   1.0       -1.05797103    1.12674634
      4   6    4.2375454227   0.20D+01   1.0       -1.05771783    1.12454866
      5   7    4.2107433097   0.55D+01   1.0       -1.05176394    1.10303239
      6   8    4.1604732801   0.10D+02   1.0       -1.03496426    1.05722488
      7   9    4.0489517601   0.18D+02   1.0       -0.98560409    0.93880742
      8  10    3.9108605034   0.23D+02   1.0       -0.92370225    0.80737499
      9  11    3.6375512593   0.24D+02   1.0       -0.83242480    0.64003770
     10  12    3.1130242950   0.17D+02   1.0       -0.71996811    0.47901784
     11  13    2.6057378503   0.66D+01   1.0       -0.60967227    0.35957879
     12  15    2.4248857977   0.15D+02   0.3       -0.48915091    0.19373672
     13  16    2.0625458857   0.99D+01   1.0       -0.40294777    0.13166130
     14  17    1.6800702763   0.54D+01   1.0       -0.28627104    0.06595828
     15  19    1.5573528088   0.93D+01   0.4       -0.18795897   -0.00289528
     16  20    1.1262179775   0.89D+01   1.0        0.03313867   -0.04265079
     17  22    0.9210925703   0.20D+01   0.5        0.04038753    0.00316896
     18  23    0.7341555025   0.27D+01   1.0        0.15290463    0.01050157
     19  25    0.6702828768   0.62D+01   0.3        0.23896553    0.02692019
     20  26    0.5695542610   0.41D+01   1.0        0.27253092    0.05418754
     21  27    0.4177832527   0.55D+01   1.0        0.39964271    0.13576552
     22  28    0.2958148057   0.40D+01   1.0        0.46848904    0.23101928
     23  30    0.1939731541   0.34D+01   0.5        0.57898880    0.32229639
     24  31    0.1703468728   0.51D+01   1.0        0.62756400    0.37604941
     25  32    0.0961174146   0.22D+01   1.0        0.70008751    0.48226764
     26  33    0.0642768826   0.36D+01   1.0        0.77097074    0.58352275
     27  34    0.0252997774   0.14D+01   1.0        0.84353739    0.71441754
     28  36    0.0148034988   0.22D+01   0.3        0.89361486    0.79264354
     29  37    0.0097699591   0.16D+01   1.0        0.91082832    0.82534400
     30  38    0.0028700428   0.76D+00   1.0        0.95022938    0.90095364
     31  39    0.0007585852   0.32D+00   1.0        0.97376724    0.94738342
     32  40    0.0001490214   0.44D+00   1.0        0.99336586    0.98575098
     33  42    0.0000243098   0.44D-01   0.3        0.99513193    0.99036576
     34  43    0.0000002273   0.12D-01   1.0        0.99961778    0.99920720
     35  44    0.0000000009   0.12D-02   1.0        0.99998676    0.99997077
     36  45    0.0000000000   0.20D-04   1.0        0.99999953    0.99999910
     37  46    0.0000000000   0.36D-06   1.0        1.00000002    1.00000003
     38  47    0.0000000000   0.36D-09              1.00000000    1.00000000

(d) Limited Memory Quasi Newton (LM-BFGS)

    ITN  NF   F              GNORM      STEPLEN   X(1)          X(2)
      0   1   0.319713D+02   0.28D+03   0.0       -1.25403023    1.05403023
      1   3   0.434478D+01   0.15D+02   0.0       -1.04681947    1.13524071
      2   4   0.424751D+01   0.34D+01   1.0       -1.05608572    1.12946827
      3   5   0.424023D+01   0.18D+01   1.0       -1.05783541    1.12646330
      4   6   0.423599D+01   0.21D+01   1.0       -1.05745111    1.12357217
      5   7   0.420697D+01   0.58D+01   1.0       -1.05072830    1.10017373
      6   8   0.415140D+01   0.11D+02   1.0       -1.03174877    1.04921001
      7   9   0.403468D+01   0.19D+02   1.0       -0.97885197    0.92367947
      8  10   0.389041D+01   0.23D+02   1.0       -0.91692131    0.79428758
      9  11   0.356400D+01   0.23D+02   1.0       -0.81307500    0.60848319
     10  12   0.282372D+01   0.56D+01   1.0       -0.66331001    0.46389861
     11  14   0.241029D+01   0.59D+01   0.3       -0.54862332    0.29000961
     12  16   0.230043D+01   0.11D+02   0.2       -0.48566319    0.20533463
     13  17   0.217535D+01   0.12D+02   1.0       -0.42201285    0.13095081
     14  18   0.188409D+01   0.11D+02   1.0       -0.31093596    0.05599548
     15  19   0.149586D+01   0.47D+01   1.0       -0.21445210    0.03151132
     16  21   0.134450D+01   0.75D+01   0.5       -0.11220442   -0.02019741
     17  22   0.106786D+01   0.66D+01   1.0        0.01739154   -0.03168826
     18  23   0.858817D+00   0.43D+01   1.0        0.09066241    0.02608642
     19  25   0.638090D+00   0.20D+01   0.4        0.20664523    0.03338670
     20  27   0.602068D+00   0.41D+01   0.2        0.25172298    0.04283403
     21  28   0.546289D+00   0.48D+01   1.0        0.29814112    0.06571853
     22  29   0.403118D+00   0.49D+01   1.0        0.40273171    0.14065476
     23  30   0.283682D+00   0.81D+00   1.0        0.46852891    0.21602615
     24  32   0.233596D+00   0.35D+01   0.3        0.53753329    0.27489902
     25  33   0.185462D+00   0.66D+01   1.0        0.63165581    0.37667664
     26  34   0.114369D+00   0.11D+01   1.0        0.66504445    0.43762155
     27  35   0.757133D-01   0.49D+01   1.0        0.76525532    0.57126012
     28  36   0.472804D-01   0.16D+01   1.0        0.78893336    0.61718967
     29  37   0.318827D-01   0.53D+01   1.0        0.88470457    0.76906776
     30  38   0.146830D-01   0.37D+00   1.0        0.87887674    0.77277349
     31  39   0.698355D-02   0.45D+00   1.0        0.91761875    0.84062103
     32  41   0.380960D-02   0.12D+01   0.4        0.94577378    0.89153996
     33  42   0.135593D-02   0.11D+01   1.0        0.97389620    0.94587665
     34  43   0.243668D-03   0.25D+00   1.0        0.98522645    0.97117525
     35  44   0.151047D-04   0.16D+00   1.0        0.99864344    0.99692451
     36  45   0.121905D-05   0.29D-01   1.0        0.99912536    0.99818410
     37  46   0.443379D-09   0.90D-03   1.0        1.00000699    1.00001200
     38  47   0.674012D-12   0.29D-04   1.0        0.99999946    0.99999899
     39  48   0.724553D-17   0.33D-07   1.0        1.00000000    0.99999999
     40  49   0.233483D-21   0.42D-09   1.0        1.00000000    1.00000000
(e) Steepest Descent

 ITN   F               GNORM/sqrt(N)   X(1)          X(2)
   0   31.9712644016   0.200976E+03    -1.25403023    1.05403023
   1    9.7707222610   0.721645E+02    -0.96186693    1.16853549
   2    4.2274537586   0.217954E+01    -1.05592075    1.11750564
   3    4.2213111549   0.192507E+01    -1.05082879    1.11665583
   4    4.2155533092   0.142088E+01    -1.05233890    1.11329794
   5    4.2079229687   0.325712E+01    -1.04414056    1.10737954
   6    4.1963209609   0.134753E+01    -1.04749029    1.10364253
   7    4.1854935028   0.437931E+01    -1.03516795    1.09244969
   8    4.1658385218   0.125080E+01    -1.03952827    1.08846948
   9    3.1678216258   0.995583E+01    -0.75355730    0.53737593
  10    3.0208933705   0.278715E+01    -0.72899166    0.54917178
  11    3.0046493622   0.143926E+01    -0.73218938    0.54255831
  12    1.1403403446   0.242356E+01    -0.06093734   -0.00843255
  13    1.0995164007   0.277594E+01    -0.03410434    0.01852531
  14    1.0468611342   0.165908E+01    -0.02177217   -0.00485774
  15    0.2470165941   0.201467E+01     0.50828643    0.26558999
  16    0.2365334708   0.480024E+00     0.51418382    0.26211319
  17    0.1990534051   0.328990E+01     0.58755398    0.32820743
  18    0.1769626768   0.788015E+00     0.57947114    0.33687380
  19    0.1755935169   0.739630E+00     0.58391475    0.33598997
  20    0.1740143784   0.451228E+00     0.58296243    0.33887544
  21    0.1725102330   0.794984E+00     0.58797298    0.34047394
  22    0.1707997980   0.435533E+00     0.58685531    0.34334435
  23    0.1692217280   0.823421E+00     0.59214305    0.34527202
  24    0.1674507388   0.425773E+00     0.59093640    0.34812090
  25    0.1658646065   0.819472E+00     0.59622131    0.35016254
  26    0.1641328576   0.419263E+00     0.59501209    0.35295471
  27    0.1625880839   0.795206E+00     0.60010779    0.35495799
  28    0.1609558783   0.414558E+00     0.59894953    0.35767097
  29    0.1594766286   0.762886E+00     0.60377971    0.35956385
  30    0.1579616513   0.411215E+00     0.60269350    0.36219448
  31    0.1565595758   0.727546E+00     0.60722979    0.36394143
  32    0.1551620911   0.409214E+00     0.60622380    0.36649539
  33    0.1538434159   0.690754E+00     0.61045468    0.36807468
  34    0.1525580902   0.408733E+00     0.60953350    0.37056153
  35    0.1513258950   0.653010E+00     0.61345478    0.37195791
  36    0.1501451969   0.410099E+00     0.61262257    0.37439032
  37    0.1489998542   0.614568E+00     0.61623530    0.37559322
  38    0.1479146236   0.413814E+00     0.61549694    0.37798783
  39    0.1468534597   0.575755E+00     0.61880816    0.37899131
  40    0.1458524468   0.420602E+00     0.61816998    0.38137071
  50    0.1369909914   0.543482E+00     0.62987873    0.39685848
 100    0.0835178107   0.362718E+00     0.71101517    0.50530633
 200    0.0316149860   0.135948E+00     0.82231263    0.67554856
 300    0.0066762375   0.614025E-01     0.91833070    0.84307902
 400    0.0041614876   0.443911E-01     0.93556265    0.87497226
 500    0.0009081323   0.360715E-01     0.96994575    0.94057398
 600    0.0006846991   0.210984E-01     0.97384144    0.94830167
 700    0.0004944224   0.145696E-01     0.97778643    0.95596732
 800    0.0000669359   0.201061E-01     0.99187333    0.98371820
 900    0.0000399588   0.119952E-01     0.99370840    0.98739518
1000    0.0000282049   0.833224E-02     0.99470865    0.98939985
1100    0.0000216950   0.623094E-02     0.99535619    0.99069789
1200    0.0000175591   0.486285E-02     0.99582017    0.99162812

Figure 15 (continued)
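The slow tail of the steepest descent listing in Figure 15(e) can be reproduced qualitatively with a few lines of code. The sketch below is illustrative only: it assumes the standard two-variable Rosenbrock function and an Armijo backtracking line search (the line search used for the figure is not stated here), so the iterates will not match the figure digit for digit. The function names and the constant c1 are hypothetical.

```python
def rosenbrock(x1, x2):
    """Two-variable Rosenbrock function; its minimum f = 0 lies at (1, 1)."""
    return 100.0 * (x2 - x1 * x1) ** 2 + (1.0 - x1) ** 2


def gradient(x1, x2):
    """Analytic gradient of the Rosenbrock function."""
    d = x2 - x1 * x1
    return (-400.0 * x1 * d - 2.0 * (1.0 - x1), 200.0 * d)


def steepest_descent(x1, x2, iterations, c1=1e-4):
    """Steepest descent with Armijo backtracking (an assumed line search).

    Returns the final point and the list of function values visited.
    """
    history = [rosenbrock(x1, x2)]
    for _ in range(iterations):
        g1, g2 = gradient(x1, x2)
        gg = g1 * g1 + g2 * g2
        if gg == 0.0:
            break
        f0, step = history[-1], 1.0
        # Halve the step until the sufficient-decrease condition holds.
        while rosenbrock(x1 - step * g1, x2 - step * g2) > f0 - c1 * step * gg:
            step *= 0.5
            if step < 1e-16:
                return x1, x2, history
        x1, x2 = x1 - step * g1, x2 - step * g2
        history.append(rosenbrock(x1, x2))
    return x1, x2, history


if __name__ == "__main__":
    # Same starting point as the figure.
    x1, x2, history = steepest_descent(-1.25403023, 1.05403023, 1200)
    print(len(history) - 1, history[-1], x1, x2)
```

Every accepted step lowers the function value, yet progress along the curved valley remains slow, which is the behavior the figure documents over 1200 iterations.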
shown that this local preconditioner is often positive-definite, with possible exceptions at the first few minimization iterations.139

Tables 4 and 5 summarize performance results for two different starting points. The first (x1) is closer to a minimum than the second (x2) and has a lower function value and gradient norm by about four orders of magnitude (see table footnotes for details). From both starting points, we first note how well preconditioning works in TN. The residual truncation criterion of [54] and [55] was used here with c_r = 0.5. With preconditioning, the number of inner (PCG) iterations is reduced by two to three orders of magnitude. Even the number of Newton iterations is reduced, and the computation is accelerated by a factor of 2 to 3. Not only is the precision of the resulting gradient norm not sacrificed; it improves. This is a typical observation with good preconditioning.

The CG method requires a large number of iterations and function evaluations (approximately twice the number of iterations). Although its cost per iteration is low, its computational time is significantly greater than that of TN as well as the optimal version of LM-BFGS. BFGS is feasible for this problem size (n = 87) and performs relatively better for x1 than for x2, presumably because the procedure for updating from a better approximation to the minimum provides better curvature information to direct progress. The unpreconditioned LM-BFGS runs are not very efficient overall, although their cost per iteration is far lower than that of BFGS; however, preconditioning introduces significant improvements: the number of iterations and function evaluations drops sharply, and time is reduced as well. This illustrates the significant savings and improvement in reliability for Newton methods despite the more complex and expensive computations involved. TN and LM-BFGS with preconditioning are clearly the best methods overall for this problem.

The experiments with varying numbers of stored updates in the preconditioned LM-BFGS runs are meant to help assess the relative importance of updating and preconditioning. Overall performance deteriorates as the number of updates decreases, but not systematically. When u = 0, no updating is performed and the inverse of the preconditioner is used to scale the initial search direction. For x1, this version clearly shows a large performance deterioration, so the additional work in preconditioning does not pay in terms of time. A likely explanation is that, because x1 is a good starting point, the QN updating is important for good performance. Interestingly, even a "little" updating here (e.g., u = 1) produces a very significant improvement. For x2, on the other hand, preconditioning rather than updating appears to lead to iteration-count improvements.

In all these deoxycytidine runs, computation time is approximately proportional to the number of function and gradient evaluations. This is typical in molecular computations, as the evaluation and differentiation of the potential energy are expensive and the time-determining factors. The trends observed here extend to biomolecular models and suggest that, with local preconditioning, TN and LM-BFGS are the methods of choice.

Table 3 Rosenbrock Minimization, n = 1000 (a)

Method     Iterations       NFG   CPU (s)   ||g||_2        f*
           (Outer/Inner)
TN         23/127            30     24      1.5 x 10^-?    1.6 x 10^-17
CG         52               114     21      3.8 x 10^-8    1.1 x 10^-18
BFGS       188              208   2880      2.2 x 10^-?    4.8 x 10^?
LM-BFGS    249              283     33      3.0 x 10^-7    1.1 x 10^-13

(a) The starting point corresponds to a random perturbation of the point (-1.2, 1.0, -1.2, 1.0, . . .) and has f = 8.1 x 10^3 and ||g||_2 = 4.0 x 10^3.

Table 4 Deoxycytidine Minimization from x1, n = 87 (a)

Method                  Iterations       NFG    CPU
                        (Outer/Inner)           (in 38 s units)
TN, no P                23/1536            27    2.8
TN, P                   16/96              20    1.0
CG                      1520             3002   11.8
BFGS                    354               357    2.9
LM-BFGS, no P, u = 5    1551             1598    5.0
LM-BFGS, P,    u = 5    46                 57    1.7
               u = 4    46                 58    1.7
               u = 3    51                 66    1.8
               u = 2    66                 81    3.1
               u = 1    66                 88    3.1
               u = 0    711              1423   27.2

(a) The starting point corresponds to a pseudorotation puckering angle of 0° and a glycosyl dihedral angle χ = 180°. The energy at this point has a value of f = 1.4 and ||g||_2 = 7.6 x 10^1. At the minimum, f* = -5.7 and ||g*||_2 in all methods ranges from O(10^-8) to O(10^-5). The symbols "P" and "u" in the Method column refer to preconditioning (by the matrix of local Hessian interactions) and to the number of stored updates in LM-BFGS, respectively.

Table 5 Deoxycytidine Minimization from x2, n = 87 (a)

Method                  Iterations       NFG    CPU
                        (Outer/Inner)           (in 40 s units)
TN, no P                22/771             28    2.5
TN, P                   20/81              24    1.0
CG                      1029             2043    5.7
BFGS                    1017             1021    4.9
LM-BFGS, no P, u = 5    831               862    2.5
LM-BFGS, P,    u = 5    71                187    2.3
               u = 4    76                244    3.5
               u = 3    74                200    3.4
               u = 2    71                148    3.0
               u = 1    66                137    3.0
               u = 0    91                149    3.3

(a) The starting point corresponds to a pseudorotation puckering angle of -90° and a glycosyl dihedral angle χ = 90°. The energy at this point has a value of f = 5.1 x 10^4 and ||g||_2 = 6.9 x 10^5. At the minimum, f* = -5.0 and ||g*||_2 in all methods ranges from O(10^-8) to O(10^-5). See footnote a to Table 4.
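The effect of preconditioning on the inner loop of a truncated Newton step can be illustrated with a small preconditioned conjugate gradient (PCG) routine. This is a minimal sketch under stated assumptions, not the implementation used for the tables: it uses a diagonal preconditioner and a generic relative-residual test ||r|| <= eta ||g||, which may differ from the truncation criteria [54] and [55] of the chapter. The names pcg, hess_vec, m_inv_diag, eta, and max_iter are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg(hess_vec, g, m_inv_diag, eta, max_iter):
    """Approximately solve H p = -g by preconditioned conjugate gradients.

    hess_vec(v) returns the product H v; m_inv_diag holds the inverse of a
    diagonal preconditioner. The loop stops when ||r|| <= eta * ||g||, when
    max_iter is reached, or when non-positive curvature is detected.
    Returns the search vector p and the number of inner iterations used.
    """
    n = len(g)
    p = [0.0] * n
    r = [-gi for gi in g]                      # residual of H p = -g at p = 0
    z = [mi * ri for mi, ri in zip(m_inv_diag, r)]
    d = z[:]
    rz = dot(r, z)
    g_norm = math.sqrt(dot(g, g))
    for k in range(1, max_iter + 1):
        hd = hess_vec(d)
        curvature = dot(d, hd)
        if curvature <= 0.0:
            # Indefinite Hessian: keep the current p, or fall back to the
            # preconditioned gradient direction on the first iteration.
            return (d if k == 1 else p), k
        alpha = rz / curvature
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * hdi for ri, hdi in zip(r, hd)]
        if math.sqrt(dot(r, r)) <= eta * g_norm:
            return p, k
        z = [mi * ri for mi, ri in zip(m_inv_diag, r)]
        rz_new = dot(r, z)
        d = [zi + (rz_new / rz) * di for zi, di in zip(z, d)]
        rz = rz_new
    return p, max_iter


if __name__ == "__main__":
    h = [1.0, 100.0, 10000.0]                  # badly scaled diagonal "Hessian"
    hess_vec = lambda v: [hi * vi for hi, vi in zip(h, v)]
    g = [1.0, 1.0, 1.0]
    print(pcg(hess_vec, g, [1.0 / hi for hi in h], 1e-8, 25)[1])  # preconditioned
    print(pcg(hess_vec, g, [1.0, 1.0, 1.0], 1e-8, 25)[1])         # unpreconditioned
```

In this toy case the preconditioner equals the Hessian, so a single inner iteration suffices; a realistic preconditioner only approximates the Hessian and the gain is correspondingly smaller. The max_iter argument plays the role of a cap on PCG iterations per Newton step.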
Numerical Example III: Water Clusters

The water cluster examples in Table 6 are meant to illustrate a somewhat more difficult minimization problem. Optimal cluster-network geometries are more difficult to locate, not only because good starting points are hard to construct but also because the optimal configurations are dominated by long-range nonbonded forces that are not computationally feasible to include in preconditioners. For these water runs, the potential energy consists of intermolecular Lennard-Jones and Coulombic terms, with the addition of intramolecular bond length and bond angle terms to permit a flexible model.34 Only these intramolecular terms are used as a preconditioner. This produces a 9 × 9 block-diagonal (see Figure 2a) indefinite matrix, with approximately n/3 negative eigenvalues.139 The local preconditioner of deoxycytidine, in contrast, is usually positive-definite with the possible exception of one isolated large negative eigenvalue.139 Starting points for the water clusters of varying sizes are constructed so as to occupy a regular box in three dimensions (e.g., a 3 × 3 × 3 cube for 27 molecules).34 These arrangements are often poor approximations
Table 6 Water Cluster Minimization (a)

  n   Method          Iterations       NFG     CPU
                      (Outer/Inner)            (in 24-s units)
 18   TN, P           61/252            101      1.0
 18   CG              448               899      1.0
 18   BFGS            183               212      1.0
 18   LM-BFGS, no P   603               655      1.4
 18   LM-BFGS, P      204               222      1.0

 72   TN, P           128/1417          223      3.3
 72   CG              856              1715      2.5
 72   BFGS            416               419      2.1
 72   LM-BFGS, no P   1355             1408      2.5
 72   LM-BFGS, P      904               934      4.9

243   TN, P           98/1723           184     37
243   CG              5742            11512    110
243   BFGS            1548             1669     76
243   LM-BFGS, no P   8066             8325    120
243   LM-BFGS, P      7017             7209    140

(a) Starting points were selected pseudorandomly so that the molecules occupy a regular three-dimensional arrangement. The starting energies and gradient norms for the dimer (n = 18), 8 molecules (n = 72), and 27 molecules (n = 243) are -5.5, 2.9 x 10; 4.8 x 10^2, 7.4 x 10^2; and 5.5 x 10^2, 7.5 x 10^2, respectively. The three corresponding minima are -6.95 for n = 18, between -73 and -76 for n = 72, and between -296 and -302 for n = 243. (The minimum obtained varies slightly from method to method in the higher dimensions.) The final gradient norms range from O(10^-6) to O(10^-4). A less strict convergence parameter, ε_g = 10^-4, was used in all methods, and ε_f was also set to 10^-4 for TN (see formulas [19], [21a-c]). The preconditioner here comes from the intramolecular bond length and bond angle terms and is a 9 x 9 block-diagonal matrix. Five updates are used in LM-BFGS. TN used the quadratic truncation test with c_q = 0.2 (see condition [57]) and a limit of 25 PCG iterations per Newton step.
to the optimal hydrogen-bonded network geometries and become poorer as n increases, because the optimal shapes become more spherical. These problem difficulties are reflected by an increased number of iterations, function evaluations, and time in all the minimization runs. For n = 18, performance of all methods is very similar in terms of time, as most of the time is spent in program printing and analysis. CG requires more iterations but is very fast. LM-BFGS is computationally efficient in terms of time per iteration, and preconditioning accelerates convergence. Dramatic acceleration by preconditioning is not noted here as in deoxycytidine, because the preconditioners are not as good approximations to the Hessian. Although these preconditioners generally produce better performance than the unpreconditioned version, the use of indefinite preconditioners is not typical, and it is unclear whether it constitutes an optimal procedure.

For higher dimensions, we note that TN performs very well despite the relatively poor preconditioner and starting point. A limit on the maximum PCG iterations per Newton iteration (25) reduces the total number of PCG iterations considerably without any sacrifice in performance. The quadratic truncation test of [57] was used here with c_q = 0.2. The modified Cholesky factorization for the preconditioner apparently produces good search vectors. CG requires more iterations but is relatively economical in terms of time. BFGS works well, but the limited-memory variants are not as efficient. Furthermore, preconditioning does not improve performance significantly and requires more time. These features are likely to reflect the ill conditioning of this problem.

For these types of problems, performance of Newton methods may be very sensitive to the starting point and program parameters. Consequently, they may exhibit behavior that does not necessarily scale up systematically with size. CG methods reveal similar characteristics, but they may require less user information (i.e., are easier to implement). Typically, for such difficult problems, it is recommended that local minimization be used in combination with search strategies or stochastic approaches. Formulation of a better, positive-definite approximation to the Hessian may also improve performance, and the use of an automatic technique, among many, as offered in a recent FORTRAN package,130 may be helpful. It should be emphasized that preconditioning is a very difficult topic because it is highly problem dependent. Thus, very little systematic research on tailoring has been done. Perhaps the emergence of powerful Newton variants will spur further developments in this area.
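One simple way to obtain a usable factorization from an indefinite preconditioner is sketched below. It is a deliberately simpler stand-in for the modified Cholesky factorization mentioned above: instead of modifying pivots during the factorization, it adds a multiple tau of the identity and retries until an ordinary Cholesky factorization succeeds. The function names and the parameters tau0 and growth are assumptions made for illustration.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a, or None if a pivot is not positive."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            return None                         # not positive-definite
        low[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def shifted_cholesky(a, tau0=1e-3, growth=2.0, max_tries=60):
    """Factor a + tau*I, increasing tau from zero until the factorization succeeds.

    Returns (L, tau). This is NOT the modified Cholesky factorization of the
    chapter, only a simpler illustration of the same goal.
    """
    n = len(a)
    tau = 0.0
    for _ in range(max_tries):
        shifted = [[a[i][j] + (tau if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        low = cholesky(shifted)
        if low is not None:
            return low, tau
        tau = tau0 if tau == 0.0 else growth * tau
    raise ValueError("no positive-definite shift found")


if __name__ == "__main__":
    indefinite = [[1.0, 2.0], [2.0, 1.0]]       # eigenvalues 3 and -1
    low, tau = shifted_cholesky(indefinite)
    print(tau, low)
```

A uniform shift perturbs every eigenvalue, including those that were already acceptable, which is one reason more selective factorizations are preferred in practice.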
New Technologies

Future developments in the field of optimization will undoubtedly be influenced by recent interest and rapid developments in new technologies: powerful vector and parallel machines. Indeed, their exploitation for algorithm design and solution of "grand challenge" applications15,16 is expected to bring new advances in the field of computational chemistry, in particular.

Supercomputers can provide speedup over traditional architectures by optimizing both scalar and vector computations. This can be accomplished by pipelining data as well as offering special hardware instructions for calculating intrinsic functions (e.g., exp(x), √x), arithmetic, and array operations. In addition, parallel computers can execute several operations concurrently. Multiple instructions can be specified for multiple data streams in MIMD designs, whereas the same instructions can be applied to multiple data streams in SIMD prototypes. Communication among processors is crucial for efficient algorithm design so that the full parallel apparatus is exploited. These issues will only increase in significance as massively parallel networks enter into regular use.

In general, one of the first steps in optimizing codes for these architectures is implementation of standard basic linear algebra subroutines (BLAS). These routines, which are continuously being improved, expanded, and adapted optimally to more machines, perform operations such as dot products (x^T y) and vector manipulations (ax + y), as well as matrix/vector and matrix/matrix operations. Thus, operations such as those in Eq. [38] or [50b] can be executed very efficiently. In particular, if n is very large, segmentation among the processors may also be involved. A new library of FORTRAN 77 subroutines, LAPACK, focuses on design and implementation of standard numerical linear algebra tasks (e.g., systems of linear equations, eigenvalue and singular value problems) to achieve high efficiency and accuracy on vector processors, high-performance workstations, and shared-memory multiprocessors.14~ At this writing, up-to-date information may be obtained by sending the message "send index from lapack" to the electronic-mail address [email protected] or contacting J. Dongarra and S. Ostrouchov at [email protected].

Specific strategies for adapting optimization algorithms to these architectures are quite recent and not yet unified.44,54 For parallel computers, natural improvements may involve the following ideas: (1) performing multiple minimization procedures concurrently from different starting points; (2) evaluating function and derivatives concurrently at different points13~ (e.g., for a finite-difference approximation of gradient or Hessian or for an improved line search); (3) performing matrix operations or decompositions in parallel118,133 for special structured systems (e.g., Cholesky factorizations of block-band preconditioners).

With increased computer storage and speed, the feasible methods for solution of very large (e.g., O(10^5) or more variables) nonlinear optimization problems arising in important applications (macromolecular structure, meteorology, economics) will undoubtedly expand considerably and make possible new orders of resolution.
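Idea (1), running several local minimizations concurrently from different starting points, can be sketched with the Python standard library. Everything in this example is an assumption made for illustration: the one-dimensional double-well "energy", the fixed-step gradient descent standing in for a real local minimizer, and the function names. Threads are used for simplicity; in CPython they will not speed up pure-Python arithmetic, and a process pool with the same interface would be the natural substitute for expensive energy functions.

```python
from concurrent.futures import ThreadPoolExecutor


def energy(x):
    """Hypothetical tilted double-well energy with two local minima."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def energy_derivative(x):
    """Derivative of the hypothetical energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_minimize(x, step=0.01, iterations=5000):
    """Fixed-step gradient descent; a stand-in for any local minimizer."""
    for _ in range(iterations):
        x -= step * energy_derivative(x)
    return x, energy(x)


def multistart(starts, workers=4):
    """Minimize from every starting point concurrently and keep the best result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(local_minimize, starts))
    best = min(results, key=lambda result: result[1])
    return best, results


if __name__ == "__main__":
    best, results = multistart([-2.0, -0.5, 0.5, 2.0])
    print("all minima found:", results)
    print("lowest minimum:", best)
```

The runs are independent, so no communication between workers is needed until the final comparison, which is what makes this the easiest of the three ideas to parallelize.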
ACKNOWLEDGMENTS

I thank the editors of this volume for inviting my contribution. I am grateful to I. M. Navon, J. Nocedal, M. L. Overton, and R. B. Schnabel for providing many preprints and useful suggestions. I thank Connie Engle for her outstanding typing and sense of humor. The work is supported by the National Science Foundation (CHE-9002146 and Presidential Young Investigator Award ASC-9157582), the Searle Scholar Program, the Whitaker Foundation, and the Faculty of Arts and Science at New York University. Computer facilities were provided by the Academic Computing Facility at New York University.
REFERENCES

1. F. S. Acton, Numerical Methods That Usually Work, Chap. 17, Mathematical Association of America, Washington, D.C., 1990 (updated from the 1970 edition).
2. P. E. Gill, W. Murray, and M. H. Wright, Numerical Linear Algebra and Optimization, Vol. 1, Addison-Wesley, Redwood City, Calif., 1991.
3. D. G. Luenberger, Linear and Nonlinear Programming, 2nd ed., Addison-Wesley, Reading, Mass., 1984.
4. R. Fletcher, Practical Methods of Optimization, 2nd ed., John Wiley & Sons, Tiptree, Essex, United Kingdom, 1987.
5. P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, New York, 1983.
6. J. E. Dennis, Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, N.J., 1983.
7. P. G. Ciarlet, Introduction to Numerical Linear Algebra and Optimization, Cambridge University Press, Cambridge, United Kingdom, 1989.
8. P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications, D. Reidel, Dordrecht, 1987.
9. E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines, Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley & Sons, Tiptree, Essex, 1990.
10. G. L. Nemhauser, A. H. G. Rinnooy Kan, and M. J. Todd, Eds., Handbooks in Operations Research and Management Science, Vol. 1, Elsevier Science/North-Holland, Amsterdam, 1989.
11. C. A. Floudas and P. M. Pardalos, Eds., Recent Advances in Global Optimization, Princeton Series in Computer Science, Princeton University Press, Princeton, N.J., 1991.
12. P. T. Boggs, R. H. Byrd, and R. B. Schnabel, Eds., Numerical Optimization 1984, SIAM, Philadelphia, 1985.
13. G. Dahlquist and Å. Björck, Numerical Methods, Prentice-Hall, Englewood Cliffs, N.J., 1974.
14. G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Md., 1984.
15. Grand Challenges: High Performance Computing and Communications, Report by the Committee on Physical, Mathematical, and Engineering Sciences, Office of Science and Technology Policy, Washington, D.C., 1991.
16. Mathematical Foundations of High-Performance Computing and Communications, Board of Mathematical Sciences, National Research Council, National Academy Press, Washington, D.C., 1991.
17. R. B. Schnabel and Z-T. Chow, SIAM J. Opt., 1, 293 (1991). Tensor Methods for Unconstrained Optimization Using Second Derivatives.
18. U. Burkert and N. L. Allinger, Molecular Mechanics, ACS Monograph 177, American Chemical Society, Washington, D.C., 1987.
19. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and D. A. Case, J. Comput. Chem., 7, 230 (1986). An All Atom Force Field for Simulations of Proteins and Nucleic Acids.
20. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187 (1983). CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations.
21. S. Lifson and A. Warshel, J. Chem. Phys., 49, 5116 (1968). Consistent Force Field for Calculations of Conformations, Vibrational Spectra, and Enthalpies of Cycloalkane and n-Alkane Molecules.
22. T. Schlick, dissertation, Courant Institute, Department of Mathematics, New York University (1987). Modeling and Minimization Techniques for Predicting Three-Dimensional Structures of Large Biological Molecules.
23. T. Schlick, B. E. Hingerty, C. S. Peskin, M. L. Overton, and S. Broyde, in Theoretical Biochemistry and Molecular Biophysics, Vol. I, pp. 39-58, D. L. Beveridge and R. Lavery, Eds., Adenine Press, Guilderland, New York, 1991. Search Strategies, Minimization Algorithms, and Molecular Dynamics Simulations for Exploring Conformational Spaces of Nucleic Acids.
24. L. Greengard and V. Rokhlin, J. Comput. Phys., 73, 325 (1987). A Fast Algorithm for Particle Simulations.
25. J. Carrier, L. Greengard, and V. Rokhlin, SIAM J. Sci. Statist. Comput., 9, 669 (1987). A Fast Adaptive Multipole Algorithm for Particle Simulations.
26. A. R. Leach, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991. A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules.
27. M. Pincus, R. Klausner, and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 79, 5107 (1982). Calculation of the Three-Dimensional Structure of the Membrane Bound Portion of Melittin from Its Amino Acids.
28. H. A. Scheraga, Biopolymers, 22, 1 (1983). Recent Progress in Theoretical Treatment of Protein Folding.
29. B. E. Hingerty, S. Figueroa, T. L. Hayden, and S. Broyde, Biopolymers, 28, 1195 (1989). Prediction of DNA Structure from Sequence: A Buildup Technique.
30. J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1987.
31. T. Schlick, Comput. Chem., 15, 251 (1991). New Approaches to Potential Energy Minimization and Molecular Dynamics Algorithms.
32. C. S. Peskin and T. Schlick, Comm. Pure Appl. Math., 42, 1001 (1989). Molecular Dynamics by the Backward-Euler Method.
33. T. Schlick and C. S. Peskin, Comm. Pure Appl. Math., 42, 1141 (1989). Can Classical Equations Simulate Quantum-Mechanical Behavior? A Molecular Dynamics Investigation of a Diatomic Molecule with a Morse Potential.
34. T. Schlick, S. Figueroa, and M. Mezei, J. Chem. Phys., 94, 2118 (1991). A Molecular Dynamics Simulation of a Water Droplet by the Implicit-Euler/Langevin Scheme.
35. A. Nyberg and T. Schlick, J. Chem. Phys., 95, 4986 (1991). A Computational Investigation of Dynamic Properties with the Implicit-Euler Scheme for Molecular Dynamics Simulation.
36. T. Schlick and W. K. Olson, J. Mol. Biol., 223, 1089 (1992). Computer Simulations of Supercoiled DNA Energies and Dynamics.
37. M. E. Tuckerman, B. J. Berne, and A. Rossi, J. Chem. Phys., 94, 1465 (1991). Molecular Dynamics Algorithm for Multiple Time Scales: Systems with Disparate Masses.
38. R. K. Z. Tan and S. C. Harvey, in Theoretical Biochemistry and Molecular Biophysics, Vol. I, pp. 125-137, D. L. Beveridge and R. Lavery, Eds., Adenine Press, Guilderland, New York, 1991. Succinct Macromolecular Models: Applications to Supercoiled DNA.
39. L. Piela, J. Kostrowicki, and H. A. Scheraga, J. Phys. Chem., 93, 3339 (1989). The Multiple-Minima Problem in Conformational Analysis of Molecules. Deformation of the Potential Energy Hypersurface by the Diffusion Equation Method.
40. E. O. Purisima and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 83, 2782 (1986). An Approach to the Multiple-Minima Problem by Relaxing Dimensionality.
41. A. V. Levy and S. Gomez, in Numerical Optimization 1984, P. T. Boggs, R. H. Byrd, and R. B. Schnabel, Eds., pp. 213-244, SIAM, Philadelphia, 1985. The Tunneling Method Applied to Global Optimization.
42. A. H. G. Rinnooy Kan and G. T. Timmer, in Handbooks in Operations Research and Management Science, Vol. 1, G. L. Nemhauser, A. H. G. Rinnooy Kan, and M. J. Todd, Eds., Elsevier Science/North-Holland, Amsterdam, 1989. Global Optimization.
43. R. H. Byrd, E. Eskow, R. B. Schnabel, and S. L. Smith, Parallel Global Optimization: Numerical Methods, Dynamic Scheduling Methods, and Application to Molecular Configuration, Computer Science Report CU-CS-553-91, University of Colorado, Boulder, 1991.
44. R. B. Schnabel, in Mathematical Programming, M. Iri and K. Tanabe, Eds., pp. 227-261, Kluwer Academic, Dordrecht, 1989. Sequential and Parallel Methods for Unconstrained Optimization.
45. R. H. Byrd, C. L. Dert, A. H. G. Rinnooy Kan, and R. B. Schnabel, Math. Prog., 46, 1 (1990). Concurrent Stochastic Methods for Global Optimization.
46. S. Smith, E. Eskow, and R. B. Schnabel, Adaptive Asynchronous Stochastic Global Optimization Algorithms for Sequential and Parallel Computation, Computer Science Report CU-CS-449-89, University of Colorado, Boulder, 1989.
47. S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, Science, 220, 671 (1983). Optimization by Simulated Annealing.
48. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys., 21, 1087 (1953). Equation of State Calculations by Fast Computing Machines.
49. A. Dekkers and E. Aarts, Math. Prog., 50, 367 (1991). Global Optimization and Simulated Annealing.
50. I. O. Bohachevsky, M. E. Johnson, and M. L. Stein, Technometrics, 28, 209 (1986). Generalized Simulated Annealing for Function Optimization.
51. M.-H. Hao and W. K. Olson, Macromolecules, 22, 3292 (1989). Searching the Global Equilibrium Configurations of Supercoiled DNA by Simulated Annealing.
52. J. Moré and D. C. Sorenson, in Studies in Numerical Analysis, G. H. Golub, Ed., pp. 29-82, Mathematical Association of America, Washington, D.C., 1984. Newton's Method.
53. J. Moré, in Mathematical Programming, The State of the Art, A. Bachem, M. Grotschel, and G. Korte, Eds., pp. 256-287, Springer-Verlag, New York, 1983. Recent Developments in Algorithms and Software for Trust Region Methods.
54. J. E. Dennis, Jr. and R. B. Schnabel, in Handbook in Operations Research and Mathematical Sciences, Vol. 1, G. L. Nemhauser, A. H. G. Rinnooy Kan, and M. J. Todd, Eds., pp. 1-72, Elsevier Science/North-Holland, Amsterdam, 1989. A View of Unconstrained Optimization.
55. J. Moré and D. J. Thuente, Mathematics and Computer Science Division Preprint MCS-P153-0590, Argonne National Laboratory, Argonne, Ill., 1990. On Line Search Algorithms with Guaranteed Sufficient Decrease.
56. W. C. Davidon, Variable Metric Method for Minimization, Report ANL-5990 (rev.), Argonne National Laboratory, Argonne, Ill., 1959.
57. M. J. D. Powell, Comput. J., 7, 155 (1964). An Efficient Method for Finding the Minimum of a Function of Several Variables without Calculating Derivatives.
58. P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright, SIAM J. Sci. Statist. Comput., 4, 310 (1983). Computing Forward-Difference Intervals for Numerical Optimization.
59. L. B. Rall, Automatic Differentiation: Techniques and Applications, Lecture Notes in Computer Science 120, Springer-Verlag, Berlin/New York, 1981.
60. A. Griewank, in Mathematical Programming 1988, pp. 83-107, Kluwer Academic, Tokyo, 1988. On Automatic Differentiation.
61. L. C. W. Dixon, SIAM J. Opt., 1, 475 (1991). On the Impact of Automatic Differentiation on the Relative Performance of Parallel Truncated Newton and Variable Metric Algorithms.
62. B. E. Hingerty and S. Broyde, Biopolymers, 24, 2279 (1985). Carcinogen-Base Stacking and Base-Base Stacking in dCpdG Modified by (+) and (-) Anti BPDE.
63. A. Cauchy, Comp. Rend. Acad. Sci. Paris, 536 (1847). Méthode Générale pour la Résolution des Systèmes d'Équations Simultanées.
64. M. R. Hestenes and E. Stiefel, J. Res. Natl. Bur. Stand., 49, 409 (1952). Methods of Conjugate Gradients for Solving Linear Systems.
65. M. R. Hestenes, Conjugate Direction Methods in Optimization, Springer-Verlag, New York, 1980.
66. R. Fletcher and C. M. Reeves, Comput. J., 7, 149 (1964). Function Minimization by Conjugate Gradients.
67. M. J. D. Powell, in Lecture Notes in Mathematics, Vol. 1066, pp. 122-141, Springer, Berlin/New York, 1984. Nonconvex Minimization Calculations and the Conjugate Gradient Method.
68. M. J. D. Powell, Math. Prog., 12, 241 (1977). Restart Procedures of the Conjugate Gradient Method.
69. M. J. D. Powell, Math. Programming, 11, 42 (1976). Some Convergence Properties of the Conjugate Gradient Method.
70. D. F. Shanno, Math. Oper. Res., 3, 244 (1978). Conjugate Gradient Methods with Inexact Searches.
71. E. Polak and G. Ribière, Rev. Fr. Inform. Rech. Oper., 16, 35 (1969). Note sur la Convergence de Méthodes de Directions Conjuguées.
72. J. Stoer, Numer. Math., 28, 343 (1977). On the Relation between Quadratic Termination and Convergence Properties of Minimization Algorithms, Part I.
73. H. P. Crowder and P. Wolfe, IBM J. Res. Develop., 16, 431 (1972). Linear Convergence of the Conjugate Gradient Method.
74. A. Cohen, SIAM J. Numer. Anal., 9, 248 (1972). Rate of Convergence of Several Conjugate Gradient Algorithms.
75. P. Baptist and J. Stoer, Numer. Math., 28, 367 (1977). On the Relation between Quadratic Termination and Convergence Properties of Minimization Algorithms.
76. M. Al-Baali, Inst. Math. Appl. J. Numer. Anal., 5, 121 (1985). Descent Property and Global Convergence of the Fletcher-Reeves Method with Inexact Linear Search.
77. J. C. Gilbert and J. Nocedal, SIAM J. Opt., 2, 21 (1992). Global Convergence Properties of Conjugate Gradient Methods for Optimization.
78. Y. F. Hu and C. Storey, J. Opt. Theor. Appl., 71, 399 (1991). Global Convergence Result for Conjugate Gradient Methods.
79. O. Axelsson and G. Lindskog, Numer. Math., 48, 449 (1989). On the Rate of Convergence of the Preconditioned Conjugate Gradient Method.
80. J. D. Evans, J. Inst. Math. Appl., 4, 295 (1967). The Use of Pre-conditioning in Iterative Methods for Solving Linear Equations with Symmetric Positive Definite Matrices.
81. P. Concus, G. H. Golub, and D. P. O'Leary, in Sparse Matrix Computations, J. R. Bunch and D. J. Rose, Eds., Academic Press, New York, 1976, pp. 309-332. A Generalized Conjugate Gradient Method for the Numerical Solution of Elliptic Partial Differential Equations.
82. T. Schlick and M. Overton, J. Comput. Chem., 8, 1025 (1987). A Powerful Truncated Newton Method for Potential Energy Minimization.
83. D. F. Shanno and K. H. Phua, ACM Trans. Math. Software, 6, 618 (1980). Remark on Algorithm 500: Minimization of Unconstrained Multivariate Functions.
84. R. A. Scott and H. A. Scheraga, J. Chem. Phys., 42, 2209 (1965). Method for Calculating Internal Rotation Barriers.
85. R. A. Scott and H. A. Scheraga, J. Chem. Phys., 44, 3054 (1966). Conformational Analysis of Macromolecules. II. The Rotational Isomeric States of the Normal Hydrocarbons.
86. M. Levitt, J. Mol. Biol., 168, 595 (1983). Molecular Dynamics of Native Protein: I. Computer Simulation of Trajectories.
87. B. Lesyng and W. Saenger, Carbohydrate Res., 133, 187 (1984). Influence of the Orientation of Hydroxyl Groups on the Puckering Modes of Furanoid Rings.
88. C.-S. Tung, S. C. Harvey, and J. A. McCammon, Biopolymers, 23, 2173 (1984). Large-Amplitude Bending Motions in Phenylalanine Transfer RNA.
89. A. George and J. W. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, Englewood Cliffs, N.J., 1981.
90. I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices, Clarendon Press, Oxford, 1986.
91. Ph. L. Toint, in Sparse Matrices and Their Uses, I. S. Duff, Ed., Academic Press, New York, 1981. Towards an Efficient Sparsity Exploiting Newton Method for Minimization.
92. A. R. Curtis, M. J. D. Powell, and J. K. Reid, J. Inst. Math., 13, 117 (1974). On the Estimation of Sparse Jacobian Matrices.
93. M. J. D. Powell and Ph. L. Toint, SIAM J. Numer. Anal., 16, 1060 (1979). On the Estimation of Sparse Hessian Matrices.
References
69
94. D. P. O’Leary, Math. Prog., 23,20 (1982).A Discrete Newton Algorithm for Minimizing a Function of Many Variables. 95. J. E. Dennis, J . and J. J. MorC, SIAM Rev., 19,46 (1977). Quasi-Newton Methods, Motivation and Theory. 96. P. E. Gill and W, Murray, J, Inst. Math. Appl., 9, 91 (1972). Quasi-Newton Methods for Unconstrained Optimization. 97. A. Griewank and Ph. L. Toint, in Nonlinear Optimization 1981, M. J. D. Powell, Ed., pp. 301-312, Academic Press, New York, 1982. On the Unconstrained Optimization of Partially Separable Functions. 98. A. Griewank and Ph. L. Toint, Numer. Math., 39, 119 (1982). Partitioned Variable Metric Updates for Large Structured Optimization Problems. 99. D. F. Shanno and P. C. Kettler, Math. Comput., 24,657 (1970). Optimal Conditioning of Quasi-Newton Methods. 100. J. Nocedal, Math. Comput., 35,773 (1980).Updating Quasi-Newton Matrices with Limited Storage. 101. D. C. Liu and J. Nocedal, Math. Prog., 45, 503 (1989). On the Limited Memory BFGS Method for Large Scale Optimization. 102. J. C. Gilbert and C. Lemarichal, Math. Prog., 45,407 (1989).Some Numerical Experiments with Variable-Storage Quasi-Newton Algorithms. 103. A. Buckley and A. LeNir, Math. Prog., 27, 155 (1983). QN-like Variable Storage Conjugate Gradients. 104. M. J. D. Powell, Math. Prog., 34,34 (1986).How Bad are the BFGS and DFP Methods When the Objective Function Is Quadratic? 105. R. H. Byrd, J. Nocedal, and Y. Yuan, SIAM 1. Numer. Anal., 24, 1171 (1987). Global Convergence of a Class of Quasi-Newton Methods on Convex Problems. 106. A. R. Conn, N. I. M. Gould, and Ph. L. Toint, SIAM J. Numer. Anal., 25, 433 (1988). Global Convergence of a Class of Trust Region Algorithms for Optimization with Simple Bounds. 107. A. R. Conn, N. I. M. Gould, and Ph. L. Toint, Math. Comput., SO, 399 (1988). Testing a Class of Methods for Solving Minimization Problems with Simple Bounds on the Variables. 108. A. R. Conn, N. I. M. Gould, and Ph. L. Toint, Math. Prog., 2, 177 (1991). 
Convergence of Quasi-Newton Matrices Generated by the Symmetric Rank One Update.
109. S. G. Nash and J. Nocedal, S A M J. Opt., 1 358-372 (1991). A Numerical Study of the Limited Memory BFGS Method and the Truncated-Newton Method for Large-Scale Optimization. 110. X. Zou, I. M. Navon, F. X. Le Dimet, A. Nouailler, and T. Schlick, SIAMJ. Opt., in press. A Comparison of Efficient Large-Scale Minimization Algorithms for Optimal Control Applications in Meteorology. 111. S. C. Eisenstat and H. F. Walker, preprint, 1992. Globally Convergent Inexact Newton Methods. 112. R. S. Dembo, in Nonlinear Optimization, M. J. D. Powell, Ed., pp. 361-373. Academic Press, New York, 1982. Large Scale Nonlinear Optimization. 113. R. S. Dembo, S. C. Eisenstat, and T. Steihaug, SIAM J. Numer. Anal., 19, 400 (1982). Inexact Newton Methods. 114. R. S. Dembo and T. Steihaug, Math. Prog., 26, 190 (1983). Truncated-Newton Algorithms for Large-Scale Unconstrained Optimization.
11.5. S. G. Nash, in Numerical Optimization 1984, P. T. Boggs, R. H. Byrd, and R. B. Schnabel, Eds., pp. 119-136, SIAM, Philadelphia, 1985. Solving Nonlinear Programming Problems Using Truncated-Newton Techniques.
70
Optimization Methods in Computational Chemistry
116. S. G. Nash and A. Sofer, Oper. Res. Lett., 9,219 (1990).Assessing a Search Direction Within a Truncated Newton Method. 117. S. G. Nash, SIAM ]. Sci. Statist. Comput., 6 , 599 (1985). Preconditioning of TruncatedNewton Methods. 118. S. A. Zenios and M. C. Pinar, S / A M ] . Sci. Statist. Comput., 13 (Sept. 1992). Parallel BlockPartitioning of Truncated Newton for Nonlinear Network Optimization. 119. P. Deufihard, in Proceedings of the Copper Mountain Conference on Iterative Methods, Copper Mountain, Colorado, April 1-5, 1990. Global Inexact Newton Methods for Very Large Scale Nonlinear Problems. 120. J. W. Ponder and F. M. Richards, J. Comput. Chem., 8,1016 (1987). An Efficient Newtonlike Method for Molecular Mechanics Energy Minimization of Large Molecules. 121. T. Schlick and A. Fogelson, ACM Truns. Math. Software, 18, 46 (1992). TNPACK-A Truncated Newton Minimization Package for Large-Scale Problems. I. Algorithm and Usage. 122. T. Schlick and A. Fogelson, ACM Trans. Math. Software, 18, 71 (1992). TNPACK-A Truncated Newton Minimization Package for Large-Scale Problems. 11. Implementation Examples. 123. L. Fauci and A. Fogelson, Comm. Pure Appl. Math., in press. Truncated Newton Methods and the Modeling of Immersed Elastic Structures. 124. H. Matthies and G. Strang, Int. ]. Numer. Methods Engin., 14,1613 (1979).The Solution of Nonlinear Finite Element Equations. 125. S. C. Eisenstat, M. H. Shultz, and A. H. Sherman, in Advances in Computer Methods for Partial Differential Equations, R. Vichnevetsky, Ed., pp. 33-39, AICA, New Brunswick, N.J., 1975. Efficient Implementation of Sparse Symmetric Gaussian Elimination. 126. S. C. Eisenstat, M. H. Shultz, and A. H. Sherman, in Advances in Computer Methods for Partial Differential Equations, R. Vichnevetsky, Ed., pp. 40-45, AKA, New Brunswick, N.J., 1975. Application of Sparse Matrix Methods to Partial Differential Equations. 127. S. C. Eisenstat, M. H. Schultz, and A. H. Sherman, SIAM]. Sci. Statist. 
Comput., 2 , 225 (1981).Algorithms and Data Structures for Sparse Symmetric Gaussian Elimination. 128. S. C. Eisenstat, M. C. Gursky, M. H. Schultz, and A. H. Sherman, Int. J. Numer. Methods Engin., 18, 1145 (1982).Yale Sparse Matrix Package. I. The Symmetric Codes. 129. D. J. Rose, in Graph Theory and Computing, R. C. Read, Ed., Academic Press, New York, 1972. A Graph-Theoretic Study of the Numerical Solution of Sparse Positive-Definite Systems of Linear Equations. 130. T. C. Oppe, W. D. Joubert, and D. R. Kincaid, NSPCG User’s Guide. Version 1.0: A Package for Solving Large Sparse Linear Systems by Various Iterative Methods. Technical Report CNA-216, Center for Numerical Analysis, University of Texas at Austin, 1988. 131. Y. Saad, SPARSEKIT: A Basic Toolkit for Sparse Matrix Computations, NASA Ames Research Center, Moffett Field, Calif., May 21, 1990. 132. J. W. H. Liu, ACM Trans. Math. Software, 11, 141 (1985). Modification of the MinimumDegree Algorithm by Multiple Elimination. 133. A. George, M. T. Heath, J. Liu, and E. Ng, SIAM]. Statist. Comput., 9,327 (1988).Sparse Cholesky Factorization on a Local-Memory Multiprocessor. 134. J. M. Ortega, SIAM J. Opt., 1, 565 (1991). Orderings for Conjugate Gradient Preconditioners. 135. J. O’Neil and D. B. Szyld, SIAM ]. Sci. Statist. Comput., 11, 811 (1990).A Block Ordering Method for Sparse Matrices. 136. S. G. Nash and A. Sofer, SZAM]. Opt., 1,530 (1991).A General-Purpose Parallel Algorithm for Unconstrained Optimization. 137. R. B. Schnabel and E. Eskow, SIAM ]. Sci. Statist. Comput., 11, 1136 (1990). A New Modified Cholesky Factorization.
References
71
138. E. Eskow and R. B. Schnabel, Software for a New Modified Cbolesky Factorization. Computer Science Department Technical Report CU-(3-443-89, University of Colorado, Boulder, 1989. 139. T. Schlick, S A M 1. Sci. Statist. Comput., in press. Modified Cholesky Factorizations for Sparse Preconditioners. 140. J. J. Mort, B. S. Garbow, and K. E. Hillstrom, ACM Trans. Math. Software, 7 , 17 (1981). Testing Unconstrained Optimization Software. 141. J. J. Mort, 8. S. Garbow, and K. E. Hillstrom, ACM Trans. Math. Software, 7 , 136 (1981). Algorithm 566: FORTRAN Subroutines for Testing Unconstrained Optimization Software. 142. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users’ Guide, Society for Industrial and Applied Mathematics, Philadelphia, 1992.
CHAPTER 2
Predicting Three-Dimensional Structures of Oligopeptides
Harold A. Scheraga
Baker Laboratory of Chemistry, Cornell University, Ithaca, New York 14853-2301

Reviews in Computational Chemistry, Volume III. Kenny B. Lipkowitz and Donald B. Boyd, Editors. VCH Publishers, Inc., New York, © 1992.

INTRODUCTION
To address the main topic of this chapter, we must first define an oligopeptide. Recognizing that such a definition is an arbitrary one, we consider an oligopeptide to be a linear condensation polymer of a limited number (say, fewer than about 10 or 20) of amino acid residues. If the polymer chain is much longer, it is usually referred to as a polypeptide. As the methodologies to treat short and long chains are quite similar, we shall blur the distinction between these two types of molecules and use the terms oligopeptide and polypeptide interchangeably. Of course, the shorter the chain is, the smaller is the number of degrees of freedom characterizing its conformation; hence, the magnitude of the required computational effort is relatively small. Even for longer polypeptides, the magnitude of the computational effort is small if the polypeptide consists of repeating sequences and a regularity constraint (i.e., one in which each sequence has the same conformation) is imposed. Consequently, the prediction of the three-dimensional structure of an oligopeptide or a regular repeating polypeptide has recently become an essentially solved (or at least soluble) problem. This state of affairs stands in contrast with that of a globular protein for which the size and regularity constraints do not obtain, but, nevertheless, progress is being made, even for such larger molecules.

What does it mean to “predict the three-dimensional structure of an oligopeptide”? In contrast to a globular protein for which the average conformation is recognizable, with small fluctuations from the average, a small linear (and, less so, a small cyclic) oligopeptide consists of an ensemble of low-energy conformations. The partition function for such a system must contain contributions from all significantly populated low-energy conformers. One can therefore ask (1) for the conformation of lowest energy, U; or (2) for the average conformation (Boltzmann-averaged U); or (3) for the conformation of minimum free energy, G, taking into account the conformations of various energies U, a solvation contribution V (which may be regarded as a potential of mean force), and the vibrational motion of each conformation. The difference between the free energies of two conformations, taking the vibrational motion into account, is illustrated in Figure 1, where the broad minimum of the higher-energy conformation B could endow it with enough vibrational entropy to reduce its free energy below that of the lower-energy conformation A. In this chapter, we present examples that answer each of the aforementioned questions, but we concentrate on searches for the global minimum of the potential energy, with some attention paid to the solvation and conformational entropy contributions.

Figure 1 Schematic representation of a potential energy surface with two minima. The larger width of the higher-energy minimum B (leading to greater flexibility compared with the narrower lower-energy minimum A) could result in a lower free energy for conformation B that fluctuates about the higher-energy minimum.

The methodology to compute the three-dimensional structures of oligopeptides should be extendable to larger molecules, that is, proteins. Besides the intrinsic interest in oligopeptides as important biologically active molecules, that is why so much effort is devoted to these small molecules, namely, for an ultimate understanding of the conformational properties of the larger ones. For globular proteins, however, for which it is meaningful to speak of an internal core surrounded by surface residues, one can introduce additional constraints in the computation, for example, separate treatment of the core and surface residues and use of pattern recognition based on the protein structures that have already been determined. In addition, anticipated increases in the power of computer hardware and software, use of parallel processing, and improved treatment of electrostatics and hydration should all contribute to making the protein structure prediction problem tractable.

It should be emphasized that, in the last analysis, if we want to gain an understanding of the physical basis for folded structures of oligopeptides and proteins, we must learn how the energy leads to such structures; empirical “rules” and games may possibly predict structures, but they provide no insight as to how they arose.

Confining our attention in this chapter to oligopeptides, we discuss the theoretical foundations of the computational methods, describe the computational methodology, present the results of such computations together with their comparison with experiment, discuss the future of oligopeptide structure computations, and, finally, consider the extension of this methodology to globular proteins. For details not covered in this chapter, the reader is referred to earlier reviews by the author and his colleagues,1-10 as well as to reviews by Leach,11 Troyer and Cohen,12 Dinur and Hagler,13 Williams,14 Boyd,15 and McCammon and Harvey.16
THEORETICAL FOUNDATIONS
The theoretical foundations for computing stable conformations of macromolecules have been developed in a series of publications1,17-21 and summarized in a review article.2 It is assumed that the native conformation of a macromolecule is that which corresponds to the global minimum of a function G, where G is the sum of the potential energy of all intrapolypeptide interactions, U, the free energy for all interactions involving the solvent, V (which includes the free energy of solvation), and a free-energy term arising from vibrational entropy. As mentioned, the vibrational entropy of conformation B may endow it with a lower free energy than that of conformation A (Figure 1). The computation of stable conformations of oligopeptides then involves three different problems: First, the geometry (bond lengths and bond angles) of the chain must be known, and the functional form (and the values of the parameters) of U and V must be determined. Second, the sum U + V must be minimized to locate the various local minima. Third, the vibrational (conformational) entropy of each minimum within a reasonable range of the global one, that is, within about 1 kcal/mol, must be calculated to locate the global minimum of G.
As far as the first problem is concerned, conventions for the description of the polypeptide chain, geometrical data, and procedures for transformation of coordinates were discussed in a previous review.1 Since that review was published, a new set of conventions has been adopted by an international commission on nomenclature22 and is in general use today. The conformational energy, U + V, of an arbitrary conformation may then be expressed as a function of the coordinates of the macromolecule. In the following sections, we discuss the generation of open-chain and cyclic polypeptides, the nature of G, and procedures for finding the global minimum of this function, including the calculation of vibrational entropy.
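The role of the vibrational term in G (the situation of Figure 1) can be illustrated with a minimal sketch. It is not the procedure of the cited references; it assumes a classical harmonic approximation, in which each vibrational mode of curvature k contributes (kT/2) ln k to the free energy of a minimum, up to an additive constant that cancels when the minima being compared have the same number of modes. The function names, energies, and curvatures are hypothetical and were chosen only to show a broad, higher-energy minimum B ending up below a narrow, lower-energy minimum A.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def harmonic_free_energy(u_min, curvatures, temperature=298.15):
    """Classical harmonic free energy of one minimum (relative scale).

    u_min      -- value of U + V at the minimum, in kcal/mol
    curvatures -- one positive curvature per vibrational mode; any consistent
                  unit works because only differences between minima matter
    """
    kt = K_B * temperature
    return u_min + 0.5 * kt * sum(math.log(k) for k in curvatures)


def boltzmann_populations(free_energies, temperature=298.15):
    """Normalized Boltzmann weights of a set of minima."""
    kt = K_B * temperature
    g_ref = min(free_energies)  # shift for numerical stability
    weights = [math.exp(-(g - g_ref) / kt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical minima: A is lower in energy but narrow, B is higher but broad.
g_a = harmonic_free_energy(0.0, [100.0] * 4)
g_b = harmonic_free_energy(1.0, [1.0] * 4)
pop_a, pop_b = boltzmann_populations([g_a, g_b])
```

With these illustrative numbers, the entropic term outweighs the 1 kcal/mol energy difference, so B is the more populated state even though A has the lower potential energy.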
GENERATION OF OLIGOPEPTIDE CHAIN
To compute the free energy of an oligopeptide, it is first necessary to express the Cartesian coordinates of every atom of the molecule (in any conformation) in the same coordinate system. The coordinates of each amino acid residue and those of the end groups of the chain are first expressed in local coordinate systems. The chain is then built in a prespecified amino acid sequence and conformation by connecting the individual residues and end groups, with proper adjustment of the dihedral angles.
Residue Geometry
Initially, each residue is oriented in a local Cartesian coordinate system with the N atom of the residue at the origin, the Cα atom on the positive x axis, and the amide H atom in the xy plane with H in the +y direction (Figure 2). The z axis is fixed to give a right-handed coordinate system. The input coordinates for the residue begin with the N, H, and Cα and end with an N, which is used to connect the next residue or end group. Input coordinates for all residues (and end groups) in such standard local coordinate systems, based on crystallographic data [including the dependence of the bond angle at the α carbon, τ(NCαC'), on amino acid type], are given in listings for the Empirical Conformational Energy Program for Peptides (ECEPP).23
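The local coordinate system just described can be sketched in a few lines. This is only an illustration of the stated convention (N at the origin, Cα on the positive x axis, amide H in the xy plane with positive y, right-handed z axis); it is not taken from the ECEPP listings, and the function names and the sample coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)


def local_frame(n_atom, ca_atom, h_atom):
    """Return unit vectors (ex, ey, ez) of the residue-local frame."""
    ex = _unit(_sub(ca_atom, n_atom))           # x axis: N -> C-alpha
    nh = _sub(h_atom, n_atom)
    along_x = _dot(nh, ex)
    # Remove the x component of N -> H so that ey is perpendicular to ex
    # and H ends up in the xy plane with a positive y coordinate.
    ey = _unit(tuple(nh[i] - along_x * ex[i] for i in range(3)))
    ez = _cross(ex, ey)                         # right-handed z axis
    return ex, ey, ez


def to_local(point, n_atom, frame):
    """Coordinates of `point` in the local frame whose origin is the N atom."""
    d = _sub(point, n_atom)
    return tuple(_dot(d, axis) for axis in frame)


# Hypothetical laboratory-frame coordinates (angstroms).
N = (1.0, 1.0, 1.0)
CA = (1.0, 1.0, 2.453)
H = (1.0, 0.0, 0.5)
frame = local_frame(N, CA, H)
ca_local = to_local(CA, N, frame)
h_local = to_local(H, N, frame)
```

In the resulting frame, Cα has only an x component and H has a positive y component and no z component, as the convention requires.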
End-Group Geometry
N-Terminal Group
The coordinates of the N-terminal group are expressed in the same local coordinate system as the amino acid residue. As an example, the CH3CO- end group (including the atoms N, H, and Cα of the first residue) is shown in Figure 3. Thus, there is an overlap of three atoms (N, H, Cα) when one begins with the first end group and then attaches a residue.

Figure 2 Geometry of a full L-amino acid residue, where R represents the side chain. All solid lines lie in the xy plane; wedges indicate positions of atoms above or below this plane. A D-amino acid is identical except that the H and R groups attached to Cα have z coordinates of opposite sign.
C-Terminal Group
The coordinates of the C-terminal group are expressed in a different (local) coordinate system. As an example, the -NHCH3 end group (including the carbonyl O atom of the preceding amino acid residue) is shown in Figure 4. The origin is at the C' atom. The local x axis is chosen so that the Cα atom of the preceding residue lies in the -x direction; the O atom and the N of the end group lie in the xy plane, with a positive y coordinate assigned to the O.
Constructing a Molecule
One begins with the end group at the N terminus and uses all end-group coordinates including the N, H, and Cα. One next adds coordinates from the first full residue, and all subsequent ones, finally ending up with the C-terminal group. To connect all the residues and end groups together and to express the coordinates of all of the atoms in one coordinate system (that of the initial end group and residue), it is necessary to carry out a series of rotations and translations of the local coordinate systems (of the type shown in Figures 2 to 4) with appropriate matrix operations. Figure 5 illustrates two residues that are to be connected by such operations. Corresponding matrix operations are used to generate the side chains. When the residues are connected together, they are also rotated to the desired conformations [i.e., specific values of φ, ψ, ω, and χ of each residue (Figure 6)] for subsequent computations of the energy. These matrix operations are described in the ECEPP manual.23

Figure 3 Geometry of the CH3CO- end group. All solid lines lie in the xy plane; wedges indicate positions of atoms above or below this plane.

Figure 4 Geometry of the -NHCH3 end group. All solid lines lie in the xy plane; wedges indicate positions of atoms above or below this plane. The Cα, C', and O atoms are not part of this end group, but serve only to orient it.

Figure 5 Two amino acid residues before rotation through the angle 6 and translation to form a trans peptide bond. Each residue is shown in its own local coordinate system, but the second residue has been rotated by 180° around its local x axis. The x, y and x', y' axes are coplanar. All solid lines lie in this plane; wedges indicate positions of atoms above or below this plane.
Ring Closure without Symmetry
The aforementioned procedures are used for generating open-chain molecules. It is also necessary to be able to generate cyclic structures or, equivalently, loop segments between two fixed points of a polypeptide chain. Gō and Scheraga have developed such procedures for exactly closed rings without24-29 and with30,31 symmetry, in molecules whose bond lengths and bond angles are kept fixed. In a chain or loop of n dihedral angles, only n - 6 dihedral angles can be varied independently, because the values of six of the dihedral angles are determined by the relationships that ensure exact ring closure. Therefore, to generate a cyclic structure in a molecule without symmetry, n - 6 dihedral angles are selected independently, and one solves for the remaining six24,29 dihedral angles. Dudek and Scheraga32 developed an alternative formulation of these equations, and Palmer and Scheraga33,34 modified the original formulation24 to take into account the different equilibrium values for the backbone bond angles τ(NCαC') of each of the 20 amino acid residues. Bruccoleri and Karplus35 also modified the original formulation24 but by allowing for bond angle bending; however, Palmer and Scheraga33 showed that it is not necessary to alter bond angles to obtain satisfactory closures with rigid geometry. Other procedures for treating loops have been presented by Levinthal and coworkers36,37 and by Moult and James.38

Figure 6 Perspective drawing of terminally blocked L-alanine, the backbone of which may be considered a prototype of a section of an oligopeptide chain. The dashed lines indicate the limits of the alanine residue. The Greek letters designate the backbone dihedral angles.22 For larger side chains, the dihedral angles for rotation about the side-chain bonds are designated by χ1, χ2, and so on.22
Ring Closure with Symmetry
For molecules with symmetry, the condition of exact ring closure decreases the number of independent variables. For example, for molecules with Cn, I, or S2n symmetry, the number of independent variables decreases from m (i.e., the number of variable dihedral angles in a symmetry unit) to m - 2, m - 3, and m - 1, respectively. Equations analogous to those derived for exact ring closure without symmetry24 are presented in Reference 30. This procedure has been applied to cyclohexaglycyl31 (with C6, S6, C3, I, and C2 symmetry) and to the cyclic decapeptide gramicidin S39 (with C2 symmetry). For example, for gramicidin S, which contains two proline residues, there are 18 backbone dihedral angles (not counting the peptide bond dihedral angles), and the procedure described in Reference 24 would have involved 12 independent variables; however, because of the C2 symmetry, with m = 9 variable dihedral angles in the symmetry unit, there are only 7 independent variables whose range has to be searched to find the remaining 2 dependent variables to effect exact ring closure (see “Build-up Method”).
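The counting rules quoted in the two ring-closure sections can be collected in a small helper, shown here only as a bookkeeping sketch of the numbers stated in the text (n - 6 without symmetry; m - 2, m - 3, and m - 1 for Cn, I, and S2n symmetry). The function name and the string labels for the symmetry types are hypothetical.

```python
# Number of dihedral angles fixed by the exact ring-closure conditions,
# as stated in the text for each case.
CLOSURE_CONSTRAINTS = {
    None: 6,    # no symmetry: n - 6 independent variables
    "Cn": 2,    # m - 2
    "I": 3,     # m - 3
    "S2n": 1,   # m - 1
}


def independent_dihedrals(n_variable, symmetry=None):
    """Number of dihedral angles that can be chosen freely.

    n_variable -- number of variable dihedral angles in the whole ring
                  (no symmetry) or in one symmetry unit (with symmetry)
    symmetry   -- None, "Cn", "I", or "S2n"
    """
    if symmetry not in CLOSURE_CONSTRAINTS:
        raise ValueError(f"unknown symmetry type: {symmetry!r}")
    free = n_variable - CLOSURE_CONSTRAINTS[symmetry]
    if free < 0:
        raise ValueError("too few variable dihedral angles for exact closure")
    return free


# Gramicidin S example from the text: 18 backbone dihedral angles in the
# ring, or m = 9 in the symmetry unit when C2 symmetry is imposed.
without_symmetry = independent_dihedrals(18)
with_c2_symmetry = independent_dihedrals(9, "Cn")
```

For gramicidin S this reproduces the reduction from 12 to 7 independent variables described above.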
EARLY USE OF HARD-SPHERE POTENTIAL
Before introducing the complete function G to calculate oligopeptide conformation, it is instructive to examine a simplified historical approach, based on the use of only a hard-sphere potential. Consideration of early computations40-44 with a hard-sphere potential provides very useful, though approximate, insights into the stereochemistry of an oligopeptide chain, a segment of which is shown in Figure 6. Some of the literature refers to this structure as a dipeptide because it contains two peptide bonds, but this is bad biochemical terminology. Instead, this structure should be referred to as a terminally blocked amino acid residue (L-alanine in Figure 6, hence N-acetylalanyl-N'-methyl amide). The terminal methyl groups are the analogs of the α carbons of the neighboring residues when this structure is a section of a longer oligopeptide chain. The geometry (bond lengths and bond angles), taken from the X-ray structural literature,23 is maintained fixed, and the conformation is altered by varying the dihedral angles for rotation about the bonds of the backbone and side chains. The coordinate transformation procedures referred to in the previous section are used to define the conformation in a given coordinate system.

The early calculations40-44 employed only the hard-sphere potential (left-hand side of Figure 7), in which there is no attractive term and only an infinite repulsion when two atoms approach each other within a distance less than or equal to the sum of their van der Waals radii. Therefore, certain conformations of the terminally blocked amino acid residue X are disallowed because of steric restrictions. The allowed and disallowed regions can be represented on a (φ,ψ) map,40,42,45 with the side chains fixed in their preferred rotamer states, as in Figure 8. From this figure, it can be seen that more than about 50% of the area is disallowed for any of the amino acids. For the simplest amino acid, glycine, that is, for the structure where X = Gly, all of the differently shaded areas are allowed. The diagram is symmetrical because of the absence of an R substituent on the α carbon of glycine. As soon as an R group is added (e.g., the methyl group of alanine), then further restrictions appear, as shown. Larger side chains, especially those, like valine, with branching at the β carbon, lead to further restrictions. The diagram for D residues would be the mirror image of that shown in Figure 8 for L residues.

Figure 7 Schematic drawing comparing the hard-sphere and Lennard-Jones 6-12 potentials. The potential energy U is plotted as a function of the interatomic distance r.

The calculations represented in Figure 8 illustrate the simple fact that atoms occupy space; that is, the hard-sphere potential already provides a considerable amount of restriction on the allowed conformations of terminally blocked amino acid residues and hence of oligopeptides built up from such residues. This fact will remain true, to a first approximation, even when more realistic energy functions are introduced.
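The hard-sphere criterion can be written down directly. The sketch below is an illustration rather than the procedure of the early calculations: it returns an infinite energy when any pair of atoms lies at or within the sum of the assigned radii and zero otherwise. The function name, the radii, the coordinates, and the explicit list of excluded (covalently linked) pairs are all assumptions made for the example.

```python
import math
from itertools import combinations


def hard_sphere_energy(coords, radii, excluded=frozenset()):
    """Hard-sphere energy of one conformation: 0.0 if allowed, inf if not.

    coords   -- list of (x, y, z) atomic positions
    radii    -- one contact radius per atom, in the same length unit
    excluded -- pairs (i, j) with i < j that are skipped, for example atoms
                joined by a bond or a bond angle (an assumption of this sketch)
    """
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in excluded:
            continue
        if math.dist(coords[i], coords[j]) <= radii[i] + radii[j]:
            return math.inf   # steric overlap: conformation disallowed
    return 0.0                # no attractive term in this potential


# Hypothetical three-atom example with atoms 0 and 1 bonded.
radii = [1.2, 1.2, 1.2]
bonded = frozenset({(0, 1)})
allowed = hard_sphere_energy([(0, 0, 0), (1, 0, 0), (4, 0, 0)], radii, bonded)
clash = hard_sphere_energy([(0, 0, 0), (1, 0, 0), (3, 0, 0)], radii, bonded)
```

Evaluating such a function on a grid of (φ,ψ) values, with coordinates regenerated for each grid point, would give an allowed/disallowed map of the kind shown in Figure 8.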
MORE REALISTIC POTENTIALS
Following the application of the hard-sphere potential, more realistic potential functions were introduced. The coordinate transformation procedure for generating a chain and the procedure for generating rings, however, remained the same. After an open-chain or cyclic peptide has been generated, it is then necessary to calculate the function G and minimize it. The procedures involved are discussed below.
Potential Functions
Several algorithmic formulations have been developed for computations on peptides and proteins. Those in most widespread use are AMBER,46,47 CHARMM,48 DISCOVER,49 and ECEPP.23,50-52 Similar formulations are also used for computations on small organic molecules, for example, MM2.53 In all of them, the force fields are expressed in terms of empirical potential energy functions. The mathematical form of each component of the function is based largely on some physical concept concerning the nature of the component, but the form is specified to render it efficient for computational programming. The functions also contain many constants that describe the molecular geometry (i.e., bond lengths and bond angles) and the strength of particular interatomic interactions. The constants are generally parameterized on empirical structural and thermodynamic information for small organic model molecules, but they may also include some information derived from quantum mechanical computations (e.g., the partial charges on atoms).

Figure 8 Allowed areas of the steric map for various terminally blocked amino acid residues X.45 In area 0, no conformations are allowed. Conformations in areas 1 to 4 are allowed for X = glycine, in areas 2 to 4 for X = alanine, in areas 3 to 4 for higher straight-chain homologs, whereas only area 4 is allowed for X = valine or isoleucine. The circles marked R and L indicate the locations of the standard right- and left-handed α helices on the steric map. Axes: φ(N-Cα) and ψ, each from 0° to 360°.

The potential energy is expressed in the form of atom-centered pair potentials, with the energy of the molecule being a sum over all pairwise interactions. The latter include contributions from electrostatic, nonbonded, and hydrogen-bonding interactions. There is usually an intrinsic torsional potential for rotation about single bonds. The various components are selected in a self-consistent manner in any given formulation of the potential function. Only the total pair interaction, and not its ingredients, has physical meaning. Therefore, terms taken from various formulations should not be mixed in a given computation. Recently, critical comparisons and evaluations have been made of several of these potentials.54-57 Quantitatively, these functions (e.g., in ECEPP) are given by the expression
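$$
U = \sum_{i \neq j} \varepsilon_{ij}\left[\left(\frac{r_{ij}^{0}}{r_{ij}}\right)^{12} - 2\left(\frac{r_{ij}^{0}}{r_{ij}}\right)^{6}\right]
+ \sum_{i \neq j} \varepsilon_{ij}\left[5\left(\frac{r_{ij}^{0}}{r_{ij}}\right)^{12} - 6\left(\frac{r_{ij}^{0}}{r_{ij}}\right)^{10}\right]
+ \sum_{i \neq j} \frac{q_i q_j}{D\,r_{ij}}
+ \sum_{k} \frac{A_k}{2}\left(1 \pm \cos n\theta_k\right) \qquad [1]
$$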
where $\varepsilon_{ij}$ and $r_{ij}^{0}$ are the potential well depth and position of the minimum of the pair interaction energy (the 12-6 expression pertains to the nonbonded energy, and the 12-10 expression to the hydrogen-bonding energy), $q$ is the partial atomic charge, $D$ is the dielectric constant, $r_{ij}$ is the distance between the two interacting atoms, $A_k$ is the barrier height for rotation around the kth bond, $\theta_k$ is the dihedral angle, and $n$ is the n-fold degeneracy of the torsional potential. The development of this potential function and its parameterization may be traced through References 58 to 70. The original ECEPP function23 was published in 1975. It was upgraded in 1983 (ECEPP/2)50 and in 1992 (ECEPP/3),52 as improved experimental data became available for its parameterization.

By applying the potential function of Eq. [1], in the ECEPP form, the (φ,ψ) map of the molecule in Figure 6 takes on the appearance illustrated in Figure 9, where contours of constant energy (in kcal/mol) are shown. Now, instead of regarding conformations as allowed or disallowed (as in Figure 8), the relative energies of the various conformations are specified. The dots represent local minima. Those at (φ,ψ) = (-154°, 153°), (-74°, -45°), (54°, 57°), (-84°, 79°), and (78°, -64°) represent the almost fully extended, the right- and left-handed α-helical, and the C7eq and C7ax conformations, respectively. The global minimum corresponds to the C7eq conformation, the C7ax conformation being of very high energy according to the ECEPP/3 potential function. The C7eq and C7ax, that is, the seven-membered equatorial and axial ring conformations, respectively, are illustrated in Figure 10. Corresponding (φ,ψ) maps for the AMBER and CHARMM potential functions differ somewhat from that of Figure 9 and are illustrated in Reference 55. An energy contour map for N-acetylglycyl-N'-methylamide is shown in Figure 11. In contrast to the map in Figure 9, the one for the glycyl residue is symmetrical because this residue is not chiral.
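The individual contributions described for Eq. [1] can be sketched as small functions. The forms below follow from the definitions given in the text (a 12-6 and a 12-10 term whose minimum lies at the stated position with the stated well depth, a Coulomb term, and an n-fold torsional term with a given barrier height); they are an illustration, not the ECEPP code. The function names, the default dielectric constant, and the conversion factor of 332.0 (which gives kcal/mol for charges in electronic units and distances in angstroms) are assumptions of this sketch, and the sample parameter values are hypothetical.

```python
import math


def nonbonded_12_6(r, eps, r0):
    """12-6 nonbonded energy with well depth eps and minimum at r0."""
    s = (r0 / r) ** 6
    return eps * (s * s - 2.0 * s)


def hbond_12_10(r, eps, r0):
    """12-10 hydrogen-bond energy with well depth eps and minimum at r0."""
    ratio = r0 / r
    return eps * (5.0 * ratio ** 12 - 6.0 * ratio ** 10)


def electrostatic(qi, qj, r, dielectric=2.0, coulomb_constant=332.0):
    """Coulomb energy of two partial charges (assumed unit conversion)."""
    return coulomb_constant * qi * qj / (dielectric * r)


def torsion(theta_deg, barrier, n, sign=1):
    """Intrinsic torsional energy (barrier / 2) * (1 +/- cos(n * theta))."""
    return 0.5 * barrier * (1.0 + sign * math.cos(n * math.radians(theta_deg)))


# Hypothetical parameters, used only to show the behavior of each term.
e_nb_min = nonbonded_12_6(3.0, eps=0.1, r0=3.0)   # equals -eps at r = r0
e_hb_min = hbond_12_10(1.9, eps=2.0, r0=1.9)      # equals -eps at r = r0
e_el = electrostatic(0.5, -0.5, 3.0)
e_tor_top = torsion(0.0, barrier=2.0, n=3)        # top of the barrier
```

A total energy for a conformation would be obtained by summing the pair terms over all interacting atom pairs and the torsional term over all rotatable bonds, with every parameter taken from one and the same force field, as emphasized above.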
The distributions of the dihedral angles in X-ray structures of proteins are illustrated in Figures 12, 13, and 14, for nonglycyl-nonprolyl residues, glycyl residues, and prolyl residues, respectively. The maps in Figures 12 and 13 resemble those in Figures 9 and 11, respectively. The map for prolyl residues in Figure 14 reflects the restriction on φ because of the pyrrolidine ring and the restriction on ψ to two regions, the right-handed α helix and the poly-L-proline regions around (φ,ψ) = (-75°, -35°) and (-75°, 150°), respectively. The low-energy conformational states of all 20 naturally occurring amino acids52,71 and of some dipeptides52,72-75 and oligomers76 have been reported.

The foregoing discussion pertains to oligopeptides with trans peptide groups connecting two amino acid residues. The energetic factors favoring trans over cis and the planarity of the peptide group have been identified.77 For peptide groups preceding proline, the cis conformation is energetically competitive with the trans form.52,72,76,77

Figure 9 Conformational energy contour map of N-acetyl-N'-methylalanine amide, for χ1 = 60°. Locations of minima are indicated by the filled circles. The contour lines are labeled with energy in kcal/mol above the minimum-energy point at (φ,ψ) = (-84°, 79°). Axes: φ (degrees) and ψ (degrees), each from -180° to 180°.
Figure 10 Stereographic views of hydrogen-bonded ring conformations of N-acetyl-N'-methylalanine amide. A. C7eq (φ = -84°, ψ = 79°). B. C7ax (φ = 78°, ψ = -64°).
It is of interest to compare the (φ,ψ) maps for terminally blocked alanine computed with both the hard-sphere potential and a more realistic potential. Such a comparison is illustrated in Figure 15 (however, with an early version of the realistic potential78 and with an earlier convention). It can be seen from Figure 15 that the hard-sphere potential captures a large measure of the stereochemical properties of the molecule; that is, the repulsive part of the Lennard-Jones potential is a dominant component of the total energy of Eq. [1], and the other terms modulate the stereochemical properties to a lesser extent.
Some formulations of the potential energy function (e.g., References 27 and 79, and AMBER, CHARMM, and DISCOVER, as well as most force fields used in small-molecule studies) include terms that allow for bond stretching and bond angle bending, that is, for flexible geometry. Hence, the terms in Eq. [1] are augmented by the expression27
$$U = \tfrac{1}{2}\sum\left[K_b(b - b_0)^2 + K_\tau(\tau - \tau_0)^2 + K_\Delta\Delta^2 + K_l(l - l_0)^2 + K_l'(l - l_0)\right]$$ [2]
where the terms pertain to bond stretching, bond angle bending, out-of-plane deformation, and corner interaction, respectively. In principle, it is more rigorous to include the terms in Eq. [2] because molecules are not completely rigid. Allowing for flexibility is a necessity when details of the molecular geometry must be determined in small, highly substituted organic molecules, because most local steric overlaps can be relieved (fully or partially) only by distortions of the geometry. On the other hand, the consideration of flexible geometry usually is much less helpful (if at all) in the structural analysis of oligopeptides and proteins, because constraints are relieved by other means and because the use of flexible geometry may even introduce complications or numerical uncertainties, for the following reasons.54-57
Figure 11 Conformational energy contour map of N-acetyl-N'-methylglycine amide. Locations of minima are indicated by the filled circles. The contour lines are labeled with energy in kcal/mol above the minimum-energy points at (φ,ψ) = (-83°,76°) and (83°,-76°).
First, ample evidence exists that equilibrium values for bond lengths and bond angles in oligopeptides and proteins are close to those observed in crystals of amino acids; that is, they already take into account short-range atomic interactions. In contrast to small organic molecules, therefore, most of the unfavorable atomic contacts (i.e., steric overlaps) in a folded oligopeptide or protein arise from long-range interactions, that is, from overlaps between atoms far from each other in the covalent structure. Such overlaps can be relieved easily by rotations about intervening single bonds, without the need for varying bond lengths or bond angles.
Figure 12 Plot of the conformations of 5986 nonglycyl-nonprolyl residues (from X-ray structures of proteins) in the conventional (φ,ψ) map. α, ε, α*, and ε* represent convenient bounded regions encompassing frequently occurring conformations.
Figure 13 Plot of the conformations of 560 glycyl residues (from X-ray structures of proteins) in the conventional (φ,ψ) map.
Figure 14 Plot of the conformations of 326 prolyl residues (from X-ray structures of proteins) in the conventional (φ,ψ) map.
Second, the formal representation of these variations in terms of bond stretching and bond angle bending involves a simplification, because the accompanying energy is generally expressed as a harmonic function of the distortion, as in Eq. [2]. This approximation is valid only for distortions with small amplitudes. For example, anharmonic terms become significant for deviations of several degrees in bond angles. Nevertheless, anharmonic terms are usually not included in the potential energy function, mostly because reliable numerical values of the anharmonicity constants are lacking. This may result in severe numerical errors in the potential energy and sometimes in computed structures with unrealistic bond lengths and bond angles.
On the other hand, when rigid geometry is used, the number of variables is decreased by about a factor of 3 because only the dihedral angles for rotation about single bonds are varied. Therefore, the use of rigid geometry has some advantages, provided that the set of bond lengths and bond angles is chosen carefully.
Figure 15 An early calculation78 illustrating the superposition of the hard-sphere map on the energy contours for N-acetyl-N'-methylalanine amide. The units of energy are kcal/mol.
Electrostatic interactions are usually calculated by using Coulomb's law, which involves partial charges on the atoms, as in Eq. [1]. For this purpose, the interior of a protein is usually treated as a low-dielectric medium, with an effective dielectric constant of 2 to 4. On the other hand, for isolated charged particles surrounded by water, a high value (near 80) is appropriate. In some algorithms (e.g., Reference 48) the dielectric constant is treated as a distance-dependent function. The treatment of electrostatic effects in macromolecular modeling has recently been reviewed elsewhere.14,80 A powerful and productive method to obtain the electrostatic field inside and around a macromolecule involves numerical solution of the Poisson-Boltzmann equation.81,82
Several approximations are helpful for accelerating the computation of the energy. These include united-atom83 and united-residue84 representations. In the former approximation, CH, CH2, and CH3 groups are treated as single atoms, whereas in the latter approximation, whole residues are treated as single point centers of interaction when any two residues are beyond a critical distance from each other.
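To make the role of the dielectric constant in the electrostatic treatment described above concrete, the following Python sketch compares a Coulomb pair energy evaluated with a constant dielectric and with a distance-dependent one. The conversion factor of 332 (kcal Å)/(mol e^2) is an approximate, commonly quoted value, and the linear form D(r) = slope × r is only one frequently used choice; it is an assumption here and is not claimed to be the function used in Reference 48.

```python
COULOMB_KCAL = 332.0  # approximate conversion factor, (kcal * angstrom) / (mol * e^2)


def coulomb_energy(qi, qj, r, dielectric=2.0):
    """Coulomb energy (kcal/mol) of two partial charges (in units of e) at distance r (angstroms)."""
    return COULOMB_KCAL * qi * qj / (dielectric * r)


def coulomb_energy_distance_dependent(qi, qj, r, slope=1.0):
    """Same pair energy with an assumed distance-dependent dielectric D(r) = slope * r."""
    return coulomb_energy(qi, qj, r, dielectric=slope * r)


if __name__ == "__main__":
    # Hypothetical unit charges of opposite sign, 4 angstroms apart.
    print(coulomb_energy(1.0, -1.0, 4.0, dielectric=2.0))    # protein-interior-like value
    print(coulomb_energy(1.0, -1.0, 4.0, dielectric=80.0))   # bulk-water-like value
    print(coulomb_energy_distance_dependent(1.0, -1.0, 4.0))
```

With D proportional to r the energy falls off as 1/r^2, so distant charge pairs are screened more strongly than nearby ones.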
Solvation is treated in several different ways. For nonaqueous85 and aqueous86 solutions, binding of individual solvent molecules to the oligopeptide can be treated by introducing a binding equilibrium constant. In a second approach, the oligopeptide is placed in a box of solvent molecules, and the interaction energies between all the (solute and solvent) molecules in the box are calculated within the framework of energy minimization, Monte Carlo, or molecular dynamics procedures.87-89 Because a complete exploration of conformational space in this second approach is too time consuming, a third approach is used for the practical computation of V, in which the free energy of solvation is computed, as an approximation, by expressing it as an appropriate average over all positions and orientations of the solvent molecules for a given conformation of the solute.90,91 This can be done by using simple empirical hydration models. In such models, it is assumed that the effective influence of the solvent can be expressed for every functional group of the solute in terms of an averaged free energy of interaction with a layer of nearby water molecules that form a hydration shell.
Two alternative forms have been introduced for such a model. In one (e.g., Reference 92), the free energy of hydration of any group is taken to be proportional to the water-accessible volume of the first layer of water molecules surrounding it. In the other form, this free energy is assumed to be proportional to the solvent-accessible surface area of the group;93,94 a recently developed algorithm for a rapid calculation of accessible surface areas and their derivatives95,96 makes this approach very attractive. In both forms, the proportionality constant for each group, representing a free energy density, is obtained as an empirical parameter from experimental data on small organic molecules.
These data may be derived from solution thermodynamic measurements or from a physical property that depends on the conformation, such as NMR coupling constants. Several such hydration models have been evaluated in terms of their abilities to discriminate among various folded forms of bovine pancreatic trypsin inhibitor (BPTI).97-99 In this approach, the free energy of hydration, V, is added to the conformational energy of the oligopeptide in the absence of solvent, U, of Eq. [1], to obtain the total conformational energy, G:
$$G = U + V$$ [3]
where
$$V = \sum_{i=1}^{N} A_i \sigma_i$$ [4]
A_i is the solvent-accessible surface area of the ith group (computed with the algorithm of Reference 96), σ_i is an empirically derived free energy parameter for the ith group, and N is the number of groups in the oligopeptide.
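The surface-area hydration model above amounts to a weighted sum over groups. A minimal Python sketch follows; the function names and all numerical values are hypothetical, and the accessible surface areas are assumed to have been computed elsewhere (the sketch does not implement the surface-area algorithm of Reference 96).

```python
def hydration_free_energy(areas, sigmas):
    """V = sum_i A_i * sigma_i for per-group accessible areas and free energy parameters."""
    if len(areas) != len(sigmas):
        raise ValueError("areas and sigmas must have one entry per group")
    return sum(a * s for a, s in zip(areas, sigmas))


def total_conformational_energy(u, areas, sigmas):
    """G = U + V, where U is the energy in the absence of solvent."""
    return u + hydration_free_energy(areas, sigmas)


if __name__ == "__main__":
    # Hypothetical areas (square angstroms) and parameters (kcal/mol per square angstrom).
    areas = [10.0, 20.0, 4.0]
    sigmas = [0.5, -0.25, 0.125]
    print(hydration_free_energy(areas, sigmas))
    print(total_conformational_energy(-12.0, areas, sigmas))
```

Because the areas depend on conformation, V has to be recomputed for every conformation whose total energy G is evaluated.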
As discussed elsewhere,2,18-21 the entropy is also treated classically. With the bond lengths and bond angles fixed, the conformational entropy arises from the fluctuations of the backbone and side-chain dihedral angles and the distribution among various local minima. The normalized statistical weight, w_i, which expresses the probability of occurrence (or mole fraction) of the ith conformation (in which there are small fluctuations in the dihedral angles about the energy minimum), is given by20,21,100
$$w_i = \frac{1}{Z}(2\pi RT)^{k/2}(\det \mathbf{F}_i)^{-1/2}\exp(-\Delta U_i/RT)$$ [5]
where ΔU_i is the conformational energy at the ith minimum (relative to the lowest energy), RT is the gas constant times the temperature, k is the number of variable dihedral angles (degrees of freedom), and F_i is the matrix of second derivatives of the energy20 at the ith minimum. The partition function Z is given by
$$Z = \sum_{i=1}^{N}(2\pi RT)^{k/2}(\det \mathbf{F}_i)^{-1/2}\exp(-\Delta U_i/RT)$$ [6]
where N is the number of low-energy minima (ΔU < 3-5 kcal/mol). The conformational free energy G_i at the ith minimum is defined as
$$G_i = -RT\ln w_i$$ [7]
and the relative free energy as
$$\Delta G_i = G_i - G_0$$ [8]
where G_0 is the free energy of the conformation of lowest energy (i.e., the one at ΔU = 0). The relative entropy is given by
$$\Delta S = \frac{1}{T}(\Delta U - \Delta G)$$ [9]
This is equivalent to the definition of librational entropy given by Gō et al.20,21,100 An alternative approach to calculate the configurational entropy, involving a scanning simulation method in the absence of solvent, has been carried out by Meirovitch101 and applied to the two-dimensional freely jointed chain of hard disks102 and to decaglycine in the α-helix,103,104 hairpin,103 and statistical coil forms.104,105 The restriction of the entropy of a chain by formation of a loop has been computed by several authors.106,107 Interacting108 and overlapping109,110 loops have also been treated.
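The statistical weights, relative free energies, and relative entropies of Eqs. [5] to [9] can be evaluated numerically once the relative energies and Hessian determinants at the minima are known. The Python sketch below is only illustrative: the function names are hypothetical, the input values are invented, the first entry of each list is assumed to be the lowest-energy minimum (ΔU = 0), and the Hessian determinants are assumed to be supplied by a separate calculation in units consistent with RT.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def statistical_weights(delta_u, det_f, k, temperature):
    """Normalized weights w_i of Eq. [5], with Z of Eq. [6] as the normalization."""
    rt = R_KCAL * temperature
    terms = [
        (2.0 * math.pi * rt) ** (k / 2.0) * d ** -0.5 * math.exp(-du / rt)
        for du, d in zip(delta_u, det_f)
    ]
    z = sum(terms)
    return [t / z for t in terms]


def relative_free_energies(weights, temperature):
    """Delta G_i = G_i - G_0 with G_i = -RT ln w_i; index 0 is the lowest-energy minimum."""
    rt = R_KCAL * temperature
    g = [-rt * math.log(w) for w in weights]
    return [gi - g[0] for gi in g]


def relative_entropies(delta_u, delta_g, temperature):
    """Delta S_i = (Delta U_i - Delta G_i) / T, in kcal/(mol K)."""
    return [(du - dg) / temperature for du, dg in zip(delta_u, delta_g)]


if __name__ == "__main__":
    # Hypothetical minima: relative energies (kcal/mol) and Hessian determinants.
    du = [0.0, 0.5, 1.2]
    det = [4.0, 1.0, 0.25]
    w = statistical_weights(du, det, k=2, temperature=300.0)
    dg = relative_free_energies(w, 300.0)
    print(w, dg, relative_entropies(du, dg, 300.0))
```

A minimum with a smaller Hessian determinant (a broader well) gains weight relative to a narrower well of the same energy, which is how the fluctuation entropy enters.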
In considering fluids, a variety of approaches have been used to compute entropy and free energy: a free-volume method,111,112 thermodynamic perturbation theory,113-116 thermodynamic integration,117-121 umbrella sampling,122-124 and a Monte Carlo recursion method.125,126 The entropy of association of two protein molecules in water has also been computed.127
Optimization Methods
During the years in which conformational energy calculations have been applied to polypeptides and proteins, optimization of the conformational energy has been achieved by energy minimization, Monte Carlo, and molecular dynamics procedures. Some minimization methods that do not require gradients are those of Rosenbrock;128 Fletcher, Powell et al.;129-131 and Nelder and Mead.132 Those that require gradients include steepest descent, conjugate gradient, and the Gill-Murray method.133-136 The one that has been judged to be most efficient in our laboratory is the secant unconstrained minimization solver (SUMSL) method of Gay,137 which uses analytical gradients. Metropolis Monte Carlo138 calculations have also been used for optimization; however, unless used in conjunction with other procedures (see later), the Metropolis procedure is inefficient. Likewise, molecular dynamics methods,16,89,139 which involve 1-fs steps and generally attain trajectory times of 100 ps, are not efficient for optimization because conformational transitions in polypeptides take much longer (although the technique might be applicable to oligopeptides). An implementation of the molecular dynamics technique that enables time steps as long as 15 fs to be made has been reported,139-141 but it too does not reach the time regime of real conformational transitions in available computer time. Longer-time processes, however, have been explored by Brownian dynamics.89
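As a small illustration of the Metropolis procedure mentioned above, the following Python sketch searches a one-dimensional toy torsional energy with two local minima and one global minimum. Everything in it is an assumption made for illustration: the energy function, the step size, the value of RT, and the function names are hypothetical, and a real application would vary many dihedral angles and use a full potential function.

```python
import math
import random


def toy_torsional_energy(theta_deg):
    """Hypothetical energy (kcal/mol): threefold term plus a term favoring theta = 60 degrees."""
    t = math.radians(theta_deg)
    return (1.0 + math.cos(3.0 * t)) + 0.5 * (1.0 - math.cos(t - math.radians(60.0)))


def metropolis_search(energy, x0, n_steps=20000, step=30.0, rt=0.6, seed=0):
    """Metropolis walk over one dihedral angle; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Propose a random move and wrap the angle into [-180, 180).
        trial = (x + rng.uniform(-step, step) + 180.0) % 360.0 - 180.0
        e_trial = energy(trial)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / rt):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    # Start in the local minimum near -60 degrees.
    print(metropolis_search(toy_torsional_energy, x0=-60.0))
```

In one dimension the barriers are crossed easily; the inefficiency noted in the text arises in many dimensions, where random moves rarely lower the energy.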
Ancillary Techniques
In some computations, the backbone of the polypeptide chain is treated as a virtual chain, that is, as a connected set of Cα atoms, the connections being referred to as virtual bonds. At the end of such a computation, it becomes necessary to convert the virtual chain to a real one, one containing all backbone atoms with the appropriate bond lengths and bond angles. A geometrical procedure is available for carrying out this transformation in terms of dihedral angles.142 This transformation requires knowledge of the dihedral angles φ and ψ of one residue to obtain a unique solution, and criteria have been given142 for choosing the initial values for this residue.
A particularly interesting use of a virtual chain is made in a treatment of protein folding with the aid of differential geometry.143 By focusing on virtual-bond segments consisting of four and five Cα atoms, respectively, it has been possible to extend the description of polypeptide chains from the (φ,ψ) representation of single residues to four- and five-Cα length scales, respectively. The course of the backbone of the chain is described in terms of two quantities, κ_i and τ_i, the curvature and torsion, respectively, at the ith Cα atom. This methodology has been used to compare different polypeptide and protein conformations, and to classify the range of conformations that are adopted by polypeptide chains, especially for initiating the formation of folded structures. Rackovsky144 has developed a generalization of the differential geometric approach that is applicable to arbitrary length scales, and has applied this method to the classification of the known protein X-ray structures.
Computations with all of the algorithms used in conformational analysis have been greatly facilitated by continual software and hardware developments. Among these are the use of array processors145 and parallelism.146 Implementation of parallel processing is currently a very active field.
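For a virtual chain of Cα atoms, simple geometric descriptors can be computed directly from coordinates. The Python sketch below computes a virtual-bond angle from three consecutive Cα positions and a virtual-bond dihedral angle from four. These are elementary discrete quantities offered only as an illustration; they are not the curvature and torsion as defined in the differential-geometric treatment of Reference 143, and the function names and coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_bond_angle(p0, p1, p2):
    """Angle (degrees) at p1 between the virtual bonds p1-p0 and p1-p2."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_angle = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def virtual_bond_dihedral(p0, p1, p2, p3):
    """Dihedral angle (degrees, -180 to 180) about the virtual bond p1-p2."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates (angstroms) of four consecutive C-alpha atoms.
    ca = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(virtual_bond_angle(*ca[:3]), virtual_bond_dihedral(*ca))
```

The dihedral is 0° for a cis (eclipsed) arrangement and ±180° for trans, with the sign following the usual right-handed convention.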
APPLICATION TO SIMPLE SYSTEMS
Computations on simple systems and comparison of the results with experiment have provided confidence in the validity and applicability of the foregoing approach. Considering the fundamental structures adopted by polypeptide chains, the α helix and the β sheet, their stereochemistry has been elucidated by such calculations. For example, homopolymers of amino acids can adopt either a right- or a left-handed α-helical twist. Figure 16 illustrates an early calculation78 of a (φ,ψ) map (presented here in the original earlier convention) for poly-L-alanine, in which every residue of the chain is constrained to have the same conformation. In contrast to the single-residue map for terminally blocked alanine (Figure 9), in which the global minimum is the C7eq conformation, the lowest-energy minima for the regular homopolymer correspond to the right- and left-handed α helices; the right-handed helix is the global minimum and is more stable than the left-handed one by 0.4 kcal/mol per residue,147 in agreement with experiment. As in Figure 15, the hard-sphere map is superposed on the energy contour map in Figure 16 and again illustrates the dominance of the repulsive part of the Lennard-Jones potential. The gap in both the hard-sphere and energy contour maps near (φ,ψ) = (90°,120°) arises because these helices have very low axial translation; that is, the chain does not advance rapidly enough along the helix axis to allow sufficient spatial separation between atoms on adjacent turns of the helix.
Figure 16 An early calculation78 of the superposition of the hard-sphere map on the energy contours for regular poly-L-alanine helices. The units of energy are kcal/mol per residue.
Similar energy contour maps and the relative stabilities of right- and left-handed α helices have been computed for other homopolymers, and Table 1 compares the computed and observed preferred twists of a number of α-helical structures.148 The origin of the preference for right- or left-handedness lies in a CβH2···backbone nonbonded interaction that favors right-handedness, as in poly(L-alanine), and in interactions involving atoms of the side chain beyond CβH2 that can favor either sense of twist.10,147 As an illustration, in the series of poly(ortho-, meta-, and para-chlorobenzyl aspartates), the ortho, meta, and unsubstituted polymers adopt left-handed α-helical forms, whereas the para polymer adopts a right-handed α-helical form. In these polymers, the side chain takes on a transverse or longitudinal orientation with respect to the backbone, bringing the chlorine atom close enough to the backbone to influence its helical twist. Figure 17 illustrates the lowest-energy conformations of the meta polymer, showing a favorable, attractive interaction between the C-Cl dipole and the dipole of the closest peptide group in the left-handed form; the corresponding interaction in the right-handed form is repulsive. Thus, this dipole-dipole interaction plays a dominant role in leading to the preference for left-handedness. These preferences have been verified experimentally for all three chloro-substituted poly(benzyl aspartates).149
Table 1 Comparison of Calculated and Experimental α-Helix Sense for Several Polyamino Acids
Helix Sense(a)
Polyamino Acid
Poly-L-alanine
Poly-L-valine
-L-Asp-COOH
-L-Glu-COOH
Poly-β-methyl-L-Asp
-ethyl-L-Asp
-n-propyl-L-Asp
-isopropyl-L-Asp
Poly-β-benzyl-L-Asp
-o-Cl-benzyl-L-Asp
-m-Cl-benzyl-L-Asp
-p-Cl-benzyl-L-Asp
-p-CN-benzyl-L-Asp
-p-NO2-benzyl-L-Asp
-p-CH3-benzyl-L-Asp
Poly-γ-methyl-L-Glu
-γ-benzyl-L-Glu
-p-Cl-benzyl-L-Glu
-p-CN-benzyl-L-Glu
Poly-L-phenylalanine
Poly-L-tyrosine
Calculated
Experiment
R R R R
R
L L
R
R L
L L
R R R R
R R R
R
R
R
R R R L R R
R L L L
R R R R R
R
-
(a) R, right-handed; L, left-handed.
Reprinted, with permission, from Scheraga.148
If the α hydrogen is substituted as, for example, in α-aminoisobutyric acid (Aib), then the conformational energy (φ,ψ) map is very restricted, and the preferred form of Aib peptides is computed to be the 3₁₀ rather than the α-helical form.150 This prediction has been verified by NMR and infrared spectroscopic measurements on solutions of oligomers of Aib.151 The stability of the 3₁₀ helix for short poly(Aib-L-alanine) polypeptides and the increased stabilization of the α-helical form with a lengthening of the chain have been demonstrated recently.152
Chothia153 has observed that the β sheets in globular proteins have a right-handed twist. Computations on model β sheets, for example, the parallel and antiparallel structures of poly(L-valine) sheets,154 illustrated in Figure 18, have accounted for these observations.8,10,154 In general, side chain-backbone interactions within each strand result in a preference for a right-handed twist for L-amino acids, although there are exceptions. In addition, interstrand side chain-side chain interactions also make significant contributions. Thus, intrastrand interactions in an isolated extended poly(L-isoleucine) strand energetically favor the left-handed twist, but interstrand interactions result in the stabilization of a poly(L-isoleucine) β sheet with a right-handed twist.155 Poly(L-serine) is exceptional in that it is computed to favor a left-handed β sheet.156 This prediction is verified by the observed behavior of serine residues in proteins. Even though serine occurs relatively infrequently in β sheets, it usually imparts a local deformation to the polypeptide chain that corresponds to reduced or left-handed local twisting.
Figure 17 Orientation of the side chains of the left- and right-handed α-helices of poly(m-chlorobenzyl-L-aspartate). The solid arrows represent the direction of the C-Cl, ester, and amide dipoles, respectively.10
Figure 18 Stereo drawings of the minimum-energy β sheets with five CH3CO-(L-Val)6-NHCH3 chains. A. Antiparallel structure. B. Parallel structure.154
The computed energies have been used to predict the relative stabilities of parallel and antiparallel β sheets of polyamino acids.10,156 The antiparallel form was predicted to be favored for sheets formed by residues with small unbranched (or γ-branched) side chains (glycine, alanine, leucine), whereas the parallel form is favored for residues such as valine, isoleucine, lysine, serine, threonine, phenylalanine, and tyrosine. All of these predictions agree with experimental observations on oligopeptides, wherever data are available.157 The computational methodology has also accounted for various types of packing of α helices and β sheets:8,10 α/α packing, α/β packing, β/β packing, βαβ crossover packing, β barrels, coiled-coil packing of helices, and so on.
Conformational transitions, for example, the helix-coil transition, for which a large literature exists,158 have also been treated.10 The helix-coil transition has been treated by statistical mechanics, making use of the one-dimensional Ising model, and has been addressed for both homopolymers and binary and multicomponent random copolymers of amino acids.10 The relative helix-coil preferences of each of the 20 naturally occurring amino acids in water have been determined from experiments on thermally induced helix-coil transitions in host-guest binary random copolymers in which the host is a neutral water-soluble polyamino acid and the guest is, in turn, each of the 20 naturally occurring amino acids.159 These results are expressed in terms of the Zimm-Bragg parameters160 σ and s. Figure 19 illustrates the temperature dependence of s for the various amino acids.159 These values represent the intrinsic properties of each residue, reflecting only the interaction of a side chain and its own backbone because long-range interactions are averaged out in the random copolymers used to obtain these values.161
When these values, together with the Ising model, are used to predict the locations of α helices in specific-sequence copolymers, that is, proteins, it is necessary to modify the values of s to take account of specific long-range interactions, such as salt links, charge-dipole interactions, and hydrophobic interactions.162 These interactions are accounted for in terms of empirical parameters for each specific pair of residues that interact with each other. Numerical values of the parameters are derived from experimentally determined helix contents for specific-sequence peptides. The parameters modify the statistical weights of the respective conformational states in the Zimm-Bragg treatment160 of the helix-coil transition.
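The role of the parameters σ and s can be illustrated with a simplified nearest-neighbor helix-coil model for a homopolymer. In the Python sketch below, each coil residue has weight 1, a helical residue following a helical residue has weight s, and a helical residue following a coil residue (helix initiation) has weight σs. This is a minimal two-state Ising-type sketch written for illustration, not the full Zimm-Bragg treatment of Reference 160, and the function names and numerical values are hypothetical.

```python
import math


def partition_function(n, sigma, s):
    """Partition function of an n-residue chain in a simplified nearest-neighbor model."""
    # Running sums of weights for chains ending in a helical or a coil residue.
    # The chain is treated as if preceded by a coil residue.
    helix, coil = 0.0, 1.0
    for _ in range(n):
        helix, coil = helix * s + coil * sigma * s, helix + coil
    return helix + coil


def helix_fraction(n, sigma, s, h=1e-6):
    """Average fraction of helical residues, (1/n) d ln Z / d ln s, by central difference."""
    ln_z_plus = math.log(partition_function(n, sigma, s * math.exp(h)))
    ln_z_minus = math.log(partition_function(n, sigma, s * math.exp(-h)))
    return (ln_z_plus - ln_z_minus) / (2.0 * h * n)


if __name__ == "__main__":
    # Hypothetical parameters: a small sigma makes the transition cooperative.
    for s in (0.8, 1.0, 1.2):
        print(s, round(helix_fraction(100, 1e-3, s), 3))
```

With σ = 1 the residues are independent and the helix fraction reduces to s/(1 + s); decreasing σ sharpens the dependence of the helix content on s, which is the cooperativity captured by the Ising model.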
Figure 19 Plots of s versus temperature T (°C) for the 20 naturally occurring amino acid residues. The data were obtained with the host-guest technique.159
Besides this statistical mechanical approach to the question of helix stability, the problem has also been addressed by conformational energy calculations. First, the helix-breaking tendencies of such residues as serine and aspartic acid can be accounted for by the tendency toward formation of side chain-backbone hydrogen bonds in nonhelical conformations163 (Figures 20 and 21). Second, the free energies of the helical and statistical coil forms in water have been calculated; they rationalize the observed behavior in terms of interatomic interactions, including those with the solvent.18,86,164,165 For example, the increase in s, that is, the stabilization of the helix, with increasing temperature for poly(L-valine) (Figures 19 and 22) is accounted for by hydrophobic interactions between the valine side chains in the α helix.165 On the other hand, poly(L-isoleucine) exhibits a decrease in s with increasing temperature (Figure 19), even though, like valine, isoleucine has a branch on the β carbon and its side chains also participate in hydrophobic interactions in the α helix. The difference in behavior of poly(L-valine), poly(L-isoleucine), and poly(L-leucine) has been accounted for by subtle differences in the degree of hydration, that is, in the strength of hydrophobic interactions, in both the helical and coil forms of each of these polymers.
Figure 20 Illustration of the types of hydrogen bonds between serine and threonine side chains and the backbone.163
Figure 21 Illustration of the types of hydrogen bonds between aspartic acid and asparagine side chains and the backbone.
Figure 22 s-versus-T curves for poly(L-valine) in water. The squares are the experimental results (Figure 19), and the line is the calculated result.165
When homopolymers consist of ionizable groups, for example, poly(L-lysine), the helix-coil transition is inducible by a change of pH, with the pH of the transition region depending on ionic strength. Conformational energy calculations on the helix-coil transition in aqueous salt solutions of poly(L-lysine) have accounted for the effect of pH and ionic strength on the transition.167
In addition to the order-disorder transition observed for α helices, helical structures can also be induced to undergo transitions from one ordered form to another. For example, a crystalline form of poly[β-(p-chlorobenzyl)-L-aspartate] can be made to undergo a phase transition from an α-helical to an ω-helical form by heating; rotational entropy is computed to play a role in this process.68 Another order-order transition is the solvent-induced interconversion between polyproline I (with cis peptide bonds) and polyproline II (with trans peptide bonds), a process that has also been subjected to conformational energy calculations.85 The transition has been accounted for in terms of differences in the binding of solvent components to the peptide C=O groups.
MULTIPLE-MINIMA PROBLEM
A large number of local minima exist on the multidimensional conformational energy surface of an oligopeptide.168 Thus, in addition to having proper potential functions and local minimization algorithms, it is necessary to be able to surmount barriers on this surface to reach the global minimum.168 A direct mathematical approach has not resolved this problem except for systems involving three or four degrees of freedom.169-174 Therefore, much effort, described below, has been expended to solve this problem. The various procedures that have been developed are used separately, and in various combinations with each other, to locate the approximate native conformation. They are all intended as the initial approaches in the computations. In the final stages, the results from all of these procedures are collated into an approximate three-dimensional structure whose energy should lie in the potential well containing the global minimum (i.e., this structure should be a good approximation of the native structure). Then the complete conformational energy of this structure is minimized, taking all pairwise interactions (over the whole molecule) into account.
Build-up Method
In a renormalization group-type approach, one starts with the low-energy structures of single residues and uses these to build up low-energy structures of dipeptides, tripeptides, and so on, carrying out energy minimization at each stage.175 (Recently,99 it has been possible to minimize the function U + V directly.) The build-up procedure requires storage of many (backbone and side-chain) low-energy conformations. For example, at the single-residue level, the numbers of such conformations are in the tens, whereas they are already in the hundreds at the dipeptide level. It might, at first sight, seem to be an impossible task to store so many minimum-energy conformations as the peptides get longer and longer; however, various strategies permit a legitimate elimination of many of these as longer peptides are built up. For example, in the pentapeptide of sequence A-B-C-D-E, built from the tetrapeptides A-B-C-D and B-C-D-E, we can eliminate all tetrapeptide conformations in which residues B-C-D (being common to both tetrapeptides) are not in the same conformational state.176 It is also possible to reduce the sizes of the ensembles being stored by eliminating very high energy conformations of the component peptides. The conformational states that are stored are ordered according to their energies, taking hydration into account with a solvent-shell model. As the peptides become longer and longer, long-range interactions alter the (energetic) order of their conformations. The variety of structures being stored at each stage have conformations that allow such long-range interactions to come into play as residues are added to the growing chain. From a practical point of view, this procedure and others have been applied to oligopeptides as large as 25 to 30 residues. These 25- to 30-residue segments can then be "stitched" together to build up even larger structures.
This method has been applied to several linear and cyclic oligopeptides. In the case of the linear pentapeptide Met-enkephalin, a favorite model for method evaluation, the same lowest-energy conformation of the isolated molecule177 (Figure 23) has been obtained by the build-up method and by other methods described later; this attests to the validity of the several methods employed.
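The combinatorial step of the build-up procedure described above (joining two overlapping fragments and discarding incompatible or high-energy combinations) can be sketched in Python. The sketch makes several simplifying assumptions that are not part of the published method: each residue conformation is represented by a single hypothetical state code, the energy of a merged conformation is taken as a placeholder sum of the fragment energies (a real calculation would re-minimize the full energy of the longer peptide), and `build_up` and `energy_window` are invented names.

```python
def build_up(left, right, overlap, energy_window):
    """Merge fragment conformations that agree on their shared residues.

    left, right: lists of (states, energy), where states is a tuple of
    per-residue conformational-state codes and energy is in kcal/mol.
    overlap: number of residues common to the end of left and start of right.
    energy_window: keep merged conformations within this energy of the lowest.
    """
    merged = []
    for left_states, left_energy in left:
        for right_states, right_energy in right:
            # Keep only combinations whose common residues are in the same state.
            if left_states[-overlap:] == right_states[:overlap]:
                states = left_states + right_states[overlap:]
                # Placeholder energy; a real calculation would re-minimize here.
                merged.append((states, left_energy + right_energy))
    if not merged:
        return []
    merged.sort(key=lambda item: item[1])
    lowest = merged[0][1]
    return [item for item in merged if item[1] - lowest <= energy_window]


if __name__ == "__main__":
    # Hypothetical tetrapeptide ensembles for A-B-C-D and B-C-D-E.
    abcd = [(("A", "C", "E", "A"), 0.0), (("A", "A", "E", "A"), 1.0)]
    bcde = [(("C", "E", "A", "G"), 0.5), (("E", "E", "A", "C"), 0.2)]
    print(build_up(abcd, bcde, overlap=3, energy_window=3.0))
```

The overlap test removes most fragment pairs before any energy evaluation, which is what keeps the stored ensembles manageable as the chain grows.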
Other higher-energy structures, however, pack better in crystals because of the presence of intermolecular hydrogen bonds.178 Consequently, as shown by computations on several crystalline forms of enkephalin,178 the observed crystal structures are favored over hypothetical packed structures formed from the global-minimum structure of the isolated molecule. On the other hand, the conformations in the crystals have energies higher (as isolated molecules) than that of the structure of Figure 23. This influence of environment illustrates an important point, namely, that the conformation of an isolated linear peptide hormone, for example, need bear no relation to the biologically active form that is bound to the receptor. To mimic the biologically active form, such as in drug design, it is necessary to carry out the calculations on the complex of hormone plus receptor, as was done for crystalline enkephalin.178
Figure 23 Stereo view of the global minimum-energy structure of Met-enkephalin in the absence of water.177
If a peptide is cyclic, there is an additional constraint, namely, that the ring must close exactly (see the earlier sections on ring closure). In addition, cyclic peptides are probably less susceptible to crystal packing interactions than are flexible linear ones. Figure 24 illustrates computed energy-minimized39,179 and observed180 structures of the cyclic decapeptide gramicidin S. The small distortion in the lower right-hand portion of the observed structure is undoubtedly due to a nearby urea molecule that cocrystallized with the decapeptide. More recently, Mirau and Bovey181 carried out a two-dimensional NMR ROESY experiment on gramicidin S in solution and compared the experimental spectrum with a theoretical spectrum calculated from the published atomic coordinates39 of the energy-minimized structure; close agreement was obtained for the backbone protons. Differences that were observed for the side-chain protons were attributed to motion in solution.
The build-up method has also been applied to fibrous and globular proteins. Collagen is an example of a fibrous protein that involves interchain association to form a triple-stranded coiled-coil structure. Conformational energy calculations have been carried out on three-chain complexes of several synthetic poly(tripeptide) analogs, poly(Gly-X-Y), of collagen.182-185 Because of the regularity conditions imposed on each tripeptide in the computations, the number of degrees of freedom was small, so that, again, the multiple-minima problem was surmounted by an adequate coverage of conformational space using the build-up procedure. The computations indicated that poly(Gly-Pro-Pro), poly(Gly-Pro-Hyp), and poly(Gly-Pro-Ala) form stable, triple-stranded, coiled-coil, collagen-like structures, whereas poly(Gly-Ala-Pro) does not, all in agreement with experiment. The results obtained for the regular-sequence polytripeptides can be extended to include local amino acid substitutions in the triple helix.186 The computations also provided an explanation for the association of the chains in the collagen molecule [in contrast to the single-chain structures of α-helical forms of, e.g., poly(γ-benzyl-L-glutamate)]: the resulting interchain interactions among the three chains of collagen lower the energy of a tripeptide unit below that in the nonassociated single chain. Furthermore, the interchain interactions induce a slight conforma-
Figure 24 Stereo views of (A) computed39J79 and (B) X-ray180 structures of gramicidin s, showing (among other things) a hydrogen bond between the ornithine side chain and the phenylalanine backbone carbonyl group.
B
A
106 Predicting Three-Dimensional Structures of Oligopeptides tional change in going from the low-energy form of the single chain to the lower-energy form of the triple-stranded complex. After completion of the calculations182 on poly(Gly-Pro-Pro), it was learned that Okuyama et al.187 had carried out a single-crystal X-ray structure analysis of (Pro-Pro-Gly),,. The calculated structure (Figure 25) is in agreefor all (nonment with theirs, with a root-mean-square deviation of 0.3 hydrogen) atoms, based on a comparison between the X-ray coordinates (kindly provided by Professor M. Kakudo) and the computed coordinates.
Optimization of Electrostatics (Self-consistent Electric Field)

In the self-consistent electric field (SCEF) procedure, one makes an initial approximation by neglecting all components of the total energy except the electrostatic and assumes that each residue must have optimal electrostatic energy; that is, the dipole moment of each residue must be optimally aligned in the electrostatic field created by the whole molecule. If it is not, the orientations of the dipole moment of each residue are changed successively to improve its electrostatic energy. As this involves a local movement (in the field of the whole molecule), it is computationally very fast. Then the energy of the whole molecule (taking all interactions, not only electrostatic, into account) is minimized, and the whole procedure is repeated iteratively. Thus far, this procedure has been tested on a 19-residue poly(L-alanine) chain with acetyl and N-methyl amide terminal blocking groups. The global minimum of this structure is presumably a right-handed α helix. Starting with conformations very far from the helical conformation, the global minimum was achieved in trivially short computation time.188 The top stereo diagram of Figure 26 illustrates one of the starting conformations (optimized by conventional energy minimization), which is very far from a helical one. After application of the electrostatic optimization procedure (and subsequent energy minimization with the complete energy function), the global-minimum (right-handed α-helix) structure at the bottom of Figure 26 was obtained. Unlike the usual minimization procedures, which make small changes in the dihedral angles, this procedure can make very large changes (even 100° to 200°) in these independent variables. Further improvements of this method are described under Electrostatically Driven Monte Carlo.
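The alignment test at the heart of this idea can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the published SCEF code: residues are replaced by ideal point dipoles, the field is the textbook point-dipole field in reduced units, and the function names are hypothetical. It reports, for each dipole, the cosine of the angle between the dipole and the field created by all the others; a value near -1 flags a residue that would be a candidate for reorientation.

```python
import math


def dipole_field(mu, r):
    """Field at displacement r from a point dipole mu.

    Reduced units are assumed (1/(4*pi*eps0) = 1):
    E = [3 (mu . rhat) rhat - mu] / |r|^3
    """
    dist = math.sqrt(sum(c * c for c in r))
    rhat = [c / dist for c in r]
    mu_dot_rhat = sum(m * c for m, c in zip(mu, rhat))
    return [(3.0 * mu_dot_rhat * c - m) / dist ** 3
            for m, c in zip(mu, rhat)]


def alignment(positions, dipoles):
    """Cosine between each dipole and the field of all other dipoles."""
    cosines = []
    for i, (r_i, mu_i) in enumerate(zip(positions, dipoles)):
        field = [0.0, 0.0, 0.0]
        for j, (r_j, mu_j) in enumerate(zip(positions, dipoles)):
            if i == j:
                continue
            e = dipole_field(mu_j, [a - b for a, b in zip(r_i, r_j)])
            field = [f + c for f, c in zip(field, e)]
        norm_f = math.sqrt(sum(c * c for c in field))
        norm_m = math.sqrt(sum(c * c for c in mu_i))
        if norm_f == 0.0 or norm_m == 0.0:
            cosines.append(0.0)  # no field or no dipole: undefined, report 0
        else:
            dot = sum(m * f for m, f in zip(mu_i, field))
            cosines.append(dot / (norm_f * norm_m))
    return cosines


# Three head-to-tail dipoles on the x axis (a crude stand-in for a helix axis).
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
aligned = alignment(positions, [(1.0, 0.0, 0.0)] * 3)
flipped = alignment(positions,
                    [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(aligned)
print(flipped)
```

In the actual procedure the diagnosis is followed by a change of dihedral angles and a full energy minimization; neither step is attempted here.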
Figure 25 Calculated triple-stranded coiled-coil complex of poly(Gly-Pro-Pro) of lowest

Figure 26 Stereo diagrams illustrating (top) a compact conformation of N-acetyl-Ala19-NHCH3 after local minimization of the complete energy function to reach this particular local minimum, and (bottom) the global-minimum (right-handed α helix) structure attained by first optimizing the local electrostatic interactions and then carrying out a complete energy minimization.188

Monte Carlo plus Minimization

To overcome the inefficiency of Metropolis Monte Carlo, which searches all of the conformational space but does so very slowly, a procedure to move rapidly through the space of local minima has been devised.177 The energy of a random starting conformation is minimized. Then, a random change (selected in the range 0 to 2π) is made in a randomly chosen dihedral angle, and the energy of this new conformation is minimized. The Metropolis criterion is used to decide whether to accept this new minimum, and the procedure is then iterated. Figure 27 shows how 13 of 18 random starting conformations of the pentapeptide enkephalin converged to the same global minimum (Figure 23); the remaining 5 runs were trapped in (recognizable) higher-energy local minima. The Monte Carlo plus minimization (MCM) procedure was subsequently improved189 by allowing random changes in more than one dihedral angle with successively lower probabilities and by sampling fluctuations in backbone dihedral angles more frequently than side-chain dihedral angles; with this modified sampling strategy, the five initial conformations that previously did not converge all achieved convergence. Attempts are now being made to scale up this procedure to larger molecules and to include the effect of hydration.

Figure 27 Illustration of random starting conformations of Met-enkephalin (indicated by the numbers 1 to 18) and the global minimum (indicated by 0), which was reached from 13 of the 18 starting conformations.177 In a subsequent variation of the procedure,189 all 18 starting conformations converged to the same global minimum.
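The basic MCM cycle (perturb one dihedral angle, minimize, apply the Metropolis test to the minimized energies) can be sketched as follows. Everything specific here is an assumption made for illustration: the rugged toy energy stands in for ECEPP, the steepest-descent minimizer stands in for a production minimizer, and the values of `kT`, `step`, and the number of steps are arbitrary.

```python
import math
import random


def energy(angles):
    """Hypothetical rugged torsional energy (arbitrary units), not ECEPP."""
    local = sum(math.cos(3.0 * a) + 0.5 * math.cos(a - 1.0) for a in angles)
    coupling = sum(math.cos(angles[i] - angles[i + 1])
                   for i in range(len(angles) - 1))
    return local + 0.3 * coupling


def gradient(angles):
    """Analytic gradient of the toy energy."""
    n = len(angles)
    grad = []
    for i, a in enumerate(angles):
        g = -3.0 * math.sin(3.0 * a) - 0.5 * math.sin(a - 1.0)
        if i + 1 < n:
            g -= 0.3 * math.sin(a - angles[i + 1])
        if i > 0:
            g += 0.3 * math.sin(angles[i - 1] - a)
        grad.append(g)
    return grad


def minimize(angles, step=0.05, iters=300):
    """Crude steepest-descent local minimizer (a stand-in only)."""
    x = list(angles)
    for _ in range(iters):
        x = [a - step * g for a, g in zip(x, gradient(x))]
    return x


def mcm(n_angles=4, n_steps=200, kT=0.5, seed=0):
    """Monte Carlo plus minimization on the toy energy surface."""
    rng = random.Random(seed)
    start = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    current = minimize(start)
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(n_steps):
        trial = list(current)
        k = rng.randrange(n_angles)               # randomly chosen dihedral
        trial[k] += rng.uniform(0.0, 2.0 * math.pi)  # random change in [0, 2*pi)
        trial = minimize(trial)                   # always compare minima
        e_trial = energy(trial)
        # Metropolis criterion applied to the minimized energies.
        if e_trial <= e_current or \
                rng.random() < math.exp(-(e_trial - e_current) / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


best, e_best = mcm()
print(e_best)
```

Because every trial conformation is minimized before the acceptance test, the walk moves between local minima rather than through the high-energy regions that separate them, which is the point of the method.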
Electrostatically Driven Monte Carlo

The electrostatically driven Monte Carlo (EDMC) procedure incorporates the best features of the SCEF and MCM methods, combined with random conformational changes to simulate the effect of thermal motion.190 This technique analyzes a given conformation (the current one), producing an electrostatic diagnosis based on the orientations of the dipole moments of the protein with respect to the local electric field. This diagnosis is used in combination with a random sampling technique to generate new conformations, each of which is subjected to conventional energy minimization to reach a local energy minimum. This local minimum is compared with that corresponding to the current conformation with the aid of the Metropolis criterion. Each time a conformation is accepted, it replaces the current one and is subjected to an identical analysis. If all the electrostatic diagnoses fail to produce an acceptable conformation, and this situation remains unaltered after a significant number of random conformations are generated, the process is forced to choose one of the conformations generated previously (but rejected) and to use that one as the subsequent current conformation. This procedure is equivalent to a perturbation resulting from thermal effects. Figures 28 and 29 illustrate how the right-handed α helix [the presumed global-minimum structure of a terminally blocked 19-residue chain of poly(L-alanine)] was achieved in successive stages of this procedure by starting from the fully extended structure and from the left-handed α helix, respectively. Figure 30 shows how a hairpinlike structure of Met-enkephalin (the same as that in Figure 23) was obtained from a randomly generated conformation by this same procedure.191 Other applications of the EDMC method are described in References 192 and 193.
Adaptive Importance Sampling Monte Carlo

Another procedure to overcome the inefficiency of Metropolis Monte Carlo is adaptive importance sampling.194-196 In this technique, the partition function (and quantities derived from it, such as the probability of a given conformation) is evaluated by continually upgrading the distribution function (ultimately to the Boltzmann distribution) to concentrate the sampling in the region(s) where the probabilities are highest.
Figure 28 Diagrams of a sequence of conformations of poly(L-alanine) encountered at various successive stages on a conformational pathway during a folding simulation, starting from a fully extended conformation.190 The final structure is the same as shown in Figure 29e.

Figure 29 Diagrams of a set of conformations of poly(L-alanine) encountered at various successive stages on a conformational pathway during a folding simulation, starting from a left-handed α helix.190

Figure 30 Stereo views of some conformations (a to d) of Met-enkephalin along a conformational pathway in the electrostatically driven Monte Carlo procedure during a folding simulation, starting from a randomly generated structure. The structure in (d)191 is the same as that in Figure 23.

The partition function for a polypeptide chain with a specific amino acid sequence can be written as194

$$Z(\beta) = \int \cdots \int \exp\left[-\beta U\{\phi,\psi,\chi\}\right] \, d\{\phi,\psi,\chi\} \qquad [10]$$

where β = 1/RT, R is the gas constant, T is the absolute temperature, and U{φ,ψ,χ} is the conformational energy of a rigid-model polypeptide chain as a function of the dihedral angles {φ,ψ,χ}, where the latter is a shorthand notation for all of the backbone (φ,ψ) and side-chain (χ) dihedral angles. With Eq. [10], the quantity P_i(φ_i,ψ_i) can be computed:

$$P_i(\phi_i,\psi_i) = \frac{1}{Z(\beta)} \int \cdots \int \exp\left[-\beta U\{\phi',\psi',\chi\}\right] \, d\{\phi',\psi',\chi\} \qquad [11]$$

where P_i(φ_i,ψ_i) is the single-residue probability density of the backbone structure for the ith residue averaged over all other dihedral angles of the chain, with all side-chain dihedral angles χ held fixed,194 and where the primes indicate that, during the evaluation of the conformational energy, the backbone dihedral angles of the ith residue are fixed at φ_i and ψ_i. Similar definitions for the probability densities of the peptide bond and side-chain dihedral angles of the ith residue have been formulated.196 For convenience of interpretation, the probability is converted into a "conditional free energy" by the relation
The quantity U{φ′,ψ′,χ} is computed by summing over the ECEPP energies of all pairs of nonbonded atoms of the whole molecule. Equations [11] and [12] are used to compute conditional free energy (φ,ψ) contour maps, similar to standard conformational energy (φ,ψ) contour maps, for each residue of the polypeptide. The probable conformation of the ith residue is taken as the one of lowest conditional free energy, and the probable conformation of the whole polypeptide chain is assumed to be the combination of the probable conformations of the individual residues. Besides calculating the probable conformation, it is also possible to compute the average conformation195 over the whole ensemble. For this purpose, we first compute the average values of φ_i and ψ_i for the ith residue as
The average conformation of the oligopeptide chain is then taken as the combination of the average values of φ_i and ψ_i for all individual residues. The integrals of the type appearing in Eqs. [10], [11], [13], and [14] are evaluated by an adaptive importance sampling Monte Carlo technique that overcomes the deficiencies of Metropolis Monte Carlo mentioned in an earlier section. The entire conformational space is searched, and the search is automatically adjusted to concentrate its sampling in regions where the magnitude of the integrand is largest ("importance sampling"). The probability density function that governs the sampling initially covers the whole (φ,ψ) space of each residue, with emphasis on preferential sampling in the (broadly defined) low-energy regions of the map (see Figure 1 of Reference 194). After the initial value of the partition function (i.e., of these integrals) is computed, the algorithm uses information from the Monte Carlo search and adjusts the probability density function so that the subsequent Monte Carlo search will emphasize the regions of high probability. The adjusted probability density function is then used to provide an improved estimation of Z(β) and of the quantities computed therefrom. The process of evaluation and adaptation is continued until the probability density function approximates the Boltzmann distribution function.
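The evaluate-and-adapt cycle can be illustrated on a one-dimensional partition function. The sketch below is a minimal toy under stated assumptions, not the published algorithm: the sampling density is a histogram over a single angle, it starts uniform rather than biased toward low-energy regions, the energy function is hypothetical, and the `floor` parameter (which keeps every bin reachable) is an illustrative choice.

```python
import math
import random


def toy_energy(x):
    """Hypothetical one-angle energy with its deepest well at x = pi/3."""
    return math.cos(3.0 * x) - math.cos(x - math.pi / 3.0)


def quadrature_z(energy, beta, n=20000):
    """Reference value of Z by the midpoint rule on [-pi, pi)."""
    h = 2.0 * math.pi / n
    return h * sum(math.exp(-beta * energy(-math.pi + (i + 0.5) * h))
                   for i in range(n))


def adaptive_importance_sampling(energy, beta, n_bins=36, n_rounds=6,
                                 n_samples=4000, floor=0.05, seed=0):
    """Estimate Z = integral of exp(-beta*U) with an adapting histogram."""
    rng = random.Random(seed)
    width = 2.0 * math.pi / n_bins
    probs = [1.0 / n_bins] * n_bins          # bin probabilities, uniform start
    z_estimates = []
    for _ in range(n_rounds):
        bin_weight = [0.0] * n_bins
        total = 0.0
        bins = rng.choices(range(n_bins), weights=probs, k=n_samples)
        for b in bins:
            x = -math.pi + (b + rng.random()) * width
            q = probs[b] / width             # sampling density at x
            w = math.exp(-beta * energy(x)) / q
            total += w
            bin_weight[b] += w
        z_estimates.append(total / n_samples)
        # Adapt: move the sampling density toward the Boltzmann weights,
        # keeping a small uniform floor so no bin is ever abandoned.
        s = sum(bin_weight)
        probs = [(1.0 - floor) * bw / s + floor / n_bins
                 for bw in bin_weight]
    return z_estimates, probs


z_estimates, probs = adaptive_importance_sampling(toy_energy, beta=2.0)
z_reference = quadrature_z(toy_energy, beta=2.0)
print(z_estimates[-1], z_reference)
```

After a few rounds the bin probabilities approximate the Boltzmann populations of the bins, so the importance weights become nearly constant and the variance of the estimate of Z drops.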
This procedure has been tested on Met-enkephalin,194-196 and it led to a lowest-energy conformation similar to that found by the other methods discussed here; however, because of the inclusion of conformational entropy in this procedure,196 a conformation higher in energy but lower in free energy was also obtained.
Increase in Dimensionality

Kirkpatrick et al.197 and Vanderbilt and Louie198 developed the simulated annealing approach in which the temperature of the system is raised (when the minimization becomes trapped in a local minimum) and a Monte Carlo procedure is carried out to allow the system to escape from the local potential well (see Simulated Annealing for a discussion of this method). We have developed a method for relaxing a system not by raising the temperature but by raising the dimensionality of the space.199,200 In a higher-dimensional space, there are many more degrees of freedom in which the atoms can move about, making it easier to adjust to a low-energy conformation. Many potential barriers in three dimensions do not exist in higher dimensions. This method starts from a low-energy high-dimensional conformation and obtains a low-energy three-dimensional structure from it by gradual contraction of the dimensionality. The contraction in dimensionality is achieved by use of Cayley-Menger determinants, a simplified form of which has been derived.199-201 By use of this procedure, the energy minimization problem is recast in terms of distances as the primary variables. In this respect, there is a similarity to the distance-geometry approach of Crippen202 and Braun et al.203; however, our method differs from these in the manner in which the distances are used. Each distance variable is initially set to its own minimum-energy value subject to whatever geometric constraints may be imposed on it. This starting set of distances is then at a global energy minimum if the dimensionality is not a consideration; that is, this is a lower bound on the energy. To obtain a realizable three-dimensional structure, a penalty function is added to the objective function to be minimized. This penalty function consists of the Cayley-Menger five- and six-point determinants. The purpose of these is to force the four- and five-dimensional volumes of the structure to zero (to satisfy the necessary and sufficient condition that a structure be embeddable in a three-dimensional space). A penalty function consisting of upper and lower bounds on the distance is also added. These bounds are obtained from covalent geometric constraints but may optionally contain bounds from other theoretical or experimental considerations. The objective function to be minimized is
where the w's are relative weights, F_E is the ECEPP energy, F_4D and F_5D are the Cayley-Menger determinant constraints on the 4D and 5D volumes, and F_B incorporates information on upper and lower bounds. By steadily increasing the weights of the Cayley-Menger determinants, the distances are forced to three-dimensionality, and the three-dimensional global energy minimum is approached from below rather than from above. The expectation is that, as the dimensionality is gradually reduced, the structure, having started from a high-dimensional global minimum, will evolve into the three-dimensional global energy minimum. Preliminary results199 for a virtual-bond pentapeptide and for full-atom representations of several terminally blocked amino acids are encouraging, and application of the method to enkephalin200 led to the same global minimum described in Figure 23.
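The role of the five-point Cayley-Menger determinant as a test of three-dimensionality can be checked numerically. The sketch below uses the standard bordered-matrix definition of the determinant rather than the simplified form cited above; the helper names, the example points, and the `objective` function (a plain weighted sum, assumed here from the description of the weights) are all hypothetical.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def squared_distances(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def cayley_menger(sq_dist):
    """Cayley-Menger determinant of k points from their squared distances.

    For five points it is proportional to the squared 4D volume they
    span, so it vanishes when the points fit in three dimensions.
    """
    k = len(sq_dist)
    bordered = [[0.0] + [1.0] * k]
    for i in range(k):
        bordered.append([1.0] + list(sq_dist[i]))
    return determinant(bordered)


def objective(f_e, f_4d, f_5d, f_b, w_e=1.0, w_4d=1.0, w_5d=1.0, w_b=1.0):
    """Assumed weighted-sum form of the objective function (illustrative)."""
    return w_e * f_e + w_4d * f_4d + w_5d * f_5d + w_b * f_b


# Five points that lie in ordinary 3D space: the determinant vanishes.
points_3d = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
cm_3d = cayley_menger(squared_distances(points_3d))

# Five points spanning a genuine 4D simplex: the determinant is nonzero.
points_4d = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0),
             (0, 0, 1, 0), (0, 0, 0, 1)]
cm_4d = cayley_menger(squared_distances(points_4d))
print(cm_3d, cm_4d)
```

A penalty such as the square of this determinant, summed over sets of five atoms, is zero only for distance sets that can be realized in three dimensions, which is how increasing its weight contracts the dimensionality.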
Deformation of the Potential Energy Hypersurface

We have recently developed another algorithm to search for the global minimum of a potential energy function in the conformational analysis of molecules.204 The algorithm is based on the deformation of the original potential energy hypersurface in such a way as to obtain only a single minimum that, in most cases, is related to the global one. This single minimum can easily be attained from any starting point of the modified hypersurface by standard local minimization procedures. The position of this minimum with respect to the global one in the original hypersurface may have been changed during deformation; therefore, a procedure is applied in which the global minimum is usually attained by gradually reversing the deformation. The hypersurface is deformed with the aid of the diffusion (or heat conduction) equation, with the original shape of the hypersurface having a meaning analogous to the initial concentration (or temperature) distribution. The technique is illustrated in Figure 31 for a function f(x) of only one variable with two minima, where the original curve is transformed into a function F(x,t) with only one minimum by solution of the diffusion equation

$$\frac{\partial^2 F}{\partial x^2} = \frac{\partial F}{\partial t} \qquad [16]$$

with the boundary condition F(x,0) = f(x). The single minimum at t = 0.25 is attainable from any point of the deformed space by a simple minimization. Then, the reversing procedure (shown by the arrows directed downward) is applied by considering a sequence of the deformed curves at various values of t (less than 0.25) until t = 0. Each step of the procedure is followed by a minimization symbolized in Figure 31 by a ball moving downhill from the minimum position of the upper curve and always reaching the position of the minimum in the lower curve. In the final step, the position of the global minimum is found.
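For the one-variable test function of Figure 31, taken here as f(x) = x^4 + 2x^3 + 0.9x^2, the deformation and its reversal can be followed explicitly. For a polynomial, the diffusion equation with F(x,0) = f(x) has the closed-form solution F(x,t) = f(x) + t f''(x) + (t^2/2) f''''(x), which is what the sketch below uses. The steepest-descent minimizer, the step size, the starting point, and the number of reversal stages are illustrative assumptions.

```python
def deformed(x, t):
    """F(x, t) for f(x) = x^4 + 2x^3 + 0.9x^2 under dF/dt = d2F/dx2."""
    return (x ** 4 + 2.0 * x ** 3 + 0.9 * x ** 2
            + t * (12.0 * x ** 2 + 12.0 * x + 1.8)
            + 12.0 * t ** 2)


def deformed_slope(x, t):
    """dF/dx at fixed t."""
    return 4.0 * x ** 3 + 6.0 * x ** 2 + 1.8 * x + t * (24.0 * x + 12.0)


def local_minimum(x, t, step=0.01, iters=5000):
    """Plain steepest descent on the deformed curve at fixed t."""
    for _ in range(iters):
        x -= step * deformed_slope(x, t)
    return x


def reverse_deformation(x_start, t_max=0.25, n_stages=25):
    """Minimize at t_max, then track the minimum while t is reduced to 0."""
    x = local_minimum(x_start, t_max)
    path = [(t_max, x)]
    for k in range(n_stages - 1, -1, -1):
        t = t_max * k / n_stages
        x = local_minimum(x, t)  # start from the previous stage's minimum
        path.append((t, x))
    return path


x_start = 1.0
x_direct = local_minimum(x_start, 0.0)   # ordinary minimization of f
path = reverse_deformation(x_start)      # diffusion equation method
x_dem = path[-1][1]
print(x_direct, deformed(x_direct, 0.0))
print(x_dem, deformed(x_dem, 0.0))
```

Started from x = 1, ordinary minimization of f stops in the shallower minimum at x = 0, whereas tracking the single minimum of the deformed curve back to t = 0 ends in the deeper minimum near x = -1.085.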
Figure 31 Illustration of the stages of deformation of a test function, f(x) = x^4 + 2x^3 + 0.9x^2, and of the reversing procedure.204
For a problem with many degrees of freedom, Eq. [16] is replaced by

$$\Delta F = \frac{\partial F}{\partial t} \qquad [17]$$

where Δ is the Laplacian, $\Delta = \sum_i \partial^2/\partial x_i^2$. As a test of the method on a many-dimensional problem, the technique was applied to clusters of Lennard-Jones atoms.205 Because it is possible to obtain an analytical solution of Eq. [17] for certain forms of the potential function, for example, for Gaussians, the original Lennard-Jones potential f(y) was expressed as a sum of d Gaussians,
and the Fourier-Poisson integral
was used to solve Eq. [17] analytically, where ||x − y|| is the length of the vector x − y. The solution of Eq. [19] is
The conformational spaces of clusters of Lennard-Jones atoms were searched by application of the diffusion equation method for clusters of various sizes m from m = 5 to 55. For example, for m = 55, with 3m − 6 = 159 degrees of freedom, there are about 10^45 local minima, and the global minimum (a Mackay icosahedron) was found in about 400 s on an IBM 3090 computer.205 The diffusion equation method (DEM) was then applied to terminally blocked alanine,206 using the ECEPP/2 potential function of Eq. [1]. For this purpose, this potential function was transformed into an appropriate form to evaluate the Fourier-Poisson integral. Figures 32 to 35 show the (φ,ψ) energy maps for terminally blocked alanine at t = 0, 10.0 (where only one minimum remains), 0.295 (a stage in the reversing procedure), and, finally, a trajectory of the local minima from t = 10.0 to t = 0, where the global minimum of the original function is recovered.206 The diffusion equation method has also been applied to Met-enkephalin,206 with results as good as those obtained with the other methods cited here.
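The reason a Gaussian representation makes the diffusion equation analytically tractable can be stated compactly. The block below gives the textbook Fourier-Poisson (heat-kernel) solution of $\Delta F = \partial F/\partial t$ in $n$ dimensions with $F(\mathbf{x},0) = f(\mathbf{x})$, and its closed form for a single isotropic Gaussian term; the symbols $a$, $b$, and $n$ are assumptions introduced for this sketch and are not taken from the original equations.

```latex
% Textbook heat-kernel solution for the n-dimensional diffusion equation
F(\mathbf{x},t) = (4\pi t)^{-n/2} \int f(\mathbf{y})
    \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{y} \rVert^{2}}{4t}\right) d\mathbf{y}

% Closed form for one Gaussian term, f(\mathbf{y}) = a \exp(-b \lVert \mathbf{y} \rVert^{2}), b > 0
F(\mathbf{x},t) = a\,(1+4bt)^{-n/2}
    \exp\!\left(-\frac{b \lVert \mathbf{x} \rVert^{2}}{1+4bt}\right)
```

Each Gaussian therefore stays Gaussian under deformation, only becoming broader and lower as t increases, and by linearity a sum of Gaussians deforms term by term.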
Figure 32 ECEPP (φ,ψ) map of terminally blocked alanine at t = 0. Point C is the position of the global minimum.206

Figure 33 Deformed potential surface for terminally blocked alanine at t = 10 Å². The position of the unique minimum is indicated by point M.206

Mean-Field Theory

As the maximum value of the square of the ground-state wavefunction of any system may be close to the global-minimum position of the potential energy,207 the wavefunction should be able to identify the global minimum even though the potential energy may have many local minima. Olszewski et al.208 have made use of this principle to develop a self-consistent multitorsional field (SCMTF) method of global optimization of the conformational energy of an oligopeptide. Somorjai209,210 also made use of this principle, but implemented it in a different manner, as discussed in Reference 208. With bond lengths and bond angles kept fixed, the ground-state wavefunction was obtained208 by solving the Schrödinger equation in dihedral angle space θ = (θ_1,...,θ_N), where the θ_i's are the N dihedral angles of the oligopeptide. The Hamiltonian operator, Ĥ, was approximated by

$$\hat{H} = -\sum_{i=1}^{N} \frac{\hbar^2}{2I_i} \frac{\partial^2}{\partial \theta_i^2} + V(\boldsymbol{\theta})$$

where I_i is the moment of inertia associated with dihedral angle θ_i, and V(θ) is the potential energy for conformation θ. As an additional approximation, as in the Hartree method in quantum theory,211 the dihedral angles are assumed to be independent of each other (mean-field approximation). These approximations lead to a set of N coupled, one-dimensional Schrödinger equations, one for each of the N dihedral angles. The effective potential energy, V_i^eff, in each of these one-dimensional Schrödinger equations depends on the mean field created by averaging over the other dihedral angles θ_j (j ≠ i). V_i^eff is approximated by a Monte Carlo procedure. This procedure has thus far been applied to terminally blocked alanine (with φ, ψ, and χ as variables) and to Met-enkephalin (with φ_i and ψ_i of each residue as variables, but with the ω_i's and side-chain dihedral angles fixed at the values corresponding to the known global minimum). The global minimum was achieved in several trial runs for each of these molecules.

It is important to note that the several methods discussed here all lead to the same global-minimum structure and energy of Met-enkephalin (Figure 23) when the same (ECEPP) potential function is used; this attests to the efficiency of each of these procedures. We may thus regard the multiple-minima problem as having been solved for oligopeptides and for regular-repeating fibrous proteins. It is of interest to compare the relative efficiencies of the foregoing methods by considering the CPU time (on one processor of the IBM 3090 computer) to reach the global minimum for Met-enkephalin, when starting from a random conformation. This information can be provided only for several of the more recently developed methods because the older methods were developed on computers that are no longer available. The CPU times for the MCM, EDMC, DEM, and mean-field procedures are ~3 h, ~2 h, 10 min, and 8 min, respectively. To treat larger molecules, consisting of N dihedral angles, these techniques scale exponentially for MCM and EDMC, and as N^3 or at most N^4 for the DEM and mean-field procedures. Note that DEM is a deterministic procedure, whereas the other three procedures are stochastic.
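The principle underlying the mean-field approach, that the squared ground-state wavefunction peaks near the global minimum of the potential, can be checked for a single torsional degree of freedom. The sketch below is not the SCMTF method: it treats one angle only, uses a hypothetical potential, lumps the factor ħ²/2I into an arbitrary constant `kinetic`, and finds the ground state by repeated application of (1 − τĤ) on a periodic grid.

```python
import math


def toy_potential(theta):
    """Hypothetical torsional potential with its global minimum at pi/3."""
    return math.cos(3.0 * theta) - math.cos(theta - math.pi / 3.0)


def ground_state_density(potential, n_grid=120, kinetic=0.05,
                         tau=0.01, n_iter=3000):
    """Relax psi <- psi - tau * H psi on a periodic grid.

    H = -kinetic * d2/dtheta2 + V(theta), discretized by central
    differences. Returns the grid angles and the normalized |psi|^2.
    """
    h = 2.0 * math.pi / n_grid
    angles = [-math.pi + i * h for i in range(n_grid)]
    v = [potential(a) for a in angles]
    psi = [1.0] * n_grid  # nodeless start overlaps the ground state
    for _ in range(n_iter):
        new = []
        for i in range(n_grid):
            # psi[i - 1] wraps around at i = 0 (periodic boundary).
            lap = (psi[(i + 1) % n_grid] - 2.0 * psi[i] + psi[i - 1]) / h ** 2
            h_psi = -kinetic * lap + v[i] * psi[i]
            new.append(psi[i] - tau * h_psi)
        norm = math.sqrt(sum(p * p for p in new) * h)
        psi = [p / norm for p in new]
    return angles, [p * p for p in psi]


angles, density = ground_state_density(toy_potential)
peak = angles[max(range(len(density)), key=lambda i: density[i])]
print(peak, math.pi / 3.0)
```

The toy potential has three wells, and the density concentrates in the deepest one; in the many-angle case the same idea is applied to each dihedral angle in the averaged field of the others.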
Figure 34 Deformed potential surface for terminally blocked alanine at t = 0.295 Å². The unique minimum of the t = 10 Å² map of Figure 33 has moved to the intermediate position I, but (even though other minima appear) there is no problem in identifying point I with the position of the global minimum in the map of Figure 32.206
Figure 35 Trajectory of the global minimum for terminally blocked alanine.206 Point M corresponds to the unique minimum at t = 10 Å², and point C corresponds to the global minimum at t = 0.

Simulated Annealing

Several workers212-219 have used the simulated annealing technique of Kirkpatrick et al.197 and Vanderbilt and Louie198 in optimization problems involving the location of the global minimum-energy structures of polypeptides and proteins212,213,216-219 and for the refinement of X-ray and NMR structures.214,215 This technique combines a Monte Carlo search of conformational space at an initially high temperature with an appropriate "cooling schedule," that is, a lowering of the temperature that, if gentle enough, theoretically ensures that the system eventually becomes trapped ("freezes") into the conformation of lowest energy.220 This is an improvement over, say, a Metropolis Monte Carlo search,138 because it biases the acceptance criterion in a way that apparently ensures that the system converges to an energy minimum. No minimization is carried out at any stage of simulated annealing or at the final stage because, at the end of such a run, the system has, in principle, reached a frozen state corresponding to a local minimum.197 The time-consuming minimization steps that are required in MCM are thus circumvented. The main drawback of this method, however, is that it is strongly dependent on the choice of the cooling schedule for its efficiency and success: a very slow cooling rate causes unnecessary wastage of computer time as the search does not spend much time in the minima; conversely, a rapid cooling rate can cause the system to become supercooled, thus getting trapped in local minima from which it cannot escape. Furthermore, as the optimal cooling schedule is a sensitive function of the topography of the potential hypersurface, the cooling schedule depends critically on the particular system whose energy is to be minimized, as well as on the nature of the potential energy function. The determination of the cooling schedule is a matter of trial and error.220
The simulated annealing technique seems to be successful for constrained problems, for example, to refine an X-ray or NMR structure214,215 with constraints that keep the conformation close to the starting one; however, as shown by Nayeem et al.221 in an application to search the whole conformational space of the pentapeptide Met-enkephalin, the simulated annealing technique failed to locate the global minimum. As shown in Figure 36, the MCM technique always converges to a unique (global) minimum. Figure 37, however, shows that the simulated annealing technique does not; also, the results vary from run to run. Furthermore, the behavior of the root-mean-square deviations with respect to the global minimum exhibits no correlation with the overall energy decrease in the case of simulated annealing, whereas such a correlation is evident with MCM; this implies that, even though the potential energy decreases in the annealing process, the Monte Carlo simulated annealing trajectory does not necessarily proceed toward the global minimum. Nayeem et al.221 have discussed possible reasons for the failure of the simulated annealing technique in problems in which it is necessary to search conformational space when little or no structural information is available in advance for the molecule of interest, even though the technique does seem to work for improving or refining given structures.
EXTENSION OF METHODOLOGY TO LARGE POLYPEPTIDES AND PROTEINS

The methodology described in the preceding sections is being extended to large polypeptide molecules, to models of fibrous proteins such as collagen182-185 and silk,222 to globular proteins, and to enzyme-substrate complexes.223,224
Build-up Method

The globular protein human leukocyte interferon (containing 156 residues) is being treated by the build-up procedure.146 Thus far, approximately two thirds of the molecule has been built up.
Figure 36 Progress of the MCM procedure, starting from two different initial randomly chosen conformations of Met-enkephalin (a and b). In each of the two cases, the ECEPP energy and the root-mean-square deviation with respect to the global minimum-energy structure are shown as functions of the accepted step number.221

Figure 37 Progress of the simulated annealing procedure, starting from the same two initial conformations of Met-enkephalin (a and b) as in Figure 36. In each of the two cases, the ECEPP energy and the root-mean-square deviation with respect to the global minimum-energy structure are shown as functions of the accepted step number.221

Build-up with Limited Constraints

Another globular protein, bovine pancreatic trypsin inhibitor (BPTI), has also been treated by the build-up procedure; however, because of a limitation on computer time when the calculation on this protein was carried out, a limited set of simulated NMR distance constraints (taken from the known X-ray structure225) was used226,227 to reduce the number of conformations retained in the procedure. The backbone atoms of the resulting structure (shown in Figure 1 of Reference 9) have a root-mean-square deviation of 1.19 Å from an idealized model of the X-ray structure. With availability of more computer time, as for the calculations on interferon,146 it may be possible to carry out a further calculation on BPTI without introducing any distance constraints. The ultimate test for any computed structure (for a protein of known structure) will be the fit to the computed X-ray structure factors.
Calculations with Constraints

Statistical information, based on X-ray structures of proteins, can be used as initial constraints to locate α-helical, extended, and β-turn structures. These are based on the concept that short-range interactions dominate in determining protein structure.228-230 The predictability can be improved by incorporating medium-range interactions (up to four residues away from a given one) with an Ising model (see Figure 2 of Reference 231). Alternatively, this information has been obtained from experiments on synthetic polypeptides. Random host-guest copolymers have been used159 to obtain the parameters σ and s of the Zimm-Bragg nearest-neighbor theory160 of the helix-coil transition. Figure 19 illustrates how such values of s vary with temperature.159 These values of s have been modified by incorporation of medium- and long-range interactions and used to obtain an initial prediction of the location of α helices.162,232 Recently, experiments on the cyclization of hexapeptides (Figure 38) and tetrapeptides have been used to provide analogous information to obtain an initial prediction of the locations of β sheets and β turns,233-235 respectively. In addition, empirical procedures are available to predict the location of structural domains, the packing arrangements of strands in β sheets, and probable folding pathways in globular proteins.236-239 All of these initial predictions serve as constraints in energy minimization, in which the approximate constraints are subsequently relaxed and ultimately eliminated.

Figure 38 Representation of the equilibrium between the cyclic and acyclic forms of a hexapeptide. The N- and C-terminal blocking groups, CH3CO- and -NHCH3, respectively, are not shown.233 The standard free energy change for this process depends on the intrinsic chemistry to form a disulfide bond from two sulfhydryl groups, the tendency of Pro-Gly (or any other dipeptide in this position) to form a β turn, and the tendencies of residues X and Y to adopt the extended conformation (and interact with each other). Table 5 of Reference 234 illustrates the range of standard free energy changes for a family of such hexapeptides.

The computational methodology described in earlier sections has also been used to carry out a potential energy-constrained, real-space refinement of the structure of BPTI using limited diffraction data.240 In the early stages of the calculations, the model is fit to only those atoms that are identified in the electron density map. After each cycle of the refinement, the map is recalculated with the appearance of new atoms, and their coordinates are used in subsequent cycles. During the last stages of refinement, the model is fit to those atoms that have appeared in the map and the remaining atoms are fit to points in the current electron density map. Figure 39 illustrates how the energy-refined map on the right identifies more of the electron density of Arg-42 than the map on the left. The same computational methodology is also applicable to the refinement of structures determined by NMR spectroscopy.241
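The Zimm-Bragg parameters σ and s mentioned earlier in this section can be made concrete with a short calculation. The Python sketch below evaluates the helix content of a homopolymer using the nearest-neighbor model in its commonly used 2 × 2 transfer-matrix form: a helical residue that follows a helical residue has weight s, a helical residue that follows a coil residue has weight σs, and a coil residue has weight 1. The function names, the finite-difference step, and the sample values of σ and s are illustrative assumptions, not values taken from the chapter or from Figure 19.

```python
import math

def log_partition(n, s, sigma):
    """Natural log of the partition function Z of an n-residue chain."""
    z_helix, z_coil = 0.0, 1.0  # weights of chains ending in helix / coil
    log_z = 0.0
    for _ in range(n):
        # Add one residue: helix after helix (s), helix after coil (sigma*s),
        # coil after anything (1).
        z_helix, z_coil = s * z_helix + sigma * s * z_coil, z_helix + z_coil
        total = z_helix + z_coil
        # Rescale so the running weights stay O(1); keep the factor in log_z.
        z_helix, z_coil = z_helix / total, z_coil / total
        log_z += math.log(total)
    return log_z

def helix_fraction(n, s, sigma, h=1e-4):
    """theta = (1/n) d ln Z / d ln s, by central finite difference."""
    upper = log_partition(n, s * math.exp(h), sigma)
    lower = log_partition(n, s * math.exp(-h), sigma)
    return (upper - lower) / (2.0 * h * n)

# Hypothetical parameters: a 100-residue chain with sigma = 1e-3.
for s in (0.8, 1.0, 1.2):
    print(s, round(helix_fraction(100, s, 1e-3), 3))
```

With σ = 1 the residues are independent and the helix fraction reduces to s/(1 + s); small values of σ make the transition with increasing s more cooperative.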
Figure 39 Comparison of sections of electron density maps around Arg-42 of bovine pancreatic trypsin inhibitor, projected onto the xy plane.240 (Left) Map calculated by using the experimental 2.5-Å phases. (Right) Map obtained by using the calculated phases in a late stage of refinement.

Use of Homology

Another use of distance constraints, together with energy minimization, is in the computation of the structures of homologous proteins. The known structure of one such protein provides enough distance constraints to lead, together with energy minimization, to the computed structure of the other. Several years ago, we computed the structure of α-lactalbumin on the basis of its homology to lysozyme.242 Recently, the X-ray structure of α-lactalbumin has been determined243 and it agrees very well with the computed structures.244 Figure 2 of Reference 9 provides a superposition of the two structures.
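As a toy illustration of how a constraint taken from a known structure can steer an energy minimization and then be relaxed, consider the one-dimensional Python sketch below. It is not the procedure used for α-lactalbumin: the double-well "energy", the harmonic constraint, the target value, the weight schedule, and the steepest-descent minimizer are all assumptions chosen only to show the idea.

```python
def energy_gradient(x):
    """Gradient of the toy energy E(x) = (x**2 - 1)**2 + 0.3*x, which has a
    local minimum near x = +0.96 and its global minimum near x = -1.04."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def minimize(x, weight, target, step=0.01, n_steps=2000):
    """Steepest descent on E(x) + weight * (x - target)**2."""
    for _ in range(n_steps):
        grad = energy_gradient(x) + 2.0 * weight * (x - target)
        x -= step * grad
    return x

def constrained_then_relaxed(x_start, target, weights=(10.0, 1.0, 0.1, 0.0)):
    """Minimize with a strong constraint first, then relax it stage by stage;
    the final stage (weight 0) uses the unconstrained energy only."""
    x = x_start
    for weight in weights:
        x = minimize(x, weight, target)
    return x

# Hypothetical start in the wrong well; the "template" suggests x near -0.8.
x_free = minimize(1.0, weight=0.0, target=0.0)
x_guided = constrained_then_relaxed(1.0, target=-0.8)
print(x_free, x_guided)
```

The unconstrained run stays in the local minimum where it started, whereas the constrained run is pulled toward the approximate target and, once the constraint is removed, settles into the nearby global minimum of the unconstrained energy.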
Pattern-Recognition Importance Sampling Minimization (PRISM)

Pattern-recognition techniques are used to predict a series of probable backbone structures, whose energies are then minimized to locate the global minimum.245-247 The chain is built up with probabilities, and the computationally intensive energy minimization is delayed until the final stages. The (φ,ψ) map of each residue is divided into four regions (α, ε, α*, and ε*) (see Figure 12), and all possible tripeptides from a properly selected set of X-ray structures from the Brookhaven Protein Data Bank248 are collected and grouped according to conformation (e.g., ααα, ααε, αεε*). The pattern-recognition procedure uses amino acid properties249 to map peptide sequences into a multivariate property space. Particular tripeptide conformations tend to map to particular regions of the property space. These regions are represented by multivariate Gaussian distributions, where the parameters of the distributions are determined from tripeptides in the Brookhaven Protein Data Bank.248 These data are then used to calculate the probability that each tripeptide in a protein under study has a given conformation. The polypeptide chain is built up from the N terminus, fitting the most probable tripeptide conformations together, one tripeptide at a time, allowing for proper overlap of the tripeptides. As the build-up proceeds, the probabilities of the growing chain (conformation) are calculated, and only the 1000 most probable are retained. Thus, when the C terminus is reached, there are 1000 different predictions of the backbone structure of the protein, sorted in order of decreasing probability. The symbolic representation (in terms of the regions α, ε, α*, and ε*) of the conformations of a protein is converted to a dihedral angle representation by randomly generating values of φ and ψ in each of the assigned regions from appropriate probability distributions.
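The build-up step described above, in which overlapping tripeptides are joined and only the most probable chains are kept, can be sketched as a simple beam search. This Python sketch makes several assumptions that are not stated in the chapter: the four regions are written with the ASCII labels "a", "e", "A", and "E" (for α, ε, α*, and ε*), the probability of a chain is taken to be the product of its tripeptide probabilities, and the probability tables are invented toy numbers.

```python
import heapq
import math

def build_up(tripeptide_probs, beam=1000):
    """tripeptide_probs[i] maps a 3-letter conformation code for residues
    i..i+2 to its probability. Returns up to `beam` (chain, probability)
    pairs, sorted by decreasing probability."""
    chains = [(math.log(p), conf)
              for conf, p in tripeptide_probs[0].items() if p > 0]
    chains = heapq.nlargest(beam, chains)
    for probs in tripeptide_probs[1:]:
        extended = []
        for logp, chain in chains:
            for conf, p in probs.items():
                # Overlap: the new tripeptide must agree with the last two
                # residues already placed; it contributes one new residue.
                if p > 0 and conf[:2] == chain[-2:]:
                    extended.append((logp + math.log(p), chain + conf[2]))
        chains = heapq.nlargest(beam, extended)  # keep the most probable
    return [(chain, math.exp(logp)) for logp, chain in chains]

# Toy four-residue example with two overlapping tripeptide windows.
toy = [
    {"aaa": 0.6, "aae": 0.3, "eee": 0.1},
    {"aaa": 0.5, "aea": 0.4, "eee": 0.1},
]
for chain, prob in build_up(toy, beam=2):
    print(chain, prob)
```

Working with log probabilities avoids underflow for long chains, and pruning after every residue keeps the number of candidates bounded by `beam` regardless of chain length.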
A bivariate (two-dimensional) Gaussian distribution parameterized on values of (φ,ψ) from the known X-ray structures248 is used, together with standard techniques for generating random numbers from Gaussian distributions. Several such random structures are generated for each backbone prediction, and the energy of each of them is minimized. The lowest-energy structure is taken to represent the backbone prediction. Most of the time-consuming energy minimization of the build-up procedure is avoided because the aforementioned probabilities serve to reduce, to a manageable size, the set of conformations whose energies have to be minimized. This procedure has been tested on the 36-residue avian pancreatic polypeptide, and the lowest-energy structure247 is compared with the X-ray structure250,251 in Figure 40.
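The conversion from region symbols to dihedral angles can be sketched as follows. The Python code draws (φ,ψ) pairs from a bivariate Gaussian by the standard construction from two independent standard normal deviates. The region labels ("a", "e", "A", "E"), the means, standard deviations, and correlations in `REGIONS`, and the placeholder `toy_energy` function are hypothetical; in the actual procedure the parameters come from X-ray structures, and each random structure is energy-minimized before the lowest one is selected, a step omitted here.

```python
import math
import random

# Hypothetical Gaussian parameters per region: (mean, std, correlation),
# with angles in degrees. Real values would be fitted to X-ray structures.
REGIONS = {
    "a": ((-60.0, -45.0), (15.0, 15.0), -0.5),
    "e": ((-120.0, 130.0), (25.0, 25.0), -0.3),
    "A": ((60.0, 45.0), (15.0, 15.0), -0.5),
    "E": ((120.0, -130.0), (25.0, 25.0), -0.3),
}

def wrap(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def sample_phi_psi(mean, std, rho, rng):
    """Draw one correlated (phi, psi) pair from a bivariate Gaussian."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    phi = mean[0] + std[0] * z1
    psi = mean[1] + std[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return wrap(phi), wrap(psi)

def random_backbone(symbols, rng):
    """One random (phi, psi) list for a string of region symbols."""
    return [sample_phi_psi(*REGIONS[s], rng) for s in symbols]

def lowest_energy(symbols, energy, n_trials, rng):
    """Generate n_trials random backbones and return the one of lowest energy
    (no minimization is performed in this sketch)."""
    trials = [random_backbone(symbols, rng) for _ in range(n_trials)]
    return min(trials, key=energy)

def toy_energy(backbone):
    """Placeholder for a real potential such as ECEPP."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in backbone)

rng = random.Random(0)
print(lowest_energy("aae", toy_energy, 5, rng))
```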
Figure 40 Stereo view illustrating superposition of computed structure of avian pancreatic polypeptide (open circles)247 on the X-ray structure (solid circles).250,251 Only the Cα atoms are shown.
OUTLOOK FOR THE FUTURE

Although the basic methodology for calculating structures of oligopeptides is now in place, there is still room for improvement of potential functions, not only in the parameters used but also in the forms of the functions themselves. In the near future, we can expect to see the introduction of anharmonicity in bond angle bending, a treatment involving many-body interactions (rather than pairwise interactions), polarization (and possibly distributed multipoles) to treat electrostatics, and improved treatment of hydration. In addition, we will undoubtedly see considerably more use of parallelism in computer hardware and software. Finally, with all the current activity on the multiple-minima problem, which is essentially solved for oligopeptides and regular-repeating structures of fibrous proteins, we may hope to see further progress in treating larger molecules, that is, globular proteins, by more efficient searches of conformational space.
ACKNOWLEDGMENTS

I am indebted to George Némethy for helpful comments on this chapter. The work described here, which was carried out in the author's laboratory, was supported by research grants from the National Institutes of Health (GM-14312) and the National Science Foundation (DMB84-01811 and DMB90-15815).
REFERENCES

1. H. A. Scheraga, Adv. Phys. Org. Chem., 6, 103 (1968). Calculations of Conformations of Polypeptides.
2. H. A. Scheraga, Chem. Rev., 71, 195 (1971). Theoretical and Experimental Studies of Conformations of Polypeptides.
3. C. B. Anfinsen and H. A. Scheraga, Adv. Protein Chem., 29, 205 (1975). Experimental and Theoretical Aspects of Protein Folding.
4. G. Némethy and H. A. Scheraga, Quart. Rev. Biophys., 10, 239 (1977). Protein Folding.
5. H. A. Scheraga, Carlsberg Research Comm., 49, 1 (1984). Protein Structure and Function, from a Colloidal to a Molecular View (7th Linderstrøm-Lang Lecture, Copenhagen, May 10, 1983).
6. H. A. Scheraga, Ann. N.Y. Acad. Sci., 439, 170 (1985). Calculations of the Three-Dimensional Structures of Proteins.
7. H. A. Scheraga, Chem. Scr., 29A, 3 (1989). Calculations of Stable Conformations of Polypeptides, Proteins, and Protein Complexes.
8. K.-C. Chou, G. Némethy, and H. A. Scheraga, Acc. Chem. Res., 23, 134 (1990). Energetics of Interactions of Regular Structural Elements in Proteins.
9. G. Némethy and H. A. Scheraga, FASEB J., 4, 3189 (1990). Theoretical Studies of Protein Conformation by Means of Energy Computations.
10. H. A. Scheraga and G. Némethy, in Molecules in Natural Science and Medicine-Encomium for Linus Pauling, Z. B. Maksic and M. E. Maksic, Eds., pp. 141-176, Ellis Horwood, Chichester, 1991. Calculated Structures and Stabilities of Fibrous Macromolecules.
11. A. R. Leach, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., p. 1, VCH Publishers, New York, 1991. A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules.
12. J. M. Troyer and F. E. Cohen, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., p. 57, VCH Publishers, New York, 1991. Simplified Models for Understanding and Predicting Protein Structure.
13. U. Dinur and A. T. Hagler, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., p. 99, VCH Publishers, New York, 1991. New Approaches to Empirical Force Fields.
14. D. E. Williams, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., p. 219, VCH Publishers, New York, 1991. Net Atomic Charge and Multipole Models for the Ab Initio Molecular Electric Potential.
15. D. B. Boyd, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., p. 461, VCH Publishers, New York, 1991. The Computational Chemistry Literature. Ibid., p. 481. Compendium of Software for Molecular Modeling.
16. J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, New York, 1987.
17. S. Lifson and I. Oppenheim, J. Chem. Phys., 33, 109 (1960). Neighbor Interactions and Internal Rotations in Polymer Molecules. IV. Solvent Effect on Internal Rotations.
18. N. Gō, M. Gō, and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 59, 1030 (1968). Molecular Theory of the Helix-Coil Transition in Polyamino Acids. I. Formulation.
19. K. D. Gibson and H. A. Scheraga, Physiol. Chem. Phys., 1, 109 (1969). Minimization of Polypeptide Energy. V. Theoretical Aspects.
20. N. Gō and H. A. Scheraga, J. Chem. Phys., 51, 4751 (1969). Analysis of the Contribution of Internal Vibrations to the Statistical Weights of Equilibrium Conformations of Macromolecules.
21. N. Gō and H. A. Scheraga, Macromolecules, 9, 535 (1976). On the Use of Classical Statistical Mechanics in the Treatment of Polymer Chain Conformation.
22. IUPAC-IUB Commission on Biochemical Nomenclature, Biochemistry, 9, 3471 (1970). Abbreviations and Symbols for the Description of the Conformation of Polypeptide Chains. Tentative Rules (1969).
23. F. A. Momany, R. F. McGuire, A. W. Burgess, and H. A. Scheraga, J. Phys. Chem., 79, 2361 (1975). Energy Parameters in Polypeptides. VII. Geometric Parameters, Partial Atomic Charges, Nonbonded Interactions, Hydrogen Bond Interactions, and Intrinsic Torsional Potentials for the Naturally Occurring Amino Acids.
24. N. Gō and H. A. Scheraga, Macromolecules, 3, 178 (1970). Ring Closure and Local Conformational Deformations of Chain Molecules.
25. N. Gō and H. A. Scheraga, Macromolecules, 3, 188 (1970). Calculation of the Conformation of the Pentapeptide Cyclo(glycylglycylglycylprolylprolyl). I. A Complete Energy Map.
26. N. Gō, P. N. Lewis, and H. A. Scheraga, Macromolecules, 3, 628 (1970). Calculation of the Conformation of the Pentapeptide Cyclo(glycylglycylglycylprolylprolyl). II. Statistical Weights.
27. G. C.-C. Niu, N. Gō, and H. A. Scheraga, Macromolecules, 6, 91 (1973). Erratum, ibid., 6, 796 (1973). Calculation of the Conformation of the Pentapeptide Cyclo(glycylglycylglycylprolylprolyl). III. Treatment of a Flexible Molecule.
28. N. Gō and H. A. Scheraga, Macromolecules, 11, 552 (1978). Calculation of the Conformation of Cyclo-hexaglycyl. 2. Application of a Monte Carlo Method.
29. G. Némethy, J. R. McQuie, M. S. Pottle, and H. A. Scheraga, Macromolecules, 14, 975 (1981). Conformation of Cyclo(L-Alanylglycyl-ε-aminocaproyl), a Cyclized Dipeptide Model for a β Bend. 1. Conformational Energy Calculations.
30. N. Gō and H. A. Scheraga, Macromolecules, 6, 273 (1973). Ring Closure in Chain Molecules with Cn, I, or S2n Symmetry.
31. N. Gō and H. A. Scheraga, Macromolecules, 6, 525 (1973). Erratum, ibid., 7, 148 (1974). Calculation of the Conformation of Cyclo-hexaglycyl.
32. M. J. Dudek and H. A. Scheraga, J. Comput. Chem., 11, 121 (1990). Protein Structure Prediction Using a Combination of Sequence Homology and Global Energy Minimization. 1. Global Energy Minimization of Surface Loops.
33. K. A. Palmer and H. A. Scheraga, J. Comput. Chem., 12, 505 (1991). Standard-Geometry Chains Fitted to X-ray Derived Structures: Validation of the Rigid-Geometry Approximation. I. Chain Closure Through a Limited Search of "Loop" Conformations.
34. K. A. Palmer and H. A. Scheraga, J. Comput. Chem., 13, 329 (1992). Standard-Geometry Chains Fitted to X-ray Derived Structures: Validation of the Rigid-Geometry Approximation. II. Systematic Searches for Short Loops in Proteins: Applications to Bovine Pancreatic Ribonuclease A and Human Lysozyme.
35. R. E. Bruccoleri and M. Karplus, Macromolecules, 18, 2767 (1985). Chain Closure with Bond Angle Variations.
36. P. S. Shenkin, D. L. Yarmush, R. M. Fine, H. Wang, and C. Levinthal, Biopolymers, 26, 2053 (1987). Predicting Antibody Hypervariable Loop Conformation. I. Ensembles of Random Conformations for Ringlike Structures.
37. R. M. Fine, H. Wang, P. S. Shenkin, D. L. Yarmush, and C. Levinthal, Proteins: Struct. Funct. Genet., 1, 342 (1986). Predicting Antibody Hypervariable Loop Conformations. II. Minimization and Molecular Dynamics Studies of MCPC603 from Many Randomly Generated Loop Conformations.
38. J. Moult and M. N. G. James, Proteins: Struct. Funct. Genet., 1, 146 (1986). An Algorithm for Determining the Conformation of Polypeptide Segments in Proteins by Systematic Search.
39. M. Dygert, N. Gō, and H. A. Scheraga, Macromolecules, 8, 750 (1975). Use of a Symmetry Condition to Compute the Conformation of Gramicidin S.
40. G. N. Ramachandran, C. Ramakrishnan, and V. Sasisekharan, J. Mol. Biol., 7, 95 (1963). Stereochemistry of Polypeptide Chain Configurations.
41. G. Némethy and H. A. Scheraga, Biopolymers, 3, 155 (1965). Theoretical Determination of Sterically Allowed Conformations of a Polypeptide Chain by a Computer Method.
42. S. J. Leach, G. Némethy, and H. A. Scheraga, Biopolymers, 4, 369 (1966). Computation of the Sterically Allowed Conformations of Peptides.
43. S. J. Leach, G. Némethy, and H. A. Scheraga, Biopolymers, 4, 887 (1966). Intramolecular Steric Effects and Hydrogen Bonding in Regular Conformations of Polyamino Acids.
44. H. A. Scheraga, R. A. Scott, G. Vanderkooi, S. J. Leach, K. D. Gibson, T. Ooi, and G. Némethy, in Conformation of Biopolymers, G. N. Ramachandran, Ed., p. 43, Academic Press, London, 1967. Calculations of Polypeptide Structures from Amino Acid Sequence.
45. G. Némethy, S. J. Leach, and H. A. Scheraga, J. Phys. Chem., 70, 998 (1966). The Influence of Amino Acid Side Chains on the Free Energy of Helix-Coil Transitions.
46. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta, Jr., and P. Weiner, J. Am. Chem. Soc., 106, 765 (1984). A New Force Field for Molecular Mechanical Simulation of Nucleic Acids and Proteins.
47. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and D. A. Case, J. Comput. Chem., 7, 230 (1986). An All Atom Force Field for Simulations of Proteins and Nucleic Acids.
48. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187 (1983). CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations.
49. P. Dauber-Osguthorpe, V. A. Roberts, D. J. Osguthorpe, J. Wolff, M. Genest, and A. T. Hagler, Proteins: Struct. Funct. Genet., 4, 31 (1988). Structure and Energetics of Ligand Binding to Proteins: Escherichia coli Dihydrofolate Reductase-Trimethoprim, a Drug-Receptor System.
50. G. Némethy, M. S. Pottle, and H. A. Scheraga, J. Phys. Chem., 87, 1883 (1983). Energy Parameters in Polypeptides. 9. Updating of Geometrical Parameters, Nonbonded Interactions, and Hydrogen Bond Interactions for the Naturally Occurring Amino Acids.
51. M. J. Sippl, G. Némethy, and H. A. Scheraga, J. Phys. Chem., 88, 6231 (1984). Intermolecular Potentials from Crystal Data. 6. Determination of Empirical Potentials for O-H...O=C Hydrogen Bonds from Packing Configurations.
52. G. Némethy, K. D. Gibson, K. A. Palmer, C. N. Yoon, G. Paterlini, A. Zagari, S. Rumsey, and H. A. Scheraga, J. Phys. Chem., 96, 6472 (1992). Energy Parameters in Polypeptides. 10. Improved Geometrical Parameters and Nonbonded Interactions for Use in the ECEPP/3 Algorithm, with Application to Proline-Containing Peptides.
53. N. L. Allinger, J. Am. Chem. Soc., 99, 8127 (1977). Conformational Analysis. 130. MM2. A Hydrocarbon Force Field Utilizing V1 and V2 Torsional Terms.
54. I. K. Roterman, K. D. Gibson, and H. A. Scheraga, J. Biomol. Struct. Dyn., 7, 391 (1989). A Comparison of the CHARMM, AMBER and ECEPP Potentials for Peptides. I. Conformational Predictions for the Tandemly Repeated Peptide (Asn-Ala-Asn-Pro)9.
55. I. K. Roterman, M. H. Lambert, K. D. Gibson, and H. A. Scheraga, J. Biomol. Struct. Dyn., 7, 421 (1989). A Comparison of the CHARMM, AMBER and ECEPP Potentials for Peptides. II. φ-ψ Maps for N-Acetyl Alanine N'-Methyl Amide: Comparisons, Contrasts and Simple Experimental Tests.
56. P. A. Kollman and K. A. Dill, J. Biomol. Struct. Dyn., 8, 1103 (1991). Decisions in Force Field Development: An Alternative to Those Described by Roterman et al.
57. K. D. Gibson and H. A. Scheraga, J. Biomol. Struct. Dyn., 8, 1109 (1991). Decisions in Force Field Development. Reply to Kollman and Dill.
58. D. Poland and H. A. Scheraga, Biochemistry, 6, 3791 (1967). Energy Parameters in Polypeptides. I. Charge Distributions and the Hydrogen Bond.
59. F. A. Momany, G. Vanderkooi, and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 61, 429 (1968). Determination of Intermolecular Potentials from Crystal Data. I. General Theory and Application to Crystalline Benzene at Several Temperatures.
60. J. F. Yan, F. A. Momany, R. Hoffmann, and H. A. Scheraga, J. Phys. Chem., 74, 420 (1970). Energy Parameters in Polypeptides. II. Semiempirical Molecular Orbital Calculations for Model Peptides.
61. F. A. Momany, R. F. McGuire, J. F. Yan, and H. A. Scheraga, J. Phys. Chem., 74, 2424 (1970). Energy Parameters in Polypeptides. III. Semiempirical Molecular Orbital Calculations for Hydrogen-Bonded Model Peptides.
62. R. F. McGuire, G. Vanderkooi, F. A. Momany, R. T. Ingwall, G. M. Crippen, N. Lotan, R. W. Tuttle, K. L. Kashuba, and H. A. Scheraga, Macromolecules, 4, 112 (1971). Determination of Intermolecular Potentials from Crystal Data. II. Crystal Packing with Applications to Polyamino Acids.
63. F. A. Momany, R. F. McGuire, J. F. Yan, and H. A. Scheraga, J. Phys. Chem., 75, 2286 (1971). Energy Parameters in Polypeptides. IV. Semiempirical Molecular Orbital Calculations of Conformational Dependence of Energy and Partial Charge in Di- and Tripeptides.
64. R. F. McGuire, F. A. Momany, and H. A. Scheraga, J. Phys. Chem., 76, 375 (1972). Energy Parameters in Polypeptides. V. An Empirical Hydrogen Bond Potential Function Based on Molecular Orbital Calculations.
65. P. N. Lewis, F. A. Momany, and H. A. Scheraga, Isr. J. Chem., 11, 121 (1973). Energy Parameters in Polypeptides. VI. Conformational Energy Analysis of the N-Acetyl N'-Methyl Amides of the Twenty Naturally Occurring Amino Acids.
66. F. A. Momany, L. M. Carruthers, R. F. McGuire, and H. A. Scheraga, J. Phys. Chem., 78, 1595 (1974). Intermolecular Potentials from Crystal Data. III. Determination of Empirical Potentials and Application to the Packing Configurations and Lattice Energies in Crystals of Hydrocarbons, Carboxylic Acids, Amines, and Amides.
67. F. A. Momany, L. M. Carruthers, and H. A. Scheraga, J. Phys. Chem., 78, 1621 (1974). Intermolecular Potentials from Crystal Data. IV. Application of Empirical Potentials to the Packing Configurations and Lattice Energies in Crystals of Amino Acids.
68. Y.-C. Fu, R. F. McGuire, and H. A. Scheraga, Macromolecules, 7, 468 (1974). Intermolecular Potentials from Crystal Data. V. Crystal Packing of Poly[β-(p-chlorobenzyl)-L-aspartate].
69. G. Némethy and H. A. Scheraga, J. Phys. Chem., 80, 928 (1977). Intermolecular Potentials from Crystal Data. 5. Determination of Empirical Potentials for O-H...O Hydrogen Bonds from Packing Configurations and Lattice Energies of Polyhydric Alcohols.
70. G. Némethy and H. A. Scheraga, Biochem. Biophys. Res. Comm., 98, 482 (1981). Strong Interaction between Disulfide Derivatives and Aromatic Groups in Peptides and Proteins.
71. M. Vásquez, G. Némethy, and H. A. Scheraga, Macromolecules, 16, 1043 (1983). Computed Conformational States of the 20 Naturally Occurring Amino Acid Residues and of the Prototype Residue α-Aminobutyric Acid.
72. S. S. Zimmerman and H. A. Scheraga, Biopolymers, 16, 811 (1977). Erratum: ibid., 16, 1385 (1977). Influence of Local Interactions on Protein Structure. I. Conformational Energy Studies of N-Acetyl-N'-methylamides of Pro-X and X-Pro Dipeptides.
73. S. S. Zimmerman and H. A. Scheraga, Biopolymers, 17, 1849 (1978). Influence of Local Interactions on Protein Structure. II. Conformational Energy Studies of N-Acetyl-N'-methylamides of Ala-X and X-Ala Dipeptides.
74. S. S. Zimmerman and H. A. Scheraga, Biopolymers, 17, 1871 (1978). Influence of Local Interactions on Protein Structure. III. Conformational Energy Studies of N-Acetyl-N'-methylamides of Gly-X and X-Gly Dipeptides.
75. S. S. Zimmerman and H. A. Scheraga, Biopolymers, 17, 1885 (1978). Influence of Local Interactions on Protein Structure. IV. Conformational Energy Studies of N-Acetyl-N'-methylamides of Ser-X and X-Ser Dipeptides.
76. S. Tanaka and H. A. Scheraga, Macromolecules, 7, 698 (1974). Calculation of Conformational Properties of Oligomers of L-Proline.
77. S. S. Zimmerman and H. A. Scheraga, Macromolecules, 9, 408 (1976). Stability of cis, trans, and Nonplanar Peptide Groups.
78. R. A. Scott and H. A. Scheraga, J. Chem. Phys., 45, 2091 (1966). Conformational Analysis of Macromolecules. III. Helical Structures of Polyglycine and Poly-L-alanine.
79. K. D. Gibson and H. A. Scheraga, Biopolymers, 4, 709 (1966). Influence of Flexibility on the Energy Contours of Dipeptide Maps.
80. S. C. Harvey, Proteins: Struct. Funct. Genet., 5, 78 (1989). Treatment of Electrostatic Effect in Macromolecular Modeling.
81. M. K. Gilson and B. Honig, Proteins: Struct. Funct. Genet., 4, 7 (1988). Calculation of the Total Electrostatic Energy of a Macromolecular System: Solvation Energies, Binding Energies, and Conformational Analysis.
82. Y. N. Vorobjev, J. A. Grant, and H. A. Scheraga, J. Am. Chem. Soc., 114, 3189 (1992). A Combined Iterative and Boundary-Element Approach for Solution of the Nonlinear Poisson-Boltzmann Equation.
83. L. G. Dunfield, A. W. Burgess, and H. A. Scheraga, J. Phys. Chem., 82, 2609 (1978). Energy Parameters in Polypeptides. 8. Empirical Potential Energy Algorithm for the Conformational Analysis of Large Molecules.
84. M. R. Pincus and H. A. Scheraga, J. Phys. Chem., 81, 1579 (1977). An Approximate Treatment of Long-Range Interactions in Proteins.
85. S. Tanaka and H. A. Scheraga, Macromolecules, 8, 516 (1975). Theory of the Cooperative Transition Between Two Ordered Conformations of Poly(L-proline). III. Molecular Theory in the Presence of Solvent.
86. M. Gō, N. Gō, and H. A. Scheraga, J. Chem. Phys., 54, 4489 (1971). Molecular Theory of the Helix-Coil Transition in Polyamino Acids. III. Evaluation and Analysis of s and σ for Polyglycine and Poly-L-alanine in Water.
87. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys., 79, 926 (1983). Comparison of Simple Potential Functions for Simulating Liquid Water.
88. A. Anderson, M. Carson, and J. Hermans, Ann. N.Y. Acad. Sci., 482, 51 (1986). Molecular Dynamics Simulation Study of Polypeptide Conformational Equilibria: A Progress Report.
89. C. L. Brooks, M. Karplus, and B. M. Pettitt, in Proteins: A Theoretical Perspective of Dynamics, Structure, and Thermodynamics, Adv. Chem. Phys., Vol. 71, I. Prigogine and S. A. Rice, Eds., John Wiley & Sons, New York, 1988.
90. K. D. Gibson and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 58, 420 (1967). Minimization of Polypeptide Energy. I. Preliminary Structures of Bovine Pancreatic Ribonuclease S-Peptide.
91. G. Némethy, W. J. Peer, and H. A. Scheraga, Annu. Rev. Biophys. Bioengin., 10, 459 (1981). Effect of Protein-Solvent Interactions on Protein Conformation.
92. Y. K. Kang, K. D. Gibson, G. Némethy, and H. A. Scheraga, J. Phys. Chem., 92, 4739 (1988). Free Energies of Hydration of Solute Molecules. 4. Revised Treatment of the Hydration Shell Model.
93. D. Eisenberg and A. D. McLachlan, Nature (London), 319, 199 (1986). Solvation Energy in Protein Folding and Binding.
94. T. Ooi, M. Oobatake, G. Némethy, and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 84, 3086 (1987). Erratum: ibid., 84, 6015 (1987). Accessible Surface Areas as a Measure of the Thermodynamic Parameters of Hydration of Peptides.
95. G. Perrot and B. Maigret, J. Mol. Graph., 8, 141 (1990). New Determinations and Simplified Representations of Macromolecular Surfaces.
96. G. Perrot, B. Cheng, K. D. Gibson, J. Vila, K. A. Palmer, A. Nayeem, B. Maigret, and H. A. Scheraga, J. Comput. Chem., 13, 1 (1992). MSEED: A Program for the Rapid Analytical Determination of Accessible Surface Areas and Their Derivatives.
97. D. R. Ripoll, L. Piela, M. Vásquez, and H. A. Scheraga, Proteins: Struct. Funct. Genet., 10, 188 (1991). On the Multiple-Minima Problem in the Conformational Analysis of Polypeptides. V. Application of the Self-consistent Electrostatic Field and the Electrostatically Driven Monte Carlo Methods to Bovine Pancreatic Trypsin Inhibitor.
98. J. Vila, R. L. Williams, M. Vásquez, and H. A. Scheraga, Proteins: Struct. Funct. Genet., 10, 199 (1991). Empirical Solvation Models Can be Used to Differentiate Native from Near-Native Conformations of Bovine Pancreatic Trypsin Inhibitor.
99. R. L. Williams, J. Vila, G. Perrot, and H. A. Scheraga, Proteins: Struct. Funct. Genet., 14, 110 (1992). Empirical Solvation Models in the Context of Conformational Energy Searches. Application to Bovine Pancreatic Trypsin Inhibitor.
100. N. Gō, M. Gō, and H. A. Scheraga, Macromolecules, 7, 137 (1974). New Method for Calculating the Conformational Entropy of a Regular Helix.
101. H. Meirovitch, J. Chem. Phys., 89, 2514 (1988). Statistical Properties of the Scanning Simulation Method for Polymer Chains.
102. H. Meirovitch and H. A. Scheraga, J. Chem. Phys., 84, 6369 (1986). Computer Simulation of the Entropy of Continuum Chain Models: The Two-Dimensional Freely Jointed Chain of Hard Disks.
103. H. Meirovitch, M. Vásquez, and H. A. Scheraga, Biopolymers, 26, 651 (1987). Stability of Polypeptide Conformational States as Determined by Computer Simulation of the Free Energy.
104. H. Meirovitch, M. Vásquez, and H. A. Scheraga, Biopolymers, 27, 1189 (1988). Stability of Polypeptide Conformational States. II. Folding of a Polypeptide Chain by the Scanning Simulation Method, and Calculation of the Free Energy of the Statistical Coil.
105. H. Meirovitch, M. Vásquez, and H. A. Scheraga, J. Chem. Phys., 92, 1248 (1990). Free Energy and Stability of Macromolecules Studied by the Double Scanning Simulation Procedure.
106. J. A. Schellman, Compt. Rend. Trav. Lab. Carlsberg, Ser. Chim., 29, 230 (1955). Stability of Hydrogen Bonded Peptide Structures in Aqueous Solution.
107. P. J. Flory, J. Am. Chem. Soc., 78, 5222 (1956). Theory of Elastic Mechanisms in Fibrous Proteins.
108. W. L. Mattice, G. Némethy, and H. A. Scheraga, Macromolecules, 21, 2811 (1988). Conformational Entropy Associated with the Formation of Internal Loops in Collagen.
109. D. C. Poland and H. A. Scheraga, Biopolymers, 3, 379 (1965). Statistical Mechanics of Noncovalent Bonds in Polyamino Acids. VIII. Covalent Loops in Proteins.
110. S. H. Lin, Y. Konishi, M. E. Denton, and H. A. Scheraga, Biochemistry, 23, 5504 (1984). Influence of an Extrinsic Cross-link on the Folding Pathway of Ribonuclease A. Conformational and Thermodynamic Analysis of Cross-linked (Lysine7-Lysine41)-Ribonuclease A.
111. B. J. Yoon and H. A. Scheraga, J. Mol. Struct. (Theochem), 199, 33 (1989). Calculation of the Entropy of a Fluid by a Monte Carlo Simulation Based on Free Volume.
112. B. J. Yoon, S. D. Hong, M. S. Jhon, and H. A. Scheraga, Chem. Phys. Lett., 181, 73 (1991). Calculation of the Entropy and the Chemical Potential of Fluids and Solids from the Radial Free-Space Distribution Function.
113. B. L. Tembe and J. A. McCammon, Comput. Chem., 8, 281 (1984). Ligand-Receptor Interactions.
114. W. L. Jorgensen and C. Ravimohan, J. Chem. Phys., 83, 3050 (1985). Monte Carlo Simulation of Differences in Free Energies of Hydration.
115. A. Warshel and F. Sussman, Proc. Natl. Acad. Sci. USA, 83, 3806 (1986). Toward Computer-Aided Site-Directed Mutagenesis of Enzymes.
116. P. A. Bash, U. C. Singh, R. Langridge, and P. A. Kollman, Science, 236, 564 (1987). Free Energy Calculations by Computer Simulation.
117. D. Levesque and L. Verlet, Phys. Rev., 182, 307 (1969). Perturbation Theory and Equation of State for Fluids.
118. J. P. Hansen and L. Verlet, Phys. Rev., 184, 151 (1969). Phase Transitions of the Lennard-Jones System.
119. J. P. Hansen, Phys. Rev. A, 2, 221 (1970). Phase Transition of the Lennard-Jones System. II. High-Temperature Limit.
120. W. G. Hoover, M. Ross, K. W. Johnson, D. Henderson, J. A. Barker, and B. C. Brown, J. Chem. Phys., 52, 4931 (1970). Soft-Sphere Equation of State.
121. G. N. Patey and J. P. Valleau, Chem. Phys. Lett., 21, 297 (1973). The Free Energy of Spheres with Dipoles: Monte Carlo with Multistage Sampling.
122. G. M. Torrie and J. P. Valleau, Chem. Phys. Lett., 28, 578 (1974). Monte Carlo Free Energy Estimates Using Non-Boltzmann Sampling: Application to the Sub-critical Lennard-Jones Fluid.
123. G. M. Torrie and J. P. Valleau, J. Comput. Phys., 23, 187 (1977). Nonphysical Sampling Distribution in Monte Carlo Free-Energy Estimation: Umbrella Sampling.
124. J. C. Owicki and H. A. Scheraga, J. Phys. Chem., 82, 1257 (1978). Monte Carlo Free Energy Calculations on Dilute Solutions in the Isothermal-Isobaric Ensemble.
125. Z. Li and H. A. Scheraga, J. Phys. Chem., 92, 2633 (1988). Monte Carlo Recursion Evaluation of Free Energy.
126. Z. Li and H. A. Scheraga, Chem. Phys. Lett., 154, 516 (1989). Erratum: ibid., 157, 579 (1989). Computation of the Free Energy of Liquid Water by the Monte Carlo Recursion Method.
127. I. Z. Steinberg and H. A. Scheraga, J. Biol. Chem., 238, 172 (1963). Entropy Changes Accompanying Association Reactions of Proteins.
128. H. H. Rosenbrock, Comput. J., 3, 175 (1961). An Automatic Method for Finding the Greatest or Least Value of a Function.
129. R. Fletcher and M. J. D. Powell, Comput. J., 6, 163 (1963). A Rapidly Convergent Descent Method for Minimization.
130. R. Fletcher and C. M. Reeves, Comput. J., 7, 149 (1964). Function Minimization by Conjugate Gradients. R. Fletcher, Comput. J., 8, 33 (1965). Function Minimization without Evaluating Derivatives-A Review.
131. M. J. D. Powell, Comput. J., 7, 303 (1965). A Method for Minimizing a Sum of Squares of Non-linear Functions Without Calculating Derivatives.
132. J. A. Nelder and R. Mead, Comput. J., 7, 308 (1965). A Simplex Method for Function Minimization.
133. J. D. Pearson, Comput. J., 12, 171 (1969). Variable Metric Methods of Minimisation.
134. P. E. Gill and W. Murray, Math. Prog., 7, 311 (1974). Newton-Type Methods for Unconstrained and Linearly Constrained Optimization.
135. P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, London, 1981.
136. T. Schlick, in Reviews in Computational Chemistry, Vol. 3, K. B. Lipkowitz and D. B. Boyd, Eds., p. 1, VCH Publishers, New York, 1992. Optimization Methods in Computational Chemistry.
137. D. M. Gay, ACM Trans. Math. Software, 9, 503 (1983). Algorithm 611. Subroutines for Unconstrained Minimization Using a Model/Trust-Region Approach.
138. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys., 21, 1087 (1953). Equation of State Calculations by Fast Computing Machines.
139. T. P. Lybrand, in Reviews in Computational Chemistry, Vol. 1, K. B. Lipkowitz and D. B. Boyd, Eds., p. 295, VCH Publishers, New York, 1990. Computer Simulations of Biomolecular Systems Using Molecular Dynamics and Free Energy Perturbation Methods.
140. K. D. Gibson and H. A. Scheraga, J. Comput. Chem., 11, 468 (1990). Variable Step Molecular Dynamics: An Exploratory Technique for Peptides with Fixed Geometry.
141. K. D. Gibson and H. A. Scheraga, J. Comput. Chem., 11, 487 (1990). Dynamics of Peptides with Fixed Geometry: Kinetic Energy Terms and Potential Energy Derivatives as Functions of Dihedral Angles.
142. E. O. Purisima and H. A. Scheraga, Biopolymers, 23, 1207 (1984). Conversion from a Virtual-Bond Chain to a Complete Polypeptide Backbone Chain.
References 13 7 143. S. Rackovsky and H. A. Scheraga, Acc. Chem. Res., 17,209 (1984).Differential Geometry and Protein Folding. 144. S. Rackovsky, Proteins: Struct. Funct. Genet., 7 , 378 (1990). Quantitative Organization of the Known Protein X-Ray Structures. I. Methods and Short-Length-Scale Results. 145. C. Pottle, M. S. Pottle, R. W. Tuttle, R. J. Kinch, and H. A. Scheraga, /. Comput. Chem., 1, 46 (1980). Conformational Analysis of Proteins: Algorithms and Data Structures for Array Processing. 146. K. D. Gibson, S. Chin, M. R. Pincus, E. Clementi, and H. A. Scheraga, in Lecture Notes in Chemistry, Vol. 44: Supercomputer Simulations in Chemistry, M. Dupuis, Ed., p. 198, Springer-Verlag, Berlin, 1986. Parallelism in Conformational Energy Calculations on Proteins. 147. T. Ooi, R. A. Scott, G. Vanderkooi, and H. A. Scheraga,J. Chem. Phys., 46,4410 (1967). Conformational Analysis of Macromolecules. 1V. Helical Structures of Poly-L-alanine, Poly-L-valine, My-P-methyl-L-aspartate, Poly-y-methyl-L-glutamate, and Poly-t-tyrosine. 148. H. A. Scheraga, Harvey Lectures, 63,99 (1969).Calculation of Polypeptide Conformation. 149. E. H. Erenrich, R. H. Andreatta, and H. A. Scheraga, j . Am. Chem. SOC., 92, 1116 (1970). Experimental Verification of Predicted Helix Sense of Two Polyamino Acids. 150. Y. Paterson, S. M. Rumsey, E. Benedetti, G . NCmethy, and H. A. Scheraga, J. Am. Chem. SOC.,103, 2947 (1981). Sensitivity of Polypeptide Conformation to Geometry. Theoretical Conformational Analysis of Oligomers of a-Aminoisobutyric Acid. 151. Y. Paterson, E. R. Stimson, D. J. Evans, S. J. Leach, and H. A. Scheraga, Int. /. Pept. Protein Res., 20, 468 (1982). Erratum: ibid., 22, 128a (1983). Solution Conformations of Oligomers of a-Aminoisobutyric Acid. 152. V. Pavone, E. Benedetti, B. Di Blasio, C. Pedone, A. Santini, A. Bavoso, C. Toniolo, M. Crisma, and L. Sartore, J. Biomol. Struct. Dyn., 7, 1321 (1990). 
Critical Main-Chain Length for Conformational Conversion From 3 ,,-Helix to a-Helix in Polypeptides. 153. C. Chothia, /. Mol. Biof., 75, 295 (1973). Conformation of Twisted @-PleatedSheets in Proteins. 154. K.-C. Chou and H. A. Scheraga, Proc. Natl. Acad. Sci., USA, 79,7047 (1982). Origin of the Right-Handed Twist of P-Sheets of Poly(L-Val) Chains. 155. K.-C. Chou, G. Nkmethy, and H. A. Scheraga, J. Mol. Biol., 168, 389 (1983). Role of Interchain Interactions in the Stabilization of the Right-Handed Twist of P-Sheets. 156. K.-C. Chou, G. NCmethy, and H. A. Scheraga, Biochemistry, 22, 6213 (1983). Effect of Amino Acid Composition on the Twist and the Relative Stability of Parallel and Antiparallel @-Sheets. 157. J. S. Balcerski, E. S. Pysh, G . M. Bonora, and C. Toniolo, J. Am. Chem. SOC.,98, 3470 (1 976). Vacuum Ultraviolet Circular Dichroism of P-Forming AIkyl Oligopeptides. 158. D. Poland and H. A. Scheraga, Theory ofHelix-Coil Transitions in Biopolymers, Academic Press, New York, 1970. 159. J. Wojcik, K. H. Altmann, and H. A. Scheraga, Biopolymers, 30, 121 (1990). Helix-Coil Stability Constants for the Naturally Occurring Amino Acids in Water. XXIV. Half-Cystine Parameters from Random Poly(hydroxybutylglutamine-co-S-methylthio-L-cysteine). 160. B. H. Zimm and J. K. Bragg,]. Chem, Phys., 31,526 (1959).Theory of the Phase Transition Between Helix and Random Coil in Polypeptide Chains. 161. H. A. Scheraga, Proc. Natl. Acad. Sci, USA, 82,5585 (1985).Effect of Side Chain-Backbone Electrostatic Interactions on the Stability of a-Helices. 162. M. Visquez and H. A. Scheraga, Biopolymers, 27, 41 (1988). Effect of Sequence-Specific Interactions on the Stability of Helical Conformations in Polypeptides. 163. P. N. Lewis, F. A. Momany, and H. A. Scheraga, Isr. J. Chem., 11, 121 (1973). Energy Parameters in Polypeptides. VI. Conformational Energy Analysis of the N-Acetyl “-methyl Amides of the Twenty Naturally Occurring Amino Acids.
13 8 Predicting Three-Dimensional Structures of Oligopeptides 164. M. GO, N. Go, and H. A. Scheraga, J. Chem. Phys., 52,2060 (1970). Molecular Theory of the Helix-Coil Transition in Polyamino Acids. 11. Numerical Evaluation of s and u for Polyglycine and Poly-L-alaninein the Absence (for s and u) and Presence (for u)of Solvent. 165. M. Go, F. T. Hesselink, N. Go, and H. A. Scheraga, Macromolecules, 7 , 459 (1974). Molecular Theory of the Helix-Coil Transition in Poly(amino acids). IV. Evaluation and Analysis of s for Poly(L-valine)in the Absence and Presence of Water. 166. M. Go and H. A. Scheraga, Biopolymers, 23,1961 (1984).Molecular Theory of the HelixCoil Transition in Polyamino Acids. V. Explanation of the Different Conformational Behavior of Valine, Isoleucine, and Leucine in Aqueous Solution. 167. F. T. Hesselink, T. Ooi, and H. A. Scheraga, Macromolecules, 6, 541 (1973). Conformational Energy Calculations. Thermodynamic Parameters of the Helix-Coil Transition for Poly(L4ysine) in Aqueous Salt Solution. 168. K. D. Gibson and H. A. Scheraga, in Structure and Expression, Vol. 1: From Proteins to Ribosomes, R. H. Sarma and M. H. Sarma, Eds., p. 67, Adenine Press, Guilderland, New York, 1988. The Multiple-Minima Problem in Protein Folding. 169. K. D. Gibson and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 63,9 (1969).Minimization of Polypeptide Energy. VI. Systematic Searches for Low-Energy Conformations of Deca-Lalanine and the Octapeptide Loop of Ribonuclease. 170. G. M. Crippen and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 64,42 (1969).Minimization of Polypeptide Energy. VIII. Application of the Deflation Technique to a Dipeptide. 171. K. D. Gibson and H. A. Scheraga, Comput. Biomed. Res., 3,375 (1970). Minimization of Polypeptide Energy. IX. A Procedure for Seeking the Global Minimum of Functions with Many Minima. 172. G. M. Crippen and H. A. Scheraga, Arch. Biochem. Biophys., 144,453 (1971). Minimization of Polypeptide Energy. X. A Global Search Algorithm. 
173. G. M. Crippen and H. A. Scheraga, Arch. Biochem. Biophys., 144,462 (1971). Minimization of Polypeptide Energy. XI. The Method of Gentlest Ascent. 174. G. M. Crippen and H. A. Scheraga, 1. Comput. Phys., 12, 491 (1973). Minimization of Polypeptide Energy. XII. The Methods of Partial Energies and Cubic Subdivision. 175. K. D. Gibson and H. A. Scheraga, J. Comput. Chem. 8,826 (1987). Revised Algorithms for the Build-up Procedure for Predicting Protein Conformations by Energy Minimization. 176. M. Vasquez and H. A. Scheraga, Biopolymers, 24,1437 (1985). Use of Buildup and EnergyMinimization Procedures to Compute Low-Energy Structures of the Backbone of Enkephalin. 177. Z. Li and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 84, 6611 (1987). Monte CarloMinimization Approach to the Multiple-Minima Problem in Protein Folding. 178. L. Glasser and H. A. Scheraga, J. Mol. Biol., 199, 513 (1988). Calculations on Crystal Packing of a Flexible Molecule, Leu-enkephalin. 179. G. Nimethy and H. A. Scheraga, Biochem. Biophys. Res. Comm., 118,643 (1984). Hydrogen Bonding Involving the Ornithine Side Chain of Gramicidin S. 180. S. E. Hull, R. Karlsson, P. Main, M. M. Woolfson, and E. J. Dodson, Nature (London), 275, 206 (1978). The Crystal Structure of a Hydrated Gramicidin S-Urea Complex. 181. P. A. Mirau and F. A. Bovey, Abstracts of 199th Am, Chem. SOC. Meeting, Boston, April 1990, p. POLY 58. 2D and 3D NMR Studies of Polypeptide Structure and Function. 182. M. H. Miller and H. A. Scheraga,]. Polym. Sci. Polym. Symp., No. 54, 171 (1976).Calculation of the Structures of Collagen Models. Role of Interchain Interactions in Determining the Triple-Helical Coiled-Coil Conformation. 1. Poly(glycy1-prolyl-prolyl). 183. M. H. Miller, G. Nemethy, and H. A. Scheraga, Macromolecules, 13,470 (1980).Calculation of the Structures of Collagen Models. Role of Interchain Interactions in Determining the Triple-Helical Coiled-Coil Conformation. 2. Poly(glycy1-prolyl-hydroxyprolyl). 184. M. H. 
Miller, G. Nimethy, and H. A. Scheraga, Macromolecules, 13, 910 (1980).Calculation of the Structures of Collagen Models. Role of Interchain Interactions in Determining the Triple-Helical Coiled-Coil Conformation. 3. Poly(glycy1-prolyl-alanyl).
References 139 185. G. Nemethy, M. H. Miller, and H. A. Scheraga, Macromolecules, 13, 914 (1980). Calculation of the Structures of Collagen Models. Role of Interchain Interactions in Determining the Triple-Helical Coiled-Coil Conformation. 4. Poly(glycy1-alanyl-prolyl). 186. G. NCmethy and H. A. Scheraga, Biopolymers, 21, 1535 (1982). Conformational Preferences of Amino Acid Side Chains in Collagen. 187. K. Okuyama, N. Tanaka, T. Ashida, and M. Kakudo, Bull. Chem. SOC. ]upan, 49, 1805 (1976). Structure Analysis of a Collagen Model Polypeptide, (Pro-Pro-Gly),,. 188. L. Piela and H. A. Scheraga, Biopolymers, 26, S33 (1987).On the Multiple-Minima Problem in the Conformational Analysis of Polypeptides. 1. Backbone Degrees of Freedom for a Perturbed a-Helix. 189. Z. Li and H. A. Scheraga,]. Mol. Struct. (Theochem}, 179, 333 (1988). Structure and Free Energy of Complex Thermodynamic Systems. 190. D. R. Ripoll and H. A. Scheraga, Biopolymers, 27, 1283 (1988). On the Multiple-Minima Problem in the Conformational Analysis of Polypeptides. 11. An Electrostatically Driven Monte Carlo Method-Tests on Poly(L-Alanine). 191. D. R. Ripoll and H. A. Scheraga,]. Protein Chem., 8, 263 (1989). The Multiple-Minima Problem in the Conformational Analysis of Polypeptides. 111. An Electrostatically Driven Monte Carlo Method-Tests on Enkephalin. 192. D. R. Ripoll and H. A. Scheraga, Biopolymers, 30, 165 (1990). On the Multiple-Minima Problem in the Conformational Analysis of Polypeptides. IV. Application of the Electrostatically Driven Monte Carlo Method to the 20-Residue Membrane-Bound Portion of Melittin. 193. D. R. Ripoll, M. J. Visquez, and H. A. Scheraga, Biopolymers, 31, 319 (1991). The Electrostatically Driven Monte Carlo Method: Application to Conformational Analysis of Decaglycine. 194. G. H. Paine and H. A. Scheraga, Biopolymers, 24, 1391 (1985). Prediction of the Native Conformation of a Polypeptide by a Statistical-Mechanical Procedure. 1. Backbone Structure of Enkephalin. 195. 
G. H. Paine and H. A. Scheraga, Biopolymers, 25, 1547 (1986). Prediction of the Native Conformation of a Polypeptide by a Statistical-Mechanical Procedure. 11. Average Backbone Structure of Enkephalin. 196. G. H. Paine and H. A. Scheraga, Biopolymers, 26, 1125 (1987). Prediction of the Native Conformation of a Polypeptide by a Statistical-Mechanical Procedure. 111. Probable and Average Conformations of Enkephalin. 197. S. Kirkpatrick, C. D. Gelatt, Jr., and M. P.Vecchi, Science, 220,671 (1983). Optimization by Simulated Annealing. 198. D. Vanderbilt and S. G. Louie,J. Comput. Phys., 56,259 (1984).A Monte Carlo Simulated Annealing Approach to Optimization over Continuous Variables. 199. E. 0. Purisima and H. A. Scheraga, Proc. Nutl. Acud. Sci. USA, 83, 2782 (1986). An Approach to the Multiple-Minima Problem by Relaxing Dimensionality. 200. E. 0. Purisima and H. A. Scheraga, 1. Mol. Biol., 196, 697 (1987). An Approach to the Multiple-Minima Problem in Protein Folding by Relaxing Dimensionality. Tests on Enkephalin. 201. M. J. Sippl and H. A. Scheraga, Proc. Nutl. Acud. Sci. USA, 83, 2283 (1986). CayleyMenger Coordinates. 202. G. M. Crippen, J. Comput. Chem., 5, 548 (1984). Conformational Analysis by Scaled Energy Embedding. 203. W. Braun, C. Bosch, L. R. Brown. N. Go, and K. Wuthrich, Biochim. Biophys. Actu, 667, 377 (1981). Combined Use of Proton-Proton Overhauser Enhancements and a Distance Geometry Algorithm for Determination of Polypeptide Conformations. 204. L. Piela, J. Kostrowicki, and H. A. Scheraga,]. Phys. Chem., 93,3339 (1989).The Multipleminima Problem in the Conformational Analysis of Molecules. Deformation of the Potential Energy Hypersurface by the Diffusion Equation Method.
140 Predicting Three-Dimensional Structures of Oligopeptides 205. J. Kostrowicki, L. Piela, B. J. Cherayil, and H. A. Scheraga,]. Phys. Chem., 95,4113 (1991). Performance of the Diffusion Equation Method in Searches for Optimum Structures of Clusters of Lennard-Jones Atoms. 206. J. Kostrowicki and H. A. Scheraga, ]. Phys. Chem., in press. Application of the Diffusion Equation Method for Global Optimization to Oligopeptides. 207. L. D. Landau and E. M. Lifshitz, Quantum Mechanics, Chap. VII, Pergamon Press, 1959. 208. K. A. Olszewski, L. Piela, and H. A. Scheraga, J. Phys. Chem. 96,4672 (1992). Mean-Field Theory as a Tool for Intramolecular Conformational Optimization. 1. Tests on TerminallyBlocked Alanine and Met-enkephalin. 209. R. L. Somorjai,]. Phys. Chem., 95,4141 (1991).Novel Approach for Computing the Global Minimum of Proteins. 1. General Concepts, Methods, and Approximations. 210. M. Sylvain and R. L. Somorjai, ]. Phys. Chem. 95, 4147 (1991). Novel Approach for Computing the Global Minimum of Proteins. 2. One-Dimensional Test Cases. 211. L. D. Landau and E. M. Lifshitz, Quantum Mechanics, pp. 232-235, Pergamon Press, 1959. 212. S. R. Wilson, W. Cui, J. W. Moskowitz, and K. E. Schmidt, Tetrahedron Lett., 29, 4373 (1988). Conformational Analysis of Flexible Molecules: Location of the Global Minimum Energy Conformation by the Simulated Annealing Method. 213. J. W. Moskowitz, K. E. Schmidt, S. R. Wilson, and W. Cui, Znt. ]. Quantum Chem.: Quantum Chem. Symp., 22, 611 (1988).The Application of Simulated Annealing to Problems of Molecular Mechanics. 214. A. T. Brunger, J. Mol. Biol., 203, 803 (1988). Crystallographic Refinement by Simulated Annealing. Application to a 2.8 A Resolution Structure of Aspartate Aminotransferase. 215. M. Nilges, A. M. Gronenborn, A. T. Brunger, and G. M. Clore, Protein Engin., 2,27 (1988). Determination of Three-Dimensional Structures of Proteins by Simulated Annealing with Interproton Distance Restraints. 
Application to Crambin, Potato Carboxypeptidase Inhibitor, and Barley Serine Proteinase Inhibitor 2. 216. H. Kawai, T. Kikuchi, and Y. Okamoto, Protein Engin., 3, 85 (1989). A Prediction of Tertiary Structures of Peptide by the Monte Carlo Simulated Annealing Method. 217. C. Wilson and S. Doniach, Proteins:-Struct. Funct. Genet., 6 , 193 (1989). A Computer Model to Dynamically Simulate Protein Folding: Studies with Crambin. 218. P. Auffinger and G. Wipff, J. Comput. Chem., 11, 19 (1990). High Temperature Annealed Molecular Dynamics Simulations as a Tool for Conformational Sampling. Application to the Bicyclic "222" Cryptand. 219. S. R. Wilson and W. Cui, Eiopolymers, 29,225 (1990).Applications of Simulated Annealing to Peptides. 220. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing, p. 326, Cambridge University Press, Cambridge, 1986. 221. A. Nayeem, J. Vila, and H. A. Scheraga, ]. Comput. Chem., 12,594 (1991).A Comparative Study of the Simulated-Annealing and Monte Carlo-with-Minimization Approaches to the Minimum-Energy Structures of Polypeptides: [Met]-enkephalin. 222. S. A. Fossey, G. NCmethy, K. D. Gibson, and H. A. Scheraga, Biopolymers, 31, 1529 (1991). Conformational Energy Studies of p-Sheets of Model Silk Fibroin Peptides. 1. Sheets of Poly(Ala-Gly) Chains. 223. M. R. Pincus and H. A. Scheraga, Acc. Chem. Res., 14,299 (1981).Theoretical Calculations on Enzyme-Substrate Complexes: The Basis of Molecular Recognition and Catalysis. 224. H. A. Scheraga, M. R. Pincus, and K. E. Burke, in Structure of Complexes between Biopolymers and Low Molecular Weight Molecules, W. Bartmann and G. Snatzke, Eds., pp. 53-76, John Wiley & Sons, Chichester, 1983. Calculations of Structures of EnzymeSubstrate Complexes. 225. J. Deisenhofer and W. Steigemann, Acta Crystallogr. Sect. E, 31, 238 (1975). Crystallographic Refinement of the Structure of Bovine Pancreatic Trypsin Inhibitor at 1.S A Resolution.
References 1 41 226. M. Vrisquez and H. A. Scheraga, 1. Biomol. Stmct. Dyn., 5 , 705 (1988). Calculation of Protein Conformation by the Build-up Procedure. Application to Bovine Pancreatic Trypsin Inhibitor Using Limited Simulated Nuclear Magnetic Resonance Data. 227. M. Vasquez and H. A. Scheraga, J. Biomol. S t r u t . Dyn., 5 , 757 (1988). Variable-TargetFunction and Build-up Procedures for the Calculation of Protein Conformation. Application to Bovine Pancreatic Trypsin Inhibitor Using Limited Simulated Nuclear Magnetic Resonance Data. 228. H. A. Scheraga, Pure Appl. Chem., 36,1 (1973). On the Dominance of Short-Range Interactions in Polypeptides and Proteins. 229. P. N. Lewis, F. A. Momany, and H. A. Scheraga, Proc. Natl. Acad. Sci., USA, 68, 2293 (1971). Folding of Polypeptide Chains in Proteins: A Proposed Mechanism for Folding. 230. P. N. Lewis, F. A. Momany, and H. A. Scheraga, Biochim. Biophys. Acta, 303,211 (1973). Chain Reversals in Proteins. 231. H. Wako, N. SaitB, and H. A. Scheraga,]. Protein Chem., 2,221 (1983).Statistical Mechanical Treatment of a-Helices and Extended Structures in Proteins with Inclusion of Short- and Medium-Range Interactions. 232. M. Vrisquez, M. R. Pincus, and H. A. Scheraga, Biopolymers, 26, 351 (1987). Helix-Coil Transition Theory Including Long-Range Electrostatic Interactions: Application to Globular Proteins. 233. P. J. Milburn, Y. Konishi, Y. C. Meinwald, and H. A. Scheraga, /. Am. Chem. SOC.,109, 4486 (1987). Erratum: ibid., 109, 8123 (1987).Chain Reversals in Model Peptides: Studies of Cystine-Containing Cyclic Peptides. 1. Conformational Free Energies of Cyclization of Hexapeptides of Sequence Ac-Cys-X-Pro-Gly-Y-Cys-NHMe. 234. P. J. Milburn, Y.C. Meinwald, S. Takahashi, T. Ooi, and H. A. Scheraga. Int. ],Pept. Protein Res., 31, 311 (1988). Erratum: ibid., 31, 587 (1988).Chain Reversals in Model Peptides: Studies of Cystine-Containing Cyclic Peptides. 11. 
Effects of Valyl Residues and Possible i-to-(i + 3) Attractive Ionic Interactions on Cyclization of [Cys'], [Cys6] Hexapeptides. 235. C. M. Falcomer, Y.C. Meinwald, I. Choudhary, S. Talluri, P. J. Milburn, J. Clardy, and H. A. Scheraga,]. Am. Chem. SOC., 114,4036 (1992).Chain Reversals in Model Peptides: Studies of Cystine-Containing Cyclic Peptides. 3. Conformational Free Energies of Cyclization of Tetrapeptides of Sequence Ac-Cys-Pro-X-Cys-NHMe. 236. T. Kikuchi, G. NCmethy, and H. A. %heraga,/. Protein Chem., 7,427 (1988). Prediction of the Location of Structural Domains in Globular Proteins. 237. T. Kikuchi, G. NCmethy, and H. A. Scheraga, J. Protein Chem., 7,473 (1988).Prediction of the Packing Arrangement of Strands in @-Sheetsof Globular Proteins. 238. T. Kikuchi, G. NCmethy, and H. A. Scheraga, 1.Protein Chem., 7,491 (1988).Prediction of Probable Pathways of Folding in Globular Proteins. 239. S. Tanaka and H. A. Scheraga, Macromolecules, 10, 291 (1977). Hypothesis about the Mechanism of Protein Folding. 240. S. Fitzwater and H. A. Scheraga, Proc. Nutl. Acud. Sci. USA, 79,2133 (1982). CombinedInformation Protein Structure Refinement: Potential Energy-Constrained Real-Space Method for Refinement with Limited Diffraction Data. 241. G. T. Montelione, K. Wuthrich, A. W. Burgess, E. C. Nice, G. Wagner, K. D. Gibson, and H. A. Scheraga, Biochemistry, 31, 236 (1992). Solution Structure of Murine Epidermal Growth Factor Determined by NMR Spectroscopy and Refined by Energy Minimization with Restraints. 242. P. K. Warme, F. A. Momany, S. V. Rumball, R. W. Tuttle, and H. A. Scheraga, Biochemistry, 13, 768 (1974). Computation of Structures of Homologous Proteins; a-Lactalbumin from Lysozyme. 243. K. R. Acharya, D. I. Stuart, N. P. C. Walker, M. Lewis, and D. C. Phillips, 1.Mol. Biol., 208, 99 (1989).Refined Structure of Baboon a-Lactalbumin at 1.7 A Resolution. 244. K. R. Acharya, D. I. Stuart, D. C. Phillips, and H. A. Scheraga, J. Protein Chem., 9, 549 (1990). 
A Critical Evaluation of the Predicted and X-ray Structures of a-Lactalbumin.
142 Predicting Three-Dimensional Structures of Oligopeptides 245. M. H. Lambert and H. A. Scheraga, J. Comput. Chem., 10,770 (1989). Pattern Recognition in the Prediction of Protein Structure. 1. Tripeptide Conformational Probabilities Calculated from the Amino Acid Sequence. 246. M. H. Lambert and H. A. Scheraga, J. Comput. Chem., 10,798 (1989). Pattern Recognition in the Prediction of Protein Structure. 11. Chain Conformation from a Probability-Directed Search Procedure. 247. M. H. Lambert and H. A. Scheraga,]. Comput. Chem., 10,817 (1989). Pattern Recognition in the Prediction of Protein Structure. 111. An Importance-Sampling Minimization Procedure. 248. F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer, Jr., M. D. Brice, J. R. Rodgers, 0. Kennard, T. Shimanouchi, and M. Tasumi,]. Mol. Biol., 112,535 (1977).The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures. 249. A. Kidera, Y. Konishi, M. Oka, T. Ooi, and H. A. Scheraga, ]. Protein Chem., 4,23 (1985). Statistical Analysis of the Physical Properties of the 20 Naturally Occurring Amino Acids. 250. T. L. Blundell, J. E. Pitts, 1. J. Tickle, S. P. Wood, and C. W. Wu,Proc. Natl. Acad. Sci. USA, 78,4175 (1981). X-ray Analysis (1.4-A Resolution) of Avian Pancreatic Polypeptide: Small Globular Protein Hormone. 251. I. Glover, 1. Haneef, J. Pitts, S. Wood, D. Moss, I. Tickle, and T. Blundell, Biopolymers. 22, 293 (1983).Conformational Flexibility in a Small Globular Hormone: X-ray Analysis of Avian Pancreatic Polypeptide at 0.98-A Resolution.
CHAPTER 3
Molecular Modeling Using Nuclear Magnetic Resonance Data

Andrew E. Torda and Wilfred F. van Gunsteren
Physical Chemistry, ETH Zentrum, CH 8092, Zurich, Switzerland
Reviews in Computational Chemistry, Volume III. Kenny B. Lipkowitz and Donald B. Boyd, Editors. VCH Publishers, Inc., New York, © 1992.

INTRODUCTION

Scope and Definitions

Molecular modeling can be a vague term with different connotations to different people. These can range from idle molecular doodling to large-scale ab initio calculations. Modeling based on nuclear magnetic resonance (NMR) data should be less vague, and in this chapter we discuss two aspects. First, calculations based on experimental data imply some (mathematical) model for the data. These models are discussed in the section Modeling of Experimental Data. Second, given a set of approximate atomic Cartesian coordinates, how can those coordinates be adjusted so that the molecular model best agrees with the experimental data and known chemical and energetic properties? This is discussed in the section Refinement, Minimization, and Dynamics.

This limited definition has the following implications. NMR techniques themselves are not discussed, except to the extent that one must understand the nature of the data to model them. Similarly, the methods for generating initial structures are treated only superficially, to the extent needed to understand their limitations and the possible errors that might be propagated through a molecular refinement. Finally, examples in this chapter are strongly weighted toward protein studies, but most of the discussion is equally applicable to other areas such as the ever-expanding field of oligonucleotide investigations.
Historical Perspective

Historically, X-ray crystallography was the only method for determining three-dimensional structures of macromolecules at atomic resolution. The rise of NMR as a rival technique resulted from a combination of NMR techniques on the experimental side and corresponding algorithmic developments on the theoretical side.1 Even before the common use of Fourier transform NMR, it was known that the homonuclear nuclear Overhauser enhancement (NOE) could be used to identify pairs of protons separated by a few angstroms in space.2 It was also clear, however, that many such distances and accurate assignments to specific protons would be needed to define a solution structure fully. The advent of two-dimensional NMR was a major step toward providing the necessary data,3,4 but the introduction of the two-dimensional NOE measurement (NOESY)5,6 truly revolutionized the ability both to assign spectral peaks and to acquire large quantities of distance information.

At the same time, it must have been seen that despite the progress of NMR, the method would probably not provide enough information to overdetermine a structure as in X-ray crystallography, even after including known covalent restraints, such as bond lengths and angles. Furthermore, the information would not necessarily be precise or accurate. There was (and still is) no known analytical method for producing Cartesian coordinates given this limited information. This prompted advances in the field known as “distance geometry,”7 described in the next section. Nuclear magnetic resonance methods continued to improve and provide larger quantities of more precise data, and alternative approaches to distance geometry were proposed,8 but the methods suffered from a lack of realistic atomic interactions. This led to a new treatment of the problem of generating coordinates.
Instead of regarding the problem solely in geometric terms, the distance information could be cast in the form of an energy term and completely integrated into a full molecular force field.9 This led to the development of “restrained molecular dynamics.” The field of modeling based on NMR data has continually progressed since the early 1980s, but this cursory historical view puts the rest of this article in perspective. It is still the case that NMR data serve not to define a single structure, but rather to delimit a region of conformational space allowed to the molecule. Because it is hard to deal with the concept of a region of space, most methods simply attempt to generate a set of individual structures that collectively give an idea of the range of allowed conformations. Another feature, common to all approaches, is that of the penalty function. In general, one
wants to define some function whose value increases as the system moves away from some experimentally measured quantity. In distance geometry calculations, this is simply called the penalty function, whereas in molecular force field approaches, it is called an energy or pseudo-energy term. The nature of the penalty function is discussed in detail in the section Modeling of Experimental Data. Finally, the improvements in the quality and quantity of NMR data have been such that they have shown weaknesses in the original, simple methods for modeling the data. This problem, along with more sophisticated models, is discussed later.
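As a minimal sketch of the idea of a penalty function, the following assumes a flat-bottomed quadratic form in which a distance is penalized only when it falls outside a lower and an upper bound. The functional form, the name `distance_penalty`, the restraint tuple layout, and the force constant `k` are illustrative assumptions, not the forms prescribed in this chapter; those are the subject of the section Modeling of Experimental Data.

```python
import numpy as np

def distance_penalty(coords, restraints, k=1.0):
    """Sum of flat-bottomed quadratic penalties over distance restraints.

    coords     : (n_atoms, 3) array of Cartesian coordinates
    restraints : iterable of (i, j, lower, upper) tuples (hypothetical layout)
    k          : force constant (assumed value, arbitrary units)
    """
    coords = np.asarray(coords, dtype=float)
    total = 0.0
    for i, j, lower, upper in restraints:
        d = np.linalg.norm(coords[i] - coords[j])
        if d > upper:
            # Too far apart: penalty grows with the violation of the upper bound.
            total += 0.5 * k * (d - upper) ** 2
        elif d < lower:
            # Too close: penalty grows with the violation of the lower bound.
            total += 0.5 * k * (lower - d) ** 2
    return total

# Two points 5 Å apart with bounds of 1.8 to 4.0 Å (made-up numbers).
example_coords = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
example_penalty = distance_penalty(example_coords, [(0, 1, 1.8, 4.0)])
```

The value is zero anywhere inside the bounds and rises as the structure moves away from them, which is the only property the text requires of such a function.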
MOLECULAR REPRESENTATION

Before considering the generation of structures, it is necessary to consider the level of detail wanted, what is available from NMR, and what kind of representation will be used for a molecule. For example, some of the earliest calculations, predating adequate NMR data, were based on a representation of a protein by one point per residue.10 In protein X-ray crystallography, diffraction caused by protons is never observed, so a typical protein data bank structure has atomic detail, but only for heavy atoms. In contrast, NMR-derived data, especially of larger molecules, consist mostly of information about protons, although this may change as isotope labeling techniques become more widely used. Clearly it would not be adequate for NMR spectroscopists to generate lists of proton coordinates, so a typical NMR-based structure also has atomic detail, but with all heavy atoms and protons.

NMR data have the additional property that there are many instances when a resonance is known to arise from one of two sites, but it is not known which one. For example, a residue may contain a pair of methylene protons that give rise to distinct resonances, but the peaks have not been stereospecifically assigned. Similarly, there will be sets of protons that interchange positions rapidly on the NMR time scale, so only a single peak is observed. For example, methyl groups rotate rapidly, so although there may be distance restraints relating to the group, these distance restraints relate to an average of the three methyl protons. For such cases, Wüthrich, Billeter, and Braun introduced the concept of “pseudo-atoms” into the molecular representation.11 This means that at such sites, an atom is introduced at the geometric center of the indistinguishable protons and distance restraints are referred to that point. As a consequence, a correction must be added corresponding to the distance from one of the real protons to the pseudo-atom.
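The pseudo-atom construction described above can be sketched as follows. The example places a pseudo-atom at the geometric center of three methyl protons and loosens an upper distance bound by a proton-to-center distance. The coordinates, the helper names, and the choice of the largest proton-to-center distance as the correction are assumptions made for illustration only.

```python
import numpy as np

def pseudo_atom(protons):
    """Return the geometric center of a set of indistinguishable protons."""
    return np.mean(np.asarray(protons, dtype=float), axis=0)

def corrected_upper_bound(upper, protons):
    """Loosen an upper bound that is referred to the pseudo-atom.

    The correction is taken as the largest proton-to-pseudo-atom distance,
    which is an assumed (conservative) choice for this sketch.
    """
    protons = np.asarray(protons, dtype=float)
    center = pseudo_atom(protons)
    correction = np.max(np.linalg.norm(protons - center, axis=1))
    return upper + correction

# Idealized methyl protons on a circle of radius 1.0 Å (made-up geometry).
angles = np.radians([0.0, 120.0, 240.0])
methyl = np.column_stack([np.cos(angles), np.sin(angles), np.zeros(3)])

center = pseudo_atom(methyl)                    # lies at the origin
loosened = corrected_upper_bound(4.0, methyl)   # 4.0 Å bound plus 1.0 Å correction
```

The restraint is then evaluated against `center` rather than against any single proton, at the cost of a looser bound.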
A second kind of extra atom is also sometimes introduced as a result of a limitation of certain molecular force fields (see Refinement, Minimization, and Dynamics). The field of molecular simulations grew up independently from that of NMR structural studies and developed its own level of molecular detail. Many molecular force fields use the “united atom” representation of a molecule, in which all heavy atoms are present but only polar hydrogens are treated explicitly.12-14 An NMR restraint, however, may well refer to an aliphatic proton not explicitly treated. To handle this situation, van Gunsteren et al. applied what was called the virtual atom.15 With this method, one can take the force that one would want to impose on the proton and redistribute it onto the atoms present. For example, one may have a potential V defined in terms of the coordinates $\mathbf{r}_{\mathrm{H}}$ of a proton H. The force as a function of that proton’s coordinates would be written as follows:

$$\mathbf{f}_{\mathrm{H}} = -\frac{\partial V}{\partial \mathbf{r}_{\mathrm{H}}}$$
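Now, as the proton is not present, the force must be written down in terms of the atoms present. So, if the proton’s coordinates can be expressed in terms of those of adjacent carbon atoms, C1, C2, and so on, the force on carbon C1 could be calculated with the chain rule, writing $\mathbf{r}_{\mathrm{H}}$ and $\mathbf{r}_{\mathrm{C1}}$ for the positions of the proton and of C1:

$$\mathbf{f}_{\mathrm{C1}} = -\frac{\partial V}{\partial \mathbf{r}_{\mathrm{H}}}\,\frac{\partial \mathbf{r}_{\mathrm{H}}}{\partial \mathbf{r}_{\mathrm{C1}}}$$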
This computational device has become less frequently used in recent years as all-atom force fields have become more popular, but leaving out aliphatic protons does have the advantage of roughly halving the number of particles in an in vacuo simulation and thus significantly reducing computational times.
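The redistribution of a force from a virtual proton onto the atoms that define it can be sketched numerically. In this example the placement rule `place_proton`, the harmonic restraint, the bond length, and all coordinates are hypothetical choices made for illustration; they are not the construction used by van Gunsteren et al. The Jacobian of the proton position is obtained by central differences, so that the chain rule can be applied without deriving analytical expressions.

```python
import numpy as np

def place_proton(c1, c2, c3, bond=1.09):
    """Hypothetical rule: put H on C1, pointing away from the midpoint of C2 and C3."""
    direction = c1 - 0.5 * (c2 + c3)
    return c1 + bond * direction / np.linalg.norm(direction)

def restraint_force_on_proton(r_h, r_x, d0, k=1.0):
    """Force on H from V = 0.5*k*(|r_h - r_x| - d0)**2 (assumed harmonic form)."""
    diff = r_h - r_x
    d = np.linalg.norm(diff)
    return -k * (d - d0) * diff / d

def jacobian(func, carbons, index, h=1e-6):
    """Central-difference Jacobian d r_H / d r_C[index], shape (3, 3)."""
    jac = np.zeros((3, 3))
    for b in range(3):
        plus = [c.copy() for c in carbons]
        minus = [c.copy() for c in carbons]
        plus[index][b] += h
        minus[index][b] -= h
        jac[:, b] = (func(*plus) - func(*minus)) / (2.0 * h)
    return jac

def redistribute(f_h, carbons):
    """Chain rule: f_Ci = J_i^T f_H for each carbon that defines the proton."""
    return [jacobian(place_proton, carbons, i).T @ f_h
            for i in range(len(carbons))]

carbons = [np.array([0.0, 0.0, 0.0]),
           np.array([-0.9, 1.2, 0.0]),
           np.array([-0.9, -1.2, 0.3])]
r_x = np.array([4.0, 1.0, 0.5])      # partner proton of the restraint (made up)

r_h = place_proton(*carbons)         # virtual proton position
f_h = restraint_force_on_proton(r_h, r_x, d0=3.0)
forces = redistribute(f_h, carbons)  # forces on C1, C2, C3
```

Because this placement rule moves the proton rigidly when all three carbons are translated together, the redistributed forces sum to the force that would have acted on the proton, which provides a convenient consistency check.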
GENERATING INITIAL STRUCTURES

The generation of initial atomic coordinates is a distinct topic in its own right16-18 and is treated here only to the extent necessary to understand its implications for later modeling steps. Here we briefly describe methods that are considered distance geometry techniques in that they treat the problem purely in terms of generating coordinates consistent with the distance information and covalent properties of a molecule. Single algorithms that are capable of both generating initial structures and readily incorporating energetic terms are discussed later under Refinement, Minimization, and Dynamics.
Metric Matrix Method

The earliest methods for generating Cartesian coordinates from distance information were reliable only in the case of complete and precise distances.19 A more robust method was proposed by Crippen,20 subsequently revised and comprehensively described7,21 and dubbed the embed algorithm.22 The method can be understood by first considering the case where every interpoint distance is known before introducing the approximations necessary to handle real NMR data. First, a matrix D can be constructed containing the distance $d_{ij}$ between every pair of the n points. Next, the distance from every point to the center of mass, indicated by the subscript 0, can be calculated from

\[ d_{i0}^{2} = \frac{1}{n}\sum_{j=1}^{n} d_{ij}^{2} - \frac{1}{n^{2}}\sum_{j=2}^{n}\sum_{k=1}^{j-1} d_{jk}^{2} \qquad [3] \]
The metric matrix G was then defined as the matrix where each element $g_{ij}$ is the dot product of the two vectors from the center of mass to the points i and j. From the cosine rule, each element can be calculated:

\[ g_{ij} = \tfrac{1}{2}\left( d_{i0}^{2} + d_{j0}^{2} - d_{ij}^{2} \right) \qquad [4] \]
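Crippen and Havel then pointed out that this matrix G would have at most three eigenvalues greater than zero: $\lambda_1$, $\lambda_2$, $\lambda_3$. If a matrix W were constructed using the corresponding eigenvectors as its columns, the Cartesian coordinates $(x_i, y_i, z_i)$ could be calculated as

\[ x_i = \lambda_1^{1/2} w_{i1}, \qquad y_i = \lambda_2^{1/2} w_{i2}, \qquad z_i = \lambda_3^{1/2} w_{i3} \qquad [5] \]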
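For the idealized case of a complete and exact distance matrix, the embedding just described can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions rather than the published embed implementation: the function names, the six-point test system, and the random seed are hypothetical, and the center of mass is taken as unweighted.

```python
import numpy as np

def embed(dist):
    """Return n x 3 coordinates reproducing a complete, exact distance matrix."""
    n = dist.shape[0]
    d2 = dist ** 2
    # Squared distance from each point to the (unweighted) center of mass.
    d0_sq = d2.sum(axis=1) / n - np.triu(d2, k=1).sum() / n ** 2
    # Metric matrix from the cosine rule: dot products of center-of-mass vectors.
    g = 0.5 * (d0_sq[:, None] + d0_sq[None, :] - d2)
    eigval, eigvec = np.linalg.eigh(g)      # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:3]      # indices of the three largest
    # Scale each selected eigenvector by the square root of its eigenvalue.
    return eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))

def distance_matrix(coords):
    """All pairwise Euclidean distances between rows of coords."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

rng = np.random.default_rng(0)
true_coords = rng.normal(size=(6, 3))       # hypothetical six-point "molecule"
dist = distance_matrix(true_coords)
recovered = embed(dist)

# A mirror image has exactly the same distance matrix as the original.
mirror = true_coords * np.array([-1.0, 1.0, 1.0])
```

The recovered coordinates reproduce every input distance, but only up to rotation, translation, and reflection. The last two lines make the reflection ambiguity explicit: distances alone cannot distinguish a structure from its mirror image.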
Unfortunately, this method cannot be implemented as described. First, the distance matrix D is neither complete nor accurate. Second, a metric matrix, G, calculated from such a matrix may not have only three positive eigenvalues. These problems were also addressed by Crippen and Havel.21 To try to get a reasonable starting matrix D, one first builds a matrix L of lower distance bounds and a corresponding matrix U of upper bounds. Both matrices should contain any experimental distances as well as any covalently determined distances. In cases such as bond lengths, elements $l_{ij}$ may nearly equal $u_{ij}$, but in the case of undetermined distances between points covalently far from each other, $l_{ij}$ may be the sum of the van der Waals radii, whereas $u_{ij}$ will be some large number. To produce a reasonable U matrix, a process known as triangle smoothing is applied to every triplet of points in turn. So, for three points, i, j, k, the distance $d_{ij}$ may be unknown, but its upper bound $u_{ij}$ can be no greater than the sum of the upper bounds $u_{ik}$ and $u_{jk}$. Formally, this is just stating that

\[ u_{ij} \le u_{ik} + u_{jk} \qquad [6] \]
Intuitively, this can be seen by considering a chain of linked points. After iterating through the upper bounds matrix, there will be an upper bound on every distance corresponding to the number of links between each pair of points. A corresponding “inverse triangle inequality” can be applied to each triplet to raise values in the lower bound matrix L. Now, a distance matrix D, usually referred to as the trial distance matrix, can be constructed by simply choosing elements $d_{ij}$ randomly between $u_{ij}$ and $l_{ij}$ and used to construct a metric matrix G. A matrix so constructed might be some approximation to the distances in the real molecule, but probably not a very good one. Clearly, every time an element $d_{ij}$ is selected, it puts limits on subsequently selected distances. This problem of “correlated distances” is discussed further in the section Systematic Errors and Bias.

Because of the way the trial matrix is generated, it may not actually correspond to any three-dimensional structure and may be a distance matrix representing an M-dimensional set of coordinates. If this is the case, then the matrix may have M positive eigenvalues, and truncation after the three largest eigenvalues of G will be rather a poor approximation. This means that the generated three-dimensional coordinates will require some form of refinement (discussed in the section Refinement, Minimization, and Dynamics).

A more serious problem with the distance matrix approach is that it does not naturally include chirality of molecules. For example, both L and D isomers of an amino acid have exactly the same distance matrix. This problem was not treated by modifying the embed algorithm as such. Instead, the refinement process was modified so as to include a term in the penalty function whose value increased according to the chirality violation at each chiral center.23

The embed algorithm was for many years the method of choice and was used for several early solution structures.24-26
This was due partly to an elegant and widely available implementation in the program DISGEO.27 This program also introduced the idea of “substructure embedding,” where a subset of atoms would be initially selected for embedding. For a protein, this subset might consist of carbonyl carbons, amide nitrogens, and α-carbon protons to represent the backbone. Side chains might be represented by β carbons, nonterminal γ carbons, and a point in the middle of aromatic rings. Distances could then be extracted from the embedded substructures, relaxed somewhat, and put back into a new complete distance matrix D, thus acting as restraints on a second all-atom embedding.

Since the mid-1980s, several suggestions have been made regarding the details of the embed algorithm’s implementation. Some of these were inspired by perceived deficiencies in the sampling properties of the method. These ideas are discussed in the section Systematic Errors and Biases. One change to the algorithm itself has been to avoid the initial approximation of the distance matrix by three-dimensional coordinates. Instead of taking only the three largest eigenvalues for generating coordinates, it is reasonable to take the four (or possibly more) largest eigenvalues and generate four-dimensional coordinates from the corresponding eigenvectors. These can then be reduced by minimizing against a penalty function that includes a term depending on the molecule’s extent in the fourth dimension.18,28

Another algorithm attributed to Crippen, linearized embedding, does actually involve the creation of a trial metric matrix, but is otherwise very different from the standard embed algorithm.29 Its main virtue is the incorporation of covalent restraints, chirality, and ring planarity at a more fundamental level than the original embed algorithm. Unfortunately, there does not yet seem to be much experience with the method.
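The bound-smoothing step that precedes embedding can be sketched as follows. This is a minimal Python illustration, assuming a Floyd-Warshall style sweep for the upper bounds and a single pass of the inverse triangle inequality for the lower bounds, which may not reach the tightest possible lower bounds. The four-point chain, the numerical bounds, and all names are hypothetical.

```python
import random

def smooth_bounds(lower, upper):
    """Tighten upper bounds by the triangle inequality, then raise lower bounds."""
    n = len(upper)
    u = [row[:] for row in upper]
    l = [row[:] for row in lower]
    # Upper bounds: u_ij can be no greater than u_ik + u_kj for any third point k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if u[i][j] > u[i][k] + u[k][j]:
                    u[i][j] = u[i][k] + u[k][j]
    # Lower bounds: if i is at least l_ik from k and j is at most u_kj from k,
    # then i and j must be at least l_ik - u_kj apart (and symmetrically).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                l[i][j] = max(l[i][j], l[i][k] - u[k][j], l[j][k] - u[k][i])
    return l, u

def set_bound(lower, upper, i, j, lo, hi):
    """Store a symmetric pair of bounds."""
    lower[i][j] = lower[j][i] = lo
    upper[i][j] = upper[j][i] = hi

n = 4
lower = [[0.0 if i == j else 2.0 for j in range(n)] for i in range(n)]    # contact distance
upper = [[0.0 if i == j else 100.0 for j in range(n)] for i in range(n)]  # "some large number"
for i in range(n - 1):
    set_bound(lower, upper, i, i + 1, 1.5, 1.5)   # bonded neighbors in a chain
set_bound(lower, upper, 0, 3, 4.0, 4.0)           # one assumed exact long-range distance

l_s, u_s = smooth_bounds(lower, upper)

# Trial distance matrix: each element drawn uniformly between its bounds.
rng = random.Random(1)
trial = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        trial[i][j] = trial[j][i] = rng.uniform(l_s[i][j], u_s[i][j])
```

In this small example the unknown 1-3 upper bounds fall from 100 to 3.0 (two links of 1.5), and the 4.0 long-range distance raises the corresponding lower bounds from 2.0 to 2.5. Drawing each trial distance independently, as done here, is exactly what gives rise to the correlated-distance problem mentioned in the text.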
Variable Target Function Method

Despite the widespread use of the metric matrix method, some drawbacks were apparent in early implementations. First, if a system has n points, then there will be n(n - 1)/2 interpoint distances, and manipulating matrices of this size requires large amounts of memory (real or virtual). Second, correct chirality was never really integrated into the algorithm proper, but was rather applied as a correction during the refinement procedure. To avoid these problems entirely, Braun and Gō developed a very different procedure known as the variable target function method.8 Rather than use a nonlinear optimization procedure as a means for refining coordinates generated by some other algorithm, they used an optimization procedure from the outset, but with a changing penalty function so as to attempt to avoid local minima. The basic intention of this procedure was to start with something like a random or extended chain conformation for the molecule and first consider only short-range (with respect to the covalent structure) interactions. A penalty function with respect to NMR distance information and van der Waals overlap would then be minimized. Gradually, longer-range interactions would be included and the minimization repeated at each step. In practice, this might mean initially considering only interactions between adjacent residues, that is, every residue i and i + 1. At the next step, interactions between residues 2 apart in the sequence would be considered (i to i + 1 and i + 2). Next, interactions from each i to i + 3 would be included and so on. More formally, they defined the target (penalty) function $T_{k,l}$, where k determines which atom pairs (i,j) are to be considered for NMR restraints and l serves the same role for van der Waals overlaps. When k = 1, the set of interacting atoms includes pairs of atoms on adjacent residues. When k = 2, the set also includes atoms on residues 2 apart
in the sequence and so on. The penalty function was then defined in terms of the distances $d_{ij}$ between atoms, upper bounds $u_{ij}$, lower bounds $l_{ij}$, and the repulsive radius $s_i$ of the atom:

\[ T_{k,l} = \sum_{(i,j)} \rho_k(i,j)\,\theta(d_{ij} - u_{ij})\,(d_{ij}^{2} - u_{ij}^{2})^{2} + \sum_{(i,j)} \sigma_k(i,j)\,\theta(l_{ij} - d_{ij})\,(l_{ij}^{2} - d_{ij}^{2})^{2} + w \sum_{(i,j)} \tau_l(i,j)\,\theta(s_i + s_j - d_{ij})\,\left((s_i + s_j)^{2} - d_{ij}^{2}\right)^{2} \qquad [7] \]
The first and second terms represent violation of upper and lower bounds, respectively. The third term, independently weighted by an arbitrary w, represents violations of van der Waals radii. Step functions θ, ρ, σ, and τ are then applied to turn each term on or off as appropriate. Thus, if a bound is violated, θ = 1; otherwise it is zero. To define the subsets, ρ(i,j) = 1 if i,j are in the subset of upper bounds limited by k; otherwise it is zero. Similarly, if the relevant atoms are within k residues of each other, σ(i,j) = 1. If the atoms are within l residues of each other, τ(i,j) = 1; otherwise it too is zero.

Aside from the idea of a variable target function, Braun and Gō also chose to implement the algorithm in terms of dihedral angles rather than Cartesian coordinates.8 This had several important effects. First, the representation of the molecule is very compact. For example, the residue glycine, represented as 6 points, would contain 15 distances, but in terms of dihedral angles, it can be completely represented by just two variables. Second, because covalent distances are fixed, final structures produced by this method must have good covalent geometry. This is not the case with metric matrix method structures. Finally, the variable target function method naturally includes correct chirality and can easily incorporate experimental restraints on dihedral angles.
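The way the target function grows as k and l are increased can be sketched in Python. The sketch below is a hedged illustration only: it evaluates the three kinds of terms in Cartesian coordinates for brevity, whereas the method itself works in dihedral angles, and it treats a pair as selected when its residue separation does not exceed k (or l). The function name, data layout, and the three-atom example are hypothetical.

```python
import math

def target_function(coords, residue, upper, lower, radii, k, l, w=1.0):
    """Sum of squared bound violations for pairs selected by sequence separation."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            separation = abs(residue[i] - residue[j])
            if separation <= k:
                u = upper.get((i, j))
                if u is not None and d > u:          # upper bound violated
                    total += (d * d - u * u) ** 2
                lo = lower.get((i, j))
                if lo is not None and d < lo:        # lower bound violated
                    total += (lo * lo - d * d) ** 2
            if separation <= l:
                s = radii[i] + radii[j]
                if d < s:                            # repulsive spheres overlap
                    total += w * (s * s - d * d) ** 2
    return total

# Hypothetical example: three atoms on residues 1, 2, and 4.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
residue = [1, 2, 4]
upper = {(0, 1): 3.0, (0, 2): 5.0}
lower = {}
radii = [1.0, 1.0, 1.0]

short_range = target_function(coords, residue, upper, lower, radii, k=1, l=1)
long_range = target_function(coords, residue, upper, lower, radii, k=3, l=3)
```

With k = 1 only the restraint between adjacent residues contributes, (16 - 9)^2 = 49. Raising k to 3 brings in the long-range restraint as well, adding (100 - 25)^2 = 5625, which is the sense in which the target is "variable" during the calculation.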
Probably the main disadvantage of the variable target function is its alleged lack of convergence, meaning that a rather large number of starting structures may be needed to produce a reasonable number of refined structures.30 This problem is apparently worse for “β-proteins” with more complicated topologies.31 After a demonstration of the method’s abilities on simulated data,8 the algorithm was soon applied to several cases with real data such as metallothionein,32 tendamistat,33 and basic pancreatic trypsin inhibitor (BPTI).34 Furthermore, aside from the original program, DISMAN,8 the basic algorithm has been implemented in other programs such as DADAS35 and the very highly optimized DIANA.36 Most recently, Güntert and Wüthrich have introduced the idea of calculating a set of initial structures that are used only to identify locally acceptable segments and calculate additional dihedral angle restraints. These additional restraints are redundant, but increase the rate of successful convergence when introduced into subsequent complete DIANA calculations.31
Other Methods for Generating Initial Structures

Distance geometry methods are certainly the most attractive means for generating initial structures in that they are now widely implemented, are relatively well understood, and reduce the tendency for an investigator to introduce biases into the structures. Some other methods should, however, be briefly mentioned. It is always possible to build a model of a molecule manually using interactive graphics either from scratch or based on a homologous molecule, but this method has not been used much in recent years by NMR spectroscopists. First, the sheer number of NMR restraints in a modern data set means that it will be technically difficult to produce a satisfactory structure. Second, the method relies on the ability of the investigator and will introduce any preconceptions of what the molecule should look like. Finally, the sampling properties of such an approach are almost nonexistent.

Rather than manual model building, it is possible to use a database approach to build a starting structure from homologous structural fragments taken from the crystallographic protein data bank.37 The main use of this technique is that it can be used during the earliest investigations of a molecule when adequate NMR data may not yet be available.

A potentially very powerful method, named PROFILE, with a completely different philosophical approach has been developed by Altman and Jardetzky.38,39 Conventional methods attempt to estimate the allowed conformational space by simply generating structures scattered through it. Altman and Jardetzky take the alternative view that one should find the most likely coordinate for each atom and then give a probability distribution for the likelihood of finding the atom in the surrounding region. Calculations on the cyclic peptide cyclosporin A40 have shown the method’s feasibility with real data and the kind of molecular representation it generates.
In principle, it is hard to dispute the validity of this approach, but it remains to be seen how well it can be coupled with the inclusion of energetic terms. Lastly, it should be mentioned that if a refinement method is very effective, the choice of a starting structure may be practically irrelevant. Some examples of this are listed under Refinement, Minimization, and Dynamics.
MODELING OF EXPERIMENTAL DATA

As described in the Introduction, it is usually possible to consider the modeling of experimental data separately from the scheme actually used to move atoms about. Ideally, the different models should be able to be used in the different minimization or dynamics schemes. Thus, the subsequent sections describe the kind of data offered by NMR and the kinds of penalty functions or pseudo-energy terms that can be used to represent them. For convenience, we use nomenclature common to force field-based approaches where one refers to a distance constraint potential $V_{dc}(r)$ as a function of internuclear distance.
Distance Restraints

Because the NOE is due to dipolar interactions, its intensity, a, depends on the inverse sixth power of the distance r between the protons. As any macromolecule will contain protons at fixed distances, one should be able to calculate an unknown distance by comparison of its corresponding NOE with the NOE measured between protons at some reference distance $r_{ref}$ from the simple relation

\[ r = r_{ref}\left( a_{ref} / a \right)^{1/6} \qquad [8] \]
This treatment ignores the fact that the reference distance and unknown distance may not be subject to the same motions2 and assumes that only pairwise interactions contribute to the measured intensities.41 Now, given that one can estimate distances in a molecule, there are several ways this can be turned into a penalty function. First, it is clear that if the distance r in the modeled system is less than an upper limit, $r_0$, the penalty term should be zero. Next, if r is greater than $r_0$, the energy should rise with the square of the violation. If the term is being used in some dynamics scheme, one may limit the rate at which the energy increases beyond some certain maximum violation, $\Delta r$. Combining these three regimes and adding an arbitrary scaling constant $K_{dc}$ gives the following form:YJ5

\[ V_{dc}(r) = \begin{cases} 0 & \text{if } r \le r_0 \\ \tfrac{1}{2} K_{dc}\,(r - r_0)^{2} & \text{if } r_0 < r < r_0 + \Delta r \\ K_{dc}\left( r - r_0 - \tfrac{1}{2}\Delta r \right)\Delta r & \text{if } r_0 + \Delta r \le r \end{cases} \qquad [9] \]
It is also clear that by reversing signs in Eq. [9], one can enforce a minimum distance between two particles. This may well be useful if one has accurate distance restraints or if one wishes to treat the absence of an NOE as indicative of some minimum distance.42 The choice of a quadratic term in Eq. [9] is really arbitrary and reflects a decision about the relative weighting of NMR distance information and other terms (energetic) that constitute the complete force field. Fry et al.43 used a quartic form for the penalty term for violated NOE restraints:
Continuing in the same vein, other workers have even used a sixth-power term to enforce distance restraints.44,45 In principle, there is no reason why one should favor Eq. [9] or [10], but it must be remembered that unless the scaling constant for this term is minuscule, a quartic or sixth-power term will very quickly rise so as to dominate other terms in a force field. This may be a danger if such methods are used during the early stages of refinement with poor models (large violations) and possibly less accurate NMR data.46,47 Rather than simply adjusting a power term, Scarsdale et al.47 noted that the pseudo-energy term was being based on a distance r, but what was measured was more directly related to $r^{-6}$. With this reasoning, they used a term of the form
As $r_0$ is constant for any particular distance restraint, the term $(r_0)^{-6}$ can be treated simply as a zero offset. Then, it can be seen that Eq. [11] better models the physical nature of the NOE than any of the previously mentioned forms. It has now been used in the determination of several solution structures.47,48
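The distance calibration of Eq. [8] and the three-regime restraint of Eq. [9] are simple enough to sketch directly. The Python fragment below is an illustration under stated assumptions: the function names are hypothetical, the intensities are in arbitrary units, and the force constant, upper limit, and switching distance are made-up values rather than recommended settings.

```python
def calibrated_distance(noe, noe_ref, r_ref):
    """Isolated spin-pair estimate: intensity scales as the inverse sixth power of r."""
    return r_ref * (noe_ref / noe) ** (1.0 / 6.0)

def distance_restraint_energy(r, r0, k_dc, delta_r):
    """Flat below r0, quadratic up to r0 + delta_r, then linear beyond that."""
    if r <= r0:
        return 0.0
    if r < r0 + delta_r:
        return 0.5 * k_dc * (r - r0) ** 2
    return k_dc * (r - r0 - 0.5 * delta_r) * delta_r

# A cross-peak 64 times weaker than the reference implies twice the distance.
r_estimate = calibrated_distance(noe=1.0, noe_ref=64.0, r_ref=1.8)

# Energies for increasing violation of a 3.0 upper limit (assumed units).
energies = [distance_restraint_energy(r, r0=3.0, k_dc=10.0, delta_r=1.0)
            for r in (2.0, 3.5, 4.0, 5.0)]
```

The quadratic and linear branches meet with the same value and slope at r0 + delta_r, which is why a poor starting model with very large violations produces bounded forces rather than ones that swamp the rest of the force field.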
Averaging over Discrete Conformations

In the preceding section, we did not mention that the measured NOE is an ensemble average. Often this fact is regarded as unimportant and one simply refers to generated structures as the average solution structure. In certain cases this viewpoint is totally inappropriate. Consider, for example, the case of a molecule moving between two quite distinct conformations. If the interconversion is fast on the NMR time scale, only average peaks will be seen and resulting NOEs will reflect both conformations simultaneously.49 In the case of small peptides with relatively well-characterized conformations, this kind of phenomenon has been clearly observed.50,51 With this kind of motion, an expression such as Eq. [9] is a totally inadequate model for the NMR data. To address this point, Scarsdale et al.47 used a two-state model to build a penalty function. Two noninteracting copies of the oligosaccharide head group of globoside were simulated at once, and the penalty function was based on the weighted average of the two molecules. This weighting was, in turn, based on
conformational energies of the two molecules. The approach was subsequently applied to the acyl carrier protein,52 with the relative weighting of conformers determined by a least-squares fitting and starting conformations taken from a previous refinement. As expected, linear combinations of structures were better able to reproduce experimental data without incurring an energetic penalty. This approach is certainly better than trying to model data by a single average conformation, but it seems to be of limited applicability. It is only in small molecules that one is likely to have a small number of well-characterized conformations and the choice of initial conformations seems quite arbitrary.
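A short numerical sketch shows why a single average structure can be misleading when two conformations interconvert rapidly. It assumes, for illustration only, that the observed NOE corresponds to a population-weighted average of the inverse sixth power of the distance; the populations, distances, and function name are hypothetical.

```python
def apparent_distance(distances, populations):
    """Distance implied by a population-weighted average of r**-6."""
    mean_inv_sixth = sum(p * r ** -6 for r, p in zip(distances, populations))
    return mean_inv_sixth ** (-1.0 / 6.0)

# Two equally populated conformers with a proton pair 2.5 and 6.0 apart.
distances = [2.5, 6.0]
populations = [0.5, 0.5]

r_apparent = apparent_distance(distances, populations)
r_arithmetic = sum(p * r for r, p in zip(distances, populations))
```

The apparent distance is close to 2.8, far shorter than the arithmetic mean of 4.25, because the close-contact conformer dominates the average. A single structure forced to satisfy this restraint, together with others dominated by the second conformer, may correspond to neither state.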
Time-Averaged Distance Restraints

A large flexible molecule is constantly shifting between many states, each of which contributes to measured average NOEs. Although one may sometimes be able to identify individual conformations, it is more correct to regard the conformational space of such a system as a continuum. In this case, a single conformation model as in Eq. [9] or [10] is inadequate, as is even a two-state model. To address this problem, one should not try to force each instantaneous distance r in the molecule to agree with experimental restraints, but rather base the penalty function on some average distance, $\bar{r}(t)$, which is now written as a function of time. Ideally, this average should be a true ensemble average, but failing that, it can at least be an average over time, as in a molecular dynamics trajectory. Such a model was implemented and tested on a model system53 and then revised and shown to work well on a polypeptide with real data.54 The first step in implementing this model is to define $\bar{r}(t)$ suitable for use in a penalty function. Naively, one could write

\[ \bar{r}(t) = \langle r^{-6} \rangle^{-1/6} \qquad [12] \]
where r is the scalar distance and the angle brackets denote an average over time rather than over a true ensemble. This would account for the sixth-power averaging of dipolar interactions, but in a short-time approximation, third-power averaging is more appropriate,55 so one should modify Eq. [12] and write

\[ \bar{r}(t) = \langle r^{-3} \rangle^{-1/3} \qquad [13] \]

Then, this could be cast into a form suitable for averaging over a molecular dynamics trajectory by integrating over time t:

\[ \bar{r}(t) = \left( \frac{1}{t} \int_{0}^{t} r(t')^{-3}\, dt' \right)^{-1/3} \qquad [14] \]
In practice, however, Eq. [14] would not be suitable for an energy term because the rate of change of $\bar{r}(t)$ would depend on how long the averaging had been carried out. To avoid this problem, a memory function was built in with a decay constant of τ, and the final form used for the averaging was

\[ \bar{r}(t) = \left( \frac{1}{\tau} \int_{0}^{t} e^{-t'/\tau} \left[ r(t - t') \right]^{-3} dt' \right)^{-1/3} \qquad [15] \]

Finally, the derivative of Eq. [15] has fourth-power terms with respect to $\bar{r}(t)/r(t)$, so it was not used to define an energy term as such. Instead, a force was defined in terms of $\bar{r}(t)$:

\[ F_i(t) = \begin{cases} -K_{dc}\left( \bar{r}(t) - r_0 \right) \dfrac{\partial r(t)}{\partial \mathbf{r}_i} & \text{if } \bar{r}(t) > r_0 \\ 0 & \text{if } \bar{r}(t) \le r_0 \end{cases} \qquad [16] \]
The use of this model for experimental distance information in a small protein was found to allow increased mobility over the course of a molecular dynamics trajectory while still satisfying the distance restraints.54 It also allowed the use of a smaller force constant with respect to the energetic terms in the force field and, by increasing mobility, improved the sampling properties of the refinement procedure. The use of time-averaged restraints was more extensively tested with a long oligonucleotide simulation, using simulated experimental restraints extracted from an unrestrained simulation.56 The authors found that the simulations with time-averaged restraints better reproduced simulation details such as sequence-dependent properties than did conventional refinements. One drawback of this method is that it requires the use of an extra parameter, τ, the time constant for the decay of the memory function. In practice, this should not be a problem as long as τ is longer than the period of the longest motions that are important for NOE averaging. A limitation of the method is that, in contrast to the two-state model of Scarsdale et al.,47 the procedure can only average over conformations that can interchange during the short time of a molecular dynamics simulation.
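A discrete version of the memory-weighted average is easy to sketch and shows why time averaging permits more mobility. The Python fragment below is an assumption-laden illustration, not the published implementation: the exponential memory is applied as a running update at each time step, the class and function names are hypothetical, and the oscillating distance simply alternates between two made-up values.

```python
import math

class TimeAveragedDistance:
    """Running, exponentially weighted average of r**-3 with memory time tau."""

    def __init__(self, tau, dt):
        self.decay = math.exp(-dt / tau)    # weight kept by the older history
        self.mean_inv_cube = None

    def update(self, r):
        """Add the current distance and return the averaged distance."""
        value = r ** -3
        if self.mean_inv_cube is None:
            self.mean_inv_cube = value
        else:
            self.mean_inv_cube = (self.decay * self.mean_inv_cube
                                  + (1.0 - self.decay) * value)
        return self.mean_inv_cube ** (-1.0 / 3.0)

def restraint_force_magnitude(r_bar, r0, k_dc):
    """Restoring force is applied only when the averaged distance violates r0."""
    return -k_dc * (r_bar - r0) if r_bar > r0 else 0.0

# A distance that alternates between 3 and 5, restrained to at most 4.
averager = TimeAveragedDistance(tau=50.0, dt=1.0)
r_bar = None
for step in range(1000):
    r_bar = averager.update(3.0 if step % 2 == 0 else 5.0)

force_on_average = restraint_force_magnitude(r_bar, r0=4.0, k_dc=10.0)
force_on_instant = restraint_force_magnitude(5.0, r0=4.0, k_dc=10.0)
```

The averaged distance settles near 3.54, so no force is applied even though the instantaneous distance exceeds the bound half of the time; a conventional restraint acting on the instantaneous value of 5 would pull the atoms together at every second step.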
Direct NOE Refinement

It has been known for a long time that the kind of simplistic distance calibration suggested by Eq. [8] may be subject to systematic errors. First, the intensity of an NOE depends on the spectral density function for the reorientation of the vector between relaxing nuclei. This means that Eq. [8] is valid only if the reference distance and unknown distance are undergoing the same motions. As this is not likely to be the case, distance calibrations have attempted to allow for the possibility of systematic errors.23 Equation [8] also assumes that the dipolar relaxation can be considered in terms of isolated spins relaxing each other. In the presence of spin diffusion, this will lead to a systematic underestimation of distances.41,57,58
The earliest attempts to account for the effect of spin diffusion relied on some iterative scheme wherein a full relaxation matrix analysis would be used to calculate distances from NOEs, and these distances could be used in a standard refinement procedure.59-64 Clearly a more elegant approach would be to base an energy term on the directly measured NOE intensity. This approach was first implemented by Yip and Case,65 who started with the well-known relationship between the NOE intensity matrix, $A(\tau_m)$, relaxation matrix, R, and NOESY mixing time, $\tau_m$:

\[ A(\tau_m) = \exp(-\tau_m R)\, A(0) \qquad [17] \]
They then managed to take analytical derivatives of $\exp(-\tau_m R)$ to build a penalty function based on the observed NOEs. This was, in principle, a distinct improvement over previous models for NOE data, but it was extremely computationally expensive. This situation has led to a series of attempts to find approximations to the method of Yip and Case that would allow the method to be computationally more tractable. Baleja et al.66 used a complete relaxation matrix analysis, but took two steps to reduce calculation time. First, they used a cutoff of 8 Å when considering multispin effects, and, second, they used numerical rather than analytical derivatives. Nilges et al. used a very complete analytical treatment, but showed that the calculations would be tractable if a cutoff of 4 Å were used, apparently without introducing too great an approximation.67 Applying the method to squash trypsin inhibitor, they found that atomic shifts of only ≈1 Å greatly improved agreement with experimental data and that the previous isolated spin pair approximation had caused a slight systematic contraction of the structures. Bonvin et al.68 also used analytical gradients, but adopted an approximation to Eq. [17]. Taking a Taylor expansion of the exponential term, they kept only the first two terms, giving for the derivative the form

\[ \nabla A(\tau_m) = \nabla (1 - \tau_m R)\, A(0) = -\tau_m \nabla R \qquad [18] \]
where ∇ has its conventional meaning as a vector of partial derivatives (the nabla operator). More recently, they have extended this procedure to work with cross-peaks from three-dimensional NMR spectra.69 Lastly, Mertz et al.70 used a form that was close to the complete analytical form used by Yip and Case, but took a different approach to making the calculations computationally feasible. Their computational efficiency was achieved by noting the large number of numerically negligible contributions to relaxation matrix calculations and applying a filter so as to include only the dominant terms. Even with this improvement, they report that their complete relaxation matrix calculations were 100 times slower than conventional distance geometry calculations. The exact form of the penalty function they used was also unusual in that it was expressed in terms of dihedral angles, so it could be implemented in the program DIANA.36

All of these direct NOE penalty functions have several features in common. First, they are all, in principle, more correct than refinement against calculated distances. Unfortunately, they vary from moderately to extremely costly in terms of CPU time. Each implementation also raises the question of the nature of the spectral density function used to model the molecule’s motions. In each case, except that of Baleja et al.,66 isotropic motion was assumed, although all the methods could be readily adapted to use more elaborate spectral density functions. As experience with these methods accumulates, it should become clear when the extra computational time is worth investing and what the effects of different motional models are.

In practice, these methods offer more than just reliable interpretation of distances. Because the procedure can be used to model NOESY cross-peaks with contributions from spin diffusion, they suggest that cross-peaks from spectra with long mixing times, and thus better signal-to-noise, can be reliably used. As Nilges et al.67 point out, a cross-peak may arise entirely via indirect magnetization transfer, but may contain useful structural information.
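The effect of spin diffusion on cross-peak intensities can be seen in a small numerical sketch of the relation A(τm) = exp(-τm R) A(0) with A(0) taken as the identity. The model below is deliberately crude and its details are assumptions: pairwise rates are set proportional to the inverse sixth power of the distance with an arbitrary constant, a uniform leakage rate is added to the diagonal, no spectral density function is evaluated, and the three-spin geometry and all names are hypothetical.

```python
import numpy as np

def relaxation_matrix(coords, k=244.140625, leak=0.5):
    """Toy symmetric relaxation matrix with rates proportional to r**-6."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # gives zero self-rates below
    rate = k * d ** -6.0
    r = -rate                               # off-diagonal cross-relaxation terms
    np.fill_diagonal(r, rate.sum(axis=1) + leak)
    return r

def noe_intensities(r, tau_m):
    """Matrix exponential exp(-tau_m * R), via eigendecomposition of symmetric R."""
    eigval, eigvec = np.linalg.eigh(r)
    return (eigvec * np.exp(-tau_m * eigval)) @ eigvec.T

# Three collinear spins: neighbors 2.5 apart, the outer pair 5.0 apart.
coords = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [5.0, 0.0, 0.0]])
R = relaxation_matrix(coords)

full_short = noe_intensities(R, 0.001)
first_order_short = np.eye(3) - 0.001 * R   # two-term Taylor expansion

full_long = noe_intensities(R, 0.3)
first_order_long = np.eye(3) - 0.3 * R
```

At very short mixing times the two-term expansion agrees with the full exponential. At the longer mixing time the cross-peak between the two outer spins is several times larger than the first-order, isolated-pair value, because most of the magnetization arrives by way of the middle spin; interpreting that intensity with Eq. [8] would make the outer pair appear too close.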
Dihedral Angle Restraints

Well before homonuclear coupling constants could be measured in proteins, Karplus71,72 proposed the relationship between the coupling constant J and the dihedral angle θ between vicinal protons:

\[ J(\theta) = A\cos^{2}\theta + B\cos\theta + C \qquad [19] \]
where A, B, and C are constants calibrated by fitting measured J values from molecules whose angles are known from crystal structures.73,74 Because of the periodic nature of the cosine function, certain ranges of J correspond to more than one value for θ from Eq. [19]. Fortunately, large values of J give rise to only one θ, and where more than one θ value is possible, it may be the case that only one is viable on steric grounds. Given restraints on dihedral angles, a penalty function $V_{dlr}(\theta)$, weighted by an arbitrary force constant $K_{dlr}$, was first constructed by analogy with torsional angle potentials in molecular force fields42:
where the phase angle is chosen so that the cosine term goes to -1 at the desired angle. A different form for the penalty function has also been used, it being quadratic with respect to the size of the angle violation:75,76

\[ V_{dlr}(\theta) = \begin{cases} K_{dlr}\,(\theta - \theta_l)^{2} & \text{if } \theta < \theta_l \\ 0 & \text{if } \theta_l \le \theta \le \theta_u \\ K_{dlr}\,(\theta - \theta_u)^{2} & \text{if } \theta > \theta_u \end{cases} \qquad [21] \]

where $\theta_l$ and $\theta_u$ are lower and upper bounds on the dihedral angle, respectively. There is probably no reason to favor either of the forms given in Eqs. [20] and [21], but an improvement has been suggested by Kim and Prestegard.77 As was the reasoning with distance restraint terms, one should not base a penalty function on a derived quantity, the dihedral angle θ. Instead, one should use an expression of the form

\[ V_{J}(\theta) = K_{J}\left( J(\theta) - J_{obs} \right)^{2} \qquad [22] \]
where J(θ) was calculated from a recast version of Eq. [19]. This has the disadvantage that multiple values of θ will give a minimum for the potential energy, but the authors felt that steric restrictions and information from distance restraints should serve to drive the system toward the physically correct solution. Kim and Prestegard also introduced a two-state model for dihedral angles, analogous to their previous work47,52 with distance restraints. As expected, this two-state model produced the best agreement with experimental data and the structures with the lowest potential energy.
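The multivalued nature of the relationship between J and the dihedral angle, and a penalty written directly in terms of J, can be sketched as follows. The constants used (A = 6.4, B = -1.4, C = 1.9 Hz) are illustrative values of the kind reported for amide proton to α-proton couplings and should be treated as placeholders; the function names, tolerance, and the squared-difference form of the penalty are assumptions made for this sketch.

```python
import math

A, B, C = 6.4, -1.4, 1.9    # illustrative Karplus constants in Hz (assumed)

def karplus(theta_deg, a=A, b=B, c=C):
    """Coupling constant predicted for a dihedral angle given in degrees."""
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t ** 2 + b * cos_t + c

def compatible_angles(j_obs, tol=0.2):
    """Whole-degree angles in [-180, 180) whose predicted J is within tol of j_obs."""
    return [t for t in range(-180, 180) if abs(karplus(t) - j_obs) <= tol]

def j_penalty(theta_deg, j_obs, k_j=1.0):
    """Penalty on the measured quantity J rather than on a derived angle."""
    return k_j * (karplus(theta_deg) - j_obs) ** 2

large_j = compatible_angles(9.5)    # expected to cluster around 180 degrees
small_j = compatible_angles(5.0)    # expected to fall in several separate regions
```

With these constants a large coupling of 9.5 Hz is compatible only with angles close to 180 degrees, whereas 5.0 Hz is matched near ±36 and ±127 degrees. A penalty such as `j_penalty` therefore has several minima in the second case, which is the disadvantage noted in the text.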
REFINEMENT, MINIMIZATION, AND DYNAMICS

As described in the preceding section, it is generally possible to separate a penalty function from the method used to actually move the atoms’ coordinates. In this section, we consider a number of such schemes that are either in common use for modeling of NMR data or appear promising for the future. We also concentrate on schemes that are less likely to fall into local minima with respect to the energy/pseudo-energy hypersurface. This usually implies that the method can take steps uphill as well as downhill, although some of the methods discussed under Other Derivative-Based Schemes do not fit into this category.
Molecular Dynamics

Historically, the field of molecular dynamics (MD) evolved as a means of simulating the behavior of molecules in an attempt to reproduce and, hopefully, to predict structural, dynamic, and thermodynamic properties of molecules.78-80 In recent years, however, it has become popular as a means for refining molecular structures with respect to experimental data.81 This relies on the fact that if any system is simulated, it will tend to run downhill energetically as long as excess heat is removed by some means.82 If experimental constraints are cast in the form of an energy term, the system will tend to move downhill with respect to both real and pseudo-energy terms. Conceptually, the MD algorithm is simple. One starts with Newton’s equations of motion relating force ($F_i$), mass ($m_i$), and acceleration ($\ddot{x}_i$):

\[ F_i = m_i \ddot{x}_i \qquad [23] \]
Since the mass of each particle is known and the force is given by the negative of the derivative of the potential with respect to coordinates, the acceleration can be calculated for every particle independently for any configuration. By numerical integration, one can then get velocities and new coordinates for each time step. This method of refinement was first applied, using the distance restraint form of Eq. [9], to the lac repressor headpiece using a model as the initial structure.9,83 It was later applied to a heptadecapeptide.84 Subsequent studies with model data showed that, with a carefully chosen simulation protocol, the method was capable of refining even poor starting structures.85,86

Since this early work, experience has accumulated with restrained MD and some points can be made with respect to the effects of restrained MD and simulation protocol. First, if the starting structures have already been very highly refined with respect to distances, and if the distance restraints are very numerous and accurate, then the restrained MD protocol will not make much difference. If the starting structures are less well refined or the restraints less numerous, then restrained MD will be important, as will the choice of protocol. If substantial structural changes are required, then the simulation should be run at high temperature and gradually cooled down as in simulated annealing;87 in the context of dynamics this is often referred to as dynamical simulated annealing.88 The power of this method was well demonstrated by Nilges et al.,89 who managed to refine structures starting from a random array of atoms. Because of the force field used in this study, this method is discussed further in the section Force Fields.
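The idea that a simulated system runs downhill on the combined real and pseudo-energy surface when excess heat is removed can be illustrated with a one-dimensional toy integration. Everything in this Python sketch is an assumption made for illustration: a single interparticle distance is propagated with a leapfrog-style update, the "force field" is a weak harmonic well, the restraint is a half-harmonic term, and heat is removed by simply scaling the velocity at every step.

```python
def restraint_force(r, r0, k_dc):
    """Force from a half-harmonic distance restraint (zero when r <= r0)."""
    return -k_dc * (r - r0) if r > r0 else 0.0

def physical_force(r, r_eq=6.0, k_phys=1.0):
    """Stand-in for the real force field: a weak harmonic well centered at r_eq."""
    return -k_phys * (r - r_eq)

def run_restrained_md(r=8.0, v=0.0, mass=1.0, dt=0.01, steps=5000,
                      r0=4.0, k_dc=10.0, damping=0.99):
    """Integrate Newton's equation for one coordinate with velocity damping."""
    for _ in range(steps):
        force = physical_force(r) + restraint_force(r, r0, k_dc)
        v += force / mass * dt      # acceleration from force and mass
        v *= damping                # crude removal of excess kinetic energy
        r += v * dt                 # new coordinate from the updated velocity
    return r

final_r = run_restrained_md()
balance_point = 46.0 / 11.0         # where the two forces cancel for these constants
```

The coordinate settles at about 4.18, between the restraint limit of 4 and the force-field minimum of 6, at the point where the two forces balance. The relative size of the two force constants thus decides how closely the final structure follows the experimental restraint.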
111,112 First, one should look at how well defined the structures are. This reflects the quality of the NMR data. Second, one should look at how well the generated structures explain the NMR data. This is a reflection of the refinement procedure. With respect to the first criterion, Havel109 discussed the various measures of dissimilarity between structures. The most common estimate of the spread of a group of structures is the root-mean-square difference of Cartesian coordinates between individual structures or between individual structures and a set of mean coordinates. This figure is usually qualified by being calculated over backbone atoms, all atoms, or atoms within a range of what are regarded as well-defined residues. The root-mean-square difference of backbone dihedrals is another possible indicator,113 but can be misleading in that dihedral angles can compensate for the effects of one another, leading to a large dihedral angle difference but little difference in overall fold. Finally, Havel mentions the possibility of using the “distance matrix error,” which is a root-mean-square difference, but is calculated over elements of distance matrices. The only criticism of this measure is that it is insensitive to chiral errors. That is, both left- and right-handed versions of a structure will have identical distance matrices. The most sophisticated approach to quantifying a structure’s definition is the probabilistic approach introduced by Altman, Jardetzky, and coworkers.38,39,40 In principle, the probability density for each atom is a correct way to estimate the allowed volume, but calculating this quantity does involve some assumptions about the atom’s distribution in space. With respect to agreement with experimental data, the most obvious criterion is the size and number of violations of experimental data. Of course, this figure does not include variation caused by the nature of the data set.
For example, a small set of loose restraints with redundant information should always be less violated than a large set of restraints interpreted as tightly as possible. Ignoring variation resulting from data sets, a measure analogous to
the crystallographic R factor may become popular, although one could reasonably argue that it be based on distances, all information including dihedral angles, distances to the sixth power, or NOE intensities. For example, Baleja et al.66 used an R factor based on each of the i NOE intensities, observed and experimental,
whereas Gonzalez et al.114 used a more elaborate form including a weighting based on the NOESY mixing time $\tau_m$:
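$$R = \frac{\sum_{\tau_m} \sum_i \tau_m \left| \mathrm{NOE}^{\mathrm{calc}}_{\tau_m,i} - \mathrm{NOE}^{\mathrm{exp}}_{\tau_m,i} \right|}{\sum_{\tau_m} \sum_i \tau_m \, \mathrm{NOE}^{\mathrm{obs}}_{\tau_m,i}} \qquad [28]$$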
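As a rough illustration of the two styles of R factor mentioned here, the sketch below computes a plain intensity-based R factor and a mixing-time-weighted one. The function names and the data layout are hypothetical, and the sketch assumes that the "experimental" and "observed" intensities are the same measured values; it is not taken from either cited paper.

```python
def r_factor(measured, calculated):
    """Plain R factor: sum |measured - calculated| / sum measured."""
    numerator = sum(abs(m - c) for m, c in zip(measured, calculated))
    return numerator / sum(measured)

def weighted_r_factor(spectra):
    """Mixing-time-weighted R factor.

    spectra maps each mixing time tau_m to a pair of equally long lists,
    (measured intensities, calculated intensities). Each term is weighted
    by tau_m in both numerator and denominator.
    """
    numerator = 0.0
    denominator = 0.0
    for tau_m, (measured, calculated) in spectra.items():
        numerator += tau_m * sum(abs(c - m) for m, c in zip(measured, calculated))
        denominator += tau_m * sum(measured)
    return numerator / denominator
```

With a single mixing time the weighted form reduces to the plain one, since the common factor cancels; the weighting only matters when spectra at several mixing times are pooled.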
Lastly, generated structures should be judged by many of the criteria that X-ray crystallographers already apply to their structures. These include low potential energy, hydrogen-bonded donors and acceptors in the core of proteins, stereochemically reasonable Ramachandran plots, no regions of unexplained charge density in the protein core, and a clustering of hydrophobic residues within the core rather than on the surface of molecules.
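The spread measures discussed earlier in this section can be sketched in a few lines. This is an illustrative sketch only: the coordinate RMSD below assumes the two structures have already been superimposed (no fitting is done), and the function names are hypothetical. The `mirror` helper is included to show why the distance matrix error cannot detect chiral errors.

```python
import math

def coordinate_rmsd(a, b):
    """RMS difference of Cartesian coordinates.

    Assumes structures a and b are already superimposed; no rotation
    or translation fitting is attempted here.
    """
    squared = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return math.sqrt(squared / len(a))

def distance_matrix(coords):
    """Full matrix of interatomic distances."""
    def dist(u, v):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))
    return [[dist(u, v) for v in coords] for u in coords]

def distance_matrix_error(a, b):
    """RMS difference over the unique off-diagonal distance matrix elements."""
    da, db = distance_matrix(a), distance_matrix(b)
    n = len(a)
    squared = sum((da[i][j] - db[i][j]) ** 2
                  for i in range(n) for j in range(i + 1, n))
    return math.sqrt(squared / (n * (n - 1) / 2))

def mirror(coords):
    """Mirror image of a structure, obtained by reflecting x."""
    return [(-x, y, z) for x, y, z in coords]
```

For a four-atom structure with six different interatomic distances, such as `[(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]`, the mirror image cannot be rotated onto the original, yet `distance_matrix_error` between the two is zero because reflection preserves every distance.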
FUTURE DIRECTIONS
Since the earliest structures were published based on NMR data, the nature of the computational problem has actually changed. In the mid-1980s, it would be an experimental achievement to have around 10^2 distance restraints. More recently, structures have been published based on literally an order of magnitude more information.115 To some extent, this trend will continue as instrumentation improves, multidimensional NMR pulse techniques develop,4,116 isotopic labeling becomes more common, and the processing of raw NMR data manages to extract more information.117 Ideally, this would mean that the process of modeling based on NMR data could become completely automatic and issues such as sampling of conformational space would become less important as structural definition improves. In practice, what will also happen is that data will accumulate so that prior models become less adequate. Thus, for example, one can expect to see more examples where direct NOE refinement makes the isolated spin
pair approximation break down and more cases where high-quality NMR data suggest conformational heterogeneity rather than just conformational definition.54,102 From the point of view of modeling data, there are still some areas that are not well treated, especially systems where dynamics are important. Rather than concentrating on ever larger molecules, it is interesting to note that small noncyclic peptides are generally considered too flexible to be modeled well. Several molecular complexes have been studied by NMR,118-121 and aside from their biological importance, these may well turn up new challenges for the interpretation of experimental data. Finally, NMR-based structures will, in the foreseeable future, suffer from a poor ratio of data to degrees of freedom, when compared with X-ray crystallographic structures. This suggests that modeling based on NMR data will continue to rely on the quality of the force field used in refinements. Consequently, improvements in force fields will contribute to the quality of NMR structures as will the standard inclusion of solvents in refinement calculations.
ACKNOWLEDGMENT
We are most grateful to Dr. Tim Harvey for discussions and his opinions on this chapter.
REFERENCES
1. G. M. Clore and A. M. Gronenborn, CRC Crit. Rev. Biochem., 24, 479 (1989). Determination of the Three Dimensional Structures of Proteins and Nucleic Acids in Solution by Nuclear Magnetic Resonance Spectroscopy.
2. J. H. Noggle and R. E. Schirmer, The Nuclear Overhauser Effect, Academic Press, New York, 1971.
3. J. Jeener, in Proceedings of the Ampere International Summer School, Basko Polje, Yugoslavia, 1971.
4. R. R. Ernst, G. Bodenhausen, and A. Wokaun, Principles of Nuclear Magnetic Resonance in One and Two Dimensions, Clarendon Press, Oxford, 1986.
5. A. Kumar, R. R. Ernst, and K. Wüthrich, Biochem. Biophys. Res. Comm., 95, 1 (1980). A Two-Dimensional Nuclear Overhauser Enhancement (2D NOE) Experiment for the Elucidation of Complete Proton-Proton Cross-Relaxation Networks in Biological Macromolecules.
6. S. Macura and R. R. Ernst, Mol. Phys., 41, 95 (1980). Elucidation of Cross-Relaxation in Liquids by Two-Dimensional NMR Spectroscopy.
7. G. M. Crippen, in Chemometrics Research Studies Series, Vol. 1, D. Bawden, Ed., Research Studies Press (Wiley), New York, 1981. Distance Geometry and Conformational Calculations.
8. W. Braun and N. Gō, J. Mol. Biol., 186, 611 (1983). Calculation of Protein Conformations by Proton-Proton Distance Constraints. A New Efficient Algorithm.
9. W. F. van Gunsteren, R. Kaptein, and E. R. P. Zuiderweg, in Proceedings of the NATO/CECAM Workshop on Nucleic Acid Conformation and Dynamics, W. K. Olsen, Ed., Centre de Calcul Atomique et Moleculaire, Orsay, 1984, pp. 79-82. Use of Molecular Dynamics Computer Simulations When Determining Protein Structure by 2D-NMR.
10. T. F. Havel, G. M. Crippen, and I. D. Kuntz, Biopolymers, 18, 73 (1979). Effects of Distance Constraints on Macromolecular Conformation. II. Simulation of Experimental Results and Theoretical Predictions.
11. K. Wüthrich, M. Billeter, and W. Braun, J. Mol. Biol., 169, 949 (1983). Pseudo-Structures for the 20 Common Amino Acids for Use in Studies of Protein Conformations by Measurements of Intramolecular Proton-Proton Distance Constraints with Nuclear Magnetic Resonance.
12. L. G. Dunfield, A. W. Burgess, and H. A. Scheraga, J. Am. Chem. Soc., 82, 2609 (1978). Energy Parameters in Polypeptides. 8. Empirical Potential Energy Algorithm for the Conformation Analysis of Large Molecules.
13. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187 (1983). CHARMM: A Program for Macromolecular Energy, Minimization and Dynamics Calculations.
14. W. F. van Gunsteren and H. J. C. Berendsen, Biochem. Soc. Trans., 10, 301 (1982). Molecular Dynamics: Perspectives for Complex Systems.
15. W. F. van Gunsteren, R. Boelens, R. Kaptein, R. M. Scheek, and E. R. P. Zuiderweg, in Molecular Dynamics and Protein Structure, J. Hermans, Ed., Polycrystal Book Service, Western Springs, Illinois, 1985, pp. 92-99. An Improved Restrained Molecular Dynamics Technique to Obtain Protein Tertiary Structure from Nuclear Magnetic Resonance Data.
16. W. Braun, Q. Rev. Biophys., 19, 115 (1987). Distance Geometry and Related Methods for Protein Structure Determination from NMR Data.
17. G. M. Clore and A. M. Gronenborn, Protein Engin., 1, 275 (1987). Determination of Three-Dimensional Structures of Proteins in Solution by Nuclear Magnetic Resonance Spectroscopy.
18. T. F. Havel, Prog. Biophys. Mol. Biol., 56, 43 (1991). An Evaluation of Computational Strategies for Use in the Computational Determination of Protein Structure from Distance Constraints Obtained by Nuclear Magnetic Resonance.
19. L. M. Blumenthal, Theory and Applications of Distance Geometry, Chelsea, New York, 1970.
20. G. M. Crippen, J. Comput. Phys., 26, 449 (1977). A Novel Approach to the Calculation of Conformation: Distance Geometry.
21. G. M. Crippen and T. F. Havel, Distance Geometry and Molecular Conformation, Chemometrics Research Studies Series, D. Bawden, Ed., Wiley, New York, 1981.
22. T. F. Havel, I. D. Kuntz, and G. M. Crippen, Bull. Math. Biol., 45, 665 (1983). The Theory and Practice of Distance Geometry. See also A. R. Leach, Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, pp. 1-55. A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules.
23. W. Braun, C. Bosch, L. R. Brown, N. Gō, and K. Wüthrich, Biochim. Biophys. Acta, 667, 377 (1981). Combined Use of Proton-Proton Overhauser Enhancements and a Distance Geometry Algorithm for Determination of Polypeptide Conformations. Application to Micelle-Bound Glucagon.
24. W. Braun, G. Wider, H. K. Lee, and K. Wüthrich, J. Mol. Biol., 169, 921 (1983). Conformation of Glucagon in a Lipid-Water Interphase by 1H Nuclear Magnetic Resonance.
25. A. D. Arseniev, V. I. Kondakov, V. N. Maiorov, and V. F. Bystrov, FEBS Lett., 165, 57 (1984). NMR Solution Spatial Structure of Short Scorpion Insectotoxin I5A.
26. M. P. Williamson, T. F. Havel, and K. Wüthrich, J. Mol. Biol., 182, 295 (1985). Solution Conformation of Proteinase Inhibitor IIA from Bull Seminal Plasma by 1H Nuclear Magnetic Resonance and Distance Geometry.
27. T. F. Havel and K. Wüthrich, Bull. Math. Biol., 182, 673 (1985). A Distance Geometry Program for Determining the Structures of Small Proteins and Other Macromolecules from Nuclear Magnetic Resonance Measurements of Intramolecular 1H-1H Proximities in Solution.
28. R. Morrison and D. Hare, J. Mol. Biol., 204, 483 (1988). Determining Stereospecific 1H Nuclear Magnetic Resonance Assignments from Distance Geometry Calculations.
29. G. M. Crippen, J. Comput. Chem., 10, 896 (1989). Linearized Embedding: A New Metric Matrix Algorithm for Calculating Molecular Conformations Subject to Geometric Constraints.
30. A. D. Kline, W. Braun, and K. Wüthrich, J. Mol. Biol., 204, 675 (1988). Determination of the Complete Three-Dimensional Structure of the α-Amylase Inhibitor Tendamistat in Aqueous Solution by Nuclear Magnetic Resonance and Distance Geometry.
31. P. Güntert and K. Wüthrich, J. Biomol. NMR, 1, 447 (1991). Improved Efficiency of Protein Structure Calculations from NMR Data Using the Program DIANA with Redundant Dihedral Angle Constraints.
32. W. Braun, G. Wagner, E. Worgotter, M. Vasak, J. H. R. Kagi, and K. Wüthrich, J. Mol. Biol., 187, 125 (1986). Polypeptide Fold in the Two Metal Clusters of Metallothionein-2 by Nuclear Magnetic Resonance and Distance Geometry.
33. A. D. Kline, W. Braun, and K. Wüthrich, J. Mol. Biol., 189, 377 (1986). Studies by 1H Nuclear Magnetic Resonance and Distance Geometry of the Solution Conformation of Tendamistat, an α-Amylase Inhibitor.
34. G. Wagner, W. Braun, T. F. Havel, T. Schaumann, N. Gō, and K. Wüthrich, J. Mol. Biol., 196, 611 (1987). Protein Structures in Solution by Nuclear Magnetic Resonance and Distance Geometry: The Polypeptide Fold of the Basic Pancreatic Trypsin Inhibitor Determined Using Two Different Algorithms, DISGEO and DISMAN.
35. D. Kohda, N. Gō, K. Hayashi, and F. Inagaki, J. Biochem., 103, 741 (1988). Tertiary Structure of Mouse Epidermal Growth Factor Determined by Two-Dimensional 1H NMR.
36. P. Güntert, Y. Q. Qian, G. Otting, M. Müller, W. Gehring, and K. Wüthrich, J. Mol. Biol., 217, 531 (1991). Structure Determination of the Antp (C39 → S) Homeodomain from Nuclear Magnetic Resonance Data in Solution Using a Novel Strategy for the Structure Calculation with the Programs DIANA, CALIBA, HABAS, and GLOMSA.
37. P. J. Kraulis and T. A. Jones, Proteins, 2, 188 (1987). Determination of Three-Dimensional Protein Structures from Nuclear Magnetic Resonance Data Using Fragments of Known Structures.
38. R. B. Altman and O. Jardetzky, in Methods in Enzymology, Vol. 177, N. J. Oppenheimer and T. L. James, Eds., Academic Press, San Diego, Calif., 1989, pp. 218-246. Heuristic Refinement Method for Determination of Solution Structure of Proteins from Nuclear Magnetic Resonance Data.
39. R. Pachter, R. B. Altman, and O. Jardetzky, J. Magn. Reson., 89, 578 (1990). The Dependence of a Protein Solution Structure on the Quality of the Input NMR Data. Application of the Double-Iterated Kalman Filter Technique to Oxytocin.
40. R. Pachter, R. B. Altman, J. Czaplicki, and O. Jardetzky, J. Magn. Reson., 92, 468 (1991). Comparison of the NMR Solution Structures of Cyclosporin A Determined by Different Techniques.
41. A. Kalk and H. J. C. Berendsen, J. Magn. Reson., 24, 343 (1976). Proton Magnetic Relaxation and Spin Diffusion in Proteins.
42. J. de Vlieg, R. Boelens, R. M. Scheek, R. Kaptein, and W. F. van Gunsteren, Isr. J. Chem., 27, 181 (1986). Restrained Molecular Dynamics Procedure for Protein Tertiary Structure Determination from NMR Data: A lac Repressor Headpiece Structure Based on Information on J-Coupling and from Presence and Absence of NOEs.
43. D. C. Fry, V. S. Madison, D. R. Bolin, D. N. Greeley, V. Toome, and B. B. Wegrzynski, Biochemistry, 28, 2399 (1989). Solution Structure of an Analogue of Vasoactive Intestinal Peptide as Determined by Two-Dimensional NMR and Circular Dichroism Spectroscopies and Constrained Molecular Dynamics.
44. H. Widmer, M. Billeter, and K. Wüthrich, Proteins, 6, 357 (1989). Three-Dimensional Structure of the Neurotoxin ATX Ia from Anemonia sulcata in Aqueous Solution Determined by Nuclear Magnetic Resonance Spectroscopy.
45. M. Billeter, Th. Schaumann, W. Braun, and K. Wüthrich, Biopolymers, 29, 695 (1989). Restrained Energy Refinement with Two Different Algorithms and Force Fields of the Structure of the α-Amylase Inhibitor Tendamistat Determined by NMR in Solution.
46. T. A. Holak, J. H. Prestegard, and J. D. Forman, Biochemistry, 26, 4652 (1987). NMR-Pseudoenergy Approach to the Solution Structure of Acyl Carrier Protein.
47. J. N. Scarsdale, P. Ram, J. H. Prestegard, and R. K. Yu, J. Comput. Chem., 9, 133 (1988). A Molecular Mechanics-NMR Pseudoenergy Approach to the Solution Conformation of Glycolipids.
48. S. A. Woodson and D. M. Crothers, Biopolymers, 28, 1149 (1989). Conformation of a Bulge-Containing Oligomer from a Hot-Spot Sequence by NMR and Energy Minimization.
49. O. Jardetzky, Biochim. Biophys. Acta, 621, 227 (1980). On the Nature of Molecular Conformations Inferred from High-Resolution NMR.
50. H. Pepermans, D. Tourwe, G. van Binst, R. Boelens, R. M. Scheek, W. F. van Gunsteren, and R. Kaptein, Biopolymers, 27, 323 (1988). The Combined Use of NMR, Distance Geometry and Restrained Molecular Dynamics for the Conformational Study of a Cyclic Somatostatin Analogue.
51. H. Kessler, C. Griesinger, J. Lautz, A. Müller, and W. F. van Gunsteren, J. Am. Chem. Soc., 110, 3393 (1988). Conformational Dynamics Detected by Nuclear Magnetic Resonance NOE Values and J Coupling Constants.
52. Y. Kim and J. H. Prestegard, Biochemistry, 28, 8792 (1989). A Dynamic Model for the Structure of Acyl Carrier Protein in Solution.
53. A. E. Torda, R. M. Scheek, and W. F. van Gunsteren, Chem. Phys. Lett., 157, 289 (1989). Time-Dependent Distance Restraints in Molecular Dynamics Simulations.
54. A. E. Torda, R. M. Scheek, and W. F. van Gunsteren, J. Mol. Biol., 214, 223 (1990). Time-Averaged NOE Distance Restraints Applied to Tendamistat.
55. J. Tropp, J. Chem. Phys., 72, 6035 (1980). Dipolar Relaxation and Nuclear Overhauser Effects in Nonrigid Molecules: The Effect of Fluctuating Internuclear Distances.
56. D. A. Pearlman and P. A. Kollman, J. Mol. Biol., 220, 457 (1991). Are Time-Averaged Restraints Necessary for Nuclear Magnetic Resonance Refinement?
57. B. D. Sykes, W. E. Hull, and G. H. Snyder, Biophys. J., 21, 137 (1978). Experimental Evidence for the Role of Cross-relaxation in Proton Nuclear Magnetic Resonance Spin Lattice Relaxation Time Measurements in Proteins.
58. M. Madrid, J. E. Mace, and O. Jardetzky, J. Magn. Reson., 83, 267 (1989). Consequences of Magnetization Transfer on the Determination of Solution Structures of Proteins.
59. R. Boelens, T. M. G. Koning, and R. Kaptein, J. Mol. Struct., 173, 299 (1988). Determination of Biomolecular Structures from Proton-Proton NOE's Using a Relaxation Matrix Approach.
60. R. Boelens, T. M. G. Koning, G. A. van der Marel, J. H. van Boom, and R. Kaptein, J. Magn. Reson., 82, 290 (1989). Iterative Procedure for Structure Determination from Proton-Proton NOE's Using a Full Relaxation Matrix Approach. Applications to a DNA Octamer.
61. J. W. Keepers and T. L. James, J. Magn. Reson., 57, 404 (1984). A Theoretical Study of Distance Determinations from NMR. Two-Dimensional Nuclear Overhauser Effect Spectra.
62. B. A. Borgias and T. L. James, J. Magn. Reson., 87, 475 (1990). MARDIGRAS-A Procedure for Matrix Analysis of Relaxation for Discerning Geometry of an Aqueous Structure.
63. E. T. Olejniczak, R. T. Gampe, Jr., and S. W. Fesik, J. Magn. Reson., 67, 28 (1986). Accounting for Spin Diffusion in the Analysis of 2D NOE Data.
64. K. M. Banks, D. R. Hare, and B. R. Reid, Biochemistry, 28, 6996 (1989). Three-Dimensional Solution Structure of a DNA Duplex Containing the BclI Restriction Sequence: Two-Dimensional NMR Studies, Distance Geometry Calculations, and Refinement by Back-Calculation of the NOESY Spectrum.
65. P. Yip and D. A. Case, J. Magn. Reson., 83, 643 (1989). A New Method for Refinement of Macromolecular Structures Based on Nuclear Overhauser Effect Spectra.
66. J. D. Baleja, J. Moult, and B. D. Sykes, J. Magn. Reson., 87, 375 (1990). Distance Measurement and Structure Refinement with NMR Data.
67. M. Nilges, J. Habazettl, A. T. Brünger, and T. A. Holak, J. Mol. Biol., 219, 499 (1991). Relaxation Matrix Refinement of the Solution Structure of Squash Trypsin Inhibitor.
68. A. M. J. J. Bonvin, R. Boelens, and R. Kaptein, J. Biomol. NMR, in press. Direct NOE Refinement of Biomolecular Structures Using 2D NMR Data.
69. A. M. J. J. Bonvin, R. Boelens, and R. Kaptein, J. Magn. Reson., in press. Direct Structure Refinement using 3D NOE-NOE Spectra of Biomolecules.
70. J. E. Mertz, P. Güntert, K. Wüthrich, and W. Braun, J. Biomol. NMR, 1, 257 (1991). Complete Relaxation Matrix Refinement of NMR Structures of Proteins Using Analytically Calculated Dihedral Angle Derivatives of NOE Intensities.
71. M. Karplus, J. Chem. Phys., 30, 11 (1959). Contact Electron-Spin Coupling of Nuclear Magnetic Moments.
72. M. Karplus, J. Am. Chem. Soc., 85, 2870 (1963). Vicinal Proton Coupling in Nuclear Magnetic Resonance.
73. A. Pardi, M. Billeter, and K. Wüthrich, J. Mol. Biol., 180, 741 (1984). Calibration of the Angular Dependence of the Amide Proton-Cα Proton Coupling Constants, 3JHNα, in a Globular Protein.
74. S. Ludvigsen, K. V. Andersen, and F. M. Poulsen, J. Mol. Biol., 217, 731 (1991). Accurate Measurements of Coupling Constants from Two-Dimensional Nuclear Magnetic Resonance Spectra of Proteins and Determination of φ-Angles.
75. G. M. Clore, M. Nilges, D. K. Sukumaran, A. T. Brünger, M. Karplus, and A. M. Gronenborn, EMBO J., 5, 2729 (1986). The Three-Dimensional Structure of α-Purothionin in Solution: Combined Use of Nuclear Magnetic Resonance, Distance Geometry, and Restrained Molecular Dynamics.
76. G. M. Clore, D. K. Sukumaran, M. Nilges, and A. M. Gronenborn, Biochemistry, 26, 1732 (1987). Three-Dimensional Structure of Phoratoxin in Solution: Combined Use of Nuclear Magnetic Resonance, Distance Geometry, and Restrained Molecular Dynamics.
77. Y. Kim and J. H. Prestegard, Proteins, 8, 377 (1990). Refinement of the NMR Structures for Acyl Carrier Protein with Scalar Coupling Data.
78. J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1987.
79. C. L. Brooks III, M. Karplus, and B. M. Pettitt, Proteins: A Theoretical Perspective of Dynamics, Structure and Thermodynamics, Advances in Chemical Physics, Vol. 71, Wiley, New York, 1988.
80. W. F. van Gunsteren and H. J. C. Berendsen, Angew. Chem., 29, 992 (1990). Computer Simulation of Molecular Dynamics: Methodology, Applications, and Perspectives in Chemistry.
81. A. T. Brünger and M. Karplus, Acc. Chem. Res., 24, 54 (1991). Molecular Dynamics Simulations with Experimental Restraints.
82. H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R. Haak, J. Chem. Phys., 81, 3684 (1984). Molecular Dynamics with Coupling to a Heat Bath.
83. R. Kaptein, E. R. P. Zuiderweg, R. M. Scheek, R. Boelens, and W. F. van Gunsteren, J. Mol. Biol., 182, 179 (1985). A Protein Structure from Nuclear Magnetic Resonance Data: lac Repressor Headpiece.
84. G. M. Clore, A. M. Gronenborn, A. T. Brünger, and M. Karplus, J. Mol. Biol., 186, 435 (1985). Solution Conformation of a Heptadecapeptide Comprising the DNA Binding Helix F of the Cyclic AMP Receptor Protein of Escherichia coli. Combined Use of 1H Nuclear Magnetic Resonance and Restrained Molecular Dynamics.
85. G. M. Clore, A. T. Brünger, M. Karplus, and A. M. Gronenborn, J. Mol. Biol., 191, 523 (1986). Applications of Molecular Dynamics to Three-Dimensional Protein Structure Determination. A Model Study of Crambin.
86. A. T. Brünger, G. M. Clore, A. M. Gronenborn, and M. Karplus, Proc. Natl. Acad. Sci. USA, 83, 3801 (1986). Three-Dimensional Structure of Proteins Determined by Molecular Dynamics with Interproton Distance Restraints: Applications to Crambin.
87. S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, Science, 220, 671 (1983). Optimization by Simulated Annealing.
88. R. Car and M. Parrinello, Phys. Rev. Lett., 55, 2471 (1985). Unified Approach for Molecular Dynamics and Density-Functional Theory.
89. M. Nilges, G. M. Clore, and A. M. Gronenborn, FEBS Lett., 239, 129 (1988). Determination of Three-Dimensional Structures of Proteins from Interproton Distance Data by Dynamical Simulated Annealing from a Random Array of Atoms.
90. Shi Yun-yu, W. Lu, and W. F. van Gunsteren, Mol. Simul., 1, 369 (1988). On the Approximation of Solvent Effects on the Conformation and Dynamics of Cyclosporin A by Stochastic Dynamics Simulation Techniques.
91. A. DiNola, H. J. C. Berendsen, and O. Edholm, Macromolecules, 17, 2044 (1984). Free Energy Determination of Polypeptide Conformations Generated by Molecular Dynamics.
92. R. A. Donnelly and J. W. Rogers, Jr., Int. J. Quantum Chem. Quantum Chem. Symp., 22, 507 (1988). A Discrete Search Technique for Global Optimization.
93. R. C. van Schaik, W. F. van Gunsteren, and H. J. C. Berendsen, J. Comput.-Aided Mol. Design, 6, 97 (1992). Conformational Search by Potential Energy Annealing: Algorithm and Applications to Cyclosporin A.
94. G. M. Crippen, J. Comput. Chem., 3, 471 (1982). Conformational Analysis by Energy Embedding.
95. G. M. Crippen, Biopolymers, 21, 1933 (1982). Energy Embedding of Trypsin Inhibitor.
96. G. M. Crippen, J. Phys. Chem., 91, 6341 (1987). Why Energy Embedding Works.
97. G. M. Crippen and T. F. Havel, J. Chem. Inf. Comput. Sci., 30, 222 (1990). Global Energy Minimization by Rotational Energy Embedding.
98. N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, J. Chem. Phys., 21, 1087 (1953). Equation of State Calculations by Fast Computing Machines.
99. S. H. Northrup and J. A. McCammon, Biopolymers, 19, 1001 (1980). Simulation Methods for Protein Structure Fluctuations.
100. R. M. Levy, D. A. Bassolino, D. B. Kitchen, and A. Pardi, Biochemistry, 28, 9361 (1989). Solution Structures of Proteins from NMR Data and Modelling: Alternative Folds for Neutrophil Peptide 5.
101. Z. Li and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 84, 6611 (1987). Monte Carlo-Minimization Approach to the Multiple-Minima Problem in Protein Folding.
102. T. Schaumann, W. Braun, and K. Wüthrich, Biopolymers, 29, 679 (1990). The Program FANTOM for Energy Refinement of Polypeptides and Proteins Using a Newton-Raphson Minimizer in Torsion Angle Space.
103. B. von Freyberg and W. Braun, J. Comput. Chem., 12, 1065 (1991). Efficient Search for All Low Energy Conformations of Polypeptides by Monte Carlo Methods.
104. M. Nilges, G. M. Clore, and A. M. Gronenborn, FEBS Lett., 229, 317 (1988). Determination of Three-Dimensional Structures of Proteins from Interproton Distance Data by Hybrid Distance Geometry-Dynamical Simulated Annealing Calculations.
105. M. Nilges, A. M. Gronenborn, A. T. Brünger, and G. M. Clore, Protein Engin., 2, 27 (1988). Determination of Three-Dimensional Structures of Proteins by Simulated Annealing with Interproton Distance Restraints. Application to Crambin, Potato Carboxypeptidase Inhibitor, and Barley Serine Proteinase Inhibitor 2.
106. A. T. Brünger, G. M. Clore, A. M. Gronenborn, and M. Karplus, Protein Engin., 1, 399 (1987). Solution Conformations of Human Growth Hormone Releasing Factor: Comparison of the Restrained Molecular Dynamics and Distance Geometry Methods for a System without Long-Range Distance Data.
107. J. de Vlieg, R. M. Scheek, W. F. van Gunsteren, H. J. C. Berendsen, R. Kaptein, and J. Thomason, Proteins, 3, 209 (1988). Combined Procedure of Distance Geometry and Restrained Molecular Dynamics Techniques for Protein Structure Determination from Nuclear Magnetic Resonance Data: Application to the DNA Binding Domain of lac Repressor from Escherichia coli.
108. W. J. Metzler, D. R. Hare, and A. Pardi, Biochemistry, 28, 7045 (1989). Limited Sampling of Conformational Space by the Distance Geometry Algorithm: Implications for Structures Generated from NMR Data.
109. T. F. Havel, Biopolymers, 29, 1565 (1990). The Sampling Properties of Some Distance Geometry Algorithms Applied to Unconstrained Polypeptide Chains: A Study of 3830 Independently Computed Conformations.
110. J. Kuszewski, M. Nilges, and A. T. Brünger, J. Biomol. NMR, in press. Sampling and Efficiency of Metric Matrix Distance Geometry: A Novel Partial Metrization Algorithm.
111. W. F. van Gunsteren, in Studies in Physical and Theoretical Chemistry, Vol. 71: Modelling of Molecular Structures, J-L. Rivail, Ed., Elsevier, Amsterdam, 1990, pp. 463-478. On Testing Theoretical Models by Comparison of Calculated with Experimental Data.
112. W. F. van Gunsteren, P. Gros, A. E. Torda, H. J. C. Berendsen, and R. C. van Schaik, in Protein Conformation, F. M. Richards, Ed., John Wiley & Sons, Chichester, 1991, pp. 150-166. On Deriving Spatial Protein Structure from NMR or X-Ray Diffraction Data.
113. T. F. Havel and K. Wüthrich, J. Mol. Biol., 182, 281 (1985). An Evaluation of the Combined Use of Nuclear Magnetic Resonance and Distance Geometry for the Determination of Protein Conformations in Solution.
114. C. Gonzalez, J. A. C. Rullmann, A. M. J. J. Bonvin, R. Boelens, and R. Kaptein, J. Magn. Reson., 91, 659 (1991). Toward an NMR R Factor.
115. G. M. Clore, P. T. Wingfield, and A. M. Gronenborn, Biochemistry, 30, 2315 (1991). High-Resolution Three-Dimensional Structure of Interleukin 1β in Solution by Three- and Four-Dimensional Nuclear Magnetic Resonance Spectroscopy.
116. G. M. Clore and A. M. Gronenborn, Prog. NMR Spectrosc., 23, 43 (1991). Applications of Three- and Four-Dimensional Heteronuclear NMR Spectroscopy to Protein Structure Determination.
117. R. E. Hoffman and G. C. Levy, Prog. NMR Spectrosc., 23, 211 (1991). Modern Methods of NMR Data Processing and Data Evaluation.
118. J. de Vlieg, H. J. C. Berendsen, and W. F. van Gunsteren, Proteins, 6, 104 (1989). An NMR Based Molecular Dynamics Simulation of the Interaction of the lac Repressor Headpiece and Its Operator in Aqueous Solution.
119. J. Kallen, C. Spitzfaden, M. G. M. Zurini, G. Wider, H. Widmer, K. Wüthrich, and M. D. Walkinshaw, Nature, 353, 276 (1991). Structure of Human Cyclophilin and Its Binding Site for Cyclosporin A Determined by X-ray Crystallography and NMR Spectroscopy.
120. S. W. Fesik, R. T. Gampe, Jr., H. L. Eaton, G. Gemmecker, E. T. Olejniczak, P. Neri, T. F. Holzman, D. A. Egan, R. Edalji, R. Simmer, R. Helfrich, J. Hochlowski, and M. Jackson, Biochemistry, 30, 6574 (1991). NMR Studies of [U-13C]Cyclosporin A Bound to Cyclophilin: Bound Conformation and Portions of Cyclosporin Involved in Binding.
121. C. Weber, G. Wider, B. von Freyberg, R. Traber, W. Braun, H. Widmer, and K. Wüthrich, Biochemistry, 30, 6563 (1991). The NMR Structure of Cyclosporin A Bound to Cyclophilin in Aqueous Solution.
CHAPTER 4
Computer-Assisted Methods in the Evaluation of Chemical Toxicity
David F. V. Lewis
Molecular Toxicology Research Group, Division of Toxicology, School of Biological Sciences, University of Surrey, Guildford, Surrey GU2 5XH, United Kingdom
INTRODUCTION
Toxicology may be defined as the “study of the adverse effects of chemical agents on biological systems.”1 It is worth remarking that all chemicals are toxic and, therefore, it is the dose of a given substance that determines whether a toxic response is observed. One way of expressing the toxicity of a chemical is by the acute lethal dose (LD50), which can be used for ranking substances in order of relative toxicity (Table 1). On such a basis, a substance would be regarded as practically nontoxic if its lethal oral dose in humans were greater than 15 g/kg body weight.1 A highly toxic chemical, in contrast, could possess a lethal dose of less than 5 mg/kg (Table 2). An inspection of tables of toxic chemicals shows considerable structural diversity: a toxic substance can be anything from a small molecule, such as carbon monoxide, to a large molecular weight protein like the insecticidal δ-endotoxin.2 Moreover, toxicity may be acute or chronic and subdivisions exist between these two extremes. Clearly, the mechanisms by which chemicals exert their toxic effects differ considerably;
Reviews in Computational Chemistry, Volume III. Kenny B. Lipkowitz and Donald B. Boyd, Editors. VCH Publishers, Inc., New York, © 1992
Table 1 Approximate Acute Lethal Doses of Some Representative Chemical Agents in 50% of Individuals

Agent                                          LD50 (mg/kg)
Ethyl alcohol                                  10,000
Sodium chloride                                4,000
Ferrous sulfate                                1,500
Morphine sulfate                               900
Phenobarbital sodium                           150
Picrotoxin                                     5
Strychnine sulfate                             2
Nicotine                                       1
d-Tubocurarine                                 0.5
Hemicholinium-3                                0.2
Tetrodotoxin                                   0.10
2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD)     0.001
Botulinum toxin                                0.00001
also, toxicity varies between species, organs, sites of action, routes of administration, and duration of exposure.
Toxic substances may be either natural or synthetic. Certain species, such as venomous reptiles, arthropods, some amphibians, marine animals, and varieties of fungi and other plants, contain toxins that tend to maintain the relatively safe existence of the species. Animal toxins usually act on the nervous system, giving rise to neurotoxicity and cardiotoxicity, whereas plant toxins often tend to affect the cardiovascular and gastric systems, though some contain neuromuscular blocking agents or substances that act on neurotransmitter receptors. Such considerations overlap with the subject of pharmacology3 as natural receptor antagonists can provide an insight into structural requirements for ligand binding, leading to the design of novel therapeutic agents. Synthetic chemicals to which humans are exposed, and which can give rise to toxic effects of varying severity, encompass pharmaceuticals, industrial solvents, pesticides and agrochemicals, food additives and contaminants, cosmetics, plastics, and air pollutants. In a modern society, of course, organizations exist

Table 2 Toxicity Rating Chart

                              Probable Lethal Oral Dose for Humans
Toxicity Rating or Class      Dosage           For Average Adult
1. Practically nontoxic       > 15 g/kg        More than 1 quart
2. Slightly toxic             5-15 g/kg        Between pint and quart
3. Moderately toxic           0.5-5 g/kg       Between ounce and pint
4. Very toxic                 50-500 mg/kg     Between teaspoonful and ounce
5. Extremely toxic            5-50 mg/kg       Between 7 drops and teaspoonful
6. Supertoxic                 < 5 mg/kg        A taste (less than 7 drops)
to monitor, assess, and regulate the introduction of potentially toxic substances. The use of animals in the testing of chemicals for various forms of toxicity arising from human exposure has increased in the last 50 years as a result of the statutory requirements for risk assessment in the environment. Apart from those involved in pest control, scientists are not generally interested in the metabolism of potentially toxic chemicals in animals: animals are merely being used as surrogates for humans. There are, however, several reasons for concern regarding such forms of experimentation. There are many examples of the failure of rodent assays and other animal tests for the safety evaluation of chemicals, with often tragic consequences for patients and loss of revenue for chemical companies, combined with expensive and damaging litigation. Furthermore, there is an increasing concern by both scientists and the general public for animal welfare. Consequently, in recent times, alternatives to animal tests4-6 have arisen because the latter are becoming less desirable for scientific, economic (Table 3), and ethical reasons. Although there is controversy over the reliability of animal testing of chemicals for human exposure, largely because of metabolic differences between species,7 there is also considerable disparity of opinion regarding the validity of alternatives,8-12 many of
Table 3 Typical Costs of Descriptive Toxicity Tests

Test                                                       Cost (U.S. $)
Acute toxicity (rat, two routes)                           6,500
Acute dermal toxicity (rabbit)                             3,000
Acute inhalation toxicity (rat)                            6,500
Acute dermal irritation (rabbit)                           700
Acute eye irritation (rabbit)                              500
Skin sensitization (guinea pig)                            7,000
Repeated dose toxicity
  14-day exposure (rat)                                    40,000
  90-day exposure (rat)                                    100,000
  1-year (diet, rat)                                       200,000
  1-year (oral gavage, rat)                                260,000
  2-year (diet, rat)                                       470,000
  2-year (oral gavage, rat)                                600,000
Genetic toxicology tests
  Reverse mutation assay (Salmonella typhimurium)          1,500
  Mammalian bone marrow cytogenetics (in vivo, rat)        16,000
  Micronucleus test (rat)                                  4,500
  Dominant lethal (mouse)                                  15,000
  Host-mediated assay (mouse)                              6,000
  Drosophila                                               20,000
Reproduction
  Phase I (rat)                                            30,000
  Phase II (rat)                                           20,000
  Phase II (rabbit)                                        30,000
  Phase III (rat)                                          22,000
Acute toxicity in fish (LC50)                              1,500
Daphnia reproduction study                                 1,500
Algal growth inhibition                                    1,500
which do not even correlate well with animal data,13 let alone data resulting from clinical studies. Such alternatives exist, however, and are still in the process of evaluation;14-16 some are now required alongside animal procedures in the safety evaluation of chemicals as prescribed by the regulatory authorities. Among these alternatives, such as the Ames test6,17-22 for bacterial mutagenicity, cell culture experiments, and other in vitro procedures, lie the computer-based systems, all in their infancy and probably the most controversial and mistrusted by toxicologists and regulators alike.
Clearly, toxicology is a vast and complex subject, overlapping with several other disciplines.1,23 A full and detailed account of this branch of science would take many volumes, but some aspects of relevance to this review are discussed to provide some biological background: one example is the role of the cytochromes P450 in metabolism, detoxication, and carcinogenesis. Toxic chemicals may be ion channel blockers, enzyme inhibitors, narcotics, irritants, photosensitizing agents, receptor antagonists, allergens, carcinogens, mutagens, or teratogens, and in each of these categories lie many different classes of compounds of diverse structure. In view of such complexity, it would seem that the computer prediction of toxicity24 is an extremely difficult task; however, there are ways of simplifying the problem, and they can be categorized as being essentially twofold: (1) those involving quantitative structure-activity relationships and (2) those concerning the identification of toxic segments or molecular fragments.
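The rating chart of Table 2 amounts to a lookup on the probable lethal oral dose, which can be sketched in a few lines of Python. The function name and the assignment of boundary values (for example, placing exactly 5 g/kg in class 2) are assumptions made for illustration; Table 2 itself does not say which class a boundary dose belongs to.

```python
# Lower bounds (mg/kg) for classes 2-5 of Table 2, checked from highest to lowest.
# Which class a boundary dose falls into is an assumption of this sketch.
RATING_BOUNDS_MG_PER_KG = [
    (5_000.0, 2, "Slightly toxic"),     # 5-15 g/kg
    (500.0, 3, "Moderately toxic"),     # 0.5-5 g/kg
    (50.0, 4, "Very toxic"),            # 50-500 mg/kg
    (5.0, 5, "Extremely toxic"),        # 5-50 mg/kg
]


def toxicity_rating(lethal_dose_mg_per_kg):
    """Return (class number, label) from Table 2 for a lethal oral dose in mg/kg."""
    if lethal_dose_mg_per_kg <= 0:
        raise ValueError("dose must be positive")
    if lethal_dose_mg_per_kg > 15_000.0:      # > 15 g/kg
        return 1, "Practically nontoxic"
    for lower_bound, rating, label in RATING_BOUNDS_MG_PER_KG:
        if lethal_dose_mg_per_kg >= lower_bound:
            return rating, label
    return 6, "Supertoxic"                    # < 5 mg/kg


# Doses taken from Table 1 purely as sample inputs; Table 2 refers to human
# oral doses, so these ratings are illustrative only.
for agent, dose in [("Ethyl alcohol", 10_000), ("Phenobarbital sodium", 150), ("Nicotine", 1)]:
    print(agent, toxicity_rating(dose))
```

Keeping the class limits in a single table of bounds makes it easy to swap in a different rating scheme without touching the lookup logic.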
COMPUTER-BASED METHODS FOR TOXICITY EVALUATION
Toxicity, as with all forms of biological activity, is a result of the molecular structure of the chemical concerned. Given that fact, the computational chemist is presented with a problem that is, at least theoretically, soluble. The tools that have been applied so successfully to rationalizing biological activity in terms of chemical structure can also be used for correlating toxicity with various structural parameters.24 Such structural descriptors may be physicochemical values,25 functions of molecular size and shape, molecular connectivity, and numbers of atoms, or they may be quantum-chemical parameters relating to electronic distribution within the molecule.26,27
QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS
Quantitative structure-activity relationships (QSARs) are mathematical equations produced from statistical analysis of the biological and structural data for series of compounds, where it is assumed that some degree of correlation exists between a number of structural parameters and bioactivity within the series of chemicals.27 The aim of a QSAR, therefore, is to relate factors of chemical structure with biological activity in a quantitative way to enable the prediction of the activity of untested compounds and to provide a rationale for activity differences between congeners, possibly giving an insight into the mechanisms of biological activity as well. Based on the earlier work of Meyer and Overton, who showed that the narcotic effect of anesthetics was related to their oil/water partition coefficients, Hansch and his co-workers have demonstrated unequivocally the importance of hydrophobic parameters such as log P (where P is, usually, the octanol/water partition coefficient) in QSAR analysis.28 The so-called "classical" QSAR approach, pioneered by Hansch, involves stepwise multiple regression analysis (MRA) in the generation of activity correlations with structural descriptors, such as physicochemical parameters (log P, molar refractivity, etc.) or substituent constants such as π, σ, and Es (where these represent hydrophobic, electronic, and steric effects, respectively). The Hansch approach has been very successful in accurately predicting effects in many biological systems, some of which have been subsequently rationalized by inspection of the three-dimensional structures of receptor proteins.28 The use of log P (and its associated substituent parameter, π) is very important in toxicity,29-32 as well as in other forms of bioactivity, because of the role of hydrophobicity in molecular transport across cell membranes and other biological barriers. Many series of compounds have been subjected to QSAR analysis33-49 and to list all would be well beyond the scope of this review; however, the majority of those reported over the last 5 years or so have been summarized in the journal Quantitative Structure-Activity Relationships.
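A regression of the Hansch type can be sketched as an ordinary least-squares fit of activity against log P and σ. The sketch below is a minimal illustration under stated assumptions: the descriptor values are invented, the activities are generated from a made-up linear equation with no noise, and the fit is a single-pass least squares rather than the stepwise MRA procedure described in the text.

```python
import numpy as np


def fit_linear_qsar(log_p, sigma, activity):
    """Least-squares fit of activity = a*logP + b*sigma + c; returns (coeffs, r_squared)."""
    log_p = np.asarray(log_p, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    activity = np.asarray(activity, dtype=float)
    # Design matrix: one column per descriptor plus a constant column for the intercept.
    design = np.column_stack([log_p, sigma, np.ones(len(log_p))])
    coeffs, _, _, _ = np.linalg.lstsq(design, activity, rcond=None)
    predicted = design @ coeffs
    ss_residual = float(np.sum((activity - predicted) ** 2))
    ss_total = float(np.sum((activity - activity.mean()) ** 2))
    return coeffs, 1.0 - ss_residual / ss_total


# Hypothetical congeneric series: descriptor values are invented for illustration.
log_p = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
sigma = np.array([-0.17, 0.00, 0.23, 0.06, 0.54, 0.78])
# Synthetic, noise-free activities so that the fit should recover the coefficients.
activity = 0.9 * log_p + 1.2 * sigma + 2.0

coeffs, r_squared = fit_linear_qsar(log_p, sigma, activity)
print("a, b, c =", coeffs, " r^2 =", r_squared)
```

With real data, the correlation coefficient alone is not sufficient evidence of a meaningful relationship; the number of compounds relative to the number of descriptors also matters.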
PATTERN RECOGNITION TECHNIQUES
Ideally, every congeneric series of chemicals should be capable of QSAR analysis so that, in theory, the biological activity of any chemical could be predicted with a reasonable degree of accuracy.50,51 Toxicity, however, is often very difficult to quantify in this way because of the broad range of different types of toxicity and also because of the structural diversity of chemicals which exhibit varying degrees of toxicity in different biological systems, that is, in different species, strains, sexes, organs, and tissues. Therefore, a computer-based system comprising all of the QSAR evaluations pertinent to each form of toxicity in all biological systems would be very unwieldy, though TOPKAT52-55 and HAZARDEXPERT (see last section) do, at least partially, endeavor to address this point. Such difficulties have prompted researchers to formulate alternative methods for computer prediction of toxicity.
Entry and storage of molecular structures
↓
Three-dimensional molecular modeling
↓
Molecular descriptor generation
↓
Pattern recognition or statistical analysis
↓
Structure-activity relationships
↓
Prediction of biological activity by interpolation or extrapolation

Scheme 1 ADAPT flowchart.
These techniques can be broadly categorized as pattern recognition and have been used for other forms of biological correlation. The methods known as CASE, ADAPT, and SIMCA are examples of various ways of approaching the problem of predicting toxicity from chemical structures.56-73 Briefly, SIMCA56 (soft, independent modeling of class analogy) uses a form of cluster analysis57 to probe regions of multivariate space, whereas ADAPT58-65 (automated data analysis by pattern recognition techniques) is essentially an extension of the Hansch and SIMCA approaches to QSARs (Scheme 1). CASE66-73 (computer-automated structure evaluation) represents an attempt to overcome limitations in the aforementioned methods by searching for substructural fragments that are essential (or nonessential) for biological activity (Scheme 2). It is, therefore, a topological method similar to the molecular connectivity concept propounded by Kier and Hall.74 The TOPKAT program mentioned before combines the molecular fragment approach with other QSAR techniques such as discriminant analysis to construct training sets, thus generating model systems for different types of toxicity. TOPKAT is the only one of these methods specifically designed for toxicity prediction. Scheme 3 shows a flowchart outlining the steps used in the development of a model and in an estimation of toxicity.

Input molecular structures
↓
Generate toxic and nontoxic fragments
↓
Select discriminatory substructural units
↓
Evaluate relative statistical weighting of fragments
↓
Predict toxicity/activity on the basis of key biophores required for activity

Scheme 2 CASE flowchart.

1. Development of a model
Assembly of database
↓
Selection of many potential parameters
↓
Simple statistics
↓
Stepwise discriminant analysis
↓
Identification of influential observations and their removal
↓
Diagnostics for the identification and removal of poorly behaved parameters
↓
Final discriminant analysis
↓
Validation

2. Estimation of toxicity/carcinogenicity
Structure to be evaluated
↓
Look up and/or compute parameters
↓
Calculate discriminant score
↓
Calculate probability of carcinogenicity
↓
Evaluate in terms of similar compounds and statistical reliability

Scheme 3 TOPKAT flowcharts.

A relatively recent technique employed in the generation of QSARs is principal components analysis (PCA), which is viewed by many as a successor to MRA. This alternative approach analyzes the multidimensional variable space of structural descriptors to yield QSAR equations of usually higher statistical significance than MRA, at the expense of comprehension, because larger numbers of descriptor variables generally span the QSAR produced. The increase in popularity of PCA over MRA arose from an important study by Topliss and Costello75 that indicated that the number of chance correlations generated by MRA is proportional to the number of descriptor variables employed in the data set. There is obviously some degree of validity in this finding, but the analysis was carried out using random numbers, and QSAR data are rarely random in nature. Furthermore, the likelihood of a correlation appearing by chance in MRA can be assessed by a variety of statistical tools that are usually implemented during the analytical procedure anyway; however, this seminal work should serve as a warning to those in the QSAR field to undertake their analyses with caution, because cross-correlations and chance correlations can often make nonsense out of a seemingly useful and highly significant QSAR equation.
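The risk of chance correlation discussed above can be illustrated with a small simulation on random numbers. This is a simplified sketch, not a reproduction of the Topliss and Costello study: it screens each random descriptor individually and keeps the best single-variable r², whereas their analysis concerned multiple regression. The function name, the r² threshold, the normal distribution, and the trial count are all assumptions chosen for illustration.

```python
import numpy as np


def chance_correlation_rate(n_compounds, n_candidates, r2_threshold=0.4,
                            n_trials=500, seed=0):
    """Fraction of random data sets in which the best single random descriptor
    reaches r^2 >= r2_threshold against a random activity vector."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        activity = rng.standard_normal(n_compounds)
        descriptors = rng.standard_normal((n_compounds, n_candidates))
        # Pearson r between the activity and every descriptor column.
        y = activity - activity.mean()
        x = descriptors - descriptors.mean(axis=0)
        r = (x.T @ y) / (np.linalg.norm(x, axis=0) * np.linalg.norm(y))
        if np.max(r ** 2) >= r2_threshold:
            hits += 1
    return hits / n_trials


rate_few = chance_correlation_rate(n_compounds=15, n_candidates=2)
rate_many = chance_correlation_rate(n_compounds=15, n_candidates=30)
print("2 candidate descriptors: ", rate_few)
print("30 candidate descriptors:", rate_many)
```

Screening more candidate descriptors against the same small set of compounds raises the rate at which a purely random variable appears significant, which is the caution the paragraph above draws from that work.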
COMPUTER MODELING AND KNOWLEDGE-BASED SYSTEMS
The advent of molecular graphics and other facets of computational chemistry has led to an increased use of steric and electronic parameters in QSAR studies, and these quantities can now be calculated more precisely and in a relatively shorter time frame than before.76 Steric and electrostatic field factors are combined in the COMFA program written by Cramer and co-workers.77 Accurate molecular structures, which are required in COMFA, can be produced by X-ray crystallography78,79 or from molecular mechanics and conformational analysis.80,81 Measurements of molecular dimensions, which can be used as QSAR descriptors, are readily obtained via molecular modeling packages, which usually incorporate routines for the calculation of electronic structure by semiempirical or ab initio methods.82-84 For structurally diverse chemicals exhibiting toxic effects, the semiempirical neglect-of-differential-overlap methods are generally the most useful as they combine a reasonable degree of accuracy with relatively short execution times and extensive parameterization for the majority of atom types encountered in toxic chemicals.85 Many of the aforementioned features are combined in integrated molecular modeling software such as SYBYL, Chem-X, COSMIC,80 Quanta and CHARMm, BIOGRAF, Insight and Discover, Chemlab, and MOLIDEA. These packages are available on a variety of hardware platforms ranging from personal computers (PCs) to mainframe computers.
A number of computer-based expert systems have emerged that are more directly focused on toxicity evaluation: these are HAZARDEXPERT (with its accompanying package, METABOLEXPERT), DEREK,86 and COMPACT.87-91 The DEREK (deductive estimation of risk from existing knowledge) system for qualitative toxicity prediction is an expert system with the ability to identify molecular fragments or substructures in a chemical that are previously known to give rise to various forms of toxicity.86 Presently restricted to molecules containing up to 64 atoms, DEREK uses some of the features of the LHASA (logic and heuristics applied to synthetic analysis) program, which is an expert system designed by E. J.
Corey and co-workers at Harvard University for the identification of organic synthetic routes. Scheme 4 provides a flowchart of the computational stages in an evaluation of toxicity using the DEREK system.

Graphical input of chemical structure
↓
Identification of structural features
↓
Classification of potential toxic effects
↓
Report of potential toxicity

Scheme 4 DEREK flowchart.

Hazardexpert, which runs on a PC or VAX computer, possesses features similar to those of DEREK in that it is also a knowledge-based system that searches for toxic segments within a given molecular structure (Scheme 5). It also has the facility to calculate log P and pKa values, which are then used in an overall quantitative assessment of toxic risk toward a given species at a user-defined exposure level and dose. Hazardexpert will generate metabolites for an input structure, a feature it shares with the companion software Metabolexpert (see last section).

Input compound, graphically or via line notation
↓
Calculate parameters, log P, pKa, and molecular weight
↓
Identify toxic segments
↓
Define species, route, and duration of exposure, dose
↓
Predict toxic components and intensive factors of bioaccumulation and bioavailability
↓
Overall hazard assessment and percentage toxic risk

Scheme 5 HAZARDEXPERT flowchart.

Metabolexpert, as the name suggests, is an expert system for the prediction of metabolism. This package, which also runs on the PC, contains considerable information on the biotransformation pathways of most substructural fragments. An extensive database holds many key reaction pathways of "phase I" and "phase II" metabolism of known chemicals so that a metabolite "tree" may be constructed. (Phase II metabolism involves further chemical transformation of the products from the initial transformations.) In both Hazardexpert and Metabolexpert, input structures may be readily built up with the aid of a mouse from molecular fragments or via a line notation method. These packages are easy to use and processing time is relatively short for most applications. Compudrug also produces several other packages of interest such as Pro-log P, MOLIDEA, and DRUGIDEA, in addition to a QSAR database.
It can therefore be seen that Hazardexpert is essentially a quantitative version of DEREK that is also able to take into account the important factors of metabolism and bioavailability. Both of the aforementioned programs, however, are more focused on the biological area of structure-toxicity evaluation, a factor they share with COMPACT. The latter is sufficiently different and specific relative to the previous methods to require some degree of introduction prior to a more descriptive treatment. As the COMPACT procedure developed out of research carried out on the cytochrome P450 superfamily of enzymes, it will be necessary to provide some biochemical background relating to these isozymes.
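The fragment-searching step common to the knowledge-based systems described in this section can be caricatured as a lookup of stored structural alerts. The sketch below is a toy under loud assumptions: the three rules are illustrative examples and are not taken from the rule base of DEREK, HAZARDEXPERT, or any other program, and plain substring matching on a line-notation (SMILES) string stands in for genuine substructure searching, which it cannot replace because one molecule can be written as many different strings.

```python
# Illustrative structural alerts: (alert name, line-notation fragment, potential effect).
# Hypothetical rules for demonstration only; not any program's actual knowledge base.
ALERT_RULES = [
    ("nitro group", "[N+](=O)[O-]", "possible mutagenicity"),
    ("N-nitroso group", "N=O", "possible carcinogenicity"),
    ("epoxide ring", "C1OC1", "reactive intermediate"),
]


def screen_structure(smiles):
    """Return a list of (alert name, potential effect) for fragments found in the string.

    Substring matching is a crude stand-in for real substructure search and will
    miss fragments written in a different but equivalent notation.
    """
    report = []
    for name, fragment, effect in ALERT_RULES:
        if fragment in smiles:
            report.append((name, effect))
    return report


# Sample inputs: N-nitrosodimethylamine, nitrobenzene, and ethanol.
for label, smiles in [("N-nitrosodimethylamine", "CN(C)N=O"),
                      ("nitrobenzene", "c1ccccc1[N+](=O)[O-]"),
                      ("ethanol", "CCO")]:
    print(label, screen_structure(smiles))
```

Separating the rule table from the matching routine mirrors the knowledge-based design, in which the stored rules can grow without changes to the search procedure.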
THE CYTOCHROMES P450
Central to toxicology and drug (or xenobiotic) metabolism lie the cytochromes P450, a superfamily of enzymes that generally function as monooxygenases (Figure 1) in the toxic activation or detoxication of chemicals.92-102 Metabolism by cytochromes P450 is a large and complex area requiring many volumes for an exhaustive treatment, as the literature on this subject is so vast and detailed. The cytochromes P450 are termed mixed-function oxidases, being capable of the metabolic transformation of a wide variety of structural classes such that over 95% of all phase I metabolism of chemicals occurs via a cytochrome P450-mediated pathway (Figure 2). These enzymes are ubiquitous in biological systems, being present in bacteria, plants, fungi, insects, reptiles, fish, birds, and mammals.
The cytochromes P450 possess both exogenous and endogenous roles (Table 4). They are involved in the biosynthesis and regulation of the steroid hormones (Figure 3) and are required in the metabolism of fatty acids, prostaglandins, and eicosanoids. Xenobiotics that are metabolized by cytochromes P450 or act as inducers encompass pharmaceuticals, insecticides, agrochemicals, food additives and pyrolysis products, industrial chemicals, and environmental pollutants. Such diversity in substrate specificity results from the large number of enzymes of which this superfamily is composed and from their evolution. Currently, about 150 P450 proteins have been identified and their amino acid sequences determined. Analysis of these sequences has enabled a classification of the cytochromes P450 into families and subfamilies leading to the construction of
Figure 1 Reaction cycle and enzymatic intermediates in P450-catalyzed oxidations.
Figure 2 Reactions catalyzed by cytochrome P450. Panels: hydroxylation (pentobarbital to 5-ethyl-5-(3′-hydroxy-1′-methylbutyl)barbituric acid; acetanilide to p-hydroxyacetanilide; 2-naphthylamine to 2-hydroxylaminonaphthalene); epoxidation (naphthalene to naphthalene 1,2-oxide); N- and O-dealkylation (aminopyrine to monomethyl-4-aminoantipyrine via a carbinolamine intermediate; acetophenetidin to p-hydroxyacetanilide via a hemiketal intermediate); dehalogenation (methoxyflurane to methoxydifluoroacetic acid via a halo alcohol intermediate).
an evolutionary tree.103-110 On the basis of amino acid changes resulting from natural genetic mutations, it is thought that the cytochromes P450 evolved from a common ancestor more than 1.5 billion years ago (Figure 4). It has been proposed by Nebert and Gonzalez**4 and others that mammalian cytochromes P450 arose by a process known as coevolution, where plants developed toxins
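Table 4 Cytochrome P450 Superfamily
(Columns of the original table: Family/Subfamily and Protein; Other Literature Names; Inducers; Inhibitors; Selectively Catalyzed Reaction)

P450 I
  P450 IA1
    Other literature names: Rat c, mouse P1, rabbit form 6, human P1
    Inducers: Polycyclic aromatic hydrocarbons, e.g., benzo[a]pyrene
    Inhibitors: 9-Hydroxyellipticine
    Selectively catalyzed reaction: 7-Ethoxyresorufin O-deethylase
  P450 IA2
    Other literature names: Rat d, mouse P3, rabbit form 4, human P3
    Inducers: Isosafrole
    Selectively catalyzed reaction: Glucose-1-phosphate N-hydroxylase

P450 II
  P450 IIA1
    Other literature names: Rat a
    Inducers: Phenobarbital
    Selectively catalyzed reaction: Progesterone 7α and testosterone 15α hydroxylation
  P450 IIB1
    Other literature names: Rat b, rabbit form 2
    Inducers: Phenobarbital
    Inhibitors: SKF-525A, secobarbital, AIA
    Selectively catalyzed reaction: Pentoxyresorufin O-depentylase
  P450 IID1
    Other literature names: Rat db1
    Inducers: Noninducible
    Inhibitors: Ajmalicine, quinidine
    Selectively catalyzed reaction: Debrisoquine 4-hydroxylase
  P450 IIE1
    Other literature names: Rat j, rabbit form 3a
    Inducers: Ethanol, isoniazid
    Inhibitors: Diallyl sulfide
    Selectively catalyzed reaction: Dimethylnitrosamine N-demethylase and p-nitrophenol oxidase

P450 III
  P450 IIIA1
    Other literature names: Rat pcn 1
    Inducers: Pregnenolone 16α-carbonitrile
    Inhibitors: Triacetyloleandomycin
    Selectively catalyzed reaction: Testosterone 2β-hydroxylase
  P450 IIIA2
    Other literature names: Rat pcn 2
    Inducers: Pregnenolone 16α-carbonitrile, dexamethasone
    Inhibitors: Erythromycin
    Selectively catalyzed reaction: Testosterone 6β-hydroxylase
  P450 IIIA4
    Other literature names: Human nf
    Inducers: Pregnenolone 16α-carbonitrile
    Inhibitors: Ethinylestradiol
    Selectively catalyzed reaction: Nifedipine N-oxidase

P450 IV
  P450 IVA1
    Other literature names: Rat LAω
    Inducers: Clofibrate
    Inhibitors: Terminal acetylenic fatty acids
    Selectively catalyzed reaction: Lauric acid 12-hydroxylase
  P450 IVA4
    Other literature names: Rabbit p2
    Inducers: Progesterone
    Selectively catalyzed reaction: Prostaglandin E1 ω-hydroxylase

P450 XI
  P450 XIA1
    Other literature names: SCC
    Inducers: Noninducible
    Inhibitors: Trimethylsilylethyl pregn-5-enediol
    Selectively catalyzed reaction: Cholesterol side-chain cleavage
  P450 XIB1
    Other literature names: 11β
    Inducers: Noninducible
    Selectively catalyzed reaction: Deoxycorticosterone 11β-hydroxylase

P450 XVIIA1
    Other literature names: 17α
    Inducers: Noninducible
    Inhibitors: Cyclopropylamino androstenol
    Selectively catalyzed reaction: Progesterone 17α-hydroxylase
P450 XIXA1
    Other literature names: Aromatase
    Inducers: Noninducible
    Inhibitors: 4-Hydroxyandrostenedione
    Selectively catalyzed reaction: Androgen aromatase
P450 XXIA1
    Other literature names: C21
    Inducers: Noninducible
    Selectively catalyzed reaction: 17α-Hydroxyprogesterone C-21 hydroxylase
P450 LIA1
    Other literature names: 14DM
    Inducers: Noninducible
    Inhibitors: Ketoconazole
    Selectively catalyzed reaction: Lanosterol C-14 demethylase
P450 CIA1
    Other literature names: CAM
    Inducers: Noninducible
    Inhibitors: Metyrapone
    Selectively catalyzed reaction: Camphor 5-exo-hydroxylase

Adapted, with permission, from Murray and Reidy.95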
Figure 3 Regioselective oxygenations in testosterone with individual P450s involved in the reactions.
to protect their species and animals consequently evolved enzyme systems to
metabolize the plant toxins. Of the dozen or more P450 gene families, it would appear that it is only the cytochrome P450 I, II, III, and IV families that either metabolize exogenous chemicals or are induced by them''*-''* (Table 4). Many known carcinogens exhibit a specificity for cytochromes P450 I (Table 5). In particular, polyaromatic and heteroaromatic hydrocarbons and
Figure 4 Cytochrome P450 superfamily phylogenetic tree showing major gene families.
Table 5 Metabolic Activation of Chemical Carcinogens by Rat Hepatic Cytochrome P450 Proteins

Carcinogen                                                      Cytochrome P450 Protein
AαC (2-amino-9H-pyrido[2,3-b]indole)                            IA1, IA2
2-Acetylaminofluorene                                           IA1, IA2, IIB1
Aflatoxin B1                                                    IA2, IIB1, IIC11, IIC12
2-Aminoanthracene                                               IA1, IA2
o-Aminoazotoluene                                               IA2
4-Aminobiphenyl                                                 IA2
2-Aminofluorene                                                 IA1, IA2
Benzo[a]pyrene                                                  IA1, IIB1
4,4′-(Bis)methylenechloroaniline (MOCA)                         IA2, IIB1
1,2,3,4-Dibenzanthracene                                        IA1, IA2, IIB1
7,12-Dimethylbenz[a]anthracene                                  IA1
N,N-Dimethylnitrosamine                                         IIE1
Glu-P-1 (2-amino-6-methyldipyrido[1,2-a:3′,2′-d]imidazole)      IA2
Glu-P-2 (2-aminodipyrido[1,2-a:3′,2′-d]imidazole)               IA1, IA2
IQ (2-amino-3-methylimidazo[4,5-f]quinoline)                    IA1, IA2
MeAαC (2-amino-3-methyl-9H-pyrido[2,3-b]indole)                 IA1, IA2
MeIQ (2-amino-3,4-dimethylimidazo[4,5-f]quinoline)              IA1, IA2
MeIQx (2-amino-3,8-dimethylimidazo[4,5-f]quinoxaline)           IA1, IA2
3-Methylcholanthrene                                            IA1
N-Methyl-4-aminoazobenzene                                      IA1, IA2
2-Naphthylamine                                                 IA2
Trp-P-1 (3-amino-1,4-dimethyl-5H-pyrido[4,3-b]indole)           IA1, IA2
Trp-P-2 (3-amino-1-methyl-5H-pyrido[4,3-b]indole)               IA1, IA2

Data were taken from Guengerich.98
their amino derivatives induce this P450 family and/or are metabolized by it to yield reactive intermediates such as epoxides, diol epoxides, carbonium ions, or nitrenium ions which can interact with DNA, forming covalent adducts that bring about miscoding, mutagenesis, and carcinogenesis. In contrast, substrates and inducers of cytochromes P450 II are generally of low toxicity, with the exception of chemicals showing specificity for the cytochrome P450 IIE subfamily (see Table 5). The latter are known to be involved in the production of reactive oxygen species such as singlet oxygen, superoxide anion, and hydroxyl radicals.116 In addition to oxygen radical production via redox cycling, cytochromes P450 IIE are responsible for the activation of carcinogens such as benzene and the alkyl nitrosamines. Elevated levels of this P450 family occur as a result of diabetes or starvation, where toxic effects may be observed as a result of the formation of oxygen radicals. The cytochromes P450 III metabolize some of the steroid hormones, but they also appear to be able to oxygenate a small number of carcinogens such as aflatoxin; in most cases, however, detoxication of polyaromatic hydrocarbons is mediated by these isozymes. Cytochromes P450 IV play an endogenous role in the oxygenation of long-chain carboxylic acids and prostaglandins; however,
they are also inducible by certain exogenous chemicals which cause peroxisome proliferation and bring about carcinogenesis in rodents. To date, there has been no evidence for any cytochrome P450 IV inducers being human carcinogens.
In mammals, the cytochromes P450 are found mainly in the liver but are also present in smaller quantities in other organs and tissues. Levels of various cytochromes P450 can be readily raised as a result of their induction by specific chemicals acting on the appropriate cellular receptors, such as the Ah receptor in the case of polyaromatic hydrocarbons. Therefore, the stages in the process of carcinogenesis can be viewed as receptor binding → translocation to nucleus → induction of P450s → production of reactive intermediates → DNA damage → mutation → cancer. Of course, competing processes complicate this picture, and the likelihood of carcinogenesis (as well as other forms of P450-mediated toxicity) is determined by the efficiency of the DNA repair mechanisms, relative levels of other cytochromes P450 that can aid detoxication, oxygen tension, presence of promoters that initiate the protein kinase C cascade, ability of phase II metabolizing enzymes involved in conjugation to prevent the buildup of harmful metabolites, and the availability of cellular glutathione.
Fortunately, quite a lot of information is known about the cytochromes P450: they are all hemoproteins sharing some characteristics with other cytochromes such as electron transfer capability and possession of a single polypeptide chain of between 400 and 500 amino acids. All cytochromes P450 are able to bind carbon monoxide when in the reduced Fe(II) state, which gives a strong UV absorption maximum at around 450 nm, hence the name. The appearance of this Soret band in the UV is due to thiolate ligation of the heme iron by a cysteine residue that is invariant in all P450 proteins.
The three-dimensional structure of a bacterial form of the enzyme has been determined by X-ray crystallography but, as yet, mammalian cytochromes P450 have not been crystallized because they are membrane bound. Being terminal monooxygenases in an electron transport chain, they use molecular oxygen to bring about oxidation of substrates via single oxygen insertion with the concomitant production of water (see Figure 1). The endogenous substrate for the bacterial enzyme cytochrome P450CAM is camphor, which is oxygenated stereospecifically to the 5-exo-hydroxy derivative. Computer graphics of this protein's three-dimensional structure shows that the camphor substrate occupies a well-defined cavity within the cytochrome P450CAM enzyme close to the heme face, where it is oriented for selective oxygenation by a number of hydrophobic amino acid residues and by a single hydrogen bond to tyrosine-96 (Figure 5). The kinetics and thermodynamics of the binding process have been extensively studied,93,117 and it appears that the energy required for the binding of camphor is entirely provided by desolvation of the heme pocket by the incoming substrate. Thermodynamic measurements show that the binding process is entropy driven at 21°C. Calculations indicate that this favorable entropy change is
Figure 5 Active site region of cytochrome P450CAM.
equivalent to the removal of between 6 and 10 water molecules, and inspection of the differences between substrate-bound and substrate-free cytochrome P450CAM shows that this is indeed the case.91,119 As substrate binding triggers the entire catalytic cycle of the enzyme, hydrophobic interactions play a major role in cytochrome P450-mediated reactions. It is therefore not surprising that hydrophobicity parameters such as log P, which relate to desolvation, give rise to good correlations with biological activities associated with the P450 system. As there is also evidence for the involvement of the phospholipid membrane in the transfer of substrates to cytochromes P450, where hydrophobic forces would be expected to make a contribution, the success of the Hansch approach in explaining potency differences for processes mediated by cytochromes P450 can be readily understood.
Although there is no crystal structure available for any of the mammalian cytochromes P450, sequence homology matching between a variety of P450 proteins and the bacterial form of the enzyme can enable identification of active site residues as a result of the conservation of the tertiary fold together with a number of key amino acids like an invariant cysteine. Molecular models of the substrate binding sites of mammalian cytochromes P450, based on analogy with cytochrome P450CAM, show that there are amino acid residues occupying the hydrophobic pocket that are entirely complementary to groups in the known substrates.91 Such findings give added weight to the COMPACT method in defining the size, shape, and charge distribution of the putative binding sites of mammalian cytochromes P450 involved in oxidative metabolism. The COMPACT procedure for the prediction of potential toxicity via a cytochrome P450-dependent pathway is now described in some detail.
COMPACT The COMPACT (computer-optimized molecular parametric analysis of chemical toxicity) method makes use of known data on substrates and inducers of cytochromes P450 to map structural criteria required for chemicals to exhibit specificity for a variety of toxicologically important P450 proteins. COMPACT (Scheme 6) developed out of research into possible correlations between molecular shape and substrate specificity for cytochrome P450 1 (previousiy known as P448). An inspection of the molecular structures of chemicals that exhibit selectivity for this enzyme (Figure 6 ) shows them to be essentially planar in shape1199'20; however, a systematic study of all known cytochrome P450 I substrates and inducers, using molecular graphics and X-ray crystallographic data, provides more precise details of the dimensional requirements for molecules to fit the enzyme's active site. It has been known for some time that many polycyclic aromatic hydrocarbons (PAHs) are potent carcinogens and that they possess planar molecular structures. Furthermore, the
Construct molecule
↓
Minimize geometry
↓
Measure molecular dimensions
↓
Calculate electronic structure
↓
Compare molecular structure parameters graphically with training set data (2D or 3D)
↓
Prediction of P450 specificity (P450 I, II, III, and IV) using cluster analysis
↓
Potential toxicity evaluation

Diagnosis
P450 I specificity → strong evidence of toxicity (reactive intermediates)
P450 IIE specificity → suspected toxicity (oxygen radicals)
P450 IV specificity → likely rodent toxicity (peroxisome proliferation)
P450 III specificity → possible weak toxicity
P450 IIB and other specificity → low level of toxicity

Scheme 6 COMPACT flowchart.
degree of molecular planarity is related, to some extent, to the carcinogenicity of PAHs. Presumably, compounds that are composed of rigid planar molecules will, in general, be able to bind to the Ah receptor and thus induce cytochrome P450 I. They will, moreover, possess the structural requirements to fit the cytochrome P450 I active site. If their electronic energy levels are optimum, they can become activated to yield derivatives that will readily intercalate with DNA base pairs, bringing about genetic injury. To differentiate between substrates of cytochromes P450 I and those of other P450 proteins, a molecular shape parameter expressing the degree of planarity was constructed from two elements, area and depth. It was found that the area/depth parameter was sufficiently discriminatory to distinguish cytochrome P450 I-specific chemicals from those that do not show a preference for this isozyme. When further compounds were included in the data set, however, it was found that added weight had to be given to the depth parameter, that is, area/depth², to fully separate substrates of cytochromes P450 I from others. Because of the structural diversity of cytochrome P450 substrates, in some cases, even within those specific for one subfamily, it was necessary to
Figure 6 Space-filled molecular structures of cytochrome P450 I substrates and inducers (dibenz[a,h]anthracene, 9-hydroxyellipticine, and 7-ethoxyresorufin) compared with chemicals that act as substrates for cytochromes P450 II (phenobarbital, metyrapone, and hexobarbital).
include other parameters to describe the spatial characteristics of the substrate binding sites of different P450 proteins. Consequently, a second shape factor was used in conjunction with the molecular shape parameter (area/depth²) such that a two-dimensional plot revealed clusters corresponding to regions of cytochrome P450 substrate specificity. The ratio of molecular length to width used in combination with area/depth² gave a reasonably good differentiation between planar PAHs and planar monocyclic aromatics such as benzene and aniline. The latter are substrates of cytochrome P450 IIE. This form of cluster analysis showed a region of parametric space that matched the spatial requirements of cytochrome P450 III substrates. The use of shape parameters for structure-activity relationships has been highlighted by the work of Verloop and Tipker,121,122 where good correlations with potency can be achieved in certain series of compounds. Moreover, Arcos
and co-workers123,124 have shown previously that the encumbrance area bears some relationship to carcinogenicity for PAHs, whereas Kaliszan and co-workers125 made use of an elongation factor (similar to length/width) to describe activity differences in a series of PAHs inhibiting dimethylnitrosamine demethylase. It is therefore not surprising that molecular dimensions of substrates are able to characterize the requirements for occupancy of enzyme active sites, which, in the case of carcinogenicity, are the cytochromes P450 known to be responsible for the metabolic activation of many known carcinogenic chemicals. Interest in the energetics of cytochrome P450 catalysis has led to investigations of the electronic structures of chemicals of known isoenzyme preferences. One of the best ways of estimating the potential for metabolic activation is from consideration of the electronic activation energy ΔE. This electronic parameter, when obtained from a molecular orbital (MO) calculation, is expressed as the difference between the frontier orbital energies, ΔE = E(LEMO) − E(HOMO), where LEMO and HOMO denote the lowest empty and highest occupied molecular orbitals, respectively. When combined with the molecular shape parameter, area/depth², the activation energy ΔE produces an excellent differentiation between substrates and inducers of cytochrome P450 I with respect to those of other cytochromes P450 (Figure 7). Moreover, chemicals that are specific for other P450 proteins, such as cytochromes P450 IIE, P450 IIB, and P450 IV, appear as well-defined clusters on a two-dimensional COMPACT plot.126-128 Substances with molecular parametric data appearing at the interface between the regions occupied by cytochrome P450 I and cytochrome P450 IIB substrates were found to be metabolized by both enzymes (Figure 8).
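The descriptors introduced so far can be written down in a few lines. The following Python fragment is a minimal sketch, not the COMPACT program itself: the class name, field names, and all numerical values are illustrative assumptions rather than training-set data. Area is taken as the product of length and width in the main molecular plane, following the definitions given with Table 6.

```python
from dataclasses import dataclass


@dataclass
class CompactDescriptors:
    """Hypothetical container for the quantities used in a COMPACT-style plot."""
    length: float  # longest dimension in the main molecular plane (angstrom)
    width: float   # in-plane dimension perpendicular to length (angstrom)
    depth: float   # extent perpendicular to the main molecular plane (angstrom)
    e_homo: float  # energy of the highest occupied MO (eV)
    e_lemo: float  # energy of the lowest empty MO (eV)

    @property
    def area(self) -> float:
        return self.length * self.width

    @property
    def area_depth2(self) -> float:
        # Shape parameter with added weight on depth: area / depth^2
        return self.area / self.depth ** 2

    @property
    def length_width(self) -> float:
        # Second shape factor: ratio of molecular length to width
        return self.length / self.width

    @property
    def delta_e(self) -> float:
        # Electronic activation energy: E(LEMO) - E(HOMO)
        return self.e_lemo - self.e_homo


# Made-up numbers, chosen only to contrast a flat molecule with a globular one.
planar = CompactDescriptors(length=12.0, width=8.0, depth=3.4, e_homo=-8.2, e_lemo=-1.0)
globular = CompactDescriptors(length=9.0, width=7.0, depth=6.0, e_homo=-9.8, e_lemo=0.4)
```

With these invented inputs the flat molecule has the larger area/depth² and the smaller ΔE, which is the direction the text associates with cytochrome P450 I specificity; where any real compound falls depends on its measured dimensions and calculated orbital energies.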
COMPACT analysis using area/depth² and ΔE allowed identification and, hence, prediction of potential carcinogens, mutagens, and other chemicals giving rise to different forms of toxicity mediated by various cytochromes P450. This is because an indication of cytochrome P450 IIE specificity is suggestive of toxicity via oxygen radical production, whereas chemicals exhibiting a preference for cytochrome P450 IV are able to produce hydrogen peroxide and tend to be peroxisome proliferators, many of which are known to be rodent carcinogens. The ΔE parameter has been employed by Wald and Feuer129 in a QSAR investigation of a series of coumarins in which it was found that aryl hydrocarbon hydroxylase activity correlated with activation energy. Furthermore, Mohammad and Hopfinger reported correlations between ΔE and mutagenicity in a series of PAHs.130-132 It would appear also that ΔE can be applied to broad structural classes of chemicals in structure-toxicity relationships. Although carcinogenicity is one of the most difficult forms of toxicity to quantify, Benigni and co-workers133,134 have employed an experimentally determined physicochemical parameter, k_e,135,136 which is the rate constant for free electron acceptance, as a determinant of this activity. It is therefore encouraging
Figure 7 Two-dimensional scatter graph of area/depth² plotted against ΔE (eV) for a variety of cytochrome P450 substrates. The curve marks the distinction between P450 I substrates and P450 II substrates, where the latter lie below the line.
to find that ΔE correlates with log k_e for 26 structurally diverse chemicals according to the following equation91:

log k_e = −(0.34 ± 0.05) ΔE + (0.44 ± 0.67)
n = 26, s = 0.73, r = 0.82, F = 49.9

Chemicals with low values of ΔE thus possess a relatively strong propensity for activation and, consequently, exhibit a high affinity for solvated electrons, which, according to Benigni et al., is a prerequisite for carcinogenicity. In fact, frontier orbital energies appear quite frequently in correlations with carcinogenicity, mutagenicity, and other forms of biological activity related to cytochrome P450-mediated reactions. For example, the rates of metabolism of haloalkanes exhibit a parallelism with their ionization energies (analogous with energies of the highest occupied MOs). The isopotentials are also related to the redox potentials for one-electron transfer, a process occurring during the
Figure 8 COMPACT plot showing a number of different chemicals together with the regions of P450 specificity.
cytochrome P450 catalytic cycle. It is well established that substrate binding causes a lowering of the cytochrome P450 redox potential, which facilitates electron transfer from the reductase. The lowering in potential and an accompanying modulation of the heme iron spin state equilibrium trigger the entire cytochrome P450 oxidative pathway.137 There is, moreover, a correlation between spin equilibria and substrate turnover rates in certain series of cytochrome P450 substrates. Furthermore, frontier orbitals appear in QSAR expressions (Table 6) for the mutagenicity of benzanthracenes, methyl chrysenes, and aromatic amines; the Ah receptor binding of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) analogs; the carcinogenicity of nitrosamines; aryl hydrocarbon hydroxylase activity of coumarins; ethoxyresorufin O-deethylase activity; ethylmorphine N-demethylase activity of steroids; cytochrome P450 spin equilibria of hydrocarbons; and the toxicity of benoxaprofen analogs.138 In cases in which ΔE is involved in correlations with activity, it is possible that the biochemical interactions are frontier-orbital controlled, as this term forms part of the expression for the interaction energy between two molecular species, as formulated by Klopman and Hudson (see Reference 27 for details). Clearly, frontier orbital energies play an important role in cytochrome P450-mediated reactions and also in rationalizing potency differences within series of carcinogens and other toxic chemicals. A COMPACT-2D analysis using a scatter plot of area/depth² against ΔE appears to be quite satisfactory at discriminating carcinogens from noncar-
9. Steroids (A = ethylmorphine N-demethylase activity) 10. Benzimidazoles (I,, = AHH inhibition)
5. Nitrosamines (Reference 128) (LD,, = carcinogenicity) 6. Nitnles (Ki = ethanol inhibition constant) 7. Coumarins (I5, = AHHb inhibition) 8. Aromatic amines
1. Benzanthracenes (M = mutagenicity) 2. Chrysenes (M = mutagenicity) 3. TCDDs (EC,, = Ah receptor binding) 4. PCBsb (Reference 139)
Group of Compounds
+ 7.56
+ 12.26
=
5.64E (HOMO) + 48.05
pl,,
=
0.07 area/depth
+ 0 . 0 1 ~+ ~2.33
log AMES = -3.83Q6 - 0.08 arealdepth + 0.65E(HOMO) + 10.95 log EROD = -8.02Q3H - 0.17 area/depth2 + O.SlE(HOM0) + 9.29 log A = 0.20 areaIdepth2 - 0.47AE + 8.25
PI,,
Receptor binding = 0.40 area/depth2 + 3.19 EROD activity = 0.49 area/depth2 + 3.87 -log LD50 = 4.79AE - 67.47 -log LD,, = -54.83Q,H + 3.16 -log Ki = -1.25 a/AE + 0.95
pEC5, = -3.94E (HOMO) - 24.87
log M = -49.16QI,H
log M = -13.38E(LEMO)
Relationship
Table 6 Summary of Quantitative Structure-Activity Relationships in Toxicity and P450 Activity"
8
0.127
0.196
0.220
14
14
0.301
0.482
0.650 0.777 0.334 0.602 0.282
0.550
0.210
0.272
5
14
23
15 10 6 31 13
8
6
14
n
0.98
0.87
0.87
0.81
0.82
0.82 0.83 0.95 0.61 0.76
0.87
0.89
0.82
1
53.03
17.85
10.69
6.14
41.55
25.89 17.98 35.68 16.94 15.26
19.26
14.51
24.40
F
15. Imidazoles (MIC = minimum inhibitory concentration) 16. Alcohols (Reference 154) (IC,, = inhibition of aniline hydroxylation 17. Alkyl benzenes (K, = spin-state equilibrium constant) 18. Benoxaprofen analogues (Reference 138) (LD = lethal dose) 19. Phenyl aziridines (LD,, = lethal dose) 20. Nitrosoureas (LD,, = lethal dose) 21. Aniline mustards (k = alkylation rate constant)
14. Phthalate esters (Reference 151)
11. Resorufins (A = ERODb inhibition) 12. Methylene dioxy benzenes (I,, = 50% EROD inhibition) 13. Clofibrate analoguesb
+ 5.17
+ 6.42
log k = 16.46Q,
+ 2.07
23
8 15 28
+ 0.15
log l/LD,, = -254.97 QsH + 7.61 log l/LD,, = -177.74Q8H + 21.06Q3H log l/LD,, = -0.01HE - 0.32SE - 0.45
10
20
6
9
15
16
+ 1.61
+
8 8 15
8
-log LD = -0.16SN5 - 1.74 E(HOM0) - 17.22
K , = 1.74SE- 50.44AE-'
PIC,, = -13.3QcL
pl,, = 1.22 areaidepthz - 5.68 pl,, = 0.23 area/depth - 5.39 log pCoA activity = O.45SE - 2.54 log catalase activity = 0.395S, - 1.99 -Log pCoA activity = -616.?Q, - 14.51Q,, 0.12S, + 256.30 -log MIC = 9.12&, + 0.40
log A = -0.08SN
0.344
0.260 0.224 0.227
0.124
0.645
0.337
0.058
0.208 0.268 0.643 0.483 0.137
0.094
0.78
0.95 0.93 0.84
0.83
0.93
0.92
32.78
43.81 41.14 29.29
14.46
23.90
93.03
73.69
71.74 40.92 17.88 24.26 24.87
79.97
(continued)
0.97
0.96 0.93 0.76 0.81 0.97
0.96
-log IC,, = 0 . 0 3 -~ 4.32 ~ ~ ~ ~ K , = 196.3QNH - 268.8
log A = 1.76 log S, - 1.64
Relationship
20 6
7
n
0.291 1.670
0.277
5
0.75 0.98
0.92
T
23.06 83.63
25.87
F
aE(LEMO) = energy of the lowest empty molecular orbital (LEMO)
E(HOMO) = energy of the highest occupied molecular orbital (HOMO)
ΔE = E(LEMO) − E(HOMO)
α_MOL = molecular polarizability
μ = dipole moment
Q_N = atomic charge on atom N
Q_N(H,L) = electron density in the HOMO or LEMO, respectively, of atom N
S_N = total nucleophilic superdelocalizability
S_E = total electrophilic superdelocalizability
ΔH_f = calculated heat of formation
area = product of molecular length and width in the main molecular plane
depth = molecular depth measured relative to the main molecular plane
n = number of compounds in the series
s = standard error in the y estimate
r = correlation coefficient
F = variance ratio
bTCDD, 2,3,7,8-tetrachlorodibenzo-p-dioxin; PCB, polychlorinated biphenyl; AHH, arylhydrocarbon hydroxylase; pCoA, palmitoyl-coenzyme A; EROD, ethoxyresorufin O-deethylase induction.
23. Phenytoins (Reference 149) 24. Aliphatic amines (K, = binding constant)
dirnethylnitrosarnine demethylase)
22. Aromatic hydrocarbons (Reference 150) ( A = inhibition of
Group of Compounds
Table 6 Summary of Quantitative Structure-Activity Relationships in Toxicity and P450 Activity” (continued)
cinogens assuming that most carcinogenic chemicals are likely to be activated by cytochrome P450 I. In fact, a validation study of 100 compounds from the National Toxicology Program (NTP) data compiled by Ashby and Tennant indicates that COMPACT is about 80% predictive of rodent carcinogenicity (two species, two sexes). When combined with the Ames mutagenicity test, the correlation increases to 95%. It is thus likely that COMPACT is identifying the non-direct-acting carcinogens that the Ames test fails to pick up, whereas the Ames test complements COMPACT in the cases of those chemicals that exert their toxicity by direct interaction with DNA. To date, more than 1000 chemicals of diverse structure have been analyzed by COMPACT using a database of over 100 known cytochrome P450 substrates and inducers, which make up the training set of compounds. COMPACT parameters have been found to correlate well with various forms of toxicity and cytochrome P450 activity in many series of chemicals,91 including TCDDs, polychlorinated biphenyls (PCBs),139 polyaromatic hydrocarbons, aromatic amines, nitrosamines, nitriles, steroids, benzimidazoles, methylene dioxybenzenes, and alkylbenzenes (Table 6). Moreover, COMPACT evaluations have been carried out on many individual industrial chemicals, including pharmaceuticals,140-142 antifungal agents, petrochemicals, and food additives, and on large numbers of natural food flavors. In some cases, the predictions of COMPACT have been verified by enzyme induction and other unpublished studies on the chemicals in question. COMPACT is still being developed to enhance and extend its predictive power, as our knowledge of toxic mechanisms increases, by identifying other structural criteria for substrate specificity toward all the major cytochromes P450 involved in xenobiotic metabolism. This is achieved by analyzing the structural similarities possessed by substrates of the same enzyme.
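The way two complementary tests can raise overall predictivity is easy to illustrate. The sketch below assumes a simple "positive if either test is positive" rule, which is one plausible reading of how COMPACT and the Ames test are combined here and is not stated explicitly in the text; the function names and the six data points are invented for illustration and do not come from the NTP study.

```python
def combined_call(compact_positive: bool, ames_positive: bool) -> bool:
    """Assumed battery rule: flag a chemical if either test flags it."""
    return compact_positive or ames_positive


def concordance(predictions, observed) -> float:
    """Fraction of chemicals for which the prediction matches the bioassay result."""
    if len(predictions) != len(observed):
        raise ValueError("predictions and observed must have the same length")
    hits = sum(p == o for p, o in zip(predictions, observed))
    return hits / len(observed)


# Invented example data (True = positive), not taken from any validation study.
compact = [True, True, False, False, True, False]
ames = [False, True, True, False, False, False]
rodent = [True, True, True, False, True, True]

battery = [combined_call(c, a) for c, a in zip(compact, ames)]
```

In this toy set COMPACT alone matches four of the six bioassay results and the battery matches five, because the third chemical is caught only by the Ames test. Real concordance figures depend entirely on the chemicals chosen.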
A recent development has been to make use of an additional structural parameter, the molecular diameter, to discriminate further between cytochrome P450 substrates by three-dimensional cluster analysis, known as COMPACT-3D (Figure 9). Substrates and inducers of cytochrome P450 IIE tend to be relatively small molecules compared with chemicals specific for other cytochromes P450. Consequently, they may be readily differentiated by employing molecular diameter as a measure of molecular size, even though the actual shape may vary. It is thought that a three-dimensional plot using the COMPACT parameters of area/depth², ΔE, and diameter will successfully categorize substrates and inducers of cytochromes P450 I, P450 IIB, P450 IIE, and P450 III, as the latter generally accept substrates of large molecular size but of varied shape. Inducers of cytochromes P450 IV that are involved in peroxisome proliferation, which brings about carcinogenicity in rodents, are quite difficult to identify using COMPACT. However, because these chemicals usually possess an acidic function, or an isostere, combined with at least one aromatic ring in a specific orientation relative to the acid moiety, it is possible to predict potential peroxisome proliferators by structural analogy with known inducers such as
Figure 9 COMPACT-3D scatter plot of molecular structural data for substrates of three P450 families. Chemicals specific for P450 IIE are characterized by relatively small molecular diameters.
clofibrate, a hypolipidemic agent. The degree of induction of peroxisomal proliferating enzymes can be estimated from thermodynamic binding energies obtained from molecular mechanics calculations on the substrates binding to the peroxisome proliferator receptor, a member of the steroid receptor superfamily. As more receptors are isolated and their amino acid sequences determined, it will be possible to extend this technique to investigate structural rationalizations for receptor selectivity and potency of inducing agents.
MOLECULAR ORBITAL CALCULATIONS AND QSARs IN TOXICITY
The Pullmans were the first to use electronic structure calculations in several studies on the carcinogenicity of polycyclic aromatic hydrocarbons and hetero analogs in the late 1940s and early 1950s. Much of the early work on
the K-L theory of carcinogenesis of PAHs was summarized by Bernard and Alberte Pullman in 1955.143 It is interesting to note that, although a simple LCAO procedure based on the Hückel method was used, it was found that electronic excitation energies roughly correlated with the carcinogenicity of methyl benzanthracenes, whereas the total charge in the K region showed a parallelism with the potency of carcinogenic PAHs and acridines. Since then, there have been many investigations using MO calculations on series of compounds possessing carcinogenic activity or other forms of toxicity,144-148 some where the involvement of the cytochrome P450 superfamily is implicated or suspected.149-155 Some of these studies have been reviewed previously27,91 and documented in earlier literature surveys. A number of these investigations have formed the basis of QSAR studies, whereas others have been investigations on individual compounds, either contrasting or comparing different varieties and levels of activity. A summary of MO studies of relevance to toxicity156-185 is presented in Table 7, where it can be seen that both transition-state energies and frontier orbital energies frequently form the basis of correlations with carcinogenicity; it is probable that the former is a result of the latter electronic property, so the appearance of both is perhaps not surprising. The majority of these investigations tend to support the use of frontier orbital energies and their differences by the COMPACT procedure; however, in some cases, when dealing with particular series of compounds, specific parameters relating to the electronic structure in certain regions of the molecules, for example, electron densities on individual atoms, exhibit parallelisms with carcinogenic potency.
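For readers unfamiliar with the Hückel method, the following sketch shows how a frontier orbital gap arises in the simplest possible case. It is restricted, as an assumption made for brevity, to unbranched linear polyenes, for which the Hückel energy levels have the closed form x_k = 2 cos(kπ/(n+1)) in units of the resonance integral β (with the Coulomb integral α set to zero). Polycyclic aromatic hydrocarbons require diagonalizing the full connectivity matrix and are not covered by this fragment; the function names are hypothetical.

```python
import math


def huckel_levels(n_carbons: int) -> list:
    """Huckel pi levels of a linear polyene, in units of beta (alpha = 0).

    Returned from most bonding to most antibonding; because beta is negative,
    a larger coefficient means a lower energy.
    """
    return [2.0 * math.cos(k * math.pi / (n_carbons + 1))
            for k in range(1, n_carbons + 1)]


def homo_lemo_gap(n_carbons: int) -> float:
    """HOMO-LEMO gap in units of |beta|, one pi electron per carbon, even n only."""
    if n_carbons < 2 or n_carbons % 2:
        raise ValueError("this sketch handles even-membered chains only")
    levels = huckel_levels(n_carbons)
    homo = levels[n_carbons // 2 - 1]  # highest doubly occupied level
    lemo = levels[n_carbons // 2]      # lowest empty level
    return homo - lemo


gaps = {n: homo_lemo_gap(n) for n in (2, 4, 6)}  # ethylene, butadiene, hexatriene
```

The gap shrinks as conjugation is extended, which is the qualitative trend behind correlating excitation energies, and later ΔE, with the ease of activation of larger conjugated systems.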
Electrostatic potential energies186-188 are becoming more widely used in structure-activity studies, especially in toxicity correlations involving the reactivity of epoxides, which are the intermediates formed when oxygen is inserted during cytochrome P450 oxygenations of different classes of chemicals. Electrostatic isopotential (EIP) minima186 often identify sites and ease of metabolism by epoxide hydrase, an enzyme responsible for the conversion of epoxides to diols by the addition of water. Molecular electrostatic potential energy calculations also are probably the best means of identifying positions of epoxidation and, possibly, metabolism in general. In the case of aflatoxin B1, for example, EIP maxima and minima calculated by the CNDO/2 method all lie close to the known sites of metabolism and, in particular, the formation of the carcinogenic 2,3-epoxide is readily predicted.189 An electronic parameter that often correlates with metabolic rates is the electrophilic (or nucleophilic) superdelocalizability. This quantity is a reactivity index formulated by Fukui and colleagues as an orbital-weighted electron density.145 The total electrophilic superdelocalizability, ΣS_E, summed over all atoms in a molecule, exhibits a parallelism with the hydrophobic parameter, log P, in several series of compounds such as PAHs and aliphatic amines, where it is probably approximating molecular volume. As mentioned earlier, the logarithm of the octanol/water partition coefficient is an extremely important quantity in toxicological evaluations. Hansch
Table 7 Molecular Orbital and Molecular Modeling Studies of Toxic Chemicals
Compound
P450 model; Benzo[a]pyrene; Arene oxides; PAHsa; Aromatic amines; Aromatic amines; Aromatic hydrocarbon; Secondary amines; Alcohols; Diol epoxides; Chloroethenes; P450 models; PAHs; Nitrosamines; Epoxides; Amines; Aromatics; PAHs; PCBs and TCDDs; TCDD; PCBs and TCDDs; TCDDs; TCDDs; PCBs; PCBs; Nitroarenes; PCDFs; Epoxides; PCBs; Aromatic acids; Benzoquinolines
Authors
Loew and Kirchner; Shipman; Marsh and Jerina; Pullman; Loew et al.; Loew et al.; Yukawa et al.; Le Page et al.; Testa; Adams and Kaminsky; Loew et al.; Pudzianowski and Loew; Mohammad et al.; Loew et al.; Politzer and Laurence; Goldblum and Loew; Korzekwa et al.; Mohammad; McKinney et al.; McKinney et al.; McKinney et al.; McKinney et al.; Pedersen et al.; McKinney and Pedersen; Rickenbacher et al.; Maynard et al.; Long et al.; Politzer and Murray; Lewis et al.; Mehler and Gerhards; Belik et al.
Year
1975; 1976; 1978; 1979; 1979; 1979; 1980; 1980; 1981; 1982; 1983; 1983; 1983; 1983; 1984; 1985; 1985; 1985; 1985; 1985; 1985; 1985; 1986; 1986; 1986; 1986; 1987; 1987; 1987; 1987; 1990
Parameter
π-bond orders; Transition state energies; (review); Frontier electron density; Frontier electron density; Interaction energies; Frontier orbital energies; Frontier orbital energies; Heats of formation; Heats of formation; -; Frontier orbital energies; Frontier orbital energies; EIPs; Transition state energies; Transition state energies; Frontier orbital energies; α, polarizability; Molecular modeling; α, polarizability; Molecular modeling; Molecular modeling; Molecular modeling; Molecular modeling; Frontier orbital energies; α, polarizability; EIPs; Molecular modeling; EIPs; Molecular modeling
Reference
156; 178; 223; 148; 159; 160; 175; 155; 35; 172; 162; 177; 130; 163; 186; 169; 168; 131,132; 200; 193; 199; 194; 203; 201; 202; 214; 165; 187; 26; 40; 44
aPAH, polycyclic aromatic hydrocarbon; PCB, polychlorinated biphenyl; TCDD, 2,3,7,8-tetrachlorodibenzo-p-dioxin; PCDF, polychlorinated dibenzofuran; EIP, electrostatic isopotential.
and co-authors have reported relationships with toxicity where either log P itself or a quadratic equation in log P is the sole structural descriptor, in no less than 166 different QSARs for many classes of compounds including alcohols, amines, hydrocarbons, ethers, aldehydes, ketones, amides, and alkylureas.190 Although it is relatively straightforward to calculate log P values from molecular fragments, it is also of interest to investigate the relative contributions to this parameter from polarizability, polarity, and basicity. These three quantities can be calculated by MO procedures. Polarizability and dipole moment can be calculated directly, whereas energy of the highest occupied MO can be an approximate measure of basicity. A linear combination of these parameters correlates with log P for compounds of diverse chemical structure, the basicity
term being less important than the other two factors. As the expression for polarizability bears a resemblance to that of superdelocalizability, it is understandable that total superdelocalizabilities can correlate with log P. In this section, we have been considering the toxicity of organic molecules because large numbers of new organic chemicals are produced yearly and the majority of compounds destined for human exposure are organic in nature. It should not, however, be forgotten that many metals and other inorganic materials are known to be toxic and, therefore, one should consider ways of estimating their potential toxicity. The means by which metals and their compounds exert toxic effects vary, and different theories have been proposed to account for their toxicity. As far as acute toxicity is concerned, it is possible that the production of free radicals is involved in metal toxicity, and one way of estimating the ability of metals or their salts to form toxic radicals would be to look at their relative electron donor/acceptor characteristics. A measure of this ability to donate and accept electrons is expressed by the redox potential of a given metal/metal ion system, E°. In fact, one finds that for 26 different metal systems the acute oral toxicity in rodents correlates with redox potential according to the expression

−log LD50 = (0.42 ± 0.05) E° + (3.69 ± 0.08)
n = 26, s = 0.32, r = 0.86, F = 64.9

where r is the correlation coefficient and F is the variance ratio. Furthermore, it is worth adding that redox potential seems to be strongly dependent (r = 0.92) on the free energy of hydration of the metal ions, which, in turn, correlates with the energy of the lowest empty molecular orbital. It may, therefore, be possible to quantify and estimate metal toxicity in terms of electronic structural parameters.
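As a worked illustration of how such a one-descriptor regression is used and checked, the sketch below evaluates the metal-toxicity expression and recomputes the variance ratio from the quoted r and n using the standard relation F = (r²/k) / ((1 − r²)/(n − k − 1)) for k descriptors. The function names are hypothetical, the redox potential is assumed to be in volts, and the units of LD50 are not given in the text, so the predicted value should be read only on the scale of the original data.

```python
def predicted_neg_log_ld50(e0: float, slope: float = 0.42, intercept: float = 3.69) -> float:
    """Evaluate -log LD50 = slope * E0 + intercept with the quoted coefficients.

    e0 is the redox potential of the metal/metal ion system (assumed volts).
    """
    return slope * e0 + intercept


def f_from_r(r: float, n: int, k: int = 1) -> float:
    """Variance ratio implied by correlation coefficient r, n compounds, k descriptors."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


implied_f = f_from_r(0.86, 26)  # uses the rounded r reported in the text
```

The implied value, about 68, is close to the reported F = 64.9; the small difference is what one would expect from r having been rounded to two decimal places. The same check can be applied to the other single-descriptor relationships quoted in this chapter.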
CRITICAL ASSESSMENT OF PREDICTIVE METHODS
The computer-based tests outlined previously are, in most cases, still in the process of development and evaluation, so a full assessment of their capabilities is difficult at this stage. The three methods developed in the early 1980s, TOPKAT, ADAPT, and CASE, have been applied to many varieties of compounds in different areas of toxicity, mainly carcinogenicity and mutagenicity, and are reported to have a predictive power of between 75% and 95% depending on the training sets used to generate structure-toxicity models.191 Therefore, for the series of chemicals where studies have been carried out, the predictive value of these methods appears to be quite high even though many of
these evaluations were conducted on structurally similar compounds. In favor of TOPKAT, however, it should be stated that reported validations have been on specified toxic endpoints, such as irritancy, mutagenicity, and carcinogenicity, for structurally diverse chemicals. Comparison between the computer-based methods is difficult because none of them has, as yet, reported evaluations on the same group of chemicals. The more recently developed methods, such as COMPACT, HAZARDEXPERT, and DEREK, have not undertaken a sufficiently extensive validation exercise to enable full comparisons to be made, though Table 8 demonstrates the agreement between COMPACT and DEREK for 40 chemicals currently being tested by the NTP. The results in Table 8 show that DEREK is able to identify the genotoxicity potential of diverse chemicals with a high degree of accuracy. COMPACT, however, is capable of identifying carcinogens that are not picked up by bacterial mutagenicity testing. Naphthalene, for example, is shown to be positive by COMPACT but is negative in the other tests; however, an early disclosure of the results of the NTP study on these chemicals mentions that naphthalene has been found to be a lung carcinogen in the female mouse.192 Apparently, COMPACT was the only predictive test that was able to identify this compound as a potential carcinogen. A possible reason for the failure of other computer methods in this instance could be that they are usually set up to show the toxic effects of substituents rather than those of the parent hydrocarbons. It is interesting to note that the Ames test for bacterial mutagenicity was highly predictive of rodent carcinogenicity when it had only been demonstrated on a relatively small number of chemicals. As time progressed, however, its predictive power diminished such that it is now only about 55% accurate at predicting carcinogenicity in animals.
Nevertheless, the Ames test is still used extensively for safety evaluation and is also a standard regulatory requirement. If any new test procedure currently being developed were subsequently found to be as poorly predictive as 55%, it would be immediately discounted. For example, TOPKAT is claimed to be 95% predictive of most toxic endpoints, and although very high, this claim can probably be substantiated in many cases, depending on the size and structural diversity of the training set used to generate the model. In contrast, COMPACT is between 75% and 85% predictive toward the rodent two-species carcinogenicity assay, though this evaluation was only made for 200 compounds. This lower performance is undoubtedly due to the fact that COMPACT has been designed to identify not direct-acting carcinogens, but only those chemicals giving rise to reactive intermediates via metabolic activation. Therefore, a combination of COMPACT and the Ames test improves the prediction to about 95% because the latter picks up the false negatives from COMPACT that directly interact with DNA. A comparison between six computer-based methods currently in use for the evaluation of chemical safety is shown in Table 9, which summarizes their relative strengths and weaknesses. The majority of these methods are analytical
Table 8 Comparison between Toxicity Predictions by COMPACT, DEREK, and the Ames Test (Genotoxicity Potential) for 40 Diverse Chemicals
No. Compound COMPACTa DEREKb Genotoxicityc
1. Amphetamine
2. Naphthalene
3. Promethazine
4. Resorcinol
5. γ-Butyrolactone
6. Chloracetic acid
7. p-Nitrophenol
8. Tricresyl phosphate
9. o-Benzyl p-chlorophenol
10. 2,2-Bis(bromoethyl)-1,3-propanediol
11. t-Butyl alcohol
12. 3,4-Dihydrocoumarin
13. Ethylene glycol
14. Methylphenidate
15. Theophylline
16. 4,4'-Thiobis(6-t-butyl m-cresol)
17. Triamterene
18. Diphenylhydantoin
19. Pentachloroanisole
20. Chloramine
21. 4,4'-Diamino-2,2'-stilbenedisulfonic acid
22. Methyl bromide
23. p-Nitrobenzoic acid
24. Hydrazoic acid
25. Tris(2-chloroethyl)phosphate
26. CI Direct Blue 218
27. CI Pigment Red 3
28. CI Pigment Red 23
29. 2,4-Diaminophenol
30. 4-Hydroxyacetanilide
31. Salicylazosulfapyridine
32. CI Acid Red 114
33. CI Direct Blue 15
34. Coumarin
35. 2,3-Dibromo-1-propanol
36. 3,3'-Dimethylbenzidine
37. HC Yellow 4
38. p-Nitroaniline
39. o-Nitroanisole
40. 1,2,3-Trichloropropane
aIndication of P450 I specificity only. Some chemicals with a negative indication could be carcinogenic via a P450 IIE pathway, such as compounds 6, 22, and 40. Compound 24 is probably direct acting and, therefore, not activated via a cytochrome P450.
bOnly indication of carcinogenicity is labeled as positive, although DEREK also gives an indication of other forms of toxicity.
cAn indication of genotoxic potential is provided by the results of bacterial mutagenicity tests.
Table 9 Summary of Computer-Based Methods of Toxicity Prediction
Method
Features
Advantages
Disadvantages
ADAPT
Pattern recognition; Cluster analysis
CASE
Pattern recognition; Topological analysis
Limited to congeneric series of chemicals; Limited to congeneric series of chemicals
COMPACT
Based on known biochemistry and cluster analysis
Uses QSAR descriptors of molecular structure; Can discriminate between positional isomers; Automated; Most molecular modeling software systems can be used; Easy to carry out; Structurally diverse
DEREK
Searches for toxic fragments based on predefined rules
Can be modified and extended to include new rules; Structurally diverse chemicals
HAZARDEXPERT
Searches for toxic segments; Species, dose, and exposure levels can be defined by user; Based on known QSAR methods; Used by the EPA for risk assessment
Runs on a PC; Quantitative; Calculates log P and pKa data; Predicts metabolites; Runs on a PC; Quantitative; Several toxicity models available
TOPKAT
Limited to cytochrome P450-mediated toxicity; Qualitative; Requires interpretation; Compounds are limited to 64 atoms; Runs only on VAX machines under VMS; Qualitative; Does not distinguish between positional isomers; Additive in nature; Does not distinguish between positional isomers; Additive in nature; Expensive and no academic discount
in the respect that they computationally break up a given structure into its components, and then the toxic effect of each component, substituent, or fragment is assessed either qualitatively or quantitatively. COMPACT is the only one of these methods that is not analytical as it views the toxic potential of the whole structure. As there is often synergy between different groups in a molecule, it is an oversimplification to view their contributions to toxicity as additive in nature. The true situation is that each functional moiety modifies the effects of the others, giving rise to an overall toxicity. Therefore, the molecular
and electronic structure should be considered as a whole, as the total effect is not simply a linear sum of group contributions. Although not strictly quantitative, COMPACT can be used to estimate the likely toxic risk from the position occupied by a given compound on a COMPACT plot, as there is a rough correspondence with carcinogenicity. In some cases, there is a quantitative relationship between the magnitude of COMPACT parameters and different forms of toxicity, as can be seen from an inspection of Table 6.
TOPKAT is used by the Environmental Protection Agency (EPA) for predicting chemical toxicity, and this has led to its use by many industries in evaluating the toxic risk of their own chemicals under development. Clearly, it is commercially advantageous to judge the viability of one's own compounds by the same criteria employed by a governmental monitoring authority. Hazardexpert makes use of information from EPA structural alerts to define toxic segments, which are identified by searching fragments of the input structure and matching them against a knowledge base of known toxic fragments. DEREK bears some similarity to Hazardexpert in that it also searches for potentially toxic fragments and assigns a particular type of toxicity to each one. Although DEREK includes Food and Drug Administration structural alerts in its rule base, it provides only a qualitative estimate of toxicity; however, DEREK can be seen as a system designed to cut down the time and possible fallibility of human intervention in toxicity screening by structural analogy with known toxic agents. It can also be improved by including additional rules, whereas Hazardexpert has a user entry facility into the knowledge base and compound base. Although no publications to date have reported the use of Hazardexpert in toxicity assessment, this software is currently being evaluated in the author's laboratory, where it has been found quite useful as an adjunct to COMPACT.
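The fragment-matching idea behind rule-based systems of this kind can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: a structure is represented simply as a set of named fragments, and the entries in `RULES` are hypothetical placeholders, not the contents of the DEREK or Hazardexpert knowledge bases.

```python
# Toy rule base: fragment name -> type of toxicity flagged.
# The entries are hypothetical examples, not rules taken from any real system.
RULES = {
    "aromatic nitro": "mutagenicity",
    "aromatic amine": "carcinogenicity",
    "epoxide": "carcinogenicity",
}

def find_alerts(fragments, rules=RULES):
    """Return {fragment: toxicity type} for every fragment that matches a rule."""
    return {frag: rules[frag] for frag in sorted(fragments) if frag in rules}

def add_rule(rules, fragment, toxicity):
    """Return a new rule base extended by one rule; the original is untouched."""
    extended = dict(rules)
    extended[fragment] = toxicity
    return extended

# Hypothetical fragment set for a nitro-substituted aromatic amine.
compound = {"aromatic nitro", "aromatic amine", "benzene ring"}
alerts = find_alerts(compound)

# Extending the rule base, as a user-modifiable system would allow.
extended_rules = add_rule(RULES, "alkyl halide", "alkylating agent")
alerts_extended = find_alerts({"alkyl halide", "hydroxyl"}, extended_rules)
```

The output of such a search is purely qualitative: each matched fragment is flagged independently of the others, so interactions between groups in the same molecule are not represented.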
ADAPT and CASE are particularly powerful for correlating toxicity (and other forms of biological activity) in structurally related series of chemicals. Consequently, they are both useful in the prioritization of groups of similar compounds toward a well-defined toxic endpoint. ADAPT makes use of the more traditional QSAR descriptors, whereas CASE employs the somewhat more recent topological approach to structure-activity analysis. Many of these predictive methods are complementary rather than competitive: it would not be wise to rely entirely on any single technique, but a combination of at least two or three methods would be prudent. Such an approach would exploit the strengths and offset the weaknesses of each method and yield a fairly accurate description of the toxic potential of a given substance. Although there are known advantages and disadvantages in each method, their employment in a battery of short-term in vitro tests would circumvent the limitations while making full use of their potential benefits to risk assessment. Even with such an array of promising techniques, however, there is always room for further development of novel systems for toxicity evaluation.
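One simple way to combine the qualitative calls of several methods is a majority vote. The sketch below is an illustration under stated assumptions, not a procedure described in this chapter: the function name `consensus`, the agreement threshold `min_agree`, and the example predictions are all hypothetical.

```python
def consensus(predictions, min_agree=2):
    """Combine qualitative calls (True = positive) from several methods.

    predictions: dict mapping method name -> bool, or None when the method
                 could not be applied to the compound.
    Returns "positive", "negative", or "equivocal".
    """
    calls = [p for p in predictions.values() if p is not None]
    positives = sum(calls)
    negatives = len(calls) - positives
    if positives >= min_agree and positives > negatives:
        return "positive"
    if negatives >= min_agree and negatives > positives:
        return "negative"
    return "equivocal"

# Hypothetical calls for a single compound (not data from this chapter).
example = {"COMPACT": True, "DEREK": True, "TOPKAT": False, "CASE": None}
result = consensus(example)
```

Requiring at least two agreeing methods means that a single method can never decide the outcome on its own, which reflects the advice not to rely on any one technique.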
CONCLUSIONS AND FUTURE PROSPECTS
Many factors that are not intrinsic to a chemical, such as pharmacokinetics and pharmacodynamics, contribute to its overall toxicity. Generally, the accumulation of a chemical in a given tissue is undesirable unless its therapeutic value outweighs possible toxic side effects, as is the case, for example, with antineoplastic agents. Because of the large numbers of new chemicals produced every year, the likely toxic risk in terms of potential hazard to humans has to be determined. Toxicology is therefore a major growth area in science, and as the manifestation of undesirable toxicity in a product can result in considerable financial loss to a company, a great deal of effort is made to ensure that any new chemical coming onto the market is safe. As has been mentioned earlier, all substances are toxic; it is the dose that varies. As in other forms of bioactivity, however, the dose is not linearly related to response and, as the function is exponential in nature, it is usual to employ log dose-response curves to compare potencies; there is an approximately linear portion of such functions that facilitates comparison between substances. Toxicity, like other forms of biological activity, is due to the molecular and electronic structure of the chemical. Toxicity can, however, be modified by the physical state of the compound, and its effect varies depending on the species, sex, tissue, genetic disposition, and environment of the animal concerned. On this basis, it is clear that animals are inadequate models for humans, but they are still required by the regulatory authorities for the evaluation of chemical safety. It is well known that the metabolism of laboratory animals is different from human metabolism, thus making it difficult to extrapolate, say, rodent carcinogenicity to likely human risk.
Humans have a more sophisticated defense system against toxicity arising from the production of oxygen radicals and other reactive intermediates: the DNA repair mechanisms are more efficient and, also, Homo sapiens has a lower oxygen tension than small rodents. Consequently, although many chemicals have been found to be carcinogenic in rodents, there are very few known human carcinogens. Toxic chemicals may be direct acting or metabolically activated to a reactive species, or they may induce the formation of reactive entities such as oxygen radicals. The cytochromes P450 play a crucial role in the activation and detoxication of chemicals because of their ability to perform a very large number of different metabolic reactions such as oxidations, dealkylations, dehydrohalogenations, and, occasionally, reductions. Some cytochromes P450 mediate the formation of carcinogens, mutagens, neoantigens, and other toxic species, whereas the majority of these enzymes generally bring about detoxication by hydroxylation, which facilitates conjugation via the glutathione S-transferases or UDP-glucuronyl transferases, for example. Humans are better equipped to use these pathways of detoxication than other species because of a more developed metabolic system.
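The use of log dose-response curves to compare potencies, mentioned earlier in this section, can be illustrated numerically. The sketch assumes a Hill-type response, which is sigmoidal in log dose; this functional form is chosen here only for illustration, and the function names and ED50 values are hypothetical.

```python
import math

def hill_response(dose, ed50, n=1.0):
    """Fractional response (0..1) from a Hill-type curve; sigmoidal in log10(dose)."""
    return dose**n / (dose**n + ed50**n)

def log_potency_shift(ed50_a, ed50_b):
    """Horizontal shift, in log10 dose units, between two parallel curves."""
    return math.log10(ed50_b) - math.log10(ed50_a)

# Two hypothetical substances; A is assumed ten times more potent than B.
ED50_A, ED50_B = 1.0, 10.0

# Responses of A at its midpoint and one decade of dose either side of it.
doses = [0.1, 1.0, 10.0]
responses_a = [hill_response(d, ED50_A) for d in doses]

shift = log_potency_shift(ED50_A, ED50_B)
```

Around the midpoint the curve is nearly straight when plotted against log dose, so two substances with parallel curves differ mainly by a horizontal shift, which here is one log unit.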
Computers are now used in the design and development of new chemicals, and their employment in toxicity prediction could lead to improved products that present a reduced hazard to humans. Although computers are useful for performing routine calculations, they do not usually possess insight or rationalization. Therefore, they should represent only one of a number of test procedures used to formulate a full safety evaluation of a given chemical. Where they are used, their results should be interpreted by a panel of expert toxicologists capable of providing an overall view of the likely toxic risk in the human environment.
Hydrophobic chemicals tend to be toxic as they are often difficult to metabolize and, consequently, accumulate in the body. Many of these are substrates or inducers of cytochrome P450 I or P450 IIE, for example, TCDD,193-198 benzene, PCBs,199-203 and PAHs.204-206 It is therefore common that hydrophobic parameters, such as log P, correlate with toxicity. Many of the currently available computer-based methods for evaluating chemical toxicity rely on QSAR techniques that have been successfully applied in other areas of biological activity.191,207 There is also a large body of information showing that many structural descriptors can correlate with various forms of toxicity, the most common being the hydrophobic parameter log P. The rationalization behind the use of this parameter has been discussed previously, and there is a clear link between hydrophobicity and cytochrome P450 substrate specificity; however, even in cases where cytochrome P450-specific interactions are not as hydrophobic as those found in cytochrome P450cam, that is, where entropic processes are not necessarily the sole driving force behind the reaction, log P is nevertheless likely to make a major contribution to toxicity. Although the octanol/water partition coefficient is an experimentally measurable quantity, it is also useful to be able to calculate its value for compounds that are not readily soluble in water or when experimental data are not available.
Either this quantity can be calculated from fragmentation constants, such as π and f values, using packages such as CLOGP208 and Pro-log P, or it is possible to estimate log P from a linear combination of its electronic and polarization components. These may be determined from solvent parameters, such as π*, or via quantum mechanical calculations of polarizabilities and dipole moments. This enables one to investigate which components of log P determine the potency differences in series of chemicals. One of the limitations of log P is that it gives no indication of the possible mechanism of action even though there is a rough correlation. In addition to log P, many other structural parameters have been found to relate to toxicity,209-214 in particular, those involving molecular dimensions and features of electronic structure, especially those pertaining to frontier orbitals. Although QSARs provide strong correlations for congeneric series of chemicals, they are not always applicable to diverse structures. There are examples in the literature where different QSAR approaches using different parameters can all produce good correlations with toxicity or indeed other biological activities.215,216 It is possible that different structural descriptors can actually be measuring the same effect, and these parameters will often be interrelated; for example, molecular connectivity, length of a carbon chain, and log P can be shown to cross-correlate in some cases. Parameters that represent the entire molecular structure are promising for wide application in QSAR, and it is likely that molecular electrostatic potential energy values and their associated electric field components will play an important role in the future for the rationalization of biological activity with chemical structure, in addition to other molecular modeling and MO techniques.217-225
Various computer-based systems for toxicity evaluation are currently available, some of which are commercial packages. These frequently use QSAR techniques and/or analyze molecular fragments. In the author's opinion, no single method is preferable, and all have merit. To achieve a high degree of predictive power, it would be advantageous to combine the results of at least four different methods so as to optimize their strengths and minimize their weaknesses. With the additional support of toxicity databases such as Toxline, RTECS, and that produced by the EPA, a scan for structural alerts can be made that will back up the computer predictions. Moreover, the results of noncomputational methods226,227 should be included in an overall evaluation of chemical safety. If and when such procedures accurately simulate the results of animal tests, the days of rodent carcinogenicity screening will be numbered.
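The additive, fragment-based route to log P discussed in this section can be sketched as a sum of substituent constants added to the log P of a parent compound. This is a minimal illustration of the additive principle only, not the algorithm used by CLOGP or Pro-log P; the π values and the parent log P below are approximate figures included as assumptions for the example.

```python
# Approximate aromatic substituent constants (pi); illustrative values only.
PI = {"CH3": 0.56, "Cl": 0.71, "OH": -0.67, "NO2": -0.28}

# Approximate octanol/water log P of the parent (benzene); an assumed value.
LOGP_BENZENE = 2.13

def estimate_logp(substituents, parent_logp=LOGP_BENZENE, pi=PI):
    """Additive estimate: log P of the parent plus the sum of substituent pi values."""
    return parent_logp + sum(pi[s] for s in substituents)

logp_toluene = estimate_logp(["CH3"])
logp_nitrophenol = estimate_logp(["NO2", "OH"])
```

Because the estimate is a plain sum, it returns the same value for every positional isomer of a disubstituted ring, which illustrates the limitation of purely additive schemes noted earlier in the chapter.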
INFORMATION SOURCES FOR SOFTWARE FOR COMPUTER-AIDED PREDICTION OF TOXICITY METHODS

TOPKAT:
Health Designs, Inc.
183 East Main Street
Rochester, NY 14604, U.S.A.

DEREK:
LHASA, UK, Ltd.
School of Chemistry
University of Leeds
Leeds, LS2 9JT, U.K.

CASE:
Gilles Klopman
Department of Chemistry
Case Western Reserve University
Cleveland, OH 44106, U.S.A.

ADAPT:
Peter Jurs
Department of Chemistry
Pennsylvania State University
University Park, PA 16802, U.S.A.

COMPACT:
David Lewis
School of Biological Sciences
University of Surrey
Guildford
Surrey, GU2 5XH, U.K.

HAZARDEXPERT:
Compudrug Ltd.
H-1136 Budapest
Fürst Sándor utca 5, Hungary

ACKNOWLEDGMENTS
The financial support of the Humane Research Trust, the Ministry of Agriculture, Fisheries and Food, and the University of Surrey is gratefully acknowledged.

REFERENCES
1. C. D. Klaassen, M. O. Amdur, and J. Doull, Eds., Casarett and Doull's Toxicology: The Basic Science of Poisons, 3rd ed., Macmillan, New York, 1986.
2. J. Li, J. Carroll, and D. J. Ellar, Nature, 353, 815 (1991). Crystal Structure of Insecticidal δ-Endotoxin from Bacillus thuringiensis at 2.5 Å Resolution.
3. W. B. Pratt and P. Taylor, Principles of Drug Action, 3rd ed., Churchill Livingstone, New York, 1990.
4. M. Balls, J. Bridges, and J. Southee, Animals and Alternatives in Toxicology, Macmillan, London, 1991.
5. A. M. Goldberg and J. M. Frazier, Sci. Am., 261, 16 (1989). Alternatives to Animals in Toxicity Testing.
6. J. McCann and B. N. Ames, Adv. Mod. Toxicol., 5, 87 (1978). The Salmonella/Microsome Mutagenicity Test: Predictive Value for Animal Carcinogenicity.
7. J. K. Haseman and J. E. Huff, Cancer Lett., 37, 125 (1987). Species Correlation in Long Term Carcinogenicity Studies.
8. C. C. Travis, A. W. Saulsbury, and S. A. Richter Pack, Mutagenesis, 5, 213 (1990). Prediction of Cancer Potency Using a Battery of Mutation and Toxicity Data.
9. B. Allen, K. Crump, and A. Shipp, in Carcinogen Risk Assessment: New Directions in the Qualitative and Quantitative Aspects, R. W. Hart and F. D. Hoerger, Eds., Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1988, pp. 197-209. Is It Possible to Predict the Carcinogenic Potency of a Chemical in Humans Using Animal Data?
10. C. C. Travis, S. A. Richter Pack, A. W. Saulsbury, and M. W. Yambert, Mutation Res., 241, 21 (1990). Prediction of Carcinogenic Potency from Toxicological Data.
11. F. K. Ennever, T. J. Noonan, and H. S. Rosenkranz, Mutagenesis, 2, 73 (1987). The Prediction of Animal Bioassays and Short-term Genotoxicity Tests for Carcinogenicity and Noncarcinogenicity to Humans.
12. R. W. Tennant, B. H. Margolin, M. D. Shelby, E. Zeiger, J. K. Haseman, J. Spalding, W. Caspary, M. Resnick, S. Stasiewicz, B. Anderson, and R. Minor, Science, 236, 933 (1987). Prediction of Chemical Carcinogenicity in Rodents from in Vitro Genetic Toxicity Assays.
13. C. M. Burnett and J. F. Corbett, Food Chem. Toxicol., 25, 703 (1987). Failure of Short-Term in Vitro Mutagenicity Tests to Predict the Animal Carcinogenicity of Hair Dyes.
14. J. Ashby, J. A. Styles, and D. Paton, Carcinogenesis, 1, 1 (1980). Studies in Vitro to Discern the Structural Requirements for Carcinogenicity in Analogues of the Carcinogen 4-Dimethylaminoazobenzene (Butter Yellow).
15. M. D. Shelby and S. Stasiewicz, Environ. Mutagen., 6, 871 (1984). Chemicals Showing No Evidence of Carcinogenicity in Long-Term, Two-Species Rodent Studies: The Need for Short-Term Test Data.
16. F. K. Ennever, T. J. Noonan, and H. S. Rosenkranz, Carcinogenesis, 2, 73 (1987). The Predictivity of Animal Bioassays and Short-Term Genotoxicity Tests for Carcinogenicity and Non-carcinogenicity to Humans.
17. B. N. Ames and L. S. Gold, Proc. Natl. Acad. Sci. USA, 87, 7772 (1990). Chemical Carcinogenesis: Too Many Rodent Carcinogens.
18. B. N. Ames and L. S. Gold, Science, 249, 970 (1990). Too Many Rodent Carcinogens: Mitogenesis Increases Mutagenesis.
19. B. N. Ames and L. S. Gold, in Science and the Law, P. Huber, Ed., in press. Environmental Pollution and Cancer: Some Misconceptions.
20. B. N. Ames and L. S. Gold, Chem. Engin. News, 69, (January 7) 28 (1991). Cancer Prevention Strategies Greatly Exaggerate Risk.
21. B. N. Ames, M. Profet, and L. S. Gold, Proc. Natl. Acad. Sci. USA, 87, 7777 (1990). Dietary Pesticides (99.99% all natural).
22. B. N. Ames, M. Profet, and L. S. Gold, Proc. Natl. Acad. Sci. USA, 87, 7782 (1990). Nature's Chemicals and Synthetic Chemicals: Comparative Toxicology.
23. A. Albert, Selective Toxicity, 7th ed., Chapman & Hall, London, 1985.
24. L. Goldberg, Structure-Activity Correlation as a Predictive Tool in Toxicology, Hemisphere, Washington, D.C., 1983.
25. J. C. Dearden, in Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology, W. Karcher and J. Devillers, Eds., EEC, Brussels, 1990. Physico-chemical Descriptions.
26. D. F. V. Lewis, T. J. B. Gray, and B. G. Lake, in Drug Metabolism-From Molecules to Man, D. J. Benford, J. W. Bridges, and G. G. Gibson, Eds., Taylor & Francis, London, 1987, pp. 369-378. The Use of Structure-Activity Relationships.
27. D. F. V. Lewis, Prog. Drug Metab., 12, 205 (1990). MO-QSARs: A Review of Molecular Orbital-Generated Quantitative Structure-Activity Relationships.
28. C. Hansch, in QSAR: Rational Approaches to the Design of Bioactive Compounds, C. Silipo and A. Vittoria, Eds., Elsevier, Amsterdam, 1991, pp. 3-10. New Perspectives in QSAR.
29. R. L. Lipnick and W. J. Dunn, in Quantitative Approaches to Drug Design, J. C. Dearden, Ed., Elsevier, Amsterdam, 1983, pp. 265-266. A MLAB Study of Aquatic Structure-Toxicity Relationships.
30. R. L. Lipnick, C. S. Pritzker, and D. L. Bentley, in QSAR and Strategies in the Design of Bioactive Compounds, J. K. Seydel, Ed., VCH Publishers, Weinheim, 1985, pp. 420-423. A QSAR Study of the Rat LD50 for Alcohols.
31. R. L. Lipnick, in QSAR in Toxicology and Xenobiochemistry, M. Tichy, Ed., Elsevier, Amsterdam, 1985, pp. 39-52. Validation and Extension of Fish Toxicity QSARs and Interspecies Comparisons for Certain Classes of Organic Chemicals.
32. R. L. Lipnick, Trends Pharmacol. Sci., 10, 265 (1989). Hans Horst Meyer and the Lipoid Theory of Narcosis.
33. M. Tichy, Ed., QSAR in Toxicology and Xenobiochemistry, Elsevier, Amsterdam, 1985.
34. D. Hadzi and B. Jerman-Blazic, Eds., QSAR in Drug Design and Toxicology, Elsevier, Amsterdam, 1987.
35. B. Testa, Chem.-Biol. Interact., 34, 287 (1981). Structural and Electronic Factors Influencing the Inhibition of Aniline Hydroxylation by Alcohols and Their Binding to Cytochrome P-450.
36. J. K. Seydel and K.-J. Schaper, Pharmacol. Ther., 15, 131 (1982). Quantitative Structure-Pharmacokinetic Relationships and Drug Design.
37. R. Purchase, J. Phillips, and B. G. Lake, BIBRA Bull., 29, 5 (1990). Structure-Activity Techniques in Toxicology.
38. R. Purchase, J. C. Phillips, and B. G. Lake, Food Chem. Toxicol., 28, 459 (1990). Structure-Activity Techniques in Toxicology.
39. F. Rippmann, Quant. Struct.-Act. Relat., 9, 1 (1990). Hydrophobicity and Tumour Promoting Activity of Phorbol Esters.
40. E. L. Mehler and J. Gerhards, Mol. Pharmacol., 31, 284 (1987). Electronic Determinants of the Anti-inflammatory Action of Benzoic and Salicylic Acids.
41. J. C. Phillips, W. B. Gibson, J. Yam, C. L. Alden, and G. C. Hard, Food Chem. Toxicol., 28, 375 (1990). Survey of the QSAR and in Vitro Approaches for Developing Non-animal Methods to Supersede the in Vivo LD50 Test.
42. L. Turner, F. Choplin, P. Dugard, J. Hermens, R. Jaeckh, M. Marsmann, and D. Roberts, Toxicol. in Vitro, 1, 143 (1987). Structure-Activity Relationships in Toxicology and Ecotoxicology: An Assessment.
43. G. W. Adamson, D. Bawden, and D. T. Saggers, Pestic. Sci., 15, 31 (1984). Quantitative Structure-Activity Relationship Studies of Acute Toxicity LD50 in a Large Series of Herbicidal Benzimidazoles.
44. A. V. Belik, V. V. Guseva, M. N. Lebedeva, N. P. Kozyreva, and N. S. Zefirov, Dokl. Akad. Nauk SSSR, 310, 1144 (1990). Prediction of the Acute Toxicity of Benzo[g]quinoline Derivatives.
45. Q. Dai and C. Shi, Kexue Tongbao, 30, 52 (1985). Structure-Carcinogenic Activity Relationship of Alkyl-Substituted Polycyclic Aromatic Hydrocarbons by Pattern Recognition by Computer.
46. S. Borman, Chem. Engin. News, 68, (February 19) 20 (1990). New QSAR Techniques Eyed for Environmental Assessments.
47. M. G. Ford and D. J. Livingstone, Quant. Struct.-Act. Relat., 9, 107 (1990). Multivariate Techniques for Parameter Selection and Data Analysis Exemplified by a Study of Pyrethroid Neurotoxicity.
48. H. Kubinyi, J. Cancer Res. Clin. Oncol., 116, 529 (1990). Quantitative Structure-Activity Relationships (QSAR) and Molecular Modeling in Cancer Research.
49. D. R. Krewski, P. L. Carr, R. Anderson, and S. G. Gilbert, in Handbook of in Vivo Toxicity Testing, D. L. Arnold, H. C. Grice, and D. R. Krewski, Eds., Academic Press, New York, 1990, pp. 555-579. Computer Applications in Toxicological Research.
50. N. Ramiller, Am. Lab., 16, (June) 78 (1984). Computer Assisted Studies in Structure-Activity Relationships.
51. P. P. Mager, Zentralbl. Pharmakol. Pharmakother. Lab., 121, 23 (1982). Computer Assisted Drug Design Univariate and Multivariate Structure-Activity Relationship Models.
52. K. Enslein, Pharmacol. Rev., 36, 131S (1984). Estimation of Toxicological Endpoints by Structure-Activity Relationships.
53. K. Enslein, B. W. Blake, M. E. Tomb, and H. H. Borgstedt, In Vitro Toxicol., 1, 33 (1986). Prediction of Ames Test Results by Structure-Activity Relationships.
54. K. Enslein, H. H. Borgstedt, B. W. Blake, and J. B. Hart, In Vitro Toxicol., 1, 129 (1987). Prediction of Rabbit Skin Irritation Severity by Structure-Activity Relationships.
55. K. Enslein, H. H. Borgstedt, M. E. Tomb, B. W. Blake, and J. B. Hart, Toxicol. Ind. Health, 3, 267 (1987). A Structure-Activity Prediction Model of Carcinogenicity Based on NCI/NTP Assays and Food Additives.
56. W. J. Dunn, Toxicol. Lett., 43, 277 (1988). QSAR Approaches to Predicting Toxicity.
57. J. W. McFarland and D. J. Gans, J. Med. Chem., 30, 46 (1987). Cluster Significance Analysis Contrasted with Three Other Quantitative Structure-Activity Relationships Methods.
58. J. T. Chou and P. C. Jurs, J. Med. Chem., 22, 792 (1979). Computer-Assisted Structure-Activity Studies of Chemical Carcinogens. An N-Nitroso Compound Data Set.
59. P. C. Jurs, J. T. Chou, and M. Yuan, J. Med. Chem., 22, 476 (1979). Computer-Assisted Structure-Activity Studies of Chemical Carcinogens. A Heterogeneous Data Set.
60. P. C. Jurs, Drug Inf. J., 17, 219a (1983). Computer Assisted Studies of Structure-Activity Relationships Using Pattern Recognition.
61. P. C. Jurs, M. N. Hasan, D. R. Henry, T. R. Stouch, and E. K. Whalen-Pedersen, Fundam. Appl. Toxicol., 3, 343 (1983). Computer-Assisted Studies of Molecular Structure and Carcinogenic Activity.
62. P. C. Jurs, T. R. Stouch, M. Czerwinski, and J. N. Narvaez, J. Chem. Inf. Comput. Sci., 25, 296 (1985). Computer-Assisted Studies of Molecular Structure-Biological Activity Relationships.
63. D. R. Henry, P. C. Jurs, and W. A. Denny, J. Med. Chem., 25, 899 (1982). Structure-Antitumor Activity Relationships of 9-Anilinoacridines Using Pattern Recognition.
64. M. Yuan and P. C. Jurs, Toxicol. Appl. Pharmacol., 52, 294 (1980). Computer-Assisted Structure-Activity Studies of Chemical Carcinogens: A Polyaromatic Hydrocarbon Data Set.
65. K. Yuta and P. C. Jurs, J. Med. Chem., 24, 241 (1981). Computer-Assisted Structure-Activity Studies of Chemical Carcinogens. Aromatic Amines.
66. G. Klopman, H. Grimberg, and A. J. Hopfinger, J. Theor. Biol., 79, 355 (1979). MINDO/3 Calculations of the Conformation and Carcinogenicity of Epoxy Metabolites of Aromatic Hydrocarbons.
67. G. Klopman, Environ. Health Perspect., 61, 269 (1984). Predicting Toxicity through a Computer Automated Structure Evaluation Program.
68. G. Klopman and M. R. Frierson, Croat. Chim. Acta, 57, 1411 (1984). The α-Effect: A Theoretical Study Incorporating Solvent Effects.
69. G. Klopman and O. T. Macina, J. Theor. Biol., 113, 637 (1985). Use of the Computer Automated Structure Evaluation Program in Determining Quantitative Structure-Activity Relationships within Hallucinogenic Phenylalkylamines.
70. G. Klopman, K. Namboodiri, and A. N. Kalos, in Molecular Basis of Cancer, Part A: Macromolecular Structure, Carcinogens and Oncogenes, R. Rein, Ed., Alan R. Liss, 1985, pp. 287-298. Computer Automated Evaluation and Prediction of the Iball Index of Carcinogenicity of Polycyclic Aromatic Hydrocarbons.
71. G. Klopman, A. N. Kalos, and H. S. Rosenkranz, Mol. Toxicol., 1, 61 (1987). A Computer Automated Study of the Structure-Mutagenicity Relationships of Non-Fused-Ring Nitroarenes and Related Compounds.
72. G. Klopman and O. T. Macina, Mol. Pharmacol., 31, 457 (1987). Computer-Automated Structure Evaluation of Anti-leukemic 9-Anilinoacridines.
73. G. Klopman and M. L. Dimayuga, J. Comput.-Aided Mol. Design, 4, 117 (1990). Computer Automated Structure Evaluation (CASE) of the Teratogenicity of Retinoids with the Aid of a Novel Geometry Index.
74. L. B. Kier and L. H. Hall, Molecular Connectivity in Chemistry and Drug Research, Academic Press, New York, 1976. L. H. Hall and L. B. Kier, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, pp. 367-422. The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling.
75. J. G. Topliss and R. J. Costello, J. Med. Chem., 15, 1066 (1972). Chance Correlations in Structure-Activity Studies Using Multiple Regression Analysis.
76. D. F. V. Lewis, Manufact. Chem., 59, 22 (1988). Computer Prediction of Toxicity and Bioactivity.
77. G. R. Marshall and R. D. Cramer, Trends Pharmacol. Sci., 9, 285 (1988). Three-Dimensional Structure-Activity Relationships.
78. M. Rossi, J. Med. Chem., 26, 1246 (1983). Structural Studies of Metyrapone: A Potent Inhibitor of Cytochrome P-450.
79. M. Rossi, S. Markovitz, and T. Callahan, Carcinogenesis, 8, 881 (1987). Defining the Active Site of Cytochrome P-450: The Crystal and Molecular Structure of an Inhibitor, SKF-525A.
80. J. G. Vinter, A. Davis, and M. R. Saunders, J. Comput.-Aided Mol. Design, 1, 31 (1987). Strategic Approaches to Drug Design. 1. An Integrated Software Framework for Molecular Modelling.
81. M.-P. Mingeot-Leclercq, A. Van Schepdael, R. Brasseur, R. Busson, H. J. Vanderhaeghe, P. J. Claes, and P. M. Tulkens, J. Med. Chem., 34, 1476 (1991). New Derivatives of Kanamycin B Obtained by Modifications and Substitutions in Position 6". In Vitro and Computer-Aided Toxicological Evaluation with Respect to Interactions with Phosphatidylinositol.
82. D. B. Boyd, Drug Inf. J., 17, 121 (1983). Quantum Mechanics in Drug Design: Methods and Applications. D. B. Boyd, D. W. Smith, J. J. P. Stewart, and E. Wimmer, J. Comput. Chem., 9, 387 (1988). Numerical Sensitivity of Trajectories across Conformational Energy Hypersurfaces from Geometry Optimized Molecular Orbital Calculations: AM1, MNDO and MINDO/3. D. B. Boyd, in Reviews in Computational Chemistry, Vol. 1, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, pp. 321-354. Aspects of Molecular Modeling.
83. J. J. P. Stewart, in Reviews in Computational Chemistry, Vol. 1, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, pp. 45-81. Semiempirical Molecular Orbital Methods.
84. M. C. Zerner, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, pp. 313-365. Semiempirical Molecular Orbital Methods.
85. D. F. V. Lewis, C. Ioannides, and D. V. Parke, Toxicol. Lett., 45, 1 (1989). Prediction of Chemical Carcinogenicity from Molecular and Electronic Structure: A Comparison of MINDO/3 and CNDO/2 Molecular Orbital Methods.
86. D. M. Sanderson and C. G. Earnshaw, Hum. Exp. Toxicol., 10, 261 (1991). Computer Prediction of Possible Toxic Action from Chemical Structure: The DEREK System.
87. D. F. V. Lewis, C. Ioannides, and D. V. Parke, in QSAR: Rational Approaches to the Design of Bioactive Compounds, C. Silipo and A. Vittoria, Eds., Elsevier, Amsterdam, 1991, pp. 525-527. COMPACT: A Form of Discriminant Analysis for the Identification of Potential Carcinogens.
88. D. F. V. Lewis, C. Ioannides, and D. V. Parke, ATLA, 18, 91 (1990). The Safety Evaluation of Drugs and Chemicals by the Use of Computer Optimized Molecular Parametric Analysis of Chemical Toxicity (COMPACT).
89. D. F. V. Lewis, C. Ioannides, and D. V. Parke, Mutagenesis, 5, 433 (1990). A Prospective Toxicity Evaluation (COMPACT) on 40 Chemicals Currently Being Tested by the National Toxicology Program.
90. D. F. V. Lewis, in Animals and Alternatives in Toxicology, M. Balls, J. Bridges, and J. Southee, Eds., Macmillan, London, 1991. Computers and Mathematical Modelling.
91. D. F. V. Lewis, Frontiers Biotransform., 7, 90 (1991). Computer Modelling of Cytochromes P-450 and Their Substrates: A Rational Approach to the Prediction of Carcinogenicity.
92. B. K. Park and N. R. Kitteringham, Prog. Drug Metab., 11, 1 (1988). Assessing Induction and Inhibition of Drug Metabolism in Man.
93. I. Shuster, Ed., Cytochrome P-450: Biochemistry and Biophysics, Taylor & Francis, London, 1989.
94. D. E. Ryan and W. Levin, Pharmacol. Ther., 45, 153 (1990). Purification and Characterization of Hepatic Microsomal Cytochrome P-450.
95. M. Murray and G. F. Reidy, Pharmacol. Rev., 42, 85 (1990). Selectivity in the Inhibition of Mammalian Cytochromes P-450 by Chemical Agents.
96. G. G. Gibson and P. Skett, Introduction to Drug Metabolism, Chapman & Hall, London, 1986.
97. F. P. Guengerich and T. L. Macdonald, Acc. Chem. Res., 17, 9 (1984). Chemical Mechanisms of Catalysis by Cytochromes P-450: A Unified View.
98. F. P. Guengerich, Cancer Res., 48, 2946 (1988). Roles of Cytochrome P-450 Enzymes in Chemical Carcinogenesis and Cancer Chemotherapy.
99. F. P. Guengerich and T. L. Macdonald, FASEB J., 4, 2453 (1990). Mechanisms of Cytochrome P-450 Catalysis.
100. F. P. Guengerich, Trends Pharmacol. Sci., 12, 281 (1991). Molecular Advances for the Cytochrome P-450 Superfamily.
101. F. P. Guengerich and T. Shimada, Chem. Res. Toxicol., 4, 391 (1991). Oxidation of Toxic and Carcinogenic Chemicals by Human Cytochrome P-450 Enzymes.
102. D. F. V. Lewis, Drug Metab. Rev., 17, 1 (1986). Physical Methods in the Study of the Active Site Geometry of Cytochrome P-450.
103. D. W. Nebert and F. J. Gonzalez, Trends Pharmacol. Sci., 6, 160 (1985). Cytochrome P450 Gene Expression and Regulation.
104. D. W. Nebert and F. J. Gonzalez, Annu. Rev. Biochem., 56, 945 (1987). P450 Genes: Structure, Evolution and Regulation.
105. D. W. Nebert, D. R. Nelson, M. Adesnik, M. J. Coon, R. W. Estabrook, F. J. Gonzalez, F. P. Guengerich, I. C. Gunsalus, E. F. Johnson, B. Kemper, W. Levin, I. R. Phillips, R. Sato, and M. R. Waterman, DNA, 8, 1 (1989). The P450 Superfamily: Updated Listing of All Genes and Recommended Nomenclature for the Chromosomal Loci.
106. D. W. Nebert and F. J. Gonzalez, Frontiers Biotransform., 2, 35 (1990). The P450 Gene Superfamily.
107. D. W. Nebert, D. R. Nelson, M. J. Coon, R. W. Estabrook, R. Feyereisen, Y. Fujii-Kuriyama, F. J. Gonzalez, F. P. Guengerich, I. C. Gunsalus, E. F. Johnson, J. C. Loper, R. Sato, M. R. Waterman, and D. J. Waxman, DNA Cell Biol., 10, 1 (1991). The P450 Superfamily: Update on New Sequences, Gene Mapping and Recommended Nomenclature.
108. D. R. Nelson and H. W. Strobel, Mol. Biol. Evol., 4, 572 (1987). Evolution of Cytochrome P-450 Proteins.
109. F. J. Gonzalez, Pharmacol. Rev., 40, 243 (1988). The Molecular Biology of Cytochrome P450s.
110. F. J. Gonzalez and D. W. Nebert, Trends Genet., 6, 182 (1990). Evolution of the P450 Gene Superfamily.
111. D. V. Parke, Arch. Toxicol., 60, 5 (1987). Activation Mechanisms to Chemical Toxicity.
112. D. V. Parke, Regul. Toxicol. Pharmacol., 7, 220 (1987). Chemical Toxicity, Cytochromes P-450 and Computer Graphics.
References 21 7 113. C. loannides and D. V. Parke, Drug Metab. Rev., 22.1 (1990).The Cytochrome P450 I Gene Family of Microsomal Hemoproteins and Their Role in the Metabolic Activation of Chemicals. 114. C. loannides and D. V. Parke, Biochem. Pharmacol., 36, 4197 (1987). The Cytochromes P-448: A Unique Family of Enzymes Involved in Chemical Toxicity and Carcinogenesis. D. V. Parke and C. Ioannides, in Comprehensive Medicinal Chemistry, J. B. Taylor, Ed., Vol. 5, Pergamon, Oxford, 1990. Toxicokinetics. 115. D. V. Parke, Frontiers Biotransform., 2 , l (1990).Induction of Cytochromes P-450: General Principles and Biological Consequences. 116. D. V. Parke, Acta Pharm. Nord., 2,231 (1989). Drug Metabolism in the Design and Safety Evaluation of New Drugs. The Scheele Memorial Lecture 1989. 117. A. I. Archakov and G . I. Bachmanova, Cytochrome P-450 and Active Oxygen, Taylor & Francis, London, 1990. 118. L. S. Alexander and H. M. Goff, J. Chem. Edrrc., 59, 179 (1982). Chemicals, Cancer and Cytochrome P-450. 119. D. F. V. Lewis, C. loannides, and D. V. Parke, Biochem. Pharmacol., 35, 2179 (1986). Molecular Dimensions of the Substrate Binding Site of Cytochrome P-448. 120. D. F. V. Lewis, C. loannides, and D. V. Parke, Chew.-EioL lnterac., 64, 39 (1987). Structural Requirements for Substrates of Cytochromes P-450 and P-448. 121. A. Verloop, Drug. Des., 3, 133 (1972).The Use of Linear Free Energy Parameters and Other Experimental Constants in Structure-Activity Studies. 122. A. Verloop and J. Tipker, Pestic. Sci., 7,379 (1976). Use of Linear Free Energy Related and Other Parameters in the Study of Fungicidal Selectivity. 123. J. C. Arcos and M. F. Argus, Adv. Cancer Res., 11, 305 (1968). Molecular Geometry and Carcinogenic Activity of Aromatic Compounds. New Perspectives. 124. J. C. Arcos, A. M. Conney, and N. P. Buu-Hoi, J. Biol. Chem., 236,1291 (1961).Induction of Microsomal Enzyme Synthesis by Polycyclic Aromatic Hydrocarbons of Different Molecular Sizes. 125. R. 
Kaliszan, H. Lamparczyk, and A. Radecki, Biochem. Pharmacol., 28, 123 (1979). A Relationship between Repression of Dimethylnitrosamine Demethylase by Polycyclic Aromatic Hydrocarbons and Their Shape. 126. D. V. Parke, C. Ioannides, and D. F. V. Lewis, Hum. Toxicol., 7, 397 (1988). Metabolic Activation of Carcinogens and Toxic Chemicals. 127. D. V. Parke, C. loannides, and D. F. V. Lewis, Polish J. Occup. Med., 3, 15 (1990).Current Problems in the Evaluation of Chemical Safety.
128. D. V. Parke, D. F. V. Lewis, and C. Ioannides, in Risk Assessment of Chemicals in the Environment, M. L. Richardson, Ed., Royal Society of Chemistry, London, 1988. Current Procedures for the Evaluation of Chemical Safety.
129. R. W. Wald and G. Feuer,]. Med. Chem., 14, 1081 (1971). Molecular Orbital Calculations on Coumarins and the Induction of Drug-Metabolizing Enzymes.
130. S. N. Mohammad, A. J. Hopfinger, and D. R. Bickers, 1. Theor. Biol., 102, 323 (1983). Intrinsic Mutagenicity of Polycyclic Aromatic Hydrocarbons: A Quantitative StructureActivity Study Based on Molecular Shape Analysis. 131. S. N. Mohammad, Mol. Pharmacol., 27, 1 (198.5). Metabolic Activation and Carcinogenicity of Extended Anilines and Aminoazo Compounds. 132. S. N. Mohammad, Indian j . Biochem. Biophys., 22,56 (198.5). MINDOI3 Calculations of Carcinogenic Activities of Polycycl~cHydrocarbons. 133. R. Benigni, C. Andreoli, and A. Giuliani, Carcinogenesis, 10,55 (1989). Structure-Activity Studies of Chemical Carcinogens: Use of an Electrophilic Reactivity Parameter in a New QSAR Model.
21 8 Computer-Assisted Methods in the Evaluation of Chemical Toxicity 134. R. Benigni, C. Andreoli, and A. Giuliani, in QSAR: Quantitative Structure-Activity Relutionships in Drug Design, J. L. Fauch$re, Ed., Alan R. Liss, New York, 1989. StructureActivity Studies of Chemical Carcinogens in Nan-generic Sets of Compounds. 135. G. Bakale, R. D. McCreary, and E. C. Gregg, Int.]. Quantum Chem., Quantum Biol. Symp., 9, 15 (1982).Quasifree Electron Attachment to Carcinogens. 136. G. Bakale and R. D. McCreary, Carcinogenesis, 8 , 253 (1987). A Physico-chemical Screening Test for Chemical Carcinogens: The k, Test. 137. G. G. Gibson, in Development of Drugs and Modern Medicines, J. Gorrod, Ed., Ellis Horwood, Chichester, 1986, pp. 253-266. Cytochrome P-450: From Biophysics to Pharmacology. 138. D. F. V. Lewis, C. loannides, and D. V. Parke, Toxicology, 65, 33 (1991). A Retrospective Study of the Molecular Toxicology of Benoxaprofen. 139. D. V. Parke, C. loannides, and D. F. V. Lewis, in Toxicology in Europe in tbe Year 2000. FEST Supplement, C. M. Hodel, Ed., Elsevier, Amsterdam, 1986, pp. 14-19. StructureActivity Models for Toxicity Testing. 140. A. L. Parke, C. loannides, D. F. V. Lewis, and D. V. Parke, ~ln~ammophurmacology, 1, 3 (1991).Molecular Pathology of Drug-Disease Interactions in Chronic Autoimmune Inflammatory Diseases. 141. A. L. Parke, C. loannides, and D. F. V. Lewis, Toxicol. in Vitro, 4, 680 (1990). Computer Modelling and Other in Vitro Tests in the Safety Evaluation of Chemicals-Strategic Applications. 142. A. L. Parke, C. Ioannides, and D. F. V. Lewis, Can.j . Physiol. Pharmucol., 69,537 (1991). The Role of Cytochromes P-450 in the Detoxication and Activation of Drugs and Other Chemicals. 143. A. Pullman and B. Pullman, Adv. Cancer Res., 3 , 117 (1955). Electronic Structure and Carcinogenic Activity and Aromatic Molecules: New Developments. 144. B. Pullman, Electronic Aspects of Biochemistry, Academic Press, New York, 1962. 145. A. Pullman and B. 
Pullman, Quantum Biochemistry, Wiley, New York, 1963. 146. A. Pullman and B. Pullman, INSERM, 117,51 (1983). Mechanism of Interaction of DNA with Carcinogens. 147. B. PuIlman, Polycyclic Hydrocurbons Cancer, 2 , 419 (1978). I’olycyclic Hydrocarbon Carcinogenesis-An Overall View with Results of Huckel Calculations on BP Epoxides. 148. B. Pullman, Int. 1. Quantum Chem., 16, 669 (1979). Recent Developments on the Mechanism of Chemical Carcinogenesis by Aromatic Hydrocarbons. 149. L. P. Brown, D. F. V. Lewis, T. C. Orton, 0. P. Flint, and G. G. Gibson, Xenobiotica, 19, 1471 (1989). Teratology of Phenylhydantoins in an in Vitro System: Molecular OrbitalGentrated Quantitative Structure-Activity Relationships. 150. D. F. V. Lewis, Xenobioticu, 17, 1351 (1987).Molecular Orbital Calculations and Quantitative Structure-Activity Relationships for Some Polyaromatic Hydrocarbons. 151.. B. G . Lake, T. J. B. Gray, D. F. V. Lewis, J. A. Beamand, K. D. Hodder, R. Purchase, and S. D. Gangolli, Toxicol. Ind. Health, 3, 165 (1987). Structure-Activity Relationships for Induction of Peroxisomal Enzyme Activities by Phthalate Monoesters in Primary Rat Hepatocyte Cultures. 152. B. G. Lake, D. F. V. Lewis, and T. J. B. Gray, Arch. Toxicol., Suppl. 12, 217 (1988). Structure- Activity Relationships for Hepatic Peroxisome Proliferation. 153. D. F. V. Lewis, B. G. Lake, T. J. B. Gray, and S. D. Gangolli, Arch. Toxicol., Suppl. 11, 39 (1987).Structure-Activity Requirements for Induction of Peroxisomal Enzyme Activities in Primary Rat Hepatocyte Cultures. 154. D. F. V. Lewis, Chem.-Biol. Interact., 62, 271 (1 987). Quantitative Structure-Activity Relationships in a Series of Alcohols Exhibiting Inhibition of Cytochrome P-450 Mediated Aniline Hydroxylation.
References 2 1 9 155. C. Le Page, M. Schaefer, A. M. Batt, and G. Siest, Dev. Biochem., 13,363 (1980). QSAR in a Series of Secondary Amines Derived from Perhexilline Maleate with Purified PhenobarbitalInduced Pig Liver Microsomal Cytochrome P-450. 156. G. H. Loew and R. F. Kirchner, J. Am. Chem. SOC., 97, 7388 (1975). Electronic Structure and Electric Field Gradients in Oxyhemoglobin and Cytochrome P-450 Model Compounds. 157. G. H. Loew, J. Phillips, J. Wong, L. Hjelmeland, and G. Pack, Cancer Biochem. Biophys., 2, 113 (1978). Quantum Mechanical Studies of Metabolism of Polycyclic Aromatic Hydrocarbons-Bay Region Reactivity as a Criterion for Carcinogenic Potency. 158. G. H. Loew, B. S. Sudhindra, and J. E. Ferrel, Chem.-Biol. Interact., 26,75 (1979). Quantum Mechanical Studies of Polycyclic Aromatic Hydrocarbons and MetabolitesCorrelations with Carcinogenesis. 159. G. H. Loew, B. H. Sudhindra, S. Burt, G. P. Pack, and K. MacElroy, Int. J. Quantum Chem., Quant. Biol. Symp., 6, 259 (1979). Aromatic Amine Carcinogenesis: Activation and Interaction with Nucleic Acid Bases. 160. G . H. Loew, B. S. Sudhindra, J. M. Walker, C. C. Sigman, and H. L. Johnson, 1. Environ. Pathol. Toxicol., 2, 1069 (1979). Mutagenic Properties of Aniline Derivatives. 161. G. H. Loew, M. T. Poulsen, J. Ferrel, and D. Chaet, Chew.-Biol. Interact., 31, 319 (1Y80). Studies of Methyl Benzo[a]anthracenes: Metabolism and Correlations with Carcinogenicity. 162. G. H. Loew, E. Kurkjian, and M. Rebagliati, Chem.-Biol. Interact., 43,33 (1983). Metabolism and Relative Carcinogenic Potency of Chloroethylenes: A Quantum Chemical Structure-Activity Study. 163. G. H. Loew, M. T. Poulsen, D. Spangler, and E. Kirkjian, lnt. 1. Quantum Chem., Quantum Biol. Symp., 10, 201 (1983). Structure-Activity Studies of Dialkylnitrosamines. 164. G. H. Loew, D. Spangler, and R. J. Spanggord, in QSAR in Toxicology and Xenobiocbemistry, M. Tichy, Ed., Elsevier, Amsterdam, 1985, pp. 111-126. 
Computer-Assisted Risk Assessment Mechanistic Structure Activity Studies of Mutagenic Nitroaromatic Compounds. 165. G. Long, J. D. McKinney, and L. G. Pedersen, Quant. Struct.-Act. Relut., 6, 1 (1987). Polychlorinated Dibenzofuran (DCDF) Binding to the Ah Keceptor(s) and Associated Enzyme Induction. Theoretical Model Based on Molecular Parameters. 166. J. P. Lowe, Int. J. Quantum Chem., Quantum Biol. Symp., 9 , 5 (1982).Theoretical Study of Relations between Bay-Region and K-Region Indices of Carcinogenicity. 167. D. F. V. Lewis and V. S. Griffiths, Xenobiotica, 17, 769 (1987). Molecular Electrostatic Potential Energies and Methylation of DNA Bases: A Molecular Orbital-Generated Quantitative Structure-Activity Relationship. 168. K. Korzekwa, W. Trager, M. Gouterman, D. Spangler, and G. H. Loew,]. Am. Chem. Soc., 107,4273 (1985). Cytochrome P450 Mediated Aromatic Oxidations: A Theoretical Study. 169. A. Goldblum and G. H. Loew, J. Am. Chem. SOL., 107,4265 (1985). Quantum Chemical Studies of Model Cytochrome P450 Oxidations of Amines. I. MNDO Pathways for Alkylamine Reactions with Singlet and Triplet Oxygen. 170. B. B. Hasinoff, Biochim. Biophys. Acta, 829, 1 (1985). Quantitative Structure-Activity Relationships for the Reaction of Hydrated Electrons with Heme Proteins. 171. J. J. Kaufman, P. C. Hariharan, W. S. Koski, and K. Balasubramanian, Prog. Clin. Biol. Res., 172,263 (1985). Quantum Chemical and Other Theoretical Studies of Carcinogens, Their Metabolic Activation and Attack on DNA Constituents. 172. S. M. Adams and L. S. Kaminsky, Mol. Pharmacol., 22, 459 (1982). Molecular Orbital Studies of Epoxide Stability of Carcinogenic Polycyclic Aromatic Hydrocarbon Diol Epoxides. 173. G. D. Berger and P. G. Seybold, lnt. 1. Quantum Chem., Quantum Biol. Symp., 6, 305 (1979). Substituent Effects in Chemical Carcinogenesis: Chrysene and Its Methyl Derivatives.
220 Computer-Assisted Methods in the Evaluation of Chemical Toxicity 174. J. R. Collins, B. T. Luke, and G. H. Loew, Int. J. Quantum Chem., Quantum Biol. Symp., 13, 143 (1986). Putative Intermediates in the Adduct Formation between Ethylene and Cytochrome P-450: A Quantum Mechanical Study. 175. E. Yukawa, K. Kouno, Y.Ono, and Y. Ueda, Chem. Pharm. Bull., 28,2593 (1980). Quantum Chemical Study on the Cytochrome P-450 Catalyzed Hydroxylation of Aromatic Hydrocarbons. 176. A. T. Pudzianowski and G. H. Loew, J. Phys. Chem., 87, 1081 (1983). Hydrogen Abstractions from Methyl Groups by Atomic Oxygen. Kinetic Isotope Effects Calculated from MNDO/UHF Results and an Assessment of Their Applicability to MonooxygenaseDependent Hydroxylations. 177. A. T. Pudzianowski and G. H. Loew, Int. J. Quantum Chem., 23,1257 (1983). Mechanistic Studies of Oxene Reactions with Organic Substrates: Reaction Paths on MNDO Enthalpy Surfaces-Models for Cytochrome P450 Oxidations. 178. L. L. Shipman, in Polynuclear Aromatic Hydrocarbons, R. Freudenthal and P, W. Jones, Eds., Raven Press, New York, 1976. A6 lnitio Quantum Mechanical Characterization of the Ground Electronic State of Benzo[a]pyrene. Implications for the Mechanism of Polynuclear Aromatic Hydrocarbon Oxidation to Epoxides by Cytochrome P-450. 179. 1. A. Smith and P. G. Seybold, Int. 1.Quantum Cbem., Quantum Biol. Symp., 5,311 (1978). Methylbenz[a]anthracenes: Correlations between Theoretical Reactivity Indices and Carcinogenicity. 180. L. von Szentphly and C. Parkanyi, J. Mol. Struct., 151, 245 (1987). The MCS Model of Chemical Initiation of Cancer: PPP Calculations on Methylated and N-Heteroaromatic Polycycles. 181. D. Qianhuan, J. Mol. Struct., 149, 167 (1987). Di-region Theory for Carcinogenesis and Wholesale M O Calculation. 182. C . Lui, Z. Shi, Y. Wang, and S. Dai, h n z i Kexui Xuebuo, 1,19 (1981).Superdelocalizability and Carcinogenesis of Polyaromatic Hydrocarbons. 183. C. A. Reynolds and C. Thomson, Int. 1. 
Quantum Chem., Quantum Biol. Symp., 1 1 , 167 (1984).Ab lnitio Calculations Relevant to the Mechanism of Chemical Carcinogenesis by N-Nitrosamines. 1. The Nitrosation of Amines. 184. C. A. Reynolds and C. Thomson, J. Mol. Struct., 138, 131 (1986).A6 lnitio Calculations on Mechanism of Chemical Carcinogenesis by N-Nitrosamines. 185. C. A. Reynolds and C. Thomson, Int. J. Quantum Chem., 30, 751 (1986). A Theoretical Study of N-Nitrosamine Metabolites: Possible Alkylating Species in Carcinogenesis by N,N’-Dimethylnitrosamine. 186. P. Politzer and P. R. Laurence, Carcinogenesis, 5 , 845 (1984). Relationships between the Electrostatic Potential, Epoxide Hydrase Inhibition, and Carcinogenicity for Some Hydrocarbon and Halogenated Hydrocarbon Epoxides. 187. P. Politzer and J. S. Murray, Mol. Toxicol., 1, 1 (1987).Halogenated Hydrocarbon Epoxides: Some Predictive Methods for Carcinogenic Activity Based on Electronic Mechanisms. 188. P. R. Lawrence, T. R. Proctor, and P. Politzer, Int. 1. Quantum Chem., 26, 425 (1984). Reactive Properties of trans-Dichloro-oxirane in Relation to Carcinogenicities of Vinyl Chloride and trans-Dichloro-ethylene.J. S. Murray and P. Politzer, Int. J. Quantum Chem., 31, 569 (1987). A Computational Study of Isomerization Equilibria: Relation to Vinyl Chloride Carcinogenicity. P. I’olitzer and J. S. Murray, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, pp. 273-3 12. Molecular Electrostatic Potentials and Chemical Reactivity. 189. D. F. V. Lewis, unpublished data. 190. C. Hansch, D. Kim, A. J. Leo, E. Novellino, C. Silipo, and A. Vittoria, CRC Crit. Rev. Toxicol., 19, 185 (1989). Toward a Quantitative Comparative Toxicology of Organic Compounds. 191. M. R. Frierson, G. Klopman, and H. S. Rosenkranz, Environ. Mutagen., 8 , 283 (1986). Structure-Activity Relationships among Mutagens and Carcinogens: A Review.
References 221 192. J. A. Ashby, personal communication. 193. J. D. McKinney, J. Fawkes, S. Joran, K. Chae, S. Oatley, R. E. Coleman, and W. Briner, Environ. Health Perspect., 61,41 (1985).2,3,7,8-Tetrachlorodibenzo-p-dioxin(TCDD) as a
Potent and Persistent Thyroxine Agonist: A Mechanistic Model for Toxicity Based on Molecular Reactivity. 194. J. D. McKinney, K. Chae, S. J. Oatley, and C. C. F. Blake,]. M e d . Chem., 28,375 (1985). Molecular Interactions of Toxic Chlorinated Dibenzo-p-dioxins and Dibenzofurans with Thyroxine Binding Prealbumin. 195. P. Sjoberg, J.S. Murray, T. Brinck, P. Evans, and P. Politzer, 1. M o l . Graphics, 8,81 (1990). The Use of the Electrostatic Potential at the Molecular Surface in Recognition Interactions: Dibenzo-p-dioxins and Related Systems. 196. S. Safe, S. Bandiera, T. Sawyer, B. Zmudzka, G. Mason, M. Romkes, M. A. Denomme, J. Sparling, A. B. Okey, and T. Fujita, Enoiron. Health Perspect., 61, 21 (1985).Effects of Structure on Binding to the 2,3,7,8-TCDD Receptor Protein and AHH InductionHalogenated Biphenyls. 197. K. Chae and J. D. McKinney, I. Med. Chem., 31, 357 (1988).Molecular Complexes of Thyroid Hormone Tyrosyl Rings with Aromatic Donors. Possible Relationship to Receptor Protein Interactions. 198. J. M. Blaney, E. C. Jorgensen, M. L. Connolly, T. E. Ferrin, R. Langridge, S. J. Oatley, J. M. Burridge, and C. C. F. Blake,]. M e d . Chem., 25,785 (1982).Computer Graphics in Drug Design: Molecular Modelling of Thyroid Hormone-Prealbumin Interactions. 199. J. D. McKinney, K. Chae, E. E. McConnell, and L. S. Birnbaum, Environ. Health Perspect., 60,.57 (1985).Structure-Induction versus Structure-Toxicity Relationships for Polychlorinated Biphenyls and Related Aromatic Hydrocarbons. 200. J. D. McKinney, T. Darden, M. A. Lyerly, and L. G. Pedersen, Quant. Struct.-Act. Relat., 4, 166 (1985).PCB and Related Compound Binding to the the Ah Receptor(s): Theoretical Model Based on Molecular Parameters and Molecular Mechanics. 201. J. D. McKinney and L. G. Pedersen, Biochem. I., 240, 621 (1986).Biological Activity of Polychlorinated Biphenyls Related to Conformational Structure. 202. U. Rickenbacher, J. D. McKinney, S. J. Oatley, and C. C. F. Blake,]. Med. 
Chem., 29,641 (1986).Structurally Specific Binding of Halogenated Biphenyls to Thyroxine Transport Protein. 203. L. G. Pedersen, T.A. Darden, S. J. Oatley, and J. D. McKinney, 1. Med. Cbem., 29, 2451 (1986). A Theoretical Study of the Binding of Polychlorinated Biphenyls (PCBs), Dibenzodioxins and Dibenzofuran to Human Plasma Prealbumin. 204. J. P.Whitlock, Trends Pharrnacol. Sci., 10,285 (1989).The Control of Cytochrome P-450 Gene Expression by Dioxin. 205. D. V. Parke, The Biochemistry of Foreign Compounds, Pergamon, Oxford, 1968. 206. T. A. Connors, in Carcinogenicity Testing: Principles and Problems, A. D. Dayan and R. W. Brimblecombe, Eds., MTP Press, Lancaster, 1978,pp. 65-76. Biochemical Mechanisms of Carcinogenicity. 207. H. S Rosenkranz and G. Klopman, Prog. Clin. Biol. Res., 209A, 71 (1986).Mutagens, Carcinogens and Computers. 208. Daylight Chemical Information Systems, Irvine, Calif. 209. M. T. Nguyen and A. F. Hegarty, I. Chem. Soc., Perkin Trans. 2, 345 (1987).A b Initio Calculations of the Acid-Catalysed Hydrolysis of N-Nitrosamines. 210. M. Poulsen, D. Spangler, and G. H. Loew, Mol. Toxicol., 1, 35 (1987).Nitrosamine Carcinogen Activation Pathway Determined by Quantum Chemical Methods. 211. K. Balasubramanian, J. J. Kaufman, W. S. Koski, and A. T. Balaban, I. Comput. Chem., 1. 149 (1980).Computer Generation of Carcinogenic Benzenoid Hydrocarbons. 212. G. Kalopissis, Mutat. Res., 246, 4.5 (1991).Structure-Activity Relationships of Aromatic Amines in the Ames Salmonella typhimurium Assay.
222 ComDuter-Assisted Methods in the Evaluation of Chemical Toxicitv 213. A. F. Cuthbertson and C. Thomson,]. Mol. Graphics, 5 , 92 (1987). Electrostatic Potentials of Tumour Promoters. 214. A. T. Maynard, L. G. Pedersen, H. S. Posner, and J. D. McKinney, Mol. Pharmucol., 29,629 (1986). An Ab Initio Study of the Relationship between Nitroarene Mutagenicity and Electron Affinity. 215. W. G. Richards, Quantum Pharmacology, Butterworths, London, 1983. 216. L. B. Kier, M O Theory in Drug Research, Academic Press, New York, 1971. 217. A. M. Jeffecy and R. M. j. Liskamp, Proc. Nutl. Acad. Sci. USA, 83,241 (1986). Computer Assisted Molecular Modelling of Tumor Promoters. 218. P. A. Wender, K. F. Koehler, N. A. Sharkey, M. L. Dell’Aquila, and P. M. Blumberg, Proc. Natl. Acad. Sci. USA, 83, 4214 (1986). Analysis of the Phorbol Ester Pharmacophore on Protein Kinase C: Guide to Design of Analogs. 219. A. Van Schepdael, R. Busson, H. J. Vanderhaeghe, P. J. Claes, L. Verbist, M. P. MingeotLeclercq, R. Brasseur, and P.M. Tulkens,]. Med. Chem., 34;1483 (1991). New Derivatives of Kanamycin B Obtained by Combined Modifications in Positions 1 and 6“. Synthesis, Microbiological Properties, and In Vitro and Computer-Aided Toxicological Evaluation. 220. N. A. Sparrow, S. Afr. J. Sci., 85, 36 (1989). Tailoring Biologically Active Agents Using Computer Graphics. 221. 1.’ G. Seybold and K. B. Lipkowitz, Int. J. Quantum Chem., 31, 847 (1987). An Empirical Force Field Examination of the peri Effect in Aromatic Hydrocarbon Carcinogenesis. 222. D. F. V. Lewis, Chem. Rev., 86, 1111 (1986). MIND013: A Review of the Literature. 223. M. M. Marsh and D. M. Jerina, J. Med. Chem., 21 1298 (1978). Calculated Properties of Arene Oxides of Biological Interest. 1. Molecular Orbital Examination of Simple Models. 224. S. L. Rose and P. C. Jurs, J. Med. Chem., 25, 76Y (1982). Computer-Assisted Studies of Structure-Activity Relationships of N-Nitroso Compounds Using Pattern Recognition. 225. D. M. 
Hirst, A Computational Approach to Chemistry, Blackwell, Oxford, 1990. 226. J. C. Phillips, R. Purchase, P. Watts, and S. D. Cangolli, Food Additives Contam., 4, 109 (1987). An Evaluation of the Decision Free Approach for Assessing Priorities for Safety Testing of Food Additives. 227. H. M. Wortelboer, C. A. de Kruif, W. I. de Boer, A. A. J. van lersel, H. E. Falke, and B. J. Blaauboer, Mol. Toxicof., 1, 373 (1987). Induction and Activity of Several Isoenzymes of Cytochrome P-450 in Primary Cultures of Rat Hepatocytes, in Comparison with in Viuo Data.
APPENDIX
Compendium of Software for Molecular Modeling Donald B. Boyd Lilly Research Laboratories, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana 46285
INTRODUCTION
Molecular modeling means different things to different people. As used here and expounded on earlier,1 molecular modeling means the generation, manipulation, and/or representation of realistic molecular structures and associated physicochemical properties. As generally used by scientists in industry, organic chemistry, and other fields, the term molecular modeling encompasses a number of techniques associated with computational chemistry. Exposure to modeling nowadays is through computer graphics,2 and a few scientists may even blur the distinction between molecular graphics and molecular modeling. For many scientists who come in contact with computer-aided chemistry, molecular modeling is a perfectly respectable term. Among some theoreticians, on the other hand, the term molecular modeling still does not evoke an aura of high-quality research; however, because a quantum mechanical code may lie behind a graphical front-end in a state-of-the-art molecular modeling system, a hierarchical stratification seems unnecessary. An inclusive, rather than exclusive, approach to molecular modeling is the trend of the future. It should be
Reviews in Computational Chemistry, Volume III. Kenny B. Lipkowitz and Donald B. Boyd, Editors. VCH Publishers, Inc., New York, © 1992
easy to see that the most important question is whether the computer-based techniques of studying molecules will help answer questions faced in day-to-day research. Not only should many techniques be included under the umbrella of molecular modeling/computational chemistry, but also the nontheoretically trained bench scientist must be welcomed to apply the techniques. In addition, as molecular modeling packages become easier to use with pull-down menus and point-and-click buttons, it is incumbent on the developers to make sure their software is foolproof and not easily misapplied.

Listed in this compendium are sources of software that may be of benefit to computational chemists and others interested in applying the techniques. This compendium is provided as a service to both software developers and software consumers. The aim is to advance the field by making the tools widely known.

With the ever-increasing array of software available for molecular modeling, it is useful to attempt to categorize it. Software packages listed in this appendix have been divided into two broad categories based on the platform on which they run, that is, on an inexpensive personal microcomputer or a more powerful computer, such as a minicomputer, mainframe, workstation, or supercomputer. Most users of the software will have either a small budget, in which case the first classification of software is more pertinent, or a large (institutional) budget, in which case the second classification may look more enticing. Within each of these two groupings, we have further subdivided the software (and the corresponding suppliers) according to its main thrust:

1. General-purpose molecular modeling
2. Quantum chemistry calculations
3. Management of databases of molecular structures
4. Molecular graphics and other applications
Group 1 includes multifunctional and molecular mechanics programs. In the second group are programs for specialized calculations based on molecular orbital or other quantum mechanical theories. Group 3 encompasses software for storage and retrieval of molecular structure data. The fourth group is arbitrarily defined to include programs that can be used to visualize molecules but not to optimize an energy. The reader will immediately recognize that some of the more sophisticated molecular modeling packages (really suites of software) encompass all four areas. Suppliers who offer several products in more than one group are listed in each; otherwise, all the products are listed under the main heading. Most of the software packages listed here are commercially available, although some are free. For each software package, a brief description, the address and telephone number of the supplier, and other pertinent information, such as when a vendor offers more than one program pertinent to molecular
modeling, are given. The descriptions are necessarily concise overviews, not reviews, and have been gleaned from a variety of materials and sources. Inclusion in this compendium should not be construed as an endorsement. There is no claim as to completeness or accuracy. The reader is encouraged to pursue further details germane to his or her own interests. Product names are the registered symbols or trademarks of their respective organizations. Code developers are increasingly inspired to write their programs to be portable between several machines, even from microcomputers to large computers and vice versa. Software for minisupercomputers and supercomputers may be optimized for a particular machine, in which case the hardware vendor may be an additional source of information about a program. Prices of the software, which range from essentially free to more than $100,000 (U.S.), are not included because they are subject to change and specific conditions. In some cases, particularly with respect to QCPE (Indiana University, Bloomington, Indiana), significant software, which has been verified with respect to expected output, can be obtained at practically no cost. Users of software should be optimistic about prices. With so much excellent software and so many suppliers now vying in the marketplace, prices will be under increasing competitive pressure. Besides those listed here, there are other molecular modeling programs developed in academic and industrial laboratories around the world; however, because the availability, documentation, and degree of support of these other programs are highly variable, it is impractical to include them all. More and more frequently, workstation software is being transferred between computational chemists via file transfer protocol (ftp) or electronic mail over Internet, Bitnet, and similar networks of computers. In these situations, the cost is nothing, but reliability of the software can be unknown. 
Although an earnest effort has been made to have this compilation as comprehensive, accurate, and up-to-date as possible at the time of its preparation, it should be kept in mind that change is constant. New modeling software products are continually appearing in the marketplace (and a few are falling by the wayside). Readers who have or know of a product or supplier that is not listed here or know of a change in a listing are encouraged to communicate that information to us for future reference.
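The two-level classification described above (platform first, then main thrust) can be pictured as a small data structure. The following is a minimal sketch in Python, not part of the compendium: the names `Platform`, `Thrust`, `Package`, and `group_packages` are hypothetical and chosen only for illustration, and the sample entries are limited to a few packages and suppliers named in the personal-computer listings of this appendix.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Platform(Enum):
    # The two broad platform categories used in this appendix.
    PERSONAL = "personal computer"
    LARGE = "minicomputer, mainframe, workstation, or supercomputer"


class Thrust(Enum):
    # The four groups, numbered as in the text.
    GENERAL_PURPOSE = 1
    QUANTUM_CHEMISTRY = 2
    DATABASES = 3
    GRAPHICS_AND_OTHER = 4


@dataclass(frozen=True)
class Package:
    # Hypothetical record type; the fields are an assumption for this sketch.
    name: str
    supplier: str
    platform: Platform
    thrust: Thrust


def group_packages(packages):
    """Group packages by (platform, thrust); sort each group by name."""
    groups = defaultdict(list)
    for pkg in packages:
        groups[(pkg.platform, pkg.thrust)].append(pkg)
    return {
        key: sorted(pkgs, key=lambda p: p.name.lower())
        for key, pkgs in groups.items()
    }


# Sample entries taken from the personal-computer listings in this appendix.
SAMPLE = [
    Package("HyperChem", "Autodesk, Inc.",
            Platform.PERSONAL, Thrust.GENERAL_PURPOSE),
    Package("Alchemy III", "Tripos Associates",
            Platform.PERSONAL, Thrust.GENERAL_PURPOSE),
    Package("Chem3D Plus", "Cambridge Scientific Computing Inc.",
            Platform.PERSONAL, Thrust.GENERAL_PURPOSE),
]

if __name__ == "__main__":
    for (platform, thrust), pkgs in group_packages(SAMPLE).items():
        print(platform.value, "/", thrust.name)
        for pkg in pkgs:
            print("  ", pkg.name, "-", pkg.supplier)
```

A supplier with products in more than one group would simply appear as several `Package` records, one per group, which mirrors the listing rule stated in the text.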
REFERENCES
1. D. B. Boyd, in Reviews in Computational Chemistry, Vol. 1, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, pp. 321-354. Aspects of Molecular Modeling.
2. Molecular modeling in the minds of some scientists also includes the examination of hand-held wire or plastic models of molecules. For present purposes, only computer-based techniques are considered.
SOFTWARE FOR PERSONAL COMPUTERS
Apple Macintosh II or Quadra; IBM PC XT/AT with EGA card or Personal System/2; Silicon Graphics Indigo
General Purpose Molecular Modeling

Alchemy III
Tripos Associates
1699 Hanley Road, Suite 303
St. Louis, MO 63144-2913, U.S.A.
Tel. 800-323-2960, 314-647-1099, fax 314-647-9241 (U.S.A.), tel. 44-344-300-144, fax 44-344-360-638 (U.K.), tel. 81-3-3711-1511, fax 81-3-3711-1704 (Japan), e-mail [email protected]
Structure building, manipulation. SYBYL energy minimization. Stick, space-filling, or cylinder (thick bonds) display. Interface to Chemical Abstracts Service registry files. Molfile transfer to SYBYL and Labvision. ChemPrint for chemical structure drawing. MM2(87) for PC. Macintosh, PC DOS, and Windows versions.

CAChe Work System
CAChe Scientific
Tektronix, Inc.
P.O. Box 500, Mail Stop 13-400
Beaverton, OR 97077, U.S.A.
Tel. 503-627-3737
Structure building from library of fragments and molecules, manipulation. MM2 energy minimization. Stick, ball-and-stick, or space-filling display. Extended Hückel, MOPAC, and ZINDO molecular orbital programs. Orbital, electron density, and electrostatic maps. Applicable to chemical reactivity modeling. BLogP and BLogW for prediction of octanol/water partition coefficient and water solubility. Personal CAChe for standard Macintosh. CAChe GroupServer for networking to IBM workstation. Tektronix enhanced Macintosh II workstation with RISC coprocessor and stereoscopic graphics.

CAMSEQ/M
Weintraub Software Associates, Inc.
P.O. Box 42577
Cincinnati, OH 45242, U.S.A.
Structure building, manipulation. Rigid conformational searching with interface to CAMSEQ/PC. Stick, ball-and-stick, and space-filling display. PC.
Chem3D Plus
Cambridge Scientific Computing Inc.
Dr. Stewart Rubenstein
875 Massachusetts Avenue, Suite 41
Cambridge, MA 02139, U.S.A.
Tel. 617-491-6862, e-mail [email protected]
Structure building, manipulation. Simple force field and MM2 energy minimization and molecular dynamics. Ball-and-stick and space-filling display. 2D to 3D conversion. ChemDraw for chemical structure drawing. Macintosh II and UNIX workstations.

ChemCad+
C-Graph Software, Inc.
P.O. Box 5641
Austin, TX 78763, U.S.A.
Tel. 512-459-3562
Structure building, manipulation. Van der Waals and electrostatic energy minimization by MM2 and MNDO. Stick or ball-and-stick display. Report generation, interface to ChemDraft for drawing chemical structures. Database of structures of compounds used in the development of the semiempirical methods in MOPAC and AMPAC. PC.

Chem-X
Chemical Design Inc.
200 Route 17 South, Suite 120
Mahwah, NJ 07430, U.S.A.
Tel. 201-529-3323, fax 201-529-2443 (U.S.A.), tel. 44-0223-251-483, fax 44-0865-250-270 (U.K.), tel. 81-03-3345-1411, fax 81-03-3344-3949 (Japan)
An integrated, modular system for molecular visualization and computation of organic, inorganic, peptide, and polymeric compounds. Stick, ball-and-stick, and space-filling representations. 386 and 486 PCs and Apple Macintosh II.

Desktop Molecular Modeller
Oxford Electronic Publishing
Oxford University Press
Walton Street
Oxford OX2 6DP, England, U.K.
Tel. 44-865-56767, x4278
Structure building, manipulation. Energy minimization. Stick, ball-and-stick, or space-filling display. PC.
HAMOG P.O. Box 1247 Birkenstrasse 1A Schwerte D-5840, Federal Republic of Germany Structure building, manipulation. Electrostatic potentials. Interfaces to ECEPP and MM2P. Stick, ball-and-stick, or space-filling display. PC. HyperChem Autodesk, Inc. 2320 Marinship Way Sausalito, CA 94965, U.S.A. Tel. 800-424-9737, fax 415-491-8311 (U.S.A.), tel. 41-38-337633 (U.K.) Model building, display, charge density, electrostatic potential, and molecular orbital plots. Stick, sphere, and dot surface display. 2D to 3D conversion. Protein and DNA fragment libraries. MM+, BIO+ (implementations of MM2 and CHARMM, respectively), OPLS, and AMBER molecular mechanics and dynamics. Solvent box. Semiempirical calculations by Extended Hückel, CNDO, INDO, MINDO/3, MNDO, AM1, and PM3. Originated at Hypercube, Inc. (Dr. N. Ostlund et al.), of Ontario, Canada. Runs under Windows on a 386 or 486 PC and under Motif on a Silicon Graphics workstation. MacMimic InStar Software AB IDEON Research Park S-223 70 Lund, Sweden Tel. 46-46-182470, fax 46-46-128022 (and) Dr. Anders Sundin Organic Chemistry 2 University of Lund P.O. Box 124 S-22100 Lund, Sweden Tel. 46-46-108214, fax 46-46-108209, e-mail
[email protected],
[email protected] Structure building, manipulation, comparison. Energy minimization by authentic MM2(91) force field, dihedral angle driver for structures with up to 200 atoms. Stick or ball-and-stick display, multiple structures in multiple windows, structures with up to 32,000 atoms. Macintosh II and Quadra with FPU math coprocessor. MicroChem Chemlab, Inc. 1780 Wilson Drive Lake Forest, IL 60045, U.S.A. Tel. 312-996-4816
Structure building, manipulation. Energy minimization of organic, inorganic, and polymer units. Stick, ball-and-stick, or space-filling display. QSAR Craig plots. PC. MOBY Springer-Verlag New York, Inc. Electronic Media Department 175 Fifth Avenue New York, NY 10010, U.S.A. Tel. 212-460-1653 Structure building, geometry optimization, dynamics, semiempirical calculations. AUTONOM for computerized assignment of chemical nomenclature to structures. PC. NEMESIS Oxford Molecular Ltd. The Magdalen Centre Oxford Science Park
Sandford-on-Thames
Oxford OX4 4GA, England, U.K. Tel. 44-0865-784600, fax 44-0865-784601 (U.K.), tel. 415-494-6274, fax 415-494-7140 (U.S.A.), tel. 81-33-243-5004, fax 81-33-245-5009 (Japan) Desktop molecular modeling on the Macintosh II. NEMESIS SAMPLER for PC and Macintosh II. PCMODEL Serena Software Dr. Kevin E. Gilbert P.O. Box 3076 Bloomington, IN 47402, U.S.A. Tel. 812-333-0823, 812-855-1302/9415, e-mail
[email protected] Structure building, manipulation. Energy minimization by MMX (an extension of MM2 and MMP1). Stick and dot surface display for organic, inorganic, organometallic, hydrogen-bonded, pi-bonded, and transition-state systems. Solvent dynamics. Structure files can be read and/or written for MM2, MOPAC, X-ray crystal data, and others. Companion MOPAC program. IBM DOS PC, Macintosh II, Silicon Graphics, and Sun versions.
Quantum Chemistry Calculations ATOM Project Seraphim Department of Chemistry University of Wisconsin Madison, WI 53706, U.S.A. Also, ATOMPLUS, H2ION, and GAUSS2 for educational uses.
HMO Trinity Software Dr. J. Figueras P.O. Box 960 Campton, NH 03223, U.S.A. Graphics-based Hückel molecular orbital calculator.
MOPAC QCPE Creative Arts Building 181 Indiana University 840 State Highway 46 Bypass Bloomington, IN 47405, U.S.A. Tel. 812-855-4784, fax 812-855-5539, e-mail
[email protected] Semiempirical molecular orbital package for optimizing geometry and studying reaction coordinates. Extensive library of more than 100 programs for quantum mechanics, molecular mechanics, and molecular graphics, including DRAW (a graphical complement to MOPAC), AMPAC, MM2, CNINDO/D, FORTICON8 (Extended Hückel), MNDO, HAM/3, POLYATOM, MOLVIEW, NAMOD, MOPC (orbital plots), STERIMOL, MOLYROO, MLDC8 (NMR analysis), and MOLVIB. PC and Macintosh II.
Databases of Molecular Structures STN EXPRESS Chemical Abstracts Service 2540 Olentangy River Road Columbus, OH 43210, U.S.A. Tel. 614-447-3600, e-mail
[email protected] Three-dimensional structures for over four million compounds from the Chemical Abstracts registry file generated through CONCORD.
Molecular Graphics and Other Applications Ball & Stick Cherwell Scientific Publishing 27 Park End Street Oxford, OX1 1HU, England, U.K. Tel. 44-0865-794884, fax 44-0865-794664 Import from molecular modeling packages, rotation. Macintosh.
chemVISION Molecular Arts Corporation 1532 East Katella Avenue Anaheim, CA 92805, U.S.A. Tel. 714-634-8100, fax 714-634-1999 3D rendering of molecules. PC under Windows. Kekulé Fein-Marquart Associates Inc. 7215 York Road Baltimore, MD 21212, U.S.A. Tel. 410-821-5980, fax 410-296-0712 Converts a computer-scanned chemical structure drawing into molfiles and SMILES. IBM 386 and 486 PCs. Kinemage Protein Science University of Washington, SJ-70 Seattle, WA 98195, U.S.A. Tel. 206-685-1039, fax 206-685-2674, e-mail
[email protected] PREKIN and MAGE by Dr. David C. Richardson for visualization of structures in Brookhaven PDB format from the journal Protein Science. Macintosh. MacImdad Molecular Applications Group Dr. Michael Levitt 880 Lathrop Drive Stanford, CA 94305-1503, U.S.A. Tel. 415-857-0903, fax 415-857-1754 Graphical system for small and macromolecule building, display, and animation. Macintosh II. METABOLEXPERT CompuDrug North America, Inc. Dr. Harold Borgstedt P.O. Box 23196 Rochester, NY 14692-3196, U.S.A. Tel. 716-292-6830, fax 716-292-6834 Prediction of metabolic products based on a library of known transformations. PC.
MOLCONN-X Hall Associates Consulting Dr. Lowell H. Hall Department of Chemistry Eastern Nazarene College Quincy, MA 02170, U.S.A. Computes topological indexes from molecular structures for use in QSAR analysis. PC and Macintosh. Also VAX version. Molecules Atlantic Software P.O. Box 299 Wenham, MA 01984, U.S.A. Tel. 508-922-4352 Builds and plots 3D structures. DNA/RNA Builder. Protein Predictor and N.N.Charge both based on neural network approach. Macintosh. NanoVision American Chemical Society Distribution Office P.O. Box 57136, West End Station Washington, DC 20037, U.S.A. Tel. 800-227-5558, 202-872-4363, fax 202-872-6067 A 3D visualization program that is capable of rotating molecules with up to 32,000 atoms, for the Macintosh II. Also, ACS is second party distributor for Alchemy, Chem3D Plus, and other software for PCs and Macintosh II. Promodeler New England BioGraphics Barnet, VT, U.S.A. Tel. 802-633-4344 Modeling macromolecules.
TOPMOST Health Designs, Inc. Dr. Kurt Enslein 183 East Main Street Rochester, NY 14604, U.S.A. Tel. 716-546-1464, fax 716-546-3411 (U.S.A.), tel. 44-379-644122, fax 44-379-651165 (U.K.) Calculation of electronic charges and related parameters by quick methods based on electronegativity. TOPKAT program for statistically modeling carcinogenicity, mutagenicity, skin and eye irritation, teratogenicity, and several other acute toxicity endpoints from their structures. TOPDRAW for graphical input. PC DOS. Also runs on DEC VAX.
SOFTWARE FOR MINICOMPUTERS, SUPERMINICOMPUTERS, WORKSTATIONS, AND SUPERCOMPUTERS Alliant; AT&T; Convex; Cray; DEC; Evans & Sutherland; Fujitsu; Hewlett-Packard; Hitachi; IBM; Intel; Kubota; NEC; Silicon Graphics; Star; Sun; others.
General-Purpose Molecular Modeling AMBER Dr. Peter A. Kollman Department of Pharmaceutical Chemistry University of California San Francisco, CA 94143, U.S.A. Tel. 415-476-4637, fax 415-476-0688, e-mail
[email protected] Assisted Model Building using Energy Refinement. Energy minimization, molecular dynamics, and free energy perturbation (FEP) calculations. SPASMS (San Francisco Package of Applications for the Simulation of Molecular Systems). VAX, Cray versions. Catalyst BioCAD Corporation 1390 Shorebird Way Mountain View, CA 94043, U.S.A. Tel. 415-903-3900, fax 415-961-0584 Computer-assisted molecular design based on three-dimensional disposition of chemical features defining a pharmacophore; bioactivities of hypothetical structures predicted based on their fit to models derived from known compounds. Silicon Graphics. CHARMM Dr. Martin Karplus Department of Chemistry Harvard University 12 Oxford Street
Cambridge, MA 02138, U.S.A. Tel. 617-495-4018, fax 617-495-1792, e-mail
[email protected] Molecular dynamics package using Chemistry at Harvard Macromolecular Mechanics force field. Extensive scripting language for molecular mechanics, simulations, solvation, electrostatics, crystal packing, vibrational analysis, free energy perturbation (FEP) calculations, quantum mechanics/molecular mechanics calculations, stochastic dynamics, and graphing data. Chem-X Chemical Design Inc. 200 Route 17 South, Suite 120 Mahwah, NJ 07430, U.S.A. Tel. 201-529-3323, fax 201-529-2443 (U.S.A.), tel. 44-0223-251-483, fax 44-0865-250-270 (U.K.), tel. 81-03-3345-1411, fax 81-03-3344-3949 (Japan) An integrated, modular system for molecular visualization, computation, and 3D database creation and searching for compounds in all areas of chemistry. The base system provides the molecular building and display, geometry and energy calculations. Other modules for proteins, AMBER interface, polymers, quantum mechanics, DGEOM interface, QSAR and statistical analysis, database management, and structural databases. Silicon Graphics, ESV, DECstation, IBM RS/6000, VAX, 386 and 486 PCs, and Apple Macintosh II. GROMOS Biomos B.V. Laboratory of Physical Chemistry University of Groningen Nijenborgh 16 9747 AG Groningen, The Netherlands Tel. 31-50-63-4329/4323/4320 Groningen molecular simulation system. SPC solvation model. PCMCAD for polymer/biopolymer mechanics. Insight/Discover BIOSYM Technologies, Inc. 9685 Scranton Road San Diego, CA 92121-2777, U.S.A. Tel. 619-458-9990, fax 619-458-0136 (U.S.A.), tel. 44-256-817-577, fax 44-256-817-600 (U.K.), tel. 81-04-7353-6997, fax 81-04-7353-6330 (Japan), e-mail
[email protected] Insight II, a graphics program for building, loop searching, manipulating, and analyzing molecules. Discover, a molecular mechanics and dynamics program. Insight Xpress is a subset for bench chemists. Delphi, calculation and visualization of Poisson-Boltzmann electrostatic potentials. Homology, construction of
proteins by structural homology. Polymer, modeling properties of polymers. NMRchitect, modeling based on NMR data. Aligned with Hare Research for NMR analysis. Apex for QSAR. Ludi for ligand design from receptor site geometry. Sketcher for 2D-to-3D conversion using distance geometry. Converter for 2D-to-3D conversion of structures in MACCS databases. VAX, Cray, and Silicon Graphics and IBM workstation versions. MacroModel Dr. W. Clark Still Department of Chemistry Columbia University New York, NY 10027, U.S.A. Tel. 212-280-2577, e-mail sl$model%cuchem.bitnet A user-friendly molecular modeling package for molecular mechanics and conformational searching of organic molecules, proteins, nucleic acids, and carbohydrates. AMBER-, MM2-, and MM3-like and OPLS force fields; implicit solvation model. Reads Cambridge and Brookhaven PDB files. VAX, Convex, Alliant, Cray, and workstations. MAD Aquitaine Systemes Tour Elf, 2 Place de la Coupole 92078 Paris la Defense, Cedex 45, France Tel. 33-1-4744-4082 Molecular Advanced Design systems for general molecular modeling. MAD TSAR for QSAR analysis, including computation of topological indices. IBM, Silicon Graphics, and Hewlett-Packard workstations. MM3 Technical Utilization Corporation, Inc. 235 Glen Village Court Powell, OH 43065, U.S.A. Tel. 614-885-0657 N. L. Allinger’s molecular mechanics program for energy minimization of organic molecules. Also MM2. MODEL Dr. Kosta Steliou Department of Chemistry University of Montreal Montreal, Quebec H3C 3J7, Canada Tel. 514-343-6219, fax 514-343-7586, e-mail
[email protected] Molecular modeling with AMBER-like and MM2 force fields. Batch conformational searching with BAKMDL. Interfaces to AMPAC, MacroModel, GAUSSIAN86, SYBYL, PCMODEL, CHEM-3D. VAX.
MOPAC QCPE Creative Arts Building 181 Indiana University 840 State Highway 46 Bypass Bloomington, IN 47405, U.S.A. Tel. 812-855-4784, fax 812-855-5539, e-mail
[email protected] Semiempirical molecular orbital package for optimizing geometry and studying reaction coordinates. Extensive library of more than 600 other programs from academia and industry for quantum mechanics, molecular mechanics, structure generation from NMR data, and molecular graphics, including DRAW, AMPAC, MNDOC, MNDO, FORTICON8 (Extended Hückel), CNINDO, CNDO/S, PCILO3 (Perturbative Configuration Interaction using Localized Orbitals), GAUSSIAN, PSI77 (orbital plots), DISGEO (distance geometry), ECEPP2 (Empirical Conformational Energy Program for Peptides), QCFF/PI, BIGSTRN3, DGEOM, TRIBL, MOLY-86, AMSOL, NOEL (molecular similarity), PRODEN, CHEMICALC-2 (log p), MSEED (solvent-accessible surface areas), CPKPDB, SIBFA (intermolecular interactions), PSDD (neural network simulator for drug design), and VOID (protein packing). Holdings include some IBM programs described in the MOTECC book series, such as CMAP (Chemical Modeling Applications Platform), HONDO, and KGNMOL. QCPE has also accepted the responsibility to distribute “semicommercial” academic software, such as MM2(91), MM3(92), POLYRATE, and PEFF. Most programs are in FORTRAN. Many of the programs run on several hardware platforms, including DEC (VAX), IBM, Silicon Graphics, Stardent, Fujitsu, and Cray.
PRO-EXPLORE BioStructure S.A. Parc d'Innovation 67400 Illkirch, France Tel. 33-88-678900, fax 33-88-679801 Sequence analysis and biomolecular modeling. PRO-SIMULATE for molecular simulations with GROMOS, AMBER, and OPLS force fields. PROQUANTUM for semiempirical (MOPAC) and ab initio (CADPAC) calculations via a graphical interface.
Prometheus Proteus Biotechnology Ltd. 48 Stockport Road Marple, Cheshire SK6 6AB, England, U.K. Tel. 44-061-426-0191 Protein model building based on artificial intelligence and energy minimization. PROPHET BBN Systems and Technologies Corporation 10 Moulton Street Cambridge, MA 02238, U.S.A. Tel. 617-873-2669, e-mail prophet-info@bbn.com Molecular building, molecular mechanics, simulations, and display. Statistical and mathematical modeling and display. Sequence analysis. Structural and sequence database retrieval. UNIX workstations, such as Sun, VAX (Ultrix), DECstations, and Macintosh IIfx (A/UX). QUANTA/CHARMm Molecular Simulations Inc. 200 Fifth Avenue Waltham, MA 02154, U.S.A. Tel. 617-890-2888, 408-732-9090, fax 617-890-8694, 408-732-0831 (U.S.A.), tel. 44-734-568-056, 44-223-421-590, fax 44-734-567-731, 44-223-421-591 (U.K.), tel. 81-3-3358-5261, fax 81-3-3358-5260 (Japan), e-mail
[email protected] Structure building, manipulation, energy minimization, molecular dynamics, Boltzmann jump Monte Carlo conformational searching, and protein homology building. QUANTA molecular graphics system is integrated with the CHARMm molecular dynamics software based on a force field derived from the Chemistry at Harvard Macromolecular Mechanics force field. X-PLOR for X-ray structure refinement and simulated annealing. QSPR-Polymer for property estimation. Receptor to set up 3D queries for MACCS or ISIS/3D databases and to visualize hits. BIOGRAF for biological applications with features for drugs, proteins, carbohydrates, lipids, and DNA/RNA. POLYGRAF for modeling polymers, materials, and solvents. NMRgraf for structure prediction with NMR data. CERIUS for modeling of polymeric, small molecular, and inorganic materials; statistical mechanical simulation; crystal modeling; diffraction and scattering simulation; and noncrystalline diffraction data processing. AVS ChemistryViewer for visualization of computational chemistry results. Products of Polygen, Molecular Simulations Inc., and Cambridge Molecular Design. Applicable to drugs, protein engineering, molecular biology, polymer
chemistry, and material science. Silicon Graphics, Cray, Sun, DEC, Alliant, Convex, Stardent, HP, and IBM workstation versions. SYBYL Tripos Associates 1699 Hanley Road, Suite 303 St. Louis, MO 63144-2913, U.S.A. Tel. 800-323-2960, 314-647-1099, fax 314-647-9241 (U.S.A.), tel. 44-344-300-144, fax 44-344-360-638 (U.K.), tel. 81-3-3711-1511, fax 81-3-3711-1704 (Japan), e-mail
[email protected] An integrated molecular modeling package with capabilities for molecular mechanics, conformational searching, minimization, semiempirical and ab initio molecular orbital calculations, molecular graphics, active analog approach, and molecular dynamics. Tripos, AMBER- and MM2-like force fields. Components for handling small molecules, biomolecules, and polymers. Interfaces to Cambridge Structural Database, Brookhaven Protein Databank, and QCPE programs. QSAR based on Comparative Molecular Field Analysis and interface to Daylight’s CLOGP and CMR. Molecular Spreadsheet for data management and analysis. N. L. Allinger’s MM3 and MM2(91) molecular mechanics programs for industrial customers. R. Pearlman’s CONCORD knowledge-based model builder for rapid generation of 3D databases from connectivity databases. T. Blundell’s COMPOSER for building proteins by homology. W. L. Jorgensen’s BOSS (Biochemical and Organic Simulation System) program for Monte Carlo simulations. Molecular Silverware for solvating molecules. R. Dammkoehler’s RECEPTOR for constrained conformational searching. NMR TRIAD for multidimensional data processing and structure determination. NMR1 and NMR2 of New Methods Research Inc. LabVision is a subset of SYBYL for bench chemists using ESV workstation. J. Brickmann’s MOLCAD for visualization with Gouraud-shaded and transparent surfaces on Silicon Graphics. NITRO terminal emulator for Macintosh and PC. X Windows for Macintosh, PC, and X terminals. VAX, Silicon Graphics, Evans & Sutherland, and Cray versions. WHAT IF Dr. Gerrit Vriend EMBL Meyerhofstrasse 1 6900 Heidelberg, Federal Republic of Germany Tel. 49-6221-387473, fax 49-6221-387517, e-mail
[email protected] Protein modeling package with molecular graphics, homology building, database searches, and options for NMR and X-ray-related work. VAX/PS300, E&S, and Silicon Graphics workstations, and Bruker.
Yeti Dr. Angelo Vedani Biographics Laboratory Swiss Institute for Alternatives to Animal Testing Aeschstrasse 14 CH-4107 Ettingen, Switzerland e-mail
[email protected] Molecular mechanics with special treatment of hydrogen bonding, solvation, and metal ions. Also Yak for receptor modeling based on directionality of potential binding points on a ligand. VAX, Silicon Graphics, and Evans & Sutherland.
Quantum Chemistry Calculations ACES II
Dr. Rodney J. Bartlett Quantum Theory Project 362 Williamson Hall University of Florida Gainesville, FL 32611-2085, U.S.A. Tel. 904-392-1597, fax 904-392-8722, e-mail
[email protected] Ab initio molecular orbital calculations using coupled-cluster and many-body perturbation theory methods. Argus Dr. Mark A. Thompson Molecular Science Research Center Pacific Northwest Laboratory P.O. Box 999, Mail Stop K1-87 Richland, WA 99352, U.S.A. e-mail
[email protected] Semiempirical (EHT, INDO1, INDO1/S, and NDDO1) and SCF calculations for spectroscopic properties. C language. Sun, HP, IBM workstations, Cray, and PC. ASTERIX Computer Physics Communications Program Library Queen’s University of Belfast Belfast, Northern Ireland, U.K. (and) Dr. Marie-Madeleine Rohmer Laboratoire de Chimie Quantique
Institut Le Bel 4, rue Blaise Pascal F-67000 Strasbourg, France Tel. 33-88-41-61-42, fax 33-88-61-20-85, e-mail
[email protected] Ab initio calculations for large organometallic and other compounds. FORTRAN programs designed for Cray supercomputers. CADPAC Lynxvale WCIU Programs Dr. Roger Amos 20 Trumpington Street Cambridge CB2 1QA, England, U.K. Tel. 44-223-336384 Cambridge Analytical Derivatives Package. Ab initio calculations. Cray and other versions. CHELPG Dr. Curt M. Breneman Department of Chemistry Rensselaer Polytechnic Institute Troy, NY 12180, U.S.A. Tel. 518-276-2678, e-mail
[email protected] Computes electrostatic potential-derived charges from ab initio wavefunctions generated by one of the Gaussian 86/88/90 packages. This program is a modification of CHELP by Dr. Lisa E. Chirlian and Dr. Michelle M. Francl. UNIX and VMS machines. COLUMBUS Program System Dr. Isaiah Shavitt Dr. Russell M. Pitzer Department of Chemistry Ohio State University Columbus, OH 43210, U.S.A. Tel. 614-292-1668, fax 614-292-1685, e-mail
[email protected],
[email protected],
[email protected],
[email protected] Modular FORTRAN programs for performing general ab initio multireference single- and double-excitation configuration interaction, averaged coupled-pair functional and linearized coupled-cluster method calculations. Cray and other versions. DMol BIOSYM Technologies, Inc. 9685 Scranton Road
San Diego, CA 92121-2777, U.S.A. Tel. 619-458-9990, fax 619-458-0136 (U.S.A.), tel. 44-256-817-577, fax 44-256-817-600 (U.K.), tel. 81-04-7353-6997, fax 81-04-7353-6330 (Japan), e-mail
[email protected] Local density functional (LDF) quantum mechanical calculations for materials science. Turbomole for Hartree-Fock and MP2 ab initio calculations. Silicon Graphics and IBM workstation versions. GAMESS Dr. Michael Schmidt Department of Chemistry North Dakota State University 1301 Twelfth Avenue North Fargo, ND 58105, U.S.A. Tel. 701-237-7966, e-mail mischmid@ndsuvml.bitnet, mischmid@vml.nodak.edu General Atomic and Molecular Electronic Structure System. Ab initio calculations. Cray and other versions. Gaussian Gaussian, Inc. Dr. Michael Frisch 4415 Fifth Avenue Pittsburgh, PA 15213, U.S.A. Tel. 412-621-2050, fax 412-621-3563, e-mail
[email protected] Gaussian 92. Ab initio molecular orbital calculations (Hartree-Fock, Direct HF, Møller-Plesset, CI, Reaction Field Theory, electrostatic potential-derived charges, vibrational frequencies, etc.). Input and output of molecular structures in formats of many other molecular modeling systems. Browse for archival storage of computed results. VAX, Cray, DEC-RISC (Ultrix), Fujitsu (UXP/M), Kubota, IBM RS/6000, Multiflow, Silicon Graphics, Sun, and other versions. Gaussian 90 for Convex, FPS-500, Fujitsu (MSP), IBM (VM, MVS), HP-700, and NEC SX/3 systems. GRADSCF Polyatomics Research Institute Dr. Andrew Komornicki 1101 San Antonio Road, Suite 420 Mountain View, CA 94043, U.S.A. Tel. 415-964-4013 Ab initio calculations. Cray and other versions.
HONDO IBM Dr. Michel Dupuis Department 48B, Mail Stop 428 Kingston, NY 12401, U.S.A. Tel. 914-385-4965, e-mail
[email protected] Ab initio calculations for IBM 3090 and other models. KGNMOL Dr. Enrico Clementi Centro di Ricerca, Sviluppo, e Studi Superiori in Sardegna (CRS4) Casella Postale 488 09100 Cagliari, Italy Tel. 39-70-279-62-231, fax 39-70-279-62-220, e-mail
[email protected],
[email protected] Ab initio calculations. ATOMSCF, BROWNIAN, KGNGRAF, KNGMD, ALCHEMY-II, HONDO-8, MELD, and other programs described in MOTECC book series (E. Clementi, Ed., 1989-1991, ESCOM, Leiden). IBM machines under VM, MVS, and AIX operating systems. PSI88 Dr. W. L. Jorgensen Dr. D. L. Severance Yale University P.O. Box 6666 New Haven, CT 06511, U.S.A. Tel. 203-432-6288, fax 203-432-6144, e-mail
[email protected] Plots of wavefunctions in three dimensions from semiempirical and popular ab initio basis sets. Silicon Graphics, Sun, VAX, Cray, and others. SPARTAN Wavefunction, Inc. Dr. Warren J. Hehre 18401 Von Karman, Suite 210 Irvine, CA 92715, U.S.A. Tel. 714-955-2120, fax 714-955-2118 Ab initio (Hartree-Fock, Møller-Plesset, direct HF), semiempirical (MNDO, AM1, PM3), and molecular mechanics. Graphical front-end and post processor of the output. Cray, Convex, DEC, HP, IBM, and Silicon Graphics versions. UniChem Cray Research, Inc. Cray Research Park 655 Lone Oak Drive
Eagan, MN 55121, U.S.A. Tel. 612-683-3688, fax 612-683-3099, e-mail
[email protected] DGauss for density functional theory calculations with nonlocal, SCF corrections, and geometry optimization. Cadpac 5.0 for ab initio calculations. MNDO90 for semiempirical molecular orbital calculations. A package with a graphics front end for structure input and visualizations of electron density, electrostatic potentials, and molecular orbitals. Silicon Graphics and Macintosh (under X-Windows) networked to a Cray. ZINDO Dr. Michael C. Zerner Quantum Theory Project Department of Chemistry Williamson Hall University of Florida Gainesville, FL 32611, U.S.A. Tel. 904-392-0541 A general semiempirical molecular orbital package including parameters for transition metals and for spectroscopy.
Databases of Molecular Structures BLDKIT Protein Data Bank Chemistry Department, Building 555 Brookhaven National Laboratory Upton, NY 11973, U.S.A. Tel. 516-282-3629, fax 516-282-5751, e-mail
[email protected],
[email protected] Model builder’s kit. BENDER for bent wire models. CONECT generates full connectivity from atomic coordinates in Brookhaven database. DGPLOT for diagonal plots on printer. DIHDRL for torsional angles. DSTNCE for interatomic distances. FISIPL for phi/psi plots. NEWHEL92 for helix parameters. Database of more than 800 sets of atomic coordinates of proteins and other macromolecules derived from X-ray crystallography, modeling, and NMR. CAVEAT Dr. Paul A. Bartlett Department of Chemistry University of California Berkeley, CA 94720, U.S.A. Tel. 415-642-1259, fax 415-642-8369 Searching Cambridge database for molecules with specified bond vectors.
Chem-X Chemical Design Inc. 200 Route 17 South, Suite 120 Mahwah, NJ 07430, U.S.A. Tel. 201-529-3323, fax 201-529-2443 (U.S.A.), tel. 44-0223-251-483, fax 44-0865-250-270 (U.K.), tel. 81-03-3345-1411, fax 81-03-3344-3949 (Japan) ChemCore module to three-dimensionalize 2D structures, interfaces to reformat MACCS, SMILES, or DARC-2D databases, ChemDBS-1 module to build 3D databases, and ChemDBS-3D module to search 3D databases. Database searching accounts for conformational flexibility while storing only one conformation. Chapman & Hall’s 3D Dictionary of Drugs (12,000 medicinally interesting compounds), 3D Dictionary of Natural Products (50,000 antibiotics, alkaloids, and terpenoids), and 3D Dictionary of Fine Chemicals (105,000 organics). COBRA Oxford Molecular Ltd. The Magdalen Centre Oxford Science Park
Sandford-on-Thames
Oxford OX4 4GA, England, U.K. Tel. 44-0865-784600, fax 44-0865-784601 (U.K.), tel. 415-494-6274, fax 415-494-7140 (U.S.A.), tel. 81-33-245-5004, fax 81-33-245-5009 (Japan) Constructs multiple conformers from a library of 3D fragments and rules (update of WIZARD); accepts SMILES notation input. Iditis is a relational database of protein structures from the Brookhaven Protein Data Bank. Serratus is a nonredundant database of amino acid sequences from NBRF-PIR, SWISSPROT, and GenBank. Asp for molecular similarity comparisons. CONSTRICTOR for distance geometry. CAMELEON for protein sequence alignment. Antibody Modelling for building variable fragments and energy-refining them with EUREKA and the Pimms molecular modeling system. Python for QSAR spreadsheet and statistics on HP and Silicon Graphics. DAYMENUS Daylight Chemical Information Systems, Inc. 18500 Von Karman Avenue, Suite 450 Irvine, CA 92715, U.S.A. Tel. 714-476-0451, fax 714-476-0654 Chemical information platform for integration of chemical software tools including nomenclature (SMILES), 2D and 3D structural database management, similarity searching, graphic display, geometry, and modeling. THOR chemical information databases. POMONA database of 25,000 compounds and their properties. GEMINI for molfile conversions. Castor for managing a database
on a workstation with structure entry via ChemDraw and STN Express. Interfaces to programs for predicting lipophilicity (CLOGP) and molar refractivity (CMR), generating single (CONCORD) and multiple (COBRA) conformations via rules, molecular surface area/volume (SAVOL2), 3D database searching (ALADDIN), and molecular descriptor generation (TOPMOST). ISIS Molecular Design Ltd. 2132 Farallon Drive San Leandro, CA 94577, U.S.A. Tel. 510-895-1313, fax 510-352-2870 (U.S.A.), tel. 41-61-4812180, fax 41-61-4812721 (Switzerland), tel. 81-06-949-0476, tel. 81-06-241-4701 (Japan) Integrated Scientific Information System for management of databases of 2D and 3D structures and associated properties on multiple platforms. MS-DOS, Macintosh, and Fujitsu FMR terminal support of ISIS/Draw and /Base. MACCS 2.0 for managing databases of 2D and 3D structures on a single platform. 3D searches of structures in fixed conformations. Databases of structures three-dimensionalized by CONCORD, including CMC-3D of known pharmaceutical agents mentioned in Comprehensive Medicinal Chemistry (5000 medicinally interesting compounds; C. Hansch et al., 1990, Pergamon Press, Elmsford, NY), FCD-3D from the Fine Chemical Directory (57,000 commercial chemicals), and MDDR-3D from the Drug Data Reports (12,000 drugs under development). Cambridge Structural Database will be in MACCS format. VAX, IBM, and other superminicomputers and mainframes. QUEST Cambridge Crystallographic Data Centre 12 Union Road Cambridge CB2 1EZ, England, U.K. Tel. 44-0223-336408, fax 44-0223-312288, e-mail
[email protected] Data retrieval and analysis for the Cambridge Structural Database with about 100,000 X-ray structures of low-molecular-weight organic and organometallic compounds. BUILDER converts structures to CSD format. PLUTO for molecular graphics. GSTAT for generation of molecular geometry. The CSD is also to be made available in MACCS format. VAX, Silicon Graphics, and others. SYBYL/3DB Tripos Associates 1699 Hanley Road, Suite 303 St. Louis, MO 63144-2913, U.S.A. Tel. 800-323-2960, 314-647-1099, fax 314-647-9241 (U.S.A.), tel. 44-344-300-144, fax 44-344-360-638 (U.K.), tel. 81-3-3711-1511, fax 81-3-3711-1704 (Japan), e-mail
[email protected] Combines 2D and 3D searching and storage with other molecular design tools. Searches Cambridge Structural Database, Chemical Abstracts Service registry file, or any MACCS database. POSSUM and PROTEP for searching databases for structural motifs. CONCORD for rapid generation of a single, high-quality conformation from connectivity of a small molecule. VAX, UNIX workstations, and Macintosh and PC under X-Windows.
Molecular Graphics and Other Applications DOCK Dr. Irwin D. Kuntz Department of Pharmaceutical Chemistry School of Pharmacy University of California San Francisco, CA 94143-0446, U.S.A. Tel. 415-476-1397 Samples the six degrees of freedom involved in the relative placement of two three-dimensional rigid structures and scores their fit. Companion programs SPHGEN, DISTMAP, and CHEMGRID. Silicon Graphics. FITIT Dr. Douglas A. Smith Department of Chemistry University of Toledo Toledo, OH 43606-3390, U.S.A. Tel. 419-537-2116, fax 419-537-4033, e-mail
[email protected] Outputs molecule structure files in formats readable by MM2, MM3, MOPAC, AMPAC, MacroModel, and other programs. XDRAW for displaying input and output of MOPAC. BOLTZMANN for conformer populations. VAX, UNIX, and DOS versions. FRODO Dr. Florante A. Quiocho Howard Hughes Medical Institute Baylor College of Medicine One Baylor Plaza Houston, TX 77030, U.S.A. Tel. 713-798-6565, fax 713-797-6718, e-mail
[email protected] Molecular graphics and crystallographic applications. Evans & Sutherland. CHAIN is a newer, supported program for electron density fitting and molecular graphics that runs on Evans & Sutherland (PS300s and ESVs) and Silicon Graphics.
GRID Molecular Discovery Ltd. Dr. Peter Goodford West Way House Elms Parade Oxford OX2 9LL, England, U.K. Display and nonbonded force field probe for sites of interaction between small molecules/functional groups and rigid protein structures. VAX and Evans & Sutherland. Midasplus Dr. Robert Langridge Department of Pharmaceutical Chemistry University of California San Francisco, CA 94143, U.S.A. Tel. 415-476-2630, fax 415-476-0688, e-mail
[email protected] Real-time interactive vector, space-filling, and ribbon displays. Silicon Graphics. OpenMolecule Andataco Computer Peripherals 9550 Waples Street, Suite 105 San Diego, CA 92121, U.S.A. Tel. 619-453-9191, 800-334-9191, fax 619-453-9294, e-mail daryl%
[email protected] Molecular graphics for a Sun SPARCstation. SIMCA-R Umetri AB Box 1456 S 901 24 Umea, Sweden Tel. 46-90-196890, fax 46-90-197685 Data handling, statistical modeling (projection of latent structures, principal components analysis), and plotting for QSAR. VAX and PC.
Reviews in Computational Chemistry, Volume 3 Edited by Kenny B. Lipkowitz, Donald B. Boyd Copyright © 1992 by John Wiley & Sons, Inc.
Author Index Aarts, E. H. L., 65, 67 Acharya, K. R., 141 Acton, F. S., 64 Adams, S. M., 219 Adamson, G. W., 213 Adesnik, M., 216 Alagona, G., 132 Al-Baali, M., 68 Albert, A., 212 Alden, C. L., 213 Alexander, L. S., 217 Allen, B., 211 Allinger, N. L., 65, 132 Altman, R. B., 168 Altmann, K. H., 137 Amdur, M. O., 211 Ames, B. N., 211, 212 Andersen, K. V., 170 Anderson, A., 134 Anderson, B., 212 Anderson, E., 71 Anderson, R., 213 Andreatta, R. H., 137 Andreoli, C., 217, 218 Anfinsen, C. B., 130 Archakov, A. I., 217 Arcos, J. C., 217 Argus, M. F., 217 Arnold, D. L., 213 Arseniev, A. D., 167 Ashby, J., 212, 221
Ashida, T., 139 Auffinger, P., 140 Axelsson, O., 68 Bachem, A., 67 Bachmanova, G. I., 217 Bai, Z., 71 Bakale, G., 218 Balaban, A. T., 221 Balasubramanian, K., 219, 221 Balcerski, J. S., 137 Baleja, J. D., 170 Balls, M., 211, 216 Bandiera, S., 221 Banks, K. M., 170 Baptist, P., 68 Barker, J. A., 136 Bartmann, W., 140 Bash, P. A., 135 Bassolino, D. A., 171 Batt, A. M., 219 Bavoso, A., 137 Bawden, D., 166, 213 Beamand, J. A., 218 Belik, A. V., 213 Benedetti, E., 137 Benford, D. J., 212 Benigni, R., 217, 218 Bentley, D. L., 212 Berendsen, H. J. C., 167, 168, 170, 171, 172
Berger, G. D., 219 Berne, B. J., 66 Bernstein, F. C., 142 Beveridge, D. L., 65, 66 Bickers, D. R., 217 Billeter, M., 167, 169, 170 Birnbaum, L. S., 221 Bischof, C., 71 Bjork, A., 65 Blaaboer, B. J., 222 Blake, B. W., 214 Blake, C. C. F., 221 Blaney, J. M., 221 Blumberg, P. M., 222 Blumenthal, L. M., 167 Blundell, T. L., 142 Bodenhausen, G., 166 Boelens, R., 167, 168, 169, 170, 172 Boggs, P. T., 65, 66, 69 Bohachevsky, I. O., 67 Bolin, D. R., 169 Bonora, G. M., 137 Bonvin, A. M. J. J., 170, 172 Borgias, B. A., 169 Borgstedt, H. H., 214 Borman, S., 213 Bosch, C., 139, 167 Bovey, F. A., 138 Boyd, D. B., 66, 130, 136, 167, 215, 220, 225 Bragg, J. K., 137 Brasseur, R., 215, 222 Braun, W., 139, 167, 168, 169, 170, 171, 172 Brice, M. D., 142 Bridges, J., 211, 216 Bridges, J. W., 212 Brimblecombe, R. W., 221 Brinck, T., 221 Briner, W., 221 Brooks, B. R., 65, 132, 167 Brooks, C. L., III, 134, 170 Brown, B. C., 136 Brown, L. P., 218 Brown, L. R., 139, 167 Broyde, S., 65, 66, 67 Bruccoleri, R. E., 65, 131, 132, 167 Brunger, A. T., 140, 170, 171, 172 Buckley, A., 69
Bunch, J. R., 68 Bures, M. G., 130 Burgess, A. W., 131, 134, 141, 167 Burke, K. E., 140 Burkert, U., 65 Burnett, C. M., 212 Burridge, J. M., 221 Burt, S., 219 Busson, R., 215, 222 Buu-Hoi, N. P., 217 Byrd, R. H., 65, 66, 69 Bystrov, V. F., 167
Callahan, T., 215 Carr, P. L., 213 Carr, R., 171 Carrier, J., 65 Carroll, J., 211 Carruthers, L. M., 133 Carson, M., 134 Case, D. A., 65, 132, 170 Caspary, W., 212 Cauchy, A., 67 Chae, K., 221 Chaet, D., 219 Chandrasekhar, J., 134 Cheng, B., 134 Cherayil, B. J., 140 Chin, S., 137 Choplin, F., 213 Chothia, D., 137 Chou, J. T., 214 Chou, K.-C., 130, 137 Choudhary, I., 141 Chow, T.-T., 65 Ciarlet, P. G., 65 Claes, P. J., 215, 222 Clardy, J., 141 Clementi, E., 137 Clore, G. M., 140, 166, 167, 170, 171, 172 Cohen, A., 68 Cohen, F. E., 130 Coleman, R. E., 221 Collins, J. R., 220 Concus, P., 68 Conn, A. R., 69 Conners, T. A., 221 Conney, A. M., 217
Connolly, M. L., 221 Coon, M. J., 216 Corbett, J. F., 212 Costello, R. J., 215 Cramer, R. D., 215 Crippen, G. M., 133, 138, 139, 166, 167, 168, 171 Crisma, M., 137 Crothers, D. M., 169 Crowder, H. P., 68 Crump, K., 211 Cui, W., 140 Curtis, A. R., 68 Cuthbertson, A. F., 222 Czaplicki, J., 168 Czerwinski, M., 214 Dahlquist, G., 65 Dai, Q., 213 Dai, S., 220 Darden, T., 221 Dauber-Osguthorpe, P., 132 Davidon, W. C., 67 Davis, A., 215 Dayan, A. D., 221 de Boer, W. I., 222 de Kruif, C. A., 222 de Vlieg, J., 168, 172 Dearden, J. C., 212 Deisenhofer, J., 140 Dekkers, A., 67 Dell’Aquila, M. L., 222 Dembo, R. S., 69 Demmel, J., 71 Dennis, J. E., Jr., 65, 67, 69 Denny, W. A., 214 Denomme, M. A., 221 Denton, M. E., 135 Dert, C. L., 66 Deuflhard, P., 70 Devillers, J., 212 Di Blasio, B., 137 Dill, K. A., 132 Dimayuga, M. I., 215 DiNola, A., 170, 171 Dinur, U., 130 Dixon, L. C. W., 67 Dodson, E. J., 138 Dongarra, J., 71
Doniach, S., 140 Donnelly, R. A., 171 Doull, J., 211 Du Croz, J., 71 Dudek, M. J., 131 Duff, I. S., 68 Dugard, P., 213 Dunfield, L. G., 134, 167 Dunn, W. J., 212, 214 Dupuis, M., 137 Dygert, M., 131 Earnshaw, C. G., 215 Eaton, H. L., 172 Edalji, R., 172 Edholm, O., 171 Egan, D. A., 172 Eisenberg, D., 134 Eisenstat, S. C., 69, 70 Ellar, D. J., 211 Ennever, F. K., 212 Enslein, K., 214 Erenrich, E. H., 137 Erisman, A. M., 68 Ernst, R. R., 166 Eskow, E., 66, 70 Estabrook, R. W., 216 Evans, D. J., 137 Evans, J. D., 68 Evans, P., 221 Falcomer, C. M., 141 Falke, H. E., 222 Fauchère, J. L., 218 Fauci, L., 70 Fawkes, J., 221 Ferrel, J. E., 219 Ferrin, T. E., 221 Fesik, S. W., 170, 172 Feuer, G., 217 Feyereisen, R., 216 Fine, R. M., 131 Fingueroa, S., 66 Fitzwater, S., 141 Flannery, B. P., 140 Fletcher, R., 65, 67, 136 Flint, O. P., 218 Flory, P. J., 135 Floudas, C. A., 65
Fogelson, A., 70 Ford, M. G., 213 Forman, J. D., 169 Fossey, S. A., 140 Frazier, J. M., 211 Freudenthal, R., 220 Frierson, M. R., 214, 220 Fry, D. C., 169 Fu, Y.-C., 133 Fujii-Kuriyama, Y., 216 Fujita, T., 221 Gampe, R. T., Jr., 170, 172 Gangolli, S. D., 218, 222 Gans, D. J., 214 Gay, D. M., 136 Gehring, W., 168 Gelatt, C. D., Jr., 67, 139, 171 Gemmecker, G., 172 Genest, M., 132 George, A., 68, 70 Gerhards, J., 213 Ghio, C., 132 Gibson, G. G., 212, 216, 218 Gibson, K. D., 130, 132, 134, 136, 137, 138, 140, 141 Gibson, W. B., 213 Gilbert, J. C., 68, 69 Gilbert, S. G., 213 Gill, P. E., 64, 65, 67, 69, 136 Giuliani, A., 217, 218 Glasser, L., 138 Glover, I., 142 Go, M., 130, 134, 135, 138 Go, N., 130, 131, 134, 135, 138, 139, 167, 168 Goff, H. M., 217 Gold, L. S., 212 Goldberg, A. M., 211 Goldberg, L., 212 Goldblum, A., 219 Golub, G. H., 65, 67, 68 Gomez, S., 66 Gonzales, C., 172 Gonzalez, F. J., 216 Gorrod, J., 218 Gould, N. I. M., 69 Gouterman, M., 219
Grabow, B. S., 71 Grant, J. A., 134 Gray, T. J. B., 212, 218 Greeley, D. N., 169 Greenbaum, A., 71 Greengard, L., 65 Gregg, E. C., 218 Grice, H. C., 213 Griesinger, C., 169 Griewank, A., 67, 69 Griffiths, V. S., 219 Grimberg, H., 214 Gronenborn, A. M., 140, 166, 167, 170, 171, 172 Gros, P., 172 Grotschel, M., 67 Guengerich, F. P., 216 Gunsalus, I. C., 216 Güntert, P., 168, 170 Gursky, M. C., 70 Guseva, V. V., 213 Haak, J. R., 170 Habazettl, J., 170 Hadzi, D., 213 Hagler, A. T., 130, 132 Hall, L. H., 215 Hammarling, S., 71 Haneef, I., 142 Hansch, C., 212, 220 Hansen, J. P., 135 Hao, M.-H., 67 Hard, G. C., 213 Hare, D. R., 168, 170, 172 Hariharan, P. C., 219 Hart, J. B., 214 Hart, R. W., 211 Harvey, S. C., 66, 68, 130, 134, 170 Hasan, M. N., 214 Haseman, J. K., 211, 212 Hasinoff, B. B., 219 Havel, T. F., 167, 168, 171, 172 Hayashi, K., 168 Hayden, T. L., 66 Heath, M. T., 70 Hegarty, A. F., 221 Helfrich, R., 172 Henderson, D., 136
Henry, D. R., 214 Hermans, J., 134, 167 Hermens, J., 213 Hesselink, F. T., 138 Hestenes, M. R., 67 Hillstrom, K. E., 71 Hingerty, B. E., 65, 66, 67 Hirst, D. M., 222 Hjelmeland, L., 219 Hochlowski, J., 172 Hodder, K. D., 218 Hoerger, F. D., 211 Hoffmann, R., 133 Hoffmann, R. E., 172 Holak, T. A., 169, 170 Holzman, T. F., 172 Hong, S. D., 135 Honig, B., 134 Hoover, W. G., 136 Hopfinger, A. J., 214, 217 Hu, Y. F., 68 Huff, J. E., 211 Hull, S. E., 138 Hull, W. E., 169 Impey, R. W., 134 Inagaki, F., 168 Ingwall, R. T., 133 Ioannides, C., 215, 217, 218 Iri, M., 66
Jackson, M., 172 Jaeckh, R., 213 James, M. N. G., 131 James, T. L., 168, 169 Jardetzky, O., 168, 169 Jeffery, A. M., 222 Jenner, J., 166 Jerina, D. M., 222 Jerman-Blazic, B., 213 Jhon, M. S., 135 Johnson, E. F., 216 Johnson, H. L., 219 Johnson, K. W., 136 Johnson, M. E., 67 Jones, P. W., 220 Jones, T. A., 168 Joran, S., 221
Jorgensen, E. C., 221 Jorgensen, W. L., 134, 135 Joubert, W. D., 70 Jurs, P. C., 214, 222 Kagi, J. H. R., 268 Kakudo, M., 139 Kaliszan, R., 217 Kalk, A., 168 Kallen, J., 172 Kalopissis, G., 221 Kalos, A. N., 214 Kaminsky, L. S., 219 Kang, Y. K., 134 Kaptein, R., 167, 168, 169, 170, 172 Karcher, W., 212 Karlsson, R., 138 Karplus, M., 65, 131, 132, 134, 167, 170, 171, 172 Kashuba, K. L., 133 Kaufman, J. J., 219, 221 Kawai, H., 140 Keepers, J. W., 169 Kemper, B., 216 Kennard, O., 142 Kessler, H., 169 Kettler, P. C., 69 Kidera, A., 142 Kier, L. B., 215, 222 Kikuchi, T., 140, 141 Kim, D., 220 Kim, Y., 169, 170 Kincaid, D. R., 70 Kinch, R. J., 137 Kirchner, R. F., 219 Kirkjian, E., 219 Kirkpatrick, S., 67, 139, 171 Kitchen, D. B., 171 Kitteringham, N. R., 216 Klaassen, C. D., 211 Klausner, R., 66 Klein, M. I., 134 Kline, A. D., 168 Klopman, G., 214, 215, 220, 221 Koehler, K. F., 222 Koetzle, T. F., 142 Kohda, D., 168 Kollman, P. A., 65, 132, 135, 169
Kondakov, V. I., 167 Koning, T. M. G., 169 Konishi, Y., 135, 141, 142 Korst, J., 65 Korte, G., 67 Korzekwa, K., 219 Koski, W. S., 219, 221 Kostrowicki, J., 66, 139, 140 Kouno, K., 220 Kozyreva, N. P., 213 Kraulis, P. J., 168 Krewski, D. R., 213 Kubinyi, H., 213 Kumar, A., 166 Kuntz, I. D., 167 Kurkjian, E., 219 Kuszewski, J., 172 Lake, B. G., 212, 213, 218 Lambert, M. H., 132, 142 Lamparczyk, H., 217 Landau, L. D., 140 Langridge, R., 135, 221 Laurence, P. R., 220 Lautz, J., 169 Lavery, R., 65, 66 Le Dimet, F. X., 69 Le Page, C., 219 Leach, A. R., 66, 130 Leach, S. J., 132, 137 Lebedeva, M. N., 213 Lee, H. K., 167 Lemarechal, C., 69 LeNir, A., 69 Leo, A. J., 220 Lesyng, B., 68 Levesque, D., 135 Levin, W., 216 Levinthal, C., 131 Levitt, M., 68 Levy, A. V., 66 Levy, R. M., 171 Lewis, D. F. V., 212, 215, 216, 217, 218, 219, 222 Lewis, M., 141 Lewis, P. N., 131, 133, 137, 141 Li, J., 211 Li, Z., 136, 138, 139, 171
Lifshitz, E. M., 140 Lifson, S., 65, 130 Lin, S. H., 135 Lindskog, G., 68 Lipkowitz, K. B., 66, 130, 136, 167, 215, 220, 222, 225 Lipnick, R. L., 212, 213 Liskamp, R. M. J., 222 Liu, D. C., 69 Liu, J. W., 68, 70 Livingstone, D. J., 213 Loew, G. H., 219, 220, 221 Long, G., 219 Loper, J. C., 216 Lotan, N., 133 Louie, S. G., 139 Lowe, J. P., 219 Lu, S., 171 Ludvigsen, S., 170 Luenberger, D. G., 64 Lui, C., 220 Luke, B. T., 220 Lybrand, T. P., 136 Lyerly, M. A., 221 Macdonald, T. L., 216 Mace, J. E., 169 MacElroy, R., 219 Macina, O. T., 214 Macura, S., 166 Madison, V. S., 169 Madrid, M., 169 Madura, J. D., 134 Mager, P. P., 213 Maigret, B., 134 Main, P., 138 Maiorov, V. N., 167 Maksic, M. E., 130 Maksic, Z. B., 130 Margolin, B. H., 212 Markovitz, S., 215 Marsh, M. M., 222 Marshall, G. R., 215 Marsmann, M., 213 Martin, Y. C., 130 Mason, G., 221 Matthies, H., 70 Mattice, W. L., 135
Maynard, A. T., 222 McCammon, J. A., 66, 68, 130, 135, 170, 171 McCann, J., 211 McConnell, E. E., 221 McCreary, R. D., 218 McFarland, J. W., 214 McGuire, R. F., 131, 133 McKenney, A., 71 McKinney, J. D., 219, 221, 222 McLachlan, A. D., 134 McQuie, J. R., 131 Mead, R., 136 Mehler, E. L., 213 Meinwald, Y. C., 141 Meirovitch, H., 135 Mertz, J. E., 170 Metropolis, N., 67, 136, 171 Metzler, W. J., 172 Meyer, E. F., Jr., 142 Mezei, M., 66 Milburn, P. J., 141 Miller, M. H., 138, 139 Mingeot-Leclercq, M.-P., 215, 222 Minor, R., 212 Mirau, P. A., 138 Mohammed, S. N., 217 Momany, F. A., 131, 132, 133, 137, 141 Montelione, G. T., 141 More, J., 67, 69, 71 Morrison, R., 168 Moskowitz, J. W., 140 Moss, D., 142 Moult, J., 131, 170 Müller, A., 169 Müller, M., 168 Murray, J. S., 220, 221 Murray, M., 216 Murray, W., 64, 65, 67, 69, 136 Namboodiri, K., 214 Narvaez, J. N., 214 Nash, S. G., 69, 70 Navon, I. M., 69 Nayeem, A., 134, 140 Nebert, D. W., 216 Nelder, J. A., 136 Nelson, D. R., 216
Némethy, G., 130, 131, 132, 133, 134, 135, 137, 138, 139, 140, 141 Nemhauser, G. L., 65, 66, 67 Neri, P., 172 Ng, E., 70 Nguyen, D. T., 65, 132 Nguyen, M. T., 221 Nice, E. C., 141 Nilges, M., 140, 170, 171, 172 Niu, G. C.-C., 131 Nocedal, J., 68, 69 Noggle, J. H., 166 Noonan, T. J., 212 Northrup, S. H., 171 Nouailler, A., 69 Novellino, E., 220 Nyberg, A., 66 Oatley, S., 221 Oka, M., 142 Okamoto, Y., 140 Okey, A. B., 221 Okuyama, K., 139 Olafson, B. D., 65, 132, 167 O’Leary, D. P., 68, 69 Olejniczak, E. T., 170, 172 Olson, W. K., 66, 67, 167 Olszewski, K. A., 140 O’Neil, J., 70 Ono, Y., 220 Oobatake, M., 134 Ooi, T., 132, 134, 137, 138, 141, 142 Oppe, T. C., 70 Oppenheim, I., 130 Oppenheimer, N. J., 168 Ortega, J. M., 70 Orton, T. C., 218 Osguthorpe, D. J., 132 Ostrouchov, S., 71 Otting, G., 168 Overton, M., 65, 68 Owicki, J. C., 136 Pachter, R., 168 Pack, G. P., 219 Paine, G. H., 139 Palmer, K. A., 131, 132, 134 Pardalos, P. M., 65
Pardi, A., 170, 171, 172 Park, B. K., 216 Parkanyi, C., 220 Parke, A. L., 218 Parke, D. V., 215, 216, 217, 218, 221 Parrinello, M., 171 Paterlini, G., 132 Paterson, Y., 137 Patey, G. N., 136 Paton, D., 212 Pavone, V., 137 Pearlman, D. A., 169 Pearson, J. D., 136 Pedersen, L. G., 219, 221, 222 Pedone, C., 137 Peer, W. J., 134 Pepermans, H., 169 Perrot, G., 134, 135 Peskin, C. S., 65, 66 Pettitt, B. M., 134, 170 Phillips, D. C., 141 Phillips, I. R., 216 Phillips, J., 219 Phillips, J. C., 213, 222 Phua, K. H., 68 Piela, L., 66, 134, 139, 140 Pinat, M. C., 70 Pincus, M. R., 66, 134, 137, 140, 141 Pitts, J. E., 142 Polak, E., 68 Poland, D., 132, 135, 137 Politzer, P., 220, 221 Ponder, J. W., 70 Posner, H. X., 222 Postma, J. P. M., 170 Pottle, C., 137 Pottle, M. S., 131, 132, 137 Poulsen, F. M., 170 Poulsen, M. T., 219, 221 Powell, M. J. D., 67, 68, 69, 136 Pratt, W. B., 211 Press, W. H., 140 Prestegard, J. H., 169, 170 Prigogine, I., 134 Pritzker, C. S., 212 Proctor, T. R., 220 Profet, M., 212 Profeta, S., Jr., 132
Pudzianowski, A. T., 220 Pullman, A., 218 Pullman, B., 218 Purchase, R., 213, 218, 222 Purisima, E. O., 66, 136, 139 Pysh, E. S., 137 Qian, Y. Q., 168 Qianhuan, D., 220 Rackovsky, S., 137 Radecki, A., 217 Rall, L. B., 67 Ram, P., 169 Ramachandran, G. N., 131, 132 Ramakrishnan, C., 131 Ramiller, N., 213 Ravimohan, C., 135 Read, R. C., 70 Rebagliati, M., 219 Reeves, C. M., 67, 136 Reid, B. R., 170 Reid, J. K., 68 Reidy, G. F., 216 Rein, R., 214 Resnick, M., 212 Reynolds, C. A., 220 Ribière, G., 68 Rice, S. A., 134 Richards, F. M., 70, 172 Richards, W. G., 222 Richardson, M. L., 217 Richter Pack, S. A., 211, 212 Rickenbacher, U., 221 Rinnooy Kan, A. H. G., 65, 66, 67 Ripoll, D. R., 134, 139 Rippmann, F., 213 Rivail, J.-L., 172 Roberts, D., 213 Roberts, V. A., 132 Rodgers, J. R., 142 Rogers, J. W., Jr., 171 Rokhlin, V., 65 Romkes, M., 221 Rose, D. J., 68, 70 Rose, S. L., 222 Rosenbluth, A. W., 67, 136, 171 Rosenbluth, M. N., 67, 136, 171
Rosenbrock, H. H., 136 Rosenkranz, H. S., 212, 214, 220, 221 Ross, M., 136 Rossi, A., 66 Rossi, M., 215 Roterman, I. K., 132 Rullmann, J. A. C., 172 Rumball, S. V., 141 Rumsey, S., 132 Rumsey, S. M., 137 Ryan, D. E., 216 Saad, Y., 70 Saenger, W., 68 Safe, S., 221 Saggers, D. T., 213 Saitô, N., 141 Sanderson, D. M., 215 Santini, A., 137 Sartore, L., 137 Sasisekharan, V., 131 Sato, R., 216 Saulsbury, A. W., 211, 212 Saunders, M. A., 67 Saunders, M. R., 215 Sawyer, T., 221 Scarsdale, J. N., 169 Schaefer, M., 219 Schaper, K.-J., 213 Schaumann, T., 168, 169, 171 Scheek, R. M., 167, 168, 169, 170, 172 Schellman, J. A., 135 Scheraga, H. A., 66, 68, 120, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 167, 171 Schirmer, R. E., 166 Schlick, T., 65, 66, 68, 69, 70, 71, 136 Schmidt, K. E., 140 Schnabel, R. B., 65, 66, 67, 69, 70 Scott, R. A., 68, 132, 134, 137 Seybold, P. G., 219, 220, 222 Seydel, J. K., 212, 213 Shanno, D. F., 68, 69 Sharkey, N. A., 222 Shelby, M. D., 212 Shenkin, P. S., 131 Sherman, A. H., 70 Shi, C., 213
Shi, Y., 171 Shi, Z., 220 Shimada, T., 216 Shimanouchi, T., 142 Shipman, L. L., 220 Shipp, A., 211 Shultz, M. H., 70 Shuster, I., 216 Siest, G., 219 Sigman, C. C., 219 Silipo, C., 212, 215, 220 Simmer, R., 172 Singh, U. C., 132, 135 Sippl, M. J., 132, 139 Sjoberg, P., 221 Skett, P., 216 Smith, D. W., 215 Smith, I. A., 220 Smith, S. L., 66 Snatzke, G., 140 Snyder, G. H., 169 Sofer, A., 70 Somorjai, R. L., 140 Sorensen, D., 71 Sorenson, D. C., 67 Southee, J., 211, 216 Spalding, J., 212 Spanggord, R. J., 219 Spangler, D., 219, 221 Sparling, J., 221 Sparrow, N. A., 222 Spitzfaden, C., 172 Stasiewicz, S., 212 States, D. J., 65, 132, 167 Steigemann, W., 140 Steihaug, T., 69 Stein, M. L., 67 Steinberg, I. Z., 136 Stewart, J. J. P., 215 Stiefel, E., 67 Stimson, E. R., 137 Stoer, J., 68 Storey, C., 68 Stouch, T. R., 214 Strang, C., 70 Strobel, H. W., 216 Stuart, D. I., 141 Styles, J. A., 212
Sudhindra, B. S., 219 Sukumaran, D. K., 170 Sussman, F., 135 Swaminathan, S., 65, 132, 167 Sykes, B. D., 169, 170 Sylvain, M., 140 Szyld, D. B., 70 Takahashi, S., 141 Talluri, S., 141 Tan, R. K. Z., 66 Tanabe, K., 66 Tanaka, N., 139 Tanaka, S., 133, 134, 141 Tasumi, M., 142 Taylor, J. B., 217 Taylor, P., 211 Teller, A. H., 67, 136, 171 Teller, E., 67, 136, 171 Tembe, B. L., 135 Tennant, R. W., 212 Testa, B., 213 Teukolsky, S. A., 140 Thomason, J., 172 Thomson, C., 220, 222 Thuente, D. J., 67 Tichy, M., 213, 219 Tickle, I. J., 142 Timmer, G. T., 66 Tipker, J., 217 Todd, M. J., 65, 66, 67 Toint, Ph. L., 68, 69 Tomb, M. E., 214 Toniolo, C., 137 Toome, V., 169 Topliss, J. G., 215 Torda, A. E., 169, 172 Torrie, G. M., 136 Tourwe, D., 169 Traber, R., 172 Trager, W., 219 Travis, C. C., 211, 212 Tropp, J., 169 Troyer, J. M., 130 Tuckerman, M. E., 66 Tulkens, P. M., 215, 222 Tung, C.-S., 68 Turner, L., 213 Tuttle, R. W., 133, 137, 141
Ueda, Y., 220 Valleau, J. P., 136 van Binst, G., 169 van Boom, J. H., 169 van der Marel, G. A., 169 van Gunsteren, W. F., 167, 168, 169, 170, 171, 172 van Iersel, A. A. J., 222 van Laarhoven, P. J. M., 65 van Loan, C. F., 65 van Schaik, R. C., 171, 172 Van Schepdael, A., 215, 222 Vanderbilt, D., 139 Vanderhaeghe, H. J., 215, 222 Vanderkooi, G., 132, 133, 137 Vasik, M., 168 Vásquez, M., 133, 134, 135, 137, 138, 139, 141 Vecchi, M. P., 67, 139, 171 Verbist, L., 222 Verlet, L., 135 Verloop, A., 217 Vetterling, W. T., 140 Vichnevetsky, R., 70 Vila, J., 134, 135, 140 Vinter, J. G., 215 Vittoria, A., 212, 215, 220 von Freyberg, B., 171, 172 von Szentpály, L., 220 Vorobjev, Y. N., 134 Wagner, G., 141, 168 Wako, H., 141 Wald, R. W., 219 Walker, H. F., 69 Walker, J. M., 219 Walker, N. P. C., 141 Walkinshaw, M. D., 172 Wang, H., 131 Wang, Y., 220 Warme, P. K., 141 Warshel, A., 65, 135 Waterman, M. R., 216 Watts, P., 222 Waxman, D. J., 216 Weber, C., 172 Wegrzynski, B. B., 169 Weiner, P., 132
Weiner, S. J., 65, 132 Wender, P. A., 222 Whaleupe, E. K., 214 Whitlock, J. P., 221 Wider, G., 172 Widmer, H., 169, 172 Willett, P., 130 Williams, D. E., 130 Williams, G. J. B., 142 Williams, R. L., 135 Williamson, M. P., 168 Wilson, C., 140 Wilson, S. R., 140 Wimmer, E., 215 Wingfield, P. T., 172 Wipff, G., 140 Wojcik, J., 137 Wokaun, A., 166 Wolfe, P., 68 Wolff, J., 132 Wong, J., 219 Wood, S. P., 142 Woodson, S. A., 169 Woolfson, M. M., 138 Worgotter, E., 168 Wortelboer, H. M., 222 Wright, M. H., 64, 65, 67, 136 Wu, C. W., 142
Wüthrich, K., 139, 141, 166, 167, 168, 169, 170, 171, 172 Yam, J., 213 Yambert, M. W., 212 Yan, J. F., 133 Yarmush, D. L., 131 Yip, P., 170 Yoon, B. J., 135 Yoon, C. N., 132 Yu, R. K., 169 Yuan, M., 214 Yuan, Y., 69 Yukawa, E., 220 Yuta, K., 214 Zagari, A., 132 Zefirov, N. S., 213 Zeiger, E., 212 Zenios, S. A., 70 Zerner, M. C., 215 Zimm, B. H., 137 Zimmerman, S. S., 133 Zmudzka, B., 221 Zou, X., 69 Zuiderweg, E. R. P., 167, 170 Zurini, M. G. M., 172
Subject Index Ab initio, 143 ACES II, 239 Acetanilide, 183 Acetophenetidin, 183 Acute lethal dose (LD50), 173, 174 Acyl carrier protein, 154 ADAPT (automated data analysis by pattern recognition techniques), 178, 203, 206, 210 Adaptive Importance Sampling Monte Carlo, 110, 114 Aflatoxin B1, 201 Ah receptor, 188, 191, 195 ALADDIN, 245 L-Alanine, 80 ALCHEMY-II, 242 Alchemy III, 226 Alcohols, 202 Aliphatic amines, 198 All-atom force field, 146 Alpha helix, 92, 94, 97, 98, 102 Alpha/alpha packing, 98 Alpha-amino isobutyric acid (Aib), 96 Alpha/beta packing, 98 Alpha-carbon, 76 Alpha-helical conformations, 84 Alpha-helix sense, 96 Alpha-lactalbumin, 127 AMBER, 82, 84, 86, 233 Ames test, 176, 199, 204 AMPAC, 230, 236
Amphetamine, 205 AMSOL, 236 Anharmonicity, 89, 129 Aniline mustards, 197 Animal tests, 175 Annealing protocol, 162 Antiparallel beta sheets, 98 Apex, 235 Approximate Newton search direction, 43 Area/depth parameter, 191 Arene oxides, 202 Argus, 239 Armijo and Goldstein criterion, 25 Aromatase, 185 Aromatic acids, 202 Aromatic amines, 196, 202 Aromatic hydrocarbons, 198, 202 Aromatics, 202 Array processors, 94 Asparagine, 101 Aspartic acid, 101 ASTERIX, 239 ATOM, 229 Atom-centered pair potentials, 83 ATOMPLUS, 229 ATOMSCF, 242 Automatic differentiation, 29 AUTONOM, 229 Average conformation, 114 Average solution structure, 153
Avian pancreatic polypeptide, 128, 129 AVS ChemistryViewer, 237 Ball & Stick, 230 Basic descent, 21 Basic linear algebra subroutines (BLAS), 63 Benoxaprofen, 195, 197 Benzanthracenes, 196 Benzimidazoles, 196 Benzo[a]pyrene, 202 Benzoquinolines, 202 Beta barrels, 98 Beta sheet, 94, 96, 97, 98, 126 Beta turns, 126 Beta-alpha-beta crossover packing, 98 Beta/beta packing, 98 Beta-proteins, 150 BIGSTRN3, 236 Bioavailability, 181 BIOGRAF, 180, 237 Biologically active form, 103 BLDKIT, 243 Blocked amino acid residue, 81 Boltzmann-averaged conformation, 74 Boltzmann distribution, 110 BOSS, 238 Botulinum toxin, 174 Bovine pancreatic trypsin inhibitor (BPTI), 91, 123, 127, 150 Brookhaven Protein Data Bank, 128 Brownian dynamics, 93
Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula, 41, 59, 63 Broyden’s quasi-Newton method, 40 Buildup technique, 17, 102, 104, 123, 128 Buildup with limited constraints, 123 Butane, 16, 17 t-Butyl alcohol, 205 y-Butyrolactone, 205 CAChe Work System, 226 CADPAC, 240 Cambridge Structural Database, 245, 246
CAMELEON, 244 CAMSEQ/M, 226 Carbon monoxide, 173 Cardiotoxicity, 174 Cartesian coordinate system, 76, 78 CASE (computer-automated structure evaluation), 178, 203, 210 Catalyst, 233 CAVEAT, 244 Cayley-Menger determinant, 115 Cerius, 237 CHAIN, 247 Chance correlations, 179 CHARMm, 180, 237 CHARMM, 82, 84, 86, 233, 237 CHELPG, 240 Chem3D Plus, 227 ChemCad+, 227 ChemDBS-3D, 244 Chemical carcinogens, 187 Chemlab, 180 chemVISION, 231 Chem-X, 180, 227, 234, 244 Chiral errors, 164 Chirality, 148, 149 Chloracetic acid, 205 Chloramine, 205 Chlorobenzyl aspartates, 95 Cholesky factorization, 45 Chrysenes, 196 Cis peptide bonds, 85, 102 Classic Newton, 47, 48 CLOGP, 209, 245 Cluster analysis, 178 CNDO/2 method, 201 CNINDO, 236 COBRA, 244 Coiled-coil packing of helices, 98 Collagen, 104, 123 COLUMBUS, 240 Comparative Molecular Field Analysis (COMFA), 180, 238 Complete metrization, 163 Composer, 238 Computer-optimized molecular parametric analysis of chemical toxicity (COMPACT), 180, 181, 190, 191, 193, 195, 199, 204, 206, 210
Computer programs 3D Dictionary, 244 ACES II, 239 ALADDIN, 245 Alchemy III, 226 ALCHEMY-II, 242 AMBER, 82, 84, 86, 233 AMPAC, 230, 236 AMSOL, 236 Apex, 235 Argus, 239 ASTERIX, 239 ATOM, 229 ATOMPLUS, 229 ATOMSCF, 242 AUTONOM, 229 AVS ChemistryViewer, 237 Ball & Stick, 230 BIGSTRN3, 236 BIOGRAF, 180, 237 BLDKIT, 243 BOSS, 238 CAChe Work System, 226 CADPAC, 240 CAMELEON, 244 CAMSEQ/M, 226 CASE, 178, 203, 210 Catalyst, 233 CAVEAT, 244 Cerius, 237 CHAIN, 247 CHARMM, 82, 84, 86, 180, 233 CHELPG, 240 Chem3D Plus, 227 ChemCad+, 227 ChemDBS-3D, 244 Chemlab, 180 chemVISION, 231 Chem-X, 180, 227, 234, 244 CLOGP, 209, 245 CNINDO, 236 COBRA, 244 COLUMBUS, 240 COMFA, 180, 238 Composer, 238 CONCORD, 238, 246 CONECT, 243 CONMIN, 50 CONSTRICTOR, 244
Converter, 235 COSMIC, 180 DADAS, 150 DAYMENUS, 244 Delphi, 234 DEREK, 180, 181, 204, 206, 210 Desktop Molecular Modeller, 227 DGEOM, 236 DIANA, 151, 157 DISCOVER, 82, 86 DISGEO, 148, 236 DISMAN, 150 DMol, 241 DNA/RNA Builder, 232 DOCK, 246 DRUGIDEA, 181 ECEPP/2, 76, 79, 82, 83, 84, 115, 118, 120 ECEPP/3, 84 EUREKA, 244 FANTOM, 161 FISIPL, 243 FITIT, 246 FORTICON8, 230, 236 FRODO, 246 GAMESS, 241 Gauss2, 229 Gaussian, 241 GRADSCF, 242 GRID, 247 GROMOS, 234 GSTAT, 245 H2ION, 229 HAMOG, 228 HAZARDEXPERT, 177, 180, 181, 204, 206, 211 HMO, 230 HONDO, 242 HyperChem, 228 Insight/Discover, 180, 234 ISIS, 245 Kekulé, 231 KGNGRAF, 242 KGNMOL, 242 Kinemage, 231 LAPACK, 64 LHASA, 180 Ludi, 235 MacImdad, 231
MacMimic, 228 MacroModel, 235 MAD, 235 MELD, 242 METABOLEXPERT, 180, 231 MicroChem, 228 Midasplus, 247 MM2, 83, 230, 236 MM3, 235, 236 MNDO, 236 MNDO90, 243 MNDOC, 236 MOBY, 229 MODEL, 235 MOLCONN-X, 232 Molecular Silverware, 238 Molecules, 232 MOLIDEA, 180, 181 MOPAC, 230, 236 NAMOD, 230 NanoVision, 232 NEMESIS, 229 NITRO, 238 NMR TRIAD, 238 NMRchitect, 235 NMRgraf, 237 N.N.Charge, 232 OpenMolecule, 247 PCILO3, 236 PCMODEL, 229 Pimms, 244 PLUTO, 245 POLYGRAF, 237 Polymer, 235 POLYRATE, 236 POSSUM, 246 PRO-EXPLORE, 236 Pro-log P, 181, 209 Prometheus, 237 PROPHET, 237 PRO-QUANTUM, 236 Protein Predictor, 232 PROTEP, 246 PSI88, 242 Python, 244 QSPR-Polymer, 237 Quanta, 180
QUANTA/CHARMm, 237 QUEST, 245 RECEPTOR, 238 RTECS, 210 SAVOL2, 245 Serratus, 244 SIMCA-R, 178, 247 Sketcher, 235 SPARTAN, 242 SPASMS, 233 STERIMOL, 230 STN EXPRESS, 230 SUMSL, 93 SYBYL, 180, 238 SYBYL/3DB, 246 TNPACK, 50 TOPDRAW, 233 TOPKAT, 177, 179, 203, 206, 210, 232 TOPMOST, 232 Toxline, 210 UniChem, 243 WHAT IF, 238 X-PLOR, 237 Yak, 239 Yeti, 239 ZINDO, 243 CONCORD, 238, 246 Condition number, 30 Conditional free energy, 114 Conditions at minima, 5 CONECT, 243 Configurational entropy, 92 Conformational entropy, 92 Conformational free energy, 92 Conformational space, 144, 154, 163 Conjugate direction, 32 Conjugate gradient, 30, 59, 63, 93 CONMIN, 50 CONSTRICTOR, 244 Contraction in dimensionality, 115 Convergence, 28, 42, 150 Convergence bounds, 32 Convergence characterization, 28 Convergence criteria, 26 Convergence order, 28 Convergence rate, 30 Convergence ratio, 28, 30
Convergence test, 27 Converter, 235 Cooling schedule, 121 Coordinate descent methods, 29 Correlated distances, 148 COSMIC, 180 Coulomb's law, 89 Coumarins, 193, 195, 196 Coupling constant, 157 Covalent restraints, 144, 149 Crystal packing interactions, 103 Cyclohexaglycyl, 80 Cyclosporin A, 151 Cytochromes P450, 176, 182, 184 3D Dictionary, 244 DADAS, 150 Databases, 230, 243 DAYMENUS, 244 Dealkylation, 183 Decaglycine, 92 Decay constant, 155 Deformation of the potential energy space, 116 Deformed potential surface, 120 Degrees of freedom, 92 Dehalogenation, 183 Delphi, 234 Delta-endotoxin, 173 Deoxycytidine, 55 DEREK (deductive estimation of risk from existing knowledge), 180, 181, 204, 206, 210 Descent directions, 21, 30 Descent methods, 18 Descent structure of local methods, 20 Descriptor variables, 179 Desktop Molecular Modeller, 227 Deterministic global algorithm, 19 Deterministic methods, 18 DGEOM, 236 2,4-Diaminophenol, 205 DIANA, 151, 157 Dibenz[a,h]anthracene, 192 2,3-Dibromo-1-propanol, 205 Dielectric constant, 90 Diffusion equation, 116
Diffusion equation method (DEM), 118, 120 Dihedral angle restraints, 157 Dihedral angle space, 118 Dihedral angles, 80, 150 3,4-Dihydrocoumarin, 205 3,3'-Dimethylbenzidine, 205 Diol epoxides, 202 Diphenylhydantoin, 204 Dipolar interactions, 152, 154 Dipolar relaxation, 155 Dipole moment, 202 DISCOVER, 82, 86 Discrete conformations, 153 Discrete Newton, 35, 38 Discriminant analysis, 178 DISGEO, 148, 236 DISMAN, 150 Dissimilarity, 164 Distance constraint potential, 152 Distance constraints, 127 Distance geometry, 115, 144 Distance matrix, 147, 163 Distance matrix error, 164 Distance restraints, 145, 152, 153, 155 Distance-dependent function, 90 DMol, 241 DNA, 6, 18 DNA/RNA Builder, 232 DOCK, 246 Dose-response curves, 208 Drug design, 103 DRUGIDEA, 181 Dynamical simulated annealing, 159 ECEPP/2, 84, 118 ECEPP/3, 84 Electrostatic field, 90 Electrostatic isopotential (EIP), 201 Electrostatic potential energies, 201 Electrostatically driven Monte Carlo (EDMC), 110, 120 Elongation factor, 193 Embed algorithm, 147, 148, 149 Empirical Conformational Energy Program for Peptides (ECEPP), 76, 79, 82, 83, 115, 120 Empirical hydration models, 91
End groups, 76, 77 Energy embedding algorithm, 160 Enkephalin, 103, 108 Ensemble, 74 Ensemble average, 153, 154 Environmental Protection Agency, 207 Epoxidation, 183 Epoxide hydrase, 201 Epoxides, 202 Ethyl alcohol, 174 Ethylene glycol, 205 Euclidean norm, 27 EUREKA, 244 FANTOM, 161 Fast particle methods, 16 Ferrous sulfate, 174 Fibrous proteins, 104, 120, 123 FISIPL, 243 FITIT, 246 Fletcher-Reeves formula, 34 Flexible geometry, 86 Folded structures, 75 Food and Drug Administration structural alerts, 207 Force field, 161 FORTICON8, 230, 236 Four-dimensional coordinates, 149 Fourier-Poisson integral, 118 Free-volume method, 93 Friction term, 160 FRODO, 246 Frontier orbital energies, 193 Frontier orbitals, 209 Full-memory quasi-Newton methods, 42, 47, 48 GAMESS, 241 Gauss2, 229 Gaussian, 241 Geometry, 81 Global minimum, 75, 84, 94, 102, 116, 118, 121, 123, 160 Global minimum function, 28 Global optimization methods, 18 Globoside, 153 Globular proteins, 104, 123 Glycine, 150
Gradient, 3 Gradient methods, 30 Gradient vector, 4, 12 GRADSCF, 242 Gramicidin S, 80, 103, 104, 105 Grand challenge, 63 GRID, 247 GROMOS, 234 GSTAT, 245 H2ION, 229 Hairpin, 92 Hamiltonian operator, 118 HAMOG, 228 Hard-sphere potential, 81, 82, 86 Harmonic function, 89 HAZARDEXPERT, 177, 180, 181, 204, 206, 211 Helix breaking, 99 Helix stability, 99 Helix-coil transition, 98, 126 Hemicholinium-3, 174 Hemoproteins, 188 Hessian, 3, 4, 5, 38, 46, 47, 63 Hestenes-Stiefel formula, 34 Hexapeptide, 126 Higher-dimensional Euclidean space, 160 Highest occupied molecular orbitals, 193 HMO, 230 Homologous structural fragments, 151 Homology, 127 Homology matching, 190 HONDO, 242 Hückel method, 201 Human leukocyte interferon, 123 Hydration, 129 Hydrogen bonds, 62, 84, 103 Hydrophobic interactions, 98, 100, 190 Hydrophobic parameters, 209 Hydrophobicity, 177 4-Hydroxyacetanilide, 205 Hydroxylation, 183 HyperChem, 228 Hyperplane, 4 Increase in Dimensionality, 115 Indefinite matrix, 12 Initial structures, 146, 151
Insight/Discover, 180, 234 Intensity, 152 Interchain interactions, 104 ISIS, 245 Kekulé, 231 Ketoconazole, 185 KGNGRAF, 242 KGNMOL, 242 Kinemage, 231 K-L theory of carcinogenesis, 201 Knowledge-based systems, 179
Lac repressor headpiece, 159 Langevin equation, 160 LAPACK, 64 Laplacian, 117 Lennard-Jones 6-12 potential, 82 LHASA (logic and heuristics applied to synthetic analysis), 180 Librational entropy, 92 Limited-memory Broyden-Fletcher-Goldfarb-Shanno formula (LMBFGS), 42, 59 Limited-memory quasi-Newton (LMQN), 35, 41, 47, 48 Line search, 21, 22, 34 Linear conjugate gradient, 32 Linear preconditioning conjugate method, 43 Linearized embedding, 149 Local algorithm, 19 Local minima, 2, 102, 110, 122, 149, 158 Local minimization algorithms, 20 Local optimization methods, 18 Local preconditioner, 45 Local terms, 33 Locally acceptable segments, 151 Log P, 177, 181, 190, 201, 209 Long-range interactions, 16, 87, 149 Loop segments, 79 Lothionein, 150 Lower dimensional spaces, 160 Lower distance bounds, 147, 150 Lowest empty molecular orbitals, 193 Ludi, 235 Lysozyme, 127
MacImdad, 231 MacMimic, 228 MacroModel, 235 Macromolecular modeling, 90 MAD, 235 Manual model building, 151 Many-body interactions, 129 Mathematical notation, 3 Matrix density, 4 Mean-field theory, 118, 120 MELD, 242 Memory function, 155 METABOLEXPERT, 180, 231 Metabolism, 181 Metabolite "tree", 181 Metal toxicity, 203 Met-enkephalin, 102, 104, 109, 113, 115, 118, 119, 123, 124 Methoxyflurane, 183 Methyl bromide, 205 Metric matrix method, 147, 150, 163 Metropolis criterion, 108, 110 Metropolis Monte Carlo, 93, 110, 114, 121 MicroChem, 228 Midasplus, 247 Minicomputers, 233 Minima, 4, 5, 10 Minimization methods, 93 Minimum-energy point, 85, 87 Mixed function oxidases, 182 Mixing time, 165 MM2, 83, 230, 236 MM3, 235, 236 MNDO, 236 MNDOBO, 243 MNDOC, 236 MOBY, 229 MODEL, 235 Modified Cholesky factorization, 46, 47, 63 Modified Newton, 37 MOLCONN-X, 232 Molecular connectivity, 178 Molecular diameter, 199 Molecular dimensions, 180, 209 Molecular dynamics, 17, 91, 93, 158 Molecular dynamics trajectory, 154
Molecular electrostatic potential, 209 Molecular force field, 144 Molecular fragment, 178 Molecular graphics, 230, 246 Molecular mechanics, 16, 45 Molecular modeling, 143, 223, 233 Molecular orbital calculations, 200 Molecular Silverware, 238 Molecules, 232 MOLIDEA, 180, 181 Mono-oxygenases, 182 Monte Carlo, 91, 119, 121, 161 Monte Carlo plus minimization (MCM), 106, 110, 120 Monte Carlo recursion, 93 MOPAC, 230, 236 Morphine sulfate, 174 Multidimensional phase space, 2 Multiple instructions multiple data (MIMD), 63 Multiple regression analysis (MRA), 177 Multiple-minima problem, 16, 102, 104, 120 Multivariate property space, 128 Nabla operator, 156 N-acetylglycyl-N'-methyl amide, 84, 87 N-acetylglycyl-N'-methylalanine amide, 86 NAMOD, 230 NanoVision, 232 2-Naphthylamine, 183 Naphthylene, 183, 205 National Toxicology Program (NTP), 199 Native conformation, 75 Negative-definite matrix, 4, 12 NEMESIS, 229 Neurotoxicity, 174 Neurotransmitter receptors, 174 Newton direction, 36 Newton equation, 36, 50 Newton methods, 35 Newton-Raphson, 35 Newton's equations of motion, 159 Nicotine, 174 Nitriles, 196 NITRO, 238
p-Nitroaniline, 205 o-Nitroanisole, 205 Nitroarenes, 202 p-Nitrobenzoic acid, 205 p-Nitrophenol, 205 Nitrosamines, 196, 202 Nitrosoureas, 197 NMR distance constraints, 123 NMR TRIAD, 238 NMRchitect, 235 NMRgraf, 237 N.N.Charge, 232 NOESY cross-peaks, 157 Nonbonded energy, 84 Nonderivative methods, 29 Nonhelical conformations, 99 Nonlinear conjugate gradient algorithms, 26, 34, 39, 42, 47, 48 Nonlocal terms, 33 Non-Newtonian dynamics, 160 Nuclear magnetic resonance (NMR), 143 Nuclear magnetic resonance restraints, 149 Nuclear Overhauser enhancement (NOE), 144, 152, 155, 156 Nucleic acids, 17 Objective function, 2, 36, 43, 51, 115 Octanol/water partition coefficient, 209 Oligonucleotide, 155 Oligopeptides, 73, 87, 120 Omega-helical form, 102 OpenMolecule, 247 Optimization, 1 Parallel beta sheets, 98 Parallelism, 94, 129 Partial metrization, 163 Partition coefficient, 177, 209 Partition function, 74, 110, 111 Pattern recognition, 177 Pattern-recognition importance sampling minimization (PRISM), 128 PCBs, 196, 202 PCILO3, 236 PCMODEL, 229 PEACS, 160
Penalty function, 115, 144, 150, 152, 156, 157 Pentachloroanisole, 205 Pentobarbital, 183 Personal computers, 226 Phenobarbital, 184, 192 Phenobarbital sodium, 174 Phenyl aziridines, 197 Phenytoins, 198 Phi/psi map, 81, 84, 88, 114, 128 Picrotoxin, 174 Pimms, 244 pKa, 181 PLUTO, 245 Poisson-Boltzmann equation, 90 Polak-Ribière (PR) formula, 34 Polarizability, 202 Polarization, 129 Poly(Gly-Pro-Pro), 106, 107 Polyamino acids, 96 Polycyclic aromatic hydrocarbons (PAHs), 184, 190, 193, 199, 200, 202 POLYGRAF, 237 Poly-L-alanine, 94, 95, 111, 112 Poly-L-proline, 85 Poly-L-valine, 96, 101 Polymer, 235 Polynucleotides, 17 Polypeptides, 17, 73, 121 POLYRATE, 236 Positive semidefinite matrix, 4, 12 Positive-definite matrix, 4, 12 POSSUM, 246 Potential energy function, 16, 82, 86 Potential energy minimization, 17 Powell's method, 29 Precision, 17 Preconditioning, 32, 42 Preconditioning conjugate gradient (PCG) method, 33, 44 Pregnenolone, 184 Principal components analysis (PCA), 178 Probability density, 164 PRO-EXPLORE, 236 PROFILE method, 151 Pro-log P, 181, 209
Promethazine, 205 Prometheus, 237 PROPHET, 237 PRO-QUANTUM, 236 Protein backbone, 148 Protein data bank, 145 Protein folding, 93 Protein Predictor, 232 Proteins, 74, 87, 121, 123 PROTEP, 246 Pseudo-atoms, 145 Pseudo-energy force constants, 162 Pseudo-energy term, 145, 153, 161 Pseudo parameter, 55 PSI88, 242 Python, 244 QCPE, 230 QSPR-Polymer, 237 Quadratic convex function, 28 Quadratic truncation, 44 Quanta, 180 QUANTA/CHARMm, 237 Quantitative structure-activity relationships (QSARs), 176 Quasi-Newton, 35, 38 QUEST, 245 R factor, 165 Rank 1 update, 40 Rank 2 update, 41 RECEPTOR, 238 Receptor, 103 Redox potential, 203 Regularity constraint, 73 Relative entropy, 92 Residual test, 43 Residue geometry, 76 Resorcinol, 205 Restart vector, 35 Restrained molecular dynamics, 144 Right-handed alpha helix, 110 Ring closure with symmetry, 80 Ring closure without symmetry, 79 Root-mean-square deviation, 126 Rosenbrock minimization, 51, 55 Rotational embedding algorithm, 160
Rotational entropy, 102 RTECS, 210 Saddle point, 7, 10 Sampling bias, 163 Sampling properties, 149, 155 SAVOL2, 245 Search direction, 21, 42 Search vector, 18, 21, 32, 36, 50 Secant approximation, 39 Secant unconstrained minimization solver (SUMSL), 93 Self-consistent electric field (SCEF), 106, 110 Self-consistent multitorsional field (SCMTF), 118 Serine, 100 Serratus, 244 Shape parameters, 192 Sherman-Morrison-Woodbury formula, 41 Short-range interactions, 87, 126, 149 Silk, 123 SIMCA-R, 247 Simulated annealing, 17, 18, 115, 121, 123, 159 Single instruction multiple data streams (SIMD), 63 Sketcher, 235 SNIFR, 160 Sodium chloride, 174 Soft, independent modeling of class analogy (SIMCA), 178 Solvation, 74, 91 Solvent parameters, 209 Solvent-accessible surface area, 91 Soret band, 188 Sparse factorization techniques, 45 Sparse matrix, 4 SPARTAN, 242 SPASMS (San Francisco Package of Applications for the Simulation of Molecular Systems), 233 Spin diffusion, 155 Squash trypsin inhibitor, 156 Stationary point, 7 Statistical coil, 92 Steepest descent, 30, 54, 93
Step length, 21, 32, 37 Steric map, 83 STERIMOL, 230 Steroid hormones, 182 Steroids, 195, 196, 199 STN EXPRESS, 230 Stochastic dynamics, 159 Stochastic global algorithm, 19 Stochastic global optimization, 17, 18 Strict local minimum, 21 Structural descriptors, 176 Structure-toxicity models, 203 Strychnine sulfate, 174 Substituent constants, 177 Substructural fragments, 181 Substructure embedding, 148, 163 Supercomputers, 233 Superdelocalizability, 201 Superminicomputers, 233 SYBYL, 180, 238 SYBYL/3DB, 246 Symmetric matrix, 4 Symmetric rank 1 update, 40 Systematic biases, 163 Systematic errors, 163 Tangent vector, 12 Taylor expansion, 5, 9 TCDDs, 174, 196, 202 Tendamistat, 150 Testosterone, 186 Tetrodotoxin, 174 Theophylline, 205 Thermodynamic integration, 93 Thermodynamic perturbation theory, 93 Three-dimensional NMR, 156 Threonine, 100 Time-averaged distance restraints, 154 TNPACK, 50 TOPDRAW, 233 TOPKAT, 177, 179, 203, 206, 210, 232 TOPMOST, 232 Toxic endpoint, 207 Toxic radicals, 203 Toxic segments, 181 Toxicity, 208 Toxicity evaluation, 176 Toxicity predictions, 205
Toxicity rating, 174 Toxicity tests, 175 Toxicology, 173 Toxline, 210 Trans peptide bond, 79, 85, 102 Trial distance matrix, 148 Triamterene, 205 Triangle smoothing, 147 1,2,3-Trichloropropane, 205 Tricresyl phosphate, 205 Truncated Newton, 35, 43, 45, 46, 47, 48, 50, 59 Trust radius, 22 Trust region, 21, 34 d-Tubocurarine, 174 Two-dimensional NMR, 103, 144
Umbrella sampling, 93 UniChem, 243 United-atom, 90, 146 United-residue, 90 Upper distance bounds, 147, 150 UV absorption, 188
van der Waals radii, 150 Variable metric, 35 Variable target function method, 149, 163 Vibrational entropy, 74, 75 Vicinal protons, 157 Virtual atom, 146 Virtual bonds, 93
Water, 90, 99 Water clusters, 61 Water-accessible volume, 91 WHAT IF, 238 Workstations, 233
Xenobiotics, 182 X-PLOR, 237 X-ray crystallography, 145, 188
Yak, 239 Yale Sparse Matrix Package (YSMP), 45, 50, 55 Yeti, 239
Zimm-Bragg parameters, 98 ZINDO, 243