VIRTUAL SCREENING: AN ALTERNATIVE OR COMPLEMENT TO HIGH THROUGHPUT SCREENING?
Proceedings of the Workshop ‘New Approaches in Drug Design and Discovery’, special topic ‘Virtual Screening’, Schloß Rauischholzhausen, Germany, March 15–18, 1999
Edited by
Gerhard Klebe Institute of Pharmaceutical Chemistry, Philipps University of Marburg, Marbacher Weg 6, D-35032 Marburg, Germany
Reprinted from Perspectives in Drug Discovery and Design, Volume 20, 2000
Kluwer Academic Publishers New York / Boston / Dordrecht / London / Moscow
eBook ISBN: 0-306-46883-2
Print ISBN: 0-792-36633-6
©2002 Kluwer Academic Publishers New York, Boston, Dordrecht, London, Moscow
All rights reserved
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher
Created in the United States of America
Visit Kluwer Online at: http://www.kluweronline.com
and Kluwer's eBookstore at: http://www.ebooks.kluweronline.com
Table of Contents
Preface
Combination of molecular similarity measures using data fusion
    C.M.R. Ginn, P. Willett and J. Bradshaw
Optimization of the drug-likeness of chemical libraries
    J. Sadowski
Generating consistent sets of thermodynamic and structural data for analysis of protein-ligand interactions
    T.G. Davies, J.R.H. Tame and R.E. Hubbard
Multiple molecular superpositioning as an effective tool for virtual database screening
    C. Lemmen, M. Zimmermann and T. Lengauer
A recursive algorithm for efficient combinatorial library docking
    M. Rarey and T. Lengauer
Modifications of the scoring function in FlexX for virtual screening applications
    M. Stahl
A knowledge-based scoring function for protein-ligand interactions: Probing the reference state
    I. Muegge
Predicting binding modes, binding affinities and ‘hot spots’ for protein-ligand complexes using a knowledge-based scoring function
    H. Gohlke, M. Hendlich and G. Klebe
Hydrophobicity maps and docking of molecular fragments with solvation
    N. Majeux, M. Scarsi, C. Tenette-Souaille and A. Caflisch
Virtual screening with solvation and ligand-induced complementarity
    V. Schnecke and L.A. Kuhn
Similarity versus docking in 3D virtual screening
    J. Mestres and R.M.A. Knegtel
Discovering high-affinity ligands from the computationally predicted structures and affinities of small molecules bound to a target: A virtual screening approach
    T.J. Marrone, B.A. Luty and P.W. Rose
In vitro and in silico affinity fingerprints: Finding similarities beyond structural classes
    H. Briem and U.F. Lessel
Computer-assisted synthesis and reaction planning in combinatorial chemistry
    J. Gasteiger, M. Pförtner, M. Sitzmann, R. Höllering, O. Sacher, T. Kostka and N. Karg
Evaluation of reactant-based and product-based approaches to the design of combinatorial libraries
    V.J. Gillet and O. Nicolotti
Author Index
Subject Index
Preface Virtual Screening: An Alternative or Complement to High Throughput Screening?
Gerhard Klebe
In the next couple of years the human genome will be fully sequenced [1]. This will provide us with the sequence and overall function of all human genes as well as the complete genomes of many microorganisms. It is hoped that powerful bioinformatic tools will subsequently make it possible to determine the gene variants that contribute to various multifactorial diseases, and the genes that exist in certain infectious agents but not in humans. As a consequence, this will allow us to define the most appropriate levels for drug intervention. It can be expected that the number of potential drug targets will increase, possibly by a factor of 10 or more [2,3]. Nevertheless, sequencing the human genome or, for that matter, the genome of other species will be only the starting point for the understanding of their biological function. Structural genomics is a likely follow-up, combined with new techniques to validate the therapeutic relevance of such newly discovered targets [4]. Accordingly, it can be expected that in the near future we will witness a substantial increase in novel putative targets for drugs. To address these new targets effectively, we require new approaches and innovative tools [3].
At present two alternative, yet complementary, techniques are employed: experimental high-throughput screening (HTS) of large compound libraries, increasingly provided by combinatorial chemistry, and computational methods for virtual screening (VS) and de novo design [5]. Experimental HTS involves highly sophisticated robotics and advanced engineering know-how. Appropriate molecular test systems have to be automated and adapted to the conditions of HTS. Advanced computer and informatics technology has to handle the logistics and the immense data flow. HTS typically produces a tremendous amount of ligand binding data, with hit rates of about 1%. At first glance, this figure may appear quite low.
However, considering one to several million compounds to be assayed per HTS run, this hit rate still provides a fair number of active compounds. Because HTS requires engagement in several cost- and labor-intensive techniques, many attempts have been made to increase its efficacy. As a consequence, in many companies, modelers have shifted their focus toward
the design of libraries optimally suited for HTS. So-called ‘optimally diverse libraries’ showing a minimum of redundancy have been created and compiled on the basis of inventive, property-discriminating descriptors. However, the enrichment with respect to discovered hits did not significantly depart from that of a random selection taken from a large library holding various organic compounds in the correct molecular weight range [6]. Perhaps these studies have stimulated and improved our understanding of similarity and helped to design targeted libraries for one particular binding site or as isosteres for a given reference ligand. Similarity, and likewise diversity, are properties that can only be defined relative to a reference and not globally over an entire sample of compounds.
In the early days of HTS, quite optimistic and enthusiastic perspectives were predicted. Together with the emergence of combinatorial chemistry, which was expected to push the frontiers of compound synthesis ahead by some orders of magnitude, the end of any rational and knowledge-based approaches was forecast. Today, several years later, a more realistic view has been accepted. First of all, automating biological testing is not without problems. False positives and non-specific target binding of possible test candidates are only some of the problems that puzzle scientists. Quite depressing are the reported success rates for translating apparent actives from HTS into leads that are suited for subsequent optimization into a drug candidate [7]. Moreover, although hits discovered by HTS provide medicinal chemists with real chemical compounds that bind to a target [8], these hits do not contribute to our understanding of why and how they act upon the target. Any increase in knowledge is produced only once experimental structural biology or molecular modeling comes into play to detect structural similarity or possible common binding modes among the obtained hits.
Often enough, hits are quite diverse in their chemical structure, thus preventing any reasonable intuitive comparison.
Virtual screening (VS) is an alternative in which the selection of compounds with predicted binding properties is attempted in the computer [9]. The approach appears quite tempting. Compounds to be studied do not necessarily have to exist, and their testing does not consume valuable substance material. Experimental deficiencies, e.g. limited solubility or other effects that can interfere with the assay conditions, do not matter. In contrast to HTS, VS requires as a key prerequisite knowledge about the criteria responsible for binding to a particular target. Either the three-dimensional structure of the target is given by crystal structure determination, NMR or homology modeling, or at least a rigid reference ligand with known bioactive conformation is available that allows for sophisticated pharmacophore modeling. This provides information about the binding-site geometry and helps to define and predict
possible ligand-binding modes. Once the receptor-bound conformation of a reference ligand is known or can be estimated, searches for molecules with similar recognition properties, possibly exhibited by quite different molecular skeletons, can be started. These comparative techniques either use fast flexible docking algorithms or focus on sophisticated molecular superposition techniques. If one sufficiently understands the features that make topologically diverse ligands similar or that are responsible for achieving a particular affinity toward a certain receptor, VS can be applied to screen either compound libraries of existing substances or computer-generated molecules. The latter examples could be detected as prospective leads and accordingly potential candidates for subsequent synthesis. Speculations have been made about the number of potential drug-like molecules ( y/(ny – 1).
Conclusions
In this article we presented a combinatorial docking algorithm based on the incremental construction method in FLEXX. The idea of the algorithm is to enumerate the library molecules during the incremental construction, using a tree data structure that allows previously calculated docking results to be reused efficiently. Because we assume that the structure of the library is given, the main application of this algorithm is in the development of focussed libraries in cases where some information about the protein and putative ligands is already available. We applied the new algorithm to three different libraries. For two libraries, we compared the sequential and the combinatorial docking results and showed that they are in good agreement. Nevertheless, the results depend on the order in which the R-groups are added in the build-up procedure. For the third library, we demonstrated that the algorithm is able to retrieve a known inhibitor from a large virtual library. Because the docking algorithm enumerates the library on the fly, it is very time- and space-efficient. The calculations for a large library could be done essentially in main memory. Compared to a sequential calculation, the combinatorial docking algorithm is 25 to 30 times faster, allowing the docking of a 20000-molecule library on a single CPU in a day. The recursive combinatorial docking algorithm can be applied in cases where one group plays a dominant role in the binding process. This group can be the core or one of the R-groups; it can have several instances and also several different binding modes. If such a group does not exist, the algorithm can still be applied with different build-up orders.
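The reuse of partial results described in the conclusions can be illustrated with a small sketch. It is not the FLEXX implementation: the fragment names, the additive `FRAGMENT_SCORE` table and the `extend` function are hypothetical stand-ins for a real incremental-construction step, and only a single placement is kept per partial molecule. The sketch only shows why a depth-first walk over the library tree needs fewer construction steps than docking every enumerated molecule from scratch.

```python
from itertools import product

# Hypothetical per-fragment scores; a real docking program would derive
# placements and scores from 3D geometry rather than from a lookup table.
FRAGMENT_SCORE = {"core": -10.0, "A1": -3.0, "A2": -1.0,
                  "B1": -2.0, "B2": -4.0, "B3": -0.5}

# Counts how often the (expensive) construction step is executed.
calls = {"combinatorial": 0, "sequential": 0}

def extend(partial_score, fragment, counter):
    """Stand-in for one incremental-construction step."""
    calls[counter] += 1
    return partial_score + FRAGMENT_SCORE[fragment]

def dock_combinatorial(core, sites):
    """Depth-first walk over the library tree; each prefix is built once."""
    results = {}

    def recurse(prefix, score, depth):
        if depth == len(sites):
            results[prefix] = score
            return
        for fragment in sites[depth]:
            new_score = extend(score, fragment, "combinatorial")
            recurse(prefix + (fragment,), new_score, depth + 1)

    recurse((core,), extend(0.0, core, "combinatorial"), 0)
    return results

def dock_sequential(core, sites):
    """Baseline: every enumerated molecule is built from scratch."""
    results = {}
    for combo in product(*sites):
        score = extend(0.0, core, "sequential")
        for fragment in combo:
            score = extend(score, fragment, "sequential")
        results[(core,) + combo] = score
    return results

# Toy library: one core with two R-group sites (2 and 3 instances).
sites = [["A1", "A2"], ["B1", "B2", "B3"]]
combinatorial = dock_combinatorial("core", sites)
sequential = dock_sequential("core", sites)
best = min(combinatorial, key=combinatorial.get)
```

In this toy setting both routes give identical scores, while the tree walk performs one step per tree node and the sequential route performs one step per fragment of every molecule. In the real algorithm several placements are kept per partial molecule, so the result can depend on the order in which R-groups are added, as noted above.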
Acknowledgements
The authors thank Bernd Kramer (4 Scientific Computing GmbH, Martinsried) for fruitful discussions on this topic and for preparing most of the input data used in this article. We also thank our cooperation partners, especially Gerhard Klebe (University of Marburg), Hans Briem and Uta Lessel (Boehringer Ingelheim Pharma KG, Ingelheim), for various helpful comments during the method development. This work is part of the Relimo project, funded by the bmb+f (Bundesministerium für Bildung und Forschung) and the participating industrial partners Boehringer Ingelheim Pharma KG and Merck KGaA, Darmstadt, under grant 03 11 620.
References
1. Gallop, M.A., Barrett, R.W., Dower, W.J., Fodor, P.A. and Gordon, E.M., J. Med. Chem., 37 (1994) 1233.
2. Gordon, E.M., Barrett, R.W., Dower, W.J., Fodor, P.A. and Gallop, M.A., J. Med. Chem., 37 (1994) 1386.
3. Walters, W.P., Stahl, M.T. and Murcko, M.A., Drug Disc. Today, 3 (1998) 160.
4. Kubinyi, H., Curr. Opin. Drug Discov. Development, 1 (1998) 16.
5. Kuntz, I.D., Science, 257 (1992) 1078.
6. Blaney, J.M. and Dixon, J.S., Perspect. Drug Discov. Design, 1 (1993) 301.
7. Lewis, R.A. and Meng, E.C., In Vinter, J.G. and Gardner, M. (Eds.), Molecular Modelling and Drug Design, CRC Press, Boca Raton, FL, 1994.
8. Guida, W.C., Curr. Opin. Struct. Biol., 4 (1994) 777.
9. Colman, P.M., Curr. Opin. Struct. Biol., 4 (1994) 868.
10. Rosenfeld, R., Vajda, S. and DeLisi, C., Annu. Rev. Biophys. and Biomol. Struct., 24 (1995) 677.
11. Böhm, H.-J., Curr. Opin. Biotechnol., 7 (1996) 433.
12. Lengauer, T. and Rarey, M., Curr. Opin. Struct. Biol., 6 (1996) 402.
13. Rarey, M., Kramer, B., Bernd, C. and Lengauer, T., In Hunter, L. and Klein, T. (Eds.), Biocomputing: Proceedings of the 1996 Pacific Symposium (electronic version at http://www.cgl.ucsf.edu/psb/psb96/proceedings/eproceedings.html). World Scientific Publishing Co, Singapore, 1996.
14. Makino, S. and Kuntz, I.D., J. Comput. Chem., 19 (1998) 1834.
15. Murray, C.W., Clark, D.E., Auton, T.R., Firth, M.A., Li, J., Sykes, R.A., Waszkowycz, B., Westhead, D.R. and Young, S.C., J. Comput.-Aided Mol. Design, 11 (1997) 193.
16. Sun, Y., Ewing, T.J.A., Skillman, A.G. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 12 (1999) 597.
17. Kick, E.K., Roe, D.C., Skillman, A.G., Guangcheng, L., Ewing, T.J.A., Sun, Y., Kuntz, I.D. and Ellman, J.A., Chem. Biol., 4 (1997) 297.
18. Roe, D.C. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 9 (1995) 269.
19. Böhm, H.-J., Banner, D.W. and Weber, L., J. Comput.-Aided Mol. Design, 13 (1999) 51.
20. Caflisch, A., J. Comput.-Aided Mol. Design, 10 (1996) 372.
21. Makino, S., Ewing, T.J.A. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 13 (1999) 513.
22. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
23. Rarey, M., Kramer, B. and Lengauer, T., J. Comput.-Aided Mol. Design, 11 (1997) 369.
24. Kramer, B., Rarey, M. and Lengauer, T., Proteins Struct. Funct. Genet., 37 (1999) 228.
25. Rarey, M., Wefing, S. and Lengauer, T., J. Comput.-Aided Mol. Design, 10 (1996) 41.
26. Böhm, H.-J., Thrombin-Inhibitors, collected experimental data, personal communication.
27. Selassie, C.D., Fang, Z., Li, R., Hansch, C., Debnath, G., Klein, T.E., Langridge, R. and Kaufman, B.T., J. Med. Chem., 32 (1989) 1895.
28. Weber, L., Wallbaum, S., Broger, C. and Gubernator, K., Angew. Chem. Int. Ed. Engl., 34 (1995) 2280.
29. Tripos Associates, Inc., St. Louis, MO, U.S.A., SYBYL Molecular Modeling Software Version 6.x, 1994.
30. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer Jr., E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M., J. Mol. Biol., 112 (1977) 535.
31. Banner, D.W. and Hadvary, P., J. Biol. Chem., 266 (1991) 20085.
32. Bolin, J.T., Filman, D.J., Matthews, D.A., Hamlin, R.C. and Kraut, J., J. Biol. Chem., 257 (1982) 13650.
Perspectives in Drug Discovery and Design, 20: 83–98, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.
Modifications of the scoring function in FlexX for virtual screening applications
MARTIN STAHL
F. Hoffmann-La Roche Ltd., Pharmaceutical Research, CH-4070 Basel, Switzerland (E-mail: [email protected])
Summary. A modification of the hydrogen bond score in the docking program FlexX is presented. Hydrogen bonds formed in inaccessible regions of protein cavities thereby gain larger weight than others formed at the protein surface. The modified scoring function is tested with thrombin as a target. Secondly, a recently published knowledge-based scoring function is compared to the FlexX scoring function in several database ranking experiments.
Key words: docking, hydrogen bonds, scoring, virtual screening
Introduction
The goal of ‘virtual screening’ is to select subsets of chemical libraries in such a way that they are enriched with compounds showing a desired affinity towards a given macromolecular target [1,2]. Docking calculations are a means of database prioritization that makes use of the 3D structure of a receptor in a quantitative way [3–8]. The computational expenditure for docking calculations is higher than for 2D similarity and most 3D pharmacophore search methods. Reasonable database sizes are 100–10 000 compounds; depending on the problem specification, between 20 and 1000 compounds can nowadays be docked per CPU-hour. Docking has the advantages that – besides the target 3D structure – no other experimental information is needed and that there is no bias towards finding active compounds of specific structural classes. Since the seminal work by the Kuntz group [9], many docking algorithms have been proposed [10–24], some of which have resulted in commercially available packages such as GOLD [17,18], DOCK [9–11,13] and FlexX [21,22], which are suitable for virtual screening purposes. Flexible docking programs search the translational, rotational and conformational space of a putative ligand in the active site pocket of the receptor (see References 25–28 for recent analyses of search strategies). For each pose (denoting a conformation together with its orientation in space relative to the
receptor coordinates), the interactions formed between the receptor and the ligand are evaluated. This leads to an energy value, a score, for each pose. The score is a measure of the free energy of binding. It serves various ranking purposes: (i) ranking of poses generated for one ligand molecule and one receptor structure (structure prediction), (ii) ranking of different ligands relative to the same receptor (database prioritization) and (iii) comparison of binding energies of a ligand in two different receptors (selectivity assessment). Up to now, no scoring function exists with satisfactory performance in all these ranking problems. Indeed, the accurate and rapid prediction of binding free energies is the one challenging problem of structure-based drug design [29–33]. Many types of fast general scoring functions have been proposed, ranging from molecular mechanics force fields [9,15,25,33–36] to functions empirically fitted to experimental binding energies [37–43] and potentials of mean force [44–50]. While the reproduction of experimentally solved complex structures is an accepted way of testing and comparing docking programs [51,52], surprisingly few virtual screening applications have been published. Compound selection based on docking calculations, for example, has been done for thrombin [53,54], thymidylate synthase [55,56], DHFR enzymes [57] and HIV protease [58,59]. However, only in rare cases [13] are such studies conclusive as to the efficiency of the docking tools in enriching subsets of a database with active compounds. The reason may be that only a few consistent sets of measured binding energies are available to the public. Here we describe applications of the docking program FlexX [21,22] to a number of Roche in-house target structures and compound libraries in order to assess the performance of this docking tool in virtual screening applications.
A simple modification of the H-bond score in the FlexX empirical scoring function is presented, as well as a comparison of the FlexX empirical scoring function [37] with a recently published knowledge-based scoring function [47]. The results presented here give insight into the usefulness of current docking tools and scoring functions in practical virtual screening applications.
Computational methods
Preparation of ligand libraries
Three sets of Roche compounds were prepared: (i) a set of 3700 Roche compounds with known thrombin Ki values, (ii) a set of 470 diaminopyrimidines with associated IC50 data for S. aureus DHFR, and (iii) a set of 650 COX2 inhibitors. The thrombin and the DHFR data sets uniformly cover 6 orders of magnitude in activity and contain around 15% inactive molecules, while the COX2 data set covers 4 orders of magnitude. The program Corina [60] was used to generate 3D structures in Sybyl [61] mol2 format. The files generated by Corina were further processed by a C routine that generated a likely protonation state of acidic and basic functional groups. Aliphatic amines, amidines and guanidines were protonated, carboxylic acids were deprotonated, while the protonation state of aromatic nitrogen-containing heterocycles was left as generated by Corina. A set of 5000 randomly selected compounds from the WDI [62] was converted to mol2 format by means of the same procedure.
Docking protocol
An X-ray structure of human α-thrombin complexed with NAPAP (PDB [63] code 1dwd) was chosen as the thrombin target structure. All protein atoms within a distance of 8 Å of any NAPAP ligand atom were defined as active site atoms. Water molecule 47 in the P1 pocket was retained as an active site atom. In-house X-ray structures of Staphylococcus aureus DHFR [64] and COX2 were prepared in a similar way. All libraries were docked into the thrombin active site by means of the standard scoring scheme for hydrogen bonds. The WDI and thrombin libraries were also docked using a modified version employing accessibility scaling (vide infra). FlexX default settings were used except for ∆Grot, which was set to 0.7 kJ mol⁻¹. Automatic base placement was used for the WDI, thrombin and COX2 libraries (average execution time per molecule: 2 min on an SGI R10K processor). The DHFR library was docked using a fixed placement of the diaminopyrimidine moiety that was taken from the X-ray structure (average execution time 6 s). Only rank 1 solutions were considered for each compound. Since many compounds contained stereocenters with unknown configuration and because a complete enumeration of all stereoisomers could not be afforded, docking was performed with just the two enantiomers of an arbitrary stereoisomer generated by Corina. Only the structure and energy with the better rank 1 score were used for database ranking.
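Two steps of this protocol are simple enough to sketch in a few lines: the distance-based definition of the active site and the choice between the two docked enantiomers. The following is a minimal illustration, not the author's tooling. The atom dictionaries, the coordinates and the function names are hypothetical, and the assumption that a lower score is the better one is made here only for the example.

```python
import math

def select_active_site(protein_atoms, ligand_atoms, cutoff=8.0):
    """Return the protein atoms within `cutoff` (in Å) of any ligand atom."""
    return [
        atom for atom in protein_atoms
        if any(math.dist(atom["xyz"], lig["xyz"]) <= cutoff
               for lig in ligand_atoms)
    ]

def best_enantiomer(rank1_results):
    """Keep the enantiomer whose rank 1 pose has the better score.

    Assumption for this sketch: scores are energy-like, so lower is better.
    """
    return min(rank1_results, key=lambda result: result["score"])

# Made-up coordinates, for illustration only.
protein = [
    {"name": "N",  "xyz": (0.0, 0.0, 0.0)},
    {"name": "CA", "xyz": (7.9, 0.0, 0.0)},
    {"name": "CB", "xyz": (11.5, 0.0, 0.0)},
    {"name": "O",  "xyz": (20.0, 0.0, 0.0)},
]
ligand = [{"xyz": (1.0, 0.0, 0.0)}, {"xyz": (3.0, 0.0, 0.0)}]

site = select_active_site(protein, ligand)
site_names = [atom["name"] for atom in site]

# Made-up rank 1 scores for the two enantiomers of one compound.
docked = [{"isomer": "R", "score": -21.4}, {"isomer": "S", "score": -25.0}]
kept = best_enantiomer(docked)
```

With these toy coordinates the two atoms closest to the ligand are retained and the two distant ones are dropped. A retained water molecule, as in the thrombin case, would simply be added to the selection by hand.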
Hydrogen bond scoring schemes
When solvation effects are not explicitly accounted for, empirical scoring functions treat hydrogen bond contributions (or, in the case of force fields, electrostatic contributions) equally, whether they are formed on the surface of the protein or within a protein pocket. In our modified scheme, individual H-bond contributions are evaluated as a function of the solvent accessibility of the H-bond partner in the binding pocket, simulating the electrostatic shielding experienced by buried hydrogen bonds [65]. The standard hydrogen bond scoring scheme in FlexX is based on the penetration of so-called interaction geometries [21] of the binding partners. If a hydrogen bond exists by this definition, its contribution to the score is calculated as a constant term multiplied by a penalty function describing the amount of deviation from preset ideal angle and distance values. The modified scoring scheme introduces a second scaling term. For each point on the Connolly surface [66] of the binding site, an accessibility value a is assigned, whose calculation is described in detail elsewhere [67] (cf. Figure 1). Average a values are calculated for each surface atom. A sigmoidal function was empirically chosen to scale the hydrogen bond scores according to the accessibility values. Several functions of the general form (1 + exp(p(a – q) – r)), where a is the atomic accessibility value, were tested with varying values of p, q and r, ranging from almost linear to very steeply ascending scaling functions. Best performance on a series of in-house data sets (data not shown) was found with values of p = 10, q = 0.2 and r = 5. This scaling function virtually removes the contributions of all hydrogen bonds formed at values of a = 0.6 and above and removes about half of the contribution of hydrogen bonds formed at a = 0.3. As a consequence, there is a strong overall reduction of hydrogen bond energy in the total score, which reflects our observation that contributions of the lipophilic contact surface are often underestimated in the FlexX scoring function. The performance of FlexX for structure prediction is not altered when the modified score is used (results not shown).
Figure 1. Schematic representation of the calculation of surface accessibility. Instead of the 18 probe directions on this 2D representation, 45 spherical directions are used in the 3D case. In a first step, the number of directions is counted along which access to a surface point P is unhindered from the outside of the protein. In a second step, the resulting integer values are scaled in an interval between 0 and 1. Points on planar or convex parts of a surface receive accessibility values of 1.
It is obvious that the modified scoring scheme for hydrogen bonds is still a crude approximation. All hydrogen bonds between various functional groups are still treated equally, and the chemical environment of each hydrogen bond is not considered [68]. Attempts by Böhm to improve his regression-based scoring scheme by terms accounting for the environment of hydrogen bonds have failed [41], but it is likely that terms for such secondary interactions cannot be derived by regression techniques.
Calculation of enrichment factors
In this study we were not interested in absolute scoring energy values obtained from single docking runs, but focussed exclusively on the ranking of molecules with respect to each other. The quality of the rankings was assessed by enrichment factors. For this purpose, libraries were divided into ‘active’ and ‘inactive’ compounds at an arbitrary pKi threshold. In enrichment experiments employing the WDI library, sets of 100 randomly selected compounds from the thrombin library were added as the only ‘active’ compounds to the WDI library. In all cases, enrichment factors were calculated as a function of the size of library subsets by the formula
ef(subset size) = (fraction active in subset) / (fraction active in library)
On average, random screening should result in enrichment factors of ef = 1; values of ef < 1 are obtained when the subset contains fewer active compounds than would be expected from the library average. Enrichment factors were not scaled, i.e. the absolute values obtained depend on the ratio of active and inactive compounds defined in each specific case and should be compared with the maximum achievable ef, which is obtained by dividing the total number of molecules by the number of active molecules. For very large subsets of a library the term ‘enrichment’ naturally loses its meaning.
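The enrichment factor defined above is easy to compute from a ranked list. The following is a minimal sketch under stated assumptions: the library is represented as a list of booleans ordered from best to worst docking score (True marking an ‘active’ compound), and the function names and the toy ranking are hypothetical.

```python
def enrichment_factor(ranked_is_active, subset_size):
    """ef = (fraction active in top subset) / (fraction active in library)."""
    total = len(ranked_is_active)
    n_active = sum(ranked_is_active)
    if not 0 < subset_size <= total:
        raise ValueError("subset_size must lie between 1 and the library size")
    if n_active == 0:
        raise ValueError("the library contains no active compounds")
    subset_fraction = sum(ranked_is_active[:subset_size]) / subset_size
    library_fraction = n_active / total
    return subset_fraction / library_fraction

def max_enrichment(ranked_is_active):
    """Maximum achievable ef: total number of molecules / number of actives."""
    return len(ranked_is_active) / sum(ranked_is_active)

# Toy ranking of 10 compounds, best score first; 3 of them are active.
ranking = [True, True, False, True, False,
           False, False, False, False, False]

ef_top2 = enrichment_factor(ranking, 2)    # (2/2) / (3/10)
ef_top5 = enrichment_factor(ranking, 5)    # (3/5) / (3/10)
ef_all = enrichment_factor(ranking, 10)    # whole library
ef_limit = max_enrichment(ranking)
```

In the toy ranking the top two compounds are both active, so the enrichment factor for that subset reaches the maximum achievable value, whereas for the whole library it falls back to 1, which illustrates why the term loses its meaning for very large subsets.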
Figure 2. a) Plot of pKi vs. number of rotatable bonds for the thrombin library. Enrichment plots for two subsets of the thrombin library containing only compounds with 3–8 rotatable bonds (b) and only compounds with more than 8 rotatable bonds (c) are shown.
Standard vs. accessibility-scaled H-bond score
Ranking a library of thrombin inhibitors
The thrombin library used in this study has two properties that should be kept in mind. Firstly, 75% of the compounds contain an amidinium or guanidinium functional group and another 7% contain a different basic functional group designed to fit into the S1 pocket. Therefore, any successful ranking of binding affinity must rely on secondary hydrogen bonds and hydrophobic interactions rather than on the presence or absence of a basic group binding to Asp189 in the S1 pocket. Secondly, Figure 2a shows that the library contains a large percentage of compounds with many rotatable bonds and that there is a clear tendency for the large compounds to have higher activity. In order
to minimize the effect of molecule size, we will discuss results separately for two subsets, one containing all molecules with 3–8 rotatable bonds (1627 compounds, 141 3.5 Å (Figure 2). Going from O.3-O.3 to O.3-O.co2 and O.co2-N.pl3, the minima corresponding to the shell of next neighbors fall into a decreasing distance range,
Figure 1. Statistical preferences for polar/charged pair interactions as a function of the distance R, calculated according to Equation 1.
thus exhibiting favorable interactions at shorter distances. Contacts between the above-mentioned atom types can be assigned to a ‘normal’ hydrogen bond, a polar charge-assisted interaction and a salt bridge [48], respectively. Expressed in terms of statistical preferences (Equation 1), an ideal O.3-O.3 interaction is 2.5 times less favorable than a similar O.3-O.co2 interaction. For the O.co2-N.pl3 atom pair one has to take into account that in a bidentate salt bridge between carboxylate and amidinium/guanidinium this interaction is counted twice. For nonpolar contacts (Figure 2), the C.ar-C.ar interaction shows a slightly more structured preference compared to the C.3-C.3 interaction, and the minimum of the former resides at a shorter distance of 3.7 Å, in agreement with the well-known aromatic-aromatic interactions [49]. In contrast, C.3-C.3 interactions do not show any distinct preference of the atom pair distribution over the entire distance range of 4 to 6 Å of favorable interactions. This is clearly in agreement with the well-known fact that the latter type of interaction hardly exhibits any directional preferences.
Figure 2. Statistical preferences for nonpolar/aromatic pair interactions as a function of the distance R, calculated according to Equation 1.
Figure 3. Statistical preferences for ligand atoms of type C.3 and O.co2 as calculated from the distribution functions for solvent accessibility of both atom types for complexed and separated state from the protein according to Equation 2. The number of cubes (#cubes) is an approximate measure for the solvent accessibility; zero cubes refer to complete burial.
Table 1. Results for scoring multiple docking solutions of 91 protein-ligand complexes generated by FlexX and DrugScore
% of complexes with solutions exhibiting rmsd of the crystal structure