LEAD GENERATION APPROACHES IN DRUG DISCOVERY
Zoran Rankovic, Schering-Plough Corporation, Newhouse, Lanarkshire, UK
Richard Morphy, Schering-Plough Corporation, Newhouse, Lanarkshire, UK
Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

Library of Congress Cataloging-in-Publication Data:
Lead generation approaches in drug discovery / [edited by] Zoran Rankovic, Richard Morphy.
Includes bibliographical references and index.
ISBN 978-0-470-25761-6 (cloth)
Printed in the United States of America.
CONTENTS

PREFACE  vii
CONTRIBUTORS  xi
1 Lead Discovery: The Process  1
  William F. Michne
2 High Throughput Screening Approach to Lead Discovery  21
  Zoran Rankovic, Craig Jamieson, and Richard Morphy
3 In Silico Screening  73
  Dagmar Stumpfe, Hanna Geppert, and Jürgen Bajorath
4 Fragment-Based Lead Discovery  105
  Jeffrey S. Albert
5 Design of Multitarget Ligands  141
  Richard Morphy and Zoran Rankovic
6 De Novo Drug Design  165
  Gisbert Schneider, Markus Hartenfeller, and Ewgenij Proschak
7 Role of Natural Products in Drug Discovery  187
  Hugo Lachance, Stefan Wetzel, and Herbert Waldmann
8 Early Screening for ADMET Properties  231
  Dennis A. Smith
9 Role of Chemistry in Lead Discovery  259
  Roland E. Dolle and Karin Worm
INDEX  291
PREFACE

Over the past decade or so, many, if not most, pharmaceutical companies have introduced a distinctive lead generation phase into their drug discovery programs. The critical importance of generating high quality starting compounds for lead optimization cannot be overstated. The decisions made during lead generation, in terms of which compounds to progress, fundamentally influence the chances of a new chemical entity progressing successfully to the market.

One of the principal architects of lead generation as a separate phase within the pharmaceutical industry was Bill Michne while at AstraZeneca. In Chapter 1, he presents an overview of the lead discovery process, describing its challenges and the achievements of medicinal chemists working in this area.

The main source of hits continues to be high throughput screening (HTS). In Chapter 2, the evolution of the HTS approach over the last two decades is described by Zoran Rankovic, Craig Jamieson, and Richard Morphy from Schering-Plough Corporation. Today there is greater emphasis than ever before on the quality of compound collections and the robustness and fidelity of HTS assays and equipment. Screening is followed by a hit validation phase in which the desired and undesired attributes of hits are investigated, centred on potency, selectivity, physicochemical and ADMET properties, ease of synthesis, novelty, and SAR. Sufficiently attractive hits are then selected for hit-to-lead optimization, in which the goal is to build confidence in a chemical series to lower the risk that the commitment of additional resources in lead optimization will be in vain.

In Chapter 3, Dagmar Stumpfe, Hanna Geppert, and Jürgen Bajorath outline how knowledge-based in silico or "virtual" screening plays an increasingly important role at this early stage of the drug discovery process. Virtual screening methods can be neatly grouped into protein structure-based and ligand-based approaches, the former being used for targets for which 3D structural information is available, such as kinases and proteases, and the latter more extensively employed for membrane-bound targets such as GPCRs. Ligand-based virtual screening hinges on the concept of molecular similarity: which descriptors are used to represent molecules, and how the similarity between those representations of different molecules is assessed. HTS and in silico screening are complementary techniques; following HTS, readily available analogs of hits can be retrieved using similarity searching, thereby generating early SAR to guide the hit-to-lead process.

The ever-increasing emphasis on physicochemical properties in recent years has spawned the concept of ligand efficiency and an increased interest in
fragment-based drug discovery (FBDD). In Chapter 4, Jeffrey Albert describes the different methods of conducting FBDD, NMR spectroscopy and crystallography in particular, and some of the successes that have been achieved. The first compounds derived from FBDD are now advancing through the clinic, proving the validity of the approach, at least for enzyme targets.

Most current drug discovery projects aim to discover a drug that is highly selective for a single target. However, one of the main causes of attrition in Phase 2 proof-of-concept studies is poor efficacy. The multitarget drug discovery (MTDD) area is attracting increasing attention amongst drug discoverers, since drugs that modulate multiple targets simultaneously have the potential to provide enhanced efficacy. The challenges and opportunities presented to medicinal chemists by MTDD are described by Richard Morphy and Zoran Rankovic in Chapter 5.

Most approaches to lead generation rely upon screening of some sort, be it random HTS, FBDD, or a knowledge-based virtual screen. However, in Chapter 6, Gisbert Schneider, Markus Hartenfeller, and Ewgenij Proschak provide an alternative perspective, describing how lead compounds can be designed de novo. Algorithms that allow either structure-based or ligand-based assembly of new molecules are described. Historically, one of the main challenges in de novo design has been to ensure the synthetic feasibility of the designed compounds. Fortunately, new approaches to estimating the synthetic accessibility of de novo designed molecules are becoming available.

Natural products have been an important source of drugs, morphine and taxol being prominent ancient and modern examples, respectively. During the rush toward high throughput methods of lead generation in the past decade, less emphasis was placed on using the natural world as a source of starting compounds and inspiration for drug discovery. Recently, however, a resurgence of interest in natural products has become apparent within the drug discovery community. Hugo Lachance, Stefan Wetzel, and Herbert Waldmann remind us in Chapter 7 of their historical track record and of the potential of natural product-like compounds for exploring regions of biologically relevant chemical space that are not accessible using the medicinal chemist's more usual repertoire of synthetic compounds.

In the early days of the lead generation phase, there was often an overemphasis on increasing potency during the hit-to-lead phase while neglecting physicochemical and pharmacokinetic profiles. Today, multiple parameters are optimized in parallel to produce leads with a balanced profile of biological, physicochemical, and pharmacokinetic properties. In Chapter 8, Dennis Smith of Pfizer outlines how early screening of hits and leads for metabolic, pharmacokinetic, and toxicological liabilities can reduce attrition during the later phases of drug discovery.

No matter whether hits are derived from random screening or knowledge-based approaches, medicinal chemists will naturally assess and prioritize hits on the basis of their synthetic feasibility. Those that are amenable to analog
synthesis in a high throughput parallel fashion will be favored. The skills for making libraries of compounds are now well developed amongst lead generation chemists, and the historical development of this area is described by Roland Dolle and Karin Worm in Chapter 9.

One approach to lead generation that is not covered explicitly in this book, but which is as relevant today as it ever was, is starting from an existing drug. As Nobel Laureate James Black once remarked, "The most fruitful basis for the discovery of a new drug is to start with an old drug." If there are no suitable drugs with which to start, preclinical molecules described in the primary literature or in a patent can also be considered. The advantage of starting with an existing agent is that much knowledge concerning its behavior in in vitro and in vivo assays, and perhaps also in the clinic, is already in the public domain. This can facilitate rapid progress toward a second-generation "best-in-class" molecule with an improved profile: for example, improved pharmacokinetics allowing an easier dosing schedule for patients, or an improved safety profile by virtue of greater selectivity for the target of interest.

It is hoped that this book succinctly captures the essence of lead generation toward the end of the first decade of the twenty-first century. Looking further ahead, this critical phase of the drug discovery process is unlikely to diminish in relevance and importance, and many new developments can be eagerly anticipated.
CONTRIBUTORS

JEFFREY S. ALBERT, AstraZeneca Pharmaceuticals, Wilmington, Delaware, USA
JÜRGEN BAJORATH, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
ROLAND E. DOLLE, Adolor Corporation, Exton, Pennsylvania, USA
HANNA GEPPERT, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
MARKUS HARTENFELLER, Eidgenössische Technische Hochschule (ETH), Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
CRAIG JAMIESON, Schering-Plough Corporation, Newhouse, Lanarkshire, UK
HUGO LACHANCE, Max-Planck Institute for Molecular Physiology, Dortmund, Germany
WILLIAM F. MICHNE, MedOrion, Inc., Oriental, North Carolina, USA
RICHARD MORPHY, Schering-Plough Corporation, Newhouse, Lanarkshire, UK
EWGENIJ PROSCHAK, Johann Wolfgang Goethe-University, Frankfurt am Main, Germany
ZORAN RANKOVIC, Schering-Plough Corporation, Newhouse, Lanarkshire, UK
GISBERT SCHNEIDER, Eidgenössische Technische Hochschule (ETH), Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
DENNIS A. SMITH, Pfizer Global R&D, Sandwich, Kent, UK
DAGMAR STUMPFE, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
HERBERT WALDMANN, Max-Planck Institute for Molecular Physiology, Dortmund, Germany
STEFAN WETZEL, Max-Planck Institute for Molecular Physiology, Dortmund, Germany
KARIN WORM, Adolor Corporation, Exton, Pennsylvania, USA
1
Lead Discovery: The Process

WILLIAM F. MICHNE
MedOrion, Inc., 6027 Dolphin Road, Oriental, NC 28571, USA
[email protected]

CONTENTS
1.1 Introduction  1
1.2 Historical Overview  2
1.3 The Hit-to-Lead Process  3
1.4 The Magnitude of the Problem  5
1.5 Screening Libraries  6
1.6 Molecular Complexity  10
1.7 Similarity  12
1.8 Biology-Based Libraries  13
1.9 A Hit-to-Lead Example  15
1.10 Conclusion  17
References  17

1.1. INTRODUCTION
The discovery of a lead compound is arguably the first significant chemistry milestone along the path to the discovery of a new small molecule drug. Without a solid lead, a new drug discovery project must be put on hold or abandoned. The process by which this milestone is achieved has been, and remains, one of continuous evolution. Indeed, even the definition of what constitutes a lead has been refined over the past two decades, such that lead criteria are much more stringent than ever before. This is a necessary consequence of high
attrition rates in the clinic, the escalation of drug development costs beyond the billion dollar mark, and the ever higher expectations of safety and efficacy on the part of regulatory agencies, physicians, and patient populations at large. It is incumbent upon medicinal chemists to find the strongest possible leads so as to minimize the risk of failure as the compounds move along the increasingly expensive path to the pharmacists’ shelves. This chapter is intended to provide an introduction to the volume by presenting a historical overview of lead discovery, along with an assessment of where we stand today and the major barriers that still remain. The coverage cannot be exhaustive, considering the volume of literature on the subject. Rather, it will be selective, focusing on key contributions that form the body of basic principles in current use, and providing indications of directions for further research likely to improve the efficiency and cost effectiveness of lead discovery.
1.2. HISTORICAL OVERVIEW
The use of exogenous substances to treat human diseases can be traced to very ancient times. The use of opium, for example, was described by the Sumerians several thousand years ago. In 1805 Sertürner isolated from opium a single pure substance that he called morphine. This allowed the study of the effects of morphine without the interfering effects of other constituents of the opium mixture, and marked the beginning of the science of pharmacology. There followed rapidly the isolation of additional alkaloids from opium, and their comparative biological evaluations were the beginnings of modern medicinal chemistry, notwithstanding the fact that the structures of these substances would not be known for another century or more.

Biological studies were purely descriptive. Compounds were administered to whole animals and effects were observed. If a compound's effect was considered desirable as it may have related to human disease, the compound might have been evaluated for toxicity, and then tried in humans. In the early twentieth century, the theory that drug effects result from specific interaction with a receptor was advanced by Paul Ehrlich. His basic concepts of specificity, reversibility, and development of tachyphylaxis are still largely valid today. In combination with the development of molecular structure and its elucidation, this gave rise to the concept of structure–activity relationships (SAR) as a means of improving drug action. Wöhler's synthesis of urea in 1828 had established the science of synthetic organic chemistry, which allowed chemists to explore non-natural analogs of natural product structures for biological activity. Thus, the process of discovering new drugs relied almost entirely on the discovery of analogs of natural products, prepared one at a time, with biological activity assessed in whole animal systems, through about two-thirds of the twentieth century.

About that time, the pace of discovery and understanding in the field of molecular biology began to accelerate. We quickly learned that genes were expressed as proteins, and that some of these proteins, either because they were overexpressed, or
incorrectly expressed because of genetic mutations, were largely responsible for many human diseases. We also learned that these proteins were in many cases the targets of drugs shown to be effective in treating the disease. We learned how to manipulate genetic sequences, how to transfect modified sequences into cells that would then produce the modified protein, how to isolate those proteins, and how to assay for protein function in the presence of a potential drug. Advances in robotics allowed us to conduct tens of thousands of assays in a matter of weeks or days, a trivial exercise by today's standards.

By the early 1990s, high throughput screening (HTS) had emerged as the ultimate solution to drug discovery, if only we could provide the numbers of compounds to stoke the robots. A company's legacy compound collection became its screening library. These collections, typically in the range of 50,000–100,000 compounds, reflected the past chemical interests of the company and, as a result, were of a low order of chemical diversity, consisting rather of a relatively small number of groups of similar compounds. This quickly gave rise to compound acquisition strategies. New positions and new departments with new budgets were created with the purpose of buying compounds from whomever was willing to sell them, such as university professors and small chemical companies that had closed their doors for whatever reason. Eastern European chemical operations became a rich source of compounds with the collapse of the Soviet Union. Still, screening libraries grew slowly during the early 1990s, while screening robots and assay development moved forward at a similarly slow pace. The situation is captured in this quote from the first paper published on the hit-to-lead (HTL) process [1]:

    Consider the fairly typical situation of a screening program consisting of sixteen assays and a library of 150,000 compounds to be completed in one year. If each compound is screened at a single concentration, then 2.4 million data points will be generated. If the hit rate is only 0.01%, then 20 active compounds per month can be expected.
At that time, sorting out which compounds to work on was a fairly daunting task.
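The arithmetic in that scenario is easy to confirm; a few lines of Python (a trivial sketch, using only the numbers quoted above) reproduce the figures:

```python
assays = 16
library_size = 150_000
hit_rate = 0.0001             # 0.01% expressed as a fraction

data_points = assays * library_size
actives_per_year = data_points * hit_rate

print(data_points)            # 2,400,000 single-concentration data points
print(actives_per_year / 12)  # 20 active compounds per month
```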
1.3. THE HIT-TO-LEAD PROCESS
Several companies worked on the problem, but Sterling-Winthrop was the first to publish a method [1]. They began by defining hit-to-lead chemistry as "a process wherein the likelihood of failure or long term success for a given chemical series can be evaluated in a systematic way." They further defined an active as "any substance that shows reproducible activity in any screen." They defined a hit as having the following attributes: activity in a relevant bioassay is reproducible; structure and high purity have been confirmed; selectivity data have been generated; available analogs have been tested; the structure is chemically
tractable; and there is potential for novelty. Finally, a lead was defined as a hit with genuine SAR, suggesting that more potent and selective compounds can be found.

The process that was developed had five basic objectives that are still valid today. The first is the validation of structure and purity. Many of the screening samples were decades old, prepared and purified prior to the widespread use of high field NMR spectroscopy for structure determination and chromatographic techniques for purity assessment. Further, many samples simply decomposed during long-term storage. It was found that 10–30 percent of the samples in some collections either had incorrect structures or were less than 90 percent pure. A second objective is to find the minimum active fragment (pharmacophore) in a complex active molecule, as such information impacts the strategy for SAR development of the series. A third objective is to enhance selectivity in order to minimize activity at closely related molecular targets. A fourth objective is to exclude compounds with inappropriate modes of action, such as nonspecific binding to the molecular target. The fifth objective is to increase potency. Actives often have activity only in the micromolar range, and such compounds can be problematic later in development with respect to achieving effective concentrations in the target tissue or organ.

The overall objective of HTL chemistry is to find the best possible leads that can be progressed to lead optimization (LO) in the shortest period of time. Project leaders must be ruthlessly Darwinian in their willingness to drop projects for which one or more of the above objectives cannot be achieved in about six months. Beyond that, precious resources are increasingly expended on projects with decreasing probabilities of success. The effectiveness of an HTL chemistry group is determined solely by the rate and quality of leads generated; the timely termination of hit series for cause must be considered to play a vital role in that effectiveness.

In the years since the formulation of the HTL objectives, most if not all lead discovery organizations have established an HTL function. Beyond the five fundamental objectives, most had differences of degree, rather than kind, as to the exact process to be followed. Nevertheless, by the late 1990s essentially every drug discovery organization in the world had incorporated an HTL function, by whatever name they chose, into their overall drug discovery pathway. One such typical pathway is illustrated in Figure 1.1. In general, such pathways captured the key activities and timelines for progression of compounds, but often lacked specific criteria to be met.

Figure 1.1. The drug discovery pathway from the target to the clinic. The figure was kindly provided by Dr. Zoran Rankovic.

As experience was gained, and successes and failures were evaluated, new attributes of the process were developed. Many organizations became more rigid in their milestone criteria for hits and leads. Perhaps one of the most complete yet succinct descriptions of criteria in use was published by Steele and coworkers [2] at AstraZeneca. While these criteria may not be universally applicable because of the nature of a particular molecular or disease target, they were developed as a result of their internal experience with the HTS–HTL–LO history across many disease targets and therapeutic areas. Hit criteria are aimed at providing post-HTS evidence that the activity is not associated with impurities or false positives, and that the issues to be addressed in HTL have at least been evaluated. Lead criteria build on the hit criteria to demonstrate sufficient scope and robustness to sustain a major LO project. Both sets of criteria are subdivided into three categories as they relate to the compound series; pharmacology; and absorption, distribution, metabolism, and excretion (ADME), including in vivo data for leads. The authors admit that this is a bewildering array of information to capture during a supposedly fast and low resource plan. However, they state that "this set of criteria has grown from lessons learned by the unwitting progression of avoidable but time-consuming issues into LO, leading to more attrition in a phase where it is less acceptable." Subsequent publications from this company have included an abbreviated version of the criteria to illustrate the HTL process (vide infra).
1.4. THE MAGNITUDE OF THE PROBLEM
Over the last two decades there have been tremendous advances both in synthetic organic chemistry and in screening technology. The former, through methods in combinatorial chemistry, and more recently in multiple parallel syntheses, has increased the available numbers and diversity of screening libraries, and the latter has increased the rate at which assays can be run. As these technologies advanced, libraries of billions of compounds, and the robots to screen them, were envisaged. A logical extension of these technologies would
be to synthesize and test all possible drug-like molecules against all possible disease targets. This turns out to be utterly and completely impossible. Defining as drug-like those molecules that consist of no more than 30 nonhydrogen atoms, limited to C, N, O, P, S, F, Cl, and Br, with molecular weight ≤ 500 Da, and making an allowance for those unstable to moisture and oxygen at room temperature, the total number of possible combinations has been estimated to be of the order of 3 × 10^62 [3]. Assuming an average molecular weight of 350 Da, if one were to prepare only a single molecule of each of these possible structures and place them in a sphere at a density of 1 g/cm³, the diameter of the sphere would be approximately 40 billion miles. Note, however, that the best small molecule drug for every disease is contained within this sphere of chemistry space. Note also that we can never know whether any particular drug is the best possible one for its use.
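The geometry behind this thought experiment is simple to reproduce. A back-of-the-envelope sketch follows; the constants are the ones quoted above, and since the diameter grows only as the cube root of the molecule count, the figure obtained is extremely sensitive to that highly uncertain estimate:

```python
import math

AVOGADRO = 6.022e23       # molecules per mole
N_MOLECULES = 3e62        # estimated size of drug-like chemistry space [3]
MEAN_MW = 350.0           # assumed average molecular weight, g/mol
DENSITY = 1.0             # assumed packing density, g/cm^3
CM_PER_MILE = 1.609344e5

mass_g = N_MOLECULES * MEAN_MW / AVOGADRO   # one copy of every molecule
volume_cm3 = mass_g / DENSITY
radius_cm = (3.0 * volume_cm3 / (4.0 * math.pi)) ** (1.0 / 3.0)

print(f"Implied sphere diameter: {2.0 * radius_cm / CM_PER_MILE:.2e} miles")
# The diameter scales as N_MOLECULES ** (1/3), so the answer is dominated
# by whichever estimate of chemistry-space size one plugs in.
```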
1.5. SCREENING LIBRARIES
Early thinking in the design of screening libraries was that biological activity was to be found uniformly in diversity space. Accordingly, vast combinatorial libraries, based solely on available synthetic methodologies and the desire to be as broadly diverse as possible, were prepared and screened against a broad range of molecular targets. Hit rates turned out to be disappointingly low, and attempts to progress many of these hits to leads were unsuccessful. It rapidly became clear that within drug chemistry space biological activity occurred only in certain regions, which in all likelihood occupied only a small portion of the total space. Thus, the random approach to building productive screening libraries had only a correspondingly small probability of success. This gave rise to new thinking about how to describe biologically relevant chemistry space, and how then to design libraries that expand and further explore that space.

Many approaches were taken, but all are, for the most part, variations on one of the following three themes: physical properties, chemical structures, and biological behavior. Most of the effort has been concentrated on the first two of these, as structure and properties enjoy a high level of intellectual comfort among chemists. The relationship of biological behavior to library design is less intuitive, but may ultimately prove to be the most effective approach.

Through about the early to mid-1990s, the medicinal chemistry strategy for drug discovery was to make compounds of similar structure to some lead compound in the hope of finding increasingly potent compounds. The most potent compounds were then advanced. When a compound failed in development it was replaced with a related potent compound and the process repeated. The relationship between physical properties and successful drug development was not fully appreciated until publication of the "rule of five" by Lipinski and coworkers [4]. The importance of this paper is apparent from its well over 1500 citations. The rule states that poor absorption or permeability are more
likely when the calculated 1-octanol/water partition coefficient is >5, the molecular mass is >500 Da, the number of hydrogen bond donors (OH plus NH count) is >5, and the number of hydrogen bond acceptors (O plus N atoms) is >10. Somewhat later, the importance of the number of rotatable bonds [5] and polar surface area [6] to oral bioavailability was established. These six easily calculated physical properties have become important considerations in the design of screening libraries, as well as in HTL and lead optimization programs.

Just how important these properties are was assessed by Wenlock and coworkers [7]. This work compared profiles of compounds in various stages of development (Phase I, II, III, Preregistration, and Marketed oral drugs), and included data on compounds that were discontinued from the first three phases in order to evaluate property trends. The properties studied did not include polar surface area, but did include logD7.4, a measure of the lipid–water distribution of ionizable compounds at pH 7.4. The data were reported as the mean value and standard deviation of each property for the set of compounds in each phase, both active and discontinued. The mean molecular weights for drugs in all phases of development are higher than that of marketed oral drugs. The mean logP values of compounds in the active phases are very similar to that of marketed oral drugs, whereas the values for discontinued compounds are approximately 0.6 units higher. The situation is exactly the same for logD7.4. For H-bond donors, only the Phase I group was significantly different at the 95 percent level from the marketed drugs. For H-bond acceptors, four groups differed from the mean value for marketed drugs. For rotatable bonds, five groups differed significantly from marketed drugs. One of the clearest findings of this work is that mean molecular weight in development decreases on passing through each of the phases. The same appears to be true for logP and logD7.4. Thus, molecular weight and lipophilicity are the two properties that show the clearest influence on the successful passage of a candidate drug through the different stages of development.
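All six properties are trivially computable with modern open-source toolkits. The sketch below is an illustration only (RDKit postdates much of the work described here); the rotatable-bond and polar-surface-area cutoffs are taken from the commonly cited Veber limits [5]:

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski

LIMITS = {"MW": 500, "clogP": 5, "HBD": 5, "HBA": 10, "RotB": 10, "TPSA": 140}

def property_profile(smiles: str) -> dict:
    """Compute the six properties discussed above and list any values
    falling outside the rule-of-five / Veber limits."""
    mol = Chem.MolFromSmiles(smiles)
    props = {
        "MW": Descriptors.MolWt(mol),               # molecular weight, Da
        "clogP": Crippen.MolLogP(mol),              # calculated octanol/water logP
        "HBD": Lipinski.NHOHCount(mol),             # OH plus NH count
        "HBA": Lipinski.NOCount(mol),               # N plus O atom count
        "RotB": Descriptors.NumRotatableBonds(mol),
        "TPSA": Descriptors.TPSA(mol),              # topological polar surface area
    }
    props["violations"] = [k for k in LIMITS if props[k] > LIMITS[k]]
    return props

print(property_profile("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin: no violations expected
```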
Leeson and Springthorpe [8] extended this work to evaluate trends in physical properties of launched drugs from 1983 to 2006. The number of drugs trended steadily downward, while the molecular weights trended steadily upward. Indeed, over this time period there were increases in three of the four rule-of-five properties, with lipophilicity changing less appreciably. Polar surface area and rotatable bond count also increased. The authors suggest that factors driving the increases in physical properties include a greater number of apparently less druggable new targets, for which larger and more lipophilic molecules appear to be necessary for high-affinity binding to active sites. But the fact that drug lipophilicity is changing less over time than other physical properties suggests that it is an especially important drug-like property. Lipophilicity essentially reflects the key event of molecular desolvation in transfer from aqueous phases to cell membranes and to protein binding sites, which are mostly hydrophobic in nature. The authors' thesis is that if lipophilicity is too high, there is an increased likelihood of binding to multiple targets. They used the Cerep BioPrint database of drugs and reference compounds to examine the role of physical properties in influencing drug promiscuity and side effects. They found that promiscuity is predominantly controlled by lipophilicity and ionization state. Thus, they concluded that the most important challenge today is delivering candidate drugs that will not eventually fail owing to properties-based toxicity. The implications of working increasingly closer to the extremities of drug-like chemical space appear serious for overall productivity and promiscuity, leading to increased risks of pharmacologically based toxicity.

The assessment and characterization of the structures of all known biologically active molecules would be a daunting task. A simpler and arguably still representative approach would be to study the structural attributes of known drugs. Bemis and Murcko [9] used some 5000 compounds from the Comprehensive Medicinal Chemistry database. They analyzed only the graph representation of each molecule, that is, the representation of the molecule by considering each atom to be a vertex and each bond to be an edge on a graph, without regard to atom type or hybridization. Ring systems were defined as cycles within the graph representation and cycles sharing an edge. Linker atoms are on the direct path connecting two ring systems. A framework was defined as the union of ring systems and linkers. Interestingly, only 6 percent of the molecules were acyclic, and only 32 frameworks were found to occur in 20 or more compounds, accounting for 50 percent of the total compounds. By separating rings from linkers it was found that only 14 rings and 8 linkers (chains of 0–7 atoms) were represented. While it is tempting to speculate about reasons for finding so few frameworks among so many drugs, the authors stated that the results do not necessarily reveal some fundamental truths about drugs, receptors, metabolism, or toxicity. Instead, it may reflect the constraints imposed by the scientists who have produced the drugs. Synthetic and patent considerations, and a tendency to make new compounds that are similar to known compounds, may all be reflected in these findings.

In a second publication by these authors [10], a similar analysis was carried out for the side chains, defined as nonring, nonlinker atoms. They found that 73 percent of side-chain occurrences were accounted for by only 20 side chains. This suggests that the diversity that side chains provide to drug molecules is quite low, and may explain the poor performance of combinatorial screening libraries in which diversity elements are added as side chains. The authors suggest that one way their findings could be used is to generate a large virtual library by attaching side chains to frameworks at various points. The virtual library could then be computationally filtered to optimize a particular property of interest prior to the library actually being synthesized. They illustrated this application by designing a library optimized for blood brain barrier penetration [11].
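The framework/side-chain decomposition is available off the shelf today. A brief sketch using RDKit's Murcko scaffold utilities (named after this very analysis), with imipramine as an assumed example structure:

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

# Imipramine, a classic tricyclic drug, as the example input.
mol = Chem.MolFromSmiles("CN(C)CCCN1c2ccccc2CCc2ccccc21")

# Scaffold = ring systems plus linkers, with side chains stripped off.
scaffold = MurckoScaffold.GetScaffoldForMol(mol)
# Graph framework = the same scaffold with atom types and bond orders erased.
framework = MurckoScaffold.MakeScaffoldGeneric(scaffold)

print(Chem.MolToSmiles(scaffold))   # the dibenzazepine ring system
print(Chem.MolToSmiles(framework))  # its all-carbon graph framework
```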
Most companies began to analyze in detail the composition and HTS hit rates of their legacy libraries, combinatorial libraries, and natural products libraries. Jacoby and coworkers published a detailed analysis of their experience at Novartis [12]. They found that the validated hit rate of the combinatorial collection was 0.02 percent. The hit rate of the historic medicinal
chemistry archive was higher at 0.03 percent, but both were outperformed by the natural products collection, which had a hit rate of 0.12 percent. However, when the percentage of assays with hits from each compound source is considered, a very different picture emerges. The combinatorial and natural products collections were successful in about 60 percent of the assays, whereas the historic archive collection always produced hits. They suggested that the higher performance of the latter collection is due in part to its consisting of individually synthesized compounds with a resulting overall higher scaffold diversity, and in part to the compounds having been driven toward a historic target focus, and therefore having more similarity to active compounds of similar classical druggable targets. Using this and additional analyses based on cheminformatics, as well as the molecular framework approach of Bemis and Murcko (vide supra), these authors developed a compound collection enhancement program that also included the extraction of privileged scaffolds from HTS data.

The literature soon saw a proliferation of computational methods to extract the best compounds and the most information from screening libraries. Many of these methods, while interesting in their own right, were unavailable to the broad scientific community because of issues of hardware, software support, and the proprietary nature of the method. The most valuable of these methods, then, were the empirical, practical, hands-on strategies. In these cases, chemists developed sets of rules to define the acceptability of compounds. This approach is not without its pitfalls. For example, Lajiness and coworkers [13] conducted a study in which 13 experienced medicinal chemists reviewed a total of about 22,000 compounds with regard to their acceptability for purchase for use in screening. The compounds had already passed a number of standard compound filters designed to eliminate chemically and/or pharmaceutically undesirable compounds. The compounds were divided into 11 lists of about 2000 compounds each. Most of the chemists reviewed two lists. Unknown to the chemists, each list contained a subset of 250 compounds previously rejected by a very experienced senior medicinal chemist. The results indicated that there is very little consistency among medicinal chemists in deciding which compounds to reject. It was also shown that when medicinal chemists review the same compounds a second time, but embedded within a different 2000-compound set, they reject the same compounds only about 50 percent of the time. The authors were careful to point out that the results make no judgments of right or wrong, but only that opinions often differ.

There is probably better agreement among chemists when considering substructures and functional groups. Steele and coworkers [2] at AstraZeneca developed a set of such chemistry filters for eliminating undesirable compounds. Their resultant rules are quite simple. The first rule states that compounds must have at least one N or O atom and at least one noncyclic bond. The second rule states that compounds must not have any of the substructures or functional groups listed in a table of chemistry filters with 46 entries. In contrast to the above discussion of a low level of consistency among individual chemists considering specific compounds, these filters were derived from a consensus of opinions of a
panel of experienced medicinal chemists considering only substructures and functional groups. Similar tables have certainly been developed at other companies, and it would be interesting to compare them if they were to become available. The AstraZeneca group went on to partition their collection into the Good Set (largely lead-like compounds passing all chemical filters, molecular weight < 400, logP < 5.0); the Bad Set (pass all chemical filters, molecular weight < 550, logP < 7.0); and the Ugly Set (molecular weight > 550, or logP > 7.0, or fail any chemical filter). The Ugly Set was excluded from HTS.
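This triage reduces to a small predicate. A minimal sketch using the stated cutoffs, with a single boolean standing in for the 46-entry substructure filter table (which is not reproduced here):

```python
def classify(mw: float, logp: float, passes_chem_filters: bool) -> str:
    """Partition a compound into the Good/Bad/Ugly sets using the
    AstraZeneca cutoffs quoted above [2]."""
    if not passes_chem_filters or mw > 550 or logp > 7.0:
        return "Ugly"   # excluded from HTS
    if mw < 400 and logp < 5.0:
        return "Good"   # largely lead-like
    return "Bad"        # passes filters, but larger or more lipophilic

print(classify(mw=320.4, logp=2.1, passes_chem_filters=True))   # -> Good
print(classify(mw=480.0, logp=6.2, passes_chem_filters=True))   # -> Bad
```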
1.6. MOLECULAR COMPLEXITY
Molecular complexity is probably impossible to define precisely, but like art, most chemists know it when they see it. Such molecular features as multiple chiral centers, polycyclic structures, multiple functionality, symmetry, and heteroatoms all contribute to one's perception of complexity. Notwithstanding our inability to define it, there have been numerous attempts to quantitate it [14–16]. All of these attempts have been structured from an organic synthesis perspective. For example, using these methods the complexity index of each intermediate of a synthesis can be calculated and the change in complexity plotted as a function of step number in the synthesis. In this way, several syntheses of the same molecule can be compared with respect to progression of complexity as the syntheses proceed. Of course, a complex molecule may require a complex synthesis today, but new developments in synthetic methodology may result in a dramatically simplified synthesis. Does this mean the molecule is no longer considered complex?

From a biological perspective, however, the notion of molecular complexity has a rather different meaning. The extent to which two molecules bind, or not, is dependent on the complementarity of their various surface features, such as charge, dipole effects, hydrogen bonding, and local hydrophobicity. If complementarity is thought of in terms of complexity, it seems intuitive that the more complex a molecule is, the less likely it is to bind to multiple protein binding sites. In other words, the more complex the molecule, the more likely it is to be selective for a single protein target, or a small number of them. This notion appears to have been first expressed by Bentley and Hardy in 1967 [17]. They were trying to discover analgesics superior to morphine (1, Fig. 1.2) by separating desirable from undesirable effects. All efforts to that point had been in the direction of making structures simpler than morphine, such as meperidine (2). They reasoned that compounds of simpler structure would be able to fit all of the receptor surfaces associated with the different effects. These considerations led them to examine bases more complex and more rigid than morphine, in the hope that the reduced flexibility and the differences in peripheral shape between such compounds and other known analgesics would result in the new bases being unacceptable at some of the receptor surfaces, and thus to a separation of the various effects. Among the molecules that emerged from this work is buprenorphine (3), clearly more complex than morphine by the addition of a bridged bicyclic system, a hydroxyl group, two new chiral centers, and peripheral hydrophobicity.

Figure 1.2. The structures of the opiates morphine (1), the simpler analog meperidine (2), and the more complex buprenorphine (3). The glutamate ligands glutamic acid (4), (S)-AMPA (5), and LY-354740 (6) are shown with calculated Barone–Chanon complexity indices in parentheses.

A strict comparison of the promiscuities of 2 and 3 apparently has not been done. However, using the Cerep BioPrint database (vide supra), meperidine produced 24 hits and buprenorphine only 8, a hit being defined as >30 percent inhibition at 30 μM (we thank Dr. Paul D. Leeson for providing this information). It thus appears that higher orders of complexity are consistent with the Bentley and Hardy hypothesis.

A second example, using structures more closely related to each other, is from the glutamate field. For each structure, the Barone–Chanon complexity index [16] has been calculated and is shown in Figure 1.2. Glutamic acid itself (4) has an index of 138, and is obviously a ligand for all 20-plus glutamate receptors. (S)-AMPA (5) has an index of 237 and is highly selective for the ionotropic glutamate receptors, and LY-354740 (6), with an index of 243, is highly selective for a subset of metabotropic glutamate receptors. Again, there does seem to be a relationship between molecular complexity and receptor selectivity (Michne, W. F., unpublished data).

This suggests that for an LO program, increasing complexity while holding the values of other properties within drug-like boundaries might be a viable strategy. But what about screening libraries? The above observations suggest that libraries of very complex molecules might produce relatively few hits because the compounds are too complex to fit the
binding sites on the protein targets. Indeed, Tan et al. [18] assembled a library of over two million compounds based on a very complex natural product core. Screened in three assays, the library produced no hits in two of them, and only one hit in the third, with an EC50 of 50 μM.

Hann and coworkers [19] developed a very simple model to represent probabilistic aspects of molecular recognition. The model does not incorporate flexibility in either the ligand or the binding site. With this type of model it is possible to explore how the complexity of a ligand affects its chance of matching a binding site of given complexity. They found that the probability of finding any type of match decays exponentially as the complexity of the ligand increases, and that, for a randomly chosen ligand, the chance of finding one that matches in only one way, that is, with a unique binding mode, peaks at a very low ligand complexity. The results further suggested that libraries containing very complex molecules have a low chance of individual molecules binding. In their view it is better to start with less complex molecules and to increase potency by increasing complexity. They suggest that there is an optimal complexity of molecules that should be considered when screening collections of molecules, but did not speculate as to what that complexity should be or how to measure it.

Schuffenhauer and coworkers [20] used historical IC50/EC50 summary data from 160 assays run at Novartis covering a diverse range of targets, and compared this to the background of inactive compounds. As complexity measures they used the number of structural features present in various molecular fingerprints and descriptors. They found that with increasing activity of the ligands, their average complexity also increased. They could therefore establish a minimum number of structural features in each descriptor needed for biological activity. They did not, however, address the issue of complexity versus selectivity.
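Hann's published model represents ligand and binding site as strings of complementary features. The toy Monte Carlo below is a sketch in that spirit only (the string lengths and trial counts are arbitrary choices, not the authors' parameters); it reproduces the qualitative result that the chance of any match falls steeply with ligand length, while the chance of a unique match peaks at low complexity:

```python
import random

def match_count(ligand, site):
    """Number of alignments at which the ligand is fully complementary
    to a contiguous stretch of the binding site (features are +/-1)."""
    hits = 0
    for offset in range(len(site) - len(ligand) + 1):
        if all(ligand[i] == -site[offset + i] for i in range(len(ligand))):
            hits += 1
    return hits

random.seed(0)
SITE_LEN, TRIALS = 12, 20000
for lig_len in range(2, 13):
    any_match = unique_match = 0
    for _ in range(TRIALS):
        site = [random.choice((-1, 1)) for _ in range(SITE_LEN)]
        ligand = [random.choice((-1, 1)) for _ in range(lig_len)]
        n = match_count(ligand, site)
        any_match += n > 0
        unique_match += n == 1
    # P(any match) decays roughly exponentially with ligand complexity;
    # P(unique binding mode) peaks at low complexity, as Hann found.
    print(lig_len, any_match / TRIALS, unique_match / TRIALS)
```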
1.7. SIMILARITY
Generations of medicinal chemists have developed SAR by synthesizing molecules similar to an active molecule. This is generally considered successful, although any experienced chemist will agree that the approach is full of surprises. When considering compounds for purchase to augment an existing library, it is important to consider whether structurally similar molecules have similar biological activity. This question was specifically addressed by Martin and coworkers [21]. They defined similar as a compound with ≥0.85 Daylight Tanimoto similarity to an active compound. They showed that for IC50 values determined as a follow-up to 115 HTS assays, there is only a 30 percent chance that a compound similar to an active is itself active.

They also considered whether biologically similar compounds have similar structures. They compared the nicotinic agonists acetylcholine (7) and nicotine (8), and the dopaminergic agonists dopamine (9) and pergolide (10) (Fig. 1.3). They found that the highest Daylight Tanimoto similarity within this group of four compounds (0.32) was between nicotine and pergolide, and the second highest (0.22) was between nicotine and dopamine. These results were likely in part due to deficiencies in the similarity calculations, and in part because similar compounds may not necessarily interact with the target macromolecule in similar ways.

Figure 1.3. Biologically similar molecule pairs: acetylcholine (7) and nicotine (8), and dopamine (9) and pergolide (10).

The latter explanation was addressed elegantly by Boström et al. [22]. They exhaustively analyzed experimental data from the Protein Data Bank (PDB). The complete PDB was searched for pairs of structurally similar ligands binding to the same biological target. The binding sites of the pairs of proteins complexing structurally similar ligands were found to differ in 83 percent of the cases. The most recurrent structural change among the pairs involves different water molecule architecture. Side-chain movements were observed in half of the pairs, whereas backbone movements rarely occurred.
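Such similarity calculations are easy to reproduce with open-source tools. The sketch below is an illustration only: it uses RDKit Morgan fingerprints rather than the Daylight fingerprints used by Martin, so the absolute values will differ, and pergolide is omitted for brevity. It computes pairwise Tanimoto similarities for three of the Figure 1.3 compounds:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles = {
    "acetylcholine": "CC(=O)OCC[N+](C)(C)C",
    "nicotine": "CN1CCC[C@H]1c1cccnc1",
    "dopamine": "NCCc1ccc(O)c(O)c1",
}
fps = {name: AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
       for name, s in smiles.items()}

names = list(fps)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = DataStructs.TanimotoSimilarity(fps[a], fps[b])
        # Even the pharmacologically related pair scores far below the
        # 0.85 "similar" threshold used by Martin et al. [21].
        print(f"{a} vs {b}: {sim:.2f}")
```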
1.8. BIOLOGY-BASED LIBRARIES
Consideration of the foregoing discussions of complexity, structural similarity, and binding mode dissimilarity suggests that the vast majority of the possible compounds in drug-like chemistry space are simply not biologically active. However, the fact that a compound similar to another biologically active compound has a 70 percent probability of being inactive at that molecular target does not preclude its being active at a different, unrelated molecular target. This suggested to us that compounds that have exhibited biological activity at any target, as well as compounds similar to biologically active compounds, might have activity at other targets, and would therefore serve as the basis for building efficient screening libraries, which we dubbed biology-based libraries.

We tested this hypothesis (Michne, W. F., unpublished data) by selecting all the compounds in the AstraZeneca compound collection that had shown confirmed activity in any screen, including compounds that had been prepared for SAR development in LO projects. At the time of this work the set consisted of approximately 72,000 compounds. The compounds were filtered to retain only those that had lead-like physicochemical properties and were devoid of objectionable functionality. We then selected against frequent hitters by choosing only those compounds that had been tested in a minimum of 10 assays and found active in at least 2, but no more than 5, assays. This resulted in 25,536 compounds. We assumed that if this set consisted largely of frequent hitters, the distribution of compounds would be weighted toward those that were active in five assays, with correspondingly fewer actives in four, three, and two assays. However, the distribution was exactly the opposite: 15,351 were active in two, 5,849 in three, 2,863 in four, and only 1,473 in five assays.

We then compared the performance of these compounds against that of the general screening collection across 40 new assays of diverse molecular targets and found that hit rates for the selected set exceeded those of the general collection in 38 of the assays, in some cases dramatically so. The 25,536 compounds consisted of about 9,000 clusters, of which about 5,000 were singletons. We concluded that these singletons would serve as good starting points for expanding the screening collection. The advantages of this approach are that hits can be found quickly in time-consuming and expensive low throughput assays, and that new privileged cores will be found. The disadvantage is that new chemotypes are not likely to be found very often. Nevertheless, we believe that the approach can add value to an overall screening program.
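The selection logic itself is a simple filter. A minimal sketch, assuming a hypothetical data layout in which each compound carries counts of assays tested and assays active (the real corporate database schema is, of course, different):

```python
def select_biology_based(compounds):
    """Select screening-library candidates using the activity-based
    criteria described above: tested in at least 10 assays, and active
    in at least 2 but no more than 5 (to exclude frequent hitters).

    `compounds` maps a compound ID to (n_assays_tested, n_assays_active).
    """
    return [cid for cid, (tested, active) in compounds.items()
            if tested >= 10 and 2 <= active <= 5]

demo = {"cpd-1": (12, 3), "cpd-2": (15, 8), "cpd-3": (9, 2), "cpd-4": (40, 2)}
print(select_biology_based(demo))   # -> ['cpd-1', 'cpd-4']
```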
Another interesting approach to new hit and lead discovery has been developed at Telik, and is described in detail, with examples, in a paper by Beroza et al. [23]. Briefly, target-related affinity profiling (TRAP) is based on the principle that almost all drugs produce their effects by binding to proteins. This fact suggests that protein-binding patterns might be a particularly relevant way to classify molecules. In TRAP, the characterization of a molecule's protein-binding preferences is captured by assaying it against a panel of proteins, the "reference panel." The panel is fixed, and each molecule in a set is assayed against it. The vector of each compound's binding affinities to the panel is its affinity fingerprint. A library of compounds can therefore be considered a set of vectors in "affinity fingerprint space," an alternative space to those defined by molecular structure. Molecules that interact with the same affinity toward the same proteins would be considered biologically similar, regardless of their chemical structures. For example, met-enkephalin and morphine are obviously very different structurally, but exhibit very similar opioid pharmacology. They also have very similar affinity fingerprints, as shown in Figure 1.4.

Figure 1.4. Chemical structures and affinity fingerprints for met-enkephalin and morphine. The protein panel is able to recognize the biological similarities between these two structurally distinct molecules. The two molecules have the same affinity for over half the proteins in the 12-member panel.

By contrast, morphine, an agonist, and naloxone, an antagonist, have very similar structures but very different biological activities. The use of affinity fingerprints for lead discovery has a limitation: there is no in silico method to generate an affinity fingerprint for a compound. A physical sample of the compound must be available, which carries with it the costs associated with compound acquisition, chemical inventory, and assaying the compounds
against the protein panel. Despite this limitation, affinity fingerprints have been shown to be an effective descriptor for identifying drug leads. The screening process that uses them for low throughput, high information assays reliably identifies molecules with activities in the low micromolar range.
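In code, an affinity fingerprint is just a numeric vector per compound, and "biological similarity" becomes a vector comparison. The sketch below uses invented numbers, and the Pearson correlation chosen here is one plausible metric rather than the method prescribed in Reference 23:

```python
import numpy as np

# Hypothetical affinity fingerprints: each compound's binding strengths
# (say, pIC50 values) against a fixed 12-protein reference panel.
rng = np.random.default_rng(1)
met_enkephalin = rng.uniform(4.0, 8.0, 12)
morphine = met_enkephalin + rng.normal(0.0, 0.3, 12)  # similar pattern, as in Fig. 1.4
naloxone = rng.uniform(4.0, 8.0, 12)                  # unrelated pattern

def fingerprint_similarity(a, b):
    """Pearson correlation between affinity fingerprints: compounds with
    similar protein-binding patterns score near 1."""
    return float(np.corrcoef(a, b)[0, 1])

print(fingerprint_similarity(met_enkephalin, morphine))  # high: biologically similar
print(fingerprint_similarity(morphine, naloxone))        # low: dissimilar patterns
```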
1.9. A HIT-TO-LEAD EXAMPLE
Baxter and coworkers at AstraZeneca have published a number of papers [24–26] that specifically illustrate the HTL process applied to various molecular targets following HTS campaigns. These works are distinguished by their emphasis on meeting a set of generic lead criteria. These criteria are an abbreviated version of those listed in Reference 2, and are shown in Table 1.1. Lead compounds should fulfill the majority of the lead criteria, but it is understood that very few compounds will be found that meet them all.

Table 1.1. Generic lead target profile

Generic assay           | Generic criteria
Potency                 | IC50 < 0.1 μM
Rat hepatocytes         | Clearance < 14 μL/min/10^6 cells
Human liver microsomes  | Clearance < 23 μL/min/mg
Rat iv pharmacokinetics | Clearance < 35 mL/min/kg, Vss > 0.5 L/kg, T1/2 > 0.5 h
Oral bioavailability    | F > 10%, PPB < 99.5%
Physical chemical       | MW < 450, clogP < 3.0, logD < 3.0
Additionally            | Clear SAR, appropriate selectivity data, and patentable multiple series

By way of example, Reference 26 describes the process as applied to a series of thiazolopyrimidine CXCR2 receptor antagonists. The profile of the screening hit 11 is shown in Table 1.2. The compound is not particularly potent (10 μM) in the binding assay, but did show comparable potency (2 μM) in a functional FLIPR assay, suggesting that the observed activity was real. The HTL strategy was to examine in turn the three thiazolopyrimidine substituents, looking for SAR leading to increasing potency. Replacement of the 2-amino group with hydrogen retained activity, but other changes eliminated activity. The 2-amino group was therefore held constant. The 7-hydroxy group could be replaced by hydroxyethylamines, but not by the slightly larger hydroxypropylamine. The 5-alkylthio group could be replaced by a benzylthio group with a significant increase in potency. The chemistry lent itself readily to combinatorial expansions at positions 5 and 7, resulting in a series of compounds with clear SAR. Of particular interest was the observation that compounds with electron-withdrawing groups in the aryl ring of the 5-benzylthio substituent showed decreased clearance.

Table 1.2. Lead criteria profiles of screening active 11 and resulting lead 12

Generic lead criteria^a        | Compound 11 | Compound 12
Binding IC50 < 0.1 μM          | 10          | 0.014
Ca flux IC50 < 0.1 μM          | 2.0         | 0.04
Rat hepatocyte clearance < 14  | 49          | 4
Human microsome clearance < 23 | 18          | 31
Rat iv clearance < 35          | –           | 25
Rat iv Vss > 0.5 L/kg          | –           | 1.9
Rat iv T1/2 > 0.5 h            | –           | 1.2
Rat po bioavailability > 10%   | –           | 15
Plasma protein binding < 99.5% | –           | 98.4
Molecular weight < 450         | 270         | 397
Solubility > 10 μg/mL          | 27          | 0.5
clogP < 3.0                    | 2.0         | 3.1
logD < 3.0                     | 2.9         | 3.4

^a Units as in Table 1.1 where not stated.

The lead criteria profile for one of the resulting compounds (12) is also shown in Table 1.2. The compound fails the human microsome clearance and rat volume of distribution criteria, but by only a small amount in either case. Further, the compound showed good oral bioavailability, notwithstanding its low solubility. Supported by SAR for several of the criteria, and considerable freedom to operate in intellectual property space, the series exemplified by compound 12 formed the basis of a new LO project. Similar successful HTL campaigns are described in References 25 and 26.
1.10. CONCLUSION
Developing new strategies for finding leads for drug discovery is an ongoing process. Much progress has been made over the last two decades, moving well beyond the elementary process of random high throughput screening. There is now a growing appreciation of drug-like physicochemical properties and of earlier ADME profiling to prevent attrition further along the increasingly expensive development pathway. As experience with current strategies has accumulated, newer strategies have continued to emerge. The following chapters describe some of these emerging strategies, and should be both exciting and informative for anyone involved in what is arguably the most challenging aspect of modern medicinal chemistry.
REFERENCES

1. Michne, W. F. (1996). Hit-to-lead chemistry: a key element in new lead generation. Pharmaceutical News, 3, 19–21.
2. Davis, A. M., Keeling, D. J., Steele, J., Tomkinson, N. P., Tinker, A. C. (2005). Components of successful lead generation. Current Topics in Medicinal Chemistry, 5, 421–439.
3. Bohacek, R. S., McMartin, C., Guida, W. C. (1996). The art and practice of structure-based drug design: a molecular modeling perspective. Medicinal Research Reviews, 16, 3–50.
4. Lipinski, C. A., Lombardo, F., Dominy, B. W., Feeney, P. J. (1997). Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews, 23, 3–25.
5. Veber, D. F., Johnson, S. R., Cheng, H., Smith, B. R., Ward, K. W., Kopple, K. D. (2002). Molecular properties that influence the oral bioavailability of drug candidates. Journal of Medicinal Chemistry, 45, 2615–2623.
6. Lu, J. J., Crimin, K., Goodwin, J. T., Crivori, P., Orrenius, C., Xing, L., Tandler, P. J., Vidmar, T. J., Amore, B. M., Wilson, A. G. E., Stouten, P. F. W., Burton, P. S. (2004). Influence of molecular flexibility and polar surface area metrics on oral bioavailability in the rat. Journal of Medicinal Chemistry, 47, 6104–6107.
7. Wenlock, M. C., Austin, R. P., Barton, P., Davis, A. M., Leeson, P. D. (2003). A comparison of physiochemical property profiles of development and marketed drugs. Journal of Medicinal Chemistry, 46, 1250–1256.
8. Leeson, P. D., Springthorpe, B. (2007). The influence of drug-like concepts on decision-making in medicinal chemistry. Nature Reviews Drug Discovery, 6, 881–890.
9. Bemis, G. W., Murcko, M. A. (1996). The properties of known drugs. 1. Molecular frameworks. Journal of Medicinal Chemistry, 39, 2887–2893.
10. Bemis, G. W., Murcko, M. A. (1999). The properties of known drugs. 2. Side chains. Journal of Medicinal Chemistry, 42, 5095–5099.
11. Ajay, Bemis, G. W., Murcko, M. A. (1999). Designing libraries with CNS activity. Journal of Medicinal Chemistry, 42, 4942–4951.
12. Jacoby, E., Schuffenhauer, A., Popov, M., Azzaoui, K., Havill, B., Schopfer, U., Engeloch, C., Stanek, J., Acklin, P., Rigollier, P., Stoll, F., Koch, G., Meier, P., Orain, D., Giger, R., Hinrichs, J., Malagu, K., Zimmermann, J., Roth, H. (2005). Key aspects of the Novartis compound collection enhancement project for the compilation of a comprehensive chemogenomics drug discovery screening collection. Current Topics in Medicinal Chemistry, 5, 397–411.
13. Lajiness, M. S., Maggiora, G. M., Shanmugasundaram, V. (2004). Assessment of the consistency of medicinal chemists in reviewing sets of compounds. Journal of Medicinal Chemistry, 47, 4891–4896.
14. Bertz, S. H. (1981). The first general index of molecular complexity. Journal of the American Chemical Society, 103, 3599–3601.
15. Whitlock, H. W. (1998). On the structure of total syntheses of complex natural products. Journal of Organic Chemistry, 63, 7982–7989.
16. Barone, R., Chanon, M. (2001). A new and simple approach to chemical complexity. Application to the synthesis of natural products. Journal of Chemical Information and Computer Sciences, 41, 269–272.
17. Bentley, K. W., Hardy, D. G. (1967). Novel analgesics and molecular rearrangements in the morphine-thebaine group. I. Ketones derived from 6,14-endo-ethenotetrahydrothebaine. Journal of the American Chemical Society, 89, 3267–3273.
18. Tan, D. S., Foley, M. A., Stockwell, B. R., Shair, M. D., Schreiber, S. L. (1999). Synthesis and preliminary evaluation of a library of polycyclic small molecules for use in chemical genetics assays. Journal of the American Chemical Society, 121, 9073–9087.
19. Hann, M. M., Leach, A. R., Harper, G. (2001). Molecular complexity and its impact on the probability of finding leads for drug discovery. Journal of Chemical Information and Computer Sciences, 41, 856–864.
20. Schuffenhauer, A., Brown, N., Selzer, P., Ertl, P., Jacoby, E. (2006). Relationships between molecular complexity, biological activity, and structural diversity. Journal of Chemical Information and Modeling, 46, 525–535.
21. Martin, Y. C., Kofron, J. L., Traphagen, L. M. (2002). Do structurally similar molecules have similar biological activity? Journal of Medicinal Chemistry, 45, 4350–4358.
22. Boström, J., Hogner, A., Schmitt, S. (2006). Do structurally similar ligands bind in a similar manner? Journal of Medicinal Chemistry, 49, 6716–6725.
23. Beroza, P., Damodaran, K., Lum, R. T. (2005). Target-related affinity profiling: Telik's lead discovery technology. Current Topics in Medicinal Chemistry, 5, 371–381.
24. Baxter, A., Bennion, C., Bent, J., Boden, K., Brough, S., Cooper, A., Kinchin, E., Kindon, N., McInally, T., Mortimore, M., Roberts, B., Unitt, J. (2003). Hit-to-lead studies: the discovery of potent, orally bioavailable triazolethiol CXCR2 receptor antagonists. Bioorganic and Medicinal Chemistry Letters, 13, 2625–2628.
25. Baxter, A., Brough, S., Cooper, A., Floettmann, E., Foster, S., Harding, C., Kettle, J., McInally, T., Martin, C., Mobbs, M., Needham, M., Newham, P., Paine, S., StGallay, S., Salter, S., Unitt, J., Xue, Y. (2004). Hit-to-lead studies: the discovery of potent, orally active, thiophenecarboxamide IKK-2 inhibitors. Bioorganic and Medicinal Chemistry Letters, 14, 2817–2822.
26. Baxter, A., Cooper, A., Kinchin, E., Moakes, K., Unitt, J., Wallace, A. (2006). Hit-to-lead studies: the discovery of potent, orally bioavailable thiazolopyrimidine CXCR2 receptor antagonists. Bioorganic and Medicinal Chemistry Letters, 16, 960–963.
2

High Throughput Screening Approach to Lead Discovery

ZORAN RANKOVIC*, CRAIG JAMIESON, AND RICHARD MORPHY
Medicinal Chemistry, Schering-Plough Corporation, Newhouse, Lanarkshire, ML1 5SH, UK.
*[email protected]

CONTENTS

2.1  Introduction                                        22
2.2  Historical Background                               22
     2.2.1  Early Years: Focus on Numbers and Speed      22
     2.2.2  Phase 2: In Pursuit of Quality               23
2.3  Screening Collection Enhancement Strategies         30
     2.3.1  Screening Libraries                          31
     2.3.2  Compound Acquisition                         41
     2.3.3  Size Matters                                 43
2.4  Screening Strategies                                45
     2.4.1  Mixtures versus Single Compound Screening    45
     2.4.2  Full Deck Screening                          45
     2.4.3  Focused Screening                            46
     2.4.4  Sequential Screening                         46
2.5  HTS Assays and Equipment                            47
2.6  Hit Confirmation and Assessment                     50
     2.6.1  Hit Confirmation Workflow                    50
     2.6.2  Post Screen Heuristics                       50
     2.6.3  Retesting at Single Concentration            51
     2.6.4  Expert Analysis                              52
     2.6.5  Hit Confirmation                             54
     2.6.6  Hit Profiling                                56
2.7  HTS: Successes, Failures, and Examples              58
2.8  Summary and Outlook                                 60
     References                                          62
2.1. INTRODUCTION
For most of the last century, drugs were discovered starting from endogenous ligands, natural products, or marketed drugs, followed by chemical modifications to improve efficacy in in vivo models [1]. With the breakthroughs in biochemistry in the 1970s and advances in molecular biology in the 1980s, this successful approach was supplemented by in vitro assays with increased throughput. The ambition was to screen an ever-increasing number of compounds to identify leads for the numerous new targets derived from the human genome project. In the early 1990s, the inception of high throughput screening (HTS) and the advent of combinatorial chemistry heralded a new age of drug discovery. The ability to screen hundreds of thousands, if not millions, of compounds in a high throughput fashion to populate company pipelines is now widely recognized as a central paradigm in modern drug discovery. More recently, HTS has been recognized by the academic community as providing an avenue for identifying chemical probes to investigate biological systems and the effects of target modulation [2].
2.2. HISTORICAL BACKGROUND
Since its inception in the early 1990s, HTS has undergone a rapid and profound evolution, in which two distinct phases can be delineated. The early years were characterized by a brute force approach, with the main emphasis on sheer numbers of compounds and the speed of their synthesis and screening. Lessons learned from this period suggested that "less is more" and triggered a second phase characterized by a greater emphasis on quality.

2.2.1. Early Years: Focus on Numbers and Speed
The emergence of a new drug discovery paradigm in the early 1990s, powered by combinatorial chemistry, genomics, and HTS technologies, created unprecedented optimism within the pharmaceutical industry and the wider public [3]. There were high expectations that this approach would deliver multiple new starting points for each target derived from the human genome project, and that new drugs with improved efficacy and safety profiles would quickly follow [4]. Screening throughput and the size of compound collections across the industry rapidly increased as these expectations led to considerable investment in the further development of automation, assay technologies, and combinatorial chemistry. Before the HTS revolution, most pharmaceutical companies possessed collections of only a few thousand compounds originating mainly from historical projects. The advent of HTS in the early 1990s triggered the expansion of corporate compound collections at a staggering rate. By the mid-1990s the number of compounds available for screening in major
pharmaceutical companies was already around 50,000, and by the end of the decade it was over 500,000. The majority of this expansion was driven by combinatorial chemistry and by purchasing compounds from vendors. Initially, vendors mainly provided compounds acquired from academic labs, often in the former Soviet Union, in variable quantities and qualities. However, against the background of ever-increasing demand, this business model rapidly evolved into what is currently a multimillion dollar industry, offering well over 10 million compounds of generally high purity [5].

However, as often happens with new technologies, the early optimism was soon replaced by deep skepticism. By the late 1990s it became apparent that, despite unprecedented investment, the tangible deliverables from HTS fell well short of expectations, raising concerns about the effectiveness of this new approach [6]. Screening campaigns provided very few tractable leads, positively impacting only a small proportion of discovery projects [7]. Poor hit rates and high attrition at the discovery stage were attributed primarily to the suboptimal quality of corporate screening files brought about by a numbers-driven compound acquisition strategy that focused on rapid expansion and paid too little attention to quality.

2.2.2. Phase 2: In Pursuit of Quality
Once the importance of the quality of corporate screening collections became apparent at the end of the 1990s, the process of weeding out undesirable material began. The focus was initially on the purity and identity of compounds in the collection. In the next stage, the emphasis shifted toward the removal of undesirable structural features using chemical filters (structural alerts), and then advanced further to consider the physicochemical properties that influence the in vivo profile of a compound.

2.2.2.1. Sample Purity and Identity. The purity of samples entering screening collections had been compromised in the early 1990s in order to meet the demands of highly ambitious expansion programs designed to support the increasing HTS throughput. The basic premise of the "en masse" approach was that a massive expansion of the screening pool was likely to enhance the chance of finding hits; even if only a small proportion of all potential hits in the file were identified, it would still be sufficient to initiate successful drug discovery programs for every target of interest. Hence, the purity and overall quality of compounds in screening collections were considered less important than the actual number of compounds in the collection. By the mid-1990s, having endured several cycles of resource-intensive yet largely fruitless pursuits of new HTS-derived leads, most of the industry had shifted the focus from the synthesis and testing of libraries of mixtures to that of discrete compounds. Critically, the emphasis was still on the number of compounds at the expense of their purity. Since the output of high throughput
synthesis was far beyond the library purification and analysis capabilities of the time, only a tiny proportion of library compounds were purified and fully characterized. Consequently, screening collections continued to be populated by uncharacterized, often crude reaction mixtures, despite being originally intended as discrete compounds. It transpired that the screening of such libraries regularly produced high false positive hit rates, with attendant issues at hit confirmation, resulting in a substantial waste of time and resources [8]. It has also been demonstrated that compounds of lower purity degrade faster during storage than those of higher purity [9]. All of these findings provided a strong justification for investment in the development and acquisition of high throughput purification systems [10].

Use of resin-bound capture reagents and automated liquid–liquid extraction (LLE) were among the earliest library purification techniques investigated, both of which unfortunately proved to be of limited scope and only partially successful in removing excess reagents. Like LLE, automated solid-phase extraction (SPE) was more suitable for relatively small hit and lead optimization libraries containing up to several hundred compounds than for diverse screening libraries of over a thousand compounds. Particular attention has been given to high performance liquid chromatography (HPLC) methods due to their wide applicability and adaptability to automation. Semipreparative systems can purify 50–100 mg of crude material per injection using standard reverse-phase binary acetonitrile/water gradients over a relatively short period of time. Traditional HPLC systems used mainly for repetitive purification of sequential batches were adapted for high throughput by incorporation of automated sample injection and fraction collection devices. Importantly, replacement of traditional UV peak detection by real-time mass spectrometry to trigger fraction collection (liquid chromatography–mass spectrometry, LC-MS) eliminated the need to collect large fraction arrays, and consequently significantly improved the throughput and logistics of library purification.

After raising the quality-based entry criteria and providing the relevant infrastructure for new compound production, attention inevitably shifted toward the quality of compounds already in the screening collection. Indeed, understanding the quality of the screening collection has been seen as the first step toward its improvement. In pursuit of the "pure and sure" ideal, some research organizations undertook the enormous task of analysing their entire screening collections. In the case of the combined heritage collections of SmithKline Beecham and GlaxoWellcome at the time of their merger, the new collection contained well over a million compounds [11]. Analysis of the whole was achieved using custom-built HPLC systems equipped with a 384-well-plate autosampler and a sequence of diode array (DAD), evaporative light scattering (ELSD), and mass spectrometry (positive- and negative-ion electrospray) detectors. A fast gradient method with a 5-min cycle time per sample allowed analysis of 50,000 compounds per month, a rate that implies several such systems running in parallel around the clock. The analysis showed just over 60 percent of compounds met the internal quality requirements.
Interestingly, the pass rate was very similar for the two heritage collections despite their quite different composition and storage conditions, suggesting that the observed quality level may well be typical of historical collections in other big pharmaceutical companies.

The dominance of LC-MS as the method of choice for library purification and analysis has been seriously challenged in recent years by advances in supercritical fluid chromatography (SFC). This technique is a variation of normal phase HPLC in which compressed supercritical carbon dioxide (near its critical temperature of 31°C and pressure of 73 bar) is used as the mobile phase, with methanol as the polar component of the binary gradient. This confers several advantages, including the possibility of using higher flow rates and longer columns for better separation, reduced costs, and, most crucially, easy evaporation of the SFC mobile phase, eliminating the need for time-consuming solvent evaporation and disposal (the cost of which often exceeds the solvent's purchase price). The ability to provide purified compounds in salt-free form is also seen as a potentially important advantage, since organic salts of trifluoroacetic acid (TFA), traditionally used in HPLC as a modifier, have recently been associated with high rates of degradation during storage [12] and with cell-based assay interference [13]. Since both analytical and purification capabilities are at least comparable to HPLC [14], the only remaining advance still required before SFC is adopted as a method of choice for library purification is its combination with mass spectrometry (MS) detection. Although custom SFC/MS systems have been reported [15,16], it remains to be seen whether the technology gains wider acceptance and becomes commercially available.

In most current workflows, compounds are purified and subjected to rigorous analysis before being incorporated into the screening collection. The most widely accepted purity cutoff for this purpose is greater than 85 percent. Inevitably, the requirement for robust synthetic protocols, purification, and thorough structural analysis has resulted in fewer compounds and libraries being produced. This, however, is generally considered an acceptable limitation given the industry's current focus on quality.

2.2.2.2. Compound Storage. The quality of screening collections can be influenced by the initial chemical and physical quality of compound samples as well as by their rate of degradation under storage conditions. Traditionally, the handling of newly synthesized compounds had been a manual, resource-intensive process that included local storage and ad hoc distribution across projects. In the late 1990s it became apparent that compound management had to change in a major way in order to be compatible with the requirements of HTS. The increased number of screens, rapid expansion of corporate collections, and growing focus on quality demanded new approaches to compound handling. Indeed, by the turn of the century most drug discovery organizations had developed integrated, automated compound management systems. The key features of these systems include centralized solid and liquid sample stores enabling reliable production, storage, and rapid delivery to HTS of high quality
dimethyl sulfoxide (DMSO) solutions with defined concentration. The creation of liquid stores was not trivial, however, since the prolonged storage of organic compounds in solution can lead to significant sample degradation at room temperature [17]. Many solvents were initially explored, including DMF, methanol, and ethanol, but DMSO was universally adopted as the solvent of choice since it solubilizes the highest proportion of test compounds and is compatible with both cell-based (0.1–0.2 percent v/v) and cell-free assays (1–5 percent v/v). Some drawbacks remain with this choice, mainly related to the hygroscopic nature of DMSO. It can absorb up to 10 percent water in as little as 5 h under normal laboratory conditions, which can have a significant effect on compound concentration, solubility, and degradation [18]. Samples containing 5 percent water in DMSO are less stable than those in dry DMSO, and thus humidity control is considered critical for maintaining the integrity of repository compounds [9]. Therefore, in order to reduce sample degradation, many pharmaceutical organizations store their screening collections as frozen, dry DMSO solutions at temperatures from 4°C to as low as −20°C, at low relative humidity to maintain water content below 0.5–1 percent. Consequently, samples have to be thawed to produce solutions for HTS and then frozen again for long-term storage. This freeze/thaw cycling can have profound effects on sample integrity due to compound degradation or precipitation [17]. Indeed, repeated freeze/thaw cycles, together with water uptake during reformatting and poor initial quality, were cited as the most common reasons for compromised sample integrity throughout the industry [19]. To address some of these issues, new liquid management concepts have been introduced, such as storing samples in individually sealed tubes as small DMSO aliquots for single use, eliminating the need for repeated freeze/thaw cycles and minimizing exposure to air and moisture.

Interestingly, some organizations have adopted quite the opposite strategy: rather than trying to avoid water absorption, samples are stored as 10 percent water/DMSO solutions [20]. This controlled addition of water potentially turns the liability into an advantage. The 10 percent water addition lowers the freezing point of DMSO (18°C), allowing samples to be stored as solutions at 4°C to slow down any potential degradation while eliminating the need for potentially damaging freeze/thaw cycles. In addition, the water uptake of DMSO/water mixtures is much slower than that of anhydrous DMSO, further reducing problems related to volume increase and diminished solubility during storage.

2.2.2.3. Chemical Filters (Structural Alerts). To take full advantage of HTS, the original strategy was to screen every available compound in a company's collection. Consequently, synthetic intermediates, reagents, and even catalysts were incorporated into screening collections. Following the realization that not every conceivable structure may be suitable for screening, various computational filters were developed to ensure that compounds containing undesirable
structural features, termed structural alerts, are removed from existing compound collections and excluded from acquisition efforts.

Removal of suspected false positives is a critical aspect of enhancing the quality of the HTS process and output. One of the most common causes of false positives in the early days of HTS was the presence of chemically reactive species, such as alkylating and acylating reagents, which can react irreversibly with proteins and consequently interfere with an assay readout [21,22]. Easily identifiable chemically reactive groups were among the first structural alerts used as computational filters to flag and remove undesirable compounds from existing screening collections and future acquisition activities. However, there is a range of chemotypes, more difficult to capture by simple in silico filters, that can also interfere with certain types of assays via other mechanisms and appear as false positives. For example, fluorescent and colored compounds are well known to produce misleading readouts in assays based on fluorimetric or colorimetric detection [23]. Similarly, aggregate-forming compounds can "light up" as false positives in biochemical assays [24]. The task of identifying these problem compounds on the basis of their structure is far from trivial; application of robust orthogonal assays at the hit confirmation stage is critical to weed out compound- and assay-related artifacts, and this issue is discussed in more detail in Section 2.6.

Chemical filters are generally not limited to assay-interfering functionalities, but also include features that would make a molecule unattractive as a potential starting point for hit optimization. Maximal numbers for some groups or atoms are defined, for example, cutoffs for halogen atoms: >3 chlorine or bromine atoms, or >6 fluorines. Similarly, highly complex molecules, as measured by the number of asymmetric centers or fused rings, are considered undesirable, as are "bland" or undevelopable structures that lack functionalities generally associated with specific protein binding. A number of molecular complexity descriptors have been reported in the literature that could be used to filter out compounds from both of these extremes [25]. In addition, well-documented toxicophores such as aromatic amine, hydrazine, and diazo groups are also considered structural alerts and have been used as exclusion or deselection filters [26].

There is an ongoing debate within the medicinal chemistry community about how strictly toxicophore filters should be applied when designing screening libraries or purchasing commercial compounds for corporate collections. On one side of the argument, the presence of a toxicophore should be tolerated, rather than limiting the chemical diversity coverage of the screening collection and therefore the chances of finding a hit, since a suitable bioisostere could be found during hit optimization. On the other hand, considering both the vastness of chemical space and the intense challenges of drug discovery, one may argue that a risk management approach of selecting a nearest neighbor lacking structural liabilities could be a more profitable strategy. This is particularly compelling given that toxicity is one of the major causes of failure in clinical development [27].
Most research organizations now apply a specific set of structural alerts derived from a consensus of an in-house panel of experienced medicinal chemists. In addition, ongoing analysis of HTS data is invaluable for identifying frequent hitters, which are otherwise difficult to predict [28], and this information can be fed into the design of new alerts. The list is reviewed on a regular basis (e.g., annually) to ensure that new knowledge and structural alerts are captured. In addition to applying relevant knowledge derived from internal projects and the literature, visual inspection of compounds selected for purchase for the screening collection is an effective way of identifying new structural alerts and preventing undesirables from slipping through existing chemical filters. Some of the most widely applied structural alerts are listed in Figure 2.1.
Compounds must NOT have any of the following:

Reactive and chemically unstable groups, such as: anhydride, acyl halide, sulphonyl halide, alkyl halides except alkyl fluoride, N–S, –N=C=O, –N=C=S, –N=C=N–, –C=S, N–halogen, S–halogen, R–SH, epoxide, azide, aziridine, thiirane, Michael acceptor, aldehyde, imine, 1,2-dicarbonyl.

Potential toxicophores: –N–NH2, –N=N–, ArNH2, ArNO2, >2 ArOH, >2 fused aryl rings.

Structurally unattractive features:
• Any element other than H, C, N, O, S, F, Cl, Br, I
• Sugar-like frameworks (C–O)n, where n > 3
• Crown ethers
• Linear chains (CH2)n, where n > 6
• More than 6 F atoms, or >3 Cl, Br, or I atoms

Compounds must HAVE the following:
• At least one N or O atom
• At least one non-cyclic bond
Figure 2.1. Structural alerts: Groups and structural features considered undesirable for compounds in a screening collection.
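Alert lists of this kind are typically implemented as substructure queries. The sketch below uses the open-source RDKit toolkit to encode a small illustrative subset of the Figure 2.1 alerts as SMARTS patterns; the patterns and names are our simplified approximations, and a production filter would also implement the count-based cutoffs and the "must have" rules.

```python
# A minimal structural-alert filter: a few Figure 2.1 alerts written as SMARTS.
# This is an illustrative subset, not a complete or validated alert set.
from rdkit import Chem

ALERTS = {
    "acyl halide":      Chem.MolFromSmarts("[CX3](=O)[F,Cl,Br,I]"),
    "sulphonyl halide": Chem.MolFromSmarts("S(=O)(=O)[F,Cl,Br,I]"),
    "isocyanate":       Chem.MolFromSmarts("N=C=O"),
    "aldehyde":         Chem.MolFromSmarts("[CX3H1](=O)[#6]"),
    "epoxide":          Chem.MolFromSmarts("C1OC1"),
    "aziridine":        Chem.MolFromSmarts("C1NC1"),
    "hydrazine":        Chem.MolFromSmarts("[NX3][NX3H2]"),
    "aromatic nitro":   Chem.MolFromSmarts("c[N+](=O)[O-]"),
}

def triggered_alerts(smiles: str) -> list:
    """Return the alert names a compound triggers ([] means it passes)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return ["unparseable structure"]
    return [name for name, patt in ALERTS.items() if mol.HasSubstructMatch(patt)]

print(triggered_alerts("CC(=O)Cl"))      # acetyl chloride -> ['acyl halide']
print(triggered_alerts("CCOc1ccccc1"))   # phenetole -> []
```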
2.2.2.4. Physicochemical Properties. Retrospective analysis of combinatorial chemistry and HTS output in the mid-1990s raised concern, since the majority of the hits identified were large lipophilic molecules that were often difficult to optimize for in vivo activity. Addressing these concerns, the seminal 1997 publication by Lipinski et al. of the "rule of five" (Ro5) physicochemical property guidelines for drug permeability changed the focus of drug discovery [29]. Its ease of calculation and conceptual simplicity have made the Ro5 the leading descriptor of "drug-likeness", adopted across the drug discovery community.
Compounds displaying drug-like properties are associated with acceptable aqueous solubility and adequate oral bioavailability, and consequently have a better chance of progressing successfully along the drug discovery and development trajectory [30]. For example, it has been shown that the mean properties of compounds in development converge at each stage toward the mean properties of marketed oral drugs, indicating that clinical candidates with properties close to those of marketed oral drugs are likely to be more successful, whereas compounds with higher lipophilicity tend to fail in development [31]. The fact that the physicochemical property means of US Food and Drug Administration (FDA)-approved orally administered drugs have not changed significantly over the past 20 years (mean MW = 343 Da and mean clogP = 2.3) indicates that these parameters are not incidental but have real physiological relevance [32].

The publication of the Ro5 not only changed the course of medicinal chemistry but also provided a powerful demonstration of the importance of in silico and in vitro predictors of in vivo pharmacokinetic properties [33]. For example, in an analysis of other parameters important in controlling absorption, Veber suggested that compounds with ≤10 rotatable bonds and PSA ≤ 140 Å², or ≤12 hydrogen bond donors and acceptors in total, have a high probability of good (>20 percent) oral bioavailability in the rat [34]. It should be emphasized that these observations are guidance rather than rules, since different administration routes [32] and the nature of the target may demand different property profiles [35,36]. For example, a program aimed at modulating a monoamine transporter in the central nervous system (CNS) would require ligands with a different profile in terms of logP and the number of hydrogen bond donors and acceptors [37] than a project aimed at identifying antimicrobial agents active in the urinary tract. It is therefore appropriate to adopt the Ro5 as necessary, while recognizing that compliance alone is not sufficient to create a drug [38].

The emergence of the concept of drug-likeness enhanced awareness within the medicinal chemistry community of the importance of physicochemical parameters and of the fact that selecting and optimizing leads is a multiparameter problem. However, when considering the optimal physicochemical profile of screening libraries, it should be borne in mind that the primary aim of HTS is to produce tractable starting points for discovery programs, rather than ready-made drugs. An analysis of the properties of lead–drug pairs found that, in comparison to drugs, lead structures on average exhibit lower lipophilicity and lower molecular complexity, expressed as lower MW and a lower number of rotatable bonds and rings [39]. Indeed, incorporating lipophilic groups is an effective and commonly employed medicinal chemistry strategy for improving in vitro potency, which is why the process of optimizing leads into drugs often results in more complex structures. Applying this strategy to leads whose properties are already at the top end of drug-likeness would be difficult without breaching the Ro5 guidelines. That would be highly undesirable, since high lipophilicity has been associated not only with poor pharmacokinetic properties but also with promiscuity, resulting in increased risks of pharmacologically based toxicity [40,41].
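The drug-like limits discussed above are straightforward to compute for any structure. A minimal sketch using the open-source RDKit toolkit, flagging Ro5 [29] and Veber [34] violations, is shown below; descriptor implementations differ slightly between toolkits, so exact counts are toolkit-dependent, and violations are best treated as prompts for expert review rather than automatic rejections.

```python
# Flag Ro5 [29] and Veber [34] violations for a compound given as SMILES.
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors

def property_flags(smiles: str) -> dict:
    mol = Chem.MolFromSmiles(smiles)
    return {
        # Lipinski's rule of five
        "MW > 500":   Descriptors.MolWt(mol) > 500,
        "clogP > 5":  Crippen.MolLogP(mol) > 5,
        "HBD > 5":    Lipinski.NumHDonors(mol) > 5,
        "HBA > 10":   Lipinski.NumHAcceptors(mol) > 10,
        # Veber's oral bioavailability guidance
        "RotB > 10":  rdMolDescriptors.CalcNumRotatableBonds(mol) > 10,
        "PSA > 140":  rdMolDescriptors.CalcTPSA(mol) > 140,
    }

# Atenolol, a marketed oral drug, raises no flags
flags = property_flags("CC(C)NCC(O)COc1ccc(CC(N)=O)cc1")
print([name for name, violated in flags.items() if violated])  # []
```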
Consequently, it has been proposed that when designing screening libraries, emphasis should be placed on compounds with "lead-like" properties [38,39]: MW ≤ 460 Da; clogP −4 to 4.2; number of rotatable bonds ≤10; number of rings ≤4; hydrogen bond donors ≤5; and hydrogen bond acceptors ≤9.

A study of a probability model of interactions between ligands and receptors of varying complexity further reinforced the importance of lead-like concepts in the design of screening libraries [42]. It demonstrated that as the system becomes more complex, the chance of observing a useful interaction for a randomly chosen ligand falls dramatically. This suggests that screening libraries containing large complex molecules have a lower chance of producing a hit. Even if a hit is found, optimization approaches that increase clogP/MW to improve potency would be of limited use for large, structurally complex leads. The lower limits of the "smaller the better" philosophy are set by biological assay capabilities: starting points with lower MW are likely to have lower potency, which may fall below the sensitivity of an HTS assay at the standard 10 μM screening concentration, and screening at higher concentrations (e.g., >500 μM) is severely hampered by issues of solubility, purity, and interference with the assay readout. The lead-like concept has been pushed even further in recent years in the development of a novel approach, fragment-based lead discovery, centered on Ro3 compounds and relying on biophysical screening methods (see Chapter 4).

It has become clear over the last decade that the success of the HTS paradigm depends not only on increasing throughput, in terms of the quantities of compounds synthesized and screened, but also on the quality of their design. For more details on the impact of physicochemical properties on in vitro and in vivo ADME properties, see Chapter 8.
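The complexity effect reported in Reference 42 can be reproduced qualitatively with a toy Monte Carlo simulation of our own devising: ligand and binding site are random ±1 feature strings, and a "useful" event is a unique alignment at which every ligand feature matches the site. The encoding is a deliberate simplification of the published model; only the trend is the point.

```python
# Toy Monte Carlo in the spirit of the Hann complexity model [42]. A "useful"
# interaction is a unique alignment where every ligand feature matches the site.
import numpy as np

rng = np.random.default_rng(1)

def p_useful(ligand_len: int, site_len: int = 12, trials: int = 5000) -> float:
    useful = 0
    for _ in range(trials):
        site = rng.choice([-1, 1], size=site_len)
        ligand = rng.choice([-1, 1], size=ligand_len)
        matches = sum(
            np.array_equal(ligand, site[i:i + ligand_len])
            for i in range(site_len - ligand_len + 1)
        )
        useful += (matches == 1)   # exactly one binding mode
    return useful / trials

for length in range(2, 11):
    print(length, round(p_useful(length), 3))
# The probability of a unique match peaks at low ligand complexity and then
# falls steeply: large, complex molecules are less likely to give a usable hit.
```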
2.3. SCREENING COLLECTION ENHANCEMENT STRATEGIES
The implementation of new inclusion criteria and the consequent cleanup activities of the past decade resulted in a significant depletion of screening collections across the industry; in some cases, over half the collection was deemed unsuitable for screening [11,43]. This provided a strong industry-wide impetus for enhanced acquisition programs. In the current environment, strict quality control measures have been incorporated into the process in order to ensure that the mistakes of earlier acquisition campaigns are not repeated [20,43]. Most corporate collection enhancement programs are based around three complementary acquisition strategies: internal project compounds, libraries designed specifically for HTS, and compounds acquired from commercial suppliers (Fig. 2.2).
Figure 2.2. Main sources of compounds for the screening collection: internal project compounds, designed screening libraries, and external compound acquisition all feed the screening collection, which in turn supplies HTS.
The first and arguably most critical approach is to ensure that all compounds synthesized within internal projects are swiftly incorporated into the screening collection. It is widely recognized that corporate historical sets often deliver high hit rates and attractive starting points with a favorable intellectual property (IP) position. Historically, compound transfer to the screening collection was a manual, ad hoc activity occurring mostly after project termination, by which time many of the most interesting samples would be lost or used up. It also meant that compounds could lie idle for several years in chemists' drawers before becoming available for screening. Today, many research organizations have established centralized sample management systems that deliver newly synthesized project compounds to a project's primary assay screen and to a centralized HTS store at approximately the same time.

2.3.1. Screening Libraries
The question of which compounds to make from the vast chemical universe and how to design screening libraries in order to increase the chances of identifying tractable hits has both fascinated and challenged the drug discovery community ever since the advent of HTS. As discussed above, the results of the early random approach to the design of screening libraries fell well short of the performance originally anticipated from HTS, in terms of both the number and
quality of starting points for optimization. This stimulated massive efforts over the past decade, across the industry and in academia, to improve library design methods. Two distinct design approaches are currently in use. The diversity-based approach is a chemistry-driven design whose ultimate objective is to enhance the coverage of diversity space that a screening collection represents, whereas the focused approach also draws on target or target-family knowledge to incorporate biological relevance into the library design.

2.3.1.1. Diversity Libraries. Although diversity continues to be an important aspect of the design of screening sets, diversity-based design methods have evolved significantly from the early days of combinatorial chemistry, when the emphasis was primarily on large libraries, often built around structurally complex and unusual scaffolds (a scaffold being the common core structure present in all members of a library produced by a common synthetic route). Diversity considerations are now more concerned with covering chemical space, mainly through smaller, "smarter" libraries built around more scaffolds, rather than larger libraries with fewer scaffolds. Accordingly, the emphasis of compound collection diversity analysis has shifted in recent years from complete structures to their scaffolds [44]. This has highlighted the bias that exists in current screening collections and turned attention toward filling the gaps in scaffold space [45].

The selection of scaffolds for diversity libraries is predominantly chemistry driven. Novel synthetic strategies and methods are used to access previously unexplored drug-like or lead-like chemotypes. Structural novelty is an important criterion, addressing not only the gaps in the screening collection but also the patentability of HTS hits, which is increasingly considered critical in the highly competitive drug discovery environment. Hence, the evaluation of proposed scaffolds normally begins with an assessment of their novelty in the primary and patent literature, using, for example, SciFinder and Marpat (Fig. 2.3). A number of parameters are used to assess a library's structural coverage, including dimensionality, complementarity to the corporate screening collection, and the number of available building blocks. Generally, libraries with higher and more proportionate dimensionality are highly valued (e.g., 10 × 10 × 10), whereas 1D libraries are often considered undesirable. Similarly, multidimensional libraries with a large difference in the number of building blocks available for the least and most diverse points (i.e., more than twofold) are also less favored.

The synthetic feasibility of a scaffold and the corresponding library compounds is also an important part of the overall library assessment. Scaffolds should be accessible on around the 50–100-g scale, preferably by well-precedented chemistry, whereas library steps should be sufficiently robust to allow the use of a wide range of building blocks. Libraries with anticipated lengthy development and production timelines (e.g., >8 months) are generally considered unfavorable. Scaffolds should also be relatively small, displaying physicochemical properties that allow the majority of structures within the enumerated libraries to meet lead-like or drug-like criteria. In addition to the most
Figure 2.3. Scaffold selection:
• Patentability/Novelty
• Complementarity with scaffolds in collection
• Dimensionality (2D or 3D)
• Synthetic feasibility (scaffold and library)
• Structural alerts (no reactive or unstable groups, and toxicophores)
• Physicochemical properties (i.e., MW